Skip to main content
  1. Readings/
  2. Books/
  3. Real World OCaml: Functional Programming for the Masses/

Chapter 1: Guided Tour

··2468 words·12 mins

Tooling #

  • the book is going to prefer using Base over the standard library that OCaml has. Checks out because one of the authors is a Jane Street senior (Yaron Minsky) and Base is from Jane Street.
  • utop is an improvement from the older ocaml shell

Basic Language Syntax #

operators #

  • = can be used for let-binding, can also be used for equality testing, they can be used in the same sentence like so:

      let even x =
        x % 2 = 0;;
    
      let sum_if_true test first second =
        (if test first then first else 0)
        + (if test second then second else 0);;
    
  • Boolean binary operators: && and || short-circuit

  • NOTE: differences between Base and ocaml stdlib:

    1. the operator for float vs integer exponentiation differs. Base differs because of a leaning towards consistency and clarity.

      For example, in Base, **. is float exponentiation, and ** is integer exponentiation, whereas in the standard library, ** is float exponentiation, and integer exponentiation isn’t exposed as an operator.

    2. To use float comparison when using Base, we need to open Float.O.

      When using Base, the default comparison operators work only on integers, and you need to explicitly choose other comparison operators when you want them. OCaml also offers a special set of polymorphic comparison operators that can work on almost any type, but those are considered to be problematic, and so are hidden by default by Base

           let is_inside_scene_element point scene_element =
             let open Float.O in
             match scene_element with
             | Circle { center; radius } ->
                distance center point < radius
             | Rect { lower_left; width; height } ->
                point.x    > lower_left.x && point.x < lower_left.x + width
                && point.y > lower_left.y && point.y < lower_left.y + height
             | Segment _ -> false
      

expressions: #

  • numerical expressions

    • numbers can have _ arbitrarily inserted into it. This is sugar and is there for readability. So 300_0 and 3000 are the same.
    • speciality: math operations are explicitly typed and don’t do any typing assumptions (e.g. float addition vs integer addition)
  • type inference:

    OCaml gives the freedom for us to be implicit or explicit in the types used in our code without creating ambiguity in either choices.

    expressions have types and the type of an expression is inferred from the available information about the components of that expression. This is almost like a logical engine, for which the programmer usually builds an intuition on over time.

    explicit typing is more for documentation purposes, or to assist correctness checks by humans.

      (* Here's the untyped version *)
      let sum_if_true test first second =
        (if test first then first else 0)
        + (if test second then second else 0);;
    
      (* Here's the explicitly typed version *)
      let sum_if_true (test : int -> bool) (x : int) (y : int) : int =
        (if test x then x else 0)
        + (if test y then y else 0);;
    
    • generic type inference and parametric-polymorphism

      When it’s not possible to tie it down to concrete types, a type variable e.g. 'a is a generic type that is used to represent the generic type. The ' is convention for showing that the name 'a represents a type var.

      This is similar to how generics is handled by other languages like Java (perhaps Python’s typing system as well).

      example from utop:

          let first_if_true test x y =
      
            if test x then x else y;;
      
          (* my toplevel output was: *)
          (*   val first_if_true : ('a -> bool/2) -> 'a -> 'a -> 'a = <fun> *)
          let long_string s = String.length s > 6;;
          let big_number x = x > 3;;
      
          first_if_true long_string "nice" "well done, let's do this";;
          first_if_true big_number  3  2;;
      

name binding #

let-binding is how OCaml does name binding, which has some implicit meanings that get floated as rules:

  • variable name identifiers must start with lowercase letter or underscore. The lowercase part is because uppercased first letter names are treated as type constructors.

      (* Legal examples *)
      let seven = 3 + 4;;
      let _seven = "seven";;
      let s3v3n = "hello";;
    
      (* Illegal examples *)
      (* let Eight = 4 + 4;; (\* This will error out on "Eight" because it's things it's a constructor *\) *)
    
  • functions are values so names also can be let-binded to functions. The syntax for definition and invocation of functions is more Unix-shell-like (space separated) instead of by tokens like parentheses.

We can co nested let-bindings as a way to define names within a local scope. This helps us write out complex expressions easily. The scope of let bindings are terminated by ;;.

let x = 7 in
let y = x * x in
x + y;;

Modules: #

  • . is for name-spaced access rather than OOP-like accessing of a method.

  • name shadowing:

    in OCaml it’s when there’s a clash in name identifiers, it’s not an automatic erasure of the old one, it’s just a hiding/superceding within a given scope. At the point where that scope ends, the original definition is visible again.

    Shadowing is not destructive.

    Shadowing can be constrained to a limited, local scope like so:

      let ratio x y =
        let open Float.O in
        of_int x / of_int y;;
    

    alternative (more concise) syntax:

      let ratio x y =
        Float.O.(of_int x / of_int y);;
    
    • open allows availability of names in the current toplevel without having to prefix the module name.
      • Is open Explicit or Implicit Shadowing?
        • depends on whether you’re doing it with the knowledge that name shadowing will occur or not.

Type Errors vs Exceptions #

OCaml distinguishes compiler errors (pre-runtime) from runtime Exceptions (at runtime).

Structural/ADTs: tuples, lists, options, records #

These are data types that are predefined, polymorphic, algebraic data types. In other languages, we’d have described it as “container types”, that term in the OCaml world is for reference holders.

Why “Container” Is Not Precise

In OCaml, the term “container” is best reserved for types that implement explicit interfaces for holding collections of items—like sets, maps, or similar abstractions (as opposed to data types defined by their constructors as in options, lists, or tuples). Lists qualify as containers in this generic sense, but options and tuples serve somewhat different structural roles—they embody fixed or optional structure rather than arbitrary-sized holding. (ref)

  • Tuples as cartesian product types

    • the type description (e.g. val a_tuple : int * string = (3, "three")) hints at a mental model where we consider the cartesian product of the two types t and s, that’s why the * is used.

    • syntax is similar for constructing a tuple (let a_tuple = (3,"three");;) and for extracting from a tuple (let (x,y) = a_tuple;;)

    • we can pattern match function arguments, this is the same stuff that we’re familiar with in the elixir world.

  • List and options as polymorphic types

    • List

      • elements must be homogenous in their type.

      • constructor is :: and it’s right associative, empty list is []

        we can do list concat but remember it’s NOT in constant time

              [1;2;3] @ [2;4;12]
        
      • SPECIALITY: list elements are separated using ;. , will automatically give us tuples so careful on this. I’d imagine this is a common beginner’s silent bug because that’s legal but often unintended.

      • Pattern matching:

        It’s a good idea to try and represent outcomes exhaustively.

        • we could choose to pattern match using the [] and :: operator primitives but oftentimes, we may miss out some cases (the compiler is smart enough to warn against this).

          Instead, we could use match expressions

                  (* this compiles with a warning, may throw runtime exception if we pass in [] *)
                  let my_favorite_language_1 (my_favorite :: the_rest) =
                    my_favorite;;
          
                  (* with match expression *)
                  let my_favorite_language_2 languages =
                    match languages with
                    | first :: the_rest -> first
                    | [] -> "OCaml" (* A good default! *);;
          
    • Option

      • Some and None are constructors that help us build optional values.

              (* Constructing options: *)
              let divide x y =
                if y = 0 then None else Some (x / y);;
        
              (* Pattern matching useful for examining options *)
              let downcase_extension filename =
                match String.rsplit2 filename ~on:'.' with
                | None -> filename
                | Some (base,ext) ->
                  base ^ "." ^ String.lowercase ext;;
        

        In a way, only Options are nullable in OCaml, which differs from other langauges where any type may be nullable since it may be instantiated but not assigned a value.

        Missing values are explicit in OCaml, which is beautiful (old Yegor Bugayenko essay on “Why Null is Bad” and another Tony Hoare essay on “Null References: The Billion Dollar Mistake” )

        OCaml’s result type is used similarly for expressing error or absence but is intended to signal computation errors rather than optional absence.

  • Records as labelled product types

    We should contrast these with positional typing that we see in tuples. Since it’s labelled product types, we have a bunch of ways we can access and use them.

    Accessing can be done via dot notation, pattern matching or field punning.

      (* Dot notation: *)
      let distance v1 v2 =
        magnitude { x = v1.x -. v2.x; y = v1.y -. v2.y };;
    
      (* field punning: field name reuse from the record *)
      let magnitude { x; y } = Float.sqrt (x **. 2. +. y **. 2.);;
    
  • Variant Types (aka Algebraic Data Types): representing multi-object scenes

    Say we wish to describe a multi-object scene, we use Variant types to describe a unified representation as a single type.

    Unlike union types in other languages (which are usually untagged), variants are tagged types

      type scene_element =
        | Circle  of circle_desc (* So, Circle is the tag for this *)
        | Rect    of rect_desc
        | Segment of segment_desc
    

Example of expressiveness:

  type point2d = { x : float; y : float };
  type circle_desc  = { center: point2d; radius: float };
  type rect_desc    = { lower_left: point2d; width: float; height: float };
  type segment_desc = { endpoint1: point2d; endpoint2: point2d };
  type scene_element =
    | Circle  of circle_desc
    | Rect    of rect_desc
    | Segment of segment_desc;

  let distance v1 v2 =
    magnitude { x = v1.x -. v2.x; y = v1.y -. v2.y };

  (* We're matching based on the variant tags *)
  let is_inside_scene_element point scene_element =
    let open Float.O in
    match scene_element with
    | Circle { center; radius } ->
      distance center point < radius
    | Rect { lower_left; width; height } ->
      point.x    > lower_left.x && point.x < lower_left.x + width
      && point.y > lower_left.y && point.y < lower_left.y + height
    | Segment _ -> false;

  let is_inside_scene point scene =
    List.exists scene
      ~f:(fun el -> is_inside_scene_element point el) (* this is an anon function! *)

  (* examples of usage: *)
  is_inside_scene {x=3.;y=7.}
  [ Circle {center = {x=4.;y= 4.}; radius = 5.0 } ];;

Pattern Matching: some generic points about it #

Like in Elixir, pattern matching gives us a lot of joyful ways to express things.

The cases within a match expression are described as “arms” of the match.

SPECIALITY: designing a recursive function can be seen as the definition of inductive cases which eventually reduce to base cases. When writing recursive list functions, pattern matching becomes very useful:

let rec sum l =
  match l with
  | [] -> 0                   (* base case *)
  | hd :: tl -> hd + sum tl   (* inductive case *);;

let rec remove_sequential_duplicates list =
  match list with
  | [] -> []
  | first :: second :: tl ->
 if first = second then
   remove_sequential_duplicates (second :: tl)
 else
   first :: remove_sequential_duplicates (second :: tl);;

For match cases, we can actually stack up the cases and have one arm for multiple matches like so: it’s called an or-pattern

let rec find_first_repeat list =
  match list with
  | [] | [_] ->
    (* only zero or one elements, so no repeats *)
    None
  | x :: y :: tl ->
    if x = y then Some x else find_first_repeat (y::tl);;

Imperative Programming #

Consider the following distinguishing of imperative vs functional programming:

  • imperative: computations are structured as sequences of instructions that operate by making modifications to the state of the program.
  • functional: variable bindings and most data structures being immutable.

OCaml supports imperative programming via the usage of mutable data structures (e.g. arrays, hash tables) and control flow constructs (for, while loops). We control the sequence of operations using ; because order might matter since we’re doing mutations.

Arrays #

  • direct access and modification of array elements in constant time.
  • one of the most compact DSes, better than lists.
  • return value for this is the unit type, treated as a placeholder that communicates that side-effects have happened. Something like void in C/Java.

examples:

let numbers = [| 1; 2; 3; 4 |];;
numbers.(2) <- 4;;  (* 0-idx-ed access and modification *)
numbers;;

Mutable Record Fields #

Fields within records can be defined as mutable so that we an keep modifying the same record. We can have a hybrid set of fields here, some mutable and some immutable.

Typically means we should have some crud-like functions that interact with this as well.

type running_sum =
  { mutable sum: float;
    mutable sum_sq: float; (* sum of squares *)
    mutable samples: int;
  }

let mean rsum = rsum.sum /. Float.of_int rsum.samples;;
let stdev rsum =
  Float.sqrt
    (rsum.sum_sq /. Float.of_int rsum.samples -. mean rsum **. 2.);;

let create () = { sum = 0.; sum_sq = 0.; samples = 0 };;
let update rsum x =
  rsum.samples <- rsum.samples + 1;
  rsum.sum     <- rsum.sum     +. x;
  rsum.sum_sq  <- rsum.sum_sq  +. x *. x;;

Refs #

Single mutable values, it’s actually just a record type with contents as the single mutable field. Should be seen as a builtin type alias to a record with a single field called contents.

(* Refs-useful funtions and operators for ease *)
let x = ref 0  (* create a ref, i.e., { contents = 0 } *);;
!x             (* get the contents of a ref, i.e., x.contents *);;
x := !x + 1    (* assignment, i.e., x.contents <- ... *);;
!x;;

Refs can be used as accumulators in iterations but that’s not idiomatic:

let sum list =
  let sum = ref 0 in
  List.iter list ~f:(fun x -> sum := !sum + x);
  !sum;;

for and while loops #

let permute array =
  let length = Array.length array in
  for i = 0 to length - 2 do
    (* pick a j to swap with *)
    let j = i + Random.int (length - i) in
    (* Swap i and j *)
    let tmp = array.(i) in
    array.(i) <- array.(j);
    array.(j) <- tmp
  done;;
let find_first_negative_entry array =
  let pos = ref 0 in
  while !pos < Array.length array && array.(!pos) >= 0 do
    (*we're prevent OOB errors by using the short-circuiting abilities of &&, if we swap the left and right expressions then we can force out errors*)
    pos := !pos + 1
  done;
  if !pos = Array.length array then None else Some !pos;;