Chapter 1: Guided Tour

Tooling

  • the book is going to prefer using Base over the standard library that OCaml has. Checks out because one of the authors is a Jane Street senior (Yaron Minsky) and Base is from Jane Street.
  • utop is an improvement from the older ocaml shell

Basic Language Syntax

operators

  • = can be used for let-binding, can also be used for equality testing, they can be used in the same sentence like so:

    1
    2
    3
    4
    5
    6
    
      let even x =
        x % 2 = 0;;
    
      let sum_if_true test first second =
        (if test first then first else 0)
        + (if test second then second else 0);;
    
  • Boolean binary operators: && and || short-circuit

  • NOTE: differences between Base and ocaml stdlib:

    1. the operator for float vs integer exponentiation differs. Base differs because of a leaning towards consistency and clarity.

      For example, in Base, **. is float exponentiation, and ** is integer exponentiation, whereas in the standard library, ** is float exponentiation, and integer exponentiation isn’t exposed as an operator.

    2. To use float comparison when using Base, we need to open Float.O.

      When using Base, the default comparison operators work only on integers, and you need to explicitly choose other comparison operators when you want them. OCaml also offers a special set of polymorphic comparison operators that can work on almost any type, but those are considered to be problematic, and so are hidden by default by Base

      1
      2
      3
      4
      5
      6
      7
      8
      9
      
           let is_inside_scene_element point scene_element =
             let open Float.O in
             match scene_element with
             | Circle { center; radius } ->
                distance center point < radius
             | Rect { lower_left; width; height } ->
                point.x    > lower_left.x && point.x < lower_left.x + width
                && point.y > lower_left.y && point.y < lower_left.y + height
             | Segment _ -> false
      

expressions:

  • numerical expressions

    • numbers can have _ arbitrarily inserted into it. This is sugar and is there for readability. So 300_0 and 3000 are the same.
    • speciality: math operations are explicitly typed and don’t do any typing assumptions (e.g. float addition vs integer addition)
  • type inference:

    OCaml gives the freedom for us to be implicit or explicit in the types used in our code without creating ambiguity in either choices.

    expressions have types and the type of an expression is inferred from the available information about the components of that expression. This is almost like a logical engine, for which the programmer usually builds an intuition on over time.

    explicit typing is more for documentation purposes, or to assist correctness checks by humans.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    
      (* Here's the untyped version *)
      let sum_if_true test first second =
        (if test first then first else 0)
        + (if test second then second else 0);;
    
      (* Here's the explicitly typed version *)
      let sum_if_true (test : int -> bool) (x : int) (y : int) : int =
        (if test x then x else 0)
        + (if test y then y else 0);;
    
    • generic type inference and parametric-polymorphism

      When it’s not possible to tie it down to concrete types, a type variable e.g. 'a is a generic type that is used to represent the generic type. The ' is convention for showing that the name 'a represents a type var.

      This is similar to how generics is handled by other languages like Java (perhaps Python’s typing system as well).

      example from utop:

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      
          let first_if_true test x y =
      
            if test x then x else y;;
      
          (* my toplevel output was: *)
          (*   val first_if_true : ('a -> bool/2) -> 'a -> 'a -> 'a = <fun> *)
          let long_string s = String.length s > 6;;
          let big_number x = x > 3;;
      
          first_if_true long_string "nice" "well done, let's do this";;
          first_if_true big_number  3  2;;
      

name binding

let-binding is how OCaml does name binding, which has some implicit meanings that get floated as rules:

  • variable name identifiers must start with lowercase letter or underscore. The lowercase part is because uppercased first letter names are treated as type constructors.

    1
    2
    3
    4
    5
    6
    7
    
      (* Legal examples *)
      let seven = 3 + 4;;
      let _seven = "seven";;
      let s3v3n = "hello";;
    
      (* Illegal examples *)
      (* let Eight = 4 + 4;; (\* This will error out on "Eight" because it's things it's a constructor *\) *)
    
  • functions are values so names also can be let-binded to functions. The syntax for definition and invocation of functions is more Unix-shell-like (space separated) instead of by tokens like parentheses.

We can co nested let-bindings as a way to define names within a local scope. This helps us write out complex expressions easily. The scope of let bindings are terminated by ;;.

1
2
3
let x = 7 in
let y = x * x in
x + y;;

Modules:

  • . is for name-spaced access rather than OOP-like accessing of a method.

  • name shadowing:

    in OCaml it’s when there’s a clash in name identifiers, it’s not an automatic erasure of the old one, it’s just a hiding/superceding within a given scope. At the point where that scope ends, the original definition is visible again.

    Shadowing is not destructive.

    Shadowing can be constrained to a limited, local scope like so:

    1
    2
    3
    
      let ratio x y =
        let open Float.O in
        of_int x / of_int y;;
    

    alternative (more concise) syntax:

    1
    2
    
      let ratio x y =
        Float.O.(of_int x / of_int y);;
    
    • open allows availability of names in the current toplevel without having to prefix the module name.
      • Is open Explicit or Implicit Shadowing?
        • depends on whether you’re doing it with the knowledge that name shadowing will occur or not.

Type Errors vs Exceptions

OCaml distinguishes compiler errors (pre-runtime) from runtime Exceptions (at runtime).

Structural/ADTs: tuples, lists, options, records

These are data types that are predefined, polymorphic, algebraic data types. In other languages, we’d have described it as “container types”, that term in the OCaml world is for reference holders.

Why “Container” Is Not Precise

In OCaml, the term “container” is best reserved for types that implement explicit interfaces for holding collections of items—like sets, maps, or similar abstractions (as opposed to data types defined by their constructors as in options, lists, or tuples). Lists qualify as containers in this generic sense, but options and tuples serve somewhat different structural roles—they embody fixed or optional structure rather than arbitrary-sized holding. (ref)

  • Tuples as cartesian product types

    • the type description (e.g. val a_tuple : int * string = (3, "three")) hints at a mental model where we consider the cartesian product of the two types t and s, that’s why the * is used.

    • syntax is similar for constructing a tuple (let a_tuple = (3,"three");;) and for extracting from a tuple (let (x,y) = a_tuple;;)

    • we can pattern match function arguments, this is the same stuff that we’re familiar with in the elixir world.

  • List and options as polymorphic types

    • List

      • elements must be homogenous in their type.

      • constructor is :: and it’s right associative, empty list is []

        we can do list concat but remember it’s NOT in constant time

        1
        
              [1;2;3] @ [2;4;12]
        
      • SPECIALITY: list elements are separated using ;. , will automatically give us tuples so careful on this. I’d imagine this is a common beginner’s silent bug because that’s legal but often unintended.

      • Pattern matching:

        It’s a good idea to try and represent outcomes exhaustively.

        • we could choose to pattern match using the [] and :: operator primitives but oftentimes, we may miss out some cases (the compiler is smart enough to warn against this).

          Instead, we could use match expressions

          1
          2
          3
          4
          5
          6
          7
          8
          9
          
                  (* this compiles with a warning, may throw runtime exception if we pass in [] *)
                  let my_favorite_language_1 (my_favorite :: the_rest) =
                    my_favorite;;
          
                  (* with match expression *)
                  let my_favorite_language_2 languages =
                    match languages with
                    | first :: the_rest -> first
                    | [] -> "OCaml" (* A good default! *);;
          
    • Option

      • Some and None are constructors that help us build optional values.

         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
        
              (* Constructing options: *)
              let divide x y =
                if y = 0 then None else Some (x / y);;
        
              (* Pattern matching useful for examining options *)
              let downcase_extension filename =
                match String.rsplit2 filename ~on:'.' with
                | None -> filename
                | Some (base,ext) ->
                  base ^ "." ^ String.lowercase ext;;
        

        In a way, only Options are nullable in OCaml, which differs from other langauges where any type may be nullable since it may be instantiated but not assigned a value.

        Missing values are explicit in OCaml, which is beautiful (old Yegor Bugayenko essay on “Why Null is Bad” and another Tony Hoare essay on “Null References: The Billion Dollar Mistake” )

        OCaml’s result type is used similarly for expressing error or absence but is intended to signal computation errors rather than optional absence.

  • Records as labelled product types

    We should contrast these with positional typing that we see in tuples. Since it’s labelled product types, we have a bunch of ways we can access and use them.

    Accessing can be done via dot notation, pattern matching or field punning.

    1
    2
    3
    4
    5
    6
    
      (* Dot notation: *)
      let distance v1 v2 =
        magnitude { x = v1.x -. v2.x; y = v1.y -. v2.y };;
    
      (* field punning: field name reuse from the record *)
      let magnitude { x; y } = Float.sqrt (x **. 2. +. y **. 2.);;
    
  • Variant Types (aka Algebraic Data Types): representing multi-object scenes

    Say we wish to describe a multi-object scene, we use Variant types to describe a unified representation as a single type.

    Unlike union types in other languages (which are usually untagged), variants are tagged types

    1
    2
    3
    4
    
      type scene_element =
        | Circle  of circle_desc (* So, Circle is the tag for this *)
        | Rect    of rect_desc
        | Segment of segment_desc
    

Example of expressiveness:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
  type point2d = { x : float; y : float };
  type circle_desc  = { center: point2d; radius: float };
  type rect_desc    = { lower_left: point2d; width: float; height: float };
  type segment_desc = { endpoint1: point2d; endpoint2: point2d };
  type scene_element =
    | Circle  of circle_desc
    | Rect    of rect_desc
    | Segment of segment_desc;

  let distance v1 v2 =
    magnitude { x = v1.x -. v2.x; y = v1.y -. v2.y };

  (* We're matching based on the variant tags *)
  let is_inside_scene_element point scene_element =
    let open Float.O in
    match scene_element with
    | Circle { center; radius } ->
      distance center point < radius
    | Rect { lower_left; width; height } ->
      point.x    > lower_left.x && point.x < lower_left.x + width
      && point.y > lower_left.y && point.y < lower_left.y + height
    | Segment _ -> false;

  let is_inside_scene point scene =
    List.exists scene
      ~f:(fun el -> is_inside_scene_element point el) (* this is an anon function! *)

  (* examples of usage: *)
  is_inside_scene {x=3.;y=7.}
  [ Circle {center = {x=4.;y= 4.}; radius = 5.0 } ];;

Pattern Matching: some generic points about it

Like in Elixir, pattern matching gives us a lot of joyful ways to express things.

The cases within a match expression are described as “arms” of the match.

SPECIALITY: designing a recursive function can be seen as the definition of inductive cases which eventually reduce to base cases. When writing recursive list functions, pattern matching becomes very useful:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
let rec sum l =
  match l with
  | [] -> 0                   (* base case *)
  | hd :: tl -> hd + sum tl   (* inductive case *);;

let rec remove_sequential_duplicates list =
  match list with
  | [] -> []
  | first :: second :: tl ->
 if first = second then
   remove_sequential_duplicates (second :: tl)
 else
   first :: remove_sequential_duplicates (second :: tl);;

For match cases, we can actually stack up the cases and have one arm for multiple matches like so: it’s called an or-pattern

1
2
3
4
5
6
7
let rec find_first_repeat list =
  match list with
  | [] | [_] ->
    (* only zero or one elements, so no repeats *)
    None
  | x :: y :: tl ->
    if x = y then Some x else find_first_repeat (y::tl);;

Imperative Programming

Consider the following distinguishing of imperative vs functional programming:

  • imperative: computations are structured as sequences of instructions that operate by making modifications to the state of the program.
  • functional: variable bindings and most data structures being immutable.

OCaml supports imperative programming via the usage of mutable data structures (e.g. arrays, hash tables) and control flow constructs (for, while loops). We control the sequence of operations using ; because order might matter since we’re doing mutations.

Arrays

  • direct access and modification of array elements in constant time.
  • one of the most compact DSes, better than lists.
  • return value for this is the unit type, treated as a placeholder that communicates that side-effects have happened. Something like void in C/Java.

examples:

1
2
3
let numbers = [| 1; 2; 3; 4 |];;
numbers.(2) <- 4;;  (* 0-idx-ed access and modification *)
numbers;;

Mutable Record Fields

Fields within records can be defined as mutable so that we an keep modifying the same record. We can have a hybrid set of fields here, some mutable and some immutable.

Typically means we should have some crud-like functions that interact with this as well.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
type running_sum =
  { mutable sum: float;
    mutable sum_sq: float; (* sum of squares *)
    mutable samples: int;
  }

let mean rsum = rsum.sum /. Float.of_int rsum.samples;;
let stdev rsum =
  Float.sqrt
    (rsum.sum_sq /. Float.of_int rsum.samples -. mean rsum **. 2.);;

let create () = { sum = 0.; sum_sq = 0.; samples = 0 };;
let update rsum x =
  rsum.samples <- rsum.samples + 1;
  rsum.sum     <- rsum.sum     +. x;
  rsum.sum_sq  <- rsum.sum_sq  +. x *. x;;

Refs

Single mutable values, it’s actually just a record type with contents as the single mutable field. Should be seen as a builtin type alias to a record with a single field called contents.

1
2
3
4
5
(* Refs-useful funtions and operators for ease *)
let x = ref 0  (* create a ref, i.e., { contents = 0 } *);;
!x             (* get the contents of a ref, i.e., x.contents *);;
x := !x + 1    (* assignment, i.e., x.contents <- ... *);;
!x;;

Refs can be used as accumulators in iterations but that’s not idiomatic:

1
2
3
4
let sum list =
  let sum = ref 0 in
  List.iter list ~f:(fun x -> sum := !sum + x);
  !sum;;

for and while loops

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
let permute array =
  let length = Array.length array in
  for i = 0 to length - 2 do
    (* pick a j to swap with *)
    let j = i + Random.int (length - i) in
    (* Swap i and j *)
    let tmp = array.(i) in
    array.(i) <- array.(j);
    array.(j) <- tmp
  done;;
1
2
3
4
5
6
7
let find_first_negative_entry array =
  let pos = ref 0 in
  while !pos < Array.length array && array.(!pos) >= 0 do
    (*we're prevent OOB errors by using the short-circuiting abilities of &&, if we swap the left and right expressions then we can force out errors*)
    pos := !pos + 1
  done;
  if !pos = Array.length array then None else Some !pos;;

Chapter 2: Variables and Functions

Variables

OCaml has a different take on variables than other programming languages in the sense that variables are all constants within their scope.

Additionally, we may shadow them in nested scopes and have access to the outer scopes when the nested scope is done with. A good mental model is a stack of names and every time we redefine it, we append to the stack and when we’re done, we pop from the stack (leaving access to the previous value for that shadowed name).

This is the same as what we’re familiar with in the elixir world.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
(* Nested let-bindings are useful *)
let languages = "OCaml,Perl,C++,C";;
let dashed_languages =
  let languages = String.split languages ~on:',' in
  String.concat ~sep:"-" languages;;

(* especially useful to do some complex calcuations and have intermediary values that we can refer to:  *)
let area_of_ring inner_radius outer_radius =
  let pi = Float.pi in
  let area_of_circle r = pi *. r *. r in
  area_of_circle outer_radius -. area_of_circle inner_radius;;

Pattern Matching and Let

Nothing fancy here, pattern matching is structural comparisons so we can have let-binding of names on LHS and pattern match to the right hand side like so:

Just careful that all cases for the pattern matching are considered, else use a match expression instead.

1
2
3
4
5
6
7
let (ints,strings) = List.unzip [(1,"one"); (2,"two"); (3,"three")];;

(* better to use match expression here: *)
let upcase_first_entry line =
  match String.split ~on:',' line with
  | [] -> assert false (* String.split returns at least one element, so we can assert this case *)
  | first :: rest -> String.concat ~sep:"," (String.uppercase first :: rest);;

Functions

Essentially “first class citizens” because they’re as normal as any other object.

let AND fun: let-binding function parameters

Think of the parameter of a function as a variable being bound to the value passed by the caller. This notion is useful for our final understanding of monadic styles.

The following are equivalent statements (almost) in how this name-binding works:

1
2
(fun x -> x + 1) 7;;
let x = 7 in x + 1;;

Multi-argument Functions

Some ways to write functions with multiple args:

1
2
3
let abs_diff x y = abs (x - y);; (* usual, curried by default *)
let abs_diff = (fun x -> (fun y -> abs(x - y)));; (* explicitly written as a curried function *)
let abs_diff (x, y) = abs (x - y);; (* alternatively written as a single tuple argument, can't be partially applied *)

SPECIALITY: Multi-arg functions are equivalent to their curried forms in OCaml, so we may partially apply them.

Recursive Functions

Should separate OCaml’s looping constructs and use them for imperative style code. This is why recursive functions are idiomatic for building looping constructs recursively.

We have to explicitly use let rec to define recursive function in OCaml. The value of having the programmer explicitly define recursive functions:

  • good for mutually recursive definitions because those are harder for humans to reason with

    here’s an example of a mutually recursive example for didactic reasons:

    1
    2
    3
    4
    
      let rec is_even x =
        if x = 0 then true else is_odd (x - 1)
      and is_odd x =
        if x = 0 then false else is_even (x - 1);;
    
  • having non-recursive forms helps us create a new definition that extends and supersedes a recursive form by shadowing it.

Prefix and Infix Operators (and operator overloading)

  • special set of identifiers: ~ ! $ % & * + - . / : < = > ? @ ^ |

    Here’s an extended set of grammar rules for it, search it properly when you need to override / implement such infix functions. There’s a bunch of precedence and associativity rules to account for as well.

    The first character of the operator symbol determines a bunch of things (operator precedence, left or right associative…). We can group up multiple characters into a single infix operator.

    Some interesting things about user-defined operators:

    • we can easily implement the pipe operator (see below)
    • our choice of operator matters because it would have implications such as the operator being left or right associative that may pose problems to us.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      
          let (|>) x f = f x;;  (* NOTE: left-associative operator *)
      
          (* example of using pipe operator in OCaml *)
          open Stdio;;
          let path = "/usr/bin:/usr/local/bin:/bin:/sbin:/usr/bin";;
          String.split ~on:':' path
          |> List.dedup_and_sort ~compare:String.compare
          |> List.iter ~f:print_endline;;
      
          (* NOTE: This works well because |> is left-associative, which is why the operator we use matters*)
      
          (* this won't work well because the operator is right-associative *)
          let (^>) x f = f x;;
      
    • operator precedence matters.
  • GOTCHA: using * identifier need to be careful because (**) is for comments syntax

    The fix to this is that we have to have some extra spaces:

    1
    2
    
      let ( *** ) x y = (x **. y) **. y;; (* this is correct *)
      let (***) x y = (x **. y) **. y;; (* this is NOT correct *)
    
  • GOTCHA: there are some special operators:

    • - : can be written as infix (e.g. Int.max (3-4) 1) and prefix (e.g. Int.max 3 (-4)) versions, with different meanings.

      Also, - has a lower precedence to function application in an expression, so that’s why we need brackets in the example above.

  • Reverse Application Operator (|>)

    This is mind blowing 🤯.

    Since operator overloading is easy to do and the scope for it may be kept small, we can implement the Reverse Application operator (elixir pipe operator) easily.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    
    let (|>) x f = f x;;  (* NOTE: left-associative operator *)
    
    (* example of using pipe operator in OCaml *)
    open Stdio;;
    let path = "/usr/bin:/usr/local/bin:/bin:/sbin:/usr/bin";;
    String.split ~on:':' path
    |> List.dedup_and_sort ~compare:String.compare
    |> List.iter ~f:print_endline;;
    
    (* NOTE: This works well because |> is left-associative, which is why the operator we use matters*)
    
    (* this won't work well because the operator is right-associative *)
    let (^>) x f = f x;;
    
  • Application Operator (@@)

    For syntax sugar when we apply multiple functions together: f @@ g @@ h x which replaces f(g(h x)). This HAS to be right-associative.

more on syntax

Naturally, there’s some good syntax support for writing functions.

  • function: syntactic support

    The function syntax allows us to pattern match easily, we can use it together with the usual syntax. If it’s a multi-argument function, we’re actually just pattern matching the last parameter to that function .

    1
    2
    3
    4
    5
    
    let some_or_default default = function
      | Some x -> x
      | None -> default;;
    
    List.map ~f:(some_or_default 100) [Some 3; None; Some 4];;
    
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    
    (* another example:  *)
    (** [contains t x] returns true iff [x] is contained in the
        interval [t] *)
    let contains t x =
      match t with
      | Empty -> false
      | Interval (l,h) ->
         Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0
    (* converting this to use the syntactic sugar: *)
    (** [contains t x] returns true iff [x] is contained in the interval [t] *)
    let contains x = function
      | Empty -> false
      | Interval (l, h) ->
        Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0
    
  • labelled args

    Labelling here is just named params.

    The labelled args can be label pruned (aligns with field punning that we saw in records).

    1
    2
    3
    4
    5
    6
    7
    8
    
    let ratio ~num ~denom = Float.of_int num /. Float.of_int denom;;
    
    (* pruned names: *)
    let num = 3 in
    let denom = 4 in
    ratio ~num ~denom;;
    (* equivalent application of the ratio function: *)
    ratio ~num:3 ~denom:10
    
    • Use cases

      To highlight the usefulness of labelling:

      1. explicating long argument lists

        when too many args, easier to remember by label rather than position

      2. when positional arg types are uninformative (and to disambiguate similar arguments)

        sometimes type signature is not sufficient to get the meaning of the arguments. Using labelled signature gives this clarity.

        e.g. val create_hashtable : int -> bool -> ('a,'b) Hashtable.t vs val create_hashtable : init_size:int -> allow_shrinking:bool -> ('a,'b) Hashtable.t

      3. it’s useful for partial application and chaining together operations

        It allows us to bind some of the arguments to partial functions based on their label. In the example below works because List.iter can be given the labelled argument for f.

        1
        2
        3
        
           String.split ~on:':' path
           |> List.dedup_and_sort ~compare:String.compare
           |> List.iter ~f:Stdio.print_endline;;
        
    • GOTCHA: Higher Order Functions must be consistent in the ordering of labeled functions

      This one has to do with consistent ordering:

      • suppose HOF: let apply_to_tuple f (first,second) = f ~first ~second;;

        then if we use a labelled function like let divide ~first ~second = first / second;;, everything works well because the ordering of the arguments and the labels are consistent.

      • if the HOF was to swap the ordering: let apply_to_tuple_2 f (first,second) = f ~second ~first;; then the same divide function wouldn’t work.

        this is because the HOF expects a function and that function has a signature that the compiler ties down based on the names used there. In the case of the 2nd HOF, the function argument has a signature where the first argument it accepts is labelled as second and the second argument is labelled as first. This is where the mismatch is created if divide has a different ordering of the labelled arguments.

      • we can inspect the function signatures for our two functions and the problem is clearer:

        • apply_to_tuple has the signature - : (first:'a -> second:'b -> 'c) -> 'a * 'b -> 'c = <fun>

        • apply_to_tuple2 has the signature - : (second:'a -> first:'b -> 'c) -> 'b * 'a -> 'c = <fun>

  • Optional Arguments

    Good use cases for Optional Argos:

    1. The intent is to provide defaults for params that you may have an opinion on what they should be.
    2. for wrapping onto some API interface and providing some arguments already
      1
      2
      3
      4
      5
      6
      7
      8
      
         let concat ?(sep="") x y = x ^ sep ^ y;;
      
         (* here's how the wrapper can be built: *)
         let uppercase_concat ?sep a b = concat ?sep (String.uppercase a) b;; (* this is a pass-through*)
      
         (* negative example of how to write the wrapper: *)
         let uppercase_concat ?(sep="") a b = concat ~sep (String.uppercase a) b;;
         (* the decision on what the default separator should be is a distinct, separate decision on the outer function. If the inner function (concat) changes then we might need to change the outer function as well.*)
      
    3. RULE OF THUMB: avoid optional arguments for functions internal to a module (i.e. functions not defined in the interface)

    Optional arguments really only make sense when the extra concision of omitting the argument outweighs the corresponding loss of explicitness.

    Problems:

    1. user not knowing that there’s an optional argument
    2. rarely used functions should NOT have optional arguments

    When defining a function with optional args, we have some syntactic sugar (where we just provide a default param to the optional) to help us out:

    1
    2
    3
    4
    5
    6
    7
    
    (* Longer way to write the function*)
    let concat ?sep x y =
      let sep = match sep with None -> "" | Some s -> s in
      x ^ sep ^ y ;;
    
    (* Terse way: no need match on optional, just can provide a default param*)
    let concat ?(sep="") x y = x ^ sep ^ y;;
    

    When passing in optional args, they have to be passed in explicitly:

  • how labelled and optional args are inferred

    • explicit typing

      in this example:

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      
      let numeric_deriv ~delta ~x ~y ~f =
        let x' = x +. delta in
        let y' = y +. delta in
        let base = f ~x ~y in
        let dx = (f ~x:x' ~y -. base) /. delta in
        let dy = (f ~x ~y:y' -. base) /. delta in
        (dx,dy);;
      (*
      the signature comes out to rely on the order of arguments that's in the code:
      
      val numeric_deriv :
        delta:float ->
        x:float -> y:float -> f:(x:float -> y:float -> float) -> float * float =
        <fun>
      
      *)
      

      not obvious how the order of the arguments to f should be chosen, but there’s a need to have some heuristic around resolving this ambiguity.

      The heuristic the compiler uses is to prefer labels to options and to choose the order of arguments that shows up in the source code.

      In cases that are ambiguous, compiler may throw an error for this, but we may still have varying orders if we explicitly provide the types:

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      
      (* this works, types are explicitly provided *)
      let numeric_deriv ~delta ~x ~y ~(f: x:float -> y:float -> float) =
        let x' = x +. delta in
        let y' = y +. delta in
        let base = f ~x ~y in
        let dx = (f ~y ~x:x' -. base) /. delta in
        let dy = (f ~x ~y:y' -. base) /. delta in
        (dx,dy);;
      
      (* this won't work and will throw an error *)
      let numeric_deriv ~delta ~x ~y ~f =
        let x' = x +. delta in
        let y' = y +. delta in
        let base = f ~x ~y in
        let dx = (f ~y ~x:x' -. base) /. delta in
        let dy = (f ~x ~y:y' -. base) /. delta in
        (dx,dy);;
      (*
      Line 5, characters 15-16:
      Error: This function is applied to arguments
             in an order different from other calls.
             This is only allowed when the real type is known.
      *)
      
    • GOTCHA: partial application with optional arguments

      This part can be sometimes tricky.

      Somewhat consistent with our currying observations, we have to be aware the optional argument can be ERASED in some cases if we do partial applications.

      The rule is:

      • optional argument is erased as soon as the first positional (i.e., neither labeled nor optional) argument defined after the optional argument is passed in.

        say we do let colon_concat = concat ~sep: ":", then signature A and B will behave differently

        Signature A below will end up having the erasure B will still allow for it.

        This point only applies when we’re trying to do partial applications for functions that have optionals within them! If we provide all the arguments at once, then the ordering doesn’t matter whatsoever.

         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
        11
        
          (* signature A: positionals are all after the optional *)
          let concat ?(sep="") x y = x ^ sep ^ y;;
          (* val concat : ?sep:string -> string -> string -> string = <fun> *)
          let prepend_pound = concat "# ";;
          (* val prepend_pound : string -> string = <fun> *)
        
          (* Signature B: positional before optional *)
          let concat x ?(sep="") y = x ^ sep ^ y;;
          (* val concat : string -> ?sep:string -> string -> string = <fun> *)
          let prepend_pound = concat "# ";;
          (* val prepend_pound : ?sep:string -> string -> string = <fun> *)
        

Chapter 3: Lists and Patterns

Basic notes

Just some points to revisit the tour that we already had

  1. cons operator :: is right-associative. The attachment of things is consistent with the knowledge that OCaml lists are singly-linked lists.

    this is why the extension doesn’t modify the existing list, it’s more of that the new element is “prepended” and has a refernce to the rest of the list.

  2. the empty list [] is polymorphic even though the overall list is NOT.

  3. conventional naming:

    • follow the naming of hd and tl for head and tail of list.

Patterns to extract data from list

match expressions can let-bind new names

Match expressions can be used to bind new variables. This may lead to problems where we think that we’re reusing a variable within the arm of a match case but it’s actually going to shadow that variable instead.

LIMITATION: In the elixir world, this is why we have the use of the pin operator but in OCaml, there’s no similar construct

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
(* example of how match expressions are used to bind new variables *)
let rec sum l =
  match l with
  | [] -> 0
  | hd :: tl -> hd + sum tl;;

(* GOTCHA: here, it's not a reference but a new declaration, hence the older variable gets shadowed:  *)
let rec drop_value l to_drop =
  match l with
  | [] -> []
  | to_drop :: tl -> drop_value tl to_drop (* this is not a referene, it's a re-declaration of to_drop *)
  | hd :: tl -> hd :: drop_value tl to_drop;; (* this case is the same as the 2nd case above.*)

the fix to this is to use guards or if statements, which would give the same function signature:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
let rec drop_value l to_drop =
  match l with
  | [] -> []
  | hd :: tl when hd = to_drop -> drop_value tl to_drop
  | hd :: tl -> hd :: drop_value tl to_drop
;;

let rec drop_value l to_drop =
  match l with
  | [] -> []
  | hd :: tl ->
    let new_tl = drop_value tl to_drop in
    if hd = to_drop then new_tl else hd :: new_tl;;

Limitations and Blessings of Pattern Matching

Pattern matching is structural \(\implies\) we shouldn’t try to encode arbitrary conditions within the pattern. (That’s what the when conditions are for.)

The conditions we use pattern matching for should only be structurally relevant.

blessing: Performance

  • match cases are entries in a jump table, so they are efficient.

    a good comparison is between the performance for match cases and an equivalent if else statement:

    WITNESS: here’s how we can do a quick benchmark using core_bench module

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    
    
      (* this gives a jump table. *)
      let plus_one_match x =
        match x with
        | 0 -> 1
        | 1 -> 2
        | 2 -> 3
        | 3 -> 4
        | 4 -> 5
        | 5 -> 6
        | _ -> x + 1;;
    
      (* this is sequential, gives a conditional branch tree to navigate *)
      let plus_one_if x =
        if      x = 0 then 1
        else if x = 1 then 2
        else if x = 2 then 3
        else if x = 3 then 4
        else if x = 4 then 5
        else if x = 5 then 6
        else x + 1;;
    
    
    
      #require "core_bench";;
      open Core_bench;;
    
      [ Bench.Test.create ~name:"plus_one_match" (fun () ->
            plus_one_match 10)
      ; Bench.Test.create ~name:"plus_one_if" (fun () ->
            plus_one_if 10) ]
      |> Bench.bench;;
    
    1
    2
    3
    4
    5
    6
    7
    
    Estimated testing time 20s (2 benchmarks x 10s). Change using '-quota'.
    ┌────────────────┬──────────┬────────────┐
    │ Name           │ Time/Run │ Percentage │
    ├────────────────┼──────────┼────────────┤
    │ plus_one_match │  35.45ns │     74.59% │
    │ plus_one_if    │  47.53ns │    100.00% │
    └────────────────┴──────────┴────────────┘
    

blessing: Error Detection

Some examples of what we’ve already seen

  • the exhaustiveness of match expression-cases is checked by the compiler
  • redundancies of cases are also checked

These work by patterned-context, not by some algo that checks the predicates (doesn’t exist).

What we will see soon:

  • the exhaustiveness checks in the context of variants and user defined types
  • the usefulness of using match statement warnings when we want to chase a chain of refactor-effects – the ball of yarn will just show itself!!

List module

Similar to elixir, the default modules offer good APIs for us to use.

Something to note in this section is that we will see a bunch of composed funtions such as List.concat_map which composes map and concat. Like in our understanding of functions (which is right-associative) any composed function op x_y means apply y then x

Here’s an example set of code where we:

  1. use List.map and List.map2_exn

    • the _exn is the OCaml way of doing throwable function calls, like in Elixir where we have strict operations appended with ! to the function name.
  2. use String functions

    1. String.concat which are better for large strings. Using the ^ operator will generate multiple sub-strings first – becomes a problem for largers strings.
    2. String.make
  3. use List.fold which is just a more general version of reduce function in OCaml syntax

    List.fold walks over the list from left to right, updating the accumulator at each step and returning the final value of the accumulator when it’s done.

    1
    2
    
       (* example usage for summing a list *)
       List.fold ~init:0 ~f:(+) [1;2;3;4];;
    

    Note:

    1. accumulator need NOT be the same time as the elements of the list
      1
      
            List.fold ~init:[] ~f:(fun acc hd -> hd :: acc) [1;2;3;4];;
      

The example is one where we try to render a table of values (i.e. a 2D array). So our table rendering function’s code would look something like:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
(* 1: find out the max widths for the headers based on the values that are in the columns *)
let max_widths header rows =
  let lengths l = List.map ~f:String.length l in
  List.fold rows
    ~init:(lengths header)
    ~f:(fun acc row ->
        List.map2_exn ~f:Int.max acc (lengths row));; (* The _exn is there because the function throws an exception if the lists are of mismatched length:*)

(* 2: function that renders the header-values separator *)
let render_separator widths =
  let pieces = List.map widths
      ~f:(fun w -> String.make w '-')
  in
  "|-" ^ String.concat ~sep:"-+-" pieces ^ "-|";;

(* 3. to pad a string to a specified length *)
let pad s length =
  s ^ String.make (length - String.length s) ' ';;

(* 4. now we can render a row of data: *)
let render_row row widths =
  let padded = List.map2_exn row widths ~f:pad in
  "| " ^ String.concat ~sep:" | " padded ^ " |";;

(* 5. Combine altogether to render the table: *)
let render_table header rows =
  let widths = max_widths header rows in
  String.concat ~sep:"\n"
    (render_row header widths
     :: render_separator widths
     :: List.map rows ~f:(fun row -> render_row row widths)
    );;

(* applying it for the example *)
Stdio.print_endline
  (render_table
     ["language";"architect";"first release"]
     [ ["Lisp" ;"John McCarthy" ;"1958"] ;
       ["C"    ;"Dennis Ritchie";"1969"] ;
       ["ML"   ;"Robin Milner"  ;"1973"] ;
       ["OCaml";"Xavier Leroy"  ;"1996"] ;
]);;

(* Here's what the output looks like:

| language | architect      | first release |
|----------+----------------+---------------|
| Lisp     | John McCarthy  | 1958          |
| C        | Dennis Ritchie | 1969          |
| ML       | Robin Milner   | 1973          |
| OCaml    | Xavier Leroy   | 1996          |
- : unit = ()
 *)
1
2
3
4
5
6
| language | architect      | first release |
|----------+----------------+---------------|
| Lisp     | John McCarthy  | 1958          |
| C        | Dennis Ritchie | 1969          |
| ML       | Robin Milner   | 1973          |
| OCaml    | Xavier Leroy   | 1996          |

Combining Elements with List.reduce

It’s a more specific case, of fold, without the init param supplied.

Result is wrapped in an optional.

Filtering with List.filter, List.filter_map

  • the case for filtering then mapping is common enough for List.filter_map
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    
      (* list of file extensions from a list of files, returning deduped and in sorted order. *)
      let extensions filenames =
        List.filter_map filenames ~f:(fun fname ->
            match String.rsplit2 ~on:'.' fname with
            | None  | Some ("",_) -> None
            | Some (_,ext) ->
              Some ext)
        |> List.dedup_and_sort ~compare:String.compare;;
    
      extensions ["foo.c"; "foo.ml"; "bar.ml"; "bar.mli"];
    

Partitioning with List.partition_tf

The _tf is a helpful mnemonic that reminds us that the output is two lists: first list is with the elements where the predicate returns true and the second list is the one for falses.

Multiple ways to combine lists

We can append (which can be done via List.append or operator @) or we can concat via List.concat (something more like flattening a list of lists).

Since concat and map functions are often used together (same idea as flatmaps), there’s a dedicated List.concat_map for us to use

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
#require "core_unix.sys_unix";;
module Sys = Core.Sys
module Filename = Core.Filename;;

(* example without concat_map *)
let rec ls_rec s =
  if Sys_unix.is_file_exn ~follow_symlinks:true s
  then [s]
  else
    Sys_unix.ls_dir s
    |> List.map ~f:(fun sub -> ls_rec (Filename.concat s sub))
    |> List.concat;; (*flattens*)

(* example with concat_map *)
let rec ls_rec s =
  if Sys_unix.is_file_exn ~follow_symlinks:true s
  then [s]
  else
    Sys_unix.ls_dir s
    |> List.concat_map ~f:(fun sub -> ls_rec (Filename.concat s sub));;
val ls_rec : string -> string list = <fun>

Tail-call Optimisation: Writing tail recursive functions

Classic stack machine used so there’s a limit to the stack frames, but it’s easy to write out tail recursive functions.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
(* Not tail-recursive: recursive call wrapped inside addition *)
let rec length = function
  | [] -> 0
  | _ :: tl -> 1 + length tl  (* not tail position *)
;;

(* Tail-recursive helper with accumulator `n` *)
let rec length_plus_n l n =
  match l with
  | [] -> n
  | _ :: tl -> length_plus_n tl (n + 1)  (* tail position *)

(* Wrapper function to hide accumulator *)
let length l = length_plus_n l 0

Tail-call optimisation is not new to me but here’s another short rundown:

  • in length_plus_n, the recursive call for that function is a tail call \(\implies\) no need to allocate new stack frame

    • tail-call when the caller of the function doesn’t have to do anything withh the returned value (doesn’t use it) that is returned by the callee, other than to return it. This is why there’s no need to preserve that stack frame (of the caller) \(\implies\) compiler can make this tail call optimisation.
  • idioms (to transform non-TCO to TCO style):

    1. tail calls = recursive call is the “final action”, no other important logic happens after

    2. accumulator and helper usage:

      it’s just a clean way of doing things so that we can isolate out non-tail calls from tail-calls within naive recursive functions \(\implies\) naturally we end up with tail-called helpers

  • if all calls are tail calls, then we’ve done tail-call optimisation correctly.

Terser / Faster Patterns

using the as pattern to alias the matched pattern

Typically if we add in name, a new object for the matching type is created. We can save space by using as to get handle to the matched value.

The as pattern lets you bind a name to the whole matched value in addition to deconstructing it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
let rec remove_sequential_duplicates list =
  match list with
  | [] | [_] as l -> l  (* or-pattern to collapse the handling *)
  | first :: (second :: _ as tl) -> (* see the matched object _ is what we have a direct handle to (instead of creating a copy.)  *)
    if first = second then
      remove_sequential_duplicates tl
    else
      first :: remove_sequential_duplicates tl;;

(* SLOW, BAD EXAMPLE *)
let rec remove_sequential_duplicates list =
  match list with
  | [] -> []
  | [x] -> [x]
  | first :: second :: tl ->
    if first = second then
      remove_sequential_duplicates (second :: tl)
    else
      first :: remove_sequential_duplicates (second :: tl);;

using when for precondition checks

continuing with the example, the similarity check was being done via an if-else. We could shift it to a when guard.

1
2
3
4
5
6
let rec remove_sequential_duplicates list =
  match list with
  | [] | [_] as l -> l
  | first :: (second :: _ as tl) when first = second ->
    remove_sequential_duplicates tl
  | first :: tl -> first :: remove_sequential_duplicates tl;;

polymorphic comparison

This is going to be so magical.

The idea here is that the Base’s default equality op is integers-only. If we use a polymorphic version, then we will be able to use that function we had for any type.

That’s what Base.Poly helps us do.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
open Base.Poly

let rec remove_sequential_duplicates list =
  match list with
  | [] | [_] as l -> l
  | first :: (second :: _ as tl) when first = second ->
    remove_sequential_duplicates tl
  | first :: tl -> first :: remove_sequential_duplicates tl;;

remove_sequential_duplicates ["one";"two";"two";"two";"three"];;

LIMITATION: you can’t build these functions (polymorphic comparators) on your own. OCaml’s polymorphic comparison functions are built into the runtime to a low level. These comparisons are polymorphic on the basis of ignoring almost everything about the types of the values that are being compared, paying attention only to the structure of the values as they’re laid out in memory.

WITNESS: typically OCaml devs don’t like polymorphic comparison because it can introduce silent bugs that will be hard to catch. Restrictions are a good thing because they give predictability to our code.

NOTE: when guards may harm the compiler’s exhaustiveness check because the pattern matching and logical predicate (for when clause) is being directly used.

Chapter 4: Files, Modules, and Programs

Files are the unit of organisation for a module. This section is a bunch of OCaml specific notes on writing OCaml programs, working with modules and module signatures.

Single File Programs

here’s the demo code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
open Base
open Stdio

let build_counts () =
  In_channel.fold_lines In_channel.stdin ~init:[] ~f:(fun counts line ->
      let count =
        match List.Assoc.find ~equal:String.equal counts line with
        | None -> 0
        | Some x -> x
      in
      List.Assoc.add ~equal:String.equal counts line (count + 1))

let () =
  build_counts ()
  |> List.sort ~compare:(fun (_, x) (_, y) -> Int.descending x y)
  |> (fun l -> List.take l 10)
  |> List.iter ~f:(fun (line, count) -> printf "%3d: %s\n" count line)

(* NOTE: the let () = ... uses the unit and this autmatically does a type check with the return on the RHS which should also be unit*)

For learning, we’re directly compiling a single file.

This needs explicit linking for imported modules, this is done by using ocamlfind together with ocamlopt for the linking (which is asked by -linkpkg)

ocamlfind ocamlopt -linkpkg -package base -package stdio freq.ml -o freq

We shall KIV first until we find a better way to

Multi file Programs and Modules

Files are the unit of association for modules. Should be seen as a collection of definitions stored within a namespace. Modules are always titlecased regardless the casing of the filename where they’re defined.

The buildsystem will figure out the dependencies accordingly.

Signatures and Abstract Types

We should always depend on interfaces instead of direct code implementations, interface segregation is always great.

OCaml uses interface/signature/module type interchangeably. Similar to C, we can have interface files (.mli) where the interfaces are defined for them to be implemented in corresponding .ml files.

The concrete datatypes that a module supports may be considered an implementation detail, therefore we might wish to define some abstract data types. That can be done within the interface files as well.

Interfaces are a good place to include docstrings, which can be naturally picked up by odoc

val declarations

val declarations form the syntax for specifying values in a signature: val <identifier> : <type>

Abstract Types in Signatures

This part is about declaring interfaces (signatures) in a way that the concreteness of that implementation is abstract to the module consumers.

An entry like of an abstract type t within the interface file (.mli) means: “This module defines a type t, but I am not telling you how it is implemented.”

1
2
3
4
5
(* interface .mli *)
type t

(* implementation file needs to provide the actual implementation.  *)
type t = int Map.M(String).t

The compiler uses both files:

  1. for abstraction boundary, determining what’s visible to clients: use the .mli
  2. for how to build and use it internally, .ml

When compiled:

  1. The .mli type makes the type abstract to callers.

    Describes what a module exposes — the public API. Abstract type definitions hide representation details.

  2. The .ml type becomes a manifest implementation, checked to be consistent with the .mli.

    Provides the actual definitions and values — how the module’s promises are fulfilled.

  • Disambiguating “abstract” vs “polymorphic”

    MISCONCEPTION:

    This was somewhat a misconception of mine. They are similar but not the same.

    Read this section.

Concrete Types in Signatures

We may wish to define concrete types (typically variant types) in our interfaces. Concreteness here means that the clients of that module have visibility to the structure of that type.

Whether a given type should be abstract or concrete is important and depends on context:

  • Abstract types give you more control over how values are created and accessed, and make it easier to enforce invariants beyond what is enforced by the type itself

  • concrete types let you expose more detail and structure to client code in a lightweight way

NOTE: types and values have distinct namespaces, so we may have the same name seen in both type and function definitions. Here’s an example:

In OCaml, if you define an abstract type in the interface (.mli), you must also concretise it in the implementation (.ml).

If you define a concrete type in the interface (.mli), you must also define the same concrete type in the implementation (.ml). It may look redundant.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
(* INTERFACE: *)
(** Represents the median computed from a set of strings. In the case
    where there is an even number of choices, the one before and after
    the median is returned. *)
(* this is a concrete type, it's visible to clients of this module. *)
type median = Median of string | Before_and_after of string * string

val median : t -> median

(* IMPLEMENTATION *)
(* we duplicate the concrete definition of the median type. *)
type median = Median of string | Before_and_after of string * string

let median t =
  let sorted_strings =
    List.sort (Map.to_alist t) ~compare:(fun (_, x) (_, y) ->
        Int.descending x y )
  in
  let len = List.length sorted_strings in
  if len = 0 then failwith "median: empty frequency count" ;
  let nth n = fst (List.nth_exn sorted_strings n) in
  if len % 2 = 1 then Median (nth (len / 2))
  else Before_and_after (nth ((len / 2) - 1), nth (len / 2))

Nested Modules

Files are a unit of association for modules but we may want to have sub-modules within a module to have clearer separation of overlapping but different types. We may nest modules within other modules.

We want the ability to define type identifiers that are distinct but may have similar underlying implementations. I think it’s a bad analogy but it’s similar to subclassing an abstract/virtual class from the OOP world – only in that it provides separation of types with potentially shared underlying representation, but it’s better framed as type abstraction and information hiding, not inheritance.

This is because confusing different kinds of identifiers is a very real source of bugs so instead of using bare types (e.g. strings) we should use concrete types that we define (Username, Hostname).

Example given is that of usernames and hostnames within sessions.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
open Base
module Time = Core.Time

(** This ID type is a base abstract type. *)
module type ID = sig
  type t

  val of_string : string -> t
  val to_string : t -> string
  val ( = ) : t -> t -> bool
end

module String_id = struct
  type t = string

  let of_string x = x
  let to_string x = x
  let ( = ) = String.( = )
end

module Username : ID = String_id
module Hostname : ID = String_id

type session_info =
  { user : Username.t
  ; host : Hostname.t
  ; when_started : Time.t
  }

(* this will bug out because user and host are two distinct types and the comparison here is faulty *)
(* let sessions_have_same_user s1 s2 = Username.( = ) s1.user s2.host *)
let sessions_have_same_user s1 s2 = Username.( = ) s1.user s2.user

Opening Modules

Opening gives direct access to the namespace, but may shadow existing names as well.

Some general rules of thumb on this:

  1. open rarely

    Opening a module is basically a trade-off between terseness and explicitness—the more modules you open, the fewer module qualifications you need, and the harder it is to look at an identifier and figure out where it comes from.

    There are some modules that were designed to be opened: like Base itself, or Option.Monad_infix or Float.O within Base.

  2. Use local opens to limit the scope of the opened module

    There are two syntactic approaches to this:

    1. normal – let binding the module namespace

      1
      2
      3
      
            let average x y =
              let open Int64 in
              (x + y) / of_int 2;;
      
    2. lightweight – better for small expressions

      1
      2
      
            let average x y =
              Int64.((x + y) / of_int 2);;
      
  3. Use module shortcuts

    Similar to module aliases like in Elixir. We should do this only to a small, local scope. Doing it at top-level is a mistake.

    1
    2
    3
    4
    5
    6
    
       let print_median m =
         let module C = Counter in
         match m with
         | C.Median string -> printf "True median:\n   %s\n" string
         | C.Before_and_after (before, after) ->
           printf "Before and after median:\n   %s\n   %s\n" before after
    

Including Modules

Opening a module affects the environment used to search for identifiers, including a module is a way of adding new identifiers to a module proper.

The difference between include and open is that we’ve done more than change how identifiers are searched for: we’ve changed what’s in the module. Directly using open won’t work because that chaining up of namespace won’t be done.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
(* consider this Interval module *)
module Interval = struct
  type t = | Interval of int * int
           | Empty

  let create low high =
    if high < low then Empty else Interval (low,high)
end;;

(* we can create a new, extended version of Interval by including its namespace: *)
module Extended_interval = struct
  include Interval

  let contains t x =
    match t with
    | Empty -> false
    | Interval (low,high) -> x >= low && x <= high
end;;

(* this is what the module signature looks like:

 module Extended_interval :
  sig
    type t = Interval.t = Interval of int * int | Empty
    val create : int -> int -> t
    val contains : t -> int -> bool
  end
 *)

include works for both signatures (on interface files) as well as code, so this is one way we can extend the functionality of modules that we consume.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
(******************************)
(*    the implementation:     *)
(* ========================== *)
(******************************)
open Base

(* The full contents of the option module *)
include Option

(* The new function we're going to add *)
let apply f_opt x =
  match f_opt with
  | None -> None
  | Some f -> Some (f x)


(******************)
(* the interface  *)
(* ============== *)
(******************)
open Base

(* Include the interface of the option module from Base *)
include module type of Option

(* Signature of function we're adding *)
val apply : ('a -> 'b) t -> 'a -> 'b t

The implementation is where shadowing happens \(\implies\) the order of declaration matters here (in the ml file). Order doesn’t matter in the interface file (.mli).

  • Common Definitions

    Similar to barrel files, we may have an import.ml for common imports. They may hold things like intentional name overrides e.g. using a custom Ext_option in place of Option when we use the name Option

    1
    2
    3
    4
    5
    6
    7
    8
    9
    
    
    (* within import.ml *)
    module Option = Ext_option
    
    (* within our module file *)
    open Base
    open Import (*the common definitions imported*)
    
    let lookup_and_apply map key x = Option.apply (Map.find map key) x
    

Common Module Compilation Errors

Here’s the common sources of compilation errors:

  1. Type Mismatches – the simplest types

    The compiler will complain about this if the interface and implementation files differ in their types.

  2. Type definitions missing

    This is actually more of the implementation is missing: when we defined it in the interface but don’t have a corresponding implementation for it.

    TERMINOLOGY: the interface declaration is “type spec” and the implementation declaration is the “type definition”.

  3. Type Definition Order Mismatches

    For abstract (variant) types that we define, the order matters and should match between the interface and the implementation file. The order of the declaration of variants matters to the OCaml compiler.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    
       (* I1: v1 of interface -- this will match with implementation below:*)
       (** Represents the median computed from a set of strings. In the case
           where there is an even number of choices, the one before and after
           the median is returned. *)
       type median = Median of string | Before_and_after of string * string
    
       val median : t -> median
    
       (* I2: v2 of interface -- this will not match with implementation below:*)
       (** Represents the median computed from a set of strings. In the case
           where there is an even number of choices, the one before and after
           the median is returned. *)
       type median =
         | Before_and_after of string * string
         | Median of string
    
       val median : t -> median
    
       (* Implemenation (within .ml file) *)
       type median = Median of string | Before_and_after of string * string
    
       let median t =
         let sorted_strings =
           List.sort (Map.to_alist t) ~compare:(fun (_, x) (_, y) ->
               Int.descending x y )
         in
         let len = List.length sorted_strings in
         if len = 0 then failwith "median: empty frequency count" ;
         let nth n = fst (List.nth_exn sorted_strings n) in
         if len % 2 = 1 then Median (nth (len / 2))
         else Before_and_after (nth ((len / 2) - 1), nth (len / 2))
    
  4. Cyclic dependencies

    Technically, recursive modules (that have cyclic deps) are possible but for now assume it’s illegal.

    What’s forbidden:

    1. self-referential: module referencing its own name: e.g. let singleton l = Counter.touch Counter.empty within counter.ml

    2. transitive references

      the compiler will tell you where the cyclic dependency is from its trace.

Designing with Modules

Modules are essential to OCaml programs, some design tips:

Rarely expose concrete types, stick to abstract types

Most of the time, abstraction is the right choice, for two reasons:

  1. it enhances the flexibility of your design, and

    benefit: we’re free to change implementation with minimal blast radius if interface consumers depend on abstract types

  2. it makes it possible to enforce invariants on the use of your module

    problem: If your types are exposed, then users of the module can create new instances of that type (or if mutable, modify existing instances) in any way allowed by the underlying type. That may violate a desired invariant i.e., a property about your type that is always supposed to be true.

When will concrete types make sense?

when there’s a lot of value in pattern matching for the concrete types and when the invariants that you care about are already enforced by the data type itself.

Design for the Call Site

Beyond ease of understanding, you want the call to be as obvious as possible for someone who is reading it at the call site. This reduces the need to jump to the interface declarations to get more context.

Some ways of improving the readability of client code:

  1. use labelled args

  2. good names for functions, variant tags, record fields

    naming RULE OF THUMBs:

    • A good rule of thumb is that names that have a small scope should be short, whereas names that have a large scope, like the name of a function in a module interface, should be longer and more descriptive.

    • Another useful rule of thumb is that more rarely used names should be longer and more explicit, since the cost of verbosity goes down and the benefit of explicitness goes up the less often a name is used.

  3. uniform interfaces

    Make the different interfaces in your codebase follow similar patterns to have some level of predictability.

    To borrow guidelines from the common modules:

    1. A module for (almost) every type. You should mint a module for almost every type in your program, and the primary type of a given module should be called t.

    2. Put t first. If you have a module M whose primary type is M.t, the functions in M that take a value of type M.t should take it as their first argument.

    3. Mark the exception throwable functions with _exn

      Functions that routinely throw an exception should end in _exn. Otherwise, errors should be signaled by returning an option or an Or_error.t KIV chapter 7 on error handling.

    4. Some type signatures for specific functions should be uniform

      signature for map is always essentially the same, no matter what the underlying type it is applied to. This kind of function-by-function API uniformity is achieved through the use of signature includes, which allow for different modules to share components of their interface.

  4. Design interfaces before writing implemenation

    This is just classic words of wisdom.

    Types and signatures provide a lightweight tool for constructing a skeleton of your design in a way that helps clarify your goals and intent, before you spend a lot of time and effort fleshing it out.

Chapter 5: Records

Basics

  • Syntax pointers:
    • the keynames within records must start with a lowercase letter.

      here’s a basic example

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      
          open Core
          (* record defined here -- corresponds to /etc/services file on unix *)
          type service_info =
            { service_name : string;
              port         : int;
              protocol     : string;
            };;
      
          # require "re";;
          let service_info_of_string line =
            let matches =
              let pat = "([a-zA-Z]+)[ \t]+([0-9]+)/([a-zA-Z]+)" in
              Re.exec (Re.Posix.compile_pat pat) line
            in
            { service_name = Re.Group.get matches 1;
              port = Int.of_string (Re.Group.get matches 2);
              protocol = Re.Group.get matches 3;
            }
      
          let ssh = service_info_of_string "ssh 22/udp # SSH Remote Login Protocol";;
      
    • accessing fields from a record:

      • field value accessing can be done with dot notation
      • we may pattern match as well
  • how the compiler infers the record types:
    • when no ambiguity: the compiler bases its inference on the field names used in constructing the record. That inference is most straightforward when each field name belongs to only one record type.
  • fields within the record can have an abstract type \(\implies\) the record is polymorphic
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    
      open Core
      (* record defined here -- corresponds to /etc/services file on unix *)
      type service_info =
        { service_name : string;
          port         : int;
          protocol     : string;
        };;
    
      # require "re";;
      let service_info_of_string line =
        let matches =
          let pat = "([a-zA-Z]+)[ \t]+([0-9]+)/([a-zA-Z]+)" in
          Re.exec (Re.Posix.compile_pat pat) line
        in
        { service_name = Re.Group.get matches 1;
          port = Int.of_string (Re.Group.get matches 2);
          protocol = Re.Group.get matches 3;
        };;
    
    
      (* polymorphic record, with abstract type: *)
      type 'a with_line_num = { item: 'a; line_num: int }
    
      let parse_lines parse file_contents =
        let lines = String.split ~on:'\n' file_contents in
        List.mapi lines ~f:(fun line_num line ->
            { item = parse line;
              line_num = line_num + 1;
          })
      ;;
    
      let service_file_content =  "rtmp              1/ddp     # Routing Table Maintenance Protocol
                                   tcpmux            1/udp     # TCP Port Service Multiplexer
                                   tcpmux            1/tcp     # TCP Port Service Multiplexer";;
    
      (* a' may be the service_info record *)
      parse_lines service_info_of_string
              service_file_content;;
    
      (* a' may be an integer *)
      parse_lines Int.of_string "1\n10\n100\n1000";;
    

Pattern Matching & Exhaustiveness

Record patterns are irrefutable, meaning that a record pattern match will never fail at runtime (since the fields are fixed). Therefore, we can pattern-match using a single pattern. In general, patterns for types with a fixed structure, like records and tuples, are irrefutable, unlike types with variable structures like lists and variants.

Also we can do a partial pattern match, though that may cause a drift in usage expectation if the record changes silently – this means that when new fields are added to the record, code that should be updated to react to the presence of those new fields will not be flagged by the compiler.

There’s optional warnings (see ocaml -warn-help) with which we can check for some of these sources of error. RULE OF THUMB: treat warnings as errors in dev but allow them for prod/packaged for distribution.

Use _ to signal fields that we don’t care about.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
(* suppose the record type is this:  *)
type service_info =
  { service_name : string;
    port         : int;
    protocol     : string;
    comment      : string option;
  }

(* if warnings for +9 is turned on  *)
let service_info_to_string
  { service_name = name; port = port; protocol = prot  }
  =
  sprintf "%s %i/%s" name port prot
;;
(* Then we can expect this warning:
   Line 2, characters 5-59:
    Warning 9 [missing-record-field-pattern]: the following labels are not bound in this record pattern:
    comment
    Either bind these labels explicitly or add '; _' to the pattern.
    val service_info_to_string : service_info -> string = <fun>
 *)

(* better way to handle fields we don't care about:  *)
let service_info_to_string
  { service_name = name; port = port; protocol = prot; _ }
  =
  sprintf "%s %i/%s" name port prot
;;

Field Punning (and label punning)

Punning gives us expressive syntax while allowing us to be terse.

This style of punning allows us to propagate the same names throughout the codebase so names are consistent, making it easier to navigate the source.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
(* PUNNING @ CREATION *)
let service_info_of_string line =
  (* first, split off any comment *)
  let (line,comment) =
    match String.rsplit2 line ~on:'#' with
    | None -> (line,None)
    | Some (ordinary,comment) -> (ordinary, Some comment)
  in
  (* now, use a regular expression to break up the
     service definition *)
  let matches =
    Re.exec
      (Re.Posix.compile_pat
         "([a-zA-Z]+)[ \t]+([0-9]+)/([a-zA-Z]+)")
      line
  in
  let service_name = Re.Group.get matches 1 in
  let port = Int.of_string (Re.Group.get matches 2) in
  let protocol = Re.Group.get matches 3 in
  { service_name; port; protocol; comment };; (*punned*)

(* PUNNING @ REFERENCE: Field and label punned version: *)
let create_service_info ~service_name ~port ~protocol ~comment =
  { service_name; port; protocol; comment };;

(* longer version, without punning: *)
let create_service_info
      ~service_name:service_name ~port:port
      ~protocol:protocol ~comment:comment =
  { service_name = service_name;
    port = port;
    protocol = protocol;
    comment = comment;
  };;

Reusing Field Names

When reusing field names, the compiler(typechecker) may exhibit implicit behaviour that may be non-obvious at first glance. This is why there are primitives for us to explicitly do type annotations.

Example of non-obvious compilation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
(* consider the following record types: *)
type log_entry =
  { session_id: string;
    time: Time_ns.t;
    important: bool;
    message: string;
  }
type heartbeat =
  { session_id: string;
    time: Time_ns.t;
    status_message: string;
  }
type logon =
  { session_id: string;
    time: Time_ns.t;
    user: string;
    credentials: string;
  }

(* B1: for common field names, typechecker picks the most recent definition*)
let get_session_id t = t.session_id;;
(* val get_session_id : logon -> string = <fun>  <=== this is from logon (most recent) *)

(* B2: the order of names that the typechecker has to do a lookup for affects whether it sees an ambiguity *)
let status_and_session t = (t.status_message, t.session_id);;
(* this resolves to heartbeat without ambiguity because it needs to resolve status_message and that field name is unique to heartbeat so our type is resolved without issue.

  val status_and_session : heartbeat -> string * string = <fun>
 *)
let session_and_status t = (t.session_id, t.status_message);;
(* this can't be resolved because it needs to resolve session_id and that field name is NOT unique to a particular record type.

Line 1, characters 45-59:
Error: This expression has type logon
       There is no field status_message within type logon
    *)

(* B3: type annotations remove ambiguity  *)
let session_and_status (t:heartbeat) = (t.session_id, t.status_message);;

Idiom: pack types into modules

Each record type can be given a namespace within which related values can be placed. Conventionally, the record type is just set to t.

The idea here is to qualify a record by the module it comes from.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
(* Follows the "pack similar record types into modules" idiom *)
module Log_entry = struct
  type t =
    { session_id: string;
      time: Time_ns.t;
      important: bool;
      message: string;
    }
end
module Heartbeat = struct
  type t =
    { session_id: string;
      time: Time_ns.t;
      status_message: string;
    }
end
module Logon = struct
  type t =
    { session_id: string;
      time: Time_ns.t;
      user: string;
      credentials: string;
    }
end

(* creation function for Log_entry *)
let create_log_entry ~session_id ~important message =
  { Log_entry.time = Time_ns.now ();
    Log_entry.session_id;
    Log_entry.important;
    Log_entry.message
  };;

It’s possible for us to be more terse. The mental model here extends onto that case of ambiguity that we saw earlier (B2) in that the typechecker/compiler only needs to be shown how to resolve ambiguity once.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
(* Reference 1: without the conciseness: *)
let create_log_entry ~session_id ~important message =
  { Log_entry.time = Time_ns.now ();
    Log_entry.session_id;
    Log_entry.important;
    Log_entry.message
  };;

(* Reference 2: adding a type annotation: *)
let create_log_entry ~session_id ~important message : Log_entry.t =
  { time = Time_ns.now (); session_id; important; message };;

(* Reference 3: annotation used for pattern matching *)
let message_to_string { Log_entry.important; message; _ } =
  if important then String.uppercase message else message;;

(* R4: annotation used for dot-access *)
let is_important t = t.Log_entry.important;;
  • for the pattern matching type annotation, the use of . has 2 different ways. Consider let is_important t = t.Log_entry.important;;.

    first dot: is a record field access, with everything to the right of the dot being interpreted as a field name;

    second dot: accessing the contents of a module, referring to the record field important from within the module Log_entry. The fact that Log_entry is capitalized and so can’t be a field name is what disambiguates the two uses.

Type-Directed Constructor Disambiguation

We can avoid qualifying the record field if we allow the compiler to infer the type of record in question.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
(* R3: old *)
let message_to_string { Log_entry.important; message; _ } =
  if important then String.uppercase message else message;;
(* R3: new *)
let message_to_string ({ important; message; _ } : Log_entry.t) =
  if important then String.uppercase message else message;;

(* R4: old *)
let is_important t = t.Log_entry.important;;
(* R4: new *)
let is_important (t:Log_entry.t) = t.important;;

The meaning of “constructor” here is special in OCaml

  • constructor constructs the value of a concrete data type

    • variant constructor \(\implies\) builds tagged union types (sum type cases)

      e.g. Some, None, Ok, Error

    • record constructor \(\implies\) builds structured records (product types)

      e.g { field1 = v1; field2 = v2 }

    • polymorphic variant constructor \(\implies\) builds polymorphic variant cases

      e.g. `Foo, `Bar

  • so the disambiguation here relates to which data-type constructor is being referred to when multiple modules or types define constructors with the same names.

Functional updates

There’s a functional update syntax that is terse and allows us to build records with specific field changes from another record. Functional updates make your code independent of the identity of the fields in the record that are not changing. It is a good choice for cases where it’s not important that you consider every field of the record when making a change

with keyword used here, the value-assignments on RHS are the diffs to be applied to the record on the LHS

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
(* functional update*)
let register_heartbeat t hb =
  { t with last_heartbeat_time = hb.Heartbeat.time };;

(* original *)
let register_heartbeat t hb =
  { addr = t.addr;
    port = t.port;
    user = t.user;
    credentials = t.credentials;
    last_heartbeat_time = hb.Heartbeat.time;
};;

Mutable Fields in Records

This helps the immutable programming primitives that OCaml uses.

KIV chapter 8 for more on Imperative programming.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15

(* mutable record fields *)
type client_info =
  { addr: Core_unix.Inet_addr.t;
    port: int;
    user: string;
    credentials: string;
    mutable last_heartbeat_time: Time_ns.t;
    mutable last_heartbeat_status: string;
  }

(* mutation of the mutable fields when we register heartbeat. *)
let register_heartbeat t (hb:Heartbeat.t) =
  t.last_heartbeat_time   <- hb.time;
  t.last_heartbeat_status <- hb.status_message;;

First-class fields

Typically, we’d access fields using dot-operators and in that way they don’t behave as first-class objects – we can’t reer to the field itself as a value or pass it around.

Library-provided functionality can make fields first class. Seems like this behaviour is Jane Street-specific (within Base), where there are some patterns to generate accessor functions. It seems like the annotation is a meta annotation that uses fieldslib to generate some accessor functions (specific to the fields) and a Fields submodule to the existing module which gives us a bunch of helper functions to work with record fields.

This seems to be the first introduction of meta-programming directives [@@deriving fields]

Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#require "ppx_jane";;

module Logon = struct
  type t =
    { session_id: string;
      time: Time_ns.t;
      user: string;
      credentials: string;
    }
  [@@deriving fields]
end;;

(* this gives the following generated sub-module
    module Logon :
    sig
        type t = {
        session_id : string;
        time : Time_ns.t;
        user : string;
        credentials : string;
        }
        val credentials : t -> string
        val user : t -> string
        val time : t -> Time_ns.t
        val session_id : t -> string
        module Fields :
        sig
            val names : string list
            val credentials :
            ([< `Read | `Set_and_create ], t, string) Field.t_with_perm
            val user :
            ([< `Read | `Set_and_create ], t, string) Field.t_with_perm
            val time :
            ([< `Read | `Set_and_create ], t, Time_ns.t) Field.t_with_perm
    ...
        end
    end
*)

(* using this *)
let get_users logons =
  List.dedup_and_sort ~compare:String.compare
(List.map logons ~f:Logon.user);;

In the code above, we can also see access-control markers for the fields. Inspecting Field.get;; shows - : ('b, 'r, 'a) Field.t_with_perm -> 'r -> 'a = <fun>

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
let show_field field to_string record =
  let name = Field.name field in
  let field_string = to_string (Field.get field record) in
  name ^ ": " ^ field_string;;

let logon =
 { Logon.
   session_id = "26685";
   time = Time_ns.of_string_with_utc_offset "2017-07-21 10:11:45Z";
   user = "yminsky";
   credentials = "Xy2d9W";
 };;

show_field Logon.Fields.user Fn.id logon;; (*NOTE: we use the =Fn= module here*)

(*********)
(* ====- *)
(*********)

let print_logon logon =
  let print to_string field =
    printf "%s\n" (show_field field to_string logon)
  in
  Logon.Fields.iter
    ~session_id:(print Fn.id)
    ~time:(print Time_ns.to_string)
    ~user:(print Fn.id)
    ~credentials:(print Fn.id);;

print_logon logon;;

Fn module is a collection of useful primitives that deal with other functions (e.g. Identity function is just Fn.id – see the Fun library here)

For example, there’s an iterator that allows us to iterate all first-class fields:

  • nice side effect of this approach is that it helps you adapt your code when the fields of a record change.