Chapter 6: Variants

Variants are sum types: a single type that can take on multiple forms of data, each form tagged explicitly. The value this gives us is that we can represent complex data and organise the case analysis over that information.

Basics:

  • Each case within the variant has an associated tag; the tags are constructors, because applying them constructs a concrete value of the type. A case may optionally carry a sequence of fields.

  • The following example uses variants as simple tags with no associated data (the optional fields are omitted), effectively as enums:

      open Base
      open Stdio
      type basic_color =
        | Black | Red | Green | Yellow | Blue | Magenta | Cyan | White;;
    
      let basic_color_to_int = function
        | Black -> 0 | Red     -> 1 | Green -> 2 | Yellow -> 3
        | Blue  -> 4 | Magenta -> 5 | Cyan  -> 6 | White  -> 7;;
    
      let color_by_number number text =
        Printf.sprintf "\027[38;5;%dm%s\027[0m" number text;;
    
      let blue = color_by_number (basic_color_to_int Blue) "Blue";;
    
      printf "Hello %s World!\n" blue;; (* the output should be "Hello Blue World" where "Blue" is in the colour blue.*)
    
  • For more expressiveness, tags can have arguments which describe the data available in each case. Variants may have multiple arguments:

      (* bg context: we're trying to represent colours within the terminal. *)
      type weight = Regular | Bold
      type color =
        | Basic of basic_color * weight (* basic colors, regular and bold *)
        | RGB   of int * int * int      (* 6x6x6 color cube *)
        | Gray  of int                  (* 24 grayscale levels *)
      ;;
    
      (* here's a list of colors we can create: *)
      [RGB (250,70,70); Basic (Green, Regular)];;
    
      (* now we can have a to_int that handles all the cases *)
      let color_to_int = function
        | Basic (basic_color,weight) ->
           let base =
             match weight with
               Bold -> 8
             | Regular -> 0 in
           base + basic_color_to_int basic_color
        | RGB (r,g,b) -> 16 + b + g * 6 + r * 36
        | Gray i -> 232 + i;;
    
      let color_print color s =
        printf "%s\n" (color_by_number (color_to_int color) s);;
    
      color_print (Basic (Red,Bold)) "A bold red!";;
      color_print (Gray 4) "A muted gray...";;
    

Variants, Tuples and Parens

The differences between a multi-argument variant and a variant containing a tuple are mostly about performance. A multi-argument variant is a single allocated block in memory, while a variant containing a tuple requires an extra heap-allocated block for the tuple. KIV memory representation of values in Chapter 23.

A multi-argument variant constructor and a constructor carrying a tuple look very similar, BUT they are not the same. The multi-argument constructor expects its arguments supplied separately; if we pass it a single 3-element tuple, that counts as supplying one argument, not three.

The parentheses in the type definition are what make the difference, and it’s subtle.

(* demonstrating the difference: *)
RGB (200, 0, 200);; (* these are 3 separate arguments that we are passing! *)
let purple = (200,0,200);;
(* RGB purple;; *) (* this errors out because the tuple is passed as a single argument *)


(* V1: tupled constructor: *)
type tupled = Tupled of (int * int);;
let of_tuple x = Tupled x;;
let to_tuple (Tupled x) = x;;

(* V2: untupled (multi-argument) constructor: the two point-free definitions below do NOT compile,
   because Untupled expects two separate arguments, not one tuple *)
type untupled = Untupled of int * int;;
(* let of_tuple' x = Untupled x;; *)
(* let to_tuple' (Untupled x) = x;; *)

(* V3: the working version for the untupled constructor: destructure the tuple and apply the arguments separately *)
let of_tuple'' (x,y) = Untupled (x,y);;
let to_tuple'' (Untupled (x,y)) = (x,y);;

Catch-All Cases & Refactoring

OCaml’s type system makes it easier to trace dependencies when refactoring: change a type and the compiler points at every place that needs updating.

Our objective should be to write our code in a way that maximizes the compiler’s chances of helping you find the bugs. To this end, a useful RULE OF THUMB is to avoid catch-all cases in pattern matches.

Problem: catch-all cases suppress exhaustiveness checks, i.e. they interfere with the compiler’s ability to tell us about unhandled cases.

(* suppose we wrote a function like so: *)
let oldschool_color_to_int = function
  | Basic (basic_color,weight) ->
    let base = match weight with Bold -> 8 | Regular -> 0 in
    base + basic_color_to_int basic_color
  | _ -> basic_color_to_int White;; (*uses catch-all case*)

(* suppose we change our color variant type *)
type color =
  | Basic of basic_color     (* basic colors *)
  | Bold  of basic_color     (* bold basic colors *) (*<-- this part is new*)
  | RGB   of int * int * int (* 6x6x6 color cube *)
  | Gray  of int             (* 24 grayscale levels *)

(* Then, because of the catch-all usage, we won't know about the missing case for Bold *)
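
(* For contrast, a minimal sketch (color_to_basic is a hypothetical helper) of what
   the compiler tells us once the catch-all is gone: *)
let color_to_basic = function
  | Basic bc -> Some bc
  | RGB _ | Gray _ -> None;;
(* the compiler reports something like:
   Warning 8: this pattern-matching is not exhaustive.
   Here is an example of a case that is not matched: Bold _ *)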

Combining Records and Variants

Algebraic Data Types are better thought of as the umbrella category covering both:

  • product types – mathematically similar to Cartesian Products

    • tuples
    • records: can be thought of as conjunctions (ANDs of multiple types within the record)
      
          module Time_ns = Core.Time_ns
          module Log_entry = struct
            type t =
              { session_id: string;
                time: Time_ns.t;
                important: bool;
                message: string;
              }
          end
          module Heartbeat = struct
            type t =
              { session_id: string;
                time: Time_ns.t;
                status_message: string;
              }
          end
          module Logon = struct
            type t =
              { session_id: string;
                time: Time_ns.t;
                user: string;
                credentials: string;
              }
          end
          ;;
      
  • sum types – similar to disjoint unions

    • variants: an OR of multiple types
      
          type client_message = | Logon of Logon.t
                                | Heartbeat of Heartbeat.t
                                | Log_entry of Log_entry.t
      
          (* consider this long-winded code (the session id extraction has repeated lines) *)
          let messages_for_user user messages =
            let (user_messages,_) =
              List.fold messages ~init:([], Set.empty (module String))
                ~f:(fun ((messages,user_sessions) as acc) message ->
                  match message with
                  | Logon m ->
                    if String.(m.user = user) then
                      (message::messages, Set.add user_sessions m.session_id)
                    else acc
                  | Heartbeat _ | Log_entry _ ->
                    let session_id = match message with
                      | Logon     m -> m.session_id
                      | Heartbeat m -> m.session_id
                      | Log_entry m -> m.session_id
                    in
                    if Set.mem user_sessions session_id then
                      (message::messages,user_sessions)
                    else acc
                )
            in
            List.rev user_messages;;
          (* we can improve this by defining the types better *)
      

So, the power comes from combining both these types – layered combinations. Here’s how we can refactor the message types:

(* the distinct message types should hold fields that are unique to them *)
module Log_entry = struct
  type t = { important: bool;
             message: string;
           }
end
module Heartbeat = struct
  type t = { status_message: string; }
end
module Logon = struct
  type t = { user: string;
             credentials: string;
           }
end

(* we can store a record that contains the fields that are common across all messages *)
module Common = struct
  type t = { session_id: string;
             time: Time_ns.t;
           }
end

(* which allows us to define a common variant type for details *)
type details =
  | Logon of Logon.t
  | Heartbeat of Heartbeat.t
  | Log_entry of Log_entry.t

(* Now, the same function is simpler because we don't need to handle the specific ways of extracting the session id: *)
let messages_for_user user (messages : (Common.t * details) list) =
  let (user_messages,_) =
    List.fold messages ~init:([],Set.empty (module String))
      ~f:(fun ((messages,user_sessions) as acc) ((common,details) as message) ->
        match details with
        | Logon m ->
          if String.(=) m.user user then
            (message::messages, Set.add user_sessions common.session_id)
          else acc
        | Heartbeat _ | Log_entry _ ->
          if Set.mem user_sessions common.session_id then
            (message::messages, user_sessions)
          else acc
      )
  in
  List.rev user_messages;;

(* this design is also useful for us to match on the message type directly *)
let handle_message server_state ((common:Common.t), details) =
  match details with
  | Log_entry m -> handle_log_entry server_state (common,m)
  | Logon     m -> handle_logon     server_state (common,m)
  | Heartbeat m -> handle_heartbeat server_state (common,m);;

Embedded Records

We could have put the record definitions directly inside the variant. These inline records are more concise, and also more efficient than referencing free-standing record types, because they don’t require a separate heap-allocated object for the contents of the variant.

type details =
  | Logon     of { user: string; credentials: string; }
  | Heartbeat of { status_message: string; }
  | Log_entry of { important: bool; message: string; }

(* inlining didn't change the existing function, we're matching on the tag names within *)
let messages_for_user user (messages : (Common.t * details) list) =
  let (user_messages,_) =
    List.fold messages ~init:([],Set.empty (module String))
      ~f:(fun ((messages,user_sessions) as acc) ((common,details) as message) ->
        match details with
        | Logon m ->
          if String.(=) m.user user then
            (message::messages, Set.add user_sessions common.session_id)
          else acc
        | Heartbeat _ | Log_entry _ ->
          if Set.mem user_sessions common.session_id then
            (message::messages, user_sessions)
          else acc
      )
  in
  List.rev user_messages;;

(* The inline record isn't a free-standing type, so we can't return it as a standalone value; the function below fails to compile *)

let get_logon_contents = function
  | Logon m -> Some m
  | _ -> None;;

Variants & Recursive Data Structures

Variants are commonly used to represent recursive data structures!

The following example illustrates how to define a simple boolean expression language; it’s similar in design to Core’s Blang (boolean language).

type 'a expr =
  | Base  of 'a (* this is the base predicate type; we use the polymorphic type 'a*)
  | Const of bool
  | And   of 'a expr list
  | Or    of 'a expr list
  | Not   of 'a expr

The tags here are easy to understand; the Base tag is the special one. It describes what can be tested, so it’s the base predicate. That’s why it’s described as the predicate that “ties the expr to your application”.

Let’s use the context of a mail processor, where we might wish to DEFINE filter expressions and then EVALUATE them. We might also wish to SIMPLIFY boolean expressions.

(* consider mail processor use-case *)
type mail_field = To | From | CC | Date | Subject
type mail_predicate = { field: mail_field;
                        contains: string } (* this is what we want our base predicate to be*)

(* DEFINE mail_predicate expr: *)
let test field contains = Base { field; contains };; (* mail_predicate is our base predicate; note the inferred type of test: *)
(* val test : mail_field -> string -> mail_predicate expr = <fun>*)

And [
        Or [ test To "doligez"; test CC "doligez" ];
        test Subject "runtime";
    ];;

(* EVALUATE mail_predicate expr:
 , this example just needs to be supplied a base_eval function, the rest of it is straightforward.
 *)
let rec eval expr base_eval =
  (* a shortcut, so we don't need to repeatedly pass [base_eval]
     explicitly to [eval]
     - this is a local helper, it aliases partially applied eval to the same base_eval
     *)
  let eval' expr = eval expr base_eval in
  match expr with
  | Base  base  -> base_eval base
  | Const bool  -> bool
  | And   exprs -> List.for_all exprs ~f:eval'
  | Or    exprs -> List.exists  exprs ~f:eval'
  | Not   expr  -> not (eval' expr);;

Carrying on to the SIMPLIFICATION routine: we need to define how each kind of expression can be simplified, via a few helper functions that the simplification routine relies on.

let and_ l =
  if List.exists l ~f:(function Const false -> true | _ -> false)
  then Const false (* reduces to Const false if any of the arms of And are false*)
  else
    match List.filter l ~f:(function Const true -> false | _ -> true) with (* drop the Const true arms *)
    | [] -> Const true (* no arms left ==> reduce to Const true *)
    | [ x ] -> x (* drop the And if it only has one arm *)
    | l -> And l (* otherwise keep the And *)

let or_ l =
  if List.exists l ~f:(function Const true -> true | _ -> false) then Const true
  else
    match List.filter l ~f:(function Const false -> false | _ -> true) with
    | [] -> Const false
    | [x] -> x
    | l -> Or l

(* for the not_, we shouldn't use catch-alls*)
let not_ = function
  | Const b -> Const (not b)
  | Not e -> e (* this is the double negation case, which simplifies away *)
  | (Base _ | And _ | Or _) as e -> Not e;;
  (* | e -> Not e (\* Don't use catch-alls*\) *)

let rec simplify = function
  | Base _ | Const _ as x -> x
  | And l -> and_ (List.map ~f:simplify l)
  | Or l  -> or_  (List.map ~f:simplify l)
  | Not e -> not_ (simplify e);;

How we might use it:

simplify (Not (And [ Or [ Base "it's snowing"; Const true ];
                     Base "it's raining" ]));;

Polymorphic Variants

Polymorphic variants are more flexible and syntactically more lightweight than ordinary variants, but there is a distinct cost to using them.

Interestingly, this variance maps onto OOP subtyping terminology roughly like so:

  Notation      Meaning                 Subtyping direction        Analogy
  [> ...]       “these tags or more”    supertype (covariant)      “open upwards”
  [< ...]       “these tags or less”    subtype (contravariant)    “open downwards”
  (no marker)   “exactly these tags”    invariant                  “closed, fixed”

KIV this until we come to the “Objects” part. I think it’s safe to treat this parallel to subtyping as just a mental model for now.

Here’s a covariant example:

let three = `Int 3;; (* `Int is the tag; its type gets inferred *)
let four = `Float 4.;;
let nan = `Not_a_number;;

(* we can create a list of these, whose inferred type collects the set of tags *)
[three; four; nan];;
(* the type for this is a covariant:
  - : [> `Float of float | `Int of int | `Not_a_number ] list = *)

(* we can still create a new value reusing a tag but with a different payload type *)
(* let five = `Int "five";; *)
(* but we can't mix it with values whose inferred payload types clash: *)
(* [three;four;five] (*this fails*)  *)
(* let foo = [four; nan; five];; (\*this won't fail*\) *)

Here’s a contravariant example:

let is_positive = function
  | `Int   x -> x > 0
  | `Float x -> Float.(x > 0.);;
(* this is a contravariant collection: so "these tags or less"
   this checks out because is_positive doesn't know how to deal with any other cases -- it can handle types with one or more of the cases identified here.

  val is_positive : [< `Float of float | `Int of int ] -> bool = <fun> *)

Here’s an invariant example:

(* Consistent types for polymorphic variant tags *)
let three = `Int 3;;
let four = `Float 4.;;
let nan = `Not_a_number;;

(* This list combines tags with consistent payloads *)
let valid_list = [three; four; nan];;

(* This will fail to compile due to incompatible payloads *)
(* let invalid_list = [three; `Int "five"];; *)

(* Proper function to handle known tags only *)
let is_positive = function
  | `Int x -> x > 0
  | `Float x -> Float.(x > 0.)
  | `Not_a_number -> false
;;

(* Usage with explicit type annotation for disambiguation *)
let exact = List.filter ~f:is_positive [three; four];;
(* this is the resultant type:
   val exact : [ `Float of float | `Int of int ] list = [`Int 3; `Float 4.] *)

Interestingly we can do both upper and lower bounding of the variant types:

let is_positive = function
  | `Int   x -> Ok (x > 0)
  | `Float x -> Ok Float.O.(x > 0.)
  | `Not_a_number -> Error "not a number";;

(* gives the following resultant val:
   val is_positive :
   [< `Float of float | `Int of int | `Not_a_number ] -> (bool, string) result = <fun> *)

Using polymorphic variants can get confusing because of the type that is actually inferred.

Polymorphic variants and Catch-all Cases

With ordinary variants, catch-all cases are already error-prone (because of the loss of exhaustiveness checks); with polymorphic variants, catch-alls are even worse.

A `_` catch-all leaves the argument type open (a lower-bounded `[> ...]` type), so the function accepts any tag at all. This becomes a problem if we have typos:

let is_positive_permissive = function
  | `Int   x -> Ok Int.(x > 0)
  | `Float x -> Ok Float.(x > 0.)
  | _ -> Error "Unknown number type";;


is_positive_permissive (`Ratio (3,4));;

With the typo (`Float -> `Floot), the value silently falls into the catch-all at runtime instead of the typo being caught at compile time (as it would have been with ordinary variants, where an unknown constructor is a compile error).

is_positive_permissive (`Floot 3.5);;

RULE OF THUMB: As a general matter, one should be wary about mixing catch-all cases and polymorphic variants.

Use case of using polymorphic variants: Terminal Colors Redux

There’s still practical value to using polymorphic variants. Suppose we want an alpha-channel extension to our colors: defining a new ordinary variant type with an extra RGBA arm isn’t good enough, because the compiler would see the two variants (old and extended) as two different, incompatible variant types.

What we need is to share tags between two different variant types and this is what polymorphic variants are useful for.

let basic_color_to_int = function
  | `Black -> 0 | `Red     -> 1 | `Green -> 2 | `Yellow -> 3
  | `Blue  -> 4 | `Magenta -> 5 | `Cyan  -> 6 | `White  -> 7;;

let color_to_int = function
  | `Basic (basic_color,weight) ->
    let base = match weight with `Bold -> 8 | `Regular -> 0 in
    base + basic_color_to_int basic_color
  | `RGB (r,g,b) -> 16 + b + g * 6 + r * 36
  | `Gray i -> 232 + i;;

let extended_color_to_int = function
  | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
  | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color;;

NOTE: extended_color_to_int needs to invoke color_to_int on a narrower type, so the argument must be narrowed before the call. The explicit or-pattern in the match is what does the narrowing: it guarantees that `RGBA never reaches color_to_int. With a catch-all branch instead, no narrowing would happen, and compilation would fail.
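
For reference, here is the catch-all version that the note warns about; as described above, it fails to compile:

let extended_color_to_int = function
  | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
  | color -> color_to_int color;;
(* rejected: the compiler complains that [color] may still include `RGBA,
   which color_to_int does not accept *)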

Packaging the code up

The interface:

open Base

type basic_color =
  [ `Black   | `Blue | `Cyan  | `Green
  | `Magenta | `Red  | `White | `Yellow ]

type color =
  [ `Basic of basic_color * [ `Bold | `Regular ]
  | `Gray of int
  | `RGB  of int * int * int ]

type extended_color =
  [ color
  | `RGBA of int * int * int * int ]

val color_to_int          : color -> int
val extended_color_to_int : extended_color -> int

The implementation:

open Base

type basic_color =
  [ `Black   | `Blue | `Cyan  | `Green
  | `Magenta | `Red  | `White | `Yellow ]

type color =
  [ `Basic of basic_color * [ `Bold | `Regular ]
  | `Gray of int
  | `RGB  of int * int * int ]

type extended_color =
  [ color
  | `RGBA of int * int * int * int ]

let basic_color_to_int = function
  | `Black -> 0 | `Red     -> 1 | `Green -> 2 | `Yellow -> 3
  | `Blue  -> 4 | `Magenta -> 5 | `Cyan  -> 6 | `White  -> 7

let color_to_int = function
  | `Basic (basic_color,weight) ->
    let base = match weight with `Bold -> 8 | `Regular -> 0 in
    base + basic_color_to_int basic_color
  | `RGB (r,g,b) -> 16 + b + g * 6 + r * 36
  | `Gray i -> 232 + i

let extended_color_to_int = function
  | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
  | `Grey x -> 2000 + x (* <<<<<<<< didactic error <<<<*)
  | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color

Some notes about this:

  1. the .mli and .ml use exact (closed) variant types only; this is what allows the pattern matching to be checked precisely

  2. Consider the case where we make a typo and accidentally add a new polymorphic variant case, as in the `Grey error above

    • why won’t it error out?

      All that happened was that the compiler inferred a wider type for extended_color_to_int, which happens to be compatible with the narrower type that was listed in the mli. As a result, this library builds without error.

    • how can we prevent this?

      Adding an explicit type annotation in the code itself helps, because then the compiler knows the intended type and can flag the stray case.

              (* this will NOT error out: *)
              let extended_color_to_int = function
                | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
                | `Grey x -> 2000 + x
                | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color
      
              (* this will error out [which is what we want]: *)
              let extended_color_to_int : extended_color -> int = function
                | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
                | `Grey x -> 2000 + x
                | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color
      
  3. given that we have the type definitions, we can use the name of a type explicitly for the narrowing in a pattern, by prefixing it with #

    It is useful when you want to narrow down to a type whose definition is long, and you don’t want the verbosity of writing the tags down explicitly in the match.

       (* terse version *)
       let extended_color_to_int : extended_color -> int = function
         | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
         | #color as color -> color_to_int color
    
       (* this works too *)
       let extended_color_to_int : extended_color -> int = function
         | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
         | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color
    

When to Use Polymorphic Variants

  • RULE OF THUMB: in general, regular variants are more pragmatic
  • Safe use case:
    • Probably the safest and most common use case for polymorphic variants is where ordinary variants would be sufficient but are syntactically too heavyweight. For example, you often want to create a variant type for encoding the inputs or outputs to a function, where it’s not worth declaring a separate type for it. Polymorphic variants are very useful here, and as long as there are type annotations that constrain these to have explicit, exact types, this tends to work well (see the small sketch after this list).
  • Costs of using polymorphic variants:
    1. complexity

      confusing type inference may make it harder to read the code / debug this. Error messages may get really long as well.

      Indeed, concision at the value level is often balanced out by more verbosity at the type level.

    2. error-finding

      the code is still type-safe, but the typing discipline is looser, so some mistakes (like the catch-all typo above) only surface at runtime.

    3. efficiency

      it’s a little less efficient

      polymorphic variants are somewhat heavier than regular variants, and OCaml can’t generate code for matching on polymorphic variants that is quite as efficient as what it generated for regular variants.
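
A small sketch of the safe pattern described above: a lightweight polymorphic variant argument pinned down by an explicit, exact annotation (render and its tags are made up for illustration):

let render ~(mode : [ `Compact | `Verbose ]) text =
  match mode with
  | `Compact -> text
  | `Verbose -> "[verbose] " ^ text;;

render ~mode:`Compact "hello";;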

Chapter 7: Error Handling

OCaml’s support for handling errors minimises the pain of doing it. This chapter is about designing interfaces that make error handling easier.

Error-Aware Return Types

Error-handling functions are useful because they let us express error handling explicitly and concisely.

We can get some of these functions from the Option module and a bunch of others from Result and Or_error modules. The important part is to be sensitive to some of the idioms that exist because of some common patterns that show up.

  • Approach 1: Our called function may return errors explicitly

    Including errors in the return values of your functions requires the caller to handle the error explicitly, allowing the caller to make the choice of whether to recover from the error or propagate it onward.

    Whether something should be wrapped as an error is context dependent: it is not always clear what counts as an Error versus a valid outcome, so a general-purpose library may not know this in advance.

    Typical examples include wrapping the answer in optionals and such.

      let compute_bounds ~compare list =
        let sorted = List.sort ~compare list in
        match List.hd sorted, List.last sorted with
        | None,_ | _, None -> None (* "error case" propagates the None out*)
        | Some x, Some y -> Some (x, y)
    
      let find_mismatches table1 table2 =
        Hashtbl.fold table1 ~init:[] ~f:(fun ~key ~data mismatches ->
          match Hashtbl.find table2 key with
          | Some data' when data' <> data -> key :: mismatches
          | _ -> mismatches (* there's no "error" propagation and that's correct because of this case's semantic meaning*)
        );;
    
  • Approach 2: encoding errors with result

Encoding Errors with Result

Wrapping outcomes in options is non-specific: a None tells us nothing about the nature of the error. Result.t is for carrying that information (it can be seen as an augmented option).

Its constructors Ok and Error are available at the top level without needing to open the Result module.
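
A minimal sketch of returning a Result.t directly (safe_div is a made-up helper):

let safe_div x y : (int, string) Result.t =
  if y = 0 then Error "division by zero" else Ok (x / y);;

safe_div 6 3;; (* Ok 2 *)
safe_div 6 0;; (* Error "division by zero" *)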

Error and Or_error

When using Result.t, we care about both success cases and error cases. As for the error type, we should standardise on one for consistency’s sake. Some reasons to do so:

  • easier to write utility functions to automate common error-handling patterns

    this point resonates with the choice of doing Error subclassing for specificity from the OOP world.

Or_error.t is just Result.t with the error case specialized to Error.t type. The Or_error module is useful for such common error-handling patterns:

  • Or_error.try_with: run a thunk, capturing any exception it raises as an Error.t
    
      let float_of_string s =
        Or_error.try_with (fun () -> Float.of_string s);;
      (* which gives the val as:
         val float_of_string : string -> float Or_error.t = <fun> *)
    
      float_of_string "3.34";; (*returns: Base__.Result.Ok 3.34
      *)
    
      float_of_string "a.bc";; (*error case, returns Base__.Result.Error (Invalid_argument "Float.of_string a.bc")*)
    
  • idiom: using s-expressions to create Error

    This point is about how we can represent / create Error from sexps. A common idiom is to use %message syntax-extension and pass in further values as s-expressions.

    An s-expression is a balanced parenthetical expression where the leaves of the expressions are strings. They’re (via the Sexplib library) good for common serialisation use-cases also. Sexplib comes with a syntax extension that can autogenerate sexp converters for specific types.

    #require "ppx_jane";;
    Error.t_of_sexp
      [%sexp ("List is too long",[1;2;3] : string * int list)];;
    
    (* -- generates: --*)
    (* - : Error.t = ("List is too long" (1 2 3)) *)
    

    We can tag errors as well. Example of tagging errors:

    Error.tag
      (Error.of_list [ Error.of_string "Your tires were slashed";
                       Error.of_string "Your windshield was smashed" ])
      ~tag:"over the weekend";;
    
    (* This will give us:
       - : Error.t =
        ("over the weekend" "Your tires were slashed" "Your windshield was smashed")
    *)
    

    There’s a %message syntax extension that we can use:

    let a = "foo" and b = ("foo",[3;4]);;
    Or_error.error_s
      [%message "Something went wrong" (a:string) (b: string * int list)];;
    
    (* results in this type:
    Base__.Result.Error ("Something went wrong" (a foo) (b (foo (3 4))))
    *)
    

bind and other Error handling Idioms

Just like option-wrapping and results, there are other patterns we start to observe, and these have been codified as the following idioms.

  • Idiom: bind function (safe stage-wise chain of operations)

    bind only applies the function if the value is Some, and it can be used as a function or as an infix operator. We can get the infix operator from Option.Monad_infix.

    The usefulness of this is that bind can be used as a way of sequencing together error-producing functions so that the first one to produce an error terminates the computation. For large, complex examples with many stages of error handling, the bind idiom becomes clearer and easier to manage.

    (* this example is a small-scale example *)
    let compute_bounds ~compare list =
      let open Option.Monad_infix in
      let sorted = List.sort ~compare list in
      List.hd sorted   >>= fun first ->
      List.last sorted >>= fun last  ->
      Some (first,last);;
    (* generated val:
       val compute_bounds : compare:('a -> 'a -> int) -> 'a list -> ('a * 'a) option =
       <fun>
     *)
    
    (* manually writing it out *)
    let compute_bounds ~compare list =
      let sorted = List.sort ~compare list in
      Option.bind (List.hd sorted) ~f:(fun first ->
          Option.bind (List.last sorted) ~f:(fun last ->
              Some (first,last)));;
    (*
      val compute_bounds : compare:('a -> 'a -> int) -> 'a list -> ('a * 'a) option =
      <fun>
     *)
    
    
    (* NOTE: here's the way bind is implemented *)
    let bind option ~f =
      match option with
      | None -> None
      | Some x -> f x;;
    (* val bind : 'a option -> f:('a -> 'b option) -> 'b option = <fun> *)
    

    Perhaps a good analogy for bind is like chaining with JavaScript’s optional chaining (?.) or Promise .then(...), but specialized for computations that may fail (None).

  • Idiom: syntax extension: Monads and Let_syntax

    With the Let_syntax extension, the monadic binding we saw with bind can be written like a regular let-binding. So the mental model here is a special form of let-binding with built-in error-handling semantics.

    #require "ppx_let";; (*<--- need to enable the extension*)
    let compute_bounds ~compare list =
      let open Option.Let_syntax in
      let sorted = List.sort ~compare list in
      let%bind first = List.hd sorted in
      let%bind last  = List.last sorted in
      Some (first,last);;
    (*
    
    val compute_bounds : compare:('a -> 'a -> int) -> 'a list -> ('a * 'a) option =
      <fun>
    
      *)
    
  • Idiom: Option.both

    Takes two optional values and produces a new optional pair, which is None if either of the optional values is None.

    let compute_bounds ~compare list =
      let sorted = List.sort ~compare list in
      Option.both (List.hd sorted) (List.last sorted);;
    (*
    
    val compute_bounds : compare:('a -> 'a -> int) -> 'a list -> ('a * 'a) option =
      <fun>
    
      *)
    

Exceptions

Exceptions are a way to terminate a computation and report an error, while providing a mechanism to catch and handle (and possibly recover from) exceptions that are raised by subcomputations.

They are ordinary values, so we can treat them like any other values (and define our own exceptions). They are all of the same exn type.

Exceptions are like variant types but they are special because they’re open (can be defined in multiple places) – new tags (new exceptions) can be added to it by different parts of the program. \(\implies\) we can’t exhaustive match on an exn because the universe of tags is not closed.

In contrast, variants have a closed universe of available tags.
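
A small sketch of that openness (Too_big and Empty_input are made-up exceptions):

exception Too_big of int
exception Empty_input

let describe (e : exn) =
  match e with
  | Too_big n -> "too big: " ^ Int.to_string n
  | Empty_input -> "empty input"
  | _ -> "some other exception" (* a catch-all is required: exn is open *);;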

exception Key_not_found of string;;

let rec find_exn alist key = match alist with
  | [] -> raise (Key_not_found key)
  | (key',data) :: tl -> if String.(=) key key' then data else find_exn tl key;;

(*
    val find_exn : (string * 'a) list -> string -> 'a = <fun>
*)

NOTE: the return type of raise is special: it’s a polymorphic 'a, so a raise expression fits whatever context surrounds it. This is what allows raise to be called from anywhere in the program.

It never really returns anyway. Similar behaviour exists in functions like infinite loops.

raise;;
(* - : exn -> 'a = <fun> *)
let rec forever () = forever ();;
(* val forever : unit -> 'a = <fun> *)

Declaring Exceptions using [@@deriving sexp]

Plain exception definitions don’t give us a useful printed representation. We can declare an exception, and the types it depends on, with the [@@deriving sexp] preprocessor annotation, which generates sexp converters for us. The generated representation includes the full module path of the module where the exception in question is defined.

PL-DESIGN: there’s a whole PPX (preprocessor) pipeline to AST evaluation that is done within compilers. This is something that I hadn’t looked into deeply before. Such metaprogramming constructs allow us to extend the language safely. A short description / primer on this here – there should be more generic PL concepts and writeups for this that are language-agnostic (perhaps this book).

type 'a bounds = { lower: 'a; upper: 'a } [@@deriving sexp]
(*types:
type 'a bounds = { lower : 'a; upper : 'a; }
val bounds_of_sexp : (Sexp.t -> 'a) -> Sexp.t -> 'a bounds = <fun>
val sexp_of_bounds : ('a -> Sexp.t) -> 'a bounds -> Sexp.t = <fun>
 *)

exception Crossed_bounds of int bounds [@@deriving sexp];;
(* types:
   exception Crossed_bounds of int bounds
 *)

Crossed_bounds { lower=10; upper=0 };;
(* types:
   - : exn = (//toplevel//.Crossed_bounds ((lower 10) (upper 0))) *)

Helper functions for throwing exceptions

These are some ergonomic aspects, many of them are within Common and Exn within Base.

Some examples:

  1. using failwith to throw an exception; it raises Failure (see the small sketch after this list)
  2. using assert for invariant checks.
    • we can use assert with an arbitrary condition for failure cases.

      assert is useful because it captures line number and char offset from the source, so it’s informative.

           let merge_lists xs ys ~f =
             if List.length xs <> List.length ys then None
             else
               let rec loop xs ys =
                 match xs,ys with
                 | [],[] -> []
                 | x::xs, y::ys -> f x y :: loop xs ys
                 | _ -> assert false
               in
               Some (loop xs ys);;
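
A tiny sketch of the failwith case from item 1 (require_positive is a made-up helper):

let require_positive x =
  if x <= 0 then failwith "expected a positive number" else x;;
(* failwith msg is shorthand for raise (Failure msg) *)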
      

Exception Handlers

We wish to handle exceptions that are thrown (and propagated).

The general syntax looks like this:

try <expr> with (* expr is main thing to try *)
| <pat1> -> <expr1> (* match cases only on the exception, if any thrown*)
| <pat2> -> <expr2>
...

Cleaning Up in the Presence of Exceptions

As in other languages, we want finally-style cleanup around code that may raise. This is NOT a built-in primitive; we rely on libraries for it.

The idiom here is similar to context managers (e.g. Python’s with context manager).

Exn.protect from Base is a useful function for this:

  1. ~f: the [body] thunk, i.e. the function for the main body
  2. ~finally: the [cleanup] thunk, run afterwards for the finally logic

This functionality is common enough for file or IO handling that there’s also an In_channel.with_file to manage closing of the file for us.

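(* [parse_line] is assumed to be defined elsewhere (string -> float list), as in the book's running example *)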
let load filename =
  let inc = In_channel.create filename in
  Exn.protect
    ~f:(fun () -> In_channel.input_lines inc |> List.map ~f:parse_line)
    ~finally:(fun () -> In_channel.close inc);;

let load filename =
  In_channel.with_file filename ~f:(fun inc ->
    In_channel.input_lines inc |> List.map ~f:parse_line);;

(* for both the generated val is:
   val load : string -> float list list = <fun>
 *)

Catching Specific Exceptions

Consider the case where we wish to catch a specific error from a specific function call (the error type may be the same as from other places, the intent is to handle one particular error from one particular place).

The type system doesn’t tell you what exceptions a given function might throw. For this reason, it’s usually best to avoid relying on the identity of the exception to determine the nature of a failure. A better approach is to narrow the scope of the exception handler, so that when it fires it’s very clear what part of the code failed:

(* demonstrative example of the problem: what happens if compute_weight fails because of Key_not_found (instead of find_exn)  -- it will just silently get handled by the exception handler and return 0.*)
let lookup_weight ~compute_weight alist key =
  try
    let data = find_exn alist key in
    compute_weight data
  with
  Key_not_found _ -> 0.;;

(* VERBOSE VERSION: here, the scope of our try is only around the find_exn function *)
let lookup_weight ~compute_weight alist key =
  match
    try Some (find_exn alist key)
    with _ -> None
  with
  | None -> 0.
  | Some data -> compute_weight data;;

(* and this is the type
val lookup_weight :
  compute_weight:('a -> float) -> (string * 'a) list -> string -> float =
  <fun>
*)

This pattern is common enough that there’s a more concise way to write it:

let lookup_weight ~compute_weight alist key =
  match find_exn alist key with
  | exception _ -> 0. (* this marks the exception-handling case for the invocation of find_exn *)
  | data -> compute_weight data;;

(* the longer, uglier version: *)
let lookup_weight ~compute_weight alist key =
  match
    try Some (find_exn alist key)
    with _ -> None
  with
  | None -> 0.
  | Some data -> compute_weight data;;
(* and this is the type
val lookup_weight :
  compute_weight:('a -> float) -> (string * 'a) list -> string -> float =
  <fun>
*)

Backtraces

We want useful debugging information, and backtraces give us that. Uncaught exceptions already print a backtrace. To capture a backtrace from within the program, Backtrace.Exn.most_recent gives us the backtrace of the most recently raised exception, which is useful for error-reporting purposes.
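
A minimal sketch of capturing it for error reporting (log_and_swallow is a hypothetical helper; assumes Core and Stdio are available):

let log_and_swallow f =
  try f () with
  | exn ->
    let bt = Backtrace.Exn.most_recent () in
    Stdio.eprintf "Uncaught exception: %s\n%s\n"
      (Exn.to_string exn) (Backtrace.to_string bt);;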

RULE OF THUMB: using exceptions as part of your flow control is not a common pattern; if you do it on a performance-sensitive path, it’s generally better to use raise_notrace. Backtraces should almost always be left on (the responsibility is on the code to keep things performant).

Backtraces affect speed.

Backtraces can be controlled (and turned off) in a few ways:

  1. OCAMLRUNPARAM env variable @ runtime

  2. the program needs to be compiled with debugging symbols

    this part is related to the bytecode vs native code compilations (wherein bytecode allows debugging symbols to be preserved and makes it easier to debug).

  3. Backtrace.Exn.set_recording false.

open Core
open Core_bench

exception Exit

let x = 0

type how_to_end = Ordinary | Raise | Raise_no_backtrace

let computation how_to_end =
  let x = 10 in
  let y = 40 in
  let _z = x + (y * y) in
  match how_to_end with
  | Ordinary -> ()
  | Raise -> raise Exit
  | Raise_no_backtrace -> raise_notrace Exit

let computation_with_handler how = try computation how with Exit -> ()

let () =
  [
    Bench.Test.create ~name:"simple computation" (fun () ->
        computation Ordinary);
    Bench.Test.create ~name:"computation w/handler" (fun () ->
        computation_with_handler Ordinary);
    Bench.Test.create ~name:"end with exn" (fun () ->
        computation_with_handler Raise);
    Bench.Test.create ~name:"end with exn notrace" (fun () ->
        computation_with_handler Raise_no_backtrace);
  ]
  |> Bench.make_command
  |> Command_unix.run

Here’s an example of the benchmark:

Name                    Time/Run   Cycls/Run
 ----------------------- ---------- -----------
  simple computation        1.84ns       3.66c
  computation w/handler     3.13ns       6.23c
  end with exn             27.96ns      55.69c
  end with exn notrace     11.69ns      23.28c

Observations:

  1. setting up an exception handler is cheap, even if it’s never triggered
  2. actually raising and catching an exception is expensive (about 56 cycles here)
  3. raising an exception without a backtrace (raise_notrace) costs less than half of that, about 23 cycles

Exceptions to Error-Aware Types & Back Again

Often, we’ll need to move in between using exceptions and using error-aware types and there’s some support to help this process.

  • [ Option ] given code that may throw an exception, we can capture it within an option using try_with

  • [ Result, Or_error ] similar try_with functions

    • we can also re-raise the exception using ok_exn
(**********)
(* OPTION *)
(**********)
let find alist key =
  Option.try_with (fun () -> find_exn alist key);;
(* val find : (string * 'a) list -> string -> 'a option = <fun> *)

(************)
(* OR_ERROR *)
(************)
let find alist key =
  Or_error.try_with (fun () -> find_exn alist key);;
(* val find : (string * 'a) list -> string -> 'a Or_error.t = <fun> *)

(**************)
(* Re-raising *)
(**************)
(* the exception may be re-raised: *)
Or_error.ok_exn (find ["a",1; "b",2] "c");;
(* Exception: Key_not_found("c"). *)

Choosing Error-Handling Strategy

When thinking about exceptions vs error-aware return types, the tradeoff is between concision vs explicitness.

  • Exceptions \(\implies\) better for speed of implementation
    • pro: more concise, can defer the error handling job to a larger scope and don’t clutter up types
    • con: easy to ignore
  • Error-Aware types \(\implies\) better for stability
    • pro: fully manifest in type definitions, so errors generated are explicit and impossible to ignore
    • for errors that are a foreseeable and ordinary part of the execution of your production code and that are not omnipresent, error-aware return types are typically the right solution.

use exceptions for exceptional conditions

RULE OF THUMB: The maxim of “use exceptions for exceptional conditions” applies. If an error occurs sufficiently rarely, then throwing an exception is often the right behavior.

Omnipresent errors (OOM)

it’s overkill to use error-aware return types for these: it would be too tedious, and needing to mark everything would also make it less clear what the actual problem was.

Chapter 8: Imperative Programming

The book distinguishes functional from imperative code by emphasising that imperative programming, with its modification of a program’s internal state, is how a program interacts with the changeable parts of the world.

Additionally, there’s an argument that some algorithms are most naturally implemented in an imperative fashion, hence a performance benefit to imperative code; examples:

  1. in-place sorting algos
  2. graph algos that rely on mutable data structures
  3. dynamic programming algos
  4. numerical linear algebra
  5. real-time, embedded system algorithms
  6. low-level system programming

This chapter deals with OCaml’s support for the imperative programming paradigm.

Toy Example: imperative dictionaries

This is just a toy example, not for actual use.

This is an open hashing scheme: the hash table is an array of buckets, each bucket containing a list of key/value pairs that have been hashed into that bucket.

Interface:

open Base

(* dictionary with keys of type 'a and data of type 'b *)
type ('a, 'b) t

val create
  :  hash:('a -> int)
  -> equal:('a -> 'a -> bool)
  -> ('a, 'b) t

val length : ('a, 'b) t -> int
val add : ('a, 'b) t -> key:'a -> data:'b -> unit (* the imperative functions typically return unit as a way of signalling side-effects *)
val find : ('a, 'b) t -> 'a -> 'b option
val iter : ('a, 'b) t -> f:(key:'a -> data:'b -> unit) -> unit
val remove : ('a, 'b) t -> 'a -> unit

Implementation:

open Base

(* 1: define the dict as a record *)
type ('a, 'b) t =
  { mutable length : int
  ; buckets : ('a * 'b) list array
  ; hash : 'a -> int (* hashing function kept within the record itself *)
  ; equal : 'a -> 'a -> bool (* equality function kept within the record itself *)
  }

(* 2: basic functions to manipulate a dictionary  *)
let num_buckets = 17 (* bucket array is fixed length because of this const *)
(** Chooses the position in the array that a key should be stored at*)
let hash_bucket t key = t.hash key % num_buckets
(* this is the constructor; it binds the hashing and equality functions *)
let create ~hash ~equal =
  { length = 0
  ; buckets = Array.create ~len:num_buckets []
  ; hash
  ; equal
  }

let length t = t.length
let find t key =
  List.find_map (* returns the first Some or None if all are None in the array*)
    t.buckets.(hash_bucket t key) (* imperative syntax: reading value *)
    ~f:(fun (key', data) ->
      if t.equal key' key then Some data else None)

(* 3: iter implementation -- works by side-effect, walks through values in a given bucket *)
let iter t ~f =
  for i = 0 to Array.length t.buckets - 1 do
    List.iter t.buckets.(i) ~f:(fun (key, data) -> f ~key ~data)
  done
  (* since this works by side-effect, it returns unit *)


(* 4: handle the adding and removing of mappings from the dictionary *)
let bucket_has_key t i key =
  List.exists t.buckets.(i) ~f:(fun (key', _) -> t.equal key' key)

let add t ~key ~data =
  let i = hash_bucket t key in
  let replace = bucket_has_key t i key in
  let filtered_bucket =
    if replace
    then
      List.filter t.buckets.(i) ~f:(fun (key', _) ->
          not (t.equal key' key))
    else t.buckets.(i)
  in
  t.buckets.(i) <- (key, data) :: filtered_bucket; (* note the sequence of operations, which is controlled by the use of ; *)
  if not replace then t.length <- t.length + 1 (* mutable update on the length if it's a new addition (vs a replacement) *)

let remove t key =
  let i = hash_bucket t key in
  if bucket_has_key t i key
  then (
    let filtered_bucket =
      List.filter t.buckets.(i) ~f:(fun (key', _) ->
          not (t.equal key' key))
    in
    t.buckets.(i) <- filtered_bucket;
    t.length <- t.length - 1)

Some useful notes:

  1. for loops are suitable for imperative contexts; we should prefer them there even if recursive ways to express the same thing exist.
  2. some mutable operators shown in the example:
    • ; sequencing operator, helps us control the sequence of imperative actions. let-bindings would have worked too but for imperative code, this is more idiomatic

           <expr1>;
           <expr2>;
           ...
           <exprN>
      
             (* ANDD  *)
      
           let () = <expr1> in
           let () = <expr2> in
           ...
           <exprN>
      

      are equivalent!

    • <- mutable update operator. works for:

      • elements of an array (array.(i) <- expr)
      • updating record field (record.field <- expression)
  3. RULE OF THUMB: it’s good practice to leave all the side-effecting operations to the end of the function – minimises the chance of exceptions leaving corrupted state

Primitive Mutable Data

A walkthrough of the mutable primitives in OCaml

Array-like

Array-like means mutable integer-indexed containers that provide constant-time access to their elements.

  • Ordinary Arrays

    General array, we can create it using a literal syntax [|1;2;3|]

    Some usual operations:

    • reading/retrieving a value: use the dot syntax, array.(i)

    • setting a value: use the dot syntax plus the assignment operator, array.(i) <- expr

    • blit: this a sliced copy which allows us to copy over a subarray to another subarray

      It’s closer in functionality to C’s memcpy or memmove than to Python’s sliced copy: out-of-bounds ranges raise exceptions if we’re not careful about array boundaries, and the destination range simply gets overwritten (overlapping source and destination ranges are handled correctly), so be careful about that.

      NOTE: the signature of this function is a little odd compared to other languages like Python. The Stdlib version takes positional arguments, Array.blit source source_start destination destination_start length, while Base’s Array.blit uses labelled arguments (~src ~src_pos ~dst ~dst_pos ~len).

        (* Create arrays *)
        let src = [|1; 2; 3; 4; 5|];;
        let dst = [|0; 0; 0; 0; 0|];;
      
        (* Copy 3 elements from src (starting at index 1) to dst (starting at index 2); this uses the positional Stdlib.Array.blit form *)
        Array.blit src 1 dst 2 3;;
      
        (* dst now becomes *)
        dst;;
      
  • Bytes and Strings

    To represent a character, a single byte (an 8-bit char) is all we need.

    Bytes and Strings are very similar.

    let b = Bytes.of_string "foobar";;
    (* val b : bytes = "foobar" *)
    Bytes.set b 0 (Char.uppercase (Bytes.get b 0));;
    (* - : unit = () *)
    Bytes.to_string b;;
    (* - : string = "Foobar" *)
    
    • Bytes: can be seen as a compact char sequence
      • one byte per character (a plain char array would use a whole word per element)
      • is mutable
      • so it's both space-efficient and updatable in place
    • Strings:
      • also one byte per character, so just as compact as bytes
      • is immutable
  • Bigarrays

    for handling memory blocks outside of OCaml’s heap; this allows interoperability with other languages and libraries. KIV until the memory representation part.

    Generalised syntax for this:

    <bigarray_expr>.{<index_expr>}
    <bigarray_expr>.{<index_expr>} <- <value_expr>
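
    A hedged sketch of that syntax in use (assumes the bigarray library is linked; a.{i} desugars to Bigarray.Array1.get/set):

    let a = Bigarray.Array1.create Bigarray.float64 Bigarray.c_layout 3;;
    a.{0} <- 1.5;;
    a.{0};;
    (* - : float = 1.5 *)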
    

Mutable Record and Object Fields and Ref Cells

For mutable record fields, it’s something we’ve seen before. Take note that mutability is declared per field: the record type itself is ordinary, but fields marked mutable may be updated in place.

Objects are similar KIV until Chapter 12.

  • Ref cells

    like we’ve seen before, we might want an accumulator value whose state we keep changing, and ref is useful for that.

    Under the hood, it’s just a record with a single mutable field (contents) and it has some syntactic sugar that makes it easier to work with refs.
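
    A sketch of what that desugars to (mirrors the description above; it shadows the built-in names, so it is for illustration only):

    type 'a ref = { mutable contents : 'a }
    let ref x = { contents = x }
    let (!) r = r.contents
    let (:=) r x = r.contents <- x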

    (* Create/Initialise *)
    let x = ref 1;;
    (* val x : int ref = {Base.Ref.contents = 1} *)
    
    (* Access!!! *)
    !x;;
    (* - : int = 1 *)
    
    (* Assign!!! *)
    x := !x + 1;;
    (* - : unit = () *)
    
    !x;;
    (* - : int = 2 *)
    

Foreign Functions

Foreign functions allow OCaml to use imperative functionality provided by syscalls or external libraries (e.g. the write syscall, clock).

KIV chapter 22.

For and while loops

Both for and while loops aren’t really necessary because we could have otherwise written out a recursive version. It’s more idiomatic to use them in situations where the code is imperative.

for loops:

  • the bounds are both inclusive, so the loop covers the lower bound up to and including the upper bound
  • we can iterate in the reverse direction by using downto
open Stdio;;
for i = 0 to 3 do printf "i = %d\n" i done;;
(* i = 0 *)
(* i = 1 *)
(* i = 2 *)
(* i = 3 *)
(* - : unit = () *)

for i = 3 downto 0 do printf "i = %d\n" i done;;
(* i = 3 *)
(* i = 2 *)
(* i = 1 *)
(* i = 0 *)
(* - : unit = () *)

while loop:

  • pretty much the same order of evaluation of expressions as other languages
let rev_inplace ar =
  let i = ref 0 in
  let j = ref (Array.length ar - 1) in
  (* terminate when the upper and lower indices meet *)
  while !i < !j do
    (* swap the two elements *)
    let tmp = ar.(!i) in
    ar.(!i) <- ar.(!j);
    ar.(!j) <- tmp;
    (* bump the indices *)
    Int.incr i; (* <-- this is the builtin idiomatic way of incrementing an int ref *)
    Int.decr j (* likewise, for decrementing int ref *)
  done;;
(* val rev_inplace : 'a array -> unit = <fun> *)
let nums = [|1;2;3;4;5|];;
(* val nums : int array = [|1; 2; 3; 4; 5|] *)
rev_inplace nums;;
(* - : unit = () *)
nums;;
(* - : int array = [|5; 4; 3; 2; 1|] *)

Example: Doubly-linked lists

Yet again, this is just a didactic example; for actual use, reach for the Doubly_linked module in Core.

Some notes on the interface / implementation:

  1. Elements:

    • Elements act as pointers to the interior of a list and allow us to navigate the list and give us a point at which to apply mutating operations.
    • We implement them as records. Records will have some optional mutable fields: at the start of the list, prev will be None, and at the end of the list, next will be None
  2. in the element-wise modification functions, notice how the match is surrounded with parens. This is because the precedence of match is very low, so we need to separate that expression from the assignment that follows.

    Without this separation, the final assignment expression would have become part of the None case.

    An alternative is to use begin...end.

  3. the implementation of remove shows why imperative code is tricky and sometimes can be fragile.

    • in remove, if we are removing the first element in the list, then we update the list pointer itself.

    • in the actual implementations, the edge cases are handled by error detection and error correction logic.

    • example of problematic actions:

      • double-removing an element will cause the main list reference to be set to None, thus emptying the list.
      • Similar problems arise from removing an element from a list it doesn’t belong to.

      RULE OF THUMB: for imperative data-structures, use the libraries as much as possible and if we don’t have one and need to implement it, give extra attention to the error handling.

Here’s the interface

open Base

(* 1: there are two types defined here *)
type 'a t (* type of the list *)
type 'a element (* type of an element within the list *)

(** Basic list operations *)

val create : unit -> 'a t
val is_empty : 'a t -> bool

(** Navigation using [element]s *)

val first : 'a t -> 'a element option
val next : 'a element -> 'a element option
val prev : 'a element -> 'a element option
val value : 'a element -> 'a

(** Whole-data-structure iteration *)

val iter : 'a t -> f:('a -> unit) -> unit
val find_el : 'a t -> f:('a -> bool) -> 'a element option

(** Mutation *)

val insert_first : 'a t -> 'a -> 'a element
val insert_after : 'a element -> 'a -> 'a element
val remove : 'a t -> 'a element -> unit

Here’s our implementation:

open Base
(* 1: we define the types first:  *)

(* an ='a element= is a record *)
type 'a element =
  { value : 'a (* const value *)
  ; mutable next : 'a element option (* mutable  *)
  ; mutable prev : 'a element option
  }

(* the type of the list is a mutable reference to an optional element (the first element) *)
type 'a t = 'a element option ref

(* 2: We can implement some basic functions on the list and on elements *)
let create () = ref None
let is_empty t = Option.is_none !t
let value elt = elt.value
let first t = !t
let next elt = elt.next
let prev elt = elt.prev


(* 3: adding list modification functions *)
let insert_first t value =
  let new_elt = { prev = None; next = !t; value } in

  (* note: precedence of match is low, that's why we wrap it in parens*)
  (match !t with
  | Some old_first -> old_first.prev <- Some new_elt (* adding in the reverse pointer *)
  | None -> ()); (* being added to empty list*)
  t := Some new_elt; (* mutable assign to t*)
  new_elt

(* 4: add in element-relative functions *)
let insert_after elt value =
  let new_elt = { value; prev = Some elt; next = elt.next } in
  (match elt.next with
  | Some old_next -> old_next.prev <- Some new_elt
  | None -> ());
  elt.next <- Some new_elt;
  new_elt

let remove t elt =
  let { prev; next; _ } = elt in
  (match prev with
  | Some prev -> prev.next <- next
  | None -> t := next); (* if there isn't any previous, then the list pointer itself is updated.*)
  (match next with
  | Some next -> next.prev <- prev
  | None -> ());
  (* adjust pointers for the removed element *)
  elt.prev <- None;
  elt.next <- None

(* 5: iteration functions -- implemented using simple recursive loops*)
let iter t ~f =
  let rec loop = function
    | None -> ()
    | Some el ->
      f (value el); (* the function is applied to the value of the element *)
      loop (next el) (* the loop function is recursively called on the next element from the current element*)
  in
  loop !t (* invokes loop on the first element *)

let find_el t ~f =
  let rec loop = function
    | None -> None
    | Some elt -> if f (value elt) then Some elt else loop (next elt)
  in
  loop !t
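
A quick sketch of the double-removal hazard described in the notes above, using the functions we just defined (the bindings a, _b and _c are just illustrative names):

let l = create () in
let _c = insert_first l 3 in
let _b = insert_first l 2 in
let a = insert_first l 1 in   (* the list now holds 1, 2, 3 *)
remove l a;                   (* the list now holds 2, 3 *)
remove l a;                   (* second removal: a.prev and a.next are both None,
                                 so the list pointer itself gets set to None *)
is_empty l;;
(* - : bool = true   -- the list looks empty even though 2 and 3 were never removed *)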

Cyclic Data Structures

Doubly-linked lists are cyclic because it is possible to follow a nontrivial sequence of pointers that closes in on itself. In general, building cyclic data structures requires the use of side effects. This is done by constructing the data elements first, and then adding cycles using assignment afterward.

Cyclic data structures (general purpose):

  • we need to use side-effects to create them
  • we should construct the elements first
  • then we should add the cycles
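
A minimal sketch of this construct-then-tie pattern (the node record and its field names are made up for illustration):

type node = { value : int; mutable next : node option };;

let a = { value = 1; next = None };;
let b = { value = 2; next = None };;
(* the cycle is added afterwards, by assignment *)
a.next <- Some b;;
b.next <- Some a;;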

A rare exception: we can use let rec for cyclic data structures whose size is fixed up front.

let rec endless_loop = 1 :: 2 :: 3 :: endless_loop;;
(* val endless_loop : int list = [1; 2; 3; <cycle>] *)

Modifying the List

Take note of the implementation point that the match expression has to be wrapped in parentheses (or begin...end), because of match's low precedence.

Iteration Functions

These are the map / iter / fold style functions; in our implementation, iter and find_el (refer to the code block above).

  • iter: the goal of which is to call a unit-producing function on every element of the list, in order
  • find_el: runs a provided test function on each value stored in the list, returning the first element that passes the test

Laziness and other benign effects

Benign Effects: there are cases where we want to be pure by default and use some limited imperative side-effects that give us performance improvements.

benign effect: laziness

We wrap a deferred computation with the lazy keyword; its evaluation then has to be explicitly forced via Lazy.force.

open Stdio

let v = lazy (print_endline "performing lazy computation"; Float.sqrt 16.);;
print_endline "hello world";
(* it won't actually be applied until we force the application: *)
Lazy.force v;;

Here’s a didactic example of how we may implement laziness:

type 'a lazy_state =
  | Delayed of (unit -> 'a)
  | Value of 'a
  | Exn of exn;;
(* type 'a lazy_state = Delayed of (unit -> 'a) | Value of 'a | Exn of exn *)

type 'a our_lazy = { mutable state : 'a lazy_state };;
(* type 'a our_lazy = { mutable state : 'a lazy_state; } *)

let our_lazy f = { state = Delayed f };;
(* val our_lazy : (unit -> 'a) -> 'a our_lazy = <fun> *)

(* we've wrapped the thunk up here *)
let v =
  our_lazy (fun () -> (* -- this is the thunk *)
    print_endline "performing lazy computation"; Float.sqrt 16.);;
(* val v : float our_lazy = {state = Delayed <fun>} *)

(* here's our attempt at forcing out the function evaluation *)
let our_force l =
  match l.state with
  | Value x -> x
  | Exn e -> raise e
  | Delayed f ->
    try
      let x = f () in
      l.state <- Value x;
      x
    with exn ->
      l.state <- Exn exn;
      raise exn;;
(* val our_force : 'a our_lazy -> 'a = <fun> *)

Notes:

  1. we can create a lazy value from a thunk (which is a nullary function).

    NOTE: in the context of functional programming languages, a thunk is a nullary function. In other contexts (e.g. JavaScript), the term is more general: a deferred computation that may or may not take an argument.
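
A quick usage sketch of the our_lazy / our_force pair defined above:

let v = our_lazy (fun () -> print_endline "computing"; 40 + 2);;
(* val v : int our_lazy = {state = Delayed <fun>} *)
our_force v;;
(* computing *)
(* - : int = 42 *)
our_force v;; (* the state is now Value 42, so "computing" is not printed again *)
(* - : int = 42 *)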

Memoization and Dynamic Programming

Our favourite first-reach optimisation technique :)

CAUTION: a memoized function by nature leaks memory. As long as we hold onto the memoized function, we’re holding every result it has returned thus far.

Here's an example of how we might implement a memoization function ourselves:

let memoize m f =
  let memo_table = Hashtbl.create m in
  (fun x ->
     Hashtbl.find_or_add memo_table x ~default:(fun () -> f x));;
(* val memoize : 'a Hashtbl.Key.t -> ('a -> 'b) -> 'a -> 'b = <fun> *)
  • useful for efficiently implementing top-down recursive algorithms

    Examples:

    1. Let’s use a didactic fib function as a way to illustrate how to memoize:

      
          open Base
      
          (* -- helper timer function: *)
          let time f =
            let open Core in
            let start = Time.now () in
            let x = f () in
            let stop = Time.now () in
            printf "Time: %F ms\n" (Time.diff stop start |> Time.Span.to_ms);
            x
      
          (* val time : (unit -> 'a) -> 'a = <fun> *)
      
          (* slow, not memoized function *)
          let rec fib i =
          if i <= 1 then i else fib (i - 1) + fib (i - 2);;
          (* val fib : int -> int = <fun> *)
      
          (* NOTE: the tricky part is that we need to insert the memoization BEFORE the recursive calls within =fib=  *)
          let memo_fib = memoize (module Int) fib;;
      
          (* val memo_fib : int -> int = <fun> *)
          time (fun () -> memo_fib 40);;
          (* Time: 18174.5970249 ms *)
          (* - : int = 102334155 *)
          time (fun () -> memo_fib 40);;
          (* Time: 0.00524520874023 ms *)
          (* - : int = 102334155 *)
      

      To make things better, we should write fib in a way that unwinds the recursion.

      We explore that re-write first, then look at how to make it memoized:

      In this example, fib_norec takes, as its first argument, the function to use for its recursive calls (no memoization yet):

      
         (* the non-recursive version *)
         let fib_norec fib i =
           if i <= 1 then i
           else fib (i - 1) + fib (i - 2);;
      
         (* turning it back to an ordinary Fib function by adding in the recursive knot:  *)
         let rec fib i = fib_norec fib i;;
      
         (* here's a polymorphic helper that ties the recursive knot for any such non-recursive function: *)
         let make_rec f_norec =
           let rec f x = f_norec f x in
           f;;
         (* NOTE:
            1. the function f_norec passed in to make_rec is a function that isn’t recursive but takes as an argument a function that it will call.
      
            2. so, make_rec essentially feeds f_norec to itself -- which makes it a true recursive function
      
            *)
      
         fib 20;;
      

      Now we try to build a version of make_rec that memoizes as it ties the recursive knot. We use a reference here as a way to tie the knot without let rec, since the let rec restrictions (covered below) rule it out for this definition.

      
         (* our objective is to pass this function into a memoize function *)
         let fib_norec fib i =
           if i <= 1 then i
           else fib (i - 1) + fib (i - 2);;
      
         (* key point here is how we let bind the recursive function within *)
         let make_rec f_norec =
           let rec f x = f_norec f x in
           f;;
         (* val make_rec : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun> *)
      
         let memo_rec m f_norec x =
           let fref = ref (fun _ -> assert false) in
           let f = memoize m (fun x -> f_norec !fref x) in
           fref := f;
           f x;;
         (* memo_rec's signature is almost the same as make_rec, except for the accepting of a module param for the hashtable type.
      
         val memo_rec : 'a Hashtbl.Key.t -> (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b =
           <fun>
         *)
      
         (* now we can finally get a memoised version of fib *)
         let fib = memo_rec (module Int) fib_norec;;
         (* val fib : int -> int = <fun> *)
         time (fun () -> fib 40);;
         (*
         Time: 0.121355056763 ms
         - : int = 102334155
         *)
      

      The code block above shows how to avoid the memory leak: the memo table is created inside the call, stays reachable only for the duration of that call, and so can be garbage-collected once the computation completes.

      In the code above, the memory behaviour is important for us to understand correctly:

      1. only when fib is called is the final argument to memo_rec (x, the parameter of fib) supplied, and only then does the call to memoize actually happen

        Because the memo table created by that call falls out of scope when fib returns, memo_rec avoids a memory leak (the table is garbage-collected after the computation completes)

      2. so, calling memo_rec (module Int) fib_norec does not trigger the call to memoize until that last parameter is bound.

        in fact, we could have defined fib with a single call:

        
              let memo_rec m f_norec x =
                let fref = ref (fun _ -> assert false) in
                let f = memoize m (fun x -> f_norec !fref x) in
                fref := f;
                f x;;
        
              let fib = memo_rec (module Int) (
                            fun fib i ->
                            if i <= 1 then i else fib (i - 1) + fib (i - 2));;
              (* val fib : int -> int = <fun> *)
        

      NOTE: org-babel won’t preserve the table and so we won’t see the outputs showing proof that memoization has happened.

    2. Levenshtein edit distance between 2 strings:

      similar to the fib function, we need to insert the memoization before the inner recursive calls to edit_distance. To do that with the memo_rec defined earlier (which only works on single-argument functions), edit_distance must take the pair of strings as a single argument, and we also need that pair of strings to be hashable.

      we can use ppx_jane, whose derivers generate the hash functions and equality/comparison tests for us instead of our writing them out ourselves.

      
         (* 1: using the deriver *)
         #require "ppx_jane";;
         module String_pair = struct
           type t = string * string [@@deriving sexp_of, hash, compare]
         end;;
         (* module String_pair : *)
         (*   sig *)
         (*     type t = string * string *)
         (*     val sexp_of_t : t -> Sexp.t *)
         (*     val hash_fold_t : Hash.state -> t -> Hash.state *)
         (*     val hash : t -> int *)
         (*     val compare : t -> t -> int *)
         (*   end *)
      
         (* 2: now we can use memo_rec for edit_distance and memoise it properly *)
         let edit_distance = memo_rec (module String_pair)
           (fun edit_distance (s,t) ->
              match String.length s, String.length t with
              | (0,x) | (x,0) -> x
              | (len_s,len_t) ->
                (* 1 operation to drop suffix in either case *)
                let s' = String.drop_suffix s 1 in
                let t' = String.drop_suffix t 1 in
                let cost_to_drop_both =
                  if Char.(=) s.[len_s - 1] t.[len_t - 1] then 0 else 1
                in
                (* we shall get the smallest of the lot *)
                List.reduce_exn ~f:Int.min
                  [ edit_distance (s',t ) + 1
                  ; edit_distance (s ,t') + 1
                  ; edit_distance (s',t') + cost_to_drop_both
         ]);;
         (* val edit_distance : String_pair.t -> int = <fun> *)
         time (fun () -> edit_distance ("OCaml 4.09","ocaml 4.09"));;
         (* Time: 0.964403152466 ms *)
      
  • Limitations of let rec

    LIMITATION: There are limits on what can appear on the RHS of a let rec; for example, let rec x = x + 1 is rejected. Only 3 kinds of construct may show up on the RHS of a let rec:

    1. a function definition
    2. a constructor
    3. the lazy keyword

    This is both good and bad: it means our memo_rec can't be implemented using let rec, but it also keeps nonsensical definitions away from the compiler.

    NOTE, PL-DESIGN: Haskell is lazy and such compiler-restrictions don’t show up.

    Also, laziness is more constrained than explicit mutation, so in some cases it leads to code that is easier to reason about.
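
    For instance, a small sketch of what may and may not appear on the RHS of a let rec (the names here are just for illustration):

    (* allowed: a function definition *)
    let rec loop () = loop ();;

    (* allowed: a constructor application (the cyclic-list trick from earlier) *)
    let rec ones = 1 :: ones;;

    (* allowed: the lazy keyword *)
    let rec self = lazy (Lazy.force self);;

    (* rejected: let rec x = x + 1;;
       the compiler reports that this kind of expression is not allowed
       as the right-hand side of a let rec *)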

    
    (* we used let rec here *)
    let make_rec f_norec =
      let rec f x = f_norec f x in
      f;;
    
    (* we didn't use let rec here *)
    let memo_rec m f_norec x =
      let fref = ref (fun _ -> assert false) in
      let f = memoize m (fun x -> f_norec !fref x) in
      fref := f;
      f x;;
    (* memo_rec's signature is almost the same as make_rec, except for the accepting of a module param for the hashtable type.*)
    
    let lazy_memo_rec m f_norec x =
      let rec f = lazy (memoize m (fun x -> f_norec (force f) x)) in
      (force f) x;;
    (* generated type:
    
      val lazy_memo_rec : 'a Hashtbl.Key.t -> (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b =
      <fun>
      *)
    time (fun () -> lazy_memo_rec (module Int) fib_norec 40);;
    (* Time: 0.181913375854 ms *)
    (* - : int = 102334155 *)
    
    
    (* we can't try to force out a lazy version just to use let rec because a lazy value can't try to force itself as part of its own evaluation (without changing memo_rec into a lazy version) *)
    let rec x = lazy (force x + 1);;
    (* val x : int lazy_t = <lazy> *)
    force x;;
    (* Exception: Lazy.Undefined. *)
    

Input and Output

To elaborate on the “imperative” terminology, any function that doesn’t boil down to a deterministic transformation from its arguments to its return value is imperative in nature. That includes not only things that mutate your program’s data, but also operations that interact with the world outside of your program.

So IO such as:

  • file IO (includes terminal IO)
  • socket IO / network IO

Here we focus on OCaml’s buffered I/O library:

  1. only two types:
    • in_channel: for reading
    • out_channel: for writing
  2. IO interfaces:
    • Core's In_channel and Out_channel modules deal with files and terminals
    • the Unix module can be used to create other kinds of channels (e.g. sockets)

Terminal I/O

The 3 channels (stdin, stdout, stderr) are available at the top level of Core’s namespace directly (we don’t need to go through In_channel and Out_channel modules).

There's a chance that this code example is deprecated; I'm not going to investigate further.

(* supposed to be in a .ml file, compiled by dune *)
open Core

let () =
  Out_channel.output_string stdout "Pick a timezone: ";
  Out_channel.flush stdout; (* flushes the buffer to actually write to the channel*)
  match In_channel.(input_line stdin) with
  | None -> failwith "No timezone provided" (* In_channel.input_line returns an option -- None means end of input stream e.g. EOL *)
  | Some zone_string ->
    let zone = Time_unix.Zone.find_exn zone_string in
    let time_string = Time.to_string_abs (Time.now ()) ~zone in
    Out_channel.output_string stdout
      (String.concat
         ["The time in ";Time_unix.Zone.to_string zone;" is ";time_string;".\n"]);
    Out_channel.flush stdout (* flush it again to force the printing -- good habit, though not necessary since it's end of program*)

notes:

  1. since out_channels are buffered, we need to flush them for the output of Out_channel.output_string to actually reach the terminal

Formatted Output with printf

printf in OCaml is special compared to C’s printf because it’s type-safe!

What kind of control can printf give?

  • alignment and padding
  • escape-strings
  • formatting of numbers (decimal, hex, binary?)
  • precision of float numbers
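
For instance, a small sketch exercising a few of these controls (using Stdio's printf):

open Stdio;;

printf "|%8s|%-8s|\n" "right" "left";;  (* width and alignment *)
(* |   right|left    | *)
printf "%d %x %o\n" 255 255 255;;       (* decimal, hex, octal *)
(* 255 ff 377 *)
printf "%.3f\n" 3.14159;;               (* float precision *)
(* 3.142 *)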

Functions similar to printf for other outputs:

  • eprintf for stderr
  • fprintf for arbitrary out_channel
  • sprintf that returns a formatted string
  • Understanding format strings

    Type-safe printf: The compiler checks that the types referred to by the format string match the types of the rest of the args passed to printf

    This analysis of the format string happens at compile time \(\implies\) the format string needs to be available as a string literal at compile time – that's why the compiler complains if we pass in an ordinary string variable. To use a variable, we'd have to annotate its type so that the literal is inferred as a format string rather than a plain string.

    
    let fmt = "%i is an integer\n";;
    (* val fmt : string = "%i is an integer\n" *)
    printf fmt 3;;
    (*
      Line 1, characters 8-11:
    Error: This expression has type string but an expression was expected of type
             ('a -> 'b, Stdio.Out_channel.t, unit) format =
               ('a -> 'b, Stdio.Out_channel.t, unit, unit, unit, unit) format6
     *)
    
    (* if we type it correctly then it will work*)
    open CamlinternalFormatBasics;;
    let fmt : ('a, 'b, 'c) format =
    "%i is an integer\n";;
    printf fmt 3;;
    

    Here’s the time-conversion program updated using printf, some notes:

    1. flushing the channel works via %! format string when using printf:

      printf "The time in %s is %s.\n%!" (Time_unix.Zone.to_string zone) time_string

    
    open Core
    
    let () =
      printf "Pick a timezone: %!";
      match In_channel.input_line In_channel.stdin with
      | None -> failwith "No timezone provided"
      | Some zone_string ->
        let zone = Time_unix.Zone.find_exn zone_string in
        let time_string = Time.to_string_abs (Time.now ()) ~zone in
        printf "The time in %s is %s.\n%!" (Time_unix.Zone.to_string zone) time_string
    
    (* ----- older, more verbose syntax:
    open Core
    
    let () =
      Out_channel.output_string stdout "Pick a timezone: ";
      Out_channel.flush stdout; (* flushes the buffer to actually write to the channel*)
      match In_channel.(input_line stdin) with
      | None -> failwith "No timezone provided" (* In_channel.input_line returns an option -- None means end of input stream e.g. EOL *)
      | Some zone_string ->
        let zone = Time_unix.Zone.find_exn zone_string in
        let time_string = Time.to_string_abs (Time.now ()) ~zone in
        Out_channel.output_string stdout
          (String.concat
             ["The time in ";Time_unix.Zone.to_string zone;" is ";time_string;".\n"]);
        Out_channel.flush stdout (* flush it again to force the printing -- good habit, though not necessary since it's end of program*)
    
    *)
    

File I/O

The general pattern for file I/O is: create the channel, use the channel, then close the channel. Reading and writing follow the same shape; only the kind of channel differs (In_channel vs Out_channel, file vs stdio).

We need self-cleaning code here:

  • We need to be careful with the finite resource of file descriptors: an exception shouldn't cause us to skip releasing the fd \(\implies\) no fd leak (add a finally step)

    There are better bookkeeping functions available to us, such as In_channel.with_file, that we should be using.

    
      (* we have ergonomic bookkeeping functions that we can use:  *)
      let sum_file filename =
        In_channel.with_file filename ~f:(fun file ->
          let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
          List.fold ~init:0 ~f:(+) numbers);;
      (* val sum_file : string -> int = <fun> *)
    
      (* the manual way would have been to add in a finally block for self-cleanup *)
      let sum_file filename =
        let file = In_channel.create filename in
        Exn.protect ~f:(fun () ->
          let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
          List.fold ~init:0 ~f:(+) numbers)
          ~finally:(fun () -> In_channel.close file);;
      (* val sum_file : string -> int = <fun> *)
    
  • we also shouldn't read the whole file into memory; prefer line-by-line processing with In_channel.fold_lines over In_channel.input_lines (which reads the whole file at once)

    
      (* line by line processing *)
      let sum_file filename =
        In_channel.with_file filename ~f:(fun file ->
          In_channel.fold_lines file ~init:0 ~f:(fun sum line ->
            sum + Int.of_string line));;
    
      (* val sum_file : string -> int = <fun> *)
    
    
      (* compared with reading the whole file into memory *)
      let sum_file filename =
        In_channel.with_file filename ~f:(fun file ->
          let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
          List.fold ~init:0 ~f:(+) numbers);;
      (* val sum_file : string -> int = <fun> *)
    

There are better patterns for this in the API docs, and there's also a separate guide to file manipulation.

Order of Evaluation

OCaml, like most languages, is strict \(\implies\)

  • when we bind an identifier to the result of some expression, the expression is evaluated before the binding happens
  • when we call a function on a set of args, those args are evaluated before being passed to the function

When we code imperatively, the order of evaluation matters. Additionally, it also matters if we evaluate lazily or not:

(* example: are any of these sines negative? *)
(* this code unnecessarily computes all three sines -- we could have been lazy about it *)
let x = Float.sin 120. in
let y = Float.sin 75.  in
let z = Float.sin 128. in
List.exists ~f:(fun x -> Float.O.(x < 0.)) [x;y;z];;
(* - : bool = true *)

let x = lazy (Float.sin 120.) in
let y = lazy (Float.sin 75.)  in
let z = lazy (Float.sin 128.) in
List.exists ~f:(fun x -> Float.O.(Lazy.force x < 0.)) [x;y;z];;
(* - : bool = true *)

Why OCaml is strict:

lazy evaluation and imperative programming generally don’t mix well because laziness makes it harder to reason about when a given side effect is going to occur. Understanding the order of side effects is essential to reasoning about the behavior of an imperative program.

Strictness benefits:

  1. expressions bound by a sequence of let bindings are evaluated in the order in which they're defined
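
A tiny sketch of that ordering guarantee (assuming Stdio's printf is in scope; the bindings are just for illustration):

let x = (printf "first\n"; 1) in
let y = (printf "second\n"; 2) in
printf "sum = %d\n" (x + y);;
(* first *)
(* second *)
(* sum = 3 *)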

Counter-intuitive order of evaluation

The OCaml compiler's order of evaluation can be a little counterintuitive – the evaluation order of sub-expressions isn't specified, and here the sub-expression that comes last is evaluated first:

List.exists ~f:(fun x -> Float.O.(x < 0.))
  [ (printf "1\n"; Float.sin 120.);
    (printf "2\n"; Float.sin 75.);
    (printf "3\n"; Float.sin 128.); ];;
(* prints out: *)
(* 3 *)
(* 2 *)
(* 1 *)
(* - : bool = true *)

Side-effects and Weak Polymorphism

Weakly polymorphic variable: a variable that can be used with only one, as yet undetermined, type; the underscore in the type-variable name (e.g. '_weak1) is what indicates weak polymorphism. The compiler will concretise this type as soon as it can.

(* example of variable with weak polymorphic typing *)
let remember =
  let cache = ref None in
  (fun x ->
     match !cache with
     | Some y -> y
     | None -> cache := Some x; x);;
(* val remember : '_weak1 -> '_weak1 = <fun> *)

(* example of general polymorphic typing: *)
let identity x = x;;
(* val identity : 'a -> 'a = <fun> *)

(* concretisation of the weakly polymorphic type *)
let remember_three () = remember 3;;
(* val remember_three : unit -> int = <fun> *)
remember "avocado";; (*<---- this will error out*)
  • Note that the type of remember was settled by the definition of remember_three, even though remember_three was never called!

The Value Restriction

This part is about knowing when we have simple types that allow the code to remain polymorphic vs when it’s weakly polymorphic (and hence the values must be restricted).

Value Restriction is the rule that is used to concretise the types for weakly polymorphic variables.

The idea: a weak type is an unknown type that may end up stored in a persistent, mutable cell. "Simple values" come from the kinds of expressions that can't introduce persistent, mutable cells – and so they can remain polymorphic:

  • constants
  • constructors that only contain other simple values
  • function declarations
  • let bindings where the sub-bindings are all simple values

The value restriction doesn’t require that there is no mutable state, only that there is no persistent mutable state that could share values between uses of the same function.

(* this is simple, it's polymorphic *)
(fun x -> [x;x]);;
(* - : 'a -> 'a list = <fun> *)

(* this ends up being weakly polymorphic -- mainly because OCaml can't separate pure and impure functions. *)
identity (fun x -> [x;x]);;
(* - : '_weak2 -> '_weak2 list = <fun> *)

(* this has a fresh reference for each call to this function, hence it's NOT weakly typed. In comparison, a memoized function would be weakly typed (because the mutable cache would have access across different function calls for the same function -- persistent, mutable state.)  *)
let f () = ref None;;
(* val f : unit -> 'a option ref = <fun> *)

Partial Application and the Value Restriction

In most cases, the value restriction is a good thing, because the value in question can only safely be used with a single type.

Partial application is the exception because partial application is NOT a simple value, so functions created by partial application are sometimes less general than we would expect.

The solution for avoiding the weakly polymorphic inference is eta expansion – the general approach for resolving problems that arise from the value restriction.

let list_init_10 = List.init 10;; (* [1] *)
(* val list_init_10 : f:(int -> '_weak3) -> '_weak3 list = <fun> *)

(* eta expansion: this keeps things polymorphic -- we avoid the partial application -- [2] *)
let list_init_10 ~f = List.init 10 ~f;;
(* val list_init_10 : f:(int -> 'a) -> 'a list = <fun> *)
  • [1]: this is inferred as a weakly polymorphic type for the resulting function because nothing guarantees that List.init isn't creating a persistent ref somewhere inside it that would be shared across multiple calls to list_init_10.

  • [2]: we do eta expansion to avoid the partial application and avoid the weakly polymorphic type inference.

Relaxing the Value Restriction

Value Restriction is just a syntactic check. We can do a few operations that count as simple values and anything that is a simple value can be generalised (polymorphic).

There's a relaxed version of the value restriction that uses type information to allow polymorphic types for some things that are not simple values:

  1. a function application (which would normally be inferred as weakly polymorphic) can be fully polymorphic if the returned value is immutable

    
        (* function application -- inferred as weak *)
        identity (fun x -> [x;x]);;
        (* - : '_weak4 -> '_weak4 list = <fun> *)
    
        (* immutable hence can be strongly polymorphic *)
        identity [];;
        (* - : 'a list = [] *)
    
        [||];; (* mutable return value*)
        (* - : 'a array = [||] *)
        identity [||];; (* --> gets infered as weakly polymorphic*)
        (* - : '_weak5 array = [||] *)
    
  2. abstract values need explicit guarantees to be inferred as strongly polymorphic:

    an abstract data type needs to be annotated so that its interface guarantees that the data structure doesn't contain any persistent references to values of type 'a (in the example below, the covariance annotation +'a); only then will OCaml infer polymorphic types for abstract values.

    In the example, Concat_list.t is immutable but without the guarantee, OCaml treats it as if it were mutable.

    
       module Concat_list : sig
         type 'a t
         val empty : 'a t
         val singleton : 'a -> 'a t
         val concat  : 'a t -> 'a t -> 'a t  (* constant time *)
         val to_list : 'a t -> 'a list       (* linear time   *)
       end = struct
    
         type 'a t = Empty | Singleton of 'a | Concat of 'a t * 'a t
    
         let empty = Empty
         let singleton x = Singleton x
         let concat x y = Concat (x,y)
    
         let rec to_list_with_tail t tail =
           match t with
           | Empty -> tail
           | Singleton x -> x :: tail
           | Concat (x,y) -> to_list_with_tail x (to_list_with_tail y tail)
    
         let to_list t =
           to_list_with_tail t []
    
       end;;
    
       (*
         module Concat_list :
         sig
           type 'a t
           val empty : 'a t
           val singleton : 'a -> 'a t
           val concat : 'a t -> 'a t -> 'a t
           val to_list : 'a t -> 'a list
         end
       *)
    
       Concat_list.empty;;
       (* - : 'a Concat_list.t = <abstr> *) (*<--- it's abstract type but no guarantee*)
       identity Concat_list.empty;; (*<---- gives weak polymorphism*)
       (* - : '_weak6 Concat_list.t = <abstr> *)
    
    
       (* ============================================================== *)
       module Concat_list : sig
         type +'a t
         val empty : 'a t
         val singleton : 'a -> 'a t
         val concat  : 'a t -> 'a t -> 'a t  (* constant time *)
         val to_list : 'a t -> 'a list       (* linear time   *)
       end = struct
    
         type 'a t = Empty | Singleton of 'a | Concat of 'a t * 'a t
    
         let empty = Empty
         let singleton x = Singleton x
         let concat x y = Concat (x,y)
    
         let rec to_list_with_tail t tail =
           match t with
           | Empty -> tail
           | Singleton x -> x :: tail
           | Concat (x,y) -> to_list_with_tail x (to_list_with_tail y tail)
    
         let to_list t =
           to_list_with_tail t []
    
       end;;
       (*
       module Concat_list :
         sig
           type +'a t
           val empty : 'a t
           val singleton : 'a -> 'a t
           val concat : 'a t -> 'a t -> 'a t
           val to_list : 'a t -> 'a list
         end
         *)
    
       (* now, it's strongly polymorphic: *)
       identity Concat_list.empty;;
       (* - : 'a Concat_list.t = <abstr> *)
    

Chapter 9: GADTs

Generalized Algebraic Data Types (GADTs) are an extension to variants that we’ve seen so far:

Benefits:

  1. more expressive
  2. can create types that match the shape of the intended program more precisely
  3. code can be safer, more concise and more efficient

Unique Features / Mechanisms:

  1. let the compiler learn more type info when we descend into a case of a pattern match
  2. make it easy to use existential types – working with data of a specific but unknown type

Costs:

  1. distinct cost (TODO see below)
  2. harder to use, less intuitive than ordinary variants
  3. can be confusing to figure out how to use them effectively \(\implies\) should only be used when it makes a big qualitative improvement to your design

We learn from some examples for this chapter.

A Little Language

A good use is to have typed expression languages (like Blang). This is going to be our DSL.

We wish to create a language that mixes arithmetic and boolean expressions, so we have to deal with the possibility of ill-typed expressions (e.g. an expression that adds a bool and an int). We want type safety: the compiler should reject such expressions. A non-type-safe design would instead require dynamic type checks on the input params of the function.

We attempt to use an ordinary variant first:

(* 1. Type defs (interface): *)
open Base

type value =
  | Int of int
  | Bool of bool

type expr =
  | Value of value
  | Eq of expr * expr
  | Plus of expr * expr
  | If of expr * expr * expr

(* 2. Defining a custom exception for ill-typed expressions *)
exception Ill_typed

(* 3. v0: an evaluator with many dynamic checks to detect errors.
   The problem here is that it's still possible to create ill-typed expressions, and they only trip these dynamic checks at runtime.
   Our objective is to make the types and the evaluator work together. *)
let rec eval expr =
  match expr with
  | Value v -> v
  | If (c, t, e) ->
    (match eval c with
     | Bool b -> if b then eval t else eval e
     | Int _ -> raise Ill_typed) (*-- dynamic type checking as an attempt to have type safety*)
  | Eq (x, y) ->
    (match eval x, eval y with
     | Bool _, _ | _, Bool _ -> raise Ill_typed
     | Int f1, Int f2 -> Bool (f1 = f2))
  | Plus (x, y) ->
    (match eval x, eval y with
     | Bool _, _ | _, Bool _ -> raise Ill_typed
     | Int f1, Int f2 -> Int (f1 + f2));;
(* val eval : expr -> value = <fun> *)
let i x = Value (Int x)
and b x = Value (Bool x)
and (+:) x y = Plus (x,y);;
(* val i : int -> expr = <fun> *)
(* val b : bool -> expr = <fun> *)
(* val ( +: ) : expr -> expr -> expr = <fun> *)
eval (i 3 +: b false);;
(* Exception: Ill_typed. *)

Here, the type error was only caught at runtime, by the dynamic type checking in our eval logic. We want the language itself to be type-safe.

Making the Language Type-Safe

We’re still trying to make things work with ordinary variants.

A type-safe version of this API might include:

  1. ability for the expressions to have a type param to distinguish integer and bool expressions

Signature:

module type Typesafe_lang_sig = sig
  type 'a t

  (** functions for constructing expressions *)

  val int : int -> int t
  val bool : bool -> bool t
  val if_ : bool t -> 'a t -> 'a t -> 'a t
  val eq : 'a t -> 'a t -> bool t
  val plus : int t -> int t -> int t

  (** Evaluation functions *)

  val int_eval : int t -> int
  val bool_eval : bool t -> bool
end

Implementation:

module Typesafe_lang : Typesafe_lang_sig = struct
  (* phantom type *)
  type 'a t = expr (*links the abstract type to the concrete expressions*)

  let int x = Value (Int x)
  let bool x = Value (Bool x)
  let if_ c t e = If (c, t, e)
  let eq x y = Eq (x, y)
  let plus x y = Plus (x, y)

  let int_eval expr =
    match eval expr with
    | Int x -> x
    | Bool _ -> raise Ill_typed

  let bool_eval expr =
    match eval expr with
    | Bool x -> x
    | Int _ -> raise Ill_typed
end

(* attempting an ill-typed expression *)
let expr = Typesafe_lang.(plus (int 3) (bool false));;
(*
Line 1, characters 40-52:
Error: This expression has type bool t but an expression was expected of type
         int t
   Type bool is not compatible with type int
       *)

Notes:

  1. the main trick here is the phantom-type: type 'a t = expr

    So the 'a (type parameter) is the phantom type because it doesn't show up in the body definition of t \(\implies\) the type param is unused \(\implies\) it's free to take on any value.

    by the way, the body definition refers to the RHS of the = in the type definition, so since expr doesn’t use 'a, internally 'a t is just implemented as the type expr. So the 'a doesn’t really do much there and is a phantom. This is also called a manifest type definition or aliasing.

    More about the use of “phantom”. Our purpose is to encode extra compile-time type information without affecting runtime representation.

  2. there are two eval functions here, if we had used GADTs, we could have had a single eval expression.

    the function signature differing between the two evals is what allows the compiler to catch the ill-typing (along with the phantom type)

    this still has type checking being done dynamically within the logic though.

  3. this is type-safe because the type-system is what would reject the ill-typed expression. It’s not complete though because in some cases we still can get around it and that’s why the dynamic type checking is still needed in the eval functions:

    
       let expr = Typesafe_lang.(eq (bool true) (bool false));;
       (* val expr : bool Typesafe_lang.t = <abstr> *)
       Typesafe_lang.bool_eval expr;;
       (* Exception: Ill_typed. *)
    

    the types within the implementation don't rule out ill-typed expressions. The same fact explains why we needed two different eval functions: the implementation of eval has no type-level guarantee of when it's handling a bool expression versus an int expression, so it can't safely give results whose type varies with the type of the expression.

Trying to do Better with Ordinary Variants

We persist and try to encode the typing rules within the DSL directly for the expr and value types.

type 'a value =
  | Int of 'a
  | Bool of 'a

type 'a expr =
  | Value of 'a value
  | Eq of 'a expr * 'a expr
  | Plus of 'a expr * 'a expr
  | If of bool expr * 'a expr * 'a expr

The constructor helpers make it look like this works as intended:

let i x = Value (Int x)
and b x = Value (Bool x)
and (+:) x y = Plus (x,y);;
(*
val i : 'a -> 'a expr = <fun>
val b : 'a -> 'a expr = <fun>
val ( +: ) : 'a expr -> 'a expr -> 'a expr = <fun>
*)
i 3;;
(* - : int expr = Value (Int 3) *)
b false;;
(* - : bool expr = Value (Bool false) *)
i 3 +: i 4;;
(* - : int expr = Plus (Value (Int 3), Value (Int 4)) *)

There are some problems to this implementation:

  1. the inner and outer type parameters are forced to be the same, so we can't mix int and bool sub-expressions the way we need to:
    
       If (Eq (i 3, i 4), i 0, i 1);;
       (*
       Line 1, characters 9-12:
       Error: This expression has type int expr
              but an expression was expected of type bool expr
              Type int is not compatible with type bool
       *)
    
  • Key Problem

    The way we want to use the type param isn’t supported by ordinary variants – we want the type param to be populated in different ways in the different tags and to depend in non-trivial ways on the types of the data associated with each tag.

    that’s what GADTs are for.

GADTs to the Rescue

  • Syntax

    Here’s the GADT type signature.

    
    type _ value =
      | Int : int -> int value
      | Bool : bool -> bool value
    
    type _ expr =
      | Value : 'a value -> 'a expr
      | Eq : int expr * int expr -> bool expr
      | Plus : int expr * int expr -> int expr
      | If : bool expr * 'a expr * 'a expr -> 'a expr
    
    1. when we see the type declaration, the use of the _ is syntactic sugar for “This type has one parameter, but I’m not going to name it because I don’t need to.” So for type declaration, it’s like saying “I’m ignoring this type name”

      it’s an interesting parallel to the use of _ in the context of pattern matching where _ means “I’m ignoring this value name”.

    2. the colon to the right of the tags tells us that it’s a GADT

    3. consider the definition of a tag: <tag name> : <function signature>.

      GOTCHA: when we say things like "the type parameter can be determined by the tag", it's NOT JUST the tag name that we are referring to – it's the whole annotation (i.e. is the result type concrete, or does it involve a type variable?)

      1. LHS: the tag name
      2. RHS of : can be seen as a single-arg function type \(\implies\) it’s like the type constructor.
        • LHS of -> :

          • types of the args of the constructor
        • RHS of -> :

          • shows the type of the constructed value
          • is an instance of the type of the overall GADT
    4. The type parameter CAN depend on BOTH the tag and the type of arguments.

      What this really means is "can I tell the type param by looking at the tag?". This includes the tag name as well as the annotation for that tag. The annotation may use only concrete types OR involve type variables. If it's concrete, then we know the type param just from the tag; otherwise, we have to resolve what the type variable actually is.

      • Eq is an example where the type param only depends on the tag. That’s why we already have the type concretized to int and bool

        Eq : int expr * int expr -> bool expr

      • If is an example where the type param depends on the args to the tag: the type param of the If is the type param of the then and else clauses, and that's why we use the type variable 'a.

        If : bool expr * 'a expr * 'a expr -> 'a expr.

  • Observations

    Given the GADT above, we have the following code that gives some observations:

    
    (* examples of type safety @ construction *)
    (*=============================================  *)
    let i x = Value (Int x)
    and b x = Value (Bool x)
    and (+:) x y = Plus (x,y);;
    (*
    val i : int -> int expr = <fun>
    val b : bool -> bool expr = <fun>
    val ( +: ) : int expr -> int expr -> int expr = <fun>
    *)
    i 3;;
    (* - : int expr = Value (Int 3) *)
    b 3;;
    (*
    Line 1, characters 3-4:
    Error: This expression has type int but an expression was expected of type
             bool
             *)
    i 3 +: i 6;;
    (* - : int expr = Plus (Value (Int 3), Value (Int 6)) *)
    i 3 +: b false;;
    (*
    Line 1, characters 8-15:
    Error: This expression has type bool expr
           but an expression was expected of type int expr
           Type bool is not compatible with type int
    *)
    
    (* GADT-based evaluator function that doesn't need any dynamic type-safety checks: *)
    (* ========================================== *)
    let eval_value : type a. a value -> a = function
      | Int x -> x
      | Bool x -> x;;
    (* NOTE: the value generated for this function matches what our type annotation is!:
    
     val eval_value : 'a value -> 'a = <fun> *)
    let rec eval : type a. a expr -> a = function
      | Value v -> eval_value v
      | If (c, t, e) -> if eval c then eval t else eval e
      | Eq (x, y) -> eval x = eval y
      | Plus (x, y) -> eval x + eval y;;
    (* val eval : 'a expr -> 'a = <fun> *)
    
    1. we see that we've enforced type safety. The rules for type safety are directly encoded in the definition of the expression type, and we have a single eval function.

      In our attempt without GADTs, we had to enforce type safety via signature-level restrictions on phantom types. In particular, we had to alias the 'a t (abstract type) to expr (the concrete type defined earlier).

      Comparing the expression-type encoding with the signature-based one:

      
         type _ expr =
           | Value : 'a value -> 'a expr
           | Eq : int expr * int expr -> bool expr
           | Plus : int expr * int expr -> int expr
           | If : bool expr * 'a expr * 'a expr -> 'a expr
      
         (* -- previously, we enforced this type safety with signature-level restrictions on phantom types *)
         module Typesafe_lang : Typesafe_lang_sig = struct
           (* phantom type *)
           type 'a t = expr (*links the abstract type to the concrete expressions -- this is aliasing of the phantom type*)
      
           let int x = Value (Int x)
           let bool x = Value (Bool x)
           let if_ c t e = If (c, t, e)
           let eq x y = Eq (x, y)
           let plus x y = Plus (x, y)
      
           let int_eval expr =
             match eval expr with
             | Int x -> x
             | Bool _ -> raise Ill_typed
      
           let bool_eval expr =
             match eval expr with
             | Bool x -> x
             | Int _ -> raise Ill_typed
         end
      
    2. With GADT, we have a single polymorphic eval function. Previously, we needed to have two concretely typed specific evaluators when using phantom types.

    3. Downside: when using GADTs, the code using them needs extra type annotations:

      
         let eval_value : type a. a value -> a = function
           | Int x -> x
           | Bool x -> x;;
      
         (* without annotations, we'd have gotten something like the following error:
      
            Error: This pattern matches values of type bool value
                but a pattern was expected which matches values of type int value
                Type bool is not compatible with type int
            *)
      

GADTs, Locally Abstract Types, and Polymorphic Recursion

We’ve seen the correct type-annotated implementation for eval already. Carrying on from before, we explore the need to have specific syntax for typing when our function is a polymorphic (GADT with abstract types) AND also recursive.

  • Problem

    eval is both polymorphic and recursive which makes this complex.

    1. polymorphic:

      The reason is that OCaml, by default, is not willing to instantiate an ordinary type variable in DIFFERENT ways (as int vs as bool) within the body of the same function, but that's exactly what we want here. Without the special syntax, a type restriction kicks in and eventually causes the type mismatch that produces the error shown in the preceding code block.

    2. recursive:

      recursive code that is also polymorphic is prone to typing errors because the compiler may be forced to unify a (the locally abstract stand-in for 'a) across all recursive calls IF the type leaks from an inner context into an outer one. The leak can concretize the abstract type entirely, or merely restrict it.

    Longer Elaboration:

    
    Understanding "escaping of scope" in the context of GADTs and locally abstract types:

    • A type variable (like 'a) "escapes its scope" when it is introduced in a local context (inside a pattern match or a locally abstract type annotation) but would have to leak into a broader context where it was never meant to be visible.

    • With GADTs and polymorphic recursion, type variables introduced locally in pattern matching or annotations may only be specialised within that local scope; they stand for fresh, locally abstract types.

    • If type-checking would have to unify such a variable with types outside its defining scope, the compiler reports a scope-escape (or ordinary type) error, because that would break the type system's guarantees.

    How this relates to the eval example:

    • eval is recursive and polymorphic: let rec eval (type a) (e : a expr) : a = ... introduces a locally abstract type a scoped to the function body.

    • On each recursive call, the compiler tries to unify the types involved in that call.

    • Without an explicit polymorphic annotation (let rec eval : 'a. 'a expr -> 'a = ...), the locally abstract type a would leak (escape) from the recursive call's scope into the enclosing function's scope, forcing the compiler to unify a across all recursive calls.

    • That escape makes the compiler expect a to always equal one specific type (e.g. bool), which then fails on branches that pass expressions of a different type such as int.

    • The fix is to give eval an explicitly polymorphic ("for all 'a") type signature, so that a never escapes: each recursive call gets a fresh, independent instantiation of a.

    Intuition summary:

    • Locally abstract type: a type variable introduced locally (inside a function, pattern match, or annotation).
    • Escaping its scope: when such a type variable would have to remain valid outside its defining local context.
    • Why it's a problem: it violates the assumption that the type variable is fresh and isolated, so type safety breaks down.
    • How to fix it: use the explicit polymorphic-recursion syntax so each recursive call gets a fresh type instance.

    References:

    • OCaml manual, chapter on GADTs (existential and locally abstract types): https://ocaml.org/manual/gadts.html
    • Real World OCaml, GADTs chapter: https://dev.realworldocaml.org/gadts.html
    • Thomas Leonard's article on OCaml's locally abstract types and polymorphic recursion
    
  • Locally abstract types as a solution

    This type-restriction is a problem that we desire to “fix” and we do that by adding a locally abstract type that doesn’t have that restriction.

    We might try this annotation, but it will fail because eval is recursive in nature and inference of GADTs doesn’t play well with recursive calls.

    
    let eval_value (type a) (v : a value) : a =
      match v with
      | Int x -> x
      | Bool x -> x;;
    (* val eval_value : 'a value -> 'a = <fun> *)
    
    
    let rec eval (type a) (e : a expr) : a =
      match e with
      | Value v -> eval_value v
      | If (c, t, e) -> if eval c then eval t else eval e
      | Eq (x, y) -> eval x = eval y
      | Plus (x, y) -> eval x + eval y;;
    (*
      Line 4, characters 43-44:
    Error: This expression has type a expr but an expression was expected of type
             bool expr
             Type a is not compatible with type bool
    *)
    

    The main problem is that eval is recursive, so the type-checker tries to merge the locally abstract type a into the type of the recursive function eval; a escapes its scope (inner to outer) when it gets merged into the outer scope in which eval is defined, and that's why the checker ends up expecting bool expr.

    FIX: explicitly mark eval as polymorphic – that's where the type annotation syntax comes from: let rec eval : 'a. 'a expr -> 'a = .... If eval is marked as polymorphic, then the type of eval is not specialised to a, so a doesn't escape its scope.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    
    (* locally abstract syntax + marked as polymorphic  *)
    let rec eval : 'a. 'a expr -> 'a =
      fun (type a) (x : a expr) -> (* <-- this is an anon function definition, taking an explicit argument with explicit type annotations*)
       match x with
       | Value v -> eval_value v
       | If (c, t, e) -> if eval c then eval t else eval e
       | Eq (x, y) -> eval x = eval y
       | Plus (x, y) -> eval x + eval y;;
    (* val eval : 'a expr -> 'a = <fun> *)
    
    (* without the syntax (code won't work):  *)
    let rec eval (type a) (e : a expr) : a =
      match e with
      | Value v -> eval_value v
      | If (c, t, e) -> if eval c then eval t else eval e
      | Eq (x, y) -> eval x = eval y
      | Plus (x, y) -> eval x + eval y;;
    
    • type variable escaping its scope

      A type variable escapes its scope when the concretisation of an abstract type leaks from an inner context into an outer one (with a recursive function, the call goes from the outer definition into the inner body, and the leaked type flows back out the other way).

      By marking eval as polymorphic, its type is never specialised to a, so a doesn’t escape its scope.

    • eval being an example of polymorphic recursion

      Another reason the compiler can’t infer the types automatically.

      For example, with If, the condition must have type bool expr, while the then and else branches could have, say, type int expr. This means that when evaluating an If, we dispatch eval at a different type than the one it was called at.

      eval needs to see itself as polymorphic

    • using the syntactic sugar

      For recursive functions that use GADTs, the syntactic sugar lets us combine two things in a single annotation:

      1. polymorphism annotation and
      2. the creation of the locally abstract types.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      
      (* this uses the syntactic sugar *)
      let rec eval : type a. a expr -> a = function
        | Value v -> eval_value v
        | If (c, t, e) -> if eval c then eval t else eval e
        | Eq (x, y) -> eval x = eval y
        | Plus (x, y) -> eval x + eval y;;
      (* val eval : 'a expr -> 'a = <fun> *)
      
      (* the more verbose (and correct) way earlier was: *)
      let rec eval : 'a. 'a expr -> 'a =
        fun (type a) (x : a expr) ->
         match x with
         | Value v -> eval_value v
         | If (c, t, e) -> if eval c then eval t else eval e
         | Eq (x, y) -> eval x = eval y
         | Plus (x, y) -> eval x + eval y;;
      (* val eval : 'a expr -> 'a = <fun> *)
      

      The syntax : type a. (with function afterward) tells the compiler:

      • “Here I am explicitly introducing a locally abstract type a which will be instantiated freshly in each match branch or recursive call.”

        • the concretisation will be local to the function body and recursive calls, thereby preventing the type inference leakage
      • This abstracts 'a as a local type variable (a), allowing its different instantiations during recursion or pattern matching.

When are GADTs useful?

Use case: Varying your Return type

We are familiar with cases where return types vary based on the input type but usually these involve simple dependencies between types that correspond to the data flow within our code. List.find is a good example.
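For reference, Base's List.find already shows this kind of simple dependency: the element type of the input list fixes the type inside the returned option.

List.find [1; 2; 3] ~f:(fun x -> x > 1);;
(* - : int option = Some 2 *)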

We might want more flexibility, e.g. for List.find on not found cases, we want to make it such that we can choose what returns.

A typical variant vs a GADT:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(* the GADT version of If_not_found -- the type of our argument that we give to control the return type.*)
module If_not_found = struct
  type (_, _) t =
    | Raise : ('a, 'a) t
    | Return_none : ('a, 'a option) t
    | Default_to : 'a -> ('a, 'a) t
end

(* typical variant, for comparison *)
module If_not_found = struct
  type 'a t =
    | Raise
    | Return_none
    | Default_to of 'a
end

Using the normal variant almost seems like what we want (to control the return type of the function based on our inputs in a non-trivial manner). The problem is that we don’t exactly achieve that because our output is always wrapped in an option:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
let rec flexible_find list ~f (if_not_found : _ If_not_found.t) =
  match list with
  | hd :: tl ->
    if f hd then Some hd else flexible_find ~f tl if_not_found
  | [] ->
    (match if_not_found with
    | Raise -> failwith "Element not found"
    | Return_none -> None
    | Default_to x -> Some x);;

(* --- this almost seems to work, except the return type is always wrapped in an option.

val flexible_find :
  'a list -> f:('a -> bool) -> 'a If_not_found.t -> 'a option = <fun>
*)

(* --- demo calls:  note how all of them (other than raised) are option-wrapped*)
flexible_find ~f:(fun x -> x > 10) [1;2;5] Return_none;;
(* - : int option = None *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] (Default_to 10);;
(* - : int option = Some 10 *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] Raise;;
(* Exception: (Failure "Element not found"). *)
flexible_find ~f:(fun x -> x > 10) [1;2;20] Raise;;
(* - : int option = Some 20 *)

Here, a GADT will work. In our GADT, we have two type params:

  • one for the type of the list element, and
  • one for the return type of the function.

These type params are introduced as locally abstract types in the definition of flexible_find.

Now, when we implement the flexible_find function, we get the behaviour we really want:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
let rec flexible_find
 : type a b. f:(a -> bool) -> a list -> (a, b) If_not_found.t -> b =
 fun ~f list if_not_found ->
  match list with
  | [] ->
    (match if_not_found with
    | Raise -> failwith "No matching item found"
    | Return_none -> None
    | Default_to x -> x)
  | hd :: tl ->
    if f hd
    then (
      match if_not_found with
      | Raise -> hd
      | Return_none -> Some hd
      | Default_to _ -> hd)
    else flexible_find ~f tl if_not_found;;
(* notice that the generated signature matches what our locally abstract type definition is above within the function type annotation:

val flexible_find :
  f:('a -> bool) -> 'a list -> ('a, 'b) If_not_found.t -> 'b = <fun>
 *)

(* --- demo calls *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] Return_none;;
(* - : int option = Base.Option.None *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] (Default_to 10);;
(* - : int = 10 *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] Raise;;
(* Exception: (Failure "No matching item found"). *)
flexible_find ~f:(fun x -> x > 10) [1;2;20] Raise;;
(* - : int = 20 *)

Use Case: Capturing the Unknown

We’ve seen unknown types before, e.g. the type variables in tuple types. Those type variables are universally quantified, i.e. the type holds for all types that might be provided.

In some cases we want existentially quantified type variables instead, meaning each variable stands for a particular but unknown type. GADTs offer a natural way to express this.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
(* stringable GADT *)
type stringable =
  Stringable : { value: 'a; to_string: 'a -> string } -> stringable
(* 'a is an arbitrary type here *)

(* printing a stringable  *)
let print_stringable (Stringable s) =
  Stdio.print_endline (s.to_string s.value);;
(* val print_stringable : stringable -> unit = <fun> *)

(* using print_stringable on different types of stringable. *)
let stringables =
  (let s value to_string = Stringable { to_string; value } in
    [ s 100 Int.to_string
    ; s 12.3 Float.to_string
    ; s "foo" Fn.id
    ]);;
(* Generated types:
val stringables : stringable list =
  [Stringable {value = <poly>; to_string = <fun>};
   Stringable {value = <poly>; to_string = <fun>};
   Stringable {value = <poly>; to_string = <fun>}]
*)
List.iter ~f:print_stringable stringables;;

We know that 'a is existentially quantified because it appears on the LHS of the arrow but not the RHS \(\implies\) the 'a that appears internally doesn’t appear in the type parameters of stringable itself.

This makes 'a an existentially quantified type, bound within the definition of stringable. Because the type of the underlying object is existentially bound within type stringable, any function that tries to return such a value won’t type-check:

1
2
3
4
5
6
7
8
(* this function is trying to return the underlying value (with abstract type 'a). It's not right because it's supposed to be only kept within the stringable. By returning it, the function will be trying to leak the type out of its scope. *)
let get_value (Stringable s) = s.value;;
(*
Line 1, characters 32-39:
Error: This expression has type $Stringable_'a
       but an expression was expected of type 'a
       The type constructor $Stringable_'a would escape its scope
*)

In the error message, the type variable $Stringable_'a has 3 parts:

  1. the $ marks the variable as existential
  2. the Stringable is the name of the GADT tag this variable came from
  3. the 'a is the name of the type variable from inside that tag.
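While the underlying value itself can't escape, values computed from it inside the match can; a minimal sketch reusing the stringable definition above:

let to_string (Stringable s) = s.to_string s.value;;
(* val to_string : stringable -> string = <fun> *)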

Use case: Abstracting Computational Machines

GADTs are useful for writing combinators. Combinators:

  1. are component-combining functions
  2. allow small components to be combined into larger computational machines.

Consider a pipeline pattern, a sequence of steps where each consumes the output of the previous step, possibly doing some side-effects and returning a value for the next step. We’re thinking of a custom pipeline, beyond the pipeline operator:

  1. Profiling: the ability to time each step of the pipeline and report it

  2. control over execution

    allowing users to pause the execution and restart at a later time

  3. Custom error handling

    pipeline knows where it failed and offers the possibility of restarting

  • Without Using GADT: concrete combinators

    We can define, build and execute pipelines without GADTs by manually providing functions that define the empty case (to seed the pipeline) and a way to build onto an existing pipeline.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    
    (* signature of Pipeline *)
    module type Pipeline = sig
      type ('input,'output) t (* type ('a, 'b) t means input of type 'a output of type 'b *)
    
      (* for adding a step to the pipeline *)
      val ( @> ) : ('a -> 'b) -> ('b,'c) t -> ('a,'c) t
      (* for seeding the pipeline *)
      val empty : ('a,'a) t
    end
    
    module Example_pipeline (Pipeline : Pipeline) = struct
      open Pipeline
      let sum_file_sizes =
        (fun () -> Sys_unix.ls_dir ".")
        @> List.filter ~f:Sys_unix.is_file_exn
        @> List.map ~f:(fun file_name -> (Core_unix.lstat file_name).st_size)
        @> List.sum (module Int) ~f:Int64.to_int_exn
        @> empty
    end;;
    (*module Example_pipeline :
      functor (Pipeline : Pipeline) ->
        sig val sum_file_sizes : (unit, int) Pipeline.t end
        *)
    
    (* ---- basic pipeline implementation:
    
       pipeline can be defined as a simple function and we use the @> to compose. Pipeline execution is just function application. *)
    
    module Basic_pipeline : sig
       include Pipeline
       val exec : ('a,'b) t -> 'a -> 'b
     end= struct
      type ('input, 'output) t = 'input -> 'output
    
      let empty = Fn.id
    
      let ( @> ) f t input =
        t (f input)
    
      let exec t input = t input
    end
    

    The Basic_pipeline doesn’t do any better than the |> operator though.

    To get better functionality, we have 2 choices:

    1. we improve on the pipeline type and give it extra structures, handle exceptions and all the fun stuff

      problem: we have to pre-commit to whatever services we’re supporting and we have to embed them in our pipeline representation

    2. use GADTs to abstractly represent the pipeline we want and then build the functionality we want on top of that

  • Using GADT

    Instead of concretely building a machine for executing a pipeline, we can use GADTs to abstractly represent the pipeline we want, and then build the functionality we want on top of that representation.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    
    type (_, _) pipeline =
      | Step : ('a -> 'b) * ('b, 'c) pipeline -> ('a, 'c) pipeline (* corresponds to the @> operator *)
      | Empty : ('a, 'a) pipeline (* corresponds to the empty pipeline *)
    
    (*=== 1: basic functionality *)
    (* pipeline seed + extend functionality: *)
    let ( @> ) f pipeline = Step (f,pipeline);;
    (* val ( @> ) : ('a -> 'b) -> ('b, 'c) pipeline -> ('a, 'c) pipeline = <fun> *)
    let empty = Empty;;
    (* val empty : ('a, 'a) pipeline = Empty *)
    
    (* pipeline exec functionality -- polymorphic recursion: *)
    let rec exec : type a b. (a, b) pipeline -> a -> b =
     fun pipeline input ->
      match pipeline with
      | Empty -> input
      | Step (f, tail) -> exec tail (f input);;
    (* val exec : ('a, 'b) pipeline -> 'a -> 'b = <fun> *)
    
    
    (*==== 2: extra functionality example: *)
    let exec_with_profile pipeline input =
      let rec loop
          : type a b.
            (a, b) pipeline -> a -> Time_ns.Span.t list -> b * Time_ns.Span.t list
        =
       fun pipeline input rev_profile ->
        match pipeline with
        | Empty -> input, rev_profile
        | Step (f, tail) ->
          let start = Time_ns.now () in
          let output = f input in
          let elapsed = Time_ns.diff (Time_ns.now ()) start in
          loop tail output (elapsed :: rev_profile)
      in
      let output, rev_profile = loop pipeline input [] in
      output, List.rev rev_profile;;
    
    (*
    val exec_with_profile : ('a, 'b) pipeline -> 'a -> 'b * Time_ns.Span.t list =
      <fun>
    *)
    

    When we extend functionality, the pipeline type doesn’t need to change; it’s the functions that consume the GADT that change to provide the new behaviour.
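    As a usage sketch (assuming the pipeline type, ( @> ), empty, exec and exec_with_profile defined above; the step functions here are made up for illustration):

    let pipeline =
      (fun x -> x + 1)   (* hypothetical step: increment *)
      @> Int.to_string   (* hypothetical step: render as a string *)
      @> empty;;
    exec pipeline 41;;
    (* - : string = "42" *)
    exec_with_profile pipeline 41;;
    (* returns ("42", <per-step timings>) *)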

    Benefits of using GADT over having combinators that build a concrete computational machine:

    1. simpler core types, typically built from GADT tags that mirror the types of the base combinators
    2. more modular design: the core types don’t need to anticipate every possible use you want to make of them
    3. more efficient code, because the more concrete approach would allocate closures to wrap up the necessary functionality, and closures are more heavyweight than GADT tags.

Use Case: Narrowing the Possibilities (set of possible states)

We can narrow the set of possible states for a given data-type in different circumstances. Useful for managing complex application state.

Here’s how logon requests may be managed in a non-GADT approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
type logon_request =
  { user_name : User_name.t
  ; user_id : User_id.t option
  ; permissions : Permissions.t option
  }


let authorized request =
  match request.user_id, request.permissions with
  | None, _ | _, None ->
    Error "Can't check authorization: data incomplete"
  | Some user_id, Some permissions ->
    Ok (Permissions.check permissions user_id);;
(* val authorized : logon_request -> (bool, string) result = <fun> *)

(* PROBLEMs:
   - this works alright for a limited set of functionality, but is hard to manage when there's other types of complications:
   1. needing to manage more fields (including optional fields)
   2. more operations depending on the optional fields
   3. needing to handle multiple parallel requests @ different states of completion
 *)

Tracking the (application) state at the type level

Instead, we can manage this complexity by tracking the state of a given request at the type level, and using that to narrow the set of states a given request can be in. That removes extra case-analysis and error-handling, reduces the complexity of the code, and removes opportunities for mistakes.

At the type level, we could:

  1. mint different types for different states of the request (e.g. incomplete request, complete request…)

    this becomes verbose

  2. use GADTs to track the state of the request within a type parameter and use the parameter to narrow the set of available cases without duplicating the type.

  • A completion-sensitive option type

    We can have distinct types that we use as markers of different states. Using the completion states, we can do a completeness-sensitive option type via a GADT where:

    1. one type variable is the type of contents of the option
    2. the second indicates whether it is at an incomplete state
    1
    2
    3
    4
    5
    6
    
    type incomplete = Incomplete
    type complete = Complete
    
    type (_, _) coption =
      | Absent : (_, incomplete) coption
      | Present : 'a -> ('a, _) coption
    

    We used incomplete explicitly here, but not complete. The effect is that only an incomplete coption can be Absent, while Present can appear at either state; consequently, a complete coption is guaranteed to be Present.

    The examples will show the narrowing:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    
    let get ~default o =
       match o with
       | Present x -> x
       | Absent -> default;;
    (* val get : default:'a -> ('a, incomplete) coption -> 'a = <fun> *)
    (*^ the incomplete type was inferred here (based on the type definition of coption) *)
    
    (*
    [1] this won't compile (if we force the annotation as complete instead of incomplete):
       (*
    let get ~default (o : (_,complete) coption) =
      match o with
      | Absent -> default
      | Present x -> x;;
    
    Line 3, characters 7-13:
    Error: This pattern matches values of type ('a, incomplete) coption
           but a pattern was expected which matches values of type
             ('a, complete) coption
           Type incomplete is not compatible with type complete
       *)
     *)
    
    
    (* [2] we can force it to compile by removing the absent branch from the function *)
    
    let get (o : (_,complete) coption) =
      match o with
      | Present x -> x;;
    (* val get : ('a, complete) coption -> 'a = <fun> *)
    (* or, more simply written as: *)
    let get (Present x : (_,complete) coption) = x;;
    (* val get : ('a, complete) coption -> 'a = <fun> *)
    

    When the coption is known to be complete, the pattern match is narrowed to just the Present case.

    By annotating the argument with the narrowed GADT type, we restrict what the function can be applied to, and that’s how we narrow the type down.

  • A completion-sensitive request type

    Same idea, but instead of a completion-sensitive option we build a completion-sensitive request type. The general pattern is the usual error-aware code, with GADTs narrowing the cases we have to handle.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    
    (* completion-sensitive version of logon_request record: *)
    type 'c logon_request = (*NOTE: the 'c is the type param that marks completeness state. If the outer request is marked as complete, it covers the inner fields' completeness as well.*)
      { user_name : User_name.t
      ; user_id : (User_id.t, 'c) coption
      ; permissions : (Permissions.t, 'c) coption
      }
    
    (* easy to fill in values for the fields (it's using the with-update syntax below:) *)
    let set_user_id request x = { request with user_id = Present x };;
    (* val set_user_id : 'a logon_request -> User_id.t -> 'a logon_request = <fun> *)
    let set_permissions request x = { request with permissions = Present x };;
    (* val set_permissions : 'a logon_request -> Permissions.t -> 'a logon_request = *)
    (*   <fun> *)
    
    
    (* === NOTE: we should have an explicit completeness check, which constructs a version of the record with the completed fields filled in *)
    
    (* --- this is a polymorphic completeness check: includes both complete and incomplete return types *)
    let check_completeness request =
      match request.user_id, request.permissions with
      | Absent, _ | _, Absent -> None
      | (Present _ as user_id), (Present _ as permissions) ->
        Some { request with user_id; permissions };;
    (*
    val check_completeness : incomplete logon_request -> 'a logon_request option =
      <fun>
      *)
    
    (* --- this is a restricted return type (return value explicitly returns a complete request):*)
    let check_completeness request : complete logon_request option =
      match request.user_id, request.permissions with
      | Absent, _ | _, Absent -> None
      | (Present _ as user_id), (Present _ as permissions) ->
        Some { request with user_id; permissions };;
    (*
    val check_completeness :
      incomplete logon_request -> complete logon_request option = <fun>
      *)
    

    And because of that, we can finally write an authorisation checker:

    1
    2
    3
    4
    
    let authorized (request : complete logon_request) =
      let { user_id = Present user_id; permissions = Present permissions; _ } = request in
      Permissions.check permissions user_id;;
    (* val authorized : complete logon_request -> bool = <fun> *)
    

    We realise that most of the time, this kind of type narrowing is NOT worth the complexity of setting it up unless our FSM is sufficiently complex – in which case cutting down the possibilities that our code needs to handle / contemplate can make a big difference to the comprehensibility and correctness of the result.

  • type distinctness and abstraction

    Our completeness markers only needed to be distinct types; we didn’t need their definitions to look obviously different. This works because OCaml’s variant types are nominal.

    Variant, Record and Abstract types are nominally typed in OCaml.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    
    (* older version: *)
    type incomplete = Incomplete (* Incomplete is a tag for the variant incomplete *)
    type complete = Complete
    
    (* this works *)
    type incomplete = Z (* Z is a tag of a variant type *)
    type complete = Z
    
    (* here, Z is the constructor for the variant type *)
    let i = (Z : incomplete) and c = (Z : complete);;
    
    (* we can narrow a pattern match using types as indices: *)
    type ('a, _) coption =
      | Absent : (_, incomplete) coption
      | Present : 'a -> ('a, _) coption
    
    (* since we mark complete case only, we only need to contemplate the Present case *)
    let assume_complete (coption : (_,complete) coption) =
      match coption with
      | Present x -> x;;
    (* val assume_complete : ('a, complete) coption -> 'a = <fun> *)
    

    In some ways this type-directed narrowing reminds me of the way we overload names (with different patterns in the parameters) to handle different cases in different functions.

    Possible Type Distinctness problem:

    The way we expose these types through an interface can cause OCaml to lose track of the distinctness of the types in question. When creating types to act as abstract markers for the type parameter of a GADT, you should choose definitions that make the distinctness of those types clear, and you should expose those definitions in your mlis (a sketch of this appears after the examples below).

    Distinctness problems that give rise to non-exhaustiveness:

    1. In the first example below, we hide the implementation of the marker type so there’s a distinctness problem.

    2. second case, we see that the exposed types are NOT definitively different (they would have been if we defined them as variants with differently named tags [like originally])

      this is especially so because types appearing differently in an interface (the sig) may turn out to be the same in the implementation (see 3rd attempt)

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    
    (* ===== 1: the definition of complete and incomplete is hidden *)
    module M : sig
      type incomplete (* -- this is what hides the definition*)
      type complete
    end = struct
      type incomplete = Z
      type complete = Z
    end
    include M
    
    type ('a, _) coption =
      | Absent : (_, incomplete) coption
      | Present : 'a -> ('a, _) coption
    
    (* this is no longer exhaustive, and it expects the Absent case to be matched *)
    let assume_complete (coption : (_,complete) coption) =
      match coption with
      | Present x -> x;;
    (*
    Lines 2-3, characters 5-21:
    
    Warning 8 [partial-match]: this pattern-matching is not exhaustive.
    Here is an example of a case that is not matched:
    Absent
    val assume_complete : ('a, complete) coption -> 'a = <fun>
     *)
    
    (* =============== Attempt 2: exposing the implementation of the marker types ============ *)
    module M : sig
      type incomplete = Z (* -- exposed the definition of the marker types (the variant tag) *)
      type complete = Z
    end = struct
      type incomplete = Z
      type complete = Z
    end
    include M
    
    type ('a, _) coption =
      | Absent : (_, incomplete) coption
      | Present : 'a -> ('a, _) coption
    
    (* Still not exhaustive: *)
    let assume_complete (coption : (_,complete) coption) =
      match coption with
      | Present x -> x;;
    (*
    Lines 2-3, characters 5-21:
    Warning 8 [partial-match]: this pattern-matching is not exhaustive.
    Here is an example of a case that is not matched:
    Absent
    val assume_complete : ('a, complete) coption -> 'a = <fun>
    *)
    
    
    (*===== 3: types appearing as distinct in the interface (the sig) may have the same implementation: *)
    module M : sig (*-- sig is the interface*)
      type incomplete = Z
      type complete = Z
    end = struct
      type incomplete = Z
      type complete = incomplete = Z (* same implementation *)
    end
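    Based on the advice above, a sketch (my own, not part of the book's example sequence) of exposing the marker definitions with visibly distinct tags; since complete and incomplete can then no longer be equated, the narrowing should stay exhaustive:

    module M : sig
      type incomplete = Incomplete (* definitions exposed, visibly distinct *)
      type complete = Complete
    end = struct
      type incomplete = Incomplete
      type complete = Complete
    end
    include M

    type ('a, _) coption =
      | Absent : (_, incomplete) coption
      | Present : 'a -> ('a, _) coption

    (* exhaustive again: only the Present case needs to be handled *)
    let assume_complete (coption : (_, complete) coption) =
      match coption with
      | Present x -> x;;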
    
  • narrowing without GADTs (using uninhabited types instead)

    OCaml can eliminate impossible cases from ordinary variants too, not just for GADTs.

    The idea is (just like with GADTs) to demonstrate that the case in question is impossible at the type level. We need an uninhabited type, i.e. a type with no associated values. One can be created as a variant with no tags; Base offers Nothing.t as a standard uninhabited type.

    PL_DESIGN: This allows us to have refutation cases:

    the period in the final case tells the compiler that we believe this case can never be reached, and OCaml will verify that it’s true. In some simple cases, however, the compiler can automatically add the refutation case for you, so you don’t need to write it out explicitly.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    
    (* custom uninhabited type *)
    type nothing = | (*nothing is a variant with no tags*)
    
    open Stdio;;
    
    (* here, the Error case contains an uninhabited type (Nothing.t)
     , so we can refute that case in the branches.
     *)
    let print_result (x : (int, Nothing.t) Result.t) =
      match x with
      | Ok x -> printf "%d\n" x
      | Error _ -> .;;
    (* val print_result : (int, Nothing.t) result -> unit = <fun> *)
    
    (* actually this function is pretty simple, the compiler can add the refutation case automatically for us *)
    let print_result (x : (int, Nothing.t) Result.t) =
      match x with
      | Ok x -> printf "%d\n" x;;
    (* val print_result : (int, Nothing.t) result -> unit = <fun> *)
    
    • Use-case for uninhabited types

      When using a highly configurable library that supports multiple modes of use, we may not need all of them for a given application (e.g. Async’s RPC lib).

      In this example, we see narrowing being applied to code that isn’t designed with narrowing in mind.

      Async RPCs have a State_rpc flavor of interaction, which is parameterised by 4 types for different kinds of data:

      1. query: initial client request
      2. state: initial snapshot from server
      3. update: sequence of updates to that snapshot
      4. error: error for stream termination

      Imagine we want to use State_rpc without needing to terminate the stream with a custom error. We could consider 2 options:

      1. instantiate State_rpc using the unit type for the error type \(\implies\) we’d still need to handle the error case in the dispatching code

         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
        11
        12
        13
        14
        15
        16
        17
        18
        
           open Core
           open Async
           let rpc =
             Rpc.State_rpc.create
               ~name:"int-map"
               ~version:1
               ~bin_query:[%bin_type_class: unit]
               ~bin_state:[%bin_type_class: int Map.M(String).t]
               ~bin_update:[%bin_type_class: int Map.M(String).t]
               ~bin_error:[%bin_type_class: unit] (*-- using unit type here for error*)
               ()
        
        
           let dispatch conn =
             match%bind Rpc.State_rpc.dispatch rpc conn () >>| ok_exn with
             | Ok (initial_state, updates, _) -> handle_state_changes initial_state updates
             | Error () -> failwith "this is not supposed to happen";;
           (* val dispatch : Rpc.Connection.t -> unit Deferred.t = <fun> *)
        
      2. use an uninhabited type for the error \(\implies\) the cases are narrowed down and the error case is simply ruled out.

         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
        11
        12
        13
        14
        15
        
           let rpc =
             Rpc.State_rpc.create
               ~name:"foo"
               ~version:1
               ~bin_query:[%bin_type_class: unit]
               ~bin_state:[%bin_type_class: int Map.M(String).t]
               ~bin_update:[%bin_type_class: int Map.M(String).t]
               ~bin_error:[%bin_type_class: Nothing.t]  (*--- using Nothing.t (uninhabited type) *)
               ()
        
        
           let dispatch conn =
             match%bind Rpc.State_rpc.dispatch rpc conn () >>| ok_exn with
             | Ok (initial_state, updates, _) -> handle_state_changes initial_state updates;;
           (* val dispatch : Rpc.Connection.t -> unit Deferred.t = <fun> *)
        

Limitations of GADTs

We saw the utility and the complexity. Now it’s time to see some limitations and how to navigate around them.

Limitation: Or-patterns

GADTs don’t work well with or-patterns.

Or-patterns won’t work when merging cases if we end up using the type information discovered during the pattern match (the concretisation of the locally abstract type).

This is annoying but not a deal-breaker: we can extract the shared logic into a function and call that function from each of the arms we wanted to merge (see the sketch after the code below).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
open Core
module Source_kind = struct
  type _ t =
    | Filename : string t
    | Host_and_port : Host_and_port.t t
    | Raw_data : string t
end

let source_to_sexp (type a) (kind : a Source_kind.t) (source : a) =
  match kind with
  | Filename -> String.sexp_of_t source
  | Host_and_port -> Host_and_port.sexp_of_t source
  | Raw_data -> String.sexp_of_t source;;
(*-- NOTE: type a here is a locally abstract type*)

(* val source_to_sexp : 'a Source_kind.t -> 'a -> Sexp.t = <fun> *)


(* -- this won't work because it relies on the concretisation of 'a (even if we as humans know that a will only ever be string based on the Source_kind) *)
let source_to_sexp (type a) (kind : a Source_kind.t) (source : a) =
  match kind with
  | Filename | Raw_data -> String.sexp_of_t source
  | Host_and_port -> Host_and_port.sexp_of_t source;;
(*
Line 3, characters 47-53:
Error: This expression has type a but an expression was expected of type
         string
*)

(* Exceptional case: when we don't use the concretised form of 'a then we can still use or-patterns. *)
let requires_io (type a) (kind : a Source_kind.t) =
  match kind with
  | Filename | Host_and_port -> true
  | Raw_data -> false;;
(* val requires_io : 'a Source_kind.t -> bool = <fun> *)
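The workaround mentioned above, sketched against the same Source_kind definitions: factor the shared logic into a helper and call it from each arm instead of merging the arms.

(* each string-typed arm calls the same helper, so no or-pattern is needed *)
let string_source_to_sexp (source : string) = String.sexp_of_t source

let source_to_sexp (type a) (kind : a Source_kind.t) (source : a) =
  match kind with
  | Filename -> string_source_to_sexp source
  | Raw_data -> string_source_to_sexp source
  | Host_and_port -> Host_and_port.sexp_of_t source;;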

TODO Deriving Serializers

NOTE: I skipped this, my brain got fried. 🧠

Serialisers like [@@deriving sexp] won’t work with GADTs.

LIMITATIONS: There’s a dependency between the value of an argument and the type of the returned value, which is something that OCaml’s type system can’t express. Therefore, in the code below, it’s not possible for the generated converters (the of_sexp direction in particular) to produce a value whose type parameter depends on the data being parsed, so the derivation fails to compile:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
type _ number_kind =
  | Int : int number_kind
  | Float : float number_kind
[@@deriving sexp];;

(*
Lines 1-4, characters 1-20:
Error: This expression has type int number_kind
       but an expression was expected of type a__007_ number_kind
       Type int is not compatible with type a__007_
*)

Chapter 10: Functors

We shift our focus back to OCaml’s modules.

Beyond the purpose of organising code into units with specified interfaces, we wish to see how they are useful in building generic code and structuring large-scale systems. Functors are essential for this.

Functors: functions from modules to modules, useful to structure:

  1. Dependency Injection

    • components of a system can become swappable
    • useful for mocking for testing and simulation
  2. Auto extension of modules

    • standardised way of extending modules with new functionality
    • we can write logic once and apply to different types
  3. Instantiating modules with state

    • functors allow us to automate the construction of modules that have their own separate and independent mutable state (see the sketch after this list)

Functors are powerful for modularising code.
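A minimal sketch of point 3 above: a generative functor whose every application mints fresh, independent mutable state (the module names here are made up):

module Make_counter () = struct
  let count = ref 0
  let next () =
    count := !count + 1;
    !count
end

module A = Make_counter ()
module B = Make_counter ()
(* A.next () = 1, A.next () = 2, B.next () = 1 -- independent counters *)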

Costs of using Functors:

  1. syntactically heavyweight compared to other language features
  2. there are tricky issues you need to understand to use them properly – sharing constraints and destructive substitution being two of them

For small and simple programs, heavy use of functors is probably a mistake. But as your programs get more complicated and you need more effective modular architectures, functors become a highly valuable tool.

A trivial example

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
open Base;;
module type X_int = sig val x : int end;;
(* module type X_int = sig val x : int end *)

(*-- this is the functor
  - X_int constraints the module returned by the functor
  - X_int constraints the argument to the functor
 *)
module Increment (M : X_int) : X_int = struct
  let x = M.x + 1
end;;
(* module Increment : functor (M : X_int) -> X_int *)

Functor syntax is a little extra:

  1. require explicit type annotations (module type annotations), which ordinary functions don’t

    • only the type on the input is mandatory
  2. Nice to haves in practice:

    • should constrain the module returned by the functor

      If we don’t, then the output type is inferred more specifically instead of being referenced by the named signature (X_int)

      1
      2
      3
      4
      5
      
       (* note the output type: it's written explicitly instead of being referenced by the named signature *)
           module Increment (M : X_int) = struct
             let x = M.x + 1
           end;;
           (* module Increment : functor (M : X_int) -> sig val x : int end *)
      
    • should also use an mli even if it’s not mandatory

  3. Signature matching is what determines interoperability between functors and other modules. This satisfiability of signatures works similarly to OOP languages (Liskov Substitution Principle – between ‘Parent/child types’).

    • we can apply Increment to any module whose signature satisfies interface X_int. This is similar to how contents of an ml file must satisfy the mli.

      • Substitution loss: in the example below, Three_and_more satisfies X_int, so Increment can be applied to it; but its extra field (y) is ignored by Increment, which is why Four doesn’t have the field y.
    • module type can omit some information available in the module, either by dropping fields or leaving some fields abstract

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    
          (* satisfies X_int => can be used by Increment *)
          module Three = struct let x = 3 end;;
          (* module Three : sig val x : int end *)
          module Four = Increment(Three);;
          (* module Four : sig val x : int end *)
          Four.x - Three.x;;
          (* - : int = 1 *)
    
          (* extending the interface (and it still matches) *)
          module Three_and_more = struct
            let x = 3
            let y = "three"
          end;;
          (* module Three_and_more : sig val x : int val y : string end *)
          module Four = Increment(Three_and_more);;
          (* module Four : sig val x : int end *)
          Four.x - Three_and_more.x;;
          (* - : int = 1 *)
    

A bigger example: computing with intervals

We shall use functors to define a generic interval library that can be used with any type that supports a total ordering on the underlying set.

Working with generic intervals usually involves the usual operations:

  • emptiness check
  • containment check
  • intersection of intervals…
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
(* ===== 1. define the interface based on what kind of info we need from the endpoints of an interval
   - it's essentially a comparator, with a type and a compare function.
   - the compare function uses the old-school int-output comparison:
      * 0 if equal
      * 1 if first arg > second arg
      * -1 if first arg < second arg

 *)
module type Comparable = sig
  type t
  val compare : t -> t -> int
end;;
(* module type Comparable = sig type t val compare : t -> t -> int end *)


(* ===== 2: define the functor for making an interval *)
module Make_interval(Endpoint : Comparable) = struct

  (** Represent the interval as a variant type -- either Interval or Empty  *)
  type t = | Interval of Endpoint.t * Endpoint.t
           | Empty

  (** [create low high] creates a new interval from [low] to
      [high].  If [low > high], then the interval is empty *)
  let create low high =
    if Endpoint.compare low high > 0 then Empty
    else Interval (low,high)

  (** Returns true iff the interval is empty *)
  let is_empty = function
    | Empty -> true
    | Interval _ -> false

  (** [contains t x] returns true iff [x] is contained in the
      interval [t] *)
  let contains t x =
    match t with
    | Empty -> false
    | Interval (l,h) ->
      Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0

  (** [intersect t1 t2] returns the intersection of the two input
      intervals *)
  let intersect t1 t2 =
    let min x y = if Endpoint.compare x y <= 0 then x else y in
    let max x y = if Endpoint.compare x y >= 0 then x else y in
    match t1,t2 with
    | Empty, _ | _, Empty -> Empty
    | Interval (l1,h1), Interval (l2,h2) ->
      create (max l1 l2) (min h1 h2)
end;;
(*
  module Make_interval :
  functor (Endpoint : Comparable) ->
    sig
      type t = Interval of Endpoint.t * Endpoint.t | Empty
      val create : Endpoint.t -> Endpoint.t -> t
      val is_empty : t -> bool
      val contains : t -> Endpoint.t -> bool
      val intersect : t -> t -> t
    end
 *)

(* ============= 3: applying the functor (anon module example) *)
module Int_interval =
  Make_interval(struct (*NOTE: this functor input is an anonymous Module*)
    type t = int
    let compare = Int.compare
end);;
(*
module Int_interval :
  sig
    type t = Interval of int * int | Empty
    val create : int -> int -> t
    val is_empty : t -> bool
    val contains : t -> int -> bool
    val intersect : t -> t -> t
  end
*)

(* ======== 4: Directly using aligned library modules: works because these modules satisfy an extended interface for Comparable already *)
module Int_interval = Make_interval(Int);;
(*
module Int_interval :
  sig
    type t = Make_interval(Base.Int).t = Interval of int * int | Empty
    val create : int -> int -> t
    val is_empty : t -> bool
    val contains : t -> int -> bool
    val intersect : t -> t -> t
  end
*)
module String_interval = Make_interval(String);;
(*
module String_interval :
  sig
    type t =
      Make_interval(Base.String).t =
        Interval of string * string
      | Empty
    val create : string -> string -> t
    val is_empty : t -> bool
    val contains : t -> string -> bool
    val intersect : t -> t -> t
  end
*)

(* ============== 5: the ease of modifying the extensions thanks to functors: *)
module Rev_int_interval =
  Make_interval(struct
    type t = int
    let compare x y = Int.compare y x (* reversed comparison *)
end);;
(* NOTE:
   since this is nominally typed, Rev_int_interval.t is different from Int_interval.t even if the structure is exactly the same.
   This is good because we treat them as having semantically different meanings, and hence they shouldn't be treated as the same.
 *)
(*
module Rev_int_interval :
  sig
    type t = Interval of int * int | Empty
    val create : int -> int -> t
    val is_empty : t -> bool
    val contains : t -> int -> bool
    val intersect : t -> t -> t
  end
  *)

Notes:

  1. the Functor’s body has implementations of useful functions/primitives to interact with Intervals

  2. in the examples where we directly applied the functor to aligned libraries (Int, String)… there are some learnings:

    1. the use of standardised interfaces (e.g. comparable) is what makes the codebase consistent and predictable
    2. also helps us use functors more easily.
  3. the functor is NOT abstract, and that’s a problem because some of the invariants (like low <= high) are enforced only by a creator function that can be bypassed.

    create acts like a factory that performs dynamic checks that the input params are valid (low <= high).

    However, it’s possible to bypass it:

    1
    2
    3
    4
    5
    6
    7
    8
    
       Int_interval.is_empty (* going through create *)
       (Int_interval.create 4 3);;
       (* - : bool = true *)
    
    
       Int_interval.is_empty (* bypassing create by using the Interval constructor*)
       (Int_interval.Interval (4,3));;
       (* - : bool = false *)
    

Making the functor abstract

We can make the functor abstract by restricting the output of Make_interval with an interface.

In the code below, we get an abstract functor, but the type endpoint is not exposed and so we can’t construct an interval yet.

However, we see the following happen:

  1. we have a reference to the endpoint type within the new interface

  2. the functor returns this interface which is more restrictive than the anon one before.

  3. in the interface, we see the use of an abstract type endpoint which intentionally creates an abstraction barrier: users of this module will not know what type endpoint is concretised to since that type info is NOT exposed via the interface.

    Typically this is a good thing because:

    1. prevents dependencies on implementation details

    2. creates ADTs safely

    3. allows different internal representations later without breaking callers (e.g. by concretising it to a particular type and losing its abstract typing.)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
module type Interval_intf = sig
  type t
  type endpoint (*-- this is an abstract type from the POV of consumers of this module, it gives us a way of referring to the endpoint type*)
  val create : endpoint -> endpoint -> t
  val is_empty : t -> bool
  val contains : t -> endpoint -> bool
  val intersect : t -> t -> t
end;;
(*
module type Interval_intf =
  sig
    type t
    type endpoint
    val create : endpoint -> endpoint -> t
    val is_empty : t -> bool
    val contains : t -> endpoint -> bool
    val intersect : t -> t -> t
  end
*)

(* we can re-write our Make_interval functor with this interface *)
module Make_interval(Endpoint : Comparable) : Interval_intf = struct
  type endpoint = Endpoint.t (* we keep a handle to the endpoint type*)
  type t = | Interval of Endpoint.t * Endpoint.t
           | Empty

  (** [create low high] creates a new interval from [low] to
      [high].  If [low > high], then the interval is empty *)
  let create low high =
    if Endpoint.compare low high > 0 then Empty
    else Interval (low,high)

  (** Returns true iff the interval is empty *)
  let is_empty = function
    | Empty -> true
    | Interval _ -> false

  (** [contains t x] returns true iff [x] is contained in the
      interval [t] *)
  let contains t x =
    match t with
    | Empty -> false
    | Interval (l,h) ->
      Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0

  (** [intersect t1 t2] returns the intersection of the two input
      intervals *)
  let intersect t1 t2 =
    let min x y = if Endpoint.compare x y <= 0 then x else y in
    let max x y = if Endpoint.compare x y >= 0 then x else y in
    match t1,t2 with
    | Empty, _ | _, Empty -> Empty
    | Interval (l1,h1), Interval (l2,h2) ->
      create (max l1 l2) (min h1 h2)
end;;
(* module Make_interval : functor (Endpoint : Comparable) -> Interval_intf *)

(* --- since we haven't exposed the type =endpoint=, we can't construct an interval anymore: *)
module Int_interval = Make_interval(Int);;
(*
module Int_interval :
  sig
    type t = Make_interval(Base.Int).t
    type endpoint = Make_interval(Base.Int).endpoint
    val create : endpoint -> endpoint -> t
    val is_empty : t -> bool
    val contains : t -> endpoint -> bool
    val intersect : t -> t -> t
  end
*)
Int_interval.create 3 4;;
(*
Line 1, characters 21-22:
Error: This expression has type int but an expression was expected of type
         Int_interval.endpoint
*)

By “not exposed” we mean that the Int_interval.endpoint is still abstract. If we were to expose it, then we can say that endpoint is equal to Int.t (or generally speaking, Endpoint.t where Endpoint is the argument to the functor). A type is not exposed when it appears abstract in a module’s interface — that is, its name is visible, but its concrete implementation (what it’s equal to) is hidden. This happens because the module’s signature does not include a manifest equality for the type.

In such cases, clients cannot create, deconstruct, or assume equality between that type and another; they can only manipulate it through the operations provided by the module.

sharing constraints

Other than explicitly tying down the type in the interface, we can expose the abstract type via a sharing constraint, which tells the compiler to expose the fact that a given type is equal to some other type. Syntax: <Module_type> with type <type1> = <type1'> and type <type2> = <type2'>

This gives a new signature, modified so that the type1 defined within the module is exposed as equal to type1', whose definition lies outside of it.

  • doing the sharing @ the interface

    We can make a specialised version of Interval_intf for ints, Int_interval_intf, by adding a sharing constraint on the endpoint type:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    
    
    (* the original, abstract interface without the constraint on the type:  *)
    module type Interval_intf = sig
      type t
      type endpoint (*-- this is an abstract type from the POV of consumers of this module, it gives us a way of referring to the endpoint type*)
      val create : endpoint -> endpoint -> t
      val is_empty : t -> bool
      val contains : t -> endpoint -> bool
      val intersect : t -> t -> t
    end;;
    (*
    module type Interval_intf =
      sig
        type t
        type endpoint
        val create : endpoint -> endpoint -> t
        val is_empty : t -> bool
        val contains : t -> endpoint -> bool
        val intersect : t -> t -> t
      end
    *)
    
    (* using a sharing constraint to create a specialized version of Interval_intf: *)
    module type Int_interval_intf =
    Interval_intf with type endpoint = int;;
    (*
    module type Int_interval_intf =
      sig
        type t
        type endpoint = int
        val create : endpoint -> endpoint -> t
        val is_empty : t -> bool
        val contains : t -> endpoint -> bool
        val intersect : t -> t -> t
      end
    *)
    
  • doing the sharing @ the functor

    We can also use the sharing constraints in the context of a functor.

    Common use case: when we want to expose that the following are related:

    • some of the types of the module being generated by the functor

      So this is the type endpoint in the new module (functor output)

    • some of the types of the module being fed to the functor

      And this is the type Endpoint.t from the module Endpoint (which is the functor argument.)

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    
    module Make_interval(Endpoint : Comparable)
      : (Interval_intf with type endpoint = Endpoint.t)
    = struct
    
      type endpoint = Endpoint.t
      type t = | Interval of Endpoint.t * Endpoint.t
               | Empty
    
      (** [create low high] creates a new interval from [low] to
          [high].  If [low > high], then the interval is empty *)
      let create low high =
        if Endpoint.compare low high > 0 then Empty
        else Interval (low,high)
    
      (** Returns true iff the interval is empty *)
      let is_empty = function
        | Empty -> true
        | Interval _ -> false
    
      (** [contains t x] returns true iff [x] is contained in the
          interval [t] *)
      let contains t x =
        match t with
        | Empty -> false
        | Interval (l,h) ->
          Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0
    
      (** [intersect t1 t2] returns the intersection of the two input
          intervals *)
      let intersect t1 t2 =
        let min x y = if Endpoint.compare x y <= 0 then x else y in
        let max x y = if Endpoint.compare x y >= 0 then x else y in
        match t1,t2 with
        | Empty, _ | _, Empty -> Empty
        | Interval (l1,h1), Interval (l2,h2) ->
          create (max l1 l2) (min h1 h2)
    
    end;;
    (*
    module Make_interval :
      functor (Endpoint : Comparable) ->
        sig
          type t
          type endpoint = Endpoint.t
          val create : endpoint -> endpoint -> t
          val is_empty : t -> bool
          val contains : t -> endpoint -> bool
          val intersect : t -> t -> t
        end
        *)
    
    (* ====== now we can do things that require the =endpoint= type to be exposed e.g. constructing intervals: *)
    module Int_interval = Make_interval(Int);;
    (*
    module Int_interval :
      sig
        type t = Make_interval(Base.Int).t
        type endpoint = int
        val create : endpoint -> endpoint -> t
        val is_empty : t -> bool
        val contains : t -> endpoint -> bool
        val intersect : t -> t -> t
      end
      *)
    let i = Int_interval.create 3 4;;
    (* val i : Int_interval.t = <abstr> *)
    Int_interval.contains i 5;;
    (* - : bool = false *)
    

an alternative: destructive substitution

Constraint sharing works but it’s a little ugly.

We can do better using destructive substitution. The name is a bit of a misnomer: there’s nothing destructive about it; it’s just a way of creating a new signature by transforming an existing one.

  • destructive substitution of the interface signature

    Here, we modify the signature of Interval_intf by replacing endpoint with a concrete type (int in this example) everywhere, which deletes the definition of endpoint from the signature.

    Syntax looks similar to the sharing constraint but we use := instead of =.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    
    (* ==== using destructive substitution *)
    module type Int_interval_intf =
    Interval_intf with type endpoint := int;;
    (*
    module type Int_interval_intf =
      sig
        type t
        val create : int -> int -> t
        val is_empty : t -> bool
        val contains : t -> int -> bool
        val intersect : t -> t -> t
      end
    *)
    
    (* using a sharing constraint to create a specialized version of Interval_intf: *)
    module type Int_interval_intf =
    Interval_intf with type endpoint = int;;
    (*
    module type Int_interval_intf =
      sig
        type t
        type endpoint = int
        val create : endpoint -> endpoint -> t
        val is_empty : t -> bool
        val contains : t -> endpoint -> bool
        val intersect : t -> t -> t
      end
    *)
    
  • destructive substitution in the context of the functor

    Similar to sharing constraints, we can do destructive substitution in the context of the functor. A little different from the substitution at the interface level: here the type t stays abstract in the interface while the endpoint type is exposed. This lets us create values of type Int_interval.t via the creation function, but not via the constructors directly (which would have allowed the module’s invariants to be violated).

    module Make_interval(Endpoint : Comparable)
      : Interval_intf with type endpoint := Endpoint.t =
    struct
      (* type t stays abstract, while the endpoint type is exposed as Endpoint.t; clients can build values of type Int_interval.t only via [create] *)
      type t = | Interval of Endpoint.t * Endpoint.t
               | Empty
    
      (** [create low high] creates a new interval from [low] to
          [high].  If [low > high], then the interval is empty *)
      let create low high =
        if Endpoint.compare low high > 0 then Empty
        else Interval (low,high)
    
      (** Returns true iff the interval is empty *)
      let is_empty = function
        | Empty -> true
        | Interval _ -> false
    
      (** [contains t x] returns true iff [x] is contained in the
          interval [t] *)
      let contains t x =
        match t with
        | Empty -> false
        | Interval (l,h) ->
          Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0
    
      (** [intersect t1 t2] returns the intersection of the two input
          intervals *)
      let intersect t1 t2 =
        let min x y = if Endpoint.compare x y <= 0 then x else y in
        let max x y = if Endpoint.compare x y >= 0 then x else y in
        match t1,t2 with
        | Empty, _ | _, Empty -> Empty
        | Interval (l1,h1), Interval (l2,h2) ->
          create (max l1 l2) (min h1 h2)
    
    end;;
    (*
    module Make_interval :
      functor (Endpoint : Comparable) ->
        sig
          type t
          val create : Endpoint.t -> Endpoint.t -> t
          val is_empty : t -> bool
          val contains : t -> Endpoint.t -> bool
          val intersect : t -> t -> t
        end
    *)
    
    
    (* ======= applying them: *)
    module Int_interval = Make_interval(Int);;
    (*
    module Int_interval :
      sig
        type t = Make_interval(Base.Int).t
        val create : int -> int -> t
        val is_empty : t -> bool
        val contains : t -> int -> bool
        val intersect : t -> t -> t
      end
      *)
    Int_interval.is_empty
    (Int_interval.create 3 4);;
    (* - : bool = false *)
    
    (* attempting to bypass the creation function and using the constructor directly will error out *)
    Int_interval.is_empty (Int_interval.Interval (4,3));;
    (*
    Line 1, characters 24-45:
    Error: Unbound constructor Int_interval.Interval
    *)
    

    There’s no need to define the endpoint type alias in the body of the module because the endpoint type is gone from the interface.

Using Multiple Interfaces

We wish to make our interval module serialisable, i.e. able to read and write intervals as a stream of bytes.

Our intermediate format will be s-expressions (sexps): parenthesised expressions whose atoms are strings. This is the common serialisation format used by Base.

To generate the converter functions (serialise and deserialise), we can annotate a type with [@@deriving sexp].

We can't apply [@@deriving sexp] to type t inside the Make_interval functor until we ensure that Endpoint.t satisfies Sexpable.S, a signature that looks like this:

sig
  type t
  val sexp_of_t : t -> Sexp.t
  val t_of_sexp : Sexp.t -> t
end
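
For a concrete sense of what the annotation generates, here is a minimal sketch (the point type and values below are purely illustrative, and it assumes Base, Stdio and the ppx_sexp_conv deriver are available). Annotating a type named point produces sexp_of_point and point_of_sexp:

open Base
open Stdio

(* [@@deriving sexp] generates [sexp_of_point] and [point_of_sexp] *)
type point = { x : int; y : int } [@@deriving sexp]

let () =
  let s = sexp_of_point { x = 1; y = 2 } in
  print_s s;                       (* ((x 1) (y 2)) *)
  let p = point_of_sexp s in
  printf "%d %d\n" p.x p.y         (* 1 2 *)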

We have two choices:

  1. We can define a combined signature that includes both Interval_intf and Sexpable.S. For the Sexpable.S include, we use destructive substitution on the type t so that it is identified with the t from Interval_intf rather than clashing with it.

       module type Interval_intf_with_sexp = sig
         include Interval_intf
         include Sexpable.S with type t := t
       end;;
       (*
       module type Interval_intf_with_sexp =
         sig
           type t
           type endpoint
           val create : endpoint -> endpoint -> t
           val is_empty : t -> bool
           val contains : t -> endpoint -> bool
           val intersect : t -> t -> t
           val t_of_sexp : Sexp.t -> t
           val sexp_of_t : t -> Sexp.t
         end
         *)
    
  2. We can declare type t within the new signature and apply destructive substitution to all of the included interfaces. This is cleaner when combining multiple interfaces, because it reflects that all of the signatures are being handled equivalently:

    • Interval_intf

    • Sexpable.S

       module type Interval_intf_with_sexp = sig
         type t
         include Interval_intf with type t := t
         include Sexpable.S with type t := t
       end;;
       (*
       module type Interval_intf_with_sexp =
         sig
           type t
           type endpoint
           val create : endpoint -> endpoint -> t
           val is_empty : t -> bool
           val contains : t -> endpoint -> bool
           val intersect : t -> t -> t
           val t_of_sexp : Sexp.t -> t
           val sexp_of_t : t -> Sexp.t
         end
         *)
    
       (* Now, we can update our functor: *)
       module Make_interval(Endpoint : sig
           type t
           include Comparable with type t := t
           include Sexpable.S with type t := t
         end)
         : (Interval_intf_with_sexp with type endpoint := Endpoint.t)
       = struct
         type t = | Interval of Endpoint.t * Endpoint.t
                  | Empty
         [@@deriving sexp]
    
         (** [create low high] creates a new interval from [low] to
             [high].  If [low > high], then the interval is empty *)
         let create low high =
           if Endpoint.compare low high > 0 then Empty
           else Interval (low,high)
    
         (* NOTE: we wrap the autogenerated [t_of_sexp] so that deserialisation
            also enforces the invariants of the data structure *)
         let t_of_sexp sexp =
           match t_of_sexp sexp with
           | Empty -> Empty
           | Interval (x,y) -> create x y (*-- creator function is what enforces our invariants.*)
    
         (** Returns true iff the interval is empty *)
         let is_empty = function
           | Empty -> true
           | Interval _ -> false
    
         (** [contains t x] returns true iff [x] is contained in the
             interval [t] *)
         let contains t x =
           match t with
           | Empty -> false
           | Interval (l,h) ->
             Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0
    
         (** [intersect t1 t2] returns the intersection of the two input
             intervals *)
         let intersect t1 t2 =
           let min x y = if Endpoint.compare x y <= 0 then x else y in
           let max x y = if Endpoint.compare x y >= 0 then x else y in
           match t1,t2 with
           | Empty, _ | _, Empty -> Empty
           | Interval (l1,h1), Interval (l2,h2) ->
             create (max l1 l2) (min h1 h2)
       end;;
       (*
       module Make_interval :
         functor
           (Endpoint : sig
                         type t
                         val compare : t -> t -> int
                         val t_of_sexp : Sexp.t -> t
                         val sexp_of_t : t -> Sexp.t
                       end)
           ->
           sig
             type t
             val create : Endpoint.t -> Endpoint.t -> t
             val is_empty : t -> bool
             val contains : t -> Endpoint.t -> bool
             val intersect : t -> t -> t
             val t_of_sexp : Sexp.t -> t
             val sexp_of_t : t -> Sexp.t
           end
       *)
    
       (* ==== USAGE:  *)
       module Int_interval = Make_interval(Int);;
       (*
       module Int_interval :
         sig
           type t = Make_interval(Base.Int).t
           val create : int -> int -> t
           val is_empty : t -> bool
           val contains : t -> int -> bool
           val intersect : t -> t -> t
           val t_of_sexp : Sexp.t -> t
           val sexp_of_t : t -> Sexp.t
         end
         *)
       Int_interval.sexp_of_t (Int_interval.create 3 4);;
       (* - : Sexp.t = (Interval 3 4) *)
       Int_interval.sexp_of_t (Int_interval.create 4 3);;
       (* - : Sexp.t = Empty *)
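
       (* A round-trip sketch (illustrative only, building the sexp with
          Base's constructors directly): because the wrapped [t_of_sexp]
          re-applies [create], a malformed interval read back from a sexp
          collapses to Empty. *)
       Int_interval.is_empty
         (Int_interval.t_of_sexp
            (Sexp.List [ Sexp.Atom "Interval"; Sexp.Atom "4"; Sexp.Atom "3" ]));;
       (* expected: - : bool = true *)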
    

Use-case: Extending modules

We can use functors to generate type-specific functionality for a given module in a standardised way.

Here's a somewhat skeletal interface for a functional version of a FIFO queue.

  • Problem: it's quite skeletal; many useful helper functions are missing from the interface. For a sense of what common container functions look like, see the List module, which provides helpers such as List.for_all, List.exists and List.iter.

type 'a t

val empty : 'a t

(** [enqueue q el] adds [el] to the back of [q] *)
val enqueue : 'a t -> 'a -> 'a t

(** [dequeue q] returns None if the [q] is empty, otherwise returns
    the first element of the queue and the remainder of the queue *)
val dequeue : 'a t -> ('a * 'a t) option

(** Folds over the queue, from front to back *)
val fold : 'a t -> init:'acc -> f:('acc -> 'a -> 'acc) -> 'acc

And here's the implementation.

  • Trick: we maintain an input list and an output list, so we can efficiently enqueue onto the input list and dequeue from the output list.

    When the output list runs empty, the input list is reversed and becomes the new output list.

open Base

type 'a t = 'a list * 'a list (* the abstract type is concretised in the implementation to hold two lists: input and output list*)

let empty = ([],[])

let enqueue (in_list, out_list) x =
  (x :: in_list,out_list)

let dequeue (in_list, out_list) =
  match out_list with
  | hd :: tl -> Some (hd, (in_list, tl))
  | [] ->
    match List.rev in_list with (* use the reverse of the input list if the output list is empty:*)
    | [] -> None
    | hd :: tl -> Some (hd, ([], tl))

let fold (in_list, out_list) ~init ~f =
  let after_out = List.fold ~init ~f out_list in
  List.fold_right ~init:after_out ~f:(fun x acc -> f acc x) in_list
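
A quick usage sketch (assuming the implementation above is compiled as fqueue.ml, so it shows up as the module Fqueue):

let () =
  let q = Fqueue.enqueue (Fqueue.enqueue Fqueue.empty 1) 2 in
  (match Fqueue.dequeue q with
   | Some (front, _rest) -> Stdio.printf "front = %d\n" front (* front = 1 *)
   | None -> Stdio.printf "queue is empty\n");
  Stdio.printf "sum = %d\n" (Fqueue.fold q ~init:0 ~f:( + ))  (* sum = 3 *)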

Improving by using a Foldable module

We don't want to keep re-implementing these helper functions for every container type.

Many of them can be derived from the fold function we already implemented.

Instead of writing each helper by hand, we can define a functor that adds this functionality to any container that provides a fold function. The Foldable module below automates the process of extending such a fold-supporting container.

  • It contains a module signature S, which specifies what a module must provide to support folding.
  • It contains a functor Extend that extends any module matching Foldable.S.

open Base

(* S is the signature that needs to be satisfied by the Functor argument. *)
module type S = sig
  type 'a t
  val fold : 'a t -> init:'acc -> f:('acc -> 'a -> 'acc) -> 'acc
end

(* Extension is the signature of the output module of the functor *)
module type Extension = sig
  type 'a t
  val iter    : 'a t -> f:('a -> unit) -> unit
  val length  : 'a t -> int
  val count   : 'a t -> f:('a -> bool) -> int
  val for_all : 'a t -> f:('a -> bool) -> bool
  val exists  : 'a t -> f:('a -> bool) -> bool
end

(* For extending a Foldable module *)
module Extend(Arg : S)
  : (Extension with type 'a t := 'a Arg.t) = (*destructive substitution with the type within the functor argument*)
struct
  open Arg

  (*-- here are all the functions derived from the fold function.*)
  let iter t ~f =
    fold t ~init:() ~f:(fun () a -> f a) (* just applies the function as a side-effect*)

  let length t =
    fold t ~init:0  ~f:(fun acc _ -> acc + 1) (* counting accumulation*)

  let count t ~f =
    fold t ~init:0  ~f:(fun count x -> count + if f x then 1 else 0)


  exception Short_circuit

  let for_all c ~f =
    try iter c ~f:(fun x -> if not (f x) then raise Short_circuit); true
    with Short_circuit -> false

  let exists c ~f =
    try iter c ~f:(fun x -> if f x then raise Short_circuit); false
    with Short_circuit -> true
end

Now the Extend functor (which takes any module satisfying Foldable.S) can be applied to Fqueue, extending it with the derived helper functions.

To apply the functor, we include Fqueue itself and then include the result of applying Foldable.Extend to Fqueue.

This is the interface for the extended version of Fqueue:

type 'a t
include (module type of Fqueue) with type 'a t := 'a t
include Foldable.Extension with type 'a t := 'a t

This is the application of the functor:

include Fqueue
include Foldable.Extend(Fqueue)
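
A usage sketch of the result (the module name Extended_fqueue is an assumption; it stands for whatever the two snippets above are compiled as, e.g. extended_fqueue.mli / extended_fqueue.ml):

let () =
  let q = Extended_fqueue.(enqueue (enqueue empty 1) 2) in
  Stdio.printf "length  = %d\n" (Extended_fqueue.length q);                      (* 2 *)
  Stdio.printf "for_all = %b\n" (Extended_fqueue.for_all q ~f:(fun x -> x > 0))  (* true *)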

Functors that Base provides

Base provides a similar extension pattern in the form of the following functors:

  • Container.Make: similar in spirit to Foldable.Extend.
  • Comparable.Make: adds support for functionality that depends on a comparison function, including comparison-based containers like maps and sets.
  • Hashable.Make: adds support for hashing-based data structures, including hash tables, hash sets and hash heaps.
  • Monad.Make: for monadic libraries; the functor provides a collection of standard helper functions based on the bind and return operators.
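
As an illustration of the same pattern, here's a hedged sketch of using Comparable.Make (the Id module and its fields are made up for the example; it assumes Base and the compare/sexp_of derivers are available):

open Base

module Id = struct
  module T = struct
    type t = { name : string; version : int } [@@deriving compare, sexp_of]
  end
  include T
  include Comparable.Make (T)  (* adds =, <, between, clamp, a comparator, ... *)
end

let () =
  let a = { Id.name = "base"; version = 1 } in
  let b = { Id.name = "base"; version = 2 } in
  Stdio.printf "%b\n" Id.(a < b)  (* true: compares by name, then version *)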