Chapter 11: First-Class Modules

OCaml essentially has 2 parts to the language:

  1. a core language that is concerned with values and types

    • can’t contain modules or module types

      So we can’t define a variable whose value is a module or a function that takes a module as an argument.

  2. a module language that is concerned with modules and module signatures

    • can contain types and values

However, OCaml provides a language construct to circumvent this stratification: first-class modules – can be created from and converted back to regular modules

Letting modules into the core language is powerful, it increases the range of what we can express and makes it easier to build flexible and modular systems.

Working with First-Class Modules

We’re going to use some toy examples to illustrate the following points:

Creating First-Class Modules (packaging)

Creating first-class modules requires us to package a module up with a signature that satisfies it.

Given a module signature and a module definition, we have an example of how this packaging can be done using the syntax (module <Module> : <Module_type>)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
(* Signature *)
open Base;;
module type X_int = sig val x : int end;;
(* module type X_int = sig val x : int end *)

(* Definition *)
module Three : X_int = struct let x = 3 end;;
(* module Three : X_int *)
Three.x;;
(* - : int = 3 *)

(* packaging it up at a first-class module *)
let three = (module Three : X_int);;
(* val three : (module X_int) = <module> *)

Inference and Anonymous Modules

  • From the packaging syntax, we can module type if it can be inferred
  • We can also do this wrapping if the module is anonymous
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
(* inferrable *)
module Four = struct let x = 4 end;;
(* module Four : sig val x : int end *)
let numbers = [ three; (module Four) ];;
(* val numbers : (module X_int) list = [<module>; <module>] *)


(* anonymous *)
let numbers = [three; (module struct let x = 4 end)];;
(* val numbers : (module X_int) list = [<module>; <module>] *)

Unpacking first-class modules

Since we know how to package a module into a first-class module, we should know how to unpack a module and access its contents using the syntax (val <first_class_module> : <Module_type>)

1
2
3
4
module New_three = (val three : X_int);;
(* module New_three : X_int *)
New_three.x;;
(* - : int = 3 *)

Functions for Manipulating First-Class Modules

Since they are first-class we have some ways to manipulate them (including ordinary functions that consume and create first-class modules):

  1. this function is about conversions between the first-class module (runtime) from the Module language (compile-time).

    If we wish to do module-field projections, then it needs to be in Module language (unpacked) instead of the packed first-class language.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    
       (* consumes a module and returns the int within it *)
       let to_int m =
         let module M = (val m : X_int) in
         M.x;;
       (* val to_int : (module X_int) -> int = <fun> *)
    
       (* consumes 2 packed modules, returns a first-class module *)
       let plus m1 m2 =
         (module struct
            let x = to_int m1 + to_int m2
          end : X_int);;
       (* val plus : (module X_int) -> (module X_int) -> (module X_int) = <fun> *)
    
       let res = plus three (module Four);;
       module Res = (val res : X_int);;
       Res.x;; (* -- this is a field inspection*)
    
       (* this works *)
       module Foo = (val (plus three (module Four: X_int)) :  X_int);;
       Foo.x;; (* works because Foo is a module-identifier so Foo.x is a module-field projection. Projections work only for syntactic modules, not for runtime first-class values.*)
    
       (* this doesn't work:
          (*
          (val (plus three (module Four : X_int)) : X_int)
        *)
        this is not a module, it's a first-class value of a type -- an existential package. We need to unpack the first-class module (runtime) into a module binding (compile-time) then do the field projection.
        *)
    
       (* forcing out an inline example: *)
       let module Foo = (val (plus three (module Four : X_int)) : X_int) in
           Foo.x;;
    
  2. We can pattern match to unpack a first-class module

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    
       (* --- unpacking via pattern-match, more concise *)
       let to_int (module M : X_int) = M.x;;
       (* val to_int : (module X_int) -> int = <fun> *)
    
    
       (* OLD way: consumes a module and returns the int within it *)
       let to_int m =
         let module M = (val m : X_int) in
         M.x;;
       (* val to_int : (module X_int) -> int = <fun> *)
    

    So all in all, we have good expressiveness:

    1
    2
    3
    4
    
       let six = plus three three;;
       (* val six : (module X_int) = <module> *)
       to_int (List.fold ~init:six ~f:plus [three;three]);;
       (* - : int = 12 *)
    

Richer First-Class Modules

Let’s go beyond simple int values and let the first-class modules contain types and functions.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
module type Bumpable = sig
  type t
  val bump : t -> t
end;;
(* module type Bumpable = sig type t val bump : t -> t end *)

(* multiple instances of this module signature: *)
module Int_bumper = struct
  type t = int
  let bump n = n + 1
end;;
(* module Int_bumper : sig type t = int val bump : t -> t end *)
module Float_bumper = struct
  type t = float
  let bump n = n +. 1.
end;;
(* module Float_bumper : sig type t = float val bump : t -> t end *)

(* we can package these as first-class modules: *)
let int_bumper = (module Int_bumper : Bumpable) and float_bumper = (module Float_bumper : Bumpable);;

Exposing Types

Continuing on, int_bumper is fully abstract and so we can’t exploit the fact that the type in question is int. We can’t really do anything with values of Bumper.t.

we can’t do this:

1
2
3
4
5
6
7
let (module Bumper) = int_bumper in
Bumper.bump 3;;
(*
Line 2, characters 15-16:
Error: This expression has type int but an expression was expected of type
         Bumper.t
*)
  • option 1: using a sharing constraint

    To make int_bumper usable, we need to expose that the type Bumpable.t is equal to int for which we can use constraint sharing

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    
    let int_bumper = (module Int_bumper : Bumpable with type t = int);;
    (* val int_bumper : (module Bumpable with type t = int) = <module> *)
    let float_bumper = (module Float_bumper : Bumpable with type t = float);;
    (* val float_bumper : (module Bumpable with type t = float) = <module> *)
    
    
    (* usage works:*)
    let (module Bumper) = int_bumper in
    Bumper.bump 3;;
    (* - : int = 4 *)
    let (module Bumper) = float_bumper in
    Bumper.bump 3.5;;
    (* - : float = 4.5 *)
    
  • option 2: using a locally abstract type to make polymorphic first-class modules

    We can use first-class modules polymorphically, consider this function that takes in 2 args: Bumpable module and list of elements of the same type as type t of the module.

    The type a (pseudoparameter) allows us to use a as a locally abstract type. a then acts like an abstract type within the context of the function.

    In this example, we then use the locally abstract type as part of a sharing constraint that ties the type B.t with the type of the elements of the list passed in.

    This makes the function polymorphic in both the type of the list element and the type Bumpable.t as we see in the usage section of the code example:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    
    let bump_list
          (type a)
          (module Bumper : Bumpable with type t = a)
          (l: a list)
      =
      List.map ~f:Bumper.bump l;;
    
    (*
    val bump_list : (module Bumpable with type t = 'a) -> 'a list -> 'a list =
      <fun>
      *)
    (* === polymorphic usage: *)
    bump_list int_bumper [1;2;3];;
    (* - : int list = [2; 3; 4] *)
    bump_list float_bumper [1.5;2.5;3.5];;
    (* - : float list = [2.5; 3.5; 4.5] *)
    

    Polymorphic first-class modules are important because they allow you to connect the types associated with a first-class module to the types of other values you’re working with.

  • More on locally abstract types

    One of the key properties of locally abstract types is that they’re dealt with as abstract types in the function they’re defined within, but are polymorphic from the outside.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    
    let wrap_in_list (type a) (x:a) = [x];;
    (* val wrap_in_list : 'a -> 'a list = <fun> *) (*<-- "a" is used ina way that is compatible with it being abstract but hte type of the function that is inferred is polymorphic. *)
    
    (* so compiler complains if we try this:  *)
    let double_int (type a) (x:a) = x + x;;
    (*
    Line 1, characters 33-34:
    Error: This expression has type a but an expression was expected of type int
    *)
    

    Locally abstract types are useful because they let us create a fresh type name that can be referenced inside type definitions:

    • can be used to build local modules
    • can be used to wire types to functors in a type-safe way

    see how we create a new first-class module here:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    
    module type Comparable = sig
      type t
      val compare : t -> t -> int
    end;;
    (* module type Comparable = sig type t val compare : t -> t -> int end *)
    let create_comparable (type a) compare =
      (module struct
        type t = a (* this equality is internal to the module *)
        let compare = compare
      end : Comparable with type t = a);; (* -- the locally abstract type exposes the type equality: external to the module.*)
    (*
    val create_comparable :
      ('a -> 'a -> int) -> (module Comparable with type t = 'a) = <fun>
      *)
    let int_compare = create_comparable Int.compare;;
    (* - : (module Comparable with type t = int) = <module> *)
    let float_compare = create_comparable Float.compare;;
    (* - : (module Comparable with type t = float) = <module> *)
    let module I_comp = (val int_compare) in (* seems like the module type is inferrable*)
        I_comp.compare 2 3;;
    
    • Disambiguating “abstract” and “polymorphic”

      ConceptMeaningTypical ContextExample
      Abstract typeA type whose concrete representation is hidden or unknown.Module boundaries or locally abstract types (inside their scope).`type t` in a signature; `(type a)` inside `let`.
      Polymorphic typeA function or value that works for any type.`‘a`, `‘b` quantified type parameters.`let id : ‘a -> ‘a = fun x -> x`.
      Locally Abstract TypeBehaves as abstract inside its definition but is polymorphic outside.GADTs, first-class modules, polymorphic recursion.`let f (type a) (x : a expr) = …`

      The difference is in their levels of generality and different mechanisms:

      1. “Abstract” means “unknown (for now)”

        • for an abstract type, the concrete representation is intentionally hidden

        • it doesn’t mean that the type can vary between calls / doesn’t mean that it can be generalised across calls – it just means that we can’t see inside it

        • compiler treats this like a distinct opaque name, unequal to others even if implementation is int or string underneath

          So type t is an abstract (but one fixed type), known only within M

          1
          2
          3
          4
          5
          6
          7
          
               module M : sig
                 type t           (* abstract -- to users of this module, t can't be treated as int *)
                 val create : unit -> t
               end = struct
                 type t = int     (* concrete inside the module *)
                 let create () = 42
               end
          
      2. “Polymorphic” means “general across types”

        “Polymorphism” == “Generality across many possible types”.

        so a polymorphic value or function will work uniformly over any type. Typically we define this using 'a instead of a.

        1
        2
        
           let id (x : 'a) : 'a = x
           (* id : 'a -> 'a *)
        
      3. “locally abstract types” mix the two up:

        “One of the key properties of locally abstract types is that they’re dealt with as abstract types in the function they’re defined within, but are polymorphic from the outside.”

        When we say let f (type a) (x: a t) = ...:

        1. inside the function f, we wish a to be treated as an abstract type – unknown, fixed and opaque (we can’t assume what a is)

          • inside the definition, a acts like an abstract placeholder (“opaque” – we only know “some type”)
        2. from outside the function f, f is polymorphic in a: it can be used with any concrete instantiation of a.

          • so outside the definition, the function is universally quantified over a so we can call it with any concrete type (i.e. it’s polymorphic)

Example: A Query-handling framework

We will have a system that responds to user-generated queries. System uses sexps for formatting queries and responses, and also the config for the query handler.

The signature here is for a module that implements a system for responding to user-generated queries. Other serialisation formats (JSON) could have worked too.

So here’s our Query_handler interface.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
module type Query_handler = sig

  (** Configuration for a query handler *)
  type config

  val sexp_of_config : config -> Sexp.t (*serialises the config*)
  val config_of_sexp : Sexp.t -> config (*deserialised the serialised config*)

  (** The name of the query-handling service *)
  val name : string

  (** The state of the query handler *)
  type t

  (** Creates a new query handler from a config *)
  val create : config -> t

  (** Evaluate a given query, where both input and output are
      s-expressions *)
  val eval : t -> Sexp.t -> Sexp.t Or_error.t
end;;
  • sexp converters are tedious to implement by hand, we can use ppx_sexp_conv to generate the sexp converters based on their type definition.

    We can also use the annotations within a signature to add the appropriate type signature. The purpose of applying the annotation to the interface is that the compiler knows that a module implementation that satisfies the signature has those sexp functions.

    When applied within a type definition (implementation), the compiler will emit code for those functions – so code is generated.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    
      (* applying the annotation on a type to derive the sexp converter functions *)
      #require "ppx_jane";;
      type u = { a: int; b: float } [@@deriving sexp];;
      (*
      type u = { a : int; b : float; }
      val u_of_sexp : Sexp.t -> u = <fun>
      val sexp_of_u : u -> Sexp.t = <fun>
      *)
      sexp_of_u {a=3;b=7.};;
      (* - : Sexp.t = ((a 3)(b 7)) *)
      u_of_sexp (Core.Sexp.of_string "((a 43) (b 3.4))");;
      (* - : u = {a = 43; b = 3.4} *)
    
      (* using the annotation directly within the signature (implementation) *)
      module type M = sig type t [@@deriving sexp] end;;
      (*
      module type M =
        sig type t val t_of_sexp : Sexp.t -> t val sexp_of_t : t -> Sexp.t end
        *)
    

    Same annotations can be attached within a signature to add the appropriate type signature:

Implementing a query handler

Given the Query_handler interface, we can create some query handler modules.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
module Unique = struct
  type config = int [@@deriving sexp]
  type t = { mutable next_id: int }

  let name = "unique"
  let create start_at = { next_id = start_at }

  let eval t sexp =
    (* NOTE: we expect the input for the query to be () which is Sexp.unit -- that's what the match expression is doing:*)
    match Or_error.try_with (fun () -> unit_of_sexp sexp) with
    | Error _ as err -> err
    | Ok () ->
      let response = Ok (Int.sexp_of_t t.next_id) in
      t.next_id <- t.next_id + 1;
      response
end;;

let unique = Unique.create 0;;
(* val unique : Unique.t = {Unique.next_id = 0} *)
Unique.eval unique (Sexp.List []);;
(* - : (Sexp.t, Error.t) result = Ok 0 *)
Unique.eval unique (Sexp.List []);;
(* - : (Sexp.t, Error.t) result = Ok 1 *)

Here’s another example of a query handler that does directory listings:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#require "core_unix.sys_unix";;

module List_dir = struct
  type config = string [@@deriving sexp] (* the default directory that relative paths are interpreted within *)
  type t = { cwd: string }

  (** [is_abs p] Returns true if [p] is an absolute path  *)
  let is_abs p =
    String.length p > 0 && Char.(=) p.[0] '/'

  let name = "ls"
  let create cwd = { cwd }

  let eval t sexp =
    match Or_error.try_with (fun () -> string_of_sexp sexp) with
    | Error _ as err -> err
    | Ok dir ->
      let dir =
        if is_abs dir then dir
        else Core.Filename.concat t.cwd dir
      in
      Ok (Array.sexp_of_t String.sexp_of_t (Sys_unix.readdir dir))
end;;


let list_dir = List_dir.create "/var";;
(* val list_dir : List_dir.t = {List_dir.cwd = "/var"} *)
List_dir.eval list_dir (sexp_of_string ".");;
(*
  - : (Sexp.t, Error.t) result =
Ok
 (yp networkd install empty ma mail spool jabberd vm msgs audit root lib db
  at log folders netboot run rpc tmp backups agentx rwho)
  *)

List_dir.eval list_dir (sexp_of_string "yp");;
(* - : (Sexp.t, Error.t) result = Ok (binding) *)

Dispatching to Multiple Query Handlers

We’d like to create a whole dispatch table and dispatch queries.

It’s natural to use data structures like lists (or tables) using first-class modules, which would have been odd to do with modules and functors alone.

For the following example, assume (query-name query) is the shape of a single query, where query-name is the name used to determine which handler to dispatch the query to and query is the body of the query (as a sexp).

  1. First we shall have a signature that combines a Query_handler module with an instantiated form of a query handler
    • this part looks very similar to OOP code with the module type and the this
  2. We can have a function that uses a locally abstract type to construct new instances, as a sort of a higher order function
  3. then we an have a dispatch-table builder
  4. then we can add in a dispatcher function
    • this part looks like OOP code

      One key difference is that first-class modules allow you to package up more than just functions or methods. As we’ve seen, you can also include types and even modules. We’ve only used it in a small way here, but this extra power allows you to build more sophisticated components that involve multiple interdependent types and values.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
(* === 1: signature to combine the Query_handler module with an instantiated form: *)
module type Query_handler_instance = sig
  module Query_handler : Query_handler
  val this : Query_handler.t
end;;
(*
module type Query_handler_instance =
  sig module Query_handler : Query_handler val this : Query_handler.t end
*)

(* === 2: using a locally abstract type, we can have an instance builder HOF *)
let build_instance
      (type a)
      (module Q : Query_handler with type config = a)
      config
  =
  (module struct
    module Query_handler = Q
    let this = Q.create config
  end : Query_handler_instance);;
(*
val build_instance :
  (module Query_handler with type config = 'a) ->
  'a -> (module Query_handler_instance) = <fun>
*)

(* -- this makes the construction of new instances all one-liners:  *)
let unique_instance = build_instance (module Unique) 0;;
(* val unique_instance : (module Query_handler_instance) = <module> *)
let list_dir_instance = build_instance (module List_dir)  "/var";;
(* val list_dir_instance : (module Query_handler_instance) = <module> *)


(* === 3: dispatch table builder: *)
let build_dispatch_table handlers =
  let table = Hashtbl.create (module String) in
  List.iter handlers
    ~f:(fun ((module I : Query_handler_instance) as instance) ->
      Hashtbl.set table ~key:I.Query_handler.name ~data:instance);
  table;;
(*
val build_dispatch_table :
  (module Query_handler_instance) list ->
  (string, (module Query_handler_instance)) Hashtbl.Poly.t = <fun>
  *)

(* === 4: dispatcher function to dispatch a query to a dispatch table *)
let dispatch dispatch_table name_and_query =
  match name_and_query with
  | Sexp.List [Sexp.Atom name; query] ->
    begin match Hashtbl.find dispatch_table name with
    | None ->
      Or_error.error "Could not find matching handler"
        name String.sexp_of_t
    | Some (module I : Query_handler_instance) -> (* NOTE: fn interacts with an instance by unpacking it into a module (I) and then using the query handler instance (I.this) in cert with the associated module (I.Query_handler) *)
      I.Query_handler.eval I.this query
    end
  | _ ->
    Or_error.error_string "malformed query";;
(*
val dispatch :
  (string, (module Query_handler_instance)) Hashtbl.Poly.t ->
  Sexp.t -> Sexp.t Or_error.t = <fun>
  *)

We can turn this into a CLI-interface code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
(* === 1: signature to combine the Query_handler module with an instantiated form: *)
module type Query_handler_instance = sig
  module Query_handler : Query_handler
  val this : Query_handler.t
end;;
(*
module type Query_handler_instance =
  sig module Query_handler : Query_handler val this : Query_handler.t end
*)

(* === 2: using a locally abstract type, we can have an instance builder HOF *)
let build_instance
      (type a)
      (module Q : Query_handler with type config = a)
      config
  =
  (module struct
    module Query_handler = Q
    let this = Q.create config
  end : Query_handler_instance);;
(*
val build_instance :
  (module Query_handler with type config = 'a) ->
  'a -> (module Query_handler_instance) = <fun>
*)

(* -- this makes the construction of new instances all one-liners:  *)
let unique_instance = build_instance (module Unique) 0;;
(* val unique_instance : (module Query_handler_instance) = <module> *)
let list_dir_instance = build_instance (module List_dir)  "/var";;
(* val list_dir_instance : (module Query_handler_instance) = <module> *)


(* === 3: dispatch table builder: *)
let build_dispatch_table handlers =
  let table = Hashtbl.create (module String) in
  List.iter handlers
    ~f:(fun ((module I : Query_handler_instance) as instance) ->
      Hashtbl.set table ~key:I.Query_handler.name ~data:instance);
  table;;
(*
val build_dispatch_table :
  (module Query_handler_instance) list ->
  (string, (module Query_handler_instance)) Hashtbl.Poly.t = <fun>
  *)

(* === 4: dispatcher function to dispatch a query to a dispatch table *)
let dispatch dispatch_table name_and_query =
  match name_and_query with
  | Sexp.List [Sexp.Atom name; query] ->
    begin match Hashtbl.find dispatch_table name with
    | None ->
      Or_error.error "Could not find matching handler"
        name String.sexp_of_t
    | Some (module I : Query_handler_instance) -> (* NOTE: fn interacts with an instance by unpacking it into a module (I) and then using the query handler instance (I.this) in cert with the associated module (I.Query_handler) *)
      I.Query_handler.eval I.this query
    end
  | _ ->
    Or_error.error_string "malformed query";;
(*
val dispatch :
  (string, (module Query_handler_instance)) Hashtbl.Poly.t ->
  Sexp.t -> Sexp.t Or_error.t = <fun>
  *)

open Stdio;;
let rec cli dispatch_table =
  printf ">>> %!";
  let result =
    match In_channel.(input_line stdin) with
    | None -> `Stop
    | Some line ->
      match Or_error.try_with (fun () ->
        Core.Sexp.of_string line)
      with
      | Error e -> `Continue (Error.to_string_hum e)
      | Ok (Sexp.Atom "quit") -> `Stop
      | Ok query ->
        begin match dispatch dispatch_table query with
        | Error e -> `Continue (Error.to_string_hum e)
        | Ok s    -> `Continue (Sexp.to_string_hum s)
        end;
  in
  match result with
  | `Stop -> ()
  | `Continue msg ->
    printf "%s\n%!" msg;
    cli dispatch_table;;
(*
val cli : (string, (module Query_handler_instance)) Hashtbl.Poly.t -> unit =
  <fun>
  *)




let () =
  cli (build_dispatch_table [unique_instance; list_dir_instance]);;

Loading and Unloading Query Handlers

First-class modules give a lot of dynamism and flexibility – e.g. we can make it such that our query handlers can be loaded and unloaded at runtime.

We will define a Loader module:

  1. define a Loader module that controls a set of active query handlers
  2. define a creator function for creating a Loader.t
  3. define functions (load, unload) to manipulate the table of active query handlers
  4. define eval function which determines the query interface presented to the user
    • first we create variant type for the request
    • then we use the sexp converter generated for that type to parse the query from the user
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
(* ==== 1: loader module *)
module Loader = struct
  type config = (module Query_handler) list [@sexp.opaque]
  [@@deriving sexp]

  (** Loader.t has 2 tables: one for known modules and one for active handler instances *)
  type t = { known  : (module Query_handler)          String.Table.t
           ; active : (module Query_handler_instance) String.Table.t
           }

  let name = "loader"

(* ==== 2: function for creating a Loader.t  *)
let create known_list =
    let active = String.Table.create () in (* NOTE: the active table starts off as empty*)
    let known  = String.Table.create () in
    List.iter known_list
      ~f:(fun ((module Q : Query_handler) as q) ->
        Hashtbl.set known ~key:Q.name ~data:q);
    { known; active }

(* ==== 3: functions for manipulating the table of active query handlers.  *)
(* LOAD function: *)
let load t handler_name config =
    if Hashtbl.mem t.active handler_name then
      Or_error.error "Can't re-register an active handler"
        handler_name String.sexp_of_t
    else
      match Hashtbl.find t.known handler_name with
      | None ->
        Or_error.error "Unknown handler" handler_name String.sexp_of_t
      | Some (module Q : Query_handler) ->
        let instance = (* NOTE: instance is the first-class module that we create here*)
          (module struct
             module Query_handler = Q
             let this = Q.create (Q.config_of_sexp config)
           end : Query_handler_instance)
        in
        Hashtbl.set t.active ~key:handler_name ~data:instance;
        Ok Sexp.unit

(* UNLOAD FUNCTION *)
let unload t handler_name =
    if not (Hashtbl.mem t.active handler_name) then
      Or_error.error "Handler not active" handler_name String.sexp_of_t
    else if String.(=) handler_name name then
      Or_error.error_string "It's unwise to unload yourself"
    else (
      Hashtbl.remove t.active handler_name;
      Ok Sexp.unit
    )

(* ==== 4: request variant type and the eval function:  *)
type request =
    | Load of string * Sexp.t
    | Unload of string
    | Known_services
    | Active_services
  [@@deriving sexp]

let eval t sexp =
    match Or_error.try_with (fun () -> request_of_sexp sexp) with
    | Error _ as err -> err
    | Ok resp ->
      match resp with
      | Load (name,config) -> load   t name config
      | Unload name        -> unload t name
      | Known_services ->
        Ok ([%sexp_of: string list] (Hashtbl.keys t.known))
      | Active_services ->
        Ok ([%sexp_of: string list] (Hashtbl.keys t.active))
end
(* ^ ======= Loader module is complete now *)

Combining it with the Cli interface:

  1. create an instance of the loader query handler
  2. add that instance to the loader’s active table
  3. then launch the cli interface, passing it the active table.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
let () =
  let loader = Loader.create [(module Unique); (module List_dir)] in
  let loader_instance =
    (module struct
       module Query_handler = Loader
       let this = loader
     end : Query_handler_instance)
  in
  Hashtbl.set loader.Loader.active
    ~key:Loader.name ~data:loader_instance;
  cli loader.active
  • dynamic linking facilities

    KIV this but OCaml’s dynamic linking facilities allows us to compile and link in new code to a running program.

    WE can automate this using libraries like ocaml_plugin which can be installed via OPAM and takes care of the workflow around setting up dynamic linking.

  • MAGIC: of OCaml

    MAGIC: The Magic of OCaml: Type Safety Meets Dynamic Linking

    Seems like one of the most profound strengths of OCaml’s type system is how it extends safety and correctness all the way to the boundaries between separately compiled modules. This power shines when we consider dynamic linking — the ability to compile new code and load it into a running system.

    Imagine a Mars rover millions of kilometers away that needs a software patch. You can’t afford runtime surprises.

    In a dynamically typed language, replacing a module at runtime carries risk: the new code might not conform to the old module’s expectations until it actually fails in the field.

    But in OCaml, the compiler enforces type integrity at every stage — even across separately compiled units. Each module’s interface carries a precise type signature, and the compiler ensures that any dynamically linked module matches that interface exactly. The runtime (Dynlink) can then safely load the compiled code (.cmxs files) knowing that the functions, values, and data layouts all align perfectly with the system’s expectations.

    This means dynamic linking in OCaml isn’t an act of faith — it’s a provably safe operation. The strong static type system guarantees that what you load at runtime behaves exactly as the rest of the program expects.

    So the “magic” of OCaml lies in this combination:

    • Static type safety gives compile-time guarantees about correctness and consistency.

    • Dynamic linking allows runtime adaptability.

      Together, they enable systems that can evolve and patch themselves — even from millions of kilometers away — with mathematical confidence in their integrity.

Living without First-Class Modules

Most designs that can be done with first-class modules can be simulated without them (though it’s a little awkward to do that).

The idea here is that we hide the true types of the objects in question behind the functions stored in the closure. We’ve implemented our query_handler_instance as just types below. As for Unique query handler into this framework, we just

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
type query_handler_instance =
  { name : string
  ; eval : Sexp.t -> Sexp.t Or_error.t };;
(*
type query_handler_instance = {
  name : string;
  eval : Sexp.t -> Sexp.t Or_error.t;
}
*)
type query_handler = Sexp.t -> query_handler_instance;;
(* type query_handler = Sexp.t -> query_handler_instance *)

(* ==== 2: putting the Unique query handler into this closure-based approach *)
let unique_handler config_sexp =
  let config = Unique.config_of_sexp config_sexp in
  let unique = Unique.create config in
  { name = Unique.name
  ; eval = (fun config -> Unique.eval unique config)
  };;

(* val unique_handler : Sexp.t -> query_handler_instance = <fun> *)

This is a small scale example and so it’s alright to not use first-class functions. When it gets a lot more complex (more functionality to be hidden away behind a set of closures, more complicated the r/s between the different types in question) then the more awkward this gets and it’s better to use first-class modules at that point.

Chapter 12: Objects

What is OOP?

Paradigm that groups computation and data within logical objects. It’s interesting how the OCaml folks set the vocabulary around this:

  • object contains data within fields

  • method functions can be invoked against the data within the object \(\implies\) “sending a message” to the object

    this “sending a message” part is interesting because it aligns with the opinion that OOP is intrinsically imperative, where the object is like an FSM. Sending messages to an object causes it to change state, possibly sending messages to other objects. Naturally, this is assuming the object’s state is allowed to be mutable.

  • class: code definition behind an object

  • constructor: builds the object

5 fundamental properties that makes OOP unique:

  1. Abstraction

    Implementation details hidden within the object, external interface is the set of publicly accessible methods

  2. Dynamic Lookup

    When sending a message to object, method to be executed is dynamically determined by the implementation of the object, not by some static property of the program. Therefore, different objects may react to the same message in different ways.

  3. Subtyping

    If an object a has all the functionality of an object b, then we may use a in any context where b is expected.

  4. Inheritance

    Definition of one object (class) can be used to produce a new kind of object (class). The new definition may override behaviour, share behaviour with parent.

  5. Open recursion

    Methods within an object can invoke other methods within the same object using a special variable (self or this). Objects created from classes use dynamic lookup, allowing a method defined in one class to invoke methods defined in another class that inherits from the first.

OCaml Objects

OCaml Object system is very surprising to OOP-folks:

Class System: Separation of Object and their types

Typically, we’d expect the class name to be the type of objects created when we instantiate it, and the relationships between object types correlate to inheritance.

  1. In OCaml, Classes are used to construct objects and support inheritance. However, Classes are not types. Objects have object types.

    If we want to use objects, we don’t need to use classes at all.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    
       (* ==== defining an object *)
       open Base;;
       let s = object
         val mutable v = [0; 2]
    
         method pop =
           match v with
           | hd :: tl ->
             v <- tl;
             Some hd
           | [] -> None
    
         method push hd =
           v <- hd :: v
       end;;
       (* val s : < pop : int option; push : int -> unit > = <obj> *)
    
       (* ==== Usage: *)
       s#pop;;
       (* - : int option = Some 0 *)
       s#push 4;;
       (* - : unit = () *)
       s#pop;;
       (* - : int option = Some 4 *)
    

    Some notes:

    1. object type is what is within the <...> which only contains the types of the methods
    2. fields are typically private, only modifiable from the functions that cause side-effects on their state.
    3. methods can have 0 params.
      • we don’t need to use the unit argument (unlike the equivalent functional version)
      • this is because method call is routed to a concrete object instance. so zero argument methods can be invoked directly by writing obj#method without parens. It’s really because of the dynamic dispatching pattern.
  2. objects may be constructed by functions (without being placed within a class(?)).

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    
       let stack init = object
         val mutable v = init
    
         method pop =
           match v with
           | hd :: tl ->
             v <- tl;
             Some hd
           | [] -> None
    
         method push hd =
           v <- hd :: v
       end;;
       (* val stack : 'a list -> < pop : 'a option; push : 'a -> unit > = <fun> *)
       let s = stack [3; 2; 1];;
       (* val s : < pop : int option; push : int -> unit > = <obj> *)
       s#pop;;
       (* - : int option = Some 3 *)
    

    NOTE:

    1. the function stack’s returned object now uses the polymorphic type 'a instead of the previous example where it was bound to int.

Object Polymorphism

Similar to polymorphic variants, we can use methods without a typedef.

This means that the compiler will infer the object types automatically from the methods that are invoked on them. The type system complains if it sees incompatible uses of the same methods.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
(* === 1: automatic inference *)
let area sq = sq#width * sq#width;;
(* val area : < width : int; .. > -> int = <fun> *)
let minimize sq : unit = sq#resize 1;;
(* val minimize : < resize : int -> unit; .. > -> unit = <fun> *)
let limit sq = if (area sq) > 100 then minimize sq;;
(* val limit : < resize : int -> unit; width : int; .. > -> unit = <fun> *)

(*== incompatibility is caught!!:==== *)
(*
  let toggle sq b : unit =
  if b then sq#resize `Fullscreen else minimize sq;;

(*
Line 2, characters 51-53:
Error: This expression has type < resize : [> `Fullscreen ] -> unit; .. >
       but an expression was expected of type < resize : int -> unit; .. >
       Types for method resize are incompatible
*)
*)

(* ==== 2: closing the object type *)
let area_closed (sq: < width : int >) = sq#width * sq#width;;
(* val area_closed : < width : int > -> int = <fun> *)

(* so if we define sq object type again, it's going to error out because it was closed earlier *)
let sq = object
  method width = 30
  method name = "sq"
end;;
(*
val sq : < name : string; width : int > = <obj>
area_closed sq;;
Line 1, characters 13-15:
Error: This expression has type < name : string; width : int >
       but an expression was expected of type < width : int >
       The second object type has no method name
       *)

Notes:

  1. object types may be open or close:
    • open: more methods can be inferred still

      the .. (ellipsis) in the inferred object types indicate that there could be other unspecified methods that the object may have

      this is actually called an ellision \(\implies\) stands of “possibly more methods”

    • closed: no more inferring of other methods and such

      we can see how the object’s definition is extensible from the examples, we also see that we can close the object so that the object type is fixed.

Elisions are Polymorphic (row-polymorphism)

An elided object type is polymorphic. SO if we try to write a type definition using it, then we get an “unbound type variable error” type square = < width : int; ..>;;

This kind of polymorphism is called row polymorphism and it uses row-variables (.. is a row-variable). OCaml’s polymorphic variant types also use row-polymorphism; close relationship between objects and polymorphic variants: “objects are to records what polymorphic variants are to ordinary variants”

So, an object with object type < pop: int option; ..> can be any object with a method pop : int option regardless of how it’s implemented. The dynamic dispatch handles the method invocation when we invoke that method.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
let print_pop st = Option.iter ~f:(Stdio.printf "Popped: %d\n") st#pop;;
(* val print_pop : < pop : int option; .. > -> unit = <fun> *)

(* == works on the stack type defined above: *)
print_pop (stack [5;4;3;2;1]);;
(* Popped: 5 *)
(* - : unit = () *)

(* === consider a whole separate implementation for stack: *)
let array_stack l = object
  val stack = Stack.of_list l
  method pop = Stack.pop stack
end;;
(* val array_stack : 'a list -> < pop : 'a option > = <fun> *)

(* the print_pop function works the same: *)
print_pop (stack [5;4;3;2;1]);;
(* Popped: 5 *)
(* - : unit = () *)

Immutable Objects

The view that OOP is intrinsically makes sense. Objects are like FSMs and message-passing is what allows us to changs FSM state.

However, this assumes that the objects are mutable.

We can make them immutable too:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
let imm_stack init = object
  val v = init

  method pop =
    match v with
    | hd :: tl -> Some (hd, {< v = tl >})
    | [] -> None

  method push hd =
    {< v = hd :: v >} (* <== this is special syntax for shallow-copying an object and overriding a field within*)
end;;
(*
val imm_stack :
  'a list -> (< pop : ('a * 'b) option; push : 'a -> 'b > as 'b) = <fun>
  *)

NOTEs:

  • Special syntax:

    uses a special syntax {<...>} to produce a copy of the current object (and the assignment is an override of the copied object). In so doing, we don’t cause any modifications to the existing object.

    Syntax rules:

    1. can only be used within a method body
    2. only values of fields may be updated
    3. method implmentations are fixed @ the time of object-creation, so they can’t be chagned dynamically.

When to use Objects

RULE OF THUMB: The main value of objects is really the ability to use open recursion and inheritance which is more of part of the class system.

What are the alternative toolings at our disposal?

  1. classes and objects

    Pros:

    1. no need type definitions
    2. good support for row-polymorphism
    3. benefits comes from the class system
      • inheritance

      • open recursion

        Open recursion allows interdependent parts of an object to be defined separately. This works because calls between the methods of an object are determined when the object is instantiated, a form of late binding. This makes it possible (and necessary) for one method to refer to other methods in the object without knowing statically how they will be implemented \(\implies\) open recursion

        This also means that some part of the code may have its implementation deferred and it will still work.

    Cons:

    1. syntax-heavy
    2. runtime costs, might as well use records
  2. first-class modules + functors + data types

    pros:

    1. more expressive (can include types)

    2. very explicit

    cons:

    1. is early binded – can only parameterise module code so t hat part of it can be implemented later via a function / functor. This explicitness can be quite verbose.

Subtyping

Central concept in OOP that governs when an object with type A can be used in an expression expecting object of type B.

Concretely, subtyping restricts when e :> t (coercion operator) can be applied. Coercion only works if type of e is a subtype of t

Width Subtyping for coercing objects

Width subtyping means that an object type A is a subtype of B if A has all of the methods of B (and possibly more). A square is a subtype of shape because it implements all the methods of shape, which in the example below, is just area.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
type shape = < area : float >;;
(* type shape = < area : float > *)

type square = < area : float; width : int >;;
(* type square = < area : float; width : int > *)
let square w = object
  method area = Float.of_int (w * w)
  method width = w
end;;
(* val square : int -> < area : float; width : int > = <fun> *)

(* === example of just assignment but NOT coercion (will error out) *)
(square 10 : shape);;
(*Line 1, characters 2-11:
Error: This expression has type < area : float; width : int >
       but an expression was expected of type shape
       The second object type has no method width
       *)

(square 10 :> shape);;
(* - : shape = <obj> *)

Depth Subtyping for coercing objects

Depth subtyping allows us to coerce and object if its individual methods could be safely coerced. So object type < m: t1 > is a subtype of object type < m: t2 > if t1 is a subtype of t2.

Given the shape type as type shape = < area : float >;;,

In the example here, both coin and map have a method called shape which is a subtype of the shape type and so, both of them can be coerced into the object type <shape: shape> (NOTE: the method shape that returns the type shape):

  • within coin: shape : < area : float; radius : int > and the type of this method is a subtype of the shape type.
  • within map: shape : < area : float; width : int > and the type of this method is a subtype of the shape type.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
type circle = < area : float; radius : int >;;
(* type circle = < area : float; radius : int > *)
let circle r = object
  method area = 3.14 *. (Float.of_int r) **. 2.0
  method radius = r
end;;
(* val circle : int -> < area : float; radius : int > = <fun> *)

(* ==== both of the types below have a shape method whose type is a subtype of the shape type. So they can both be coerced.*)
let coin = object
  method shape = circle 5
  method color = "silver"
end;;
(* val coin : < color : string; shape : < area : float; radius : int > > = <obj> *)

let map = object
  method shape = square 10
end;;
(* val map : < shape : < area : float; width : int > > = <obj> *)

(* === example of depth-subtyping coercion.  *)
type item = < shape : shape >;;
(* type item = < shape : shape > *)
let items = [ (coin :> item) ; (map :> item) ];;
(* val items : item list = [<obj>; <obj>] *)
  • Polymorphic Variant Subtyping

    We can coerce a polymorphic variant into a larger polymorphic variant type. A polymorphic variant type A is a subtype of B if the tags of A are subset of the tags of B. This is aligned to the use of structural subtyping that is used for polymorphic variants.

    1
    2
    3
    4
    5
    6
    7
    8
    
    type num = [ `Int of int | `Float of float ];;
    (* type num = [ `Float of float | `Int of int ] *)
    type const = [ num | `String of string ];;
    (* type const = [ `Float of float | `Int of int | `String of string ] *)
    let n : num = `Int 3;;
    (* val n : num = `Int 3 *)
    let c : const = (n :> const);; (*<=== so the subtype here is type num and the supertype is type const. We can type-coerce num into const *)
    (* val c : const = `Int 3 *)
    
    • The compiler allows coercion because:

      • Every value of num can safely fit into const, since every tag of num (Int, Float) also exists in const.
      • const might allow more tags (String), but anywhere a const is required, a value of type num is guaranteed to match at least some of its cases– so it’s a valid, safe substitution.
    • Key intuition:

      1. Subtype: The more specific type (with a smaller set of allowed tags) is the subtype.
        • Here: num allows only Int and Float.
      2. Supertype: The broader type (with a superset of supported tags) is the supertype.
        • Here: const allows Int, Float, and String.

Variance

The classic variance forms in OOP but in the context of object types in OCaml.

Notes based on the code below:

  1. immutability allows us to do the first type coercion from square list to shape list.

    so 'a list is covariant (in 'a)

    If it was mutable (e.g. square array) then we would have been able to store non-square shapes into what should be an array of squares. So, square array is NOT a subtype of shape array.

    so 'a array is invariant (only accepts its own type)

  2. Finally for the contravariant example, consider a function with the type square -> string. We can’t use it with types wider than itself like shape -> string because if we get a circle we wouldn’t know what to do. However, we can use it with types narrower than itself.

    Similar argument applies to why a function with type shape -> string can be safely used with type square -> string

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
(* ==== 1: COVARIANCE: a square is a shape, so a square list can be type coerced into a shape list *)
let squares: square list = [ square 10; square 20 ];;
(* val squares : square list = [<obj>; <obj>] *)
let shapes: shape list = (squares :> shape list);; (* == relies on lists being immutable*)
(* val shapes : shape list = [<obj>; <obj>] *)

(* === 2: INVARIANT: 'a array is invariant  == this will error out.*)
let square_array: square array = [| square 10; square 20 |];;
(* val square_array : square array = [|<obj>; <obj>|] *)
let shape_array: shape array = (square_array :> shape array);;
(*
Line 1, characters 32-61:
Error: Type square array is not a subtype of shape array
       The second object type has no method width
*)

(* === 3: CONTRAVARIANT: *)
let shape_to_string: shape -> string =
  fun s -> Printf.sprintf "Shape(%F)" s#area;;
(* val shape_to_string : shape -> string = <fun> *)
let square_to_string: square -> string =
  (shape_to_string :> square -> string);;
(* val square_to_string : square -> string = <fun> *)
  • Variance Annotations

    The compiler can infer the variance of types. In some cases, we wish to explicitly annotate the type.

    One such case is if there’s a signature that hides the type parameters of an immutable type.

    In those cases we can manually annotate the variance to the type’s parameters in the signature:

    • +: for covariance
    • -: for contravariance
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    
    (* === the implementation of the Either type -- so the type may be inferred as covariant *)
    module Either = struct
      type ('a, 'b) t =
        | Left of 'a
        | Right of 'b
      let left x = Left x
      let right x = Right x
    end;;
    (*
    module Either :
      sig
        type ('a, 'b) t = Left of 'a | Right of 'b
        val left : 'a -> ('a, 'b) t
        val right : 'a -> ('b, 'a) t
      end
      *)
    
    (* these usage will allow the type to be inferred as covariant *)
    let left_square = Either.left (square 40);;
    (*
    val left_square : (< area : float; width : int >, 'a) Either.t =
      Either.Left <obj>
      *)
    (left_square :> (shape,_) Either.t);; (* <=== we can type coerce to a supertype.*)
    (* - : (shape, 'a) Either.t = Either.Left <obj> *)
    
    (* === interface that hides the definition for others -- the type can't be known by consumers of this interface and so the variance will be inferred as invariant. *)
    module Abs_either : sig
      type ('a, 'b) t
      val left: 'a -> ('a, 'b) t
      val right: 'b -> ('a, 'b) t
    end = Either;;
    (*
    module Abs_either :
      sig
        type ('a, 'b) t
        val left : 'a -> ('a, 'b) t
        val right : 'b -> ('a, 'b) t
      end
      *)
    (* -- so this will be inferred as invariant *)
    (Abs_either.left (square 40) :> (shape, _) Abs_either.t);;
    (*
    Line 1, characters 2-29:
    Error: This expression cannot be coerced to type (shape, 'b) Abs_either.t;
           it has type (< area : float; width : int >, 'a) Abs_either.t
           but is here used with type (shape, 'b) Abs_either.t
           Type < area : float; width : int > is not compatible with type
             shape = < area : float >
           The second object type has no method width
           *)
    
    
    (* ========== USING VARIANCE ANNOTATIONS to the Type's params in the signature: *)
    module Var_either : sig
      type (+'a, +'b) t
      val left: 'a -> ('a, 'b) t
      val right: 'b -> ('a, 'b) t
    end = Either;;
    (*
    module Var_either :
      sig
        type (+'a, +'b) t
        val left : 'a -> ('a, 'b) t
        val right : 'b -> ('a, 'b) t
      end
      *)
    
    (* -- type coercion works again. *)
    (Var_either.left (square 40) :> (shape, _) Var_either.t);;
    (* - : (shape, 'a) Var_either.t = <abstr> *)
    

    We consider a more concrete variance example.

    1. the problem we see is that the push method makes it such that the square stack and circle stack are not subtypes of shape stack.

      • in shape stack the push method takes an arbitrary shape.

        If we could actually coerce square stack to a shape stack then it means it’s possible to push an arbitrary shape onto the square stack and this would have been regarded as an error.

        In other words, < push: 'a -> unit; .. > is contravariant in 'a so < push: square -> unit; pop: square option> can’t be a subtype of < push: shape -> unit; pop: shape option>

      • the push method makes the stack object type contravariant in 'a

    2. to make it work, we use a more precise type that indicates that we will not be using the push method. That’s how we get the type readonly_stack .

      • shape_stacks is ascribed a composite polymorphic type:

        what the total_area’s parameter type annotation means (let total_area (shape_stacks: shape stack list)):

        • a list of objects of type < pop: shape option >, where readonly_stack is parametrized by shape.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    
    (* === 1. we applied the stack function to some squares and some circles:  *)
    type 'a stack = < pop: 'a option; push: 'a -> unit >;;
    (* type 'a stack = < pop : 'a option; push : 'a -> unit > *)
    let square_stack: square stack = stack [square 30; square 10];;
    (* val square_stack : square stack = <obj> *)
    let circle_stack: circle stack = stack [circle 20; circle 40];;
    (* val circle_stack : circle stack = <obj> *)
    
    (* === 2: attempting to use a area accumulating function*)
    let total_area (shape_stacks: shape stack list) =
      let stack_area acc st =
        let rec loop acc =
          match st#pop with
          | Some s -> loop (acc +. s#area)
          | None -> acc
        in
        loop acc
      in
    List.fold ~init:0.0 ~f:stack_area shape_stacks;;
    (* val total_area : shape stack list -> float = <fun> *)
    (* === problem:
    total_area [(square_stack :> shape stack); (circle_stack :> shape stack)];;
    Line 1, characters 13-42:
    
    Error: Type square stack = < pop : square option; push : square -> unit >
           is not a subtype of
             shape stack = < pop : shape option; push : shape -> unit >
           Type shape = < area : float > is not a subtype of
             square = < area : float; width : int >
           The first object type has no method width
    
    (* === problem: the push function makes it such that the shape stack object type is contravariant in 'a
    *)
    *)
    
    (* ==== 3: precise type to avoid the contravariance *)
    type 'a readonly_stack = < pop : 'a option >;; (* === this is a polymorphic object type *)
    (* type 'a readonly_stack = < pop : 'a option > *)
    let total_area (shape_stacks: shape readonly_stack list) =
      let stack_area acc st =
        let rec loop acc =
          match st#pop with
          | Some s -> loop (acc +. s#area)
          | None -> acc
        in
        loop acc
      in
    List.fold ~init:0.0 ~f:stack_area shape_stacks;;
    (* val total_area : shape readonly_stack list -> float = <fun> *)
    total_area [(square_stack :> shape readonly_stack); (circle_stack :>
    shape readonly_stack)];;
    (* - : float = 7280. *)
    

    It should be pointed out that this typing works, and in the end, the type annotations are fairly minor. In most typed object-oriented languages, these coercions would simply not be possible. For example, in C++, a STL type list<T> is invariant in T, so it is simply not possible to use list<square> where list<shape> is expected (at least safely). The situation is similar in Java, although Java has an escape hatch that allows the program to fall back to dynamic typing. The situation in OCaml is much better: it works, it is statically checked, and the annotations are pretty simple.

Narrowing

Narrowing is NOT allowed in OCaml.

So we can’t downcast a supertype to its subtype because:

  1. technical reason: hard to implement it

  2. design argument: narrowing violates abstraction

    For structurally type-systems like in OCaml, if we allowed narrowing, then we could effectively enumerate the methods in an object (e.g. To check whether an object obj has some method foo : int, one would attempt a coercion (obj :> < foo : int >).).

TODO: there’s an example here that I just skipped. It talks about how to handle some cases where one might wish for type narrowing to be possible and suggests using pattern matching amongst other patterns (and discusses its drawbacks).

Subtyping vs Row Polymorphism

Subtyping and Row-polymorphism are 2 distinct mechanisms and they overlap a lot in the sense that both are mechanisms we can use to write functions that can be applied to objects of different types.

Some notes:

  1. Row polymorphism may be preferred over subtyping because there’s no explicit coercions \(\implies\) perserves more type information

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    
       (* ==== row-polymorphism example: *)
       let remove_large l =
       List.filter ~f:(fun s -> Float.(s#area <= 100.)) l;;
       (* val remove_large : (< area : float; .. > as 'a) list -> 'a list = <fun> *)
       (* --- so this method is still preserved *)
       let remove_large l =
       List.filter ~f:(fun s -> Float.(s#area <= 100.)) l;;
       (* val remove_large : (< area : float; .. > as 'a) list -> 'a list = <fun> *)
    
       (* ==== subtyping coercion example: *)
       (*
         Writing a similar function with a closed type and applying it using subtyping does not preserve the methods of the argument: the returned object is only known to have an area method:
         *)
       let remove_large (l: < area : float > list) =
       List.filter ~f:(fun s -> Float.(s#area <= 100.)) l;;
       (* val remove_large : < area : float > list -> < area : float > list = <fun> *)
       remove_large (squares :> < area : float > list );;
       (* - : < area : float > list = [<obj>; <obj>] *)
    
  2. There are some cases where row polymorphism can’t be used.

    1. Can’t be used to place different types of objects in the same container (heterogeneous elements can’t be created using row-polymorphism).

      1
      2
      3
      4
      5
      6
      7
      
            let hlist: < area: float; ..> list = [square 10; circle 30];;
            (*
            Line 1, characters 50-59:
            Error: This expression has type < area : float; radius : int >
                   but an expression was expected of type < area : float; width : int >
                   The second object type has no method radius
            *)
      
    2. Can’t be used to place different types of objects in the same reference:

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      
            let shape_ref: < area: float; ..> ref = ref (square 40);;
            (*
            val shape_ref : < area : float; width : int > ref =
              {Base.Ref.contents = <obj>}
              *)
            shape_ref := circle 20;;
            (*
            Line 1, characters 14-23:
            Error: This expression has type < area : float; radius : int >
                   but an expression was expected of type < area : float; width : int >
                   The second object type has no method radius
                   *)
      

    In both cases, subtyping works:

    1
    2
    3
    4
    5
    6
    
       let hlist: shape list = [(square 10 :> shape); (circle 30 :> shape)];;
       (* val hlist : shape list = [<obj>; <obj>] *)
       let shape_ref: shape ref = ref (square 40 :> shape);;
       (* val shape_ref : shape ref = {Base.Ref.contents = <obj>} *)
       shape_ref := (circle 20 :> shape);;
       (* - : unit = () *)
    

Chapter 13: Classes

One of the main goals of OOP is code-reuse via inheritance and that’s why we need to have a good understanding of OCaml classes. A class is a recipe for creating objects, and that recipe can be changed by adding new methods and fields or modifying existing methods.

OCaml Classes

  1. Classes have class-types which describes the class itself and class-types are separate from regular OCaml types.

    Class types should not be confused with object types.

    • Class Type: object ... end

    • object type: e.g. < get : int; .. >

  2. we produce objects from the class using new

  3. While it’s true that classes and class names are NOT types, for convenience, the definition of class istack also defines and object type istack with the same methods as the class. NOTE: the istack object type represents any objects with these methods, NOT just objects created by the istack class.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
(* === 1: defining a class: *)
open Base;;
class istack = object
  val mutable v = [0; 2]

  method pop =
    match v with
    | hd :: tl ->
      v <- tl;
      Some hd
    | [] -> None

  method push hd =
    v <- hd :: v
end;;
(*
class istack :
  object
    val mutable v : int list
    method pop : int option
    method push : int -> unit
  end
  *)

(* ==== 2: producing an object: *)
let s = new istack;;
(* val s : istack = <obj> *) (*<== istack object type created automatically that looks like the istack class definition's signature*)
s#pop;;
(* - : int option = Some 0 *)
s#push 5;;
(* - : unit = () *)
s#pop;;
(* - : int option = Some 5 *)

Class Parameters and Polymorphism

A class definition is the constructor, we may pass in arguments when the object is created with new.

  1. constructor params can be:

    • type params
      • Type params for the class are placed in square brackets before the class name in the class definition.
    • constructor params
  2. we need to provide enough constraints so that the compiler will infer the correct type.

    To achieve this, we can add type constraints to:

    1. parameters
    2. fields
    3. methods

    Usually, we’d just annotate the fields and/or class params and add constraints to the methods only if necessary.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
class ['a] stack init = object
  val mutable v : 'a list = init (*<-- type annotation for v helps us constraint the type inference else it would be "too polymorphic"*)

  method pop =
    match v with
    | hd :: tl ->
      v <- tl;
      Some hd
    | [] -> None

  method push hd =
    v <- hd :: v
end;;
(*
class ['a] stack :
  'a list ->
  object
    val mutable v : 'a list
    method pop : 'a option
    method push : 'a -> unit
  end
*)

Object Types as Interfaces

Suppose we wish to traverse the elements on our stack. In the OOP world, we would have defined a class for the iterator object which would have given us a generic mechanism to inspect and traverse the elements of a collection.

Typically languages use 2 mechanisms to define an abstract interface like this:

  1. like in java, an iterator interface:
    1
    2
    3
    4
    5
    6
    
       // Java-style iterator, specified as an interface.
       interface <T> iterator {
         T Get();
         boolean HasValue();
         void Next();
       };
    
  2. in languages without interfaces (like C++), using abstract classes without implementation them:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    
       // Abstract class definition in C++.
       template<typename T>
       class Iterator {
        public:
         virtual ~Iterator() {}
         virtual T get() const = 0;
         virtual bool has_value() const = 0;
         virtual void next() = 0;
       };
    

OCaml supports both styles, and gives more flexibility: an object type can be implemented by any object with the appropriate methods. It doesn’t need to be specified by the object’s class a priori.

Object classes used as an interface

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
(* === 1: defining the object type for iterator and the functions it requires*)
type 'a iterator = < get : 'a; has_value : bool; next : unit >;;
(* type 'a iterator = < get : 'a; has_value : bool; next : unit > *)

(* === 2: defining the actual iterator class for lists. This will produce objects that will satisfy the iterator object type that we have defined above. *)
class ['a] list_iterator init = object
  val mutable current : 'a list = init

  method has_value = not (List.is_empty current)

  method get =
    match current with
    | hd :: tl -> hd
    | [] -> raise (Invalid_argument "no value")

  method next =
    match current with
    | hd :: tl -> current <- tl
    | [] -> raise (Invalid_argument "no value")
end;;
(*
class ['a] list_iterator :
  'a list ->
  object
    val mutable current : 'a list
    method get : 'a
    method has_value : bool
    method next : unit
  end
  *)

(* === 3: we add the iterator method to the stack class that produces an interator *)
class ['a] stack init = object
  val mutable v : 'a list = init

  method pop =
    match v with
    | hd :: tl ->
      v <- tl;
      Some hd
    | [] -> None

  method push hd =
    v <- hd :: v

  (** This produces an iterator, for which we construct a list_iterator object that refers to the current contents of the stack. *)
  method iterator : 'a iterator =
    new list_iterator v
end;;
(*
class ['a] stack :
  'a list ->
  object
    val mutable v : 'a list
    method iterator : 'a iterator
    method pop : 'a option
    method push : 'a -> unit
  end
  *)

(* ==== USAGE:  *)
let s = new stack [];;
(* val s : '_weak1 stack = <obj> *)
s#push 5;;
(* - : unit = () *)
s#push 4;;
(* - : unit = () *)
let it = s#iterator;;
(* val it : int iterator = <obj> *)
it#get;;
(* - : int = 4 *)
it#next;;
(* - : unit = () *)
it#get;;
(* - : int = 5 *)
it#next;;
(* - : unit = () *)
it#has_value;;
(* - : bool = false *)

Functional Iterators

Why take the trouble to implement an interator interface when we are using a functional language. lol.

We apply the usual functional patterns:

  1. iter

  2. fold

    • which is polymorphic and has this type: ('b -> 'a -> 'b) -> 'b -> 'b

    • the 'a is already within the class type param, for 'b, we need to use a type quantifier like below:

      SYNTAX AWKWARDNESS: 'b. is the type quantifier that should be read as “for all 'b”. They can only be used DIRECTLY after the method name \(\implies\) method params must be expressed using a fun or function expression.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    
       class ['a] stack init = object
         val mutable v : 'a list = init
    
         method pop =
           match v with
           | hd :: tl ->
             v <- tl;
             Some hd
           | [] -> None
    
         method push hd =
           v <- hd :: v
    
         (* -- type quantifier*)
         method fold : 'b. ('b -> 'a -> 'b) -> 'b -> 'b =
           (fun f init -> List.fold ~f ~init v)
       end;;
       (*
       class ['a] stack :
         'a list ->
         object
           val mutable v : 'a list
           method fold : ('b -> 'a -> 'b) -> 'b -> 'b
           method pop : 'a option
           method push : 'a -> unit
         end
         *)
    

Inheritance

We can create subclasses that inherit from the parent class and:

  1. add new functionality e.g. adding this print method

    • we can directly inherit from the class using inherit keyword
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    
       class sstack init = object
         inherit [string] stack init
    
         method print =
           List.iter ~f:Stdio.print_endline v
       end;;
       (*
       class sstack :
         string list ->
         object
           val mutable v : string list
           method pop : string option
           method print : unit
           method push : string -> unit
         end
         *)
    
  2. override existing methods

    • we access the original method using as super statement. This creates a special super object through which we can call superclass methods. It’s not a real object though, just a good mental model.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    
       class double_stack init = object
         inherit [int] stack init as super
    
         method push hd =
           super#push (hd * 2)
       end;;
       (*
       class double_stack :
         int list ->
         object
           val mutable v : int list
           method pop : int option
           method push : int -> unit
         end
         *)
    

Class Types

Purpose of a class type is for exposure to other files or modules via a module signature. We may wish to include the classes in module’s signature that is exposed because other modules may inherit from class definitions. To do this we need to specify the class types.

Using class types is not as mainstream in OOP languages. The idea is that a class type specifies the type of each VISIBLE parts of the class, INCLUDING fields and methods. Whatever we omit gets hidden.

Now, we have multiple choices in defining the module type, depending on how much of the implementation we want to expose.

  • one extreme: maximally abstract signature that completely hides the class definitions

    this completely ignores the classes.

  • We may wish to include the classes in module’s signature that is exposed because other modules may inherit from class definitions. To do this we need to specify the class types.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
module Stack = struct
  class ['a] stack init = object
    val mutable v : 'a list = init

    method pop =
      match v with
      | hd :: tl ->
         v <- tl;
         Some hd
      | [] -> None

    method push hd =
      v <- hd :: v
  end

  (** type t defined for the type of our stacks *)
  type 'a t = 'a stack

  let make init = new stack init
end

(*=== defining the module type (the signature we'd put in the interface file): extreme: maximally abstract:  *)
module AbstractStack : sig
   type 'a t = < pop: 'a option; push: 'a -> unit >

   val make : 'a list -> 'a t
end = Stack

(* === including class types, exposing them via the interface. *)
module VisibleStack : sig

  type 'a t = < pop: 'a option; push: 'a -> unit >

  class ['a] stack : object
    val mutable v : 'a list
    method pop : 'a option
    method push : 'a -> unit
  end

  val make : 'a list -> 'a t
end = Stack

Open Recursion

We’ve seen this before. It’s useful because of the dynamic dispatch that we can do. We can use open recursion for cases where using data types or modules would be hard to do.

Open Recursion:

  1. allows object’s methods to invoke other methods on the same object

  2. allows a method in one class to call a method from another class if both classes are inherited by the same object.

    This allows mutually recursive parts of an object to be defined separately.

    Key feature of classes: This ability to define mutually recursive methods. If we did it using data types or modules it would have been much more cumbersome and verbose.

example: recursive traversal of a document structure

IF we had to traverse a document and we only had types defined for a tree with different types of nodes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
type doc =
  | Heading of string
  | Paragraph of text_item list
  | Definition of string list_item list

and text_item =
  | Raw of string
  | Bold of text_item list
  | Enumerate of int list_item list
  | Quote of doc

and 'a list_item =
  { tag: 'a;
    text: text_item list }

It’s not as easy to factor out the common parts of these functions and we likely would have to write recursive code multiple times.

Better way is to define a class and use open recursion. The class just has to define objects that fold over the document data.

Below,

  1. object(self) binds self to the current object which allows the open recursive calls to happen.
  2. we could even inherit from this class and add functionality:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class ['a] folder = object(self)
  method doc acc = function
  | Heading _ -> acc
  | Paragraph text -> List.fold ~f:self#text_item ~init:acc text
  | Definition list -> List.fold ~f:self#list_item ~init:acc list

  method list_item: 'b. 'a -> 'b list_item -> 'a =
    fun acc {tag; text} ->
      List.fold ~f:self#text_item ~init:acc text

  method text_item acc = function
  | Raw _ -> acc
  | Bold text -> List.fold ~f:self#text_item ~init:acc text
  | Enumerate list -> List.fold ~f:self#list_item ~init:acc list
  | Quote doc -> self#doc acc doc
end


class counter = object
  inherit [int] folder as super

  method list_item acc li = acc

  method text_item acc ti =
    let acc = super#text_item acc ti in (* super special obj used to call the [int] folder class's text_item method to fold over the children of text_item*)
    match ti with
    | Bold _ -> acc + 1
    | _ -> acc
end

let count_doc = (new counter)#doc

Private Methods

Are actually more like protected methods since OCaml private methods can be called by subclasses.

Private methods are part of the class type but NOT part of the object type (for objects that are produced from the class). This is what allows subclasses to have access to private methods.

In the example here, we might not want to force subclasses of folder to expose the nitty gritty methods for each of the different doc and text_item cases.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
class ['a] folder2 = object(self)
  method doc acc = function
  | Heading str -> self#heading acc str
  | Paragraph text -> self#paragraph acc text
  | Definition list -> self#definition acc list

  method list_item: 'b. 'a -> 'b list_item -> 'a =
    fun acc {tag; text} ->
      List.fold ~f:self#text_item ~init:acc text

  method text_item acc = function
  | Raw str -> self#raw acc str
  | Bold text -> self#bold acc text
  | Enumerate list -> self#enumerate acc list
  | Quote doc -> self#quote acc doc

  method private heading acc str = acc
  method private paragraph acc text =
    List.fold ~f:self#text_item ~init:acc text
  method private definition acc list =
    List.fold ~f:self#list_item ~init:acc list

  method private raw acc str = acc
  method private bold acc text =
    List.fold ~f:self#text_item ~init:acc text
  method private enumerate acc list =
    List.fold ~f:self#list_item ~init:acc list
  method private quote acc doc = self#doc acc doc
end

(*-- the building of this value shows how the instantiated folder2 object hides private methods*)
let f :
  < doc : int -> doc -> int;
    list_item : 'a . int -> 'a list_item -> int;
    text_item : int -> text_item -> int >  = new folder2



(* ==== we can access private methods from within subclasses because of the access to the class type. *)
class counter_with_private_method = object
  inherit [int] folder2 as super

  method list_item acc li = acc

  method private bold acc txt =
    let acc = super#bold acc txt in
    acc + 1
end

If we wish to have TRULY private classes that no one can access, we just need to omit it from the signature:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
class counter_with_sig : object
  method doc : int -> doc -> int
  method list_item : int -> 'b list_item -> int
  method text_item : int -> text_item -> int
end = object
  inherit [int] folder2 as super

  method list_item acc li = acc

  method private bold acc txt =
    let acc = super#bold acc txt in
    acc + 1
end

Binary Methods

A method that takes in an object of self type e.g. for equality. Equality check is somewhat of an extreme case for binary methods because it needs a full info on the objects for comparisons. In many cases, binary functions likely only need partial info (e.g. is_bigger for an arbitrary shape).

  1. we can get the type of the current object using (self: 'self) type annotation
  2. the code below is only allows objects of the same type to be compared and this is a specific problem we need to solve.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class square w = object(self : 'self)
  method width = w
  method area = Float.of_int (self#width * self#width)
  method equals (other : 'self) = other#width = self#width
end;;
(*
class square :
  int ->
  object ('a)
    method area : float
    method equals : 'a -> bool
    method width : int
  end
  *)
class circle r = object(self : 'self)
  method radius = r
  method area = 3.14 *. (Float.of_int self#radius) **. 2.0
  method equals (other : 'self) = other#radius = self#radius
end;;
(*
class circle :
  int ->
  object ('a)
    method area : float
    method equals : 'a -> bool
    method radius : int
  end
   *)

(* usage: *)
(new square 5)#equals (new square 5);;
(* - : bool = true *)
(new circle 10)#equals (new circle 7);;
(* - : bool = false *)

Our objective is to use a common base shape class and be able to compare equality. At the moment a square=/=circle only expects to be compared with itself and not an arbitrary shape. Other languages would typically solve it using type narrowing or dynamic type-checking. OCaml can’t do that.

why using Polymorphic equality won’t work

Approach that won’t work :

instead of using the equality method within the base type shape, we use the polymorphic equality instead.

This has problems:

  1. the builtin polymorphic equality has poor behaviour when used with objects.

    the builtin polymorphic equality only considers two things as equal if they’re physically equal \(\implies\) we’ll get many false negatives

    1
    2
    3
    4
    
       Poly.(=)
         (object method area = 5 end)
         (object method area = 5 end);;
       (* - : bool = false *)
    

using a representation type and a repr method for equality

Approach that works:

In general, we can use the same solution we used for narrowing: using a representation type implemented using variants and implementing comparisons based on the representation type. We need to expose the repr method, but we may hide the type definition for it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42

(* ==== exposing the repr method (and hiding the type definition) *)
module Shapes : sig
  type shape_repr
  type shape =
    < repr : shape_repr; equals : shape -> bool; area: float >

  class square : int ->
                 object
                   method width : int
                   method area : float
                   method repr : shape_repr
                   method equals : shape -> bool
                 end
end = struct
  type shape_repr =
    | Square of int
    | Circle of int;;
  (* type shape_repr = Square of int | Circle of int *)
  type shape =
    < repr : shape_repr; equals : shape -> bool; area : float >;;
  (* type shape = < area : float; equals : shape -> bool; repr : shape_repr > *)
  class square w = object(self)
    method width = w
    method area = Float.of_int (self#width * self#width)
    method repr = Square self#width
    method equals (other : shape) =
      match (self#repr, other#repr) with
      | Square x, Square x' -> Int.(=) x x'
      | _ -> false
  end;;
(*
  class square :
  int ->
  object
  method area : float
  method equals : shape -> bool
  method repr : shape_repr
  method width : int
  end
 *)
end

using extensible variants

In the previous section, we had a new problem that we could only add new kinds of shapes if we added new constructors to the shape_repr type (i.e. add new tags to the variant type).

Extensible variants separate the definition of a variant type from the definition of its constructors. So it’s an open type and new variants can always be added.

  1. The compiler can’t do exhaustiveness checks on such open types though.
  2. objects created by these classes are in one-to-one correspondence with members of the representation type so the objects seem somewhat redundant.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
type shape_repr = ..;;
(* type shape_repr = .. *)
type shape =
< repr : shape_repr; equals : shape -> bool; area : float >;;
(*
type shape = < area : float; equals : shape -> bool; repr : shape_repr >
*)

(* == adding new constructors *)
type shape_repr += Square of int;;
(* type shape_repr += Square of int *)
class square w = object(self)
  method width = w
  method area = Float.of_int (self#width * self#width)
  method repr = Square self#width
  method equals (other : shape) =
     match (self#repr, other#repr) with
     | Square x, Square x' -> Int.(=) x x'
     | _ -> false
end;;
(*
class square :
  int ->
  object
    method area : float
    method equals : shape -> bool
    method repr : shape_repr
    method width : int
  end
  *)

partial information for binary methods

Equality check is an extreme and in many cases we just need partial information. The larger method can be used on a square and also on any object of the type shape.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class square w = object(self)
  method width = w
  method area = Float.of_int (self#width * self#width)
  method larger (other : shape) = Float.(self#area > other#area)
end;;
(*
class square :
  int ->
  object
    method area : float
    method larger : shape -> bool
    method width : int
  end
*)

Virtual Classes and Methods

OCaml’s use of the word virtual is different from others like C++ (where virtual == dynamic dispatch). In OCaml, all methods use dynamic dispatch.

A virtual class is like a functor because the “inputs” are declared (but not defined) i.e. the virtual methods and fields. “Applying the functor” happens by subclasses doing inheritance: when virtual methods are given concrete implementations.

So virtual means:

  1. method / field in the virtual class is declared but not implemented. In this case both the method/field AND the class are flagged as virtual and can’t be directly instantiated.

Create Some Simple Shapes

For the example below, we realise that there’s some commonalities between the two classes that we can shift into a superclass:

  • definitions of x and y
  • on_click depends on contains and the implementation of contains differs for different shapes \(\implies\) a good candidate for contains to be a virtual method within a virtual class, thereby allowing the definition of contains to be the subclass’s responsibility.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
open Core
open Async
open Async_graphics

(* the ability to describe how to draw the shape on Async_graphics *)
type drawable = < draw: unit >

class square w x y = object(self)
  val mutable x: int = x
  method x = x

  val mutable y: int = y
  method y = y

  val mutable width = w
  method width = width

  (** Shows how to draw the shape using Async_graphics*)
  method draw = fill_rect x y width width

  method private contains x' y' =
    x <= x' && x' <= x + width &&
      y <= y' && y' <= y + width

  (** for adding event-handlers to shapes*)
  method on_click ?start ?stop f =
    on_click ?start ?stop
      (fun ev ->
         if self#contains ev.mouse_x ev.mouse_y then
           f ev.mouse_x ev.mouse_y)
end

class circle r x y = object(self)
  val mutable x: int = x
  method x = x

  val mutable y: int = y
  method y = y

  val mutable radius = r
  method radius = radius

  method draw = fill_circle x y radius

  method private contains x' y' =
    let dx = x' - x in
    let dy = y' - y in
      dx * dx + dy * dy <= radius * radius

  method on_click ?start ?stop f =
    on_click ?start ?stop
      (fun ev ->
         if self#contains ev.mouse_x ev.mouse_y then
           f ev.mouse_x ev.mouse_y)
end

And this is our virtual class:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
class virtual shape x y = object(self)
  (* virtual method -- implementation left to the subclass. *)
  method virtual private contains: int -> int -> bool

  val mutable x: int = x
  method x = x

  val mutable y: int = y
  method y = y

  method on_click ?start ?stop f =
    on_click ?start ?stop
      (fun ev ->
         if self#contains ev.mouse_x ev.mouse_y then
           f ev.mouse_x ev.mouse_y)

  method on_mousedown ?start ?stop f =
    on_mousedown ?start ?stop
      (fun ev ->
         if self#contains ev.mouse_x ev.mouse_y then
           f ev.mouse_x ev.mouse_y)
end

(* === implementing the subclasses: *)
class square w x y = object
  inherit shape x y

  val mutable width = w
  method width = width

  method draw = fill_rect x y width width

  method private contains x' y' =
    x <= x' && x' <= x + width &&
    y <= y' && y' <= y + width
end

class circle r x y = object
  inherit shape x y

  val mutable radius = r
  method radius = radius

  method draw = fill_circle x y radius

  method private contains x' y' =
    let dx = x' - x in
    let dy = y' - y in
      dx * dx + dy * dy <= radius * radius
end

Initializers

Init functions are common in OOP world, we have them in OCaml.

To add in expressions that are executed during instantiation of a class. These expressions can’t refer to the methods of the object because they run before the object’s creation:

  1. place the expressions before the object expression
  2. place the expressions in the initial value of a field

If we wish to make references to methods of the object, then we need to do it differently. We need to use initializers.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
(* === using expressions before the object creation: *)
class obj x =
  let () = Stdio.printf "Creating obj %d\n" x in
  object
    val field = Stdio.printf "Initializing field\n"; x
end;;
(* class obj : int -> object val field : int end *)
let o = new obj 3;;
(*
Creating obj 3
Initializing field
val o : obj = <obj>
*)


(* === using initializers *)
class growing_circle r x y = object(self)
  inherit circle r x y

  initializer
    (* we are using the inherited on-click to add a handler for click events *)
    self#on_click (fun _x _y -> radius <- radius * 2)
end

Multiple Inheritance

OCaml’s support for multiple inheritance means that we have a variety of ways that classes can be combined, useful especially with the use of virtual classes.

The tricky part is when the inheritance heirarchy is less of a tree and more of a graph and so we should be cautioned against that.

How Names are Resolved

This is about name-clashes.

Inheritance is like *textual inclusion – if there’s more than one definition of a name, the last one wins.

Really the most recent definition for that name is what matters if we have clashes.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class square_outline w x y = object
  inherit square w x y
  (* == this draw method is latest defintion -- this is what is used *)
  method draw = draw_rect x y width width
end

class square_outline w x y = object
  method draw = draw_rect x y w w
  (* == the draw method inherited from the superclas (square) is the latest definition, so that is what is used. *)
  inherit square w x y
end

a reiteration of what inheritance means:

  • replace each inherit directive with its definition

  • take the last definition of each method or field.

    Note that the methods and fields added by an inheritance are those listed in its class type, so private methods that are hidden by the type will not be included.

Mixins

Many opinions about multiple inheritance:

  1. that it’s overly complicated
  2. problematic in general and we should use object composition instead

Nevertheless, one general pattern for multiple inheritance that is useful and simple: using the mixin pattern.

Mixins:

  1. just a virtual class implementing a feature based on another one

    If you have a class that implements methods A, and you have a mixin M that provides methods B from A, then you can inherit from M—“mixing” it in—to get features B.

Example is one of mouse-draggability and animation for the shapes we have had:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
(* ==== mixin: allowing the object to be draggable. *)
class virtual draggable = object(self)
  method virtual on_mousedown:
    ?start:unit Deferred.t ->
    ?stop:unit Deferred.t ->
    (int -> int -> unit) -> unit
  val virtual mutable x: int
  val virtual mutable y: int

  val mutable dragging = false
  method dragging = dragging

  initializer
    self#on_mousedown
      (fun mouse_x mouse_y ->
         let offset_x = x - mouse_x in
         let offset_y = y - mouse_y in
         let mouse_up = Ivar.create () in
         let stop = Ivar.read mouse_up in
         dragging <- true;
         on_mouseup ~stop
           (fun _ ->
              Ivar.fill mouse_up ();
              dragging <- false);
         on_mousemove ~stop
           (fun ev ->
              x <- ev.mouse_x + offset_x;
              y <- ev.mouse_y + offset_y))
end

class small_square = object
  inherit square 20 40 40
  inherit draggable
end

(* ==== mixin: animated shapes *)
class virtual animated span = object(self)
  method virtual on_click:
    ?start:unit Deferred.t ->
    ?stop:unit Deferred.t ->
    (int -> int -> unit) -> unit
  val mutable updates: (int -> unit) list = []
  val mutable step = 0
  val mutable running = false

  method running = running

  method animate =
    step <- 0;
    running <- true;
    let stop =
      Clock.after span
      >>| fun () -> running <- false
    in
    Clock.every ~stop (Time.Span.of_sec (1.0 /. 24.0))
      (fun () ->
         step <- step + 1;
         List.iter ~f:(fun f -> f step) updates
      )

  initializer
    self#on_click (fun _x _y -> if not self#running then self#animate)
end
(* --- usage: class will produce circles that move right for a second when clicked. *)
class my_circle = object
  inherit circle 20 50 50
  inherit animated Time.Span.second
  initializer updates <- [fun _ -> x <- x + 5]
end

The initialisers can be added using mixins as well. What’s interesting is because the mixin examples below are only used for side-effects, we can inherit them multiple times to get different animations.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
class virtual linear x' y' = object
  val virtual mutable updates: (int -> unit) list
  val virtual mutable x: int
  val virtual mutable y: int

  initializer
    let update _ =
      x <- x + x';
      y <- y + y'
    in
    updates <- update :: updates
end

let pi = (Float.atan 1.0) *. 4.0

class virtual harmonic offset x' y' = object
  val virtual mutable updates: (int -> unit) list
  val virtual mutable x: int
  val virtual mutable y: int

  initializer
    let update step =
      let m = Float.sin (offset +. ((Float.of_int step) *. (pi /. 64.))) in
      let x' = Float.to_int (m *. Float.of_int x') in
      let y' = Float.to_int (m *. Float.of_int y') in
      x <- x + x';
      y <- y + y'
    in
    updates <- update :: updates
end


(* ==== inheriting the mixins multiple times *)
class my_square x y = object
  inherit square 40 x y
  inherit draggable
  inherit animated (Time.Span.of_int_sec 5)
  inherit linear 5 0
  inherit harmonic 0.0 7 ~-10
end

let my_circle = object
  inherit circle 30 250 250
  inherit animated (Time.Span.minute)
  inherit harmonic 0.0 10 0
  inherit harmonic (pi /. 2.0) 0 10
end

Displaying the Animated Shapes

This is more of a wrap-up section to the examples.

We finish our shapes module by creating a main function to draw some shapes on the graphical display and running that function using the Async scheduler:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
let main () =
  let shapes = [
     (my_circle :> drawable);
     (new my_square 50 350 :> drawable);
     (new my_square 50 200 :> drawable);
     (new growing_circle 20 70 70 :> drawable);
  ] in
  let repaint () =
    clear_graph ();
    List.iter ~f:(fun s -> s#draw) shapes;
    synchronize ()
  in
    open_graph "";
    auto_synchronize false;
    Clock.every (Time.Span.of_sec (1.0 /. 24.0)) repaint

let () = never_returns (Scheduler.go_main ~main ())

The OCaml builtin graphics functionality is more of a learning tool.

There are better packages / bindings out there:

  1. Lablgtk – based on gtk
  2. LablGL – uses OpenGL
  3. js_of_ocaml – compiles ocaml into js

Chapter 14: Maps and Hash Tables

When storing associated, organised data, OCaml’s association lists come to mind but they’re inefficient because most actions are in \(O(n)\) time.

Here’s a short example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
open Base;;
let digit_alist =
  [ 0, "zero"; 1, "one"; 2, "two"  ; 3, "three"; 4, "four"
  ; 5, "five"; 6, "six"; 7, "seven"; 8, "eight"; 9, "nine" ];;
(*
val digit_alist : (int * string) list =
  [(0, "zero"); (1, "one"); (2, "two"); (3, "three"); (4, "four");
   (5, "five"); (6, "six"); (7, "seven"); (8, "eight"); (9, "nine")]
   *)
List.Assoc.find ~equal:Int.equal digit_alist 6;;
(* - : string option = Some "six" *)
List.Assoc.find ~equal:Int.equal digit_alist 22;;
(* - : string option = None *)
List.Assoc.add ~equal:Int.equal digit_alist 0 "zilch";;
(*
- : (int, string) Base.List.Assoc.t =
[(0, "zilch"); (1, "one"); (2, "two"); (3, "three"); (4, "four");
 (5, "five"); (6, "six"); (7, "seven"); (8, "eight"); (9, "nine")]
 *)

Maps (work in logarithmic time) and Hash Tables (work in constant time) offer better performance.

Maps

When looking at the basic syntax for using Maps, the following things stand out:

  1. The type definition for type t uses Map.t which is supplied 3 types params:

    1. one for the type of the key

    2. one for the type of the data

    3. one for the comparator witness :

      Indicates which comparator function was used to construct the map. It doesn’t say anything about the type of data stored in the map.

      In the example below, String.comparator_witness tells us the default string comparator was used

  2. Map.empty takes in a first-class module as an argument.

    This first-class module provides the comparison function for building the map, along with the sexp converter for generating useful error messages.

    This is also why we don’t need to provide the module again when using functions like Map.find or Map.add.

    There’s some requirements we need to meet for our custom modules to be able to work like this (to be passed into empty map). KIV

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
(* ===== interface *)
open Base

(** A collection of string frequency counts *)
type t

(** The empty set of frequency counts *)
val empty : t

(** Bump the frequency count for the given string. *)
val touch : t -> string -> t

(** Converts the set of frequency counts to an association list. Every
    string in the list will show up at most once, and the integers
    will be at least 1. *)
val to_list : t -> (string * int) list

(*  ==== implementation *)
open Base

type t = (string, int, String.comparator_witness) Map.t

let empty = Map.empty (module String)
let to_list t = Map.to_alist t

let touch t s =
  let count =
    match Map.find t s with
    | None -> 0
    | Some x -> x
  in
  Map.set t ~key:s ~data:(count + 1)

Sets

Base gives a set data type also. It’s possible to encode sets in terms of maps but it’s more efficient and natural to use Base’s specialised set type.

We have the typical set operations that one would expect in any programming languages.

1
2
3
4
5
6
7
8
open Base
Set.of_list (module Int) [1;2;3] |> Set.to_list;;
(* - : int list = [1; 2; 3] *)
Set.union
  (Set.of_list (module Int) [1;2;3;2])
  (Set.of_list (module Int) [3;5;1])
|> Set.to_list;;
(* - : int list = [1; 2; 3; 5] *)

Modules and Comparators

This section is about how we can make comparator witnesses work for our custom modules so that we can create maps typed with our custom modules. NOTE: the actual elaboration is done here in this section.

First, creating maps/sets has some useful functions that we should take note:

  1. Creating maps from alists:
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    
       open Base;;
       let digit_alist =
         [ 0, "zero"; 1, "one"; 2, "two"  ; 3, "three"; 4, "four"
         ; 5, "five"; 6, "six"; 7, "seven"; 8, "eight"; 9, "nine" ];;
       (*
       val digit_alist : (int * string) list =
         [(0, "zero"); (1, "one"); (2, "two"); (3, "three"); (4, "four");
          (5, "five"); (6, "six"); (7, "seven"); (8, "eight"); (9, "nine")]
          *)
    
       let digit_map = Map.of_alist_exn (module Int) digit_alist;; (* -- this will exn on duplicate keys encountered*)
       (* val digit_map : (int, string, Int.comparator_witness) Map.t = <abstr> *)
       Map.find digit_map 3;;
       (* - : string option = Some "three" *)
    

In order for us to make maps and sets from our own custom modules, we need to satisfy the Base.Comparator.S signature. There’s a common idiom for doing this:

  1. for our custom module e.g. Book, we create a submodule, T that contains the basic functionality for OUR custom type (Book)

  2. Include that submodule, T

  3. Also include the result of applying a functor to that module. The functor here may be:

    • Comparator.Make(T)

    • Comparable.Make(T): has some more helper functions e.g. infix comparison operators and min and max functions, in addition to the comparator itself.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
module Book = struct
  module T = struct

    type t = { title: string; isbn: string }

    let compare t1 t2 =
      let cmp_title = String.compare t1.title t2.title in
      if cmp_title <> 0 then cmp_title
      else String.compare t1.isbn t2.isbn

    let sexp_of_t t : Sexp.t =
      List [ Atom t.title; Atom t.isbn ]

  end
  include T (*includes our custom module's implementation*)
  include Comparator.Make(T) (*applies the functor*)
end;;

let some_programming_books =
  Set.of_list (module Book)
    [ { title = "Real World OCaml"
      ; isbn = "978-1449323912" }
    ; { title = "Structure and Interpretation of Computer Programs"
      ; isbn = "978-0262510875" }
    ; { title = "The C Programming Language"
      ; isbn = "978-0131101630" } ];;
(*
val some_programming_books : (Book.t, Book.comparator_witness) Set.t =
  <abstr>
  *)

Why Do We Need Comparator Witnesses?

A comparator witness isn’t about tracking what kind of data we’re using — it’s about tracking how that data is ordered.

Some map and set operations (such as merging, union, or intersection) depend on both operands being ordered according to the same total order. That total order of individual operands is determined by the comparison function used when the data structure was created.

The compiler can’t just rely on the type of the operands here. The problem is that the type 'a alone doesn’t tell us which comparison function was used. e.g. 2 maps might both use keys of type int, yet be ordered by different comparison functions (e.g. ascending vs. descending). If we combined them blindly, we’d break the data structure’s invariants. Instead of having this ambiguity, we can have an explicit outcome defined in the form of a witness – hence “singles out” the ordering function that defines the structure’s total order

That’s why Comparator Witnesses need to be used.

Values that witness a specific comparison function and carry its identity at the type level. This ensures that only maps and sets built with the same comparator witness can be combined, enforcing correctness through the type system.

This is a code example of that:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
(* === init some modules that will be comparable, but use different comparators as the functor *)

(* here's the ordinary version:  *)
let alist = ["foo", 0; "snoo", 3];;
let ord_map = Map.of_alist_exn (module String) alist;;


(* this Reverse module is to act as a parallel to the ordinary version (which is just from an alist) *)
module Reverse = struct
  module T = struct
    type t = string
    let sexp_of_t = String.sexp_of_t
    let t_of_sexp = String.t_of_sexp
    let compare x y = String.compare y x
  end
  include T
  include Comparator.Make(T)
end;;

(* reverse_map: *)
let rev_map = Map.of_alist_exn (module Reverse) alist;;

(* ==== demonstration that although the types of their items are the same (ints), the ordering is different *)
Map.min_elt ord_map;;
(* - : (string * int) option = Some ("foo", 0) *)
Map.min_elt rev_map;;
(* - : (string * int) option = Some ("snoo", 3) *)

(* ==== example intended use for the two maps:  *)
Map.symmetric_diff ord_map rev_map;;
(* -- this will error out because the comparator witnesses are different

Line 1, characters 28-35:
Error: This expression has type
         (string, int, Reverse.comparator_witness) Map.t
       but an expression was expected of type
         (string, int, String.comparator_witness) Map.t
       Type Reverse.comparator_witness is not compatible with type
         String.comparator_witness
         *)

The Polymorphic Comparator

We have the choice of using the polymorphic comparator that is builtin instead of having to implement it all the time for our custom modules. However, we should use it with knowledge of their nuances.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
Map.Poly.of_alist_exn digit_alist;;
(* - : (int, string) Map.Poly.t = <abstr> *)

(* --- this will error out : *)
Map.symmetric_diff
  (Map.Poly.singleton 3 "three")
  (Map.singleton (module Int) 3 "four" );;
(*
  Line 3, characters 5-43:
  Error: This expression has type (int, string, Int.comparator_witness) Map.t
  but an expression was expected of type
  (int, string, Map.Poly.comparator_witness) Map.t
  Type Int.comparator_witness is not compatible with type
  Map.Poly.comparator_witness
 *)
  1. maps based on the polymorphic comparator have different comparator witnesses than those based on the type-specific comparison function

    this is expected because there’s NO guarantee that the comparator associated with a given type will order things in the same way that polymorphic compare does.

  • the perils of polymorphic compare

    RULE OF THUMB: polymorphic compare should be AVOIDED in production code.

    Problems:

    1. compares directly on the runtime representation of OCaml values

      So, it will walk the structure of those values regardless of their type \(\implies\) it’s type-oblivious

      This means that it peeks under the ordinary abstraction boundaries which can lead to counter-intuitive outcomes:

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      
         (* === example: m1 and m2 should technically be seen as the SAME map *)
         let m1 = Map.of_alist_exn (module Int) [1, "one";2, "two"];;
         (* val m1 : (int, string, Int.comparator_witness) Map.t = <abstr> *)
         let m2 = Map.of_alist_exn (module Int) [2, "two";1, "one"];;
         (* val m2 : (int, string, Int.comparator_witness) Map.t = <abstr> *)
      
         (* ==== using a more idiomatic comparison (Map.equal) shows this equality *)
         Map.equal String.equal m1 m2;;
         (* - : bool = true *)
      
         (* ==== PROBLEM: polymorphic compare walks the datastructures which will have a different underlying layout of the underlying trees so Polymorphic compare will say they are not equal *)
         Poly.(m1 = m2);;
         (* --- note this errors out because maps store the comparison function they were created with and poly compare won't work on functions *)
         (* Exception: Invalid_argument "compare: functional value". *)
      
         (* ==== Polymorphic compare on the binary tree  *)
         (* --- Map.Using_comparator.to_tree exposes the binary tree. *)
         Poly.((Map.Using_comparator.to_tree m1) = (Map.Using_comparator.to_tree m2));;
         (* - : bool = false *)
      
    2. polymorphic compare won’t work on functions (naturally). So when it does the datastructure-walk, if there are function stored then a (runtime) exception will be raised. This is clear when we directly poly-compare maps (example above).

    1. The abstraction breaking can result in bugs that are subtle e.g. a map where keys are sets? then the map built with the polymorphic comparator behaves incorrectly and inconsistently and separates out the keys that should be aggregated together.

      The inconsistency would be from differences in the order in which the sets were built.

Satisfying Comparator.S with [@@deriving]

Using maps and sets on our custom type means that we need to satisfy Comparator.S:

  1. need sexp converters for the type
  2. need comparison functions for the type

We can rely on derivation annotations (Base provides this) instead of manually having to write this all the time (naturally we can always write them ourselves if we want some custom logic to be used).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
(* ==== CASE 1: no derivation ppx used at all *)
module Book = struct
  module T = struct

    type t = { title: string; isbn: string }

    let compare t1 t2 =
      let cmp_title = String.compare t1.title t2.title in
      if cmp_title <> 0 then cmp_title
      else String.compare t1.isbn t2.isbn

    let sexp_of_t t : Sexp.t =
      List [ Atom t.title; Atom t.isbn ]

  end
  include T
  include Comparator.Make(T)
end

(* ==== CASE 2: using ppx_sexp_conv and ppx_compare  *)
#require "ppx_jane";;
module Book = struct
  module T = struct
    type t = { title: string; isbn: string }
    [@@deriving compare, sexp_of] (* -- these are the annotations *)
  end
  include T
  include Comparator.Make(T)
end;;
  • =, == and phys_equal

    There’s different notions of equality:

    1. physically equal: if same pointer in memory

      so 2 data structures with identical content but structured separately would have different pointers in memory and hence physically unequal.

    2. polymorphic equal: it’s a structural equality

      the 2 data structures would be equal structurally – values having the same contents

      This kind of equality is the same polymorphic equals point we were talking about earlier in the previous section.

    Not only does OCaml have multiple notions of equality (and picking between them can be tricky), the opened default library may default to a different kind of equality.

    1. Core:

      • physical equality: ==
      • polymorphic equality: =
    2. Base:

      • physical equality: can be done using phys_equal
      • polymorphic equality: hidden – == is deprecated
      • = is type-specific equality (operator overloading based on what type is used)
      1
      2
      
         ref 1 == ref 1;; (*-- this is deprecated*)
         phys_equal (ref 1) (ref 1) ;;
      

Applying [@@deriving] to Maps and Sets (special syntax)

Previously we saw how we could use [@@deriving] annotation and ready our custom types to be used with maps/sets.

  • won’t work:

    If we try to do that with maps directly, this won’t work. There’s no equivalent Map.t_of_sexp for maps. this is because there’s no proper way to serialise/deserialise the comparator witness using sexps.

  • will work: using Map.M map factory functor to define the type, which uses two-level functorization to encode key types and comparator witness.

    This is the map factory functor that fixes the key type and comparator witness once and for all. At point in time when the functor is applied, the key and comparator witness have already been closed over. This makes Map.M(Key).t equivalent to ('key, 'value, Key.comparator_witness), but written in a form that behaves like a partially applied functor over the key type.

    The signature for this functor looks something like this:

    1
    2
    3
    4
    5
    6
    
      module Map.M(String) :
        sig
          type 'v t (* parameterised value type, the key type is already captured *)
          val t_of_sexp : (Sexp.t -> 'v) -> Sexp.t -> 'v t
          ...
        end
    

Summary:

  • Map.t = fully generic map type (key, value, comparator).

  • Map.M(Key).t = specialized map type where the key and comparator modules are fixed, leaving only the value type parameter.

RULE OF THUMB: just use this functor (Map.M(Key).t)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
(* === these ways won't work: *)
type string_int_map =
  (string,int,String.comparator_witness) Map.t [@@deriving sexp];;
(*
Line 2, characters 44-49:
Error: Unbound value Map.t_of_sexp
Hint: Did you mean m__t_of_sexp?
*)

(* === this way will work because we use the Map.M functor: *)
type string_int_map = int Map.M(String).t [@@deriving sexp];;
(*
type string_int_map = int Base.Map.M(Base.String).t
val string_int_map_of_sexp : Sexp.t -> string_int_map = <fun>
val sexp_of_string_int_map : string_int_map -> Sexp.t = <fun>
*)


(* === demo: the type signature looks different but the meaning is same *)
let m = Map.singleton (module String) "one" 1;; (* -- creates map*)
(* val m : (string, int, String.comparator_witness) Map.t = <abstr> *)
(m : int Map.M(String).t);;
(* - : int Base.Map.M(Base.String).t = <abstr> *)

Trees – for space efficiency

Maps carry within them the comparator they were created with, but we could wish for space efficiency and just keep hold of ONLY the tree.

The comparator can still be typed via a phantom type, even if it physically wouldn’t include the comparator anymore. The only thing is that we’d need to explicitly provide the comparator when we use this tree-form of the map.

Phantom Type:

  • we had seen this earlier in our little language example where we encoded extra compile-time information without affecting runtime representation. Now we benefit from its use elsewhere.

  • the type doesn’t correspond to any value directly represented in the underlying physical structure of the value.

    i.e. it doesn’t show up in the definition of the type.

  • the type reflects something about the logic of the value of the question

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    
      let ord_tree = Map.Using_comparator.to_tree ord_map;;
      (* --- see how the phantom type shows up here:
    
      val ord_tree :
        (string, int, String.comparator_witness) Map.Using_comparator.Tree.t =
        <abstr>
        *)
    
    
      (* ==== example of how to use the map with only its tree available. *)
      Map.Using_comparator.Tree.find ~comparator:String.comparator ord_tree "snoo";;
      (* - : int option = Some 3 *)
    
      (* -- the phantom type helps us enforce the invariant: we're using the same comparator when looking pu a value as when we stored it. So this example will fail: *)
      Map.Using_comparator.Tree.find ~comparator:Reverse.comparator ord_tree "snoo";;
      (*
      Line 1, characters 63-71:
      Error: This expression has type
               (string, int, String.comparator_witness) Map.Using_comparator.Tree.t
             but an expression was expected of type
               (string, int, Reverse.comparator_witness)
               Map.Using_comparator.Tree.t
             Type String.comparator_witness is not compatible with type
               Reverse.comparator_witness
      *)
    

Hash Tables

The imperative cousins of maps, they differ in the following ways:

  1. they are mutable: adding a k-v pair modifies the existing table

  2. have better time complexity than maps

    it’s constant time lookup and modifications vs log time for maps

  3. hash tables depend on hash function (key to integer), maps depend on comparator function

  4. when creating hash tables, we don’t need any type witnesses (like comparator witness needed for maps/sets) \(\implies\) the syntax is easier

    REASON: No Keeping the same hash function / comparator function preserves no special invariant in the case of Hash Tables

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
(* === custom types just need to satisfy the signature for hashtable key: Base.Hashtbl.Key.S *)
#show Base.Hashtbl.Key.S;;
(* module type S = Base__Hashtbl_intf.Key.S *)



(* === we can use derive annotation for the hash function, makes our custom type hashable. *)
module Book = struct
  type t = { title: string; isbn: string }
  [@@deriving compare, sexp_of, hash]
end;;
(*
module Book :
  sig
    type t = { title : string; isbn : string; }
    val compare : t -> t -> int
    val sexp_of_t : t -> Sexp.t
    val hash_fold_t :
      Base_internalhash_types.state -> t -> Base_internalhash_types.state
    val hash : t -> int
  end
  *)
let table = Hashtbl.create (module Book);;
(* val table : (Book.t, '_weak2) Base.Hashtbl.t = <abstr> *)

We could also use OCaml’s polymorphic hash function. RULE OF THUMB: don’t use it in areas where correctness matters.

Time Complexity of Hash Tables

There needs to be some clarification on the time complexity aspects described:

  • it’s amortized time complexity.

    The “constant time lookup and update” is more of amortised costs actually. Like other languages, OCaml’s hash table does need resizing so some operations may get expensive.

  • the cost of the hash function matters

  • collision, for Base leads to the use of binary trees for the hash-bucket \(\implies\) if your hash function is sub-par and has many collisions then hash table use ends up being logarithmic time complexity

    NOTE: this is useful to protect against DDOS attacks!

    The logarithmic behavior of Base’s hash tables in the presence of hash collisions also helps protect against some denial-of-service attacks. One well-known type of attack is to send queries to a service with carefully chosen keys to cause many collisions. This, in combination with the linear behavior of most hashtables, can cause the service to become unresponsive due to high CPU load. Base’s hash tables would be much less susceptible to such an attack because the amount of degradation would be far less.

Collisions with the Polymorphic Hash Function

Like Polymorphic Compare, Polymorphic Hash doesn’t pay attention to the type and blindly walks down the structure of a data type, computing the hash from what it sees.

Problems:

  1. So as long as physical equality is not there, even with equivalent data, it will deem hash tables as unequal

  2. extra prone to creating hash collisions.

    this is because it works by BFS traversal walking over the data structure. Also it is bounded in the number of nodes it’s willing to traverse – which defaults to 10 “meaningful” nodes.

    the bound means that parts of the data structure may be ignored and we result in pathological cases where every value we store has the same hash value.

    1
    2
    3
    4
    5
    6
    7
    8
    
       Hashtbl.Poly.hashable.hash (List.range 0 9);;
       (* - : int = 209331808 *)
       Hashtbl.Poly.hashable.hash (List.range 0 10);;
       (* - : int = 182325193 *)
       Hashtbl.Poly.hashable.hash (List.range 0 11);;
       (* - : int = 182325193 *)
       Hashtbl.Poly.hashable.hash (List.range 0 100);;
       (* - : int = 182325193 *)
    

RULE OF THUMB: for large custom data-structures, write your own hash function OR use [@@deriving]

  • Inline create hash function using ppx_hash

    SYNTACTIC SUGAR: We can just use %hash directly. It’s a shorter form of [@@deriving hash] – that’s what the previous example uses.

Choosing between Maps and Hash Tables

They overlap in functionality so we need to decide when to use what?

  1. idiom: follow the programming paradigm

    immutable maps (for functional code) vs mutable hash tables (for imperative code)

  2. idiom: focus on performance needs and data volume

    For code that is dominated by updates and lookups, hash tables are a clear performance win, and the win is clearer the larger the amount of data.


Here’s some benchmarks:

  1. Bench mark to demonstrate generally more efficient hashtables than maps

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    
        open Base
        open Core_bench
    
        let map_iter ~num_keys ~iterations =
          let rec loop i map =
            if i <= 0
            then ()
            else
              loop
                (i - 1)
                (Map.change map (i % num_keys) ~f:(fun current ->
                     Some (1 + Option.value ~default:0 current)))
          in
          loop iterations (Map.empty (module Int))
    
        let table_iter ~num_keys ~iterations =
          let table = Hashtbl.create (module Int) ~size:num_keys in
          let rec loop i =
            if i <= 0
            then ()
            else (
              Hashtbl.change table (i % num_keys) ~f:(fun current ->
                  Some (1 + Option.value ~default:0 current));
              loop (i - 1))
          in
          loop iterations
    
        let tests ~num_keys ~iterations =
          let t name f = Bench.Test.create f ~name in
          [ t "table" (fun () -> table_iter ~num_keys ~iterations)
          ; t "map" (fun () -> map_iter ~num_keys ~iterations)
          ]
    
        let () =
          tests ~num_keys:1000 ~iterations:100_000
          |> Bench.make_command
          |> Command_unix.run
    

    the dune file:

    1
    2
    3
    
        (executable
          (name      map_vs_hash)
          (libraries base core_bench core_unix.command_unix))
    
    1
    2
    3
    4
    5
    6
    7
    8
    
        dune build map_vs_hash.exe
        ./_build/default/map_vs_hash.exe -ascii -quota 1 -clear-columns time speedup
        Estimated testing time 2s (2 benchmarks x 1s). Change using -quota SECS.
    
          Name    Time/Run   Speedup
         ------- ---------- ---------
          table    13.34ms      1.00
          map      44.54ms      3.34
    
  2. Benchmark to demonstrate that when we need to keep multiple related copies (like snapshots) of data at once, maps are better because they’re immutable

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    
       open Base
       open Core_bench
    
       let create_maps ~num_keys ~iterations =
         let rec loop i map =
           if i <= 0
           then []
           else (
             let new_map =
               Map.change map (i % num_keys) ~f:(fun current ->
                   Some (1 + Option.value ~default:0 current))
             in
             new_map :: loop (i - 1) new_map)
         in
         loop iterations (Map.empty (module Int))
    
       let create_tables ~num_keys ~iterations =
         let table = Hashtbl.create (module Int) ~size:num_keys in
         let rec loop i =
           if i <= 0
           then []
           else (
             Hashtbl.change table (i % num_keys) ~f:(fun current ->
                 Some (1 + Option.value ~default:0 current));
             let new_table = Hashtbl.copy table in
             new_table :: loop (i - 1))
         in
         loop iterations
    
       let tests ~num_keys ~iterations =
         let t name f = Bench.Test.create f ~name in
         [ t "table" (fun () -> ignore (create_tables ~num_keys ~iterations))
         ; t "map" (fun () -> ignore (create_maps ~num_keys ~iterations))
         ]
    
       let () =
         tests ~num_keys:50 ~iterations:1000
         |> Bench.make_command
         |> Command_unix.run
    
    1
    2
    3
    
       executable
         (name      map_vs_hash2)
         (libraries core_bench core_unix.command_unix))
    
    1
    2
    3
    4
    5
    6
    7
    8
    
       dune build map_vs_hash2.exe
       ./_build/default/map_vs_hash2.exe -ascii -clear-columns time speedup
       Estimated testing time 20s (2 benchmarks x 10s). Change using -quota SECS.
    
         Name      Time/Run   Speedup
        ------- ------------ ---------
         table   4_453.95us     25.80
         map       172.61us      1.00
    

TODO Chapter 15: CLI Parsing

Basic Command-Line Parsing

Defining and Anonymous Argument

Defining Basic Commands

Running Commands

Multi-Argument Commands

Argument Types

Defining Custom Argument Types

Optional and Default Arguments

Sequences of Arguments

Adding Labeled Flags

Grouping Subcommands Together

Prompting for Interactive Input

Command-Line Autocompletion with bash

Generating Completion Fragments from Command

Installing the Completion Fragment

  • Installing a Generic Completion Handler

Alternative Command-line Parsers