Chapter 11: First-Class Modules
OCaml essentially has 2 parts to the language:
a core language that is concerned with values and types
can’t contain modules or module types
So we can’t define a variable whose value is a module or a function that takes a module as an argument.
a module language that is concerned with modules and module signatures
- can contain types and values
However, OCaml provides a language construct to circumvent this stratification: first-class modules – can be created from and converted back to regular modules
Letting modules into the core language is powerful, it increases the range of what we can express and makes it easier to build flexible and modular systems.
Working with First-Class Modules
We’re going to use some toy examples to illustrate the following points:
Creating First-Class Modules (packaging)
Creating first-class modules requires us to package a module up with a signature that satisfies it.
Given a module signature and a module definition, we have an example of how this packaging can be done using the syntax (module <Module> : <Module_type>)
| |
Inference and Anonymous Modules
- From the packaging syntax, we can module type if it can be inferred
- We can also do this wrapping if the module is anonymous
| |
Unpacking first-class modules
Since we know how to package a module into a first-class module, we should know how to unpack a module and access its contents using the syntax (val <first_class_module> : <Module_type>)
| |
Functions for Manipulating First-Class Modules
Since they are first-class we have some ways to manipulate them (including ordinary functions that consume and create first-class modules):
this function is about conversions between the first-class module (runtime) from the Module language (compile-time).
If we wish to do module-field projections, then it needs to be in Module language (unpacked) instead of the packed first-class language.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31(* consumes a module and returns the int within it *) let to_int m = let module M = (val m : X_int) in M.x;; (* val to_int : (module X_int) -> int = <fun> *) (* consumes 2 packed modules, returns a first-class module *) let plus m1 m2 = (module struct let x = to_int m1 + to_int m2 end : X_int);; (* val plus : (module X_int) -> (module X_int) -> (module X_int) = <fun> *) let res = plus three (module Four);; module Res = (val res : X_int);; Res.x;; (* -- this is a field inspection*) (* this works *) module Foo = (val (plus three (module Four: X_int)) : X_int);; Foo.x;; (* works because Foo is a module-identifier so Foo.x is a module-field projection. Projections work only for syntactic modules, not for runtime first-class values.*) (* this doesn't work: (* (val (plus three (module Four : X_int)) : X_int) *) this is not a module, it's a first-class value of a type -- an existential package. We need to unpack the first-class module (runtime) into a module binding (compile-time) then do the field projection. *) (* forcing out an inline example: *) let module Foo = (val (plus three (module Four : X_int)) : X_int) in Foo.x;;We can pattern match to unpack a first-class module
1 2 3 4 5 6 7 8 9 10(* --- unpacking via pattern-match, more concise *) let to_int (module M : X_int) = M.x;; (* val to_int : (module X_int) -> int = <fun> *) (* OLD way: consumes a module and returns the int within it *) let to_int m = let module M = (val m : X_int) in M.x;; (* val to_int : (module X_int) -> int = <fun> *)So all in all, we have good expressiveness:
1 2 3 4let six = plus three three;; (* val six : (module X_int) = <module> *) to_int (List.fold ~init:six ~f:plus [three;three]);; (* - : int = 12 *)
Richer First-Class Modules
Let’s go beyond simple int values and let the first-class modules contain types and functions.
| |
Exposing Types
Continuing on, int_bumper is fully abstract and so we can’t exploit the fact that the type in question is int. We can’t really do anything with values of Bumper.t.
we can’t do this:
| |
option 1: using a sharing constraint
To make
int_bumperusable, we need to expose that the typeBumpable.tis equal tointfor which we can use constraint sharing1 2 3 4 5 6 7 8 9 10 11 12 13let int_bumper = (module Int_bumper : Bumpable with type t = int);; (* val int_bumper : (module Bumpable with type t = int) = <module> *) let float_bumper = (module Float_bumper : Bumpable with type t = float);; (* val float_bumper : (module Bumpable with type t = float) = <module> *) (* usage works:*) let (module Bumper) = int_bumper in Bumper.bump 3;; (* - : int = 4 *) let (module Bumper) = float_bumper in Bumper.bump 3.5;; (* - : float = 4.5 *)
option 2: using a locally abstract type to make polymorphic first-class modules
We can use first-class modules polymorphically, consider this function that takes in 2 args:
Bumpablemodule and list of elements of the same type as typetof the module.The
type a(pseudoparameter) allows us to useaas a locally abstract type.athen acts like an abstract type within the context of the function.In this example, we then use the locally abstract type as part of a sharing constraint that ties the type
B.twith the type of the elements of the list passed in.This makes the function polymorphic in both the type of the list element and the type
Bumpable.tas we see in the usage section of the code example:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16let bump_list (type a) (module Bumper : Bumpable with type t = a) (l: a list) = List.map ~f:Bumper.bump l;; (* val bump_list : (module Bumpable with type t = 'a) -> 'a list -> 'a list = <fun> *) (* === polymorphic usage: *) bump_list int_bumper [1;2;3];; (* - : int list = [2; 3; 4] *) bump_list float_bumper [1.5;2.5;3.5];; (* - : float list = [2.5; 3.5; 4.5] *)Polymorphic first-class modules are important because they allow you to connect the types associated with a first-class module to the types of other values you’re working with.
More on locally abstract types
One of the key properties of locally abstract types is that they’re dealt with as abstract types in the function they’re defined within, but are polymorphic from the outside.
1 2 3 4 5 6 7 8 9let wrap_in_list (type a) (x:a) = [x];; (* val wrap_in_list : 'a -> 'a list = <fun> *) (*<-- "a" is used ina way that is compatible with it being abstract but hte type of the function that is inferred is polymorphic. *) (* so compiler complains if we try this: *) let double_int (type a) (x:a) = x + x;; (* Line 1, characters 33-34: Error: This expression has type a but an expression was expected of type int *)Locally abstract types are useful because they let us create a fresh type name that can be referenced inside type definitions:
- can be used to build local modules
- can be used to wire types to functors in a type-safe way
see how we create a new first-class module here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20module type Comparable = sig type t val compare : t -> t -> int end;; (* module type Comparable = sig type t val compare : t -> t -> int end *) let create_comparable (type a) compare = (module struct type t = a (* this equality is internal to the module *) let compare = compare end : Comparable with type t = a);; (* -- the locally abstract type exposes the type equality: external to the module.*) (* val create_comparable : ('a -> 'a -> int) -> (module Comparable with type t = 'a) = <fun> *) let int_compare = create_comparable Int.compare;; (* - : (module Comparable with type t = int) = <module> *) let float_compare = create_comparable Float.compare;; (* - : (module Comparable with type t = float) = <module> *) let module I_comp = (val int_compare) in (* seems like the module type is inferrable*) I_comp.compare 2 3;;Disambiguating “abstract” and “polymorphic”
Concept Meaning Typical Context Example Abstract type A type whose concrete representation is hidden or unknown. Module boundaries or locally abstract types (inside their scope). `type t` in a signature; `(type a)` inside `let`. Polymorphic type A function or value that works for any type. `‘a`, `‘b` quantified type parameters. `let id : ‘a -> ‘a = fun x -> x`. Locally Abstract Type Behaves as abstract inside its definition but is polymorphic outside. GADTs, first-class modules, polymorphic recursion. `let f (type a) (x : a expr) = …` The difference is in their levels of generality and different mechanisms:
“Abstract” means “unknown (for now)”
for an abstract type, the concrete representation is intentionally hidden
it doesn’t mean that the type can vary between calls / doesn’t mean that it can be generalised across calls – it just means that we can’t see inside it
compiler treats this like a distinct opaque name, unequal to others even if implementation is
intorstringunderneathSo type
tis an abstract (but one fixed type), known only withinM1 2 3 4 5 6 7module M : sig type t (* abstract -- to users of this module, t can't be treated as int *) val create : unit -> t end = struct type t = int (* concrete inside the module *) let create () = 42 end
“Polymorphic” means “general across types”
“Polymorphism” == “Generality across many possible types”.
so a polymorphic value or function will work uniformly over any type. Typically we define this using
'ainstead ofa.1 2let id (x : 'a) : 'a = x (* id : 'a -> 'a *)“locally abstract types” mix the two up:
“One of the key properties of locally abstract types is that they’re dealt with as abstract types in the function they’re defined within, but are polymorphic from the outside.”
When we say
let f (type a) (x: a t) = ...:inside the function
f, we wishato be treated as an abstract type – unknown, fixed and opaque (we can’t assume whatais)- inside the definition,
aacts like an abstract placeholder (“opaque” – we only know “some type”)
- inside the definition,
from outside the function
f,fis polymorphic ina: it can be used with any concrete instantiation ofa.- so outside the definition, the function is universally quantified over
aso we can call it with any concrete type (i.e. it’s polymorphic)
- so outside the definition, the function is universally quantified over
Example: A Query-handling framework
We will have a system that responds to user-generated queries. System uses sexps for formatting queries and responses, and also the config for the query handler.
The signature here is for a module that implements a system for responding to user-generated queries. Other serialisation formats (JSON) could have worked too.
So here’s our Query_handler interface.
| |
sexp converters are tedious to implement by hand, we can use
ppx_sexp_convto generate the sexp converters based on their type definition.We can also use the annotations within a signature to add the appropriate type signature. The purpose of applying the annotation to the interface is that the compiler knows that a module implementation that satisfies the signature has those sexp functions.
When applied within a type definition (implementation), the compiler will emit code for those functions – so code is generated.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19(* applying the annotation on a type to derive the sexp converter functions *) #require "ppx_jane";; type u = { a: int; b: float } [@@deriving sexp];; (* type u = { a : int; b : float; } val u_of_sexp : Sexp.t -> u = <fun> val sexp_of_u : u -> Sexp.t = <fun> *) sexp_of_u {a=3;b=7.};; (* - : Sexp.t = ((a 3)(b 7)) *) u_of_sexp (Core.Sexp.of_string "((a 43) (b 3.4))");; (* - : u = {a = 43; b = 3.4} *) (* using the annotation directly within the signature (implementation) *) module type M = sig type t [@@deriving sexp] end;; (* module type M = sig type t val t_of_sexp : Sexp.t -> t val sexp_of_t : t -> Sexp.t end *)Same annotations can be attached within a signature to add the appropriate type signature:
Implementing a query handler
Given the Query_handler interface, we can create some query handler modules.
| |
Here’s another example of a query handler that does directory listings:
| |
Dispatching to Multiple Query Handlers
We’d like to create a whole dispatch table and dispatch queries.
It’s natural to use data structures like lists (or tables) using first-class modules, which would have been odd to do with modules and functors alone.
For the following example, assume (query-name query) is the shape of a single query, where query-name is the name used to determine which handler to dispatch the query to and query is the body of the query (as a sexp).
- First we shall have a signature that combines a
Query_handlermodule with an instantiated form of a query handler- this part looks very similar to OOP code with the module type and the
this
- this part looks very similar to OOP code with the module type and the
- We can have a function that uses a locally abstract type to construct new instances, as a sort of a higher order function
- then we an have a dispatch-table builder
- then we can add in a dispatcher function
this part looks like OOP code
One key difference is that first-class modules allow you to package up more than just functions or methods. As we’ve seen, you can also include types and even modules. We’ve only used it in a small way here, but this extra power allows you to build more sophisticated components that involve multiple interdependent types and values.
| |
We can turn this into a CLI-interface code:
| |
Loading and Unloading Query Handlers
First-class modules give a lot of dynamism and flexibility – e.g. we can make it such that our query handlers can be loaded and unloaded at runtime.
We will define a Loader module:
- define a
Loadermodule that controls a set of active query handlers - define a creator function for creating a
Loader.t - define functions (
load,unload) to manipulate the table of active query handlers - define
evalfunction which determines the query interface presented to the user- first we create variant type for the request
- then we use the sexp converter generated for that type to parse the query from the user
| |
Combining it with the Cli interface:
- create an instance of the loader query handler
- add that instance to the loader’s active table
- then launch the cli interface, passing it the active table.
| |
dynamic linking facilities
KIV this but OCaml’s dynamic linking facilities allows us to compile and link in new code to a running program.
WE can automate this using libraries like
ocaml_pluginwhich can be installed via OPAM and takes care of the workflow around setting up dynamic linking.
MAGIC: of OCaml
MAGIC: The Magic of OCaml: Type Safety Meets Dynamic Linking
Seems like one of the most profound strengths of OCaml’s type system is how it extends safety and correctness all the way to the boundaries between separately compiled modules. This power shines when we consider dynamic linking — the ability to compile new code and load it into a running system.
Imagine a Mars rover millions of kilometers away that needs a software patch. You can’t afford runtime surprises.
In a dynamically typed language, replacing a module at runtime carries risk: the new code might not conform to the old module’s expectations until it actually fails in the field.
But in OCaml, the compiler enforces type integrity at every stage — even across separately compiled units. Each module’s interface carries a precise type signature, and the compiler ensures that any dynamically linked module matches that interface exactly. The runtime (Dynlink) can then safely load the compiled code (.cmxs files) knowing that the functions, values, and data layouts all align perfectly with the system’s expectations.
This means dynamic linking in OCaml isn’t an act of faith — it’s a provably safe operation. The strong static type system guarantees that what you load at runtime behaves exactly as the rest of the program expects.
So the “magic” of OCaml lies in this combination:
Static type safety gives compile-time guarantees about correctness and consistency.
Dynamic linking allows runtime adaptability.
Together, they enable systems that can evolve and patch themselves — even from millions of kilometers away — with mathematical confidence in their integrity.
Living without First-Class Modules
Most designs that can be done with first-class modules can be simulated without them (though it’s a little awkward to do that).
The idea here is that we hide the true types of the objects in question behind the functions stored in the closure. We’ve implemented our query_handler_instance as just types below. As for Unique query handler into this framework, we just
| |
This is a small scale example and so it’s alright to not use first-class functions. When it gets a lot more complex (more functionality to be hidden away behind a set of closures, more complicated the r/s between the different types in question) then the more awkward this gets and it’s better to use first-class modules at that point.
Chapter 12: Objects
What is OOP?
Paradigm that groups computation and data within logical objects. It’s interesting how the OCaml folks set the vocabulary around this:
object contains data within fields
method functions can be invoked against the data within the object \(\implies\) “sending a message” to the object
this “sending a message” part is interesting because it aligns with the opinion that OOP is intrinsically imperative, where the object is like an FSM. Sending messages to an object causes it to change state, possibly sending messages to other objects. Naturally, this is assuming the object’s state is allowed to be mutable.
class: code definition behind an object
constructor: builds the object
5 fundamental properties that makes OOP unique:
Abstraction
Implementation details hidden within the object, external interface is the set of publicly accessible methods
Dynamic Lookup
When sending a message to object, method to be executed is dynamically determined by the implementation of the object, not by some static property of the program. Therefore, different objects may react to the same message in different ways.
Subtyping
If an object a has all the functionality of an object b, then we may use a in any context where b is expected.
Inheritance
Definition of one object (class) can be used to produce a new kind of object (class). The new definition may override behaviour, share behaviour with parent.
Open recursion
Methods within an object can invoke other methods within the same object using a special variable (
selforthis). Objects created from classes use dynamic lookup, allowing a method defined in one class to invoke methods defined in another class that inherits from the first.
OCaml Objects
OCaml Object system is very surprising to OOP-folks:
Class System: Separation of Object and their types
Typically, we’d expect the class name to be the type of objects created when we instantiate it, and the relationships between object types correlate to inheritance.
In OCaml, Classes are used to construct objects and support inheritance. However, Classes are not types. Objects have object types.
If we want to use objects, we don’t need to use classes at all.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24(* ==== defining an object *) open Base;; let s = object val mutable v = [0; 2] method pop = match v with | hd :: tl -> v <- tl; Some hd | [] -> None method push hd = v <- hd :: v end;; (* val s : < pop : int option; push : int -> unit > = <obj> *) (* ==== Usage: *) s#pop;; (* - : int option = Some 0 *) s#push 4;; (* - : unit = () *) s#pop;; (* - : int option = Some 4 *)Some notes:
- object type is what is within the
<...>which only contains the types of the methods - fields are typically private, only modifiable from the functions that cause side-effects on their state.
- methods can have 0 params.
- we don’t need to use the
unitargument (unlike the equivalent functional version) - this is because method call is routed to a concrete object instance. so zero argument methods can be invoked directly by writing
obj#methodwithout parens. It’s really because of the dynamic dispatching pattern.
- we don’t need to use the
- object type is what is within the
objects may be constructed by functions (without being placed within a class(?)).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18let stack init = object val mutable v = init method pop = match v with | hd :: tl -> v <- tl; Some hd | [] -> None method push hd = v <- hd :: v end;; (* val stack : 'a list -> < pop : 'a option; push : 'a -> unit > = <fun> *) let s = stack [3; 2; 1];; (* val s : < pop : int option; push : int -> unit > = <obj> *) s#pop;; (* - : int option = Some 3 *)NOTE:
- the function
stack’s returned object now uses the polymorphic type'ainstead of the previous example where it was bound toint.
- the function
Object Polymorphism
Similar to polymorphic variants, we can use methods without a typedef.
This means that the compiler will infer the object types automatically from the methods that are invoked on them. The type system complains if it sees incompatible uses of the same methods.
| |
Notes:
- object types may be open or close:
open: more methods can be inferred still
the
..(ellipsis) in the inferred object types indicate that there could be other unspecified methods that the object may havethis is actually called an ellision \(\implies\) stands of “possibly more methods”
closed: no more inferring of other methods and such
we can see how the object’s definition is extensible from the examples, we also see that we can close the object so that the object type is fixed.
Elisions are Polymorphic (row-polymorphism)
An elided object type is polymorphic. SO if we try to write a type definition using it, then we get an “unbound type variable error” type square = < width : int; ..>;;
This kind of polymorphism is called row polymorphism and it uses row-variables (.. is a row-variable). OCaml’s polymorphic variant types also use row-polymorphism; close relationship between objects and polymorphic variants: “objects are to records what polymorphic variants are to ordinary variants”
So, an object with object type < pop: int option; ..> can be any object with a method pop : int option regardless of how it’s implemented. The dynamic dispatch handles the method invocation when we invoke that method.
| |
Immutable Objects
The view that OOP is intrinsically makes sense. Objects are like FSMs and message-passing is what allows us to changs FSM state.
However, this assumes that the objects are mutable.
We can make them immutable too:
| |
NOTEs:
Special syntax:
uses a special syntax
{<...>}to produce a copy of the current object (and the assignment is an override of the copied object). In so doing, we don’t cause any modifications to the existing object.Syntax rules:
- can only be used within a method body
- only values of fields may be updated
- method implmentations are fixed @ the time of object-creation, so they can’t be chagned dynamically.
When to use Objects
RULE OF THUMB: The main value of objects is really the ability to use open recursion and inheritance which is more of part of the class system.
What are the alternative toolings at our disposal?
classes and objects
Pros:
- no need type definitions
- good support for row-polymorphism
- benefits comes from the class system
inheritance
open recursion
Open recursion allows interdependent parts of an object to be defined separately. This works because calls between the methods of an object are determined when the object is instantiated, a form of late binding. This makes it possible (and necessary) for one method to refer to other methods in the object without knowing statically how they will be implemented \(\implies\) open recursion
This also means that some part of the code may have its implementation deferred and it will still work.
Cons:
- syntax-heavy
- runtime costs, might as well use records
first-class modules + functors + data types
pros:
more expressive (can include types)
very explicit
cons:
- is early binded – can only parameterise module code so t hat part of it can be implemented later via a function / functor. This explicitness can be quite verbose.
Subtyping
Central concept in OOP that governs when an object with type A can be used in an expression expecting object of type B.
Concretely, subtyping restricts when e :> t (coercion operator) can be applied. Coercion only works if type of e is a subtype of t
Width Subtyping for coercing objects
Width subtyping means that an object type A is a subtype of B if A has all of the methods of B (and possibly more). A square is a subtype of shape because it implements all the methods of shape, which in the example below, is just area.
| |
Depth Subtyping for coercing objects
Depth subtyping allows us to coerce and object if its individual methods could be safely coerced. So object type < m: t1 > is a subtype of object type < m: t2 > if t1 is a subtype of t2.
Given the shape type as type shape = < area : float >;;,
In the example here, both coin and map have a method called shape which is a subtype of the shape type and so, both of them can be coerced into the object type <shape: shape> (NOTE: the method shape that returns the type shape):
- within
coin:shape : < area : float; radius : int >and the type of this method is a subtype of theshapetype. - within
map:shape : < area : float; width : int >and the type of this method is a subtype of theshapetype.
| |
Polymorphic Variant Subtyping
We can coerce a polymorphic variant into a larger polymorphic variant type. A polymorphic variant type
Ais a subtype ofBif the tags ofAare subset of the tags ofB. This is aligned to the use of structural subtyping that is used for polymorphic variants.1 2 3 4 5 6 7 8type num = [ `Int of int | `Float of float ];; (* type num = [ `Float of float | `Int of int ] *) type const = [ num | `String of string ];; (* type const = [ `Float of float | `Int of int | `String of string ] *) let n : num = `Int 3;; (* val n : num = `Int 3 *) let c : const = (n :> const);; (*<=== so the subtype here is type num and the supertype is type const. We can type-coerce num into const *) (* val c : const = `Int 3 *)The compiler allows coercion because:
- Every value of
numcan safely fit intoconst, since every tag ofnum(Int,Float) also exists inconst. constmight allow more tags (String), but anywhere aconstis required, a value of typenumis guaranteed to match at least some of its cases– so it’s a valid, safe substitution.
- Every value of
Key intuition:
- Subtype: The more specific type (with a smaller set of allowed tags) is the subtype.
- Here:
numallows onlyIntandFloat.
- Here:
- Supertype: The broader type (with a superset of supported tags) is the supertype.
- Here:
constallowsInt,Float, andString.
- Here:
- Subtype: The more specific type (with a smaller set of allowed tags) is the subtype.
Variance
The classic variance forms in OOP but in the context of object types in OCaml.
Notes based on the code below:
immutability allows us to do the first type coercion from
square listtoshape list.so
'a listis covariant (in'a)If it was mutable (e.g.
square array) then we would have been able to store non-square shapes into what should be an array of squares. So,square arrayis NOT a subtype ofshape array.so
'a arrayis invariant (only accepts its own type)Finally for the contravariant example, consider a function with the type
square -> string. We can’t use it with types wider than itself likeshape -> stringbecause if we get acirclewe wouldn’t know what to do. However, we can use it with types narrower than itself.Similar argument applies to why a function with type
shape -> stringcan be safely used with typesquare -> string
| |
Variance Annotations
The compiler can infer the variance of types. In some cases, we wish to explicitly annotate the type.
One such case is if there’s a signature that hides the type parameters of an immutable type.
In those cases we can manually annotate the variance to the type’s parameters in the signature:
+: for covariance-: for contravariance
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71(* === the implementation of the Either type -- so the type may be inferred as covariant *) module Either = struct type ('a, 'b) t = | Left of 'a | Right of 'b let left x = Left x let right x = Right x end;; (* module Either : sig type ('a, 'b) t = Left of 'a | Right of 'b val left : 'a -> ('a, 'b) t val right : 'a -> ('b, 'a) t end *) (* these usage will allow the type to be inferred as covariant *) let left_square = Either.left (square 40);; (* val left_square : (< area : float; width : int >, 'a) Either.t = Either.Left <obj> *) (left_square :> (shape,_) Either.t);; (* <=== we can type coerce to a supertype.*) (* - : (shape, 'a) Either.t = Either.Left <obj> *) (* === interface that hides the definition for others -- the type can't be known by consumers of this interface and so the variance will be inferred as invariant. *) module Abs_either : sig type ('a, 'b) t val left: 'a -> ('a, 'b) t val right: 'b -> ('a, 'b) t end = Either;; (* module Abs_either : sig type ('a, 'b) t val left : 'a -> ('a, 'b) t val right : 'b -> ('a, 'b) t end *) (* -- so this will be inferred as invariant *) (Abs_either.left (square 40) :> (shape, _) Abs_either.t);; (* Line 1, characters 2-29: Error: This expression cannot be coerced to type (shape, 'b) Abs_either.t; it has type (< area : float; width : int >, 'a) Abs_either.t but is here used with type (shape, 'b) Abs_either.t Type < area : float; width : int > is not compatible with type shape = < area : float > The second object type has no method width *) (* ========== USING VARIANCE ANNOTATIONS to the Type's params in the signature: *) module Var_either : sig type (+'a, +'b) t val left: 'a -> ('a, 'b) t val right: 'b -> ('a, 'b) t end = Either;; (* module Var_either : sig type (+'a, +'b) t val left : 'a -> ('a, 'b) t val right : 'b -> ('a, 'b) t end *) (* -- type coercion works again. *) (Var_either.left (square 40) :> (shape, _) Var_either.t);; (* - : (shape, 'a) Var_either.t = <abstr> *)We consider a more concrete variance example.
the problem we see is that the
pushmethod makes it such that thesquare stackandcircle stackare not subtypes ofshape stack.in
shape stackthepushmethod takes an arbitrary shape.If we could actually coerce
square stackto ashape stackthen it means it’s possible to push an arbitrary shape onto thesquare stackand this would have been regarded as an error.In other words,
< push: 'a -> unit; .. >is contravariant in'aso< push: square -> unit; pop: square option>can’t be a subtype of< push: shape -> unit; pop: shape option>the
pushmethod makes thestackobject type contravariant in'a
to make it work, we use a more precise type that indicates that we will not be using the
pushmethod. That’s how we get the typereadonly_stack.shape_stacksis ascribed a composite polymorphic type:what the
total_area’s parameter type annotation means (let total_area (shape_stacks: shape stack list)):- a list of objects of type
< pop: shape option >, wherereadonly_stackis parametrized byshape.
- a list of objects of type
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52(* === 1. we applied the stack function to some squares and some circles: *) type 'a stack = < pop: 'a option; push: 'a -> unit >;; (* type 'a stack = < pop : 'a option; push : 'a -> unit > *) let square_stack: square stack = stack [square 30; square 10];; (* val square_stack : square stack = <obj> *) let circle_stack: circle stack = stack [circle 20; circle 40];; (* val circle_stack : circle stack = <obj> *) (* === 2: attempting to use a area accumulating function*) let total_area (shape_stacks: shape stack list) = let stack_area acc st = let rec loop acc = match st#pop with | Some s -> loop (acc +. s#area) | None -> acc in loop acc in List.fold ~init:0.0 ~f:stack_area shape_stacks;; (* val total_area : shape stack list -> float = <fun> *) (* === problem: total_area [(square_stack :> shape stack); (circle_stack :> shape stack)];; Line 1, characters 13-42: Error: Type square stack = < pop : square option; push : square -> unit > is not a subtype of shape stack = < pop : shape option; push : shape -> unit > Type shape = < area : float > is not a subtype of square = < area : float; width : int > The first object type has no method width (* === problem: the push function makes it such that the shape stack object type is contravariant in 'a *) *) (* ==== 3: precise type to avoid the contravariance *) type 'a readonly_stack = < pop : 'a option >;; (* === this is a polymorphic object type *) (* type 'a readonly_stack = < pop : 'a option > *) let total_area (shape_stacks: shape readonly_stack list) = let stack_area acc st = let rec loop acc = match st#pop with | Some s -> loop (acc +. s#area) | None -> acc in loop acc in List.fold ~init:0.0 ~f:stack_area shape_stacks;; (* val total_area : shape readonly_stack list -> float = <fun> *) total_area [(square_stack :> shape readonly_stack); (circle_stack :> shape readonly_stack)];; (* - : float = 7280. *)It should be pointed out that this typing works, and in the end, the type annotations are fairly minor. In most typed object-oriented languages, these coercions would simply not be possible. For example, in C++, a STL type
list<T>is invariant inT, so it is simply not possible to uselist<square>wherelist<shape>is expected (at least safely). The situation is similar in Java, although Java has an escape hatch that allows the program to fall back to dynamic typing. The situation in OCaml is much better: it works, it is statically checked, and the annotations are pretty simple.
Narrowing
Narrowing is NOT allowed in OCaml.
So we can’t downcast a supertype to its subtype because:
technical reason: hard to implement it
design argument: narrowing violates abstraction
For structurally type-systems like in OCaml, if we allowed narrowing, then we could effectively enumerate the methods in an object (e.g. To check whether an object
objhas some methodfoo : int, one would attempt a coercion(obj :> < foo : int >).).
TODO: there’s an example here that I just skipped. It talks about how to handle some cases where one might wish for type narrowing to be possible and suggests using pattern matching amongst other patterns (and discusses its drawbacks).
Subtyping vs Row Polymorphism
Subtyping and Row-polymorphism are 2 distinct mechanisms and they overlap a lot in the sense that both are mechanisms we can use to write functions that can be applied to objects of different types.
Some notes:
Row polymorphism may be preferred over subtyping because there’s no explicit coercions \(\implies\) perserves more type information
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18(* ==== row-polymorphism example: *) let remove_large l = List.filter ~f:(fun s -> Float.(s#area <= 100.)) l;; (* val remove_large : (< area : float; .. > as 'a) list -> 'a list = <fun> *) (* --- so this method is still preserved *) let remove_large l = List.filter ~f:(fun s -> Float.(s#area <= 100.)) l;; (* val remove_large : (< area : float; .. > as 'a) list -> 'a list = <fun> *) (* ==== subtyping coercion example: *) (* Writing a similar function with a closed type and applying it using subtyping does not preserve the methods of the argument: the returned object is only known to have an area method: *) let remove_large (l: < area : float > list) = List.filter ~f:(fun s -> Float.(s#area <= 100.)) l;; (* val remove_large : < area : float > list -> < area : float > list = <fun> *) remove_large (squares :> < area : float > list );; (* - : < area : float > list = [<obj>; <obj>] *)There are some cases where row polymorphism can’t be used.
Can’t be used to place different types of objects in the same container (heterogeneous elements can’t be created using row-polymorphism).
1 2 3 4 5 6 7let hlist: < area: float; ..> list = [square 10; circle 30];; (* Line 1, characters 50-59: Error: This expression has type < area : float; radius : int > but an expression was expected of type < area : float; width : int > The second object type has no method radius *)Can’t be used to place different types of objects in the same reference:
1 2 3 4 5 6 7 8 9 10 11 12let shape_ref: < area: float; ..> ref = ref (square 40);; (* val shape_ref : < area : float; width : int > ref = {Base.Ref.contents = <obj>} *) shape_ref := circle 20;; (* Line 1, characters 14-23: Error: This expression has type < area : float; radius : int > but an expression was expected of type < area : float; width : int > The second object type has no method radius *)
In both cases, subtyping works:
1 2 3 4 5 6let hlist: shape list = [(square 10 :> shape); (circle 30 :> shape)];; (* val hlist : shape list = [<obj>; <obj>] *) let shape_ref: shape ref = ref (square 40 :> shape);; (* val shape_ref : shape ref = {Base.Ref.contents = <obj>} *) shape_ref := (circle 20 :> shape);; (* - : unit = () *)
Chapter 13: Classes
One of the main goals of OOP is code-reuse via inheritance and that’s why we need to have a good understanding of OCaml classes. A class is a recipe for creating objects, and that recipe can be changed by adding new methods and fields or modifying existing methods.
OCaml Classes
Classes have class-types which describes the class itself and class-types are separate from regular OCaml types.
Class types should not be confused with object types.
Class Type:
object ... endobject type: e.g.
< get : int; .. >
we produce objects from the class using
newWhile it’s true that classes and class names are NOT types, for convenience, the definition of class
istackalso defines and object typeistackwith the same methods as the class. NOTE: theistackobject type represents any objects with these methods, NOT just objects created by theistackclass.
| |
Class Parameters and Polymorphism
A class definition is the constructor, we may pass in arguments when the object is created with new.
constructor params can be:
- type params
- Type params for the class are placed in square brackets before the class name in the class definition.
- constructor params
- type params
we need to provide enough constraints so that the compiler will infer the correct type.
To achieve this, we can add type constraints to:
- parameters
- fields
- methods
Usually, we’d just annotate the fields and/or class params and add constraints to the methods only if necessary.
| |
Object Types as Interfaces
Suppose we wish to traverse the elements on our stack. In the OOP world, we would have defined a class for the iterator object which would have given us a generic mechanism to inspect and traverse the elements of a collection.
Typically languages use 2 mechanisms to define an abstract interface like this:
- like in java, an iterator interface:
1 2 3 4 5 6// Java-style iterator, specified as an interface. interface <T> iterator { T Get(); boolean HasValue(); void Next(); }; - in languages without interfaces (like C++), using abstract classes without implementation them:
1 2 3 4 5 6 7 8 9// Abstract class definition in C++. template<typename T> class Iterator { public: virtual ~Iterator() {} virtual T get() const = 0; virtual bool has_value() const = 0; virtual void next() = 0; };
OCaml supports both styles, and gives more flexibility: an object type can be implemented by any object with the appropriate methods. It doesn’t need to be specified by the object’s class a priori.
Object classes used as an interface
| |
Functional Iterators
Why take the trouble to implement an interator interface when we are using a functional language. lol.
We apply the usual functional patterns:
iterfoldwhich is polymorphic and has this type:
('b -> 'a -> 'b) -> 'b -> 'bthe
'ais already within the class type param, for'b, we need to use a type quantifier like below:SYNTAX AWKWARDNESS:
'b.is the type quantifier that should be read as “for all'b”. They can only be used DIRECTLY after the method name \(\implies\) method params must be expressed using afunorfunctionexpression.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27class ['a] stack init = object val mutable v : 'a list = init method pop = match v with | hd :: tl -> v <- tl; Some hd | [] -> None method push hd = v <- hd :: v (* -- type quantifier*) method fold : 'b. ('b -> 'a -> 'b) -> 'b -> 'b = (fun f init -> List.fold ~f ~init v) end;; (* class ['a] stack : 'a list -> object val mutable v : 'a list method fold : ('b -> 'a -> 'b) -> 'b -> 'b method pop : 'a option method push : 'a -> unit end *)
Inheritance
We can create subclasses that inherit from the parent class and:
add new functionality e.g. adding this
printmethod- we can directly inherit from the class using
inheritkeyword
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16class sstack init = object inherit [string] stack init method print = List.iter ~f:Stdio.print_endline v end;; (* class sstack : string list -> object val mutable v : string list method pop : string option method print : unit method push : string -> unit end *)- we can directly inherit from the class using
override existing methods
- we access the original method using
as superstatement. This creates a specialsuperobject through which we can call superclass methods. It’s not a real object though, just a good mental model.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15class double_stack init = object inherit [int] stack init as super method push hd = super#push (hd * 2) end;; (* class double_stack : int list -> object val mutable v : int list method pop : int option method push : int -> unit end *)- we access the original method using
Class Types
Purpose of a class type is for exposure to other files or modules via a module signature. We may wish to include the classes in module’s signature that is exposed because other modules may inherit from class definitions. To do this we need to specify the class types.
Using class types is not as mainstream in OOP languages. The idea is that a class type specifies the type of each VISIBLE parts of the class, INCLUDING fields and methods. Whatever we omit gets hidden.
Now, we have multiple choices in defining the module type, depending on how much of the implementation we want to expose.
one extreme: maximally abstract signature that completely hides the class definitions
this completely ignores the classes.
We may wish to include the classes in module’s signature that is exposed because other modules may inherit from class definitions. To do this we need to specify the class types.
| |
Open Recursion
We’ve seen this before. It’s useful because of the dynamic dispatch that we can do. We can use open recursion for cases where using data types or modules would be hard to do.
Open Recursion:
allows object’s methods to invoke other methods on the same object
allows a method in one class to call a method from another class if both classes are inherited by the same object.
This allows mutually recursive parts of an object to be defined separately.
Key feature of classes: This ability to define mutually recursive methods. If we did it using data types or modules it would have been much more cumbersome and verbose.
example: recursive traversal of a document structure
IF we had to traverse a document and we only had types defined for a tree with different types of nodes:
| |
It’s not as easy to factor out the common parts of these functions and we likely would have to write recursive code multiple times.
Better way is to define a class and use open recursion. The class just has to define objects that fold over the document data.
Below,
object(self)bindsselfto the current object which allows the open recursive calls to happen.- we could even inherit from this class and add functionality:
| |
Private Methods
Are actually more like protected methods since OCaml private methods can be called by subclasses.
Private methods are part of the class type but NOT part of the object type (for objects that are produced from the class). This is what allows subclasses to have access to private methods.
In the example here, we might not want to force subclasses of folder to expose the nitty gritty methods for each of the different doc and text_item cases.
| |
If we wish to have TRULY private classes that no one can access, we just need to omit it from the signature:
| |
Binary Methods
A method that takes in an object of self type e.g. for equality. Equality check is somewhat of an extreme case for binary methods because it needs a full info on the objects for comparisons. In many cases, binary functions likely only need partial info (e.g. is_bigger for an arbitrary shape).
- we can get the type of the current object using
(self: 'self)type annotation - the code below is only allows objects of the same type to be compared and this is a specific problem we need to solve.
| |
Our objective is to use a common base shape class and be able to compare equality. At the moment a square=/=circle only expects to be compared with itself and not an arbitrary shape. Other languages would typically solve it using type narrowing or dynamic type-checking. OCaml can’t do that.
why using Polymorphic equality won’t work
Approach that won’t work :
instead of using the equality method within the base type shape, we use the polymorphic equality instead.
This has problems:
the builtin polymorphic equality has poor behaviour when used with objects.
the builtin polymorphic equality only considers two things as equal if they’re physically equal \(\implies\) we’ll get many false negatives
1 2 3 4Poly.(=) (object method area = 5 end) (object method area = 5 end);; (* - : bool = false *)
using a representation type and a repr method for equality
Approach that works:
In general, we can use the same solution we used for narrowing: using a representation type implemented using variants and implementing comparisons based on the representation type. We need to expose the repr method, but we may hide the type definition for it.
| |
using extensible variants
In the previous section, we had a new problem that we could only add new kinds of shapes if we added new constructors to the shape_repr type (i.e. add new tags to the variant type).
Extensible variants separate the definition of a variant type from the definition of its constructors. So it’s an open type and new variants can always be added.
- The compiler can’t do exhaustiveness checks on such open types though.
- objects created by these classes are in one-to-one correspondence with members of the representation type so the objects seem somewhat redundant.
| |
partial information for binary methods
Equality check is an extreme and in many cases we just need partial information. The larger method can be used on a square and also on any object of the type shape.
| |
Virtual Classes and Methods
OCaml’s use of the word virtual is different from others like C++ (where virtual == dynamic dispatch). In OCaml, all methods use dynamic dispatch.
A virtual class is like a functor because the “inputs” are declared (but not defined) i.e. the virtual methods and fields. “Applying the functor” happens by subclasses doing inheritance: when virtual methods are given concrete implementations.
So virtual means:
- method / field in the
virtualclass is declared but not implemented. In this case both the method/field AND the class are flagged asvirtualand can’t be directly instantiated.
Create Some Simple Shapes
For the example below, we realise that there’s some commonalities between the two classes that we can shift into a superclass:
- definitions of
xandy on_clickdepends oncontainsand the implementation ofcontainsdiffers for different shapes \(\implies\) a good candidate forcontainsto be a virtual method within a virtual class, thereby allowing the definition ofcontainsto be the subclass’s responsibility.
| |
And this is our virtual class:
| |
Initializers
Init functions are common in OOP world, we have them in OCaml.
To add in expressions that are executed during instantiation of a class. These expressions can’t refer to the methods of the object because they run before the object’s creation:
- place the expressions before the object expression
- place the expressions in the initial value of a field
If we wish to make references to methods of the object, then we need to do it differently. We need to use initializers.
| |
Multiple Inheritance
OCaml’s support for multiple inheritance means that we have a variety of ways that classes can be combined, useful especially with the use of virtual classes.
The tricky part is when the inheritance heirarchy is less of a tree and more of a graph and so we should be cautioned against that.
How Names are Resolved
This is about name-clashes.
Inheritance is like *textual inclusion – if there’s more than one definition of a name, the last one wins.
Really the most recent definition for that name is what matters if we have clashes.
| |
a reiteration of what inheritance means:
replace each inherit directive with its definition
take the last definition of each method or field.
Note that the methods and fields added by an inheritance are those listed in its class type, so private methods that are hidden by the type will not be included.
Mixins
Many opinions about multiple inheritance:
- that it’s overly complicated
- problematic in general and we should use object composition instead
Nevertheless, one general pattern for multiple inheritance that is useful and simple: using the mixin pattern.
Mixins:
just a virtual class implementing a feature based on another one
If you have a class that implements methods A, and you have a mixin M that provides methods B from A, then you can inherit from M—“mixing” it in—to get features B.
Example is one of mouse-draggability and animation for the shapes we have had:
| |
The initialisers can be added using mixins as well. What’s interesting is because the mixin examples below are only used for side-effects, we can inherit them multiple times to get different animations.
| |
Displaying the Animated Shapes
This is more of a wrap-up section to the examples.
We finish our shapes module by creating a main function to draw some shapes on the graphical display and running that function using the Async scheduler:
| |
The OCaml builtin graphics functionality is more of a learning tool.
There are better packages / bindings out there:
- Lablgtk – based on gtk
- LablGL – uses OpenGL
- js_of_ocaml – compiles ocaml into js
Chapter 14: Maps and Hash Tables
When storing associated, organised data, OCaml’s association lists come to mind but they’re inefficient because most actions are in \(O(n)\) time.
Here’s a short example:
| |
Maps (work in logarithmic time) and Hash Tables (work in constant time) offer better performance.
Maps
When looking at the basic syntax for using Maps, the following things stand out:
The type definition for type
tusesMap.twhich is supplied 3 types params:one for the type of the key
one for the type of the data
one for the comparator witness :
Indicates which comparator function was used to construct the map. It doesn’t say anything about the type of data stored in the map.
In the example below,
String.comparator_witnesstells us the default string comparator was used
Map.emptytakes in a first-class module as an argument.This first-class module provides the comparison function for building the map, along with the sexp converter for generating useful error messages.
This is also why we don’t need to provide the module again when using functions like
Map.findorMap.add.There’s some requirements we need to meet for our custom modules to be able to work like this (to be passed into empty map). KIV
| |
Sets
Base gives a set data type also. It’s possible to encode sets in terms of maps but it’s more efficient and natural to use Base’s specialised set type.
We have the typical set operations that one would expect in any programming languages.
| |
Modules and Comparators
This section is about how we can make comparator witnesses work for our custom modules so that we can create maps typed with our custom modules. NOTE: the actual elaboration is done here in this section.
First, creating maps/sets has some useful functions that we should take note:
- Creating maps from alists:
1 2 3 4 5 6 7 8 9 10 11 12 13 14open Base;; let digit_alist = [ 0, "zero"; 1, "one"; 2, "two" ; 3, "three"; 4, "four" ; 5, "five"; 6, "six"; 7, "seven"; 8, "eight"; 9, "nine" ];; (* val digit_alist : (int * string) list = [(0, "zero"); (1, "one"); (2, "two"); (3, "three"); (4, "four"); (5, "five"); (6, "six"); (7, "seven"); (8, "eight"); (9, "nine")] *) let digit_map = Map.of_alist_exn (module Int) digit_alist;; (* -- this will exn on duplicate keys encountered*) (* val digit_map : (int, string, Int.comparator_witness) Map.t = <abstr> *) Map.find digit_map 3;; (* - : string option = Some "three" *)
In order for us to make maps and sets from our own custom modules, we need to satisfy the Base.Comparator.S signature.
There’s a common idiom for doing this:
for our custom module e.g.
Book, we create a submodule,Tthat contains the basic functionality for OUR custom type (Book)Include that submodule,
TAlso include the result of applying a functor to that module. The functor here may be:
Comparator.Make(T)Comparable.Make(T): has some more helper functions e.g. infix comparison operators and min and max functions, in addition to the comparator itself.
| |
Why Do We Need Comparator Witnesses?
A comparator witness isn’t about tracking what kind of data we’re using — it’s about tracking how that data is ordered.
Some map and set operations (such as merging, union, or intersection) depend on both operands being ordered according to the same total order. That total order of individual operands is determined by the comparison function used when the data structure was created.
The compiler can’t just rely on the type of the operands here. The problem is that the type 'a alone doesn’t tell us which comparison function was used. e.g. 2 maps might both use keys of type int, yet be ordered by different comparison functions (e.g. ascending vs. descending). If we combined them blindly, we’d break the data structure’s invariants. Instead of having this ambiguity, we can have an explicit outcome defined in the form of a witness – hence “singles out” the ordering function that defines the structure’s total order
That’s why Comparator Witnesses need to be used.
Values that witness a specific comparison function and carry its identity at the type level. This ensures that only maps and sets built with the same comparator witness can be combined, enforcing correctness through the type system.
This is a code example of that:
| |
The Polymorphic Comparator
We have the choice of using the polymorphic comparator that is builtin instead of having to implement it all the time for our custom modules. However, we should use it with knowledge of their nuances.
| |
maps based on the polymorphic comparator have different comparator witnesses than those based on the type-specific comparison function
this is expected because there’s NO guarantee that the comparator associated with a given type will order things in the same way that polymorphic compare does.
the perils of polymorphic compare
RULE OF THUMB: polymorphic compare should be AVOIDED in production code.
Problems:
compares directly on the runtime representation of OCaml values
So, it will walk the structure of those values regardless of their type \(\implies\) it’s type-oblivious
This means that it peeks under the ordinary abstraction boundaries which can lead to counter-intuitive outcomes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19(* === example: m1 and m2 should technically be seen as the SAME map *) let m1 = Map.of_alist_exn (module Int) [1, "one";2, "two"];; (* val m1 : (int, string, Int.comparator_witness) Map.t = <abstr> *) let m2 = Map.of_alist_exn (module Int) [2, "two";1, "one"];; (* val m2 : (int, string, Int.comparator_witness) Map.t = <abstr> *) (* ==== using a more idiomatic comparison (Map.equal) shows this equality *) Map.equal String.equal m1 m2;; (* - : bool = true *) (* ==== PROBLEM: polymorphic compare walks the datastructures which will have a different underlying layout of the underlying trees so Polymorphic compare will say they are not equal *) Poly.(m1 = m2);; (* --- note this errors out because maps store the comparison function they were created with and poly compare won't work on functions *) (* Exception: Invalid_argument "compare: functional value". *) (* ==== Polymorphic compare on the binary tree *) (* --- Map.Using_comparator.to_tree exposes the binary tree. *) Poly.((Map.Using_comparator.to_tree m1) = (Map.Using_comparator.to_tree m2));; (* - : bool = false *)polymorphic compare won’t work on functions (naturally). So when it does the datastructure-walk, if there are function stored then a (runtime) exception will be raised. This is clear when we directly poly-compare maps (example above).
The abstraction breaking can result in bugs that are subtle e.g. a map where keys are sets? then the map built with the polymorphic comparator behaves incorrectly and inconsistently and separates out the keys that should be aggregated together.
The inconsistency would be from differences in the order in which the sets were built.
Satisfying Comparator.S with [@@deriving]
Using maps and sets on our custom type means that we need to satisfy Comparator.S:
- need sexp converters for the type
- need comparison functions for the type
We can rely on derivation annotations (Base provides this) instead of manually having to write this all the time (naturally we can always write them ourselves if we want some custom logic to be used).
| |
=,==andphys_equalThere’s different notions of equality:
physically equal: if same pointer in memory
so 2 data structures with identical content but structured separately would have different pointers in memory and hence physically unequal.
polymorphic equal: it’s a structural equality
the 2 data structures would be equal structurally – values having the same contents
This kind of equality is the same polymorphic equals point we were talking about earlier in the previous section.
Not only does OCaml have multiple notions of equality (and picking between them can be tricky), the opened default library may default to a different kind of equality.
Core:
- physical equality:
== - polymorphic equality:
=
- physical equality:
Base:
- physical equality: can be done using
phys_equal - polymorphic equality: hidden –
==is deprecated =is type-specific equality (operator overloading based on what type is used)
1 2ref 1 == ref 1;; (*-- this is deprecated*) phys_equal (ref 1) (ref 1) ;;- physical equality: can be done using
Applying [@@deriving] to Maps and Sets (special syntax)
Previously we saw how we could use [@@deriving] annotation and ready our custom types to be used with maps/sets.
won’t work:
If we try to do that with maps directly, this won’t work. There’s no equivalent
Map.t_of_sexpfor maps. this is because there’s no proper way to serialise/deserialise the comparator witness using sexps.will work: using
Map.Mmap factory functor to define the type, which uses two-level functorization to encode key types and comparator witness.This is the map factory functor that fixes the key type and comparator witness once and for all. At point in time when the functor is applied, the key and comparator witness have already been closed over. This makes
Map.M(Key).tequivalent to('key, 'value, Key.comparator_witness), but written in a form that behaves like a partially applied functor over the key type.The signature for this functor looks something like this:
1 2 3 4 5 6module Map.M(String) : sig type 'v t (* parameterised value type, the key type is already captured *) val t_of_sexp : (Sexp.t -> 'v) -> Sexp.t -> 'v t ... end
Summary:
Map.t= fully generic map type(key, value, comparator).Map.M(Key).t= specialized map type where the key and comparator modules are fixed, leaving only the value type parameter.
RULE OF THUMB: just use this functor (Map.M(Key).t)
| |
Trees – for space efficiency
Maps carry within them the comparator they were created with, but we could wish for space efficiency and just keep hold of ONLY the tree.
The comparator can still be typed via a phantom type, even if it physically wouldn’t include the comparator anymore. The only thing is that we’d need to explicitly provide the comparator when we use this tree-form of the map.
Phantom Type:
we had seen this earlier in our little language example where we encoded extra compile-time information without affecting runtime representation. Now we benefit from its use elsewhere.
the type doesn’t correspond to any value directly represented in the underlying physical structure of the value.
i.e. it doesn’t show up in the definition of the type.
the type reflects something about the logic of the value of the question
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25let ord_tree = Map.Using_comparator.to_tree ord_map;; (* --- see how the phantom type shows up here: val ord_tree : (string, int, String.comparator_witness) Map.Using_comparator.Tree.t = <abstr> *) (* ==== example of how to use the map with only its tree available. *) Map.Using_comparator.Tree.find ~comparator:String.comparator ord_tree "snoo";; (* - : int option = Some 3 *) (* -- the phantom type helps us enforce the invariant: we're using the same comparator when looking pu a value as when we stored it. So this example will fail: *) Map.Using_comparator.Tree.find ~comparator:Reverse.comparator ord_tree "snoo";; (* Line 1, characters 63-71: Error: This expression has type (string, int, String.comparator_witness) Map.Using_comparator.Tree.t but an expression was expected of type (string, int, Reverse.comparator_witness) Map.Using_comparator.Tree.t Type String.comparator_witness is not compatible with type Reverse.comparator_witness *)
Hash Tables
The imperative cousins of maps, they differ in the following ways:
they are mutable: adding a k-v pair modifies the existing table
have better time complexity than maps
it’s constant time lookup and modifications vs log time for maps
hash tables depend on hash function (key to integer), maps depend on comparator function
when creating hash tables, we don’t need any type witnesses (like comparator witness needed for maps/sets) \(\implies\) the syntax is easier
REASON: No Keeping the same hash function / comparator function preserves no special invariant in the case of Hash Tables
| |
We could also use OCaml’s polymorphic hash function. RULE OF THUMB: don’t use it in areas where correctness matters.
Time Complexity of Hash Tables
There needs to be some clarification on the time complexity aspects described:
it’s amortized time complexity.
The “constant time lookup and update” is more of amortised costs actually. Like other languages, OCaml’s hash table does need resizing so some operations may get expensive.
the cost of the hash function matters
collision, for
Baseleads to the use of binary trees for the hash-bucket \(\implies\) if your hash function is sub-par and has many collisions then hash table use ends up being logarithmic time complexityNOTE: this is useful to protect against DDOS attacks!
The logarithmic behavior of Base’s hash tables in the presence of hash collisions also helps protect against some denial-of-service attacks. One well-known type of attack is to send queries to a service with carefully chosen keys to cause many collisions. This, in combination with the linear behavior of most hashtables, can cause the service to become unresponsive due to high CPU load. Base’s hash tables would be much less susceptible to such an attack because the amount of degradation would be far less.
Collisions with the Polymorphic Hash Function
Like Polymorphic Compare, Polymorphic Hash doesn’t pay attention to the type and blindly walks down the structure of a data type, computing the hash from what it sees.
Problems:
So as long as physical equality is not there, even with equivalent data, it will deem hash tables as unequal
extra prone to creating hash collisions.
this is because it works by BFS traversal walking over the data structure. Also it is bounded in the number of nodes it’s willing to traverse – which defaults to 10 “meaningful” nodes.
the bound means that parts of the data structure may be ignored and we result in pathological cases where every value we store has the same hash value.
1 2 3 4 5 6 7 8Hashtbl.Poly.hashable.hash (List.range 0 9);; (* - : int = 209331808 *) Hashtbl.Poly.hashable.hash (List.range 0 10);; (* - : int = 182325193 *) Hashtbl.Poly.hashable.hash (List.range 0 11);; (* - : int = 182325193 *) Hashtbl.Poly.hashable.hash (List.range 0 100);; (* - : int = 182325193 *)
RULE OF THUMB: for large custom data-structures, write your own hash function OR use [@@deriving]
Inline create hash function using
ppx_hashSYNTACTIC SUGAR: We can just use
%hashdirectly. It’s a shorter form of[@@deriving hash]– that’s what the previous example uses.
Choosing between Maps and Hash Tables
They overlap in functionality so we need to decide when to use what?
idiom: follow the programming paradigm
immutable maps (for functional code) vs mutable hash tables (for imperative code)
idiom: focus on performance needs and data volume
For code that is dominated by updates and lookups, hash tables are a clear performance win, and the win is clearer the larger the amount of data.
Here’s some benchmarks:
Bench mark to demonstrate generally more efficient hashtables than maps
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37open Base open Core_bench let map_iter ~num_keys ~iterations = let rec loop i map = if i <= 0 then () else loop (i - 1) (Map.change map (i % num_keys) ~f:(fun current -> Some (1 + Option.value ~default:0 current))) in loop iterations (Map.empty (module Int)) let table_iter ~num_keys ~iterations = let table = Hashtbl.create (module Int) ~size:num_keys in let rec loop i = if i <= 0 then () else ( Hashtbl.change table (i % num_keys) ~f:(fun current -> Some (1 + Option.value ~default:0 current)); loop (i - 1)) in loop iterations let tests ~num_keys ~iterations = let t name f = Bench.Test.create f ~name in [ t "table" (fun () -> table_iter ~num_keys ~iterations) ; t "map" (fun () -> map_iter ~num_keys ~iterations) ] let () = tests ~num_keys:1000 ~iterations:100_000 |> Bench.make_command |> Command_unix.runthe dune file:
1 2 3(executable (name map_vs_hash) (libraries base core_bench core_unix.command_unix))1 2 3 4 5 6 7 8dune build map_vs_hash.exe ./_build/default/map_vs_hash.exe -ascii -quota 1 -clear-columns time speedup Estimated testing time 2s (2 benchmarks x 1s). Change using -quota SECS. Name Time/Run Speedup ------- ---------- --------- table 13.34ms 1.00 map 44.54ms 3.34Benchmark to demonstrate that when we need to keep multiple related copies (like snapshots) of data at once, maps are better because they’re immutable
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39open Base open Core_bench let create_maps ~num_keys ~iterations = let rec loop i map = if i <= 0 then [] else ( let new_map = Map.change map (i % num_keys) ~f:(fun current -> Some (1 + Option.value ~default:0 current)) in new_map :: loop (i - 1) new_map) in loop iterations (Map.empty (module Int)) let create_tables ~num_keys ~iterations = let table = Hashtbl.create (module Int) ~size:num_keys in let rec loop i = if i <= 0 then [] else ( Hashtbl.change table (i % num_keys) ~f:(fun current -> Some (1 + Option.value ~default:0 current)); let new_table = Hashtbl.copy table in new_table :: loop (i - 1)) in loop iterations let tests ~num_keys ~iterations = let t name f = Bench.Test.create f ~name in [ t "table" (fun () -> ignore (create_tables ~num_keys ~iterations)) ; t "map" (fun () -> ignore (create_maps ~num_keys ~iterations)) ] let () = tests ~num_keys:50 ~iterations:1000 |> Bench.make_command |> Command_unix.run1 2 3executable (name map_vs_hash2) (libraries core_bench core_unix.command_unix))1 2 3 4 5 6 7 8dune build map_vs_hash2.exe ./_build/default/map_vs_hash2.exe -ascii -clear-columns time speedup Estimated testing time 20s (2 benchmarks x 10s). Change using -quota SECS. Name Time/Run Speedup ------- ------------ --------- table 4_453.95us 25.80 map 172.61us 1.00
TODO Chapter 15: CLI Parsing
Basic Command-Line Parsing
Defining and Anonymous Argument
Defining Basic Commands
Running Commands
Multi-Argument Commands
Argument Types
Defining Custom Argument Types
Optional and Default Arguments
Sequences of Arguments
Adding Labeled Flags
Grouping Subcommands Together
Prompting for Interactive Input
Command-Line Autocompletion with bash
Generating Completion Fragments from Command
Installing the Completion Fragment
- Installing a Generic Completion Handler