Chapter 1: Guided Tour
Tooling
- the book is going to prefer using
Baseover the standard library that OCaml has. Checks out because one of the authors is a Jane Street senior (Yaron Minsky) andBaseis from Jane Street. utopis an improvement from the olderocamlshell
Basic Language Syntax
operators
=can be used for let-binding, can also be used for equality testing, they can be used in the same sentence like so:1 2 3 4 5 6let even x = x % 2 = 0;; let sum_if_true test first second = (if test first then first else 0) + (if test second then second else 0);;Boolean binary operators:
&&and||short-circuitNOTE: differences between
Baseand ocaml stdlib:the operator for float vs integer exponentiation differs.
Basediffers because of a leaning towards consistency and clarity.For example, in Base,
**.is float exponentiation, and**is integer exponentiation, whereas in the standard library,**is float exponentiation, and integer exponentiation isn’t exposed as an operator.To use float comparison when using
Base, we need to openFloat.O.When using Base, the default comparison operators work only on integers, and you need to explicitly choose other comparison operators when you want them. OCaml also offers a special set of polymorphic comparison operators that can work on almost any type, but those are considered to be problematic, and so are hidden by default by Base
1 2 3 4 5 6 7 8 9let is_inside_scene_element point scene_element = let open Float.O in match scene_element with | Circle { center; radius } -> distance center point < radius | Rect { lower_left; width; height } -> point.x > lower_left.x && point.x < lower_left.x + width && point.y > lower_left.y && point.y < lower_left.y + height | Segment _ -> false
expressions:
numerical expressions
- numbers can have
_arbitrarily inserted into it. This is sugar and is there for readability. So300_0and3000are the same. - speciality: math operations are explicitly typed and don’t do any typing assumptions (e.g. float addition vs integer addition)
- numbers can have
type inference:
OCaml gives the freedom for us to be implicit or explicit in the types used in our code without creating ambiguity in either choices.
expressions have types and the type of an expression is inferred from the available information about the components of that expression. This is almost like a logical engine, for which the programmer usually builds an intuition on over time.
explicit typing is more for documentation purposes, or to assist correctness checks by humans.
1 2 3 4 5 6 7 8 9(* Here's the untyped version *) let sum_if_true test first second = (if test first then first else 0) + (if test second then second else 0);; (* Here's the explicitly typed version *) let sum_if_true (test : int -> bool) (x : int) (y : int) : int = (if test x then x else 0) + (if test y then y else 0);;generic type inference and parametric-polymorphism
When it’s not possible to tie it down to concrete types, a type variable e.g.
'ais a generic type that is used to represent the generic type. The'is convention for showing that the name'arepresents a type var.This is similar to how generics is handled by other languages like Java (perhaps Python’s typing system as well).
example from utop:
1 2 3 4 5 6 7 8 9 10 11let first_if_true test x y = if test x then x else y;; (* my toplevel output was: *) (* val first_if_true : ('a -> bool/2) -> 'a -> 'a -> 'a = <fun> *) let long_string s = String.length s > 6;; let big_number x = x > 3;; first_if_true long_string "nice" "well done, let's do this";; first_if_true big_number 3 2;;
name binding
let-binding is how OCaml does name binding, which has some implicit meanings that get floated as rules:
variable name identifiers must start with lowercase letter or underscore. The lowercase part is because uppercased first letter names are treated as type constructors.
1 2 3 4 5 6 7(* Legal examples *) let seven = 3 + 4;; let _seven = "seven";; let s3v3n = "hello";; (* Illegal examples *) (* let Eight = 4 + 4;; (\* This will error out on "Eight" because it's things it's a constructor *\) *)functions are values so names also can be let-binded to functions. The syntax for definition and invocation of functions is more Unix-shell-like (space separated) instead of by tokens like parentheses.
We can co nested let-bindings as a way to define names within a local scope. This helps us write out complex expressions easily. The scope of let bindings are terminated by ;;.
| |
Modules:
.is for name-spaced access rather than OOP-like accessing of a method.name shadowing:
in OCaml it’s when there’s a clash in name identifiers, it’s not an automatic erasure of the old one, it’s just a hiding/superceding within a given scope. At the point where that scope ends, the original definition is visible again.
Shadowing is not destructive.
Shadowing can be constrained to a limited, local scope like so:
1 2 3let ratio x y = let open Float.O in of_int x / of_int y;;alternative (more concise) syntax:
1 2let ratio x y = Float.O.(of_int x / of_int y);;openallows availability of names in the current toplevel without having to prefix the module name.- Is
openExplicit or Implicit Shadowing?- depends on whether you’re doing it with the knowledge that name shadowing will occur or not.
- Is
Type Errors vs Exceptions
OCaml distinguishes compiler errors (pre-runtime) from runtime Exceptions (at runtime).
Structural/ADTs: tuples, lists, options, records
These are data types that are predefined, polymorphic, algebraic data types. In other languages, we’d have described it as “container types”, that term in the OCaml world is for reference holders.
Why “Container” Is Not Precise
In OCaml, the term “container” is best reserved for types that implement explicit interfaces for holding collections of items—like sets, maps, or similar abstractions (as opposed to data types defined by their constructors as in options, lists, or tuples). Lists qualify as containers in this generic sense, but options and tuples serve somewhat different structural roles—they embody fixed or optional structure rather than arbitrary-sized holding. (ref)
Tuples as cartesian product types
the type description (e.g.
val a_tuple : int * string = (3, "three")) hints at a mental model where we consider the cartesian product of the two typestands, that’s why the*is used.syntax is similar for constructing a tuple (
let a_tuple = (3,"three");;) and for extracting from a tuple (let (x,y) = a_tuple;;)we can pattern match function arguments, this is the same stuff that we’re familiar with in the elixir world.
List and options as polymorphic types
List
elements must be homogenous in their type.
constructor is
::and it’s right associative, empty list is[]we can do list concat but remember it’s NOT in constant time
1[1;2;3] @ [2;4;12]SPECIALITY: list elements are separated using
;.,will automatically give us tuples so careful on this. I’d imagine this is a common beginner’s silent bug because that’s legal but often unintended.Pattern matching:
It’s a good idea to try and represent outcomes exhaustively.
we could choose to pattern match using the
[]and::operator primitives but oftentimes, we may miss out some cases (the compiler is smart enough to warn against this).Instead, we could use
matchexpressions1 2 3 4 5 6 7 8 9(* this compiles with a warning, may throw runtime exception if we pass in [] *) let my_favorite_language_1 (my_favorite :: the_rest) = my_favorite;; (* with match expression *) let my_favorite_language_2 languages = match languages with | first :: the_rest -> first | [] -> "OCaml" (* A good default! *);;
OptionSomeandNoneare constructors that help us build optional values.1 2 3 4 5 6 7 8 9 10(* Constructing options: *) let divide x y = if y = 0 then None else Some (x / y);; (* Pattern matching useful for examining options *) let downcase_extension filename = match String.rsplit2 filename ~on:'.' with | None -> filename | Some (base,ext) -> base ^ "." ^ String.lowercase ext;;In a way, only
Optionsare nullable in OCaml, which differs from other langauges where any type may be nullable since it may be instantiated but not assigned a value.Missing values are explicit in OCaml, which is beautiful (old Yegor Bugayenko essay on “Why Null is Bad” and another Tony Hoare essay on “Null References: The Billion Dollar Mistake” )
OCaml’s
resulttype is used similarly for expressing error or absence but is intended to signal computation errors rather than optional absence.
Records as labelled product types
We should contrast these with positional typing that we see in tuples. Since it’s labelled product types, we have a bunch of ways we can access and use them.
Accessing can be done via dot notation, pattern matching or field punning.
1 2 3 4 5 6(* Dot notation: *) let distance v1 v2 = magnitude { x = v1.x -. v2.x; y = v1.y -. v2.y };; (* field punning: field name reuse from the record *) let magnitude { x; y } = Float.sqrt (x **. 2. +. y **. 2.);;Variant Types (aka Algebraic Data Types): representing multi-object scenes
Say we wish to describe a multi-object scene, we use Variant types to describe a unified representation as a single type.
Unlike union types in other languages (which are usually untagged), variants are tagged types
1 2 3 4type scene_element = | Circle of circle_desc (* So, Circle is the tag for this *) | Rect of rect_desc | Segment of segment_desc
Example of expressiveness:
| |
Pattern Matching: some generic points about it
Like in Elixir, pattern matching gives us a lot of joyful ways to express things.
The cases within a match expression are described as “arms” of the match.
SPECIALITY: designing a recursive function can be seen as the definition of inductive cases which eventually reduce to base cases. When writing recursive list functions, pattern matching becomes very useful:
| |
For match cases, we can actually stack up the cases and have one arm for multiple matches like so: it’s called an or-pattern
| |
Imperative Programming
Consider the following distinguishing of imperative vs functional programming:
- imperative: computations are structured as sequences of instructions that operate by making modifications to the state of the program.
- functional: variable bindings and most data structures being immutable.
OCaml supports imperative programming via the usage of mutable data structures (e.g. arrays, hash tables) and control flow constructs (for, while loops). We control the sequence of operations using ; because order might matter since we’re doing mutations.
Arrays
- direct access and modification of array elements in constant time.
- one of the most compact DSes, better than lists.
- return value for this is the
unittype, treated as a placeholder that communicates that side-effects have happened. Something likevoidin C/Java.
examples:
| |
Mutable Record Fields
Fields within records can be defined as mutable so that we an keep modifying the same record. We can have a hybrid set of fields here, some mutable and some immutable.
Typically means we should have some crud-like functions that interact with this as well.
| |
Refs
Single mutable values, it’s actually just a record type with contents as the single mutable field. Should be seen as a builtin type alias to a record with a single field called contents.
| |
Refs can be used as accumulators in iterations but that’s not idiomatic:
| |
for and while loops
| |
| |
Chapter 2: Variables and Functions
Variables
OCaml has a different take on variables than other programming languages in the sense that variables are all constants within their scope.
Additionally, we may shadow them in nested scopes and have access to the outer scopes when the nested scope is done with. A good mental model is a stack of names and every time we redefine it, we append to the stack and when we’re done, we pop from the stack (leaving access to the previous value for that shadowed name).
This is the same as what we’re familiar with in the elixir world.
| |
Pattern Matching and Let
Nothing fancy here, pattern matching is structural comparisons so we can have let-binding of names on LHS and pattern match to the right hand side like so:
Just careful that all cases for the pattern matching are considered, else use a match expression instead.
| |
Functions
Essentially “first class citizens” because they’re as normal as any other object.
let AND fun: let-binding function parameters
Think of the parameter of a function as a variable being bound to the value passed by the caller. This notion is useful for our final understanding of monadic styles.
The following are equivalent statements (almost) in how this name-binding works:
| |
Multi-argument Functions
Some ways to write functions with multiple args:
| |
SPECIALITY: Multi-arg functions are equivalent to their curried forms in OCaml, so we may partially apply them.
Recursive Functions
Should separate OCaml’s looping constructs and use them for imperative style code. This is why recursive functions are idiomatic for building looping constructs recursively.
We have to explicitly use let rec to define recursive function in OCaml.
The value of having the programmer explicitly define recursive functions:
good for mutually recursive definitions because those are harder for humans to reason with
here’s an example of a mutually recursive example for didactic reasons:
1 2 3 4let rec is_even x = if x = 0 then true else is_odd (x - 1) and is_odd x = if x = 0 then false else is_even (x - 1);;having non-recursive forms helps us create a new definition that extends and supersedes a recursive form by shadowing it.
Prefix and Infix Operators (and operator overloading)
special set of identifiers:
~ ! $ % & * + - . / : < = > ? @ ^ |Here’s an extended set of grammar rules for it, search it properly when you need to override / implement such infix functions. There’s a bunch of precedence and associativity rules to account for as well.
The first character of the operator symbol determines a bunch of things (operator precedence, left or right associative…). We can group up multiple characters into a single infix operator.
Some interesting things about user-defined operators:
- we can easily implement the pipe operator (see below)
- our choice of operator matters because it would have implications such as the operator being left or right associative that may pose problems to us.
1 2 3 4 5 6 7 8 9 10 11 12 13let (|>) x f = f x;; (* NOTE: left-associative operator *) (* example of using pipe operator in OCaml *) open Stdio;; let path = "/usr/bin:/usr/local/bin:/bin:/sbin:/usr/bin";; String.split ~on:':' path |> List.dedup_and_sort ~compare:String.compare |> List.iter ~f:print_endline;; (* NOTE: This works well because |> is left-associative, which is why the operator we use matters*) (* this won't work well because the operator is right-associative *) let (^>) x f = f x;; - operator precedence matters.
GOTCHA: using
*identifier need to be careful because(**)is for comments syntaxThe fix to this is that we have to have some extra spaces:
1 2let ( *** ) x y = (x **. y) **. y;; (* this is correct *) let (***) x y = (x **. y) **. y;; (* this is NOT correct *)GOTCHA: there are some special operators:
-: can be written as infix (e.g.Int.max (3-4) 1) and prefix (e.g.Int.max 3 (-4)) versions, with different meanings.Also,
-has a lower precedence to function application in an expression, so that’s why we need brackets in the example above.
Reverse Application Operator (
|>)This is mind blowing 🤯.
Since operator overloading is easy to do and the scope for it may be kept small, we can implement the Reverse Application operator (elixir pipe operator) easily.
1 2 3 4 5 6 7 8 9 10 11 12 13let (|>) x f = f x;; (* NOTE: left-associative operator *) (* example of using pipe operator in OCaml *) open Stdio;; let path = "/usr/bin:/usr/local/bin:/bin:/sbin:/usr/bin";; String.split ~on:':' path |> List.dedup_and_sort ~compare:String.compare |> List.iter ~f:print_endline;; (* NOTE: This works well because |> is left-associative, which is why the operator we use matters*) (* this won't work well because the operator is right-associative *) let (^>) x f = f x;;
Application Operator (
@@)For syntax sugar when we apply multiple functions together:
f @@ g @@ h xwhich replacesf(g(h x)). This HAS to be right-associative.
more on syntax
Naturally, there’s some good syntax support for writing functions.
function: syntactic supportThe
functionsyntax allows us to pattern match easily, we can use it together with the usual syntax. If it’s a multi-argument function, we’re actually just pattern matching the last parameter to that function .1 2 3 4 5let some_or_default default = function | Some x -> x | None -> default;; List.map ~f:(some_or_default 100) [Some 3; None; Some 4];;1 2 3 4 5 6 7 8 9 10 11 12 13 14(* another example: *) (** [contains t x] returns true iff [x] is contained in the interval [t] *) let contains t x = match t with | Empty -> false | Interval (l,h) -> Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0 (* converting this to use the syntactic sugar: *) (** [contains t x] returns true iff [x] is contained in the interval [t] *) let contains x = function | Empty -> false | Interval (l, h) -> Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0
labelled args
Labelling here is just named params.
The labelled args can be label pruned (aligns with field punning that we saw in records).
1 2 3 4 5 6 7 8let ratio ~num ~denom = Float.of_int num /. Float.of_int denom;; (* pruned names: *) let num = 3 in let denom = 4 in ratio ~num ~denom;; (* equivalent application of the ratio function: *) ratio ~num:3 ~denom:10Use cases
To highlight the usefulness of labelling:
explicating long argument lists
when too many args, easier to remember by label rather than position
when positional arg types are uninformative (and to disambiguate similar arguments)
sometimes type signature is not sufficient to get the meaning of the arguments. Using labelled signature gives this clarity.
e.g.
val create_hashtable : int -> bool -> ('a,'b) Hashtable.tvsval create_hashtable : init_size:int -> allow_shrinking:bool -> ('a,'b) Hashtable.tit’s useful for partial application and chaining together operations
It allows us to bind some of the arguments to partial functions based on their label. In the example below works because
List.itercan be given the labelled argument forf.1 2 3String.split ~on:':' path |> List.dedup_and_sort ~compare:String.compare |> List.iter ~f:Stdio.print_endline;;
GOTCHA: Higher Order Functions must be consistent in the ordering of labeled functions
This one has to do with consistent ordering:
suppose HOF:
let apply_to_tuple f (first,second) = f ~first ~second;;then if we use a labelled function like
let divide ~first ~second = first / second;;, everything works well because the ordering of the arguments and the labels are consistent.if the HOF was to swap the ordering:
let apply_to_tuple_2 f (first,second) = f ~second ~first;;then the same divide function wouldn’t work.this is because the HOF expects a function and that function has a signature that the compiler ties down based on the names used there. In the case of the 2nd HOF, the function argument has a signature where the first argument it accepts is labelled as
secondand the second argument is labelled asfirst. This is where the mismatch is created ifdividehas a different ordering of the labelled arguments.we can inspect the function signatures for our two functions and the problem is clearer:
apply_to_tuplehas the signature- : (first:'a -> second:'b -> 'c) -> 'a * 'b -> 'c = <fun>apply_to_tuple2has the signature- : (second:'a -> first:'b -> 'c) -> 'b * 'a -> 'c = <fun>
Optional Arguments
Good use cases for Optional Argos:
- The intent is to provide defaults for params that you may have an opinion on what they should be.
- for wrapping onto some API interface and providing some arguments already
1 2 3 4 5 6 7 8let concat ?(sep="") x y = x ^ sep ^ y;; (* here's how the wrapper can be built: *) let uppercase_concat ?sep a b = concat ?sep (String.uppercase a) b;; (* this is a pass-through*) (* negative example of how to write the wrapper: *) let uppercase_concat ?(sep="") a b = concat ~sep (String.uppercase a) b;; (* the decision on what the default separator should be is a distinct, separate decision on the outer function. If the inner function (concat) changes then we might need to change the outer function as well.*) - RULE OF THUMB: avoid optional arguments for functions internal to a module (i.e. functions not defined in the interface)
Optional arguments really only make sense when the extra concision of omitting the argument outweighs the corresponding loss of explicitness.
Problems:
- user not knowing that there’s an optional argument
- rarely used functions should NOT have optional arguments
When defining a function with optional args, we have some syntactic sugar (where we just provide a default param to the optional) to help us out:
1 2 3 4 5 6 7(* Longer way to write the function*) let concat ?sep x y = let sep = match sep with None -> "" | Some s -> s in x ^ sep ^ y ;; (* Terse way: no need match on optional, just can provide a default param*) let concat ?(sep="") x y = x ^ sep ^ y;;When passing in optional args, they have to be passed in explicitly:
how labelled and optional args are inferred
explicit typing
in this example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16let numeric_deriv ~delta ~x ~y ~f = let x' = x +. delta in let y' = y +. delta in let base = f ~x ~y in let dx = (f ~x:x' ~y -. base) /. delta in let dy = (f ~x ~y:y' -. base) /. delta in (dx,dy);; (* the signature comes out to rely on the order of arguments that's in the code: val numeric_deriv : delta:float -> x:float -> y:float -> f:(x:float -> y:float -> float) -> float * float = <fun> *)not obvious how the order of the arguments to
fshould be chosen, but there’s a need to have some heuristic around resolving this ambiguity.The heuristic the compiler uses is to prefer labels to options and to choose the order of arguments that shows up in the source code.
In cases that are ambiguous, compiler may throw an error for this, but we may still have varying orders if we explicitly provide the types:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23(* this works, types are explicitly provided *) let numeric_deriv ~delta ~x ~y ~(f: x:float -> y:float -> float) = let x' = x +. delta in let y' = y +. delta in let base = f ~x ~y in let dx = (f ~y ~x:x' -. base) /. delta in let dy = (f ~x ~y:y' -. base) /. delta in (dx,dy);; (* this won't work and will throw an error *) let numeric_deriv ~delta ~x ~y ~f = let x' = x +. delta in let y' = y +. delta in let base = f ~x ~y in let dx = (f ~y ~x:x' -. base) /. delta in let dy = (f ~x ~y:y' -. base) /. delta in (dx,dy);; (* Line 5, characters 15-16: Error: This function is applied to arguments in an order different from other calls. This is only allowed when the real type is known. *)
GOTCHA: partial application with optional arguments
This part can be sometimes tricky.
Somewhat consistent with our currying observations, we have to be aware the optional argument can be ERASED in some cases if we do partial applications.
The rule is:
optional argument is erased as soon as the first positional (i.e., neither labeled nor optional) argument defined after the optional argument is passed in.
say we do
let colon_concat = concat ~sep: ":", then signature A and B will behave differentlySignature A below will end up having the erasure B will still allow for it.
This point only applies when we’re trying to do partial applications for functions that have optionals within them! If we provide all the arguments at once, then the ordering doesn’t matter whatsoever.
1 2 3 4 5 6 7 8 9 10 11(* signature A: positionals are all after the optional *) let concat ?(sep="") x y = x ^ sep ^ y;; (* val concat : ?sep:string -> string -> string -> string = <fun> *) let prepend_pound = concat "# ";; (* val prepend_pound : string -> string = <fun> *) (* Signature B: positional before optional *) let concat x ?(sep="") y = x ^ sep ^ y;; (* val concat : string -> ?sep:string -> string -> string = <fun> *) let prepend_pound = concat "# ";; (* val prepend_pound : ?sep:string -> string -> string = <fun> *)
Chapter 3: Lists and Patterns
Basic notes
Just some points to revisit the tour that we already had
cons operator
::is right-associative. The attachment of things is consistent with the knowledge that OCaml lists are singly-linked lists.this is why the extension doesn’t modify the existing list, it’s more of that the new element is “prepended” and has a refernce to the rest of the list.
the empty list
[]is polymorphic even though the overall list is NOT.conventional naming:
- follow the naming of
hdandtlfor head and tail of list.
- follow the naming of
Patterns to extract data from list
match expressions can let-bind new names
Match expressions can be used to bind new variables. This may lead to problems where we think that we’re reusing a variable within the arm of a match case but it’s actually going to shadow that variable instead.
LIMITATION: In the elixir world, this is why we have the use of the pin operator but in OCaml, there’s no similar construct
| |
the fix to this is to use guards or if statements, which would give the same function signature:
| |
Limitations and Blessings of Pattern Matching
Pattern matching is structural \(\implies\) we shouldn’t try to encode arbitrary conditions within the pattern. (That’s what the when conditions are for.)
The conditions we use pattern matching for should only be structurally relevant.
blessing: Performance
matchcases are entries in a jump table, so they are efficient.a good comparison is between the performance for match cases and an equivalent if else statement:
WITNESS: here’s how we can do a quick benchmark using
core_benchmodule1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32(* this gives a jump table. *) let plus_one_match x = match x with | 0 -> 1 | 1 -> 2 | 2 -> 3 | 3 -> 4 | 4 -> 5 | 5 -> 6 | _ -> x + 1;; (* this is sequential, gives a conditional branch tree to navigate *) let plus_one_if x = if x = 0 then 1 else if x = 1 then 2 else if x = 2 then 3 else if x = 3 then 4 else if x = 4 then 5 else if x = 5 then 6 else x + 1;; #require "core_bench";; open Core_bench;; [ Bench.Test.create ~name:"plus_one_match" (fun () -> plus_one_match 10) ; Bench.Test.create ~name:"plus_one_if" (fun () -> plus_one_if 10) ] |> Bench.bench;;1 2 3 4 5 6 7Estimated testing time 20s (2 benchmarks x 10s). Change using '-quota'. ┌────────────────┬──────────┬────────────┐ │ Name │ Time/Run │ Percentage │ ├────────────────┼──────────┼────────────┤ │ plus_one_match │ 35.45ns │ 74.59% │ │ plus_one_if │ 47.53ns │ 100.00% │ └────────────────┴──────────┴────────────┘
blessing: Error Detection
Some examples of what we’ve already seen
- the exhaustiveness of
matchexpression-cases is checked by the compiler - redundancies of cases are also checked
These work by patterned-context, not by some algo that checks the predicates (doesn’t exist).
What we will see soon:
- the exhaustiveness checks in the context of variants and user defined types
- the usefulness of using match statement warnings when we want to chase a chain of refactor-effects – the ball of yarn will just show itself!!
List module
Similar to elixir, the default modules offer good APIs for us to use.
Something to note in this section is that we will see a bunch of composed funtions such as List.concat_map which composes map and concat. Like in our understanding of functions (which is right-associative) any composed function op x_y means apply y then x
Here’s an example set of code where we:
use
List.mapandList.map2_exn- the
_exnis the OCaml way of doing throwable function calls, like in Elixir where we have strict operations appended with!to the function name.
- the
use
StringfunctionsString.concatwhich are better for large strings. Using the^operator will generate multiple sub-strings first – becomes a problem for largers strings.String.make
use
List.foldwhich is just a more general version of reduce function in OCaml syntaxList.fold walks over the list from left to right, updating the accumulator at each step and returning the final value of the accumulator when it’s done.
1 2(* example usage for summing a list *) List.fold ~init:0 ~f:(+) [1;2;3;4];;Note:
- accumulator need NOT be the same time as the elements of the list
1List.fold ~init:[] ~f:(fun acc hd -> hd :: acc) [1;2;3;4];;
- accumulator need NOT be the same time as the elements of the list
The example is one where we try to render a table of values (i.e. a 2D array). So our table rendering function’s code would look something like:
| |
| |
Combining Elements with List.reduce
It’s a more specific case, of fold, without the init param supplied.
Result is wrapped in an optional.
Filtering with List.filter, List.filter_map
- the case for filtering then mapping is common enough for
List.filter_map1 2 3 4 5 6 7 8 9 10(* list of file extensions from a list of files, returning deduped and in sorted order. *) let extensions filenames = List.filter_map filenames ~f:(fun fname -> match String.rsplit2 ~on:'.' fname with | None | Some ("",_) -> None | Some (_,ext) -> Some ext) |> List.dedup_and_sort ~compare:String.compare;; extensions ["foo.c"; "foo.ml"; "bar.ml"; "bar.mli"];
Partitioning with List.partition_tf
The _tf is a helpful mnemonic that reminds us that the output is two lists: first list is with the elements where the predicate returns true and the second list is the one for falses.
Multiple ways to combine lists
We can append (which can be done via List.append or operator @) or we can concat via List.concat (something more like flattening a list of lists).
Since concat and map functions are often used together (same idea as flatmaps), there’s a dedicated List.concat_map for us to use
| |
Tail-call Optimisation: Writing tail recursive functions
Classic stack machine used so there’s a limit to the stack frames, but it’s easy to write out tail recursive functions.
| |
Tail-call optimisation is not new to me but here’s another short rundown:
in
length_plus_n, the recursive call for that function is a tail call \(\implies\) no need to allocate new stack frame- tail-call when the caller of the function doesn’t have to do anything withh the returned value (doesn’t use it) that is returned by the callee, other than to return it. This is why there’s no need to preserve that stack frame (of the caller) \(\implies\) compiler can make this tail call optimisation.
idioms (to transform non-TCO to TCO style):
tail calls = recursive call is the “final action”, no other important logic happens after
accumulator and helper usage:
it’s just a clean way of doing things so that we can isolate out non-tail calls from tail-calls within naive recursive functions \(\implies\) naturally we end up with tail-called helpers
if all calls are tail calls, then we’ve done tail-call optimisation correctly.
Terser / Faster Patterns
using the as pattern to alias the matched pattern
Typically if we add in name, a new object for the matching type is created. We can save space by using as to get handle to the matched value.
The as pattern lets you bind a name to the whole matched value in addition to deconstructing it.
| |
using when for precondition checks
continuing with the example, the similarity check was being done via an if-else. We could shift it to a when guard.
| |
polymorphic comparison
This is going to be so magical.
The idea here is that the Base’s default equality op is integers-only. If we use a polymorphic version, then we will be able to use that function we had for any type.
That’s what Base.Poly helps us do.
| |
LIMITATION: you can’t build these functions (polymorphic comparators) on your own. OCaml’s polymorphic comparison functions are built into the runtime to a low level. These comparisons are polymorphic on the basis of ignoring almost everything about the types of the values that are being compared, paying attention only to the structure of the values as they’re laid out in memory.
WITNESS: typically OCaml devs don’t like polymorphic comparison because it can introduce silent bugs that will be hard to catch. Restrictions are a good thing because they give predictability to our code.
NOTE: when guards may harm the compiler’s exhaustiveness check because the pattern matching and logical predicate (for when clause) is being directly used.
Chapter 4: Files, Modules, and Programs
Files are the unit of organisation for a module. This section is a bunch of OCaml specific notes on writing OCaml programs, working with modules and module signatures.
Single File Programs
here’s the demo code:
| |
For learning, we’re directly compiling a single file.
This needs explicit linking for imported modules, this is done by using ocamlfind together with ocamlopt for the linking (which is asked by -linkpkg)
ocamlfind ocamlopt -linkpkg -package base -package stdio freq.ml -o freq
We shall KIV first until we find a better way to
Multi file Programs and Modules
Files are the unit of association for modules. Should be seen as a collection of definitions stored within a namespace. Modules are always titlecased regardless the casing of the filename where they’re defined.
The buildsystem will figure out the dependencies accordingly.
Signatures and Abstract Types
We should always depend on interfaces instead of direct code implementations, interface segregation is always great.
OCaml uses interface/signature/module type interchangeably. Similar to C, we can have interface files (.mli) where the interfaces are defined for them to be implemented in corresponding .ml files.
The concrete datatypes that a module supports may be considered an implementation detail, therefore we might wish to define some abstract data types. That can be done within the interface files as well.
Interfaces are a good place to include docstrings, which can be naturally picked up by odoc
val declarations
val declarations form the syntax for specifying values in a signature: val <identifier> : <type>
Abstract Types in Signatures
This part is about declaring interfaces (signatures) in a way that the concreteness of that implementation is abstract to the module consumers.
An entry like of an abstract type t within the interface file (.mli) means: “This module defines a type t, but I am not telling you how it is implemented.”
| |
The compiler uses both files:
- for abstraction boundary, determining what’s visible to clients: use the
.mli - for how to build and use it internally,
.ml
When compiled:
The
.mlitype makes the type abstract to callers.Describes what a module exposes — the public API. Abstract type definitions hide representation details.
The
.mltype becomes a manifest implementation, checked to be consistent with the.mli.Provides the actual definitions and values — how the module’s promises are fulfilled.
Disambiguating “abstract” vs “polymorphic”
MISCONCEPTION:
This was somewhat a misconception of mine. They are similar but not the same.
Concrete Types in Signatures
We may wish to define concrete types (typically variant types) in our interfaces. Concreteness here means that the clients of that module have visibility to the structure of that type.
Whether a given type should be abstract or concrete is important and depends on context:
Abstract types give you more control over how values are created and accessed, and make it easier to enforce invariants beyond what is enforced by the type itself
concrete types let you expose more detail and structure to client code in a lightweight way
NOTE: types and values have distinct namespaces, so we may have the same name seen in both type and function definitions. Here’s an example:
In OCaml, if you define an abstract type in the interface (.mli), you must also concretise it in the implementation (.ml).
If you define a concrete type in the interface (.mli), you must also define the same concrete type in the implementation (.ml). It may look redundant.
| |
Nested Modules
Files are a unit of association for modules but we may want to have sub-modules within a module to have clearer separation of overlapping but different types. We may nest modules within other modules.
We want the ability to define type identifiers that are distinct but may have similar underlying implementations. I think it’s a bad analogy but it’s similar to subclassing an abstract/virtual class from the OOP world – only in that it provides separation of types with potentially shared underlying representation, but it’s better framed as type abstraction and information hiding, not inheritance.
This is because confusing different kinds of identifiers is a very real source of bugs so instead of using bare types (e.g. strings) we should use concrete types that we define (Username, Hostname).
Example given is that of usernames and hostnames within sessions.
| |
Opening Modules
Opening gives direct access to the namespace, but may shadow existing names as well.
Some general rules of thumb on this:
open rarely
Opening a module is basically a trade-off between terseness and explicitness—the more modules you open, the fewer module qualifications you need, and the harder it is to look at an identifier and figure out where it comes from.
There are some modules that were designed to be opened: like
Baseitself, orOption.Monad_infixorFloat.OwithinBase.Use local opens to limit the scope of the opened module
There are two syntactic approaches to this:
normal – let binding the module namespace
1 2 3let average x y = let open Int64 in (x + y) / of_int 2;;lightweight – better for small expressions
1 2let average x y = Int64.((x + y) / of_int 2);;
Use module shortcuts
Similar to module aliases like in Elixir. We should do this only to a small, local scope. Doing it at top-level is a mistake.
1 2 3 4 5 6let print_median m = let module C = Counter in match m with | C.Median string -> printf "True median:\n %s\n" string | C.Before_and_after (before, after) -> printf "Before and after median:\n %s\n %s\n" before after
Including Modules
Opening a module affects the environment used to search for identifiers, including a module is a way of adding new identifiers to a module proper.
The difference between include and open is that we’ve done more than change how identifiers are searched for: we’ve changed what’s in the module. Directly using open won’t work because that chaining up of namespace won’t be done.
| |
include works for both signatures (on interface files) as well as code, so this is one way we can extend the functionality of modules that we consume.
| |
The implementation is where shadowing happens \(\implies\) the order of declaration matters here (in the ml file). Order doesn’t matter in the interface file (.mli).
Common Definitions
Similar to barrel files, we may have an
import.mlfor common imports. They may hold things like intentional name overrides e.g. using a customExt_optionin place ofOptionwhen we use the nameOption1 2 3 4 5 6 7 8 9(* within import.ml *) module Option = Ext_option (* within our module file *) open Base open Import (*the common definitions imported*) let lookup_and_apply map key x = Option.apply (Map.find map key) x
Common Module Compilation Errors
Here’s the common sources of compilation errors:
Type Mismatches – the simplest types
The compiler will complain about this if the interface and implementation files differ in their types.
Type definitions missing
This is actually more of the implementation is missing: when we defined it in the interface but don’t have a corresponding implementation for it.
TERMINOLOGY: the interface declaration is “type spec” and the implementation declaration is the “type definition”.
Type Definition Order Mismatches
For abstract (variant) types that we define, the order matters and should match between the interface and the implementation file. The order of the declaration of variants matters to the OCaml compiler.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31(* I1: v1 of interface -- this will match with implementation below:*) (** Represents the median computed from a set of strings. In the case where there is an even number of choices, the one before and after the median is returned. *) type median = Median of string | Before_and_after of string * string val median : t -> median (* I2: v2 of interface -- this will not match with implementation below:*) (** Represents the median computed from a set of strings. In the case where there is an even number of choices, the one before and after the median is returned. *) type median = | Before_and_after of string * string | Median of string val median : t -> median (* Implemenation (within .ml file) *) type median = Median of string | Before_and_after of string * string let median t = let sorted_strings = List.sort (Map.to_alist t) ~compare:(fun (_, x) (_, y) -> Int.descending x y ) in let len = List.length sorted_strings in if len = 0 then failwith "median: empty frequency count" ; let nth n = fst (List.nth_exn sorted_strings n) in if len % 2 = 1 then Median (nth (len / 2)) else Before_and_after (nth ((len / 2) - 1), nth (len / 2))Cyclic dependencies
Technically, recursive modules (that have cyclic deps) are possible but for now assume it’s illegal.
What’s forbidden:
self-referential: module referencing its own name: e.g.
let singleton l = Counter.touch Counter.emptywithincounter.mltransitive references
the compiler will tell you where the cyclic dependency is from its trace.
Designing with Modules
Modules are essential to OCaml programs, some design tips:
Rarely expose concrete types, stick to abstract types
Most of the time, abstraction is the right choice, for two reasons:
it enhances the flexibility of your design, and
benefit: we’re free to change implementation with minimal blast radius if interface consumers depend on abstract types
it makes it possible to enforce invariants on the use of your module
problem: If your types are exposed, then users of the module can create new instances of that type (or if mutable, modify existing instances) in any way allowed by the underlying type. That may violate a desired invariant i.e., a property about your type that is always supposed to be true.
When will concrete types make sense?
when there’s a lot of value in pattern matching for the concrete types and when the invariants that you care about are already enforced by the data type itself.
Design for the Call Site
Beyond ease of understanding, you want the call to be as obvious as possible for someone who is reading it at the call site. This reduces the need to jump to the interface declarations to get more context.
Some ways of improving the readability of client code:
use labelled args
good names for functions, variant tags, record fields
naming RULE OF THUMBs:
A good rule of thumb is that names that have a small scope should be short, whereas names that have a large scope, like the name of a function in a module interface, should be longer and more descriptive.
Another useful rule of thumb is that more rarely used names should be longer and more explicit, since the cost of verbosity goes down and the benefit of explicitness goes up the less often a name is used.
uniform interfaces
Make the different interfaces in your codebase follow similar patterns to have some level of predictability.
To borrow guidelines from the common modules:
A module for (almost) every type. You should mint a module for almost every type in your program, and the primary type of a given module should be called
t.Put
tfirst. If you have a moduleMwhose primary type isM.t, the functions inMthat take a value of typeM.tshould take it as their first argument.Mark the exception throwable functions with
_exnFunctions that routinely throw an exception should end in
_exn. Otherwise, errors should be signaled by returning an option or anOr_error.tKIV chapter 7 on error handling.Some type signatures for specific functions should be uniform
signature for
mapis always essentially the same, no matter what the underlying type it is applied to. This kind of function-by-function API uniformity is achieved through the use of signature includes, which allow for different modules to share components of their interface.
Design interfaces before writing implemenation
This is just classic words of wisdom.
Types and signatures provide a lightweight tool for constructing a skeleton of your design in a way that helps clarify your goals and intent, before you spend a lot of time and effort fleshing it out.
Chapter 5: Records
Basics
- Syntax pointers:
the keynames within records must start with a lowercase letter.
here’s a basic example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20open Core (* record defined here -- corresponds to /etc/services file on unix *) type service_info = { service_name : string; port : int; protocol : string; };; # require "re";; let service_info_of_string line = let matches = let pat = "([a-zA-Z]+)[ \t]+([0-9]+)/([a-zA-Z]+)" in Re.exec (Re.Posix.compile_pat pat) line in { service_name = Re.Group.get matches 1; port = Int.of_string (Re.Group.get matches 2); protocol = Re.Group.get matches 3; } let ssh = service_info_of_string "ssh 22/udp # SSH Remote Login Protocol";;accessing fields from a record:
- field value accessing can be done with dot notation
- we may pattern match as well
- how the compiler infers the record types:
- when no ambiguity: the compiler bases its inference on the field names used in constructing the record. That inference is most straightforward when each field name belongs to only one record type.
- fields within the record can have an abstract type \(\implies\) the record is polymorphic
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41open Core (* record defined here -- corresponds to /etc/services file on unix *) type service_info = { service_name : string; port : int; protocol : string; };; # require "re";; let service_info_of_string line = let matches = let pat = "([a-zA-Z]+)[ \t]+([0-9]+)/([a-zA-Z]+)" in Re.exec (Re.Posix.compile_pat pat) line in { service_name = Re.Group.get matches 1; port = Int.of_string (Re.Group.get matches 2); protocol = Re.Group.get matches 3; };; (* polymorphic record, with abstract type: *) type 'a with_line_num = { item: 'a; line_num: int } let parse_lines parse file_contents = let lines = String.split ~on:'\n' file_contents in List.mapi lines ~f:(fun line_num line -> { item = parse line; line_num = line_num + 1; }) ;; let service_file_content = "rtmp 1/ddp # Routing Table Maintenance Protocol tcpmux 1/udp # TCP Port Service Multiplexer tcpmux 1/tcp # TCP Port Service Multiplexer";; (* a' may be the service_info record *) parse_lines service_info_of_string service_file_content;; (* a' may be an integer *) parse_lines Int.of_string "1\n10\n100\n1000";;
Pattern Matching & Exhaustiveness
Record patterns are irrefutable, meaning that a record pattern match will never fail at runtime (since the fields are fixed). Therefore, we can pattern-match using a single pattern. In general, patterns for types with a fixed structure, like records and tuples, are irrefutable, unlike types with variable structures like lists and variants.
Also we can do a partial pattern match, though that may cause a drift in usage expectation if the record changes silently – this means that when new fields are added to the record, code that should be updated to react to the presence of those new fields will not be flagged by the compiler.
There’s optional warnings (see ocaml -warn-help) with which we can check for some of these sources of error. RULE OF THUMB: treat warnings as errors in dev but allow them for prod/packaged for distribution.
Use _ to signal fields that we don’t care about.
| |
Field Punning (and label punning)
Punning gives us expressive syntax while allowing us to be terse.
This style of punning allows us to propagate the same names throughout the codebase so names are consistent, making it easier to navigate the source.
| |
Reusing Field Names
When reusing field names, the compiler(typechecker) may exhibit implicit behaviour that may be non-obvious at first glance. This is why there are primitives for us to explicitly do type annotations.
Example of non-obvious compilation:
| |
Idiom: pack types into modules
Each record type can be given a namespace within which related values can be placed. Conventionally, the record type is just set to t.
The idea here is to qualify a record by the module it comes from.
| |
It’s possible for us to be more terse. The mental model here extends onto that case of ambiguity that we saw earlier (B2) in that the typechecker/compiler only needs to be shown how to resolve ambiguity once.
| |
for the pattern matching type annotation, the use of
.has 2 different ways. Considerlet is_important t = t.Log_entry.important;;.first dot: is a record field access, with everything to the right of the dot being interpreted as a field name;
second dot: accessing the contents of a module, referring to the record field important from within the module
Log_entry. The fact thatLog_entryis capitalized and so can’t be a field name is what disambiguates the two uses.
Type-Directed Constructor Disambiguation
We can avoid qualifying the record field if we allow the compiler to infer the type of record in question.
| |
The meaning of “constructor” here is special in OCaml
constructor constructs the value of a concrete data type
variant constructor \(\implies\) builds tagged union types (sum type cases)
e.g.
Some,None,Ok,Error…record constructor \(\implies\) builds structured records (product types)
e.g
{ field1 = v1; field2 = v2 }polymorphic variant constructor \(\implies\) builds polymorphic variant cases
e.g.
`Foo,`Bar
so the disambiguation here relates to which data-type constructor is being referred to when multiple modules or types define constructors with the same names.
Functional updates
There’s a functional update syntax that is terse and allows us to build records with specific field changes from another record. Functional updates make your code independent of the identity of the fields in the record that are not changing. It is a good choice for cases where it’s not important that you consider every field of the record when making a change
with keyword used here, the value-assignments on RHS are the diffs to be applied to the record on the LHS
| |
Mutable Fields in Records
This helps the immutable programming primitives that OCaml uses.
KIV chapter 8 for more on Imperative programming.
| |
First-class fields
Typically, we’d access fields using dot-operators and in that way they don’t behave as first-class objects – we can’t reer to the field itself as a value or pass it around.
Library-provided functionality can make fields first class. Seems like this behaviour is Jane Street-specific (within Base), where there are some patterns to generate accessor functions. It seems like the annotation is a meta annotation that uses fieldslib to generate some accessor functions (specific to the fields) and a Fields submodule to the existing module which gives us a bunch of helper functions to work with record fields.
This seems to be the first introduction of meta-programming directives [@@deriving fields]
Example:
| |
In the code above, we can also see access-control markers for the fields. Inspecting Field.get;; shows - : ('b, 'r, 'a) Field.t_with_perm -> 'r -> 'a = <fun>
| |
Fn module is a collection of useful primitives that deal with other functions (e.g. Identity function is just Fn.id – see the Fun library here)
For example, there’s an iterator that allows us to iterate all first-class fields:
- nice side effect of this approach is that it helps you adapt your code when the fields of a record change.