Chapter 6: Variants
Variants are sum types of multiple forms of data, all of them are tagged explicitly. The value this gives is that we can represent complex data and organise the case-analysis of that information.
Basics:
Each case within the variant has an associated tag and they are constructors because they can construct the concrete value. Optionally, it may have a sequence of fields.
The following example uses variants as simple tags with no associated data (optional sequence of fields omitted), effectively as Enums, a simple representation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15open Base open Stdio type basic_color = | Black | Red | Green | Yellow | Blue | Magenta | Cyan | White;; let basic_color_to_int = function | Black -> 0 | Red -> 1 | Green -> 2 | Yellow -> 3 | Blue -> 4 | Magenta -> 5 | Cyan -> 6 | White -> 7;; let color_by_number number text = Printf.sprintf "\027[38;5;%dm%s\027[0m" number text;; let blue = color_by_number (basic_color_to_int Blue) "Blue";; printf "Hello %s World!\n" blue;; (* the output should be "Hello Blue World" where "Blue" is in the colour blue.*)For more expressiveness, tags can have arguments which describe the data available in each case. Variants may have multiple arguments:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27(* bg context: we're trying to represent colours within the terminal. *) type weight = Regular | Bold type color = | Basic of basic_color * weight (* basic colors, regular and bold *) | RGB of int * int * int (* 6x6x6 color cube *) | Gray of int (* 24 grayscale levels *) ;; (* here's a list of colors we can create: *) [RGB (250,70,70); Basic (Green, Regular)];; (* now we can have a to_int that handles all the cases *) let color_to_int = function | Basic (basic_color,weight) -> let base = match weight with Bold -> 8 | Regular -> 0 in base + basic_color_to_int basic_color | RGB (r,g,b) -> 16 + b + g * 6 + r * 36 | Gray i -> 232 + i;; let color_print color s = printf "%s\n" (color_by_number (color_to_int color) s);; color_print (Basic (Red,Bold)) "A bold red!";; color_print (Gray 4) "A muted gray...";;
Variants, Tuples and Parens
The differences between a multi-argument variant and a variant containing a tuple are mostly about performance. A multi-argument variant is a single allocated block in memory, while a variant containing a tuple requires an extra heap-allocated block for the tuple. KIV memory representation of values in Chapter 23.
The multi-argument variant-case constructor and tuple constructor looks very similar BUT they are not the same. The multi-arg variant constructor expects multiple arguments, if we supply a 3-valued tuple to it, it would be seen as supplying a singular argument.
The parens is what makes the difference and it’s subtle.
| |
Catch-All Cases & Refactoring
OCaml’s type system helps us make it easier to trace dependencies when refactoring, so we can just trace based on what the compiler says.
Our objective should be to write our code in a way that maximizes the compiler’s chances of helping you find the bugs. To this end, a useful RULE OF THUMB is to avoid catch-all cases in pattern matches.
Problem: catch-alls suppress exhaustiveness checks: catch-alls interfere with the compiler’s ability to do exhaustion checks
| |
Combining Records and Variants
Algebraic Data Types are more of a category:
product types – mathematically similar to Cartesian Products
- tuples
- records: can be thought of as conjunctions (ANDs of multiple types within the record)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25module Time_ns = Core.Time_ns module Log_entry = struct type t = { session_id: string; time: Time_ns.t; important: bool; message: string; } end module Heartbeat = struct type t = { session_id: string; time: Time_ns.t; status_message: string; } end module Logon = struct type t = { session_id: string; time: Time_ns.t; user: string; credentials: string; } end ;;
sum types – similar to disjoint unions
- variants: it’s a OR of multiple types
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27type client_message = | Logon of Logon.t | Heartbeat of Heartbeat.t | Log_entry of Log_entry.t (* consider this long-winded code (the session id extraction has repeated lines) *) let messages_for_user user messages = let (user_messages,_) = List.fold messages ~init:([], Set.empty (module String)) ~f:(fun ((messages,user_sessions) as acc) message -> match message with | Logon m -> if String.(m.user = user) then (message::messages, Set.add user_sessions m.session_id) else acc | Heartbeat _ | Log_entry _ -> let session_id = match message with | Logon m -> m.session_id | Heartbeat m -> m.session_id | Log_entry m -> m.session_id in if Set.mem user_sessions session_id then (message::messages,user_sessions) else acc ) in List.rev user_messages;; (* we can improve this by defining the types better *)
- variants: it’s a OR of multiple types
So, the power comes from combining both these types – layered combinations. Here’s how we can refactor the message types:
| |
Embedded Records
We could have directly put in the record types within the variant definition. These are inline records, help us be more concise and more efficient than free-standing record types being referenced – they don’t require a separate allocation object for the contents of the variant.
| |
Variants & Recursive Data Structures
Variants commonly used for recursive datastructures!
Following example helps illustrate how to define a boolean expression language, it’s similar in design to the Blang (boolean language).
| |
The tags here are easy to understand, the Base tag is the special one. It describes what can be tested, so it’s the base predicate. That’s why it’s described as the predicate that “ties the expr to your application”.
Let’s use the context of a mail processor, where we might wish to DEFINE filter expressions and then EVALUATE them. We might also wish to SIMPLIFY boolean expressions.
| |
Carrying onto SIMPLIFICATION routine: We need to define how the different expressions may be simplified, so we need to rely on some of simplification functions that can be used for our simplification routine.
| |
How we might use it:
| |
Polymorphic Variants
Polymorphic variants are more flexible and syntactically more lightweight than ordinary variants but they have a distinct cost to using them.
Interestingly, in OOP subtyping terminology we see this variance showing parallels to be like so:
| Notation | Meaning | Subtyping Direction | Analogy |
|---|---|---|---|
[> ...] | “These tags or more” | Supertype (covariant) | “open upwards” |
[< ...] | “These tags or less” | Subtype (contravariant) | “open downwards” |
| (no marker) | “exactly these tags” | Invariant | “closed fixed” |
KIV this until we come to the “Objects” part. I think it’s safe to treat this parallel to subtyping as just a mental model for now.
Here’s a covariant example:
| |
Here’s a contravariant example:
| |
Here’s a invariant example:
| |
Interestingly we can do both upper and lower bounding of the variant types:
| |
Using polymorphic variants can get confusing because of the type that is actually inferred.
Polymorphic variants and Catch-all Cases
In ordinary variants, catch-all cases are already error-prone (because of the loss of exhaustiveness-checks) – in polymorphic types, catch-alls are even worse.
So _ catch all will lowerbound the type. This becomes a problem if we have typos:
| |
with the typo (Float -> Floot), we realise that it will fall to that catch-all instead of the tag being caught (as an Error) as an unknown tag (as would have been the case of ordinary variants).
| |
RULE OF THUMB: As a general matter, one should be wary about mixing catch-all cases and polymorphic variants.
Use case of using polymorphic variants: Terminal Colors Redux
There’s still practical value to using polymorphic types. Suppose we want to have an alpha-channel extension to colors, if we define a new variant type with a new arm for RGBA, it’s not good enough because the compiler would see the two variants (old and extended) as two different, incompatible variant types.
What we need is to share tags between two different variant types and this is what polymorphic variants are useful for.
| |
NOTE: extend_color_to_int needs to invoke color_to_int with a narrower type, so it has to be contravariant. The use of explicit or-cases within the pattern match helps here. To be elaborate, the narrowing happens because `RGBA never will be passed to color_to_int. However, if we were to do a catch-all branch instead, then this narrowing doesn’t happen and so compilation would fail.
Packaging the code up
The interface:
| |
the implementation :
| |
Some notes about this:
the
.mlfile has exact variants only, this is what allows the pattern matching to be possibleConsider the case where we make a typo for some reason and add in a new polymorphic type as case as per the error seen above
why won’t it error out?
All that happened was that the compiler inferred a wider type for
extended_color_to_int, which happens to be compatible with the narrower type that was listed in themli. As a result, this library builds without error.how can we prevent this?
Adding an explicit type annotation to the code itself will help because the compiler will know enough.
1 2 3 4 5 6 7 8 9 10 11(* this will NOT error out: *) let extended_color_to_int = function | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216 | `Grey x -> 2000 + x | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color (* this will error out [which is what we want]: *) let extended_color_to_int : extended_color -> int = function | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216 | `Grey x -> 2000 + x | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color
given that we have the typedefs, we can use the name of the type explicitly for the type narrowing by prefixing with
#It is useful when you want to narrow down to a type whose definition is long, and you don’t want the verbosity of writing the tags down explicitly in the match.
1 2 3 4 5 6 7 8 9(* terse version *) let extended_color_to_int : extended_color -> int = function | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216 | #color as color -> color_to_int color (* this works too *) let extended_color_to_int : extended_color -> int = function | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216 | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color
When to Use Polymorphic Variants
- RULE OF THUMB: in general, regular variants are more pragmatic
- Safe use case:
- Probably the safest and most common use case for polymorphic variants is where ordinary variants would be sufficient but are syntactically too heavyweight. For example, you often want to create a variant type for encoding the inputs or outputs to a function, where it’s not worth declaring a separate type for it. Polymorphic variants are very useful here, and as long as there are type annotations that constrain these to have explicit, exact types, this tends to work well.
- Costs of using polymorphic variants:
complexity
confusing type inference may make it harder to read the code / debug this. Error messages may get really long as well.
Indeed, concision at the value level is often balanced out by more verbosity at the type level.
error-finding
type-safe but typing discipline not too good.
efficiency
it’s a little less efficient
polymorphic variants are somewhat heavier than regular variants, and OCaml can’t generate code for matching on polymorphic variants that is quite as efficient as what it generated for regular variants.
Chapter 7: Error Handling
OCaml’s support for handling errors minimises the pain of doing it. This chapter is about designing interfaces that make error handling easier.
Error-Aware Return Types
Error-handling functions are useful to use because they let us express error handling explicitly and concisely.
We can get some of these functions from the Option module and a bunch of others from Result and Or_error modules. The important part is to be sensitive to some of the idioms that exist because of some common patterns that show up.
Approach 1: Our called function may return errors explicitly
Including errors in the return values of your functions requires the caller to handle the error explicitly, allowing the caller to make the choice of whether to recover from the error or propagate it onward.
Wrapping in options is context dependent, not always clear what should be an Error vs valid outcome. Therefore, a general purpose library may not know this in advance.
Typical examples include wrapping the answer in optionals and such.
1 2 3 4 5 6 7 8 9 10 11 12let compute_bounds ~compare list = let sorted = List.sort ~compare list in match List.hd sorted, List.last sorted with | None,_ | _, None -> None (* "error case" propagates the None out*) | Some x, Some y -> Some (x, y) let find_mismatches table1 table2 = Hashtbl.fold table1 ~init:[] ~f:(fun ~key ~data mismatches -> match Hashtbl.find table2 key with | Some data' when data' <> data -> key :: mismatches | _ -> mismatches (* there's no "error" propagation and that's correct because of this case's semantic meaning*) );;Approach 2: encoding errors with result
Encoding Errors with Result
Wrapping outcomes within Options is non-specific and the nature of the Error is not conveyed by this. Result.t is for this type of information (can be seen as an augmented option).
Ok and Error are used and they’re available @ top-level without needing to open the Results module.
Error and Or_error
When using Result.t, we are care about success cases and error cases. As for the types, we should standardise on an error type for consistency sake. Some reasons to do so:
easier to write utility functions to automate common error-handling patterns
this point resonates with the choice of doing Error subclassing for specificity from the OOP world.
Or_error.t is just Result.t with the error case specialized to Error.t type. The Or_error module is useful for such common error-handling patterns:
Or_error.try_with: catch the exception yourself1 2 3 4 5 6 7 8 9let float_of_string s = Or_error.try_with (fun () -> Float.of_string s);; (* which gives the val as: val float_of_string : string -> float Or_error.t = <fun> *) float_of_string "3.34";; (*returns: Base__.Result.Ok 3.34 *) float_of_string "a.bc";; (*error case, returns Base__.Result.Error (Invalid_argument "Float.of_string a.bc")*)
idiom: using s-expressions to create
ErrorThis point is about how we can represent / create Error from sexps. A common idiom is to use
%messagesyntax-extension and pass in further values as s-expressions.An s-expression is a balanced parenthetical expression where the leaves of the expressions are strings. They’re (the
Sexliblibrary) good for common serialisation use-cases also.Sexplibcomes with a syntax extension that can autogenerate sexp converters for specific types.1 2 3 4 5 6#require "ppx_jane";; Error.t_of_sexp [%sexp ("List is too long",[1;2;3] : string * int list)];; (* -- generates: --*) (* - : Error.t = ("List is too long" (1 2 3)) *)We can tag errors as well. Example of tagging errors :
1 2 3 4 5 6 7 8 9Error.tag (Error.of_list [ Error.of_string "Your tires were slashed"; Error.of_string "Your windshield was smashed" ]) ~tag:"over the weekend";; (* This will give us: - : Error.t = ("over the weekend" "Your tires were slashed" "Your windshield was smashed") *)There’s a message syntax extension that we can use:
1 2 3 4 5 6 7let a = "foo" and b = ("foo",[3;4]);; Or_error.error_s [%message "Something went wrong" (a:string) (b: string * int list)];; (* results in this type: Base__.Result.Error ("Something went wrong" (a foo) (b (foo (3 4)))) *)
bind and other Error handling Idioms
Just like option-wrapping and results, there are other patterns we start to observe hence these have been codified as the following idioms.
Idiom:
bindfunction (safe stage-wise chain of operations)bindonly applies the function if the param isSomeand this can be applied as a function or even as an infix operator. We can get this infix operator fromOption.Monad_infix.The usefulness of this is that
bindcan be used as a way of sequencing together error-producing functions so that the first one to produce an error terminates the computation. It is better for large, complex examples with many stages of error handling, the bind idiom becomes clearer and easier to manage.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30(* this example is a small-scale example *) let compute_bounds ~compare list = let open Option.Monad_infix in let sorted = List.sort ~compare list in List.hd sorted >>= fun first -> List.last sorted >>= fun last -> Some (first,last);; (* generated val: val compute_bounds : compare:('a -> 'a -> int) -> 'a list -> ('a * 'a) option = <fun> *) (* manually writing it out *) let compute_bounds ~compare list = let sorted = List.sort ~compare list in Option.bind (List.hd sorted) ~f:(fun first -> Option.bind (List.last sorted) ~f:(fun last -> Some (first,last)));; (* val compute_bounds : compare:('a -> 'a -> int) -> 'a list -> ('a * 'a) option = <fun> *) (* NOTE: here's the way bind is implemented *) let bind option ~f = match option with | None -> None | Some x -> f x;; (* val bind : 'a option -> f:('a -> 'b option) -> 'b option = <fun> *)Perhaps a good analogy for
bindis like chaining with JavaScript’s optional chaining (?.) or Promise.then(...), but specialized for computations that may fail (None).
Idiom: syntax extension: Monads and
Let_syntaxThe monadic binding we saw in
bindcan look more like a regular let-binding. So the mental model here is just a special form of let-binding that has builtin error-handling semantics.1 2 3 4 5 6 7 8 9 10 11 12 13#require "ppx_let";; (*<--- need to enable the extension*) let compute_bounds ~compare list = let open Option.Let_syntax in let sorted = List.sort ~compare list in let%bind first = List.hd sorted in let%bind last = List.last sorted in Some (first,last);; (* val compute_bounds : compare:('a -> 'a -> int) -> 'a list -> ('a * 'a) option = <fun> *)
Idiom:
Option.bothTakes two optional values and produces a new optional pair which is
Noneif either of the optional values areNone1 2 3 4 5 6 7 8 9let compute_bounds ~compare list = let sorted = List.sort ~compare list in Option.both (List.hd sorted) (List.last sorted);; (* val compute_bounds : compare:('a -> 'a -> int) -> 'a list -> ('a * 'a) option = <fun> *)
Exceptions
Exceptions are a way to terminate a computation and report an error, while providing a mechanism to catch and handle (and possibly recover from) exceptions that are triggered by subcomputations – Runtime trapping, reporting of errors and allowing us to catch them and possibly recover gracefully from them.
They are ordinary values, so we can treat them like any other values (and define our own exceptions). They are all of the same exn type.
Exceptions are like variant types but they are special because they’re open (can be defined in multiple places) – new tags (new exceptions) can be added to it by different parts of the program. \(\implies\) we can’t exhaustive match on an exn because the universe of tags is not closed.
In contrast, variants have a closed universe of available tags.
| |
NOTE: the return type of raise is special, it’s a polymorphic 'a because it fits into whatever its surrounding context is just to do its job. This is what allows raise to be called from anywhere in the program.
It never really returns anyway. Similar behaviour exists in functions like infinite loops.
| |
Declaring Exceptions using [@@deriving sexp]
Plain exception definitions don’t give us useful info, we can declare exception and the types it depends on using this preprocessor annotation which generates sexps for us: [@@deriving sexp]. the representation generated includes the full module path of the module where the exception in question is defined.
PL-DESIGN: theres’s a whole PPX (preprocessor) pipeline to AST evaluation that is done within compilers. This is something that I hadn’t looked into deeply before. Such metaprogramming constructs allows us to extend the language safely. A short description / primer on this here – there should be more generic PL concepts and writeups for this that is language-agnostic (perhaps this book).
| |
Helper functions for throwing exceptions
These are some ergonomic aspects, many of them are within Common and Exn within Base.
Some examples:
- using
failswithfor the exception throwing. ThrowsFailure - using
assertfor invariant checks.we can use
assertwith an arbitrary condition for failure cases.assertis useful because it captures line number and char offset from the source, so it’s informative.1 2 3 4 5 6 7 8 9 10let merge_lists xs ys ~f = if List.length xs <> List.length ys then None else let rec loop xs ys = match xs,ys with | [],[] -> [] | x::xs, y::ys -> f x y :: loop xs ys | _ -> assert false in Some (loop xs ys);;
Exception Handlers
We wish to handle exceptions that are thrown (and propagated).
The general syntax looks like this:
| |
Cleaning Up in the Presence of Exceptions
As with other languages, we should have a finally syntax for exception handling. This is NOT a built-in primitive and we rely on libraries for this.
The idiom here is similar to context managers (e.g. Python’s with context manager).
Exn.protect from Base is a useful function for this:
- [body] thunk
f: function for the main body - [cleanup] thunk
finally: for the finally logic
This functionality is common enough for file handling or IO handling that there’s also a In_channel.with_file to manage closing of file descriptors.
| |
Catching Specific Exceptions
Consider the case where we wish to catch a specific error from a specific function call (the error type may be the same as from other places, the intent is to handle one particular error from one particular place).
the type system doesn’t tell you what exceptions a given function might throw. For this reason, it’s usually best to avoid relying on the identity of the exception to determine the nature of a failure. A better approach is to narrow the scope of the exception handler, so that when it fires it’s very clear what part of the code failed:
| |
This behaviour is common enough that there’s a concise version to write this.
| |
Backtraces
We desire useful debugging information, backtraces are useful to us.
Uncaught exceptions will show the backtrace already. We desire to capture a backtrace with our program and Backtrace.Exn.most_recent helps us do that \(\implies\) useful for error-reporting purposes.
RULE OF THUMB: it’s not a common pattern to use exceptions as part of your flow control and it’s generally better to use raise_notrace. Stack-traces should almost always be left on (responsibility is on code to keep things performant).
Backtraces affect speed.
Usually backtraces are turned off, there’s a bunch of ways that is controlled:
OCAMLRUNPARAMenv variable @ runtimethe program needs to be compiled with debugging symbols
this part is related to the bytecode vs native code compilations (wherein bytecode allows debugging symbols to be preserved and makes it easier to debug).
Backtrace.Exn.set_recording false.
| |
Here’s an example of the benchmark:
| |
Observations:
- setting up an exception handler is cheap, it may be left unused
- actually raising an exception is expensive (55 cycles)
- if we raise and exception without backtraces, it costs about 25 cycles.
Exceptions to Error-Aware Types & Back Again
Often, we’ll need to move in between using exceptions and using error-aware types and there’s some support to help this process.
[
Option] given code that may throw an exception, we can capture it within an option usingtry_with[
Result,Or_error] similartry_withfunctions- we can also re-raise the exception using
ok_exn
- we can also re-raise the exception using
| |
Choosing Error-Handling Strategy
When thinking about exceptions vs error-aware return types, the tradeoff is between concision vs explicitness.
- Exceptions \(\implies\) better for speed of implementation
- pro: more concise, can defer the error handling job to a larger scope and don’t clutter up types
- con: easy to ignore
- Error-Aware types \(\implies\) better for stability
- pro: fully manifest in type definitions, so errors generated are explicit and impossible to ignore
- for errors that are a foreseeable and ordinary part of the execution of your production code and that are not omnipresent, error-aware return types are typically the right solution.
use exceptions for exceptional conditions
RULE OF THUMB: The maxim of “use exceptions for exceptional conditions” applies. If an error occurs sufficiently rarely, then throwing an exception is often the right behavior.
Omnipresent errors (OOM)
it’s overkill to use error-aware return types for this, will be too tedious. Needing to mark everything will also make it less explicit as to what the problem actually was.
Chapter 8: Imperative Programming
The book makes the distinction between functional and imperative by emphasising that imperative programming and its modification of a program’s internal state is the action of interacting with changeable parts of the world.
Additionally, there’s also an argument that some algorithms can be implemented in imperative fashion hence a performance benefit to imperative code; examples:
- in-place sorting algos
- graph algos that rely on mutable data structures
- dynamic programming algos
- numeric linear algebra / Tamil
- real-time, embedded system algorithms
- low-level system programming
This chapter deals with OCaml’s support for the imperative programming paradigm.
Toy Example: imperative dictionaries
This is just a toy example, not for actual use.
An open hashing scheme, where the hash table will be an array of buckets, each bucket containing a list of key/value pairs that have been hashed into that bucket.
Interface:
| |
Implementation:
| |
Some useful notes:
forloops are suitable for imperative contexts, we should prefer those even if recursive ways to express that may exist.- some mutable operators shown in the example:
;sequencing operator, helps us control the sequence of imperative actions.let-bindings would have worked too but for imperative code, this is more idiomatic1 2 3 4 5 6 7 8 9 10 11<expr1>; <expr2>; ... <exprN> (* ANDD *) let () = <expr1> in let () = <expr2> in ... <exprN>are equivalent!
<-mutable update operator. works for:- elements of an array
(array.(i) <- expr) - updating record field
(record.field <- expression)
- elements of an array
- RULE OF THUMB: it’s good practice to leave all the side-effecting operations to the end of the function – minimises the chance of exceptions leaving corrupted state
Primitive Mutable Data
A walkthrough of the mutable primitives in OCaml
Array-like
Array like means mutable integer-indexed containers that provide constant-time access to elements
Ordinary Arrays
General array, we can create it using a literal syntax
[|1;2;3|]Some usual operations:
setting value: use the dot operator
reading/retrieving value: use the dot and assignment operator
blit: this a sliced copy which allows us to copy over a subarray to another subarrayIt’s closer in functionality to c’s memcopy or memmove rather than pythons sliced-copy. This is because out-of-bounds exceptions can happen if we’re not careful about array boundaries. Additionally, the copy-over overwrite behaviour happens safely, so careful about that.
NOTE: the signature of this function is a little odd compared to other languages like python:
Array.blit source source_start destination destination_start lengthand it’s not labelled arguments1 2 3 4 5 6 7 8 9(* Create arrays *) let src = [|1; 2; 3; 4; 5|];; let dst = [|0; 0; 0; 0; 0|];; (* Copy 3 elements from src (starting at index 1) to dst (starting at index 2) *) Array.blit src 1 dst 2 3;; (* dst now becomes *) dst;;
Bytes and Strings
To represent a character, a single byte width (8 bit character) is what we require.
Bytes and Strings are very similar.
1 2 3 4 5 6let b = Bytes.of_string "foobar";; (* val b : bytes = "foobar" *) Bytes.set b 0 (Char.uppercase (Bytes.get b 0));; (* - : unit = () *) Bytes.to_string b;; (* - : string = "Foobar" *)- Bytes: can be seen as char arrays
- each entry (of char) is 8-bytes long
- is mutable
- still somewhat space efficient
- Strings:
- each entry (of char) is 1-byte long
- is immutable
- more space-efficient than bytes
- Bytes: can be seen as char arrays
Bigarrays
for handling memory blocks outside of OCaml’s heap, this allows interoperability between languages of other libraries. KIV until the memory representation part.
Generalised syntax for this:
1 2<bigarray_expr>.{<index_expr>} <bigarray_expr>.{<index_expr>} <- <value_expr>
Mutable Record and Object Fields and Ref Cells
For mutable records, it’s something we’ve seen before. Take note that the record itself is immutable but field(s) within the record may be mutable.
Objects are similar KIV until Chapter 12.
Ref cells
like we’ve seen before, we might want to have an accumulator value to which we keep changing its state and
refis useful for that.Under the hood, it’s just a record with a single mutable field (contents) and it has some syntactic sugar that makes it easier to work with refs.
1 2 3 4 5 6 7 8 9 10 11 12 13 14(* Create/Initialise *) let x = ref 1;; (* val x : int ref = {Base.Ref.contents = 1} *) (* Access!!! *) !x;; (* - : int = 1 *) (* Assign!!! *) x := !x + 1;; (* - : unit = () *) !x;; (* - : int = 2 *)
Foreign Functions
Foreign functions allow ocaml to use imperative constructs used by syscalls or external libraries (e.g. write syscall, clock).
KIV chapter 22.
For and while loops
Both for and while loops aren’t really necessary because we could have otherwise written out a recursive version. It’s more idiomatic to use them in situations where the code is imperative.
for loops:
- the bounds are both open bounds so the upper and lower are inclusive.
- we can iterative in the reverse direction by using
downto
| |
while loop:
- pretty much the same order of evaluation of expressions as other languages
| |
Example: Doubly-linked lists
Yet again, this is just a didactic example, for actual purposes, use Doubly_linked module within Core.
Some notes on the interface / implementation:
Elements:
- Elements act as pointers to the interior of a list and allow us to navigate the list and give us a point at which to apply mutating operations.
- We implement them as records. Records will have some optional mutable fiends – at the start of the list, the
prevwill beNoneand at the end of the list, thenextwill beNone
in the element-wise modification functions, notice how
matchis surrounded with parens. This is because the precedence of match is very low. There’s a need to separate that expression from the thereafter assignment operation.Without this separation, the final assignment expression would have become part of the
Nonecase.an alternative was to use
begin...end.the implementation of
removeshows why imperative code is tricky and sometimes can be fragile.in
remove, if we are removing the first element in the list, then we update the list pointer itself.in the actual implementations, the edge cases are handled by error detection and error correction logic.
example of problematic actions:
- double-removing an element will cause the main list reference to be set to
None, thus emptying the list. - Similar problems arise from removing an element from a list it doesn’t belong to.
RULE OF THUMB: for imperative data-structures, use the libraries as much as possible and if we don’t have one and need to implement it, give extra attention to the error handling.
- double-removing an element will cause the main list reference to be set to
Here’s the interface
| |
Here’s our implementation:
| |
Cyclic Data Structures
Doubly-linked lists are cyclic because it is possible to follow a nontrivial sequence of pointers that closes in on itself. In general, building cyclic data structures requires the use of side effects. This is done by constructing the data elements first, and then adding cycles using assignment afterward.
Cyclic data structures (general purpose):
- we need to use side-effects to create them
- we should construct the elements first
- then we should add the cycles
An rare exception: we can use let rec for fixed-size cyclical data structures.
| |
Modifying the List
Take note on the implementation point that the match expression has to be wrapped around
Iteration Functions
These are the map, iter, fold functions (refer to the code block above)
iter: the goal of which is to call a unit-producing function on every element of the list, in orderfind_el: runs a provided test function on each value stored in the list, returning the first element that passes the test
Laziness and other benign effects
Benign Effects: there are cases where we want to be pure by default and use some limited imperative side-effects that give us performance improvements.
benign effect: laziness
There’s a need to wrap up lazy operations with the lazy keyword. The application of that operation has to be explicitly forced out via Lazy.force.
| |
Here’s a didactic example of how we may implement laziness:
| |
Notes:
we can create a lazy value from a thunk (which is a nullary function).
NOTE: in the context of functional programming languages, a thunk is a nullary function. In others (e.g. js, it’s more generalised as a deferred computation that may or may not take in an argument)
Memoization and Dynamic Programming
Our favourite first-reach optimisation technique :)
CAUTION: a memoized function by nature leaks memory. As long as we hold onto the memoized function, we’re holding every result it has returned thus far.
here’s an example of how a memoisation function may be implemented by us:
| |
useful for top down recursive algos
useful for efficiently implementing some recursive algos
Examples:
Let’s use a didactic
fibfunction as a way to illustrate how to memoize:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28open Base (* -- helper timer function: *) let time f = let open Core in let start = Time.now () in let x = f () in let stop = Time.now () in printf "Time: %F ms\n" (Time.diff stop start |> Time.Span.to_ms); x (* val time : (unit -> 'a) -> 'a = <fun> *) (* slow, not memoized function *) let rec fib i = if i <= 1 then i else fib (i - 1) + fib (i - 2);; (* val fib : int -> int = <fun> *) (* NOTE: the tricky part is that we need to insert the memoization BEFORE the recursive calls within =fib= *) let memo_fib = memoize (module Int) fib;; (* val fib : int -> int = <fun> *) time (fun () -> memo_fib 40);; (* Time: 18174.5970249 ms *) (* - : int = 102334155 *) time (fun () -> memo_fib 40);; (* Time: 0.00524520874023 ms *) (* - : int = 102334155 *)To make things better, we should write
fibin a way that unwinds the recursion.We explore that re-write first, then look at how to make it memoized:
In this example, we pass in a function that gets called before the usual recursive call (but this won’t have the memoisation yet):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20(* not the recursive version *) let fib_norec fib i = if i <= 1 then i else fib (i - 1) + fib (i - 2);; (* turning it back to an ordinary Fib function by adding in the recursive knot: *) let rec fib i = fib_norec fib i;; (* here's a polymorphic variant: *) let make_rec f_norec = let rec f x = f_norec f x in f;; (* NOTE: 1. the function f_norec passed in to make_rec is a function that isn’t recursive but takes as an argument a function that it will call. 2. so, make_rec essentially feeds f_norec to itself -- which makes it a true recursive function *) fib 20;;Now we try to make
make_recsuch that we can memoize when it ties the recursive knot. We’re using a reference here as a way to tie the recursive knot without usinglet rec(doesn’t work here)1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30(* our objective is to pass this function into a memoize function *) let fib_norec fib i = if i <= 1 then i else fib (i - 1) + fib (i - 2);; (* key point here is how we let bind the recursive function within *) let make_rec f_norec = let rec f x = f_norec f x in f;; (* val make_rec : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun> *) let memo_rec m f_norec x = let fref = ref (fun _ -> assert false) in let f = memoize m (fun x -> f_norec !fref x) in fref := f; f x;; (* memo_rec's signature is almost the same as make_rec, except for the accepting of a module param for the hashtable type. val memo_rec : 'a Hashtbl.Key.t -> (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun> *) (* now we can finally get a memoised version of fib *) let fib = memo_rec (module Int) fib_norec;; (* val fib : int -> int = <fun> *) time (fun () -> fib 40);; (* Time: 0.121355056763 ms - : int = 102334155 *)The code block above teaches us how to avoid the memory leak by defining an inner scope for memoized function calls and allowing the local scope to be the reason why garbage collection happens correctly and the memo-table doesn’t cause a memory leak.
In the code above, the memory behaviour is important for us to understand correctly:
only when
fibis called, then the final argument tomemo_rec(which isx, the param forfib) is presented and only then is thememoizefunction calledBecause the result of that call falls out of scope when
fibreturns, this is what makesmemo_recavoid a memory leak (since the memo table is garbage-collected after the computation completes)so, calling
memo_rec (module Int) fib_norecdoes not trigger the call tomemoizeyet until the last param is binded.in fact, we could have done a single line call:
1 2 3 4 5 6 7 8 9 10let memo_rec m f_norec x = let fref = ref (fun _ -> assert false) in let f = memoize m (fun x -> f_norec !fref x) in fref := f; f x;; let fib = memo_rec (module Int) ( fun fib i -> if i <= 1 then 1 else fib (i - 1) + fib (i - 2));; (* val fib : int -> int = <fun> *)
NOTE: org-babel won’t preserve the table and so we won’t see the outputs showing proof that memoization has happened.
Levenshtein edit distance between 2 strings:
similar to the
fibfunction, we need to memoize before the inner recursive call is made to the edit_distance function and to do that, we need it to take a pair of strings as a single argument (since ourmemo_recdefined earlier only works on single-argument functions). We also need to ensure that we can hash the pair of strings.we an use
ppx-janewhich has some meta-functions that can help us derive hash-functions and equality tests instead of writing them out ourselves.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35(* 1: using the deriver *) #require "ppx_jane";; module String_pair = struct type t = string * string [@@deriving sexp_of, hash, compare] end;; (* module String_pair : *) (* sig *) (* type t = string * string *) (* val sexp_of_t : t -> Sexp.t *) (* val hash_fold_t : Hash.state -> t -> Hash.state *) (* val hash : t -> int *) (* val compare : t -> t -> int *) (* end *) (* 2: now we can use memo_rec for edit_distance and memoise it properly *) let edit_distance = memo_rec (module String_pair) (fun edit_distance (s,t) -> match String.length s, String.length t with | (0,x) | (x,0) -> x | (len_s,len_t) -> (* 1 operation to drop suffix in either case *) let s' = String.drop_suffix s 1 in let t' = String.drop_suffix t 1 in let cost_to_drop_both = if Char.(=) s.[len_s - 1] t.[len_t - 1] then 0 else 1 in (* we shall get the smallest of the lot *) List.reduce_exn ~f:Int.min [ edit_distance (s',t ) + 1 ; edit_distance (s ,t') + 1 ; edit_distance (s',t') + cost_to_drop_both ]);; (* val edit_distance : String_pair.t -> int = <fun> *) time (fun () -> edit_distance ("OCaml 4.09","ocaml 4.09"));; (* Time: 0.964403152466 ms *)
Limitations of
let recLIMITATION: There are limits on what can appear on the RHS of a
let rec, such as not allowinglet rec x = x + 1to work. There are only 3 constructs that can show up on the RHS of alet rec:- a function definition
- a constructor
- the
lazykeyword
This is good and bad in the sense that our
memo_reccan’t be implemented usinglet recbut it also helps us avoid nonsensical cases for the compiler.NOTE, PL-DESIGN: Haskell is lazy and such compiler-restrictions don’t show up.
Also laziness is more constrained than explicit mutation, so in some cases might lead to code that is easier to reason with.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31(* we used let rec here *) let make_rec f_norec = let rec f x = f_norec f x in f;; (* we didn't use let rec here *) let memo_rec m f_norec x = let fref = ref (fun _ -> assert false) in let f = memoize m (fun x -> f_norec !fref x) in fref := f; f x;; (* memo_rec's signature is almost the same as make_rec, except for the accepting of a module param for the hashtable type.*) let lazy_memo_rec m f_norec x = let rec f = lazy (memoize m (fun x -> f_norec (force f) x)) in (force f) x;; (* generated type: val lazy_memo_rec : 'a Hashtbl.Key.t -> (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun> *) time (fun () -> lazy_memo_rec (module Int) fib_norec 40);; (* Time: 0.181913375854 ms *) (* - : int = 102334155 *) (* we can't try to force out a lazy version just to use let rec because a lazy value can't try to force itself as part of its own evaluation (without changing memo_rec into a lazy version) *) let rec x = lazy (force x + 1);; (* val x : int lazy_t = <lazy> *) force x;; (* Exception: Lazy.Undefined. *)
Input and Output
To elaborate on the “imperative” terminology, any function that doesn’t boil down to a deterministic transformation from its arguments to its return value is imperative in nature. That includes not only things that mutate your program’s data, but also operations that interact with the world outside of your program.
So IO such as:
- file IO (includes terminal IO)
- socket IO / network IO
Here we focus on OCaml’s buffered I/O library:
- only two types:
in_channel: for readingout_channel: for writing
- IO interfaces:
- core library deals with files and terminals
Unixmodule can be used to create other channels
Terminal I/O
The 3 channels (stdin, stdout, stderr) are available at the top level of Core’s namespace directly (we don’t need to go through In_channel and Out_channel modules).
There’s a chance that this code example is deprecated, I’m not going to investigate further on this.
| |
notes:
- since
out_channels are buffered, we need to flush it to get theOut_channel.output_stringto
Formatted Output with printf
printf in OCaml is special compared to C’s printf because it’s type-safe!
What kind of control can printf give?
- alignment and padding
- escape-strings
- formatting of numbers (decimal, hex, binary?)
- precision of float numbers
Functions similar to printf for other outputs:
eprintffor stderrfprintffor arbitraryout_channelsprintfthat returns a formatted string
Understanding format strings
Type-safe
printf: The compiler checks that the types referred to by the format string match the types of the rest of the args passed toprintfThis analysis of contents happens at compile-time \(\implies\) format string needs to be available as a string literal @ compile-time – it’s for this reason that the compiler complains if we pass in a string variable. We’d need to otherwise annotate that type so that a string literal is inferred as a format string
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16let fmt = "%i is an integer\n";; (* val fmt : string = "%i is an integer\n" *) printf fmt 3;; (* Line 1, characters 8-11: Error: This expression has type string but an expression was expected of type ('a -> 'b, Stdio.Out_channel.t, unit) format = ('a -> 'b, Stdio.Out_channel.t, unit, unit, unit, unit) format6 , *) (* if we type it correctly then it will work*) open CamlinternalFormatBasics;; let fmt : ('a, 'b, 'c) format = "%i is an integer\n";; printf fmt 3;;Here’s the time-conversion program updated using
printf, some notes:flushing the channel works via
%!format string when usingprintf:printf "The time in %s is %s.\n%!" (Time_unix.Zone.to_string zone) time_string
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28open Core let () = printf "Pick a timezone: %!"; match In_channel.input_line In_channel.stdin with | None -> failwith "No timezone provided" | Some zone_string -> let zone = Time_unix.Zone.find_exn zone_string in let time_string = Time.to_string_abs (Time.now ()) ~zone in printf "The time in %s is %s.\n%!" (Time_unix.Zone.to_string zone) time_string (* ----- older, more verbose syntax: open Core let () = Out_channel.output_string stdout "Pick a timezone: "; Out_channel.flush stdout; (* flushes the buffer to actually write to the channel*) match In_channel.(input_line stdin) with | None -> failwith "No timezone provided" (* In_channel.input_line returns an option -- None means end of input stream e.g. EOL *) | Some zone_string -> let zone = Time_unix.Zone.find_exn zone_string in let time_string = Time.to_string_abs (Time.now ()) ~zone in Out_channel.output_string stdout (String.concat ["The time in ";Time_unix.Zone.to_string zone;" is ";time_string;".\n"]); Out_channel.flush stdout (* flush it again to force the printing -- good habit, though not necessary since it's end of program*) *)
File I/O
The general pattern we realise for File IO is that we create the channel, then use the channel then we close the channel (file read vs write is the same thing, just the channels are swapped (e.g. file vs stdio)).
We need self-cleaning code here:
We need to be careful with how we handle the finite resource of File Descriptors, so exceptions shouldn’t mean that we don’t release the fd \(\implies\) no fd leak (add in a
finallystep)There’s better bookkeeping functions that are available to us
In_channelwith_filethat we should be using.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15(* we have ergonomic bookkeeping functions that we can use: *) let sum_file filename = In_channel.with_file filename ~f:(fun file -> let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in List.fold ~init:0 ~f:(+) numbers);; (* val sum_file : string -> int = <fun> *) (* the manual way would have been to add in a finally block for self-cleanup *) let sum_file filename = let file = In_channel.create filename in Exn.protect ~f:(fun () -> let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in List.fold ~init:0 ~f:(+) numbers) ~finally:(fun () -> In_channel.close file);; (* val sum_file : string -> int = <fun> *)we shouldn’t read the whole file into memory either and we should do line-by-line processing using
In_channel.fold_linesinstead ofIn_channel.input_lines(which reads the whole file)1 2 3 4 5 6 7 8 9 10 11 12 13 14 15(* line by line processing *) let sum_file filename = In_channel.with_file filename ~f:(fun file -> In_channel.fold_lines file ~init:0 ~f:(fun sum line -> sum + Int.of_string line));; (* val sum_file : string -> int = <fun> *) (* compared with reading the whole file into memory *) let sum_file filename = In_channel.with_file filename ~f:(fun file -> let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in List.fold ~init:0 ~f:(+) numbers);; (* val sum_file : string -> int = <fun> *)
There’s better patterns in the API docs for this. There’s also a guide to file manipulation here.
Order of Evaluation
OCaml, like other languages is strict \(\implies\)
- when we bind an identifier to result of some expression, the expression is evaluated prior to the binding
- call a function on a set of args, those args are evaluated prior to being passed to the function
When we code imperatively, the order of evaluation matters. Additionally, it also matters if we evaluate lazily or not:
| |
Why OCaml is strict:
lazy evaluation and imperative programming generally don’t mix well because laziness makes it harder to reason about when a given side effect is going to occur. Understanding the order of side effects is essential to reasoning about the behavior of an imperative program.
Strictness benefits:
- expressions bound by a sequence of
letbindings, evaluated in the order that they’re defined
Counter-intuitive order of evaluation
OCaml compiler’s order of evaluation can be a little counter intuitive – the sub-expression that is last here is evaluated first:
| |
Side-effects and Weak Polymorphism
Weakly polymorphic variable: variable that can be used with any single type (the _ in the type variable naming is what indicates that something is a weakly polymorphic typing). The compiler wishes to concretise this type ASAP.
| |
- Note that the type of remember was settled by the definition of
remember_three, even thoughremember_threewas never called!
The Value Restriction
This part is about knowing when we have simple types that allow the code to remain polymorphic vs when it’s weakly polymorphic (and hence the values must be restricted).
Value Restriction is the rule that is used to concretise the types for weakly polymorphic variables.
The idea is that initially it’s an unknown type which is stored in a persistent, mutable cell. “Simple values” are types from the kinds of expressions that don’t introduce persistent, mutable cells – and so can remain polymorphic:
- constants
- constructors that only contain other simple values
- function declarations
- let bindings where the sub-bindings are all simple values
The value restriction doesn’t require that there is no mutable state, only that there is no persistent mutable state that could share values between uses of the same function.
| |
Partial Application and the Value Restriction
In most cases, the value restriction is a good thing because the value in question can only be safely be used with a single type.
Partial application is the exception because partial application is NOT a simple value, so functions created by partial application are sometimes less general than we would expect.
the solution to avoiding this inferring of a weakly polymorphic type is to do eta expansion – general approach for resolving problems that arise from the value restriction.
| |
[1]: this is inferred as a weakly polymorphic type for the resulting function because there’s nothing that guarantees that
List.initisn’t creating a persistent ref somewhere inside of it that would be shared across multiple call tolist_init_10.[2]: we do eta expansion to avoid the partial application and avoid the weakly polymorphic type inference.
Relaxing the Value Restriction
Value Restriction is just a syntactic check. We can do a few operations that count as simple values and anything that is a simple value can be generalised (polymorphic).
There’s a relaxed version of value-restriction that lets us use type info to allow polymorphic types for things that are not simple values
a function application (which may be inferred as weakly polymorphic) can be strongly polymorphic if the value is an immutable value
1 2 3 4 5 6 7 8 9 10 11 12(* function application -- inferred as weak *) identity (fun x -> [x;x]);; (* - : '_weak4 -> '_weak4 list = <fun> *) (* immutable hence can be strongly polymorphic *) identity [];; (* - : 'a list = [] *) [||];; (* mutable return value*) (* - : 'a array = [||] *) identity [||];; (* --> gets infered as weakly polymorphic*) (* - : '_weak5 array = [||] *)abstract values need explicit guarantees to be inferred as strongly polymorphic:
an abstract data type needs to be annotated in a way that it guarantees in the interface that the data structure doesn’t contain any persistent references to values of type
'aand only then will OCaml infer polymorphic types for abstract values.In the example,
Concat_list.tis immutable but without the guarantee, OCaml treats it as if it were mutable.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81module Concat_list : sig type 'a t val empty : 'a t val singleton : 'a -> 'a t val concat : 'a t -> 'a t -> 'a t (* constant time *) val to_list : 'a t -> 'a list (* linear time *) end = struct type 'a t = Empty | Singleton of 'a | Concat of 'a t * 'a t let empty = Empty let singleton x = Singleton x let concat x y = Concat (x,y) let rec to_list_with_tail t tail = match t with | Empty -> tail | Singleton x -> x :: tail | Concat (x,y) -> to_list_with_tail x (to_list_with_tail y tail) let to_list t = to_list_with_tail t [] end;; (* module Concat_list : sig type 'a t val empty : 'a t val singleton : 'a -> 'a t val concat : 'a t -> 'a t -> 'a t val to_list : 'a t -> 'a list end *) Concat_list.empty;; (* - : 'a Concat_list.t = <abstr> *) (*<--- it's abstract type but no guarantee*) identity Concat_list.empty;; (*<---- gives weak polymorphism*) (* - : '_weak6 Concat_list.t = <abstr> *) (* ============================================================== *) module Concat_list : sig type +'a t val empty : 'a t val singleton : 'a -> 'a t val concat : 'a t -> 'a t -> 'a t (* constant time *) val to_list : 'a t -> 'a list (* linear time *) end = struct type 'a t = Empty | Singleton of 'a | Concat of 'a t * 'a t let empty = Empty let singleton x = Singleton x let concat x y = Concat (x,y) let rec to_list_with_tail t tail = match t with | Empty -> tail | Singleton x -> x :: tail | Concat (x,y) -> to_list_with_tail x (to_list_with_tail y tail) let to_list t = to_list_with_tail t [] end;; (* module Concat_list : sig type +'a t val empty : 'a t val singleton : 'a -> 'a t val concat : 'a t -> 'a t -> 'a t val to_list : 'a t -> 'a list end *) (* now, it's strongly polymorphic: *) identity Concat_list.empty;; (* - : 'a Concat_list.t = <abstr> *)
Chapter 9: GADTs
Generalized Algebraic Data Types (GADTs) are an extension to variants that we’ve seen so far:
Benefits:
- more expressive
- can create types that match the shape of the intended program more precisely
- code can be safer, more concise and more efficient
Unique Features / Mechanisms:
- let the compiler learn more type info when we descend into a case of a pattern match
- make it easy to use existential types – working with data of a specific but unknown type
Costs:
- distinct cost (TODO see below)
- harder to use, less intuitive than ordinary variants
- can be confusing to figure out how to use them effectively \(\implies\) should only be used when it makes a big qualitative improvement to your design
We learn from some examples for this chapter.
A Little Language
A good use is to have typed expression languages (like Blang). This is going to be our DSL.
WE wish to create a language here that mixes arithmetic and boolean expressions – we have to deal with the possibility of ill-typed expressions (e.g. expressions that adds a bool and an int). We need type safety to ensure that the compiler will throw errors if such expressions are present. A non-type-safe system would otherwise require us to do dynamic type checking on the type of the input params of the function.
We attempt to use an ordinary variant first:
| |
Here, it’s because of the dynamic type checking in our logic did our type error get thrown here. We wish our language to be type safe
Making the Language Type-Safe
We’re still trying to make things work with ordinary variants.
A type-safe version of this API maybe include:
- ability for the expressions to have a type param to distinguish integer and bool expressions
Signature:
| |
Implementation:
| |
Notes:
the main trick here is the phantom-type:
type 'a t = exprSO the
'a(type parameter) is the phantom type because it doesn’t show up in the body definition oft\(\implies\) type param is unused \(\implies\) free to take on any value.by the way, the body definition refers to the RHS of the
=in the type definition, so sinceexprdoesn’t use'a, internally'a tis just implemented as the typeexpr. So the'adoesn’t really do much there and is a phantom. This is also called a manifest type definition or aliasing.More about the use of “phantom”. Our purpose is to encode extra compile-time type information without affecting runtime representation.
there are two eval functions here, if we had used GADTs, we could have had a single eval expression.
the function signature differing between the two evals is what allows the compiler to catch the ill-typing (along with the phantom type)
this still has type checking being done dynamically within the logic though.
this is type-safe because the type-system is what would reject the ill-typed expression. It’s not complete though because in some cases we still can get around it and that’s why the dynamic type checking is still needed in the eval functions:
1 2 3 4let expr = Typesafe_lang.(eq (bool true) (bool false));; (* val expr : bool Typesafe_lang.t = <abstr> *) Typesafe_lang.bool_eval expr;; (* Exception: Ill_typed. *)the types within the implementation don’t necessarily rule out ill-typed expressions. The same fact explains why we needed two different eval functions: the implementation of eval doesn’t have any type-level guarantee of when it’s handling a bool expression versus an int expression, so it can’t safely give results where the type of the result varies based on the result of the expression.
Trying to do Better with Ordinary Variants
We persist and try to encode the typing rules within the DSL directly for the expr and value types.
| |
implementations make it seem like they work as intended:
| |
There are some problems to this implementation:
- the inner and outer types are always going to be the same, we can’t inter-op with types as we need.
1 2 3 4 5 6 7If (Eq (i 3, i 4), i 0, i 1);; (* Line 1, characters 9-12: Error: This expression has type int expr but an expression was expected of type bool expr Type int is not compatible with type bool *)
Key Problem
The way we want to use the type param isn’t supported by ordinary variants – we want the type param to be populated in different ways in the different tags and to depend in non-trivial ways on the types of the data associated with each tag.
that’s what GADTs are for.
GADTs to the Rescue
Syntax
Here’s the GADT type signature.
1 2 3 4 5 6 7 8 9type _ value = | Int : int -> int value | Bool : bool -> bool value type _ expr = | Value : 'a value -> 'a expr | Eq : int expr * int expr -> bool expr | Plus : int expr * int expr -> int expr | If : bool expr * 'a expr * 'a expr -> 'a exprwhen we see the type declaration, the use of the
_is syntactic sugar for “This type has one parameter, but I’m not going to name it because I don’t need to.” So for type declaration, it’s like saying “I’m ignoring this type name”it’s an interesting parallel to the use of
_in the context of pattern matching where_means “I’m ignoring this value name”.the colon to the right of the tags tells us that it’s a GADT
consider the definition of a tag:
<tag name> : <function signature>.GOTCHA: when we say things like “type parameter can be determined by the tag”, it’s NOT JUST the tag name that we are referring to, it’s the annotation typically (i.e. is it algebraic or concrete?)
- LHS: the tag name
- RHS of
:can be seen as a single-arg function type \(\implies\) it’s like the type constructor.LHS of
->:- types of the args of the constructor
RHS of
->:- shows the types of the args of the constructed value
- is an instance of the type of the overall GADT
The type parameter CAN depend on BOTH the tag and the type of arguments.
What this really means is “can I tell the type param by looking at the tag?”. This includes the tag name as well as the annotations for that tag. Annotations may imply that the types are concrete OR have some usage of algebraic types. If they’re concrete then we can just know the type based on the tag. Else, we have to resolve what the algebraic type would actually be.
Eqis an example where the type param only depends on the tag. That’s why we already have the type concretized tointandboolEq : int expr * int expr -> bool exprIfis an example where the type param depends on the args to the tag: i.e. the type param of theIfis the type param of thethenandelseclauses, that’s why we’re using the'aalgebraic type.If : bool expr * 'a expr * 'a expr -> 'a expr.
Observations
Given the GADT above, we have the following code that gives some observations:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42(* examples of type safety @ construction *) (*============================================= *) let i x = Value (Int x) and b x = Value (Bool x) and (+:) x y = Plus (x,y);; (* val i : int -> int expr = <fun> val b : bool -> bool expr = <fun> val ( +: ) : int expr -> int expr -> int expr = <fun> *) i 3;; (* - : int expr = Value (Int 3) *) b 3;; (* Line 1, characters 3-4: Error: This expression has type int but an expression was expected of type bool *) i 3 +: i 6;; (* - : int expr = Plus (Value (Int 3), Value (Int 6)) *) i 3 +: b false;; (* Line 1, characters 8-15: Error: This expression has type bool expr but an expression was expected of type int expr Type bool is not compatible with type int *) (* GADT-based evaluator function that doesn't need any dynamic type-safety checks: *) (* ========================================== *) let eval_value : type a. a value -> a = function | Int x -> x | Bool x -> x;; (* NOTE: the value generated for this function matches what our type annotation is!: val eval_value : 'a value -> 'a = <fun> *) let rec eval : type a. a expr -> a = function | Value v -> eval_value v | If (c, t, e) -> if eval c then eval t else eval e | Eq (x, y) -> eval x = eval y | Plus (x, y) -> eval x + eval y;; (* val eval : 'a expr -> 'a = <fun> *)we see that we’ve enforced type safety. The rules for the type safety are directly encoded in the definition of the expression type. We have a single
In our attempt without GADTs,we had to do the type-safety enforcement via signature-level restrictions on phantom types. In particular we had to alias the
'a t(abstract type) toexpr(concrete type defined prior).Showing expression type vs signature-based :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27type _ expr = | Value : 'a value -> 'a expr | Eq : int expr * int expr -> bool expr | Plus : int expr * int expr -> int expr | If : bool expr * 'a expr * 'a expr -> 'a expr (* -- previously, we enforced this type safety with signature-level restrictions on phantom types *) module Typesafe_lang : Typesafe_lang_sig = struct (* phantom type *) type 'a t = expr (*links the abstract type to the concrete expressions -- this is aliasing of the phantom type*) let int x = Value (Int x) let bool x = Value (Bool x) let if_ c t e = If (c, t, e) let eq x y = Eq (x, y) let plus x y = Plus (x, y) let int_eval expr = match eval expr with | Int x -> x | Bool _ -> raise Ill_typed let bool_eval expr = match eval expr with | Bool x -> x | Int _ -> raise Ill_typed endWith GADT, we have a single polymorphic
evalfunction. Previously, we needed to have two concretely typed specific evaluators when using phantom types.Downside: when using GADTs, the code using them needs extra type annotations:
1 2 3 4 5 6 7 8 9 10let eval_value : type a. a value -> a = function | Int x -> x | Bool x -> x;; (* without annotations, we'd have gotten something like the following error: Error: This pattern matches values of type bool value but a pattern was expected which matches values of type int value Type bool is not compatible with type int *)
GADTs, Locally Abstract Types, and Polymorphic Recursion
We’ve seen the correct type-annotated implementation for eval already.
Carrying on from before, we explore the need to have specific syntax for typing when our function is a polymorphic (GADT with abstract types) AND also recursive.
Problem
evalis both polymorphic and recursive which makes this complex.polymorphic:
The reason we need to do that is because OCaml, by default, is not willing to instantiate the ordinary type variables in DIFFERENT ways (as int vs as bool) within the body of the same function but that’s what we want here. Without the specific syntax, there’s some type restriction that happens and it eventually causes the type mismatch that results in the error we see in the preceding code block.
recursive:
recursive code that is also polymorphic may be prone to typing errors because the compiler may be forced to unify
a(the concrete version of the abstract type'a) across all recursive calls IF there’s a leakage of type from inner context to a more outer context. Could be in the form of an entire concretization of the abstract type or could be just a type restriction applied.
Longer Elaboration:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63### Understanding "Escaping of Scope" in the Context of GADTs and Locally Abstract Types Your notes discuss a complex but important concept in OCaml's type system involving GADTs, polymorphic recursion, and locally abstract types. *** #### What does "escaping of scope" mean? - A **type variable (like `'a`) "escapes its scope"** when it is introduced or bound in a *local* context (like inside a pattern match or a locally abstract type annotation) but then tries to leak into a broader context where it was not intended to be visible. - In OCaml's type-checking, especially with GADTs and polymorphic recursion, **type variables introduced locally in pattern matching or annotations are allowed to be specialized within that local scope only.** They represent *fresh*, *locally abstract* types. - If, during type-checking, these variables would have to be used or unified with types outside of their defining scope, the compiler reports a scope escape error or a type error, because it breaks the correctness guarantees of the type system. *** #### How does this relate to your `eval` function example? - Your function `eval` is recursive and polymorphic: ```ocaml let rec eval (type a) (e : a expr) : a = ... ``` Here, `(type a)` introduces a *locally abstract type variable* `a` scoped to the function body. - When recursive calls to `eval` occur, the compiler attempts to unify or merge those recursive calls’ types. - Without explicit polymorphic annotation (`let rec eval : 'a. 'a expr -> 'a = ...`), the locally abstract type `a` **would leak (escape) from inside the recursive call's scope into the outer function's scope**, forcing the compiler to unify `a` across all recursive calls. - This escape causes the compiler to expect that `a` always equals a specific type (e.g., `bool`), leading to failures when other branches pass expressions with different types like `int`. - The fix, as your notes outline, is to give `eval` a **polymorphic (for all `a`) type signature** explicitly to ensure that `a` doesn’t escape — each recursive call is allowed a fresh, independent instantiation of `a`. *** #### Intuition summary: | Phrase | Meaning | |------------------------|---------------------------------------------------------------------------------------------| | *Locally abstract type* | A type variable introduced locally (inside a function, pattern match, or annotation). | | *Escaping its scope* | When such a type variable would have to be valid outside its defining local context. | | *Why a problem?* | Violates assumptions that the type variable is fresh and isolated; type safety breaks down. | | *How to fix?* | Use explicit polymorphic recursion syntax so each recursive call gets a fresh type instance. | *** #### Your notes align strongly with formal explanations found in these references: - [OCaml Manual Chapter on GADTs (Section on Existential and Locally Abstract Types)](https://ocaml.org/manual/gadts.html) - [Real World OCaml — GADTs chapter](https://dev.realworldocaml.org/gadts.html) - Thomas Leonard's article on OCaml’s locally abstract types and polymorphic recursion *** If you want, I can explain with a detailed example showing the scope of the locally abstract type and how polymorphic recursion fixes the escape problem, so let me know! [1](https://ocaml.org/manual/5.2/gadts-tutorial.html) [2](https://dev.realworldocaml.org/gadts.html) [3](https://discuss.ocaml.org/t/strange-grammar-in-the-real-world-ocaml-book-chapter-gadts/10093) [4](https://stackoverflow.com/questions/27864200/an-concrete-simple-example-to-demonstrate-gadt-in-ocaml) [5](https://www.math.nagoya-u.ac.jp/~garrigue/papers/ml2011.pdf) [6](https://www.cl.cam.ac.uk/teaching/1617/L28/gadts.pdf) [7](http://pauillac.inria.fr/~remy/gadts/) [8](https://www.cl.cam.ac.uk/teaching/1415/L28/gadts.pdf)
Locally abstract types as a solution
This type-restriction is a problem that we desire to “fix” and we do that by adding a locally abstract type that doesn’t have that restriction.
We might try this annotation, but it will fail because
evalis recursive in nature and inference of GADTs doesn’t play well with recursive calls.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19let eval_value (type a) (v : a value) : a = match v with | Int x -> x | Bool x -> x;; (* val eval_value : 'a value -> 'a = <fun> *) let rec eval (type a) (e : a expr) : a = match e with | Value v -> eval_value v | If (c, t, e) -> if eval c then eval t else eval e | Eq (x, y) -> eval x = eval y | Plus (x, y) -> eval x + eval y;; (* Line 4, characters 43-44: Error: This expression has type a expr but an expression was expected of type bool expr Type a is not compatible with type bool *)The main problem is that
evalis recursive so the type-checker is trying to merge the locally abstract typeainto the type of the re cursive functionevaland the way thatais escaping its scope (inner to outer) is when merging it into the outer scope within whichevalis defined, that’s why it’s expecting typebool expr.FIX: explicitly marking
evalas polymorphic and that’s where the type annotation syntax comes from:let rec eval : 'a. 'a expr -> 'a A = .... IFevalis marked as polymorphic, then the type ofevalis not specialised toasoadoesn’t escape its scope.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17(* locally abstract syntax + marked as polymorphic *) let rec eval : 'a. 'a expr -> 'a = fun (type a) (x : a expr) -> (* <-- this is an anon function definition, taking an explicit argument with explicit type annotations*) match x with | Value v -> eval_value v | If (c, t, e) -> if eval c then eval t else eval e | Eq (x, y) -> eval x = eval y | Plus (x, y) -> eval x + eval y;; (* val eval : 'a expr -> 'a = <fun> *) (* without the syntax (code won't work): *) let rec eval (type a) (e : a expr) : a = match e with | Value v -> eval_value v | If (c, t, e) -> if eval c then eval t else eval e | Eq (x, y) -> eval x = eval y | Plus (x, y) -> eval x + eval y;;type variable escaping its scope
when we have leakage of the concretization of an abstract type from inner to outer context (in the case of a recursive function call, it’s an outer to inner function call and back again the other way).
By marking
evalas polymorphic, then the type ofevalis not specialised toasoadoesn’t escape its scope.
evalbeing an example of polymorphic recursionAnother reason the compiler can’t infer the types automatically.
for example, with
If, since theIfitself must be of typebool, but the type of the then and else clauses could be of typeint. This means that when evaluatingIf, we’ll dispatchevalat a different type than it was called on.evalneeds to see itself as polymorphic
using the syntactic sugar
For Recursive functions that use GADTs, our type annotation should be able to do the following for it to feel good:
We can combine the two things:
- polymorphism annotation and
- the creation of the locally abstract types.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17(* this uses the syntactic sugar *) let rec eval : type a. a expr -> a = function | Value v -> eval_value v | If (c, t, e) -> if eval c then eval t else eval e | Eq (x, y) -> eval x = eval y | Plus (x, y) -> eval x + eval y;; (* val eval : 'a expr -> 'a = <fun> *) (* the more verbose (and correct) way earlier was: *) let rec eval : 'a. 'a expr -> 'a = fun (type a) (x : a expr) -> match x with | Value v -> eval_value v | If (c, t, e) -> if eval c then eval t else eval e | Eq (x, y) -> eval x = eval y | Plus (x, y) -> eval x + eval y;; (* val eval : 'a expr -> 'a = <fun> *)The syntax
: type a.(withfunctionafterward) tells the compiler:“Here I am explicitly introducing a locally abstract type
awhich will be instantiated freshly in each match branch or recursive call.”- the concretisation will be local to the function body and recursive calls, thereby preventing the type inference leakage
This abstracts
'aas a local type variable (a), allowing its different instantiations during recursion or pattern matching.
When are GADTs useful?
Use case: Varying your Return type
We are familiar with cases where return types vary based on the input type but usually these involve simple dependencies between types that correspond to the data flow within our code. List.find is a good example.
We might want more flexibility, e.g. for List.find on not found cases, we want to make it such that we can choose what returns.
A typical variant vs a GADT:
| |
Using the normal variant almost seems like what we want (to control the return type of the function based on our inputs in a non-trivial manner). The problem is that we don’t exactly achieve that because our output is always wrapped in an option:
| |
Here, our GDAT will work here. In our GDAT, we have two type params:
- one for the type of the list element, and
- one for the return type of the function.
These type params will be locally abstract in their definition.
Now, when we implement the flexible_find function, we get the behaviour we really want:
| |
Use Case: Capturing the Unknown
We’ve seen unknown types being used e.g. in the case of tuples. The abstract type variables may be universally quantified i.e. applies for all types that are provided.
For some cases, we might wish to have a existentially quantified type variables instead. This means the variables are a particular but unknown type. GADTs can help offer a natural way to do this.
| |
we know that 'a is existentially quantified because it is on the LHS of the arrow but not the RHS \(\implies\) the 'a that appears internally, doesn’t appear in the type param for stringable itself.
this makes 'a (existentially quantified type) bound within the definition of stringable. The type of the underlying object is existentially bound within type stringable, any function that tries to return such a value won’t type-check:
| |
In the error message, the type variable $Stringable_'a has 3 parts:
- the
$marks the variable as existential - the
Stringableis the name of the GADT tag this variable came from - the
'ais the name of the type variable from inside that tag.
Use case: Abstracting Computational Machines
GADTs are useful for writing combinators. The combinator pattern:
- allow the combination of small components into larger computational machines.
- are component-combining functions
Consider a pipeline pattern, a sequence of steps where each consumes the output of the previous step, possibly doing some side-effects and returning a value for the next step. We’re thinking of a custom pipeline, beyond the pipeline operator:
profiling the ability to profile each step of the pipeline and report it
control over execution
allowing users to pause the execution and restart at a later time
Custom error handling
pipeline knows where it failed and offers the possibility of restarting
Without Using GADT: concrete combinators
We can define, build and execute pipelines without GADTs by manually adding in functions that define the empty case (just to seed) and how we can build onto and existing pipeline.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41(* signature of Pipeline *) module type Pipeline = sig type ('input,'output) t (* type ('a, 'b) t means input of type 'a output of type 'b *) (* for adding a step to the pipeline *) val ( @> ) : ('a -> 'b) -> ('b,'c) t -> ('a,'c) t (* for seeding the pipeline *) val empty : ('a,'a) t end module Example_pipeline (Pipeline : Pipeline) = struct open Pipeline let sum_file_sizes = (fun () -> Sys_unix.ls_dir ".") @> List.filter ~f:Sys_unix.is_file_exn @> List.map ~f:(fun file_name -> (Core_unix.lstat file_name).st_size) @> List.sum (module Int) ~f:Int64.to_int_exn @> empty end;; (*module Example_pipeline : functor (Pipeline : Pipeline) -> sig val sum_file_sizes : (unit, int) Pipeline.t end *) (* ---- basic pipeline implementation: pipeline can be defined as a simple function and we use the @> to compose. Pipeline execution is just function application. *) module Basic_pipeline : sig include Pipeline val exec : ('a,'b) t -> 'a -> 'b end= struct type ('input, 'output) t = 'input -> 'output let empty = Fn.id let ( @> ) f t input = t (f input) let exec t input = t input endThe
Basic_pipelinedoesn’t do any better than the|>operator though.To get better functionality, we have 2 choices:
we improve on the pipeline type and give it extra structures, handle exceptions and all the fun stuff
problem: we have to pre-commit to whatever services we’re supporting and we have to embed them in our pipeline representation
use GADTs to abstractly represent the pipeline we want and then build the functionality we want on top of that
Using GADT
Instead of concretely building a machine for executing a pipeline, we can use GADTs to abstractly represent the pipeline we want, and then build the functionality we want on top of that representation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42type (_, _) pipeline = | Step : ('a -> 'b) * ('b, 'c) pipeline -> ('a, 'c) pipeline (* corresponds to the @> operator *) | Empty : ('a, 'a) pipeline (* corresponds to the empty pipeline *) (*=== 1: basic functionality *) (* pipeline seed + extend functionality: *) let ( @> ) f pipeline = Step (f,pipeline);; (* val ( @> ) : ('a -> 'b) -> ('b, 'c) pipeline -> ('a, 'c) pipeline = <fun> *) let empty = Empty;; (* val empty : ('a, 'a) pipeline = Empty *) (* pipeline exec functionalitlly -- polymorphic recursion:*) let rec exec : type a b. (a, b) pipeline -> a -> b = fun pipeline input -> match pipeline with | Empty -> input | Step (f, tail) -> exec tail (f input);; (* val exec : ('a, 'b) pipeline -> 'a -> 'b = <fun> *) (*==== 2: extra functionality example: *) let exec_with_profile pipeline input = let rec loop : type a b. (a, b) pipeline -> a -> Time_ns.Span.t list -> b * Time_ns.Span.t list = fun pipeline input rev_profile -> match pipeline with | Empty -> input, rev_profile | Step (f, tail) -> let start = Time_ns.now () in let output = f input in let elapsed = Time_ns.diff (Time_ns.now ()) start in loop tail output (elapsed :: rev_profile) in let output, rev_profile = loop pipeline input [] in output, List.rev rev_profile;; (* val exec_with_profile : ('a, 'b) pipeline -> 'a -> 'b * Time_ns.Span.t list = <fun> *)When we extend functionality, the typing doesn’t need to change. It’s the logic in our function that uses GADTs that can change to update functionality.
Benefits of using GADT over having combinators that build a concrete computational machine:
- simpler core types that are typically built from GADT tags taht are reflections of the types of the base combinators
- more modular design, core types don’t need to contemplate on every possible use you wanna make of them
- more efficient code, because the more concrete approach would allocate closures to wrap up the necessary functionality and closures are more heavyweight than GADT tags.
Use Case: Narrowing the Possibilities (set of possible states)
We can narrow the set of possible states for a given data-type in different circumstances. Useful for managing complex application state.
Here’s how logon requests may be managed in a non-GDAT approach:
| |
Tracking the (application) state at the type level
Instead, we can deal with complexity of cases by tracking the state of a given request at the type level and using that to narrow the set of states a given request can be in which helps us remove extra case-analysis and error-handling and reduce the complexity of the code as well as remove opportunities for mistakes.
At the type level, we could:
mint different types for different states of the request (e.g. incomplete request, mandatory request…)
this becomes verbose
use GADTs to track the state of the request within a type parameter and use the parameter to narrow the set of available cases without duplicating the type.
A completion-sensitive option type
We can have distinct types that we use as markers of different states. Using the completion states, we can do a completeness-sensitive option type via a GADT where:
- one type variable is the type of contents of the option
- the second indicates whether it is at an incomplete state
1 2 3 4 5 6type incomplete = Incomplete type complete = Complete type (_, _) coption = | Absent : (_, incomplete) coption | Present : 'a -> ('a, _) coptionWe didn’t use
completehere explicitly, only have usedincomplete. This is so that only anincomplete coptioncan beAbsentand only acomplete coptioncan bePresent.The examples will show the narrowing:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34let get ~default o = match o with | Present x -> x | Absent -> default;; (* val get : default:'a -> ('a, incomplete) coption -> 'a = <fun> *) (*^ here, the incomplete type was inferred here (based on the type def of coption*) (* [1] this wont compile (if we force the annotation as complete instead of incomplete): (* let get ~default (o : (_,complete) coption) = match o with | Absent -> default | Present x -> x;; Line 3, characters 7-13: Error: This pattern matches values of type ('a, incomplete) coption but a pattern was expected which matches values of type ('a, complete) coption Type incomplete is not compatible with type complete , *) *) (* [2] we can force it to compile by removing the absent branch from the function *) let get (o : (_,complete) coption) = match o with | Present x -> x;; (* val get : ('a, complete) coption -> 'a = <fun> *) (* or, more simply written as: *) let get (Present x : (_,complete) coption) = x;; (* let get (Present x : (_,complete) coption) = x;; *)when the coption is known to be complete, the pattern matching is narrowed to just the
Presentcase.By doing the type annotation on the GADT cases, we restrict what that function can be used for and that’s how we narrowed the type down.
A completion-sensitive request type
Similar idea but instead of using option-wrapping we’re looking into request wrapping. The general usage is for error-aware code but using GADTs for various cases.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38(* completion-sensitive version of logon_request record: *) type 'c logon_request = (*NOTE: the 'c is the type param that marks completeness state. If the outer request is marked as complete, it covers the inner fields' completeness as well.*) { user_name : User_name.t ; user_id : (User_id.t, 'c) coption ; permissions : (Permissions.t, 'c) coption } (* easy to fill in values for the fields (it's using the with-update syntax below:) *) let set_user_id request x = { request with user_id = Present x };; (* val set_user_id : 'a logon_request -> User_id.t -> 'a logon_request = <fun> *) let set_permissions request x = { request with permissions = Present x };; (* val set_permissions : 'a logon_request -> Permissions.t -> 'a logon_request = *) (* <fun> *) (* === NOTE: we should have a explicit completeness check, which constructs a version of the record with the completed fields filled in *) (* --- this is a polymorphic completeness check: includes both complete and incomplete return types *) let check_completeness request = match request.user_id, request.permissions with | Absent, _ | _, Absent -> None | (Present _ as user_id), (Present _ as permissions) -> Some { request with user_id; permissions };; (* val check_completeness : incomplete logon_request -> 'a logon_request option = <fun> *) (* --- this is a restricted return type (return value explicitly returns a complete request):*) let check_completeness request : complete logon_request option = match request.user_id, request.permissions with | Absent, _ | _, Absent -> None | (Present _ as user_id), (Present _ as permissions) -> Some { request with user_id; permissions };; (* val check_completeness : incomplete logon_request -> complete logon_request option = <fun> *)And because of that, we can finally write an authorisation checker:
1 2 3 4let authorized (request : complete logon_request) = let { user_id = Present user_id; permissions = Present permissions; _ } = request in Permissions.check permissions user_id;; (* val authorized : complete logon_request -> bool = <fun> *)We realise that most of the time, this kind of type narrowing is NOT worth the complexity of setting it up unless our FSM is sufficiently complex – in which case cutting down the possibilities that our code needs to handle / contemplate can make a big difference to the comprehensibility and correctness of the result.
type distinctness and abstraction
Our completeness types just needed them to have different states marked. We didn’t need to define obviously different types to them. This is because OCaml’s variant types are nominal.
Variant, Record and Abstract types are nominally typed in OCaml.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21(* older version: *) type incomplete = Incomplete (* Incomplete is a tag for the variant incomplete *) type complete = Complete (* this works *) type incomplete = Z (* Z is a tag of a variant type *) type complete = Z (* here, Z is the constructor for the variant type *) let i = (Z : incomplete) and c = (Z : complete);; (* we can narrow a pattern match using types as indices: *) type ('a, _) coption = | Absent : (_, incomplete) coption | Present : 'a -> ('a, _) coption (* since we mark complete case only, we only need to contemplate the Present case *) let assume_complete (coption : (_,complete) coption) = match coption with | Present x -> x;; (* val assume_complete : ('a, complete) coption -> 'a = <fun> *)In some ways this type-specific contemplating makes me remember the way we overload names (with different patterns within the parameters) to handle different cases in different functions.
Possible Type Distinctness problem:
the way we expose these types through an interface can cause OCaml to lose track of the distinctness of the types in question. When creating types to act as abstract markers for the type parameter of a GADT, you should choose definitions that make the distinctness of those types clear, and you should expose those definitions in your
mlis.Distinctness problems that give rise to non-exhaustiveness:
In the first example below, we hide the implementation of the marker type so there’s a distinctness problem.
second case, we see that the exposed types are NOT definitively different (they would have been if we defined them as variants with differently named tags [like originally])
this is especially so because types appearing differently in an interface (the sig) may turn out to be the same in the implementation (see 3rd attempt)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62(* ===== 1: the definition of complete and incomplete is hidden *) module M : sig type incomplete (* -- this is what hides the definition*) type complete end = struct type incomplete = Z type complete = Z end include M type ('a, _) coption = | Absent : (_, incomplete) coption | Present : 'a -> ('a, _) coption (*this is no longer exhaustive and it expectes teh Absent case to be matched*) let assume_complete (coption : (_,complete) coption) = match coption with | Present x -> x;; (* Lines 2-3, characters 5-21: Warning 8 [partial-match]: this pattern-matching is not exhaustive. Here is an example of a case that is not matched: Absent val assume_complete : ('a, complete) coption -> 'a = <fun> *) (* =============== Attempt 2: exposing the implementation of the marker types ============ *) module M : sig type incomplete = Z (* -- exposed the definition of the marker types (the variant tag) *) type complete = Z end = struct type incomplete = Z type complete = Z end include M type ('a, _) coption = | Absent : (_, incomplete) coption | Present : 'a -> ('a, _) coption (* Still not exhaustive: *) let assume_complete (coption : (_,complete) coption) = match coption with | Present x -> x;; (* Lines 2-3, characters 5-21: Warning 8 [partial-match]: this pattern-matching is not exhaustive. Here is an example of a case that is not matched: Absent val assume_complete : ('a, complete) coption -> 'a = <fun> *) (*===== 3: types appearing as distinct in the interface (the sig) may have the same implementation: *) module M : sig (*-- sig is the interface*) type incomplete = Z type complete = Z end = struct type incomplete = Z type complete = incomplete = Z (* same implementation *) end
narrowing without GADTs (using uninhabitable types instead)
OCaml can eliminate impossible cases from ordinary variants too, not just for GADTs.
The idea is to (just like GADTs) demonstrate that the case in question is impossible at the type level. We need to use an uninhabited type (i.e. type with no associated values). This can be created using a variant with no tags.
Baseoffers aNotion.tas a standard uninhabited type.PL_DESIGN: This allows us to have refutation cases:
the period in the final case tells the compiler that we believe this case can never be reached, and OCaml will verify that it’s true. In some simple cases, however, the compiler can automatically add the refutation case for you, so you don’t need to write it out explicitly.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19(* custom uninhabited type *) type nothing = | (*nothing is a variant with no tags*) open Stdio;; (* here, the Error case contains an uninhabited type (Nothing.t) , so we can refute that case in the branches. *) let print_result (x : (int, Nothing.t) Result.t) = match x with | Ok x -> printf "%d\n" x | Error _ -> .;; (* val print_result : (int, Nothing.t) result -> unit = <fun *) (* actually this function is pretty simple, the compiler can add the refutation case automatically for us *) let print_result (x : (int, Nothing.t) Result.t) = match x with | Ok x -> printf "%d\n" x;; (* val print_result : (int, Nothing.t) result -> unit = <fun> *)Use-case for uninhabitable types
When using a highly configurable library that supports multiple different modes of use and we don’t need to use all of them for a given application. (e.g.
Async’s RPC lib)In this example, we see narrowing being applied to code that isn’t designed with narrowing in mind.
AsyncRPCs have aState_rpcflavor of interaction, which is parameterised via 4 types for diff kinds of data:- query: initial client request
- state: initial snapshot from server
- update: sequence of updates to that snapshot
- error: error for stream termination
Imagine we want to use
State_rpcwithout needing to terminate the stream with a custom error. We could consider 2 options:so we can instantiate
State_rpcusing theunittype for the error type. \(\implies\) we’d still need to handle the error case for the dispatching code1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18open Core open Async let rpc = Rpc.State_rpc.create ~name:"int-map" ~version:1 ~bin_query:[%bin_type_class: unit] ~bin_state:[%bin_type_class: int Map.M(String).t] ~bin_update:[%bin_type_class: int Map.M(String).t] ~bin_error:[%bin_type_class: unit] (*-- using unit type here for error*) () let dispatch conn = match%bind Rpc.State_rpc.dispatch rpc conn () >>| ok_exn with | Ok (initial_state, updates, _) -> handle_state_changes initial_state updates | Error () -> failwith "this is not supposed to happen";; (* val dispatch : Rpc.Connection.t -> unit Deferred.t = <fun> *)using an uninhabited type for the error \(\implies\) narrowed down the cases and the use of the
errortype is just banned.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15let rpc = Rpc.State_rpc.create ~name:"foo" ~version:1 ~bin_query:[%bin_type_class: unit] ~bin_state:[%bin_type_class: int Map.M(String).t] ~bin_update:[%bin_type_class: int Map.M(String).t] ~bin_error:[%bin_type_class: Nothing.t] (*--- using Nothing.t (uninhabited type) *) () let dispatch conn = match%bind Rpc.State_rpc.dispatch rpc conn () >>| ok_exn with | Ok (initial_state, updates, _) -> handle_state_changes initial_state updates;; (* val dispatch : Rpc.Connection.t -> unit Deferred.t = <fun> *)
Limitations of GADTs
We saw the utility and the complexity. Now it’s time to see some limitations and how to navigate around them.
Limitation: Or-patterns
GADTs don’t work well with or-patterns.
Or-patterns won’t work if we merge cases together if we end up using the type information that is discovered during the pattern match (the concretisation of the locally abstract type).
This is annoying but not a deal-breaker because we can just extract out the logic into a function and call the same function for the different arms that we wanted to merge via or-patterns anyway.
| |
TODO Deriving Serializers
*NOTE: I skipped this, my brain got fried. 🧠
Serialisers like [@@derive sexp] won’t work with GADTs.
LIMITATIONS: There’s a dependency between the value of an argument and the type of the returned value, which is something that OCaml’s type system can’t express. Therefore, in the code below, it’s not possible for the
| |
Chapter 10: Functors
We shift our focus back to OCaml’s modules.
Beyond the purpose of organising code into units with specified interfaces, we wish to see how they are useful in building generic code and structuring large-scale systems. Functors are essential for this.
Functors: functions from modules to modules, useful to structure:
Dependency Injection
- components of a system can become swappable
- useful for mocking for testing and simulation
Auto extension of modules
- standardised way of extending modules with new functionality
- we can write logic once and apply to different types
Instantiating modules with state
- functors allow automation of the construction of modules that have their own separate and independent mutable state
Functors are powerful for modularising code.
Costs of using Functors:
- syntactically heavyweight compared to other language features
- tricky issues to understand to use them properly – with sharing constraints and destructive substitution being some of them on that list
for small and simple programs, heavy use of functors is probably a mistake. But as your programs get more complicated and you need more effective modular architectures, functors become a highly valuable tool.
A trivial example
| |
Functor syntax is a little extra:
require explicit type annotations (module type annotations), ordinary functions don’t
- only the type on the input is mandatory
Nice to haves in practice:
should constraint the module returned by the function
If we don’t, then the output type is inferred more specifically instead of being referenced by the named signature (
X_int)1 2 3 4 5(* note the output type: it's written explicitly instead of being referenced by the named signatture *) module Increment (M : X_int) = struct let x = M.x + 1 end;; (* module Increment : functor (M : X_int) -> sig val x : int end *)should also use an
mlieven if it’s not mandatory
Signature matching is what determines interoperability between functors and other modules. This satisfiability of signatures works similarly to OOP languages (Liskov Substitution Principle – between ‘Parent/child types’).
we can apply
Incrementto any module whose signature satisfies interfaceX_int. This is similar to how contents of anmlfile must satisfy themli.- Substitution loss: in the example below
Three_and_moresatisfiesX_intand thereforeIncrementcan be used but because it has other fields (y), those fields are ignored byIncrement. That’s why we seeFournot having the fieldy.
- Substitution loss: in the example below
module type can omit some information available in the module, either by dropping fields or leaving some fields abstract
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18(* satisfies X_int => can be used by Increment *) module Three = struct let x = 3 end;; (* module Three : sig val x : int end *) module Four = Increment(Three);; (* module Four : sig val x : int end *) Four.x - Three.x;; (* - : int = 1 *) (* extending the interface (and it still matches) *) module Three_and_more = struct let x = 3 let y = "three" end;; (* module Three_and_more : sig val x : int val y : string end *) module Four = Increment(Three_and_more);; (* module Four : sig val x : int end *) Four.x - Three_and_more.x;; (* - : int = 1 *)
A bigger example: computing with intervals
We shall use functors to define a generic interval library that can be used with any type that supports a total ordering on the underlying set.
working with generic intervals usually has the usual operations:
- emptiness check
- containment check
- intersection of intervals…
| |
Notes:
the Functor’s body has implementations of useful functions/primitives to interact with Intervals
in the examples where we directly applied the functor to aligned libaries (Int, String)… there’s some learnings:
- the use of standardised interfaces (e.g. comparable) is what makes the codebase consistent and predictable
- also helps us use functors more easily.
the functor is NOT abstract and that’s a problem because some of the invariants (like low <= high) is being manually handled via a creator function which can be bypassed.
The use of
createis like an initiator factory that does some dynamic logical checks to see if the input params are right (low <= high).However, it’s possible to bypass it:
1 2 3 4 5 6 7 8Int_interval.is_empty (* going through create *) (Int_interval.create 4 3);; (* - : bool = true *) Int_interval.is_empty (* bypassing create by using the Interval constructor*) (Int_interval.Interval (4,3));; (* - : bool = false *)
Making the functor abstract
We can make the functor abstract by restricting the output of Make_interval with an interface.
In the code below, we get an abstract functor, but the type endpoint is not exposed and so we can’t construct an interval yet.
However, we see the following happen:
we have a reference to the endpoint type within the new interface
the functor returns this interface which is more restrictive than the anon one before.
in the interface, we see the use of an abstract type
endpointwhich intentionally creates an abstraction barrier: users of this module will not know what typeendpointis concretised to since that type info is NOT exposed via the interface.Typically this is a good thing because:
prevents dependencies on implementation details
creates ADTs safely
allows different internal representations later without breaking callers (e.g. by concretising it to a particular type and losing its abstract typing.)
| |
By “not exposed” we mean that the Int_interval.endpoint is still abstract. If we were to expose it, then we can say that endpoint is equal to Int.t (or generally speaking, Endpoint.t where Endpoint is the argument to the functor). A type is not exposed when it appears abstract in a module’s interface — that is, its name is visible, but its concrete implementation (what it’s equal to) is hidden.
This happens because the module’s signature does not include a manifest equality for the type.
In such cases, clients cannot create, deconstruct, or assume equality between that type and another; they can only manipulate it through the operations provided by the module.
sharing constraints
Other than explicitly tying down the type in the interface, one way we can expose the abstract type is by sharing constraints – compiler gets told that a given type equals some other type.
One way is via a sharing constraint that tells the compiler to expose the fact that a given type is equal to some other type. Syntax: <Module_type> with type <type1> = <type1'> and type <type2> = <type2'>
This gives a new signature that’s been modified so that the type1 defined within the module is equal to the type1' whose definition is outside of it.
doing the sharing @ the interface
We can make a specialised version of the
Interval_intand get a specialised version for ints by havingInt_interval_intfwhere we share the constraint for the endpoint type:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36(* the original, abstract interface without the constraint on the type: *) module type Interval_intf = sig type t type endpoint (*-- this is an abstract type from the POV of consumers of this module, it gives us a way of referring to the endpoint type*) val create : endpoint -> endpoint -> t val is_empty : t -> bool val contains : t -> endpoint -> bool val intersect : t -> t -> t end;; (* module type Interval_intf = sig type t type endpoint val create : endpoint -> endpoint -> t val is_empty : t -> bool val contains : t -> endpoint -> bool val intersect : t -> t -> t end *) (* using a sharing constraint to create a specialized version of Interval_intf: *) module type Int_interval_intf = Interval_intf with type endpoint = int;; (* module type Int_interval_intf = sig type t type endpoint = int val create : endpoint -> endpoint -> t val is_empty : t -> bool val contains : t -> endpoint -> bool val intersect : t -> t -> t end *)
doing the sharing @ the functor
We can also use the sharing constraints in the context of a functor.
Common use case: when we want to expose that the following are related:
some of the types of the module being generated by the functor
So this is the type
endpointin the new module (functor output)some of the types of the module being fed to the functor
And this is the type
Endpoint.tfrom the moduleEndpoint(which is the functor argument.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68module Make_interval(Endpoint : Comparable) : (Interval_intf with type endpoint = Endpoint.t) = struct type endpoint = Endpoint.t type t = | Interval of Endpoint.t * Endpoint.t | Empty (** [create low high] creates a new interval from [low] to [high]. If [low > high], then the interval is empty *) let create low high = if Endpoint.compare low high > 0 then Empty else Interval (low,high) (** Returns true iff the interval is empty *) let is_empty = function | Empty -> true | Interval _ -> false (** [contains t x] returns true iff [x] is contained in the interval [t] *) let contains t x = match t with | Empty -> false | Interval (l,h) -> Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0 (** [intersect t1 t2] returns the intersection of the two input intervals *) let intersect t1 t2 = let min x y = if Endpoint.compare x y <= 0 then x else y in let max x y = if Endpoint.compare x y >= 0 then x else y in match t1,t2 with | Empty, _ | _, Empty -> Empty | Interval (l1,h1), Interval (l2,h2) -> create (max l1 l2) (min h1 h2) end;; (* module Make_interval : functor (Endpoint : Comparable) -> sig type t type endpoint = Endpoint.t val create : endpoint -> endpoint -> t val is_empty : t -> bool val contains : t -> endpoint -> bool val intersect : t -> t -> t end *) (* ====== now we can do things that require the =endpoint= type to be exposed e.g. constructing intervals: *) module Int_interval = Make_interval(Int);; (* module Int_interval : sig type t = Make_interval(Base.Int).t type endpoint = int val create : endpoint -> endpoint -> t val is_empty : t -> bool val contains : t -> endpoint -> bool val intersect : t -> t -> t end *) let i = Int_interval.create 3 4;; (* val i : Int_interval.t = <abstr> *) Int_interval.contains i 5;; (* - : bool = false *)
an alternative: destructive substitution
Constraint sharing works but it’s a little ugly.
We can do better using destructive substitution. A bit of misnomer, there’s nothing destructive about destructive substitution, it’s just a way of creating a new signature by transforming an existing one.
destructive substitution of the interface signature
Here, we modify the signature of
Interval_intfand replacingendpointwithEndpoint.teverywhere, which deletes the definition ofendpointfrom the signature.Syntax looks similar to the sharing constraint but we use
:=instead of=.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28(* ==== using destructive substitution *) module type Int_interval_intf = Interval_intf with type endpoint := int;; (* module type Int_interval_intf = sig type t val create : int -> int -> t val is_empty : t -> bool val contains : t -> int -> bool val intersect : t -> t -> t end *) (* using a sharing constraint to create a specialized version of Interval_intf: *) module type Int_interval_intf = Interval_intf with type endpoint = int;; (* module type Int_interval_intf = sig type t type endpoint = int val create : endpoint -> endpoint -> t val is_empty : t -> bool val contains : t -> endpoint -> bool val intersect : t -> t -> t end *)
destructive substitution in the context of the functor
Similar to constraint sharing, we can do destructive substitution in the context of the functor. A little different from the substitution at the interface level, we have kept the type
tin this interface as abstract and have exposed the type of theendpoint. This allows us to create values of typeInt_interval.tusing the creation function but not using the constructors directly (which would have allowed the violation of the module).1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72module Make_interval(Endpoint : Comparable) : Interval_intf with type endpoint := Endpoint.t = struct (* the type t is abstract, and the type of the endpoint is exposed; so we can create values of type Int_interval.t *) type t = | Interval of Endpoint.t * Endpoint.t | Empty (** [create low high] creates a new interval from [low] to [high]. If [low > high], then the interval is empty *) let create low high = if Endpoint.compare low high > 0 then Empty else Interval (low,high) (** Returns true iff the interval is empty *) let is_empty = function | Empty -> true | Interval _ -> false (** [contains t x] returns true iff [x] is contained in the interval [t] *) let contains t x = match t with | Empty -> false | Interval (l,h) -> Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0 (** [intersect t1 t2] returns the intersection of the two input intervals *) let intersect t1 t2 = let min x y = if Endpoint.compare x y <= 0 then x else y in let max x y = if Endpoint.compare x y >= 0 then x else y in match t1,t2 with | Empty, _ | _, Empty -> Empty | Interval (l1,h1), Interval (l2,h2) -> create (max l1 l2) (min h1 h2) end;; (* module Make_interval : functor (Endpoint : Comparable) -> sig type t val create : Endpoint.t -> Endpoint.t -> t val is_empty : t -> bool val contains : t -> Endpoint.t -> bool val intersect : t -> t -> t end *) (* ======= applying them: *) module Int_interval = Make_interval(Int);; (* module Int_interval : sig type t = Make_interval(Base.Int).t val create : int -> int -> t val is_empty : t -> bool val contains : t -> int -> bool val intersect : t -> t -> t end *) Int_interval.is_empty (Int_interval.create 3 4);; (* - : bool = false *) (* attempting to bypass the creation function and using the constructor directly will error out *) Int_interval.is_empty (Int_interval.Interval (4,3));; (* Line 1, characters 24-45: Error: Unbound constructor Int_interval.Interval *)There’s no need to define the
endpointtype alias in the body of the module because theendpointtype is gone from the interface.
Using Multiple Interfaces
We wish to make our interval module serialisable – i.e. read and write intervals as a stream of bytes.
Our intermediate format for this shall be sexps: a parenthesised expression whose atoms are strings, it’s what Base uses as its common serialisation format.
For this case, we can use derivation annotations for types that will generate the converter functions (serialise and de-serialise).
We can’t directly apply the [@@deriving sexp] to type t within the Make_interval functor signature until we ensure that Endpoint.t satisfies Sexpable.S which is a signature that looks like this:
| |
We have two choices:
Now, we can modify
Make_intervalto use theSexpable.Sand theInterval_intfinterfaces. For theSexpable.Sinterface, we use destructive substitution on the typet.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17module type Interval_intf_with_sexp = sig include Interval_intf include Sexpable.S with type t := t end;; (* module type Interval_intf_with_sexp = sig type t type endpoint val create : endpoint -> endpoint -> t val is_empty : t -> bool val contains : t -> endpoint -> bool val intersect : t -> t -> t val t_of_sexp : Sexp.t -> t val sexp_of_t : t -> Sexp.t end *)we can define type
twithin the new module and apply destructive substitutions to all the included interfaces. This is cleaner when combining multiple interfaces because it correctly reflects that all of the signatures are being handled equivalently:Interval_intfSexpable.S
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106module type Interval_intf_with_sexp = sig type t include Interval_intf with type t := t include Sexpable.S with type t := t end;; (* module type Interval_intf_with_sexp = sig type t type endpoint val create : endpoint -> endpoint -> t val is_empty : t -> bool val contains : t -> endpoint -> bool val intersect : t -> t -> t val t_of_sexp : Sexp.t -> t val sexp_of_t : t -> Sexp.t end *) (* Now, we can update our functor: *) module Make_interval(Endpoint : sig type t include Comparable with type t := t include Sexpable.S with type t := t end) : (Interval_intf_with_sexp with type endpoint := Endpoint.t) = struct type t = | Interval of Endpoint.t * Endpoint.t | Empty [@@deriving sexp] (** [create low high] creates a new interval from [low] to [high]. If [low > high], then the interval is empty *) let create low high = if Endpoint.compare low high > 0 then Empty else Interval (low,high) (* NOTE: here, we wrapped the autogenerated [t_of_sexp] to enforce the invariants of the data structure -- modification *) let t_of_sexp sexp = match t_of_sexp sexp with | Empty -> Empty | Interval (x,y) -> create x y (*-- creator function is what enforces our invariants.*) (** Returns true iff the interval is empty *) let is_empty = function | Empty -> true | Interval _ -> false (** [contains t x] returns true iff [x] is contained in the interval [t] *) let contains t x = match t with | Empty -> false | Interval (l,h) -> Endpoint.compare x l >= 0 && Endpoint.compare x h <= 0 (** [intersect t1 t2] returns the intersection of the two input intervals *) let intersect t1 t2 = let min x y = if Endpoint.compare x y <= 0 then x else y in let max x y = if Endpoint.compare x y >= 0 then x else y in match t1,t2 with | Empty, _ | _, Empty -> Empty | Interval (l1,h1), Interval (l2,h2) -> create (max l1 l2) (min h1 h2) end;; (* module Make_interval : functor (Endpoint : sig type t val compare : t -> t -> int val t_of_sexp : Sexp.t -> t val sexp_of_t : t -> Sexp.t end) -> sig type t val create : Endpoint.t -> Endpoint.t -> t val is_empty : t -> bool val contains : t -> Endpoint.t -> bool val intersect : t -> t -> t val t_of_sexp : Sexp.t -> t val sexp_of_t : t -> Sexp.t end *) (* ==== USAGE: *) module Int_interval = Make_interval(Int);; (* module Int_interval : sig type t = Make_interval(Base.Int).t val create : int -> int -> t val is_empty : t -> bool val contains : t -> int -> bool val intersect : t -> t -> t val t_of_sexp : Sexp.t -> t val sexp_of_t : t -> Sexp.t end *) Int_interval.sexp_of_t (Int_interval.create 3 4);; (* - : Sexp.t = (Interval 3 4) *) Int_interval.sexp_of_t (Int_interval.create 4 3);; (* - : Sexp.t = Empty *)
Use-case: Extending modules
We can use functors to generate type-specific functionality for a given module in a standardised way.
here’s a somewhat skeletal interface for a functional version of a FIFO queue.
- problem: it’s quite skeletal, we could have had many useful helper functions that don’t appear in the interface. For a reference to what some common functions on container types look like, we can see the
Listtype and see things likeList.for_alland other useful helper functions available to us.
| |
and here’s the implementation.
Trick: we maintain an input list and an output list. This allows us to efficiently enqueue on the input list and dequeue from the output list.
The input list is reversed and becomes the new output list.
| |
Improving by using a Foldable module
We don’t want to keep repeating the implementations for the container types.
We acknowledge that many of the helper functions can be derived from the fold function that we already implemented.
Instead of writing all the helper functions, we can define a functor that can add functionality to any of the containers that have a fold function. Foldable module here automates the process of adding helper functions to a fold-supporting container.
- it contains a module signature
Swhich defines the signature that is required to support folding. - contains a functor
Extendthat allows one to extend any module matchingFoldable.S
| |
Now, the Extend functor (which takes in something that satisfies the signature for Foldable) can be used on Fqueue and we would have extended the functionality for Fqueue.
To apply the functor, we shall put the definition of Fqueue in a submodule called T and then call the functor (Foldable.Extend) on T.
This is the interface for the extended version of Fqueue:
| |
This is the application of the functor:
| |
Functors that Base provides
Similar pattern for extension is provided by Base in the form of the following functors:
Container.Make: similar toFoldable.ExtendComparable.Make: comparison function, with support for containers like maps and sets.Hashable.Make: add support for hashing-based datastructures including hash tables, hash sets and hash heaps.Monad.Make: for monadic libraries. Here, functor is used to provide a collection of standard helper functions based on thebindandreturnoperators.