Skip to main content
  1. Readings/
  2. Books/
  3. Real World OCaml: Functional Programming for the Masses/

Chapter 9: Generalized Abstract Data Types

··8558 words·41 mins

GADTs

Generalized Algebraic Data Types (GADTs) are an extension to variants that we’ve seen so far:

Benefits:

  1. more expressive
  2. can create types that match the shape of the intended program more precisely
  3. code can be safer, more concise and more efficient

Unique Features / Mechanisms:

  1. let the compiler learn more type info when we descend into a case of a pattern match
  2. make it easy to use existential types – working with data of a specific but unknown type

Costs:

  1. distinct cost (TODO see below)
  2. harder to use, less intuitive than ordinary variants
  3. can be confusing to figure out how to use them effectively \(\implies\) should only be used when it makes a big qualitative improvement to your design

We learn from some examples for this chapter.

A Little Language #

A good use is to have typed expression languages (like Blang). This is going to be our DSL.

WE wish to create a language here that mixes arithmetic and boolean expressions – we have to deal with the possibility of ill-typed expressions (e.g. expressions that adds a bool and an int). We need type safety to ensure that the compiler will throw errors if such expressions are present. A non-type-safe system would otherwise require us to do dynamic type checking on the type of the input params of the function.

We attempt to use an ordinary variant first:

(* 1. Type defs (interface): *)
open Base

type value =
  | Int of int
  | Bool of bool

type expr =
  | Value of value
  | Eq of expr * expr
  | Plus of expr * expr
  | If of expr * expr * expr

(* 2. Defining a custom exception for ill-typed expressions *)
exception Ill_typed

(* 3. v0: evaluator that had many dynamic checks to detect errors
 , The problem here is that it's possible to create ill-typed expressions which will trip these dynamic checks that we're doing.
 Our objective was to make them work together.
 *)
let rec eval expr =
  match expr with
  | Value v -> v
  | If (c, t, e) ->
    (match eval c with
     | Bool b -> if b then eval t else eval e
     | Int _ -> raise Ill_typed) (*-- dynamic type checking as an attempt to have type safety*)
  | Eq (x, y) ->
    (match eval x, eval y with
     | Bool _, _ | _, Bool _ -> raise Ill_typed
     | Int f1, Int f2 -> Bool (f1 = f2))
  | Plus (x, y) ->
    (match eval x, eval y with
     | Bool _, _ | _, Bool _ -> raise Ill_typed
     | Int f1, Int f2 -> Int (f1 + f2));;
(* val eval : expr -> value = <fun> *)
let i x = Value (Int x)
and b x = Value (Bool x)
and (+:) x y = Plus (x,y);;
(* val i : int -> expr = <fun> *)
(* val b : bool -> expr = <fun> *)
(* val ( +: ) : expr -> expr -> expr = <fun> *)
eval (i 3 +: b false);;
(* Exception: Ill_typed. *)

Here, it’s because of the dynamic type checking in our logic did our type error get thrown here. We wish our language to be type safe

Making the Language Type-Safe #

We’re still trying to make things work with ordinary variants.

A type-safe version of this API maybe include:

  1. ability for the expressions to have a type param to distinguish integer and bool expressions

Signature:

module type Typesafe_lang_sig = sig
  type 'a t

  (** functions for constructing expressions *)

  val int : int -> int t
  val bool : bool -> bool t
  val if_ : bool t -> 'a t -> 'a t -> 'a t
  val eq : 'a t -> 'a t -> bool t
  val plus : int t -> int t -> int t

  (** Evaluation functions *)

  val int_eval : int t -> int
  val bool_eval : bool t -> bool
end

Implementation:

module Typesafe_lang : Typesafe_lang_sig = struct
  (* phantom type *)
  type 'a t = expr (*links the abstract type to the concrete expressions*)

  let int x = Value (Int x)
  let bool x = Value (Bool x)
  let if_ c t e = If (c, t, e)
  let eq x y = Eq (x, y)
  let plus x y = Plus (x, y)

  let int_eval expr =
    match eval expr with
    | Int x -> x
    | Bool _ -> raise Ill_typed

  let bool_eval expr =
    match eval expr with
    | Bool x -> x
    | Int _ -> raise Ill_typed
end

(* attempting an ill-typed expression *)
let expr = Typesafe_lang.(plus (int 3) (bool false));;
(*
Line 1, characters 40-52:
Error: This expression has type bool t but an expression was expected of type
         int t
   Type bool is not compatible with type int
       *)

Notes:

  1. the main trick here is the phantom-type: type 'a t = expr

    SO the 'a (type parameter) is the phantom type because it doesn’t show up in the body definition of t \(\implies\) type param is unused \(\implies\) free to take on any value.

    by the way, the body definition refers to the RHS of the = in the type definition, so since expr doesn’t use 'a, internally 'a t is just implemented as the type expr. So the 'a doesn’t really do much there and is a phantom. This is also called a manifest type definition or aliasing.

    More about the use of “phantom”. Our purpose is to encode extra compile-time type information without affecting runtime representation.

  2. there are two eval functions here, if we had used GADTs, we could have had a single eval expression.

    the function signature differing between the two evals is what allows the compiler to catch the ill-typing (along with the phantom type)

    this still has type checking being done dynamically within the logic though.

  3. this is type-safe because the type-system is what would reject the ill-typed expression. It’s not complete though because in some cases we still can get around it and that’s why the dynamic type checking is still needed in the eval functions:

       let expr = Typesafe_lang.(eq (bool true) (bool false));;
       (* val expr : bool Typesafe_lang.t = <abstr> *)
       Typesafe_lang.bool_eval expr;;
       (* Exception: Ill_typed. *)
    

    the types within the implementation don’t necessarily rule out ill-typed expressions. The same fact explains why we needed two different eval functions: the implementation of eval doesn’t have any type-level guarantee of when it’s handling a bool expression versus an int expression, so it can’t safely give results where the type of the result varies based on the result of the expression.

Trying to do Better with Ordinary Variants #

We persist and try to encode the typing rules within the DSL directly for the expr and value types.

type 'a value =
  | Int of 'a
  | Bool of 'a

type 'a expr =
  | Value of 'a value
  | Eq of 'a expr * 'a expr
  | Plus of 'a expr * 'a expr
  | If of bool expr * 'a expr * 'a expr

implementations make it seem like they work as intended:

let i x = Value (Int x)
and b x = Value (Bool x)
and (+:) x y = Plus (x,y);;
(*
val i : 'a -> 'a expr = <fun>
val b : 'a -> 'a expr = <fun>
val ( +: ) : 'a expr -> 'a expr -> 'a expr = <fun>
*)
i 3;;
(* - : int expr = Value (Int 3) *)
b false;;
(* - : bool expr = Value (Bool false) *)
i 3 +: i 4;;
(* - : int expr = Plus (Value (Int 3), Value (Int 4)) *)

There are some problems to this implementation:

  1. the inner and outer types are always going to be the same, we can’t inter-op with types as we need.
       If (Eq (i 3, i 4), i 0, i 1);;
       (*
       Line 1, characters 9-12:
       Error: This expression has type int expr
              but an expression was expected of type bool expr
              Type int is not compatible with type bool
       *)
    

Key Problem #

The way we want to use the type param isn’t supported by ordinary variants – we want the type param to be populated in different ways in the different tags and to depend in non-trivial ways on the types of the data associated with each tag.

that’s what GADTs are for.

GADTs to the Rescue #

Syntax #

Here’s the GADT type signature.

type _ value =
  | Int : int -> int value
  | Bool : bool -> bool value

type _ expr =
  | Value : 'a value -> 'a expr
  | Eq : int expr * int expr -> bool expr
  | Plus : int expr * int expr -> int expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr
  1. when we see the type declaration, the use of the _ is syntactic sugar for “This type has one parameter, but I’m not going to name it because I don’t need to.” So for type declaration, it’s like saying “I’m ignoring this type name”

    it’s an interesting parallel to the use of _ in the context of pattern matching where _ means “I’m ignoring this value name”.

  2. the colon to the right of the tags tells us that it’s a GADT

  3. consider the definition of a tag: <tag name> : <function signature>.

    GOTCHA: when we say things like “type parameter can be determined by the tag”, it’s NOT JUST the tag name that we are referring to, it’s the annotation typically (i.e. is it algebraic or concrete?)

    1. LHS: the tag name
    2. RHS of : can be seen as a single-arg function type \(\implies\) it’s like the type constructor.
      • LHS of -> :

        • types of the args of the constructor
      • RHS of -> :

        • shows the types of the args of the constructed value
        • is an instance of the type of the overall GADT
  4. The type parameter CAN depend on BOTH the tag and the type of arguments.

    What this really means is “can I tell the type param by looking at the tag?”. This includes the tag name as well as the annotations for that tag. Annotations may imply that the types are concrete OR have some usage of algebraic types. If they’re concrete then we can just know the type based on the tag. Else, we have to resolve what the algebraic type would actually be.

    • Eq is an example where the type param only depends on the tag. That’s why we already have the type concretized to int and bool

      Eq : int expr * int expr -> bool expr

    • If is an example where the type param depends on the args to the tag: i.e. the type param of the If is the type param of the then and else clauses, that’s why we’re using the 'a algebraic type.

      If : bool expr * 'a expr * 'a expr -> 'a expr.

Observations #

Given the GADT above, we have the following code that gives some observations:

(* examples of type safety @ construction *)
(*=============================================  *)
let i x = Value (Int x)
and b x = Value (Bool x)
and (+:) x y = Plus (x,y);;
(*
val i : int -> int expr = <fun>
val b : bool -> bool expr = <fun>
val ( +: ) : int expr -> int expr -> int expr = <fun>
*)
i 3;;
(* - : int expr = Value (Int 3) *)
b 3;;
(*
Line 1, characters 3-4:
Error: This expression has type int but an expression was expected of type
         bool
         *)
i 3 +: i 6;;
(* - : int expr = Plus (Value (Int 3), Value (Int 6)) *)
i 3 +: b false;;
(*
Line 1, characters 8-15:
Error: This expression has type bool expr
       but an expression was expected of type int expr
       Type bool is not compatible with type int
*)

(* GADT-based evaluator function that doesn't need any dynamic type-safety checks: *)
(* ========================================== *)
let eval_value : type a. a value -> a = function
  | Int x -> x
  | Bool x -> x;;
(* NOTE: the value generated for this function matches what our type annotation is!:

 val eval_value : 'a value -> 'a = <fun> *)
let rec eval : type a. a expr -> a = function
  | Value v -> eval_value v
  | If (c, t, e) -> if eval c then eval t else eval e
  | Eq (x, y) -> eval x = eval y
  | Plus (x, y) -> eval x + eval y;;
(* val eval : 'a expr -> 'a = <fun> *)
  1. we see that we’ve enforced type safety. The rules for the type safety are directly encoded in the definition of the expression type. We have a single

    In our attempt without GADTs,we had to do the type-safety enforcement via signature-level restrictions on phantom types. In particular we had to alias the 'a t (abstract type) to expr (concrete type defined prior).

    Showing expression type vs signature-based :

       type _ expr =
         | Value : 'a value -> 'a expr
         | Eq : int expr * int expr -> bool expr
         | Plus : int expr * int expr -> int expr
         | If : bool expr * 'a expr * 'a expr -> 'a expr
    
       (* -- previously, we enforced this type safety with signature-level restrictions on phantom types *)
       module Typesafe_lang : Typesafe_lang_sig = struct
         (* phantom type *)
         type 'a t = expr (*links the abstract type to the concrete expressions -- this is aliasing of the phantom type*)
    
         let int x = Value (Int x)
         let bool x = Value (Bool x)
         let if_ c t e = If (c, t, e)
         let eq x y = Eq (x, y)
         let plus x y = Plus (x, y)
    
         let int_eval expr =
           match eval expr with
           | Int x -> x
           | Bool _ -> raise Ill_typed
    
         let bool_eval expr =
           match eval expr with
           | Bool x -> x
           | Int _ -> raise Ill_typed
       end
    
  2. With GADT, we have a single polymorphic eval function. Previously, we needed to have two concretely typed specific evaluators when using phantom types.

  3. Downside: when using GADTs, the code using them needs extra type annotations:

       let eval_value : type a. a value -> a = function
         | Int x -> x
         | Bool x -> x;;
    
       (* without annotations, we'd have gotten something like the following error:
    
          Error: This pattern matches values of type bool value
              but a pattern was expected which matches values of type int value
              Type bool is not compatible with type int
          *)
    

GADTs, Locally Abstract Types, and Polymorphic Recursion #

We’ve seen the correct type-annotated implementation for eval already. Carrying on from before, we explore the need to have specific syntax for typing when our function is a polymorphic (GADT with abstract types) AND also recursive.

Problem #

eval is both polymorphic and recursive which makes this complex.

  1. polymorphic:

    The reason we need to do that is because OCaml, by default, is not willing to instantiate the ordinary type variables in DIFFERENT ways (as int vs as bool) within the body of the same function but that’s what we want here. Without the specific syntax, there’s some type restriction that happens and it eventually causes the type mismatch that results in the error we see in the preceding code block.

  2. recursive:

    recursive code that is also polymorphic may be prone to typing errors because the compiler may be forced to unify a (the concrete version of the abstract type 'a) across all recursive calls IF there’s a leakage of type from inner context to a more outer context. Could be in the form of an entire concretization of the abstract type or could be just a type restriction applied.

Longer Elaboration:

### Understanding "Escaping of Scope" in the Context of GADTs and Locally Abstract Types

Your notes discuss a complex but important concept in OCaml's type system involving GADTs, polymorphic recursion, and locally abstract types.

***

#### What does "escaping of scope" mean?

- A **type variable (like `'a`) "escapes its scope"** when it is introduced or bound in a *local* context (like inside a pattern match or a locally abstract type annotation) but then tries to leak into a broader context where it was not intended to be visible.

- In OCaml's type-checking, especially with GADTs and polymorphic recursion, **type variables introduced locally in pattern matching or annotations are allowed to be specialized within that local scope only.** They represent *fresh*, *locally abstract* types.

- If, during type-checking, these variables would have to be used or unified with types outside of their defining scope, the compiler reports a scope escape error or a type error, because it breaks the correctness guarantees of the type system.

***

#### How does this relate to your `eval` function example?

- Your function `eval` is recursive and polymorphic:
  ```ocaml
  let rec eval (type a) (e : a expr) : a = ...
  ```
  Here, `(type a)` introduces a *locally abstract type variable* `a` scoped to the function body.

- When recursive calls to `eval` occur, the compiler attempts to unify or merge those recursive calls’ types.

- Without explicit polymorphic annotation (`let rec eval : 'a. 'a expr -> 'a = ...`), the locally abstract type `a` **would leak (escape) from inside the recursive call's scope into the outer function's scope**, forcing the compiler to unify `a` across all recursive calls.

- This escape causes the compiler to expect that `a` always equals a specific type (e.g., `bool`), leading to failures when other branches pass expressions with different types like `int`.

- The fix, as your notes outline, is to give `eval` a **polymorphic (for all `a`) type signature** explicitly to ensure that `a` doesn’t escape — each recursive call is allowed a fresh, independent instantiation of `a`.

***

#### Intuition summary:

| Phrase                 | Meaning                                                                                     |
|------------------------|---------------------------------------------------------------------------------------------|
| *Locally abstract type* | A type variable introduced locally (inside a function, pattern match, or annotation).       |
| *Escaping its scope*    | When such a type variable would have to be valid outside its defining local context.          |
| *Why a problem?*        | Violates assumptions that the type variable is fresh and isolated; type safety breaks down. |
| *How to fix?*           | Use explicit polymorphic recursion syntax so each recursive call gets a fresh type instance. |

***

#### Your notes align strongly with formal explanations found in these references:

- [OCaml Manual Chapter on GADTs (Section on Existential and Locally Abstract Types)](https://ocaml.org/manual/gadts.html)
- [Real World OCaml — GADTs chapter](https://dev.realworldocaml.org/gadts.html)
- Thomas Leonard's article on OCaml’s locally abstract types and polymorphic recursion

***

If you want, I can explain with a detailed example showing the scope of the locally abstract type and how polymorphic recursion fixes the escape problem, so let me know!

[1](https://ocaml.org/manual/5.2/gadts-tutorial.html)
[2](https://dev.realworldocaml.org/gadts.html)
[3](https://discuss.ocaml.org/t/strange-grammar-in-the-real-world-ocaml-book-chapter-gadts/10093)
[4](https://stackoverflow.com/questions/27864200/an-concrete-simple-example-to-demonstrate-gadt-in-ocaml)
[5](https://www.math.nagoya-u.ac.jp/~garrigue/papers/ml2011.pdf)
[6](https://www.cl.cam.ac.uk/teaching/1617/L28/gadts.pdf)
[7](http://pauillac.inria.fr/~remy/gadts/)
[8](https://www.cl.cam.ac.uk/teaching/1415/L28/gadts.pdf)

Locally abstract types as a solution #

This type-restriction is a problem that we desire to “fix” and we do that by adding a locally abstract type that doesn’t have that restriction.

We might try this annotation, but it will fail because eval is recursive in nature and inference of GADTs doesn’t play well with recursive calls.

let eval_value (type a) (v : a value) : a =
  match v with
  | Int x -> x
  | Bool x -> x;;
(* val eval_value : 'a value -> 'a = <fun> *)


let rec eval (type a) (e : a expr) : a =
  match e with
  | Value v -> eval_value v
  | If (c, t, e) -> if eval c then eval t else eval e
  | Eq (x, y) -> eval x = eval y
  | Plus (x, y) -> eval x + eval y;;
(*
  Line 4, characters 43-44:
Error: This expression has type a expr but an expression was expected of type
         bool expr
         Type a is not compatible with type bool
*)

The main problem is that eval is recursive so the type-checker is trying to merge the locally abstract type a into the type of the re cursive function eval and the way that a is escaping its scope (inner to outer) is when merging it into the outer scope within which eval is defined, that’s why it’s expecting type bool expr.

FIX: explicitly marking eval as polymorphic and that’s where the type annotation syntax comes from: let rec eval : 'a. 'a expr -> 'a A = .... IF eval is marked as polymorphic, then the type of eval is not specialised to a so a doesn’t escape its scope.

(* locally abstract syntax + marked as polymorphic  *)
let rec eval : 'a. 'a expr -> 'a =
  fun (type a) (x : a expr) -> (* <-- this is an anon function definition, taking an explicit argument with explicit type annotations*)
   match x with
   | Value v -> eval_value v
   | If (c, t, e) -> if eval c then eval t else eval e
   | Eq (x, y) -> eval x = eval y
   | Plus (x, y) -> eval x + eval y;;
(* val eval : 'a expr -> 'a = <fun> *)

(* without the syntax (code won't work):  *)
let rec eval (type a) (e : a expr) : a =
  match e with
  | Value v -> eval_value v
  | If (c, t, e) -> if eval c then eval t else eval e
  | Eq (x, y) -> eval x = eval y
  | Plus (x, y) -> eval x + eval y;;
  • type variable escaping its scope

    when we have leakage of the concretization of an abstract type from inner to outer context (in the case of a recursive function call, it’s an outer to inner function call and back again the other way).

    By marking eval as polymorphic, then the type of eval is not specialised to a so a doesn’t escape its scope.

  • eval being an example of polymorphic recursion

    Another reason the compiler can’t infer the types automatically.

    for example, with If, since the If itself must be of type bool, but the type of the then and else clauses could be of type int. This means that when evaluating If, we’ll dispatch eval at a different type than it was called on.

    eval needs to see itself as polymorphic

  • using the syntactic sugar

    For Recursive functions that use GADTs, our type annotation should be able to do the following for it to feel good:

    We can combine the two things:

    1. polymorphism annotation and
    2. the creation of the locally abstract types.
    (* this uses the syntactic sugar *)
    let rec eval : type a. a expr -> a = function
      | Value v -> eval_value v
      | If (c, t, e) -> if eval c then eval t else eval e
      | Eq (x, y) -> eval x = eval y
      | Plus (x, y) -> eval x + eval y;;
    (* val eval : 'a expr -> 'a = <fun> *)
    
    (* the more verbose (and correct) way earlier was: *)
    let rec eval : 'a. 'a expr -> 'a =
      fun (type a) (x : a expr) ->
       match x with
       | Value v -> eval_value v
       | If (c, t, e) -> if eval c then eval t else eval e
       | Eq (x, y) -> eval x = eval y
       | Plus (x, y) -> eval x + eval y;;
    (* val eval : 'a expr -> 'a = <fun> *)
    

    The syntax : type a. (with function afterward) tells the compiler:

    • “Here I am explicitly introducing a locally abstract type a which will be instantiated freshly in each match branch or recursive call.”

      • the concretisation will be local to the function body and recursive calls, thereby preventing the type inference leakage
    • This abstracts 'a as a local type variable (a), allowing its different instantiations during recursion or pattern matching.

When are GADTs useful? #

Use case: Varying your Return type #

We are familiar with cases where return types vary based on the input type but usually these involve simple dependencies between types that correspond to the data flow within our code. List.find is a good example.

We might want more flexibility, e.g. for List.find on not found cases, we want to make it such that we can choose what returns.

A typical variant vs a GADT:

(* the GADT version of If_not_found -- the type of our argument that we give to control the return type.*)
module If_not_found = struct
  type (_, _) t =
    | Raise : ('a, 'a) t
    | Return_none : ('a, 'a option) t
    | Default_to : 'a -> ('a, 'a) t
end

(* typical variant, for comparison *)
module If_not_found = struct
  type 'a t =
    | Raise
    | Return_none
    | Default_to of 'a
end

Using the normal variant almost seems like what we want (to control the return type of the function based on our inputs in a non-trivial manner). The problem is that we don’t exactly achieve that because our output is always wrapped in an option:

let rec flexible_find list ~f (if_not_found : _ If_not_found.t) =
  match list with
  | hd :: tl ->
    if f hd then Some hd else flexible_find ~f tl if_not_found
  | [] ->
    (match if_not_found with
    | Raise -> failwith "Element not found"
    | Return_none -> None
    | Default_to x -> Some x);;

(* --- this almost seems to work, except the return type is always wrapped in an option.

val flexible_find :
  'a list -> f:('a -> bool) -> 'a If_not_found.t -> 'a option = <fun>
*)

(* --- demo calls:  note how all of them (other than raised) are option-wrapped*)
flexible_find ~f:(fun x -> x > 10) [1;2;5] Return_none;;
(* - : int option = None *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] (Default_to 10);;
(* - : int option = Some 10 *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] Raise;;
(* Exception: (Failure "Element not found"). *)
flexible_find ~f:(fun x -> x > 10) [1;2;20] Raise;;
(* - : int option = Some 20 *)

Here, our GDAT will work here. In our GDAT, we have two type params:

  • one for the type of the list element, and
  • one for the return type of the function.

These type params will be locally abstract in their definition.

Now, when we implement the flexible_find function, we get the behaviour we really want:

let rec flexible_find
 : type a b. f:(a -> bool) -> a list -> (a, b) If_not_found.t -> b =
 fun ~f list if_not_found ->
  match list with
  | [] ->
    (match if_not_found with
    | Raise -> failwith "No matching item found"
    | Return_none -> None
    | Default_to x -> x)
  | hd :: tl ->
    if f hd
    then (
      match if_not_found with
      | Raise -> hd
      | Return_none -> Some hd
      | Default_to _ -> hd)
    else flexible_find ~f tl if_not_found;;
(* notice that the generated signature matches what our locally abstract type definition is above within the function type annotation:

val flexible_find :
  f:('a -> bool) -> 'a list -> ('a, 'b) If_not_found.t -> 'b = <fun>
 *)

(* --- demo calls *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] Return_none;;
(* - : int option = Base.Option.None *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] (Default_to 10);;
(* - : int = 10 *)
flexible_find ~f:(fun x -> x > 10) [1;2;5] Raise;;
(* Exception: (Failure "No matching item found"). *)
flexible_find ~f:(fun x -> x > 10) [1;2;20] Raise;;
(* - : int = 20 *)

Use Case: Capturing the Unknown #

We’ve seen unknown types being used e.g. in the case of tuples. The abstract type variables may be universally quantified i.e. applies for all types that are provided.

For some cases, we might wish to have a existentially quantified type variables instead. This means the variables are a particular but unknown type. GADTs can help offer a natural way to do this.

(* stringable GADT *)
type stringable =
  Stringable : { value: 'a; to_string: 'a -> string } -> stringable
(* 'a is an arbitrary type here *)

(* printing a stringable  *)
let print_stringable (Stringable s) =
  Stdio.print_endline (s.to_string s.value);;
(* val print_stringable : stringable -> unit = <fun> *)

(* using print_stringable on different types of stringable. *)
let stringables =
  (let s value to_string = Stringable { to_string; value } in
    [ s 100 Int.to_string
    ; s 12.3 Float.to_string
    ; s "foo" Fn.id
    ]);;
(* Generated types:
val stringables : stringable list =
  [Stringable {value = <poly>; to_string = <fun>};
   Stringable {value = <poly>; to_string = <fun>};
   Stringable {value = <poly>; to_string = <fun>}]
*)
List.iter ~f:print_stringable stringables;;

we know that 'a is existentially quantified because it is on the LHS of the arrow but not the RHS \(\implies\) the 'a that appears internally, doesn’t appear in the type param for stringable itself.

this makes 'a (existentially quantified type) bound within the definition of stringable. The type of the underlying object is existentially bound within type stringable, any function that tries to return such a value won’t type-check:

(* this function is trying to return the underlying value (with abstract type 'a). It's not right because it's supposed to be only kept within the stringable. By returning it, the function will be trying to leak the type out of its scope. *)
let get_value (Stringable s) = s.value;;
(*
Line 1, characters 32-39:
Error: This expression has type $Stringable_'a
       but an expression was expected of type 'a
       The type constructor $Stringable_'a would escape its scope
*)

In the error message, the type variable $Stringable_'a has 3 parts:

  1. the $ marks the variable as existential
  2. the Stringable is the name of the GADT tag this variable came from
  3. the 'a is the name of the type variable from inside that tag.

Use case: Abstracting Computational Machines #

GADTs are useful for writing combinators. The combinator pattern:

  1. allow the combination of small components into larger computational machines.
  2. are component-combining functions

Consider a pipeline pattern, a sequence of steps where each consumes the output of the previous step, possibly doing some side-effects and returning a value for the next step. We’re thinking of a custom pipeline, beyond the pipeline operator:

  1. profiling the ability to profile each step of the pipeline and report it

  2. control over execution

    allowing users to pause the execution and restart at a later time

  3. Custom error handling

    pipeline knows where it failed and offers the possibility of restarting

Without Using GADT: concrete combinators #

We can define, build and execute pipelines without GADTs by manually adding in functions that define the empty case (just to seed) and how we can build onto and existing pipeline.

(* signature of Pipeline *)
module type Pipeline = sig
  type ('input,'output) t (* type ('a, 'b) t means input of type 'a output of type 'b *)

  (* for adding a step to the pipeline *)
  val ( @> ) : ('a -> 'b) -> ('b,'c) t -> ('a,'c) t
  (* for seeding the pipeline *)
  val empty : ('a,'a) t
end

module Example_pipeline (Pipeline : Pipeline) = struct
  open Pipeline
  let sum_file_sizes =
    (fun () -> Sys_unix.ls_dir ".")
    @> List.filter ~f:Sys_unix.is_file_exn
    @> List.map ~f:(fun file_name -> (Core_unix.lstat file_name).st_size)
    @> List.sum (module Int) ~f:Int64.to_int_exn
    @> empty
end;;
(*module Example_pipeline :
  functor (Pipeline : Pipeline) ->
    sig val sum_file_sizes : (unit, int) Pipeline.t end
    *)

(* ---- basic pipeline implementation:

   pipeline can be defined as a simple function and we use the @> to compose. Pipeline execution is just function application. *)

module Basic_pipeline : sig
   include Pipeline
   val exec : ('a,'b) t -> 'a -> 'b
 end= struct
  type ('input, 'output) t = 'input -> 'output

  let empty = Fn.id

  let ( @> ) f t input =
    t (f input)

  let exec t input = t input
end

The Basic_pipeline doesn’t do any better than the |> operator though.

To get better functionality, we have 2 choices:

  1. we improve on the pipeline type and give it extra structures, handle exceptions and all the fun stuff

    problem: we have to pre-commit to whatever services we’re supporting and we have to embed them in our pipeline representation

  2. use GADTs to abstractly represent the pipeline we want and then build the functionality we want on top of that

Using GADT #

Instead of concretely building a machine for executing a pipeline, we can use GADTs to abstractly represent the pipeline we want, and then build the functionality we want on top of that representation.

type (_, _) pipeline =
  | Step : ('a -> 'b) * ('b, 'c) pipeline -> ('a, 'c) pipeline (* corresponds to the @> operator *)
  | Empty : ('a, 'a) pipeline (* corresponds to the empty pipeline *)

(*=== 1: basic functionality *)
(* pipeline seed + extend functionality: *)
let ( @> ) f pipeline = Step (f,pipeline);;
(* val ( @> ) : ('a -> 'b) -> ('b, 'c) pipeline -> ('a, 'c) pipeline = <fun> *)
let empty = Empty;;
(* val empty : ('a, 'a) pipeline = Empty *)

(* pipeline exec functionalitlly -- polymorphic recursion:*)
let rec exec : type a b. (a, b) pipeline -> a -> b =
 fun pipeline input ->
  match pipeline with
  | Empty -> input
  | Step (f, tail) -> exec tail (f input);;
(* val exec : ('a, 'b) pipeline -> 'a -> 'b = <fun> *)


(*==== 2: extra functionality example: *)
let exec_with_profile pipeline input =
  let rec loop
      : type a b.
        (a, b) pipeline -> a -> Time_ns.Span.t list -> b * Time_ns.Span.t list
    =
   fun pipeline input rev_profile ->
    match pipeline with
    | Empty -> input, rev_profile
    | Step (f, tail) ->
      let start = Time_ns.now () in
      let output = f input in
      let elapsed = Time_ns.diff (Time_ns.now ()) start in
      loop tail output (elapsed :: rev_profile)
  in
  let output, rev_profile = loop pipeline input [] in
  output, List.rev rev_profile;;

(*
val exec_with_profile : ('a, 'b) pipeline -> 'a -> 'b * Time_ns.Span.t list =
  <fun>
*)

When we extend functionality, the typing doesn’t need to change. It’s the logic in our function that uses GADTs that can change to update functionality.

Benefits of using GADT over having combinators that build a concrete computational machine:

  1. simpler core types that are typically built from GADT tags taht are reflections of the types of the base combinators
  2. more modular design, core types don’t need to contemplate on every possible use you wanna make of them
  3. more efficient code, because the more concrete approach would allocate closures to wrap up the necessary functionality and closures are more heavyweight than GADT tags.

Use Case: Narrowing the Possibilities (set of possible states) #

We can narrow the set of possible states for a given data-type in different circumstances. Useful for managing complex application state.

Here’s how logon requests may be managed in a non-GDAT approach:

type logon_request =
  { user_name : User_name.t
  ; user_id : User_id.t option
  ; permissions : Permissions.t option
  }


let authorized request =
  match request.user_id, request.permissions with
  | None, _ | _, None ->
    Error "Can't check authorization: data incomplete"
  | Some user_id, Some permissions ->
    Ok (Permissions.check permissions user_id);;
(* val authorized : logon_request -> (bool, string) result = <fun> *)

(* PROBLEMs:
   - this works alright for a limited set of functionality, but is hard to manage when there's other types of complications:
   1. needing to manage more fields (including optional fields)
   2. more operations depending on the optional fields
   3. needing to handle multiple parallel requests @ different states of completion
 *)

Tracking the (application) state at the type level

Instead, we can deal with complexity of cases by tracking the state of a given request at the type level and using that to narrow the set of states a given request can be in which helps us remove extra case-analysis and error-handling and reduce the complexity of the code as well as remove opportunities for mistakes.

At the type level, we could:

  1. mint different types for different states of the request (e.g. incomplete request, mandatory request…)

    this becomes verbose

  2. use GADTs to track the state of the request within a type parameter and use the parameter to narrow the set of available cases without duplicating the type.

A completion-sensitive option type #

We can have distinct types that we use as markers of different states. Using the completion states, we can do a completeness-sensitive option type via a GADT where:

  1. one type variable is the type of contents of the option
  2. the second indicates whether it is at an incomplete state
type incomplete = Incomplete
type complete = Complete

type (_, _) coption =
  | Absent : (_, incomplete) coption
  | Present : 'a -> ('a, _) coption

We didn’t use complete here explicitly, only have used incomplete. This is so that only an incomplete coption can be Absent and only a complete coption can be Present.

The examples will show the narrowing:

let get ~default o =
   match o with
   | Present x -> x
   | Absent -> default;;
(* val get : default:'a -> ('a, incomplete) coption -> 'a = <fun> *)
(*^ here, the incomplete type was inferred here (based on the type def of coption*)

(*
[1] this wont compile (if we force the annotation as complete instead of incomplete):
   (*
let get ~default (o : (_,complete) coption) =
  match o with
  | Absent -> default
  | Present x -> x;;

Line 3, characters 7-13:
Error: This pattern matches values of type ('a, incomplete) coption
       but a pattern was expected which matches values of type
         ('a, complete) coption
       Type incomplete is not compatible with type complete
   ,
   *)
 *)


(* [2] we can force it to compile by removing the absent branch from the function *)

let get (o : (_,complete) coption) =
  match o with
  | Present x -> x;;
(* val get : ('a, complete) coption -> 'a = <fun> *)
(* or, more simply written as: *)
let get (Present x : (_,complete) coption) = x;;
(* let get (Present x : (_,complete) coption) = x;; *)

when the coption is known to be complete, the pattern matching is narrowed to just the Present case.

By doing the type annotation on the GADT cases, we restrict what that function can be used for and that’s how we narrowed the type down.

A completion-sensitive request type #

Similar idea but instead of using option-wrapping we’re looking into request wrapping. The general usage is for error-aware code but using GADTs for various cases.

(* completion-sensitive version of logon_request record: *)
type 'c logon_request = (*NOTE: the 'c is the type param that marks completeness state. If the outer request is marked as complete, it covers the inner fields' completeness as well.*)
  { user_name : User_name.t
  ; user_id : (User_id.t, 'c) coption
  ; permissions : (Permissions.t, 'c) coption
  }

(* easy to fill in values for the fields (it's using the with-update syntax below:) *)
let set_user_id request x = { request with user_id = Present x };;
(* val set_user_id : 'a logon_request -> User_id.t -> 'a logon_request = <fun> *)
let set_permissions request x = { request with permissions = Present x };;
(* val set_permissions : 'a logon_request -> Permissions.t -> 'a logon_request = *)
(*   <fun> *)


(* === NOTE: we should have a explicit completeness check, which constructs a version of the record with the completed fields filled in *)

(* --- this is a polymorphic completeness check: includes both complete and incomplete return types *)
let check_completeness request =
  match request.user_id, request.permissions with
  | Absent, _ | _, Absent -> None
  | (Present _ as user_id), (Present _ as permissions) ->
    Some { request with user_id; permissions };;
(*
val check_completeness : incomplete logon_request -> 'a logon_request option =
  <fun>
  *)

(* --- this is a restricted return type (return value explicitly returns a complete request):*)
let check_completeness request : complete logon_request option =
  match request.user_id, request.permissions with
  | Absent, _ | _, Absent -> None
  | (Present _ as user_id), (Present _ as permissions) ->
    Some { request with user_id; permissions };;
(*
val check_completeness :
  incomplete logon_request -> complete logon_request option = <fun>
  *)

And because of that, we can finally write an authorisation checker:

let authorized (request : complete logon_request) =
  let { user_id = Present user_id; permissions = Present permissions; _ } = request in
  Permissions.check permissions user_id;;
(* val authorized : complete logon_request -> bool = <fun> *)

We realise that most of the time, this kind of type narrowing is NOT worth the complexity of setting it up unless our FSM is sufficiently complex – in which case cutting down the possibilities that our code needs to handle / contemplate can make a big difference to the comprehensibility and correctness of the result.

type distinctness and abstraction #

Our completeness types just needed them to have different states marked. We didn’t need to define obviously different types to them. This is because OCaml’s variant types are nominal.

Variant, Record and Abstract types are nominally typed in OCaml.

(* older version: *)
type incomplete = Incomplete (* Incomplete is a tag for the variant incomplete *)
type complete = Complete

(* this works *)
type incomplete = Z (* Z is a tag of a variant type *)
type complete = Z

(* here, Z is the constructor for the variant type *)
let i = (Z : incomplete) and c = (Z : complete);;

(* we can narrow a pattern match using types as indices: *)
type ('a, _) coption =
  | Absent : (_, incomplete) coption
  | Present : 'a -> ('a, _) coption

(* since we mark complete case only, we only need to contemplate the Present case *)
let assume_complete (coption : (_,complete) coption) =
  match coption with
  | Present x -> x;;
(* val assume_complete : ('a, complete) coption -> 'a = <fun> *)

In some ways this type-specific contemplating makes me remember the way we overload names (with different patterns within the parameters) to handle different cases in different functions.

Possible Type Distinctness problem:

the way we expose these types through an interface can cause OCaml to lose track of the distinctness of the types in question. When creating types to act as abstract markers for the type parameter of a GADT, you should choose definitions that make the distinctness of those types clear, and you should expose those definitions in your mli s.

Distinctness problems that give rise to non-exhaustiveness:

  1. In the first example below, we hide the implementation of the marker type so there’s a distinctness problem.

  2. second case, we see that the exposed types are NOT definitively different (they would have been if we defined them as variants with differently named tags [like originally])

    this is especially so because types appearing differently in an interface (the sig) may turn out to be the same in the implementation (see 3rd attempt)

(* ===== 1: the definition of complete and incomplete is hidden *)
module M : sig
  type incomplete (* -- this is what hides the definition*)
  type complete
end = struct
  type incomplete = Z
  type complete = Z
end
include M

type ('a, _) coption =
  | Absent : (_, incomplete) coption
  | Present : 'a -> ('a, _) coption

(*this is no longer exhaustive and it expectes teh Absent case to be matched*)
let assume_complete (coption : (_,complete) coption) =
  match coption with
  | Present x -> x;;
(*
Lines 2-3, characters 5-21:

Warning 8 [partial-match]: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
Absent
val assume_complete : ('a, complete) coption -> 'a = <fun>
 *)

(* =============== Attempt 2: exposing the implementation of the marker types ============ *)
module M : sig
  type incomplete = Z (* -- exposed the definition of the marker types (the variant tag) *)
  type complete = Z
end = struct
  type incomplete = Z
  type complete = Z
end
include M

type ('a, _) coption =
  | Absent : (_, incomplete) coption
  | Present : 'a -> ('a, _) coption

(* Still not exhaustive: *)
let assume_complete (coption : (_,complete) coption) =
  match coption with
  | Present x -> x;;
(*
Lines 2-3, characters 5-21:
Warning 8 [partial-match]: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
Absent
val assume_complete : ('a, complete) coption -> 'a = <fun>
*)


(*===== 3: types appearing as distinct in the interface (the sig) may have the same implementation: *)
module M : sig (*-- sig is the interface*)
  type incomplete = Z
  type complete = Z
end = struct
  type incomplete = Z
  type complete = incomplete = Z (* same implementation *)
end

narrowing without GADTs (using uninhabitable types instead) #

OCaml can eliminate impossible cases from ordinary variants too, not just for GADTs.

The idea is to (just like GADTs) demonstrate that the case in question is impossible at the type level. We need to use an uninhabited type (i.e. type with no associated values). This can be created using a variant with no tags. Base offers a Notion.t as a standard uninhabited type.

PL_DESIGN: This allows us to have refutation cases:

the period in the final case tells the compiler that we believe this case can never be reached, and OCaml will verify that it’s true. In some simple cases, however, the compiler can automatically add the refutation case for you, so you don’t need to write it out explicitly.

(* custom uninhabited type *)
type nothing = | (*nothing is a variant with no tags*)

open Stdio;;

(* here, the Error case contains an uninhabited type (Nothing.t)
 , so we can refute that case in the branches.
 *)
let print_result (x : (int, Nothing.t) Result.t) =
  match x with
  | Ok x -> printf "%d\n" x
  | Error _ -> .;;
(* val print_result : (int, Nothing.t) result -> unit = <fun *)

(* actually this function is pretty simple, the compiler can add the refutation case automatically for us *)
let print_result (x : (int, Nothing.t) Result.t) =
  match x with
  | Ok x -> printf "%d\n" x;;
(* val print_result : (int, Nothing.t) result -> unit = <fun> *)
  • Use-case for uninhabitable types

    When using a highly configurable library that supports multiple different modes of use and we don’t need to use all of them for a given application. (e.g. Async’s RPC lib)

    In this example, we see narrowing being applied to code that isn’t designed with narrowing in mind.

    Async RPCs have a State_rpc flavor of interaction, which is parameterised via 4 types for diff kinds of data:

    1. query: initial client request
    2. state: initial snapshot from server
    3. update: sequence of updates to that snapshot
    4. error: error for stream termination

    Imagine we want to use State_rpc without needing to terminate the stream with a custom error. We could consider 2 options:

    1. so we can instantiate State_rpc using the unit type for the error type. \(\implies\) we’d still need to handle the error case for the dispatching code

         open Core
         open Async
         let rpc =
           Rpc.State_rpc.create
             ~name:"int-map"
             ~version:1
             ~bin_query:[%bin_type_class: unit]
             ~bin_state:[%bin_type_class: int Map.M(String).t]
             ~bin_update:[%bin_type_class: int Map.M(String).t]
             ~bin_error:[%bin_type_class: unit] (*-- using unit type here for error*)
             ()
      
      
         let dispatch conn =
           match%bind Rpc.State_rpc.dispatch rpc conn () >>| ok_exn with
           | Ok (initial_state, updates, _) -> handle_state_changes initial_state updates
           | Error () -> failwith "this is not supposed to happen";;
         (* val dispatch : Rpc.Connection.t -> unit Deferred.t = <fun> *)
      
    2. using an uninhabited type for the error \(\implies\) narrowed down the cases and the use of the error type is just banned.

         let rpc =
           Rpc.State_rpc.create
             ~name:"foo"
             ~version:1
             ~bin_query:[%bin_type_class: unit]
             ~bin_state:[%bin_type_class: int Map.M(String).t]
             ~bin_update:[%bin_type_class: int Map.M(String).t]
             ~bin_error:[%bin_type_class: Nothing.t]  (*--- using Nothing.t (uninhabited type) *)
             ()
      
      
         let dispatch conn =
           match%bind Rpc.State_rpc.dispatch rpc conn () >>| ok_exn with
           | Ok (initial_state, updates, _) -> handle_state_changes initial_state updates;;
         (* val dispatch : Rpc.Connection.t -> unit Deferred.t = <fun> *)
      

Limitations of GADTs #

We saw the utility and the complexity. Now it’s time to see some limitations and how to navigate around them.

Limitation: Or-patterns #

GADTs don’t work well with or-patterns.

Or-patterns won’t work if we merge cases together if we end up using the type information that is discovered during the pattern match (the concretisation of the locally abstract type).

This is annoying but not a deal-breaker because we can just extract out the logic into a function and call the same function for the different arms that we wanted to merge via or-patterns anyway.

open Core
module Source_kind = struct
  type _ t =
    | Filename : string t
    | Host_and_port : Host_and_port.t t
    | Raw_data : string t
end

let source_to_sexp (type a) (kind : a Source_kind.t) (source : a) =
  match kind with
  | Filename -> String.sexp_of_t source
  | Host_and_port -> Host_and_port.sexp_of_t source
  | Raw_data -> String.sexp_of_t source;;
(*-- NOTE: type a here is a locally abstract type*)

(* val source_to_sexp : 'a Source_kind.t -> 'a -> Sexp.t = <fun> *)


(* -- this won't work because it relies on the concretisation of 'a (even if we as humans know that a will only ever be string based on the Source_kind) *)
let source_to_sexp (type a) (kind : a Source_kind.t) (source : a) =
  match kind with
  | Filename | Raw_data -> String.sexp_of_t source
  | Host_and_port -> Host_and_port.sexp_of_t source;;
(*
Line 3, characters 47-53:
Error: This expression has type a but an expression was expected of type
         string
*)

(* Exceptional case: when we don't use the concretised form of 'a then we can still use or-patterns. *)
let requires_io (type a) (kind : a Source_kind.t) =
  match kind with
  | Filename | Host_and_port -> true
  | Raw_data -> false;;
(* val requires_io : 'a Source_kind.t -> bool = <fun> *)

TODO Deriving Serializers #

*NOTE: I skipped this, my brain got fried. 🧠

Serialisers like [@@derive sexp] won’t work with GADTs.

LIMITATIONS: There’s a dependency between the value of an argument and the type of the returned value, which is something that OCaml’s type system can’t express. Therefore, in the code below, it’s not possible for the

type _ number_kind =
  | Int : int number_kind
  | Float : float number_kind
[@@deriving sexp];;

(*
Lines 1-4, characters 1-20:
Error: This expression has type int number_kind
       but an expression was expected of type a__007_ number_kind
       Type int is not compatible with type a__007_
*)