Chapter 6: Variants

Table of Contents

Basics:
- Variants, Tuples and Parens
Catch-All Cases & Refactoring
Combining Records and Variants
- Embedded Records
Variants & Recursive Data Structures
Polymorphic Variants
Polymorphic variants and Catch-all Cases
- Use case of using polymorphic variants: Terminal Colors Redux
- Packaging the code up
When to Use Polymorphic Variants

Variants are sum types of multiple forms of data, all of them are tagged explicitly. The value this gives is that we can represent complex data and organise the case-analysis of that information.

Basics: #

Each case within the variant has an associated tag and they are constructors because they can construct the concrete value. Optionally, it may have a sequence of fields.

The following example uses variants as simple tags with no associated data (optional sequence of fields omitted), effectively as Enums, a simple representation:

  open Base
  open Stdio
  type basic_color =
    | Black | Red | Green | Yellow | Blue | Magenta | Cyan | White;;

  let basic_color_to_int = function
    | Black -> 0 | Red     -> 1 | Green -> 2 | Yellow -> 3
    | Blue  -> 4 | Magenta -> 5 | Cyan  -> 6 | White  -> 7;;

  let color_by_number number text =
  Printf.sprintf "\027[38;5;%dm%s\027[0m" number text;;

  let blue = color_by_number (basic_color_to_int Blue) "Blue";;

  printf "Hello %s World!\n" blue;; (* the output should be "Hello Blue World" where "Blue" is in the colour blue.*)

For more expressiveness, tags can have arguments which describe the data available in each case. Variants may have multiple arguments:

  (* bg context: we're trying to represent colours within the terminal. *)
  type weight = Regular | Bold
  type color =
    | Basic of basic_color * weight (* basic colors, regular and bold *)
    | RGB   of int * int * int      (* 6x6x6 color cube *)
    | Gray  of int                  (* 24 grayscale levels *)
  ;;

  (* here's a list of colors we can create: *)
  [RGB (250,70,70); Basic (Green, Regular)];;

  (* now we can have a to_int that handles all the cases *)
  let color_to_int = function
    | Basic (basic_color,weight) ->
       let base =
         match weight with
           Bold -> 8
         | Regular -> 0 in
       base + basic_color_to_int basic_color
    | RGB (r,g,b) -> 16 + b + g * 6 + r * 36
    | Gray i -> 232 + i;;

  let color_print color s =
  printf "%s\n" (color_by_number (color_to_int color) s);;

  color_print (Basic (Red,Bold)) "A bold red!";;
  color_print (Gray 4) "A muted gray...";;

Variants, Tuples and Parens #

The differences between a multi-argument variant and a variant containing a tuple are mostly about performance. A multi-argument variant is a single allocated block in memory, while a variant containing a tuple requires an extra heap-allocated block for the tuple. KIV memory representation of values in Chapter 23.

The multi-argument variant-case constructor and tuple constructor looks very similar BUT they are not the same. The multi-arg variant constructor expects multiple arguments, if we supply a 3-valued tuple to it, it would be seen as supplying a singular argument.

The parens is what makes the difference and it’s subtle.

(* demonstrating the difference: *)
RGB(200, 0, 200);; (* these are 3 separate arguments that we are passing! *)
let purple = (200,0,200);;
(* this will error out because the tuple is passed as a single arg: *)


(* V1: tupled constructor: *)
type tupled = Tupled of (int * int);;
let of_tuple x = Tupled x;;
let to_tuple (Tupled x) = x;;

(* V2: untupled constructor *)
type untupled = Untupled of int * int;;
let of_tuple' x = Untupled x;;
let to_tuple' (Untupled x) = x;;

(* V3: similar to V1, because we've destructured it (though the type is differently tagged) *)
let of_tuple'' (x,y) = Untupled (x,y);;
let to_tuple'' (Untupled (x,y)) = (x,y);;

Catch-All Cases & Refactoring #

OCaml’s type system helps us make it easier to trace dependencies when refactoring, so we can just trace based on what the compiler says.

Our objective should be to write our code in a way that maximizes the compiler’s chances of helping you find the bugs. To this end, a useful RULE OF THUMB is to avoid catch-all cases in pattern matches.

Problem: catch-alls suppress exhaustiveness checks: catch-alls interfere with the compiler’s ability to do exhaustion checks

(* suppose we wronte a function like so: *)
let oldschool_color_to_int = function
  | Basic (basic_color,weight) ->
    let base = match weight with Bold -> 8 | Regular -> 0 in
    base + basic_color_to_int basic_color
  | _ -> basic_color_to_int White;; (*uses catch-all case*)

(* suppose we change our color variant type *)
type color =
  | Basic of basic_color     (* basic colors *)
  | Bold  of basic_color     (* bold basic colors *) (*<-- this part is new*)
  | RGB   of int * int * int (* 6x6x6 color cube *)
  | Gray  of int             (* 24 grayscale levels *)

(* Then, because of the catch-all usage, we won't know about the missing case for Bold *)

Combining Records and Variants #

Algebraic Data Types are more of a category:

product types – mathematically similar to Cartesian Products

tuples

records: can be thought of as conjunctions (ANDs of multiple types within the record)

    module Time_ns = Core.Time_ns
    module Log_entry = struct
      type t =
        { session_id: string;
          time: Time_ns.t;
          important: bool;
          message: string;
        }
    end
    module Heartbeat = struct
      type t =
        { session_id: string;
          time: Time_ns.t;
          status_message: string;
        }
    end
    module Logon = struct
      type t =
        { session_id: string;
          time: Time_ns.t;
          user: string;
          credentials: string;
        }
    end
    ;;

sum types – similar to disjoint unions

variants: it’s a OR of multiple types

    type client_message = | Logon of Logon.t
                          | Heartbeat of Heartbeat.t
                          | Log_entry of Log_entry.t

    (* consider this long-winded code (the session id extraction has repeated lines) *)
    let messages_for_user user messages =
      let (user_messages,_) =
        List.fold messages ~init:([], Set.empty (module String))
          ~f:(fun ((messages,user_sessions) as acc) message ->
            match message with
            | Logon m ->
              if String.(m.user = user) then
                (message::messages, Set.add user_sessions m.session_id)
              else acc
            | Heartbeat _ | Log_entry _ ->
              let session_id = match message with
                | Logon     m -> m.session_id
                | Heartbeat m -> m.session_id
                | Log_entry m -> m.session_id
              in
              if Set.mem user_sessions session_id then
                (message::messages,user_sessions)
              else acc
          )
      in
      List.rev user_messages;;
    (* we can improve this by defining the types better *)

So, the power comes from combining both these types – layered combinations. Here’s how we can refactor the message types:

(* the distinct message types should hold fields that are unique to them *)
module Log_entry = struct
  type t = { important: bool;
             message: string;
           }
end
module Heartbeat = struct
  type t = { status_message: string; }
end
module Logon = struct
  type t = { user: string;
             credentials: string;
           }
end

(* we can store a record that contains the fields that are common across all messages *)
module Common = struct
  type t = { session_id: string;
             time: Time_ns.t;
           }
end

(* which allows us to define a common variant type for details *)
type details =
  | Logon of Logon.t
  | Heartbeat of Heartbeat.t
  | Log_entry of Log_entry.t

(* Now, the same function is simpler because we don't need to handle the specific ways of extracting the session id: *)
let messages_for_user user (messages : (Common.t * details) list) =
  let (user_messages,_) =
    List.fold messages ~init:([],Set.empty (module String))
      ~f:(fun ((messages,user_sessions) as acc) ((common,details) as message) ->
        match details with
        | Logon m ->
          if String.(=) m.user user then
            (message::messages, Set.add user_sessions common.session_id)
          else acc
        | Heartbeat _ | Log_entry _ ->
          if Set.mem user_sessions common.session_id then
            (message::messages, user_sessions)
          else acc
      )
  in
  List.rev user_messages;;

(* this design is also useful for us to match on the message type directly *)
let handle_message server_state ((common:Common.t), details) =
  match details with
  | Log_entry m -> handle_log_entry server_state (common,m)
  | Logon     m -> handle_logon     server_state (common,m)
  | Heartbeat m -> handle_heartbeat server_state (common,m);;

Embedded Records #

We could have directly put in the record types within the variant definition. These are inline records, help us be more concise and more efficient than free-standing record types being referenced – they don’t require a separate allocation object for the contents of the variant.

type details =
  | Logon     of { user: string; credentials: string; }
  | Heartbeat of { status_message: string; }
  | Log_entry of { important: bool; message: string; }

(* inlining didn't change the existing function, we're matching on the tag names within *)
let messages_for_user user (messages : (Common.t * details) list) =
  let (user_messages,_) =
    List.fold messages ~init:([],Set.empty (module String))
      ~f:(fun ((messages,user_sessions) as acc) ((common,details) as message) ->
        match details with
        | Logon m ->
          if String.(=) m.user user then
            (message::messages, Set.add user_sessions common.session_id)
          else acc
        | Heartbeat _ | Log_entry _ ->
          if Set.mem user_sessions common.session_id then
            (message::messages, user_sessions)
          else acc
      )
  in
  List.rev user_messages;;

(* We won't be able to access the tag directly because they are no longer free-standing record types *)

let get_logon_contents = function
  | Logon m -> Some m
  | _ -> None;;

Variants & Recursive Data Structures #

Variants commonly used for recursive datastructures!

Following example helps illustrate how to define a boolean expression language, it’s similar in design to the Blang (boolean language).

type 'a expr =
  | Base  of 'a (* this is the base predicate type; we use the polymorphic type 'a*)
  | Const of bool
  | And   of 'a expr list
  | Or    of 'a expr list
  | Not   of 'a expr

The tags here are easy to understand, the Base tag is the special one. It describes what can be tested, so it’s the base predicate. That’s why it’s described as the predicate that “ties the expr to your application”.

Let’s use the context of a mail processor, where we might wish to DEFINE filter expressions and then EVALUATE them. We might also wish to SIMPLIFY boolean expressions.

(* consider mail processor use-case *)
type mail_field = To | From | CC | Date | Subject
type mail_predicate = { field: mail_field;
                        contains: string } (* this is what we want our base predicate to be*)

(* DEFINE mail_predicate expr: *)
let test field contains = Base { field; contains };; (* here, mail_predicate is the base predicate, so test, see the type of this value of test:*)
(* val test : mail_field -> string -> mail_predicate expr = <fun>*)

And [
        Or [ test To "doligez"; test CC "doligez" ];
        test Subject "runtime";
    ];;

(* EVALUATE mail_predicate expr:
 , this example just needs to be supplied a base_eval function, the rest of it is straightforward.
 *)
let rec eval expr base_eval =
  (* a shortcut, so we don't need to repeatedly pass [base_eval]
     explicitly to [eval]
     - this is a local helper, it aliases partially applied eval to the same base_eval
     *)
  let eval' expr = eval expr base_eval in
  match expr with
  | Base  base  -> base_eval base
  | Const bool  -> bool
  | And   exprs -> List.for_all exprs ~f:eval'
  | Or    exprs -> List.exists  exprs ~f:eval'
  | Not   expr  -> not (eval' expr);;

Carrying onto SIMPLIFICATION routine: We need to define how the different expressions may be simplified, so we need to rely on some of simplification functions that can be used for our simplification routine.

let and_ l =
  if List.exists l ~f:(function Const false -> true | _ -> false)
  then Const false (* reduces to Const false if any of the arms of And are false*)
  else
    match List.filter l ~f:(function Const true -> false | _ -> true) with
    | [] -> Const true (* Drops all the constant true*)
    | [ x ] -> x (* Drop the And if it only has one arm*)
    | l -> And l (* No arms ==> reduce to Const true*)

let or_ l =
  if List.exists l ~f:(function Const true -> true | _ -> false) then Const true
  else
    match List.filter l ~f:(function Const false -> false | _ -> true) with
    | [] -> Const false
    | [x] -> x
    | l -> Or l

(* for the not_, we shouldn't use catch-alls*)
let not_ = function
  | Const b -> Const (not b)
  | Not e -> e (* this is the double negation case, which simplies it*)
  | (Base _ | And _ | Or _) as e -> Not e;;
  (* | e -> Not e (\* Don't use catch-alls*\) *)

let rec simplify = function
  | Base _ | Const _ as x -> x
  | And l -> and_ (List.map ~f:simplify l)
  | Or l  -> or_  (List.map ~f:simplify l)
  | Not e -> not_ (simplify e);;

How we might use it:

simplify (Not (And [ Or [Base "it's snowing"; Const true];
Base "it's raining"]));;

Polymorphic Variants #

Polymorphic variants are more flexible and syntactically more lightweight than ordinary variants but they have a distinct cost to using them.

Interestingly, in OOP subtyping terminology we see this variance showing parallels to be like so:

Notation	Meaning	Subtyping Direction	Analogy
`[> ...]`	“These tags or more”	Supertype (covariant)	“open upwards”
`[< ...]`	“These tags or less”	Subtype (contravariant)	“open downwards”
(no marker)	“exactly these tags”	Invariant	“closed fixed”

KIV this until we come to the “Objects” part. I think it’s safe to treat this parallel to subtyping as just a mental model for now.

Here’s a covariant example:

let three = `Int 3;; (*`Int is the tag, it's type will get inferred*)
let four = `Float 4;;
let nan = `Not_a_number;;

(* we can make create a list of this, which now represents a set of tags *)
[three; four; nan];;
(* the type for this is a covariant:
  - : [> `Float of float | `Int of int | `Not_a_number ] list = *)

(* we can still create a new type with the same tag *)
(* let five = `Int "five";; *)
(* but we may not use it with clashing interpretations of their type inference *)
(* [three;four;five] (*this fails*)  *)
(* let foo = [four; nan; five];; (\*this won't fail*\) *)

Here’s a contravariant example:

let is_positive = function
  | `Int   x -> x > 0
  | `Float x -> Float.(x > 0.);;
(* this is a contravariant collection: so "these tags or less"
   this checks out because is_positive doesn't know how to deal with any other cases -- it can handle types with one or more of the cases identified here.

  val is_positive : [< `Float of float | `Int of int ] -> bool = <fun> *)

Here’s a invariant example:

(* Consistent types for polymorphic variant tags *)
let three = `Int 3;;
let four = `Float 4.;;
let nan = `Not_a_number;;

(* This list combines tags with consistent payloads *)
let valid_list = [three; four; nan];;

(* This will fail to compile due to incompatible payloads *)
(* let invalid_list = [three; `Int "five"];; *)

(* Proper function to handle known tags only *)
let is_positive = function
  | `Int x -> x > 0
  | `Float x -> x > 0.
  | `Not_a_number -> false
;;

(* Usage with explicit type annotation for disambiguation *)
let exact = List.filter ~f:is_positive [three; four];;
(* this is the resultant type:
   val exact : [ `Float of float | `Int of int ] list = [`Int 3; `Float 4.] *)

Interestingly we can do both upper and lower bounding of the variant types:

let is_positive = function
  | `Int   x -> Ok (x > 0)
  | `Float x -> Ok Float.O.(x > 0.)
  | `Not_a_number -> Error "not a number";;

(* gives the following resultant val:
   val is_positive :
   [< `Float of float | `Int of int | `Not_a_number ] -> (bool, string) result = <fun> *)

Using polymorphic variants can get confusing because of the type that is actually inferred.

Polymorphic variants and Catch-all Cases #

In ordinary variants, catch-all cases are already error-prone (because of the loss of exhaustiveness-checks) – in polymorphic types, catch-alls are even worse.

So _ catch all will lowerbound the type. This becomes a problem if we have typos:

let is_positive_permissive = function
  | `Int   x -> Ok Int.(x > 0)
  | `Float x -> Ok Float.(x > 0.)
  | _ -> Error "Unknown number type";;


is_positive_permissive (`Ratio (3,4));;

with the typo (Float -> Floot), we realise that it will fall to that catch-all instead of the tag being caught (as an Error) as an unknown tag (as would have been the case of ordinary variants).

is_positive_permissive (`Floot 3.5);;

RULE OF THUMB: As a general matter, one should be wary about mixing catch-all cases and polymorphic variants.

Use case of using polymorphic variants: Terminal Colors Redux #

There’s still practical value to using polymorphic types. Suppose we want to have an alpha-channel extension to colors, if we define a new variant type with a new arm for RGBA, it’s not good enough because the compiler would see the two variants (old and extended) as two different, incompatible variant types.

What we need is to share tags between two different variant types and this is what polymorphic variants are useful for.

let basic_color_to_int = function
  | `Black -> 0 | `Red     -> 1 | `Green -> 2 | `Yellow -> 3
  | `Blue  -> 4 | `Magenta -> 5 | `Cyan  -> 6 | `White  -> 7;;

let color_to_int = function
  | `Basic (basic_color,weight) ->
    let base = match weight with `Bold -> 8 | `Regular -> 0 in
    base + basic_color_to_int basic_color
  | `RGB (r,g,b) -> 16 + b + g * 6 + r * 36
  | `Gray i -> 232 + i;;

let extended_color_to_int = function
  | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
  | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color;;

NOTE: extend_color_to_int needs to invoke color_to_int with a narrower type, so it has to be contravariant. The use of explicit or-cases within the pattern match helps here. To be elaborate, the narrowing happens because `RGBA never will be passed to color_to_int. However, if we were to do a catch-all branch instead, then this narrowing doesn’t happen and so compilation would fail.

Packaging the code up #

The interface:

open Base

type basic_color =
  [ `Black   | `Blue | `Cyan  | `Green
  | `Magenta | `Red  | `White | `Yellow ]

type color =
  [ `Basic of basic_color * [ `Bold | `Regular ]
  | `Gray of int
  | `RGB  of int * int * int ]

type extended_color =
  [ color
  | `RGBA of int * int * int * int ]

val color_to_int          : color -> int
val extended_color_to_int : extended_color -> int

the implementation :

open Base

type basic_color =
  [ `Black   | `Blue | `Cyan  | `Green
  | `Magenta | `Red  | `White | `Yellow ]

type color =
  [ `Basic of basic_color * [ `Bold | `Regular ]
  | `Gray of int
  | `RGB  of int * int * int ]

type extended_color =
  [ color
  | `RGBA of int * int * int * int ]

let basic_color_to_int = function
  | `Black -> 0 | `Red     -> 1 | `Green -> 2 | `Yellow -> 3
  | `Blue  -> 4 | `Magenta -> 5 | `Cyan  -> 6 | `White  -> 7

let color_to_int = function
  | `Basic (basic_color,weight) ->
    let base = match weight with `Bold -> 8 | `Regular -> 0 in
    base + basic_color_to_int basic_color
  | `RGB (r,g,b) -> 16 + b + g * 6 + r * 36
  | `Gray i -> 232 + i

let extended_color_to_int = function
  | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
  | `Grey x -> 2000 + x (* <<<<<<<< didactic error <<<<*)
  | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color

Some notes about this:

the .ml file has exact variants only, this is what allows the pattern matching to be possible

Consider the case where we make a typo for some reason and add in a new polymorphic type as case as per the error seen above

why won’t it error out?
All that happened was that the compiler inferred a wider type for extended_color_to_int, which happens to be compatible with the narrower type that was listed in the mli. As a result, this library builds without error.

how can we prevent this?

Adding an explicit type annotation to the code itself will help because the compiler will know enough.

        (* this will NOT error out: *)
        let extended_color_to_int = function
          | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
          | `Grey x -> 2000 + x
          | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color

        (* this will error out [which is what we want]: *)
        let extended_color_to_int : extended_color -> int = function
          | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
          | `Grey x -> 2000 + x
          | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color

given that we have the typedefs, we can use the name of the type explicitly for the type narrowing by prefixing with #

It is useful when you want to narrow down to a type whose definition is long, and you don’t want the verbosity of writing the tags down explicitly in the match.

   (* terse version *)
   let extended_color_to_int : extended_color -> int = function
     | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
     | #color as color -> color_to_int color

   (* this works too *)
   let extended_color_to_int : extended_color -> int = function
     | `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
     | (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color

When to Use Polymorphic Variants #

RULE OF THUMB: in general, regular variants are more pragmatic
Safe use case:
- Probably the safest and most common use case for polymorphic variants is where ordinary variants would be sufficient but are syntactically too heavyweight. For example, you often want to create a variant type for encoding the inputs or outputs to a function, where it’s not worth declaring a separate type for it. Polymorphic variants are very useful here, and as long as there are type annotations that constrain these to have explicit, exact types, this tends to work well.
Costs of using polymorphic variants:
1. complexity
  confusing type inference may make it harder to read the code / debug this. Error messages may get really long as well.
  Indeed, concision at the value level is often balanced out by more verbosity at the type level.
2. error-finding
  type-safe but typing discipline not too good.
3. efficiency
  it’s a little less efficient
  polymorphic variants are somewhat heavier than regular variants, and OCaml can’t generate code for matching on polymorphic variants that is quite as efficient as what it generated for regular variants.