Skip to main content
  1. Readings/
  2. Books/
  3. Real World OCaml: Functional Programming for the Masses/

Chapter 3: Lists and Patterns

··2248 words·11 mins

Lists and Patterns

Basic notes #

Just some points to revisit the tour that we already had

  1. cons operator :: is right-associative. The attachment of things is consistent with the knowledge that OCaml lists are singly-linked lists.

    this is why the extension doesn’t modify the existing list, it’s more of that the new element is “prepended” and has a refernce to the rest of the list.

  2. the empty list [] is polymorphic even though the overall list is NOT.

  3. conventional naming:

    • follow the naming of hd and tl for head and tail of list.

Patterns to extract data from list #

match expressions can let-bind new names #

Match expressions can be used to bind new variables. This may lead to problems where we think that we’re reusing a variable within the arm of a match case but it’s actually going to shadow that variable instead.

LIMITATION: In the elixir world, this is why we have the use of the pin operator but in OCaml, there’s no similar construct

(* example of how match expressions are used to bind new variables *)
let rec sum l =
  match l with
  | [] -> 0
  | hd :: tl -> hd + sum tl;;

(* GOTCHA: here, it's not a reference but a new declaration, hence the older variable gets shadowed:  *)
let rec drop_value l to_drop =
  match l with
  | [] -> []
  | to_drop :: tl -> drop_value tl to_drop (* this is not a referene, it's a re-declaration of to_drop *)
  | hd :: tl -> hd :: drop_value tl to_drop;; (* this case is the same as the 2nd case above.*)

the fix to this is to use guards or if statements, which would give the same function signature:

let rec drop_value l to_drop =
  match l with
  | [] -> []
  | hd :: tl when hd = to_drop -> drop_value tl to_drop
  | hd :: tl -> hd :: drop_value tl to_drop
;;

let rec drop_value l to_drop =
  match l with
  | [] -> []
  | hd :: tl ->
    let new_tl = drop_value tl to_drop in
    if hd = to_drop then new_tl else hd :: new_tl;;

Limitations and Blessings of Pattern Matching #

Pattern matching is structural \(\implies\) we shouldn’t try to encode arbitrary conditions within the pattern. (That’s what the when conditions are for.)

The conditions we use pattern matching for should only be structurally relevant.

blessing: Performance #

  • match cases are entries in a jump table, so they are efficient.

    a good comparison is between the performance for match cases and an equivalent if else statement:

    WITNESS: here’s how we can do a quick benchmark using core_bench module

    
      (* this gives a jump table. *)
      let plus_one_match x =
        match x with
        | 0 -> 1
        | 1 -> 2
        | 2 -> 3
        | 3 -> 4
        | 4 -> 5
        | 5 -> 6
        | _ -> x + 1;;
    
      (* this is sequential, gives a conditional branch tree to navigate *)
      let plus_one_if x =
        if      x = 0 then 1
        else if x = 1 then 2
        else if x = 2 then 3
        else if x = 3 then 4
        else if x = 4 then 5
        else if x = 5 then 6
        else x + 1;;
    
    
    
      #require "core_bench";;
      open Core_bench;;
    
      [ Bench.Test.create ~name:"plus_one_match" (fun () ->
            plus_one_match 10)
      ; Bench.Test.create ~name:"plus_one_if" (fun () ->
            plus_one_if 10) ]
      |> Bench.bench;;
    
    Estimated testing time 20s (2 benchmarks x 10s). Change using '-quota'.
    ┌────────────────┬──────────┬────────────┐
    │ Name           │ Time/Run │ Percentage │
    ├────────────────┼──────────┼────────────┤
    │ plus_one_match │  35.45ns │     74.59% │
    │ plus_one_if    │  47.53ns │    100.00% │
    └────────────────┴──────────┴────────────┘
    

blessing: Error Detection #

Some examples of what we’ve already seen

  • the exhaustiveness of match expression-cases is checked by the compiler
  • redundancies of cases are also checked

These work by patterned-context, not by some algo that checks the predicates (doesn’t exist).

What we will see soon:

  • the exhaustiveness checks in the context of variants and user defined types
  • the usefulness of using match statement warnings when we want to chase a chain of refactor-effects – the ball of yarn will just show itself!!

List module #

Similar to elixir, the default modules offer good APIs for us to use.

Something to note in this section is that we will see a bunch of composed funtions such as List.concat_map which composes map and concat. Like in our understanding of functions (which is right-associative) any composed function op x_y means apply y then x

Here’s an example set of code where we:

  1. use List.map and List.map2_exn

    • the _exn is the OCaml way of doing throwable function calls, like in Elixir where we have strict operations appended with ! to the function name.
  2. use String functions

    1. String.concat which are better for large strings. Using the ^ operator will generate multiple sub-strings first – becomes a problem for largers strings.
    2. String.make
  3. use List.fold which is just a more general version of reduce function in OCaml syntax

    List.fold walks over the list from left to right, updating the accumulator at each step and returning the final value of the accumulator when it’s done.

       (* example usage for summing a list *)
       List.fold ~init:0 ~f:(+) [1;2;3;4];;
    

    Note:

    1. accumulator need NOT be the same time as the elements of the list
            List.fold ~init:[] ~f:(fun acc hd -> hd :: acc) [1;2;3;4];;
      

The example is one where we try to render a table of values (i.e. a 2D array). So our table rendering function’s code would look something like:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
(* 1: find out the max widths for the headers based on the values that are in the columns *)
let max_widths header rows =
  let lengths l = List.map ~f:String.length l in
  List.fold rows
    ~init:(lengths header)
    ~f:(fun acc row ->
        List.map2_exn ~f:Int.max acc (lengths row));; (* The _exn is there because the function throws an exception if the lists are of mismatched length:*)

(* 2: function that renders the header-values separator *)
let render_separator widths =
  let pieces = List.map widths
      ~f:(fun w -> String.make w '-')
  in
  "|-" ^ String.concat ~sep:"-+-" pieces ^ "-|";;

(* 3. to pad a string to a specified length *)
let pad s length =
  s ^ String.make (length - String.length s) ' ';;

(* 4. now we can render a row of data: *)
let render_row row widths =
  let padded = List.map2_exn row widths ~f:pad in
  "| " ^ String.concat ~sep:" | " padded ^ " |";;

(* 5. Combine altogether to render the table: *)
let render_table header rows =
  let widths = max_widths header rows in
  String.concat ~sep:"\n"
    (render_row header widths
     :: render_separator widths
     :: List.map rows ~f:(fun row -> render_row row widths)
    );;

(* applying it for the example *)
Stdio.print_endline
  (render_table
     ["language";"architect";"first release"]
     [ ["Lisp" ;"John McCarthy" ;"1958"] ;
       ["C"    ;"Dennis Ritchie";"1969"] ;
       ["ML"   ;"Robin Milner"  ;"1973"] ;
       ["OCaml";"Xavier Leroy"  ;"1996"] ;
]);;

(* Here's what the output looks like:

| language | architect      | first release |
|----------+----------------+---------------|
| Lisp     | John McCarthy  | 1958          |
| C        | Dennis Ritchie | 1969          |
| ML       | Robin Milner   | 1973          |
| OCaml    | Xavier Leroy   | 1996          |
- : unit = ()
 *)
| language | architect      | first release |
|----------+----------------+---------------|
| Lisp     | John McCarthy  | 1958          |
| C        | Dennis Ritchie | 1969          |
| ML       | Robin Milner   | 1973          |
| OCaml    | Xavier Leroy   | 1996          |

Combining Elements with List.reduce #

It’s a more specific case, of fold, without the init param supplied.

Result is wrapped in an optional.

Filtering with List.filter, List.filter_map #

  • the case for filtering then mapping is common enough for List.filter_map
      (* list of file extensions from a list of files, returning deduped and in sorted order. *)
      let extensions filenames =
        List.filter_map filenames ~f:(fun fname ->
            match String.rsplit2 ~on:'.' fname with
            | None  | Some ("",_) -> None
            | Some (_,ext) ->
              Some ext)
        |> List.dedup_and_sort ~compare:String.compare;;
    
      extensions ["foo.c"; "foo.ml"; "bar.ml"; "bar.mli"];
    

Partitioning with List.partition_tf #

The _tf is a helpful mnemonic that reminds us that the output is two lists: first list is with the elements where the predicate returns true and the second list is the one for falses.

Multiple ways to combine lists #

We can append (which can be done via List.append or operator @) or we can concat via List.concat (something more like flattening a list of lists).

Since concat and map functions are often used together (same idea as flatmaps), there’s a dedicated List.concat_map for us to use

#require "core_unix.sys_unix";;
module Sys = Core.Sys
module Filename = Core.Filename;;

(* example without concat_map *)
let rec ls_rec s =
  if Sys_unix.is_file_exn ~follow_symlinks:true s
  then [s]
  else
    Sys_unix.ls_dir s
    |> List.map ~f:(fun sub -> ls_rec (Filename.concat s sub))
    |> List.concat;; (*flattens*)

(* example with concat_map *)
let rec ls_rec s =
  if Sys_unix.is_file_exn ~follow_symlinks:true s
  then [s]
  else
    Sys_unix.ls_dir s
    |> List.concat_map ~f:(fun sub -> ls_rec (Filename.concat s sub));;
val ls_rec : string -> string list = <fun>

Tail-call Optimisation: Writing tail recursive functions #

Classic stack machine used so there’s a limit to the stack frames, but it’s easy to write out tail recursive functions.

(* Not tail-recursive: recursive call wrapped inside addition *)
let rec length = function
  | [] -> 0
  | _ :: tl -> 1 + length tl  (* not tail position *)
;;

(* Tail-recursive helper with accumulator `n` *)
let rec length_plus_n l n =
  match l with
  | [] -> n
  | _ :: tl -> length_plus_n tl (n + 1)  (* tail position *)

(* Wrapper function to hide accumulator *)
let length l = length_plus_n l 0

Tail-call optimisation is not new to me but here’s another short rundown:

  • in length_plus_n, the recursive call for that function is a tail call \(\implies\) no need to allocate new stack frame

    • tail-call when the caller of the function doesn’t have to do anything withh the returned value (doesn’t use it) that is returned by the callee, other than to return it. This is why there’s no need to preserve that stack frame (of the caller) \(\implies\) compiler can make this tail call optimisation.
  • idioms (to transform non-TCO to TCO style):

    1. tail calls = recursive call is the “final action”, no other important logic happens after

    2. accumulator and helper usage:

      it’s just a clean way of doing things so that we can isolate out non-tail calls from tail-calls within naive recursive functions \(\implies\) naturally we end up with tail-called helpers

  • if all calls are tail calls, then we’ve done tail-call optimisation correctly.

Terser / Faster Patterns #

using the as pattern to alias the matched pattern #

Typically if we add in name, a new object for the matching type is created. We can save space by using as to get handle to the matched value.

The as pattern lets you bind a name to the whole matched value in addition to deconstructing it.

let rec remove_sequential_duplicates list =
  match list with
  | [] | [_] as l -> l  (* or-pattern to collapse the handling *)
  | first :: (second :: _ as tl) -> (* see the matched object _ is what we have a direct handle to (instead of creating a copy.)  *)
    if first = second then
      remove_sequential_duplicates tl
    else
      first :: remove_sequential_duplicates tl;;

(* SLOW, BAD EXAMPLE *)
let rec remove_sequential_duplicates list =
  match list with
  | [] -> []
  | [x] -> [x]
  | first :: second :: tl ->
    if first = second then
      remove_sequential_duplicates (second :: tl)
    else
      first :: remove_sequential_duplicates (second :: tl);;

using when for precondition checks #

continuing with the example, the similarity check was being done via an if-else. We could shift it to a when guard.

let rec remove_sequential_duplicates list =
  match list with
  | [] | [_] as l -> l
  | first :: (second :: _ as tl) when first = second ->
    remove_sequential_duplicates tl
  | first :: tl -> first :: remove_sequential_duplicates tl;;

polymorphic comparison #

This is going to be so magical.

The idea here is that the Base’s default equality op is integers-only. If we use a polymorphic version, then we will be able to use that function we had for any type.

That’s what Base.Poly helps us do.

open Base.Poly

let rec remove_sequential_duplicates list =
  match list with
  | [] | [_] as l -> l
  | first :: (second :: _ as tl) when first = second ->
    remove_sequential_duplicates tl
  | first :: tl -> first :: remove_sequential_duplicates tl;;

remove_sequential_duplicates ["one";"two";"two";"two";"three"];;

LIMITATION: you can’t build these functions (polymorphic comparators) on your own. OCaml’s polymorphic comparison functions are built into the runtime to a low level. These comparisons are polymorphic on the basis of ignoring almost everything about the types of the values that are being compared, paying attention only to the structure of the values as they’re laid out in memory.

WITNESS: typically OCaml devs don’t like polymorphic comparison because it can introduce silent bugs that will be hard to catch. Restrictions are a good thing because they give predictability to our code.

NOTE: when guards may harm the compiler’s exhaustiveness check because the pattern matching and logical predicate (for when clause) is being directly used.