Skip to main content
  1. Readings/
  2. Books/
  3. Real World OCaml: Functional Programming for the Masses/

Chapter 8: Imperative Programming

··7014 words·33 mins

Imperative Programming

The book makes the distinction between functional and imperative by emphasising that imperative programming and its modification of a program’s internal state is the action of interacting with changeable parts of the world.

Additionally, there’s also an argument that some algorithms can be implemented in imperative fashion hence a performance benefit to imperative code; examples:

  1. in-place sorting algos
  2. graph algos that rely on mutable data structures
  3. dynamic programming algos
  4. numeric linear algebra / Tamil
  5. real-time, embedded system algorithms
  6. low-level system programming

This chapter deals with OCaml’s support for the imperative programming paradigm.

Toy Example: imperative dictionaries #

This is just a toy example, not for actual use.

An open hashing scheme, where the hash table will be an array of buckets, each bucket containing a list of key/value pairs that have been hashed into that bucket.

Interface:

open Base

(* dictionary with keys of type 'a and data of tyep 'b' *)
type ('a, 'b) t

val create
  :  hash:('a -> int)
  -> equal:('a -> 'a -> bool)
  -> ('a, 'b) t

val length : ('a, 'b) t -> int
val add : ('a, 'b) t -> key:'a -> data:'b -> unit (* the imperative functions typically return unit() as a way of signalling side-effects *)
val find : ('a, 'b) t -> 'a -> 'b option
val iter : ('a, 'b) t -> f:(key:'a -> data:'b -> unit) -> unit
val remove : ('a, 'b) t -> 'a -> unit

Implementation:

open Base

(* 1: define the dict as a record *)
type ('a, 'b) t =
  { mutable length : int
  ; buckets : ('a * 'b) list array
  ; hash : 'a -> int (* hashing function kept within the record itself *)
  ; equal : 'a -> 'a -> bool (* equality function kept within the record itself *)
  }

(* 2: basic functions to manipulate a dictionary  *)
let num_buckets = 17 (* bucket array is fixed length because of this const *)
(** Chooses the position in the array that a key should be stored at*)
let hash_bucket t key = t.hash key % num_buckets
(* this is the ini; binds the hashing and equality functions *)
let create ~hash ~equal =
  { length = 0
  ; buckets = Array.create ~len:num_buckets []
  ; hash
  ; equal
  }

let length t = t.length
let find t key =
  List.find_map (* returns the first Some or None if all are None in the array*)
    t.buckets.(hash_bucket t key) (* imperative syntax: reading value *)
    ~f:(fun (key', data) ->
      if t.equal key' key then Some data else None)

(* 3: iter implementation -- works by side-effect, walks through values in a given bucket *)
let iter t ~f =
  for i = 0 to Array.length t.buckets - 1 do
    List.iter t.buckets.(i) ~f:(fun (key, data) -> f ~key ~data)
  done
  (* since this works by side-effect, it returns a unit() *)


(* 4: handle the adding and removing of mappings from the dictionary *)
let bucket_has_key t i key =
  List.exists t.buckets.(i) ~f:(fun (key', _) -> t.equal key' key)

let add t ~key ~data =
  let i = hash_bucket t key in
  let replace = bucket_has_key t i key in
  let filtered_bucket =
    if replace
    then
      List.filter t.buckets.(i) ~f:(fun (key', _) ->
          not (t.equal key' key))
    else t.buckets.(i)
  in
  t.buckets.(i) <- (key, data) :: filtered_bucket; (* note the sequence of operations, which is controlled by the use of ; *)
  if not replace then t.length <- t.length + 1 (* mutable update on the lengthh if it's a new addition (vs a replacement) *)

let remove t key =
  let i = hash_bucket t key in
  if bucket_has_key t i key
  then (
    let filtered_bucket =
      List.filter t.buckets.(i) ~f:(fun (key', _) ->
          not (t.equal key' key))
    in
    t.buckets.(i) <- filtered_bucket;
    t.length <- t.length - 1)

Some useful notes:

  1. for loops are suitable for imperative contexts, we should prefer those even if recursive ways to express that may exist.
  2. some mutable operators shown in the example:
    • ; sequencing operator, helps us control the sequence of imperative actions. let-bindings would have worked too but for imperative code, this is more idiomatic

           <expr1>;
           <expr2>;
           ...
           <exprN>
      
             (* ANDD  *)
      
           let () = <expr1> in
           let () = <expr2> in
           ...
           <exprN>
      

      are equivalent!

    • <- mutable update operator. works for:

      • elements of an array (array.(i) <- expr)
      • updating record field (record.field <- expression)
  3. RULE OF THUMB: it’s good practice to leave all the side-effecting operations to the end of the function – minimises the chance of exceptions leaving corrupted state

Primitive Mutable Data #

A walkthrough of the mutable primitives in OCaml

Array-like #

Array like means mutable integer-indexed containers that provide constant-time access to elements

Ordinary Arrays #

General array, we can create it using a literal syntax [|1;2;3|]

Some usual operations:

  • setting value: use the dot operator

  • reading/retrieving value: use the dot and assignment operator

  • blit: this a sliced copy which allows us to copy over a subarray to another subarray

    It’s closer in functionality to c’s memcopy or memmove rather than pythons sliced-copy. This is because out-of-bounds exceptions can happen if we’re not careful about array boundaries. Additionally, the copy-over overwrite behaviour happens safely, so careful about that.

    NOTE: the signature of this function is a little odd compared to other languages like python: Array.blit source source_start destination destination_start length and it’s not labelled arguments

      (* Create arrays *)
      let src = [|1; 2; 3; 4; 5|];;
      let dst = [|0; 0; 0; 0; 0|];;
    
      (* Copy 3 elements from src (starting at index 1) to dst (starting at index 2) *)
      Array.blit src 1 dst 2 3;;
    
      (* dst now becomes *)
      dst;;
    

Bytes and Strings #

To represent a character, a single byte width (8 bit character) is what we require.

Bytes and Strings are very similar.

let b = Bytes.of_string "foobar";;
(* val b : bytes = "foobar" *)
Bytes.set b 0 (Char.uppercase (Bytes.get b 0));;
(* - : unit = () *)
Bytes.to_string b;;
(* - : string = "Foobar" *)
  • Bytes: can be seen as char arrays
    • each entry (of char) is 8-bytes long
    • is mutable
    • still somewhat space efficient
  • Strings:
    • each entry (of char) is 1-byte long
    • is immutable
    • more space-efficient than bytes

Bigarrays #

for handling memory blocks outside of OCaml’s heap, this allows interoperability between languages of other libraries. KIV until the memory representation part.

Generalised syntax for this:

<bigarray_expr>.{<index_expr>}
<bigarray_expr>.{<index_expr>} <- <value_expr>

Mutable Record and Object Fields and Ref Cells #

For mutable records, it’s something we’ve seen before. Take note that the record itself is immutable but field(s) within the record may be mutable.

Objects are similar KIV until Chapter 12.

Ref cells #

like we’ve seen before, we might want to have an accumulator value to which we keep changing its state and ref is useful for that.

Under the hood, it’s just a record with a single mutable field (contents) and it has some syntactic sugar that makes it easier to work with refs.

(* Create/Initialise *)
let x = ref 1;;
(* val x : int ref = {Base.Ref.contents = 1} *)

(* Access!!! *)
!x;;
(* - : int = 1 *)

(* Assign!!! *)
x := !x + 1;;
(* - : unit = () *)

!x;;
(* - : int = 2 *)

Foreign Functions #

Foreign functions allow ocaml to use imperative constructs used by syscalls or external libraries (e.g. write syscall, clock).

KIV chapter 22.

For and while loops #

Both for and while loops aren’t really necessary because we could have otherwise written out a recursive version. It’s more idiomatic to use them in situations where the code is imperative.

for loops:

  • the bounds are both open bounds so the upper and lower are inclusive.
  • we can iterative in the reverse direction by using downto
open Stdio;;
for i = 0 to 3 do printf "i = %d\n" i done;;
(* i = 0 *)
(* i = 1 *)
(* i = 2 *)
(* i = 3 *)
(* - : unit = () *)

for i = 3 downto 0 do printf "i = %d\n" i done;;
(* i = 3 *)
(* i = 2 *)
(* i = 1 *)
(* i = 0 *)
(* - : unit = () *)

while loop:

  • pretty much the same order of evaluation of expressions as other languages
let rev_inplace ar =
  let i = ref 0 in
  let j = ref (Array.length ar - 1) in
  (* terminate when the upper and lower indices meet *)
  while !i < !j do
    (* swap the two elements *)
    let tmp = ar.(!i) in
    ar.(!i) <- ar.(!j);
    ar.(!j) <- tmp;
    (* bump the indices *)
    Int.incr i; (* <-- this is the builtin idiomatic way of incrementing an int ref *)
    Int.decr j (* likewise, for decrementing int ref *)
  done;;
(* val rev_inplace : 'a array -> unit = <fun> *)
let nums = [|1;2;3;4;5|];;
(* val nums : int array = [|1; 2; 3; 4; 5|] *)
rev_inplace nums;;
(* - : unit = () *)
nums;;
(* - : int array = [|5; 4; 3; 2; 1|] *)

Example: Doubly-linked lists #

Yet again, this is just a didactic example, for actual purposes, use Doubly_linked module within Core.

Some notes on the interface / implementation:

  1. Elements:

    • Elements act as pointers to the interior of a list and allow us to navigate the list and give us a point at which to apply mutating operations.
    • We implement them as records. Records will have some optional mutable fiends – at the start of the list, the prev will be None and at the end of the list, the next will be None
  2. in the element-wise modification functions, notice how match is surrounded with parens. This is because the precedence of match is very low. There’s a need to separate that expression from the thereafter assignment operation.

    Without this separation, the final assignment expression would have become part of the None case.

    an alternative was to use begin...end.

  3. the implementation of remove shows why imperative code is tricky and sometimes can be fragile.

    • in remove, if we are removing the first element in the list, then we update the list pointer itself.

    • in the actual implementations, the edge cases are handled by error detection and error correction logic.

    • example of problematic actions:

      • double-removing an element will cause the main list reference to be set to None, thus emptying the list.
      • Similar problems arise from removing an element from a list it doesn’t belong to.

      RULE OF THUMB: for imperative data-structures, use the libraries as much as possible and if we don’t have one and need to implement it, give extra attention to the error handling.

Here’s the interface

open Base

(* 1: there are two types defined here *)
type 'a t (* type of the list *)
type 'a element (* type of an element within the list *)

(** Basic list operations *)

val create : unit -> 'a t
val is_empty : 'a t -> bool

(** Navigation using [element]s *)

val first : 'a t -> 'a element option
val next : 'a element -> 'a element option
val prev : 'a element -> 'a element option
val value : 'a element -> 'a

(** Whole-data-structure iteration *)

val iter : 'a t -> f:('a -> unit) -> unit
val find_el : 'a t -> f:('a -> bool) -> 'a element option

(** Mutation *)

val insert_first : 'a t -> 'a -> 'a element
val insert_after : 'a element -> 'a -> 'a element
val remove : 'a t -> 'a element -> unit

Here’s our implementation:

open Base
(* 1: we define the types first:  *)

(* an ='a element= is a record *)
type 'a element =
  { value : 'a (* const value *)
  ; mutable next : 'a element option (* mutable  *)
  ; mutable prev : 'a element option
  }

(* the tye of the list will be a mutable reference to a an optional element (the first element) *)
type 'a t = 'a element option ref

(* 2: We can impement some basic functions on the list and on elements *)
let create () = ref None
let is_empty t = Option.is_none !t
let value elt = elt.value
let first t = !t
let next elt = elt.next
let prev elt = elt.prev


(* 3: adding list modification functions *)
let insert_first t value =
  let new_elt = { prev = None; next = !t; value } in

  (* note: precedence of match is low, that's why we wrap it in parens*)
  (match !t with
  | Some old_first -> old_first.prev <- Some new_elt (* adding in the reverse pointer *)
  | None -> ()); (* being added to empty list*)
  t := Some new_elt; (* mutable assign to t*)
  new_elt

(* 4: add in element-relative functions *)
let insert_after elt value =
  let new_elt = { value; prev = Some elt; next = elt.next } in
  (match elt.next with
  | Some old_next -> old_next.prev <- Some new_elt
  | None -> ());
  elt.next <- Some new_elt;
  new_elt

let remove t elt =
  let { prev; next; _ } = elt in
  (match prev with
  | Some prev -> prev.next <- next
  | None -> t := next); (* if there isn't any previous, then the list pointer itself is updated.*)
  (match next with
  | Some next -> next.prev <- prev
  | None -> ());
  (* adjust pointers for the removed element *)
  elt.prev <- None;
  elt.next <- None

(* 5: iteration functions -- implemented using simple recursive loops*)
let iter t ~f =
  let rec loop = function
    | None -> ()
    | Some el ->
      f (value el); (* the function is applied to the value of the element *)
      loop (next el) (* the loop function is recursively called on the next element from the current element*)
  in
  loop !t (* invokes loop on the first element *)

let find_el t ~f =
  let rec loop = function
    | None -> None
    | Some elt -> if f (value elt) then Some elt else loop (next elt)
  in
  loop !t

Cyclic Data Structures #

Doubly-linked lists are cyclic because it is possible to follow a nontrivial sequence of pointers that closes in on itself. In general, building cyclic data structures requires the use of side effects. This is done by constructing the data elements first, and then adding cycles using assignment afterward.

Cyclic data structures (general purpose):

  • we need to use side-effects to create them
  • we should construct the elements first
  • then we should add the cycles

An rare exception: we can use let rec for fixed-size cyclical data structures.

let rec endless_loop = 1 :: 2 :: 3 :: endless_loop;;
(* val endless_loop : int list = [1; 2; 3; <cycle>] *)

Modifying the List #

Take note on the implementation point that the match expression has to be wrapped around

Iteration Functions #

These are the map, iter, fold functions (refer to the code block above)

  • iter: the goal of which is to call a unit-producing function on every element of the list, in order
  • find_el: runs a provided test function on each value stored in the list, returning the first element that passes the test

Laziness and other benign effects #

Benign Effects: there are cases where we want to be pure by default and use some limited imperative side-effects that give us performance improvements.

benign effect: laziness #

There’s a need to wrap up lazy operations with the lazy keyword. The application of that operation has to be explicitly forced out via Lazy.force.

open Stdio

let v = lazy (print_endline "performing lazy computation"; Float.sqrt 16.);;
print_endline "hello world";
(* it won't actually be applied until we force the application: *)
Lazy.force v;;

Here’s a didactic example of how we may implement laziness:

type 'a lazy_state =
  | Delayed of (unit -> 'a)
  | Value of 'a
  | Exn of exn;;
(* type 'a lazy_state = Delayed of (unit -> 'a) | Value of 'a | Exn of exn *)

type 'a our_lazy = { mutable state : 'a lazy_state };;
(* type 'a our_lazy = { mutable state : 'a lazy_state; } *)

let our_lazy f = { state = Delayed f };;
(* val our_lazy : (unit -> 'a) -> 'a our_lazy = <fun> *)

(* we've wrapped the thunk up here *)
let v =
  our_lazy (fun () -> (* -- this is the thunk *)
    print_endline "performing lazy computation"; Float.sqrt 16.);;
(* val v : float our_lazy = {state = Delayed <fun>} *)

(* here's our attempt at forcing out the function evaluation *)
let our_force l =
  match l.state with
  | Value x -> x
  | Exn e -> raise e
  | Delayed f ->
    try
      let x = f () in
      l.state <- Value x;
      x
    with exn ->
      l.state <- Exn exn;
      raise exn;;
(* val our_force : 'a our_lazy -> 'a = <fun> *)

Notes:

  1. we can create a lazy value from a thunk (which is a nullary function).

    NOTE: in the context of functional programming languages, a thunk is a nullary function. In others (e.g. js, it’s more generalised as a deferred computation that may or may not take in an argument)

Memoization and Dynamic Programming #

Our favourite first-reach optimisation technique :)

CAUTION: a memoized function by nature leaks memory. As long as we hold onto the memoized function, we’re holding every result it has returned thus far.

here’s an example of how a memoisation function may be implemented by us:

let memoize m f =
  let memo_table = Hashtbl.create m in
  (fun x ->
     Hashtbl.find_or_add memo_table x ~default:(fun () -> f x));;
(* val memoize : 'a Hashtbl.Key.t -> ('a -> 'b) -> 'a -> 'b = <fun> *)
  • useful for top down recursive algos

    useful for efficiently implementing some recursive algos

    Examples:

    1. Let’s use a didactic fib function as a way to illustrate how to memoize:

          open Base
      
          (* -- helper timer function: *)
          let time f =
            let open Core in
            let start = Time.now () in
            let x = f () in
            let stop = Time.now () in
            printf "Time: %F ms\n" (Time.diff stop start |> Time.Span.to_ms);
            x
      
          (* val time : (unit -> 'a) -> 'a = <fun> *)
      
          (* slow, not memoized function *)
          let rec fib i =
          if i <= 1 then i else fib (i - 1) + fib (i - 2);;
          (* val fib : int -> int = <fun> *)
      
          (* NOTE: the tricky part is that we need to insert the memoization BEFORE the recursive calls within =fib=  *)
          let memo_fib = memoize (module Int) fib;;
      
          (* val fib : int -> int = <fun> *)
          time (fun () -> memo_fib 40);;
          (* Time: 18174.5970249 ms *)
          (* - : int = 102334155 *)
          time (fun () -> memo_fib 40);;
          (* Time: 0.00524520874023 ms *)
          (* - : int = 102334155 *)
      

      To make things better, we should write fib in a way that unwinds the recursion.

      We explore that re-write first, then look at how to make it memoized:

      In this example, we pass in a function that gets called before the usual recursive call (but this won’t have the memoisation yet):

         (* not the recursive version *)
         let fib_norec fib i =
           if i <= 1 then i
           else fib (i - 1) + fib (i - 2);;
      
         (* turning it back to an ordinary Fib function by adding in the recursive knot:  *)
         let rec fib i = fib_norec fib i;;
      
         (* here's a polymorphic variant: *)
         let make_rec f_norec =
           let rec f x = f_norec f x in
           f;;
         (* NOTE:
            1. the function f_norec passed in to make_rec is a function that isn’t recursive but takes as an argument a function that it will call.
      
            2. so, make_rec essentially feeds f_norec to itself -- which makes it a true recursive function
      
            *)
      
         fib 20;;
      

      Now we try to make make_rec such that we can memoize when it ties the recursive knot. We’re using a reference here as a way to tie the recursive knot without using let rec (doesn’t work here)

         (* our objective is to pass this function into a memoize function *)
         let fib_norec fib i =
           if i <= 1 then i
           else fib (i - 1) + fib (i - 2);;
      
         (* key point here is how we let bind the recursive function within *)
         let make_rec f_norec =
           let rec f x = f_norec f x in
           f;;
         (* val make_rec : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun> *)
      
         let memo_rec m f_norec x =
           let fref = ref (fun _ -> assert false) in
           let f = memoize m (fun x -> f_norec !fref x) in
           fref := f;
           f x;;
         (* memo_rec's signature is almost the same as make_rec, except for the accepting of a module param for the hashtable type.
      
         val memo_rec : 'a Hashtbl.Key.t -> (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b =
           <fun>
         *)
      
         (* now we can finally get a memoised version of fib *)
         let fib = memo_rec (module Int) fib_norec;;
         (* val fib : int -> int = <fun> *)
         time (fun () -> fib 40);;
         (*
         Time: 0.121355056763 ms
      ​   - : int = 102334155
         *)
      

      The code block above teaches us how to avoid the memory leak by defining an inner scope for memoized function calls and allowing the local scope to be the reason why garbage collection happens correctly and the memo-table doesn’t cause a memory leak.

      In the code above, the memory behaviour is important for us to understand correctly:

      1. only when fib is called, then the final argument to memo_rec (which is x, the param for fib) is presented and only then is the memoize function called

        Because the result of that call falls out of scope when fib returns, this is what makes memo_rec avoid a memory leak (since the memo table is garbage-collected after the computation completes)

      2. so, calling memo_rec (module Int) fib_norec does not trigger the call to memoize yet until the last param is binded.

        in fact, we could have done a single line call:

              let memo_rec m f_norec x =
                let fref = ref (fun _ -> assert false) in
                let f = memoize m (fun x -> f_norec !fref x) in
                fref := f;
                f x;;
        
              let fib = memo_rec (module Int) (
                            fun fib i ->
                            if i <= 1 then 1 else fib (i - 1) + fib (i - 2));;
              (* val fib : int -> int = <fun> *)
        

      NOTE: org-babel won’t preserve the table and so we won’t see the outputs showing proof that memoization has happened.

    2. Levenshtein edit distance between 2 strings:

      similar to the fib function, we need to memoize before the inner recursive call is made to the edit_distance function and to do that, we need it to take a pair of strings as a single argument (since our memo_rec defined earlier only works on single-argument functions). We also need to ensure that we can hash the pair of strings.

      we an use ppx-jane which has some meta-functions that can help us derive hash-functions and equality tests instead of writing them out ourselves.

         (* 1: using the deriver *)
         #require "ppx_jane";;
         module String_pair = struct
           type t = string * string [@@deriving sexp_of, hash, compare]
         end;;
         (* module String_pair : *)
         (*   sig *)
         (*     type t = string * string *)
         (*     val sexp_of_t : t -> Sexp.t *)
         (*     val hash_fold_t : Hash.state -> t -> Hash.state *)
         (*     val hash : t -> int *)
         (*     val compare : t -> t -> int *)
         (*   end *)
      
         (* 2: now we can use memo_rec for edit_distance and memoise it properly *)
         let edit_distance = memo_rec (module String_pair)
           (fun edit_distance (s,t) ->
              match String.length s, String.length t with
              | (0,x) | (x,0) -> x
              | (len_s,len_t) ->
                (* 1 operation to drop suffix in either case *)
                let s' = String.drop_suffix s 1 in
                let t' = String.drop_suffix t 1 in
                let cost_to_drop_both =
                  if Char.(=) s.[len_s - 1] t.[len_t - 1] then 0 else 1
                in
                (* we shall get the smallest of the lot *)
                List.reduce_exn ~f:Int.min
                  [ edit_distance (s',t ) + 1
                  ; edit_distance (s ,t') + 1
                  ; edit_distance (s',t') + cost_to_drop_both
         ]);;
         (* val edit_distance : String_pair.t -> int = <fun> *)
         time (fun () -> edit_distance ("OCaml 4.09","ocaml 4.09"));;
         (* Time: 0.964403152466 ms *)
      

Limitations of let rec #

LIMITATION: There are limits on what can appear on the RHS of a let rec, such as not allowing let rec x = x + 1 to work. There are only 3 constructs that can show up on the RHS of a let rec:

  1. a function definition
  2. a constructor
  3. the lazy keyword

This is good and bad in the sense that our memo_rec can’t be implemented using let rec but it also helps us avoid nonsensical cases for the compiler.

NOTE, PL-DESIGN: Haskell is lazy and such compiler-restrictions don’t show up.

Also laziness is more constrained than explicit mutation, so in some cases might lead to code that is easier to reason with.

(* we used let rec here *)
let make_rec f_norec =
  let rec f x = f_norec f x in
  f;;

(* we didn't use let rec here *)
let memo_rec m f_norec x =
  let fref = ref (fun _ -> assert false) in
  let f = memoize m (fun x -> f_norec !fref x) in
  fref := f;
  f x;;
(* memo_rec's signature is almost the same as make_rec, except for the accepting of a module param for the hashtable type.*)

let lazy_memo_rec m f_norec x =
  let rec f = lazy (memoize m (fun x -> f_norec (force f) x)) in
  (force f) x;;
(* generated type:

  val lazy_memo_rec : 'a Hashtbl.Key.t -> (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b =
  <fun>
  *)
time (fun () -> lazy_memo_rec (module Int) fib_norec 40);;
(* Time: 0.181913375854 ms *)
(* - : int = 102334155 *)


(* we can't try to force out a lazy version just to use let rec because a lazy value can't try to force itself as part of its own evaluation (without changing memo_rec into a lazy version) *)
let rec x = lazy (force x + 1);;
(* val x : int lazy_t = <lazy> *)
force x;;
(* Exception: Lazy.Undefined. *)

Input and Output #

To elaborate on the “imperative” terminology, any function that doesn’t boil down to a deterministic transformation from its arguments to its return value is imperative in nature. That includes not only things that mutate your program’s data, but also operations that interact with the world outside of your program.

So IO such as:

  • file IO (includes terminal IO)
  • socket IO / network IO

Here we focus on OCaml’s buffered I/O library:

  1. only two types:
    • in_channel: for reading
    • out_channel: for writing
  2. IO interfaces:
    • core library deals with files and terminals
    • Unix module can be used to create other channels

Terminal I/O #

The 3 channels (stdin, stdout, stderr) are available at the top level of Core’s namespace directly (we don’t need to go through In_channel and Out_channel modules).

There’s a chance that this code example is deprecated, I’m not going to investigate further on this.

(* supposed to be in a .ml file, compiled by dune *)
open Core

let () =
  Out_channel.output_string stdout "Pick a timezone: ";
  Out_channel.flush stdout; (* flushes the buffer to actually write to the channel*)
  match In_channel.(input_line stdin) with
  | None -> failwith "No timezone provided" (* In_channel.input_line returns an option -- None means end of input stream e.g. EOL *)
  | Some zone_string ->
    let zone = Time_unix.Zone.find_exn zone_string in
    let time_string = Time.to_string_abs (Time.now ()) ~zone in
    Out_channel.output_string stdout
      (String.concat
         ["The time in ";Time_unix.Zone.to_string zone;" is ";time_string;".\n"]);
    Out_channel.flush stdout (* flush it again to force the printing -- good habit, though not necessary since it's end of program*)

notes:

  1. since out_channel s are buffered, we need to flush it to get the Out_channel.output_string to

Formatted Output with printf #

printf in OCaml is special compared to C’s printf because it’s type-safe!

What kind of control can printf give?

  • alignment and padding
  • escape-strings
  • formatting of numbers (decimal, hex, binary?)
  • precision of float numbers

Functions similar to printf for other outputs:

  • eprintf for stderr
  • fprintf for arbitrary out_channel
  • sprintf that returns a formatted string

Understanding format strings #

Type-safe printf: The compiler checks that the types referred to by the format string match the types of the rest of the args passed to printf

This analysis of contents happens at compile-time \(\implies\) format string needs to be available as a string literal @ compile-time – it’s for this reason that the compiler complains if we pass in a string variable. We’d need to otherwise annotate that type so that a string literal is inferred as a format string

let fmt = "%i is an integer\n";;
(* val fmt : string = "%i is an integer\n" *)
printf fmt 3;;
(*
  Line 1, characters 8-11:
Error: This expression has type string but an expression was expected of type
         ('a -> 'b, Stdio.Out_channel.t, unit) format =
           ('a -> 'b, Stdio.Out_channel.t, unit, unit, unit, unit) format6
 ,
 *)

(* if we type it correctly then it will work*)
open CamlinternalFormatBasics;;
let fmt : ('a, 'b, 'c) format =
"%i is an integer\n";;
printf fmt 3;;

Here’s the time-conversion program updated using printf, some notes:

  1. flushing the channel works via %! format string when using printf:

    printf "The time in %s is %s.\n%!" (Time_unix.Zone.to_string zone) time_string

open Core

let () =
  printf "Pick a timezone: %!";
  match In_channel.input_line In_channel.stdin with
  | None -> failwith "No timezone provided"
  | Some zone_string ->
    let zone = Time_unix.Zone.find_exn zone_string in
    let time_string = Time.to_string_abs (Time.now ()) ~zone in
    printf "The time in %s is %s.\n%!" (Time_unix.Zone.to_string zone) time_string

(* ----- older, more verbose syntax:
open Core

let () =
  Out_channel.output_string stdout "Pick a timezone: ";
  Out_channel.flush stdout; (* flushes the buffer to actually write to the channel*)
  match In_channel.(input_line stdin) with
  | None -> failwith "No timezone provided" (* In_channel.input_line returns an option -- None means end of input stream e.g. EOL *)
  | Some zone_string ->
    let zone = Time_unix.Zone.find_exn zone_string in
    let time_string = Time.to_string_abs (Time.now ()) ~zone in
    Out_channel.output_string stdout
      (String.concat
         ["The time in ";Time_unix.Zone.to_string zone;" is ";time_string;".\n"]);
    Out_channel.flush stdout (* flush it again to force the printing -- good habit, though not necessary since it's end of program*)

*)

File I/O #

The general pattern we realise for File IO is that we create the channel, then use the channel then we close the channel (file read vs write is the same thing, just the channels are swapped (e.g. file vs stdio)).

We need self-cleaning code here:

  • We need to be careful with how we handle the finite resource of File Descriptors, so exceptions shouldn’t mean that we don’t release the fd \(\implies\) no fd leak (add in a finally step)

    There’s better bookkeeping functions that are available to us In_channelwith_file that we should be using.

      (* we have ergonomic bookkeeping functions that we can use:  *)
      let sum_file filename =
        In_channel.with_file filename ~f:(fun file ->
          let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
          List.fold ~init:0 ~f:(+) numbers);;
      (* val sum_file : string -> int = <fun> *)
    
      (* the manual way would have been to add in a finally block for self-cleanup *)
      let sum_file filename =
        let file = In_channel.create filename in
        Exn.protect ~f:(fun () ->
          let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
          List.fold ~init:0 ~f:(+) numbers)
          ~finally:(fun () -> In_channel.close file);;
      (* val sum_file : string -> int = <fun> *)
    
  • we shouldn’t read the whole file into memory either and we should do line-by-line processing using In_channel.fold_lines instead of In_channel.input_lines (which reads the whole file)

      (* line by line processing *)
      let sum_file filename =
        In_channel.with_file filename ~f:(fun file ->
          In_channel.fold_lines file ~init:0 ~f:(fun sum line ->
            sum + Int.of_string line));;
    
      (* val sum_file : string -> int = <fun> *)
    
    
      (* compared with reading the whole file into memory *)
      let sum_file filename =
        In_channel.with_file filename ~f:(fun file ->
          let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
          List.fold ~init:0 ~f:(+) numbers);;
      (* val sum_file : string -> int = <fun> *)
    

There’s better patterns in the API docs for this. There’s also a guide to file manipulation here.

Order of Evaluation #

OCaml, like other languages is strict \(\implies\)

  • when we bind an identifier to result of some expression, the expression is evaluated prior to the binding
  • call a function on a set of args, those args are evaluated prior to being passed to the function

When we code imperatively, the order of evaluation matters. Additionally, it also matters if we evaluate lazily or not:

(* example: any of these sins are negative? *)
(* this code will unnecessarily compute sins -- we could have been lazy about it *)
let x = Float.sin 120. in
let y = Float.sin 75.  in
let z = Float.sin 128. in
List.exists ~f:(fun x -> Float.O.(x < 0.)) [x;y;z];;
(* - : bool = true *)

let x = lazy (Float.sin 120.) in
let y = lazy (Float.sin 75.)  in
let z = lazy (Float.sin 128.) in
List.exists ~f:(fun x -> Float.O.(Lazy.force x < 0.)) [x;y;z];;
(* - : bool = true *)

Why OCaml is strict:

lazy evaluation and imperative programming generally don’t mix well because laziness makes it harder to reason about when a given side effect is going to occur. Understanding the order of side effects is essential to reasoning about the behavior of an imperative program.

Strictness benefits:

  1. expressions bound by a sequence of let bindings, evaluated in the order that they’re defined

Counter-intuitive order of evaluation #

OCaml compiler’s order of evaluation can be a little counter intuitive – the sub-expression that is last here is evaluated first:

List.exists ~f:(fun x -> Float.O.(x < 0.))
  [ (printf "1\n"; Float.sin 120.);
    (printf "2\n"; Float.sin 75.);
    (printf "3\n"; Float.sin 128.); ];;
(* prints out: *)
(* 3 *)
(* 2 *)
(* 1 *)
(* - : bool = true *)

Side-effects and Weak Polymorphism #

Weakly polymorphic variable: variable that can be used with any single type (the _ in the type variable naming is what indicates that something is a weakly polymorphic typing). The compiler wishes to concretise this type ASAP.

(* example of variable with weak polymorphic typing *)
let remember =
  let cache = ref None in
  (fun x ->
     match !cache with
     | Some y -> y
     | None -> cache := Some x; x);;
(* val remember : '_weak1 -> '_weak1 = <fun> *)

(* example of general polymorphic typing: *)
let identity x = x;;
(* val identity : 'a -> 'a = <fun> *)

(* concretisation of the weakly polymorphic type *)
let remember_three () = remember 3;;
(* val remember_three : unit -> int = <fun> *)
remember "avocado";; (*<---- this will error out*)
  • Note that the type of remember was settled by the definition of remember_three, even though remember_three was never called!

The Value Restriction #

This part is about knowing when we have simple types that allow the code to remain polymorphic vs when it’s weakly polymorphic (and hence the values must be restricted).

Value Restriction is the rule that is used to concretise the types for weakly polymorphic variables.

The idea is that initially it’s an unknown type which is stored in a persistent, mutable cell. “Simple values” are types from the kinds of expressions that don’t introduce persistent, mutable cells – and so can remain polymorphic:

  • constants
  • constructors that only contain other simple values
  • function declarations
  • let bindings where the sub-bindings are all simple values

The value restriction doesn’t require that there is no mutable state, only that there is no persistent mutable state that could share values between uses of the same function.

(* this is simple, it's polymorphic *)
(fun x -> [x;x]);;
(* - : 'a -> 'a list = <fun> *)

(* this ends up being weakly polymorphic -- mainly because OCaml can't separate pure and impure functions. *)
identity (fun x -> [x;x]);;
(* - : '_weak2 -> '_weak2 list = <fun> *)

(* this has a fresh reference for each call to this function, hence it's NOT weakly typed. In comparison, a memoized function would be weakly typed (because the mutable cache would have access across different function calls for the same function -- persistent, mutable state.)  *)
let f () = ref None;;
(* val f : unit -> 'a option ref = <fun> *)

Partial Application and the Value Restriction #

In most cases, the value restriction is a good thing because the value in question can only be safely be used with a single type.

Partial application is the exception because partial application is NOT a simple value, so functions created by partial application are sometimes less general than we would expect.

the solution to avoiding this inferring of a weakly polymorphic type is to do eta expansion – general approach for resolving problems that arise from the value restriction.

let list_init_10 = List.init 10;; (* [1] *)
(* val list_init_10 : f:(int -> '_weak3) -> '_weak3 list = <fun> *)

(* eta expansion: this keeps things polymorphic -- we avoid the partial application -- [2] *)
let list_init_10 ~f = List.init 10 ~f;;
val list_init_10 : f:(int -> 'a) -> 'a list = <fun>
  • [1]: this is inferred as a weakly polymorphic type for the resulting function because there’s nothing that guarantees that List.init isn’t creating a persistent ref somewhere inside of it that would be shared across multiple call to list_init_10.

  • [2]: we do eta expansion to avoid the partial application and avoid the weakly polymorphic type inference.

Relaxing the Value Restriction #

Value Restriction is just a syntactic check. We can do a few operations that count as simple values and anything that is a simple value can be generalised (polymorphic).

There’s a relaxed version of value-restriction that lets us use type info to allow polymorphic types for things that are not simple values

  1. a function application (which may be inferred as weakly polymorphic) can be strongly polymorphic if the value is an immutable value

        (* function application -- inferred as weak *)
        identity (fun x -> [x;x]);;
        (* - : '_weak4 -> '_weak4 list = <fun> *)
    
        (* immutable hence can be strongly polymorphic *)
        identity [];;
        (* - : 'a list = [] *)
    
        [||];; (* mutable return value*)
        (* - : 'a array = [||] *)
        identity [||];; (* --> gets infered as weakly polymorphic*)
        (* - : '_weak5 array = [||] *)
    
  2. abstract values need explicit guarantees to be inferred as strongly polymorphic:

    an abstract data type needs to be annotated in a way that it guarantees in the interface that the data structure doesn’t contain any persistent references to values of type 'a and only then will OCaml infer polymorphic types for abstract values.

    In the example, Concat_list.t is immutable but without the guarantee, OCaml treats it as if it were mutable.

       module Concat_list : sig
         type 'a t
         val empty : 'a t
         val singleton : 'a -> 'a t
         val concat  : 'a t -> 'a t -> 'a t  (* constant time *)
         val to_list : 'a t -> 'a list       (* linear time   *)
       end = struct
    
         type 'a t = Empty | Singleton of 'a | Concat of 'a t * 'a t
    
         let empty = Empty
         let singleton x = Singleton x
         let concat x y = Concat (x,y)
    
         let rec to_list_with_tail t tail =
           match t with
           | Empty -> tail
           | Singleton x -> x :: tail
           | Concat (x,y) -> to_list_with_tail x (to_list_with_tail y tail)
    
         let to_list t =
           to_list_with_tail t []
    
       end;;
    
       (*
         module Concat_list :
         sig
           type 'a t
           val empty : 'a t
           val singleton : 'a -> 'a t
           val concat : 'a t -> 'a t -> 'a t
           val to_list : 'a t -> 'a list
         end
       *)
    
       Concat_list.empty;;
       (* - : 'a Concat_list.t = <abstr> *) (*<--- it's abstract type but no guarantee*)
       identity Concat_list.empty;; (*<---- gives weak polymorphism*)
       (* - : '_weak6 Concat_list.t = <abstr> *)
    
    
       (* ============================================================== *)
       module Concat_list : sig
         type +'a t
         val empty : 'a t
         val singleton : 'a -> 'a t
         val concat  : 'a t -> 'a t -> 'a t  (* constant time *)
         val to_list : 'a t -> 'a list       (* linear time   *)
       end = struct
    
         type 'a t = Empty | Singleton of 'a | Concat of 'a t * 'a t
    
         let empty = Empty
         let singleton x = Singleton x
         let concat x y = Concat (x,y)
    
         let rec to_list_with_tail t tail =
           match t with
           | Empty -> tail
           | Singleton x -> x :: tail
           | Concat (x,y) -> to_list_with_tail x (to_list_with_tail y tail)
    
         let to_list t =
           to_list_with_tail t []
    
       end;;
       (*
       module Concat_list :
         sig
           type +'a t
           val empty : 'a t
           val singleton : 'a -> 'a t
           val concat : 'a t -> 'a t -> 'a t
           val to_list : 'a t -> 'a list
         end
         *)
    
       (* now, it's strongly polymorphic: *)
       identity Concat_list.empty;;
       (* - : 'a Concat_list.t = <abstr> *)