Real World OCaml: Functional Programming for the Masses

Chapter 17: Testing

  • CLOSING NOTE [2026-04-22 Wed 22:58]
  • mental model for the importance of testing when it comes to strong type systems like in OCaml:

    • types help get more value out of testing effort. There’s less need to test the trivial properties that are auto-enforced by the type system

      also, rigidity of the type system – code has a snap-together quality

      so a relatively small number of tests can do an outsized amount to ensure that the code behaves as expected.

  • Ideal characteristics of tests:

    • easy to write and run

      easy to set up, auto-running and preventing regressions easily

    • easy to update

      so that updating tests doesn’t become its own kind of tech debt

    • fast

      allows focus to persist for the overall development process

    • deterministic

    • understandable

Testing

Inline Tests #

  • setup:

    • let dune know to expect inline tests in the lib (the inline_tests declaration)
    • enable ppx_inline_test as a preprocessor
  • the dune test runner handles running the inline tests; the test statements don’t get executed otherwise

    e.g. basic inline test:

      open Base
    
      (*just needs the RHS to return true for the inline test to pass. *)
      let%test "rev" =
        List.equal Int.equal (List.rev [ 3; 2; 1 ]) [ 1; 2; 3 ]
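
The two setup steps above map onto two fields of the library's dune stanza. A minimal sketch, assuming a library called foo (a placeholder name) that depends on base and stdio:

```dune
; dune file of the library that holds the inline tests
(library
 (name foo)
 (libraries base stdio)
 ; step 1: tell dune that this library contains inline tests
 (inline_tests)
 ; step 2: enable the syntax extension that understands let%test
 (preprocess
  (pps ppx_inline_test)))
```

With this in place, dune runtest builds and runs the tests. Other extensions, such as ppx_assert, would be added to the same pps list.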

More readable errors using test_eq #

  • a bare let%test only tells us that the test failed; the failure message doesn’t show the values involved, so it’s hard to debug from. To get that information, we can raise an exception that carries it

  • [%test_eq] does this: given a type, it generates code that tests for equality and throws a meaningful exception if the arguments are unequal. It’s a test assertion from the ppx_assert syntax extension (enabled by listing both ppx_inline_test and ppx_assert as preprocessors). Since the test body now returns unit rather than bool, the test is declared with let%test_unit

  • one problem is that using exceptions will show the backtrace, which is usually not that useful here.

open Base

let%test_unit "rev" =
  [%test_eq: int list] (List.rev [ 3; 2; 1 ]) [ 3; 2; 1 ]

(* -- the expected list above is wrong (List.rev [3; 2; 1] is [1; 2; 3]), so the test fails and the error shows the structures we care about

dune runtest
File "test.ml", line 3, characters 0-79: rev threw
(duniverse/ppx_assert/runtime-lib/runtime.ml.E "comparison failed"
  ((1 2 3) vs (3 2 1) (Loc test.ml:4:13))).
  Raised at Ppx_assert_lib__Runtime.test_eq in file "duniverse/ppx_assert/runtime-lib/runtime.ml", line 95, characters 22-69
  Called from Foo__Test.(fun) in file "test.ml", line 4, characters 13-21

FAILED 1 / 1 tests
[1]

*)

Where should tests go? #

  • it’s legal to put tests next to the code, within the lib itself, but that’s usually not a good idea:

    • readability: tests clutter the code
    • bloat
    • testing mindset: encourages testing internal functionality rather than exposed behaviour (the public API); code written this way might not be very testable in the first place
  • there are rare cases where putting the tests within the lib is a good idea (e.g. when the functionality is hard to expose)

  • otherwise, put tests into test-only libs

why inline tests can’t go into executables #

  • Dune doesn’t support the inline_tests declaration for source files that are directly part of an executable
  • the test runner would need to instantiate the modules that contain the tests – if those modules have toplevel side-effects, we don’t want the test framework to trigger them
  • solution: break up the program:
    • dir with lib that contains the logic of program but NO top-level effects
    • dir for exec that links in the library – for code launching

Expect Tests #

  • useful when the goal is to capture and make visible what the code actually does, not just to check properties of it

Basic Mechanics #

  • the source specifies both:

    • code to execute
    • expected output

    the test fails when there’s a diff between the two

  • the expect test workflow is interesting:

    1. write out the behaviour you want to inspect

      e.g.

           open! Base
           open Stdio
      
           let%expect_test "trivial" = print_endline "Hello World!" (* this will fail *)
    2. run the expect test, which will “fail” but a corrected version will be written by the test runner

      • patdiff is the standard way the diff is shown

        e.g.:

               dune runtest
                    patdiff (internal) (exit 1)
               (cd _build/default && rwo/_build/install/default/bin/patdiff -keep-whitespace -location-style omake -ascii test.ml test.ml.corrected)
               ------ test.ml
               ++++++ test.ml.corrected
               File "test.ml", line 5, characters 0-1:
                |open! Base
                |open Stdio
                |
                |let%expect_test "trivial" =
               -|  print_endline "Hello World!"
               +|  print_endline "Hello World!";
               +|  [%expect {| Hello World! |}]
               [1]
    3. inspect the corrected version; if it’s acceptable, promote it (copy test.ml.corrected over test.ml, or run dune promote) so that it becomes the test – from then on the test passes until the output changes

      example:

           open Base
           open Stdio
      
           let%expect_test "trivial" =
             print_endline "Hello World!";
             [%expect {| Hello World! |}]

      multiple expect blocks within a single test work too:

           open Base
           open Stdio
      
           let%expect_test "multi-block" =
             print_endline "Hello";
             [%expect {| Hello |}];
             print_endline "World!";
             [%expect {| World! |}]

syntactic sugar aside: using open vs open! #

  • A sensible idiom is to always use open! when opening a library like Base, so that you don’t have to choose when to use the !, and when not to.
  • open! is warning suppression syntax:
    • the warning: if we open Base but don’t directly use any values from it, the compiler warns about an unused open
    • we want to keep that “standard lib” open anyway, so that any new code we write finds Base’s modules
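
A small sketch of the difference, using the standard library's List instead of Base so that it has no dependencies. As far as I can tell, the unused-open warning is off by default in the bare compiler but enabled (and treated as an error) by dune's default dev profile, so the effect is only visible in a dune build.

```ocaml
(* Nothing below refers to anything from List without qualification, so a
   plain [open List] would be reported as an unused open when that warning
   is enabled. The ! marks the open as intentional and silences it. *)
open! List

let greeting = "hello"
let () = print_endline greeting
```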

syntax: quoted strings {| quoted string |};; #

  • allows content to be written without the usual escaping required for string literals

  • similar to raw strings in other languages such as Python

  • especially useful when writing strings containing text from another language, like HTML. With quoted strings, you can just paste in a snippet of some other source language, and it should work unmodified.

  • tricky case: what if we want to write |} inside the quoted string?

    trick: change the delim for the quoted string by adding an arbitrary identifier:

      {xxx|This is how you quote a {|quoted string|}|xxx};;
    
      (*  - : string = "This is how you quote a {|quoted string|}"  *)
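
A few checks that make the escaping rules concrete. This is a plain-OCaml sketch with no libraries; the HTML snippet is made up for the example.

```ocaml
(* The same text written as a quoted string and as an ordinary literal. *)
let html = {|<a href="http://ocaml.org">a "quoted" link</a>|}
let escaped = "<a href=\"http://ocaml.org\">a \"quoted\" link</a>"
let () = assert (String.equal html escaped)

(* Backslashes are literal inside a quoted string: two characters here. *)
let () = assert (String.length {|\n|} = 2)

(* A custom delimiter lets the text itself contain the closing marker. *)
let nested = {xxx|ends with |}|xxx}
let () = assert (String.equal nested "ends with |}")
```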

What are Expect Tests Good for? #

  • contrast with property testing: property tests are your best bet when you have a clear set of predicates that you want to test, and examples can be naturally generated at random.
  • Where expect tests shine is where you want to make visible some aspect of the behavior of your system that’s hard to capture in a predicate.

Exploratory Programming #

  • when the code path isn’t clear in our heads yet, we can explore it first using expect testing e.g. for web-scraping

      open Base
      open Stdio
    
      let get_href_hosts soup =
        Soup.select "a[href]" soup
        |> Soup.to_list
        |> List.map ~f:(Soup.R.attribute "href")
        |> Set.of_list (module String)
    
      (*-- using expect tests to demo what the function does on an example page: *)
    
      let%expect_test _ =
        let example_html =
          {|
          <html>
            Some random <b>text</b> with a
            <a href="http://ocaml.org/base">link</a>.
            And here's another
            <a href="http://github.com/ocaml/dune">link</a>.
            And here is <a>link</a> with no href.
          </html>|}
        in
        let soup = Soup.parse example_html in
        let hrefs = get_href_hosts soup in
        print_s [%sexp (hrefs : Set.M(String).t)]

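    improved version, which extracts just the host from each href (the diff at the bottom shows the expectation changing):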

      let get_href_hosts soup =
        Soup.select "a[href]" soup
        |> Soup.to_list
        |> List.map ~f:(Soup.R.attribute "href")
        |> List.filter_map ~f:(fun uri -> Uri.host (Uri.of_string uri))
        |> Set.of_list (module String)
    
      (*
        dune runtest
           patdiff (internal) (exit 1)
      ...
      ------ test.ml
      ++++++ test.ml.corrected
      File "test.ml", line 26, characters 0-1:
       |  |> Set.of_list (module String)
       |
       |[@@@part "1"] ;;
       |let%expect_test _ =
       |  let example_html = {|
       |    <html>
       |      Some random <b>text</b> with a
       |      <a href="http://ocaml.org/base">link</a>.
       |      And here's another
       |      <a href="http://github.com/ocaml/dune">link</a>.
       |      And here is <a>link</a> with no href.
       |    </html>|}
       |  in
       |  let soup = Soup.parse example_html in
       |  let hrefs = get_href_hosts soup in
       |  print_s [%sexp (hrefs : Set.M(String).t)];
      -|  [%expect {| (http://github.com/ocaml/dune http://ocaml.org/base) |}]
      +|  [%expect {| (github.com ocaml.org) |}]
      [1]
      *)
  • using expect tests, we can leave the exploratory cases and examples as tests for our benefit

Visualising Complex Behaviour #

  • can be used to examine the dynamic behaviour of a system

  • rate limiter example:

    context and setup for the example: a rate limiter module, plus some helper functions that keep the tests short

      (* -- .mli for a rate limiter module *)
      open Core
    
      type t
    
      val create : now:Time_ns.t -> period:Time_ns.Span.t -> rate:int -> t
      val maybe_consume : t -> now:Time_ns.t -> [ `Consumed | `No_capacity ]
    
      (* -- helpers for the tests (separate file, hence the second open): *)
      open Core
    
      let start_time =
        Time_ns.of_string_with_utc_offset "2021-06-01 7:00:00Z"
    
      let limiter () =
        Rate_limiter.create
          ~now:start_time
          ~period:(Time_ns.Span.of_sec 1.)
          ~rate:2
    
      let consume lim offset =
        let result =
          Rate_limiter.maybe_consume
            lim
            ~now:(Time_ns.add start_time (Time_ns.Span.of_sec offset))
        in
        printf
          "%4.2f: %s\n"
          offset
          (match result with
          | `Consumed -> "C"
          | `No_capacity -> "N")
    
    
    
      (*---- the bug that the expect tests will reveal: a faulty implementation of the drain function (excerpt from the rate limiter implementation) *)
      let rec drain_old_events t =
        match Queue.peek t.events with
        | None -> ()
        | Some time ->
          if Time_ns.Span.( < ) (Time_ns.diff t.now time) t.period
          then (
            ignore (Queue.dequeue_exn t.events : Time_ns.t);
            drain_old_events t)

    now for the expect tests on the behaviour:

      let%expect_test _ =
        let lim = limiter () in
        let consume offset = consume lim offset in
        (* Exhaust the rate limit, without advancing the clock. *)
        for _ = 1 to 3 do
          consume 0.
        done;
        [%expect {| |}];
        (* Wait until a half-second has elapsed, try again *)
        consume 0.5;
        [%expect {| |}];
        (* Wait until a full second has elapsed, try again *)
        consume 1.;
        [%expect {|  |}]
    
      (*-- suppose there's actually a bug, as alluded to above*)
      (*--  on running the expect test, then applying the promotions to the expect tests ( the "improvements" ), we get THE WRONG VERSION OF THIS EXPECT TEST:*)
      let%expect_test _ =
        let lim = limiter () in
        let consume offset = consume lim offset in
        (* Exhaust the rate limit, without advancing the clock. *)
        for _ = 1 to 3 do
          consume 0.
        done;
        [%expect {|
          0.00: C
          0.00: C
          0.00: C |}];
        (* Wait until a half-second has elapsed, try again *)
        consume 0.5;
        [%expect {| 0.50: C |}];
        (* Wait until a full second has elapsed, try again *)
        consume 1.;
        [%expect {| 1.00: C |}]
    
    
      (*--- now that we realise the bug has to be fixed, on fixing it, the expect test gets updated to be: *)
      let%expect_test _ =
        let lim = limiter () in
        let consume offset = consume lim offset in
        (* Exhaust the rate limit, without advancing the clock. *)
        for _ = 1 to 3 do
          consume 0.
        done;
        [%expect {|
          0.00: C
          0.00: C
          0.00: N |}];
        (* Wait until a half-second has elapsed, try again *)
        consume 0.5;
        [%expect {| 0.50: N |}];
        (* Wait until a full second has elapsed, try again *)
        consume 1.;
        [%expect {| 1.00: C |}]

    an aside on test readability: good helper functions (limiter, consume) are what keep the test code this concise.
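
To see why the comparison matters, here is a stdlib-only sketch of a sliding-window limiter. It is not the book's implementation: timestamps are plain floats rather than Time_ns.t, now is passed in instead of being stored, and the comparison >= is my assumption, chosen because it is consistent with the corrected expect output above.

```ocaml
type t = {
  events : float Queue.t; (* times of consumed events, oldest first *)
  period : float;
  rate : int;
}

let create ~period ~rate = { events = Queue.create (); period; rate }

(* Drop events that are at least [period] old. The faulty version in the
   notes did the opposite and dropped events younger than [period]. *)
let rec drain_old_events t ~now =
  match Queue.peek_opt t.events with
  | Some time when now -. time >= t.period ->
    ignore (Queue.pop t.events : float);
    drain_old_events t ~now
  | Some _ | None -> ()

let maybe_consume t ~now =
  drain_old_events t ~now;
  if Queue.length t.events < t.rate then (
    Queue.push now t.events;
    `Consumed)
  else `No_capacity

let show = function `Consumed -> "C" | `No_capacity -> "N"

let () =
  let lim = create ~period:1. ~rate:2 in
  List.iter
    (fun now -> Printf.printf "%4.2f: %s\n" now (show (maybe_consume lim ~now)))
    [ 0.; 0.; 0.; 0.5; 1. ]
```

Run as a program, this prints C, C, N at 0.00, then N at 0.50 and C at 1.00, matching the fixed expect test. With the inverted comparison the queue is emptied on every call, which is why the buggy run showed C everywhere.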

End-to-end tests #

  • the test examples so far have been deterministic: no real IO, no interaction with system resources.
  • expect tests can still be useful when we’ve got multiple processes interacting with each other and using real IO

setup #

  • the inline_tests declaration gets a deps entry for the binary that we wish to test
  • we write some helper functions to make life easy; here’s an example signature for them:

      open! Core
      open Async
    
      (** Launches the echo server *)
      val launch : port:int -> uppercase:bool -> Process.t Deferred.t
    
      (** Connects to the echo server, returning a reader and writer for
         communicating with the server. *)
      val connect : port:int -> (Reader.t * Writer.t) Deferred.t
    
      (** Sends data to the server, printing out the result  *)
      val send_data : Reader.t -> Writer.t -> string -> unit Deferred.t
    
      (** Kills the echo server, and waits until it exits  *)
      val cleanup : Process.t -> unit Deferred.t
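
The deps entry mentioned above might look like the following. This is a sketch under assumptions: the library name echo_test and the path ../bin/echo.exe are placeholders, and ppx_jane is used as a convenient bundle that includes the expect-test and let-syntax extensions.

```dune
; test/dune (names and paths are placeholders)
(library
 (name echo_test)
 (libraries core async)
 (preprocess
  (pps ppx_jane))
 ; make sure the server binary is built before the tests run
 (inline_tests
  (deps ../bin/echo.exe)))
```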

test example #

Using the helper functions, we set up an expect test. Note that we put expect annotations wherever we want to see data; they start out unfilled and get filled in by promotion (the version below is already promoted):

open! Core
open Async
open Helpers

let%expect_test "test uppercase echo" =
  let port = 8081 in
  let%bind process  = launch ~port ~uppercase:true in
  Monitor.protect (fun () ->
      (* wait for the server to start up, so that the test doesn't fail just because it connected too early *)
      let%bind () = Clock.after (Time.Span.of_sec 1.) in
      let%bind (r,w) = connect ~port in
      let%bind () = send_data r w "one two three\n" in
      let%bind () = [%expect{| ONE TWO THREE |}] in
      let%bind () = send_data r w "one 2 three\n" in
      let%bind () = [%expect{| ONE 2 THREE |}] in
      return ())
    ~finally:(fun () -> cleanup process)

This solution is an example of how real I/O makes things messy, fast, because of the lack of determinism.

The fixed one-second wait is ugly; an improvement is to drop it and put retry logic inside connect instead:

let rec connect ~port =
  match%bind
    Monitor.try_with (fun () ->
        Tcp.connect
          (Tcp.Where_to_connect.of_host_and_port
             { host = "localhost"; port }))
  with
  | Ok (_, r, w) -> return (r, w)
  | Error _ ->
    let%bind () = Clock.after (Time.Span.of_sec 0.01) in
    connect ~port

  • ideally we write tests that don’t need such real dependencies, but when we do need them, expect tests still work.
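
The connect loop above retries forever, so a server that never comes up would hang the test. One possible variation is to bound the number of attempts. The sketch below shows only that idea in plain OCaml, without Async and without the delay between attempts; retry and flaky are hypothetical names, not part of the book's helpers.

```ocaml
(* Call [f] up to [attempts] times, returning the first success
   or the last error. *)
let rec retry ~attempts f =
  match f () with
  | Ok _ as ok -> ok
  | Error _ when attempts > 1 -> retry ~attempts:(attempts - 1) f
  | Error _ as err -> err

let () =
  (* Stand-in for a connection that is refused twice before succeeding. *)
  let calls = ref 0 in
  let flaky () =
    incr calls;
    if !calls < 3 then Error "connection refused" else Ok !calls
  in
  match retry ~attempts:5 flaky with
  | Ok n -> Printf.printf "connected after %d attempts\n" n
  | Error msg -> Printf.printf "gave up: %s\n" msg
```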

How to Write a Good Expect Test #

Some guidelines:

  1. write helper functions

    helps with setup and isolates out actual test-logic from harnessing and scaffolding

  2. write custom pretty-printers

    • should surface the info that we need to see in the test

    • easier to read, minimises churn when details are irrelevant to the test change

  3. aim for determinism

    • avoid interacting with the outside world, which is generally the source of non-determinism

    • if non-determinism is really needed, avoid:

      • timeouts and stopgaps that would make the tests slow
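
Guideline 2 can be illustrated with a small sketch. The order type and its fields are invented for the example; the point is that the printer leaves out fields that differ from run to run, so the expected output stays stable and readable.

```ocaml
type order = {
  id : int;           (* assigned elsewhere, may differ per run *)
  created_at : float; (* wall-clock time, differs per run *)
  customer : string;
  total_cents : int;
}

(* Show only what the test is about, in a stable form. *)
let summary o =
  Printf.sprintf "%s: %d.%02d" o.customer (o.total_cents / 100)
    (o.total_cents mod 100)

let () =
  print_endline
    (summary
       { id = 4711; created_at = 1.6e9; customer = "alice"; total_cents = 1999 })
```

In an expect test, the print_endline call would be followed by an [%expect] block holding just the summary line.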

Property Testing – with Quickcheck #

  • property testing: instead of hand-picking examples, state a property as a simple assertion and check it against a large number of randomly generated inputs
  • two things needed by a property test:
    1. a function that takes an example input and checks that some property holds

    2. a way of generating random examples – the probability distribution matters here – we can let Quickcheck handle it for us.

      in the example below, Quickcheck’s generator includes the min_value case, which behaves unexpectedly from our POV: it’s an edge case that needs to be considered explicitly.

      the cause is two’s complement: “the negation of min_value is equal to itself”

      example:

             open Base
      
      
             (* naive version: inputs are drawn from a simple uniform distribution *)
             let%test_unit "negation flips the sign" =
               for _ = 0 to 100_000 do
                 let x = Random.int_incl Int.min_value Int.max_value in
                 [%test_eq: Sign.t]
                   (Int.sign (Int.neg x))
                   (Sign.flip (Int.sign x))
               done
      
             (* -- using Quickcheck, whose generators put extra weight on special cases *)
             open Core
      
             let%test_unit "negation flips the sign" =
               Quickcheck.test
                 ~sexp_of:[%sexp_of: int]
                 (Int.gen_incl Int.min_value Int.max_value)
                 ~f:(fun x ->
                   [%test_eq: Sign.t]
                     (Int.sign (Int.neg x))
                     (Sign.flip (Int.sign x)))
      
             (* this uncovers the following edge case: because of 2s complement, "the negation of min_value is equal to itself" *)
      
             (*
             dune runtest
             File "test.ml", line 3, characters 0-244: negation flips the sign threw
             ("Base_quickcheck.Test.run: test failed" (input -4611686018427387904)
               (error
                 ((duniverse/ppx_assert/runtime-lib/runtime.ml.E "comparison failed"
                    (Neg vs Pos (Loc test.ml:7:19)))
                    "Raised at Ppx_assert_lib__Runtime.failwith in file \"duniverse/ppx_assert/runtime-lib/runtime.ml\", line 28, characters 28-53\
                   \nCalled from Base__Or_error.try_with in file \"duniverse/base/src/or_error.ml\", line 76, characters 9-15\
                   \n"))).
               Raised at Base__Exn.protectx in file "duniverse/base/src/exn.ml", line 71, characters 4-114
               Called from Ppx_inline_test_lib__Runtime.time_and_reset_random_seeds in file "duniverse/ppx_inline_test/runtime-lib/runtime.ml", line 356, characters 15-52
               Called from Ppx_inline_test_lib__Runtime.test in file "duniverse/ppx_inline_test/runtime-lib/runtime.ml", line 444, characters 52-83
      
             FAILED 1 / 1 tests
             *)
    Quickcheck’s decision to put much larger weight on special cases is what allowed us to discover this unexpected behavior. Note that in this case, it’s not really a bug that we’ve uncovered, it’s just that the property that we thought would hold can’t in practice. But either way, Quickcheck helped us understand the behavior of our code better.
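
The edge case can be reproduced without Quickcheck. A stdlib-only sketch: it uses the standard library's min_int and Int.neg in place of Base's Int.min_value and Int.neg, and a hand-written sign function in place of Int.sign.

```ocaml
let sign x = if x < 0 then -1 else if x > 0 then 1 else 0

let () =
  (* For ordinary values, negation flips the sign. *)
  assert (sign (Int.neg 42) = Int.neg (sign 42));
  assert (sign (Int.neg max_int) = Int.neg (sign max_int));
  (* min_int has no positive counterpart, so negating it wraps around
     to itself and the sign stays negative. *)
  assert (Int.neg min_int = min_int);
  assert (sign (Int.neg min_int) = sign min_int)
```

A uniform draw over the whole int range will almost never hit min_int, which is why the naive loop passes while the Quickcheck version fails.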

Handling Complex Types #

  • we typically need more than just the atomic types to be generated for testing
  • Quickcheck has combinators that serve this need:
    • pair generator example
      
          open Core
      
          let gen_int_list_pair =
            let int_list_gen =
              List.gen_non_empty (Int.gen_incl Int.min_value Int.max_value)
            in
            (* both creates a generator for pairs from two generators of the constituent types *)
            Quickcheck.Generator.both int_list_gen int_list_gen
      
          let%test_unit "List.rev_append is List.append of List.rev" =
            Quickcheck.test
              ~sexp_of:[%sexp_of: int list * int list]
              gen_int_list_pair
              ~f:(fun (l1, l2) ->
                [%test_eq: int list]
                  (List.rev_append l1 l2)
                  (List.append (List.rev l1) l2))
      there’s a ppx to simplify this process:
      
          open Core
      
          let%test_unit "List.rev_append is List.append of List.rev" =
            Quickcheck.test
              ~sexp_of:[%sexp_of: int list * int list]
              [%quickcheck.generator: int list * int list]
              ~f:(fun (l1, l2) ->
                [%test_eq: int list]
                  (List.rev_append l1 l2)
                  (List.append (List.rev l1) l2))
    • variants example
      
          type shape =
            | Circle of { radius: float }
            | Rect of { height: float; width: float }
            | Poly of (float * float) list
          [@@deriving quickcheck];;
      
      
          type shape =
            | Circle of { radius: float } [@quickcheck.weight 0.5]
            | Rect of { height: float; width: float }
            | Poly of (float * float) list
          [@@deriving quickcheck];;
      
      
          (* -- understanding the weights:
             The default weight on each case is 1, so the weights now sum to 0.5 + 1 + 1 = 2.5 and Circle is generated with probability 0.5 / 2.5 = 0.2, instead of the 1/3 it would have by default.
          *)
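
The arithmetic in the comment above, as a tiny sketch. It assumes, as the note describes, that each constructor is picked with probability proportional to its weight; probabilities is a helper written for this example, not a Quickcheck function.

```ocaml
(* Turn per-constructor weights into selection probabilities. *)
let probabilities weights =
  let total = List.fold_left ( +. ) 0. weights in
  List.map (fun w -> w /. total) weights

let () =
  (* Circle is weighted 0.5; Rect and Poly keep the default weight of 1. *)
  probabilities [ 0.5; 1.; 1. ]
  |> List.iter (fun p -> Printf.printf "%.2f\n" p)
```

This prints 0.20, 0.40 and 0.40, so lowering one weight also raises the share of the other two cases.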

More control with Let-syntax #

  • quickcheck generators form a monad, which we can exploit when the ppx annotations don’t give us enough control

  • In combination with let syntax, the generator monad gives us a convenient way to specify generators for custom types.

    Throughout this function we’re making choices about the probability distribution.

      let gen_shape =
        let open Quickcheck.Generator.Let_syntax in
        let module G = Base_quickcheck.Generator in
        let circle =
          let%map radius = G.float_positive_or_zero in
          Circle { radius }
        in
        let rect =
          let%bind height = G.float_positive_or_zero in
          let%map width = G.float_inclusive height Float.infinity in
          Rect { height; width }
        in
        let poly =
          let%map points =
            List.gen_non_empty
              (G.both G.float_positive_or_zero G.float_positive_or_zero)
          in
          Poly points
        in
        G.union [ circle; rect; poly ];;

Other Testing Tools #

Other Tools to do (Mostly) the same Things #

  • Alcotest: different system for registering and test-running
  • there’s a bunch of others, see the section from the online textbook here

Fuzzing #

  • beyond just (traditional) random fuzzing, which works by mutating inputs, we can do instrumentation-guided fuzzing
  • this means instrumenting the program, and then using that instrumentation to guide the randomization in the direction of more code coverage.
    • e.g. AFL; for the OCaml ecosystem, there’s Crowbar and Bun

Reference Repos #

ocaml-lsp #

This is a great reference because it has a comprehensive test suite, with a sound setup and utility scaffolding that make the tests easy to read and understand. It also aligns with Jane Street conventions when it comes to ppx usage and such.

observations #

  1. the use of util modules within test files.

    most of them have util modules with setup and teardown helpers.

  2. the dune setup configs look a little complex because of a mix of old and new test suites; some areas are version-dependent as well (see this example, where there’s a version pin).

    There’s also handling of transitive deps – see example here

    The e2e-new tests have their dune file written here (permalink), which seems to be the typical way to set this up.

  3. Interestingly, there’s a larger entrypoint for the e2e tests, which spins up a sys command to call yarn and run the tests (ref). That’s because the e2e tests seem to be run from JS/TS, which spins up a harness for tests written in Jest (the JS testing framework).