Chapter 17: Testing

OCaml’s type system is so strong that a large class of bugs simply can’t compile — which changes what testing is for. This chapter covers the OCaml testing ecosystem: inline expect tests via ppx_expect, property-based testing, and the general discipline of writing tests that complement rather than duplicate what the type system already proves. The snap-together quality of well-typed OCaml code means testing effort concentrates where it matters: the interesting invariants, not the trivial ones.

Figure 1: This is what happens if you don’t write tests: *The Court of the Crimson King, by King Crimson*

mental model for why testing still matters under a strong type system like OCaml’s:
- types help get more value out of testing effort: there’s less need to test the trivial properties that the type system enforces automatically
- the rigidity of the type system gives code a snap-together quality,
  so a relatively small number of tests can do an outsized amount to ensure that the code behaves as expected.
Ideal characteristics of tests:
- easy to write and run
  easy to set up, run automatically, and use to prevent regressions
- easy to update
  so that updating tests doesn’t become its own kind of tech debt
- fast
  so that focus isn’t broken during development
- deterministic
- understandable

Testing

Inline Tests #

setup:
- let dune know to expect inline tests to show up in the lib
- enable ppx_inline_test as a preprocessor
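
A minimal sketch of the dune stanza this setup implies, assuming a library named `foo` (a hypothetical name) that depends on `base` and `stdio`:

```dune
(library
 (name foo)                          ; hypothetical library name
 (libraries base stdio)
 (inline_tests)                      ; tell dune to expect inline tests here
 (preprocess (pps ppx_inline_test))) ; enable the ppx as a preprocessor
```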

the dune test runner handles running the inline tests (via dune runtest); the test bodies don’t get executed otherwise

e.g. basic inline test:

  open Base

  (*just needs the RHS to return true for the inline test to pass. *)
  let%test "rev" =
    List.equal Int.equal (List.rev [ 3; 2; 1 ]) [ 1; 2; 3 ]

Where should tests go? #

it’s legal to put tests next to the code, within the lib itself, but that’s usually not a good idea:
- readability: tests interleaved with the code make both harder to read
- bloat: the test code and its dependencies end up in the library
- testing mindset: it encourages testing internal functionality rather than exposed behaviour (the public API); code that can only be tested from the inside might not be very testable in the first place
there are rare cases where putting the tests within the lib is a good idea (e.g. when the functionality is hard to expose)
by default, tests should go into test-only libs

why inline tests can’t go into executables #

Dune doesn’t support the inline_tests declaration for source files that are directly part of an executable
the test runner would need to instantiate the modules that contain the tests; if those modules have toplevel side effects, you don’t want the test framework to trigger them
solution: break up the program:
- dir with lib that contains the logic of program but NO top-level effects
- dir for exec that links in the library – for code launching
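
One way this split might look in dune, as a sketch; the directory names `lib/` and `bin/` and the names `my_app` and `main` are hypothetical:

```dune
; lib/dune: the program's logic, with no top-level effects, plus its inline tests
(library
 (name my_app)
 (inline_tests)
 (preprocess (pps ppx_inline_test)))

; bin/dune: a thin executable that links in the library and launches it
(executable
 (name main)
 (libraries my_app))
```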

Expect Tests #

use these when the goal is to capture the code’s behaviour and make it visible, not just to check properties

Basic Mechanics #

the source specifies both:
- code to execute
- expected output
the test fails when there’s a diff between the two
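
To make the mechanics concrete, here is a toy model in plain OCaml. It is not how ppx_expect is implemented; `run_expect` and the `outcome` type are hypothetical names used only for illustration. The test body writes to a buffer, the captured output is compared with the expectation, and a mismatch yields the corrected text that a runner would offer back.

```ocaml
(* Toy model of an expect test: compare captured output with an expectation.
   All names here are hypothetical; this is not the ppx_expect implementation. *)
type outcome =
  | Pass
  | Corrected of string (* the output that should replace the expectation *)

let run_expect ~expected (body : (string -> unit) -> unit) : outcome =
  let buf = Buffer.create 64 in
  (* The body prints by calling the function it is handed. *)
  body (Buffer.add_string buf);
  let actual = String.trim (Buffer.contents buf) in
  if String.equal actual (String.trim expected) then Pass else Corrected actual

(* An empty expectation fails at first and yields the text to promote. *)
let first_run = run_expect ~expected:"" (fun print -> print "Hello World!\n")

(* After promotion, the same body passes. *)
let second_run =
  run_expect ~expected:" Hello World! " (fun print -> print "Hello World!\n")
```

Here `first_run` carries the corrected text and `second_run` is a pass, which mirrors the fail-then-promote workflow described in the following steps.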

the expect test workflow is interesting:

write out the behaviour you want to inspect

e.g.

     open! Base
     open Stdio

     let%expect_test "trivial" = print_endline "Hello World!" (* this will fail *)

run the expect test, which will “fail” but a corrected version will be written by the test runner

patdiff is the standard way the diff is shown

e.g.:

       dune runtest
            patdiff (internal) (exit 1)
       (cd _build/default && rwo/_build/install/default/bin/patdiff -keep-whitespace -location-style omake -ascii test.ml test.ml.corrected)
       ------ test.ml
       ++++++ test.ml.corrected
       File "test.ml", line 5, characters 0-1:
        |open! Base
        |open Stdio
        |
        |let%expect_test "trivial" =
       -|  print_endline "Hello World!"
       +|  print_endline "Hello World!";
       +|  [%expect {| Hello World! |}]
       [1]

inspect the corrected version; if it’s acceptable, copy it over the original (dune promote does this for us) and let that be the test. From then on the test passes for as long as the output stays the same

example:

     open Base
     open Stdio

     let%expect_test "trivial" =
       print_endline "Hello World!";
       [%expect {| Hello World! |}]

multiple expect blocks within a single test work too

     open Base
     open Stdio

     let%expect_test "multi-block" =
       print_endline "Hello";
       [%expect {| Hello |}];
       print_endline "World!";
       [%expect {| World! |}]

syntactic sugar aside: using `open` vs `open!` #

A sensible idiom is to always use open! when opening a library like Base, so that you don’t have to decide case by case whether the ! is needed.
the ! is a warning suppression syntax:
- the compiler warns about an open whose contents aren’t used, which happens if we don’t directly use any values from Base
- we still want that “standard lib” open, so that any new code we write picks up Base’s modules

syntax: quoted strings `{| quoted string |};;` #

allows content to be written without the usual escaping required for string literals
similar to raw strings in other languages such as Python
especially useful when writing strings containing text from another language, like HTML. With quoted strings, you can just paste in a snippet of some other source language, and it should work unmodified.
tricky case: what if we want to write |} inside the quoted string?
trick: change the delim for the quoted string by adding an arbitrary identifier:
```
  {xxx|This is how you quote a {|quoted string|}|xxx};;

  (*  - : string = "This is how you quote a {|quoted string|}"  *)
```
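
A small check of these claims in plain OCaml (the bindings `html`, `raw` and `nested` are just illustrative names): escapes are not interpreted inside a quoted string, and a custom delimiter lets the closing marker appear in the content.

```ocaml
(* Double quotes need no escaping inside a quoted string. *)
let html = {|<a href="http://ocaml.org">it's a "link"</a>|}

(* Backslash sequences are kept literally: this is 4 characters, not 3. *)
let raw = {|a\nb|}

(* A custom delimiter (here xxx) allows the usual delimiters in the content. *)
let nested = {xxx|a {|quoted string|} inside|xxx}

let () =
  assert (String.equal html "<a href=\"http://ocaml.org\">it's a \"link\"</a>");
  assert (String.length raw = 4);
  assert (String.equal nested "a {|quoted string|} inside")
```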

What are Expect Tests Good for? #

contrast this with property testing: property tests are your best bet when you have a clear set of predicates that you want to test, and examples can be naturally generated at random.
Where expect tests shine is where you want to make visible some aspect of the behavior of your system that’s hard to capture in a predicate.

Exploratory Programming #

when the code path isn’t clear in our heads yet, we can explore it first using expect testing e.g. for web-scraping

  open Base
  open Stdio

  let get_href_hosts soup =
    Soup.select "a[href]" soup
    |> Soup.to_list
    |> List.map ~f:(Soup.R.attribute "href")
    |> Set.of_list (module String)

  (*-- using expect tests to demo what the function does on an example page: *)

  let%expect_test _ =
    let example_html =
      {|
      <html>
        Some random <b>text</b> with a
        <a href="http://ocaml.org/base">link</a>.
        And here's another
        <a href="http://github.com/ocaml/dune">link</a>.
        And here is <a>link</a> with no href.
      </html>|}
    in
    let soup = Soup.parse example_html in
    let hrefs = get_href_hosts soup in
    print_s [%sexp (hrefs : Set.M(String).t)]

improved version:

  let get_href_hosts soup =
    Soup.select "a[href]" soup
    |> Soup.to_list
    |> List.map ~f:(Soup.R.attribute "href")
    |> List.filter_map ~f:(fun uri -> Uri.host (Uri.of_string uri))
    |> Set.of_list (module String)

  (*
    dune runtest
       patdiff (internal) (exit 1)
  ...
  ------ test.ml
  ++++++ test.ml.corrected
  File "test.ml", line 26, characters 0-1:
   |  |> Set.of_list (module String)
   |
   |[@@@part "1"] ;;
   |let%expect_test _ =
   |  let example_html = {|
   |    <html>
   |      Some random <b>text</b> with a
   |      <a href="http://ocaml.org/base">link</a>.
   |      And here's another
   |      <a href="http://github.com/ocaml/dune">link</a>.
   |      And here is <a>link</a> with no href.
   |    </html>|}
   |  in
   |  let soup = Soup.parse example_html in
   |  let hrefs = get_href_hosts soup in
   |  print_s [%sexp (hrefs : Set.M(String).t)];
  -|  [%expect {| (http://github.com/ocaml/dune http://ocaml.org/base) |}]
  +|  [%expect {| (github.com ocaml.org) |}]
  [1]
  *)

using expect tests, we can leave the exploratory cases and examples as tests for our benefit

Visualising Complex Behaviour #

can be used to examine the dynamic behaviour of a system

rate limiter example:

here’s the context and setup for the example: the interface of a rate limiter module, some helper functions to keep the tests short, and the faulty implementation of the drain function that the tests will expose

  (* -- .mli for a rate limiter module *)
  open Core

  type t

  val create : now:Time_ns.t -> period:Time_ns.Span.t -> rate:int -> t
  val maybe_consume : t -> now:Time_ns.t -> [ `Consumed | `No_capacity ]

  (* -- some helper functions for the example: *)
  open Core

  let start_time =
    Time_ns.of_string_with_utc_offset "2021-06-01 7:00:00Z"

  let limiter () =
    Rate_limiter.create
      ~now:start_time
      ~period:(Time_ns.Span.of_sec 1.)
      ~rate:2

  let consume lim offset =
    let result =
      Rate_limiter.maybe_consume
        lim
        ~now:(Time_ns.add start_time (Time_ns.Span.of_sec offset))
    in
    printf
      "%4.2f: %s\n"
      offset
      (match result with
      | `Consumed -> "C"
      | `No_capacity -> "N")



  (*---- example bug that the expect tests will reveal -- faulty implementation of the drainage function: *)
  let rec drain_old_events t =
    match Queue.peek t.events with
    | None -> ()
    | Some time ->
      if Time_ns.Span.( < ) (Time_ns.diff t.now time) t.period
      then (
        ignore (Queue.dequeue_exn t.events : Time_ns.t);
        drain_old_events t)

now for the expect tests on the behaviour:

  let%expect_test _ =
    let lim = limiter () in
    let consume offset = consume lim offset in
    (* Exhaust the rate limit, without advancing the clock. *)
    for _ = 1 to 3 do
      consume 0.
    done;
    [%expect {| |}];
    (* Wait until a half-second has elapsed, try again *)
    consume 0.5;
    [%expect {| |}];
    (* Wait until a full second has elapsed, try again *)
    consume 1.;
    [%expect {|  |}]

  (*-- suppose there's actually a bug, as alluded to above*)
  (*--  on running the expect test, then applying the promotions to the expect tests ( the "improvements" ), we get THE WRONG VERSION OF THIS EXPECT TEST:*)
  let%expect_test _ =
    let lim = limiter () in
    let consume offset = consume lim offset in
    (* Exhaust the rate limit, without advancing the clock. *)
    for _ = 1 to 3 do
      consume 0.
    done;
    [%expect {|
      0.00: C
      0.00: C
      0.00: C |}];
    (* Wait until a half-second has elapsed, try again *)
    consume 0.5;
    [%expect {| 0.50: C |}];
    (* Wait until a full second has elapsed, try again *)
    consume 1.;
    [%expect {| 1.00: C |}]


  (*--- now that we realise the bug has to be fixed, on fixing it, the expect test gets updated to be: *)
  let%expect_test _ =
    let lim = limiter () in
    let consume offset = consume lim offset in
    (* Exhaust the rate limit, without advancing the clock. *)
    for _ = 1 to 3 do
      consume 0.
    done;
    [%expect {|
      0.00: C
      0.00: C
      0.00: N |}];
    (* Wait until a half-second has elapsed, try again *)
    consume 0.5;
    [%expect {| 0.50: N |}];
    (* Wait until a full second has elapsed, try again *)
    consume 1.;
    [%expect {| 1.00: C |}]
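
The notes do not show the fix itself. As a sketch under stated assumptions, here is a simplified stand-in for the rate limiter in plain OCaml: times are floats in seconds rather than `Time_ns.t`, the queue is `Stdlib.Queue`, and the record mirrors the fields used by `drain_old_events` above (`events`, `now`, `period`) plus an assumed `rate` field. The only substantive change from the faulty version is the direction of the comparison. The choice of `>=` rather than `>` is an assumption, made so that the trace matches the expected output above.

```ocaml
(* Simplified stand-in for the rate limiter (hypothetical representation):
   timestamps are floats in seconds, the queue is Stdlib.Queue. *)
type t = {
  events : float Queue.t; (* times of the events consumed so far *)
  period : float;
  rate : int;
  mutable now : float;
}

let create ~now ~period ~rate = { events = Queue.create (); period; rate; now }

let rec drain_old_events t =
  match Queue.peek_opt t.events with
  | None -> ()
  | Some time ->
    (* Fixed direction: discard events that are at least one period old.
       The faulty version discarded the events that were younger than that. *)
    if t.now -. time >= t.period then (
      ignore (Queue.pop t.events : float);
      drain_old_events t)

let maybe_consume t ~now =
  t.now <- now;
  drain_old_events t;
  if Queue.length t.events < t.rate then (
    Queue.push now t.events;
    `Consumed)
  else `No_capacity

(* Replays a list of offsets and records C or N for each attempt. *)
let trace offsets =
  let lim = create ~now:0. ~period:1. ~rate:2 in
  List.rev
    (List.fold_left
       (fun acc now ->
         let mark =
           match maybe_consume lim ~now with
           | `Consumed -> "C"
           | `No_capacity -> "N"
         in
         mark :: acc)
       [] offsets)
```

Calling `trace [ 0.; 0.; 0.; 0.5; 1. ]` replays the same offsets as the expect test and yields C, C, N, N, C.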

an aside on test readability: good helper functions are what kept the test code concise.

End-to-end tests #

the test examples so far have been deterministic, with no real I/O and no interaction with system resources.
expect tests can still be useful when we’ve got multiple processes interacting with each other and using real IO

setup #

the inline_tests declaration in the dune file gets a deps field naming the binary that we wish to test
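
A sketch of what that dune declaration might look like, assuming the test library is called `echo_test` and the server binary is built at `../bin/echo.exe` (both names are hypothetical):

```dune
(library
 (name echo_test)                       ; hypothetical test library
 (libraries core async)
 (preprocess (pps ppx_jane))
 (inline_tests (deps ../bin/echo.exe))) ; make sure the binary is built before the tests run
```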

we have to write some helper functions to make life easy, here’s an example of signature for it:

  open! Core
  open Async

  (** Launches the echo server *)
  val launch : port:int -> uppercase:bool -> Process.t Deferred.t

  (** Connects to the echo server, returning a reader and writer for
     communicating with the server. *)
  val connect : port:int -> (Reader.t * Writer.t) Deferred.t

  (** Sends data to the server, printing out the result  *)
  val send_data : Reader.t -> Writer.t -> string -> unit Deferred.t

  (** Kills the echo server, and waits until it exits  *)
  val cleanup : Process.t -> unit Deferred.t

test example #

Using the helper functions, we set up an expect test. Note that the expect annotations go wherever we want to see data; they can start out empty and be filled in by promotion (the listing below shows them after promotion)

open! Core
open Async
open Helpers

let%expect_test "test uppercase echo" =
  let port = 8081 in
  let%bind process  = launch ~port ~uppercase:true in
  Monitor.protect (fun () ->
      (* give the server time to start, so that the test does not error out on the first connection attempt *)
      let%bind () = Clock.after (Time.Span.of_sec 1.) in
      let%bind (r,w) = connect ~port in
      let%bind () = send_data r w "one two three\n" in
      let%bind () = [%expect{| ONE TWO THREE |}] in
      let%bind () = send_data r w "one 2 three\n" in
      let%bind () = [%expect{| ONE 2 THREE |}] in
      return ())
    ~finally:(fun () -> cleanup process)

this solution is an example of how real I/O makes things messy, fast – because of the lack of determinism.

The fixed timeout is ugly here; we can improve on it by wrapping the connection attempt in some retry logic:

let rec connect ~port =
  match%bind
    Monitor.try_with (fun () ->
        Tcp.connect
          (Tcp.Where_to_connect.of_host_and_port
             { host = "localhost"; port }))
  with
  | Ok (_, r, w) -> return (r, w)
  | Error _ ->
    let%bind () = Clock.after (Time.Span.of_sec 0.01) in
    connect ~port
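
One caveat with the loop above is that it retries forever if the server never comes up. Below is a sketch of a bounded variant, written in plain OCaml without Async so that it is self-contained; `retry`, `max_attempts` and the `Result`-returning `attempt` function are hypothetical names, and a real Async version would still wait between attempts as the code above does.

```ocaml
(* Bounded retry: call [attempt] up to [max_attempts] times and return the
   first success, or the last error if every attempt fails. *)
let rec retry ~max_attempts attempt =
  match attempt () with
  | Ok v -> Ok v
  | Error e ->
    if max_attempts <= 1 then Error e
    else retry ~max_attempts:(max_attempts - 1) attempt

(* Example: an operation that only succeeds on its third call. *)
let calls = ref 0

let flaky () =
  incr calls;
  if !calls < 3 then Error "not ready yet" else Ok !calls

let result = retry ~max_attempts:5 flaky
```

With a cap in place, a server that fails to start shows up as a test failure rather than a hung test run.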

where possible, we should write code so that it can be tested without such real dependencies; expect tests remain usable when real I/O can’t be avoided.

How to Write a Good Expect Test #

Some guidelines:

- write helper functions
  helps with setup and separates the actual test logic from harnessing and scaffolding
- write custom pretty-printers
  - should surface the info that we need to see in the test
  - easier to read, and minimises churn when details irrelevant to the test change
- aim for determinism
  - avoid interacting with the outside world (generally the source of non-determinism)
  - if non-determinism is really needed, avoid timeouts and stopgaps that slow the tests down
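
As an illustration of the pretty-printer guideline, here is a sketch in plain OCaml with a hypothetical `order` type: the printer surfaces only what the test cares about and leaves out fields such as ids and timestamps that vary between runs, so the expected output stays stable.

```ocaml
(* Hypothetical record with a mix of stable and run-dependent fields. *)
type order = {
  id : int;           (* assigned at runtime; not interesting to the test *)
  created_at : float; (* wall-clock time; differs on every run *)
  items : string list;
}

(* Test-only printer: shows just the part of the value under test. *)
let summarize (o : order) : string =
  Printf.sprintf "%d item(s): %s" (List.length o.items)
    (String.concat ", " o.items)

let example =
  summarize { id = 42; created_at = 1.7e9; items = [ "tea"; "milk" ] }
```

Printing `example` inside an expect test would show only the item count and names, so a change to how ids or timestamps are produced would not churn the expectation.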

Property Testing – with Quickcheck #

property testing: checking that a property holds across many randomly generated inputs, rather than asserting on a few hand-picked examples

two things needed by a property test:

- a function that takes an example input and checks that some property holds on it
- a way of generating random examples; the probability distribution matters here, and we can let Quickcheck handle it for us.

in the example below, Quickcheck’s generator includes the min_value case, which behaves unexpectedly from our POV: it’s an edge case that has to be considered explicitly.

this is because of two’s complement: “the negation of min_value is equal to itself”

example:

       open Base


       (* simple test check where the choice of inputs follow a rudimentary distribution (simple, uniform distribution) *)
       let%test_unit "negation flips the sign" =
         for _ = 0 to 100_000 do
           let x = Random.int_incl Int.min_value Int.max_value in
           [%test_eq: Sign.t]
             (Int.sign (Int.neg x))
             (Sign.flip (Int.sign x))
         done

       (* -- using Quickcheck, whose generators give extra weight to special cases such as min_value *)
       open Core

       let%test_unit "negation flips the sign" =
         Quickcheck.test
           ~sexp_of:[%sexp_of: int]
           (Int.gen_incl Int.min_value Int.max_value)
           ~f:(fun x ->
             [%test_eq: Sign.t]
               (Int.sign (Int.neg x))
               (Sign.flip (Int.sign x)))

       (* which can uncover the following edge case. Under 2s complement, "the negation of min_value is equal to itself": *)

       (*
       dune runtest
       File "test.ml", line 3, characters 0-244: negation flips the sign threw
       ("Base_quickcheck.Test.run: test failed" (input -4611686018427387904)
         (error
           ((duniverse/ppx_assert/runtime-lib/runtime.ml.E "comparison failed"
              (Neg vs Pos (Loc test.ml:7:19)))
              "Raised at Ppx_assert_lib__Runtime.failwith in file \"duniverse/ppx_assert/runtime-lib/runtime.ml\", line 28, characters 28-53\
             \nCalled from Base__Or_error.try_with in file \"duniverse/base/src/or_error.ml\", line 76, characters 9-15\
             \n"))).
         Raised at Base__Exn.protectx in file "duniverse/base/src/exn.ml", line 71, characters 4-114
         Called from Ppx_inline_test_lib__Runtime.time_and_reset_random_seeds in file "duniverse/ppx_inline_test/runtime-lib/runtime.ml", line 356, characters 15-52
         Called from Ppx_inline_test_lib__Runtime.test in file "duniverse/ppx_inline_test/runtime-lib/runtime.ml", line 444, characters 52-83

       FAILED 1 / 1 tests
       *)

Quickcheck’s decision to put much larger weight on special cases is what allowed us to discover this unexpected behavior. Note that in this case, it’s not really a bug that we’ve uncovered, it’s just that the property that we thought would hold can’t in practice. But either way, Quickcheck helped us understand the behavior of our code better.
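
The edge case can be reproduced directly with the standard library, where `Int.min_int` plays the role of Base's `Int.min_value`:

```ocaml
(* In two's complement there is no positive counterpart to the smallest
   integer, so negating it wraps around to itself. *)
let neg_min = Int.neg Int.min_int

let () =
  assert (neg_min = Int.min_int);
  (* The sign therefore does not flip: the result is still negative. *)
  assert (neg_min < 0);
  (* For other values, negation behaves as expected. *)
  assert (Int.neg Int.max_int < 0 && Int.neg (Int.min_int + 1) = Int.max_int)
```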

Handling Complex Types #

we typically need more than just the atomic types to be generated for testing

Quickcheck has combinators that serve this need:

pair generator example

    open Core

    let gen_int_list_pair =
      let int_list_gen =
        List.gen_non_empty (Int.gen_incl Int.min_value Int.max_value)
      in
      (* both builds a generator for pairs from two generators of the constituent types *)
      Quickcheck.Generator.both int_list_gen int_list_gen

    let%test_unit "List.rev_append is List.append of List.rev" =
      Quickcheck.test
        ~sexp_of:[%sexp_of: int list * int list]
        gen_int_list_pair
        ~f:(fun (l1, l2) ->
          [%test_eq: int list]
            (List.rev_append l1 l2)
            (List.append (List.rev l1) l2))

there’s a ppx to simplify this process:

    open Core

    let%test_unit "List.rev_append is List.append of List.rev" =
      Quickcheck.test
        ~sexp_of:[%sexp_of: int list * int list]
        [%quickcheck.generator: int list * int list]
        ~f:(fun (l1, l2) ->
          [%test_eq: int list]
            (List.rev_append l1 l2)
            (List.append (List.rev l1) l2))

variants example

    type shape =
      | Circle of { radius: float }
      | Rect of { height: float; width: float }
      | Poly of (float * float) list
    [@@deriving quickcheck];;


    type shape =
      | Circle of { radius: float } [@quickcheck.weight 0.5]
      | Rect of { height: float; width: float }
      | Poly of (float * float) list
    [@@deriving quickcheck];;


    (* -- understanding the weights:
       Note that the default weight on each case is 1, so now Circle will be generated with probability 0.5 / 2.5 or 0.2, instead of the 1/3 probability that it would have by default.
    *)
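
The arithmetic behind that probability, as a small sketch in plain OCaml (`probabilities` is an illustrative helper, not part of Quickcheck): each case's chance is its weight divided by the sum of all weights.

```ocaml
(* Normalise a list of weights into probabilities. *)
let probabilities (weights : float list) : float list =
  let total = List.fold_left ( +. ) 0. weights in
  List.map (fun w -> w /. total) weights

(* Circle has weight 0.5; Rect and Poly keep the default weight of 1. *)
let with_weight = probabilities [ 0.5; 1.; 1. ]

(* All three cases at the default weight. *)
let default = probabilities [ 1.; 1.; 1. ]
```

Up to floating-point rounding, `with_weight` gives 0.2 for Circle and 0.4 each for Rect and Poly, against 1/3 each in `default`.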

More control with Let-syntax #

Quickcheck generators form a monad, which we can exploit when the ppx annotations don’t give us enough control

In combination with let syntax, the generator monad gives us a convenient way to specify generators for custom types.

Throughout this function we’re making choices about the probability distribution.

  let gen_shape =
    let open Quickcheck.Generator.Let_syntax in
    let module G = Base_quickcheck.Generator in
    let circle =
      let%map radius = G.float_positive_or_zero in
      Circle { radius }
    in
    let rect =
      let%bind height = G.float_positive_or_zero in
      let%map width = G.float_inclusive height Float.infinity in
      Rect { height; width }
    in
    let poly =
      let%map points =
        List.gen_non_empty
          (G.both G.float_positive_or_zero G.float_positive_or_zero)
      in
      Poly points
    in
    G.union [ circle; rect; poly ];;

Other Testing Tools #

Other Tools to do (Mostly) the same Things #

Alcotest: a different system for registering and running tests
there are a bunch of others; see the corresponding section of the online textbook

Fuzzing #

beyond (traditional) random fuzzing, which throws randomly mutated inputs at a program, we can do instrumentation-guided fuzzing:
instrument the program, then use that instrumentation to guide the randomization in the direction of more code coverage.
- e.g. AFL; in the OCaml ecosystem, Crowbar (for writing down properties for AFL to test) and Bun (for running AFL in CI) build on it

Reference Repos #

ocaml-lsp #

This is a great reference because it has a comprehensive test suite, with sound setup and util scaffolding that makes the tests easy to read and understand. It also aligns with Jane Street when it comes to ppx usage and such.

observations #

test files make use of util modules; most of them have util modules for setup and teardown helpers.
the dune setup configs look a little complex because of a mix of old and new test suites; some areas are version-dependent as well (there’s an example of version pinning).
transitive deps are handled as well.
the e2e-new tests have their own dune file, which seems to be the typical way to set this up.
interestingly, there’s a larger entrypoint for the e2e tests, which spins up a sys command to call yarn and run the tests; that’s because the e2e tests seem to be run using js/ts to spin up a harness for tests written in jest (the js testing framework)