Chapter 17: Testing
mental model for the importance of testing when it comes to strong type systems like in OCaml:
types help get more value out of testing effort. There’s less need to test the trivial properties that are auto-enforced by the type system
also, rigidity of the type system – code has a snap-together quality
so a relatively small number of tests can do an outsized amount to ensure that the code behaves as expected.
Ideal characteristics of tests:
easy to write and run
easy to set up, auto-running and preventing regressions easily
easy to update
so that updating tests doesn’t become its own kind of tech debt
fast
allows focus to persist for the overall development process
deterministic
understandable
Inline Tests #
setup:
- let dune know to expect inline tests to show up in the lib
- enable `ppx_inline_test` as a preprocessor
the dune test runner handles running the inline tests; the statements don't get executed otherwise
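As a rough sketch, the two setup steps above might look like this in the library's dune file. The library name `foo` is a placeholder and not something from these notes.

```dune
(library
 (name foo)
 (libraries base)
 ; tell dune that this library contains inline tests
 (inline_tests)
 ; enable the inline-test syntax extension
 (preprocess
  (pps ppx_inline_test)))
```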
e.g. basic inline test:
```ocaml
open Base

(* The inline test passes as long as the right-hand side evaluates to true. *)
let%test "rev" = List.equal Int.equal (List.rev [ 3; 2; 1 ]) [ 1; 2; 3 ]
```
More readable errors using test_eq #
the rudimentary version only tells us that the test failed, not which values differed, so the failure message is little help for debugging. We can instead have the test throw an informative exception.
uses the `[%test_eq]` syntax, which, given a type, generates code to test for equality and throw a meaningful exception if the arguments are unequal. The assertion comes from the `ppx_assert` syntax extension (enabled via `ppx_inline_test ppx_assert` in the preprocessor list).
one problem is that using exceptions will show the backtrace, which is usually not that useful here.
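To make the idea concrete without depending on any ppx, here is a hand-rolled, stdlib-only sketch of the kind of check that `[%test_eq: int list]` automates. `check_eq` and `show_int_list` are hypothetical helpers written for this note; they are not part of `ppx_assert`, which derives the comparison and printing from the type for you.

```ocaml
(* Hypothetical printer for int lists, in an s-expression-like style. *)
let show_int_list l = "(" ^ String.concat " " (List.map string_of_int l) ^ ")"

(* Hypothetical stand-in for a typed equality assertion: on mismatch it
   raises an exception whose message names both values. *)
let check_eq ~show ~equal actual expected =
  if not (equal actual expected) then
    failwith
      (Printf.sprintf "comparison failed: %s vs %s" (show actual) (show expected))

(* Passes silently. *)
let () =
  check_eq ~show:show_int_list ~equal:( = ) (List.rev [ 3; 2; 1 ]) [ 1; 2; 3 ]

(* Deliberately wrong expectation: capture the message instead of crashing. *)
let failure_message =
  try
    check_eq ~show:show_int_list ~equal:( = ) (List.rev [ 3; 2; 1 ]) [ 3; 2; 1 ];
    "no failure"
  with Failure msg -> msg
```

The point of the sketch is only that the failure now carries both values, which a plain boolean `let%test` cannot do.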
Where should tests go? #
it’s legal to put it near the code, within the lib itself but that’s not a good idea:
- readability for code vs tests
- bloat
- testing mindset: encourages internal functionality testing vs exposed behaviour (public API); code written might not be very testable in the first place
there are rare cases where putting the tests within the lib might be a good idea (e.g. when the functionality is hard to expose)
should put tests into test-only libs
why inline tests can’t go into executables #
- Dune doesn't support the `inline_tests` declaration in source files that are directly part of an executable
- the test runner would need to instantiate the modules that contain the tests; if there are toplevel side-effects, then you don't want the test framework to trigger them
- solution: break up the program:
- dir with lib that contains the logic of program but NO top-level effects
- dir for exec that links in the library – for code launching
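A sketch of how that split could look, assuming a `src/` directory for the library and a `bin/` directory for the executable. The directory names, the library name `my_program`, and the module name `main` are all placeholders.

```dune
; src/dune: program logic and its inline tests, no toplevel effects
(library
 (name my_program)
 (inline_tests)
 (preprocess
  (pps ppx_inline_test)))

; bin/dune: thin executable that only launches the library code
(executable
 (name main)
 (libraries my_program))
```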
Expect Tests #
- when our goal is to test behaviour, not just properties – so the capturing and making visible of the code’s behaviour
Basic Mechanics #
source specs both:
- code to execute
- expected output
failure is when there’s a diff b/w the two
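The mechanics above can be modelled in a few lines of plain OCaml. This is only a toy: `run_expect` is a hypothetical helper that compares text written to a buffer, whereas the real `ppx_expect` captures actual stdout and writes a `.corrected` file.

```ocaml
(* Toy model of an expect test: run the body, then compare what it wrote
   with the expected text (ignoring surrounding whitespace). *)
let run_expect ~expected (body : Buffer.t -> unit) =
  let out = Buffer.create 64 in
  body out;
  let actual = Buffer.contents out in
  if String.equal (String.trim actual) (String.trim expected)
  then Ok ()
  else Error actual (* the output a test runner would propose as the correction *)

(* Expected output matches: the test passes. *)
let passing =
  run_expect ~expected:"Hello World!" (fun out ->
      Buffer.add_string out "Hello World!\n")

(* Expected output left empty: the test "fails" and reports the actual output. *)
let failing =
  run_expect ~expected:"" (fun out -> Buffer.add_string out "Hello World!\n")
```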
the expect test workflow is interesting:
write out the behaviour you want to inspect
e.g.
```ocaml
open! Base
open Stdio

let%expect_test "trivial" =
  print_endline "Hello World!" (* this will fail *)
```
run the expect test, which will "fail", but a corrected version will be written by the test runner
patdiff is the standard way the diff is shown
e.g.:
```text
dune runtest
     patdiff (internal) (exit 1)
(cd _build/default && rwo/_build/install/default/bin/patdiff -keep-whitespace -location-style omake -ascii test.ml test.ml.corrected)
------ test.ml
++++++ test.ml.corrected
File "test.ml", line 5, characters 0-1:
 |open! Base
 |open Stdio
 |
 |let%expect_test "trivial" =
-|  print_endline "Hello World!"
+|  print_endline "Hello World!";
+|  [%expect {| Hello World! |}]
[1]
```
inspect the corrected version, if acceptable, then we can copy it over and let that be the test – henceforth the test won’t fail
example:
```ocaml
open Base
open Stdio

let%expect_test "trivial" =
  print_endline "Hello World!";
  [%expect {| Hello World! |}]
```
multiple expect blocks within a single test work too
```ocaml
open Base
open Stdio

let%expect_test "multi-block" =
  print_endline "Hello";
  [%expect {| Hello |}];
  print_endline "World!";
  [%expect {| World! |}]
```
syntactic sugar aside: using open vs open! #
- A sensible idiom is to always use `open!` when opening a library like `Base`, so that you don't have to choose when to use the `!` and when not to.
- it's a warning-suppression syntax:
  - the warning arises because we may not be directly using any values from `Base`, so the compiler flags the open as unused
  - we still want to keep that "standard lib" open, because any new code we write should find `Base`'s modules
syntax: quoted strings {| quoted string |};; #
allows content to be written without the usual escaping required for str literals
basically raw strings in other langs like python
especially useful when writing strings containing text from another language, like HTML. With quoted strings, you can just paste in a snippet of some other source language, and it should work unmodified.
tricky case: what if we want to write `|}` inside the quoted string?
trick: change the delimiter for the quoted string by adding an arbitrary identifier:
```ocaml
{xxx|This is how you quote a {|quoted string|}|xxx};;
(* - : string = "This is how you quote a {|quoted string|}" *)
```
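A small stdlib-only comparison of ordinary and quoted string literals may help here. The HTML snippet is borrowed from the scraping example later in these notes; the variable names are made up for illustration.

```ocaml
(* The same HTML snippet, written with escapes and as a quoted string. *)
let escaped = "<a href=\"http://ocaml.org/base\">link</a>"
let quoted = {|<a href="http://ocaml.org/base">link</a>|}

(* Backslashes are not escape characters inside a quoted string:
   this is a backslash followed by the letter n, not a newline. *)
let raw_backslash = {|\n|}

(* A custom delimiter lets the content contain the default delimiters. *)
let nested = {xxx|contains {|a quoted string|}|xxx}

let () =
  assert (String.equal escaped quoted);
  assert (String.length raw_backslash = 2);
  assert (String.equal nested "contains {|a quoted string|}")
```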
What are Expect Tests Good for? #
- contrast it with property testing, property tests are your best bet when you have a clear set of predicates that you want to test, and examples can be naturally generated at random.
- Where expect tests shine is where you want to make visible some aspect of the behavior of your system that’s hard to capture in a predicate.
Exploratory Programming #
when the code path isn’t clear in our heads yet, we can explore it first using expect testing e.g. for web-scraping
```ocaml
open Base
open Stdio

let get_href_hosts soup =
  Soup.select "a[href]" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")
  |> Set.of_list (module String)

(* -- using expect tests to demo what the function does on an example page: *)
let%expect_test _ =
  let example_html =
    {|
    <html>
      Some random <b>text</b> with a
      <a href="http://ocaml.org/base">link</a>.
      And here's another
      <a href="http://github.com/ocaml/dune">link</a>.
      And here is <a>link</a> with no href.
    </html>|}
  in
  let soup = Soup.parse example_html in
  let hrefs = get_href_hosts soup in
  print_s [%sexp (hrefs : Set.M(String).t)]
```
improved version:
```ocaml
let get_href_hosts soup =
  Soup.select "a[href]" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")
  |> List.filter_map ~f:(fun uri -> Uri.host (Uri.of_string uri))
  |> Set.of_list (module String)
```
```text
dune runtest
     patdiff (internal) (exit 1)
...
------ test.ml
++++++ test.ml.corrected
File "test.ml", line 26, characters 0-1:
 |  |> Set.of_list (module String)
 |
 |[@@@part "1"] ;;
 |let%expect_test _ =
 |  let example_html = {|
 |    <html>
 |      Some random <b>text</b> with a
 |      <a href="http://ocaml.org/base">link</a>.
 |      And here's another
 |      <a href="http://github.com/ocaml/dune">link</a>.
 |      And here is <a>link</a> with no href.
 |    </html>|}
 |  in
 |  let soup = Soup.parse example_html in
 |  let hrefs = get_href_hosts soup in
 |  print_s [%sexp (hrefs : Set.M(String).t)];
-|  [%expect {| (http://github.com/ocaml/dune http://ocaml.org/base) |}]
+|  [%expect {| (github.com ocaml.org) |}]
[1]
```
using expect tests, we can leave the exploratory cases and examples as tests for our benefit
Visualising Complex Behaviour #
can be used to examine the dynamic behaviour of a system
rate limiter example:
here’s the context and setup for the example, it’s a rate limiter module, with some helper functions to make it easy for us
```ocaml
(* -- .mli for a rate limiter module *)
open Core

type t

val create : now:Time_ns.t -> period:Time_ns.Span.t -> rate:int -> t
val maybe_consume : t -> now:Time_ns.t -> [ `Consumed | `No_capacity ]
```
```ocaml
(* -- some helpers for the example: *)
open Core

let start_time = Time_ns.of_string_with_utc_offset "2021-06-01 7:00:00Z"

let limiter () =
  Rate_limiter.create
    ~now:start_time
    ~period:(Time_ns.Span.of_sec 1.)
    ~rate:2

let consume lim offset =
  let result =
    Rate_limiter.maybe_consume
      lim
      ~now:(Time_ns.add start_time (Time_ns.Span.of_sec offset))
  in
  printf
    "%4.2f: %s\n"
    offset
    (match result with
    | `Consumed -> "C"
    | `No_capacity -> "N")
```
```ocaml
(* -- example bug that the expect tests will reveal:
   a faulty implementation of the drain function *)
let rec drain_old_events t =
  match Queue.peek t.events with
  | None -> ()
  | Some time ->
    if Time_ns.Span.( < ) (Time_ns.diff t.now time) t.period
    then (
      ignore (Queue.dequeue_exn t.events : Time_ns.t);
      drain_old_events t)
```
now for the expect tests on the behaviour:
```ocaml
let%expect_test _ =
  let lim = limiter () in
  let consume offset = consume lim offset in
  (* Exhaust the rate limit, without advancing the clock. *)
  for _ = 1 to 3 do
    consume 0.
  done;
  [%expect {| |}];
  (* Wait until a half-second has elapsed, try again *)
  consume 0.5;
  [%expect {| |}];
  (* Wait until a full second has elapsed, try again *)
  consume 1.;
  [%expect {| |}]

(* -- suppose there's actually a bug, as alluded to above *)
(* -- on running the expect test and then promoting the corrections,
   we get THE WRONG VERSION OF THIS EXPECT TEST: *)
let%expect_test _ =
  let lim = limiter () in
  let consume offset = consume lim offset in
  (* Exhaust the rate limit, without advancing the clock. *)
  for _ = 1 to 3 do
    consume 0.
  done;
  [%expect {|
    0.00: C
    0.00: C
    0.00: C |}];
  (* Wait until a half-second has elapsed, try again *)
  consume 0.5;
  [%expect {| 0.50: C |}];
  (* Wait until a full second has elapsed, try again *)
  consume 1.;
  [%expect {| 1.00: C |}]

(* -- once we realise there is a bug and fix it,
   the expect test gets updated to be: *)
let%expect_test _ =
  let lim = limiter () in
  let consume offset = consume lim offset in
  (* Exhaust the rate limit, without advancing the clock. *)
  for _ = 1 to 3 do
    consume 0.
  done;
  [%expect {|
    0.00: C
    0.00: C
    0.00: N |}];
  (* Wait until a half-second has elapsed, try again *)
  consume 0.5;
  [%expect {| 0.50: N |}];
  (* Wait until a full second has elapsed, try again *)
  consume 1.;
  [%expect {| 1.00: C |}]
```
an aside about test readability: good helpers are what allowed us to keep the test code concise.
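These notes don't show the repaired `drain_old_events`. One plausible fix is to flip the comparison so that events at least one period old are dropped, rather than the recent ones. Below is a self-contained stdlib sketch under that assumption: plain floats stand in for `Time_ns.t`, and the record layout is a guess, not the original module.

```ocaml
(* Assumed representation: timestamps of consumed events, oldest first. *)
type t =
  { events : float Queue.t
  ; period : float
  ; rate : int
  ; mutable now : float
  }

let create ~now ~period ~rate = { events = Queue.create (); period; rate; now }

let rec drain_old_events t =
  match Queue.peek_opt t.events with
  | None -> ()
  | Some time ->
    (* Assumed fix: drop events that are at least one period old. *)
    if t.now -. time >= t.period
    then (
      ignore (Queue.pop t.events : float);
      drain_old_events t)

let maybe_consume t ~now =
  t.now <- now;
  drain_old_events t;
  if Queue.length t.events < t.rate
  then (
    Queue.push now t.events;
    `Consumed)
  else `No_capacity

(* Replay the offsets used in the expect test: 0, 0, 0, 0.5, 1. *)
let results =
  let lim = create ~now:0. ~period:1. ~rate:2 in
  [ 0.; 0.; 0.; 0.5; 1. ]
  |> List.fold_left
       (fun acc offset ->
         let r =
           match maybe_consume lim ~now:offset with
           | `Consumed -> "C"
           | `No_capacity -> "N"
         in
         r :: acc)
       []
  |> List.rev
```

With a rate of 2 per second, the replay yields C, C, N, N, C, which agrees with the final version of the expect test.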
End-to-end tests #
- test examples so far have been deterministic, nothing special like IO-bound or interacting with system resources.
- expect tests can still be useful when we’ve got multiple processes interacting with each other and using real IO
setup #
- we let the `inline_tests` declaration declare a dependency (`deps`) on the binary that we wish to test
- we have to write some helper functions to make life easy; here's an example signature for them:
```ocaml
open! Core
open Async

(** Launches the echo server *)
val launch : port:int -> uppercase:bool -> Process.t Deferred.t

(** Connects to the echo server, returning a reader and writer for
    communicating with the server. *)
val connect : port:int -> (Reader.t * Writer.t) Deferred.t

(** Sends data to the server, printing out the result *)
val send_data : Reader.t -> Writer.t -> string -> unit Deferred.t

(** Kills the echo server, and waits until it exits *)
val cleanup : Process.t -> unit Deferred.t
```
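The dependency on the binary could be declared roughly as follows. The library name `echo_test` and the path `../bin/echo.exe` are assumptions about the project layout, not something recorded in these notes.

```dune
(library
 (name echo_test)
 (libraries core async)
 (preprocess
  (pps ppx_jane))
 ; rebuild the server binary before the inline tests run
 (inline_tests
  (deps ../bin/echo.exe)))
```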
test example #
Using the helper functions, we set up an expect test. Note that we put in the expect annotations where we want to see data. They haven't been filled in yet; we get that part from promoting the test runner's corrected output.
this solution is an example of how real I/O makes things messy, fast – because of the lack of determinism.
The fixed timeout is ugly here; we might improve on it by wrapping the connection attempt in some retry logic:
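As an illustration of the retry idea only, here is a synchronous, stdlib-only sketch. `retry`, its `wait` callback, and `flaky_connect` are hypothetical; a real version for the echo server would be written with Async and would pause briefly between attempts.

```ocaml
(* Retry [f] up to [attempts] times, calling [wait] between failed attempts. *)
let rec retry ~attempts ~wait f =
  match f () with
  | Ok _ as ok -> ok
  | Error _ as err ->
    if attempts <= 1
    then err
    else (
      wait ();
      retry ~attempts:(attempts - 1) ~wait f)

(* Simulated connection that only succeeds on the third try. *)
let calls = ref 0

let flaky_connect () =
  incr calls;
  if !calls < 3 then Error "connection refused" else Ok "connected"

(* [wait] is a no-op here so the example runs instantly. *)
let result = retry ~attempts:5 ~wait:(fun () -> ()) flaky_connect
```

Bounding the number of attempts keeps a genuinely broken server from hanging the test, while a server that is merely slow to start no longer depends on one fixed sleep.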
- we should write code without such real deps – we can still use expect tests for that purpose when we need to though.
How to Write a Good Expect Test #
Some guidelines:
write helper functions
helps with setup and isolates out actual test-logic from harnessing and scaffolding
write custom pretty-printers
should surface the info that we need to see in the test
easier to read, minimises churn when details are irrelevant to the test change
aim for determinism
without needing it to interact with the outside world (generally the source of non-determinism)
if non-determinism really needed, avoid:
- timeouts and stopgaps that would make the tests no longer performant
Property Testing – with Quickcheck #
- property testing: when you’re using simple assertions to check this or that property
- two things needed by a property test:
a function that takes an example input and checks that some property holds
a way of generating random examples. The probability distribution matters here; we can let Quickcheck handle it for us.
in the example below, we use Quickcheck to include the `min_value` case, which shows unexpected behaviour from our POV because it's an edge case that needs to be explicitly considered.
this is because of 2s complement: "the negation of min_value is equal to itself"
example:
```ocaml
open Base

(* simple check where the inputs follow a rudimentary (uniform) distribution *)
let%test_unit "negation flips the sign" =
  for _ = 0 to 100_000 do
    let x = Random.int_incl Int.min_value Int.max_value in
    [%test_eq: Sign.t]
      (Int.sign (Int.neg x))
      (Sign.flip (Int.sign x))
  done

(* -- using Quickcheck, which prioritises special cases *)
open Core

let%test_unit "negation flips the sign" =
  Quickcheck.test
    ~sexp_of:[%sexp_of: int]
    (Int.gen_incl Int.min_value Int.max_value)
    ~f:(fun x ->
      [%test_eq: Sign.t]
        (Int.sign (Int.neg x))
        (Sign.flip (Int.sign x)))
```
which can uncover the following edge case (because of 2s complement, "the negation of min_value is equal to itself"):
```text
dune runtest
File "test.ml", line 3, characters 0-244: negation flips the sign threw
("Base_quickcheck.Test.run: test failed" (input -4611686018427387904)
  (error
    ((duniverse/ppx_assert/runtime-lib/runtime.ml.E "comparison failed"
       (Neg vs Pos (Loc test.ml:7:19)))
      "Raised at Ppx_assert_lib__Runtime.failwith in file \"duniverse/ppx_assert/runtime-lib/runtime.ml\", line 28, characters 28-53\
     \nCalled from Base__Or_error.try_with in file \"duniverse/base/src/or_error.ml\", line 76, characters 9-15\
     \n"))).
  Raised at Base__Exn.protectx in file "duniverse/base/src/exn.ml", line 71, characters 4-114
  Called from Ppx_inline_test_lib__Runtime.time_and_reset_random_seeds in file "duniverse/ppx_inline_test/runtime-lib/runtime.ml", line 356, characters 15-52
  Called from Ppx_inline_test_lib__Runtime.test in file "duniverse/ppx_inline_test/runtime-lib/runtime.ml", line 444, characters 52-83

FAILED 1 / 1 tests
```
Quickcheck's decision to put much larger weight on special cases is what allowed us to discover this unexpected behavior.
Note that in this case, it’s not really a bug that we’ve uncovered, it’s just that the property that we thought would hold can’t in practice. But either way, Quickcheck helped us understand the behavior of our code better.
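The edge case can be reproduced with nothing but the standard library, since OCaml's native `int` wraps around in the same two's-complement way. The `sign` helper below is a made-up stand-in for `Int.sign` from Base.

```ocaml
(* Hypothetical stand-in for Base's Int.sign: -1, 0 or 1. *)
let sign x = if x < 0 then -1 else if x > 0 then 1 else 0

(* Negating the smallest int wraps around to itself. *)
let negated_min = Int.neg Int.min_int
let wraps_to_itself = negated_min = Int.min_int

(* "Negation flips the sign" holds for an ordinary value... *)
let holds_for_42 = sign (Int.neg 42) = Int.neg (sign 42)

(* ...but not for min_int, whose negation is still negative. *)
let holds_for_min = sign (Int.neg Int.min_int) = Int.neg (sign Int.min_int)
```

Here `wraps_to_itself` and `holds_for_42` are `true`, while `holds_for_min` is `false`, which is the same failure Quickcheck reported.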
Handling Complex Types #
- we typically need more than just the atomic types to be generated for testing
- Quickcheck has combinators that serve this need:
- pair generator example
```ocaml
open Core

let gen_int_list_pair =
  let int_list_gen =
    List.gen_non_empty (Int.gen_incl Int.min_value Int.max_value)
  in
  Quickcheck.Generator.both int_list_gen int_list_gen
(* -- both is useful for creating a generator for pairs from two
   generators of constituent types. *)

let%test_unit "List.rev_append is List.append of List.rev" =
  Quickcheck.test
    ~sexp_of:[%sexp_of: int list * int list]
    gen_int_list_pair
    ~f:(fun (l1, l2) ->
      [%test_eq: int list]
        (List.rev_append l1 l2)
        (List.append (List.rev l1) l2))
```
there's a `ppx` to simplify this process:
```ocaml
open Core

let%test_unit "List.rev_append is List.append of List.rev" =
  Quickcheck.test
    ~sexp_of:[%sexp_of: int list * int list]
    [%quickcheck.generator: int list * int list]
    ~f:(fun (l1, l2) ->
      [%test_eq: int list]
        (List.rev_append l1 l2)
        (List.append (List.rev l1) l2))
```
- variants example
```ocaml
type shape =
  | Circle of { radius: float }
  | Rect of { height: float; width: float }
  | Poly of (float * float) list
[@@deriving quickcheck];;

type shape =
  | Circle of { radius: float } [@quickcheck.weight 0.5]
  | Rect of { height: float; width: float }
  | Poly of (float * float) list
[@@deriving quickcheck];;

(* -- understanding the weights: Note that the default weight on each case
   is 1, so now Circle will be generated with probability 0.5 / 2.5 or 0.2,
   instead of the 1/3rd probability that it would have natively. *)
```
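The weight arithmetic for the variants example can be checked with a tiny stdlib helper. `probabilities` is a hypothetical function for this note; it only normalises weights and says nothing about how the derived generator picks cases internally.

```ocaml
(* Turn per-case weights into generation probabilities by normalising. *)
let probabilities weights =
  let total = List.fold_left ( +. ) 0. weights in
  List.map (fun w -> w /. total) weights

(* Circle weighted 0.5; Rect and Poly keep the default weight of 1. *)
let weighted = probabilities [ 0.5; 1.; 1. ]

(* All three cases at the default weight. *)
let default = probabilities [ 1.; 1.; 1. ]
```

`weighted` comes out as roughly 0.2, 0.4, 0.4, against one third each for `default`.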
More control with Let-syntax #
quickcheck generators form a monad, which we can exploit if the ppx annotations don't work for us
In combination with let-syntax, the generator monad gives us a convenient way to specify generators for custom types.
Throughout this function we're making choices about the probability distribution.
```ocaml
let gen_shape =
  let open Quickcheck.Generator.Let_syntax in
  let module G = Base_quickcheck.Generator in
  let circle =
    let%map radius = G.float_positive_or_zero in
    Circle { radius }
  in
  let rect =
    let%bind height = G.float_positive_or_zero in
    let%map width = G.float_inclusive height Float.infinity in
    Rect { height; width }
  in
  let poly =
    let%map points =
      List.gen_non_empty
        (G.both G.float_positive_or_zero G.float_positive_or_zero)
    in
    Poly points
  in
  G.union [ circle; rect; poly ];;
```
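To see why the monadic interface matters for `rect`, where the width generator depends on the height already drawn, here is a toy generator monad in plain OCaml. It is an illustration only and not how Base_quickcheck is implemented: the `gen` type, `float_between`, and the binding operators `let*` and `let+` (standing in for `let%bind` and `let%map`) are all assumptions of this sketch.

```ocaml
(* A toy generator: a function from a random state to a value. *)
type 'a gen = Random.State.t -> 'a

let return x : 'a gen = fun _ -> x
let map (g : 'a gen) ~f : 'b gen = fun st -> f (g st)
let bind (g : 'a gen) ~f : 'b gen = fun st -> f (g st) st

(* Binding operators (OCaml 4.08+) playing the role of let%map / let%bind. *)
let ( let+ ) g f = map g ~f
let ( let* ) g f = bind g ~f

(* Uniform float in the range [lo, hi]. *)
let float_between lo hi : float gen =
 fun st -> lo +. Random.State.float st (hi -. lo)

type shape = Rect of { height : float; width : float }

(* bind is needed because the width generator is built from the height. *)
let gen_rect : shape gen =
  let* height = float_between 0. 100. in
  let+ width = float_between height 200. in
  Rect { height; width }

let sample = gen_rect (Random.State.make [| 42 |])
```

With `map` alone the second generator could not mention `height`; `bind` is what lets an earlier random choice shape a later one.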
Other Testing Tools #
Other Tools to do (Mostly) the same Things #
- Alcotest: different system for registering and test-running
- there’s a bunch of others, see the section from the online textbook here
Fuzzing #
- beyond just (traditional) random fuzzing that looks at mutants, we can do instrumentation-guided fuzzing
- instrumenting the program, and then using that instrumentation to guide the randomization in the direction of more code coverage.
Reference Repos #
ocaml-lsp #
This is a great reference because they have a comprehensive test-suite with sound setup and utils scaffolds that makes it easy to read and understand tests. They also align with jane street when it comes to the ppx usage and such.
observations #
the use of util modules within test files.
most of them have util modules for setup and teardown helpers.
the dune setup configs look a little complex because of a mix of old and new test suites, there are areas that are version-dependent as well (see this example where there’s a version pegging).
There's also handling of transitive deps; see the example here
The e2e-new tests have their dunefile written here, (permalink) which seems to be the typical way to set this up.
Interestingly, there’s a larger entrypoint for e2e tests, which spins up a sys command to call yarn and run the test (ref), that’s because the e2e tests seem to be run using js/ts to spin up a harness for tests written in jest (the js testing framework)