Chapter 4: Files, Modules, and Programs

Table of Contents

Files are the unit of organisation for modules. This section collects OCaml-specific notes on writing programs and working with modules and module signatures.

Single File Programs
Multi file Programs and Modules
Signatures and Abstract Types
Common Module Compilation Errors
Designing with Modules
- Rarely expose concrete types, stick to abstract types
- Design for the Call Site

Single File Programs #

Here’s the demo code, which reads lines from stdin and prints the ten most frequent ones with their counts:

open Base
open Stdio

let build_counts () =
  In_channel.fold_lines In_channel.stdin ~init:[] ~f:(fun counts line ->
      let count =
        match List.Assoc.find ~equal:String.equal counts line with
        | None -> 0
        | Some x -> x
      in
      List.Assoc.add ~equal:String.equal counts line (count + 1))

let () =
  build_counts ()
  |> List.sort ~compare:(fun (_, x) (_, y) -> Int.descending x y)
  |> (fun l -> List.take l 10)
  |> List.iter ~f:(fun (line, count) -> printf "%3d: %s\n" count line)

(* NOTE: let () = ... pattern-matches the right-hand side against the unit value, so the compiler checks that the RHS has type unit. *)

For learning, we’re directly compiling a single file.

Imported libraries have to be linked explicitly. We do this by invoking ocamlopt through ocamlfind: each -package flag names a library to use, and -linkpkg asks for those packages to be linked into the executable.

ocamlfind ocamlopt -linkpkg -package base -package stdio freq.ml -o freq

We shall KIV first until we find a better way to

Multi file Programs and Modules #

Files are the unit of association for modules: each file should be seen as a collection of definitions stored within a namespace. Module names are always capitalised, regardless of the casing of the filename where they’re defined (e.g. counter.ml defines the module Counter).

The build system works out the dependencies between files on its own.

Signatures and Abstract Types #

We should depend on interfaces rather than directly on implementations; interface segregation is a good default.

OCaml uses the terms interface, signature, and module type interchangeably. Similar to header files in C, we can write interface files (.mli) that declare what the corresponding .ml files must implement.

The concrete datatypes that a module supports may be considered an implementation detail, therefore we might wish to define some abstract data types. That can be done within the interface files as well.

Interfaces are a good place to put docstrings, which odoc picks up naturally.

`val` declarations #

val declarations form the syntax for specifying values in a signature: val <identifier> : <type>
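As a small sketch of how val declarations read in practice, the signature below lists three values and a structure that satisfies it. The names GREETER, English, greeting, max_length, and greet are made up for illustration, and only the standard library is assumed.

```ocaml
(* A signature: each val line gives an identifier and its type, nothing else. *)
module type GREETER = sig
  val greeting : string
  val max_length : int
  val greet : string -> string
end

(* A structure that satisfies the signature by defining every listed value. *)
module English : GREETER = struct
  let greeting = "Hello"
  let max_length = 20
  let greet name = greeting ^ ", " ^ name ^ "!"
end

let () = print_endline (English.greet "world")
```

Anything defined in the structure but not listed with a val in the signature would be invisible to clients of English.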

Abstract Types in Signatures #

This part is about declaring interfaces (signatures) so that the concrete implementation of a type stays hidden from the module’s consumers.

An entry for an abstract type t within the interface file (.mli) means: “This module defines a type t, but I am not telling you how it is implemented.”

(* interface .mli *)
type t

(* implementation .ml: must provide the actual definition *)
type t = int Map.M(String).t

The compiler uses both files:

- the .mli sets the abstraction boundary, i.e. what is visible to clients
- the .ml says how the type is built and used internally

When compiled:

- The type in the .mli is abstract to callers. The .mli describes what a module exposes (its public API), and abstract type definitions hide representation details.
- The type in the .ml is the manifest implementation, checked for consistency with the .mli. The .ml provides the actual definitions and values, i.e. how the module’s promises are fulfilled.
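The split between interface and implementation can be sketched in a single file by letting a module type stand in for the .mli and a struct stand in for the .ml. This is only an illustration under assumptions: COUNTER, Counter, touch, and to_list are hypothetical names here, and the standard library's Map is used instead of Base's.

```ocaml
module StringMap = Map.Make (String)

(* Plays the role of the .mli: t is abstract, so clients cannot see a map. *)
module type COUNTER = sig
  type t

  val empty : t
  val touch : t -> string -> t
  val to_list : t -> (string * int) list
end

(* Plays the role of the .ml: t gets its concrete representation. *)
module Counter : COUNTER = struct
  type t = int StringMap.t

  let empty = StringMap.empty

  (* Increment the count stored for [line], starting from 0 if absent. *)
  let touch t line =
    let count =
      match StringMap.find_opt line t with
      | None -> 0
      | Some n -> n
    in
    StringMap.add line (count + 1) t

  let to_list t = StringMap.bindings t
end

let counts = Counter.(touch (touch (touch empty "a") "b") "a")

let () =
  List.iter
    (fun (line, n) -> Printf.printf "%s: %d\n" line n)
    (Counter.to_list counts)

(* Rejected by the type checker, because clients only know Counter.t:
   let _ = StringMap.cardinal counts *)
```

Because clients only ever see Counter.t, the map could later be swapped for another representation without touching any calling code.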

Disambiguating “abstract” vs “polymorphic” #

MISCONCEPTION:

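This was somewhat a misconception of mine. The two are similar but not the same: an abstract type hides its representation behind a signature, while a polymorphic type is parameterised by a type variable (e.g. 'a list) and works uniformly for whatever type is plugged in.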

Read this section.
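To make the distinction concrete, here is a rough sketch with three made-up definitions (pair, Celsius, and Box are illustrative names, not from any library): one type that is only polymorphic, one that is only abstract, and one that is both.

```ocaml
(* Polymorphic only: 'a is a type parameter, and the representation
   (a pair) is fully visible to everyone. *)
type 'a pair = 'a * 'a

let swap ((x, y) : 'a pair) : 'a pair = (y, x)

(* Abstract only: no type parameter, but the signature hides the fact
   that t is a float. *)
module Celsius : sig
  type t

  val of_float : float -> t
  val to_float : t -> float
end = struct
  type t = float

  let of_float x = x
  let to_float t = t
end

(* Both: 'a Box.t is polymorphic in 'a and abstract in its representation. *)
module Box : sig
  type 'a t

  val make : 'a -> 'a t
  val get : 'a t -> 'a
end = struct
  type 'a t = { contents : 'a }

  let make x = { contents = x }
  let get b = b.contents
end
```

So polymorphism is about which types a definition works over, while abstraction is about who gets to see the representation.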

Concrete Types in Signatures #

We may wish to define concrete types (typically variant types) in our interfaces. Concreteness here means that the clients of that module have visibility to the structure of that type.

Whether a given type should be abstract or concrete is important and depends on context:

- Abstract types give you more control over how values are created and accessed, and make it easier to enforce invariants beyond what is enforced by the type itself.
- Concrete types let you expose more detail and structure to client code in a lightweight way.

In OCaml, if you declare an abstract type in the interface (.mli), you must also give it a concrete definition in the implementation (.ml).

If you define a concrete type in the interface (.mli), you must repeat the same concrete definition in the implementation (.ml). It may look redundant.

NOTE: types and values have distinct namespaces, so the same name can be used for both a type and a function. In the example below, median is both a type and a function:

(* INTERFACE: *)
(** Represents the median computed from a set of strings. In the case
    where there is an even number of choices, the one before and after
    the median is returned. *)
(* this is a concrete type, it's visible to clients of this module. *)
type median = Median of string | Before_and_after of string * string

val median : t -> median

(* IMPLEMENTATION *)
(* we duplicate the concrete definition of the median type. *)
type median = Median of string | Before_and_after of string * string

let median t =
  let sorted_strings =
    List.sort (Map.to_alist t) ~compare:(fun (_, x) (_, y) ->
        Int.descending x y )
  in
  let len = List.length sorted_strings in
  if len = 0 then failwith "median: empty frequency count" ;
  let nth n = fst (List.nth_exn sorted_strings n) in
  if len % 2 = 1 then Median (nth (len / 2))
  else Before_and_after (nth ((len / 2) - 1), nth (len / 2))

Nested Modules #

Files are the unit of association for modules, but we may also want sub-modules within a module to get a clearer separation between overlapping but different types. Modules can be nested within other modules.

We want the ability to define type identifiers that are distinct but may share an underlying implementation. A loose (and admittedly imperfect) analogy is subclassing an abstract/virtual class in the OOP world, but only in that it separates types with a potentially shared underlying representation. It is better framed as type abstraction and information hiding, not inheritance.

This matters because confusing different kinds of identifiers is a very real source of bugs, so instead of using bare types (e.g. strings) we should use distinct types that we define ourselves (Username, Hostname).

Example given is that of usernames and hostnames within sessions.

open Base
module Time = Core.Time

(** ID is a signature whose type t is abstract. *)
module type ID = sig
  type t

  val of_string : string -> t
  val to_string : t -> string
  val ( = ) : t -> t -> bool
end

module String_id = struct
  type t = string

  let of_string x = x
  let to_string x = x
  let ( = ) = String.( = )
end

module Username : ID = String_id
module Hostname : ID = String_id

type session_info =
  { user : Username.t
  ; host : Hostname.t
  ; when_started : Time.t
  }

(* This version is rejected by the type checker: s2.host is a Hostname.t, but Username.( = ) expects a Username.t. *)
(* let sessions_have_same_user s1 s2 = Username.( = ) s1.user s2.host *)
let sessions_have_same_user s1 s2 = Username.( = ) s1.user s2.user

Opening Modules #

Opening gives direct access to the namespace, but may shadow existing names as well.

Some general rules of thumb on this:

Open rarely
Opening a module is basically a trade-off between terseness and explicitness: the more modules you open, the fewer module qualifications you need, and the harder it is to look at an identifier and figure out where it comes from.
Some modules are designed to be opened, like Base itself, or Option.Monad_infix and Float.O within Base.

Use local opens to limit the scope of the opened module

There are two syntactic approaches to this:

normal – a let open ... in binding

      let average x y =
        let open Int64 in
        (x + y) / of_int 2;;

lightweight – better for small expressions

      let average x y =
        Int64.((x + y) / of_int 2);;

Use module shortcuts

This is similar to module aliases in Elixir. We should only do this within a small, local scope; doing it at the top level is a mistake.

   let print_median m =
     let module C = Counter in
     match m with
     | C.Median string -> printf "True median:\n   %s\n" string
     | C.Before_and_after (before, after) ->
       printf "Before and after median:\n   %s\n   %s\n" before after
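The shadowing risk mentioned at the start of this section can be sketched with a hypothetical module Vec that redefines + for pairs of ints. Vec is invented for illustration; the point is only that a local open keeps the shadowing confined to one expression.

```ocaml
(* Hypothetical module that redefines + for pairs of ints. *)
module Vec = struct
  type t = int * int

  let ( + ) ((a, b) : t) ((c, d) : t) : t =
    (Stdlib.( + ) a c, Stdlib.( + ) b d)
end

(* Local open: inside the parentheses, + refers to Vec.( + ). *)
let v = Vec.((1, 2) + (3, 4))

(* Outside the local open, + is ordinary integer addition again.
   With a top-level [open Vec], this line would no longer type-check. *)
let n = 1 + 2
```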

Including Modules #

Opening a module affects the environment used to search for identifiers, whereas including a module adds new identifiers to the module itself.

With include we do more than change how identifiers are searched for: we change what is in the module. Using open instead won’t work for extending a module, because the opened module’s definitions are only visible inside the body and do not become part of the new module.

(* consider this Interval module *)
module Interval = struct
  type t = | Interval of int * int
           | Empty

  let create low high =
    if high < low then Empty else Interval (low,high)
end;;

(* we can create a new, extended version of Interval by including its namespace: *)
module Extended_interval = struct
  include Interval

  let contains t x =
    match t with
    | Empty -> false
    | Interval (low,high) -> x >= low && x <= high
end;;

(* this is what the module signature looks like:

 module Extended_interval :
  sig
    type t = Interval.t = Interval of int * int | Empty
    val create : int -> int -> t
    val contains : t -> int -> bool
  end
 *)

include works in both signatures (interface files) and implementations, so it is one way to extend the functionality of modules that we consume.

(******************************)
(*    the implementation:     *)
(* ========================== *)
(******************************)
open Base

(* The full contents of the option module *)
include Option

(* The new function we're going to add *)
let apply f_opt x =
  match f_opt with
  | None -> None
  | Some f -> Some (f x)


(******************)
(* the interface  *)
(* ============== *)
(******************)
open Base

(* Include the interface of the option module from Base *)
include module type of Option

(* Signature of function we're adding *)
val apply : ('a -> 'b) t -> 'a -> 'b t

Shadowing happens in the implementation, so the order of definitions matters in the .ml file. The order of declarations in the interface file (.mli) doesn’t need to match.
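Here is a minimal sketch of why definition order matters when including a module. It uses the standard library's List rather than Base, and Ext_list is a made-up name: a definition placed after the include shadows the included one.

```ocaml
module Ext_list = struct
  (* Bring in everything from the standard List module. *)
  include List

  (* Defined after the include, so it shadows List.hd (which raises on
     the empty list) with a version that returns an option. *)
  let hd = function
    | [] -> None
    | x :: _ -> Some x
end
```

Had hd been defined before the include, List.hd would have shadowed it instead, and Ext_list.hd would be the raising version.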

Common Definitions #

Similar to barrel files, we may have an import.ml that holds common definitions. It can contain things like intentional name overrides, e.g. binding Option to a custom Ext_option so that every use of the name Option picks up the extended module.


(* within import.ml *)
module Option = Ext_option

(* within our module file *)
open Base
open Import (*the common definitions imported*)

let lookup_and_apply map key x = Option.apply (Map.find map key) x

Common Module Compilation Errors #

Here are the common sources of compilation errors:

Type mismatches – the simplest kind
The compiler will complain if the interface and implementation files disagree on a type.
Type definitions missing
Here it is really the implementation that is missing: something is declared in the interface but has no corresponding definition in the .ml file.
TERMINOLOGY: the declaration in the interface is the “type spec” and the declaration in the implementation is the “type definition”.

Type Definition Order Mismatches

For variant types that are exposed concretely in the interface, the order in which the variants are declared matters to the OCaml compiler and must match between the interface and the implementation file.

   (* I1: v1 of interface -- this will match with implementation below:*)
   (** Represents the median computed from a set of strings. In the case
       where there is an even number of choices, the one before and after
       the median is returned. *)
   type median = Median of string | Before_and_after of string * string

   val median : t -> median

   (* I2: v2 of interface -- this will not match with implementation below:*)
   (** Represents the median computed from a set of strings. In the case
       where there is an even number of choices, the one before and after
       the median is returned. *)
   type median =
     | Before_and_after of string * string
     | Median of string

   val median : t -> median

   (* Implementation (within .ml file) *)
   type median = Median of string | Before_and_after of string * string

   let median t =
     let sorted_strings =
       List.sort (Map.to_alist t) ~compare:(fun (_, x) (_, y) ->
           Int.descending x y )
     in
     let len = List.length sorted_strings in
     if len = 0 then failwith "median: empty frequency count" ;
     let nth n = fst (List.nth_exn sorted_strings n) in
     if len % 2 = 1 then Median (nth (len / 2))
     else Before_and_after (nth ((len / 2) - 1), nth (len / 2))

Cyclic dependencies
Technically, recursive modules (which have cyclic dependencies) are possible, but for now assume they’re illegal.
What’s forbidden:
1. self-references: a module referring to itself by its own name, e.g. let singleton l = Counter.touch Counter.empty within counter.ml
2. transitive references: cycles that go through other modules
The compiler’s error trace tells you where the cyclic dependency is.

Designing with Modules #

Modules are essential to OCaml programs. Some design tips:

Rarely expose concrete types, stick to abstract types #

Most of the time, abstraction is the right choice, for two reasons:

- It enhances the flexibility of your design.
  Benefit: we’re free to change the implementation with a minimal blast radius if interface consumers depend only on abstract types.
- It makes it possible to enforce invariants on the use of your module.
  Problem: if your types are exposed, then users of the module can create new instances of that type (or, if mutable, modify existing instances) in any way allowed by the underlying type. That may violate a desired invariant, i.e. a property of your type that is always supposed to be true.

When will concrete types make sense?

When there’s a lot of value in pattern matching on the concrete type, and when the invariants that you care about are already enforced by the data type itself.
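A small sketch of invariant enforcement through an abstract type, assuming a hypothetical Percent module whose values must stay between 0 and 100. Since clients cannot see that t is an int, the only way to obtain a Percent.t is through the checking constructor.

```ocaml
module Percent : sig
  type t

  (* Returns None when the argument is outside 0..100. *)
  val of_int : int -> t option
  val to_int : t -> int
end = struct
  (* The representation is a plain int, but clients never learn that. *)
  type t = int

  let of_int n = if n >= 0 && n <= 100 then Some n else None
  let to_int t = t
end

let describe n =
  match Percent.of_int n with
  | Some p -> Printf.sprintf "%d%%" (Percent.to_int p)
  | None -> "out of range"
```

If t were exposed as int, any caller could pass 150 where a Percent.t is expected, and the invariant would have to be re-checked everywhere.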

Design for the Call Site #

Beyond ease of understanding, you want the call to be as obvious as possible for someone who is reading it at the call site. This reduces the need to jump to the interface declarations to get more context.

Some ways of improving the readability of client code:

- Use labelled arguments.
- Pick good names for functions, variant tags, and record fields. Naming rules of thumb:
  - Names that have a small scope should be short, whereas names that have a large scope, like the name of a function in a module interface, should be longer and more descriptive.
  - More rarely used names should be longer and more explicit, since the cost of verbosity goes down and the benefit of explicitness goes up the less often a name is used.
- Use uniform interfaces. Make the different interfaces in your codebase follow similar patterns so that they are predictable. To borrow guidelines from the common modules:
  1. A module for (almost) every type. You should mint a module for almost every type in your program, and the primary type of a given module should be called t.
  2. Put t first. If you have a module M whose primary type is M.t, the functions in M that take a value of type M.t should take it as their first argument.
  3. Mark exception-raising functions with _exn. Functions that routinely throw an exception should end in _exn. Otherwise, errors should be signalled by returning an option or an Or_error.t (KIV chapter 7 on error handling).
  4. Keep the type signatures of certain functions uniform. The signature for map is always essentially the same, no matter what underlying type it is applied to. This kind of function-by-function API uniformity is achieved through the use of signature includes, which allow different modules to share components of their interface.
- Design interfaces before writing implementations. This is just classic words of wisdom: types and signatures provide a lightweight tool for constructing a skeleton of your design in a way that helps clarify your goals and intent, before you spend a lot of time and effort fleshing it out.
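Several of these guidelines can be seen together in one small sketch. Account and all of its functions are hypothetical names chosen for illustration: the primary type is called t, functions take t first, arguments are labelled, and the raising variant carries an _exn suffix next to an option-returning one.

```ocaml
module Account : sig
  type t

  val create : owner:string -> balance:int -> t
  val owner : t -> string
  val balance : t -> int

  (* Returns None if the amount is negative or exceeds the balance. *)
  val withdraw : t -> amount:int -> t option

  (* Same as withdraw, but raises Invalid_argument on failure. *)
  val withdraw_exn : t -> amount:int -> t
end = struct
  type t = { owner : string; balance : int }

  let create ~owner ~balance = { owner; balance }
  let owner t = t.owner
  let balance t = t.balance

  let withdraw t ~amount =
    if amount < 0 || amount > t.balance then None
    else Some { t with balance = t.balance - amount }

  let withdraw_exn t ~amount =
    match withdraw t ~amount with
    | Some t -> t
    | None -> invalid_arg "Account.withdraw_exn: invalid amount"
end

(* The call site reads without needing to look up the interface. *)
let acct = Account.create ~owner:"ann" ~balance:10
let after = Account.withdraw_exn acct ~amount:3
```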