Songs of the OCaml Compiler Part II: A Design Tour
This is Part II of a series on learning OCaml by writing a Paxos simulator. We build on Part I's abstract grammar and witness how subsystems take shape from it – guided, as it turns out, by the OCaml compiler. I'm sure the magical source within the compiler is the underlying Hindley-Milner type system, which expects the programmer to exercise clearer type-discipline in exchange for superior inference capabilities that feel like an extension of the programmer's own mind. I've put up a few words about what that experience has been like here <TODO: add reference to part 3 where we reflect on the experience of working with the compiler>.
Written primarily for a technical audience (a little programming experience goes a long way; I won't focus heavily on syntax – OCaml's consistency makes it readable. For code examples, I describe intent wherever possible so that unfamiliar syntax never gets in the way. Part II gets more involved in places where the type system is the point, but non-technical readers can gloss over those sections without losing the thread), this series can be read in different orders – see the cover post for suggested reading paths. By prioritising intuition over a pedantic focus on syntax, I'm hoping that a more general audience may find this useful. There are some early signs of usefulness (see Part III's) <TODO link to appropriate part in Part 3 for this>. Please reach out with comments <TODO link to part 3 for RFC> as well.

From Forces to Forms: the Emergence of Subsystems #
Part I ended before any code was written so that when the first let statement appeared, the system’s shape was already half-decided by deliberating on the forces at play – this was by design. What follows is an account of the other half: the three places where the OCaml compiler itself influenced my design, and what I learned from listening to it.
A Quick Primer on the Paxos Protocol #
Here’s a short primer on the algorithm, using the same narrative backdrop.
Paxos is a set of rules that helps a scattered group agree on one thing even when the “world” around them is unreliable. Going back to our caravans in the desert: formations drift apart, messengers get lost, storms blow in, camels faint, and missives sometimes arrive long after they matter. Yet the group still needs to answer a simple question: “Which campsite are we heading towards?” Paxos gives them a way to do this without a chief, without a central authority, and without assuming everyone is present at the same time. Every formation is equal. The only thing they can do is send messages – and hope they arrive – while following a small set of disciplined steps.
The heart of the algorithm is surprisingly modest:
If a majority follow the same rules when responding to new information, the group will never contradict itself – no matter how unreliable the desert becomes.
Here's how it plays out:
| Step | Rule | In the desert |
|---|---|---|
| 1 | Someone wants to make a proposal | A caravan – say, Aghilas – wants to suggest a campsite. Before naming it, it asks the others: "If I propose something, will you at least consider it?". Paxos requires every proposer to begin by asking for permission. |
| 2 | A majority must acknowledge | Not all caravans need to reply; some may be stranded or tending to a collapsed messenger. As long as more than half respond (i.e. a quorum is reached), the journey can continue. (Only majority agreement matters – though a majority declining can stall progress.) |
| 3 | They reveal any older, more serious proposals | If a caravan has heard of a previous suggestion the group nearly agreed on, it reports it. The new proposer must honour that older information. This safeguards consistency. |
| 4 | The actual proposal is made | Only now does Aghilas say, "Given everything I've learned, here is the campsite I propose." |
| 5 | A majority must accept | Not everyone needs to hear it immediately – sandstorms still exist – but once a majority have written the same decision onto their parchment, the group can never choose differently in the future. |
That's the beauty of Paxos: it guarantees agreement without guaranteeing that everyone is present, awake, or reachable. Messages may be late, out of order, or duplicated. A caravan may wake up after days of silence. Two caravans may propose ideas at once. None of it matters: as long as everyone follows the rules, the group will not split into contradictory futures.
This is why Paxos remains a classic in distributed systems. It is deliberately conservative, distrustful of the world's reliability, and yet manages to let a group converge on one shared truth. Alternatives like the Raft Consensus Algorithm are often easier to implement, but Paxos remains valuable because of the flexibility it affords across different use-cases. For more information, here are some notes (TODO: permalink) from the repo about the algorithm.
Before understanding the Simulator, here’s a happy-case run of the Protocol.
While the state names (WaitingForPromises, ProposerAccepting, …) might make more sense after the GADT pressure section where we see the state machines, it's possible to follow the message flow while glossing over these foreign things for now. The sequence diagram is an evolution of the steps we've seen in the Tuareg-themed table before, so it should be relatable.

```mermaid
sequenceDiagram
    participant A as Node A<br/>(initiates as Proposer)
    participant B as Node B<br/>(Acceptor + Learner)
    participant C as Node C<br/>(Acceptor + Learner)
    note over A, C: Each node maintains all 3 sub-states (Proposer, Acceptor, Learner) independently.<br/>For each sub-state, all state transitions persist a snapshot to Storage.
    note over A: Initial Proposer state: Idle

    %% PHASE 1: Permission / Prepare
    rect rgba(100, 150, 255, 0.08)
        note left of A: Phase 1: Prepare
        note over A: Node A changes its Proposer state<br/>on sending the first prepare msg from<br/>Idle → Preparing → WaitingForPromises
        par Broadcast PermissionRequest
            A ->>+ B: PermissionRequest(proposal, value)
        and
            A ->>+ C: PermissionRequest(proposal, value)
        and Self-loopback
            A ->>+ A: PermissionRequest(proposal, value)
            note right of A: Loopback keeps actor semantics<br/>consistent & clean
        end
        note over A, C: Each Acceptor independently checks permissibility:<br/>is this proposal higher than any prior promise?
        par Responses arrive asynchronously
            B -->>- A: PermissionGranted(last_accepted_b)
        and
            C -->>- A: PermissionGranted(last_accepted_c)
        and
            A -->>- A: PermissionGranted(last_accepted_a)
        end
        note over A: Quorum reached (at least 2 of 3 acceptors).<br/>Pick value: from highest-proposal-id promise,<br/>or own value if all promises are None.
    end

    %% PHASE 2: Accept
    rect rgba(100, 200, 100, 0.08)
        note left of A: Phase 2: Accept
        note over A: On quorum reached from Phase 1,<br/>Node A changes state again<br/>WaitingForPromises → ProposerAccepting
        par Broadcast Suggestion
            A ->>+ B: Suggestion(proposal, chosen_value)
        and
            A ->>+ C: Suggestion(proposal, chosen_value)
        and Self-loopback
            A ->>+ A: Suggestion(proposal, chosen_value)
        end
        note over A, C: Each Acceptor checks: proposal ≥ promised?<br/>Accept and record value.
        par Responses arrive asynchronously
            B -->>- A: Accepted(proposal, value)
        and
            C -->>- A: Accepted(proposal, value)
        and
            A -->>- A: Accepted(proposal, value)
        end
        note over A: Quorum of acceptances reached.
    end

    %% RESOLUTION: Decided
    rect rgba(255, 200, 80, 0.08)
        note left of A: Resolution<br/>to be broadcasted
        note over A: ProposerAccepting → Decided(value)
        par Broadcast Decided
            A ->> B: Decided(value)
        and
            A ->> C: Decided(value)
        and
            A ->> A: Decided(value)
        end
        note over A, C: All nodes learn the decided value.<br/>Consensus achieved for this round.
    end
```
4 Layers of Responsibility #
These forces encouraged me to see the implementation as having the following concrete layers of responsibility:
| Subsystem | Tuareg Parallel | What it owns |
|---|---|---|
| Node Logic | The caravan formations | Paxos protocol state machines, per-role sub-states, message-handling logic (the most complex layer) |
| Messaging Layer | The messengers | Message routing and delivery, topic-based pub-sub, network partition simulation |
| Time & Scheduling | The desert sun | Global event queue, discrete logical tick loop, deterministic event ordering |
| Simulation Harness | The chronicler | Scenario definition and injection, terminal UI, structured logging, replayable traces |
These aren’t equal in weight: the Node Logic layer carries the Paxos protocol itself and is where most of the interesting type-system work lives. The other three are the infrastructure harnesses that make the system observable and controllable. The boundaries emerged from the grammar constraints, not from OCaml conventions – so this is just my way of reacting to the forces at play here.
The Big Picture Map #
The architecture map below outlines my simulator's design – it's my way of consolidating structures and some rules of thumb in a diagram (clicking on various parts of the diagram will lead you directly to the code). I recommend reading it top-to-bottom in two passes: first, ignore the details and try to understand the layers the diagram shows; then zoom in layer-by-layer. Each layer zooms closer to the Node, and the last section shows how to make a Node.
To prime us for the design pressure stories that come after, I'll draw attention to a couple of things (the OCaml specifics – e.g. functors like Make_node – will be explained in each section; for now, treat this as a signpost of where we're headed and the following diagram as the overall map of what was implemented):
TODO: [ polish ] with the new phrasing and flair applied to the 3 design pressures, consider if there’s better ways to prime / sign-post them from within this table.
TODO: [ polish ] the description of the pressures likely needs to be tightened
| Pressure | Where | What |
|---|---|---|
| How a type-mismatch taught me what OCaml’s module system actually guarantees | Primitives layer: the three functor arrows | Make_node, Make_node_state, Make_storage compose the system bottom-up by injecting Value and Bus as type parameters |
| How the type system was asked to enforce state-machine invariants, and where it fell short | Primitives layer: Node_state box and its role_selector annotation | Each role carries its own precise sub-state type; transition_role_state is the only write path |
| How naming conventions and a single chokepoint function drew the boundary between immutable reasoning and mutable storage | “What a Node Contains” layer: the State pentagon | State is owned entirely by the node; no other layer reaches into it |
This is the form that the grammar ended up taking. The map feels orderly from here: there are clean layers, named boundaries and the functor arrows show how we compose our structs. This orderliness took some iterating. What the map doesn’t show are the moments where the compiler refused to accept what I had written. Understanding why it refused changed how I thought about the design.
Design Pressures #
These moments are the focus of the rest of this post. We start small by describing some newbie snags (these are just common misconceptions / missteps since we're learning something fresh – I'd like to think of it as the tax we pay as newbies) faced when wiring up the subsystems using functor-based dependency injection. This will show how the compiler's feedback and cautions helped identify some fundamental blind-spots in my understanding of the type system. We then progress to how state-machine invariants had to be managed – and how, when I reached for GADTs to codify them, the compiler pushed back and tempered my ambition in certain cases. Finally, we see the pattern of keeping clean mutation-boundaries as an example of how the collaborative relationship between programmer and compiler leads to thinking in its grammar anyway – even when the compiler has nothing to say.
These moments started off as noise – type errors that I couldn’t immediately explain and constraints that felt arbitrary – and they gradually acquired a melody. Three movements, three design pressures.
Functor-based Dependency Injection: Decoupling Subsystems at Compile-Time #
We’ve seen that the combined node (center of the architecture map) has its own State, is bound to a Bus for communication purposes and relies on Storage for having the ability to recover from failures. Central to these is the idea of a “value” that the system achieves consensus on, which all these modules need to play nice with. So we define it as an abstract type behind a module signature (Value.S) — any module that satisfies Value.S can be swapped in without us having to rewrite anything. A byte-string today, perhaps a MIDI note tomorrow to make nodes that literally sing
not entirely a joke – the consensus logic is completely agnostic to what's being agreed upon. A leadership lease, a configuration delta, an audio sample: the protocol doesn't care.
, or more practically, a heterogeneous product type — consensus value AND a leadership-lease so the cluster knows which node to follow.
This is a classic case of parametric polymorphism
Parametric polymorphism means: write the logic once, and it works for any type – the parameter fills in the blank. A list that sorts any comparable element, a bus that carries any message type, a consensus protocol that works for any agreed-upon value. The word "parametric" here signals that the abstraction is over a type parameter, not over specific values.
that we want: the logic for achieving consensus has to be abstracted over the type of payload we're achieving consensus on. We just need the same V.t (value type) to flow through the different modules consistently. This seems straightforward: the ML-world (ML as in Meta-Language – a family of programming languages of which OCaml is a member) commonly uses parameterised modules as a building block for structuring and composing swappable logic. This fits our need for injecting the payload dependency, and it's how we end up with the Make_node functor.
What’s a functor?
A functor is a module that takes another module (or modules) as a parameter. Think of it as a function, but at the module level – instead of computing values, it computes whole modules.
This works because OCaml's module system supports higher-order functors – functors that take other modules as arguments. This is distinct from first-class modules (a separate OCaml feature), which allow modules to be packed and passed around as ordinary values at runtime.
In the example below, Make_node(V)(Bus) takes a Value module and a Bus module and produces a fully wired Node module whose types are consistent with both.
The types don’t need to be threaded manually – the compiler enforces the connections for us at the point of application.
Why care about the “compile-time” point?
Consider how a Python equivalent of Make_node might look:
# Python: injection happens at runtime -- V and Bus are passed as objects
class Node:
    def __init__(self, value_module, bus_module):
        self.value = value_module
        self.bus = bus_module

    def propose(self, v):
        serialized = self.value.serialize(v)  # runtime dispatch: dict lookup
        self.bus.publish(serialized)          # runtime dispatch: dict lookup
This works well — and Python’s type system (Protocol, mypy) can add static
checks on top to give us some peace of mind. Notice what happens at runtime though: every call to self.value.serialize and self.bus.publish involves a dictionary lookup
There's an adage that everything in Python is a dictionary. It's hyperbolic, but also: PEP 252 established dict semantics for objects, PEP 232 extended it to functions, and PEP 412 optimized the memory layout for instances – the whole Python object model is built on dict-based attribute lookup.
to resolve which concrete method to call. The abstraction is real, it just has a runtime-cost.
Functor application in OCaml resolves all of this before a single
instruction executes. Once the compiler sees Make_node(Value_string)(Event_bus),
it fully specialises the module: V.sexp_of_t is directly Value_string.sexp_of_t,
Bus.publish_broadcast is directly Event_bus.publish_broadcast. There are no vtables, no
dictionary lookups, no boxing. The generated code is as efficient as if we
had written a concrete Node module with string values hard-coded — but
we still get the full abstraction in source.
Interestingly, OCaml does support runtime DI via first-class modules — we can pack a
module as (module M : S) and pass it as an ordinary value, store it in a
list, or resolve it from a dispatch table at runtime. (TODO: permalink) The simulator’s UI layer does exactly this: Logger.resolve_ui selects a (module Ui.S) implementation
from a flag at startup. The trade-off is that we give up the zero-cost abstraction
guarantee at the packing boundary.
Functor-based DI is the right tool when the dependency is fixed at composition time; first-class modules are the right tool when it genuinely needs to vary at runtime.
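As a hedged sketch of that runtime-DI alternative (the module names and flag here are inventions in the spirit of the Logger.resolve_ui example, not the simulator's actual code):

```ocaml
(* Runtime DI via first-class modules: an implementation of UI is
   selected from an ordinary value (e.g. a CLI flag) at runtime. *)
module type UI = sig
  val render : string -> string
end

module Plain_ui = struct
  let render s = s
end

module Fancy_ui = struct
  let render s = "== " ^ s ^ " =="
end

(* The module is packed as a value; the choice happens at runtime. *)
let resolve_ui (flag : string) : (module UI) =
  if flag = "fancy" then (module Fancy_ui) else (module Plain_ui)

(* Unpacking at the use-site: this is the boundary where the
   zero-cost-abstraction guarantee is given up. *)
let render_with (m : (module UI)) (s : string) : string =
  let (module U) = m in
  U.render s

let () = print_endline (render_with (resolve_ui "fancy") "hello")
```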
V.t flows through Node, State, Bus and Storage consistently (ref lib/node.ml, TODO: permalink). The Simulator calls Make_node, and that's also where the Bus module type is concretised. In the snippet below, the partition type shows us the actual type of the bus.
The bus creation signature (B.create) makes the polymorphism explicit:
Things look right. V.t appears to flow correctly: Value_string.Value_string.t is string under the hood, and the B.create signature is polymorphic in 'a – the bus doesn't care what it carries, as long as we supply a serialiser for logging the payload.
So we should also be able to implement the create function like so \(\ldots\) right?
Newbie Snag: Understanding Nominal Typing #
File "lib/simulator.ml", line 189, characters 52-55:
189 | { id; member_node_ids = Set.empty (module Int); bus }
^^^
Error: The value bus has type string Message.t B.t
but an expression was expected of type V.t Message.t B.t
Type string is not compatible with type V.t
Had 1 error, waiting for filesystem changes...
This is the first proper newbie snag I faced. Without the fundamentals, it's extremely annoying because it's hard to connect the dots. The newbie thinking would likely be: technically, at the concrete level we have already satisfied the type of payload_to_string – it should have worked, because Value_string.Value_string.t is string. The implicit assumption being that surely OCaml can do structural typing (e.g. like in TypeScript), rather than nominal typing?
If we listen carefully, the error message is already guiding us: type string Message.t B.t and type V.t Message.t B.t are not the same i.e. the two types don’t unify.
V.t declared behind Value.S is nominally distinct from string – even when the underlying representation is identical. The fix is simple: we just have to respect the module boundaries and use V's serialiser (V.sexp_of_t):
let bus =
B.create
- ~payload_to_string:(fun msg ->
- Sexplib.Sexp.to_string (Message.sexp_of_t Base.String.sexp_of_t msg))
+ ~payload_to_string:(V.sexp_of_t |> Message.to_string)
()
in
This was an unfortunate time-sink. The optimist in me appreciates that the OCaml compiler is strict and principled – it encourages us to respect interface boundaries. So it becomes a matter of perspective whether the compiler is admonishing us or singing a cautionary note (the compiler as critic before it becomes collaborator – this is the first bar of that melody).
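The nominal-typing behaviour behind this snag can be reproduced in a few lines. This is a hedged, standalone sketch (the signature S and its members are inventions, not the post's Value.S):

```ocaml
(* An abstract type behind a signature is nominally distinct from its
   underlying representation, even when that representation is string. *)
module type S = sig
  type t
  val of_string : string -> t
  val to_string : t -> string
end

module Value_string : S = struct
  type t = string          (* concretely a string... *)
  let of_string s = s
  let to_string s = s
end

(* ...but outside the module, string and Value_string.t do not unify: *)
(* let bad : Value_string.t = "hello"        <- type error *)
let good : Value_string.t = Value_string.of_string "hello"
let () = print_endline (Value_string.to_string good)
```

Removing the `: S` ascription (or adding `with type t = string`) would make the equality visible again; keeping it abstract is what enforces the boundary.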
The second snag was in the same neighbourhood — the module signature boundary — but more architectural in its consequences.
Newbie Snag: Destructive Substitution #
During an earlier refactor, I had extracted Acceptor_record and Node_state as
separate functors (before they were eventually collapsed back into Make_node). The intent felt
clean: Acceptor_record depends on Value, so it should be its own
Make_acceptor_record(V); Node_state depends on Acceptor_record, so
Make_node_state(A : Acceptor_record.S) should naturally reach through to A.V
for the value type. To keep the output signature tidy, I used destructive substitution (:=) when
writing the Acceptor_record signature:
module Make_acceptor_record (V : Value.S) :
S with module V := V = struct
(* ... *)
end
and tried to access the V within the concrete Acceptor_record, A like so:
module Make_node_state (A : Acceptor_record.S) : S = struct
  module V = A.V  (* compiler error: A has no member V *)
(*...*)
end
(Acceptor_record is no longer its own thing now – this pressure describes a historical version of how things used to be.) The compiler outright refused, saying that "A has no member V". From my newbie-lens, this made no sense: V was plainly declared within Acceptor_record.
Turns out, the diagnosis is this: := works by erasing. with module V := V removes V from the output signature entirely. The path A.V no longer exists because there is no V to reach through to. Hence, the term “destructive”.
:= (destructive substitution) erases the path; = (sharing constraint) constrains it. If I needed that path, I was to use a sharing constraint (which pins a module component to a specific identity without removing its name from the signature). The fix was to switch to a plain sharing constraint:
- S with module V := V = struct
+ S with module V = V = struct
A.V is now accessible downstream because V remains in the output signature (Acceptor_record.S), with its identity pinned to the functor argument.
When to use which? Destructive substitution (:=) vs Constraint-sharing (=)
Destructive substitution (:=) is the right tool when we want to hide an
internal type from callers — we’re saying “this type is resolved, callers
don’t need to name it.” Sharing constraints (=) are the right tool when
downstream code needs to reach through the module to its contents — we’re
saying “this type is resolved AND still accessible by name.”
The confusion as newbies makes sense: both forms are about type unification
Type unification is the process of making two types equal by finding a consistent substitution. In OCaml's module system, constraints like with module V = V unify module components across boundaries – the key question is whether that unified type remains nameable (accessible as a path like A.V) or is erased from the signature entirely (as with :=).
at a module boundary. The difference is whether the unified type remains a named path in the output signature or disappears into it.
There’s a useful heuristic here that parallels the Law of Demeter: if
downstream code needs to write A.V, it is depending on A’s internal
structure, not just its interface. That’s a signal worth paying attention to.
The question to ask is: does V belong in A's public contract?
- If yes – it's part of what callers legitimately need to know about A – use =, keep the path open.
- If no – it's an implementation detail callers shouldn't reach into – use :=, erase it, and if downstream code still needs it, inject it directly as its own parameter.
In this case, V naturally belongs in Acceptor_record.S’s contract — the
value type is fundamental to what an acceptor record is. Perhaps the destructive
substitution was an over-eager attempt at tidiness that severed a path that
should have stayed open.
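The contrast between the two constraints can be seen in a compact sketch. All names here (VAL, REC, Make_rec, Downstream) are hypothetical stand-ins for the post's Value / Acceptor_record / Node_state shapes:

```ocaml
module type VAL = sig
  type t
  val default : t
end

module type REC = sig
  module V : VAL
  type t = { value : V.t }
  val make : V.t -> t
end

(* Sharing constraint: V stays in the output signature, pinned to the
   functor argument. With `with module V := V` instead, the module V
   would be erased and A.V below would fail to typecheck. *)
module Make_rec (V : VAL) : REC with module V = V = struct
  module V = V
  type t = { value : V.t }
  let make v = { value = v }
end

module Downstream (A : REC) = struct
  (* Reaching through A.V only works because the path survived. *)
  let default_rec = A.make A.V.default
end

module Int_val = struct type t = int let default = 0 end
module R = Make_rec (Int_val)
module D = Downstream (R)
let () = assert (D.default_rec.R.value = 0)
```

Note the Law-of-Demeter flavour: Downstream writing A.V.default is exactly the dependency on A's internal structure the post discusses.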
Takeaways #
The meta-lesson here is less about syntax and more about what module signatures
actually are: access-control boundaries, not just type hints. Erasing a
path is a meaningful act — one that the := syntax makes easy to do accidentally.
In hindsight, there were some early code-smells – itches I hadn't yet scratched – which might have led me to these simplifications sooner.
Before the types were correctly wired, there were two parallel type families living side-by-side in the node module — an internal representation and a message-layer representation of the same semantic objects:
(* Internal node types *)
type assertion = V.t Types.paxos_assertion_state [@@deriving sexp]
type promise = assertion option [@@deriving sexp]
(* Boilerplate converters: internal <-> message-layer *)
let paxos_assertion_of_assertion (a : assertion) : V.t Types.paxos_assertion_state =
{ proposal = a.proposal; value = a.value }
let assertion_of_paxos_assertion (paxos_a : V.t Types.paxos_assertion_state) : assertion =
{ proposal = paxos_a.proposal; value = paxos_a.value }
let promise_of_paxos_promise (paxos_p : V.t Types.paxos_promise) : promise = (* ... *)
These converters were the smell: identical fields, shuffled back and forth, buying nothing other than silencing the compiler's complaints. The problem wasn't exactly the converters – it was that they had to exist at all.
The root cause: the Node signature was explicitly fixing the type of
assertion in the interface:
- type assertion = V.t Types.paxos_assertion_state [@@deriving sexp]
+ type assertion [@@deriving sexp]
By making assertion abstract in the signature and re-introducing the
concrete definition inside the functor body, the type equality is now
induced rather than explicitly declared. The message layer, event bus,
node state, and storage layer are all instantiated with the same V, so
their exposed types unify automatically. No explicit constraint needed,
no converters, no boilerplate.
Essentially, the system healed itself.
Why did the boilerplate disappear? \(\Leftarrow\) a mechanistic explanation on what the compiler does now
The message layer, the event bus, the node state, and the storage layer are all instantiated with the same module argument V : Value.S, so their exposed types all carry the same type constructor V.t. This induces structural sharing of type parameters across the modules, producing the necessary type equalities without requiring explicit with type clauses in the interface.
OCaml’s module system ensures that all these modules refer to the same instance of V, which is sufficient to unify the types of assertions, messages, accepted values, etc. The resulting unification is robust, type-safe, and requires no explicit signature annotation.
We now advance to a deeper pressure: one that asks how much the type system can actually guarantee about a state machine.
Enforcing State Machine Invariants: to GADT or not to GADT? #
Earlier in the quick primer for the Paxos Protocol, we saw the happy case where 3 different nodes coordinated to achieve consensus. There's a whole world of edge-cases to deliberate on if we went with that approach – so we shift our attention to representing the protocol's wisdom via Finite State Machines (FSMs). Just like the adage earlier about the Python language and its reliance on lookup tables, here's another one about Computer Science / Programming: "All Programming Philosophies are about State", or "everything is just a state machine". Some interesting reads if you're not familiar with state machines: an old blog post by Shopify on why every developer should be force-fed state machines. Coming from the Elixir world, the GenServer abstraction is a beautiful pattern that is implicitly an FSM – see more elixir-discourse on FSM use-cases here. Sound FSMs make concurrency problems so much simpler for our human brains to wrap around!
A Node Wears 3 Hats at the Same Time #
A node maintains 3 sub-states simultaneously: Proposer, Acceptor, Learner. Drawing out their FSMs before deep-diving into their logic forces us to be clear about a few things: which states are truly distinct versus merely named differently, which transitions carry side-effects (e.g. persistence, message-dispatch) versus pure state-changes, and which transitions should be impossible. The last category makes FSMs a powerful design and analysis tool for protocol implementations: if we can enumerate the illegal transitions, we can start asking whether the type system can prevent them at compile time rather than at runtime.
The Proposer FSM is the most complex of the 3 – six states (for v0, the Aborted state is actually just a shim; it's not critical to the flow yet because we omit retry mechanisms in v0), two quorum decision gates, and a simulation-control state that will cause us trouble shortly \(\ldots\)
FSM for Acceptor Hat
FSM for Learner Hat (trivial)
What immediately stands out to us from the Proposer state-diagram is that the current version of this FSM actually has 2 categories of states that are represented:
| Category | Relevant States (examples) | How entered / exited |
|---|---|---|
| Protocol-Level State | Preparing, WaitingForPromises, ProposerAccepting, Aborted, Decided | Entry and exit from these states are all done via Coordination messages between nodes. These are relevant for upholding the Protocol’s invariants. |
| Simulation-Level State | Idle, ProposerInactive, AcceptorInactive | A node enters or exits these states via Control messages. These are semantically excluded from the Protocol operations and exist for the Simulation harness. |
This means that, as it currently stands, the simulation-level states (blue-coloured) and the protocol-level states (green-coloured) live in the same sum-type. (A sum type – also called a tagged union or discriminated union – is a type that can be exactly one of several named alternatives. In OCaml, these alternatives are called variants. If you come from TypeScript, think of a discriminated union; from Rust, an enum with data; from Java/Kotlin, a sealed class hierarchy. The key property: a match expression over a sum type is exhaustive – the compiler warns you if you forget a case. A product type, mentioned earlier, is the complement: a type that holds several values simultaneously (a record or tuple). Sum = "one of these"; product = "all of these together".) Therefore, our type system can't distinguish "active protocol state" from "simulation inactive state", since they're all siblings in one flat variant. This affects how we guarantee the system won't make illegal state transitions, which is fundamental to many of the protocol invariants. (An invariant is a property that must always hold, no matter what path the system takes. For our Paxos FSM, one invariant is: "a quorum check only happens when the proposer is in WaitingForPromises or ProposerAccepting." Another: "a node may only broadcast Decided if it has actually reached a decision." Violating invariants goes beyond producing wrong answers; it makes a protocol's safety-guarantees meaningless. The question is whether we enforce these invariants via runtime checks (hoping the programmer got it right) or via the type system (making them impossible to get wrong when writing the source code).)
A great place to see this happening is when we’re making decisions on what to do next based on whether a quorum has been reached. These correspond to the two quorum-gates (orange-coloured diamond nodes) that we see in the diagram. Both these decision-points rely on the same predicate function, is_quorum_reached. We know that the node only calls this predicate function when it’s in 2 possible Proposer states: WaitingForPromises OR ProposerAccepting – all other entry-points coming from all other states are illegal. So how can we guarantee that we meet this invariant?
is_quorum_reached and managing invariants (ref lib/node_state.ml, TODO: permalink). Our current solution is to use a wildcard arm that invokes a failwith – our hands are forced, for the moment, into relying on runtime checks. This failwith approach works because the program does the right thing at runtime if the call-sites are correct.
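The shape of that wildcard-plus-failwith pattern looks roughly like the following. This is a hedged sketch with invented, much-simplified types (the real proposer_state and is_quorum_reached live in lib/node_state.ml):

```ocaml
(* Simplified stand-in for the proposer sub-state. *)
type proposer_state =
  | Idle
  | WaitingForPromises of int list   (* ids of promises received *)
  | ProposerAccepting of int list    (* ids of acceptances received *)

(* The predicate is only meaningful in the two quorum-gate states;
   every other entry-point is "handled" by a runtime failure. *)
let is_quorum_reached ~cluster_size (ps : proposer_state) : bool =
  let majority = (cluster_size / 2) + 1 in
  match ps with
  | WaitingForPromises votes | ProposerAccepting votes ->
      List.length votes >= majority
  | _ -> failwith "is_quorum_reached: called from an illegal state"

let () =
  assert (is_quorum_reached ~cluster_size:3 (WaitingForPromises [ 1; 2 ]));
  assert (not (is_quorum_reached ~cluster_size:3 (ProposerAccepting [ 1 ])))
```

The wildcard arm is exactly the "pinky-promise" the post describes: calling this with Idle compiles fine and only fails when the scenario is actually triggered.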
failwith in a pattern match is a statement that the programmer knows something the compiler doesn’t: “this branch will never be reached.” That’s a pinky-promise with no enforcement. If someone – including a future version of us – adds a new call to is_quorum_reached from a handler that processes Idle nodes, the compiler won’t object. The program would compile, the tests would pass (unless we wrote one for that exact path), and the bug would hide until a specific scenario triggers it at runtime. For a consensus protocol, that’s a silent correctness violation. We’re leaving safety on the table that the compiler could be providing for free. By the way, we risk this at other places too:
@ announce_decision, whereby a node may only announce what it has decided if it’s currently in a Decided state
announce_decision, ref node.ml (TODO: permalink)

@ propose, whereby a node may only propose if it’s currently in an Idle state

propose, ref node.ml (TODO: permalink)

There’s something interesting in is_quorum_reached that may already help us \(\ldots\)
GADTs to the rescue? #
The get_role function-call within is_quorum_reached is fascinating and compels us to look at its signature.
let get_role : type a. role_state -> a role_selector -> a
| observation from signature | implication |
|---|---|
| a, the return type for get_role, is not fixed; it is determined by the selector | So, if we pass in Proposer, we get back proposer_state. When we pass in Acceptor, we get back acceptor_state – that’s why in is_quorum_reached, we could pattern match on specific variants of proposer_state like WaitingForPromises and ProposerAccepting |
| Though get_role is a single function, the compiler tracks which specific type flows out based on which constructor flows in. | This feels like a type-safe Visitor Pattern, where the return type is determined by the variant that you pass in – touching the edge of dependent-typing |
This invites us to inspect the machinery here to understand how it works.
type a in the signature of get_role is a locally-abstract type
read: a placeholder type that exists only within this function (get_role). This type variable (a) is progressively refined as we pattern-match on the role_selector. Read more here.
. It introduces a type variable (a) that is scoped to this function (get_role), which the compiler refines on each branch of the match on
There’s an OCaml forum post that goes through some misconceptions around this
role_selector. This means the compiler learns which specific type a is by looking at which constructor matched. role_selector is our Generalised Algebraic Data Type (GADT) here.
role_selector GADT (ref lib/node_state.ml, TODO: permalink)

Each constructor carries its return type in its signature — that’s why we can say that the GADT indexes the type parameter. If the pattern-matching triggers on Proposer, the compiler knows that a = proposer_state, because Proposer is the variant that produces proposer_state. This indexing mechanism lets us select amongst the different states (proposer_state, acceptor_state, learner_state).
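A condensed, self-contained sketch of this machinery – the stand-in state types are far simpler than the real ones in lib/node_state.ml:

```ocaml
(* Self-contained sketch of the indexing machinery; the real states carry
   far more data than these stand-ins. *)
type proposer_state = PIdle | PWaiting of int
type acceptor_state = AIdle | APromised of int
type learner_state = LIdle | LDecided of string

type role_state = {
  proposer : proposer_state;
  acceptor : acceptor_state;
  learner : learner_state;
}

(* The GADT: each constructor fixes the type parameter to one state type. *)
type _ role_selector =
  | Proposer : proposer_state role_selector
  | Acceptor : acceptor_state role_selector
  | Learner : learner_state role_selector

(* `type a.` introduces the locally-abstract type; matching a constructor
   refines `a` to the concrete state type that constructor indexes. *)
let get_role : type a. role_state -> a role_selector -> a =
 fun rs sel ->
  match sel with
  | Proposer -> rs.proposer
  | Acceptor -> rs.acceptor
  | Learner -> rs.learner
```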
The signature for Node_state shows how all six proposer states – including ProposerInactive – live in one flat variant, which is precisely the structure the GADT indexes over:
Node_state signature, ref lib/node_state.ml (TODO: permalink)

The usefulness of this indexing is clearer when we see the signature of set_role, the write complement to get_role’s read operation.
let set_role : type a. role_state -> a role_selector -> a -> role_state
Since the third argument to this function (a) is constrained by the role_selector, the compiler will forbid us from passing the wrong role: we won’t be able to pass an acceptor_state when the selector says Proposer. GADTs are shaping up to be the antidote to our earlier problem, where we were looking for compile-time assurances so that we could do better than “correct by programmer discipline”. With this, an entire class of “wrong role” bugs is avoided at compile time.
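Continuing with the same stand-in shapes, a sketch of the write complement – the commented-out call at the bottom is exactly the kind of “wrong role” bug the compiler now rejects:

```ocaml
(* Stand-in sketch: the selector's index constrains the third argument, so
   passing an acceptor_state under the Proposer selector cannot compile. *)
type proposer_state = PIdle | PWaiting of int
type acceptor_state = AIdle | APromised of int

type role_state = { proposer : proposer_state; acceptor : acceptor_state }

type _ role_selector =
  | Proposer : proposer_state role_selector
  | Acceptor : acceptor_state role_selector

let set_role : type a. role_state -> a role_selector -> a -> role_state =
 fun rs sel v ->
  match sel with
  | Proposer -> { rs with proposer = v }
  | Acceptor -> { rs with acceptor = v }

(* set_role rs Proposer AIdle   <- rejected: the selector demands proposer_state *)
```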
In fact, we actually use it for our transition_role_state where we can see it all coming together.
transition_role_state (ref: TODO: permalink to the learning note)

Every state transition in the entire system flows through this one function. The GADT ensures that the selector and the new substate agree on which role is being updated. It prevents us from accidentally transitioning the proposer state with an acceptor state. This is a genuine compile-time enforcement of types and it’s already helping us.
Though it solves the “wrong role” class of errors, it doesn’t really solve the “wrong state within a role” class of errors. is_quorum_reached still accepts ANY proposer_state (including Idle, ProposerInactive) because currently the GADT indexes over roles and not over states within a role. The _ in role_selector means “we don’t need to name the type parameter at this point – there exists a type parameter here, but we’re not going to bother naming it”. It helps us track
More precisely, _ is an anonymous type parameter. Each GADT constructor instantiates it with a specific concrete type (proposer_state, acceptor_state, learner_state). This is called type indexing: the parameter serves as a compile-time tag. Pattern-matching on it introduces local type equalities that are valid within each branch.
the sub-state type (e.g. acceptor_state vs learner_state). However, it doesn’t constrain which variant within that substate is allowed – so we can still call operations like is_quorum_reached on Idle or ProposerInactive nodes.
The Serialisation Wall & Knowing the Tradeoff-Surface #
So, the natural move is to add in more GADTs, right? The idea practically writes itself. We saw the role_selector GADT prevent “wrong role” bugs. The natural next step is to introduce an active_proposer_state GADT that only admits WaitingForPromises and ProposerAccepting – making it structurally impossible to call is_quorum_reached on Idle or ProposerInactive nodes. The failwith arms then disappear. The compiler catches the rest. Clean, green, state-machine.
Except, our state types carry a pre-processor annotation
As seen in the highlighted lines of the Node_state signature above, [@@deriving] is an OCaml preprocessor attribute (metaprogramming in OCaml that does AST-to-AST transformations) that auto-generates boilerplate code. Think “code that writes code”: telling the compiler “emit serialization functions for this type” saves us writing dozens of lines of sexp_of_ and of_yojson by hand.
([@@deriving sexp, yojson]) – they’re serialisable. At every state transition, we persist a snapshot to storage. This enables the simulator to dump, reload, diff, and replay node state at any point in time. That’s a fundamental capability of the simulator as a teaching tool.
Why GADTs and ppx deriving don’t play nicely
Briefly: ppx deriving directives (in practice) assume uniform, first-order algebraic types. GADTs violate this assumption by allowing constructors to refine their result types. Both operate at compile time — but with fundamentally different expectations about what a type definition looks like.
Concretely: for most ppx deriving directives to work, every constructor must produce the same result type. This follows from how OCaml does metaprogramming: we annotate a type definition, and a compiler plugin (a ppx rewriter) walks the Abstract Syntax Tree (AST) and splices in generated code. This is a purely structural approach – the generated code is derived from syntax alone, without type inference or the ability to reason about type equalities.
A typical variant satisfies this nicely:
type active_proposer_state =
| WaitingForPromises
| ProposerAccepting (* every arm produces active_proposer_state *)
The rewriter can mechanically generate uniform cases over this shape.
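In spirit, the emitted code is a mechanical fold over the constructors. A hand-written approximation (repeating the variant so the sketch stands alone, with strings standing in for Sexp.t – real ppx output differs in detail):

```ocaml
(* What a purely structural rewriter can emit for a uniform variant:
   one arm per constructor, all consuming/producing the same type. *)
type active_proposer_state = WaitingForPromises | ProposerAccepting

let to_string = function
  | WaitingForPromises -> "WaitingForPromises"
  | ProposerAccepting -> "ProposerAccepting"

let of_string = function
  | "WaitingForPromises" -> Some WaitingForPromises
  | "ProposerAccepting" -> Some ProposerAccepting
  | _ -> None
```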
Now let’s contrast that with a GADT:
type _ active_proposer_state =
| WaitingForPromises : waiting_for_promise_state -> waiting_for_promise_state active_proposer_state
| ProposerAccepting : proposer_accepting_state -> proposer_accepting_state active_proposer_state
Here, each constructor refines the result type. There is no longer a single uniform type that all constructors return. While this refinement is present in the AST, the rewriter cannot reason about it. Its code-generation templates assume a single return type – an assumption that no longer holds.
Therefore, derived code will either be rejected or fail to type-check.
This isn’t a fundamental limitation of ppx – it’s more of a consequence of the simplifying assumption that types are uniform. An assumption that GADTs deliberately break.
The compiler knows about it too – attempting to slap [@@deriving sexp] on our hypothetical active_proposer_state GADT would produce a build error. This is the compiler scolding us again, but this time, the scolding is pointing at a deeper design tension.
Pushing ahead with a full GADT encoding of the proposer states would mean writing custom serialisers, maintaining a parallel “erased” representation for storage, and updating both layers every time the state machine evolves. Every new constructor would require updating the GADT, the erased representation, the serialiser, the deserialiser, and the pattern matches that assumed exhaustiveness – five coordinated changes where the compiler can’t help us across the erased boundary. The burden of the impossible-case failwith we wanted to eliminate from the FSM handling just migrates to the serialisation logic – if we do this, we’d just be relocating the problem, not removing it. That’s the Serialisation Wall we face but we need not climb it.
Why it’s especially not worth-it to climb this wall for a Paxos simulator
Paxos state machines are not clean protocol FSMs in ways that full GADT encoding would fight against:
| Force | Why a full GADT encoding fights it |
|---|---|
| Cross-cutting state | Proposer logic depends on acceptor responses; learner logic feeds back into proposer termination. The states are not as independent as the FSM diagrams suggest. |
| Temporal looseness | In a simulator, states can be skipped, replayed, or re-entered. A GADT encoding that forbids “impossible” states would also forbid interesting states that we want to inspect mid-flight. |
| Observability requirements | The simulator wants snapshots, rewinding, diffing, and replay. All of these require serialisable state. Full GADT encoding fights against all of them. |
The full-scope GADT refactor could make sense when the project matures and the representation stabilises. For v0, where we’re still actively evolving how states are factored, locking in a GADT-serialiser contract would be premature.
What a medium-term resolution may look like: typed GADT views
The principled resolution for this would likely be a two-layer architecture
| Layer | Role |
|---|---|
| A: storage layer | we keep the serialisable proposer_state as the storage layer |
| B: typed view layer | we introduce a non-serialisable active_proposer_state GADT and we bridge between Layer A and Layer B using converter functions |
Layer A stays unchanged – our current proposer_state with [@@deriving sexp, yojson]:
type proposer_state =
| ProposerInactive | Idle
| Preparing of assertion
| WaitingForPromises of waiting_for_promise_state
| ProposerAccepting of proposer_accepting_state
| Decided of V.t
[@@deriving sexp, yojson]
Layer B introduces a GADT that only admits the states where quorum logic is valid:
type _ active_proposer_state =
| WaitingForPromises :
waiting_for_promise_state -> waiting_for_promise_state active_proposer_state
| ProposerAccepting :
proposer_accepting_state -> proposer_accepting_state active_proposer_state
A bridge function converts from storage to view, returning None for states that have no typed view. Because the two valid constructors carry different type indices, we wrap the result in a small existential so that a single option type can hold either:
type any_active_proposer =
  | Any : 'a active_proposer_state -> any_active_proposer

let active_proposer_of : proposer_state -> any_active_proposer option = function
  | WaitingForPromises w -> Some (Any (WaitingForPromises w))
  | ProposerAccepting p -> Some (Any (ProposerAccepting p))
  | _ -> None
With this, is_quorum_reached can be rewritten to accept only the typed view (the locally-abstract type lets each branch refine the index):
let is_quorum_reached_active : type a.
    cluster_size:int -> a active_proposer_state -> quorum_result =
 fun ~cluster_size state ->
  match state with
  | WaitingForPromises wfp ->
      is_quorum_reached_on_promise_wait cluster_size wfp
  | ProposerAccepting pa ->
      is_quorum_reached_on_proposer_accepting_wait cluster_size pa
No failwith needed and no wildcard arm. The compiler guarantees exhaustiveness because active_proposer_state has only the two valid constructors. The runtime check moves to the bridge function (active_proposer_of), where it belongs – at the boundary between serialised storage and typed logic, not deep inside the protocol code.
The main appeal of this separation is that the two layers evolve independently: storage representations can change conservatively (backwards-compatible schema migrations), while the typed views can be refactored aggressively as we learn more about which invariants are worth encoding.
The failwith calls stay. They’re documented, they’re tested
The test-suites for this project cover the fundamental areas to protect. For this invariant, see the relevant section within test_fsml.ml (TODO: permalink to the failwith/is_quorum_reached test section)
, and the two-layer escape hatch is designed for when the state machine stabilises a bit more.
Emerging Patterns: Clarity from Clean Mutation-Boundaries #
So far, the design pressures were driven by the compiler speaking – errors, warnings and build-failures that forced us to reconsider our design. Now we arrive at a case where the compiler has nothing to say. And yet, instinctively, we find ourselves inheriting its discipline and idioms to choose clearer code-paths in places where the compiler can’t directly assist.
Let’s look at transition_role_state again — we had side-stepped the side-effects to focus on the GADT-indexed dispatch previously:
| side-effect | description |
|---|---|
| mutations | node.state <- new_state updates the node’s mutable state and node.storage := updated_storage updates the mutable storage ref. These are state updates. |
| logging | An emission of a log event is an effect that has IO implications (via UI). |
| persistence | Snapshot persistence is the most direct side-effect to pick up on here; it involves a write operation onto a file in most cases — an IO side-effect. |
If we swap the lens of type-level correctness for one of discipline-level correctness, we start seeing the clarity within this flow of data. Every protocol handler (e.g. propose) computes a new substate purely and then delegates to this sole function for the side-effects. A single point for all stateful effects of transitions makes for a clear mutation-boundary — it wasn’t planned from the beginning; it emerged.
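A toy sketch of such a chokepoint – strings stand in for role states, and the logger and storage are collapsed into plain lists:

```ocaml
(* Toy sketch of a single mutation boundary; not the real transition_role_state. *)
type node = {
  mutable state : string;    (* stand-in for role_state *)
  storage : string list ref; (* stand-in for snapshot storage *)
  mutable log : string list; (* stand-in for the logger *)
}

(* Handlers compute the new state purely, then delegate here.
   All three side-effects live in this one function. *)
let transition_role_state node new_state =
  node.state <- new_state;                                  (* mutation *)
  node.storage := new_state :: !(node.storage);             (* persistence *)
  node.log <- ("transitioned to " ^ new_state) :: node.log  (* logging *)
```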
Actors in Charge of Their Own Mailboxes #
We’ve been following the Actor Model of computation to represent logical units within our system. Each node owns its state exclusively and talks to others only through messages — a pattern we prescribed back in the Design Grammar in Part I.
What’s this “Actor Model” about?
It’s a concurrency model Concurrency models are ways to reason about doing many things at once. Without them, the number of possible interactions becomes unmanageable, so each model imposes constraints that make systems easier to think about. Entire ecosystems exist to provide these constraints as primitives — Erlang’s OTP being a classic example of behaviour-rules for the Actor model. In Python terms, think of each actor as an object that can only communicate with others through a message queue, never by directly calling methods on another object. where computation is organised around independent actors.
| Each actor: |
|---|
| 1) owns its state exclusively (no one else can touch it) |
| 2) reacts to incoming messages |
| 3) and influences others only by sending messages |
Step back, and you might recognise that an actor is just a finite state machine with an input stream: a message arrives \(\Rightarrow\) the actor looks at its current state \(\Rightarrow\) computes the next state \(\Rightarrow\) optionally emits more messages.
There were other ways to model this – e.g. shared mutable state with locks, or a central coordinator — but those blur ownership. The actor model keeps it sharp: each node is responsible for its own state, and coordination is always explicit.
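The actor-as-FSM framing fits in a few lines of OCaml – a hedged sketch, not the project’s actual types:

```ocaml
(* An actor is an FSM over an input stream: current state plus a step
   function returning the next state and any outgoing messages. *)
type ('state, 'msg) actor = {
  state : 'state;
  step : 'state -> 'msg -> 'state * 'msg list;
}

(* Delivering one message is one FSM transition. *)
let deliver actor msg =
  let state, outgoing = actor.step actor.state msg in
  ({ actor with state }, outgoing)
```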
Coming from the world of Elixir/Erlang, this concurrency-model is instinctive for me. We have processes (GenServers) with mailboxes, message passing, and isolation by default there. That instinct carries over Coming from Elixir, where GenServer gives you process-isolated mailboxes with selective receive and supervision for free, the instinct to build a per-node inbox was natural. The pattern was sound and it worked — but OCaml’s single-threaded (self-imposed v0 implementation constraint), functor-composed world offered little runtime support for the abstraction, so the coupling that Erlang’s runtime hides became visible in this OCaml codebase. : so the natural first-reach for an earlier version of the implementation was to give each node (Actor) a mailbox and let messages accumulate.
Here’s the rough shape of how it worked:
For this design, we notice that mutations get distributed across layers — every segment of the pipeline has mutations. This means three different functions \(\Rightarrow\) three different mutation-sites. There’s no single place where we can see “what happens when state changes”. Also, the lifecycle of an inbox is spread over multiple places.
The concurrency model is correct here, and the inbox-pattern works — but it lacks the snap-together feel of a simpler design.
Chaos Naturally Converges into Clarity #
The FSM restructuring that was made in response to the compiler’s guidance for the previous pressure improved more than just type-safety: it unlocked new code-paths for us. Thanks to the layered, role-based substates, each FSM state can now carry its own accumulated data. We saw this earlier within waiting_for_promise_state (part of the Proposer state). There, we now hold promises_received and nacks_received (both as ordered lists of promises/nacks) as part of that substate record. The stateful inbox is thereby reduced to a redundant, parallel accumulator.
With the inbox now gone, where do the side-effects go?
Once the pure computation (assembling of new substate at handlers like propose ) is separated from the effectful execution (mutation, logging, persistence), we automagically see a clear mutation-boundary forming. All the side-effects naturally converge on transition_role_state.
There’s one small exception to this pattern
handle_simulation_control mutates node.state (ref TODO:permalink) outside this chokepoint because it’s a simulation-harness-level control, not part of the Paxos protocol logic – the same simulation-vs-protocol distinction from the GADT section. It’s known, bounded and consistent so we maintain our peace.
This emergent pattern has some overlaps with existing OCaml / FP idioms.
It seems that a common FP idiom that people in the interwebs cite is that of a “Functional Core Imperative Shell”, a demo can be found here. From the linked article, the part that I can relate to is:
The natural evolution is to make even more functions pure and focus more on the separation of concerns between domain logic and infrastructure. This can be achieved by representing effectful interactions as pure values and separating the evaluation and effectful execution from their pure representations.
That’s similar to our observations in how this mutation boundary emerged. I didn’t manage to frontload that much reading on more generalisable design patterns from the FP world. I do have this book – “Pearls of Functional Algorithm Design” by Richard Bird and this repo on design-patterns in Reason — in my reading list though. Anything with case-studies will be a good reference for gaining some intuition on this.
Once the boundary between pure and effectful code was visible, conventions naturally cascaded from it – in how we name things, how we construct them, and how we guard entry to state-dependent logic.
Pure assembly \(\Leftarrow\) make_* vs Stateful Creation / Initialisation \(\Leftarrow\) create_*
(* Pure assembly β no mutation, deterministic from arguments *)
let make_permission_request ~msg_id ~time ~from ~assertion =
{meta= make_meta msg_id time Types.Coordination; from; assertion}
(* Resource allocation β creates fresh mutable bus, initialises set *)
let create_partition id =
let bus = B.create ~payload_to_string:(...) in
{id; member_node_ids= Set.empty (module Int); bus}
This is a straightforward one: the name-prefix tells you about the construction. make_* assembles a value; create_* allocates a resource with its own mutable identity. A reader who knows this convention can scan the codebase and know where the effects live without reading function bodies.
If Module Has_spec \(\implies\) can be hydrated/de-hydrated (spec-driven construction)
This pattern is about spec-driven construction. of_spec functions define a clear boundary where serialisable configuration (i.e. a spec) is hydrated into fully-initialised runtime structures.
There’s a module signature for this:
module type Has_spec = sig
type t
type spec [@@deriving sexp, yojson]
val of_spec : spec -> t
end
We can mix that signature into an existing module signature like so:
module type S = sig
module V : Value.S
module Bus : Event_bus.S
module State : Node_state.S
module Storage : Storage.S with type snapshot_payload = State.role_state
type t
include Has_spec with type t := t (* 🔥 DESTRUCTIVE! 🔥 *)
(*...*)
end
Node module signature satisfies Has_spec (TODO: permalink)

To complete the example, here are the actual hydration-boundary functions for Node, implemented within the Make_node functor that we had seen before:
module Make_node (V : Value.S) (Bus : Event_bus.S) :
S with module V = V with module Bus = Bus = struct
(*...*)
(**Shape of the static config for Node *)
type spec = {
node_id : int;
node_alias : string;
topic_strs : string list;
initial_cluster_size : int;
initial_state : string option; [@default None] [@yojson_drop_default]
storage_config : string option; [@default None] [@yojson_drop_default]
}
[@@deriving sexp, yojson]
(*...*)
(** Hydration routine*)
let of_spec { node_id; node_alias; topic_strs; initial_cluster_size; _ } =
let default_state = State.init_state () in
let config =
{
runtime = { cluster_size = ref initial_cluster_size };
roles =
[ "Acceptor"; "Learner"; "Proposer" ]
|> List.filter_map ~f:role_of_str;
topics = topic_strs |> List.filter_map ~f:Types.topic_of_str;
}
in
{
id = node_id;
alias = node_alias;
state = default_state;
config;
subs = Hashtbl.Poly.create ();
storage = ref (Storage.create ~alias:node_alias ());
logger = Logger.create Stdlib.__MODULE__ ();
}
(*...*)
(** De-hydration routine*)
let to_spec { id = node_id; alias = node_alias; config; _ } =
{
node_id;
node_alias;
topic_strs = config.topics |> List.map ~f:Types.topic_to_str;
initial_cluster_size = !(config.runtime.cluster_size);
initial_state = None;
storage_config = None;
}
(*...*)
end
This pattern allows us to load an effectful world out of static data (config files). It also overlaps with a few other patterns we might be familiar with:
- hydration/re-hydration boundaries which is common in persistence layers
- smart constructors (FP idiom, ref Haskell wiki).
- a configuration-as-data philosophy
Custom with-guards for State-Gated Effect Execution
This is about having guarded execution and effect-gating that allows for early-exit precondition checks.
let with_active ~node ~ignore_reason f =
if node.state |> State.is_inactive then ignore_reason |> noop_ignore node
else f ()
This also has the extra benefit of centralising both the guard and its fallback behaviour: it couples the condition, the reason, and the fallback effect for when the state-preconditions aren’t met. Handlers can then carry business-logic that assumes the preconditions hold.
let handle_coordination node msg =
with_active ~node
~ignore_reason:
"inactive right now and can't be reached to get coordinated..."
@@ fun () ->
log_flow ~routine:Stdlib.__FUNCTION__ ~msg:"...coordination is happening"
node;
match msg |> Message.proposal_id_of with
| None -> ()
| Some _proposal_id -> (
match msg with
| Message.Coordination (PermissionRequest pr) ->
pr |> handle_permission_request node
| Message.Coordination (PermissionGranted pg) ->
pg |> handle_permission_granted node
| Message.Coordination (Suggestion s) -> s |> handle_suggestion node
| Message.Coordination (Accepted a) -> a |> handle_accepted node
| Message.Coordination (Nack n) -> n |> handle_nack node
| Message.Coordination (Decided d) -> d |> handle_decided node
| _ ->
failwith
"We can only coordinate if we receive a coordination message.")
This approach feels familiar if you’ve used Python’s precondition decorators before:
def requires_active(func):
""" A decorator-based precondition guard"""
def wrapper(node, *args, **kwargs):
if node.is_inactive():
log(f"{node.alias}: inactive, ignoring")
return
return func(node, *args, **kwargs)
return wrapper
@requires_active # <--- applying that decorator
def handle_coordination(node, msg):
# business logic assumes node is active
...
The OCaml version achieves the same pattern without decorators or dynamic dispatch — with_active is a plain higher-order function that takes the guarded logic as a thunk. The guard, the reason, and the fallback are all co-located in one call.
These aren’t any compiler-rules that we’re obliged to follow — they’re just extrapolations that we arrive at ourselves because they give us more of that satisfactory snap-together feel when we implement them.
In the Absence of the Compiler, We Hum its Tunes #
What’s fascinating is that the type system doesn't enforce any of the things
In fact, OCaml has no effect system i.e. there’s no annotation that says “this function is pure” or “side-effects live here”. Also — and this is something that blew my mind as a newbie into this language — OCaml is actually a general-purpose language. For example, we can use the power of Classes, Objects, First-class Modules (packed), Mixins, Inheritance \(\ldots\) levers you’d typically find in the OOP-world. Not to forget, imperative programming levers are also available.
we just went through — e.g. nothing will stop someone from writing a create_* function that actually is pure, or a make_* function that allocates a mutable ref.
Yet, the discipline holds because of the compiler’s influence on our way of thinking. “What’s the shape of that struct, the direction of data-flow, the signature of that function, is it immutable or are we writing imperative logic in that area \(\ldots\)” – these are the threads of thought that make the patterns feel natural, self-evident and emergent.
What started as noise – type errors we didn’t yet understand, constraints that felt arbitrary – became a teacher’s voice we learned to trust. And what that voice left behind, once the specific errors were resolved and the specific constraints were satisfied, was a habit of thought: a preference for explicit boundaries, a sensitivity to where mutation lives, a belief that the names we give things should carry the weight of the architecture.
Closer #