- rtshkmr's digital garden/
- Readings/
- Books/
- Real World OCaml: Functional Programming for the Masses/
- Chapter 23: Memory Representation of Values/
Chapter 23: Memory Representation of Values
Table of Contents
There’s a need to know the underlying OCaml memory model to be able to use other features like FFIs properly — because of the ways values are exchanged across C libs and the OCaml runtime. Also useful for interfacing with external system tools that operate on OCaml binaries (e.g. profiling tools, debuggers [who don’t have any knowledge of the static OCaml types]). So the learning of memory representation is done with translation between C world and OCaml world in mind.
The OCaml compiler toolchain makes things predictable — we will be able to know precisely where a block of critical OCaml code spends it time in memory.

OCaml Types Disappear @ Runtime to keep runtime simple
Since type-checking is done @ compilation, the runtime doesn’t need to keep track of that much info — they can be discarded and the type declarations can be reduced to a simplified subset (e.g. to distinguish polymorphic values @ runtime).
Contrast this with JIT dynamic patching (an amortisation strategy) in languages like Java/.NET where @ runtime, there’s a need to lookup the concrete instance of the object and dispatch the method call.
TODO/KIV: compilation pipeline discussed in chapters 25 and 26, for now we just focus on the mapping from OCaml types to runtime values.
Ocaml Blocks and Values #
Uniform memory representation: every OCaml variable stored as a value.
it’s a single memory word that is a) an immediate integer OR b) a pointer to some other memory
that’s why it need to be able to differentiate between integer and pointer values — pointers need to be followed for scanning since they indirect
runtime tracks all values to free them when unneeded.
A running program will use blocks of memory (contiguous words in RAM) to represent tuples, records, closures, arrays — allocation implicitly happens at creation of the value
| |
Distinguishing Integers and Pointers at Runtime #
Boxing is when we wrap primitives in structs that hold metadata about it. In this case, boxing is helpful to give information to the GC, so the layer of indirection is tolerable.
Not all of the values need to be boxed @ runtime — there’s a tag bit per word to distinguish integers and pointers at runtime.
- integer \(\Leftarrow\) if lowest bit of the block word is non-zero
- used to represent:
bool,int, empty list,unit, sometimes variants may use this also (for constant constructors [i.e. no args likeNone]) - so, integers are unboxed runtime values in OCaml \(\implies\) no need wrappers, can be directly stored \(\implies\) can be loaded to registers easily that’s why they’re the cheapest and fastest values to use in OCaml
- used to represent:
- pointer \(\Leftarrow\) if the lowest bit of the block word (i.e. lowest bit) is Zero
- pointers are always word-aligned
Pointers are seen from the lens of something that does indirection, but not all pointers should be followed.
why?
followed by whom, and for what purpose?
“Following” is what the GC does during the mark / trace phase – a traversal of the object graph to find all live values so it knows what to keep and what to collect
So GC is supposed to trace the reachability graph of OCaml values — that’s why it should follow pointers to other OCaml heap blocks so that it does this job properly (so that live data is not corrupted) \(\implies\) that’s why should follow the OCaml pointers
C pointer points to memory that OCaml runtime didn’t allocate on its own (e.g.
malloc-ed buffer, a fd-wrapper, a C lib struct…). If GC follows this then it would end up reading raw C memory like as if it were OCaml heap structure – but it would lack block header, tag, size fields… so GC wouldn’t be able to read and corruption might happen.GC cannot free or move C memory either — because that’s owned by C allocator.
C iterop in OCaml uses
AbstractorCustomblock types for this reason – which are heap-allocated wrappers that contain the C pointer as an opaque payloadAlso, an optional
finalizefunction registered with thecustomblock will be called when the wrapper is collected – that’s how C resources get freed.
- A: Pointers to OCaml values \(\implies\) should be followed If pointer is within a heap-chunk that is marked as managed by OCaml runtime \(\implies\) it’s a OCaml value
- B: Pointers to C values \(\implies\) shouldn’t be followed If pointer points to outside the OCaml runtime \(\implies\) it’s an opaque C pointer to some other system resource.
Blocks and Values #
Recall that values can be categorised as: boxed values or immediate values.
Inspection helpers using Obj
| |
Obj moduleObj is an undocumented module that exposes the internals of the OCaml compiler and runtime. It is very useful for examining and understanding how your code will behave at runtime but should never be used for production code unless you understand the implications. The module bypasses the OCaml type system, making memory corruption and segmentation faults possible.
Great for debugging.
Immediate values: have no block at all.
The value is the word, with the tag-bit set. So there’s no header, no data-segment and nothing on the heap — e.g. (
int,char,unit,bool, constant variants)Boxed values: block structure has meta fields and a data-segment
A block is structured like so:
• • •
•••••••••••••••• one word header •••••••••••••••• variable data-segment •••
• • •
███████████████████████████████████████████████████████████████████████████
█ █ █ █ █
█ size field █ colour (by GC) █ tag byte █ variable data-segment █
█ █ █ █ █
███████████████████████████████████████████████████████████████████████████Breakdown of the structure
- Header (one word)
64 bits- size field: the length of the block in memory words
22 bits- this is why on 32-bit platforms, OCaml strings limited to 16MB, any bigger, we need to use
Bigarrayor use 64-bit architecture
- this is why on 32-bit platforms, OCaml strings limited to 16MB, any bigger, we need to use
- color field
2 bits- for the GC to use, to keep track of state during mark-and-sweep collection. The tag isn’t exposed to the OCaml source code, ever
- tag byte
8 bits- multiple purposes:
- indicates whether data array is opaque bytes or fields
- indicates info based on the tag value
No_scan_tag— value 251 \(\implies\) block’s data are all opaque bytes and the GC shouldn’t scan it — most common e.g. of opaque bytes isstringtype
- multiple purposes:
- size field: the length of the block in memory words
- Variable data-segment This is what the rest of the block is about — for the representation of the value
| OCaml Type | Boxed? | Runtime Representation |
|---|---|---|
int, char | No | Value shifted left 1 bit, LSB set to 1 |
unit, [], false | No | OCaml int 0 |
true | No | OCaml int 1 |
Constant variants (Foo Bar construtors without params) | No | Ascending OCaml ints from 0 |
| Variants with params | Yes | Block (tag 0); constant constructors remain unboxed |
| Polymorphic variants (no params) | No | Unboxed int |
| Polymorphic variants (with params) | Yes | Block with extra header word to store hash |
float | Yes | Block with single field containing double-precision float |
string | Yes | Word-aligned byte array with explicit length; tag >= No_scan_tag |
List [1;2;3] | Mixed | [] is int 0; h::t is block (tag 0) with two fields, every :: cell is a heap block. So, a list is a chain of blocks terminated by an unboxed zero. |
| Tuple, record, array | Yes | C array of values; tuples/records fixed-size, arrays variable |
| All-float record or array | Yes | Special unboxed float array tag |
Integers, Characters, Other Basic Types (Atomic Types) #
These atomic types are just stored as unboxed integers at runtime.
OCaml ints are effectively 63-bit immediates – loss of 1 bit of precision because of the block tag
for
ints, there’s a single bit lost to the tag bit and there’s a loss of a single bit of precision there. So the int representable range is \([ - 2^{62}, 2^{62} - 1 ]\)
Tuples, Records and Arrays #
- all of these are boxed with the tag
0 - tuples and records are fixed sized, arrays are variable - length-ed and homogeneous in their type from OCaml compiler’s POV but the interesting thing is that this homogeneity is not required by the memory representation though.
Floating-Point Numbers and Arrays #
Floats are just doubles – their tag is = Obj.double_tag
Each floating-point value is boxed in a separate memory-block — it’s inefficient to handle large arrays of floats compared to unboxed integers \(\implies\) that’s why OCaml special-cases records / arrays that contain ONLY float types \(\implies\) they’re tagged as Double_array_tag and are stored in a block that contains the floats packed directly in the data-section.
| |
Variants and Lists #
Variants:
Basic variants just stored as 0-indexed numbers.
Variants with params are complex — they’re stored as blocks, the value tags ascend from 0 (from the leftmost variants with params) and params are stored as words in the block.
interesting gotcha on variant limits: because of the tagged encoding, there is a limit around 240 variants with parameters that applies to each type definition, but the only limit on the number of variants without parameters is the size of the native integer (either 31 or 63 bits). This limit arises because of the size of the tag byte, and that some of the high-numbered tags are reserved.
Lists:
- representation is exactly same as if the list was written as a variant type with
NilandCons []is an integer0- subsequent blocks have tag
0and 2 params: block with the current value, pointer to the rest of the list
We can force a type case between any two OCaml types using Obj.magic and we just need to provide a type hint:
| |
Obj.magicObj Module Considered Harmful #
Obj is an undocumented module that exposes the internals of the OCaml compiler and runtime. It is very useful for examining and understanding how your code will behave at runtime but should never be used for production code unless you understand the implications. The module bypasses the OCaml type system, making memory corruption and segmentation faults possible.
Great for debugging.
Polymorphic Variants #
Not much compile-time info available to optimise memory layout — so the flexibility of polymorphic variants is at the expense of a slight runtime inefficiency.
Plain polymorphic variant (no parameters) — unboxed integer so 1 word wide, like a normal variant. Integer value =
hash (name of variant). hash function would be the same for both 32 vs 64 bit architectures – so stable memory representation across CPU and host typesthis is how one might inspect the hash:
#require "ocaml-compiler-libs.common";; Btype.hash_variant "Foo";; - : int = 3505894 (Obj.magic (Obj.repr `Foo) : int);; - : int = 3505894Code Snippet 5: Usingocaml-compiler-libsto calculate hashesInefficiencies:
- more mem space than normal variants IF they have params included
the normal variants just use the tag byte for encoding the variant value and save the fields for the contents
the polymorphic variant needs to allocate a new block (with tag 0) to store the hashed value for the polymorphic variant. So, polymorphic variants with constructors use one word of memory MORE than the normal variant constructors.
- when polymorphic variant constructor has more than one param
normal variant holds params as a single flat block with multiple fields for each entry
polymorphic variants need flexible uniform memory representation because they may be used in different context across compilation units (that’s why theyr’e polymorphic) \(\implies\) allocate a tuple block for params which the argument field of the variant points to
so for the polymorphic variant, there are 3 additional words: along with an extra memory indirection due to the tuple.
- more mem space than normal variants IF they have params included
the CON of the inefficiencies not as bad as the PRO of the flexibility of using polymorphic variants.
if we’re writing code that demands high performance or must run within tight memory bounds, the runtime layout is at least very predictable.
String (and Byte) Values #
Basically just header then contents then padding. Padding is dynamically determined to make things word-aligned.
Header contains the size of string, defined in number of machine words; String_tag (252) is higher than No_scan_tag which helps indicate that the contents of the block are opaque to the collector.
The string representation is a clever way to ensure that the contents are always zero-terminated by the padding word and to still compute its length efficiently without scanning the whole string.
length_of_string = num_of_words_in_block * sizeof(word) - last_byte_of_block - 1NULL termination comes in handy when passing a string to C, but is not relied upon to compute the length from OCaml code.
Careful when interoperating with C libs:
Take care that any C library functions that receive these buffers can also cope with arbitrary bytes within the buffer contents and are not expecting C strings
e.g. C
memcopyormemmovestandard library functions can operate on arbitrary data, butstrlenorstrcpyboth require aNULL-terminated buffer, and neither has a mechanism for encoding aNULLvalue within its contents.
Custom Heap Blocks #
We can define our own operations over OCaml values using Custom_tag – custom blcok lives in the OCaml heap like an ordinary block and can be of user-defined size.
Custom_tag is higher than No_scan_tag so it doesn’t get scanned by the GC.
first word of the data within the custom block points out to C struct of custom operations. Very important point that this is one-way – so the custom block can’t have pointers to OCaml blocks and is always going to be opaque to the GC.
| |
Gc.finalize) are not the same.Managing external memory with Bigarray #
Custom blocks are useful for interfacing with external system memory directly from within OCaml — for FFI interoperability.
E.g. BigArray interface created for data-exchange with Fortran code — mapped a block of system memory as a multi-dimensional array that can be accessed from OCaml
Bigarray operations work directly on the external memory without requiring it to be copied into the OCaml heap (which is a potentially expensive operation for large arrays). Can avoid copying it over to OCaml heap.