Chapter 23: Memory Representation of Values

Table of Contents

There’s a need to know the underlying OCaml memory model to be able to use other features like FFIs properly — because of the ways values are exchanged across C libs and the OCaml runtime. Also useful for interfacing with external system tools that operate on OCaml binaries (e.g. profiling tools, debuggers [who don’t have any knowledge of the static OCaml types]). So the learning of memory representation is done with translation between C world and OCaml world in mind.

The OCaml compiler toolchain makes things predictable — we will be able to know precisely where a block of critical OCaml code spends it time in memory.

OCaml Types Disappear @ Runtime to keep runtime simple

Since type-checking is done @ compilation, the runtime doesn’t need to keep track of that much info — they can be discarded and the type declarations can be reduced to a simplified subset (e.g. to distinguish polymorphic values @ runtime).

Contrast this with JIT dynamic patching (an amortisation strategy) in languages like Java/.NET where @ runtime, there’s a need to lookup the concrete instance of the object and dispatch the method call.

TODO/KIV: compilation pipeline discussed in chapters 25 and 26, for now we just focus on the mapping from OCaml types to runtime values.

Ocaml Blocks and Values #

Uniform memory representation: every OCaml variable stored as a value.

it’s a single memory word that is a) an immediate integer OR b) a pointer to some other memory
that’s why it need to be able to differentiate between integer and pointer values — pointers need to be followed for scanning since they indirect
runtime tracks all values to free them when unneeded.

A running program will use blocks of memory (contiguous words in RAM) to represent tuples, records, closures, arrays — allocation implicitly happens at creation of the value

1
2
3
4
(*--- type declarations don't take up any memory*)
type t = { foo: int; bar: int };;
(*--- let binding -- variable init -- 2 words of space: 1 for foo, 1 for bar*)
let x = { foo = 13; bar = 14 };;

Code Snippet 1: let-bindings get compiled into allocations from the runtime system; type declarations don't take up any memory

Distinguishing Integers and Pointers at Runtime #

Boxing is when we wrap primitives in structs that hold metadata about it. In this case, boxing is helpful to give information to the GC, so the layer of indirection is tolerable.

Not all of the values need to be boxed @ runtime — there’s a tag bit per word to distinguish integers and pointers at runtime.

integer \(\Leftarrow\) if lowest bit of the block word is non-zero
- used to represent: bool, int, empty list, unit, sometimes variants may use this also (for constant constructors [i.e. no args like None ])
- so, integers are unboxed runtime values in OCaml \(\implies\) no need wrappers, can be directly stored \(\implies\) can be loaded to registers easily that’s why they’re the cheapest and fastest values to use in OCaml
pointer \(\Leftarrow\) if the lowest bit of the block word (i.e. lowest bit) is Zero
- pointers are always word-aligned

Pointers are seen from the lens of something that does indirection, but not all pointers should be followed.

why?

The GC follows pointers to update and maintain the OCaml object graph. C pointers aren’t part of that graph — they point into foreign memory with no OCaml structure, so following them would be reading and potentially corrupting memory the GC has no business touching.

followed by whom, and for what purpose?
- “Following” is what the GC does during the mark / trace phase – a traversal of the object graph to find all live values so it knows what to keep and what to collect
- So GC is supposed to trace the reachability graph of OCaml values — that’s why it should follow pointers to other OCaml heap blocks so that it does this job properly (so that live data is not corrupted) \(\implies\) that’s why should follow the OCaml pointers
- C pointer points to memory that OCaml runtime didn’t allocate on its own (e.g. malloc-ed buffer, a fd-wrapper, a C lib struct…). If GC follows this then it would end up reading raw C memory like as if it were OCaml heap structure – but it would lack block header, tag, size fields… so GC wouldn’t be able to read and corruption might happen.
  GC cannot free or move C memory either — because that’s owned by C allocator.
C iterop in OCaml uses Abstract or Custom block types for this reason – which are heap-allocated wrappers that contain the C pointer as an opaque payload
Also, an optional finalize function registered with the custom block will be called when the wrapper is collected – that’s how C resources get freed.

A: Pointers to OCaml values \(\implies\) should be followed If pointer is within a heap-chunk that is marked as managed by OCaml runtime \(\implies\) it’s a OCaml value
B: Pointers to C values \(\implies\) shouldn’t be followed If pointer points to outside the OCaml runtime \(\implies\) it’s an opaque C pointer to some other system resource.

Blocks and Values #

Recall that values can be categorised as: boxed values or immediate values.

Inspection helpers using Obj

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
(* ------ [1] Determining whether it's a block or immediate value  *)
(* repr function retrieves the runtime representation:*)
(*  is_block does a bit-check for the bottom bit *)
Obj.is_block (Obj.repr (1,2,3));;
(* - : bool = true *)
Obj.is_block (Obj.repr 1);;
(* - : bool = false *)



(* ------ [2] Finding out tag value  *)
Obj.tag (Obj.repr 1.0);;
(* note: for floating points, the =Double_tag= is set, so the collector knows that the fp value is not to be scanned *)
Obj.tag (Obj.repr 1.0) = Obj.double_tag;;

Code Snippet 2: Inspection helpers accessible via Obj module

Obj is an undocumented module that exposes the internals of the OCaml compiler and runtime. It is very useful for examining and understanding how your code will behave at runtime but should never be used for production code unless you understand the implications. The module bypasses the OCaml type system, making memory corruption and segmentation faults possible.

Great for debugging.

Immediate values: have no block at all.
The value is the word, with the tag-bit set. So there’s no header, no data-segment and nothing on the heap — e.g. (int, char, unit, bool, constant variants)
Boxed values: block structure has meta fields and a data-segment
A block is structured like so:

•                                              •                          •
•••••••••••••••• one word header •••••••••••••••• variable data-segment •••
•                                              •                          •

███████████████████████████████████████████████████████████████████████████
█               █                 █            █                          █
█   size field  █  colour (by GC) █  tag byte  █  variable data-segment   █
█               █                 █            █                          █
███████████████████████████████████████████████████████████████████████████

Code Snippet 3: ASCII diagram for the structure

Breakdown of the structure

Header (one word) 64 bits
1. size field: the length of the block in memory words 22 bits
  - this is why on 32-bit platforms, OCaml strings limited to 16MB, any bigger, we need to use Bigarray or use 64-bit architecture
2. color field 2 bits
  - for the GC to use, to keep track of state during mark-and-sweep collection. The tag isn’t exposed to the OCaml source code, ever
3. tag byte 8 bits
  - multiple purposes:
    - indicates whether data array is opaque bytes or fields
    - indicates info based on the tag value
      - No_scan_tag — value 251 \(\implies\) block’s data are all opaque bytes and the GC shouldn’t scan it — most common e.g. of opaque bytes is string type
Variable data-segment This is what the rest of the block is about — for the representation of the value

Table 1: Overview of Types and Runtime Representations

OCaml Type	Boxed?	Runtime Representation
`int`, `char`	No	Value shifted left 1 bit, LSB set to 1
`unit`, `[]`, `false`	No	OCaml int 0
`true`	No	OCaml int 1
Constant variants (`Foo Bar` construtors without params)	No	Ascending OCaml ints from 0
Variants with params	Yes	Block (tag 0); constant constructors remain unboxed
Polymorphic variants (no params)	No	Unboxed int
Polymorphic variants (with params)	Yes	Block with extra header word to store hash
`float`	Yes	Block with single field containing double-precision float
`string`	Yes	Word-aligned byte array with explicit length; tag >= No_scan_tag
List `[1;2;3]`	Mixed	`[]` is int 0; `h::t` is block (tag 0) with two fields, every `::` cell is a heap block. So, a list is a chain of blocks terminated by an unboxed zero.
Tuple, record, array	Yes	C array of values; tuples/records fixed-size, arrays variable
All-float record or array	Yes	Special unboxed float array tag

Integers, Characters, Other Basic Types (Atomic Types) #

These atomic types are just stored as unboxed integers at runtime.

OCaml ints are effectively 63-bit immediates – loss of 1 bit of precision because of the block tag
for int s, there’s a single bit lost to the tag bit and there’s a loss of a single bit of precision there. So the int representable range is \([ - 2^{62}, 2^{62} - 1 ]\)

Tuples, Records and Arrays #

all of these are boxed with the tag 0
tuples and records are fixed sized, arrays are variable - length-ed and homogeneous in their type from OCaml compiler’s POV but the interesting thing is that this homogeneity is not required by the memory representation though.

Floating-Point Numbers and Arrays #

Floats are just doubles – their tag is = Obj.double_tag

Each floating-point value is boxed in a separate memory-block — it’s inefficient to handle large arrays of floats compared to unboxed integers \(\implies\) that’s why OCaml special-cases records / arrays that contain ONLY float types \(\implies\) they’re tagged as Double_array_tag and are stored in a block that contains the floats packed directly in the data-section.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
(* [1]--- tag for float arrays differs from float and differs from float *)
Obj.tag (Obj.repr [| 1.0; 2.0; 3.0 |]);;
(* - : int = 254 *)
Obj.tag (Obj.repr 1.2);;
(* - : int = 253 *)

(* [2]--- tuple of floats are NOT optimised in the same way -- we end up having tuple tags: *)
Obj.tag (Obj.repr (1.0, 2.0, 3.0) );;
(* - : int = 0 *)


(* [3]--- possible to access the array using double_field *)
Obj.double_field (Obj.repr [| 1.1; 2.2; 3.3 |]) 1;;
(* - : float = 2.2 *)
Obj.double_field (Obj.repr 1.234) 0;;
(* - : float = 1.234 *)

Only records and arrays can have the float array optimization, and for records, every single field must be a float.

Variants and Lists #

Variants:

Basic variants just stored as 0-indexed numbers.
Variants with params are complex — they’re stored as blocks, the value tags ascend from 0 (from the leftmost variants with params) and params are stored as words in the block.
interesting gotcha on variant limits: because of the tagged encoding, there is a limit around 240 variants with parameters that applies to each type definition, but the only limit on the number of variants without parameters is the size of the native integer (either 31 or 63 bits). This limit arises because of the size of the tag byte, and that some of the high-numbered tags are reserved.

Lists:

representation is exactly same as if the list was written as a variant type with Nil and Cons
[] is an integer 0
subsequent blocks have tag 0 and 2 params: block with the current value, pointer to the rest of the list

We can force a type case between any two OCaml types using Obj.magic and we just need to provide a type hint:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
(* ====== [1] basic variant *)
type t = Apple | Orange | Pear;;
(* type t = Apple | Orange | Pear *)
((Obj.magic (Obj.repr Apple)) : int);;
(* - : int = 0 *)
((Obj.magic (Obj.repr Pear)) : int);;
(* - : int = 2 *)

(* --- confirms that the value is not a complex, boxed value*)
Obj.is_block (Obj.repr Apple);;
(* - : bool = false *)

(* ====== [2] parametric variant *)
type t = Apple | Orange of int | Pear of string | Kiwi;;
(* - : type t = Apple | Orange of Base.int | Pear of Base.string | Kiwi *)
Obj.is_block (Obj. repr (Orange 1234));;
(* - : bool = true *)
Obj.tag (Obj. repr (Orange 1234));;
(* - : int = 0 *)
Obj.tag (Obj. repr (Pear "xyz"));;
(* - : int = 1 *)
(Obj.magic (Obj.field (Obj.repr (Orange 1234)) 0) : int);;
(* - : int = 1234 *)
(Obj.magic (Obj.field (Obj.repr (Pear "xyz")) 0) : string);;
(* - : string = "xyz" *)

Code Snippet 4: Investigating basic variant representation using Obj.magic

Obj Module Considered Harmful #

Great for debugging.

Polymorphic Variants #

Not much compile-time info available to optimise memory layout — so the flexibility of polymorphic variants is at the expense of a slight runtime inefficiency.

Plain polymorphic variant (no parameters) — unboxed integer so 1 word wide, like a normal variant. Integer value = hash (name of variant). hash function would be the same for both 32 vs 64 bit architectures – so stable memory representation across CPU and host types
this is how one might inspect the hash:
```
  #require "ocaml-compiler-libs.common";;
  Btype.hash_variant "Foo";;
  - : int = 3505894
  (Obj.magic (Obj.repr `Foo) : int);;
  - : int = 3505894
```
Code Snippet 5: Using ocaml-compiler-libs to calculate hashes
Inefficiencies:
1. more mem space than normal variants IF they have params included
  - the normal variants just use the tag byte for encoding the variant value and save the fields for the contents
  - the polymorphic variant needs to allocate a new block (with tag 0) to store the hashed value for the polymorphic variant. So, polymorphic variants with constructors use one word of memory MORE than the normal variant constructors.
2. when polymorphic variant constructor has more than one param
  - normal variant holds params as a single flat block with multiple fields for each entry
  - polymorphic variants need flexible uniform memory representation because they may be used in different context across compilation units (that’s why theyr’e polymorphic) \(\implies\) allocate a tuple block for params which the argument field of the variant points to
    so for the polymorphic variant, there are 3 additional words: along with an extra memory indirection due to the tuple.
the CON of the inefficiencies not as bad as the PRO of the flexibility of using polymorphic variants.
if we’re writing code that demands high performance or must run within tight memory bounds, the runtime layout is at least very predictable.

String (and Byte) Values #

Basically just header then contents then padding. Padding is dynamically determined to make things word-aligned.

Header contains the size of string, defined in number of machine words; String_tag (252) is higher than No_scan_tag which helps indicate that the contents of the block are opaque to the collector.

The string representation is a clever way to ensure that the contents are always zero-terminated by the padding word and to still compute its length efficiently without scanning the whole string.

length_of_string = num_of_words_in_block * sizeof(word) - last_byte_of_block - 1

NULL termination comes in handy when passing a string to C, but is not relied upon to compute the length from OCaml code.

Careful when interoperating with C libs:

Take care that any C library functions that receive these buffers can also cope with arbitrary bytes within the buffer contents and are not expecting C strings
e.g. C memcopy or memmove standard library functions can operate on arbitrary data, but strlen or strcpy both require a NULL-terminated buffer, and neither has a mechanism for encoding a NULL value within its contents.

Custom Heap Blocks #

We can define our own operations over OCaml values using Custom_tag – custom blcok lives in the OCaml heap like an ordinary block and can be of user-defined size.

Custom_tag is higher than No_scan_tag so it doesn’t get scanned by the GC.

first word of the data within the custom block points out to C struct of custom operations. Very important point that this is one-way – so the custom block can’t have pointers to OCaml blocks and is always going to be opaque to the GC.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
struct custom_operations {
  char *identifier;
  /* This is a fini routine typically used to invoke C cleanup functions like free, different from ordinary OCaml finalizers: */
  void (*finalize)(value v);
  int (*compare)(value v1, value v2); /*<-- shows how to do polymorphic comparison */
  intnat (*hash)(value v);

  /*---- Marshalling logic: ----*/
  void (*serialize)(value v,
                    /*out*/ uintnat * wsize_32 /*size in bytes*/,
                    /*out*/ uintnat * wsize_64 /*size in bytes*/);
  uintnat (*deserialize)(void * dst);
  int (*compare_ext)(value v1, value v2);
};

Code Snippet 6: example of custom operations that a custom heap block may point to

The finalizers in custom operations defined for custom heap blocks and OCaml finalizers (as created by Gc.finalize) are not the same.

Managing external memory with Bigarray #

Custom blocks are useful for interfacing with external system memory directly from within OCaml — for FFI interoperability.

E.g. BigArray interface created for data-exchange with Fortran code — mapped a block of system memory as a multi-dimensional array that can be accessed from OCaml

Bigarray operations work directly on the external memory without requiring it to be copied into the OCaml heap (which is a potentially expensive operation for large arrays). Can avoid copying it over to OCaml heap.