Object allocation and lifetime in ICE
=====================================

This document discusses object lifetime and scoping issues, starting with
bitcode parsing and ending with ELF file emission.

Multithreaded translation model
-------------------------------

A single thread is responsible for parsing PNaCl bitcode (possibly concurrently
with downloading the bitcode file) and constructing the initial high-level ICE.
The result is a queue of Cfg pointers.  The parser thread incrementally adds a
Cfg pointer to the queue after the Cfg is created, and then moves on to parse
the next function.

Multiple translation worker threads draw from the queue of Cfg pointers as they
are added to the queue, such that several functions can be translated in parallel.
The result is a queue of assembler buffers, each of which consists of machine code
plus fixups.

A single thread is responsible for writing the assembler buffers to an ELF file.
It consumes the assembler buffers from the queue that the translation threads
write to.

This means that Cfgs are created by the parser thread and destroyed by the
translation thread (including Cfg nodes, instructions, and most kinds of
operands), and assembler buffers are created by the translation thread and
destroyed by the writer thread.
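
The three-stage pipeline described above can be sketched with standard C++
threads.  This is only an illustration of the ownership hand-offs, not the
actual Subzero code: WorkQueue, runPipeline, and the stub Cfg and AsmBuffer
types are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real ICE types.
struct Cfg { std::string Name; };
struct AsmBuffer { std::string Code; };

// Minimal blocking FIFO queue; pop() returns nullopt once closed and drained.
template <typename T> class WorkQueue {
public:
  void push(T Item) {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      assert(!Closed && "push after close");
      Items.push(std::move(Item));
    }
    Cv.notify_one();
  }
  void close() {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      Closed = true;
    }
    Cv.notify_all();
  }
  std::optional<T> pop() {
    std::unique_lock<std::mutex> Lock(Mu);
    Cv.wait(Lock, [this] { return Closed || !Items.empty(); });
    if (Items.empty())
      return std::nullopt;
    T Item = std::move(Items.front());
    Items.pop();
    return std::optional<T>(std::move(Item));
  }

private:
  std::mutex Mu;
  std::condition_variable Cv;
  std::queue<T> Items;
  bool Closed = false;
};

// Runs parse -> translate -> write; returns the number of buffers written.
std::size_t runPipeline(const std::vector<std::string> &Functions,
                        unsigned NumWorkers) {
  WorkQueue<std::unique_ptr<Cfg>> CfgQueue;
  WorkQueue<std::unique_ptr<AsmBuffer>> AsmQueue;
  std::size_t Written = 0;

  // Parser thread: creates Cfgs and hands ownership to the queue.
  std::thread Parser([&] {
    for (const std::string &Name : Functions)
      CfgQueue.push(std::make_unique<Cfg>(Cfg{Name}));
    CfgQueue.close();
  });

  // Translation threads: consume Cfgs, destroy them, produce buffers.
  std::vector<std::thread> Workers;
  for (unsigned I = 0; I < NumWorkers; ++I)
    Workers.emplace_back([&] {
      while (auto Func = CfgQueue.pop()) {
        auto Buf = std::make_unique<AsmBuffer>();
        Buf->Code = "code for " + (*Func)->Name;
        Func->reset(); // The Cfg dies on the translation thread.
        AsmQueue.push(std::move(Buf));
      }
    });

  // Writer thread: consumes and destroys assembler buffers.
  std::thread Writer([&] {
    while (auto Buf = AsmQueue.pop())
      ++Written; // A real writer would append the code to the ELF file.
  });

  Parser.join();
  for (std::thread &W : Workers)
    W.join();
  AsmQueue.close(); // No more producers, so the writer can drain and exit.
  Writer.join();
  return Written;
}
```

Passing std::unique_ptr through the queues makes each ownership transfer
explicit: after a push, the producing thread no longer holds the object.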

Deterministic execution
^^^^^^^^^^^^^^^^^^^^^^^

Although code randomization is a key aspect of security, deterministic and
repeatable translation is sometimes needed, e.g. for regression testing.
Multithreaded translation introduces potential for randomness that may need to
be made deterministic.

* Bitcode parsing is sequential, so it's easy to use a FIFO queue to keep the
  translation queue in deterministic order.  But since translation is
  multithreaded, FIFO order for the assembler buffer queue may not be
  deterministic.  The writer thread would be responsible for reordering the
  buffers, potentially waiting for slower translations to complete even if other
  assembler buffers are available.

* Different translation threads may add new constant pool entries at different
  times.  Some constant pool entries are emitted as read-only data.  This
  includes floating-point constants for x86, as well as integer immediate
  randomization through constant pooling.  These constant pool entries are
  emitted after all assembler buffers have been written.  The writer needs to be
  able to sort them deterministically before emitting them.
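
One way the writer could reorder assembler buffers, as described in the first
bullet, is to tag each function with its parse-order index and hold back
buffers that arrive early.  The sketch below assumes such a tag exists; the
text does not specify a mechanism, and TaggedBuffer, ReorderBuffer, and
writeOrder are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: an assembler buffer tagged with the parse-order index of its
// function.  The tag is an assumption made for this sketch.
struct TaggedBuffer {
  std::size_t Sequence;
  std::string Code;
};

// Holds buffers that arrived early and releases them in parse order.
class ReorderBuffer {
public:
  // Accepts a buffer in arrival order and returns every buffer that is now
  // ready to be written, in sequence order (possibly none).
  std::vector<TaggedBuffer> accept(TaggedBuffer Buf) {
    assert(Buf.Sequence >= Next && "sequence number already emitted");
    std::size_t Seq = Buf.Sequence;
    Pending.emplace(Seq, std::move(Buf));
    std::vector<TaggedBuffer> Ready;
    for (auto It = Pending.find(Next); It != Pending.end();
         It = Pending.find(Next)) {
      Ready.push_back(std::move(It->second));
      Pending.erase(It);
      ++Next;
    }
    return Ready;
  }
  std::size_t pendingCount() const { return Pending.size(); }

private:
  std::size_t Next = 0; // Sequence number the writer must emit next.
  std::map<std::size_t, TaggedBuffer> Pending;
};

// Convenience wrapper: feeds sequence numbers in arrival order and returns
// the order in which the writer would emit them.
std::vector<std::size_t> writeOrder(const std::vector<std::size_t> &Arrival) {
  ReorderBuffer Reorder;
  std::vector<std::size_t> Order;
  for (std::size_t Seq : Arrival)
    for (TaggedBuffer &Buf : Reorder.accept(TaggedBuffer{Seq, ""}))
      Order.push_back(Buf.Sequence);
  return Order;
}
```

The cost of this approach is the one noted above: a buffer that finishes early
stays in memory until every earlier function has been translated.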

Object lifetimes
----------------

Objects of type Constant, or a subclass of Constant, are pooled globally.  The
pooling is managed by the GlobalContext class.  Since Constants are added or
looked up by translation threads and the parser thread, access to the constant
pools, as well as GlobalContext in general, needs to be arbitrated by locks.
(It's possible that if there's too much contention, we can maintain a
thread-local cache for Constant pool lookups.)  Constants live across all
function translations, and are destroyed only at the end.

Several object types are scoped within the lifetime of the Cfg.  These include
CfgNode, Inst, Variable, and any target-specific subclasses of Inst and Operand.
When the Cfg is destroyed, these scoped objects are destroyed as well.  To keep
this cheap, the Cfg includes a slab allocator from which these objects are
allocated, and the objects should not contain fields with non-trivial
destructors.  Most of these fields are POD, but in a couple of cases these
fields are STL containers.  We deal with this, and avoid leaking memory, by
providing the container with an allocator that uses the Cfg-local slab
allocator.  Since the container allocator generally needs to be stateless, we
store a pointer to the slab allocator in thread-local storage (TLS).  This is
straightforward since on any of the threads, only one Cfg is active at a time,
and a given Cfg is only active in one thread at a time (either the parser
thread, or at most one translation thread, or the writer thread).
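
A stateless container allocator that forwards to a slab allocator found
through TLS might look like the sketch below.  All names here (SlabAllocator,
CurrentCfgAllocator, CfgLocalAllocator, CfgVector, demoAllocations) and the
slab size are assumptions made for illustration, and the bump allocator is
deliberately simplistic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

constexpr std::size_t SlabSize = 4096; // Arbitrary size for this sketch.

// Hypothetical slab (bump) allocator: memory is released only when the whole
// allocator is destroyed, never per object.
class SlabAllocator {
public:
  SlabAllocator() = default;
  SlabAllocator(const SlabAllocator &) = delete;
  SlabAllocator &operator=(const SlabAllocator &) = delete;
  ~SlabAllocator() {
    for (void *Slab : Slabs)
      std::free(Slab);
  }
  void *allocate(std::size_t Bytes, std::size_t Align) {
    assert(Align <= alignof(std::max_align_t) && "over-aligned type");
    std::size_t Offset = (Used + Align - 1) / Align * Align;
    if (Slabs.empty() || Offset + Bytes > SlabSize) {
      // Start a new slab; an oversized request gets a slab of its own size.
      void *Slab = std::malloc(Bytes > SlabSize ? Bytes : SlabSize);
      if (!Slab)
        throw std::bad_alloc();
      Slabs.push_back(Slab);
      Offset = 0;
    }
    Used = Offset + Bytes;
    ++NumAllocations;
    return static_cast<char *>(Slabs.back()) + Offset;
  }
  std::size_t numAllocations() const { return NumAllocations; }

private:
  std::vector<void *> Slabs;
  std::size_t Used = 0; // Bytes consumed in the most recent slab.
  std::size_t NumAllocations = 0;
};

// Slab allocator of the Cfg currently active on this thread (null if none).
thread_local SlabAllocator *CurrentCfgAllocator = nullptr;

// Stateless STL-compatible allocator that forwards to the TLS slab.
template <typename T> struct CfgLocalAllocator {
  using value_type = T;
  CfgLocalAllocator() = default;
  template <typename U> CfgLocalAllocator(const CfgLocalAllocator<U> &) {}
  T *allocate(std::size_t N) {
    assert(CurrentCfgAllocator && "no Cfg is active on this thread");
    return static_cast<T *>(
        CurrentCfgAllocator->allocate(N * sizeof(T), alignof(T)));
  }
  void deallocate(T *, std::size_t) {} // Reclaimed with the slab.
};
template <typename T, typename U>
bool operator==(const CfgLocalAllocator<T> &, const CfgLocalAllocator<U> &) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CfgLocalAllocator<T> &, const CfgLocalAllocator<U> &) {
  return false;
}

template <typename T> using CfgVector = std::vector<T, CfgLocalAllocator<T>>;

// Fills a CfgVector under a temporary slab and reports how many slab
// allocations the container performed.
std::size_t demoAllocations(int Count) {
  SlabAllocator Slab;
  CurrentCfgAllocator = &Slab;
  std::size_t Result = 0;
  {
    CfgVector<int> Values;
    for (int I = 0; I < Count; ++I)
      Values.push_back(I);
    assert(Values.size() == static_cast<std::size_t>(Count));
    Result = Slab.numAllocations();
  } // Values is destroyed before its slab goes away.
  CurrentCfgAllocator = nullptr;
  return Result;
}
```

Because deallocate does nothing, a container embedded in a slab-allocated
object leaks no memory even if its destructor never runs: its storage
disappears together with the slab.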

Even though there is a one-to-one correspondence between Cfgs and assembler
buffers, they need to use different allocators.  This is because the translation
thread wants to destroy the Cfg and reclaim all its memory after translation
completes, but possibly before the assembler buffer is written to the ELF file.
Ownership of the assembler buffer and its allocator is transferred to the
writer thread after translation completes, similar to the way ownership of the
Cfg and its allocator is transferred to the translation thread after parsing
completes.

Allocators and TLS
------------------

Parts of Cfg construction, and of the transformations on the Cfg, involve STL
container operations that may need to allocate additional memory in a stateless
fashion.  This requires maintaining the proper slab allocator pointer in TLS.

When the parser thread creates a new Cfg object, it puts a pointer to the Cfg's
slab allocator into its own TLS.  This is used as the Cfg is built within the
parser thread.  After the Cfg is built, the parser thread clears its allocator
pointer, adds the new Cfg pointer to the translation queue, and continues with
the next function.

When the translation thread grabs a new Cfg pointer, it installs the Cfg's slab
allocator into its TLS and translates the function.  When generating the
assembler buffer, it must take care not to use the Cfg's slab allocator.  If
there is a slab allocator for the assembler buffer, a pointer to it can also be
installed in TLS if needed.

The translation thread destroys the Cfg when it is done translating, including
the Cfg's slab allocator, and clears the allocator pointer from its TLS.
Likewise, the writer thread destroys the assembler buffer when it is finished
with it.
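
The install and clear steps described in this section pair up naturally as a
scope guard.  The following is a sketch under that assumption; the text does
not say the pointer is managed this way, and CfgAllocatorScope, parseFunction,
translateFunction, and the stub types are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct SlabAllocator {}; // Placeholder for the real slab allocator.

// Hypothetical Cfg stub that owns its slab allocator.
struct Cfg {
  SlabAllocator Allocator;
  bool BuiltWithOwnAllocator = false;
};

// Slab allocator of the Cfg currently active on this thread (null if none).
thread_local SlabAllocator *CurrentCfgAllocator = nullptr;

// Installs a Cfg's allocator in TLS for the lifetime of the guard.
class CfgAllocatorScope {
public:
  explicit CfgAllocatorScope(Cfg &Func) {
    assert(CurrentCfgAllocator == nullptr && "one active Cfg per thread");
    CurrentCfgAllocator = &Func.Allocator;
  }
  ~CfgAllocatorScope() { CurrentCfgAllocator = nullptr; }
  CfgAllocatorScope(const CfgAllocatorScope &) = delete;
  CfgAllocatorScope &operator=(const CfgAllocatorScope &) = delete;
};

// Parser side: build under the Cfg's allocator, clear TLS, then hand off.
std::unique_ptr<Cfg> parseFunction() {
  auto Func = std::make_unique<Cfg>();
  {
    CfgAllocatorScope Scope(*Func);
    // Nodes and instructions would be built here.
    Func->BuiltWithOwnAllocator = (CurrentCfgAllocator == &Func->Allocator);
  }
  return Func; // TLS is already cleared, so the Cfg can be queued safely.
}

// Translation side: install, translate, clear, then destroy the Cfg.
// Returns whether the Cfg's own allocator was active during translation.
bool translateFunction(std::unique_ptr<Cfg> Func) {
  bool Active = false;
  {
    CfgAllocatorScope Scope(*Func);
    // Lowering and emission into the assembler buffer would happen here.
    Active = (CurrentCfgAllocator == &Func->Allocator);
  }
  Func.reset(); // Destroys the Cfg together with its slab allocator.
  return Active;
}
```

Clearing the TLS pointer before the hand-off means a stray allocation on the
wrong thread trips the assertion rather than silently using another Cfg's slab.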

Thread safety
-------------

The parse/translate/write stages of the translation pipeline are fairly
independent, with little opportunity for threads to interfere.  The Subzero
design calls for all shared accesses to go through the GlobalContext, which adds
locking as appropriate.  This includes the coarse-grained work queues for Cfgs
and assembler buffers.  It also includes finer-grained access to constant pool
entries, as well as output streams for verbose debugging output.

If locked access to constant pools becomes a bottleneck, we can investigate
thread-local caches of constants (as mentioned earlier).  Also, it should be
safe, though slightly less efficient, to allow duplicate copies of constants
across threads (which could be deduplicated by the writer at the end).
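
A locked constant pool with an optional per-thread cache in front of it could
be sketched as follows.  The GlobalContext shown here is a toy stand-in, not
the real class, and ConstantInt32, getConstantInt32, poolSize, and
ThreadConstantCache are names assumed for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical pooled constant.
struct ConstantInt32 { std::int32_t Value; };

// Toy stand-in for GlobalContext: every pool access takes the lock.
class GlobalContext {
public:
  // Returns the unique pooled constant for Value, creating it on first use.
  const ConstantInt32 *getConstantInt32(std::int32_t Value) {
    std::lock_guard<std::mutex> Lock(PoolMutex);
    std::unique_ptr<ConstantInt32> &Slot = Pool[Value];
    if (!Slot)
      Slot = std::make_unique<ConstantInt32>(ConstantInt32{Value});
    return Slot.get();
  }
  std::size_t poolSize() {
    std::lock_guard<std::mutex> Lock(PoolMutex);
    return Pool.size();
  }

private:
  std::mutex PoolMutex;
  // unique_ptr keeps each constant's address stable across rehashing.
  std::unordered_map<std::int32_t, std::unique_ptr<ConstantInt32>> Pool;
};

// Per-thread cache: one instance per translation thread, never shared, so
// cache hits need no locking.
class ThreadConstantCache {
public:
  explicit ThreadConstantCache(GlobalContext &Ctx) : Ctx(Ctx) {}
  const ConstantInt32 *get(std::int32_t Value) {
    auto It = Cache.find(Value);
    if (It != Cache.end())
      return It->second; // Hit: the global lock is not touched.
    const ConstantInt32 *Const = Ctx.getConstantInt32(Value);
    assert(Const && Const->Value == Value);
    Cache.emplace(Value, Const);
    return Const;
  }

private:
  GlobalContext &Ctx;
  std::unordered_map<std::int32_t, const ConstantInt32 *> Cache;
};
```

Since constants live until the end of translation, cached pointers stay valid
for as long as the owning GlobalContext does.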

We will use ThreadSanitizer as a way to detect potential data races in the
implementation.