Subzero - Fast code generator for PNaCl bitcode
===============================================

Design
------

See the accompanying DESIGN.rst file for a more detailed technical overview of
Subzero.

Building
--------

Subzero is set up to be built within the Native Client tree.  Follow the
`Developing PNaCl
<https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/developing-pnacl>`_
instructions, in particular the section on building PNaCl sources.  This will
prepare the necessary external headers and libraries that Subzero needs.
Checking out the Native Client project also gets the pre-built clang and LLVM
tools in ``native_client/../third_party/llvm-build/Release+Asserts/bin``, which
are used for building Subzero.

The Subzero source is in ``native_client/toolchain_build/src/subzero``.  From
within that directory, run ``git checkout master && git pull`` to get the latest
version of the Subzero source code.

The Makefile is designed to be used as part of the higher level LLVM build
system.  To build manually, use ``Makefile.standalone``.  Several build
configurations are available from the command line::

    make -f Makefile.standalone
    make -f Makefile.standalone DEBUG=1
    make -f Makefile.standalone NOASSERT=1
    make -f Makefile.standalone DEBUG=1 NOASSERT=1
    make -f Makefile.standalone MINIMAL=1
    make -f Makefile.standalone ASAN=1
    make -f Makefile.standalone TSAN=1

``DEBUG=1`` builds without optimizations and is good when running the translator
inside a debugger.  ``NOASSERT=1`` disables assertions and is the preferred
configuration for performance testing the translator.  ``MINIMAL=1`` attempts to
minimize the size of the translator by compiling out everything unnecessary.
``ASAN=1`` enables AddressSanitizer, and ``TSAN=1`` enables ThreadSanitizer.

The result of the ``make`` command is the target ``pnacl-sz`` in the current
directory.

Building within LLVM trunk
--------------------------

Subzero can also be built from within a standard LLVM trunk checkout.  Here is
an example of how it can be checked out and built::

    mkdir llvm-git
    cd llvm-git
    git clone http://llvm.org/git/llvm.git
    cd llvm/projects/
    git clone https://chromium.googlesource.com/native_client/pnacl-subzero
    cd ../..
    mkdir build
    cd build
    cmake -G Ninja ../llvm/
    ninja
    ./bin/pnacl-sz -version

This creates a default build of ``pnacl-sz``; currently any options such as
``DEBUG=1`` or ``MINIMAL=1`` have to be added manually.

``pnacl-sz``
------------

The ``pnacl-sz`` program parses a pexe or an LLVM bitcode file and translates it
into ICE (Subzero's intermediate representation).  It then invokes the ICE
translate method to lower it to target-specific machine code, optionally dumping
the intermediate representation at various stages of the translation.

The program can be run as follows::

    ../pnacl-sz ./path/to/<file>.pexe
    ../pnacl-sz ./tests_lit/pnacl-sz_tests/<file>.ll

At this time, ``pnacl-sz`` accepts a number of arguments, including the
following:

    ``-help`` -- Show available arguments and possible values.  (Note: this
    unfortunately also pulls in some LLVM-specific options that are reported but
    that Subzero doesn't use.)

    ``-notranslate`` -- Suppress the ICE translation phase, which is useful if
    ICE is missing some support.

    ``-target=<TARGET>`` -- Set the target architecture.  The default is x8632.
    Future targets include x8664, arm32, and arm64.

    ``-filetype=obj|asm|iasm`` -- Select the output file type.  ``obj`` is a
    native ELF file, ``asm`` is a textual assembly file, and ``iasm`` is a
    low-level textual assembly file demonstrating the integrated assembler.

    ``-O<LEVEL>`` -- Set the optimization level.  Valid levels are ``2``, ``1``,
    ``0``, ``-1``, and ``m1``.  Levels ``-1`` and ``m1`` are synonyms, and
    represent the minimum optimization and worst code quality, but fastest code
    generation.

    ``-verbose=<list>`` -- Set verbosity flags.  This argument allows a
    comma-separated list of values.  The default is ``none``, and the value
    ``inst,pred`` will roughly match the .ll bitcode file.  Of particular use
    are ``all``, ``most``, and ``none``.

    ``-o <FILE>`` -- Set the assembly output file name.  Default is stdout.

    ``-log <FILE>`` -- Set the file name for diagnostic output (whose level is
    controlled by ``-verbose``).  Default is stdout.

    ``-timing`` -- Dump some pass timing information after translating the input
    file.
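
As an illustration, several of the arguments above can be combined in a single
invocation.  The sketch below assumes that ``pnacl-sz`` was built in the current
directory; the input file ``foo.ll`` and the output names ``foo.s`` and
``foo.log`` are hypothetical placeholders::

    # Translate at -O2 to textual assembly, writing the assembly to foo.s and
    # the diagnostic output (at the inst,pred verbosity) to foo.log.
    ./pnacl-sz -O2 -filetype=asm -verbose=inst,pred -log foo.log -o foo.s foo.ll

    # Translate at the fastest-to-compile level and dump pass timing information.
    ./pnacl-sz -Om1 -timing -o foo.s foo.ll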

Running the test suite
----------------------

Subzero uses the LLVM ``lit`` testing tool for part of its test suite, which
lives in ``tests_lit``.  To execute the test suite, first build Subzero, and
then run::

    make -f Makefile.standalone check-lit

There is also a suite of cross tests in the ``crosstest`` directory.  A cross
test takes a test bitcode file implementing some unit tests, and translates it
twice, once with Subzero and once with LLVM's known-good ``llc`` translator.
The Subzero-translated symbols are specially mangled to avoid multiple
definition errors from the linker.  Both translated versions are linked together
with a driver program that calls each version of each unit test with a variety
of interesting inputs and compares the results for equality.  The cross tests
are currently invoked by running::

    make -f Makefile.standalone check-xtest

Similarly, there is a suite of unit tests::

    make -f Makefile.standalone check-unit

A convenient way to run the lit, cross, and unit tests is::

    make -f Makefile.standalone check

Assembling ``pnacl-sz`` output as needed
----------------------------------------

``pnacl-sz`` can now produce a native ELF binary using ``-filetype=obj``.

``pnacl-sz`` can also produce textual assembly code in a structure suitable for
input to ``llvm-mc``, using ``-filetype=asm`` or ``-filetype=iasm``.  An object
file can then be produced using the command::

    llvm-mc -triple=i686 -filetype=obj -o=MyObj.o
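
The command above names no input file, in which case ``llvm-mc`` should read the
assembly from standard input.  As a sketch of the full sequence, assuming
``pnacl-sz`` and ``llvm-mc`` are both on the ``PATH``, and using the
hypothetical file names ``foo.pexe`` and ``MyObj.s``::

    # Translate the pexe to textual assembly, then assemble it into an object
    # file by passing the assembly file to llvm-mc as its input.
    pnacl-sz -filetype=asm -o MyObj.s foo.pexe
    llvm-mc -triple=i686 -filetype=obj -o=MyObj.o MyObj.s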

Building a translated binary
----------------------------

There is a helper script, ``pydir/szbuild.py``, that translates a finalized pexe
into a fully linked executable.  Run it with ``-help`` for extensive
documentation.

By default, ``szbuild.py`` builds an executable using only Subzero translation,
but it can also be used to produce hybrid Subzero/``llc`` binaries (``llc`` is
the name of the LLVM translator) for bisection-based debugging.  In bisection
debugging mode, the pexe is translated using both Subzero and ``llc``, and the
resulting object files are combined into a single executable using symbol
weakening and other linker tricks to control which Subzero symbols and which
``llc`` symbols take precedence.  This is controlled by the ``-include`` and
``-exclude`` arguments.  These can be used to rapidly find a single function
that Subzero translates incorrectly, leading to incorrect output.

There is another helper script, ``pydir/szbuild_spec2k.py``, that runs
``szbuild.py`` on one or more components of the Spec2K suite.  This assumes that
Spec2K is set up in the usual place in the Native Client tree, and the finalized
pexe files have been built.  (Note: for working with Spec2K and other pexes,
it's helpful to finalize the pexe using ``--no-strip-syms``, to preserve the
original function and global variable names.)

Status
------

Subzero currently fully supports the x86-32 architecture, for both native and
Native Client sandboxing modes.  The x86-64 architecture is also supported in
native mode only, and only for the x32 flavor, because pointers and 32-bit
integers are indistinguishable in PNaCl bitcode.  Sandboxing support for
x86-64 is in progress.  ARM and MIPS support is in progress.  Two optimization
levels, ``-Om1`` and ``-O2``, are implemented.

The ``-Om1`` configuration is designed to be the simplest and fastest possible,
with a minimal set of passes and transformations.

* Simple Phi lowering before target lowering, by generating temporaries and
  adding assignments to the end of predecessor blocks.

* Simple register allocation limited to pre-colored or infinite-weight
  Variables.

The ``-O2`` configuration is designed to use all optimizations available and
produce the best code.

* Address mode inference to leverage the complex x86 addressing modes.

* Compare/branch fusing based on liveness/last-use analysis.

* Global, linear-scan register allocation.

* Advanced Phi lowering after target lowering and global register allocation,
  via edge splitting, topological sorting of the parallel moves, and final local
  register allocation.

* Stack slot coalescing to reduce frame size.

* Branch optimization to reduce the number of branches to the following block.
