.. _Readers:

Developing lld Readers
======================

Note: this document discusses the Mach-O port of LLD. For ELF and COFF,
see :doc:`index`.

Introduction
------------

The purpose of a "Reader" is to take an object file in a particular format
and create an `lld::File`:cpp:class: (which is a graph of Atoms)
representing the object file. A Reader inherits from
`lld::Reader`:cpp:class:, which lives in
:file:`include/lld/Core/Reader.h` and
:file:`lib/Core/Reader.cpp`.

The Reader infrastructure for an object format ``Foo`` requires the
following pieces in order to fit into lld:

:file:`include/lld/ReaderWriter/ReaderFoo.h`

   .. cpp:class:: ReaderOptionsFoo : public ReaderOptions

      This Options class is the only way to configure how the Reader will
      parse any file into an `lld::File`:cpp:class: object. This class
      should be declared in the `lld`:cpp:class: namespace.

   .. cpp:function:: Reader *createReaderFoo(ReaderOptionsFoo &reader)

      This factory function configures and creates the Reader. This function
      should be declared in the `lld`:cpp:class: namespace.

:file:`lib/ReaderWriter/Foo/ReaderFoo.cpp`

   .. cpp:class:: ReaderFoo : public Reader

      This is the concrete Reader class which can be called to parse
      object files. It should be declared in an anonymous namespace or,
      if there is shared code with the `lld::WriterFoo`:cpp:class:, you
      can make a nested namespace (e.g. `lld::foo`:cpp:class:).

You may have noticed that :cpp:class:`ReaderFoo` is not declared in the
``.h`` file. An important design aspect of lld is that all Readers are
created *only* through an object-format-specific
:cpp:func:`createReaderFoo` factory function. The creation of the Reader is
parametrized through a :cpp:class:`ReaderOptionsFoo` class. This options
class is the one-and-only way to control how the Reader operates when
parsing an input file into an Atom graph.
For instance, you may want the
Reader to only accept certain architectures. The options class can be
instantiated from command-line options or be programmatically configured.

Where to start
--------------

The lld project already has a skeleton of source code for Readers for
``ELF``, ``PECOFF``, ``MachO``, and lld's native ``YAML`` graph format.
If your file format is a variant of one of those, you should modify the
existing Reader to support your variant. This is done by customizing the Options
class for the Reader and making appropriate changes to the ``.cpp`` file to
interpret those options and act accordingly.

If your object file format is not a variant of any existing Reader, you'll need
to create a new Reader subclass with the organization described above.

Readers are factories
---------------------

The linker will usually only instantiate your Reader once. That one Reader will
have its loadFile() method called many times with different input files.
To support multithreaded linking, the Reader may be parsing multiple input
files in parallel. Therefore, there should be no parsing state in your Reader
object. Any parsing state should be in member variables of your File subclass
or in some temporary object.

The key function to implement in a reader is::

  virtual error_code loadFile(LinkerInput &input,
                              std::vector<std::unique_ptr<File>> &result);

It takes a memory buffer (which contains the contents of the object file
being read) and returns an instantiated lld::File object which is
a collection of Atoms. The result is a vector of File pointers (instead of
simply a File pointer) because some file formats allow multiple object
"files" to be encoded in one file system file.


Memory Ownership
----------------

Atoms are always owned by their File object. During core linking when Atoms
are coalesced or stripped away, core linking does not delete them.
Core linking just removes those unused Atoms from its internal list.
The destructor of a File object is responsible for deleting all Atoms it
owns, and if ownership of the MemoryBuffer was passed to it, the File
destructor needs to delete that too.

Making Atoms
------------

The internal model of lld is purely Atom-based. But most object files do not
have an explicit concept of Atoms; instead, most have "sections". The way
to think of this is that a section is just a list of Atoms with common
attributes.

The first step in parsing section-based object files is to cleave each
section into a list of Atoms. The technique may vary by section type. For
code sections (e.g. .text), there are usually symbols at the start of each
function. Those symbol addresses are the points at which the section is
cleaved into discrete Atoms. Some file formats (like ELF) also include the
length of each symbol in the symbol table. Otherwise, the length of each
Atom is calculated to run to the start of the next symbol or the end of the
section.

Other section types can be implicitly cleaved. For instance, C-string literals
or unwind info (e.g. .eh_frame) can be cleaved by having the Reader look at
the content of the section. It is important to cleave sections into Atoms
to remove false dependencies. For instance, the .eh_frame section often
has no symbols, but contains "pointers" to the functions for which it
has unwind info. If the .eh_frame section were not cleaved (but left as one
big Atom), there would always be a reference (from the eh_frame Atom) to
each function, so the linker would be unable to coalesce or dead-strip
the function Atoms.

The lld Atom model also requires that a reference to an undefined symbol be
modeled as a Reference to an UndefinedAtom. So the Reader also needs to
create an UndefinedAtom for each undefined symbol in the object file.
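The symbol-based cleaving described above can be sketched roughly as follows.
This is only an illustration under simplifying assumptions: ``Symbol``,
``AtomRange``, and ``cleaveSection`` are hypothetical names invented for this
sketch and are not part of lld's API, and the sketch covers only the case where
the symbol table does not record a length for each symbol.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT lld's real classes.
struct Symbol {
  std::string name;
  uint64_t offset; // offset of the symbol from the start of its section
};

struct AtomRange {
  std::string name;
  uint64_t offset; // where the atom starts within the section
  uint64_t size;   // number of bytes the atom covers
};

// Cleave one section into atoms. Each atom runs from its symbol to the start
// of the next symbol; the last atom runs to the end of the section.
// Simplifications: bytes before the first symbol are left unassigned, and two
// symbols at the same offset (aliases) yield a zero-sized atom for the first.
std::vector<AtomRange> cleaveSection(std::vector<Symbol> symbols,
                                     uint64_t sectionSize) {
  // Ignore symbols that do not point into this section.
  symbols.erase(std::remove_if(symbols.begin(), symbols.end(),
                               [sectionSize](const Symbol &s) {
                                 return s.offset >= sectionSize;
                               }),
                symbols.end());

  // Symbol tables are not necessarily ordered by address, so sort first.
  std::stable_sort(symbols.begin(), symbols.end(),
                   [](const Symbol &a, const Symbol &b) {
                     return a.offset < b.offset;
                   });

  std::vector<AtomRange> atoms;
  atoms.reserve(symbols.size());
  for (std::size_t i = 0; i < symbols.size(); ++i) {
    uint64_t start = symbols[i].offset;
    uint64_t end =
        (i + 1 < symbols.size()) ? symbols[i + 1].offset : sectionSize;
    assert(end >= start && "symbols must be sorted by offset");
    atoms.push_back({symbols[i].name, start, end - start});
  }
  return atoms;
}
```

A real Reader would also record the section's attributes on each Atom and have
each Atom refer to its bytes in the MemoryBuffer rather than only an offset and
size; those details are omitted here.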

Once all Atoms have been created, the second step is to create References
(recall that Atoms are "nodes" and References are "edges"). Most References
are created by looking at the "relocation records" in the object file. If
a function contains a call to "malloc", there is usually a relocation record
specifying the address in the section and the symbol table index. Your
Reader will need to convert the address to an Atom and offset, and the symbol
table index into a target Atom. If "malloc" is not defined in the object file,
the target Atom of the Reference will be an UndefinedAtom.


Performance
-----------
Once you have the above working to parse an object file into Atoms and
References, you'll want to look at performance. Some techniques that can
help performance are:

* Use llvm::BumpPtrAllocator or pre-allocate one big vector<Reference> and then
  just have each atom point to its subrange of References in that vector.
  This can be faster than allocating each Reference as a separate object.
* Pre-scan the symbol table and determine how many atoms are in each section,
  then allocate space for all the Atom objects at once.
* Don't copy symbol names or section content to each Atom; instead, use
  StringRef and ArrayRef in each Atom to point to its name and content in the
  MemoryBuffer.


Testing
-------

We are still working on infrastructure to test Readers. The issue is that
you don't want to check in binary files to the test suite, and the tools
for creating your object file from assembly source may not be available on
every OS.

We are investigating a way to use YAML to describe the sections, symbols,
and content of a file. Some code would then write out an object
file from that YAML description.

Once that is in place, you can write test cases that contain section/symbol
YAML and are run through the linker to produce Atom/Reference-based YAML, which
is then run through FileCheck to verify the Atoms and References are as
expected.