Symbolication
=============

.. contents::
   :local:

LLDB is separated into a shared library that contains the core of the debugger,
and a driver that implements debugging and a command interpreter. LLDB can be
used to symbolicate your crash logs and can often provide more information than
other symbolication programs:

- Inlined functions
- Variables that are in scope for an address, along with their locations

The simplest form of symbolication is to load an executable:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out

We use the ``--no-dependents`` flag with the ``target create`` command so that
we don't load all of the dependent shared libraries from the current system.
When we symbolicate, we are often symbolicating a binary that was running on
another system, and even though the main executable might reference shared
libraries in ``/usr/lib``, we often don't want to load the versions on the
current computer.

Using the ``image list`` command will show us a list of all shared libraries
associated with the current target. As expected, we currently only have a
single binary:

.. code-block:: text

   (lldb) image list
   [  0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out

Now we can look up an address:

.. code-block:: text

   (lldb) image lookup --address 0x100000aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13

Since we haven't specified a slide or any load addresses for individual
sections in the binary, the address that we use here is a file address. A file
address refers to a virtual address as defined by each object file.

If we didn't use the ``--no-dependents`` option with ``target create``, we
would have loaded all dependent shared libraries:

.. code-block:: text

   (lldb) image list
   [  0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out
   [  1] 8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B 0x0000000000000000 /usr/lib/system/libsystem_c.dylib
   [  2] 62AA0B84-188A-348B-8F9E-3E2DB08DB93C 0x0000000000000000 /usr/lib/system/libsystem_dnssd.dylib
   [  3] C0535565-35D1-31A7-A744-63D9F10F12A4 0x0000000000000000 /usr/lib/system/libsystem_kernel.dylib
   ...

Now if we do a lookup using a file address, this can result in multiple matches
since most shared libraries have a virtual address space that starts at zero:

.. code-block:: text

   (lldb) image lookup -a 0x1000
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

         Address: libsystem_c.dylib[0x0000000000001000] (libsystem_c.dylib.__TEXT.__text + 928)
         Summary: libsystem_c.dylib`mcount + 9

         Address: libsystem_dnssd.dylib[0x0000000000001000] (libsystem_dnssd.dylib.__TEXT.__text + 456)
         Summary: libsystem_dnssd.dylib`ConvertHeaderBytes + 38

         Address: libsystem_kernel.dylib[0x0000000000001000] (libsystem_kernel.dylib.__TEXT.__text + 1116)
         Summary: libsystem_kernel.dylib`clock_get_time + 102
   ...

To avoid getting multiple file address matches, you can specify the name of the
shared library to limit the search:

.. code-block:: text

   (lldb) image lookup -a 0x1000 a.out
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

Defining Load Addresses for Sections
------------------------------------

When symbolicating your crash logs, it can be tedious if you always have to
convert your crash log addresses into file addresses. To avoid having to do any
conversion, you can set the load address for the sections of the modules in
your target. Once you set any section load address, lookups will switch to
using load addresses. You can slide all sections in the executable by the same
amount, or set the load address for individual sections. The ``target modules
load --slide`` command allows us to set the load address for all sections.

Below is an example of sliding all sections in ``a.out`` by adding 0x123000 to
each section's file address:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out --slide 0x123000

It is often much easier to specify the actual load location of each section by
name. Crash logs on macOS have a Binary Images section that specifies the
address of the ``__TEXT`` segment for each binary. Specifying a slide requires
that you first find the original (file) address for the ``__TEXT`` segment and
subtract it from the load address. If you specify the address of the ``__TEXT``
segment with ``target modules load section address``, you don't need to do any
calculations. To specify the load addresses of sections, we can pass one or
more section name + address pairs to the ``target modules load`` command:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out __TEXT 0x100123000

We specified that the ``__TEXT`` section is loaded at 0x100123000. Now that we
have defined where sections have been loaded in our target, any lookups we do
will use load addresses, so we don't have to do any math on the addresses in
the crash log backtraces; we can just use the raw addresses:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13
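
The relationship between the two kinds of address is plain arithmetic. As a
minimal sketch (the helper names are hypothetical, and it assumes the
``__TEXT`` segment of ``a.out`` has the file address ``0x100000000`` reported
by ``image list`` earlier), the slide and the file address resolved in the
lookup above can be reproduced like this:

.. code-block:: python

   def compute_slide(file_addr, load_addr):
       """Return the offset that was added to every file address at load time."""
       return load_addr - file_addr


   def load_to_file_address(load_addr, slide):
       """Convert a crash log (load) address back into a file address."""
       return load_addr - slide


   # Assumed values: the __TEXT file address from ``image list`` and the load
   # address passed to ``target modules load`` in the example above.
   TEXT_FILE_ADDR = 0x100000000
   TEXT_LOAD_ADDR = 0x100123000

   slide = compute_slide(TEXT_FILE_ADDR, TEXT_LOAD_ADDR)
   file_addr = load_to_file_address(0x100123AA3, slide)
   print(hex(slide), hex(file_addr))  # prints: 0x123000 0x100000aa3

Setting the section load address lets LLDB do this subtraction for every
lookup, which is why the raw crash log address can be used directly.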

Loading Multiple Executables
----------------------------

You often have more than one executable involved when you need to symbolicate a
crash log. When this happens, you create a target for the main executable or
one of the shared libraries, then add more modules to the target using the
``target modules add`` command.

Let's say we have a Darwin crash log that contains the following images:

.. code-block:: text

   Binary Images:
      0x100000000 -    0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
   0x7fff83f32000 - 0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
   0x7fff883db000 - 0x7fff883e3ff7 <62AA0B84-188A-348B-8F9E-3E2DB08DB93C> /usr/lib/system/libsystem_dnssd.dylib
   0x7fff8c0dc000 - 0x7fff8c0f7ff7 <C0535565-35D1-31A7-A744-63D9F10F12A4> /usr/lib/system/libsystem_kernel.dylib
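
If you have many images, the load addresses can be extracted with a short
script instead of by hand. The following is only a sketch: the function name
and the regular expression are assumptions, and the pattern handles only the
simplified ``start - end <UUID> path`` layout shown above, not every field a
real crash log may contain.

.. code-block:: python

   import re

   # Matches the simplified "start - end <UUID> path" layout shown above.
   IMAGE_RE = re.compile(
       r"^\s*(0x[0-9a-fA-F]+)\s+-\s+(0x[0-9a-fA-F]+)\s+<([0-9A-Fa-f-]+)>\s+(\S.*)$"
   )


   def parse_binary_images(text):
       """Return one dict per image with its start, end, uuid and path."""
       images = []
       for line in text.splitlines():
           match = IMAGE_RE.match(line)
           if match:
               start, end, uuid, path = match.groups()
               images.append({
                   "start": int(start, 16),
                   "end": int(end, 16),
                   "uuid": uuid,
                   "path": path.strip(),
               })
       return images


   CRASH_LOG_IMAGES = """\
   Binary Images:
      0x100000000 -    0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
   0x7fff83f32000 - 0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
   0x7fff883db000 - 0x7fff883e3ff7 <62AA0B84-188A-348B-8F9E-3E2DB08DB93C> /usr/lib/system/libsystem_dnssd.dylib
   0x7fff8c0dc000 - 0x7fff8c0f7ff7 <C0535565-35D1-31A7-A744-63D9F10F12A4> /usr/lib/system/libsystem_kernel.dylib
   """

   images = parse_binary_images(CRASH_LOG_IMAGES)
   for image in images:
       print(hex(image["start"]), image["path"])

The ``start`` value of each entry is the load address of that image's
``__TEXT`` segment.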

First we create the target using the main executable and then add any extra
shared libraries we want:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib

If you have debug symbols in standalone files, such as dSYM files on macOS,
you can specify their paths using the ``--symfile`` option for the ``target
create`` (recent LLDB releases only) and ``target modules add`` commands:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out --symfile /tmp/a.out.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib --symfile /build/server/a/libsystem_c.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib --symfile /build/server/b/libsystem_dnssd.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib --symfile /build/server/c/libsystem_kernel.dylib.dSYM

Then we set the load address for each ``__TEXT`` section using the first
address from the Binary Images section for each image:

.. code-block:: text

   (lldb) target modules load --file a.out __TEXT 0x100000000
   (lldb) target modules load --file libsystem_c.dylib __TEXT 0x7fff83f32000
   (lldb) target modules load --file libsystem_dnssd.dylib __TEXT 0x7fff883db000
   (lldb) target modules load --file libsystem_kernel.dylib __TEXT 0x7fff8c0dc000
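
With many images, these commands can be generated instead of typed. The sketch
below uses a hypothetical helper and a hard-coded image list taken from the
Binary Images example. It names the ``__TEXT`` section explicitly, in the same
section name + address form used in the single-image example earlier.

.. code-block:: python

   import posixpath


   def load_commands(images, section="__TEXT"):
       """Build one ``target modules load`` command per (path, address) pair."""
       commands = []
       for path, load_addr in images:
           commands.append(
               "target modules load --file %s %s 0x%x"
               % (posixpath.basename(path), section, load_addr)
           )
       return commands


   # Paths and load addresses copied from the Binary Images example.
   IMAGES = [
       ("/tmp/a.out", 0x100000000),
       ("/usr/lib/system/libsystem_c.dylib", 0x7FFF83F32000),
       ("/usr/lib/system/libsystem_dnssd.dylib", 0x7FFF883DB000),
       ("/usr/lib/system/libsystem_kernel.dylib", 0x7FFF8C0DC000),
   ]

   for command in load_commands(IMAGES):
       print(command)

The printed lines can be pasted at the ``(lldb)`` prompt or saved to a file of
commands.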

Now any stack backtraces that haven't been symbolicated can be symbolicated
using ``image lookup`` with the raw backtrace addresses.

Given the following raw backtrace:

.. code-block:: text

   Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
   0   libsystem_kernel.dylib          0x00007fff8a1e6d46 __kill + 10
   1   libsystem_c.dylib               0x00007fff84597df0 abort + 177
   2   libsystem_c.dylib               0x00007fff84598e2a __assert_rtn + 146
   3   a.out                           0x0000000100000f46 main + 70
   4   libdyld.dylib                   0x00007fff8c4197e1 start + 1

We can now symbolicate the load addresses:

.. code-block:: text

   (lldb) image lookup -a 0x00007fff8a1e6d46
   (lldb) image lookup -a 0x00007fff84597df0
   (lldb) image lookup -a 0x00007fff84598e2a
   (lldb) image lookup -a 0x0000000100000f46
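
The lookup commands can also be derived from the backtrace text. This is a
rough sketch rather than a full crash log parser: the function name and
pattern are assumptions, and the pattern expects image names without spaces.
It skips frames whose image was never added to the target, such as
``libdyld.dylib`` here.

.. code-block:: python

   import re

   # frame index, image name, address, then the "symbol + offset" remainder
   FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(.*)$")


   def lookup_commands(backtrace, loaded_images):
       """Return an ``image lookup`` command for each frame in a loaded image."""
       commands = []
       for line in backtrace.splitlines():
           match = FRAME_RE.match(line)
           if match and match.group(2) in loaded_images:
               commands.append("image lookup -a %s" % match.group(3))
       return commands


   BACKTRACE = """\
   Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
   0   libsystem_kernel.dylib          0x00007fff8a1e6d46 __kill + 10
   1   libsystem_c.dylib               0x00007fff84597df0 abort + 177
   2   libsystem_c.dylib               0x00007fff84598e2a __assert_rtn + 146
   3   a.out                           0x0000000100000f46 main + 70
   4   libdyld.dylib                   0x00007fff8c4197e1 start + 1
   """

   # Images that were added to the target in the steps above.
   LOADED = {
       "a.out",
       "libsystem_c.dylib",
       "libsystem_dnssd.dylib",
       "libsystem_kernel.dylib",
   }

   for command in lookup_commands(BACKTRACE, LOADED):
       print(command)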

Getting Variable Information
----------------------------

If you add the ``--verbose`` flag to the ``image lookup --address`` command,
you can get verbose information which can often include the locations of some
of your local variables:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3 --verbose
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 110)
         Summary: a.out`main + 50 at main.c:13
         Module: file = "/tmp/a.out", arch = "x86_64"
   CompileUnit: id = {0x00000000}, file = "/tmp/main.c", language = "ISO C:1999"
      Function: id = {0x0000004f}, name = "main", range = [0x0000000100000bc0-0x0000000100000dc9)
      FuncType: id = {0x0000004f}, decl = main.c:9, compiler_type = "int (int, const char **, const char **, const char **)"
        Blocks: id = {0x0000004f}, range = [0x100000bc0-0x100000dc9)
                id = {0x000000ae}, range = [0x100000bf2-0x100000dc4)
      LineEntry: [0x0000000100000bf2-0x0000000100000bfa): /tmp/main.c:13:23
        Symbol: id = {0x00000004}, range = [0x0000000100000bc0-0x0000000100000dc9), name="main"
      Variable: id = {0x000000bf}, name = "path", type= "char [1024]", location = DW_OP_fbreg(-1072), decl = main.c:28
      Variable: id = {0x00000072}, name = "argc", type= "int", location = r13, decl = main.c:8
      Variable: id = {0x00000081}, name = "argv", type= "const char **", location = r12, decl = main.c:8
      Variable: id = {0x00000090}, name = "envp", type= "const char **", location = r15, decl = main.c:8
      Variable: id = {0x0000009f}, name = "aapl", type= "const char **", location = rbx, decl = main.c:8

The interesting part is the variables that are listed. The variables are the
parameters and local variables that are in scope for the address that was
specified. Each variable entry has a location, shown in the ``location =``
field of the ``Variable`` lines above. Crash logs often have register
information for the first frame in each stack, and being able to reconstruct
one or more local variables can often help you decipher more information from a
crash log than you normally would be able to. Note that this is really only
useful for the first frame, and only if your crash logs have register
information for your threads.
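
To illustrate how the locations combine with the register state, the sketch
below pairs register-based variables with register values. The function name
is hypothetical and the register values are invented for illustration; real
values would come from the thread state section of your crash log. Variables
whose location is an expression, such as ``DW_OP_fbreg(-1072)``, are skipped
because they cannot be read from a register alone.

.. code-block:: python

   import re

   # Captures name, type and location from a "Variable:" line.
   VARIABLE_RE = re.compile(
       r'name = "([^"]+)", type= "([^"]+)", location = ([^,]+), decl'
   )


   def variables_in_registers(verbose_output, registers):
       """Map variable names to values for variables that live in a register."""
       values = {}
       for name, _type, location in VARIABLE_RE.findall(verbose_output):
           if location in registers:
               values[name] = registers[location]
       return values


   VERBOSE_OUTPUT = """\
   Variable: id = {0x000000bf}, name = "path", type= "char [1024]", location = DW_OP_fbreg(-1072), decl = main.c:28
   Variable: id = {0x00000072}, name = "argc", type= "int", location = r13, decl = main.c:8
   Variable: id = {0x00000081}, name = "argv", type= "const char **", location = r12, decl = main.c:8
   """

   # Hypothetical register values, NOT taken from a real crash log.
   REGISTERS = {"r13": 0x2, "r12": 0x7FFF5FBFFA58}

   for name, value in variables_in_registers(VERBOSE_OUTPUT, REGISTERS).items():
       print(name, hex(value))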

Using Python API to Symbolicate
-------------------------------

All of the commands above can be done through the Python script bridge. The
code below will recreate the target and add the three shared libraries that we
added in the Darwin crash log example above:

.. code-block:: python

   import lldb

   triple = "x86_64-apple-macosx"
   platform_name = None
   add_dependents = False
   target = lldb.debugger.CreateTarget("/tmp/a.out", triple, platform_name, add_dependents, lldb.SBError())
   if target:
       # Get the executable module
       module = target.GetModuleAtIndex(0)
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x100000000)
       module = target.AddModule("/usr/lib/system/libsystem_c.dylib", triple, None, "/build/server/a/libsystem_c.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff83f32000)
       module = target.AddModule("/usr/lib/system/libsystem_dnssd.dylib", triple, None, "/build/server/b/libsystem_dnssd.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff883db000)
       module = target.AddModule("/usr/lib/system/libsystem_kernel.dylib", triple, None, "/build/server/c/libsystem_kernel.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff8c0dc000)

       load_addr = 0x00007fff8a1e6d46
       # so_addr is a section offset address, or a lldb.SBAddress object
       so_addr = target.ResolveLoadAddress(load_addr)
       # Get a symbol context for the section offset address which includes
       # a module, compile unit, function, block, line entry, and symbol
       sym_ctx = so_addr.GetSymbolContext(lldb.eSymbolContextEverything)
       print(sym_ctx)
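
To symbolicate several addresses, the last few lines can be wrapped in a
helper. The function below is a hypothetical convenience wrapper, not part of
the LLDB API. It assumes that an address which did not resolve to a loaded
section reports an invalid module, and it takes the resolve scope as a
parameter so that it does not import ``lldb`` itself.

.. code-block:: python

   def symbolicate_addresses(target, load_addrs, resolve_scope):
       """Map each load address to a symbol context, or None if unresolved."""
       results = {}
       for load_addr in load_addrs:
           so_addr = target.ResolveLoadAddress(load_addr)
           # Assumption: an address outside every loaded section has no
           # valid module, so there is nothing useful to look up for it.
           if so_addr.GetModule().IsValid():
               results[load_addr] = so_addr.GetSymbolContext(resolve_scope)
           else:
               results[load_addr] = None
       return results

Inside LLDB's script interpreter it could be called as
``symbolicate_addresses(target, [0x00007fff8a1e6d46, 0x0000000100000f46],
lldb.eSymbolContextEverything)``.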

Use Builtin Python Module to Symbolicate
----------------------------------------

LLDB includes a module in the ``lldb`` package named
``lldb.utils.symbolication``. This module simplifies the symbolication process
by providing classes that represent the objects involved in symbolication:

- lldb.utils.symbolication.Address
- lldb.utils.symbolication.Section
- lldb.utils.symbolication.Image
- lldb.utils.symbolication.Symbolicator

**lldb.utils.symbolication.Address**

This class represents an address that will be symbolicated. It will cache any
information that has been looked up: module, compile unit, function, block,
line entry, symbol. It does this by having a lldb.SBSymbolContext as a member
variable.

**lldb.utils.symbolication.Section**

This class represents a section that might get loaded in a
lldb.utils.symbolication.Image. It has helper functions that allow you to set
it from text that might have been extracted from a crash log file.

**lldb.utils.symbolication.Image**

This class represents a module that might get loaded into the target we use for
symbolication. This class contains the executable path, optional symbol file
path, the triple, and the list of sections that will need to be loaded if we
choose to ask the target to load this image. Many of these objects will never
be loaded into the target unless they are needed by symbolication. You often
have a crash log that has 100 to 200 different shared libraries loaded, but
your crash log stack backtraces only use a few of these shared libraries. Only
the images that contain stack backtrace addresses need to be loaded in the
target in order to symbolicate.

Subclasses of this class will want to override the
``locate_module_and_debug_symbols`` method:

.. code-block:: python

   class CustomImage(lldb.utils.symbolication.Image):
       def locate_module_and_debug_symbols(self):
           # Locate the module and symbol file given the info found in the
           # crash log. Returning True to signal success is an assumed
           # convention; check the base class in your LLDB version.
           return True

Overriding this function allows clients to find the correct executable module
and symbol files as they might reside on a build server.

**lldb.utils.symbolication.Symbolicator**

This class coordinates the symbolication process by loading only the
lldb.utils.symbolication.Image instances that need to be loaded in order to
symbolicate a supplied address.

**lldb.macosx.crashlog**

lldb.macosx.crashlog is a package that is distributed on macOS builds that
subclasses the above classes. This module parses the information in the Darwin
crash logs and creates symbolication objects that represent the images, the
sections and the thread frames for the backtraces. It then uses the functions
in lldb.utils.symbolication to symbolicate the crash logs.

This module installs a new ``crashlog`` command into the lldb command
interpreter so that you can use it to parse and symbolicate macOS crash
logs:

.. code-block:: text

   (lldb) command script import lldb.macosx.crashlog
   "crashlog" and "save_crashlog" command installed, use the "--help" option for detailed help
   (lldb) crashlog /tmp/crash.log
   ...

The command that is installed has built-in help that shows the options that can
be used when symbolicating:

.. code-block:: text

   (lldb) crashlog --help
   Usage: crashlog [options]  [FILE ...]

Symbolicate one or more Darwin crash log files to provide source file and line
information, inlined stack frames back to the concrete functions, and
disassemble the location of the crash for the first frame of the crashed
thread. If this script is imported into the LLDB command interpreter, a
``crashlog`` command will be added to the interpreter for use at the LLDB
command line. After a crash log has been parsed and symbolicated, a target will
have been created that has all of the shared libraries loaded at the load
addresses found in the crash log file. This allows you to explore the program
as if it were stopped at the locations described in the crash log and functions
can be disassembled and lookups can be performed using the addresses found in
the crash log.

.. code-block:: text

   Options:
   -h, --help            show this help message and exit
   -v, --verbose         display verbose debug info
   -g, --debug           display verbose debug logging
   -a, --load-all        load all executable images, not just the images found
                           in the crashed stack frames
   --images              show image list
   --debug-delay=NSEC    pause for NSEC seconds for debugger
   -c, --crashed-only    only symbolicate the crashed thread
   -d DISASSEMBLE_DEPTH, --disasm-depth=DISASSEMBLE_DEPTH
                           set the depth in stack frames that should be
                           disassembled (default is 1)
   -D, --disasm-all      enabled disassembly of frames on all threads (not just
                           the crashed thread)
   -B DISASSEMBLE_BEFORE, --disasm-before=DISASSEMBLE_BEFORE
                           the number of instructions to disassemble before the
                           frame PC
   -A DISASSEMBLE_AFTER, --disasm-after=DISASSEMBLE_AFTER
                           the number of instructions to disassemble after the
                           frame PC
   -C NLINES, --source-context=NLINES
                           show NLINES source lines of source context (default =
                           4)
   --source-frames=NFRAMES
                           show source for NFRAMES (default = 4)
   --source-all          show source for all threads, not just the crashed
                           thread
   -i, --interactive     parse all crash logs and enter interactive mode

The source for the "symbolication" and "crashlog" modules is available in the
LLDB source tree under ``lldb/examples/python``.