=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file.  It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that
were linked together to produce the program, the source files which were used
to build the program, as well as references to other streams that contain more
detailed information about each compiland, such as the CodeView symbol records
contained within each compiland and the source and line information for
functions and other symbols within each compiland.


.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:


.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };

- **VersionSignature** - Unknown meaning.  Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum.

.. code-block:: c++

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.
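
Since neither value is documented, a reader can at most warn when a header deviates from what has been observed.  The following minimal sketch assumes that only a ``VersionSignature`` of ``-1`` combined with a ``VersionHeader`` of ``V70`` counts as expected; ``LooksLikeKnownDbiVersion`` is a hypothetical helper name, not part of any existing API.

.. code-block:: c++

  #include <cstdint>

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

  // Hypothetical helper: true when the header matches what has been observed
  // in practice (a VersionSignature of -1 and a VersionHeader of V70).  A
  // false result means "unexpected", not necessarily "invalid".
  bool LooksLikeKnownDbiVersion(int32_t VersionSignature,
                                uint32_t VersionHeader) {
    return VersionSignature == -1 &&
           VersionHeader == static_cast<uint32_t>(DbiStreamVersion::V70);
  }

A ``false`` result is best treated as a diagnostic rather than a hard error, since the meaning of the other enumerators is unknown.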

- **Age** - The number of times the PDB has been written.  Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing values representing the major and minor
  version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
  program, with the following layout:

.. code-block:: c++

  uint16_t MinorVersion : 8;
  uint16_t MajorVersion : 7;
  uint16_t NewVersionFormat : 1;

For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.
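
Because C++ leaves the ordering of bitfields implementation-defined, a portable reader would typically extract these values with masks and shifts.  The sketch below assumes the MSVC convention that the first-declared field (``MinorVersion``) occupies the least significant bits; ``DbiBuildNumber`` and ``DecodeBuildNumber`` are hypothetical names.

.. code-block:: c++

  #include <cstdint>

  // Decoded form of the DBI header's BuildNumber field.
  struct DbiBuildNumber {
    unsigned MajorVersion;
    unsigned MinorVersion;
    bool NewVersionFormat;
  };

  // Extracts the fields with explicit masks and shifts instead of relying on
  // compiler-specific bitfield ordering.
  DbiBuildNumber DecodeBuildNumber(uint16_t BuildNumber) {
    DbiBuildNumber Result;
    Result.MinorVersion = BuildNumber & 0x00FF;             // bits 0-7
    Result.MajorVersion = (BuildNumber >> 8) & 0x007F;      // bits 8-14
    Result.NewVersionFormat = (BuildNumber & 0x8000) != 0;  // bit 15
    return Result;
  }

For example, a toolchain version of 12.0 with ``NewVersionFormat`` set would be encoded as ``0x8C00``, which this function decodes back to major version ``12`` and minor version ``0``.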

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB.  This does not apply to LLVM, since LLVM does not use ``mspdb.dll``.

- **SymRecordStream** - The index of the stream containing all CodeView symbol
  records used by the program.  This is used for deduplication, so that many
  different compilands can refer to the same symbols without having to include
  the full record content inside of each module stream.

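- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - Despite its position among the substream size fields,
  this is an index rather than a length: the index of the MFC type server in the
  :ref:`dbi_type_server_substream`.
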
- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

.. code-block:: c++

  uint16_t WasIncrementallyLinked : 1;
  uint16_t ArePrivateSymbolsStripped : 1;
  uint16_t HasConflictingTypes : 1;
  uint16_t Reserved : 13;

The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
If it is passed to ``link.exe``, this field will be set.  Otherwise it will
not be set.  It is unclear what this flag does, although it seems to have
subtle implications for the algorithm used to look up type records.
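
The same mask-based approach applies to ``Flags``.  This sketch assumes, as with ``BuildNumber``, that the first-declared bitfield is the least significant bit; the constant, struct, and function names are hypothetical.

.. code-block:: c++

  #include <cstdint>

  // Masks for the DBI header Flags field, derived from the bitfield layout
  // above (first-declared field in the least significant bit).
  constexpr uint16_t kFlagIncrementallyLinked = 1u << 0;
  constexpr uint16_t kFlagPrivateSymbolsStripped = 1u << 1;
  constexpr uint16_t kFlagHasConflictingTypes = 1u << 2;

  struct DbiHeaderFlags {
    bool WasIncrementallyLinked;
    bool ArePrivateSymbolsStripped;
    bool HasConflictingTypes;
  };

  DbiHeaderFlags DecodeDbiFlags(uint16_t Flags) {
    return DbiHeaderFlags{
        (Flags & kFlagIncrementallyLinked) != 0,
        (Flags & kFlagPrivateSymbolsStripped) != 0,
        (Flags & kFlagHasConflictingTypes) != 0,
    };
  }

The ``Reserved`` bits are simply ignored here; a stricter reader could warn if any of them are set.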

- **Machine** - The target machine type.  The observed values, ``0x8664`` (x86-64)
  and ``0x14C`` (x86), match the ``IMAGE_FILE_MACHINE_*`` constants from the PE/COFF
  specification rather than the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration.

Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`.  The following ``7`` fields of the DBI Stream header specify the
number of bytes of the corresponding substream.  Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`.  The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the sum of
the following ``7`` fields.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
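
Since the substreams are laid out back to back, a reader can derive each substream's starting offset from these size fields and check the total against the stream length.  The sketch below assumes the substreams appear in the order of the sections that follow (module info, section contribution, section map, file info, type server, EC, then the optional debug header).  Note that this differs from the order of the size fields: ``OptionalDbgHeaderSize`` precedes ``ECSubstreamSize`` in the header, but the EC substream comes first in the stream.  ``SubstreamSizes`` and ``ComputeSubstreamOffsets`` are hypothetical names.

.. code-block:: c++

  #include <array>
  #include <cstdint>
  #include <optional>

  constexpr uint32_t kDbiHeaderSize = 64;

  // Hypothetical subset of DbiStreamHeader holding only the seven size fields.
  struct SubstreamSizes {
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
  };

  // Returns the starting offset (relative to the start of the DBI stream) of
  // each substream in physical order: ModInfo, SectionContribution,
  // SectionMap, FileInfo, TypeServer, EC, OptionalDbgHeader.  Returns
  // std::nullopt if any size is negative or the sizes do not add up to
  // StreamLength.
  std::optional<std::array<uint32_t, 7>>
  ComputeSubstreamOffsets(const SubstreamSizes &S, uint32_t StreamLength) {
    const int32_t InOrder[7] = {S.ModInfoSize,    S.SectionContributionSize,
                                S.SectionMapSize, S.SourceInfoSize,
                                S.TypeServerSize, S.ECSubstreamSize,
                                S.OptionalDbgHeaderSize};
    std::array<uint32_t, 7> Offsets{};
    uint64_t Cursor = kDbiHeaderSize;  // 64-bit to avoid overflow.
    for (int I = 0; I < 7; ++I) {
      if (InOrder[I] < 0)
        return std::nullopt;
      Offsets[I] = static_cast<uint32_t>(Cursor);
      Cursor += static_cast<uint32_t>(InOrder[I]);
    }
    if (Cursor != StreamLength)
      return std::nullopt;
    return Offsets;
  }

Returning ``std::nullopt`` on a size mismatch leaves it to the caller to decide whether to reject the PDB or fall back to best-effort parsing.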

.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`.  The
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program.  Each
record in the array has the ``ModInfo`` format shown below, which embeds a
``SectionContribEntry`` structure with the following layout:

.. code-block:: c++

  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };

While most of these fields are self-explanatory, the ``Characteristics`` field
warrants some elaboration.  It corresponds to the ``Characteristics``
field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure.
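
Because ``Characteristics`` uses the ``IMAGE_SCN_*`` flags from the PE/COFF specification, a dumper can render a contribution's memory protection directly from it.  The sketch below only checks the read, write, and execute bits (``IMAGE_SCN_MEM_READ`` = ``0x40000000``, ``IMAGE_SCN_MEM_WRITE`` = ``0x80000000``, ``IMAGE_SCN_MEM_EXECUTE`` = ``0x20000000``); ``ProtectionString`` is a hypothetical name.  The ``static_assert`` checks that, on a conventional ABI, the structure above has no implicit padding beyond its explicit padding fields.

.. code-block:: c++

  #include <cstdint>
  #include <string>

  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };
  // The explicit padding fields make the on-disk size 28 bytes.
  static_assert(sizeof(SectionContribEntry) == 28, "unexpected layout");

  // Memory-protection flags from IMAGE_SECTION_HEADER::Characteristics.
  constexpr uint32_t kScnMemExecute = 0x20000000;
  constexpr uint32_t kScnMemRead = 0x40000000;
  constexpr uint32_t kScnMemWrite = 0x80000000;

  // Renders the protection bits as an "rwx"-style string, e.g. "r-x".
  std::string ProtectionString(const SectionContribEntry &SC) {
    std::string Result = "---";
    if (SC.Characteristics & kScnMemRead)
      Result[0] = 'r';
    if (SC.Characteristics & kScnMemWrite)
      Result[1] = 'w';
    if (SC.Characteristics & kScnMemExecute)
      Result[2] = 'x';
    return Result;
  }

For instance, a typical code section with ``Characteristics`` of ``0x60000020`` renders as ``r-x``.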

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    SectionContribEntry SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];
    char ObjFileName[];
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

- **Flags** - A bitfield with the following format:

.. code-block:: c++

  uint16_t Dirty : 1;   // true if this ModInfo has been written since reading the PDB.
  uint16_t EC : 1;      // true if EC information is present for this module.  It is
                        // unknown what EC actually is.
  uint16_t Unused : 6;
  uint16_t TSM : 8;     // Type Server Index for this module.  It is unknown what this
                        // is used for, but it is not used by LLVM.


- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module.  This includes CodeView symbol information as well as source
  and line information.

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information.  At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module.  All PDB files observed to date
  have this value equal to 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information.  This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name.  This is usually either a full path to an
  object file (either directly passed to ``link.exe`` or from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name.  In the case of a module that is
  passed directly to ``link.exe``, this is the same as **ModuleName**.
  In the case of a module that comes from an archive, this is usually the full
  path to the archive.
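
Because ``ModuleName`` and ``ObjFileName`` are variable-length, records in this substream cannot be indexed directly; each record must be walked to find the next one.  The following sketch assumes little-endian input, the 64-byte fixed-size prefix implied by the ``ModInfo`` layout above, and that each record is padded to a 4-byte boundary (the behavior of LLVM's reader, not something stated in this document).  ``ParsedModInfo`` and ``ParseModInfo`` are hypothetical names, and only a few fields are extracted.

.. code-block:: c++

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <string>
  #include <vector>

  // A few fields extracted from one ModInfo record (hypothetical type).
  struct ParsedModInfo {
    uint16_t ModuleSymStream;
    uint16_t SourceFileCount;
    bool Dirty;             // Flags bit 0
    bool HasEC;             // Flags bit 1
    uint8_t TSM;            // Flags bits 8-15
    std::string ModuleName;
    std::string ObjFileName;
    uint32_t RecordLength;  // Bytes consumed, including alignment padding.
  };

  // 4 (Unused1) + 28 (SectionContr) + 2 + 2 + 4 + 4 + 4 + 2 + 2 + 4 + 4 + 4.
  constexpr size_t kModInfoFixedSize = 64;

  static uint16_t ReadU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }

  // Reads a null-terminated string at Pos and advances Pos past the
  // terminator.  Fails if no terminator exists before the end of Buf.
  static std::optional<std::string> ReadCString(const std::vector<uint8_t> &Buf,
                                                size_t &Pos) {
    auto Begin = Buf.begin() + Pos;
    auto End = std::find(Begin, Buf.end(), uint8_t{0});
    if (End == Buf.end())
      return std::nullopt;
    Pos = static_cast<size_t>(End - Buf.begin()) + 1;
    return std::string(Begin, End);
  }

  // Parses the ModInfo record that begins at Offset within the substream.
  std::optional<ParsedModInfo> ParseModInfo(const std::vector<uint8_t> &Buf,
                                            size_t Offset) {
    if (Offset > Buf.size() || Buf.size() - Offset < kModInfoFixedSize)
      return std::nullopt;
    const uint8_t *P = Buf.data() + Offset;
    const uint16_t Flags = ReadU16(P + 32);  // after Unused1 and SectionContr
    ParsedModInfo M;
    M.Dirty = (Flags & 0x0001) != 0;
    M.HasEC = (Flags & 0x0002) != 0;
    M.TSM = static_cast<uint8_t>(Flags >> 8);
    M.ModuleSymStream = ReadU16(P + 34);
    M.SourceFileCount = ReadU16(P + 48);
    size_t Pos = Offset + kModInfoFixedSize;
    std::optional<std::string> Name = ReadCString(Buf, Pos);
    if (!Name)
      return std::nullopt;
    std::optional<std::string> Obj = ReadCString(Buf, Pos);
    if (!Obj)
      return std::nullopt;
    M.ModuleName = *Name;
    M.ObjFileName = *Obj;
    // Assumed: each record is padded to a multiple of 4 bytes.
    M.RecordLength = static_cast<uint32_t>((Pos - Offset + 3) & ~size_t{3});
    return M;
  }

A caller would start at offset ``0`` and repeatedly add ``RecordLength`` until it reaches ``Header->ModInfoSize``.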

.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes.  This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far.  Following
this ``4`` byte field is an array of fixed-length structures.  If the version
is ``Ver60``, it is an array of ``SectionContribEntry`` structures.  If the
version is ``V2``, it is an array of ``SectionContribEntry2`` structures,
defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field is not well understood.
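
A reader can combine the version check with the entry sizes to determine how many contributions the substream holds.  This sketch assumes little-endian input, the 28-byte ``SectionContribEntry`` layout shown earlier, and a 32-byte ``SectionContribEntry2`` (the same entry followed by one ``uint32_t``); ``CountSectionContributions`` is a hypothetical name.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

  constexpr size_t kSectionContribEntrySize = 28;
  constexpr size_t kSectionContribEntry2Size = kSectionContribEntrySize + 4;

  // Given the raw bytes of the Section Contribution substream, returns the
  // number of entries that follow the version field, or std::nullopt if the
  // version is unrecognized or the remaining size is not an exact multiple
  // of the entry size.
  std::optional<size_t> CountSectionContributions(const uint8_t *Data,
                                                  size_t Size) {
    if (Size < 4)
      return std::nullopt;
    uint32_t Version = uint32_t(Data[0]) | (uint32_t(Data[1]) << 8) |
                       (uint32_t(Data[2]) << 16) | (uint32_t(Data[3]) << 24);
    size_t EntrySize;
    if (Version == static_cast<uint32_t>(SectionContrSubstreamVersion::Ver60))
      EntrySize = kSectionContribEntrySize;
    else if (Version == static_cast<uint32_t>(SectionContrSubstreamVersion::V2))
      EntrySize = kSectionContribEntry2Size;
    else
      return std::nullopt;
    size_t Remaining = Size - 4;
    if (Remaining % EntrySize != 0)
      return std::nullopt;
    return Remaining / EntrySize;
  }

With these definitions ``Ver60`` evaluates to ``0xF12EBA2D``, so a substream in the observed format begins with the bytes ``2D BA 2E F1``.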


.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes.  This substream begins with a ``4``
byte header followed by an array of fixed-length records.  The header and records
have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment.
                            // If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, and so will not be discussed further.
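
The fixed-size records make this substream easy to validate: ``SectionMapHeader`` consists of two ``uint16_t`` fields (4 bytes) and each ``SectionMapEntry`` is 20 bytes, so the substream size should equal ``4 + Count * 20``.  This sketch assumes that ``Count``, not ``LogCount``, gives the number of entries that follow; that reading is an assumption.  ``SectionMapSizeIsConsistent`` and ``HasFlag`` are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,
    Write = 1 << 1,
    Execute = 1 << 2,
    AddressIs32Bit = 1 << 3,
    IsSelector = 1 << 8,
    IsAbsoluteAddress = 1 << 9,
    IsGroup = 1 << 10
  };

  struct SectionMapEntry {
    uint16_t Flags;
    uint16_t Ovl;
    uint16_t Group;
    uint16_t Frame;
    uint16_t SectionName;
    uint16_t ClassName;
    uint32_t Offset;
    uint32_t SectionLength;
  };
  static_assert(sizeof(SectionMapEntry) == 20, "unexpected layout");

  constexpr size_t kSectionMapHeaderSize = 4;  // Count + LogCount

  // True if SubstreamSize is exactly the header plus Count entries.
  bool SectionMapSizeIsConsistent(uint16_t Count, size_t SubstreamSize) {
    return SubstreamSize ==
           kSectionMapHeaderSize + size_t(Count) * sizeof(SectionMapEntry);
  }

  // Tests a single SectionMapEntryFlags bit on an entry.
  bool HasFlag(const SectionMapEntry &E, SectionMapEntryFlags F) {
    return (E.Flags & static_cast<uint16_t>(F)) != 0;
  }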

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes.  This substream defines the mapping
from module to the source files that contribute to that module.  Since multiple
modules can use the same source file (for example, a header file), this substream
uses a string table to store each unique file name only once, and then has each
module use offsets into the string table rather than embedding the string's value
directly.  The format of this substream is as follows:

.. code-block:: c++

  // Variable-length layout: array sizes depend on values read at runtime.
  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles];
    char NamesBuffer[][NumSourceFiles];
  };

**NumModules** - The number of modules for which source file information is
contained within this substream.  Should match the number of records in the
:ref:`dbi_mod_info_substream`.

**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information.  But because this field is
only ``16`` bits wide, that would prevent a program from having more than 64K
source files.  In early versions of the file format, this seems to have been the
case.  To support more than this, the field is simply ignored, and the count is
instead computed dynamically by summing the values of the ``ModFileCounts`` array
(discussed below).  In short, this value should be ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K.  The real number of
source files is thus computed by summing this array.  Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null terminated string.

**NamesBuffer** - An array of null terminated strings containing the actual source
file names.
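
The following sketch shows how the per-module file lists can be recovered from this substream.  It assumes little-endian input, that the ``FileNameOffsets`` entries for each module are stored consecutively in module order (so module ``i``'s entries start at the sum of the preceding ``ModFileCounts`` values), and that ``NamesBuffer`` starts immediately after ``FileNameOffsets``.  ``ByteReader`` and ``ParseFileInfo`` are hypothetical names, and error handling is minimal.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <string>
  #include <vector>

  // Minimal bounds-checked little-endian reader (hypothetical helper).
  struct ByteReader {
    const std::vector<uint8_t> &Buf;
    size_t Pos = 0;

    bool ReadU16(uint16_t &V) {
      if (Buf.size() - Pos < 2)
        return false;
      V = static_cast<uint16_t>(Buf[Pos] | (Buf[Pos + 1] << 8));
      Pos += 2;
      return true;
    }
    bool ReadU32(uint32_t &V) {
      if (Buf.size() - Pos < 4)
        return false;
      V = uint32_t(Buf[Pos]) | (uint32_t(Buf[Pos + 1]) << 8) |
          (uint32_t(Buf[Pos + 2]) << 16) | (uint32_t(Buf[Pos + 3]) << 24);
      Pos += 4;
      return true;
    }
  };

  // Returns the source file names contributing to each module, in module order.
  std::optional<std::vector<std::vector<std::string>>>
  ParseFileInfo(const std::vector<uint8_t> &Buf) {
    ByteReader R{Buf};
    uint16_t NumModules = 0, IgnoredNumSourceFiles = 0;
    if (!R.ReadU16(NumModules) || !R.ReadU16(IgnoredNumSourceFiles))
      return std::nullopt;

    std::vector<uint16_t> ModIndices(NumModules), ModFileCounts(NumModules);
    for (uint16_t &V : ModIndices)  // Present, but unused here.
      if (!R.ReadU16(V))
        return std::nullopt;
    uint32_t TotalFiles = 0;  // 32-bit sum used instead of NumSourceFiles.
    for (uint16_t &V : ModFileCounts) {
      if (!R.ReadU16(V))
        return std::nullopt;
      TotalFiles += V;
    }

    if ((Buf.size() - R.Pos) / 4 < TotalFiles)
      return std::nullopt;
    std::vector<uint32_t> FileNameOffsets(TotalFiles);
    for (uint32_t &V : FileNameOffsets)
      R.ReadU32(V);  // Cannot fail after the size check above.
    const size_t NamesStart = R.Pos;  // NamesBuffer follows the offsets.

    std::vector<std::vector<std::string>> Result(NumModules);
    size_t Next = 0;  // Running index into FileNameOffsets.
    for (size_t M = 0; M < NumModules; ++M) {
      for (uint16_t I = 0; I < ModFileCounts[M]; ++I, ++Next) {
        size_t Start = NamesStart + FileNameOffsets[Next];
        size_t End = Start;
        while (End < Buf.size() && Buf[End] != 0)
          ++End;
        if (End >= Buf.size())
          return std::nullopt;  // Offset out of range or name unterminated.
        Result[M].emplace_back(
            reinterpret_cast<const char *>(Buf.data()) + Start, End - Start);
      }
    }
    return Result;
  }

Because two modules can point at the same offset, a shared header appears once in ``NamesBuffer`` but in the file list of every module that includes it.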

.. _dbi_type_server_substream:

Type Server Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
and consumes ``Header->TypeServerSize`` bytes.  Neither the purpose nor the layout
of this substream is understood, although it is assumed to be related somehow to the
usage of ``/Zi`` and ``mspdbsrv.exe``.  This substream will not be discussed further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
and consumes ``Header->ECSubstreamSize`` bytes.  Neither the purpose nor the layout
of this substream is understood, and it will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes.  This substream is an array of
stream indices (i.e. ``uint16_t`` values), each of which identifies a stream
in the larger MSF file which contains some additional debug information.
Each position of this array has a special meaning, allowing one to determine
what kind of debug information is at the referenced stream.  ``11`` indices
are currently understood, although it's possible there may be more.  The
layout of each stream generally corresponds exactly to a particular type
of debug data directory from the PE/COFF file.  The format of these fields
can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.

**FPO Data** - ``DbgStreamArray[0]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.

**Exception Data** - ``DbgStreamArray[1]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Omap From Src Data** - ``DbgStreamArray[4]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Section Header Data** - ``DbgStreamArray[5]``.  A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``.  The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``.  Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``.  A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``.  This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``.  The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.  It is not clear how this
differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
used the "new" format rather than the "old" format.

**Original Section Header Data** - ``DbgStreamArray[10]``.  Assumed to be similar
to ``DbgStreamArray[5]``, but has not been observed in practice.
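
The fixed positions above lend themselves to a small enumeration.  The sketch below assumes little-endian ``uint16_t`` entries, treats ``0xFFFF`` as meaning that no stream is present (a commonly used invalid-stream-index sentinel, but not something this document states), and treats positions beyond the end of the array as absent.  ``DbgStreamSlot`` and ``GetDbgStreamIndex`` are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>

  // Positions within the Optional Debug Header array, as listed above.
  enum class DbgStreamSlot : size_t {
    FPO = 0,
    Exception = 1,
    Fixup = 2,
    OmapToSrc = 3,
    OmapFromSrc = 4,
    SectionHdr = 5,
    TokenRidMap = 6,
    Xdata = 7,
    Pdata = 8,
    NewFPO = 9,
    SectionHdrOrig = 10
  };

  // Assumed sentinel for "no stream present".
  constexpr uint16_t kNoStream = 0xFFFF;

  // Data/Size describe the raw Optional Debug Header bytes
  // (Header->OptionalDbgHeaderSize bytes).  Returns the referenced MSF stream
  // index, or std::nullopt if the slot is past the end or unused.
  std::optional<uint16_t> GetDbgStreamIndex(const uint8_t *Data, size_t Size,
                                            DbgStreamSlot Slot) {
    size_t I = static_cast<size_t>(Slot);
    if (I >= Size / 2)
      return std::nullopt;
    uint16_t Index = static_cast<uint16_t>(Data[2 * I] | (Data[2 * I + 1] << 8));
    if (Index == kNoStream)
      return std::nullopt;
    return Index;
  }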