=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file.  It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that were linked
together to produce the program, the source files that were used to build the
program, and references to other streams that contain more detailed
information about each compiland, such as the CodeView symbol records
contained within each compiland and the source and line information for
functions and other symbols within each compiland.


.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:


.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };

- **VersionSignature** - Unknown meaning.  Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum.

.. code-block:: c++

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.

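
As a minimal sketch of how the layout above might be consumed, the following
reads the fixed-size header out of a buffer holding the raw bytes of the DBI
Stream and checks the two version fields.  The helper names ``readDbiHeader``
and ``looksLikeV70`` are hypothetical and are not part of LLVM.  The code
assumes a little-endian host and a compiler that inserts no padding into the
struct (the ``static_assert`` checks the latter).

.. code-block:: c++

  #include <cstdint>
  #include <cstring>
  #include <optional>
  #include <vector>

  // Mirrors the DbiStreamHeader layout shown above.
  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };
  static_assert(sizeof(DbiStreamHeader) == 64, "header must be 64 bytes");

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

  // Hypothetical helper: copies the header out of the raw stream bytes.
  // Returns std::nullopt if the stream is too short to hold a header.
  // Assumes a little-endian host, since the bytes are copied unchanged.
  std::optional<DbiStreamHeader>
  readDbiHeader(const std::vector<uint8_t> &Stream) {
    if (Stream.size() < sizeof(DbiStreamHeader))
      return std::nullopt;
    DbiStreamHeader Header;
    std::memcpy(&Header, Stream.data(), sizeof(Header));
    return Header;
  }

  // Hypothetical helper: true if both version fields hold the values that
  // have been observed in practice (-1 and V70).
  bool looksLikeV70(const DbiStreamHeader &Header) {
    return Header.VersionSignature == -1 &&
           Header.VersionHeader == static_cast<uint32_t>(DbiStreamVersion::V70);
  }

A reader would typically call ``readDbiHeader`` on the bytes of stream 3 and
refuse to continue if it returns ``std::nullopt``.  Whether to reject versions
other than ``V70`` is a policy choice left to the caller.
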

- **Age** - The number of times the PDB has been written.  Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing values representing the major and minor
  version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
  program, with the following layout:

.. code-block:: c++

  uint16_t MinorVersion : 8;
  uint16_t MajorVersion : 7;
  uint16_t NewVersionFormat : 1;

For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB.  This does not apply to LLVM, which does not use ``mspdb.dll``.

- **SymRecordStream** - The stream containing all CodeView symbol records used
  by the program.  This is used for deduplication, so that many different
  compilands can refer to the same symbols without having to include the full record
  content inside of each module stream.

- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - The index of the MFC type server in the
  :ref:`dbi_type_server_substream`.

- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

.. code-block:: c++

  uint16_t WasIncrementallyLinked : 1;
  uint16_t ArePrivateSymbolsStripped : 1;
  uint16_t HasConflictingTypes : 1;
  uint16_t Reserved : 13;

The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
If it is passed to ``link.exe``, this field will be set.  Otherwise it will
not be set.  It is unclear what this flag does, although it seems to have
subtle implications on the algorithm used to look up type records.

- **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration.  Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86).

Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`.  The following ``7`` fields of the DBI Stream header specify the
number of bytes of the corresponding substream.  Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`.  The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the sum
of the following ``7`` fields.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.

.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`.  The
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program.  Each
record in the array is a ``ModInfo`` structure.  ``ModInfo`` embeds a
``SectionContribEntry``, which is defined first:

.. code-block:: c++

  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };

While most of these are self-explanatory, the ``Characteristics`` field
warrants some elaboration.  It corresponds to the ``Characteristics``
field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure.

The ``ModInfo`` record itself has the following layout:

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    SectionContribEntry SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];
    char ObjFileName[];
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

- **Flags** - A bitfield with the following format:

.. code-block:: c++

  uint16_t Dirty : 1;   // true if this ModInfo has been written since reading the PDB.
  uint16_t EC : 1;      // true if EC information is present for this module. It is unknown what EC actually is.
  uint16_t Unused : 6;
  uint16_t TSM : 8;     // Type Server Index for this module. It is unknown what this is used for, but it is not used by LLVM.

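
The bitfield above can also be unpacked with explicit masks and shifts, which
avoids relying on how a particular compiler lays out bit-fields.  The
following is a minimal sketch.  It assumes that the first declared bit-field
occupies the least significant bit, which is how MSVC allocates them.  The
``ModInfoFlags`` struct and ``decodeModInfoFlags`` function are hypothetical
names used for illustration only.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical unpacked form of the ModInfo Flags field.
  struct ModInfoFlags {
    bool Dirty;              // Bit 0.
    bool EC;                 // Bit 1.
    uint8_t TypeServerIndex; // Bits 8-15 (the TSM field).
  };

  // Decodes the raw 16-bit value, assuming bit-fields are allocated starting
  // from the least significant bit.  Bits 2-7 are unused and ignored.
  ModInfoFlags decodeModInfoFlags(uint16_t Raw) {
    ModInfoFlags Flags;
    Flags.Dirty = (Raw & 0x0001) != 0;
    Flags.EC = (Raw & 0x0002) != 0;
    Flags.TypeServerIndex = static_cast<uint8_t>(Raw >> 8);
    return Flags;
  }

For example, under these assumptions a raw value of ``0x0503`` has ``Dirty``
and ``EC`` set and a type server index of ``5``.
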


- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module.  This includes CodeView symbol information as well as source
  and line information.

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information.  At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module.  All PDB files observed to date
  have this value equal to 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information.  This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name.  This is usually either a full path to an
  object file (either directly passed to ``link.exe`` or from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name.  In the case of a module that is
  passed directly to ``link.exe``, this is the same as **ModuleName**.
  In the case of a module that comes from an archive, this is usually the full
  path to the archive.

.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes.  This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far.  Following
this ``4`` byte field is an array of fixed-length structures.  If the version
is ``Ver60``, it is an array of ``SectionContribEntry`` structures.  If the
version is ``V2``, it is an array of ``SectionContribEntry2`` structures,
defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field is not well understood.


.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes.  This substream begins with a ``4``
byte header followed by an array of fixed-length records.  The header and records
have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment.  If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, so will not be discussed further.

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes.  This substream defines the mapping
from each module to the source files that contribute to that module.  Since multiple
modules can use the same source file (for example, a header file), this substream
uses a string table to store each unique file name only once, and has each
module use offsets into the string table rather than embedding the string's value
directly.  The format of this substream is as follows:

.. code-block:: c++

  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles];
    char NamesBuffer[][NumSourceFiles];
  };

**NumModules** - The number of modules for which source file information is
contained within this substream.  Should match the corresponding value from the
:ref:`dbi_header`.


**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information.  But that would present a
problem, because the ``16``-bit width of this field would prevent a program from
having more than 64K source files.  In early versions of the file
format, this seems to have been the case.  In order to support more than this, this
field is simply ignored, and the count is computed dynamically by summing the values of
the ``ModFileCounts`` array (discussed below).  In short, this value should be
ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K.  The real number of
source files is thus computed by summing this array.  Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null terminated string.

**NamesBuffer** - An array of null terminated strings containing the actual source
file names.

.. _dbi_type_server_substream:

Type Server Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
and consumes ``Header->TypeServerSize`` bytes.  Neither the purpose nor the layout
of this substream is understood, although it is assumed to be related somehow to the
usage of ``/Zi`` and ``mspdbsrv.exe``.  This substream will not be discussed further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
and consumes ``Header->ECSubstreamSize`` bytes.  Neither the purpose nor the layout
of this substream is understood, and it will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes.  This substream is an array of
stream indices (i.e. ``uint16_t`` values), each of which identifies a stream
in the larger MSF file which contains some additional debug information.
Each position of this array has a special meaning, allowing one to determine
what kind of debug information is at the referenced stream.  ``11`` indices
are currently understood, although it's possible there may be more.  The
layout of each stream generally corresponds exactly to a particular type
of debug data directory from the PE/COFF file.  The format of these fields
can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.

**FPO Data** - ``DbgStreamArray[0]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.

**Exception Data** - ``DbgStreamArray[1]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.


**Omap From Src Data** - ``DbgStreamArray[4]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Section Header Data** - ``DbgStreamArray[5]``.  A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``.  The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``.  Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``.  A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``.  This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``.  The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.  It is not clear how this
differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
used the "new" format rather than the "old" format.

**Original Section Header Data** - ``DbgStreamArray[10]``.  Assumed to be similar
to ``DbgStreamArray[5]``, but has not been observed in practice.

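
The slot assignments listed above can be captured in an enumeration, and the
substream decoded into an array of stream indices.  The following is a rough
sketch only.  The names ``DbgHeaderType``, ``optionalDbgHeaderOffset``,
``readDbgStreamArray`` and ``getDbgStream`` are hypothetical.  The code
assumes that the indices are stored little-endian and that ``0xFFFF`` marks a
slot with no associated stream; neither assumption is stated in this document.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Slot positions within the optional debug header, as listed above.
  enum class DbgHeaderType : uint16_t {
    FPO = 0,
    Exception = 1,
    Fixup = 2,
    OmapToSrc = 3,
    OmapFromSrc = 4,
    SectionHdr = 5,
    TokenRidMap = 6,
    Xdata = 7,
    Pdata = 8,
    NewFPO = 9,
    SectionHdrOrig = 10
  };

  // Assumption: this value marks a slot that refers to no stream.
  constexpr uint16_t InvalidStreamIndex = 0xFFFF;

  // Byte offset of the optional debug header within the DBI Stream: the
  // 64-byte header plus the six substreams that precede it in stream order.
  uint32_t optionalDbgHeaderOffset(uint32_t ModInfoSize,
                                   uint32_t SectionContributionSize,
                                   uint32_t SectionMapSize,
                                   uint32_t SourceInfoSize,
                                   uint32_t TypeServerSize,
                                   uint32_t ECSubstreamSize) {
    return 64 + ModInfoSize + SectionContributionSize + SectionMapSize +
           SourceInfoSize + TypeServerSize + ECSubstreamSize;
  }

  // Decodes the substream bytes into stream indices, assuming little-endian
  // storage.  A trailing odd byte, if any, is ignored.
  std::vector<uint16_t> readDbgStreamArray(const uint8_t *Data, size_t Size) {
    std::vector<uint16_t> Indices(Size / 2);
    for (size_t I = 0; I < Indices.size(); ++I)
      Indices[I] =
          static_cast<uint16_t>(Data[2 * I] | (Data[2 * I + 1] << 8));
    return Indices;
  }

  // Returns the stream index stored in the given slot, or InvalidStreamIndex
  // if the array is too short to contain that slot.
  uint16_t getDbgStream(const std::vector<uint16_t> &Indices,
                        DbgHeaderType Type) {
    size_t Slot = static_cast<size_t>(Type);
    return Slot < Indices.size() ? Indices[Slot] : InvalidStreamIndex;
  }

A caller would pass ``Header->OptionalDbgHeaderSize`` as ``Size``, and should
treat both a missing slot and the assumed sentinel value as "no such stream".
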