====================================
The PDB Serialized Hash Table Format
====================================

.. contents::
   :local:

.. _hash_intro:

Introduction
============

One of the design goals of the PDB format is to provide accelerated access to
debug information, and for this reason there are several places where hash
tables are serialized and embedded directly in the file, rather than requiring
a consumer to read a list of values and reconstruct the hash table on the fly.

The serialization format supports hash tables of arbitrarily large size and
capacity, as well as arbitrary value types and hash functions. The only
supported key type is uint32. The only requirement on the hash function is
that the producer and consumer agree on it. As such, the hash function is not
discussed further in this document; it is assumed that for a particular
instance of a PDB file hash table, the appropriate hash function is being used.

On-Disk Format
==============

.. code-block:: none

  .--------------------.-- +0
  |        Size        |
  .--------------------.-- +4
  |      Capacity      |
  .--------------------.-- +8
  | Present Bit Vector |
  .--------------------.-- +N
  | Deleted Bit Vector |
  .--------------------.-- +M                  ─╮
  |        Key         |                        │
  .--------------------.-- +M+4                 │
  |       Value        |                        │
  .--------------------.-- +M+4+sizeof(Value)   │
           ...                                  ├─ |Capacity| Bucket entries
  .--------------------.                        │
  |        Key         |                        │
  .--------------------.                        │
  |       Value        |                        │
  .--------------------.                       ─╯

- **Size** - The number of values contained in the hash table.

- **Capacity** - The number of buckets in the hash table. Producers should keep
  ``Size`` no greater than ``2/3*Capacity+1``, which keeps the load factor at
  roughly 2/3 or below.

- **Present Bit Vector** - A serialized bit vector which contains information
  about which buckets have valid values.
  If the bucket has a value, the corresponding bit is set; if the bucket does
  not have a value (either because the bucket is empty or because the value is
  a tombstone), the bit is unset.

- **Deleted Bit Vector** - A serialized bit vector which contains information
  about which buckets have tombstone values. If the entry in a bucket has been
  deleted, the bit is set; otherwise it is unset.

- **Keys and Values** - A list of ``Capacity`` hash buckets, each consisting of
  a key (always a uint32) followed by a value. The state of each bucket (valid,
  empty, or deleted) can be determined by examining the present and deleted bit
  vectors.

.. _hash_bit_vectors:

Present and Deleted Bit Vectors
===============================

The bit vectors indicating the status of each bucket are serialized as follows:

.. code-block:: none

  .--------------------.-- +0
  |     Word Count     |
  .--------------------.-- +4
  |       Word_0       |        ─╮
  .--------------------.-- +8    │
  |       Word_1       |         │
  .--------------------.-- +12   ├─ |Word Count| values
           ...                   │
  .--------------------.         │
  |       Word_N       |         │
  .--------------------.        ─╯

The words, when viewed as a contiguous block of bytes, represent a bit vector
with the following layout:

.. code-block:: none

      .------------.         .------------.------------.
      |   Word_N   |   ...   |   Word_1   |   Word_0   |
      .------------.         .------------.------------.
      |            |         |            |            |
  +(N+1)*32      +N*32      +64          +32          +0

where N is ``Word Count - 1``, and the k'th bit of this bit vector represents
the status of the k'th bucket in the hash table; equivalently, bucket k is
tracked by bit ``k % 32`` of word ``floor(k / 32)``.
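The two layouts above can be combined into a small reader. The following
Python sketch parses the ``Size`` and ``Capacity`` fields and both bit
vectors, then classifies every bucket as valid, empty, or deleted. It stops at
the first bucket entry, since decoding entries requires knowing
``sizeof(Value)``. It assumes that every field is a little-endian uint32 (the
byte order is not stated above), and the function names ``read_bit_vector``
and ``read_table_header`` are hypothetical rather than part of any existing
PDB library.

.. code-block:: python

  import struct


  def read_bit_vector(buf, offset):
      """Decode a serialized bit vector that starts at ``offset`` in ``buf``.

      Returns the set of set-bit indices and the offset just past the vector.
      """
      # Assumption: Word Count and every Word_i are little-endian uint32s.
      (word_count,) = struct.unpack_from("<I", buf, offset)
      offset += 4
      bits = set()
      for i in range(word_count):
          (word,) = struct.unpack_from("<I", buf, offset)
          offset += 4
          # Bit k of the vector is bit k % 32 of Word_(k // 32).
          for j in range(32):
              if word & (1 << j):
                  bits.add(i * 32 + j)
      return bits, offset


  def read_table_header(buf, offset=0):
      """Read Size, Capacity and both bit vectors, then classify each bucket.

      Returns (size, capacity, states, entries_offset), where entries_offset
      is the position of the first bucket entry (+M in the on-disk diagram).
      """
      size, capacity = struct.unpack_from("<II", buf, offset)
      offset += 8
      present, offset = read_bit_vector(buf, offset)
      deleted, offset = read_bit_vector(buf, offset)

      # Consistency checks derived from the field descriptions.
      if present & deleted:
          raise ValueError("bucket marked both present and deleted")
      if len(present) != size:
          raise ValueError("Size does not match the number of present buckets")

      states = []
      for k in range(capacity):
          if k in present:
              states.append("valid")
          elif k in deleted:
              states.append("deleted")
          else:
              states.append("empty")
      return size, capacity, states, offset


  if __name__ == "__main__":
      # Size=2, Capacity=4; present bits {0, 2}, deleted bit {1}.
      buf = struct.pack("<II", 2, 4)
      buf += struct.pack("<II", 1, 0b0101)  # Present Bit Vector: one word
      buf += struct.pack("<II", 1, 0b0010)  # Deleted Bit Vector: one word
      print(read_table_header(buf))
      # (2, 4, ['valid', 'deleted', 'valid', 'empty'], 24)

The two ``ValueError`` checks are not spelled out in the format description;
they are consistency rules one might derive from it (a present bucket is never
a tombstone, and ``Size`` counts exactly the present buckets). Decoding into a
Python ``set`` keeps the bit-to-bucket mapping explicit; a consumer concerned
with memory could instead keep the raw words and test bit ``k % 32`` of word
``k // 32`` on demand.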