Name                Date         Size       #Lines  LOC
.gitignore          23-Nov-2023  35         4       3
README              22-Nov-2023  3.2 KiB    61      52
gen10.xml           23-Nov-2023  211.3 KiB  3,770   3,565
gen4.xml            23-Nov-2023  69.8 KiB   1,281   1,228
gen45.xml           23-Nov-2023  72 KiB     1,314   1,260
gen5.xml            23-Nov-2023  77.6 KiB   1,410   1,339
gen6.xml            23-Nov-2023  122.4 KiB  2,191   2,079
gen7.xml            23-Nov-2023  153.3 KiB  2,779   2,618
gen75.xml           23-Nov-2023  181.9 KiB  3,290   3,092
gen8.xml            23-Nov-2023  196.2 KiB  3,522   3,315
gen9.xml            23-Nov-2023  214.6 KiB  3,850   3,633
genX_pack.h         22-Nov-2023  1.9 KiB    54      27
gen_bits_header.py  23-Nov-2023  9.9 KiB    355     264
gen_macros.h        23-Nov-2023  3 KiB      96      39
gen_pack_header.py  23-Nov-2023  22.2 KiB   680     550
gen_zipped_file.py  23-Nov-2023  2.3 KiB    72      36
meson.build         23-Nov-2023  1.8 KiB    59      51

README

This provides some background on the design of the generated headers.
We started out trying to generate bit fields, but it evolved into the
pack functions because of a few limitations:

  1) Bit fields still generate terrible code today. Even with modern
     optimizing compilers you get multiple load+mask+store operations
     to the same dword in memory as you set individual bits. The
     compiler also has to generate code to mask out overflowing values
     (for example, if you assign 200 to a 2 bit field). Our driver
     never writes overflowing values so that's not needed. On the
     other hand, most compilers recognize that the template struct we
     use is a temporary variable, copy propagate the individual
     fields and do amazing constant folding.  You should take a look
     at the code that gets generated when you compile in release mode
     with optimizations.

  2) For some types we need to have overlapping bit fields. For
     example, some values are 64 byte aligned 32 bit offsets. The
     lower 5 bits of the offset are always zero, so the hw packs a
     few misc bits into those lower 5 bits. Other times a field can
     be either a u32 or a float. I tried to do this with overlapping
     anonymous unions and it became a big mess. Also, when using
     initializers, you can only initialize one union member, so this
     just doesn't work with our approach.

     The pack functions, on the other hand, allow us a great deal of
     flexibility in how we combine things. In the case of overlapping
     fields (the u32 and float case), if we only set one of them in
     the pack function, the compiler will recognize that the other is
     initialized to 0 and optimize out the code to or it in.

  3) Bit fields (and certainly overlapping anonymous unions of bit
     fields) aren't generally stable across compilers in how they're
     laid out and aligned. Our pack functions let us control exactly
     how things get packed, using only simple and unambiguous bitwise
     shifting and or'ing that works on any compiler.

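As a rough sketch of the shift-and-or approach described in the list
above (this is not the generated code): the struct, field names and
bit positions below are hypothetical and chosen only to illustrate the
two overlapping cases, an aligned offset that shares its low bits with
a few misc bits, and a dword that holds either a u32 or a float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical template struct; none of these names come from the
 * generated headers. */
struct example_state {
   uint32_t offset;      /* aligned offset, low 5 bits always zero */
   uint32_t misc_bits;   /* small value stored in those low 5 bits */
   uint32_t value_u32;   /* dword 1 holds either this ...          */
   float    value_float; /* ... or this; the caller sets only one  */
};

/* Reinterpret a float as its raw 32 bit pattern without aliasing
 * problems. 0.0f maps to 0, so an unset float ors in nothing. */
static inline uint32_t
example_float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Pack the template struct into two dwords using only shifts and ors. */
static inline void
example_state_pack(uint32_t *dw, const struct example_state *s)
{
   /* The low 5 bits of the offset must be free for the misc bits. */
   assert((s->offset & 0x1f) == 0);

   dw[0] = s->offset | (s->misc_bits & 0x1f);

   /* Whichever member was left at zero contributes nothing here, so
    * with a constant template the compiler can fold it away. */
   dw[1] = s->value_u32 | example_float_bits(s->value_float);
}
```

A caller would fill the struct with a designated initializer, leave
the unused member of each overlapping pair at zero, and call
example_state_pack() on the destination dwords.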
Once we have the pack function, we can hook in various transformations
and validation as we go from template struct to dwords in memory:

  1) Validation: As I said above, our driver isn't supposed to write
     overflowing values to the fields, but we've of course had lots of
     cases where we make mistakes and write overflowing values. With
     the pack function, we can actually assert on that and catch it at
     runtime.  Bit fields would just silently truncate.

  2) Type conversions: sometimes it's just a matter of writing a
     float to a u32, but we also convert from bool to bits and from
     floats to fixed point integers.

  3) Relocations: whenever we have a pointer from one buffer to
     another (for example a pointer from the metadata for a texture
     to the raw texture data), we have to tell the kernel about it so
     it can adjust the pointer to point to the final location. That
     means we have to do extra work to record and annotate the dword
     location that holds the pointer. With bit fields, we'd have to
     call a function to do this, but with the pack function we
     generate code in the pack function to do this for us. That's a
     lot less error prone and less work.
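
The three hooks above could look roughly like the following sketch.
Every name here (example_uint, example_ufixed, example_address, the
reloc list and the surface struct) is hypothetical, as are the bit
positions; the real generated helpers may differ in name, signature
and behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Validation: place v in bits [start, end] and assert that it fits,
 * instead of silently truncating like a bit field would. */
static inline uint64_t
example_uint(uint64_t v, uint32_t start, uint32_t end)
{
   const uint32_t width = end - start + 1;
   if (width < 64)
      assert(v < (UINT64_C(1) << width));
   return v << start;
}

/* Type conversion: non-negative float to unsigned fixed point with
 * fract_bits fractional bits, rounded to nearest. */
static inline uint64_t
example_ufixed(float v, uint32_t start, uint32_t end, uint32_t fract_bits)
{
   assert(v >= 0.0f);
   const uint64_t fixed = (uint64_t)(v * (float)(1u << fract_bits) + 0.5f);
   return example_uint(fixed, start, end);
}

/* Relocation bookkeeping (assumed design): remember which dword holds
 * a pointer so it can be reported to the kernel later. */
struct example_reloc {
   uint32_t dword_index;
   uint64_t target_offset;
};

struct example_reloc_list {
   struct example_reloc entries[8];
   uint32_t count;
};

static inline uint32_t
example_address(struct example_reloc_list *relocs, uint32_t dword_index,
                uint64_t target_offset)
{
   assert(relocs->count < 8);
   relocs->entries[relocs->count].dword_index = dword_index;
   relocs->entries[relocs->count].target_offset = target_offset;
   relocs->count++;
   /* Write a presumed value; the final location is patched in later. */
   return (uint32_t)target_offset;
}

/* Hypothetical template struct and its pack function. */
struct example_surface {
   bool     enable;      /* bit 0                         */
   uint32_t width;       /* bits 1..14                    */
   float    min_lod;     /* bits 16..27, 4.8 fixed point  */
   uint64_t data_offset; /* dword 1, needs a relocation   */
};

static inline void
example_surface_pack(struct example_reloc_list *relocs, uint32_t *dw,
                     const struct example_surface *s)
{
   dw[0] = (uint32_t)(example_uint(s->enable, 0, 0) |
                      example_uint(s->width, 1, 14) |
                      example_ufixed(s->min_lod, 16, 27, 8));
   dw[1] = example_address(relocs, 1, s->data_offset);
}
```

With this shape the caller only fills in the template struct; the
range checks, the float to fixed point conversion and the relocation
record all happen inside the pack function.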