=======================================
LLVM's Optional Rich Disassembly Output
=======================================

.. contents::
   :local:

Introduction
============

LLVM's default disassembly output is raw text. To let consumers introspect the
instructions' textual representation, or reformat it for a more user-friendly
display, there is an optional rich disassembly output.

This optional output is sufficient to reference into individual portions of the
instruction text. This is intended for clients like disassemblers, list file
generators, and pretty-printers, which need more than the raw instructions and
the ability to print them.

To provide this functionality the assembly text is marked up with annotations.
The markup is simple enough in syntax to be robust even in the case of version
mismatches between consumers and producers. That is, the syntax generally does
not carry semantics beyond "this text has an annotation," so consumers can
simply ignore annotations they do not understand or do not care about.

After calling ``LLVMCreateDisasm()`` to create a disassembler context, the
optional output is enabled with this call:

.. code-block:: c

  LLVMSetDisasmOptions(DC, LLVMDisassembler_Option_UseMarkup);

Subsequent calls to ``LLVMDisasmInstruction()`` will then return output strings
with the marked-up annotations.

Instruction Annotations
=======================

.. _contextual markups:

Contextual markups
------------------

Annotated assembly display will supply contextual markup to help clients more
efficiently implement things like pretty printers. Most markup will be target
independent, so clients can effectively provide good display without any
target-specific knowledge.

Annotated assembly goes through the normal instruction printer, but optionally
includes contextual tags on portions of the instruction string.
An annotation is any '<' '>' delimited section of text(1).

.. code-block:: bat

  annotation: '<' tag-name tag-modifier-list ':' annotated-text '>'
  tag-name: identifier
  tag-modifier-list: comma delimited identifier list

The tag-name is an identifier which gives the type of the annotation. For the
first pass, this will be very simple, with memory references, registers, and
immediates having the tag names "mem", "reg", and "imm", respectively.

The tag-modifier-list is typically additional target-specific context, such as
register class.

Clients should accept and ignore any tag-names or tag-modifiers they do not
understand, allowing the annotations to grow in richness without breaking older
clients.

For example, an ARM load of a stack-relative location might be annotated as:

.. code-block:: nasm

  ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]>

1: For assembly dialects in which '<' and/or '>' are legal tokens, a literal
token is escaped by following it immediately with a repeat of the character.
For example, a literal '<' character is output as '<<' in an annotated assembly
string.

C API Details
-------------

The intended consumers of this information use the C API. The C API for the
disassembler therefore provides an option to produce disassembled instructions
with annotations: ``LLVMSetDisasmOptions()`` with the
``LLVMDisassembler_Option_UseMarkup`` option (see above).
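
As a rough illustration of how a client might consume the marked-up strings
described under `contextual markups`_, the sketch below removes every
annotation and keeps only the annotated text, ignoring all tag-names and
tag-modifiers as the text above recommends. The function name
``strip_markup`` is hypothetical and is not part of the LLVM C API. The sketch
also assumes that a doubled ``<<`` or ``>>`` is always an escaped literal, so it
cannot distinguish an escaped '>' from two adjacent closing delimiters; the
grammar above does not say how that case should be resolved.

.. code-block:: c

  #include <stddef.h>

  /* Hypothetical helper, not part of the LLVM C API.
   * Copies `in` to `out` with all annotations removed, keeping only the
   * annotated text. Returns 0 on success, or -1 if the markup is malformed
   * or `out` is too small (the contents of `out` are then unspecified). */
  int strip_markup(const char *in, char *out, size_t out_size)
  {
      size_t o = 0;   /* number of characters written to out */
      int depth = 0;  /* current annotation nesting depth */

      if (out_size == 0)
          return -1;

      while (*in != '\0') {
          char c = *in;

          if ((c == '<' || c == '>') && in[1] == c) {
              /* Assumption: a doubled delimiter is an escaped literal. */
              in += 2;
          } else if (c == '<') {
              /* Opening delimiter: skip tag-name and modifiers up to ':'. */
              while (*in != '\0' && *in != ':')
                  in++;
              if (*in == '\0')
                  return -1;  /* no ':' found, malformed annotation */
              in++;
              depth++;
              continue;
          } else if (c == '>') {
              /* Closing delimiter: must match an open annotation. */
              if (depth == 0)
                  return -1;
              depth--;
              in++;
              continue;
          } else {
              in++;
          }

          /* Emit one literal character, leaving room for the terminator. */
          if (o + 1 >= out_size)
              return -1;
          out[o++] = c;
      }

      out[o] = '\0';
      return depth == 0 ? 0 : -1;
  }

Applied to the ARM example above, this sketch yields ``ldr r0, [sp, #4]``. A
pretty-printer would follow the same traversal but, instead of discarding the
tag-name, use it to choose a color or style for the annotated text.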