All about co_lnotab, the line number table.

Code objects store a field named co_lnotab.  This is an array of unsigned bytes
disguised as a Python bytes object.  It is used to map bytecode offsets to
source code line numbers for tracebacks and to identify line number boundaries
for line tracing.  Because of internals of the peephole optimizer, it's
possible for lnotab to contain bytecode offsets that are no longer valid (for
example if the optimizer removed the last line in a function).

The array is conceptually a compressed list of
    (bytecode offset increment, line number increment)
pairs.  The details are important and delicate, best illustrated by example:

    byte code offset    source code line number
        0                   1
        6                   2
       50                   7
      350                 207
      361                 208

Instead of storing these numbers literally, we compress the list by storing only
the difference from one row to the next.  Conceptually, the stored list might
look like:

    0, 1,  6, 1,  44, 5,  300, 200,  11, 1

The above doesn't really work, but it's a start.  An unsigned byte (byte code
offset) can't hold negative values or values larger than 255; a signed byte
(line number) can't hold values larger than 127 or less than -128; and the
above example contains two such values (300 and 200).  (Note that before 3.6,
the line number was also encoded by an unsigned byte.)  So we make two tweaks:

 (a) there's a deep assumption that byte code offsets increase monotonically,
     and
 (b) if the byte code offset jumps by more than 255 from one row to the next,
     or if the source code line number jumps by more than 127 or less than -128
     from one row to the next, more than one pair is written to the table.  In
     case (b), there's no way to know from looking at the table later how many
     were written.  That's the delicate part.

A user of co_lnotab desiring to find the source line number corresponding to a
bytecode address A should do something like this:

    def addr2line(co_lnotab, A):
        lineno = addr = 0
        # co_lnotab alternates addr_incr and line_incr bytes
        for addr_incr, line_incr in zip(co_lnotab[0::2], co_lnotab[1::2]):
            addr += addr_incr
            if addr > A:
                return lineno
            if line_incr >= 0x80:
                line_incr -= 0x100      # reinterpret as a signed byte
            lineno += line_incr
        return lineno                   # A lies at or past the last entry

(In C, this is implemented by PyCode_Addr2Line().)  In order for this to work,
when the addr field increments by more than 255, the line number increment in
each pair generated must be 0 until the remaining addr increment is < 256.  So,
in the example above, assemble_lnotab in compile.c should not (as was actually
done until 2.2) expand 300, 200 to
    255, 255, 45, 45,
but to
    255, 0, 45, 127, 0, 73.

The above is sufficient to reconstruct line numbers for tracebacks, but not for
line tracing.  Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
and maybe_call_line_trace() in ceval.c.

*** Tracing ***

To a first approximation, we want to call the tracing function when the line
number of the current instruction changes.  Re-computing the current line for
every instruction is a little slow, though, so each time we compute the line
number we save the bytecode indices where it's valid:

     *instr_lb <= frame->f_lasti < *instr_ub

is true so long as execution does not change lines.  That is, *instr_lb holds
the first bytecode index of the current line, and *instr_ub holds the first
bytecode index of the next line.  As long as the above expression is true,
maybe_call_line_trace() does not need to call PyCode_CheckLineNumber().  Note
that the same line may appear multiple times in the lnotab, either because the
bytecode jumped more than 255 indices between line number changes or because
the compiler inserted the same line twice.  Even in that case, *instr_ub holds
the first index of the next line.

However, we don't *always* want to call the line trace function when the above
test fails.
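
Before looking at that exception, here is a rough Python sketch of the two
mechanisms described so far: the splitting rule from tweak (b), and the
lower/upper bounds saved for tracing.  It is an illustration only, not the C
code in compile.c or codeobject.c.  The names encode_lnotab and line_bounds are
made up for this note, line numbers are counted from 0 as in the lookup loop
above (a real code object would start from its first line number instead), and
using sys.maxsize as the "no next line" upper bound is an assumption of the
sketch.

```python
import sys

def encode_lnotab(rows, first_lineno=0):
    """Compress (offset, line) rows into lnotab bytes, splitting large steps."""
    out = bytearray()
    prev_addr, prev_line = 0, first_lineno
    for addr, line in rows:
        d_addr, d_line = addr - prev_addr, line - prev_line
        assert d_addr >= 0, "offsets must increase monotonically"
        while d_addr > 255:            # address overflow: line increment stays 0
            out += bytes([255, 0])
            d_addr -= 255
        while d_line > 127:            # line overflow, forward
            out += bytes([d_addr, 127])
            d_addr, d_line = 0, d_line - 127
        while d_line < -128:           # line overflow, backward (0x80 is -128)
            out += bytes([d_addr, 0x80])
            d_addr, d_line = 0, d_line + 128
        out += bytes([d_addr, d_line & 0xFF])
        prev_addr, prev_line = addr, line
    return bytes(out)

def line_bounds(lnotab, lasti, first_lineno=0):
    """Return (line, lower, upper) such that lower <= lasti < upper."""
    pairs = list(zip(lnotab[0::2], lnotab[1::2]))
    addr, line, lower = 0, first_lineno, 0
    i = 0
    # Phase 1: consume every pair that starts at or before lasti.
    while i < len(pairs) and addr + pairs[i][0] <= lasti:
        addr_incr, line_incr = pairs[i]
        if line_incr >= 0x80:
            line_incr -= 0x100         # reinterpret as a signed byte
        addr += addr_incr
        if line_incr:
            lower = addr               # only a nonzero line step moves the bound
        line += line_incr
        i += 1
    # Phase 2: the upper bound is the next offset whose pair changes the line.
    upper = sys.maxsize
    while i < len(pairs):
        addr_incr, line_incr = pairs[i]
        addr += addr_incr
        i += 1
        if line_incr:
            upper = addr
            break
    return line, lower, upper

ROWS = [(0, 1), (6, 2), (50, 7), (350, 207), (361, 208)]
TABLE = encode_lnotab(ROWS)
print(list(TABLE))   # [0, 1, 6, 1, 44, 5, 255, 0, 45, 127, 0, 73, 11, 1]
print(line_bounds(TABLE, 100))   # (7, 50, 350)
```

Offset 100 falls inside the 300-byte stretch that was split into several pairs,
yet the bounds still span the whole of line 7 (50 up to 350), because pairs
with a zero line increment move neither bound.  A caller can then skip the
table walk for as long as lower <= lasti < upper holds.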

Consider this code:

1: def f(a):
2:    while a:
3:       print(1)
4:       break
5:    else:
6:       print(2)

which compiles to this:

  2           0 SETUP_LOOP              26 (to 28)
        >>    2 LOAD_FAST                0 (a)
              4 POP_JUMP_IF_FALSE       18

  3           6 LOAD_GLOBAL              0 (print)
              8 LOAD_CONST               1 (1)
             10 CALL_FUNCTION            1
             12 POP_TOP

  4          14 BREAK_LOOP
             16 JUMP_ABSOLUTE            2
        >>   18 POP_BLOCK

  6          20 LOAD_GLOBAL              0 (print)
             22 LOAD_CONST               2 (2)
             24 CALL_FUNCTION            1
             26 POP_TOP
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
and the co_lnotab will claim that execution has moved to line 4, which is wrong.
In this case, we could instead associate the POP_BLOCK with line 5, but that
would break jumps around loops without else clauses.

We fix this by only calling the line trace function for a forward jump if the
co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
instruction offset matches the offset given for the start of a line by the
co_lnotab.  For backward jumps, however, we always call the line trace function,
which lets a debugger stop on every evaluation of a loop guard (which usually
won't be the first opcode in a line).

Why do we set f_lineno when tracing, and only just before calling the trace
function?  Well, consider the code above when 'a' is true.  If stepping through
this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
line events on lines 2, 3, and 4, then a "return" type event -- but because the
code for the return actually falls in the range of the "line 6" opcodes, you
would be shown line 6 during this event.  This is a change from the behaviour in
2.2 and before, and I've found it confusing in practice.  By setting and using
f_lineno when tracing, one can report a line number different from that
suggested by f_lasti on this one occasion where it's desirable.
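
The forward/backward jump rule can be tried out on the listing above with a
small simulation.  This is a sketch under stated assumptions, not the logic of
maybe_call_line_trace() itself: LINE_STARTS is read off the disassembly by
hand, CODE_END assumes the bytecode ends right after the two-byte RETURN_VALUE
at offset 30, and the names line_bounds and line_events are invented for this
note.  Each path is the list of instruction offsets executed, in order.

```python
from bisect import bisect_right

# (first offset, line) for each line of f(), read off the listing above.
LINE_STARTS = [(0, 2), (6, 3), (14, 4), (20, 6)]
CODE_END = 32   # assumed: one past the RETURN_VALUE at offset 30

def line_bounds(lasti):
    """Return (line, lower, upper) with lower <= lasti < upper."""
    starts = [start for start, _ in LINE_STARTS]
    i = bisect_right(starts, lasti) - 1
    upper = starts[i + 1] if i + 1 < len(starts) else CODE_END
    return LINE_STARTS[i][1], starts[i], upper

def line_events(path, start_of_line_rule=True):
    """List the lines for which a 'line' trace event would be reported."""
    events = []
    lower = upper = 0     # empty window: forces a lookup on the first offset
    prev = -1
    line = None
    for lasti in path:
        left_window = not (lower <= lasti < upper)
        if left_window:
            line, lower, upper = line_bounds(lasti)
        if start_of_line_rule:
            # Fire at the start of a line, or on any backward jump.
            fire = lasti == lower or lasti < prev
        else:
            # Naive rule: fire whenever the cached window no longer applies.
            fire = left_window
        if fire:
            events.append(line)
        prev = lasti
    return events

A_FALSE = [0, 2, 4, 18, 20, 22, 24, 26, 28, 30]   # guard fails, else runs
A_TRUE = [0, 2, 4, 6, 8, 10, 12, 14, 28, 30]      # body runs, then break

print(line_events(A_FALSE, start_of_line_rule=False))  # [2, 4, 6]
print(line_events(A_FALSE))                            # [2, 6]
print(line_events(A_TRUE))                             # [2, 3, 4]
```

With the naive rule the false path reports a bogus stop on line 4 when it lands
on POP_BLOCK at offset 18; with the start-of-line rule that stop disappears.
On the true path no line event is reported for the jump to offset 28, which
matches the sequence of line events on lines 2, 3, and 4 described above.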