<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>

<chapter id="cl-format" xreflabel="Callgrind Format Specification">
<title>Callgrind Format Specification</title>

<para>This chapter describes the Callgrind Profile Format, Version 1.</para>

<para>A synonymous name is "Calltree Profile Format". The two names mean
the same thing, since Callgrind was previously named Calltree.</para>

<para>The format description is meant to help users understand the
file contents; more importantly, it is intended for authors of measurement or
visualization tools who need to write and read this format.</para>

<sect1 id="cl-format.overview" xreflabel="Overview">
<title>Overview</title>

<para>The profile data format is ASCII-based.
It is written by Callgrind, and it is upwards compatible
with the format used by Cachegrind (i.e. Cachegrind uses a subset). It can
be read by callgrind_annotate and KCachegrind.</para>

<para>This chapter gives an overview of format features and examples.
For detailed syntax, look at the format reference.</para>

<sect2 id="cl-format.overview.basics" xreflabel="Basic Structure">
<title>Basic Structure</title>

<para>Each file has a header part consisting of an arbitrary number of lines of the
format "key: value". After the header, lines specifying profile costs
follow. Comments are allowed everywhere, on their own lines starting with '#'.
The header lines with keys "positions" and "events" define
the meaning of cost lines in the second part of the file: the value of
"positions" is a list of subpositions, and the value of "events" is a list
of event type names.
Cost lines consist of subpositions followed by 64-bit
counters for the events, in the order specified by the "positions" and "events"
header lines.</para>

<para>The "events" header line is always required, in contrast to the optional
line for "positions", which defaults to "line", i.e. a line number of some
source file. In addition, the second part of the file contains position
specifications of the form "spec=name". "spec" can be e.g. "fn" for a
function name or "fl" for a file name. Cost lines are always related to
the function/file specifications given directly before.</para>

</sect2>

<sect2 id="cl-format.overview.example1" xreflabel="Simple Example">
<title>Simple Example</title>

<para>The event names in the following example are quite arbitrary, and are not
related to event names used by Callgrind. In particular, cycle counts matching
real processors will probably never be generated by any Valgrind tool, as these
are bound to simulations of simple machine models for acceptable slowdown.
However, any profiling tool could use the format described in this chapter.</para>

<para>
<screen>events: Cycles Instructions Flops
fl=file.f
fn=main
15 90 14 2
16 20 12</screen></para>

<para>The above example gives profile information for event types "Cycles",
"Instructions", and "Flops". Thus, cost lines give the number of CPU cycles
elapsed, the number of executed instructions, and the number of floating point
operations executed while running code corresponding to some source
position. As there is no line specifying the value of "positions", it defaults
to "line", which means that the first number of a cost line is always a line
number.</para>

<para>Thus, the first cost line specifies that in line 15 of source file
<filename>file.f</filename> there is code belonging to function
<function>main</function>.
While running, 90 CPU cycles passed by, and 2 of
the 14 instructions executed were floating point operations. Similarly, the
next line specifies that there were 12 instructions executed in the context
of function <function>main</function> which can be related to line 16 in
file <filename>file.f</filename>, taking 20 CPU cycles. If a cost line
specifies fewer event counts than are given in the "events" line, the rest are
assumed to be zero. That is, no floating point instruction was executed
relating to line 16.</para>

<para>Note that regular cost lines always give the self (also called exclusive)
cost of code at a given position. If you specify multiple cost lines for the
same position, these will be summed up. On the other hand, in the example above
there is no specification of how many times function
<function>main</function> was actually
called: profile data only contains sums.</para>

</sect2>


<sect2 id="cl-format.overview.associations" xreflabel="Associations">
<title>Associations</title>

<para>The most important extension to the original format of Cachegrind is the
ability to specify call relationships among functions. More generally, you
specify associations among positions. For this, the second part of the
file can also contain association specifications. These look similar to
position specifications, but consist of two lines. For calls, the format
looks like
<screen>
  calls=(Call Count) (Destination position)
  (Source position) (Inclusive cost of call)
</screen></para>

<para>The destination only specifies subpositions like line number.
Therefore,
to be able to specify a call to another function in another source file, you
have to precede the above lines with a "cfn=" specification for the name of the
called function, and optionally a "cfi=" specification if the function is in
another source file ("cfl=" is an alternative specification for "cfi=" for
historical reasons, and both should be supported by format readers).
The second line looks like a regular cost line, with the difference
that the inclusive cost spent inside the function call has to be specified.</para>

<para>Other associations are, for example, (conditional) jumps. See the
reference below for details.</para>

</sect2>


<sect2 id="cl-format.overview.example2" xreflabel="Extended Example">
<title>Extended Example</title>

<para>The following example shows 3 functions, <function>main</function>,
<function>func1</function>, and <function>func2</function>. Function
<function>main</function> calls <function>func1</function> once and
<function>func2</function> 3 times. <function>func1</function> calls
<function>func2</function> 2 times.
<screen>events: Instructions

fl=file1.c
fn=main
16 20
cfn=func1
calls=1 50
16 400
cfi=file2.c
cfn=func2
calls=3 20
16 400

fn=func1
51 100
cfi=file2.c
cfn=func2
calls=2 20
51 300

fl=file2.c
fn=func2
20 700</screen></para>

<para>One can see that in <function>main</function> only code from line 16
is executed, which is also where the other functions are called. The inclusive cost of
<function>main</function> is 820, which is the sum of self cost 20 and costs
spent in the calls: 400 for the single call to <function>func1</function>
and 400 as the sum for the three calls to <function>func2</function>.</para>

<para>Function <function>func1</function> is located in
<filename>file1.c</filename>, the same as <function>main</function>.
Therefore, a "cfi=" specification for the call to <function>func1</function>
is not needed. The function <function>func1</function> only consists of code
at line 51 of <filename>file1.c</filename>, where <function>func2</function>
is called.</para>

</sect2>


<sect2 id="cl-format.overview.compression1" xreflabel="Name Compression">
<title>Name Compression</title>

<para>With the introduction of association specifications like calls, it
becomes necessary to specify the same function or file name multiple times. As
absolute filenames or symbol names in C++ can be quite long, it is advantageous
to be able to specify integer IDs for position specifications.
Here, the term "position" corresponds to a file name (source or object file)
or function name.</para>

<para>To support name compression, a position specification can be not only of
the format "spec=name", but also "spec=(ID) name" to specify a mapping of an
integer ID to a name, and "spec=(ID)" to reference a previously defined ID
mapping. There is a separate ID mapping for each position specification,
i.e. you can use ID 1 for both a file name and a symbol name.</para>

<para>With name compression, the extended example above looks like this:
<screen>events: Instructions

fl=(1) file1.c
fn=(1) main
16 20
cfn=(2) func1
calls=1 50
16 400
cfi=(2) file2.c
cfn=(3) func2
calls=3 20
16 400

fn=(2)
51 100
cfi=(2)
cfn=(3)
calls=2 20
51 300

fl=(2)
fn=(3)
20 700</screen></para>

<para>As position specifications carry no information themselves, but only change
the meaning of subsequent cost lines or associations, they can appear
anywhere in the file without any negative consequence. In particular, you can
define name compression mappings directly after the header, and before any cost
lines.
Thus, the above example can also be written as
<screen>events: Instructions

# define file ID mapping
fl=(1) file1.c
fl=(2) file2.c
# define function ID mapping
fn=(1) main
fn=(2) func1
fn=(3) func2

fl=(1)
fn=(1)
16 20
...</screen></para>

</sect2>


<sect2 id="cl-format.overview.compression2" xreflabel="Subposition Compression">
<title>Subposition Compression</title>

<para>If a Callgrind data file should hold costs for each assembler instruction
of a program, you specify subposition "instr" in the "positions:" header line,
and each cost line has to include the address of some instruction. Addresses
are allowed to have a size of 64 bits to support 64-bit architectures. Thus,
repeating similar, long addresses for almost every line in the data file can
enlarge the file size quite significantly, which
motivates subposition compression: instead of every cost line starting with
a 16-character-long address, one is allowed to specify relative addresses.
This relative specification is not only allowed for instruction addresses, but
also for line numbers; both addresses and line numbers are called "subpositions".</para>

<para>A relative subposition is always based on the corresponding subposition
of the last cost line, and starts with a "+" to specify a positive difference,
a "-" to specify a negative difference, or consists of "*" to specify the same
subposition. Because absolute subpositions are always positive (i.e. never
prefixed by "-"), any relative specification is unambiguous; additionally,
absolute and relative subposition specifications can be mixed freely.
Assume the following example (subpositions can always be specified
as hexadecimal numbers, beginning with "0x"):
<screen>positions: instr line
events: ticks

fn=func
0x80001234 90 1
0x80001237 90 5
0x80001238 91 6</screen></para>

<para>With subposition compression, this looks like
<screen>positions: instr line
events: ticks

fn=func
0x80001234 90 1
+3 * 5
+1 +1 6</screen></para>

<para>Remark: For assembler annotation to work, instruction addresses have to
be corrected to correspond to addresses found in the original binary. I.e. for
relocatable shared objects, often a load offset has to be subtracted.</para>

</sect2>


<sect2 id="cl-format.overview.misc" xreflabel="Miscellaneous">
<title>Miscellaneous</title>

<sect3 id="cl-format.overview.misc.summary" xreflabel="Cost Summary Information">
<title>Cost Summary Information</title>

<para>For the visualization to be able to show cost percentages, the sum of the
cost of the full run has to be known. Usually, it is assumed that this is the
sum of all cost lines in a file. But sometimes, this is not correct. Thus, you
can specify a "summary:" line in the header giving the full cost for the
profile run. An import filter may use this to show a progress bar
while loading a large data file.</para>

</sect3>

<sect3 id="cl-format.overview.misc.events" xreflabel="Long Names for Event Types and inherited Types">
<title>Long Names for Event Types and inherited Types</title>

<para>Event types for cost lines are specified in the "events:" line with an
abbreviated name. For visualization, it makes sense to be able to specify some
longer, more descriptive name.
For an event type "Ir" which means "Instruction
Fetches", this can be specified with the header line
<screen>event: Ir : Instruction Fetches
events: Ir Dr</screen></para>

<para>In this example, "Dr" itself has no long name associated. The order of
"event:" lines and the "events:" line is of no importance. Additionally,
inherited event types can be introduced for which no raw data is available, but
which are calculated from given types. Continuing the last example, you could add
<screen>event: Sum = Ir + Dr</screen>
to specify an additional event type "Sum", which is calculated by adding costs
for "Ir" and "Dr".</para>

</sect3>

</sect2>

</sect1>

<sect1 id="cl-format.reference" xreflabel="Reference">
<title>Reference</title>

<sect2 id="cl-format.reference.grammar" xreflabel="Grammar">
<title>Grammar</title>

<para>
<screen>ProfileDataFile := FormatVersion? Creator? PartData*</screen>
<screen>FormatVersion := "version:" Space* Number "\n"</screen>
<screen>Creator := "creator:" NoNewLineChar* "\n"</screen>
<screen>PartData := (HeaderLine "\n")+ (BodyLine "\n")+</screen>
<screen>HeaderLine := (empty line)
  | ('#' NoNewLineChar*)
  | PartDetail
  | Description
  | EventSpecification
  | CostLineDef</screen>
<screen>PartDetail := TargetCommand | TargetID</screen>
<screen>TargetCommand := "cmd:" Space* NoNewLineChar*</screen>
<screen>TargetID := ("pid"|"thread"|"part") ":" Space* Number</screen>
<screen>Description := "desc:" Space* Name Space* ":" NoNewLineChar*</screen>
<screen>EventSpecification := "event:" Space* Name InheritedDef? LongNameDef?</screen>
<screen>InheritedDef := "=" InheritedExpr</screen>
<screen>InheritedExpr := Name
  | Number Space* ("*" Space*)? Name
  | InheritedExpr Space* "+" Space* InheritedExpr</screen>
<screen>LongNameDef := ":" NoNewLineChar*</screen>
<screen>CostLineDef := "events:" Space* Name (Space+ Name)*
  | "positions:" "instr"? (Space+ "line")?</screen>
<screen>BodyLine := (empty line)
  | ('#' NoNewLineChar*)
  | CostLine
  | PositionSpecification
  | AssociationSpecification</screen>
<screen>CostLine := SubPositionList Costs?</screen>
<screen>SubPositionList := (SubPosition+ Space+)+</screen>
<screen>SubPosition := Number | "+" Number | "-" Number | "*"</screen>
<screen>Costs := (Number Space+)+</screen>
<screen>PositionSpecification := Position "=" Space* PositionName</screen>
<screen>Position := CostPosition | CalledPosition</screen>
<screen>CostPosition := "ob" | "fl" | "fi" | "fe" | "fn"</screen>
<screen>CalledPosition := "cob" | "cfi" | "cfl" | "cfn"</screen>
<screen>PositionName := ( "(" Number ")" )? (Space* NoNewLineChar* )?</screen>
<screen>AssociationSpecification := CallSpecification
  | JumpSpecification</screen>
<screen>CallSpecification := CallLine "\n" CostLine</screen>
<screen>CallLine := "calls=" Space* Number Space+ SubPositionList</screen>
<screen>JumpSpecification := ...</screen>
<screen>Space := " " | "\t"</screen>
<screen>Number := HexNumber | (Digit)+</screen>
<screen>Digit := "0" | ... | "9"</screen>
<screen>HexNumber := "0x" (Digit | HexChar)+</screen>
<screen>HexChar := "a" | ... | "f" | "A" | ... | "F"</screen>
<screen>Name := Alpha (Digit | Alpha)*</screen>
<screen>Alpha := "a" | ... | "z" | "A" | ... | "Z"</screen>
<screen>NoNewLineChar := all characters without "\n"</screen>
</para>

</sect2>

<sect2 id="cl-format.reference.header" xreflabel="Description of Header Lines">
<title>Description of Header Lines</title>

<para>The header has an arbitrary number of lines of the format
"key: value".
Possible <emphasis>key</emphasis> values for the header are:</para>

<itemizedlist>

  <listitem>
    <para><computeroutput>version: number</computeroutput> [Callgrind]</para>
    <para>This is used to distinguish future profile data formats. A
    major version of 0 or 1 is supposed to be upwards compatible with
    Cachegrind's format. It is optional; if absent, version 1
    is assumed. Otherwise, this has to be the first header line.</para>
  </listitem>

  <listitem>
    <para><computeroutput>pid: process id</computeroutput> [Callgrind]</para>
    <para>Optional. This specifies the process ID of the supervised application
    for which this profile was generated.</para>
  </listitem>

  <listitem>
    <para><computeroutput>cmd: program name + args</computeroutput> [Cachegrind]</para>
    <para>Optional. This specifies the full command line of the supervised
    application for which this profile was generated.</para>
  </listitem>

  <listitem>
    <para><computeroutput>part: number</computeroutput> [Callgrind]</para>
    <para>Optional. This specifies a sequentially incremented number for each dump
    generated, starting at 1.</para>
  </listitem>

  <listitem>
    <para><computeroutput>desc: type: value</computeroutput> [Cachegrind]</para>
    <para>This specifies various information for this dump. For some
    types, the semantics are defined, but any description type is allowed.
    Unknown types should be ignored.</para>
    <para>There are the types "I1 cache", "D1 cache", "LL cache", which
    specify parameters used for the cache simulator. These are the only
    types originally used by Cachegrind. Additionally, Callgrind uses
    the following types: "Timerange" gives a rough range of the basic
    block counter, for which the cost of this dump was collected.
    Type "Trigger" states the reason why this trace was generated,
    e.g.
    program termination or forced interactive dump.</para>
  </listitem>

  <listitem>
    <para><computeroutput>positions: [instr] [line]</computeroutput> [Callgrind]</para>
    <para>For cost lines, this defines the semantics of the first numbers.
    Any combination of "instr", "bb" and "line" is allowed, but has to be
    in this order, which corresponds to the position numbers at the start of
    the cost lines later in the file.</para>
    <para>If "instr" is specified, the position is the address of an
    instruction whose execution raised the events given later on the
    line. This address is relative to the offset of the binary/shared
    library file, so that relocation info does not have to be specified. For "line",
    the position is the line number of a source file, which is
    responsible for the events raised. Note that the mapping of "instr"
    and "line" positions is given by the debugging line information
    produced by the compiler.</para>
    <para>This field is optional. If not specified, only "line" is
    assumed.</para>
  </listitem>

  <listitem>
    <para><computeroutput>events: event type abbreviations</computeroutput> [Cachegrind]</para>
    <para>A list of short names of the event types logged in this file.
    The order is the same as in cost lines. The first event type is the
    second or third number in a cost line, depending on the value of
    "positions". Callgrind does not add additional cost types.
    Specify
    exactly once.</para>
    <para>Cost types from original Cachegrind are:
      <itemizedlist>
        <listitem>
          <para><command>Ir</command>: Instruction read access</para>
        </listitem>
        <listitem>
          <para><command>I1mr</command>: Instruction Level 1 read cache miss</para>
        </listitem>
        <listitem>
          <para><command>ILmr</command>: Instruction last-level read cache miss</para>
        </listitem>
        <listitem>
          <para>...</para>
        </listitem>
      </itemizedlist>
    </para>
  </listitem>

  <listitem>
    <para><computeroutput>summary: costs</computeroutput> [Callgrind]</para>
    <para>Optional. This header line specifies a summary cost, which should be
    equal to or larger than the total over all self costs. It may be larger as
    the cost lines may not represent all cost of the program run.</para>
  </listitem>

  <listitem>
    <para><computeroutput>totals: costs</computeroutput> [Cachegrind]</para>
    <para>Optional. Should appear at the end of the file (although
    looking like a header line). Must give the total of all cost lines,
    to allow for a consistency check.</para>
  </listitem>

</itemizedlist>

</sect2>

<sect2 id="cl-format.reference.body" xreflabel="Description of Body Lines">
<title>Description of Body Lines</title>

<para>The body contains lines of the form
<computeroutput>spec=position</computeroutput>. The values for position
specifications are arbitrary strings. When starting with "(" and a
digit, it's a string in compressed format. Otherwise it's the real
position string. This allows for file and symbol names as position
strings, as these never start with "(" + <emphasis>digit</emphasis>.
The compressed format is either "(" <emphasis>number</emphasis> ")"
<emphasis>space</emphasis> <emphasis>position</emphasis> or only
"(" <emphasis>number</emphasis> ")".
The first relates
<emphasis>position</emphasis> to <emphasis>number</emphasis> in the
context of the given format specification from this line to the end of
the file; it makes the (<emphasis>number</emphasis>) an alias for
<emphasis>position</emphasis>. Compressed format is always
optional.</para>

<para>Position specifications allowed:</para>
<itemizedlist>

  <listitem>
    <para><computeroutput>ob=</computeroutput> [Callgrind]</para>
    <para>The ELF object where the cost of next cost lines happens.</para>
  </listitem>

  <listitem>
    <para><computeroutput>fl=</computeroutput> [Cachegrind]</para>
  </listitem>

  <listitem>
    <para><computeroutput>fi=</computeroutput> [Cachegrind]</para>
  </listitem>

  <listitem>
    <para><computeroutput>fe=</computeroutput> [Cachegrind]</para>
    <para>The source file including the code which is responsible for
    the cost of next cost lines. "fi="/"fe=" is used when the source
    file changes inside of a function, i.e.
    for inlined code.</para>
  </listitem>

  <listitem>
    <para><computeroutput>fn=</computeroutput> [Cachegrind]</para>
    <para>The name of the function where the cost of the next cost lines
    happens.</para>
  </listitem>

  <listitem>
    <para><computeroutput>cob=</computeroutput> [Callgrind]</para>
    <para>The ELF object of the target of the next call cost lines.</para>
  </listitem>

  <listitem>
    <para><computeroutput>cfi=</computeroutput> [Callgrind]</para>
    <para>The source file including the code of the target of the
    next call cost lines.</para>
  </listitem>

  <listitem>
    <para><computeroutput>cfl=</computeroutput> [Callgrind]</para>
    <para>Alternative spelling for the <computeroutput>cfi=</computeroutput>
    specification (for historical reasons).</para>
  </listitem>

  <listitem>
    <para><computeroutput>cfn=</computeroutput> [Callgrind]</para>
    <para>The name of the target function of the next call cost
    lines.</para>
  </listitem>

  <listitem>
    <para><computeroutput>calls=</computeroutput> [Callgrind]</para>
    <para>The number of nonrecursive calls which are responsible for the
    cost specified by the next call cost line. This is the cost spent
    inside of the called function.</para>
    <para>After "calls=" there MUST be a cost line. This is the cost
    spent in the called function.
    The first number is the source line
    from where the call happened.</para>
  </listitem>

  <listitem>
    <para><computeroutput>jump=count target position</computeroutput> [Callgrind]</para>
    <para>Unconditional jump, executed count times, to the given target
    position.</para>
  </listitem>

  <listitem>
    <para><computeroutput>jcnd=exe.count jumpcount target position</computeroutput> [Callgrind]</para>
    <para>Conditional jump, executed exe.count times with jumpcount
    jumps to the given target position.</para>
  </listitem>

</itemizedlist>

</sect2>

</sect1>

</chapter>