1<?xml version="1.0"?> <!-- -*- sgml -*- -->
2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
4[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
5
6<chapter id="cl-manual" xreflabel="Callgrind Manual">
7<title>Callgrind: a call-graph generating cache and branch prediction profiler</title>
8
9
10<para>To use this tool, you must specify
11<option>--tool=callgrind</option> on the
12Valgrind command line.</para>
13
14<sect1 id="cl-manual.use" xreflabel="Overview">
15<title>Overview</title>
16
17<para>Callgrind is a profiling tool that records the call history among
18functions in a program's run as a call-graph.
19By default, the collected data consists of
20the number of instructions executed, their relationship
21to source lines, the caller/callee relationship between functions,
22and the numbers of such calls.
23Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
24can produce further information about the runtime behavior of an application.
25</para>
26
27<para>The profile data is written out to a file at program
28termination. For presentation of the data, and interactive control
29of the profiling, two command line tools are provided:</para>
30<variablelist>
31  <varlistentry>
32  <term><command>callgrind_annotate</command></term>
33  <listitem>
34    <para>This command reads in the profile data, and prints a
35    sorted lists of functions, optionally with source annotation.</para>
36
37    <para>For graphical visualization of the data, try
38    <ulink url="&cl-gui-url;">KCachegrind</ulink>, which is a KDE/Qt based
39    GUI that makes it easy to navigate the large amount of data that
40    Callgrind produces.</para>
41
42  </listitem>
43  </varlistentry>
44
45  <varlistentry>
46  <term><command>callgrind_control</command></term>
47  <listitem>
48    <para>This command enables you to interactively observe and control
49    the status of a program currently running under Callgrind's control,
50    without stopping the program.  You can get statistics information as
51    well as the current stack trace, and you can request zeroing of counters
52    or dumping of profile data.</para>
53  </listitem>
54  </varlistentry>
55</variablelist>
56
57  <sect2 id="cl-manual.functionality" xreflabel="Functionality">
58  <title>Functionality</title>
59
60<para>Cachegrind collects flat profile data: event counts (data reads,
61cache misses, etc.) are attributed directly to the function they
62occurred in.  This cost attribution mechanism is
63called <emphasis>self</emphasis> or <emphasis>exclusive</emphasis>
64attribution.</para>
65
66<para>Callgrind extends this functionality by propagating costs
67across function call boundaries.  If function <function>foo</function> calls
68<function>bar</function>, the costs from <function>bar</function> are added into
69<function>foo</function>'s costs.  When applied to the program as a whole,
70this builds up a picture of so called <emphasis>inclusive</emphasis>
71costs, that is, where the cost of each function includes the costs of
72all functions it called, directly or indirectly.</para>
73
74<para>As an example, the inclusive cost of
75<function>main</function> should be almost 100 percent
76of the total program cost.  Because of costs arising before
77<function>main</function> is run, such as
78initialization of the run time linker and construction of global C++
79objects, the inclusive cost of <function>main</function>
80is not exactly 100 percent of the total program cost.</para>
81
82<para>Together with the call graph, this allows you to find the
83specific call chains starting from
84<function>main</function> in which the majority of the
85program's costs occur.  Caller/callee cost attribution is also useful
86for profiling functions called from multiple call sites, and where
87optimization opportunities depend on changing code in the callers, in
88particular by reducing the call count.</para>
89
90<para>Callgrind's cache simulation is based on that of Cachegrind.
91Read the documentation for <xref linkend="&vg-cg-manual-id;"/> first.  The material
92below describes the features supported in addition to Cachegrind's
93features.</para>
94
95<para>Callgrind's ability to detect function calls and returns depends
96on the instruction set of the platform it is run on.  It works best on
97x86 and amd64, and unfortunately currently does not work so well on
98PowerPC, ARM, Thumb or MIPS code.  This is because there are no explicit
99call or return instructions in these instruction sets, so Callgrind
100has to rely on heuristics to detect calls and returns.</para>
101
102  </sect2>
103
104  <sect2 id="cl-manual.basics" xreflabel="Basic Usage">
105  <title>Basic Usage</title>
106
107  <para>As with Cachegrind, you probably want to compile with debugging info
108  (the <option>-g</option> option) and with optimization turned on.</para>
109
110  <para>To start a profile run for a program, execute:
111  <screen>valgrind --tool=callgrind [callgrind options] your-program [program options]</screen>
112  </para>
113
114  <para>While the simulation is running, you can observe execution with:
115  <screen>callgrind_control -b</screen>
116  This will print out the current backtrace. To annotate the backtrace with
117  event counts, run
118  <screen>callgrind_control -e -b</screen>
119  </para>
120
121  <para>After program termination, a profile data file named
122  <computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>
123  is generated, where <emphasis>pid</emphasis> is the process ID
124  of the program being profiled.
125  The data file contains information about the calls made in the
126  program among the functions executed, together with
127  <command>Instruction Read</command> (Ir) event counts.</para>
128
129  <para>To generate a function-by-function summary from the profile
130  data file, use
131  <screen>callgrind_annotate [options] callgrind.out.&lt;pid&gt;</screen>
132  This summary is similar to the output you get from a Cachegrind
133  run with cg_annotate: the list
134  of functions is ordered by exclusive cost of functions, which also
135  are the ones that are shown.
136  Important for the additional features of Callgrind are
137  the following two options:</para>
138
139  <itemizedlist>
140    <listitem>
141      <para><option>--inclusive=yes</option>: Instead of using
142      exclusive cost of functions as sorting order, use and show
143      inclusive cost.</para>
144    </listitem>
145
146    <listitem>
147      <para><option>--tree=both</option>: Interleave into the
148      top level list of functions, information on the callers and the callees
149      of each function. In these lines, which represents executed
150      calls, the cost gives the number of events spent in the call.
151      Indented, above each function, there is the list of callers,
152      and below, the list of callees. The sum of events in calls to
153      a given function (caller lines), as well as the sum of events in
154      calls from the function (callee lines) together with the self
155      cost, gives the total inclusive cost of the function.</para>
156     </listitem>
157  </itemizedlist>
158
159  <para>Use <option>--auto=yes</option> to get annotated source code
160  for all relevant functions for which the source can be found. In
161  addition to source annotation as produced by
162  <computeroutput>cg_annotate</computeroutput>, you will see the
163  annotated call sites with call counts. For all other options,
164  consult the (Cachegrind) documentation for
165  <computeroutput>cg_annotate</computeroutput>.
166  </para>
167
168  <para>For better call graph browsing experience, it is highly recommended
169  to use <ulink url="&cl-gui-url;">KCachegrind</ulink>.
170  If your code
171  has a significant fraction of its cost in <emphasis>cycles</emphasis> (sets
172  of functions calling each other in a recursive manner), you have to
173  use KCachegrind, as <computeroutput>callgrind_annotate</computeroutput>
174  currently does not do any cycle detection, which is important to get correct
175  results in this case.</para>
176
177  <para>If you are additionally interested in measuring the
178  cache behavior of your program, use Callgrind with the option
179  <option><xref linkend="clopt.cache-sim"/>=yes</option>. For
180  branch prediction simulation, use <option><xref linkend="clopt.branch-sim"/>=yes</option>.
181  Expect a further slow down approximately by a factor of 2.</para>
182
183  <para>If the program section you want to profile is somewhere in the
184  middle of the run, it is beneficial to
185  <emphasis>fast forward</emphasis> to this section without any
186  profiling, and then enable profiling.  This is achieved by using
187  the command line option
188  <option><xref linkend="opt.instr-atstart"/>=no</option>
189  and running, in a shell:
190  <computeroutput>callgrind_control -i on</computeroutput> just before the
191  interesting code section is executed. To exactly specify
192  the code position where profiling should start, use the client request
193  <computeroutput><xref linkend="cr.start-instr"/></computeroutput>.</para>
194
195  <para>If you want to be able to see assembly code level annotation, specify
196  <option><xref linkend="opt.dump-instr"/>=yes</option>. This will produce
197  profile data at instruction granularity. Note that the resulting profile
198  data
199  can only be viewed with KCachegrind. For assembly annotation, it also is
200  interesting to see more details of the control flow inside of functions,
201  i.e. (conditional) jumps. This will be collected by further specifying
202  <option><xref linkend="opt.collect-jumps"/>=yes</option>.</para>
203
204  </sect2>
205
206</sect1>
207
208<sect1 id="cl-manual.usage" xreflabel="Advanced Usage">
209<title>Advanced Usage</title>
210
211  <sect2 id="cl-manual.dumps"
212         xreflabel="Multiple dumps from one program run">
213  <title>Multiple profiling dumps from one program run</title>
214
215  <para>Sometimes you are not interested in characteristics of a full
216  program run, but only of a small part of it, for example execution of one
217  algorithm.  If there are multiple algorithms, or one algorithm
218  running with different input data, it may even be useful to get different
219  profile information for different parts of a single program run.</para>
220
221  <para>Profile data files have names of the form
222<screen>
223callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
224</screen>
225  </para>
226  <para>where <emphasis>pid</emphasis> is the PID of the running
227  program, <emphasis>part</emphasis> is a number incremented on each
228  dump (".part" is skipped for the dump at program termination), and
229  <emphasis>threadID</emphasis> is a thread identification
230  ("-threadID" is only used if you request dumps of individual
231  threads with <option><xref linkend="opt.separate-threads"/>=yes</option>).</para>
232
233  <para>There are different ways to generate multiple profile dumps
234  while a program is running under Callgrind's supervision.  Nevertheless,
235  all methods trigger the same action, which is "dump all profile
236  information since the last dump or program start, and zero cost
237  counters afterwards".  To allow for zeroing cost counters without
238  dumping, there is a second action "zero all cost counters now".
239  The different methods are:</para>
240  <itemizedlist>
241
242    <listitem>
243      <para><command>Dump on program termination.</command>
244      This method is the standard way and doesn't need any special
245      action on your part.</para>
246    </listitem>
247
248    <listitem>
249      <para><command>Spontaneous, interactive dumping.</command> Use
250      <screen>callgrind_control -d [hint [PID/Name]]</screen> to
251      request the dumping of profile information of the supervised
252      application with PID or Name.  <emphasis>hint</emphasis> is an
253      arbitrary string you can optionally specify to later be able to
254      distinguish profile dumps.  The control program will not terminate
255      before the dump is completely written.  Note that the application
256      must be actively running for detection of the dump command. So,
257      for a GUI application, resize the window, or for a server, send a
258      request.</para>
259      <para>If you are using <ulink url="&cl-gui-url;">KCachegrind</ulink>
260      for browsing of profile information, you can use the toolbar
261      button <command>Force dump</command>. This will request a dump
262      and trigger a reload after the dump is written.</para>
263    </listitem>
264
265    <listitem>
266      <para><command>Periodic dumping after execution of a specified
267      number of basic blocks</command>. For this, use the command line
268      option <option><xref linkend="opt.dump-every-bb"/>=count</option>.
269      </para>
270    </listitem>
271
272    <listitem>
273      <para><command>Dumping at enter/leave of specified functions.</command>
274      Use the
275      option <option><xref linkend="opt.dump-before"/>=function</option>
276      and <option><xref linkend="opt.dump-after"/>=function</option>.
277      To zero cost counters before entering a function, use
278      <option><xref linkend="opt.zero-before"/>=function</option>.</para>
279      <para>You can specify these options multiple times for different
280      functions. Function specifications support wildcards: e.g. use
281      <option><xref linkend="opt.dump-before"/>='foo*'</option> to
282      generate dumps before entering any function starting with
283      <emphasis>foo</emphasis>.</para>
284    </listitem>
285
286    <listitem>
287      <para><command>Program controlled dumping.</command>
288      Insert
289      <computeroutput><xref linkend="cr.dump-stats"/>;</computeroutput>
290      at the position in your code where you want a profile dump to happen. Use
291      <computeroutput><xref linkend="cr.zero-stats"/>;</computeroutput> to only
292      zero profile counters.
293      See <xref linkend="cl-manual.clientrequests"/> for more information on
294      Callgrind specific client requests.</para>
295    </listitem>
296  </itemizedlist>
297
298  <para>If you are running a multi-threaded application and specify the
299  command line option <option><xref linkend="opt.separate-threads"/>=yes</option>,
300  every thread will be profiled on its own and will create its own
301  profile dump. Thus, the last two methods will only generate one dump
302  of the currently running thread. With the other methods, you will get
303  multiple dumps (one for each thread) on a dump request.</para>
304
305  </sect2>
306
307
308
309  <sect2 id="cl-manual.limits"
310         xreflabel="Limiting range of event collection">
311  <title>Limiting the range of collected events</title>
312
313  <para>By default, whenever events are happening (such as an
314    instruction execution or cache hit/miss), Callgrind is aggregating
315    them into event counters. However, you may be interested only in
316    what is happening within a given function or starting from a given
317    program phase. To this end, you can disable event aggregation for
318    uninteresting program parts. While attribution of events to
319    functions as well as producing seperate output per program phase
320    can be done by other means (see previous section), there are two
321    benefits by disabling aggregation. First, this is very
322    fine-granular (e.g. just for a loop within a function).  Second,
323    disabling event aggregation for complete program phases allows to
324    switch off time-consuming cache simulation and allows Callgrind to
325    progress at much higher speed with an slowdown of around factor 2
326    (identical to <computeroutput>valgrind
327    --tool=none</computeroutput>).
328  </para>
329
330  <para>There are two aspects which influence whether Callgrind is
331    aggregating events at some point in time of program execution.
332    First, there is the <emphasis>collection state</emphasis>. If this
333    is off, no aggregation will be done.  By changing the collection
334    state, you can control event aggregation at a very fine
335    granularity.  However, there is not much difference in regard to
336    execution speed of Callgrind.  By default, collection is switched
337    on, but can be disabled by different means (see below).  Second,
338    there is the <emphasis>instrumentation mode</emphasis> in which
339    Callgrind is running. This mode either can be on or off. If
340    instrumentation is off, no observation of actions in the program
341    will be done and thus, no actions will be forwarded to the
342    simulator which could trigger events. In the end, no events will
343    be aggregated.  The huge benefit is the much higher speed with
344    instrumentation switched off.  However, this only should be used
345    with care and in a coarse fashion: every mode change resets the
346    simulator state (ie. whether a memory block is cached or not) and
347    flushes Valgrinds internal cache of instrumented code blocks,
348    resulting in latency penalty at switching time. Also, cache
349    simulator results directly after switching on instrumentation will
350    be skewed due to identified cache misses which would not happen in
351    reality (if you care about this warm-up effect, you should make
352    sure to temporarly have collection state switched off directly
353    after turning instrumentation mode on). However, switching
354    instrumentation state is very useful to skip larger program phases
355    such as an initialization phase. By default, instrumentation is
356    switched on, but as with the collection state, can be changed by
357    various means.
358  </para>
359
360  <para>Callgrind can start with instrumentation mode switched off by
361    specifying
362    option <option><xref linkend="opt.instr-atstart"/>=no</option>.
363    Afterwards, instrumentation can be controlled in two ways: first,
364    interactively with: <screen>callgrind_control -i on</screen> (and
365    switching off again by specifying "off" instead of "on").  Second,
366    instrumentation state can be programatically changed with the
367    macros <computeroutput><xref linkend="cr.start-instr"/>;</computeroutput>
368    and <computeroutput><xref linkend="cr.stop-instr"/>;</computeroutput>.
369  </para>
370
371  <para>Similarly, the collection state at program start can be
372    switched off
373    by <option><xref linkend="opt.instr-atstart"/>=no</option>. During
374    execution, it can be controlled programatically with the
375    macro <computeroutput>CALLGRIND_TOGGLE_COLLECT;</computeroutput>.
376    Further, you can limit event collection to a specific function by
377    using <option><xref linkend="opt.toggle-collect"/>=function</option>.
378    This will toggle the collection state on entering and leaving the
379    specified function.  When this option is in effect, the default
380    collection state at program start is "off".  Only events happening
381    while running inside of the given function will be
382    collected. Recursive calls of the given function do not trigger
383    any action. This option can be given multiple times to specify
384    different functions of interest.</para>
385  </sect2>
386
387  <sect2 id="cl-manual.busevents" xreflabel="Counting global bus events">
388  <title>Counting global bus events</title>
389
390  <para>For access to shared data among threads in a multithreaded
391  code, synchronization is required to avoid raced conditions.
392  Synchronization primitives are usually implemented via atomic instructions.
393  However, excessive use of such instructions can lead to performance
394  issues.</para>
395
396  <para>To enable analysis of this problem, Callgrind optionally can count
397  the number of atomic instructions executed. More precisely, for x86/x86_64,
398  these are instructions using a lock prefix. For architectures supporting
399  LL/SC, these are the number of SC instructions executed. For both, the term
400  "global bus events" is used.</para>
401
402  <para>The short name of the event type used for global bus events is "Ge".
403  To count global bus events, use <option><xref linkend="clopt.collect-bus"/>=yes</option>.
404  </para>
405  </sect2>
406
407  <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
408  <title>Avoiding cycles</title>
409
410  <para>Informally speaking, a cycle is a group of functions which
411  call each other in a recursive way.</para>
412
413  <para>Formally speaking, a cycle is a nonempty set S of functions,
414  such that for every pair of functions F and G in S, it is possible
415  to call from F to G (possibly via intermediate functions) and also
416  from G to F.  Furthermore, S must be maximal -- that is, be the
417  largest set of functions satisfying this property.  For example, if
418  a third function H is called from inside S and calls back into S,
419  then H is also part of the cycle and should be included in S.</para>
420
421  <para>Recursion is quite usual in programs, and therefore, cycles
422  sometimes appear in the call graph output of Callgrind. However,
423  the title of this chapter should raise two questions: What is bad
424  about cycles which makes you want to avoid them? And: How can
425  cycles be avoided without changing program code?</para>
426
427  <para>Cycles are not bad in itself, but tend to make performance
428  analysis of your code harder. This is because inclusive costs
429  for calls inside of a cycle are meaningless. The definition of
430  inclusive cost, i.e. self cost of a function plus inclusive cost
431  of its callees, needs a topological order among functions. For
432  cycles, this does not hold true: callees of a function in a cycle include
433  the function itself. Therefore, KCachegrind does cycle detection
434  and skips visualization of any inclusive cost for calls inside
435  of cycles. Further, all functions in a cycle are collapsed into artifical
436  functions called like <computeroutput>Cycle 1</computeroutput>.</para>
437
438  <para>Now, when a program exposes really big cycles (as is
439  true for some GUI code, or in general code using event or callback based
440  programming style), you lose the nice property to let you pinpoint
441  the bottlenecks by following call chains from
442  <function>main</function>, guided via
443  inclusive cost. In addition, KCachegrind loses its ability to show
444  interesting parts of the call graph, as it uses inclusive costs to
445  cut off uninteresting areas.</para>
446
447  <para>Despite the meaningless of inclusive costs in cycles, the big
448  drawback for visualization motivates the possibility to temporarily
449  switch off cycle detection in KCachegrind, which can lead to
450  misguiding visualization. However, often cycles appear because of
451  unlucky superposition of independent call chains in a way that
452  the profile result will see a cycle. Neglecting uninteresting
453  calls with very small measured inclusive cost would break these
454  cycles. In such cases, incorrect handling of cycles by not detecting
455  them still gives meaningful profiling visualization.</para>
456
457  <para>It has to be noted that currently, <command>callgrind_annotate</command>
458  does not do any cycle detection at all. For program executions with function
459  recursion, it e.g. can print nonsense inclusive costs way above 100%.</para>
460
461  <para>After describing why cycles are bad for profiling, it is worth
462  talking about cycle avoidance. The key insight here is that symbols in
463  the profile data do not have to exactly match the symbols found in the
464  program. Instead, the symbol name could encode additional information
465  from the current execution context such as recursion level of the
466  current function, or even some part of the call chain leading to the
467  function. While encoding of additional information into symbols is
468  quite capable of avoiding cycles, it has to be used carefully to not cause
469  symbol explosion. The latter imposes large memory requirement for Callgrind
470  with possible out-of-memory conditions, and big profile data files.</para>
471
472  <para>A further possibility to avoid cycles in Callgrind's profile data
473  output is to simply leave out given functions in the call graph. Of course, this
474  also skips any call information from and to an ignored function, and thus can
475  break a cycle. Candidates for this typically are dispatcher functions in event
476  driven code. The option to ignore calls to a function is
477  <option><xref linkend="opt.fn-skip"/>=function</option>. Aside from
478  possibly breaking cycles, this is used in Callgrind to skip
479  trampoline functions in the PLT sections
480  for calls to functions in shared libraries. You can see the difference
481  if you profile with <option><xref linkend="opt.skip-plt"/>=no</option>.
482  If a call is ignored, its cost events will be propagated to the
483  enclosing function.</para>
484
485  <para>If you have a recursive function, you can distinguish the first
486  10 recursion levels by specifying
487  <option><xref linkend="opt.separate-recs-num"/>=function</option>.
488  Or for all functions with
489  <option><xref linkend="opt.separate-recs"/>=10</option>, but this will
490  give you much bigger profile data files.  In the profile data, you will see
491  the recursion levels of "func" as the different functions with names
492  "func", "func'2", "func'3" and so on.</para>
493
494  <para>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
495  in your program, you usually get a "false" cycle "B &lt;&gt; C". Use
496  <option><xref linkend="opt.separate-callers-num"/>=B</option>
497  <option><xref linkend="opt.separate-callers-num"/>=C</option>,
498  and functions "B" and "C" will be treated as different functions
499  depending on the direct caller. Using the apostrophe for appending
500  this "context" to the function name, you get "A &gt; B'A &gt; C'B"
501  and "A &gt; C'A &gt; B'C", and there will be no cycle. Use
502  <option><xref linkend="opt.separate-callers"/>=2</option> to get a 2-caller
503  dependency for all functions.  Note that doing this will increase
504  the size of profile data files.</para>
505
506  </sect2>
507
508  <sect2 id="cl-manual.forkingprograms" xreflabel="Forking Programs">
509  <title>Forking Programs</title>
510
511  <para>If your program forks, the child will inherit all the profiling
512  data that has been gathered for the parent. To start with empty profile
513  counter values in the child, the client request
514  <computeroutput><xref linkend="cr.zero-stats"/>;</computeroutput>
515  can be inserted into code to be executed by the child, directly after
516  <computeroutput>fork</computeroutput>.</para>
517
518  <para>However, you will have to make sure that the output file format string
519  (controlled by <option>--callgrind-out-file</option>) does contain
520  <option>%p</option> (which is true by default). Otherwise, the
521  outputs from the parent and child will overwrite each other or will be
522  intermingled, which almost certainly is not what you want.</para>
523
524  <para>You will be able to control the new child independently from
525  the parent via callgrind_control.</para>
526
527  </sect2>
528
529</sect1>
530
531
532<sect1 id="cl-manual.options" xreflabel="Callgrind Command-line Options">
533<title>Callgrind Command-line Options</title>
534
535<para>
536In the following, options are grouped into classes.
537</para>
538<para>
539Some options allow the specification of a function/symbol name, such as
540<option><xref linkend="opt.dump-before"/>=function</option>, or
541<option><xref linkend="opt.fn-skip"/>=function</option>. All these options
542can be specified multiple times for different functions.
543In addition, the function specifications actually are patterns by supporting
544the use of wildcards '*' (zero or more arbitrary characters) and '?'
545(exactly one arbitrary character), similar to file name globbing in the
546shell. This feature is important especially for C++, as without wildcard
547usage, the function would have to be specified in full extent, including
548parameter signature. </para>
549
550<sect2 id="cl-manual.options.creation"
551       xreflabel="Dump creation options">
552<title>Dump creation options</title>
553
554<para>
555These options influence the name and format of the profile data files.
556</para>
557
558<!-- start of xi:include in the manpage -->
559<variablelist id="cl.opts.list.creation">
560
561  <varlistentry id="opt.callgrind-out-file" xreflabel="--callgrind-out-file">
562    <term>
563      <option><![CDATA[--callgrind-out-file=<file> ]]></option>
564    </term>
565    <listitem>
566      <para>Write the profile data to
567            <computeroutput>file</computeroutput> rather than to the default
568            output file,
569            <computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>.  The
570            <option>%p</option> and <option>%q</option> format specifiers
571            can be used to embed the process ID and/or the contents of an
572            environment variable in the name, as is the case for the core
573            option <option><xref linkend="opt.log-file"/></option>.
574            When multiple dumps are made, the file name
575            is modified further; see below.</para>
576    </listitem>
577  </varlistentry>
578
579  <varlistentry id="opt.dump-line" xreflabel="--dump-line">
580    <term>
581      <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
582    </term>
583    <listitem>
584      <para>This specifies that event counting should be performed at
585      source line granularity. This allows source annotation for sources
586      which are compiled with debug information
587      (<option>-g</option>).</para>
588  </listitem>
589  </varlistentry>
590
591  <varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
592    <term>
593      <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
594    </term>
595    <listitem>
596      <para>This specifies that event counting should be performed at
597      per-instruction granularity.
598      This allows for assembly code
599      annotation.  Currently the results can only be
600      displayed by KCachegrind.</para>
601  </listitem>
602  </varlistentry>
603
604  <varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
605    <term>
606      <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
607    </term>
608    <listitem>
609      <para>This option influences the output format of the profile data.
610      It specifies whether strings (file and function names) should be
611      identified by numbers. This shrinks the file,
612      but makes it more difficult
613      for humans to read (which is not recommended in any case).</para>
614    </listitem>
615  </varlistentry>
616
617  <varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
618    <term>
619      <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
620    </term>
621    <listitem>
622      <para>This option influences the output format of the profile data.
623      It specifies whether numerical positions are always specified as absolute
624      values or are allowed to be relative to previous numbers.
625      This shrinks the file size.</para>
626    </listitem>
627  </varlistentry>
628
629  <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
630    <term>
631      <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
632    </term>
633    <listitem>
634      <para>When enabled, when multiple profile data parts are to be
635      generated these parts are appended to the same output file.
636      Not recommended.</para>
637  </listitem>
638  </varlistentry>
639
640</variablelist>
641</sect2>
642
643<sect2 id="cl-manual.options.activity"
644       xreflabel="Activity options">
645<title>Activity options</title>
646
647<para>
648These options specify when actions relating to event counts are to
649be executed. For interactive control use callgrind_control.
650</para>
651
652<!-- start of xi:include in the manpage -->
653<variablelist id="cl.opts.list.activity">
654
655  <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
656    <term>
657      <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
658    </term>
659    <listitem>
660      <para>Dump profile data every <option>count</option> basic blocks.
661      Whether a dump is needed is only checked when Valgrind's internal
662      scheduler is run. Therefore, the minimum setting useful is about 100000.
663      The count is a 64-bit value to make long dump periods possible.
664      </para>
665    </listitem>
666  </varlistentry>
667
668  <varlistentry id="opt.dump-before" xreflabel="--dump-before">
669    <term>
670      <option><![CDATA[--dump-before=<function> ]]></option>
671    </term>
672    <listitem>
673      <para>Dump when entering <option>function</option>.</para>
674    </listitem>
675  </varlistentry>
676
677  <varlistentry id="opt.zero-before" xreflabel="--zero-before">
678    <term>
679      <option><![CDATA[--zero-before=<function> ]]></option>
680    </term>
681    <listitem>
682      <para>Zero all costs when entering <option>function</option>.</para>
683    </listitem>
684  </varlistentry>
685
686  <varlistentry id="opt.dump-after" xreflabel="--dump-after">
687    <term>
688      <option><![CDATA[--dump-after=<function> ]]></option>
689    </term>
690    <listitem>
691      <para>Dump when leaving <option>function</option>.</para>
692    </listitem>
693  </varlistentry>
694
695</variablelist>
696<!-- end of xi:include in the manpage -->
697</sect2>
698
699<sect2 id="cl-manual.options.collection"
700       xreflabel="Data collection options">
701<title>Data collection options</title>
702
703<para>
704These options specify when events are to be aggregated into event counts.
705Also see <xref linkend="cl-manual.limits"/>.</para>
706
707<!-- start of xi:include in the manpage -->
708<variablelist id="cl.opts.list.collection">
709
710  <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
711    <term>
712      <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
713    </term>
714    <listitem>
715      <para>Specify if you want Callgrind to start simulation and
716      profiling from the beginning of the program.
717      When set to <computeroutput>no</computeroutput>,
718      Callgrind will not be able
719      to collect any information, including calls, but it will have at
720      most a slowdown of around 4, which is the minimum Valgrind
721      overhead.  Instrumentation can be interactively enabled via
722      <computeroutput>callgrind_control -i on</computeroutput>.</para>
723      <para>Note that the resulting call graph will most probably not
724      contain <function>main</function>, but will contain all the
725      functions executed after instrumentation was enabled.
726      Instrumentation can also programatically enabled/disabled. See the
727      Callgrind include file
728      <computeroutput>callgrind.h</computeroutput> for the macro
729      you have to use in your source code.</para> <para>For cache
730      simulation, results will be less accurate when switching on
731      instrumentation later in the program run, as the simulator starts
732      with an empty cache at that moment.  Switch on event collection
733      later to cope with this error.</para>
734    </listitem>
735  </varlistentry>
736
737  <varlistentry id="opt.collect-atstart" xreflabel="--collect-atstart">
738    <term>
739      <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
740    </term>
741    <listitem>
742      <para>Specify whether event collection is enabled at beginning
743      of the profile run.</para>
744      <para>To only look at parts of your program, you have two
745      possibilities:</para>
746      <orderedlist>
747      <listitem>
748        <para>Zero event counters before entering the program part you
749        want to profile, and dump the event counters to a file after
750        leaving that program part.</para>
751        </listitem>
752        <listitem>
753          <para>Switch on/off collection state as needed to only see
754          event counters happening while inside of the program part you
755          want to profile.</para>
756        </listitem>
757      </orderedlist>
758      <para>The second option can be used if the program part you want to
759      profile is called many times. Option 1, i.e. creating a lot of
760      dumps is not practical here.</para>
761      <para>Collection state can be
762      toggled at entry and exit of a given function with the
763      option <option><xref linkend="opt.toggle-collect"/></option>.  If you
764      use this option, collection
765      state should be disabled at the beginning.  Note that the
766      specification of <option>--toggle-collect</option>
767      implicitly sets
768      <option>--collect-state=no</option>.</para>
769      <para>Collection state can be toggled also by inserting the client request
770      <computeroutput>
771      <!-- commented out because it causes broken links in the man page
772      <xref linkend="cr.toggle-collect"/>;
773      -->
774      CALLGRIND_TOGGLE_COLLECT
775      ;</computeroutput>
776      at the needed code positions.</para>
777    </listitem>
778  </varlistentry>
779
780  <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
781    <term>
782      <option><![CDATA[--toggle-collect=<function> ]]></option>
783    </term>
784    <listitem>
785      <para>Toggle collection on entry/exit of <option>function</option>.</para>
786    </listitem>
787  </varlistentry>
788
789  <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
790    <term>
791      <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
792    </term>
793    <listitem>
794      <para>This specifies whether information for (conditional) jumps
795      should be collected.  As above, callgrind_annotate currently is not
796      able to show you the data.  You have to use KCachegrind to get jump
797      arrows in the annotated code.</para>
798    </listitem>
799  </varlistentry>
800
801  <varlistentry id="opt.collect-systime" xreflabel="--collect-systime">
802    <term>
803      <option><![CDATA[--collect-systime=<no|yes> [default: no] ]]></option>
804    </term>
805    <listitem>
806      <para>This specifies whether information for system call times
807      should be collected.</para>
808    </listitem>
809  </varlistentry>
810
811  <varlistentry id="clopt.collect-bus" xreflabel="--collect-bus">
812    <term>
813      <option><![CDATA[--collect-bus=<no|yes> [default: no] ]]></option>
814    </term>
815    <listitem>
816      <para>This specifies whether the number of global bus events executed
817      should be collected. The event type "Ge" is used for these events.</para>
818    </listitem>
819  </varlistentry>
820
821</variablelist>
822<!-- end of xi:include in the manpage -->
823</sect2>
824
825<sect2 id="cl-manual.options.separation"
826       xreflabel="Cost entity separation options">
827<title>Cost entity separation options</title>
828
829<para>
830These options specify how event counts should be attributed to execution
831contexts.
832For example, they specify whether the recursion level or the
833call chain leading to a function should be taken into account,
834and whether the thread ID should be considered.
835Also see <xref linkend="cl-manual.cycles"/>.</para>
836
837<!-- start of xi:include in the manpage -->
838<variablelist id="cmd-options.separation">
839
840  <varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
841    <term>
842      <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
843    </term>
844    <listitem>
845      <para>This option specifies whether profile data should be generated
846      separately for every thread. If yes, the file names get "-threadID"
847      appended.</para>
848    </listitem>
849  </varlistentry>
850
851  <varlistentry id="opt.separate-callers" xreflabel="--separate-callers">
852    <term>
853      <option><![CDATA[--separate-callers=<callers> [default: 0] ]]></option>
854    </term>
855    <listitem>
856      <para>Separate contexts by at most &lt;callers&gt; functions in the
857      call chain. See <xref linkend="cl-manual.cycles"/>.</para>
858    </listitem>
859  </varlistentry>
860
861  <varlistentry id="opt.separate-callers-num" xreflabel="--separate-callers2">
862    <term>
863      <option><![CDATA[--separate-callers<number>=<function> ]]></option>
864    </term>
865    <listitem>
866      <para>Separate <option>number</option> callers for <option>function</option>.
867      See <xref linkend="cl-manual.cycles"/>.</para>
868    </listitem>
869  </varlistentry>
870
871  <varlistentry id="opt.separate-recs" xreflabel="--separate-recs">
872    <term>
873      <option><![CDATA[--separate-recs=<level> [default: 2] ]]></option>
874    </term>
875    <listitem>
876      <para>Separate function recursions by at most <option>level</option> levels.
877      See <xref linkend="cl-manual.cycles"/>.</para>
878    </listitem>
879  </varlistentry>
880
881  <varlistentry id="opt.separate-recs-num" xreflabel="--separate-recs10">
882    <term>
883      <option><![CDATA[--separate-recs<number>=<function> ]]></option>
884    </term>
885    <listitem>
886      <para>Separate <option>number</option> recursions for <option>function</option>.
887      See <xref linkend="cl-manual.cycles"/>.</para>
888    </listitem>
889  </varlistentry>
890
891  <varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
892    <term>
893      <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
894    </term>
895    <listitem>
896      <para>Ignore calls to/from PLT sections.</para>
897    </listitem>
898  </varlistentry>
899
900  <varlistentry id="opt.skip-direct-rec" xreflabel="--skip-direct-rec">
901    <term>
902      <option><![CDATA[--skip-direct-rec=<no|yes> [default: yes] ]]></option>
903    </term>
904    <listitem>
905      <para>Ignore direct recursions.</para>
906    </listitem>
907  </varlistentry>
908
909  <varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
910    <term>
911      <option><![CDATA[--fn-skip=<function> ]]></option>
912    </term>
913    <listitem>
914      <para>Ignore calls to/from a given function.  E.g. if you have a
915      call chain A &gt; B &gt; C, and you specify function B to be
916      ignored, you will only see A &gt; C.</para>
917      <para>This is very convenient to skip functions handling callback
918      behaviour.  For example, with the signal/slot mechanism in the
919      Qt graphics library, you only want
920      to see the function emitting a signal to call the slots connected
921      to that signal. First, determine the real call chain to see the
922      functions needed to be skipped, then use this option.</para>
923    </listitem>
924  </varlistentry>
925
926<!--
927    commenting out as it is only enabled with CLG_EXPERIMENTAL.  (Nb: I had to
928    insert a space between the double dash to avoid XML comment problems.)
929
930  <varlistentry id="opt.fn-group">
931    <term>
932      <option><![CDATA[- -fn-group<number>=<function> ]]></option>
933    </term>
934    <listitem>
935      <para>Put a function into a separate group. This influences the
936      context name for cycle avoidance. All functions inside such a
937      group are treated as being the same for context name building, which
938      resembles the call chain leading to a context. By specifying function
939      groups with this option, you can shorten the context name, as functions
940      in the same group will not appear in sequence in the name. </para>
941    </listitem>
942  </varlistentry>
943-->
944
945</variablelist>
946<!-- end of xi:include in the manpage -->
947</sect2>
948
949
950<sect2 id="cl-manual.options.simulation"
951       xreflabel="Simulation options">
952<title>Simulation options</title>
953
954<!-- start of xi:include in the manpage -->
955<variablelist id="cl.opts.list.simulation">
956
957  <varlistentry id="clopt.cache-sim" xreflabel="--cache-sim">
958    <term>
959      <option><![CDATA[--cache-sim=<yes|no> [default: no] ]]></option>
960    </term>
961    <listitem>
962      <para>Specify if you want to do full cache simulation.  By default,
963      only instruction read accesses will be counted ("Ir").
964      With cache simulation, further event counters are enabled:
965      Cache misses on instruction reads ("I1mr"/"ILmr"),
966      data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
967      data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
968      For more information, see <xref linkend="&vg-cg-manual-id;"/>.
969      </para>
970    </listitem>
971  </varlistentry>
972
973  <varlistentry id="clopt.branch-sim" xreflabel="--branch-sim">
974    <term>
975      <option><![CDATA[--branch-sim=<yes|no> [default: no] ]]></option>
976    </term>
977    <listitem>
978      <para>Specify if you want to do branch prediction simulation.
979      Further event counters are enabled: Number of executed conditional
980      branches and related predictor misses ("Bc"/"Bcm"), executed indirect
981      jumps and related misses of the jump address predictor ("Bi"/"Bim").
982      </para>
983    </listitem>
984  </varlistentry>
985
986</variablelist>
987<!-- end of xi:include in the manpage -->
988</sect2>
989
990
991<sect2 id="cl-manual.options.cachesimulation"
992       xreflabel="Cache simulation options">
993<title>Cache simulation options</title>
994
995<!-- start of xi:include in the manpage -->
996<variablelist id="cl.opts.list.cachesimulation">
997
998  <varlistentry id="opt.simulate-wb" xreflabel="--simulate-wb">
999    <term>
1000      <option><![CDATA[--simulate-wb=<yes|no> [default: no] ]]></option>
1001    </term>
1002    <listitem>
1003      <para>Specify whether write-back behavior should be simulated, allowing
1004      to distinguish LL caches misses with and without write backs.
1005      The cache model of Cachegrind/Callgrind does not specify write-through
1006      vs. write-back behavior, and this also is not relevant for the number
1007      of generated miss counts. However, with explicit write-back simulation
1008      it can be decided whether a miss triggers not only the loading of a new
1009      cache line, but also if a write back of a dirty cache line had to take
1010      place before. The new dirty miss events are ILdmr, DLdmr, and DLdmw,
1011      for misses because of instruction read, data read, and data write,
1012      respectively. As they produce two memory transactions, they should
1013      account for a doubled time estimation in relation to a normal miss.
1014      </para>
1015    </listitem>
1016  </varlistentry>
1017
1018  <varlistentry id="opt.simulate-hwpref" xreflabel="--simulate-hwpref">
1019    <term>
1020      <option><![CDATA[--simulate-hwpref=<yes|no> [default: no] ]]></option>
1021    </term>
1022    <listitem>
1023      <para>Specify whether simulation of a hardware prefetcher should be
1024      added which is able to detect stream access in the second level cache
1025      by comparing accesses to separate to each page.
1026      As the simulation can not decide about any timing issues of prefetching,
1027      it is assumed that any hardware prefetch triggered succeeds before a
1028      real access is done. Thus, this gives a best-case scenario by covering
1029      all possible stream accesses.</para>
1030    </listitem>
1031  </varlistentry>
1032
1033  <varlistentry id="opt.cacheuse" xreflabel="--cacheuse">
1034    <term>
1035      <option><![CDATA[--cacheuse=<yes|no> [default: no] ]]></option>
1036    </term>
1037    <listitem>
1038      <para>Specify whether cache line use should be collected. For every
1039      cache line, from loading to it being evicted, the number of accesses
1040      as well as the number of actually used bytes is determined. This
1041      behavior is related to the code which triggered loading of the cache
1042      line. In contrast to miss counters, which shows the position where
1043      the symptoms of bad cache behavior (i.e. latencies) happens, the
1044      use counters try to pinpoint at the reason (i.e. the code with the
1045      bad access behavior). The new counters are defined in a way such
1046      that worse behavior results in higher cost.
1047      AcCost1 and AcCost2 are counters showing bad temporal locality
1048      for L1 and LL caches, respectively. This is done by summing up
1049      reciprocal values of the numbers of accesses of each cache line,
1050      multiplied by 1000 (as only integer costs are allowed). E.g. for
1051      a given source line with 5 read accesses, a value of 5000 AcCost
1052      means that for every access, a new cache line was loaded and directly
1053      evicted afterwards without further accesses. Similarly, SpLoss1/2
1054      shows bad spatial locality for L1 and LL caches, respectively. It
1055      gives the <emphasis>spatial loss</emphasis> count of bytes which
1056      were loaded into cache but never accessed. It pinpoints at code
1057      accessing data in a way such that cache space is wasted. This hints
1058      at bad layout of data structures in memory. Assuming a cache line
1059      size of 64 bytes and 100 L1 misses for a given source line, the
1060      loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a
1061      value of 3200 for this line, this means that half of the loaded data was
1062      never used, or using a better data layout, only half of the cache
1063      space would have been needed.
1064      Please note that for cache line use counters, it currently is
1065      not possible to provide meaningful inclusive costs. Therefore,
1066      inclusive cost of these counters should be ignored.
1067      </para>
1068    </listitem>
1069  </varlistentry>
1070
1071  <varlistentry id="opt.I1" xreflabel="--I1">
1072    <term>
1073      <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
1074    </term>
1075    <listitem>
1076      <para>Specify the size, associativity and line size of the level 1
1077      instruction cache.  </para>
1078    </listitem>
1079  </varlistentry>
1080
1081  <varlistentry id="opt.D1" xreflabel="--D1">
1082    <term>
1083      <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
1084    </term>
1085    <listitem>
1086      <para>Specify the size, associativity and line size of the level 1
1087      data cache.</para>
1088    </listitem>
1089  </varlistentry>
1090
1091  <varlistentry id="opt.LL" xreflabel="--LL">
1092    <term>
1093      <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
1094    </term>
1095    <listitem>
1096      <para>Specify the size, associativity and line size of the last-level
1097      cache.</para>
1098    </listitem>
1099  </varlistentry>
1100</variablelist>
1101<!-- end of xi:include in the manpage -->
1102
1103</sect2>
1104
1105</sect1>
1106
1107<sect1 id="cl-manual.monitor-commands" xreflabel="Callgrind Monitor Commands">
1108<title>Callgrind Monitor Commands</title>
1109<para>The Callgrind tool provides monitor commands handled by the Valgrind
1110gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
1111</para>
1112
1113<itemizedlist>
1114  <listitem>
1115    <para><varname>dump [&lt;dump_hint&gt;]</varname> requests to dump the
1116    profile data. </para>
1117  </listitem>
1118
1119  <listitem>
1120    <para><varname>zero</varname> requests to zero the profile data
1121    counters. </para>
1122  </listitem>
1123
1124  <listitem>
1125    <para><varname>instrumentation [on|off]</varname> requests to set
1126    (if parameter on/off is given) or get the current instrumentation state.
1127    </para>
1128  </listitem>
1129
1130  <listitem>
1131    <para><varname>status</varname> requests to print out some status
1132    information.</para>
1133  </listitem>
1134
1135</itemizedlist>
1136</sect1>
1137
1138<sect1 id="cl-manual.clientrequests" xreflabel="Client request reference">
1139<title>Callgrind specific client requests</title>
1140
1141<para>Callgrind provides the following specific client requests in
1142<filename>callgrind.h</filename>.  See that file for the exact details of
1143their arguments.</para>
1144
1145<variablelist id="cl.clientrequests.list">
1146
1147  <varlistentry id="cr.dump-stats" xreflabel="CALLGRIND_DUMP_STATS">
1148    <term>
1149      <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>
1150    </term>
1151    <listitem>
1152      <para>Force generation of a profile dump at specified position
1153      in code, for the current thread only. Written counters will be reset
1154      to zero.</para>
1155    </listitem>
1156  </varlistentry>
1157
1158  <varlistentry id="cr.dump-stats-at" xreflabel="CALLGRIND_DUMP_STATS_AT">
1159    <term>
1160      <computeroutput>CALLGRIND_DUMP_STATS_AT(string)</computeroutput>
1161    </term>
1162    <listitem>
1163      <para>Same as <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>,
1164      but allows to specify a string to be able to distinguish profile
1165      dumps.</para>
1166    </listitem>
1167  </varlistentry>
1168
1169  <varlistentry id="cr.zero-stats" xreflabel="CALLGRIND_ZERO_STATS">
1170    <term>
1171      <computeroutput>CALLGRIND_ZERO_STATS</computeroutput>
1172    </term>
1173    <listitem>
1174      <para>Reset the profile counters for the current thread to zero.</para>
1175    </listitem>
1176  </varlistentry>
1177
1178  <varlistentry id="cr.toggle-collect" xreflabel="CALLGRIND_TOGGLE_COLLECT">
1179    <term>
1180      <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput>
1181    </term>
1182    <listitem>
1183      <para>Toggle the collection state. This allows to ignore events
1184      with regard to profile counters. See also options
1185      <option><xref linkend="opt.collect-atstart"/></option> and
1186      <option><xref linkend="opt.toggle-collect"/></option>.</para>
1187    </listitem>
1188  </varlistentry>
1189
1190  <varlistentry id="cr.start-instr" xreflabel="CALLGRIND_START_INSTRUMENTATION">
1191    <term>
1192      <computeroutput>CALLGRIND_START_INSTRUMENTATION</computeroutput>
1193    </term>
1194    <listitem>
1195      <para>Start full Callgrind instrumentation if not already enabled.
1196      When cache simulation is done, this will flush the simulated cache
1197      and lead to an artifical cache warmup phase afterwards with
1198      cache misses which would not have happened in reality.  See also
1199      option <option><xref linkend="opt.instr-atstart"/></option>.</para>
1200    </listitem>
1201  </varlistentry>
1202
1203  <varlistentry id="cr.stop-instr" xreflabel="CALLGRIND_STOP_INSTRUMENTATION">
1204    <term>
1205      <computeroutput>CALLGRIND_STOP_INSTRUMENTATION</computeroutput>
1206    </term>
1207    <listitem>
1208      <para>Stop full Callgrind instrumentation if not already disabled.
1209      This flushes Valgrinds translation cache, and does no additional
1210      instrumentation afterwards: it effectivly will run at the same
1211      speed as Nulgrind, i.e. at minimal slowdown. Use this to
1212      speed up the Callgrind run for uninteresting code parts. Use
1213      <computeroutput><xref linkend="cr.start-instr"/></computeroutput> to
1214      enable instrumentation again.  See also option
1215      <option><xref linkend="opt.instr-atstart"/></option>.</para>
1216    </listitem>
1217  </varlistentry>
1218
1219</variablelist>
1220
1221</sect1>
1222
1223
1224
1225<sect1 id="cl-manual.callgrind_annotate-options" xreflabel="callgrind_annotate Command-line Options">
1226<title>callgrind_annotate Command-line Options</title>
1227
1228<!-- start of xi:include in the manpage -->
1229<variablelist id="callgrind_annotate.opts.list">
1230
1231  <varlistentry>
1232    <term><option>-h --help</option></term>
1233    <listitem>
1234      <para>Show summary of options.</para>
1235    </listitem>
1236  </varlistentry>
1237
1238  <varlistentry>
1239    <term><option>--version</option></term>
1240    <listitem>
1241      <para>Show version of callgrind_annotate.</para>
1242    </listitem>
1243  </varlistentry>
1244
1245  <varlistentry>
1246    <term>
1247      <option>--show=A,B,C [default: all]</option>
1248    </term>
1249    <listitem>
1250      <para>Only show figures for events A,B,C.</para>
1251    </listitem>
1252  </varlistentry>
1253
1254  <varlistentry>
1255    <term>
1256      <option>--sort=A,B,C</option>
1257    </term>
1258    <listitem>
1259      <para>Sort columns by events A,B,C [event column order].</para>
1260    </listitem>
1261  </varlistentry>
1262
1263  <varlistentry>
1264    <term>
1265      <option><![CDATA[--threshold=<0--100> [default: 99%] ]]></option>
1266    </term>
1267    <listitem>
1268      <para>Percentage of counts (of primary sort event) we are
1269      interested in.</para>
1270    </listitem>
1271  </varlistentry>
1272
1273  <varlistentry>
1274    <term>
1275      <option><![CDATA[--auto=<yes|no> [default: no] ]]></option>
1276    </term>
1277    <listitem>
1278      <para>Annotate all source files containing functions that helped
1279      reach the event count threshold.</para>
1280    </listitem>
1281  </varlistentry>
1282
1283  <varlistentry>
1284    <term>
1285      <option>--context=N [default: 8] </option>
1286    </term>
1287    <listitem>
1288      <para>Print N lines of context before and after annotated
1289      lines.</para>
1290    </listitem>
1291  </varlistentry>
1292
1293  <varlistentry>
1294    <term>
1295      <option><![CDATA[--inclusive=<yes|no> [default: no] ]]></option>
1296    </term>
1297    <listitem>
1298      <para>Add subroutine costs to functions calls.</para>
1299    </listitem>
1300  </varlistentry>
1301
1302  <varlistentry>
1303    <term>
1304      <option><![CDATA[--tree=<none|caller|calling|both> [default: none] ]]></option>
1305    </term>
1306    <listitem>
1307      <para>Print for each function their callers, the called functions
1308      or both.</para>
1309    </listitem>
1310  </varlistentry>
1311
1312  <varlistentry>
1313    <term>
1314      <option><![CDATA[-I, --include=<dir> ]]></option>
1315    </term>
1316    <listitem>
1317      <para>Add <option>dir</option> to the list of directories to search
1318      for source files.</para>
1319  </listitem>
1320  </varlistentry>
1321
1322</variablelist>
1323<!-- end of xi:include in the manpage -->
1324
1325
1326</sect1>
1327
1328
1329
1330
1331<sect1 id="cl-manual.callgrind_control-options" xreflabel="callgrind_control Command-line Options">
1332<title>callgrind_control Command-line Options</title>
1333
1334<para>By default, callgrind_control acts on all programs run by the
1335  current user under Callgrind.  It is possible to limit the actions to
1336  specified Callgrind runs by providing a list of pids or program names as
1337  argument.  The default action is to give some brief information about the
1338  applications being run under Callgrind.</para>
1339
1340<!-- start of xi:include in the manpage -->
1341<variablelist id="callgrind_control.opts.list">
1342
1343  <varlistentry>
1344    <term><option>-h --help</option></term>
1345    <listitem>
1346      <para>Show a short description, usage, and summary of options.</para>
1347    </listitem>
1348  </varlistentry>
1349
1350  <varlistentry>
1351    <term><option>--version</option></term>
1352    <listitem>
1353      <para>Show version of callgrind_control.</para>
1354    </listitem>
1355  </varlistentry>
1356
1357  <varlistentry>
1358    <term><option>-l --long</option></term>
1359    <listitem>
1360      <para>Show also the working directory, in addition to the brief
1361      information given by default.
1362      </para>
1363    </listitem>
1364  </varlistentry>
1365
1366  <varlistentry>
1367    <term><option>-s --stat</option></term>
1368    <listitem>
1369      <para>Show statistics information about active Callgrind runs.</para>
1370    </listitem>
1371  </varlistentry>
1372
1373  <varlistentry>
1374    <term><option>-b --back</option></term>
1375    <listitem>
1376      <para>Show stack/back traces of each thread in active Callgrind runs. For
1377      each active function in the stack trace, also the number of invocations
1378      since program start (or last dump) is shown. This option can be
1379      combined with -e to show inclusive cost of active functions.</para>
1380    </listitem>
1381  </varlistentry>
1382
1383  <varlistentry>
1384    <term><option><![CDATA[-e [A,B,...] ]]></option> (default: all)</term>
1385    <listitem>
1386      <para>Show the current per-thread, exclusive cost values of event
1387      counters. If no explicit event names are given, figures for all event
1388      types which are collected in the given Callgrind run are
1389      shown. Otherwise, only figures for event types A, B, ... are shown. If
1390      this option is combined with -b, inclusive cost for the functions of
1391      each active stack frame is provided, too.
1392      </para>
1393    </listitem>
1394  </varlistentry>
1395
1396  <varlistentry>
1397    <term><option><![CDATA[--dump[=<desc>] ]]></option> (default: no description)</term>
1398    <listitem>
1399      <para>Request the dumping of profile information. Optionally, a
1400      description can be specified which is written into the dump as part of
1401      the information giving the reason which triggered the dump action. This
1402      can be used to distinguish multiple dumps.</para>
1403    </listitem>
1404  </varlistentry>
1405
1406  <varlistentry>
1407    <term><option>-z --zero</option></term>
1408    <listitem>
1409      <para>Zero all event counters.</para>
1410    </listitem>
1411  </varlistentry>
1412
1413  <varlistentry>
1414    <term><option>-k --kill</option></term>
1415    <listitem>
1416      <para>Force a Callgrind run to be terminated.</para>
1417    </listitem>
1418  </varlistentry>
1419
1420  <varlistentry>
1421    <term><option><![CDATA[--instr=<on|off>]]></option></term>
1422    <listitem>
1423      <para>Switch instrumentation mode on or off. If a Callgrind run has
1424      instrumentation disabled, no simulation is done and no events are
1425      counted. This is useful to skip uninteresting program parts, as there
1426      is much less slowdown (same as with the Valgrind tool "none"). See also
1427      the Callgrind option <option>--instr-atstart</option>.</para>
1428    </listitem>
1429  </varlistentry>
1430
1431  <varlistentry>
1432    <term><option><![CDATA[--vgdb-prefix=<prefix>]]></option></term>
1433    <listitem>
1434      <para>Specify the vgdb prefix to use by callgrind_control.
1435      callgrind_control internally uses vgdb to find and control the active
1436      Callgrind runs. If the <option>--vgdb-prefix</option> option was used
1437      for launching valgrind, then the same option must be given to
1438      callgrind_control.</para>
1439    </listitem>
1440  </varlistentry>
1441</variablelist>
1442<!-- end of xi:include in the manpage -->
1443
1444</sect1>
1445
1446</chapter>
1447