1<?xml version="1.0" encoding="UTF-8" ?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
4<head>
5  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6  <link rel="stylesheet" href="resources/doc.css" charset="UTF-8" type="text/css" />
7  <link rel="stylesheet" href="../coverage/jacoco-resources/prettify.css" charset="UTF-8" type="text/css" />
8  <link rel="shortcut icon" href="resources/report.gif" type="image/gif" />
9  <script type="text/javascript" src="../coverage/jacoco-resources/prettify.js"></script>
10  <title>JaCoCo - Control Flow Analysis</title>
11</head>
12<body onload="prettyPrint()">
13
14<div class="breadcrumb">
15  <a href="../index.html" class="el_report">JaCoCo</a> &gt;
16  <a href="index.html" class="el_group">Documentation</a> &gt;
17  <span class="el_source">Control Flow Analysis</span>
18</div>
19<div id="content">
20
21<h1>Control Flow Analysis for Java Methods</h1>
22
23<p class="hint">
24  Implementing a coverage tool that supports statement (C0) as well as branch
25  coverage coverage (C1) requires detailed analysis of the internal control flow
26  of Java methods. Due to the architecture of JaCoCo this analysis happens on
27  the bytecode of compiled class files. This document describes JaCoCo's
28  strategies for inserting probes into the control flow at runtime and analyzing
29  the actual code coverage. Marc R. Hoffmann, November 2011
30</p>
31
32<h2>Control Flow Graphs for Java Bytecode</h2>
33
34<p>
35  As an starting point we take the following example method that contains a
36  single branching point:
37</p>
38
39<pre class="source lang-java linenums">
40public static void example() {
41    a();
42    if (cond()) {
43        b();
44    } else {
45        c();
46    }
47    d();
48}
49</pre>
50
51<p>
52  A Java compiler will create the following bytecode from this example method.
53  Java bytecode is a linear sequence of instructions. Control flow is
54  implemented with <i>jump</i> instructions like the conditional
55  <code>IFEQ</code> or the unconditional <code>GOTO</code> opcode. The jump
56  targets are technically relative offsets to the target instruction. For better
57  readability we use symbolic labels (<code>L1</code>, <code>L2</code>) instead
58  (also the ASM API uses such symbolic labels):
59</p>
60
61<pre class="source linenums">
62public static example()V
63      INVOKESTATIC a()V
64      INVOKESTATIC cond()Z
65      IFEQ L1
66      INVOKESTATIC b()V
67      GOTO L2
68  L1: INVOKESTATIC c()V
69  L2: INVOKESTATIC d()V
70      RETURN
71</pre>
72
73<p>
74  The possible control flow in the bytecode above can be represented by a graph.
75  The nodes are byte code instruction, the edged of the graph represent the
76  possible control flow between the instructions. The control flow of the
77  example is shown in the left box of this diagram:
78</p>
79
80<img src="resources/flow-example.png" alt="Bytecode Control Flow"/>
81
82
83<h3>Flow Edges</h3>
84
85<p>
86  The control flow graph of a Java method defined by Java byte code may have
87  the following Edges. Each edge connects a source instruction with a target
88  instruction. In some cases the source instruction or the target instruction
89  does not exist (virtual edges for method entry and exit) or cannot be
90  exactly specified (exception handlers).
91</p>
92
93<table class="coverage">
94  <thead>
95    <tr>
96      <td>Type</td>
97      <td>Source</td>
98      <td>Target</td>
99      <td>Remarks</td>
100    </tr>
101  </thead>
102  <tbody>
103    <tr>
104      <td>ENTRY</td>
105      <td>-</td>
106      <td>First instruction in method</td>
107      <td></td>
108    </tr>
109    <tr>
110      <td>SEQUENCE</td>
111      <td>Instruction, except <code>GOTO</code>, <code>xRETURN</code>,
112        <code>THROW</code>, <code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code></td>
113      <td>Subsequent instruction</td>
114      <td></td>
115    </tr>
116    <tr>
117      <td>JUMP</td>
118      <td><code>GOTO</code>, <code>IFx</code>, <code>TABLESWITCH</code> or
119        <code>LOOKUPSWITCH</code>  instruction</td>
120      <td>Target instruction</td>
121      <td><code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code> will define
122        multiple edges.</td>
123    </tr>
124    <tr>
125      <td>EXHANDLER</td>
126      <td>Any instruction in handler scope</td>
127      <td>Target instruction</td>
128      <td></td>
129    </tr>
130    <tr>
131      <td>EXIT</td>
132      <td><code>xRETURN</code> or <code>THROW</code> instruction</td>
133      <td>-</td>
134      <td></td>
135    </tr>
136    <tr>
137      <td>EXEXIT</td>
138      <td>Any instruction</td>
139      <td>-</td>
140      <td>Unhandled exception.</td>
141    </tr>
142  </tbody>
143</table>
144
145<p>
146  The current JaCoCo implementation ignores edges caused by implicit exceptions
147  and the the method entry. This means we consider SEQUENCE, JUMP, EXIT.
148</p>
149
150
151<h2>Probe Insertion Strategy</h2>
152
153<p>
154  Probes are additional instructions that can be inserted between existing
155  instructions. They do not change the behavior of the method but record the
156  fact that they have been executed. One can think probes are placed on edges of
157  the control flow graph. Theoretically we could insert a probe at every edge of
158  the control flow graph. As a probe implementation itself requires multiple
159  bytecode instructions this would increase the size of the class files several
160  times and significantly slow down execution speed of the instrumented classes.
161  Fortunately this is not required, in fact we only need a few probes per method
162  depending on the control flow of the method. For example a method without any
163  branches requires a single probe only. The reason for this is that starting
164  from a certain probe we can back-trace the execution path and typically get
165  coverage information for multiple instructions.
166</p>
167
168<p>
169  If a probe has been executed we know that the corresponding edge has been
170  visited. From this edge we can conclude to other preceding nodes and edges:
171</p>
172
173<ul>
174  <li>If a edge has been visited, we know that the source node of the this edge
175      has been executed.</li>
176  <li>If a node has been executed and the node is the target of only one edge
177      we know that this edge has been visited.</li>
178</ul>
179
180<p>
181  Recursively applying these rules allows to determine the execution status of
182  all instructions of a method &ndash; given that we have probes at the right
183  positions. Therefore JaCoCo inserts probes
184</p>
185
186<ul>
187  <li>at every method exit (return or throws) and</li>
188  <li>at every edge where the target instruction is the target of more than one
189      edge.</li>
190</ul>
191
192<p>
193  We recall that a probe is simply a small sequence of additional instructions
194  that needs to be inserted at a control flow edge. The following table
195  illustrates how this extra instructions are added in case of different edge
196  types.
197</p>
198
199<table class="coverage">
200  <thead>
201    <tr>
202      <td>Type</td>
203      <td>Before</td>
204      <td>After</td>
205      <td>Remarks</td>
206    </tr>
207  </thead>
208  <tbody>
209    <tr>
210      <td>SEQUENCE</td>
211      <td><img src="resources/flow-sequence.png" alt="Sequence"/></td>
212      <td><img src="resources/flow-sequence-probe.png" alt="Sequence with Probe"/></td>
213      <td>
214        In case of a simple sequence the probe is simply inserted between the
215        two instructions.
216      </td>
217    </tr>
218    <tr>
219      <td>JUMP (unconditional)</td>
220      <td><img src="resources/flow-goto.png" alt="Unconditional Jump"/></td>
221      <td><img src="resources/flow-goto-probe.png" alt="Unconditional Jump with Probe"/></td>
222      <td>
223        As an unconditional jump is executed in any case, we can also insert the
224        probe just before the GOTO instruction.
225      </td>
226    </tr>
227    <tr>
228      <td>JUMP (conditional)</td>
229      <td><img src="resources/flow-cond.png" alt="Conditional Jump"/></td>
230      <td><img src="resources/flow-cond-probe.png" alt="Conditional Jump with Probe"/></td>
231      <td>
232        Adding a probe to an conditional jump is little bit more tricky. We
233        invert the semantic of the opcode and add the probe right after the
234        conditional jump. With a subsequent <code>GOTO</code> instruction we
235        jump to the original target. Note that this approach will not introduce
236        a backward jump, which would cause trouble with the Java verifier if we
237        have an uninitialized object on the stack.
238      </td>
239    </tr>
240    <tr>
241      <td>EXIT</td>
242      <td><img src="resources/flow-exit.png" alt="Exit"/></td>
243      <td><img src="resources/flow-exit-probe.png" alt="Exit with Probe"/></td>
244      <td>
245        As is is the nature of RETURN and THROW statements to actually leave the
246        method we add the probe right before these statements.
247      </td>
248    </tr>
249  </tbody>
250</table>
251
252<p>
253  Now let's see how this rules apply to the example snippet above. We see that
254  <code>INVOKE d()</code> instruction is the only node with more than one
255  incoming edge. So we need to place probes on those edges and another probe on
256  the only exit node. The result is shown the the right box of the diagram
257  above.
258</p>
259
260<h2>Additional Probes Between Lines</h2>
261
262<p>
263  The probe insertion strategy described so far does not consider implicit
264  exceptions thrown for example from invoked methods. If the control flow
265  between two probes is interrupted by a exception not explicitly created with
266  a <code>throw</code> statement all instruction in between are considered as
267  not covered. This leads to unexpected results especially when the the block of
268  instructions spans multiple lines of source code.
269</p>
270
271<p>
272  Therefore JaCoCo adds an additional probe between the instructions of two
273  lines whenever the subsequent line contains at least one method invocation.
274  This limits the effect of implicit exceptions from method invocations to
275  single lines of source. The approach only works for class files compiled with
276  debug information (line numbers) and does not consider implicit exceptions
277  from other instructions than method invocations (e.g.
278  <code>NullPointerException</code> or <code>ArrayIndexOutOfBoundsException</code>).
279</p>
280
281<h2>Probe Implementation</h2>
282
283<p>
284  Code coverage analysis is a runtime metric that provides execution details
285  of the software under test. This requires detailed recording about the
286  instructions (instruction coverage) that have been executed. For branch
287  coverage also the outcome of decisions has to be recorded. In any case
288  execution data is collected by so called probes:
289</p>
290
291<p class="hint">
292  A <b>probe</b> is a sequence of bytecode instructions that can be inserted
293  into a Java method. When the probe is executed, this fact is recorded and can
294  be reported by the coverage runtime. The probe must not change the behavior
295  of the original code.
296</p>
297
298<p>
299  The only purpose of the probe is to record that it has been executed at least
300  once. The probe does not record the number of times it has been called or
301  collect any timing information. The latter is out of scope for code coverage
302  analysis and more in the objective of a performance analysis tool. Typically
303  multiple probes needs to be inserted into each method, therefore probes needs
304  to be identified. Also the probe implementation and the storage mechanism it
305  depends on needs to be thread safe as multi-threaded execution is a common
306  scenario for java applications (albeit not for plain unit tests). Probes must
307  not have any side effects on the original code of the method. Also they should
308  add minimal overhead.
309</p>
310
311<p>
312  So to summarize the requirements for execution probes:
313</p>
314
315<ul>
316  <li>Record execution</li>
317  <li>Identification for different probes</li>
318  <li>Thread safe</li>
319  <li>No side effects on application code</li>
320  <li>Minimal runtime overhead</li>
321</ul>
322
323<p>
324  JaCoCo implements probes with a <code>boolean[]</code> array instance per
325  class. Each probe corresponds to a entry in this array. Whenever the probe is
326  executed the entry is set to <code>true</code> with the following four
327  bytecode instructions:
328</p>
329
330<pre class="source">
331ALOAD    probearray
332xPUSH    probeid
333ICONST_1
334BASTORE
335</pre>
336
337<p>
338  Note that this probe code is thread safe, does not modify the operand stack
339  or modify local variables and is also thread safe. It does also not leave the
340  method though an external call. The only prerequisite is that the probe array
341  is available as a local variable. For this at the beginning of each method
342  additional instrumentation code needs to be added to obtain the array instance
343  associated with the belonging class. To avoid code duplication the
344  initialization is delegated to a static private method
345  <code>$jacocoinit()</code> which is added to every non-interface class.
346</p>
347
348<p>
349  The size of the probe code above depends on the position of the probe array
350  variable and the value of the probe identifier as different opcodes can be
351  used. As calculated in the table below the overhead per probe ranges between 4
352  and 7 bytes of additional bytecode:
353</p>
354
355<table class="coverage">
356  <thead>
357    <tr>
358      <td>Possible Opcodes</td>
359      <td>Min. Size [bytes]</td>
360      <td>Max. Size [bytes]</td>
361    </tr>
362  </thead>
363  <tfoot>
364    <tr>
365      <td>Total:</td>
366      <td>4</td>
367      <td>7</td>
368    </tr>
369  </tfoot>
370  <tbody>
371    <tr>
372      <td><code>ALOAD_x</code>, <code>ALOAD</code> <sup>1</sup></td>
373      <td>1</td>
374      <td>2</td>
375    </tr>
376    <tr>
377      <td><code>ICONST_x</code>, <code>BIPUSH</code>, <code>SIPUSH</code>, <code>LDC</code>, <code>LDC_W</code> <sup>2</sup></td>
378      <td>1</td>
379      <td>3</td>
380    </tr>
381    <tr>
382      <td><code>ICONST_1</code></td>
383      <td>1</td>
384      <td>1</td>
385    </tr>
386    <tr>
387      <td><code>BASTORE</code></td>
388      <td>1</td>
389      <td>1</td>
390    </tr>
391  </tbody>
392</table>
393
394<p>
395  <sup>1</sup> The probe array is the first variable after the arguments.
396  If the method arguments do not consume more that 3 slots the 1-byte opcode can
397  be used.<br/>
398  <sup>2</sup> 1-byte opcodes for ids 0 to 5, 2-byte opcode for ids up to 127,
399  3-byte opcode for ids up to 32767. Ids values of 32768 or more require an
400  additional constant pool entry. For normal class files it is very unlikely to
401  require more than 32,000 probes.
402</p>
403
404<h2>Performance</h2>
405
406<p>
407  The control flow analysis and probe insertion strategy described in this
408  document allows to efficiently record instruction and branch coverage. In
409  total classes instrumented with JaCoCo increase their size by about 30%. Due
410  to the fact that probe execution does not require any method calls, only local
411  instructions, the observed execution time overhead for instrumented
412  applications typically is less than 10%.
413</p>
414
415<h2>References</h2>
416
417<ul>
418  <li><a href="http://asm.objectweb.org/">ASM byte code library</a> by Eric Bruneton at al.</li>
419  <li><a href="http://andrei.gmxhome.de/bytecode/index.html">Bytecode Outline Plug-In</a> by Andrei Loskutov</li>
420  <li><a href="http://en.wikipedia.org/wiki/Glossary_of_graph_theory">Wikipedia: Glossary of Graph Theory</a></li>
421</ul>
422
423</div>
424<div class="footer">
425  <div class="right"><a href="@jacoco.home.url@">JaCoCo</a> @qualified.bundle.version@</div>
426  <a href="license.html">Copyright</a> &copy; @copyright.years@ Mountainminds GmbH &amp; Co. KG and Contributors
427</div>
428
429</body>
430</html>
431