<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


<chapter id="sg-manual"
         xreflabel="SGCheck: an experimental stack and global array overrun detector">
  <title>SGCheck: an experimental stack and global array overrun detector</title>

<para>To use this tool, you must specify
<option>--tool=exp-sgcheck</option> on the Valgrind
command line.</para>



<sect1 id="sg-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>SGCheck is a tool for finding overruns of stack and global
arrays.  It works by using a heuristic approach derived from an
observation about the likely forms of stack and global array accesses.
</para>

</sect1>




<sect1 id="sg-manual.options" xreflabel="SGCheck Command-line Options">
<title>SGCheck Command-line Options</title>

<para id="sg.opts.list">There are no SGCheck-specific command-line options at present.</para>
<!--
<para>SGCheck-specific command-line options are:</para>


<variablelist id="sg.opts.list">
</variablelist>
-->

</sect1>


<sect1 id="sg-manual.how-works.sg-checks"
       xreflabel="How SGCheck Works">
<title>How SGCheck Works</title>

<para>When a source file is compiled
with <option>-g</option>, the compiler attaches DWARF3
debugging information which describes the location of all stack and
global arrays in the file.</para>
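
<para>As a rough illustration of the information involved, the C sketch
below models each array as a start address and a size in bytes, and looks
up which array (if any) contains a given address.  The names
<computeroutput>ArrayDesc</computeroutput> and <function>find_array</function>
are hypothetical and not part of SGCheck.  The real tool stores this data
in a compressed form, and for stack arrays the debug information gives a
frame-relative location, so the sketch assumes the absolute base address
has already been worked out for the running function.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory summary of what the debug information says
   about one stack or global array: where it starts and how big it is. */
typedef struct {
    const char *name;
    uintptr_t   base;   /* address of the first byte */
    size_t      size;   /* length in bytes */
} ArrayDesc;

/* Return the array whose range [base, base + size) contains addr, or
   NULL if addr lies in no described array (e.g. a spill slot). */
const ArrayDesc *find_array(const ArrayDesc *arrays, size_t n, uintptr_t addr)
{
    assert(arrays != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Written as a subtraction so that base + size cannot overflow. */
        if (addr >= arrays[i].base && addr - arrays[i].base < arrays[i].size)
            return &arrays[i];
    }
    return NULL;
}
]]></programlisting>

<para>For two arrays at made-up addresses 0x1000 (40 bytes) and 0x2000
(16 bytes), the address 0x1027 falls inside the first, while 0x1028 falls
inside neither.  A linear scan is used only for brevity.</para>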

<para>Checking of accesses to such arrays would then be relatively
simple if the compiler could also tell us which array (if any) each
memory referencing instruction was supposed to access.  Unfortunately
the DWARF3 debugging format does not provide a way to represent such
information, so we have to resort to a heuristic technique to
approximate it.  The key observation is that
   <emphasis>
   if a memory referencing instruction accesses inside a stack or
   global array once, then it is highly likely to always access that
   same array</emphasis>.</para>

<para>To see how this might be useful, consider the following buggy
fragment:</para>
<programlisting><![CDATA[
   { int i, a[10];  // both are auto vars
     for (i = 0; i <= 10; i++)
        a[i] = 42;
   }
]]></programlisting>

<para>At run time we will know the precise address
of <computeroutput>a[]</computeroutput> on the stack, and so we can
observe that the first store resulting from <computeroutput>a[i] =
42</computeroutput> writes inside <computeroutput>a[]</computeroutput>, and
we will (correctly) assume that the instruction is always intended to
access <computeroutput>a[]</computeroutput>.  Then, on the 11th
iteration, it writes just past the end of <computeroutput>a[]</computeroutput>
into somewhere else, possibly a different local, possibly an
unaccounted-for area of the stack (e.g., a spill slot), so
SGCheck reports an error.</para>
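
<para>The heuristic can be sketched in C as a per-instruction record that
remembers the first array an instruction touched and flags any later
access that lands elsewhere.  This is a simplified model under our own
assumptions, not SGCheck's implementation;
<computeroutput>Array</computeroutput>, <computeroutput>InsnState</computeroutput>
and <function>check_access</function> are hypothetical names.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t base; size_t size; } Array;   /* [base, base + size) */

/* Likely-target record for one memory-referencing instruction. */
typedef struct {
    const Array *bound;   /* NULL until the instruction first lands in an array */
} InsnState;

/* Which described array, if any, contains addr? */
static const Array *containing(const Array *arrs, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= arrs[i].base && addr - arrs[i].base < arrs[i].size)
            return &arrs[i];
    return NULL;
}

/* Returns 1 if this access should be reported, 0 otherwise. */
int check_access(InsnState *st, const Array *arrs, size_t n, uintptr_t addr)
{
    assert(st != NULL);
    const Array *hit = containing(arrs, n, addr);
    if (st->bound == NULL) {
        st->bound = hit;   /* first in-array access: take it as the "example" */
        return 0;          /* ...which is therefore never checked itself */
    }
    return hit != st->bound;   /* left the array it was bound to */
}
]]></programlisting>

<para>Replaying the loop above with <computeroutput>a[]</computeroutput>
placed at a made-up address 0x1000, iterations 0 to 9 are accepted and
only the 11th store (i == 10) is reported, because its address is one
element past the end of <computeroutput>a[]</computeroutput>.</para>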

<para>There is an important caveat.</para>

<para>Imagine a function such as <function>memcpy</function>, which is used
to read and write many different areas of memory over the lifetime of the
program.  If we insist that the read and write instructions in its memory
copying loop only ever access one particular stack or global variable, we
will be flooded with errors resulting from calls to
<function>memcpy</function>.</para>

<para>To avoid this problem, SGCheck instantiates fresh likely-target
records for each entry to a function, and discards them on exit.  This
allows detection of cases where, for example, <function>memcpy</function>
overflows its source or destination buffer on a specific call, but
does not carry any restriction from one call to the next.  Indeed,
multiple threads may make simultaneous calls to
<function>memcpy</function> without mutual interference.</para>
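
<para>The per-call lifetime of these records can be pictured as a stack of
frames, pushed on function entry and popped on exit.  The sketch below is
a hypothetical single-threaded model (presumably one such stack would be
needed per thread); <function>on_call</function>,
<function>on_return</function>, the per-function instruction numbering and
the fixed limits are all assumptions made for illustration.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 64   /* hypothetical limits, for the sketch only */
#define MAX_INSNS 8

typedef struct { uintptr_t base; size_t size; } Array;

/* Likely-target records for one activation of one function,
   indexed by a (hypothetical) per-function instruction number. */
typedef struct { const Array *bound[MAX_INSNS]; } Frame;

static Frame  frames[MAX_DEPTH];   /* one thread's call stack */
static size_t depth;

void on_call(void)      /* function entry: start with fresh records */
{
    assert(depth < MAX_DEPTH);
    for (size_t i = 0; i < MAX_INSNS; i++)
        frames[depth].bound[i] = NULL;
    depth++;
}

void on_return(void)    /* function exit: discard this call's records */
{
    assert(depth > 0);
    depth--;
}

/* Returns 1 if the access by instruction insn should be reported. */
int check_access(size_t insn, const Array *arrs, size_t n, uintptr_t addr)
{
    assert(depth > 0 && insn < MAX_INSNS);
    const Array *hit = NULL;
    for (size_t i = 0; i < n && hit == NULL; i++)
        if (addr >= arrs[i].base && addr - arrs[i].base < arrs[i].size)
            hit = &arrs[i];

    const Array **slot = &frames[depth - 1].bound[insn];
    if (*slot == NULL) {   /* first in-array access in this call binds */
        *slot = hit;
        return 0;
    }
    return hit != *slot;
}
]]></programlisting>

<para>For example, if one call stores into a 16-byte buffer at 0x1000 and
a later call stores into another at 0x2000, neither call produces a
report; within the second call, a store to 0x2010, one byte past its
buffer, is reported.  Had <function>on_call</function> not reset the
records, the first store of the second call would already have been
reported, which is the flood of errors described above.</para>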

</sect1>




<sect1 id="sg-manual.cmp-w-memcheck"
       xreflabel="Comparison with Memcheck">
<title>Comparison with Memcheck</title>

<para>SGCheck and Memcheck are complementary: their capabilities do
not overlap.  Memcheck performs bounds checks and use-after-free
checks for heap arrays.  It also finds uses of uninitialised values
created by heap or stack allocations.  But it does not perform bounds
checking for stack or global arrays.</para>

<para>SGCheck, on the other hand, does do bounds checking for stack or
global arrays, but it doesn't do anything else.</para>

</sect1>



<sect1 id="sg-manual.limitations"
       xreflabel="Limitations">
<title>Limitations</title>

<para>This is an experimental tool, which relies rather too heavily on some
not-as-robust-as-I-would-like assumptions about the behaviour of correct
programs.  There are a number of limitations which you should be aware
of.</para>

<itemizedlist>

  <listitem>
   <para>False negatives (missed errors): it follows from the
   description above (<xref linkend="sg-manual.how-works.sg-checks"/>)
   that the first access by a memory referencing instruction to a
   stack or global array creates an association between that
   instruction and the array, which is checked on subsequent accesses
   by that instruction, until the containing function exits.  Hence,
   the first access by an instruction to an array (in any given
   function instantiation) is not checked for overrun, since SGCheck
   uses that as the "example" of how subsequent accesses should
   behave.</para>
  </listitem>

  <listitem>
   <para>False positives (false errors): similarly, and more seriously,
   it is clearly possible to write legitimate pieces of code which
   break the basic assumption upon which the checking algorithm
   depends.  For example:</para>

<programlisting><![CDATA[
  { int a[10], b[10], *p, i;
    for (i = 0; i < 10; i++) {
       p = (i & 1) /* stands in for any condition */ ? &a[i] : &b[i];
       *p = 42;
    }
  }
]]></programlisting>

   <para>In this case the store sometimes
   accesses <computeroutput>a[]</computeroutput> and
   sometimes <computeroutput>b[]</computeroutput>, but in no case is
   the addressed array overrun.  Nevertheless the change in target
   will cause an error to be reported.</para>

   <para>It is hard to see how to get around this problem.  The only
   mitigating factor is that such constructions appear to be very rare,
   at least judging from the results using the tool so far.  Such a
   construction appears only once in the Valgrind sources (running
   Valgrind on Valgrind) and perhaps two or three times during a start
   and exit of Firefox.  The best that can be done is to suppress the
   errors.</para>
  </listitem>

  <listitem>
   <para>Performance: SGCheck has to read all of
   the DWARF3 type and variable information in the executable and its
   shared objects.  This is computationally expensive and makes
   startup quite slow.  You can expect debuginfo reading time to be in
   the region of a minute for an OpenOffice-sized application, on a
   2.4 GHz Core 2 machine.  Reading this information also requires a
   lot of memory.  To make it viable, SGCheck goes to considerable
   trouble to compress the in-memory representation of the DWARF3
   data, which is why the process of reading it appears slow.</para>
  </listitem>

  <listitem>
   <para>Performance: SGCheck runs slower than Memcheck.  This is
   partly due to a lack of tuning, but partly due to algorithmic
   difficulties.  The stack and global checks can sometimes require a
   number of range checks per memory access, and these are difficult
   to short-circuit, despite considerable efforts having been made.  A
   redesign and reimplementation could potentially make it much faster.
   </para>
  </listitem>

  <listitem>
   <para>Coverage: stack and global checking is fragile.  If a shared
   object does not have debug information attached, then SGCheck will
   not be able to determine the bounds of any stack or global arrays
   defined within that shared object, and so will not be able to check
   accesses to them.  This is true even when those arrays are accessed
   from some other shared object which was compiled with debug
   info.</para>

   <para>At the moment SGCheck accepts objects lacking debuginfo
   without comment.  This is dangerous as it causes SGCheck to
   silently skip stack and global checking for such objects.  It would
   be better to print a warning in such circumstances.</para>
  </listitem>

  <listitem>
   <para>Coverage: SGCheck does not check whether the areas read
   or written by system calls overrun stack or global arrays.  This
   would be easy to add.</para>
  </listitem>

  <listitem>
   <para>Platforms: the stack/global checks work properly only on
   X86 and AMD64 targets, not on PowerPC, ARM or S390X platforms.
   That's because the stack and global checking requires tracking
   function calls and exits reliably, and there's no obvious way to do
   it on ABIs that use a link register for function returns.
   </para>
  </listitem>

  <listitem>
   <para>Robustness: related to the previous point.  Function
   call/exit tracking for X86 and AMD64 is believed to work properly
   even in the presence of longjmps within the same stack (although
   this has not been tested).  However, code which switches stacks is
   likely to cause breakage/chaos.</para>
  </listitem>
</itemizedlist>
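
<para>The first two limitations can be reproduced with a small model of
the basic rule.  The C sketch below is hypothetical: it uses
<computeroutput>i &amp; 1</computeroutput> as a stand-in for the arbitrary
condition in the false-positive example, places the arrays at made-up
addresses, and counts every mismatching access.  How the real tool groups
or suppresses repeated reports is not modelled.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t base; size_t size; } Array;

/* The basic rule: the first in-array access by an instruction binds it
   to that array; any later access that lands elsewhere is reported. */
static int check(const Array **bound, const Array *arrs, size_t n, uintptr_t addr)
{
    assert(bound != NULL);
    const Array *hit = NULL;
    for (size_t i = 0; i < n && hit == NULL; i++)
        if (addr >= arrs[i].base && addr - arrs[i].base < arrs[i].size)
            hit = &arrs[i];
    if (*bound == NULL) {
        *bound = hit;
        return 0;
    }
    return hit != *bound;
}

/* False positive: replay the a[]/b[] loop.  The store binds to b[] on
   the first iteration, so every later store into a[] is reported even
   though no store is out of bounds. */
int count_false_positives(void)
{
    const Array arrs[2] = { { 0x1000, 10 * sizeof(int) },    /* a[] */
                            { 0x2000, 10 * sizeof(int) } };  /* b[] */
    const Array *bound = NULL;   /* state of the single store *p = 42 */
    int reports = 0;
    for (int i = 0; i < 10; i++) {
        const Array *target = (i & 1) ? &arrs[0] : &arrs[1];
        reports += check(&bound, arrs, 2, target->base + (uintptr_t)i * sizeof(int));
    }
    return reports;
}

/* False negative: if the very first access is already past the end of
   a[] and happens to land in a neighbouring b[], it just binds to b[]. */
int first_access_overrun_reported(void)
{
    const Array arrs[2] = { { 0x1000, 40 },     /* a[] */
                            { 0x1028, 40 } };   /* b[], directly after a[] */
    const Array *bound = NULL;
    return check(&bound, arrs, 2, 0x1000 + 40);   /* the "a[10]" store */
}
]]></programlisting>

<para>Under these assumptions <function>count_false_positives</function>
returns 5, one for each in-bounds store into
<computeroutput>a[]</computeroutput>, and
<function>first_access_overrun_reported</function> returns 0 even though
that access is an overrun.</para>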

</sect1>





<sect1 id="sg-manual.todo-user-visible"
       xreflabel="Still To Do: User-visible Functionality">
<title>Still To Do: User-visible Functionality</title>

<itemizedlist>

  <listitem>
   <para>Extend system call checking to work on stack and global arrays.</para>
  </listitem>

  <listitem>
   <para>Print a warning if a shared object does not have debug info
   attached, or if, for whatever reason, debug info could not be
   found, or read.</para>
  </listitem>

  <listitem>
   <para>Add some heuristic filtering that removes obvious false
     positives.  This would be easy to do.  For example, an access
     transition from a heap to a stack object almost certainly isn't a
     bug and so should not be reported to the user.</para>
  </listitem>

</itemizedlist>
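
<para>The heap-to-stack filter suggested above might look something like
the following.  It is purely a sketch of a feature that does not yet
exist: the <computeroutput>Region</computeroutput> classification and
<function>should_report</function> are hypothetical, and the function is
meant to be consulted only after the basic check has already detected a
change of target.</para>

<programlisting><![CDATA[
#include <assert.h>

/* Hypothetical classification of the object an access landed in. */
typedef enum { REGION_HEAP, REGION_STACK, REGION_GLOBAL } Region;

/* Called only after the basic check has seen an instruction's target
   change from an object in old_target to one in new_target.  Filters
   out the one "obvious false positive" named in the text, heap to
   stack; every other transition is still reported. */
int should_report(Region old_target, Region new_target)
{
    assert(old_target <= REGION_GLOBAL && new_target <= REGION_GLOBAL);
    if (old_target == REGION_HEAP && new_target == REGION_STACK)
        return 0;   /* almost certainly not a bug */
    return 1;
}
]]></programlisting>

<para>Only the heap-to-stack transition is filtered; a change from a stack
object to a heap object, or between two stack objects, is still reported.
Further "obvious" cases could be added to the same function.</para>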

</sect1>




<sect1 id="sg-manual.todo-implementation"
       xreflabel="Still To Do: Implementation Tidying">
<title>Still To Do: Implementation Tidying</title>

<para>Items marked CRITICAL are considered important for correctness:
failing to fix them is liable to lead to crashes or assertion failures
in real use.</para>

<itemizedlist>

  <listitem>
   <para>sg_main.c: Redesign and reimplement the basic checking
   algorithm.  It could be made much faster than it is; the current
   implementation isn't very good.
   </para>
  </listitem>

  <listitem>
   <para>sg_main.c: Improve the performance of the stack / global
   checks by doing some up-front filtering to ignore references in
   areas which "obviously" can't be stack or globals.  This will
   require using information that m_aspacemgr knows about the address
   space layout.</para>
  </listitem>

  <listitem>
   <para>sg_main.c: fix compute_II_hash to make it a bit more sensible
   for ppc32/64 targets (except that sg_ doesn't work on ppc32/64
   targets, so this is a bit academic at the moment).</para>
  </listitem>

</itemizedlist>
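
<para>The up-front filtering in the second item could take the form of a
cheap range test performed before the full check.  In this hypothetical
sketch the candidate ranges are simply passed in as an array; how they
would actually be obtained from m_aspacemgr is not shown, and
<computeroutput>Range</computeroutput> and
<function>needs_full_check</function> are illustrative names only.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the address space: ranges that may contain
   stacks or global data, each covering [lo, hi). */
typedef struct { uintptr_t lo, hi; } Range;

/* Cheap pre-filter: return 1 if addr could be a stack or global address
   and therefore needs the full (expensive) check, 0 if it can be skipped. */
int needs_full_check(const Range *candidates, size_t n, uintptr_t addr)
{
    assert(candidates != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (addr >= candidates[i].lo && addr < candidates[i].hi)
            return 1;
    return 0;
}
]]></programlisting>

<para>A linear scan is shown for brevity; keeping the ranges sorted and
using a binary search would be the obvious refinement if there were many
of them.</para>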

</sect1>



</chapter>