1<?xml version="1.0"?> <!-- -*- sgml -*- -->
2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
4[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
5
6
7<chapter id="sg-manual"
8         xreflabel="SGCheck: an experimental stack and global array overrun detector">
9  <title>SGCheck: an experimental stack and global array overrun detector</title>
10
11<para>To use this tool, you must specify
12<option>--tool=exp-sgcheck</option> on the Valgrind
13command line.</para>
14
15
16
17
18<sect1 id="sg-manual.overview" xreflabel="Overview">
19<title>Overview</title>
20
21<para>SGCheck is a tool for finding overruns of stack and global
22arrays.  It works by using a heuristic approach derived from an
23observation about the likely forms of stack and global array accesses.
24</para>
25
26</sect1>
27
28
29
30
31<sect1 id="sg-manual.options" xreflabel="SGCheck Command-line Options">
32<title>SGCheck Command-line Options</title>
33
34<para id="sg.opts.list">There are no SGCheck-specific command-line options at present.</para>
35<!--
36<para>SGCheck-specific command-line options are:</para>
37
38
39<variablelist id="sg.opts.list">
40</variablelist>
41-->
42
43</sect1>
44
45
46
47<sect1 id="sg-manual.how-works.sg-checks"
48       xreflabel="How SGCheck Works">
49<title>How SGCheck Works</title>
50
51<para>When a source file is compiled
52with <option>-g</option>, the compiler attaches DWARF3
53debugging information which describes the location of all stack and
54global arrays in the file.</para>
55
56<para>Checking of accesses to such arrays would then be relatively
57simple, if the compiler could also tell us which array (if any) each
58memory referencing instruction was supposed to access.  Unfortunately
59the DWARF3 debugging format does not provide a way to represent such
60information, so we have to resort to a heuristic technique to
61approximate it.  The key observation is that
62   <emphasis>
63   if a memory referencing instruction accesses inside a stack or
64   global array once, then it is highly likely to always access that
65   same array</emphasis>.</para>
66
67<para>To see how this might be useful, consider the following buggy
68fragment:</para>
69<programlisting><![CDATA[
70   { int i, a[10];  // both are auto vars
71     for (i = 0; i <= 10; i++)
72        a[i] = 42;
73   }
74]]></programlisting>
75
76<para>At run time we will know the precise address
77of <computeroutput>a[]</computeroutput> on the stack, and so we can
78observe that the first store resulting from <computeroutput>a[i] =
7942</computeroutput> writes <computeroutput>a[]</computeroutput>, and
80we will (correctly) assume that that instruction is intended always to
81access <computeroutput>a[]</computeroutput>.  Then, on the 11th
82iteration, it accesses somewhere else, possibly a different local,
83possibly an un-accounted for area of the stack (eg, spill slot), so
84SGCheck reports an error.</para>
85
86<para>There is an important caveat.</para>
87
88<para>Imagine a function such as <function>memcpy</function>, which is used
89to read and write many different areas of memory over the lifetime of the
90program.  If we insist that the read and write instructions in its memory
91copying loop only ever access one particular stack or global variable, we
92will be flooded with errors resulting from calls to
93<function>memcpy</function>.</para>
94
95<para>To avoid this problem, SGCheck instantiates fresh likely-target
96records for each entry to a function, and discards them on exit.  This
97allows detection of cases where (e.g.) <function>memcpy</function>
98overflows its source or destination buffers for any specific call, but
99does not carry any restriction from one call to the next.  Indeed,
100multiple threads may make multiple simultaneous calls to
101(e.g.) <function>memcpy</function> without mutual interference.</para>
102
103<para>It is important to note that the association is done between
104  a <emphasis>binary instruction</emphasis> and an array, the
105  <emphasis>first time</emphasis> this binary instruction accesses an
106  array during a function call.  When the same instruction is executed
107  again during the same function call, then SGCheck might report a
108  problem, if these further executions are not accessing the same
109  array. This technique causes several limitations in SGCheck, see
110  <xref linkend="sg-manual.limitations"/>.
111</para>
112</sect1>
113
114
115
116<sect1 id="sg-manual.cmp-w-memcheck"
117       xreflabel="Comparison with Memcheck">
118<title>Comparison with Memcheck</title>
119
120<para>SGCheck and Memcheck are complementary: their capabilities do
121not overlap.  Memcheck performs bounds checks and use-after-free
122checks for heap arrays.  It also finds uses of uninitialised values
123created by heap or stack allocations.  But it does not perform bounds
124checking for stack or global arrays.</para>
125
126<para>SGCheck, on the other hand, does do bounds checking for stack or
127global arrays, but it doesn't do anything else.</para>
128
129</sect1>
130
131
132
133
134
135<sect1 id="sg-manual.limitations"
136       xreflabel="Limitations">
137<title>Limitations</title>
138
139<para>This is an experimental tool, which relies rather too heavily on some
140not-as-robust-as-I-would-like assumptions on the behaviour of correct
141programs.  There are a number of limitations which you should be aware
142of.</para>
143
144<itemizedlist>
145
146  <listitem>
147   <para>False negatives (missed errors): it follows from the
148   description above (<xref linkend="sg-manual.how-works.sg-checks"/>)
149   that the first access by a memory referencing instruction to a
150   stack or global array creates an association between that
151   instruction and the array, which is checked on subsequent accesses
152   by that instruction, until the containing function exits.  Hence,
153   the first access by an instruction to an array (in any given
154   function instantiation) is not checked for overrun, since SGCheck
155   uses that as the "example" of how subsequent accesses should
156   behave.</para>
157   <para>It also means that errors will not be found in an instruction
158     executed only once (e.g. because this instruction is not in a loop,
159     or the loop is executed only once).</para>
160  </listitem>
161
162  <listitem>
163   <para>False positives (false errors): similarly, and more serious,
164   it is clearly possible to write legitimate pieces of code which
165   break the basic assumption upon which the checking algorithm
166   depends.  For example:</para>
167
168<programlisting><![CDATA[
169  { int a[10], b[10], *p, i;
170    for (i = 0; i < 10; i++) {
171       p = /* arbitrary condition */  ? &a[i]  : &b[i];
172       *p = 42;
173    }
174  }
175]]></programlisting>
176
177   <para>In this case the store sometimes
178   accesses <computeroutput>a[]</computeroutput> and
179   sometimes <computeroutput>b[]</computeroutput>, but in no cases is
180   the addressed array overrun.  Nevertheless the change in target
181   will cause an error to be reported.</para>
182
183   <para>It is hard to see how to get around this problem.  The only
184   mitigating factor is that such constructions appear very rare, at
185   least judging from the results using the tool so far.  Such a
186   construction appears only once in the Valgrind sources (running
187   Valgrind on Valgrind) and perhaps two or three times for a start
188   and exit of Firefox.  The best that can be done is to suppress the
189   errors.</para>
190  </listitem>
191
192  <listitem>
193   <para>Performance: SGCheck has to read all of
194   the DWARF3 type and variable information on the executable and its
195   shared objects.  This is computationally expensive and makes
196   startup quite slow.  You can expect debuginfo reading time to be in
197   the region of a minute for an OpenOffice sized application, on a
198   2.4 GHz Core 2 machine.  Reading this information also requires a
199   lot of memory.  To make it viable, SGCheck goes to considerable
200   trouble to compress the in-memory representation of the DWARF3
201   data, which is why the process of reading it appears slow.</para>
202  </listitem>
203
204  <listitem>
205   <para>Performance: SGCheck runs slower than Memcheck.  This is
206   partly due to a lack of tuning, but partly due to algorithmic
207   difficulties.  The
208   stack and global checks can sometimes require a number of range
209   checks per memory access, and these are difficult to short-circuit,
210   despite considerable efforts having been made.  A
211   redesign and reimplementation could potentially make it much faster.
212   </para>
213  </listitem>
214
215  <listitem>
216   <para>Coverage: Stack and global checking is fragile.  If a shared
217   object does not have debug information attached, then SGCheck will
218   not be able to determine the bounds of any stack or global arrays
219   defined within that shared object, and so will not be able to check
220   accesses to them.  This is true even when those arrays are accessed
221   from some other shared object which was compiled with debug
222   info.</para>
223
224   <para>At the moment SGCheck accepts objects lacking debuginfo
225   without comment.  This is dangerous as it causes SGCheck to
226   silently skip stack and global checking for such objects.  It would
227   be better to print a warning in such circumstances.</para>
228  </listitem>
229
230  <listitem>
231   <para>Coverage: SGCheck does not check whether the areas read
232   or written by system calls do overrun stack or global arrays.  This
233   would be easy to add.</para>
234  </listitem>
235
236  <listitem>
237   <para>Platforms: the stack/global checks won't work properly on
238   PowerPC, ARM or S390X platforms, only on X86 and AMD64 targets.
239   That's because the stack and global checking requires tracking
240   function calls and exits reliably, and there's no obvious way to do
241   it on ABIs that use a link register for function returns.
242   </para>
243  </listitem>
244
245  <listitem>
246   <para>Robustness: related to the previous point.  Function
247   call/exit tracking for X86 and AMD64 is believed to work properly
248   even in the presence of longjmps within the same stack (although
249   this has not been tested).  However, code which switches stacks is
250   likely to cause breakage/chaos.</para>
251  </listitem>
252</itemizedlist>
253
254</sect1>
255
256
257
258
259
260<sect1 id="sg-manual.todo-user-visible"
261       xreflabel="Still To Do: User-visible Functionality">
262<title>Still To Do: User-visible Functionality</title>
263
264<itemizedlist>
265
266  <listitem>
267   <para>Extend system call checking to work on stack and global arrays.</para>
268  </listitem>
269
270  <listitem>
271   <para>Print a warning if a shared object does not have debug info
272   attached, or if, for whatever reason, debug info could not be
273   found, or read.</para>
274  </listitem>
275
276  <listitem>
277   <para>Add some heuristic filtering that removes obvious false
278     positives.  This would be easy to do.  For example, an access
279     transition from a heap to a stack object almost certainly isn't a
280     bug and so should not be reported to the user.</para>
281  </listitem>
282
283</itemizedlist>
284
285</sect1>
286
287
288
289
290<sect1 id="sg-manual.todo-implementation"
291       xreflabel="Still To Do: Implementation Tidying">
292<title>Still To Do: Implementation Tidying</title>
293
294<para>Items marked CRITICAL are considered important for correctness:
295non-fixage of them is liable to lead to crashes or assertion failures
296in real use.</para>
297
298<itemizedlist>
299
300  <listitem>
301   <para> sg_main.c: Redesign and reimplement the basic checking
302   algorithm.  It could be done much faster than it is -- the current
303   implementation isn't very good.
304   </para>
305  </listitem>
306
307  <listitem>
308   <para> sg_main.c: Improve the performance of the stack / global
309   checks by doing some up-front filtering to ignore references in
310   areas which "obviously" can't be stack or globals.  This will
311   require using information that m_aspacemgr knows about the address
312   space layout.</para>
313  </listitem>
314
315  <listitem>
316   <para>sg_main.c: fix compute_II_hash to make it a bit more sensible
317   for ppc32/64 targets (except that sg_ doesn't work on ppc32/64
318   targets, so this is a bit academic at the moment).</para>
319  </listitem>
320
321</itemizedlist>
322
323</sect1>
324
325
326
327</chapter>
328