1<html> 2<head> 3<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> 4<title>11.�SGCheck: an experimental stack and global array overrun detector</title> 5<link rel="stylesheet" type="text/css" href="vg_basic.css"> 6<meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> 7<link rel="home" href="index.html" title="Valgrind Documentation"> 8<link rel="up" href="manual.html" title="Valgrind User Manual"> 9<link rel="prev" href="dh-manual.html" title="10.�DHAT: a dynamic heap analysis tool"> 10<link rel="next" href="bbv-manual.html" title="12.�BBV: an experimental basic block vector generation tool"> 11</head> 12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> 13<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr> 14<td width="22px" align="center" valign="middle"><a accesskey="p" href="dh-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td> 15<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td> 16<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td> 17<th align="center" valign="middle">Valgrind User Manual</th> 18<td width="22px" align="center" valign="middle"><a accesskey="n" href="bbv-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td> 19</tr></table></div> 20<div class="chapter"> 21<div class="titlepage"><div><div><h1 class="title"> 22<a name="sg-manual"></a>11.�SGCheck: an experimental stack and global array overrun detector</h1></div></div></div> 23<div class="toc"> 24<p><b>Table of Contents</b></p> 25<dl class="toc"> 26<dt><span class="sect1"><a href="sg-manual.html#sg-manual.overview">11.1. Overview</a></span></dt> 27<dt><span class="sect1"><a href="sg-manual.html#sg-manual.options">11.2. SGCheck Command-line Options</a></span></dt> 28<dt><span class="sect1"><a href="sg-manual.html#sg-manual.how-works.sg-checks">11.3. How SGCheck Works</a></span></dt> 29<dt><span class="sect1"><a href="sg-manual.html#sg-manual.cmp-w-memcheck">11.4. Comparison with Memcheck</a></span></dt> 30<dt><span class="sect1"><a href="sg-manual.html#sg-manual.limitations">11.5. Limitations</a></span></dt> 31<dt><span class="sect1"><a href="sg-manual.html#sg-manual.todo-user-visible">11.6. Still To Do: User-visible Functionality</a></span></dt> 32<dt><span class="sect1"><a href="sg-manual.html#sg-manual.todo-implementation">11.7. Still To Do: Implementation Tidying</a></span></dt> 33</dl> 34</div> 35<p>To use this tool, you must specify 36<code class="option">--tool=exp-sgcheck</code> on the Valgrind 37command line.</p> 38<div class="sect1"> 39<div class="titlepage"><div><div><h2 class="title" style="clear: both"> 40<a name="sg-manual.overview"></a>11.1.�Overview</h2></div></div></div> 41<p>SGCheck is a tool for finding overruns of stack and global 42arrays. It works by using a heuristic approach derived from an 43observation about the likely forms of stack and global array accesses. 44</p> 45</div> 46<div class="sect1"> 47<div class="titlepage"><div><div><h2 class="title" style="clear: both"> 48<a name="sg-manual.options"></a>11.2.�SGCheck Command-line Options</h2></div></div></div> 49<p><a name="sg.opts.list"></a>There are no SGCheck-specific command-line options at present.</p> 50</div> 51<div class="sect1"> 52<div class="titlepage"><div><div><h2 class="title" style="clear: both"> 53<a name="sg-manual.how-works.sg-checks"></a>11.3.�How SGCheck Works</h2></div></div></div> 54<p>When a source file is compiled 55with <code class="option">-g</code>, the compiler attaches DWARF3 56debugging information which describes the location of all stack and 57global arrays in the file.</p> 58<p>Checking of accesses to such arrays would then be relatively 59simple, if the compiler could also tell us which array (if any) each 60memory referencing instruction was supposed to access. Unfortunately 61the DWARF3 debugging format does not provide a way to represent such 62information, so we have to resort to a heuristic technique to 63approximate it. The key observation is that 64 <span class="emphasis"><em> 65 if a memory referencing instruction accesses inside a stack or 66 global array once, then it is highly likely to always access that 67 same array</em></span>.</p> 68<p>To see how this might be useful, consider the following buggy 69fragment:</p> 70<pre class="programlisting"> 71 { int i, a[10]; // both are auto vars 72 for (i = 0; i <= 10; i++) 73 a[i] = 42; 74 } 75</pre> 76<p>At run time we will know the precise address 77of <code class="computeroutput">a[]</code> on the stack, and so we can 78observe that the first store resulting from <code class="computeroutput">a[i] = 7942</code> writes <code class="computeroutput">a[]</code>, and 80we will (correctly) assume that that instruction is intended always to 81access <code class="computeroutput">a[]</code>. Then, on the 11th 82iteration, it accesses somewhere else, possibly a different local, 83possibly an un-accounted for area of the stack (eg, spill slot), so 84SGCheck reports an error.</p> 85<p>There is an important caveat.</p> 86<p>Imagine a function such as <code class="function">memcpy</code>, which is used 87to read and write many different areas of memory over the lifetime of the 88program. If we insist that the read and write instructions in its memory 89copying loop only ever access one particular stack or global variable, we 90will be flooded with errors resulting from calls to 91<code class="function">memcpy</code>.</p> 92<p>To avoid this problem, SGCheck instantiates fresh likely-target 93records for each entry to a function, and discards them on exit. This 94allows detection of cases where (e.g.) <code class="function">memcpy</code> 95overflows its source or destination buffers for any specific call, but 96does not carry any restriction from one call to the next. Indeed, 97multiple threads may make multiple simultaneous calls to 98(e.g.) <code class="function">memcpy</code> without mutual interference.</p> 99<p>It is important to note that the association is done between 100 a <span class="emphasis"><em>binary instruction</em></span> and an array, the 101 <span class="emphasis"><em>first time</em></span> this binary instruction accesses an 102 array during a function call. When the same instruction is executed 103 again during the same function call, then SGCheck might report a 104 problem, if these further executions are not accessing the same 105 array. This technique causes several limitations in SGCheck, see 106 <a class="xref" href="sg-manual.html#sg-manual.limitations" title="11.5.�Limitations">Limitations</a>. 107</p> 108</div> 109<div class="sect1"> 110<div class="titlepage"><div><div><h2 class="title" style="clear: both"> 111<a name="sg-manual.cmp-w-memcheck"></a>11.4.�Comparison with Memcheck</h2></div></div></div> 112<p>SGCheck and Memcheck are complementary: their capabilities do 113not overlap. Memcheck performs bounds checks and use-after-free 114checks for heap arrays. It also finds uses of uninitialised values 115created by heap or stack allocations. But it does not perform bounds 116checking for stack or global arrays.</p> 117<p>SGCheck, on the other hand, does do bounds checking for stack or 118global arrays, but it doesn't do anything else.</p> 119</div> 120<div class="sect1"> 121<div class="titlepage"><div><div><h2 class="title" style="clear: both"> 122<a name="sg-manual.limitations"></a>11.5.�Limitations</h2></div></div></div> 123<p>This is an experimental tool, which relies rather too heavily on some 124not-as-robust-as-I-would-like assumptions on the behaviour of correct 125programs. There are a number of limitations which you should be aware 126of.</p> 127<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 128<li class="listitem"> 129<p>False negatives (missed errors): it follows from the 130 description above (<a class="xref" href="sg-manual.html#sg-manual.how-works.sg-checks" title="11.3.�How SGCheck Works">How SGCheck Works</a>) 131 that the first access by a memory referencing instruction to a 132 stack or global array creates an association between that 133 instruction and the array, which is checked on subsequent accesses 134 by that instruction, until the containing function exits. Hence, 135 the first access by an instruction to an array (in any given 136 function instantiation) is not checked for overrun, since SGCheck 137 uses that as the "example" of how subsequent accesses should 138 behave.</p> 139<p>It also means that errors will not be found in an instruction 140 executed only once (e.g. because this instruction is not in a loop, 141 or the loop is executed only once).</p> 142</li> 143<li class="listitem"> 144<p>False positives (false errors): similarly, and more serious, 145 it is clearly possible to write legitimate pieces of code which 146 break the basic assumption upon which the checking algorithm 147 depends. For example:</p> 148<pre class="programlisting"> 149 { int a[10], b[10], *p, i; 150 for (i = 0; i < 10; i++) { 151 p = /* arbitrary condition */ ? &a[i] : &b[i]; 152 *p = 42; 153 } 154 } 155</pre> 156<p>In this case the store sometimes 157 accesses <code class="computeroutput">a[]</code> and 158 sometimes <code class="computeroutput">b[]</code>, but in no cases is 159 the addressed array overrun. Nevertheless the change in target 160 will cause an error to be reported.</p> 161<p>It is hard to see how to get around this problem. The only 162 mitigating factor is that such constructions appear very rare, at 163 least judging from the results using the tool so far. Such a 164 construction appears only once in the Valgrind sources (running 165 Valgrind on Valgrind) and perhaps two or three times for a start 166 and exit of Firefox. The best that can be done is to suppress the 167 errors.</p> 168</li> 169<li class="listitem"><p>Performance: SGCheck has to read all of 170 the DWARF3 type and variable information on the executable and its 171 shared objects. This is computationally expensive and makes 172 startup quite slow. You can expect debuginfo reading time to be in 173 the region of a minute for an OpenOffice sized application, on a 174 2.4 GHz Core 2 machine. Reading this information also requires a 175 lot of memory. To make it viable, SGCheck goes to considerable 176 trouble to compress the in-memory representation of the DWARF3 177 data, which is why the process of reading it appears slow.</p></li> 178<li class="listitem"><p>Performance: SGCheck runs slower than Memcheck. This is 179 partly due to a lack of tuning, but partly due to algorithmic 180 difficulties. The 181 stack and global checks can sometimes require a number of range 182 checks per memory access, and these are difficult to short-circuit, 183 despite considerable efforts having been made. A 184 redesign and reimplementation could potentially make it much faster. 185 </p></li> 186<li class="listitem"> 187<p>Coverage: Stack and global checking is fragile. If a shared 188 object does not have debug information attached, then SGCheck will 189 not be able to determine the bounds of any stack or global arrays 190 defined within that shared object, and so will not be able to check 191 accesses to them. This is true even when those arrays are accessed 192 from some other shared object which was compiled with debug 193 info.</p> 194<p>At the moment SGCheck accepts objects lacking debuginfo 195 without comment. This is dangerous as it causes SGCheck to 196 silently skip stack and global checking for such objects. It would 197 be better to print a warning in such circumstances.</p> 198</li> 199<li class="listitem"><p>Coverage: SGCheck does not check whether the areas read 200 or written by system calls do overrun stack or global arrays. This 201 would be easy to add.</p></li> 202<li class="listitem"><p>Platforms: the stack/global checks won't work properly on 203 PowerPC, ARM or S390X platforms, only on X86 and AMD64 targets. 204 That's because the stack and global checking requires tracking 205 function calls and exits reliably, and there's no obvious way to do 206 it on ABIs that use a link register for function returns. 207 </p></li> 208<li class="listitem"><p>Robustness: related to the previous point. Function 209 call/exit tracking for X86 and AMD64 is believed to work properly 210 even in the presence of longjmps within the same stack (although 211 this has not been tested). However, code which switches stacks is 212 likely to cause breakage/chaos.</p></li> 213</ul></div> 214</div> 215<div class="sect1"> 216<div class="titlepage"><div><div><h2 class="title" style="clear: both"> 217<a name="sg-manual.todo-user-visible"></a>11.6.�Still To Do: User-visible Functionality</h2></div></div></div> 218<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 219<li class="listitem"><p>Extend system call checking to work on stack and global arrays.</p></li> 220<li class="listitem"><p>Print a warning if a shared object does not have debug info 221 attached, or if, for whatever reason, debug info could not be 222 found, or read.</p></li> 223<li class="listitem"><p>Add some heuristic filtering that removes obvious false 224 positives. This would be easy to do. For example, an access 225 transition from a heap to a stack object almost certainly isn't a 226 bug and so should not be reported to the user.</p></li> 227</ul></div> 228</div> 229<div class="sect1"> 230<div class="titlepage"><div><div><h2 class="title" style="clear: both"> 231<a name="sg-manual.todo-implementation"></a>11.7.�Still To Do: Implementation Tidying</h2></div></div></div> 232<p>Items marked CRITICAL are considered important for correctness: 233non-fixage of them is liable to lead to crashes or assertion failures 234in real use.</p> 235<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> 236<li class="listitem"><p> sg_main.c: Redesign and reimplement the basic checking 237 algorithm. It could be done much faster than it is -- the current 238 implementation isn't very good. 239 </p></li> 240<li class="listitem"><p> sg_main.c: Improve the performance of the stack / global 241 checks by doing some up-front filtering to ignore references in 242 areas which "obviously" can't be stack or globals. This will 243 require using information that m_aspacemgr knows about the address 244 space layout.</p></li> 245<li class="listitem"><p>sg_main.c: fix compute_II_hash to make it a bit more sensible 246 for ppc32/64 targets (except that sg_ doesn't work on ppc32/64 247 targets, so this is a bit academic at the moment).</p></li> 248</ul></div> 249</div> 250</div> 251<div> 252<br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer"> 253<tr> 254<td rowspan="2" width="40%" align="left"> 255<a accesskey="p" href="dh-manual.html"><<�10.�DHAT: a dynamic heap analysis tool</a>�</td> 256<td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td> 257<td rowspan="2" width="40%" align="right">�<a accesskey="n" href="bbv-manual.html">12.�BBV: an experimental basic block vector generation tool�>></a> 258</td> 259</tr> 260<tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr> 261</table> 262</div> 263</body> 264</html> 265