1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2          "http://www.w3.org/TR/html4/strict.dtd">
3<html>
4<head>
5  <title>Open Projects</title>
6  <link type="text/css" rel="stylesheet" href="menu.css">
7  <link type="text/css" rel="stylesheet" href="content.css">
8  <script type="text/javascript" src="scripts/menu.js"></script>
9</head>
10<body>
11
12<div id="page">
13<!--#include virtual="menu.html.incl"-->
14<div id="content">
15
16<h1>Open Projects</h1>
17
18<p>This page lists several projects that would boost analyzer's usability and
19power. Most of the projects listed here are infrastructure-related so this list
20is an addition to the <a href="potential_checkers.html">potential checkers
21list</a>. If you are interested in tackling one of these, please send an email
22to the <a href=https://lists.llvm.org/mailman/listinfo/cfe-dev>cfe-dev
23mailing list</a> to notify other members of the community.</p>
24
25<ul>
26  <li>Release checkers from "alpha"
27    <p>New checkers which were contributed to the analyzer,
28    but have not passed a rigorous evaluation process,
29    are committed as "alpha checkers" (from "alpha version"),
30    and are not enabled by default.</p>
31
32    <p>Ideally, only the checkers which are actively being worked on should be in
33    "alpha",
34    but over the years the development of many of those has stalled.
35    Such checkers should either be improved
36    up to a point where they can be enabled by default,
37    or removed from the analyzer entirely.
38
39    <ul>
40      <li><code>alpha.security.ArrayBound</code> and
41      <code>alpha.security.ArrayBoundV2</code>
42      <p>Array bounds checking is a desired feature,
43      but having an acceptable rate of false positives might not be possible
44      without a proper
45      <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
46      Additionally, it might be more promising to perform index checking based on
47      <a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.
48      <p><i>(Difficulty: Medium)</i></p></p>
49      </li>
50
51      <li><code>alpha.unix.StreamChecker</code>
52        <p>A SimpleStreamChecker has been presented in the Building a Checker in 24
53        Hours talk
54        (<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>
55        <a href="https://youtu.be/kdxlsP5QVPw">video</a>).</p>
56
57        <p>This alpha checker is an attempt to write a production grade stream checker.
58        However, it was found to have an unacceptably high false positive rate.
59        One of the found problems was that eagerly splitting the state
60        based on whether the system call may fail leads to too many reports.
61        A <em>delayed</em> split where the implication is stored in the state
62        (similarly to nullability implications in <code>TrustNonnullChecker</code>)
63        may produce much better results.</p>
64        <p><i>(Difficulty: Medium)</i></p>
65      </li>
66    </ul>
67  </li>
68
69  <li>Improve C++ support
70  <ul>
71    <li>Handle construction as part of aggregate initialization.
72      <p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
73      are objects that can be brace-initialized without calling a
74      constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">
75      CXXConstructExpr</a></code> does not occur in the AST),
76      but potentially calling
77      constructors for their fields and base classes
78      These
79      constructors of sub-objects need to know what object they are constructing.
80      Moreover, if the aggregate contains
81      references, lifetime extension needs to be properly modeled.
82
83      One can start untangling this problem by trying to replace the
84      current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">
85      ParentMap</a></code> lookup in <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430">
86      <code>CXXConstructExpr::CK_NonVirtualBase</code></a> branch of
87      <code>ExprEngine::VisitCXXConstructExpr()</code>
88      with proper support for the feature.
89      <p><i>(Difficulty: Medium) </i></p></p>
90    </li>
91
92    <li>Handle array constructors.
93      <p>When an array of objects is allocated (say, using the
94         <code>operator new[]</code> or defining a stack array),
95         constructors for all elements of the array are called.
96         We should model (potentially some of) such evaluations,
97         and the same applies for destructors called from
98         <code>operator delete[]</code>.
99         See tests cases in <a href="https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/handle_constructors_with_new_array.cpp">handle_constructors_with_new_array.cpp</a>.
100      </p>
101      <p>
102      Constructing an array requires invoking multiple (potentially unknown)
103      amount of constructors with the same construct-expression.
104      Apart from the technical difficulties of juggling program points around
105      correctly to avoid accidentally merging paths together, we'll have to
106      be a judge on when to exit the loop and how to widen it.
107      Given that the constructor is going to be a default constructor,
108      a nice 95% solution might be to execute exactly one constructor and
109      then default-bind the resulting LazyCompoundVal to the whole array;
110      it'll work whenever the default constructor doesn't touch global state
111      but only initializes the object to various default values.
112      But if, say, we're making an array of strings,
113      depending on the implementation you might have to allocate a new buffer
114      for each string, and in this case default-binding won't cut it.
115      We might want to come up with an auxiliary analysis in order to perform
116      widening of these simple loops more precisely.
117      </p>
118    </li>
119
120    <li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
121      <p>Local variables which are returned by values on all return statements
122         may be stored directly at the address for the return value,
123         eliding the copy or move constructor call.
124         Such variables can be identified using the AST call <code>VarDecl::isNRVOVariable</code>.
125      </p>
126    </li>
127
128    <li>Handle constructors of lambda captures
129      <p>Variables which are captured by value into a lambda require a call to
130         a copy constructor.
131         This call is not currently modeled.
132      </p>
133    </li>
134
135    <li>Handle constructors for default arguments
136      <p>Default arguments in C++ are recomputed at every call,
137         and are therefore local, and not static, variables.
138         See tests cases in <a href="https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/handle_constructors_for_default_arguments.cpp">handle_constructors_for_default_arguments.cpp</a>.
139      </p>
140      <p>
141      Default arguments are annoying because the initializer expression is
142      evaluated at the call site but doesn't syntactically belong to the
143      caller's AST; instead it belongs to the ParmVarDecl for the default
144      parameter. This can lead to situations when the same expression has to
145      carry different values simultaneously -
146      when multiple instances of the same function are evaluated as part of the
147      same full-expression without specifying the default arguments.
148      Even simply calling the function twice (not necessarily within the
149      same full-expression) may lead to program points agglutinating because
150      it's the same expression. There are some nasty test cases already
151      in temporaries.cpp (struct DefaultParam and so on). I recommend adding a
152      new LocationContext kind specifically to deal with this problem. It'll
153      also help you figure out the construction context when you evaluate the
154      construct-expression (though you might still need to do some additional
155      CFG work to get construction contexts right).
156      </p>
157    </li>
158
159    <li>Enhance the modeling of the standard library.
160      <p>The analyzer needs a better understanding of STL in order to be more
161      useful on C++ codebases.
162      While full library modeling is not an easy task,
163      large gains can be achieved by supporting only a few cases:
164      e.g. calling <code>.length()</code> on an empty
165      <code>std::string</code> always yields zero.
166    <p><i>(Difficulty: Medium)</i></p><p>
167    </li>
168
169    <li>Enhance CFG to model exception-handling.
170      <p>Currently exceptions are treated as "black holes", and exception-handling
171      control structures are poorly modeled in order to be conservative.
172      This could be improved for both C++ and Objective-C exceptions.
173      <p><i>(Difficulty: Hard)</i></p></p>
174    </li>
175  </ul>
176  </li>
177
178  <li>Core Analyzer Infrastructure
179  <ul>
180    <li>Handle unions.
181      <p>Currently in the analyzer the value of a union is always regarded as
182      an unknown.
183      This problem was
184      previously <a href="https://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
185      on the mailing list, but no solution was implemented.
186      <p><i> (Difficulty: Medium) </i></p></p>
187    </li>
188
189    <li>Floating-point support.
190      <p>Currently, the analyzer treats all floating-point values as unknown.
191      This project would involve adding a new <code>SVal</code> kind
192      for constant floats, generalizing the constraint manager to handle floats,
193      and auditing existing code to make sure it doesn't
194      make incorrect assumptions (most notably, that <code>X == X</code>
195      is always true, since it does not hold for <code>NaN</code>).
196      <p><i> (Difficulty: Medium)</i></p></p>
197    </li>
198
199    <li>Improved loop execution modeling.
200      <p>The analyzer simply unrolls each loop <tt>N</tt> times before
201      dropping the path, for a fixed constant <tt>N</tt>.
202      However, that results in lost coverage in cases where the loop always
203      executes more than <tt>N</tt> times.
204      A Google Summer Of Code
205      <a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
206      was completed to make the loop bound parameterizable,
207      but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
208      problem still remains open.
209
210      <p><i> (Difficulty: Hard)</i></p></p>
211    </li>
212
213    <li>Basic function summarization support
214      <p>The analyzer performs inter-procedural analysis using
215      either inlining or "conservative evaluation" (invalidating all data
216      passed to the function).
217      Often, a very simple summary
218      (e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would be
219      enough to be a large improvement over conservative evaluation.
220      Such summaries could be obtained either syntactically,
221      or using a dataflow framework.
222      <p><i>(Difficulty: Hard)</i></p><p>
223    </li>
224
225    <li>Implement a dataflow flamework.
226      <p>The analyzer core
227      implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
228      engine, which performs checks
229      (use-after-free, uninitialized value read, etc.)
230      over a <em>single</em> program path.
231      However, many useful properties
232      (dead code, check-after-use, etc.) require
233      reasoning over <em>all</em> possible in a program.
234      Such reasoning requires a
235      <a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
236      Clang already implements
237      a few dataflow analyses (most notably, liveness),
238      but they implemented in an ad-hoc fashion.
239      A proper framework would enable us writing many more useful checkers.
240      <p><i> (Difficulty: Hard) </i></p></p>
241    </li>
242
243    <li>Track type information through casts more precisely.
244      <p>The <code>DynamicTypePropagation</code>
245      checker is in charge of inferring a region's
246      dynamic type based on what operations the code is performing.
247      Casts are a rich source of type information that the analyzer currently ignores.
248      <p><i>(Difficulty: Medium)</i></p></p>
249    </li>
250
251  </ul>
252  </li>
253
254  <li>Fixing miscellaneous bugs
255    <p>Apart from the open projects listed above,
256       contributors are welcome to fix any of the outstanding
257       <a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>
258       in the Bugzilla.
259       <p><i>(Difficulty: Anything)</i></p></p>
260  </li>
261
262</ul>
263
264</div>
265</div>
266</body>
267</html>
268