<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>Clang - Features and Goals</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<!--*************************************************************************-->
<h1>Clang - Features and Goals</h1>
<!--*************************************************************************-->

<p>
This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and gives a broader explanation of what we mean.
These features are:
</p>

<p>End-User Features:</p>

<ul>
<li><a href="#performance">Fast compiles and low memory use</a></li>
<li><a href="#expressivediags">Expressive diagnostics</a></li>
<li><a href="#gcccompat">GCC compatibility</a></li>
</ul>

<p>Utility and Applications:</p>

<ul>
<li><a href="#libraryarch">Library-based architecture</a></li>
<li><a href="#diverseclients">Support diverse clients</a></li>
<li><a href="#ideintegration">Integration with IDEs</a></li>
<li><a href="#license">Use the LLVM 'Apache 2' License</a></li>
</ul>

<p>Internal Design and Implementation:</p>

<ul>
<li><a href="#real">A real-world, production quality compiler</a></li>
<li><a href="#simplecode">A simple and hackable code base</a></li>
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
    and Objective C++</a></li>
<li><a href="#conformance">Conformance with C/C++/ObjC and their
    variants</a></li>
</ul>

<!--*************************************************************************-->
<h2><a name="enduser">End-User Features</a></h2>
<!--*************************************************************************-->


<!--=======================================================================-->
<h3><a name="performance">Fast Compiles and Low Memory Use</a></h3>
<!--=======================================================================-->

<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis. Many detailed benchmarks can be found online.</p>

<p>Compile-time performance is important, but when using clang as an API,
memory use is often even more important: the less memory clang needs to
represent code, the more code you can fit into memory at a time (useful for
whole-program analysis tools, for example).</p>

<p>In addition to being efficient when pitted head-to-head against GCC in batch
mode, clang is built with a <a href="#libraryarch">library-based
architecture</a> that makes it relatively easy to adapt and to build new tools
with. This means that it is often possible to apply out-of-the-box thinking
and novel techniques to improve compilation in various ways.</p>


<!--=======================================================================-->
<h3><a name="expressivediags">Expressive Diagnostics</a></h3>
<!--=======================================================================-->

<p>In addition to being fast and functional, we aim to make Clang extremely
user-friendly. As far as a command-line compiler goes, this basically comes down
to making the diagnostics (error and warning messages) generated by the compiler
as useful as possible.
There are several ways that we do this, but the
most important are pinpointing exactly what is wrong in the program,
highlighting related information so that it is easy to understand at a glance,
and making the wording as clear as possible.</p>

<p>Here is one simple example that illustrates the quality of Clang diagnostics:</p>

<pre>
  $ <b>clang -fsyntax-only t.c</b>
  t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A')
  <span style="color:darkgreen">  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</span>
  <span style="color:blue">                       ~~~~~~~~~~~~~~ ^ ~~~~~</span>
</pre>

<p>Here you can see that you don't even need to look at the original source file
to understand what is wrong: because Clang prints a caret, you know exactly
<em>which</em> plus it is complaining about. The range information highlights
the left and right operands of the plus, which makes it immediately obvious what
the compiler is talking about. This is very useful for cases involving
precedence issues and many other situations.</p>

<p>Clang diagnostics are very polished and have many features. For more
information and examples, please see the <a href="diagnostics.html">Expressive
Diagnostics</a> page.</p>

<!--=======================================================================-->
<h3><a name="gcccompat">GCC Compatibility</a></h3>
<!--=======================================================================-->

<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code.
GCC supports a huge number of
extensions and features (many of which are undocumented), and a lot of
code and header files depend on these features in order to build.</p>

<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, pragmatics force us to
support the GCC extensions that see the most use. Many users just want their
code to compile; they don't care to argue about whether it is pedantically C99
or not.</p>

<p>In Clang, all
extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored.
</p>


<!--*************************************************************************-->
<h2><a name="applications">Utility and Applications</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="libraryarch">Library-Based Architecture</a></h3>
<!--=======================================================================-->

<p>A major design concept for clang is its use of a library-based
architecture. In this design, various parts of the front-end can be cleanly
divided into separate libraries, which can then be mixed and matched for
different needs and uses. In addition, the library-based approach encourages
good interfaces and makes it easier for new developers to get involved (because
they only need to understand small pieces of the big picture).</p>

<blockquote><p>
"The world needs better compiler tools, tools which are built as libraries.
This design point allows reuse of the tools in new and novel ways. However,
building the tools as libraries isn't enough: they must have clean APIs, be as
decoupled from each other as possible, and be easy to modify/extend.
This
requires clean layering, decent design, and keeping the libraries independent of
any specific client."</p></blockquote>

<p>
Currently, clang is divided into the following libraries and tool:
</p>

<ul>
<li><b>libsupport</b> - Basic support library, from LLVM.</li>
<li><b>libsystem</b> - System abstraction library, from LLVM.</li>
<li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction,
    file system caching for input source files.</li>
<li><b>libast</b> - Provides classes to represent the C AST, the C type system,
    builtin functions, and various helpers for analyzing and manipulating the
    AST (visitors, pretty printers, etc.).</li>
<li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma
    handling, tokens, and macro expansion.</li>
<li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions'
    provided by the client (e.g., libsema, which builds ASTs) but knows nothing
    about ASTs or other client-specific data structures.</li>
<li><b>libsema</b> - Semantic Analysis. This provides a set of parser actions
    to build a standardized AST for programs.</li>
<li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization &amp; code
    generation.</li>
<li><b>librewrite</b> - Editing of text buffers (important for code rewriting
    transformations, like refactoring).</li>
<li><b>libanalysis</b> - Static analysis support.</li>
<li><b>clang</b> - A driver program, client of the libraries at various
    levels.</li>
</ul>

<p>As an example of the power of this library-based design: if you want to
build a preprocessor, you would take the Basic and Lexer libraries. If you want
an indexer, you would take the previous two and add the Parser library and
some actions for indexing.
If you want a refactoring, static analysis, or
source-to-source compiler tool, you would then add the AST building and
semantic analyzer libraries.</p>

<p>For more information about the low-level implementation details of the
various clang libraries, please see the <a href="docs/InternalsManual.html">
clang Internals Manual</a>.</p>

<!--=======================================================================-->
<h3><a name="diverseclients">Support Diverse Clients</a></h3>
<!--=======================================================================-->

<p>Clang is designed and built with many grand plans for how we can use it. The
driving force is the fact that we use C and C++ daily, and have to suffer due to
a lack of good tools available for them. We believe that the C and C++ tools
ecosystem has been significantly limited by how difficult it is to parse and
represent the source code for these languages, and we aim to rectify this
problem in clang.</p>

<p>The problem with this goal is that different clients have very different
requirements. Consider code generation, for example: a simple front-end that
parses for code generation must analyze the code for validity and emit code
in some intermediate form to pass off to an optimizer or backend. Because
validity analysis and code generation can largely be done on the fly, there is
no hard requirement that the front-end actually build up a full AST for all
the expressions and statements in the code. TCC and GCC are examples of
compilers that either build no real AST (in the former case) or build a
stripped-down and simplified AST (in the latter case) because they focus
primarily on codegen.</p>

<p>On the opposite side of the spectrum, some clients (like refactoring) want
highly detailed information about the original source code and want a complete
AST to describe it.
Refactoring wants information about macro
expansions, the location of every parenthesized expression ('(((x)))' vs. 'x'),
full position information, and much more. Further, refactoring wants to look
<em>across the whole program</em> to ensure that the transformations it makes
are safe. Making this efficient and getting it right requires a
significant amount of engineering and algorithmic work that is
unnecessary for a simple static compiler.</p>

<p>The beauty of the clang approach is that it does not restrict how you use it.
In particular, it is possible to use the clang preprocessor and parser to build
an extremely quick and lightweight on-the-fly code generator (similar to TCC)
that does not build an AST at all. As an intermediate step, clang supports
using the current AST generation and semantic analysis code and having a code
generation client free the AST for each function after code generation. Finally,
clang provides support for building and retaining fully-fledged ASTs, and even
supports writing them out to disk.</p>

<p>Designing the libraries with clean and simple APIs allows these high-level
policy decisions to be determined in the client, instead of forcing "one true
way" in the implementation of any of these libraries. Getting this right is
hard, and we don't always get it right the first time, but we fix any problems
when we realize we made a mistake.</p>

<!--=======================================================================-->
<h3 id="ideintegration">Integration with IDEs</h3>
<!--=======================================================================-->

<p>
We believe that Integrated Development Environments (IDEs) are a great way
to pull together various pieces of the development puzzle, and aim to make clang
work well in such an environment.
The chief advantage of an IDE is that it
typically has visibility across your entire project and is a long-lived
process, whereas stand-alone compiler tools are typically invoked on each
individual file in the project, and thus have limited scope.</p>

<p>There are many implications of this difference, but a significant one has to
do with efficiency and caching: sharing an address space across different files
in a project means that you can use intelligent caching and other techniques to
dramatically reduce analysis/compilation time.</p>

<p>A further difference between IDEs and batch compilers is that IDEs often
impose very different requirements on the front-end: they depend on high
performance in order to provide a "snappy" experience, and thus really want
techniques like "incremental compilation", "fuzzy parsing", etc. Finally, IDEs
often need information that a codegen-only front-end can throw away. Clang is
specifically designed and built to capture this information.
</p>


<!--=======================================================================-->
<h3><a name="license">Use the LLVM 'Apache 2' License</a></h3>
<!--=======================================================================-->

<p>We actively intend for clang (and LLVM as a whole) to be used for
commercial projects, not only as a stand-alone compiler but also as a library
embedded inside a proprietary application. We feel that the license encourages
contributors to pick up the source and work with it, and believe that those
individuals and organizations will contribute back their work if they do not
want to have to maintain a fork forever (which is time-consuming and expensive
when merges are involved).
Further, nobody makes money on compilers these days,
but many people need them to accomplish bigger goals: it makes sense for
everyone to work together.</p>

<p>For more information about the LLVM/clang license, please see the <a
href="https://llvm.org/docs/DeveloperPolicy.html#copyright-license-and-patents">LLVM License
Description</a>.</p>



<!--*************************************************************************-->
<h2><a name="design">Internal Design and Implementation</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="real">A real-world, production quality compiler</a></h3>
<!--=======================================================================-->

<p>
Clang is designed and built by experienced compiler developers who are
increasingly frustrated with the problems that existing open source
compilers have. Clang is carefully and thoughtfully designed and
built to provide the foundation of a whole new generation of
C/C++/Objective C development tools, and we intend for it to be
production quality.</p>

<p>Being a production-quality compiler means many things: it means being high
performance, being solid and (relatively) bug-free, and it means eventually
being used and depended on by a broad range of people.
While we are still in
the early development stages, we strongly believe that this will become a
reality.</p>

<!--=======================================================================-->
<h3><a name="simplecode">A simple and hackable code base</a></h3>
<!--=======================================================================-->

<p>Our goal is to make it possible for anyone with a basic understanding
of compilers and a working knowledge of the C/C++/ObjC languages to understand
and extend the clang source base. A large part of this falls out of our decision
to make the AST mirror the languages as closely as possible: you have your
friendly if statement, for statement, parenthesized expression, structs, unions,
etc., all represented in a simple and explicit way.</p>

<p>In addition to keeping the design simple, we work to make the source base
approachable by commenting it well and including citations of the language
standards where appropriate. Beyond that, clang offers
a set of AST dumpers, printers, and visualizers that make it easy to put code in
and see how it is represented.</p>

<!--=======================================================================-->
<h3><a name="unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></h3>
<!--=======================================================================-->

<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family. We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser.
Because a recursive-descent parser is plain C++ code, it is very easy for
new developers to understand; it easily supports the ad-hoc rules and other
strange hacks required by C/C++; and it makes it straightforward to implement
excellent diagnostics and error recovery.</p>

<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than maintaining separate C and C++
parsers, which must be bug-fixed and maintained independently of each other.</p>

<!--=======================================================================-->
<h3><a name="conformance">Conformance with C/C++/ObjC and their
 variants</a></h3>
<!--=======================================================================-->

<p>When you start work on implementing a language, you find out that there is a
huge gap between how the language works and how most people understand it to
work. This gap is the difference between a normal programmer and a (scary?
supernatural?) "language lawyer", who knows the ins and outs of the language
and can grok standardese with ease.</p>

<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc.). Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit extension diagnostics for it (which are ignored by
default, but can optionally be mapped to either warnings or errors), allowing
you to use clang in "strict" mode if you desire.</p>

</div>
</body>
</html>