================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Does the University of Illinois Open Source License really qualify as an "open source" license?
--------------------------------------------------------------------------------------------------
Yes, the license is `certified
<http://www.opensource.org/licenses/UoI-NCSA.php>`_ by the Open Source
Initiative (OSI).


Can I modify LLVM source code and redistribute the modified source?
---------------------------------------------------------------------
Yes. The modified source distribution must retain the copyright notice and
follow the three bulleted conditions listed in the `LLVM license
<http://llvm.org/svn/llvm-project/llvm/trunk/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
------------------------------------------------------------------------------------------------------------------------------
Yes. This is why we distribute LLVM under a less restrictive license than the
GPL, as explained in the first question above.


Source Code
===========

In what language is LLVM written?
---------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
-------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems. Most of the code is written in standard C++ with operating system
services abstracted to a support library. The tools required to build and
test LLVM have been ported to a plethora of platforms.

Some porting problems may exist in the following areas:

* The autoconf/makefile build system relies heavily on UNIX shell tools,
  like the Bourne shell and sed. Porting to systems without these tools
  (MacOS 9, Plan 9) will require more effort.


What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation?
-------------------------------------------------------------------------------------------------------

In short: you can't. The question stops making sense once you grok what is
going on. In code like:

.. code-block:: llvm

    %result = add i32 %foo, %bar

``%result`` is just a name given to the ``Value`` of the ``add``
instruction. In other words, ``%result`` *is* the add instruction. The
"assignment" doesn't explicitly "store" anything to any "virtual register";
the "``=``" is more like the mathematical sense of equality.

Longer explanation: in order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it. However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction, since
instructions simply keep pointers to any other ``Value``\ s that they
reference. In fact, the names of dummy numbered temporaries like ``%1`` are
not explicitly represented in the in-memory representation at all (see
``Value::getName()``).
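If it helps to see the same point from the API side, here is a minimal sketch
using the C bindings in ``include/llvm-c`` (the module, function, and variable
names are just placeholders for illustration). The value returned by
``LLVMBuildAdd`` *is* the add instruction; the ``ret`` refers to it by
pointer, and the textual name is purely cosmetic:

.. code-block:: c

    #include <llvm-c/Core.h>

    int main(void) {
      /* Build: define i32 @sum(i32 %foo, i32 %bar) */
      LLVMModuleRef Mod = LLVMModuleCreateWithName("example");
      LLVMTypeRef I32 = LLVMInt32Type();
      LLVMTypeRef Params[] = { I32, I32 };
      LLVMValueRef Sum = LLVMAddFunction(Mod, "sum",
                                         LLVMFunctionType(I32, Params, 2, 0));
      LLVMSetValueName(LLVMGetParam(Sum, 0), "foo");
      LLVMSetValueName(LLVMGetParam(Sum, 1), "bar");

      LLVMBuilderRef B = LLVMCreateBuilder();
      LLVMPositionBuilderAtEnd(B, LLVMAppendBasicBlock(Sum, "entry"));

      /* Result *is* the add instruction; "result" is only a name used when
         the IR is printed. Nothing is "stored" to a register here. */
      LLVMValueRef Result = LLVMBuildAdd(B, LLVMGetParam(Sum, 0),
                                         LLVMGetParam(Sum, 1), "result");

      /* The ret references the add by pointer; passing "" instead of
         "result" would just leave it as an unnamed numbered temporary in
         the printed form, without changing the in-memory representation. */
      LLVMBuildRet(B, Result);

      LLVMDumpModule(Mod);   /* prints: %result = add i32 %foo, %bar ... */
      LLVMDisposeBuilder(B);
      LLVMDisposeModule(Mod);
      return 0;
    }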
Source Languages
================

What source languages are supported?
------------------------------------

LLVM currently has full support for C and C++ source languages through
`Clang <http://clang.llvm.org/>`_. Many other language frontends have
been written using LLVM, and an incomplete list is available at
`projects with LLVM <http://llvm.org/ProjectsWithLLVM/>`_.


I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
---------------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format. Assuming you want to write your
language's compiler in the language itself (rather than C++), there are three
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM libraries using your language's FFI (foreign
   function interface).**

   * *for:* best tracks changes to the LLVM IR, .ll syntax, and .bc format

   * *for:* enables running LLVM optimization passes without an emit/parse
     overhead

   * *for:* adapts well to a JIT context

   * *against:* lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

   * *for:* very straightforward to get started

   * *against:* the .ll parser is slower than the bitcode reader when
     interfacing to the middle end

   * *against:* it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

   * *for:* can use the more-efficient bitcode reader when interfacing to the
     middle end

   * *against:* you'll have to re-engineer the LLVM IR object model and bitcode
     writer in your language

   * *against:* it may be harder to track changes to the IR

If you go with the first option, the C bindings in include/llvm-c should help
a lot, since most languages have strong support for interfacing with C. The
most common hurdle with calling C from managed code is interfacing with the
garbage collector. The C interface was designed to require very little memory
management, and so is straightforward in this regard.
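As a rough illustration of the first option, here is a minimal sketch using
those C bindings. It assumes a module your front-end has already built in
memory, and it uses the legacy pass-manager and bitcode-writer C APIs, so the
exact entry points may differ between LLVM versions. The point is that the
module goes straight to the optimizers and the bitcode writer, with no
textual emit/parse round trip:

.. code-block:: c

    #include <llvm-c/Core.h>
    #include <llvm-c/BitWriter.h>
    #include <llvm-c/Transforms/Scalar.h>

    /* Optimize an in-memory module and write it out as bitcode. "Mod" is a
       module the front-end already built through the C bindings. */
    int optimize_and_emit(LLVMModuleRef Mod, const char *Path) {
      LLVMPassManagerRef PM = LLVMCreatePassManager();

      /* A couple of representative mid-level passes. */
      LLVMAddPromoteMemoryToRegisterPass(PM);
      LLVMAddInstructionCombiningPass(PM);

      LLVMRunPassManager(PM, Mod);   /* no .ll emit/parse round trip */
      LLVMDisposePassManager(PM);

      /* Returns 0 on success. */
      return LLVMWriteBitcodeToFile(Mod, Path);
    }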
What support is there for higher-level source language constructs for building a compiler?
----------------------------------------------------------------------------------------------
Currently, there isn't much. LLVM supports an intermediate representation
which is useful for code representation but does not provide the high-level
(abstract syntax tree) representation needed by most compilers. There are no
facilities for lexical or semantic analysis.


I don't understand the ``GetElementPtr`` instruction. Help!
------------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
------------------------------------------------------------------
No. C and C++ are inherently platform-dependent languages. The most obvious
example of this is the preprocessor. A very common way that C code is made
portable is by using the preprocessor to include platform-specific code. In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary
between platforms. In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.


Questions about code generated by the demo page
================================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
------------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used ``std::cout``, for example, the object would not necessarily be
automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes ``<iostream>``. This object has a static constructor
and destructor that initialize and destroy the global iostream objects
before they could possibly be used in the file. The code that you see in the
``.ll`` file corresponds to the constructor and destructor registration code.

If you would like to make it easier to *understand* the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\ s to print values.


Where did all of my code go??
-----------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in. Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed. For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable. If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.
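For example, a sketch along these lines (the names are arbitrary) keeps the
computation alive even at high optimization levels, because the optimizer has
to assume that every read and write of a ``volatile`` global is observable:

.. code-block:: c

    /* Reads and writes of volatile globals cannot be optimized away, so
       the loop below survives even though its result is never otherwise
       used. */
    volatile int input = 10;
    volatile int output;

    void keep_me(void) {
      int sum = 0;
      for (int i = 0; i < input; ++i)   /* re-reads "input" each iteration */
        sum += i;
      output = sum;                     /* this store must be preserved */
    }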
What is this "``undef``" thing that shows up in my code?
---------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined. You
can get these if you do not initialize a variable before you use it. For
example, the C function:

.. code-block:: c

    int X() { int i; return i; }

is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.


Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
---------------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem for authors of front-ends that use custom calling
conventions: you need to make sure to set the right calling convention on
both the function and on each call to the function. For example, this code:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define void @bar() {
        call void @foo()
        ret void
    }

is optimized by "``opt -instcombine -simplifycfg``" to:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define void @bar() {
        unreachable
    }

This often bites people because "all their code disappears". A matching
calling convention on the caller and the callee is required for the call to
behave correctly, so people often ask why the verifier doesn't just reject
this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
it would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code). The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define internal void @bar(void()* %FP, i1 %cond) {
        br i1 %cond, label %T, label %F
    T:
        call void %FP()
        ret void
    F:
        call fastcc void %FP()
        ret void
    }
    define void @test() {
        %X = or i1 false, false
        call void @bar(void()* @foo, i1 %X)
        ret void
    }

In this example, "test" always passes ``@foo``/``false`` into ``bar``, which
ensures that it is dynamically called with the right calling convention
(thus, the code is perfectly well defined). If you run this through the
inliner, you get this (the explicit "or" is there so that the inliner doesn't
dead-code eliminate a bunch of stuff):

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define void @test() {
        %X = or i1 false, false
        br i1 %X, label %T.i, label %F.i
    T.i:
        call void @foo()
        br label %bar.exit
    F.i:
        call fastcc void @foo()
        br label %bar.exit
    bar.exit:
        ret void
    }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention. We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code. In this
case, dead code elimination can trivially remove the undefined code. However,
if ``%X`` were an input argument to ``@test``, the inliner would produce this:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }

    define void @test(i1 %X) {
        br i1 %X, label %T.i, label %F.i
    T.i:
        call void @foo()
        br label %bar.exit
    F.i:
        call fastcc void @foo()
        br label %bar.exit
    bar.exit:
        ret void
    }

The interesting thing about this is that ``%X`` *must* be false for the
code to be well defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable. However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable. Because a
branch to unreachable can never actually be taken,
"``-inline -instcombine -simplifycfg``" is able to produce:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define void @test(i1 %X) {
    F.i:
        call fastcc void @foo()
        ret void
    }
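If you are generating such calls from your own front-end, the fix is simply to
set the convention in both places. Here is a minimal sketch using the C
bindings (the parameter names are placeholders; the equivalent C++ API calls
are ``Function::setCallingConv`` and ``CallInst::setCallingConv``):

.. code-block:: c

    #include <llvm-c/Core.h>

    /* Give both the callee and every call site the same calling convention.
       "Fn" is the fastcc function; "Call" is a call instruction to it that
       was just built with LLVMBuildCall. */
    void set_fastcc(LLVMValueRef Fn, LLVMValueRef Call) {
      LLVMSetFunctionCallConv(Fn, LLVMFastCallConv);
      LLVMSetInstructionCallConv(Call, LLVMFastCallConv);
    }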