1===================================== 2Garbage Collection with LLVM 3===================================== 4 5.. contents:: 6 :local: 7 8Abstract 9======== 10 11This document covers how to integrate LLVM into a compiler for a language which 12supports garbage collection. **Note that LLVM itself does not provide a 13garbage collector.** You must provide your own. 14 15Quick Start 16============ 17 18First, you should pick a collector strategy. LLVM includes a number of built 19in ones, but you can also implement a loadable plugin with a custom definition. 20Note that the collector strategy is a description of how LLVM should generate 21code such that it interacts with your collector and runtime, not a description 22of the collector itself. 23 24Next, mark your generated functions as using your chosen collector strategy. 25From c++, you can call: 26 27.. code-block:: c++ 28 29 F.setGC(<collector description name>); 30 31 32This will produce IR like the following fragment: 33 34.. code-block:: llvm 35 36 define void @foo() gc "<collector description name>" { ... } 37 38 39When generating LLVM IR for your functions, you will need to: 40 41* Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` in place of standard load and 42 store instructions. These intrinsics are used to represent load and store 43 barriers. If you collector does not require such barriers, you can skip 44 this step. 45 46* Use the memory allocation routines provided by your garbage collector's 47 runtime library. 48 49* If your collector requires them, generate type maps according to your 50 runtime's binary interface. LLVM is not involved in the process. In 51 particular, the LLVM type system is not suitable for conveying such 52 information though the compiler. 53 54* Insert any coordination code required for interacting with your collector. 55 Many collectors require running application code to periodically check a 56 flag and conditionally call a runtime function. This is often referred to 57 as a safepoint poll. 58 59You will need to identify roots (i.e. references to heap objects your collector 60needs to know about) in your generated IR, so that LLVM can encode them into 61your final stack maps. Depending on the collector strategy chosen, this is 62accomplished by using either the ``@llvm.gcroot`` intrinsics or an 63``gc.statepoint`` relocation sequence. 64 65Don't forget to create a root for each intermediate value that is generated when 66evaluating an expression. In ``h(f(), g())``, the result of ``f()`` could 67easily be collected if evaluating ``g()`` triggers a collection. 68 69Finally, you need to link your runtime library with the generated program 70executable (for a static compiler) or ensure the appropriate symbols are 71available for the runtime linker (for a JIT compiler). 72 73 74Introduction 75============ 76 77What is Garbage Collection? 78--------------------------- 79 80Garbage collection is a widely used technique that frees the programmer from 81having to know the lifetimes of heap objects, making software easier to produce 82and maintain. Many programming languages rely on garbage collection for 83automatic memory management. There are two primary forms of garbage collection: 84conservative and accurate. 85 86Conservative garbage collection often does not require any special support from 87either the language or the compiler: it can handle non-type-safe programming 88languages (such as C/C++) and does not require any special information from the 89compiler. The `Boehm collector 90<http://www.hpl.hp.com/personal/Hans_Boehm/gc/>`__ is an example of a 91state-of-the-art conservative collector. 92 93Accurate garbage collection requires the ability to identify all pointers in the 94program at run-time (which requires that the source-language be type-safe in 95most cases). Identifying pointers at run-time requires compiler support to 96locate all places that hold live pointer variables at run-time, including the 97:ref:`processor stack and registers <gcroot>`. 98 99Conservative garbage collection is attractive because it does not require any 100special compiler support, but it does have problems. In particular, because the 101conservative garbage collector cannot *know* that a particular word in the 102machine is a pointer, it cannot move live objects in the heap (preventing the 103use of compacting and generational GC algorithms) and it can occasionally suffer 104from memory leaks due to integer values that happen to point to objects in the 105program. In addition, some aggressive compiler transformations can break 106conservative garbage collectors (though these seem rare in practice). 107 108Accurate garbage collectors do not suffer from any of these problems, but they 109can suffer from degraded scalar optimization of the program. In particular, 110because the runtime must be able to identify and update all pointers active in 111the program, some optimizations are less effective. In practice, however, the 112locality and performance benefits of using aggressive garbage collection 113techniques dominates any low-level losses. 114 115This document describes the mechanisms and interfaces provided by LLVM to 116support accurate garbage collection. 117 118Goals and non-goals 119------------------- 120 121LLVM's intermediate representation provides :ref:`garbage collection intrinsics 122<gc_intrinsics>` that offer support for a broad class of collector models. For 123instance, the intrinsics permit: 124 125* semi-space collectors 126 127* mark-sweep collectors 128 129* generational collectors 130 131* incremental collectors 132 133* concurrent collectors 134 135* cooperative collectors 136 137* reference counting 138 139We hope that the support built into the LLVM IR is sufficient to support a 140broad class of garbage collected languages including Scheme, ML, Java, C#, 141Perl, Python, Lua, Ruby, other scripting languages, and more. 142 143Note that LLVM **does not itself provide a garbage collector** --- this should 144be part of your language's runtime library. LLVM provides a framework for 145describing the garbage collectors requirements to the compiler. In particular, 146LLVM provides support for generating stack maps at call sites, polling for a 147safepoint, and emitting load and store barriers. You can also extend LLVM - 148possibly through a loadable :ref:`code generation plugins <plugin>` - to 149generate code and data structures which conforms to the *binary interface* 150specified by the *runtime library*. This is similar to the relationship between 151LLVM and DWARF debugging info, for example. The difference primarily lies in 152the lack of an established standard in the domain of garbage collection --- thus 153the need for a flexible extension mechanism. 154 155The aspects of the binary interface with which LLVM's GC support is 156concerned are: 157 158* Creation of GC safepoints within code where collection is allowed to execute 159 safely. 160 161* Computation of the stack map. For each safe point in the code, object 162 references within the stack frame must be identified so that the collector may 163 traverse and perhaps update them. 164 165* Write barriers when storing object references to the heap. These are commonly 166 used to optimize incremental scans in generational collectors. 167 168* Emission of read barriers when loading object references. These are useful 169 for interoperating with concurrent collectors. 170 171There are additional areas that LLVM does not directly address: 172 173* Registration of global roots with the runtime. 174 175* Registration of stack map entries with the runtime. 176 177* The functions used by the program to allocate memory, trigger a collection, 178 etc. 179 180* Computation or compilation of type maps, or registration of them with the 181 runtime. These are used to crawl the heap for object references. 182 183In general, LLVM's support for GC does not include features which can be 184adequately addressed with other features of the IR and does not specify a 185particular binary interface. On the plus side, this means that you should be 186able to integrate LLVM with an existing runtime. On the other hand, it can 187have the effect of leaving a lot of work for the developer of a novel 188language. We try to mitigate this by providing built in collector strategy 189descriptions that can work with many common collector designs and easy 190extension points. If you don't already have a specific binary interface 191you need to support, we recommend trying to use one of these built in collector 192strategies. 193 194.. _gc_intrinsics: 195 196LLVM IR Features 197================ 198 199This section describes the garbage collection facilities provided by the 200:doc:`LLVM intermediate representation <LangRef>`. The exact behavior of these 201IR features is specified by the selected :ref:`GC strategy description 202<plugin>`. 203 204Specifying GC code generation: ``gc "..."`` 205------------------------------------------- 206 207.. code-block:: llvm 208 209 define <returntype> @name(...) gc "name" { ... } 210 211The ``gc`` function attribute is used to specify the desired GC strategy to the 212compiler. Its programmatic equivalent is the ``setGC`` method of ``Function``. 213 214Setting ``gc "name"`` on a function triggers a search for a matching subclass 215of GCStrategy. Some collector strategies are built in. You can add others 216using either the loadable plugin mechanism, or by patching your copy of LLVM. 217It is the selected GC strategy which defines the exact nature of the code 218generated to support GC. If none is found, the compiler will raise an error. 219 220Specifying the GC style on a per-function basis allows LLVM to link together 221programs that use different garbage collection algorithms (or none at all). 222 223.. _gcroot: 224 225Identifying GC roots on the stack 226---------------------------------- 227 228LLVM currently supports two different mechanisms for describing references in 229compiled code at safepoints. ``llvm.gcroot`` is the older mechanism; 230``gc.statepoint`` has been added more recently. At the moment, you can choose 231either implementation (on a per :ref:`GC strategy <plugin>` basis). Longer 232term, we will probably either migrate away from ``llvm.gcroot`` entirely, or 233substantially merge their implementations. Note that most new development 234work is focused on ``gc.statepoint``. 235 236Using ``gc.statepoint`` 237^^^^^^^^^^^^^^^^^^^^^^^^ 238:doc:`This page <Statepoints>` contains detailed documentation for 239``gc.statepoint``. 240 241Using ``llvm.gcwrite`` 242^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 243 244.. code-block:: llvm 245 246 void @llvm.gcroot(i8** %ptrloc, i8* %metadata) 247 248The ``llvm.gcroot`` intrinsic is used to inform LLVM that a stack variable 249references an object on the heap and is to be tracked for garbage collection. 250The exact impact on generated code is specified by the Function's selected 251:ref:`GC strategy <plugin>`. All calls to ``llvm.gcroot`` **must** reside 252inside the first basic block. 253 254The first argument **must** be a value referring to an alloca instruction or a 255bitcast of an alloca. The second contains a pointer to metadata that should be 256associated with the pointer, and **must** be a constant or global value 257address. If your target collector uses tags, use a null pointer for metadata. 258 259A compiler which performs manual SSA construction **must** ensure that SSA 260values representing GC references are stored in to the alloca passed to the 261respective ``gcroot`` before every call site and reloaded after every call. 262A compiler which uses mem2reg to raise imperative code using ``alloca`` into 263SSA form need only add a call to ``@llvm.gcroot`` for those variables which 264are pointers into the GC heap. 265 266It is also important to mark intermediate values with ``llvm.gcroot``. For 267example, consider ``h(f(), g())``. Beware leaking the result of ``f()`` in the 268case that ``g()`` triggers a collection. Note, that stack variables must be 269initialized and marked with ``llvm.gcroot`` in function's prologue. 270 271The ``%metadata`` argument can be used to avoid requiring heap objects to have 272'isa' pointers or tag bits. [Appel89_, Goldberg91_, Tolmach94_] If specified, 273its value will be tracked along with the location of the pointer in the stack 274frame. 275 276Consider the following fragment of Java code: 277 278.. code-block:: java 279 280 { 281 Object X; // A null-initialized reference to an object 282 ... 283 } 284 285This block (which may be located in the middle of a function or in a loop nest), 286could be compiled to this LLVM code: 287 288.. code-block:: llvm 289 290 Entry: 291 ;; In the entry block for the function, allocate the 292 ;; stack space for X, which is an LLVM pointer. 293 %X = alloca %Object* 294 295 ;; Tell LLVM that the stack space is a stack root. 296 ;; Java has type-tags on objects, so we pass null as metadata. 297 %tmp = bitcast %Object** %X to i8** 298 call void @llvm.gcroot(i8** %tmp, i8* null) 299 ... 300 301 ;; "CodeBlock" is the block corresponding to the start 302 ;; of the scope above. 303 CodeBlock: 304 ;; Java null-initializes pointers. 305 store %Object* null, %Object** %X 306 307 ... 308 309 ;; As the pointer goes out of scope, store a null value into 310 ;; it, to indicate that the value is no longer live. 311 store %Object* null, %Object** %X 312 ... 313 314Reading and writing references in the heap 315------------------------------------------ 316 317Some collectors need to be informed when the mutator (the program that needs 318garbage collection) either reads a pointer from or writes a pointer to a field 319of a heap object. The code fragments inserted at these points are called *read 320barriers* and *write barriers*, respectively. The amount of code that needs to 321be executed is usually quite small and not on the critical path of any 322computation, so the overall performance impact of the barrier is tolerable. 323 324Barriers often require access to the *object pointer* rather than the *derived 325pointer* (which is a pointer to the field within the object). Accordingly, 326these intrinsics take both pointers as separate arguments for completeness. In 327this snippet, ``%object`` is the object pointer, and ``%derived`` is the derived 328pointer: 329 330.. code-block:: llvm 331 332 ;; An array type. 333 %class.Array = type { %class.Object, i32, [0 x %class.Object*] } 334 ... 335 336 ;; Load the object pointer from a gcroot. 337 %object = load %class.Array** %object_addr 338 339 ;; Compute the derived pointer. 340 %derived = getelementptr %object, i32 0, i32 2, i32 %n 341 342LLVM does not enforce this relationship between the object and derived pointer 343(although a particular :ref:`collector strategy <plugin>` might). However, it 344would be an unusual collector that violated it. 345 346The use of these intrinsics is naturally optional if the target GC does not 347require the corresponding barrier. The GC strategy used with such a collector 348should replace the intrinsic calls with the corresponding ``load`` or 349``store`` instruction if they are used. 350 351One known deficiency with the current design is that the barrier intrinsics do 352not include the size or alignment of the underlying operation performed. It is 353currently assumed that the operation is of pointer size and the alignment is 354assumed to be the target machine's default alignment. 355 356Write barrier: ``llvm.gcwrite`` 357^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 358 359.. code-block:: llvm 360 361 void @llvm.gcwrite(i8* %value, i8* %object, i8** %derived) 362 363For write barriers, LLVM provides the ``llvm.gcwrite`` intrinsic function. It 364has exactly the same semantics as a non-volatile ``store`` to the derived 365pointer (the third argument). The exact code generated is specified by the 366Function's selected :ref:`GC strategy <plugin>`. 367 368Many important algorithms require write barriers, including generational and 369concurrent collectors. Additionally, write barriers could be used to implement 370reference counting. 371 372Read barrier: ``llvm.gcread`` 373^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 374 375.. code-block:: llvm 376 377 i8* @llvm.gcread(i8* %object, i8** %derived) 378 379For read barriers, LLVM provides the ``llvm.gcread`` intrinsic function. It has 380exactly the same semantics as a non-volatile ``load`` from the derived pointer 381(the second argument). The exact code generated is specified by the Function's 382selected :ref:`GC strategy <plugin>`. 383 384Read barriers are needed by fewer algorithms than write barriers, and may have a 385greater performance impact since pointer reads are more frequent than writes. 386 387.. _plugin: 388 389.. _builtin-gc-strategies: 390 391Built In GC Strategies 392====================== 393 394LLVM includes built in support for several varieties of garbage collectors. 395 396The Shadow Stack GC 397---------------------- 398 399To use this collector strategy, mark your functions with: 400 401.. code-block:: c++ 402 403 F.setGC("shadow-stack"); 404 405Unlike many GC algorithms which rely on a cooperative code generator to compile 406stack maps, this algorithm carefully maintains a linked list of stack roots 407[:ref:`Henderson2002 <henderson02>`]. This so-called "shadow stack" mirrors the 408machine stack. Maintaining this data structure is slower than using a stack map 409compiled into the executable as constant data, but has a significant portability 410advantage because it requires no special support from the target code generator, 411and does not require tricky platform-specific code to crawl the machine stack. 412 413The tradeoff for this simplicity and portability is: 414 415* High overhead per function call. 416 417* Not thread-safe. 418 419Still, it's an easy way to get started. After your compiler and runtime are up 420and running, writing a :ref:`plugin <plugin>` will allow you to take advantage 421of :ref:`more advanced GC features <collector-algos>` of LLVM in order to 422improve performance. 423 424 425The shadow stack doesn't imply a memory allocation algorithm. A semispace 426collector or building atop ``malloc`` are great places to start, and can be 427implemented with very little code. 428 429When it comes time to collect, however, your runtime needs to traverse the stack 430roots, and for this it needs to integrate with the shadow stack. Luckily, doing 431so is very simple. (This code is heavily commented to help you understand the 432data structure, but there are only 20 lines of meaningful code.) 433 434.. code-block:: c++ 435 436 /// @brief The map for a single function's stack frame. One of these is 437 /// compiled as constant data into the executable for each function. 438 /// 439 /// Storage of metadata values is elided if the %metadata parameter to 440 /// @llvm.gcroot is null. 441 struct FrameMap { 442 int32_t NumRoots; //< Number of roots in stack frame. 443 int32_t NumMeta; //< Number of metadata entries. May be < NumRoots. 444 const void *Meta[0]; //< Metadata for each root. 445 }; 446 447 /// @brief A link in the dynamic shadow stack. One of these is embedded in 448 /// the stack frame of each function on the call stack. 449 struct StackEntry { 450 StackEntry *Next; //< Link to next stack entry (the caller's). 451 const FrameMap *Map; //< Pointer to constant FrameMap. 452 void *Roots[0]; //< Stack roots (in-place array). 453 }; 454 455 /// @brief The head of the singly-linked list of StackEntries. Functions push 456 /// and pop onto this in their prologue and epilogue. 457 /// 458 /// Since there is only a global list, this technique is not threadsafe. 459 StackEntry *llvm_gc_root_chain; 460 461 /// @brief Calls Visitor(root, meta) for each GC root on the stack. 462 /// root and meta are exactly the values passed to 463 /// @llvm.gcroot. 464 /// 465 /// Visitor could be a function to recursively mark live objects. Or it 466 /// might copy them to another heap or generation. 467 /// 468 /// @param Visitor A function to invoke for every GC root on the stack. 469 void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) { 470 for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) { 471 unsigned i = 0; 472 473 // For roots [0, NumMeta), the metadata pointer is in the FrameMap. 474 for (unsigned e = R->Map->NumMeta; i != e; ++i) 475 Visitor(&R->Roots[i], R->Map->Meta[i]); 476 477 // For roots [NumMeta, NumRoots), the metadata pointer is null. 478 for (unsigned e = R->Map->NumRoots; i != e; ++i) 479 Visitor(&R->Roots[i], NULL); 480 } 481 } 482 483 484The 'Erlang' and 'Ocaml' GCs 485----------------------------- 486 487LLVM ships with two example collectors which leverage the ``gcroot`` 488mechanisms. To our knowledge, these are not actually used by any language 489runtime, but they do provide a reasonable starting point for someone interested 490in writing an ``gcroot`` compatible GC plugin. In particular, these are the 491only in tree examples of how to produce a custom binary stack map format using 492a ``gcroot`` strategy. 493 494As there names imply, the binary format produced is intended to model that 495used by the Erlang and OCaml compilers respectively. 496 497 498The Statepoint Example GC 499------------------------- 500 501.. code-block:: c++ 502 503 F.setGC("statepoint-example"); 504 505This GC provides an example of how one might use the infrastructure provided 506by ``gc.statepoint``. This example GC is compatible with the 507:ref:`PlaceSafepoints` and :ref:`RewriteStatepointsForGC` utility passes 508which simplify ``gc.statepoint`` sequence insertion. If you need to build a 509custom GC strategy around the ``gc.statepoints`` mechanisms, it is recommended 510that you use this one as a starting point. 511 512This GC strategy does not support read or write barriers. As a result, these 513intrinsics are lowered to normal loads and stores. 514 515The stack map format generated by this GC strategy can be found in the 516:ref:`stackmap-section` using a format documented :ref:`here 517<statepoint-stackmap-format>`. This format is intended to be the standard 518format supported by LLVM going forward. 519 520 521Custom GC Strategies 522==================== 523 524If none of the built in GC strategy descriptions met your needs above, you will 525need to define a custom GCStrategy and possibly, a custom LLVM pass to perform 526lowering. Your best example of where to start defining a custom GCStrategy 527would be to look at one of the built in strategies. 528 529You may be able to structure this additional code as a loadable plugin library. 530Loadable plugins are sufficient if all you need is to enable a different 531combination of built in functionality, but if you need to provide a custom 532lowering pass, you will need to build a patched version of LLVM. If you think 533you need a patched build, please ask for advice on llvm-dev. There may be an 534easy way we can extend the support to make it work for your use case without 535requiring a custom build. 536 537Collector Requirements 538---------------------- 539 540You should be able to leverage any existing collector library that includes the following elements: 541 542#. A memory allocator which exposes an allocation function your compiled 543 code can call. 544 545#. A binary format for the stack map. A stack map describes the location 546 of references at a safepoint and is used by precise collectors to identify 547 references within a stack frame on the machine stack. Note that collectors 548 which conservatively scan the stack don't require such a structure. 549 550#. A stack crawler to discover functions on the call stack, and enumerate the 551 references listed in the stack map for each call site. 552 553#. A mechanism for identifying references in global locations (e.g. global 554 variables). 555 556#. If you collector requires them, an LLVM IR implementation of your collectors 557 load and store barriers. Note that since many collectors don't require 558 barriers at all, LLVM defaults to lowering such barriers to normal loads 559 and stores unless you arrange otherwise. 560 561 562Implementing a collector plugin 563------------------------------- 564 565User code specifies which GC code generation to use with the ``gc`` function 566attribute or, equivalently, with the ``setGC`` method of ``Function``. 567 568To implement a GC plugin, it is necessary to subclass ``llvm::GCStrategy``, 569which can be accomplished in a few lines of boilerplate code. LLVM's 570infrastructure provides access to several important algorithms. For an 571uncontroversial collector, all that remains may be to compile LLVM's computed 572stack map to assembly code (using the binary representation expected by the 573runtime library). This can be accomplished in about 100 lines of code. 574 575This is not the appropriate place to implement a garbage collected heap or a 576garbage collector itself. That code should exist in the language's runtime 577library. The compiler plugin is responsible for generating code which conforms 578to the binary interface defined by library, most essentially the :ref:`stack map 579<stack-map>`. 580 581To subclass ``llvm::GCStrategy`` and register it with the compiler: 582 583.. code-block:: c++ 584 585 // lib/MyGC/MyGC.cpp - Example LLVM GC plugin 586 587 #include "llvm/CodeGen/GCStrategy.h" 588 #include "llvm/CodeGen/GCMetadata.h" 589 #include "llvm/Support/Compiler.h" 590 591 using namespace llvm; 592 593 namespace { 594 class LLVM_LIBRARY_VISIBILITY MyGC : public GCStrategy { 595 public: 596 MyGC() {} 597 }; 598 599 GCRegistry::Add<MyGC> 600 X("mygc", "My bespoke garbage collector."); 601 } 602 603This boilerplate collector does nothing. More specifically: 604 605* ``llvm.gcread`` calls are replaced with the corresponding ``load`` 606 instruction. 607 608* ``llvm.gcwrite`` calls are replaced with the corresponding ``store`` 609 instruction. 610 611* No safe points are added to the code. 612 613* The stack map is not compiled into the executable. 614 615Using the LLVM makefiles, this code 616can be compiled as a plugin using a simple makefile: 617 618.. code-block:: make 619 620 # lib/MyGC/Makefile 621 622 LEVEL := ../.. 623 LIBRARYNAME = MyGC 624 LOADABLE_MODULE = 1 625 626 include $(LEVEL)/Makefile.common 627 628Once the plugin is compiled, code using it may be compiled using ``llc 629-load=MyGC.so`` (though MyGC.so may have some other platform-specific 630extension): 631 632:: 633 634 $ cat sample.ll 635 define void @f() gc "mygc" { 636 entry: 637 ret void 638 } 639 $ llvm-as < sample.ll | llc -load=MyGC.so 640 641It is also possible to statically link the collector plugin into tools, such as 642a language-specific compiler front-end. 643 644.. _collector-algos: 645 646Overview of available features 647------------------------------ 648 649``GCStrategy`` provides a range of features through which a plugin may do useful 650work. Some of these are callbacks, some are algorithms that can be enabled, 651disabled, or customized. This matrix summarizes the supported (and planned) 652features and correlates them with the collection techniques which typically 653require them. 654 655.. |v| unicode:: 0x2714 656 :trim: 657 658.. |x| unicode:: 0x2718 659 :trim: 660 661+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 662| Algorithm | Done | Shadow | refcount | mark- | copying | incremental | threaded | concurrent | 663| | | stack | | sweep | | | | | 664+============+======+========+==========+=======+=========+=============+==========+============+ 665| stack map | |v| | | | |x| | |x| | |x| | |x| | |x| | 666+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 667| initialize | |v| | |x| | |x| | |x| | |x| | |x| | |x| | |x| | 668| roots | | | | | | | | | 669+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 670| derived | NO | | | | | | **N**\* | **N**\* | 671| pointers | | | | | | | | | 672+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 673| **custom | |v| | | | | | | | | 674| lowering** | | | | | | | | | 675+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 676| *gcroot* | |v| | |x| | |x| | | | | | | 677+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 678| *gcwrite* | |v| | | |x| | | | |x| | | |x| | 679+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 680| *gcread* | |v| | | | | | | | |x| | 681+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 682| **safe | | | | | | | | | 683| points** | | | | | | | | | 684+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 685| *in | |v| | | | |x| | |x| | |x| | |x| | |x| | 686| calls* | | | | | | | | | 687+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 688| *before | |v| | | | | | | |x| | |x| | 689| calls* | | | | | | | | | 690+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 691| *for | NO | | | | | | **N** | **N** | 692| loops* | | | | | | | | | 693+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 694| *before | |v| | | | | | | |x| | |x| | 695| escape* | | | | | | | | | 696+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 697| emit code | NO | | | | | | **N** | **N** | 698| at safe | | | | | | | | | 699| points | | | | | | | | | 700+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 701| **output** | | | | | | | | | 702+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 703| *assembly* | |v| | | | |x| | |x| | |x| | |x| | |x| | 704+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 705| *JIT* | NO | | | **?** | **?** | **?** | **?** | **?** | 706+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 707| *obj* | NO | | | **?** | **?** | **?** | **?** | **?** | 708+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 709| live | NO | | | **?** | **?** | **?** | **?** | **?** | 710| analysis | | | | | | | | | 711+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 712| register | NO | | | **?** | **?** | **?** | **?** | **?** | 713| map | | | | | | | | | 714+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 715| \* Derived pointers only pose a hasard to copying collections. | 716+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 717| **?** denotes a feature which could be utilized if available. | 718+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 719 720To be clear, the collection techniques above are defined as: 721 722Shadow Stack 723 The mutator carefully maintains a linked list of stack roots. 724 725Reference Counting 726 The mutator maintains a reference count for each object and frees an object 727 when its count falls to zero. 728 729Mark-Sweep 730 When the heap is exhausted, the collector marks reachable objects starting 731 from the roots, then deallocates unreachable objects in a sweep phase. 732 733Copying 734 As reachability analysis proceeds, the collector copies objects from one heap 735 area to another, compacting them in the process. Copying collectors enable 736 highly efficient "bump pointer" allocation and can improve locality of 737 reference. 738 739Incremental 740 (Including generational collectors.) Incremental collectors generally have all 741 the properties of a copying collector (regardless of whether the mature heap 742 is compacting), but bring the added complexity of requiring write barriers. 743 744Threaded 745 Denotes a multithreaded mutator; the collector must still stop the mutator 746 ("stop the world") before beginning reachability analysis. Stopping a 747 multithreaded mutator is a complicated problem. It generally requires highly 748 platform-specific code in the runtime, and the production of carefully 749 designed machine code at safe points. 750 751Concurrent 752 In this technique, the mutator and the collector run concurrently, with the 753 goal of eliminating pause times. In a *cooperative* collector, the mutator 754 further aids with collection should a pause occur, allowing collection to take 755 advantage of multiprocessor hosts. The "stop the world" problem of threaded 756 collectors is generally still present to a limited extent. Sophisticated 757 marking algorithms are necessary. Read barriers may be necessary. 758 759As the matrix indicates, LLVM's garbage collection infrastructure is already 760suitable for a wide variety of collectors, but does not currently extend to 761multithreaded programs. This will be added in the future as there is 762interest. 763 764.. _stack-map: 765 766Computing stack maps 767-------------------- 768 769LLVM automatically computes a stack map. One of the most important features 770of a ``GCStrategy`` is to compile this information into the executable in 771the binary representation expected by the runtime library. 772 773The stack map consists of the location and identity of each GC root in the 774each function in the module. For each root: 775 776* ``RootNum``: The index of the root. 777 778* ``StackOffset``: The offset of the object relative to the frame pointer. 779 780* ``RootMetadata``: The value passed as the ``%metadata`` parameter to the 781 ``@llvm.gcroot`` intrinsic. 782 783Also, for the function as a whole: 784 785* ``getFrameSize()``: The overall size of the function's initial stack frame, 786 not accounting for any dynamic allocation. 787 788* ``roots_size()``: The count of roots in the function. 789 790To access the stack map, use ``GCFunctionMetadata::roots_begin()`` and 791-``end()`` from the :ref:`GCMetadataPrinter <assembly>`: 792 793.. code-block:: c++ 794 795 for (iterator I = begin(), E = end(); I != E; ++I) { 796 GCFunctionInfo *FI = *I; 797 unsigned FrameSize = FI->getFrameSize(); 798 size_t RootCount = FI->roots_size(); 799 800 for (GCFunctionInfo::roots_iterator RI = FI->roots_begin(), 801 RE = FI->roots_end(); 802 RI != RE; ++RI) { 803 int RootNum = RI->Num; 804 int RootStackOffset = RI->StackOffset; 805 Constant *RootMetadata = RI->Metadata; 806 } 807 } 808 809If the ``llvm.gcroot`` intrinsic is eliminated before code generation by a 810custom lowering pass, LLVM will compute an empty stack map. This may be useful 811for collector plugins which implement reference counting or a shadow stack. 812 813.. _init-roots: 814 815Initializing roots to null: ``InitRoots`` 816----------------------------------------- 817 818.. code-block:: c++ 819 820 MyGC::MyGC() { 821 InitRoots = true; 822 } 823 824When set, LLVM will automatically initialize each root to ``null`` upon entry to 825the function. This prevents the GC's sweep phase from visiting uninitialized 826pointers, which will almost certainly cause it to crash. This initialization 827occurs before custom lowering, so the two may be used together. 828 829Since LLVM does not yet compute liveness information, there is no means of 830distinguishing an uninitialized stack root from an initialized one. Therefore, 831this feature should be used by all GC plugins. It is enabled by default. 832 833Custom lowering of intrinsics: ``CustomRoots``, ``CustomReadBarriers``, and ``CustomWriteBarriers`` 834--------------------------------------------------------------------------------------------------- 835 836For GCs which use barriers or unusual treatment of stack roots, these 837flags allow the collector to perform arbitrary transformations of the 838LLVM IR: 839 840.. code-block:: c++ 841 842 class MyGC : public GCStrategy { 843 public: 844 MyGC() { 845 CustomRoots = true; 846 CustomReadBarriers = true; 847 CustomWriteBarriers = true; 848 } 849 }; 850 851If any of these flags are set, LLVM suppresses its default lowering for 852the corresponding intrinsics. Instead, you must provide a custom Pass 853which lowers the intrinsics as desired. If you have opted in to custom 854lowering of a particular intrinsic your pass **must** eliminate all 855instances of the corresponding intrinsic in functions which opt in to 856your GC. The best example of such a pass is the ShadowStackGC and it's 857ShadowStackGCLowering pass. 858 859There is currently no way to register such a custom lowering pass 860without building a custom copy of LLVM. 861 862.. _safe-points: 863 864Generating safe points: ``NeededSafePoints`` 865-------------------------------------------- 866 867LLVM can compute four kinds of safe points: 868 869.. code-block:: c++ 870 871 namespace GC { 872 /// PointKind - The type of a collector-safe point. 873 /// 874 enum PointKind { 875 Loop, //< Instr is a loop (backwards branch). 876 Return, //< Instr is a return instruction. 877 PreCall, //< Instr is a call instruction. 878 PostCall //< Instr is the return address of a call. 879 }; 880 } 881 882A collector can request any combination of the four by setting the 883``NeededSafePoints`` mask: 884 885.. code-block:: c++ 886 887 MyGC::MyGC() { 888 NeededSafePoints = 1 << GC::Loop 889 | 1 << GC::Return 890 | 1 << GC::PreCall 891 | 1 << GC::PostCall; 892 } 893 894It can then use the following routines to access safe points. 895 896.. code-block:: c++ 897 898 for (iterator I = begin(), E = end(); I != E; ++I) { 899 GCFunctionInfo *MD = *I; 900 size_t PointCount = MD->size(); 901 902 for (GCFunctionInfo::iterator PI = MD->begin(), 903 PE = MD->end(); PI != PE; ++PI) { 904 GC::PointKind PointKind = PI->Kind; 905 unsigned PointNum = PI->Num; 906 } 907 } 908 909Almost every collector requires ``PostCall`` safe points, since these correspond 910to the moments when the function is suspended during a call to a subroutine. 911 912Threaded programs generally require ``Loop`` safe points to guarantee that the 913application will reach a safe point within a bounded amount of time, even if it 914is executing a long-running loop which contains no function calls. 915 916Threaded collectors may also require ``Return`` and ``PreCall`` safe points to 917implement "stop the world" techniques using self-modifying code, where it is 918important that the program not exit the function without reaching a safe point 919(because only the topmost function has been patched). 920 921.. _assembly: 922 923Emitting assembly code: ``GCMetadataPrinter`` 924--------------------------------------------- 925 926LLVM allows a plugin to print arbitrary assembly code before and after the rest 927of a module's assembly code. At the end of the module, the GC can compile the 928LLVM stack map into assembly code. (At the beginning, this information is not 929yet computed.) 930 931Since AsmWriter and CodeGen are separate components of LLVM, a separate abstract 932base class and registry is provided for printing assembly code, the 933``GCMetadaPrinter`` and ``GCMetadataPrinterRegistry``. The AsmWriter will look 934for such a subclass if the ``GCStrategy`` sets ``UsesMetadata``: 935 936.. code-block:: c++ 937 938 MyGC::MyGC() { 939 UsesMetadata = true; 940 } 941 942This separation allows JIT-only clients to be smaller. 943 944Note that LLVM does not currently have analogous APIs to support code generation 945in the JIT, nor using the object writers. 946 947.. code-block:: c++ 948 949 // lib/MyGC/MyGCPrinter.cpp - Example LLVM GC printer 950 951 #include "llvm/CodeGen/GCMetadataPrinter.h" 952 #include "llvm/Support/Compiler.h" 953 954 using namespace llvm; 955 956 namespace { 957 class LLVM_LIBRARY_VISIBILITY MyGCPrinter : public GCMetadataPrinter { 958 public: 959 virtual void beginAssembly(AsmPrinter &AP); 960 961 virtual void finishAssembly(AsmPrinter &AP); 962 }; 963 964 GCMetadataPrinterRegistry::Add<MyGCPrinter> 965 X("mygc", "My bespoke garbage collector."); 966 } 967 968The collector should use ``AsmPrinter`` to print portable assembly code. The 969collector itself contains the stack map for the entire module, and may access 970the ``GCFunctionInfo`` using its own ``begin()`` and ``end()`` methods. Here's 971a realistic example: 972 973.. code-block:: c++ 974 975 #include "llvm/CodeGen/AsmPrinter.h" 976 #include "llvm/IR/Function.h" 977 #include "llvm/IR/DataLayout.h" 978 #include "llvm/Target/TargetAsmInfo.h" 979 #include "llvm/Target/TargetMachine.h" 980 981 void MyGCPrinter::beginAssembly(AsmPrinter &AP) { 982 // Nothing to do. 983 } 984 985 void MyGCPrinter::finishAssembly(AsmPrinter &AP) { 986 MCStreamer &OS = AP.OutStreamer; 987 unsigned IntPtrSize = AP.TM.getSubtargetImpl()->getDataLayout()->getPointerSize(); 988 989 // Put this in the data section. 990 OS.SwitchSection(AP.getObjFileLowering().getDataSection()); 991 992 // For each function... 993 for (iterator FI = begin(), FE = end(); FI != FE; ++FI) { 994 GCFunctionInfo &MD = **FI; 995 996 // A compact GC layout. Emit this data structure: 997 // 998 // struct { 999 // int32_t PointCount; 1000 // void *SafePointAddress[PointCount]; 1001 // int32_t StackFrameSize; // in words 1002 // int32_t StackArity; 1003 // int32_t LiveCount; 1004 // int32_t LiveOffsets[LiveCount]; 1005 // } __gcmap_<FUNCTIONNAME>; 1006 1007 // Align to address width. 1008 AP.EmitAlignment(IntPtrSize == 4 ? 2 : 3); 1009 1010 // Emit PointCount. 1011 OS.AddComment("safe point count"); 1012 AP.EmitInt32(MD.size()); 1013 1014 // And each safe point... 1015 for (GCFunctionInfo::iterator PI = MD.begin(), 1016 PE = MD.end(); PI != PE; ++PI) { 1017 // Emit the address of the safe point. 1018 OS.AddComment("safe point address"); 1019 MCSymbol *Label = PI->Label; 1020 AP.EmitLabelPlusOffset(Label/*Hi*/, 0/*Offset*/, 4/*Size*/); 1021 } 1022 1023 // Stack information never change in safe points! Only print info from the 1024 // first call-site. 1025 GCFunctionInfo::iterator PI = MD.begin(); 1026 1027 // Emit the stack frame size. 1028 OS.AddComment("stack frame size (in words)"); 1029 AP.EmitInt32(MD.getFrameSize() / IntPtrSize); 1030 1031 // Emit stack arity, i.e. the number of stacked arguments. 1032 unsigned RegisteredArgs = IntPtrSize == 4 ? 5 : 6; 1033 unsigned StackArity = MD.getFunction().arg_size() > RegisteredArgs ? 1034 MD.getFunction().arg_size() - RegisteredArgs : 0; 1035 OS.AddComment("stack arity"); 1036 AP.EmitInt32(StackArity); 1037 1038 // Emit the number of live roots in the function. 1039 OS.AddComment("live root count"); 1040 AP.EmitInt32(MD.live_size(PI)); 1041 1042 // And for each live root... 1043 for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI), 1044 LE = MD.live_end(PI); 1045 LI != LE; ++LI) { 1046 // Emit live root's offset within the stack frame. 1047 OS.AddComment("stack index (offset / wordsize)"); 1048 AP.EmitInt32(LI->StackOffset); 1049 } 1050 } 1051 } 1052 1053References 1054========== 1055 1056.. _appel89: 1057 1058[Appel89] Runtime Tags Aren't Necessary. Andrew W. Appel. Lisp and Symbolic 1059Computation 19(7):703-705, July 1989. 1060 1061.. _goldberg91: 1062 1063[Goldberg91] Tag-free garbage collection for strongly typed programming 1064languages. Benjamin Goldberg. ACM SIGPLAN PLDI'91. 1065 1066.. _tolmach94: 1067 1068[Tolmach94] Tag-free garbage collection using explicit type parameters. Andrew 1069Tolmach. Proceedings of the 1994 ACM conference on LISP and functional 1070programming. 1071 1072.. _henderson02: 1073 1074[Henderson2002] `Accurate Garbage Collection in an Uncooperative Environment 1075<http://citeseer.ist.psu.edu/henderson02accurate.html>`__ 1076