==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory. Primarily, this feature is required for
compatibility with the Microsoft C++ ABI. Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory. Prior to the addition of inalloca, calls in LLVM
were indivisible instructions. There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer. With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to prior to
the call. Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.

.. code-block:: c++

  // Foo is non-trivial.
  struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
  void g(Foo a, Foo b);
  void f() {
    g(Foo(), Foo());
  }

.. code-block:: llvm

  %struct.Foo = type { i32, i32 }
  declare void @Foo_ctor(%struct.Foo* %this)
  declare void @Foo_dtor(%struct.Foo* %this)
  declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
  declare i8* @llvm.stacksave()
  declare void @llvm.stackrestore(i8*)

  define void @f() {
  entry:
    %base = call i8* @llvm.stacksave()
    %memargs = alloca <{ %struct.Foo, %struct.Foo }>
    ; Field 1 of the packed struct is the second argument, b.
    %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
    call void @Foo_ctor(%struct.Foo* %b)

    ; If a's ctor throws, we must destruct b.
    %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
    invoke void @Foo_ctor(%struct.Foo* %a)
        to label %invoke.cont unwind label %invoke.unwind

  invoke.cont:
    call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
    call void @llvm.stackrestore(i8* %base)
    ...

  invoke.unwind:
    call void @Foo_dtor(%struct.Foo* %b)
    call void @llvm.stackrestore(i8* %base)
    ...
  }

To avoid stack leaks, the frontend saves the current stack pointer with
a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the
argument stack space with alloca and calls the default constructor for
``b``. The default constructor for ``a`` could throw an exception, so
the frontend has to create a landing pad, which must destroy the already
constructed argument ``b`` before restoring the stack pointer. If the
constructor does not unwind, ``g`` is called. In the Microsoft C++ ABI,
``g`` will destroy its arguments, and then the stack is restored in
``f``.

Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory at the top of the stack to pass
arguments. We cannot vend pointers to that memory at function entry
because, after code generation, the argument memory of different calls
occupies the same stack addresses, so those pointers would alias.
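
As a rough illustration, the following sketch models a downward-growing
stack in plain C++ and mirrors the control flow of ``@f``. ``ToyStack``,
``Result``, and ``f_model`` are hypothetical names invented for this
sketch, not LLVM APIs; it also assumes that
``<{ %struct.Foo, %struct.Foo }>`` occupies 16 bytes with no padding.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <stdexcept>

  // Hypothetical toy model of a downward-growing machine stack.
  // Not an LLVM API; it only mimics llvm.stacksave / alloca / llvm.stackrestore.
  class ToyStack {
  public:
    explicit ToyStack(std::size_t top) : sp_(top) {}
    std::size_t save() const { return sp_; }          // like llvm.stacksave
    void restore(std::size_t saved) { sp_ = saved; }  // like llvm.stackrestore
    std::size_t allocate(std::size_t bytes) {         // dynamic alloca at the top
      assert(bytes <= sp_ && "toy stack overflow");
      sp_ -= bytes;
      return sp_;                                     // address of the new memory
    }
    std::size_t sp() const { return sp_; }
  private:
    std::size_t sp_;
  };

  struct Result {
    std::size_t args_addr;  // where this call's argument memory lived
    bool threw;             // true if the constructor of a "threw"
  };

  // Mirrors @f. `live` counts Foo objects constructed but not yet destroyed.
  Result f_model(ToyStack &st, bool a_ctor_throws, int &live) {
    const std::size_t base = st.save();           // %base = llvm.stacksave
    const std::size_t memargs = st.allocate(16);  // %memargs = alloca <{ Foo, Foo }>
    ++live;                                       // Foo_ctor(%b)
    try {
      if (a_ctor_throws)                          // invoke Foo_ctor(%a)
        throw std::runtime_error("Foo() threw");
      ++live;
    } catch (const std::runtime_error &) {        // invoke.unwind
      --live;                                     // Foo_dtor(%b)
      st.restore(base);                           // llvm.stackrestore
      return {memargs, true};
    }
    live -= 2;                                    // g destroys both arguments
    st.restore(base);                             // llvm.stackrestore
    return {memargs, false};
  }

Calling ``f_model`` twice on the same ``ToyStack`` hands out the same
argument address both times, because each call's argument memory sits at
whatever the stack top is when that call is made; a pointer to it taken
at function entry would therefore alias another call's arguments. Both
the normal path and the unwind path leave ``sp()`` at its saved value.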

The rule against allocas between argument allocations and the call site
avoids the aliasing problem, but it creates a cleanup problem. Cleanup
and lifetime are handled explicitly with stack save and restore calls.
In the future, we may want to introduce a new construct such as
``freea`` or ``afree`` to make it clear that this stack-adjusting
cleanup is less powerful than a full stack save and restore.

Nested Calls and Copy Elision
-----------------------------

We also want to be able to support copy elision into these argument
slots. This means we have to support multiple live argument
allocations.

Consider the evaluation of:

.. code-block:: c++

  // Foo is non-trivial.
  struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
  Foo bar(Foo b);
  int main() {
    bar(bar(Foo()));
  }

In this case, we want to be able to elide copies into ``bar``'s argument
slots. That means we need to have more than one set of argument frames
active at the same time. First, we need to allocate the frame for the
outer call so we can pass it in as the hidden struct return pointer to
the middle call. Then we do the same for the middle call, allocating a
frame and passing its address to ``Foo``'s default constructor. By
wrapping the evaluation of the inner ``bar`` call with stack save and
restore, we can have multiple overlapping active call frames.

Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions. On
Windows, all methods and many other functions adjust the stack to clear
the memory used to pass their arguments. In some sense, this means that
the allocas are automatically cleared by the call. However, rather than
modelling this as a stack adjustment, LLVM models it as a write of undef
to all of the inalloca values passed to the call.
Frontends should still restore the stack pointer to avoid a stack leak.

Exceptions
----------

There is also the possibility of an exception. If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak. This means the cleanup of the stack memory cannot be tied to the
call itself. There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct. In particular, using inalloca should not require a base
pointer. If the backend can prove that every point in the CFG has
exactly one possible stack level, then it can address the stack directly
from the stack pointer. While this is not yet implemented, the plan is
that the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.
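
The single-stack-level condition above can be checked with a simple
forward propagation. The sketch below is a hypothetical illustration
over a toy CFG in which each block is summarized by one net stack
adjustment; ``Block`` and ``unique_stack_levels`` are invented names, not
LLVM data structures or its actual implementation. It assumes block 0 is
the entry and ignores where inside a block the adjustment happens.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <optional>
  #include <vector>

  // Hypothetical toy CFG block, not an LLVM data structure.
  struct Block {
    long delta;                      // net bytes allocated (+) or released (-)
    std::vector<std::size_t> succs;  // indices of successor blocks
  };

  // Returns the stack depth on entry to each block (block 0 is the entry, at
  // depth 0) if every reachable block has exactly one possible depth, or
  // std::nullopt if some block can be entered at two different depths, in
  // which case the stack cannot be addressed from the stack pointer alone.
  // Unreachable blocks are reported with depth 0.
  std::optional<std::vector<long>> unique_stack_levels(const std::vector<Block> &cfg) {
    if (cfg.empty())
      return std::vector<long>{};
    std::vector<std::optional<long>> level(cfg.size());
    std::vector<std::size_t> worklist{0};
    level[0] = 0;
    while (!worklist.empty()) {
      std::size_t b = worklist.back();
      worklist.pop_back();
      long out = *level[b] + cfg[b].delta;  // depth on exit from block b
      for (std::size_t s : cfg[b].succs) {
        assert(s < cfg.size() && "successor index out of range");
        if (!level[s]) {
          level[s] = out;                   // first time s is reached
          worklist.push_back(s);
        } else if (*level[s] != out) {
          return std::nullopt;              // two different incoming depths
        }
      }
    }
    std::vector<long> result(cfg.size(), 0);
    for (std::size_t i = 0; i < cfg.size(); ++i)
      if (level[i])
        result[i] = *level[i];
    return result;
  }

Under this model, a lowering shaped like ``@f``, where both successors of
the entry block undo its 16-byte allocation before joining, yields a
unique level everywhere. A path that skips the restore, or a loop that
allocates on every iteration, reaches some join at two different depths,
and a base pointer would still be needed.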