==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory.  This feature is primarily required for
compatibility with the Microsoft C++ ABI.  Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory.  Before inalloca was added, calls in LLVM were
indivisible instructions.  There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer.  With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to before
the call.  Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
    void g(Foo a, Foo b);
    void f() {
      g(Foo(), Foo());
    }

.. code-block:: llvm

    %struct.Foo = type { i32, i32 }
    declare void @Foo_ctor(%struct.Foo* %this)
    declare void @Foo_dtor(%struct.Foo* %this)
    declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
    declare i8* @llvm.stacksave()
    declare void @llvm.stackrestore(i8*)

    define void @f() {
    entry:
      ; Save the stack pointer so the argument memory can be released later.
      %base = call i8* @llvm.stacksave()
      %memargs = alloca <{ %struct.Foo, %struct.Foo }>

      ; Construct b (field 1 of the argument pack) first.
      %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
      call void @Foo_ctor(%struct.Foo* %b)

      ; If a's ctor throws, we must destruct b.
      %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
      invoke void @Foo_ctor(%struct.Foo* %a)
          to label %invoke.cont unwind label %invoke.unwind

    invoke.cont:
      ; g destroys its arguments; f only releases the argument memory.
      call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
      call void @llvm.stackrestore(i8* %base)
      ...

    invoke.unwind:
      ; The landing pad instruction is elided.
      call void @Foo_dtor(%struct.Foo* %b)
      call void @llvm.stackrestore(i8* %base)
      ...
    }

To avoid stack leaks, the frontend saves the current stack pointer with
a call to :ref:`llvm.stacksave <int_stacksave>`.  Then, it allocates the
argument stack space with alloca and calls the default constructor for
each argument, starting with ``b``.  The constructor for ``a`` could
throw an exception, so the frontend has to create a landing pad, which
must destroy the already constructed argument ``b`` before restoring the
stack pointer.  If the constructor does not unwind, ``g`` is called.  In
the Microsoft C++ ABI, ``g`` will destroy its arguments, and then the
stack is restored in ``f``.
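
The same sequence can be mimicked in plain C++ as a rough sketch.  It
assumes a hypothetical ``SimStack`` type whose ``save``, ``allocate``
and ``restore`` members stand in for ``llvm.stacksave``, the argument
alloca and ``llvm.stackrestore``, and it uses placement new to construct
the arguments directly in that memory.  The ``live`` counter and the
``fail_at`` hook on ``Foo`` are illustrative additions, not part of the
original example.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <new>
    #include <stdexcept>

    // Hypothetical stand-in for argument stack memory (not an LLVM API).
    struct SimStack {
      alignas(16) unsigned char mem[256];
      std::size_t sp = sizeof(mem);                    // offset of the stack top
      std::size_t save() const { return sp; }          // ~ llvm.stacksave
      unsigned char *allocate(std::size_t n) {         // ~ the argument alloca
        assert(n <= sp && "simulated stack overflow");
        sp -= n;
        return mem + sp;
      }
      void restore(std::size_t saved) { sp = saved; }  // ~ llvm.stackrestore
    };

    // Non-trivial Foo that counts live objects; the fail_at-th constructor
    // call throws (0 means never).
    struct Foo {
      static int live, fail_at;
      int a = 0, b = 0;
      Foo() {
        if (fail_at > 0 && --fail_at == 0) throw std::runtime_error("Foo() failed");
        ++live;
      }
      ~Foo() { --live; }
    };
    int Foo::live = 0;
    int Foo::fail_at = 0;

    // Microsoft C++ ABI: the callee destroys its by-value arguments.
    void g(Foo *a, Foo *b) { a->~Foo(); b->~Foo(); }

    void f(SimStack &st) {
      std::size_t base = st.save();
      unsigned char *memargs = st.allocate(2 * sizeof(Foo));  // <{ Foo, Foo }>
      Foo *b = new (memargs + sizeof(Foo)) Foo();             // b is constructed first
      Foo *a = nullptr;
      try {
        a = new (memargs) Foo();                              // may throw
      } catch (...) {
        b->~Foo();          // landing pad: destroy the constructed argument,
        st.restore(base);   // then release the argument memory
        throw;
      }
      g(a, b);              // g destroys a and b
      st.restore(base);     // f releases the argument memory
    }

Running ``f`` twice, once normally and once with ``Foo::fail_at = 2`` so
that ``a``'s constructor throws, leaves ``Foo::live`` at zero and ``sp``
at its starting value in both cases.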

Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory at the top of the stack to pass
arguments.  We cannot vend pointers to that memory at function entry,
because after code generation the pointers for different calls would
alias.

The rule forbidding other allocas between the argument allocations and
the call site avoids this problem, but it creates a cleanup problem.
Cleanup and lifetime are handled explicitly with stack save and restore
calls.  In the future, we may want to introduce a new construct such as
``freea`` or ``afree`` to make it clear that this stack-adjusting
cleanup is less powerful than a full stack save and restore.
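
A toy model makes the aliasing concrete.  The sketch below assumes a
hypothetical ``SimStack`` that tracks only a stack-pointer offset.  Each
call's argument memory is taken from the top of the stack and released
with a restore, so two back-to-back calls end up using the same address,
whereas two static allocas created in the entry block would have to be
distinct objects.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <utility>

    // Hypothetical model of a downward-growing stack (not an LLVM API).
    struct SimStack {
      std::size_t sp = 1024;                           // offset of the stack top
      std::size_t save() const { return sp; }          // ~ llvm.stacksave
      std::size_t allocate(std::size_t n) {            // ~ argument alloca
        assert(n <= sp && "simulated stack overflow");
        sp -= n;
        return sp;
      }
      void restore(std::size_t saved) { sp = saved; }  // ~ llvm.stackrestore
    };

    // Two back-to-back calls, each passing `bytes` of arguments in memory.
    // Returns the address where each call's argument memory was placed.
    std::pair<std::size_t, std::size_t> two_calls(SimStack &st, std::size_t bytes) {
      std::size_t base = st.save();
      std::size_t first = st.allocate(bytes);   // first call's arguments
      // ... the first call happens here ...
      st.restore(base);                         // first argument memory is dead
      std::size_t second = st.allocate(bytes);  // second call's arguments
      // ... the second call happens here ...
      st.restore(base);
      return {first, second};
    }

For instance, ``two_calls(st, 8)`` on a fresh ``SimStack`` returns a
pair of equal addresses and leaves ``st.sp`` unchanged.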

Nested Calls and Copy Elision
-----------------------------

We also want to be able to support copy elision into these argument
slots.  This means we have to support multiple live argument
allocations.

Consider the evaluation of:

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
    Foo bar(Foo b);
    int main() {
      bar(bar(Foo()));
    }

In this case, we want to be able to elide copies into ``bar``'s argument
slots.  That means we need to have more than one set of argument frames
active at the same time.  First, we need to allocate the frame for the
outer call so we can pass it in as the hidden struct return pointer to
the middle call.  Then we do the same for the middle call, allocating a
frame and passing its address to ``Foo``'s default constructor.  By
wrapping the evaluation of the inner ``bar`` with stack save and
restore, we can have multiple overlapping active call frames.
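
The frame bookkeeping for ``bar(bar(Foo()))`` can be traced with the
same kind of toy stack.  In this sketch ``SimStack``, ``NestedTrace``
and ``lower_nested`` are hypothetical; only the stack offsets are
modelled, and comments mark where the constructor and the calls would
run.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>

    // Hypothetical model of a downward-growing stack (not an LLVM API).
    struct SimStack {
      std::size_t sp = 1024;                           // offset of the stack top
      std::size_t save() const { return sp; }          // ~ llvm.stacksave
      std::size_t allocate(std::size_t n) {            // ~ argument alloca
        assert(n <= sp && "simulated stack overflow");
        sp -= n;
        return sp;
      }
      void restore(std::size_t saved) { sp = saved; }  // ~ llvm.stackrestore
    };

    struct NestedTrace {
      std::size_t outer, middle, sp_after_middle, sp_after_outer;
    };

    // Frame bookkeeping for bar(bar(Foo())) with copy elision.
    NestedTrace lower_nested(SimStack &st, std::size_t foo_size) {
      NestedTrace t{};
      std::size_t outer_base = st.save();
      t.outer = st.allocate(foo_size);    // argument slot of the outer bar
      std::size_t middle_base = st.save();
      t.middle = st.allocate(foo_size);   // argument slot of the middle bar
      // Here Foo() would be constructed at t.middle, and the middle bar would
      // be called with t.outer as its hidden struct return pointer.
      st.restore(middle_base);            // middle frame released, outer still live
      t.sp_after_middle = st.sp;
      // Here the outer bar would be called with its frame at t.outer.
      st.restore(outer_base);
      t.sp_after_outer = st.sp;
      return t;
    }

With ``foo_size = 4``, the middle frame sits four bytes below the outer
one, the first restore leaves the stack pointer exactly at the outer
frame, and the second returns it to its starting value.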

Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions.  On
32-bit Windows, non-variadic methods and many other functions adjust the
stack to clear the memory used to pass their arguments.  In some sense,
this means that the allocas are automatically cleared by the call.
However, rather than modeling a stack adjustment, LLVM models this as a
write of undef to all of the inalloca values passed to the call.
Frontends should still restore the stack pointer to avoid a stack leak.
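
The following sketch contrasts two ways a caller could release argument
memory.  It assumes a hypothetical ``SimStack`` and models a
callee-cleanup return (for example, x86 ``ret N``) as the callee adding
the argument size back to the stack pointer.  Restoring to a saved value
gives the same result under either convention, while popping a fixed
amount after a callee-cleanup call pops the arguments twice.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>

    // Hypothetical model of a downward-growing stack (not an LLVM API).
    struct SimStack {
      std::size_t sp = 1024;                           // offset of the stack top
      std::size_t save() const { return sp; }          // ~ llvm.stacksave
      std::size_t allocate(std::size_t n) {            // ~ argument alloca
        assert(n <= sp && "simulated stack overflow");
        sp -= n;
        return sp;
      }
      void restore(std::size_t saved) { sp = saved; }  // ~ llvm.stackrestore
    };

    enum class Convention { CallerCleanup, CalleeCleanup };

    // Effect of the callee's return on the stack pointer.
    void callee_return(SimStack &st, std::size_t arg_bytes, Convention cc) {
      if (cc == Convention::CalleeCleanup)
        st.sp += arg_bytes;  // callee pops its own arguments, like x86 `ret N`
    }

    // Save/restore lowering: correct whichever convention the callee uses.
    std::size_t call_with_restore(SimStack &st, std::size_t arg_bytes, Convention cc) {
      std::size_t base = st.save();
      st.allocate(arg_bytes);
      callee_return(st, arg_bytes, cc);
      st.restore(base);
      return st.sp;
    }

    // Fixed-pop lowering, shown for contrast.
    std::size_t call_with_fixed_pop(SimStack &st, std::size_t arg_bytes, Convention cc) {
      st.allocate(arg_bytes);
      callee_return(st, arg_bytes, cc);
      st.sp += arg_bytes;  // pops the arguments a second time for CalleeCleanup
      return st.sp;
    }

Starting from an offset of 1024 with 8 bytes of arguments,
``call_with_restore`` returns 1024 for both conventions, while
``call_with_fixed_pop`` returns 1032 for ``CalleeCleanup``.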

Exceptions
----------

There is also the possibility of an exception.  If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak.  This means the cleanup of the stack memory cannot be tied to the
call itself.  There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct.  In particular, using inalloca should not require a base
pointer.  If the backend can prove that every point in the CFG has only
one possible stack level, then it can address the stack directly from
the stack pointer.  While this is not yet implemented, the plan is that
the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.
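
The proof the backend would need can be phrased as a forward dataflow
problem.  The sketch below is a hypothetical simplification, not LLVM's
implementation: each basic block is reduced to a net change in stack
level, and the analysis reports whether every reachable block is entered
at a single level.  A loop that allocates on every iteration, or two
predecessors that disagree, both make it return false.

.. code-block:: c++

    #include <cassert>
    #include <deque>
    #include <optional>
    #include <vector>

    // Hypothetical CFG model (not LLVM's data structures): each block changes
    // the stack level by a net `delta` (allocas grow it, restores shrink it).
    struct Block {
      int delta;
      std::vector<int> succs;
    };

    // Forward propagation of the stack level from the entry block. Returns true
    // if every reachable block is entered at exactly one stack level.
    bool single_stack_level(const std::vector<Block> &cfg, int entry) {
      assert(entry >= 0 && entry < static_cast<int>(cfg.size()));
      std::vector<std::optional<int>> level(cfg.size());
      std::deque<int> work{entry};
      level[entry] = 0;
      while (!work.empty()) {
        int b = work.front();
        work.pop_front();
        int out = *level[b] + cfg[b].delta;
        for (int s : cfg[b].succs) {
          if (!level[s]) {
            level[s] = out;  // first time s is reached
            work.push_back(s);
          } else if (*level[s] != out) {
            return false;    // two different stack levels meet at s
          }
        }
      }
      return true;
    }

For example, a diamond whose arms leave the level unchanged passes, while
one in which a single arm allocates 16 bytes without a matching restore
fails.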
161