1<!--===- docs/RuntimeDescriptor.md
2
3   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4   See https://llvm.org/LICENSE.txt for license information.
5   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6
7-->
8
9# Runtime Descriptors
10
11```eval_rst
12.. contents::
13   :local:
14```
15
16## Concept
17The properties that characterize data values and objects in Fortran
18programs must sometimes be materialized when the program runs.
19
20Some properties are known during compilation and constant during
21execution, yet must be reified anyway for execution in order to
22drive the interfaces of a language support library or the mandated
23interfaces of interoperable (i.e., C) procedure calls.
24
25Note that many Fortran intrinsic subprograms have interfaces
26that are more flexible and generic than actual Fortran subprograms
27can be, so properties that must be known during compilation and
28are constant during execution may still need to be materialized
29for calls to the library, even if only by modifying names to
30distinguish types or their kind specializations.
31
32Other properties are deferred to execution, and need to be represented
33to serve the needs of compiled code and the run time support library.
34
35Previous implementations of Fortran have typically defined a small
36sheaf of _descriptor_ data structures for this purpose, and attached
37these descriptors as additional hidden arguments, type components,
38and local variables so as to convey dynamic characteristics between
39subprograms and between user code and the run-time support library.
40
41### References
42References are to the 12-2017 draft of the Fortran 2018 standard
43(N2146).
44
45Section 15.4.2.2 can be interpreted as a decent list of things that
46might need descriptors or other hidden state passed across a
47subprogram call, since such features (apart from assumed-length
48`CHARACTER` function results) trigger a requirement for the
49subprogram to have an explicit interface visible to their callers.
50
51Section 15.5.2 has good laundry lists of situations that can arise
52across subprogram call boundaries.
53
54## A survey of dynamic characteristics
55
56### Length of assumed-length `CHARACTER` function results (B.3.6)
57```
58CHARACTER*8 :: FOO
59PRINT *, FOO('abcdefghijklmnopqrstuvwxyz')
60...
61CHARACTER*(*) FUNCTION FOO(STR)
62  CHARACTER*26 STR
63  FOO=STR
64END
65```
66
67prints `abcdefgh` because the length parameter of the character type
68of the result of `FOO` is passed across the call -- even in the absence
69of an explicit interface!
70
71### Assumed length type parameters (7.2)
72Dummy arguments and associate names for `SELECT TYPE` can have assumed length
73type parameters, which are denoted by asterisks (not colons).
74Their values come from actual arguments or the associated expression (resp.).
75
76### Explicit-shape arrays (8.5.8.2)
77The expressions used for lower and upper bounds must be captured and remain
78invariant over the scope of an array, even if they contain references to
79variables that are later modified.
80
81Explicit-shape arrays can be dummy arguments, "adjustable" local variables,
82and components of derived type (using specification expressions in terms
83of constants and KIND type parameters).
84
85### Leading dimensions of assumed-size arrays (8.5.8.5)
86```
87SUBROUTINE BAR(A)
88  REAL A(2,3,*)
89END
90```
91The total size and final dimension's extent do not constitute dynamic
92properties.
93The called subprogram has no means to extract the extent of the
94last (major) dimension, and may not depend upon it implicitly by using
95the array in any context that demands a known shape.
96
97The values of the expressions used as the bounds of the dimensions
98that appear prior to
99the last dimension are, however, effectively captured on entry to the
100subprogram, and remain invariant even if the variables that appear in
101those expressions have their values modified later.
102This is similar to the requirements for an explicit-shape array.
103
104### Some function results
1051. Deferred-shape
1062. Deferred length type parameter values
1073. Stride information for `POINTER` results
108
109Note that while function result variables can have the `ALLOCATABLE`
110attribute, the function itself and the value returned to the caller
111do not possess the attribute.
112
113### Assumed-shape arrays
114The extents of the dimensions of assumed-shape dummy argument arrays
115are conveyed from those of the actual effective arguments.
116The bounds, however, are not.  The called subprogram can define the
117lower bound to be a value other than 1, but that is a local effect
118only.
119
120### Deferred-shape arrays
121The extents and bounds of `POINTER` and `ALLOCATABLE` arrays are
122established by pointer assignments and `ALLOCATE` statements.
123Note that dummy arguments and function results that are `POINTER`
124or `ALLOCATABLE` can be deferred-shape, not assumed-shape -- one cannot
125supply a lower bound expression as a local effect.
126
127### Strides
128Some arrays can have discontiguous (or negative) strides.
129These include assumed-shape dummy arguments and deferred-shape
130`POINTER` variables, components, and function results.
131
132Fortran disallows some conceivable cases that might otherwise
133require implied strides, such as passing an array of an extended
134derived type as an actual argument that corresponds to a
135nonpolymorphic dummy array of a base type, or the similar
136case of pointer assignment to a base of an extended derived type.
137
138Other arrays, including `ALLOCATABLE`, can be assured to
139be contiguous, and do not necessarily need to manage or
140convey dynamic stride information.
141`CONTIGUOUS` dummy arguments and `POINTER` arrays need not
142record stride information either.
143(The standard notes that a `CONTIGUOUS POINTER` occupies a
144number of storage units that is distinct from that required
145to hold a non-`CONTIGUOUS` pointer.)
146
147Note that Fortran distinguishes the `CONTIGUOUS` attribute from
148the concept of being known or required to be _simply contiguous_ (9.5.4),
149which includes `CONTIGUOUS` entities as well as many others, and
150the concept of actually _being_ contiguous (8.5.7) during execution.
151I believe that the property of being simply contiguous implies
152that an entity is known at compilation time to not require the
153use or maintenance of hidden stride values.
154
155### Derived type component initializers
156Fortran allows components of derived types to be declared with
157initial values that are to be assigned to the components when an
158instance of the derived type is created.
159These include `ALLOCATABLE` components, which are always initialized
160to a deallocated state.
161
162These can be implemented with constructor subroutines, inline
163stores or block copies from static initializer blocks, or a sequence
164of sparse offset/size/value component initializers to be emplaced
165by the run-time library.
166
167N.B. Fortran allows kind type parameters to appear in component
168initialization constant expressions, but not length type parameters,
169so the initialization values are constants.
170
171N.B. Initialization is not assignment, and cannot be implemented
172with assignments to uninitialized derived type instances from
173static constant initializers.
174
175### Polymorphic `CLASS()`, `CLASS(*)`, and `TYPE(*)`
176Type identification for `SELECT TYPE`.
177Default initializers (see above).
178Offset locations of `ALLOCATABLE` and polymorphic components.
179Presence of `FINAL` procedures.
180Mappings to overridable type-bound specific procedures.
181
182### Deferred length type parameters
183Derived types with length type parameters, and `CHARACTER`, may be used
184with the values of those parameters deferred to execution.
185Their actual values must be maintained as characteristics of the dynamic
186type that is associated with a value or object
187.
188A single copy of the deferred length type parameters suffices for
189all of the elements of an array of that parameterized derived type.
190
191### Components whose types and/or shape depends on length type parameters
192Non-pointer, non-allocatable components whose types or shapes are expressed
193in terms of length type parameters will probably have to be implemented as
194if they had deferred type and/or shape and were `ALLOCATABLE`.
195The derived type instance constructor must allocate them and possibly
196initialize them; the instance destructor must deallocate them.
197
198### Assumed rank arrays
199Rank is almost always known at compilation time and would be redundant
200in most circumstances if also managed dynamically.
201`DIMENSION(..)` dummy arguments (8.5.8.7), however, are a recent feature
202with which the rank of a whole array is dynamic outside the cases of
203a `SELECT RANK` construct.
204
205The lower bounds of the dimensions of assumed rank arrays
206are always 1.
207
208### Cached invariant subexpressions for addressing
209Implementations of Fortran have often maintained precalculated integer
210values to accelerate subscript computations.
211For example, given `REAL*8 :: A(2:4,3:5)`, the data reference `A(I,J)`
212resolves to something like `&A + 8*((I-2)+3*(J-3))`, and this can be
213effectively reassociated to `&A - 88 + 8*I + 24*J`
214or `&A - 88 + 8*(I + 3*J)`.
215When the offset term and coefficients are not compile-time constants,
216they are at least invariant and can be precomputed.
217
218In the cases of dummy argument arrays, `POINTER`, and `ALLOCATABLE`,
219these addressing invariants could be managed alongside other dynamic
220information like deferred extents and lower bounds to avoid their
221recalculation.
222It's not clear that it's worth the trouble to do so, since the
223expressions are invariant and cheap.
224
225### Coarray state (8.5.6)
226A _coarray_ is an `ALLOCATABLE` variable or component, or statically
227allocated variable (`SAVE` attribute explicit or implied), or dummy
228argument whose ultimate effective argument is one of such things.
229
230Each image in a team maintains its portion of each coarray and can
231access those portions of the coarray that are maintained by other images
232in the team.
233Allocations and deallocations are synchronization events at which
234the several images can exchange whatever information is needed by
235the underlying intercommunication interface to access the data
236of their peers.
237(Strictly speaking, an implementation could synchronize
238images at allocations and deallocations with simple barriers, and defer
239the communication of remote access information until it is needed for a
240given coarray on a given image, so long as it could be acquired in a
241"one-sided" fashion.)
242
243### Presence of `OPTIONAL` dummy arguments
244Typically indicated with null argument addresses.
245Note that `POINTER` and `ALLOCATABLE` objects can be passed to
246non-`POINTER` non-`ALLOCATABLE` dummy arguments, and their
247association or allocation status (resp.) determines the presence
248of the dummy argument.
249
250### Stronger contiguity enforcement or indication
251Some implementations of Fortran guarantee that dummy argument arrays
252are, or have been made to be, contiguous on one or more dimensions
253when the language does not require them to be so (8.5.7 p2).
254Others pass a flag to identify contiguous arrays (or could pass the
255number of contiguous leading dimensions, although I know of no such
256implementation) so that optimizing transformations that depend on
257contiguity can be made conditional with multiple-version code generation
258and selected during execution.
259
260In the absence of a contiguity guarantee or flag, the called side
261would have to determine contiguity dynamically, if it cares,
262by calculating addresses of elements in the array whose subscripts
263differ by exactly 1 on exactly 1 dimension of interest, and checking
264whether that difference exactly matches the byte size of the type times
265the product of the extents of any prior dimensions.
266
267### Host instances for dummy procedures and procedure pointers
268A static link or other means of accessing the imported state of the
269host procedure must be available when an internal procedure is
270used as an actual argument or as a pointer assignment target.
271
272### Alternate returns
273Subroutines (only) with alternate return arguments need a
274means, such as the otherwise unused function return value, by which
275to distinguish and identify the use of an alternate `RETURN` statement.
276The protocol can be a simple nonzero integer that drives a switch
277in the caller, or the caller can pass multiple return addresses as
278arguments for the callee to substitute on the stack for the original
279return address in the event of an alternate `RETURN`.
280
281## Implementation options
282
283### A note on array descriptions
284Some arrays require dynamic management of distinct combinations of
285values per dimension.
286
287One can extract the extent on a dimension from its bounds, or extract
288the upper bound from the extent and the lower bound.  Having distinct
289extent and upper bound would be redundant.
290
291Contiguous arrays can assume a stride of 1 on each dimension.
292
293Assumed-shape and assumed-size dummy argument arrays need not convey
294lower bounds.
295
296So there are examples of dimensions with
297 * extent only (== upper bound): `CONTIGUOUS` assumed-shape, explict shape and multidimensional assumed-size with constant lower bound
298 * lower bound and either extent or upper bound: `ALLOCATABLE`, `CONTIGUOUS` `POINTER`, general explicit-shape and multidimensional assumed-size
299 * extent (== upper bound) and stride: general (non-`CONTIGUOUS`) assumed-shape
300 * lower bound, stride, and either extent or upper bound: general (non-`CONTIGUOUS`) `POINTER`, assumed-rank
301
302and these cases could be accompanied by precomputed invariant
303addressing subexpressions to accelerate indexing calculations.
304
305### Interoperability requirements
306
307Fortran 2018 requires that a Fortran implementation supply a header file
308`ISO_Fortran_binding.h` for use in C and C++ programs that defines and
309implements an interface to Fortran objects from the _interoperable_
310subset of Fortran objects and their types suitable for use when those
311objects are passed to C functions.
312This interface mandates a fat descriptor that is passed by address,
313containing (at least)
314 * a data base address
315 * explicit rank and type
316 * flags to distinguish `POINTER` and `ALLOCATABLE`
317 * elemental byte size, and
318 * (per-dimension) lower bound, extent, and byte stride
319
320The requirements on the interoperability API do not mandate any
321support for features like derived type component initialization,
322automatic deallocation of `ALLOCATABLE` components, finalization,
323derived type parameters, data contiguity flags, &c.
324But neither does the Standard preclude inclusion of additional
325interfaces to describe and support such things.
326
327Given a desire to fully support the Fortran 2018 language, we need
328to either support the interoperability requirements as a distinct
329specialization of the procedure call protocol, or use the
330`ISO_Fortran_binding.h` header file requirements as a subset basis for a
331complete implementation that adds representations for all the
332missing capabilities, which would be isolated and named so as
333to prevent user C code from relying upon them.
334
335### Design space
336There is a range of possible options for representing the
337properties of values and objects during the execution of Fortran
338programs.
339
340At one extreme, the amount of dynamic information is minimized,
341and is packaged in custom data structures or additional arguments
342for each situation to convey only the values that are unknown at
343compilation time and actually needed at execution time.
344
345At the other extreme, data values and objects are described completely,
346including even the values of properties are known at compilation time.
347This is not as silly as it sounds -- e.g., Fortran array descriptors
348have historically materialized the number of dimensions they cover, even
349though rank will be (nearly) always be a known constant during compilation.
350
351When data are packaged, their containers can be self-describing to
352some degree.
353Description records can have tag values or strings.
354Their fields can have presence flags or identifying tags, and fields
355need not have fixed offsets or ordering.
356This flexibility can increase binary compatibility across revisions
357of the run-time support library, and is convenient for debugging
358that library.
359However, it is not free.
360
361Further, the requirements of the representation of dynamic
362properties of values and objects depend on the execution model:
363specifically, are the complicated semantics of intrinsic assignment,
364deallocation, and finalization of allocatables implemented entirely
365in the support library, in generated code for non-recursive cases,
366or by means of a combination of the two approaches?
367
368Consider how to implement the following:
369```
370TYPE :: LIST
371  REAL :: HEAD
372  TYPE(LIST), ALLOCATABLE :: REST
373END TYPE LIST
374TYPE(LIST), ALLOCATABLE :: A, B
375...
376A = B
377```
378
379Fortran requires that `A`'s arbitrary-length linked list be deleted and
380replaced with a "deep copy" of `B`'s.
381So either a complicated pair of loops must be generated by the compiler,
382or a sophisticated run time support library needs to be driven with
383an expressive representation of type information.
384
385## Proposal
386We need to write `ISO_Fortran_binding.h` in any event.
387It is a header that is published for use in user C code for interoperation
388with compiled Fortran and the Fortran run time support library.
389
390There is a sole descriptor structure defined in `ISO_Fortran_binding.h`.
391It is suitable for characterizing scalars and array sections of intrinsic
392types.
393It is essentially a "fat" data pointer that encapsulates a raw data pointer,
394a type code, rank, elemental byte size, and per-dimension bounds and stride.
395
396Please note that the mandated interoperable descriptor includes the data
397pointer.
398This design in the Standard precludes the use of static descriptors that
399could be associated with dynamic base addresses.
400
401The F18 runtime cannot use just the mandated interoperable
402`struct CFI_cdesc_t` argument descriptor structure as its
403all-purpose data descriptor.
404It has no information about derived type components, overridable
405type-bound procedure bindings, type parameters, &c.
406
407However, we could extend the standard interoperable argument descriptor.
408The `struct CFI_cdesc_t` structure is not of fixed size, but we
409can efficiently locate the first address after an instance of the
410standard descriptor and attach our own data record there to
411hold what we need.
412There's at least one unused padding byte in the standard argument
413descriptor that can be used to hold a flag indicating the presence
414of the addenda.
415
416The definitions of our additional run time data structures must
417appear in a header file that is distinct from `ISO_Fortran_binding.h`,
418and they should never be used by user applications.
419
420This expanded descriptor structure can serve, at least initially for
421simplicity, as the sole representation of `POINTER` variables and
422components, `ALLOCATABLE` variables and components, and derived type
423instances, including length parameter values.
424
425An immediate concern with this concept is the amount of space and
426initialization time that would be wasted when derived type components
427needing a descriptor would have to be accompanied by an instance
428of the general descriptor.
429(In the linked list example close above, what could be done with a
430single pointer for the `REST` component would become at least
431a four-word dynamic structure.)
432This concern is amplified when derived type instances
433are allocated as arrays, since the overhead is per-element.
434
435We can reduce this wastage in two ways.
436First, when the content of the component's descriptor is constant
437at compilation apart from its base address, a static descriptor
438can be placed in read-only storage and attached to the description
439of the derived type's components.
440Second, we could eventually optimize the storage requirements by
441omitting all static fields from the dynamic descriptor, and
442expand the compressed dynamic descriptor during execution when
443needed.
444