1<!--===- docs/RuntimeDescriptor.md 2 3 Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 4 See https://llvm.org/LICENSE.txt for license information. 5 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception 6 7--> 8 9# Runtime Descriptors 10 11```eval_rst 12.. contents:: 13 :local: 14``` 15 16## Concept 17The properties that characterize data values and objects in Fortran 18programs must sometimes be materialized when the program runs. 19 20Some properties are known during compilation and constant during 21execution, yet must be reified anyway for execution in order to 22drive the interfaces of a language support library or the mandated 23interfaces of interoperable (i.e., C) procedure calls. 24 25Note that many Fortran intrinsic subprograms have interfaces 26that are more flexible and generic than actual Fortran subprograms 27can be, so properties that must be known during compilation and 28are constant during execution may still need to be materialized 29for calls to the library, even if only by modifying names to 30distinguish types or their kind specializations. 31 32Other properties are deferred to execution, and need to be represented 33to serve the needs of compiled code and the run time support library. 34 35Previous implementations of Fortran have typically defined a small 36sheaf of _descriptor_ data structures for this purpose, and attached 37these descriptors as additional hidden arguments, type components, 38and local variables so as to convey dynamic characteristics between 39subprograms and between user code and the run-time support library. 40 41### References 42References are to the 12-2017 draft of the Fortran 2018 standard 43(N2146). 44 45Section 15.4.2.2 can be interpreted as a decent list of things that 46might need descriptors or other hidden state passed across a 47subprogram call, since such features (apart from assumed-length 48`CHARACTER` function results) trigger a requirement for the 49subprogram to have an explicit interface visible to their callers. 50 51Section 15.5.2 has good laundry lists of situations that can arise 52across subprogram call boundaries. 53 54## A survey of dynamic characteristics 55 56### Length of assumed-length `CHARACTER` function results (B.3.6) 57``` 58CHARACTER*8 :: FOO 59PRINT *, FOO('abcdefghijklmnopqrstuvwxyz') 60... 61CHARACTER*(*) FUNCTION FOO(STR) 62 CHARACTER*26 STR 63 FOO=STR 64END 65``` 66 67prints `abcdefgh` because the length parameter of the character type 68of the result of `FOO` is passed across the call -- even in the absence 69of an explicit interface! 70 71### Assumed length type parameters (7.2) 72Dummy arguments and associate names for `SELECT TYPE` can have assumed length 73type parameters, which are denoted by asterisks (not colons). 74Their values come from actual arguments or the associated expression (resp.). 75 76### Explicit-shape arrays (8.5.8.2) 77The expressions used for lower and upper bounds must be captured and remain 78invariant over the scope of an array, even if they contain references to 79variables that are later modified. 80 81Explicit-shape arrays can be dummy arguments, "adjustable" local variables, 82and components of derived type (using specification expressions in terms 83of constants and KIND type parameters). 84 85### Leading dimensions of assumed-size arrays (8.5.8.5) 86``` 87SUBROUTINE BAR(A) 88 REAL A(2,3,*) 89END 90``` 91The total size and final dimension's extent do not constitute dynamic 92properties. 93The called subprogram has no means to extract the extent of the 94last (major) dimension, and may not depend upon it implicitly by using 95the array in any context that demands a known shape. 96 97The values of the expressions used as the bounds of the dimensions 98that appear prior to 99the last dimension are, however, effectively captured on entry to the 100subprogram, and remain invariant even if the variables that appear in 101those expressions have their values modified later. 102This is similar to the requirements for an explicit-shape array. 103 104### Some function results 1051. Deferred-shape 1062. Deferred length type parameter values 1073. Stride information for `POINTER` results 108 109Note that while function result variables can have the `ALLOCATABLE` 110attribute, the function itself and the value returned to the caller 111do not possess the attribute. 112 113### Assumed-shape arrays 114The extents of the dimensions of assumed-shape dummy argument arrays 115are conveyed from those of the actual effective arguments. 116The bounds, however, are not. The called subprogram can define the 117lower bound to be a value other than 1, but that is a local effect 118only. 119 120### Deferred-shape arrays 121The extents and bounds of `POINTER` and `ALLOCATABLE` arrays are 122established by pointer assignments and `ALLOCATE` statements. 123Note that dummy arguments and function results that are `POINTER` 124or `ALLOCATABLE` can be deferred-shape, not assumed-shape -- one cannot 125supply a lower bound expression as a local effect. 126 127### Strides 128Some arrays can have discontiguous (or negative) strides. 129These include assumed-shape dummy arguments and deferred-shape 130`POINTER` variables, components, and function results. 131 132Fortran disallows some conceivable cases that might otherwise 133require implied strides, such as passing an array of an extended 134derived type as an actual argument that corresponds to a 135nonpolymorphic dummy array of a base type, or the similar 136case of pointer assignment to a base of an extended derived type. 137 138Other arrays, including `ALLOCATABLE`, can be assured to 139be contiguous, and do not necessarily need to manage or 140convey dynamic stride information. 141`CONTIGUOUS` dummy arguments and `POINTER` arrays need not 142record stride information either. 143(The standard notes that a `CONTIGUOUS POINTER` occupies a 144number of storage units that is distinct from that required 145to hold a non-`CONTIGUOUS` pointer.) 146 147Note that Fortran distinguishes the `CONTIGUOUS` attribute from 148the concept of being known or required to be _simply contiguous_ (9.5.4), 149which includes `CONTIGUOUS` entities as well as many others, and 150the concept of actually _being_ contiguous (8.5.7) during execution. 151I believe that the property of being simply contiguous implies 152that an entity is known at compilation time to not require the 153use or maintenance of hidden stride values. 154 155### Derived type component initializers 156Fortran allows components of derived types to be declared with 157initial values that are to be assigned to the components when an 158instance of the derived type is created. 159These include `ALLOCATABLE` components, which are always initialized 160to a deallocated state. 161 162These can be implemented with constructor subroutines, inline 163stores or block copies from static initializer blocks, or a sequence 164of sparse offset/size/value component initializers to be emplaced 165by the run-time library. 166 167N.B. Fortran allows kind type parameters to appear in component 168initialization constant expressions, but not length type parameters, 169so the initialization values are constants. 170 171N.B. Initialization is not assignment, and cannot be implemented 172with assignments to uninitialized derived type instances from 173static constant initializers. 174 175### Polymorphic `CLASS()`, `CLASS(*)`, and `TYPE(*)` 176Type identification for `SELECT TYPE`. 177Default initializers (see above). 178Offset locations of `ALLOCATABLE` and polymorphic components. 179Presence of `FINAL` procedures. 180Mappings to overridable type-bound specific procedures. 181 182### Deferred length type parameters 183Derived types with length type parameters, and `CHARACTER`, may be used 184with the values of those parameters deferred to execution. 185Their actual values must be maintained as characteristics of the dynamic 186type that is associated with a value or object 187. 188A single copy of the deferred length type parameters suffices for 189all of the elements of an array of that parameterized derived type. 190 191### Components whose types and/or shape depends on length type parameters 192Non-pointer, non-allocatable components whose types or shapes are expressed 193in terms of length type parameters will probably have to be implemented as 194if they had deferred type and/or shape and were `ALLOCATABLE`. 195The derived type instance constructor must allocate them and possibly 196initialize them; the instance destructor must deallocate them. 197 198### Assumed rank arrays 199Rank is almost always known at compilation time and would be redundant 200in most circumstances if also managed dynamically. 201`DIMENSION(..)` dummy arguments (8.5.8.7), however, are a recent feature 202with which the rank of a whole array is dynamic outside the cases of 203a `SELECT RANK` construct. 204 205The lower bounds of the dimensions of assumed rank arrays 206are always 1. 207 208### Cached invariant subexpressions for addressing 209Implementations of Fortran have often maintained precalculated integer 210values to accelerate subscript computations. 211For example, given `REAL*8 :: A(2:4,3:5)`, the data reference `A(I,J)` 212resolves to something like `&A + 8*((I-2)+3*(J-3))`, and this can be 213effectively reassociated to `&A - 88 + 8*I + 24*J` 214or `&A - 88 + 8*(I + 3*J)`. 215When the offset term and coefficients are not compile-time constants, 216they are at least invariant and can be precomputed. 217 218In the cases of dummy argument arrays, `POINTER`, and `ALLOCATABLE`, 219these addressing invariants could be managed alongside other dynamic 220information like deferred extents and lower bounds to avoid their 221recalculation. 222It's not clear that it's worth the trouble to do so, since the 223expressions are invariant and cheap. 224 225### Coarray state (8.5.6) 226A _coarray_ is an `ALLOCATABLE` variable or component, or statically 227allocated variable (`SAVE` attribute explicit or implied), or dummy 228argument whose ultimate effective argument is one of such things. 229 230Each image in a team maintains its portion of each coarray and can 231access those portions of the coarray that are maintained by other images 232in the team. 233Allocations and deallocations are synchronization events at which 234the several images can exchange whatever information is needed by 235the underlying intercommunication interface to access the data 236of their peers. 237(Strictly speaking, an implementation could synchronize 238images at allocations and deallocations with simple barriers, and defer 239the communication of remote access information until it is needed for a 240given coarray on a given image, so long as it could be acquired in a 241"one-sided" fashion.) 242 243### Presence of `OPTIONAL` dummy arguments 244Typically indicated with null argument addresses. 245Note that `POINTER` and `ALLOCATABLE` objects can be passed to 246non-`POINTER` non-`ALLOCATABLE` dummy arguments, and their 247association or allocation status (resp.) determines the presence 248of the dummy argument. 249 250### Stronger contiguity enforcement or indication 251Some implementations of Fortran guarantee that dummy argument arrays 252are, or have been made to be, contiguous on one or more dimensions 253when the language does not require them to be so (8.5.7 p2). 254Others pass a flag to identify contiguous arrays (or could pass the 255number of contiguous leading dimensions, although I know of no such 256implementation) so that optimizing transformations that depend on 257contiguity can be made conditional with multiple-version code generation 258and selected during execution. 259 260In the absence of a contiguity guarantee or flag, the called side 261would have to determine contiguity dynamically, if it cares, 262by calculating addresses of elements in the array whose subscripts 263differ by exactly 1 on exactly 1 dimension of interest, and checking 264whether that difference exactly matches the byte size of the type times 265the product of the extents of any prior dimensions. 266 267### Host instances for dummy procedures and procedure pointers 268A static link or other means of accessing the imported state of the 269host procedure must be available when an internal procedure is 270used as an actual argument or as a pointer assignment target. 271 272### Alternate returns 273Subroutines (only) with alternate return arguments need a 274means, such as the otherwise unused function return value, by which 275to distinguish and identify the use of an alternate `RETURN` statement. 276The protocol can be a simple nonzero integer that drives a switch 277in the caller, or the caller can pass multiple return addresses as 278arguments for the callee to substitute on the stack for the original 279return address in the event of an alternate `RETURN`. 280 281## Implementation options 282 283### A note on array descriptions 284Some arrays require dynamic management of distinct combinations of 285values per dimension. 286 287One can extract the extent on a dimension from its bounds, or extract 288the upper bound from the extent and the lower bound. Having distinct 289extent and upper bound would be redundant. 290 291Contiguous arrays can assume a stride of 1 on each dimension. 292 293Assumed-shape and assumed-size dummy argument arrays need not convey 294lower bounds. 295 296So there are examples of dimensions with 297 * extent only (== upper bound): `CONTIGUOUS` assumed-shape, explict shape and multidimensional assumed-size with constant lower bound 298 * lower bound and either extent or upper bound: `ALLOCATABLE`, `CONTIGUOUS` `POINTER`, general explicit-shape and multidimensional assumed-size 299 * extent (== upper bound) and stride: general (non-`CONTIGUOUS`) assumed-shape 300 * lower bound, stride, and either extent or upper bound: general (non-`CONTIGUOUS`) `POINTER`, assumed-rank 301 302and these cases could be accompanied by precomputed invariant 303addressing subexpressions to accelerate indexing calculations. 304 305### Interoperability requirements 306 307Fortran 2018 requires that a Fortran implementation supply a header file 308`ISO_Fortran_binding.h` for use in C and C++ programs that defines and 309implements an interface to Fortran objects from the _interoperable_ 310subset of Fortran objects and their types suitable for use when those 311objects are passed to C functions. 312This interface mandates a fat descriptor that is passed by address, 313containing (at least) 314 * a data base address 315 * explicit rank and type 316 * flags to distinguish `POINTER` and `ALLOCATABLE` 317 * elemental byte size, and 318 * (per-dimension) lower bound, extent, and byte stride 319 320The requirements on the interoperability API do not mandate any 321support for features like derived type component initialization, 322automatic deallocation of `ALLOCATABLE` components, finalization, 323derived type parameters, data contiguity flags, &c. 324But neither does the Standard preclude inclusion of additional 325interfaces to describe and support such things. 326 327Given a desire to fully support the Fortran 2018 language, we need 328to either support the interoperability requirements as a distinct 329specialization of the procedure call protocol, or use the 330`ISO_Fortran_binding.h` header file requirements as a subset basis for a 331complete implementation that adds representations for all the 332missing capabilities, which would be isolated and named so as 333to prevent user C code from relying upon them. 334 335### Design space 336There is a range of possible options for representing the 337properties of values and objects during the execution of Fortran 338programs. 339 340At one extreme, the amount of dynamic information is minimized, 341and is packaged in custom data structures or additional arguments 342for each situation to convey only the values that are unknown at 343compilation time and actually needed at execution time. 344 345At the other extreme, data values and objects are described completely, 346including even the values of properties are known at compilation time. 347This is not as silly as it sounds -- e.g., Fortran array descriptors 348have historically materialized the number of dimensions they cover, even 349though rank will be (nearly) always be a known constant during compilation. 350 351When data are packaged, their containers can be self-describing to 352some degree. 353Description records can have tag values or strings. 354Their fields can have presence flags or identifying tags, and fields 355need not have fixed offsets or ordering. 356This flexibility can increase binary compatibility across revisions 357of the run-time support library, and is convenient for debugging 358that library. 359However, it is not free. 360 361Further, the requirements of the representation of dynamic 362properties of values and objects depend on the execution model: 363specifically, are the complicated semantics of intrinsic assignment, 364deallocation, and finalization of allocatables implemented entirely 365in the support library, in generated code for non-recursive cases, 366or by means of a combination of the two approaches? 367 368Consider how to implement the following: 369``` 370TYPE :: LIST 371 REAL :: HEAD 372 TYPE(LIST), ALLOCATABLE :: REST 373END TYPE LIST 374TYPE(LIST), ALLOCATABLE :: A, B 375... 376A = B 377``` 378 379Fortran requires that `A`'s arbitrary-length linked list be deleted and 380replaced with a "deep copy" of `B`'s. 381So either a complicated pair of loops must be generated by the compiler, 382or a sophisticated run time support library needs to be driven with 383an expressive representation of type information. 384 385## Proposal 386We need to write `ISO_Fortran_binding.h` in any event. 387It is a header that is published for use in user C code for interoperation 388with compiled Fortran and the Fortran run time support library. 389 390There is a sole descriptor structure defined in `ISO_Fortran_binding.h`. 391It is suitable for characterizing scalars and array sections of intrinsic 392types. 393It is essentially a "fat" data pointer that encapsulates a raw data pointer, 394a type code, rank, elemental byte size, and per-dimension bounds and stride. 395 396Please note that the mandated interoperable descriptor includes the data 397pointer. 398This design in the Standard precludes the use of static descriptors that 399could be associated with dynamic base addresses. 400 401The F18 runtime cannot use just the mandated interoperable 402`struct CFI_cdesc_t` argument descriptor structure as its 403all-purpose data descriptor. 404It has no information about derived type components, overridable 405type-bound procedure bindings, type parameters, &c. 406 407However, we could extend the standard interoperable argument descriptor. 408The `struct CFI_cdesc_t` structure is not of fixed size, but we 409can efficiently locate the first address after an instance of the 410standard descriptor and attach our own data record there to 411hold what we need. 412There's at least one unused padding byte in the standard argument 413descriptor that can be used to hold a flag indicating the presence 414of the addenda. 415 416The definitions of our additional run time data structures must 417appear in a header file that is distinct from `ISO_Fortran_binding.h`, 418and they should never be used by user applications. 419 420This expanded descriptor structure can serve, at least initially for 421simplicity, as the sole representation of `POINTER` variables and 422components, `ALLOCATABLE` variables and components, and derived type 423instances, including length parameter values. 424 425An immediate concern with this concept is the amount of space and 426initialization time that would be wasted when derived type components 427needing a descriptor would have to be accompanied by an instance 428of the general descriptor. 429(In the linked list example close above, what could be done with a 430single pointer for the `REST` component would become at least 431a four-word dynamic structure.) 432This concern is amplified when derived type instances 433are allocated as arrays, since the overhead is per-element. 434 435We can reduce this wastage in two ways. 436First, when the content of the component's descriptor is constant 437at compilation apart from its base address, a static descriptor 438can be placed in read-only storage and attached to the description 439of the derived type's components. 440Second, we could eventually optimize the storage requirements by 441omitting all static fields from the dynamic descriptor, and 442expand the compressed dynamic descriptor during execution when 443needed. 444