<!--===- docs/ArrayComposition.md

Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
See https://llvm.org/LICENSE.txt for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Array Composition

```eval_rst
.. contents::
   :local:
```

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for the
results of calls to transformational intrinsic functions, and that makes
them more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(DIM=`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but are not of interest for this note.
The generic `REDUCE` is also not considered here.

## Arrays as functions

A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.
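As a concrete sketch of this view (in Python, chosen here only to model the semantics; the names `A`, `storage`, and the extents are illustrative, not part of any implementation):

```python
# Model a Fortran array REAL :: A(N,M) as a function from 1-based
# index pairs (j, k) to element values.  The backing storage is
# column-major, as in Fortran, but callers see only the mapping.
N, M = 3, 2
storage = [float(i) for i in range(N * M)]  # arbitrary element values

def A(j, k):
    """Map (j, k), with 1 <= j <= N and 1 <= k <= M, to an element."""
    assert 1 <= j <= N and 1 <= k <= M
    return storage[(j - 1) + N * (k - 1)]  # column-major linearization

# The shape delimits the domain of the map; the rank is the length
# of the index tuple.
shape = (N, M)
rank = len(shape)
```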
## Array expressions as functions

The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K)+1.0
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K)
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand
side variable as an operand of the right-hand side expression, and any
function calls on the right-hand side are elemental or scalar-valued,
we can avoid the use of a temporary.

## Transformational intrinsic functions as function composition

Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.

Partial (axial) reductions can be similarly composed.
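Continuing the Python modeling sketch (illustrative only, not the implementation): `TRANSPOSE` composes an index swap with its argument's function, an elemental operation applies the incoming indices to its operands, and an axial `SUM` closes over all but one index.

```python
# Arrays and array expressions as functions from 1-based index tuples.
def add_scalar(a, s):
    # Elemental operation: apply the incoming indices to the operand.
    return lambda j, k: a(j, k) + s

def transpose(a):
    # TRANSPOSE exchanges the two components of the index tuple.
    return lambda j, k: a(k, j)

def sum_dim2(a, extent2):
    # SUM(A,DIM=2){j} is the complete reduction SUM(A(j,:)).
    return lambda j: sum(a(j, k) for k in range(1, extent2 + 1))

# A 2x3 example array presented as a function over (j, k).
data = {(1, 1): 1.0, (1, 2): 2.0, (1, 3): 3.0,
        (2, 1): 4.0, (2, 2): 5.0, (2, 3): 6.0}
a = lambda j, k: data[(j, k)]

# TRANSPOSE(A + 1.0) applied to (j, k) is A(k, j) + 1.0:
t = transpose(add_scalar(a, 1.0))
# t(3, 1) == a(1, 3) + 1.0 == 4.0
```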
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.

More completely:
* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled like other axial reductions such as `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=S)` without `ORDER=` must precompute the shape
  vector `S`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique.
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus.
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.

## Determination of rank and shape

An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.
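As a sketch of what "without evaluating the elements" means (in Python, with illustrative helper names; the matrix-matrix case only for `MATMUL`), result shapes can be derived from argument shapes alone:

```python
# Derive result shapes from argument shapes; no element is ever touched.
def sum_dim_shape(shape, d):
    # SHAPE(SUM(X,DIM=d)) is SHAPE(X) with dimension d (1-based) removed.
    return shape[:d - 1] + shape[d:]

def transpose_shape(shape):
    # SHAPE(TRANSPOSE(M)) is the reversal of SHAPE(M).
    return shape[::-1]

def spread_shape(shape, d, ncopies):
    # SHAPE(SPREAD(A,DIM=d,NCOPIES=n)) inserts n at dimension d.
    return shape[:d - 1] + (ncopies,) + shape[d - 1:]

def matmul_shape(a_shape, b_shape):
    # SHAPE(MATMUL(A,B)), matrix-matrix case: drop the last extent of A
    # and the first extent of B.
    return (a_shape[0], b_shape[1])
```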
The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; the same holds for `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`.
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.

This is useful because expression evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented with a separation of the allocation
of the temporary array from the population of the array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly-allocated
storage for the allocatable array.
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
have the opportunity to avoid heap allocation in favor of stack
storage when the scope of the variable is local.

## Automatic reallocation of allocatables

Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
The implementation of this feature also becomes more straightforward if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.

## Rewriting rules

Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
* Array constructors with implied DO loops are difficult when nested
  triangularly.
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i1:i2:n){j} => A(i1+n*(j-1))`.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}`.
* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
  If `X` is sufficiently expensive to compute elementally, it might be evaluated
  into a temporary.

(more...)
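To make a few of the rules above concrete, here is a Python sketch (helper names are hypothetical; arrays are modeled as functions from 1-based indices) of the `SPREAD` and array-constructor-concatenation rewrites:

```python
# Apply an index tuple to two composed forms, per the rewriting rules.
def spread_dim2(x):
    # SPREAD(X,DIM=2,...){j,k} => X{j}: the spread dimension's index
    # is simply dropped; the contents are replicated.
    return lambda j, k: x(j)

def concat(x, y, size_x):
    # [X,Y]{j} => MERGE(Y{j-SIZE(X)}, X{j}, j > SIZE(X))
    return lambda j: y(j - size_x) if j > size_x else x(j)

v = lambda j: [10.0, 20.0, 30.0][j - 1]  # a vector as an index function
s = spread_dim2(v)
c = concat(v, v, 3)
# s(2, 99) == v(2) == 20.0; c(5) == v(5-3) == 20.0
```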