<!--===- docs/ArrayComposition.md

  Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  See https://llvm.org/LICENSE.txt for license information.
  SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Array Composition

```eval_rst
.. contents::
   :local:
```

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for
the results of calls to transformational intrinsic functions and
makes them more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(DIM=)`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
The generic `REDUCE` is also not considered here.
## Arrays as functions

A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.
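
For instance, this correspondence can be made explicit with an internal
function (a sketch; the program and the name `EVAL_A` are illustrative only,
not part of this note):

```
PROGRAM array_as_function
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3, M = 4
  REAL :: A(N, M)
  CALL RANDOM_NUMBER(A)
  ! The subscripted reference and the function application denote the same value.
  PRINT *, A(2, 3), EVAL_A(2, 3)
CONTAINS
  PURE REAL FUNCTION EVAL_A(J, K)
    INTEGER, INTENT(IN) :: J, K
    EVAL_A = A(J, K)  ! the "function view" of the whole array A
  END FUNCTION
END PROGRAM
```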

## Array expressions as functions

The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K) + 1.0 ! populate the temporary
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K) ! copy the temporary into B
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand
side variable as an operand of the right-hand side expression, and any
function calls on the right-hand side are elemental or scalar-valued,
we can avoid the use of a temporary.
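
For example, in the following sketch (with illustrative declarations) `SQRT`
is elemental and `C` does not appear on the right-hand side, so
`C = SQRT(A) + 3.0*B` can be evaluated directly into `C`:

```
PROGRAM no_temporary
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 2, M = 3
  REAL :: A(N,M), B(N,M), C(N,M)
  INTEGER :: J, K
  A = 4.0
  B = 1.0
  DO CONCURRENT(J=1:N,K=1:M)
    C(J,K) = SQRT(A(J,K)) + 3.0*B(J,K)  ! no temporary array is needed
  END DO
  PRINT *, C  ! every element is 5.0
END PROGRAM
```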

## Transformational intrinsic functions as function composition

Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.
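
A sketch of the composed evaluation, assuming the illustrative declarations
`REAL :: A(N,M), B(M,N)`:

```
! B = TRANSPOSE(A + 1.0) without a temporary: the TRANSPOSE merely swaps
! the incoming indices before they are applied to the elemental expression.
DO CONCURRENT(J=1:M,K=1:N)
  B(J,K) = A(K,J) + 1.0
END DO
```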

Partial (axial) reductions can be similarly composed.
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.
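
A sketch of the composed evaluation, assuming the illustrative declarations
`REAL :: A(N,M), B(N)`:

```
! B = SUM(A, DIM=2): each element of the result is a complete reduction
! over the eliminated dimension.
DO CONCURRENT(J=1:N)
  B(J) = SUM(A(J,:))
END DO
```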

More completely:
* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled in the same way as other axial reductions such as `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
  vector `S`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique.
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus (see the sketch after this list).
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.
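
As a sketch of the index arithmetic for `CSHIFT` with a scalar shift (the
formula below assumes a vector with lower bound 1; the program and its names
are illustrative only):

```
PROGRAM cshift_as_index_map
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5, S = 2
  REAL :: X(N), Y(N)
  INTEGER :: J
  CALL RANDOM_NUMBER(X)
  DO CONCURRENT(J=1:N)
    ! Element function of CSHIFT(X, SHIFT=S): addition followed by MODULO
    Y(J) = X(MODULO(J - 1 + S, N) + 1)
  END DO
  PRINT *, ALL(Y == CSHIFT(X, S))  ! should print T
END PROGRAM
```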

## Determination of rank and shape

An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.

The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` drops the last value from `SHAPE(A)` and the first
  value from `SHAPE(B)` (the extents of the contracted dimension).
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.
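
Several of these identities can be checked directly (a small illustrative
program, not part of the design itself):

```
PROGRAM shape_identities
  IMPLICIT NONE
  REAL :: A(3,5), B(5,2)
  A = 1.0
  B = 1.0
  PRINT *, SHAPE(TRANSPOSE(A))                 ! 5 3, a reversal of SHAPE(A)
  PRINT *, SHAPE(MATMUL(A, B))                 ! 3 2, dropping the contracted extents
  PRINT *, SHAPE(SPREAD(A, DIM=3, NCOPIES=4))  ! 3 5 4
END PROGRAM
```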

This is useful because expression evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented with a separation of the allocation
of the temporary array from the population of the array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly allocated
storage for the allocatable array.
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
have the opportunity to avoid heap allocation in favor of stack storage
if the scope of the variable is local.
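
For example, `ALLOCATE(B, SOURCE=TRANSPOSE(A)+1.0)` might conceptually expand
as follows (a sketch assuming `REAL :: A(N,M)` and `REAL, ALLOCATABLE :: B(:,:)`,
not a description of actual generated code):

```
! The shape is computed without evaluating any element of the source expression.
ALLOCATE(B(SIZE(A,2), SIZE(A,1)))
! The expression is then evaluated directly into the new allocation.
DO CONCURRENT(J=1:SIZE(A,2),K=1:SIZE(A,1))
  B(J,K) = A(K,J) + 1.0
END DO
```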

## Automatic reallocation of allocatables

Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
This feature also becomes more straightforward to implement if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.
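
For example (an illustration of the Fortran 2003 semantics):

```
REAL, ALLOCATABLE :: V(:)
V = [1.0, 2.0]       ! V is allocated with shape [2]
V = [1.0, 2.0, 3.0]  ! non-conforming: V is automatically reallocated with shape [3]
```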

## Rewriting rules

Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
* Array constructors with implied DO loops are difficult when nested
  triangularly (e.g., `[((A(J,K),J=1,K),K=1,N)]`, whose size is not a
  simple product of loop extents).
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}`.
* `SPREAD(X,DIM=2,...){j,k} => X{j}` (see the sketch below); i.e., the
  contents are replicated.
  If `X` is sufficiently expensive to compute elementally, it might be
  evaluated into a temporary.
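
The `SPREAD` rule, for instance, can be checked directly (an illustrative
program; its names are not part of this note):

```
PROGRAM spread_rule
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 4, NCOPIES = 3
  REAL :: X(N), Y(N, NCOPIES)
  INTEGER :: J, K
  CALL RANDOM_NUMBER(X)
  DO CONCURRENT(J=1:N,K=1:NCOPIES)
    Y(J,K) = X(J)  ! the composed element function ignores the new index k
  END DO
  PRINT *, ALL(Y == SPREAD(X, DIM=2, NCOPIES=NCOPIES))  ! should print T
END PROGRAM
```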

(more...)