1
2
3(* Copyright (c) 2011-2020 Stefan Krah. All rights reserved. *)
4
5
6The Matrix Fourier Transform:
7=============================
8
9In libmpdec, the Matrix Fourier Transform [1] is called four-step transform
10after a variant that appears in [2]. The algorithm requires that the input
11array can be viewed as an R*C matrix.
12
13All operations are done modulo p. For readability, the proofs drop all
14instances of (mod p).
15
16
17Algorithm four-step (forward transform):
18----------------------------------------
19
20  a := input array
21  d := len(a) = R * C
22  p := prime
23  w := primitive root of unity of the prime field
24  r := w**((p-1)/d)
25  A := output array
26
27  1) Apply a length R FNT to each column.
28
29  2) Multiply each matrix element (addressed by j*C+m) by r**(j*m).
30
31  3) Apply a length C FNT to each row.
32
33  4) Transpose the matrix.
34
35
36Proof (forward transform):
37--------------------------
38
39  The algorithm can be derived starting from the regular definition of
40  the finite-field transform of length d:
41
42            d-1
43           ,----
44           \
45    A[k] =  |  a[l]  * r**(k * l)
46           /
47           `----
48           l = 0
49
50
51  The sum can be rearranged into the sum of the sums of columns:
52
53            C-1     R-1
54           ,----   ,----
55           \       \
56         =  |       |  a[i * C + j] * r**(k * (i * C + j))
57           /       /
58           `----   `----
59           j = 0   i = 0
60
61
62  Extracting a constant from the inner sum:
63
64            C-1           R-1
65           ,----         ,----
66           \             \
67         =  |  r**k*j  *  |  a[i * C + j] * r**(k * i * C)
68           /             /
69           `----         `----
70           j = 0         i = 0
71
72
73  Without any loss of generality, let k = n * R + m,
74  where n < C and m < R:
75
76                C-1                          R-1
77               ,----                        ,----
78               \                            \
79    A[n*R+m] =  |  r**(R*n*j) * r**(m*j)  *  |  a[i*C+j] * r**(R*C*n*i) * r**(C*m*i)
80               /                            /
81               `----                        `----
82               j = 0                        i = 0
83
84
85  Since r = w ** ((p-1) / (R*C)):
86
87     a) r**(R*C*n*i) = w**((p-1)*n*i) = 1
88
89     b) r**(C*m*i) = w**((p-1) / R) ** (m*i) = r_R ** (m*i)
90
91     c) r**(R*n*j) = w**((p-1) / C) ** (n*j) = r_C ** (n*j)
92
93     r_R := root of the subfield of length R.
94     r_C := root of the subfield of length C.
95
96
97                C-1                             R-1
98               ,----                           ,----
99               \                               \
100    A[n*R+m] =  |  r_C**(n*j) * [ r**(m*j)  *   |  a[i*C+j] * r_R**(m*i) ]
101               /                     ^         /
102               `----                 |         `----    1) transform the columns
103               j = 0                 |         i = 0
104                 ^                   |
105                 |                   `-- 2) multiply
106                 |
107                 `-- 3) transform the rows
108
109
110   Note that the entire RHS is a function of n and m and that the results
111   for each pair (n, m) are stored in Fortran order.
112
113   Let the term in square brackets be f(m, j). Step 1) and 2) precalculate
114   the term for all (m, j). After that, the original matrix is now a lookup
115   table with the mth element in the jth column at location m * C + j.
116
117   Let the complete RHS be g(m, n). Step 3) does an in-place transform of
118   length n on all rows. After that, the original matrix is now a lookup
119   table with the mth element in the nth column at location m * C + n.
120
121   But each (m, n) pair should be written to location n * R + m. Therefore,
122   step 4) transposes the result of step 3).
123
124
125
126Algorithm four-step (inverse transform):
127----------------------------------------
128
129  A  := input array
130  d  := len(A) = R * C
131  p  := prime
132  d' := d**(p-2)             # inverse of d
133  w  := primitive root of unity of the prime field
134  r  := w**((p-1)/d)         # root of the subfield
135  r' := w**((p-1) - (p-1)/d) # inverse of r
136  a  := output array
137
138  0) View the matrix as a C*R matrix.
139
140  1) Transpose the matrix, producing an R*C matrix.
141
142  2) Apply a length C FNT to each row.
143
144  3) Multiply each matrix element (addressed by i*C+n) by r**(i*n).
145
146  4) Apply a length R FNT to each column.
147
148
149Proof (inverse transform):
150--------------------------
151
152  The algorithm can be derived starting from the regular definition of
153  the finite-field inverse transform of length d:
154
155                  d-1
156                 ,----
157                 \
158    a[k] =  d' *  |  A[l]  * r' ** (k * l)
159                 /
160                 `----
161                 l = 0
162
163
164  The sum can be rearranged into the sum of the sums of columns. Note
165  that at this stage we still have a C*R matrix, so C denotes the number
166  of rows:
167
168                  R-1     C-1
169                 ,----   ,----
170                 \       \
171         =  d' *  |       |  a[j * R + i] * r' ** (k * (j * R + i))
172                 /       /
173                 `----   `----
174                 i = 0   j = 0
175
176
177  Extracting a constant from the inner sum:
178
179                  R-1                C-1
180                 ,----              ,----
181                 \                  \
182         =  d' *  |  r' ** (k*i)  *  |  a[j * R + i] * r' ** (k * j * R)
183                 /                  /
184                 `----              `----
185                 i = 0              j = 0
186
187
188  Without any loss of generality, let k = m * C + n,
189  where m < R and n < C:
190
191                     R-1                                  C-1
192                    ,----                                ,----
193                    \                                    \
194    A[m*C+n] = d' *  |  r' ** (C*m*i) *  r' ** (n*i)   *  |  a[j*R+i] * r' ** (R*C*m*j) * r' ** (R*n*j)
195                    /                                    /
196                    `----                                `----
197                    i = 0                                j = 0
198
199
200  Since r' = w**((p-1) - (p-1)/d) and d = R*C:
201
202     a) r' ** (R*C*m*j) = w**((p-1)*R*C*m*j - (p-1)*m*j) = 1
203
204     b) r' ** (C*m*i) = w**((p-1)*C - (p-1)/R) ** (m*i) = r_R' ** (m*i)
205
206     c) r' ** (R*n*j) = r_C' ** (n*j)
207
208     d) d' = d**(p-2) = (R*C) ** (p-2) = R**(p-2) * C**(p-2) = R' * C'
209
210     r_R' := inverse of the root of the subfield of length R.
211     r_C' := inverse of the root of the subfield of length C.
212     R'   := inverse of R
213     C'   := inverse of C
214
215
216                     R-1                                      C-1
217                    ,----                                    ,----  2) transform the rows of a^T
218                    \                                        \
219    A[m*C+n] = R' *  |  r_R' ** (m*i) * [ r' ** (n*i) * C' *  |  a[j*R+i] * r_C' ** (n*j) ]
220                    /                           ^            /       ^
221                    `----                       |            `----   |
222                    i = 0                       |            j = 0   |
223                      ^                         |                    `-- 1) Transpose input matrix
224                      |                         `-- 3) multiply             to address elements by
225                      |                                                     i * C + j
226                      `-- 3) transform the columns
227
228
229
230   Note that the entire RHS is a function of m and n and that the results
231   for each pair (m, n) are stored in C order.
232
233   Let the term in square brackets be f(n, i). Without step 1), the sum
234   would perform a length C transform on the columns of the input matrix.
235   This is a) inefficient and b) the results are needed in C order, so
236   step 1) exchanges rows and columns.
237
238   Step 2) and 3) precalculate f(n, i) for all (n, i). After that, the
239   original matrix is now a lookup table with the ith element in the nth
240   column at location i * C + n.
241
242   Let the complete RHS be g(m, n). Step 4) does an in-place transform of
243   length m on all columns. After that, the original matrix is now a lookup
244   table with the mth element in the nth column at location m * C + n,
245   which means that all A[k] = A[m * C + n] are in the correct order.
246
247
248--
249
250  [1] Joerg Arndt: "Matters Computational"
251      http://www.jjj.de/fxt/
252  [2] David H. Bailey: FFTs in External or Hierarchical Memory
253      http://crd.lbl.gov/~dhbailey/dhbpapers/
254
255
256
257