# Chapter 3: High-level Language-Specific Analysis and Transformation

[TOC]

Creating a dialect that closely represents the semantics of an input language
enables analyses, transformations and optimizations in MLIR that require
high-level language information and are generally performed on the language AST.
For example, `clang` has a fairly
[heavy mechanism](https://clang.llvm.org/doxygen/classclang_1_1TreeTransform.html)
for performing template instantiation in C++.

We divide compiler transformations into two categories: local and global. In
this chapter, we focus on how to leverage the Toy Dialect and its high-level
semantics to perform local pattern-match transformations that would be difficult
in LLVM. For this, we use MLIR's
[Generic DAG Rewriter](../../PatternRewriter.md).

There are two methods that can be used to implement pattern-match
transformations:

1.  Imperative, C++ pattern-match and rewrite
2.  Declarative, rule-based pattern-match and rewrite using table-driven
    [Declarative Rewrite Rules](../../DeclarativeRewrites.md) (DRR). Note that
    the use of DRR requires that the operations be defined using ODS, as
    described in [Chapter 2](Ch-2.md).

## Optimize Transpose using C++ style pattern-match and rewrite

Let's start with a simple pattern and try to eliminate a sequence of two
transposes that cancel out: `transpose(transpose(X)) -> X`. Here is the
corresponding Toy example:

```toy
def transpose_transpose(x) {
  return transpose(transpose(x));
}
```

Which corresponds to the following IR:

```mlir
func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64>
  %1 = toy.transpose(%0 : tensor<*xf64>) to tensor<*xf64>
  toy.return %1 : tensor<*xf64>
}
```

This is a good example of a transformation that is trivial to match on the Toy
IR but that would be quite hard for LLVM to figure out. For example, today Clang
can't optimize away the temporary array, and the computation with the naive
transpose is expressed with these loops:

```c++
#define N 100
#define M 100

void sink(void *);
void double_transpose(int A[N][M]) {
  int B[M][N];
  for(int i = 0; i < N; ++i) {
    for(int j = 0; j < M; ++j) {
       B[j][i] = A[i][j];
    }
  }
  for(int i = 0; i < N; ++i) {
    for(int j = 0; j < M; ++j) {
       A[i][j] = B[j][i];
    }
  }
  sink(A);
}
```

A simple C++ approach to pattern-match and rewrite is to match a tree-like
pattern in the IR and replace it with a different set of operations. We can do
this by implementing a `RewritePattern` and plugging it into the MLIR
`Canonicalizer` pass:

```c++
/// Fold transpose(transpose(x)) -> x
struct SimplifyRedundantTranspose : public mlir::OpRewritePattern<TransposeOp> {
  /// We register this pattern to match every toy.transpose in the IR.
  /// The "benefit" is used by the framework to order the patterns and process
  /// them in order of profitability.
  SimplifyRedundantTranspose(mlir::MLIRContext *context)
      : OpRewritePattern<TransposeOp>(context, /*benefit=*/1) {}

  /// This method attempts to match a pattern and rewrite it. The rewriter
  /// argument is the orchestrator of the sequence of rewrites. It is expected
  /// to interact with it to perform any changes to the IR from here.
  mlir::LogicalResult
  matchAndRewrite(TransposeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Look through the input of the current transpose.
    mlir::Value transposeInput = op.getOperand();
    TransposeOp transposeInputOp = transposeInput.getDefiningOp<TransposeOp>();

    // Input defined by another transpose? If not, no match.
    if (!transposeInputOp)
      return mlir::failure();

    // Otherwise, we have a redundant transpose. Use the rewriter.
    rewriter.replaceOp(op, {transposeInputOp.getOperand()});
    return mlir::success();
  }
};
```

The implementation of this rewriter is in `ToyCombine.cpp`. The
[canonicalization pass](../../Canonicalization.md) applies transformations
defined by operations in a greedy, iterative manner. To ensure that the
canonicalization pass applies our new transform, we set
[hasCanonicalizer = 1](../../OpDefinitions.md#hascanonicalizer) in the operation
definition and register the pattern with the canonicalization framework.

```c++
// Register our patterns for rewrite by the Canonicalization framework.
void TransposeOp::getCanonicalizationPatterns(
    OwningRewritePatternList &results, MLIRContext *context) {
  results.insert<SimplifyRedundantTranspose>(context);
}
```

We also need to update our main file, `toyc.cpp`, to add an optimization
pipeline. In MLIR, the optimizations are run through a `PassManager` in a
similar way to LLVM:

```c++
  mlir::PassManager pm(module.getContext());
  pm.addNestedPass<mlir::FuncOp>(mlir::createCanonicalizerPass());
```
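For context, this pipeline typically sits behind the `-opt` flag in `toyc.cpp`.
Below is a minimal sketch of that wiring; the `enableOpt` variable name and the
exact error handling are illustrative assumptions, not necessarily the exact
contents of the file:

```c++
// A minimal sketch: run the canonicalizer over the module when the user
// passes -opt. `enableOpt` and the error code are illustrative assumptions.
if (enableOpt) {
  mlir::PassManager pm(module.getContext());

  // Canonicalization patterns such as SimplifyRedundantTranspose run on each
  // function nested in the module.
  pm.addNestedPass<mlir::FuncOp>(mlir::createCanonicalizerPass());

  // Run the pipeline; bail out if any pass signals failure.
  if (mlir::failed(pm.run(*module)))
    return 4;
}
```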

Finally, we can run `toyc-ch3 test/Examples/Toy/Ch3/transpose_transpose.toy
-emit=mlir -opt` and observe our pattern in action:

```mlir
func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64>
  toy.return %arg0 : tensor<*xf64>
}
```

As expected, we now directly return the function argument, bypassing any
transpose operation. However, one of the transposes still hasn't been
eliminated. That is not ideal! What happened is that our pattern replaced the
last transpose with the function input and left behind the now dead transpose
input. The Canonicalizer knows to clean up dead operations; however, MLIR
conservatively assumes that operations may have side-effects. We can fix this by
adding a new trait, `NoSideEffect`, to our `TransposeOp`:

```tablegen
def TransposeOp : Toy_Op<"transpose", [NoSideEffect]> {...}
```

Let's retry now `toyc-ch3 test/transpose_transpose.toy -emit=mlir -opt`:

```mlir
func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  toy.return %arg0 : tensor<*xf64>
}
```

Perfect! No `transpose` operation is left - the code is optimal.

In the next section, we use DRR for pattern match optimizations associated with
the Reshape op.

## Optimize Reshapes using DRR

Declarative, rule-based pattern-match and rewrite (DRR) is an operation
DAG-based declarative rewriter that provides a table-based syntax for
pattern-match and rewrite rules:

```tablegen
class Pattern<
    dag sourcePattern, list<dag> resultPatterns,
    list<dag> additionalConstraints = [],
    dag benefitsAdded = (addBenefit 0)>;
```

A redundant reshape optimization similar to `SimplifyRedundantTranspose` can be
expressed more simply using DRR as follows:

```tablegen
// Reshape(Reshape(x)) = Reshape(x)
def ReshapeReshapeOptPattern : Pat<(ReshapeOp(ReshapeOp $arg)),
                                   (ReshapeOp $arg)>;
```
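Conceptually, this DRR rule expands into a `RewritePattern` much like the
hand-written one from the previous section. A simplified, hand-written
equivalent might look like the sketch below; it illustrates the semantics of
the rule, not the exact code that `mlir-tblgen` emits:

```c++
/// Hand-written sketch equivalent to ReshapeReshapeOptPattern: when a
/// toy.reshape consumes the result of another toy.reshape, rebuild it to
/// consume the inner reshape's input directly.
struct SimplifyReshapeReshape : public mlir::OpRewritePattern<ReshapeOp> {
  using OpRewritePattern<ReshapeOp>::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(ReshapeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Source pattern: (ReshapeOp (ReshapeOp $arg)).
    ReshapeOp inputOp = op.getOperand().getDefiningOp<ReshapeOp>();
    if (!inputOp)
      return mlir::failure();

    // Result pattern: (ReshapeOp $arg), keeping the original result type.
    rewriter.replaceOpWithNewOp<ReshapeOp>(op, op.getType(),
                                           inputOp.getOperand());
    return mlir::success();
  }
};
```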

The automatically generated C++ code corresponding to each of the DRR patterns
can be found under `path/to/BUILD/tools/mlir/examples/toy/Ch3/ToyCombine.inc`.

DRR also provides a method for adding argument constraints when the
transformation is conditional on some properties of the arguments and results.
An example is a transformation that eliminates reshapes when they are redundant,
i.e. when the input and output shapes are identical.

```tablegen
def TypesAreIdentical : Constraint<CPred<"$0.getType() == $1.getType()">>;
def RedundantReshapeOptPattern : Pat<
  (ReshapeOp:$res $arg), (replaceWithValue $arg),
  [(TypesAreIdentical $res, $arg)]>;
```
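For comparison, here is a hand-written sketch of what this constrained pattern
does in C++ terms: the `TypesAreIdentical` constraint becomes an ordinary type
check, and `replaceWithValue` becomes a `replaceOp` call that forwards the
existing value. Again, this is an illustrative equivalent rather than the
generated code:

```c++
/// Hand-written sketch equivalent to RedundantReshapeOptPattern: a toy.reshape
/// that does not change the type of its input is redundant, so its input value
/// can be used directly.
struct SimplifyRedundantReshape : public mlir::OpRewritePattern<ReshapeOp> {
  using OpRewritePattern<ReshapeOp>::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(ReshapeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    mlir::Value input = op.getOperand();

    // TypesAreIdentical constraint: input and result types must match.
    if (input.getType() != op.getType())
      return mlir::failure();

    // replaceWithValue: forward the input in place of the reshape's result.
    rewriter.replaceOp(op, {input});
    return mlir::success();
  }
};
```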

Some optimizations may require additional transformations on operation
arguments. This is achieved using `NativeCodeCall`, which allows for more
complex transformations either by calling into a C++ helper function or by
using inline C++. An example of such an optimization is `FoldConstantReshape`,
where we optimize the reshape of a constant value by reshaping the constant in
place and eliminating the reshape operation.

```tablegen
def ReshapeConstant : NativeCodeCall<"$0.reshape(($1.getType()).cast<ShapedType>())">;
def FoldConstantReshapeOptPattern : Pat<
  (ReshapeOp:$res (ConstantOp $arg)),
  (ConstantOp (ReshapeConstant $arg, $res))>;
```
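In C++ terms, the `NativeCodeCall` reshapes the constant's `DenseElementsAttr`
to the reshape's result type, and the result pattern builds a new `toy.constant`
from it. A hand-written sketch of the same folding follows; the `getValue()`
accessor name and the `ConstantOp` builder taking a `DenseElementsAttr` are
assumptions based on the operation definitions from Chapter 2 and may be spelled
differently in your build:

```c++
/// Hand-written sketch equivalent to FoldConstantReshapeOptPattern: fold
/// reshape(constant) by reshaping the constant's data at compile time.
struct FoldConstantReshape : public mlir::OpRewritePattern<ReshapeOp> {
  using OpRewritePattern<ReshapeOp>::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(ReshapeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Source pattern: (ReshapeOp (ConstantOp $arg)).
    auto constantOp = op.getOperand().getDefiningOp<ConstantOp>();
    if (!constantOp)
      return mlir::failure();

    // ReshapeConstant: reshape the dense attribute to the result type.
    auto newValue = constantOp.getValue().reshape(
        op.getType().cast<mlir::ShapedType>());

    // Result pattern: a new constant that already carries the target shape.
    // Assumes ConstantOp provides a builder taking a DenseElementsAttr.
    rewriter.replaceOpWithNewOp<ConstantOp>(op, newValue);
    return mlir::success();
  }
};
```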

We demonstrate these reshape optimizations using the following
`trivial_reshape.toy` program:

```toy
def main() {
  var a<2,1> = [1, 2];
  var b<2,1> = a;
  var c<2,1> = b;
  print(c);
}
```

```mlir
module {
  func @main() {
    %0 = toy.constant dense<[1.000000e+00, 2.000000e+00]> : tensor<2xf64>
    %1 = toy.reshape(%0 : tensor<2xf64>) to tensor<2x1xf64>
    %2 = toy.reshape(%1 : tensor<2x1xf64>) to tensor<2x1xf64>
    %3 = toy.reshape(%2 : tensor<2x1xf64>) to tensor<2x1xf64>
    toy.print %3 : tensor<2x1xf64>
    toy.return
  }
}
```

We can try to run `toyc-ch3 test/Examples/Toy/Ch3/trivial_reshape.toy -emit=mlir
-opt` and observe our pattern in action:

```mlir
module {
  func @main() {
    %0 = toy.constant dense<[[1.000000e+00], [2.000000e+00]]> : tensor<2x1xf64>
    toy.print %0 : tensor<2x1xf64>
    toy.return
  }
}
```

As expected, no reshape operations remain after canonicalization.

Further details on the declarative rewrite method can be found at
[Table-driven Declarative Rewrite Rule (DRR)](../../DeclarativeRewrites.md).

In this chapter, we saw how to use certain core transformations through always
available hooks. In the [next chapter](Ch-4.md), we will see how to use generic
solutions that scale better through Interfaces.