# MLIR Python Bindings

Current status: Under development and not enabled by default

## Building

### Pre-requisites

* A relatively recent Python 3 installation
* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
  be located by CMake (auto-detected if installed via
  `python -m pip install pybind11`). Note: the minimum required version is
  2.6.0.

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield some greater
  deployment flexibility, linking in this way allows the linker to report
  unresolved symbols at build time on all platforms, which makes for a
  smoother development workflow. Defaults to `ON`.

* **`PYTHON_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.
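
If the configure step is driven from a script, the variables above can be assembled into a CMake command line. The following is only a sketch: the helper name `mlir_python_cmake_args`, the `../llvm` source path, and the choice of the Ninja generator are assumptions made for illustration, and `LLVM_ENABLE_PROJECTS=mlir` is the usual way to pull MLIR into an LLVM build rather than something this document specifies.

```python
import shlex
import sys


def mlir_python_cmake_args(source_dir, python_executable=sys.executable,
                           version_locked=True):
    """Build a CMake argument list that enables the MLIR Python bindings.

    Hypothetical helper: only the -D variable names come from this document.
    """
    def on_off(flag):
        return "ON" if flag else "OFF"

    return [
        "cmake",
        "-G", "Ninja",  # assumption: any CMake generator would do
        source_dir,
        "-DLLVM_ENABLE_PROJECTS=mlir",
        "-DMLIR_BINDINGS_PYTHON_ENABLED=ON",
        f"-DMLIR_PYTHON_BINDINGS_VERSION_LOCKED={on_off(version_locked)}",
        # Pin the interpreter explicitly, as recommended above.
        f"-DPYTHON_EXECUTABLE={python_executable}",
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(mlir_python_cmake_args("../llvm")))
```

Defaulting `python_executable` to `sys.executable` means the bindings are configured against whichever interpreter runs the script, which is one way to avoid picking up an unintended Python on systems with several installed.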

## Design

### Use cases

There are likely two primary use cases for the MLIR Python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

2. Downstream integrations will likely want to include parts of the API in their
   private namespace or specially built libraries, probably mixing it with other
   Python native bits.

### Composable modules

In order to support use case \#2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes, as other related C++ modules
  will need to interop with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects` fall
  into this category).

There are a lot of related issues of shared library linkage, distribution
concerns, etc. that affect such things. Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e. the
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
and siblings which loads and re-exports it. This split provides a place to stage
code that needs to prepare the environment *before* the shared library is loaded
into the Python runtime, and also provides a place that one-time initialization
code can be invoked apart from module constructors.

To start with, the `mlir/__init__.py` loader shim can be very simple and scale
to future needs:

```python
from _mlir import *
```
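
As the shim grows, the "prepare the environment, then load and re-export" split might look roughly like the following. This is a sketch only: `load_native_extension`, `reexport`, and the idea of passing pre-load hooks are hypothetical names introduced here, not part of the bindings.

```python
import importlib


def load_native_extension(name="_mlir", pre_load_hooks=()):
    """Run environment-preparation hooks, then import the native extension."""
    for hook in pre_load_hooks:
        hook()  # e.g. adjust library search paths before the import happens
    return importlib.import_module(name)


def reexport(native_module, namespace):
    """Copy the public names of `native_module` into `namespace`."""
    names = getattr(native_module, "__all__", None)
    if names is None:
        # Mirror `from module import *`: skip underscore-prefixed names.
        names = [n for n in dir(native_module) if not n.startswith("_")]
    for n in names:
        namespace[n] = getattr(native_module, n)
    return list(names)


# Inside a loader such as mlir/__init__.py this might be used as:
#   _native = load_native_extension("_mlir")
#   reexport(_native, globals())
```

Keeping the import behind a function call gives a single place to add one-time initialization later without touching the native module's constructor.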

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).

### Ownership in the Core IR

There are several top-level types in the core IR that are strongly owned by
their Python-side reference:

* `PyContext` (`mlir.ir.Context`)
* `PyModule` (`mlir.ir.Module`)
* `PyOperation` (`mlir.ir.Operation`) - but with caveats

All other objects are dependent. All objects maintain a back-reference
(keep-alive) to their closest containing top-level object. Further, dependent
objects fall into two categories: a) uniqued (which live for the life-time of
the context) and b) mutable. Mutable objects need additional machinery for
keeping track of when the C++ instance that backs their Python object is no
longer valid (typically due to some specific mutation of the IR, deletion, or
bulk operation).
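
The keep-alive and validity-tracking ideas can be illustrated with a toy model in plain Python. Nothing here is the actual binding code: `Context` and `Operation` are stand-in classes, and the `invalidate` method and the `RuntimeError` raised on stale access are assumptions chosen for the sketch.

```python
class Context:
    """Stand-in for a top-level object that owns everything created in it."""


class Operation:
    """Stand-in for a mutable dependent object."""

    def __init__(self, context, name):
        self._context = context  # back-reference keeps the owner alive
        self._name = name
        self._valid = True

    def invalidate(self):
        """Mark the wrapper stale, e.g. after the backing IR was erased."""
        self._valid = False

    @property
    def name(self):
        # Every accessor checks validity before touching the backing object.
        if not self._valid:
            raise RuntimeError("the underlying operation is no longer valid")
        return self._name
```

Because each dependent object holds a strong reference to its owner, dropping the last user-visible reference to a `Context` does not destroy it while any `Operation` created in it is still reachable.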

### Optionality and argument ordering in the Core IR

The following types support being bound to the current thread as a context
manager:

* `PyLocation` (`loc: mlir.ir.Location = None`)
* `PyInsertionPoint` (`ip: mlir.ir.InsertionPoint = None`)
* `PyMlirContext` (`context: mlir.ir.Context = None`)

In order to support composability of function arguments, when these types appear
as arguments, they should always come last and, as necessary, appear in the
above order with the given names (which is generally the order in which they are
expected to need to be expressed explicitly in special cases). Each should
carry a default value of `py::none()` and use either a manual or automatic
conversion for resolving either with the explicit value or a value from the
thread context manager (i.e. `DefaultingPyMlirContext` or
`DefaultingPyLocation`).

The rationale for this is that in Python, trailing keyword arguments to the
*right* are the most composable, enabling a variety of strategies such as kwarg
passthrough, default values, etc. Keeping function signatures composable
increases the chances that interesting DSLs and higher level APIs can be
constructed without a lot of exotic boilerplate.

Used consistently, this enables a style of IR construction that rarely needs to
use explicit contexts, locations, or insertion points but is free to do so when
extra control is needed.
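
A pure-Python sketch of this defaulting scheme is shown below. It is a model of the idea rather than the binding implementation: the `_ThreadBound` mixin, the `resolve` helper, and the `create_op` function are hypothetical, and the insertion point is omitted for brevity.

```python
import threading

_state = threading.local()


def _stack(cls):
    """Return the per-thread stack of active instances of `cls`."""
    stacks = getattr(_state, "stacks", None)
    if stacks is None:
        stacks = _state.stacks = {}
    return stacks.setdefault(cls, [])


class _ThreadBound:
    """Mixin: `with obj:` makes obj the current instance of its class."""

    def __enter__(self):
        _stack(type(self)).append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        _stack(type(self)).pop()
        return False

    @classmethod
    def resolve(cls, explicit=None):
        """Prefer the explicit value, else fall back to the thread default."""
        if explicit is not None:
            return explicit
        stack = _stack(cls)
        if not stack:
            raise ValueError(f"no current {cls.__name__}; pass one explicitly")
        return stack[-1]


class Context(_ThreadBound):
    pass


class Location(_ThreadBound):
    def __init__(self, name):
        self.name = name


def create_op(name, loc=None, context=None):
    """Hypothetical builder: defaulting arguments come last, in fixed order."""
    return (name, Location.resolve(loc), Context.resolve(context))
```

Inside `with Context(), Location("here"):` a call such as `create_op("my_op")` picks up both defaults, while `create_op("my_op", loc=other)` overrides only the location.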

#### Operation hierarchy

As mentioned above, `PyOperation` is special because it can exist in either a
top-level or dependent state. The life-cycle is unidirectional: operations can
be created detached (top-level) and once added to another operation, they are
then dependent for the remainder of their lifetime. The situation is more
complicated when considering construction scenarios where an operation is added
to a transitive parent that is still detached, necessitating further accounting
at such transition points (i.e. all such added children are initially added to
the IR with a parent of their outer-most detached operation, but then once it is
added to an attached operation, they need to be re-parented to the containing
module).

Due to the validity and parenting accounting needs, `PyOperation` is the owner
for regions and blocks and needs to be a top-level type that we can count on not
aliasing. This lets us do things like selectively invalidating instances when
mutations occur without worrying that there is some alias to the same operation
in the hierarchy. Operations are also the only entities that are allowed to be
in a detached state, and they are interned at the context level so that there is
never more than one Python `mlir.ir.Operation` object for a unique
`MlirOperation`, regardless of how it is obtained.

The C/C++ API allows for Region/Block to also be detached, but it simplifies the
ownership model a lot to eliminate that possibility in this API, allowing the
Region/Block to be completely dependent on its owning operation for accounting.
The aliasing of Python `Region`/`Block` instances to underlying
`MlirRegion`/`MlirBlock` is considered benign and these objects are not interned
in the context (unlike operations).

If we ever want to re-introduce detached regions/blocks, we could do so with a
new "DetachedRegion" class or similar and also avoid the complexity of
accounting. With the way it is now, we can avoid having a global live list for
regions and blocks. We may end up needing an op-local one at some point TBD,
depending on how hard it is to guarantee how mutations interact with their
Python peer objects. We can cross that bridge easily when we get there.

Module, when used purely from the Python API, can't alias anyway, so we can use
it as a top-level ref type without a live-list for interning. If the API ever
changes such that this cannot be guaranteed (i.e. by letting you marshal a
native-defined Module in), then there would need to be a live table for it too.
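
Interning through a context-level live table can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the binding code: the integer `handle` stands in for an `MlirOperation`, and `operation_for` and `live_operation_count` are hypothetical names.

```python
import weakref


class Operation:
    def __init__(self, context, handle):
        self.context = context  # keep-alive back-reference to the owner
        self.handle = handle


class Context:
    def __init__(self):
        # Maps a native handle to its single live Python wrapper. Weak values
        # let a wrapper be collected once nothing on the Python side uses it.
        self._live_operations = weakref.WeakValueDictionary()

    def operation_for(self, handle):
        """Return the unique wrapper for `handle`, creating it if needed."""
        op = self._live_operations.get(handle)
        if op is None:
            op = Operation(self, handle)
            self._live_operations[handle] = op
        return op

    def live_operation_count(self):
        return len(self._live_operations)
```

Because every lookup goes through the table, two routes to the same underlying operation yield the same Python object, so invalidating that one object is enough to cover every reference to it.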

## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get\*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc. to read-only Python properties (i.e. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer on the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```

### \_\_repr\_\_ methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).
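
The following stand-in class shows what this looks like from the Python side: a read-only property plus a `__repr__`, both checked by doctest examples in the docstring. The `Operation` class and the `run_docstring` helper are illustrative only; the real objects are defined in the native extension.

```python
import doctest


class Operation:
    """Stand-in wrapper with a read-only property and a printed form.

    >>> op = Operation("module")
    >>> op.name
    'module'
    >>> op
    <operation 'module'>
    """

    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name

    def __repr__(self):
        return f"<operation {self._name!r}>"


def run_docstring(obj):
    """Run the doctest examples in `obj.__doc__`; return (failed, attempted)."""
    globs = {obj.__name__: obj}
    test = doctest.DocTestParser().get_doctest(
        obj.__doc__, globs, obj.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted
```

Note that doctest compares against the `repr` of a string, so the expected value is written with single quotes.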

### CamelCase vs snake\_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:
  pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:
  pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.
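
A pseudo-container of this kind can be modeled in plain Python as a small view class that implements the sequence dunder methods on top of a linked-list style API. The `Block`, `Region`, and `BlockList` classes and the `first_block`/`next_block` attributes below are stand-ins invented for the sketch.

```python
class Block:
    def __init__(self, name):
        self.name = name
        self.next_block = None


class BlockList:
    """Read-only sequence view over the blocks of a region."""

    def __init__(self, region):
        self._region = region

    def __iter__(self):
        block = self._region.first_block
        while block is not None:
            yield block
            block = block.next_block

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, index):
        if index < 0:
            index += len(self)  # support region.blocks[-1]
        for i, block in enumerate(self):
            if i == index:
                return block
        raise IndexError("block index out of range")


class Region:
    def __init__(self, names):
        # Build a singly linked list of blocks, like the native structure.
        self.first_block = None
        previous = None
        for name in names:
            block = Block(name)
            if previous is None:
                self.first_block = block
            else:
                previous.next_block = block
            previous = block

    @property
    def blocks(self):
        return BlockList(self)
```

With this in place, `len(region.blocks)`, `region.blocks[0]`, `region.blocks[-1]`, and `for block in region.blocks` all behave as a Python user would expect, and no `front`/`back` style names leak through.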

### Provide one-stop helpers for common things

One-stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a `SourceMgr` can be quite nice. One-stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

While lit can run any Python module, prefer to lay tests out according to these
rules:

* For tests of the API surface area, prefer
  [`doctest`](https://docs.python.org/3/library/doctest.html).
* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile vs relying on file
  artifacts/paths outside of the test module.
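
For the last rule, staging file I/O through the standard `tempfile` module might look like the sketch below. The helper name `round_trip_through_tempfile` and the `input.mlir` file name are assumptions for illustration; a real test would call the file-consuming API under test where the comment indicates.

```python
import os
import tempfile


def round_trip_through_tempfile(text, suffix=".mlir"):
    """Write `text` to a file in a fresh temporary directory and read it back.

    Returns the text that was read and the (now deleted) path that was used.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "input" + suffix)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        # Code under test that needs a real file path would be called here.
        with open(path, "r", encoding="utf-8") as f:
            return f.read(), path
```

The temporary directory is removed when the `with` block exits, so the test leaves nothing behind and does not depend on paths outside the test module.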

### Sample Doctest

```python
# RUN: %PYTHON %s

"""
  >>> m = load_test_module()

Test basics:
  >>> m.operation.name
  'module'
  >>> m.operation.is_registered
  True
  >>> # ... etc ...

Verify that repr prints:
  >>> m.operation
  <operation 'module'>
"""

import mlir

TEST_MLIR_ASM = r"""
func @test_operation_correct_regions() {
  // ...
}
"""

# TODO: Move to a test utility class once any of this actually exists.
def load_test_module():
  ctx = mlir.ir.Context()
  ctx.allow_unregistered_dialects = True
  module = ctx.parse_asm(TEST_MLIR_ASM)
  return module


if __name__ == "__main__":
  import doctest
  import sys
  # Exit non-zero on doctest failures so that lit reports the test as failed.
  sys.exit(doctest.testmod().failed)
```

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
  m = f()
  print("// -----")
  print("// TEST_FUNCTION:", f.__name__)
  print(m.to_asm())
  return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
  m = mlir.ir.Module()
  builder = m.new_op_builder()
  # CHECK: mydialect.my_operation ...
  builder.my_op()
  return m
```