# MLIR Python Bindings

Current status: Under development and not enabled by default

## Building

### Pre-requisites

* A relatively recent Python3 installation
* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
  be located by CMake (auto-detected if installed via
  `python -m pip install pybind11`). Note: the minimum required version is 2.6.0.

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield some greater
  deployment flexibility, linking in this way allows the linker to report
  build-time errors for unresolved symbols on all platforms, which makes for a
  smoother development workflow. Defaults to `ON`.

* **`PYTHON_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.

## Design

### Use cases

There are likely two primary use cases for the MLIR Python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

1. Downstream integrations will likely want to include parts of the API in their
   private namespace or specially built libraries, probably mixing it with other
   Python native bits.

### Composable modules

In order to support use case \#2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired.
This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes, as other related C++ modules
  will need to interop with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects` fall
  into this category).

There are a lot of interrelated issues of shared library linkage, distribution
concerns, etc. that affect such things. Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with
other non-trivial native extensions.
As such, the native extension (i.e. the
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
and siblings which loads and re-exports it. This split provides a place to stage
code that needs to prepare the environment *before* the shared library is loaded
into the Python runtime, and also provides a place where one-time initialization
code can be invoked apart from module constructors.

To start with, the `mlir/__init__.py` loader shim can be very simple and scale to
future needs:

```python
from _mlir import *
```

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).

### Ownership in the Core IR

There are several top-level types in the core IR that are strongly owned by their Python-side reference:

* `PyContext` (`mlir.ir.Context`)
* `PyModule` (`mlir.ir.Module`)
* `PyOperation` (`mlir.ir.Operation`) - but with caveats

All other objects are dependent. All objects maintain a back-reference
(keep-alive) to their closest containing top-level object. Further, dependent
objects fall into two categories: a) uniqued (which live for the life-time of
the context) and b) mutable. Mutable objects need additional machinery for
keeping track of when the C++ instance that backs their Python object is no
longer valid (typically due to some specific mutation of the IR, deletion, or
bulk operation).
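
The keep-alive relationship described above can be illustrated in plain Python, without the native extension. This is a minimal sketch under stated assumptions: `SketchContext` and `SketchType` are hypothetical stand-ins, not classes from the bindings, and the bindings themselves would hold the back-reference in their C++ wrapper objects rather than in a Python attribute.

```python
import gc
import weakref


class SketchContext:
  """Hypothetical stand-in for a top-level object such as mlir.ir.Context."""


class SketchType:
  """Hypothetical dependent object that keeps its context alive."""

  def __init__(self, context):
    # Back-reference (keep-alive) to the closest containing top-level object.
    self.context = context


def context_survives_while_dependent_lives():
  ctx = SketchContext()
  ctx_ref = weakref.ref(ctx)  # Observe the context without owning it.
  dependent = SketchType(ctx)
  del ctx  # Drop the only direct reference to the context.
  gc.collect()
  alive_with_dependent = ctx_ref() is not None
  del dependent  # The last object holding the context goes away.
  gc.collect()
  alive_after = ctx_ref() is not None
  return alive_with_dependent, alive_after
```

The point of the sketch is only that a dependent object, not the user, can be the thing that keeps a top-level object alive. It does not model invalidation of mutable objects.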

### Optionality and argument ordering in the Core IR

The following types support being bound to the current thread as a context manager:

* `PyLocation` (`loc: mlir.ir.Location = None`)
* `PyInsertionPoint` (`ip: mlir.ir.InsertionPoint = None`)
* `PyMlirContext` (`context: mlir.ir.Context = None`)

In order to support composability of function arguments, when these types appear
as arguments, they should always be the last and appear in the above order and
with the given names (which is generally the order in which they are expected to
need to be expressed explicitly in special cases) as necessary. Each should
carry a default value of `py::none()` and use either a manual or automatic
conversion for resolving either with the explicit value or a value from the
thread context manager (i.e. `DefaultingPyMlirContext` or
`DefaultingPyLocation`).

The rationale for this is that in Python, trailing keyword arguments to the
*right* are the most composable, enabling a variety of strategies such as kwarg
passthrough, default values, etc. Keeping function signatures composable
increases the chances that interesting DSLs and higher level APIs can be
constructed without a lot of exotic boilerplate.

Used consistently, this enables a style of IR construction that rarely needs to
use explicit contexts, locations, or insertion points but is free to do so when
extra control is needed.

#### Operation hierarchy

As mentioned above, `PyOperation` is special because it can exist in either a
top-level or dependent state. The life-cycle is unidirectional: operations can
be created detached (top-level) and once added to another operation, they are
then dependent for the remainder of their lifetime.
The situation is more
complicated when considering construction scenarios where an operation is added
to a transitive parent that is still detached, necessitating further accounting
at such transition points (i.e. all such added children are initially added to
the IR with a parent of their outer-most detached operation, but then once it is
added to an attached operation, they need to be re-parented to the containing
module).

Due to the validity and parenting accounting needs, `PyOperation` is the owner
for regions and blocks and needs to be a top-level type that we can count on not
aliasing. This lets us do things like selectively invalidating instances when
mutations occur without worrying that there is some alias to the same operation
in the hierarchy. Operations are also the only entities that are allowed to be in
a detached state, and they are interned at the context level so that there is
never more than one Python `mlir.ir.Operation` object for a unique
`MlirOperation`, regardless of how it is obtained.

The C/C++ API allows for Region/Block to also be detached, but it simplifies the
ownership model a lot to eliminate that possibility in this API, allowing the
Region/Block to be completely dependent on its owning operation for accounting.
The aliasing of Python `Region`/`Block` instances to underlying
`MlirRegion`/`MlirBlock` is considered benign and these objects are not interned
in the context (unlike operations).

If we ever want to re-introduce detached regions/blocks, we could do so with a new
"DetachedRegion" class or similar and also avoid the complexity of accounting.
With the way it is now, we can avoid having a global live list for regions and
blocks. We may end up needing an op-local one at some point (TBD), depending on
how hard it is to guarantee how mutations interact with their Python peer
objects. We can cross that bridge easily when we get there.
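
The context-level interning of operations mentioned above can be sketched in plain Python with a weak-value table. This is only an illustration under stated assumptions: `SketchContext`, `SketchOperation`, `get_operation`, `erase`, and the integer `handle` are all hypothetical names, and the bindings would keep their live list on the C++ side rather than in Python.

```python
import weakref


class SketchOperation:
  """Hypothetical Python peer for one underlying operation handle."""

  def __init__(self, context, handle):
    self.context = context  # Keep-alive back-reference to the context.
    self.handle = handle  # Stand-in for an MlirOperation.
    self.valid = True

  def invalidate(self):
    # Called when a mutation erases the underlying operation.
    self.valid = False


class SketchContext:
  """Hypothetical context holding a live table of operations."""

  def __init__(self):
    # Weak values: the table never keeps a Python peer alive by itself.
    self._live_operations = weakref.WeakValueDictionary()

  def get_operation(self, handle):
    # Return the existing peer if one is live, otherwise create and record it.
    op = self._live_operations.get(handle)
    if op is None:
      op = SketchOperation(self, handle)
      self._live_operations[handle] = op
    return op

  def erase(self, handle):
    # Remove the entry and mark the single peer object as no longer valid.
    op = self._live_operations.pop(handle, None)
    if op is not None:
      op.invalidate()
```

Because every lookup for the same handle returns the same Python object, invalidating that one object is enough. There is no second alias that could still appear valid after the mutation.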

Module, when used purely from the Python API, can't alias anyway, so we can use
it as a top-level ref type without a live-list for interning. If the API ever
changes such that this cannot be guaranteed (i.e. by letting you marshal a
native-defined Module in), then there would need to be a live table for it too.

## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get\*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc. to read-only Python properties (i.e. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer on the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```

### \_\_repr\_\_ methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).

### CamelCase vs snake\_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:
  pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:
  pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.

### Provide one-stop helpers for common things

One-stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a SourceMgr can be quite nice. One-stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

While lit can run any Python module, prefer to lay tests out according to these
rules:

* For tests of the API surface area, prefer
  [`doctest`](https://docs.python.org/3/library/doctest.html).
* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile vs relying on file
  artifacts/paths outside of the test module.

### Sample Doctest

```python
# RUN: %PYTHON %s

"""
  >>> m = load_test_module()
Test basics:
  >>> m.operation.name
  'module'
  >>> m.operation.is_registered
  True
  >>> ... etc ...

Verify that repr prints:
  >>> m.operation
  <operation 'module'>
"""

import mlir

TEST_MLIR_ASM = r"""
func @test_operation_correct_regions() {
  // ...
}
"""

# TODO: Move to a test utility class once any of this actually exists.
def load_test_module():
  ctx = mlir.ir.Context()
  ctx.allow_unregistered_dialects = True
  module = ctx.parse_asm(TEST_MLIR_ASM)
  return module


if __name__ == "__main__":
  import doctest
  doctest.testmod()
```

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
  # Runs the decorated function at import time and prints the module it builds.
  m = f()
  print("// -----")
  print("// TEST_FUNCTION:", f.__name__)
  print(m.to_asm())
  return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
  m = mlir.ir.Module()
  builder = m.new_op_builder()
  # CHECK: mydialect.my_operation ...
  builder.my_op()
  return m
```
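
For the file I/O rule listed under Testing, staging through a tempfile might look like the sketch below. It uses only the Python standard library; `TEST_ASM`, the file name `test_module.mlir`, and `round_trip_through_tempfile` are hypothetical names chosen for illustration, and the snippet does not call into the bindings at all.

```python
import os
import tempfile

# Placeholder text; any small IR snippet defined inside the test would do.
TEST_ASM = "module {\n}\n"


def round_trip_through_tempfile(text):
  """Writes text to a file in a private temp directory and reads it back."""
  # The directory and its contents are removed when the with-block exits,
  # so the test leaves no artifacts behind and depends on no outside paths.
  with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test_module.mlir")
    with open(path, "w") as f:
      f.write(text)
    with open(path) as f:
      return f.read()
```

In an actual test, the write and read steps would be replaced by whatever API under test produces or consumes the file.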