# Developing a new backend for XLA

This preliminary guide is for early adopters who want to retarget TensorFlow to
their hardware efficiently. The guide is not step-by-step and assumes knowledge
of [LLVM](http://llvm.org), [Bazel](https://bazel.build/), and TensorFlow.

XLA provides an abstract interface that a new architecture or accelerator can
implement to create a backend to run TensorFlow graphs. Retargeting XLA should
be significantly simpler and more scalable than implementing every existing
TensorFlow Op for new hardware.

Most implementations will fall into one of the following scenarios:

1. Existing CPU architecture not yet officially supported by XLA, with or
   without an existing [LLVM](http://llvm.org) backend.
2. Non-CPU-like hardware with an existing LLVM backend.
3. Non-CPU-like hardware without an existing LLVM backend.

> Note: An LLVM backend can mean either one of the officially released LLVM
> backends or a custom LLVM backend developed in-house.

## Scenario 1: Existing CPU architecture not yet officially supported by XLA

In this scenario, start by looking at the existing
[XLA CPU backend](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/cpu/).
XLA makes it easy to retarget TensorFlow to different CPUs by using LLVM, since
the main difference between XLA backends for CPUs is the code generated by LLVM.
Google tests XLA for x64 and ARM64 architectures.

If the hardware vendor has an LLVM backend for their hardware, it is
straightforward to link that backend with the LLVM built with XLA. In JIT mode,
the XLA CPU backend emits code for the host CPU. For ahead-of-time compilation,
[`xla::AotCompilationOptions`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h)
can provide an LLVM triple to configure the target architecture.

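To make the notion of an LLVM triple concrete, the following is a minimal
sketch of how a string such as `aarch64-unknown-linux-gnu` breaks down into its
components. `TargetTriple` and `ParseTriple` are hypothetical names introduced
only for illustration; they are not part of XLA or LLVM. LLVM has its own
`llvm::Triple` class, which is more lenient and also normalizes shorter forms,
so treat this as a simplified stand-in rather than the actual parsing logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a target triple; not llvm::Triple.
struct TargetTriple {
  std::string arch;         // e.g. "aarch64" or "x86_64"
  std::string vendor;       // e.g. "unknown" or "pc"
  std::string os;           // e.g. "linux"
  std::string environment;  // optional, e.g. "gnu"; empty if absent
};

// Splits "arch-vendor-os[-environment]" on '-'.
// Returns false if the text does not have three or four non-empty components.
bool ParseTriple(const std::string& text, TargetTriple* out) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    const std::size_t dash = text.find('-', start);
    if (dash == std::string::npos) {
      parts.push_back(text.substr(start));
      break;
    }
    parts.push_back(text.substr(start, dash - start));
    start = dash + 1;
  }
  if (parts.size() < 3 || parts.size() > 4) return false;
  for (const std::string& part : parts) {
    if (part.empty()) return false;
  }
  out->arch = parts[0];
  out->vendor = parts[1];
  out->os = parts[2];
  out->environment = parts.size() == 4 ? parts[3] : std::string();
  return true;
}
```

Calling `ParseTriple("aarch64-unknown-linux-gnu", &triple)` fills `triple.arch`
with `aarch64`, which is the part that selects the code generator for the
target architecture.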

If there is no existing LLVM backend but another kind of code generator exists,
it should be possible to reuse most of the existing CPU backend.

## Scenario 2: Non-CPU-like hardware with an existing LLVM backend

It is possible to model a new
[`xla::Compiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h)
implementation on the existing
[`xla::CPUCompiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/cpu/cpu_compiler.cc)
and [`xla::GPUCompiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc)
classes, since these already emit LLVM IR. Depending on the nature of the
hardware, many aspects of the LLVM IR generation may have to change, but a lot
of code can be shared with the existing backends.

A good example to follow is the
[GPU backend](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/gpu/)
of XLA. The GPU backend targets a non-CPU-like ISA, and therefore some aspects
of its code generation are unique to the GPU domain. Other kinds of hardware,
e.g. DSPs like Hexagon (which has an upstream LLVM backend), can reuse parts of
the LLVM IR emission logic, but other parts will be unique.

## Scenario 3: Non-CPU-like hardware without an existing LLVM backend

If it is not possible to use LLVM, then the best option is to implement a
new backend for XLA for the desired hardware. This option requires the most
effort. The classes that need to be implemented are as follows:

* [`StreamExecutor`](https://www.tensorflow.org/code/tensorflow/stream_executor/stream_executor.h):
  For many devices, not all methods of `StreamExecutor` are needed. See
  existing `StreamExecutor` implementations for details.
* [`xla::Compiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h):
  This class encapsulates the compilation of an HLO computation into an
  `xla::Executable`.
* [`xla::Executable`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/executable.h):
  This class is used to launch a compiled computation on the platform.
* [`xla::TransferManager`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/transfer_manager.h):
  This class enables backends to provide platform-specific mechanisms for
  constructing XLA literal data from given device memory handles. In other
  words, it helps encapsulate the transfer of data from the host to the device
  and back.

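The division of responsibilities among the classes listed above can be
illustrated with a toy model. Every name in this sketch (`ToyDeviceBuffer`,
`ToyTransferManager`, `ToyExecutable`, `ToyCompiler`, `RunOnToyBackend`) is
hypothetical, and none of the signatures match the real XLA interfaces, which
are considerably richer. The sketch assumes a "device" that is just host
memory and a "computation" that is a single made-up op name, and it omits the
`StreamExecutor` role entirely.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// All names below are hypothetical stand-ins, not the real XLA classes.

// Stand-in for a device memory handle.
struct ToyDeviceBuffer {
  std::vector<float> data;
};

// Mirrors the role of xla::TransferManager: host <-> device copies.
class ToyTransferManager {
 public:
  ToyDeviceBuffer ToDevice(const std::vector<float>& host) const {
    return ToyDeviceBuffer{host};
  }
  std::vector<float> ToHost(const ToyDeviceBuffer& device) const {
    return device.data;
  }
};

// Mirrors the role of xla::Executable: launches a compiled computation.
class ToyExecutable {
 public:
  explicit ToyExecutable(float scale) : scale_(scale) {}

  // Multiplies every element of the input buffer by the compiled-in scale.
  ToyDeviceBuffer Run(const ToyDeviceBuffer& input) const {
    ToyDeviceBuffer output;
    output.data.reserve(input.data.size());
    for (float value : input.data) output.data.push_back(value * scale_);
    return output;
  }

 private:
  float scale_;
};

// Mirrors the role of xla::Compiler: turns a computation into an executable.
class ToyCompiler {
 public:
  // `op` is a made-up one-word computation; returns nullptr if unsupported.
  std::unique_ptr<ToyExecutable> Compile(const std::string& op) const {
    if (op == "double") {
      return std::unique_ptr<ToyExecutable>(new ToyExecutable(2.0f));
    }
    if (op == "negate") {
      return std::unique_ptr<ToyExecutable>(new ToyExecutable(-1.0f));
    }
    return nullptr;
  }
};

// Compiles `op`, copies the input to the device, runs, and copies back.
// Returns an empty vector when the op is not supported.
std::vector<float> RunOnToyBackend(const std::string& op,
                                   const std::vector<float>& input) {
  ToyCompiler compiler;
  ToyTransferManager transfer;
  std::unique_ptr<ToyExecutable> executable = compiler.Compile(op);
  if (!executable) return std::vector<float>();
  ToyDeviceBuffer on_device = transfer.ToDevice(input);
  ToyDeviceBuffer result = executable->Run(on_device);
  return transfer.ToHost(result);
}
```

The point of the split is that the compiler knows nothing about how data
reaches the device, and the transfer manager knows nothing about what the
executable computes, so each piece can be brought up and tested separately on
new hardware.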