# TensorFlow Lite inference

The term *inference* refers to the process of executing a TensorFlow Lite model
on-device in order to make predictions based on input data. To perform an
inference with a TensorFlow Lite model, you must run it through an
*interpreter*. The TensorFlow Lite interpreter is designed to be lean and fast.
The interpreter uses a static graph ordering and a custom (less-dynamic) memory
allocator to ensure minimal load, initialization, and execution latency.

This page describes how to access the TensorFlow Lite interpreter and perform
an inference using C++, Java, and Python, and links to other resources for each
[supported platform](#supported-platforms).

[TOC]

## Important concepts

TensorFlow Lite inference typically involves the following steps:

1.  **Loading a model**

    You must load the `.tflite` model into memory, which contains the model's
    execution graph.

1.  **Transforming data**

    Raw input data for the model generally does not match the input data format
    expected by the model. For example, you might need to resize an image or
    change the image format to be compatible with the model.

1.  **Running inference**

    This step involves using the TensorFlow Lite API to execute the model. It
    involves a few steps, such as building the interpreter and allocating
    tensors, as described in the following sections.

1.  **Interpreting output**

    When you receive results from the model inference, you must interpret the
    tensors in a meaningful way that's useful in your application.

    For example, a model might return only a list of probabilities. It's up to
    you to map the probabilities to relevant categories and present them to
    your end-user.
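As a rough illustration of the *transforming data* and *interpreting output*
steps, the following Python sketch scales 8-bit pixel values to floats and maps
a list of probabilities to labels. It assumes a model that expects inputs in the
range [0, 1]. The helper names (`normalize_pixels`, `top_k`) and the label list
are hypothetical and are not part of any TensorFlow Lite API.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0-255) to floats in [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]


def top_k(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one label per probability")
    ranked = sorted(
        zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and model output, for illustration only.
labels = ["cat", "dog", "bird"]
probabilities = [0.10, 0.75, 0.15]
print(top_k(probabilities, labels, k=2))
```

The actual preprocessing (value range, resizing, channel order) depends on how
the model was trained, so treat the scaling above as one possibility only.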

## Supported platforms

TensorFlow Lite inference APIs are provided for most common mobile/embedded
platforms such as [Android](#android-platform), [iOS](#ios-platform) and
[Linux](#linux-platform), in multiple programming languages.

In most cases, the API design reflects a preference for performance over ease of
use. TensorFlow Lite is designed for fast inference on small devices, so it
should be no surprise that the APIs try to avoid unnecessary copies at the
expense of convenience. Similarly, consistency with TensorFlow APIs was not an
explicit goal and some variance between languages is to be expected.

Across all libraries, the TensorFlow Lite API enables you to load models, feed
inputs, and retrieve inference outputs.

### Android Platform

On Android, TensorFlow Lite inference can be performed using either Java or C++
APIs. The Java APIs provide convenience and can be used directly within your
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
require writing JNI wrappers to move data between Java and C++ layers.

See below for details about using [C++](#load-and-run-a-model-in-c) and
[Java](#load-and-run-a-model-in-java), or follow the
[Android quickstart](android.md) for a tutorial and example code.

#### TensorFlow Lite Android wrapper code generator

Note: The TensorFlow Lite wrapper code generator is in an experimental (beta)
phase and currently supports only Android.

For a TensorFlow Lite model enhanced with [metadata](../convert/metadata.md),
developers can use the TensorFlow Lite Android wrapper code generator to create
platform-specific wrapper code. The wrapper code removes the need to interact
directly with `ByteBuffer` on Android. Instead, developers can interact with the
TensorFlow Lite model with typed objects such as `Bitmap` and `Rect`. For more
information, please refer to the
[TensorFlow Lite Android wrapper code generator](../inference_with_metadata/codegen.md).

### iOS Platform

On iOS, TensorFlow Lite is available with native iOS libraries written in
[Swift](https://www.tensorflow.org/code/tensorflow/lite/swift)
and
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/objc).
You can also use the
[C API](https://www.tensorflow.org/code/tensorflow/lite/c/c_api.h)
directly in Objective-C code.

See below for details about using [Swift](#load-and-run-a-model-in-swift),
[Objective-C](#load-and-run-a-model-in-objective-c) and the
[C API](#using-c-api-in-objective-c-code), or follow the
[iOS quickstart](ios.md) for a tutorial and example code.

### Linux Platform

On Linux platforms (including [Raspberry Pi](build_rpi.md)), you can run
inferences using TensorFlow Lite APIs available in
[C++](#load-and-run-a-model-in-c) and [Python](#load-and-run-a-model-in-python),
as shown in the following sections.

## Running a model

Running a TensorFlow Lite model involves a few simple steps:

1.  Load the model into memory.
2.  Build an `Interpreter` based on an existing model.
3.  Set input tensor values. (Optionally resize input tensors if the predefined
    sizes are not desired.)
4.  Invoke inference.
5.  Read output tensor values.

The following sections describe how to perform these steps in each language.
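As a compact preview, steps 3 through 5 might be sketched in Python as a single
helper. This is a minimal sketch: the function name `run_inference` is
hypothetical, and it assumes `interpreter` behaves like `tf.lite.Interpreter`
and that the model has one input and one output of interest. Steps 1 and 2
appear only as comments because they require TensorFlow and a model file.

```python
def run_inference(interpreter, input_data, input_shape=None):
    """Run steps 3-5 on an already-built interpreter and return output 0.

    `interpreter` is assumed to behave like `tf.lite.Interpreter`.
    """
    input_index = interpreter.get_input_details()[0]["index"]

    # Step 3 (optional): resize the input tensor, then (re)allocate tensors.
    if input_shape is not None:
        interpreter.resize_tensor_input(input_index, input_shape)
    interpreter.allocate_tensors()

    # Step 3: set input tensor values.
    interpreter.set_tensor(input_index, input_data)

    # Step 4: invoke inference.
    interpreter.invoke()

    # Step 5: read output tensor values.
    output_index = interpreter.get_output_details()[0]["index"]
    return interpreter.get_tensor(output_index)


# Steps 1-2 (requires TensorFlow; "model.tflite" is a placeholder path):
#   import tensorflow as tf
#   interpreter = tf.lite.Interpreter(model_path="model.tflite")
#   output = run_inference(interpreter, input_data)
```

Allocating tensors on every call keeps the sketch short. Code that runs many
inferences with a fixed input shape would normally allocate once and reuse the
interpreter.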

## Load and run a model in Java

*Platform: Android*

The Java API for running an inference with TensorFlow Lite is primarily designed
for use with Android, so it's available as an Android library dependency:
`org.tensorflow:tensorflow-lite`.

In Java, you'll use the `Interpreter` class to load a model and drive model
inference. In many cases, this may be the only API you need.

You can initialize an `Interpreter` using a `.tflite` file:

```java
public Interpreter(@NotNull File modelFile);
```

Or with a `MappedByteBuffer`:

```java
public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);
```

In both cases, you must provide a valid TensorFlow Lite model or the API throws
`IllegalArgumentException`. If you use a `MappedByteBuffer` to initialize an
`Interpreter`, the buffer must remain unchanged for the whole lifetime of the
`Interpreter`.

To then run an inference with the model, simply call `Interpreter.run()`. For
example:

```java
try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.run(input, output);
}
```

The `run()` method takes only one input and returns only one output. So if your
model has multiple inputs or multiple outputs, instead use:

```java
interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);
```

In this case, each entry in `inputs` corresponds to an input tensor and
`map_of_indices_to_outputs` maps indices of output tensors to the corresponding
output data.

In both cases, the tensor indices should correspond to the values you gave to
the [TensorFlow Lite Converter](../convert/) when you created the model. Be
aware that the order of tensors in `inputs` must match the order given to the
TensorFlow Lite Converter.
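For comparison, a similar multi-input, multi-output pattern can be sketched in
Python. This is an illustration rather than a port of
`runForMultipleInputsOutputs()`: it assumes an interpreter object that behaves
like `tf.lite.Interpreter` with tensors already allocated, and the helper name
`run_multiple` is hypothetical. Inputs are matched to input tensors by position,
and the result maps each output position to its data, in the spirit of
`map_of_indices_to_outputs`.

```python
def run_multiple(interpreter, inputs):
    """Feed one value per input tensor; return {output position: output data}."""
    input_details = interpreter.get_input_details()
    if len(inputs) != len(input_details):
        raise ValueError(
            "expected %d inputs, got %d" % (len(input_details), len(inputs)))

    # The i-th entry of `inputs` goes to the i-th input tensor of the model.
    for detail, value in zip(input_details, inputs):
        interpreter.set_tensor(detail["index"], value)

    interpreter.invoke()

    # Key each output by its position among the model's outputs.
    return {
        position: interpreter.get_tensor(detail["index"])
        for position, detail in enumerate(interpreter.get_output_details())
    }
```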

The `Interpreter` class also provides convenience methods for getting the
index of any model input or output using an operation name:

```java
public int getInputIndex(String opName);
public int getOutputIndex(String opName);
```

If `opName` is not a valid operation in the model, the method throws an
`IllegalArgumentException`.

Also be aware that `Interpreter` owns resources. To avoid memory leaks, release
the resources after use by calling:

```java
interpreter.close();
```

For an example project with Java, see the
[Android image classification sample](https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/android).

### Supported data types (in Java)

To use TensorFlow Lite, the data types of the input and output tensors must be
one of the following primitive types:

*   `float`
*   `int`
*   `long`
*   `byte`

`String` types are also supported, but they are encoded differently than the
primitive types. In particular, the shape of a string Tensor dictates the number
and arrangement of strings in the Tensor, with each element itself being a
variable length string. In this sense, the (byte) size of the Tensor cannot be
computed from the shape and type alone, and consequently strings cannot be
provided as a single, flat `ByteBuffer` argument.
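To make the size argument concrete, the following Python sketch computes the
byte size of a primitive-typed tensor from its shape and type alone. It assumes
Java's fixed primitive widths (`float` and `int` are 4 bytes, `long` is 8,
`byte` is 1), and the function name `tensor_byte_size` is hypothetical. No such
formula exists for string tensors, because each element has its own length.

```python
from functools import reduce
from operator import mul

# Widths in bytes of the Java primitive types listed above.
ELEMENT_SIZE = {"float": 4, "int": 4, "long": 8, "byte": 1}


def tensor_byte_size(shape, dtype):
    """Return the flat buffer size in bytes for a primitive-typed tensor."""
    if dtype not in ELEMENT_SIZE:
        raise ValueError("size of %r tensors is not determined by shape" % dtype)
    num_elements = reduce(mul, shape, 1)
    return num_elements * ELEMENT_SIZE[dtype]


# A 1 x 224 x 224 x 3 float image tensor.
print(tensor_byte_size([1, 224, 224, 3], "float"))  # 602112
```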

If other data types, including boxed types like `Integer` and `Float`, are used,
an `IllegalArgumentException` will be thrown.

#### Inputs

Each input should be an array or multi-dimensional array of the supported
primitive types, or a raw `ByteBuffer` of the appropriate size. If the input is
an array or multi-dimensional array, the associated input tensor will be
implicitly resized to the array's dimensions at inference time. If the input is
a `ByteBuffer`, the caller should first manually resize the associated input
tensor (via `Interpreter.resizeInput()`) before running inference.

When using `ByteBuffer`, prefer using direct byte buffers, as this allows the
`Interpreter` to avoid unnecessary copies. If the `ByteBuffer` is a direct byte
buffer, its order must be `ByteOrder.nativeOrder()`. After it is used for a
model inference, it must remain unchanged until the model inference is finished.
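The byte-order requirement can be illustrated outside Java. The Python sketch
below packs float values into a flat byte buffer in the platform's native byte
order, which is roughly the layout a direct `ByteBuffer` ordered with
`ByteOrder.nativeOrder()` would hold. It uses only the standard `struct` and
`sys` modules, and the function name `pack_floats_native` is hypothetical.

```python
import struct
import sys


def pack_floats_native(values):
    """Pack floats as 4-byte IEEE 754 values in native byte order."""
    # "=" selects native byte order with standard sizes (4 bytes per "f").
    return struct.pack("=%df" % len(values), *values)


buffer = pack_floats_native([1.0, 2.0, 3.0])
print(len(buffer))    # 12 bytes: 3 elements x 4 bytes each
print(sys.byteorder)  # "little" or "big", depending on the platform
```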

#### Outputs

Each output should be an array or multi-dimensional array of the supported
primitive types, or a `ByteBuffer` of the appropriate size. Note that some
models have dynamic outputs, where the shape of output tensors can vary
depending on the input. There's no straightforward way of handling this with the
existing Java inference API, but planned extensions will make this possible.

## Load and run a model in Swift

*Platform: iOS*

The
[Swift API](https://www.tensorflow.org/code/tensorflow/lite/swift)
is available in the `TensorFlowLiteSwift` Pod from CocoaPods.

First, you need to import the `TensorFlowLite` module.

```swift
import TensorFlowLite
```

```swift
// Getting model path
guard
  let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")
else {
  // Error handling...
  return
}

do {
  // Initialize an interpreter with the model.
  let interpreter = try Interpreter(modelPath: modelPath)

  // Allocate memory for the model's input `Tensor`s.
  try interpreter.allocateTensors()

  let inputData: Data  // Should be initialized

  // input data preparation...

  // Copy the input data to the input `Tensor`.
  try interpreter.copy(inputData, toInputAt: 0)

  // Run inference by invoking the `Interpreter`.
  try interpreter.invoke()

  // Get the output `Tensor`
  let outputTensor = try interpreter.output(at: 0)

  // Copy the output into a `Float32` buffer to process the inference results.
  let outputSize = outputTensor.shape.dimensions.reduce(1, {x, y in x * y})
  let outputData =
        UnsafeMutableBufferPointer<Float32>.allocate(capacity: outputSize)
  outputTensor.data.copyBytes(to: outputData)
} catch {
  // Error handling... The thrown error is available here as `error`.
}
```

## Load and run a model in Objective-C

*Platform: iOS*

The
[Objective-C API](https://www.tensorflow.org/code/tensorflow/lite/objc)
is available in the `TensorFlowLiteObjC` Pod from CocoaPods.

First, you need to import the `TensorFlowLite` module.

```objc
@import TensorFlowLite;
```

```objc
NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];
NSError *error;

// Initialize an interpreter with the model.
TFLInterpreter *interpreter = [[TFLInterpreter alloc] initWithModelPath:modelPath
                                                                  error:&error];
if (error != nil) { /* Error handling... */ }

// Allocate memory for the model's input `TFLTensor`s.
[interpreter allocateTensorsWithError:&error];
if (error != nil) { /* Error handling... */ }

NSMutableData *inputData;  // Should be initialized
// input data preparation...

// Copy the input data to the input `TFLTensor`.
[interpreter copyData:inputData toInputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Run inference by invoking the `TFLInterpreter`.
[interpreter invokeWithError:&error];
if (error != nil) { /* Error handling... */ }

// Get the output `TFLTensor`
TFLTensor *outputTensor = [interpreter outputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy output to `NSData` to process the inference results.
NSData *outputData = [outputTensor dataWithError:&error];
if (error != nil) { /* Error handling... */ }
```

### Using C API in Objective-C code

Currently, the Objective-C API does not support delegates. To use delegates
with Objective-C code, you need to call the underlying
[C API](https://www.tensorflow.org/code/tensorflow/lite/c/c_api.h) directly.

```c
#include "tensorflow/lite/c/c_api.h"
```

```c
TfLiteModel* model = TfLiteModelCreateFromFile([modelPath UTF8String]);
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();

// Create the interpreter.
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Allocate tensors and populate the input tensor data.
// `input` and `output` are assumed to be caller-provided float containers.
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteTensor* input_tensor =
    TfLiteInterpreterGetInputTensor(interpreter, 0);
TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                           input.size() * sizeof(float));

// Execute inference.
TfLiteInterpreterInvoke(interpreter);

// Extract the output tensor data.
const TfLiteTensor* output_tensor =
    TfLiteInterpreterGetOutputTensor(interpreter, 0);
TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                         output.size() * sizeof(float));

// Dispose of the model and interpreter objects.
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);
```

## Load and run a model in C++

*Platforms: Android, iOS, and Linux*

Note: The C++ API on iOS is only available when using Bazel.

In C++, the model is stored in the
[`FlatBufferModel`](https://www.tensorflow.org/lite/api_docs/cc/class/tflite/flat-buffer-model.html)
class. It encapsulates a TensorFlow Lite model and you can build it in a couple
of different ways, depending on where the model is stored:

```c++
class FlatBufferModel {
  // Build a model based on a file. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromFile(
      const char* filename,
      ErrorReporter* error_reporter);

  // Build a model based on a pre-loaded flatbuffer. The caller retains
  // ownership of the buffer and should keep it alive until the returned object
  // is destroyed. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromBuffer(
      const char* buffer,
      size_t buffer_size,
      ErrorReporter* error_reporter);
};
```

Note: If TensorFlow Lite detects the presence of the
[Android NNAPI](https://developer.android.com/ndk/guides/neuralnetworks), it
will automatically try to use shared memory to store the `FlatBufferModel`.

Now that you have the model as a `FlatBufferModel` object, you can execute it
with an
[`Interpreter`](https://www.tensorflow.org/lite/api_docs/cc/class/tflite/interpreter.html).
A single `FlatBufferModel` can be used simultaneously by more than one
`Interpreter`.

Caution: The `FlatBufferModel` object must remain valid until all instances of
`Interpreter` using it have been destroyed.

The important parts of the `Interpreter` API are shown in the code snippet
below. Note that:

*   Tensors are represented by integers, in order to avoid string comparisons
    (and any fixed dependency on string libraries).
*   An interpreter must not be accessed from concurrent threads.
*   Memory allocation for input and output tensors must be triggered by calling
    `AllocateTensors()` right after resizing tensors.

The simplest usage of TensorFlow Lite with C++ looks like this:

```c++
// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if desired.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();

float* output = interpreter->typed_output_tensor<float>(0);
```

For more example code, see
[`minimal.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/minimal/minimal.cc)
and
[`label_image.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/label_image/label_image.cc).

## Load and run a model in Python

*Platform: Linux*

The Python API for running an inference is provided in the `tf.lite` module.
From that module, you mostly need only
[`tf.lite.Interpreter`](https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter)
to load a model and run an inference.

The following example shows how to use the Python interpreter to load a
`.tflite` file and run inference with random input data:

```python
import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```

As an alternative to loading the model as a pre-converted `.tflite` file, you
can combine your code with the
[TensorFlow Lite Converter Python API](https://www.tensorflow.org/lite/convert/python_api)
(`tf.lite.TFLiteConverter`), allowing you to convert your TensorFlow model into
the TensorFlow Lite format and then run inference:

```python
import numpy as np
import tensorflow as tf

# This example uses the TensorFlow 1.x graph and session APIs. Under
# TensorFlow 2.x they are reached through `tf.compat.v1`, with eager
# execution disabled.
tf.compat.v1.disable_eager_execution()

img = tf.compat.v1.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

# Convert to TF Lite format
with tf.compat.v1.Session() as sess:
  converter = tf.compat.v1.lite.TFLiteConverter.from_session(sess, [img], [out])
  tflite_model = converter.convert()

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Continue to get tensors and so forth, as shown above...
```

For more Python sample code, see
[`label_image.py`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py).

Tip: Run `help(tf.lite.Interpreter)` in the Python terminal to get detailed
documentation about the interpreter.

## Supported operations

TensorFlow Lite supports a subset of TensorFlow operations with some
limitations. For the full list of operations and limitations, see the
[TF Lite Ops page](https://www.tensorflow.org/mlir/tfl_ops).
