# TensorFlow Lite guide

TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded
devices. It enables on-device machine learning inference with low latency and a
small binary size. TensorFlow Lite also supports hardware acceleration with the
[Android Neural Networks
API](https://developer.android.com/ndk/guides/neuralnetworks/index.html).

TensorFlow Lite uses many techniques to achieve low latency, such as kernels
optimized for mobile apps, pre-fused activations, and quantized kernels that
allow smaller and faster (fixed-point math) models.

Most of our TensorFlow Lite documentation is
[on GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite)
for the time being.

## What does TensorFlow Lite contain?

TensorFlow Lite supports a set of core operators, both quantized and float,
which have been tuned for mobile platforms. They incorporate pre-fused
activations and biases to further enhance performance and quantized accuracy.
TensorFlow Lite also supports using custom operations in models.

TensorFlow Lite defines a new model file format, based on
[FlatBuffers](https://google.github.io/flatbuffers/). FlatBuffers is an
efficient open-source cross-platform serialization library. It is similar to
[protocol buffers](https://developers.google.com/protocol-buffers/?hl=en), but
the primary difference is that FlatBuffers does not need a parsing/unpacking
step (often coupled with per-object memory allocation) to a secondary
representation before you can access the data. Also, the code footprint of
FlatBuffers is an order of magnitude smaller than that of protocol buffers.
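
Because a `.tflite` file is a FlatBuffer, the interpreter can work on the
serialized buffer more or less as-is, without first unpacking it into another
representation. As a rough illustration (assuming a TensorFlow build that
includes the Python `tf.lite` module, and a hypothetical `model.tflite` file),
the Python interpreter accepts either a file path or the raw model bytes:

```python
import tensorflow as tf

# Read the serialized FlatBuffer directly; no separate parsing step is needed
# before handing it to the interpreter.
with open("model.tflite", "rb") as f:  # hypothetical model file
    model_bytes = f.read()

interpreter = tf.lite.Interpreter(model_content=model_bytes)

# Inspect the model's declared inputs and outputs.
print(interpreter.get_input_details())
print(interpreter.get_output_details())
```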

TensorFlow Lite has a new mobile-optimized interpreter, which has the key goals
of keeping apps lean and fast. The interpreter uses a static graph ordering and
a custom (less-dynamic) memory allocator to ensure minimal load, initialization,
and execution latency.
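
The same interpreter is also exposed in Python, which is convenient for
checking a converted model on a desktop machine before shipping it. A minimal
sketch of the run loop (the `model.tflite` file and the dummy input are
placeholders, not a real application) looks like this:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # hypothetical file
interpreter.allocate_tensors()  # tensors are laid out up front, not per call

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)

interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
print(output.shape)
```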

TensorFlow Lite provides an interface to leverage hardware acceleration, if
available on the device. It does so via the
[Android Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks/index.html),
available on Android 8.1 (API level 27) and higher.

## Why do we need a new mobile-specific library?

Machine learning is changing the computing paradigm, and we see an emerging
trend of new use cases on mobile and embedded devices. Consumer expectations are
also trending toward natural, human-like interactions with their devices, driven
by the camera and voice interaction models.

There are several factors fueling interest in this domain:

- Innovation at the silicon layer is enabling new possibilities for hardware
  acceleration, and frameworks such as the Android Neural Networks API make it
  easy to leverage these.

- Recent advances in real-time computer vision and spoken language understanding
  have led to mobile-optimized benchmark models being open sourced
  (e.g. MobileNets, SqueezeNet).

- Widely available smart appliances create new possibilities for
  on-device intelligence.

- Interest in stronger user data privacy paradigms, where user data does not
  need to leave the mobile device.

- The ability to serve ‘offline’ use cases, where the device does not need to be
  connected to a network.

We believe the next wave of machine learning applications will have significant
processing on mobile and embedded devices.

## TensorFlow Lite highlights

TensorFlow Lite provides:

- A set of core operators, both quantized and float, many of which have been
  tuned for mobile platforms. These can be used to create and run custom
  models. Developers can also write their own custom operators and use them in
  models.

- A new [FlatBuffers](https://google.github.io/flatbuffers/)-based
  model file format.

- An on-device interpreter with kernels optimized for faster execution on mobile.

- A TensorFlow converter that converts TensorFlow-trained models to the
  TensorFlow Lite format.

- Smaller size: TensorFlow Lite is smaller than 300KB when all supported
  operators are linked, and less than 200KB when using only the operators needed
  to support Inception V3 and MobileNet.

- **Pre-tested models:**

    All of the following models are guaranteed to work out of the box:

    - Inception V3, a popular model for detecting the dominant objects
      present in an image.

    - [MobileNets](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md),
      a family of mobile-first computer vision models designed to effectively
      maximize accuracy while being mindful of the restricted resources for an
      on-device or embedded application. They are small, low-latency, low-power
      models parameterized to meet the resource constraints of a variety of use
      cases. They can be built upon for classification, detection, embeddings,
      and segmentation. MobileNet models are smaller but [lower in
      accuracy](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
      than Inception V3.

    - On Device Smart Reply, an on-device model that provides one-touch
      replies to an incoming text message by suggesting contextually relevant
      messages. The model was built specifically for memory-constrained devices
      such as watches and phones, and it has been successfully used to surface
      [Smart Replies on Android
      Wear](https://research.googleblog.com/2017/02/on-device-machine-intelligence.html)
      to all first-party and third-party apps.

    Also see the complete list of
    [TensorFlow Lite's supported models](hosted_models.md),
    including the model sizes, performance numbers, and downloadable model files.

- Quantized versions of the MobileNet model, which run faster than the
  non-quantized (float) version on CPU.

- A new Android demo app illustrating the use of TensorFlow Lite with a quantized
  MobileNet model for object classification.

- Java and C++ API support.

## Getting Started

We recommend you try out TensorFlow Lite with the pre-tested models indicated
above. If you have an existing model, you will need to test whether your model
is compatible with both the converter and the supported operator set. To test
your model, see the
[documentation on GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite).
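
As a quick first check, converting a model with the Python API is usually only
a few lines. The sketch below assumes a TensorFlow version that exposes
`tf.lite.TFLiteConverter` and a hypothetical SavedModel directory; if the graph
contains an unsupported operator, the conversion step is typically where it
will fail:

```python
import tensorflow as tf

saved_model_dir = "/tmp/my_saved_model"  # hypothetical path to your model

# Build a converter from a SavedModel and produce a .tflite FlatBuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()  # raises an error if an op is unsupported

with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)
```

If conversion succeeds, you can load the resulting file with
`tf.lite.Interpreter` (as sketched earlier) to confirm the converted model
still produces sensible outputs.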

### Retrain Inception-V3 or MobileNet for a custom data set

The pre-trained models mentioned above have been trained on the ImageNet data
set, which consists of 1000 predefined classes. If those classes are not
relevant or useful for your use case, you will need to retrain those
models. This technique is called transfer learning: it starts with a model that
has already been trained on one problem and retrains it on a similar
problem. Deep learning from scratch can take days, but transfer learning can be
done fairly quickly. To do this, you'll need to generate a custom data set
labeled with the relevant classes.

The [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/)
codelab walks through this process step-by-step. The retraining code supports
retraining for both floating point and quantized inference.
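
If you prefer to stay in Python rather than follow the codelab's scripts, the
same idea can be sketched with Keras: reuse a pre-trained MobileNet as a frozen
feature extractor and train only a small classification head on your own
classes. Everything below (class count, input size, the commented-out dataset)
is a placeholder for illustration, not the codelab's actual retraining code:

```python
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical number of custom classes

# Pre-trained MobileNet backbone, used as a frozen feature extractor.
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False,
    weights="imagenet", pooling="avg")
base.trainable = False

# Small trainable head for the new classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# `train_ds` would be your labeled dataset of (image, label) batches:
# model.fit(train_ds, epochs=5)
```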

## TensorFlow Lite Architecture

The following diagram shows the architectural design of TensorFlow Lite:

<img src="https://www.tensorflow.org/images/tflite-architecture.jpg"
     alt="TensorFlow Lite architecture diagram"
     style="max-width:600px;">

Starting with a trained TensorFlow model on disk, you'll convert that model to
the TensorFlow Lite file format (`.tflite`) using the TensorFlow Lite
Converter. Then you can use that converted file in your mobile application.
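
The highlights above note that the quantized MobileNet runs faster than the
float version on CPU. One way to get a smaller, quantized `.tflite` file from
the same converter flow, assuming a TensorFlow version that supports
post-training quantization via `tf.lite.Optimize` (and a hypothetical
SavedModel path), is sketched below. Note that this is post-training
quantization, not the quantization-aware training used for the pre-tested
quantized MobileNet, but it illustrates the same size/speed trade-off:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")

# Post-training quantization: weights are stored as 8-bit integers, which
# shrinks the file and typically speeds up CPU inference.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

quantized_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(quantized_model)
```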

Deploying the TensorFlow Lite model file uses:

- Java API: A convenience wrapper around the C++ API on Android.

- C++ API: Loads the TensorFlow Lite model file and invokes the Interpreter. The
  same library is available on both Android and iOS.

- Interpreter: Executes the model using a set of kernels. The interpreter
  supports selective kernel loading; without kernels it is only 100KB, and 300KB
  with all the kernels loaded. This is a significant reduction from the 1.5MB
  required by TensorFlow Mobile.

- On select Android devices, the Interpreter will use the Android Neural
  Networks API for hardware acceleration, or default to CPU execution if it is
  not available.

You can also implement custom kernels using the C++ API that can be used by the
Interpreter.

## Future Work

In future releases, TensorFlow Lite will support more models and built-in
operators, deliver performance improvements for both fixed-point and
floating-point models, improve the tooling to enable easier developer
workflows, and add support for other small devices, among other things. As we
continue development, we hope that TensorFlow Lite will greatly simplify the
developer experience of targeting a model for small devices.

Future plans include using specialized machine learning hardware to get the best
possible performance for a particular model on a particular device.

## Next Steps

The TensorFlow Lite [GitHub repository](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite)
contains additional docs, code samples, and demo applications.