# TensorFlow Lite guide

TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded
devices. It enables on-device machine learning inference with low latency and a
small binary size. TensorFlow Lite also supports hardware acceleration with the
[Android Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks/index.html).

TensorFlow Lite uses many techniques to achieve low latency, such as kernels
optimized for mobile apps, pre-fused activations, and quantized kernels that
allow smaller and faster (fixed-point math) models.

Most of our TensorFlow Lite documentation is [on
GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite)
for the time being.

## What does TensorFlow Lite contain?

TensorFlow Lite supports a set of core operators, both quantized and
float, which have been tuned for mobile platforms. They incorporate pre-fused
activations and biases to further enhance performance and quantized
accuracy. Additionally, TensorFlow Lite supports using custom operations in
models.

TensorFlow Lite defines a new model file format, based on
[FlatBuffers](https://google.github.io/flatbuffers/). FlatBuffers is an
efficient open-source cross-platform serialization library. It is similar to
[protocol buffers](https://developers.google.com/protocol-buffers/?hl=en), but
the primary difference is that FlatBuffers does not need a parsing/unpacking
step to a secondary representation before you can access data, a step that is
often coupled with per-object memory allocation. The code footprint of
FlatBuffers is also an order of magnitude smaller than that of protocol buffers.

TensorFlow Lite has a new mobile-optimized interpreter whose key goals are
keeping apps lean and fast. The interpreter uses a static graph ordering and
a custom (less-dynamic) memory allocator to ensure minimal load, initialization,
and execution latency.

TensorFlow Lite provides an interface to leverage hardware acceleration, if
available on the device. It does so via the
[Android Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks/index.html),
available on Android 8.1 (API level 27) and higher.
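
To make the model format and interpreter described above concrete, the
following is a minimal sketch of running inference on a converted `.tflite`
file from Python. It assumes a TensorFlow build that exposes the interpreter as
`tf.lite.Interpreter`; the model filename is a hypothetical placeholder, and in
an Android or iOS app you would use the Java or C++ APIs described later in
this guide instead.

```python
import numpy as np
import tensorflow as tf

# Load a converted .tflite model (the path is a hypothetical placeholder).
interpreter = tf.lite.Interpreter(model_path="mobilenet_v1_1.0_224.tflite")
interpreter.allocate_tensors()

# Query the input and output tensor details recorded in the model file.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input with the shape and dtype the model expects.
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)

# Run inference and read back the result.
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]["index"])
print(output_data.shape)
```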
## Why do we need a new mobile-specific library?

Machine learning is changing the computing paradigm, and we see an emerging
trend of new use cases on mobile and embedded devices. Consumer expectations are
also trending toward natural, human-like interactions with their devices, driven
by camera and voice interaction models.

Several factors are fueling interest in this domain:

- Innovation at the silicon layer is enabling new possibilities for hardware
  acceleration, and frameworks such as the Android Neural Networks API make it
  easy to leverage this hardware.

- Recent advances in real-time computer vision and spoken language understanding
  have led to mobile-optimized benchmark models being open sourced
  (e.g. MobileNets, SqueezeNet).

- Widely available smart appliances create new possibilities for
  on-device intelligence.

- There is growing interest in stronger user data privacy paradigms, where user
  data does not need to leave the mobile device.

- On-device models can serve ‘offline’ use cases, where the device does not need
  to be connected to a network.

We believe the next wave of machine learning applications will have significant
processing on mobile and embedded devices.

## TensorFlow Lite highlights

TensorFlow Lite provides:

- A set of core operators, both quantized and float, many of which have been
  tuned for mobile platforms. These can be used to create and run custom
  models. Developers can also write their own custom operators and use them in
  models.

- A new [FlatBuffers](https://google.github.io/flatbuffers/)-based
  model file format.

- An on-device interpreter with kernels optimized for faster execution on
  mobile.

- A TensorFlow converter that converts TensorFlow-trained models to the
  TensorFlow Lite format.

- Small binary size: TensorFlow Lite is smaller than 300KB when all supported
  operators are linked, and less than 200KB when using only the operators needed
  to support Inception V3 and MobileNet.

- **Pre-tested models:**

  All of the following models are guaranteed to work out of the box:

  - Inception V3, a popular model for detecting the dominant objects
    present in an image.

  - [MobileNets](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md),
    a family of mobile-first computer vision models designed to effectively
    maximize accuracy while being mindful of the restricted resources of an
    on-device or embedded application. They are small, low-latency, low-power
    models parameterized to meet the resource constraints of a variety of use
    cases. They can be built upon for classification, detection, embeddings,
    and segmentation. MobileNet models are smaller but [lower in
    accuracy](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
    than Inception V3.

  - On Device Smart Reply, an on-device model which provides one-touch
    replies for incoming text messages by suggesting contextually relevant
    messages. The model was built specifically for memory-constrained devices
    such as watches and phones, and it has been successfully used to surface
    [Smart Replies on Android
    Wear](https://research.googleblog.com/2017/02/on-device-machine-intelligence.html)
    to all first-party and third-party apps.

  Also see the complete list of
  [TensorFlow Lite's supported models](hosted_models.md),
  including the model sizes, performance numbers, and downloadable model files.

- Quantized versions of the MobileNet model, which run faster than the
  non-quantized (float) version on CPU.

- A new Android demo app that illustrates the use of TensorFlow Lite with a
  quantized MobileNet model for object classification.

- Java and C++ API support.


## Getting Started

We recommend you try out TensorFlow Lite with the pre-tested models listed
above. If you have an existing model, you will need to test whether it is
compatible with both the converter and the supported operator set; a minimal
conversion sketch appears after this section. For more detail on testing your
model, see the
[documentation on GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite).
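
One practical way to check compatibility is simply to attempt the conversion:
if the model contains operators that TensorFlow Lite does not support, the
converter typically fails with an error identifying them. The sketch below
assumes a model exported in SavedModel format and a TensorFlow build that
exposes the converter as `tf.lite.TFLiteConverter`; the paths are hypothetical
placeholders.

```python
import tensorflow as tf

# Path to a trained model in SavedModel format (hypothetical placeholder).
saved_model_dir = "/tmp/my_saved_model"

# Build a converter for the SavedModel and run the conversion. If the model
# uses operators TensorFlow Lite does not support, this step raises an error
# describing the offending operators.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Write the FlatBuffers-based .tflite file that the interpreter consumes.
with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting file can then be loaded with the interpreter as shown earlier, or
bundled into a mobile app. The converter also exposes options for producing
quantized models; see the GitHub documentation linked above for details.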
### Retrain Inception-V3 or MobileNet for a custom data set

The pre-trained models mentioned above were trained on the ImageNet data set,
which consists of 1000 predefined classes. If those classes are not relevant or
useful for your use case, you will need to retrain those models. This technique
is called transfer learning: it starts with a model that has already been
trained on one problem and retrains it on a similar problem. Deep learning from
scratch can take days, but transfer learning can be done fairly quickly. To do
this, you'll need to generate a custom data set labeled with the relevant
classes.

The [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/)
codelab walks through this process step by step. The retraining code supports
retraining for both floating-point and quantized inference.

## TensorFlow Lite Architecture

The following diagram shows the architectural design of TensorFlow Lite:

<img src="https://www.tensorflow.org/images/tflite-architecture.jpg"
     alt="TensorFlow Lite architecture diagram"
     style="max-width:600px;">

Starting with a trained TensorFlow model on disk, you convert that model to the
TensorFlow Lite file format (`.tflite`) using the TensorFlow Lite Converter, as
sketched above. You can then use that converted file in your mobile application.

Deploying the TensorFlow Lite model file uses:

- Java API: A convenience wrapper around the C++ API on Android.

- C++ API: Loads the TensorFlow Lite model file and invokes the Interpreter. The
  same library is available on both Android and iOS.

- Interpreter: Executes the model using a set of kernels. The interpreter
  supports selective kernel loading; without kernels it is only 100KB, and 300KB
  with all the kernels loaded. This is a significant reduction from the 1.5MB
  required by TensorFlow Mobile.

- On select Android devices, the Interpreter will use the Android Neural
  Networks API for hardware acceleration, falling back to CPU execution when the
  API is not available.

You can also implement custom kernels using the C++ API, which can then be used
by the Interpreter.

## Future Work

In future releases, TensorFlow Lite will support more models and built-in
operators, deliver performance improvements for both fixed-point and
floating-point models, provide improved tools for easier developer workflows,
and support other small devices, among other improvements. As we continue
development, we hope that TensorFlow Lite will greatly simplify the developer
experience of targeting a model for small devices.

Future plans include using specialized machine learning hardware to get the best
possible performance for a particular model on a particular device.

## Next Steps

The TensorFlow Lite
[GitHub repository](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite)
contains additional docs, code samples, and demo applications.