# Object detection

Given an image or a video stream, an object detection model can identify which
of a known set of objects might be present and provide information about their
positions within the image.

For example, this screenshot of the <a href="#get_started">example
application</a> shows how two objects have been recognized and their positions
annotated:

<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">

## Get started

To learn how to use object detection in a mobile app, explore the
<a href="#example_applications_and_guides">Example applications and guides</a>.

If you are using a platform other than Android or iOS, or if you are already
familiar with the
<a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite
APIs</a>, you can download our starter object detection model and the
accompanying labels.

<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
starter model with Metadata</a>

For more information about Metadata and associated fields (e.g. `labels.txt`),
see
<a href="https://www.tensorflow.org/lite/convert/metadata#read_the_metadata_from_models">Read
the metadata from models</a>.

If you want to train a custom detection model for your own task, see
<a href="#model-customization">Model customization</a>.

For the following use cases, you should use a different type of model:

<ul>
  <li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
  <li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
</ul>

### Example applications and guides

If you are new to TensorFlow Lite and are working with Android or iOS, we
recommend exploring the following example applications to help you get
started.

#### Android

You can leverage the out-of-the-box API from the
[TensorFlow Lite Task Library](../../inference_with_metadata/task_library/object_detector)
to integrate object detection models in just a few lines of code. You can also
build your own custom inference pipeline using the
[TensorFlow Lite Interpreter Java API](../../guide/inference#load_and_run_a_model_in_java).

The Android example below demonstrates the implementation for both methods as
[lib_task_api](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android/lib_task_api)
and
[lib_interpreter](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android/lib_interpreter),
respectively.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android">View
Android example</a>

#### iOS

You can integrate the model using the
[TensorFlow Lite Interpreter Swift API](../../guide/inference#load_and_run_a_model_in_swift).
See the iOS example below.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/ios">View
iOS example</a>
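If you are not targeting Android or iOS, you can exercise the starter model
directly through the TensorFlow Lite Python API mentioned under
<a href="#get_started">Get started</a>. The minimal sketch below simply loads
the model and prints its input and output details, which correspond to the
signatures described in the next section; the file name `detect.tflite` is a
placeholder for wherever you saved the downloaded model.

```python
import tensorflow as tf

# Placeholder path to the downloaded starter model (.tflite file).
MODEL_PATH = "detect.tflite"

# Load the model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()

# Print the expected input (shape and type) and the output tensors, which
# should match the signatures described below.
print("Input:", interpreter.get_input_details())
for detail in interpreter.get_output_details():
    print("Output:", detail["name"], detail["shape"], detail["dtype"])
```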
## Model description

This section describes the signature for
[Single-Shot Detector](https://arxiv.org/abs/1512.02325) models converted to
TensorFlow Lite from the
[TensorFlow Object Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/).

An object detection model is trained to detect the presence and location of
multiple classes of objects. For example, a model might be trained with images
that contain various pieces of fruit, along with a _label_ that specifies the
class of fruit they represent (e.g. an apple, a banana, or a strawberry), and
data specifying where each object appears in the image.

When an image is subsequently provided to the model, it will output a list of
the objects it detects, the location of a bounding box that contains each
object, and a score that indicates the confidence that the detection was
correct.

### Input signature

The model takes an image as input.

Let's assume the expected image is 300x300 pixels, with three channels (red,
green, and blue) per pixel. This should be fed to the model as a flattened
buffer of 270,000 byte values (300x300x3). If the model is
<a href="../../performance/post_training_quantization.md">quantized</a>, each
value should be a single byte representing a value between 0 and 255.

You can take a look at our
[example app code](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android)
to understand how to do this pre-processing on Android.

### Output signature

The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
describe `N` detected objects, with one element in each array corresponding to
each object.

<table>
  <thead>
    <tr>
      <th>Index</th>
      <th>Name</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>Locations</td>
      <td>Multidimensional array of [N][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
    </tr>
    <tr>
      <td>1</td>
      <td>Classes</td>
      <td>Array of N integers (output as floating point values) each indicating the index of a class label from the labels file</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Scores</td>
      <td>Array of N floating point values between 0 and 1 representing the probability that a class was detected</td>
    </tr>
    <tr>
      <td>3</td>
      <td>Number of detections</td>
      <td>Integer value of N</td>
    </tr>
  </tbody>
</table>

NOTE: The number of results (10 for the starter model) is a parameter set while
exporting the detection model to TensorFlow Lite. See
<a href="#model-customization">Model customization</a> for more details.
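To make these signatures concrete, here is a minimal Python sketch that runs
the quantized starter model on a single image: it resizes the image to the
300x300 RGB input described above, feeds it as uint8 values, and reads back the
four output arrays. It assumes the outputs appear in the documented order and
uses placeholder file names.

```python
import numpy as np
import tensorflow as tf
from PIL import Image

interpreter = tf.lite.Interpreter(model_path="detect.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Pre-processing: resize the image to the model's expected input size and feed
# it as a [1, height, width, 3] buffer of uint8 values (quantized model).
height, width = input_details[0]["shape"][1:3]
image = Image.open("test.jpg").convert("RGB").resize((width, height))
input_data = np.expand_dims(np.asarray(image, dtype=np.uint8), axis=0)

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# The four output arrays: locations, classes, scores, and number of detections.
boxes = interpreter.get_tensor(output_details[0]["index"])[0]    # [N, 4] boxes
classes = interpreter.get_tensor(output_details[1]["index"])[0]  # [N] label indices
scores = interpreter.get_tensor(output_details[2]["index"])[0]   # [N] confidences
count = int(interpreter.get_tensor(output_details[3]["index"])[0])

for i in range(count):
    print(f"class={int(classes[i])} score={scores[i]:.2f} box={boxes[i]}")
```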
For example, imagine a model has been trained to detect apples, bananas, and
strawberries. When provided an image, it will output a set number of detection
results - in this example, 5.

<table style="width: 60%;">
  <thead>
    <tr>
      <th>Class</th>
      <th>Score</th>
      <th>Location</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Apple</td>
      <td>0.92</td>
      <td>[18, 21, 57, 63]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.88</td>
      <td>[100, 30, 180, 150]</td>
    </tr>
    <tr>
      <td>Strawberry</td>
      <td>0.87</td>
      <td>[7, 82, 89, 163]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.23</td>
      <td>[42, 66, 57, 83]</td>
    </tr>
    <tr>
      <td>Apple</td>
      <td>0.11</td>
      <td>[6, 42, 31, 58]</td>
    </tr>
  </tbody>
</table>

#### Confidence score

To interpret these results, we can look at the score and the location for each
detected object. The score is a number between 0 and 1 that indicates confidence
that the object was genuinely detected. The closer the number is to 1, the more
confident the model is.

Depending on your application, you can decide a cut-off threshold below which
you will discard detection results. For the current example, a sensible cut-off
is a score of 0.5 (meaning a 50% probability that the detection is valid). In
that case, the last two objects in the array would be ignored because those
confidence scores are below 0.5:

<table style="width: 60%;">
  <thead>
    <tr>
      <th>Class</th>
      <th>Score</th>
      <th>Location</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Apple</td>
      <td>0.92</td>
      <td>[18, 21, 57, 63]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.88</td>
      <td>[100, 30, 180, 150]</td>
    </tr>
    <tr>
      <td>Strawberry</td>
      <td>0.87</td>
      <td>[7, 82, 89, 163]</td>
    </tr>
    <tr>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">Banana</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">0.23</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">[42, 66, 57, 83]</td>
    </tr>
    <tr>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">Apple</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">0.11</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">[6, 42, 31, 58]</td>
    </tr>
  </tbody>
</table>

The cut-off you use should be based on whether you are more comfortable with
false positives (objects that are wrongly identified, or areas of the image that
are erroneously identified as objects when they are not), or false negatives
(genuine objects that are missed because their confidence was low).

For example, in the following image, a pear (which is not an object that the
model was trained to detect) was misidentified as a "person". This is an example
of a false positive that could be ignored by selecting an appropriate cut-off.
In this case, a cut-off of 0.6 (or 60%) would comfortably exclude the false
positive.

<img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%">
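In code, applying a cut-off like this is just a filter over the `scores` array
from the inference sketch above; the threshold value itself is an assumption
you would tune for your own application.

```python
SCORE_THRESHOLD = 0.5  # tune this per application, per the trade-off above

# Keep only detections whose confidence meets the threshold.
detections = [
    {"class_id": int(classes[i]), "score": float(scores[i]), "box": boxes[i]}
    for i in range(count)
    if scores[i] >= SCORE_THRESHOLD
]
```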
#### Location

For each detected object, the model will return an array of four numbers
representing a bounding rectangle that surrounds its position. For the starter
model provided, the numbers are ordered as follows:

<table style="width: 50%; margin: 0 auto;">
  <tbody>
    <tr style="border-top: none;">
      <td>[</td>
      <td>top,</td>
      <td>left,</td>
      <td>bottom,</td>
      <td>right</td>
      <td>]</td>
    </tr>
  </tbody>
</table>

The top value represents the distance of the rectangle’s top edge from the top
of the image, and the left value represents the left edge’s distance from the
left of the input image. The bottom and right values are defined in the same
way. For the starter model, these are floating point values between 0 and 1
relative to the input image’s dimensions, so multiply them by the image height
and width to obtain pixel coordinates.

Note: Object detection models accept input images of a specific size. This is likely to be different from the size of the raw image captured by your device’s camera, and you will have to write code to crop and scale your raw image to fit the model’s input size (there are examples of this in our <a href="#get_started">example applications</a>).<br /><br />The coordinates output by the model refer to positions in the cropped and scaled image, so you must map them back onto the raw image in order to interpret them correctly.
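As a concrete illustration of the note above, this sketch (continuing from the
earlier Python snippets, which used a plain resize with no cropping) maps one
detection's normalized box back to pixel coordinates on the original image; the
file name is again a placeholder.

```python
from PIL import Image

raw_image = Image.open("test.jpg")  # the original, un-resized image (placeholder)
raw_width, raw_height = raw_image.size

# The starter model emits normalized [top, left, bottom, right] values in [0, 1].
top, left, bottom, right = boxes[0]

# Map the normalized box back to pixel coordinates in the raw image so it can
# be drawn or cropped there. If you letterbox or crop instead of doing a plain
# resize, you also need to undo that transformation here.
x_min = int(left * raw_width)
y_min = int(top * raw_height)
x_max = int(right * raw_width)
y_max = int(bottom * raw_height)
print("Box in raw image pixels:", (x_min, y_min, x_max, y_max))
```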
## Performance benchmarks

Performance benchmark numbers for our
<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">starter
model</a> are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

<table>
  <thead>
    <tr>
      <th>Model Name</th>
      <th>Model size</th>
      <th>Device</th>
      <th>GPU</th>
      <th>CPU</th>
    </tr>
  </thead>
  <tr>
    <td rowspan = 3>
      <a href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">COCO SSD MobileNet v1</a>
    </td>
    <td rowspan = 3>
      27 MB
    </td>
    <td>Pixel 3 (Android 10)</td>
    <td>22ms</td>
    <td>46ms*</td>
  </tr>
  <tr>
    <td>Pixel 4 (Android 10)</td>
    <td>20ms</td>
    <td>29ms*</td>
  </tr>
  <tr>
    <td>iPhone XS (iOS 12.4.1)</td>
    <td>7.6ms</td>
    <td>11ms**</td>
  </tr>
</table>

\* 4 threads used.

\*\* 2 threads used on iPhone for the best performance result.

## Model customization

### Pre-trained models

Mobile-optimized detection models with a variety of latency and precision
characteristics can be found in the
[Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#mobile-models).
Each one of them follows the input and output signatures described in the
preceding sections.

Most of the download zips contain a `model.tflite` file. If there isn't one, a
TensorFlow Lite flatbuffer can be generated using
[these instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md).
SSD models from the
[TF2 Object Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)
can also be converted to TensorFlow Lite using the instructions
[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
It is important to note that detection models cannot be converted directly using
the [TensorFlow Lite Converter](https://www.tensorflow.org/lite/convert), since
they require an intermediate step of generating a mobile-friendly source model.
The scripts linked above perform this step.

Both the
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md)
and
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md)
exporting scripts have parameters that can enable a larger number of output
objects or slower, more accurate post-processing. Please use `--help` with the
scripts to see an exhaustive list of supported arguments.

> Currently, on-device inference is only optimized with SSD models. Better
> support for other architectures like CenterNet and EfficientDet is being
> investigated.

### How to choose a model to customize?

Each model comes with its own precision (quantified by mAP value) and latency
characteristics. You should choose the model that works best for your use case
and intended hardware. For example, the
[Edge TPU](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#pixel4-edge-tpu-models)
models are ideal for inference on Google's Edge TPU on Pixel 4.

You can use our
[benchmark tool](https://www.tensorflow.org/lite/performance/measurement) to
evaluate models and choose the most efficient option available.

### Fine-tuning models on custom data

The pre-trained models we provide are trained to detect 90 classes of objects.
For a full list of classes, see the labels file in the
<a href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">model
metadata</a>.

You can use a technique known as transfer learning to re-train a model to
recognize classes not in the original set. For example, you could re-train the
model to detect multiple types of vegetable, despite there only being one
vegetable in the original training data. To do this, you will need a set of
training images for each of the new labels you wish to train. Please see our
[Few-shot detection Colab](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tflite.ipynb)
for an example of fine-tuning a pre-trained model with only a few examples.

For fine-tuning with larger datasets, take a look at these guides for
training your own models with the TensorFlow Object Detection API:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_training_and_evaluation.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md).
Once trained, they can be converted to a TFLite-friendly format with the
instructions here:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
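As a rough sketch of that final conversion step: once one of the exporting
scripts linked above has produced a mobile-friendly SavedModel (the
intermediate step described under Pre-trained models), the remaining work is a
standard TensorFlow Lite conversion. The directory and output file names below
are assumptions; use whatever paths your export script produced.

```python
import tensorflow as tf

# Placeholder path: the SavedModel produced by the export script.
SAVED_MODEL_DIR = "exported_model/saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
tflite_model = converter.convert()

# Write the flatbuffer that the mobile examples and Task Library can load.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```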