1# Object detection
2
3Given an image or a video stream, an object detection model can identify which
4of a known set of objects might be present and provide information about their
5positions within the image.
6
7For example, this screenshot of the <a href="#get_started">example
8application</a> shows how two objects have been recognized and their positions
9annotated:
10
11<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">
12
13## Get started
14
15To learn how to use object detection in a mobile app, explore the
16<a href="#example_applications_and_guides">Example applications and guides</a>.
17
18If you are using a platform other than Android or iOS, or if you are already
19familiar with the
20<a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite
21APIs</a>, you can download our starter object detection model and the
22accompanying labels.
23
24<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
25starter model with Metadata</a>
26
For more information about Metadata and associated fields (e.g. `labels.txt`), see
<a href="https://www.tensorflow.org/lite/convert/metadata#read_the_metadata_from_models">Read
the metadata from models</a>.
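
If you want to inspect the packed metadata and labels programmatically, a minimal
Python sketch using the `tflite-support` package is shown below. The file name
`detect.tflite` is a placeholder for wherever you saved the starter model:

```python
# Minimal sketch: read the metadata and label file packed into the starter model.
# Assumes the `tflite-support` package is installed; "detect.tflite" is a placeholder path.
from tflite_support import metadata

displayer = metadata.MetadataDisplayer.with_model_file("detect.tflite")

# Human-readable JSON describing inputs, outputs, and associated files.
print(displayer.get_metadata_json())

# List the files packed into the model (e.g. labels.txt) and read the labels.
print(displayer.get_packed_associated_file_list())
labels = displayer.get_associated_file_buffer("labels.txt").decode("utf-8").splitlines()
print(labels[:5])
```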
30
31If you want to train a custom detection model for your own task, see
32<a href="#model-customization">Model customization</a>.
33
34For the following use cases, you should use a different type of model:
35
36<ul>
37  <li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
38  <li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
39</ul>
40
41### Example applications and guides
42
43If you are new to TensorFlow Lite and are working with Android or iOS, we
44recommend exploring the following example applications that can help you get
45started.
46
47#### Android
48
49You can leverage the out-of-box API from
50[TensorFlow Lite Task Library](../../inference_with_metadata/task_library/object_detector)
51to integrate object detection models in just a few lines of code. You can also
52build your own custom inference pipeline using the
53[TensorFlow Lite Interpreter Java API](../../guide/inference#load_and_run_a_model_in_java).
54
55The Android example below demonstrates the implementation for both methods as
56[lib_task_api](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android/lib_task_api)
57and
58[lib_interpreter](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android/lib_interpreter),
59respectively.
60
61<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android">View
62Android example</a>
63
64#### iOS
65
66You can integrate the model using the
67[TensorFlow Lite Interpreter Swift API](../../guide/inference#load_and_run_a_model_in_swift).
68See the iOS example below.
69
70<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/ios">View
71iOS example</a>
72
73## Model description
74
75This section describes the signature for
76[Single-Shot Detector](https://arxiv.org/abs/1512.02325) models converted to
77TensorFlow Lite from the
78[TensorFlow Object Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/).
79
80An object detection model is trained to detect the presence and location of
81multiple classes of objects. For example, a model might be trained with images
82that contain various pieces of fruit, along with a _label_ that specifies the
83class of fruit they represent (e.g. an apple, a banana, or a strawberry), and
84data specifying where each object appears in the image.
85
86When an image is subsequently provided to the model, it will output a list of
87the objects it detects, the location of a bounding box that contains each
88object, and a score that indicates the confidence that detection was correct.
89
### Input signature
91
92The model takes an image as input.
93
Let's assume the expected image is 300x300 pixels, with three channels (red,
green, and blue) per pixel. This should be fed to the model as a flattened
buffer of 270,000 byte values (300x300x3). If the model is
<a href="../../performance/post_training_quantization.md">quantized</a>, each
value should be a single byte representing a value between 0 and 255.
99
100You can take a look at our
101[example app code](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android)
102to understand how to do this pre-processing on Android.
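
As a platform-neutral illustration, here is a minimal Python sketch of this
pre-processing using the TensorFlow Lite Interpreter and Pillow. The file names
`detect.tflite` and `fruit.jpg` are placeholders, and the sketch assumes a
quantized model that takes uint8 input:

```python
# Minimal sketch: resize an image to the model's expected input size and run inference.
# Assumes TensorFlow (or tflite-runtime) and Pillow are installed;
# "detect.tflite" and "fruit.jpg" are placeholder file names.
import numpy as np
from PIL import Image
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Resize the raw image to the height and width expected by the model (e.g. 300x300).
_, height, width, _ = input_details["shape"]
image = Image.open("fruit.jpg").convert("RGB").resize((width, height))

# A quantized model expects uint8 values in [0, 255]; add a batch dimension.
input_tensor = np.expand_dims(np.array(image, dtype=np.uint8), axis=0)
interpreter.set_tensor(input_details["index"], input_tensor)
interpreter.invoke()
```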
103
### Output signature
105
The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
describe `N` detected objects, with one element in each array corresponding to
each object.
109
110<table>
111  <thead>
112    <tr>
113      <th>Index</th>
114      <th>Name</th>
115      <th>Description</th>
116    </tr>
117  </thead>
118  <tbody>
119    <tr>
120      <td>0</td>
121      <td>Locations</td>
122      <td>Multidimensional array of [N][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
123    </tr>
124    <tr>
125      <td>1</td>
126      <td>Classes</td>
127      <td>Array of N integers (output as floating point values) each indicating the index of a class label from the labels file</td>
128    </tr>
129    <tr>
130      <td>2</td>
131      <td>Scores</td>
      <td>Array of N floating point values between 0 and 1 representing the probability that a class was detected</td>
133    </tr>
134    <tr>
135      <td>3</td>
136      <td>Number of detections</td>
137      <td>Integer value of N</td>
138    </tr>
139  </tbody>
140</table>
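
Continuing the Python sketch from the input section, the four output arrays can
be read back as shown below. The index-based ordering follows the table above,
but the exact tensor ordering can vary between exported models, so matching
outputs by name or shape from `get_output_details()` is safer in production code:

```python
# Minimal sketch: read the four output arrays after interpreter.invoke().
# The index-based ordering below follows the table above; some exports order
# their output tensors differently, so verify with get_output_details().
output_details = interpreter.get_output_details()

boxes = interpreter.get_tensor(output_details[0]["index"])[0]    # [N, 4] boxes
classes = interpreter.get_tensor(output_details[1]["index"])[0]  # [N] class indices
scores = interpreter.get_tensor(output_details[2]["index"])[0]   # [N] confidence scores
count = int(interpreter.get_tensor(output_details[3]["index"])[0])

for i in range(count):
    # Class indices map to lines in the labels file packed into the model metadata.
    print(int(classes[i]), float(scores[i]), boxes[i])  # box is [top, left, bottom, right]
```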
141
NOTE: The number of results (`N` above) is a parameter set while exporting the
detection model to TensorFlow Lite. See
<a href="#model-customization">Model customization</a> for more details.
145
For example, imagine a model has been trained to detect apples, bananas, and
strawberries. When provided an image, it will output a fixed number of detection
results (five in this example).
149
150<table style="width: 60%;">
151  <thead>
152    <tr>
153      <th>Class</th>
154      <th>Score</th>
155      <th>Location</th>
156    </tr>
157  </thead>
158  <tbody>
159    <tr>
160      <td>Apple</td>
161      <td>0.92</td>
162      <td>[18, 21, 57, 63]</td>
163    </tr>
164    <tr>
165      <td>Banana</td>
166      <td>0.88</td>
167      <td>[100, 30, 180, 150]</td>
168    </tr>
169    <tr>
170      <td>Strawberry</td>
171      <td>0.87</td>
172      <td>[7, 82, 89, 163] </td>
173    </tr>
174    <tr>
175      <td>Banana</td>
176      <td>0.23</td>
177      <td>[42, 66, 57, 83]</td>
178    </tr>
179    <tr>
180      <td>Apple</td>
181      <td>0.11</td>
182      <td>[6, 42, 31, 58]</td>
183    </tr>
184  </tbody>
185</table>
186
187#### Confidence score
188
189To interpret these results, we can look at the score and the location for each
190detected object. The score is a number between 0 and 1 that indicates confidence
191that the object was genuinely detected. The closer the number is to 1, the more
192confident the model is.
193
Depending on your application, you can decide on a cut-off threshold below which
195you will discard detection results. For the current example, a sensible cut-off
196is a score of 0.5 (meaning a 50% probability that the detection is valid). In
197that case, the last two objects in the array would be ignored because those
198confidence scores are below 0.5:
199
200<table style="width: 60%;">
201  <thead>
202    <tr>
203      <th>Class</th>
204      <th>Score</th>
205      <th>Location</th>
206    </tr>
207  </thead>
208  <tbody>
209    <tr>
210      <td>Apple</td>
211      <td>0.92</td>
212      <td>[18, 21, 57, 63]</td>
213    </tr>
214    <tr>
215      <td>Banana</td>
216      <td>0.88</td>
217      <td>[100, 30, 180, 150]</td>
218    </tr>
219    <tr>
220      <td>Strawberry</td>
221      <td>0.87</td>
222      <td>[7, 82, 89, 163] </td>
223    </tr>
224    <tr>
225      <td style="background-color: #e9cecc; text-decoration-line: line-through;">Banana</td>
226      <td style="background-color: #e9cecc; text-decoration-line: line-through;">0.23</td>
227      <td style="background-color: #e9cecc; text-decoration-line: line-through;">[42, 66, 57, 83]</td>
228    </tr>
229    <tr>
230      <td style="background-color: #e9cecc; text-decoration-line: line-through;">Apple</td>
231      <td style="background-color: #e9cecc; text-decoration-line: line-through;">0.11</td>
232      <td style="background-color: #e9cecc; text-decoration-line: line-through;">[6, 42, 31, 58]</td>
233    </tr>
234  </tbody>
235</table>
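
A minimal sketch of applying this kind of cut-off in Python, continuing from the
output-parsing sketch above (the 0.5 threshold is just the value used in this
example):

```python
# Minimal sketch: discard detections whose confidence is below the cut-off.
SCORE_THRESHOLD = 0.5  # example value from above; tune this for your application

kept = [
    {"class_id": int(classes[i]), "score": float(scores[i]), "box": boxes[i]}
    for i in range(count)
    if scores[i] >= SCORE_THRESHOLD
]
print(f"kept {len(kept)} of {count} detections")
```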
236
237The cut-off you use should be based on whether you are more comfortable with
238false positives (objects that are wrongly identified, or areas of the image that
239are erroneously identified as objects when they are not), or false negatives
240(genuine objects that are missed because their confidence was low).
241
242For example, in the following image, a pear (which is not an object that the
243model was trained to detect) was misidentified as a "person". This is an example
244of a false positive that could be ignored by selecting an appropriate cut-off.
245In this case, a cut-off of 0.6 (or 60%) would comfortably exclude the false
246positive.
247
248<img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%">
249
250#### Location
251
252For each detected object, the model will return an array of four numbers
253representing a bounding rectangle that surrounds its position. For the starter
254model provided, the numbers are ordered as follows:
255
256<table style="width: 50%; margin: 0 auto;">
257  <tbody>
258    <tr style="border-top: none;">
259      <td>[</td>
260      <td>top,</td>
261      <td>left,</td>
262      <td>bottom,</td>
263      <td>right</td>
264      <td>]</td>
265    </tr>
266  </tbody>
267</table>
268
The top value represents the distance of the rectangle’s top edge from the top
of the image, and the left value represents the left edge’s distance from the
left of the input image. The bottom and right values are defined in the same
way. For the starter model, these values are normalized to the range [0, 1], so
they must be multiplied by the image height and width to obtain pixel
coordinates like those shown in the example tables above.
273
Note: Object detection models accept input images of a specific size. This is likely to be different from the size of the raw image captured by your device’s camera, and you will have to write code to crop and scale your raw image to fit the model’s input size (there are examples of this in our <a href="#get_started">example applications</a>).<br /><br />The box coordinates output by the model refer to the cropped and scaled image, so you must map them back to the raw image in order to interpret them correctly.
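
As a minimal sketch, the following Python helper maps a normalized
[top, left, bottom, right] box back to pixel coordinates in the raw image. The
`raw_width` and `raw_height` parameters are placeholders for your image's
dimensions, and the sketch assumes the image was resized to the model input
without cropping or letterboxing:

```python
# Minimal sketch: convert a normalized bounding box to pixel coordinates in the
# raw image. Assumes the raw image was resized (not cropped) to the model input.
def box_to_pixels(box, raw_width, raw_height):
    top, left, bottom, right = box
    return (
        int(top * raw_height),
        int(left * raw_width),
        int(bottom * raw_height),
        int(right * raw_width),
    )

# Example: map the first detected box back to a 640x480 camera frame.
print(box_to_pixels(boxes[0], raw_width=640, raw_height=480))
```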
275
276## Performance benchmarks
277
278Performance benchmark numbers for our
279<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">starter
280model</a> are generated with the tool
281[described here](https://www.tensorflow.org/lite/performance/benchmarks).
282
283<table>
284  <thead>
285    <tr>
286      <th>Model Name</th>
287      <th>Model size </th>
288      <th>Device </th>
289      <th>GPU</th>
290      <th>CPU</th>
291    </tr>
292  </thead>
293  <tr>
294    <td rowspan = 3>
295      <a href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">COCO SSD MobileNet v1</a>
296    </td>
297    <td rowspan = 3>
      27 MB
299    </td>
300    <td>Pixel 3 (Android 10) </td>
301    <td>22ms</td>
302    <td>46ms*</td>
303  </tr>
304   <tr>
305     <td>Pixel 4 (Android 10) </td>
306    <td>20ms</td>
307    <td>29ms*</td>
308  </tr>
309   <tr>
310     <td>iPhone XS (iOS 12.4.1) </td>
311     <td>7.6ms</td>
312    <td>11ms** </td>
313  </tr>
314</table>
315
316\* 4 threads used.
317
318\*\* 2 threads used on iPhone for the best performance result.
319
## Model customization
321
322### Pre-trained models
323
324Mobile-optimized detection models with a variety of latency and precision
325characteristics can be found in the
326[Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#mobile-models).
Each one of them follows the input and output signatures described in the
preceding sections.
329
330Most of the download zips contain a `model.tflite` file. If there isn't one, a
331TensorFlow Lite flatbuffer can be generated using
332[these instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md).
333SSD models from the
334[TF2 Object Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)
335can also be converted to TensorFlow Lite using the instructions
336[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
337It is important to note that detection models cannot be converted directly using
338the [TensorFlow Lite Converter](https://www.tensorflow.org/lite/convert), since
339they require an intermediate step of generating a mobile-friendly source model.
340The scripts linked above perform this step.
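
For example, once the export script has produced a mobile-friendly SavedModel,
the final conversion step with the TensorFlow Lite Converter looks roughly like
the sketch below (the `exported_model/saved_model` path is a placeholder for the
export script's output directory):

```python
# Minimal sketch: convert the exported, mobile-friendly SavedModel to a
# TensorFlow Lite flatbuffer. "exported_model/saved_model" is a placeholder
# for the directory produced by the export script.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```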
341
Both the
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md)
and
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md)
exporting scripts have parameters that can enable a larger number of output
objects or slower, more accurate post-processing. Please use `--help` with the
scripts to see an exhaustive list of supported arguments.
349
350> Currently, on-device inference is only optimized with SSD models. Better
351> support for other architectures like CenterNet and EfficientDet is being
352> investigated.
353
354### How to choose a model to customize?
355
356Each model comes with its own precision (quantified by mAP value) and latency
characteristics. You should choose the model that works best for your use case
358and intended hardware. For example, the
359[Edge TPU](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#pixel4-edge-tpu-models)
360models are ideal for inference on Google's Edge TPU on Pixel 4.
361
362You can use our
363[benchmark tool](https://www.tensorflow.org/lite/performance/measurement) to
364evaluate models and choose the most efficient option available.
365
366## Fine-tuning models on custom data
367
368The pre-trained models we provide are trained to detect 90 classes of objects.
369For a full list of classes, see the labels file in the
370<a href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">model
371metadata</a>.
372
373You can use a technique known as transfer learning to re-train a model to
374recognize classes not in the original set. For example, you could re-train the
375model to detect multiple types of vegetable, despite there only being one
376vegetable in the original training data. To do this, you will need a set of
377training images for each of the new labels you wish to train. Please see our
378[Few-shot detection Colab](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tflite.ipynb)
379as an example of fine-tuning a pre-trained model with few examples.
380
For fine-tuning with larger datasets, take a look at these guides for
382training your own models with the TensorFlow Object Detection API:
383[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_training_and_evaluation.md),
384[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md).
385Once trained, they can be converted to a TFLite-friendly format with the
386instructions here:
387[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
389