# Pose estimation

<img src="../images/pose.png" class="attempt-right" />

Pose estimation is the task of using an ML model to estimate the pose of a
person from an image or a video by estimating the spatial locations of key body
joints (keypoints).

## Get started

If you are new to TensorFlow Lite and are working with Android or iOS, explore
the following example applications that can help you get started.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/posenet/android">
Android example</a>
<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/posenet/ios">
iOS example</a>

If you are already familiar with the
[TensorFlow Lite APIs](https://www.tensorflow.org/api_docs/python/tf/lite),
download the starter PoseNet model and supporting files.

<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite">
Download starter model</a>

If you want to try pose estimation in a web browser, check out the
<a href="https://github.com/tensorflow/tfjs-models/tree/master/posenet">
TensorFlow.js GitHub repository</a>.

## Model description

### How it works

Pose estimation refers to computer vision techniques that detect human figures
in images and videos, so that one could determine, for example, where someone’s
elbow shows up in an image. Note that pose estimation only estimates where key
body joints are; it does not recognize who is in an image or video.

The PoseNet model takes a processed camera image as input and outputs
information about keypoints. Each detected keypoint is indexed by a part ID and
has a confidence score between 0.0 and 1.0, indicating the probability that the
keypoint exists at that position.

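As a rough sketch of how this looks with the Python `tf.lite.Interpreter` API
(the output tensor order, shapes, and the stride value below are assumptions
based on the 257x257 starter model; verify them with `get_output_details()` on
your copy of the model):

```python
import numpy as np
import tensorflow as tf

# Load the starter PoseNet model (the path is an assumption; point this at
# wherever you saved the downloaded .tflite file).
interpreter = tf.lite.Interpreter(
    model_path="posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy frame; a real app would pass a camera image resized to
# 257x257 and normalized to [-1, 1].
image = np.random.uniform(-1, 1, size=input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], image)
interpreter.invoke()

# Assumed layout: output 0 holds heatmaps [9, 9, 17] and output 1 holds
# offset vectors [9, 9, 34] (y-offsets for all 17 parts, then x-offsets).
heatmaps = interpreter.get_tensor(output_details[0]["index"])[0]
offsets = interpreter.get_tensor(output_details[1]["index"])[0]

OUTPUT_STRIDE = 32  # consistent with the model's 9x9 output grid
num_keypoints = heatmaps.shape[-1]

for part_id in range(num_keypoints):
    # Pick the highest-scoring grid cell for this body part.
    heatmap = heatmaps[:, :, part_id]
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # The raw scores are logits; squash to a 0.0-1.0 confidence.
    confidence = 1.0 / (1.0 + np.exp(-heatmap[y, x]))
    # Refine the coarse grid position with the offset vectors.
    pos_y = y * OUTPUT_STRIDE + offsets[y, x, part_id]
    pos_x = x * OUTPUT_STRIDE + offsets[y, x, part_id + num_keypoints]
    print(f"part {part_id}: ({pos_x:.1f}, {pos_y:.1f}), confidence {confidence:.2f}")
```
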
The various body parts detected by the PoseNet model are tabulated below:

<table style="width: 30%;">
  <thead>
    <tr>
      <th>Id</th>
      <th>Part</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>nose</td>
    </tr>
    <tr>
      <td>1</td>
      <td>leftEye</td>
    </tr>
    <tr>
      <td>2</td>
      <td>rightEye</td>
    </tr>
    <tr>
      <td>3</td>
      <td>leftEar</td>
    </tr>
    <tr>
      <td>4</td>
      <td>rightEar</td>
    </tr>
    <tr>
      <td>5</td>
      <td>leftShoulder</td>
    </tr>
    <tr>
      <td>6</td>
      <td>rightShoulder</td>
    </tr>
    <tr>
      <td>7</td>
      <td>leftElbow</td>
    </tr>
    <tr>
      <td>8</td>
      <td>rightElbow</td>
    </tr>
    <tr>
      <td>9</td>
      <td>leftWrist</td>
    </tr>
    <tr>
      <td>10</td>
      <td>rightWrist</td>
    </tr>
    <tr>
      <td>11</td>
      <td>leftHip</td>
    </tr>
    <tr>
      <td>12</td>
      <td>rightHip</td>
    </tr>
    <tr>
      <td>13</td>
      <td>leftKnee</td>
    </tr>
    <tr>
      <td>14</td>
      <td>rightKnee</td>
    </tr>
    <tr>
      <td>15</td>
      <td>leftAnkle</td>
    </tr>
    <tr>
      <td>16</td>
      <td>rightAnkle</td>
    </tr>
  </tbody>
</table>
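
For convenience when decoding model output, the same mapping can be written as
a list indexed by part ID (a minimal sketch mirroring the table above):

```python
# Part names indexed by PoseNet part ID, matching the table above.
PART_NAMES = [
    "nose", "leftEye", "rightEye", "leftEar", "rightEar",
    "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
    "leftWrist", "rightWrist", "leftHip", "rightHip",
    "leftKnee", "rightKnee", "leftAnkle", "rightAnkle",
]

print(PART_NAMES[7])  # leftElbow
```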

An example output is shown below:

<img alt="Animation showing pose estimation" src="https://www.tensorflow.org/images/lite/models/pose_estimation.gif"/>

## Performance benchmarks

Performance varies based on your device and output stride (the resolution of
the heatmaps and offset vectors). The PoseNet model is image size invariant,
which means it can predict pose positions in the same scale as the original
image regardless of whether the image is downscaled. This means you can
configure the model to trade off accuracy against performance.

The output stride determines how much the output is scaled down relative to the
input image size. It affects the size of the layers and the model outputs.

The higher the output stride, the lower the resolution of the layers in the
network and of the outputs, and correspondingly the lower their accuracy. In
this implementation, the output stride can have values of 8, 16, or 32. In
other words, an output stride of 32 will result in the fastest performance but
lowest accuracy, while 8 will result in the highest accuracy but slowest
performance. The recommended starting value is 16.
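
Concretely, the heatmap resolution follows from the input size and the output
stride. A minimal sketch of the relationship
(`resolution = (input_size - 1) / output_stride + 1`, the convention PoseNet
uses):

```python
def heatmap_resolution(input_size: int, output_stride: int) -> int:
    # Output resolution for a square input image, per the PoseNet
    # convention: resolution = ((input_size - 1) / output_stride) + 1.
    return (input_size - 1) // output_stride + 1

# For the 257x257 starter model:
for stride in (8, 16, 32):
    res = heatmap_resolution(257, stride)
    print(f"stride {stride:2d} -> {res}x{res} heatmaps")
# stride  8 -> 33x33 heatmaps
# stride 16 -> 17x17 heatmaps
# stride 32 -> 9x9 heatmaps
```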

The following image illustrates how the output stride affects the resolution of
the output relative to the input image size. A higher output stride is faster
but results in lower accuracy.

<img alt="Output stride and heatmap resolution" src="../images/output_stride.png">

Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

<table>
  <thead>
    <tr>
      <th>Model name</th>
      <th>Model size</th>
      <th>Device</th>
      <th>GPU</th>
      <th>CPU</th>
    </tr>
  </thead>
  <tr>
    <td rowspan="3">
      <a href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite">PoseNet</a>
    </td>
    <td rowspan="3">
      12.7 MB
    </td>
    <td>Pixel 3 (Android 10)</td>
    <td>12 ms</td>
    <td>31 ms*</td>
  </tr>
  <tr>
    <td>Pixel 4 (Android 10)</td>
    <td>12 ms</td>
    <td>19 ms*</td>
  </tr>
  <tr>
    <td>iPhone XS (iOS 12.4.1)</td>
    <td>4.8 ms</td>
    <td>22 ms**</td>
  </tr>
</table>

\* 4 threads used.

\*\* 2 threads used on iPhone for the best performance result.
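
The CPU latencies above depend on the thread count. With the Python API, the
thread count can be set when constructing the interpreter; a minimal sketch
(the `num_threads` argument is available in recent TensorFlow releases):

```python
import tensorflow as tf

# Run PoseNet on CPU with 4 threads, matching the Android benchmark
# setting above.
interpreter = tf.lite.Interpreter(
    model_path="posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite",
    num_threads=4)
interpreter.allocate_tensors()
```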

## Further reading and resources

*   Check out this
    [blog post](https://medium.com/tensorflow/track-human-poses-in-real-time-on-android-with-tensorflow-lite-e66d0f3e6f9e)
    to learn more about pose estimation using TensorFlow Lite.
*   Check out this
    [blog post](https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5)
    to learn more about pose estimation using TensorFlow.js.
*   Read the PoseNet paper [here](https://arxiv.org/abs/1803.08225).

Also, check out these use cases of pose estimation.

<ul>
  <li><a href="https://vimeo.com/128375543">‘PomPom Mirror’</a></li>
  <li><a href="https://youtu.be/I5__9hq-yas">Amazing Art Installation Turns You Into A Bird | Chris Milk "The Treachery of Sanctuary"</a></li>
  <li><a href="https://vimeo.com/34824490">Puppet Parade - Interactive Kinect Puppets</a></li>
  <li><a href="https://vimeo.com/2892576">Messa di Voce (Performance), Excerpts</a></li>
  <li><a href="https://www.instagram.com/p/BbkKLiegrTR/">Augmented reality</a></li>
  <li><a href="https://www.instagram.com/p/Bg1EgOihgyh/">Interactive animation</a></li>
  <li><a href="https://www.runnersneed.com/expert-advice/gear-guides/gait-analysis.html">Gait analysis</a></li>
</ul>