{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c8Cx-rUMVX25"
   },
   "source": [
    "##### Copyright 2019 The TensorFlow Authors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "I9sUhVL_VZNO"
   },
   "outputs": [],
   "source": [
    "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "# https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6Y8E0lw5eYWm"
   },
   "source": [
    "# Post-training float16 quantization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "CGuqeuPSVNo-"
   },
   "source": [
    "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://www.tensorflow.org/lite/performance/post_training_float16_quant\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/post_training_float16_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/post_training_float16_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"https://storage.googleapis.com/tensorflow_docs/tensorflow/lite/g3doc/performance/post_training_float16_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BTC1rDAuei_1"
   },
   "source": [
    "## Overview\n",
    "\n",
    "[TensorFlow Lite](https://www.tensorflow.org/lite/) supports converting weights to 16-bit floating-point values during model conversion from TensorFlow to the TensorFlow Lite FlatBuffer format. This results in roughly a 2x reduction in model size. Some hardware, such as GPUs, can compute natively in this reduced-precision arithmetic, realizing a speedup over traditional float32 execution. The TensorFlow Lite GPU delegate can be configured to run this way. A model converted to float16 weights can still run on the CPU without additional modification: the float16 weights are upcast to float32 before the first inference. This permits a significant reduction in model size in exchange for a minimal impact on latency and accuracy.\n",
    "\n",
    "In this tutorial, you train an MNIST model from scratch, check its accuracy in TensorFlow, and then convert the model into a TensorFlow Lite FlatBuffer with float16 quantization. Finally, you check the accuracy of the converted model and compare it to the original float32 model."
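    ,"\n",
    "\n",
    "The size and precision trade-off can be illustrated with NumPy alone. The sketch below is only an illustration, not what the converter does internally. It casts a hypothetical float32 weight tensor (`weights_fp32`, shaped like the 3x3 convolution kernel defined later in this tutorial) to float16, which halves its storage. It then upcasts the tensor back to float32, the way a CPU would before inference. Round-to-nearest float16 keeps 11 significant bits, so for values in float16's normal range the relative rounding error is at most `2**-11` (about 0.05%).\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical weights shaped like a 3x3 conv kernel with 1 input and 12 output channels.\n",
    "rng = np.random.default_rng(0)\n",
    "weights_fp32 = rng.standard_normal((3, 3, 1, 12)).astype(np.float32)\n",
    "\n",
    "weights_fp16 = weights_fp32.astype(np.float16)  # what gets stored in the model\n",
    "restored = weights_fp16.astype(np.float32)      # what the CPU computes with\n",
    "\n",
    "print(weights_fp32.nbytes, weights_fp16.nbytes)  # 432 216\n",
    "\n",
    "# Relative rounding error, restricted to values in float16's normal range.\n",
    "normal = np.abs(weights_fp32) >= np.finfo(np.float16).tiny\n",
    "rel_err = np.abs(restored - weights_fp32)[normal] / np.abs(weights_fp32)[normal]\n",
    "print(rel_err.max() <= 2**-11)  # True\n",
    "```"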
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2XsEP17Zelz9"
   },
   "source": [
    "## Build an MNIST model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dDqqUIZjZjac"
   },
   "source": [
    "### Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "gyqAw1M9lyab"
   },
   "outputs": [],
   "source": [
    "import logging\n",
    "logging.getLogger(\"tensorflow\").setLevel(logging.DEBUG)\n",
    "\n",
    "import tensorflow as tf\n",
    "from tensorflow import keras\n",
    "import numpy as np\n",
    "import pathlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "c6nb7OPlXs_3"
   },
   "outputs": [],
   "source": [
    "tf.float16"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "eQ6Q0qqKZogR"
   },
   "source": [
    "### Train and export the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "hWSAjQWagIHl"
   },
   "outputs": [],
   "source": [
    "# Load MNIST dataset\n",
    "mnist = keras.datasets.mnist\n",
    "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n",
    "\n",
    "# Normalize the input image so that each pixel value is between 0 and 1.\n",
    "train_images = train_images / 255.0\n",
    "test_images = test_images / 255.0\n",
    "\n",
    "# Define the model architecture\n",
    "model = keras.Sequential([\n",
    "  keras.layers.InputLayer(input_shape=(28, 28)),\n",
    "  keras.layers.Reshape(target_shape=(28, 28, 1)),\n",
    "  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),\n",
    "  keras.layers.MaxPooling2D(pool_size=(2, 2)),\n",
    "  keras.layers.Flatten(),\n",
    "  keras.layers.Dense(10)\n",
    "])\n",
    "\n",
    "# Train the digit classification model\n",
    "model.compile(optimizer='adam',\n",
    "              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
    "              metrics=['accuracy'])\n",
    "model.fit(\n",
    "  train_images,\n",
    "  train_labels,\n",
    "  epochs=1,\n",
    "  validation_data=(test_images, test_labels)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5NMaNZQCkW9X"
   },
   "source": [
    "For this example, the model is trained for just a single epoch, so it only reaches ~96% accuracy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xl8_fzVAZwOh"
   },
   "source": [
    "### Convert to a TensorFlow Lite model\n",
    "\n",
    "Using the Python [TFLiteConverter](https://www.tensorflow.org/lite/convert/python_api), you can now convert the trained model into a TensorFlow Lite model.\n",
    "\n",
    "Create a `TFLiteConverter` from the Keras model and convert it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_i8B2nDZmAgQ"
   },
   "outputs": [],
   "source": [
    "converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
    "tflite_model = converter.convert()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "F2o2ZfF0aiCx"
   },
   "source": [
    "Write it out to a `.tflite` file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "vptWZq2xnclo"
   },
   "outputs": [],
   "source": [
    "tflite_models_dir = pathlib.Path(\"/tmp/mnist_tflite_models/\")\n",
    "tflite_models_dir.mkdir(exist_ok=True, parents=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Ie9pQaQrn5ue"
   },
   "outputs": [],
   "source": [
    "tflite_model_file = tflite_models_dir/\"mnist_model.tflite\"\n",
    "tflite_model_file.write_bytes(tflite_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7BONhYtYocQY"
   },
   "source": [
    "To instead quantize the model to float16 on export, first set the `optimizations` flag to use default optimizations. Then specify that float16 is the supported type on the target platform:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "HEZ6ET1AHAS3"
   },
   "outputs": [],
   "source": [
    "converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
    "converter.target_spec.supported_types = [tf.float16]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xW84iMYjHd9t"
   },
   "source": [
    "Finally, convert the model as usual. Note that, by default, the converted model still uses float32 inputs and outputs for invocation convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "yuNfl3CoHNK3"
   },
   "outputs": [],
   "source": [
    "tflite_fp16_model = converter.convert()\n",
    "tflite_model_fp16_file = tflite_models_dir/\"mnist_model_quant_f16.tflite\"\n",
    "tflite_model_fp16_file.write_bytes(tflite_fp16_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "PhMmUTl4sbkz"
   },
   "source": [
    "Note that the resulting file is approximately half the size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "JExfcfLDscu4"
   },
   "outputs": [],
   "source": [
    "!ls -lh {tflite_models_dir}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L8lQHMp_asCq"
   },
   "source": [
    "## Run the TensorFlow Lite models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-5l6-ciItvX6"
   },
   "source": [
    "Run the TensorFlow Lite models using the Python TensorFlow Lite Interpreter."
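    ,"\n",
    "\n",
    "Before running inference, you can check that the float16 model still uses float32 inputs and outputs. This is a minimal sketch. It assumes the `tflite_model` and `tflite_fp16_model` byte strings from the conversion steps above are still in memory, and it loads them through the `model_content` argument of `tf.lite.Interpreter` instead of from the saved files. The names `interp`, `input_dtype` and `output_dtype` are illustrative only:\n",
    "\n",
    "```python\n",
    "for name, content in [('float32', tflite_model), ('float16', tflite_fp16_model)]:\n",
    "  interp = tf.lite.Interpreter(model_content=content)\n",
    "  interp.allocate_tensors()\n",
    "  # 'dtype' in the tensor details is a NumPy type such as numpy.float32.\n",
    "  input_dtype = interp.get_input_details()[0]['dtype']\n",
    "  output_dtype = interp.get_output_details()[0]['dtype']\n",
    "  print(name, input_dtype.__name__, output_dtype.__name__)\n",
    "```"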
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Ap_jE7QRvhPf"
   },
   "source": [
    "### Load the models into the interpreters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Jn16Rc23zTss"
   },
   "outputs": [],
   "source": [
    "interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))\n",
    "interpreter.allocate_tensors()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "J8Pztk1mvNVL"
   },
   "outputs": [],
   "source": [
    "interpreter_fp16 = tf.lite.Interpreter(model_path=str(tflite_model_fp16_file))\n",
    "interpreter_fp16.allocate_tensors()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2opUt_JTdyEu"
   },
   "source": [
    "### Test the models on one image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "AKslvo2kwWac"
   },
   "outputs": [],
   "source": [
    "test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)\n",
    "\n",
    "input_index = interpreter.get_input_details()[0][\"index\"]\n",
    "output_index = interpreter.get_output_details()[0][\"index\"]\n",
    "\n",
    "interpreter.set_tensor(input_index, test_image)\n",
    "interpreter.invoke()\n",
    "predictions = interpreter.get_tensor(output_index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "XZClM2vo3_bm"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.imshow(test_images[0])\n",
    "template = \"True:{true}, predicted:{predict}\"\n",
    "_ = plt.title(template.format(true=str(test_labels[0]),\n",
    "                              predict=str(np.argmax(predictions[0]))))\n",
    "plt.grid(False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3gwhv4lKbYZ4"
   },
   "outputs": [],
   "source": [
    "test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)\n",
    "\n",
    "input_index = interpreter_fp16.get_input_details()[0][\"index\"]\n",
    "output_index = interpreter_fp16.get_output_details()[0][\"index\"]\n",
    "\n",
    "interpreter_fp16.set_tensor(input_index, test_image)\n",
    "interpreter_fp16.invoke()\n",
    "predictions = interpreter_fp16.get_tensor(output_index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "CIH7G_MwbY2x"
   },
   "outputs": [],
   "source": [
    "plt.imshow(test_images[0])\n",
    "template = \"True:{true}, predicted:{predict}\"\n",
    "_ = plt.title(template.format(true=str(test_labels[0]),\n",
    "                              predict=str(np.argmax(predictions[0]))))\n",
    "plt.grid(False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LwN7uIdCd8Gw"
   },
   "source": [
    "### Evaluate the models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "05aeAuWjvjPx"
   },
   "outputs": [],
   "source": [
    "# A helper function to evaluate the TF Lite model on the test dataset.\n",
    "def evaluate_model(interpreter):\n",
    "  input_index = interpreter.get_input_details()[0][\"index\"]\n",
    "  output_index = interpreter.get_output_details()[0][\"index\"]\n",
    "\n",
    "  # Run predictions on every image in the test dataset.\n",
    "  prediction_digits = []\n",
    "  for test_image in test_images:\n",
    "    # Pre-processing: add batch dimension and convert to float32 to match with\n",
    "    # the model's input data format.\n",
    "    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)\n",
    "    interpreter.set_tensor(input_index, test_image)\n",
    "\n",
    "    # Run inference.\n",
    "    interpreter.invoke()\n",
    "\n",
    "    # Post-processing: remove batch dimension and find the digit with the\n",
    "    # highest score (the model outputs logits).\n",
    "    output = interpreter.tensor(output_index)\n",
    "    digit = np.argmax(output()[0])\n",
    "    prediction_digits.append(digit)\n",
    "\n",
    "  # Compare prediction results with ground truth labels to calculate accuracy.\n",
    "  accurate_count = 0\n",
    "  for index in range(len(prediction_digits)):\n",
    "    if prediction_digits[index] == test_labels[index]:\n",
    "      accurate_count += 1\n",
    "  accuracy = accurate_count * 1.0 / len(prediction_digits)\n",
    "\n",
    "  return accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "T5mWkSbMcU5z"
   },
   "outputs": [],
   "source": [
    "print(evaluate_model(interpreter))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Km3cY9ry8ZlG"
   },
   "source": [
    "Repeat the evaluation on the float16-quantized model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-9cnwiPp6EGm"
   },
   "outputs": [],
   "source": [
    "# NOTE: Colab runs on server CPUs. On the CPU, the float16 weights are upcast\n",
    "# to float32 before the first inference, so this evaluation should not be\n",
    "# expected to run faster than the float32 model above. A speedup from float16\n",
    "# arithmetic requires hardware that supports it natively, such as a GPU.\n",
    "print(evaluate_model(interpreter_fp16))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L7lfxkor8pgv"
   },
   "source": [
    "In this example, you quantized a model to float16 with little or no difference in accuracy.\n",
    "\n",
    "You can also evaluate the float16-quantized model on the GPU. To perform all arithmetic with the reduced-precision values, create the `TfLiteGpuDelegateOptions` struct in your app and set `precision_loss_allowed` to `1`, like this:\n",
    "\n",
    "```c\n",
    "// Prepare the GPU delegate.\n",
    "const TfLiteGpuDelegateOptions options = {\n",
    "  .metadata = NULL,\n",
    "  .compile_options = {\n",
    "    .precision_loss_allowed = 1,  // FP16\n",
    "    .preferred_gl_object_type = TFLITE_GL_OBJECT_TYPE_FASTEST,\n",
    "    .dynamic_batch_enabled = 0,  // Not fully functional yet\n",
    "  },\n",
    "};\n",
    "```\n",
    "\n",
    "Detailed documentation on the TFLite GPU delegate and how to use it in your application is available in the [GPU delegate guide](https://www.tensorflow.org/lite/performance/gpu_advanced)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "post_training_float16_quant.ipynb",
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}