Name | Date | Size | #Lines | LOC
--- | --- | --- | --- | ---
bluepill/ | 23-Nov-2023 | - | 28 | 9
ecm3531/ | 23-Nov-2023 | - | 21 | 3
examples/micro_speech/ | 23-Nov-2023 | - | 16,411 | 12,606
kernels/ | 23-Nov-2023 | - | 2,620 | 2,188
mbed/ | 23-Nov-2023 | - | 25 | 6
riscv32_mcu/ | 23-Nov-2023 | - | 27 | 7
testing/ | 23-Nov-2023 | - | 684 | 467
tools/ | 23-Nov-2023 | - | 2,760 | 2,111
BUILD | 23-Nov-2023 | 1.7 KiB | 80 | 72
README.md | 23-Nov-2023 | 51.6 KiB | 932 | 796
compatibility.h | 23-Nov-2023 | 1.4 KiB | 33 | 9
debug_log.cc | 23-Nov-2023 | 2.2 KiB | 42 | 3
debug_log.h | 23-Nov-2023 | 1.1 KiB | 24 | 4
debug_log_numbers.cc | 23-Nov-2023 | 6.2 KiB | 186 | 127
debug_log_numbers.h | 23-Nov-2023 | 1 KiB | 29 | 10
micro_error_reporter.cc | 23-Nov-2023 | 2 KiB | 67 | 48
micro_error_reporter.h | 23-Nov-2023 | 1.3 KiB | 37 | 16
micro_error_reporter_test.cc | 23-Nov-2023 | 1.1 KiB | 26 | 9
micro_interpreter.cc | 23-Nov-2023 | 11 KiB | 311 | 271
micro_interpreter.h | 23-Nov-2023 | 2.7 KiB | 72 | 35
micro_interpreter_test.cc | 23-Nov-2023 | 7.5 KiB | 198 | 158
micro_mutable_op_resolver.cc | 23-Nov-2023 | 3 KiB | 81 | 55
micro_mutable_op_resolver.h | 23-Nov-2023 | 1.8 KiB | 47 | 24
micro_mutable_op_resolver_test.cc | 23-Nov-2023 | 3 KiB | 84 | 50
simple_tensor_allocator.cc | 23-Nov-2023 | 5.7 KiB | 163 | 136
simple_tensor_allocator.h | 23-Nov-2023 | 1.9 KiB | 52 | 24
simple_tensor_allocator_test.cc | 23-Nov-2023 | 5.8 KiB | 170 | 125
README.md
# TensorFlow Lite for Microcontrollers

This is an experimental port of TensorFlow Lite aimed at microcontrollers and other devices with only kilobytes of memory. It doesn't require any operating system support, any standard C or C++ libraries, or dynamic memory allocation, so it's designed to be portable even to 'bare metal' systems. The core runtime fits in 16KB on a Cortex M3, and with enough operators to run a speech keyword detection model, takes up a total of 22KB.

## Table of Contents

- [Getting Started](#getting-started)
  * [Getting Started with Portable Reference Code](#getting-started-with-portable-reference-code)
  * [Building Portable Reference Code using Make](#building-portable-reference-code-using-make)
  * [Building for the "Blue Pill" STM32F103 using Make](#building-for-the-blue-pill-stm32f103-using-make)
  * [Building for "Hifive1" SiFive FE310 development board using Make](#building-for-hifive1-sifive-fe310-development-board-using-make)
  * [Building for Ambiq Micro Apollo3Blue EVB using Make](#building-for-ambiq-micro-apollo3blue-evb-using-make)
    * [Additional Apollo3 Instructions](#additional-apollo3-instructions)
  * [Building for the Eta Compute ECM3531 EVB using Make](#building-for-the-eta-compute-ecm3531-evb-using-make)
- [Goals](#goals)
- [Generating Project Files](#generating-project-files)
- [How to Port TensorFlow Lite Micro to a New Platform](#how-to-port-tensorflow-lite-micro-to-a-new-platform)
  * [Requirements](#requirements)
  * [Getting Started](#getting-started-1)
  * [Troubleshooting](#troubleshooting)
  * [Optimizing for your Platform](#optimizing-for-your-platform)
  * [Code Module Organization](#code-module-organization)
  * [Working with Generated Projects](#working-with-generated-projects)
  * [Supporting a Platform with Makefiles](#supporting-a-platform-with-makefiles)
  * [Supporting a Platform with Emulation Testing](#supporting-a-platform-with-emulation-testing)
  * [Implementing More Optimizations](#implementing-more-optimizations)

# Getting Started

One of the challenges of embedded software development is that there are a lot of different architectures, devices, operating systems, and build systems. We aim to support as many of the popular combinations as we can, and to make it as easy as possible to add support for others.

If you're a product developer, we have build instructions or pre-generated project files that you can download for the following platforms:

Device | Mbed | Keil | Make/GCC
--- | --- | --- | ---
[STM32F746G Discovery Board](https://www.st.com/en/evaluation-tools/32f746gdiscovery.html) | [Download](https://drive.google.com/open?id=1OtgVkytQBrEYIpJPsE8F6GUKHPBS3Xeb) | - | [Download](https://drive.google.com/open?id=1u46mTtAMZ7Y1aD-He1u3R8AE4ZyEpnOl)
["Blue Pill" STM32F103-compatible development board](https://github.com/google/stm32_bare_lib) | - | - | [Instructions](#building-for-the-blue-pill-stm32f103-using-make)
[Ambiq Micro Apollo3Blue EVB](https://ambiqmicro.com/apollo-ultra-low-power-mcus/) | - | - | [Instructions](#building-for-ambiq-micro-apollo3blue-evb-using-make)
[Generic Keil uVision Projects](http://www2.keil.com/mdk5/uvision/) | - | [Download](https://drive.google.com/open?id=1Lw9rsdquNKObozClLPoE5CTJLuhfh5mV) | -
[Eta Compute ECM3531 EVB](https://etacompute.com/) | - | - | [Instructions](#building-for-the-eta-compute-ecm3531-evb-using-make)

If your device is not yet supported, it may not be too hard to add support. You can learn about that process [here](#how-to-port-tensorflow-lite-micro-to-a-new-platform). We're looking forward to getting your help expanding this table!
## Getting Started with Portable Reference Code

If you don't have a particular microcontroller platform in mind yet, or just want to try out the code before beginning porting, the easiest way to begin is by [downloading the platform-agnostic reference code](https://drive.google.com/open?id=1cawEQAkqquK_SO4crReDYqf_v7yAwOY8). You'll see a series of folders inside the archive, with each one containing just the source files you need to build one binary. There is a simple Makefile for each folder, but you should be able to load the files into almost any IDE and build them. There's also a [Visual Studio Code](https://code.visualstudio.com/) project file already set up, so you can easily explore the code in a cross-platform IDE.

## Building Portable Reference Code using Make

It's easy to build portable reference code directly from GitHub using make if you're on a Linux or OS X machine:

- Open a terminal.
- Download the TensorFlow source with `git clone https://github.com/tensorflow/tensorflow.git`
- Enter the source root directory by running `cd tensorflow`
- Download the dependencies by running `tensorflow/lite/experimental/micro/tools/make/download_dependencies.sh`. This may take a few minutes.
- Build and test the library with `make -f tensorflow/lite/experimental/micro/tools/make/Makefile test`

You should see a series of compilation steps, followed by `~~~ALL TESTS PASSED~~~` for the various tests of the code that it will run. If there's an error, you should get an informative message from make about what went wrong.

These tests are all built as simple binaries with few dependencies, so you can run them manually.
For example, here's how to run the depthwise convolution test, and its output:

```
tensorflow/lite/experimental/micro/tools/make/gen/linux_x86_64/bin/tensorflow/lite/experimental/micro/kernels/depthwise_conv_test

Testing SimpleTest
Testing SimpleTestQuantized
Testing SimpleTestRelu
Testing SimpleTestReluQuantized
4/4 tests passed
~~~ALL TESTS PASSED~~~
```

Looking at the [depthwise_conv_test.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/kernels/depthwise_conv_test.cc) code, you'll see a sequence that looks like this:

```
...
TF_LITE_MICRO_TESTS_BEGIN

TF_LITE_MICRO_TEST(SimpleTest) {
...
}
...
TF_LITE_MICRO_TESTS_END
```

These macros work a lot like [the Google test framework](https://github.com/google/googletest), but they don't require any dependencies and just write results to stderr, rather than aborting the program. If all the tests pass, then `~~~ALL TESTS PASSED~~~` is output, and the test harness that runs the binary during the make process knows that everything ran correctly. If there's an error, the lack of the expected string lets the harness know that the test failed.

So, why are we running tests in this complicated way? So far, we've been building binaries that run locally on the macOS or Linux machine you're building on, but this approach becomes important when we're targeting simple microcontroller devices.

## Building for the "Blue Pill" STM32F103 using Make

The goal of this library is to enable machine learning on resource-constrained microcontrollers and DSPs, and as part of that we've targeted the ["Blue Pill" STM32F103-compatible development board](https://github.com/google/stm32_bare_lib) as a cheap and popular platform. It only has 20KB of RAM and 64KB of flash, so it's a good device to ensure we can run efficiently on small chips.
It's fairly easy to [buy and wire up a physical board](https://github.com/google/stm32_bare_lib#wiring-up-your-blue-pill), but even if you don't have an actual device, the [Renode project](https://renode.io/) makes it easy to run a faithful emulation on your desktop machine. You'll need [Docker](https://www.docker.com/) installed, but once you have that set up, try running the following command:

`make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=bluepill test`

You should see a similar set of outputs as you did in the previous section, with the addition of some extra Docker logging messages. These appear because we're using Docker to run the Renode microcontroller emulation tool, and the tests themselves are being run on a simulated STM32F103 device. The communication channels between an embedded device and the host are quite limited, so the test harness looks at the output of the debug log to see if tests have passed, just as it did in the previous section. This makes it a very flexible way to run cross-platform tests, even when a platform has no operating system facilities, as long as it can output debugging text logs.
To understand what's happening here, try running the same depthwise convolution test, but through the emulated device test harness, with the following command:

```
tensorflow/lite/experimental/micro/testing/test_bluepill_binary.sh \
tensorflow/lite/experimental/micro/tools/make/gen/bluepill_cortex-m3/bin/tensorflow/lite/experimental/micro/kernels/depthwise_conv_test \
'~~~ALL TESTS PASSED~~~'
```

You should see output that looks something like this:

```
Sending build context to Docker daemon 21.5kB
Step 1/2 : FROM antmicro/renode:latest
 ---> 1b670a243e8f
Step 2/2 : LABEL maintainer="Pete Warden <petewarden@google.com>"
 ---> Using cache
 ---> 3afcd410846d
Successfully built 3afcd410846d
Successfully tagged renode_bluepill:latest
LOGS:
...
03:27:32.4340 [INFO] machine-0: Machine started.
03:27:32.4790 [DEBUG] cpu.uartSemihosting: [+0.22s host +0s virt 0s virt from start] Testing SimpleTest
03:27:32.4812 [DEBUG] cpu.uartSemihosting: [+2.21ms host +0s virt 0s virt from start] Testing SimpleTestQuantized
03:27:32.4833 [DEBUG] cpu.uartSemihosting: [+2.14ms host +0s virt 0s virt from start] Testing SimpleTestRelu
03:27:32.4834 [DEBUG] cpu.uartSemihosting: [+0.18ms host +0s virt 0s virt from start] Testing SimpleTestReluQuantized
03:27:32.4838 [DEBUG] cpu.uartSemihosting: [+0.4ms host +0s virt 0s virt from start] 4/4 tests passed
03:27:32.4839 [DEBUG] cpu.uartSemihosting: [+41µs host +0s virt 0s virt from start] ~~~ALL TESTS PASSED~~~
03:27:32.4839 [DEBUG] cpu.uartSemihosting: [+5µs host +0s virt 0s virt from start]
...
tensorflow/lite/experimental/micro/tools/make/gen/bluepill_cortex-m3/bin/tensorflow/lite/experimental/micro/kernels/depthwise_conv_test: PASS
```

There's a lot of output here, but you should be able to see that the same tests that were covered when we ran locally on the development machine show up in the debug logs here, along with the magic string `~~~ALL TESTS PASSED~~~`. This is the exact same code as before, just compiled and run on the STM32F103 rather than your desktop. We hope that the simplicity of this testing approach will help make adding support for new platforms as easy as possible.

## Building for "Hifive1" SiFive FE310 development board

We've targeted the ["HiFive1" Arduino-compatible development board](https://www.sifive.com/boards/hifive1) as a test platform for RISC-V MCUs.

As with the Blue Pill setup, you will need Docker installed. The binary can be executed on either the HiFive1 board or emulated using the [Renode project](https://renode.io/) on your desktop machine.
The following command builds the Docker image and transfers the source files into it:

```
docker build -t riscv_build \
  -f {PATH_TO_TENSORFLOW_ROOT_DIR}/tensorflow/lite/experimental/micro/testing/Dockerfile.riscv \
  {PATH_TO_TENSORFLOW_ROOT_DIR}/tensorflow/lite/experimental/micro/testing/
```

You should see output that looks something like this:

```
Sending build context to Docker daemon 28.16kB
Step 1/4 : FROM antmicro/renode:latest
 ---> 19c08590e817
Step 2/4 : LABEL maintainer="Pete Warden <petewarden@google.com>"
 ---> Using cache
 ---> 5a7770d3d3f5
Step 3/4 : RUN apt-get update
 ---> Using cache
 ---> b807ab77eeb1
Step 4/4 : RUN apt-get install -y curl git unzip make g++
 ---> Using cache
 ---> 8da1b2aa2438
Successfully built 8da1b2aa2438
Successfully tagged riscv_build:latest
```

To build the micro_speech_test binary:

- Launch the Docker container we just created using: `docker run -it -v /tmp/copybara_out:/workspace riscv_build:latest bash`
- Enter the source root directory by running `cd /workspace`
- Download the dependencies by running `./tensorflow/lite/experimental/micro/tools/make/download_dependencies.sh`. This may take a few minutes.
- Set the path to the RISC-V tools: `export PATH=${PATH}:/workspace/tensorflow/lite/experimental/micro/tools/make/downloads/riscv_toolchain/bin/`
- Build the binary: `make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=riscv32_mcu`

Launch Renode to test the binary (currently this setup is not automated):
- Execute the binary on Renode: `renode -P 5000 --disable-xwt -e 's @/workspace/tensorflow/lite/experimental/micro/testing/sifive_fe310.resc'`

You should see the following log with the magic string `~~~ALL TESTS PASSED~~~`:

```
02:25:22.2059 [DEBUG] uart0: [+17.25s host +80ms virt 80ms virt from start] core freq at 0 Hz
02:25:22.2065 [DEBUG] uart0: [+0.61ms host +0s virt 80ms virt from start] Testing TestInvoke
02:25:22.4243 [DEBUG] uart0: [+0.22s host +0.2s virt 0.28s virt from start] Ran successfully
02:25:22.4244 [DEBUG] uart0: [+42µs host +0s virt 0.28s virt from start]
02:25:22.4245 [DEBUG] uart0: [+0.15ms host +0s virt 0.28s virt from start] 1/1 tests passed
02:25:22.4247 [DEBUG] uart0: [+62µs host +0s virt 0.28s virt from start] ~~~ALL TESTS PASSED~~~
02:25:22.4251 [DEBUG] uart0: [+8µs host +0s virt 0.28s virt from start]
02:25:22.4252 [DEBUG] uart0: [+0.39ms host +0s virt 0.28s virt from start]
02:25:22.4253 [DEBUG] uart0: [+0.16ms host +0s virt 0.28s virt from start] Progam has exited with code:0x00000000
```

## Building for Ambiq Micro Apollo3Blue EVB using Make

Follow these steps to get the pushbutton yes/no example working on Apollo 3:

1. Make sure to run the "Building Portable Reference Code using Make" section before performing the following steps.
2. The Ambiq Micro SDK is downloaded into `tensorflow/lite/experimental/micro/tools/make/downloads` by `download_dependencies.sh`.
3. Compile the project with the following command: `make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=apollo3evb pushbutton_cmsis_speech_test_bin`
4. Install the [Segger JLink tools](https://www.segger.com/downloads/jlink/).
5. Connect the Apollo3 EVB (with the mic shield in slot 3 of the Microbus Shield board) to the computer and power it on.
6. Start the GDB server in a new terminal with the following command: `JLinkGDBServer -select USB -device AMA3B1KK-KBR -endian little -if SWD -speed 1000 -noir -noLocalhostOnly`
   1. The command has run successfully if you see the message "Waiting for GDB connection".
7. Back in the original terminal, run the program via the debugger:
   1. Navigate to `tensorflow/lite/experimental/micro/examples/micro_speech/apollo3`.
   2. Start gdb by entering the following command: `arm-none-eabi-gdb`
   3. Run the command script by entering the following command: `source pushbutton_cmsis_scores.cmd`. This script does the following:
      1. Loads the binary created in step 3
      2. Sets a breakpoint after inference scores have been computed
      3. Tells the debugger which variables should be printed out at this breakpoint
      4. Begins program execution
      5. Press Ctrl+C to exit
   4. Press BTN2. An LED will flash for 1 second. Speak your utterance during this one second.
   5. The debugger will print out four numbers. They are the probabilities for:
      1. no speech
      2. unknown speech
      3. yes
      4. no
   6. The EVB LEDs will indicate detection:
      1. LED0 (rightmost LED) - ON when capturing 1 second of audio
      2. LED1 - ON when detecting silence
      3. LED2 - ON when detecting an UNKNOWN utterance
      4. LED3 - ON when detecting a YES utterance
      5. LED4 (leftmost LED) - ON when detecting a NO utterance

### Additional Apollo3 Instructions

To flash a part with JFlash Lite, do the following:

1. At the command line, run `JFlashLiteExe`
2. Device = AMA3B1KK-KBR
3. Interface = SWD at 1000 kHz
4. Data file = `tensorflow/lite/experimental/micro/tools/make/gen/apollo3evb_cortex-m4/bin/pushbutton_cmsis_speech_test.bin`
5. Prog Addr = 0x0000C000

## Building for the Eta Compute ECM3531 EVB using Make
1. Follow the instructions at [Tensorflow Micro Speech](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/examples/micro_speech#getting-started) to download the Tensorflow source code and the support libraries (but do not run the make command shown there).
2. Download the Eta Compute SDK, version 0.0.17. Contact info@etacompute.com.
3. You will need the Arm compiler arm-none-eabi-gcc, version 7.3.1 20180622, release ARM/embedded-7-branch revision 261907, 7-2018-q2-update. This compiler is downloaded when you run the `tensorflow/lite/experimental/micro/tools/make/download_dependencies.sh` script.
4. Edit the file `tensorflow/lite/experimental/micro/tools/make/targets/ecm3531_makefile.inc` so that the variables ETA_SDK and GCC_ARM point to the correct directories.
5. Compile the code with the command `make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=ecm3531 TAGS="CMSIS" test`. This will produce a set of executables in the `tensorflow/lite/experimental/micro/tools/make/gen/ecm3531_cortex-m3/bin` directory.
6. To load an executable into SRAM:
   - Start ocd
   - `cd tensorflow/lite/experimental/micro/tools/make/targets/ecm3531`
   - Run `./load_program name_of_executable`, e.g., `./load_program audio_provider_test`
   - Start PuTTY (Connection type = Serial, Speed = 11520, Data bits = 8, Stop bits = 1, Parity = None)

   The following output should appear:

   ```
   Testing TestAudioProvider
   Testing TestTimer
   2/2 tests passed
   ~~~ALL TESTS PASSED~~~
   Execution time (msec) = 7
   ```
7. To load into flash:
   - Edit the variable ETA_LDS_FILE in `tensorflow/lite/experimental/micro/tools/make/targets/ecm3531_makefile.inc` to point to the ecm3531_flash.lds file
   - Recompile (`make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=ecm3531 TAGS="CMSIS" test`)
   - `cd tensorflow/lite/experimental/micro/tools/make/targets/ecm3531`
   - Run `./flash_program executable_name` to load into flash.

## Goals

The design goals are for the framework to be:

- **Readable**: We want embedded software engineers to be able to understand what's required to run ML inference without having to study research papers. We've tried to keep the code base small and modular, and to provide reference implementations of all operations, to help with this.

- **Easy to modify**: We know that there are a lot of different platforms and requirements in the embedded world, and we don't expect to cover all of them in one framework. Instead, we're hoping that it can be a good starting point for developers to build on top of to meet their own needs. For example, we tried to make it easy to replace the implementations of key computational operators that are often crucial for performance, without having to touch the data flow and other runtime code. We want it to make more sense to use our workflow to handle things like model import and less-important operations, and to customize the parts that matter, rather than having to reimplement everything in your own engine.

- **Well-tested**: If you're modifying code, you need to know if your changes are correct. Having an easy way to test lets you develop much faster. To help there, we've written tests for all the components, and we've made sure that the tests can be run on almost any platform, with no dependencies apart from the ability to log text to a debug console somewhere.
We also provide an easy way to run all the tests on-device as part of an automated test framework, and we use qemu/Renode emulation so that tests can be run even without physical devices present.

- **Easy to integrate**: We want to be as open a system as possible, and to use the best code available for each platform. To do that, we're going to rely on projects like [CMSIS-NN](https://www.keil.com/pack/doc/CMSIS/NN/html/index.html), [uTensor](https://github.com/uTensor/uTensor), and other vendor libraries to handle as much performance-critical code as possible. We know that there are an increasing number of options to accelerate neural networks on microcontrollers, so we're aiming to be a good host for deploying those hardware technologies too.

- **Compatible**: We're using the same file schema, interpreter API, and kernel interface as regular TensorFlow Lite, so we can leverage the large existing set of tools, documentation, and examples for the project. The biggest barrier to deploying ML models is getting them from a training environment into a form that's easy to run inference on, so we see reusing this rich ecosystem as crucial to being easily usable. We also hope to integrate this experimental work back into the main codebase in the future.

To meet those goals, we've made some tradeoffs:

- **Simple C++**: To help with readability, our code is written in a modern version of C++, but we generally treat it as a "better C", rather than relying on more complex features such as template metaprogramming. As mentioned earlier, we avoid any use of dynamic memory allocation (new/delete) or the standard C/C++ libraries, so we believe this should still be fairly portable.
It does mean that some older devices with C-only toolchains won't be supported, but we're hoping that the reference operator implementations (which are simple C-like functions) can still be useful in those cases. The interfaces are also designed to be C-only, so it should be possible to integrate the resulting library with pure C projects.

- **Interpreted**: Code generation is a popular pattern for embedded code, because it gives standalone code that's easy to modify and step through, but we've chosen to go with an interpreted approach. In our internal microcontroller work we've found that using an extremely stripped-down interpreter with almost no dependencies gives us a lot of the same advantages, but is easier to maintain. For example, when new updates come out for the underlying library, you can just merge your local modifications in a single step, rather than having to regenerate new code and then patch in any changes you subsequently made. The coarse granularity of the interpreted primitives means that each operation call typically takes hundreds of thousands of instruction cycles at least, so we don't see noticeable performance gains from avoiding what's essentially a single switch statement at the interpreter level to call each operation. We're still working on improving the packaging though; for example, we're considering having the ability to snapshot all the source files and headers used for a particular model, compile the code and data together as a library, and then access it through a minimal set of C interface calls which hide the underlying complexity.
- **Flatbuffers**: We represent our models using [the standard flatbuffer schema used by the rest of TensorFlow Lite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/schema/schema.fbs), with the difference that we always keep it in read-only program memory (typically flash) rather than relying on having a file system to read it from. This is a good fit because the flatbuffer serialized format is designed to be mapped into memory without requiring any extra memory allocations or modifications to access it. All of the functions to read model values work directly on the serialized bytes, and large sections of data like weights are directly accessible as sequential C-style arrays of their data type, with no strides or unpacking needed. We do get a lot of value from using flatbuffers, but there is a cost in complexity. The flatbuffer library code is all inline [inside the main headers](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/schema/schema_generated.h), but it isn't straightforward to inspect the implementations, and the model data structures aren't easy to comprehend from the debugger. The header for the schema itself also has to be periodically updated when new information is added to the file format, though we try to handle that transparently for most developers by checking in a pre-generated version.

- **Code Duplication**: Some of the code in this prototype largely duplicates the logic in other parts of the TensorFlow Lite code base, for example the operator wrappers. We've tried to share as much as we can between the two interpreters, but there are some assumptions built into the original runtime that make this difficult. We'll be working on modularizing the main interpreter so that we can move to an entirely shared system.
This initial preview release is designed to get early feedback, and is not intended to be a final product. It only includes enough operations to run a simple keyword recognition model, and the implementations are not optimized. We're hoping this will be a good way to get feedback and collaborate to improve the framework.

## Generating Project Files

It's not always easy or convenient to use a makefile-based build process, especially if you're working on a product that uses a different IDE for the rest of its code. To address that, it's possible to generate standalone project folders for various popular build systems. These projects are self-contained, with only the headers and source files needed by a particular binary, and include project files to make loading them into an IDE easy. These can be auto-generated for any target you can compile using the main Make system, using a command like this (making sure you've run `download_dependencies.sh` first):

```
make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=mbed TAGS="CMSIS disco_f746ng" generate_micro_speech_mbed_project
```

This will create a folder in `tensorflow/lite/experimental/micro/tools/make/gen/mbed_cortex-m4/prj/micro_speech_main_test/mbed` that contains the source and header files, some Mbed configuration files, and a README. You should then be able to copy this directory to another machine, and use it just like any other Mbed project. There's more information about project files [below](#working-with-generated-projects).

## How to Port TensorFlow Lite Micro to a New Platform

Are you a hardware or operating system provider looking to run machine learning on your platform? We're keen to help, and we've had experience helping other teams do the same thing, so here are our recommendations.
### Requirements

Since the core neural network operations are pure arithmetic, and don't require any I/O or other system-specific functionality, the code doesn't have to have many dependencies. We've tried to enforce this, so that it's as easy as possible to get TensorFlow Lite Micro running even on 'bare metal' systems without an OS. Here are the core requirements that a platform needs to run the framework:

- C/C++ compiler with C++11 support. This is probably the most restrictive of the requirements, since C++11 is not as widely adopted in the embedded world as it is elsewhere. We made the decision to require it since one of the main goals of TFL Micro is to share as much code as possible with the wider TensorFlow codebase, and since that relies on C++11 features, we need compatibility to achieve it. We only use a small, sane subset of C++ though, so don't worry about having to deal with template metaprogramming or similar challenges!

- Debug logging. The core network operations don't need any I/O functions, but to be able to run tests and tell if they've worked as expected, the framework needs some way to write out a string to some kind of debug console. This will vary from system to system; for example, on Linux it could just be `fprintf(stderr, debug_string)`, whereas an embedded device might write the string out to a specified UART. As long as there's some mechanism for outputting debug strings, you should be able to use TFL Micro on that platform.

- Math library. The C standard `libm.a` library is needed to handle some of the mathematical operations used to calculate neural network results.

- Global variable initialization. We do use a pattern of relying on global variables being set before `main()` is run in some places, so you'll need to make sure your compiler toolchain supports this.

And that's it!
You may be wondering about some other common requirements that are needed by a lot of non-embedded software, so here's a brief list of things that aren't necessary to get started with TFL Micro on a new platform:

- Operating system. Since the only platform-specific function we need is `DebugLog()`, there's no requirement for any kind of Posix or similar functionality around files, processes, or threads.

- C or C++ standard libraries. The framework tries to avoid relying on any standard library functions that require linker-time support. This includes things like string functions, but still allows us to use headers like `stdint.h` which typically just define constants and typedefs. Unfortunately this distinction isn't officially defined by any standard, so it's possible that different toolchains may decide to require linked code even for the subset we use, but in practice we've found it's usually a pretty obvious decision and stable across platforms and toolchains.

- Dynamic memory allocation. All the TFL Micro code avoids dynamic memory allocation, instead relying on local variables on the stack in most cases, or global variables in a few situations. These are all fixed-size, which can mean some compile-time configuration to ensure there's enough space for particular networks, but does avoid any need for a heap and an implementation of `malloc`/`new` on a platform.

- Floating point. Eight-bit integer arithmetic is enough for inference on many networks, so if a model sticks to these kinds of quantized operations, no floating point instructions should be required or executed by the framework.

### Getting Started

We recommend that you start by trying to compile and run one of the simplest tests in the framework as your first step.
The full TensorFlow codebase can seem overwhelming to work with at first, so
instead you can begin with a collection of self-contained project folders that
only include the source files needed for a particular test or executable. You
can find a set of pre-generated projects
[here](https://drive.google.com/open?id=1cawEQAkqquK_SO4crReDYqf_v7yAwOY8).

As mentioned above, the one function you will need to implement for a completely
new platform is debug logging. If your device is just a variation on an existing
platform, you may be able to reuse code that's already been written. To
understand what's available, begin with the default reference implementation at
[tensorflow/lite/experimental/micro/debug_log.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/debug_log.cc),
which uses `fprintf()` and `stderr`. If your platform has this level of support
for the C standard library in its toolchain, then you can just reuse this.
Otherwise, you'll need to do some research into how your platform and device can
communicate logging statements to the outside world. As another example, take a
look at
[the Mbed version of `DebugLog()`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/mbed/debug_log.cc),
which creates a UART object and uses it to output strings to the host's console
if it's connected.

Begin by navigating to the micro_error_reporter_test folder in the pre-generated
projects you downloaded. Inside here, you'll see a set of folders containing all
the source code you need. If you look through them, you should find a total of
around 60 C or C++ files that, compiled together, will create the test
executable. There's an example makefile in the directory that lists all of the
source files and include paths for the headers.
If you're building on a Linux or macOS host system, you may just be able to
reuse that same makefile to cross-compile for your target, as long as you swap
out the `CC` and `CXX` variables from their defaults to point to your cross
compiler instead (for example `arm-none-eabi-gcc` or `riscv64-unknown-elf-gcc`).
Otherwise, set up a project in the build system you are using. It should
hopefully be fairly straightforward, since all of the source files in the folder
need to be compiled, so on many IDEs you can just drag the whole lot in. Then
you need to make sure that C++11 compatibility is turned on, and that the right
include paths (as mentioned in the makefile) have been added.

You'll see the default `DebugLog()` implementation in
`tensorflow/lite/experimental/micro/debug_log.cc` inside the
micro_error_reporter_test folder. Modify that file to add the right
implementation for your platform, and then you should be able to build the set
of files into an executable. Transfer that executable to your target device (for
example by flashing it), and then try running it. You should see output that
looks something like this:

```
Number: 42
Badly-formed format string
Another badly-formed format string
~~~ALL TESTS PASSED~~~
```

If not, you'll need to debug what went wrong, but hopefully with this small
starting project it should be manageable.

### Troubleshooting

When we've been porting to new platforms, it's often been hard to figure out
some of the fundamentals like linker settings and other toolchain setup flags.
If you are having trouble, see if you can find a simple example program for your
platform, like one that just blinks an LED. If you're able to build and run that
successfully, then start to swap in parts of the TF Lite Micro codebase to that
working project, taking it a step at a time and ensuring it's still working
after every change.
For example, a first step might be to paste in your `DebugLog()` implementation
and call `DebugLog("Hello World!")` from the main function.

Another common problem on embedded platforms is the stack size being too small.
Mbed defaults to 4KB for the main thread's stack, which is too small for most
models, since TensorFlow Lite allocates buffers and other data structures that
require more memory. The exact size will depend on which model you're running,
but try increasing the stack if you are running into strange corruption issues
that might be related to stack overwriting.

### Optimizing for your Platform

The default reference implementations in TensorFlow Lite Micro are written to be
portable and easy to understand, not fast, so you'll want to replace
performance-critical parts of the code with versions specifically tailored to
your architecture. The framework has been designed with this in mind, and we
hope the combination of small modules and many tests makes it as straightforward
as possible to swap in your own code a piece at a time, ensuring you have a
working version at every step. To write specialized implementations for a
platform, it's useful to understand how optional components are handled inside
the build system.

### Code Module Organization

We have adopted a system of small modules with platform-specific implementations
to help with portability. Every module is just a standard `.h` header file
containing the interface (either functions or a class), with an accompanying
reference implementation in a `.cc` file with the same name. The source file
implements all of the code that's declared in the header. If you have a
specialized implementation, you can create a folder in the same directory as the
header and reference source, name it after your platform, and put your
implementation in a `.cc` file inside that folder.
We've already seen one example of this, where the Mbed and Bluepill versions of
`DebugLog()` are inside the
[mbed](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/mbed)
and
[bluepill](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/bluepill)
folders, children of the
[same directory](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro)
where the stdio-based
[`debug_log.cc`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/debug_log.cc)
reference implementation is found.

The advantage of this approach is that we can automatically pick specialized
implementations based on the current build target, without having to manually
edit build files for every new platform. It allows incremental optimizations
from an always-working foundation, without cluttering the reference
implementations with a lot of variants.

To see why we're doing this, it's worth looking at the alternatives. TensorFlow
Lite has traditionally used preprocessor macros to separate out some
platform-specific code within particular files, for example:

```
#ifndef USE_NEON
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#define USE_NEON
#include <arm_neon.h>
#endif
#endif
```

There's also a tradition in gemmlowp of using file suffixes to indicate
platform-specific versions of particular headers, with `kernel_neon.h` being
included by `kernel.h` if `USE_NEON` is defined. As a third variation, kernels
are separated out using a directory structure, with
`tensorflow/lite/kernels/internal/reference` containing portable
implementations, and `tensorflow/lite/kernels/internal/optimized` holding
versions optimized for NEON on Arm platforms.

These approaches are hard to extend to multiple platforms.
Using macros means that platform-specific code is scattered throughout files in
a hard-to-find way, and it can make following the control flow difficult, since
you need to understand the macro state to trace it. For example, I temporarily
introduced a bug that disabled NEON optimizations for some kernels when I
removed `tensorflow/lite/kernels/internal/common.h` from their includes, without
realizing it was where `USE_NEON` was defined!

It's also tough to port to different build systems, since figuring out the right
combination of macros to use can be hard, especially since some of them are
automatically defined by the compiler, and others are only set by build scripts,
often across multiple rules.

The approach we are using extends the file system approach that we use for
kernel implementations, but with some specific conventions:

-   For each module in TensorFlow Lite, there will be a parent directory that
    contains tests, interface headers used by other modules, and portable
    implementations of each part.
-   Portable means that the code doesn't include code from any libraries except
    flatbuffers, or other TF Lite modules. You can include a limited subset of
    standard C or C++ headers, but you can't use any functions that require
    linking against those libraries, including `fprintf()`, etc. You can link
    against functions in the standard math library, in `<math.h>`.
-   Specialized implementations are held inside subfolders of the parent
    directory, named after the platform or library that they depend on. So, for
    example, if you had `my_module/foo.cc`, a version that used RISC-V
    extensions would live in `my_module/riscv/foo.cc`. If you had a version
    that used the CMSIS library, it should be in `my_module/cmsis/foo.cc`.
-   These specialized implementations should completely replace the top-level
    implementations.
    If this involves too much code duplication, the top-level implementation
    should be split into smaller files, so only the platform-specific code
    needs to be replaced.
-   There is a convention about how build systems pick the right implementation
    file. There will be an ordered list of 'tags' defining the preferred
    implementations, and to generate the right list of source files, each
    module will be examined in turn. If a subfolder with a tag's name contains
    a `.cc` file with the same base name as one in the parent folder, then it
    will replace the parent folder's version in the list of build files. If
    there are multiple subfolders with matching tags and file names, then the
    tag that's latest in the ordered list will be chosen. This allows us to
    express "I'd like generically-optimized fixed point if it's available, but
    I'd prefer something using the CMSIS library" using the list
    'fixed_point cmsis'. These tags are passed in as `TAGS="<foo>"` on the
    command line when you use the main Makefile to build.
-   There is an implicit "reference" tag at the start of every list, so that
    it's possible to support directory structures like the current
    `tensorflow/lite/kernels/internal`, where portable implementations are held
    in a "reference" folder that's a sibling to the NEON-optimized folder.
-   The headers for each unit in a module should remain platform-agnostic, and
    be the same for all implementations. Private headers inside a subfolder can
    be used as needed, but shouldn't be referred to by any portable code at the
    top level.
-   Tests should be at the parent level, with no platform-specific code.
-   No platform-specific macros or `#ifdef`s should be used in any portable
    code.
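The tag-matching rule described above can be modeled in a few lines. The sketch below is a hypothetical C++ illustration of the selection logic only, not the build system's actual code; the `Specialize` name and the `available` set standing in for the filesystem are both invented for this example:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the tag-matching rule: 'available' stands in for the
// set of <tag>/<file> specializations that exist on disk. Later tags in the
// ordered list win, and the parent-folder ("reference") version is the
// fallback when no tag matches.
std::string Specialize(const std::string& file,
                       const std::vector<std::string>& tags,
                       const std::set<std::string>& available) {
  std::string chosen = file;  // Implicit "reference" implementation.
  for (const std::string& tag : tags) {
    const std::string candidate = tag + "/" + file;
    if (available.count(candidate) != 0) {
      chosen = candidate;  // A later matching tag overrides an earlier one.
    }
  }
  return chosen;
}
```

With tags `{"fixed_point", "cmsis"}` and both `fixed_point/foo.cc` and `cmsis/foo.cc` present, `cmsis/foo.cc` is chosen because it appears later in the list, which is exactly the "I'd prefer CMSIS if available" behavior described above.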
The implementation of these rules is handled inside the Makefile, with a
[`specialize` function](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/tools/make/helper_functions.inc#L42)
that takes a list of reference source file paths as an input, and returns the
equivalent list with specialized versions of those files swapped in if they
exist.

### Working with Generated Projects

So far, I've recommended that you use the standalone generated projects for your
system. You might be wondering why you can't just check out the full
[TensorFlow codebase from GitHub](https://github.com/tensorflow/tensorflow/).
The main reason is that there is a lot more diversity of architectures, IDEs,
support libraries, and operating systems in the embedded world. Many of the
toolchains require their own copy of source files, or a list of sources to be
written to a project file. When a developer working on TensorFlow adds a new
source file or changes its location, we can't expect her to update multiple
different project files, many of which she may not have the right software to
verify the change was correct. That means we have to rely on a central listing
of source files (which in our case is held in the makefile), and then call a
tool to generate other project files from those. We could ask embedded
developers to do this process themselves after downloading the main source, but
running the makefile requires a Linux system which may not be available, takes
time, and involves downloading a lot of dependencies. That is why we've opted to
make regular snapshots of the results of generating these projects for popular
IDEs and platforms, so that embedded developers have a fast and friendly way to
start using TensorFlow Lite for Microcontrollers.
This does have the disadvantage that you're no longer working directly on the
main repository; instead, you have a copy that's outside of source control.
We've tried to make the copy as similar to the main repo as possible, for
example by keeping the paths of all source files the same, and ensuring that
there are no changes between the copied files and the originals, but it still
makes it tougher to sync as the main repository is updated. There are also
multiple copies of the source tree, one for each target, so any change you make
to one copy has to be manually propagated across all the other projects you
care about. This doesn't matter so much if you're just using the projects as
they are to build products, but if you want to support a new platform and have
the changes reflected in the main code base, you'll have to do some extra work.

As an example, think about the `DebugLog()` implementation we discussed adding
for a new platform earlier. At this point, you have a new version of
`debug_log.cc` that does what's required, but how can you share that with the
wider community? The first step is to pick a tag name for your platform. This
can either be the operating system (for example 'mbed'), the name of a device
('bluepill'), or some other text that describes it. This should be a short
string with no spaces or special characters. Log in or create an account on
GitHub, fork the full
[TensorFlow codebase](https://github.com/tensorflow/tensorflow/) using the
'Fork' button in the top right, and then grab your fork by using a command like
`git clone https://github.com/<your user name>/tensorflow`.

You'll need Linux, macOS, or Windows with something like Cygwin installed to
run the next steps, since they involve running a makefile.
Run the following commands from a terminal, inside the root of the source
folder:

```
tensorflow/lite/experimental/micro/tools/make/download_dependencies.sh
make -f tensorflow/lite/experimental/micro/tools/make/Makefile generate_projects
```

This will take a few minutes, since it has to download some large toolchains
for the dependencies. Once it has finished, you should see some folders created
inside a path like
`tensorflow/lite/experimental/micro/tools/make/gen/linux_x86_64/prj/`. The
exact path depends on your host operating system, but you should be able to
figure it out from the copy commands in the build output. These folders contain
the generated project and source files, with
`tensorflow/lite/experimental/micro/tools/make/gen/linux_x86_64/prj/keil`
containing the Keil uVision targets,
`tensorflow/lite/experimental/micro/tools/make/gen/linux_x86_64/prj/mbed` with
the Mbed versions, and so on.

If you've got this far, you've successfully set up the project generation flow.
Now you need to add your specialized implementation of `DebugLog()`. Start by
creating a folder inside `tensorflow/lite/experimental/micro/` named after the
tag you picked earlier. Put your `debug_log.cc` file inside this folder, and
then run this command, with `<your tag>` replaced by the actual folder name:

```
make -f tensorflow/lite/experimental/micro/tools/make/Makefile TAGS="<your tag>" generate_projects
```

If your tag name actually refers to a whole target architecture, then you'll
use `TARGET` or `TARGET_ARCH` instead. For example, here's how a simple RISC-V
set of projects is generated:

```
make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET="riscv32_mcu" generate_projects
```

This works the same way as `TAGS`; it just looks for specialized
implementations with the same containing folder name.
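Concretely, the tag-folder step described above might look like the following on disk; the tag name `myplatform` and the stub implementation body are hypothetical placeholders for your own platform:

```shell
# Hypothetical example of registering a new platform tag named "myplatform".
# A debug_log.cc placed in a folder matching the tag shadows the reference
# implementation when TAGS="myplatform" is passed to the main Makefile.
mkdir -p tensorflow/lite/experimental/micro/myplatform
cat > tensorflow/lite/experimental/micro/myplatform/debug_log.cc <<'EOF'
#include "tensorflow/lite/experimental/micro/debug_log.h"

extern "C" void DebugLog(const char* s) {
  // Replace with your platform's UART or console output routine.
}
EOF
```

Because the specialized file shares its base name with the reference `debug_log.cc`, the build system's tag-matching rule picks it up automatically; no build files need to be edited.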
If you look inside the projects that have been created, you should see that the
default `DebugLog()` implementation is no longer present at
`tensorflow/lite/experimental/micro/debug_log.cc`, and instead
`tensorflow/lite/experimental/micro/<your tag>/debug_log.cc` is being used.
Copy over the generated project files and try building them in your own IDE. If
everything works, then you're ready to submit your change.

To do this, run something like:

```
git add tensorflow/lite/experimental/micro/<your tag>/debug_log.cc
git commit -a -m "Added DebugLog() support for <your platform>"
git push origin master
```

Then go back to `https://github.com/<your account>/tensorflow`, and choose "New
Pull Request" near the top. You should then be able to go through the standard
TensorFlow PR process to get your change added to the main repository, and made
available to the rest of the community!

### Supporting a Platform with Makefiles

The changes you've made so far will enable other developers using the generated
projects to use your platform, but TensorFlow's continuous integration process
uses makefiles to build frequently and ensure changes haven't broken the build
process for different systems. If you are able to convert your build procedure
into something that can be expressed by a makefile, then we can integrate your
platform into our CI builds and make sure it continues to work.

Fully describing how to do this is beyond the scope of this documentation, but
the biggest needs are:

-   A command-line compiler that can be called for every source file.
-   A list of the arguments to pass into the compiler to build and link all
    files.
-   The correct linker map files and startup assembler to ensure `main()` gets
    called.
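As a rough illustration of how those three needs come together, a platform's settings are often contributed as a makefile fragment along these lines; the target name, toolchain, flags, and linker script here are all hypothetical placeholders rather than a real supported configuration:

```
# Hypothetical sketch of a platform makefile fragment. The target name,
# toolchain, and flags are placeholders; substitute your own cross compiler,
# CPU flags, and linker script.
ifeq ($(TARGET), myplatform)
  TARGET_ARCH := cortex-m4
  CXX := arm-none-eabi-g++
  CC := arm-none-eabi-gcc
  PLATFORM_FLAGS := -mcpu=cortex-m4 -mthumb -fno-rtti -fno-exceptions
  CXXFLAGS += $(PLATFORM_FLAGS) -std=c++11
  CCFLAGS += $(PLATFORM_FLAGS)
  # Linker script and startup code so main() is actually reached on-device.
  LDFLAGS += -T myplatform.ld
endif
```

The key point is that everything the build needs is expressed as plain variables a command-line `make` invocation can consume, which is what lets the CI system drive it automatically.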
### Supporting a Platform with Emulation Testing

Integrating your platform into the makefile process should help us make sure
that it continues to build, but it doesn't guarantee that the results of the
build process will run correctly. Running tests is something we require to be
able to say that TensorFlow officially supports a platform, since otherwise we
can't guarantee that users will have a good experience when they try using it.
Since physically maintaining a full set of all supported hardware devices isn't
feasible, we rely on software emulation to run these tests. A good example is
our
[STM32F103 'Bluepill' support](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/testing/test_bluepill_binary.sh),
which uses [Docker](https://www.docker.com/) and [Renode](https://renode.io/)
to run built binaries in an emulator. You can use whatever technologies you
want; the only requirements are that they capture the debug log output of the
tests being run in the emulator, and parse it for the string that indicates the
test was successful. These scripts need to run on Ubuntu 18.04, in a bash
environment, though Docker is available if you need to install extra software
or have other dependencies.

### Implementing More Optimizations

Clearly, getting debug logging support is only the beginning of the work you'll
need to do on a particular platform. It's very likely that you'll want to
optimize the core deep learning operations that take up the most time when
running the models you care about. The good news is that the process for
providing optimized implementations is the same as the one you just went
through to provide your own logging. You'll need to identify the parts of the
code that are bottlenecks, and then add specialized implementations in their
own folders.
These don't need to be platform-specific; they can also be broken out by which
library they rely on, for example. [Here's where we do that for the CMSIS
implementation of integer fast Fourier
transforms](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/examples/micro_speech/CMSIS/preprocessor.cc).
This more complex case shows that you can also add helper source files
alongside the main implementation, as long as you
[mention them in the platform-specific makefile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/examples/micro_speech/CMSIS/Makefile.inc).
You can also do things like update the list of libraries that need to be linked
in, or add include paths to required headers.