page.title=Testing Display Performance
page.article=true
page.image=images/cards/card-test-performance_2x.png
page.keywords=performance, fps, tools
@jd:body
User interface (UI) performance testing ensures that your app not only meets its functional requirements, but that user interactions with your app are buttery smooth, running at a consistent 60 frames per second (why 60fps?), without any dropped or delayed frames, or as we like to call it, jank. This document explains tools available to measure UI performance, and lays out an approach to integrate UI performance measurements into your testing practices.
In order to improve performance, you first need the ability to measure the performance of your system, and then diagnose and identify problems that may arise from various parts of your pipeline.
dumpsys is an Android tool that runs on the device and dumps interesting information about the status of system services. Passing the gfxinfo command to dumpsys provides an output in logcat with performance information relating to frames of animation that are occurring during the recording phase.
> adb shell dumpsys gfxinfo <PACKAGE_NAME>
This command can produce multiple different variants of frame timing data.
With the M Preview, the command prints out an aggregated analysis of frame data to logcat, collected across the entire lifetime of the process. For example:
Stats since: 752958278148ns
Total frames rendered: 82189
Janky frames: 35335 (42.99%)
90th percentile: 34ms
95th percentile: 42ms
99th percentile: 69ms
Number Missed Vsync: 4706
Number High input latency: 142
Number Slow UI thread: 17270
Number Slow bitmap uploads: 1542
Number Slow draw: 23342
These statistics convey, at a high level, the rendering performance of the app, as well as its stability across many frames.
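If all you want from the aggregate output is a single number to track, a small host-side script can pull it out for you. Below is a minimal sketch that extracts the janky-frame percentage, assuming the output format shown above; the package name is a hypothetical placeholder.

import re
import subprocess

def get_jank_percentage(package):
    # Capture the aggregate gfxinfo dump for the given package.
    output = subprocess.check_output(
        ['adb', 'shell', 'dumpsys', 'gfxinfo', package]).decode('utf-8')
    # Example line from the output above: "Janky frames: 35335 (42.99%)"
    match = re.search(r'Janky frames:\s*\d+\s*\(([\d.]+)%\)', output)
    return float(match.group(1)) if match else None

print(get_jank_percentage('com.example.myapp'))  # hypothetical package name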
The M Preview also brings a new command for gfxinfo, framestats, which provides extremely detailed frame timing information from recent frames, so that you can track down and debug problems more accurately.
> adb shell dumpsys gfxinfo <PACKAGE_NAME> framestats
This command prints out frame timing information, with nanosecond timestamps, from the last 120 frames produced by the app. Below is example raw output from adb shell dumpsys gfxinfo <PACKAGE_NAME> framestats:
0,27965466202353,27965466202353,27965449758000,27965461202353,27965467153286,27965471442505,27965471925682,27965474025318,27965474588547,27965474860786,27965475078599,27965479796151,27965480589068,
0,27965482993342,27965482993342,27965465835000,27965477993342,27965483807401,27965486875630,27965487288443,27965489520682,27965490184380,27965490568703,27965491408078,27965496119641,27965496619641,
0,27965499784331,27965499784331,27965481404000,27965494784331,27965500785318,27965503736099,27965504201151,27965506776568,27965507298443,27965507515005,27965508405474,27965513495318,27965514061984,
0,27965516575320,27965516575320,27965497155000,27965511575320,27965517697349,27965521276151,27965521734797,27965524350474,27965524884536,27965525160578,27965526020891,27965531371203,27965532114484,
Each line of this output represents a frame produced by the app. Each line has a fixed number of columns describing time spent in each stage of the frame-producing pipeline. The next section describes this format in detail, including what each column represents.
Since the block of data is output in CSV format, it's very straightforward to paste it into your spreadsheet tool of choice, or collect and parse it with a script. The following table explains the format of the output data columns. All timestamps are in nanoseconds.
You can use this data in different ways. One simple but useful visualization is a histogram showing the distribution of frame times (FRAME_COMPLETED - INTENDED_VSYNC) across different latency buckets; see the figure below. This graph tells us at a glance that most frames were very good - well below the 16ms deadline (depicted in red) - but a few frames were significantly over the deadline. We can look at changes in this histogram over time to see wholesale shifts or new outliers being created. You can also graph input latency, time spent in layout, or other similar interesting metrics based on the many timestamps in the data.
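To build a frame-time histogram like this with a script rather than a spreadsheet, you can parse the CSV data rows and bucket the computed durations. The sketch below treats the second column as INTENDED_VSYNC and the last column as FRAME_COMPLETED, per the table above, and skips rows with a non-zero FLAGS value; the bucket boundaries and package name are arbitrary placeholders.

import subprocess

def frame_durations_ms(package):
    # Dump framestats and compute FRAME_COMPLETED - INTENDED_VSYNC per frame.
    output = subprocess.check_output(
        ['adb', 'shell', 'dumpsys', 'gfxinfo', package, 'framestats']).decode('utf-8')
    durations = []
    for line in output.splitlines():
        fields = [f for f in line.strip().split(',') if f]
        # Keep only the CSV data rows (FLAGS followed by nanosecond timestamps).
        if len(fields) < 3 or not all(f.isdigit() for f in fields):
            continue
        if fields[0] != '0':
            continue  # rows with a non-zero FLAGS value are skipped here
        intended_vsync = int(fields[1])    # second column: INTENDED_VSYNC
        frame_completed = int(fields[-1])  # last column: FRAME_COMPLETED
        durations.append((frame_completed - intended_vsync) / 1e6)  # ns -> ms
    return durations

def histogram(durations, buckets=(8, 16, 24, 32, 48)):
    # Count frames per latency bucket; the bucket edges are arbitrary examples.
    counts = [0] * (len(buckets) + 1)
    for d in durations:
        for i, limit in enumerate(buckets):
            if d <= limit:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts

print(histogram(frame_durations_ms('com.example.myapp')))  # hypothetical package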
If Profile GPU rendering is set to "In adb shell dumpsys gfxinfo" in Developer Options, the adb shell dumpsys gfxinfo command prints out timing information for the most recent 120 frames, broken into a few different categories with tab-separated values. This data can be useful for indicating which parts of the drawing pipeline may be slow at a high level.
Similar to framestats above, it's very straightforward to paste this output into your spreadsheet tool of choice, or collect and parse it with a script. The following graph shows a breakdown of where many frames produced by the app were spending their time.
The result of running gfxinfo, copying the output, pasting it into a spreadsheet application, and graphing the data as stacked bars.
Each vertical bar represents one frame of animation; its height represents the number of milliseconds it took to compute that frame of animation. Each colored segment of the bar represents a different stage of the rendering pipeline, so that you can see what parts of your application may be creating a bottleneck. For more information on understanding the rendering pipeline, and how to optimize for it, see the Invalidations Layouts and Performance video.
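If you'd rather script this than use a spreadsheet, the rough sketch below sums the tab-separated per-stage values on each data row to get a total time per frame, without assuming particular column names; depending on your device and release you may need to restrict parsing to the profile-data section of the dump. The package name is a hypothetical placeholder.

import subprocess

def total_frame_times_ms(package):
    # Dump gfxinfo and sum the tab-separated stage timings on each data row.
    output = subprocess.check_output(
        ['adb', 'shell', 'dumpsys', 'gfxinfo', package]).decode('utf-8')
    totals = []
    for line in output.splitlines():
        parts = [p for p in line.split('\t') if p.strip()]
        try:
            values = [float(p) for p in parts]
        except ValueError:
            continue  # skip headers and any non-numeric lines
        if values:
            totals.append(sum(values))
    return totals

print(total_frame_times_ms('com.example.myapp'))  # hypothetical package name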
Both the framestats and simple frame timings gather data over a very short window - about two seconds' worth of rendering. In order to precisely control this window of time - for example, to constrain the data to a particular animation - you can reset all counters and aggregate the statistics gathered.
> adb shell dumpsys gfxinfo <PACKAGE_NAME> reset
This can also be used in conjunction with the dumping commands themselves to collect and reset at a regular cadence, capturing less-than-two-second windows of frames continuously.
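For example, a small host-side loop can reset the counters, wait for the window you care about, and then dump the data, repeating as long as needed. The sketch below uses an arbitrary one-second window and a hypothetical package name.

import subprocess
import time

PACKAGE = 'com.example.myapp'  # hypothetical package name

def collect_window(seconds=1.0):
    # Reset the counters so the next dump only covers this window.
    subprocess.check_call(['adb', 'shell', 'dumpsys', 'gfxinfo', PACKAGE, 'reset'])
    # Drive the animation you care about during this sleep (e.g. from a UI test).
    time.sleep(seconds)
    # Dump the frame data gathered since the reset.
    return subprocess.check_output(
        ['adb', 'shell', 'dumpsys', 'gfxinfo', PACKAGE, 'framestats']).decode('utf-8')

# Capture several consecutive windows back to back.
windows = [collect_window() for _ in range(5)]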
Identification of regressions is a good first step to tracking down problems, and maintaining high application health. However, dumpsys just identifies the existence and relative severity of problems. You still need to diagnose the particular cause of the performance problems, and find appropriate ways to fix them. For that, it’s highly recommended to use the systrace tool.
For more information on how Android’s rendering pipeline works, common problems that you can find there, and how to fix them, some of the following resources may be useful to you:
One approach to UI performance testing is to simply have a human tester perform a set of user operations on the target app, and either visually look for jank, or spend a very large amount of time using a tool-driven approach to find it. But this manual approach is fraught with peril - human ability to perceive frame rate changes varies tremendously, and this is also time-consuming, tedious, and error-prone.
A more efficient approach is to log and analyze key performance metrics from automated UI tests. The Android M developer preview includes new logging capabilities which make it easy to determine the amount and severity of jank in your application’s animations, and that can be used to build a rigorous process to determine your current performance and track future performance objectives.
This article walks you through a recommended approach to using that data to automate your performance testing.
This is mostly broken down into two key actions: first, identifying what you're testing and how you're testing it; and second, setting up and maintaining an automated testing environment.
Before you can get started with automated testing, it's important to make a few high-level decisions in order to properly understand your test space and the needs you may have.
Remember that bad performance is most visible to users when it interrupts a smooth animation. As such, when identifying what types of UI actions to test for, it’s useful to focus on the key animations that users see most, or are most important to their experience. For example, here are some common scenarios that may be useful to identify:
Work with engineers, designers, and product managers on your team to prioritize these key product animations for test coverage.
From a high level, it may be critical to identify your specific performance goals, and to focus on writing tests and collecting data around them. For example:
In all of these cases, you’ll want historical tracking which shows performance across multiple versions of your application.
Application performance varies depending on the device it's running on. Some devices may contain less memory, a less powerful GPU, or a slower CPU. This means that animations which perform well on one set of hardware may not perform well on others and, worse, may suffer from a bottleneck in a different part of the pipeline. So, to account for this variation in what a user might see, pick a range of devices to execute tests on: current high-end devices, low-end devices, tablets, and so on. Look for variation in CPU performance, RAM, screen density, size, and so on. Tests that pass on a high-end device may fail on a low-end device.
Tool suites, like UI Automator and Espresso, are built to help automate the action of a user moving through your application. These are simple frameworks which mimic user interaction with your device. To use these frameworks, you effectively create unique scripts, which run through a set of user-actions, and play them out on the device itself.
By combining these automated tests with dumpsys gfxinfo, you can quickly create a reproducible system that allows you to execute a test and measure the performance information of that particular condition.
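For example, a host-side script can reset the frame counters, launch one of those scripted tests through instrumentation, and then capture the frame data for just that run. The package name, test runner component, and test class below are hypothetical placeholders.

import subprocess

PACKAGE = 'com.example.myapp'  # hypothetical app package
TEST_RUNNER = 'com.example.myapp.test/android.support.test.runner.AndroidJUnitRunner'  # hypothetical

def run_test_and_capture(test_class):
    # Clear the frame counters so the dump only covers this test.
    subprocess.check_call(['adb', 'shell', 'dumpsys', 'gfxinfo', PACKAGE, 'reset'])
    # Run a single UI test class via instrumentation and wait for it to finish.
    subprocess.check_call(['adb', 'shell', 'am', 'instrument', '-w',
                           '-e', 'class', test_class, TEST_RUNNER])
    # Capture the frame timing data generated while the test was running.
    return subprocess.check_output(
        ['adb', 'shell', 'dumpsys', 'gfxinfo', PACKAGE, 'framestats']).decode('utf-8')

raw = run_test_and_capture('com.example.myapp.ScrollPerfTest')  # hypothetical test class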
Once you have the ability to execute a UI test, and a pipeline to gather the data from a single test, the next important step is to embrace a framework which can execute that test multiple times, across multiple devices, and aggregate the resulting performance data for further analysis by your development team.
It's worth noting that UI testing frameworks like UI Automator run directly on the target device or emulator, while the performance-information gathering done by dumpsys gfxinfo is driven by a host machine sending commands over ADB. To help bridge the automation of these separate entities, the MonkeyRunner framework was developed: a scripting system that runs on your host machine, which can issue commands to a set of connected devices, as well as receive data from them.
At a minimum, a set of scripts for proper automation of UI performance testing should be able to use MonkeyRunner to accomplish the following tasks: install the target APK on the test device or devices, launch a UI Automator test and allow it to execute, collect the results of the test through dumpsys gfxinfo, and aggregate those results in a useful fashion for your development team.
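As a rough illustration, a MonkeyRunner script along these lines (run on the host with the monkeyrunner tool) can handle the install, run, and collect steps; the APK path, package name, and test runner component are hypothetical placeholders.

from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice

APK_PATH = 'myapp.apk'         # hypothetical APK path
PACKAGE = 'com.example.myapp'  # hypothetical package name
TEST_RUNNER = 'com.example.myapp.test/android.support.test.runner.AndroidJUnitRunner'  # hypothetical

# Connect to the first available device or emulator.
device = MonkeyRunner.waitForConnection()

# Install the APK under test.
device.installPackage(APK_PATH)

# Reset the frame counters so the dump only covers the test run.
device.shell('dumpsys gfxinfo %s reset' % PACKAGE)

# Launch the UI test via instrumentation and wait for it to finish.
device.shell('am instrument -w %s' % TEST_RUNNER)

# Collect the frame timing data produced during the test and save it for aggregation.
gfxinfo_output = device.shell('dumpsys gfxinfo %s framestats' % PACKAGE)
report = open('gfxinfo_run1.txt', 'w')
report.write(gfxinfo_output)
report.close()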
Once problem patterns or regressions are identified, the next step is identifying and applying the fix. If your automated test framework preserves precise timing breakdowns for frames, it can help you scrutinize recent suspicious code/layout changes (in the case of regression), or narrow down the part of the system you’re analyzing when you switch to manual investigation. For manual investigation, systrace is a great place to start, showing precise timing information about every stage of the rendering pipeline, every thread and core in the system, as well as any custom event markers you define.
It is important to note the difficulties in obtaining and measuring timings that come from rendering performance. These numbers are, by nature, non-deterministic, and often fluctuate depending on the state of the system, the amount of memory available, thermal throttling, and the last time a solar flare hit your part of the Earth. The point is that you can run the same test twice and get slightly different numbers that may be close to each other, but not identical.
Properly gathering and profiling data in this manner means running the same test multiple times and accumulating the results as an average or median value (for the sake of simplicity, let's call this a 'batch'). This gives you a rough approximation of the performance of the test, without needing exact timings.
Batches can be used between code changes to see the relative impact of those changes on performance. If the average frame time for the pre-change batch is larger than for the post-change batch, then you generally have an overall win with respect to performance for that particular change.
This means that any automated UI testing you do should take this concept into consideration, and also account for any anomalies that might occur during a test. For example, if your application's performance suddenly dips due to some device issue that isn't caused by your application, you may want to re-run the batch in order to get less chaotic timings.
So, how many times should you run a test before the measurements become meaningful? Ten times should be the minimum, with higher numbers like 50 or 100 yielding more accurate results (of course, you're now trading off time for accuracy).
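For example, a simple batching helper on the host can run whatever single-test pipeline you've built several times and report the median of the per-run averages, so that one noisy run doesn't dominate. The run_once argument below stands in for your own capture-and-parse function and is a hypothetical placeholder.

import statistics

def run_batch(run_once, runs=10):
    # run_once() should execute one test pass and return its per-frame durations in ms.
    per_run_averages = []
    for _ in range(runs):
        durations = run_once()
        per_run_averages.append(sum(durations) / len(durations))
    # The median is less sensitive to a single anomalous run than the mean.
    return statistics.median(per_run_averages)

# Comparing two batches around a code change (hypothetical helpers):
# before = run_batch(run_once_on_old_build)
# after = run_batch(run_once_on_new_build)
# A lower median frame time after the change is generally a performance win.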