training/articles/perf-tips.jd

page.title=Performance Tips
page.article=true
@jd:body

<div id="tb-wrapper">
<div id="tb">

<h2>In this document</h2>
<ol class="nolist">
  <li><a href="#ObjectCreation">Avoid Creating Unnecessary Objects</a></li>
  <li><a href="#PreferStatic">Prefer Static Over Virtual</a></li>
  <li><a href="#UseFinal">Use Static Final For Constants</a></li>
  <li><a href="#GettersSetters">Avoid Internal Getters/Setters</a></li>
  <li><a href="#Loops">Use Enhanced For Loop Syntax</a></li>
  <li><a href="#PackageInner">Consider Package Instead of Private Access with Private Inner Classes</a></li>
  <li><a href="#AvoidFloat">Avoid Using Floating-Point</a></li>
  <li><a href="#UseLibraries">Know and Use the Libraries</a></li>
  <li><a href="#NativeMethods">Use Native Methods Carefully</a></li>
  <li><a href="#native_methods">Use Native Methods Judiciously</a></li>
  <li><a href="#closing_notes">Closing Notes</a></li>
</ol>

</div>
</div>

<p>This document primarily covers micro-optimizations that can improve overall app performance
when combined, but it's unlikely that these changes will result in dramatic
performance effects. Choosing the right algorithms and data structures should always be your
priority, but is outside the scope of this document. You should use the tips in this document
as general coding practices that you can incorporate into your habits for general code
efficiency.</p>

<p>There are two basic rules for writing efficient code:</p>
<ul>
    <li>Don't do work that you don't need to do.</li>
    <li>Don't allocate memory if you can avoid it.</li>
</ul>

<p>One of the trickiest problems you'll face when micro-optimizing an Android
app is that your app is certain to be running on multiple types of
hardware. Different versions of the VM running on different
processors running at different speeds. It's not even generally the case
that you can simply say "device X is a factor F faster/slower than device Y",
and scale your results from one device to others. In particular, measurement
on the emulator tells you very little about performance on any device. There
are also huge differences between devices with and without a
<acronym title="Just In Time compiler">JIT</acronym>: the best
code for a device with a JIT is not always the best code for a device
without.</p>

<p>To ensure your app performs well across a wide variety of devices, ensure
your code is efficient at all levels and agressively optimize your performance.</p>


<h2 id="ObjectCreation">Avoid Creating Unnecessary Objects</h2>

<p>Object creation is never free. A generational garbage collector with per-thread allocation
pools for temporary objects can make allocation cheaper, but allocating memory
is always more expensive than not allocating memory.</p>

<p>As you allocate more objects in your app, you will force a periodic
garbage collection, creating little "hiccups" in the user experience. The
concurrent garbage collector introduced in Android 2.3 helps, but unnecessary work
should always be avoided.</p>

<p>Thus, you should avoid creating object instances you don't need to.  Some
examples of things that can help:</p>

<ul>
    <li>If you have a method returning a string, and you know that its result
    will always be appended to a {@link java.lang.StringBuffer} anyway, change your signature
    and implementation so that the function does the append directly,
    instead of creating a short-lived temporary object.</li>
    <li>When extracting strings from a set of input data, try
    to return a substring of the original data, instead of creating a copy.
    You will create a new {@link java.lang.String} object, but it will share the {@code char[]}
    with the data. (The trade-off being that if you're only using a small
    part of the original input, you'll be keeping it all around in memory
    anyway if you go this route.)</li>
</ul>

<p>A somewhat more radical idea is to slice up multidimensional arrays into
parallel single one-dimension arrays:</p>

<ul>
    <li>An array of {@code int}s is a much better than an array of {@link java.lang.Integer}
    objects,
    but this also generalizes to the fact that two parallel arrays of ints
    are also a <strong>lot</strong> more efficient than an array of {@code (int,int)}
    objects.  The same goes for any combination of primitive types.</li>

    <li>If you need to implement a container that stores tuples of {@code (Foo,Bar)}
    objects, try to remember that two parallel {@code Foo[]} and {@code Bar[]} arrays are
    generally much better than a single array of custom {@code (Foo,Bar)} objects.
    (The exception to this, of course, is when you're designing an API for
    other code to access. In those cases, it's usually better to make a small
    compromise to the speed in order to achieve a good API design. But in your own internal
    code, you should try and be as efficient as possible.)</li>
</ul>

<p>Generally speaking, avoid creating short-term temporary objects if you
can.  Fewer objects created mean less-frequent garbage collection, which has
a direct impact on user experience.</p>


<h2 id="PreferStatic">Prefer Static Over Virtual</h2>

<p>If you don't need to access an object's fields, make your method static.
Invocations will be about 15%-20% faster.
It's also good practice, because you can tell from the method
signature that calling the method can't alter the object's state.</p>


<h2 id="UseFinal">Use Static Final For Constants</h2>

<p>Consider the following declaration at the top of a class:</p>

<pre>
static int intVal = 42;
static String strVal = "Hello, world!";
</pre>

<p>The compiler generates a class initializer method, called
<code>&lt;clinit&gt;</code>, that is executed when the class is first used.
The method stores the value 42 into <code>intVal</code>, and extracts a
reference from the classfile string constant table for <code>strVal</code>.
When these values are referenced later on, they are accessed with field
lookups.</p>

<p>We can improve matters with the "final" keyword:</p>

<pre>
static final int intVal = 42;
static final String strVal = "Hello, world!";
</pre>

<p>The class no longer requires a <code>&lt;clinit&gt;</code> method,
because the constants go into static field initializers in the dex file.
Code that refers to <code>intVal</code> will use
the integer value 42 directly, and accesses to <code>strVal</code> will
use a relatively inexpensive "string constant" instruction instead of a
field lookup.</p>

<p class="note"><strong>Note:</strong> This optimization applies only to primitive types and
{@link java.lang.String} constants, not arbitrary reference types. Still, it's good
practice to declare constants <code>static final</code> whenever possible.</p>


<h2 id="GettersSetters">Avoid Internal Getters/Setters</h2>

<p>In native languages like C++ it's common practice to use getters
(<code>i = getCount()</code>) instead of accessing the field directly (<code>i
= mCount</code>). This is an excellent habit for C++ and is often practiced in other
object oriented languages like C# and Java, because the compiler can
usually inline the access, and if you need to restrict or debug field access
you can add the code at any time.</p>

<p>However, this is a bad idea on Android.  Virtual method calls are expensive,
much more so than instance field lookups.  It's reasonable to follow
common object-oriented programming practices and have getters and setters
in the public interface, but within a class you should always access
fields directly.</p>

<p>Without a <acronym title="Just In Time compiler">JIT</acronym>,
direct field access is about 3x faster than invoking a
trivial getter. With the JIT (where direct field access is as cheap as
accessing a local), direct field access is about 7x faster than invoking a
trivial getter.</p>

<p>Note that if you're using <a href="{@docRoot}tools/help/proguard.html">ProGuard</a>,
you can have the best of both worlds because ProGuard can inline accessors for you.</p>


<h2 id="Loops">Use Enhanced For Loop Syntax</h2>

<p>The enhanced <code>for</code> loop (also sometimes known as "for-each" loop) can be used
for collections that implement the {@link java.lang.Iterable} interface and for arrays.
With collections, an iterator is allocated to make interface calls
to {@code hasNext()} and {@code next()}. With an {@link java.util.ArrayList},
a hand-written counted loop is
about 3x faster (with or without JIT), but for other collections the enhanced
for loop syntax will be exactly equivalent to explicit iterator usage.</p>

<p>There are several alternatives for iterating through an array:</p>

<pre>
static class Foo {
    int mSplat;
}

Foo[] mArray = ...

public void zero() {
    int sum = 0;
    for (int i = 0; i &lt; mArray.length; ++i) {
        sum += mArray[i].mSplat;
    }
}

public void one() {
    int sum = 0;
    Foo[] localArray = mArray;
    int len = localArray.length;

    for (int i = 0; i &lt; len; ++i) {
        sum += localArray[i].mSplat;
    }
}

public void two() {
    int sum = 0;
    for (Foo a : mArray) {
        sum += a.mSplat;
    }
}
</pre>

<p><code>zero()</code> is slowest, because the JIT can't yet optimize away
the cost of getting the array length once for every iteration through the
loop.</p>

<p><code>one()</code> is faster. It pulls everything out into local
variables, avoiding the lookups. Only the array length offers a performance
benefit.</p>

<p><code>two()</code> is fastest for devices without a JIT, and
indistinguishable from <strong>one()</strong> for devices with a JIT.
It uses the enhanced for loop syntax introduced in version 1.5 of the Java
programming language.</p>

<p>So, you should use the enhanced <code>for</code> loop by default, but consider a
hand-written counted loop for performance-critical {@link java.util.ArrayList} iteration.</p>

<p class="note"><strong>Tip:</strong>
Also see Josh Bloch's <em>Effective Java</em>, item 46.</p>


<h2 id="PackageInner">Consider Package Instead of Private Access with Private Inner Classes</h2>

<p>Consider the following class definition:</p>

<pre>
public class Foo {
    private class Inner {
        void stuff() {
            Foo.this.doStuff(Foo.this.mValue);
        }
    }

    private int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }
}</pre>

<p>What's important here is that we define a private inner class
(<code>Foo$Inner</code>) that directly accesses a private method and a private
instance field in the outer class. This is legal, and the code prints "Value is
27" as expected.</p>

<p>The problem is that the VM considers direct access to <code>Foo</code>'s
private members from <code>Foo$Inner</code> to be illegal because
<code>Foo</code> and <code>Foo$Inner</code> are different classes, even though
the Java language allows an inner class to access an outer class' private
members. To bridge the gap, the compiler generates a couple of synthetic
methods:</p>

<pre>
/*package*/ static int Foo.access$100(Foo foo) {
    return foo.mValue;
}
/*package*/ static void Foo.access$200(Foo foo, int value) {
    foo.doStuff(value);
}</pre>

<p>The inner class code calls these static methods whenever it needs to
access the <code>mValue</code> field or invoke the <code>doStuff()</code> method
in the outer class. What this means is that the code above really boils down to
a case where you're accessing member fields through accessor methods.
Earlier we talked about how accessors are slower than direct field
accesses, so this is an example of a certain language idiom resulting in an
"invisible" performance hit.</p>

<p>If you're using code like this in a performance hotspot, you can avoid the
overhead by declaring fields and methods accessed by inner classes to have
package access, rather than private access. Unfortunately this means the fields
can be accessed directly by other classes in the same package, so you shouldn't
use this in public API.</p>


<h2 id="AvoidFloat">Avoid Using Floating-Point</h2>

<p>As a rule of thumb, floating-point is about 2x slower than integer on
Android-powered devices.</p>

<p>In speed terms, there's no difference between <code>float</code> and
<code>double</code> on the more modern hardware. Space-wise, <code>double</code>
is 2x larger. As with desktop machines, assuming space isn't an issue, you
should prefer <code>double</code> to <code>float</code>.</p>

<p>Also, even for integers, some processors have hardware multiply but lack
hardware divide. In such cases, integer division and modulus operations are
performed in software&mdash;something to think about if you're designing a
hash table or doing lots of math.</p>


<h2 id="UseLibraries">Know and Use the Libraries</h2>

<p>In addition to all the usual reasons to prefer library code over rolling
your own, bear in mind that the system is at liberty to replace calls
to library methods with hand-coded assembler, which may be better than the
best code the JIT can produce for the equivalent Java. The typical example
here is {@link java.lang.String#indexOf String.indexOf()} and
related APIs, which Dalvik replaces with
an inlined intrinsic. Similarly, the {@link java.lang.System#arraycopy
System.arraycopy()} method
is about 9x faster than a hand-coded loop on a Nexus One with the JIT.</p>


<p class="note"><strong>Tip:</strong>
Also see Josh Bloch's <em>Effective Java</em>, item 47.</p>


<h2 id="NativeMethods">Use Native Methods Carefully</h2>

<p>Developing your app with native code using the
<a href="{@docRoot}tools/sdk/ndk/index.html">Android NDK</a>
isn't necessarily more efficient than programming with the
Java language. For one thing,
there's a cost associated with the Java-native transition, and the JIT can't
optimize across these boundaries. If you're allocating native resources (memory
on the native heap, file descriptors, or whatever), it can be significantly
more difficult to arrange timely collection of these resources. You also
need to compile your code for each architecture you wish to run on (rather
than rely on it having a JIT). You may even have to compile multiple versions
for what you consider the same architecture: native code compiled for the ARM
processor in the G1 can't take full advantage of the ARM in the Nexus One, and
code compiled for the ARM in the Nexus One won't run on the ARM in the G1.</p>

<p>Native code is primarily useful when you have an existing native codebase
that you want to port to Android, not for "speeding up" parts of your Android app
written with the Java language.</p>

<p>If you do need to use native code, you should read our
<a href="{@docRoot}guide/practices/jni.html">JNI Tips</a>.</p>

<p class="note"><strong>Tip:</strong>
Also see Josh Bloch's <em>Effective Java</em>, item 54.</p>


<h2 id="Myths">Performance Myths</h2>


<p>On devices without a JIT, it is true that invoking methods via a
variable with an exact type rather than an interface is slightly more
efficient. (So, for example, it was cheaper to invoke methods on a
<code>HashMap map</code> than a <code>Map map</code>, even though in both
cases the map was a <code>HashMap</code>.) It was not the case that this
was 2x slower; the actual difference was more like 6% slower. Furthermore,
the JIT makes the two effectively indistinguishable.</p>

<p>On devices without a JIT, caching field accesses is about 20% faster than
repeatedly accessing the field. With a JIT, field access costs about the same
as local access, so this isn't a worthwhile optimization unless you feel it
makes your code easier to read. (This is true of final, static, and static
final fields too.)


<h2 id="Measure">Always Measure</h2>

<p>Before you start optimizing, make sure you have a problem that you
need to solve. Make sure you can accurately measure your existing performance,
or you won't be able to measure the benefit of the alternatives you try.</p>

<p>Every claim made in this document is backed up by a benchmark. The source
to these benchmarks can be found in the <a
href="http://code.google.com/p/dalvik/source/browse/#svn/trunk/benchmarks">code.google.com
"dalvik" project</a>.</p>

<p>The benchmarks are built with the
<a href="http://code.google.com/p/caliper/">Caliper</a> microbenchmarking
framework for Java. Microbenchmarks are hard to get right, so Caliper goes out
of its way to do the hard work for you, and even detect some cases where you're
not measuring what you think you're measuring (because, say, the VM has
managed to optimize all your code away). We highly recommend you use Caliper
to run your own microbenchmarks.</p>

<p>You may also find
<a href="{@docRoot}tools/debugging/debugging-tracing.html">Traceview</a> useful
for profiling, but it's important to realize that it currently disables the JIT,
which may cause it to misattribute time to code that the JIT may be able to win
back. It's especially important after making changes suggested by Traceview
data to ensure that the resulting code actually runs faster when run without
Traceview.</p>

<p>For more help profiling and debugging your apps, see the following documents:</p>

<ul>
  <li><a href="{@docRoot}tools/debugging/debugging-tracing.html">Profiling with
    Traceview and dmtracedump</a></li>
  <li><a href="{@docRoot}tools/debugging/systrace.html">Analysing Display and Performance
    with Systrace</a></li>
</ul>