1<html> 2<head> 3 <title>Dalvik Optimization and Verification</title> 4</head> 5 6<body> 7<h1>Dalvik Optimization and Verification With <i>dexopt</i></h1> 8 9<p> 10The Dalvik virtual machine was designed specifically for the Android 11mobile platform. The target systems have little RAM, store data on slow 12internal flash memory, and generally have the performance characteristics 13of decade-old desktop systems. They also run Linux, which provides 14virtual memory, processes and threads, and UID-based security mechanisms. 15<p> 16The features and limitations caused us to focus on certain goals: 17 18<ul> 19 <li>Class data, notably bytecode, must be shared between multiple 20 processes to minimize total system memory usage. 21 <li>The overhead in launching a new app must be minimized to keep 22 the device responsive. 23 <li>Storing class data in individual files results in a lot of 24 redundancy, especially with respect to strings. To conserve disk 25 space we need to factor this out. 26 <li>Parsing class data fields adds unnecessary overhead during 27 class loading. Accessing data values (e.g. integers and strings) 28 directly as C types is better. 29 <li>Bytecode verification is necessary, but slow, so we want to verify 30 as much as possible outside app execution. 31 <li>Bytecode optimization (quickened instructions, method pruning) is 32 important for speed and battery life. 33 <li>For security reasons, processes may not edit shared code. 34</ul> 35 36<p> 37The typical VM implementation uncompresses individual classes from a 38compressed archive and stores them on the heap. This implies a separate 39copy of each class in every process, and slows application startup because 40the code must be uncompressed (or at least read off disk in many small 41pieces). On the other hand, having the bytecode on the local heap makes 42it easy to rewrite instructions on first use, facilitating a number of 43different optimizations. 44<p> 45The goals led us to make some fundamental decisions: 46 47<ul> 48 <li>Multiple classes are aggregated into a single "DEX" file. 49 <li>DEX files are mapped read-only and shared between processes. 50 <li>Byte ordering and word alignment are adjusted to suit the local 51 system. 52 <li>Bytecode verification is mandatory for all classes, but we want 53 to "pre-verify" whatever we can. 54 <li>Optimizations that require rewriting bytecode must be done ahead 55 of time. 56</ul> 57 58<p> 59The consequences of these decisions are explained in the following sections. 60 61 62<h2>VM Operation</h2> 63 64<p> 65Application code is delivered to the system in a <code>.jar</code> 66or <code>.apk</code> file. These are really just <code>.zip</code> 67archives with some meta-data files added. The Dalvik DEX data file 68is always called <code>classes.dex</code>. 69<p> 70The bytecode cannot be memory-mapped and executed directly from the zip 71file, because the data is compressed and the start of the file is not 72guaranteed to be word-aligned. These problems could be addressed by 73storing <code>classes.dex</code> without compression and padding out the zip 74file, but that would increase the size of the package sent across the 75data network. 76<p> 77We need to extract <code>classes.dex</code> from the zip archive before 78we can use it. While we have the file available, we might as well perform 79some of the other actions (realignment, optimization, verification) described 80earlier. This raises a new question however: who is responsible for doing 81this, and where do we keep the output? 82 83<h3>Preparation</h3> 84 85<p> 86There are at least three different ways to create a "prepared" DEX file, 87sometimes known as "ODEX" (for Optimized DEX): 88<ol> 89 <li>The VM does it "just in time". The output goes into a special 90 <code>dalvik-cache</code> directory. This works on the desktop and 91 engineering-only device builds where the permissions on the 92 <code>dalvik-cache</code> directory are not restricted. On production 93 devices, this is not allowed. 94 <li>The system installer does it when an application is first added. 95 It has the privileges required to write to <code>dalvik-cache</code>. 96 <li>The build system does it ahead of time. The relevant <code>jar</code> 97 / <code>apk</code> files are present, but the <code>classes.dex</code> 98 is stripped out. The optimized DEX is stored next to the original 99 zip archive, not in <code>dalvik-cache</code>, and is part of the 100 system image. 101</ol> 102<p> 103The <code>dalvik-cache</code> directory is more accurately 104<code>$ANDROID_DATA/data/dalvik-cache</code>. The files inside it have 105names derived from the full path of the source DEX. On the device the 106directory is owned by <code>system</code> / <code>system</code> 107and has 0771 permissions, and the optimized DEX files stored there are 108owned by <code>system</code> and the 109application's group, with 0644 permissions. DRM-locked applications will 110use 640 permissions to prevent other user applications from examining them. 111The bottom line is that you can read your own DEX file and those of most 112other applications, but you cannot create, modify, or remove them. 113<p> 114Preparation of the DEX file for the "just in time" and "system installer" 115approaches proceeds in three steps: 116<p> 117First, the dalvik-cache file is created. This must be done in a process 118with appropriate privileges, so for the "system installer" case this is 119done within <code>installd</code>, which runs as root. 120<p> 121Second, the <code>classes.dex</code> entry is extracted from the the zip 122archive. A small amount of space is left at the start of the file for 123the ODEX header. 124<p> 125Third, the file is memory-mapped for easy access and tweaked for use on 126the current system. This includes byte-swapping and structure realigning, 127but no meaningful changes to the DEX file. We also do some basic 128structure checks, such as ensuring that file offsets and data indices 129fall within valid ranges. 130<p> 131The build system uses a hairy process that involves starting the 132emulator, forcing just-in-time optimization of all relevant DEX files, 133and then extracting the results from <code>dalvik-cache</code>. The 134reasons for doing this, rather than using a tool that runs on the desktop, 135will become more apparent when the optimizations are explained. 136<p> 137Once the code is byte-swapped and aligned, we're ready to go. We append 138some pre-computed data, fill in the ODEX header at the start of the file, 139and start executing. (The header is filled in last, so that we don't 140try to use a partial file.) If we're interested in verification and 141optimization, however, we need to insert a step after the initial prep. 142 143<h3>dexopt</h3> 144 145<p> 146We want to verify and optimize all of the classes in the DEX file. The 147easiest and safest way to do this is to load all of the classes into 148the VM and run through them. Anything that fails to load is simply not 149verified or optimized. Unfortunately, this can cause allocation of some 150resources that are difficult to release (e.g. loading of native shared 151libraries), so we don't want to do it in the same virtual machine that 152we're running applications in. 153<p> 154The solution is to invoke a program called <code>dexopt</code>, which 155is really just a back door into the VM. It performs an abbreviated VM 156initialization, loads zero or more DEX files from the bootstrap class 157path, and then sets about verifying and optimizing whatever it can from 158the target DEX. On completion, the process exits, freeing all resources. 159<p> 160It is possible for multiple VMs to want the same DEX file at the same 161time. File locking is used to ensure that dexopt is only run once. 162 163 164<h2>Verification</h2> 165 166<p> 167The bytecode verification process involves scanning through the instructions 168in every method in every class in a DEX file. The goal is to identify 169illegal instruction sequences so that we don't have to check for them at 170run time. Many of the computations involved are also necessary for "exact" 171garbage collection. See 172<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more 173information. 174<p> 175For performance reasons, the optimizer (described in the next section) 176assumes that the verifier has run successfully, and makes some potentially 177unsafe assumptions. By default, Dalvik insists upon verifying all classes, 178and only optimizes classes that have been verified. If you want to 179disable the verifier, you can use command-line flags to do so. See also 180<a href="embedded-vm-control.html"> Controlling the Embedded VM</a> 181for instructions on controlling these 182features within the Android application framework. 183<p> 184Reporting of verification failures is a tricky issue. For example, 185calling a package-scope method on a class in a different package is 186illegal and will be caught by the verifier. We don't necessarily want 187to report it during verification though -- we actually want to throw 188an exception when the method call is attempted. Checking the access 189flags on every method call is expensive though. The 190<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document 191addresses this issue. 192<p> 193Classes that have been verified successfully have a flag set in the ODEX. 194They will not be re-verified when loaded. The Linux access permissions 195are expected to prevent tampering; if you can get around those, installing 196faulty bytecode is far from the easiest line of attack. The ODEX file has 197a 32-bit checksum, but that's chiefly present as a quick check for 198corrupted data. 199 200 201<h2>Optimization</h2> 202 203<p> 204Virtual machine interpreters typically perform certain optimizations the 205first time a piece of code is used. Constant pool references are replaced 206with pointers to internal data structures, operations that always succeed 207or always work a certain way are replaced with simpler forms. Some of 208these require information only available at runtime, others can be inferred 209statically when certain assumptions are made. 210<p> 211The Dalvik optimizer does the following: 212<ul> 213 <li>For virtual method calls, replace the method index with a 214 vtable index. 215 <li>For instance field get/put, replace the field index with 216 a byte offset. Also, merge the boolean / byte / char / short 217 variants into a single 32-bit form (less code in the interpreter 218 means more room in the CPU I-cache). 219 <li>Replace a handful of high-volume calls, like String.length(), 220 with "inline" replacements. This skips the usual method call 221 overhead, directly switching from the interpreter to a native 222 implementation. 223 <li>Prune empty methods. The simplest example is 224 <code>Object.<init></code>, which does nothing, but must be 225 called whenever any object is allocated. The instruction is 226 replaced with a new version that acts as a no-op unless a debugger 227 is attached. 228 <li>Append pre-computed data. For example, the VM wants to have a 229 hash table for lookups on class name. Instead of computing this 230 when the DEX file is loaded, we can compute it now, saving heap 231 space and computation time in every VM where the DEX is loaded. 232</ul> 233 234<p> 235All of the instruction modifications involve replacing the opcode with 236one not defined by the Dalvik specification. This allows us to freely 237mix optimized and unoptimized instructions. The set of optimized 238instructions, and their exact representation, is tied closely to the VM 239version. 240<p> 241Most of the optimizations are obvious "wins". The use of raw indices 242and offsets not only allows us to execute more quickly, we can also 243skip the initial symbolic resolution. Pre-computation eats up 244disk space, and so must be done in moderation. 245<p> 246There are a couple of potential sources of trouble with these 247optimizations. First, vtable indices and byte offsets are subject to 248change if the VM is updated. Second, if a superclass is in a different 249DEX, and that other DEX is updated, we need to ensure that our optimized 250indices and offsets are updated as well. A similar but more subtle 251problem emerges when user-defined class loaders are employed: the class 252we actually call may not be the one we expected to call. 253<p>These problems are addressed with dependency lists and some limitations 254on what can be optimized. 255 256 257<h2>Dependencies and Limitations</h2> 258 259<p> 260The optimized DEX file includes a list of dependencies on other DEX files, 261plus the CRC-32 and modification date from the originating 262<code>classes.dex</code> zip file entry. The dependency list includes the 263full path to the <code>dalvik-cache</code> file, and the file's SHA-1 264signature. The timestamps of files on the device are unreliable and 265not used. The dependency area also includes the VM version number. 266<p> 267An optimized DEX is dependent upon all of the DEX files in the bootstrap 268class path. DEX files that are part of the bootstrap class path depend 269upon the DEX files that appeared earlier. To ensure that nothing outside 270the dependent DEX files is available, <code>dexopt</code> only loads the 271bootstrap classes. References to classes in other DEX files fail, which 272causes class loading and/or verification to fail, and classes with 273external dependencies are simply not optimized. 274<p> 275This means that splitting code out into many separate DEX files has a 276disadvantage: virtual method calls and instance field lookups between 277non-boot DEX files can't be optimized. Because verification is pass/fail 278with class granularity, no method in a class that has any reliance on 279classes in external DEX files can be optimized. This may be a bit 280heavy-handed, but it's the only way to guarantee that nothing breaks 281when individual pieces are updated. 282<p> 283Another negative consequence: any change to a bootstrap DEX will result 284in rejection of all optimized DEX files. This makes it hard to keep 285system updates small. 286<p> 287Despite our caution, there is still a possibility that a class in a DEX 288file loaded by a user-defined class loader could ask for a bootstrap class 289(say, String) and be given a different class with the same name. If a 290class in the DEX file being processed has the same name as a class in the 291bootstrap DEX files, the class will be flagged as ambiguous and references 292to it will not be resolved during verification / optimization. The class 293linking code in the VM does additional checks to plug another hole; 294see the verbose description in the VM sources for details (vm/oo/Class.c). 295<p> 296If one of the dependencies is updated, we need to re-verify and 297re-optimize the DEX file. If we can do a just-in-time <code>dexopt</code> 298invocation, this is easy. If we have to rely on the installer daemon, or 299the DEX was shipped only in ODEX, then the VM has to reject the DEX. 300<p> 301The output of <code>dexopt</code> is byte-swapped and struct-aligned 302for the host, and contains indices and offsets that are highly VM-specific 303(both version-wise and platform-wise). For this reason it's tricky to 304write a version of <code>dexopt</code> that runs on the desktop but 305generates output suitable for a particular device. The safest way to 306invoke it is on the target device, or on an emulator for that device. 307 308 309<h2>Generated DEX</h2> 310 311<p> 312Some languages and frameworks rely on the ability to generate bytecode 313and execute it. The rather heavy <code>dexopt</code> verification and 314optimization model doesn't work well with that. 315<p> 316We intend to support this in a future release, but the exact method is 317to be determined. We may allow individual classes to be added or whole 318DEX files; may allow Java bytecode or Dalvik bytecode in instructions; 319may perform the usual set of optimizations, or use a separate interpreter 320that performs on-first-use optimizations directly on the bytecode (which 321won't be mapped read-only, since it's locally defined). 322 323<address>Copyright © 2008 The Android Open Source Project</address> 324 325</body> 326</html> 327