1<html>
2<head>
3<title>Dalvik Bytecode Verifier Notes</title>
4</head>
5
6<body>
7<h1>Dalvik Bytecode Verifier Notes</h1>
8
9<p>
10The bytecode verifier in the Dalvik VM attempts to provide the same sorts
11of checks and guarantees that other popular virtual machines do.  We
12perform generally the same set of checks as are described in _The Java
13Virtual Machine Specification, Second Edition_, including the updates
14planned for the Third Edition.
15
16<p>
17Verification can be enabled for all classes, disabled for all, or enabled
18only for "remote" (non-bootstrap) classes.  It should be performed for any
19class that will be processed with the DEX optimizer, and in fact the
20default VM behavior is to only optimize verified classes.
21
22
23<h2>Why Verify?</h2>
24
25<p>
26The verification process adds additional time to the build and to
27the installation of new applications.  It's fairly quick for app-sized
28DEX files, but rather slow for the big "core" and "framework" files.
29Why do it all, when our system relies on UNIX processes for security?
30<p>
31<ol>
32    <li>Optimizations.  The interpreter can ignore a lot of potential
33    error cases because the verifier guarantees that they are impossible.
34    Also, we can optimize the DEX file more aggressively if we start
35    with a stronger set of assumptions about the bytecode.
36    <li>"Precise" GC.  The work peformed during verification has significant
37    overlap with the work required to compute register use maps for
38    type-precise GC.
39    <li>Intra-application security.  If an app wants to download bits
40    of interpreted code over the network and execute them, it can safely
41    do so using well-established security mechanisms.
42    <li>3rd party app failure analysis.  We have no way to control the
43    tools and post-processing utilities that external developers employ,
44    so when we get bug reports with a weird exception or native crash
45    it's very helpful to start with the assumption that the bytecode
46    is valid.
47</ol>
48<p>
49It's also a convenient framework to deal with certain situations, notably
50replacement of instructions that access volatile 64-bit fields with
51more rigorous versions that guarantee atomicity.
52
53
54<h2>Verifier Differences</h2>
55
56<p>
57There are a few checks that the Dalvik bytecode verifier does not perform,
58because they're not relevant.  For example:
59<ul>
60    <li>Type restrictions on constant pool references are not enforced,
61    because Dalvik does not have a pool of typed constants.  (Dalvik
62    uses a simple index into type-specific pools.)
63    <li>Verification of the operand stack size is not performed, because
64    Dalvik does not have an operand stack.
65    <li>Limitations on <code>jsr</code> and <code>ret</code> do not apply,
66    because Dalvik doesn't support subroutines.
67</ul>
68
69In some cases they are implemented differently, e.g.:
70<ul>
71    <li>In a conventional VM, backward branches and exceptions are
72    forbidden when a local variable holds an uninitialized reference.  The
73    restriction was changed to mark registers as invalid when they hold
74    references to the uninitialized result of a previous invocation of the
75    same <code>new-instance</code> instruction.
76    This solves the same problem -- trickery potentially allowing
77    uninitialized objects to slip past the verifier -- without unduly
78    limiting branches.
79</ul>
80
81There are also some new ones, such as:
82<ul>
83    <li>The <code>move-exception</code> instruction can only appear as
84    the first instruction in an exception handler.
85    <li>The <code>move-result*</code> instructions can only appear
86    immediately after an appropriate <code>invoke-*</code>
87    or <code>filled-new-array</code> instruction.
88</ul>
89
90<p>
91The VM is permitted but not required to enforce "structured locking"
92constraints, which are designed to ensure that, when a method returns, all
93monitors locked by the method have been unlocked an equal number of times.
94This is not currently implemented.
95
96<p>
97The Dalvik verifier is more restrictive than other VMs in one area:
98type safety on sub-32-bit integer widths.  These additional restrictions
99should make it impossible to, say, pass a value outside the range
100[-128, 127] to a function that takes a <code>byte</code> as an argument.
101
102
103<h2>Monitor Verification</h2>
104
105<p>
106If a method locks an object with a <code>synchronized</code> statement, the
107object must be unlocked before the method returns.  At the bytecode level,
108this means the method must execute a matching <code>monitor-exit</code>
109for every <code>monitor-enter</code> instruction, whether the function
110completes normally or abnormally.  The bytecode verifier optionally
111enforces this.
112
113<p>
114The verifier uses a fairly simple-minded model.  If you enter a monitor
115held in register N, you can exit the monitor using register N or any
116subsequently-made copies of register N.  The verifier does not attempt
117to identify previously-made copies, track loads and stores through
118fields, or recognize identical constant values (for example, the result
119values from two <code>const-class</code> instructions on the same class
120will be the same reference, but the verifier doesn't recognize this).
121
122<p>
123Further, you may only exit the monitor most recently entered.  "Hand
124over hand" locking techniques, e.g. "lock A; lock B; unlock A; unlock B",
125are not allowed.
126
127<p>
128This means that there are a number of situations in which the verifier
129will throw an exception on code that would execute correctly at run time.
130This is not expected to be an issue for compiler-generated bytecode.
131
132<p>
133For implementation convenience, the maximum nesting depth of
134<code>synchronized</code> statements has been set to 32.  This is not
135a limitation on the recursion count.  The only way to trip this would be
136to have a single method with more than 32 nested <code>synchronized</code>
137statements, something that is unlikely to occur.
138
139
140<h2>Verification Failures</h2>
141
142<p>
143The verifier may reject a class immediately, or it may defer throwing
144an exception until the code is actually used.  For example, if a class
145attempts to perform an illegal access on a field, the VM should throw
146an IllegalAccessError the first time the instruction is encountered.
147On the other hand, if a class contains an invalid bytecode, it should be
148rejected immediately with a VerifyError.
149
150<p>
151Immediate VerifyErrors are accompanied by detailed, if somewhat cryptic,
152information in the log file.  From this it's possible to determine the
153exact instruction that failed, and the reason for the failure.
154
155<p>
156It's a bit tricky to implement deferred verification errors in Dalvik.
157A few approaches were considered:
158
159<ol>
160<li>We could replace the invalid field access instruction with a special
161instruction that generates an illegal access error, and allow class
162verification to complete successfully.  This type of verification must
163be deferred to first class load, rather than be performed ahead of time
164during DEX optimization, because some failures will depend on the current
165execution environment (e.g. not all classes are available at dexopt time).
166At that point the bytecode instructions are mapped read-only during
167verification, so rewriting them isn't possible.
168</li>
169
170<li>We can perform the access checks when the field/method/class is
171resolved.  In a typical VM implementation we would do the check when the
172entry is resolved in the context of the current classfile, but our DEX
173files combine multiple classfiles together, merging the field/method/class
174resolution results into a single large table.  Once one class successfully
175resolves the field, every other class in the same DEX file would be able
176to access the field.  This is incorrect.
177</li>
178
179<li>Perform the access checks on every field/method/class access.
180This adds significant overhead.  This is mitigated somewhat by the DEX
181optimizer, which will convert many field/method/class accesses into a
182simpler form after performing the access check.  However, not all accesses
183can be optimized (e.g. accesses to classes unknown at dexopt time),
184and we don't currently have an optimized form of certain instructions
185(notably static field operations).
186</li>
187</ol>
188
189<p>
190In early versions of Dalvik (as found in Android 1.6 and earlier), the verifier
191simply regarded all problems as immediately fatal.  This generally worked,
192but in some cases the VM was rejecting classes because of bits of code
193that were never used.  The VerifyError itself was sometimes difficult to
194decipher, because it was thrown during verification rather than at the
195point where the problem was first noticed during execution.
196<p>
197The current version uses a variation of approach #1.  The dexopt
198command works the way it did before, leaving the code untouched and
199flagging fully-correct classes as "pre-verified".  When the VM loads a
200class that didn't pass pre-verification, the verifier is invoked.  If a
201"deferrable" problem is detected, a modifiable copy of the instructions
202in the problematic method is made.  In that copy, the troubled instruction
203is replaced with an "always throw" opcode, and verification continues.
204
205<p>
206In the example used earlier, an attempt to read from an inaccessible
207field would result in the "field get" instruction being replaced by
208"always throw IllegalAccessError on field X".  Creating copies of method
209bodies requires additional heap space, but since this affects very few
210methods overall the memory impact should be minor.
211
212<p>
213<address>Copyright &copy; 2008 The Android Open Source Project</address>
214
215</body>
216</html>
217