Dalvik "mterp" README

NOTE: Find rebuilding instructions at the bottom of this file.


==== Overview ====

This is the source code for the Dalvik interpreter.  The core of the
original version was implemented as a single C function, but to improve
performance we rewrote it in assembly.  To make this and future assembly
ports easier and less error-prone, we used a modular approach that allows
development of platform-specific code one opcode at a time.

The original all-in-one-function C version still exists as the "portable"
interpreter, and is generated using the same sources and tools that
generate the platform-specific versions.  One form of the portable
interpreter includes support for profiling and debugging features, and
is included even if we have a platform-optimized implementation.

Every configuration has a "config-*" file that controls how the sources
are generated.  The sources are written into the "out" directory, where
they are picked up by the Android build system.

The best way to become familiar with the interpreter is to look at the
generated files in the "out" directory, such as out/InterpC-portstd.c,
rather than at the individual component pieces in (say) the "armv5te"
directory.


==== Platform-specific source generation ====

The architecture-specific config files determine what goes into two
generated output files (InterpC-<arch>.c, InterpAsm-<arch>.S).  The goal is
to make it easy to swap C and assembly sources during initial development
and testing, and to provide a way to use architecture-specific versions of
some operations (e.g. making use of PLD instructions on ARMv6 or avoiding
CLZ on ARMv4T).

Two basic assumptions are made about the operation of the interpreter:

 - The assembly version uses fixed-size areas for each instruction
   (e.g. 64 bytes).  "Overflow" code is tacked on to the end.
 - When a C implementation is desired, the assembly version packs all
   local state into a "glue" struct, and passes that into the C function.
   Updates to the state are pulled out of the "glue" on return.
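
Because every handler occupies a slot of the same size, a handler can be
located by arithmetic on the opcode number instead of through a jump table.
The Python 3 sketch below models only that arithmetic; the 64-byte size and
the base address are illustrative values, and the function names are
hypothetical.

```python
HANDLER_SIZE_BYTES = 64  # illustrative; must be a power of 2 for the shift
HANDLER_SIZE_BITS = HANDLER_SIZE_BYTES.bit_length() - 1  # log2(64) == 6


def handler_address(base: int, opnum: int) -> int:
    """Return the start address of the handler for opcode `opnum`.

    Handlers are laid out back to back in opcode order, one fixed-size
    slot each, so the address is base + opnum * size, computed here with
    a shift as assembly dispatch code would do it.
    """
    if not 0 <= opnum < 256:
        raise ValueError("opcode out of range: %d" % opnum)
    return base + (opnum << HANDLER_SIZE_BITS)


def overflow_start(base: int) -> int:
    """Address just past the 256 fixed-size slots, where "overflow" code goes."""
    return base + (256 << HANDLER_SIZE_BITS)


base = 0x1000  # hypothetical start of the handler block
print(hex(handler_address(base, 0)))  # 0x1000
print(hex(handler_address(base, 1)))  # 0x1040
print(hex(overflow_start(base)))      # 0x5000
```

Using a shift rather than a multiply is the reason the slot size is a power
of 2 on most platforms.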

The "arch" value should indicate an architecture family with common
programming characteristics, so "armv5te" would work for all ARMv5TE CPUs,
but might not be backward- or forward-compatible.  (We *might* want to
specify the ABI model as well, e.g. "armv5te-eabi", but currently that adds
verbosity without value.)


==== Config file format ====

The config files are parsed from top to bottom.  Each line in the file
may be blank, hold a comment (line starts with '#'), or be a command.

The commands are:

  handler-size <bytes>

    Specify the size of each instruction handler's assembly region, in
    bytes.  On most platforms this will need to be a power of 2.

  import <filename>

    The specified file is included immediately, in its entirety.  No
    substitutions are performed.  ".c" and ".h" files are copied to the
    C output; ".S" files are copied to the asm output.

  asm-stub <filename>

    The named file will be included whenever an assembly "stub" is needed.
    Text substitution is performed on the opcode name.

  op-start <directory>

    Indicates the start of the opcode list.  Must precede any "op"
    commands.  The specified directory is the default location to pull
    instruction files from.

  op <opcode> <directory>

    Can only appear after "op-start" and before "op-end".  Overrides the
    default source file location of the specified opcode.  The opcode
    definition will come from a file in the specified directory, e.g.
    "op OP_NOP armv5te" will load from "armv5te/OP_NOP.S".  A substitution
    dictionary will be applied (see below).

  op-end

    Indicates the end of the opcode list.  All 256 opcodes are emitted
    when this is seen, followed by any code that didn't fit inside the
    fixed-size instruction handler space.
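
To make the ordering rules above concrete, here is a minimal Python 3 sketch
of a parser for this command set.  It is not the real generation tool: it
only records the commands and rejects misplaced ones.  The names
parse_config and ConfigError, and the file names in the example, are made
up for illustration.

```python
import shlex

# Number of arguments each command takes, per the command list above.
ARITY = {"handler-size": 1, "import": 1, "asm-stub": 1,
         "op-start": 1, "op": 2, "op-end": 0}


class ConfigError(Exception):
    """Raised for unknown commands or commands in the wrong place."""


def parse_config(lines):
    """Record the commands from mterp-style config lines (no files are read)."""
    cfg = {"handler_size": None, "imports": [], "asm_stub": None,
           "default_dir": None, "op_overrides": {}}
    state = "before"  # "before" op-start, "in" the op list, "after" op-end
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        cmd, *args = shlex.split(line)
        if cmd not in ARITY:
            raise ConfigError("line %d: unknown command %r" % (lineno, cmd))
        if len(args) != ARITY[cmd]:
            raise ConfigError("line %d: %s takes %d argument(s)"
                              % (lineno, cmd, ARITY[cmd]))
        if cmd == "handler-size":
            cfg["handler_size"] = int(args[0])
        elif cmd == "import":
            cfg["imports"].append(args[0])
        elif cmd == "asm-stub":
            cfg["asm_stub"] = args[0]
        elif cmd == "op-start":
            if state != "before":
                raise ConfigError("line %d: op-start seen twice" % lineno)
            cfg["default_dir"], state = args[0], "in"
        elif cmd == "op":
            if state != "in":
                raise ConfigError("line %d: op outside op-start/op-end" % lineno)
            cfg["op_overrides"][args[0]] = args[1]
        else:  # op-end
            if state != "in":
                raise ConfigError("line %d: op-end without op-start" % lineno)
            state = "after"
    if state != "after":
        raise ConfigError("op list not terminated by op-end")
    return cfg


example = """
# hypothetical config for a new port
handler-size 64
import c/header.c
asm-stub armv5te/stub.S
op-start c
    op OP_NOP armv5te
op-end
"""
cfg = parse_config(example.splitlines())
print(cfg["default_dir"], cfg["op_overrides"])  # c {'OP_NOP': 'armv5te'}
```

Overrides are kept in a dict keyed by opcode, so the order of "op" lines
has no effect on the result.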


The order of "op" directives is not significant; the generation tool will
extract ordering info from the VM sources.

Typically, the "op-start" directive names the directory in which most
opcodes currently exist.  For a new port you would start with "c", and
add architecture-specific "op" entries as you write instructions.  Once
the port is complete, "op-start" will name the target architecture, and
you insert "c" ops for any opcodes that should fall back to the C
implementation.

For the <directory> specified in the "op" command, the "c" directory
is special in two ways: (1) the sources are assumed to be C code, and
will be inserted into the generated C file; (2) when a C implementation
is emitted, a "glue stub" is emitted in the assembly source file.
(The generator script always emits 256 assembly instruction handlers,
unless "asm-stub" was left blank, in which case it only emits some labels.)
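
The per-opcode decision described above can be sketched in Python 3 as
follows.  This assumes that a "c" opcode's source lives at "c/<opcode>.c"
and that the asm-stub text is expanded with the same "$name" substitution
used for fragments; plan_emission is a hypothetical name, and the real tool
writes output files rather than returning lists.

```python
from string import Template


def plan_emission(opcodes, default_dir, overrides, stub_text):
    """Decide, per opcode, what goes into the C and assembly outputs.

    `opcodes` is assumed to be in VM opcode order already (the real tool
    derives that order from the VM sources).  Returns (c_files, asm_items),
    where each asm item is either a fragment path or an expanded glue stub.
    """
    c_files, asm_items = [], []
    for opnum, opcode in enumerate(opcodes):
        directory = overrides.get(opcode, default_dir)
        if directory == "c":
            # C implementation goes to the C output; the assembly slot
            # gets a glue stub with the opcode name substituted in.
            c_files.append("c/%s.c" % opcode)
            asm_items.append(
                Template(stub_text).substitute(opcode=opcode, opnum=opnum))
        else:
            asm_items.append("%s/%s.S" % (directory, opcode))
    return c_files, asm_items


# The first three Dalvik opcodes; OP_MOVE falls back to C in this example.
opcodes = ["OP_NOP", "OP_MOVE", "OP_MOVE_FROM16"]
c_files, asm_items = plan_emission(
    opcodes, "armv5te", {"OP_MOVE": "c"}, "    @ glue stub for $opcode\n")
print(c_files)    # ['c/OP_MOVE.c']
print(asm_items)  # ['armv5te/OP_NOP.S', '    @ glue stub for OP_MOVE\n', 'armv5te/OP_MOVE_FROM16.S']
```

Every opcode contributes exactly one assembly item, which is why the
assembly output always has all 256 slots even when many opcodes are in C.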


==== Instruction file format ====

The assembly instruction files are simply fragments of assembly sources.
The starting label will be provided by the generation tool, as will
declarations for the segment type and alignment.  The expected target
assembler is GNU "as", but others may work (possibly after adjusting
some of the pseudo-ops emitted by the generation tool).

The C files do a bunch of fancy things with macros in an attempt to share
code with the portable interpreter.  (This is expected to be reduced in
the future.)

A substitution dictionary is applied to all opcode fragments as they are
appended to the output.  Substitutions can look like "$value" or "${value}".
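
Python's standard string.Template class accepts exactly these two
spellings, so it is a convenient way to try out substitutions on a
fragment.  This is an illustration only (whether the generator itself uses
string.Template is an assumption), and the ARM fragment is hypothetical.
The braces matter when a name is followed by characters that could
continue it: "$opcode_start" would look up a key named "opcode_start".

```python
from string import Template

# Hypothetical fragment using both the "${name}" and "$name" forms.
fragment = """\
${opcode}_start:
    @ handler for $opcode (opcode number $opnum)
    mov     $result, #0
"""

values = {"opcode": "OP_NOP", "opnum": 0, "result": "r1"}
print(Template(fragment).substitute(values), end="")

# Without braces the key is "opcode_start", which is not in the dictionary;
# safe_substitute leaves it untouched, while substitute would raise KeyError.
print(Template("$opcode_start").safe_substitute(values))  # $opcode_start
```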

The dictionary always includes:

  $opcode - opcode name, e.g. "OP_NOP"
  $opnum - opcode number, e.g. 0 for OP_NOP
  $handler_size_bytes - max size of an instruction handler, in bytes
  $handler_size_bits - max size of an instruction handler, as log2 of
    $handler_size_bytes (e.g. 6 for 64-byte handlers)
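
The last two entries are related: $handler_size_bits is the base-2
logarithm of $handler_size_bytes, which is only a whole number when the
handler size is a power of 2.  A Python 3 sketch of building these base
entries, using a hypothetical base_dictionary() that rejects other sizes:

```python
def base_dictionary(opcode, opnum, handler_size_bytes):
    """Build the substitution values every fragment can rely on.

    Sketch only: raises ValueError unless handler_size_bytes is a power
    of 2, since $handler_size_bits is only meaningful in that case.
    """
    size = handler_size_bytes
    if size <= 0 or size & (size - 1):
        raise ValueError("handler size %d is not a power of 2" % size)
    return {
        "opcode": opcode,
        "opnum": opnum,
        "handler_size_bytes": size,
        "handler_size_bits": size.bit_length() - 1,  # log2(size)
    }


print(base_dictionary("OP_NOP", 0, 64))
# {'opcode': 'OP_NOP', 'opnum': 0, 'handler_size_bytes': 64, 'handler_size_bits': 6}
```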

Both C and assembly sources will be passed through the C pre-processor,
so you can take advantage of C-style comments and preprocessor directives
like "#define".

The following generator operations are available:

  %include "filename" [subst-dict]

    Includes the named file, which should be a path like
    "armv5te/OP_NOP.S".  You can specify values for the substitution
    dictionary, using standard Python syntax.  For example, this:
      %include "armv5te/unop.S" {"result":"r1"}
    would insert "armv5te/unop.S" at the current file position, replacing
    occurrences of "$result" with "r1".

  %default <subst-dict>

    Specify default substitution dictionary values, using standard Python
    syntax.  Useful if you want to have a "base" version and variants.

  %break

    Identifies the split between the main portion of the instruction
    handler (which must fit in "handler-size" bytes) and the "sister"
    code, which is appended to the end of the instruction handler block.

  %verify "message"

    Leave a note to yourself about what needs to be tested.  (This may
    turn into something more interesting someday; for now, it just gets
    stripped out before the output is generated.)
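
The directives above can be modeled with a short recursive expander.  This
Python 3 sketch is hypothetical throughout: fragment files come from an
in-memory dict, and two precedence rules are assumptions rather than
documented behavior (values passed by the caller or by %include override a
fragment's %default values, and an included fragment's sister code is
appended to the including fragment's sister code).  It uses string.Template
for the "$name" substitution and ast.literal_eval for the Python-syntax
dictionaries.

```python
import ast
import re
from string import Template

INCLUDE_RE = re.compile(r'%include\s+"([^"]+)"\s*(.*)$')


def expand(name, files, values):
    """Expand fragment `name`; return (main, sister) text.

    `files` maps file names to file text, standing in for the source tree.
    """
    lines = files[name].splitlines(keepends=True)

    # Pass 1: %default values apply to the whole fragment; values passed
    # in by the caller override them (assumed precedence).
    env = {}
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("%default"):
            env.update(ast.literal_eval(stripped[len("%default"):].strip()))
    env.update(values)

    # Pass 2: emit text, switching to the sister section at %break.
    main, sister = [], []
    out = main
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(("%default", "%verify")):
            continue  # these directives produce no output
        if stripped == "%break":
            out = sister  # a second %break changes nothing
            continue
        m = INCLUDE_RE.match(stripped)
        if m:
            child = dict(env)
            if m.group(2):
                child.update(ast.literal_eval(m.group(2)))
            inc_main, inc_sister = expand(m.group(1), files, child)
            out.append(inc_main)
            sister.append(inc_sister)
            continue
        out.append(Template(line).substitute(env))
    return "".join(main), "".join(sister)


files = {
    "armv5te/OP_NEG_INT.S":
        '%verify "negative numbers"\n'
        '%include "armv5te/unop.S" {"instr":"rsb     r0, r0, #0"}\n',
    "armv5te/unop.S":
        '%default {"result":"r0"}\n'
        '    $instr\n'
        '    @ result in $result\n'
        '%break\n'
        '${opcode}_extra:\n'
        '    bx      lr\n',
}
main, sister = expand("armv5te/OP_NEG_INT.S", files, {"opcode": "OP_NEG_INT"})
print(main, end="")
print(sister, end="")
```

Here main holds the two substituted lines from unop.S ("rsb     r0, r0, #0"
and "@ result in r0"), and sister holds "OP_NEG_INT_extra:" followed by
"bx      lr"; the %verify line does not appear in either.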

The generation tool does *not* print a warning if your instructions
exceed "handler-size", but the VM will abort on startup if it detects an
oversized handler.  On architectures with fixed-width instructions this
is easy to work with; on others you will need to count bytes.
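
For fixed-width ARM (not Thumb) code, a rough byte count can be had by
counting instruction lines in the portion before %break and multiplying by
4.  The Python 3 sketch below gives a lower-bound estimate only: its
heuristics for skipping labels, directives and comments are assumptions,
%include lines are not followed, and macros or pseudo-instructions that
expand to several instructions are counted as one.  The authoritative check
remains the VM's startup test.

```python
# Line prefixes treated as non-instructions: generator directives,
# assembler directives, preprocessor lines and C-style comment lines.
SKIP_PREFIXES = ("%", ".", "#", "/*", "*", "//")


def estimate_arm_bytes(fragment, insn_bytes=4):
    """Estimate the size of the main (pre-%break) part of an ARM fragment.

    Counts lines that look like instructions and multiplies by the fixed
    instruction width.  The result is a lower bound.
    """
    count = 0
    for line in fragment.splitlines():
        code = line.split("@", 1)[0].strip()  # "@" starts a comment in ARM GNU as
        if code == "%break":
            break  # only the main portion has to fit in handler-size
        if not code or code.startswith(SKIP_PREFIXES) or code.endswith(":"):
            continue  # blank, directive, preprocessor, comment, or label line
        count += 1
    return count * insn_bytes


handler_size = 64  # illustrative value of "handler-size"
fragment = """\
    b       ${opcode}_start
%break
${opcode}_start:
    mov     r0, #0
"""
size = estimate_arm_bytes(fragment)
print(size, "bytes;", "fits" if size <= handler_size else "too big")  # 4 bytes; fits
```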


==== Using C constants from assembly sources ====

The file "common/asm-constants.h" has some definitions for constant
values, structure sizes, and struct member offsets.  The format is fairly
restricted, as simple macros are used to massage it for use with both C
(where it is verified) and assembly (where the definitions are used).

If a constant in the file becomes out of sync, the VM will log an error
message and abort during startup.
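
The "one table, two uses" idea can be modeled in Python 3, with ctypes
standing in for the C compiler's struct layout: each table entry is checked
against the actual field offset, and the same entries are printed as
"#define" lines for assembly.  The Glue struct, its fields and the
offGlue_* names are hypothetical and do not reflect the real contents of
common/asm-constants.h.

```python
import ctypes


class Glue(ctypes.Structure):
    """Hypothetical stand-in for an interpreter state ("glue") struct."""
    _fields_ = [
        ("pc", ctypes.c_uint32),
        ("fp", ctypes.c_uint32),
        ("retval", ctypes.c_uint64),
    ]


# One table, two uses (names and values are made up for illustration).
OFFSETS = [
    ("offGlue_pc", Glue, "pc", 0),
    ("offGlue_fp", Glue, "fp", 4),
    ("offGlue_retval", Glue, "retval", 8),
]


def verify(table):
    """Return a list of mismatches between the table and the real layout."""
    errors = []
    for name, struct, field, expected in table:
        actual = getattr(struct, field).offset  # ctypes field descriptor
        if actual != expected:
            errors.append("%s: expected %d, actual %d" % (name, expected, actual))
    return errors


def asm_defines(table):
    """Emit the constants as preprocessor lines usable from .S files."""
    return "\n".join("#define %s %d" % (name, off) for name, _, _, off in table)


problems = verify(OFFSETS)
if problems:
    # Mirrors the documented behavior: report the error and refuse to start.
    raise SystemExit("asm constants out of sync: " + "; ".join(problems))
print(asm_defines(OFFSETS))
```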


==== Development tips ====

If you need to debug the initial piece of an opcode handler, and your
debug code expands it beyond the handler size limit, you can insert a
generic header at the top:

    b       ${opcode}_start
%break
${opcode}_start:

If you already have a %break, it's okay to leave it in place -- the second
%break is ignored.


==== Rebuilding ====

If you change any of the source file fragments, you need to rebuild the
combined source files in the "out" directory.  Make sure the files in
"out" are editable, then:

    $ cd mterp
    $ ./rebuild.sh

As of this writing, this requires Python 2.5.  You may see inscrutable
error messages or just general failure if you have a different version
of Python installed.

The ultimate goal is to have the build system generate the necessary
output files without requiring this separate step, but we're not yet
ready to require Python in the build.
