# Writing a schema

The syntax of the schema language (aka IDL, Interface Definition
Language) should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` and `Pickup` are not defined as part of this example.)

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to 0 /
NULL).

Each field is optional: it does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forward
and backward compatibility. Note that:

-   You can add new fields in the schema ONLY at the end of a table
    definition. Older data will still read correctly, and give you the
    default value for the new field. Older code will simply ignore the
    new field. If you want the flexibility to use any order for fields in
    your schema, you can manually assign ids (much like Protocol Buffers),
    see the `id` attribute below.

-   You cannot delete fields you don't use anymore from the schema,
    but you can simply stop writing them into your data for almost the
    same effect. Additionally you can mark them as `deprecated` as in the
    example above, which will prevent the generation of accessors in the
    generated C++, as a way to enforce the field not being used any more
    (careful: this may break code!).

-   You may change field names and table names, if you're ok with your
    code breaking until you've renamed them there too.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for
simple objects where you are very sure no changes will ever be made
(as is quite clear in the example `Vec3`). Structs use less memory than
tables and are even faster to access (they are always stored in-line in their
parent object, and use no virtual table).

### Types

Built-in scalar types are:

-   8 bit: `byte ubyte bool`

-   16 bit: `short ushort`

-   32 bit: `int uint float`

-   64 bit: `long ulong double`

Built-in non-scalar types are:

-   Vector of any other type (denoted with `[type]`). Nesting vectors
    is not supported, instead you can wrap the inner vector in a table.

-   `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
    or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

-   References to other tables or structs, enums or unions (see
    below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.

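To make the same-size exception concrete, here is a minimal Python sketch. It is not FlatBuffers API: it only uses the standard `struct` module to reinterpret the four little-endian bytes of a stored `uint` as an `int`, and the helper name is made up for illustration. Values with the high bit clear read back unchanged, while a value with the high bit set turns negative.

```python
import struct

def reinterpret_uint_as_int(value: int) -> int:
    """Reinterpret the 4 little-endian bytes of a uint32 as an int32."""
    raw = struct.pack("<I", value)      # bytes as a uint field would store them
    return struct.unpack("<i", raw)[0]  # same bytes read through an int field

small = reinterpret_uint_as_int(150)          # high bit clear: value preserved
large = reinterpret_uint_as_int(0x80000000)   # high bit set: value changes sign
print(small, large)
```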
### (Default) Values

Values are a sequence of digits, optionally followed by a `.` and more digits
for float constants, and optionally prefixed by a `-`. Non-scalar defaults are
currently not supported (always NULL).

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data but are generated in code, so when you change the default,
code generated from the new schema will read a different value than code
generated from an older version of the schema. There are situations however
where this may be desirable, especially if you can ensure a simultaneous
rebuild of all code.

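The effect of changing a default can be sketched with a toy model in Python. This is an illustration only, not the real wire format: a dictionary stands in for the serialized table, and `write_monster` and `read_hp` are hypothetical helpers that mimic a writer omitting default values and a reader falling back to the default compiled into its generated code.

```python
def write_monster(hp: int, default_hp: int) -> dict:
    """Hypothetical writer: omit hp when it equals the schema default."""
    return {} if hp == default_hp else {"hp": hp}

def read_hp(data: dict, default_hp: int) -> int:
    """Hypothetical reader: use the default baked into the generated code."""
    return data.get("hp", default_hp)

# Data written while the schema said `hp:short = 100`.
stored = write_monster(hp=100, default_hp=100)

old_reader = read_hp(stored, default_hp=100)  # code generated from old schema
new_reader = read_hp(stored, default_hp=200)  # code generated after the change
print(stored, old_reader, new_reader)
```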
### Enums

Define a sequence of named constants, each either with an explicitly given
value, or with a value one higher than the previous constant. The default
first value is `0`. As you can see in the enum declaration, you specify the
underlying integral type of the enum with `:` (in this case `byte`), which
then determines the type of any fields declared with this enum type.

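The numbering rule can be sketched in a few lines of Python. This is a simplified model of the rule as described here, not the compiler's actual implementation, and `assign_enum_values` is a hypothetical helper.

```python
def assign_enum_values(members):
    """members: list of (name, explicit_value_or_None) in declaration order."""
    values = {}
    next_value = 0  # the default first value
    for name, explicit in members:
        if explicit is not None:
            next_value = explicit  # an explicit value restarts the count
        values[name] = next_value
        next_value += 1
    return values

# Mirrors `enum Color : byte { Red = 1, Green, Blue }` from the example schema.
color = assign_enum_values([("Red", 1), ("Green", None), ("Blue", None)])
print(color)
```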
### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field which can hold a reference to any of those types, and
additionally a hidden field with the suffix `_type` is generated that
holds the corresponding enum value, allowing you to know which type to
cast to at runtime.

### Namespaces

These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include`
automatically ensures each file is parsed just once, even when referred to
more than once.

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (you still need to generate those separately).

### Root type

This declares what you consider to be the root table (or struct) of the
serialized data. This is particularly important for parsing JSON data,
which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly. But if you
want to use a FlatBuffer as a file format, it would be convenient
to have a "magic number" in there, like most file formats
have, so you can do a sanity check to see if you're reading the
kind of file you're expecting.

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

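Since the identifier sits at fixed offsets, the check can also be done by hand. The following Python sketch assumes only what is stated above (4 characters at byte offsets 4-7); `buffer_has_identifier` is a hypothetical helper, not a generated function, and the sample bytes are a hand-made stand-in rather than a real FlatBuffer.

```python
def buffer_has_identifier(buf: bytes, identifier: str) -> bool:
    """Check whether bytes 4-7 (inclusive) of buf equal the identifier."""
    ident = identifier.encode("ascii")
    if len(ident) != 4:
        raise ValueError("identifier must be exactly 4 characters long")
    return len(buf) >= 8 and buf[4:8] == ident

# Stand-in buffer: 4 leading bytes, the identifier, then arbitrary payload.
fake_buffer = b"\x0c\x00\x00\x00" + b"MYFI" + b"\x00" * 8

print(buffer_has_identifier(fake_buffer, "MYFI"))
print(buffer_has_identifier(fake_buffer, "OTHR"))
```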
### Comments & documentation

Comments may be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler,
others are simply ignored (like `priority` in the example above), but are
available to query if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

-   `id: n` (on a table field): manually set the field identifier to `n`.
    If you use this attribute, you must use it on ALL fields of this table,
    and the numbers must be a contiguous range from 0 onwards.
    Additionally, since a union type effectively adds two fields, its
    id must be that of the second field (the first field is the type
    field and not explicitly declared in the schema).
    For example, if the last field before the union field had id 6,
    the union field should have id 8, and the union's type field will
    implicitly be 7.
    IDs allow the fields to be placed in any order in the schema.
    When a new field is added to the schema it must use the next available ID.
-   `deprecated` (on a field): do not generate accessors for this field
    anymore, code should stop using this data.
-   `required` (on a non-scalar table field): this field must always be set.
    By default, all fields are optional, i.e. may be left out. This is
    desirable, as it helps with forwards/backwards compatibility, and
    flexibility of data structures. It is also a burden on the reading code,
    since for non-scalar fields it requires you to check against NULL and
    take appropriate action. By specifying this attribute, you force code that
    constructs FlatBuffers to ensure this field is initialized, so the reading
    code may access it directly, without checking for NULL. If the constructing
    code does not initialize this field, it will trigger an assert, and
    the verifier will also fail on buffers that have missing required fields.
-   `original_order` (on a table): since elements in a table do not need
    to be stored in any particular order, they are often optimized for
    space by sorting them by size. This attribute stops that from happening.
-   `force_align: size` (on a struct): force the alignment of this struct
    to be something higher than what it is naturally aligned to. Causes
    these structs to be aligned to that amount inside a buffer, IF that
    buffer is allocated with that alignment (which is not necessarily
    the case for buffers accessed directly inside a `FlatBufferBuilder`).
-   `bit_flags` (on an enum): the values of this enum indicate bits,
    meaning that any value N specified in the schema will end up
    representing 1<<N, or if you don't specify values at all, you'll get
    the sequence 1, 2, 4, 8, ...
-   `nested_flatbuffer: table_name` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains flatbuffer data, for which the
    root type is given by `table_name`. The generated code will then produce
    a convenient accessor for the nested FlatBuffer.

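The `id` rules above (every id used once, a contiguous range from 0, and a union field taking the id of its second slot) can be checked mechanically. The Python sketch below is a simplified model of those rules, not the checks `flatc` actually performs; `check_field_ids` and the field names are hypothetical.

```python
def check_field_ids(fields):
    """fields: list of (name, id, is_union). Returns names ordered by slot."""
    used = {}
    for name, field_id, is_union in fields:
        slots = [(field_id, name)]
        if is_union:
            # The implicit type field takes the id just before the union's id.
            slots.append((field_id - 1, name + "_type"))
        for slot, label in slots:
            if slot < 0 or slot in used:
                raise ValueError(f"id {slot} for {label} is invalid or reused")
            used[slot] = label
    if sorted(used) != list(range(len(used))):
        raise ValueError("ids must form a contiguous range starting at 0")
    return [used[slot] for slot in range(len(used))]

# Fields may be declared in any order; the union `test` occupies ids 2 and 3.
layout = check_field_ids([("hp", 1, False), ("pos", 0, False), ("test", 3, True)])
print(layout)
```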
## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

-   It accepts field names with and without quotes, like many JSON parsers
    already do. It outputs them without quotes as well, though it can be made
    to output quotes using the `strict_json` flag.
-   If a field has an enum type, the parser will recognize symbolic enum
    values (with or without quotes) instead of numbers, e.g.
    `field: EnumVal`. If a field is of integral type, you can still use
    symbolic names, but values need to be prefixed with their type and
    need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
    representing flags, you may place multiple inside a string
    separated by spaces to OR them, e.g.
    `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.

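How space-separated flag names combine can be sketched as follows. This is an illustration of the OR rule only, not the parser's implementation: it assumes an enum declared with `bit_flags` (so a schema value N stands for `1<<N`), and `parse_flags`, the enum name `Flags`, and its members are all hypothetical.

```python
def parse_flags(text: str, enum_name: str, bit_positions: dict) -> int:
    """OR together the flags named in a space-separated string."""
    prefix = enum_name + "."
    result = 0
    for token in text.split():
        # Accept both "EnumVal" and "Enum.EnumVal" spellings.
        name = token[len(prefix):] if token.startswith(prefix) else token
        result |= 1 << bit_positions[name]  # bit_flags: value N means 1<<N
    return result

# Hypothetical `enum Flags : ubyte (bit_flags) { A, B, C }` gives positions 0, 1, 2.
positions = {"A": 0, "B": 1, "C": 2}
print(parse_flags("A C", "Flags", positions))
print(parse_flags("Flags.A Flags.B", "Flags", positions))
```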
When parsing JSON, it recognizes the following escape codes in strings:

-   `\n` - linefeed.
-   `\t` - tab.
-   `\r` - carriage return.
-   `\b` - backspace.
-   `\f` - form feed.
-   `\"` - double quote.
-   `\\` - backslash.
-   `\/` - forward slash.
-   `\uXXXX` - 16-bit unicode code point, converted to the equivalent UTF-8
    representation.
-   `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is
    not in the JSON spec (see http://json.org/), but is needed to be able to
    encode arbitrary binary in strings to text and back without losing
    information (e.g. the byte 0xFF can't be represented in standard JSON).

It also generates these escape codes back again when generating JSON from a
binary representation.

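The difference between `\uXXXX` and `\xXX` is easiest to see on the resulting bytes. The decoder below is a minimal Python sketch of the escape list above, not the parser's own code: `decode_escapes` is a hypothetical helper, surrogate pairs are not handled, and malformed input simply raises `ValueError`. Note that `\u00ff` becomes the two-byte UTF-8 sequence, while `\xFF` yields the single raw byte.

```python
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f",
                  '"': '"', "\\": "\\", "/": "/"}

def decode_escapes(text: str) -> bytes:
    """Decode the listed escape codes into the bytes stored in the buffer."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("utf-8")
            i += 1
            continue
        code = text[i + 1:i + 2]  # empty if the backslash is the last character
        if code in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[code].encode("utf-8")
            i += 2
        elif code == "u":
            # 16-bit code point, stored as its UTF-8 representation.
            out += chr(int(text[i + 2:i + 6], 16)).encode("utf-8")
            i += 6
        elif code == "x":
            # One raw byte, which need not be valid UTF-8 on its own.
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            raise ValueError(f"unsupported escape at position {i}")
    return bytes(out)

print(decode_escapes(r"\u00ff"), decode_escapes(r"\xFF"))
```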
## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and on
earlier declarations not being removed, but marked deprecated when needed. We
think this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic however is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B has already committed a new field
too, and just auto-merges the schema, the binary files are now invalid with
respect to the merged schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.

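The merge problem can be illustrated with a toy model in Python. It assumes that, without explicit ids, a field's slot is simply its position in the table declaration (ignoring unions, which take two slots); `slot_ids` and the field names `shield` and `armor` are hypothetical.

```python
def slot_ids(field_names):
    """Without explicit ids, a field's slot follows declaration order."""
    return {name: slot for slot, name in enumerate(field_names)}

base = ["pos", "hp"]
user_a = base + ["shield"]           # A's local schema, used to write binaries
merged = base + ["armor", "shield"]  # B committed `armor` first, then auto-merge

a_slots = slot_ids(user_a)
merged_slots = slot_ids(merged)

# A's binaries stored `shield` in slot 2, which the merged schema calls `armor`.
print(a_slots["shield"], merged_slots["armor"], merged_slots["shield"])
```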
327