Schemas.md revision dddd0865cb161a081afbdfa11919622a49f4141a
Writing a schema    {#flatbuffers_guide_writing_schema}
================

The syntax of the schema language (aka IDL, [Interface Definition Language][])
should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` and `Pickup` are not defined as part of this example.)

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to `0` /
`NULL`).

Each field is optional: it does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forward
and backwards compatibility. Note that:

-   You can add new fields in the schema ONLY at the end of a table
    definition. Older data will still
    read correctly, and give you the default value when read. Older code
    will simply ignore the new field.
    If you want the flexibility to use any order for fields in your
    schema, you can manually assign ids (much like Protocol Buffers);
    see the `id` attribute below.

-   You cannot delete fields you don't use anymore from the schema,
    but you can simply
    stop writing them into your data for almost the same effect.
    Additionally, you can mark them as `deprecated` as in the example
    above, which will prevent the generation of accessors in the
    generated C++, as a way to enforce the field not being used any more
    (careful: this may break code!).

-   You may change field names and table names, if you're ok with your
    code breaking until you've renamed them there too.

See "Schema evolution examples" below for more on this
topic.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for
simple objects where you are very sure no changes will ever be made
(as is quite clear in the `Vec3` example). Structs use less memory than
tables and are even faster to access (they are always stored in-line in their
parent object, and use no virtual table).

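As a rough illustration of why a struct such as `Vec3` is compact: with no optional fields and no virtual table, its in-line form is just its scalars laid out back to back. The sketch below mimics that layout with Python's standard `struct` module (three little-endian 32-bit floats). It is not FlatBuffers API, and `pack_vec3` / `unpack_vec3` are hypothetical helper names.

```python
import struct

# Three little-endian 32-bit floats, mirroring
# `struct Vec3 { x:float; y:float; z:float; }` from the example schema.
VEC3_LAYOUT = "<fff"

def pack_vec3(x, y, z):
    """Lay the three scalars out back to back, with no per-field bookkeeping."""
    return struct.pack(VEC3_LAYOUT, x, y, z)

def unpack_vec3(data):
    """Read the three scalars back from their fixed positions."""
    return struct.unpack(VEC3_LAYOUT, data)

raw = pack_vec3(1.0, 2.0, 3.0)
print(len(raw))          # 12 bytes: 3 fields x 4 bytes each
print(unpack_vec3(raw))  # (1.0, 2.0, 3.0)
```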
### Types

Built-in scalar types are:

-   8 bit: `byte`, `ubyte`, `bool`

-   16 bit: `short`, `ushort`

-   32 bit: `int`, `uint`, `float`

-   64 bit: `long`, `ulong`, `double`

Built-in non-scalar types:

-   Vector of any other type (denoted with `[type]`). Nesting vectors
    is not supported; instead, you can wrap the inner vector in a table.

-   `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
    or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

-   References to other tables or structs, enums or unions (see
    below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.

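The `uint` to `int` case can be checked by hand. The sketch below reinterprets the same four bytes both ways using Python's standard `struct` module, as a stand-in for what a `reinterpret_cast` would do in C++. `reinterpret_uint_as_int` is a hypothetical helper, not part of FlatBuffers.

```python
import struct

def reinterpret_uint_as_int(value):
    """Reinterpret the 4 bytes of a 32-bit unsigned integer as a signed one."""
    return struct.unpack("<i", struct.pack("<I", value))[0]

print(reinterpret_uint_as_int(100))         # 100: high bit clear, value unchanged
print(reinterpret_uint_as_int(0x7FFFFFFF))  # 2147483647: largest value that is safe
print(reinterpret_uint_as_int(0x80000000))  # -2147483648: high bit set, meaning changes
```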
### (Default) Values

Values are a sequence of digits, optionally prefixed by a `-`. For float
constants, the digits may be followed by a decimal point (`.`) and more digits.
Floats may also be written in scientific notation: the number is then followed
by an `e` or `E`, a `+` or `-`, and more digits.

Only scalar values can have defaults; non-scalar (string/vector/table) fields
default to `NULL` when not present.

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data (see also Gotchas below); the default is instead baked into
the generated code. So when you change the default, you'd
now get a different value than from code generated from an older version of
the schema. There are situations, however, where this may be
desirable, especially if you can ensure a simultaneous rebuild of
all code.

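To make the consequence of changing a default concrete, here is a minimal model of the behaviour described above. A plain dictionary stands in for the serialized table, and `write_table` / `read_field` are hypothetical helpers, not generated FlatBuffers code. The `hp` default of `200` is an invented later revision of the example schema.

```python
def write_table(values, defaults):
    """Store only the fields whose value differs from the writer's default."""
    return {name: v for name, v in values.items() if v != defaults[name]}

def read_field(stored, name, defaults):
    """Return the stored value, or the reader's own default if it is absent."""
    return stored.get(name, defaults[name])

old_defaults = {"mana": 150, "hp": 100}
new_defaults = {"mana": 150, "hp": 200}  # hypothetical later schema revision

stored = write_table({"mana": 150, "hp": 100}, old_defaults)
print(stored)                                  # {}: both values equal the defaults
print(read_field(stored, "hp", old_defaults))  # 100
print(read_field(stored, "hp", new_defaults))  # 200: same data, different answer
```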
### Enums

Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type.

Typically, enum values should only ever be added, never removed (there is no
deprecation for enums). This requires code to handle forwards compatibility
itself, by handling unknown enum values.

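The numbering rule, and one way of tolerating unknown values, can be sketched as follows. `assign_enum_values` and `color_name` are hypothetical helpers written for this illustration; real generated code will look different in each language.

```python
def assign_enum_values(entries):
    """entries: list of (name, explicit_value_or_None), in declaration order."""
    values = {}
    next_value = 0  # the first value defaults to 0
    for name, explicit in entries:
        if explicit is not None:
            next_value = explicit
        values[name] = next_value
        next_value += 1  # otherwise increase by one from the previous value
    return values

# enum Color : byte { Red = 1, Green, Blue }
color = assign_enum_values([("Red", 1), ("Green", None), ("Blue", None)])
print(color)  # {'Red': 1, 'Green': 2, 'Blue': 3}

def color_name(value):
    """Map a stored value to a name, tolerating values added by newer schemas."""
    by_value = {v: k for k, v in color.items()}
    return by_value.get(value, "Unknown({})".format(value))

print(color_name(2))  # Green
print(color_name(7))  # Unknown(7)
```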
### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field, which can hold a reference to any of those types, and
additionally a hidden field with the suffix `_type` is generated that
holds the corresponding enum value, allowing you to know which type to
cast to at runtime.

Unions are a good way to be able to send multiple message types as a FlatBuffer.
Note that because a union field is really two fields, it must always be
part of a table; it cannot be the root of a FlatBuffer by itself.

If you have a need to distinguish between different FlatBuffers in a more
open-ended way, for example for use as files, see the file identification
feature below.

There is experimental support, in C++ only, for a vector of unions
(and types). In the example IDL file above, use `[Any]` to add a
vector of `Any` to the `Monster` table.

### Namespaces

These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include`
automatically ensures each file is parsed just once, even when referred to
more than once.

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (those you still generate separately).

### Root type

This declares what you consider to be the root table (or struct) of the
serialized data. This is particularly important for parsing JSON data,
which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly. But if you
want to use a FlatBuffer as a file format, it would be convenient
to be able to have a "magic number" in there, like most file formats
have, to be able to do a sanity check to see if you're reading the
kind of file you're expecting.

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.

Note that this is best for open-ended uses such as files. If you simply wanted
to send one of a set of possible messages over a network for example, you'd
be better off with a union.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

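Returning to the identifier: since it lives at a fixed position (bytes 4-7, right after the 4-byte offset to the root), the check can also be done by hand when no generated code is available. The sketch below is a hand-rolled stand-in for a call like `MonsterBufferHasIdentifier`. `buffer_has_identifier` is a hypothetical helper, and `fake_buffer` is a dummy byte string, not a valid FlatBuffer.

```python
import struct

IDENTIFIER_OFFSET = 4  # the first 4 bytes hold the offset to the root table
IDENTIFIER_LENGTH = 4

def buffer_has_identifier(buf, identifier):
    """Check bytes 4-7 (inclusive) of a buffer against a 4-character identifier."""
    expected = identifier.encode("ascii")
    if len(expected) != IDENTIFIER_LENGTH:
        raise ValueError("identifiers must be exactly 4 characters long")
    end = IDENTIFIER_OFFSET + IDENTIFIER_LENGTH
    return len(buf) >= end and buf[IDENTIFIER_OFFSET:end] == expected

# Stand-in data: a 32-bit root offset, the identifier, then a dummy payload.
fake_buffer = struct.pack("<I", 8) + b"MYFI" + bytes(16)

print(buffer_has_identifier(fake_buffer, "MYFI"))  # True
print(buffer_has_identifier(fake_buffer, "OTHR"))  # False
```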
### RPC interface declarations

You can declare RPC calls in a schema, which define a set of functions
that take a FlatBuffer as an argument (the request) and return a FlatBuffer
as the response (both of which must be table types):

    rpc_service MonsterStorage {
      Store(Monster):StoreResponse;
      Retrieve(MonsterId):Monster;
    }

What code this produces and how it is used depends on the language and RPC
system used. There is preliminary support for gRPC through the `--grpc` code
generator; see `grpc/tests` for an example.

### Comments & documentation

Comments may be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler;
user-defined ones need to be declared with the attribute declaration
(like `priority` in the example above), and are
available to query if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

-   `id: n` (on a table field): manually set the field identifier to `n`.
    If you use this attribute, you must use it on ALL fields of this table,
    and the numbers must be a contiguous range from 0 onwards.
    Additionally, since a union type effectively adds two fields, its
    id must be that of the second field (the first field is the type
    field and not explicitly declared in the schema).
    For example, if the last field before the union field had id 6,
    the union field should have id 8, and the union's type field will
    implicitly be 7.
    IDs allow the fields to be placed in any order in the schema.
    When a new field is added to the schema it must use the next available ID.
-   `deprecated` (on a field): do not generate accessors for this field
    anymore; code should stop using this data.
-   `required` (on a non-scalar table field): this field must always be set.
    By default, all fields are optional, i.e. may be left out. This is
    desirable, as it helps with forwards/backwards compatibility, and
    flexibility of data structures. It is also a burden on the reading code,
    since for non-scalar fields it requires you to check against NULL and
    take appropriate action. By specifying this attribute, you force code that
    constructs FlatBuffers to ensure this field is initialized, so the reading
    code may access it directly, without checking for NULL. If the constructing
    code does not initialize this field, it will get an assert, and
    the verifier will also fail on buffers that have missing required fields.
-   `force_align: size` (on a struct): force the alignment of this struct
    to be something higher than what it is naturally aligned to. Causes
    these structs to be aligned to that amount inside a buffer, IF that
    buffer is allocated with that alignment (which is not necessarily
    the case for buffers accessed directly inside a `FlatBufferBuilder`).
-   `bit_flags` (on an enum): the values of this enum indicate bits,
    meaning that any value N specified in the schema will end up
    representing 1<<N, or if you don't specify values at all, you'll get
    the sequence 1, 2, 4, 8, ...
-   `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains FlatBuffer data, for which the
    root type is given by `table_name`. The generated code will then produce
    a convenient accessor for the nested FlatBuffer.
-   `flexbuffer` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains FlexBuffer data. The generated
    code will then produce a convenient accessor for the FlexBuffer root.
-   `key` (on a field): this field is meant to be used as a key when sorting
    a vector of the type of table it sits in. Can be used for in-place
    binary search.
-   `hash` (on a field): this is an (un)signed 32/64 bit integer field, whose
    value during JSON parsing is allowed to be a string, which will then be
    stored as its hash. The value of the attribute is the hashing algorithm to
    use, one of `fnv1_32`, `fnv1_64`, `fnv1a_32`, `fnv1a_64`.
-   `original_order` (on a table): since elements in a table do not need
    to be stored in any particular order, they are often optimized for
    space by sorting them by size. This attribute stops that from happening.
    There should generally not be any reason to use this flag.
-   `native_*`: several attributes have been added to support the [C++ object
    Based API](@ref flatbuffers_cpp_object_based_api). All such attributes
    are prefixed with the term "native_".


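The rules for `id` (contiguous from 0, with a union silently occupying the id just before its own) are easy to get wrong by hand. The following is a small checker sketch under those stated rules. `check_field_ids` and the field names `f0`..`f6` are hypothetical, and this is not how `flatc` itself reports errors.

```python
def check_field_ids(fields):
    """fields: (name, id, is_union) tuples in any order. Returns a list of problems."""
    owners = {}
    problems = []
    for name, field_id, is_union in fields:
        slots = [(field_id, name)]
        if is_union:
            # The implicit type field takes the id just before the union's own id.
            slots.insert(0, (field_id - 1, name + "_type"))
        for slot, owner in slots:
            if slot < 0:
                problems.append("{} would need a negative id".format(owner))
            elif slot in owners:
                problems.append(
                    "id {} claimed by {} and {}".format(slot, owners[slot], owner))
            else:
                owners[slot] = owner
    # With n distinct ids, a contiguous range from 0 means exactly 0..n-1.
    for expected in range(len(owners)):
        if expected not in owners:
            problems.append("id {} is unused, so ids are not contiguous".format(expected))
    return problems

fields = [("f{}".format(i), i, False) for i in range(7)]  # ids 0..6
print(check_field_ids(fields + [("test", 8, True)]))  # []: test_type implicitly gets id 7
print(check_field_ids(fields + [("test", 7, True)]))  # reports a clash on id 6
```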
## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

-   It accepts field names with and without quotes, like many JSON parsers
    already do. It outputs them without quotes as well, though it can be made
    to output them with quotes using the `strict_json` flag.
-   If a field has an enum type, the parser will recognize symbolic enum
    values (with or without quotes) instead of numbers, e.g.
    `field: EnumVal`. If a field is of integral type, you can still use
    symbolic names, but values need to be prefixed with their type and
    need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
    representing flags, you may place multiple inside a string
    separated by spaces to OR them, e.g.
    `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.
-   Similarly, unions need to be specified with two fields, much like
    you do when serializing from code. E.g. for a field `foo`, you must
    add a field `foo_type: FooOne` right before the `foo` field, where
    `FooOne` would be the table out of the union you want to use.
-   A field that has the value `null` (e.g. `field: null`) is intended to
    have the default value for that field (thus has the same effect as if
    that field wasn't specified at all).
-   It has some built-in conversion functions, so you can write for example
    `rad(180)` wherever you'd normally write `3.14159`.
    Currently supports the following functions: `rad`, `deg`, `cos`, `sin`,
    `tan`, `acos`, `asin`, `atan`.

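The flag handling in the list above combines two rules: a `bit_flags` enum maps its Nth name to `1 << N`, and a space-separated string ORs the named values together. A minimal sketch of both follows. The enum `Ability` and the helpers `bit_flag_values` and `parse_flags` are hypothetical and only model the described behaviour; they are not the parser's actual implementation.

```python
def bit_flag_values(names):
    """With no explicit values, the Nth name (counting from 0) represents 1 << N."""
    return {name: 1 << position for position, name in enumerate(names)}

def parse_flags(text, enum_name, values):
    """OR together space-separated names, accepting an optional `Enum.` prefix."""
    result = 0
    prefix = enum_name + "."
    for token in text.split():
        name = token[len(prefix):] if token.startswith(prefix) else token
        result |= values[name]
    return result

# Hypothetical schema: enum Ability : ubyte (bit_flags) { Fly, Swim, Burrow }
ability = bit_flag_values(["Fly", "Swim", "Burrow"])
print(ability)                                                         # {'Fly': 1, 'Swim': 2, 'Burrow': 4}
print(parse_flags("Fly Burrow", "Ability", ability))                   # 5
print(parse_flags("Ability.Swim Ability.Burrow", "Ability", ability))  # 6
```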
When parsing JSON, it recognizes the following escape codes in strings:

-   `\n` - linefeed.
-   `\t` - tab.
-   `\r` - carriage return.
-   `\b` - backspace.
-   `\f` - form feed.
-   `\"` - double quote.
-   `\\` - backslash.
-   `\/` - forward slash.
-   `\uXXXX` - 16-bit unicode code point, converted to the equivalent UTF-8
    representation.
-   `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is
    not in the JSON spec (see http://json.org/), but is needed to be able to
    encode arbitrary binary in strings to text and back without losing
    information (e.g. the byte 0xFF can't be represented in standard JSON).

It also generates these escape codes back again when generating JSON from a
binary representation.

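One more JSON-related detail is the `hash` attribute described under Attributes, which lets a JSON string be stored as its hash. FNV-1 and FNV-1a are public hash functions, and the sketch below uses their standard offset basis and prime constants. Whether the parser hashes exactly the UTF-8 bytes of the string is an assumption here, and `fnv1` / `fnv1a` are illustrative helpers rather than FlatBuffers API.

```python
# (offset basis, prime) for the standard 32-bit and 64-bit FNV variants.
FNV_PARAMS = {
    32: (0x811C9DC5, 0x01000193),
    64: (0xCBF29CE484222325, 0x00000100000001B3),
}

def fnv1(data, bits=32):
    """FNV-1: multiply by the prime, then XOR in each byte."""
    h, prime = FNV_PARAMS[bits]
    mask = (1 << bits) - 1
    for byte in data:
        h = (h * prime) & mask
        h ^= byte
    return h

def fnv1a(data, bits=32):
    """FNV-1a: XOR in each byte, then multiply by the prime."""
    h, prime = FNV_PARAMS[bits]
    mask = (1 << bits) - 1
    for byte in data:
        h ^= byte
        h = (h * prime) & mask
    return h

key = "a".encode("utf-8")  # assumption: the string is hashed as UTF-8 bytes
print(hex(fnv1(key, 32)))   # 0x50c5d7e
print(hex(fnv1a(key, 32)))  # 0xe40c292c
print(hex(fnv1a(key, 64)))
```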
## Guidelines

### Efficiency

FlatBuffers is all about efficiency, but to realize that efficiency you
require an efficient schema. There are usually multiple choices on
how to represent data that have vastly different size characteristics.

It is very common nowadays to represent any kind of data as dictionaries
(as in e.g. JSON), because of their flexibility and extensibility. While
it is possible to emulate this in FlatBuffers (as a vector
of tables with key and value(s)), this is a bad match for a strongly
typed system like FlatBuffers, leading to relatively large binaries.
FlatBuffer tables are more flexible than classes/structs in most systems,
since having a large number of fields, only a few of which are actually
used, is still efficient. You should thus try to organize your data
as much as possible such that you can use tables where you might be
tempted to use a dictionary.

Similarly, strings as values should only be used when they are
truly open-ended. If you can, always use an enum instead.

FlatBuffers doesn't have inheritance, so the way to represent a set
of related data structures is a union. Unions do have a cost however,
so an alternative to a union is to have a single table that has
all the fields of all the data structures you are trying to
represent, if they are relatively similar / share many fields.
Again, this is efficient because optional fields are cheap.

FlatBuffers supports the full range of integer sizes, so try to pick
the smallest size needed, rather than defaulting to int/long.

Remember that you can share data (refer to the same string/table
within a buffer), so factoring out repeating data into its own
data structure may be worth it.

### Style guide

Identifiers in a schema are meant to translate to many different programming
languages, so using the style of your "main" language is generally a bad idea.

For this reason, below is a suggested style guide to adhere to, to keep schemas
consistent for interoperation regardless of the target language.

Where possible, the code generators for specific languages will generate
identifiers that adhere to the language style, based on the schema identifiers.

- Table, struct, enum and rpc names (types): UpperCamelCase.
- Table and struct field names: snake_case. This is translated to lowerCamelCase
  automatically for some languages, e.g. Java.
- Enum values: UpperCamelCase.
- Namespaces: UpperCamelCase.

Formatting (this is less important, but still worth adhering to):

- Opening brace: on the same line as the start of the declaration.
- Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`.

For an example, see the schema at the top of this file.

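As an illustration of the snake_case to lowerCamelCase translation mentioned above, here is one plausible conversion rule. `snake_to_lower_camel` is a hypothetical helper, and the exact rule each code generator applies (for instance around digits or repeated underscores) is an assumption and may differ.

```python
def snake_to_lower_camel(name):
    """Convert a snake_case schema identifier to lowerCamelCase."""
    head, *rest = name.split("_")
    # Upper-case the first letter of every part after the first, then join.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)

for field in ["hp", "inventory", "test_type", "hit_points_max"]:
    print(field, "->", snake_to_lower_camel(field))
```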
## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and on
earlier declarations not being removed, but marked deprecated when needed. We
think this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic however is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B already committed a new field also,
and just auto-merges the schema, the binary files are now invalid compared to
the new schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.

### Schema evolution examples

Some examples to clarify what happens as you change a schema:

If we have the following original schema:

    table { a:int; b:int; }

And we extend it:

    table { a:int; b:int; c:int; }

This is ok. Code compiled with the old schema reading data generated with the
new one will simply ignore the presence of the new field. Code compiled with the
new schema reading old data will get the default value for `c` (which is 0
in this case, since it is not specified).

    table { a:int (deprecated); b:int; }

This is also ok. Code compiled with the old schema reading newer data will now
always get the default value for `a` since it is not present. Code compiled
with the new schema now cannot read nor write `a` anymore (any existing code
that tries to do so will result in compile errors), but can still read
old data (it will ignore the field).

    table { c:int; a:int; b:int; }

This is NOT ok, as this makes the schemas incompatible. Old code reading newer
data will interpret `c` as if it was `a`, and new code reading old data
accessing `a` will instead receive `b`.

    table { c:int (id: 2); a:int (id: 0); b:int (id: 1); }

This is ok. If your intent was to order/group fields in a way that makes sense
semantically, you can do so using explicit id assignment. Now we are compatible
with the original schema, and the fields can be ordered in any way, as long as
we keep the sequence of ids.

    table { b:int; }

NOT ok. We can only remove a field by deprecation, regardless of whether we use
explicit ids or not.

    table { a:uint; b:uint; }

This is MAYBE ok, and only in the case where the type change is the same size,
like here. If old data never contained any negative numbers, this will be
safe to do.

    table { a:int = 1; b:int = 2; }

Generally NOT ok. Any older data written that had 0 values were not written to
the buffer, and rely on the default value to be recreated. These will now have
those values appear as `1` and `2` instead. There may be cases in which this
is ok, but care must be taken.

    table { aa:int; bb:int; }

Occasionally ok. You've renamed fields, which will break all code (and JSON
files!) that use this schema, but as long as the change is obvious, this is not
incompatible with the actual binary buffers, since those only ever address
fields by id/offset.
<br>

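The reordering cases above can be replayed with a toy model in which data is addressed purely by field id, and a field's id is its declaration position unless an explicit `id` is given. This is only a sketch of the addressing rule: `assign_slots`, `write` and `read` are hypothetical helpers, and a dictionary stands in for the binary buffer.

```python
def assign_slots(fields):
    """fields: list of (name, explicit_id_or_None) in declaration order."""
    return {
        name: (position if explicit is None else explicit)
        for position, (name, explicit) in enumerate(fields)
    }

def write(values, schema):
    """Store values keyed by field id only; names never reach the data."""
    slots = assign_slots(schema)
    return {slots[name]: value for name, value in values.items()}

def read(stored, name, schema, default=0):
    """Look a field up by the id the reader's schema assigns to that name."""
    return stored.get(assign_slots(schema)[name], default)

original  = [("a", None), ("b", None)]
extended  = [("a", None), ("b", None), ("c", None)]
reordered = [("c", None), ("a", None), ("b", None)]
with_ids  = [("c", 2), ("a", 0), ("b", 1)]

old_data = write({"a": 1, "b": 2}, original)
new_data = write({"c": 9, "a": 1, "b": 2}, reordered)

print(read(old_data, "c", extended))   # 0: the new field falls back to its default
print(read(old_data, "a", reordered))  # 2: new code asking for `a` receives `b`
print(read(new_data, "a", original))   # 9: old code interprets `c` as if it was `a`
print(read(old_data, "a", with_ids))   # 1: explicit ids keep the schemas compatible
```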
### Testing whether a field is present in a table

Most serialization formats (e.g. JSON or Protocol Buffers) make it very
explicit in the format whether a field is present in an object or not,
allowing you to use this as "extra" information.

In FlatBuffers, this also holds for everything except scalar values.

FlatBuffers by default will not write fields that are equal to the default
value (for scalars), sometimes resulting in significant space savings.

However, this also means testing whether a field is "present" is somewhat
meaningless, since it does not tell you if the field was actually written by
calling `add_field` style calls, unless you're only interested in this
information for non-default values.

Some `FlatBufferBuilder` implementations have an option called `force_defaults`
that circumvents this behavior, and writes fields even if they are equal to
the default. You can then use `IsFieldPresent` to query this.

Another option that works in all languages is to wrap a scalar field in a
struct. This way it will return null if it is not present. The cool thing
is that structs don't take up any more space than the scalar they represent.

   [Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language

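The effect of `force_defaults` on presence testing can be modelled in a few lines. This is a toy sketch of the behaviour described in this section: a dictionary stands in for the stored table, and `write_table` and `is_field_present` are hypothetical helpers, not the real `FlatBufferBuilder` or `IsFieldPresent` API.

```python
def write_table(values, defaults, force_defaults=False):
    """Return the fields that end up stored; defaults are skipped unless forced."""
    return {
        name: value
        for name, value in values.items()
        if force_defaults or value != defaults[name]
    }

def is_field_present(stored, name):
    """Presence here only means 'physically stored', not 'set by the writer'."""
    return name in stored

defaults = {"hp": 100}

skipped = write_table({"hp": 100}, defaults)
forced = write_table({"hp": 100}, defaults, force_defaults=True)

print(is_field_present(skipped, "hp"))  # False, although the writer did set hp
print(is_field_present(forced, "hp"))   # True
```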