Schemas.md revision 89d2b0861b2f74e84ec698a2536d48eb7ca62268
# Writing a schema

The syntax of the schema language (aka IDL, Interface Definition
Language) should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` and `Pickup` are not defined as part of this example.)

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to 0 /
NULL).

Each field is optional: it does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forwards
and backwards compatibility. Note that:

- You can add new fields in the schema ONLY at the end of a table
  definition. Older data will still
  read correctly, and give you the default value when read. Older code
  will simply ignore the new field.
  If you want to have flexibility to use any order for fields in your
  schema, you can manually assign ids (much like Protocol Buffers),
  see the `id` attribute below.

- You cannot delete fields you don't use anymore from the schema,
  but you can simply
  stop writing them into your data for almost the same effect.
  Additionally you can mark them as `deprecated` as in the example
  above, which will prevent the generation of accessors in the
  generated C++, as a way to enforce the field not being used any more.
  (Careful: this may break code!)

- You may change field names and table names, if you're ok with your
  code breaking until you've renamed them there too.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for
simple objects where you are very sure no changes will ever be made
(as is quite clear for `Vec3` in the example). Structs use less memory than
tables and are even faster to access (they are always stored in-line in their
parent object, and use no virtual table).

### Types

Built-in scalar types are:

- 8 bit: `byte ubyte bool`

- 16 bit: `short ushort`

- 32 bit: `int uint float`

- 64 bit: `long ulong double`

Built-in non-scalar types:

- Vector of any other type (denoted with `[type]`). Nesting vectors
  is not supported; instead you can wrap the inner vector in a table.

- `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
  or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

- References to other tables or structs, enums or unions (see
  below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.

### (Default) Values

Values are a sequence of digits, optionally followed by a `.` and more digits
for float constants, and optionally prefixed by a `-`. Non-scalar defaults are
currently not supported (always NULL).
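
As noted under Tables, a field that is absent from the data reads back as its schema default. The sketch below is a toy model of that behaviour, not the FlatBuffers wire format or real generated code: `ReadShort` and `DefaultValueDemo` are hypothetical names, and a null pointer stands in for "field not present in the buffer".

```cpp
#include <cassert>
#include <cstdint>

// Toy model (an assumption for illustration): `stored` points at the field's
// value if it was written into the buffer, or is null if the field was omitted.
// The default is supplied by the reading code, as compiled from the schema.
inline std::int16_t ReadShort(const std::int16_t* stored,
                              std::int16_t schema_default) {
  return stored ? *stored : schema_default;
}

// The same data read by code built from two versions of a schema.
inline void DefaultValueDemo() {
  const std::int16_t written = 42;

  // A value that was actually written is unaffected by the default.
  assert(ReadShort(&written, 100) == 42);
  assert(ReadShort(&written, 150) == 42);

  // An omitted field yields whatever default the reader was built with.
  assert(ReadShort(nullptr, 100) == 100);  // reader built with `hp:short = 100`
  assert(ReadShort(nullptr, 150) == 150);  // reader built with a changed default
}
```

In this model the default lives only in the reader, so two readers built from different schema versions can disagree about the same bytes.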

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data but are generated in code, so when you change the default,
code generated from the new schema reads a different value from the same data
than code generated from an older version of the schema. There are situations
however where this may be
desirable, especially if you can ensure a simultaneous rebuild of
all code.

### Enums

Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type.

### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field which can hold a reference to any of those types, and
additionally a hidden field with the suffix `_type` is generated that
holds the corresponding enum value, allowing you to know which type to
cast to at runtime.

Unions are a good way to be able to send multiple message types as a FlatBuffer.
Note that because a union field is really two fields, it must always be
part of a table; it cannot be the root of a FlatBuffer by itself.

If you have a need to distinguish between different FlatBuffers in a more
open-ended way, for example for use as files, see the file identification
feature below.

### Namespaces

These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.
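
As a rough illustration of this section together with the Enums section above, the following hand-written C++ approximates what a schema with `namespace MyGame.Sample;` and the `Color` enum from the example could map to. It is not actual `flatc` output: the nested namespace `Sample`, the `Color_` prefixes and the function `CheckColorValues` are assumptions made for this sketch (the example schema itself uses plain `MyGame`).

```cpp
#include <cassert>
#include <cstdint>

// Assumed schema for this sketch:
//   namespace MyGame.Sample;
//   enum Color : byte { Red = 1, Green, Blue }
namespace MyGame {
namespace Sample {  // `.` in the schema namespace becomes a nested namespace

// `byte` is a signed 8-bit type, so int8_t is used as the underlying type.
enum Color : std::int8_t {
  Color_Red = 1,  // value given explicitly in the schema
  Color_Green,    // previous value + 1
  Color_Blue      // previous value + 1
};

// Checks the numbering rule: explicit value first, then +1 per entry.
inline void CheckColorValues() {
  assert(Color_Red == 1);
  assert(Color_Green == 2);
  assert(Color_Blue == 3);
  assert(sizeof(Color) == 1);  // storage follows the underlying type
}

}  // namespace Sample
}  // namespace MyGame
```

C++ enumerators follow the same "previous value plus one" rule as the schema language, which is why the numbering carries over directly.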

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include`
automatically ensures each file is parsed just once, even when referred to
more than once.

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (those you still generate separately).

### Root type

This declares what you consider to be the root table (or struct) of the
serialized data. This is particularly important for parsing JSON data,
which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly. But if you
want to use a FlatBuffer as a file format, it would be convenient
to be able to have a "magic number" in there, like most file formats
have, to be able to do a sanity check to see if you're reading the
kind of file you're expecting.

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.

Note that this is best for open-ended uses such as files. If you simply wanted
to send one of a set of possible messages over a network for example, you'd
be better off with a union.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

### Comments & documentation

Comments may be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler;
user-defined ones need to be declared with the attribute declaration
(like `priority` in the example above), and are
available to query if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

- `id: n` (on a table field): manually set the field identifier to `n`.
  If you use this attribute, you must use it on ALL fields of this table,
  and the numbers must be a contiguous range from 0 onwards.
  Additionally, since a union type effectively adds two fields, its
  id must be that of the second field (the first field is the type
  field and not explicitly declared in the schema).
  For example, if the last field before the union field had id 6,
  the union field should have id 8, and the union's type field will
  implicitly be 7.
  IDs allow the fields to be placed in any order in the schema.
  When a new field is added to the schema it must use the next available ID.
- `deprecated` (on a field): do not generate accessors for this field
  anymore; code should stop using this data.
- `required` (on a non-scalar table field): this field must always be set.
  By default, all fields are optional, i.e. may be left out. This is
  desirable, as it helps with forwards/backwards compatibility, and
  flexibility of data structures. It is also a burden on the reading code,
  since for non-scalar fields it requires you to check against NULL and
  take appropriate action. By specifying this attribute, you force code that
  constructs FlatBuffers to ensure this field is initialized, so the reading
  code may access it directly, without checking for NULL. If the constructing
  code does not initialize this field, it will trigger an assert, and also
  the verifier will fail on buffers that have missing required fields.
- `original_order` (on a table): since elements in a table do not need
  to be stored in any particular order, they are often optimized for
  space by sorting them by size. This attribute stops that from happening.
- `force_align: size` (on a struct): force the alignment of this struct
  to be something higher than what it is naturally aligned to. Causes
  these structs to be aligned to that amount inside a buffer, IF that
  buffer is allocated with that alignment (which is not necessarily
  the case for buffers accessed directly inside a `FlatBufferBuilder`).
- `bit_flags` (on an enum): the values of this enum indicate bits,
  meaning that any value N specified in the schema will end up
  representing 1<<N, or if you don't specify values at all, you'll get
  the sequence 1, 2, 4, 8, ...
- `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
  (which must be a vector of ubyte) contains flatbuffer data, for which the
  root type is given by `table_name`. The generated code will then produce
  a convenient accessor for the nested FlatBuffer.

## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

- It accepts field names with and without quotes, like many JSON parsers
  already do. It outputs them without quotes as well, though it can be made
  to output them with quotes using the `strict_json` flag.
- If a field has an enum type, the parser will recognize symbolic enum
  values (with or without quotes) instead of numbers, e.g.
  `field: EnumVal`. If a field is of integral type, you can still use
  symbolic names, but values need to be prefixed with their type and
  need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
  representing flags, you may place multiple inside a string
  separated by spaces to OR them, e.g.
  `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.

When parsing JSON, it recognizes the following escape codes in strings:

- `\n` - linefeed.
- `\t` - tab.
- `\r` - carriage return.
- `\b` - backspace.
- `\f` - form feed.
- `\"` - double quote.
- `\\` - backslash.
- `\/` - forward slash.
- `\uXXXX` - 16-bit unicode code point, converted to the equivalent UTF-8
  representation.
- `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is
  not in the JSON spec (see http://json.org/), but is needed to be able to
  encode arbitrary binary in strings to text and back without losing
  information (e.g. the byte 0xFF can't be represented in standard JSON).

It also generates these escape codes back again when generating JSON from a
binary representation.

## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and earlier
declarations to not be removed, but be marked deprecated when needed. We think
this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic however is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B already committed a new field also,
and just auto-merges the schema, the binary files are now invalid compared to
the new schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.
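
The numbering rules for the `id` attribute (a contiguous range from 0, with a union field also occupying the id just before its own) can be sketched as a small check. This is a minimal illustration under stated assumptions, not how `flatc` validates schemas: `FieldDecl`, `IdsAreContiguous` and `IdRulesDemo` are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one table field as declared in a schema.
struct FieldDecl {
  int id;         // value of the `id` attribute
  bool is_union;  // a union field also occupies id - 1 (its hidden type field)
};

// Returns true if the ids, including the implicit union type fields,
// cover 0..N-1 exactly once.
inline bool IdsAreContiguous(const std::vector<FieldDecl>& fields) {
  std::vector<int> used;
  for (const FieldDecl& f : fields) {
    if (f.is_union) used.push_back(f.id - 1);  // implicit `_type` field
    used.push_back(f.id);
  }
  std::sort(used.begin(), used.end());
  for (std::size_t i = 0; i < used.size(); ++i) {
    if (used[i] != static_cast<int>(i)) return false;  // gap or duplicate
  }
  return true;
}

// Mirrors the example from the Attributes section, then a careless merge.
inline void IdRulesDemo() {
  std::vector<FieldDecl> fields;
  for (int id = 0; id <= 6; ++id) fields.push_back({id, false});
  fields.push_back({8, true});  // union: value field 8, implicit type field 7
  assert(IdsAreContiguous(fields));

  fields.push_back({9, false});  // user A's new field
  fields.push_back({9, false});  // user B's new field, same id
  assert(!IdsAreContiguous(fields));
}
```

With explicit ids, two people claiming the same number produces a duplicate that a check like this rejects, whereas two fields appended without ids would merge silently.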