Schemas.md revision 5da7bda826a98fa92eb1356907afa631bfa9c1b1
# Writing a schema

The syntax of the schema language (aka IDL, Interface Definition
Language) should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` and `Pickup` are not defined as part of this example.)

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to 0 /
NULL).

Each field is optional: it does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forward
and backwards compatibility. Note that:

- You can add new fields in the schema ONLY at the end of a table
  definition. Older data will still read correctly, and give you the
  default value when read. Older code will simply ignore the new field.
  If you want to have flexibility to use any order for fields in your
  schema, you can manually assign ids (much like Protocol Buffers),
  see the `id` attribute below.

- You cannot delete fields you don't use anymore from the schema,
  but you can simply stop writing them into your data for almost the
  same effect.
  Additionally you can mark them as `deprecated` as in the example
  above, which will prevent the generation of accessors in the
  generated C++, as a way to enforce the field not being used any more
  (careful: this may break code!).

- You may change field names and table names, if you're ok with your
  code breaking until you've renamed them there too.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for simple objects where you are very sure
no changes will ever be made (as is quite clear in the example `Vec3`).
Structs use less memory than tables and are even faster to access (they are
always stored in-line in their parent object, and use no virtual table).

### Types

Builtin scalar types are:

- 8 bit: `byte ubyte bool`

- 16 bit: `short ushort`

- 32 bit: `int uint float`

- 64 bit: `long ulong double`

Builtin non-scalar types are:

- Vector of any other type (denoted with `[type]`). Nesting vectors
  is not supported; instead, you can wrap the inner vector in a table.

- `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
  or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

- References to other tables or structs, enums or unions (see
  below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.

### (Default) Values

Values are a sequence of digits, optionally followed by a `.` and more digits
for float constants, and optionally prefixed by a `-`. Non-scalar defaults are
currently not supported (always NULL).
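
As a small illustration of the value syntax just described, the sketch below
declares a few fields with and without defaults. The `Settings` table and all
of its field names are hypothetical and not part of the example schema; only
the syntax is taken from the text above.

```idl
// Hypothetical table, for illustration only.
table Settings {
  volume:short = 100;    // integer default
  offset:int = -1;       // negative default, prefixed by `-`
  scale:float = 1.5;     // float default: digits, `.`, more digits
  enabled:bool = false;  // bool default, as used for `friendly` above
  count:int;             // no default given, so it defaults to 0
  label:string;          // non-scalar: no default possible, NULL when absent
}
```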

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data but are generated in code, so when you change the default, you'd
now get a different value than from code generated from an older version of
the schema. There are situations, however, where this may be desirable,
especially if you can ensure a simultaneous rebuild of all code.

### Enums

Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type.

### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field which can hold a reference to any of those types, and
additionally a hidden field with the suffix `_type` is generated that
holds the corresponding enum value, allowing you to know which type to
cast to at runtime.

### Namespaces

These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.

### Root type

This declares what you consider to be the root table (or struct) of the
serialized data. This is particularly important for parsing JSON data,
which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly.
But if you want to use a FlatBuffer as a file format, it would be
convenient to have a "magic number" in there, like most file formats
have, as a sanity check that you're reading the kind of file you're
expecting.

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

### Comments & documentation

Comments may be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.
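
A minimal sketch of documentation comments, reusing `Vec3` from the example
schema at the top. The comment wording is made up for illustration; the only
thing taken from the text above is the rule that `///` lines on their own
document the declaration that follows.

```idl
/// A point or direction in 3D space.
/// Multiple documentation lines per item are allowed.
struct Vec3 {
  /// Documents the field `x` declared on the next line.
  x:float;
  // An ordinary comment: only `///` lines are described as being
  // carried over into the generated C++.
  y:float;
  z:float;
}
```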

### Attributes

Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler;
others are simply ignored (like `priority`), but are available to query
if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

- `id: n` (on a table field): manually set the field identifier to `n`.
  If you use this attribute, you must use it on ALL fields of this table,
  and the numbers must be a contiguous range from 0 onwards.
  Additionally, since a union type effectively adds two fields, its
  id must be that of the second field (the first field is the type
  field and not explicitly declared in the schema).
  For example, if the last field before the union field had id 6,
  the union field should have id 8, and the union's type field will
  implicitly be 7.
  IDs allow the fields to be placed in any order in the schema.
  When a new field is added to the schema it must use the next available ID.
- `deprecated` (on a field): do not generate accessors for this field
  anymore; code should stop using this data.
- `original_order` (on a table): since elements in a table do not need
  to be stored in any particular order, they are often optimized for
  space by sorting them by size. This attribute stops that from happening.
- `force_align: size` (on a struct): force the alignment of this struct
  to be something higher than what it is naturally aligned to. Causes
  these structs to be aligned to that amount inside a buffer, IF that
  buffer is allocated with that alignment (which is not necessarily
  the case for buffers accessed directly inside a `FlatBufferBuilder`).
- `bit_flags` (on an enum): the values of this field indicate bits,
  meaning that any value N specified in the schema will end up
  representing `1<<N`, or if you don't specify values at all, you'll get
  the sequence 1, 2, 4, 8, ...

## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

- It accepts field names with and without quotes, like many JSON parsers
  already do. It outputs them without quotes as well, though it can be made
  to output quotes using the `strict_json` flag.
- If a field has an enum type, the parser will recognize symbolic enum
  values (with or without quotes) instead of numbers, e.g.
  `field: EnumVal`. If a field is of integral type, you can still use
  symbolic names, but values need to be prefixed with their type and
  need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
  representing flags, you may place multiple inside a string
  separated by spaces to OR them, e.g.
  `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.

## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and on
earlier declarations not being removed, but marked deprecated when needed. We
think this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic, however, is source control.
If user A adds a field, generates new binary data with this new schema, then
tries to commit both to source control after user B has already committed a new
field as well, and just auto-merges the schema, the binary files are now invalid
compared to the new schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.
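
To make the explicit-id option concrete, here is a sketch of the `Monster`
table from the top of this document with ids assigned by hand. The particular
numbering is an assumption chosen for illustration. It follows the rule given
under the `id` attribute: the union field `test` takes id 8 because its hidden
type field implicitly occupies id 7.

```idl
table Monster {
  pos:Vec3 (id: 0);
  mana:short = 150 (id: 1);
  hp:short = 100 (id: 2);
  name:string (id: 3);
  friendly:bool = false (deprecated, priority: 1, id: 4);
  inventory:[ubyte] (id: 5);
  color:Color = Blue (id: 6);
  test:Any (id: 8);  // union: the hidden `test_type` field implicitly gets id 7
}
```

In this sketch the next available id is 9, so two users who each add a field
would both reach for the same number.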