Courgette Internals
===================

Patch Generation
----------------

![Patch Generation](generation.png)

- courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch
  generation by calling ensemble\_create.cc:GenerateEnsemblePatch.

- The files are read in by courgette:SourceStream objects.

- ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which
  uses MakeGenerator to create
  patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 objects.

- PatchGeneratorX86\_32's Transform method transforms the input file
  using Courgette's core techniques that make the bsdiff delta
  smaller.  The steps it takes are the following:

  - _disassemble_ the old and new binaries into AssemblyProgram
    objects,

  - _adjust_ the new AssemblyProgram object, and

  - _encode_ the AssemblyProgram object back into raw bytes.

### Disassemble

- The input is a pointer to a buffer containing the raw bytes of the
  input file.

- Disassembly converts certain machine instructions that reference
  addresses to Courgette instructions.  It is not actually
  disassembly, but this is the term the code base uses.  Specifically,
  it detects instructions that use absolute addresses given by the
  binary file's relocation table, and relative addresses used in
  relative branches.

- Disassembly is done by disassemble:ParseDetectedExecutable, which
  selects the appropriate Disassembler subclass by looking at the
  binary file's headers.

  - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler

  - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler

  - disassembler\_elf\_32\_arm.h defines the ELF 32-bit ARM disassembler

- The Disassembler replaces the relocation table with a Courgette
  instruction that can regenerate the relocation table.

- The Disassembler builds a list of addresses referenced by the
  machine code, numbering each one.

- The Disassembler replaces each address used in machine instructions
  with its index number.

- The output is an assembly\_program.h:AssemblyProgram class, which
  contains a list of instructions, machine or Courgette, and a mapping
  of indices to actual addresses.

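The address-to-index replacement described above can be sketched in a few lines of C++. This is a toy model under stated assumptions, not Courgette's code: `ToyInstruction`, `ToyAssemblyProgram`, and `ToyDisassemble` are hypothetical names, and the input is assumed to be already split into instructions flagged as address-referencing or not, which the real Disassembler has to detect from the binary.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical toy types for illustration; not Courgette's real classes.
struct ToyInstruction {
  bool references_address;  // true for a branch or relocated address
  uint32_t value;           // an address (or index), otherwise a raw byte
};

struct ToyAssemblyProgram {
  std::vector<ToyInstruction> instructions;  // addresses replaced by indices
  std::vector<uint32_t> index_to_address;    // index -> actual address
};

// Numbers each distinct address in order of first appearance and rewrites
// every address-referencing instruction to carry the index instead.
ToyAssemblyProgram ToyDisassemble(const std::vector<ToyInstruction>& input) {
  ToyAssemblyProgram program;
  std::map<uint32_t, uint32_t> address_to_index;
  for (const ToyInstruction& instruction : input) {
    ToyInstruction out = instruction;
    if (instruction.references_address) {
      auto it = address_to_index.find(instruction.value);
      if (it == address_to_index.end()) {
        // First time this address is seen: give it the next index.
        const uint32_t index =
            static_cast<uint32_t>(program.index_to_address.size());
        it = address_to_index.emplace(instruction.value, index).first;
        program.index_to_address.push_back(instruction.value);
      }
      out.value = it->second;
    }
    program.instructions.push_back(out);
  }
  return program;
}
```

Two references to the same address end up with the same index, so the instruction list no longer depends on where the target actually lives; that information is confined to `index_to_address`.
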
### Adjust

- This step takes the AssemblyProgram for the new file and reassigns
  the indices that map to actual addresses, using the AssemblyProgram
  for the old file as a reference.  It is performed by
  adjustment\_method.cc:Adjust().

- The goal is to match the indices from the old program to the new
  program as closely as possible.

- When matched correctly, machine instructions that jump to the same
  function in both the new and old binary will look the same to
  bsdiff, even if the function is located in a different part of the
  binary.

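To make the effect of adjustment concrete, here is a minimal sketch that renumbers the new program's indices to agree with the old program's. It assumes a naive position-by-position matching, which is a stand-in chosen for brevity and not the heuristic that adjustment\_method.cc actually uses; `ToyProgram` and `ToyAdjust` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical toy type: only the address references of a program.
struct ToyProgram {
  std::vector<uint32_t> index_stream;      // one index per address reference
  std::vector<uint32_t> index_to_address;  // index -> actual address
};

// Renumbers the indices of `program` (the new file) so that its index stream
// agrees with `model` (the old file) wherever references line up by position.
void ToyAdjust(const ToyProgram& model, ToyProgram* program) {
  const uint32_t count =
      static_cast<uint32_t>(program->index_to_address.size());
  std::map<uint32_t, uint32_t> new_to_old;
  std::vector<bool> taken(count, false);

  // Greedy pass: pair up indices that occur at the same stream position.
  const std::size_t common =
      std::min(model.index_stream.size(), program->index_stream.size());
  for (std::size_t i = 0; i < common; ++i) {
    const uint32_t new_index = program->index_stream[i];
    const uint32_t old_index = model.index_stream[i];
    assert(new_index < count);
    if (old_index < count && !taken[old_index] &&
        new_to_old.count(new_index) == 0) {
      new_to_old[new_index] = old_index;
      taken[old_index] = true;
    }
  }

  // Unmatched indices receive whatever numbers are still free.
  uint32_t next_free = 0;
  for (uint32_t new_index = 0; new_index < count; ++new_index) {
    if (new_to_old.count(new_index) != 0) continue;
    while (taken[next_free]) ++next_free;
    new_to_old[new_index] = next_free;
    taken[next_free] = true;
  }

  // Apply the renumbering to both the address table and the stream.
  std::vector<uint32_t> addresses(count);
  for (uint32_t new_index = 0; new_index < count; ++new_index) {
    addresses[new_to_old[new_index]] = program->index_to_address[new_index];
  }
  program->index_to_address = addresses;
  for (uint32_t& index : program->index_stream) {
    assert(index < count);
    index = new_to_old[index];
  }
}
```

For example, if the old program has the stream `0, 1, 0` and the new program, in which the two targets were numbered the other way round, has `1, 0, 1`, the adjusted new stream becomes `0, 1, 0`. Only the index-to-address table still differs between the two programs.
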
### Encode

- This step takes an AssemblyProgram object and encodes both the
  instructions and the mapping of indices to addresses as byte
  vectors.  This format can be written to a file directly, and is also
  more appropriate for bsdiffing.  It is done by
  AssemblyProgram.Encode().

- encoded\_program.h:EncodedProgram defines the binary format and a
  WriteTo method that writes to a file.

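As a rough illustration of encoding into byte vectors, the sketch below serializes an index stream and an index-to-address table into two separate vectors. The layout (fixed-width little-endian 32-bit values) and the names `ToyEncodedProgram`, `AppendLittleEndian32`, and `ToyEncode` are assumptions made for this example; the real format is whatever encoded\_program.h:EncodedProgram defines.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy container; the real EncodedProgram format differs.
struct ToyEncodedProgram {
  std::vector<uint8_t> instruction_bytes;  // serialized index stream
  std::vector<uint8_t> address_bytes;      // serialized index -> address table
};

// Appends `value` to `out` as four bytes, least significant byte first.
void AppendLittleEndian32(uint32_t value, std::vector<uint8_t>* out) {
  for (int shift = 0; shift < 32; shift += 8) {
    out->push_back(static_cast<uint8_t>((value >> shift) & 0xFF));
  }
}

// Serializes the instructions and the address mapping into separate vectors.
ToyEncodedProgram ToyEncode(const std::vector<uint32_t>& index_stream,
                            const std::vector<uint32_t>& index_to_address) {
  ToyEncodedProgram encoded;
  for (uint32_t index : index_stream) {
    AppendLittleEndian32(index, &encoded.instruction_bytes);
  }
  for (uint32_t address : index_to_address) {
    AppendLittleEndian32(address, &encoded.address_bytes);
  }
  return encoded;
}
```

Keeping the instructions and the address table in separate vectors means that a change in where a function lives only touches the address vector, which is one plausible reading of why this form is described as more appropriate for bsdiffing.
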
### bsdiff

- simple\_delta.cc:GenerateSimpleDelta

Patch Application
-----------------

![Patch Application](application.png)

- courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch
  application by calling ensemble\_apply.cc:ApplyEnsemblePatch.

- ensemble\_apply.cc:ApplyEnsemblePatch reads and verifies the
  patch's header, then calls the overloaded version of
  ensemble\_apply.cc:ApplyEnsemblePatch.

- The patch is read into an ensemble\_apply.cc:EnsemblePatchApplication
  object, which generates a set of patcher\_x86\_32.h:PatcherX86\_32
  objects for the sections in the patch.

- The original file is disassembled and encoded via a call to
  EnsemblePatchApplication.TransformUp, which in turn calls
  patcher\_x86\_32.h:PatcherX86\_32.Transform.

- The transformed file is then bspatched via
  EnsemblePatchApplication.SubpatchTransformedElements, which calls
  EnsemblePatchApplication.SubpatchStreamSets, which calls
  simple\_delta.cc:ApplySimpleDelta, Courgette's built-in
  implementation of bspatch.

- Finally, EnsemblePatchApplication.TransformDown assembles the
  patched binary data, i.e., reverses the encoding and disassembly.
  This is done by calling PatcherX86\_32.Reform, which in turn calls
  the global function encoded\_program.cc:Assemble, which calls
  EncodedProgram.AssembleTo.

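The final assemble step, which turns indices back into actual addresses, can be sketched as follows. This is a toy under stated assumptions: both inputs are taken to be byte vectors of little-endian 32-bit values, which is a hypothetical layout and not EncodedProgram's real format, and `ReadLittleEndian32` and `ToyAssemble` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads four bytes starting at `offset`, least significant byte first.
uint32_t ReadLittleEndian32(const std::vector<uint8_t>& bytes,
                            std::size_t offset) {
  assert(offset + 4 <= bytes.size());
  uint32_t value = 0;
  for (int i = 0; i < 4; ++i) {
    value |= static_cast<uint32_t>(bytes[offset + i]) << (8 * i);
  }
  return value;
}

// Decodes the index -> address table, then replaces every index in the
// instruction bytes with the address it stands for.
std::vector<uint32_t> ToyAssemble(const std::vector<uint8_t>& instruction_bytes,
                                  const std::vector<uint8_t>& address_bytes) {
  std::vector<uint32_t> index_to_address;
  for (std::size_t offset = 0; offset + 4 <= address_bytes.size();
       offset += 4) {
    index_to_address.push_back(ReadLittleEndian32(address_bytes, offset));
  }

  std::vector<uint32_t> addresses;
  for (std::size_t offset = 0; offset + 4 <= instruction_bytes.size();
       offset += 4) {
    const uint32_t index = ReadLittleEndian32(instruction_bytes, offset);
    assert(index < index_to_address.size());
    addresses.push_back(index_to_address[index]);
  }
  return addresses;
}
```

In this toy, an index stream of `0, 1, 0` with the table `0x1000, 0x0800` assembles back to the references `0x1000, 0x0800, 0x1000`.
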

Glossary
--------

**Adjust**: Reassign address indices in the new program to match more
  closely those from the old.

**Assembly program**: The output of _disassembly_.  Contains a list of
  _Courgette instructions_ and an index of branch target addresses.

**Assemble**: Convert an _assembly program_ back into an object file
  by evaluating the _Courgette instructions_ and leaving the machine
  instructions in place.

**Courgette instruction**: Replaces machine instructions in the
  program.  Courgette instructions replace branches with an index to
  the target addresses and replace part of the relocation table.

**Disassembler**: Takes a binary file and produces an _assembly
  program_.

**Encode**: Convert an _assembly program_ into an _encoded program_ by
  serializing its data structures into byte vectors more appropriate
  for storage in a file.

**Encoded program**: The output of encoding.

**Ensemble**: A Courgette-style patch containing sections for the list
  of branch addresses and the encoded program.  It supports patching
  multiple object files at once.

**Opcode**: The number corresponding to either a machine or _Courgette
  instruction_.
