<h1>Courgette Internals</h1>

<h2>Patch Generation</h2>

<p><img src="generation.png" alt="Patch Generation" title="" /></p>

<ul>
<li><p>courgette_tool.cc:GenerateEnsemblePatch kicks off patch
generation by calling ensemble_create.cc:GenerateEnsemblePatch.</p></li>
<li><p>The input files are read through courgette::SourceStream objects.</p></li>
<li><p>ensemble_create.cc:GenerateEnsemblePatch uses FindGenerators, which
uses MakeGenerator to create
patch_generator_x86_32.h:PatchGeneratorX86_32 objects.</p></li>
<li><p>PatchGeneratorX86_32's Transform method transforms the input file
using Courgette's core techniques, which make the bsdiff delta
smaller.  It takes the following steps:</p>

<ul>
<li><p><em>disassemble</em> the old and new binaries into AssemblyProgram
objects,</p></li>
<li><p><em>adjust</em> the new AssemblyProgram object, and</p></li>
<li><p><em>encode</em> the AssemblyProgram objects back into raw bytes.</p></li>
</ul></li>
</ul>
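<p>The control flow of the transform can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not Courgette's code. TransformStages and TransformForDiff are hypothetical names, and the three stages are passed in as callbacks, whereas in Courgette PatchGeneratorX86_32 invokes them itself. The sketch shows the order of operations: both binaries are disassembled, only the new program is adjusted (with the old one as the model), and both are encoded before the two byte streams are handed to bsdiff.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical stage interface standing in for the disassembler,
// adjustment_method.cc:Adjust() and AssemblyProgram.Encode().
template <typename Program>
struct TransformStages {
  std::function<Program(const Bytes&)> disassemble;
  // Rewrites new_program's address indices to resemble old_model's.
  std::function<void(const Program& old_model, Program& new_program)> adjust;
  std::function<Bytes(const Program&)> encode;
};

// Returns the (old, new) byte streams that would be fed to bsdiff.
template <typename Program>
std::pair<Bytes, Bytes> TransformForDiff(const Bytes& old_file,
                                         const Bytes& new_file,
                                         const TransformStages<Program>& stages) {
  assert(stages.disassemble && stages.adjust && stages.encode);
  Program old_program = stages.disassemble(old_file);
  Program new_program = stages.disassemble(new_file);
  // Only the new program is adjusted; the old program serves as the model.
  stages.adjust(old_program, new_program);
  return {stages.encode(old_program), stages.encode(new_program)};
}
```

<p>Because the stages are plain callables, the ordering can be exercised with toy stages (for example, identity functions) without a real executable.</p>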

<h3>Disassemble</h3>

<ul>
<li><p>The input is a pointer to a buffer containing the raw bytes of the
input file.</p></li>
<li><p>Disassembly converts certain machine instructions that reference
addresses to Courgette instructions.  It is not true disassembly,
but this is the term the code base uses.  Specifically, it detects
instructions that use absolute addresses given by the binary file's
relocation table, and relative addresses used in relative
branches.</p></li>
<li><p>This is done by disassemble:ParseDetectedExecutable, which selects the
appropriate Disassembler subclass by looking at the binary file's
headers.</p>

<ul>
<li><p>disassembler_win32_x86.h defines the PE/COFF x86 disassembler.</p></li>
<li><p>disassembler_elf_32_x86.h defines the ELF 32-bit x86 disassembler.</p></li>
<li><p>disassembler_elf_32_arm.h defines the ELF 32-bit ARM disassembler.</p></li>
</ul></li>
<li><p>The Disassembler replaces the relocation table with a Courgette
instruction that can regenerate it.</p></li>
<li><p>The Disassembler builds a list of the addresses referenced by the
machine code and numbers each one.</p></li>
<li><p>The Disassembler replaces each address used in machine instructions
with its index number.</p></li>
<li><p>The output is an assembly_program.h:AssemblyProgram object, which
contains a list of instructions (machine or Courgette) and a mapping
of indices to actual addresses.</p></li>
</ul>
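<p>The numbering of referenced addresses can be illustrated with a small, simplified sketch. Instruction, AssemblyProgramSketch and NumberAddresses are hypothetical names. The sketch assumes relative-branch targets have already been resolved to absolute addresses. It also numbers addresses in order of first use, which is an assumption for illustration rather than a description of how Courgette assigns indices.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified instruction: an opcode plus an optional address
// operand. Relative-branch targets are assumed to be resolved to absolute
// addresses already.
struct Instruction {
  uint8_t opcode;
  bool has_address;
  uint32_t operand;  // an address before numbering, an index afterwards
};

// Hypothetical stand-in for AssemblyProgram: instructions whose address
// operands hold indices, plus the index -> address table.
struct AssemblyProgramSketch {
  std::vector<Instruction> instructions;
  std::vector<uint32_t> addresses;  // addresses[index] == actual address
};

// Numbers each distinct address in order of first use and replaces every
// address operand with its index.
AssemblyProgramSketch NumberAddresses(const std::vector<Instruction>& code) {
  AssemblyProgramSketch program;
  std::map<uint32_t, uint32_t> index_of;  // address -> index
  for (Instruction insn : code) {         // copy, so operand can be rewritten
    if (insn.has_address) {
      auto it = index_of.find(insn.operand);
      if (it == index_of.end()) {
        const auto index = static_cast<uint32_t>(program.addresses.size());
        it = index_of.emplace(insn.operand, index).first;
        program.addresses.push_back(insn.operand);
      }
      insn.operand = it->second;
    }
    program.instructions.push_back(insn);
  }
  assert(program.instructions.size() == code.size());
  return program;
}
```

<p>Two calls to the same target end up with the same index, so the instruction stream no longer contains the address itself. Only the address table does.</p>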

<h3>Adjust</h3>

<ul>
<li><p>This step takes the AssemblyProgram for the new file and reassigns
the indices that map to actual addresses.  It is performed by
adjustment_method.cc:Adjust().</p></li>
<li><p>The goal is to match the indices from the old program to the new
program as closely as possible.</p></li>
<li><p>When the indices are matched correctly, machine instructions that
jump to the same function in both the new and old binaries will look
the same to bsdiff, even if the function is located in a different
part of the binary.</p></li>
</ul>
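<p>The effect of adjustment on the index table can be sketched as a remapping step. This is a hypothetical simplification. AdjustIndices receives a ready-made correspondence between new and old addresses (match). The real adjustment_method.cc:Adjust() has to infer that correspondence from the programs themselves. In the sketch, matched addresses reuse the old index and unmatched ones receive fresh indices above the old range.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Returns remap, where remap[i] is the adjusted index for entry i of the new
// program's address table. `match` maps a new address to the old address it
// is believed to correspond to; it is a hypothetical input here, since the
// real Adjust() must infer such a correspondence itself.
std::vector<uint32_t> AdjustIndices(const std::vector<uint32_t>& old_addresses,
                                    const std::vector<uint32_t>& new_addresses,
                                    const std::map<uint32_t, uint32_t>& match) {
  std::map<uint32_t, uint32_t> old_index;  // old address -> old index
  for (uint32_t i = 0; i < old_addresses.size(); ++i) {
    old_index[old_addresses[i]] = i;
  }

  std::vector<uint32_t> remap(new_addresses.size());
  std::vector<bool> taken(old_addresses.size(), false);  // each old index used once
  auto next_free = static_cast<uint32_t>(old_addresses.size());

  for (uint32_t i = 0; i < new_addresses.size(); ++i) {
    auto m = match.find(new_addresses[i]);
    if (m != match.end()) {
      auto o = old_index.find(m->second);
      if (o != old_index.end() && !taken[o->second]) {
        taken[o->second] = true;
        remap[i] = o->second;  // reuse the old index for the matched address
        continue;
      }
    }
    remap[i] = next_free++;  // unmatched: fresh index above the old range
  }
  return remap;
}
```

<p>Suppose a function moves from 0x2000 (old index 1) to 0x2400 and the two addresses are matched. Calls to it in the new program then carry index 1, as in the old program, so those call sites produce identical bytes for bsdiff.</p>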

<h3>Encode</h3>

<ul>
<li><p>This step takes an AssemblyProgram object and encodes both the
instructions and the mapping of indices to addresses as byte
vectors.  This format can be written to a file directly, and is also
more appropriate for bsdiffing.  It is done by
AssemblyProgram.Encode().</p></li>
<li><p>encoded_program.h:EncodedProgram defines the binary format and a
WriteTo method that writes it to a file.</p></li>
</ul>
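<p>One plausible way to serialize an assembly program into byte vectors is sketched below. The stream layout is an assumption made for illustration: separate vectors hold the opcodes, the operand indices and the address table. The use of LEB128 varints is also an assumption. EncodedProgram's actual format is defined in encoded_program.h and may differ. AppendVarint32, EncodedStreams and EncodeSketch are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Appends `value` as an unsigned LEB128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
void AppendVarint32(Bytes& out, uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Simplified instruction shape: address operands already hold indices into
// the address table.
struct Instruction {
  uint8_t opcode;
  bool has_address;
  uint32_t operand;
};

struct EncodedStreams {
  Bytes opcodes;    // one byte per instruction
  Bytes indices;    // one varint per address-bearing instruction
  Bytes addresses;  // one varint per address-table entry, in index order
};

EncodedStreams EncodeSketch(const std::vector<Instruction>& instructions,
                            const std::vector<uint32_t>& address_table) {
  EncodedStreams out;
  for (const Instruction& insn : instructions) {
    out.opcodes.push_back(insn.opcode);
    if (insn.has_address) {
      assert(insn.operand < address_table.size());  // index must be valid
      AppendVarint32(out.indices, insn.operand);
    }
  }
  for (uint32_t address : address_table) {
    AppendVarint32(out.addresses, address);
  }
  return out;
}
```

<p>In this sketch, each kind of data is kept in its own stream. The intent is that indices left unchanged by adjustment produce identical byte runs in the old and new encodings, which bsdiff can match.</p>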

<h3>bsdiff</h3>

<ul>
<li>simple_delta.cc:GenerateSimpleDelta</li>
</ul>

<h2>Patch Application</h2>

<p><img src="application.png" alt="Patch Application" title="" /></p>

<ul>
<li><p>courgette_tool.cc:ApplyEnsemblePatch kicks off patch application
by calling ensemble_apply.cc:ApplyEnsemblePatch.</p></li>
<li><p>ensemble_apply.cc:ApplyEnsemblePatch reads and verifies the
patch's header, then calls the overloaded version of
ensemble_apply.cc:ApplyEnsemblePatch.</p></li>
<li><p>The patch is read into an ensemble_apply.cc:EnsemblePatchApplication
object, which generates a set of patcher_x86_32.h:PatcherX86_32
objects for the sections in the patch.</p></li>
<li><p>The original file is disassembled and encoded via a call to
EnsemblePatchApplication.TransformUp, which in turn calls
patcher_x86_32.h:PatcherX86_32.Transform.</p></li>
<li><p>The transformed file is then bspatched via
EnsemblePatchApplication.SubpatchTransformedElements, which calls
EnsemblePatchApplication.SubpatchStreamSets, which calls
simple_delta.cc:ApplySimpleDelta, Courgette's built-in
implementation of bspatch.</p></li>
<li><p>Finally, EnsemblePatchApplication.TransformDown assembles the
patched data, i.e., it reverses the encoding and disassembly.  This is
done by calling PatcherX86_32.Reform, which in turn calls the global
function encoded_program.cc:Assemble, which calls
EncodedProgram.AssembleTo.</p></li>
</ul>
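<p>The three phases of patch application can be summarized with a sketch that mirrors the generation side. ApplyStages and ApplyPatchSketch are hypothetical names. The callbacks stand in for PatcherX86_32.Transform, ApplySimpleDelta and PatcherX86_32.Reform, and the sketch ignores ensembles with multiple elements. The point it illustrates is that the delta is applied to the transformed old file, not to the raw binary, and the result is then assembled back into an executable.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical stage callbacks standing in for PatcherX86_32.Transform
// (disassemble + encode), ApplySimpleDelta (bspatch) and
// PatcherX86_32.Reform (assemble).
struct ApplyStages {
  std::function<Bytes(const Bytes& old_file)> transform_up;
  std::function<Bytes(const Bytes& base, const Bytes& delta)> bspatch;
  std::function<Bytes(const Bytes& encoded)> transform_down;
};

Bytes ApplyPatchSketch(const Bytes& old_file, const Bytes& delta,
                       const ApplyStages& s) {
  assert(s.transform_up && s.bspatch && s.transform_down);
  // 1. Bring the old file into the same encoded form used at generation time.
  Bytes transformed_old = s.transform_up(old_file);
  // 2. Apply the bsdiff-style delta to the encoded bytes.
  Bytes transformed_new = s.bspatch(transformed_old, delta);
  // 3. Assemble the patched encoding back into the new binary.
  return s.transform_down(transformed_new);
}
```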

<h2>Glossary</h2>

<p><strong>Adjust</strong>: Reassign address indices in the new program to match
  those from the old program more closely.</p>

<p><strong>Assembly program</strong>: The output of <em>disassembly</em>.  Contains a list of
  <em>Courgette instructions</em> and an index of branch target addresses.</p>

<p><strong>Assemble</strong>: Convert an <em>assembly program</em> back into an object file
  by evaluating the <em>Courgette instructions</em> and leaving the machine
  instructions in place.</p>

<p><strong>Courgette instruction</strong>: Replaces machine instructions in the
  program.  Courgette instructions replace branch target addresses with
  indices into the table of target addresses, and replace part of the
  relocation table.</p>

<p><strong>Disassembler</strong>: Takes a binary file and produces an <em>assembly
  program</em>.</p>

<p><strong>Encode</strong>: Convert an <em>assembly program</em> into an <em>encoded program</em> by
  serializing its data structures into byte vectors more appropriate
  for storage in a file.</p>

<p><strong>Encoded program</strong>: The output of <em>encoding</em>.</p>

<p><strong>Ensemble</strong>: A Courgette-style patch containing sections for the list
  of branch addresses and the encoded program.  It supports patching
  multiple object files at once.</p>

<p><strong>Opcode</strong>: The number corresponding to either a machine or <em>Courgette
  instruction</em>.</p>