X86Disassembler.h revision 9899f70a7406d632c82849978bf6981f1ee4ccb5
//===- X86Disassembler.h - Disassembler for x86 and x86_64 ------*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// The X86 disassembler is a table-driven disassembler for the 16-, 32-, and
// 64-bit X86 instruction sets. The main decode sequence for an assembly
// instruction in this disassembler is:
//
//   1. Read the prefix bytes and determine the attributes of the instruction.
//      These attributes, recorded in enum attributeBits
//      (X86DisassemblerDecoderCommon.h), form a bitmask. The table
//      CONTEXTS_SYM provides a mapping from bitmasks to contexts, which are
//      represented by enum InstructionContext (ibid.).
//
//   2. Read the opcode, and determine what kind of opcode it is. The
//      disassembler distinguishes four kinds of opcodes, which are enumerated
//      in OpcodeType (X86DisassemblerDecoderCommon.h): one-byte (0xnn),
//      two-byte (0x0f 0xnn), three-byte-38 (0x0f 0x38 0xnn), or three-byte-3a
//      (0x0f 0x3a 0xnn). Mandatory prefixes are treated as part of the
//      context.
//
//   3. Depending on the opcode type, look in one of four ClassDecision
//      structures (X86DisassemblerDecoderCommon.h). Use the opcode class to
//      determine which OpcodeDecision (ibid.) to look the opcode up in. Look
//      up the opcode to get a ModRMDecision (ibid.).
//
//   4. Some instructions, such as escape opcodes or extended opcodes, or even
//      instructions that have ModRM*Reg / ModRM*Mem forms in LLVM, need the
//      ModR/M byte to complete decoding. The ModRMDecision's type is an entry
//      from ModRMDecisionType (X86DisassemblerDecoderCommon.h) that indicates
//      whether the ModR/M byte is required and how to interpret it.
//
//   5. After resolving the ModRMDecision, the disassembler has a unique ID
//      of type InstrUID (X86DisassemblerDecoderCommon.h). Looking this ID up
//      in INSTRUCTIONS_SYM yields the name of the instruction and the
//      encodings and meanings of its operands.
//
//   6. For each operand, its encoding is an entry from OperandEncoding
//      (X86DisassemblerDecoderCommon.h) and its type is an entry from
//      OperandType (ibid.). The encoding indicates how to read it from the
//      instruction; the type indicates how to interpret the value once it has
//      been read. For example, a register operand could be stored in the R/M
//      field of the ModR/M byte, the REG field of the ModR/M byte, or added to
//      the main opcode. This is orthogonal to its meaning (a GPR or an XMM
//      register, for instance). Given this information, the operands can be
//      extracted and interpreted.
//
//   7. As the last step, the disassembler translates the instruction
//      information and operands into a format understandable by the client -
//      in this case, an MCInst for use by the MC infrastructure.
//
// The disassembler is broken broadly into two parts: the table emitter that
// emits the instruction decode tables discussed above during compilation, and
// the disassembler itself. The table emitter is documented in more detail in
// utils/TableGen/X86DisassemblerEmitter.h.
//
// X86Disassembler.h contains the public interface for the disassembler,
//   adhering to the MCDisassembler interface.
// X86Disassembler.cpp contains the code responsible for step 7, and for
//   invoking the decoder to execute steps 1-6.
// X86DisassemblerDecoderCommon.h contains the definitions needed by both the
//   table emitter and the disassembler.
// X86DisassemblerDecoder.h contains the public interface of the decoder,
//   factored out into C for possible use by other projects.
// X86DisassemblerDecoder.c contains the source code of the decoder, which is
//   responsible for steps 1-6.
//
//===----------------------------------------------------------------------===//

#ifndef X86DISASSEMBLER_H
#define X86DISASSEMBLER_H

#define INSTRUCTION_SPECIFIER_FIELDS  \
  const char*             name;

#define INSTRUCTION_IDS               \
  InstrUID*  instructionIDs;

#include "X86DisassemblerDecoderCommon.h"

#undef INSTRUCTION_SPECIFIER_FIELDS
#undef INSTRUCTION_IDS

#include "llvm/MC/MCDisassembler.h"

struct InternalInstruction;

namespace llvm {

class MCInst;
class MemoryObject;
class raw_ostream;

struct EDInstInfo;

namespace X86Disassembler {

/// X86GenericDisassembler - Generic disassembler for all X86 platforms.
///   Each platform class only needs to derive from this class and pass a
///   different DisassemblerMode value to its constructor.
class X86GenericDisassembler : public MCDisassembler {
protected:
  /// Constructor     - Initializes the disassembler.
  ///
  /// @param mode     - The X86 architecture mode to decode for.
  X86GenericDisassembler(DisassemblerMode mode);
public:
  ~X86GenericDisassembler();

  /// getInstruction - See MCDisassembler.
  bool getInstruction(MCInst &instr,
                      uint64_t &size,
                      const MemoryObject &region,
                      uint64_t address,
                      raw_ostream &vStream) const;

  /// getEDInfo - See MCDisassembler.
  EDInstInfo *getEDInfo() const;
private:
  DisassemblerMode              fMode;
};

/// X86_16Disassembler - 16-bit X86 disassembler.
class X86_16Disassembler : public X86GenericDisassembler {
public:
  X86_16Disassembler() :
    X86GenericDisassembler(MODE_16BIT) {
  }
};

/// X86_32Disassembler - 32-bit X86 disassembler.
class X86_32Disassembler : public X86GenericDisassembler {
public:
  X86_32Disassembler() :
    X86GenericDisassembler(MODE_32BIT) {
  }
};

/// X86_64Disassembler - 64-bit X86 disassembler.
class X86_64Disassembler : public X86GenericDisassembler {
public:
  X86_64Disassembler() :
    X86GenericDisassembler(MODE_64BIT) {
  }
};

} // namespace X86Disassembler

} // namespace llvm

#endif
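
// Illustrative sketch, not part of the LLVM sources: the opcode classification
// described in step 2 of the file comment can be pictured roughly as below.
// SketchOpcodeType, SketchOpcode and sketchClassifyOpcode are hypothetical
// names invented for this example; the real enumeration is OpcodeType in
// X86DisassemblerDecoderCommon.h and the real decoder may differ in detail.
// The sketch assumes all prefix bytes (step 1) have already been consumed.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the four opcode kinds named in step 2.
enum SketchOpcodeType {
  SKETCH_ONEBYTE,       // 0xnn
  SKETCH_TWOBYTE,       // 0x0f 0xnn
  SKETCH_THREEBYTE_38,  // 0x0f 0x38 0xnn
  SKETCH_THREEBYTE_3A   // 0x0f 0x3a 0xnn
};

// Result of classifying one opcode.
struct SketchOpcode {
  SketchOpcodeType type;
  uint8_t opcode;  // the final 0xnn byte
  size_t length;   // escape bytes plus the opcode byte
};

// Classifies the opcode that starts at bytes[0]. Returns false if the buffer
// ends before the opcode is complete; in that case 'out' is left unchanged.
inline bool sketchClassifyOpcode(const uint8_t *bytes, size_t size,
                                 SketchOpcode &out) {
  if (size < 1)
    return false;

  // Anything other than the 0x0f escape byte is a one-byte opcode.
  if (bytes[0] != 0x0f) {
    out.type = SKETCH_ONEBYTE;
    out.opcode = bytes[0];
    out.length = 1;
    return true;
  }

  if (size < 2)
    return false;

  // 0x0f 0x38 and 0x0f 0x3a introduce the two three-byte opcode maps.
  if (bytes[1] == 0x38 || bytes[1] == 0x3a) {
    if (size < 3)
      return false;
    out.type = (bytes[1] == 0x38) ? SKETCH_THREEBYTE_38 : SKETCH_THREEBYTE_3A;
    out.opcode = bytes[2];
    out.length = 3;
    return true;
  }

  // Any other byte after 0x0f is a two-byte opcode.
  out.type = SKETCH_TWOBYTE;
  out.opcode = bytes[1];
  out.length = 2;
  return true;
}
```

// For instance, under these assumptions the byte sequence 0x0f 0x38 0x00 would
// classify as SKETCH_THREEBYTE_38 with opcode 0x00 and length 3. The opcode
// type then selects which decision table to consult in step 3.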