X86Disassembler.h revision 98c5ddabca1debf935a07d14d0cbc9732374bdb8
//===- X86Disassembler.h - Disassembler for x86 and x86_64 ------*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// The X86 disassembler is a table-driven disassembler for the 16-, 32-, and
// 64-bit X86 instruction sets. The main decode sequence for an assembly
// instruction in this disassembler is:
//
// 1. Read the prefix bytes and determine the attributes of the instruction.
//    These attributes, recorded in enum attributeBits
//    (X86DisassemblerDecoderCommon.h), form a bitmask. The table CONTEXTS_SYM
//    provides a mapping from bitmasks to contexts, which are represented by
//    enum InstructionContext (ibid.).
//
// 2. Read the opcode, and determine what kind of opcode it is. The
//    disassembler distinguishes four kinds of opcodes, which are enumerated in
//    OpcodeType (X86DisassemblerDecoderCommon.h): one-byte (0xnn), two-byte
//    (0x0f 0xnn), three-byte-38 (0x0f 0x38 0xnn), or three-byte-3a
//    (0x0f 0x3a 0xnn). Mandatory prefixes are treated as part of the context.
//
// 3. Depending on the opcode type, look in one of four ClassDecision structures
//    (X86DisassemblerDecoderCommon.h). Use the opcode class to determine which
//    OpcodeDecision (ibid.) to look the opcode up in. Look up the opcode to
//    get a ModRMDecision (ibid.).
//
// 4. Some instructions, such as escape opcodes or extended opcodes, or even
//    instructions that have ModRM*Reg / ModRM*Mem forms in LLVM, need the
//    ModR/M byte to complete decode. The ModRMDecision's type is an entry from
//    ModRMDecisionType (X86DisassemblerDecoderCommon.h) that indicates if the
//    ModR/M byte is required and how to interpret it.
//
// 5. After resolving the ModRMDecision, the disassembler has a unique ID
//    of type InstrUID (X86DisassemblerDecoderCommon.h). Looking this ID up in
//    INSTRUCTIONS_SYM yields the name of the instruction and the encodings and
//    meanings of its operands.
//
// 6. For each operand, its encoding is an entry from OperandEncoding
//    (X86DisassemblerDecoderCommon.h) and its type is an entry from
//    OperandType (ibid.). The encoding indicates how to read it from the
//    instruction; the type indicates how to interpret the value once it has
//    been read. For example, a register operand could be stored in the R/M
//    field of the ModR/M byte, the REG field of the ModR/M byte, or added to
//    the main opcode. This is orthogonal to its meaning (a GPR or an XMM
//    register, for instance). Given this information, the operands can be
//    extracted and interpreted.
//
// 7. As the last step, the disassembler translates the instruction information
//    and operands into a format understandable by the client - in this case, an
//    MCInst for use by the MC infrastructure.
//
// The disassembler is broken broadly into two parts: the table emitter that
// emits the instruction decode tables discussed above during compilation, and
// the disassembler itself. The table emitter is documented in more detail in
// utils/TableGen/X86DisassemblerEmitter.h.
//
// X86Disassembler.h contains the public interface for the disassembler,
//   adhering to the MCDisassembler interface.
// X86Disassembler.cpp contains the code responsible for step 7, and for
//   invoking the decoder to execute steps 1-6.
// X86DisassemblerDecoderCommon.h contains the definitions needed by both the
//   table emitter and the disassembler.
// X86DisassemblerDecoder.h contains the public interface of the decoder,
//   factored out into C for possible use by other projects.
// X86DisassemblerDecoder.c contains the source code of the decoder, which is
//   responsible for steps 1-6.
//
//===----------------------------------------------------------------------===//

#ifndef X86DISASSEMBLER_H
#define X86DISASSEMBLER_H

#define INSTRUCTION_SPECIFIER_FIELDS \
  const char* name;

#define INSTRUCTION_IDS \
  const InstrUID *instructionIDs;

#include "X86DisassemblerDecoderCommon.h"

#undef INSTRUCTION_SPECIFIER_FIELDS
#undef INSTRUCTION_IDS

#include "llvm/MC/MCDisassembler.h"

struct InternalInstruction;

namespace llvm {

class MCInst;
class MCSubtargetInfo;
class MemoryObject;
class raw_ostream;

struct EDInstInfo;

namespace X86Disassembler {

/// X86GenericDisassembler - Generic disassembler for all X86 platforms.
///   Each platform class should only need to provide a constructor that passes
///   a different DisassemblerMode value.
class X86GenericDisassembler : public MCDisassembler {
protected:
  /// Constructor - Initializes the disassembler.
  ///
  /// @param STI  - The subtarget information.
  /// @param mode - The X86 architecture mode to decode for.
  X86GenericDisassembler(const MCSubtargetInfo &STI, DisassemblerMode mode);
public:
  ~X86GenericDisassembler();

  /// getInstruction - See MCDisassembler.
  DecodeStatus getInstruction(MCInst &instr,
                              uint64_t &size,
                              const MemoryObject &region,
                              uint64_t address,
                              raw_ostream &vStream,
                              raw_ostream &cStream) const;

  /// getEDInfo - See MCDisassembler.
  EDInstInfo *getEDInfo() const;
private:
  DisassemblerMode fMode;
};

/// X86_16Disassembler - 16-bit X86 disassembler.
class X86_16Disassembler : public X86GenericDisassembler {
public:
  X86_16Disassembler(const MCSubtargetInfo &STI) :
    X86GenericDisassembler(STI, MODE_16BIT) {
  }
};

/// X86_32Disassembler - 32-bit X86 disassembler.
class X86_32Disassembler : public X86GenericDisassembler {
public:
  X86_32Disassembler(const MCSubtargetInfo &STI) :
    X86GenericDisassembler(STI, MODE_32BIT) {
  }
};

/// X86_64Disassembler - 64-bit X86 disassembler.
class X86_64Disassembler : public X86GenericDisassembler {
public:
  X86_64Disassembler(const MCSubtargetInfo &STI) :
    X86GenericDisassembler(STI, MODE_64BIT) {
  }
};

} // namespace X86Disassembler

} // namespace llvm

#endif