//===-- X86Disassembler.h - Disassembler for x86 and x86_64 -----*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// The X86 disassembler is a table-driven disassembler for the 16-, 32-, and
// 64-bit X86 instruction sets. The main decode sequence for an assembly
// instruction in this disassembler is:
//
//   1. Read the prefix bytes and determine the attributes of the instruction.
//      These attributes, recorded in enum attributeBits
//      (X86DisassemblerDecoderCommon.h), form a bitmask. The table CONTEXTS_SYM
//      provides a mapping from bitmasks to contexts, which are represented by
//      enum InstructionContext (ibid.).
//
//   2. Read the opcode, and determine what kind of opcode it is.
//      The disassembler distinguishes four kinds of opcodes, which are
//      enumerated in OpcodeType (X86DisassemblerDecoderCommon.h): one-byte
//      (0xnn), two-byte (0x0f 0xnn), three-byte-38 (0x0f 0x38 0xnn), or
//      three-byte-3a (0x0f 0x3a 0xnn). Mandatory prefixes are treated as part
//      of the context.
//
//   3. Depending on the opcode type, look in one of four ClassDecision
//      structures (X86DisassemblerDecoderCommon.h). Use the opcode class to
//      determine which OpcodeDecision (ibid.) to look the opcode up in. Look
//      up the opcode to get a ModRMDecision (ibid.).
//
//   4. Some instructions, such as escape opcodes or extended opcodes, or even
//      instructions that have ModRM*Reg / ModRM*Mem forms in LLVM, need the
//      ModR/M byte to complete decode. The ModRMDecision's type is an entry
//      from ModRMDecisionType (X86DisassemblerDecoderCommon.h) that indicates
//      whether the ModR/M byte is required and how to interpret it.
//
//   5. After resolving the ModRMDecision, the disassembler has a unique ID
//      of type InstrUID (X86DisassemblerDecoderCommon.h). Looking this ID up in
//      INSTRUCTIONS_SYM yields the name of the instruction and the encodings
//      and meanings of its operands.
//
//   6. For each operand, its encoding is an entry from OperandEncoding
//      (X86DisassemblerDecoderCommon.h) and its type is an entry from
//      OperandType (ibid.). The encoding indicates how to read it from the
//      instruction; the type indicates how to interpret the value once it has
//      been read. For example, a register operand could be stored in the R/M
//      field of the ModR/M byte, the REG field of the ModR/M byte, or added to
//      the main opcode. This is orthogonal to its meaning (a GPR or an XMM
//      register, for instance). Given this information, the operands can be
//      extracted and interpreted.
//
//   7. As the last step, the disassembler translates the instruction
//      information and operands into a format understandable by the client -
//      in this case, an MCInst for use by the MC infrastructure.
//
// The disassembler is broken broadly into two parts: the table emitter that
// emits the instruction decode tables discussed above during compilation, and
// the disassembler itself. The table emitter is documented in more detail in
// utils/TableGen/X86DisassemblerEmitter.h.
//
// X86Disassembler.h contains the public interface for the disassembler,
//   adhering to the MCDisassembler interface.
// X86Disassembler.cpp contains the code responsible for step 7, and for
//   invoking the decoder to execute steps 1-6.
// X86DisassemblerDecoderCommon.h contains the definitions needed by both the
//   table emitter and the disassembler.
// X86DisassemblerDecoder.h contains the public interface of the decoder,
//   factored out into C for possible use by other projects.
// X86DisassemblerDecoder.c contains the source code of the decoder, which is
//   responsible for steps 1-6.
//
//===----------------------------------------------------------------------===//

#ifndef X86DISASSEMBLER_H
#define X86DISASSEMBLER_H

#include "X86DisassemblerDecoderCommon.h"
#include "llvm/MC/MCDisassembler.h"

namespace llvm {

class MCInst;
class MCInstrInfo;
class MCSubtargetInfo;
class MemoryObject;
class raw_ostream;

namespace X86Disassembler {

/// X86GenericDisassembler - Generic disassembler for all X86 platforms.
/// All a platform class should have to do is subclass this class and
/// provide a different disassemblerMode value.
class X86GenericDisassembler : public MCDisassembler {
  std::unique_ptr<const MCInstrInfo> MII;

public:
  /// Constructor - Initializes the disassembler.
  ///
  X86GenericDisassembler(const MCSubtargetInfo &STI, MCContext &Ctx,
                         std::unique_ptr<const MCInstrInfo> MII);

  /// getInstruction - See MCDisassembler.
  DecodeStatus getInstruction(MCInst &instr, uint64_t &size,
                              const MemoryObject &region, uint64_t address,
                              raw_ostream &vStream,
                              raw_ostream &cStream) const override;

private:
  DisassemblerMode fMode;
};

} // namespace X86Disassembler

} // namespace llvm

#endif