Dynamic Re-Compilation of Binary RISC Code for CISC Architectures
Total Page:16
File Type:pdf, Size:1020Kb
d d d d d d d d d d d d d d d d d d d d Technische Universitat¨ Munchen¨ Institut fur¨ Informatik Diplomarbeit Dynamic Re-compilation of Binary RISC Code for CISC Architectures Verfasser: Michael Steil Betreuer: Dr. Georg Acher Aufgabensteller: Prof. Dr. A.Bode Abgabetermin: 15. September 2004 Ich versichere, daß ich diese Diplomarbeit selbstandig¨ verfaßt und keine anderen als die angegebenen Quellen und Hilfsmittel verwendet habe. Munchen,¨ den 15.09.2004 Abstract This thesis describes a dynamic binary translation system that has the following features: • RISC (PowerPC) machine code is translated into CISC (i386) code. Special prob- lems caused by this combination, such as register allocation and condition code con- version, are addressed. • A modified hotspot method is used to do recompilation in two translation passes in order to gain speed. • The first translation pass is optimized for maximum translation speed and designed to be only slightly slower than one interpretive run. • The system optimizes common cases found in compiled user mode code, such as certain stack operations. The idea of this system is not to develop a high-level decompiler/compiler combination, but to do translation on a very low level and on an instruction basis. The recompiler that has been developed contains • recompiler technology that only needs about 25% more time per instruction for trans- lation compared to a single interpretation, and produces code that is only 3 to 4 times slower than native code. • recompiler technology that only needs about double the time per instruction for trans- lation compared to a single interpretation, and produces code that only takes twice as long for execution as native code. • recompiler technology that, without using intermediate code, optimizes full functions and allocates registers dynamically; and achieves speed close to that of native code. I would like to thank Georg Acher (supervision), Christian Hessmann (discussions, proofreading), Melissa Mears (discussions, proofreading), Axel Auweter (discussions, support code). Daniel Lehmann (discussions, proofreading), Tobias Bratfisch (support code), Sebastian Biallas (discussions), Costis (discussions), Alexan- der Mayer (proofreading) and Wolfgang Zehtner (proofreading) for their support. Contents 1 Motivation 13 2 Introduction to Emulation, RISC and CISC 17 2.1 Introduction to Emulation and Recompilation . 17 2.1.1 CPU Emulation Accuracy . 19 2.1.2 System Emulation vs. User Mode/API Emulation . 20 2.1.2.1 System Emulation . 21 2.1.2.2 User Mode & API Emulation . 21 2.1.2.3 Advantages of the Respective Concepts . 22 2.1.3 Interpretation and Recompilation . 24 2.1.3.1 Interpretation . 24 2.1.3.2 Recompilation . 26 2.1.4 Static Recompilation vs. Dynamic Recompilation . 29 2.1.4.1 Static Recompilation . 29 2.1.4.2 Dynamic Recompilation . 31 2.1.5 Hotspot: Interpretation/Recompilation . 34 2.1.6 Liveness Analysis and Register Allocation . 35 2.1.6.1 Liveness Analysis . 36 2.1.6.2 Register Allocation . 38 2.2 Endianness . 40 2.3 Intel i386 . 41 2.3.1 CISC . 42 2.3.2 History . 43 2.3.3 AT&T and Intel Syntax . 46 2.3.4 Registers . 46 2.3.5 Addressing Modes . 47 2.3.5.1 Characteristics . 48 2.3.5.2 r/m Addressing . 48 2.3.6 Instruction Set . 49 2.3.7 Instruction Encoding . 50 2.3.8 Endianness . 50 2.3.9 Stack Frames and Calling Conventions . 50 2.4 PowerPC . 52 2.4.1 RISC . 52 2.4.2 History . 53 5 6 CONTENTS 2.4.3 Registers . 54 2.4.4 Instruction Set and Addressing Modes . 54 2.4.5 Instruction Encoding . 55 2.4.6 Endianness . 55 2.4.7 Stack Frames and Calling Conventions . 56 2.4.8 Unique PowerPC Characteristics . 57 3 Design 59 3.1 Differences between PowerPC and i386 . 59 3.1.1 Modern RISC and CISC CPUs . 59 3.1.2 Problems of RISC/CISC Recompilation . 60 3.2 Objectives and Design Fundamentals . 62 3.2.1 Possible Objectives . 62 3.2.2 How to Reconcile all Aims . 62 3.3 Design Details . 63 3.3.1 The Hotspot Principle Revisited . 64 3.3.2 Register Mapping . 68 3.3.2.1 Simple Candidates . 68 3.3.2.2 Details of Static Mapping . 74 3.3.3 Condition Code Mapping . 81 3.3.3.1 Parity Flag . 82 3.3.3.2 Conversion to Signed Using a Table . 83 3.3.3.3 Conversion to PowerPC Format . 84 3.3.3.4 Intermediate i386 Flags . 84 3.3.3.5 Memory Access Optimization . 84 3.3.3.6 The Final Code . 85 3.3.3.7 Compatibility Issues . 86 3.3.4 Endianness . 87 3.3.4.1 Do Nothing . 87 3.3.4.2 Byte Swap . 87 3.3.4.3 Swapped Memory . 89 3.3.4.4 Conclusion . 92 3.3.5 Instruction Recompiler . 93 3.3.5.1 Dispatcher . 93 3.3.5.2 Decoder . 93 3.3.5.3 i386 Converter . 94 3.3.5.4 Instruction Encoder . 95 3.3.5.5 Speed Considerations . 95 3.3.6 Basic Block Logic . 95 3.3.6.1 Basic Blocks . 95 3.3.6.2 Basic Block Cache . 96 3.3.6.3 Control Flow Instructions . 96 3.3.6.4 Basic Block Linking . 97 3.3.6.5 Interpreter Fallback . 98 3.3.7 Environment . 99 CONTENTS 7 3.3.7.1 Loader . 99 3.3.7.2 Memory Environment . 99 3.3.7.3 Disassembler . 100 3.3.7.4 Execution . 100 3.3.8 Pass 2 Design . 100 3.3.8.1 Register Mapping Problem . 101 3.3.8.2 Condition Code Optimization . 101 3.3.8.3 Link Register Inefficiency . 102 3.3.8.4 Intermediate Code . 102 3.3.8.5 Pass 2 Design Overview . 105 3.3.9 Dynamic Register Allocation . 105 3.3.9.1 Design Overview . 105 3.3.9.2 Finding Functions . 106 3.3.9.3 Gathering use/def/pred/succ Information . 106 3.3.9.4 Register Allocation . 107 3.3.9.5 Signature Reconstruction . 107 3.3.9.6 Retranslation . 108 3.3.9.7 Address Backpatching . 109 3.3.9.8 Condition Codes . 109 3.3.10 Function Call Convertion . 110 3.3.10.1 ”call” and ”ret” . 110 3.3.10.2 Stack Frame Size . 111 3.3.10.3 ”lr” Sequence Elimination . 111 4 Some Implementation Details 113 4.1 Objectives and Concepts of the Implementation . 113 4.2 Loader . 114 4.3 Disassembler . 114 4.4 Interpreter . 114 4.5 Basic Block Cache . 115 4.6 Basic Block Linking . 115 4.7 Instruction Recompiler . 116 4.7.1 Dispatcher . 116 4.7.2 Decoder . 116 4.7.3 Register Mapping . 117 4.7.4 i386 Converter . 118 4.7.5 Instruction Encoder . 119 4.7.6 Speed of the Translation in Four Steps . 120 4.7.7 Execution . 120 4.8 Dynamic Register Allocation . 121 5 Effective Code Quality and Code Speed 123 5.1 Pass 1 . 123 5.1.1 Code Quality . 124 5.1.2 Speed of the Recompiled Code . 128 8 CONTENTS 5.2 Pass 2 . ..