Journal of Systems Architecture 56 (2010) 500–508
Journal homepage: www.elsevier.com/locate/sysarc

CoDBT: A multi-source dynamic binary translator using hardware–software collaborative techniques

Haibing Guan, Bo Liu *, Zhengwei Qi, Yindong Yang, Hongbo Yang, Alei Liang
Shanghai Jiao Tong University, Shanghai 200240, China

Article history: Received 20 September 2009; received in revised form 24 June 2010; accepted 24 July 2010; available online 5 August 2010.

Keywords: Dynamic binary translation; Hardware/software collaboration; Multi-source; Runtime profiling

Abstract

For implementing a dynamic binary translation system, traditional software-based solutions suffer from significant runtime overhead and are not suitable for additional complex optimization. This paper proposes using hardware–software collaboration techniques to create a highly efficient dynamic binary translation system, CoDBT, which emulates several heterogeneous ISAs (Instruction Set Architectures) on a host processor without changes to the existing processor. We analyze the major performance bottlenecks by evaluating the overhead of a pure software DBT. Guidelines are provided for applying a suitable hardware–software partition process to CoDBT, as are algorithms for designing a hardware-based binary translator and code cache management. An intermediate instruction set is introduced to make multi-source translation more practicable and scalable. Meanwhile, a novel runtime profiling strategy is integrated into the infrastructure to collect program hot-spot information to support potential future optimizations. The advantages of using co-design as an implementation approach for a DBT system are assessed with several SPEC benchmarks. Our results demonstrate that significant performance improvements can be achieved with appropriate hardware support choices.
CoDBT could be an efficient and cost-effective solution for situations where the usual methods of performance acceleration for dynamic binary translation are inappropriate.

© 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.sysarc.2010.07.008

* Corresponding author. Tel.: +86 21 34205581. E-mail addresses: [email protected] (H. Guan), [email protected] (B. Liu), [email protected] (Z. Qi), [email protected] (Y. Yang), yanghongbo819@sjtu.edu.cn (H. Yang), [email protected] (A. Liang).

1. Introduction

Dynamic binary translation (DBT) has many attractive applications in computer system design. For instance, it can be used to support legacy binary code [1], support ISA virtualization [2], enable innovative co-designed micro-architectures [3], and many others [4–9]. An adaptable DBT system can also profile program runtime behavior and optimize blocks of frequently executed instructions [18]. However, DBT technology also comes with costs: translation overhead, emulation overhead, and potentially other runtime overheads. A key consideration in designing a DBT system is the overhead resulting from translation time; any time spent translating is time not spent executing the source program. This is currently an interesting research topic that offers insights for designing systems featuring binary translation.

Recent research has focused on various optimization algorithms and hardware acceleration methods to reduce the overhead of specific parts of the binary translation path, such as hardware support for control transfers [15], source-target binary code memory management [12], and profile information collection [22]. We observed that these hardware support strategies always offer performance advantages over existing software solutions, by identifying a particular software overhead and replacing it with hardware. However, few successful systems have taken an entirety-oriented approach. That is, the designers may render some functions in hardware and some in software, according to the product design goals and constraints. Different goals and constraints in future products may result in different hardware–software partitioning. The best-known DBT that is based on collaborative techniques and is entirety-oriented is Crusoe [3]. Compared to prior work, such systems include a number of advanced features to reduce a range of overheads and achieve high performance. Unfortunately, these systems were designed to implement a specific type of dynamic translator for a specific architecture with a modified micro-architecture, and cannot satisfy the need for multi-source support and flexibility.

In this paper, we present an alternative: CoDBT (hardware–software Collaborative Dynamic Binary Translator). CoDBT employs hardware support, which is attractive because it allows greater flexibility for realizing hardware design innovations and can offset software overheads. Its purpose is to achieve higher DBT performance than pure software solutions. In addition, by defining an intermediate representation, it makes it possible and convenient to support multiple source instruction sets on a single physical platform. From the design point of view, CoDBT is composed of a main software partition, which runs on the target processor, and several hardware partitions that support special functions within the DBT workflow. The hardware partition executes binary translation, which is one part of the DBT tasks, and is designed to run in conjunction with a host processor. The hardware partition is implemented in an FPGA chip in which a PowerPC processor core is embedded, and communicates with the software partition through shared memory and an on-chip bus. The software partition is a complete process-level DBT software that leaves some tasks to be executed by hardware. Therefore, the overlap in functionality between the hardware and software partitions allows more flexibility in deciding which functions should run in which partition and provides the opportunity to execute in hardware and software in parallel [8]. Meanwhile, given its multi-source feature, CoDBT is able to emulate several heterogeneous ISAs (e.g. IA-32, MIPS, PowerPC).

CoDBT focuses on creating a collaborative DBT framework that is efficient and systematic enough to be worth using in almost every DBT system. To this end, we make the following contributions:

1. Co-design framework for generic dynamic binary translation system
One of our goals is to provide a whole DBT infrastructure based on hardware–software collaborative techniques. It is an adaptive translator that uses a simple basic block translator for initial code emulation. Initial results indicate that basic block translation overhead is the major component of startup overhead, and hot-spot optimization overhead can further exacerbate execution delays. We then propose two hardware mechanisms for reducing the overhead. The first of these hardware assists is a hardware translation module targeted at basic block translation, and the second is a special TCache management unit. Using these two mechanisms can reduce the translation time and improve translated code management. Through experiments, we demonstrate that with basic hardware support, CoDBT can provide competitive translation performance without changes to the host processor.

2. Intermediate representation for multi-source
We designed CoDBT to adapt easily and inexpensively to changes in multiple source machines, including translations [...]

[...] instrumented instruction in software partition, and a profiling hardware for updating counter operations and collecting sufficient data to provide a foundation for future optimizations.

The remainder of this paper is structured as follows. Section 2 describes the architecture of CoDBT. Section 3 discusses the implementation of CoDBT in more detail. Section 4 evaluates the system performance. Section 5 discusses related work, and Section 6 summarizes our findings.

2. CoDBT

2.1. Overview

CoDBT is a hardware–software collaborative dynamic binary translator. The system is composed of three components: (1) the hardware accelerator, (2) the DBT software application, and (3) the control program that passes data between the other two components. In the CoDBT experimental platform, a general processor is closely coupled with the hardware accelerator, providing a complete computing environment within a single FPGA chip. Fig. 1 shows the components of the collaborative framework. The hardware partition executes a subset of the dynamic binary translation workflow and is designed to run in conjunction with a host processor. The hardware partition is implemented in the hardware description language Verilog, and communicates with the software partition through an on-chip bus. The overlap in functionality between the hardware and software allows more flexibility in deciding which functions should run in which partition.

2.2. Dynamic binary translation function features

There are three sub-problems that must be solved in determining the hardware–software partition of CoDBT: (1) Functional clustering: cluster the system functionality into a set of tasks; (2) Allocation: allocate the tasks to either hardware or software; (3) Scheduling: schedule the allocated tasks to determine timing correctness of the partitioned system. These problems are interdependent and must be solved simultaneously to determine an optimal solution. According to Amdahl's Law, selecting hardware [...]
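
Amdahl's Law, which Section 2.2 invokes when choosing functions for hardware, bounds the overall gain from accelerating one task: if a task accounts for a fraction f of run time and hardware speeds it up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The following Python sketch applies that formula to rank candidate tasks for the allocation step. The task names echo the overheads discussed above, but the fractions and the 10x local speedup are hypothetical values chosen for illustration, not measurements from the paper, and the function names are our own.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of run time is sped up by `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical share of total run time per task in a software-only DBT.
# Illustrative assumptions only; these are not figures reported in the paper.
TASK_FRACTIONS = {
    "basic_block_translation": 0.40,
    "code_cache_management": 0.20,
    "profiling": 0.10,
}


def rank_candidates(fractions, local_speedup):
    """Rank tasks by the overall speedup gained if each alone moves to hardware."""
    ranked = [
        (task, amdahl_speedup(fraction, local_speedup))
        for task, fraction in fractions.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for task, speedup in rank_candidates(TASK_FRACTIONS, local_speedup=10.0):
        print(f"{task}: {speedup:.2f}x overall")
```

Under these assumed numbers, the task with the largest share of run time yields the largest overall gain, which is the usual argument for moving the dominant overhead into hardware first. The sketch treats each task in isolation and ignores the scheduling and communication costs that the partitioning problem also has to account for.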
