<<

A Performance Study of the Acorn RISC Machine

A thesis submitted in partial fulfilment of the requirements for the Degree of Master of Science in Computer Science in the University of Canterbury by David Vivian Jaggar

University of Canterbury
1990

Abstract

The design decisions behind the development of the Acorn RISC Machine (ARM) are investigated by implementing the architecture with a software emulator, to record the effectiveness of the unusual architectural features that make the ARM architecture unique.

The adaptation of an existing compiler construction tool (the Amsterdam Compiler Kit) has demonstrated that an optimising compiler can exploit the RISC architecture to maximise CPU performance.

By compiling high level language algorithms, a complete picture is formed of how effectively the ARM architecture supports high performance.

Contents

Abstract
Contents
Figures and Tables
Computer Design
    The Central Processing Unit
    The Instruction Set
    Microcode
    Registers
    The Memory
    Advancing Technology
RISC Architectures
    Improving Performance
    Cycles per Instruction
    Time per Cycle
    Instructions per Task
    RISC Development
    Commercial RISC Designs
    Other Commercial RISC Architectures
The Acorn RISC Machine
    Architecture Characteristics
    The Impact on Performance
    Evaluating an Architecture
Computer Software
    The Operating System
    Compilers
    Application Programs
An Optimising Compiler for ARM

    Compiler Building Tools
    EM code and the Code Generator Generator
    Global and Peephole Optimisers
    The Assembler and Linker
    The After-Burner Optimiser
    Register Allocation
    Compiler Validation
Evaluating an Architecture
    Architectural Features
    Measuring the Quality of an Architecture
    Architectural Emulation
    Emulator Validation and Performance
The Quality of the ARM Architecture
    Compiler Performance
    Architecture Performance
    Instruction Usage
    Branch and Conditional Instruction Utilisation
    Memory Accessing Instructions
    Effectiveness
    Improving the ARM Architecture
Conclusion
Acknowledgements
Bibliography
ARM 3 Instruction Set Format
EM Instruction Set
Floating Point Accelerator Instruction Set

Figures and Tables

Figure 1: A Simple CPU
Figure 2: An Instruction Pipeline
Figure 3: SPARC Register Window Layout
Figure 4: ARM Register Layout
Figure 5: ARM Shift Operations
Figure 6: Compiler Stages
Figure 7: ARM Signed Divide Routine
Figure 8: Emulator Execution Breakdown
Figure 9: Acorn and ACK Compiler Performance
Figure 10: Relative Instruction Usage
Figure 11: Data Processing Instructions
Figure 12: Data Processing Operands
Figure 13: Data Processing Immediate Operands
Figure 14: Conditional Non-Branch Instructions
Figure 15: Addressing Modes
Figure 16: Cache Strategies

Table 1: A Simple Instruction Set
Table 2: Code Size for CISC (MC68020) and RISC (SPARC)
Table 3: Relative Frequency of High Level Language Statements
Table 4: ARM Instruction Set
Table 5: Architectural Benchmarks

To my Mother and Father

Chapter 1
Computer Design

The performance of a computer system is measured by the time that it takes to execute programs: the shorter the elapsed time, the higher the performance rating. To maximise performance a designer must find ways to match the performance of each component in the computer, to yield a balanced system. As technology changes and new discoveries are made, different parts of a computer become the performance bottle-neck.

The architecture of a computer defines the major attributes of the design. The number of registers, their layout, the instructions and addressing modes that the computer understands are all part of the architecture, while the number of clock cycles taken to execute each instruction, the type of logic used to build the CPU, and the layout of memory are part of the implementation.

The Central Processing Unit

The Central Processing Unit (CPU), as shown in Figure 1, is the part of the computer that executes instructions. The CPU is composed of a number of specialised functional units (for example the Arithmetic Logic Unit, or ALU). These functional units are controlled by the Instruction Decoder, which activates the necessary sections of each unit to carry out the operation specified by each instruction. The functional units are connected by data paths, along which data, parts of decoded instructions and internal control information flow.


Figure 1: A Simple CPU

The Instruction Set

The instruction set of a computer is the language that is used to directly program the CPU, and it is a significant factor in the overall price/performance of the machine. Each instruction of the CPU must be implemented using digital logic that must be custom designed (using "microprogramming" to replace some logic by sacrificing performance is discussed later). This "custom silicon" is extremely labour intensive to construct, so a large and/or complex instruction set is very expensive to implement in hardware. Emulating some instructions with software is less expensive, but lowers the performance of the computer, due to the inefficiency of using combinations of the existing instructions to emulate the missing ones.

Whilst machines exist whose instruction set is tailored for one specific high level language, such a design would be inappropriate for a general

purpose computer. Table 1 illustrates a simple instruction set. An instruction set must be able to efficiently support the many unique features specific to different high level languages, although programming languages have major features in common -

i) arithmetic operations for integer and floating point data. An instruction set will need operations to move data to and from the memory and registers, and be able to perform some simple arithmetic on that data. Integer addition and subtraction instructions are found in the simplest of CPUs, while multiply and divide are quite common in more complex architectures. The results of such instructions usually update the CPU's condition flags: a negative result will set the negative flag, a zero result will set the zero flag, a carry or borrow from additions and subtractions will set the carry flag, and an overflow will set the overflow flag. Floating Point operations are usually carried out in a totally separate processor, the Floating Point Unit (FPU), but the functions it performs are similar to those of the integer unit.

ii) operations on Boolean data. Boolean values, arrays of Boolean values, and sets require bit-wise logical operators like And, Or and Exclusive Or. The And operator is used to clear a bit, Or is used to set a bit, and Exclusive Or is used to toggle a bit. Operations on bit fields such as Not, Left Shift, Right Shift and Rotate can be used to build values for comparison with Boolean data.
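These idioms map directly onto C's bit-wise operators. The following fragment is a minimal sketch (the flag word and mask values are invented for illustration):

    #include <stdio.h>

    int main(void)
    {
        unsigned int flags = 0x0F;  /* four Boolean values packed into one word */
        unsigned int mask  = 0x04;  /* selects the third bit */

        flags &= ~mask;             /* And with the complement clears the bit */
        flags |= mask;              /* Or sets the bit */
        flags ^= mask;              /* Exclusive Or toggles the bit */

        printf("%#x\n", flags);     /* prints 0xb: cleared, set, then toggled away */
        return 0;
    }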

Instruction             Mnemonic   Operation

Add                     ADD        Dest := Src1 + Src2
Subtract                SUB        Dest := Src1 - Src2
Logical AND             AND        Dest := Src1 AND Src2
Logical OR              OR         Dest := Src1 OR Src2
Logical EOR             EOR        Dest := Src1 EOR Src2
Logical NOT             NOT        Dest := NOT Src1
Logical Shift Left      LSL        Dest := Src1 * 2^Src2
Logical Shift Right     LSR        Dest := Src1 / 2^Src2 (Src1 unsigned)
Rotate                  ROT        Dest := Src1 rotated Src2 bits
Arithmetic Shift Right  ASR        Dest := Src1 / 2^Src2 (Src1 signed)
Compare                 CMP        Src1 - Src2
Move                    MOV        Dest := Src1
Multiply                MULT       Dest := Src1 * Src2
Divide                  DIV        Dest := Src1 / Src2
Jump                    JMP        PC := Dest
Procedure Call          CALL       Dest := PC, PC := Dest
Procedure Return        RET        PC := Dest
Conditional Branch      Bcc        IF cc THEN PC := PC + offset

Where: Dest, Src1 and Src2 are registers or memory addresses, PC is the program counter, cc is a condition code, and offset is an address offset.

Table 1: A Simple Instruction Set

iii) support for the conditional execution of instructions depending on some previous condition. A compare instruction can be used to compare two pieces of data. The condition flags are usually set to reflect the result of the last compare instruction, the negative flag indicating which operand was the larger and the zero flag whether they were equal. A branch instruction will jump to a different part of the program depending on the state of one

or more condition flags, for example "Branch on Greater Than or Equal to". Many architectures combine the compare and branch instructions, as they are usually used together. These instructions are

used to implement IF statements, conditional loops (FOR, WHILE etc.)

and CASE type statements.

iv) jump instructions to change the flow of execution. High level language constructs like infinite loops, premature loop

terminators and GOTO statements require a Jump instruction to unconditionally alter the value held in the program counter.

v) constructs to implement procedure calls. By storing the current value of the program counter, and using a jump instruction, a program can execute a procedure and then continue execution just after the point of call by jumping back to the value in the stored program counter. If the return address is pushed onto a stack, then procedure calls can be nested, and procedure recursion is possible.

vi) addressing modes to access data structures held in memory. Data structures like Pascal's arrays and records require the processor to be able to load data to and from an address determined by adding an offset to a base address. The base address holds the address of the start of the array or record and the offset holds the distance of the required element from the beginning. These same addressing modes can be used to access data held in the stack frame of a procedure held on the stack.
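In C terms each such access becomes a load from a base address plus an offset; the record and field names below are invented for illustration:

    struct employee {               /* a record */
        int id;                     /* offset 0 from the base address */
        int salary;                 /* offset 4 on a 32 bit machine */
    };

    int access(int *array, struct employee *e, int i)
    {
        int a = array[i];           /* base (array) plus register offset (i * 4) */
        int b = e->salary;          /* base (e) plus immediate offset (4) */
        return a + b;               /* a stack frame slot is reached the same way */
    }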

Of course, all programming languages have their own characteristic features, and a general purpose architecture must cater for these. The microprocessors designed in the early 1980's added many instructions and addressing modes to the simple instruction set shown in Table 1, to provide hardware support for the features of many high level languages. This has the unfortunate side effect that some instructions will be completely useless in some situations, effectively a waste of "silicon real estate" on the CPU chip.

Microcode

The amount of custom logic required to directly implement a very large instruction set to support high level languages is too large to fit on a single chip. Only implementing a small number of instructions in hardware and emulating all others in software is inefficient, due to the overheads involved in the trap handler for unimplemented instructions (each unimplemented instruction must be fetched from memory, decoded in software, and its action emulated with other instructions). Microcode is a very low level instruction format that is suitable for complete implementation in hardware, and is tailored for the efficient emulation of machine code instructions. Each machine code instruction is executed by running a sequence of microcode instructions (called a micro program). The microcode sequences are stored in a Read Only Memory (ROM) that is part of the CPU. The uniform nature of a ROM makes much more efficient use of logic than the custom logic used in a functional unit, so that the entire machine code instruction set can be implemented. Microcode is extremely tedious to write, because the program must obey stringent timing restrictions when accessing each functional unit of the CPU. Thus the microcode is usually fixed at the time of manufacture, and

the CPU can be programmed using the higher level machine code. Different implementations of the same architecture can be produced by adding extra functional units to the CPU to eliminate the need for certain microcode sequences. Microcoding is a price/performance compromise - another level of interpretation has been added to the CPU which, although it reduces the cost and makes more efficient use of chip space, also lowers the performance compared to a CPU with a complex "hard-wired" instruction set.
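The extra level of interpretation can be pictured with a small sketch in C; the opcodes and micro operations are invented, and a real micro engine is hard wired rather than a program:

    /* Each machine code opcode selects a microprogram held in ROM. */
    typedef enum { M_ALU_ADD, M_ALU_SUB, M_WRITE, M_END } micro_op;

    static const micro_op rom[2][3] = {
        { M_ALU_ADD, M_WRITE, M_END },   /* machine opcode 0: ADD */
        { M_ALU_SUB, M_WRITE, M_END },   /* machine opcode 1: SUB */
    };

    int execute(int opcode, int a, int b)
    {
        int result = 0, tmp = 0;
        for (const micro_op *mp = rom[opcode]; ; mp++) {
            switch (*mp) {               /* one micro instruction per micro cycle */
            case M_ALU_ADD: tmp = a + b;  break;
            case M_ALU_SUB: tmp = a - b;  break;
            case M_WRITE:   result = tmp; break;
            case M_END:     return result;
            }
        }
    }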

Registers

The program data is usually kept in registers in the CPU while it is being referenced frequently, so that access to it is as fast as possible. The ALU and registers are connected by data paths, called buses, which carry 32 bits of information in parallel in a 32 bit computer. The buses connect to the registers via ports. The register bank will need two read ports and a write port if an instruction like "add the contents of two registers and store the result in a third register" is to be executed in a single clock cycle. The registers are arranged in a bank (or file), which usually consists of a small, fast Static Random Access Memory (SRAM). The actual number of registers is usually limited by the number of bits required in each instruction word to encode the register numbers and the amount of CPU chip area that is available. Not all CPUs have registers (data is accessed directly in memory [Ditz87]), some have as many as 192 [AMD87, Lehr89], but typical numbers are 16 and 32.

The Memory

The memory of the computer stores instructions and data. Several separate memory chips are combined to provide a homogeneous memory bank. The CPU is connected to a memory controller via two buses, the data bus (which carries data to and from the memory) and the address bus (which dictates the required memory address). The memory controller is responsible for activating the correct memory chip(s) for the memory address required.

Size and speed of memory are major influences on the cost and the execution speed of the computer. The size of the memory is directly proportional to the cost; pay twice as much and get twice as much. Modern microcomputer applications need at least 1 MegaByte of main memory; larger machines have 8 to 16 MegaBytes. The speed of the memory is more subtly related to the cost, being dependent on two factors -

(i) Latency. The time taken for the memory to return the first word of data.

(ii) Bandwidth. The rate at which data can be transferred to the CPU, once the initial flow is established.
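A short sketch shows how the two factors combine; the timings are purely illustrative, not measurements of any particular memory system:

    /* Time to transfer n consecutive words: the latency is paid once for the
       first word, after which the bandwidth governs each following word.
       Assumed figures: 200 ns latency, then one word every 50 ns. */
    double transfer_ns(int n)
    {
        const double latency_ns  = 200.0;
        const double per_word_ns = 50.0;
        return latency_ns + (n - 1) * per_word_ns;
    }
    /* transfer_ns(1) = 200 ns, but transfer_ns(8) = 550 ns, so sequential
       access lowers the average cost per word from 200 ns to about 69 ns. */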

The execution speed of a program is dependent on both these factors. The performance of a program that consists entirely of jump or branch instructions and data accesses to non contiguous memory locations will be limited by the memory latency, because after every branch or jump the memory must restart the flow from a new location, while the performance of a program that has no branch or jump instructions and contiguous

data accesses will be limited by the memory bandwidth. Of course a real program will have some branch or jump instructions and some random and some contiguous data memory accesses, so the demand on the memory will be between these two extremes. There are several ways of increasing memory access speed -

(i) Faster Memory Devices. Using faster memory chips will increase the bandwidth and reduce the latency. Dynamic Random Access Memory (DRAM) can deliver up to about 6 Million random accesses per second. To gain more speed than this, Static Random Access Memory (SRAM), which can provide 100 Million accesses per second, must be used. The price difference between DRAM and SRAM makes it too expensive to build a microcomputer with a memory made only of SRAM; a compromise is discussed below, which utilises both SRAM (for speed) and DRAM (for its low cost) to implement a fast, relatively inexpensive memory.

(ii) Wider buses. Multiple memory devices connected in parallel, each connected to the same memory controller, increase the memory bandwidth but do not alter the latency. A bus that is twice as wide will deliver nearly twice the usable performance, despite the wastage that occurs when operations involve data objects much smaller than the bus size (for example, byte operations leave a 16 bit bus 50% unused, whilst a 32 bit bus is 75% unused). Unfortunately increasing the size of a memory bus beyond 32 bits is currently too expensive to be used in a low cost micro-computer.

(iii) Interleaved memory banks. Multiple (interleaved) memory devices connected in parallel to separate memory controllers allow one bank of memory to supply data to the processor while the other(s) are recovering from supplying previous data. This increases both the bandwidth (because separate banks have more time to recover between accesses) and reduces the latency (because on average a memory access will be to a memory bank that has already had some recovery time), but the added cost of a separate memory controller for each bank of memory may be prohibitive.

(iv) Fast access modes. By exploiting the physical layout of DRAM, consecutive memory accesses to sequential memory locations can be made significantly faster than completely random ones, without increasing the price of each memory device. These types of DRAM are known as Page Mode DRAMs or Static Column DRAMs. This technique will increase the bandwidth of the memory, but not alter the latency. The cost of this implementation is merely an extra control line from the CPU to the memory system to indicate that the current memory reference is in sequence with the last.

v) Harvard architecture. Rather than storing both instructions and data in a single memory (a von Neumann architecture), there may be two physically separate memories for instructions and data - a Harvard architecture - to double the memory bandwidth. Of course a Harvard architecture will require two address buses and two data buses, but it will be possible to read an instruction and perform a load or store operation in

parallel. This extra performance comes with the cost of building a computer with twice as many bus lines, which is extremely expensive.

vi) Cache memory. Because programs tend to reference the same areas of memory repeatedly (while executing loops and accessing data structures in memory) it will be more efficient to store frequently used instructions and data temporarily in a small, very fast SRAM, rather like holding data in CPU registers while it is being accessed, and use less expensive DRAM for the rest of the memory. Every time data is loaded from DRAM it is stored in the SRAM "cache" as well as being passed to the CPU. Then if the CPU requests this same data from the memory again, it can be supplied much sooner from the cache, allowing the CPU to maintain its maximum instruction and data throughput.

A cache "hit" occurs when the requested data currently resides in the cache, a cache "miss" occurs when requested data does not reside there. The performance of a cache is measured by its hit ratio - the proportion of cache hits in relation to the total number of memory references. The hit ratio of a cache is dependent on various design parameters, for instance the algorithm used to decide where to put new data in the cache. The following algorithms are suitable for a fast implementation in hardware-

a) Direct Mapped Cache. This is the simplest system: low order bits of the main memory address are used to determine a unique location in the cache

memory for the data. Unfortunately if a program references two blocks of memory that have the same low order address bits, the cache will be repeatedly filled with new data (or instructions) from alternate memory blocks, continually overwriting data that will be required again (the address arithmetic involved is sketched in code after this list).

b) Dual Set Associative Cache. To ease the problem with a direct mapped cache, two locations can be reserved for each low order address bit combination, and new data will be placed in the Least Recently Used (LRU) location. A single bit is enough to record the last location used. Now three separate memory references with the same low order address bits will be required to spoil the cache's efficiency; this situation is very rare.

c) Multi Set Associative Cache. The associativity of the cache may be increased to four or more, but the LRU algorithm for deciding the position of new data becomes much more complex. The associativity is usually a power of two because this makes the most efficient use of the LRU hardware.

d) Fully Associative Cache. Now each item of data can reside anywhere in the cache, and corresponding addresses must be stored to identify the data. A content addressable memory is used to store the data. This special memory instantaneously returns the data associated with an identifier (in this case the data associated with a memory address). A disadvantage of this scheme is the need for every

piece of data in the cache to have its associated address stored with it, using up valuable hardware space that could have been used to make a larger, less complex cache. The solution is to store the contents of a number of successive locations with a single address, called a cache line. When a cache miss occurs, a complete cache line is fetched from the main memory. Although this sounds theoretically inefficient, in practice sequential cache locations are likely to be accessed together anyway, and caching an entire line at once allows the page mode access feature of the DRAM main memory to be utilised.

Using an LRU algorithm to find a location for new data in a fully associative cache is very complex to implement in hardware, and has a pathological worst case when the number of items to be cached is just bigger than the size of the cache and the items are always accessed in the same order (such as the instructions in a loop or the elements of an array) - the LRU algorithm always replaces the oldest (soon to be needed) data in the cache with the new data, so the cache always misses. An alternative to LRU is a random replacement scheme, where the value from a fast counter provides a pseudo random location for new data. This scheme is easy to implement, and works as well as LRU whilst avoiding LRU's pathological case.
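The placement arithmetic mentioned above is cheap enough for hardware because it is little more than bit selection. The following C model of a direct mapped lookup uses an invented geometry (256 lines of 16 bytes) purely for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_BYTES 16                  /* bytes held per cache line */
    #define NUM_LINES  256                 /* a 4 Kilobyte direct mapped cache */

    static uint32_t tags[NUM_LINES];
    static bool     valid[NUM_LINES];

    /* The low order address bits select a unique line; the remaining high
       order bits must match the tag stored when the line was filled. */
    bool lookup(uint32_t addr)
    {
        uint32_t line = (addr / LINE_BYTES) % NUM_LINES;
        uint32_t tag  = addr / (LINE_BYTES * NUM_LINES);

        if (valid[line] && tags[line] == tag)
            return true;                   /* cache hit */
        tags[line]  = tag;                 /* miss: fetch the line, note its tag */
        valid[line] = true;
        return false;
    }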

The cache can be used for virtual or physical memory addresses (or even separate caches for each), and each alternative has disadvantages. A virtually addressed (or "virtually mapped") cache will need to have some entries invalidated every time a new page translation is calculated, and a physically addressed (or "physically mapped") cache will be slowed because it must wait for the virtual to physical address translation

before it can look up any data. Both these types of cache are common in commercial computers.

There are two ways of maintaining the consistency of data between the cache and the main memory when a memory store operation is executed-

a) Write through. The obvious approach is to write data to the cache and main memory at the same time. Unfortunately this simple scheme means that main memory will be referenced by every store operation, which is contrary to the caching principle.

b) Write back. Data can be just written to the cache and then copied to memory if the cache entry is ever to be overwritten again. This strategy will avoid memory references between successive writes to the same memory address, but requires a "dirty" flag for each cache entry to indicate if the data needs to be saved back to memory before the location is reused.

A "write buffer" can be used in conjunction with either strategy to allow the processor to continue instead of waiting for the data to be written to main memory.
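A minimal sketch of the two store policies, reusing the geometry of the direct mapped sketch above (memory_write stands in for an assumed bus interface):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 256                       /* as in the earlier sketch */

    static uint32_t cache_data[NUM_LINES];
    static bool     dirty[NUM_LINES];

    extern void memory_write(uint32_t addr, uint32_t value);

    /* Write through: every store updates both the cache and main memory. */
    void store_write_through(uint32_t addr, uint32_t value, uint32_t line)
    {
        cache_data[line] = value;
        memory_write(addr, value);
    }

    /* Write back: the store only marks the line dirty; main memory is
       updated later, when the dirty line is evicted. */
    void store_write_back(uint32_t addr, uint32_t value, uint32_t line)
    {
        cache_data[line] = value;
        dirty[line] = true;
    }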

A cache memory is particularly useful for a Harvard architecture. Instead of having two separate main memories for instructions and data, two caches are used, one for instructions, the other for data. Separate address and data buses are still required for instructions

and data, but they only connect to their associated caches; both caches share a common address and data bus to access a common main memory. The main memory only needs enough bandwidth to supply data for cache misses, rather than all CPU traffic. For a more complete description of cache principles see [Smit82].
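The pay-off of the whole arrangement is summarised by one equation: average access time = hit ratio x cache access time + (1 - hit ratio) x main memory access time. A worked sketch, with illustrative timings only:

    /* With an assumed 10 ns cache, 200 ns DRAM and a 95 percent hit ratio:
       0.95 x 10 + 0.05 x 200 = 19.5 ns average, nearly a tenfold gain. */
    double average_access_ns(double hit_ratio)
    {
        const double cache_ns = 10.0;
        const double dram_ns  = 200.0;
        return hit_ratio * cache_ns + (1.0 - hit_ratio) * dram_ns;
    }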

It is possible to combine two or more of these implementation techniques to achieve the performance required for a microcomputer memory system.

Advancing Technology

The microprocessors designed in the early 1980's tended to have a large number of complex instructions and addressing modes, intended to provide similar computational power with a single chip CPU as the mainframe computers of the 1970's achieved with their multiple chip (and usually multiple circuit board) CPUs. This level of microprocessor complexity was made possible by the advent of Very Large Scale Integration (VLSI), and the resulting computers using these microprocessors have become known as Complex Instruction Set Computers (CISCs). This complexity was used to alleviate several problems -

i) Slow core memory. The magnetic core memory used for main memory was very slow compared to the speed of access to the on-chip microcode ROM, causing the execution time of programs to be proportional to the number of instructions in the program. Thus more complex

instructions were implemented with microcode to replace sequences of simple instructions, to avoid repeated accesses to core memory.

ii) Compact assembly code. Complex instructions aided assembly language programmers by replacing common sequences of instructions with a single instruction. As software was becoming a significant factor in the cost of a computer system, and a large proportion of software was written in assembly code to reap the maximum performance of the machine, any such aid given to assembly language programmers would be useful.

iii) High level language support. Complex instructions and addressing modes were thought to help compiler builders by closing the semantic gap between high level languages and assembly language. Designers supported high level language features with fast hardware to improve performance. Making high level languages efficient was also important to help keep software costs down.

iv) Good compiler targets. These complex designs had few registers, because compilers for stack or memory to memory architectures were far easier to construct. Registers were difficult to allocate optimally for local variable use, so were mainly used as temporary storage in expression evaluation, and as pointers to data structures.

v) Easily adaptable. Microcode was an efficient way to utilise the advancing technology

used to build a CPU. As more transistors could be placed on a chip, more microcode could easily be added to build a bigger and more complex instruction set, and more dedicated hardware could be added to speed up microcode instructions.

The Digital Equipment VAX architecture is a good example of an elaborate instruction set, with instructions for polynomial evaluation, queue manipulation and cyclic redundancy checks. The Intel 80x86 series microprocessors are some of the most complex, having instructions to operate on entire strings, complex looping instructions and table look up instructions. The Motorola 680x0 series has instructions to insert and extract bit fields to and from a word, to search words for a set or clear bit, and complex module calling instructions.

Chapter 2
RISC Architectures

There are several deficiencies with the Complex Instruction Set approach to improving performance -

i) Compilers cannot utilise the complexity. Modern compilers have extreme difficulty applying complex instructions to high level languages. Complex instructions rarely perform the exact task required by a high level language: if an instruction does not quite do what is required then its action must be modified with other instructions, or it must be completely replaced with new instructions. An instruction that does more than is required wastes execution time doing unneeded work. In tracing compiled code executing on a complex architecture, researchers noticed that the same twenty percent of instructions accounted for eighty percent of all executed instructions, and a further ten percent of instructions accounted for almost all of the remainder. The rest of the instruction set was unused and therefore unnecessary [Sun87a].

ii) Complexity implies poor performance. Complex instructions take a long time to load from core memory and take a long time to decode. Because of the variable lengths of complex instructions, each must be partially decoded before it can be executed. This complexity makes it extremely difficult to have a complex instruction loaded and decoded, ready for execution after a sequence of fast instructions, so the ALU must wait for the complex instruction to be loaded and decoded. Even worse, the complex instructions add extra length to the main execution data path which

decreases the execution speed of all instructions. Thus an instruction which lengthens a data path by ten percent must increase the execution speed of programs by more than ten percent to be justified.

iii) Memory to memory architectures are inefficient. Although it is difficult to write compilers for register based architectures, registers are very efficient for the storage of variables and procedure parameter values. Memory to memory models access memory too often to be as efficient. Stack based machines can be made efficient, but a considerable amount of hardware is still required to approach the efficiency of registers.

iv) Assembly language is a slow programming environment. A compiler that can produce good machine code from a high level language will replace the assembly language programmer, because writing programs in assembly language is too slow for most programming tasks. A good architecture should make it possible for a good optimising compiler to produce code of comparable quality to an assembly language programmer's.

v) Long design time. Complex architectures are difficult to design, take a long time to verify and manufacture, and therefore cannot be designed to take advantage of the latest technological advances.

Improving Performance

The time it takes for a computer to perform a task is the product of three factors -

    Time per Task = C x T x I

where C is the cycles per instruction, T is the time per cycle, and I is the instructions per task. Improving any of these factors will improve performance. The first two factors tend to be complementary - an architecture may have a long cycle time to accommodate complex instructions, or it may have multiple (shorter) cycles per instruction.

Complex Instruction Set Computers attempt to minimise the time per task by minimising the instructions per task, by making each instruction do a lot of work. In practice this lengthens either or both of the cycles per instruction and the time per cycle in greater proportion, so that performance suffers as a result.
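A worked example makes the trade-off concrete; all of the figures are hypothetical, chosen only to illustrate the arithmetic:

    /* Time per Task = C (cycles per instruction) x T (time per cycle)
                         x I (instructions per task). */
    double time_per_task_s(double c, double t_ns, double i)
    {
        return c * (t_ns * 1e-9) * i;
    }
    /* A hypothetical CISC: C = 6.0, T = 100 ns, I = 500,000   -> 0.30 s.
       A hypothetical RISC: C = 1.2, T = 50 ns,  I = 1,000,000 -> 0.06 s.
       Doubling the instruction count is more than repaid by the fewer,
       shorter cycles. */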

Reduced Instruction Set Computers (RISCs) follow new architecture and implementation design disciplines - minimising the number of cycles per instruction and decreasing the cycle time to increase performance.

Cycles per Instruction

RISCs achieve a short cycle time by implementing a very simple, but very fast instruction set. This simple instruction set allows several instructions to be "pipelined" - several instructions are at different stages

of execution at one time, to maximise the usage of the different functional units of the processor. Figure 2 shows a pipeline with five stages: instruction load, decode, register read, ALU operation, register write. Instructions still require several cycles to be executed, but because each stage is done in parallel with other instructions at a different stage, a throughput approaching one instruction per cycle can be maintained.

Figure 2: An Instruction Pipeline
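The arithmetic behind "approaching one instruction per cycle" is worth making explicit; the sketch below assumes a stall-free pipeline and is illustrative only:

    /* With a pipeline of depth d and no stalls, the first instruction
       completes after d cycles and each later one finishes a cycle after
       its predecessor, so n instructions need d + (n - 1) cycles. */
    long pipeline_cycles(long depth, long n)
    {
        return depth + (n - 1);
    }
    /* For depth = 5 and n = 1000: 1004 cycles, about 1.004 cycles per
       instruction, against 5000 cycles if the instructions did not overlap. */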

Data dependencies occur between instructions in the pipeline because an instruction may need to read from the registers before the previous instruction has written its result. These dependencies are resolved by adding forwarding logic to the CPU to by-pass the register file, routing a result directly from one instruction to the input of the next.

The number of cycles each instruction adds to the total number of cycles taken to execute a program is potentially reduced by the number of stages in the pipeline (called the pipeline depth). Fulfilling this potential requires that the pipeline always be filled with instructions, a task that is very difficult unless all instructions are the same (encoded) length and take approximately the same amount of time to execute. So RISC instructions are always one (usually thirty two bit) word long, and most require only a single cycle ALU operation. A CISC architecture cannot fully achieve either of these goals: its complex instructions are impossible to encode using the same number of bits or to execute in the same time period (cycle length). The data paths in a RISC architecture need only carry word sized objects around the CPU; a CISC architecture must be able to move its multi word instructions around the CPU, which adds cost and complexity.

Instructions that change the value of the program counter (branches, jumps, traps, procedure calls and procedure returns) make it difficult to keep the instruction pipeline full. To avoid stalling the CPU while new instructions are loaded from the new memory location, instructions like branch are usually implemented with a delayed action: the instruction immediately after the branch instruction (said to be in the branch delay slot) is always executed after the branch instruction is executed. The compiler can often find an instruction to put in the delay slot to do useful work; if it cannot, a NOP (No-OPeration) instruction must be inserted.

Instructions that access memory require special attention for two reasons. Firstly a full memory address takes many instruction bits to encode - too many to specify for each operand of every thirty-two bit instruction - so a "load/store" architecture is implemented: only the load and store instructions can access memory; all other instructions only access data that is held in registers. The second problem is that the data from a load instruction will not be immediately available to subsequent instructions, due to the slow access speed of main memory. The solution is to provide delayed load instructions: the instruction after the load (in the load delay slot) cannot access the register into which the load instruction will place the data. Again the compiler attempts to find a suitable instruction for the load delay slot; if this is not possible a NOP must be inserted.

Time per Cycle

The length of a single machine cycle is determined by several factors. Firstly the instruction decode time is related to both the number of instructions in the instruction set and the number of instruction formats supported. Clearly a CISC architecture will require a longer decode time than a RISC architecture. Most RISC architectures have only a few levels in their decode strategy: the instructions are first split into broad "families" by examining a few key bits in each word, then each instruction can be fully decoded. Complex addressing modes in CISC architectures lengthen the decode time substantially, although the National 32x00 series of processors have a very uniform instruction set, which results in a fast instruction decode time. Other CISC architectures, such as the Motorola 680x0 and Intel 80x86, have very complex instruction formats, due to the need for compatibility with their respective ancestors.

The second factor in the time per cycle is the instruction operation time. RISC architecture instructions usually have a single cycle ALU stage, so that the flow of instructions through the processor is not interrupted. Instructions that require more ALU work (such as integer multiply and divide) are often set running in parallel with single cycle instructions. CISC architectures have many multi-cycle instructions, which makes efficient pipelining of instructions very difficult. Considerable amounts of extra hardware are required to support data dependencies between instructions (i.e. the result of one may be required for a source operand of

the next instruction), which uses precious chip space and lengthens critical processor data paths.

The time needed to fetch instructions from main memory is the third factor influencing the time per cycle, and is inversely proportional to the memory bandwidth. Instruction memory latencies are only incurred by instructions that alter the value of the program counter, as new instructions must be loaded from a new memory address. The techniques described in Chapter 1 can be utilised to increase the memory bandwidth and so decrease its influence on the time per cycle. RISC architectures can load a new instruction in every memory access, because all instructions are the same length. CISC architectures require multiple memory accesses to load multi word instructions, again making efficient pipeline management much more complex.

The last factor in the time per cycle is the complexity of the architecture itself. RISC designers can spend more time hand optimising critical processor features such as the processor data path and functional units. CISC architectures have a much longer design and implementation cycle, and hand optimising a complex architecture is a task too large to be practicable.

Instructions per Task

Because of the simple instruction set, RISC architectures require more instructions than CISC architectures to perform the same task. Table 2 illustrates the length of the machine code of fifteen utilities, for two common architectures, an MC68020 [Moto85] based Sun 3/60, and a SPARC based SUN SPARCstation [Sun87b], using similar compiler

technology, and compiling for the same Operating System. The 68020 CISC architecture has quite compact machine code; the SPARC architecture is a good example of a RISC architecture.

Program    MC68020    SPARC     SPARC/MC68020

awk         43856      50014     1.14
bc          10314      12462     1.21
cmp          3000       3232     1.08
csh         91902     118138     1.29
diff        20752      25512     1.23
eqn         25726      29306     1.14
grep         7174       9394     1.31
ls           8608      10536     1.22
nroff       54826      71162     1.30
od           7232       8560     1.18
sort        12194      15658     1.28
tee          2800       2992     1.07
unpack       4936       6104     1.24
write        5302       6142     1.24
yacc        28909      37730     1.31

Table 2: Code Size for CISC (MC68020) and RISC (SPARC)

The code expansion is not as great as might at first be expected, because the RISC architecture contains the twenty or thirty percent of the instructions that the compiler could generate for the CISC architecture, which almost negates the code expansion that might be caused by having only simple instructions. The expansion that does occur is due to the simple (low information density) encoding of RISC instructions, chosen to speed up the instruction decode stage. In practice the performance loss caused by the code expansion is outweighed by the performance gains made by decreasing the cycle time and reducing the average number of cycles per instruction.

Optimising compilers also help to mitigate the code expansion in RISC architectures: the simple instruction set means the code sequence for a

given high level language statement is much easier to generate, compared to the complex alternatives a CISC may offer. The simple instructions also offer better opportunities for optimisation: they only perform the actions required, whereas CISC instructions often have useless side effects. Simple instructions allow the compiler to re-order code, to avoid data dependencies and remove code duplication, operations which are of course impossible with fixed microcode sequences. A simple architecture also shortens hardware development time, allowing RISC implementations to utilise the latest hardware technology, and the resulting implementations are much more likely to operate correctly (unlike CISC machines, which usually undergo several hardware revisions). The compiler can be continually improved to fully utilise the hardware.

In practice the effect of such streamlining of an architecture is a large performance increase, with a lower hardware cost due to the short development time, resulting in a significant decrease in the price/performance ratio compared to CISC architectures.

RISC Development

In 1975 IBM began a project to "achieve significantly better cost/performance for High Level Language programs than that attainable by existing systems" [Radi82]. The 801 project had pioneering design goals for a computer architecture, which now form the basis of RISC architecture and implementation design decisions -

i) maximum utilisation of all sections of the CPU. A three stage instruction pipeline was designed so that instructions

could take three cycles to execute: the first cycle was used to load the instruction from memory, the next cycle to decode the instruction, read the operands and perform the ALU operation, and the last cycle to perform shift operations, write the result and set the processor flags. Functions that needed a longer time to execute, for instance integer multiplication and division, were handled with a primitive step instruction. Several multiplication step instructions must be executed sequentially to perform a complete multiplication. An effective throughput of one instruction per cycle was realised.

ii) regular instruction format to simplify the decoding of instructions. All instructions were made one word (four bytes) long and aligned on a word boundary when stored in memory. Data objects were aligned on a boundary equivalent to their size, bytes on a byte boundary, halfwords on a halfword boundary etc.

iii) All instruction operands and results stored in registers. Thirty two registers were used to hold as much data as possible because accessing memory three times (two operands and a result) in each instruction was too slow. A load/store architecture was implemented. The destination register of a data processing instruction was specified independently of its operand registers, unlike earlier architectures which placed the result back into an operand register.

iv) A fast memory system to supply a new instruction every cycle. Research showed thirty percent of all executed instructions were loads or stores, and because a new instruction was required in every cycle, a Harvard architecture with separate caches connected to a

common memory was used to provide the required memory bandwidth. The cache had a 32 byte line and a write-back strategy.

v) Simple but fast addressing modes. Only two addressing modes were provided, base register plus immediate index and base register plus register index. The result of the base plus index calculation could be stored back into the base register after each memory access, providing an "auto-increment" facility. Because one cycle was required to calculate the address and another cycle to access the main memory, delayed loads were implemented, so that execution of the following instruction could continue if it did not reference the register into which the load instruction was loading the data. The CPU was "interlocked" - it went into an "idle" state if the register for the new data was referenced before the data was available. The high level language compilers were usually able to re-sequence instructions so that this idle state was rarely used, maintaining the primary goal of one instruction per cycle.

vi) Branch instructions to enhance the instruction pipeline's efficiency. Delayed branch instructions were implemented to maximise the pipeline efficiency and ordinary (two cycle) branch instructions were

implemented to avoid lengthening programs by placing NOP instructions in the branch delay slot.

vii) Powerful compilers to utilise the hardware. The compiler had to be able to make efficient use of the CPU registers, re-order instruction sequences to find instructions to put after the instructions with delayed actions (load and branch) and

provide powerful code optimisation. The PL.8 and Pascal compilers produced for the 801 project pioneered many of the optimising compiler techniques still used today [Ausl82].

The resulting computer was extremely fast, approximately five times faster than machines using comparable hardware technology. Although the 801 CPU was spread across multiple chips, it pioneered the technology for all future single chip RISC designs. It established the principle that the architecture should be designed to support the compiler: rather than trying to second guess the programmer with a static set of high level functions in microcode, the architecture should provide the low level tools that let the compiler produce simple and efficient code, and utilise the cache to hold a dynamic set of frequently used code sequences.

Commercial RISC Designs

Sun Scalable Processor Architecture (SPARC)™

The RISC acronym was actually coined by a research team at the University of California, Berkeley in 1980, led by Dr Dave Patterson. The object of the research was to show that Very Large Scale Integration (VLSI) could be exploited to build a small, very fast 32 bit processor on a single chip, eventually named RISC 2 (RISC 1 was an earlier design) [Patt80, Patt81, Patt82, Patt85]. Scalable Processor Architecture (SPARC) is an extended version of RISC 2 (an FPU and a different register layout are the major differences).

The constructs used in a wide range of high level language programs were studied to arrive at a suitable set of instructions and addressing

modes, as summarized in Table 3. The number of instructions for each high level language construct was based on code produced by compilers for the DEC VAX, DEC PDP 11 and Motorola 68000 architectures.

Measure          Occurrence       Weighted by        Weighted by
                                  # instructions     # memory refs
Language         Pascal   C       Pascal   C         Pascal   C

Call/Return      12       12      30       33        43       45
Loops            4        3       40       32        32       26
Assignments      36       38      12       13        14       15
IF               24       43      11       21        7        13
BEGIN            20       -       5        -         2        -
WITH             4        -       1        -         1        -
CASE             1        1       1        1         1        1
GOTO             -        3       -        0         -        0

Table 3: Relative Frequency of High Level Language Statements.

Because memory bandwidth is a performance bottle-neck for a microcomputer CPU, it was desirable to reduce the number of memory references as much as possible. The procedure call and return sequences were particularly memory intensive, because parameters and return values reside on the call stack, which is held in memory. In these designs a very large number of registers (138 on RISC 2, 120 on the first SPARC implementation) are provided on the chip, to make the load/store architecture as efficient as possible by keeping as much of the stack data as possible in registers. All the registers cannot be addressed at once,

because of the large number of bits required to encode a register number, so just thirty two are "visible" at any one time, divided into four groups -

i) Global registers. Eight registers (0 to 7) are always "visible" and are used to hold global data. Register 0 always contains zero and cannot be altered; it is mainly used to simulate a move instruction with an add instruction (one operand is Register 0), or to simulate a compare instruction with

a subtract instruction (the destination register is Register 0).

ii) "IN" registers. The next eight registers (8 to 15) are used by a procedure to access its parameters. This is done automatically by the call instruction, see (iv) below. iii) Local registers

These eight registers (16-23) are automatically made unique to each procedure by the call and return instructions. They are used by a procedure to store its local variables.

iv) "OUT" registers

The last eight registers (24-31) are used to store the arguments for a procedure call. The "out" registers of the calling procedure are automatically mapped onto the "in" registers of the called procedure when a call instruction is executed, so that parameters that fit in a CPU register do not need to be placed on to the call stack before the call, do not have to be accessed on the stack by the called procedure, and do not have to be removed from the stack after the call. The called procedure may pass data back to the calling procedure (as required

by Pascal's "var" parameters) by putting it in the "in" registers and executing a "return" instruction - the "in" registers will be mapped as the previous procedure's "out" registers.

The registers are arranged in a circular queue, and overlap as shown in Figure 3. A procedure call allocates a new register window, partially overlapping the previous window. The return instruction shifts the window back to reveal the previous procedure's registers. A SPARC implementation may have any number of windows; seven or eight is a typical hardware size/speed tradeoff [Tami83, Wall88]. When the bank is full (the window cannot be advanced any further without overwriting a previous procedure's registers) a trap occurs in the processor and the Operating System must copy the register window to memory. A similar trap occurs if the window is retarded back to the point where register values must be copied back from memory. Programs tend not to have procedure calls more than seven or eight levels deep, so these CPU traps do not occur very often, making the register windows very efficient.

Figure 3: SPARC Register Window Layout
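The overlap can be expressed as simple modulo arithmetic. The C sketch below is illustrative only: it assumes an eight window implementation (8 + 8 x 16 = 136 physical registers; the seven window first SPARC implementation gives the 120 mentioned above) and ignores the overflow and underflow traps:

    #define GLOBALS 8
    #define WINDOWS 8                  /* assumed implementation choice */
    #define PER_WIN 16                 /* 8 "local" plus 8 "in"/"out" slots */

    /* Map a visible register number (0-31) to a physical register, given
       the current window pointer cwp (0 to WINDOWS - 1). */
    int physical(int cwp, int r)
    {
        if (r < 8)
            return r;                  /* globals: always the same registers */
        /* 8-15 are "in", 16-23 local, 24-31 "out".  The "out" registers of
           window w land on the same physical registers as the "in"
           registers of window w + 1: exactly the overlap in Figure 3. */
        return GLOBALS + (cwp * PER_WIN + (r - 8)) % (WINDOWS * PER_WIN);
    }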

Overall, register windows reduce memory traffic by about ten percent, and a program with many procedure calls can have up to a fifty percent memory traffic reduction [Morr88]. Unfortunately, the large number of registers incurs two penalties -

i) The transistors used in the register file must be small to physically fit on the CPU chip. These small transistors cannot drive the ALU bus by themselves (as can the large transistors in a small register file), so the bus must be pre-charged before a read operation to overcome its capacitance. This lengthens the critical CPU data path, thus affecting performance by lengthening the cycle time.

ii) A considerable proportion of the total chip area is used for registers; this area could be used to optimise other CPU operations (such as fast multiply and divide hardware, memory management, or a small cache) if another method for optimising register usage could be found.

These penalties indicate register windows are not the best way to lower the memory traffic around procedure calls; good register allocation does nearly as well, and avoids both disadvantages [Wall88].

SPARC has been used as the CPU and FPU architecture for many SUN high performance workstations such as the SUN 4/110, SUN 4/260, SPARCstations (currently 3 models), and SPARCservers (at least 6 models), offering performance from six to eighteen VAX Units of Performance (VUPs) [SPEC90]. A DEC VAX 11/780 has one VUP by definition. The high performance of the SUN implementations has been achieved without the help of the best optimising compilers.

Microprocessor without Interlocked Pipeline Stages (MIPS™)

This project started in 1981 at Stanford University, using a large research team consisting of both hardware and software (mainly compiler) experts, led by Dr John Hennessy [Henn82, Henn84]. Several architectures were designed and implemented to offer the maximum performance by shifting functionality from the hardware into the compiler, where performance penalties are only incurred at compile time, rather than every time the program is executed. MIPS Computer Systems was formed in 1984 to build a commercial product based on an extended version of the latest Stanford architecture. Optimising compilers were designed to exploit the architecture, especially the usage of the thirty-two integer and sixteen floating point registers. The optimisations performed by the compiler produced code so superior to the Sun compilers that any performance gain provided by register windows was overshadowed by the speed of the entire MIPS architecture [Morr88]. A five stage instruction pipeline is used to split the execution of each instruction into sections executable in a single cycle [Kane88]. The MIPS architecture has a number of interesting features -

i) No pipeline interlocks. The architecture has delayed loads and branches, but the pipeline is not interlocked, so instructions placed in the load delay slot will not be able to access the loaded data. The onus is on the compiler to schedule instructions so that pipeline interlocking to support data dependencies (and hence the associated hardware in the processor data path) is unnecessary. This may require the insertion of "nops" into the load delay slots.

ii) No processor flags.

instructions. Instead the compare instructions (actually labelled SET) store their result in one of the integer registers, and the branch instructions compare the contents of the register to zero (Branch Equal and Branch Not Equal). The condition codes in a normal architecture make some code rErorderings impossible, because any instruction may alter the flags. The MIPS approach allows the result of a comparison to be left in a register so that the compiler may insert any instruction between the compare and branch instructions. Furthermore, the very common Branch and Branch Not Equal operations can test a value in any register, so do not require any explicit comparison instruction.

iii) Unaligned word access. Special instructions in the ALU (Load Word Left and Load Word Right) allow data words that are not aligned on a word boundary to be loaded in two instructions (rather than the usual three (two to load the data and one to combine the two parts) plus a load delay slot). These unusual RISC features were deemed important enough by the MIPS designers to justify their inclusion.

iv) Integer multiply and divide instructions. The CPU has multi cycle instructions which perform signed and unsigned 32 bit multiplication yielding a 64 bit result, and a 32 bit division instruction yielding a 32 bit quotient and a 32 bit remainder. These instructions are hardware interlocked, as their execution times are likely to change between implementations, and the same

code has to be supported by all the different implementations. Considerable amounts of CPU chip space were required by these instructions, but again their relative usefulness was enough to justify their inclusion.

v) Simplified Floating Point Unit. The MIPS FPU (housed on a separate chip) also has a reduced instruction set. Add, Subtract, Multiply, Divide, Absolute Value, Move, Negate and Convert to and from integer are the only instructions supported. CISC FPUs usually have Sine, Cosine, Arc Tangent, Polar, Exponent and Root functions, which must be implemented in software on a MIPS, extending the RISC philosophy. Communication between the CPU and FPU is via "co-processor" instructions in the CPU instruction set. The FPU was optimised by hand, a feat no CISC design team could hope to perform, due to the size of a CISC FPU [Rowe88].

vi) On chip memory management. The CPU chip also contains hardware to manage large off-chip caches, and a 64 entry Translation Lookaside Buffer (TLB) of recent virtual to physical address translations. The provision of these functions on the CPU chip makes communication with them very fast, resulting in fast, flexible memory systems.
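The flag-free compare and branch style of (ii) can be mimicked in C; the variable names are invented, and the commented pseudo-assembly follows the SET naming used above (a sketch, not actual MIPS code):

    /* Flag based architecture:         MIPS style, no flags:
           CMP a, b                         SET t, a, b    ; t := (a < b)
           (flags must survive             (any instructions may be
            until the branch)               scheduled in between)
           BLT label                        BNE t, r0, label            */
    void example(int a, int b, int *out)
    {
        int t = (a < b);   /* comparison result kept in an ordinary register */
        *out += 1;         /* unrelated work slotted between compare and branch */
        if (t != 0)        /* the branch just tests a register against zero */
            *out = a;
    }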

The MIPS architecture has the best performance of any VLSI architecture [SPEC90]. The removal of considerable amounts of hardware from the critical data path (condition code setting, data dependency interlocks) allows very fast clock rates to be implemented. Optimising compilers exploit the architecture to produce thirty percent more efficient

code (fewer instructions) than compilers for other RISC architectures [Morr88]. The processor does require a high performance (expensive) memory system, but this will be required by any system of such high performance. The Harvard architecture utilising two caches (one for instructions, the other for data) connected to a common memory allows some control of the price/performance ratio of the architecture by varying the cache size. The R2000 and R3000 are implementations of the MIPS architecture, used in MIPS products with performance from twelve to twenty VUPs; Digital Equipment also use MIPS processors in their DECstation range. The MIPS architecture has been extended to a multiple chip implementation for use in a mini-computer (using a different family of transistor logic) called the R6000, used to produce a 66.7 MegaHertz, 55 VUP machine called the RC6280. This machine has a twin level cache system: the primary level has two virtually addressed, direct mapped, write through caches, 64 Kilobytes for instructions and 16 Kilobytes for data; the secondary cache is physically mapped, two way set associative with write back, and is shared by both instructions and data. It has 512 Kilobytes, and requires an extra cycle to be accessed. This cache system has a 99.5 percent hit rate.

Other Commercial RISC Architectures

Several other RISC architectures are currently being used as the CPU for high performance microcomputer systems. Motorola produce the 88000 series chips. The 88000 architecture consists of an 88100 CPU and two 88200 Cache and Memory Management Units (CMMUs), one CMMU for instructions, the other for data [Dobb88, Jone88, Jone89, Mele89]. The 88100 architecture has some instructions to support the emulation of the 680x0 architecture (single cycle bit field manipulation instructions provide the most support).

IBM have developed the Performance Optimisation With Enhanced RISC (POWER) architecture, and this is used in the RISC System/6000 series of workstations. The integer performance of this architecture is similar to other RISC processors, but the floating point performance is very good, approximately twice the performance of competitors' Floating Point Units [IBM90]. The architecture has a separate processor to predict branch destination addresses and to execute branch instructions, to keep the integer and floating point instruction pipelines full.

The AMD29000 architecture is targeted at embedded applications [AMD87, Lehr89]. It has been used as a graphics accelerator in the Apple IIfx™ personal workstation [Heid90].

The Systems Performance Evaluation Cooperative (SPEC) have developed a suite of programs, representative of real world computer applications, which can be used to compare the performance of different architectures [SPEC90]. These "SPECmarks" provide useful comparisons between the performance of different architectures, and indicate the MIPS architecture to have the highest performance at a given clock speed. Other RISC architectures have similar performance, while the CISC architectures (of the same generation) achieve around one quarter of the RISC performance.

Chapter 3 The Acorn RISC Machine

The Acorn RISC Machine (ARM) is a processor that achieves an excellent price/performance ratio for different reasons to other RISC processors and implementations, as it offers reasonable performance at a low cost, rather than maximum performance at any cost [Furb89, VTI89]. The 32 bit architecture is tailored towards low cost applications: inexpensive micro-computers, embedded controllers for laser-printers, graphics accelerators and network adaptors [Cate88, Wils89a]. The architecture has three implementations -

i) ARM1 (now obsolete, used only in development machines) mentioned only for completeness.

ii) ARM2, a faster implementation of ARM1 (with added multiply and co-processor instructions).

iii) ARM3, essentially an ARM2 combined with a cache on a single chip (and an added semaphore instruction) [Wils89b].

The ARM instruction set is shown in Table 4, and the instruction format is shown in Appendix A. The architecture has been designed to be coupled with a relatively slow DRAM memory, to avoid a fast (expensive) memory system which would significantly increase the price for the low cost applications ARM was targeted at. Because one new instruction is required from the memory in every clock cycle, the clock cycle time is limited by the instruction transfer time of the memory. The 26 bit address bus (and Program Counter) allow 64 MegaBytes of memory to be directly addressed.

Function                          Mnemonic  Operation                       Cycles

Data Processing

Add                               ADD       Rd := Rn + Shift(Rm)            1S
Add with Carry                    ADC       Rd := Rn + Shift(Rm) + C        1S
Subtract                          SUB       Rd := Rn - Shift(Rm)            1S
Subtract with Carry               SBC       Rd := Rn - Shift(Rm) - 1 + C    1S
Reverse Subtract                  RSB       Rd := Shift(Rm) - Rn            1S
Reverse Subtract with Carry       RSC       Rd := Shift(Rm) - Rn - 1 + C    1S
And                               AND       Rd := Rn AND Shift(Rm)          1S
Inclusive OR                      ORR       Rd := Rn OR Shift(Rm)           1S
Exclusive OR                      EOR       Rd := Rn XOR Shift(Rm)          1S
Bit Clear                         BIC       Rd := Rn AND NOT Shift(Rm)      1S
Move                              MOV       Rd := Shift(Rm)                 1S
Move Negative                     MVN       Rd := NOT Shift(Rm)             1S
Compare                           CMP       Rn - Shift(Rm)                  1S
Compare Negative                  CMN       Shift(Rm) + Rn                  1S
Test for Equality                 TEQ       Rn XOR Shift(Rm)                1S
Test Masked                       TST       Rn AND Shift(Rm)                1S
Multiply                          MUL       Rd := Rm x Rs                   1S + 16I max
Multiply with Accumulate          MLA       Rd := Rm x Rs + Rn              1S + 16I max

Data Transfer

Load Register (& Byte)            LDR       Rd := Address contents          1S + 1N + 1I
Store Register (& Byte)           STR       Address contents := Rd          2N
Swap Memory & Register (ARM3)     SWAP      Rd :=: Address contents         2S + 1N + 1I

Multiple Data Transfer

Load Multiple                     LDM       Rlist := Address contents       (n-1)S + 1N + 1I
Store Multiple                    STM       Address contents := Rlist       (n-1)S + 2N

Table 4: ARM Instruction Set

Function                          Mnemonic  Operation                       Cycles

Flow Control

Branch                            B         PC := PC + Offset               2S + 1N
Branch with Link                  BL        R14 := PC, PC := PC + Offset    2S + 1N
Software Interrupt                SWI       R14 := PC, PC := Vector#        2S + 1N

Co-Processor

CP Data Processing                CDP       CP dependent                    1S + bI
Move ARM Reg to CP Reg            MCR       Rdc := Rm                       1S + bI + 1C
Move CP Reg to ARM Reg            MRC       Rd := Rmc                       1S + (b+1)I + 1C
Load CP Register                  LDC       Rdc := Address contents         (n-1)S + bI + 1C
Store CP Register                 STC       Address contents := Rdc         (n-1)S + bI + 1C

Execution Conditions

Always (AL), Never (NV) : Equal (EQ), Not Equal (NE)
Overflow Set (VS), Overflow Clear (VC) : Carry Set (CS), Carry Clear (CC)
Minus (MI), Plus (PL) : Higher (HI), Lower or Same (LS)
Greater Than (GT), Less Than or Equal (LE) : Greater Than or Equal (GE), Less Than (LT)

Shift Operations

Logical Shift Left, Logical Shift Right, Arithmetic Shift Right, Rotate Right, Rotate Right with Extend by one bit

Key to Cycle Length

S: cycle time is determined by the sequential access speed of the memory
N: cycle time is determined by the random access speed of the memory
I: cycle time is the processor internal clock speed (usually the same as S)
C: cycle time is the co-processor clock speed
n: the number of registers to be transferred
b: the number of cycles the processor must wait for the co-processor to be ready

Table 4 (continued): ARM Instruction Set.

Architecture Characteristics

The type of applications at which the ARM was targeted resulted in a number of interesting features -

i) DRAM memory support. The ARM CPU provides the memory controller with a signal, indicating sequential memory addresses, to utilise fast DRAM access modes. Continuous instruction sequences (i.e. not containing any taken branch instructions) use this signal to achieve over fifty percent more memory bandwidth when coupled to DRAM memory. The cycle times for instructions reflect this: a normal instruction takes one S (sequential) cycle; a branch instruction has one N (non-sequential) cycle to load the first instruction from the branch destination, and two S cycles to load the next two instructions to refill the instruction pipeline. Acorn have designed a memory management chip for the ARM, which uses the sequential signal to access the attached memory and supply instructions and data to the processor. The memory controller also contains a 128 entry content addressable memory, which is used as a Translation Lookaside Buffer (TLB) for virtual addresses.
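As a rough illustration of this bandwidth gain, the following C sketch (my own, not from the thesis) models a fetch stream in which only a fraction of accesses are sequential; the 2:1 ratio between random and page mode cycle times and the eighty percent sequential fraction are assumptions chosen purely to show the shape of the calculation.

    #include <stdio.h>

    int main(void) {
        double S = 1.0, N = 2.0;  /* assumed page-mode (S) and random (N) cycle times */
        double f = 0.8;           /* assumed fraction of fetches that are sequential  */

        double mixed = f * S + (1.0 - f) * N;  /* average cycle using the signal */
        double all_random = N;                 /* every fetch at random speed    */

        /* prints "bandwidth gain: 67%", consistent with "over fifty percent" */
        printf("bandwidth gain: %.0f%%\n", (all_random / mixed - 1.0) * 100.0);
        return 0;
    }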

ii) Atypical instruction complexity. The ARM instruction set is more complex than the IBM 801, SPARC and MIPS architectures, because the CPU was designed to be very memory efficient, to maximize the available memory bandwidth for loading instructions. Most RISC architectures suffer a performance loss due to the low information density of their simple instructions (as shown in Table 3), because more instructions are loaded and executed in comparison to a CISC architecture. This performance loss is usually outweighed by the performance gain that the fast instruction decode facilitates. The ARM architecture has ten different instruction "families", in comparison to the three or four more typical of other RISC architectures.

The elementary instruction pipeline has three stages: fetch, decode and execute. The complexity of the instructions means that each instruction must be substantially decoded before the operand values are known and execution may begin. Reading the operand registers is performed at the start of the execution stage, rather than at the end of the decode stage (which is typical). This lengthens the execution stage, but because the CPU cycle time is limited by the memory speed no performance penalty results.

iii) Conditional instructions. All instructions in the ARM architecture contain a condition field, which determines, depending on the values in the processor's condition flags, if the instruction will execute, similar to the way most architectures conditionally execute branch instructions. The sixteen possible conditions are shown in Table 4. Having conditions on all instructions results in better utilisation of the condition code evaluation hardware normally used exclusively for branch instructions, but does require a four bit field in each instruction. Branch instructions are often used to conditionally execute just one or two normal instructions: in ARM code these few instructions can be conditionally executed and the branch instruction (which may stall the CPU to refill the pipeline) can be removed. The setting of the condition codes by arithmetic operations is also optional, making it possible to preserve the condition codes throughout a sequence of such instructions. Consider the code for a Greatest Common Divisor algorithm; the C code finds the GCD of a and b (leaving the result in both a and b) -

    while (a != b)      /* reached the end yet ? */
        if (a > b)      /* if a is greater than b */
            a -= b;     /* subtract b from a */
        else
            b -= a;     /* subtract a from b */

The assembly code for a "normal" architecture would be

    gcd     cmp r1,r2       /* reached the end yet ? */
            beq end
            blt bgtra       /* if a is greater than b */
            sub r1,r1,r2    /* subtract b from a */
            bal gcd
    bgtra   sub r2,r2,r1    /* subtract a from b */
            bal gcd
    end

But the assembly for ARM is

    gcd     cmp r1,r2       /* if a is greater than b */
            subgt r1,r1,r2  /* subtract b from a */
            sublt r2,r2,r1  /* subtract a from b */
            bne gcd         /* reached the end yet ? */

These instructions are particularly useful for range checking - the code for absolute value (of register 1) is

    abs     cmp r1,0        /* test sign */
            rsbmi r1,r1,0   /* a := 0 - a (two's complement) */

The code to replace an ASCII control code in register 1 with a "?" is

    repl    cmp r1,127      /* is it a DEL */
            cmpne r1," "-1  /* or less than space */
            movls r1,'?'    /* then replace it */

iv) No delayed branches. Branch instructions take three cycles to execute if they jump to a new memory address (i.e. if they really do branch), but only take one cycle if they do not branch. The first of the three cycles is used to execute the branch instruction; the two remaining cycles are needed to reload the instruction pipeline from the branch destination address. To simplify the processor no instructions are executed while the pipeline is reloaded (i.e. the instruction after the branch is not executed as a branch delay instruction). A delayed branch architecture requires two program counters, because a trap can occur in the branch delay slot as well as in the branch destination; one PC contains the address of the instruction in the branch delay slot, the other contains the branch destination address. Of course both program counters must be saved when a trap occurs, which, as shown later, would complicate ARM too much to justify the performance gain of delayed branches, so they are not used. Because ARM has a 26 bit address bus, a condition code, instruction identifier and a complete memory address can be encoded in a single instruction.

v) Uniform register file. ARM has twenty-seven general purpose registers, but only sixteen are visible at once. The remaining registers are mapped across the processor's four modes of operation: User, Interrupt, Fast Interrupt and Supervisor, as shown in Figure 4.

- Register 15 contains the program counter, condition code flags and some processor status bits (interrupt enable and processor mode).
- Register 14, called the Link register, is used to store the return address for calls.
- Register 13 has no dedicated purpose, but is normally used as a stack pointer.

Each processor mode has an individual stack pointer and link register, and Fast Interrupt mode has five more private registers, that do not have to be saved between interrupts, which is the main reason for this mode's name. Using a general purpose register for the program counter and status flags has many advantages -

a) the contents may be altered, loaded and saved with existing instructions and CPU hardware.

b) the condition code settings may be tested with existing instructions.

c) PC relative addressing modes are easily achieved by using the PC as the base address register.

d) the entire CPU state is held in general purpose registers, so it can be saved and restored with ordinary instructions.

(figure: the four register banks for User, Interrupt, Fast Interrupt and Supervisor modes, with register 15 holding the Program Counter, the N, Z, C and V condition codes, and the processor interrupt and mode status bits)

Figure 4: ARM Register Layout

vi) Parallel shift operations. The ARM has a three operand, register to register, architecture: the result (destination) register of a data processing instruction is specified independently of the two operands (sources). The first operand must be a register; the second may be a register or an immediate value. If the second operand is a register its value can be shifted or rotated in a number of ways before it is passed to the ALU, making it possible to remove many explicit shift instructions from the code by combining them with a data processing instruction to form a single instruction. The magnitude of the shift (or rotate) can be expressed as either a constant (called an "immediate") or as the contents of any register. A general purpose "barrel" shifter (named due to its architectural layout) is used to perform all these operations. There are five different types of shift, illustrated in Figure 5 -

(figure: bit level diagrams of the five shift operations on a 32 bit word, showing how each moves the bits and what is placed in the carry flag)

Figure 5: ARM Shift Operations

a) Logical Shift Left (LSL). The bits of the operand are shifted left by the specified number of bits. Zeros are inserted into the right most end of the word, and the last bit removed from the left most end is placed in the carry flag. A Logical Shift Left with a magnitude of zero (LSL 0) does nothing and is the default shift if none is specified.

b) Logical Shift Right (LSR). The bits of the operand are shifted right by the specified number of bits. Zeros are inserted into the left most end of the word, and the last bit removed from the right most end is placed in the carry flag. An LSR 0 is translated into an LSR 32, to put bit 31 of the word into the carry flag.

c) Arithmetic Shift Right (ASR). The bits of the operand are shifted right by the specified number of bits. The left most bit (the sign bit) is repeatedly inserted into the left most end of the word, and the last bit removed from the right most end is placed in the carry flag. An ASR 0 is translated into an ASR 32, to propagate bit 31 of the word into every bit in the word as well as the carry flag.

d) Rotate Right (ROR). The bits of the operand are rotated right by the specified number of bits. The bit removed from the right most end is inserted into the left most end of the word, and the last bit removed from the right most end is placed in the carry flag. A ROR 0 is translated into a Rotate Right with eXtend (RRX), in which the carry flag is used as a 33rd bit for the rotate: the right most bit is placed into the carry flag and the old value of the carry flag is inserted into the left most end of the word. Notice that RRX only rotates by one bit at a time.
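The shifter's behaviour can be summarised in C. The sketch below is my own illustration (not code from the thesis), restricted to shift magnitudes of 1 to 31 so the special zero cases described above are omitted; it also assumes the compiler's signed right shift is arithmetic.

    #include <stdint.h>

    typedef struct { uint32_t value; int carry; } ShiftResult;

    /* Logical Shift Left: the last bit shifted out of the top goes to carry */
    ShiftResult lsl(uint32_t x, unsigned n) {
        ShiftResult r = { x << n, (int)((x >> (32 - n)) & 1) };
        return r;
    }

    /* Logical Shift Right: the last bit shifted out of the bottom goes to carry */
    ShiftResult lsr(uint32_t x, unsigned n) {
        ShiftResult r = { x >> n, (int)((x >> (n - 1)) & 1) };
        return r;
    }

    /* Arithmetic Shift Right: the sign bit is replicated at the top */
    ShiftResult asr(uint32_t x, unsigned n) {
        ShiftResult r = { (uint32_t)((int32_t)x >> n), (int)((x >> (n - 1)) & 1) };
        return r;
    }

    /* Rotate Right: bits leaving the bottom re-enter at the top */
    ShiftResult ror(uint32_t x, unsigned n) {
        ShiftResult r = { (x >> n) | (x << (32 - n)), (int)((x >> (n - 1)) & 1) };
        return r;
    }

    /* Rotate Right with Extend: a 33 bit rotate through the carry flag */
    ShiftResult rrx(uint32_t x, int carry_in) {
        ShiftResult r = { (x >> 1) | ((uint32_t)carry_in << 31), (int)(x & 1) };
        return r;
    }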

ARM has two extra instructions to fully utilise the "barrel" shifter, Reverse Subtract (RSB) and Reverse Subtract with Carry (RSC). These instructions use their operands in the opposite order to the normal subtraction instructions (SUB and SBC) so that both the first and the second operand may be shifted or rotated (SUB and SBC subtract the shifted operand from the unshifted operand, RSB and RSC subtract the unshifted operand from the shifted operand).

Consider the following ARM code to change the byte order of a 32 bit word (the word is in r0, r1 is used as a scratch register) -

    eor r1,r0,r0 ror 16
    bic r1,r1,0xff0000
    mov r0,r0,ror 8
    eor r0,r0,r1 lsr 8
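For reference, the same trick can be restated in C; this sketch is my own transcription of the four instructions above (the ror32 helper is not part of the thesis).

    #include <stdint.h>

    static uint32_t ror32(uint32_t x, unsigned n) {  /* rotate right, n in 1..31 */
        return (x >> n) | (x << (32 - n));
    }

    uint32_t reverse_bytes(uint32_t w) {
        uint32_t t = w ^ ror32(w, 16);   /* eor r1,r0,r0 ror 16 */
        t &= ~(uint32_t)0xff0000;        /* bic r1,r1,0xff0000  */
        w = ror32(w, 8);                 /* mov r0,r0,ror 8     */
        return w ^ (t >> 8);             /* eor r0,r0,r1 lsr 8  */
    }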

The code for a "normal" architecture require 3 more instructions, and an extra register. The code for constant multiplications is also very efficient, using only half the number of instructions of a normal architecture because pairs of shift instructions and add or subtract instructions can be combined in single instructions. vii) Rotated Immediate Operands. The second operand of a data processing instruction may be a constant (called an "immediate" operand). There are twelve bits reserved in the instruction for this value, but these bits are split into two fields: an eight bit quantity and a four bit rotate magnitude. The eight bit quantity is rotated right by twice the rotate magnitude, using

-55- the same "barrel" shifter used for shifted register operands. This allows a more useful range of immediate operand values than a twelve bit constant, and can be combined with a single instruction as a bit mask to access all the process status flags in register 15. The above example used this feature to load a 24 bit constant (OxflOOOO) in one instruction. viii) Atypical Addressing Mode Complexity. The ARM is a von Neumann, load/store (register to register) architecture. Because the single memory bus is almost fully utilised loading instructions, the load and store instructions must take more than one cycle (one to load a new instruction and another to access memory for the load and store). Because the cycle time of ARM is limited by the access speed of memory, loading data will take a full cycle, and a third cycle will be required for load operations to transfer the loaded data from the data bus to the destination register. However this third cycle is usually overlapped with the next instruction, so can usually be ignored. Because of the free processor cycle that occurs before memory can be accessed by a load or store (because an instruction is being fetched), a rich set of memory addressing modes have been implemented-

a) Base Register plus Offset. The value of a base register and an immediate value or the value in an offset register are combined to form the memory address. The immediate is a twelve bit unsigned integer which may be added to or subtracted from the base register (effectively yielding a thirteen bit signed immediate offset). The value of the offset register may be shifted in a similar manner to the second operand of a data processing instruction (although only by an immediate amount).

b) Base Register Plus Offset with Pre Increment (or Decrement). These modes are similar to Base Register plus Offset, but the result of the base register and offset addition (or subtraction) is written back to the base register. This mode is useful for accessing arrays of data. The shift applied to a register offset can be used to directly scale the array index.

c) Base Register Plus Offset with Post Increment (or Decrement). These modes are similar to the above, except the base plus offset calculation is not performed, or written back, until after the memory access has been made. These modes are also useful for (scaled) array operations.

It should be noted that these addressing mode calculations only use existing hardware used for normal addition and subtraction data processing instructions, so they only add decoding hardware to an implementation. A flag in the instruction word indicates that a single byte should be loaded from memory rather than a full word; there is no single instruction to load a 16 bit quantity or to load and sign extend a byte.

ix) Multiple register operations. Two instructions are provided to load and store multiple registers to and from memory. Any or all of the sixteen registers visible in the current processor mode can be loaded or stored with one (multi-cycle) instruction. Similar addressing modes to the single register loads and stores can be used for these multiple register operations, providing very efficient stack and queue operations. These instructions also inform the memory system that data will be loaded sequentially, so the fast DRAM page mode memory accesses can be used. This makes these instructions nearly four times more efficient than separate load/store instructions, because the latter must load as many instructions as data objects (which will take twice as long) and cannot utilise the page mode DRAM accesses, because instructions and data are accessed alternately (and page mode cycles are twice as fast as normal cycles).
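Using the cycle counts from Table 4, the advantage is easy to tabulate. The sketch below is my own arithmetic; the relative cycle times (page mode S cycles twice as fast as random N cycles) are taken from the paragraph above.

    #include <stdio.h>

    /* LDM of n registers: (n-1)S + 1N + 1I, from Table 4 */
    static int ldm_cycles(int n, int S, int N, int I) {
        return (n - 1) * S + N + I;
    }

    /* n separate LDRs: n x (1S + 1N + 1I); instruction fetches alternate
       with data accesses, so page mode cannot be used */
    static int ldr_cycles(int n, int S, int N, int I) {
        return n * (S + N + I);
    }

    int main(void) {
        int S = 1, N = 2, I = 1;          /* assumed relative cycle times */
        printf("8 registers: LDM %d cycles, 8 x LDR %d cycles\n",
               ldm_cycles(8, S, N, I), ldr_cycles(8, S, N, I));
        return 0;                         /* prints 10 versus 32 */
    }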

Because the entire CPU context for each processor mode is held in the sixteen visible registers, procedure calls and context switches are very efficient. For example, the entry sequence to a procedure might save all the registers on the stack (pointed to by r13) with

    stmfd r13!,r0-r14

and the exit sequence might return with

    ldmfd r13!,r0-r13,r15

The f and d indicate the type of stack: "f" for full - the stack pointer points to the next value to be popped from the stack ("e" for empty would mean the stack pointer points to the location where the next push will put its data); "d" (descending) indicates the direction of stack growth, its opposite being "a" (ascending). The "!" indicates the stack pointer (in this case r13) should be updated with the new top of stack value. Notice these two instructions also generate the procedure return, by saving the value in the link register and loading it back as the program counter.

x) Software Interrupt. This instruction causes the processor to switch to supervisor mode and jump to a fixed address, to provide an entry point into the operating system from a program. A 24 bit field in the instruction is used to specify the Operating System function that is required.

xi) Co-processor interface. The ARM instruction set includes three instruction families to manage an efficient interface for hardware co-processors, which add hardware support to the ARM architecture. Up to sixteen co-processors can be connected to the interface at once, for example a Floating Point Unit, graphics accelerator or digital sound processor. The three instruction families are -

a) Co-processor data processing. This class of instructions is used to inform a co-processor that it should perform some internal operation, such as a "floating point addition" in the case of an FPU. The instruction specifies two source registers and a destination register for the operation, and may specify up to 128 different operations.

b) Co-processor register transfer. These instructions are similar to the above class, except that they specify a single ARM register. This is useful for operations like "convert integer to floating point": the integer is held in an ARM register, and the result of the conversion is placed in an FPU register.

c) Co-processor load/store data. These two instructions are used to load and store data between memory and a co-processor register. The ARM CPU handles all the address calculations and has the same addressing modes as the normal load and store instructions, except the immediate offset can only occupy eight bits instead of the normal twelve (the other four are used to specify the co-processor number).

xii) ARM3 Cache. The ARM3 implementation has a small cache memory on the same chip as the CPU (it also has a semaphore instruction to aid systems with multiple CPUs). The cache holds four kilobytes of instructions and data, is 64 way set associative, with a sixteen byte line size, a write through policy and random replacement. It uses virtual addresses, because the address translation hardware is not included on the CPU chip. The cache controller is configured as a co-processor for simple programming. The cache does not have a write buffer: when a store instruction is executed the CPU clock is synchronized with the main memory clock, and the instructions proceed at the main memory speed. Adding a write buffer to the cache would alter the exception handling mechanism, making ARM3 incompatible with ARM2.
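The quoted parameters fix the cache geometry; the following C fragment is simply my own arithmetic on the figures above.

    #include <stdio.h>

    int main(void) {
        int size = 4 * 1024;          /* four kilobytes          */
        int line = 16;                /* sixteen byte line size  */
        int ways = 64;                /* 64 way set associative  */
        int lines = size / line;      /* 256 lines in total      */
        int sets  = lines / ways;     /* 4 sets of 64 lines each */
        printf("%d lines in %d sets\n", lines, sets);
        return 0;
    }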

The Impact on Performance

An approximation of the effect that these architectural features have on the overall performance of ARM can be quantified using known statistics for the relative frequencies of machine code instructions [Gros88] [Tane78].

If ARM had delayed branches to optimise "taken" branch performance, two delay slots would be required. Compilers for architectures with two delay slots can fill around sixty percent of the delay slots with a useful instruction, and taken branch instructions for these architectures are responsible for about twelve percent of all instructions, so ARM loses 12% of instructions x 2 delay slots x 60% filled slots, or about fifteen percent of its performance, because of the lack of delayed branches. About fifty percent of branch instructions branch around just one instruction, and a further twenty percent branch around just two: these branches can be replaced by ARM's conditional instructions. The time taken (number of cycles) for an ordinary branch around one or two instructions in ARM is approximately 50% x 3 cycles for a taken branch + 50% x (1 cycle untaken branch + 70% x 1 instruction + 30% x 2 instructions), or about 2.6 cycles on average. Using conditional instructions the number of cycles drops to 70% x 1 instruction + 30% x 2 instructions, or just 1.3 cycles, resulting in a 65 percent increase in the efficiency of branch instructions, or an overall loss over delayed branches of twelve percent.
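The arithmetic in the previous paragraph can be restated as a small calculation; this C sketch simply re-derives the 2.6 and 1.3 cycle figures from the percentages quoted in the text.

    #include <stdio.h>

    int main(void) {
        /* one skipped instruction 70% of the time, two 30% of the time */
        double skipped = 0.7 * 1 + 0.3 * 2;                 /* 1.3 cycles */

        /* branching: 3 cycles when taken (50%), otherwise 1 cycle for the
           untaken branch plus the instructions it failed to skip */
        double with_branch = 0.5 * 3 + 0.5 * (1 + skipped); /* about 2.6 cycles */

        /* conditional execution: the instructions simply pass through the
           pipeline and the branch disappears entirely */
        double conditional = skipped;

        printf("branch: %.2f cycles, conditional: %.2f cycles\n",
               with_branch, conditional);
        return 0;
    }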

Shift operations account for approximately five percent of all instructions. Many of these, say eighty percent, can be performed in parallel with data processing instructions, a net saving of about four percent of all instructions. The barrel shifter lengthens the critical CPU data path by about fifteen percent, but the access time of the main memory is still longer than the time spent in the execution stage, so this fifteen percent has no effect on the overall performance.

The single register transfer instructions have a useful set of addressing modes not usually found on RISC machines, and these should yield an eight percent performance gain. The multiple register transfer instructions can be used for efficient procedure calls and data block move operations, and are used for about half of all memory accesses [Furb89] (about twenty percent of all instructions), making them very worthwhile, as they make DRAM memory accesses nearly four times more efficient; they would be expected to yield a twelve percent performance increase. Because the program counter is a general purpose register, jump destinations that are evaluated at run time (as in a CASE statement) can be efficiently handled with general purpose instructions, but the frequency of CASE statements is too low for this to add a significant performance advantage.

Thus when ARM is connected to DRAM a twelve percent performance increase would be expected over a simpler machine (say the MIPS architecture). When connected to SRAM (or a cache) the fifteen percent longer cycle time and the lessening in the usefulness of the multiple register transfer instructions would make ARM roughly equivalent to a more simple architecture.

Thus the ARM architecture has efficient support for all the high level language features listed in Chapter 2. A highly optimised text decompression algorithm written using many of these features can decompress data faster than traditional architectures (both CISC and RISC) [Jagg89]. The above features can be fully exploited by a high level language compiler resulting in a very desirable computer: one with a low price/performance ratio.

Evaluating an architecture

Evaluating a new architecture is a task made up of several stages - i) Programs must be chosen as performance benchmarks, to evaluate the architecture. The emphasis here has been on utility programs from the UNIX Operating System, written in the C programming language. ii) A compiler, its associated optimisers, an assembler and a linker must be built to transform the high level code of the benchmark programs into efficient executable machine code. The ability of the compiler to produce good code is relied upon by RISC architectures to gain good performance, as detailed in Chapters 4 and 5. iii) A performance monitor must be constructed, firstly to evaluate the quality of the code produced by the compiler, and secondly to evaluate the ability of the architecture to support the high level language features, as detailed in Chapter 6.

All these tasks have been successfully completed for the evaluation of the ARM architecture. An optimising compiler has been constructed to exploit the ARM architecture to produce efficient code, rather than being "user friendly" enough to release as a commercial product. A software performance monitor has been built to record accurate information on any architectural feature, whilst still providing good performance, so that it could be used to evaluate the execution of large programs. The benchmark programs chosen were designed to give a reliable estimate of performance for the type of code that the ARM architecture could execute, as described in Chapter 7.

Chapter 4 Computer Software

The Operating System, compilers and application programs that use the computer hardware are responsible for delivering the performance to the user. Inefficient algorithms can neutralise the performance of the fastest CPU, and inefficient data structures can devour precious memory.

The Operating System

The Operating System (OS) of a computer is a unique program used to allocate the computer's resources, such as the CPU, memory and disk drives. Designing an OS from scratch is a very long and expensive undertaking. UNIX™ is becoming a standard Operating System for high performance microcomputers (workstations), due to its portability between different types of hardware. The main reason for this portability is that UNIX is almost entirely written in C, a high level language. A small machine code kernel of low level operations and a C compiler are the only software required for UNIX to be ported to a new machine (a crude method of loading the initial kernel and compiler into the machine (called a "bootstrap") will also be required). Clearly the machine code kernel must make efficient use of the low level resources it allocates, because the rest of the OS relies on this code. Even more importantly, the C compiler must produce very efficient code to fully utilise the computer hardware, as most of the Operating System is compiled C code.

Compilers

High level languages are used to construct Operating Systems and application programs. For a compiler to be effective, it must produce machine code of comparable quality to machine code that is hand written by a programmer. Constructing complete compilers for a range of programming languages for a new machine is an extremely labour intensive operation, so compilers are usually split into two parts, called the front-end and the back-end. A different front-end is constructed for each high level language, to transform the high level language code into an "intermediate" code, which has a semantic level somewhere between high level languages and a computer's machine code. One back-end is made for every different architecture; it takes the intermediate code produced by a front-end and transforms it into machine code. Careful selection of the statements or instructions in the intermediate code allows it to be used for a number of similar language front-ends (C and Pascal for example), and be translated by a number of back-ends into the machine code of different architectures. Intermediate languages tend to be simple and general purpose, in fact very similar to RISC code, for two reasons -

a) The intermediate code does not need to have an efficient bit level encoding (like assembly code), so, for instance, the size and number of intermediate instruction operands are not limited by the size of a certain instruction format.

b) It is much easier to construct a compiler for a simple instruction set than for a complex one, which is one reason for the development of RISC machines. Sequences of simple instructions offer more opportunities for optimisation than compound complex instructions.

The back-end for a CISC architecture is much more complex than a RISC back-end, because its main task is to replace a sequence of simple intermediate code instructions with a complex one. The RISC back-end has a simpler task: the job of producing RISC assembly from intermediate code can be reduced to little more than a translation if the intermediate code is properly selected. This allows more time for optimisation of the intermediate code, generating a direct code improvement in the final assembly code.

So to produce a new compiler for a different high level language only requires the construction of the front-end, and the language becomes available on all machines with a suitable back-end. To port existing compilers to a different architecture only requires the construction of a back-end, which is then combined with each front-end, to produce compilers for a range of high level languages. This is far more efficient than constructing complete compilers for every language on every machine. The extra level of translation required by using an intermediate language seems theoretically inefficient, but because the total workload has been roughly halved (to support a new language or new architecture), more effort can be expended on the remaining work, culminating in a superior compiler produced in the same time span [Aho86].

Figure 6 illustrates the translation path from high level language code to intermediate code to machine code. Both the front-end and the back-end are split into a number of separate sections -

(figure: block diagram of the compiler's Front End and Back End stages)

Figure 6: Compiler Stages

i) Language Pre-processor. The C programming language has a pre-processor to expand macros (defined with #define), textually include files in a program (#include), and conditionally compile high level code (#ifdef).

ii) Intermediate Code Generator. The code generator in the front-end is responsible for translating each high level language statement into one or more intermediate language statements or instructions, by parsing the high level language symbol by symbol (a symbol is a syntactic token) and generating suitable intermediate code to carry out the action required.

For example an IF statement will be translated into code to evaluate the Boolean expression controlling the IF, and the result of the Boolean expression will be used as the condition on a branch instruction to the end of the body of the IF statement. In fact the evaluation of expressions is often the most complex part of the code generator, due to operator precedence and context requirements. The allocation of variables to CPU registers (register allocation) is planned here, by inserting temporary information into the code about variable usage frequencies. For a full description of code generation see [Aho86].

iii) Intermediate Code Peephole Optimiser. This section of the front-end scans the code produced by the code generator, looking for short sequences of instructions that can be replaced by shorter or more efficient sequences. Such sequences are often found on the boundaries between code produced by different sections of the code generator: for example, the Boolean expression evaluator may finish by pushing its return value onto a stack, and the code produced for an IF statement may begin by popping the value off the stack; these push and pop operations can be removed to funnel the Boolean value straight to the branch instruction produced for the IF statement.

iv) Intermediate Code Global Optimiser. This section of the front-end is responsible for manipulating entire blocks of code, to make them more efficient. Many global optimisations are possible, especially with a simple intermediate code -

a) Common Sub-expression Elimination. Multiple computations of the same expression are removed by this phase. The result of the first computation is stored in a temporary location and the second computation is replaced by a reference to this location. Of course extreme care must be taken to ensure the values returned by two expression evaluations are indeed the same (function calls complicate this, due to global information that may affect the result).

b) Strength Reduction. Simplifying the evaluation of expressions that use loop variables is called strength reduction. Expressions involving loop variables create arithmetically progressive results that can be simplified, to produce faster calculation methods (see the sketch after this list).

c) In-line Substitution. Procedures and functions that are called only once may be shifted into the main stream code and the corresponding call and return statements removed. The body of the procedure or function may need to be modified to save the contents of registers used in the procedure. This optimisation can be used on procedures that are called more than once, but this will lower cache performance because the multiple copies of a procedure body will be cached separately.

d) Stack Pollution. After a procedure call the arguments passed to the procedure must be removed by the caller. When procedure calls happen frequently, cleaning up the stack after each individual call may be deferred, to allow several clean ups to be combined into one.

e) Copy Propagation. Statements of the form A := B (in Pascal) can be removed, and all subsequent references to A replaced by references to B, provided that the values of A and B do not change.

f) Constant Propagation. Statements of the form A := constant (in Pascal) can be removed, and all subsequent references to A replaced by references to the constant, provided the value of A does not change.

g) Cross Jumping or Tail Merging. Two bodies of code that end by jumping to the same location, and have the same statements before the jump, may share one copy of these statements, by inserting a jump instruction before one sequence to jump into the beginning of the other. Duplication of statements often occurs in the code following an IF statement and the code after the corresponding ELSE.

h) Loop Unrolling. Unrolling loops involves joining several copies of the loop body together to lower the number of jumps to the beginning of the loop. This optimisation is only practicable for very short loop bodies - it increases code size which reduces cache efficiency.

i) Code hoisting. Calculations that are invariant in a loop may be shifted to before the loop where they will not be repeatedly evaluated. The major index calculation used in matrix operations is a classic target for code hoisting.

j) Loop fusion and loop splitting. Loops may be split into separate loops, to allow other optimisations (such as code hoisting, unrolling and strength reduction). Two loops may also be joined to remove repeated looping constructs over the same range.

k) Dead code elimination. Many of the previous optimisations can leave code that is never executed (especially constant propagation, copy propagation and cross jumping), which is removed by this phase. Calculations yielding results that are never used are also removed by this phase.

l) Register allocation. Making efficient use of CPU registers to hold frequently used data is probably the most important optimisation. The register allocation phase in the front end provides broad hints to the back end as to which variables are best placed in registers. Local variables, procedure arguments, global variables and procedure identifiers are common candidates for register allocation. Pointers, record fields and individual array elements can be stored in registers, but the algorithms required to do so are very complex. The allocation of variables to registers can be considered as a graph colouring problem (as it is a time-tabling application), where each node of the graph represents a variable, and a path between nodes indicates that those variables are in use ("alive") at the same time. The colouring algorithm attempts to colour the graph; the number of colours that can be used is dictated by the number of available registers. When a graph cannot be coloured some nodes must be removed, which means saving register contents to memory (register "spilling"); a set of heuristics is used to decide the best register to spill. Although colouring algorithms are NP complete, they may be supplemented with heuristics to make their use practicable in a compiler [Brig89, Chat81, Chat82, Chow84, Chow88].
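As an illustration of strength reduction (referred to in item b above), consider the following C sketch; it is my own example rather than one from the thesis, showing the transformation a global optimiser would perform on an array subscript.

    #include <stddef.h>

    /* before: the subscript multiplication is evaluated on every iteration */
    void clear_stride(int *a, size_t n) {
        for (size_t i = 0; i < n; i++)
            a[i * 4] = 0;
    }

    /* after: the multiplication has been reduced to a running addition */
    void clear_stride_reduced(int *a, size_t n) {
        size_t j = 0;
        for (size_t i = 0; i < n; i++) {
            a[j] = 0;
            j += 4;
        }
    }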

The optimisation passes of the compiler are usually controlled by options to activate them. During program development the extra time required for code optimisation is wasted, and in the extreme the subsequent reorganisation of code makes source level debugging very difficult. The optimiser is usually only used late in the software development cycle, hence the discretionary use by the programmer.

v) Machine Code Generator. A second code generation stage in the back-end is used to transform the intermediate code into machine code mnemonics - assembly code. This code generator must make efficient use of the particular machine architecture, by efficiently utilising its instructions and addressing modes. The final stage of register allocation is done here (the actual graph colouring algorithm using information passed from the front end). It is this stage of the compiler that benefits the most from a RISC architecture due to the reduction in the number of possible code sequences that can be generated.

vi) Machine Code Peephole Optimiser. Another peephole optimiser pass can be used in the back-end to remove any inefficient machine instruction sequences produced by the machine code generator. This may involve combining load and add instructions into a load with auto increment instruction. Compare instructions can often be removed if the processor flags can be set automatically by a preceding calculation. If the intermediate code is carefully chosen, very little optimisation should be required here, as the code generator should not generate sub-optimal sequences of assembly instructions for optimal sequences of intermediate language instructions. Again a RISC architecture helps here, as the choice of instructions to generate is very limited (compared to a CISC), and it therefore benefits more from the front-end optimisations, because optimisations are not undone by inefficient CISC instructions (with wasted side effects). Optimal sequencing of instructions and data can be arranged here, to optimise the effectiveness of cache algorithms.

vii) Assembler and Linker. Although not really part of the compiler, the final stage of the back-end (and the entire compilation process) transforms the mnemonic assembly language into machine code. For every assembly language instruction there is exactly one machine code instruction, so no further optimisation will be required to select the most efficient code sequences. The linker combines multiple files together to produce an executable image, by resolving references between different parts of a program compiled at different times, and references to any language library functions.

An intermediate language allows a common back-end to be used for similar languages (for example Pascal, Modula 2, C and Algol), so the emphasis for better optimisation can be focused on a common optimiser for the intermediate language, to be used with all front-ends, and on an optimiser for a particular architecture (or even individual implementations).

Application Programs

The algorithms used by application programmers have a far greater effect on the performance delivered to the computer user than compiler optimisation. It is common practice for programmers to optimise their high level code for a particular architecture and implementation. This practice is only suitable for code dedicated to that machine, any less specific code should be written without programming "tricks" as these only confuse the compiler and hinder its optimising ability. Consider the following C code sequences to swap two integer variables -

    (a)                     (b)
    int i, j;               int i, j, temp;

    i = i ^ j;              temp = i;
    j = i ^ j;              i = j;
    i = i ^ j;              j = temp;

Code sequence (a) uses the exclusive-or operator to swap the values of i and j; sequence (b) uses a temporary variable. Code (a) may be more efficient because it uses one less variable (which will probably equate to a register), but a copy propagation optimisation pass will usually remove sequence (b) from the code entirely, and merely swap all following references to i and j. The low level optimisation of programs is better left to the compiler; the programmer should endeavour to find more efficient algorithms and data structures to build efficient programs, and turn on full optimisation late in the development cycle. If for some reason the resulting code is inefficient, the compilation process may be halted (with a compiler switch) before the assembler pass, the critical parts of the code altered (or completely rewritten) by hand, and the code then assembled and linked as before.

Chapter 5 An Optimising Compiler for ARM

The most important piece of software in a high performance computer system is the compiler, which must deliver the maximum hardware performance to the end user. The hardware can assist the compiler by making many optimisations possible, which is the reason for the development of RISC, but a fully optimising compiler for any architecture is a major software project.

The C language is widely renowned for efficient compilation and opportunity for optimisation [Kern78], and has been chosen here as the high level language to be compiled and optimised. Modula 2 and more recently Oberon have proven to be languages with similar compiler efficiency to C [Wirt88, Wirt89].

Compiler Building Tools

Four different approaches were investigated for the construction of a fully optimising C compiler for ARM -

i) Hand crafted compiler and optimiser. Given enough time, this approach will result in the best compiler. An intermediate language could be tailored especially for ARM, to make a common intermediate language for many language front-ends, allowing the same optimisers and back-end to be used for each front-end. MIPS Computers used this approach to build the best commercially available production optimising compilers [Morr88]. Unfortunately the amount of work required by such a compiler is too great to be practicable here.

ii) The Portable C Compiler. The Portable C Compiler (PCC) is a two pass C compiler with a mixed intermediate code of both assembly code (for control structures) and tree structures (for expressions). The first pass is almost machine independent, performing some simple optimisations on expression trees (constant propagation and strength reduction). The second pass generates assembly code from the expression trees, using a technique developed by Sethi and Ullman [Aho86] to produce efficient code. Porting PCC to a new architecture requires from two to five thousand lines of code to be written, as well as the code for a machine dependent peephole optimiser, but high quality compilers can be produced. Many commercial compilers are based on PCC, including Acorn's ARM C Compiler.

iii) The GNU C Compiler. The GNU C Compiler (GCC), from the Free Software Foundation, is made up from a collection of C programs. Some simple optimisations are applied to the intermediate code (called Register Transfer Language, or RTL), and are thus machine independent. Porting the compiler to a new architecture requires three main files be written: an instruction output file, a machine description file (architectural description) and a target description file (implementation description). Together these files contain between two thousand and six thousand lines of C code (depending on the target machine). GCC has been ported to all common architectures: DEC VAX, Motorola 680x0 and 88000, Intel 80x86 and 80860, SUN SPARC, MIPS and National's 32000 series, and to many more obscure architectures.

iv) The Amsterdam Compiler Kit. The Amsterdam Compiler Kit (ACK) is the most extensive compiler building tool available, providing tools to construct all the parts of a compiler described in Chapter 4: pre-processor, front-end code generator, peephole optimiser, global optimiser, back-end code generator, back-end peephole optimiser, assembler and linker [Tane83a]. An intermediate language, called EM code, is generated by the front-end from a high level language, and transformed into machine code by the back-end [Tane83b]. The same pre-processor and front-end optimisers are used by all compilers built with the kit. Porting the compiler to a new architecture requires between two thousand and five thousand lines of source code (not C) be written. Front-ends currently exist for C, Pascal, Modula 2, Basic, Occam and Ada, with a front-end for Algol 60 under development. Back-ends exist for DEC PDP11 and VAX, Intel 80x86, Motorola 680x0, National 32000, MOS Technology 6502 and Z80 architectures.

The choice of compiler building tool was based on three criteria -

i) Suitability of ARM as a target. All three compiler kits meet this requirement, for different reasons. PCC must be suitable, as a PCC based compiler already exists for ARM. GCC has been targeted to several other RISC machines with similar architectures to ARM, and because the flexibility of C can be utilised when building the back-end, it could almost certainly exploit the special features of the ARM architecture. ACK's intermediate language (EM code) is similar to ARM code (in fact there is a one to one relationship between many EM instructions and ARM instructions), but the source language used is rather restrictive, and may not be able to exploit all the special features of ARM.

ii) Work required to build a compiler. The original goal of a low cost, high performance compiler must be met, which is why hand crafting a compiler is not possible. PCC requires significantly more work than either GCC or ACK, because of the lack of machine independent optimisation (a separate peephole optimiser needs to be written from scratch). GCC porting requires a large amount of C code to be written, and while this is very flexible, the source code for ACK is substantially easier to use (because its source language was specifically designed for compiler building).

iii) Quality of source code. Of course the compiler must be able to utilise the ARM architecture. The Acorn PCC based compiler produces reasonable code, but hand inspection showed there are many ways in which it could be improved. By comparing code produced by GCC and ACK for the Motorola 68000, some insight into their code generation ability is gained. Without optimisation both GCC and ACK produce similar quality, similar length code, but when ACK's powerful global and peephole optimisers are enabled [Bal85, Bal86, McKe89], very high quality code is produced.

These three factors indicated that ACK would be the most suitable tool to use, as long as it could be made to generate code that exploits the special features of the ARM architecture.

EM code and the Code Generator Generator

EM code, ACK's intermediate code (shown in Appendix B), is a low level intermediate language. The result of using this low level code is that EM is only targetable to architectures with certain qualities; the most important is that each byte in the memory must have a unique address. EM is a stack based intermediate code: instructions like adi (add integer) pop two operands from the stack and push the result back on the stack. EM has several built-in data types: signed and unsigned integers, floating point numbers, pointers and sets of bits. There are no general purpose registers - local variables and arguments are addressed via an offset from the stack frame. There are, however, a few special purpose registers: a Program Counter, a Local Base (which points to the start of local variables), a Stack Pointer and a Heap Pointer.
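To make the stack model concrete, the following C sketch (my own illustration, not part of ACK) shows how an EM style adi would operate on an evaluation stack.

    #include <stdint.h>

    #define STACK_DEPTH 64

    static int32_t stack[STACK_DEPTH];
    static int sp = 0;                 /* models EM's Stack Pointer */

    static void push(int32_t v) { stack[sp++] = v; }
    static int32_t pop(void)    { return stack[--sp]; }

    /* adi (add integer): pop two operands, push their sum */
    static void adi(void) {
        int32_t b = pop();
        int32_t a = pop();
        push(a + b);
    }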

ARM will be the first RISC architecture that ACK has been targeted to. The strict requirement of RISC architectures for powerful optimising compilers will test the ability of ACK front-ends to produce good code; any imperfections will be clear, as poor sequences of ARM code will be translated directly from poor EM code (poor sequences of CISC code could instead be caused by the complex task performed by the CISC code generator). It is therefore worth examining how the features of ACK can be utilised to produce a high quality RISC compiler.

The Code Generator Generator (CGG) program uses an architecture description table to build a code generator for a new architecture. This description table must be written to produce a back-end for a new architecture (for example ARM). The description table uses an architecture description language to describe the target architecture, and is made up of a number of sections -

i) Constant declarations. Constant values are defined at the beginning of the table, and may be used throughout the rest of the table. The C pre-processor is used for an initial pass over the table, so pre-processor constants and macros can also be defined here. Constants like the size of a stack frame, and macros to perform range checking on operands are defined here. ii) Register properties. The properties to differentiate between different target register classes are defined in the table. Registers are allocated later in the table by requesting a register with a property, and the register allocator will choose the best register with that property. The ARM table has two main register properties: REG which includes all ARM registers and ALLOC which are the registers used for temporary allocation.

iii) Register definitions. The properties of each target register are defined in this section, including the name to be used in the assembly code and the registers that are available to hold program variables (as opposed to "scratch" registers, used to hold temporary program values). The ARM table uses registers zero to six as scratch registers, and registers seven to eleven as register variables. Register twelve is the Local Base (LB), register thirteen is the Stack Pointer, register fourteen is the Link Register and a code generator scratch register, and register fifteen is the Program Counter.

iv) Token definitions. The types of all instruction operands (called "tokens") are described in this section. The attributes of each token (for instance the base register and integer offset for a simple addressing mode) are defined, including their size, type and assembly language output format. The types of tokens are addresses, integers and all the register properties. A "cost" value is given to each token, to inform the code generator about the space and execution time cost of using this operand. The code generator uses these costs to make the best choice between multiple token possibilities when generating instructions, and can be tuned to produce code with a desired balance between code size and execution speed. The ARM table uses a large number of different tokens, many of which are designed to utilise ARM's unusual twelve bit immediate format used with data processing instructions. The different types of shifted register operands for data processing instructions are also handled here, along with many base plus register addressing mode formats. A special token (called bigconst) has no corresponding ARM operand type; it is used to manipulate a constant that will not fit in an ARM immediate field. Three extra pseudo instructions have been added to the assembler to find the optimal way to actually move, add and subtract a bigconst to a register value (which in practice involves up to four real ARM instructions). Just which instruction operands should be defined as separate tokens is certainly not obvious: conversions and comparisons between different tokens are used later in the table, and the relationships between tokens do not become apparent until they are required.

v) Token sets. A number of different tokens can be grouped to form a set, to describe all the operands of one instruction in a compact manner: for example a load instruction will have several legal addressing modes (tokens) as one operand, and so all the legal addressing modes for a load are grouped into one set. The large number of tokens in the ARM table are grouped into just a few token sets, for constants, registers, data processing operands (which are constants, registers and shifted registers) and addressing modes.

vi) Instruction definitions. Each instruction in the target architecture is defined in this section, along with its assembly language output format and its operands (token sets). Each operand is qualified as read only, read and write, or write only, so that the code generator can tell if the value of an operand will change. Operands whose value is used to update the condition flags are marked here also. The instructions also have a cost field so the compiler can calculate the best code sequence when a choice is available. The ARM architecture has a potentially large instruction set, due to the sixteen different conditions that can be applied to each instruction - far too many combinations of instruction and condition code to be declared here. The instruction conditions are not sufficiently like operands to be simply declared as tokens. To solve

this problem new pseudo instructions have been created, called IF,

and sixteen IF instructions have been declared, one with each condition code. Extra functionality has been added to the assembler to

merge an IF statement's condition with the following instruction, to make it conditional. This technique makes some patterns a little longer, but with proper indentation, quite readable. All sixteen types

-83- of branch are defined explicitly, because each type is used frequently later. All ARM's data processing instructions had to be defined twice, once for the version that automatically sets the condition codes, the other for the ordinary version. Some special pseudo instructions have been added to manipulate the bigconst token, which is used to manipulate integers that cannot be encoded in ARM's instructions, and for integers whose size is unknown at compile time (which are all actually addresses).

vii) Move definitions. This part of the table defines the instructions needed to move data from one place to another; for instance, a load instruction will exist to load data from an address into a general register. Moves are handled specially because they enable the code generator to keep track of register contents. The ARM table has move operations to load a register with all token types declared in the token section. viii) Test definitions. Setting the processor status flags is also handled specially, as some instructions may do this automatically (as defined in the instruction definitions), so instructions that explicitly set the flags can often be avoided. The code generator remembers the value that was last used to set the status flags to decide when explicit condition code setting instructions are required. The only test rules in the ARM table test a register and a constant, or two registers. ix) Stacking rules. These rules define how to store the value of a token on the stack. The table must contain a stacking rule for every token. Tokens are usually only pushed on the stack before procedure calls.

x) Coercions. These rules define how to remove tokens from the stack, and how to convert between different token types. These rules are used to massage data of different types into the correct type for use with an instruction. All ARM's coercions are used to move tokens into registers (and to pop tokens off the stack after procedure calls). xi) Code patterns. This is the largest, and most important, section in the code generator. It describes the target instructions that should be generated for each EM instruction. There is usually a different code sequence depending on the contents of the stack (i.e. depending on the types of the instruction operands). Registers may be automatically allocated and initialised in each pattern, and any result that should be left on the stack after the instruction has executed is also defined here. The patterns in the ARM table are quite straightforward, most having only a few alternatives. Take for example the EM add integer (adi) instruction, which has the following pattern -

    pat adi $1==4
    with REG regconst
        uses reusing %1, reusing %2, ALLOC
        gen add %a,%1,%2
        yields %a
    with regconst REG
        uses reusing %1, reusing %2, ALLOC
        gen add %a,%2,%1
        yields %a
    with REG negconst
        uses reusing %1, reusing %2, ALLOC
        gen sub %a,%2, {onlyposconst,0-%1.num}
        yields %a

The first line (pat adi $1==4) declares the EM pattern to be matched (adi) and checks that the size of the argument ($1==4) is the word size. The second line (with REG regconst) is a stack contents constraint: if the top of stack contains a register, and the next stack location is a register or a constant, then this rule may be used. The third line (uses reusing %1, reusing %2, ALLOC) allocates a register of type ALLOC, and states that any registers used in the top of stack token (%1) or the next stack token (%2) may be reused for this allocation. The fourth line (gen add %a,%1,%2) defines the ARM instruction that should be generated (add register %1 to register or constant %2 and store the result in the newly allocated register %a). The fifth line defines the result to be placed back on the stack, the result register %a. The next four lines produce similar code when the operands are in the opposite order on the stack, and the last four lines generate a subtract instruction if the constant is negative. The expression {onlyposconst,0-%1.num} converts the negative constant token value into a positive constant token value.

Code patterns may match a string of EM instructions; for instance, a load followed by an increment of the load instruction's base register can be combined into an auto-incrementing addressing mode. The complex addressing modes of the ARM architecture are easily catered for here (in fact ARM's addressing modes are quite simple compared to the CISC addressing modes CGG was designed to handle). Long sequences of ARM code may also be produced by the code generator; for example, the pattern for an EM divide instruction is quite long (as shown in Figure 7) because the ARM does not have a divide instruction. This divide routine is used in two variations: as in-line code to be inserted directly into the generated code every time a divide instruction is used (for speed efficiency), and as the body of a library routine, called as a subroutine every time a divide instruction is used (for space efficiency). Each use has a different cost declaration, so the code generator chooses the most appropriate method. To divide the integer M by the integer N takes 25 cycles if N is greater than or equal to M, and (⌊log2M⌋ - ⌊log2N⌋ + 1) x 14 + 11 cycles if M is greater than N. This algorithm could be made even faster (the multiplier of 14 would become 10) by unrolling the two loops, but the code becomes so long that cache efficiency would suffer too much for this approach to be effective.

    pat dvi $1==4
    with REG REG
        uses ALLOC, ALLOC, ALLOC
        gen eor %c,%2,%1
            movs %a,%2
            ifmi . rsb %2,%2, {zeroconst,0}
            teq %1, {zeroconst,0}
            ifmi . rsb %1,%1, {zeroconst,0}
            mov %b, {onlyposconst,1}
        1:  cmp %1,%2
            ifcc . mov %1, {lslconst,%1,1}
            ifcc . mov %b, {lslconst,%b,1}
            brcc {label,1b}
            mov %a, {zeroconst,0}
        2:  cmp %2,%1
            ifcs . sub %2,%2,%1
            ifcs . add %a,%a,%b
            movs %b, {lsrconst,%b,1}
            ifne . mov %1, {lsrconst,%1,1}
            brne {label,2b}
            teq %c, {zeroconst,0}
            ifmi . rsb %a,%a, {zeroconst,0}
        yields %a

Figure 7: ARM Signed Divide Routine
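For reference, the same shift-and-subtract algorithm can be written in C; this is a minimal sketch of the unsigned core (an illustration, not code from this project - the routine in Figure 7 additionally records and reapplies the operand signs):

    #include <stdint.h>

    /* Shift-and-subtract division, mirroring the two loops of Figure 7. */
    static uint32_t udivide(uint32_t m, uint32_t n)
    {
        uint32_t bit = 1, quotient = 0;

        /* First loop: shift the divisor up until it covers the dividend. */
        while (n < m && !(n & 0x80000000u)) {
            n <<= 1;
            bit <<= 1;
        }
        /* Second loop: subtract and shift down, accumulating quotient bits. */
        while (bit != 0) {
            if (m >= n) {
                m -= n;
                quotient += bit;
            }
            n >>= 1;
            bit >>= 1;
        }
        return quotient;
    }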

Integer divisions can also be handled by converting the integer operands to Floating Point format, performing the divide in the Floating Point Unit, and converting the result back to integer. This option is only used if a Floating Point Unit is attached.

Two other files must be written to form a complete code generator, mach.h and mach.c. The former consists mainly of output routines for integers and labels in the assembly code; the latter has some short routines to generate the code for procedure calls and returns. These code sequences utilise the ARM multiple register load and save instructions to save and restore registers around procedure calls. The procedure entry sequence is

    stmdb sp, lb, sp, link
    sub   lb, sp, 12
    sub   sp, sp, num_locals + 12

The first instruction stores the Local Base, Stack Pointer and Link Register on the stack. The second instruction resets the Local Base to point to the new procedure's local variable storage, and the third instruction allocates stack space for the local variables. This sequence takes six clock cycles. The procedure return statement is just

    ldmia lb, lb, sp, pc

which reloads the Local Base, Stack Pointer and Program Counter from the values stored by the entry sequence, also taking six clock cycles. Loading the Program Counter from the stored Link Register value actually makes the subroutine return.

Global and Peephole Optimisers

The Amsterdam Compiler Kit includes a global optimiser, and a peephole optimiser. Both these optimisers operate on the EM code produced by the front-end, and are therefore almost language and machine independent. The global optimiser does need some knowledge of the target architecture, mainly for register allocation (such as the number of registers available and the space and time savings that registers can provide). The global optimiser performs all of the optimisations discussed in Chapter 4, except code hoisting, loop fusion and loop splitting [Bal85, Bal86].

The peephole optimiser replaces short sequences of EM instructions with better (either faster or shorter) sequences. The peephole optimiser is usually run twice, both before and after global optimisation: the first pass may create optimisation possibilities for the global optimiser; the second pass exploits any peephole optimisation possibilities the global optimiser creates. The peephole optimiser is very fast, so these two passes only take up a small part of the total compile time [McKe89].

The Assembler and Linker

An assembler was constructed for ARM, combining all the features of a macro assembler and a linker into one program. A separate linker was not justified for this project: multiple assembly files are accepted by the assembler, and the references between them are resolved by the two pass assembly process. The assembler was constructed using the Unix scanner and parser tools Lex and Yacc, to provide a flexible system.

The assembler interprets several pseudo instructions to declare and initialise blocks of data, to import and export symbols to and from modules, to calculate optimal sequences of add, subtract and shift instructions to replace constant multiply instructions, and to generate multiple instructions to load large constants into a register. The last two shift some operations that would normally be found in the compiler into the assembler (the reason for this functionality shift will be explained later).
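As an illustration of the constant multiply replacement, here is a minimal C sketch (an assumption for illustration only - the assembler's actual search also uses subtracts, so that a run of one bits costs a single instruction) that emits one shift-and-add per set bit of the constant, exploiting ARM's shifted second operand:

    #include <stdint.h>
    #include <stdio.h>

    /* Print an instruction sequence computing r0 = r1 * k. */
    static void emit_const_multiply(uint32_t k)
    {
        int first = 1;
        for (int i = 0; i < 32; i++) {
            if (k & (1u << i)) {
                if (first)
                    printf("mov r0, r1, lsl %d\n", i);     /* r0  = r1 << i */
                else
                    printf("add r0, r0, r1, lsl %d\n", i); /* r0 += r1 << i */
                first = 0;
            }
        }
        if (first)
            printf("mov r0, 0\n");                         /* k == 0 */
    }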

The After-Burner Optimiser

The code generator, when combined with the modified assembler, utilises most of the architectural features of ARM. By visually examining the quality of the ARM code produced for high level language statements, new code rules were added to exploit some of the unusual features of ARM; for instance, the auto-increment addressing modes are very well utilised by the compiler. The shift operations that can be applied to the second operand of a data processing instruction are declared as normal tokens (even though they denote an operation as well as an operand, this is still possible), and can therefore appear as pseudo values on the stack. This makes maximum use of this feature, in a simple and elegant manner. The code for an EM shift instruction merely pushes one of these shift tokens on to the stack, and instructions that can utilise the shifted operand just pop the token off the stack. Instructions that cannot use the shifted operand force a token coercion, which generates a move instruction to evaluate the shift operation.

The sign extension of bytes and halfwords is also optimised by the compiler: normally a signed byte load (from the stack) would produce the following code -

    ldrb reg, [sp, 4]
    mov  reg, reg, lsl 24
    mov  reg, reg, asr 24

The two shift instructions move the least significant byte to the most significant end of the register reg, and then sign extend it back to the least significant end. Two optimisations are possible here: the second move (shift) can often be combined with a following data processing instruction; the first move instruction can always be removed by exploiting a special feature of the ARM memory controller: word loads from a non-word boundary cause the loaded word to be rotated, so that the byte specified by the address is in the bottom eight bits of the word. For example, if 0x12345678 is stored in a word aligned memory location M, a register loaded from address M will yield 0x12345678; loaded from M + 1 it will yield 0x78123456; loaded from M + 2 it will yield 0x56781234; and loaded from M + 3 it will yield 0x34567812. This rotation can be utilised to remove the first shift, by accessing the data from a different memory address to immediately place it in the most significant byte of a register.
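The rotation itself is easy to express in C; this sketch (an illustration, not emulator code) reproduces the example values above:

    #include <stdint.h>

    /* ARM2/ARM3 behaviour for a word load from a non word-aligned address:
       the aligned word is rotated right by 8 bits per byte of misalignment,
       so the addressed byte lands in bits 0-7 of the result. */
    static uint32_t arm_rotated_load(uint32_t aligned_word, uint32_t addr)
    {
        unsigned rot = (addr & 3) * 8;
        if (rot == 0)
            return aligned_word;
        return (aligned_word >> rot) | (aligned_word << (32 - rot));
    }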

Only the conditional instructions are under-utilised, because the code generator generator gives no clues about the destination of a branch instruction, so conditional instructions cannot be automatically generated to replace branch instructions. The IF pseudo instruction is only useful for code sequences (like the divide routine) embedded in the code generator. Fortunately it is possible to construct a simple "after-burner" optimiser to replace branch instructions with conditional code sequences. This after-burner has been constructed as part of the assembler. The rules it uses are very simple: it can remove short forward branch instructions by replacing them with sequences of conditional instructions, and remove some compare instructions by setting the condition codes automatically in a previous instruction. It also replaces sequences of single register load and store instructions with multiple register transfer instructions, and in-lines function return sequences for functions that have more than one point of exit.

Register Allocation

ARM has a rather small register file (15 general purpose registers), where register twelve is used as the frame pointer, register thirteen is used as the stack pointer, and register fourteen is used to store procedure return addresses (the link register). The register allocator in the EM global optimiser does not make full use of the ARM registers in three situations

i) The code generator splits the register file into two fixed size partitions: the registers in the first partition are used for allocation during expression evaluation, and the registers in the second partition are used to hold register variables. This fixed partitioning means that some registers cannot be used when they are needed (in some situations all the registers reserved for one purpose are idle, and could usefully serve the other purpose). ii) Registers are not allocated across procedure calls: actual parameters are pushed onto the stack by the calling procedure and the formal parameters are removed by the called procedure. Register sized objects should be passed in registers.

iii) The information passed from the front end is based on static instruction counts. In general these are representative of the dynamic instruction counts, but they are certainly not optimal, and can be quite wrong. Register allocation based on dynamic instruction counts could provide considerably better performance.

The after-burner optimiser uses the live-dead variable analysis of the global optimiser to perform better register allocation. Two variables that are never "alive" (i.e. in use) at the same time may be placed in the same register. A variable may even exist in different registers if it is only alive at certain times. The global optimiser reads a file of machine dependent information, including how many registers can be allocated for register variables. Fooling the global optimiser's register allocator by declaring a large number of registers for register variables means that all register variable candidates are placed in registers, and the actual register allocation is delayed until assembly time. A conflict graph of the program is built at assembly time: the nodes of the graph represent registers, and an edge between two nodes exists if the two registers are alive at the same time. The register allocator uses the standard graph colouring algorithm described in [Chow84, Chow88], which in turn was based on the ideas in [Chat81, Chat82], to colour (allocate) the nodes (registers) of the conflict graph. If the graph cannot be coloured (there are not enough registers to hold all the variables at once), some values must be kept in memory ("spilled") and reloaded into registers when required. Choosing which variable(s) to spill is the most complex part of the register allocator, and is based on the heuristics described in [Bern89]. The static variable usage information produced by the front end and global optimiser is used to prioritise the register variable candidates. Because the assembler is also the linker, register allocation into library functions is also done here.
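A minimal C sketch of the colouring step (an illustration only - the real allocator orders the nodes using the spill heuristics of [Bern89] rather than taking them in sequence):

    #define MAX_VARS 256
    #define MAX_REGS 32

    /* conflict[i][j] is nonzero when variables i and j are alive at the
       same time; colour[i] receives a register number, or -1 to spill. */
    static void colour_graph(int nvars, const char conflict[][MAX_VARS],
                             int nregs, int colour[])
    {
        for (int i = 0; i < nvars; i++) {
            char used[MAX_REGS] = {0};     /* colours taken by neighbours */
            for (int j = 0; j < i; j++)
                if (conflict[i][j] && colour[j] >= 0)
                    used[colour[j]] = 1;
            colour[i] = -1;                /* spill unless a colour is free */
            for (int c = 0; c < nregs; c++)
                if (!used[c]) { colour[i] = c; break; }
        }
    }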

The register allocator can also add profiling information to the resulting object code, to record dynamic information on variable usage [Wall86]. The number of times each register variable candidate is accessed is automatically recorded (and saved into a file) when the compiled program is executed, and can then be fed back into the register allocator to provide priority information for the variables that are candidates for register allocation. Those with the highest dynamic frequency have the highest priority for register allocation. The profiler output consists of the dynamic references made to each procedure, global variable and local variable (including parameters). This register allocation is fast, and makes very effective use of the small register file.
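A sketch of the instrumentation (an assumption for illustration; the counter layout and output file name are hypothetical):

    #include <stdio.h>

    #define NCANDIDATES 64

    /* One counter per register variable candidate; the instrumented code
       increments the counter at each access to candidate 'id'. */
    static unsigned long use_count[NCANDIDATES];

    void profile_access(int id) { use_count[id]++; }

    /* Dumped at program exit and fed back as allocation priorities. */
    void profile_dump(void)
    {
        FILE *f = fopen("profile.out", "w");
        for (int id = 0; id < NCANDIDATES; id++)
            fprintf(f, "%d %lu\n", id, use_count[id]);
        fclose(f);
    }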

A twin pass disassembler has also been written in C. The first pass marks all branch and jump destinations in the program, so they can be labelled when they appear in the output; the second pass actually disassembles the object code. This tool was used to inspect the code produced by the Acorn compiler, to get some initial insight into the usefulness of several ARM architectural features, and to inspect the optimised code produced by the assembler.

The compiler, optimisers, assembler and linker took ten months to construct, debug and optimise, and together total 13,408 lines of source code.

Compiler Validation

ACK has two methods for ensuring the correctness of compiler back-ends. The first is a collection of intermediate code sequences, at least one for each EM instruction, which test the correctness of the assembly code produced; when writing the back-end, new code rules are added in the same order as the EM instructions tested in this suite. When the compiler generates correct code for each EM instruction, the second validation system (a collection of C programs) is used to test the overall robustness of the compiled code for entire programs. Some code rules in the ARM table had to be tested with contrived examples, because they were only used in rather extraordinary circumstances, for instance the rules for handling procedures with large stack frames (greater than the 4 KiloByte offset addressing range of ARM's load and store instructions).

Chapter 6 Evaluating an Architecture

Many tools are required to evaluate the suitability of an architecture as a target for compiled code. An implementation of the architecture (with an Operating System to aid program development) to execute compiled code is essential. A compiler, optimisers, an assembler and a linker are required to produce high quality code that will exploit the architecture's qualities. A performance monitor to record information about architectural feature utilisation, as programs are executed, is also required. Statistical analysis programs, to summarize the vast amount of data produced by the performance monitor, present the information necessary for a detailed architectural evaluation. The hardest "tool" to build is a reasonable subset of programs that will represent the type of code that will be compiled for the architecture.

All of these tools have been successfully constructed, and work together to form an architectural evaluator for the Acorn RISC Machine.

Architectural Features

The architectural features of ARM that are worthy of detailed study because of the effect they may have on performance, are -

i) Instruction usage. ARM has a quite large instruction set by RISC standards, especially for comparison operations. For maximum performance all instructions should be well utilised, and repeated sequences of any instructions should be quite rare (as this would imply missing instructions). Load, store and branch instructions require special attention, to estimate the performance loss due to the lack of delay slots into which instructions could be scheduled. The effectiveness of the parallel shift operations, conditional instruction execution and status flag setting must also be measured. ii) Memory accessing instructions. ARM also has a rich set of addressing modes by RISC standards: all should be utilised, and new addresses formed using instruction sequences should be rare (as this would imply missing addressing modes). The utilisation of the multiple register transfer instruction, and the average number of registers transferred, should also be measured. The lack of 16 bit (halfword) load and store instructions, and the lack of a sign extension instruction, should also be investigated. iii) Register usage. ARM's register file is small by RISC standards; any performance loss due to this should be recorded. iv) Memory interface. The extra memory bandwidth gained by utilising the sequential access speeds of DRAM should be measured. A low frequency of branch and jump instructions, and the presence of multiple register transfer instructions, will help utilise the fast DRAM modes. The effect these features have when ARM is connected to an SRAM system should also be investigated.

v) Cache effectiveness. The efficiency of the ARM3 cache must be studied, by measuring the hit rate over a variety of algorithms, to determine whether its design decisions are effective. The performance loss due to the absence of a write buffer should also be measured.

To measure the usefulness of each of these features for programs compiled from a high level language, programs (or program segments) must be compiled and inspected by eye to ensure that good quality code is being produced for each statement and data type in the high level language. If and when a compiler can produce code that can utilise each of the above features, then the usefulness of each feature can be quantified by comparing the execution speed of whole programs using the feature, to the execution speed of the same programs when not using the feature.

Measuring the Quality of an Architecture

There are four distinct ways of measuring the usefulness of these architectural features -

i) Visual inspection. The execution of each instruction in a program could be interpreted by hand, and the results recorded, but for the vast amounts of code produced by compiling programs this approach would be too long, error prone and tedious. Furthermore recording information about a complex component, like the state of the cache, by hand would be far too complex to be feasible.

ii) Software emulation. By constructing a software emulator for the ARM architecture to create a virtual machine [McKe87], the recording of feature utilisation could be done automatically. Considerably more code could be evaluated using this method than could be managed by hand, making this approach very attractive. The simplicity of ARM makes this approach quite feasible, but the performance of a processor only executing a single program is quite different to the performance of a real computer executing the same program, due to the performance overhead of operating system duties (as described in Chapter 4) present in a real computer. iii) Profiling. Profiling code could be generated by the compiler, or added by a separate program, to record feature utilisation as programs execute on real hardware. The PIXIE program from MIPS Computers uses this approach to report feature utilisation in their architectures [MIPS88]. Unfortunately this approach affects memory and cache performance by altering the memory access patterns, and can become quite slow as each instruction's characteristics are recorded. iv) Hardware performance monitor. A hardware performance monitor (which usually consists of a completely separate computer) could be coupled to an existing ARM based computer to record the required information as code was actually executed. The amount of work in (and the high price of) designing and building such a monitor prohibits this approach.

Clearly the second option is the most practicable, and will provide all the information required about the usefulness of the architectural characteristics described above that make ARM unique.

Architectural Emulation

An architectural emulator has been constructed; it is written in C and has about 800 lines of source code. At the level required here, an architectural emulator is basically an interpreter of machine code, with the state of the processor held in program variables, and the state of the memory held in a large array. The most significant design aspects of the emulator are -

i) Instruction Decoder. The instructions are decoded by the emulator using a two stage method similar to that used by the real implementations (see the first sketch following this list). Bits 28 to 31 of each instruction contain its condition setting; these are checked first to see if the instruction should execute at all. If the instruction does execute, bits 4, 7 and 24 to 27 are used to separate instructions into ten families: data processing, multiply, branch, load/store, swap, load/store multiple, software interrupt and the three co-processor families. Each family of instructions uses the remaining bits in the instruction to achieve the desired result. ii) Fake instruction pipeline. The decode stage of the ARM instruction pipeline is not really implemented: the instructions are actually decoded and executed in the third stage. The first and second stages of the pipeline consist of two unsigned integers, containing the complete instruction word as loaded from the memory array. The pipeline only exists to access memory in the same way as a real implementation, to ensure the correctness of cache accesses. iii) No functional units. The operations provided by ARM's functional units are simulated using C code. The data processing (arithmetic) instructions and shift operations are all expressible with standard C code, and the rotate operation utilises a simple expression. Most of the code is used to extract or replace flag values (for instance the carry flag) from register 15 if required by the instruction. Various special conditions, such as the program counter value being altered (to reload the pipeline from the new address), are handled by flags, and the appropriate actions taken at the end of the execution stage. iv) Load and store instructions. The code to emulate the single register load and store instructions uses about one sixth of the total code, due to the requirements of emulating the extensive addressing modes. Although a hardware implementation of ARM would calculate addressing modes using the ALU (with simple add and subtract instructions combined with the barrel shifter), this section of code does not reuse any code used for the data processing instructions, because the subtle differences in application would require too much checking, resulting in unnecessary overhead. v) Fake cache system. Two procedures are used to access memory, one to load, the other to store. For reasons of speed the cache emulation is completely fake (see the second sketch following this list). The cache is represented by a bit array; each bit indicates whether the corresponding memory location in the memory array is currently cached. This shortens the access time to cached memory locations, as in a hardware cache, because really keeping cached data in a (separate) cache memory would require the cache memory to be searched for each memory access. Cache misses require a small amount of code, and the use of some further indexing arrays, to find a new location for new data. vi) Approximate Floating Point emulation. Two Floating Point Units have been developed by Acorn: the first was based on an ATT WE32206, and suffered from quite poor performance; the second is still under development, but should offer performance comparable to competitors' products. The implementation details have not been released, so for this performance study the MIPS R6010 Floating Point Accelerator instruction set and instruction timings have been used, as it is likely that the new Acorn FPU will have similar performance. The instruction format has been altered to be compatible with the ARM co-processor instructions. The instruction set and instruction format are shown in Appendix C. vii) Real Time Operating System. The Operating System library calls that a program makes are passed to the Operating System that is running the emulator via simple dummy routines in the emulator. This allows the performance characteristics of only the program being emulated to be recorded; OS library routines can be tested by compiling them to replace the dummy routines. This feature increases the emulator performance, and avoids the construction of complete OS libraries. viii) Extensive event recording. A large number of global variables are used as counters to record the usage frequency information for each attribute of the architecture. The usage of every instruction type and addressing mode, the number of cache hits and misses, and the usage of individual registers are just some of the statistics required to provide a detailed program breakdown, whose simplest format is shown in Figure 8. This format is designed to be both inspected visually and also digested by statistical analysis programs to produce summarized data and graphical results.
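The two stage decode maps naturally on to C; this sketch is an illustration only (the emulator's exact tests are not reproduced here, a full decoder must check all sixteen condition codes, and the family selection below simplifies the bit 4 and bit 7 tests to the whole of bits 4 to 7):

    #include <stdint.h>

    typedef enum { DATA_PROC, MULTIPLY, BRANCH, LOAD_STORE, SWAP,
                   LOAD_STORE_MULTIPLE, SWI, COPROC } family_t;

    /* Stage one: test the condition field in bits 28-31 (two of the
       sixteen conditions shown; AL always executes, NV never does). */
    static int condition_passes(uint32_t instr, int n, int z, int c, int v)
    {
        switch (instr >> 28) {
        case 0x0: return z;            /* EQ */
        case 0x1: return !z;           /* NE */
        /* ... the remaining twelve conditions ... */
        case 0xE: return 1;            /* AL */
        default:  return 0;            /* NV */
        }
    }

    /* Stage two: select the instruction family from bits 24-27, using
       bits 4-7 to separate multiply and swap from data processing. */
    static family_t classify(uint32_t instr)
    {
        uint32_t op = (instr >> 24) & 0xF;
        int mul_swp = ((instr >> 4) & 0xF) == 0x9;   /* bits 4-7 == 1001 */

        if (op <= 0x1 && mul_swp) return op ? SWAP : MULTIPLY;
        if (op <= 0x3)            return DATA_PROC;
        if (op <= 0x7)            return LOAD_STORE;
        if (op <= 0x9)            return LOAD_STORE_MULTIPLE;
        if (op <= 0xB)            return BRANCH;
        if (op == 0xF)            return SWI;
        return COPROC;                               /* 0xC to 0xE */
    }

The bit-array cache can be sketched in the same spirit (again an illustration; the victim selection through the further indexing arrays is elided, and the memory size is an arbitrary assumption):

    #define MEM_WORDS (1u << 20)

    static uint32_t memory[MEM_WORDS];
    static uint8_t  cached[MEM_WORDS / 8];   /* one bit per memory word */
    static unsigned long read_hits, read_misses;

    static uint32_t load_word(uint32_t addr)
    {
        uint32_t w = (addr >> 2) % MEM_WORDS;
        if (cached[w >> 3] & (1u << (w & 7))) {
            read_hits++;
        } else {
            read_misses++;
            /* ...pick a victim via the indexing arrays, clear its bit... */
            cached[w >> 3] |= 1u << (w & 7);
        }
        return memory[w];
    }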

Emulator Validation and Performance

The correctness of the emulator was verified by writing an ARM assembly language program containing statements to utilise every section of emulator code. This program was run on an ARM based microcomputer and on the ARM emulator, and the results compared. The only difference found was when the value of the Program Counter was saved to memory: on a real ARM the value had been advanced by one instruction before it was saved, while the emulator saved the un-incremented value. Special case code has not been added to reproduce this feature, as it is unlikely any code will rely on it for proper operation. A second program provides an operating system shell to allow programs to be loaded and saved to and from the emulator "memory". Extensive program tracing, breakpointing, debugging and execution reporting facilities make the emulator environment both functional and usable.

    +- Instructions executed 950
    |  Cycles  I=876  S=3120  N=1040  Total=5036
    +- Cache usage
    |  Read hits=857  Read misses=242  Write hits=0  Write misses=278
    +- Register usage
    |  0=8   1=12  2=17  3=5   4=449  5=25  6=2   7=2
    |  8=0   9=0   a=15  b=28  c=249  d=0   e=22  f=84
    +- Condition code usage
    |  EQ=44  NE=49  CS=44  CC=44
    |  MI=44  PL=44  VS=44  VC=44
    |  HI=44  LS=44  GE=44  LT=44
    |  GT=44  LE=44  AL=285 NV=44
    +- Instruction usage
    |  AND=1  OR=2   SUB=7  ADD=12  ADC=7    SBC=1
    |  TST=6  TEQ=34 CMP=8  ORR=2   MOV=217  BIC=2

Figure 8: Emulator Execution Breakdown

The emulator has a real time performance of between 20,000 and 200,000 instructions per second, with an average of about 100,000 instructions per second, when executed under SunOS Unix on a Sun Microsystems SPARCserver 390. This performance has been high enough for successful execution of the large number of quite complex algorithms required for the architecture evaluation. The emulator and combined data collection software took two months to write and contains 4036 lines of source code (the emulator itself contains just 800 lines).

Chapter 7 The Quality of the ARM Architecture

An optimising compiler and an architecture emulation tool have been constructed to measure the quality of the ARM architecture. The ability of the compiler to produce good ARM code must be confirmed before the architecture is evaluated as a target for compiled code. Twenty-two utility programs from the UNIX Operating System, as shown in Table 5, were chosen as benchmark programs for the compiler and architecture. Ten of the programs are quite small (fewer than 400 lines of source code) and would therefore be expected to have a higher than average cache hit rate; the remaining twelve programs are quite large (greater than 800 lines of source code), and would be expected to have lower cache hit rates. In total 2,743,700,000 emulator clock cycles were required to execute these programs, taking over sixty hours of real time.

    Program    Lines of source   Description
    cal        223               creates a calendar for a month and year
    cat        270               concatenates files
    cb         1205              C source code beautifier
    cmp        265               file comparison program
    compress   1509              file compression program using LZC algorithm
    csh        13004             UNIX command interpreter
    diff       2240              find differences in two files
    eqn        2461              mathematical typesetter
    lex        3297              lexical analyser program generator
    od         886               octal dump program
    sed        1664              data stream editor
    sort       1392              sorter and collator
    strings    140               find text strings in a binary file
    sum        51                calculate a check-sum for a file
    tbl        2537              table formatter
    tee        118               output replication program
    uniq       145               remove or report duplicate lines in a file
    unpack     395               Huffman decompression program
    wc         108               character, word and line counter
    write      247               write a message to another user's screen

Table 5: Architectural Benchmarks

The programs were chosen to give a mix of C operations, such as character, integer, string, array and record manipulation, control structures, and procedure calling, characteristic of real world programs. The large programs are also used to ensure that the cache effectiveness is properly measured; small programs tend to stay in the cache indefinitely, causing biased results. Synthetic benchmarks have not been used for three reasons - i) they do not test the cache miss rate, because they are too small. ii) they are not necessarily representative of real world programs, despite the claims made by their authors, and even if representative benchmarks can be found it is difficult to balance their execution times so that results are not biased towards programs which represent an insignificant proportion of real world programs. iii) there are not enough of them to form a large suite for an architectural evaluation.

The C programs here should be representative of real world code, because they are all real world programs. Admittedly some of the programs are not commonly used (such as "eqn", "sum", "tbl" and "unpack") but an architecture should be a good target for all code, not just the most common code, so their inclusion is justified.

Compiler Performance

Initial tests were done to establish the effectiveness of the ACK compiler compared to the Acorn compiler (version 3.31), by emulating the code produced by both. It was hoped the ACK compiler would produce significantly better (faster) code, due to the powerful optimisers that have been used. Figure 9 compares the code produced by the ACK compiler to the code produced by the Acorn compiler for the three main instruction classes: ALU, Load/Store and Branch/Jump. The Acorn compiler has one optimisation switch; the first two columns show the instruction distribution with and without optimisation. The optimised Acorn figures are percentages of total instructions; all other measures are relative to these. The ACK compiler has many separate optimisation passes, controlled by 14 switches. The figures for this compiler are split into six separate measures: no optimisation; Peephole optimisation; Peephole and Global optimisation; Peephole and Target (after-burner) optimisation; Peephole, Global and Target optimisation; and finally all optimisers including profiled feedback to the register allocator.

[Figure 9 (bar chart): instruction counts for the ALU, Load/Store and Branch/Jump classes and in total, for each compiler and optimiser configuration, relative to the optimised Acorn code.]

Figure 9: Acorn and ACK Compiler Performance

The performance of the ACK compiler and optimisers can be compared to the performance of the Acorn compiler -

i) with no optimisation, the code produced is 36% slower than the optimised Acorn code. ii) the peephole optimiser improves the performance by a further 12% (making the code 22% slower than the optimised code produced by the Acorn compiler). iii) the global optimiser increases the performance by a further 13% over the peephole optimiser, making the code 8% slower than the optimised Acorn code. This speed improvement is largely due to the register allocation stage of the optimiser (the load/store column has the largest reduction). iv) the target (after-burner) optimiser and the peephole optimiser together improve the performance of the un-optimised ACK code by 38%, making the code 1% faster than the optimised Acorn code. The peephole and target optimisers are used by default, as they are very fast and have little impact on the compile time performance of the compiler. v) with the peephole, target and global optimisers engaged, the code produced is 46% faster than the un-optimised ACK code and 7% faster than the optimised Acorn code. vi) profiled register allocation increases the performance by a further 3.5%, resulting in the ACK compiler with all optimisations engaged producing code over 10% faster than the code produced by the Acorn compiler with full optimisation.

The major performance loss of the un-optimised ACK code is the 17% caused by the lack of register allocation. When combined, the peephole and target optimisers produce code very similar to the optimised Acorn code, in both the instructions generated and the execution time. The compile time of the ACK kit (with the peephole and target optimisers engaged) is better than that of the Acorn compiler with optimisation, but worse than that of the Acorn compiler without optimisation (both compilers were compared on an Acorn Unix Workstation).

Architecture Performance

The effectiveness of the ARM architecture as a compiler target was judged by measuring the usefulness of the architectural features. Compiling the programs listed above, and executing each one with the emulator to record feature utilisation, has provided all the results required. The utilisation of many of ARM's features has been investigated, and these investigations can be grouped into four sections -

i) Instruction Usage. The usage of every instruction has been recorded, and the frequency of many pairs of instructions, to uncover evidence of wasted or missing instructions. Data Processing instructions which only reference two registers (because they replace one operand with the result), Data Processing instructions which avoid explicit compare instructions by setting the condition flags, the distribution of immediate (constant) values and instructions which use the barrel shifter to replace explicit shift instructions were all measured and the results interpreted.

ii) Branch and Conditional Instructions. The number of instructions conditionally executed was recorded, including the number of instructions in loops and the number of instructions in IF statements. The usefulness of a single cycle compare and branch instruction was also investigated.

iii) Memory Accessing Instructions. The usefulness of the ARM addressing modes was recorded, including the ability of the register allocator to avoid memory references by holding frequently used values in registers. The effectiveness of the multiple load and store instructions was also measured, including the total proportion of memory references for which they are responsible. iv) Cache Performance. The performance of the standard ARM3 cache was recorded, and the effectiveness of modifying the cache in several ways was studied.

All measurements were made on the code produced by the ACK compiler, with all optimisation stages engaged and profiled feedback to the register allocator enabled. Measurement of feature usefulness was made by comparing the speed of execution (using the emulator) of programs both utilising and not utilising each feature. All results are based on the raw data reported by the emulator, and thus are exact for the C programs tested; this underlines the importance of choosing sample C programs that represent real world applications, so that the results presented here will reflect the performance of the ARM architecture when executing real world applications. The cache was only enabled for the measurements of its performance; all other measurements are for a standard ARM2.

Instruction Usage

The instructions executed have been grouped into several similar sections: ALU arithmetic (such as add and subtract), move, compare (including compare negative), ALU bitwise instructions (such as AND and MVN), load, store, branch and branch with link (subroutine branch). The relative frequency of each of the instruction classes (as percentages of all instructions) is shown in Figure 10.

Several aspects of this distribution require further explanation. The high proportion of move instructions is caused by two factors: firstly, explicit shift instructions are based upon a move instruction, and account for about 3% of all instructions; secondly, the register allocator replaces some load instructions with move instructions to pass arguments in registers (formal parameters that reside in registers are moved to the actual parameters before the procedure call rather than being loaded from the stack after the procedure call). This second case also accounts for around 3% of instructions, reducing the actual number of computational move instructions to around 11%. The load word frequency is rather high, and about 5% of loads and 18% of stores are caused by register spilling in the register allocator. The twelve available registers are usually enough to hold all live variables and temporary values, but on about 3% of occasions three more registers could be effectively utilised by the register allocator (3% of all instructions were used to spill and reload registers over short instruction sequences), and as many as eight more could sometimes be utilised to hold values.

[Figure 10 (bar chart): relative usage of each instruction class, as a percentage of all instructions.]

Figure 10: Relative Instruction Usage

The bitwise operators seem of little use in C code, but the figure is rather misleading. The programs "sum", "unpack", "od" and "compress" are responsible for almost all bitwise operations in the twenty-two programs (bit operations account for about 8% of the instructions in these programs), so their inclusion in the instruction set is necessary. Furthermore these instructions may be more prominent in programming languages like Pascal, where they would be used for operations on sets and arrays of boolean values.

A complete breakdown of the data processing instruction usage is shown in Figure 11 (as percentages of all instructions). Again the move column includes around 3% of instructions which are explicit shift instructions, and another 3% which are part of the procedure calling sequence, and the proportion of bitwise instructions is low over all programs but rather high in some programs. Several instructions are not used by the compiler: Add with Carry, Subtract with Carry and Reverse Subtract with Carry are not used by C, as the language does not deal with integers greater than 32 bits. In languages like Lisp, with arbitrary precision integers, these instructions would be useful. Setting the condition codes from the result of a data processing instruction is not common, at around 1%, although again it would be invaluable for carry propagation when performing arithmetic on integers larger than 32 bits. The two bit testing instructions (Test, and Test Equivalence) are not used by the compiler - again these instructions would be most useful in languages with sets and arrays of type Boolean, such as Pascal. Lastly the Multiply with Accumulate instruction is never used: the C compiler can produce this instruction if it finds a Multiply followed by an Add, but this sequence was never executed in the programs compiled.

[Figure 11 (bar chart): usage of each data processing instruction, as a percentage of all instructions.]

Figure 11: Data Processing Instructions

[Figure 12 (bar chart): frequencies of data processing operand types, as percentages of all data processing instructions.]

Figure 12: Data Processing Operands

Figure 12 illustrates some specific aspects of data processing instruction operands (as percentages of all data processing instructions). The compiler can remove 60% of explicit shift instructions by utilising the barrel shifter to modify the second operand of data processing instructions. Most of these shifts were applied to add and reverse subtract instructions, because it is these instructions (along with move) that are used to evaluate multiplications by constant values (and to calculate addresses for elements in two dimensional arrays). Of the remaining 40% (combined with move instructions), about half were used for constant multiplications, and the rest are unavoidable because of code sequences (in C) such as

    a = a << 1;    /* left shift 'a' by 1 bit */

inside a loop, which force the register containing "a" to be explicitly shifted, or which are used to sign-extend bytes or halfwords. On average the barrel shifter increases the execution speed of programs by 2.5%, with a maximum of 14% for the program "sum".

Figure 12 also indicates that a two operand instruction format (where the value in one operand register would be replaced with the result value) would only be used in 12.79% of data processing instructions; the more general three operand format is therefore useful in over 87% of data processing instructions, and completely justified.

The percentage of immediate values which could not be encoded as part of a data processing instruction (using the eight bit, rotated operand format) is also shown in Figure 12. The low figure of 0.14% is due to the compiler allocating a register to hold constants that cannot be encoded as an immediate operand, thus lowering their dynamic frequency.

Figure 13 shows the distribution of immediate operand values used in data processing instructions (as percentages of all data processing immediate operands). The value for negative one is caused by the compiler using the compare negative instruction (with one as the operand) and the move not instruction (with zero as the operand). Altering the immediate field to hold a simple twelve bit constant would degrade the performance of C code by 2%, mainly because the current rotated immediate operands are useful for loading the addresses of global variables and data structures.

[Figure 13 (bar chart): percentage of all data processing immediate operands for each immediate value.]

Figure 13: Data Processing Immediate Operands

Branch and Conditional Instruction Utilisation

Figure 14 shows the distribution of conditional non-branch instructions (as percentages of all instructions). Nearly 11% of instructions were conditional, and the condition failed in 58% of these instructions. The average number of conditional instructions in sequence is 1.4. If conditional instructions are not utilised, the code size increases by 8%

because of the extra branch instructions required, and execution time increases by 25% (because taken branches are more frequent and take more cycles than non-taken branches). On top of this saving, a further 2% of instructions were conditional procedure calls (one quarter of all procedure calls), another feature of the ARM architecture. If the four bits used to hold the condition in each instruction are not more useful for some other feature (which is unlikely considering the large performance increase) then conditional execution of all instructions is a useful feature.

Forward branches (used for IF statements) are nearly 4 times more frequent than backward branches (used for looping constructs). On average 2.9 instructions were guarded by an IF branch instruction, which is rather high because nearly all branches around one or two instructions are removed and replaced with a conditional instruction sequence (not all

short branches are removed, because some are required in complex boolean expressions to branch around a second compare and branch). On average just thirty-two instructions were executed between procedure calls, underlining the need for efficient procedure entry and exit mechanisms.

[Figure 14 (bar chart): percentage of all instructions conditionally executed, for each condition type.]

Figure 14: Conditional Non-Branch Instructions

Almost all conditional branch instructions branch around fewer than one thousand instructions, and the proportion of conditional branch instructions that are preceded by compare instructions is over 95%. If a single cycle compare and branch instruction could be implemented, a 10% performance increase could be made, due to the vast reduction in the number of compare instructions used before branches. The feasibility of such an instruction is discussed later.

- 117- Nearly 70% of subroutine branches were made to Operating System libraries, underlining the importance of register allocation at link time. Modern UNIX systems are tending towards shared Operating System libraries, where many processes share the same library code, so this code dictates the register usage in programs which call these libraries.

Memory Accessing Instructions

The frequency of different addressing modes is shown in Figure 15 (as percentages of all memory accessing instructions), including the frequency of the load and store multiple register instructions. The scaled index and the auto increment and auto decrement addressing modes are well utilised by the compiler, and together provide a 7% performance increase by saving shift instructions, addition/subtraction instructions, or both. The twelve bit immediate offset field caters for all immediate offsets; an eight bit rotated immediate operand (as in the data processing instructions) would not be beneficial. Load instructions are responsible for 73% of single register memory access instructions, stores for the remaining 27%. This distribution is the same across all addressing modes.

On average 6.3 registers are transferred by each multiple register transfer instruction, making these instructions responsible for over fifty percent of the total memory traffic. Most of them (97%) use the decrement-before (DB) addressing mode to access the procedure call stack; the rest replace sequential single register memory accesses. Fifty-three percent of multiple register memory accesses were loads. These instructions are responsible for a 34% performance increase.

[Figure 15 (bar chart): percentage of all memory accesses for each addressing mode.]

Figure 15: Addressing Modes

Cache Effectiveness

The ARM3 cache was designed to be completely transparent to all code, so that the ARM3 processor could be a plug in replacement for the ARM2 processor (although the two processors are not pin compatible); a small Operating System patch is all that is required to enable the cache, and to flush it when a context switch occurs. This requirement forced rather unusual design parameters, most notably the absence of a write buffer: the CPU synchronises with external memory for all write operations. A write buffer would require that a new exception handling mechanism be constructed, as any buffered write operation could cause an exception (through a bad address or a page fault). The four kilobyte cache has a 92.11% hit rate for the programs tested.

Improving the ARM architecture

Together the unusual architectural features of ARM increase performance by 73.5% compared to a more traditional RISC architecture like MIPS. The most significant performance losses are caused by the lack of delayed loads, the lack of delayed branches and the high proportion of branch instructions that are preceded by compare instructions.

The ARM2 architecture cannot accommodate delayed loads, because of the von Neumann memory architecture, which cannot deliver a new instruction and a data word in the same cycle. Delayed branches were not added to the architecture due to the complications they would make to ARM's simple and elegant architecture. By altering the ARM3 architecture (and therefore the emulator, compiler and optimisers) the effect these changes have on performance has been measured.

The absence of delayed load instructions is a result of the strict von Neumann memory system, which cannot deliver a new instruction (to keep the pipeline full) and a word of data (loaded by a load instruction) in the same cycle, causing a load delay of one cycle. This delay is also incurred by the store instructions. The complex addressing modes utilise this wasted cycle to provide a 7% performance increase. The multiple register transfer instructions ensure consecutive load or store instructions to consecutive memory addresses only incur one pipeline delay (rather than one for each word transferred), to provide a 34% performance increase.

Altering the ARM3 architecture to accommodate delayed loads is possible, but requires several changes to the architecture and additions to the software - i) adoption of a Harvard memory architecture, with both the address and data buses connected to separate caches. Both the instruction and data caches have 512 word entries (2 KiloBytes). ii) lengthening of the execution time of the auto-indexing addressing modes to two cycles, to allow time for the base register modification. This type of load would not require a load delay slot, as a new instruction is not required in the second cycle, so the data can be loaded then. iii) adding another pass to the assembler to schedule instructions for the load delay slot, by looking either before the load for an instruction which can be shifted into the delay slot, or after the load for an instruction which does not depend on the loaded value.

Implementing delayed loads increased performance by 14.8%. The multiple register transfer instructions are of little use in this modified architecture, providing less than 1% more performance, and could be removed. The multiple register transfer instructions are the main reason that the architecture has just sixteen registers, where thirty-two would be more useful. Unfortunately, encoding three five-bit register numbers in a data processing instruction would require the removal of the condition field, which would imply a 25% performance loss, outweighing the 3% performance increase the extra registers could be expected to provide.

Delayed branches were not included in the ARM design because of the complexity they add to the exception handling mechanism (two Program Counters are required to record the CPU state, because an exception can occur in both the branch delay slot and the branch destination at the same time), and adding a register for the second program counter, and special instructions to access it, is rather messy. There is a simple way to add the second Program Counter to the ARM3 architecture however, as one of the cache controller registers (registers 9 to 16 are currently unused). Adding delayed branches to the architecture increased performance by 12.3%, and reduced the performance increase of conditional instructions to about 4% (because the delay slots after branches around one or two instructions are very easy to fill), which makes their removal in favour of thirty-two registers much more controversial.

Both the above architecture changes (delayed loads and delayed branches) make the resulting architecture incompatible with earlier ARM architecture. A third architectural change improves the performance of the ARM architecture and maintains full compatibility with the earlier architectures. Adding a general purpose, single cycle compare and branch instruction to the ARM architecture would increase performance by 10.34%, because most (95%) of branch instructions are preceded by a compare instruction. It is not possible to fit all the information necessary into 32 bits for an instruction to replace all compare and branch instructions, but using one of the undefined instruction formats it is possible to add an instruction with two registers or a register and a four bit integer, a (second) four bit compare condition field, and an 1024 instruction offset (1024 instructions could be branched both forwards and backwards). Adding this less general instruction improved performance 9.84%. A problem with this instruction is the extra hardware required for

a second subtract unit to calculate the compare result (as the shifter and the main ALU would be required for the branch target calculation).
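A sketch of one possible packing into the first undefined format of Appendix A (bits 27-25 = 011, bit 4 = 1); every field position below is an illustrative assumption, not part of the architecture:

  #include <stdint.h>

  /* Illustrative compare-and-branch encoding:
     31-28 Cond | 27-25 011 | 24-21 branch condition | 20 imm flag |
     19-16 Rn | 15-5 signed word offset (+/-1024 instructions) |
     4 1 | 3-0 Rm or 4 bit immediate */
  static uint32_t encode_cbr(uint32_t cond, uint32_t bcond, uint32_t imm,
                             uint32_t rn, uint32_t rm_or_imm4, int32_t offset)
  {
      return (cond  << 28) |
             (0x3u  << 25) |                       /* 011: undefined format  */
             (bcond << 21) |                       /* second condition field */
             (imm   << 20) |                       /* register or immediate  */
             (rn    << 16) |
             (((uint32_t)offset & 0x7FFu) << 5) |  /* 11 bit signed offset   */
             (1u    << 4)  |                       /* bit 4 = 1 per format   */
             (rm_or_imm4 & 0xFu);
  }

Packed this way, the 4 + 3 + 4 + 1 + 4 + 11 + 1 + 4 = 32 bits are exactly exhausted, which illustrates why a fully general replacement for every compare and branch pair cannot fit.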

It is possible to restrict the type of branch to just equal and not-equal, which do not require a full carry propagating subtraction unit for the comparison, and the branch offset can then be increased to 8192 instructions. This option increased performance by 8.72%, and does not require as much extra hardware: only changes to the instruction field extraction unit, to extract the branch offset and send it directly to the ALU, and to the ALU buses, to support the two calculations at once. The existing register read ports can be used for the compare, and the existing ALU for the branch calculation. The assembly mnemonic is also rather strange due to the two condition fields (one for the entire instruction, the other for the branch), but this instruction is probably best produced automatically by the assembler anyway, as only at this stage is the magnitude of a branch offset known, to decide whether the instruction can be utilised. This feature was probably not included in the ARM architecture because it does not follow the RISC discipline: there would be two ways to perform some tasks, complicating the architecture.
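The hardware saving is easy to see in C terms: equality needs no carry chain, while an ordered compare does. A minimal illustration:

  #include <stdbool.h>
  #include <stdint.h>

  /* Equal/not-equal: each bit pair is XORed independently and the
     results merged with a zero detect (a NOR tree); nothing propagates. */
  static bool branch_eq(uint32_t a, uint32_t b)
  {
      return (a ^ b) == 0;
  }

  /* An ordered compare needs the borrow of a full subtraction, i.e. a
     carry chain across all 32 bits (overflow handling omitted here). */
  static bool branch_lt(int32_t a, int32_t b)
  {
      return (int32_t)((uint32_t)a - (uint32_t)b) < 0;  /* simplified */
  }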

As the clock rate of the ARM architecture increases, it is likely that the execution stage of the instruction pipeline will take longer than either the instruction fetch or the instruction decode stages. The barrel shifter lengthens the critical CPU data path by 15%. The execution stage of the pipeline could therefore be shortened (by 15%) by removing the barrel shifter from the main data path, and making instructions which utilise the shifter take an extra cycle (except branch instructions, which need a two bit left shift for the branch offset; this could be accommodated in one cycle). Shifts whose magnitude is held in a register (rather than an immediate constant) already take an extra cycle, because three register reads are

required for such an instruction, and the register file has only two read ports. The instructions which would require an extra cycle are all data processing instructions that have shift operations in the second operand, and all load and store instructions that use a scaled addressing mode. Adding an extra cycle to these instructions resulted in a 24% performance loss, which is more than would be gained by shortening the data path.
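One first-order reading of these figures, treating the 24% as extra cycles and the 15% as a shorter cycle:

\[ 0.85 \times 1.24 \approx 1.05, \]

so the modified machine would still be roughly 5% slower overall, and the shifter stays in the data path.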

The performance of the cache is shown in Figure 16, in comparison with other possible strategies. Each column represents the percentage of memory bandwidth used compared to an un-cached system, which is a better measure of performance increase than the cache hit rate alone, as it includes the performance effect of write-back policies. The cache on the ARM3 processor uses a rather modest 272,900 transistors; a cache twice (8 KiloBytes) or even four times (16 KiloBytes) this size could be constructed on the CPU chip using the same level of integration used for the Intel

20 "0 Q) !II 18 :::J ..c.... 16 "0 'i 14 "0 c: mCIS 12 ~ 10 0 E Q) 8 ::iE c: 6 '(Q ::iE 4 0 - 2 'fP. 0 Q) ..... IJ) IJ) ..c ~ Q) Q) ~ ~ ~ (.) ...... >. >. ~ ~ ~ ro :::J (I) (I)- (I) - () C\J co (0 ~ ~ C') C\J LO C') Q) co (0 C\J :2: ·.:: >< a: - co >< >< <( $: Cache Type C\J

Figure 16: Cache Strategies

80860 and 80486 processors (each utilises 1.2 million transistors). The usefulness of a write buffer is also investigated. When designing the ARM3, Acorn performed their own testing on placement strategies, and those results have been confirmed here for caches up to 16 KiloBytes. The first column is the standard ARM3 cache (4 KiloBytes, write-through, 64 way set associative, virtually mapped, random replacement). The second column shows the main memory bandwidth decrease given by a write back strategy with an 8 word write buffer. The third and fourth columns indicate the bandwidth decrease that 8 and 16 KiloBytes of cache provide (both including write buffers). The fifth column shows the decrease over the second column (the ARM3 cache with a write buffer) caused by changing to an LRU replacement strategy (even though this is difficult with large cache sets). The sixth column indicates the bandwidth used by a cache with eight thirty-two way sets, and the seventh and eighth columns show the bandwidth of two 128 way sets and one 256 way set (fully associative). As can be seen, the write buffer provides the most significant decrease, yielding nearly 5%. The exception handling required for the write buffer can be implemented as part of the cache control co-processor. A larger cache is clearly a simple method for using more chip space to lower the main memory bandwidth used.
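A minimal sketch, in C, of the accounting behind these bandwidth figures; the line length and event hooks are illustrative, not the emulator's actual interface:

  #define LINE_WORDS 4                /* illustrative cache line length   */

  static unsigned long mem_words;     /* words moved to/from main memory  */
  static unsigned long all_accesses;  /* what an uncached CPU would issue */

  /* Write-through, no buffer: every store reaches main memory. */
  static void store_write_through(void)
  {
      mem_words += 1;
      all_accesses += 1;
  }

  /* Write-back: only an evicted dirty line reaches main memory. */
  static void store_write_back(int evicts_dirty)
  {
      if (evicts_dirty) mem_words += LINE_WORDS;
      all_accesses += 1;
  }

  /* A read miss fetches a whole line; a hit costs no main memory words. */
  static void read_access(int hit)
  {
      if (!hit) mem_words += LINE_WORDS;
      all_accesses += 1;
  }

  /* The Figure 16 metric: main memory traffic relative to no cache. */
  static double bandwidth_percent(void)
  {
      return all_accesses
          ? 100.0 * (double)mem_words / (double)all_accesses : 0.0;
  }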

The actual performance gain achieved by lowering the main memory bandwidth used by the processor is dependent on the speed of main memory in comparison to the speed of cache memory, and on how long the processor must wait before it can access the main memory. A typical main memory speed is 6 MegaHertz (for a random access), and ARM3 processors are currently available with 30 MegaHertz clock speeds, thus a main memory access takes the equivalent of five processor (cache) cycles. Another 80% of a main memory cycle will typically be used for

synchronization and bus delays (50% waiting for the start of the next memory cycle, and, because the memory bus is heavily utilised by the video circuitry, a further 30% of delays). Thus nine processor cycles will be incurred for each main memory access. By multiplying the percentage reduction in main memory bandwidth by this nine cycle delay, the performance benefit of each caching strategy can be judged. The adoption of a write buffer would yield approximately 40% more performance, and combined with the larger cache (of 16 KiloBytes) over 60% more performance can be gained.
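As a worked figure:

\[ \frac{30\ \mathrm{MHz}}{6\ \mathrm{MHz}} = 5 \text{ cycles}, \qquad 5 + (0.5 + 0.3) \times 5 = 9 \text{ cycles per main memory access}, \]

so the performance benefit of a caching strategy is its main memory bandwidth reduction multiplied by this nine cycle cost.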

The compact encoding of ARM's instruction set increases cache performance to a large extent. The scarcity of branch instructions, the shift operations combined with data processing instructions, the complex addressing modes, and the multiple memory transfer instructions that replace several single register transfers all contribute to a 45% decrease in the instruction memory bandwidth used (and hence an increase in the effectiveness of the cache) compared to more typical RISC architectures, justifying their inclusion in the architecture, even if delayed branches and/or delayed loads had been implemented. Programs with a high number of procedure calls benefit more (up to 55%) from the compact instructions (due to the increase in the proportion of multiple register transfer instructions used), making this feature more effective at lowering the memory bandwidth used by the processor than the SPARC architecture's register windows [Morr88].

Conclusion

The Amsterdam Compiler Kit has proven to be a useful tool in the construction of an optimising compiler for the ARM architecture. The only optimisation the back-end could not generate was the utilisation of conditional instructions. The addition of a new register allocator, incorporating information passed back from run time profiling to enhance allocation, has resulted in a compiler that produces 10% better (faster) code than the commercial Acorn compiler, comparable to the code produced by a good assembly language programmer.

The software emulator has allowed a large number of real world programs to be executed, quantifying the performance of the architecture when executing code produced by a high level language compiler.

After examining the results of this study it is difficult to find fault with the ARM2 architecture; it is very well suited to the low cost applications it was designed for. The elegance of the architecture makes it suitable for the assembly language programming often used in small embedded control systems, as well as a good target for the compiled code executed by high performance computers. The architecture makes full use of the available memory bandwidth, and of the page mode access speed of modern DRAM memory.

Delayed load instructions have not been implemented in the ARM architecture because the inexpensive von Neumann memory architecture cannot deliver the two words in one cycle that a delayed load requires. The von Neumann architecture also forces two cycles for store instructions. The

complex addressing modes efficiently use 20% of the second cycle of load and store instructions, but a load delay slot can be at least 60% utilised. The multiple register transfer instructions yield a 34% performance increase by exploiting the page mode access speed of the memory.

Delayed branch instructions have not been implemented because of the complexity they add to the ARM architecture. The conditional execution of instructions removes practically all short branch instructions (those skipping around one or two instructions), replacing 37% of all branch instructions. A delayed branch instruction (which can be added to the ARM3 architecture) would increase performance by 12%.

A single cycle compare and branch instruction would yield 8.7% more performance, but its rather specific use tends to disobey the RISC philosophy of one fast way to do each task.

The barrel shifter is justified as part of the main execution data path: the number of explicit shift instructions it removes yields more performance (24%) than is lost to the lengthening of the clock cycle it causes (15%) when the processor is bound by the rate at which it can execute instructions (as would occur if the instruction fetch and decode times were to decrease at high clock rates).

The number of instructions required to execute a given task is reduced by 45% because of the compact encoding of ARM instructions. Although these instructions take longer to decode, an ARM2 implementation will always be limited by the memory access time, so the decrease in required memory bandwidth is much more significant. This feature alone

outstrips the memory bandwidth that could be saved by using register windows (as in the SPARC architecture).

The ARM3 architecture maintains full user code compatibility with ARM2 (Operating System code must be patched to enable the cache, and to perform cache flushing when a processor context switch occurs). This compatibility causes a performance loss, mainly due to the lack of a cache write buffer: all store instructions cause the processor to slow to the speed of the main memory while the data is written. On a typical implementation with a cache memory five times faster than the random access speed of the main memory, a write buffer increases performance by 40%, and combined with a 16 KiloByte cache a performance increase of 60% can be achieved.

The commercial computer market illustrates that ARM based machines can achieve better price/performance ratios than other computers, due to the low cost of the processor and of the memory required to gain good performance. Other computers suffer from the need for dedicated logic to lower the workload on their complex processors, and require large, expensive caches to match ARM's versatility and low memory requirements. ARM's simple elegance allows it to be produced very inexpensively, and its low memory bandwidth makes a future ARM based multi-processor an attractive possibility.

Acknowledgements

I wish to thank the many people who have helped me with this research. Dr Michael Maclean supervised the work and proof read many drafts of the text. Dr Bruce McKenzie provided many helpful comments on compiler design and the philosophy of the Amsterdam Compiler Kit.

The staff and graduates of the Computer Science department at Canterbury have provided much support, both intellectual and otherwise. Mr Kevin Rodgers showed foresight in purchasing an ARM based computer for Shirley Boys' High School (keeping his head while others were losing theirs). Dr Pip Forer and the staff of the Geography department at Canterbury allowed me unlimited access to their computer laboratory. To Acorn Computer (UK) Ltd, for answering many queries, and for designing and supporting the ARM processor.

Mr Graham Stairmand for his dedicated proof reading, many helpful tips and a great deal of encouragement.

Financial support was supplied by a Battersby Scholarship in 1989 from Datacom Ltd.

Finally, to my girlfriend, Angela, for her encouragement over the past four years; to my mother, for her support during my six years of tertiary education; and to my father, for making me be both theoretical and practical.

Bibliography

[Aho86] Aho, A.V., Sethi, R. and Ullman, J.D. Compilers: Principles, Techniques and Tools. Addison Wesley, Reading, Massachusetts, 796p

[AMD87] Am29000 Streamlined Instruction Processor User's Manual. Advanced Micro Devices Inc., 901 Thompson Place, PO Box 3453, Sunnyvale, CA 94088

[Ausl82] Auslander, M.A. and Hopkins, M. An Overview of the PL.8 Compiler. Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, SIGPLAN Notices 17(6) June 1982

[Bal85] Bal, H.E. The Design and Implementation of the EM Global Optimiser. Rapport IR-99, Vrije Universiteit, Amsterdam, March 1985

[Bal86] Bal, H. and Tanenbaum, A.S. Language- and Machine-Independent Global Optimisation on Intermediate Code. Computer Languages 11(2) pp105-121, 1986

[Bern89] Bernstein, D., Goldin, D.Q., Golumbic, M.C., Krawczyk, H., Mansour, Y., Nahshon, I. and Pinter, R.Y. Spill Code Minimisation Techniques for Optimising Compilers. SIGPLAN Notices 24(7) July 1989

[Brig89] Briggs, P., Cooper, K.D., Kennedy, K. and Torczon, L. Coloring Heuristics for Register Allocation. SIGPLAN Notices 24(7) pp275-284, 1989

[Cate88] Cates, R. Processor Architecture Considerations for Embedded Controller Applications. IEEE Micro 8(3) pp28-37, June 1988

[Chat81] Chaitin, G.J., Auslander, M.A., Chandra, A.K., Cocke, J., Hopkins, M.E. and Markstein, P.W. Register Allocation Via Graph Colouring. Computer Languages 6 pp47-57, 1981

[Chat82] Chaitin, G.J. Register Allocation and Spilling via Graph Colouring. Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, SIGPLAN Notices 17(6) June 1982

[Chow84] Chow, F. and Hennessy, J. Register Allocation by Priority-based Coloring. Proceedings of the SIGPLAN '84 Symposium on Compiler Construction, SIGPLAN Notices 19(6) June 1984

[Chow88] Chow, F.C. Minimizing Register Usage Penalty at Procedure Calls. Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, June 22-24 1988, pp85-94

[Ditz87] Ditzel, D.R., McLellan, H.R. and Berenbaum, A.D. The Hardware Architecture of the CRISP Microprocessor. Proceedings of the 14th Annual Symposium on Computer Architecture, pp309-319

[Dobb88] Dobbs, C., Reed, P. and Ng, T. Supercomputing on Chip: Genesis of the 88000. VLSI Systems Design, May 1988, pp24-33

[Furb89] Furber, S.B. VLSI RISC Architecture and Organization. Marcel Dekker, New York, 365p

[Gros88] Gross, T.R., Hennessy, J.L., Przybylski, S.A. and Rowen, C. Measurement and Evaluation of the MIPS Architecture and Processor. ACM Transactions on Computer Systems 6(3) pp229-257, August 1988

[Heid90] Heid, J. MacWorld Magazine 7(5), MacWorld Communications, 501 Second Street, San Francisco, pp280-289, May 1990

[Henn82] Hennessy, J.L., Jouppi, N., Przybylski, S.A., Rowen, C., Gross, T.R., Baskett, F. and Gill, J. MIPS: A Microprocessor Architecture. IEEE Micro Special Report pp17-22, 1982

[Henn84] Hennessy, J.L. VLSI Processor Architecture. IEEE Transactions on Computers 33(12) pp1221-1246

[IBM90] IBM Australia Ltd. The RISC System/6000 Range. Customer Information Issue 46, February 1990

[Jagg90] Jaggar, D.V. Fast Ziv-Lempel Decoding on a RISC. Submitted for publication to IEEE Transactions on Computers.

[Jone88] Jones, D. The MC88100 RISC Processor. Electronic Engineering, May 1988, pp45-55

[Jone89] Jones, D. MC88100 RISC. Electronic and Wireless World 94(1929) pp637-642, July 1988

[Kane88] Kane, G. MIPS RISC Architecture. Prentice Hall, Englewood Cliffs NJ, 288p

[Kern78] Kernighan, B.W. and Ritchie, D.M. The C Programming Language. Prentice Hall, Englewood Cliffs NJ, 227p

[Lehr89] Lehrer, M.A. Second Generation RISC Processor. Electronic and Wireless World 94(1929) pp689-691, July 1988

[McKe87] McKerrow, P. Performance Measurement of Computer Systems. Addison Wesley, Reading, Massachusetts, pp65-97

[McKe89] McKenzie, B.J. Fast Peephole Optimisation Techniques. Software Practice and Experience 19(2) pp1151-1162, December 1989

[Mele89] Melear, C. The Design of the 88000 RISC Family. IEEE Micro 9(2) pp26-38, 1989

[MIPS88] MIPS Computer Systems Inc. Language Programmer's Guide. pp4.1-4.37, MIPS Computer Systems Inc.

[Morr88] Morrison, N. Register Windows vs. General Registers: A Comparison of Memory Access Patterns. Technical Report, University of California, Berkeley, California.

[Moto85] Motorola Inc. MC68020 32-Bit Microprocessor User's Manual. Prentice Hall, Englewood Cliffs NJ, 438p

[Patt80] Patterson, D.A. and Ditzel, D.R. The Case for the Reduced Instruction Set Computer. Computer Architecture News 8(6) pp25-33

[Patt81] Patterson, D.A. and Sequin, C. RISC I: A Reduced Instruction Set VLSI Computer. Proceedings of the 8th Annual Symposium on Computer Architecture, ACM SIGARCH Computer Architecture News 9(3) pp443-457, 1981

[Patt82] Patterson, D.A. and Sequin, C. A VLSI RISC. IEEE Computer 15(9) pp8-18, September 1982

[Patt85] Patterson, D. Reduced Instruction Set Computers. Communications of the ACM 28(1) pp8-21, 1985

[Radi82] Radin, G. The 801 Minicomputer. SIGPLAN Notices 17(4) 1982

[Rowe88] Rowen, C., Johnson, M. and Ries, P. The MIPS R3010 Floating-Point Coprocessor. IEEE Micro, June 1988, pp53-62

[Smit82] Smith, A.J. Cache Memories. ACM Computing Surveys 14(3) pp473-530, 1982

[SPEC90] Standard Performance Evaluation Co-operative Newsletter 2(1) 1990, 25p

[Sun87a] Sun Microsystems Inc. A RISC Tutorial. Sun Technical Report, 15p

[Sun87b] Sun Microsystems Inc. The SPARC Architecture Manual, Version 7 Revision A, October 22, 1987

[Tami83] Tamir, Y. and Sequin, C. Strategies for Managing the Register File in RISC. IEEE Transactions on Computers 32(11) pp977-989, 1983

[Tane78] Tanenbaum, A.S. Implications of Structured Programming for Machine Architecture. Communications of the ACM 21(3) pp237-246, 1978

[Tane83a] Tanenbaum, A.S., van Staveren, H., Keizer, E.G. and Stevenson, J.W. A Practical Toolkit for Making Portable Compilers. Communications of the ACM 26(9) pp654-660, September 1983

[Tane83b] Tanenbaum, A.S., van Staveren, H., Keizer, E.G. and Stevenson, J.W. Description of a Machine Architecture for use with Block Structured Languages. Rapport IR-81, Vrije Universiteit, Amsterdam, August 1983

[VTI89] VLSI Technology Limited. VL86C010 32-Bit RISC MPU and Peripherals User's Manual. Prentice Hall, Englewood Cliffs NJ, 188p

[Wall86] Wall, D.W. Global Register Allocation at Link Time. SIGPLAN Notices 21(7) pp264-275, 1986

[Wall88] Wall, D.W. Register Windows vs. Register Allocation. Proceedings of the SIGPLAN '88, pp67-78

[Wils89a] Wilson, R. RISC CPUs Tune Up for Embedded Computing. Computer Design, May 1989, pp36-38

[Wils89b] Wilson, R. The ARM3 RISC Processor. RISC User, July/August 1989, pp7-9

[Wirt88] Wirth, N. From Modula to Oberon. Software Practice and Experience 18(7) pp661-670, 1988

[Wirt89] Wirth, N. and Gutknecht, J. The Oberon System. Software Practice and Experience 19(9) pp857-893, 1989

Appendix A: ARM 3 Instruction Set Format

Data Processing

31-28 Cond | 27-26 00 | 25 I | 24-21 OpCode | 20 S | 19-16 Rm | 15-12 Rd | 11-0 Operand

I = 0 : Operand is a shifted register

Shift amount is an immediate:
11-7 Shamt | 6-5 Sh | 4 0 | 3-0 Rn          Shamt : Immediate shift amount

Shift amount is in a register:
11-8 Shamt | 7 0 | 6-5 Sh | 4 1 | 3-0 Rn    Shamt : Register holding shift amount

Sh : Type of shift
00 LSL Logical Shift Left
01 LSR Logical Shift Right
10 ASR Arithmetic Shift Right
11 ROR Rotate Right (ROR with Shamt 0 encodes RRX, Rotate Right with Extend)

I = 1 : Operand is an immediate
11-8 Shamt | 7-0 Immediate                  (the immediate is rotated right by twice Shamt)

Cond : Condition Code                 OpCode : Operation Code
0000 EQ Equal                         0000 AND Logical AND
0001 NE Not Equal                     0001 EOR Exclusive OR
0010 CS Carry Set                     0010 SUB Subtract
0011 CC Carry Clear                   0011 RSB Reverse Subtract
0100 MI Minus                         0100 ADD Addition
0101 PL Plus                          0101 ADC Addition with Carry
0110 VS Overflow Set                  0110 SBC Subtract with Carry
0111 VC Overflow Clear                0111 RSC Reverse Subtract with Carry
1000 HI Higher                        1000 TST Test
1001 LS Lower or Same                 1001 TEQ Test Equality
1010 GE Greater than or Equal         1010 CMP Compare
1011 LT Less Than                     1011 CMN Compare Negative
1100 GT Greater Than                  1100 ORR Logical OR
1101 LE Less than or Equal            1101 MOV Move
1110 AL Always                        1110 BIC Bit Clear
1111 NV Never                         1111 MVN Move Negative

S : Set condition flags
Rm : Left hand side operand register number
Rd : Destination register number
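A sketch of the field extraction in C, following the layout and register naming above (Rm is the left hand operand, as in this appendix):

  #include <stdint.h>
  #include <stdio.h>

  /* Unpack a data processing instruction (bits 27-26 must be 00). */
  static void decode_data_processing(uint32_t w)
  {
      uint32_t cond   = (w >> 28) & 0xFu;   /* condition code           */
      uint32_t imm    = (w >> 25) & 0x1u;   /* I: operand is immediate? */
      uint32_t opcode = (w >> 21) & 0xFu;   /* AND, EOR, SUB, ...       */
      uint32_t s      = (w >> 20) & 0x1u;   /* S: set condition flags   */
      uint32_t rm     = (w >> 16) & 0xFu;   /* left hand operand        */
      uint32_t rd     = (w >> 12) & 0xFu;   /* destination              */
      uint32_t op2    =  w        & 0xFFFu; /* shifted register or imm  */

      printf("cond=%X op=%X S=%u Rm=r%u Rd=r%u I=%u op2=%03X\n",
             (unsigned)cond, (unsigned)opcode, (unsigned)s,
             (unsigned)rm, (unsigned)rd, (unsigned)imm, (unsigned)op2);
  }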

Multiply

31-28 Cond | 27-22 000000 | 21 A | 20 S | 19-16 Rd | 15-12 Ra | 11-8 Rn | 7-4 1001 | 3-0 Rm

A : Multiply with Accumulate

Branch

31-28 Cond | 27-25 101 | 24 L | 23-0 PC Offset

L : Branch with link

Single Data Transfer

31-28 Cond | 27-26 01 | 25 I | 24 P | 23 U | 22 B | 21 W | 20 L | 19-16 Rm | 15-12 Rd | 11-0 Offset

P : Pre/Post indexing (Post=0, Pre=1)
U : Up/Down bit (Down=0, Up=1)
B : Byte/Word bit (Word=0, Byte=1)
W : Writeback flag
L : Load/Store bit (Store=0, Load=1)

Swap Data

31-28 Cond | 27-23 00010 | 22 B | 21-20 00 | 19-16 Rm | 15-12 Rd | 11-4 00001001 | 3-0 Rs

Rm : Base (address) register
Rs : Source register
Rd : Destination register

Block Data Transfer

31-28 Cond | 27-25 100 | 24 P | 23 U | 22 S | 21 W | 20 L | 19-16 Rn | 15-0 Register List

S : Load processor status flags

Software Interrupt

31-28 Cond | 27-24 1111 | 23-0 SWI Number

Co-Processor operations

31-28 Cond | 27-24 1110 | 23-20 CPop | 19-16 CRn | 15-12 CRd | 11-8 CP# | 7-5 CPinf | 4 0 | 3-0 CRm

CPop : Co-Processor OpCode
CRm, CRn : Co-Processor operand registers
CRd : Co-Processor destination register
CP# : Co-Processor number
CPinf : Co-Processor information

Co-Processor Register Transfers

31-28 Cond | 27-24 1110 | 23-21 CPop | 20 L | 19-16 CRn | 15-12 Rd | 11-8 CP# | 7-5 CPinf | 4 1 | 3-0 CRm

L : Load/Store bit (To Co-Processor=0, From Co-Processor=1)

Co-Processor Data Transfers

31-28 Cond | 27-25 110 | 24 P | 23 U | 22 N | 21 W | 20 L | 19-16 Rn | 15-12 CRd | 11-8 CP# | 7-0 Offset

N : Transfer Length

Undefined Instructions

31-28 Cond | 27-25 011 | 24-5 xxxxxxxxxxxxxxxxxxxx | 4 1 | 3-0 xxxx

31-28 Cond | 27-24 0011 | 23-8 xxxxxxxxxxxxxxxx | 7-4 1001 | 3-0 xxxx

Appendix B: EM Instruction Set

Group 1-Load

LOC c  Load constant (i.e. push one word onto the stack)
LDC d  Load double constant (push two words)
LOL l  Load word at l-th local (l<0) or parameter (l>=0)
LOE g  Load external word g
LIL l  Load word pointed to by l-th local or parameter
LOF f  Load offsetted (top of stack + f yields address)
LAL l  Load address of local or parameter
LAE g  Load address of external
LXL n  Load lexical (address of LB n static levels back)
LXA n  Load lexical (address of AB n static levels back)
LOI s  Load indirect s bytes (address is popped from the stack)
LOS i  Load indirect, i-byte integer on top of stack gives object size
LDL l  Load double local or parameter
LDE g  Load double external
LDF f  Load double offsetted (top of stack + f yields address)
LPI p  Load procedure identifier

Group 2-Store

STL l  Store local or parameter
STE g  Store external
SIL l  Store into word pointed to by l-th local or parameter
STF f  Store offsetted
STI s  Store indirect s bytes (pop address, then data)
STS i  Store indirect, i-byte integer on top of stack gives object size
SDL l  Store double local or parameter
SDE g  Store double external
SDF f  Store double offsetted

Group 3-Integer Arithmetic

ADI i  Addition
SBI i  Subtraction
MLI i  Multiplication
DVI i  Division
RMI i  Remainder
NGI i  Negate (two's complement)
SLI i  Shift left
SRI i  Shift right

Group 4- Unsigned arithmetic

ADU i  Addition
SBU i  Subtraction
MLU i  Multiplication
DVU i  Division
RMU i  Remainder
SLU i  Shift left
SRU i  Shift right

Group 5-Floating point arithmetic

ADF i  Floating add
SBF i  Floating subtract
MLF i  Floating multiply
DVF i  Floating divide
NGF i  Floating negate
FIF i  Floating multiply and split integer and fraction part
FEF i  Split floating number in exponent and fraction part

Group 6-Pointer arithmetic

ADP f  Add f to pointer on top of stack
ADS i  Add i-byte value and pointer
SBS i  Subtract pointers and push difference as size i integer

Group 7-Increment/decrement/zero

INC    Increment top of stack by 1
INL l  Increment local or parameter
INE g  Increment external
DEC    Decrement top of stack by 1
DEL l  Decrement local or parameter
DEE g  Decrement external
ZRL l  Zero local or parameter
ZRE g  Zero external
ZRF i  Load a floating zero of size i
ZER i  Load i zero bytes

Group 8-Convert

CII  Convert integer to integer
CUI  Convert unsigned to integer
CFI  Convert floating to integer
CIF  Convert integer to floating
CUF  Convert unsigned to floating
CFF  Convert floating to floating
CIU  Convert integer to unsigned
CUU  Convert unsigned to unsigned
CFU  Convert floating to unsigned

Group 9-Logical

AND i  Boolean and on two groups of i bytes
IOR i  Boolean inclusive or on two groups of i bytes
XOR i  Boolean exclusive or on two groups of i bytes
COM i  Complement (one's complement of top i bytes)
ROL i  Rotate left a group of i bytes
ROR i  Rotate right a group of i bytes

Group 10-Sets

INN i  Bit test on i byte set (bit number on top of stack)
SET i  Create singleton i byte set with bit n on (n is top of stack)

Group 11-Array

LAR i  Load array element, descriptor contains integers of size i
SAR i  Store array element
AAR i  Load address of array element

Group 12-Compare

CMI i  Compare i byte integers; push -ve, zero, +ve for <, = or >
CMF i  Compare i byte reals
CMU i  Compare i byte unsigneds
CMS i  Compare i byte sets; can only be used for equality test
CMP    Compare pointers
TLT    True if less, i.e. iff top of stack < 0
TLE    True if less or equal, i.e. iff top of stack <= 0
TEQ    True if equal, i.e. iff top of stack = 0
TNE    True if not equal, i.e. iff top of stack non zero
TGE    True if greater or equal, i.e. iff top of stack >= 0
TGT    True if greater, i.e. iff top of stack > 0

Group 13-Branch

BRA b  Branch unconditionally to label b
BLT b  Branch less (pop 2 words, branch if top > second)
BLE b  Branch less or equal
BEQ b  Branch equal
BNE b  Branch not equal
BGE b  Branch greater or equal
BGT b  Branch greater
ZLT b  Branch less than zero (pop 1 word, branch negative)
ZLE b  Branch less or equal to zero
ZEQ b  Branch equal zero
ZNE b  Branch not zero

ZGE b  Branch greater or equal zero
ZGT b  Branch greater than zero

Group 14-Procedure call

CAI    Call procedure (procedure instance identifier on stack)
CAL p  Call procedure (with name p)
LFR s  Load function result
RET z  Return (function result consists of top z bytes)

Group 15-Miscellaneous

ASP f  Adjust the stack pointer by f
ASS i  Adjust the stack pointer by i-byte integer
BLM z  Block move z bytes; first pop destination address, then source address
BLS i  Block move, size is in i-byte integer on top of stack
CSA i  Case jump; address of jump table at top of stack
CSB i  Table lookup jump; address of jump table at top of stack
DUP s  Duplicate top s bytes
DUS i  Duplicate top i bytes
FIL g  File name (external 4 := g)
LIM    Load 16 bit ignore mask
LIN n  Line number (external 0 := n)
LNI    Line number increment
LOR r  Load register (0=LB, 1=SP, 2=HP)
MON    Monitor call
NOP    No operation
RCK i  Range check; trap on error
RTT    Return from trap
SIG    Trap errors to proc nr on top of stack
SIM    Store 16 bit ignore mask
STR r  Store register (0=LB, 1=SP, 2=HP)
TRP    Cause trap to occur (error number on stack)
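A sketch in C of how a few of the instructions above behave on the EM stack machine; the stack representation is illustrative:

  #include <stdint.h>

  static int32_t stack[1024];
  static int sp;                               /* index of next free slot */

  static void    push(int32_t v) { stack[sp++] = v; }
  static int32_t pop(void)       { return stack[--sp]; }

  static void em_loc(int32_t c) { push(c); }   /* LOC c : push constant   */

  static void em_adi(void)                     /* ADI i : integer add     */
  {                                            /* (word sized, i == 4)    */
      int32_t t = pop(), s = pop();
      push(s + t);
  }

  static int em_beq(void)                      /* BEQ b : pop two words,  */
  {                                            /* return 1 to take branch */
      int32_t t = pop(), s = pop();
      return s == t;
  }

  static void em_inc(void)                     /* INC : top of stack + 1  */
  {
      stack[sp - 1] += 1;
  }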

Key to instruction arguments

Argument Rationale

c  1-word constant
d  2-word constant
l  local offset
g  global label
f  fragment offset
n  positive counter
s  object size (word multiple)
z  object size (zero or word multiple)
i  object size (word multiple or fraction)
p  procedure identifier
b  label number
r  register number (0, 1 or 2)
-  no operand

Appendix C: Floating Point Accelerator Instruction Set

Floating Point operations

31-28 Cond | 27-24 1110 | 23-20 FPop | 19-16 FPRn | 15-12 FPRd | 11-8 0 | 7-5 F | 4 0 | 3-0 FPRm

Cond : As shown in Appendix A
FPop : FPU OpCode
FPRm, FPRn : FPU operand registers
FPRd : FPU destination register
F : Format

FPop : Floating Point Operation                       Cycles (Single / Double)
0000 ADD.fmt    Add                                          2       2
0001 SUB.fmt    Subtract                                     2       2
0010 MUL.fmt    Multiply                                     4       5
0011 DIV.fmt    Divide                                      12      19
0100 SQRT.fmt   Square Root                                 23      42
0101 ABS.fmt    Absolute Value                               1       1
0110 MOV.fmt    Move                                         1       1
0111 NEG.fmt    Negate                                       1       1
1000 CVT.S.fmt  Convert to single floating-point             1       1
1001 CVT.D.fmt  Convert to double floating-point             1       1
1010 CVT.W.fmt  Convert to binary fixed-point                1       1
1011 CMP.fmt    Compare                                      2       2

Format : Operand type
000 S  single precision floating point
001 D  double precision floating point
010 W  single precision fixed point

Floating Point Unit Register Transfers

31-28 Cond | 27-24 1110 | 23-21 0 | 20 L | 19-16 0 | 15-12 Rd | 11-8 0 | 7-5 0 | 4 1 | 3-0 FPRm

L : Load/Store bit (To FPU=0, From FPU=1)

Floating Point Unit Data Transfers

31-28 Cond | 27-25 110 | 24 P | 23 U | 22 N | 21 W | 20 L | 19-16 Rn | 15-12 FPRd | 11-8 0 | 7-0 Offset

P : Pre/Post indexing (Post=0, Pre=1)
U : Up/Down bit (Down=0, Up=1)
N : Transfer Length
W : Writeback flag
L : Load/Store bit (Store=0, Load=1)
Rn : ARM base register
FPRd : FPU register
Offset : Integer address offset
