
Master Thesis Project Report

Register File Exploration of Embedded Processors

Chaitanya Cherukuri

Lund University, Faculty of Engineering (LTH)
Department of Electrical and Information Technology
SE-221 00 Lund, Sweden

Abstract

Low-power design is becoming an integral part of system design with the profusion of mobile devices. The increasing computation requirements of audio/video applications in portable devices imply the need for long battery life and reduced energy dissipation. As battery improvement techniques are not able to satisfactorily address the growing energy requirements of processors, it is important to devise new low-power architectural techniques to reduce power consumption.

In current embedded system processors, the multi-ported register file is one of the most power-hungry parts of the processor. Registers play an important role in the performance and power consumption of a processor, and the register file remains costly even when clustering techniques are applied. In this thesis I describe an IMEC technology called the Very Wide Register (VWR) file architecture, which has single-ported cells and asymmetric interfaces to the memory and to the datapath.

In this report I also describe the tool LISATek. LISATek is an automated embedded processor design and optimization environment that slashes months from processor hardware design time and engineer-years from the creation of processor-specific software development tools. The key to LISATek's automation is its Language for Instruction Set Architectures, LISA 2.0.

Acknowledgments

This work would not have been completed without the help and support of many individuals. I owe a debt of gratitude to everyone who spent their time and effort along the way and helped me to successfully complete this thesis work.

I would like to express my deep and sincere gratitude to my supervisors at the Interuniversity MicroElectronics Center (IMEC), Mr. Praveen Raghavan and Dr. Murali Jayapala. Their wide knowledge and their logical way of thinking have been of great value to me. Their understanding, encouragement and personal guidance provided a good basis for my thesis.

Next, I wish to express my warm and sincere thanks to my supervisors at Lunds Tekniska Högskola, Dr. Viktor Öwall and Dr. Peter Nilsson, for introducing me to the field of digital ASIC design.

I owe my most sincere gratitude to my promoter, Professor Francky Catthoor, for giving me the opportunity to work with the Architecture Team at the Interuniversity MicroElectronics Center (IMEC), Leuven, Belgium. I wish to extend my warmest thanks to all those who have helped me with my work at IMEC. Special thanks to my friend Narasinga Rao Miniskar for his encouragement. Last, I want to thank my parents, without whom I would never have been able to achieve so much.

Chaitanya Cherukuri
Lund, 2008

Contents

Abstract

Acknowledgments

1 INTRODUCTION
  1.1 What is a Register?
  1.2 LISATek

2 Very Wide Register
  2.1 Introduction
  2.2 Very Wide Register
  2.3 Architecture Description
    2.3.1 Memory Design
    2.3.2 Foreground Memory Organization
    2.3.3 Very Wide Register and Datapath Connectivity
  2.4 Example Operation of the Very Wide Register
  2.5 Conclusion

3 Implementation of Very Wide Register
  3.1 Introduction
  3.2 Baseline Reference RISC Architecture Description with Standard Register File
    3.2.1 The Register Module
    3.2.2 The Memory Module
    3.2.3 The Pipeline Module
  3.3 Architecture Description - Very Wide Registers
    3.3.1 Register Module
    3.3.2 Memory Module
  3.4 Conclusion

4 Results
  4.1 Tool Flow
  4.2 Implementation of Benchmark
  4.3 Simulation Results
    4.3.1 Results - LISATek Processor Debugger
    4.3.2 Results - Modelsim
  4.4 Synthesis Results
  4.5 Conclusion

5 Conclusion and Future Work
  5.1 Further Work

A LISATek
  A.1 Introduction
  A.2 LISATek tools
    A.2.1 LISATek Processor Designer
    A.2.2 Generating the model
    A.2.3 LISATek Processor Debugger
    A.2.4 LISATek Processor Generator

B Assembly Code
  B.1 Assembly code using Very Wide Register
  B.2 Assembly code using Scalar Register File

Bibliography

List of Figures

1.1 Organization of multiported register file
1.2 Power consumption of processor
1.3 Overview of LISATek Processor Designer

2.1 Conceptual model of partitioned Register file
2.2 Very wide register Organization
2.3 Very wide register and Scalar Register file connectivity to the Datapath

3.1 RISC Architecture
3.2 LTRISC32ca Processor Structure
3.3 Four Stage Pipeline
3.4 RISC Architecture using VWR

4.1 Tool Flow
4.2 Simulation in Modelsim
4.3 Execution time of an application
4.4 Power Consumption of VWR RISC Architecture
4.5 Power Consumption of Reference RISC Architecture
4.6 Energy Consumption of Register File
4.7 Energy Consumption of Processor

A.1 LISATek Design flow
A.2 Example OAT Domain
A.3 Screenshot of Processor debugger
A.4 Screenshot of Processor Generator

CHAPTER 1

INTRODUCTION

Future notebook and hand-held portable multimedia wireless devices, such as palm-pilots and wireless phones, which support new multimedia and wireless communication standards with a high computational complexity, are becoming a major part of our daily life for personal management and communication functions. The cost of such human-machine interface applications depends closely on the combination of performance and energy efficiency of the processor, which must provide a long battery life with low energy dissipation.

Current state-of-the-art processors are facing several bottlenecks that prevent the required combination of performance and energy efficiency. Designing a processor with the required combination of performance and energy efficiency is therefore one of the major challenges for the designer. As battery improvements are not able to satisfy the growing energy requirements of processors, it is important to devise new low-power architectural techniques to reduce the microprocessor power consumption.

1.1 What is a Register?

Registers play an important role in the performance and energy consumption of a processor. A register can be defined as a small container that holds data and is distinct from memory. Today's architectures have converged on a view of architectural state that distinguishes between two types of storage: memory and registers. Historically, registers were fast, small, expensive, on-chip storage areas connected to specific pieces of on-chip logic, while memory was slow, cheap, big, off-chip and addressable. Recent technologies blur this abstraction, but overall these characteristics remain the same. Modern implementations are designed around fast on-chip operations and relatively slower memory operations. Most, but not all, modern architectures operate on the principle of moving data from main memory into registers and operating on it there; since values are often accessed repeatedly, holding these frequently used values in registers improves performance.

Processor registers are at the top of the memory hierarchy and provide the fastest way for a CPU to access data. Registers are normally measured by the number of bits they can hold, for example an "8-bit register" or a "32-bit register". A processor often contains several kinds of registers:

• Data registers hold numeric values such as integer and floating-point values.

• Address registers hold addresses and are used by instructions that indirectly access memory.

• Conditional registers hold truth values often used to determine whether some instruction should or should not be executed.

• General purpose registers (GPRs) can store both data and addresses, i.e., they are combined Data/Address registers.

Typically, the bit cell is the basic building block of a register: bit cells are grouped together to form an individual register, and registers are grouped together to form a register file. A register file stores a comparatively large amount of frequently used data in order to reduce the number of memory accesses, which helps to increase the performance of the processor. Apart from reducing the accesses to memory, performance can also be increased by performing multiple operations per cycle, which requires increasing the number of ports of the register file, i.e., a multi-ported register file. Multi-ported register files are mainly used in processors where multiple instructions are issued in the same cycle.

Apart from the performance benefit, there are some negative impacts of using multi-ported register files. Assume a register file is organized as a two-dimensional grid in which the horizontal lines represent the control path and the vertical lines represent the data path; a sample grid structure is shown in Figure 1.1. In a single-ported register file the vertical lines correspond to the bit positions within a word, and the horizontal control path selects a single register out of the register file. In a multi-ported register file, additional control and data lines are needed to access the individual registers. As Figure 1.1 illustrates, the area of the register file grows with the square of the number of ports. The read access time of a register file grows approximately linearly with the number of ports, which has a negative impact on the overall cycle time. As the number of ports increases, the internal bit-cell loading becomes larger, and the larger register file area causes longer wire delays. The longer wires and larger cells make the multi-ported register file a power-hungry circuit.
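These scaling trends can be summarized compactly. The relations below are only an approximate restatement of the observations above (a rough cost model, not measured data):

    A_{RF} \propto N_{ports}^{2}, \qquad t_{read} \propto N_{ports}

where A_{RF} is the register file area, t_{read} its read access time and N_{ports} the number of ports.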

The pie chart in Figure 1.2 shows the power consumption of the different blocks in a processor. From Figure 1.2 we can see that the register file consumes a large share of the total power.

Figure 1.1: Organization of multiported register file

To reduce the power consumption of a register file it is preferable to have single-ported registers. The problem can be addressed through various clustering techniques, in which the register file is broken into smaller units so that each unit is connected to only a subset of the execution units; the communication between the clusters is called inter-cluster communication. As the size of the partitions is reduced, the cost of inter-cluster communication increases, which effectively makes the register file multi-ported again. In this thesis I describe an IMEC technology called the "Very Wide Register (VWR)", which addresses the power consumption of multi-ported register files and also reduces the energy consumption of the register file.

In short, the Very Wide Register can be defined as an asymmetric register file organization that achieves a significantly higher energy efficiency than conventional organizations. Three aspects are important for the VWR organization:

• Wide Interface to the memory

• Single ported cells

• Narrow interface to the datapath

Figure 1.2: Power consumption of processor

The architecture of the VWR and the loading and storing of data in the VWR are explained in the coming chapters.

1.2 LISATek

This section gives a brief introduction to the LISATek (CoWare) Processor Designer tool, which I used to implement the Very Wide Register. CoWare Processor Designer is an automated, application-specific embedded processor design and optimization environment that slashes months from processor hardware design time and engineer-months from the creation of application- and processor-specific software development tools. Processor Designer's high degree of automation enables design teams to focus on architecture exploration and application-specific processor development, rather than on consistency checking and verification of individual tools.

LISATek can be used to develop a wide range of processor architectures, including DSP, RISC, SIMD, VLIW and superscalar. The key to LISATek's automation is its Language for Instruction Set Architectures, LISA 2.0. Operating at a high level of abstraction, LISATek not only eliminates the time and cost inherent in HDL-based processor design and manual tool development, but also enables processor design by non-experts. CoWare Processor Designer is explained in detail in the Appendix; an overview is shown in Figure 1.3.

Figure 1.3: Overview of LISATek Processor Designer

This thesis is organized as follows:

• Chapter 2 describes the Very Wide Register organization.

• Chapter 3 describes the experimental setup.

• Chapter 4 gives an idea about benchmarks used and results.

• Chapter 5 concludes and suggests future work.

• The Appendix describes the LISATek tool and the assembly code of the benchmark.

CHAPTER 2

Very Wide Register

2.1 Introduction

In chapter one, the positive and negative impacts of multi-ported register files were discussed. To overcome the main negative impact, the power consumption of the register file, this chapter first discusses traditional techniques such as clustering and then explains the IMEC technology called the very wide register (VWR).

A solution to overcome the negative impacts is to partition the register file into smaller units, so that each unit is connected to only a subset of the functional units. Partitioning the register file and the functional units in this manner is called clustering, and each pairing of a register file partition and its directly connected functional units is called a cluster. An example of a clustered register file is shown in Figure 2.1 [1].

Figure 2.1: Conceptual model of partitioned Register file

Data can be transferred from one cluster to another only through inter-cluster communication, and the price of clustering comes from this communication. As the clusters get smaller, the cost of inter-cluster communication increases and the resulting register file again behaves like a multi-ported register file. From the above discussion it is preferable to use single-ported registers for high energy efficiency. Apart from the number of ports of a register file, another important energy-efficiency bottleneck is formed by the L1 memories. This bottleneck can be reduced by improving any one of the following aspects:

• Memory Design

• Memory organization (interface between the memory and the foreground memory)

• Mapping of data onto the memory

In this chapter I explain the IMEC technology known as the very wide register (VWR) [2], an asymmetric register file organization that achieves a higher energy efficiency than conventional register file organizations.

2.2 Very Wide Register

The very wide register is an asymmetric register file organization which has a wide interface towards the memory and a narrow interface towards the datapath. This asymmetric interface is the feature that makes the very wide register different from other register file organizations. Figure 2.2 shows the very wide register organization.

Figure 2.2: very wide register Organization

The three main aspects which make very wide register different from other register file organizations are:

• Interface to the memory

• Single ported cells

• Interface to the Datapath

The interface of the very wide register (VWR) is asymmetric: wide towards the memory and narrow towards the datapath. The wide interface exploits the locality of access of an application through wide loads, while at the same time the narrow interface is able to access the individual words of smaller width which are required for computation. As shown in Figure 2.2, the VWR is made of single-ported cells and has no pre-decode circuit, but it has a post-decode circuit which selects the appropriate word for the actual computation.

As mentioned in the introduction section, clustering is a technique to reduce the energy consumption by splitting the register file into smaller parts, and the cost of clustering comes from the inter-cluster communication. In the very wide register, in contrast, the ports are asymmetric: the datapath accesses the VWR through a cheap, frequently used interface, while accesses from the memory go through a more expensive but less frequently used interface. Since a read/write operation between the VWR and the datapath operates on a single word, this makes more optimized use of energy and bandwidth.

The concept of the very wide register is similar to that of a vector register, in which a contiguous set of data is stored for data-parallel architectures. Each vector register holds multiple data elements forming a vector. Here, this set of multiple data elements is referred to as a word and each element is referred to as a sub-word. The main motivation of vector registers is to support data-level parallelism; the target of the very wide register is energy efficiency, and it can be used in both data-parallel and non-data-parallel contexts.

2.3 Architecture Description

The architecture description of the very wide register (VWR) covers the foreground memory organization (the VWRs themselves), the data memory organization and its interface, and the connectivity between the VWR and the datapath. The data memory organization and its interface with the wide registers are important for the energy consumption of a processor. As mentioned before, the energy in memories can be reduced by improving any one of the following three aspects:

• Memory design

• Mapping of data on to the memory

• Memory organization(interface)

2.3.1 Memory Design

Most of the energy in a memory is spent on the decoder and the wordline activation. Some energy is also spent on the actual storage cells and the sense amplifiers [3, 4]. The row address selects the desired row in the memory through the pre-decoder, the sense amplifiers and precharge lines are activated for these words, and the words are read out. In general, the decode price is paid for being able to access the words in any given order. The cost of the decoder can be reduced by performing as few decodes as possible and reading out more data for every decode. In the embedded-systems domain this can be achieved by exploiting the spatial locality of the data [5].

2.3.2 Foreground Memory Organization

As discussed in the previous sections, the very wide register (VWR) has asymmetric interfaces: a wide interface towards the memory and a narrow interface towards the datapath. The width of the VWR depends on the line size of the background memory; complete or partial lines are read from the memory into the VWRs. The VWRs do not have any pre-decode circuit for transferring data from the memory, but they have a post-decode circuit, consisting of a multiplexer/demultiplexer, which is used to select the desired word to be read or written from the VWR. The load/store unit of the VWR is also different: since the interface between the VWR and the memory is as wide as a complete VWR, it is capable of loading/storing complete or partial lines between the memory and the VWRs. Since the VWR is single ported, read and write accesses between the registers and the memory cannot happen in parallel with accesses from the datapath, and it is also important that data needed in the same cycle/operation are placed in different VWRs. A scalar register file (SRF) can be used alongside the VWRs to store scalar constants, iterators, addresses, etc., so as not to pollute the data in the VWRs.
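To make this organization concrete, the following C sketch models a single VWR in software: a complete memory line is transferred through the wide interface in one operation, while the datapath reads or writes one word at a time through an index, which plays the role of the post-decode multiplexer/demultiplexer. The names and sizes used here (WORDS_PER_LINE, vwr_t and the helper functions) are illustrative assumptions and are not part of the LISA model described later in this report.

#include <stdint.h>
#include <string.h>

#define WORDS_PER_LINE 4              /* VWR width = one memory line (assumed) */

typedef struct {
    uint32_t word[WORDS_PER_LINE];    /* single-ported storage cells */
} vwr_t;

/* Wide interface: copy a complete memory line into the VWR in one transfer. */
static void vwr_load_line(vwr_t *vwr, const uint32_t *mem, unsigned line)
{
    memcpy(vwr->word, &mem[line * WORDS_PER_LINE],
           WORDS_PER_LINE * sizeof(uint32_t));
}

/* Wide interface: write the whole VWR back to one memory line. */
static void vwr_store_line(const vwr_t *vwr, uint32_t *mem, unsigned line)
{
    memcpy(&mem[line * WORDS_PER_LINE], vwr->word,
           WORDS_PER_LINE * sizeof(uint32_t));
}

/* Narrow interface: the datapath accesses one word, selected by an index
 * (this models the post-decode multiplexer/demultiplexer). */
static uint32_t vwr_read_word(const vwr_t *vwr, unsigned idx)
{
    return vwr->word[idx];
}

static void vwr_write_word(vwr_t *vwr, unsigned idx, uint32_t value)
{
    vwr->word[idx] = value;
}

In such a model, a functional unit combines operands obtained with vwr_read_word() and writes the result back with vwr_write_word(), while vwr_load_line() and vwr_store_line() replace the word-by-word loads and stores of a scalar register file.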

Figure 2.3: very wide register and Scalar Register file connectivity to the Datapath

Figure 2.3 gives a general idea of the connectivity between the very wide register (VWR) and the datapath. If we have N single-ported VWRs and a scalar register file to store the scalar constants, the scratchpad width should be equal to the width of the foreground memory (the VWRs). The next section describes the connectivity between the VWR and the datapath by considering an arithmetic operation.

2.3.3 very wide register and Datapath Connectivity

The very wide registers (VWRs) and the scalar register file (SRF) can be connected to any datapath unit, such as multipliers, adders, accumulators, shifters, etc. Once the appropriate data are available in the foreground memory, the read and write operations between the foreground memory and the datapath are performed depending on the decoded instruction. In a given cycle only one word is read from the VWR to the datapath, and the result is written back to the VWR. Figure 2.3 gives an idea of the connectivity between the VWRs, the SRF and the datapath. More information on the VWR architecture can be found in [2].

2.4 Example Operation of very wide register

This section explains an arithmetic operation which uses the very wide register (VWR). Micro-architecturally, every VWR is made of single-ported cells, and the depth of the VWR, i.e., the number of single-ported registers it contains, may vary depending on the application. Each register within a VWR is accessed by indexing, which allows the programmer to distinguish the registers within the VWR. The concepts of indexing and of arithmetic operations on the VWR can be illustrated with a basic Multiply Accumulate (MAC) operation.

The example below shows the original C code and the assembly code of the Multiply Accumulate (MAC) operation using general-purpose register files and using the very wide register, in (b) and (c) respectively. Consider a 32-bit processor and a line size of 64 bits, which means that at any given instant of time we can store two words in a very wide register. The asymmetric feature of the very wide register allows a complete line to be loaded from the memory, while at the datapath end only one word is used at a time. In the MAC example, we can load two words at a time from memory into the VWR and use one word at a time; likewise, in the store operation the asymmetric feature of the VWR lets us store two words at a time.

for(i=0;i<8;i++) {
    c[i]=a[i]*b[i];
    sum=sum+c[i];
}

(a) Original C Code

.text
R0=2
R1=0
R2=8
_loop:
R3=dmem[R1+0]
R4=dmem[R1+1]
R0=R0-1
R5=R3*R4
R6=R5+R6
R1=R1+1
dmem[R2+8]=R6
R2=R2+1
if(!R0)jmp_loop
_end:

(b) Using Scalar Register Files

.text
r1=0
r2=8
VWR1=dmem[r1+0]
VWR3.0=VWR1.0*VWR1.1
VWR3.1=VWR3.0+VWR3.1
mov VWR3.1,VWR2.0
VWR1=dmem[r1+0]
VWR3.0=VWR1.0*VWR1.1
VWR3.1=VWR3.0+VWR3.1
mov VWR3.0,VWR2.1
dmem[r2+0]=VWR2
_end:

(c) Using very wide register

Example: Multiply Accumulate Operation

In the original C code, the elements of array a and array b are multiplied and the accumulation is performed using the variable sum. In the assembly code of example (b), the multiply-accumulate operation is performed using seven registers. Register R0 is used as the iterator for the loop, R1 is used as the base register to load the values from the memory and R2 is used as the base register to store the result. The values from the memory are loaded into registers R3 and R4, and after the multiply-accumulate operation the result is stored in register R6.

Example (c) illustrates the Multiply Accumulate (MAC) operation using the very wide register. Since the line size is 64 bits, we can load only two values at a time. Initially a complete line is loaded from the memory into VWR1, the results are stored in VWR3, and the accumulated results are finally moved to VWR2. After this phase, two values are stored back to the memory at a time: the load operation loads a line from the memory and the store operation stores a line to the memory. Comparing (b) and (c), we can observe that by using the very wide register the number of load/store operations is reduced.

2.5 Conclusion

In this chapter, different register organizations, including the architecture of the very wide register, were explained, and the benefits of the very wide register over other register organizations were discussed. Chapter three explains the implementation of the very wide register in a RISC architecture.

CHAPTER 3

Implementation of Very Wide Register

3.1 Introduction

In chapter two, the Very Wide Register architecture was discussed. This chapter starts with a description of the RISC processor architecture and then describes the implementation of the Very Wide Register in this RISC processor. The LTRISC32ca (LISATek 32-bit cycle-accurate RISC) processor used in my thesis is a 32-bit, four-stage pipelined architecture. The processor contains sixteen 32-bit general-purpose registers and several special-purpose registers. LTRISC32ca provides general-purpose load/store operations, complex multiplication operations and several instructions for aiding DSP applications, e.g. add-compare-select. The RISC architecture is shown in Figure 3.1.

3.2 Baseline Reference RISC Architecture Description with Standard Register File

The basic structure of the LTRISC32ca processor is shown in Figure 3.2. The important modules in the LTRISC32ca processor are:

• The Register Module

• The Memory Module

Figure 3.1: RISC Architecture

• The Pipeline Module

Figure 3.2: LTRISC32ca Processor Structure

3.2.1 The Register Module

The LTRISC32ca processor has 16 general-purpose register elements, each 32 bits wide. Registers are declared in the resource section in LISA. This section describes the two kinds of registers used in this architecture:

• Clocked Behavior of Registers

• Special Registers

Clocked Behavior of Registers

As the name suggests, the clocked behavior of registers depends on the clock: values written to registers are not applied until the next clock cycle starts. This behavior can be globally switched on or off in Processor Designer, or it can be achieved by changing the type of the resource to TClocked. For cycle-accurate models, clocked resources such as registers can be read and written in arbitrary order within one cycle: clocked resources have separate read and write ports, and a write does not influence the value of the read port in the same cycle.
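This clocked behavior can be mimicked in plain C: reads within a cycle see the old value, writes are buffered on a separate write port, and the buffered value only becomes visible when the clock edge is applied. The sketch below is only an illustrative model of the TClocked semantics, not code generated by LISATek.

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t read_port;    /* value visible to all reads in the current cycle */
    uint32_t write_port;   /* value buffered by a write in the current cycle  */
    bool     written;      /* was the register written this cycle?            */
} clocked_reg_t;

static uint32_t reg_read(const clocked_reg_t *r)
{
    return r->read_port;                 /* unaffected by writes in the same cycle */
}

static void reg_write(clocked_reg_t *r, uint32_t value)
{
    r->write_port = value;               /* does not change the read port yet */
    r->written = true;
}

/* Called once per simulated clock cycle: commit the buffered write. */
static void reg_clock(clocked_reg_t *r)
{
    if (r->written) {
        r->read_port = r->write_port;
        r->written = false;
    }
}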

Special Registers

Apart from temporary storage of data, registers are also used as counters and pointers. There are two types of special registers:

• PROGRAM COUNTER

• STACK POINTER

3.2.2 The Memory Module

LISA supports multiple types of memories as listed below:

• Ideal memories that store data without any timing information

• Non-ideal memories that may have latencies and may be connected to a bus.

In my thesis work I used ideal memories, which are the simplest way to describe a memory.

Ideal Memories

An ideal memory can be imagined as an array of a fixed size, composed of elements of a defined size, which may be accessed by an index. The simplest way to model an ideal memory is to define an array of the desired size and data type within the LISA RESOURCE section. Ideal memories are declared using the MEMORY keyword. Ideal memories are of two types:

• Data Memories are used to store data.

• Program Memories are the memories that contain the executable code.

A valid declaration of an ideal memory is shown in the following example.

RESOURCE
{
    MEMORY unsigned int pmem
    {
        BLOCKSIZE(32);
        SIZE(0x10000);
        FLAGS(R|W|X);
    };
}

BLOCKSIZE, SIZE and FLAGS are the basic memory attributes. BLOCKSIZE defines the memory width of a single memory element. The SIZE attribute specifies the number of memory elements within the array. Finally, the FLAGS attribute provides information about the accessibility of the respective memory, that is, whether the memory can be used for reading/writing data (R/W) and for storage of executable code (X). A memory map is used to transform a physical memory address into the corresponding array index.

Memory Map

Mapping between the processor’s address space and the defined memories must be established, so the simulator can load object code into the model’s memories. This is performed within one or more memory maps.

The primary job of the memory map is to establish the link between the LISA resources and the processor's address space; that is, it tells the simulator where to store the code and data segments of the loaded object file. The memory map has to provide an unambiguous mapping of addresses onto resources, so that the same address is not mapped onto different resources.

The following example shows a memory map.

RESOURCE
{
    MEMORY char memory1
    {
        BLOCKSIZE(8);
        SIZE(0x200000);
    };
    MEMORY int memory2
    {
        BLOCKSIZE(16);
        SIZE(0x100000);
    };
    MEMORY char memory3
    {
        BLOCKSIZE(32);
        SIZE(0x80000);
    };
    MEMORY_MAP
    {
        RANGE(0x0000000,0x01fffff) -> memory1[(31..0)];
        RANGE(0x0800000,0x09fffff) -> memory2[(31..1)];
        RANGE(0x1000000,0x11fffff) -> memory3[(31..2)];
    }
}

3.2.3 The Pipeline Module

A pipeline provides a mechanism for processing multiple parts of different instructions at the same time; the purpose of processor pipelines is to improve instruction throughput. The execution of an instruction is split into several parts, and each pipeline stage performs one part of the complete execution. Instructions enter at one end, progress through the pipeline and exit at the other end. The pipeline consists of four stages: fetch (FE), decode (DC), execute (EX) and writeback (WB).

The pipeline must keep track of the different instructions that are in flight. After each stage, instructions are held in pipeline registers: latches that separate the pipeline stages and store the context of the instructions in the pipeline. Figure 3.3 shows the registers in a four-stage pipeline; between two pipeline registers sits the logic that performs that part of the execution of the instruction.
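As a rough illustration of how the pipeline registers separate the stages, the C sketch below moves instruction contexts through FE/DC/EX/WB latches once per cycle. The instr_t fields and the stage functions are placeholders for illustration and do not correspond to the actual LTRISC32ca implementation.

#include <stdint.h>
#include <stdbool.h>

/* Context of one instruction as it travels through the pipeline. */
typedef struct {
    bool     valid;
    uint32_t raw;        /* fetched instruction word */
    uint32_t operand_a;  /* filled in by decode      */
    uint32_t operand_b;
    uint32_t result;     /* filled in by execute     */
} instr_t;

/* Pipeline registers (latches) between the four stages. */
typedef struct {
    instr_t fe_dc;   /* fetch   -> decode    */
    instr_t dc_ex;   /* decode  -> execute   */
    instr_t ex_wb;   /* execute -> writeback */
} pipeline_t;

static instr_t fetch(uint32_t pc)   { instr_t i = { .valid = true, .raw = pc }; return i; }
static instr_t decode(instr_t i)    { /* read operands from the register file */ return i; }
static instr_t execute(instr_t i)   { i.result = i.operand_a + i.operand_b; return i; }
static void    writeback(instr_t i) { (void)i; /* write i.result back to a register */ }

/* One clock cycle: every stage works on a different instruction,
 * then all pipeline registers advance together. */
static void pipeline_step(pipeline_t *p, uint32_t pc)
{
    writeback(p->ex_wb);
    p->ex_wb = execute(p->dc_ex);
    p->dc_ex = decode(p->fe_dc);
    p->fe_dc = fetch(pc);
}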

3.3 Architecture Description-Very Wide Registers

Figure 3.3: Four Stage Pipeline

The previous section gave an idea of the RISC architecture with scalar register files, and of its memory module, pipeline module and register module. This section explains the RISC architecture with very wide registers and the modifications made in the memory module and the register module. The basic pipeline structure remains the same in both architectures. Figure 3.4 shows the RISC architecture modified to use the very wide registers.

Compared to the reference architecture (the architecture using scalar register files), the connections from the memory to the registers and the internal organization of the registers differ.

3.3.1 Register Module

Apart from the architectural difference between the very wide register and the scalar register files, there is no big difference in the register module of the two architectures; only the declaration and the usage of the registers differ. The very wide register is also declared as a clocked register, i.e., one that depends on the clock. The declaration of a very wide register is shown below; it states that VWR is a clocked register that can hold five values simultaneously.

REGISTER TClocked VWR[0..4];

As mentioned before, apart from the very wide registers, scalar/general-purpose registers are also used for counters, iterators and variables.

3.3.2 Memory Module

As mentioned in the previous chapter, a very wide register is a group of registers with an asymmetric interface: a VWR of width four has a wide interface towards the memory (modelled here as multiple ports), such that four words can be stored or loaded simultaneously, and only one port towards the datapath. As the number of very wide registers increases, the number of interfaces towards the memory increases, which may cause address conflicts. To implement the very wide register successfully in the RISC architecture, these address conflicts must be avoided. There are two options to avoid address conflicts:

• Increasing the width of the bus which connects the memory and the very wide register. Consider a very wide register of width four, which can store 4*32 bits of data: for loading the data from the memory to the register, the bus has to be 128 bits wide, so that it can transfer all four values simultaneously.

• Partitioning the ideal memory (which is a fixed array) into smaller memories (smaller arrays), so that one memory is dedicated to each very wide register. This option is well suited to ideal memories.

To implement the very wide register without any address conflicts I opted for the second option, where the ideal memory of fixed array size (fff0000-fffffff) is partitioned into smaller memories depending on the number of words in the very wide register. The memory map used in my architecture for implementing the four-word interface between the VWR and the memory is shown below:

RAM U32 data_mem
{
    SIZE(0x10000);
    BLOCKSIZE(32,8);
    FLAGS(R|W);
};
RAM U32 data1_mem
{
    SIZE(0x10000);
    BLOCKSIZE(32,8);
    FLAGS(R|W);
};
RAM U32 data2_mem
{
    SIZE(0x10000);
    BLOCKSIZE(32,8);
    FLAGS(R|W);
};
RAM U32 data3_mem
{
    SIZE(0x10000);
    BLOCKSIZE(32,8);
    FLAGS(R|W);
};

RANGE(DMEM_START, DMEM_END)   -> data_mem[(31..0)];
RANGE(DMEM_START1, DMEM_END1) -> data1_mem[(31..0)];
RANGE(DMEM_START2, DMEM_END2) -> data2_mem[(31..0)];
RANGE(DMEM_START3, DMEM_END3) -> data3_mem[(31..0)];

3.4 Conclusion

This chapter gave an idea of the RISC architecture and its different modules (memory module, pipeline module and register module) on which the very wide register was implemented, and it explained the procedure that I used to implement the very wide register without any address conflicts.

Figure 3.4: RISC Architecture using VWR

CHAPTER 4

Results

In the previous chapter, two RISC architectures, one with scalar register files and one with very wide registers, were described. This chapter deals with the implementation of a benchmark on these architectures and compares various performance metrics, such as the number of load/store operations, the energy consumption and the execution time of the application. In order to compare the performance metrics, five RISC architectures with different register file organizations were designed:

• RISC Architecture with a 32-entry scalar register file

• RISC Architecture with a 16-entry scalar register file

• RISC Architecture with 3 very wide registers, each of width 8

• RISC Architecture with 3 very wide registers, each of width 4

• RISC Architecture with 3 very wide registers, each of width 2

As explained in chapter one, the LISATek Processor Designer accelerates the design of both custom processors and programmable accelerators, including the application-specific instruction set processors (ASIPs) that are increasingly essential to convergent system-on-chip (SoC) functionality. Processor Designer can be used to develop a wide range of processor architectures, including architectures with DSP-specific and RISC-specific features as well as SIMD and VLIW architectures.

4.1 Tool Flow

Figure 4.1 gives a brief idea of my work flow and tool flow. As shown in Figure 4.1, I started by designing the very wide register in the RISC processor and finished by calculating the power and energy consumption of the register file.

4.2 Implementation of Benchmark

A simple eight-tap Finite Impulse Response (FIR) filter is implemented as the benchmark on all five RISC architectures with their different register file organizations. Each architecture has its own resources and instruction set, and the FIR filter is implemented on each architecture in assembly language. After successful verification of the assembly code, the architecture and the application are converted to a hardware description language (VHDL or Verilog) using the LISATek Processor Generator. The implementation of the FIR benchmark is described below; a reference C sketch of the computation is given after the list.

• A 16-input, 8-tap time-multiplexed Finite Impulse Response (FIR) filter is used as the benchmark [6].

• FIR operation is performed considering that all the inputs and coeffi- cients are initially present and are stored in the memory.

• The Multiply Accumulate operation is performed if the sum of each input and each coefficient is less than 16; otherwise the result is considered as zero.

Figure 4.1: Tool Flow

Architecture                              No. of loads and stores    No. of steps to compute FIR
RISC Architecture using 32 SRF                      272                       1392
RISC Architecture using 16 SRF                      272                       1392
RISC Architecture using VWR of width 8               34                        629
RISC Architecture using VWR of width 4               68                        742
RISC Architecture using VWR of width 2              136                        773

Table 4.1: Simulation results - LISATek Processor Debugger

• In the reference RISC architecture (the architecture using a scalar register file) each input and coefficient is loaded individually from the memory to the register file, while in the VWR RISC architecture (the architecture using very wide registers) a set of contiguous values is loaded into the very wide register at once.

• In the reference architecture the results are stored to memory after completion of each MAC operation, whereas in the VWR RISC architecture a set of results, whose size depends on the width of the very wide register, is stored to memory at once.
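For reference, the computation described above can be sketched in C as follows. This is an illustrative reconstruction of the benchmark (array sizes and names follow the description above); the code that was actually mapped onto the architectures was written directly in assembly, see Appendix B, and its data layout and boundary handling differ.

#define N_OUTPUTS 16
#define N_TAPS     8

/* 8-tap FIR over 16 output samples with the benchmark's guard condition:
 * accumulate only while (input + coefficient) < 16, otherwise the output
 * sample is forced to zero. */
void fir_benchmark(const int in[N_OUTPUTS + N_TAPS - 1],
                   const int coeff[N_TAPS],
                   int out[N_OUTPUTS])
{
    for (int n = 0; n < N_OUTPUTS; n++) {
        int acc = 0;
        for (int t = 0; t < N_TAPS; t++) {
            if (in[n + t] + coeff[t] >= 16) {   /* guard from the benchmark spec */
                acc = 0;
                break;
            }
            acc += in[n + t] * coeff[t];
        }
        out[n] = acc;
    }
}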

4.3 Simulation Results

The simulation results include results from the LISATek Processor Debugger and simulation results from ModelSim.

4.3.1 Results-LISATek Processor Debugger

The architecture specification, i.e., the pipeline stages, resources and assembly instructions, is described in LISATek Processor Designer. The LISATek Processor Debugger is then used to analyze the architecture while it runs the benchmark. The Finite Impulse Response (FIR) filter is implemented on the reference RISC architecture (the architecture using scalar register files) and on the VWR RISC architectures (the architectures using very wide registers). The parameters of all five architectures are tabulated in Table 4.1.

The results in Table 4.1 indicate that the RISC architecture using a very wide register of width 8 needs the fewest load/store operations and executes the FIR operation in the fewest steps of all the architectures. Overall, the RISC architectures using very wide registers need fewer load/store operations and execute the FIR operation in fewer steps than the RISC architectures using scalar register files.

4.3.2 Results-Modelsim

All five architectures and the respective applications are converted to VHDL by the LISATek Processor Generator. A testbench is also generated along with the other files. The functionality of the generated code is verified using ModelSim. Screenshots of the simulation in ModelSim are shown in Figure 4.2.

The execution times of the FIR application on the different architectures are plotted in Figure 4.3. The figure shows that in the architectures using very wide registers the execution time of the FIR operation is lower than in the architectures using scalar register files.

4.4 Synthesis Results

Logic synthesis is performed after verification of the VHDL code in ModelSim. Synthesis is performed with a 90 nm target library, including wire-load models, and with a clock period of 3 ns. A netlist and SDF (Standard Delay Format) files are written after compilation. The generated netlist is used in a post-synthesis simulation in order to ensure that the functionality is unchanged compared to the RTL-level description. Power estimation is done after the post-synthesis simulation by reading the backward-annotated SAIF file (with the switching activity) from ModelSim into Synopsys. The dynamic power (switching power + internal power + leakage power) is reported after the post-synthesis simulation.

Figure 4.2: Simulation in Modelsim

The total dynamic power consumed by the different modules in the VWR RISC architecture (using three VWRs of width 8 plus 8 scalar registers) and in the reference RISC architecture (using 32 scalar registers) is plotted in the pie charts of Figures 4.4 and 4.5.

The energy consumption of the register file in each architecture, and of the entire architecture, is calculated by multiplying the power consumption by the execution time of the application. The energy consumed by the register file in each of the five architectures is plotted in Figure 4.6, which shows that the very wide register of width 8 consumes less energy than the other register files.
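In equation form, the estimate used here is simply the product of the reported average power and the simulated execution time, applied once to the register file alone and once to the whole processor:

    E_{\mathrm{RF}} = P_{\mathrm{RF}} \times t_{\mathrm{exec}}, \qquad E_{\mathrm{proc}} = P_{\mathrm{proc}} \times t_{\mathrm{exec}}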

Figure 4.3: Execution time of an application

The energy consumed by the RISC architecture with each of the different register files is plotted in Figure 4.7; among all the architectures, the RISC architecture using three very wide registers, each of width 4, consumes the least energy.

4.5 Conclusion

From all the simulation and synthesis results it can be concluded that very wide registers give better performance and consume less energy than a scalar register file.

Figure 4.4: Power Consumption of VWR RISC Architecture

Figure 4.5: Power Consumption of Reference RISC Architecture

Figure 4.6: Energy Consumption of Register File

Figure 4.7: Energy Consumption of Processor

CHAPTER 5

Conclusion and Future Work

The register file is a power-hungry part of a processor, and an increase in the number of ports has a negative impact on its energy consumption. In this thesis I experimented with the IMEC very wide register organization, which has asymmetric interfaces to the memory and to the datapath. A Finite Impulse Response (FIR) filter was mapped onto the very wide register architecture and compared to a scalar register file; the results show about a 2X reduction in energy consumption on the FIR filter benchmark.

5.1 Further Work

I see a few directions for further work on this project:

• Optimizing the VHDL code generated by LISATek to reduce the power and energy consumption.

• Experimenting with the very wide register on a multi-issue processor.

• Implementing a few other benchmarks on the very wide register.

• Performing physical synthesis and place-and-route to ensure that the wire lengths do not worsen the VWR-based design.

APPENDIX A

LISATek

This appendix describes the LISATek tool. Application-Specific Instruction-Set Processors (ASIPs) are becoming increasingly popular in the world of customized, application-driven System-on-Chip (SoC) designs. Efficient ASIP design requires an iterative architecture exploration loop: gradual refinement of the processor architecture starting from an initial template. LISATek is one tool to accomplish this task: it helps to detect bottlenecks in embedded applications, to implement application-specific processor instructions, and it automatically generates the required software tools as well as synthesizable hardware.

A.1 Introduction

LISATek is an automated embedded processor design and software development tool generation environment. It is an optimization environment which slashes months from processor hardware design time and years from the creation of processor-specific software development tools. It enables even design teams with no processor development expertise to create advanced processors, and it can also generate software development tools for processors that have not been designed using LISATek's automated hardware design capability.

LISATek accelerates the design of both custom and standard processors, including application-specific instruction-set processors (ASIPs), and it can be used to develop a wide range of processor architectures including DSP, RISC, SIMD, VLIW and superscalar. The key to LISATek's automation is its Language for Instruction Set Architectures, LISA 2.0. LISA 2.0 enables the creation of a single processor model as the source for the automatic generation of the instruction-set simulator (ISS), the complete suite of software development tools, and the synthesizable RTL code. LISATek enables the designer to optimize the instruction-set design, the processor micro-architecture and the memory system, including caches.

The LISATek processor design platform is built around the LISA 2.0 Architecture Description Language (ADL). From a given processor model written in the LISA 2.0 ADL, several processor development tools, such as the instruction-set simulator, C compiler, assembler and linker, are automatically generated to support architecture exploration. Register Transfer Level (RTL) hardware models in languages such as VHDL, Verilog and SystemC can also be generated from the LISA model for hardware implementation. The figure below shows the processor design flow supported by LISATek.

A.2 LISATek tools

This section gives an introduction to the LISATek tools:

1. Processor Designer, to describe and generate the model.

2. Processor Debugger, to simulate and benchmark the created model.

3. Processor Generator, to automatically obtain a hardware description of the model.

Figure A.1: LISATek Design flow

A.2.1 LISATek Processor Designer

LISATek Processor Designer is a tool for the design and optimization of both custom and embedded processors. It builds on the other LISATek tools to let one model the processor, analyze and optimize its behavior, generate applications for it and then create the RTL code for the architecture. The most important feature of the tool is processor modeling, which allows one to describe the processor architecture quite easily with the necessary features, such as memories, registers, pipelines and instructions.

LISA architecture descriptions are composed of two main components:

Resource definitions: here we model the storage elements of the real hardware of the processor, like registers, memories, buses and pipelines.

Operations: operations are basic objects, abstracted from the hardware, that cover the instruction set, timing and behavior.

Resource section

The resource section consists of the definitions of all the objects which are required to build the memory model and the resource model. The handling of variables (declarations and passing data values to variables) is similar to the handling of variables in the C language. The resource section allows the declaration of the following types of objects.

– Simple resources, such as:
  Registers and register files
  Signals
  Flags
  Ideal memory arrays

– Pipeline structures for instructions and data paths

– Pipeline registers for storing data on its way from one pipeline stage to the next

– Non-ideal memories, such as:
  Caches
  Buses

– Memory maps describing:
  The mapping of memories into the processor's address space
  The connectivity between the memory and bus modules

LISATek Operations

Processor behavior is modelled by LISATek operations, which are its basic building blocks. The word operation may refer to an instruction, a set of similar instructions, a part of a single instruction and so on. Two operations must be present in any LISATek model:

OPERATION reset: this operation is invoked at simulation start-up; it initializes the registers and the PC to the right values and flushes the pipeline.

OPERATION main: this operation is invoked at every LISATek simulator clock cycle. It performs two kinds of actions, execute and shift, and it also declares the next operation to be activated, such as fetch.

LISATek operations are structured in a hierarchical way. The idea of referring between operations, where two different operations can refer to the same lower-level operation, is similar to the idea of calling functions in C/C++. An important feature of LISATek is that a high-level operation can refer to a low-level operation selected out of a given set.

A.2.2 Generating the model

Once the LISATek model has been described, it is compiled to check its correctness and to generate the tools. We can generate the tools in two different ways: using the graphical version of LISATek Processor Designer or using command-line input.

Compilation

LISATek processor generation starts by translating the code with LISATek syntax into C++ files. These C++ files are the sources for the subsequent generation of the other tools. Generation of the model is initialized by the command lmake <model>.

Generating the simulator

We can generate two types of simulators for a given LISATek model: a dynamic simulator and a static simulator. Typing lmake simlib in the command line generates the simulator libraries: the source C++ files are processed and linked either into architecture.so, which contains the generated dynamic libraries, or into a file named architecture.a containing the static libraries.

Generating Assembler-Linker-Disassembler

Typing lmake assembler generates an assembler for the processor: an executable that translates assembly code written for the processor into a binary file, using the SYNTAX and CODING sections of the LISA OPERATIONs. The generated binary file is linked into an executable file by the linker tool llnk, which is built by typing lmake linker. The disassembler tool can be built with the command lmake disassembler; it re-translates an executable file for the LISATek processor into a file that contains both the assembly code and the coding of the instructions as they are read by the processor, which checks whether the processor interprets the instructions written for it correctly.

A.2.3 LISATek Processor Debugger

The LISATek Processor Debugger is a very powerful and user-friendly tool to simulate a LISATek model and check the processor behavior. The debugger monitors, in every simulation cycle, every part of the processor model, such as the registers, memories and pipelines. One window shows the disassembly code, where it is possible to see the last instruction fetched by the processor and how many times an instruction has been executed, which is most useful for loops. Another window shows the corresponding lines of the source code, assembly or C++, depending on how the benchmark is written. Simulation can be run in three different ways:

1. Free run: runs the simulation to the end or to a previously set breakpoint.

2. Run-debug: runs the simulation showing, in each cycle, the values stored in the memory, the processor registers and the pipeline registers. This mode is slower than free run.

3. Run cycle: allows the architecture to be debugged step by step.

A.2.4 LISATek Processor Generator

Processor Generator is a powerful LISATek tool that allows the generation of Hardware Description Language (HDL) code.

LISATek Processor Generator options

Processor Generator offers various configuration options:

General settings simply allows verbose mode to be enabled or disabled and the logfile name to be specified.

Target settings is the section in which the HDL language is chosen, between Verilog and VHDL, the clock and reset are defined, and the reset properties, such as active high/low and synchronous/asynchronous, are set.

Script Generation allows one to define which scripts are generated, based on the language (VHDL or Verilog) and on the simulation and synthesis tools, e.g. Mentor ModelSim and Synopsys Design Compiler.

Resource Properties is the section regarding memories. It determines, among other things, whether a memory is internal or external, as well as the names of the memory files for hardware simulation.

Figure A.2: Example OAT Domain

Figure A.3: Screenshot of Processor debugger

LISATek HDL optimizations

The Processor Generator provides a number of options for manipulating the generation of the HDL code with respect to optimized area, timing or power consumption. There are four major options to optimize the generated code:

1. Resource Sharing: resource sharing is limited to path sharing, i.e., the automatic detection and sharing of exclusive accesses to LISA resources.

2. Decision Minimization: this option reduces the multiplexing logic for resource accesses by multiplexing only that part of a path which is actually relevant for the decision whether an access is executed or not.

3. Hierarchical Pattern Matching: hierarchical pattern matching can reduce the gate count by generating comparators with the smallest bit width; however, the critical path may increase, as the hierarchical comparator structure introduces additional levels of logic.

4. Condition Decoupling for Group Calls and Expressions:

Figure A.4: Screenshot of Processor Generator

APPENDIX B

Assembly Code

This appendix contains the assembly code of the FIR filter operation mapped onto the very wide register architecture and onto the scalar register file architecture.

B.1 Assembly code using Very Wide Register

The assembly code of the FIR filter using very wide registers is shown below. I have used three very wide registers (vr1, vr2 and vr3), each of width 4. Apart from the very wide registers, eight scalar registers (r1-r8) are also used, as variables and counters.

.include "fir.S" .text r1=16 ; iterator for the loop r5=0 ; address pointer to the inputs r6=24 ; address pointer to hold ouput mov vr3.3,10 r3=4 _loop:

46 mov vr3.1,0 r4=4

vr2.0=dmem[r5+0] ;loading inputs from memory to VWR2 vr1.0=dmem[r4+0] ;loading coefficients from memory to VWR1

mov vr3.2,0 ;intializing VWR3.2 to zero vr3.3=vr2.1 + vr1.1 ; ;Begining of FIR Operation vr3.1=vr2.0*vr1.0 vr3.2=vr3.2+vr3.1 vr3.1=vr2.1*vr1.1 nop nop vr3.2=vr3.2+vr3.1 nop vr3.1=vr2.2*vr1.2 nop nop vr3.2=vr3.1+vr3.2 vr3.1=vr2.3*vr1.3 nop nop vr3.2=vr3.1+vr3.2 r4=r4+1 r5=r5+4 nop vr1.0=dmem[r4+0] vr2.0=dmem[r5+0] nop vr3.1=vr2.0*vr1.0 nop nop

47 vr3.2=vr3.1+vr3.2 vr3.1=vr2.1*vr1.1 nop nop vr3.2=vr3.1+vr3.2 nop vr3.1=vr2.2*vr1.2 nop nop vr3.2=vr3.1+vr3.2 vr3.1=vr2.3*vr1.3 nop nop vr3.2=vr3.2+vr3.1

r2=(vr3.3 < 16) ;checking the condition r3=r3-1 r1=r1-1 r5=r5-3 if(!r2)jmp _loop3 if(!r3) jmp _store ;if VWR3 is full with outputs Store ;operation is performed if(r1) jmp _loop ;Checks whether inputs are finished or not _loop3: dmem[r6+0]=vr3.0 ;storing zero in memory if i/p + coeff > 16 r5=r5+1 jmp _loop _store: ;Storing result in Very Wide Register dmem[r6+0]=vr3.2 r6=r6+1 jmp _loop _end: ;Macros storing the inputs and coeff in memory initially

48 .section data1, "aw",@progbits _startdata1: generate_block_data(1) _startcoeff1: generate_block_coeff(7)

.section data, "aw",@progbits _startdata0: generate_block_data(0) _startcoeff0: generate_block_coeff(8)

.section data2,"aw",@progbits _startdata2: generate_block_data(2) _startcoeff2: generate_block_coeff(6)

.section data3,"aw",@progbits _startdata3: generate_block_data(3) _startcoeff3: generate_block_coeff(5)

B.2 Assembly code using Scalar Register File

The assembly code of the FIR filter using scalar register files is shown below. This assembly code is mapped onto the RISC architectures using 32 and 16 scalar register files.

.include "fir.S" .text r1=16 ; iterator for the loop r4=0 ; address pointer to the inputs

49 r5=24 ; address pointer to hold ouput _loop1: r6=16 ;Pointer to access the coeff data r2=8 ;Iterator of the _loop2 r11=0 r9=0 _loop2: r7=dmem[r4+0] ;loading inputs from memory to VWR2 r8=dmem[r6+0] ;loading coefficients from memory to VWR1 nop nop r10=r7+r8 r9=r7*r8 nop ;intializing VWR3.2 to zero r3=(r10 < 16) ;checking if the i/p+coeff < 16 ;if < 16 proceed the loop else store0 in the o/p r11+=r9 r2-=1 if(!r3)jmp _store ;if r3=0 jump to store loop r4=r4+1 r6=r6+1 if(r2)jmp _loop2 r1=r1-1 r4=r4-7 dmem[r5+0]=r11 ;store tje o/p to memory r5=r5+1 _start: if(r1) jmp_loop1 ;if r1=0 go to loop ;else continue to loop1 jmp _end _store: r11=0 r1=r1-1 nop

50 dmem[r5+0]=r11 r5=r5+1 jmp _start _end; nop nop

;Macros storing the inputs and coeff in memory initially .data _startdata: generate_data(0) _startcoeff: generate_coeff(8) nop

Bibliography

[1] S. Rixner, W. J. Dally, B. Khailany, P. R. Mattson, U. J. Kapasi and J. D. Owens, "Register Organization for Media Processing," in Proceedings of HPCA, January 2000, pp. 375-386.

[2] P. Raghavan, A. Lambrechts, M. Jayapala, F. Catthoor, D. Verkest and H. Corporaal, "Very Wide Register: An Asymmetric Register File Organization for Low Power Embedded Processors," IMEC vzw, Kapeldreef 75, 3001 Leuven, Belgium.

[3] B. Amrutur and M. Horowitz, "Speed and Power Scaling of SRAMs," IEEE Journal of Solid-State Circuits, vol. 35, February 2000.

[4] R. J. Evans and P. D. Franzon, "Energy Consumption Modeling and Optimization for SRAMs," IEEE Journal of Solid-State Circuits, vol. 35, February 2000.

[5] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele and A. Vandecappelle, Custom Memory Management Methodology: Exploration of Memory Organization for Embedded Multimedia System Design, Kluwer Academic Publishers, Boston, 1998, pp. 571-579.

[6] TI DSP Benchmark Suite, http://focus.ti.com/docs/toolsw/folders/print/sprc.092.html

[7] J. B. Domelevo, "Working on the Design of a Customizable Ultra-Low-Power Processor: A Few Experiments," Master's thesis, ENS Cachan Bretagne and IMEC, September 2005.
