SCALP: a Superscalar Asynchronous Low-Power Processor Philip Brian

217

Copy Link

SCALP: A Superscalar Asynchronous Low-Power Processor A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science and Engineering Philip Brian Endecott Department of Computer Science 1996 Contents Contents . 2 List of Figures. 6 List of Tables . 8 Abstract. 10 Declaration . 12 Copyright and Intellectual Property Rights . 12 Acknowledgements. 13 The Author . 13 Chapter 1: SCALP: A Superscalar Asynchronous Low-Power Processor. 14 1.1 The Importance of Power Efficiency . 15 1.2 Low Power Design Techniques . 16 1.2.1 Low-Level Power Efficiency . 16 Process Technology. 16 Transistor Level Design . 17 Charge Recovery and Adiabatic Systems . 17 Logic Optimisation . 18 Supply Voltage Adjustment . 18 Parallelism. 19 Precomputation . 19 Clocking Schemes and Asynchronous Logic . 19 1.2.2 Higher-Level Power Efficiency: Microprocessors . 20 External Memory Bandwidth . 20 Cache Characteristics . 20 Datapath Arrangement . 21 1.3 Previous Low Power Processors. 21 1.4 Overview of the Thesis. 22 Chapter 2: Reducing Transitions . 24 2.1 Code Density . 24 2.1.1 Variable Length Instructions . 27 2.1.2 Register Specifiers . 29 2.1.3 Previous High Code Density Processors . 31 D16 . 31 Thumb . 32 Transputer . 32 2.2 Datapath Activity . 33 Chapter 3: Parallelism . 35 3.1 Parallelism and Power . 35 3.2 Power, Parallelism and Cost . 37 3.3 Parallelism in Special Purpose Processors . 38 3.4 Parallelism in General Purpose Processors. 39 3.5 Pipelining . 40 3.5.1 Branch Instructions. 43 3.5.2 Dependencies . 45 3.5.3 Different Pipeline Arrangements . 48 - 2 - Shorter Pipelines . 48 Longer Pipelines . 48 3.6 Superscalar Parallelism. 49 3.6.1 Dependencies . 50 3.6.2 Branch Instructions. 54 3.6.3 Observations . 54 3.6.4 Simpler forms of Multiple Functional Unit Parallelism . 56 3.7 Speculative Execution . 57 Chapter 4: Asynchronous Logic. 59 4.1 Asynchronous Logic and Power Efficiency . 59 4.1.1 Asynchronous Circuits can be Faster . 59 4.1.2 Asynchronous Circuits can be Smaller. 61 4.1.3 Asynchronous Logic in Variable Demand Systems . 62 4.1.4 Asynchronous Logic and Dynamic Supply Voltage Adjustment. 63 4.1.5 Asynchronous Logic Doesn’t Waste Transitions . 64 4.1.6 Other benefits of Asynchronous Logic. 65 4.1.7 Tangram and the Philips DCC Error Corrector . 66 4.1.8 Conclusion . 67 4.2 Asynchronous Pipelines . 67 4.2.1 Forwarding in Asynchronous Pipelines . 69 4.2.2 Conditional Forwarding . 71 4.2.3 Explicit Forwarding . 73 4.3 Asynchronous Superscalar Parallelism. 74 4.3.1 Simple Superscalar Structures . 74 4.3.2 Superscalar Forwarding . 76 4.4 Previous Asynchronous Processors . 77 The Caltech Asynchronous Processor. 77 AMULET1 and AMULET2 . 79 NSR . 80 Fred . 81 Counterflow Pipeline Processor . 82 Hades. 83 ECSTAC . 83 TITAC . 84 ST_RISC . 84 FAM . 84 STRiP . 84 4.4.1 Observations . 85 Chapter 5: The SCALP Architecture . 87 5.1 Explicit Forwarding and Non-Global Registers . 88 5.2 Deadlocks and Determinism. 94 5.3 Other SCALP Features . 99 5.4 A SCALP Instruction Set . 100 5.5 The SCALP Programming Model . 104 5.6 Interrupts and Exceptions . 115 5.7 Related Work . 117 5.7.1 Stack Processors . 117 5.7.2 Architectural Queues . 117 - 3 - 5.7.3 Explicit Forwarding . 118 Chapter 6: Implementation . 120 6.1 Asynchronous Design Styles . 120 6.1.1 Timing Models for Control Circuits . 122 6.1.2 Timing Models for Datapaths. 122 6.1.3 Asynchronous Signalling Protocols . 123 6.2 Implementation Methodology . 125 6.3 Overview of the Design . 129 Memory Interface . 129 Fetcher. 129 FIG Fifo. 131 Issuer . ..

Recommended publications

Rlsc & DSP Advanced Microprocessor System Design

Purdue University Purdue e-Pubs ECE Technical Reports Electrical and Computer Engineering 3-1-1992 RlSC & DSP Advanced Microprocessor System Design; Sample Projects, Fall 1991 John E. Fredine Purdue University, School of Electrical Engineering Dennis L. Goeckel Purdue University, School of Electrical Engineering David G. Meyer Purdue University, School of Electrical Engineering Stuart E. Sailer Purdue University, School of Electrical Engineering Glenn E. Schmottlach Purdue University, School of Electrical Engineering Follow this and additional works at: http://docs.lib.purdue.edu/ecetr Fredine, John E.; Goeckel, Dennis L.; Meyer, David G.; Sailer, Stuart E.; and Schmottlach, Glenn E., "RlSC & DSP Advanced Microprocessor System Design; Sample Projects, Fall 1991" (1992). ECE Technical Reports. Paper 302. http://docs.lib.purdue.edu/ecetr/302 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. RISC & DSP Advanced Microprocessor System Design Sample Projects, Fall 1991 John E. Fredine Dennis L. Goeckel David G. Meyer Stuart E. Sailer Glenn E. Schmottlach TR-EE 92- 11 March 1992 School of Electrical Engineering Purdue University West Lafayette, Indiana 47907 RlSC & DSP Advanced Microprocessor System Design Sample Projects, Fall 1991 John E. Fredine Dennis L. Goeckel David G. Meyer Stuart E. Sailer Glenn E. Schrnottlach School of Electrical Engineering Purdue University West Lafayette, Indiana 47907 Table of Contents Abstract ...................................................................................................................
Ronics APRIL 28, 1988

DATA GENERAL'S 88000 RISC CHIPS WILL HIT 100 MIPS/32 SMART-POWER IC SET ALL BUT ELIMINATES CIRCUIT DESIGN/93 A VNU PUBLICATION t• APRIL 28, 1988 ronics A CLOSE LOOK AT MOTOROLA'S 88000/75 CHOOSING A RISC CHIP: WHAT DRIVES CUSTOMERS?/85 This year,you'll hear alot of claims that "systems"design automation has arrived. database and user interface? This 32-bit Does it extend from design processor board At Mentor Grapimics/ definition through to was designed and PCB simulated on layout and output to manufacturing? Mentor Graphics we know better workstations by • Do you have more ASIC libraries Sequent Com- supported on your workstation than puter Systems for And so do our any other EDA vendor? Can you its multi-pro- cessor Symmetry include ASICs in board simulations? computer system. It contains over customers. Are your tools capable of manag- 175 IC compo- ing over 1000-page product docu- nents including They preach. We practice. 80386 pro- mentation projects from start to cessors, a14,000- Skeptical about "systems" elec- finish? gate standard cell and two 10,000- tronic design automation? Have you integrated mechanical gate arrays. You should be. Because in many packaging and analysis into the elec- cases, it's atriumph of form over tronic design and layout process? reason. Over 70% are repeat cus- content. In the end, there's only one Anything less than aperfect score tomers who've realized genuine practical yardstick for evaluating a is atotal loss. And aperfect score value added from our products and systems design solution. And that's does not mean just acheck in every seek to expand their competitive how many successful products it has box.
Comparative Architectures

Comparative Architectures CST Part II, 16 lectures Lent Term 2005 Ian Pratt [email protected] Course Outline 1. Comparing Implementations Developments fabrication technology • Cost, power, performance, compatibility • Benchmarking • 2. Instruction Set Architecture (ISA) Classic CISC and RISC traits • ISA evolution • 3. Microarchitecture Pipelining • Super-scalar • { static & out-of-order Multi-threading • Eﬀects of ISA on µarchitecture and vice versa • 4. Memory System Architecture Memory Hierarchy • 5. Multi-processor systems Cache coherent and message passing • Understanding design tradeoﬀs 2 Reading material OHP slides, articles • Recommended Book: • John Hennessy & David Patterson, Computer Architecture: a Quantitative Approach (3rd ed.) 2002 Morgan Kaufmann MIT Open Courseware: • 6.823 Computer System Architecture, by Krste Asanovic The Web • http://bwrc.eecs.berkeley.edu/CIC/ http://www.chip-architect.com/ http://www.geek.com/procspec/procspec.htm http://www.realworldtech.com/ http://www.anandtech.com/ http://www.arstechnica.com/ http://open.specbench.org/ comp.arch News Group • 3 Further Reading and Reference M Johnson • Superscalar microprocessor design 1991 Prentice-Hall P Markstein • IA-64 and Elementary Functions 2000 Prentice-Hall A Tannenbaum, • Structured Computer Organization (2nd ed.) 1990 Prentice-Hall A Someren & C Atack, • The ARM RISC Chip, 1994 Addison-Wesley R Sites, • Alpha Architecture Reference Manual, 1992 Digital Press G Kane & J Heinrich, • MIPS RISC Architecture 1992 Prentice-Hall H Messmer, • The Indispensable
Unit 1 Evolution of the Microprocessor Structure 1.1 Introduction

Subject Code : DEL34 Subject Title : Microprocessor Structure of the Course Content BLOCK 1 Introduction Unit 1: Evolution of Microprocessors Unit 2: Advantages of Microprocessors Unit 3: Various MPU Families (SSI, LSI, VLSI, SLSI) BLOCK 2 8085 Unit 1: Introduction Unit 2: Architecture of 8085 Unit 3: Block and Pin Diagram of 80851 and it’s functions Unit 4: BUS Details BLOCK 3 8085 Programming Unit 1: Instruction formats & Addressing Modes Unit 2: Instruction Set and It’s Cycle Unit 3: Timing Diagrams and Status Signals Unit 4: Simple Programs BLOCK 4 8085 Interfacing Unit 1: Memory mapping Unit 2: Interrupts Unit 3: I/O Peripheral Interfacing BLOCK 5 16 bit Microprocessor Unit 1: Introduction to 8086 Unit 2: Architecture of 8086 Unit 3: Block and Pin Diagram of 8086 and it’s functions Unit 4: BUS Details Books : 1. 8085 Microprocessor by Ramesh gaonkar by Penram Publishers 2. 8086 Microprocessor by Douglas hall Unit 1 Evolution of the Microprocessor Structure 1.1 Introduction 1.2 Objectives 1.3 The Breakthrough in Microprocessors 1.4 What led to the development of microprocessors? 1.5 How a microprocessor works 1.6 Archictecture of a microprocessor 1.7 Generation of microprocessors 1.8 Companies associated with microprocessors 1.9 Microprocessors Today 1.10 Where is the industry of microprocessors going? 1.11 Summary 1.12 Keywords 1.13 Exercise 1.1 Introduction The Collegiate Webster dictionary describes microprocessor as a computer processor contained on an integrated-circuit chip. In the mid-seventies, a microprocessor was defined as a central processing unit (CPU) realized on a LSI (large-scale integration) chip, operating at a clock frequency of 1 to 5 MHz and constituting an 8-bit system.
LCSH Section Numerals

0 (Group of artists) 1c Magenta (Stamp) 2e children USE Zero (Group of artists) USE British Guiana One-Cent Magenta (Stamp) USE Twice-exceptional children 0⁰ latitude 1I (Interstellar object) 2nd Avenue (Manhattan, New York, N.Y.) USE Equator USE ʻOumuamua (Interstellar object) USE Second Avenue (Manhattan, New York, N.Y.) 0⁰ meridian 1I/2017 U1 (Interstellar object) 2nd Avenue (Seattle, Wash.) USE Prime Meridian USE ʻOumuamua (Interstellar object) USE Second Avenue (Seattle, Wash.) 0-1 Bird Dog (Reconnaissance aircraft) 1I/ʻOumuamua (Interstellar object) 2nd Avenue West (Seattle, Wash.) USE Bird Dog (Reconnaissance aircraft) USE ʻOumuamua (Interstellar object) USE Second Avenue West (Seattle, Wash.) 0th law of thermodynamics 1P/ Halley (Comet) 2nd law of thermodynamics USE Zeroth law of thermodynamics USE Halley's comet USE Second law of thermodynamics 1,000 Year Monument (Novgorod, Russia) 1st Avenue (Seattle, Wash.) 2P/Encke (Comet) USE Tysi︠a︡cheletie Rossii (Novgorod, Russia) USE First Avenue (Seattle, Wash.) USE Encke comet 1,4-beta-D-glucan cellobiohydrolase 1st Avenue West (Seattle, Wash.) 2U 2030+40 (Astronomy) USE Cellulose 1,4-beta-cellobiosidase USE First Avenue West (Seattle, Wash.) USE Cygnus X-3 1 1/2 Strutter (Military aircraft) 1st century, A.D. 3-(1-piperazino)benzotrifluoride USE Sopwith 1 1/2 Strutter (Military aircraft) USE First century, A.D. USE Trifluoromethylphenylpiperazine 1-2 Montague Place (London, England) 1st Hill Park (Seattle, Wash.) 3.1 Tongnip Sŏnŏn Kinyŏmtʻap (Seoul, Korea) BT Office buildings—England
Lecture-I an Overview of Microprocessor the First

Lecture-I An Overview of Microprocessor The first question comes in a mind "What is a microprocessor?”. Let us start with a more familiar term computer. A digital computer is an electronic machine capable of quickly performing a wide variety of tasks. They can be used to compile, correlate, sort, merge and store data as well as perform calculations. A digital computer is different from a general purpose calculator in that it is capable of operating according to the instructions that are stored within the computer whereas a calculator must be given instructions on a step by step basis. By the definition a programmable calculator is a computer. Historically, digital computers have been categorized according to the size using the words large, medium, minicomputer and microcomputer. In the early years of development, the emphasis was on large and more powerful computers. Large and medium sized computers were designed to store complex scientific and engineering problems. These computers were accessible and affordable only to large corporations, big universities and government agencies. In the 1960s’ computers were accessible & affordable only to large corporations, big universities & government agencies, In late 1960s, minicomputers were available for use in a office, small collage, medium size business organization, small factory etc. As the technology has advanced from SSI to VLSI & SLSI (very large scale integration & super large scale integration) the face of the computer has changed. It has now become possible to build the control processing unit (CPU) with its related timing functions on a single chip known as microprocessor. A microprocessor combined with memory and input/output devices forms a microcomputer.
Cpus Used in Personal Computers Intel Processors Intel Corporation Is

CPUs Used in Personal Computers Intel Processors Intel Corporation is the largest manufacturer of microchips in the world and is historically the leading provider of PC chips. Intel invented the microprocessor in 1971 with the 4004 model. This led to the first microcomputers in 1975. Intel’s success in this market was not guaranteed until 1981 when IBM released the first IBM PC based on the Intel 8088 processor. Since 1981 all IBM PCs and compatibles based on the IBM design have been created around Intel chips. The 8088 was the first chip used in an IBM PC but they then went back to the 8086 and this design standard was used in all further chips – 80286, 80386, 84086, Pentium, Pentium Pro, Pentium II, Pentium III, Celeron and Xeon – often referred to as the 80x86 line. The basic structural design (architecture) of each chip has become more sophisticated and complex. The 8086 contained only a few thousand transistors whereas the Xeon processors contain 9.5 million. This increasingly complex architecture is primarily responsible for the increasing power and speed of the Intel processor line. Pre-Pentium Processors Intel’s first processors were simple by today’s standards but provided a level of computing never seen before in a single processing chip. As the 8086 à 80286 Intel developed methods for fitting more transistors onto the surface of its chips so processors could handle more instructions per second and access greater amounts of memory. A major improvement that came with the 80386 is called virtual 8086 mode. In this mode a single 80386 chip could achieve the processing power of 16 separate 8086 chips each running a separate copy of the operating system.
The History of the Microprocessor- Autumn 1997

♦ The History of the Microprocessor Michael R. Betker, John S. Fernando, and Shaun P. Whalen Invented in 1971, the microprocessor evolved from the inventions of the transistor (1947) and the integrated circuit (1958). Essentially a computer on a chip, it is the most advanced application of the transistor. The influence of the microprocessor today is well known, but in 1971 the effect the microprocessor would have on every- day life was a vision beyond even those who created it. This paper presents the his- tory of the microprocessor in the context of the technology and applications that drove its continued advancements. Introduction The microprocessor, which evolved from the The history of the microprocessor can be divided inventions of the transistor and the integrated circuit into five stages: (IC), is today an icon of the information age. The per- • The birth of the microprocessor, vasiveness of the microprocessor in this age goes far • The first microcomputers, beyond the wildest imagination at the time of the first • A leading role for the microprocessor, microprocessor. From the fastest computers to the • The promise of reduced instruction set com- simplest toys, the microprocessor continues to find puter (RISC), and new applications. • Microprocessors of the 1990s. The microprocessor today represents the most These five stages define a rough chronology, with complex application of the transistor, with well over some overlap. Each stage could be said to reflect a 10 million transistors on some of the most powerful generation of microprocessors, with corresponding microprocessors. In fact, throughout its history, the generations of applications. For each stage, we discuss microprocessor has always pushed the technology of representative microprocessors and their key applica- the day.
Advanced Communications Project

Advanced Communications Project Shipboard Communications Center Modernization Recommendations Report ELO DEV PM & EN H T C C R E A N E T S E Prepared for E R R The United States Coast Guard Research & Development Center U 1082 Shennecossett Rd. N D I R T A Groton, CT 06340-6096 E U D G S T T A S T E S C O A By PRC Inc. and VisiCom Laboratories Inc. Via August 1995 Volpe NTSC CUTTER MODERNIZATION REPORT FOR THE UNITED STATES COAST GUARD CUTTER COMMUNICATIONS CENTER MODERNIZATION PROJECT 14 July 1995 Submitted to: PRC, Inc. 1 Kendall Square Building 200, Suite 2200 Cambridge, MA 02139 Submitted by: VisiCom Laboratories, Inc. 41 N. Jefferson Street Pensacola, FL 32561 Executive Summary EXECUTIVE SUMMARY Background. This report presents the results of a communications system modernization study for the radio room of the U.S. Coast Guard (USCG) high endurance cutter (WHEC) and medium endurance cutter (WMEC). A four step sequential analysis approach was undertaken in the production of this report. Step 1 characterized the WHEC and WMEC operational environment. Based upon this environmental characterization, the mission and operational requirements were then determined in step 2. During step 3, technical requirements were derived from the operational requirements. Finally in step 4 and from the technical requirements, a recommended communications modernization approach and architecture was proposed. Previous trade-off studies and analysis have focused on identifying and assessing requirements for functional modifications and additions to existing system components supporting USCG operations. This report presents options and recommendations to the USCG on how to improve USCG communications capabilities within a modular architectural framework that is independent of specific spectrum, radios, modems or processing equipment.
Vector Microprocessors

Vector Microprocessors by Krste AsanoviÂc B.A. (University of Cambridge) 1987 A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA, BERKELEY Committee in charge: Professor John Wawrzynek, Chair Professor David A. Patterson Professor David Wessel Spring 1998 The dissertation of Krste AsanoviÂc is approved: Chair Date Date Date University of California, Berkeley Spring 1998 Vector Microprocessors Copyright 1998 by Krste AsanoviÂc 1 Abstract Vector Microprocessors by Krste AsanoviÂc Doctor of Philosophy in Computer Science University of California, Berkeley Professor John Wawrzynek, Chair Most previous research into vector architectures has concentrated on supercomputing applications and small enhancements to existing vector supercomputer implementations. This thesis expands the body of vector research by examining designs appropriate for single-chip full-custom vector microprocessor imple- mentations targeting a much broader range of applications. I present the design, implementation, and evaluation of T0 (Torrent-0): the ®rst single-chip vector microprocessor. T0 is a compact but highly parallel processor that can sustain over 24 operations per cycle while issuing only a single 32-bit instruction per cycle. T0 demonstrates that vector architectures are well suited to full-custom VLSI implementation and that they perform well on many multimedia and human-machine interface tasks. The remainder of the thesis contains proposals for future vector microprocessor designs. I show that the most area-ef®cient vector register ®le designs have several banks with several ports, rather than many banks with few ports as used by traditional vector supercomputers, or one bank with many ports as used by superscalar microprocessors.
Microprocessors

13 Microprocessors The microprocessor is the heart of a microcomputer system. In fact, it forms the central processing unit of any microcomputer and has been rightly referred to as the computer on a chip. This chapter gives an introduction to microprocessor fundamentals, followed by application-relevant information, such as salient features, pin configuration, internal architecture, instruction set, etc., of popular brands of eight-bit, 16-bit, 32-bit and 64-bit microprocessors from international giants like INTEL, MOTOROLA and ZILOG. 13.1 Introduction to Microprocessors A microprocessor is a programmable device that accepts binary data from an input device, processes the data according to the instructions stored in the memory and provides results as output. In other words, the microprocessor executes the program stored in the memory and transfers data to and from the outside world through I/O ports. Any microprocessor-based system essentially comprises three parts, namely the microprocessor, the memory and peripheral I/O devices. The microprocessor is generally referred to as the heart of the system as it performs all the operations and also controls the rest of the system. The three parts are interconnected by the data bus, the address bus and the control bus (Fig. 13.1). The memory stores the binary instructions and data for the microprocessor. The memory can be classified as the primary or main memory and secondary memory. Read/write memory (R/WM) and read only memory (ROM) are examples of primary memory and are used for executing and storing programs. Magnetic disks and tapes are examples of secondary memory.
LCSH Section

*Naborr (Horse) (Not Subd Geog) Louis Allen Post Office (Chester, N.Y.) 3-i︠a︡ Artilleriĭskai︠a︡ linii︠a︡ (Saint Petersburg, Russia) UF Nabor (Horse) BT Post office buildings—New York (State) USE Furshtatskai︠a︡ ulit︠s︡a (Saint Petersburg, BT Horses 1st Street (Harare, Zimbabwe) Russia) *Raseyn (Horse) (Not Subd Geog) USE First Street (Harare, Zimbabwe) 3-i︠a︡ Artilleriĭskai︠a︡ ulit︠s︡a (Saint Petersburg, Russia) BT Horses 2 Ennerdale Drive (Barnet, London, England) USE Furshtatskai︠a︡ ulit︠s︡a (Saint Petersburg, 0 (Group of artists) BT Dwellings—England Russia) USE Zero (Group of artists) 2 Park Avenue (New York, N.Y.) 3-manifolds (Topology) 0⁰ latitude UF Two Park Avenue (New York, N.Y.) USE Three-manifolds (Topology) USE Equator BT Office buildings—New York (State) 3-sheet posters 0⁰ meridian 2A2 (Military aircraft) USE Three-sheet posters USE Prime Meridian USE Salmson 2A2 (Military aircraft) 3 Sisters Islands (Washington, D.C.) 0-1 Bird Dog (Reconnaissance aircraft) 2nd Avenue (Manhattan, New York, N.Y.) USE Three Sisters Islands (Washington, D.C.) USE Bird Dog (Reconnaissance aircraft) USE Second Avenue (Manhattan, New York, N.Y.) 3-trifluoromethylphenylpiperazine 0th law of thermodynamics 2nd Avenue (Seattle, Wash.) USE Trifluoromethylphenylpiperazine USE Zeroth law of thermodynamics USE Second Avenue (Seattle, Wash.) 3 Vallées (France) 1,000 Year Monument (Novgorod, Russia) 2nd Avenue West (Seattle, Wash.) USE Trois Vallées (France) USE Tysi︠a︡cheletie Rossii (Novgorod, Russia) USE Second Avenue West (Seattle, Wash.) 3 Valleys