An ALU Design Using a Novel Asynchronous Pipeline Architecture

Total pages: 16

File type: PDF, size: 1020 KB

An ALU Design Using a Novel Asynchronous Pipeline Architecture

TANG, Tin-Yau

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Philosophy in Electronic Engineering

© The Chinese University of Hong Kong, August 2000

The Chinese University of Hong Kong holds the copyright of this thesis. Any person(s) intending to use a part or whole of the materials in the thesis in a proposed publication must seek copyright release from the Dean of the Graduate School.

Table of Contents

Table of Contents 2
List of Figures 4
List of Tables 6
Acknowledgements 7
Abstract 8
I. Introduction 11
  1.1 Asynchronous Design 12
    1.1.1 What is asynchronous design? 12
    1.1.2 Potential advantages of asynchronous design 12
    1.1.3 Design methodology for asynchronous circuits 15
    1.1.4 Difficulties and limitations of asynchronous design 19
  1.2 Pipeline and Asynchronous Pipeline 21
    1.2.1 What is a pipeline? 21
    1.2.2 Properties of pipeline systems 21
    1.2.3 Asynchronous pipelines 23
  1.3 Design Motivation 26
II. Design Theory 27
  2.1 A Novel Asynchronous Pipeline Architecture 28
    2.1.1 The problem of the classical asynchronous pipeline 28
    2.1.2 The new handshake cell 28
    2.1.3 The modified asynchronous pipeline architecture 29
  2.2 Design of the ALU 36
    2.2.1 The functionality of the ALU 36
    2.2.2 The choice of the adder and the BLC adder 37
III. Implementation 41
  3.1 ALU Detail 42
    3.1.1 Global arrangement 42
    3.1.2 Shift and rotate 46
    3.1.3 Flag generation 49
  3.2 Application of the Pipeline Architecture 53
    3.2.1 The reset network for the pipeline architecture 53
    3.2.2 Handshake simplification for splitting and joining of the datapath 55
IV. Result 59
  4.1 Measurement and Simulation Result 60
  4.2 Global Routing Parasitics 63
  4.3 Low-Power Application 65
V. Conclusion 67
VI. Appendixes 69
  6.1 The Small Micro-coded Processor 69
  6.2 The Instruction Table of the ALU 70
  6.3 Measurement and Simulation Result 71
  6.4 VHDLs, Schematics and Layout 87
  6.5 Pinout of the Test Chip 120
  6.6 The Chip Photo 121
VII. References 122

List of Figures

Figure 1: Block diagram of a standard bundled-data datapath. 16
Figure 2: A differential cascode voltage switch logic cell. 18
Figure 3: The operation of a pipeline system. 21
Figure 4: The architecture of a synchronous pipeline. 23
Figure 5: A classical asynchronous pipeline architecture. 24
Figure 6: The 4-phase handshake protocol in an asynchronous pipeline. 24
Figure 7: The new handshake cell for the new asynchronous pipeline architecture. 29
Figure 8: The timing diagram of a pipeline stage with the handshake cell. 29
Figure 9: The basic structure of a FIFO. 30
Figure 10: The block diagram of an asynchronous pipeline FIFO. 31
Figure 11: Handshake principle for joining (a) and splitting (b) of the datapath. 32
Figure 12: The modified handshake cell with the "reset" signal. 33
Figure 13: The ripple-carry adder. 37
Figure 14: The BLC adder in our ALU. 39
Figure 15: The floorplan of our ALU. 43
Figure 16: Multiplexer-based network for a 16-bit, arbitrary-amount left shift. 47
Figure 17: An example of rotate left by 5 bits. 48
Figure 18: The structure of the zero and parity detection circuits. 51
Figure 19: A FIFO with the extra PMOS for reset. 53
Figure 20: The reset distribution tree. 54
Figure 21: The simplified handshake for joining of the datapath. 55
Figure 22: The modified handshake for splitting of the datapath. 56
Figure 23: The testing circuit in the test chip. 60
Figure 24: Distribution of the delay of the nets between pipeline stages. 63
Figure 25: The refresh of a handshake cell. 66

List of Tables

Table 1: The representation of output logic in dual-rail coding. 19
Table 2: Literature results of other ALUs. 26
Table 3: The instructions in our ALU. 36
Table 4: The measurement and simulation results of the ALU. 61
Table 5: Comparison of our ALU with other literature results. 62
Table 6: Statistics of the delay of the nets between two pipeline stages. 64
Table 7: The power dissipation of the first prototype of the ALU. 65

Acknowledgements

My work would not have been possible without the help and support of many individuals. I would like to take this opportunity to thank my supervisor, Professor Choy Chiu-Sing, for his full support, guidance and suggestions on my work throughout my period of study. Also, I would like to thank Professor Chan Cheong-Fat for his valuable opinions on my work. Moreover, I would like to thank all my peers and colleagues in the ASIC/VLSI Laboratory: Hon Kwok-Wai, Mak Wing-Sum, Siu Pui-Lam, Lee Chi-Wai, Leung Lai-Kan, Cheng Wang-Chi, Yang Jing-Ling, Wang Hong-Wei, our research assistants/associates Dr. Juraj Povazanec and Mr. Jan Butas, and our laboratory technician Mr. Yeung Wing-Yee. Of course, I would like to thank my family members and friends for their continuous support and encouragement.

Abstract

Pipelining is an important methodology in current VLSI design. The main idea of pipelining is to divide an operation into many stages, so that a design can accept a new operation even if the previous operations have not yet finished.
That means a pipeline design can process many operations at the same time, so high-speed performance can be achieved. Most current pipeline systems are synchronous: a single global clock clocks all the stages of the pipeline. The global clock signal is therefore connected to many logic cells, and the time for the clock signal to reach different cells may differ because the delay paths from the clock source to the cells differ. This is the problem of clock skew, and it is one of the major factors limiting the frequency of the global clock and hence the throughput of the pipeline system. An asynchronous pipeline system is different: handshake control signals are used instead of a global clock to control the circuit. With the global clock removed, the problem of clock skew is eliminated. The limitation of an asynchronous pipeline usually lies in the generation of the handshake control signals. In this thesis, a novel asynchronous pipeline architecture is introduced. The architecture has simple handshake cells, so it can run very fast. The architecture was applied to the design of a 16-bit ALU, which is partitioned into 12 pipeline stages. The logic in each pipeline stage is very simple, so the stage delay is very small and the throughput is enhanced. The test chip of the ALU is implemented in a 0.6 µm, 3-metal-layer CMOS technology. The measured throughput of the test chip is about 220 MHz. The performance of the ALU could be largely improved if a better place-and-route result were obtained.
Abstract (translated from the Chinese)

Pipelining is an important technique in modern integrated-circuit design. Its main idea is to divide an operation into multiple stages, so that the pipeline circuit can accept a new operation even before older operations have completely finished. In other words, a pipeline circuit can process multiple operations at the same time, and thus faster system performance can be obtained.

At present, most pipeline systems are synchronous. In a synchronous pipeline system, all pipeline stages must be controlled by a single global clock signal, so the global clock is usually connected to many logic cells. Because the delays from the global clock to different logic cells may differ, the times at which the clock signal reaches different logic cells may also differ. This leads to the problem of clock skew, which is a major factor limiting the clock frequency and the throughput of the pipeline system.

An asynchronous pipeline system is different: handshake control signals replace the global clock in controlling the operation of the circuit. The global clock is therefore removed, and the problem of clock skew is eliminated along with it.

The limitation of an asynchronous pipeline usually lies in the generation of the handshake control signals. In this thesis, we introduce a novel asynchronous pipeline architecture. The architecture has simple handshake cells to support a high operating speed. It is applied to the design of a 16-bit arithmetic logic unit (ALU), which is partitioned into twelve pipeline stages. The logic circuit of each stage is very simple, so a very small stage delay can be achieved and the system speed raised. The test chip is fabricated in a 0.6-micron, three-metal-layer CMOS technology, and the measured speed is 220 MHz. With a better placement and routing design, the system speed could be greatly improved.

I. Introduction

The research topic is related to asynchronous pipelines. Asynchronous design methodology has been a hot research topic in recent years. Due to the potential advantages of the asynchronous methodology, many researchers have worked in this field, and many ideas related to asynchronous circuit design have been proposed. In this chapter, we first introduce what asynchronous design is, including its advantages and the problems faced by designers. We also describe some asynchronous design techniques. Afterwards, we introduce what a pipeline is and describe how pipelining can be used in asynchronous design. Finally, we explain the motivation for designing an asynchronous ALU and give a brief account of some previous designs of ALUs and other similar circuits.

1.1 Asynchronous Design

1.1.1 What is asynchronous design?

Hauck [1] has stated that the difference between synchronous and asynchronous designs lies in the assumptions made about signal properties.
In general, conventional synchronous design is based on the assumptions that all signals are binary and that the times at which valid signals appear are discrete.
Recommended publications
  • PIPELINING AND ASSOCIATED TIMING ISSUES
    PIPELINING AND ASSOCIATED TIMING ISSUES. Introduction: While studying sequential circuits, we studied latches and flip-flops. While latches form the heart of a flip-flop, we have explored the use of flip-flops in applications like counters, shift registers, sequence detectors, sequence generators and the design of finite state machines. Another important application of latches and flip-flops is in pipelining combinational/algebraic operations. To understand what pipelining is, consider the following example. Let us take a simple calculation with three operations to be performed: 1. add a and b to get (a+b), 2. get the magnitude of (a+b), and 3. evaluate log |(a + b)|. Each operation consumes a finite period of time; let us assume 40 nsec., 35 nsec. and 60 nsec. respectively. The process can be represented pictorially as in Fig. 1. Consider a situation where we need to carry this out for a set of 100 such pairs. Doing it one pair at a time would take a total of 100 * 135 = 13,500 nsec. We can however reduce this time by realizing that the whole process is sequential. Let the values to be evaluated be a1 to a100 and the corresponding values to be added be b1 to b100. Since the operations are sequential, we can first evaluate (a1 + b1); while |(a1 + b1)| is being evaluated, the unit evaluating the sum is otherwise idle, so we can use it to evaluate (a2 + b2), giving us both |(a1 + b1)| and (a2 + b2) at the end of the next evaluation period.
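The timing arithmetic in the excerpt above can be sketched in a few lines. This is an illustration only, using the stage delays assumed in the text (40, 35 and 60 ns) and the common convention that the pipeline clock period is set by the slowest stage:

```python
# Stage delays from the example: add, magnitude, log (in nanoseconds).
STAGES_NS = [40, 35, 60]
N_PAIRS = 100

# Unpipelined: every input pays the full 135 ns before the next one starts.
sequential_ns = N_PAIRS * sum(STAGES_NS)

# Pipelined with registers between stages: the clock period is set by the
# slowest stage; the first result appears after 3 cycles, then one result
# per cycle.
clock_ns = max(STAGES_NS)
pipelined_ns = (len(STAGES_NS) + N_PAIRS - 1) * clock_ns

print(sequential_ns)  # 13500
print(pipelined_ns)   # 6120
```

The pipelined figure shows the point of the excerpt: overlapping the three units more than halves the total time for 100 pairs.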
  • 2.5 Classification of Parallel Computers
    2.5 Classification of Parallel Computers. 2.5.1 Granularity: In parallel computing, granularity means the amount of computation in relation to communication or synchronisation. Periods of computation are typically separated from periods of communication by synchronization events. • Fine level (same operations with different data): vector processors, instruction-level parallelism. Fine-grain parallelism: relatively small amounts of computational work are done between communication events; low computation-to-communication ratio; facilitates load balancing; implies high communication overhead and less opportunity for performance enhancement. If granularity is too fine, the overhead required for communication and synchronization between tasks may take longer than the computation itself. • Operation level (different operations simultaneously). • Problem level (independent subtasks). Coarse-grain parallelism: relatively large amounts of computational work are done between communication/synchronization events; high computation-to-communication ratio; implies more opportunity for performance increase; harder to load balance efficiently. 2.5.2 Hardware: Pipelining (was used in supercomputers, e.g. the Cray-1). With N elements in the pipeline and L clock cycles per element, the calculation takes about L + N cycles; without a pipeline it takes L * N cycles. Example of code that pipelines well: do i = 1, k / z(i) = x(i) + y(i) / end do. Vector processors provide fast vector operations (operations on arrays). The previous example is also good for a vector processor (vector addition), but recursion, for example, is hard to optimise for vector processors. Example: Intel MMX – a simple vector processor.
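The cycle counts quoted in the excerpt can be written out directly. A minimal sketch, taking the slide's L + N approximation at face value (some texts use L + N − 1 for the exact count):

```python
# L clock cycles per element, N elements in flight.
def unpipelined_cycles(L, N):
    # Each element runs start to finish before the next begins.
    return L * N

def pipelined_cycles(L, N):
    # Elements overlap: after the pipeline fills, one completes per cycle.
    return L + N   # as stated in the slide

print(unpipelined_cycles(8, 64))  # 512
print(pipelined_cycles(8, 64))    # 72
```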
  • Microprocessor Architecture
    EECE416 Microcomputer Fundamentals: Microprocessor Architecture. Dr. Charles Kim, Howard University. Computer architecture: a computer system is a CPU (with PC, registers, status register) plus memory. The ALU (arithmetic logic unit) is built around the binary full adder, and the microprocessor communicates over a bus. Architecture by CPU + memory organization: Princeton (or von Neumann) architecture – memory contains both instructions and data; Harvard architecture – separate data memory and instruction memory, giving higher performance, better suitability for DSP, and higher memory bandwidth. Princeton fetch-execute sequence: Step (A): the address of the instruction to be executed next is applied to the memory; Step (B): the controller "decodes" the instruction; Step (C): following completion of the instruction, the controller provides the memory unit with the address at which the data result generated by the operation will be stored. Internal memory ("registers"): external memory access is very slow; internal registers allow quicker retrieval and storage. Architecture by instructions and their execution: CISC (Complex Instruction Set Computer) – a variety of instructions for complex tasks, with instructions of varying length; the architecture of the period prior to the mid-1980s (IBM 390, Motorola 680x0, Intel 80x86), using a basic fetch-execute sequence to support a large number of complex instructions, complex decoding procedures and a complex control unit, so that one instruction achieves a complex task. RISC (Reduced Instruction Set Computer) – fewer and simpler instructions, high-performance microprocessors, pipelined instruction execution (several instructions are executed in parallel).
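The three-step Princeton (von Neumann) sequence above can be illustrated with a toy machine in which one memory holds both instructions and data. The instruction format and opcodes here are invented purely for illustration:

```python
# A toy von Neumann machine: one memory dict holds instructions and data.
memory = {
    0: ("ADD", 100, 101, 102),   # mem[102] = mem[100] + mem[101]
    1: ("HALT",),
    100: 7, 101: 5, 102: 0,
}

pc = 0
while True:
    instr = memory[pc]            # Step (A): fetch via the program counter
    op = instr[0]                 # Step (B): the controller decodes
    if op == "HALT":
        break
    if op == "ADD":               # Step (C): execute, then store the result
        _, a, b, dest = instr
        memory[dest] = memory[a] + memory[b]
    pc += 1

print(memory[102])  # 12
```

A Harvard machine would instead keep `memory[0..1]` and `memory[100..102]` in two physically separate memories, allowing the next instruction and the data to be fetched simultaneously.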
  • Instruction Pipelining (1 of 7)
    Chapter 5: A Closer Look at Instruction Set Architectures. Objectives: • Understand the factors involved in instruction set architecture design. • Gain familiarity with memory addressing modes. • Understand the concepts of instruction-level pipelining and its effect upon execution performance. 5.1 Introduction: This chapter builds upon the ideas in Chapter 4. We present a detailed look at different instruction formats, operand types, and memory access methods, and see the interrelation between machine organization and instruction formats. This leads to a deeper understanding of computer architecture in general. 5.2 Instruction Formats: Instruction sets are differentiated by the following: number of bits per instruction; stack-based or register-based; number of explicit operands per instruction; operand location; types of operations; type and size of operands. Instruction set architectures are measured according to: main memory space occupied by a program; instruction complexity; instruction length (in bits); total number of instructions in the instruction set. In designing an instruction set, consideration is given to: instruction length (whether short, long, or variable); number of operands; number of addressable registers; memory organization (whether byte- or word-addressable); addressing modes (choose any or all: direct, indirect or indexed). Byte ordering, or endianness, is another major architectural consideration. If we have a two-byte integer, the integer may be stored so that the least significant byte is followed by the most significant byte, or vice versa. In little endian machines, the least significant byte is followed by the most significant byte.
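The endianness convention described above is easy to see concretely. A small sketch using Python's `struct` module, where `<` selects little-endian and `>` big-endian byte order:

```python
import struct

# Pack the 16-bit value 0x1234 under the two byte-ordering conventions.
little = struct.pack("<H", 0x1234)
big = struct.pack(">H", 0x1234)

print(little)  # b'\x34\x12'  least significant byte first (little endian)
print(big)     # b'\x12\x34'  most significant byte first (big endian)
```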
  • Review of Computer Architecture
    Basic Computer Architecture, CSCE 496/896: Embedded Systems, Witawas Srisa-an. Review of Computer Architecture. Credit: most of the slides are by Prof. Wayne Wolf, the author of the textbook; some modifications were made for clarity. Some background from CSCE 430 or equivalent is assumed. von Neumann architecture: memory holds data and instructions; the central processing unit (CPU) fetches instructions from memory; the separation of CPU and memory distinguishes the programmable computer. CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc. (Example: with PC = 200, the CPU fetches the instruction ADD r5,r1,r3 from memory address 200 into the IR.) What is a potential problem with the von Neumann architecture? Harvard architecture: separate program memory and data memory, each with its own address and data buses. Harvard can't use self-modifying code, but it allows two simultaneous memory fetches. Most DSPs (e.g. the Blackfin from ADI) use the Harvard architecture for streaming data: greater memory bandwidth, different memory bit depths between instruction and data, and more predictable bandwidth. Today's processors: Harvard or von Neumann? RISC vs. CISC: a complex instruction set computer (CISC) has many addressing modes and many operations; a reduced instruction set computer (RISC) has a load/store architecture and pipelinable instructions. Instruction set characteristics: fixed vs. variable length, addressing modes, number of operands, types of operands. (The Tensilica Xtensa is RISC-based with variable-length instructions, but not CISC.) Programming model: the registers visible to the programmer; some registers are not visible (IR). Successful architectures have several implementations: varying clock speeds, different bus widths, different cache sizes, associativities and configurations, local memory, etc.
  • CS152: Computer Systems Architecture Pipelining
    CS152: Computer Systems Architecture – Pipelining. Sang-Woo Jun, Winter 2021. A large amount of material adapted from MIT 6.004 "Computation Structures", Morgan Kaufmann "Computer Organization and Design: The Hardware/Software Interface: RISC-V Edition", and CS 152 slides by Isaac Scherson. Eight great ideas: design for Moore's Law; use abstraction to simplify design; make the common case fast; performance via parallelism; performance via pipelining; performance via prediction; hierarchy of memories; dependability via redundancy. Performance measures: two metrics when designing a system: 1. Latency: the delay from when an input enters the system until its associated output is produced. 2. Throughput: the rate at which inputs or outputs are processed. The metric to prioritize depends on the application: an embedded system for airbag deployment? Latency. A general-purpose processor? Throughput. Performance of combinational circuits: for combinational logic, latency = tPD and throughput = 1/tPD. In a circuit where F(X) and G(X) feed H, F and G spend most of their time just holding output data rather than doing work – is this an efficient way of using hardware? (Source: MIT 6.004 2019 L12.) Pipelined circuits: pipelining by adding registers to hold F and G's outputs. Now F and G can be working on input Xi+1 while H is performing its computation on Xi – a 2-stage pipeline. For an input X during clock cycle j, the corresponding output is emitted during clock j+2. Assuming ideal registers and latencies of 15, 20 and 25 for F, G and H: unpipelined latency 20+25=45 with throughput 1/45; 2-stage pipelined latency 25+25=50 (worse!) with throughput 1/25 (better!). Pipeline conventions – definition: a well-formed K-stage pipeline ("K-pipeline") is an acyclic circuit having exactly K registers on every path from an input to an output.
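The latency/throughput trade-off quoted above (45 vs. 50, 1/45 vs. 1/25) follows directly from the stage delays. A small sketch of that arithmetic, using the slide's numbers, where F and G feed H in parallel:

```python
# Propagation delays from the slide: F, G and H.
tF, tG, tH = 15, 20, 25

# Combinational (unpipelined): latency is the longest input-to-output path.
latency_comb = max(tF, tG) + tH          # 20 + 25 = 45
throughput_comb = 1 / latency_comb       # 1/45

# 2-stage pipeline: registers after F and G; the clock period is set by the
# slowest stage, and the output appears two cycles after the input.
clock = max(tF, tG, tH)                  # 25
latency_pipe = 2 * clock                 # 50 -- latency got worse
throughput_pipe = 1 / clock              # 1/25 -- throughput got better

print(latency_comb, latency_pipe)        # 45 50
```

This is the general pattern of pipelining: latency goes up slightly (register overhead and rounding to the slowest stage), while throughput improves because a new input is accepted every clock.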
  • Threading SIMD and MIMD in the Multicore Context the Ultrasparc T2
    Overview: SIMD and MIMD in the multicore context (note: Tute 02 this Weds – handouts). Flynn's taxonomy: single instruction/single data = SISD; multiple instruction/single data = MISD; single instruction/multiple data = SIMD; multiple instruction/multiple data = MIMD. Multicore architecture concepts: hardware threading; SIMD vs MIMD in the multicore context; T2 design features for multicore: system on a chip; execution: (in-order) pipeline, instruction latency; thread scheduling; caches: associativity, coherence, prefetch; memory system: crossbar, memory controller; speculation; power savings; OpenSPARC; T2 performance (why the T2 is designed as it is); the Rock processor (slides by Andrew Over; ref: Tremblay, IEEE Micro 2009). For SIMD, the control unit and processor state (registers) can be shared; however, SIMD is limited to data parallelism (through multiple ALUs), and algorithms need a regular structure, e.g. dense linear algebra or graphics. SSE2, Altivec and the Cell SPE use 128-bit registers, e.g. a 4×32-bit add: Rx holds x3 x2 x1 x0, Ry holds y3 y2 y1 y0, and Rz = z3 z2 z1 z0 where zi = xi + yi. Design requires massive effort and support from a commodity environment; massive parallelism is possible (e.g. nVidia GPGPU), but memory is still a bottleneck. Multicore (CMT) is MIMD, and hardware threading can be regarded as MIMD; higher hardware costs and larger shared resources (caches, TLBs) are needed, giving less parallelism than SIMD. The UltraSPARC T2: system on a chip (OpenSparc Slide Cast Ch 5: p79–81, 89); aggressively multicore: 8 cores, each with 8-way hardware threading (64 virtual CPUs). Hardware (multi)threading: recall concurrent execution on a single CPU: switching between threads (or processes) requires saving the thread state (register values) in memory; the motivation is to utilize the CPU better when a thread is stalled for I/O (6300 Lect O1, p9–10). What are the costs? Can we do the same for smaller stalls? (e.g.
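The 4×32-bit SIMD add sketched in the excerpt (zi = xi + yi in every lane of a 128-bit register) can be modelled lane by lane. A minimal sketch, with registers modelled as Python lists and 32-bit wrap-around applied per lane:

```python
# One SIMD instruction applies the same operation to every 32-bit lane.
MASK32 = 0xFFFFFFFF

def simd_add(rx, ry):
    # z_i = x_i + y_i in each lane, with 32-bit wrap-around per lane
    return [(x + y) & MASK32 for x, y in zip(rx, ry)]

rx = [1, 2, 3, 0xFFFFFFFF]
ry = [10, 20, 30, 1]
print(simd_add(rx, ry))  # [11, 22, 33, 0] -- last lane wraps around
```

Note the last lane: each lane overflows independently, which is exactly what distinguishes a packed SIMD add from one wide 128-bit addition.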
  • Pipelined Computations
    Pipelined Computations. MCS 572 Lecture 27, Introduction to Supercomputing, Jan Verschelde, 15 March 2021. 1. Functional decomposition: car manufacturing with three plants; speedup for n inputs in a p-stage pipeline. 2. Pipeline implementations: processors in a ring topology; pipelined addition; pipelined addition with MPI. Car manufacturing: consider a simplified car manufacturing process in three stages: (1) assemble exterior, (2) fix interior, and (3) paint and finish. The input sequence of cars c1, c2, c3, c4, c5 flows through the three plants P1, P2, P3 in turn; in the corresponding space-time diagram, after 3 time units one car per time unit is completed. p-stage pipelines: a pipeline with p processors is a p-stage pipeline. Suppose every process takes one time unit to complete. How long until a p-stage pipeline completes n inputs? After p time units the first input is done; then, for the remaining n − 1 items, the pipeline completes at a rate of one item per time unit. Hence the p-stage pipeline takes p + n − 1 time units to complete n inputs.
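The p + n − 1 result derived above can be checked against the unpipelined count. A short sketch, assuming one time unit per stage as in the lecture:

```python
# Completion time for n inputs on a p-stage pipeline, one time unit/stage.
def pipeline_time(p, n):
    # p units to finish the first input, then one more input per unit.
    return p + n - 1

def sequential_time(p, n):
    # No overlap: each input takes all p stages before the next starts.
    return p * n

# The car example: 3 plants, 100 cars.
print(pipeline_time(3, 100))    # 102
print(sequential_time(3, 100))  # 300
```

For large n the speedup approaches p, since (p * n) / (p + n − 1) tends to p as n grows.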
  • Trends in Processor Architecture
    A. González, Trends in Processor Architecture. Antonio González, Universitat Politècnica de Catalunya, Barcelona, Spain. 1. Past Trends. Processors have undergone a tremendous evolution throughout their history. A key milestone in this evolution was the introduction of the microprocessor, a term that refers to a processor implemented in a single chip. The first microprocessor was introduced by Intel under the name Intel 4004 in 1971. It contained about 2,300 transistors, was clocked at 740 kHz and delivered 92,000 instructions per second while dissipating around 0.5 watts. Since then, practically every year we have witnessed the launch of a new microprocessor, delivering significant performance improvements over previous ones. Some studies have estimated this growth to be exponential, on the order of about 50% per year, which results in a cumulative growth of over three orders of magnitude in a time span of two decades [12]. These improvements have been fueled by advances in the manufacturing process and innovations in processor architecture; according to several studies [4][6], both aspects contributed in similar amounts to the global gains. The manufacturing process technology has tried to follow the scaling recipe laid down by Robert N. Dennard in the early 1970s [7]. The basics of this technology scaling consist of reducing transistor dimensions by 30% every generation (typically 2 years) while keeping electric fields constant. The 30% scaling in the dimensions results in doubling the transistor density (doubling transistor density every two years was predicted in 1975 by Gordon Moore and is normally referred to as Moore's Law [21][22]).
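The two growth figures quoted above are easy to verify with back-of-the-envelope arithmetic: 50% per year compounded over two decades, and Dennard's 30% linear shrink doubling transistor density per generation:

```python
# 50% annual improvement compounded over 20 years.
growth_20y = 1.5 ** 20

# A 30% linear shrink leaves dimensions at 0.7x, so each transistor
# occupies 0.7 * 0.7 = 0.49x the area -- about 2x the density.
density_factor = 1 / (0.7 * 0.7)

print(round(growth_20y))           # 3325 -- over three orders of magnitude
print(round(density_factor, 2))    # 2.04 -- roughly doubled density
```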
  • Reverse Engineering X86 Processor Microcode
    Reverse Engineering x86 Processor Microcode. Philipp Koppe, Benjamin Kollenda, Marc Fyrbiak, Christian Kison, Robert Gawlik, Christof Paar, and Thorsten Holz, Ruhr-University Bochum. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/koppe. This paper is included in the Proceedings of the 26th USENIX Security Symposium, August 16–18, 2017, Vancouver, BC, Canada. ISBN 978-1-931971-40-9. Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX. Abstract: Microcode is an abstraction layer on top of the physical components of a CPU, present in most general-purpose CPUs today. In addition to facilitating complex and vast instruction sets, it also provides an update mechanism that allows CPUs to be patched in-place without requiring any special hardware. While it is well known that CPUs are regularly updated with this mechanism, very little is known about its inner workings, given that microcode and the update mechanism are proprietary and have not been thoroughly analyzed yet. Dedicated hardware units to counter bugs are imperfect [36, 49] and involve non-negligible hardware costs [8], and hardware modifications [48] are costly. The infamous Pentium fdiv bug [62] illustrated a clear economic need for field updates after deployment, in order to turn off defective parts and patch erroneous behavior. Note that the implementation of a modern processor involves millions of lines of HDL code [55], and verification of functional correctness for such processors is still an unsolved problem [4, 29]. Since the 1970s, x86 processor manufacturers have
  • QDI Asynchronous Design and Voltage Scaling
    PONTIFICAL CATHOLIC UNIVERSITY OF RIO GRANDE DO SUL, FACULTY OF INFORMATICS, COMPUTER SCIENCE GRADUATE PROGRAM. QDI Asynchronous Design and Voltage Scaling. Ricardo Aquino Guazzelli. Dissertation submitted to the Pontifical Catholic University of Rio Grande do Sul in partial fulfillment of the requirements for the degree of Master in Computer Science. Advisor: Prof. Dr. Ney Laert Vilar Calazans. Porto Alegre, 2017. "If I have seen further than others, it is by standing upon the shoulders of giants." (Isaac Newton). Abstract (translated from the Portuguese): Quasi-delay-insensitive (QDI) circuits are a promising solution for dealing with significant process variations in modern technologies, since their characteristics accommodate large variations in the delays of wires and logic gates. Moreover, with the new trends of mobile devices and the Internet of Things (IoT), QDI design presents itself as an implementation alternative for applications operating at low or very low supply voltages. As its main drawback, QDI design leads to a significant increase in area and power consumption, which can compromise its adoption. This work investigates the compatibility of QDI design with voltage scaling, exploring and analyzing QDI templates available in the literature. Among these templates, the SCL template is selected as an interesting option for mitigating area and power consumption. However, the original proposal of this template, as shown here, presents timing problems; because of these problems, an alternative template is proposed. Using the ASCEnD design flow, standard-cell libraries for the templates in question were generated in order to evaluate their benefits and drawbacks.
  • Implementing Balsa Handshake Circuits
    Implementing Balsa Handshake Circuits. A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science & Engineering, 2000. Andrew Bardsley, Department of Computer Science. Contents: Chapter 1. Introduction 12; 1.1 Asynchronous design 13; 1.1.1 The handshake 15; 1.1.2 Delay models 19; 1.1.3 Data encoding 22; 1.1.4 The Muller C-element 25; 1.1.5 The S-element 27; 1.2 Thesis structure 28. Chapter 2. Asynchronous Synthesis 30; 2.1 Design flows for asynchronous synthesis 30; 2.2 Directness and user intervention 32; 2.3 Macromodules and DI interconnect 33; 2.3.1 Sutherland's micropipelines 34; 2.3.2 Macromodules 39; 2.3.3 Brunvand's OCCAM synthesis 40; 2.3.4 Plana's pulse handshaking modules 40; 2.4 Other asynchronous synthesis approaches 41; 2.4.1 'Classical' asynchronous state machines 42; 2.4.2 Petri-net synthesis 43; 2.4.3 Burst-mode machines