Applying Asynchronous Techniques to Architectures Exploiting ILP at Compile Time

DSS: Applying Asynchronous Techniques to Architectures Exploiting ILP at Compile Time

Wei Shi, Zhiying Wang, Hongguang Ren, Ting Cao*, Wei Chen, Bo Su, and Hongyi Lu
School of Computer, National University of Defense Technology
*Department of Computer Science, Australian National University
{shiwei,zywang,ren,chenwei,subo,hylu}@nudt.edu.cn *[email protected]

978-1-4244-8935-0/10/$26.00 ©2010 IEEE

Abstract: Embedded application environments require both high performance and low power. Architectures exploiting instruction-level parallelism (ILP) at compile time, such as very long instruction word (VLIW) and transport triggered architecture (TTA), may satisfy these requirements. They can be further enhanced by using asynchronous circuits to significantly reduce power consumption. As such, we are interested in asynchronous processors with architectures exploiting ILP at compile time. However, most of the current asynchronous processors are based on RISC-like architectures. When designing asynchronous VLIW or TTA processors, the distribution of control introduces some serious problems, and errors may occur because of the variable latencies of operations. This paper investigates asynchronous processors with architectures exploiting ILP at compile time. To overcome these problems, we propose a data source selecting (DSS) scheme to guarantee that instructions run correctly on asynchronous VLIW and TTA processors. Concretely, an asynchronous pipelined processor based on TTA is designed. The micro-architecture of the proposed asynchronous TTA processor is presented, and an asynchronous processor named Tengyue is implemented using 180 nm technology. The experimental results, for a range of benchmarks and working modes, show that the implemented asynchronous TTA processor with DSS scheme support runs correctly and that its power dissipation is reduced to about 43% to 65% of that of the equivalent synchronous processor.

I. INTRODUCTION

As the complexity of embedded applications is growing rapidly, systems in the embedded domain are required to be of high performance to meet real-time requirements. Exploiting ILP is an attractive approach to satisfying the high performance requirements, and two approaches are typically used. Traditional CPUs, such as superscalar processors, exploit ILP at run time, and we abbreviate this type of architecture to EIRT (exploiting ILP at run time). Others, such as VLIW and TTA based processors, exploit ILP at compile time, and they are abbreviated to EICT (exploiting ILP at compile time) architectures in this paper. Furthermore, low power consumption is also required for mobile embedded environments. To improve energy efficiency, many techniques have been proposed to overcome the power problem at different levels, ranging from circuits to applications [1].

The clock related components have become the largest power consuming part in a processor. These components include the clock generator, the clock distribution tree, clock drivers, latches and the clock loading due to all the clocked elements [2]. Friedman [3] reported that the clock distribution network consumed more than 25% of the total power in many applications. Approximately 70% of the power is burned by the clock distribution and latches in the POWER 4 processor [4]. In the Alpha 21264, 32% of the chip power is attributed to the global clock network [5]. In the embedded SuperH™ processor core, the clock tree and flip-flops take 35% of the total power [6]. In the VLIW processor core of the TMS320C6416T, the clock distribution network contributes about 76% of the total power consumption for some simple algorithms [7]. Instead of being driven periodically by a global clock, asynchronous circuits work on demand and stop when not needed. For this reason, a design that uses asynchronous circuits yields a power-efficient architecture. Moreover, an asynchronous circuit exhibits average-case performance rather than the worst-case performance of a synchronous circuit.

In the past two decades, many asynchronous processors have emerged. The first asynchronous processor was designed at Caltech in 1988 [8], and it was followed by FAM (fully asynchronous microprocessor) [9], NSR (non-synchronous RISC) [10], STRiP (self-timed RISC processor) [11], Amulet [12], TITAC [13], MiniMIPS [14], ARM996HS [15], and so on. Almost all of the above asynchronous processors are based on RISC architectures. Nowadays, VLIW and TTA are being widely used in the embedded domain for their hardware simplicity [16], [17]. However, the design methods used for asynchronous RISC-like architectures cannot be directly applied to asynchronous EICT architectures.

In EICT architectures, data and control hazards are overcome at compile time. The compiler knows the latency of each operation, and the latency information is used to schedule instructions. Processors run the scheduled program instruction by instruction and need not perform dependence detection at run time. However, EICT architectures cannot tolerate variable latencies of operations. When we implement an EICT architecture based processor using asynchronous circuits, the centralized control becomes distributed because the global clock is replaced by local handshake signals. Naively replacing the global clock in a synchronous processor with local handshake signals to produce an asynchronous processor may result in incorrect behavior. Once the global clock is removed, it is no longer possible to use the cycle count to measure the variable latencies of operations. That is to say, there is always a conflict between the indefinite latencies of asynchronous circuits and the accurate cycle count of operations demanded by the compiler. This paper proposes a DSS scheme to guarantee that programs run correctly on asynchronous EICT architectures. In the DSS scheme, the results of operations are buffered and then the correct results are picked as inputs of the corresponding operations. To elaborate the DSS scheme, the design of an asynchronous pipelined processor based on TTA employing the DSS scheme is presented.

The paper is organized as follows. Section II introduces the TTA and the asynchronous circuit design method. Section III presents the asynchronous TTA with the DSS scheme. The detailed implementation of the asynchronous TTA is explained in Section IV. The delay, area and power consumption of the proposed asynchronous TTA processor are compared to those of its synchronous counterpart in Section V. Conclusions are given in Section VI.

II. PRELIMINARIES

A. The Synchronous Transport Triggered Architecture

In this paper, we take TTA as an example of EICT architectures to elaborate the DSS scheme. This is because TTA can be regarded as an extreme example of EICT architectures. Some hardware mechanisms, such as a simplified scoreboard, may be used by VLIW in some situations, but there are no hardware mechanisms for dependence detection and hazard elimination in TTA.

TTA, proposed by H. Corporaal [18], can be viewed as a superset of the traditional VLIW architecture. However, TTA has unique characteristics compared to conventional processor architectures. One of the main features of a TTA processor is that all the actions are actually side-effects of transporting data from one place to another. When the data is written into a special register, called a trigger register, execution is triggered. Consequently, [...]

[...] domain, identifying the source of the data; destination domain, which specifies the target of the transport.

Fig. 1. TTA long instruction format.

There are usually 5 kinds of components in a TTA based processor: the instruction fetch unit, the instruction decode unit, the interconnection network, register files and functional units. All of the register files and functional units are connected by the interconnection network through sockets. Data transports are performed through these sockets, and these sockets are controlled by the decode unit. Each functional unit contains one or more operand registers, a unique trigger register, and one or more result registers. When data is written into the trigger register, the functional unit is triggered to perform a computation using the data in both the operand registers and the trigger register, and the output results are then written into the result registers. The pipeline of TTA usually contains 4 stages: instruction fetch (IF), instruction decode (ID), transport (MV) and execution (EXE). The EXE stage can be further divided into several pipeline stages.

B. Asynchronous Design Flow

Synchronous-to-asynchronous circuit conversion methodology [19], which converts synchronous circuits into asynchronous ones using existing EDA tools and flows, is an efficient design approach for asynchronous circuits. Furthermore, the de-synchronization method proposed in [20] is very popular. In this paper, we use a simple design flow, similar to the de-synchronization methodology, to implement a TTA based asynchronous processor. In our design flow, the global clock network is replaced by local handshake circuits, and the flip-flops in data paths generally remain the same as in the original synchronous processor. The local controller of a control path uses redundant four-phase latch control (RFLC) protocol which is combined with 2 simple four-phase latch controllers. The synchronous circuit and its asynchronous equivalent are shown in Fig. 2(a) and Fig. 2(b) respectively.

Fig. 2. Synchronous circuit and its asynchronous equivalent.
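
The trigger-register behaviour described in Section II-A can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the paper's implementation: the names FunctionalUnit, read_source and run_moves, the (source, destination) move format, and the single operand register are all hypothetical, and the operation completes immediately, so neither the pipeline stages nor the variable latencies that motivate the DSS scheme are modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical move format (an assumption for illustration only):
# a source is an immediate, a register name, or "unit.result";
# a destination is a register name or "unit.operand" / "unit.trigger".
Source = Union[int, str]
Move = Tuple[Source, str]


@dataclass
class FunctionalUnit:
    """Toy functional unit with one operand, one trigger and one result register."""
    op: Callable[[int, int], int]
    operand: int = 0
    trigger: int = 0
    result: int = 0

    def write(self, port: str, value: int) -> None:
        if port == "operand":
            # Writing the operand register only stores the value.
            self.operand = value
        elif port == "trigger":
            # Writing the trigger register starts the operation as a side effect.
            # Assumption: the result is available immediately (zero latency).
            self.trigger = value
            self.result = self.op(self.operand, self.trigger)
        else:
            raise ValueError(f"unknown port: {port}")


def read_source(src: Source, units: Dict[str, FunctionalUnit],
                regs: Dict[str, int]) -> int:
    """Resolve a move source to a value."""
    if isinstance(src, int):
        return src                      # immediate
    if "." in src:
        unit, port = src.split(".")
        if port != "result":
            raise ValueError(f"only result registers can be read: {src}")
        return units[unit].result       # functional-unit result register
    return regs[src]                    # general-purpose register


def run_moves(moves: List[Move], units: Dict[str, FunctionalUnit],
              regs: Dict[str, int]) -> None:
    """Execute data transports one by one; computation happens only via triggers."""
    for src, dst in moves:
        value = read_source(src, units, regs)
        if "." in dst:
            unit, port = dst.split(".")
            units[unit].write(port, value)
        else:
            regs[dst] = value


# r3 = r1 + r2 expressed purely as three data transports.
units = {"add": FunctionalUnit(op=lambda a, b: a + b)}
regs = {"r1": 2, "r2": 5, "r3": 0}
program: List[Move] = [
    ("r1", "add.operand"),   # store the first input
    ("r2", "add.trigger"),   # second input; this write fires the addition
    ("add.result", "r3"),    # transport the result back to a register
]
run_moves(program, units, regs)
print(regs["r3"])  # prints 7
```

In this toy model the third move can always read add.result right after the trigger write. In a real EICT processor the compiler instead schedules that read a fixed number of cycles after the trigger, which is the assumption that breaks once operation latencies become variable.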