Phd Musoll.Pdf

iii Signatures iv To al l my family vi ACKNOWLEDGEMENTS First of all I would like to stress that this work would not have b een p ossible without my teacher and now advisor Professor Jordi Cortadella Although a fearless and fulltime devoted leader to the design of asynchronous circuits he has always b een available for help always putting me on the right track always seeing the bright side of my problems I would like to thank Enric Pastor Fermn Sanchez Oriol Roig Marco A Pena and Gianluca Cornetta all members of the CADVLSI group where I have b een working these last years for their help and supp ort I would like to thank also my colleagues at the Computer Architecture Department Sp ecial thanks to Rosa M Badaand TomasLang who have reviewed some parts of this work and to Joan Figueras Their technical discussions and suggestions have denitely improved the quality of this work To all of them my sincere thanks This work has b een supp orted in part by the Ministerio de Educacion y Ciencia of Spain under contracts CYCIT TICE and TIC My bank ac count is denitely indebted to the Departament dEnsenyament de la Generalitat de Catalunya for their help nancing my do ctorate and recognizes the Amics de Gaspar de Portola asso ciation for providing the funds to present part of this work in a conference in California I would like to acknowledge the secretary sta of the Computer Architecture Depart ment for coping so well with the administrative pap erwork My sincere thanks also to the systems group p eople who have p erfectly managed the advanced computing resources provided by the Computer Architecture Department During the last years this work has absorb ed all my time even the socalled free time I would like to thank the p eople who have tried most of the times unsuccess fully to put for a while my eyes out of the b o oks and my hands out of the keyboard Sp ecial thanks to my b est friends and the UPC Rowing team for this work Last but not least thanks to my parents Pilar and Pere and my brother Ramon for their unconditional patience and supp ort Thank you all vii viii Highlevel and logic synthesis techniques for low power CONTENTS ACKNOWLEDGEMENTS vii LIST OF FIGURES ix LIST OF TABLES xiii INTRODUCTION Why lowpower design techniques Reliability Chip package Batteries Applications Lowpower design approach Overall p ower reduction Research ob jectives Overview STATEOFTHEART IN LOWPOWER DESIGN Powerconsumption sources Static dissipation Dynamic dissipation Dening p ower consumption System layer System shutdown System partitioning Algorithm selection Architecture layer Parallel and pip elined pro cessing Compiler transformations v vi Highlevel and logic synthesis techniques for low power Cache design Data representation Logic layer Combinational circuits Sequential circuits Circuit layer Logic style Synchronous vs asynchronous design Design style Adiabatic computing Layout layer Technology layer Technology scaling and SOI pro cess Packaging technology Lowpower techniques summary Conclusions HIGHLEVEL SYNTHESIS TECHNIQUES FOR LOW POWER Introduction Previous work Estimating p ower consumption Reducing p ower consumption Powerconsumption mo del of functional units Registertransfer level techniques for low p ower Lo op interchange Op erand reordering Op erand sharing Op erand retaining Op erand similarity Summary of targeted circuit domains Scheduling and register binding for low p ower Highlevel synthesis Previous work on highlevel synthesis for low p ower Overview Contents vii Scheduling for low p ower Register binding for low p ower Conclusions ARCHITECTURAL TECHNIQUE FOR LOW POWER Introduction Parallel multipliers Previous Work Potential p ower reduction in array multipliers TransitionRetaining Barriers Implementation of the TRB Results Conclusions LOGICCIRCUIT TECHNIQUE FOR LOW POWER Introduction Motivation examples Previous work and overview Overview of our approach Power consumption of CMOS gates Denitions and overview Transition density Extended p owerconsumption mo del Poweroptimization algorithm Algorithm overview Monotonic characteristic Equilibrium probabilities Exhaustive exploration of gate congurations Results Scenarios for the exp eriments Discussion Conclusions CONCLUSIONS AND FUTURE RESEARCH Introduction Contributions viii Highlevel and logic synthesis techniques for low power System layer Architecture layer Logiccircuit layer Future research A CIRCUIT DESIGN AND ANALYSIS TOOLS A Introduction A Ocean A SLS A SIS REFERENCES LIST OF FIGURES Chapter Stateoftheart in battery technology Pow Powerconsumption trend Mat Dierent layers in the design ow of a circuit Power breakdown in the ThinkPad noteb o ok Ike Chapter Static CMOS inverter schematic transistor and switch representa tions Shortcircuit p ower dissipation a Input waveform b charging and c discharging the capaci tance C load a Datapath circuit at V b Parallel and c pip elined versions at V BCS 3 2 Implementing p olynomial X AX BX C a straightforward and b Horners scheme Substituting one multiplication by one addition a Unbalanced and b balanced dataow graph of the addition of four op erands Delaying signals to eliminate hazards Dierent sub ject graphs implementing a input NAND function a Sub ject graph b available gates c lowarea mapping d lowpower mapping a FSM example b two dierent co dings with dierent switching activity among them RP Sequential circuit a b efore and b after retiming Comparator circuit a without and b with disabled input regis ters ix x Highlevel and logic synthesis techniques for low power a STG and b FSM with the registers enableddisabled by signal f a a Static and b dynamic implementations of the b o olean function Y A A B B 1 2 1 2 a Original synchronous system b asynchronous system with adaptive scaling of the supply voltage NS Design time vs p ower consumption of some design styles Clo ck tree driving schemes a buers at the clo ck source and b distributed buers Chapter Activity of a bit data stream for dierent temp oral correlations data approximated from LRa a Average activity in a multiplier as a function of the constant value data approximated from CR b A parallel and serial implementations of an adder tree Transformations for reducing a activity at the address bus and a number of memory references a Coarsegrained and b negrained p owerconsumption mo del for an bit Bo oth multiplier a Dataow graph b and c are two p ossible schedules and functionalunit bindings The input data order in b leads to a higher op erand AHD than in b Big arrows indicate the input data Thin arrows indicate the AHD b etween two consecutive input data Example of application of the lo opinterchange technique a Motion estimation algorithm and b motionestimation algo rithm with two lo op interchanges Datapath for executing a the motionestimation algorithm and b the matrixvector pro duct algorithm Matrixvector pro duct algorithm when referencing matrix A a by rows and b by columns a MAC op erator for p and b DFG of the thorder LMS adaptive lter Dierent inputreusing graphs for a the MAC op erator and b the symmetric matrixvector pro duct op erator a DFG of the AR lter bc two p ossible schedule and bindings List of Figures xi a Original co de and b after lo op unrolling a Lowpass image lter algorithm b DFG of the inner lo op of a without division by nine and c DFG after lo op unrolling a DFG of the thorder FIR lter bc two p ossible schedule and bindings DFG of the Dierential Equation Solver Highlevel synthesis pro cess Example of p eakp ower and averagepower optimization during the scheduling and functionalunit binding tasks a DFG b adder characteristics ce dierent schedules and bindings and d re sults Register binding for low p ower Listscheduling algorithm Lowpower listscheduling algorithm a DFG of the LMS lter and b schedule and FU binding Registerbinding algorithm for low p ower Chapter a Circuit structure with glitching b blo ck C presents more activity than blo ck A bit array multiplier a FA and b HA structure Signal transitions p er op eration for each basic blo ck in a bit array multiplier Useful and total number of signal transitions for an and bit array multiplier a Possible lo cations for one TRB and b transitions p er input vector for dierent lo cations of one TRB in a bit array Signal transitions p er input vector in a bit array multiplier a General view of the design with no TRBs and side view with b no TRBs c one TRB and d two TRBs 2 a Comparison b etween Static CMOS and C MOS and b avoid ing directpath currents bit array with two TRBs The four types of delaycells TA TD are shown Design of a FA xii Highlevel and logic synthesis techniques for low power Layout of an bit array multiplier in a seaofgates design style Chapter a Four implementations of function y a a b and b rel ative p ower consumption for two dierent input activity scenarios SPICE simulations of congurations A and D in Figure for two dierent input activity scenarios Transistorreordering technique in a input NAND gate example from SLW a Static CMOS input NAND gate b relative p ower consump tion for dierent input activities without varying the equilibrium probabilities of the inputs a Static CMOS gate representation and b algorithm to obtain function H n k bd

Phd Musoll.Pdf

Digital Semiconductor Alpha 21064 and Alpha 21064A Microprocessors Hardware Reference Manual

Alpha 21064A Microprocessors Data Sheet

Computer Architectures an Overview

Virtual Memory CS740 October 13, 1998

Address Translation & Caches Outline

Database Integration

Thesis May Never Have Been Completed

Virtual Memory

Iilihfflf WWETWIU HI Ull ISTITUTO NAZIONALE DI FISICA NUCLEARE - ISTITUTO NAZIONALE DI FISICA NUCLEARE - \ST, ^3 Laboratori Nazionali Di Frascati

Advancements in Microprocessor Architecture for Ubiquitous AI—An Overview on History, Evolution, and Upcoming Challenges in AI Implementation

A 160Mhz 32B 0.5W CMOS ARM Processor, Sribalan

History of Processor Performance