Low Power Data-Dependent Transform Video and Still Image Coding

by
Thucydides Xanthopoulos

Bachelor of Science in Electrical Engineering
Massachusetts Institute of Technology, June 1992

Master of Science in Electrical Engineering and Computer Science
Massachusetts Institute of Technology, February 1995

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology

February 1999

© 1999 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Department of Electrical Engineering and Computer Science, February 1, 1999
Certified by: Anantha Chandrakasan, Ph.D., Associate Professor of Electrical Engineering, Thesis Supervisor
Accepted by: Arthur Clarke Smith, Ph.D., Professor of Electrical Engineering, Graduate Officer

Abstract

This work introduces the idea of data dependent video coding for low power. Algorithms for the Discrete Cosine Transform (DCT) and its inverse are introduced which exploit statistical properties of the input data in both the space and spatial frequency domains in order to minimize the total number of arithmetic operations. Two VLSI chips have been built as a proof-of-concept of data dependent processing implementing the DCT and its inverse (IDCT). The IDCT core processor exploits the presence of a large number of zero-valued spectral coefficients in the input stream when stimulated with MPEG-compressed video sequences.
A data-driven IDCT computation algorithm and clock gating techniques are used to reduce the number of arithmetic operations for video inputs. The second chip is a DCT core processor that exhibits two innovative techniques for arithmetic operation reduction in the DCT computation context along with standard voltage scaling techniques such as pipelining and parallelism. The first method reduces the bitwidth of arithmetic operations in the presence of data spatial correlation. The second method trades off power dissipation and image compression quality (arithmetic precision). Both chips are fully functional and exhibit the lowest switched capacitance per sample among past DCT/IDCT chips reported in the literature. Their power dissipation profile shows a strong dependence on certain statistical properties of the data that they operate on, according to the design goal.

Thesis Supervisor: Anantha Chandrakasan
Title: Associate Professor of Electrical Engineering

Acknowledgments

I am very grateful to a number of people for help and contributions during my years as a student.

I would like to thank my thesis advisor and mentor Anantha Chandrakasan for all his help and support. Anantha has been a constant source of stimulation and new research ideas and his impact on this work is immeasurable. I thank him for treating me as a peer and not as a subordinate, for providing me with the best possible (and expensive!) equipment for my work, for sending me to numerous conferences (as far as Hawaii and Switzerland) and for having a lot of faith in my abilities. Above all I wish to thank him for the unprecedented amount of time he has devoted to me and my work.

I wish to thank my thesis readers Tayo Akinwande and Ichiro Masaki for their time and for the very useful comments and suggestions. This work has been greatly improved because of their time and effort.

I am very grateful to my fellow graduate students in 38-107 for a number of reasons.
Raj Amirtharajah has been a constant source of wit and interesting ideas (in addition to complaints!) and has been very helpful in clarifying various ideas regarding this work. He has probably had dinner with me more often in the little round table in 38-107 than he did with his girlfriend! Tom Simon, the brilliant designer, has been behind all the interesting circuits of the DCT chip such as the pads and the flip flops. His cerebral sarcasm and our long discussions about France, Mahler and crazy dictators have been excellent and fulfilling chip layout breaks! Jim Goodman, in addition to being an incessant source of coarse Canadian humor, has been instrumental in setting up the technology files and the libraries for the ADI process. His help during the DCT chip tapeout has been invaluable. Scott Meninger, fellow Patriot fan, taught me how to throw the football and introduced me to Mississippi Mud in addition to being an excellent officemate. He is probably the only one in 38-107 that can match my beer drinking abilities! Wendi Rabiner has been a great source of comments and suggestions both for my thesis and my area exam. Josh Bretz introduced me to the humor of the “Onion”, supplied me with great CDs and taught me never to bet against him especially when money is involved. SeongHwan Cho has been a great help with my area exam and has graciously added my name to all the VLD papers. Vadim Gutnik has been a source of interesting discussions and has been very helpful with Unix problems. Gangs Konduri has graciously agreed to help me with a very useful Perl script. I also thank him for the late night study breaks. I wish to thank Rex Min for doing a great job on the DCT/IDCT demonstration board. I thank Dan McMahill, Don Hitko and Nate Shnidman for stimulating discussions over the past years. Dan has also helped me a great deal with the Tango PCB software.
I did not have much interaction with the newest members of our group Amit Sinha and Alice Wang due to my grueling schedule during the past nine months. Yet, I am well aware of their outstanding abilities and I wish them best of luck in their studies.

I wish to thank Elizabeth Chung and Margaret Flaherty who have been extremely helpful with administrative matters. Tom Lohman, Myron Freeman and Tom Wingard have been very helpful with the computing infrastructure.

I thank National Semiconductor Corporation for providing me with a fellowship during my last two years. I thank Michael Sampogna, Chuck Traylor and Hemant Bheda for all their help during my summer at National.

I thank SHARP Corporation for their partial financial support for this research. I thank Yoshifumi Yaoi of SHARP for his help with the DAC paper and with the MPEG compression standard in general.

I wish to thank Rich Lethin for entrusting me with interesting projects and providing me with great opportunities. I thank Larry Dennison for his valuable advice in time of need and for his excellent SKILL tools that I used extensively throughout the development of the DCT chip.

I would like to thank my Master thesis advisor Bill Dally for his help and support during my years at the MIT AI Laboratory. I also thank him for pointing me to the FMIDCT paper.

I wish to thank my uncle John Xanthopoulos and aunt Virginia Xanthopoulos for all their help and support during my years in the States. Their house has always been open for me and their love has been unconditional.

I thank my grandmothers Georgia Xanthopoulos and Themelina Kapellas for their love and support from the day I was born.

I thank my parents Nikos and Maria and my sister Georgia for their love and support. I know how much you have missed me during the past 10 years because I have missed you exactly the same if not more. You have made me what I am and for this I am eternally grateful.
My trips to Greece and our telephone conversations have been an incessant source of strength and courage for me to go on. I know that you want me to come back soon. I promise to you that I will come back sooner than you think.

Finally, I wish to thank my companion Sylvi Dogramatzian for her unconditional love and support from the day we met. Your love has made me a better person. I thank you for your patience and all your sacrifices. I promise that I will make up for all the lost ground. This work is as much yours as it is mine.

I wish to dedicate this work to the memory of both my grandfathers, Thucydides Xanthopoulos and Manolis Kapellas.

Contents

1 Introduction
  1.1 A Case for Low Power Hardware Compression/Decompression Macrocells
  1.2 Low Power Design Methodology
    1.2.1 Algorithm Optimizations
    1.2.2 Architectural Optimizations
    1.2.3 Circuit Optimizations
  1.3 Thesis Contributions and Overview

2 Background
  2.1 The Discrete Cosine Transform and its Inverse
    2.1.1 Fast DCT/IDCT Algorithms
  2.2 Distributed Arithmetic
    2.2.1 On the Successive Approximation Property of Distributed Arithmetic
  2.3 Quantization
    2.3.1 The Lloyd-Max Optimum Quantizer
  2.4 Image Quality Measure
  2.5 The MPEG Video Compression Standard

3 IDCT Core Processor
  3.1 Background and Algorithmic Design
  3.2 Chip Architecture
    3.2.1 Clock Gating
  3.3 Circuit Design
    3.3.1 Arithmetic Circuits
    3.3.2 Flip-Flops
    3.3.3 I/O Pads
  3.4 Power Estimation Methodology
    3.4.1 Glitch Power Investigation
    3.4.2 Pipeline Power Savings Investigation
  3.5 Chip I/O Specification and Usage
  3.6 Chip Physical Design and Packaging