Microarchitectural Low-Power Design Techniques for Embedded Microprocessors

THÈSE NO 7168 (2016)

PRÉSENTÉE LE 11 NOVEMBRE 2016
À LA FACULTÉ DES SCIENCES ET TECHNIQUES DE L'INGÉNIEUR
LABORATOIRE DE CIRCUITS POUR TÉLÉCOMMUNICATIONS
PROGRAMME DOCTORAL EN GÉNIE ÉLECTRIQUE

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

POUR L'OBTENTION DU GRADE DE DOCTEUR ÈS SCIENCES

PAR

Jeremy Hugues-Felix CONSTANTIN

acceptée sur proposition du jury:

Prof. Y. Leblebici, président du jury
Prof. A. P. Burg, Prof. D. Atienza Alonso, directeurs de thèse
Prof. A. Chattopadhyay, rapporteur
Prof. G. Karakonstantis, rapporteur
Prof. L. Benini, rapporteur

Suisse 2016

Acknowledgements

I would like to express my gratitude to my thesis advisor Prof. Andreas Burg for giving me the opportunity to work together on such interesting topics throughout my PhD time, and for creating a supportive work environment in his laboratory. I am very grateful for our many technical discussions, from which I learned a lot over the years about all aspects of engineering, first at ETH during my master studies and later at EPFL. Moreover, I thank him for always having an open ear and for always being available, whatever the problem or question, especially during important times. I acknowledge my co-advisor Prof. David Atienza for enabling the good cooperation between his laboratory (ESL) and the Telecommunications Circuits Laboratory (TCL), when I first arrived at EPFL and during the early time of my PhD. I would like to thank Prof. Luca Benini, Prof. Anupam Chattopadhyay, and Prof. Georgios Karakonstantis for serving as examiners on my thesis defense jury, and Prof. Yusuf Leblebici for taking on the role of jury president. I want to thank Ahmed Dogan from ESL for the fruitful cooperation during the first years of my PhD, building and researching architectures for biomedical signal processing. My thanks also go to Oskar Andersson for his support with the sub-threshold characterization of TamaRISC-CS, and Pascal Meinerzhagen who generously provided the sub-threshold capable memories for the design. I am grateful to Frank Gürkaynak for our discussions and his help on the cores with SHA-3 ISEs. For their contributions to the LISA model of the OpenRISC and its extensions for fault injection, I would like to extend my thanks to Lai Wang and Zheng Wang, respectively. I am particularly grateful to team DynOR (Andrea Bonetti, Adam Teman, Christoph Müller, and Lorenz Schmid) for all their efforts and extraordinary contributions to the successful design and measurement of our first test chip in 28 nm FD-SOI. This project would not have been possible without them.
During my time at EPFL I had the pleasure to work in a wonderful environment. I was lucky to meet many terrific people over the years and had a number of great colleagues, who especially were always up for fun activities inside as well as outside the lab, which made my stay at TCL that much more enjoyable. I will take many fond memories of this time with me, from the many culinary adventures of various kinds to great nature excursions, or just the funniest discussions that we had in the office. I am very happy to be able to call them my friends: Adi Teman, Alexios Balatsoukas Stimming, Andrea Bonetti, Andrew Austin, Christian Senning, Christoph Müller, Filippo Borlenghi, Georgios Karakonstantis, Kynthia Chamilothori, Lorenz Schmid, Maitane Barrenetxea, Nicholas Preyss, Orion Afisiadis, Pablo Garcia, Pascal Giard, Pascal Meinerzhagen, Pavle Belanovic, Reza Ghanaatian, Rubén Braojos, Shrikanth Ganapathy.


Finally, I want to thank my family for all their support over the years and for always believing in my goals, which helped me immensely during all my education. I also want to express my deepest gratitude to my girlfriend Sandra for all her love and her continuous support, which always kept me going through the last years of this journey.

Lausanne, 2 November 2016 Jeremy Constantin

Abstract

Over the last two decades, embedded processing has become omnipresent in all forms of electronic devices in order to provide increasingly complex features and richer user experiences. There is moreover a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded design. While technology scaling continues to provide improvements in both performance and power, architectural design has become equally important, if not more so, for meeting desired performance requirements within severely limited power budgets. In this thesis we present a range of microarchitectural low-power design techniques which enable increased performance for embedded microprocessors and/or reduced energy consumption, e.g., through voltage scaling. A popular technique for improving processor performance in application-specific systems and scenarios is the use of instruction set extensions (ISEs). In the context of cryptographic applications, we explore the effectiveness of ISEs for a range of different cryptographic hash functions on a microcontroller architecture. Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. The proposed ISE concepts are evaluated on the PIC24 architecture (a 16-bit microcontroller) with the final-round SHA-3 candidate hash algorithms. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power implementation.
The TamaRISC architecture is employed in conjunction with an ISE and standard memories to design a sub-threshold capable processor system targeted at compressed sensing applications. For a case study of electrocardiogram (ECG) signal compression we show that significant power savings of more than 11x over the state of the art for programmable architectures are possible. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at biomedical signal processing applications with moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented, which can provide significant energy savings of more than 50%. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing.


We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, that these margins are to a large extent state-dependent, and that they are correlated with the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed to devise a high-level instruction set simulation environment for evaluating the impact of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 32-bit microprocessor with cycle-by-cycle dynamic clock adjustment. The microarchitecture of the employed OpenRISC core comprises a 6-stage pipeline, and its implementation is specifically tuned for DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of our DCA-enabled microarchitecture fabricated in 28 nm FD-SOI CMOS. The fully operational test chip includes a suitable clock generation unit which allows for cycle-by-cycle clock adjustment over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions through voltage scaling are provided.

Keywords: low-power design, microarchitecture, instruction set architecture, instruction set extensions, ultra-low-power embedded processor, microcontroller, ASIC, VLSI design, sub-threshold operation, multi-core architecture, sensor nodes, biomedical signal processing, cryptographic hash functions, SHA-3, compressed sensing, dynamic timing margins, dynamic timing analysis, dynamic clock adjustment, timing errors, statistical fault injection, approximate computing, OpenRISC

Zusammenfassung

In den letzten zwei Jahrzehnten ist eingebettete Datenverarbeitung in jeglichen Formen von Elektronik allgegenwärtig geworden, um immer komplexere Funktionen und reichere User Experiences zur Verfügung zu stellen. Es gibt zudem einen starken Trend zu drahtlosen, batteriebetriebenen, tragbaren eingebetteten Systemen, die unter strengen Energieeinschränkungen arbeiten müssen. Folglich haben sich niedriger Energieverbrauch und hohe Energieeffizienz als die beiden Schlüsselkriterien für eingebetteten Prozessorentwurf entwickelt. Obwohl CMOS-Technologiefortschritte weiterhin Verbesserungen in puncto Rechenleistung und Energieverbrauch ermöglichen, ist Architekturentwurf mittlerweile gleichbedeutend oder sogar noch bedeutender geworden, um gewünschte Rechenleistungsanforderungen mit stark beschränkten Energiebudgets zu erfüllen. In dieser Dissertation präsentieren wir eine Reihe von mikroarchitekturellen energiesparenden Entwurfstechniken, die erhöhte Leistung für eingebettete Mikroprozessoren und/oder die Verringerung des Energieverbrauchs, zum Beispiel durch Spannungsskalierung, ermöglichen. Eine beliebte Technik zur Verbesserung der Prozessorleistung in anwendungsspezifischen Systemen/Szenarien sind Befehlssatzerweiterungen (BSE). Im Rahmen von kryptographischen Anwendungen untersuchen wir die Effektivität von BSE für eine Reihe von verschiedenen kryptographischen Hash-Funktionen auf einer Mikrocontroller-Architektur. Insbesondere zeigen wir die Effektivität von leichten/einfachen BSE basierend auf Integration von Umsetzungstabellen und mikrocodierten Instruktionen unter Verwendung von endlichen Zustandsautomaten für die Generierung von Operanden und Adressen. Die vorgeschlagenen BSE-Konzepte werden auf der PIC24 Architektur (ein 16-bit Mikrocontroller) zusammen mit den SHA-3 Hash-Algorithmus-Kandidaten der letzten Runde ausgewertet.
Datenverarbeitung in autonomen Funksensorknoten erfordert tief eingebettete Rechenkerne mit extrem niedrigem Stromverbrauch. Um diesem Bedarf zu begegnen, stellen wir TamaRISC vor, eine speziell entworfene Befehlssatzarchitektur mit einer dazugehörigen ultra-energiesparenden Mikroarchitektur-Umsetzung. Die TamaRISC Architektur wird in Verbindung mit einer BSE und Standardzellen-Speicher verwendet, um ein Prozessorsystem für Anwendungen basierend auf komprimierter Abtastung zu entwerfen, welches unterhalb der Schwellenspannung betrieben werden kann. Für eine Fallstudie von Elektrokardiogramm (EKG) Signalkompression zeigen wir, dass erhebliche Energieeinsparungen von mehr als 11x über den Stand der Technik für programmierbare Architekturen möglich sind. Wir verwenden außerdem TamaRISC in einer hybriden SIMD/MIMD Mehrkernarchitektur, welche für biomedizinische Signalverarbeitungsanwendungen mit mittleren bis hohen Anforderungen an

die Verarbeitung (> 1 MOPS) entworfen ist. Eine Reihe von unterschiedlichen Mikroarchitekturtechniken für eine effiziente Speicherorganisation werden vorgestellt, welche erhebliche Energieeinsparungen von mehr als 50% ermöglichen. Insbesondere stellen wir eine Technik für konfigurierbare Speicher-Address-Übersetzung für privaten und gemeinsamen Zugriff, sowie Befehlsübertragungen zusammen mit synchronisierter Codeausführung auf Basis von Programmhaltepunkten vor. Wir untersuchen dann eine inhärente Suboptimalität aufgrund des Schlimmstfall-Entwurfsprinzips von synchronen Schaltungen, und führen das Konzept der dynamischen Zeit-Margen ein. Wir zeigen, dass dynamische Zeit-Margen in Mikroprozessorschaltungen vorhanden sind, dass diese Margen zu einem großen Teil zustandsabhängig sind, und dass sie zu den Sequenzen von Befehlstypen korreliert sind, die innerhalb der Prozessor-Pipeline ausgeführt werden. Zur Durchführung dieser Analyse schlagen wir eine Schaltungs-/Prozessor-Charakterisierungs-Methodik und ein Werkzeug namens dynamische Zeit-Analyse vor. Darüber hinaus verwenden wir diese Methodik, um eine High-Level-Befehlssatz-Simulationsumgebung für die Evaluation der Auswirkung von zeitlichen Fehlern auf die Anwendungsleistung zu entwickeln. Der vorgestellte Ansatz verbessert den Stand der Technik im Hinblick auf die Simulationsgenauigkeit durch den Einsatz von statistischer Fehlerinjektion. Die dynamischen Zeit-Margen in Mikroprozessoren werden dann für die Durchsatzsteigerung oder für Energieeinsparungen durch unsere vorgeschlagene befehlsbasierte dynamische Taktanpassung (DT) systematisch genutzt. Zu diesem Zweck stellen wir einen 32-bit Mikroprozessor mit Zyklus-zu-Zyklus DT vor. Die Mikroarchitektur des eingesetzten OpenRISC Kerns besteht aus einer 6-stufigen Pipeline, und ihre Implementierung ist für DT speziell abgestimmt.
Neben einer umfassenden Entwurfsmethodik und Simulationsumgebung für die Evaluierung des DT Ansatzes stellen wir zusätzlich einen Silizium-Prototyp unserer DT-fähigen Mikroarchitektur vor, welcher in 28 nm FD-SOI CMOS hergestellt wurde. Der voll funktionsfähige Testchip beinhaltet eine geeignete Takterzeugungseinheit, die eine Zyklus-zu-Zyklus Taktanpassung über einen weiten Bereich mit feiner Granularität ermöglicht, bei Frequenzen von mehr als 1 GHz. Die Messergebnisse der Beschleunigungen und der Energiereduzierungen durch Spannungsskalierung werden zusätzlich vorgestellt.

Stichwörter: energiesparender Entwurf, Mikroarchitektur, Befehlssatzarchitektur, Befehlssatzerweiterungen, ultra-energiesparender eingebetteter Prozessor, Mikrocontroller, ASIC, VLSI Entwurf, unter-Schwellenspannung Betrieb, Mehrkernarchitektur, Sensorknoten, biomedizinische Signalverarbeitung, kryptographische Hashfunktionen, SHA-3, komprimierte Abtastung, dynamische Zeit-Margen, dynamische Zeitanalyse, dynamische Taktsignalanpassung, zeitliche Fehler, statistische Fehlerinjektion, approximierende Datenverarbeitung, OpenRISC

Contents

Acknowledgements i

Abstract (English/German) iii

1 Introduction 1
1.1 CMOS Power and Energy Consumption ...... 5
1.2 Supply Voltage Scaling ...... 7
1.3 Contributions ...... 13
1.4 Thesis Outline ...... 16
1.5 Selected Publications ...... 17
1.6 Third-Party Contributions ...... 19

2 Microcontroller Architecture Enhancements for Cryptographic Applications 21
2.1 Applications: SHA-3 Selection Competition Algorithms ...... 22
2.1.1 Principles of Cryptographic Hash Functions ...... 23
2.1.2 SHA-3 Candidate Algorithms ...... 25
2.2 Embedded Microcontroller Architecture (PIC24) ...... 28
2.2.1 Instruction Set Architecture ...... 28
2.2.2 Microarchitecture ...... 30
2.3 Performance Metrics ...... 32
2.4 Baseline Performance on PIC24 ...... 33
2.4.1 Baseline Implementations ...... 33
2.4.2 Performance Results and Comparison ...... 39
2.5 Architecture Enhancements Through Instruction Set Extensions ...... 40
2.5.1 ISE Design Flow ...... 41
2.5.2 ISE Types for Cryptographic Hash Functions ...... 43
2.5.3 Algorithm-Specific ISE of SHA-3 Finalists ...... 49
2.5.3.1 BLAKE ISE ...... 50
2.5.3.2 Grøstl ISE ...... 52
2.5.3.3 JH ISE ...... 54
2.5.3.4 Keccak ISE ...... 55
2.5.3.5 Skein ISE ...... 56
2.6 Performance on PIC24 with ISEs ...... 57
2.6.1 Core Performance and Execution Speed ...... 57


2.6.2 Memory Consumption ...... 59
2.6.3 Hardware Overhead ...... 60
2.6.4 Energy Considerations ...... 62

3 Microprocessor Design for Deeply Embedded Ultra-Low-Power Processing 65
3.1 TamaRISC: A 16-bit Core for ULP Applications ...... 66
3.1.1 Instruction Set Architecture ...... 67
3.1.2 Microarchitecture ...... 69
3.1.3 Implementation & Evaluation Flow ...... 70
3.1.4 Software Toolchain and Real-Time OS Support ...... 72
3.2 TamaRISC-CS: A Sub-Threshold ULP Processor for Compressed Sensing ...... 74
3.2.1 Compressed Sensing ...... 76
3.2.1.1 Reduced Complexity Compression Algorithm ...... 76
3.2.1.2 Pseudorandom Number Generation ...... 77
3.2.1.3 Index Sequence Implementations ...... 78
3.2.2 Instruction Set Extension for CS ...... 79

3.2.3 Sub-VT Memories ...... 81
3.2.4 Power and Performance Results ...... 83

3.2.4.1 Synthesis Strategy and Sub-VT Energy Profiling ...... 83
3.2.4.2 Implementation and Simulation Results ...... 85
3.2.4.3 Case Study: CS-Based ECG Signal Compression ...... 88
3.3 Comparison with State of the Art ...... 92
3.4 Multi-Core Processing ...... 95
3.4.1 Comparison of Single-Core and Multi-Core Processing ...... 95
3.4.2 Efficient Memory Organization ...... 97
3.4.2.1 Configurable Data Memory Mapping ...... 98
3.4.2.2 Data and Instruction Broadcast ...... 101
3.4.2.3 Synchronized Code Execution ...... 104

4 Dynamic Timing Margins in Embedded Microprocessors 107
4.1 Timing Margins in Synchronous Circuits ...... 108
4.1.1 Variations, Guardbanding, and Mitigation Techniques ...... 109
4.1.2 State-Dependent Dynamic Timing Margins ...... 113
4.1.3 Dynamic Timing Analysis ...... 116
4.2 Dynamic Timing Analysis for Microprocessors ...... 119
4.3 Characterization of Application Behavior under Timing Errors due to Frequency- and/or Voltage-Over-Scaling ...... 123
4.3.1 Case Study ...... 125
4.3.1.1 Hardware Processor Core ...... 125
4.3.1.2 Instruction Set Simulator with Fault Injection ...... 126
4.3.1.3 Software Benchmarks ...... 127
4.3.2 Modeling of Timing Errors ...... 129
4.3.2.1 Fixed Probability FI ...... 129

4.3.2.2 Static Timing Based FI ...... 130
4.3.2.3 Supply Voltage Noise ...... 132
4.3.2.4 Proposed Dynamic Timing Statistical FI ...... 133
4.3.3 Application of Statistical FI ...... 135
4.3.3.1 Instruction Characterization ...... 135
4.3.3.2 Impact of Frequency, Voltage, and Noise ...... 136
4.3.3.3 Performance Comparison of Benchmarks ...... 138
4.3.3.4 Error vs. Power Consumption Trade-Off ...... 140

5 A Microprocessor with Cycle-By-Cycle Dynamic Clock Adjustment 143
5.1 Instruction-Based Dynamic Clock Adjustment ...... 144
5.1.1 Related Work ...... 144
5.1.2 DCA Concept ...... 146
5.1.3 Design Flow and Evaluation Environment ...... 148
5.1.3.1 Implementation ...... 148
5.1.3.2 Characterization ...... 149
5.1.3.3 DCA Evaluation ...... 151
5.2 DCA Case Study Based on Instruction Set Simulation ...... 151
5.2.1 OpenRISC Microarchitecture, Optimization, and Implementation ...... 152
5.2.2 Performance Evaluation Environment ...... 154
5.2.3 Characterization and Performance Results ...... 155
5.2.3.1 Dynamic Timing Analysis of OpenRISC ...... 155
5.2.3.2 Performance and Power ...... 158
5.3 DynOR Hardware Architecture ...... 160
5.3.1 Chip Architecture ...... 160
5.3.2 Dynamic Clock Adjustment Module ...... 161
5.3.3 Clock Generation Module ...... 162
5.4 DynOR Test-Chip ...... 164
5.4.1 Implementation ...... 164
5.4.2 Measurement Setup ...... 166
5.4.3 DCA LUT Calibration ...... 168
5.5 Measurement Results ...... 172
5.5.1 Speedup ...... 174
5.5.2 Power Reduction ...... 175
5.5.3 Comparison ...... 176

6 Conclusions & Outlook 179

Bibliography 185

Glossary 201

List of Figures 207


List of Tables 213

List of Publications 215

Curriculum Vitae 219

1 Introduction

Embedded processing has arguably evolved into the predominant form of computing over the past two decades. Virtually any electronic product today integrates some form of embedded processor, covering every kind of application imaginable, from consumer and industrial to medical and automotive. While the field of embedded systems was dominated in the past by small microcontrollers focused on simple control tasks with very little processing power, the evolution of embedded computing has today given rise to complex systems-on-chip (SoCs) which provide very high processing capabilities under the most stringent energy and cost constraints.

With the advances in ubiquitous and pervasive computing [Wei99, Sat01, HB01, ECPS02], computing and specifically data processing now already occurs in many objects around us, anywhere and at any time. In recent years this development has accelerated significantly with the conception of the internet of things (IoT) and similar concepts, which aim not only to locally connect all these objects and devices, but to interconnect them on a global scale to enable new applications and ways of computing.

A model for the internet of things is illustrated in Figure 1.1 [IBM14]. The “things” or end devices often provide sensors and/or actuators, and always comprise some form of processing, performed on one or multiple embedded microprocessor(s). The devices typically connect wirelessly to a local network, which can consist of multiple layers and intermediary larger nodes, again requiring embedded processing capabilities. These local networks integrate with the internet, which allows the end devices to be controlled via data-center-driven applications. The end user typically interacts with the end devices, or rather with the whole application or service which encompasses the end devices, via a personal controlling device such as a smartphone or tablet.

Figure 1.1 – Model for the internet of things [IBM14], with embedded microprocessors integrated as part of the end devices/“things”, the intermediary nodes in the local network, and the controlling devices of the end users.

Wireless sensor nodes are one of the most prominent types of end devices in the IoT context. Such nodes combine sensing and processing capabilities in a constantly shrinking form factor, and are comprised of an embedded low-power processor subsystem, a sensor with an analog-to-digital converter, a radio, as well as a battery. A prime example of an application of wireless sensor nodes are wireless body sensor networks (WBSNs). WBSNs are very successfully employed in the domain of healthcare and fitness, where they provide a low-cost solution for continuous monitoring and logging of a person’s vital parameters [RMC05, CLC+09, YY10, KMKA11]. An example of a WBSN-based system as it is used in a typical healthcare monitoring application is given in Figure 1.2 [CCHL12]. Here, the WBSN provides personal monitoring capabilities for the medical conditions of patients, which allows, for example, the implementation of automated warning systems, and can trigger emergency interventions in urgent cases. For such a warning and detection system to operate efficiently, the microcontroller unit (MCU) in the end node has to provide appropriate capabilities for processing of the sensor data, since transmission of raw sensor data is normally too costly in terms of energy consumption [MKAV11, Dog13]. As illustrated in Figure 1.2, such a wireless sensor node can comprise a wide variety of different sensors, even with multiple channels. In this example, the temperature, blood pressure, and heartbeat of the patient are monitored via the wireless sensor node.

Figure 1.2 – Wireless body sensor network (WBSN) system for hospital and healthcare monitoring applications [CCHL12].

One of the most important aspects of such sensor nodes is that their operation is largely autonomous and battery-driven. Although there is extensive research into energy harvesting techniques for fully autonomous operation of sensing devices in general, the amount of energy, and especially the instantaneously available power, that can be provided is overall still quite limited [PS05, MYR+08, GWZ13]. Wireless sensor nodes hence require the highest energy efficiency on a complete system/device level in order to provide adequate operation times considering their limited battery capacity.

The need for deeply embedded processing is growing due to the more and more complex tasks required by the applications that run on such devices. Moreover, especially because of the mentioned limitation in the data volume that can be transmitted over an integrated radio on a tight power budget, on-node (signal) processing is becoming increasingly necessary. However, significant gains in terms of energy efficiency, and in turn battery lifetime, can only be achieved through low-power microprocessor/microcontroller subsystems which execute the required signal processing algorithms.

The system diagram of a state-of-the-art embedded subsystem for IoT applications, integrating an ARM Cortex-M microprocessor, is depicted in Figure 1.3 [ARM16]. The system includes, besides the processing core, a light-weight bus interconnect that provides access to memory controllers as well as to a radio and other peripherals. As an example of a microarchitectural measure to bring down energy consumption, the system uses a flash cache, since any accesses that are made to the embedded flash memory are very energy-intensive. Furthermore, such systems comprise a power management unit, which allows the circuit to better operate according to the current requirements of the application and to save power by employing techniques such as clock gating or power gating for parts of the circuit which are temporarily not utilized. Another popular option for power optimization provided by a power management unit is the adjustment of the different clock frequencies of the various components on the chip to their


current throughput needs. Additionally, especially in more complex SoC designs of recent years, supply voltages can often be adjusted through the use of on-chip regulators to achieve very low power consumption.

Figure 1.3 – System diagram of an exemplary state-of-the-art embedded subsystem for IoT applications, integrating a Cortex-M microprocessor [ARM16].

Highest energy efficiency is, however, not only required for deeply embedded IoT end-node type devices with their severely limited battery capacities. Devices incorporating larger SoCs for embedded and mobile applications also operate under stringent energy constraints, often with total system power budgets of a few hundred mW to one W, even for computing scenarios with high performance requirements. This is not only due to battery limitations, but also increasingly due to thermal limits that have to be met for high-performance hand-held devices with very small form factors, imposing strong constraints on the employed heat management and cooling solutions [SA14, SVC15].

As a consequence, there is an increasing need for very high energy efficiency in embedded microprocessors, over the full processing performance and application spectrum. Energy efficiency has become the most important and driving design constraint in the further evolution of such systems, even over other important aspects such as silicon area or maximum achievable clock frequency. There are many layers in the computing stack of a programmable architecture, which can all contribute significantly to the power consumption of the complete system. While choosing the appropriate algorithm for the given task at hand is of course paramount, lower layers of the stack can still provide significant improvements after optimizations on the higher, more abstract layers have been exploited to their full potential.

In this thesis the focus lies on microarchitectural design techniques and optimizations, which are on the boundary between the instruction set architecture of a microprocessor — i.e., the fixed contract between the programmer/software and the hardware — and the physical

hardware implementation of the circuit. The microarchitectural improvements especially enable energy-efficient and/or low-power operation when they are combined with supply voltage scaling, a key technique for lowering the power consumption of digital circuits.

The following two sections provide an overview of the contributing factors for the power consumption in a digital complementary metal–oxide–semiconductor (CMOS) circuit, and then introduce the general benefits and challenges stemming from voltage scaling for microprocessor circuits.

1.1 CMOS Power and Energy Consumption

The power consumption in watts [W] of a circuit denotes the amount of energy in joules [J] that is dissipated per second.¹ While absolute power numbers for a given state of a system (active, idle, sleep, etc.) are useful on a full system level, note that this measure usually does not relate to the performance of the circuit, since it does not reflect the amount of computational work or processing that is performed for the given power consumption. Especially from an architectural design point of view, it is hence often more useful to consider the amount of energy used per computational operation or per processed data item [Kae08], which is often also referred to as the energy efficiency. Thus, in the following we focus on the different energy components that get dissipated in a CMOS circuit and relate these quantities to the power consumption P with the following simple equation:

P = f_c · E_c . (1.1)

Here, f_c denotes the computation rate, which for a single-edge-triggered digital circuit is normally equivalent to the clock frequency f_clk at which the circuit operates, while E_c stands for the total energy dissipated per computation cycle (or rather its average over many cycles).
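As a small illustration of (1.1), the following Python sketch converts between average power and energy per cycle. The numeric values (a 32 MHz clock and 25 pJ per cycle) are invented for the example and do not correspond to any measured design in this thesis.

```python
# Relation P = f_c * E_c between average power, computation rate,
# and the (average) energy dissipated per computation cycle (Eq. 1.1).

def power_from_energy(f_c_hz: float, e_c_joule: float) -> float:
    """Average power [W] of a circuit clocked at f_c, dissipating E_c per cycle."""
    return f_c_hz * e_c_joule

def energy_per_cycle(p_watt: float, f_c_hz: float) -> float:
    """Energy per cycle [J], recovered from a power measurement at a known clock."""
    return p_watt / f_c_hz

# Hypothetical microcontroller: 32 MHz clock, 25 pJ per cycle.
p = power_from_energy(32e6, 25e-12)   # ~800 uW average power
e = energy_per_cycle(p, 32e6)         # recovers the 25 pJ per cycle
```

The second function reflects the point made above: a raw power number only becomes a measure of efficiency once it is normalized by the computation rate, yielding energy per operation.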

E_c has two main components: the active energy E_active, which is dissipated due to switching of the circuit, and the static energy E_static, which is constantly dissipated as long as the circuit is powered, i.e., connected to a supply voltage. This means that E_active is only dissipated when the circuit receives a clock², while E_static is also consumed during phases where the clock is not active.

E_c = E_active + E_static (1.2)

The active energy E_active can be split into two components: the energy E_ch spent on charging

¹ The introduction to power consumption and energy dissipation in CMOS circuits of this section is based on the excellent explanations and observations in Chapter 9.1 of the textbook “Digital Integrated Circuit Design” [Kae08]. This includes most formulas.
² Technically, it is also possible for active energy to be dissipated when primary inputs of a circuit or module switch, even with a gated/disabled clock.


and discharging of capacitive loads, and the energy Ecr dissipated due to crossover currents.

E_active = E_ch + E_cr    (1.3)

The static energy E_static also has two parts: the leakage energy E_leak, and the energy E_rr consumed by the driving of resistive loads.

E_static = E_leak + E_rr    (1.4)

Charging and discharging of capacitive loads (E_ch) The energy that is dissipated in the p-MOS/n-MOS transistors as thermal energy in order to either charge or discharge the capacitive load C_k of a node k is (1/2) C_k V_dd^2, with the supply voltage at V_dd. Together with the node activity α_k, which indicates how many times per cycle a node k on average switches from one logic state to the other, we derive E_ch for the complete circuit with K different nodes as:

E_ch = V_dd^2 · Σ_{k=1}^{K} (α_k / 2) · C_k    (1.5)

We can observe from (1.5) that E_ch can be reduced either by decreasing the total capacitance over all K nodes (e.g., via proper gate sizing), or by lowering the activities α_k, for example through clock gating or by avoiding as many glitches as possible during the settling phases of nodes. More importantly, however, E_ch can be significantly reduced by lowering V_dd, due to the quadratic dependence of E_ch on V_dd. This is the key observation which makes supply voltage down-scaling such an effective technique for power reduction [WCC06].
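The quadratic V_dd dependence of (1.5) can be sketched directly in code. The node activities and capacitances below are hypothetical example values; the point is only that scaling the supply by a factor scales E_ch by that factor squared.

```python
# Sketch of Eq. (1.5): E_ch = Vdd^2 * sum_k (alpha_k / 2) * C_k.
# Node activities and capacitances are hypothetical example values.

def e_ch(vdd, nodes):
    """Charging/discharging energy per cycle; nodes = [(alpha_k, C_k), ...]."""
    return vdd ** 2 * sum(alpha * c / 2 for alpha, c in nodes)

nodes = [(0.1, 2e-15), (0.5, 1e-15), (1.0, 0.5e-15)]  # (activity, capacitance in F)

# Quadratic dependence on Vdd: halving the supply quarters E_ch.
assert abs(e_ch(0.6, nodes) / e_ch(1.2, nodes) - 0.25) < 1e-12
```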

Crossover currents (E_cr) During the switching phase of the p- and n-network of a CMOS gate, energy is additionally dissipated due to current that flows directly from supply to ground while for a short time both the p- and n-paths are partially conducting. The exact energy that is dissipated due to crossover/short-circuit currents depends on numerous factors. However, if transition times of gate inputs can be kept as short as possible, the contribution of E_cr towards the total active energy is small. Furthermore, E_cr is also decreased by a supply voltage reduction, with a cubic dependency on V_dd.

Leakage currents (E_leak) The major part of static power consumption typically stems from leakage currents that occur due to the imperfections of the p- and n-MOS devices in a CMOS process. The contributors to E_leak are sub-threshold conduction of channels that are turned off, gate leakage, and leakage currents through multiple combinations of reverse-biased junctions (drain/source with bulk, and well with well/substrate). Since leakage energy is dissipated for all devices in the circuit, the total leakage energy is linearly dependent on the number of transistors (or more specifically the sum of their widths), i.e., roughly on the overall utilized silicon area. Leakage has become an increasingly large part of the total energy/power consumption of circuits in more advanced bulk CMOS technology nodes (sub-65 nm), mainly due to reduced threshold voltages, which make the channel harder to control in its off state. To address these problems, foundries offer an increasing number of different process options for a single technology node, allowing the device performance to be better optimized towards the circuit requirements (low power, low leakage, high speed, etc.). In relation to voltage scaling, leakage energy becomes increasingly relevant for circuit operation near and especially below the threshold voltage (i.e., for V_dd ≲ V_T). This is due to the significantly reduced active energy dissipation at these operating points, which requires careful management of leakage energy to achieve further energy reductions. Leakage energy depends exponentially both on the supply voltage (lower V_dd reduces leakage power) and on the threshold voltage (higher V_T reduces leakage).

Driving of resistive loads (E_rr) In theory, no static currents should flow in a CMOS circuit that is not switching (besides the unavoidable leakage currents), since by design there should be no direct static paths between supply and ground. However, many exceptions to this exist, involving: amplifiers (e.g., contained in memories), clock generation and conditioning subcircuits, voltage converters and regulators, on-chip pull-ups/downs, and electrostatic discharge (ESD) protection structures. For a more extensive list of examples, the reader is referred to [Kae08]. Like E_ch, which is dissipated by the charging of capacitive loads, E_rr also has a quadratic dependence on V_dd.

1.2 Supply Voltage Scaling

Reduction of the supply voltage of a CMOS circuit has the potential to provide significant energy and power savings, as indicated by (1.5). The evolution of submicron technology nodes hence initially delivered substantial power savings, due to, among other factors, technology-driven reduction of the nominal supply voltage (mainly to avoid dielectric breakdown of thinner gate oxides [Kae08]). However, about a decade ago, in deep submicron technologies below 90 nm, this trend of nominal V_dd reduction began to level off at around 1 V (at 65/40 nm).3 Since then we have seen increasing usage of many different forms of voltage scaling techniques, which aim at operating circuits below nominal V_dd levels, down to the near- or even sub-threshold level, to further reduce energy consumption. Especially the concept of dynamic voltage and frequency scaling (DVFS) has been widely adopted by industry, which operates the circuit at fixed pairs of voltage and frequency tuned for specific usage scenarios. In more forward-looking proposals, DVFS is used in a closed-loop fashion, dynamically adjusting voltage and/or frequency according to detected timing violations [EKD+03, TBW+09].

However, dynamic voltage scaling has its limitations [ZBSF04], and energy gains from supply voltage scaling do not come for free, since a reduced V_dd increases the gate delays of the circuit, therefore reducing its maximum clock frequency. In the operating region above the threshold voltage V_T, the proportionality of a CMOS gate delay t_pd (propagation delay) [Kae08] with respect to the supply voltage V_dd is given by:

3 The latest non-bulk CMOS nodes, using FinFET devices, seem to push these limits further again, with standard operation at 0.7 V in 14 nm [NAA+14].

t_pd ∝ (C_k · V_dd) / (V_dd − V_T)^α    (1.6)

Here, C_k is again the node capacitance which the output of the gate has to drive, while α is an empirical constant greater than 1, which depends on the specifics of the employed technology. For deep submicron technologies α is typically close to 1, due to velocity saturation.

With C_k and V_T fixed for a given circuit implementation in a specific technology, and α ≈ 1, we can consider the proportionality of the clock frequency f_clk = 1/t_pd as a function of the supply voltage V_dd:4

f_clk ∝ 1/C_k − (1/C_k) · (V_T / V_dd)    (1.7)

We can observe that the clock frequency grows approximately linearly with the supply voltage: the right-hand side of (1.7) increases with V_dd, since the subtrahend of the difference shrinks (as 1/V_dd) while the minuend remains constant. Furthermore, when scaling of the supply voltage continues below the threshold voltage, sub-threshold currents drive transistor operation, and (1.6) and (1.7) are no longer applicable in this operating region. In the sub-threshold regime the delay becomes exponentially dependent on the supply voltage, which creates a severe performance degradation for any additional savings in power consumption.
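The above-threshold relation can be made concrete with a small sketch. With α = 1, (1.6) and (1.7) reduce to f_clk ∝ (V_dd − V_T)/(C_k · V_dd); the threshold voltage below is a hypothetical value and C_k is normalized to 1.

```python
# Sketch of the above-threshold frequency model of Eqs. (1.6)/(1.7) with
# alpha = 1: f_clk ∝ (Vdd - VT) / (C_k * Vdd). VT is hypothetical, C_k = 1.

def f_rel(vdd, vt=0.4):
    """Relative clock frequency (arbitrary units); valid only above threshold."""
    assert vdd > vt, "model only valid for Vdd > VT"
    return (vdd - vt) / vdd

# Frequency falls monotonically as the supply approaches the threshold voltage.
assert f_rel(1.2) > f_rel(0.8) > f_rel(0.5)
```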

A qualitative overview and comparison of circuit power, frequency, and energy per operation as a function of supply voltage is given in Figure 1.4 [KKT13].5 The figure illustrates how performance in terms of maximum operating frequency degrades by 5-10x when scaling the supply voltage from the nominal operating point (V_DD,STV) to an operating point near the threshold voltage (V_DD,NTV). At the same time, power consumption is reduced by 10-50x due to two factors: the circuit operates at a lower frequency, so the amount of switching per second is reduced; additionally, the circuit operates at a lower supply voltage, which reduces the active power even further according to (1.5). As a result, the energy efficiency of the circuit improves by 2-5x when it is operated at near-threshold, compared to the nominal operating point in the above-threshold region. As can be seen in the frequency plot, the performance degradation starts to become exponential near the threshold voltage, eventually causing an increase in the energy per operation at some point in the sub-threshold region. This is due to the leakage energy, which starts to dominate over the dissipated active energy when computation cycles become exponentially longer. The operating point with the lowest possible energy per operation, i.e., the best energy efficiency, is called the energy minimum

4 For simplicity, the delay t_pd is here assumed to be the total delay of all gates on the critical path, while C_k denotes the overall capacitance.
5 The figure is based on data from [DWB+10] and [JKY+12].



voltage (EMV) point. The position of the EMV relative to the threshold voltage depends on many factors, such as the employed CMOS technology and process option, but is of course also highly dependent on the architecture and circuit structure of a design, and can range from deep sub-threshold to near-threshold.

Figure 1.4 – Qualitative overview for power, frequency (f), and energy per operation as a function of supply voltage (V_DD) [KKT13, DWB+10, JKY+12], highlighting the gains from supply voltage scaling, and the induced performance degradation. V_th denotes the threshold voltage, V_DD,NTV denotes a near-threshold operating voltage, while V_DD,STV denotes the classical super/above-threshold operating voltage.

In addition to the strong performance degradation, low-voltage operation in the near-threshold regime and below gives rise to a set of problems regarding increased parametric and performance variation [DWB+10, KKT13]. For example, the delay uncertainty due to global process variation can increase from 30% at nominal voltage (1 V) to 400% at a 400 mV supply. Together with the more severe sensitivity to temperature and supply voltage noise effects, the total performance uncertainty rises to 20x, compared to only ≈1.5x at nominal voltage [DWB+10]. Moreover, near- and sub-threshold computing poses significant challenges regarding increased functional failures. While logic circuits are typically able to reach EMV operation without failures, aggressively scaled embedded memory structures, especially static random-access memory (SRAM), impose stronger limitations on voltage scaling [CC06], giving rise to a host of new circuit design approaches for sub-threshold SRAMs [VC08, ZHBS08] and other types of embedded memories [Mei14].
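The existence of an energy-minimum voltage can be reproduced with a small qualitative model in the spirit of Figure 1.4: active energy scales with V_dd^2 per (1.5), while leakage energy per cycle is leakage power integrated over a cycle time that grows roughly exponentially below the threshold. All constants below are hypothetical, chosen only to reproduce the shape, not fitted to any design in this chapter.

```python
import math

# Qualitative energy-per-operation model behind the EMV: total energy =
# active energy (∝ Vdd^2) + leakage energy (leakage power × cycle time).
# All constants are hypothetical, chosen only to reproduce the shape.

VT = 0.4        # threshold voltage [V] (hypothetical)
K_LEAK = 2e-3   # leakage strength (hypothetical)

def cycle_time(vdd):
    # Smooth on-current surrogate: exponential in sub-threshold,
    # roughly quadratic above threshold.
    i_on = math.log1p(math.exp((vdd - VT) / 0.08)) ** 2
    return 1.0 / i_on

def energy_per_op(vdd):
    e_active = vdd ** 2                      # switching energy, Eq. (1.5) shape
    e_leak = K_LEAK * vdd * cycle_time(vdd)  # leakage integrates over the cycle
    return e_active + e_leak

vdds = [round(0.10 + 0.01 * i, 2) for i in range(111)]  # 0.10 V .. 1.20 V
emv = min(vdds, key=energy_per_op)

# The minimum lies below VT for these constants, and moving far up or far
# down in voltage costs energy again, as in Figure 1.4.
assert 0.15 < emv < VT
assert energy_per_op(emv) < energy_per_op(1.2)
assert energy_per_op(emv) < energy_per_op(0.10)
```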

Besides these circuit level challenges, which need to be addressed for effective low-power design based on aggressive supply voltage scaling, the compensation of performance losses (delay increase) remains essential. For real-world embedded systems with fixed computational throughput constraints, the promised gains in energy efficiency can only be achieved if the induced performance degradation can be recovered through architectural techniques. The key motivation for the microarchitectural design techniques proposed in this thesis is consequently to enable voltage scaling for low-power operation, while retaining the computational performance of the microprocessor at acceptable levels.

To conclude this introduction to supply voltage scaling, Figure 1.5 and Figure 1.6 provide measured, absolute numbers on frequency, power, and energy per cycle, for two published state-of-the-art research designs [MSG+16, JKY+12], to complement and underline the presented theoretical explanations and observations of this chapter. Both designs are microprocessor-based SoCs, which are capable of operation over a wide voltage range, from above- to sub-threshold, but aimed at different applications with different processing performance requirements.

Figure 1.5 [MSG+16] showcases one of the most energy efficient complete microprocessor/SoC designs in the area of wireless sensor nodes (WSNs). The processing subsystem is implemented in low-leakage 65 nm CMOS, and includes integrated voltage regulators for direct battery operation, as well as 10T SRAM for reliable operation in the sub-threshold regime. As can be seen in Figure 1.5, the operating range of the 32-bit ARM Cortex-M0+ processor subsystem is extremely wide, ranging from nominal voltage operation at 1.2 V with 66 MHz and 5.9 mW, to deep sub-threshold operation at only 250 mV with 27 kHz and 850 nW total power consumption. The EMV is around 0.39 V with a minimum energy consumption of 11.7 pJ/cycle at 688 kHz. Since WSNs often operate with short duty cycles, standby power reduction is critical. Consequently, the authors of [MSG+16] have applied many dedicated circuit level optimizations to reduce retention power to 80 nW (for the CPU + 4 kB SRAM). Another interesting aspect that can be observed in Figure 1.5 is the change in the EMV, depending on dynamic factors such as the core temperature or even the executed application. Moreover, we can see that, as was indicated earlier in Figure 1.4, a reduction in energy per cycle of around 4-5x is achieved when scaling the voltage from nominal above-threshold operation to near-threshold, with up to 8x gains when scaling further until the EMV is reached.
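The gain figure quoted above can be sanity-checked with (1.1) rearranged as E_c = P / f_c, using only the numbers cited from [MSG+16]; the helper function itself is just an illustration.

```python
# Cross-check of the Cortex-M0+ numbers quoted from [MSG+16], via E_c = P / f_c.

def energy_per_cycle(p_watts, f_hz):
    return p_watts / f_hz

e_nominal = energy_per_cycle(5.9e-3, 66e6)  # 1.2 V point: ~89 pJ/cycle
e_emv = 11.7e-12                            # reported minimum, at ~0.39 V

# ~89 pJ/cycle at nominal vs. 11.7 pJ/cycle at the EMV is roughly the
# "up to 8x" gain quoted in the text.
assert 85e-12 < e_nominal < 95e-12
assert 7 < e_nominal / e_emv < 8.5
```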

In contrast to the just presented sensor node processing subsystem, Figure 1.6 [JKY+12] provides details on a general purpose embedded processor aimed at near-threshold computing, which can operate at a more than one order of magnitude higher clock frequency, while additionally providing significantly more compute performance per clock cycle. The design is a wide-operating-range IA-32 processor (x86, Intel Pentium class) fabricated in 32 nm CMOS, including 10T SRAM and other optimized circuits for robust and reliable ultra-low voltage operation. This includes for example a clock distribution network incorporating programmable

Figure 1.5 – Measured frequency, power, and energy per cycle versus supply voltage, of a sub-threshold ARM Cortex-M0+ subsystem in 65 nm CMOS for WSN applications; reproduced from [MSG+16].
All chip data showed presented no signif- here are bestmean chips.Vmin Thisrises is consistent slightly because with best of chips the shift being in typicallimited and by fromicant reductionthe Rev2 inchip. checksum Vmin. All data presented here are best chips. This is consistent with best chips being limited by from the Rev2 chip. ION/IOFF contention, while worst chips fail through a different Fig. 13 shows retention Vmin measured at room tempera- mechanism.ION/IOFF contention, while worst chips fail through a different tureFig. for 13 4 KBshows ofretention high-densityVmin productionmeasuredat 6T room SRAM, tempera- 4 KB of ULVmechanism. design favors long combinational paths as they aver- 10Tture for SRAM, 4 KBand of high-density 2886 FFs. production The SRAM 6T retention SRAM, 4 tests KB of were ULV design favors long combinational paths as they aver- 10T SRAM, and 2886 FFs. The SRAM retention tests were age out local variations. The ARM Cortex-M0+ has a two-stage performed using SW to write at higher voltage, then lowering age out local variations. The ARM Cortex-M0+ has a two-stage performed using SW to write at higher voltage, then lowering pipeline and when synthesized to the ULV gate library, had voltage for 10 s, raising voltage and checking the contents of the pipeline and when synthesized to the ULV gate library, had voltage for 10 s, raising voltage and checking the contents of the logiclogic depth depth ofFigure 71 of gates.71 1.6 gates. Clock– Measured Clock trees, trees, however, frequency, however, exhibit exhibit power, increasing increasing and energywhole per 4 4 KB KBcycle region. region. 
versus The The supply FF FF test test voltage, was was performed performed of a wide- by by scan scan shift- shift- variability with greater depth, potentially leading to hold fail- ing in a known pattern, lowering voltage for 10 s and then scan variabilityoperating-range with greater depth, IA-32 potentially processor leading (x86, to hold Intel fail- Pentiuming in class) a known in pattern, 32nm CMOSlowering for voltage near-threshold for 10 s and then scan uresures and and degraded degradedVmin. Standard. Standard clock-tree clock-tree synthesis synthesis engines engines shifting out out and and checking checking contents. contents. Mean Mean FF FF retention retention Vminfor for computing;Vmin reproduced from [JKY 12]. Vmin seekseek primarily primarily to minimize to minimize skew skew by by adding adding buffers buffers+ to to balance balance holding one one or or zero zero did did not not differ differ significantly significantly (holding (holding zero zero interconnectinterconnect delays, delays, which which is counter-productive is counter-productive at at low low voltage. voltage. was 5 mV mV lower), lower), so so Fig. Fig. 13 13 shows shows only only the the worst worst case. case. Also Also 11 Chapter 1. Introduction delay buffers for mitigation of skew variations over the full voltage range. The 2x-superscalar 32-bit IA-32 CPU with FPU and branch prediction scales from a maximum supply voltage of 1.2 V6 with 915 MHz and 737 mW, to 280 mV with 3 MHz and 2 mW total power consumption (with memories at 550 mV, due to scaling limitations). Here, the EMV is reached in the near- threshold region at 0.45 V, with an energy consumption of 170 pJ/cycle. 
As Figure 1.6 shows, this constitutes a 4.7x improvement in energy efficiency over operation at 1.2 V. The plot on the left-hand side also illustrates well how the leakage energy quickly becomes dominant beyond the EMV (at which it already consumes 42% of the total energy), exploding exponentially in magnitude as the supply voltage decreases.
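The same E_c = P / f_c sanity check applies to the IA-32 numbers quoted from [JKY+12]:

```python
# Cross-check of the IA-32 numbers quoted from [JKY+12], via E_c = P / f_c.

e_max = 737e-3 / 915e6  # 1.2 V point: ~806 pJ/cycle
e_emv = 170e-12         # at the 0.45 V EMV

# The ratio matches the quoted 4.7x improvement in energy efficiency.
assert 4.5 < e_max / e_emv < 5.0
```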

6 The nominal voltage for this 32 nm node is 1.0 V for all three process options (HP/SP/LP) [JAB+09]. The employed low-power (LP) process option additionally supports operation at up to 1.2 V.


1.3 Contributions

The contributions of this thesis towards microarchitectural low-power design techniques for embedded microprocessors focus on two main areas. On the one hand, new types of light-weight instruction set extensions are introduced and combined with ultra-low-power core design and multi-core architectures, covering the spectrum from deeply embedded cores with 10s of kHz to 10s of MHz clock frequency to general purpose microcontroller architectures with 10s of MHz to 100s of MHz clock frequency. On the other hand, we define the concept of dynamic timing margins in embedded microprocessors, and we characterize and exploit these margins via instruction-based dynamic clock adjustment to obtain performance and energy gains. As a proof of concept, we demonstrate a silicon prototype (supporting dynamic clock adjustment) focused on general purpose embedded processing with 100s of MHz to 1 GHz clock frequency. Figure 1.7 provides an overview of the specific contributions of this thesis, categorized by microprocessor type and performance range. Further, Figure 1.7 illustrates the compute/design stack (from the software level down to the physical design level), and indicates which topics span which parts of the stack.

Specifically, the main contributions of this PhD dissertation are summarized as follows:

Light-Weight Instruction Set Extensions & Ultra-Low-Power Core Design

• The performance, resource, and energy requirements of cryptographic applications often pose challenges for embedded microcontroller architectures, hindering the seamless adoption and integration of cryptographic algorithms on such platforms. This thesis proposes three different general categories of light-weight instruction set extensions, targeted at cryptographic hash functions. In particular, the category of instruction set extensions based on finite state machines for address generation provides a promising new way of addressing this problem, with very low hardware overhead.

• As part of the evaluation of the proposed instruction set extension types, we design and implement sets of suitable light-weight extensions for all five SHA-3 final round candidate algorithms. These extensions are designed and evaluated for a popular 16-bit microcontroller architecture (PIC24), providing insight into the real-world applicability of the approach.

• A custom-designed 16-bit core for ultra-low-power applications, named TamaRISC, is presented. Its microarchitecture as well as its instruction set architecture are designed from scratch, with the aim of instruction set simplicity, while at the same time providing useful features for deeply embedded signal processing applications, especially in the biomedical domain. Besides a full hardware implementation, a software tool-chain including a C compiler, as well as support for a real-time operating system, are provided.

• TamaRISC is combined with the concept of light-weight instruction set extensions to create a processor system for compressed sensing, which operates in the sub-threshold



Figure 1.7 – Overview, structure, and classification of the covered topics and contributions of this thesis in microarchitectural low-power design for embedded microprocessors.


regime. We demonstrate with this optimized processor core how instruction set extensions are key to enable further voltage scaling while maintaining the required processing performance. The processor system embeds existing components, such as special sub-threshold capable memories, to enable electrocardiogram signal compression with nW power consumption.

• In the context of multi-core processing systems for biomedical and other ultra-low-power embedded applications, TamaRISC is employed to build SIMD-like architectures. Such multi-core systems strive to provide the highest possible energy efficiency, outperforming single-core architectures for scenarios with medium to high work loads. This thesis provides, with TamaRISC, the low-power processing element for these architectures, and additionally introduces microarchitectural design optimizations for the memory organization in such multi-core systems. Specifically, we introduce core enhancements that enable configurable data memory mapping for private and shared access, instruction broadcast, and synchronized code execution with check points.

Dynamic Timing Margins

• Dynamic timing margins in digital circuits are a promising opportunity for regaining performance and energy efficiency, especially in the context of the increasing trend of over-design and over-margining of digital systems. In this thesis, we systematically define the concept of dynamic timing margins. Furthermore, a dynamic timing analysis methodology and tool are developed, which allow the characterization of dynamic timing margins in digital circuits on the gate level, with extensions to specifically enable the detailed characterization of processor based systems and their pipelines.

• The results of the dynamic timing analysis flow are used to build an instruction set simulator with statistical fault injection. The statistical nature of the collected timing information significantly increases the simulation accuracy over the state of the art for high-level simulation of processor behavior during frequency and/or voltage over-scaling, which has recently been proposed in the context of approximate computing to push energy efficiency to the maximum possible level. A dynamic timing aware simulator for the evaluation of application behavior under timing errors is developed in this thesis by combining an existing fault-injection framework with our dynamic timing analysis flow. This approach to accurate simulation of a processor allows the efficient analysis of the circuit in the regime between fully error-free operation and total circuit failure, facilitating new ways of circuit design under the approximate computing paradigm.

• One of the key contributions of this dissertation is the concept of instruction-based dynamic clock adjustment. The concept is based on our findings from dynamic timing analysis of a microprocessor and leverages the available dynamic timing margins to adjust the clock frequency of a processor core on a cycle-by-cycle basis. To enable


proper analysis of potential gains from this technique, the results of dynamic timing analysis are again combined with an instruction set simulator, this time under a worst-case dynamic timing scenario which should guarantee fully error-free operation. The simulator allows prediction of the application-dependent speedups achievable by a 32-bit OpenRISC processor design tailored for dynamic clock adjustment.

• The feasibility of the proposed instruction-based dynamic clock adjustment concept is shown by the design and fabrication of a full-featured hardware prototype in an advanced 28 nm FD-SOI CMOS technology. This test-chip is, to the best of the author's knowledge, the first silicon implementation of a microprocessor operated by a dynamic clock, which adapts its frequency with very fine granularity on a cycle-by-cycle level, based on the instruction types currently in flight in the pipeline. Extensive measurement results for this dynamically clocked 32-bit 6-stage OpenRISC processor are presented, demonstrating the achievable speedups and power savings.
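The core idea of instruction-based dynamic clock adjustment can be sketched in a few lines: each instruction class excites paths of different worst-case delay, so the clock period can be set cycle-by-cycle to the slowest instruction currently in flight, instead of one global worst-case period. The delay table, pipeline depth, and trace below are entirely hypothetical illustration values, not measurements from the DynOR test chip.

```python
# Toy sketch of instruction-based dynamic clock adjustment (DCA). Per-class
# delays, the pipeline depth, and the program trace are hypothetical.

DELAY_NS = {"alu": 1.0, "load": 1.6, "store": 1.4, "branch": 1.1, "mul": 2.0}
PIPELINE_DEPTH = 3  # hypothetical number of stages that must all meet timing

def runtime_fixed(trace):
    """Baseline: every cycle uses the single global worst-case period."""
    return len(trace) * max(DELAY_NS.values())

def runtime_dca(trace):
    """DCA: per-cycle period = max delay over the instructions in flight."""
    total = 0.0
    for i in range(len(trace)):
        window = trace[max(0, i - PIPELINE_DEPTH + 1): i + 1]
        total += max(DELAY_NS[op] for op in window)
    return total

trace = ["alu", "alu", "load", "alu", "branch", "alu", "mul", "alu", "alu", "alu"]
speedup = runtime_fixed(trace) / runtime_dca(trace)
assert speedup > 1.0  # ALU-dominated code runs faster than worst-case clocking
```

The achievable speedup is application dependent: a trace dominated by slow instruction classes approaches the fixed worst-case clock, while short-path-only code benefits the most, which is exactly why the evaluation above is driven by an instruction set simulator.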

1.4 Thesis Outline

As indicated in Section 1.3, this thesis covers two different areas in microarchitectural low-power design. Ultra-low-power design together with light-weight instruction set extensions for small microcontroller-type cores are covered in Chapter 2 and Chapter 3, while the topic of dynamic timing margins is discussed in Chapter 4 and Chapter 5. In more detail, the remainder of the thesis is structured as follows.

Chapter 2 covers microcontroller architecture enhancements for cryptographic applications, in the form of light-weight instruction set extensions. First, the target algorithm class of cryptographic hash functions is introduced, followed by a description of the considered set of specific algorithms. The PIC24 microcontroller architecture is then presented as the chosen reference core, and the baseline performance of the SHA-3 algorithms on this architecture is established. The identified bottlenecks in the baseline implementations are then used as motivation for the introduction of three general classes of instruction set extensions for cryptographic hash functions. Finally, these classes are applied to the five SHA-3 algorithms, resulting in the design of algorithm-specific extensions, as well as the implementation of the modified cores with the extended instruction sets. The chapter is concluded by a discussion of the performance results, regarding software as well as hardware aspects.

The focus of Chapter 3 lies on ultra-low-power processing and the different aspects of microarchitectural design that enable it. After presenting the context of ultra-low-power applications, our custom-designed 16-bit core for such applications, named TamaRISC, is introduced. This is followed by a section on an ultra-low-power application-specific processor for compressed sensing, which is based on TamaRISC and can be operated in the sub-threshold regime. The light-weight instruction set extension which enables the extremely low power consumption of the system is presented, together with an extensive power and performance characterization of the circuit. The chapter is concluded by an overview of single- vs. multi-core processing for ultra-low-power sensing platforms, followed by a description of the developed microarchitectural techniques for efficient memory organization in such multi-core architectures.

The concept of dynamic timing margins is introduced and explored in detail in Chapter 4. We begin with an overview of general timing margins in digital circuits, including those induced by the guardbanding employed to counter static and dynamic variations. This is followed by the introduction of the orthogonal concept of dynamic timing margins. Our dynamic timing analysis approach is then presented, and we show how it can be applied to microprocessor architectures. Moreover, dynamic timing analysis results for a case study on a general purpose 32-bit OpenRISC processor are employed to construct an instruction set simulator with statistical fault injection. In the context of this simulator we introduce a new physically motivated timing error model for high-level instruction set simulation based on stochastic timing information. Finally, the application behavior under frequency- and/or voltage-over-scaling for a set of common algorithms is analyzed using this new simulation approach.

In Chapter 5 we introduce an instruction-dependent dynamic clock adjustment technique for microprocessors. Together with the dynamic timing analysis we present a full design flow, which allows the efficient evaluation of the introduced technique on an instruction set simulator. The main part of the chapter then focuses on the silicon prototype which applies dynamic clock adjustment to a modified OpenRISC core. The hardware architecture of the test-chip is detailed, together with the full microarchitecture of the core and the dynamic clocking mechanism which enables this approach on a real design. The chapter finally provides measurement results for the test-chip fabricated in a 28 nm FD-SOI CMOS technology. The measured performance and energy efficiency gains, enabled by dynamic clock adjustment, are presented and discussed for multiple different dies.

Chapter 6 provides concluding remarks for the presented concepts and results, as well as an outlook regarding possible future work.

1.5 Selected Publications

This thesis is largely based on the following publications (some of them only in part). A complete list of the author’s book chapters, conference papers, journal articles, and other publications can be found in the List of Publications as part of the back matter.

Chapter2 Microcontroller Architecture Enhancements for Cryptographic Applications

Jeremy Constantin, Andreas Burg and Frank Gürkaynak. Instruction Set Extensions for Crypto- graphic Hash Functions on a Microcontroller Architecture. 23rd IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Delft, The Netherlands, July 9-11, 2012.


Jeremy Constantin, Andreas Burg and Frank Gürkaynak. Investigating the Potential of Cus- tom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture. Cryptology ePrint Archive, February, 2012.

Chapter3 Microprocessor Design for Deeply Embedded Ultra-Low-Power Processing

Jeremy Constantin, Ahmed Dogan, Oskar Andersson, Pascal Meinerzhagen, Joachim Rodrigues, David Atienza Alonso and Andreas Burg. An Ultra-Low-Power Application-Specific Processor with Sub-VT Memories for Compressed Sensing. VLSI-SoC: From Algorithms to Circuits and System-on-Chip Design, pp. 88-106, IFIP Advances in Information and Communication Technology 418, Springer, 2013.

Jeremy Constantin, Ahmed Dogan, Oskar Andersson, Pascal Meinerzhagen, Joachim Rodrigues, David Atienza Alonso and Andreas Burg. TamaRISC-CS: An Ultra-Low-Power Application-Specific Processor for Compressed Sensing [best paper nominee]. 20th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Santa Cruz, California, USA, October 7-10, 2012.

Ahmed Dogan, Jeremy Constantin, Martino Ruggiero, Andreas Burg and David Atienza Alonso. Multi-Core Architecture Design for Ultra-Low-Power Wearable Health Monitoring Systems. 2012 IEEE/ACM Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, March 12-16, 2012.

Ahmed Dogan, Jeremy Constantin, David Atienza Alonso, Andreas Burg and Luca Benini. Low-power Processor Architecture Exploration for Online Biomedical Signal Analysis. IET Circuits, Devices & Systems, vol. 6, num. 5, pp. 279-286, September, 2012.

Ahmed Dogan, Ruben Braojos Lopez, Jeremy Constantin, Giovanni Ansaloni, Andreas Burg and David Atienza Alonso. Synchronizing Code Execution on Ultra-Low-Power Embedded Multi-Channel Signal Analysis Platforms. 2013 IEEE/ACM Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, March 18-22, 2013.

Chapter4 Dynamic Timing Margins in Embedded Microprocessors

Jeremy Constantin, Lai Wang, Georgios Karakonstantis, Anupam Chattopadhyay and Andreas Burg. Exploiting Dynamic Timing Margins in Microprocessors for Frequency-Over-Scaling with Instruction-Based Clock Adjustment. 2015 IEEE/ACM Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, March 9-13, 2015.

Jeremy Constantin, Zheng Wang, Georgios Karakonstantis, Anupam Chattopadhyay and Andreas Burg. Statistical Fault Injection for Impact-Evaluation of Timing Errors on Application Performance. 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, Texas, USA, June 5-9, 2016.


Chapter5 A Microprocessor with Cycle-By-Cycle Dynamic Clock Adjustment

Jeremy Constantin, Lai Wang, Georgios Karakonstantis, Anupam Chattopadhyay and Andreas Burg. Exploiting Dynamic Timing Margins in Microprocessors for Frequency-Over-Scaling with Instruction-Based Clock Adjustment. 2015 IEEE/ACM Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, March 9-13, 2015.

Jeremy Constantin, Andrea Bonetti, Adam Teman, Christoph Müller, Lorenz Schmid and Andreas Burg. DynOR: A 32-bit Microprocessor in 28 nm FD-SOI with Cycle-By-Cycle Dynamic Clock Adjustment. 42nd IEEE European Solid-State Circuits Conference (ESSCIRC), Lausanne, Switzerland, September 12-15, 2016.

1.6 Third-Party Contributions

In this section, I list all direct third-party contributions to the work presented in this thesis.

I worked in close collaboration with my colleague Ahmed Dogan for the work on TamaRISC-CS (Section 3.2) and the use of TamaRISC in multi-core processing architectures (Section 3.4). Ahmed did the backend design of TamaRISC-CS and performed the characterization using the sub-threshold model of [ARLO12] with the support of Oskar Andersson. The RTL code for the sub-threshold capable SCMs for the TamaRISC-CS design was provided by Pascal Meinerzhagen. Ahmed moreover implemented all core-external RTL modifications for the multi-core architectures incorporating the TamaRISC core as processing elements and did the front- & backend design and evaluation of the complete multi-core architectures with efficient memory organization (as part of his PhD thesis [Dog13]).7

Lai Wang developed the OpenRISC LISA model (based on an initial version by Anupam Chattopadhyay) and the ISS used for the evaluation of the dynamic clock adjustment concept (Section 5.2). The model was additionally used together with a fault injection extension for the work on statistical fault injection (Section 4.3). The baseline fault injection extension source code was provided by Zheng Wang, who also contributed initial parts of the necessary adaptations for the OpenRISC model, including some of the fault injection framework feature extensions.

The implementation and evaluation of the DynOR test-chip (Section 5.4) in 28 nm FD-SOI was a multi-person effort over many months. Andrea Bonetti contributed the clock generation unit (front- & backend integration), as well as support for various backend tasks and the final signoff of the chip. Moreover, he later provided measurements for the clock generation unit. Adam Teman led and oversaw the backend efforts for DynOR, providing many valuable automated solutions for the full backend flow in the employed 28 nm FD-SOI technology. Christoph Müller additionally contributed significantly to the final backend design, with floor-planning, clock tree design, and layout fixes, besides many other backend tasks. He also provided support for the final chip signoff. Lorenz Schmid designed two revisions of a custom validation PCB for DynOR, which enabled me to set up a highly automated test flow and environment. He furthermore assisted with systematic measurements of many of the DynOR samples.

7 I later implemented an enhanced version of the MMUs enabling configurable data memory mapping (Section 3.4.2.1) as a small part of the PULPv2 architecture [RPL+16]. The design of the overall PULP architecture and the corresponding ASIC(s) were done at the University of Bologna and ETH Zürich by Davide Rossi et al. [RPL+16].

2 Microcontroller Architecture Enhancements for Cryptographic Applications

In the last decade it has become clear that an increasing number of connected embedded devices have already been deployed and will continue to be deployed in our direct environment. This growth will only accelerate with the rise of the internet of things (IoT), enabled by better energy efficiency, higher circuit integration densities, and shrinking costs for these embedded devices.

One of the main purposes of these connected devices is the collection and processing of sensor data. This sensor data comes from a multitude of sources — public and private — such as environmental monitoring, manufacturing environments, public infrastructure, home automation systems, or health care systems, often in the form of wireless body sensor networks (WBSNs). It is therefore imperative that these interconnected devices and IoT infrastructures are engineered from the ground up with security as well as privacy concerns in mind [ACH15, MYAZ15]. Security is essential in many scenarios to guarantee the integrity and authenticity of the communication and processed data, preventing manipulation of the surrounding infrastructure and devices, which in the case of deliberate tampering can even result in serious harm, for example to patients in a health care environment. With more and more data being aggregated, privacy is another key aspect that must be addressed. The guarantee of privacy provides the necessary confidentiality for information collected about us, our daily activities, and the immediate environment in which we live. Only with sufficient trust in security and privacy will a truly widespread integration of such devices in our society be possible. Unfortunately, research has shown that relatively simple embedded devices are especially prone to attacks on security and privacy [ZG13, Gra15, KSV+16, Hew15].

There is hence a strong need for cryptographic applications in deeply embedded systems. The challenge in fulfilling this need is to provide such cryptographic applications, which are often computationally intensive, on embedded systems that are heavily energy constrained. A key aspect in achieving this goal is the efficient implementation of cryptographic algorithms on typical embedded processing architectures, which provide the flexibility (regarding software programmability) required by the higher-level applications that utilize the cryptographic functionality to provide adequate security and privacy.


In the case of embedded systems with constantly high throughput requirements for the processed cryptographic data (multiple megabits per second (Mbps)), dedicated accelerator circuits can be considered for the specific cryptographic task to provide the best possible energy efficiency, given the high data rate. However, for moderate throughput requirements, as is often the case in deeply embedded IoT applications, acceleration of cryptographic software running on an embedded microprocessor or microcontroller can be a more suitable approach. As we show in this chapter, acceleration of cryptographic algorithms through instruction set extensions for an existing microcontroller architecture can provide a viable solution to help meet stringent energy efficiency requirements.

This chapter starts with an introduction on cryptographic hash functions, followed by an implementation-centric description of selected algorithms for the computation of cryptographic hash functions in Section 2.1. The microcontroller architecture used in this study is then presented in Section 2.2, together with a performance evaluation of the selected algorithms on this architecture in Section 2.4, based on the performance metrics given in Section 2.3. Possible architecture enhancements through instruction set extensions are then discussed in Section 2.5, providing insight on three general types of extensions that are specifically applicable to cryptographic hash functions. This analysis is followed in the same section by a detailed description of the designed instruction set extensions for the different algorithms. Finally, the improved performance results of all algorithms are analyzed in Section 2.6.

2.1 Applications: SHA-3 Selection Competition Algorithms

In the following, the cryptographic algorithms considered for this study are introduced in more detail. The study focuses on 5 different cryptographic hash functions from the Secure Hash Algorithm-3 (SHA-3) standard selection process. Cryptographic hash functions play an essential part in providing integrity for transmitted data, and are used as the building blocks of message authentication algorithms.

In 2007, the U.S. National Institute of Standards and Technology (NIST) started a public competition aiming at the selection of a new standard for cryptographic hashing [NIS07]. The cryptographic community was asked to propose new hash functions and to evaluate the security level of the other candidates. In 2008, 51 functions were accepted to the first round; the second round (2009) reduced this number to 14, and for the final, third round (2010) the field was further reduced to five candidates: BLAKE, Grøstl, JH, Keccak, and Skein. NIST stated that the selection for the final round had been made with the factor of diversity in mind, which makes this particular algorithm subset interesting, as it covers various different approaches to the task of cryptographic message digest calculation (hashing). Following the third SHA-3 candidate conference, Keccak was announced as the winning algorithm by NIST in 2012. Since 2015, SHA-3 has been an official NIST hashing standard, as defined by FIPS 202 [NIS15b].

Applications of the SHA-3 standard range from multi-gigabit data transmission protocols with high performance requirements to radio-frequency identification (RFID) tags [PH13], which typically have to operate with severely constrained power, energy, storage, and processing resources. As a result, the organizers were not only interested in the cryptographic strength of the candidates, but also in the evaluation of the performance of the algorithms implemented on different platforms. Following the success of the public Advanced Encryption Standard (AES) selection process, the SHA-3 evaluation attracted many contributions comparing the performance of candidate algorithms on different platforms. Extensive results have been published for software [Be11, Por11] and hardware [GGM+12, ECR11, HGG+10] implementations of the SHA-3 candidates.

Figure 2.1 – Merkle-Damgård construction for cryptographic hash functions.

Before discussing the implementation details and acceleration potential of the five different algorithms on a specific microcontroller architecture, the following section introduces common properties of cryptographic hash functions, focusing on the aspects that are relevant for their implementation (and less on the cryptographic properties). Afterwards, general descriptions of the five final-round SHA-3 candidates are given.

2.1.1 Principles of Cryptographic Hash Functions

We define the input of a hash function as the message data, or simply the message. The output of the hash function is defined as the hash value, hash, or message digest. Moreover, hash functions typically process their input in the form of blocks, one block at a time, while a message can be composed of multiple blocks. Hence, a part of the hash function is repeatedly applied on the different blocks of the message. Consequently, the hash function itself has a state, which depends on the message blocks that have already been processed. We call this data which describes the current state of the hash function, the internal state data, or internal state.

All algorithms are based on the common principle of combining message data (in fixed block sizes) with internal state data to produce a hash value of fixed size (224, 256, 384 or 512 bit). Each message block is processed separately using the same core algorithm, transforming the previous internal state into a new state. The internal state data is commonly partitioned into a set of state words of fixed size (8-64 bit), and the state is modified through multiple passes through a set of core functions for each message block.


A common way of building a (collision resistant) hash function according to this general scheme is the Merkle-Damgård construction [Mer90, Dam90], as depicted in Figure 2.1. An initial internal state of the hash function is transformed into the output hash value by repeatedly applying a (collision resistant) compression function F, which in each step combines the current internal state with the next block of the message to produce the next internal state. The message is padded such that the total length is a multiple of the block size. A single step of the hashing process (one application of F) consists of many sub-steps and transformations, which are themselves most commonly applied for multiple rounds, to enhance the security of the hash function. Typically, a type of finalization step is performed at the end, to produce the final hash value. This finalization step mostly includes a further transformation of the last internal state by another function (which can be similar to F). This final transformation ensures the avalanche effect of the hash function, i.e., that for any two messages which differ only by a very small amount in their content, the difference in their hashes is significant. Moreover, the finalization often includes a truncation of the internal state to produce a hash value with a length that is shorter than the length of the internal state.
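The iterative structure described above can be sketched in plain C. The block size, the state layout, and in particular the toy mixing inside compress_F below are illustrative placeholders only — they do not correspond to any specific SHA-3 candidate:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  64   /* message block size in bytes (i.e., 512 bit) */
#define STATE_WORDS 8    /* internal state as 8 32-bit words (illustrative) */

/* Hypothetical compression function F: combines one message block
 * with the current internal state to produce the next state. */
static void compress_F(uint32_t state[STATE_WORDS],
                       const uint8_t block[BLOCK_SIZE])
{
    /* placeholder: a real F applies many rounds of sub-operations */
    for (size_t i = 0; i < STATE_WORDS; i++) {
        uint32_t m;
        memcpy(&m, block + 4 * i, sizeof m);
        state[i] = (state[i] ^ m) + 0x9E3779B9u; /* toy mixing only */
    }
}

/* Merkle-Damgard iteration: assumes 'msg' is already padded to a
 * multiple of BLOCK_SIZE; real schemes also encode the length. */
void md_hash(uint32_t state[STATE_WORDS],
             const uint8_t *msg, size_t len)
{
    for (size_t off = 0; off + BLOCK_SIZE <= len; off += BLOCK_SIZE)
        compress_F(state, msg + off);
    /* a finalization/truncation step would follow here */
}
```

The point of the sketch is only the control structure: one pass of F per padded block, with the state carried between blocks.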

Please note that the Merkle-Damgård construction is simply presented here to illustrate the basic process of message digest calculation. For the specific SHA-3 final-round candidate algorithms, different variants of this general construction (that improve upon it for cryptographic reasons) are often used, e.g., the HAsh Iterative FrAmework (HAIFA) [BD06], or different approaches such as the sponge construction [BDPA11a]. However, the presented general scheme of combining message blocks with internal state to transform this state applies to all hashing algorithms considered in this study, and gives a sufficient overview of the computational structure of the hash functions that is relevant for their implementations.

All of the five SHA-3 candidate hash functions build their compression functions F (and the finalization step) out of three main types of operations: permutations of the set of state words, state word transformations, and lookups in tables of constants. The different types of operations are illustrated in Figure 2.2. Algorithmic constants are typically used in the form of lookup tables (LUTs), and are most commonly part of state word transformations. However, the applied permutation patterns and schedules can also be regarded as algorithm intrinsic constants.

Regarding the involved computational operations, state word permutations generally relate to memory moves, while state word transformations are realized using basic arithmetic and logical operations, such as addition, XOR, AND, NOT and state word rotations. The values of the incorporated constants are algorithm specific and depend on the current round number or internal state data (e.g., in the case of a substitution lookup table (S-box)).
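Since rotations have no direct operator in C, they are usually expressed with the shift-and-OR idiom below; mix_word is a purely hypothetical transformation that only illustrates how the named operations combine:

```c
#include <stdint.h>

/* Portable 32-bit rotate right; compilers typically map this
 * idiom to a single rotate instruction where one exists. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}

/* Illustrative state word transformation combining the basic
 * operations named above (addition, XOR, rotation). */
static inline uint32_t mix_word(uint32_t s, uint32_t m, uint32_t k)
{
    return ror32((s + m) ^ k, 7);
}
```

On a 16-bit MCU without a barrel shifter, each such 32-bit rotation expands into a multi-instruction sequence — which is exactly the kind of bottleneck the instruction set extensions in this chapter target.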

The five SHA-3 algorithm proposals each offer different versions or instances of their hash functions, to provide different options regarding the output length, and therefore also the security level of the cryptographic functions. We focus in this study only on the versions which are proposed as replacements for SHA-256 [NIS15a], since it is the most commonly used digest size, and because it is suitable in the context of embedded systems. However, even when restricting the specific flavor of the SHA-3 algorithms to a common hash value length of 256 bit, the algorithms differ in multiple core aspects, which highlights the diversity of the considered set of algorithms. The internal state size varies between 512 and 1600 bit, while there are multiple ways for the logical organization of the state, e.g., as an 8×8 matrix of bytes, or as 8 state words each of length 64 bit. The message block size also depends on the algorithm, and is either 512 or 1088 bit. The algorithms furthermore differ in how they utilize their lookup tables, e.g., for initial constants only, or for state data dependent lookups. Some algorithms like Keccak apply the mentioned full range of arithmetic and logical operations, while algorithms like Grøstl can be described efficiently using only XOR and memory moves, with the help of suitable lookup tables and architecture support for fine-grained memory accesses on byte level.

Figure 2.2 – Illustration of different operation types used in cryptographic hash functions.

2.1.2 SHA-3 Candidate Algorithms

In the following, the 256 bit digest versions of the five SHA-3 finalist algorithms are summarized regarding their basic structures, employed operations, and organization of the internal state. Specific algorithm details such as initial state preparation or padding schemes are not discussed further, since they typically have little impact on algorithm performance compared to the compression functions, and would be the last parts to be optimized in a system, especially when considering performance for messages that consist of multiple blocks.


For more detailed descriptions of all five algorithms, especially regarding their various modes and features, as well as cryptanalytic results, please refer to the respective technical algorithm proposal documents [AHMP10, GKM+11, Wu11, BDPA11b, FLS+10].

BLAKE BLAKE-256 [AHMP10] uses an internal state of 512 bit, organized as a 4×4 matrix of 32-bit words. The compression function is applied for 14 rounds per message block of size 512 bit, and one round consists of 8 applications of the round function G, each using a different permutation of state words as input. The G function is the computational core of BLAKE. It transforms 4 state words of the internal state matrix (either one of the columns, or one of the diagonals), by intermixing them with each other and with parts of the message block, as well as with different entries from the table of sixteen 32-bit constants, depending on the round. A schedule known as the Sigma Permutation defines which parts of the message, and which constants, are used for a specific application of G. The computational operations involved in the G function are 32-bit addition, XOR, and 4 different 32-bit right rotate operations by 7, 8, 12, and 16 bit. At the end of the 14 rounds of calculations, a short finalization step generates the next hash chain value by combining the current hash value with the previous chain value and a so-called salt value.

Grøstl The internal state of Grøstl-256 [GKM+11] is 512 bit in size and organized as an 8×8 matrix of 8-bit words. The compression function f is based on two main operations called P and Q, which have a very similar construction but use different parameters for two of their four suboperations. P and Q are each applied for 10 rounds per application of f, combining the current hash value with a message block of size 512 bit. Both P and Q consist of four suboperations (AddRoundConstant, SubBytes, ShiftBytes, and MixBytes) that are similar to the ones used in AES [NIS01]. AddRoundConstant XORs the current state with a round based constant, which is different for P and Q. Each word (byte) of the internal state is substituted one-for-one in the SubBytes operation, using the same 8-bit substitution table (S-box) that is also employed by AES. ShiftBytes performs a row based permutation of the internal state matrix, by left rotating each row of state words by a different amount. Moreover, the order of the shift amounts differs for P and Q. In the last step, the MixBytes operation, each column of the state is transformed by multiplying it in the finite field F256 with a circulant constant matrix. The entries of this constant matrix can only take a few different values: 2, 3, 5, or 7. The final Output Transformation of Grøstl-256 (after all message blocks have been processed) consists of 10 rounds of the P operation.
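The ShiftBytes step translates directly into byte moves. The sketch below assumes, for illustration, that in P row r is rotated left by r positions (the amounts used by Q differ, as noted above), with the state indexed as state[row][col]:

```c
#include <stdint.h>
#include <string.h>

/* ShiftBytes sketch for the Grostl P permutation: the 8x8 byte state
 * has each row r rotated left by r positions (assumed amounts for P;
 * Q uses a different ordering of shift amounts). */
static void shift_bytes_p(uint8_t state[8][8])
{
    for (int r = 0; r < 8; r++) {
        uint8_t tmp[8];
        for (int c = 0; c < 8; c++)
            tmp[c] = state[r][(c + r) % 8];
        memcpy(state[r], tmp, 8);
    }
}
```

This illustrates why Grøstl maps well onto plain memory moves, provided the architecture supports fine-grained byte-level accesses.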

JH JH-256 [Wu11] uses a 1024 bit internal state, which has no specific organization into separate state words. In general the algorithm works at the granularity of bits, sometimes forming tuple arrangements of 4 bits, which are however not necessarily consecutive. The compression function is applied for message blocks of size 512 bit. It combines the message block with the first half of the current internal state by XOR, then applies the core function E8 of the algorithm with 42 rounds, and finally combines the message again with the second half of the state. E8 consists of three main steps, and uses a different constant for each round. Depending on the bits of this round constant, one of two different 4-bit substitution tables (S-boxes) is applied on 4-bit tuples of the internal state. Note that the four state bits of these tuples are spread out over the entire state block, and are not necessarily grouped together. As a second step, a linear transformation L over two tuples of 4 bits is applied, which is defined as a multiplication in the finite field F16, but can be calculated simply by the use of XOR. The third step concludes one round of E8, by a fixed permutation that shuffles the state.

Keccak The 256 bit digest variant of Keccak [BDPA11b] is defined as Keccak[512] (with a so-called capacity of 512 bit), which is based on the sponge construction [BDPA11a] and employs an internal state of size 1600 bit, with a message block size of 1088 bit. Both state and message block size are therefore uncommon compared to the other algorithms. The Keccak sponge construction absorbs (XORs) the message block into the state, and uses as the compression function the Keccak-f[1600] permutation, which works on the internal state organized as a 5×5 matrix of state words (lanes) of length 64 bit. One message block is processed in 24 rounds, and each round consists of the application of 5 different functions θ, ρ, π, χ, and ι (Theta, Rho, Pi, Chi and Iota). Theta, Rho, and Pi are all linear transformations used for diffusion of the state, while Chi provides the required non-linearity for the hash function by intermixing the bits of different state words (rows in the 5×5 matrix for single bit positions in the state words) using AND, NOT, and XOR operations. Theta includes XORs and fixed rotation of single state words by 1 position (for operand calculations). Rho rotates all the state words by constant amounts, with different rotation offsets per state word, and the Pi operation permutes the different state words within the 5×5 matrix. Finally, Iota XORs a round-dependent constant into the internal state to break the symmetry of the round operations.
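As an example of one of these round functions, the Theta step described above (column-parity XORs plus a rotation by 1) can be sketched as follows, with the 5×5 lane matrix indexed as a[x][y]; this is a minimal sketch of the single step, not a full Keccak-f[1600] round:

```c
#include <stdint.h>

static inline uint64_t rol64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64u - n));  /* n in 1..63 here */
}

/* Sketch of the Keccak-f[1600] Theta step: each 64-bit lane is XORed
 * with the parities of two neighboring columns, one of them rotated
 * left by 1 bit position. */
static void keccak_theta(uint64_t a[5][5])
{
    uint64_t c[5], d[5];
    for (int x = 0; x < 5; x++)
        c[x] = a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4];
    for (int x = 0; x < 5; x++)
        d[x] = c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1);
    for (int x = 0; x < 5; x++)
        for (int y = 0; y < 5; y++)
            a[x][y] ^= d[x];
}
```

On a 16-bit data path, each of these 64-bit XORs and rotations decomposes into several native operations, which makes the cost of the wide state directly visible.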

Skein The Skein hash function [FLS+10] is constructed out of a tweakable block cipher called ThreeFish. Skein-512-256 uses a 512 bit state, consumes message blocks of size 512 bit, and produces an output hash of 256 bit. The internal state is organized as an array of 8 64-bit words. The main idea behind Skein is to keep the round structure simple, while using a large number of rounds to obtain security. In total there are 72 ThreeFish rounds, each of which is made up of four parallel applications of the Mix operation defined over 2 state words, followed by a fixed permutation of the array of state words. The Mix function has a simple construction, and consists of a 64-bit addition, an XOR, and one rotation by a constant which depends on the current round and on which of the 8 state words the Mix operation is applied to. Additionally, for every four rounds of ThreeFish a constant called a subkey is XORed onto the state. These subkeys are generated from a tweak value and the round counter, using a key schedule.
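The Mix operation itself maps to just three machine-level operations, as the following sketch shows; r stands for the round- and position-dependent rotation constant (its concrete values are taken from the ThreeFish rotation schedule and are not reproduced here):

```c
#include <stdint.h>

static inline uint64_t rol64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64u - n));  /* n in 1..63 here */
}

/* The ThreeFish Mix operation on a pair of 64-bit state words:
 * one 64-bit addition, one rotation by the schedule constant r,
 * and one XOR. */
static void threefish_mix(uint64_t *x0, uint64_t *x1, unsigned r)
{
    *x0 += *x1;
    *x1 = rol64(*x1, r) ^ *x0;
}
```

The simplicity of Mix is what keeps each ThreeFish round cheap despite the large total round count of 72.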

27 Chapter 2. Microcontroller Architecture Enhancements for Cryptographic Applications

2.2 Embedded Microcontroller Architecture (PIC24)

Embedded microcontroller units (MCUs) come in a variety of different configurations, not only concerning the embedded peripherals, but also regarding the employed processing core architectures and the on-chip memory types and their organization. MCU architectures are generally categorized by their data path width, which ranges from 8 bit through 16 bit up to 32 bit, with varying instruction word lengths. The PIC series of microcontrollers is frequently employed in numerous embedded systems, with over one billion MCU devices sold per year (as of 2013) [Mic13]. In the following we focus on the PIC24 device family [Mic09b], which is the general-purpose 16-bit MCU architecture variant of the PIC family. The PIC24 comprises common features of 16-bit MCU architectures used for embedded systems with light data processing requirements, and was hence chosen as the reference architecture for this study. Due to its more extensive instruction set and more complex addressing features compared to typical 8-bit MCUs, the PIC24 architecture is suitable for applications that go beyond simple control tasks and also require simple processing of data, under low-cost and low-power constraints.

The PIC24 [Mic09b] derives its name from its 24-bit wide instruction words, and is based on a Harvard architecture with a 16-bit data path. Both the data memory bus and the working registers are 16 bit wide. The block diagram of the CPU core of the PIC24 MCU is shown in Figure 2.3. The core features a 16-entry register file, a 16-bit arithmetic logic unit (ALU), a dedicated multiplier, and a unit providing hardware support for multi-cycle division. The commercial implementation from Microchip furthermore includes a direct memory access (DMA) controller, an interrupt controller, as well as a module which allows access to the program memory to use its content as data within the CPU. Moreover, the data bus is connected to an extensive set of peripherals, which are co-integrated on the system-on-chip (SoC) and vary depending on the specific model of the device.

Section 2.2.1 gives a summary of the instruction set architecture (ISA), which is followed by the description of our custom microarchitecture implementation of the PIC24 in Section 2.2.2.

2.2.1 Instruction Set Architecture

The PIC24 ISA comprises 87 base instructions, divided into 9 functional groups, covering standard program flow and control instructions, as well as integer arithmetic, logic, and bit manipulation operations. The majority of the instructions allow for a variety of addressing schemes and operand modes, resulting in 190 different effective instructions. The addressing modes supported by the ISA are: register direct, register indirect, file register1, and immediate. Register direct addressing accesses the 16 working registers directly, while register indirect addressing accesses the data memory location referenced by the content of the given work

1File register is a term that Microchip uses in its PIC architectures to refer to the data memory. The term originates from 8-bit MCU architectures which only operate with a single working register and address their small on-chip memory in the form of a file register.

Figure 2.3 – Microchip PIC24 CPU core block diagram [Mic09b], illustrating the internal MCU core structure of a commercial PIC24HJXXXGPXXX device.

register. File register addressing uses a predetermined address encoded in the instruction word to access the data memory directly. The immediate addressing mode refers to the possibility of encoding immediate constants in the instruction word for use as operands.

Register indirect addressing additionally provides six different modes to improve code efficiency when processing sequential data in memory. These modes allow the register that holds the memory address to be modified, or additional address arithmetic to be performed, within the same instruction. The six modes are: pre-increment, pre-decrement, post-increment, post-decrement, register offset, and no modification.


Furthermore, all operands and destinations can be either addressed in byte or word mode, for most instructions. Word mode represents native 16-bit data addressing, while byte mode only operates on the least significant 8 bit of the corresponding data word. This feature is not necessarily supported by all 16-bit MCU architectures, and can be an important factor for the overall performance of an algorithm implementation (e.g., for the Grøstl algorithm; see Section 2.1.2 and Section 2.4).

The CPU uses a 16-entry general-purpose register array named W0-W15, with the stack pointer assigned to register W15. In addition, there are several status and control registers, as well as a 16-bit repeat loop counter used in conjunction with the REPEAT instruction, which loads this counter and repeats the immediately following instruction the specified number of times. This allows for very simple hardware loops, thus reducing the program size.

2.2.2 Microarchitecture

The microarchitecture of our PIC24 implementation, as detailed in Figure 2.4, is based on a 3-stage pipeline, and hence provides timing behavior similar to the original, commercial PIC24 device with respect to instructions per cycle (IPC) and stalls. The pipeline consists of a fetch, a decode, and an execute stage, and achieves an IPC throughput of almost 1. All standard instructions (which are 24 bit long) execute in a single cycle. The rare double-word (48-bit) instructions with long immediates, such as CALL and GOTO, execute in 2 cycles. The only other exception are the division instructions, which take multiple cycles (equal to the number of divisor bits) to complete, due to their iterative hardware implementation.

As in the commercial device, the instruction memory of our implementation is a single-port 24-bit wide memory with 23-bit addresses, and single cycle access latency. Depending on the specific implementation, only a certain range of this address space is used. Here we implement the instruction memory with a size of 2048 instruction words, i.e., 6 kbyte of SRAM, since it is the minimum size to hold the largest of the 5 SHA-3 algorithm implementations. For comparison, the commercial PIC24 devices come in a variety of sizes, providing from 4 up to 1024 kbyte of flash instruction memory. The data memory is realized as a dual-port memory with one read port and one write port, both providing single cycle access latency. The memory is 16-bit wide and uses 16-bit addresses, allowing concurrent read and write access within the same clock cycle. The size of the data memory implementation is 1024 words, which corresponds to 2 kbyte of SRAM.

The work register file has three read ports to allow up to two operands and/or memory addresses (read source and/or write destination) to be fetched at the same time. Two write ports are dedicated to the operand fetch module, to support the register indirect addressing modes with increment or decrement on both one operand and the destination of the instruction, within one cycle. The register file has an additional two write ports, which are used for writeback of ALU results (and memory loads). Double register writeback is used for the division instructions, which write/update two values per cycle (quotient and remainder).


Figure 2.4 – Microarchitecture of custom PIC24 implementation with a 3-stage pipeline, comprising a fetch (FE), decode (DC), and execute (EX) stage. The instruction memory and data memory are tightly coupled to the core.

Our custom microarchitecture features some optimizations and design choices which increase the IPC performance of the core compared to the commercial implementation. Branch instructions always execute in one cycle, even when the branch is taken, which eliminates the otherwise required stall cycles. Moreover, in our microarchitecture all forms of read-after-write (RAW) hazards are resolved without stalls through bypassing, which also includes data memory locations and is not limited to working registers only. Additionally, no stalls are issued if a data dependency stems from a register that is used with a register indirect addressing mode in the subsequent instruction, i.e., when the register content is used as a data address.


2.3 Performance Metrics

In order to assess the performance of the software implementations together with the respec- tive hardware system, we report four main metrics: cycle count, data memory consumption, program memory consumption, and required hardware area. This section briefly discusses these metrics, and explains how they are calculated.

Cycle Count The cycle count is a measure of the complexity, and also of the energy consumption, of an implementation on a specific processor. The cycle count (per unit of data) is also inversely proportional to the throughput, which describes how fast the hash algorithm runs at a given clock frequency. As introduced in Section 2.1.1, each hash algorithm generally operates on message blocks of fixed length. For all SHA-3 candidates this block length is 512 bit, except for Keccak, which uses 1088-bit message blocks. The average number of cycles needed to process one block is recorded and then divided by the number of bytes per block, while excluding any message initialization efforts (e.g., for padding) or finalization steps that occur at the end of a message. Hence, we report the long message performance in the commonly used cycles/byte format.
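The metric reduces to a one-line computation, sketched below. The helper name is ours, and the per-block cycle count used in the example is a hypothetical value chosen to be consistent with Keccak's reported 188 cycles/byte (its 1088-bit block is 136 bytes).

```c
/* Sketch: long-message performance in cycles/byte, computed from the
   average cycle count per message block and the block length in bits. */
static double cycles_per_byte(double cycles_per_block, unsigned block_bits) {
    return cycles_per_block / (block_bits / 8.0);
}
```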

Data Memory As described in Section 2.2.2, the microcontroller employed in this work uses 16-bit wide data memory, realized as SRAM. In a microcontroller, the total amount of SRAM available for all applications is a scarce resource due to the fact that SRAMs occupy significant circuit area for practical memory sizes. The data memory utilization is always expressed in number of bytes throughout the reported results.

Program Memory Since the PIC24 uses the Harvard architecture, which is common in microcontrollers, the data and program memories are fully separated. In practice (i.e., a full embedded system implementation), the program memory could hence be implemented depending on the application, as a read-only memory, a one-time programmable memory, or most commonly flash memory, all of which have less hardware overhead than an SRAM, as used for the data memory. While still a significant burden, program memory is therefore often not considered to be as expensive as data memory.

All PIC24 instructions are either one or two instruction words long, and can therefore be stored as three or six bytes, respectively. The overall comparison tables always list the total number of bytes; some tables additionally report the complexity of individual functions in instructions for clarity. We use the word text to describe the complete content of the program memory. Static initialization data (e.g., a set of initial values for the hash chain) is normally used once per execution of an algorithm. In practice, it can therefore be stored in sections linked to the less expensive program memory, and is hence accounted for as text in addition to the program code.


Hardware Area A hardware implementation always involves a compromise between operation speed, power and energy consumption, and circuit area. In this study, all microcontroller descriptions are synthesized using a standard-cell-based design flow for a 90 nm CMOS technology. Furthermore, all MCU instances reported in this study are analyzed regarding their area requirements for the full range of different core clock constraints, covering all target constraint corners (see Section 2.6.3). Area overheads of the proposed instruction set extensions (ISEs) with respect to the original implementation, as well as any other absolute area figures given in comparison tables and the continuous text, are always for a chosen target core-clock frequency of 200 MHz.

Area is expressed in terms of kilo gate equivalents (kGEs), where one GE is taken as the area of a 2-input NAND gate with a standard driving strength. The conversion factor is 3.136 µm2 per GE. The results are all synthesis results, and do not include post-layout parasitic effects.

The total area of the microcontroller subsystem comprises the data memory, the program/instruction memory, and the actual MCU core. In our implementation, the two memories are implemented as SRAM macros. Even if very modest/minimal sizes were selected for these memories, the SRAMs would occupy at least two thirds of the total area. Since the memory overhead is large and determining a fair size for the memory (which covers different feasible application scenarios) is not straightforward, we report only the change in the MCU core area when making comparisons between different designs.

2.4 Baseline Performance on PIC24

This section describes the characteristics of our baseline implementations for the PIC24 architecture for all five SHA-3 finalist algorithms, followed by a performance comparison with other state-of-the-art software implementations on MCU systems.

2.4.1 Baseline Implementations

The baseline implementations for all algorithms are developed from scratch2 in PIC24 assembly language, in order to start from a baseline which is already highly optimized regarding hashing performance as well as memory requirements. Compiled programs from the provided reference implementations in C were not considered further as a baseline for this study, for two reasons: 1) the resulting assembly code structure would have made systematic, architecture-specific bottleneck analysis more challenging, complicating the development of efficient ISEs; and 2) the performance, both in terms of cycle count and code density, is significantly lower with the compiled programs, even when using the highest optimization settings of the commercial compiler tool chains. Consequently, the hand-optimized assembly implementations allow the core performance bottlenecks of each algorithm implementation to be identified clearly.

2The implementations are based on the official algorithm proposal documents, and their accompanying material (documentation and sample code) discussing relevant code optimization techniques.


Note that every implementation is fully functional as a stand-alone application, performing two invocations of the algorithm code (one for single-block hashing, and one for multi-block hashing), followed by an automatic check of the message digest to verify the correctness of the implementation. All implementations include code written for initialization and preparation of message blocks; however, these parts have not been included in the performance evaluation, i.e., only long message performance values are considered in this study. This choice to focus only on the algorithm kernels was made since the code for message block preparation (e.g., padding) is similar for all candidates and only executed once per block, making it the least relevant code regarding any possible optimizations.

The following baseline reference implementation descriptions refer to algorithm specifics and details as introduced in Section 2.1.2.

BLAKE Various studies have shown BLAKE to be one of the fastest implementations with the smallest memory footprint. The throughput performance of our baseline implementation confirms this trend, with a hashing speed of 155 cycles/byte. Table 2.1 shows the breakdown of the memory resources in our PIC24 assembly implementation (together with the improvements due to the introduction of ISEs, which are discussed in Section 2.5.3).

The core of the implementation is the G function which operates on four 32-bit words. To this end, the corresponding words of the internal state are first loaded into four register pairs using the appropriate permutation pattern. After the transformation through the G function call,

Table 2.1 – Memory resource breakdown of the baseline implementation of the BLAKE algo- rithm on the PIC24 architecture, and the changes due to instruction set extensions.

Description              Baseline (PIC24)   PIC24+ISE / ∆ ISE

Data                     488 byte           200 byte
  Hash State              32 byte
  Salt                    16 byte
  Counter                  8 byte
  Message Block           64 byte
  Internal State          64 byte
  Misc. Variables          4 byte
  Stack                   12 byte
  Constants (Pi)          64 byte           -64 byte
  Sigma Permutations     224 byte           -224 byte

Text                     1028 byte          818 byte
  CompressBlock()        274 instr          -39 instr
  RoundFuncG()            58 instr          -31 instr
  Initial State           32 byte

these state words are stored back to their original positions in memory.

The G function performs a lookup of the required indices for the current round in the Sigma Permutation table (stored in data memory) and uses those indices to address the correct words in the message data block and the round constants table. The employed additions are 32-bit wide and use the add-with-carry instruction of the PIC24 ISA. The 32-bit rotations by 7, 8 and 12 bits are emulated with 6 instructions each, utilizing the OR, left shift, and right shift instructions of the reference ISA. The rotations by 16-bit can be implemented without any computations, simply through reordering of the used working registers.
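The 6-instruction rotation emulation described above can be sketched in C over a (hi, lo) pair of 16-bit registers. This is an illustrative sketch with names of our choosing; the C shifts stand in for the PIC24 shift/OR instruction sequence.

```c
#include <stdint.h>

/* Sketch of a 32-bit right-rotation on a 16-bit datapath, as needed
   for BLAKE's rotations by 7, 8 and 12. The 32-bit word lives in a
   (hi, lo) pair of 16-bit registers. Valid for r in 1..31. */
static void rotr32_pair(uint16_t *hi, uint16_t *lo, unsigned r) {
    uint32_t x = ((uint32_t)*hi << 16) | *lo;
    x = (x >> r) | (x << (32 - r));
    *hi = (uint16_t)(x >> 16);
    *lo = (uint16_t)x;
}

/* A rotation by 16 needs no computation at all: the two registers
   simply exchange roles, i.e., a pure register reordering. */
static void rotr32_by16(uint16_t *hi, uint16_t *lo) {
    uint16_t t = *hi; *hi = *lo; *lo = t;
}
```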

Grøstl Grøstl implementations for small microcontrollers have in general been considered to be slow and large. In terms of throughput our baseline implementation requires 462 cycles/byte, making it one of the slowest algorithms. Table 2.2 provides a breakdown of the memory footprint of our implementation.

It is important to note that due to the organization of the state matrix into byte elements that are used in row as well as column order, byte mode support of the MCU architecture is imperative to avoid even costlier implementations in terms of cycles and code size. The byte mode feature is particularly important for efficiently addressing single elements in byte

Table 2.2 – Memory resource breakdown of the baseline implementation of the Grøstl algo- rithm on the PIC24 architecture, and the changes due to instruction set extensions.

Description              Baseline (PIC24)   PIC24+ISE / ∆ ISE

Data                     982 byte           214 byte
  Hash State              64 byte
  Message Block           64 byte
  Internal State          64 byte
  Misc. Variables          6 byte
  Stack                   16 byte
  S-Box                  256 byte           -256 byte
  Mult 2 LUT             256 byte           -256 byte
  Mult 4 LUT             256 byte           -256 byte

Text                     2619 byte          819 byte
  CompressBlock()        173 instr
  PermutationPQ()         26 instr
  AddRoundConstPQ()       70 instr          -70 instr
  SubBytes()             134 instr          -134 instr
  ShiftBytesPQ()         246 instr          -246 instr
  MixBytes()             150 instr          -150 instr
  OutputTransform()       74 instr

increments in the data memory, which is commonly organized in more coarse-grained word sizes.

The instructions underlying the four suboperations of P and Q can be reduced to simple XORs and memory moves, combined with the use of three LUTs. The first LUT is the AES S-box, which maps every 8-bit value of the state to another 8-bit value. The remaining two LUTs hold the precomputed results for the multiplications of any 8-bit value by either 2 or 4 in the finite field F256 as used in AES. The results of multiplications with the constants 3, 5, and 7 (as they are required in the MixBytes step) are constructed by XOR'ing together the corresponding combinations of the multiples of 2 and 4, and the value itself (the multiple of 1).
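The construction of the odd multiples from the two LUTs can be sketched as follows. For self-containment, mul2/mul4 are computed here directly with the standard AES xtime step over GF(2^8) (reduction constant 0x1B) instead of table lookups; the helper names are ours.

```c
#include <stdint.h>

/* Sketch of the GF(2^8) constant multiplications used in MixBytes.
   xtime is the standard AES multiply-by-2 with conditional reduction. */
static uint8_t xtime(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
}
static uint8_t mul2(uint8_t x) { return xtime(x); }
static uint8_t mul4(uint8_t x) { return xtime(xtime(x)); }
/* Multiples of 3, 5 and 7 as XOR combinations, exactly as in the text: */
static uint8_t mul3(uint8_t x) { return mul2(x) ^ x; }
static uint8_t mul5(uint8_t x) { return mul4(x) ^ x; }
static uint8_t mul7(uint8_t x) { return mul4(x) ^ mul2(x) ^ x; }
```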

JH While the reference algorithm description (as summarized in Section 2.1.2) is suitable for hardware implementations, a so-called bit-sliced implementation [Wu11] can be used in software to significantly improve the efficiency on a processor-based system. In bit-sliced implementations, instead of performing operations on small bit-width words (i.e., for the 4-bit S-box functions), large words are formed by combining the same binary digit of multiple input words into one working register. Since the PIC24 is a 16-bit architecture, we have implemented a bit-sliced variant tailored to a data word size of 16 bit. Operations in JH can be transformed so that four 16-bit words can be formed by combining the bits of 16 separate 4-bit tuples. Storing the precomputed round constants for this bit-sliced implementation requires 42 × 16 × 16 bits = 1344 byte, which accounts for the majority of the data segment, as can be observed in Table 2.3. This contribution also renders JH the algorithm with the largest memory footprint in terms of data size for the baseline implementation. The

Table 2.3 – Memory resource breakdown of the baseline implementation of the JH algorithm on the PIC24 architecture, and the changes due to instruction set extensions.

Description              Baseline (PIC24)   PIC24+ISE / ∆ ISE

Data                     1550 byte          206 byte
  Hash State             128 byte
  Message Block           64 byte
  Misc. Variables          4 byte
  Stack                   10 byte
  Round Constants        1344 byte          -1344 byte

Text                     4649 byte          2183 byte
  CompressBlock()        144 instr
  SBox()                 432 instr          -37 instr
  L/MDS()                144 instr
  Swap()                 787 instr          -785 instr
  Initial State          128 byte

hashing speed of the baseline implementation is the slowest of all five SHA-3 candidates with 464 cycles/byte.

The bit-sliced implementation calculates its core function (the 4-bit S-box function) via an instruction sequence using only AND, NOT, and XOR (as described in [Wu11]), which operates on four working registers holding the 16 bit-sliced input tuples. This calculation is more efficient than performing actual table lookups on separate tuples, due to the ability to process 16 separate bits in parallel and the avoided overhead of assembling and disassembling the tuples. It is followed by the L transformation, which only requires a set of XORs applied on 8 bit-sliced data words. The final Swap operation is performed on groups of 8 words, by either using bit-masks combined with left and right shifts and logic ORs for small shift/interleave values (1, 2, and 4), or simple MOV instructions for larger values (8, 16, 32, and 64).
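The mask-and-shift form of the Swap operation for the small interleave values can be sketched on a single 16-bit word as follows. This is illustrative only: the function name and mask table are ours, though the constants are the standard ones for exchanging adjacent groups of 1, 2, or 4 bits in a 16-bit word.

```c
#include <stdint.h>

/* Sketch of the Swap operation for small interleave values s = 1, 2, 4
   on a 16-bit bit-sliced word: adjacent groups of s bits are exchanged
   using a mask, two shifts, and an OR. */
static uint16_t swap_bits(uint16_t x, unsigned s) {
    static const uint16_t mask[] = { 0x5555, 0x3333, 0x0F0F }; /* s = 1, 2, 4 */
    unsigned i = (s == 1) ? 0 : (s == 2) ? 1 : 2;
    return (uint16_t)(((x & mask[i]) << s) | ((x >> s) & mask[i]));
}
```

For the larger values (8, 16, 32, 64) the exchange crosses 16-bit word boundaries and therefore reduces to plain register/word moves, matching the MOV-only case in the text.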

Keccak The baseline implementation of the Keccak algorithm requires 188 cycles/byte. As shown in Table 2.4, the data memory consumption is moderate, while the text section is larger than for most of the other algorithms, due to extensive unrolling of all core operations, which incorporate many positional constants (state word coordinates) as well as permutation schedules including rotation constants. It would be possible here to trade hashing performance for more compact code size; however, priority has been given to improved hashing speed.

All five core operations work on 64-bit state words, which are typically held in a set of four working registers. The Theta operation performs a series of XOR combinations on all state elements, including single-bit rotations implemented with the rotate-through-carry instructions of the PIC24 ISA. The Rho step performs a 64-bit rotation on each state element, using a different rotation constant for each element of the 5×5 state matrix, depending on its row and column index. The rotations are implemented either by shifts together with OR instructions, or using MOV instructions only, where possible. The Pi step permutes the elements of the state matrix and is combined with the Rho step in our baseline implementation. A state element is loaded from memory, rotated, and then directly written to its new destination, according to the permutation table defined by Pi. This permutation can be chained, so that only one element has to be buffered for the complete state to be rotated and permuted in place. The next transformation is Chi, which applies a combination of the logical operations AND, NOT, and XOR to rows of the state word matrix. The final step Iota XORs the constant depending on the current round onto state element (0,0).

Table 2.4 – Memory resource breakdown of the baseline implementation of the Keccak algorithm on the PIC24 architecture, and the changes due to instruction set extensions.

Description              Baseline (PIC24)   PIC24+ISE / ∆ ISE

Data                     448 byte           352 byte
  Hash State             200 byte
  Message Block          136 byte
  Misc. Variables          4 byte
  Stack                   12 byte
  Round Constants         96 byte           -96 byte

Text                     3480 byte          2415 byte
  CompressBlock()        147 instr
  Theta()                307 instr          -20 instr
  RhoPi()                349 instr          -324 instr
  Chi()                  345 instr
  Iota()                  12 instr          -11 instr

Skein In its baseline implementation, the hashing performance of Skein, at 158 cycles/byte, is close to that of the fastest algorithm (BLAKE). Skein has the largest memory consumption in terms of text (as detailed in Table 2.5), which is dominated by the fully unrolled ThreeFish rounds with varying input pairs of state words and rotation constants (totaling 32 different invocations of the Mix function). Another major contributing factor to the large program size is the key injection, which is implemented in 9 slightly different variations to cover all 9 injection cases separately. Here too, as in the case of Keccak, a decrease in hashing speed could be traded for reduced program memory consumption.

The computational core of Skein is the Mix function of ThreeFish. This simple Mix function

Table 2.5 – Memory resource breakdown of the baseline implementation of the Skein algorithm on the PIC24 architecture, and the changes due to instruction set extensions.

Description              Baseline (PIC24)   PIC24+ISE / ∆ ISE

Data                     242 byte           242 byte
  Hash State              64 byte
  Key Schedule            72 byte
  Tweak                   24 byte
  Message Block           64 byte
  Misc. Variables          2 byte
  Stack                   16 byte

Text                     5734 byte          4678 byte
  CompressBlock()         75 instr
  KeyscheduleGen()        83 instr
  ThreeFishRounds()      876 instr          -352 instr
  InjectKey()            844 instr
  OutputTransform()       12 instr
  Initial State           64 byte

consists of arithmetic and logic operations on 64-bit words, some of which are well supported by the standard ISA, through the use of addition with carry and XOR. The only problematic operation is the left rotation over 64-bit words, which has to be emulated (as in the case of Keccak), either in the general case by shifts together with OR instructions, or in the case of rotations by a multiple of 16, by using MOV instructions only. The key schedule is generated using XOR operations, while the subkeys are injected via 64-bit additions.
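The emulated 64-bit rotation over four 16-bit words can be sketched as below (the helper is ours; word w[0] is taken as least significant). As noted in the text for both ThreeFish and Keccak, rotations by a multiple of 16 reduce to pure word moves, while all other amounts additionally need shift/OR sequences.

```c
#include <stdint.h>

/* Sketch of a 64-bit left-rotation over four 16-bit words, as it has
   to be emulated on a 16-bit datapath. The 64-bit value is assembled,
   rotated, and split back into its 16-bit words. */
static void rotl64_w16(uint16_t w[4], unsigned r) {
    uint64_t x = ((uint64_t)w[3] << 48) | ((uint64_t)w[2] << 32)
               | ((uint64_t)w[1] << 16) |  (uint64_t)w[0];
    r &= 63;
    if (r) x = (x << r) | (x >> (64 - r));
    for (int i = 0; i < 4; i++)
        w[i] = (uint16_t)(x >> (16 * i));  /* multiples of 16: pure moves */
}
```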

2.4.2 Performance Results and Comparison

Embedded systems, and particularly performance- and/or resource-constrained small microcontrollers, have been identified as important targets for future SHA-3 implementations. Hence, the authors of nearly all algorithms have also published preliminary performance results and/or estimations for specific small microcontrollers on their webpages. The most important study regarding benchmarking of SHA-3 candidates on deeply embedded processors and microcontrollers has been done by Christian Wenzel-Benner and Jens Gräf [WBG10]. They maintain a webpage [WBG11] where their results are published. These results have also been included on the eBASH website [Be11]. Thomas Pornin [Por11] has developed a library (sphlib) which uses standard C, and can therefore easily be ported to a variety of platforms, even with restricted resources. The library includes comparisons on several platforms, but does not necessarily reflect the results that are achievable with hand-crafted assembly kernels.

While large general-purpose microprocessors often share a very similar ISA and structure3 (i.e., x86/x86-64), small microcontrollers typically come in many different flavors. They differ in their structure (Harvard, von Neumann) and in the width of their datapath (4-32 bits), and provide differing resources (number of registers, availability of hardware multipliers, etc.), on top of the many different ISAs. This makes a general and fair comparison between microcontrollers very difficult.

Nevertheless, we use the PIC24 microcontroller as a representative of the variety of available 16-bit microcontrollers, which complements other published results that have so far focused on 8-bit and 32-bit MCU architectures only. In Table 2.6 we list our implementation results regarding hashing performance, together with the results of other state-of-the-art implementations. The relative performance of the algorithms is given in relation to BLAKE to allow for easy comparison of the individual performance differences between the various MCU platforms.

From Table 2.6 it can be observed that in general 32-bit MCU architectures clearly benefit in terms of cycle count over a 16-bit architecture, since most algorithms internally use state word sizes (64 bit) that are larger than the native word sizes of the MCUs. Moreover, it can be seen that BLAKE is consistently fast throughout all published works. The ranking is not so clear for the other candidates; however, Grøstl and JH are mostly ranked among the slower

3Although the specific implementations on microarchitecture level can differ significantly, devices aimed at the same market typically provide comparable computational resources and features.


Table 2.6 – Comparison of throughput numbers [cycles/byte] of published microcontroller implementations. Numbers in parentheses show performance normalized to BLAKE perfor- mance within the respective architecture.

                This work    [Por11]     [Por11]     [WBG11]     [WBG11]       [Gou10]
Architecture    PIC24        ARM-M3      ARM920T     ARMv5TE     ATmega128     8051
Datapath        16-bit       32-bit      32-bit      32-bit      8-bit         8-bit
Specification   third        third       third       third       third         second
BLAKE           155 (1.00)    89 (1.00)   54 (1.00)   87 (1.00)   1241 (1.00)   643 (1.00)
Grøstl          462 (2.98)   455 (5.11)  313 (5.79)  216 (2.48)  11198 (9.02)  1327 (2.06)
JH              464 (2.99)   370 (4.16)  395 (7.31)  361 (4.15)   3829 (3.06)     -
Keccak          188 (1.21)   192 (2.16)  197 (3.65)     -         1115 (0.89)  1745 (2.71)
Skein           158 (1.02)   128 (1.44)  129 (2.39)  184 (2.11)   1444 (1.16)  1944 (3.02)

implementations. It can be observed that our PIC24 baseline implementations rank quite favorably when compared to other implementations, considering the 16-bit data path of the architecture. In particular, our Keccak realization outperforms all published results for microcontrollers, even those for 32-bit architectures. This is partially attributed to the fact that we use hand-crafted assembly code, rather than compiled C-code.

The result comparison furthermore highlights that all our baseline implementations are in general in line with the current state of the art for MCU architectures, and hence provide a useful basis for the development of ISEs, as well as a valid reference point for the benchmarking of the resulting relative performance gains from those ISEs.

2.5 Architecture Enhancements Through Instruction Set Extensions

An important point about embedded MCUs is that, for an increasing number of applications, the MCU is integrated in a custom SoC, where it can potentially be customized to the application with instruction set extensions. The basic idea behind such extensions is to enhance the architecture (specifically, often the datapath) of an MCU so that an application can be executed with fewer instructions, increasing throughput, reducing energy consumption, and saving resources (RAM and ROM). Such additions can introduce some overhead (increased circuit size, and/or reduced maximum operating speed due to additions to the datapath), but in many cases they still offer significant advantages. Most intellectual property (IP) vendors already offer a wide variety of ISE options for their embedded MCUs.

Moreover, once an algorithm has been selected as an official standard (e.g., a NIST standard), processors and MCUs with specifically tailored ISEs are expected to appear as IP blocks or stand-alone components, similar to the case of AES in modern CPUs. For the SHA-3 standard, which has just recently been standardized [NIS15b], first IP cores in the form of look-aside cores/accelerators are already available [Syn16]. Such look-aside cores implement the full

hashing functionality in the form of an accelerator, which operates asynchronously, attached to a host processor. This asynchronous approach is typically interesting for applications with high throughput requirements and/or streaming characteristics, since the host core remains available for other tasks while the current message is processed in parallel. In contrast to such look-aside cores, which in the context of MCUs consume considerable hardware area by themselves, ISEs operate synchronously within the host core, to provide in-core acceleration. The ISEs are typically based on the concept of reusing as much as possible of the resources already provided by the MCU microarchitecture.

Many companies offer solutions [Syn12, Ten12, Tar12] that facilitate the design of application- specific processors and the customization of commonly used MCUs. Starting from a high-level description, such solutions allow changes to be made to an ISA, and not only produce the corresponding hardware description for integration into a SoC, but also generate the necessary tool chain (compiler, assembler, linker) that will support this enhanced processor, allowing it to be used with relative ease.

In the following, we consider the optimization of the 16-bit PIC24 MCU with ISEs, for improving the execution speed and the memory footprint of cryptographic hash functions. We show how one can achieve considerable performance gains with types of ISEs that differ from the usual approach of common datapath extensions. Highlighting the algorithm-specific performance bottlenecks, we propose ISEs that facilitate the generation of memory access patterns, lookup table integration, and the extension of computational units through microcoded instructions. We report the execution speedup, the memory reduction, and the overhead associated with these extensions.

2.5.1 ISE Design Flow

The design flow used for the implementation and evaluation process is illustrated in Figure 2.5. The flow comprises a hardware implementation path and a software development, optimization, and verification path.

On the processor architecture side, the ISA and the microarchitecture of the PIC24 processor are first described in LISA (Language for Instruction Set Architectures), a language tailored to the design of application-specific processors [HKN+01, SLM07]. This description is then automatically translated into a register transfer level (RTL) VHDL description that can be synthesized into gates using RTL synthesis tools (in our case Synopsys Design Compiler Ultra) to evaluate the silicon area and performance. The design is mapped to a standard-performance regular threshold-voltage 90 nm CMOS technology, with nominal supply voltage of 1.0 V. In addition to the hardware description, Processor Designer [Syn12] generates the basic software tool chain (assembler and linker) and a fully cycle-accurate instruction set simulator (ISS).

On the software side, the algorithm specifications for the five SHA-3 candidates provide the starting point for an initial implementation in the native assembly language of the PIC24 ISA.

Figure 2.5 – Illustration of the design flow and tools employed for implementation, benchmarking, and development of instruction set extensions.

Multiple iterations of profiling and software optimization are carried out, with the help of profiling tools provided by the ISS, to reduce code and memory size and to reduce cycle counts. During the optimization phase, improvements are mainly achieved by efficient use of all available operand addressing modes provided by the architecture, and by applying coding techniques such as unrolling and precomputation, where feasible (considering execution time versus program and data memory trade-offs, with a slight preference for execution time over memory requirements). All implementations are always verified against the provided test patterns to guarantee full compliance with the original algorithm specifications.

At the point where no further gains are achieved using the standard ISA of the PIC24 microcontroller, the different algorithms are analyzed manually to identify specific performance and memory bottlenecks. For each candidate, we identify custom instructions that promise an improvement in terms of memory footprint and/or cycle count, while remaining compatible with the general microarchitecture of the core with only minor modifications. To be more precise, the extended implementations of the PIC24 core remain fully backward compatible, and no changes are made to key components such as the register file, the memory interfaces, the pipeline stages, or the instruction formats. The additional instructions are incorporated into the LISA model and into the assembly descriptions of the SHA-3 kernels. ISEs are fine-tuned through multiple iterations of hardware/architecture and code adjustments, followed by benchmarking of the gains in terms of cycle count and memory utilization.

2.5.2 ISE Types for Cryptographic Hash Functions

Although the reference PIC24 ISA comprises a large number of instructions, our implementations without ISEs mainly focus on a core set of instructions, directly matching the base operations intrinsic to the class of cryptographic hash functions, as described in Section 2.4.1.

We investigate three different forms of performance bottlenecks that cannot be removed using the reference ISA. First, we analyze conventional basic arithmetic and logical operations that have to be carried out on data word lengths wider than those supported by the underlying native architecture (e.g., 64-bit data words on a 16-bit MCU). Depending on the operation, these operations often have to be emulated by a number of assembly instructions that is much larger than the ratio between the word size and the width of the data path. A very relevant example in the context of cryptographic hash functions is the rotation of state data words. Second, we examine complex memory access patterns, which produce overhead in the form of advanced address arithmetic and potentially multiple accesses of the same data in data memory. Third, we investigate static data memory requirements in the form of lookup tables, which occupy valuable data memory space and further increase the number of required memory accesses. Derived from these three types of performance bottlenecks, the corresponding three classes of proposed ISEs for cryptographic hash functions are described in the following. To illustrate the three ISE types, algorithm-specific examples of their application to the introduced group of five SHA-3 hash algorithms on the PIC24 architecture are given. More specific implementation details regarding all the mentioned examples are provided in Section 2.5.3.

The general approach followed by all three ISE types is to keep the introduced hardware overhead low, especially regarding any new additions to the data path of the processor core. This is achieved by utilizing already existing microarchitectural resources as much as possible, and by designing the ISEs such that they only address the manually identified inefficiencies within the algorithm kernels caused by microarchitectural bottlenecks and/or the standard ISA, without necessarily compacting entire sequences of code into new instructions that require completely new functional units. As such, the proposed ISEs provide a slightly different

approach to in-core acceleration and the design of ISEs, especially compared to automated methods for ISE identification/extraction and generation for embedded processors [PPIM03, AOD+08].

Extension of Computational Units Through Microcoded Multi-Cycle Instructions

The most common way of extending an ISA is by introducing additional, often more complex, computational units to the datapath of the processor. The new instruction hence typically performs a new computational task, e.g., the calculation of a multiply-accumulate operation. In order to reduce the hardware overhead introduced by the ISE, our aim is to only extend the already available computational units in the data path (e.g., parts of the ALU), and to focus the functional contribution of the new instruction mainly on its microcoded, efficient use/addressing of the available resources (such as the register file and ALU) over multiple cycles, in a way that cannot be achieved with the standard ISA. To this end, a new instruction derives, for example, from a single immediate parameter (encoded in the instruction) and from the state (cycle) it is in, which registers to use as operands and how these should be employed by the extended computational unit.

In the case of cryptographic hash functions, the most common computational bottleneck on an MCU lies in the support for large data word rotations. These rotations in general need to be performed on state words that are wider than the native data words, for all algorithms that use this operation, namely BLAKE, Keccak, and Skein. As an example, the problem of a 64-bit data word rotation on a 16-bit architecture can be solved in the form of a microcoded multi-cycle instruction, with minimal additions to the processor datapath. Since the PIC24 16-bit architecture already supports a double register writeback, the 64-bit word rotation instruction can be performed in only two cycles. In each cycle, only three specific 16-bit values need to be shifted to create two resulting MCU-native data words, which form the upper or lower half of the 64-bit result. Thus, the required hardware only amounts to two special 16-bit barrel shifters (with 32-bit inputs), and not a full 64-bit barrel shifter. The block diagram detailing this approach is depicted in Figure 2.6. In each cycle of the new multi-cycle instruction, a specific set of three operands is loaded, which inherently already performs a shift by either 0, 16, 32, or 48 bits, due to the chosen addressing pattern, depending on the rotation offset. The PIC24 ALU already contains a full 16-bit barrel shifter, which in this case can be extended to allow for the wider 32-bit input, taking two data words as input.4 The extended shifter is then replicated once, to enable the core to support the 64-bit operation in only two cycles. The emulation of a generic 64-bit rotation by an arbitrary amount of bits utilizing the standard PIC24 ISA requires 12 cycles, which results in a speedup of all (register-based) rotation operations by a factor of 6x with the help of the described ISE.
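The decomposition into a coarse rotation (multiples of 16, realized by operand addressing) and a fine rotation (0 to 15 bits, realized by the extended shifters) can be sketched as a behavioral C model. The function names (ext_shift, rotl64_2cycle) and the explicit loops are illustrative only; in hardware the coarse part is performed purely by the register addressing pattern, not by computation.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioral model of the two-cycle 64-bit rotation on a 16-bit datapath.
   The 64-bit state word is held as four 16-bit words, src[0] = LSW.
   Two extended 16-bit barrel shifters (32-bit input each) plus the double
   writeback produce two result words per modeled cycle. */

/* Extended shifter: 32-bit input {hi:lo}, left shift by s, upper 16 bits out. */
static uint16_t ext_shift(uint16_t hi, uint16_t lo, unsigned s) {
    uint32_t in = ((uint32_t)hi << 16) | lo;
    return (uint16_t)((in << s) >> 16);
}

/* Rotate a 64-bit word left by n (0..63) in two modeled cycles. */
static void rotl64_2cycle(const uint16_t src[4], uint16_t dst[4], unsigned n) {
    unsigned word = n / 16;  /* coarse rotation: done by operand addressing */
    unsigned s    = n % 16;  /* fine rotation: done by the extended shifters */
    for (unsigned cycle = 0; cycle < 2; cycle++) {
        for (unsigned k = 0; k < 2; k++) {      /* the two parallel shifters */
            unsigned r = 2 * cycle + k;         /* index of the result word  */
            uint16_t hi = src[(r + 4 - word) % 4];
            uint16_t lo = src[(r + 3 - word) % 4];
            dst[r] = ext_shift(hi, lo, s);
        }
    }
}
```

The example rotation by 23 from Figure 2.6 then maps to word = 1 and s = 7, i.e., each result word is formed from two source words selected one position apart, shifted by 7.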

In general, rotations have to be performed by an arbitrary number of bits. However, if an algorithm such as BLAKE only requires rotations by a small set of constants (i.e., the number of different rotation offsets is small), a trade-off in favor of shifters which operate only with hard-wired fixed offsets can be made, to reduce the induced hardware overhead of the ISE.

4Standard functionality can be preserved by simply using the same operand for both inputs during standard 16-bit rotate operations.

Figure 2.6 – Block diagram of the extended 16-bit shifter units within the PIC24 microarchitecture, providing support for a two-cycle 64-bit rotation instruction. The annotated resource/register addressing schedule is an example for a 64-bit left rotation by 23, where W0-W3 are chosen as source and W4-W7 as destination.

Furthermore, even more complex arithmetic operations, such as a matrix multiplication in the field F256 on an 8×8 matrix of 8-bit state words (as required by Grøstl), can be supported with the help of small extensions to already existing computational units. Adding shift units and XOR units operating on small 8-bit input values can provide a considerable speedup factor of 14.4x for this operation, when combined with the right microcoded multi-cycle operand and memory addressing support.
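The shift-and-XOR primitive underlying such an F256 multiplication can be sketched in C. As an assumption, the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is used, which Grøstl shares with AES; the helper names xtime and gmul are illustrative and do not correspond to actual ISE mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplication in F256 built from 8-bit shifts and XORs, the kind of
   small primitive a MixBytes-style ISE extension provides in hardware. */

static uint8_t xtime(uint8_t a) {            /* multiply by x, i.e., by 2 */
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

static uint8_t gmul(uint8_t a, uint8_t b) {  /* generic product, shift + XOR */
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                          /* conditionally accumulate */
        a = xtime(a);                        /* next power of x */
        b >>= 1;
    }
    return p;
}
```

In the actual MixBytes matrix multiplication, only products with small fixed constants are needed, so the loop collapses to a handful of hard-wired xtime/XOR stages per state byte.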

Another example of an extended computational unit is the advanced swap instruction of JH, performing shifts combined with the interleaving of data words (utilizing existing and replicated AND, OR, and left + right shifter units of the ALU). Depending on the mode of operation, this instruction can also be regarded as an advanced memory addressing instruction, when operating on bit-tuple sizes greater than 8, where the state words can be interleaved on 16-bit word boundaries.

Finite State Machines for Data Address Generation

Cryptographic hash functions are heavily based on permutations, as has been discussed in Section 2.1.1. These permutations of state words generally follow fixed permutation schedules or tables, which are intrinsic to the algorithm definition. Effectively, this means that state data needs to be loaded and stored in predefined regular patterns, but often not in the sequential fashion in which the state words are mapped in memory. The existence of multiple such access patterns applied to the same state often renders general relabeling or physical reordering of state words in the data memory impossible. The assembly implementations therefore have to generate these memory address patterns by the use of additional instructions before the actual memory accesses. Moreover, there is often a trade-off between which data can be loaded efficiently without much overhead at the current position in the algorithm, and which data is actually needed for immediate calculations and can be retained temporarily in the general purpose register file of limited size.

Figure 2.7 – Block diagram of a data address generation module within the PIC24 microarchitecture, providing support for the efficient generation of a state permutation schedule within a multi-cycle instruction. The generation utilizes the available repetition counter (RCOUNT) as its FSM state.

To address this bottleneck, we propose to utilize ISEs which efficiently generate the required data memory addresses, ideally combining this with a computational operation that typically precedes or follows the permutation step. The fixed permutation schedule of an algorithm can be seen as a series of memory accesses, which can be represented by a sequence of address offsets relative to a provided memory base address. Similarly to the already available register indirect addressing modes, which allow a given memory address to be incremented, decremented, or modified by a fixed offset, the new instruction enables a more complex form of memory address modification, based on a fixed sequence of offsets. The memory accesses are typically performed in a tight loop, which can cover the whole internal state or only selected parts of it, depending on the operation. An implementation with improved efficiency is achieved by essentially mapping part of the loop-based program flow in the algorithm core onto a microcoded multi-cycle instruction, which accesses the state data in a hard-wired pattern. Based on a small finite state machine (FSM) embedded in the core, the instruction

generates for each state (repetition count) the required data memory address offsets, register indices, and operand data. As depicted in the exemplary block diagram in Figure 2.7, the state information required by the instruction can be derived from an internal loop counter, which can be efficiently realized by reusing existing resources in the microarchitecture, such as the PIC24's hardware loop counter of its repetition instruction (REPEAT). We note that this FSM-based method is particularly applicable if the computational part of the repeated instruction is simple, and performs a transformation of the state data without too extensive computational effort.

ISEs that incorporate the efficient generation of memory accesses can significantly reduce the described type of performance bottleneck, especially on resource-constrained systems, which commonly do not have a memory system with advanced addressing features geared towards high performance (as sometimes available in DSPs, for example for FFT processing). The proposed replacement of program code by an ISE with an FSM-controlled multi-cycle instruction has two effects in general: execution speedup through memory access optimization, and considerable code size reduction, depending on the repetition count of the new instruction and the computational complexity it comprises.
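The FSM-driven address generation can be sketched behaviorally as follows. The offset table and the XOR "computation" are purely illustrative placeholders, not a real SHA-3 permutation schedule; in hardware the offset ROM is indexed by the repetition counter (RCOUNT), which serves as the FSM state.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioral sketch of an FSM-driven address-generation ISE: a multi-cycle
   instruction, repeated under a hardware loop counter, applies a hard-wired
   sequence of address offsets relative to a base address, instead of
   computing each address with separate instructions. */

enum { STATE_WORDS = 8 };

/* Hard-wired offset ROM indexed by the repetition counter (the FSM state). */
static const uint8_t offset_rom[STATE_WORDS] = {2, 5, 0, 7, 4, 1, 6, 3};

/* One invocation models REPEAT #7 followed by the multi-cycle instruction:
   each repetition reads mem[base + offset], transforms, and writes back. */
static void permute_xor(uint16_t *base, uint16_t key) {
    uint16_t tmp[STATE_WORDS];
    for (int rcount = 0; rcount < STATE_WORDS; rcount++)
        tmp[rcount] = base[offset_rom[rcount]] ^ key;   /* gather + compute */
    for (int i = 0; i < STATE_WORDS; i++)
        base[i] = tmp[i];                               /* write back */
}
```

The point of the ISE is that the address arithmetic (the `offset_rom` lookup) costs no extra instructions: it is folded into the operand fetch of each repetition.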

The Grøstl algorithm, with the very modular structure of its operations and their rather low individual complexity, can especially benefit from this approach. All four of the core operations of the hash function can be replaced by one single multi-cycle instruction each, utilizing the described ISE type. The BLAKE, JH, and Keccak ISEs also apply this approach for some of their custom instructions, although to a lesser extent. For BLAKE, the approach is used to embed the Sigma Permutation tables, which determine the constant and message data addressing schedule, directly within the address generation. JH benefits for its Swap operation (which shuffles the full state in memory) from this type of advanced address generation, which in this case additionally adapts its generation pattern depending on the round number of the algorithm. While Keccak has a permutation operation Pi, this operation can already be efficiently implemented with standard literal offset addressing (as part of a new rotation instruction), and therefore does not require FSM-driven address generation. The Iota operation of Keccak, however, can benefit from dynamic address generation as part of a multi-cycle instruction (with varying cycle count), which optimizes the way the 64-bit constants (containing mostly zeros) are applied to the state in memory.

Lookup Table Integration

Due to the large area occupied by on-chip memory, reduction of data memory consumption is often as important as the speedup of algorithm execution. A prime candidate for a significant reduction of data memory is the removal of constant lookup tables, which are typically stored in data memory. Almost all of the evaluated hash algorithms use some form of LUT or constant table of various sizes, with the exception of Skein, which only requires initial state data (stored as text in the program memory).

Removal of constant table data from memory is possible through the integration of these LUTs into the processor core. There are typically two different types of LUTs that can be found in hash algorithms. The first type represents tables holding round- or data-dependent constants, i.e., LUTs providing values which are part of the definition of the algorithm. The second type comprises LUTs which are introduced to speed up execution of the algorithm (often considerably), holding precomputed values of some form. Depending on the size of the LUT and the targeted hardware overhead in terms of area, different trade-offs can be explored, especially concerning the second type of LUTs. The indices for these LUTs (i.e., their inputs addressing the entries in the table) are derived from different sources, such as the current round number or state data. An example of a LUT integrating state data-dependent constants within the core is given in Figure 2.8. Here, two S-boxes are added to the data path in the execution stage of the PIC24 microarchitecture to accelerate the state substitution operation of Grøstl.

Figure 2.8 – Block diagram of dual S-box LUTs integrated in the execution stage within the PIC24 microarchitecture, providing support for the efficient substitution of the algorithm state data residing in data memory. In this example, two identical S-boxes are implemented in parallel to maximize the throughput of the ISE.

This class of ISEs has been utilized for all four applicable algorithms, moving all LUTs from data memory into the MCU core. The integration of LUTs was in general possible in an area-efficient manner using only standard synthesis techniques. As an example, the integration of the by far largest LUT, of size 1344 byte, which is part of the JH algorithm, only resulted in a negligible core area overhead of around 2 kGE, compared to an SRAM implementation of 14 kGE equivalent area for an equal data size. In this case, the relatively large LUT is used to provide precomputed round constants enabling an efficient bit-sliced implementation of the algorithm, as discussed in Section 2.4.1. In the case of Grøstl, which employs an S-box LUT as

it is used in AES, the integration of two parallel 8-bit LUTs in hardware (as shown in Figure 2.8) not only saves 64 byte of data memory, but additionally facilitates a speedup of the full state substitution operation by a factor of 4.2x.
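The dual-LUT substitution of Figure 2.8 can be sketched in C: one 16-bit data word is split into two bytes, both are substituted through identical 8-bit LUTs in the same cycle, and the results are recombined. Only the first four AES S-box entries are filled in below as a placeholder; the real table has 256 entries (see FIPS-197).

```c
#include <assert.h>
#include <stdint.h>

/* Truncated placeholder for the 256-entry AES S-box (remaining entries 0). */
static const uint8_t sbox[256] = { 0x63, 0x7C, 0x77, 0x7B /* ... */ };

/* Substitute both bytes of a 16-bit word; in hardware the two lookups
   happen in parallel through two identical S-box LUTs. */
static uint16_t subst16(uint16_t w) {
    uint8_t lo = sbox[w & 0xFF];   /* low-byte LUT             */
    uint8_t hi = sbox[w >> 8];     /* high-byte LUT, parallel  */
    return (uint16_t)((uint16_t)(hi << 8) | lo);
}
```

With two bytes substituted per cycle, the 64-byte state is processed in 32 cycles, matching the REPEAT #31 usage of S3_SBOX described in Section 2.5.3.2.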

2.5.3 Algorithm-Specific ISE of SHA-3 Finalists

In this section, the developed ISEs for cryptographic hash functions are presented in more detail, including discussions of the resulting individual performance gains5 and trade-offs. Each of the five hash algorithms has a separate set of custom ISEs, which address the main bottlenecks of the algorithm, inferred after hand-analysis of the optimized assembly code implementations discussed in Section 2.4.1. An overview of all ISEs, the functions they implement, as well as the individual gains they provide is given in Table 2.7, together with a categorization according to the three different ISE types introduced in Section 2.5.2. Note that during development of the ISEs, every opportunity (which promised relevant algorithm speedup or memory savings) has been taken to apply the three presented ISE concepts to each of the algorithms equally. Nevertheless, significant differences in the applicability of the concepts to the five algorithm implementations can be observed.

The syntax of the presented new instructions uses the following nomenclature:

• Ws: source register; one of the working registers holding the state data to be operated on; can also indicate the first of a set of registers (i.e., when handling 32-bit or 64-bit state words at once with a multi-cycle instruction).

• Wd: destination register; analogous to the source register, indicates the working register(s) where the result of the instruction (typically the transformed state data) is stored.

• Wm: register holding a memory (base) address; this is in general used to provide the instruction with the base address of the state data in memory.

• #lit: literal (number constant); the literals encoded in the instruction word are used for various purposes, such as to indicate rotation offsets, or to address specific state words within a state array/matrix.

As a general remark regarding the implementations of the ISEs for all candidates: if an instruction relies on the information during which round it is executed (e.g., to adjust which constant to choose from a LUT), this instruction-state information is generally read from a dedicated fixed working register (W14 has been chosen for this purpose), which always contains (by convention) the current round number during execution of the whole algorithm kernel. The number of the register providing the round information is therefore not explicitly encoded into the instruction word, or the syntax of the assembler instruction, for that matter. Since the register holding the round information is fixed, no full read port for the register file is required, i.e., the hardware overhead introduced by this optimization choice is negligible.

5Any gains in terms of speedup reported in this sub-section are given relative to the specific function they implement using the standard PIC24 ISA.

Table 2.7 – Overview of proposed instructions and the individual performance gains or data memory savings that they enable, grouped by ISE type. Gains of each ISE are given relative to the specific function they implement using the standard PIC24 ISA. The additional instruction memory savings due to increased code density are not reported here.

Extension of Computational Units
         Function realized by ISE                    Speedup
BLAKE    32-bit rotation (fixed offsets)             6.0x
Grøstl   MixBytes step                               14.4x
JH       AND-then-XOR for bit-sliced S-box           2.0x
Keccak   64-bit rotation (single bit)                2.5x
         64-bit rotation (w/ permutation in mem.)    4.1x
Skein    64-bit rotation (generic)                   6.0x

FSMs for Data Address Generation
         Function realized by ISE                    Speedup
BLAKE    constant XOR message                        4.0x
Grøstl   AddRoundConstant step (P/Q)                 3.2x/1.5x
         ShiftBytes step                             3.4x
         MixBytes step                               14.4x
JH       Swap permutation                            5.2x

Lookup Table Integration
         Function realized by ISE                    Mem. savings [byte]
BLAKE    initialization constants                    64
         Sigma permutations                          224
Grøstl   S-box                                       256
         F256 multiplication LUTs                    512
JH       bit-sliced round constants                  1344
Keccak   round constants                             96

2.5.3.1 BLAKE ISE

Three dedicated instructions are developed for BLAKE, to help accelerate the handling of initialization constants, 32-bit state word rotations, and the addressing of constants and message data via the Sigma Permutation within the G function.

• S3_XCST Wm
During the initialization phase of each block compression (i.e., when starting to process the next message block), eight 32-bit constants have to be XOR'ed onto the internal state (Wm). This instruction performs this operation efficiently, while iterating over 16 cycles in conjunction with a preceding REPEAT #15 statement. In each repetition, the MCU treats 16 bits of the state and derives the necessary constant and the memory address offset for reading and writing back the modified state. While the instruction provides a speedup of 2x for this operation, it is not part of the inner algorithm kernel. Hence, the main benefit of the ISE comes through the efficient integration of the constant table into the core, freeing up data memory.

• S3_RROT_7 Ws, Wd
S3_RROT_8 Ws, Wd
S3_RROT_12 Ws, Wd
Rotations are among the most time consuming operations of the reference implementation, since they are a core part of the G function. This family of three instructions performs a 32-bit right rotate by a fixed amount of either 7, 8, or 12 bits, which are three of the four rotation offsets required by the G function. Choosing the three rotation offsets as fixed avoids the need for a full 32-bit barrel shifter in the MCU, which reduces the incurred hardware overhead significantly. The input data is provided in a register pair (Ws) and the result is written to another register pair (Wd), using the double writeback feature of the MCU to perform the complete operation in only a single cycle. The fourth type of rotation employed by BLAKE is by 16 bits and can be performed efficiently using existing MOV instructions.
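Functionally, each S3_RROT_n instruction computes the following on a 16-bit register pair; the model below takes the offset as a parameter for compactness, whereas in hardware it is hard-wired per instruction (the function name rrot32 is illustrative).

```c
#include <assert.h>
#include <stdint.h>

/* Single-cycle fixed-offset 32-bit right rotation on 16-bit register pairs:
   input pair {src_hi:src_lo}, result delivered to a destination pair via
   the double writeback. Valid for n in 1..31. */
static void rrot32(uint16_t src_lo, uint16_t src_hi,
                   uint16_t *dst_lo, uint16_t *dst_hi, unsigned n) {
    uint32_t x = ((uint32_t)src_hi << 16) | src_lo;
    uint32_t y = (x >> n) | (x << (32 - n));   /* rotate right by n */
    *dst_lo = (uint16_t)y;
    *dst_hi = (uint16_t)(y >> 16);
}
```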

• S3_CXM Wm, Wd
This instruction calculates the term that combines constant data and message block data (Wm) by an XOR, used twice per G function call. The instruction internally generates the lookup data from the Sigma Permutation table as a memory address offset, as well as the corresponding constant. This measure avoids costly memory accesses with very little area overhead for implementing the constant tables inside the processor. The order of the indices for addressing the message and constant data is derived from a round & index counter. The instruction requires two cycles to complete in order to fetch the 32-bit message block data word from memory.

The instruction set extensions developed for BLAKE reduce the program memory by 20% (see Table 2.1), the data memory by 59%, and the cycle count by 34%, making an already fast and small implementation even faster and smaller. Program memory is saved by a compact way to describe the addition of constants without the need for loop unrolling, by the automated Sigma Permutation removing instructions dedicated to address generation, and by the compaction of the rotation operations. Data memory is saved by placing all lookup tables into the processor core. Throughput is mainly gained by a speedup of the rotations by a factor of 6x, and by efficient Sigma Permutation access, which eliminates additional memory accesses for the purpose of fetching indices and constant data. The additional instructions have no noticeable impact on the core area compared to the reference implementation, as they do not add new computational units, but modify only register and memory accesses for existing operations.

2.5.3.2 Grøstl ISE

The two core operations of Grøstl are P and Q (which are very similar), each consisting of four suboperations, which are constructed in a very modular fashion, each transforming the complete 64-byte state in memory. These four suboperations can be implemented without performing overly complex arithmetic per state element. Consequently, it was possible to develop four instructions for Grøstl, each replacing exactly one of the suboperations (with different flavors for P and Q, where necessary). All of the new instructions are memory-to-memory multi-cycle instructions, i.e., they modify the full state matrix in place, sometimes utilizing a set of work registers during their operation.

• S3_CST_P Wm S3_CST_Q Wm This instruction replaces the AddRoundConstant suboperation, which XORs the current round number with the entries of a fixed constant matrix, and then XORs this modified matrix onto the entries of the state matrix (Wm). The described steps are performed in a hardware loop, using the REPEAT instruction. The P version of the suboperation has mostly zero entries in the constant matrix, which allows the use of REPEAT #7 to perform the complete transformation in 8 cycles, enabled by state-dependent address generation. The Q version on the other hand has to transform every element of the state, requiring the use of REPEAT #31, resulting in an execution time of 32 cycles.
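The asymmetry between the two variants can be sketched in C. The constant layout (row 0 for P; all-0xff with a distinct last row for Q) follows the common description of the Grøstl constants, but treat the exact values as illustrative and check the Grøstl specification:

```c
#include <stdint.h>

/* Schematic sketch of the AddRoundConstant suboperation replaced by
   S3_CST_P / S3_CST_Q. Constant values are illustrative -- consult the
   Groestl specification for the authoritative definition. */
static void add_round_constant_p(uint8_t st[8][8], uint8_t r) {
    /* P: only row 0 of the constant matrix is non-zero, so only 8 of the
       64 state bytes are touched -- matching the 8-cycle REPEAT #7 loop. */
    for (int col = 0; col < 8; col++)
        st[0][col] ^= (uint8_t)((col << 4) ^ r);
}

static void add_round_constant_q(uint8_t st[8][8], uint8_t r) {
    /* Q: every state byte is transformed -- matching REPEAT #31 over
       32 16-bit memory words (two 8-bit elements per word). */
    for (int row = 0; row < 8; row++)
        for (int col = 0; col < 8; col++)
            st[row][col] ^= (row == 7) ? (uint8_t)(0xff ^ (col << 4) ^ r)
                                       : 0xff;
}
```

The 8-cycle versus 32-cycle execution times of the ISE correspond directly to the loop trip counts above.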

• S3_SBOX Wm This instruction adds the logic for the AES S-box lookup table to the processor core and effectively executes two parallel 8-bit AES SubBytes table lookups per cycle. The complete state matrix (Wm) can therefore be substituted in 32 cycles, using a REPEAT #31 instruction followed by a single S3_SBOX instruction, providing a speedup of 4.2x for this suboperation.
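The per-cycle behavior of the instruction amounts to two table lookups packed into one 16-bit word. A minimal sketch (only the first four entries of the standard AES S-box are shown; in the ISE the full table is synthesized logic inside the core, not data in memory):

```c
#include <stdint.h>

/* Two parallel 8-bit AES SubBytes lookups on one 16-bit data word,
   mirroring what S3_SBOX does per cycle. Table truncated for brevity;
   the remaining entries follow the standard AES S-box. */
static const uint8_t aes_sbox[256] = { 0x63, 0x7c, 0x77, 0x7b /* , ... */ };

static uint16_t s3_sbox_word(uint16_t w) {
    return (uint16_t)(aes_sbox[w & 0xff] | ((uint16_t)aes_sbox[w >> 8] << 8));
}
```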

• S3_SHIFT_P Wm S3_SHIFT_Q Wm The ShiftBytes suboperation is implemented by this instruction, which exists in both P and Q variants. The instruction is used in conjunction with REPEAT #35, performing the complete shift of all state rows (Wm) in only 36 cycles. ShiftBytes performs shifts/rotations of full rows of the state matrix, aligned on the 8-bit state word boundaries. This means the suboperation can theoretically be implemented solely with byte moves. Our ISE operates on pairs of state elements (2x 8-bit) from two adjacent rows at the same time, since the state matrix is addressed column-wise in data memory. Two different pairs of elements are combined in each repetition (cycle), normally one pair from memory and one pair from the register file, to form a new result pair. This result pair is constructed by concatenating two elements coming from the two different rows, such that they are rotated to the required position when the result is written back to the state matrix. This concatenation of two separate state words (bytes) is the only required computational extension for this instruction. The actual complexity of the instruction lies in providing the correct memory addressing schedule derived from the repeat counter state, i.e., when to load which pair of elements to perform the entire transformation in the minimum number of repetitions (cycles). For optimization purposes, the instruction uses registers W0-W7 as temporary work registers to buffer values that have already been overwritten in memory.

• S3_MIX Wm Most of the performance gain can be attributed to this instruction, which performs the MixBytes operation. The instruction is used together with REPEAT #79 and is able to multiply the entire 8×8 state matrix (Wm) in F256 with a constant matrix in only 80 cycles. This is achieved by processing the state elements in pairs (subsequent entries of the same column of the state matrix). Each pair is multiplied with all corresponding elements of the constant matrix in only two cycles, internally using new computational units for two 8-bit elements in parallel. This produces after two cycles 8 partial result elements (forming the current column that is transformed), held in 4 working registers. In the subsequent cycles the missing partial results are accumulated before the final result is written to memory. Hence, the theoretical throughput of the operation is 1 element per cycle. The 16 required extra cycles of the instruction stem from phases at the end of the calculations for each column, where only results can be written back to the memory, but no computation can be carried out. With an optimized ordering of the memory addressing pattern, this idle time can be limited to two cycles per column, but cannot be further reduced due to the given memory bandwidth limitations of the writeback. The multiplication is performed by a new functional unit that calculates for each input byte the multiplication results in F256 for all possible factors (2, 3, 4, 5 and 7) by using shifts and XORs on 8-bit values only. These efficiently calculated products are then used to generate the partial sums, according to the currently active submatrix/field of the constant matrix. Since the given constant matrix is circulant, only four different fields have to be distinguished, which reduces the complexity of the hardware implementation. The instruction uses W0-W3 as accumulator registers for the temporary results of the new column values.
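The shift-and-XOR constant multipliers can be sketched as follows, assuming the AES/Grøstl reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11b); the function and array names are ours:

```c
#include <stdint.h>

/* The constant multipliers of the S3_MIX functional unit: all GF(2^8)
   products needed by MixBytes (factors 2, 3, 4, 5 and 7), built from
   shifts and XORs on 8-bit values only. Slots p[0] and p[6] are unused. */
static uint8_t xtime(uint8_t a) {                 /* multiply by 2 */
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

static void gf_products(uint8_t a, uint8_t p[8]) {
    p[1] = a;                          /* 1*a */
    p[2] = xtime(a);                   /* 2*a */
    p[3] = (uint8_t)(p[2] ^ a);        /* 3*a = 2*a ^ 1*a */
    p[4] = xtime(p[2]);                /* 4*a */
    p[5] = (uint8_t)(p[4] ^ a);        /* 5*a = 4*a ^ 1*a */
    p[7] = (uint8_t)(p[4] ^ p[2] ^ a); /* 7*a = 4*a ^ 2*a ^ 1*a */
}
```

In hardware, all six products are available combinationally, so the per-cycle cost is a handful of XOR gates rather than a table lookup in data memory.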

The S3_MIX instruction has been instrumental in reducing the cycle count for Grøstl, by providing a 14.4x speedup for the corresponding MixBytes suboperation. However, this new instruction does not add complex new components to the datapath, since the matrix multiplication can be performed by a series of small XOR operations. Furthermore, even though the ISE is a very long multi-cycle instruction with a total of 80 states, the memory and resource addressing schedule can be implemented extremely efficiently within the MCU core. In effect, the operational flow of the full MixBytes step has been moved from regular instructions stored in the instruction memory to the instruction decoder, where it is implemented as an FSM. Additionally, due to the new computational units, there is no longer any need for loading precomputed products from data memory, which allows maximizing the achievable memory bandwidth utilization for the operation, further accelerating it.

Overall, it was possible to reduce the cycle count of Grøstl by more than 87% with these additions, making the ISE-modified implementation of Grøstl by far the fastest of our SHA-3 candidate implementations. Moreover, data memory is reduced by 75% (see Table 2.2), due to the removal of the LUTs originally required for the MixBytes multiplications, and the move of the S-box LUT into the core. Instruction memory is also drastically reduced, by 69%, since all four suboperations are replaced by single instructions. The area overhead for these ISEs is only 10% (2 kGE), relative to the baseline MCU core, i.e., excluding memories.

2.5.3.3 JH ISE

The following two instructions are added to the instruction set to improve the performance of interleaving state data during the Swap permutation operation, as well as to remove the large round constant LUT from data memory. This LUT is required for a bit-sliced implementation, and utilized during the S-box operation of JH.

• S3_CST_ACXM #index, Ws, [Wm], Wd S3_CST_AMXC #index, Ws, [Wm], Wd This instruction combines two operations used during the calculations for the JH-specific 4-bit S-box (bit-sliced, algorithmic version).6 It first performs a lookup using a LUT built into the processor's datapath, holding all round constants, depending on the current round number and a position index (0..15) passed as a literal (#index). This constant is then used in a combined AND-then-XOR operation, using additionally one source register (Ws) and one data word of the state read from memory (Wm) as the operands. By extending the ALU with a computational unit that provides an optional pre-AND to the existing XOR, the instruction is able to execute in a single cycle. The result is stored in a register (Wd), since it is required for further processing in the S-box algorithm. The instruction has two different variants, both used once per S-box call, effectively exchanging the order of the operands (the constant and the data word from memory). S3_CST_ACXM calculates the term Wd = (Ws ∧ const(#index,round)) ⊕ [Wm], while S3_CST_AMXC calculates the term Wd = (Ws ∧ [Wm]) ⊕ const(#index,round).

• S3_SWAP Wm This instruction is used in conjunction with REPEAT #31 to carry out the Swap operation on the full state (Wm) in 32 repetitions. The parameter of the swap (the size of the bit-tuples to be swapped, 1, 2, 4, 8, 16, 32, or 64) is thereby derived from the round counter value, which is taken from a fixed register (W14). For swaps with tuple sizes smaller than or equal to 8, the instruction utilizes a computational extension of the ALU, providing bit selection with fixed shifting, which allows performing a swap on each 16-bit data word via reshuffling and interleaving of the bits within the word. In rounds where the tuple sizes are greater than or equal to 16, the instruction bypasses the swap unit in the ALU and instead provides the appropriate memory read and write addresses via a micro-coded schedule, to efficiently execute the permutation of the state array.

6The S-box output is calculated here, instead of being translated by a table, as this is more efficient in the case of JH, specifically when processing multiple inputs (16) in parallel, because of the chosen bit-slicing approach.
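The in-word part of the Swap operation is the classic mask-and-shift bit interleaving. A sketch under the assumption that adjacent bit-tuples of size s are exchanged within each 16-bit word (function name is ours):

```c
#include <stdint.h>

/* In-word half of the JH Swap operation (the S3_SWAP ALU extension):
   exchange adjacent bit-tuples of size s (1, 2, 4 or 8) within a 16-bit
   word using fixed masks and shifts. Tuple sizes >= 16 are pure data
   movement and are handled by the micro-coded addressing instead. */
static uint16_t swap_tuples(uint16_t w, unsigned s) {
    static const uint16_t mask[9] = {
        0, 0x5555, 0x3333, 0, 0x0f0f, 0, 0, 0, 0x00ff
    };
    uint16_t m = mask[s];
    return (uint16_t)(((w & m) << s) | ((w >> s) & m));
}
```

Because the masks and shift amounts are fixed per tuple size, the hardware cost is a small mux network rather than a general shifter.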

Our implementation results with ISEs show a 17% reduction in cycles/byte, mostly due to the improved Swap operation. However, the main benefit comes from the reduction of costly data memory by 87% (see Table 2.3), since the complete LUT containing the round constants (1344 byte) has been moved into a hardwired LUT in the core, where it only generates an area overhead of about 10% (2 kGE). Text is reduced by 53% due to the drastic simplification of the different flavors of the Swap operation into a single instruction.

2.5.3.4 Keccak ISE

The main bottleneck of the algorithm on a 16-bit architecture is the 64-bit rotations, which need 12 cycles per rotation in the generic case (i.e., for an arbitrary rotation offset) in the reference implementation. Hence, this operation is the main focus of the ISEs. Furthermore, the round constants table is moved into the core in a compressed form to reduce data memory consumption.

• S3_LROT1 Ws, Wd This instruction performs a 64-bit left rotate by a fixed amount of 1 bit, as required by the Theta step. The ISE executes in two cycles and uses a set of four consecutive registers as source (Ws), and a set of four other consecutive registers as destination (Wd). In each cycle three of the four registers are used as operands to produce either the higher or the lower half of the 64-bit result. The instruction utilizes the double writeback capabilities of the architecture, and causes almost no hardware overhead for its integration, due to the fixed shift offset.

• S3_LROTM_A #rot, #coord, Wm S3_LROTM_B #rot, #coord, Wm This instruction implements a generic 64-bit left rotate by an arbitrary number of bits (#rot). The operation works on a set of four source registers and writes the result to a memory location (Wm) defined by coordinates in the 5x5 state matrix, given as a literal (#coord). The execution takes 4 cycles and the instruction performs the additional function of saving the data which is located at the destination memory address to another set of registers, before writing the rotation result to memory. Since the instruction writes to data memory, it is limited to a throughput of 16 bit per cycle. As a result, the required computational extension for the instruction only amounts to the extension of the available 16-bit barrel shifter to support a 32-bit input (see Section 2.5.2 and Figure 2.6). Replication of the shifter is hence not required to achieve the maximum throughput. The instruction exists in two variants, differing only in the set of registers used for rotation (input state data) and for buffering/backup storage. S3_LROTM_A rotates the content of W0-W3, and loads the memory content at the same time into W4-W7, while S3_LROTM_B rotates W4-W7 and loads into W0-W3. Using these two variants in alternating order allows to effectively store the result of the currently rotated state element, while at the same time already pre-loading the next element of the permutation chain, making it possible to very efficiently combine the Rho and Pi operations (rotations and permutations) over all 25 elements of the state matrix.
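For reference, the operation the instruction realizes is a plain 64-bit left rotation over four 16-bit register words. A minimal sketch of the semantics (word order and function name are ours):

```c
#include <stdint.h>

/* Reference semantics of a generic 64-bit left rotation over four 16-bit
   words (w[0] = least significant), i.e., what the Rho rotations compute
   on this architecture; the S3_LROTM ISEs realize this in 4 cycles
   instead of ~12. */
static void lrot64(uint16_t w[4], unsigned rot) {
    uint64_t v = (uint64_t)w[0] | ((uint64_t)w[1] << 16)
               | ((uint64_t)w[2] << 32) | ((uint64_t)w[3] << 48);
    rot &= 63;
    if (rot)
        v = (v << rot) | (v >> (64 - rot));
    for (int i = 0; i < 4; i++)
        w[i] = (uint16_t)(v >> (16 * i));
}
```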

• S3_XCST Wm This instruction applies the current round constant via XOR onto the internal state element at (0,0) (Wm), performing the Iota step. The round constants are 64-bit values containing a significant number of zero-valued bytes. The table is hence stored in a compressed format as synthesized logic within the core, to reduce the hardware overhead. Execution time varies between one and three cycles, depending on the constant value, since zero-valued bytes of the constant are simply not applied.

The above described ISEs for Keccak result in a reduction of the number of cycles by 30%, which can mainly be attributed to the improved efficiency of the 64-bit rotation, as well as to the fact that rotations are now combined directly into the element chaining of the Pi state permutation. The data size is reduced by 21% (see Table 2.4), by placing the round constant table into the core. The text size is lowered by 31%, through the compact description of the combined Rho + Pi operation in a single instruction. All three extensions cause virtually no core area overhead, since they contain no large LUTs and do not add considerably to the datapath. The round constant lookup table can be implemented in the core in an even more compressed way, compared to the standard implementation, utilizing only 24x11 bits = 33 byte of information.

2.5.3.5 Skein ISE

Profiling of the reference implementation showed that large gains can be achieved by removing every unnecessary cycle from the Mix function core of the ThreeFish block cipher. This led to the addition of one instruction, performing the rotation in an efficient way and providing a general speedup of 6x for the rotation function. Due to the general simplicity of the algorithm core (which was an explicit design goal [FLS+10] of the algorithm), all other operations were identified as sufficiently well supported by the reference PIC24 ISA.

A critical point with Skein in terms of text is the unrolling of all 9 different subkey injections. Even though the required memory for this operation is significant (compare Table 2.5), it was not possible to develop an ISE that would, for example, support a single block of code that allows the subkey injection to be executed efficiently in the required 9 different modes without reducing the hashing performance. This is due to the fact that any additional checks or looping during the injection (with the aim of code reuse) would introduce significant overhead in terms of cycles for the operation.

• S3_LROT #rot, Ws, Wd This instruction performs a generic 64-bit left rotate by an arbitrary number of bits (given as a literal #rot) in two cycles, using a set of four registers as source (Ws) and a set of four other registers as destination (Wd). The double writeback feature of the microarchitecture is used for the instruction. Since the instruction uses two cycles to produce the result, in each cycle using three specific input values, the added logic for the computational extension corresponds to an additional 16-bit barrel shifter (as illustrated in Figure 2.6), instead of a full 64-bit barrel shifter, and is therefore very small.
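Why a narrow shifter suffices can be seen by decomposing the rotation: a rotate by rot = 16k + s is a whole-word rotation by k plus, per output word, a shift across a single word boundary, so each result word depends on only two adjacent source words. A sketch of this decomposition (names are ours):

```c
#include <stdint.h>

/* Decomposition showing why a 32-bit-input shifter suffices: each output
   word of a 64-bit rotation is formed from two adjacent source words.
   src/dst hold the 64-bit value as four 16-bit words, least significant
   word first. */
static void lrot64_16bit(const uint16_t src[4], uint16_t dst[4], unsigned rot) {
    unsigned k = (rot >> 4) & 3;   /* whole-word part of the rotation */
    unsigned s = rot & 15;         /* remaining bit shift, 0..15 */
    for (int i = 0; i < 4; i++) {
        uint16_t hi = src[(i - k) & 3];       /* word supplying the upper bits */
        uint16_t lo = src[(i - k - 1) & 3];   /* word supplying the carry-in */
        dst[i] = s ? (uint16_t)((hi << s) | (lo >> (16 - s))) : hi;
    }
}
```

Producing two such 16-bit result words per cycle via double writeback yields exactly the two-cycle latency of S3_LROT.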

Besides the simplicity of the Mix function, which did not justify any additional ISEs, the algorithm also does not utilize any sort of LUT that could be moved into the core. Furthermore, the address arithmetic needed for performing the permutations and selection of state words is quite straightforward and does not warrant any special supporting instructions, since possible performance gains would be minimal.

Nevertheless, with a single carefully chosen instruction set extension the results show an improvement in throughput of 29%, with virtually no core area overhead. The text size is reduced by 18%, but remains the largest of all considered algorithms, because of the unrolling of the complete key injection schedule for all 9 cases (mainly caused by the round cycle length of 9, which is not a power of 2). The data size did not change, but is already small due to the lack of LUTs, as can be seen in Table 2.5.

2.6 Performance on PIC24 with ISEs

In the following, the performance results of the different PIC24 implementations with their respective ISE enhancements are compared and further discussed. The results cover all software performance aspects, i.e., hashing speed and memory consumption improvements, as well as a detailed assessment of the induced hardware overhead due to the proposed ISEs. The section is then concluded by a discussion of the effects on energy consumption from a system perspective.

2.6.1 Core Performance and Execution Speed

By developing the presented ISEs for the PIC24 ISA we are able to provide an average speedup of 1.4x for four of the five SHA-3 hash algorithm candidates, while for the remaining candidate, Grøstl, we achieve a speedup of 8x. Table 2.8 shows these results in detail and compares the overall low area overhead that the ISEs incur to the reference microcontroller implementation without ISEs. For an in-depth evaluation of the area and timing requirements of the implemented cores refer to Section 2.6.3.

Table 2.8 – Improvement of hashing speed (long message performance) and respective core area due to ISEs, for all SHA-3 candidates.

Algorithm   Cycles per Block     Cycles/byte                  Core Area [kGE]
            PIC24     w/ ISE     PIC24    w/ ISE    Speedup   w/ ISE   Overhead (a)
BLAKE       9,933     6,583      155.2    102.9     1.51x     22.77    -0.5%
Grøstl      29,585    3,685      462.3    57.6      8.03x     24.97    +9.1%
JH          29,683    24,541     463.8    383.5     1.21x     25.20    +10.1%
Keccak      25,613    17,908     188.3    131.7     1.43x     22.22    -2.9%
Skein       10,084    7,204      157.6    112.6     1.40x     23.00    +0.5%

(a) Reference PIC24 core area: 22.88 kGE at 200 MHz.

It can be observed that for 3 out of 5 candidates, the ISEs do not result in any noticeable core area overhead, while still providing significant improvements in both cycle count and memory requirements (see Table 2.10). This is mainly due to the fact that some ISEs do not actually add datapath components, but provide small FSMs that are able to resolve complex but regular addressing schemes to fetch operands for already present datapath components (XOR, AND, ADD). The smallest speedup obtained through the ISEs is for JH (1.21x), whereas the largest improvement, with a speedup factor of 8.03x, is obtained for Grøstl. Interestingly, these two implementations both result in about 10% area overhead for the MCU core, which is still negligible (especially when considering that overall area is often dominated by the on-chip memories).

One important parameter to investigate, when determining how much faster an algorithm can be implemented, is the number of memory accesses (read/write) required by the algorithm. Since the memory bandwidth of the given microarchitecture does not change with additional instructions, the program will always be limited by these numbers. Part of the speedup is achieved by eliminating memory accesses, mostly by embedding operational constants into the ISEs and by optimizing the load and store patterns for state words. In Table 2.9 we list the number of read and write memory accesses for all candidate algorithms. When comparing the memory accesses to the total number of cycles listed in Table 2.8, note that the PIC24 architecture allows concurrent read and write operations to the data memory within the same cycle. It can be seen that a very significant improvement could be made only for Grøstl, which also explains the relatively high performance gain for this algorithm. Not only the read accesses, but also the write accesses of Grøstl could be significantly reduced, mainly due to the possibility to implement a highly optimized addressing schedule within the core for the MixBytes suboperation, which greatly reduced the number of required store operations for temporary partial results during the matrix multiplication. For Keccak and Skein, on the other hand, we could not find any instructions that would be able to reduce the number of memory accesses.


Table 2.9 – Change in the number of memory accesses, both for read and write, during the processing of one message block due to the introduction of ISEs, for all SHA-3 candidates.

Algorithm   Read                            Write
            PIC24     w/ ISE    Change      PIC24     w/ ISE    Change
BLAKE       2,370     1,682     -29%        1,187     1,187     0%
Grøstl      16,566    3,126     -81%        13,271    2,391     -82%
JH          11,836    9,874     -17%        4,141     4,099     -1%
Keccak      14,660    14,779    0%          7,345     7,416     +1%
Skein       5,264     5,264     0%          3,289     3,289     0%

2.6.2 Memory Consumption

As discussed earlier in the performance metrics section (Section 2.3), in the best case, around two thirds of a comparable embedded MCU system consist of program and data memory. Hence, from a system designer's point of view the amount of memory used by an algorithm can sometimes be even more important than its outright execution speed. We have listed the total data and program (text) memory used by all candidate algorithms separately for both the standard and ISE-enhanced implementations of the PIC24 in Table 2.10. The achieved memory reductions for each algorithm, regarding their data structures and functions, are reported in the memory overview tables of Section 2.4.1, specifically in Table 2.1 (BLAKE), Table 2.2 (Grøstl), Table 2.3 (JH), Table 2.4 (Keccak), and Table 2.5 (Skein).

It can be seen that the largest combined improvements (data + text) are achieved for JH (61.5% total reduction) and Grøstl (71.3% total reduction), whereas the smallest improvement is achieved for Skein (17.7% total reduction). Text is generally saved through the compaction of more complex algorithm parts into single instructions, especially by mapping address generation patterns onto FSM-based instructions. In the case of frequent use of rotations, their single-instruction representation further reduces the program code size.

Table 2.10 – Reduction of data and instruction memory requirements by using instruction set extensions, for all SHA-3 candidates.

Algorithm   Data [byte]                    Text [byte]
            PIC24    w/ ISE   Reduction    PIC24    w/ ISE   Reduction
BLAKE       488      200      -59%         1,028    818      -20%
Grøstl      982      214      -78%         2,619    819      -69%
JH          1,550    206      -87%         4,649    2,183    -53%
Keccak      448      352      -21%         3,480    2,415    -31%
Skein       242      242      0%           5,734    4,678    -18%

The data memory footprint is reduced by integrating round constants and other lookup tables into the processor core. The amount of data required by the different algorithms is mostly evened out by the introduction of the ISEs to a level of about 200 byte, with the exception of Keccak, which has a higher requirement due to both the larger internal state and the larger message block.

2.6.3 Hardware Overhead

A general overview of the hardware area requirements for all implemented cores is given in Table 2.8. The area of our reference PIC24 MCU core without ISEs is 22.88 kGE (with an achieved target clock speed of 200 MHz), while the maximum observed overhead due to ISEs is only 2.32 kGE (+10.1%). The digital design flow is known to produce results that can vary by as much as ±5%, hence changes in area smaller than 5% can typically not be reliably reported. In fact, for 2 out of the 5 algorithms, the resulting core area for the processor after full synthesis actually decreases slightly when ISEs are added.

The generated SRAM macro blocks used for evaluation and complete synthesis of the designs in this work have sizes of 2048 instructions (6 kbyte) and 1024 data words (2 kbyte), together occupying an area of 50.28 kGE. As such, the net overhead due to the ISEs is even smaller. Factoring in the possible system memory size reductions due to the improved code size and working data usage, the total MCU system area would potentially even decrease.

Enhancing the processor datapath can have two consequences. First of all, in most cases, additional logic is added to the datapath, increasing the overall area of the processor. In some cases, the additions may also increase the critical path of the processor. Most synthesizers are able to exploit various techniques to trade off area versus speed, and can compensate for the increased delay by increasing the area further. However, in some cases the increase is simply too much to be compensated, resulting in a circuit that is not only larger than the original, but also slower.

For assessment of the design corners, emphasizing optimization for high speed or low area, the reference core as well as all five modified versions, each including their algorithm-specific ISEs, are implemented and synthesized for various timing constraints. As shown in Figure 2.9, we perform an area-delay trade-off evaluation for all six designs by synthesizing each architecture multiple times while sweeping the core clock constraint, and we report the achieved clock speed after static timing analysis.

Optimization for area yields a size of 19.0 kGE for the reference core as well as for the core with Keccak ISEs. The cores with BLAKE ISEs and Skein ISEs each occupy a minimum area of 19.5 kGE, while the Grøstl ISEs and JH ISEs each result in larger minimum core sizes of 21.4 kGE, due to the integration of relatively large LUTs into the processor datapath. All designs optimized for area can still operate at a core clock frequency of about 70 MHz.

Tightly constrained synthesis, i.e., optimization for speed, shows a similar picture, with all designs performing at a comparable level regarding maximum clock frequency and required core area.


[Figure: synthesized core area [kGE] (18-40) versus clock period [ns] (4-14) for the six designs — Ref. / no ISE, BLAKE ISE, Grostl ISE, JH ISE, Keccak ISE, Skein ISE — with the 200 MHz operating point marked.]

Figure 2.9 – Area of synthesized core versus maximum core clock period for the reference PIC24 design and five cores with individual SHA-3 candidate ISEs each.

Maximum frequencies are around 260 MHz, with the exceptions of the two cores with BLAKE ISEs (248 MHz) and Keccak ISEs (267 MHz). The occupied area for the high-speed designs varies between 34.7 kGE (Keccak) and 39.5 kGE (JH), around the reference core without ISEs (37.5 kGE). These figures can mainly be explained by the aforementioned optimization variations during synthesis, which seem to be more pronounced due to aggressive over-constraining of the clock period. This phenomenon can especially be observed for the core with Keccak ISEs, which is identical to the reference core plus three additional instructions.

As a result, it can be seen that the area overhead produced by the introduced ISEs is in general negligible, independent of the clock constraint.

To give the synthesizer some room for exploiting various techniques to trade off area versus speed, the reference clock period for the general overhead comparison is chosen as 5 ns (200 MHz), about 1 ns above the achievable minimum, hence allowing all six designs to meet the same timing constraint. The difference in core area can in this way be attributed to the overhead of the ISEs only.

61 Chapter 2. Microcontroller Architecture Enhancements for Cryptographic Applications

2.6.4 Energy Considerations

In the context of the discussed MCU-type systems, the achievable energy efficiency is increasingly the key criterion for the design of such a system. Under nominal conditions, any savings in necessary processing cycles directly translate into saved energy for the completion of a given task, in this case the hash digest calculation. Additionally, when the required maximum throughput of the application utilizing the hash function is known, voltage scaling can be employed to adjust the supply voltage to an optimal point, such that the throughput is still satisfied and the energy consumption of the MCU system is minimized. The synthesis results presented in Section 2.6.3 indicate that the impact of the introduced ISEs is minor and in particular does not negatively impact the critical path of the MCU cores with ISEs.7 This is relevant for the evaluation of the overall energy benefits of the ISE-enabled cores, since voltage scaling can consequently be applied in an equal fashion to both the reference PIC24 core and the extended cores. Hence, the amount of voltage scaling that can be performed (while keeping an equal clock frequency for all cores) is likely to be very similar for all considered designs.8 Note that if the clock frequency for the different cores/algorithms can be freely adjusted in order to meet a fixed deadline for the algorithm execution, differences in the amount of available voltage scaling are of course present, as will be discussed later in some more detail.

Another aspect is that embedded MCUs are geared towards very low power consumption during idle phases. Any commercially available device typically supports some type of sleep mode (often multiple, with differences in the wake-up time/effort), which allows conserving energy whenever no processing is currently required from the MCU. The MCU ideally only consumes leakage power when in sleep mode, which is orders of magnitude lower than the dynamic power at nominal operating conditions. Hence, if the hashing task can be executed in fewer cycles due to the introduced ISEs, any savings in active energy translate almost directly into savings in overall system energy, since the MCU consumes significantly less energy during its sleep phases.

Figure 2.10 compares the energy consumption of the five different SHA-3 hashing algorithms, and shows the impact of the introduction of our ISEs, individually for each modified PIC24 core. The energy values are based on an estimated average consumption for all cores of 30 pJ/cycle at 1.2 V9 with a clock speed of 200 MHz. All cores are assumed to have the same average energy efficiency, since the overall microarchitectures of the different cores are almost identical and would hence differ only marginally, especially compared to the energy differences due to the reduced cycle counts. The energy consumption only refers to the MCU core, excluding the instruction and data memories, whose consumption can differ widely, depending on the chosen sizes.
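The energy values follow directly from the cycle counts; a one-line cross-check under the stated 30 pJ/cycle assumption:

```c
/* Energy per byte = cycles/byte (Table 2.8) * 30 pJ/cycle, converted to nJ.
   The 30 pJ/cycle figure is the estimate stated in the text. */
static double energy_nj_per_byte(double cycles_per_byte) {
    return cycles_per_byte * 30.0 / 1000.0;   /* pJ -> nJ */
}
```

For example, BLAKE with ISEs at 102.9 cycles/byte gives 102.9 × 30 pJ ≈ 3.1 nJ/byte, and Grøstl with ISEs at 57.6 cycles/byte gives ≈ 1.7 nJ/byte, matching the values in Figure 2.10.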

7The core with BLAKE ISEs is a slight exception, seeing a small degradation of 5% in the maximum achievable clock frequency.
8It is possible that the gate compositions on the critical path of the different cores differ, while having a similar maximum clock frequency, depending on the specific design. Consequently, differences in the delay degradation with voltage scaling are possible in this case.
9Nominal supply voltage of the employed 90 nm CMOS technology.


[Figure: energy consumption [nJ/byte], PIC24 vs. PIC24 + ISE — BLAKE: 4.7 → 3.1 (-34%); Grøstl: 13.9 → 1.7 (-88%); JH: 13.9 → 11.5 (-17%); Keccak: 5.6 → 4.0 (-30%); Skein: 4.7 → 3.4 (-29%).]

Figure 2.10 – Comparison of average energy consumption per processed message byte for the reference PIC24 design and the achievable savings due to ISEs, for all SHA-3 candidates.

Additional power savings over the ones indicated in Figure 2.10 are possible by utilizing voltage scaling to reduce the supply voltage and with it the maximum operating frequency, as mentioned. Note that this approach requires either that the hashing function is the throughput-limiting task running on the MCU, or that DVFS is available and feasible to be applied for the duration of the hashing task. In the case where voltage scaling can be utilized, the hashing algorithms and respective cores can be compared at a fixed iso-throughput (bytes per second), while operating at algorithm-specific optimal frequencies and voltages. The core with Grøstl ISEs in particular is expected to gain disproportionately in terms of energy savings, since it can be operated at about half the clock frequency, while still providing the same hashing throughput as the other candidates. JH on the other hand would lose disproportionately, since even its ISE-enhanced version requires a 3-4 times higher clock frequency to achieve the same hashing throughput as the other candidates, and hence would only tolerate a much smaller voltage reduction.
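The iso-throughput comparison can be made concrete: the clock frequency required for a given hashing throughput is simply cycles/byte times bytes per second. The 1 Mbyte/s target below is purely illustrative; the cycles/byte values are those of Table 2.8.

```c
/* Iso-throughput sketch: required clock frequency for a given hashing
   throughput. With throughput in Mbyte/s, the result is directly in MHz,
   since (cycles/byte) * (1e6 byte/s) = 1e6 cycles/s. */
static double req_clock_mhz(double cycles_per_byte, double mbyte_per_s) {
    return cycles_per_byte * mbyte_per_s;
}
```

At 1 Mbyte/s, Grøstl with ISEs needs only 57.6 MHz while JH with ISEs needs 383.5 MHz — the asymmetry in voltage-scaling headroom described above.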

Moreover, when the MCU is operated in the near- or sub-threshold region, the introduced ISEs can enable even further savings at the system level. Since leakage power becomes significant, or even dominant, at very low supply voltages, any reduction in area is crucial to minimizing power consumption. Especially the SRAM can be a large contributor to the overall leakage power consumed. As a result, the significant reduction in memory requirements due to our ISEs allows either reducing the integrated amount of SRAM or using power gating strategies that power down unused banks, to minimize leakage during operation of the hashing kernels.


3 Microprocessor Design for Deeply Embedded Ultra-Low-Power Processing

As indicated in Chapter 1, there is an increasing number of applications targeted towards small sensing devices operating on battery and/or supplied by energy harvesting devices, which require a long lifetime. For example, in industrial settings, operation of many years (often 10 years) on a single battery charge is desired to reduce maintenance costs. Similarly, widespread adoption and feasibility of smart home appliances largely depend on their ability to provide advanced functionality under stringent power constraints. Finally, other sectors such as personal healthcare & fitness also strive towards significant lifetime extensions for their battery-powered devices to increase user comfort. With increasing demands towards heavier on-node processing to support richer features on such devices, the architectural choices regarding the employed microprocessors become more relevant and provide many opportunities for power reductions.

In this chapter, the TamaRISC architecture is presented, a custom 16-bit deeply embedded processor core for ultra-low-power (ULP) applications. The motivations for the design choices of the core are given in Section 3.1, together with a description of the ISA and the microarchitecture, followed by a brief overview of the available toolchain and features on the software side. In Section 3.2 a sub-threshold processing system incorporating TamaRISC is presented. Detailed performance and power numbers of a placed & routed TamaRISC core with sub-threshold memories are provided, together with a case study for a compressed sensing application. Furthermore, the baseline TamaRISC microarchitecture is enhanced with an ISE for this application to significantly decrease the system power consumption. Section 3.3 then provides an overview of the state of the art for embedded low-power microprocessors to put our TamaRISC core into context, while Section 3.4 discusses the use of TamaRISC in a multi-core context for WSN applications and presents microarchitectural memory management techniques to improve energy efficiency in this context.


3.1 TamaRISC: A 16-bit Core for ULP Applications

Dogan et al. [DCA+12, Dog13] have shown that a careful microprocessor architecture choice and implementation can result in significant power savings over commercial off-the-shelf devices for biomedical applications, specifically for embedded biosignal analysis. After working with optimized microarchitecture implementations1 of a common 16-bit MCU ISA (PIC24 [Mic09a]), the question arises what impact the ISA itself has on the studied applications, especially concerning energy consumption.

To explore the impact of the ISA, and consequently the impact of the microarchitecture implementing the ISA, a new clean-slate ISA is designed. In the following, we shall refer to this design as TamaRISC. The design goal is the creation of a baseline 16-bit instruction set which allows the microarchitecture of the core to be as simple as possible. While the aim is towards simplicity in terms of supported ALU operations, and in general number of instructions, operand addressing modes which can effectively support signal processing applications are still included, similarly to the PIC24 ISA. This design choice is made to not only keep code size compact to reduce memory requirements, but also to lower cycle requirements (as fewer instructions need to be spent to explicitly access the data memory), since instruction fetch can be one of the major contributors to total active energy consumption [DCR+12]. Consequently, TamaRISC is not a RISC-like ISA in the classical load/store architecture sense, since it has memory addressing built-in for each instruction, for both source and destination operands. As such, it follows in terms of operand addressing more classical MCU-like ISAs/architectures. However, due to the severely reduced instruction set, and because of the general design idea of “simplicity is key”, the connection to RISC is made, referring to the original meaning of reduced instruction set computing.

As a direct result of the reduced ISA, the TamaRISC core improves upon the previous hand-optimized PIC24-compatible design with a 30% increase in speed, while at the same time providing a 40% reduction in core area.2 When operated at very low supply voltage (down to the sub-threshold regime), the significantly reduced area saves considerable leakage power, while the higher clock frequency typically allows for more aggressive voltage scaling to even lower operating points. Both cores comprise a register file of identical size, and require a similar number of cycles for the execution of the targeted signal processing applications, with sometimes a slight advantage for the TamaRISC core (up to 15% fewer cycles), even with its reduced ISA, utilizing only a basic C compiler for TamaRISC and an established/commercial compiler for the PIC24 reference design. As a result, energy consumption is reduced by 35-55% with the TamaRISC architecture, compared to the PIC24-based architecture [DCA+12], depending on the target application.

The following subsections describe the details of the TamaRISC ISA and microarchitecture,

1 The two considered PIC24 microarchitectures have been optimized in terms of data bypassing (as mentioned in Section 2.2.2), and were written in VHDL and LISA, both targeted at generation of efficient hardware.
2 Comparison is for synthesis results in a 180 nm CMOS technology, with identical memory interfaces/employed SRAM macros.

and give an overview of its software toolchain. More detailed performance and power results for a fully implemented TamaRISC core (down to the layout level) targeted at sub-threshold operation, as well as a more in-depth comparison with other embedded microprocessors, are provided in the subsequent sections of this chapter.

3.1.1 Instruction Set Architecture

The ISA comprises a total of 17 base instructions, with 8 arithmetic logic unit (ALU) instructions, 2 general data move instructions, 2 program flow instructions, 4 control instructions (related to: sleep mode, interrupts, and the status flags), and an instruction to provide basic hardware loops. All instructions generally work on 16-bit data words, and some of the base instructions provide different modes of execution. A full listing of the instruction set is given in Table 3.1. The ISA also defines a 16-entry register file named R0 ... R15, with R15 set as the link register.

Table 3.1 – Instruction set of TamaRISC, consisting of 17 base instructions.

Type           Instruction   Variant/Mode   Description
ALU            ADD           -              addition
                             ADDC           addition with carry
               SUB           -              subtraction
                             SUBC           subtraction with borrow
               AND           -              logical AND
               OR            -              logical OR
               XOR           -              logical XOR
               RS            -              logical right-shift (high bits = 0)
                             RSA            arithmetic right-shift (high bits = MSB)
               LS            -              left-shift
               MUL           -              unsigned multiplication (32-bit result)
                             MULS           signed multiplication (32-bit result)
move           MOV           -              move between reg./mem. and reg./mem.
               MOV           -              move/load 16-bit immediate to reg.
program-flow   BRA           -              conditional branch (15 modes/cond.)
               GOTO          -              unconditional branch/jump
                             IRET           return from interrupt
                             CALL           jump with link register (R15 = PC)
control        SLP           -              enter sleep mode (clock-gate core)
               *I            EI             enable interrupts
                             DI             disable interrupts
               MOVS          -              read/write status flags & IRET addr.
               NOP           -              no operation
loop           REP           -              repeat following instruction X times

Table 3.2 – Operand addressing modes of the TamaRISC ISA.

Syntax     Addressing Mode
Rn         register direct
*Rn        register indirect
++*Rn      register indirect with pre-increment
--*Rn      register indirect with pre-decrement
*++Rn      register indirect with post-increment
*--Rn      register indirect with post-decrement
@litRn     register indirect with literal offset (only with MOV; 4-bit signed offset)
#lit       4-bit immediate value (16-bit for MOV)

The ALU supports addition, subtraction (each with optional carry/borrow), logical AND, OR, and XOR, right (arithmetic or logical) and left shifts, as well as full 16-bit by 16-bit multiplication (32-bit result) on unsigned and signed data. All ALU instructions work on two source operands and one destination operand, using the exact same addressing mode options for each instruction, which helps to reduce the complexity of the architecture, since the operand fetch logic and the arithmetic operation can be completely decoupled. The supported addressing modes are register direct, register indirect (with pre- or post-increment and -decrement), as well as register indirect with offset. The second operand also supports the use of 4-bit literals. Table 3.2 provides an overview of all addressing modes, together with the respective assembly syntax. Moreover, register indirect modes can be used for a source operand and at the same time for the destination, allowing efficient memory-to-memory operations.

Regarding program flow instructions, branching is possible in direct and register indirect mode, as well as by offset with 15 different condition modes (dependent on the processor status flags: carry, zero, negative and overflow). The ISA also includes instructions for interrupt and sleep mode support of the core. The sleep mode allows external clock-gating of the entire core, until a wakeup event/interrupt occurs.3

Furthermore, the instruction word encoding uses 24 bits and is in general designed to be as regular and simple as possible (fixed bit positions for fields), with the aim to provide a good starting point for a hardware implementation with low-complexity decoding logic.

3 An interrupt request is, for example, triggered by a new ADC sample in a sensor node application.


3.1.2 Microarchitecture

The microarchitecture of the TamaRISC core is depicted in Figure 3.1. TamaRISC is implemented as a Harvard architecture with a 3-stage pipeline, comprised of a fetch, decode, and execute stage. Instruction words are 24 bits wide, with every instruction using only a single 24-bit word. The core operates on a data word width of 16 bits, and comprises 16 physical general-purpose working registers and 3 external memory ports: one for instruction fetch, one for data read, and one for data write, all with single-cycle access latency. The tightly coupled data memory (TCDM) hence has to support simultaneous read and write accesses within the same cycle, and can for example be implemented as a dual-port SRAM. The memory ports have single-word width, meaning 24 bits for the program memory and 16 bits for the data memory. The physical address widths depend on the amount of memory that a specific system implementation integrates, with the instruction and data address spaces each limited to the range of 16-bit addresses.

Figure 3.1 – TamaRISC microarchitecture with a 3-stage pipeline, comprising a fetch, decode, and execute stage. The instruction memory and data memory are tightly coupled to the core.

The register file has 3 read ports and 4 write ports and provides 32-bit double word writeback support, employed for the multiplication instructions. Three parallel read ports are used to support the reading of two operands and one destination memory address, when using register indirect addressing for the destination. Two write ports are dedicated to the advanced addressing modes with increment/decrement, which modify the register holding a memory address, while the remaining two provide the mentioned double word writeback support.

All instructions generally execute in one cycle, which is guaranteed by the use of complete data bypassing inside the core for register as well as memory writeback data. Additionally, status flag results of ALU operations are forwarded to the decode stage, to allow immediate branching in all cases. This is especially relevant for common instruction pairs where a conditional branch is directly preceded by an arithmetic instruction.

The microarchitecture implements basic hardware loops through a dedicated repeat counter, which can be loaded by the REP instruction and is decremented by one in each cycle. The fetch stage effectively freezes the program counter as long as the repeat counter has not reached zero.

The sleep mode is implemented by a core output signal, which is asserted by the SLP instruction, (de)activating a core-external clock gate that controls the clock for the entire TamaRISC core. Wakeup from sleep mode is performed over the interrupt request (IRQ) core input. An incoming IRQ in general initiates execution at a hardcoded IRQ handler address, with the fetch stage saving the next program counter to the dedicated interrupt return address register, which allows the normal program flow to resume at the end of the interrupt handler.

3.1.3 Implementation & Evaluation Flow

This section describes the implementation flow and the evaluation method used for assessment of the power and performance numbers of the presented TamaRISC core. Figure 3.2 illustrates the design flow, highlighting the key electronic design automation (EDA) tools that are employed.

At the center of the flow stands the LISA model of the TamaRISC core, which includes the processor description written in LISA 2.0 [Syn12], simulator enhancements written in C++ to facilitate peripheral simulation in the ISS, and a set of ACE CoSy compiler generator [ACE16] rules. The model is processed by Synopsys Processor Designer (PD) [Syn12] to generate the RTL description of the TamaRISC hardware core, the C compiler, and the cycle-accurate ISS, which includes behavioral models for memory-mapped core-external peripherals, namely in our case an ADC, an interrupt controller, and a timer.

Figure 3.2 – Implementation and evaluation flow for the TamaRISC architecture.

On the software side, benchmarks written in C, such as sensing applications, are compiled and linked to produce program binaries in the common ELF file format. The ELF files can be directly loaded by the ISS and executed, to verify functional correctness of the application on the architecture, as well as to assess first performance numbers in terms of cycle counts. For a more detailed evaluation using RTL or post-layout gate-level simulation, the content of an ELF file is transformed into two files (separately, for the program and data memory) with a simple ASCII format, which can be read by the employed HDL testbench. On the hardware side, the full system architecture, which includes the memory system (e.g., a data memory constructed of multiple banks of SRAM macros), instantiates the TamaRISC core, and is combined in the form of VHDL sources with the VHDL files generated by PD. This RTL description is then verified via RTL simulation against the behavior of the ISS to assure no functional mismatches were introduced during the HDL generation step, transforming LISA to VHDL. The design is then synthesized using Synopsys Design Compiler, and thereafter placed and routed (including clock tree generation) using Cadence Encounter. Static timing analysis is performed on the final post-layout netlist to determine the delay of the critical path, and therefore the maximum achievable clock frequency. The final netlist is then exported together with accurate timings, including gate and wire delays, to perform a post-layout simulation of the processor. The simulation incorporates the memory contents for the considered benchmark, and produces a value change dump (VCD) which captures the exact activities of each node in the netlist. Moreover, the simulation output in the form of final data memory contents is again verified against the golden result from the ISS, to assure functional correctness. Finally, the VCD trace is used to perform an accurate power analysis of the core executing the given benchmark, based on the post-layout netlist and its derived parasitics.

3.1.4 Software Toolchain and Real-Time OS Support

The software toolchain for the TamaRISC processor consists of a standard C compiler, a TamaRISC ISA assembler, and a linker that produces ELF files compatible with the memory address space of the Harvard architecture.

The assembler is directly generated from the LISA processor description, which incorporates the assembly syntax of the defined instructions and operand modes within the model itself. Moreover, the assembler supports basic macros and other directives, which allow the development of hand-written TamaRISC assembly programs with relative ease.

Unfortunately, the C compiler cannot automatically be generated from a LISA model description of the processor architecture, since the full semantics of the defined instructions cannot be derived automatically by the PD tool. It is hence necessary to formulate a set of rules, which instruct the compiler generator (which is part of PD) how to build the C compiler for the target architecture. The main purpose of these rules is to describe transformations from parts of the internal graph representations of the program to sequences of assembly code. The compiler derives an internal representation of the C program that is currently compiled, which has the form of a graph consisting of nodes representing operations (e.g., arithmetic) and edges representing resources (e.g., registers). The task is then to match sub-graphs of such representations onto sequences of native TamaRISC instructions that are as optimal as possible. A relatively large set of these rules has to be specified for an ISA to be able to generate a basic functional C compiler. Besides all arithmetic operations and the different possibilities for data memory accesses, this also includes the handling of function calls (calling conventions/ABI), handling of the stack, and especially of variable spills (when dynamic register allocation fails). Moreover, rules to provide support for data types longer than the native 16-bit words are essential to support a wider range of C applications and/or libraries, which require 32-bit or even 64-bit data types. The TamaRISC C compiler, which has been developed as part of this design flow, is a basic compiler built upon the CoSy compiler framework embedded in PD, already providing the standard optimization techniques/options available in state-of-the-art compilers which can be applied independently of the detailed ISA specifics. However, for example, the specific TamaRISC ISA feature that allows pre-/post-modifiers in register indirect addressing modes is not yet included in the current version of the compiler, which leaves room for further improvements in the efficiency of the emitted code for the TamaRISC ISA. Consequently, in cases where every cycle counts, e.g., for small kernel loops that perform the majority of the computations in an algorithm, it is so far advised to rely on hand-written assembly implementations, which can be combined with the full C application via the in-line assembly capabilities of the compiler.

Many embedded systems rely on light-weight real-time operating systems (RTOS) for the efficient management of the tasks executed on the system. One of the main duties of an RTOS is the scheduling of these tasks, which can be based on different types of scheduling algorithms. A common approach is preemptive scheduling with fixed time slices, based on a hardware timer which provides the OS with so-called tick events. Typically, a periodic interrupt gives control to the RTOS, which then preempts the currently running task and resumes the next task on its list (according to some criterion, such as priority). For a processor core to support such a scheduling mechanism, it needs to support interrupts, the ability to freely configure the interrupt return address, and the ability to save the full processor state at the OS level, to allow preemption of tasks. This functionality, which is necessary for task/context switching, is provided by the TamaRISC ISA via the MOVS instruction (see Table 3.1). Furthermore, a core-external tick timer is available in the TamaRISC system, both in the form of a simulation model for use with the ISS and in the form of HDL for system integration. These core features are used to support FreeRTOS [Fre16], a popular real-time operating system kernel for embedded devices. FreeRTOS4 has been ported to the TamaRISC architecture, including all device-specific assembly sources, and is compiled from the original C sources using the presented C compiler. It has been successfully employed for system simulations running multiple tasks scheduled via the RTOS, on the ISS enhanced by the necessary peripherals.

4 The specific version of FreeRTOS is v7.0.0.


3.2 TamaRISC-CS: A Sub-Threshold ULP Processor for Compressed Sensing

Digital signal processing traditionally relies on the Nyquist sampling theorem, which states that a faithful reconstruction of a signal limited to a bandwidth B in the frequency spectrum can be ensured with a sampling rate of f_s ≥ 2B. Unfortunately, when sampled data needs to be stored or transmitted over a wireless link, the storage or transmission costs of the raw samples can often limit the energy-autonomous lifetime of the system. In this case, it is advisable to first compress the data. However, the power consumption of the compression process must then also be kept low to ensure an overall energy-efficiency advantage.

Compressed sensing (CS) [Don06] is a universal, low-complexity data compression technique to compress sparse signals. CS has been widely used in environmental monitoring systems and in WBSNs [MKAV11], where portable and autonomous devices are expected to operate for long periods of time with limited energy resources. Hence, an ultra-low-power (ULP) CS implementation is crucial for these systems.

On the architectural level, supply voltage scaling, potentially all the way to the sub-threshold (sub-VT) regime, can reduce both dynamic and leakage power consumption. Therefore, many sensing platforms exploit sub-VT computing. State-of-the-art processors for sensing platforms have been reported to consume as little as a few pJ/cycle while operating in the sub-VT regime [JBW+09, HSL+09, KRV+08]. Sub-VT computing can also be used to perform CS data compression (in the digital domain). However, most established CS implementations either require a large memory footprint (memory-based implementation) or still require considerable computational effort (despite the inherent complexity advantage of CS). Leakage power consumption becomes a very important challenge in the sub-VT regime with reduced active power. A considerable amount of leakage in sensing platforms is due to the integrated memories [DCA+12, Dog13]. Moreover, many sensing platforms cannot be power gated completely, in order to retain their memory content [HSL+09], and hence leakage power is always dissipated. Therefore, implementations with large memory requirements are not desirable in the sub-VT regime. On the other hand, high computational effort requirements can limit the degree of voltage scaling because of performance degradation issues in the sub-VT regime [HZS+08, ZNO+06, DWB+10]. These issues have so far limited the benefits of CS-based data compression in ULP sensor nodes.

Application-specific instruction set processors (ASIPs) can compensate for the performance degradation issue, since they are optimized for a specific application domain, providing increased efficiency and performance for the core algorithms of the domain's target applications. For instance, an ASIP optimized for stereo image processing can achieve up to 130x speedup compared to a conventional processor [BDCB11]. These performance optimizations also lead to energy savings, as in [KC11], where a processing core with a few accelerators dedicated to biomedical applications achieves up to 11.5x energy savings compared to the processing-core-only implementation. However, despite their efficiency in some specialized application domains, to our knowledge no ASIP core has been reported so far for ULP CS compression.

We therefore propose to synergistically exploit sub-VT computing in conjunction with an ASIP core for CS compression (based on the ULP TamaRISC core presented in Section 3.1) to provide an ULP solution for compression of sparse signals for sensing applications. To this end, we extend the instruction set of the low-power TamaRISC processor to better support the specific operations of the CS compression algorithm. Our ASIP core does not require high clock frequencies to run the CS application, and therefore enables more aggressive sub-VT voltage scaling for a given throughput requirement. The very low memory requirements additionally allow for a major reduction in leakage power, which becomes critical to achieve overall system-wide energy-efficiency in the sub-VT regime. However, it is important that the introduced ISEs of the ASIP do not significantly impact the voltage-scalability of the core, such that gains from reduced cycle counts are not offset by a potentially smaller range for voltage scaling. Another reason for keeping the introduced ISEs light-weight is the large proportion of leakage power in the overall system power consumption at sub-VT voltages. The goal is to introduce ISEs which are not area-intensive, i.e., which do not counter-balance the leakage savings achieved through reduction of the system memory requirements.

In the following sections, a custom-designed application-specific processor solution to this problem is introduced in detail. We refer to this design as TamaRISC-CS in the following. All required parts of the computing system are described (which includes the processing core with the ISE, as well as the embedded memories) that are necessary to enable ULP CS compression on a software-programmable architecture in the sub-100 nW regime. For a typical case study of electrocardiogram (ECG) signal compression in WBSNs (see Section 3.2.4.3), the processor consumes only 30.6 nW for an ECG sampling rate of 125 Hz. Moreover, we show that the proposed processing system achieves 62x speedup and 11.6x lower power consumption with respect to the established computation-based CS implementation running on the baseline low-power TamaRISC processor.

To this end, Section 3.2.1 first explains in detail the CS-based data compression and its features, together with a discussion of the computational bottleneck of the algorithm (generation of a pseudo-random sequence of indices) on the considered architecture. Next, Section 3.2.2 describes the CS-specific ISE for the TamaRISC architecture, and introduces the corresponding adjustments to the microarchitecture. Section 3.2.3 provides an overview of the employed custom embedded memories, enabling sub-VT operation of the system. Finally, Section 3.2.4 describes the sub-VT synthesis flow and presents the power and performance results of our TamaRISC-CS ASIP processor for a case study of CS-based ECG signal compression.


3.2.1 Compressed Sensing

Signal compression based on compressed sensing (CS) [Don06] is performed by computing the matrix-vector multiplication

y = Φx,    (3.1)

where the random sensing matrix Φ ∈ R^(k×n) with k < n maps an input data vector x ∈ R^n holding n samples to a compressed data vector y ∈ R^k with k entries, for a compression ratio of k/n.

A general key issue in transform based data compression is how to choose a proper sensing matrix Φ with k rows and n columns. A key advantage of CS compared to conventional transform based compression schemes is that sensing matrices with near optimal properties can be constructed with high probability by choosing the entries of Φ by independent and identically distributed (i.i.d.) random sampling from a uniform distribution [Don06].

3.2.1.1 Reduced Complexity Compression Algorithm

Besides influencing the quality of the compression, the structure and values of the entries of Φ determine the computational complexity of the matrix-vector multiplication. Mamaghanian et al. [MKAV11] showed (for WBSNs) that in fact choosing Φ as a sparse matrix that contains only a few non-zero entries per column at random positions still yields good compression results with a high probability, while it significantly reduces complexity. The non-zero elements can furthermore be chosen as 1, and the number of ones per column (namely I) can be fixed. These constraints on Φ lead to a very efficient algorithm (Algorithm 1) for performing CS data compression. As a result, the computational complexity of the CS algorithm is reduced from n × k multiplications and (n − 1) × k additions for a dense sensing matrix of arbitrary values, to only I × n additions. The sensing matrix can therefore be represented in a compact form by a sequence of I × n random indices ∈ {0, 1, 2, ..., k − 1} which describe for each column in Φ the rows with non-zero entries.5

Algorithm 1 Pseudocode of Compressed Sensing Algorithm with Reduced Complexity
1: buffer[0..k−1] := 0
2: for i := 1 → n do
3:     sample := getSample()
4:     for j := 1 → I do
5:         index := getRandomIndex(0..k−1)
6:         buffer[index] := buffer[index] + sample
7:     end for
8: end for

5 Note that strictly speaking such a representation requires unique row-indices per column. However, this requirement can often be relaxed without a significant impact.


On a resource-constrained system, the key challenge of Algorithm 1 is the generation of the (pseudo)random indices. The optimized reference implementation [MKAV11] uses a single arbitrary sensing matrix realized as a fixed sequence of indices stored in memory (for a specific value of k, i.e., a fixed compression ratio). Since multiple such matrices are desirable in practical systems while large memory footprints are undesirable, especially in the context of ULP sensor nodes and sub-VT operation, we first discuss the generation of the required random indices at runtime.

3.2.1.2 Pseudorandom Number Generation

A pseudorandom number generator (PRNG) can be employed for the generation of the random indices. A common implementation of such a PRNG is a linear feedback shift register (LFSR). The random sequence generated by an LFSR is defined by the sequence of its internal states. The initial state of an LFSR is referred to as its seed. For each state transition (LFSR step), the current internal state bits are combined with the binary coefficients of a polynomial, which defines the pseudorandom sequence of the LFSR. The bits selected by the polynomial are summed to produce one new bit (parity bit). The next state of the LFSR is calculated by shifting out the least significant bit of the state and shifting the generated bit in as the new most significant bit.

Maximum-length LFSRs provide a cycle length of the generated random number sequence that is equal to the maximum number of possible states (excluding zero). Note that although maximum-length LFSRs can provide good sequences of random numbers, the correlation between two subsequent LFSR states, i.e., subsequent indices i1 and i2, is high, since either i2 = ⌊i1/2⌋ or i2 = ⌊i1/2⌋ + k/2. When the state is used directly in the presented Algorithm 1 for CS, this correlation of the generated indices has a negative effect on the compression/reconstruction quality of the data. Hence, we propose to use an LFSR that advances multiple steps per generated index. The number of steps should be equal to the number of required index bits. For example, for k = 256 the LFSR has to advance 8 states to generate the next index, which yields no direct correlation to its predecessor, since the two subsequent index values share no bits in their representation. The quality of our generated random indices for CS is assessed in the case study presented in Section 3.2.4.3. The drawback of this approach is the increased computational effort for the PRNG, which can be compensated for by custom hardware support.
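The multi-step scheme can be illustrated with a small Python model of a 16-bit Fibonacci LFSR that advances one step per index bit. The tap mask 0xB400 (a well-known maximal-length configuration) and the seed are illustrative choices here, not necessarily the configuration used in the thesis measurements.

```python
def lfsr_step(state, poly):
    """One Fibonacci LFSR step on a 16-bit state: the parity of the
    state bits selected by `poly` becomes the new MSB, and the LSB is
    shifted out."""
    bit = bin(state & poly).count("1") & 1   # parity of the tapped bits
    return (state >> 1) | (bit << 15)

def next_index(state, poly, index_bits):
    """Advance the LFSR by `index_bits` steps, then take the low bits
    as the next buffer index, so that two subsequent indices share no
    bits of their representation."""
    for _ in range(index_bits):
        state = lfsr_step(state, poly)
    return state, state & ((1 << index_bits) - 1)

# Example: k = 256 buffer entries -> 8 index bits -> 8 steps per index.
poly, state = 0xB400, 0x6218             # illustrative taps and seed
state, i1 = next_index(state, poly, 8)
state, i2 = next_index(state, poly, 8)
```

A single `lfsr_step` reproduces the correlation noted above: the successor state is always ⌊state/2⌋ or ⌊state/2⌋ + 2¹⁵, which is exactly why multiple steps per index are needed.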

The proposed generation of the sensing matrix Φ can hence be described with four main parameters: the LFSR polynomial, the LFSR seed, the number of index bits (given by log2(k)), and the number of non-zero elements per column (I). These four configuration parameters enable the generation of a large set of different sensing matrices. At the same time, the compact representation of the PRNG configuration keeps the memory overhead very limited compared to the case where all indices for multiple matrices would need to be stored.

Consequently, by choosing from a preconstructed pool of favorable values for the PRNG

configuration, it is possible to achieve good sensing performance for a variety of different signal conditions, potentially even by dynamically changing the PRNG configuration at runtime. This capability supports the previously mentioned strength of the compressed sensing method, which lies in the fact that even a randomly chosen sensing matrix Φ ensures, with high probability, a good mapping for the sample data of a signal source that is sparse in a specific, potentially unknown basis. In contrast, any CS implementation using only a single or a very small number of pre-stored arbitrary sensing matrices loses this generality of performing well independently of the signal source. To alleviate this issue, our approach therefore tries to minimize all related storage and memory costs, to support a large number of different random sensing matrices which can be dynamically changed at runtime.

3.2.1.3 Index Sequence Implementations

This subsection discusses the two common implementation options for the index sequence used as the buffer indices in Algorithm1 to implement (3.1). The first approach employs precomputation and storage of all required indices in form of a large array in data memory, while the second approach performs the computation of the index sequence at runtime based on a PRNG, as introduced in Section 3.2.1.2.

Precomputation The storage of a preconstructed sequence effectively trades computational effort for memory consumption. For example, the requirement for a single sensing matrix (with 12 non-zero entries per column), used for the compression of a set of 512 samples by 50%, is 6 kbyte of memory. However, as discussed, a relatively large memory footprint is especially undesirable in an ULP embedded system, for reasons of die area and power consumption. Since sub-VT memories are large and consume most of the total system power through leakage at low voltages, the storage of tens of kbyte of data for sensing matrices further aggravates the problem, especially when different matrices are to be supported.
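The 6 kbyte figure follows directly from the matrix parameters; the short calculation below makes the accounting explicit (one byte per 8-bit index is the natural packing for k = 256):

```python
# Storage cost of one precomputed sensing matrix as an index sequence:
# n = 512 samples compressed by 50% -> k = 256 rows -> 8-bit indices,
# with I = 12 non-zero entries per column.
n, I = 512, 12
k = n // 2                      # 50% compression ratio
index_bytes = 1                 # an 8-bit index fits one byte for k = 256
total = n * I * index_bytes     # bytes for the full index sequence
print(total)                    # -> 6144 bytes = 6 kbyte
```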

Computation at Runtime The generation of suitable random indices can also be performed by using a PRNG, as proposed in Section 3.2.1.2. This approach only requires the data memory to comprise the sensing buffer, which for compression of a set of 512 samples by 50% equals 256 data words (e.g., 512 byte with a sample precision of 12 (up to 16) bit). As shown in Section 3.2.1.2, for each generated index the PRNG has to perform the same number of LFSR steps as the number of bits per index. A typical software implementation (on a RISC-like ISA) can perform one 16-bit LFSR step in about 10 operations/instructions. For the example of a sensing buffer size of 256 and 12 ones per column in the sensing matrix, this results in 12 × 8 = 96 steps per sample, i.e., a minimum computational requirement of 960 operations per sample, dedicated to the task of random number generation alone. This requirement becomes problematic, since downscaling of the supply voltage considerably limits the maximum core clock frequency (see Figure 3.7). Due to the relatively large computational overhead, achievable sampling rates for sub-VT operation — i.e., how many samples can be processed per

second by the CS algorithm — are consequently reduced for a purely software-based PRNG approach to the range of tens of Hz, which is often insufficient.

To combine the benefits of instant random number access of the storage approach, with the memory savings of the computational approach, we propose in the following an instruction set extension for TamaRISC, which performs the task of pseudorandom index generation efficiently in hardware.

3.2.2 Instruction Set Extension for CS

Analysis of the CS kernel loop shows that extending memory operand addressing with efficient randomization can result in significant performance gains. We hence introduce an extension to the TamaRISC instruction set architecture, adding a new instruction that performs an accumulation of sample data on randomized memory addresses within a defined buffer. Essentially, lines 3–6 of Algorithm 1 are combined into a single instruction, named Compressed Sensing Accumulation (CSA). The assembly syntax of CSA is:

CSA *Rb, Rs

As shown in Figure 3.3, the CSA instruction takes two general purpose registers as arguments: the first (Rb) holds the data memory base address b of the sensing buffer, the second (Rs) contains the sample data s. The CSA instruction addresses a random element i within the referenced buffer and adds the provided sample onto the existing value in the memory. This operation is repeated for a configured number of iterations, by the use of a counter register dedicated to the instruction. With each repetition a new pseudorandom element of the buffer is addressed.
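The following is a behavioral (not cycle-accurate) Python sketch of the CSA semantics: for each of the configured repetitions, a buffer index is taken from the low bits of the LFSR state, the sample is accumulated at that address, and the LFSR advances by one step per index bit. The polynomial mask, parameter defaults, and memory layout are illustrative assumptions, not the exact hardware configuration.

```python
def csa(mem, lfsr, b, s, index_bits=8, repeat=12, poly=0xB400):
    """Behavioral model of the CSA instruction: repeat times, compute
    index i from the low `index_bits` bits of the LFSR state, perform
    mem[b+i] += s with 16-bit wraparound, then advance the LFSR by
    `index_bits` steps for the next iteration. Returns the new state."""
    mask = (1 << index_bits) - 1
    for _ in range(repeat):
        i = lfsr & mask
        mem[b + i] = (mem[b + i] + s) & 0xFFFF   # 16-bit accumulate
        for _ in range(index_bits):              # one step per index bit
            bit = bin(lfsr & poly).count("1") & 1
            lfsr = (lfsr >> 1) | (bit << 15)
    return lfsr
```

In the processor, the repeat count and LFSR state live in dedicated registers, so the whole loop body above corresponds to a single (multi-cycle) instruction.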

The CSA instruction is generally used in a small loop in conjunction with the sleep mode of the processor, which puts the core in a dormant state (clock-gated) to significantly reduce power consumption until new sample data is available. On wakeup by an interrupt request, the sample data is fetched from the ADC and the CSA instruction is executed, after which the core can immediately return to sleep again.

Configurability To enable the construction of many different sensing matrices, the custom instruction is based on four parameters, accessible through dedicated configuration registers. The custom instruction supports software reconfigurability regarding the employed 16-bit LFSR polynomial, the LFSR seed, and the required index width used for memory addressing. Additionally, the number of non-zero entries per matrix column can be configured, which equals the number of times a sample is added to different pseudorandom locations of the sensing buffer. This configurability amounts to storage requirements of at most three 16-bit values per sensing matrix. Since the LFSR state of the address randomizer can be directly accessed through the register file, the LFSR hardware can also be used for efficient pseudorandom number generation, independently of the CS-specific memory addressing and accumulation.



Figure 3.3 – TamaRISC-CS sub-VT microprocessor architecture including address-randomizer extension for compressed sensing.


Hardware Implementation The internal hardware structure of the address randomizer extension to the TamaRISC microarchitecture is presented in Figure 3.3. The custom instruction employs the existing 16-bit adder unit in the ALU for the sample accumulation and does not introduce any new units into the data path of the execution stage of the processor. The decode stage holds the extended address generation logic, which enables addressing of a random word inside the sensing buffer by combining a buffer base address b with index bits (i) taken from the least significant bits of the current LFSR state. The number of index bits depends on the value set in the configuration register (1–16). In one cycle, the LFSR state is updated by the same number of LFSR steps as index bits used. Additionally, the instruction set extension is realized as a multi-cycle instruction, which allows handling of one sample in a number of cycles equal to the configured number of non-zero entries I per matrix column for the CS application.

3.2.3 Sub-VT Memories

While the core logic of the sub-VT CS processor works reliably at low supply voltages in the sub-VT regime, conventional data and instruction memories based on 6-transistor (6T) SRAM bitcells fail to operate reliably at low voltages [QSC11]. Therefore, such conventional, embedded 6T SRAM macrocells prohibitively limit the overall reliability and the manufacturing yield of the proposed sub-VT CS processor. More precisely, under gradual supply voltage down-scaling, read and write access failures start to appear first, before the occurrence of data retention failures at even lower voltages [MMR05]. Specially designed SRAM macrocells based on 8- or 10-transistor (8T/10T) bitcells are typically used to enable reliable data storage in the sub-VT regime. For example, a typical 8T SRAM cell contains a read buffer to avoid direct access of the bit lines to the internal storage nodes and consequently to avoid the risk of switching the bitcell during a read access [CFH+05], thereby improving readability. Moreover, a popular 10T SRAM bitcell contains, in addition to the read buffer, a tri-state inverter in the cross-coupled latch; this tri-state inverter is disabled during a write access in order to avoid write contention [JKY+12], thereby improving writability. All these 8T or 10T SRAM macrocells, specifically optimized for robust sub-VT operation, need to be custom-designed due to the lack of good, commercially available low-voltage memory compilers. Such custom design is associated with high engineering effort and bears high risk, unless each macrocell is first manufactured and silicon-proven independently, before its integration into a larger VLSI system.

As opposed to such custom-designed sub-VT 8T/10T SRAM macrocells, we employ a fully automated standard-cell based memory (SCM) compilation flow [MSBR11]. The use of SCMs considerably simplifies the design process, and the resulting latch or flip-flop arrays directly avoid the aforementioned reliability concerns of conventional 6T SRAM. In particular, standard-cell latches already contain a read buffer to avoid read failures and a cell-internal keeper which is disabled during write, i.e., during the transparent phase of the latch, to avoid the risk of write failures. Consequently, the proposed SCMs work reliably in the sub-VT regime without the



Figure 3.4 – Schematic of the latch array used for sub-VT operation of TamaRISC-CS, with clock-gates for the generation of write select signals and static CMOS readout multiplexers. The write port is highlighted in red, while the read port is highlighted in blue.

need for any extra engineering effort and allow the complete system to operate at aggressively scaled voltages.

Among many architectural variants, summarized in [MSBR11], this work adopts the latch array architecture shown in Figure 3.4. This architecture consists of a write address decoder (WAD) and clock gates for the generation of the one-hot encoded write select pulses (row-wise gated clock signals). Moreover, static CMOS multiplexers are used to read out the desired address (word). While a read logic based on tri-state buffers exhibits lower leakage current, the chosen CMOS multiplexers are faster and more robust for sub-VT operation. This latch array architecture can be synthesized from commercially available standard-cell libraries.

In addition to the baseline array employed in our experiments, it is possible to customize one or several standard-cells to meet a specific design goal, such as ultra-low leakage power. For example, the leakage power and access energy can be reduced by approximately 50% by using a single custom-designed standard-cell, namely an ultra-low leakage latch using stack forcing and channel length stretching, as well as a tri-state enabled output buffer to implement the read logic [MAM+12].

Even though these latch-based memories are optimized for low-voltage and low-power operation, they still consume considerable leakage power. In our system example, memories account for 70–95% of the architecture's total power consumption, depending on the mode of operation. Furthermore, the sub-VT memories consume a considerable area share. In our implementation with moderate memory sizes of 256 instructions (6 kbit) and 512 data words (8 kbit), the processing core only consumes 16% of the total area, with the rest occupied by memory.

3.2.4 Power and Performance Results

Due to the need to retain their memory content, many sensing platforms cannot be power gated completely [HSL+09], and consequently leakage power is always dissipated. Therefore, our sub-VT CS processor always operates at a clock frequency that barely accomplishes the task (of fixed computational effort) on time, while lowering the supply voltage to the minimum possible level that avoids timing violations at this lowest required target frequency. Note that the objective is to minimize total power for a given workload, in contrast to operation at the energy minimum voltage (EMV), where maximizing energy efficiency often requires a higher operating voltage to balance leakage and active energy. As a result, the CS processor operates at a higher energy per clock cycle than the achievable minimum. However, it does so with a duty cycle of 100%, which makes the approach more power and energy efficient from a system perspective than operation at the EMV with a duty cycle of less than 100%, where higher leakage power is still dissipated during the inactive phases in which the memory cannot be power gated, to assure data retention.

Yet, it has to be noted that reductions in total power below the EMV are unfortunately not as substantial per scaled mV of supply voltage as they are above the EMV, since leakage power begins to fully dominate, and as a consequence the still strong further reduction in active power has less and less of an impact on the total power. Figure 3.7 also illustrates this fact. The choice of the operating point is therefore also subject to a trade-off between the increased variability issues that occur with reduced voltage and the potentially insubstantial additional power gains. This means that around and below the EMV it is often possible to trade a slightly higher power consumption and supply voltage for a circuit that exhibits considerably less performance variation [DWB+10]. In the following, we investigate only the achievable minimum power consumption, since the considered circuit structures are relatively small and simple, and hence are not necessarily exposed to the same issues caused by strong variations in larger designs.

3.2.4.1 Synthesis Strategy and Sub-VT Energy Profiling

The design is synthesized above threshold at a nominal supply voltage of 1.0 V in a low-power, high-threshold-voltage (VT ≈ 700 mV) 65-nm CMOS technology. Toggling information is obtained by simulating a fully placed & routed design (including clock tree) with back-annotated timing information. The design is characterized by employing the sub-VT energy characterization model derived in [ARLO12]. With this model, parameters retrieved from critical path information, as well as a traditional value change dump (VCD) based power simulation, are used to compute the maximum operational speed, energy, and power dissipation in the sub-VT region. The sub-VT energy profiling model is validated by measurements [ARLO12], and its accuracy is within a 10% error at the measured temperatures of 0 °C, 27 °C, and 37 °C. Table 3.3 shows the design properties of TamaRISC-CS in terms of leakage, capacitance, and critical path, normalized to a single inverter, as well as switching activity. For more details, the reader is referred to [ARLO12].

Table 3.3 – Circuit properties of TamaRISC-CS for sub-VT modeling after [ARLO12].

Design property                                   Value
Capacitance scaling factor (kcap) a               254000
Critical path delay (kcrit) b                     434
Average leakage scaling factor (kleak) c          194000
Mean switching activity (µe)                      0.0675

a kcap is normalized to the input capacitance of a single inverter.
b kcrit is normalized to a single inverter delay.
c kleak is normalized to the leakage current I0 of a single inverter.

In our implementation, the post-layout critical path delay at nominal supply voltage is 5.2 ns, according to gate-level static timing analysis. Optimization for maximum frequency, and thus a larger slack on the critical path, allows for more aggressive voltage scaling. However, leakage and active power increase considerably in hard timing-constrained designs: tight constraints force the tool to infer nets with high fan-out as well as stronger buffers, which increases the capacitance on the critical path and consequently yields slower operation in the sub-VT region. Following the strategy proposed in [MAS+11, MSBR11], we relax the timing constraint to achieve a design with low area and low leakage cost.

Figure 3.5 shows the design exploration for this implementation strategy, where the timing constraint is relaxed to achieve lower power. The figure details the corresponding relative power consumption in the sub-VT regime of the CS processor when optimized with different clock constraints above threshold. The power values are normalized to the design optimized for the maximum clock frequency (5.2 ns), for each operating point (frequency/throughput) separately. For this experiment, each design is always supplied with the minimum voltage level required for the respective throughput. As seen from the figure, the design optimized with a 9 ns clock constraint above threshold achieves better energy efficiency than the same design optimized for other target clock frequencies at nominal voltage. To obtain the minimum power solution in the sub-VT domain, we therefore optimize the design with a relaxed clock constraint of 9 ns above threshold voltage.



Figure 3.5 – Relative power comparison vs fixed operating frequency in the sub-VT regime for the TamaRISC-CS processor, when implemented with different clock constraints. The power is normalized for each operating frequency separately to the consumption of the hard- constrained design (5.2 ns) at that specific frequency, operating at the minimum possible voltage.

3.2.4.2 Implementation and Simulation Results

In the following, we present a general performance characterization of the TamaRISC-CS core including our SCMs, for the sub-VT regime. The characterization data is obtained via simulations of the placed & routed circuit (layout depicted in Figure 3.6), in conjunction with sub-VT energy profiling, as described and summarized in Section 3.2.4.1.

Figure 3.7 shows the power consumption versus the corresponding supply voltage of the sub-VT CS processor at the maximum achievable clock frequency in the sub-VT domain. More specifically, a clock frequency of 100 kHz for the CS processor is achieved at a supply voltage of 0.37 V. As a result, a total power of 288 nW is dissipated, where 27% of the power consumption is due to leakage power. When the required clock frequency is reduced to 1 kHz, the model indicates that the sub-VT CS processor consumes 22.5 nW in total, where the leakage dissipation now has a share of 98%. As demonstrated in Figure 3.7, the leakage power dominates the overall power for clock frequencies lower than 1.5 kHz, corresponding to a 0.2 V supply voltage. However, we note that in this particular technology, operation below 0.25 V is not recommended due to higher rates of functional failures from larger process variations, according to [ARLO12].

The energy profile of the sub-VT TamaRISC-CS processor is shown in Figure 3.8. Energy dissipation per cycle at the maximum operational frequency is shown together with fixed clock


frequencies of 2.1 kHz, 16.5 kHz, and 100 kHz. Note that the points of the three fixed-frequency traces in the plot which lie below the maximum-frequency line are only theoretical extrapolations, shown for better comparison of the energy profiles. In general, it can be observed that operating at a lower frequency than dictated by the supply voltage results in higher energy dissipation for equal work. Thus, the implementation goal is to use a supply voltage that is just barely sufficient to support the necessary clock frequency.

Figure 3.6 – Layout of the placed & routed TamaRISC-CS processor in 65 nm, including instruction and data memories (6 kbit + 8 kbit) implemented in the form of ultra-low-voltage SCMs.

The total area of the sub-VT TamaRISC-CS processor is 84.7 kGE, where 1 GE corresponds to the area of a NAND-2 gate of minimum drive strength. The instruction and data memories of the processor have sizes of 768 byte and 1024 byte, respectively. The memories occupy 84% of the overall area, whereas the core occupies the rest. The area overhead of the proposed instruction set extension for CS accounts for less than 3% of the overall area. Moreover, the introduced CS ISE does not affect the critical path of the design, and hence does not impact the maximum clock frequency or voltage scalability of the core, in comparison to the baseline design.



Figure 3.7 – Power and performance exploration of TamaRISC-CS. The power values are for operation at the indicated maximum achievable clock frequencies for a given voltage.


Figure 3.8 – Energy efficiency profile for operation of TamaRISC-CS at the maximum achievable frequency over the sub-VT range (black) and for fixed frequency operation for three different frequencies (blue).


3.2.4.3 Case Study: CS-Based ECG Signal Compression

As a case study, we apply the CS algorithm for the compression of electrocardiogram (ECG) signals, as proposed in [MKAV11]. ECG monitoring is one of the key applications for healthcare-oriented wearable WSNs utilizing deeply embedded processing cores. The ECG test case performs data compression on blocks of 512 samples, recorded at different sampling rates.

Quality of Produced Sensing Matrices Mamaghanian et al. [MKAV11] have shown that 12 non-zero elements in each column of the sensing matrix are sufficient to maintain satisfactory quality of reconstructed ECG signals for diagnostic purposes. Based on the study in [MKAV11], we group random indices into groups of 12, where each group determines the non-zero elements of the corresponding column in the sensing matrix. Assuming that there are no repeated indices in a group, the corresponding column of the sensing matrix will have only ones and zeros. However, in case of repetition of indices, the corresponding data values will accumulate, which, according to our experiments, does not lead to any quality degradation in the reconstructed signal as shown in Figure 3.9.

To ensure a good quality of diagnostic analysis on the reconstructed ECG signal, the compression performance is quantified according to the percentage root-mean-square difference (PRD) for different compression ratios [MKAV11]. PRD quantifies the percent error between the original and the reconstructed signal, where a PRD value of less than 9 is classified as "very good" or "good" quality for ECG diagnosis. Thanks to our configurable CS extension, many sensing matrices with different combinations of primitive polynomials and seeds can be constructed. These sensing matrices are analyzed by quantifying their corresponding PRD values for various compression ratios. More specifically, Figure 3.9 shows as an example the PRD values with respect to various compression ratios for one of the constructed sensing matrices, with the polynomial

p = x^13 + x^12 + x^11 + x^10 + x^9 + x^7 + x^3 + x^2 + x + 1

and the seed 0x6218 in hexadecimal notation. As seen from Figure 3.9, a PRD value just below 10 is retained for compression ratios up to almost 60%. Moreover, 50% compression is achieved with a PRD of 7.7. Similar to the state-of-the-art CS sensing matrices [MKAV11], the sensing matrices generated by our multi-step LFSR mechanism accomplish a "good" or "very good" quality of the reconstructed signals for compression ratios less than 53%.
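For reference, the PRD metric can be computed as below; this sketch uses the standard root-mean-square-difference definition (error energy relative to signal energy, in percent), which is assumed to match the metric of [MKAV11].

```python
import math

def prd(original, reconstructed):
    """Percentage root-mean-square difference between an original and a
    reconstructed signal: 100 * sqrt(sum((x - x_hat)^2) / sum(x^2)).
    PRD < 9 is classified as "very good" to "good" for ECG diagnosis."""
    num = sum((x, y) == (x, y) and (x - y) ** 2 for x, y in zip(original, reconstructed))
    den = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(num / den)
```

A perfect reconstruction yields PRD = 0, and the metric grows with the error energy regardless of its sign.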

Power vs. Performance Analysis We consider the example of 50% data compression of ECG signals, using the ECG database in [oHSC00] for stimuli generation, to analyze the power and performance of our sub-VT CS processor. The required operating frequency to support a given sampling rate for real-time ECG compression is given by f ≥ N · fs, where fs and N stand for the given sampling rate and the required average number of clock cycles to process one sample, respectively. The clock frequency of the sub-VT CS processor is always adjusted to have the


Figure 3.9 – PRD values at various compression ratios for three index sequences (sensing matrices Φ), each using a different method of construction: the preconstructed reference implementation [MKAV11], the 16-bit LFSR of this work, and true random indices from atmospheric noise.

minimum required clock frequency, according to the given ECG sampling rate. The supply voltage of the processor is then lowered accordingly.

The presented sub-VT TamaRISC-CS processor requires 8460 clock cycles to apply 50% compression to 512 samples of ECG data when the sensing matrix is constructed from 12 random indices per column (I = 12). This corresponds to an average processing time of N = 16.5 cycles per sample (16 cycles per sample + setup overhead per sample set). As a result, the sub-VT TamaRISC-CS processor must operate with a clock frequency of 2.1 kHz and 16.5 kHz for 125 Hz and 1 kHz sampling rates, respectively. Figure 3.10 shows the power consumption of the sub-VT CS processor for various ECG sampling rates. More specifically, for a 125 Hz sampling rate the sub-VT CS processor consumes only 30.6 nW in total, with 95% of the power due to leakage. For a sampling rate of 1 kHz, the total power consumption is only 74 nW, where 70.7% is due to leakage dissipation.
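The required clock frequencies follow directly from f ≥ N · fs and the measured cycle count; the short calculation below reproduces the numbers quoted above:

```python
# Required clock frequency for real-time CS compression: f >= N * f_s,
# with 8460 cycles per 512-sample block as reported for TamaRISC-CS.
cycles_per_block, block_size = 8460, 512
N = cycles_per_block / block_size        # ~16.5 cycles per sample
f_125hz = N * 125                        # -> ~2.1 kHz clock
f_1khz = N * 1000                        # -> ~16.5 kHz clock
```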

To compare our ISE-enhanced CS processor with the baseline processor, we consider the construction of the CS sensing matrix by computing random sequences of indices based on a PRNG algorithm (see Section 3.2.1.2) running on the baseline ISA of TamaRISC. Our results



Figure 3.10 – Total power consumption of TamaRISC-CS for CS-based ECG signal compression, with various ECG sampling rates.

show that the optimized implementation for the baseline core requires a significantly higher computational effort. Specifically, the total computational effort per sample in terms of cycles amounts to (10 · log2(k) + 5) · I + 5, compared to an effort of I + 4 cycles for our implementation based on the proposed CSA instruction. Hence, in the case of LFSR emulation in software, code optimized for the baseline ISA processes one ECG sample, including the sensing matrix construction, on average in N = 1025.5 cycles, which translates into a speedup of 62× for our ISE-supported implementation on TamaRISC-CS. Therefore, a sampling rate of fs = 125 Hz requires a clock frequency of 128 kHz using the pure software approach. In combination with the necessary higher supply voltage to still meet the sampling frequency target, this results in a total power consumption for the design of 355 nW (compare Figure 3.7), which is 11.6× higher than the sub-VT CS processor with ISE, where the random indices are produced with the help of the embedded LFSR. Looking at scenarios with ECG sampling rate requirements higher than 125 Hz, we see that our proposed TamaRISC-CS processor is able to stay in the sub-VT operating regime, while the baseline design is not. Already for a sampling rate of 250 Hz, a clock frequency of f ≥ 256 kHz is required without ISEs to construct the sensing matrix in software on the fly. This performance can no longer be achieved in the sub-VT regime, only in the near-threshold regime.
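The per-sample cycle formulas can be evaluated for the case-study parameters (k = 256, I = 12) to reproduce the reported speedup:

```python
import math

def sw_cycles(k, I):
    """Per-sample cycle count for software LFSR index generation on the
    baseline ISA: (10 * log2(k) + 5) * I + 5."""
    return (10 * math.log2(k) + 5) * I + 5

def ise_cycles(I):
    """Per-sample cycle count with the proposed CSA instruction: I + 4."""
    return I + 4

k, I = 256, 12
baseline = sw_cycles(k, I)    # 1025 cycles (1025.5 on average incl. setup)
with_ise = ise_cycles(I)      # 16 cycles (16.5 on average incl. setup)
speedup = 1025.5 / 16.5       # ~62x, as reported in the text
```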

Moreover, Mamaghanian et al. [MKAV11] report a code execution time of 25 ms on a different 16-bit architecture (MSP430-based Shimmer platform) with a clock frequency of 8 MHz, for applying 51% compression on a set of 512 ECG samples, where pre-computed random indices are stored in memory. This results in N = 390.5 cycles per sample, a 23.6x higher performance requirement than our ISE-supported TamaRISC implementation (TamaRISC-CS), in terms of cycle count alone.

The presented power results highlight one of the main challenges of deep sub-threshold design, which has been discussed earlier: the strong impact of leakage power for ultra-low-voltage operation. Even while keeping the processor design relatively small, with very limited memory, leakage power quickly dominates the total power consumption as soon as the clock cycle requirements for the application can be significantly reduced and the voltage is scaled accordingly. Effectively, the introduced CS ISE pushes the microprocessor into the deep sub-VT region, which does not allow taking full advantage of the strong reductions in active energy, because the high leakage does not decrease at the same rate as the voltage is scaled, as illustrated in Figure 3.10. This effect can also clearly be seen in the fact that a 62x reduction in processing cycle requirements does not result in a power reduction of at least 62x, but only of 11.6x. If the circuit could be operated efficiently in a regime where leakage power makes up only a very small part of the total power consumption, the expected power reduction through frequency reduction alone would be close to or equal to the performance speedup, not even considering additional power gains from lowering the supply using the created voltage headroom.
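The effect can be illustrated with a first-order power model: dynamic power scales with frequency (and quadratically with supply voltage), while leakage persists largely independently of frequency. All constants and operating points below are illustrative assumptions, not measured values from the design:

```python
# First-order illustration (not measured data) of why leakage limits the
# power gains from a 62x cycle-count reduction in deep sub-VT operation.

def total_power(f_hz: float, vdd: float, c_eff: float, p_leak: float) -> float:
    """P_total = C_eff * Vdd^2 * f + P_leak (all parameters illustrative)."""
    return c_eff * vdd**2 * f_hz + p_leak

# Hypothetical operating points: the baseline runs at 128 kHz at a higher
# Vdd; the ISE version runs 62x slower at a lower, deep sub-VT supply.
p_base = total_power(f_hz=128e3,      vdd=0.40, c_eff=1.7e-11, p_leak=25e-9)
p_ise  = total_power(f_hz=128e3 / 62, vdd=0.25, c_eff=1.7e-11, p_leak=25e-9)

# The ratio falls far short of 62x because P_leak dominates at low frequency.
ratio = p_base / p_ise
```

Because the leakage term is identical in both operating points, the achievable power ratio saturates well below the 62x cycle-count speedup, mirroring the 11.6x figure observed for the fabricated design.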


3.3 Comparison with State of the Art

The aim of this section is to give a brief overview of state-of-the-art embedded low-power processors and MCUs, including both the latest research and commercial designs. This overview includes a discussion of the different aspects of the implementations that make a direct comparison, e.g., in terms of raw power or energy numbers, challenging.

Table 3.4 lists different processor-based general-purpose6 low-power ICs for (deeply) embedded applications, together with their main architectural features, as well as their power and performance numbers. The table includes five popular commercial MCU products from five different vendors targeted at low-power and ultra-low-power applications, in their latest product versions. Moreover, the comparison includes two of the most advanced fabricated research ICs, aimed at pushing the limits of energy efficiency for deeply embedded computing on stand-alone WSNs. The two designs are chosen out of the many designs reported in the literature, since they achieve not only extremely low energy consumption for both active and standby modes, but do so for a full SoC design, integrating basic peripherals as well as on-chip voltage regulation, which makes them closest in terms of necessary hardware overhead to a device used in real-world applications. Although there are a number of other interesting processor designs reported in the literature which have similar (sometimes lower) active energy consumption [HSL+09, AHK+11, ISP+11, FKHO11, ISU+14, ACB+15], they do not integrate all the necessary circuitry for fully stand-alone operation, e.g., off a battery cell. For comparison, the table also includes two designs based on the contributions of this thesis, specifically the TamaRISC-CS processor presented in Section 3.2, using the introduced TamaRISC architecture, and the DynOR processor presented in Chapter 5, based on the OpenRISC architecture. While the TamaRISC-CS design is targeted at ultra-low-power operation in the sub-threshold regime, the DynOR design probably fits the least within the list of other processors, since it is targeted more towards embedded computing with higher processing requirements (hundreds of mega-operations per second (MOPS)). DynOR does however support operation at fairly low voltages, which makes the measurements provided in Table 3.4 interesting for comparison, as they turn out to be still competitive with other low-power designs.7 Note that due to the nature of the research goals that the DynOR test-IC addresses, the design does not have any circuit provisions for dedicated sleep or standby modes, which would allow retention of CPU and memory states with low power consumption.

When comparing different MCUs, one of the key aspects that makes a direct comparison challenging is the difference in the employed microprocessor architectures. From Table 3.4 we see that the CPU architectures of course differ in the data path width they employ, which can have a major performance impact, strongly depending on the targeted application.

6The term general-purpose is here used to differentiate between specialized processor-based designs which are fully or mainly application specific, such as DSPs, vector/array processors, etc.
7As indicated in Table 3.4, DynOR does not comprise on-chip voltage regulators or other more advanced features of typical MCU SoCs with extensive peripherals.


Table 3.4 – Overview of low-power microprocessors and MCUs, comparing state-of-the-art commercial products, fabricated research-ICs, and contributions of this work. Compared designs: PIC24FJxxx (Microchip) [Mic11], MSP430FR573x (Texas Instr.) [Tex16], ATSAML21x (Atmel) [Atm16], EFM32ZG110 (Silicon Labs) [Sil15], STM32L433xx (STMicro.) [STM16], Sleepwalker [BVH+13], Cricket [MSG+16], DynOR [this work, Chapter 5], and TamaRISC-CS [this work, Chapter 3]. Listed metrics: CPU architecture, data path width, cycles per instruction (CPI), non-volatile memory and data memory (SRAM) sizes [kB], special features, peripherals, on-chip voltage regulation, technology, maximum clock frequency [MHz], active power [µW/MHz], standby power [µW], minimum supply voltage [V], and best energy per operation [pJ/Op.]. (* The power and energy values for TamaRISC-CS are derived via the sub-VT profiling model of [ARLO12], and comprise the core with its integrated memories.)


While more control-oriented applications will generally not see much benefit from larger word widths, signal processing applications might take advantage of a 32-bit architecture, if the targeted algorithms require arithmetic on larger data words. However, many embedded signal processing applications can be implemented efficiently on 16-bit architectures [Dog13], often due to the limited precision of the collected sensor data, for example from a low-power ADC providing only 12 bits. Consequently, the “computational value” of each operation/instruction is almost impossible to assess if no specific application is considered.

When considering power consumption and energy efficiency, this issue of assessing computation “value” is further exacerbated by the fact that different microarchitectures exhibit different instruction throughputs in terms of how many clock cycles per instruction (CPI) are required. This is sometimes even true when comparing different microarchitecture implementations of the same ISA. However, power is mainly reported in terms of µW/MHz (or often as consumed active current in µA/MHz), and energy efficiency is often reported as pJ/cycle. How much computation can be performed for one MHz of clock frequency can however vary widely, not only due to data width and other architecture characteristics, but especially due to differences in the CPI performance. The popular MSP430 architecture is a prime example of this, since its CPI varies over the full ISA from 1 to 6, with a typical average CPI of 3-4. For this reason we compare energy efficiency in Table 3.4 in terms of pJ per operation instead of per cycle. Furthermore, it has to be noted that providing a typical energy consumption per operation can still be challenging, since this average value also depends heavily on the considered application, i.e., the instruction sequences that are executed by the processor. Arithmetic-heavy benchmarks, for example, can often consume high power, since they cause more internal switching. As an example, in the case of the ARM Cortex M0+ core utilized in [Atm16], the increase in power is 24% when running the CoreMark [EEM16] benchmark, in comparison to a while-1 loop (which effectively only executes branches and possibly NOPs). Consequently, it is important to compare energy efficiency for similar or ideally identical workload types, executed on the different cores.
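The normalization from per-cycle to per-operation figures is a simple conversion, sketched below with purely illustrative numbers (not values from Table 3.4):

```python
# Energy-efficiency normalization: a figure in µW/MHz is numerically equal
# to pJ/cycle; multiplying by the average CPI yields pJ per executed
# operation. The two cores below are hypothetical.

def pj_per_op(uw_per_mhz: float, avg_cpi: float) -> float:
    """Convert active power in µW/MHz to energy per operation in pJ/op."""
    pj_per_cycle = uw_per_mhz  # 1 µW/MHz == 1 µJ/Mcycle == 1 pJ/cycle
    return pj_per_cycle * avg_cpi

# Identical µW/MHz ratings, very different energy per unit of work:
core_a = pj_per_op(uw_per_mhz=100.0, avg_cpi=1.0)  # e.g., a 1-CPI RISC
core_b = pj_per_op(uw_per_mhz=100.0, avg_cpi=3.5)  # e.g., an MSP430-class CPI
```

Two cores with identical datasheet µW/MHz ratings thus differ by a factor of the CPI ratio in energy per operation, which is why Table 3.4 normalizes to pJ/Op.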

Finally, standby power can be an important measure for deeply embedded devices: although on-node processing is becoming increasingly important, there are still applications where the device spends more than 99% or even 99.9% of the time in some form of sleep or standby mode. This is especially true for circuits which are not able to operate at sub-threshold voltages (e.g., due to the chosen memory system), where the aim is to operate as efficiently and typically as fast as possible during the active phase, which is then followed by a long sleep phase. Furthermore, it is often important not only that extremely low-power standby modes exist, but also what types of functionality the MCU still provides during operation in such modes, most importantly whether state retention of work memories and CPU is possible with low power consumption. Other aspects that can influence the overall power requirements, depending on the application, are the energy costs for read and write accesses to the embedded non-volatile memories, which offer the possibility of full power gating for increased battery lifetimes in applications with very short duty cycles.
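The weight of standby power in such duty-cycled scenarios follows directly from the average-power equation P_avg = d·P_active + (1 - d)·P_standby; the numbers below are illustrative assumptions:

```python
# Why standby power matters: for a node that sleeps 99.9% of the time,
# the standby figure can dominate the average power. Values illustrative.

def average_power(p_active_uw: float, p_standby_uw: float, duty: float) -> float:
    """P_avg = duty * P_active + (1 - duty) * P_standby, duty in [0, 1]."""
    return duty * p_active_uw + (1.0 - duty) * p_standby_uw

# A hypothetical MCU: 200 µW active, 0.5 µW standby, 0.1% duty cycle.
p_avg = average_power(p_active_uw=200.0, p_standby_uw=0.5, duty=0.001)
# Active contribution: 0.2 µW; standby contribution: ~0.4995 µW — standby
# dominates despite being 400x lower in power than the active mode.
```

This is why the standby column of Table 3.4 can matter more than the active-power column for applications with very short duty cycles.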


3.4 Multi-Core Processing

Multi-core processing is typically associated with systems of relatively high performance, such as general-purpose laptop, desktop, or server CPUs, and in recent years application processors used in smartphones and tablets. Nevertheless, it is possible to successfully apply multi-core processing to low-power WSN and IoT type systems as well, especially if the targeted signal processing applications contain significant parallelism. This is particularly the case for biomedical signal analysis, which often operates on multiple data sources, i.e., parallel channels of sampled data, such as multiple leads of an ECG or electroencephalogram (EEG).

This section discusses multi-core processing in the context of low-power sensing platforms with non-negligible on-node processing requirements (> 1 MOPS). Specifically, we investigate the energy benefits that can be obtained from efficient memory organization in such multi-core architectures, on the microarchitectural level. This concerns both extensions to the employed processing cores as well as some core-external improvements of the memory architecture. In detail, extensions and microarchitectural support for configurable data memory mapping, instruction broadcast, and synchronized code execution are proposed in the following subsections.

3.4.1 Comparison of Single-Core and Multi-Core Processing

Before comparing the power and performance characteristics of single-core and multi-core processing, let us first define two reference single- and multi-core architectures that employ TamaRISC as the processing core [DCA+12].

Figure 3.11a shows a single-core architecture, which comprises a processing unit holding the core and an instruction memory, and a multi-bank data memory connected over a selection logic. The core requires a dual-port data memory, since it needs to be able to perform a read and a write access within the same clock cycle. In order to reduce memory access power, the data memory is built out of multiple banks of dual-port SRAMs. The simple selection logic connects read and write ports separately to the core, generating appropriate chip select signals for the SRAM banks out of the higher address bits.

Figure 3.11b details a multiple instruction, multiple data (MIMD) multi-core architecture, based on the same processing units. N processing units are connected via a crossbar interconnect to M data memory banks, consisting of dual-port SRAMs. The crossbar is a mesh-of-trees interconnection network [RLKB11] to support high-performance communication between the cores and the shared data memory. In the case where two or more cores request access to the same memory port (read or write) on the same SRAM bank, an access collision occurs, and an arbiter stalls all but one of the cores. To reduce possible memory access conflicts, the number of banks M is chosen as M = 2N. For example, using 4 kB SRAM banks, a multi-core architecture with 8 processing units comprises 64 kB of tightly-coupled, shared data memory.
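The collision-and-stall behavior of the arbiter can be sketched as follows; the fixed-priority grant order is an illustrative assumption, and the broadcast optimization introduced later in Section 3.4.2.2 is not modeled here:

```python
# Toy model of per-cycle arbitration on one port of the shared multi-bank
# data memory: if several cores request the same bank, one is granted and
# the rest stall. Grant priority (lowest core index wins) is an assumption.

from collections import defaultdict

def arbitrate(requests: dict[int, int]) -> tuple[set[int], set[int]]:
    """requests: core -> requested bank. Returns (granted, stalled) cores."""
    by_bank = defaultdict(list)
    for core, bank in requests.items():
        by_bank[bank].append(core)
    granted, stalled = set(), set()
    for cores in by_bank.values():
        cores.sort()              # fixed-priority arbitration (assumption)
        granted.add(cores[0])
        stalled.update(cores[1:])
    return granted, stalled

# 4 cores in one cycle; cores 0 and 2 collide on bank 5:
g, s = arbitrate({0: 5, 1: 3, 2: 5, 3: 7})
```

Choosing M = 2N banks reduces the probability of two cores landing in the same bank in the first place, which is exactly what this arbitration penalizes.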



Figure 3.11 – (a) Single-core architecture and (b) multi-core architecture, both utilizing a multi-banked data memory system.

Considering embarrassingly parallel applications, such as multi-lead biosignal analysis, i.e., applications which have a number of parallel threads or tasks that can largely work independently from each other, multi-core processing can be exploited to significantly increase energy efficiency. The key observation when comparing single-core and multi-core architectures in this context is that N cores allow for processing of an N-fold higher workload compared to a single-core architecture at the same frequency. Consequently, for an equal workload the multi-core architecture allows for voltage-frequency down-scaling, providing lower power consumption. As long as the circuits are still operated in a regime where leakage power is insignificant (i.e., above- or near-threshold, and at reasonably high clock frequencies), the lower active power consumption is not offset by the increased leakage of the multi-core architecture, and we obtain an overall better energy efficiency.
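This trade-off can be sketched with a toy model: for a workload W, N cores each run at W/N and can use a lower supply voltage, but contribute N-fold leakage. The voltage-frequency relation and all constants below are illustrative assumptions, not data from [DCA+12]:

```python
# Toy single- vs multi-core power model for an equal workload (ops/s).
# All constants and the Vdd(f) relation are illustrative assumptions.

def core_power(f: float, vdd: float, c_eff: float, p_leak: float) -> float:
    return c_eff * vdd**2 * f + p_leak

def vdd_for_freq(f: float) -> float:
    """Assumed monotone voltage-frequency relation with a 0.5 V floor."""
    return max(0.5, 0.5 + 0.7 * f / 200e6)

def platform_power(workload: float, n_cores: int,
                   c_eff: float = 1e-10, p_leak: float = 1e-5) -> float:
    f = workload / n_cores          # per-core clock frequency
    v = vdd_for_freq(f)
    return n_cores * core_power(f, v, c_eff, p_leak)

# At a high workload the 8-core platform wins (lower Vdd per core);
# at a very low workload its 8-fold leakage makes it lose.
high = (platform_power(160e6, 8), platform_power(160e6, 1))
low  = (platform_power(0.5e6, 8), platform_power(0.5e6, 1))
```

The crossover between the two regimes is the qualitative counterpart of the 1.7 MOPS break-even point reported below for the actual architectures.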

As we show in [DCA+12], based on the extensive PhD work of Ahmed Dogan [Dog13], multi-core architectures can outperform single-core architectures in terms of energy efficiency for workloads starting in the range of 1-2 MOPS, while at the same time supporting higher maximum workloads of hundreds of MOPS. Figure 3.12 details this result, with a comparison of single-core and multi-core behavior in terms of power consumption for given workloads. The multi-core architecture here comprises 8 identical cores, with the same amount of shared data memory as in the single-core configuration (64 kB). The point where single-core and multi-core power consumption is equal is reached at 0.03 mW for a throughput of 1.7 MOPS. Note that both architectures, implemented in a low-leakage 90 nm CMOS technology, operate here at their minimum supply voltage of 0.5 V, which was chosen in the near-threshold regime for both architectures. At the peak performance of the single-core (167 MOPS), the multi-core provides a reduction in power consumption of 62%. Moreover, the multi-core architecture is able to provide a significantly higher peak performance of 750 MOPS, supporting a wider range of applications.

Figure 3.12 – Comparison of single-core and multi-core power consumption at given workload requirements [DCA+12], utilizing TamaRISCs as the processing cores. The workload operating ranges are indicated for the following biosignal analysis applications: compressed sensing (CS), morphological filtering, multiscale morphological derivatives (MMD), and discrete wavelet transform (DWT).

3.4.2 Efficient Memory Organization

As shown for example in [DCA+12] and as summarized in Section 3.4.1, multi-core processing can provide significant energy benefits due to the ability to provide comparable throughput at scaled voltage with reduced clock frequency. Multi-core processing however provides additional opportunities for energy reductions, since the memory architecture can be optimized with the targeted embarrassingly parallel applications in mind.

Considering a multi-core architecture as depicted in Figure 3.11b, we observe that embarrassingly parallel applications lend themselves to be optimized regarding their memory access patterns, both for the instruction memory and the data memory. The following subsections introduce a number of optimization techniques regarding efficient memory organization, which improve performance and/or provide energy savings, as developed together with my colleague Ahmed Dogan. Complementary to [Dog13], the focus of the following lies more on the general architectural concepts, and the extensions that are involved on the microprocessor core side.

3.4.2.1 Configurable Data Memory Mapping

In a software implementation of a parallel signal processing algorithm on an embedded multi-core architecture, we differentiate between two types of data memory usage by the multiple cores. On the one hand, we have private data, which typically comprises some form of working buffer and, for example, the data on the stack. This private data is hence typically only accessed by the corresponding core. On the other hand, we have shared data, which is used by all cores. This can include either tokens or complete buffers for inter-core communication, but also globally shared data that is required by each core when executing the same algorithm on different input data sets. An example of such shared data is a code book used in compression algorithms, which can consume a considerable amount of the limited total data memory. To conserve precious memory space, such data should not be replicated carelessly in the form of private copies (if that is even possible), but should rather exist as one copy that is accessible from all cores at the same time.

As a result, the aim is to partition the data memory address space such that one region is reserved and optimized in terms of access bandwidth for shared data, while N other regions are each private to the respective cores. While this approach seems straightforward to implement, problems arise due to possible access conflicts that can occur depending on how the memory address space is mapped to the set of parallel SRAM banks that constitute the data memory. For purely private access, the optimal organization is to address each SRAM bank linearly, from its physical address 0 to its full capacity, before addressing the next SRAM bank. Subsequent data words in the continuous logical private address space allocated to a core then always map to subsequent data words in the same dedicated bank of the physical address space (except for crossings from one bank to the next), which avoids access collisions as long as the individual cores confine their accesses to their private memory regions. In an architecture with M = 2N banks, each of the N cores would hence receive a private memory region of the size of 2 SRAM banks, which can always be accessed contention free. For shared memory access however, the optimal organization is word-by-word interleaving of the logical address space over all M banks of the corresponding physical address space. As a result, consecutive data words always map to different banks (modulo M), which reduces access conflicts of multiple cores that access the same shared data.
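The two address-to-bank mappings can be sketched as follows, assuming 32-bit (4-byte) words and illustrative bank parameters:

```python
# Address-to-bank mapping for an M-bank data memory with 4-byte words.
# Linear mapping keeps consecutive words in one bank (good for private
# data); interleaved mapping spreads them across banks modulo M (good for
# shared data). Bank size and count below are illustrative.

WORD = 4  # bytes per data word

def bank_linear(addr: int, bank_bytes: int) -> int:
    """Non-interleaved: fill each bank completely before the next one."""
    return addr // bank_bytes

def bank_interleaved(addr: int, m_banks: int) -> int:
    """Word-by-word interleaving: consecutive words land in different banks."""
    return (addr // WORD) % m_banks

# Four consecutive 32-bit words at byte addresses 0x00, 0x04, 0x08, 0x0C:
linear = [bank_linear(a, 4096) for a in (0x00, 0x04, 0x08, 0x0C)]  # one bank
inter  = [bank_interleaved(a, 8) for a in (0x00, 0x04, 0x08, 0x0C)]  # spread
```

The conflict discussed next arises precisely because a single physical memory cannot use both mappings at once without additional translation hardware.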

There is hence a conflict regarding which mapping to choose when both types of data must be supported. Opting for a non-interleaved/linear approach can significantly reduce performance if the code contains many accesses to shared data, which, for example, all target a large array that is mapped to a single bank. All cores will then compete for access to this bank, resulting in many stall cycles. A fully interleaved mapping alleviates this problem; however, it makes private access suboptimal, since even though other cores do not access the same data, they might access the same bank in the same cycle. The performance that is lost due to these conflicts can be significant, since private memory access is typically the most frequent type.

Figure 3.13 – Illustration of the configurable data memory mapping mechanism, providing improved access for a combination of shared and private data in a multi-core architecture.

To address this inefficiency, we propose a data memory mapping mechanism that allows combining both interleaved and non-interleaved mappings within the same memory architecture. To this end, extremely light-weight memory management units (MMUs)8 are introduced to the processor, which allow an address translation from a logical address (used by the program code) to a physical address that is used on the memory bus to access the data memory banks over the crossbar interconnect. This dynamic translation of the memory addresses allows defining a shared section of freely configurable size at the beginning of the address space, with the remaining space divided up into private sections for the individual cores. In our memory architecture the default mapping is chosen to be interleaved banks, which results in a physical address format as illustrated in the top right of Figure 3.14. Moreover, we address the issue of address space holes that would occur when a subset of the SRAM banks is powered down (power gated) in order to reduce the power consumption for applications with reduced memory requirements.

Figure 3.13 details an example configuration for the proposed dynamic address mapping, on a 4-core system with an 8-bank data memory. Here, 4 of the 8 SRAM banks, each of size 4 kB, are powered down via control of the corresponding power islands. The remaining 4 banks are divided into 5 sections: one shared section of size 4 kB, interleaved over 4 banks, for data that is accessed by all cores, and 4 separate core-private sections of size 3 kB each, mapped onto separate banks. The boundary between the shared section/region and the remaining address space can be dynamically configured9, which allows for the memory partitioning to

8The term MMU is chosen here, since it is the most fitting, regarding the performed translation of logical/virtual to physical addresses. However, it has to be noted that the introduced units that perform this translation are not comparable in terms of complexity and hardware overhead with a traditional MMU of a general-purpose CPU supporting classical address space virtualization, with the help of page tables, TLBs, and so forth.
9The boundary must lie on an address that is a power of 2.



Figure 3.14 – Translation of virtual to physical addresses using private & continuous address space translation within the MMU.

be optimized for the currently executing application.

The details of the address translation mechanism are illustrated in Figure 3.14. The MMU first decides if a private address space translation is necessary, by checking whether the accessed virtual address lies within the shared section or not. Only if the address falls beyond the boundary marking the start of the private region does the MMU perform the private address space translation, which places the uppermost address bits (of the relevant part of the address) onto the bits that select the physical memory bank, to achieve a non-interleaved memory mapping. The remaining lower bits of the address are consequently shifted up to form the new address, while the two lowest bits are always kept in place, since they address the individual bytes within the 32-bit data words considered for this example. The MMU then additionally performs a continuous address space translation, to ensure a fully continuous address space regarding the virtual addresses that the application uses, as illustrated in Figure 3.13. Note that no continuous address space translation is necessary if all memory banks are active.
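The private address space translation can be sketched as follows for the example parameters of Figure 3.14 (16-bit byte addresses, 32-bit words, 8 banks, word-interleaved physical addressing so that the bank is selected by the low bits of the word index). The exact field positions and the boundary check are simplified assumptions, and the continuous address space translation for powered-down banks is omitted:

```python
# Sketch of the light-weight MMU's private address space translation.
# Assumptions: 16-bit addresses, 2 byte-offset bits (32-bit words), 8 banks,
# bank = (word index) mod 8 in the default interleaved physical mapping.

BYTE_BITS = 2                     # 32-bit data words
BANK_BITS = 3                     # 8 banks
WORD_BITS = 16 - BYTE_BITS        # width of the word index

def translate(logical: int, shared_limit: int) -> int:
    """Below the shared boundary the address passes through (interleaved
    mapping); above it, the uppermost word-index bits are moved into the
    bank-select field, so consecutive private words stay in one bank."""
    if logical < shared_limit:
        return logical
    word = logical >> BYTE_BITS
    top = word >> (WORD_BITS - BANK_BITS)                # bits -> bank select
    rest = word & ((1 << (WORD_BITS - BANK_BITS)) - 1)   # bits shifted up
    phys_word = (rest << BANK_BITS) | top
    return (phys_word << BYTE_BITS) | (logical & ((1 << BYTE_BITS) - 1))
```

With a 4 kB shared section (boundary at 0x1000), shared accesses remain interleaved across banks, while consecutive words of a private region map to consecutive rows of a single bank.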

The integration of the MMUs into a processing core is shown in Figure 3.15. There are two separate MMUs required per core, one for the read address translation and one for the write address translation. Since both read and write accesses need to be supported within the same cycle for the considered core, the MMU hardware cannot be shared. Both MMUs are configurable via a single configuration register, which holds information on the size of the shared data section (region boundary) and the number of active memory banks. The section size information is utilized for the conditional private address space translation, while the active bank count controls the way the continuous address space translation is performed. Note that these configuration values need to be kept in sync across all cores of the multi-core architecture, which can be ensured either via software (e.g., during task initialization), or via a hardware mechanism that implements a single memory-mapped peripheral register within the full multi-core architecture, providing the same configuration value to all cores at once.

Figure 3.15 – Integration of MMUs into the core microarchitecture, to enable configurable data memory mapping.

The presented configurable data memory mapping mechanism has been successfully employed in an 8-core 16-bit architecture based on TamaRISC and aimed at ultra-low-power wearable health monitoring systems [DCR+12], as well as in an energy efficient 32-bit near-sensor computing platform [RPL+16] (named PULPv2). PULPv2 is a full SoC implementation in 28 nm FD-SOI of an ultra-low-power parallel computing platform, which utilizes a tightly coupled data memory architecture for its cluster of four cores. This data memory architecture is in general similar to the presented banking mechanism (with some further extensions), and employs the discussed light-weight MMUs, enabling not only performance improvements due to reduced access conflicts, but also power gating and in turn leakage reduction for memory banks that are currently not utilized.

3.4.2.2 Data and Instruction Broadcast

Let us consider the ideal case of an embarrassingly parallel application, where N identical programs/tasks operating on different input data are executed in parallel on N cores. If any potential private memory access conflicts are eliminated utilizing the techniques presented in Section 3.4.2.1, the programs on the N cores run in sync, i.e., each core is executing the same instruction in parallel, until they reach a memory access to shared data.10 Since all cores access the same memory bank at the same time, all but one of them have to be stalled. However, since all cores access not only the same bank, but also the same word within the same bank, this problem can be addressed. In the case of a read access, the data word

10This includes the assumption for now that the application code does not perform branches which depend on the input data (e.g., sensor data). This limitation is addressed in Section 3.4.2.3.

101 Chapter 3. Microprocessor Design for Deeply Embedded Ultra-Low-Power Processing


Figure 3.16 – Multi-core architecture with shared instruction and data memories, each connected over a crossbar with broadcast support. The architecture is capable of operating both in standard MIMD, as well as in SIMD mode.

can be read from the memory bank, and then broadcast to all cores. In practice this means that the read access is accepted by the arbiter, and the read result is put on the data bus as normal. Any cores reading this same data word are consequently not stalled, but can continue operation uninterrupted. In the case of a write access to the same word by multiple cores (which should not occur if the system does not include any provisions for atomic memory operations, such as read-modify-write), any one of the provided values can be written, and the others discarded.

The data memory broadcast feature can be implemented via comparators on the address buses of the crossbar interconnect. When matching addresses are detected for accesses to the same bank, the accesses can be granted for all matching requests in the same cycle. Note that this broadcasting also reduces the number of accesses to the data memory, and hence saves energy. More importantly, it now in theory allows the application to be executed completely in lockstep over all cores.
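A behavioral sketch of this comparator-based broadcast grant, for the set of requests that target one bank in one cycle, could look as follows (the data structures and function name are hypothetical, for illustration only):

```python
# Broadcast-capable arbitration step for one data memory bank:
# requests whose addresses match the granted request are served in the
# same cycle (read broadcast); all others are stalled.

def arbitrate_with_broadcast(requests):
    """requests: dict core_id -> word address (all targeting this bank).
    Returns (granted_cores, stalled_cores)."""
    if not requests:
        return set(), set()
    # Pick one request as the winner (here: lowest core id).
    winner = min(requests)
    win_addr = requests[winner]
    # Address comparators: every request matching the winner's address
    # is granted in the same cycle -> one bank access, result broadcast.
    granted = {c for c, a in requests.items() if a == win_addr}
    stalled = set(requests) - granted
    return granted, stalled
```

If all cores read the same word, no core is stalled at all, which is exactly the lockstep-enabling property discussed above.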

To further extend the concept of broadcasting of concurrent memory accesses, we enhance the multi-core architecture presented in Figure 3.11b with an additional crossbar interconnect on the instruction memory side. Figure 3.16 depicts the enhanced multi-core architecture, which removes the private instruction memory of each core, and replaces it with a multi-bank instruction memory shared between all cores. The instruction memory can now be operated either in interleaved or banked (non-interleaved) mode, similarly to the data memory, as discussed in Section 3.4.2.1. The architecture can be used in classic MIMD mode as before, with the instruction memory partitioned in a banked fashion, guaranteeing one private bank of instruction memory per core. However, the strength of the architecture is that it can also be operated in single instruction, multiple data (SIMD) mode, where all (or a subset of) cores are executing the same task, operating on different private data. When operating in SIMD

mode, the instruction memory organization can also be chosen as banked, as in the MIMD case, together with instruction broadcasting that works analogously to the presented data broadcasting. This again allows instruction memory banks that are not utilized by the current application to be power gated. Moreover, note that the potential maximum program size for SIMD applications that utilize instruction broadcasting is increased by a factor of N, in comparison to MIMD operation with private instruction banks. Note that an interleaved organization of the instruction memory can also be useful for both MIMD and SIMD operation, where all cores still run the same program (which now can exceed the size of a single bank), but lockstep operation cannot be guaranteed.

Besides providing more possibilities for applications with larger program memory requirements, the main issue addressed by this enhanced MIMD/SIMD architecture with shared instruction memory is the overall energy consumption of instruction memory accesses. As was shown in [DCR+12], in the multi-core architecture depicted in Figure 3.11b the instruction memories have a 56% share of the total power consumption of the architecture. This can be reduced to a share of only 13% of the total consumption with the use of the enhanced architecture operating in SIMD mode with broadcasting and banked partitioning of the instruction memory. Moreover, the total power consumption is reduced by 40%, which can mainly be attributed to the significant savings in instruction memory energy consumption.

One aspect that has not been discussed yet, regarding the proposed SIMD mode, is how private data memory accesses can be handled correctly. Since all cores are now executing not only the same program, but the exact same binary code, it is no longer possible to access core-private data structures (such as the stack, or a sensing buffer) via differences in the compiled software. While on the standard MIMD architecture each core can have its own version of the binary code, with the correct memory addresses for its private data accesses embedded during linking, in SIMD mode a single binary must provide the same functionality for all cores. This problem is solved via an extension to the MMU mechanism presented in Section 3.4.2.1. Both MMUs depicted in Figure 3.15 are extended such that they can provide core-private data memory access during SIMD operation.

To this end, a unique id from 0 to N−1, which can be read internally, is assigned to each of the N cores. This id is used during private address space translation, and inserted as hardcoded bank bits (compare Figure 3.14). The single program binary which is used for SIMD operation is then compiled as if it were executed on the core with id 0 (or any other core id for that matter). This means that for this binary the higher (relevant) address bits are always 0, if equally sized private memory segments are used per core (which is required for SIMD operation). These zero bits are then replaced during address translation with the core id. In essence, the SIMD mode with broadcasting can hence be supported at the microarchitectural level by a very minor addition to the already present MMU inside the processor core.
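The translation step can be sketched as follows; the segment size and bit widths are illustrative assumptions, not those of the actual implementation.

```python
# Sketch of the SIMD private-address translation: the binary is linked
# as if running on core 0, so the bank-select bits of private addresses
# are zero; the MMU replaces them with the hardcoded core id.

BANK_BITS = 3       # 8 cores -> 3 bank-select bits (assumed)
OFFSET_BITS = 9     # assumed size of each private segment, in words

def translate_private(addr, core_id):
    # The compiled binary leaves the high (bank) bits at zero ...
    assert (addr >> OFFSET_BITS) == 0, "binary must be linked for core 0"
    # ... and the MMU inserts the core id as hardcoded bank bits.
    return (core_id << OFFSET_BITS) | addr
```

The same binary address thus maps to a different physical word on every core, so a single SIMD binary can use core-private stacks and buffers.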


3.4.2.3 Synchronized Code Execution

Since SIMD operation as presented in Section 3.4.2.2 can provide significant energy gains, mainly through coordinated instruction memory access with broadcasting, the aim is to maximize the amount of code that can be executed in lockstep. To this end, we propose in [DBC+13] a synchronization mechanism that keeps the cores in sync as much as possible in the general case.

Issues arise when conditional branches are taken that depend on the input data of the algorithm. This is the case for signal processing applications that go beyond basic, fixed transformations of the signal data and include other operations such as sorting, or threshold-based iterations. As shown in [DBC+13], it is however often possible to re-synchronize applications at predefined checkpoints, where lockstep operation can be resumed for all cores.

The idea is to annotate the source code of the application with such checkpoints, specifically a combination of check-in and check-out markers, which enclose the data-dependent sections of the code. When a core passes a check-in point, it marks its identity and increments a counter in a specific shared data structure that is assigned to this section. Once a core reaches the check-out point for this section, it decrements the counter and automatically goes to sleep, until all the cores that are part of this section have reached the corresponding checkpoint. When the last core reaches the check-out point, all cores that have traversed this section of code are woken up to resume execution of the subsequent code in lockstep. The core identities are stored via single bits that are set within a byte for an 8-core architecture. A second byte is utilized to hold the counter for the check-in/check-out pair. These two bytes form the data structure that is required per annotated data-dependent code section. An array of these 16-bit values is stored at a fixed address in data memory, namely the base address of the synchronization markers.
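A minimal sketch of this 16-bit marker, assuming the core bitmask occupies the high byte and the counter the low byte (the actual byte order is an assumption not specified above):

```python
# Hypothetical layout of one synchronization marker for an 8-core
# architecture: high byte = bitmask of cores that checked in,
# low byte = check-in counter.

def check_in(marker, core_id):
    mask, count = marker >> 8, marker & 0xFF
    mask |= 1 << core_id                    # record the core's identity
    return (mask << 8) | (count + 1)

def check_out(marker):
    mask, count = marker >> 8, marker & 0xFF
    count -= 1
    wake_mask = mask if count == 0 else 0   # last core out wakes all marked cores
    return ((mask << 8) | count, wake_mask)
```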

In theory, such a mechanism could be implemented purely in software11, with the application on each core modifying the described synchronization marker data structures as necessary. There are however two issues that need to be addressed to make this mechanism efficient. The first is the required overhead in terms of instructions and cycles for passing a check-in or check-out point. The second is the issue of multiple cores reaching the same checkpoint at the same time. To address both these aspects, we propose the use of two dedicated ISEs, together with a read-modify-write extension for the data memory system, here implemented for the TamaRISC architecture:

• SINC #index: This instruction is used for a check-in point in the code. It marks the entry for the predefined code section with the id #index. The ISE introduces a new dedicated special register, which holds the base address of the synchronization markers. The correct data structure is hence automatically addressed by combining the id with

11In conjunction with a dedicated sleep instruction and interrupts.


the base address. The instruction fetches the 16-bit marker, and forwards it to a core-external synchronizer unit, which sets the appropriate bit to identify the core, and increments the counter embedded in the marker. If multiple cores execute the same SINC instruction in the same cycle, the synchronizer unit is capable of merging the separate increment requests correctly. The modified marker is then written to the data memory. This happens in an atomic fashion via a lock signal that the data crossbar receives from the core. This bus lock signal is asserted for two clock cycles, while the instruction resides in the decode and execute stages.

• SDEC #index: The instruction works analogously to the SINC instruction, but marks the exit (check-out point) of the predefined code section with the id #index. The ISE consequently decrements the counter of the respective synchronization marker. Furthermore, the core automatically enters sleep mode upon execution of the SDEC instruction. If an SDEC instruction causes the synchronizer unit to decrement the counter down to zero, a wakeup signal is asserted by the synchronizer unit for all cores that are indicated in the marker.
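Putting the two instructions together, the behavior of the core-external synchronizer unit can be modeled as below. This is a behavioral sketch only (class and method names are invented); in particular, the merging of simultaneous SINC requests and the atomic bus-locked read-modify-write are modeled as single Python calls.

```python
# Behavioral model of the synchronizer unit: several cores executing
# the same SINC in the same cycle are merged into one atomic update of
# the marker; the last SDEC wakes every core recorded in the bitmask.

class Synchronizer:
    def __init__(self, n_sections):
        self.markers = [(0, 0)] * n_sections   # (core bitmask, counter)

    def sinc(self, index, core_ids):
        """Merged check-in of possibly several cores in one cycle."""
        mask, count = self.markers[index]
        for c in core_ids:
            mask |= 1 << c
        self.markers[index] = (mask, count + len(core_ids))

    def sdec(self, index, core_id):
        """Check-out; returns the set of cores to wake (empty if not last)."""
        mask, count = self.markers[index]
        count -= 1
        self.markers[index] = (mask, count)
        if count == 0:
            self.markers[index] = (0, 0)        # reset marker for reuse
            return {c for c in range(8) if mask & (1 << c)}
        return set()
```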

As a result of the presented hybrid hardware/software code synchronization technique, biosignal analysis applications that exhibit considerable data-dependent code sections can be accelerated by 1.85x–2.38x, depending on the application, over an architecture running in SIMD mode without synchronization. This reduction in memory conflicts, and consequently in the number of stalls, can be leveraged to further reduce power consumption by 50–64%, for equal workload requirements [DBC+13, Dog13].


4 Dynamic Timing Margins in Embedded Microprocessors

As has been illustrated in Section 1.2, Chapter 2, and Chapter 3, processing performance and energy consumption of a digital circuit go hand in hand. This means in particular that any increase in potential processing performance can typically be traded off for energy savings through voltage/frequency scaling. Power benefits therefore arise from circuits that are optimized towards the most efficient use of their processing cycles.

When operating a synchronous circuit, its maximum fixed clock frequency — i.e., its performance — is determined by the maximum path delay that can occur between any two sequential elements in the circuit, for a given set of operating conditions. This maximum delay depends on many factors, and is also subject to variations from a host of different sources. While classically, during the design of a sequential circuit, the main focus from a timing perspective is put on the worst possible path/delay, we note that a dynamic or statistical view of the circuit timing can open up many opportunities. Starting from the main observation that maximum delay paths are not excited in every clock cycle, we investigate in this chapter the dynamic changes in the timing of synchronous circuits, and in particular of embedded microprocessors. In contrast to the extensive recent work in the literature regarding mitigation of process, voltage, and temperature (PVT) variation effects on timing, we focus here on the dynamic effects on timing due to the information that is processed by a circuit, i.e., its changes in state from one cycle to the next.

This chapter starts with a discussion of timing margins in synchronous circuits in Section 4.1. The discussion provides an overview of variation effects and related mitigation techniques from the literature, and then continues to introduce state-dependent dynamic timing margins, and our dynamic timing analysis (DTA) method. Section 4.2 afterwards focuses on DTA specifically for microprocessors. In Section 4.3, DTA is applied to enable detailed characterization of application behavior under frequency/voltage over-scaling, with a case study for an OpenRISC microprocessor.

In the subsequent Chapter 5 we then demonstrate a way to exploit uncovered dynamic timing margins for both performance gains and energy savings in a microprocessor.



Figure 4.1 – Timing diagram, illustrating a timing margin, critical path delay, and timing violation.

4.1 Timing Margins in Synchronous Circuits

We define the timing margin1 of a signal within a clock cycle to be the difference between the latest time at which the signal at the input of a sequential element is allowed to change without causing a timing violation and the time at which the signal actually settles to a stable value. This margin is illustrated in the timing diagram shown in Figure 4.1. Within a clock cycle of a positive-edge-triggered circuit, the latest time at which the data signal is allowed to change is the clock period minus the setup time of the input of the sequential element (e.g., a register or memory) that samples the data signal. Let us define the input of any sequential element of the circuit as an endpoint. The timing margin at a specific endpoint in a clock cycle hence depends on the delay of the longest excited path to that endpoint, which can vary from cycle to cycle. The critical path of the circuit when excited exhibits the longest possible delay, bringing the timing margin for that endpoint down to zero, and therefore determines the minimum clock period. If a path encounters a degradation in its delay (or the clock period is chosen too short/aggressive), a timing violation can occur, due to the data signal settling too late, and thereby violating the setup time. Such a timing violation can result in functional failure, since either a wrong value is sampled at the endpoint, or the sequential element becomes metastable, i.e., its output value does not settle to a stable logic level of 0 or 1 within the defined maximum propagation delay of the element, causing potentially unbounded delays

1This timing margin is often also referred to as slack.


Table 4.1 – Classification of variations, regarding their temporal rate of change and spatial reach (table cited with minor adaptations from [BDS+11, DBW15]).

Global (spatial reach):

• Static, extremely slow: inter-die process variations; life-time degradation (NBTI, TDDB)

• Dynamic, slow-changing: package / die Vdd fluctuations; ambient temperature variations

• Dynamic, fast-changing: PLL jitter; on-chip supply noise; IR drop; L di/dt noise

Local (spatial reach):

• Static, extremely slow: intra-die process variations

• Dynamic, slow-changing: temperature hot-spots

• Dynamic, fast-changing: capacitive coupling; local IR drop; clock-tree jitter

for the subsequent paths.
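The timing-margin definition above amounts to a simple arithmetic relation, sketched below; the numbers in the example are illustrative, not measured values.

```python
# Timing margin at an endpoint: the latest allowed settling time
# (clock period minus setup time) minus the cycle's actual settling
# time. A negative result corresponds to a setup-time violation.

def timing_margin(t_clk, t_su, t_settle):
    return (t_clk - t_su) - t_settle

# Illustrative values in ns: 2.0 ns period, 0.1 ns setup time,
# data settles 1.5 ns after the clock edge -> 0.4 ns of margin.
```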

The following Section 4.1.1 gives an overview of static and dynamic variations, which cause degradation in the path delays of a circuit and, in turn, can lead to timing errors/violations. Moreover, classical and emerging approaches to cope with these variations are discussed. In Section 4.1.2 we then change perspective and look at dynamic timing variations that emerge due to the different states that a sequential circuit traverses. This is followed in Section 4.1.3 by the introduction of an analysis method to characterize these dynamic timing margins.

4.1.1 Variations, Guardbanding, and Mitigation Techniques

Variations in integrated circuits and their underlying CMOS devices are manifold, directly affecting both power consumption and circuit timing. However, with degraded circuit timing, power efficiency also suffers indirectly as a consequence. Since the 65 nm technology node, the major contributor to variations has been parametric process variation, causing a strong shift in design practices towards more variation-aware design, to meet power, speed, and reliability constraints [GR10, ARA11, KKT13].

Besides process parameters, two other main factors contribute to performance variations, namely voltage and temperature. An overview and classification of PVT variations regarding their temporal rate of change and spatial reach is provided in Table 4.1 (cited with minor adaptations from [BDS+11, DBW15]). The two main categories are static and dynamic variations. Static variations are variations that are either fixed for a given fabricated die, i.e., process

variations, or only change extremely slowly, such as lifetime degradation due to negative-bias temperature instability (NBTI) or time-dependent dielectric breakdown (TDDB), and are hence considered static during operation. Dynamic variations on the other hand affect the circuit at runtime, and can be grouped into slow-changing and fast-changing variations. Slow-changing variations cause changes in transistor behavior over many cycles (ms to µs range), while fast-changing variations affect transistor performance within a single cycle (ns to ps range). Temperature effects are generally categorized as slow-changing variations, together with die-external voltage supply fluctuations. Fast-changing variations include noise effects on the on-chip power delivery network, e.g., due to supply ripple of integrated DC-DC converters, as well as IR drop and L di/dt noise effects [Zhu04]. Clock distribution related jitter and capacitive coupling effects are additional very fast-changing variations, which can cause significant timing degradation. Variations can furthermore be distinguished regarding their local or global effects. Local variations only affect groups of transistors in a local region of the die in the same way, while global variations generally affect all transistors similarly. Examples of global variations are changes in ambient temperature, supply voltage noise, or inter-die variations due to fluctuations in transistor channel dimensions and oxide thickness [GR10]. Dynamic local variations are mainly caused by (instantaneous) strong local switching activity, which gives rise to local IR drop and temperature hot-spots. Static local variations are caused by intra-die process variations, induced by line edge roughness and random dopant fluctuations [GR10].

Guardbanding

The classic approach to handling the discussed variations and uncertainties is the introduction of guard bands, which enforce additional timing margins to ensure functionally correct behavior of the circuit under all possible conditions. However, minimum performance requirements in terms of circuit speed still need to be fulfilled, which typically requires the acceleration of path timings, e.g., through gate up-sizing or by adding supply voltage margins, to meet the corresponding tougher timing requirements/constraints. Additional timing margins are typically enforced through so-called derating factors, in their most basic form with one factor2 each for statistical process, voltage, and temperature variations (K_P, K_V, K_θ) [Kae08]. The derating factors K_P, K_V, and K_θ describe by how much the nominal timing t_nominal is affected, due to differences in process corner, supply voltage, or junction temperature, respectively. The derated timing t_derated is then calculated by multiplying all derating factors together [Kae08]:

t_derated = t_nominal · K_P · K_V · K_θ    (4.1)

As an example, in a 130 nm CMOS technology (which still has a large enough feature size not to be considered significantly affected by variation issues), best-case PVT conditions (fast corner at 1.32 V and −25 °C) result in a 59% faster timing, while worst-case conditions (slow

2Derating factors are also used to capture other variations, such as clock distribution uncertainties, related to clock generation and the clock tree.


corner at 1.08 V and 125 °C) result in an 83% slower timing, compared to nominal conditions (typical corner at 1.20 V and 25 °C) [Kae08]. Operation under worst-case conditions is hence 2.9x slower than operation in the best case.
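The derating computation of Equation (4.1) can be sketched numerically. The corner figures below reproduce the quoted 130 nm example: 59% faster in the best case (combined factor 1/1.59) and 83% slower in the worst case (combined factor 1.83); folding K_P, K_V, and K_θ into one combined factor per corner is a simplification for illustration.

```python
# Sketch of Equation (4.1): derated timing as the product of the
# nominal timing and the three derating factors.

def derated_timing(t_nominal, k_p, k_v, k_theta):
    return t_nominal * k_p * k_v * k_theta

t_nominal = 1.0              # normalized nominal delay (typical corner)
t_best = t_nominal / 1.59    # fast corner, 1.32 V, -25 degC: 59% faster
t_worst = t_nominal * 1.83   # slow corner, 1.08 V, 125 degC: 83% slower
spread = t_worst / t_best    # worst case is ~2.9x slower than best case
```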

This basic derating approach however assumes all devices are equally affected by variations, and hence does not capture local variations adequately, requiring additional guard bands to also account for on-chip (intra-die) process variations, as well as degradation caused by localized switching activity. With increasing variations in deeply-scaled nm technology nodes, guardbanding hence unfortunately leads to excessive over-design and over-margining of circuits, incurring high power costs and sub-optimal designs.

Mitigation Techniques for PVT Variations

Up until the 65 nm node, static timing analysis (STA) based on worst-case path delays for given operating conditions was the main tool of choice to determine circuit timing. While basic STA still provides the foundation for most circuit timing analysis, a strong increase in on-chip variations in newer nodes gave rise to a statistical approach to STA, so-called statistical static timing analysis (SSTA) [OK02, VRK+06]. SSTA captures on-chip variations by representing gate and wire delays not as fixed (worst-case) values, but as statistical distributions of delays. These distributions, which are provided by SSTA-enabled standard cell libraries, and statistical information on metal interconnects of the process, are then combined along a timing path to determine a probability density function for the total path delay, which is dependent on process variations. As a result of the more accurate analysis, which is no longer based on purely worst-case (but highly unlikely) fabrication outcomes, better design constraints can be derived, and the important paths which require further optimization to meet the overall design goals can be identified.
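The statistical idea can be illustrated with a small Monte-Carlo sketch. This is a strong simplification: real SSTA uses library-provided distributions and models correlated intra-die variation; the gate count and distribution parameters below are invented for illustration.

```python
import random

def sample_path_delay(gate_means, sigma, rng):
    # One independent Gaussian delay per gate along a single timing path
    # (real SSTA also models spatially correlated intra-die variation).
    return sum(rng.gauss(mu, sigma) for mu in gate_means)

rng = random.Random(0)
gates = [0.1] * 10                          # ten gates, 0.1 ns mean delay each
samples = sorted(sample_path_delay(gates, 0.01, rng) for _ in range(10000))
p99 = samples[int(0.99 * len(samples))]     # 99th-percentile total path delay

# The 99th percentile stays well below the pure worst case of
# sum(mu + 3*sigma) = 1.3 ns, illustrating the reduced pessimism.
```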

Moreover, there exist novel modeling techniques which try to capture the effects of parameter variations on timing errors from a microarchitectural perspective, to assist the designer early in the design process. Examples are VARIUS [SGT+08], and VARIUS-NTV [KKKT12] for assessment of increased variation effects at near-threshold operation.

Furthermore, a wide range of different circuit techniques has been devised in the last decade to address the problem of increasing PVT variations, especially also targeting dynamic variations. The aim of such techniques is to include circuit enhancements which allow the dynamic adjustment of the circuit at runtime, typically in terms of clock frequency and/or voltage (supply or body biasing). This adjustment at runtime aims at tuning/calibrating the circuit towards its optimal operating point, in terms of performance or energy efficiency, effectively reducing the over-margining effects introduced by guard bands at design time to a minimum.

One of the key innovations in this area, especially for microprocessor circuits, is the approach to detect and correct timing errors in situ, i.e., within the circuit where they occur during

operation. Timing errors are detected by the use of special error-detection flip-flops, and either a microarchitectural or circuit mechanism thereafter allows the correction of the timing errors to ensure 100% correct operation of the circuit. The rate of the timing errors is then utilized to tune the clock frequency and/or supply voltage, using DVFS, to increase the performance of the circuit, or reduce its power consumption through voltage scaling. This approach was pioneered in 2003 by the Razor flip-flop [EKD+03], and is followed by a long line of research on Razor-like techniques and their continuous improvements. The Razor family includes:

• the original Razor [EKD+03, EDL+04] flip-flop with integrated error correction;

• Razor II [DTP+09, BDS+11], improving upon Razor through error correction via microarchitectural replay;

• Bubble Razor [FFK+13], introducing an automated insertion flow, allowing architecture independent error detection and correction;

• Razor-Lite [KKF+14], reducing error detection hardware overhead via a supply rail based solution requiring only 8 additional transistors per flip-flop;

• iRazor [ZKY+16], further reducing overhead to only 3 transistors, via a current-based design.

Other Razor-like designs targeted at in situ error detection and correction have been proposed in the literature, such as: error-detection sequentials (EDS) [BTK+09, BTL+11], which improve on metastability issues, and self-checking flip-flops [EHG+07] for use with a dual-Vdd architecture. Moreover, [SKS15, SKLS16] introduce a modified version of the Razor flip-flop for one-cycle error correction using pulsed latches. Finally, HEPP [ZLL+13] presents a timing error prediction and prevention technique.

Another approach to circuit-level timing error mitigation is the use of replica paths with timing monitors. Here, typically the critical path of the design is replicated on-chip and monitored to allow correction of frequency and/or voltage in case the timing margins on this replica path are used up due to variations. Extensive work in this area has been done in the context of IBM POWER CPUs, for example in the form of a critical-path timing monitor (CPM) for a high-performance POWER6 multi-core microprocessor [DSD+07]. In this work, the CPM is also used to measure intra-die process variations, and is hence inserted at 8 different positions within one POWER6 core. This work on CPMs was further developed for POWER7 [LDF+11] and POWER7+ [LDF+13, DFW+13] server CPUs, achieving accurate active guardband management with clock frequency adjustments within several cycles. Tunable replica circuits can either be employed together with time-to-digital converters [BTT+11] to keep constant but safe margins, or used in conjunction with EDS and instruction replay to provide an efficient solution for variation tolerance in microprocessor pipelines [TBW+09]. Additionally, [RTB+11] introduces the concept of tunable replica bits, to provide read and write error detection for a microprocessor cache made from 8T SRAM.

Finally, due to the effectiveness of adaptive biasing techniques [TKN+02], thermal sensors and voltage droop detectors are employed in [TKD+07] to dynamically control the body bias



voltage of the circuit, to counter dynamic variations as well as aging effects.

Figure 4.2 – Illustration of state-dependent dynamic timing margins for an example circuit.

4.1.2 State-Dependent Dynamic Timing Margins

The discussed variations and mitigation techniques all focus on PVT variations, i.e., either static variations that are fixed for a specific fabricated die, or dynamic variations which are caused by operating conditions/environmental effects on the circuit.3 However, for a given endpoint in the circuit, the timing margin that is available in any given cycle changes dynamically not only due to the above-mentioned dynamic variation effects, but also depends strongly on the state4 of the sequential circuit. In the following, we define margins that fluctuate depending on changes in state alone as dynamic timing margins.

As an example of a dynamic timing margin at an endpoint E, consider the sub-circuit depicted in Figure 4.2. If the circuit transitions from any state where A = 1 to a state in the subsequent cycle where A = 0, the maximum path delay that needs to be considered to derive the dynamic timing margin is the delay t_A from the flip-flop output A to the endpoint E through the AND gate. The node X can still toggle at any time later in the cycle, but since A = 0 this will have

3Strictly speaking, dynamic variations that are for example caused by local IR drops also exhibit data dependencies/correlations, however typically to a lesser extent than state-dependent dynamic timing margins.

4The word state here also comprises the momentary values of any primary inputs a circuit or module has.

no effect (i.e., will not lead to any activity) on E. The minimum dynamic timing margin in this case is hence t_clk − t_su − t_A, where t_clk is the clock period, and t_su is the setup time of the endpoint E. This calculated dynamic timing margin represents only the lower bound, i.e., the margin can be even larger: for example, the node X could have settled already to 0 in the previous state, so that E would already be at its final value at the beginning of the current cycle and would never switch, if it is guaranteed that the transition from 1 to 0 on A propagates to the AND gate prior to any switching that might occur on X during the cycle. In this case E will never toggle during the whole cycle, and the dynamic timing margin would be the maximum possible (t_clk − t_su). If we now consider A to be fixed to 1, and look at a state change where B transitions from 0 to 1, we can similarly derive the maximum excited path to go from B to E, with path delay t_B. Here, Y is held at 1 due to B = 1, even if Z toggles at a later point in the cycle. Since A = 1, any changes in Y that might cause changes in X continue to propagate to E. Similarly to the previous case, the dynamic timing margin t_clk − t_su − t_B represents only the lower bound, since Z might be 1 during the start of the cycle, causing Y to potentially see no transition. Finally, if the circuit transitions into a state where A = 1 and B = 0, and where C toggles its value, the dynamic timing margin of the endpoint E is now determined by the path delay t_C, reaching from C all the way to E. From this example we can observe that the dynamic timing margin at an endpoint can be highly dependent on the performed state transitions, i.e., on the previous as well as the current state of the circuit for a given cycle.
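The state dependence in this example can be captured in a small behavioral sketch. The topology is abstracted to the three mutually exclusive excitation cases discussed above, and the delay and clock values are illustrative assumptions (with t_A < t_B < t_C, as in Figure 4.2).

```python
# Behavioral sketch of the idea in Figure 4.2: the longest *excited*
# path to endpoint E, and hence the minimum dynamic timing margin,
# depends on the state transition, not on the static critical path.

T_A, T_B, T_C = 0.3, 0.7, 1.0   # path delays to E (normalized, assumed)
T_CLK, T_SU = 1.1, 0.05         # clock period and setup time (assumed)

def max_excited_delay(prev, curr):
    """prev/curr: dicts with flip-flop values A, B, C for two cycles."""
    if prev["A"] == 1 and curr["A"] == 0:
        return T_A          # A = 0 gates the AND; only the short path matters
    if curr["A"] == 1 and prev["B"] == 0 and curr["B"] == 1:
        return T_B          # path from B through Y and X to E
    if curr["A"] == 1 and curr["B"] == 0 and prev["C"] != curr["C"]:
        return T_C          # full path from C: the static critical path
    return 0.0              # no path to E is excited in this cycle

def min_dynamic_margin(prev, curr):
    return T_CLK - T_SU - max_excited_delay(prev, curr)
```

The same endpoint thus sees a large margin after an A = 1 to A = 0 transition, but almost none when the full path from C is excited.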

Dynamic timing margins are interesting, since they can provide a statistical view of the circuit timing during runtime, in contrast to SSTA, which provides a static view in terms of different possible circuit fabrication outcomes. To illustrate these differences in modeling, three different approaches towards modeling of timing uncertainties are compared in the three columns of Figure 4.3. The figure consists of illustrative probability density functions (PDFs), representing probabilities that are assumed in relation to the relative path delay impact for different variation sources. The different impact categories/sources that are distinguished for single paths leading to one endpoint are: static physical variations (local, intra-die), dynamic physical variations, and state/data-dependent variations. The PDFs of these three categories are then combined (assuming independence of the variation sources) to derive the PDF representing the total variations for a specific fabricated die and endpoint in the circuit. Note that while the first two variation categories apply to single paths (of which multiple lead towards the same endpoint), the state-dependent variations are better represented by talking about single endpoints, since they are mainly characterized by which of the many paths towards one endpoint are activated in a given cycle.

On the left hand side of Figure 4.3 we see the classical circuit timing approach based on STA, which conservatively considers worst-case delays regarding all aspects contributing to variations in timing. Going from top to bottom of the figure, we see that first, the worst-case gate compositions and wire delays are selected along timing paths. Worst-case effects are then assumed for dynamic physical variations, e.g., by taking worst-case values from the IR-drop analysis of the power grid. Regarding state-dependent variations, for each endpoint only the longest possible path to the endpoint is then considered. This includes the worst possible


[Figure 4.3 consists of three columns of PDFs over delay: (1) classical worst case with STA, assuming slow gates/transistors/wires, worst-case IR-drop, temperature, etc., and the critical path to each endpoint, yielding no variation (a single worst-case outcome); (2) variation-aware with statistical STA, using SSTA path timings, yielding the probability distribution of possible static path outcomes; (3) dynamic-timing-aware with DTA, assuming any fixed gate composition, with supply voltage noise and state/data-dependent path excitation, yielding the probability distribution of dynamic timing margins during operation. Each column combines the PDFs of static physical variations (local, intra-die process variations), dynamic physical variations, and state-dependent variations into the total variations for a specific endpoint and die. A separate row shows static physical variations (global, inter-die process variations): process corners (typical, fast, slow) used for yield/binning and hold/signoff.]

Figure 4.3 – Static and dynamic views of delay variation modeling, using proper combination of probability density functions of different effects.

combination of switching transitions (rising and falling edges) along the analyzed paths, to derive the highest possible delay, i.e., the smallest timing margin for that endpoint. Since all involved PDFs in this case are shifted Dirac delta functions, the resulting PDF is a shifted Dirac delta function as well, representing the worst-case delay.

In addition, independent of the variation modeling view, global, inter-die process variations are then always considered for the full circuit, to derive expectations for yield and speed/power binning. Moreover, the fastest process corners are used to ensure hold constraints are met

under all possible conditions and achieve successful timing signoff.

A variation-aware static view utilizing SSTA is depicted in the middle column of Figure 4.3. It differs from the classical worst-case view with STA by considering delay distributions for the single gates and wires along a path. SSTA provides the means to efficiently combine these distributions, resulting in a delay distribution that represents the static physical variations of a path, due to intra-die process variations. The resulting PDF for the total variation consequently describes the probability distribution of all possible static fabrication outcomes for a path.

In order not to mix fabrication-related static variation issues with the dynamic circuit characteristics we want to study, the view presented on the right-hand side of Figure 4.3, which we adopt in the following, focuses only on dynamic variation effects. To this end, we assume a fixed gate composition with a deterministic delay for any given path (e.g., a worst-case composition, or a mean path composition as derived from SSTA), and then include dynamic physical variations as well as state-dependent variations. Dynamic physical variations can also be assumed as worst cases, which still allows insight into the dynamic timing behavior. However, the inclusion of models for dynamic delay degradation effects (such as supply voltage noise) can be helpful to understand the relative importance of the dynamic physical and state-dependent variations. The state-dependent variations, as illustrated in Figure 4.2, are derived via our dynamic timing analysis (DTA) method, which is introduced in Section 4.1.3. As a result, the dynamic-timing-aware view of the circuit captures the probability distribution of the dynamic timing margins occurring during the operation of the circuit.
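A minimal Monte Carlo sketch of this view, under the stated assumption of a fixed path composition. The path delays, the Gaussian noise model for the dynamic physical component, and all parameter values are illustrative assumptions, not extracted from any circuit:

```python
import random

# Dynamic-timing-aware view (illustrative): the per-cycle delay at an
# endpoint is the fixed, deterministic delay of whichever path is excited
# (state-dependent variation) plus a dynamic physical component, modeled
# here as Gaussian supply-noise-induced jitter.
PATH_DELAYS_PS = [100.0, 350.0, 600.0]  # fixed delays of paths to one endpoint
NOISE_SIGMA_PS = 5.0                    # dynamic physical variation (assumed)

def sample_cycle_delay(rng):
    excited = rng.choice(PATH_DELAYS_PS)    # state-dependent path excitation
    noise = rng.gauss(0.0, NOISE_SIGMA_PS)  # dynamic physical variation
    return excited + noise

rng = random.Random(0)
samples = [sample_cycle_delay(rng) for _ in range(1000)]
```

Histogramming `samples` (and subtracting them from t_clk − t_su) yields exactly the kind of dynamic-timing-margin distribution shown in the right-hand column of Figure 4.3.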

Note that the variation-aware and dynamic-timing-aware approaches in no way exclude each other, especially since the dynamic-timing-aware view does not aim at solving the same problem that SSTA does. Consequently, a combination of the approaches can potentially be beneficial, as it can provide even better modeling accuracy.

4.1.3 Dynamic Timing Analysis

In order to derive statistics from a given realization of a sequential circuit that describe (state-dependent) dynamic timing margins, we introduce an analysis method based on gate-level simulation, namely dynamic timing analysis (DTA). The purpose of DTA is to capture, via a simulation of the circuit, the dynamically occurring timing margins in each cycle at each endpoint. These captured timing margins are then aggregated into individual statistics for each endpoint. For accurate results, DTA should be performed on a placed & routed netlist of the design, which incorporates a real clock tree and all relevant wire delays. To derive useful statistics, the input vectors utilized for the gate-level simulation should be of sufficient length to cover enough cycles, and correspond to representative inputs for the targeted mode of operation of the circuit.

Figure 4.4 details the concept of DTA with a timing diagram, showing the global clock of the


[Figure 4.4 shows cycles #1-#3 of the global clock (with the clock period marked), and for endpoints E1 through En the local clocks offset by clock skew, the toggling activity and setup times at each endpoint, the per-cycle dynamic timing margins (t1,1, t1,2, t1,3 for E1; tn,1, tn,2 = 0, tn,3 for En), each measured from the time of the last endpoint change to the time of the next local clock event, and the resulting per-endpoint margin histograms over the number of cycles.]

Figure 4.4 – Timing diagram detailing the concept of dynamic timing analysis.

circuit, and for each endpoint Ek the local clock arriving at the sequential element to which the considered endpoint belongs, together with the data input of the endpoint. The local clock is delayed compared to the global clock (at the clock tree root) due to the insertion delay of the clock tree. Due to imbalances in the clock tree, which cause clock skew between nodes, this delay differs between endpoints.5 When performing DTA it is important to account for this skew, and to not use the rising edges of the global clock as the reference points regarding when cycles end or begin for a specific endpoint. The dynamic timing margin tk,c for an endpoint Ek in cycle c is derived by capturing the time of the last change of Ek and the time of the next local clock event (i.e., the rising clock edge). tk,c is then calculated by subtracting both the setup time required for Ek and the time of last change of Ek from the time of the next rising clock edge. As is illustrated in the figure, the setup time can vary between endpoints, sometimes significantly, especially for different types of endpoints, e.g., between standard flip-flops and SRAM data inputs. In cycle #2 we can observe that the endpoint En has a dynamic timing margin of 0, since the critical path of the circuit leads to En and is excited for that cycle. E1 on the other hand sees a maximum possible dynamic timing margin in cycle #3, since the endpoint does not toggle in that cycle.
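The per-endpoint margin extraction described above can be sketched as follows. The event times and setup value are hypothetical; in the real flow these events come from the gate-level simulation log, and the clock edges are those of the endpoint's local (skewed) clock:

```python
# Per-endpoint dynamic timing margins from simulation events.
# For endpoint k in cycle c:  t_kc = t_next_clk_edge - t_su_k - t_last_change.
def cycle_margins(clock_edges_ps, change_times_ps, t_su_ps):
    """clock_edges_ps: sorted rising-edge times of the endpoint's local clock.
    change_times_ps: times at which the endpoint's data input toggled."""
    margins = []
    for start, end in zip(clock_edges_ps, clock_edges_ps[1:]):
        changes = [t for t in change_times_ps if start <= t < end]
        if changes:  # the last change in the cycle determines the margin
            margins.append(end - t_su_ps - max(changes))
        else:        # no toggle: maximum possible margin, t_clk - t_su
            margins.append(end - start - t_su_ps)
    return margins

# Three cycles; a toggle at 400 ps, a critical toggle at 1950 ps, no toggle.
example = cycle_margins([0, 1000, 2000, 3000], [400, 1950], 50)
```

The second cycle reproduces the zero-margin case of En in Figure 4.4, and the third cycle the maximum-margin case of E1.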

As the result of DTA, we construct a histogram for each endpoint Ek, which describes the statistical distribution of the dynamic timing margins for that endpoint. Section 4.3 presents a possible application for such statistical distributions derived by DTA, utilizing detailed endpoint statistics for the evaluation of the impact of timing errors on the application behavior of a microprocessor. Details on a specific tool flow that allows DTA to be performed with common EDA tools are presented in the following Section 4.2, which introduces an extended/specialized version of DTA for microprocessors.

5Note that clock skew is often even introduced on purpose as a rebalancing/optimization technique, in the form of so-called useful skew.


4.2 Dynamic Timing Analysis for Microprocessors

This section enhances the general dynamic timing analysis method presented in Section 4.1.3 to specifically target the analysis of microprocessor circuits. The goal of DTA for microprocessors is to analyze the state-dependent dynamic timing margins in order to uncover correlations between certain states and favorable timings of the processor pipeline. In that regard, among the most promising state variables6 to consider are those that reflect or depend on the instruction types processed by the pipeline stages of the processor. The idea is to derive distributions of the dynamic timing margins at all endpoints in the processor pipeline, each additionally conditioned on the instruction type that currently resides in the stage with which the respective endpoint is associated.

To illustrate this idea more clearly, Figure 4.5 shows a timing diagram of DTA for a microprocessor pipeline. The employed processor has an arbitrary number of pipeline stages, of which only two subsequent stages (decode and execute) are depicted in the figure for simplicity. For each of the two stages, the timings of two endpoints are shown, together with the respective local clock signals. In this example, three different instructions (addition, multiplication, and no-operation) are used in cycles #1-#4. As a result, separate statistics are collected for each endpoint per instruction type, i.e., the figure shows for each endpoint a set of three histograms. As an example of differences in the timing distributions, we can see how the multiplication instruction causes the longest delays, i.e., has the smallest timing margins, while residing in the execute stage, where arithmetic operations are performed. On the other hand, the no-operation (NOP) instruction only causes extremely short delays, i.e., provides large dynamic margins, both in the decode and the execute stage. The timing distributions of the addition instruction for the endpoints in the execute stage indicate that larger margins than for the multiplication instruction are available, due to the lower complexity of the arithmetic operation.

While individual timing distributions derived on a per-endpoint basis are useful (e.g., as employed in Section 4.3), analyzing instruction-type timings in relation to whole pipeline stages can give a more comprehensive picture. In order to aggregate the timing data according to pipeline stages, we calculate in a cycle c the minimum stage-dependent dynamic timing margin tmin(s,c) for a pipeline stage s which comprises k endpoints:

tmin(s,c) = min_{i = 1...k} t_{s,i,c} ,   (4.2)

where t_{s,i,c} is the dynamic timing margin in cycle c at endpoint i belonging to stage s (notation as depicted in Figure 4.5). The timing distribution for a specific instruction type T and a pipeline stage s is then constructed from the subset of all tmin(s,c) where, for cycle c, T resides in stage s. With the introduction of instruction-based dynamic clock adjustment in Section 5.1, we will have a closer look at how this stage-dependent analysis view, which aggregates groups of endpoints, can be exploited.

6A state variable here denotes any subset of the full processor state, e.g., a set of registers.
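The stage-level aggregation of Eq. (4.2), together with the binning of stage margins by instruction type, can be sketched as follows. The endpoint margins and the instruction trace are illustrative values:

```python
from collections import defaultdict

# Eq. (4.2): t_min(s, c) = min over endpoints i of stage s of t_{s,i,c}.
def stage_min_margins(endpoint_margins):
    """endpoint_margins: dict endpoint name -> list of per-cycle margins (ps)."""
    cycles = len(next(iter(endpoint_margins.values())))
    return [min(m[c] for m in endpoint_margins.values()) for c in range(cycles)]

# Bin stage margins by the instruction residing in the stage in that cycle.
def margins_per_instruction(stage_margins, instr_trace):
    hist = defaultdict(list)
    for margin, instr in zip(stage_margins, instr_trace):
        hist[instr].append(margin)
    return hist

# Illustrative execute stage with two endpoints over three cycles.
ex_stage = {"EX_1": [300, 40, 900], "EX_2": [250, 120, 950]}
tmin = stage_min_margins(ex_stage)
hist = margins_per_instruction(tmin, ["add", "mul", "nop"])
```

As in Figure 4.5, the multiplication cycle yields the smallest stage margin and the NOP cycle the largest.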


[Figure 4.5 shows cycles #1-#4 with the decoded instruction stream (add, mul, nop, add) and the executed instruction stream (add, add, mul, nop). For the decode-stage endpoints EDC,1 ... EDC,k and the execute-stage endpoints EEX,1 ... EEX,n, the local clocks (offset by clock skew) and the per-cycle dynamic timing margins tDC,i,c and tEX,i,c are depicted, together with margin histograms collected per instruction type (add, mul, nop) and per endpoint.]

Figure 4.5 – Timing diagram illustrating the application of dynamic timing analysis to a microprocessor pipeline. The different instruction types passing through the pipeline stages cause state-dependent dynamic timing margins of particular distributions at the different endpoints of the pipeline.

In Figure 4.6 we present our tool flow, which uses a mix of commercial EDA tools and custom-developed scripts to perform DTA for a microprocessor architecture. The flow comprises a hardware implementation part (depicted on the left-hand side of the figure), and a characterization part including DTA (right-hand side). On the implementation side, the processor description is synthesized with Synopsys Design Compiler and the resulting netlist is then processed by a backend flow using Cadence Encounter, to produce a fully placed and routed netlist, including the clock tree. This netlist, with extracted, accurate timing information in SDF form, is then used as the basis for the gate-level simulation. The simulation moreover requires


[Figure 4.6 flow: the implementation side takes the processor description (.v) through synthesis (Design Compiler) and place & route (Encounter), producing a netlist + SDF using the standard cell libraries incl. SRAM macros (for different PVT conditions). The characterization side compiles characterization kernels (.asm) with GCC into program binaries (.vmem) and a disassembly (.das); gate-level simulation (ModelSim) produces an event log (TSSI), a PC trace, and a VCD used for power analysis (Encounter); the dynamic timing analysis (Perl), driven by a pipeline specification, yields cycle-by-cycle stage/endpoint timings and an instruction trace, from which instruction timing extraction and statistics are derived (Matlab).]

Figure 4.6 – Proposed dynamic timing analysis tool flow for microprocessors.

the program binaries that the processor should run, which are generated via compilation of characterization kernels. These kernels cover in their instruction flows all instruction types that should be analyzed, while employing these instructions with an appropriate (wide) range of different operands. The gate-level simulation is then performed using Mentor Graphics ModelSim, to generate three outputs: First, an event log is created, containing all timestamps of any toggles of endpoints and clock inputs of all sequential elements in the circuit, occurring during execution of the characterization kernels. This log is generated in TSSI7 format, by exporting a ModelSim list window that logs all endpoints of the processor (i.e., all data inputs of any flip-flop/latch standard cell type, or any SRAM inputs), as well as all clock inputs corresponding to these endpoints. Second, a trace of the program counter (PC) evolution is logged during the simulation, and also forwarded to the custom dynamic timing analysis tool. Third, a VCD file is recorded and used with the post-layout netlist in Encounter for accurate power analysis.

Our custom dynamic timing analysis tool (written in Perl) is then utilized to produce cycle-by-cycle timings of all endpoints and of the predefined pipeline stages. The information regarding the predefined pipeline stages is supplied to the tool via a pipeline specification file, which contains descriptions of the group of endpoints belonging to each pipeline stage. These descriptions are written in the form of regular expressions matching the netlist names of the (standard) cells corresponding to the endpoints of a stage. The cycle-by-cycle slack timings are derived from the event log and stored in a simple format, which provides endpoint and stage timings via one dynamic timing margin value per line for each cycle. Moreover,

7TSSI is a simple ASCII format for test vectors by Test Systems Strategies, Inc., supported by ModelSim. While far less compact than the VCD format, TSSI is a more convenient format for further processing in the dynamic timing analysis tool.

the DTA tool combines the program disassembly with the PC trace to generate a trace of the instructions executed during the simulation, which then allows cycle numbers to be directly associated with instruction types. Finally, the output of the DTA tool is fed into a set of custom Matlab scripts, which process the cycle-by-cycle slack data and the instruction trace to derive the instruction-dependent timing distributions and histograms.
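The two custom post-processing steps can be sketched as follows: stage assignment via regular expressions from a pipeline specification, and joining the PC trace with the disassembly into a per-cycle instruction trace. All cell names, addresses, and mnemonics are hypothetical, and Python is used for illustration (the actual tool is written in Perl):

```python
import re

# (1) Pipeline specification: map netlist endpoint names to pipeline stages
# via regular expressions, as in the DTA tool's specification file.
PIPELINE_SPEC = {
    "decode":  re.compile(r"^u_core/u_decode/.*_reg(\[\d+\])?$"),
    "execute": re.compile(r"^u_core/u_execute/.*_reg(\[\d+\])?$"),
}

def stage_of(endpoint_name):
    for stage, pattern in PIPELINE_SPEC.items():
        if pattern.match(endpoint_name):
            return stage
    return None  # endpoint not assigned to any specified stage

# (2) Join the logged PC trace with the program disassembly to associate
# each cycle with the mnemonic of the instruction at that address.
DISASSEMBLY = {0x100: "l.add", 0x104: "l.mul", 0x108: "l.nop"}

def instruction_trace(pc_trace):
    """pc_trace: list of (cycle, pc) pairs -> list of (cycle, mnemonic)."""
    return [(c, DISASSEMBLY.get(pc, "<unknown>")) for c, pc in pc_trace]
```

The resulting cycle-to-mnemonic mapping is what enables the per-instruction-type margin histograms of Figure 4.5.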


4.3 Characterization of Application Behavior under Timing Errors due to Frequency- and/or Voltage-Over-Scaling

This section proposes a novel approach to the modeling of gate-level timing errors during high-level instruction set simulation, based on statistical timing information derived by dynamic timing analysis. In contrast to conventional, purely random fault injection, our physically motivated approach directly relates to the underlying circuit structure, hence allowing for a significantly more detailed characterization of application performance under scaled frequency/voltage (including supply noise). The model uses gate-level timing statistics extracted by dynamic timing analysis (as presented in Section 4.2) from the post place & route netlist of a general-purpose processor to perform instruction-aware fault injections. The goal is to employ statistical fault injection to provide a more accurate and realistic analysis of power versus error performance, especially in the transition region between 100% reliable operation and almost guaranteed complete failure of a particular realization of a die.

As has been discussed in more detail in Section 4.1.1, shrinking transistor sizes, increased static and dynamic parametric variations, and the need to reduce pessimistic design margins render circuits more prone to timing errors, caused for example by supply voltage noise. Such timing errors may permeate various parts of the microarchitecture, propagate to the system software layer, and eventually lead to catastrophic program failure. The severe impact of such errors on system functionality has led to new design paradigms and mitigation techniques that either try to predict the potential errors and apply voltage/timing guardbands at design time, or try to detect the incurred timing errors at run time and take corrective actions at the microarchitecture level [DTP+09, BTL+11].

To circumvent the large power and performance penalties of such approaches, the approximate computing paradigm has emerged as an alternative, where output quality is traded off against power by exploiting the error-resilient nature of various applications [CVC+13]. However, the efficiency of this design approach largely depends on the identification of the real impact of timing errors on system operation, which is usually evaluated through models and system simulators at design time. Inaccurate prediction/simulation of the errors at design time may not only lead to the design of inefficient techniques that waste resources and power, but may also lead to complete failure if the impact of errors is underestimated. Therefore, an essential step in coping with variations and the resulting timing errors is the development of a detailed understanding and of accurate characterization approaches that consider the statistical nature and real impact of such errors at the microarchitecture level.

Several high-level timing models and simulators exist for injecting errors and studying their impact on system performance [STB97, SIHM06, KGA+09, CMC+13]. Although emulation of a faulty environment at the gate level or on real hardware may be more accurate in capturing the impact of faults, such approaches are not widely used due to the prohibitively long simulation time and high setup cost [CP95]. Instead, fault injection (FI) at the microarchitecture level by flipping register bits in a cycle-accurate simulator or at the software layer by altering

memory states have prevailed, due to the reduced simulation time [NV03]. Unfortunately, this methodology only has a reasonable link to physical reality in the case where errors are assumed to originate from single-event upsets (SEUs) [Nor96] due to radiation. However, when considering timing violations and PVT variations, such approaches suffer from low accuracy, since errors at various registers are injected randomly without any view of the actual gate-level implementation or timing [HLR+09, dKNS10, WTLP14, RHLA14], making the fault injection rather unrealistic. A compromise between speed and accuracy for better understanding and simulating the impact of timing violations lies in carefully modeling the gate-level timing behavior of the underlying circuits in an instruction set simulator.

In the following, we propose a novel approach to model gate-level timing errors during high-level instruction set simulation, based on an accurate characterization of the statistical nature of the timing of an open-source processor. The proposed approach involves a number of key contributions, described as follows.

• In contrast to conventional, almost purely random fault injection, the proposed approach directly relates to the underlying circuit, since characterization of timing error statistics is performed at gate level on a post place & route netlist.

• The characterization accuracy of timing errors during high-level simulation is improved by conditioning the error statistics on the instruction type using gate level DTA.

• The initially fixed characterization of the DTA for different operating conditions is extended to also model the dynamic impact of (high-frequency) supply voltage noise, which is one of the most critical timing uncertainties, since it is difficult to compensate for with more conventional process compensation techniques.

• The characterized statistical instruction-based timing behavior of the underlying processor is used to inject faults in a cycle-accurate simulator. This allows accurate evaluation of the impact of faults at the application layer. The impact is quantified in terms of output quality, as well as energy and performance.

• The proposed approach is applied to a 32-bit 6-stage OpenRISC core in 28 nm CMOS, and the impact of timing errors on various application kernels with different characteristics (computation, control) is assessed in terms of output quality and point of first failure (PoFF).

Overall, the proposed approach not only provides an alternative and more accurate means for rapid evaluation of the impact of timing errors on system performance, but can also serve as an essential tool to identify and mitigate reliability bottlenecks in hardware implementations (e.g., by pointing out structures that lead to timing walls that cause frequent fatal errors), as well as to determine the timing margins required to achieve a desired quality metric.

The rest of this section is organized as follows. Section 4.3.1 presents the case study used for evaluation of the proposed approach. This includes a description of the employed hardware

processor core, the instruction set simulator with fault injection, and the software benchmarks. Section 4.3.2 then discusses different models for timing errors, including our novel statistical approach. In Section 4.3.3 the statistical fault-injection approach is applied to the case study, providing an analysis of the power, performance, and output quality results obtained with our proposed model.

4.3.1 Case Study

Before describing our modeling approach in detail, in the following we first present the targeted hardware and software environment, as well as the simulator employed in our case study for the evaluation of the performance of various benchmarks under frequency- and/or voltage-over-scaling.

4.3.1.1 Hardware Processor Core

The microprocessor used in our case study comprises a modified 32-bit OpenRISC [Ope16, Ope14] general-purpose embedded processor core, which includes tightly-coupled instruction and data memories. The microarchitecture of the core has a 6-stage pipeline and achieves close to one instruction per cycle, including single-cycle 32-bit multiplications. Both memories are realized in the form of single-cycle latency SRAM macros. For more specifics regarding the microarchitecture of the core and its implementation, the reader is referred to Chapter 5 (specifically Section 5.2.1), which discusses our custom OpenRISC microarchitecture in detail and presents a fabricated silicon prototype of the core that is also used for this case study.

The core is implemented with a timing constraint strategy that avoids a timing wall in the path delay distribution of the circuit, to ensure that important control paths are not immediately affected by frequency-over-scaling (see also Section 5.1.3.2 for more discussion of this concept). For such a core we find that this smoothing of the timing wall can be achieved with limited area and power overhead8, enabling a graceful performance degradation beyond the static timing analysis (STA) limit. In our implementation, this optimization ensures that only the ALU endpoints of the execution stage data path limit the maximum clock frequency (here 707 MHz at 0.7 V), while the paths in all other stages are short enough to still be safe even when operating below a certain much higher threshold frequency (here 1.15 GHz at 0.7 V). Hence, for this case study, we can limit our modeling to timing errors that are induced in these 32 ALU-endpoint flip-flops. Furthermore, we assume the use of metastable-hardened flip-flops [LRC+11] for these 32 ALU pipeline registers, and use the fact that non-ALU instructions (e.g., branch, load, store, etc.) are always safe from timing errors (below the given threshold frequency).

For the timing characterization required by our proposed model we use the DTA as presented

8Both the area overhead and the increased power consumption due to acceleration of some paths in the circuit are limited to 5–13%, depending on the targeted PVT operating conditions in the employed 28 nm CMOS technology.

in Section 4.2 and apply it to a fully placed and routed final netlist of a test-chip design including the processor, which has been fabricated in a 28 nm FD-SOI CMOS technology and is presented in detail in Section 5.3 and Section 5.4. Such statistics derived from timing-annotated gate-level simulations are as close as possible to data from silicon measurements (in a typical semi-custom digital design flow) and can be extracted for different process corners, supply voltages, or temperatures. Alternatively, a silicon-based characterization can be performed on a prototype instrumented with error detection flip-flops, but the effort is considerable and the results are limited to the corner of the available samples.

4.3.1.2 Instruction Set Simulator with Fault Injection

Simulation of the processor is performed using a cycle-accurate instruction set simulator (ISS) that is generated from a custom LISA model of our OpenRISC implementation. The ISS is enhanced by the FI framework developed by the authors of [WCC13], which allows the injection of faults at the level of the microarchitecture (e.g., into pipeline registers).

While a benchmark is executed on the ISS, FI is only performed for the algorithmic kernel part of that benchmark (typically accounting for >99% of the runtime cycles), excluding any initialization and setup code around the kernel from FI. This exclusion allows us to focus on the effects of the induced timing errors on the characteristic parts of the application code. Furthermore, since FIs can frequently cause wrong branching behavior, we include a basic infinite loop detection in the ISS to abort the execution in case of obvious fatal errors.
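A minimal sketch of such a detection heuristic, assuming a simple "too many consecutive jumps" criterion; the threshold value and the instruction stream are hypothetical:

```python
# Freeze/crash detection sketch: abort when more than MAX_JUMPS_IN_A_ROW
# consecutive executed instructions are jumps/branches, which indicates an
# obvious infinite loop after an injected fault. Threshold is hypothetical.
MAX_JUMPS_IN_A_ROW = 8

def detect_freeze(instr_stream, is_jump):
    """Return the index at which execution would be aborted, or None."""
    run = 0
    for i, instr in enumerate(instr_stream):
        run = run + 1 if is_jump(instr) else 0
        if run > MAX_JUMPS_IN_A_ROW:
            return i
    return None
```

In the real ISS this check runs online during simulation, alongside the cycle-count timeout mentioned in the configuration options.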

In more detail, the developed statistical simulation framework employs the Synopsys Processor Designer tool suite [Syn12] for generation of the cycle-accurate ISS from our custom LISA model, which has been developed from scratch to accurately model the microarchitecture of the employed OpenRISC processor core. The generated ISS source code (in C++) is then linked together with an enhanced version of the FI framework [WCC13], which incorporates support for FI using endpoint timing error statistics in the form of cumulative distribution functions (CDFs), as required by our novel analysis approach (compare Section 4.3.2.4). Moreover, support for the modeling of supply voltage noise effects is added to the FI framework, and all newly added features are optimized and tightly integrated with the existing framework, to minimize the additional runtime requirements for the ISS.
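The core of CDF-based statistical fault injection can be sketched as follows. The CDF points, instruction names, and timing values are hypothetical, not extracted from DTA:

```python
import bisect
import random

# Per cycle, draw the excited endpoint delay from an instruction-conditioned
# CDF and flag a timing error when the delay exceeds the available time
# t_clk - t_su. CDF points below are illustrative placeholders.
CDF = {  # instruction -> sorted (delay_ps, cumulative_probability) points
    "mul": [(800, 0.2), (1200, 0.7), (1400, 1.0)],
    "add": [(500, 0.6), (900, 1.0)],
}

def sample_delay(instr, rng):
    """Inverse-transform sampling from the discretized delay CDF."""
    points = CDF[instr]
    u = rng.random()
    idx = bisect.bisect_left([p for _, p in points], u)
    return points[min(idx, len(points) - 1)][0]

def timing_error(instr, t_clk_ps, t_su_ps, rng):
    return sample_delay(instr, rng) > t_clk_ps - t_su_ps

rng = random.Random(42)
errors = sum(timing_error("mul", 1000, 50, rng) for _ in range(1000))
```

On an actual error, the ISS would then flip the affected ALU-endpoint register bits; here only the error decision itself is sketched.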

The developed ISS with statistical fault injection supports the following list of run-time configuration options9:

• Clock period [ps]

• Cycle count limit (used as a timeout criterion, if an endless-loop situation cannot be automatically detected)

9Please also refer to the model description in Section 4.3.2.4 for more details and context regarding the ISS configuration options.


• Set of timing error statistics derived via DTA (allows for configuration for any arbitrary operating condition, with different sets of CDFs, to reflect: supply voltage, process corner, temperature, age)

• Reference supply voltage level (interpolation base point) [V]

• Voltage noise level (sigma) [mV]

• Start label or cycle number for FI (to isolate relevant code)

• End label or cycle number for FI (to isolate relevant code)

• Threshold for maximum number of jumps in a row (freeze/crash detection)

• Random number seed for Monte Carlo simulations

• ELF binary of the program to be executed on the ISS

The developed statistical ISS for timing error impact analysis is then used in conjunction with custom-developed Perl and shell scripts for distributed Monte Carlo simulation, to allow for an efficient characterization of the probabilistic behavior of applications under timing errors. Our Monte Carlo simulation environment uses the HTCondor software framework [Uni16] at its core for the purpose of job distribution and result collection. Moreover, the developed scripts provide a means to launch different sets of simulations via simplified configuration files, which provide an effective way to evaluate different frequency windows in the interesting region between fully error-free operation and complete circuit failure. Windows of specific interest can then be simulated with increased granularity for closer study of how the application behaves during the transition phase. Our scripting environment moreover automates most of the post-processing, where the individual results of the separate ISS invocations are aggregated and sorted according to the various swept parameters, with minimal manual interaction.

4.3.1.3 Software Benchmarks

We characterize the application performance for a set of four common kernels used in signal processing and other areas, each focusing on different types of operations. As the benchmark properties in Table 4.2 show, some kernels are more computation (data path) heavy, while others are more control oriented. The execution length of a single benchmark is in the range of 60 k to 1 M cycles, depending on the kernel. This shows that a high-level ISS is required to assess the application behavior effectively, since a gate-level simulation with fault injection is clearly not feasible. This is especially true considering that, due to the statistical nature of the model, a large number of Monte Carlo trials is required. The performance metrics reported in this study are assessed by Monte Carlo simulation with at least 100 simulations per parameter configuration (data point).

In the following, we briefly introduce the characteristics and algorithm parameters of the four considered benchmarks.

127 Chapter 4. Dynamic Timing Margins in Embedded Microprocessors

Table 4.2 – Overview of benchmark properties.

benchmark     | median      | matrix mult.   | k-means        | Dijkstra
              |             | (8- & 16-bit)  | clustering     |
type          | sorting     | arithmetic     | data mining    | graph search
compute       | -           | ++             | +              | -
control       | +           | -              | +              | ++
size          | 129 values  | 16x16 matr.    | 8 points (2D)  | 10 nodes
cycles        | 216 k       | 60 k           | 351 k          | 984 k
output error  | relative    | mean squared   | mismatch in    | error in
              | difference  | error (MSE)    | cluster        | min. distance
              |             |                | membership     |

Median The median benchmark sorts a vector of 129 uniformly random 16-bit unsigned numbers, and then selects the middle value of the vector as its output. The data operations only involve movement of data, while the control flow is based on value comparisons (effectively subtractions). The output error is reported relative to the correct median value, where the absolute value of the difference is used to obtain a positive ratio. The length of the benchmark is 216 k cycles. The kernel is chosen as the illustrative benchmark in this study, since it can be significantly frequency-over-scaled and shows interesting behavior under all parameter configurations (supply voltages and noise levels). It should be noted however that any of the other benchmarks can equally be used to demonstrate the significant improvements in characterization detail between the different timing error models discussed in Section 4.3.2.
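The median kernel and its output error metric can be sketched as follows (illustrative Python; the actual benchmark runs as compiled code on the processor):

```python
import random

def median_benchmark(rng):
    """Sort a vector of 129 uniformly random 16-bit unsigned numbers
    and select the middle value as the output."""
    vec = [rng.randrange(0, 1 << 16) for _ in range(129)]
    vec.sort()
    return vec[len(vec) // 2]  # middle element (index 64)

def relative_output_error(observed, reference):
    """Output error relative to the correct median: absolute value of
    the difference, as a positive ratio of the reference."""
    if reference == 0:
        return float(observed != reference)
    return abs(observed - reference) / reference
```

A fault-free run provides `reference`; faulty runs are compared against it.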

Matrix Multiplication The matrix multiplication benchmark performs a multiplication of two 16×16 matrices of signed numbers, and is implemented in two versions: 8-bit and 16-bit. The bit-width concerns the input value range, as well as the normalization that is performed on the result matrix entries. The kernel is mainly dominated by data operations, which consist of multiplications and additions, while the control flow is not based on the computed data. The output error is reported in the form of the mean squared error (MSE) of the result matrix. For both versions, the length of the benchmark is 60 k cycles.

K-Means Clustering A popular kernel of data mining applications is k-means clustering. This benchmark groups a number of n-dimensional points (objects) into clusters. The data operations include multiplications and additions (Euclidean distance), while the control flow is heavily based on the computed data, to converge to a clustering solution. To allow for a feasible runtime of our experiments, we perform clustering on 8 objects with n = 2 dimensions into 4 separate clusters. The output error is reported as the percentage of points which differ in their cluster memberships, compared to the fault-free reference run. Due to the heuristic nature of the algorithm, we have to account for structural equivalence, i.e. the possibility of resulting clusters to have identical structure, while having different internal naming (cluster ids), and do not consider these cases as output errors. The length of the benchmark is 351 k cycles.
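The structural-equivalence check can be sketched as follows (illustrative Python; matching by exhaustive label permutation is an assumption, feasible here since only 4 clusters are used):

```python
from itertools import permutations

def clusterings_equivalent(a, b, k):
    """True if two cluster-membership vectors describe the same grouping
    up to a renaming (permutation) of the k cluster ids."""
    if len(a) != len(b):
        return False
    return any(
        all(perm[x] == y for x, y in zip(a, b))
        for perm in permutations(range(k))
    )

def membership_error(a, b, k):
    """Percentage of points with differing membership, after accounting
    for structural equivalence (0% for a pure relabeling). The best-
    permutation matching used here is an illustrative assumption."""
    best = min(
        sum(perm[x] != y for x, y in zip(a, b))
        for perm in permutations(range(k))
    )
    return 100.0 * best / len(a)
```

For the 8-point, 4-cluster configuration above, only 4! = 24 permutations need to be tested.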

128 4.3. Characterization of Application Behavior under Timing Errors

Dijkstra One of the most common graph algorithms is Dijkstra's algorithm, which finds the shortest path between two nodes in a graph. The benchmark computes the minimum distance between all possible pairs of nodes in a connected graph of 10 nodes. The data operations consist mainly of simple additions (path length sums), while control flow is based on comparison of the computed data (maintenance of a sorted node list). The output error is reported as the percentage of node pairs that have any mismatch in their calculated minimum distance. The length of the benchmark is 984 k cycles.

4.3.2 Modeling of Timing Errors

Modeling of timing errors in high-level instruction set simulators can be performed with different levels of detail to match the underlying circuit-level implementation.

Table 4.3 provides an overview of different modeling approaches, and their features. We begin our analysis with the widespread purely random fault injection model A, to show its limitations. Next, model B follows more closely the actual circuit behavior by considering the results of the static timing analysis of the circuit under a given set of operating conditions. We then further refine this model by considering the impact of supply voltage noise on the timing behavior and we refer to this refined model B as model B+. Finally we introduce our novel model C, which further improves the accuracy of the characterization of application behavior with an even more detailed fault injection that accounts also for critical path activation statistics conditioned on individual instruction types.

As a general note regarding the term fault injection: in the context of this work, we define a fault injection as the injection of a random value (i.e., a 50% bit-flip probability), since a timing error typically means that the new value is sampled during the settling phase of the data (D) input of a flip-flop, and hence can be 0 or 1 with 50/50 probability.
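This convention can be captured in a small helper (illustrative Python sketch, assuming 32-bit registers):

```python
import random

def inject_timing_fault(value, bit, rng):
    """Model a timing error at a flip-flop bit: the captured bit resolves
    to 0 or 1 with equal probability, i.e., the stored bit flips with a
    50% chance."""
    new_bit = rng.getrandbits(1)
    return (value & ~(1 << bit)) | (new_bit << bit)
```

All fault injection models below ultimately call such a primitive on the selected endpoint bits.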

4.3.2.1 Fixed Probability FI

Model A is based on the introduction of random bit flips into all or a limited subset of logical or physical registers within the processor core (and potentially the memories). This approach

Table 4.3 – Overview of timing error models and their features.

model | fault injection technique              | timing data | multi-Vdd | Vdd noise | gate-level aware | instruction aware
A     | fixed probability                      | none        | no        | no        | no               | no
B     | fixed period violation                 | STA         | yes       | no        | partially        | no
B+    | modulated period violation             | STA         | yes       | yes       | partially        | no
C     | probabilistic period violation (CDFs)  | DTA         | yes       | yes       | yes              | yes



Figure 4.7 – Random fault injection in microarchitectural or architectural registers, based on fixed probability FI (Model A).

of random FI is illustrated in Figure 4.7. Each bit flip occurs with a fixed FI probability. This simple model has originally been motivated by the analysis of SEUs, for example caused by particle radiation, which affect all resources independently of each other and independent of the processor state or timing properties, but even for that it has been shown that accurate modeling of the underlying hardware is essential [CMC+13].

Nevertheless, this random FI is also frequently used to model the impact of variations which actually manifest in the form of timing errors. Unfortunately, this straightforward approach is highly inaccurate and lacks any physical motivation: the model neglects the fact that timing errors appear selectively only on the endpoints of critical or near-critical paths, and only if these paths are actually excited (though then with rather high probability). Moreover, the FI rate has no direct link to the activity of the hardware or to the operating conditions. This is especially problematic, since we aim to characterize the impact of frequency- and/or voltage-over-scaling and supply voltage noise on the program behavior and application output error.

Furthermore, it is important to note that architectural registers which are defined in the ISA of a processor (as depicted on the right-hand side of Figure 4.7) very often do not correspond to one specific physical register in the microarchitecture. This can be due, for example, to the use of shadow register banks, or register renaming mechanisms employed in the processor pipeline, which makes the random modification of single bits of architectural registers inappropriate for the study of timing error induced failures.
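A minimal sketch of this fixed-probability fault injection (model A) is given below (illustrative Python, assuming 32-bit registers):

```python
import random

def model_a_step(registers, p_flip, rng):
    """Model A sketch: every bit of every register flips independently
    with a fixed probability each cycle, regardless of instruction,
    data, operating conditions, or circuit timing."""
    faulted = []
    for value in registers:
        for bit in range(32):
            if rng.random() < p_flip:
                value ^= 1 << bit
        faulted.append(value)
    return faulted
```

Note how `p_flip` is a free parameter with no link to the clock frequency or supply voltage, which is exactly the shortcoming discussed above.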

4.3.2.2 Static Timing Based FI

To relate the FI for individual endpoints within the processor core more closely to the underlying hardware, [WCC13] proposes to consider the worst case path delays to each endpoint. These delays can be obtained for any PVT operating condition that is available from the design kit through STA of the gate level netlist of the placed and routed design. The individual endpoints are then related to their location in the execution pipeline of the processor and


to the set of instructions that has an effect on this pipeline stage. A fault is always injected into a register whenever the pipeline stage is activated (e.g., only ALU instructions can trigger the FI in the execution stage, which shows no activity for other instructions) and when the longest path delay to that endpoint exceeds the clock period. Concerning our case study, each of the 32 ALU endpoints under model B separately experiences a fault injection when an ALU instruction is executed and the STA-derived clock period limit for this particular endpoint is violated by the clock frequency set in the ISS.

Figure 4.8 – Execution probability for the median benchmark application to finish (with and without correct result) and fault injection rate versus clock frequency of the processor core, for (a) model B based on STA @ 0.7 V, (b) model B+ with supply voltage noise σ = 10 mV, and (c) model B+ with σ = 25 mV.
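Under model B, the per-cycle injection decision reduces to a deterministic comparison of STA delays against the clock period; a minimal sketch (illustrative Python, assuming one STA worst-case delay per ALU endpoint):

```python
def model_b_faults(instr_is_alu, clock_period_ns, sta_delays_ns):
    """Model B sketch: for each of the 32 ALU endpoints, a fault is
    injected whenever an ALU instruction executes and the STA worst-case
    path delay to that endpoint exceeds the clock period. The decision is
    deterministic: no data dependence, no noise."""
    if not instr_is_alu:
        return []  # other stages/instructions are assumed non-critical
    return [bit for bit, delay in enumerate(sta_delays_ns)
            if delay > clock_period_ns]
```

Because every executed ALU instruction triggers the same set of violations, repeated runs of a program behave identically, which explains the hard failure threshold discussed next.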

The issue with model B is that it is overly pessimistic, since details of the CPU state and current or previous data values which have an influence on the path excitation are not taken into account. Furthermore, no distinction is made between potentially very different path delays to the same register, depending on the type of instructions that all trigger the same pipeline stage. Finally, factors such as high-frequency supply voltage noise are not considered. Nevertheless, such factors critically determine the behavior of a circuit on the boundary between 100% reliable operation and complete failure and are therefore essential.

To illustrate the pessimism of this model, we apply it to our case study and show the FI rates and program behavior for the median benchmark with different frequencies in Figure 4.8a. It

can be seen from the plot that the FI rate immediately rises significantly as soon as the clock frequency just slightly exceeds the static timing limit, since any executed ALU instruction, independent of its type, leads to timing errors in the execution stage. As a result of this high FI rate, the probability for a program to execute correctly and even to finish drops abruptly from 100% down to 0% with almost no transition region that could be exploited. Repeated execution of the same program will not change this behavior since for a given program, there is no randomness in the model. Note that we limit the illustration here to the results of only one benchmark, since we observe the exact same behavior also with all other benchmarks.

4.3.2.3 Supply Voltage Noise

To recover the link to uncertainties (randomness) in the underlying circuit behavior, we extend model B to model B+ by accounting for the influence of supply voltage noise as a primary source of dynamic physical variation of gate delays and timing behavior of a specific instance of the chip. Supply voltage noise can have many sources: it is inherently caused by DC-DC converters and depends strongly on the off-chip and on-chip power delivery network and the circuit switching activity (Vdd-droop). In this study, we model this supply voltage noise by a normal distribution, with a mean of 0 V and a standard deviation σ, but other distributions are also possible. The maximum noise level is clipped/saturated at 2σ, to avoid the occurrence of large, physically unrealistic, spikes due to the tails of the distribution before clipping.

During each cycle in the simulation a new independent random value for the supply voltage noise is drawn and translated into a factor which modulates the timing of each path of the circuit for that cycle. The relation between a small supply voltage change and the corresponding effect on delay is extracted from a fitted Vdd-delay curve, which is interpolated from the delay of the worst path characterized at 5 different supply voltages (0.6 V to 1.0 V in 100 mV steps).^10 These modified delays are then used to determine the injection of faults in the same way as for model B (Section 4.3.2.2).
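The per-cycle delay modulation of model B+ can be sketched as follows (illustrative Python; the delay-sensitivity slope `dv_to_ddelay` is an assumed parameter standing in for the fitted Vdd-delay curve):

```python
import random

def noisy_delay_factor(sigma_v, dv_to_ddelay, rng):
    """Model B+ sketch: draw a per-cycle Gaussian Vdd-noise sample,
    clip it at +/-2*sigma, and convert the voltage deviation into a
    multiplicative delay factor via the local slope of the Vdd-delay
    curve (relative delay change per volt; value is an assumption)."""
    noise = rng.gauss(0.0, sigma_v)
    noise = max(-2.0 * sigma_v, min(2.0 * sigma_v, noise))  # clip at 2*sigma
    return 1.0 - noise * dv_to_ddelay  # lower Vdd -> slower paths

def modulated_delays(sta_delays_ns, sigma_v, dv_to_ddelay, rng):
    """Apply the same per-cycle factor to every path delay, then the
    fault injection proceeds as in model B."""
    k = noisy_delay_factor(sigma_v, dv_to_ddelay, rng)
    return [d * k for d in sta_delays_ns]
```

Applying one common factor to all paths corresponds to the approximation discussed in the footnote below.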

The FI behavior under model B+ is shown for σ = 10 mV (maximum Vdd noise of ±20 mV) in Figure 4.8b. Compared to the no-noise scenario of model B (Figure 4.8a), the clock frequency at which the first faults start to get injected is now significantly lower. The higher the noise σ, the further away from the static timing limit of 707 MHz lies the first point of fault injection (at 661 MHz and 588 MHz for σ = 10 mV and σ = 25 mV, respectively). However, the observed fault injection rate at the first point of fault injection is significantly lower at only around 10 faults per 1000 cycles compared to model B, due to the modulated (instead of fixed) path delays, caused by the random characteristic of the supply voltage noise.

Unfortunately, we still observe the same hard threshold in the application behavior as for model B (for the shown and all other benchmarks). The reason for this behavior is that even

^10 Although not all paths scale equally in terms of delay (especially over a wide voltage range), due to varying gate compositions of the paths, this is nevertheless a valid approximation for capturing small delay changes around an accurately characterized operating point.

model B+ does not capture the significant instruction and data dependencies of path delays.

4.3.2.4 Proposed Dynamic Timing Statistical FI

To improve the accuracy and link to the physical circuit of models B and B+, we further refine the resolution of the fault injection with respect to different instructions and introduce a better statistical model C to cope with data dependent and microarchitectural or circuit implementation related delay uncertainties.

To this end, we employ dynamic timing analysis to extract the statistics of the data arrival times (i.e., the dynamic timing slack) on all relevant endpoints inside the processor, as introduced in Section 4.2. This characterization is performed independently for different instructions, even if they affect the same pipeline stage. For our experiments, we use a gate level characterization kernel (here with 8 kCycles), covering all ALU instructions with randomized operands. We further verify that all non-ALU instructions have a sufficient timing margin to always be non-critical.

The extracted dynamic timing margin statistics are then used to determine the probabilities P_{E,V,I}(f) of an endpoint E to have a timing error at a given frequency f with supply voltage V, while the instruction I is executed (resides in the pipeline stage associated with E). We calculate

P_{E,V,I}(f) = v_f / n_I    (4.3)

where n_I is the total number of cycles in which DTA encounters instruction I, and v_f is the number of these cycles for which the dynamic path delay to E (including the setup time) is larger than the clock period 1/f, i.e., cycles in which the endpoint timing is violated by instruction I. Sweeping f provides us with the cumulative distribution functions (CDFs) for the dynamic timing error probabilities, as shown in Figure 4.9 for two instructions, two endpoints, and two supply voltages.
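Extracting such a CDF from DTA samples can be sketched as follows (illustrative Python; `dta_delays_ns` holds the per-cycle dynamic path delay, including setup time, observed at endpoint E while instruction I resides in the stage):

```python
def timing_error_cdf(dta_delays_ns, n_cycles, freqs_mhz):
    """Build P_{E,V,I}(f) from DTA samples: for each frequency f, count
    the cycles v_f whose dynamic delay exceeds the clock period 1/f,
    and divide by n_I (the cycles in which DTA encountered I)."""
    cdf = {}
    for f in freqs_mhz:
        period_ns = 1000.0 / f  # MHz -> ns
        v_f = sum(1 for d in dta_delays_ns if d > period_ns)
        cdf[f] = v_f / n_cycles
    return cdf
```

Sweeping `freqs_mhz` over a fine grid yields the curves plotted in Figure 4.9.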

As can be seen from Figure 4.9, the multiplication instruction starts to fail at a lower frequency than the less complex addition instruction for the same supply voltage and ALU endpoint. Moreover, we observe that bits with higher significance tend to fail earlier than bits with lower significance, and a higher supply voltage shifts the CDF to the right. However, Figure 4.9 also shows that, for example, the relative effect of supply voltage scaling compared to the effect of the bit position on the timing error probability can be significantly different, depending on the instruction type. This can be seen from the plots by observing how in Figure 4.9a the CDFs for bit positions 3 and 24 at equal voltage exhibit a frequency gap of ≈400 MHz, while in Figure 4.9b they are only ≈100-200 MHz apart.

Figure 4.9 – Cumulative distribution functions of timing error probabilities extracted by DTA, for different ALU endpoints (bit 3 (lower significance) & bit 24 (higher significance)) and supply voltages, for (a) the signed multiplication instruction (l.mul) and (b) the addition instruction (l.add).

As illustrated in Figure 4.10, the proposed model C integrates the instruction-aware statistical dynamic timing information from the DTA in the form of CDFs, and combines it with the supply voltage noise model presented in Section 4.3.2.3. Specifically, model C performs the following steps in each cycle of the simulation:

1. A CDF scaling-factor is derived from the defined simulator clock frequency together with the randomly distributed supply voltage noise, which allows for the dynamic adjustment of the CDFs.

2. The timing error probability P_{E,V,I}(f) is determined for all relevant endpoints E, by evaluating the corresponding scaled CDF at f, with matching instruction I and supply voltage V (without noise).

3. FI is performed on each endpoint E with the respective probability P_{E,V,I}(f).
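The three steps above can be sketched as follows (illustrative Python; the CDF lookup interface and the noise-to-frequency slope are assumptions for illustration, not the actual ISS implementation):

```python
import random

def model_c_cycle(instr, vdd, sigma_v, f_mhz, cdfs, rng, noise_slope=1.5):
    """Model C sketch of one simulation cycle. `cdfs[(instr, vdd)]` maps
    an endpoint bit to its timing error CDF as a function of frequency."""
    # 1. Derive a CDF scaling factor from clipped Gaussian Vdd noise:
    #    a droop (negative noise) slows the paths, which acts like a
    #    higher effective clock frequency (slope value is an assumption).
    noise = max(-2.0 * sigma_v, min(2.0 * sigma_v, rng.gauss(0.0, sigma_v)))
    f_eff = f_mhz * (1.0 - noise_slope * noise)
    # 2./3. Evaluate the scaled CDF per endpoint and inject faults with
    #       the resulting probability P_{E,V,I}(f).
    faulted_bits = []
    for bit, cdf in cdfs[(instr, vdd)].items():
        if rng.random() < cdf(f_eff):
            faulted_bits.append(bit)
        return faulted_bits if False else faulted_bits  # (no early exit)
    return faulted_bits
```

Each faulted bit would then receive a random value, as defined in the fault injection convention above.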

Our proposed approach is also able to account for parameters that are constant or vary only slowly for a specific die, such as process variations, temperature, and aging. These effects can be modeled accurately by performing DTA on a netlist that is timed with libraries provided by the foundry that are characterized for the desired process corner, temperature, and age. Different sets of CDFs can then be used within the simulation environment to model the effects on the application.



Figure 4.10 – High-level instruction set simulation with statistical fault injection (model C), incorporating effects of supply voltage noise.

4.3.3 Application of Statistical FI

In the following we apply the proposed statistical FI (model C) to the considered case study to illustrate how this detailed model can provide interesting insights into the operation and application behavior under operating conditions that may lead to timing errors.

4.3.3.1 Instruction Characterization

We first study the behavior of addition (l.add) and signed multiplication (l.mul) instructions across different frequencies. Addition is evaluated in two forms, using input operands with a 16-bit value range and a 16-bit result, and with 32-bit operands giving a 32-bit result. Multiplication is performed with input operands that cover a 16-bit value range and a 32-bit result. All operands are chosen uniformly at random, and the operating point for the error analysis is a supply voltage of 0.7 V with σ = 10 mV voltage noise. The mean squared error (MSE) due to timing errors versus the clock frequency is shown in Figure 4.11.


Figure 4.11 – Mean squared error vs. clock frequency for addition (with 16-bit and 32-bit operands) and multiplication instructions at Vdd = 0.7 V with σ = 10 mV (model C).


The plot shows that first calculation errors (MSE > 0) occur at 877 MHz, 746 MHz, and 685 MHz for the 16-bit addition, 32-bit addition, and multiplication, respectively. This spread illustrates the relatively large difference between the points of first failure (PoFF) for different arithmetic instructions and also highlights the importance of timing error modeling on single-bit granularity, which can be seen by the significant difference in PoFF when comparing 16-bit with 32-bit addition. Moreover, we observe that the MSE has moderate magnitude for low frequencies and saturates close to maximum values (corresponding to the used operand bit-widths) after about 15% of further frequency increase beyond the PoFF.

4.3.3.2 Impact of Frequency, Voltage, and Noise

A key feature of the proposed statistical fault injection model is that it captures many details of the gate level implementation that are required to study the transition region in which timing errors start to appear due to frequency- and/or voltage-over-scaling or insufficient margins to protect against supply noise. On the application level, the impact of these effects is often characterized by four different metrics:

• the probability for the program/application to finish execution;

• the probability for the execution to be correct, i.e., to provide a fully correct result;

• the rate of injected faults (in FIs per 1000 cycles of kernel execution);

• and the error of the program output, which corresponds to the output quality of the application.

Figure 4.12 shows these metrics for the median benchmark running at different operating frequencies on the hardware, with two different supply voltages and three different levels of voltage noise (sub-figures a-f), averaged over 200 Monte Carlo trials per data point. The graphs only show the interesting transition regions between reliable and unreliable operation, while the low-frequency ranges where no errors occur (i.e., where no faults are injected) are grayed out and marked with n/a. The high-frequency ranges where it is no longer possible for any program run to finish execution are similarly left out and marked with n/a as well.

A first interesting observation (without considering voltage noise) is that the simulations reveal that the PoFF, where the application first does not finish with a 100% correct result, is displaced from the pessimistic STA limit. The corresponding possible gain in operating speed from frequency-over-scaling is indicated in the sub-figures. Note that below the PoFF (in terms of clock frequency) faults might already be injected with a very low rate, if the application is able to tolerate these errors and hence still produce a fully correct result.

A second observation relates to the impact of voltage noise. From Figure 4.12a-c/d-f we can clearly see the impact of the amount of noise on the transition ranges, with higher noise causing a shift for all four metrics down to lower frequencies. This can also clearly be seen by



Figure 4.12 – Program performance depending on clock frequency, in terms of finish and correctness probabilities of the program, fault injection rate, and output error, for the median benchmark for different levels of Vdd and Vdd-noise (model C).

the impact of the voltage noise on the gain over the STA limit at the PoFF, which already disappears at a noise level of σ = 25 mV. It can furthermore be noted that increased voltage noise smooths out the transition regions, especially for the output error metric, mainly caused by the more gradual increase in the FI rate.

As can be seen in all configurations, as soon as the probability of the program to finish reaches low values, the output error of the remaining successful runs quickly saturates.

Looking at the effect of supply voltage, one can observe that a higher supply voltage results in sharper changes in the transition regions, which also means that the application error explodes more rapidly after the PoFF. Hence, our analysis indicates that a lower supply voltage favors a gradual failure behavior, which is often desired in approximate computing applications.

It has to be noted that the presented trends regarding impact analysis of supply voltage and noise on application performance cannot easily be generalized, and can highly depend on the specific kernels employed by the applications, hence requiring dedicated simulation and analysis using our proposed statistical approach.

4.3.3.3 Performance Comparison of Benchmarks

A key aspect of our model C is the distinction between different operations. This distinction is especially important when considering different types of kernels which rely on different instruction types and sequences. We demonstrate how this behavior is exposed by comparing different benchmarks in Figure 4.13 at an operating point of 0.7 V with a supply noise of σ = 10 mV. For comparison, the corresponding sub-figure with equal parameters for the median benchmark is Figure 4.12b. Note that since the voltage noise level is non-zero, errors can already occur at and below the frequency determined by STA.

Comparing 8-bit and 16-bit matrix multiplication (Figure 4.13a & Figure 4.13b), we see very similar application behavior. The lower bit-width helps the 8-bit benchmark in the region below the STA frequency to achieve a significantly higher rate of fully correct benchmark executions, even though timing errors are still induced due to the supply voltage noise. The MSE develops similarly for both bit-widths, with a factor of about 10³ between them, due to the different operand and result ranges.

Compared to the matrix multiplication benchmark, the k-means benchmark (Figure 4.13c) experiences a fault injection rate that is almost one order of magnitude lower at the same operating frequency, which can be explained by the significantly lower number of more timing critical multiplications. Nevertheless, the kernel shows a considerable performance degradation (30-40%) of the quality metric, even though the code is still able to finish execution for a similar frequency range above the STA limit.

Finally, the Dijkstra benchmark (Figure 4.13d) is characterized by only a very narrow transition



Figure 4.13 – Program performances for the matrix multiplication (8-bit & 16-bit), k-means clustering, and Dijkstra benchmarks, at Vdd = 0.7 V with Vdd-noise σ = 10 mV (model C).

region (hence it is simulated with a higher resolution). We can observe that although the kernel shows a frequency gain at the PoFF (which the others do not, apart from the median benchmark), 4% of further frequency increase beyond the PoFF already causes the application to fail completely, while still having a very low FI rate (below 1 FI per kCycle).

Figure 4.13 furthermore contrasts the high characterization detail of our proposed model C with the observed behavior under model B+. As can be seen from the plot, the hard failure threshold at 661 MHz (see also Section 4.3.2.3) applies to all benchmarks equally, which means that model B+ does not allow for any of the provided analysis in the relevant transition region.



Figure 4.14 – Relative error vs. core power consumption trade-off for the median benchmark, through voltage over-scaling at fixed iso-frequency (nominal: 707 MHz @ 0.7 V) with effect of varying supply voltage noise (model C).

4.3.3.4 Error vs. Power Consumption Trade-Off

An important motivation for avoiding unnecessary voltage margins at the risk of errors or quality degradation is power savings. The proposed simulation model C improves the corresponding trade-off analysis by accounting for the underlying gate level hardware structure.

To illustrate this analysis for the interesting transition region between nominal/safe voltage levels and complete system failure at severely reduced voltage levels and the same nominal frequency, we characterize achievable power savings in relation to the output quality for the median benchmark. The microprocessor operates at a fixed nominal frequency of 707 MHz (the STA limit at 0.7 V). The quality metric is computed using our simulation model C, while the power savings are obtained by translating potential frequency-over-scaling performance gains into an equivalent reduction of the supply voltage. More specifically, the power savings from this voltage-over-scaling are calculated by quadratic scaling of the consumed active core power between two reference points obtained through VCD-based gate-level post-layout simulations of the core. The reference points are 10.9 µW/MHz @ 0.6 V and 15.0 µW/MHz @ 0.7 V, while leakage (of the core) accounts for only 2% and 3% of the total power, respectively.
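The voltage-to-power translation above can be sketched as follows. This is a minimal illustration, not the thesis tooling, assuming "quadratic scaling" means active power proportional to V²; the two reference points are taken from the text, and the helper names are our own.

```python
# Illustrative sketch: relative core power under voltage over-scaling,
# assuming active power P(V) = k * V^2 fitted through the two post-layout
# reference points quoted in the text.

REF_POINTS = [(0.6, 10.9), (0.7, 15.0)]  # (Vdd [V], active power [uW/MHz])

def quadratic_coefficient(points):
    """Least-squares fit of k in P(V) = k * V^2 through the reference points."""
    num = sum(p * v * v for v, p in points)
    den = sum(v ** 4 for v, _ in points)
    return num / den

def relative_power(v, v_nom=0.7):
    """Core power at Vdd = v, normalized to the nominal point (k cancels out)."""
    k = quadratic_coefficient(REF_POINTS)
    return (k * v * v) / (k * v_nom * v_nom)

# Reducing Vdd from 0.700 V to 0.657 V gives roughly 0.88x core power,
# in line with the 0.88x operating point shown in Figure 4.14.
```

Note that the fit is nearly exact for the quoted reference points (10.9/0.6² ≈ 15.0/0.7² ≈ 30.5 µW/MHz/V²), which supports the quadratic scaling assumption.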

Figure 4.14 details the trade-off analysis. The bottom right of the figure indicates the nominal operating point at a normalized core power of 1, and fully error-free behavior. At a relative core power of 0.93 (due to a Vdd reduction of 33 mV) we reach the PoFF if no supply voltage noise is present. Scaling the voltage further yields a relative power of 0.88 at an average relative output error of 22%.

Considering also the impact of supply voltage noise, we observe that at σ = 10 mV the error-vs-power curve still follows the no-noise configuration relatively closely, with somewhat higher power cost for equal quality. At σ = 25 mV however, we observe a significantly earlier rise in output error, indicating that only marginal power gains are possible for a reasonable output quality.


5 A Microprocessor with Cycle-By-Cycle Dynamic Clock Adjustment

Today, digital synchronous circuits, and in particular microprocessors, all adhere to the basic assumption that the clock period must be set according to a global worst-case timing path. Many innovative techniques on the circuit and microarchitectural level that have been developed in recent years [EKD+03, DTP+09, BTK+09, TBW+09] to counter PVT variation effects¹, especially dynamic variations, still adhere to this basic assumption on a fundamental level. While all these approaches apply dynamic methods (i.e., DVFS) to adapt their operation to the varying dynamic circuit conditions, from a cycle-by-cycle perspective the circuit is still operating in a worst-case mode. This is because clock frequency and supply voltage are calibrated such that an overall worst-case path delay (under current temperature and voltage conditions) is not exceeded with very high probability. In other words, a microprocessor under Razor and other similar approaches typically operates only slightly beyond or at the point of first failure (PoFF) dictated by the (current) critical path of the circuit, in order to keep architectural replay (or other correction) costs due to timing errors down, and to maximize energy efficiency while keeping the impact on throughput minimal. Consequently, the clock frequency and supply voltage of the processor are set such that for any sequence of instructions that the processor executes, timing errors only occur extremely rarely, e.g., with a rate of ≈ 10^-5 at the suggested operating point (PoFF) for a RazorII prototype design [DTP+09]. In such an operating scenario with a tuned but fixed clock period, timing errors hence only occur when a critical or near-critical path is excited in rare unfavorable conditions due to dynamic variations and consequently violates the applied period.
However, as has been shown in Chapter 4, different instructions can exhibit significantly different state-dependent timing margins in the same stage of a processor pipeline, since in many cycles critical or near-critical paths are not excited. This leaves a large potential of timing margins unexploited, even for DVFS-capable processors with error detection and correction.

In this chapter we show that by applying a nonconventional synthesis strategy to a standard microprocessor core, the longest relevant path for a given pipeline state can vary significantly, depending on the executed instruction types. Therefore, by departing from the conventional constant frequency paradigm and by adjusting the clock frequency according to the current pipeline state, the corresponding dynamic timing margins can be reduced and average throughput and/or energy efficiency can be improved.

¹As has been discussed in some more detail in Section 4.1.1.

The rest of the chapter is organized as follows: Section 5.1 first discusses related work, and then formally introduces our instruction-based dynamic clock adjustment (DCA) approach. We then present a design flow and a characterization & evaluation environment for DCA-enabled microprocessor design. In Section 5.2 the DCA approach is then evaluated in a case study using instruction set simulation for an OpenRISC core. To this end, a custom OpenRISC microarchitecture is presented, as well as its characterization results in terms of the DTA of the implemented processor, and a simulation-based performance and power analysis demonstrates the potential gains achievable by DCA. In order to show the feasibility of the DCA idea beyond a simulation environment, we then target the implementation of a silicon prototype utilizing DCA. To this end, we extend our OpenRISC microarchitecture in Section 5.3 by the necessary hardware modules to enable DCA on a real microprocessor IC, named DynOR. The section details the full chip architecture of DynOR, including the clock generation unit that enables cycle-by-cycle dynamic clock adjustment. Section 5.4 discusses the implementation details of the test chip in 28 nm FD-SOI CMOS, and presents the employed measurement setup. Finally, in Section 5.5 detailed measurement and characterization results of the fabricated DynOR samples are provided, regarding the achieved speed improvements and energy savings.

5.1 Instruction-Based Dynamic Clock Adjustment

The idea of instruction-based dynamic clock adjustment is to clock a microprocessor circuit with a cycle-by-cycle varying clock period, which is derived for each cycle from the instruction type(s) that are in flight in the processor pipeline. The goal of this technique is to leverage the differences in timing requirements for the different instruction types, in order to reduce dynamic timing margins and thereby increase instruction throughput beyond what is feasible with a constant worst-case clock period.

5.1.1 Related Work

A SIMD linear array processor with dynamic frequency clocking has been proposed in [RVB98]. This application-specific array processor for image processing applications is constructed from 8-bit processing elements (PEs), chained in a 1D array that is operated in a SIMD fashion. Each PE comprises the functionality of a simple 8-bit ALU (adder, multiplier, logic, etc.) together with a small scratch pad memory (SRAM). The PEs are then clocked according to the operation/instruction that they execute. The generation of the different clock periods is very coarse-grained, and large timing margins are given up to achieve a simple clocking scheme which consists of four clock periods, where each period is always doubled in length (frequencies: 50, 100, 200, and 400 MHz). Results are simulation-based for a design in a 1.0 µm

144 5.1. Instruction-Based Dynamic Clock Adjustment

CMOS technology. In [KRV99] a more generalized approach to datapath synthesis using dynamic frequency clocking and multiple voltages is discussed. However, the proposed/assumed processor model is still an array processor architecture composed of many small non-pipelined functional units working in parallel. Moreover, results are purely model-based and no hardware implementations showing the feasibility of the approach are provided.

For field-programmable gate arrays (FPGAs), work on processor systems operated by a dynamic clock can be found in the literature. [BES06] proposes a variable speed processor with a variable period clock synthesizer based on period multipliers. A prototype on an Altera Stratix FPGA is demonstrated, targeted at a basic configuration of the Nios soft-core (no avoidance, no forwarding, and no caches implemented). The clock speed adjustment in the prototype is limited to two distinct levels: fast instructions (147 MHz) and slow instructions (110 MHz). The authors additionally showed post-synthesis results for an implementation of the clock synthesizer (supporting more frequencies) in a 180 nm CMOS technology, without considering any backend-related timing effects, which are however crucial, e.g., for accurate assessment of the clock distribution.

More recently, [PCC13] presented an FPGA-based SIMD architecture with 128 PEs, employing two effective instruction classes (multiply and non-multiply) to dynamically adjust the clock frequency (to 117 MHz and 160 MHz, respectively). This work is further extended in [GPC15] by refining the dynamic clock generation on the FPGA via a hybrid approach employing both clock stretching and clock multiplexing.

A general, related dynamic clocking approach is presented in [GMKR10], where adaptive clocking using occasional two cycle operations is applied to arithmetic units (i.e., not a full processor). The approach differs from instruction-based dynamic clock adjustment, since the two cycle clock stretching (effectively implemented with clock gating) is activated by a pre-decoder for critical path excitation detection, looking at the incoming operand data.

A method providing application-adaptive guardbanding to mitigate static and dynamic variability based on a switchable clock is proposed in [RBG14]. The instruction window/history of the first three pipeline stages (before the critical execution stage) of a LEON3 processor is monitored, discerning instruction sequences of two different instruction classes (ALU and non-ALU), derived through a systematic design-time characterization method. At run-time the clock frequency of the core is then switched between two different values corresponding to the two classes, to guarantee correctness when operating in non-probabilistic/intolerant application mode. Voltage and temperature effects are additionally considered for correct adjustment of the clock, as in other CPM-based systems with DVFS. The presented timing results are based on accurate post place & route data of an implementation of the LEON3 core in 45 nm CMOS, including the dynamic period lookup table, but excluding the circuitry required for clock generation, which is what enables the adaptive clocking approach. Besides potential issues regarding the immediate adjustment of the clock period, the integration of the clock generation with the employed LEON3 core microarchitecture, especially regarding timing closure of the clock-adjustment control loop, is not further addressed.

5.1.2 DCA Concept

The aim of our approach to instruction-based dynamic clock adjustment is to significantly improve its granularity, in two ways, compared to the presented related work. To reduce the remaining unexploited dynamic timing margins as much as possible, we aim to differentiate between a larger set of instruction types, and for each of these types provide a clock period that is as close to the required period as possible, i.e., employ a very fine-grained clock adjustment.

To this end, let us begin by categorizing the different path delays present in a pipelined processor, and define instruction-based DCA formally.

Definition of DCA for Pipelined Processors

As almost any digital circuit, a pipelined processor typically consists of a set P = {p_1, ..., p_N} of N unique combinational paths, which are characterized by their delays D(p_n) for n = 1,...,N (each including the setup time of the endpoint). In a pipeline with S stages, each path can be attributed to exactly one stage s = 1,...,S, based on its endpoint, and we can define corresponding exclusive per-stage path groups P^s such that P = ∪_{s=1}^S P^s and P^s ∩ P^{s'} = ∅ for s ≠ s'. In a synchronous design, the longest path (in terms of delay) across all pipeline stages determines the clock period such that

T_clk^static = max_{s=1,...,S} { max_{p ∈ P^s} D(p) } = max_{p ∈ P} D(p).   (5.1)

However, the bound put on the clock period in (5.1) is overly pessimistic since it ignores the fact that in each pipeline stage, the respective longest path max_{p ∈ P^s} D(p) is not always excited. Instead, at time t, both operands and especially the instruction I_s[t] that is currently evaluated in stage s determine a subset of relevant paths P^s_{I_s[t]} ⊂ P^s that is sufficient to determine the position of the next clock edge for that pipeline stage. Since all registers are connected to the same clock, we can now determine a safe, but less pessimistic clock period for the time instant t as follows

T_clk^dyn[t] = max_{s=1,...,S} { max_{p ∈ P^s_{I_s[t]}} D(p) } ≤ max_{p ∈ P} D(p).   (5.2)

The corresponding relevant (active) instructions I_s[t] = L[t+1−s], s = 1,...,S are thereby determined by the S subsequent instructions in the program trace L[t], t = 1,...,M, which influence the average cycle time T_clk^avg taken over the entire program execution²:

T_clk^avg = (1/M) Σ_{t=1}^M T_clk^dyn[t].   (5.3)

Figure 5.1 – Proposed instruction-based dynamic clock adjustment (DCA) technique in a S = 3-stage pipelined processor, with lookup table for worst-case instruction-dependent stage delays.

Less formally, we can say that in each cycle the relevant path for each pipeline stage depends on the instruction³ that is currently in that stage. The longest relevant path across all stages then determines the clock period for that cycle that is necessary to avoid timing violations.

The main idea behind our approach is to exploit the different timing requirements of the various instruction sequences in the program trace to opportunistically over-scale the average frequency (i.e., to operate effectively at T_clk^avg < T_clk^static). To this end, we propose to adjust the clock period on a cycle-by-cycle basis, ideally according to (5.2). The corresponding architecture, as shown in Figure 5.1, monitors the instructions I_s[t] that are in flight in all processor pipeline stages s = 1,...,S and uses a lookup table (LUT) for each stage that contains for each instruction type I the maximum delay of all relevant paths d_I^s = max_{p ∈ P^s_I} D(p) in that stage. These per-stage maximum delays (for the respective current instructions I_s[t], s = 1,...,S) are then combined to yield T_clk^dyn[t] and to adjust a tunable clock generator (CG) accordingly in each cycle.

²For simplicity, we assume long programs with limited impact of initialization effects.
³The delay of the relevant path also depends on other conditions, such as the operand values, the state of the forwarding logic (influenced by the instructions in other stages), as well as dynamic physical variations (such as temperature), which are all accounted for by considering their worst-case effects on the path delay.
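The per-cycle period selection of Figure 5.1 can be sketched in a few lines. This is an illustrative sketch for an assumed 3-stage pipeline; the LUT entries and instruction names are hypothetical placeholder delays (in ps), not characterization results.

```python
# Sketch of the DCA period lookup of Figure 5.1 for an assumed 3-stage
# pipeline. The entries d_I^s below are hypothetical delays in ps; in the
# real flow they come from the DTA characterization (Section 5.1.3.2).

DELAY_LUT = {  # DELAY_LUT[s][I] = worst-case delay d_I^s of stage s under I
    1: {"add": 900, "mul": 1000, "nop": 700},
    2: {"add": 1450, "mul": 1900, "nop": 1150},
    3: {"add": 800, "mul": 950, "nop": 600},
}

def dyn_clock_period(in_flight):
    """in_flight[s] = instruction currently in stage s; implements eq. (5.2)."""
    return max(DELAY_LUT[s][instr] for s, instr in in_flight.items())

# A multiply in the execute stage dominates this cycle:
assert dyn_clock_period({1: "nop", 2: "mul", 3: "add"}) == 1900
# A cycle without long operations can be clocked much faster:
assert dyn_clock_period({1: "add", 2: "nop", 3: "nop"}) == 1150
```

The maximum over all stages corresponds to the "max" block in Figure 5.1, whose output drives the tunable clock generator.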


Such a CG can for example be realized in the form of a tunable delay-locked loop (DLL) with a multiplexed clock output [KKK+06], or via a multi-PLL clocking unit such as the one proposed in [TKD+07]. We note that the design of an appropriate CG can have a significant influence on the system power consumption, and requires special care. Moreover, fine granularity and a large number of different available clock period settings that can be switched between instantaneously are desired for effective DCA. In Section 5.3 we present our hardware solution for clock generation with DCA, in the form of a tunable ring oscillator with cycle-by-cycle period adjustment over a wide range with fine granularity.

5.1.3 Design Flow and Evaluation Environment

The design flow for the implementation and performance evaluation of the proposed cycle- by-cycle DCA technique is shown in Figure 5.2.

5.1.3.1 Implementation

The first step in the design process is the RTL implementation, optimization, synthesis, and physical design (full place & route) of a suitable microprocessor⁴. While we keep the objective of achieving a high clock frequency, as determined by static timing analysis, we also note that this worst-case timing is not the only objective to obtain a high average clock frequency with DCA. Instead, it is important to not only optimize the critical path of the design, but also to reduce the number of near-critical paths and to keep the remaining paths short to minimize max_{p ∈ P^s_I} D(p), s = 1,...,S for all (or at least all frequent) instructions I. Unfortunately, conventional implementation strategies and well balanced pipelines tend to produce a so-called timing wall as shown on the left-hand side of Figure 5.3, since they focus on the critical path only and allow other paths to become near-critical to recover area or power [KKKS10]. While this timing wall has no negative impact on the static timing limit, it affects the possible gains available from the proposed DCA technique. We therefore propose to apply suitable synthesis and implementation constraints (and possibly dedicated tools as in [KKKS10]) that also optimize sub-critical paths to keep them short, as illustrated on the right-hand side of Figure 5.3. While this objective usually involves some overhead (in area, speed, and/or power), we will later show that this overhead can be kept small, when discussing in Section 5.2 the specific processor core chosen for our DCA case study. Additionally, to avoid the timing wall in the synthesis and place & route step, optimizations at the RTL can be used to remove long, but functionally irrelevant paths from P^s_I, for example through shielding or silencing. Section 5.3 provides some more specific examples of such optimizations for our silicon prototype implementation of an OpenRISC processor with DCA.

⁴The illustration in Figure 5.2 is based on an OpenRISC core that is employed for our case study in Section 5.2.

Figure 5.2 – DCA design flow including dynamic timing analysis, instruction timing extraction and power & performance evaluation.

Figure 5.3 – Shaping of path delay profile for the DCA approach.

5.1.3.2 Characterization

The second step in the process is the characterization of the dynamic timing margins per instruction and per stage. This information is used to predict the potential performance (speed and power) improvements of the approach, to identify possible bottlenecks in the architecture, and eventually to populate the lookup table for the clock period adjustment of the design. To this end, we start from the final netlist and the standard delay format (SDF) post-layout timing information of the design and perform gate-level simulations. We run different test programs, which include small hand-written kernels as well as semi-random test cases that are generated by a code generation tool. The simulation outputs value change dumps (VCDs) that are used for power analysis based on the switching activity of the core. Moreover, the evolution of the program counter is recorded to be able to generate a program trace L[t] from the disassembled binaries. The next step is the dynamic timing analysis, as introduced in Section 4.2. In contrast to conventional static timing analysis, we are interested in the delay after which each path endpoint settles in each cycle during execution of actual programs. As has been discussed earlier, dynamic timing analysis aims to uncover the corresponding unused timing margins of the processor that are available at runtime, which cannot accurately be characterized through static timing analysis, due to the missing notion of path activation probabilities. DTA provides the available dynamic slack for all endpoints within the processor pipeline for each clock cycle. The time-average over the worst-case slack among all endpoints in each cycle then allows to derive an optimistic lower bound T_clk^min for the average clock period T_clk^avg of the processor, including all data and instruction dependencies:

T_clk^min = (1/M) Σ_{t=1}^M max_{p ∈ P_e(t)} D(p),   (5.4)

where P_e(t) ⊂ P is the subset of all excited paths of the circuit at time t. We furthermore define P_e^s(t) ⊂ P_e(t) as the subset of excited paths at time t which go to an endpoint in stage s. In a next step, the DTA tool partitions path endpoints into path groups P^s according to a provided pipeline specification of the processor and determines the longest delay per pipeline stage and per cycle

d^s[t] = max_{p ∈ P_e^s(t)} D(p),   (5.5)

with s = 1,...,S and t = 1,...,M. The maximum delays d_I^s per instruction I and per pipeline stage s are then extracted by taking the maximum across all occurrences of that instruction in the program trace L as

d_I^s = max_{t: L[t] = I} d^s[t + s − 1].   (5.6)

The worst-case delays d_I^s for s = 1,...,S and all instructions I are then exported and used to populate the delay prediction LUT (compare Figure 5.1) that can later be used for the DCA procedure. Note that the maximization in (5.6) essentially extracts the worst case across all occurrences of I in stage s. Hence, (5.6) provides a safe limit on the cycle time for each pipeline stage s, if only the corresponding instruction type I is known. From this limit, we can later compute a realistic prediction of the speedup if only information on I is used for DCA, according to the dynamically set clock period T_clk^dca(t) at time t:
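The extraction step in (5.5)/(5.6) can be sketched as follows. This is a toy illustration under the assumption that L[τ] enters stage 1 at cycle τ (so it occupies stage s at cycle τ + s − 1); the helper name and the input data are our own, while the real flow parses gate-level DTA results.

```python
# Sketch of the LUT population step (eq. 5.6): for each stage s, take the
# worst case of the per-cycle settling delays d^s[t] over all cycles in
# which instruction I occupies stage s.

def extract_delay_lut(trace, stage_delay, num_stages):
    """trace[tau-1] = L[tau]; stage_delay[s][t-1] = d^s[t] (1-based t)."""
    M = len(trace)
    lut = {s: {} for s in range(1, num_stages + 1)}  # lut[s][I] = d_I^s
    for s in range(1, num_stages + 1):
        for tau, instr in enumerate(trace, start=1):
            t = tau + s - 1                # cycle in which L[tau] is in stage s
            if t <= M:                     # ignore the pipeline drain phase
                d = stage_delay[s][t - 1]
                lut[s][instr] = max(lut[s].get(instr, 0), d)
    return lut

# Toy trace and per-cycle stage delays (arbitrary units):
trace = ["add", "mul", "add"]
stage_delay = {1: [10, 30, 20], 2: [5, 40, 25]}
lut = extract_delay_lut(trace, stage_delay, num_stages=2)
```

With these toy inputs the first "add" reaches stage 2 in cycle 2 and therefore picks up the delay 40, which becomes the stage-2 LUT entry for "add".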

T_clk^dca(t) = max_{s=1,...,S} d^s_{L[t+1−s]}.   (5.7)

Moreover, the average clock period under instruction-based DCA, which is lower-bounded by T_clk^min (as defined in (5.4)), is given by

T_clk^dca,avg = (1/M) Σ_{t=1}^M T_clk^dca(t) ≥ T_clk^min.   (5.8)

5.1.3.3 DCA Evaluation

We evaluate the potential performance gains for an arbitrary program with the help of a cycle-accurate instruction set simulator (ISS) of the processor core, which is enhanced to be aware of the dynamic clock adjustment technique. Consequently, performance evaluation can be performed on full benchmark suites, since the ISS typically executes programs with orders of magnitude higher throughput compared to a fully timing-annotated gate-level simulation. While still operating on a cycle-by-cycle basis, the ISS additionally accounts for varying real-time instruction delays according to the delay table generated by our dynamic timing analysis tool flow. To this end, the ISS provides, next to a cycle counter, also an execution time counter, which is used to accumulate the clock period duration for each simulated cycle according to (5.7). It is hence not only required that the ISS operates cycle-accurately, but also that it has a notion of the processor pipeline state, such that in each cycle the maximum clock period can be calculated according to the instructions residing in the pipeline and their corresponding delay entries in the LUT, as illustrated in Figure 5.1.

The speedup S_DCA due to DCA compared to standard static clocking is then simply calculated by the ratio

S_DCA = (n_cyc · t_period) / t_exec,   (5.9)

where n_cyc is the total number of cycles executed by the ISS at the end of a program/benchmark, t_period is the static clock period according to the worst path delay indicated by STA, and t_exec is the total execution time as accumulated by the DCA-aware ISS (according to (5.7)).
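The ISS accounting of (5.7) and (5.9) can be sketched as follows. This is an illustrative toy model, not the actual DCA-aware ISS: the two-stage LUT delays are assumed placeholder values in ps, and pipeline slots outside the trace are modeled as executing "nop".

```python
# Sketch of the DCA-aware ISS time accounting (eqs. 5.7 and 5.9): per cycle,
# the clock period is the maximum LUT delay over all in-flight instructions,
# and the speedup is n_cyc * t_period / t_exec.

def dca_speedup(trace, lut, num_stages, t_static):
    n_cyc = len(trace) + num_stages - 1      # cycles until the pipeline drains
    t_exec = 0.0
    for t in range(1, n_cyc + 1):
        period = 0.0
        for s in range(1, num_stages + 1):   # I_s[t] = L[t + 1 - s]
            tau = t + 1 - s
            instr = trace[tau - 1] if 1 <= tau <= len(trace) else "nop"
            period = max(period, lut[s][instr])
        t_exec += period                     # accumulate T_clk^dca(t)
    return n_cyc * t_static / t_exec         # eq. (5.9)

LUT = {1: {"add": 800, "mul": 1000, "nop": 600},
       2: {"add": 1400, "mul": 1900, "nop": 1100}}
speedup = dca_speedup(["add", "add"], LUT, num_stages=2, t_static=1900)
```

With the static period pinned to the overall worst-case delay (1900 ps here), any cycle that does not excite the multiplier path shortens t_exec, so the speedup is at least 1 for any trace.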

5.2 DCA Case Study Based on Instruction Set Simulation

The proposed DCA approach is applied to a general-purpose open-source processor core, the OpenRISC [Ope16, Ope14]. Section 5.2.1 discusses the microarchitecture of that core, including our microarchitectural modifications, as well as the applied constraints for reducing (i.e., smoothing) the timing wall and making the original core more suitable for our technique. In Section 5.2.2 we describe the specifics of the employed performance evaluation environment, based on a custom OpenRISC instruction set simulator. In Section 5.2.3 we provide the characterization and evaluation results of the OpenRISC with DCA.



Figure 5.4 – Customized mor1kx microarchitecture of OpenRISC.

5.2.1 OpenRISC Microarchitecture, Optimization, and Implementation

The mor1kx cappuccino microarchitecture [KB+14] implementation of the OR1000 OpenRISC architecture [Ope14], an in-order 32-bit RISC pipeline consisting of six stages, is used as a basis for our case study. The schematic of our adapted core microarchitecture is shown in Figure 5.4. The fetch unit and load store unit (LSU), as well as parts of the arithmetic logic unit (e.g., the multiplier), have been optimized to achieve an execution throughput of close to one instruction per cycle (IPC). Furthermore, the microarchitecture of the pipeline has been modified to support a tightly coupled memory interface, for the instruction as well as the data memory⁵. Since both memories are implemented as fast SRAMs, access latencies for instructions and data are both single-cycle. Other modifications include the integration of the data memory requests and parts of the load store unit into the execute stage. Most modifications of the base microarchitecture are aimed at increasing the IPC metric, such that it is as close to one as possible, i.e., at the avoidance of any stall cycles or pipeline bubbles. Besides the throughput increase, the motivation behind these optimizations is to achieve a more predictable flow of instructions through the pipeline, which allows the DCA to be performed in a simpler fashion.

The microarchitecture moreover employs a single 32-entry register file with two read ports and one write port. The ALU comprises an adder, a single-cycle multiplier (with 32-bit output), and a shifter, and moreover supports standard logical operations (AND, OR, XOR, etc.).

To increase the opportunities for shorter clock cycles, the multiplier has been shielded from the inputs of the other units of the ALU by separate, parallel registers that are only loaded for multiplications. This measure avoids unnecessary "parasitic" activity in the multiplier for operations other than multiplications, which not only reduces power but also avoids excitation of the long paths through the multiplier during other operations.

⁵Note that the default cache support of the reference mor1kx cappuccino implementation has been removed in favor of a tightly coupled memory architecture.

To reduce the issue of a timing wall and to increase the number of short paths as described in Section 5.1.3.1, we utilize the critical range optimization feature of Synopsys Design Compiler (DC) along with path over-constraining during synthesis in topographical mode. The critical range feature of DC directs the synthesis optimization process to further work on paths that are near-critical or not critical. For example, if a synthesis optimization of a design ends up with a critical path delay of d_crit for the longest path of the design, other mechanisms such as area recovery will try to relax most other paths, such that they end up having a delay close to d_crit to save area and power, e.g., through gate down-sizing, consequently creating the discussed timing wall. However, with a critical range setting of c_range applied to all endpoints of the design, DC considers during synthesis all paths that exceed the delay d_crit − c_range as critical as well, and tries to work on them, too. This allows to push back all paths which actually can be made short, such that they eventually have a decreased delay, even when the overall critical path is not shortened by this optimization.

Clearly, this optimization does not come for free in terms of area and power. However, a comparison to a core without the critical range optimization shows that both the area overhead and the increase in power consumption can be limited to around 5-13% in a 28 nm FD-SOI CMOS technology, depending on the target operating voltage, process corner, and temperature (i.e., depending on the target library). It has to be noted that, given the general inherent uncertainty of a few percent in synthesis results, we did not observe any specific trend relating the targeted operating conditions to the amount of area or power overhead induced by critical range optimization for DCA. However, we did not encounter any conditions where the penalties exceeded 13%.

The positive effects of the applied design step for our DCA technique are shown in Figure 5.5, which reports the changes in the dynamic maximum instruction delays when applying the critical range optimization, compared to a standard implementation of the same design. Interestingly, for this specific design the overall minimum clock period derived from static timing analysis increases by 10% with the critical range constraints, due to unwanted side effects of the synthesis tool⁶. However, the worst-case delay excited by most instructions reduces significantly compared to the conventional design, and an increase in delay is only visible for the multiplication instruction.

As indicated in the design flow description in Section 5.1.3, after synthesis the OpenRISC core is always fully placed & routed, together with the required memories for instruction and data, before it is simulated and analyzed regarding its dynamic timings. The layout of the

6Note that this worsening of the critical path delay is not a general/fundamental issue of the applied path reshaping concept, but rather a specific artifact of the applied synthesis flow and tools, which unfortunately could not be avoided at the time, for this specific implementation of the OpenRISC that is employed for the case study in this section.

153 Chapter 5. A Microprocessor with Cycle-By-Cycle Dynamic Clock Adjustment

Figure 5.5 data (worst-case stage delay [ps], baseline vs. DCA optimized):

Instruction (mnemonic)     Baseline   DCA optimized   Change
addition (l.add)           1681       1467            -13%
logical AND (l.and)        1686       1482            -12%
branch if flag (l.bf)      1672       1470            -12%
jump (l.j)                 1584       1172            -26%
load word (l.lwz)          1628       1391            -15%
signed multiply (l.mul)    1723       1899            +10%
no operation (l.nop)       1493       1163            -22%
store word (l.sw)          1680       1431            -15%

Figure 5.5 – Effects of critical range optimization for DCA on the worst-case delays for some exemplary OpenRISC instructions. The worst-case delays for each instruction occur in the execute stage, apart from l.j, where the longest delay is found in the address stage.

placed & routed OpenRISC core in 28 nm FD-SOI CMOS is shown in Figure 5.6. On the top and bottom there is one 16 kB (4096 × 32 bit) high-density high-performance SRAM macro each, while the CPU core is placed in the middle of the layout, occupying only part of the middle of the floorplan due to the very wide aspect ratio of the SRAM cuts. The DTA and DCA performance results presented in the following Section 5.2.3 are based on this initial implementation. Note that this initial implementation does not comprise any circuitry for performing actual DCA in hardware, i.e., with a dynamic clock. At this stage the design only encompasses the processor core with a shaped timing profile as discussed, including the SRAMs for accurate timing behavior.

5.2.2 Performance Evaluation Environment

For this case study, we base our instruction set simulator (ISS) on a custom-developed LISA model of the presented OpenRISC core microarchitecture, and employ the Synopsys Processor Designer tool suite [Syn12] to generate the custom simulator from this model. The model is then enhanced to be DCA-aware, including the necessary provisions to track the execution time, as outlined in Section 5.1.3.3. To this end, we incorporate header files into the generated C++ source code of the ISS. These header files contain the instruction delay tables generated by the dynamic timing analysis tools.

154 5.2. DCA Case Study Based on Instruction Set Simulation

Figure 5.6 – Layout of the placed & routed OpenRISC core optimized for DCA in 28 nm FD-SOI CMOS with instruction and data memories, each realized as a 16 kB high-density high-performance SRAM macro.

Performance benchmarks are then executed, using the popular embedded benchmark suites CoreMark [EEM16] and BEEBS [PHB13]. Their C sources are compiled with the standard OpenRISC GNU toolchain (GCC).

The execution time and speedup results of the simulator are finally combined with the extracted power consumption values to evaluate the energy efficiency gains from voltage-frequency scaling at iso-execution-time, as typically encountered in embedded applications with real-time constraints. The voltage-frequency scaling analysis is based on repeated VCD-based gate-level simulations followed by power analysis in Encounter, using fully characterized cell libraries for the different operating points.

5.2.3 Characterization and Performance Results

We present the results of our proposed DCA design method in two parts: The first part focuses on the characterization of the dynamic timing margins per instruction and per stage for the OpenRISC core by applying the developed flow as described in Section 5.1.3, and also evaluates a lower bound on the average DCA clock period T_clk^dca,avg according to (5.4). The second part leverages these margins and presents the achievable performance and power gains when executing popular benchmark suites on our enhanced core with instruction-based DCA.

5.2.3.1 Dynamic Timing Analysis of OpenRISC

Before characterizing the core timing on an instruction level, we study an upper bound on speed improvements from dynamic clocking with a genie-aided clock adjustment according to (5.4). To this end, we perform dynamic timing analysis, but initially we do not enforce a worst-case clock period for all occurrences of the same instruction. Instead, we assume that the duration of each cycle can be adjusted according to the a-posteriori measured worst-case dynamic timing slack over all endpoints as described in Section 5.1.3.2. Figure 5.7 shows that considering all endpoints of the core (including the SRAMs), on average we only require a



Figure 5.7 – Histogram of dynamic maximum delays per cycle over all pipeline stages and endpoints of the OpenRISC core, derived via DTA.

delay of 1334 ps for correct program execution, in contrast to the conservative limit of 2026 ps given by the pessimistic static timing analysis. By correct program execution we mean that all timing requirements of all excited paths in any given cycle are always met. The histogram shows the distribution of the required clock period in the full core on a cycle-by-cycle basis. Exploiting these dynamic timing limits, the theoretical average speedup is 1.52x for an arithmetic-focused characterization kernel containing a significant amount of multiplication cycles (≈ 15%). When introducing endpoint groupings according to the pipeline stages, our analysis shows that the execute stage is by far the dominating stage regarding worst-case dynamic paths,

[Figure 5.8 data: EX 93% (13349 cycles), ADR 7% (1035 cycles), DC 4 cycles, FE 1 cycle, CTRL 0 cycles, WB 0 cycles]

Figure 5.8 – Percentage of cycles for which each pipeline stage contains the limiting path that determines the minimum cycle time when employing dynamic clocking.

as shown in Figure 5.8. In particular, for 93% of the cycles the overall maximum delay, which defines the length of a cycle in Figure 5.7, is due to an endpoint located at the end of the execute stage (i.e., between the execute and control stage), which can be a pipeline register or the data memory SRAM macro cell. In 7% of the cases, endpoints attributed to the address stage are the limiting factor, which corresponds to the instruction memory endpoints. Rarely, the fetch or decode stage can also be limiting; however, these cases appear in less than 1% of the cycles, and the limiting delays are then very short.
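The genie-aided bound amounts to averaging the per-cycle worst excited path delays and comparing against the static period. A minimal numeric sketch of this computation follows; the per-cycle delay values are invented for illustration, while in the thesis they come from gate-level dynamic timing analysis:

```python
# Genie-aided dynamic-clocking bound: with a clairvoyant clock, each
# cycle is only as long as its worst excited path delay, so the average
# achievable period is the mean of the per-cycle maximum delays.

def genie_speedup(per_cycle_max_delays_ps, static_period_ps):
    """Return (average dynamic period, speedup vs. static clocking)."""
    avg = sum(per_cycle_max_delays_ps) / len(per_cycle_max_delays_ps)
    return avg, static_period_ps / avg

# Hypothetical trace: mostly fast ALU cycles plus ~15% slow multiply
# cycles, mimicking the arithmetic-focused characterization kernel.
trace = [1200] * 85 + [1900] * 15
avg_period, speedup = genie_speedup(trace, static_period_ps=2026)
print(f"avg period: {avg_period:.0f} ps, speedup: {speedup:.2f}x")
```

With the invented trace this yields a speedup in the neighborhood of the 1.52x reported from the real DTA data, illustrating why a small fraction of slow multiply cycles does not erase the gain.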

These results indicate that clock adjustment can be performed in the case of our OpenRISC core by considering only the delay of an instruction in the execute stage, as long as it is guaranteed that the instruction memory address timings (address stage) are always respected, and that the few cases where other stages are limiting (which generally exhibit short delays) are covered as well. This fact can significantly simplify the clock adjustment control module, since its pipeline monitoring can be reduced, as we will see in more detail for the implementation of the DynOR microarchitecture in Section 5.3.

Table 5.1 presents a selective overview of extracted maximum instruction timings, based on a characterization benchmark with a gate-level simulation of 14 k cycles. Such a table (with 6 stage entries per instruction) is employed in our instruction set simulator to accurately model the instruction delays. Instructions for which no accurate maximum delay characterization could be performed (due to a limited number of occurrences in the benchmark) are represented in the table with the worst-case clock period from static timing analysis.
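How such a delay table drives the DCA-aware ISS can be sketched as follows; the delay values are taken from Table 5.1, while the simulator structure (a plain sum over a mnemonic stream) is a deliberate simplification of the generated C++ ISS:

```python
# Per-instruction worst-case delays in ps (subset of Table 5.1).
# Under instruction-based DCA, the clock period of a cycle is set to
# the worst-case delay of the instruction type; the ISS accumulates
# these periods to obtain the program execution time.
DELAY_TABLE_PS = {
    "l.add": 1467, "l.and": 1482, "l.bf": 1470, "l.j": 1172,
    "l.lwz": 1391, "l.mul": 1899, "l.nop": 1163, "l.sw": 1431,
}
STATIC_PERIOD_PS = 2026  # pessimistic STA limit for the same core

def execution_time_ps(instr_stream, table=DELAY_TABLE_PS):
    """Sum the DCA clock periods over an executed instruction stream."""
    return sum(table[i] for i in instr_stream)

stream = ["l.add", "l.lwz", "l.mul", "l.sw", "l.nop"]
dca_time = execution_time_ps(stream)
static_time = STATIC_PERIOD_PS * len(stream)
print(f"DCA: {dca_time} ps vs. static clocking: {static_time} ps")
```

Every cycle of the static design costs the full 2026 ps, whereas the DCA sum charges each instruction only its own worst case, which is where the benchmark speedups in Section 5.2.3.2 come from.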

In Figure 5.9 we illustrate the dynamic timing distributions for the 6 pipeline stages for the l.mul instruction (signed 32-bit multiplication), as derived from our DTA tool. It can be

Table 5.1 – Instruction delay worst-cases including the limiting stages where they occur, for a representative set of different OpenRISC instructions.

Instruction   Function                       Max. delay [ps]   Limiting stage
l.add(i)      addition (with immediate)      1467              EX
l.and(i)      logical AND (with immediate)   1482              EX
l.bf          branch if flag                 1470              EX
l.j           jump                           1172              ADR
l.lwz         load word & zero extend        1391              EX
l.mul         signed multiplication          1899              EX
l.nop         no operation                   1163              EX
l.sll(i)      left shift (with immediate)    1270              EX
l.sw          store word                     1431              EX
l.xor         logical XOR                    1514              EX


[Figure 5.9: six histograms of dynamic maximum delays for the l.mul instruction, one per pipeline stage (ADR, FE, DC, EX, CTRL, WB); x-axis: delay [ps], y-axis: cycles]

Figure 5.9 – Histograms of dynamic maximum delays per pipeline stage, for the l.mul instruction (signed multiplication). The red lines indicate the maximum delay, while the blue lines indicate the mean delay.

observed that while the delay of the instruction in the execute stage is generally high (and close to the static maximum), the delays for the other stages are significantly lower, which can be exploited when other, faster instructions are currently in the execute stage. The spread of about 300 ps in the execute stage delay distribution stems from the varying data-dependent activations of worst-case paths. This data-dependent timing variation could be further leveraged by approximate computing techniques [CVC+13], which introduce the notion of using shorter clock periods to enhance performance or save energy while deliberately allowing violations of the timing requirements of certain paths in the design. As we have shown in detail in Section 4.3, such timing violations would then produce approximate results, for example at the output of our multiplication instruction, due to paths in the execute stage of the multiplier circuit being violated under certain conditions (e.g., critical operands exciting the worst paths).

5.2.3.2 Performance and Power

We evaluate the performance gains of the presented dynamic clock adjustment method by executing the popular CoreMark [EEM16] and BEEBS [PHB13] embedded benchmark suites on our custom instruction set simulator with integrated delay tables. The benchmark-dependent speedups are reported in Figure 5.10. On average, the effective clock frequency can be increased by 38%, from 494 MHz (static timing limit) to 680 MHz, at a fixed supply voltage of 0.70 V. We focus our evaluation in 28 nm FD-SOI CMOS on a target operating condition with a relatively low supply voltage, for reasons of better energy efficiency.


[Figure 5.10: effective clock frequency [MHz] at 0.70 V, conventional clocking vs. dynamic clock adjustment, for CoreMark and the BEEBS benchmarks 2D FIR, blowfish, crc32, cubic, dijkstra, fdct, and matmult]

Figure 5.10 – Performance gains on a 32-bit OpenRISC core with dynamic clock adjustment over conventional static clocking, for the CoreMark and BEEBS benchmark suites.

The results show that, in terms of clock speed, on average we only lose 14% by performing clock adjustment based on the instruction types alone. This is compared to the theoretically achievable maximum speedup of 52% (1.52x) when adjusting the clock period perfectly each cycle, i.e., when considering all data and instruction dependencies to fully consume all available dynamic timing margins.

The increase in performance can also be traded for reduced power consumption through voltage scaling of the core while operating at iso-throughput. The available speedup allows the core to be operated, on average, with a supply voltage that is 70 mV lower, which translates into an improvement in energy efficiency of 25%. Under scaled voltage, but with dynamic clock adjustment, the core consumes only 11.0 µW/MHz while providing the same throughput, compared to 13.7 µW/MHz for the conventional clocking scheme.
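As a quick sanity check of the reported figures, the speedup and energy-efficiency gains can be recomputed from the stated measurement values (all constants below are taken from the text):

```python
# Recompute the headline numbers from the reported measurements:
# 494 MHz (static) vs. 680 MHz (DCA) effective clock frequency, and
# 13.7 vs. 11.0 uW/MHz power efficiency at iso-throughput.
f_static_mhz, f_dca_mhz = 494.0, 680.0
speedup = f_dca_mhz / f_static_mhz          # ~1.38, i.e. the 38% gain

p_static, p_dca = 13.7, 11.0                # uW/MHz at iso-throughput
efficiency_gain = p_static / p_dca - 1.0    # ~0.25, i.e. the 25% gain

print(f"speedup: {speedup:.2f}x, efficiency gain: {efficiency_gain:.0%}")
```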


5.3 DynOR Hardware Architecture

The final three sections of this chapter present a fully operational hardware prototype, named DynOR7, demonstrating the feasibility of the introduced DCA approach. In this section we discuss the hardware architecture of DynOR, including all required core-external modules that enable the DCA technique on a real system and that have not been considered in detail in the previous sections.

5.3.1 Chip Architecture

The DynOR architecture is depicted in Figure 5.11. At the core of the chip lies a 32-bit, 6-stage, in-order OpenRISC microprocessor with one integer ALU, supporting single-cycle 32-bit multiplications. The employed microarchitecture of the OpenRISC core is identical to the one presented in Section 5.2.1. Note that this especially includes the RTL modifications that prevent unnecessary excitations of long paths in the execute stage, e.g., through shielding of multiplier paths in the ALU via operand register replication.

The processor comprises 16 kB of tightly coupled instruction memory and 8 kB of data memory, both realized using single-cycle latency 6T SRAM macros. Specifically, the macros used for both the instruction and the data memory are single-port high-speed SRAMs (of register file type) of size 1024 × 32 bit. The memory ports of the SRAMs are multiplexed, such that they can also be accessed through a bus interface, which enables off-chip access.

Instructions are fetched into the processor pipeline while they are simultaneously analyzed by the DCA module, in order to determine the required next clock period. The clock generation unit supplies the dynamic clock within the core clock domain, which comprises the CPU, the memories, and the DCA, while the peripheral/interface components of the chip are externally clocked. All modules of the core clock domain are connected to those peripherals over a 32-bit bus interface. The CPU also has indirect access to this bus over a multi-cycle memory-mapped register interface (visible only to the CPU), which allows it to send read or write requests to the bus. Note that the bus and the memory interface of the CPU need to be decoupled, since they operate asynchronously in two different clock domains (dynamic core clock & external clock).

A custom serial interface provides communication for DynOR with the external world, while the central configuration and control module gives access to all clocks, resets, and the different mode and calibration settings of the modules of the chip. Debug signals for the operation of the dynamic clock adjustment are observable via a monitor interface. A multiplexed general-purpose input/output (GPIO) interface also provides access to these debug signals, as well as to a scan chain.

The chip comprises two separate voltage islands (with level-shifted boundaries): one for

7The name DynOR stands for dynamic OpenRISC, i.e., an OpenRISC microprocessor with dynamic clock adjustment capabilities.


[Figure 5.11 block diagram: external clock domain with serial interface, configuration/control, GPIO mux, and 32-bit bus interface; core clock domain with instruction SRAM, DCA LUT, clock generation, data SRAM, the OpenRISC pipeline (AD FE DC EX ME WB), and a debug monitor; voltage islands Vdd-Main and Vdd-CPU]

Figure 5.11 – Architecture of DynOR test-chip, including a 32-bit 6-stage OpenRISC core with cycle-by-cycle instruction-based dynamic clock adjustment.

the OpenRISC core (Vdd-CPU, marked in grey in Figure 5.11) and one for the rest of the chip (Vdd-Main). This allows for the operation of the CPU at low voltages, below the point where the SRAMs are functional, and enables accurate power measurement of the CPU core.

5.3.2 Dynamic Clock Adjustment Module

To limit the hardware complexity induced by the DCA technique, we ensure during implementation that all paths which do not reside in the execute stage (EX) of the pipeline are non-critical, or can be sufficiently accelerated through voltage scaling. This is possible with low additional overhead, as indicated by the findings presented in Section 5.2.3 (specifically Figure 5.8), which show that even in the baseline DCA design (prior to further optimization) 93% of all cycles are already limited by the execute stage. The acceleration of specific paths through voltage scaling is possible for the paths connected to the SRAMs. Their access times can be decreased, since the SRAMs are placed in the global voltage island Vdd-Main, which can be operated independently at a higher voltage, while the voltage of the core is kept at a lower level in the separate Vdd-CPU island. This allows, for example, a sufficient acceleration of the instruction fetch mechanism, such that the address stage can no longer become the limiting stage of the core in any cycle.

The DCA module provides the functionality for dynamically determining the next clock period, based on the type of the currently fetched instruction. The schematic of the DCA module, together with the clock generation, is shown in Figure 5.12. During every cycle in which an instruction is fetched into the FE stage of the CPU, the DCA also receives the same 32-bit instruction from the memory bus. This instruction is then partially decoded (using a subset of the more complex decoding logic of the OpenRISC) to determine its type, such that it can


[Figure 5.12 schematic: the instruction word from the IMEM SRAM is partially decoded in the DCA into one of 12 groups, indexing a delay LUT whose 6-bit period setting (64-bit 1-hot encoded) controls the ring oscillator in the CGU, which clocks the CPU pipeline (AD FE DC EX ME WB, with the ALU at a lower supply voltage) via the clock tree]

Figure 5.12 – Schematic of the dynamic clock adjustment and generation, within the DynOR architecture.

be classified into an instruction group. The chosen groups are: branch, branch-condition, load, store, logic, xor, shift, add, multiply, nop, register-move, and other. The group is then used to address a 12-entry, 6-bit lookup table, which provides the critical period associated with the respective instruction type/group. As can be seen from Figure 5.12, this period setting only needs to come into effect in the cycle when the fetched instruction resides in the execute (EX) stage of the CPU, which contains the critical paths. This latency allows the period setting to take two cycles to propagate to the delay configuration flip-flops which control the period of the ring oscillator in the CGU.
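The fetch-time lookup described above can be sketched behaviorally as follows; the mnemonic-to-group mapping and the LUT contents are hypothetical stand-ins, since the real module decodes a subset of the 32-bit instruction word directly in hardware:

```python
# Behavioral sketch of the DCA module: classify the fetched instruction
# into one of the 12 groups and look up its 6-bit period setting.
# Classification here is by mnemonic for simplicity (an assumption of
# this sketch, not how the hardware decoder works).

GROUPS = ["branch", "branch-condition", "load", "store", "logic", "xor",
          "shift", "add", "multiply", "nop", "register-move", "other"]

MNEMONIC_TO_GROUP = {
    "l.j": "branch", "l.bf": "branch-condition", "l.lwz": "load",
    "l.sw": "store", "l.and": "logic", "l.xor": "xor", "l.sll": "shift",
    "l.add": "add", "l.mul": "multiply", "l.nop": "nop",
}

def dca_period_setting(mnemonic, lut):
    """Return the 6-bit period setting (0..63) for an instruction."""
    group = MNEMONIC_TO_GROUP.get(mnemonic, "other")
    setting = lut[GROUPS.index(group)]
    assert 0 <= setting <= 63, "LUT entries are 6-bit values"
    return setting

# Hypothetical calibrated 12-entry LUT (one entry per group).
lut = [20, 30, 28, 29, 30, 31, 25, 30, 63, 22, 24, 63]
print(dca_period_setting("l.mul", lut))  # -> 63 (slowest setting)
```

Unrecognized instruction words fall into the "other" group, which in a conservative LUT carries the longest (safest) period.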

The lookup table of the DCA is populated through calibration of a die for the current operating condition (i.e., supply voltage and temperature). The minimum achievable period value of each entry is determined through binary search, where each run of the search is evaluated by testing the outcome of the program for correctness (including the data memory contents), i.e., by checking whether the current clock periods cause a timing error for one of the instruction groups, resulting either in an erroneous result or in a processor crash. Note that the lookup table produced by a single calibration run is inherently tuned for a specific application (type). To generate a "worst-case" table, which provides DCA capabilities independent of the executed application, multiple tables from subsequent calibration runs using different applications can be merged, where for each entry the worst observed period per instruction type is chosen. More details on the exact DCA LUT calibration process employed for DynOR are given in Section 5.4.3.

5.3.3 Clock Generation Module

The clock generation unit (CGU), shown on the right hand side of Figure 5.12, is based on a digitally controlled ring oscillator that produces a clock signal whose period can be changed in each clock cycle, as required by the proposed DCA technique. The schematic of the CGU core supporting 64 different period settings is shown in Figure 5.13.


[Figure 5.13 schematic: ring oscillator consisting of a minimum-delay path, buffering, and a programmable delay chain; a 64-bit 1-hot encoded delay setting ds[0..63] selects the tap that determines the clock period]

Figure 5.13 – Schematic of clock generation unit (CGU), allowing cycle-by-cycle clock period adjustment, via a 1-hot controlled programmable delay.

To modify the clock period, the ring oscillator includes a programmable delay unit implemented as a cascade of AND gates, laid out with controlled placement to produce an accurate range of clock periods. Cycle-by-cycle operation is ensured by using a 1-hot encoded period setting for the programmable delay unit, and by sampling this delay setting with a set of additional registers placed close to the programmable delay unit inside the CGU core. This controlled placement of the registers in close proximity to the NAND gates that select the programmable delay path minimizes both the propagation delay of the 1-hot encoded select signals and the delay of the clock signal they receive from the ring oscillator. These registers are clocked as soon as the rising edge of the clock has propagated through the programmable delay block. This ensures that the delay setting is stable at the input of the programmable unit as soon as the next falling clock edge arrives at the input of the unit, thereby avoiding both glitches on the clock signal and any contamination of the selected clock period duration.
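A behavioral model of the 1-hot controlled programmable delay may help illustrate the CGU operation; the minimum-loop delay and per-stage increment below are assumed values for the sketch, not the characterized silicon figures:

```python
# Behavioral model of the CGU ring oscillator: a 64-bit 1-hot delay
# setting selects one tap of the AND-gate delay cascade, so the clock
# period grows roughly linearly with the selected tap index.

MIN_PERIOD_PS = 1100.0   # assumed minimum-delay loop (logic + buffering)
STEP_PS = 15.0           # assumed incremental delay per cascade stage

def encode_one_hot(setting):
    """Encode a 6-bit period setting (0..63) as a 64-bit 1-hot vector."""
    return [1 if i == setting else 0 for i in range(64)]

def cgu_period(one_hot_64):
    """Map a 64-bit 1-hot delay setting to a clock period in ps."""
    assert len(one_hot_64) == 64 and sum(one_hot_64) == 1, "must be 1-hot"
    tap = one_hot_64.index(1)
    return MIN_PERIOD_PS + tap * STEP_PS

print(cgu_period(encode_one_hot(0)))    # shortest period
print(cgu_period(encode_one_hot(63)))   # longest period
```

A strictly linear tap-to-period mapping is itself an idealization; on silicon the step size is set by the physical AND-gate delays and their controlled placement.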

Moreover, note that the full CGU has been implemented using only standard cells of the available 28 nm FD-SOI design kit. However, by employing controlled placement and extensive circuit simulations of the placed & routed CGU, it was possible to achieve cycle-by-cycle dynamic clock period generation with very fine granularity and sufficient accuracy.


5.4 DynOR Test-Chip

This section discusses various aspects regarding the fabricated DynOR test-chip. Section 5.4.1 gives a brief overview of the physical implementation of DynOR. The validation setup that has been created for the measurement and testing of the DynOR chips is described in Section 5.4.2. Finally, Section 5.4.3 details the employed calibration method for the DCA LUT, and then outlines an integrated/on-chip calibration technique using timing monitoring flip-flops.

5.4.1 Implementation

DynOR is designed in the 28 nm FD-SOI CMOS technology of ST [MFC13] in the regular-VT process option, using a semi-custom digital design flow with a 12 track standard cell library. The front end design has been carried out with Synopsys Design Compiler in topographical mode to perform synthesis of the RTL code using compile_ultra with retiming optimization. Additionally, extensive critical range optimization is performed to achieve a timing profile of the CPU core which allows for effective DCA, as discussed in Section 5.2.1. Moreover, the design hierarchy is manually flattened (apart from the top-level modules) before synthesis.

Two different voltage regions are set up for synthesis, according to the two islands Vdd-CPU and Vdd-Main. The chosen synthesis corner for the CPU island is slow-slow (SS) at 0.70 V and 125 °C, while the main island (including the SRAM cells) uses the SS, 1.10 V, 125 °C corner.

The back end design has been carried out with Cadence Encounter, to perform a standard physical implementation flow consisting of floor-planning, placement, clock-tree synthesis & insertion, routing, and hold fixing. The flow employs the same set of operating corners for the two voltage islands as for synthesis, with an additional fast operating corner (fast-fast (FF) at

1.10 V and -40 °C) to perform hold fixing. The final layout of the designed test-chip is shown in Figure 5.14. Special care has been taken at the floor-planning stage to help with timing closure of the important path going from the instruction memory output through the DCA to the CGU (see top half of Figure 5.12). This path8 requires special care, since even for the shortest possible clock period (i.e., when the fastest possible instruction is executed in the OpenRISC pipeline) it has to derive the new period setting for the upcoming cycle from the fetched instruction word. Consequently, during floor-planning the DCA and CGU are placed tightly together on the right-hand side of the chip. Both modules are placed directly at the border where the CPU and the instruction memory SRAM meet. This also provides a good starting point for the clock tree/distribution originating from the CGU. Additionally, double-registering of the 1-hot encoded period setting, together with the controlled placement of the CGU-internal flip-flops, allows this important path constraint to be met. Moreover, since this

DCA path is fully part of the Vdd-Main domain, it can have a relative voltage boost compared to the paths of the CPU residing in the Vdd-CPU domain, which in turn results in enough margin on this path for all considered operating scenarios, according to STA results.

8Technically, this is a whole set of paths going from the multiple (data) output ports of the instruction SRAM to the inputs of the flip-flops holding the 1-hot encoded period setting.



Figure 5.14 – Layout of complete 1.2 mm2 test-chip design in 28 nm FD-SOI CMOS including pad ring with 36 I/O and 12 power supply pads. The bottom half of the chip contains the DynOR architecture, and the total core area occupied by DynOR-related circuitry, including the SRAM macros, amounts to only 0.24 mm2.

Figure 5.15 – Left-hand side: wire-bonded DynOR die (1.2 mm2) with 48 pads in 28 nm FD-SOI CMOS; right-hand side: photograph of the die sitting in the cavity of an opened JLCC-68 package (2.4 cm × 2.4 cm).


[Figure 5.16 schematic: a workstation/PC running the DynOR testing framework (Perl libraries: DynOR tests, IC test scripts, validation tools, serial interface) connects via Ethernet and RS-232 to an oscilloscope, a power supply, and an FPGA board; the FPGA bridges the custom serial interface (SIF) to the DynOR validation PCB, which sources Vdd-CPU and Vdd-Main, hosts power measurement ICs, and supplies the external clock and reset to the DynOR chip]

Figure 5.16 – Overview schematic of automated measurement and validation setup for DynOR, including a workstation.

5.4.2 Measurement Setup

An overview of the hardware and software measurement infrastructure for DynOR is given in Figure 5.16. Photographs of a wire-bonded die and JLCC-68 package that are used for testing are depicted in Figure 5.15, while Figure 5.17 shows a photograph of the hardware setup. The hardware measurement setup consists of three main components, a custom designed PCB hosting the DynOR IC, an FPGA board for pin control and bridging of the custom serial interface with an RS-232 interface, and a workstation/PC for control of the automated or interactive tests.

The custom PCB provides the appropriate I/O capabilities for supplying the external clock & reset, the bi-directional serial interface, the scan chains, and access to all remaining debug GPIOs, which for example allow accurate measurement of the internally generated clocks from outside the IC. The PCB furthermore has extended support for sourcing and controlling the different voltage domains required by the chip, and provides automated power measurement capabilities on the PCB itself via dedicated power ICs. For serial communication and supply of the external clock & reset, the PCB is plugged onto an FPGA development board (Xilinx XUPV5-LX110T), hosting a Virtex 5 FPGA and providing an RS-232 connection. The FPGA hosts a small custom-developed embedded system, mainly comprised of an embedded MicroBlaze processor, which bridges in software the communication between the light-weight, custom-designed DynOR serial interface (SIF) and the RS-232 port connected to a workstation. The workstation communicates with DynOR over this FPGA interface, using a custom-developed set of Perl libraries targeted at easing the development of the various test scripts required for the wide range of functional verification and performance characterization tasks. The libraries on the host PC furthermore provide extensive automation functionality regarding the regulation of supply voltages and the capturing of external signals via the oscilloscope


Figure 5.17 – DynOR measurement setup, comprised of a custom validation PCB on the right-hand side hosting the DynOR IC, connected to a Xilinx XUPV5-LX110T FPGA board.

that is integrated within the test setup. More specifically, convenient application programming interface (API) access for the following functionality is provided by the developed host-PC tools library of the DynOR testing framework, in order to ease the creation of automated tests:

• Direct read and write access for the DynOR internal memory interface (32-bit peripheral bus);

• Wrapper/helper functions for use of the DynOR configuration register map, which controls all functionality on chip (clocks, resets, modes, etc.);

• CGU and DCA (including LUT) setup and control;

• Functions for automated loading and execution of program binaries on the CPU, includ- ing hooks for custom extensions and data integrity checks;

• External chip reset;

• Frequency configuration for the external chip clock (speed of custom SIF);

• Configuration of GPIO multiplexer unit (via dedicated serial pin);

• Configuration/setup of all supply voltages (including current limits);

• Voltage measurement for all supply voltages;

• Current and power measurement (with multiple precision ranges), separately for each supply;

• Ambient temperature measurement;

• Chip package temperature measurement (via contacting sensor);

• Automatic capture of oscilloscope signals (including data dump) and configured statis- tics (e.g., frequency).


5.4.3 DCA LUT Calibration

While in the simulation case (Section 5.2) the period LUT entries are simply derived via DTA, on a real hardware system the LUT entries must be calibrated according to the speed of the die (static process variations) and the current dynamic physical conditions (e.g., temperature) of the operating environment. Section 5.3.2 outlined our calibration approach, which is based on verifying the correctness of program execution. The exact calibration procedure employed during the measurements is presented in the flow chart shown in Figure 5.18. For a given set of applications, individual/application-specific LUTs are derived via binary search over all LUT entries, which represent the period values for the 12 different instruction types. The decision criterion for the binary search algorithm is whether a program could be executed 100% correctly with the current LUT configuration. In case the execution fails (or the calculated results contain errors), the LUT is adjusted and execution is restarted. The final LUT then eventually contains the minimum periods for all instruction types while executing this specific application. The process is repeated for other applications, and all resulting LUTs are eventually merged to form a worst-case LUT, by taking, separately for each instruction type, the maximum entry over all applications. The purpose of the worst-case LUT is to have a stable DCA configuration, which allows a speedup over static clocking independent of the application.
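The per-entry binary search and the worst-case merge can be sketched as follows; `runs_correctly` is a hypothetical stand-in for loading the binary, executing it on the chip, and verifying the return value and data memory contents:

```python
# Sketch of the DCA LUT calibration: for each of the 12 instruction-type
# entries, binary-search the smallest 6-bit period setting for which the
# program still executes correctly, then merge the per-application LUTs
# into a worst-case LUT by keeping the largest (safest) entry.

def calibrate_entry(runs_correctly, lo=0, hi=63):
    """Binary search for the minimum passing period setting."""
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if runs_correctly(mid):
            best, hi = mid, mid - 1   # passed: try an even shorter period
        else:
            lo = mid + 1              # failed: back off to longer periods
    return best

def merge_worst_case(luts):
    """Per entry, keep the maximum period over all application LUTs."""
    return [max(entries) for entries in zip(*luts)]

# Hypothetical pass/fail behavior: entry i fails below a threshold t.
thresholds = [20, 30, 28, 29, 30, 31, 25, 30, 55, 22, 24, 40]
lut_app = [calibrate_entry(lambda d, t=t: d >= t) for t in thresholds]
lut_wc = merge_worst_case([lut_app, [x + 1 for x in lut_app]])
print(lut_app[:4], lut_wc[:4])
```

Because each probe is a full program run with result checking, the search never accepts a setting that produced a wrong result or a crash, mirroring the decision criterion in Figure 5.18.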

On-chip Calibration

The DynOR architecture has been instrumented to allow automatic on-chip calibration of the DCA LUT, based on timing error detection. Unfortunately, due to unforeseen problems in the specific circuit implementation, this calibration mechanism is not functional in the tested chips. The suspected reason for the failure of this mechanism is a buffering issue on very fast paths (i.e., a lack of buffering), which creates a type of hold violation (for the employed detection mechanism only, not for the actual flip-flop data inputs). Since the monitoring of the internal timing behavior of an IC is very challenging, due to extremely limited visibility and controllability, no further detailed analysis of this issue or characterization of the mechanism as a whole can be provided at this time.

Nevertheless, we present in the following an overview of the developed concepts, which in principle allow on-chip calibration of a DCA LUT without the trial-and-error procedure (Figure 5.18) employed instead in this work to obtain the results reported in Section 5.5. The basic idea of the on-chip calibration technique is to determine/monitor the available timing slack without causing actual timing violations in the circuit. To this end, we use a flip-flop that can flag any transition on its data input D during the low phase of the clock. We can now use a safe, long calibration clock period, and vary the high phase of this clock to find the point in time when the last transition on D occurs. This idea is illustrated in the timing diagram shown in Figure 5.19. We see on the left-hand side that the worst-case delay for the instruction that is to be calibrated exceeds the high phase of the clock, and hence triggers the transition detection flag.

168 5.4. DynOR Test-Chip

[Flow chart residue; the procedure it depicts: for each application in the set, initialize the DCA LUT with the maximum period for all entries (p[0...11] := 63) and set i := 0, dmin := 0, dmax := 63. For each instruction type i, set the LUT entry via binary search (d := (dmin+dmax)/2, p[i] := d), load the application and reset the CPU, then check the return value from the CPU and verify the final data memory content. If the program run is correct, set dbest := d and dmax := d - 1; otherwise set dmin := d + 1. Once dmin > dmax, pin p[i] := dbest, reset dmin := 0 and dmax := 63, and continue with the next instruction type (i := i + 1) while i < 11, after which the calibrated DCA LUT is stored. When all programs are calibrated, the application-tuned LUTs papp are merged into a worst-case LUT by taking the maximum delay per instruction type over all applications: pwc[i] := max(papp[i] ∀app).]

Figure 5.18 – Flow chart of the DCA LUT calibration procedure used for the measurements.

169 Chapter 5. A Microprocessor with Cycle-By-Cycle Dynamic Clock Adjustment

[Timing diagram residue; the diagram shows a safe calibration clock period whose high phase (the test clock period) is swept: while the worst-case data delay of the instruction extends past the high phase, a transition detection is triggered and the high phase is increased; once no transition is detected, the minimum clock period has been found, with no timing error occurring at any point.]

Figure 5.19 – Timing diagram, illustrating DCA LUT calibration using timing monitoring flip-flops.

[Schematic residue; the circuit extends a standard D flip-flop (data in D, clock CK, data out Q) with an edge detector and a set/reset latch that produce a transition detection / error flag (E), cleared via an error reset (ER). A mode input (M) selects detection on the high clock phase (M=0, run mode / timing error detection) or on the low clock phase (M=1, calibration).]

Figure 5.20 – Schematic of dual-mode timing monitoring flip-flop.

Moreover, we observe that even if a transition is flagged, no timing error occurs, since the actual total clock period is still long enough. This means that the circuit can be operated as normal during calibration mode, i.e., the target application can be run functionally 100% correctly, at a slightly reduced clock frequency. Since the transition detection was asserted during the cycle, an increase of the high phase is triggered. On the right-hand side we see that for the next setting (increased length of the high phase), no transition detection is flagged, since the data signal settles before the low phase of the clock signal. As a consequence, the minimum clock period for this instruction type has been determined: it is equal to the shortest high phase of the employed calibration clock period that does not flag a transition detection (during the low phase). To make this approach work in hardware, a clock generation unit can be designed that provides very good matching between the high phase duration it generates for a certain delay setting in calibration mode and the corresponding clock period it generates for the same setting in normal run mode.
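A behavioral sketch of this calibration sweep, under the simplifying assumption of a single worst-case delay per instruction type; all timing values used here are invented for illustration, not silicon data.

```python
def transition_flagged(data_delay, high_phase):
    """A transition detection fires if the last data transition on D
    falls into the low phase, i.e., after the high phase has ended."""
    return data_delay > high_phase

def calibrate_period(data_delay, step_ps, max_setting):
    """Sweep the high-phase setting upward until no transition is flagged;
    the minimum clock period equals that shortest clean high phase."""
    for setting in range(max_setting + 1):
        high_phase = setting * step_ps
        if not transition_flagged(data_delay, high_phase):
            return high_phase
    return None  # instruction slower than the longest available setting
```

With a hypothetical 35 ps step granularity over 64 settings (matching the CGU figures reported in Section 5.5), an instruction with a 310 ps worst-case delay would calibrate to the first unflagged high phase of 315 ps.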

The schematic of a timing monitoring flip-flop as required for the presented on-chip calibration approach is shown in Figure 5.20. The design is based on a standard D flip-flop, extended by additional circuitry to provide input transition detection in two different modes. Besides providing transition detection on the clock low phase (for M = 1), the flip-flop can also be operated in a second mode, which provides detection on the clock high phase (for M = 0). This mode can be employed for classical timing error detection, since it monitors the window after the active clock edge of the flip-flop. Please note that the presented schematic of the flip-flop is only conceptual, to illustrate its functionality, since it is possible to reduce the transistor overhead induced by the additional gates in an actual circuit implementation.


5.5 Measurement Results

The DynOR system architecture, described in Section 5.3, was fabricated in a 28 nm FD-SOI regular-VT CMOS technology, using 0.24 mm2 of the complete 1.2 mm2 die, with a complexity of about 316 k gate equivalents (GEs)9. The chip micrograph together with the main features are shown in Figure 5.21.

Figure 5.22 details the distribution of all fabricated dies, regarding their maximum static CPU frequency (Fmax). At a CPU supply voltage (Vdd-CPU) of 0.8 V slow dies operate at an Fmax which is up to 14% lower than a typical die, while fast dies are up to 9% faster.

For a typical die at 1.0 V (Vdd-Main) the CGU provides a clock frequency between 365 MHz and 1906 MHz, with a period step granularity of 35 ps (with an average step-to-step accuracy of ±1.8 ps) over 64 settings. A normal operation range for the CPU at a corresponding supply voltage of 0.8 V (Vdd-CPU) employing DCA is a dynamic frequency range between 509 MHz and 1189 MHz, when executing the slowest and fastest instruction types, respectively.

In the following, we report the measured speedups and power savings, while varying two main parameters: Vdd-CPU and the application executed on the CPU. For each reported Vdd-CPU, the corresponding Vdd-Main is set to a voltage level that is 200 mV higher (exceptions are 0.6 V and 0.4 V for Vdd-CPU, where Vdd-Main is only 0.73 V and 0.45 V, respectively). An overview of the employed voltage pairs for the operation of DynOR is given in Table 5.2. This higher Vdd-Main is mainly required to give the SRAMs enough access speed, and in some cases to enable the DCA logic to operate fast enough as well.

9 One GE is here defined as one minimum drive strength 2-input NAND gate (area: 0.49 µm2).

Technology: UTBB FD-SOI 28 nm RVT
CPU: 32b, 6-stage OpenRISC with dynamic clock (serial interface for instructions, data, and configuration/control)
Die area: 0.24 mm2 (chip: 1.2 mm2)
Memory: 16 KB (IM) + 8 KB (DM) SRAM
Gate equivalents: 172 k (logic) + 144 k (SRAM)
Vdd range: 0.40 V – 1.10 V (CPU min: 0.26 V)
Frequency range: 24 – 1306 MHz
Power range: 0.31 – 112 mW
Energy range: 13 – 86 pJ/Op

Figure 5.21 – Die micrograph and chip features of DynOR.


[Histogram residue: number of dies (total: 25) versus normalized maximum CPU frequency at 0.80 V, binned into slow, typical, and fast dies.]

Figure 5.22 – Speed distribution of the 25 fabricated dies.

The two chosen applications are matrix multiplication (MM) and median calculation (MC), each with an execution length of about 10^10 instructions. Our experiments showed that a program execution length of more than 10^9 instructions, i.e., a program execution time of more than 1 second at a clock frequency of 1 GHz, is necessary to achieve stable measurement results. If the number of executed instructions is too low, benchmarks will occasionally (at random) pass the final correctness test at operating frequencies that turn out to be clear fail frequencies once the number of benchmark iterations (execution length) is increased. Increasing the execution length beyond 10^10 instructions, e.g., to 10^11–10^12 instructions, however does not result in any further changes in behavior for our measurement setup.

Detailed measurements have been performed focusing on the two chosen applications (MM & MC), since they differ significantly in the flow and composition of instruction types they employ. The MM (using full-range 32-bit values) can be seen as a worst-case stress test, since it is heavy in multiplication instructions and hence excites the worst paths of the CPU at the highest rate. The MC is based on sorting and, in general, performs a more balanced set of instructions regarding their path excitations.

Table 5.2 – Overview of supply voltage settings utilized for operation of the two voltage islands of DynOR, during measurements.

Voltage island          Vdd-CPU     Vdd-Main
Modules                 CPU core    I+D SRAM, DCA, CGU
Operating points        1.10 V      1.30 V
(voltage pairs)         0.80 V      1.00 V
                        0.60 V      0.73 V
                        0.40 V      0.45 V

[Figure 5.23 is a bar chart; the measured speedups it reports are:

Die       Mode              0.60 V          0.80 V          1.10 V
                            MM     MC       MM     MC       MM     MC
fast      static Fapp       0%     38%      0%     12%      0%     5%
          DCA worst case    8%     12%      4%     5%       1%     4%
          DCA tuned app     8%     41%      5%     17%      2%     10%
typical   static Fapp       0%     38%      0%     14%      0%     8%
          DCA worst case    18%    23%      4%     6%       7%     7%
          DCA tuned app     18%    39%      4%     36%      7%     12%
slow      static Fapp       0%     32%      0%     15%      0%     13%
          DCA worst case    2%     9%       4%     6%       6%     8%
          DCA tuned app     2%     36%      4%     21%      6%     14%]

Figure 5.23 – Measured application speedup with respect to conventional static clocking with Fmax, for different supply voltages, die types, and applications.

5.5.1 Speedup

Figure 5.23 reports the speedups (individually for a fast, typical, and slow die), which are all calculated relative to the baseline speed provided by Fmax for a given Vdd-CPU and die at room temperature (25 °C). For each parameter configuration, we report the three different speedups, achieved through three different modes of operation:

1. Over-clocking the CPU with a higher static clock frequency Fapp, which the application allows. A speedup over Fmax is possible, since we implement a path distribution that avoids the timing wall present in classical designs.10 Note that although Fapp is a static clock frequency, and does not require our DCA technique or any dynamic clock adjustment, for a classical design that is optimized towards a timing wall the over-clocking gains provided by Fapp are not available, and the operating frequency is always limited to Fmax, for any application. Moreover, since the MM always excites the worst paths, Fapp equals Fmax for this specific case11 (indicated as 0% speedup for MM in Figure 5.23).

2. Utilizing DCA with a merged “worst case” period table (as described in Section 5.4.3), which shows the application-independent DCA gains.

3. Utilizing DCA with a period table that is tuned for the specific application, showing maximum achievable DCA gains.

10 The path shaping approach that allows this application-dependent over-clocking is detailed in Section 5.1.3.1 and Section 5.2.1.
11 Effectively, in our test setup we actually define Fmax by the maximum frequency at which MM operates reliably. Hence, equality between Fapp and Fmax for MM is by definition.


For the MC on a typical die at 0.8 V, we observe that the implemented path distribution allows for an application-specific speedup of 14% with an increased static clock Fapp. The proposed DCA approach raises this speedup by an additional 22% (compared to Fapp) to a total of 36%, by dynamically adjusting the clock frequency on a per-instruction basis. Even with a conservative, application-independent “worst case” DCA table, the total speedup still amounts to 6% over Fmax. The application-specific speedup for MC enabled by DCA over all die types and different Vdd-CPU values is on average 25%, and can reach up to 41%. Furthermore, we see that for the MM a speedup is only enabled through DCA; even though the application frequently excites the critical paths of the design, which works against our DCA approach, an average speedup of 6% can still be measured, which can help amortize any potential design overhead due to the applied path profile shaping. The average speedup provided by DCA over all operating conditions and applications is 19% for a typical die, and 14% for both the fast and the slow dies.

In general, we see that lower supply voltages allow for higher speedups and show the strength of DCA especially for critical/demanding applications such as MM, while at normal and higher supply voltages the strength of DCA shows for more balanced (MC-type) applications, where significant additional speedups over Fapp can be achieved.

5.5.2 Power Reduction

The presented application speedups can be traded off against power consumption by means of supply voltage scaling. In Figure 5.24, we report the measured reductions in power consumption enabled by DCA at fixed throughput rates. To this end, we lower the CGU frequencies until the CPU performs the same number of Op/s as it does at fixed Fmax. We then use the created voltage headroom to lower Vdd-CPU accordingly.
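The iso-throughput procedure above can be sketched as a simple search over the supply voltage grid; the `fmax_dca` callback, which stands in for the measured maximum DCA throughput of the chip at a given voltage, and the voltage grid itself are hypothetical placeholders.

```python
def min_voltage_at_throughput(fmax_dca, target_mops, voltages):
    """Return the lowest supply voltage (from a given grid) at which the
    DCA-enabled CPU still sustains the baseline throughput; the clock is
    then lowered to exactly the target rate at that voltage."""
    for v in sorted(voltages):            # probe the lowest voltages first
        if fmax_dca(v) >= target_mops:
            return v
    return max(voltages)                  # no headroom: stay at nominal Vdd
```

The power saving then follows from operating at the returned (lower) voltage instead of the nominal one, at an unchanged number of Op/s.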

The power density of DynOR ranges between 13 µW/MHz (at 0.4 V) and 87 µW/MHz (at 1.1 V). Note that since the CPU executes very close to one instruction per cycle, 1 µW/MHz here is equivalent to 1 pJ/Op. At a Vdd-CPU of 0.8 V we measure power savings of 15% (by decreasing Vdd-CPU by 50 mV) for the MC at 521 MOp/s, reducing the chip energy consumption from 50 pJ/Op down to 42 pJ/Op.

All power numbers include the total current drawn by the chip for both voltage islands (Vdd-CPU and Vdd-Main). The power consumption of the pad-ring (operating at 1.5 V) is not included in the measurements; however, since DynOR has inherently low I/O activity during operation (only for loading programs over the serial interface), and since it is internally clocked, this power can be neglected.

The relative distribution of power consumption for the different parts of the architecture stays surprisingly constant (changes of ±1–2%) over the full measured voltage range. As indicated in Figure 5.24, the dynamically clocked OpenRISC core consumes the majority of the chip’s power with 69%, while the SRAM memories consume 25% and the CGU accounts for the remaining 6%. The power consumption of the DCA module could not be measured separately (it is accounted for with the SRAM consumption), but is expected to be negligible.

[Figure residue: plot of power density (µW/MHz) and CPU frequency (MHz) versus CPU supply voltage (0.40 V, 0.60 V, 0.80 V, 1.10 V), for MM and MC under Fmax and DCA, annotated with the power distribution (CPU 69%, MEM 25%, CGU 6%) and per-point power reductions.]

Figure 5.24 – Measured power reduction at iso-throughput for a typical die, due to supply voltage scaling enabled by DCA.

5.5.3 Comparison

Since, to the best of our knowledge, there are no other fabricated designs which employ a dynamic clocking scheme, a direct comparison with other recent works regarding the effectiveness of the microarchitectural design techniques proves to be difficult. Nevertheless, to put our silicon implementation into the broader context of embedded processing regarding overall performance numbers, Table 5.3 compares this work with other recent CPU implementations.


Table 5.3 – General comparison with recent state-of-the-art embedded CPU implementations from the literature.

                    This work           [RPL+16]            [ACB+15]          [JKY+12]
Domain              Embedded            Near-Sensor         IoT / MCU         General Purpose
                    Processing          Processing          Applications      Processing
Technology          FD-SOI 28nm RVT     FD-SOI 28nm LVT     FD-SOI 28nm       32nm
CPU                 32b OpenRISC        32b OpenRISC        32b Cortex-M4     32b IA-32 Pentium
# of cores          1                   4                   1                 1
I$/D$/L2            16K/8K/-            4K/48K/64K          8K/8K/-           8K/8K/-
Voltage range       0.40 – 1.10 V       0.32 – 1.15 V       0.34 – 1.10 V     0.28 – 1.20 V
Max. frequency      1306 MHz            825 MHz             834 MHz           915 MHz
Best performance    1.3 GOPS            3.3 GOPS            ?                 1.8 GOPS
Best power density  13.0 µW/MHz         20.7 µW/MHz         8.9 µW/MHz        170 µW/MHz
                    @ 24 MOPS           @ 4x40.5 MOPS       @ 45 MHz          @ 100 MHz
Features            Dynamic Clock       Body Biasing,       10-T SRAM         2x Superscalar,
                    Adjustment          Shared TCDM                           FPU, Bran. Pred.


6 Conclusions & Outlook

Low power consumption and high energy efficiency are the key requirements for embedded microprocessor designs of today, which typically operate as part of severely energy-constrained, battery-powered devices. In this thesis, a set of different microarchitectural design techniques for different processor architectures was presented that aims at supporting processor designers in meeting these low-power requirements. We have seen that one of the main techniques providing significant energy savings is supply voltage scaling, which however is accompanied by performance degradation of the processor core. To counter this performance degradation, a range of different low-overhead microarchitectural techniques has been presented, which effectively allow increased processing performance that can be traded for energy savings.

Microcontroller Architecture Enhancements for Cryptographic Applications

Especially in the context of cryptographic algorithms, it is well known that instruction set extensions (ISEs) can increase the performance of an application on a given microprocessor architecture through the addition of datapath units. Considering algorithm-specific performance bottlenecks regarding memory consumption and execution time on resource-constrained systems, we proposed dedicated ISEs based on lookup table integration and on microcoded instructions using finite state machines for operand and address generation, which do not significantly add to the complexity of the processor datapath.

Looking at the application-specific case of cryptographic hash functions on a 16-bit MCU (PIC24), we assessed the potential of these ISEs for all five SHA-3 final round candidates, representing a variety of different hash algorithms. We showed that our proposed classes of ISEs can provide significant speedup, enabling increased throughput and reduced energy consumption, in addition to a substantial memory footprint reduction which can help to reduce leakage power. Most improvements were made by instructions that provide efficient generation of memory address patterns. Rather than adding datapath units to calculate complete functions, we find that additional multi-cycle instructions that handle complex (but regular) memory accesses resulting from constant permutation templates are especially beneficial. Furthermore, moving constant lookup tables from data memory into the datapath of the processor turned out to be highly effective in reducing memory footprints at negligible core-area overhead. Minor extensions of existing computational units, combined with state-driven microcoded instruction execution, to support rotation operations on larger bit-widths (64-bit) within the limits of the native 16-bit architecture, also proved especially helpful for significant speedup of most of the SHA-3 candidate algorithms.
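The permutation-template address generation can be illustrated with a small behavioral sketch: a microcoded multi-cycle instruction steps a tiny FSM through a constant template and emits one data memory address per cycle, so the permutation costs no explicit address arithmetic in software. The template, base address, and 16-bit word width below are purely hypothetical and do not correspond to any specific SHA-3 candidate.

```python
PERM_TEMPLATE = [0, 5, 10, 15, 4, 9, 14, 3]    # illustrative word offsets

def permuted_addresses(base, template=PERM_TEMPLATE, word_bytes=2):
    """Generate the address sequence the multi-cycle ISE would produce
    (FSM state = template index; 16-bit words, hence 2 bytes per word)."""
    for offset in template:
        yield base + offset * word_bytes

def permute_load(memory, base):
    """Gather words following the template, as the microcoded load would."""
    return [memory[a] for a in permuted_addresses(base)]
```

A software-only implementation would need an index table in data memory plus per-access address computation; baking the template into the decoder's FSM removes both.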

In three out of five cases, the ISEs had no measurable impact on the core area of the microcontroller; in the other two cases, the overhead was limited to 10%, whereas the execution speed in terms of cycles improved by 2.72x on average over the five candidates. In particular, we were able to improve the execution speed of Grøstl by more than a factor of 8x, and reduce its memory consumption by more than 70%. This moved Grøstl from being one of the slowest implementations (about 3x slower than the other candidates) to the fastest implementation by some margin (1.75x faster than the next algorithm).

Microprocessor Design for Deeply Embedded Ultra-Low-Power Processing

A custom-designed 16-bit processor core for deeply embedded signal processing applications, named TamaRISC, has been presented. We show with the TamaRISC architecture, which comprises a custom clean-slate ISA as well as a corresponding microarchitecture with a 3-stage pipeline, that simplicity of the base ISA and core structure is key to low power consumption. At the same time, we observed that the integration of advanced operand addressing modes is beneficial for signal processing applications, at little cost in terms of hardware overhead.

The TamaRISC architecture in combination with an instruction set extension has been successfully applied to the application-specific use case of compressed sensing (CS). CS is a well-known universal data compression technique for sparse signals, widely used in environmental sensing applications. Autonomous and portable devices, such as sensing platforms, however demand ultra-low-power CS implementations, due to their limited energy resources. Therefore, we have proposed a subthreshold processing platform specifically optimized for CS, while still maintaining the flexibility and configurability of a processor-based system. To this end, we have customized the instruction set architecture of a low-power baseline processor to exploit the specific operations of the CS algorithm. Specifically, we proposed a Compressed Sensing Accumulation (CSA) instruction that efficiently performs accumulation of sample data on randomized memory addresses within a defined sampling buffer. Moreover, our processing platform embeds the required data and instruction memories in the form of sub-VT-capable standard-cell memories, which are essential for ULP operation. We show that our processing platform requires neither high computational effort nor excessive memory sizes compared to straight-forward implementations. Therefore, the platform is well suited to exploit subthreshold computing at low voltages and with very low leakage. Our system consumes only 30.6 nW for a case study of CS-based electrocardiogram (ECG) signal compression at an ECG sampling rate of 125 Hz. Our results show that the proposed processing platform achieves a 62x speed-up and 11.6x power savings with respect to an established CS implementation running on the baseline low-power processor.
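The effect of the CSA instruction can be sketched behaviorally: each incoming sample is accumulated onto a pseudo-randomly selected entry of the measurement buffer. The LCG-style address generator below is an invented stand-in; the thesis summary does not specify the actual randomization hardware.

```python
def csa_compress(samples, m, seed=1):
    """Accumulate samples onto m measurement slots at randomized
    addresses within the sampling buffer, as one CSA op per sample."""
    y = [0] * m
    state = seed
    for s in samples:
        state = (1103515245 * state + 12345) & 0x7FFFFFFF  # toy PRNG step
        y[state % m] += s                                  # the CSA operation
    return y
```

Performing address generation and accumulation in a single instruction is what removes the per-sample software overhead (PRNG update, modulo, load, add, store) from the inner loop.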

Moreover, TamaRISC was employed in a new hybrid SIMD/MIMD multi-core architecture to significantly improve energy efficiency for on-node processing with requirements exceeding 1 MOPS. This thesis in particular presented multiple optimizations regarding the organization of the memory architecture of such multi-core systems, which allowed power consumption to be reduced. The three key contributions that enable these improvements are as follows. First, we proposed a configurable data memory mapping mechanism allowing the partitioning of the address space into private and shared data sections, which lowers memory access contention and thereby improves instruction throughput. Additionally, the mechanism allows for the efficient use of memory banks due to address remapping, which enables power gating of parts of the physical memory that are currently not used, in turn reducing standby power. The mechanism retains a continuous logical address space and has been realized through the addition of a lightweight MMU to the processing cores. Second, in order to reduce the power spent on instruction fetches, which was found to be the main contributor in our multi-core architecture, instruction broadcast is introduced with the aim of operating the architecture in a SIMD mode, where multiple cores execute the same instructions in lockstep. Utilizing these two architecture enhancements, the total power consumption of the multi-core system could be reduced by 40%. Third, in order to increase the fraction of code that can be executed efficiently in SIMD mode, a code synchronization mechanism based on a hybrid hardware/software solution involving instruction set extensions was introduced. As a result of this improved SIMD operation, a further reduction of 50–64% in power consumption was achieved.
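A toy model of the first contribution, the private/shared address-space split through a lightweight MMU: a continuous logical address space is mapped onto physical banks, with the private section remapped to a per-core bank and the shared section spread over common banks. The section size, bank size, and bank assignment below are all invented for illustration.

```python
PRIVATE_SIZE = 1024   # logical addresses [0, PRIVATE_SIZE) are private
BANK_SIZE = 1024      # words per physical memory bank

def translate(core_id, logical_addr, shared_base_bank):
    """Map a logical address to a (bank, offset) physical location."""
    if logical_addr < PRIVATE_SIZE:                 # private section:
        return core_id, logical_addr                # one bank per core
    off = logical_addr - PRIVATE_SIZE               # shared section
    return shared_base_bank + off // BANK_SIZE, off % BANK_SIZE
```

Because the mapping is configurable, banks that no logical section currently maps to can be power-gated, while each core keeps a single continuous logical address space.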

Dynamic Timing Margins in Embedded Microprocessors

To take a more application-independent view, we then studied an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduced the concept of dynamic timing margins. We showed that dynamic timing margins exist in microprocessor circuits, that these margins are to a large extent state-dependent, and that they are correlated with the sequences of instruction types executed within the processor pipeline. With the help of our proposed dynamic timing analysis method (and the corresponding developed tools), we accurately characterized these dynamic timing margins for all endpoints within a general-purpose embedded processor core.

This characterization data was then employed to perform high-level instruction set simulations in order to assess the impact of timing errors on application behavior under non-nominal operating conditions (i.e., over-scaled voltage or frequency). Modeling timing errors in high-level simulators enables the rapid evaluation of application performance in terms of output quality, while over-scaling frequency and/or supply voltage. However, the evaluation of the impact and the identification of reliability bottlenecks heavily depend on the accuracy of the employed characterization model. We showed that, in contrast to conventional methods based on purely random FI or static timing based FI, our proposed statistical fault injection approach provides a significant improvement in modeling accuracy for the important transition regions between fully error-free operation and total circuit failure. The proposed approach achieves these improvements through the use of instruction-based timing statistics, obtained by dynamic timing analysis at the gate level of a placed & routed processor. Moreover, our model provides the capability to incorporate dynamic physical variation effects, such as supply voltage noise, to further enhance the accuracy of the characterization of dynamic timing effects. As a result, trade-offs targeted by approximate computing applications could be analyzed for a general-purpose OpenRISC core, in particular the characterization of application output error versus achievable energy savings.
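The core idea of instruction-based statistical fault injection can be sketched as follows: per instruction type, dynamic timing analysis yields a distribution of path delays, and under an over-scaled clock a timing error is injected whenever the sampled delay exceeds the clock period. The Gaussian delay statistics below are invented placeholders, not DTA data.

```python
import random

DELAY_STATS = {          # hypothetical (mean, sigma) per instruction, in ns
    "mul": (1.80, 0.10),
    "add": (1.20, 0.08),
    "load": (1.50, 0.12),
}

def inject_timing_errors(trace, clock_period, seed=0):
    """Return per-instruction error flags for a given over-scaled period."""
    rng = random.Random(seed)
    flags = []
    for instr in trace:
        mean, sigma = DELAY_STATS[instr]
        delay = rng.gauss(mean, sigma)       # sampled dynamic path delay
        flags.append(delay > clock_period)   # timing violation -> fault
    return flags
```

Because the error probability depends on the executed instruction type rather than being uniform, the model reproduces the transition region between error-free operation and circuit failure far more faithfully than purely random injection.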

A Microprocessor with Cycle-By-Cycle Dynamic Clock Adjustment

One way of utilizing the uncovered dynamic timing margins in microprocessors is the presented over-scaling of frequency/voltage beyond nominal levels. However, our results have shown that the induced detrimental effects on output quality can often be significant, up to levels where hardly any energy gains can be achieved if only a small degradation in output quality can be tolerated by an application. We hence devised a second approach to exploiting dynamic timing margins, which operates the circuit fully error-free, while still exploiting the available dynamic margins to boost performance or improve energy efficiency.

To this end, we presented a fine-grained dynamic clock adjustment technique to trim dynamic timing margins by exploiting the different timing requirements of individual instructions in a microprocessor. To ensure considerable performance gains, the proposed approach starts with the design of the processor with a suitable timing profile, which avoids a classical timing wall and is characterized by many short paths. The developed design flow for DCA, based on dynamic timing analysis, was used to characterize the improved timing upper bounds of each instruction; the flow details the steps to be followed when applying our approach to any other design. Applying the proposed approach to a RISC processor (6-stage OpenRISC) in a custom-developed instruction set simulator environment demonstrates that for popular embedded benchmarks the speed of the design could be increased by 38% on average, which could be translated into power savings of 24%.
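Where the DCA speedup comes from can be shown with a back-of-the-envelope sketch: with a static clock every cycle pays the worst-case period, while DCA pays each instruction type's own period, so the gain follows directly from the instruction mix. The periods and the mix below are invented for illustration.

```python
PERIOD_NS = {"mul": 1.95, "load": 1.60, "alu": 1.30, "branch": 1.40}

def dca_speedup(mix):
    """Speedup of per-instruction clocking over static worst-case clocking,
    given instruction-type frequencies that sum to 1."""
    static = max(PERIOD_NS.values())     # timing-wall period, paid every cycle
    dca = sum(frac * PERIOD_NS[t] for t, frac in mix.items())
    return static / dca

# e.g., a balanced, MC-like instruction mix:
mix = {"mul": 0.05, "load": 0.30, "alu": 0.50, "branch": 0.15}
```

A mix dominated by the slowest type (as in the MM stress test) drives the speedup toward 1, which mirrors the measured behavior.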

Finally, we demonstrated the feasibility of the proposed DCA technique for a real hardware system with a fabricated silicon prototype of a full microprocessor system. Specifically, we introduced the first silicon implementation and microarchitecture of a 32-bit microprocessor which uses a dynamic clocking scheme to vary its clock period on a cycle-by-cycle basis, according to the executed instruction types. The effectiveness of this DCA approach is enabled through careful path reshaping of the logic in the processor pipeline. Our measurements in 28 nm FD-SOI show that by employing DCA the throughput of the processor can be increased on average by 19% (up to 41%), while the power reduction reaches up to 15% for a typical die. Moreover, we demonstrated that consistent speedup is possible over varying die speeds, supply voltages, and applications, which can be differently suited to the proposed technique.

Outlook

Particularly the concepts of dynamic timing margins and dynamic clock adjustment, introduced in Chapters 4 and 5, pose a number of interesting open research questions.

DCA requires accurate calibration of the dynamic clock period values applied during operation of the processor. Moreover, these values need to be recalibrated when environmental conditions (e.g., temperature) change, and whenever large safety margins on these values are to be avoided in order to minimize energy consumption. Consequently, the development of efficient on-chip calibration methods is essential for the implementation of DCA in a real hardware system. While a first approach to this problem, based on timing monitoring flip-flops, is presented in Section 5.4.3, further investigation into different circuit techniques that can provide a calibration mechanism for DCA, and especially the evaluation of the incurred hardware overheads, are of great interest.

A possible further improvement for DCA is the extension of the types and number of criteria employed for adjusting the clock period. While we show that DCA based on instruction types alone is already a promising approach, capturing most of the available dynamic timing margins, including aspects such as data dependencies or the state of the forwarding logic of a processor can potentially further enhance the efficiency of the DCA mechanism. DCA additionally based on data dependencies can be especially interesting for more signal-processing-heavy applications, which can exhibit very specific input data distributions. Ultimately, the imposed hardware complexity for implementing these enhanced decision criteria has to be carefully examined, compared to the potential gains in further speedup, and evaluated in terms of energy efficiency of the complete system.

In Chapter 5 we showed the applicability and feasibility of DCA for a general-purpose 32-bit embedded processor core. Further exploration of DCA can moreover investigate its applicability to structurally different processing architectures, such as SIMD architectures or DSPs, especially with different pipelining structures. Challenges arise here from both datapath-intensive pipeline stages (i.e., stages which mainly exhibit data-dependent dynamic timing behavior) and deep pipelines, both of which might limit the applicability of DCA and require new techniques to cope with these challenges. In the context of multi-core architectures, the synchronization and communication between different cores residing in separate DCA domains is another interesting open question requiring efficient solutions.

The DCA approach also provides new ways to optimize applications at the level of the ISA and above. On the compiler level specifically, the optimization of instruction sequences for a DCA-capable architecture is of great interest. During code emission the compiler often has a wide range of choices for expressing a certain high-level construct in assembly language. The careful selection, and potentially ordering, of the emitted machine instructions could be performed with new and enhanced cost functions that take into account the execution speed of the instruction sequence on the DCA architecture. This optimization would exploit the specific DCA characteristics of an ISA on a particular architecture implementation and tune the code emission for this architecture to improve throughput and/or energy efficiency.
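Such a DCA-aware cost function can be sketched as follows. The per-instruction periods and candidate sequences are hypothetical, deliberately chosen so that the DCA-aware choice differs from the classic fewest-cycles choice:

```python
# Sketch of a DCA-aware cost function for code emission: among several
# instruction sequences implementing the same high-level construct, the
# compiler picks the one with the smallest total execution time under the
# per-instruction clock periods of the target DCA architecture, rather
# than the fewest cycles. All periods and sequences are hypothetical.

PERIOD_NS = {"add": 1.4, "shl": 1.3, "mul": 4.5}  # assumes a slow multiplier
STATIC_PERIOD_NS = max(PERIOD_NS.values())        # worst case sets the static clock

def cycle_cost(sequence):
    """Classic cost model: every instruction takes one worst-case cycle."""
    return STATIC_PERIOD_NS * len(sequence)

def dca_cost(sequence):
    """DCA-aware cost model: each instruction runs at its own period."""
    return sum(PERIOD_NS[op] for op in sequence)

# Two equivalent ways to compute x * 10: a single multiply, or
# shift-and-add ((x << 3) + (x << 1)).
candidates = [["mul"], ["shl", "shl", "add"]]

print(min(candidates, key=cycle_cost))  # static clock favors the single 'mul'
print(min(candidates, key=dca_cost))    # DCA favors the shift-and-add sequence
```

With these assumed periods the multiply wins on cycle count but loses on DCA-aware execution time, which is exactly the kind of trade-off an enhanced cost function would expose to the code emitter.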

In Chapter 4 we introduce the notion of exploiting dynamic timing margins in the context of approximate computing, with the goal of scaling application output quality over a wide operating range by allowing timing errors to manifest in the processor architecture in a controlled way. While the concept leverages the dynamic timing margins of different instruction types, the applied clocking scheme uses a traditional static clock period. An open question is how to effectively combine DCA with approximation, for example by selectively overclocking certain instruction types beyond their failure limits, in order to increase energy efficiency beyond the current limits of DCA. This combined approach would potentially allow more accurate control over the rate of failure of specific instructions, and in turn of the overall application or task being executed in an approximate context. Additionally, it would provide more degrees of freedom for tuning the operating point of the system, in order to achieve the desired graceful degradation of quality in operating scenarios where highest energy efficiency is required. A major step towards realizing such a combination of approximation with DCA on a classical RISC ISA would be to extend the ISA with a new set of approximate instructions, specifically approximate versions of the standard ALU instructions.1 These approximate instructions would allow machine code to differentiate between instructions that are required to be fully correct, since they handle the control part of the application (such as loop counters, memory addresses, etc.), and instructions that are allowed to be approximated, since they operate on the data of the application.2 With this distinction it would be possible to selectively apply even more aggressive overclocking via DCA only to instructions that can tolerate timing errors and allow approximation of their results.
Consequently, with such an extended, approximate ISA and a corresponding DCA-enabled microarchitecture, it would in principle be possible to achieve significantly better control of the desired output quality (by systematically avoiding complete program failures and processor crashes), while still operating in the regime of timing errors.

1 These approximate versions do not necessarily require a separate hardware implementation (although this is an option to provide better failure behavior), but can simply be seen as identifiers or flags that allow the programmer to communicate to the hardware, via the ISA, the intention regarding approximation of the result. 2 Note that at compile time, when more high-level information about the code is still available, this required separation of control and data instructions can be performed with relative ease.
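The proposed split between exact and approximate instructions, combined with selective overclocking, can be sketched as follows. The periods, the failure probability, and the bit-flip error model are all illustrative assumptions, not measured behavior of any implemented design:

```python
# Sketch of an ISA split into exact and approximate instructions: each
# instruction carries a flag telling the hardware whether its result may be
# approximated. Under combined DCA + approximation, only flagged (data)
# instructions are overclocked beyond their failure limit; control-flow
# instructions (loop counters, addresses) always run at a safe period.
# All numbers and the error-injection model are hypothetical.

import random

SAFE_PERIOD_NS = {"add": 1.4, "mul": 2.2}
OVERCLOCKED_PERIOD_NS = {"add": 1.1, "mul": 1.7}  # beyond the failure limit
ERROR_PROB = 0.05  # assumed chance of a timing error when overclocked

def execute(opcode, operands, approx, rng):
    """Return (result, period): exact at the safe period, or possibly
    faulty at the overclocked period when the approximate flag is set."""
    a, b = operands
    result = a + b if opcode == "add" else a * b
    if not approx:
        return result, SAFE_PERIOD_NS[opcode]
    if rng.random() < ERROR_PROB:
        result ^= 1 << rng.randrange(8)  # hypothetical low-order bit flip
    return result, OVERCLOCKED_PERIOD_NS[opcode]

rng = random.Random(0)
# Control instruction (e.g. a loop counter): must stay exact.
print(execute("add", (41, 1), approx=False, rng=rng))
# Data instruction (e.g. pixel math): may trade accuracy for a shorter period.
print(execute("mul", (6, 7), approx=True, rng=rng))
```

The point of the flag is visible in the two calls: the control instruction always pays the safe period, while the data instruction gains the shorter period at a bounded, tunable risk of a small result error, which is what enables graceful quality degradation instead of program crashes.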

Bibliography

[ACB+15] F. Abouzeid, S. Clerc, C. Bottoni, B. Coeffic, J. M. Daveau, D. Croain, G. Gasiot, D. Soussan, and P. Roche. 28nm FD-SOI technology and design platform for sub-10pJ/cycle and SER-immune 32bits processors. In European Solid-State Circuits Conference (ESSCIRC), ESSCIRC 2015 - 41st, pages 108–111, Sept 2015.

[ACE16] ACE. CoSy compiler development system, 2016. http://www.ace.nl/compiler/cosy.html.

[ACH15] I. Andrea, C. Chrysostomou, and G. Hadjichristofi. Internet of Things: Security vulnerabilities and challenges. In 2015 IEEE Symposium on Computers and Communication (ISCC), pages 180–187, July 2015.

[AHK+11] M. Ashouei, J. Hulzink, M. Konijnenburg, J. Zhou, F. Duarte, A. Breeschoten, J. Huisken, J. Stuyt, H. de Groot, F. Barat, J. David, and J. Van Ginderdeuren. A voltage-scalable biomedical signal processor running ECG using 13pJ/cycle at 1MHz and 0.4V. In 2011 IEEE International Solid-State Circuits Conference, pages 332–334, Feb 2011.

[AHMP10] Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan. SHA-3 proposal BLAKE. Submission to NIST (Round 3), 2010.

[AOD+08] K. Atasu, C. Ozturan, G. Dundar, O. Mencer, and W. Luk. CHIPS: Custom hardware instruction processor synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(3):528–541, March 2008.

[ARA11] M. A. Alam, K. Roy, and C. Augustine. Reliability- and process-variation aware design of integrated circuits - a broader perspective. In Reliability Physics Symposium (IRPS), 2011 IEEE International, pages 4A.1.1–4A.1.11, April 2011.

[ARLO12] O. C. Akgun, J. N. Rodrigues, Y. Leblebici, and V. Owall. High-level energy estimation in the sub-VT domain: Simulation and measurement of a cardiac event detector. IEEE Transactions on Biomedical Circuits and Systems, 6(1):15–27, Feb 2012.


[ARM16] ARM Limited. IoT subsystem for Cortex-M processors system diagram, 2016. http://www.arm.com/products/internet-of-things-solutions/iot-subsystem-for-cortex-m.php.

[Atm16] Atmel. SAM L21x Data Sheet, 2016. http://www.atmel.com/Images/Atmel-42385-SAM-L21-Datasheet.pdf.

[BD06] Eli Biham and Orr Dunkelman. A framework for iterative hash functions: HAIFA. In Proceedings of the Second NIST Cryptographic Hash Workshop, 2006.

[BDCB11] C. Banz, C. Dolar, F. Cholewa, and H. Blume. Instruction set extension for high throughput disparity estimation in stereo image processing. In ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors, pages 169–175, Sept 2011.

[BDPA11a] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. Cryptographic sponge functions, 2011. http://sponge.noekeon.org.

[BDPA11b] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. The Keccak reference. Submission to NIST (Round 3), 2011.

[BDS+11] D. Bull, S. Das, K. Shivashankar, G.S. Dasika, K. Flautner, and D. Blaauw. A power-efficient 32 bit ARM processor using timing-error detection and correction for transient-error tolerance and adaptation to PVT variation. Solid-State Circuits, IEEE Journal of, 46(1):18–31, Jan 2011.

[Be11] Daniel J. Bernstein and Tanja Lange (editors). eBACS: ECRYPT benchmarking of cryptographic systems, 2011. http://bench.cr.yp.to.

[BES06] F.R. Boyer, H. G. Epassa, and Y. Savaria. Embedded power-aware cycle by cycle variable speed processor. IEE Proceedings - Computers and Digital Techniques, 153(4):283–290, July 2006.

[BTK+09] K. A. Bowman, J. W. Tschanz, N. S. Kim, J. C. Lee, C. B. Wilkerson, S. L. L. Lu, T. Karnik, and V. K. De. Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance. IEEE Journal of Solid-State Circuits, 44(1):49–63, Jan 2009.

[BTL+11] K. A. Bowman, J. W. Tschanz, S. L. L. Lu, P. A. Aseron, M. M. Khellah, A. Raychowdhury, B. M. Geuskens, C. Tokunaga, C. B. Wilkerson, T. Karnik, and V. K. De. A 45 nm resilient microprocessor core for dynamic variation tolerance. IEEE Journal of Solid-State Circuits, 46(1):194–208, Jan 2011.

[BTT+11] K. A. Bowman, C. Tokunaga, J. W. Tschanz, A. Raychowdhury, M. M. Khellah, B. M. Geuskens, S. L. L. Lu, P.A. Aseron, T. Karnik, and V. K. De. All-digital circuit-level dynamic variation monitor for silicon debug and adaptive clock control. IEEE Transactions on Circuits and Systems I: Regular Papers, 58(9):2017–2025, Sept 2011.


[BVH+13] D. Bol, J. De Vos, C. Hocquet, F. Botman, F. Durvaux, S. Boyd, D. Flandre, and J. D. Legat. SleepWalker: A 25-MHz 0.4-V sub-mm2 7-uW/MHz microcontroller in 65-nm LP/GP CMOS for low-carbon wireless sensor nodes. IEEE Journal of Solid-State Circuits, 48(1):20–32, Jan 2013.

[CC06] B. H. Calhoun and A. P. Chandrakasan. Static noise margin variation for sub- threshold SRAM in 65-nm CMOS. IEEE Journal of Solid-State Circuits, 41(7):1673– 1679, July 2006.

[CCHL12] Chiung-An Chen, Shih-Lun Chen, Hong-Yi Huang, and Ching-Hsing Luo. An efficient micro control unit with a reconfigurable filter design for wireless body sensor networks (WBSNs). Sensors, 12(12):16211, 2012.

[CFH+05] L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini, and W. Haensch. Stable SRAM cell design for the 32 nm node and beyond. In Digest of Technical Papers. 2005 Symposium on VLSI Technology, 2005., pages 128–129, June 2005.

[CLC+09] S. L. Chen, H. Y. Lee, C. A. Chen, H. Y. Huang, and C. H. Luo. Wireless body sensor network with adaptive low-power design for biometrics and healthcare applications. IEEE Systems Journal, 3(4):398–409, Dec 2009.

[CMC+13] H. Cho, S. Mirkhani, C. Y. Cher, J. A. Abraham, and S. Mitra. Quantitative evaluation of soft error injection techniques for robust system design. In Design Automation Conference (DAC), 2013 50th ACM/EDAC/IEEE, pages 1–10, May 2013.

[CP95] J. A. Clark and D. K. Pradhan. Fault injection: a method for validating computer- system dependability. Computer, 28(6):47–56, Jun 1995.

[CVC+13] V. K. Chippa, S. Venkataramani, S. T. Chakradhar, K. Roy, and A. Raghunathan. Approximate computing: An integrated hardware approach. In 2013 Asilomar Conference on Signals, Systems and Computers, pages 111–117, Nov 2013.

[Dam90] Ivan Bjerre Damgård. Advances in Cryptology — CRYPTO’ 89 Proceedings, chapter A Design Principle for Hash Functions, pages 416–427. Springer New York, New York, NY, 1990.

[DBC+13] A. Y. Dogan, R. Braojos, J. Constantin, G. Ansaloni, A. Burg, and D. Atienza. Synchronizing code execution on ultra-low-power embedded multi-channel signal analysis platforms. In Design, Automation Test in Europe Conference Exhibition (DATE), 2013, pages 396–399, March 2013.

[DBW15] S. Das, D.M. Bull, and P.N. Whatmough. Error-resilient design techniques for reliable and dependable computing. Device and Materials Reliability, IEEE Transactions on, 15(1):24–34, March 2015.


[DCA+12] A. Y. Dogan, J. Constantin, D. Atienza, A. Burg, and L. Benini. Low-power processor architecture exploration for online biomedical signal analysis. IET Circuits, Devices Systems, 6(5):279–286, Sept 2012.

[DCR+12] A. Y. Dogan, J. Constantin, M. Ruggiero, A. Burg, and D. Atienza. Multi-core architecture design for ultra-low-power wearable health monitoring systems. In 2012 Design, Automation Test in Europe Conference Exhibition (DATE), pages 988–993, March 2012.

[DFW+13] A. J. Drake, M. S. Floyd, R. L. Willaman, D. J. Hathaway, J. Hernandez, C. Soja, M. D. Tiner, G. D. Carpenter, and R. M. Senger. Single-cycle, pulse-shaped critical path monitor in the POWER7+ microprocessor. In Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on, pages 193–198, Sept 2013.

[dKNS10] Marc de Kruijf, Shuou Nomura, and Karthikeyan Sankaralingam. Relax: An architectural framework for software recovery of hardware faults. In Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, pages 497–508, New York, NY, USA, 2010. ACM.

[Dog13] Ahmed Yasir Dogan. Energy-Aware Processing Platform Exploration for Embedded Biosignal Analysis. PhD thesis, EPFL-STI, Lausanne, 2013.

[Don06] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006.

[DSD+07] A. Drake, R. Senger, H. Deogun, G. Carpenter, S. Ghiasi, T. Nguyen, N. James, M. Floyd, and V. Pokala. A distributed critical-path timing monitor for a 65nm high-performance microprocessor. In 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pages 398–399, Feb 2007.

[DTP+09] S. Das, C. Tokunaga, S. Pant, Wei-Hsiang Ma, S. Kalaiselvan, K. Lai, D.M. Bull, and D.T. Blaauw. RazorII: In situ error detection and correction for PVT and SER tolerance. Solid-State Circuits, IEEE Journal of, 44(1):32–48, Jan 2009.

[DWB+10] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge. Near-threshold computing: Reclaiming Moore’s law through energy efficient integrated circuits. Proceedings of the IEEE, 98(2):253–266, Feb 2010.

[ECPS02] D. Estrin, D. Culler, K. Pister, and G. Sukhatme. Connecting the physical world with pervasive networks. IEEE Pervasive Computing, 1(1):59–69, Jan 2002.

[ECR11] ECRYPTII group. The SHA-3 Zoo, 2011. http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo.

[EDL+04] D. Ernst, S. Das, Seokwoo Lee, D. Blaauw, T. Austin, T. Mudge, Nam Sung Kim, and K. Flautner. Razor: circuit-level correction of timing errors for low-power operation. Micro, IEEE, 24(6):10–20, Nov 2004.


[EEM16] EEMBC. CoreMark - An EEMBC Benchmark, 2016. http://www.eembc.org/coremark/about.php.

[EHG+07] M. Eireiner, S. Henzler, G. Georgakos, J. Berthold, and D. Schmitt-Landsiedel. In-situ delay characterization and local supply voltage adjustment for compensation of local parametric variations. Solid-State Circuits, IEEE Journal of, 42(7):1583–1592, July 2007.

[EKD+03] D. Ernst, Nam Sung Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge. Razor: a low-power pipeline based on circuit-level timing speculation. In Microarchitecture, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM International Symposium on, pages 7–18, Dec 2003.

[FFK+13] M. Fojtik, D. Fick, Y. Kim, N. Pinckney, D.M. Harris, D. Blaauw, and D. Sylvester. Bubble Razor: Eliminating timing margins in an ARM Cortex-M3 processor in 45 nm CMOS using architecturally independent error detection and correction. Solid-State Circuits, IEEE Journal of, 48(1):66–81, Jan 2013.

[FKHO11] H. Fuketa, D. Kuroda, M. Hashimoto, and T. Onoye. An average-performance-oriented subthreshold processor self-timed by memory read completion. IEEE Transactions on Circuits and Systems II: Express Briefs, 58(5):299–303, May 2011.

[FLS+10] Niels Ferguson, Stefan Lucks, Bruce Schneier, Doug Whiting, Mihir Bellare, Tadayoshi Kohno, Jon Callas, and Jesse Walker. The Skein hash function family. Submission to NIST (Round 3), 2010.

[Fre16] FreeRTOS. The market leading, de-facto standard and cross platform real time operating system (RTOS)., 2016. http://www.freertos.org/.

[GGM+12] Frank Gürkaynak, Kris Gaj, Beat Muheim, Ekawat Homsirikamol, Christoph Keller, Marcin Rogawski, Hubert Kaeslin, and Jens-Peter Kaps. Lessons learned from designing a 65nm ASIC for evaluating third round SHA-3 candidates. In Third SHA-3 Candidate Conference, March 2012.

[GKM+11] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schläffer, and Søren S. Thomsen. Grøstl – a SHA-3 candidate. Submission to NIST (Round 3), 2011.

[GMKR10] S. Ghosh, D. Mohapatra, G. Karakonstantis, and K. Roy. Voltage scalable high-speed robust hybrid arithmetic units using adaptive clocking. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(9):1301–1309, Sept 2010.

[Gou10] Mourad Gouicem. Comparison of seven SHA-3 candidates software implementations on smart cards. Cryptology ePrint Archive, Report 2010/531, 2010. http://eprint.iacr.org/.


[GPC15] A. Gheolbănoiu, L. Petrică, and S. Coțofană. Hybrid adaptive clock management for FPGA processor acceleration. In 2015 Design, Automation Test in Europe Conference Exhibition (DATE), pages 1359–1364, March 2015.

[GR10] S. Ghosh and K. Roy. Parameter variation tolerance and error resiliency: New design paradigm for the nanoscale era. Proceedings of the IEEE, 98(10):1718–1751, Oct 2010.

[Gra15] A. Grau. Can you trust your fridge? IEEE Spectrum, 52(3):50–56, March 2015.

[GWZ13] M. Gorlatova, A. Wallwater, and G. Zussman. Networking low-power energy harvesting devices: Measurements and algorithms. IEEE Transactions on Mobile Computing, 12(9):1853–1865, Sept 2013.

[HB01] J. Hightower and G. Borriello. Location systems for ubiquitous computing. Computer, 34(8):57–66, Aug 2001.

[Hew15] Hewlett Packard. Internet of things research study: 2015 report, 2015. http://www8.hp.com/h20195/V2/GetPDF.aspx/4AA5-4759ENW.pdf.

[HGG+10] Luca Henzen, Pietro Gendotti, Patrice Guillet, Enrico Pargaetzi, Martin Zoller, and Frank K. Gürkaynak. Cryptographic Hardware and Embedded Systems, CHES 2010: 12th International Workshop, Santa Barbara, USA, August 17-20, 2010. Proceedings, chapter Developing a Hardware Evaluation Method for SHA-3 Candidates, pages 248–263. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.

[HKN+01] A. Hoffmann, T. Kogel, A. Nohl, G. Braun, O. Schliebusch, O. Wahlen, A. Wieferink, and H. Meyr. A novel methodology for the design of application-specific instruction-set processors (ASIPs) using a machine description language. In Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.20, no.11, pages 1338–1354. IEEE, 2001.

[HLR+09] S. K. S. Hari, M. L. Li, P. Ramachandran, B. Choi, and S. V. Adve. mSWAT: Low-cost hardware fault detection and diagnosis for multicore systems. In 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 122–132, Dec 2009.

[HSL+09] S. Hanson, M. Seok, Y. S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw. A low-voltage processor for sensing applications with picowatt standby mode. IEEE Journal of Solid-State Circuits, 44(4):1145–1155, April 2009.

[HZS+08] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, D. Sylvester, and D. Blaauw. Exploring variability and performance in a sub-200-mV processor. IEEE Journal of Solid-State Circuits, 43(4):881–891, April 2008.


[IBM14] IBM Security Systems. IBM X-Force threat intelligence quarterly, 4Q, 2014. http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=WGL03062USEN.

[ISP+11] N. Ickes, Y. Sinangil, F. Pappalardo, E. Guidetti, and A. P. Chandrakasan. A 10 pJ/cycle ultra-low-voltage 32-bit microprocessor system-on-chip. In ESSCIRC (ESSCIRC), 2011 Proceedings of the, pages 159–162, Sept 2011.

[ISU+14] K. Ishibashi, N. Sugii, K. Usami, H. Amano, K. Kobayashi, Cong-Kha Pham, H. Makiyama, Y. Yamamoto, H. Shinohara, T. Iwamatsu, Y. Yamaguchi, H. Oda, T. Hasegawa, S. Okanishi, H. Yanagita, S. Kamohara, M. Kadoshima, K. Maekawa, T. Yamashita, Duc-Hung Le, T. Yomogita, M. Kudo, K. Kitamori, S. Kondo, and Y. Manzawa. A perpetuum mobile 32bit CPU with 13.4pJ/cycle, 0.14uA sleep current using reverse body bias assisted 65nm SOTB CMOS technology. In COOL Chips XVII, 2014 IEEE, pages 1–3, April 2014.

[JAB+09] C. H. Jan, M. Agostinelli, M. Buehler, Z. P. Chen, S. J. Choi, G. Curello, H. Deshpande, S. Gannavaram, W. Hafez, U. Jalan, M. Kang, P. Kolar, K. Komeyli, B. Landau, A. Lake, N. Lazo, S. H. Lee, T. Leo, J. Lin, N. Lindert, S. Ma, L. McGill, C. Meining, A. Paliwal, J. Park, K. Phoa, I. Post, N. Pradhan, M. Prince, A. Rahman, J. Rizk, L. Rockford, G. Sacks, A. Schmitz, H. Tashiro, C. Tsai, P. Vandervoorn, J. Xu, L. Yang, J. Y. Yeh, J. Yip, K. Zhang, Y. Zhang, and P. Bai. A 32nm SoC platform technology with 2nd generation high-k/metal gate transistors optimized for ultra low power, high performance, and high density product applications. In 2009 IEEE International Electron Devices Meeting (IEDM), pages 1–4, Dec 2009.

[JBW+09] S. C. Jocke, J. F. Bolus, S. N. Wooters, A. D. Jurik, A. C. Weaver, T. N. Blalock, and B. H. Calhoun. A 2.6-uW sub-threshold mixed-signal ECG SoC. In 2009 Symposium on VLSI Circuits, pages 60–61, June 2009.

[JKY+12] S. Jain, S. Khare, S. Yada, V. Ambili, P. Salihundam, S. Ramani, S. Muthukumar, M. Srinivasan, A. Kumar, S. K. Gb, R. Ramanarayanan, V. Erraguntla, J. Howard, S. Vangal, S. Dighe, G. Ruhl, P. Aseron, H. Wilson, N. Borkar, V. De, and S. Borkar. A 280mV-to-1.2V wide-operating-range IA-32 processor in 32nm CMOS. In 2012 IEEE International Solid-State Circuits Conference, pages 66–68, Feb 2012.

[Kae08] Hubert Kaeslin. Digital Integrated Circuit Design: From VLSI Architectures to CMOS Fabrication. Cambridge University Press, New York, NY, USA, 1st edition, 2008.

[KB+14] Stefan Kristiansson, Julius Baxter, et al. mor1kx - an OpenRISC processor IP core, 2014. https://github.com/openrisc/mor1kx.

[KC11] J. Kwong and A. P. Chandrakasan. An energy-efficient biomedical signal processing platform. IEEE Journal of Solid-State Circuits, 46(7):1742–1753, July 2011.


[KGA+09] D. Kammler, J. Guan, G. Ascheid, R. Leupers, and H. Meyr. A fast and flexible platform for fault injection and evaluation in Verilog-based simulations. In Secure Software Integration and Reliability Improvement, 2009. SSIRI 2009. Third IEEE International Conference on, pages 309–314, July 2009.

[KKF+14] Inyong Kwon, Seongjong Kim, D. Fick, Myungbo Kim, Yen-Po Chen, and D. Sylvester. Razor-Lite: A light-weight register for error detection by observing virtual supply rails. Solid-State Circuits, IEEE Journal of, 49(9):2054–2066, Sept 2014.

[KKK+06] J. H. Kim, Y. H. Kwak, M. Kim, S. W. Kim, and C. Kim. A 120-MHz-1.8-GHz CMOS DLL-based clock generator for dynamic frequency scaling. IEEE Journal of Solid-State Circuits, 41(9):2077–2082, Sept 2006.

[KKKS10] A.B. Kahng, Seokhyeong Kang, R. Kumar, and J. Sartori. Slack redistribution for graceful degradation under voltage overscaling. In Design Automation Conference (ASP-DAC), 2010 15th Asia and South Pacific, pages 825–831, Jan 2010.

[KKKT12] U.R. Karpuzcu, K.B. Kolluru, Nam Sung Kim, and J. Torrellas. VARIUS-NTV: A microarchitectural model to capture the increased sensitivity of manycores to process variations at near-threshold voltages. In Dependable Systems and Networks (DSN), 2012 42nd Annual IEEE/IFIP International Conference on, pages 1–11, June 2012.

[KKT13] U. R. Karpuzcu, N. S. Kim, and J. Torrellas. Coping with parametric variation at near-threshold voltages. IEEE Micro, 33(4):6–14, July 2013.

[KMKA11] K. Kanoun, H. Mamaghanian, N. Khaled, and D. Atienza. A real-time compressed sensing-based personal electrocardiogram monitoring system. In 2011 Design, Automation Test in Europe, pages 1–6, March 2011.

[KRV99] V. Krishna, N. Ranganathan, and N. Vijaykrishnan. Energy efficient datapath synthesis using dynamic frequency clocking and multiple voltages. In VLSI Design, 1999. Proceedings. Twelfth International Conference On, pages 440–445, Jan 1999.

[KRV+08] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, and A. Chandrakasan. A 65nm sub-Vt microcontroller with integrated SRAM and switched-capacitor DC-DC converter. In 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, pages 318–616, Feb 2008.

[KSV+16] C. Kolias, A. Stavrou, J. Voas, I. Bojanova, and R. Kuhn. Learning Internet-of-Things security hands-on. IEEE Security Privacy, 14(1):37–46, Jan 2016.

[LDF+11] Charles R. Lefurgy, Alan J. Drake, Michael S. Floyd, Malcolm S. Allen-Ware, Bishop Brock, Jose A. Tierno, and John B. Carter. Active management of timing guardband to save energy in POWER7. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44, pages 1–11, New York, NY, USA, 2011. ACM.

[LDF+13] C. R. Lefurgy, A. J. Drake, M. S. Floyd, M. S. Allen-Ware, B. Brock, J. A. Tierno, J. B. Carter, and R. W. Berry. Active guardband management in Power7+ to save energy and maintain reliability. IEEE Micro, 33(4):35–45, July 2013.

[LRC+11] D. Li, D. Rennie, P. Chuang, D. Nairn, and M. Sachdev. Design and analysis of metastable-hardened and soft-error tolerant high-performance, low-power flip-flops. In Quality Electronic Design (ISQED), 2011 12th International Symposium on, pages 1–8, March 2011.

[MAM+12] P. Meinerzhagen, O. Andersson, B. Mohammadi, Y. Sherazi, A. Burg, and J. N. Rodrigues. A 500 fW/bit 14 fJ/bit-access 4kb standard-cell based sub-VT memory in 65nm CMOS. In ESSCIRC (ESSCIRC), 2012 Proceedings of the, pages 321–324, Sept 2012.

[MAS+11] P. Meinerzhagen, O. Andersson, Y. Sherazi, A. Burg, and J. Rodrigues. Synthesis strategies for sub-VT systems. In Circuit Theory and Design (ECCTD), 2011 20th European Conference on, pages 552–555, Aug 2011.

[Mei14] Pascal Andreas Meinerzhagen. Novel Approaches Toward Area- and Energy- Efficient Embedded Memories. PhD thesis, EPFL-STI, Lausanne, 2014.

[Mer90] Ralph C. Merkle. Advances in Cryptology — CRYPTO’ 89 Proceedings, chapter A Certified Digital Signature, pages 218–238. Springer New York, New York, NY, 1990.

[MFC13] P. Magarshack, P. Flatresse, and G. Cesana. UTBB FD-SOI: A process/design symbiosis for breakthrough energy-efficiency. In Design, Automation Test in Europe Conference Exhibition (DATE), 2013, pages 952–957, March 2013.

[Mic09a] Microchip. 16-bit MCU and DSC programmer’s reference manual, 2009. http://ww1.microchip.com/downloads/en/DeviceDoc/70157D.pdf.

[Mic09b] Microchip. PIC24HJXXXGPX06/X08/X10 Data Sheet, 2009. http://ww1.microchip.com/downloads/en/DeviceDoc/70175H.pdf.

[Mic11] Microchip. PIC24FJ128GA310 Data Sheet, 2011. http://ww1.microchip.com/downloads/en/DeviceDoc/39996f.pdf.

[Mic13] Microchip. Press Release: Microchip Technology Delivers 12 Billionth PIC Microcontroller to Leading Motor Manufacturer, Nidec Corporation, 2013. http://eecatalog.com/8bit/2013/05/16/microchip-technology-delivers-12-billionth-pic-microcontroller-to-leading-motor-manufacturer-nidec-corporation/.

[MKAV11] H. Mamaghanian, N. Khaled, D. Atienza, and P. Vandergheynst. Compressed sensing for real-time energy-efficient ECG compression on wireless body sensor nodes. IEEE Transactions on Biomedical Engineering, 58(9):2456–2466, Sept 2011.


[MMR05] S. Mukhopadhyay, H. Mahmoodi, and K. Roy. Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(12):1859–1880, Dec 2005.

[MSBR11] P. Meinerzhagen, S. M. Y. Sherazi, A. Burg, and J. N. Rodrigues. Benchmarking of standard-cell based memories in the sub-VT domain in 65-nm CMOS technology. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 1(2):173–182, June 2011.

[MSG+16] J. Myers, A. Savanth, R. Gaddh, D. Howard, P. Prabhat, and D. Flynn. A subthreshold ARM Cortex-M0+ subsystem in 65 nm CMOS for WSN applications with 14 power domains, 10T SRAM, and integrated voltage regulator. IEEE Journal of Solid-State Circuits, 51(1):31–44, Jan 2016.

[MYAZ15] R. Mahmoud, T. Yousuf, F. Aloul, and I. Zualkernan. Internet of things (IoT) security: Current status, challenges and prospective measures. In 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), pages 336–341, Dec 2015.

[MYR+08] P. D. Mitcheson, E. M. Yeatman, G. K. Rao, A. S. Holmes, and T. C. Green. Energy harvesting from human and machine motion for wireless electronic devices. Proceedings of the IEEE, 96(9):1457–1486, Sept 2008.

[NAA+14] S. Natarajan, M. Agostinelli, S. Akbar, M. Bost, A. Bowonder, V. Chikarmane, S. Chouksey, A. Dasgupta, K. Fischer, Q. Fu, T. Ghani, M. Giles, S. Govindaraju, R. Grover, W. Han, D. Hanken, E. Haralson, M. Haran, M. Heckscher, R. Heussner, P. Jain, R. James, R. Jhaveri, I. Jin, H. Kam, E. Karl, C. Kenyon, M. Liu, Y. Luo, R. Mehandru, S. Morarka, L. Neiberg, P. Packan, A. Paliwal, C. Parker, P. Patel, R. Patel, C. Pelto, L. Pipes, P. Plekhanov, M. Prince, S. Rajamani, J. Sandford, B. Sell, S. Sivakumar, P. Smith, B. Song, K. Tone, T. Troeger, J. Wiedemer, M. Yang, and K. Zhang. A 14nm logic technology featuring 2nd-generation FinFET, air-gapped interconnects, self-aligned double patterning and a 0.0588 um2 SRAM cell size. In 2014 IEEE International Electron Devices Meeting, pages 3.7.1–3.7.3, Dec 2014.

[NIS01] NIST. Advanced Encryption Standard (AES). Federal Information Processing Standards Publication 197, 2001. http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.

[NIS07] NIST. Announcing request for candidate algorithm nominations for a new cryptographic hash algorithm (SHA-3) family. Federal Register, Vol.72, No.212, 2007. http://www.nist.gov/hash-competition.

[NIS15a] NIST. Secure hash standard (SHS). Federal Information Processing Standards Publication 180-4, 2015. http://dx.doi.org/10.6028/NIST.FIPS.180-4.


[NIS15b] NIST. SHA-3 standard: Permutation-based hash and extendable-output functions. Federal Information Processing Standards Publication 202, 2015. http://dx.doi.org/10.6028/NIST.FIPS.202.

[Nor96] E. Normand. Single event upset at ground level. IEEE Transactions on Nuclear Science, 43(6):2742–2750, Dec 1996.

[NV03] B. Nicolescu and R. Velazco. Detecting soft errors by a purely software approach: method, tools and experimental results. In Design, Automation and Test in Europe Conference and Exhibition, 2003, pages 57–62 suppl., 2003.

[oHSC00] Harvard-MIT Division of Health Sciences and Technology Biomedical Engineering Center. MIT-BIH arrhythmia database directory, 2000. http://www.physionet.org/physiobank/database/mitdb.

[OK02] M. Orshansky and K. Keutzer. A general probabilistic framework for worst case timing analysis. In Design Automation Conference, 2002. Proceedings. 39th, pages 556–561, 2002.

[Ope14] OpenRISC Community. OpenRISC 1000 architecture manual (architecture version: 1.1), 2014. https://github.com/openrisc/doc/blob/master/openrisc-arch-1.1-rev0.pdf.

[Ope16] OpenRISC Community. OpenRISC Architecture, 2016. http://openrisc.io/architecture.

[PCC13] L. Petrica, V. Codreanu, and S. Cotofana. VASILE: A reconfigurable vector architecture for instruction level frequency scaling. In Faible Tension Faible Consommation (FTFC), 2013 IEEE, pages 1–4, June 2013.

[PH13] Peter Pessl and Michael Hutter. Cryptographic Hardware and Embedded Systems - CHES 2013: 15th International Workshop, Santa Barbara, CA, USA, August 20-23, 2013. Proceedings, chapter Pushing the Limits of SHA-3 Hardware Implementations to Fit on RFID, pages 126–141. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[PHB13] James Pallister, Simon J. Hollis, and Jeremy Bennett. BEEBS: open benchmarks for energy measurements on embedded platforms. CoRR, abs/1308.5174, 2013.

[Por11] Thomas Pornin. sphlib, update for the SHA-3 third round candidates, 2011. http://www.saphir2.com/sphlib.

[PPIM03] A. Peymandoust, L. Pozzi, P. Ienne, and G. De Micheli. Automatic instruction set extension and utilization for embedded processors. In Application-Specific Systems, Architectures, and Processors, 2003. Proceedings. IEEE International Conference on, pages 108–118, June 2003.


[PS05] J. A. Paradiso and T. Starner. Energy scavenging for mobile and wireless electronics. IEEE Pervasive Computing, 4(1):18–27, Jan 2005.

[QSC11] M. Qazi, M. Sinangil, and A. Chandrakasan. Challenges and directions for low-voltage SRAM. IEEE Design Test of Computers, 28(1):32–43, Jan 2011.

[RBG14] A. Rahimi, L. Benini, and R. K. Gupta. Application-adaptive guardbanding to mitigate static and dynamic variability. IEEE Transactions on Computers, 63(9):2160–2173, Sept 2014.

[RHLA14] Pradeep Ramachandran, Siva Kumar Sastry Hari, Manlap Li, and Sarita V. Adve. Hardware fault recovery for I/O intensive applications. ACM Trans. Archit. Code Optim., 11(3):33:1–33:25, October 2014.

[RLKB11] A. Rahimi, I. Loi, M. R. Kakoee, and L. Benini. A fully-synthesizable single-cycle interconnection network for Shared-L1 processor clusters. In 2011 Design, Automation Test in Europe, pages 1–6, March 2011.

[RMC05] Hongliang Ren, M. Q. H. Meng, and Xijun Chen. Physiological information acquisition through wireless biomedical sensor networks. In 2005 IEEE International Conference on Information Acquisition, pages 6 pp.–, June 2005.

[RPL+16] D. Rossi, A. Pullini, I. Loi, M. Gautschi, F. K. Gurkaynak, A. Teman, J. Constantin, A. Burg, I. Miro-Panades, E. Beigné, F. Clermidy, F. Abouzeid, P. Flatresse, and L. Benini. 193 MOPS/mW @ 162 MOPS, 0.32V to 1.15V voltage range multi-core accelerator for energy efficient parallel and sequential digital processing. In 2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX), pages 1–3, April 2016.

[RTB+11] A. Raychowdhury, J. Tschanz, K. Bowman, S. L. Lu, P. Aseron, M. Khellah, B. Geuskens, C. Tokunaga, C. Wilkerson, T. Karnik, and V. De. Error detection and correction in microprocessor core and memory due to fast dynamic voltage droops. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 1(3):208–217, Sept 2011.

[RVB98] N. Ranganathan, N. Vijaykrishnan, and N. Bhavanishankar. A linear array processor with dynamic frequency clocking for image processing applications. IEEE Transactions on Circuits and Systems for Video Technology, 8(4):435–445, Aug 1998.

[SA14] Mohamed M. Sabry and David Atienza. Temperature-aware design and management for 3D multi-core architectures. Foundations and Trends in Electronic Design Automation, 8(2):117–197, 2014.

[Sat01] M. Satyanarayanan. Pervasive computing: vision and challenges. IEEE Personal Communications, 8(4):10–17, Aug 2001.


[SGT+08] S.R. Sarangi, B. Greskamp, R. Teodorescu, J. Nakano, A. Tiwari, and J. Torrellas. VARIUS: A model of process variation and resulting timing errors for microarchitects. Semiconductor Manufacturing, IEEE Transactions on, 21(1):3–13, Feb 2008.

[SIHM06] M. Sugihara, T. Ishihara, K. Hashimoto, and M. Muroyama. A simulation-based soft error estimation methodology for computer systems. In 7th International Symposium on Quality Electronic Design (ISQED’06), pages 8 pp.–203, March 2006.

[Sil15] Silicon Labs. EFM32ZG110 Data Sheet, 2015. http://www.silabs.com/Support%20Documents/TechnicalDocs/EFM32ZG110.pdf.

[SKLS16] I. Shin, J. J. Kim, Y. S. Lin, and Y. Shin. One-cycle correction of timing errors in pipelines with standard clocked elements. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24(2):600–612, Feb 2016.

[SKS15] Insup Shin, Jae-Joon Kim, and Youngsoo Shin. Aggressive voltage scaling through fast correction of multiple errors with seamless pipeline operation. Circuits and Systems I: Regular Papers, IEEE Transactions on, 62(2):468–477, Feb 2015.

[SLM07] O. Schliebusch, R. Leupers, and H. Meyr. Optimized ASIP Synthesis from Architecture Description Language Models. Springer, 2007.

[STB97] V. Sieh, O. Tschache, and F. Balbach. VERIFY: evaluation of reliability using VHDL-models with embedded fault descriptions. In Fault-Tolerant Computing, 1997. FTCS-27. Digest of Papers., Twenty-Seventh Annual International Symposium on, pages 32–36, June 1997.

[STM16] STMicroelectronics. STM32L433xx Data Sheet, 2016. http://www.st.com/content/ccc/resource/technical/document/datasheet/f7/a0/fc/27/24/4e/4f/3f/DM00257192.pdf/files/DM00257192.pdf/jcr:content/translations/en.DM00257192.pdf.

[SVC15] O. Sahin, P.T. Varghese, and A. K. Coskun. Just enough is more: Achieving sustainable performance in mobile devices under thermal limitations. In Computer-Aided Design (ICCAD), 2015 IEEE/ACM International Conference on, pages 839–846, Nov 2015.

[Syn12] Synopsys. Automating the design and implementation of custom processors (Processor Designer, LISA 2.0), 2012. http://www.synopsys.com/Systems/BlockDesign/ProcessorDev/Pages/default.aspx.

[Syn16] Synopsys. DesignWare SHA-3 look aside core, 2016. https://www.synopsys.com/dw/ipdir.php?ds=security-crypto-sha-3-look-aside.

[Tar12] Target Compiler Technologies. IP designer, 2012. http://www.retarget.com/products/ipdesigner.php.


[TBW+09] J. Tschanz, K. Bowman, S. Walstra, M. Agostinelli, T. Karnik, and V. De. Tunable replica circuits and adaptive voltage-frequency techniques for dynamic voltage, temperature, and aging variation tolerance. In 2009 Symposium on VLSI Circuits, pages 112–113, June 2009.

[Ten12] Tensilica. Xtensa customizable processors, 2012. http://www.tensilica.com/products/xtensa-customizable.htm.

[Tex16] Texas Instruments. MSP430FR573x Data Sheet, 2016. http://www.ti.com/lit/ds/slas639k/slas639k.pdf.

[TKD+07] J. Tschanz, N. S. Kim, S. Dighe, J. Howard, G. Ruhl, S. Vangal, S. Narendra, Y. Hoskote, H. Wilson, C. Lam, M. Shuman, C. Tokunaga, D. Somasekhar, S. Tang, D. Finan, T. Karnik, N. Borkar, N. Kurd, and V. De. Adaptive frequency and biasing techniques for tolerance to dynamic temperature-voltage variations and aging. In 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pages 292–604, Feb 2007.

[TKN+02] J.W. Tschanz, J.T. Kao, S.G. Narendra, R. Nair, D.A. Antoniadis, A.P. Chandrakasan, and V. De. Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage. Solid-State Circuits, IEEE Journal of, 37(11):1396–1402, Nov 2002.

[Uni16] University of Wisconsin-Madison. Computing with HTCondor, 2016. https://research.cs.wisc.edu/htcondor/.

[VC08] N. Verma and A. P. Chandrakasan. A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy. IEEE Journal of Solid-State Circuits, 43(1):141–149, Jan 2008.

[VRK+06] C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, S. Narayan, D. K. Beece, J. Piaget, N. Venkateswaran, and J. G. Hemmett. First-order incremental block-based statistical timing analysis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(10):2170–2180, Oct 2006.

[WBG10] Christian Wenzel-Benner and Jens Gräf. XBX: eXternal Benchmarking eXtension for the SUPERCOP crypto benchmarking framework. In Cryptographic Hardware and Embedded Systems - CHES, volume 6225 of LNCS, pages 294–305. Springer, 2010.

[WBG11] Christian Wenzel-Benner and Jens Gräf. XBX: eXternal Benchmarking eXtension, 2011. http://xbx.das-labor.org/trac.

[WCC06] Alice Wang, Benton H. Calhoun, and Anantha P. Chandrakasan. Sub-threshold Design for Ultra Low-Power Systems. Springer, 2006.


[WCC13] Z. Wang, C. Chen, and A. Chattopadhyay. Fast reliability exploration for embedded processors via high-level fault injection. In Quality Electronic Design (ISQED), 2013 14th International Symposium on, pages 265–272, March 2013.

[Wei99] Mark Weiser. The computer for the 21st century. SIGMOBILE Mob. Comput. Commun. Rev., 3(3):3–11, July 1999.

[WTLP14] J. Wei, A. Thomas, G. Li, and K. Pattabiraman. Quantifying the accuracy of high-level fault injection techniques for hardware faults. In 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pages 375–382, June 2014.

[Wu11] Hongjun Wu. The hash function JH. Submission to NIST (Round 3), 2011.

[YY10] J. Yoo and H. J. Yoo. Emerging low energy Wearable Body Sensor Networks using patch sensors for continuous healthcare applications. In 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, pages 6381–6384, Aug 2010.

[ZBSF04] Bo Zhai, D. Blaauw, D. Sylvester, and K. Flautner. Theoretical and practical limits of dynamic voltage scaling. In Design Automation Conference, 2004. Proceedings. 41st, pages 868–873, July 2004.

[ZG13] K. Zhao and L. Ge. A survey on the Internet of Things security. In Computational Intelligence and Security (CIS), 2013 9th International Conference on, pages 663–667, Dec 2013.

[ZHBS08] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester. A variation-tolerant sub-200 mV 6-T subthreshold SRAM. IEEE Journal of Solid-State Circuits, 43(10):2338–2348, Oct 2008.

[Zhu04] Q.K. Zhu. Power Distribution Network Design for VLSI. Wiley, 2004.

[ZKY+16] Y. Zhang, M. Khayatzadeh, K. Yang, M. Saligane, N. Pinckney, M. Alioto, D. Blaauw, and D. Sylvester. iRazor: 3-transistor current-based error detection and correction in an ARM Cortex-R4 processor. In 2016 IEEE International Solid-State Circuits Conference (ISSCC), pages 160–162, Jan 2016.

[ZLL+13] Jun Zhou, Xin Liu, Yat-Hei Lam, Chao Wang, Kah-Hyong Chang, Jingjing Lan, and Minkyu Je. HEPP: A new in-situ timing-error prediction and prevention technique for variation-tolerant ultra-low-voltage designs. In Solid-State Circuits Conference (A-SSCC), 2013 IEEE Asian, pages 129–132, Nov 2013.

[ZNO+06] B. Zhai, L. Nazhandali, J. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, D. Blaauw, and T. Austin. A 2.60pJ/Inst subthreshold sensor processor for optimal energy efficiency. In 2006 Symposium on VLSI Circuits, Digest of Technical Papers, pages 154–155, 2006.


Glossary

#
6/8/10T  (six/eight/ten)-transistor

A
ABI  application binary interface
AD  address
ADC  analog-to-digital converter
AES  Advanced Encryption Standard
ALU  arithmetic logic unit
API  application programming interface
ASCII  American Standard Code for Information Interchange
ASIP  application-specific instruction set processor

C
CAD  computer-aided design
CDF  cumulative distribution function
CG  clock generator
CGU  clock generation unit
CMOS  complementary metal–oxide–semiconductor
CPI  cycles per instruction
CPM  critical-path timing monitor
CPU  central processing unit
CS  compressed sensing
CSA  Compressed Sensing Accumulation
CTRL  control

D
DC  decode


DCA  dynamic clock adjustment
DLL  delay-locked loop
DM  data memory
DMA  direct memory access
DMEM  data memory
DSP  digital signal processor
DTA  dynamic timing analysis
DVFS  dynamic voltage frequency scaling
DWT  discrete wavelet transform
DynOR  dynamic OpenRISC

E
ECG  electrocardiogram
EDA  electronic design automation
EDS  error-detection sequentials
EEG  electroencephalogram
ELF  executable and linkable format
EMV  energy minimum voltage
ESD  electrostatic discharge
EX  execute

F
FD-SOI  fully-depleted silicon-on-insulator
FE  fetch
FFT  fast Fourier transform
FI  fault injection
FinFET  fin field effect transistor
FIPS  Federal Information Processing Standard
FPGA  field-programmable gate array
FPU  floating-point unit
FRAM  ferroelectric random-access memory
FSM  finite state machine

G
GE  gate equivalent
GOPS  billion/giga operations per second
GP  general-purpose
GPIO  general-purpose input/output


H
HDL  hardware description language
HP  high performance
HV  high voltage
HW  hardware

I
IC  integrated circuit
IM  instruction memory
IMEM  instruction memory
I/O  input/output
IoT  internet of things
IP  intellectual property
IPC  instructions per cycle
IRQ  interrupt request
ISA  instruction set architecture
ISE  instruction set extension
ISS  instruction set simulator

L
LDO  low-dropout regulator
LFSR  linear feedback shift register
LISA  Language for Instruction Set Architectures
LL  low-leakage
LP  low power
LSU  load store unit
LUT  lookup table
LVT  low threshold voltage

M
Mbps  megabit per second
MC  median calculation
MCU  microcontroller unit
ME  memory
MEM  memory
MIMD  multiple instruction, multiple data


MM  matrix multiplication
MMD  multiscale morphological derivatives
MMU  memory management unit
MOPS  million/mega operations per second
MSB  most significant bit
MSE  mean squared error

N
NBTI  negative-bias temperature instability
NIST  National Institute of Standards and Technology
n-MOS  n-type metal-oxide-semiconductor
NOP  no-operation
NTV  near-threshold voltage

O
OS  operating system

P
PC  program counter
PCB  printed circuit board
PD  Synopsys Processor Designer
PDF  probability density function
PE  processing element
PLL  phase-locked loop
p-MOS  p-type metal-oxide-semiconductor
PoFF  point of first failure
P&R  place and route
PRD  percentage root-mean-square difference
PRNG  pseudorandom number generator
PULP  parallel ultra-low-power
PVT  process, voltage, and temperature

R
RAM  random-access memory
RAW  read after write
RF  register file
RFID  radio-frequency identification


RISC  reduced instruction set computing/computer
ROM  read-only memory
RTL  register transfer level
RTOS  real-time operating system
RVT  regular threshold voltage

S
SCM  standard cell memory
SCVR  switched capacitor voltage regulator
SDC  Synopsys Design Constraints
SDF  standard delay format
SEU  single event upset
SF  status flags
SHA-3  Secure Hash Algorithm-3
SIF  serial interface
SIMD  single instruction, multiple data
SoC  system-on-chip
SP  standard performance/power
SRAM  static random-access memory
SSTA  statistical static timing analysis
STA  static timing analysis
STV  super-threshold voltage
sub-VT  sub-threshold
SW  software

T
TCDM  tightly coupled data memory
TDDB  time-dependent dielectric breakdown
TLB  translation lookaside buffer
TSSI  Test Systems Strategies, Inc.

U
ULP  ultra-low-power
ULV  ultra-low voltage

V
Vdd  supply voltage


VT  threshold voltage
VCD  value change dump
VHDL  VHSIC Hardware Description Language
VHSIC  very high speed integrated circuit
VLSI  very-large-scale integration

W
WAD  write address decoder
WB  write-back
WBSN  wireless body sensor network
WSN  wireless sensor node

List of Figures

1.1 Model for the internet of things [IBM14], with embedded microprocessors integrated as part of the end devices/“things”, the intermediary nodes in the local network, and the controlling devices of the end users...... 2

1.2 Wireless body sensor network (WBSN) system for hospital and healthcare monitoring applications [CCHL12]...... 3

1.3 System diagram of an exemplary state-of-the-art embedded subsystem for IoT applications, integrating a Cortex-M microprocessor [ARM16]...... 4

1.4 Qualitative overview for power, frequency (f), and energy per operation as a function of supply voltage (VDD) [KKT13, DWB+10, JKY+12], highlighting the gains from supply voltage scaling, and the induced performance degradation. Vth denotes the threshold voltage, VDD,NTV denotes a near-threshold operating voltage, while VDD,STV denotes the classical super/above-threshold operating voltage...... 9

1.5 Measured frequency, power, and energy per cycle versus supply voltage, of a sub-threshold ARM Cortex-M0+ subsystem in 65 nm CMOS for WSN applications; reproduced from [MSG+16]. Active power is indicated by the power curve labeled Checksum: LEASTON, while the other three curves show standby power for different modes...... 11

1.6 Measured frequency, power, and energy per cycle versus supply voltage, of a wide-operating-range IA-32 processor (x86, Intel Pentium class) in 32 nm CMOS for near-threshold computing; reproduced from [JKY+12]...... 11

1.7 Overview, structure, and classification of the covered topics and contributions of this thesis in microarchitectural low-power design for embedded microprocessors...... 14

2.1 Merkle-Damgård construction for cryptographic hash functions...... 23

2.2 Illustration of different operation types used in cryptographic hash functions.. 25


2.3 Microchip PIC24 CPU core block diagram [Mic09b], illustrating the internal MCU core structure of a commercial PIC24HJXXXGPXXX device...... 29

2.4 Microarchitecture of custom PIC24 implementation with a 3-stage pipeline, comprising a fetch (FE), decode (DC), and execute (EX) stage. The instruction memory and data memory are tightly coupled to the core...... 31

2.5 Illustration of the design flow and tools employed for implementation, benchmarking, and development of instruction set extensions...... 42

2.6 Block diagram of extended 16-bit shifter units within the PIC24 microarchitecture, providing support for a two-cycle 64-bit rotation instruction. The annotated resource/register addressing schedule is an example for a 64-bit left rotation by 23 where W0-W3 are chosen as source, and W4-W7 as destination...... 45

2.7 Block diagram of a data address generation module within the PIC24 microarchitecture, providing support for the efficient generation of a state permutation schedule within a multi-cycle instruction. The generation utilizes the available repetition counter (RCOUNT) as its FSM state...... 46

2.8 Block diagram of dual S-box LUTs integrated in the execution stage within the PIC24 microarchitecture, providing support for the efficient substitution of the algorithm state data residing in data memory. In this example, two identical S-boxes are implemented in parallel to maximize the throughput of the ISE... 48

2.9 Area of synthesized core versus maximum core clock period for the reference PIC24 design and five cores with individual SHA-3 candidate ISEs each...... 61

2.10 Comparison of average energy consumption per processed message byte for the reference PIC24 design and the achievable savings due to ISEs, for all SHA-3 candidates...... 63

3.1 TamaRISC microarchitecture with a 3-stage pipeline, comprising a fetch, decode, and execute stage. The instruction memory and data memory are tightly coupled to the core...... 69

3.2 Implementation and evaluation flow for the TamaRISC architecture...... 71

3.3 TamaRISC-CS sub-VT microprocessor architecture including address-randomizer extension for compressed sensing...... 80

3.4 Schematic of the latch array used for sub-VT operation of TamaRISC-CS, with clock-gates for the generation of write select signals and static CMOS readout multiplexers. The write port is highlighted in red, while the read port is highlighted in blue...... 82


3.5 Relative power comparison vs fixed operating frequency in the sub-VT regime for the TamaRISC-CS processor, when implemented with different clock constraints. The power is normalized for each operating frequency separately to the consumption of the hard-constrained design (5.2 ns) at that specific frequency, operating at the minimum possible voltage...... 85

3.6 Layout of placed & routed TamaRISC-CS processor in 65 nm, including instruction and data memories (6 kbit + 8 kbit) implemented in the form of ultra-low-voltage SCMs...... 86

3.7 Power and performance exploration of TamaRISC-CS. The power values are for operation at the indicated maximum achievable clock frequencies for a given voltage...... 87

3.8 Energy efficiency profile for operation of TamaRISC-CS at the maximum achievable frequency over the sub-VT range (black) and for fixed frequency operation for three different frequencies (blue)...... 87

3.9 PRD values at various compression ratios for three index sequences (sensing matrices Φ), each using different methods of construction...... 89

3.10 Total power consumption of TamaRISC-CS for CS-based ECG signal compression, with various ECG sampling rates...... 90

3.11 (a) Single-core architecture and (b) multi-core architecture, both utilizing a multi-banked data memory system...... 96

3.12 Comparison of single-core and multi-core power consumption at given workload requirements [DCA+12], utilizing TamaRISCs as the processing cores. The workload operating ranges are indicated for the following biosignal analysis applications: compressed sensing (CS), morphological filtering, multiscale morphological derivatives (MMD), and discrete wavelet transform (DWT)...... 97

3.13 Illustration of the configurable data memory mapping mechanism, providing improved access for a combination of shared and private data in a multi-core architecture...... 99

3.14 Translation of virtual to physical addresses using private & continuous address space translation within the MMU...... 100

3.15 Integration of MMUs into the core microarchitecture, to enable configurable data memory mapping...... 101

3.16 Multi-core architecture with shared instruction and data memories, each connected over a crossbar with broadcast support. The architecture is capable of operating both in standard MIMD, as well as in SIMD mode...... 102


4.1 Timing diagram, illustrating a timing margin, critical path delay, and timing violation...... 108

4.2 Illustration of state-dependent dynamic timing margins for an example circuit. 113

4.3 Static and dynamic views of delay variation modeling, using proper combination of probability density functions of different effects...... 115

4.4 Timing diagram detailing the concept of dynamic timing analysis...... 117

4.5 Timing diagram illustrating the application of dynamic timing analysis to a microprocessor pipeline. The different instruction types passing through the pipeline stages cause state-dependent dynamic timing margins of particular distributions at the different endpoints of the pipeline...... 120

4.6 Proposed dynamic timing analysis tool flow for microprocessors...... 121

4.7 Random fault injection in microarchitectural or architectural registers, based on fixed probability FI (Model A)...... 130

4.8 Execution probability for the median benchmark application to finish (with and without correct result) and fault injection rate of the median benchmark versus clock frequency of the processor core, for (a) model B based on STA @ 0.7 V, and (b) model B+ with supply voltage noise (σ = 10 mV)...... 131

4.9 Cumulative distribution functions of timing error probabilities extracted by DTA, for different ALU endpoints (bit 3 (lower significance) & bit 24 (higher significance)) and supply voltages, for (a) the signed multiplication instruction (l.mul) and (b) the addition instruction (l.add)...... 134

4.10 High-level instruction set simulation with statistical fault injection (model C), incorporating effects of supply voltage noise...... 135

4.11 Mean squared error vs. clock frequency for addition (with 16-bit and 32-bit operands) and multiplication instructions at Vdd = 0.7 V with σ = 10 mV (model C)...... 135

4.12 Program performance depending on clock frequency, in terms of finish and correctness probabilities of the program, fault injection rate, and output error, for the median benchmark for different levels of Vdd and Vdd-noise (model C)...... 137

4.13 Program performances for the matrix multiplication (8-bit & 16-bit), k-means clustering, and Dijkstra benchmarks, at Vdd = 0.7 V with Vdd-noise σ = 10 mV (model C)...... 139

4.14 Relative error vs. core power consumption trade-off for the median benchmark, through voltage over-scaling at fixed iso-frequency (nominal: 707 MHz @ 0.7 V) with effect of varying supply voltage noise (model C)...... 140


5.1 Proposed instruction-based dynamic clock adjustment (DCA) technique in a S = 3-stage pipelined processor, with lookup table for worst-case instruction-dependent stage delays...... 147

5.2 DCA design flow including dynamic timing analysis, instruction timing extraction and power & performance evaluation...... 148

5.3 Shaping of path delay profile for the DCA approach...... 149

5.4 Customized mor1kx microarchitecture of OpenRISC...... 152

5.5 Effects of critical range optimization for DCA on the worst-case delays for some exemplary OpenRISC instructions. The worst-case delays for each instruction occur in the execute stage, apart from l.j, where the longest delay is found in the address stage...... 154

5.6 Layout of placed & routed OpenRISC core optimized for DCA in 28 nm FD-SOI CMOS with instruction and data memories, realized each as 16 kB high-density high-performance SRAM macros...... 155

5.7 Histogram of dynamic maximum delays per cycle over all pipeline stages and endpoints of the OpenRISC core, derived via DTA...... 156

5.8 Percentage of a pipeline stage containing the limiting path, which determines the minimum cycle time when employing dynamic clocking...... 156

5.9 Histograms of dynamic maximum delays per pipeline stage, for the l.mul instruction (signed multiplication). The red lines indicate the maximum delay, while the blue lines indicate the mean delay...... 158

5.10 Performance gains on a 32-bit OpenRISC core with dynamic clock adjustment over conventional static clocking, for the CoreMark and BEEBS benchmark suites...... 159

5.11 Architecture of DynOR test-chip, including a 32-bit 6-stage OpenRISC core with cycle-by-cycle instruction-based dynamic clock adjustment...... 161

5.12 Schematic of the dynamic clock adjustment and generation, within the DynOR architecture...... 162

5.13 Schematic of clock generation unit (CGU), allowing cycle-by-cycle clock period adjustment, via a 1-hot controlled programmable delay...... 163

5.14 Layout of complete 1.2 mm2 test-chip design in 28 nm FD-SOI CMOS including pad ring with 36 I/O and 12 power supply pads. The bottom half of the chip contains the DynOR architecture, and the total core area occupied by DynOR-related circuitry, including the SRAM macros, amounts to only 0.24 mm2.... 165


5.15 Left-hand side: wire-bonded DynOR die (1.2 mm2) with 48 pads in 28 nm FD-SOI CMOS; right-hand side: photograph of the die sitting in the cavity of an opened JLCC-68 package (2.4 cm × 2.4 cm)...... 165

5.16 Overview schematic of automated measurement and validation setup for DynOR, including a workstation...... 166

5.17 DynOR measurement setup, comprised of a custom validation PCB on the right-hand side hosting the DynOR IC, connected to a Xilinx XUPV5-LX110T FPGA board...... 167

5.18 Flow chart of the DCA LUT calibration procedure used for the measurements.. 169

5.19 Timing diagram, illustrating DCA LUT calibration using timing monitoring flip-flops...... 170

5.20 Schematic of dual-mode timing monitoring flip-flop...... 170

5.21 Die micrograph and chip features of DynOR...... 172

5.22 Speed distribution of the 25 fabricated dies...... 173

5.23 Measured application speedup with respect to conventional static clocking with Fmax, for different supply voltages, die types, and applications...... 174

5.24 Measured power reduction at iso-throughput for a typical die, due to supply voltage scaling enabled by DCA...... 176

List of Tables

2.1 Memory resource breakdown of the baseline implementation of the BLAKE algorithm on the PIC24 architecture, and the changes due to instruction set extensions...... 34

2.2 Memory resource breakdown of the baseline implementation of the Grøstl algorithm on the PIC24 architecture, and the changes due to instruction set extensions. 35

2.3 Memory resource breakdown of the baseline implementation of the JH algorithm on the PIC24 architecture, and the changes due to instruction set extensions.. 36

2.4 Memory resource breakdown of the baseline implementation of the Keccak algorithm on the PIC24 architecture, and the changes due to instruction set extensions...... 37

2.5 Memory resource breakdown of the baseline implementation of the Skein algorithm on the PIC24 architecture, and the changes due to instruction set extensions. 38

2.6 Comparison of throughput numbers [cycles/byte] of published microcontroller implementations. Numbers in parentheses show performance normalized to BLAKE performance within the respective architecture...... 40

2.7 Overview of proposed instructions and the individual performance gains or data memory savings that they enable, grouped by ISE type. Gains of each ISE are given relative to the specific function they implement using the standard PIC24 ISA. The additional instruction memory savings due to increased code density are not reported here...... 50

2.8 Improvement of hashing speed (long message performance) and respective core area due to ISEs, for all SHA-3 candidates...... 58

2.9 Change in the number of memory accesses, both for read and write, during the processing of one message block due to the introduction of ISEs, for all SHA-3 candidates...... 59


2.10 Reduction of data and instruction memory requirements by using instruction set extensions, for all SHA-3 candidates...... 59

3.1 Instruction set of TamaRISC, consisting of 17 base instructions...... 67

3.2 Operand addressing modes of the TamaRISC ISA...... 68

3.3 Circuit properties of TamaRISC-CS for sub-VT modeling after [ARLO12]..... 84

3.4 Overview of low-power microprocessors and MCUs, comparing state-of-the-art commercial products, fabricated research-ICs, and contributions of this work. 93

4.1 Classification of variations, regarding their temporal rate of change and spatial reach (table cited with minor adaptations from [BDS+11, DBW15])...... 109

4.2 Overview of benchmark properties...... 128

4.3 Overview of timing error models and their features...... 129

5.1 Instruction delay worst-cases including the limiting stages where they occur, for a representative set of different OpenRISC instructions...... 157

5.2 Overview of supply voltage settings utilized for operation of the two voltage islands of DynOR, during measurements...... 173

5.3 General comparison with recent state-of-the-art embedded CPU implementations from the literature...... 177

List of Publications

Book Chapters

Jeremy Constantin, Ahmed Dogan, Oskar Andersson, Pascal Meinerzhagen, Joachim Rodrigues, David Atienza Alonso and Andreas Burg. An Ultra-Low-Power Application-Specific Processor with Sub-VT Memories for Compressed Sensing. VLSI-SoC: From Algorithms to Circuits and System-on-Chip Design, pp. 88-106, IFIP Advances in Information and Communication Technology 418, Springer, 2013.

Conference Papers

Jeremy Constantin, Andrea Bonetti, Adam Teman, Christoph Müller, Lorenz Schmid and Andreas Burg. DynOR: A 32-bit Microprocessor in 28 nm FD-SOI with Cycle-By-Cycle Dynamic Clock Adjustment. 42nd IEEE European Solid-State Circuits Conference (ESSCIRC), Lausanne, Switzerland, September 12-15, 2016.

Jeremy Constantin, Zheng Wang, Georgios Karakonstantis, Anupam Chattopadhyay and Andreas Burg. Statistical Fault Injection for Impact-Evaluation of Timing Errors on Application Performance. 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, Texas, USA, June 5-9, 2016.

Reza Ghanaatian, Paul Whatmough, Jeremy Constantin, Adam Teman and Andreas Burg. A Low-Power Correlator for Wakeup Receivers with Algorithm Pruning through Early Termination. 2016 IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, Canada, May 22-25, 2016. ‡

Davide Rossi, Antonio Pullini, Igor Loi, Michael Gautschi, Frank Gürkaynak, Adam Teman, Jeremy Constantin, Andreas Burg, Ivan Miro-Panades, Edith Beignè, Fabien Clermidy, Fady Abouzeid, Philippe Flatresse and Luca Benini. 193 MOPS/mW @ 162 MOPS, 0.32V to 1.15V Voltage Range Multi-Core Accelerator for Energy Efficient Parallel and Sequential Digital Processing. 2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX), Yokohama, Japan, April 20-22, 2016.


Jeremy Constantin, Lai Wang, Georgios Karakonstantis, Anupam Chattopadhyay and Andreas Burg. Exploiting Dynamic Timing Margins in Microprocessors for Frequency-Over-Scaling with Instruction-Based Clock Adjustment. 2015 IEEE/ACM Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, March 9-13, 2015.

Ruben Braojos Lopez, Ivan Beretta, Jeremy Constantin, Andreas Burg and David Atienza Alonso. A Wireless Body Sensor Network for Activity Monitoring with Low Transmission Overhead. 12th IEEE International Conference on Embedded and Ubiquitous Computing (EUC), Milan, Italy, August 26-28, 2014. ‡

Ahmed Dogan, Ruben Braojos Lopez, Jeremy Constantin, Giovanni Ansaloni, Andreas Burg and David Atienza Alonso. Synchronizing Code Execution on Ultra-Low-Power Embedded Multi-Channel Signal Analysis Platforms. 2013 IEEE/ACM Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, March 18-22, 2013.

Nino Walenta, Andreas Burg, Jeremy Constantin, Nicolas Gisin, Olivier Guinnard, Raphael Houlmann, Charles Ci Wen Lim, Tommaso Lunghi and Hugo Zbinden. 1 Mbps Coherent One-Way QKD with Dense Wavelength Division Multiplexing and Hardware Key Distillation. 2nd Annual Conference on Quantum Cryptography (QCRYPT), Singapore, September 10-14, 2012. ‡

Jeremy Constantin, Ahmed Dogan, Oskar Andersson, Pascal Meinerzhagen, Joachim Rodrigues, David Atienza Alonso and Andreas Burg. TamaRISC-CS: An Ultra-Low-Power Application-Specific Processor for Compressed Sensing [best paper nominee]. 20th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Santa Cruz, California, USA, October 7-10, 2012.

Jeremy Constantin, Andreas Burg and Frank Gürkaynak. Instruction Set Extensions for Cryptographic Hash Functions on a Microcontroller Architecture. 23rd IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Delft, The Netherlands, July 9-11, 2012.

Ahmed Dogan, Jeremy Constantin, Martino Ruggiero, Andreas Burg and David Atienza Alonso. Multi-Core Architecture Design for Ultra-Low-Power Wearable Health Monitoring Systems. 2012 IEEE/ACM Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, March 12-16, 2012.

Journal Articles

Davide Rossi, Antonio Pullini, Igor Loi, Michael Gautschi, Frank Gürkaynak, Adam Teman, Jeremy Constantin, Andreas Burg, Ivan Miro-Panades, Edith Beigné, Fabien Clermidy, Fady Abouzeid, Philippe Flatresse and Luca Benini. Energy Efficient Near-Sensor Computing: The PULPv2 Cluster [submitted; invited article]. IEEE Micro, to appear.


Jeremy Constantin, Raphael Houlmann, Nicholas Preyss, Nino Walenta, Hugo Zbinden, Pascal Junod and Andreas Burg. An FPGA-Based 4 Mbps Secret Key Distillation Engine for Quantum Key Distribution Systems. Springer Journal of Signal Processing Systems (JSPS), November 18 (Online First), 2015. ‡

Nino Walenta, Andreas Burg, Dario Caselunghe, Jeremy Constantin, Nicolas Gisin, Olivier Guinnard, Raphael Houlmann, Pascal Junod, Boris Korzh, Natalia Kulesza, Matthieu Legré, Charles Ci Wen Lim, Tommaso Lunghi, Laurent Monat, Christopher Portmann, Mathilde Soucarros, Patrick Trinkler, Gregory Trolliet, Fabien Vannel and Hugo Zbinden. A Fast and Versatile Quantum Key Distribution System with Hardware Key Distillation and Wavelength Multiplexing. New Journal of Physics, vol. 16, January, 2014. ‡

Ahmed Dogan, Jeremy Constantin, David Atienza Alonso, Andreas Burg and Luca Benini. Low-Power Processor Architecture Exploration for Online Biomedical Signal Analysis. IET Circuits, Devices & Systems, vol. 6, num. 5, pp. 279-286, September, 2012.

Technical Reports (E-Print)

Jeremy Constantin, Andreas Burg and Frank Gürkaynak. Investigating the Potential of Custom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture. Cryptology ePrint Archive, February, 2012.

Patent Applications

Paul Nicholas Whatmough and Jeremy Constantin (assignee: ARM Limited). Correlation determination early termination. WO2016012746A1, official filing date: June 3, 2015. ‡

Ahmed Yasir Dogan, Jeremy Constantin, Andreas Burg and David Atienza Alonso (assignee: EPFL). Ultra-Low Power Multicore Architecture For Parallel Biomedical Signal Processing. WO2013136259A3, official filing date: March 12, 2013.

PhD-Forums & Workshops

Jeremy Constantin, Andrea Bonetti, Adam Teman, Christoph Müller and Andreas Burg. DynOR: A 6-stage OpenRISC Microprocessor with Dynamic Clock Adjustment in 28nm FD-SOI CMOS [poster]. 3rd Workshop on Energy Efficient Electronics and Applications (WEEE), Turku, Finland, September 10-12, 2015.

Andrea Bonetti, Jeremy Constantin, Adam Teman, and Andreas Burg. Circuits and Techniques for Dynamic Timing Monitoring in Microprocessors [poster]. Nano-Tera Annual Meeting, Bern, Switzerland, May 5, 2015. ‡

Jeremy Constantin and Andreas Burg. Application-Specific Processor Design for Low-Complexity & Low-Power Embedded Systems [poster]. Winter School on Design Technologies for Heterogeneous Embedded Systems (FETCH), Leysin, Switzerland, January 7-9, 2013.


‡ The contents of these publications do not form a part of this thesis.

Curriculum Vitae

Name: Jeremy Constantin

Address: EPFL, STI-IEL-TCL, Station 11, CH-1015 Lausanne, Switzerland

E-Mail: jeremy.constantin@epfl.ch

Education

05/2012 – 10/2016 Doctoral Program (PhD) in Electrical Engineering at the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

09/2008 – 01/2011 Master of Science in Electrical Engineering and Information Technology (specialization: Microelectronics) at the Swiss Federal Institute of Technology Zürich (ETH), Switzerland

09/2004 – 10/2007 Bachelor of Science in Computational Engineering at the Friedrich-Alexander University Erlangen-Nürnberg (FAU), Germany

Professional Experience

04/2011 – 10/2016 Scientific/Doctoral Assistant at Telecommunications Circuits Laboratory (TCL) at the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
full-time position

09/2013 – 11/2013 Summer Placement at ARM in Cambridge, UK
full-time position in the R&D division of ARM headquarters
employment in two groups: IoT, low-power circuits


10/2008 – 01/2010 Working Student at ST-Ericsson in Zürich, Switzerland
part-time position, group: mixed signal

04/2008 – 08/2008 Post Graduate Internship at NXP Semiconductors / ST-NXP Wireless in Nürnberg, Germany
full-time position, group: hardware / validation

11/2007 – 03/2008 Internship at NXP Semiconductors in Nürnberg, Germany
full-time position, group: hardware / validation
