UNIVERSITY OF CALIFORNIA, SAN DIEGO

Holistic Design for Multi-core Architectures

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

in

Computer Science (Computer Engineering)

by

Rakesh Kumar

Committee in charge:

Professor Dean Tullsen, Chair
Professor Brad Calder
Professor Fred Chong
Professor Rajesh Gupta
Dr. Norman P. Jouppi
Professor Andrew Kahng

2006

Copyright
Rakesh Kumar, 2006
All rights reserved.

The dissertation of Rakesh Kumar is approved, and it is acceptable in quality and form for publication on microfilm:

Chair

University of California, San Diego

2006

DEDICATIONS

This dissertation is dedicated to friends, family, labmates, and mentors – the ones who taught me, indulged me, loved me, challenged me, and laughed with me, while I was also busy working on my thesis.

To Professor Dean Tullsen, for teaching me the values of humility, kindness, and caring while trying to teach me football and computer architecture. For always encouraging me to do the right thing. For always letting me be myself. For always believing in me. For always challenging me to dream big. For all his wisdom. And for being an adviser in the truest sense, and more.

To Professor Brad Calder. For always caring about me. For being an inspiration. For his trust. For making me believe in myself. For his lies about me getting better at system administration and foosball even though I never did.

To Dr. Partha Ranganathan. For always being there for me when I would get down on myself. And that happened often. For the long discussions on life, work, and happiness. For always being willing to listen. For all his infectious optimism, enthusiasm, and energy.

To Dr. Norman Jouppi. For his wisdom. For his humility. For his boundless energy and desire to teach. For always wanting me to get better. Most importantly, for getting my jokes and laughing even at the bad ones.

To Drs. Andrew Kahng and Rajesh Gupta for the always constructive feedback and discussions, and for holding high standards for excellence. To Dr. Fred Chong for his invaluable comments. To Drs. Ravi Nair, Victor Zyuban, and Keith Farkas for being excellent mentors.

To my labmates for the memories and the friendships. To Jeremy for always listening to my rants, and for the countless hours spent discussing with me the “what the heck are we doing” question. To Jeff for teaching me much of what I know about computer architecture and America. To Satish for his sense

of balance and his ever-willingness to be my sounding board. To Erez for always helping me with my perspective on life. To John, Eric, and Jamison for being my mentors in so many ways. To Wei for his enthusiasm. To Mike and Weifeng for their sincerity. To Jack for his sense of humor. To Matt for tennis and the beautiful Easter party. To Shubhro for being a great roommate, labmate, and friend. To Ganesh for his infectious laughter. And to so many other past and present members of the lab for shaping it into what it is.

To my friends who helped me keep my sanity outside work. To Puneet, Satya, Sagnik, Sharma, Swamy, Sameer, and so many others. For the many hours of fun. For sharing sorrow and laughter. For their invaluable friendships. For always letting me talk them into going out for lunch, dinner, or a movie, even when they wanted to cook at home or had work to do.

Finally, to my family. For their love, support, encouragement, guidance, and checks. For their character. For my character. For laughing with me. For crying with me. For crying for me. For bliss. For happiness. They are the ones responsible for everything that I am and everything that I want to be. And to them I dedicate this dissertation.

“Karmanye vadhikarasthe, Ma phaleshu kadachana, Karma Phala heturbhur ma te sangastva karmani”

(You have a right to perform your duty, but not to the fruits of action; never let the fruits be your motive, nor let your attachment be to inaction. – Bhagavad Gita 2.47)

TABLE OF CONTENTS

Signature Page ...... iii

Dedication Page ...... iv

Epigraph ...... vi

Table of Contents ...... vii

List of Figures ...... xi

List of Tables ...... xiv

Acknowledgments ...... xv

Vita and Publications ...... xvii

Abstract ...... xx

I Introduction ...... 1
   A. Design Methodology for Multi-cores ...... 2
      1. Holistic Design ...... 3
   B. Overview of Dissertation ...... 7

II Background ...... 9
   A. Why Multi-cores ...... 9
   B. Chronicling Multi-core Efforts ...... 11

III Holistic Design for Adaptability: Single-ISA Heterogeneous Multi-core Architectures ...... 17
   A. Inefficiency due to Workload Diversity ...... 17
   B. Single-ISA Heterogeneous Multi-core Architectures ...... 20
   C. Evaluation Methodology ...... 24
   D. Scheduling for Throughput: Analysis and Results ...... 30
   E. Acknowledgment ...... 44

IV Holistic Design for Adaptability: Power Advantages of Heterogeneity ...... 45
   A. Discussion of Core Switching ...... 46
   B. Choice of cores ...... 47
   C. Switching applications between cores ...... 49
   D. Evaluation Methodology ...... 50
      1. Modeling of CPU Cores ...... 51

      2. Modeling Power ...... 53
      3. Estimating Chip Area ...... 57
      4. Modeling Performance ...... 57
   E. Scheduling for Power: Analysis and Results ...... 59
      1. Variation in Core Performance and Power ...... 59
      2. Oracle Heuristics for Dynamic Core Selection ...... 62
      3. Static Core Selection ...... 64
      4. Realistic Dynamic Switching Heuristics ...... 66
      5. Practical heterogeneous architectures ...... 69
   F. Summary ...... 70
   G. Acknowledgment ...... 70

V Holistic Design for Adaptability: Designing Heterogeneous Multi-cores From the Ground Up ...... 71
   A. Overview of Related Proposals ...... 71
   B. Benefits of Ground-up Design ...... 75
   C. From Workloads to Multi-core Design ...... 77
   D. Customizing Cores to Workloads ...... 79
   E. Methodology ...... 81
      1. Modeling of CPU Cores ...... 81
      2. Modeling Power and Area ...... 82
      3. Modeling Performance ...... 84
   F. Analysis and Results ...... 88
      1. Fixed Area Budget ...... 88
      2. Fixed Power Budget ...... 95
      3. Impact of Non-monotonic Design ...... 97
      4. Varying Thread-Level Parallelism ...... 97
      5. Dynamic Switching ...... 98
      6. Efficient Search Techniques ...... 99
   G. Validating Results ...... 101
   H. Acknowledgment ...... 102

VI Obviating Overprovisioning: Conjoined-core Multiprocessing Architectures ...... 107
   A. Related Work ...... 109
   B. Baseline Architecture ...... 110
      1. Baseline processor model ...... 111
      2. Die floorplan and area model ...... 112
   C. Conjoined-core Architectures ...... 114
      1. ICache sharing ...... 116
      2. DCache sharing ...... 117

      3. Crossbar sharing ...... 117
      4. FPU sharing ...... 119
      5. Summary of sharing ...... 120
   D. Experimental Methodology ...... 120
   E. Simple Sharing ...... 122
      1. Sharing the ICache ...... 123
      2. DCache sharing ...... 125
      3. FPU sharing ...... 126
      4. Crossbar sharing ...... 126
      5. Simple sharing summary ...... 127
   F. Intelligent Sharing of Resources ...... 128
      1. ICache sharing ...... 128
      2. DCache sharing ...... 130
      3. Symbiotic assignment of threads ...... 132
   G. A Unified Conjoined-Core Architecture ...... 132
   H. Acknowledgment ...... 134

VII The Interconnect Problem and the Need for Co-design ...... 140
   A. Related Work ...... 141
   B. Interconnection Mechanisms ...... 142
      1. Shared Bus Fabric ...... 142
      2. P2P Links ...... 147
      3. Crossbar Interconnection System ...... 147
   C. Modeling Area, Power, and Latency ...... 149
      1. Wiring Area Overhead ...... 149
      2. Logic Area Overhead ...... 150
      3. Power ...... 151
      4. Latency ...... 152
   D. Modeling Multi-core Architectures ...... 152
      1. Workload ...... 154
   E. Shared Bus Fabric: Overheads and Design Issues ...... 155
      1. Area ...... 156
      2. Power ...... 158
      3. Performance ...... 158
   F. Shared Caches and the Crossbar ...... 161
      1. Area and power overhead ...... 161
      2. Performance ...... 163
   G. Scaling with Technological Parameters ...... 165
   H. An Example Holistic Approach to Interconnection ...... 166
   I. Acknowledgment ...... 167

VIII Summary and Future Work ...... 174
   A. Holistic Design for Adaptability ...... 175
   B. Obviating Overprovisioning in Multi-cores ...... 178
   C. Interconnection-aware Co-design ...... 179
   D. Future Work ...... 180

Bibliography ...... 183

LIST OF FIGURES

III.1 Performance of applu over time on Alpha cores ...... 19

III.2 Exploring the potential of heterogeneity: Comparing the throughput of six-core homogeneous and heterogeneous architectures for different area budgets ...... 23

III.3 Benefits from heterogeneity - static scheduling for inter-thread diversity ...... 31

III.4 Three strategies for evaluating the performance an application will realize on a different core ...... 35

III.5 Sensitivity to sampling frequency for time-based trigger mechanisms using the sample-avg core-sampling strategy ...... 38

III.6 Comparison of event-based triggers using the sample-avg core-sampling strategy ...... 40

III.7 Limiting response-time for various loads on comparable budget homogeneous and heterogeneous architectures ...... 43

IV.1 Relative sizes of the Alpha cores when implemented in 0.10 micron technology ...... 48

IV.2 (a) Performance of applu on the four cores (b) Oracle switching for energy (c) Oracle switching for energy-delay product. . . . . 60

IV.3 applu energy efficiency. IPS^2/W varies inversely with energy-delay product ...... 61

IV.4 Results for realistic switching heuristics for heterogeneous multi-cores - the last one is a constraint-less dynamic oracle ...... 68

V.1 Area and Power of the cores ...... 85

V.2 Throughput for all-same (top) and all-different (bottom) workloads, area budget=40mm2 ...... 89

V.3 Throughput for all-different workloads for an area budget of (left to right) 20mm2, 30mm2, 50mm2, and 60mm2...... 91

V.4 Throughput for all-same (left) and all-different (right) workloads, power budget=30W ...... 96

V.5 Throughput for all-different workloads for a power budget of (left to right) 20W, 40W, 50W, and 60W ...... 103

V.6 Benefits due to non-monotonicity of cores; area budget=40mm2, power budget=30W ...... 104

V.7 Throughput for all-same and all-different workloads for different TLPs, area budget=40mm2, power budget=30W ...... 105

V.8 Benefits due to dynamic switching; area budget=40mm2, power budget=30W ...... 106

V.9 Comparing the results using assumed methodology against full simulation results: area budget = 40mm2, power budget = 30W 106

VI.1 Baseline die floorplan for studying conjoining, with L2 cache banks in the middle of the cluster, and processor cores (including L1 caches) distributed around the outside ...... 113

VI.2 (a) Floorplan of the original core (b) Layout of a conjoined-core pair, both showing FPU routing. Routing and register files are schematic and not drawn to scale ...... 135

VI.3 A die floorplan with crossbar sharing ...... 136

VI.4 Impact of ICache sharing for various threading levels ...... 136

VI.5 ICache sharing when no extra latency overhead is assumed, cache structure bandwidth is not doubled, and cache is doubly banked ...... 137

VI.6 Impact of Dcache sharing for various threading levels ...... 137

VI.7 DCache sharing when no extra latency overhead is assumed . . 137

VI.8 Impact of FPU sharing for various threading levels ...... 138

VI.9 Impact of private FP divide sub-units ...... 138

VI.10 Reducing crossbar area through width reduction and port sharing ...... 138

VI.11 ICache assertive access results when the original structure bandwidth is not doubled ...... 139

VI.12 Fetch-combining results ...... 139

VI.13 Effect of assertive access and static assignment ...... 139

VII.1 The assumed shared bus fabric for our interconnection study . 143

VII.2 A typical crossbar ...... 148

VII.3 Floorplans for 4, 8 and 16 core processors ...... 168

VII.4 Area overhead for shared bus fabric...... 169

VII.5 Power overhead for shared bus fabric ...... 169

VII.6 Performance overhead due to shared bus fabric...... 170

VII.7 Trading off interconnection bandwidth with area...... 170

VII.8 Area overhead for cache sharing – results for crossbar routed over L2 assume uniform cache density...... 170

VII.9 Power overhead for cache sharing (the three bars, left to right, correspond to 2-way, 4-way and full sharing)...... 171

VII.10 Evaluating cache sharing for a fixed cache size for different crossbar implementations – no area overhead is assumed ...... 171

VII.11 Evaluating cache sharing for a fixed die area – area overhead taken into account ...... 172

VII.12 Scaling of interconnection overhead with pipelining and technology ...... 172

VII.13 Hierarchical approach (splitting SBFs) ...... 173

VII.14 Split vs Monolithic SBF ...... 173

LIST OF TABLES

III.1 Configuration and area of the EV4 and EV6 cores...... 26

III.2 Benchmarks simulated for evaluating heterogeneous multi-cores for throughput ...... 28

IV.1 Configuration of the cores used for power evaluation of heterogeneous multi-cores ...... 52

IV.2 Power and area statistics of the Alpha cores ...... 56

IV.3 Benchmarks simulated for power evaluation of heterogeneous multi-cores ...... 58

IV.4 Summary for dynamic oracle switching for energy on heterogeneous multi-cores ...... 65

IV.5 Summary for dynamic oracle switching for energy-delay on heterogeneous multi-cores ...... 65

IV.6 Oracle heuristic for static core selection on heterogeneous multi-cores – energy metric. Rightmost two columns are for dynamic selection ...... 66

V.1 Various Parameters and their possible values for configuration of the cores ...... 82

V.2 Area and power estimation methodology and relevant assumptions for various hardware structures. Renaming for OOO cores is assumed to be done using RAM tables. IW refers to issue-width, WP to a write-port, and RP to a read-port ...... 83

V.3 Derived Area and Power Estimates for Processor Components . 84

V.4 Benchmarks used for design space exploration of heterogeneous multi-cores ...... 86

VI.1 Simulated Baseline Processor for studying Conjoining . . . . . 111

VI.2 Benchmarks simulated for evaluating conjoining ...... 121

VI.3 Results with multiple sharings...... 133

VII.1 Design parameters for wires in different metal planes ...... 149

VII.2 Interconnection-related Logic overhead ...... 158

ACKNOWLEDGEMENTS

The text of Chapter III is in part a reprint of the material as it appears in the proceedings of the Thirty-first International Symposium on Computer Architecture (pp. 64-75, June 2004). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed the research which forms the basis for Chapter III.

The text of Chapter IV is in part a reprint of the material as it appears in the proceedings of the Thirty-sixth International Symposium on Microarchitecture (pp. 81-92, December 2003). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed the research which forms the basis for Chapter IV.

The text of Chapter V is in part a reprint of the material as it appears in the proceedings of the Fifteenth International Conference on Parallel Architectures and Compilation Techniques (September 2006). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed the research which forms the basis for Chapter V.

The text of Chapter VI is in part a reprint of the material as it appears in the proceedings of the Thirty-seventh International Symposium on Microarchitecture (pp. 195-206, December 2004). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed the research which forms the basis for Chapter VI.

The text of Chapter VII is in part a reprint of the material as it appears in the proceedings of the Thirty-second International Symposium on Computer Architecture (pp. 408-419, June 2005). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed, supervised, and assisted the research which forms the basis for Chapter VII.

VITA

1981    Born, Pusa, Bihar (INDIA)
1997    High School Certificate, Pusa, Bihar (INDIA)
2001    B.Tech. in Computer Science & Engineering, Indian Institute of Technology (IIT), Kharagpur
2002    Internship, Hewlett-Packard Labs, WRL Group, Palo Alto, California
2004    Internship, IBM T. J. Watson Research Center, Yorktown Heights, New York
2006    Doctor of Philosophy, University of California, San Diego

PUBLICATIONS

David Sheldon, Rakesh Kumar, Frank Vahid, Dean Tullsen, and Roman Lysecky. “Conjoining Soft-Core FPGA Processors”. In International Conference on Computer-Aided Design (ICCAD), November 2006.

David Sheldon, Rakesh Kumar, Roman Lysecky, Frank Vahid, and Dean Tullsen. “Application-Specific Customization of Parameterized FPGA Soft-Core Processors”. In International Conference on Computer-Aided Design (ICCAD), November 2006.

Rakesh Kumar, Dean M. Tullsen, and Norman P. Jouppi. “Core Architecture Optimization for Heterogeneous Chip Multiprocessors”. In 15th International Symposium on Parallel Architecture and Compilation Techniques (PACT), September 2006.

Matt Devuyst, Rakesh Kumar, and Dean Tullsen. “Scheduling for Energy and Performance for a CMP of SMT Processors”. In International Parallel & Distributed Processing Symposium (IPDPS), April 2006.

Rakesh Kumar, Dean Tullsen, Norman Jouppi, and Parthasarathy Ranganathan. “Heterogeneous Chip Multiprocessors”. IEEE Computer, November 2005.

Rakesh Kumar, Victor Zyuban, and Dean M. Tullsen. “Interconnections in Multi-core Architectures: Understanding Mechanisms, Overheads and Scaling”. In the 32nd International Symposium on Computer Architecture, ISCA-32, June 2005.

Yiannakis Sazeides, Rakesh Kumar, Dean Tullsen, and Theophanis Konstantinou. “The Danger of Interval-Based Power-Efficiency Metrics: When Worst is Best”. Computer Architecture Letters, Volume 4, January 2005.

Rakesh Kumar, Norman Jouppi, and Dean Tullsen. “Conjoined-core Chip Multiprocessing”. In the 37th International Symposium on Microarchitecture, MICRO-37, December 2004.

Eric Tune, Rakesh Kumar, Dean Tullsen, and Brad Calder. “Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy”. In the 37th International Symposium on Microarchitecture, MICRO-37, December 2004.

Rakesh Kumar, Dean Tullsen, Parthasarathy Ranganathan, Norman Jouppi, and Keith Farkas. “Single-ISA Heterogeneous Multi-core Architectures for Multithreaded Workload Performance”. In the 31st International Symposium on Computer Architecture, ISCA-31, June 2004.

Rakesh Kumar, Keith Farkas, Norman Jouppi, Parthasarathy Ranganathan, and Dean Tullsen. “Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction”. In the 36th International Symposium on Microarchitecture, MICRO-36, December 2003.

Rakesh Kumar, Keith Farkas, Norman Jouppi, Parthasarathy Ranganathan, and Dean Tullsen. “A Multi-Core Approach to Addressing the Energy-Complexity Problem in Microprocessors”. In the Workshop on Complexity-Effective Design (WCED), June 2003.

Rakesh Kumar, Keith Farkas, Norman Jouppi, Parthasarathy Ranganathan, and Dean Tullsen. “Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures”. Computer Architecture Letters, Volume 2, April 2003.

Rakesh Kumar and Dean Tullsen. “Compiling for Instruction Cache Performance on a Multithreaded Architecture”. In the 35th International Symposium on Microarchitecture, MICRO-35, November 2002.

V. Ramakrishna, Rakesh Kumar, and Anupam Basu. “Switching Activity Minimization by Efficient Instruction Set Architecture Design”. In the 45th IEEE International Midwest Symposium on Circuits and Systems, MWSCAS 2002, August 2002.

Rakesh Kumar, Tushar Kanti Patra, and Anupam Basu. “Software Energy Optimization for Preemptive Realtime Systems”. In the 4th International Symposium on High-Performance Computing, ISHPC-IV, May 2002.

Rakesh Kumar and Sudeshna Sarkar. “Three-staged Refinement Model for Information Retrieval with Application to Newspaper Articles and Online Documents”. In International Symposium on Artificial Intelligence, ISAI-2001, December 2001.

Tusharkanti Patra, Rakesh Kumar, and Anupam Basu. “Cache Optimization for Minimizing Software Energy in Embedded Systems”. In International Conference on Communications, Computers and Devices, ICCCD, December 2000.

Rakesh Kumar and D. Dutta Majumdar. “A Multi-processing Database Model for Efficient Storage and Retrieval of Medical Images”. Journal of Computer Science & Informatics, Volume 30, No. 3, pages 31-38.

Rakesh Kumar and D. Dutta Majumdar. “A Multi-staged Database Model for Efficient Storage and Retrieval of Medical Images”. In All India Seminar on Information Technology organized by the Institution of Engineers (India), March 2000.

ABSTRACT OF THE DISSERTATION

Holistic Design for Multi-core Architectures

by

Rakesh Kumar

Doctor of Philosophy in Computer Science (Computer Engineering)

University of California, San Diego, 2006

Professor Dean Tullsen, Chair

Increasing design complexity and the diminishing marginal utility of monolithic processor designs have resulted in the integration of multiple loosely-coupled processing cores on the same die. However, fundamental questions remain about the right form, implementation, and methodology for multi-core designs. This thesis addresses these questions.

A popular methodology for designing a multi-core architecture is to replicate an off-the-shelf core design multiple times, and then connect the cores together using an interconnect mechanism. However, this methodology is “multi-core oblivious”, as the subsystems are designed and optimized unaware of the overall chip-multiprocessing system they would become parts of. This thesis demonstrates that this methodology is very inefficient in terms of area and power, and recommends a holistic approach where the subsystems are designed from the ground up as different components of a full system.

Inefficiency in “multi-core oblivious” multi-core designs comes at different levels. Having multiple replicated cores results in an inability to adapt to the demands of execution workloads, and results in either underutilization or overutilization of processor resources. This thesis proposes single-ISA (instruction-set architecture) heterogeneous multi-core architectures where the die hosts cores of

varying power/performance characteristics, but all capable of running the same ISA. Such a processor can result in significant power savings and performance improvements if the applications are mapped to cores judiciously. The thesis also presents holistic design methodologies for such architectures.

Another source of inefficiency is the blind replication of over-provisioned hardware structures. To that effect, the thesis proposes conjoined-core chip multiprocessing, where adjacent cores of a multi-core architecture share some resources. The thesis shows that this can result in significant area savings without much performance degradation. The thesis also proposes novel optimizations for minimizing the already small degradation.

Yet another source of inefficiency is the interconnection. This thesis shows that the interconnection overheads can be very significant for a “multi-core oblivious” multi-core design – especially as the number of cores increases and the pipelines get deeper. The thesis demonstrates the need to co-design the cores, the memory, and the interconnection to obviate the inefficiency problem, and also makes several suggestions regarding co-design.

I

Introduction

The processor industry has seen tremendous growth since its inception. The performance of processors has increased by over 5000 times since the time Intel introduced the first general-purpose microprocessor [8]. This increase in processor performance has been fueled by several technology shifts at various levels of the processor design flow – architecture, tools and techniques, circuits, processes, and materials. These technology shifts not only provide a quantum jump in performance, but also require us to rethink system architecture and revisit fundamental questions regarding the scope and applicability of computing.

Specifically, at the architectural level, we have moved from scalar processing, where a processor could execute one instruction every clock cycle, to superscalar processing, where a processor could execute multiple instructions every clock cycle, to out-of-order processing, where processors could now also execute instructions out of order, to on-chip multithreading (e.g., simultaneous multithreading), where a processor could support multiple streams of execution simultaneously. We are now at the cusp of another major technology shift at the architectural level. This technology shift is towards multi-core architectures – i.e., architectures with multiple processing nodes on the same die. Such processors, also called chip multiprocessors, can not only support multiple streams

of program execution at the same time, but also provide productivity advantages over monolithic processors due to the relative simplicity of the cores, and hence shorter design cycles. Such processors can also make better use of the hardware resources, as the marginal utility of transistors is higher for a smaller processing node with a smaller number of transistors.

While the potential technological advantages of such architectures are apparent, one of the major questions facing computer architects right now is – how should one design a multi-core architecture? That is, what should a multi-core architecture look like? The final form and implementation will depend heavily on the design methodology as well. Hence, another fundamental question that needs to be addressed is – what should be the methodology for doing multi-core design? This thesis seeks to address these questions.

I.A Design Methodology for Multi-cores

One suggested (as well as practiced) methodology for multi-core design is to take an off-the-shelf core design, optimize it for power and/or performance, replicate it multiple times, and then connect the cores together using an interconnection mechanism in a way that maximizes performance for a given area and/or power budget. This is a clean, relatively easy way to design a multi-core because one design team can work on the core, another can work on the caches, a third can work on the interconnection, and then there can be a team of chip integrators who put them all together to create a multi-core (aka chip multiprocessor). Such a methodology encourages modularity as well as reuse, and serves to keep the design costs manageable.

However, this methodology is “multi-core oblivious”. This is because each subsystem that constitutes the final chip multiprocessing system is designed and optimized without any cognizance of the overall system it would become a part of. For example, a methodology like the above forces each subsystem to target the entire universe of applications (i.e., the set of all possible applications that a processor is expected to run). This is an overly stringent constraint, as the real requirement is for the full system as a whole to target that universe, with the individual subsystems working cooperatively to that effect.

As this thesis shows, “multi-core oblivious” designs result in highly inefficient processors in terms of area and power. This is because the above constraint results in either overutilization or underutilization of processor resources. For example, Pentium Extreme is a dual-core Intel processor that is constructed by replicating two identical off-the-shelf cores. While the area and power cost of duplicating cores is 2X (in fact, even more considering the glue logic required), the performance benefits are significantly lower [93]. The thesis shows that while the costs are superlinear with the number of cores for all “multi-core oblivious” multi-core designs, the benefits tend to be highly sublinear. In fact, this sublinearity increases with increasing number of cores on the die, and it is becoming increasingly clear that we need to design multi-cores differently.

I.A.1 Holistic Design

This thesis proposes a holistic approach to designing multi-core architectures. A holistic multi-core design methodology involves the various processor subsystems being designed from the ground up as different components of a full system. This enables the various subsystems to work cooperatively in executing a certain task efficiently. Specifically, this thesis identifies three sources of inefficiency in “multi-core oblivious” multi-core design and proposes corresponding holistic approaches that can significantly obviate the inefficiency. The three sources of inefficiency are inadaptability to workloads, resource overprovisioning, and high interconnection overheads.

Workload Adaptability

There is diversity in the workloads that a typical processor is expected to run. This diversity can be due to diversity among applications or among different threads of the same application. It can also be due to diversity across varying program phases within an application or varying processor load. If a multi-core is constructed by replicating an off-the-shelf core design, each core might be able to target a certain class of applications well; however, the processor as a whole will not be able to adapt to application diversity. Alternatively, a multi-core can be constructed using “mediocre” cores [37], where a core is designed to perform adequately over the entire universe of applications. Such a processor, however, will be overprovisioned or underprovisioned for most individual applications. While overprovisioning leads to wasted power and real estate, underprovisioning leads to reduced performance.

This thesis recommends a holistic approach for adapting to workload diversity. Instead of constructing a chip multiprocessor through replication of one core design, the thesis advocates single-ISA heterogeneous multi-core architectures, that is, architectures with multiple types of cores on the same die. These cores can all execute the same ISA (instruction-set architecture), but represent different points in the power/performance continuum – for example, a high-performance, high-power core and a low-performance, low-power core on the same die. Applications can then be mapped to cores judiciously, such that each application or execution thread runs on the core whose resources match its current execution needs. This significantly enhances the efficiency of computation and can result in processors that have up to 63% higher throughput than equivalent-budget “multi-core oblivious” homogeneous processors. The flexibility to do resource matching also enables power reduction. The thesis shows that more than three-fold power savings are possible.
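
The kind of judicious application-to-core mapping described above can be illustrated with a small sketch. The core labels, the sampled IPC values, and the greedy policy below are illustrative assumptions only; the actual scheduling heuristics and their evaluation appear in Chapter III.

```python
# Illustrative greedy mapping of applications to heterogeneous cores.
# The IPC table is invented for illustration; in practice such numbers would be
# obtained by sampling each application (or phase) on each core type.

sampled_ipc = {
    # application: {core_type: IPC observed during a sampling phase}
    "mcf":   {"EV5": 0.25, "EV6": 0.30},   # memory-bound: barely helped by a big core
    "gcc":   {"EV5": 0.60, "EV6": 1.10},
    "applu": {"EV5": 0.80, "EV6": 1.90},   # high-ILP phase: the big core pays off
}
available_cores = ["EV6", "EV5", "EV5"]     # one complex core, two simple cores

# Greedily give the biggest remaining core to the application that gains the
# most from it (largest EV6/EV5 speedup), approximating "resource matching".
apps_by_benefit = sorted(
    sampled_ipc,
    key=lambda a: sampled_ipc[a]["EV6"] / sampled_ipc[a]["EV5"],
    reverse=True,
)
assignment = dict(zip(apps_by_benefit, available_cores))
print(assignment)   # e.g. {'applu': 'EV6', 'gcc': 'EV5', 'mcf': 'EV5'}
```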

Resource Overprovisioning

Resource overprovisioning causes inefficiency because even unused transistors consume power and occupy real estate. Designers usually provision the CPU for a few important applications that each stress a particular resource. Similarly, hardware often gets added to provide a feature even if not many applications use it. This results in overprovisioning for most applications. When a multi-core is constructed by blindly replicating such overprovisioned cores, the cost of overprovisioning is exacerbated, as it gets multiplied by the number of compute cores. What is really needed is to maintain the same level of provisioning for any single thread without scaling the costs by the number of cores.

This thesis proposes conjoined-core chip multiprocessing – a holistic approach to obviating the overprovisioning problem by allowing topologically feasible sharing of overprovisioned structures between adjacent cores of a chip multiprocessor. The shared structures need to be accessible by both cores in a way that does not require additional queues or pipeline stalls. The thesis investigates several sharing policies consistent with the constraints of modern high-frequency core designs. Sharing results are presented for instruction and data caches, the floating-point unit, and the input ports of the crossbar connecting the cores to the shared L2 cache. This thesis shows that intelligent policies to schedule access to shared structures can limit the performance degradation of conjoining to 10-12% while saving roughly half the area.
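
To make the idea of stall-free sharing concrete, the toy model below time-multiplexes one floating-point unit between two conjoined cores: each core owns alternate cycles and an idle owner's slot may be used by its neighbor. The demand probabilities and the specific policy are assumptions for illustration; the actual sharing policies and their measured impact are the subject of Chapter VI.

```python
# Toy cycle-level model of two conjoined cores sharing one FPU.
# All parameters below are illustrative assumptions, not Chapter VI results.
import random

random.seed(1)
N_CYCLES = 10_000
FP_DEMAND = [0.30, 0.30]          # assumed probability a core wants the FPU each cycle

served = [0, 0]
delayed = [0, 0]
for cycle in range(N_CYCLES):
    wants = [random.random() < FP_DEMAND[c] for c in (0, 1)]
    owner = cycle % 2             # core 0 owns even cycles, core 1 owns odd cycles
    other = 1 - owner
    if wants[owner]:
        served[owner] += 1
        if wants[other]:
            delayed[other] += 1   # the neighbor is delayed this cycle
    elif wants[other]:
        served[other] += 1        # the unused slot is taken by the neighbor

for c in (0, 1):
    total = max(served[c] + delayed[c], 1)
    print(f"core {c}: {100 * delayed[c] / total:.1f}% of FP requests delayed")
```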

Interconnection Overheads

While inefficiency due to inadaptability and overprovisioning afflicts even monolithic processors, interconnection overheads present a design problem unique to multi-cores. This thesis examines the area, power, performance, and design issues for the interconnects on a chip multiprocessor. It attempts to present a comprehensive view of a class of interconnect architectures. It shows that the design choices for the interconnect have a significant effect on the rest of the chip, potentially consuming a big fraction of the real estate and power budget. Since these budgets are shared with the cores and the caches, the number, the size, and the design of the cores get affected any time the interconnection is made more aggressive. Conversely, any time the number, the size, or the design of the cores is scaled up, conflicting demands are placed on the interconnect – requiring higher bandwidth, but providing even less real estate.

This thesis shows that designs that treat the interconnect as an entity that can be independently architected and optimized (the “multi-core oblivious” designs) would not arrive at the best multi-core design. Several examples are presented showing the need for careful, holistic co-design of multi-core processors. For instance, increasing interconnect bandwidth requires area that then constrains the number of cores or the cache sizes, and does not necessarily increase performance. Also, shared level-2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for.

As an example of holistic design, a hierarchical bus structure is examined which negates some of the performance costs of the assumed baseline architectures. Hierarchical interconnection architectures recognize the diversity in applications running on a multi-core and provide shorter coherence paths for interactions between physically adjacent cores than for cores that are located far from each other. Such interconnection architectures can provide lower average-case latency for coherence transactions at the expense of worst-case latency.
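
A back-of-the-envelope calculation illustrates why interconnect bandwidth trades off directly against cores and caches. Every number below (wire pitch, bus width, bus length, die area) is a placeholder assumption for illustration; the real area, power, and latency models are developed in Chapter VII.

```python
# Rough wiring-area estimate for a wide shared bus spanning a multi-core die.
# All parameters are assumed placeholders, not values from Chapter VII.

wire_pitch_um = 0.8      # assumed pitch of a routed global wire, in microns
bus_bits      = 512      # assumed data + address + control lines in the fabric
bus_length_mm = 12.0     # assumed distance the bus runs across the die
die_area_mm2  = 250.0    # assumed total die area

bus_width_mm    = bus_bits * wire_pitch_um / 1000.0   # total routed width
wiring_area_mm2 = bus_width_mm * bus_length_mm
print(f"bus occupies ~{wiring_area_mm2:.1f} mm^2 "
      f"({100 * wiring_area_mm2 / die_area_mm2:.1f}% of the die)")
# Doubling bus_bits doubles this area, and that area must come out of the
# core/cache budget: the co-design tension described above.
```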

I.B Overview of Dissertation

Chapter II gives background information on the architecture and design of multi-core processors. It details the reasons for the advent of multi-core architectures and discusses some prominent multi-core efforts. It also identifies the research issues that need to be addressed for a more widespread adoption of multi-core technology.

Chapter III discusses how “multi-core oblivious” multi-core designs cannot adapt to the diversity in workloads. It presents single-ISA heterogeneous multi-core architectures as a holistic solution for adapting to diversity. These architectures can provide significantly higher throughput for a given area or power budget. Chapter IV shows how they can also be used to reduce processor power. Chapter V discusses methodologies for holistic, ground-up design of multi-core architectures and demonstrates their benefits over processors designed using off-the-shelf components.

Chapter VI introduces overprovisioning as another source of significant inefficiency in multi-core architectures that are designed by blindly replicating cores. Such architectures unnecessarily multiply the cost of overprovisioning by the number of compute nodes. The chapter introduces a holistic approach to addressing overprovisioning through conjoined-core chip multiprocessing. Conjoined-core multi-cores have adjacent cores sharing large, overprovisioned structures. Intelligently scheduling accesses to the shared resources enables conjoined-core multi-cores to achieve significantly higher efficiency (throughput/area) than their “multi-core oblivious” multi-core counterparts.

Chapter VII details the overheads that conventional interconnection mechanisms entail, especially as the number of cores increases and as transistors get faster. The chapter shows that these overheads soon become unmanageable and require a holistic approach to designing multi-cores where the interconnect is co-designed with the cores and the caches. Several examples are presented showing the need for co-design.

II

Background

This chapter provides background information on topics related to this thesis. Section II.A explains why we are seeing the advent of multi-core architectures. Section II.B provides an overview of some groundbreaking multi-core efforts.

II.A Why Multi-cores

The processor industry has made giant strides in terms of speed and performance. The first microprocessor, the Intel 4004 [8], ran at 740 KHz, while the microprocessors of today easily run in the GHz range due to significantly smaller and faster transistors. The increase in performance has been historically consistent with Moore's law, which states that the number of transistors on the processor die keeps doubling every eighteen months due to transistors getting smaller with every successive process technology.

However, the price that one pays for performance has been going up rapidly as well. For example, as Horowitz et al. [65] show, the power cost for squeezing a given amount of performance has been going up linearly with the performance of the processor. This is a super-exponential increase over time. Similarly, the area cost for squeezing a given amount of performance has been

going up as well. In fact, one thing that the progress in processor architecture and technology has taught us is that the marginal utility of transistors is decreasing. While area and power are roughly linear with the number of transistors, performance is highly sublinear with the number of transistors. Empirically, it has been close to the square root of the number of transistors [63, 62]. The main reason why we are on the wrong side of the square law is that we have already extracted the easy ILP (instruction-level parallelism) through techniques like superscalar processing, out-of-order processing, etc. The ILP that is left is difficult to extract. However, technology keeps making transistors available to us at the rate predicted by Moore's Law [101] (though it has slowed down, of late). We have reached a point where we have more transistors available than we know how to make effective use of in a conventional monolithic processor environment.

This quandary gives rise to multi-core computing. Instead of using all the transistors to construct a monolithic processor targeting high single-thread performance, we can use the transistors to construct multiple simpler cores where each core can execute a program (or a thread of execution). Once we do that, we can jump to the right side of the square law (see below). Such cores can collectively provide higher many-thread performance (or throughput) than the baseline monolithic processor at the expense of single-thread performance.

Consider, for example, the Alpha 21164 and Alpha 21264 cores. Alpha 21164 (also called, and henceforth referred to as, EV5) is an in-order processor that was originally implemented in 0.5 micron technology [18]. Alpha 21264 (also called, and henceforth referred to as, EV6) is an out-of-order processor that was originally implemented in 0.35 micron technology [19]. If we assume both processors to be mapped to the same 0.10 micron technology, an EV6 core would be roughly five times bigger than an EV5 core (methodological details for technology mapping appear in Chapter IV). If one were to take a monolithic processor like the EV6 and replace it with EV5 cores, one could construct a multi-core that can support five streams of execution for the same area budget (ignoring the cost of interconnection and glue logic). However, for the same technology, the single-thread performance of an EV6 core is only roughly 2.0-2.2 times that of an EV5 core (assuming performance is proportional to the square root of the number of transistors). Hence, if we replaced an EV6 monolithic processor by a processor with five EV5 cores, the aggregate throughput would be more than a factor of two higher than the monolithic design for the same area budget. Similar throughput gains can be shown even for a fixed power budget. This potential to get significantly higher aggregate performance for the same budget is the main motivation for multi-core architectures.

Another advantage of multi-cores over monolithic designs is improved design productivity. The more complex a core is, the higher the design and verification costs in terms of time, opportunity, and money. Several recent monolithic designs have taken several thousand man-years worth of work. A multi-core enables deployment of pre-existing cores, thereby bringing down the design and verification costs. Even when the cores are designed from the ground up, the simplicity of the cores can keep the costs low.
With increasing market competition and declining hardware profit margins, the time-to-market of processors is more important than before, and multi-cores help to that effect. Other benefits of multi-cores over monolithic designs can be explained as derivatives of the above two advantages.
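
To make the back-of-the-envelope throughput argument above concrete, the sketch below redoes the EV5/EV6 arithmetic under exactly the assumptions quoted in the text (an EV6 core occupying roughly five EV5-equivalents of area, performance scaling with the square root of transistor count, and interconnect/glue logic ignored). The specific numbers are the text's working assumptions, not measured data.

```python
# Back-of-the-envelope throughput comparison from Section II.A.
# Assumptions (from the text): EV6 ~ 5x the area of EV5; single-thread
# performance ~ sqrt(transistor count / area); glue logic ignored.
import math

EV6_AREA_RATIO = 5.0                                   # EV6 area in EV5-equivalents
ev6_speedup = math.sqrt(EV6_AREA_RATIO)                # ~2.2x an EV5's performance

# For the same area budget, one thread per core:
throughput_one_ev6  = 1 * ev6_speedup                  # one big core
throughput_five_ev5 = 5 * 1.0                          # five small cores

print(f"EV6 single-thread speedup over EV5: {ev6_speedup:.2f}x")
print(f"Aggregate throughput, 5xEV5 vs 1xEV6: "
      f"{throughput_five_ev5 / throughput_one_ev6:.2f}x")
# Roughly 2.2x, matching the "more than a factor of two" claim in the text.
```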

II.B Chronicling Multi-core Efforts

This chapter provides an overview of some of the visible general-purpose multi-core projects that have been undertaken in academia and industry. The chapter does not claim completeness and is biased towards the first few general-purpose multi-core processors that broke ground for mainstream multi-core computing.

The first multi-core design was the Hydra [59]. Hydra was a 4-way chip multiprocessor that integrated four 250 MHz MIPS cores on the same die. The cores each had 8 KB private instruction and data caches and shared a 128 KB level-2 cache. The L1 caches were write-through, and inclusion was maintained between the L1s and the L2. Coherence was maintained by having all the caches connected through a shared write bus and a read bus. Hydra was focused not only on providing hardware parallelism for throughput-oriented applications, but also on providing high single-thread performance for applications that can be parallelized into threads by a compiler. A significant amount of support was provided in the hardware to aid the thread-level speculation efforts. Hydra never got implemented, but an implementation of Hydra (0.25 micron technology) was estimated to take up 88 mm2 of area.

One of the earliest commercial multi-core designs, Piranha [26], was an 8-way chip multiprocessor designed at DEC/WRL. It was targeted at commercial, throughput-oriented workloads whose performance is not limited by instruction-level parallelism. It integrated eight simple, in-order processor cores on the die. Each core had private 64 KB instruction and data caches and shared a 1 MB L2 cache. Inclusion was not enforced between the L1s and the L2. The connection between cores was through a high-bandwidth switch instead of buses. The processor also integrated on the chip the functionality required to support scalability of the processor to large multiprocessing systems. Also, unlike Hydra, the cores did not provide support for data speculation due to the nature of the expected workload. Piranha never got implemented.

Around the same time as Piranha, Sun started the design of a multi-core processor, the MAJC 5200 [126]. MAJC 5200 was a two-way chip multiprocessor where each core was a four-way issue VLIW (very large instruction word) processor. The processor was targeted at multimedia and Java applications. Each core had a private 16KB L1 instruction cache. The cores shared a 16KB dual-ported data cache. The shared data cache aided in efficient space-time computing (where one of the cores speculatively executes future instructions) as well as vertical multithreading (where a running thread gets swapped out of the core on a cache miss and is replaced by a waiting thread). The processor also provided other required functionality on the chip for speculative multithreading. Small L1 caches and the lack of an on-chip L2 cache made the processor unsuitable for commercial workloads. One implementation of MAJC 5200 (0.22 micron technology) took 15W of power and 220 mm2 of area.

Sun later came out with other multi-core products, like UltraSparc-IV [123] and Niagara [80]. UltraSparc-IV is a dual-core processor where each core is four-way superscalar and two-way simultaneously multithreaded (SMT). Cores have private L1 caches (32 KB 4-way instruction, 64 KB 4-way data). Each core also has a private 8MB L2 cache. One implementation of UltraSparc-IV (in 0.13 micron technology) operated at 1.2GHz, took a maximum of 198W of power, and consisted of 66 million transistors.
Niagara is an eight-core processor where each core is four-way multithreaded. The processor is targeted at applications with abundant thread-level parallelism. Single-thread performance and instruction-level parallelism are only second-order concerns, as each core is in-order and scalar. Cores each have a 4-way 16KB private L1 instruction cache and a 4-way 8KB data cache. The relatively small data cache is based on the expectation that hardware multithreading (among the four threads) can effectively hide the L1 miss latency. The cores are connected to a shared four-banked 3MB L2 cache where each bank is 12-way set-associative. Connection is through a 200GB/s crossbar. One implementation of Niagara (in 90 nm technology) operates at 1.2GHz, takes up 72W at 1.3V, has a die area of 378mm2, and consists of 279 million transistors.

IBM's multi-core efforts started with Power4 [68]. Power4, a contemporary of the MAJC 5200 and Piranha processors, was a dual-core processor running at 1GHz where each core was a five-issue out-of-order superscalar core. Each core had a private direct-mapped 32KB instruction cache and a private 2-way 32KB data cache. The cores were connected to a shared triply-banked 8-way set-associative L2 cache. The connection was through a high-bandwidth crossbar switch (called the crossbar-interface unit). Four Power4 chips could be connected together within a multi-chip module and made to logically share the L2. One implementation of Power4 (in 0.13 micron technology) consisted of 184 million transistors and took up 267mm2 of die area.

Power5 [69] was a successor to Power4. The most significant difference was that the individual cores were now two-way simultaneously multithreaded, for a total of four threads per chip. There was also an increased level of integration: the memory controllers were brought on-chip, the interface to the off-chip L3 was brought on-chip, and the coupling between the CPU and the L3 was made stronger. A 130 nm implementation of Power5 was 389mm2 in size and consisted of 287 million transistors.

IBM's most ambitious multi-core offering arguably has been Cell [75]. Cell is a multi-ISA heterogeneous chip multiprocessor that consists of one two-way SMT dual-issue Power core and eight dual-issue SIMD (single instruction, multiple data)-style Synergistic Processing Element (SPE) cores on the same die. While the Power core executes the PowerPC instruction set (while supporting the vector SIMD instruction set at the same time), the SPEs execute SIMD instructions of variable widths. The Power core has a multi-level storage hierarchy

– 32KB instructions and data caches, and a 512KB L2. Unlike the Power core, the SPEs operate only on their local memory (local store or LS). Code and data must be transferred into the associated LS for an SPE to operate on. LS addresses have an alias in the Power core address map, and transfers to/from an individual LS and the global memory are kept coherent and done through DMAs (direct memory accesses). An implementation of Cell in 90nm operated at 3.2GHz, consisted of 234 million transistors and took 229mm2 of die area. The first multi-core processors were introduced by AMD last year. The relatively slow adoption of multi-core technology for the most dominant ISA in the market can be attributed to the then-perpetual success of the micropro- cessor industry in continually driving the single-thread performance of processors up by increasing clock speed. The diminishing marginal utility of transistors and increasing power budgets eventually forced a move towards on-chip multithread- ing and eventually chip multiprocessing. At the time of writing this thesis, AMD offers Opteron [2], [3], and Turion [1] dual-core processors serving differ- ent market segments. Intel’s dual-core offerings include Pentium D [10], Pentium Extreme [11], Xeon [12], and Core Duo [9] processors. One constraint for all the above multi-core designs was that they had to be capable of running legacy code in their respective ISAs. This restricted the degree of freedom in architecture and design of these processors. Three academic multi-core projects that were not bound by such constraints were RAW, TRIPS, and WaveScalar. The RAW [130] processor consists of sixteen identical tiles spread across the die in a regular two-dimensional pattern. Each tile consists of communica- tion routers, one scalar MIPS-style core, an FPU (floating-point unit), a 32KB DCache, and a software-managed 96KB Icache. Tiles are sized such that the latency of communication between two adjacent tiles is always one cycle. Tiles 16 are connected using on-chip networks that interface with the tiles through the routers. Hardware resources on a RAW processor (tiles, pins, and the intercon- nect) are exposed to the ISA. This enables the compiler to generate aggressive code that map more efficiently to the underlying computation substrate. The TRIPS [112] processor consists of four large cores that are con- structed out of small decentralized structures and are accompanied by resources (like memory and wires) that are polymorphous. The cores can be partitioned into small execution nodes when a program with high data-level parallelism needs to be executed. These execution nodes can also be logically chained when exe- cuting streaming programs. Like RAW, the microarchitecture is exposed to the ISA. Unlike RAW and other multi-cores discussed above, the unit of execution is a hyperblock. A hyperblock commits only when all instructions belonging to a hyperblock finish execution. WaveScalar [124] is an attempt at moving away from Von-Neumann processing in order to get the full advantage of multi-cores. It has a dataflow instruction set architecture that allows for traditional memory ordering semantics. Each instruction executes on an ALU (arithmetic logic unit) that sits inside a cache, and explicitly forwards the result to the dependent instructions. The ALU + cache is arranged as regular tiles, thereby allowing the communication overheads to be exposed to hardware. Like RAW and TRIPS, wavescalar also supports all the traditional imperative languages. 
There have been multi-core offerings in non-mainstream computing markets as well. A few examples are Broadcom's SiByte processors (SB1250, SB1255, SB1455) [5], the PA-RISC PA-8800 [7], Raza Microelectronics' XLR processor [13] that has eight MIPS cores, Cavium Networks' Octeon [6] processor that has 16 MIPS cores, ARM's MPCore processor [4], and Microsoft's Xbox 360 game console [14] that uses a triple-core PowerPC microprocessor.

III

Holistic Design for Adaptability: Single-ISA Heterogeneous Multi-core Architectures

This chapter discusses the implications of workload diversity on multi-core design and presents a new class of holistically-designed architectures that are significantly more efficient than “multi-core oblivious” designs. Section III.A discusses the diversity present in computer workloads and how naively-designed multi-core architectures cannot adapt to it. Section III.B introduces single-ISA heterogeneous multi-core architectures that can adapt to workload diversity and result in highly efficient computation. The chapter also discusses the scheduling policies and mechanisms that are required to effect adaptability.

III.A Inefficiency due to Workload Diversity

Workloads can be diverse in various ways. There can be diversity among applications (or different threads belonging to the same application). There can be diversity within applications (data dependent or between different execution

phases of the same application). Then, there can be diversity in the number of applications constituting a workload (or the number of threads for a given application). Each has implications for multi-core design.

The amount of diversity among the applications that a typical computer is expected to run can be considerable. For example, there can often be more than a factor of ten difference in the average performance of gcc and mcf on a given processor [116]. The difference is due to a combination of the difference in their semantics and resource requirements, and diversity in their program inputs. Even in the server domain there can be diversity among threads. Here the diversity can be because of batch processing, different threads processing different inputs, or threads having different priorities.

A “multi-core oblivious” multi-core design (or a homogeneous design that consists of identical cores) can target only a single point in the diversity spectrum efficiently. For example, a decision might need to be made beforehand whether the core should be designed to target gcc or mcf. In either case, an application whose resource demands differ from the amount of resources provided by a core will suffer – the resource mismatch will either result in underutilization of the resulting processor or it will result in low program performance. Note that this is the same problem that a general-purpose uniprocessor faces as well, since the same design is expected to perform well for the entire universe of applications.

The same is true of diversity within applications as well. There is significant diversity among different phases of the same application. The diversity is again due to differences in semantics, inputs, and execution phases. Consider, for example, applu. It is a computational fluid dynamics workload that solves non-linear partial differential equations. Figure III.1 shows the performance of applu over time when run on different cores belonging to the Alpha processor family (the cores are assumed to be implemented in the same 0.10 micron technology; methodological details are given later in the chapter).


Figure III.1: Performance of applu over time on Alpha cores nology, methodological details later in the chapter). As the graph shows, not only does the performance of applu change over time, so does the ratio of performance of applu on different cores. There are phases of execution where the difference in performance between the most complex core and the simplest core is only a few percent. On the other hand, there are phases where the difference is more than a factor of 10. Hence, even for the same application, there are inherently slow phases (the ones with low instruction-level parallelism) that will be happy enough on a simple core, while there are other phases that have high instruction-level parallelism and can execute fast when provided with a high performance core. A homogeneous multi-core design cannot target multiple types of phases and hence intra-program diversity results in inefficient execution for such architectures. Homogeneous multi-core architectures also suffer from the inability to adjust to the diversity due to the varying number of threads in a workload. De- 20 pending on the workload, compute conditions, or objective functions, there are scenarios where the number of threads is small, applications need high single- thread performance, and peak throughput is not much of a concern. Such sce- narios include single-threaded workloads, serial phases of a parallel program, low server loads, high priority threads, etc. On the other hand, there are other scenar- ios where the number of threads is high, high peak throughput is a requirement, and single-thread performance is not much of a concern. Such scenarios include multiprogrammed/multithreaded workloads, parallel phases of a parallel appli- cation, high server loads, etc. The bane of a processor is that it is expected to run both these classes of applications at one time or the other. A homogeneous multi-core architecture is forced to make a choice between high single-thread per- formance and high peak throughput (as determined by the number and complex- ity of cores). This inability to adjust to varying levels of thread level parallelism also results in processor inefficiency.

III.B Single-ISA Heterogeneous Multi-core Architectures

With the inadaptability of homogeneous multi-core architectures in mind, we propose single-ISA heterogeneous multi-core architectures – that is, architectures with multiple core types on the same die. The cores are all capable of executing the same ISA (instruction-set architecture), but represent different points in the power/performance continuum – for example, a low-power, low-performance core and a high-power, high-performance core on the same die.

The advantages of single-ISA heterogeneous architectures stem from two sources. The first advantage results from a more efficient adaptation to application diversity. As discussed above, applications (or different phases of a single application) place different demands on the architecture, stemming from the nature of the computation [87]. While some applications take good advantage of the most advanced processors, others often under-utilize that hardware and suffer little performance loss when run on a less aggressive processor. For example, a floating-point application with regular code might make good use of an out-of-order pipeline with high issue width; however, a bandwidth-bound, control-sensitive application might perform almost as well on an in-order core with low issue width. Given a set of diverse applications and heterogeneous cores, we can assign applications (or phases of applications) to cores such that those that benefit the most from complex cores are assigned to them, while those that benefit little from complex cores are assigned to smaller, simpler cores. This allows us to approach the performance of an architecture with a larger number of complex cores.

The second advantage from heterogeneity results from a more efficient use of die area for a given amount of thread-level parallelism. Successive generations of microprocessors have been providing diminishing performance returns per unit of chip area, as the following argument shows. Microprocessor implementation technology has been scaling for many years according to Moore's Law [101] for lithography and roughly according to MOS scaling theory [38]. For a given O(n) scaling of lithography, one can expect an equivalent O(n) increase in transistor speed and an O(n²) increase in the number of transistors per unit area. If the increases in transistor speed and transistor count were to translate directly into performance, one would expect an O(n³) increase in performance. However, past microprocessor performance has only been increasing at an O(n²) rate [63, 62]. This is not too surprising, since the performance improvement of many microprocessor structures (e.g., cache memories) is less than linear in their size.

Therefore, in an environment with large amounts of process- or thread-level parallelism, such a nonlinear relationship between transistor count and microprocessor speed means that higher throughput could be obtained by building a large number of small processors rather than a small number of large processors. In practice, however, the amount of process- or thread-level parallelism in most systems will vary with time. This implies that building chip-level multiprocessors with a mix of cores – some large cores with high single-thread performance and some small cores with high throughput per die area – is a potentially attractive approach.
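To make the scaling argument concrete, the relationships quoted above can be written out as follows. The sublinear exponent α in the second display is an illustrative assumption, consistent with the observation that structure performance grows less than linearly with size; it is not a measured value.

\[
\underbrace{O(n)}_{\text{transistor speed}} \times \underbrace{O(n^2)}_{\text{transistor count}} \;=\; O(n^3)\ \text{(potential)}
\qquad \text{vs.} \qquad O(n^2)\ \text{(observed)}.
\]
If per-core performance grows sublinearly with core area, say \( \mathit{perf} \propto \mathit{area}^{\alpha} \) with \( \alpha < 1 \), then a fixed core-area budget \(A\) split into \(k\) cores yields an aggregate throughput of
\[
k \left( \frac{A}{k} \right)^{\alpha} \;=\; A^{\alpha}\, k^{1-\alpha},
\]
which grows with \(k\) whenever enough threads are available to keep the cores busy.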
To explore the potential from heterogeneity, we model a number of chip multiprocessing configurations that can be derived from combinations of two existing off-the-shelf processors from the Alpha architecture family – the EV5 (21164) and the EV6 (21264). Figure III.2 compares the various combinations in terms of their performance and their total chip area. In this figure, performance is that obtained from the best static mapping of the applications constituting multiprogrammed SPEC workloads to the processor cores. The staircase represents the maximum throughput obtainable using a homogeneous configuration for a given area. We see from this figure that, over a large portion of the graph, the highest-performance architecture for a given area limit is a heterogeneous configuration, often by a significant margin. The increased throughput is due to the increased number of contexts as well as improved processor utilization.

Constant-area comparisons do not tell the whole story, because equivalent area does not necessarily imply equal cost, power, or complexity. But the results are instructive nonetheless. Note also that, because of our constraint of focusing on just two generations of a commodity processor family, we explore only a subset of the possible design space (studies with more generations of cores appear in Chapter IV). However, even with this limited approach, our results make a case for the general advantages of heterogeneity. For example, on the far left part of the graph, the area is insufficient to support a heterogeneous configuration of the EV6 and EV5 cores; however, our other data (not plotted here) on heterogeneous architectures using EV5 and EV4 (21064) cores confirm that these designs are superior in this region.

[Figure III.2 appears here: normalized throughput versus area (in million design units) for homogeneous and heterogeneous combinations of EV5 and EV6 cores, ranging from 1EV5 up to 6EV6 and including mixes such as 3EV6-3EV5.]

Figure III.2: Exploring the potential of heterogeneity: Comparing the throughput of six-core homogeneous and heterogeneous architectures for different area budgets

III.C Evaluation Methodology

This section discusses the methodological details for evaluating the benefits of heterogeneous multi-core architectures over their homogeneous counterparts. It discusses how intelligent application-to-core scheduling is needed to exploit the throughput benefits. It also details the hardware assumptions made for the evaluation and provides the methodology for constructing multiprogrammed workloads for evaluation. The last two sub-sections discuss the simulation details and the evaluation metrics, respectively.

Supporting multi-programming

The primary issue when using heterogeneous cores for greater throughput is the scheduling, or assignment, of jobs to particular cores. We assume a scheduler at the operating system level that can observe coarse-grain program behavior over particular intervals and move jobs between cores. Since the phase lengths of applications are typically large [118], this enables the cost of core switching to be piggybacked onto the operating system context-switch overhead. Core switching overheads are modeled in detail for the evaluation of the dynamic scheduling policies presented in this chapter – ones where jobs can move throughout execution.

With multiple jobs and multiple cores, the task of the scheduler is to find the best global assignment. All of our policies in this chapter strive to maximize the average performance gain over all applications in the workload. Fairness is not taken into consideration explicitly. All threads make good progress, but if further guarantees are needed, we assume the priorities of those threads that need performance guarantees will reflect that. The heterogeneous architecture is also ideally suited to manage varied priority levels, but that advantage is not explored here.

An additional issue with heterogeneous multi-core architectures supporting multiple concurrently executing programs is cache coherence. In this chapter, we study multi-programmed workloads with disjoint address spaces, so the particular cache coherence protocol is not an issue (even though we do model the performance effects of the writeback of dirty cache data during core switching). However, when there are differences in cache line sizes and/or per-core protocols, the cache coherence protocol might need some redesign. We believe that even in those cases, cache coherence can be accomplished with minimal additional overhead.

Minor differences in the ISA between processor generations can be handled easily. Either programs are compiled to the least common denominator (the EV5), or we use software traps for the older cores. If extensive use is made of the software traps, our mechanisms will naturally shy away from those cores due to the low performance.

Hardware assumptions

Table III.1 summarizes the configurations used for the EV5 (Alpha 21164) and the EV6 (Alpha 21264) cores that we use for our throughput-related evaluations. The cores are assumed to be implemented in 0.10 micron technology and are clocked at 2.1 GHz (the EV6 frequency when scaled to 0.10 micron). In addition to the individual L1 caches, all the cores share an on-chip 4 MB, 4-way set-associative, 16-way banked L2 cache. The cache line size is 128 bytes. Each bank of the L2 cache has a memory controller and an associated RDRAM channel.

Table III.1: Configuration and area of the EV5 and EV6 cores.

Processor            EV5                   EV6
Issue-width          4                     6 (out-of-order)
I-Cache              8KB, direct-mapped    64KB, 2-way
D-Cache              8KB, direct-mapped    64KB, 2-way
Branch Pred.         2K-gshare             hybrid 2-level
Number of MSHRs      4                     8
Number of threads    1                     1
Area (in mm2)        5.06                  24.5

The memory bus is assumed to be clocked at 533 MHz, with data being transferred on both edges of the clock for an effective frequency of 1 GHz and an effective bandwidth of 2 GB/s per bank. Note that for any reasonable assumption about power and ground pins, the total number of pins that this memory organization would require would be well within the ITRS limits [15] for the cost/performance market. A fully-connected matrix crossbar interconnect is assumed between the cores and the L2 banks. All L2 banks can be accessed simultaneously, and bank conflicts are modeled. The access time is assumed to be 10 cycles. Memory latency was set to 150 ns. We assume a snoopy bus-based MESI coherence protocol and model the writeback of dirty cache lines on every core switch.

Table III.1 also presents the area occupied by the cores. These areas were computed using a methodology outlined in Chapter IV. As can be seen from the table, a single EV6 core occupies as much area as 5 EV5 cores.

To evaluate the performance of heterogeneous architectures, we perform comparisons against homogeneous architectures occupying equivalent area. We assume that the total area available for cores is around 100 mm2. This area can accommodate a maximum of 4 EV6 cores or 20 EV5 cores. We expect that while a 4-EV6 homogeneous configuration would be suitable for low-TLP (thread-level parallelism) environments, the 20-EV5 configuration would be a better match for cases where TLP is high. For studying heterogeneous architectures, we choose a configuration with 3 EV6 cores and 5 EV5 cores, with the expectation that it would perform well over a wide range of available thread-level parallelism while occupying roughly the same area.

For the chosen cache configuration, the area occupied by the L2 cache would be around 135 mm2. The rest of the logic (e.g., 16 memory controllers, crossbar interconnect, etc.) might occupy up to 50 mm2 (crossbar area calculations assume 300-bit wide links implemented in the M3/M5 layers; memory-controller area assumptions are consistent with Piranha [26] estimates). Hence, the total die size would be approximately 285 mm2. The actual area might depend on the layout and other issues, but the above assumptions provide a first-order model adequate for this study.
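The area budget can be checked with simple arithmetic. The sketch below reproduces the figures quoted above from the per-core areas in Table III.1 and the stated L2 and uncore estimates; it is only a first-order bookkeeping aid, not part of the simulation infrastructure.

    # Per-core and uncore area estimates taken from the text (mm2).
    EV6_AREA = 24.5
    EV5_AREA = 5.06
    L2_AREA = 135.0     # shared 4 MB L2 cache
    OTHER_AREA = 50.0   # memory controllers, crossbar, etc.

    def core_area(n_ev6, n_ev5):
        """Total core area in mm2 for a mix of EV6 and EV5 cores."""
        return n_ev6 * EV6_AREA + n_ev5 * EV5_AREA

    for name, (n6, n5) in {"4 EV6": (4, 0), "20 EV5": (0, 20), "3 EV6 + 5 EV5": (3, 5)}.items():
        cores = core_area(n6, n5)
        print(f"{name:14s} cores = {cores:6.1f} mm2, die ~ {cores + L2_AREA + OTHER_AREA:6.1f} mm2")

All three configurations fit the roughly 100 mm2 core budget, and each yields a total die size close to the 285 mm2 quoted above.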

Workload construction

All our evaluations are done for various thread counts ranging from one through the maximum number of available processor contexts. Rather than choosing a large number of benchmarks and evaluating each thread count using workloads with completely unique compositions, we choose a relatively small number of SPEC2000 benchmarks (8) and construct workloads using these benchmarks. Table III.2 summarizes the benchmarks used. These benchmarks are evenly distributed between integer benchmarks (crafty, mcf, eon, bzip2) and floating-point benchmarks (applu, wupwise, art, ammp). Also, half of them (applu, bzip2, mcf, wupwise) have a large memory footprint (over 175 MB), while the other half (ammp, art, crafty, eon) have memory footprints of less than 30 MB.

Table III.2: Benchmarks simulated for evaluating heterogeneous multi-cores for throughput

Program     Description                                   Fast-forward (billion instr)
ammp        Computational Chemistry                       8.75
applu       Parabolic/Elliptic Partial Diff. Equations    116
art         Image Recognition/Neural Networks             15
bzip2       Compression                                   65
crafty      Game Playing: Chess                           83
eon         Computer Visualization                        55
mcf         Combinatorial Optimization                    55
wupwise     Physics/Quantum Chromodynamics                88

All the data points are generated by evaluating 8 workloads for each case and then averaging the results. A workload consisting of n threads is constructed by selecting benchmarks using a sliding window (with wraparound) of size n and then shifting the window right by one. Since there are 8 distinct benchmarks, the window selects eight distinct workloads (except when the window size is a multiple of 8; in those cases all the selected workloads have identical composition). All of these workloads are run, ensuring that each benchmark is equally represented at every data point. This methodology for workload construction is similar to that used in [129, 120].
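A minimal sketch of this sliding-window construction is shown below; the ordering of the benchmark list is an assumption made only for illustration, since the text does not fix a particular order.

    BENCHMARKS = ["ammp", "applu", "art", "bzip2", "crafty", "eon", "mcf", "wupwise"]

    def build_workloads(n_threads, benchmarks=BENCHMARKS):
        """Return the 8 n-thread workloads formed by a size-n sliding window with wraparound."""
        k = len(benchmarks)
        return [[benchmarks[(start + i) % k] for i in range(n_threads)] for start in range(k)]

    # Example: the eight 3-thread workloads; reported results average over all of them.
    for workload in build_workloads(3):
        print(workload)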

Simulation approach

Benchmarks are simulated using SMTSIM, a cycle-accurate, execution-driven simulator that simulates an out-of-order, simultaneous multithreading processor [127]. SMTSIM executes unmodified, statically linked Alpha binaries. The simulator was modified to simulate the various multi-core architectures.

The SimPoint tool [118] was used to find good representative fast-forward distances for each benchmark. Fast-forwarding skips the initial instructions of a benchmark during simulation so that only representative portions of the benchmark are accounted for during measurements. Table III.2 also shows the distance each benchmark was fast-forwarded before beginning simulation. Unless otherwise stated, all simulations involving n threads were run for 500 × n million total instructions. All the benchmarks are simulated using the ref inputs provided by SPEC.

Evaluation metrics

In a study like this, IPC (the total number of instructions committed per cycle) is not a reliable metric, as it would inordinately bias all the heuristics (and policies) against inherently slow-running threads. Any policy that favors high-IPC threads boosts the reported IPC by increasing the contribution from the favored threads. But this does not necessarily represent an improvement. While the IPC over a particular measurement interval might be higher, in a real system the machine would eventually have to run a workload inordinately heavy in low-IPC threads, and the artificially generated gains would disappear.

Hence, we use weighted speedup [120, 128] for our evaluations. In this chapter, weighted speedup measures the arithmetic sum of the individual IPCs of the threads constituting a workload, each divided by that thread's IPC on a baseline configuration when running alone. This metric makes it difficult to produce artificial speedups by simply favoring high-IPC threads.

As another axis of comparison, we also present results from open system experiments where jobs enter and leave the system at random rates. This represents a real system with variable job-arrival rates and variable service times. The systems are then compared in terms of the average response time of applications as well as system queue lengths. The response time of an application, as used in this chapter, is the time between job submission and job completion, and hence accounts for the queuing delays that might be incurred when the processor is busy. We believe that this is a better metric than throughput for quantifying the performance of a real system with variable job inter-arrival rates and/or variable job service times.
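Written out, the metric is the sum over the n threads in a workload of each thread's achieved IPC divided by its stand-alone IPC on the baseline configuration; this is simply a restatement of the definition above:

\[
\text{Weighted Speedup} \;=\; \sum_{i=1}^{n} \frac{\mathit{IPC}_i^{\,\text{experiment}}}{\mathit{IPC}_i^{\,\text{alone, baseline}}}
\]

Because each thread is normalized to its own stand-alone rate, favoring an inherently high-IPC thread cannot inflate this metric the way it inflates raw aggregate IPC.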

III.D Scheduling for Throughput: Analysis and Results

In this section, we demonstrate the performance advantage of heterogeneous multi-core architectures for multithreaded workloads and demonstrate job-to-core assignment mechanisms that allow the architecture to deliver on its promise. The first two subsections focus on the former; the rest of the section demonstrates the further gains available from a good job assignment mechanism.

Static scheduling for inter-thread diversity

A heterogeneous architecture can exploit two dimensions of diversity in an application mix. The first is diversity between applications. The second is diversity over time within a single application. Prior work [118, 131] has shown that both these dimensions of diversity occur in common workloads. In this section, we attempt to separate these two effects by first looking at the performance of a static assignment of applications to cores. Note that the static assignment approach may not eliminate the need for core switching in several cases, because the best assignment of jobs to cores will change as jobs enter and exit the system.

[Figure III.3 appears here: weighted speedup versus number of threads (0-20) for the 4EV6, 20EV5, and 3EV6 & 5EV5 (static best) configurations.]

Figure III.3: Benefits from heterogeneity - static scheduling for inter-thread diversity

Figure III.3 shows the results comparing one heterogeneous architecture against two homogeneous architectures, all requiring approximately the same area. The heterogeneous architecture that we evaluate includes 3 EV6 cores and 5 EV5 cores, while the two homogeneous architectures that we study have 4 EV6 cores or 20 EV5 cores, respectively. For each architecture, the graph shows the variation of the average weighted speedup with the number of threads.

For the homogeneous CMP configurations, we assume a straightforward scheduling policy where, as long as a core is available, any workload can be assigned to any core. For the heterogeneous case, we use an assignment that seeks to match the optimal static configuration as closely as possible. The optimal configuration would factor in even the potential shared L2 cache interactions.

However, determining this configuration is only possible by running all possible combinations. Instead, as a simplifying assumption, our scheduling policy assumes no knowledge of L2 interactions when determining the static assignment of workloads to cores (the interactions are still simulated; they are just not used to guide the assignment). This simplification allows us to find the best configuration (defined as the one that maximizes weighted speedup) by simply running each job alone on each of our unique cores and using that to guide our core assignment. This results in consistently good, if not optimal, assignments. For a few cases, we compared this approach to an exhaustive exploration of all combinations; our results indicated that this approach achieves performance close to that of the optimal assignment.

The use of weighted speedup as the metric ensures that the jobs assigned to the EV5 cores are those that are least affected (in relative IPC) by the difference between the EV6 and the EV5. In both the homogeneous and heterogeneous cases, once all the contexts of a processor are used, we simply assume that the weighted speedup levels out, as shown in Figure III.3. The effects when the number of jobs exceeds the number of cores in the system (e.g., additional context switching) are modeled more exactly in the following section.

As can be seen from Figure III.3, even with a simple static approach, the results show a strong advantage for heterogeneity over the homogeneous designs for most levels of threading. The heterogeneous architecture attempts to combine the strengths of both homogeneous configurations - CMPs of a few powerful processors (the EV6 CMP) and CMPs of many less powerful processors (the EV5 CMP). At low threading levels, the applications can run on powerful EV6 cores, resulting in high performance for each of the few threads; at higher threading levels, applications can also run on the added EV5 contexts enabled by heterogeneity, resulting in higher overall throughput.

The results in Figure III.3 show that the heterogeneous configuration achieves performance identical to the homogeneous EV6 CMP from 1 to 3 threads. At 4 threads, the optimum point for the EV6 CMP, that configuration shows a slight advantage over the heterogeneous case. However, this advantage is very small because, with 4 threads, the heterogeneous configuration is nearly always able to find one thread that is impacted little by having to run on an EV5 instead of an EV6. As soon as we have more than 4 threads, however, the heterogeneous processor shows a clear advantage.

The superior performance of the heterogeneous architecture is directly attributable to the diversity of the workload mix. For example, mcf underutilizes the EV6 pipeline due to its poor memory behavior. On the other hand, benchmarks like crafty and applu have much higher EV6 utilization. Static scheduling on heterogeneous architectures enables the mapping of these benchmarks to the cores in such a way that overall processor utilization (the average of the individual core utilizations) is high.

The heterogeneous design remains superior to the EV5 CMP out to 13 threads, well beyond the point where the heterogeneous architecture runs out of processors and is forced to queue jobs. Beyond that, the raw throughput of the homogeneous design with 20 EV5 cores wins out. This is primarily because of the particular heterogeneous design we chose.
However, we can always choose a different configuration that is competitive at higher thread counts (e.g., fewer EV6s, more EV5s), if that is the desired design point.

Compared to a homogeneous processor with 4 EV6 cores, the heterogeneous processor performs up to 37% better, with an average improvement of 26% over configurations running 1-20 threads. Relative to 20 EV5 cores, it performs up to 2.3 times better and averages 23% better over that same range. These results demonstrate that, over a range of threading levels, a heterogeneous architecture can outperform comparable homogeneous architectures.

Although the results are shown here only for one particular area budget and two core types, our experiments with other configurations (at different processor areas and with different core types) indicate that these results are representative of other heterogeneous configurations as well.
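The static-best policy described above can be sketched as follows: each job's stand-alone IPC on each core type guides the assignment, and the jobs least hurt by moving from an EV6 to an EV5 (highest EV5/EV6 IPC ratio) are placed on the EV5s, which maximizes weighted speedup under the no-L2-interaction simplification. The function name and the IPC numbers in the example are purely illustrative.

    def static_best_assignment(alone_ipc, n_ev6=3, n_ev5=5):
        """alone_ipc maps job -> (ipc_on_ev6, ipc_on_ev5), measured with each job running alone.
        Never places a job on an EV5 while an EV6 would sit idle; extra jobs are left queued."""
        # Jobs with the highest EV5/EV6 ratio lose the least from running on an EV5.
        by_tolerance = sorted(alone_ipc, key=lambda j: alone_ipc[j][1] / alone_ipc[j][0],
                              reverse=True)
        n_to_ev5 = min(n_ev5, max(0, len(by_tolerance) - n_ev6))
        jobs_on_ev5 = by_tolerance[:n_to_ev5]
        jobs_on_ev6 = by_tolerance[n_to_ev5:n_to_ev5 + n_ev6]
        return jobs_on_ev6, jobs_on_ev5

    # Illustrative IPCs: mcf tolerates the EV5 well, so it is the thread placed there.
    jobs = {"mcf": (0.4, 0.35), "crafty": (2.0, 1.0), "applu": (1.8, 0.9), "eon": (1.5, 0.8)}
    print(static_best_assignment(jobs))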

Dynamic scheduling for intra-thread diversity

The previous sections demonstrated the performance advantages of the heterogeneous architecture when exploiting core diversity for inter-workload variation. However, that analysis has two weaknesses – it used unimplementable assignment policies in some cases (e.g., the static assignment oracle), and it ignored variations in the resource demands of individual applications over time. This section addresses both of these problems and demonstrates the importance of good dynamic job assignment policies.

Prior work has shown that an application's demand for processor resources varies across phases of the application. Thus, the best match of applications to cores will change as those applications transition between phases. In this section, we examine implementable heuristics that dynamically adjust the mapping to improve performance.

These heuristics are sampling-based. During the execution of a workload, every so often a trigger is generated that initiates a sampling phase. In the sampling phase, the scheduler permutes the assignment of applications to cores. During this phase, the dynamic execution profiles of the running applications are gathered by reading hardware performance counters. These profiles are then used to create a new assignment, which is then employed during a much longer phase of execution, the steady phase. The steady phase continues until the next trigger. Note that applications continue to make forward progress during the sampling phase, albeit perhaps non-optimally.
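The overall trigger/sample/steady loop can be sketched as below. The profiling call is a stand-in that returns a random estimate so the sketch runs on its own; in the real system it would run the candidate assignment for the sampling interval and read hardware performance counters, as described above.

    import random

    def run_and_profile(assignment, cycles):
        """Stand-in for running 'assignment' for 'cycles' cycles and reading performance
        counters; here it just returns a random weighted-speedup estimate."""
        return random.random()

    def sampling_scheduler(candidate_assignments, n_triggers=5,
                           sample_cycles=2_000_000, steady_cycles=500_000_000):
        """Each trigger starts a sampling phase over candidate assignments; the best
        candidate then runs for the (much longer) steady phase until the next trigger."""
        assignment = candidate_assignments[0]
        for _ in range(n_triggers):
            samples = {a: run_and_profile(a, sample_cycles) for a in candidate_assignments}
            assignment = max(samples, key=samples.get)   # maximize expected weighted speedup
            # ... run 'assignment' for steady_cycles, or until an event-based trigger fires ...
        return assignment

    # Example: with four threads on 3 EV6 + 5 EV5, candidates differ in which thread uses an EV5.
    print(sampling_scheduler(["mcf on EV5", "crafty on EV5", "applu on EV5", "eon on EV5"]))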

[Figure III.4 appears here: weighted speedup versus number of threads (1-8) for 4EV6 and for 3EV6 & 5EV5 under the random, static best, sample-one, sample-avg, and sample-sched policies.]

Figure III.4: Three strategies for evaluating the performance an application will realize on a different core

In terms of the core sampling strategies, there are a large number of possible application-to-core assignment permutations. We prune the number of permutations significantly by assuming that we would never run an application on a less powerful core when doing so would leave a more powerful core idle (for either the sampling phase or the steady phase). Thus, with four threads on our 3 EV6/5 EV5 configuration, four assignments are possible, based on which thread gets allocated to an EV5. With more threads, the number of permutations increases, up to 56 potential choices with eight threads. Rather than evaluating all these possible alternatives, our heuristics sample only a subset of possible assignments. Each of these assignments is run for 2 million cycles. At the end of the sampling phase, we use the collected data to make assignments. The selection of the assignments to be sampled depends on how much we account for interactions at the L2 cache level (which, if large, can color the data collected for all threads and lead to inappropriate decisions). We evaluated three strategies for sampling the assignment space.

The first strategy, sample-one, samples as many assignments as are needed to run each thread once on each core type. This assumes that the single sample is accurate, regardless of what other jobs are doing. The assignment is then made to maximize weighted speedup, under the assumption that future performance will be the same as our one sample for each thread. The assignment that maximizes weighted speedup is simply the one that assigns to the EV5s those jobs whose ratio of average EV5 throughput to EV6 throughput is highest.

The second strategy, sample-avg, assumes we need multiple samples to get the average behavior of a job on each core. In this case, we sample as many times as there are threads running. The samples are distinct and are chosen such that we get at least two runs of each thread on each core type; we then base the assignment (again maximizing expected weighted speedup) on the average performance of each thread on each core.

The third strategy, sample-sched, assumes we know little about a particular assignment unless we have actually run it. It thus samples a number of possible assignments and is then constrained to choose one of the assignments it sampled. Specifically, we sample 4 × n representative assignments for an n-threaded workload (bounded by the maximum possible for that configuration). The best core assignment, of those sampled, is the one that maximizes total weighted speedup, using average EV5 throughput for each thread as the baseline.

Figure III.4 presents a quantitative comparison of the effectiveness of the three strategies. The average weighted speedup values reported here were obtained using a default time-based trigger that resulted in a sampling phase being triggered every 500 million processor cycles; we later evaluate the impact of other time intervals. Also included in the graph, for comparison, are (1) the results obtained using the homogeneous multi-core processor, (2) the random assignment policy described in the previous section, and (3) the best static assignment found previously. As suggested by the graph, the sample-sched strategy performs the best, although sample-avg has very similar performance (within 2%). Even sample-one is not much worse. We observed a similar trend for other time intervals and for other trigger types.
We conclude from this result that, for our workload and L2 cache configuration, the level of interaction at the L2 cache is not sufficient to unduly affect overall performance. We use sample-avg as the basis for the trigger evaluation in the discussion that follows, as it not only has lower overhead than sample-sched, but is also more robust than both sample-sched and sample-one against worst-case events like phase changes during sampling.

The second significant result the graph shows is that intelligent assignment policies make a significant performance difference, allowing us to outperform the random core assignment strategy by up to 22%. Perhaps more surprising is the importance of the dynamic sampling and reconfiguration, as we outperform the static best assignment by as much as 10%. We take a hit at 4 threads, when the sampling overhead first kicks in, but quickly recover that loss as the number of threads increases, maximizing the scheduler's flexibility. The results also suggest that fairly good decisions can be made about an application's relative performance on various cores even if it runs for no more than 2 million cycles on each core. This also indicates that the cold-start cost of core switching is much smaller than the running time on a core during sampling – it would be difficult to make good decisions during a short sampling period if cold-start effects were high.

Sampling overhead depends significantly on the trigger mechanism. Sampling effectively requires juggling two conflicting goals – minimizing sampling overhead and reacting quickly to changes in workload behavior. To manage this tradeoff, we compare two classes of trigger mechanisms, one based on a periodic timer and the second based on events indicating significant changes in performance.

[Figure III.5 appears here: weighted speedup versus number of threads (1-8) for 4EV6, for 3EV6 & 5EV5 with static best and random assignment, and for time-based triggers with steady-phase lengths of 500M, 250M, 125M, 62.5M, and 31.25M cycles.]

Figure III.5: Sensitivity to sampling frequency for time-based trigger mechanisms using the sample-avg core-sampling strategy

We begin by evaluating the first class and the performance impact of varying the amount of time between sampling phases, that is, the length of the steady phase. For smaller steady-phase lengths, a greater fraction of the total time is spent in sampling phases, thus contributing overhead. The overhead derives, first, from the cost of core switching each time we sample a different configuration, and second, from the fact that sampling usually runs non-ideal configurations.

Figure III.5 presents a comparison of the average weighted speedup obtained with steady-phase lengths between 31.25 million and 500 million cycles for the sample-avg strategy. We note from this graph that the sampling frequency has only a second-order impact on performance, with a steady-phase length of 125 million cycles performing best overall. Also, in an observation similar to that in [87], we found that the act of core switching itself has relatively small overhead. So, the optimal sampling frequency is determined by the average phase length of the applications constituting the various workloads, as well as the ratio of the lengths of the steady phase and the sampling phase.

While time-triggered sampling is very simple to implement, it does not fully capture either inter-thread or intra-thread diversity. A fixed sampling frequency is inadequate when the phase lengths of different applications in the workload mix are different. Also, each application can demonstrate multiple phases, each with its own phase length. For example, in our simulation window, while art demonstrates periodic behavior with a phase length of 80 million instructions, mcf demonstrates at least two distinct phases, with one of the phases at least 350 million instructions long. Any sampling-based heuristic that hopes to capture phase changes for both art and mcf, with minimal overhead, needs to be adaptive.

Therefore, we consider the second class of trigger mechanisms. Here, we monitor the run-time behavior of the workload and detect when sufficiently significant changes have occurred. We consider three instantiations of this trigger class. With the individual-event trigger, a sampling phase is triggered every time the steady-phase IPC of an individual thread changes by more than 50%. In contrast, with the global-event trigger, we sum the absolute values of the percent changes in IPC for each application, and trigger a sampling phase when this value exceeds 100%. The last heuristic, bounded-global-event, modifies the global-event trigger by initiating a sampling phase if more than 300 million cycles have elapsed since the last sampling phase, and by avoiding sampling if the global-event trigger occurs within 50 million cycles of the last sampling phase. All the thresholds were determined by observing the execution characteristics of the simulated applications.
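The three triggers reduce to simple predicates over per-thread steady-phase IPC changes, using the thresholds quoted above (50%, 100%, and the 50M/300M-cycle bounds). The function names and the small example are illustrative, not an actual interface.

    def pct_change(old, new):
        return abs(new - old) / old * 100.0

    def individual_event(prev_ipc, cur_ipc, threshold=50.0):
        """Trigger if any single thread's steady-phase IPC changed by more than 50%."""
        return any(pct_change(prev_ipc[t], cur_ipc[t]) > threshold for t in cur_ipc)

    def global_event(prev_ipc, cur_ipc, threshold=100.0):
        """Trigger if the summed absolute per-thread IPC changes exceed 100%."""
        return sum(pct_change(prev_ipc[t], cur_ipc[t]) for t in cur_ipc) > threshold

    def bounded_global_event(prev_ipc, cur_ipc, cycles_since_sample,
                             lo=50_000_000, hi=300_000_000):
        """Global-event trigger, forced after 300M cycles and suppressed within 50M cycles."""
        if cycles_since_sample > hi:
            return True
        if cycles_since_sample < lo:
            return False
        return global_event(prev_ipc, cur_ipc)

    # A 75% swing in mcf's IPC fires the individual trigger but not the global one.
    prev, cur = {"mcf": 0.40, "art": 1.20}, {"mcf": 0.70, "art": 1.10}
    print(individual_event(prev, cur), global_event(prev, cur))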

[Figure III.6 appears here: weighted speedup versus number of threads (1-8) for 4EV6 and for 3EV6 & 5EV5 with random, static best, steady_125M, individual-event, global-event, and bounded-global-event scheduling.]

Figure III.6: Comparison of event-based triggers using the sample-avg core-sampling strategy

Figure III.6 presents a comparison of these three event-based triggers, along with the time-based trigger mechanism using a steady-phase length of 125 million cycles (the best one from the previous discussion). We continue to use the sample-avg core-sampling strategy. The graph also includes the static-best heuristic from Section III.D and the homogeneous core. As we can see from the figure, the event-based triggers outperform the best timer-based trigger and the static assignment approach. This mechanism effectively meets our two goals of reacting quickly to workload changes and minimizing sampling.

While the individual-event trigger performs well in general, the global-event trigger achieves better performance. This is because a change in the behavior of a single application often does not result in a change to the workload-to-cores mapping; the global-event trigger guards against these false positives. The bounded-global-event trigger achieves the best performance (close to a 20% performance improvement over static), indicating the benefits of a hybrid timer-based and event-based approach. It has all the advantages of a global-event trigger, but the bounds also help guard against oversampling and undersampling. By eliminating most of the sampling overhead, this hybrid scheme also closes the gap again with the homogeneous processor at 4 threads. Similar trends were observed for different values of the parameters embodied in the event-based triggers.

To summarize this section, the results presented indicate that dynamic heuristics which intelligently adapt the assignment of applications to cores can better leverage the diversity advantages of a heterogeneous architecture. Compared to the base homogeneous architecture, the best dynamic heuristic achieves close to a 63% improvement in throughput in the best case (for 8 threads) and an average improvement in throughput of 17% over configurations running 1-8 threads. Even more interesting, the best dynamic heuristic achieves a weighted speedup of 6.5 for eight threads, which is close to 80% of the optimal speedup (8) achievable for this configuration (despite the fact that over half of our cores have roughly half the raw computation power of the baseline core!). In contrast, the homogeneous configuration achieves only 50% of the optimal speedup. We have also demonstrated the importance of intelligent dynamic assignment, which achieves up to a 31% improvement over a random scheduler.

While our relatively simple dynamic heuristics are effective, there are clearly many other heuristics that could be considered. For example, event-based sampling could be based on metrics other than IPC, such as changes in ILP, cache or branch behavior, or basic block profiles as suggested in [118]. Further, rather than using past behavior as a simple approximation for future behavior, more complex models for phase identification and prediction [118] could also be used.
Open system experiments

Graphs of performance at various threading levels are instructive, but they do not necessarily reflect real system performance accurately. Real systems typically operate at a variety of threading levels (run-queue sizes), and observed performance is a function of the whole range. Thus, while a particular architecture may appear to be optimal at a single point (even if that design point represents the expected average behavior), it may not be optimal on a system that experiences a range of demand levels. This section explores the performance of heterogeneous architectures on an open system. It does so with a sophisticated simulation framework that models random job arrivals and random job lengths. This addresses some methodological issues that remain even when using the weighted speedup metric: in this experiment, we are able to guarantee that every simulation executes the exact same set of instructions, and we are able to use average response time as our performance metric.

We model a system where jobs enter and leave the system with an exponentially distributed arrival rate λ and an exponentially distributed average time T to complete a job. We study the two systems for varying values of λ and observe the effects on mean response time, queue length, and the stability of the systems. Whenever a system is stable, it is better to measure response time rather than throughput, since throughput cannot exceed the rate of job arrival. If two stable systems are compared and one is faster, the faster one will complete jobs more quickly and thus typically have fewer jobs queued up waiting to run.

For these experiments, we randomly generate jobs (using a Poisson model) centered around an average expected execution time of 200 million cycles on an EV6. Jobs are generated by first generating random numbers from a

[Figure III.7 appears here: average response time in million cycles (0-1000) versus job-arrival rate in jobs per million cycles (0.01-0.035) for the 4EV6 and 3EV6 & 5EV5 architectures.]

Figure III.7: Limiting response-time for various loads on comparable-budget homogeneous and heterogeneous architectures

distribution centered around 200 million cycles and then executing that many instructions multiplied by the single-threaded IPC of the benchmarks on the EV6. We then simulate different mean job arrival rates with exponential distributions. To model a random system but produce repeatable results, for each point on the job-arrival-rate axis, we feed the same jobs in the same order with the same arrival times to each of the systems under experimentation.

For the heterogeneous configuration, we use a naive scheduling heuristic which simply assigns jobs randomly, only ensuring that the more powerful processors get used before the less powerful ones. Significant improvements over this heuristic were demonstrated earlier in this section.

Figure III.7 shows the results for these experiments. The most profound difference between the homogeneous and the heterogeneous architectures is that they saturate at very different throughputs. The homogeneous architecture sees unbounded response times as the arrival rate approaches its maximum throughput of around 2 jobs per 100 million cycles. At this point, its run queue becomes (if we ran the simulations long enough) infinite. The heterogeneous architecture, however, remains stable well beyond this point. Furthermore, the intelligent scheduling heuristics demonstrated earlier in this section would actually increase the maximum throughput of the architecture, so its saturation point would be even further out. The heterogeneous architecture also sees average response-time improvements well before the other architecture becomes saturated. There is only a very narrow region, in which the homogeneous architecture sees no queuing beyond 4 jobs but the heterogeneous architecture is occasionally forced to use an EV5, where the homogeneous architecture sees some slight advantage. As soon as the probability of queue lengths beyond four becomes non-negligible, the heterogeneous architecture is superior.

Another interesting point to note is that, besides supporting greater throughput in peak load conditions, the heterogeneous chip-level multiprocessor's response time degrades more gracefully under heavier loads than that of the homogeneous processor. This should enhance system reliability in transient high-load conditions, which is particularly important as the reliability and availability of systems become more important with the maturity of computer technology.
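The job-trace generation used in these open-system experiments can be sketched as follows: exponentially distributed inter-arrival times at rate λ, exponentially distributed job lengths centered around 200 million EV6 cycles, and a fixed seed so that every system sees exactly the same trace. The seed and the number of jobs below are arbitrary choices for illustration.

    import random

    def make_job_trace(arrival_rate_per_cycle, n_jobs=100, mean_len_cycles=200e6, seed=42):
        """Return [(arrival_time, length_in_ev6_cycles), ...] with Poisson arrivals."""
        rng = random.Random(seed)        # fixed seed -> identical trace for every system
        t, trace = 0.0, []
        for _ in range(n_jobs):
            t += rng.expovariate(arrival_rate_per_cycle)               # exponential inter-arrival
            trace.append((t, rng.expovariate(1.0 / mean_len_cycles)))  # exponential service time
        return trace

    # Example: an arrival rate of 0.02 jobs per million cycles.
    trace = make_job_trace(0.02 / 1e6)
    print(trace[:3])

Job lengths in cycles would then be converted to instruction counts by multiplying by each benchmark's single-threaded EV6 IPC, as described above.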

III.E Acknowledgment

The text of Chapter III is in part a reprint of the material as it appears in the proceedings of the Thirty-first International Symposium on Computer Architecture (pp. 64-75, June 2004). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed the research which forms the basis for Chapter III.

IV Holistic Design for Adaptability: Power Advantages of Heterogeneity

The ability of single-ISA heterogeneous multi-core architectures to adapt to workload diversity can also be used for improving the energy efficiency of processors. Again, the architecture consists of a chip-level multiprocessor with multiple, diverse processor cores. These cores all execute the same instruction set, but include significantly different resources and achieve different performance and energy efficiency on the same application. During an application’s execution, the operating system software tries to match the application to the different cores, attempting to meet a defined objective function. For example, it may be trying to meet a particular performance requirement or goal, but doing so with maximum energy efficiency.


IV.A Discussion of Core Switching

There are many reasons, some discussed in previous sections, why the best core for execution may vary over time. The demands of executing code vary widely between applications; thus, the best core for one application will often not be the best for the next, given a particular objective function (assumed to be some combination of energy and performance). In addition, the demands of a single application can also vary across phases of the program. Even the objective function can change over time, as the processor changes power conditions (e.g., plugged vs. unplugged, full battery vs. low battery, thermal emergencies), as applications switch (e.g., low-priority vs. high-priority job), or even within an application (e.g., a real-time application is behind or ahead of schedule).

The experiments in the following sections explore only a subset of these possible changing conditions. Specifically, we examine adaptation to phase changes in single applications. However, by simulating multiple applications and several objective functions, this set of experiments also indirectly examines the potential to adapt to changing applications and objective functions. We believe a real system would see far greater opportunities to switch cores to adapt to changing execution and environmental conditions than the narrow set of experiments exhibited here.

This work examines a diverse set of execution cores. In a processor where the objective function is static (and perhaps the workload is well known), some of our results indicate that a smaller set of cores (often two) will suffice to achieve very significant gains. However, if the objective function varies over time or workload, a large set of cores has even greater benefit.

IV.B Choice of cores

To provide an effective platform for a wide variety of application execution characteristics and/or system priority functions, the cores on the heterogeneous multi-core processor should cover both a wide and an evenly spaced range of the complexity/performance design space. In this study, we consider a design that takes a series of previously implemented processor cores with slight changes to their interface – this choice reflects one of the key advantages of the CMP architecture, namely the effective amortization of design and verification effort.

We include four Alpha cores – the EV4 (Alpha 21064), EV5 (Alpha 21164), EV6 (Alpha 21264), and a single-threaded version of the EV8 (Alpha 21464), referred to as EV8-. These cores demonstrate a strict gradation in complexity and are capable of sharing a single executable. We assume the four cores have private L1 data and instruction caches and share a common L2 cache, phase-lock loop circuitry, and pins.

We chose the cores of these off-the-shelf processors due to the availability of real power and area data for these processors, except for the EV8, where we use projected numbers [39, 44, 79, 100]. All these processors feature 64-bit architectures. Note that mapping cores from one process generation to another has been shown to be feasible across a few generations [82].

Figure IV.1 shows the relative sizes of the cores used in the study, assuming they are all implemented in a 0.10 micron technology (the methodology used to obtain this figure is described in the next section). It can be seen that the resulting processor is only modestly (within 15%) larger than the EV8- core by itself.

For this research, to simplify the initial analysis of this new execution paradigm, we assume only one application runs at a time on only one core. This design point could either represent an environment targeted at a single application at a time, or model policies that might be employed when a multithreaded multi-core configuration lacks thread parallelism.

[Figure IV.1 appears here: the relative die areas of the EV4, EV5, EV6, and EV8- cores, drawn to scale.]

Figure IV.1: Relative sizes of the Alpha cores when implemented in 0.10 micron technology

Because we assume a maximum of one thread running, the multithreaded features of the EV8 are not needed. Hence, these are subtracted from the model, as discussed in Section IV.D. In addition, this assumption means that we do not need more than one of any core type. Finally, since only one core is active at a time, we implement cache coherence by ensuring that dirty data is flushed from the current core's L1 data cache before execution is migrated to another core.

This particular choice of architectures also gives a clear ordering in both power dissipation and expected performance. This simplifies the design of core-switching algorithms and is a natural outcome of choosing existing cores from a family. However, in Chapter V, we consider a more unordered set of cores.

IV.C Switching applications between cores

Typical programs go through phases with different execution characteristics [115, 131]. Therefore, the best core during one phase may not be best for the next phase. This observation motivates the ability to dynamically switch cores in mid-execution to take full advantage of our heterogeneous architecture.

There is a cost to switching cores, so we must restrict the granularity of switching. One method for doing this would switch only at operating system timeslice intervals, when execution is in the operating system, with user state already saved to memory. If the OS decides a switch is in order, it powers up the new core, triggers a cache flush to save all dirty cache data to the shared L2, and signals the new core to start at a predefined OS entry point. The new core would then power down the old core and return from the timer interrupt handler. The user state saved by the old core would be loaded from memory into the new core at that time, as a normal consequence of returning from the operating system.
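The switching sequence just described can be sketched as below. The four helper functions are placeholders for hardware and OS operations (power sequencing, the cache flush, and the jump to the OS entry point), not an actual interface; they only print what would happen so the sketch is runnable.

    def switch_core(old_core, new_core):
        """Core switch at an OS timeslice boundary, following the sequence in the text."""
        power_up(new_core)                  # ~1000 cycles for the power buses to stabilize
        flush_dirty_l1(old_core)            # write dirty L1 data back to the shared L2
        start_at_os_entry(new_core)         # new core resumes at the predefined OS entry point
        power_down(old_core)                # old core is fully powered off (no leakage)
        # User state is then reloaded from memory on the new core as a normal consequence
        # of returning from the timer interrupt handler.

    def power_up(core):          print(f"power up {core}")
    def flush_dirty_l1(core):    print(f"flush dirty L1 lines of {core} to the shared L2")
    def start_at_os_entry(core): print(f"{core}: start at OS entry point")
    def power_down(core):        print(f"power down {core}")

    switch_core("EV6", "EV4")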

Alternatively, we could switch to different cores at the granularity of the entire application, possibly chosen statically. In this study, we consider both of these options.

In this work, we assume that unused cores are completely powered down, rather than left idle. Thus, unused cores suffer no static leakage or dynamic switching power. This does, however, introduce a latency for powering a new core up. We estimate that a given processor core can be powered up in approximately one thousand cycles of the 2.1 GHz clock. This assumption is based on the observation that when we power down a processor core, we do not power down the phase-lock loop that generates the clock for the core. Rather, in our multi-core architecture, the same phase-lock loop generates the clocks for all cores. Consequently, the power-up time of a core is determined by the time required for the power buses to charge and stabilize. In addition, to avoid injecting excessive noise on the power bus bars of the multi-core processor, we assume a staged power-up would be used.

Furthermore, our experiments confirm that switching cores at operating-system timer intervals ensures that the switching overhead has almost no impact on performance, even with the most pessimistic assumptions about power-up time, software overhead, and cache cold-start effects. These overheads are nevertheless modeled in our experiments.

IV.D Evaluation Methodology

This section discusses the various methodological challenges of this research, including modeling the power, the real estate, and the performance of the heterogeneous multi-core architecture.

IV.D.1 Modeling of CPU Cores

The cores we simulate are roughly modeled after the EV4 (Alpha 21064), EV5 (Alpha 21164), EV6 (Alpha 21264), and EV8-. EV8- is a hypothetical single-threaded version of the EV8 (Alpha 21464). The data on the resources for the EV8 was based on predictions made in [44] and by Artur Klauser [79], on conversations with people from the Alpha design team, and on other reported data [39, 100]. The data on the resources of the other cores are based on published literature on these processors [17, 18, 19].

The multi-core processor is assumed to be implemented in a 0.10 micron technology. The cores have private first-level caches and share an on-chip 3.5 MB, 7-way set-associative L2 cache. At 0.10 micron, this cache will occupy an area just under half the die size of the Pentium 4. All the cores are assumed to run at 2.1 GHz. This is the frequency at which an EV6 core would run if its 600 MHz, 0.35 micron implementation were scaled to a 0.10 micron technology. In the Alpha design, the amount of work per pipe stage was relatively constant across processor generations [28, 40, 44, 52]; therefore, it is reasonable to assume the cores can all be clocked at the same rate when implemented in the same technology (if not as designed, then processors with similar characteristics certainly could be). The input voltage for all the cores is assumed to be 1.2 V.

Note that while we took care to model real architectures that have been available in the past, these can be considered just initial sample design points in the continuum of processor designs that could be integrated into a heterogeneous multiple-core architecture. These existing designs already display the diversity of performance and power consumption desired. However, a custom or partially custom design would have much greater flexibility in ensuring that the performance and power space is covered in the most appropriate manner. We discuss custom designs in Chapter V.

Table IV.1: Configuration of the cores used for power evaluation of heterogeneous multi-cores

Processor          EV4           EV5          EV6               EV8-
Issue-width        2             4            6 (OOO)           8 (OOO)
I-Cache            8KB, DM       8KB, DM      64KB, 2-way       64KB, 4-way
D-Cache            8KB, DM       8KB, DM      64KB, 2-way       64KB, 4-way
Branch Pred.       2KB, 1-bit    2K-gshare    hybrid 2-level    hybrid 2-level (2X EV6 size)
Number of MSHRs    2             4            8                 16

Table IV.1 summarizes the configurations that were modeled for the various cores. All architectures are modeled as accurately as possible, given the parameters in Table IV.1, on a highly detailed instruction-level simulator. However, we did not faithfully model every detail of each actual architecture; we were most concerned with modeling the approximate space each core covers in our complexity/performance continuum.

Specific instances of deviations from exact design parameters include the following. The associativity of the EV8- caches is double the associativity of the equally-sized EV6 caches. The EV8- uses a tournament predictor double the size of the EV6 branch predictor. All the caches are assumed to be non-blocking, but the number of MSHRs is assumed to double with successive cores to adjust to increasing issue width. All the out-of-order cores are assumed to have re-order buffers and load/store queues large enough to ensure no conflicts for these structures.

The various miss penalties and L2 cache access latencies for the simulated cores were determined using CACTI. CACTI [119] provides an integrated model of cache access time, cycle time, area, aspect ratio, and power. To calculate the penalties, we used CACTI to get access times and then added one cycle each for L1 miss detection, going to the L2, and coming from the L2.

For calculating the L2 access time, we assume that the L2 data and tag accesses are serialized, so that the data memories do not have to be cycled on a miss and only the required set is cycled on a hit. Memory latency was set to 150 ns.

IV.D.2 Modeling Power

Modeling power for this type of study is a challenge. We need to consider cores designed over a time span of more than a decade. Power depends not only on the configuration of a processor, but also on the circuit design style and process parameters. Also, actual power dissipation varies with activity, though the degree of variability again depends on the technology parameters as well as the gating style used.

No existing architecture-level power modeling framework accounts for all of these factors. Current power models like Wattch [30] are primarily meant for activity-based architecture-level power analysis and optimization within a single processor generation, not as a tool to compare the absolute power consumption of widely varied architectures. We integrated Wattch into our architectural simulator and simulated the configurations of the various cores implemented in their original technologies to get an estimate of the maximum power consumption of these cores as well as the typical power consumption when running various applications. We found that Wattch did not, in general, reproduce published peak and typical power for the variety of processor configurations we are using. Therefore, we use a hybrid power model that uses estimates from Wattch, along with additional scaling and offset factors to calibrate for technology factors. This model not only accounts for activity-based dissipation, but also accounts for design style and process parameter differences by relying on measured data points from the manufacturers.

To solve for the calibration factors, this methodology requires peak and typical power values for the actual processors and the corresponding values reported by Wattch. This allows us to establish scaling factors that use the output of Wattch to estimate the actual power dissipation within the expected range for each core. To obtain the values for the processor cores, we derive the values from the literature. For the corresponding Wattch values, we estimate peak power for each core given peak activity assumptions for all the hardware structures, and use the simulator to derive the typical power consumed for SPEC2000 benchmarks. This methodology then both reproduces published results and scales reasonably accurately with activity. While this is not a perfect power model, it will be far more accurate than using Wattch alone, or relying simply on reported average power.

We now detail the methodology for estimating the peak power dissipation of the cores. Table IV.2 shows our power and area estimates for the cores. We start with the peak power data of the processors obtained from data sheets and conference publications [17, 18, 19, 39, 79]. To derive the peak power dissipation in the core of a processor from the published numbers, the power consumed in the L2 caches and at the output pins of the processor must be subtracted from the published value. Power consumption in the L2 caches under peak load was determined using CACTI, starting by finding the energy consumed per access and dividing by the effective access time. Details on bitouts, the extent of pipelining during accesses, etc. were obtained from data sheets (except for the EV8-). For the EV8 L2, we assume 32-byte (288 bits including ECC) transfers on reads and writes to the L1 cache. We also assume the L2 cache is doubly pumped. The power dissipation at the output pins is calculated using the formula P = (1/2) C V² f, where V is the bus voltage, f the effective bus frequency, and C the load capacitance, all obtained from data sheets.
Effective bus frequency was calculated by dividing the peak bandwidth of the data bus by the maximum number of data output pins which are active per cycle. The address bus was assumed to operate at the same effective frequency. For processors like the EV4, the effective frequency of the bus connecting to the off-chip cache is different from the effective frequency of the system bus, so power must be calculated separately for those buses. We assume the probability that a bus line changes state is 0.5. For calculating the power at the output pins of EV8, we used the projected values for V and f. We assumed that half of the pins are input pins. Also, we assume that pin capacitance scales as the square root of the technology scaling factor. Due to its reduced resources, we assumed that the EV8- core consumes 80% of the calculated EV8 core power. This reduction is primarily due to smaller issue queues and register files.

The power data was then scaled to the 0.10 micron process. For scaling, we assumed that power dissipation varies directly with frequency, quadratically with input voltage, and is proportional to feature size. The second column in Table IV.2 summarizes the power consumed by the cores at 0.10 micron technology. As can be seen from the table, the EV8- core consumes almost 20 times the peak power and more than 80 times the real estate of the EV4 core.

CACTI was also used to derive the energy per access of the shared L2 cache, for use in our simulations. We also estimated power dissipation at the output pins of the L2 cache due to L2 misses. For this, we assume 400 output pins. We assume a load capacitance of 50pF and a bus voltage of 2.5V. Again, an activity factor of 0.5 for bus-line transitions is assumed. We also ran some experiments with a detailed model of off-chip memory access power, but found that the level of off-chip activity is largely constant across cores, and did not impact our results.

Table IV.2: Power and area statistics of the Alpha cores

Core     Peak-power (Watts)   Core-area (mm2)   Typical-power (Watts)   Range (%)

EV4      4.97                 2.87              3.73                    92-107
EV5      9.83                 5.06              6.88                    89-109
EV6      17.80                24.5              10.68                   86-113
EV8-     92.88                236               46.44                   82-128

Values for typical power are more difficult to obtain, so we rely on a variety of techniques and sources to arrive at these values. Typical power for the EV6 and EV8- assumes similar peak-to-typical ratios as published data for Intel processors of the same generation (the 0.13 micron Pentium 4 [70] for EV8-, and the 0.35 micron late-release Pentium Pro [57, 73] for the EV6). EV4 and EV5 typical power is extrapolated from these results and available thermal data [17, 18] assuming an approximately linear increase in power variation over time, due to wider-issue processors and the increased application of clock gating. These typical values are then scaled in similar ways to the peak values (but using measured typical activity) to derive the power for the cores alone. Table IV.2 gives the derived typical power for each of our cores. Also shown, for each core, is the range in power demand for the actual applications we run, expressed as a percentage of typical power.

While our methodology includes several assumptions based on common rules of thumb used in typical processor design, we performed several sensitivity experiments with widely different assumptions about the range of power dissipation in the core. Our results show very little difference in the qualitative results in this research. For any reasonable assumptions about the range, the power differences between cores still dominate the power differences between applications on the same core. Furthermore, as noted previously, the cores can be considered as just sample design points in the continuum of processor designs that could be integrated into a heterogeneous multiple-core architecture.
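To make the calibration procedure concrete, the sketch below (Python) shows one way the hybrid model described above could be organized. It is illustrative only: the function names and the linear scale-and-offset form are assumptions of this sketch, not details taken from the study; only the pin-power formula P = (1/2)CV^2f with a 0.5 bus-activity factor and the frequency/voltage/feature-size scaling rules come from the text.

    # Minimal sketch of the hybrid power-modeling steps described above.
    # Names, the linear calibration form, and any constants are illustrative assumptions.

    def pin_power_watts(load_capacitance_f, bus_voltage_v, effective_bus_freq_hz,
                        state_change_prob=0.5):
        # P = 1/2 * C * V^2 * f, further weighted by the probability that a bus line
        # changes state (0.5 in the text); folding both factors in is this sketch's
        # reading of the methodology.
        return 0.5 * load_capacitance_f * bus_voltage_v ** 2 * \
               effective_bus_freq_hz * state_change_prob

    def scale_power_to_target_process(power_w, freq_ratio, voltage_ratio, feature_ratio):
        # Scaling rules from the text: power varies directly with frequency,
        # quadratically with supply voltage, and proportionally with feature size.
        return power_w * freq_ratio * voltage_ratio ** 2 * feature_ratio

    def calibrate(wattch_peak_w, wattch_typical_w, published_peak_w, published_typical_w):
        # Solve for a per-core (scale, offset) pair so that Wattch's peak and typical
        # estimates reproduce the published peak and typical core power.
        scale = (published_peak_w - published_typical_w) / (wattch_peak_w - wattch_typical_w)
        offset = published_peak_w - scale * wattch_peak_w
        return scale, offset

    def calibrated_power(wattch_estimate_w, scale, offset):
        # Map an activity-dependent Wattch estimate into the expected range for a core.
        return scale * wattch_estimate_w + offset

Under this reading, the per-core (scale, offset) pairs would be solved once from the published and Wattch-reported peak/typical values, and then applied to the per-interval Wattch numbers produced during simulation.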

IV.D.3 Estimating Chip Area

Table IV.2 also summarizes the area occupied by the cores at 0.10 micron (also shown in Figure IV.1). The area of the cores (except EV8-) is derived from published photos of the dies after subtracting the area occupied by I/O pads, interconnection wires, the bus-interface unit, L2 cache, and control logic. The area of the L2 cache of the multi-core processor is estimated using CACTI. The die size of EV8 was predicted to be 400 mm2 [110]. To determine the core size of EV8-, we subtract out the estimated area of the L2 cache (using CACTI). We also account for the reduction in the size of register files, instruction queues, reorder buffer, and renaming tables for the single-threaded EV8-. For this, we use detailed models of the register bit equivalents (rbe) [104] for register files, reorder buffer, and renaming tables at the original and reduced sizes. The original and reduced instruction queue sizes were estimated from examination of MIPS R10000 and HP PA-8000 data [32, 84], assuming that area grows more than linearly with the number of entries (as num_entries^1.5). The area data is then scaled for the 0.10 micron process.
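As a small illustration of the instruction-queue sizing rule just described, the sketch below scales a reference queue area superlinearly with the number of entries; the 1.5 exponent is from the text, while the reference numbers and function name are placeholders of ours.

    def scaled_iq_area(ref_area_mm2, ref_entries, new_entries):
        # Instruction-queue area is assumed to grow as (number of entries)^1.5.
        return ref_area_mm2 * (new_entries / ref_entries) ** 1.5

    # Example: halving a queue from 128 to 64 entries shrinks its area by roughly 2.8x.
    print(scaled_iq_area(1.0, 128, 64))   # ~0.354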

IV.D.4 Modeling Performance

In this study, we simulate the execution of 14 benchmarks from the SPEC2000 benchmark suite, including 7 from SPECint and 7 from SPECfp. These are listed in Table IV.3.

Table IV.3: Benchmarks simulated for power evaluation of heterogeneous multi-cores

Program    Description

ammp       Computational Chemistry
applu      Parabolic/Elliptic Partial Differential Equations
apsi       Meteorology: Pollutant Distribution
art        Image Recognition/Neural Networks
bzip2      Compression
crafty     Game Playing: Chess
eon        Computer Visualization
equake     Seismic Wave Propagation Simulation
fma3d      Finite-element Crash Simulation
gzip       Compression
mcf        Combinatorial Optimization
twolf      Place and Route Simulator
vortex     Object-oriented Database
wupwise    Physics/Quantum Chromodynamics

Benchmarks are simulated using SMTSIM [127]. The simulator was modified to simulate a multi-core processor comprising four heterogeneous cores sharing an on-chip L2 cache and the memory subsystem. In all simulations in this research we assume a single thread of execution running on one core at a time. Switching execution between cores involves flushing the pipeline of the “active” core and writing back all its dirty L1 cache lines to the L2 cache. The next instruction is then fetched into the pipeline of the new core. The execution time and energy of this overhead, as well as the startup effects on the new core, are accounted for in our simulations of the dynamic switching heuristics in Section IV.E. The SimPoint tool [116] is used to determine the number of committed instructions which need to be fast-forwarded so as to capture representative program behavior during simulation. After fast-forwarding, we simulate 1 billion instructions. All benchmarks are simulated using ref inputs.
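A rough sketch of how the switching overhead just described might be charged in a simulator is shown below. The cost components (draining the active core's pipeline and writing back each dirty L1 line to the L2) follow the text; the latencies, names, and defaults are hypothetical placeholders of this sketch, and cold-start misses on the new core are assumed to be charged naturally as it warms up.

    def core_switch_cycles(dirty_l1_lines, pipeline_drain_cycles=100,
                           l1_to_l2_writeback_cycles=12):
        # Hypothetical accounting: flush/drain the active core's pipeline, then write
        # back every dirty L1 line to the shared L2 before resuming on the new core.
        return pipeline_drain_cycles + dirty_l1_lines * l1_to_l2_writeback_cycles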

IV.E Scheduling for Power: Analysis and Results

This section examines the effectiveness of single-ISA heterogeneous multi- core designs in reducing the power dissipation of processors. We first examine the relative energy efficiency across cores, and how it varies by application and phase. Later sections use this variance, demonstrating both oracle and realistic core switching heuristics to maximize particular objective functions.

IV.E.1 Variation in Core Performance and Power

As discussed in Section IV.A, this work assumes that the performance ratios between our processor cores are not constant, but vary across benchmarks, as well as over time on a single benchmark. This section verifies that premise.

Figure IV.2(a) shows the performance, measured in millions of instructions committed per second (IPS), of one representative benchmark, applu. In the figure, a separate curve is shown for each of the four cores, with each data point representing the IPS over the preceding 1 million committed instructions.

Figure IV.2: (a) Performance of applu on the four cores (b) Oracle switching for energy (c) Oracle switching for energy-delay product.

With applu, there are very clear and distinct phases of performance on each core, and the relative performance of the cores varies significantly between these phases. Nearly all programs show clear phased behavior, although the frequency and variety of phases vary significantly. If the relative performance of the cores varies over time, it follows that energy efficiency will also vary.


Figure IV.3: applu energy efficiency. IPS^2/Watt varies inversely with energy-delay product

Figure IV.3 shows one metric of energy efficiency (defined in this case as IPS^2/Watt) of the various cores for the same benchmark. IPS^2/Watt is merely the inverse of the energy-delay product. The energy-delay product [53], as the name suggests, is the product of the average energy consumed by an instruction during its execution and the time of execution. The lower the energy-delay product, the more efficient a core is. As can be seen, the relative value of the energy-delay product among cores, and even the ordering of the cores, varies from phase to phase.
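The inverse relationship stated above follows directly from the definitions; the notation below is ours, introduced only for this derivation. With E_inst the average energy per committed instruction and 1/IPS the average time per committed instruction, average power is W = E_inst * IPS, so

\[
\frac{\mathrm{IPS}^2}{W} \;=\; \frac{\mathrm{IPS}^2}{E_{\mathrm{inst}}\cdot \mathrm{IPS}} \;=\; \frac{1}{E_{\mathrm{inst}}\cdot(1/\mathrm{IPS})} \;=\; \frac{1}{\text{energy-delay product}}.
\]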

IV.E.2 Oracle Heuristics for Dynamic Core Selection

This section examines the limits of power and efficiency improvements possible with a heterogeneous multi-core architecture. The ideal core-selection algorithm depends heavily on the particular goals of the architecture or application. This section demonstrates oracle algorithms that maximize two sample objective functions. The first optimizes for energy efficiency with a tight performance threshold. The second optimizes for energy-delay product with a looser performance constraint.

These algorithms assume perfect knowledge of the performance and power characteristics at the granularity of intervals of one million instructions (corresponding roughly to an OS time-slice interval). It should be noted that choosing the core that minimizes energy or the energy-delay product over each interval subject to performance constraints does not give an optimal solution for the global energy or energy-delay product; however, the algorithms do produce good results.

The first oracle that we study seeks to minimize the energy per committed instruction (and thus, the energy used by the entire program). For each interval, the oracle chooses the core that has the lowest energy consumption, given the constraint that performance must always be maintained within 10% of the EV8- core for each interval. This constraint assumes that we are willing to give up performance to save energy, but only up to a point.

Figure IV.2(b) shows the core selected in each interval for applu. For applu, we observe that the oracle chooses to switch to EV6 in several phases even though EV8- performs better. This is because EV6 is the lower power-consuming core and still performs within the threshold. The oracle even switches to EV4 and EV5 in a small number of phases.

Table IV.4 shows the results for all benchmarks. In this, and all following results, performance degradation and energy savings are always given relative to EV8- performance. As can be seen, this heuristic achieves an average energy reduction of 38% (see the energy savings column) with less than 4% average performance degradation (the performance loss column). Five benchmarks (ammp, fma3d, mcf, twolf, crafty) achieve no gain because switching was denied by the performance constraint. Excluding these benchmarks, the heuristic achieves an average energy reduction of 60% with about 5% performance degradation.

Our second oracle utilizes the energy-delay product metric. The energy-delay product seeks to characterize the importance of both energy and response time in a single metric, under the assumption that they have equal importance. Our oracle minimizes the energy-delay product by always selecting the core that maximizes IPS^2/Watt over an interval. We again impose a performance threshold, but relax it due to the fact that the energy-delay product already accounts for performance degradation. In this case, we require that each interval maintains performance within 50% of EV8-.

Figure IV.2(c) shows the cores chosen for applu. Table IV.5 shows the results for all benchmarks. As can be seen, the average reduction in energy-delay is about 63%; the average energy reduction is 73% and the average performance degradation is 22%. All but one of the fourteen benchmarks have fairly significant

(47% to 78%) reductions in energy-delay. The corresponding reductions in performance range from 4% to 45%. As before, switching activity and the usage of the cores varies. This time, EV8- never gets used; EV6 emerges as the dominant core. Given our relaxed performance constraint, there is greater usage of the lower-power cores compared to the previous experiment.

Both Tables IV.4 and IV.5 also show results for Energy-Delay2 [135] improvements. Improvements are 35-50% on average. This is instructive because chip-wide voltage/frequency scaling can do no better than break even on this metric, demonstrating that this approach has the potential to go well beyond the capabilities of that technique. In other experiments specifically targeting the ED2 metric (again with the 50% performance threshold), we saw a 53.3% reduction in energy-delay2 with 14.8% degradation in performance.
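The per-interval decisions made by the two oracles above reduce to the selection rules sketched below (Python). Perfect per-interval knowledge is assumed, exactly as in the text; the data layout, callback-free structure, and function names are this sketch's own.

    # stats[core] = {"ips": ..., "watts": ...} for one 1M-instruction interval
    # (perfect knowledge, as the oracles assume).

    def pick_core_min_energy(stats, perf_threshold=0.90, reference_core="EV8-"):
        # Energy oracle: lowest energy per committed instruction, subject to keeping
        # per-interval performance within 10% of EV8- (threshold = 0.90).
        ref_ips = stats[reference_core]["ips"]
        eligible = [c for c, s in stats.items() if s["ips"] >= perf_threshold * ref_ips]
        return min(eligible, key=lambda c: stats[c]["watts"] / stats[c]["ips"])  # joules/instruction

    def pick_core_min_energy_delay(stats, perf_threshold=0.50, reference_core="EV8-"):
        # Energy-delay oracle: highest IPS^2/Watt (i.e. lowest energy-delay product),
        # subject to per-interval performance within 50% of EV8-.
        ref_ips = stats[reference_core]["ips"]
        eligible = [c for c, s in stats.items() if s["ips"] >= perf_threshold * ref_ips]
        return max(eligible, key=lambda c: stats[c]["ips"] ** 2 / stats[c]["watts"])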

IV.E.3 Static Core Selection

This section examines the necessity of dynamic switching between cores by measuring the effectiveness of an oracle-based static assignment of benchmark to core (for just one of our sample objective functions). This models a system that accurately selects a single core to run for the duration of execution, perhaps based on compiler analysis, profiling, past history, or simple sampling.

Table IV.6 summarizes the results when a static oracle selects the best core for energy. As in the earlier dynamic results, a performance threshold (this time over the duration of the benchmark) is applied. EV6 is the only core other than EV8- which gets used, because of the stringent performance constraint. Average energy savings is 32%. Excluding the benchmarks which remain on EV8-, average energy savings is 74.3%. Average performance degradation is 2.6%. This low performance loss leads to particularly high savings for both energy-delay and energy-delay2: average energy-delay savings is 31% and average energy-delay2 savings is 30%.

Table IV.4: Summary for dynamic oracle switching for energy on heterogeneous multi-cores

Benchmark   Total        % of instructions per core        Energy        ED            ED2           Perf.
            switches     EV4    EV5    EV6    EV8-         savings(%)    savings(%)    savings(%)    loss(%)

ammp        0            0      0      0      100          0             0             0             0
applu       27           2.2    0.1    54.5   43.2         42.7          38.6          33.6          7.1
apsi        2            0      0      62.2   37.8         27.6          25.3          22.9          3.1
art         0            0      0      100    0            74.4          73.5          72.6          3.3
equake      20           0      0      97.9   2.1          72.4          71.3          70.1          3.9
fma3d       0            0      0      0      100          0             0             0             0
wupwise     16           0      0      99     1            72.6          69.9          66.2          10.0

bzip        13           0      0.1    84.0   15.9         40.1          38.7          37.2          2.3
crafty      0            0      0      0      100          0             0             0             0
eon         0            0      0      100    0            77.3          76.3          75.3          4.2
gzip        82           0      0      95.9   4.1          74.0          73.0          71.8          3.9
mcf         0            0      0      0      100          0             0             0             0
twolf       0            0      0      0      100          0             0             0             0
vortex      364          0      0      73.8   26.2         56.2          51.9          46.2          9.8

Average     1 (median)   0.2%   0%     54.8%  45.0%        38.5%         37.0%         35.4%         3.4%

Table IV.5: Summary for dynamic oracle switching for energy-delay on heterogeneous multi-cores

Benchmark   Total        % of instructions per core        Energy-delay   Energy        Energy-delay2   Perf.
            switches     EV4    EV5    EV6    EV8-         savings(%)     savings(%)    savings(%)      loss(%)

ammp        0            0      0      100    0            63.7           70.3          55.7            18.1
applu       12           32.3   0      67.7   0            69.8           77.1          59.9            24.4
apsi        0            0      0      100    0            60.1           69.1          48.7            22.4
art         619          65.4   0      34.5   0            78.0           84.0          69.6            27.4
equake      73           55.8   0      44.2   0            72.3           81.0          59.2            31.7
fma3d       0            0      0      100    0            63.2           73.6          48.9            28.1
wupwise     0            0      0      100    0            68.8           73.2          66.9            10.0

bzip        18           0      1.2    98.8   0            60.5           70.3          47.5            24.8
crafty      0            0      0      100    0            55.4           69.9          33.9            32.5
eon         0            0      0      100    0            76.2           77.3          75.3            4.2
gzip        0            0      0      100    0            74.6           75.7          73.5            4.2
mcf         0            0      0      100    0            46.9           62.8          37.2            24.3
twolf       0            0      0      100    0            26.4           59.7          -34.2           45.2
vortex      0            0      0      100    0            68.7           73.0          66.7            9.9

Average     0 (median)   11.0%  0.1%   88.9%  0%           63.2%          72.6%         50.6%           22.0%

Table IV.6: Oracle heuristic for static core selection on heterogeneous multi-cores – energy metric. Rightmost two columns are for dynamic selection

                     Static Selection                                                 Dynamic Selection
Benchmark   Core     Energy        Energy-delay   Energy-delay2   Perf.        Energy        Perf.
                     savings(%)    savings(%)     savings(%)      loss(%)      savings(%)    loss(%)

ammp        EV8-     None          None           None            None         36.1          10.0
applu       EV8-     None          None           None            None         49.9          10.0
apsi        EV8-     None          None           None            None         42.9          10.0
art         EV6      74.4          73.5           72.6            3.3          75.7          10.0
equake      EV6      73.4          72.3           70.8            4.5          74.4          10.0
fma3d       EV8-     None          None           None            None         28.1          10.0
wupwise     EV6      73.2          70.5           66.9            10.0         49.5          10.0

bzip        EV8-     None          None           None            None         47.7          10.0
crafty      EV8-     None          None           None            None         17.6          10.0
eon         EV6      77.3          76.3           75.3            4.2          77.3          9.8
gzip        EV6      75.7          74.6           73.5            4.3          76.0          10.0
mcf         EV8-     None          None           None            None         19.9          10.0
twolf       EV8-     None          None           None            None         8.1           10.0
vortex      EV6      73.0          70.3           66.7            9.9          52.0          10.0

Average     -        31.9%         31.3%          30.4%           2.6%         46.8%         10.0%

Also shown in Table IV.6 is a corresponding dynamic technique. For a fair comparison, we apply a global runtime performance constraint rather than a per-interval constraint. That is, any core can be chosen in an interval as long as the accumulated runtime up to this point (including all core choices made in earlier intervals) remains within 10% of running on the EV8- alone. This gives a fairer comparison with the static technique.

IV.E.4 Realistic Dynamic Switching Heuristics

This section examines the extent to which the energy benefits seen in the earlier sections can be achieved with a real system implementation that does not depend on oracular future knowledge. We do, however, assume an ability to track both the accumulated performance and energy over a past interval. This functionality either already exists or is easy to implement. This section is intended to be an existence proof of effective core selection algorithms, rather than a complete evaluation of the switching design space. We only demonstrate a few simple heuristics for selecting the core to run on. The heuristics seek to minimize the overall energy-delay product during program execution.

Our previous oracle results were idealized not only with respect to switching algorithms, but also ignored the cost of switching (power-up time, flushing dirty cache lines to the L2 cache, and experiencing cold-start misses in the new L1 cache and TLB) both in performance and power. The simulations in this section account for both, although our switching intervals are long enough and switches infrequent enough that the impact of both effects is under 1%.

In this section, we measure the effectiveness of several heuristics for selecting a core. The common elements of each of the heuristics are these: every 100 time intervals (one time interval consists of 1 million instructions in these experiments), one or more cores are sampled for five intervals each (with the results during the first interval ignored to avoid cold-start effects). Based on measurements made during sampling, the heuristic selects one core. For the case when one other core is sampled, the switching overhead is incurred once if the new core is selected, or twice if the old core is chosen. The switching overhead is greater if more cores are sampled. The dynamic heuristics studied here are:

• neighbor. One of the two neighboring cores in the performance continuum is randomly selected for sampling. A switch is made if that core has lower energy-delay over the sample interval than the current core over the last run interval.

• neighbor-global. Similar to neighbor, except that the selected core is the one that would be expected to produce the lowest accumulated energy-delay product to this point in the application’s execution. In some cases this is different than the core that minimizes the energy-delay product for this interval.

• random. One other randomly-chosen core is sampled, and a switch is made if that core has lower energy-delay over the sample interval.

• all. All other cores are sampled.
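A sketch of the common sampling-and-selection loop, instantiated for the neighbor heuristic, is given below. The 100-interval period, the five sampled intervals with the first discarded, and the energy-delay comparison come from the description above; the function names, the sample_core_fn callback (which is assumed to return per-interval ips and watts for a sampled core), and the omission of the switching-overhead charge are simplifications of this sketch.

    import random

    def neighbor_heuristic(cores, current, run_stats, sample_core_fn):
        # cores is ordered along the performance continuum, e.g. ["EV4", "EV5", "EV6", "EV8-"].
        # Every 100 intervals: sample one randomly chosen neighboring core for five
        # 1M-instruction intervals (the first is dropped to avoid cold-start effects)
        # and switch if its energy-delay product beats the current core's last run.
        idx = cores.index(current)
        neighbors = [cores[i] for i in (idx - 1, idx + 1) if 0 <= i < len(cores)]
        candidate = random.choice(neighbors)

        samples = [sample_core_fn(candidate) for _ in range(5)][1:]   # drop cold-start interval
        cand_edp = sum(s["watts"] / s["ips"] ** 2 for s in samples) / len(samples)
        curr_edp = run_stats["watts"] / run_stats["ips"] ** 2

        return candidate if cand_edp < curr_edp else current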


Figure IV.4: Results for realistic switching heuristics for heterogeneous multi-cores - the last one is a constraint-less dynamic oracle

The results are shown in Figure IV.4. The results are all normalized to EV8- values. This figure also includes oracle results for dynamic switching based on the energy-delay metric when core selection is not hampered by performance constraints. Lower bars for energy and energy-delay, and higher bars for performance, are desirable.

Our heuristics achieve up to 93% of the energy-delay gains achieved by the oracle-based switcher, despite modeling the switching overhead, sampling overhead, and non-oracle selection. The performance degradation on applying our dynamic heuristics is, on average, less than the degradation found by the oracle-based scheme. Also, although not shown in the figure, there is a greater variety in core usage between applications.

It should be noted that switching for this particular objective function is not heavy; thus, heuristics that find the best core quickly, and minimize sampling overhead after that, tend to work best. The best heuristic for a different objective function, or for a dynamically varying objective function, may be different. These results do show, however, that for a given objective function, very effective real-time and predictive core switching heuristics can be found.

IV.E.5 Practical heterogeneous architectures

Although our use of existing cores limits design and verification overheads, these overheads do scale with the number of distinct cores supported. Some of our results indicate that in specific instances, two cores can introduce sufficient heterogeneity to produce significant gains. For example, the (minimize energy, maintain performance within 10%) objective function relied heavily on the EV8- and EV6 cores. The (energy-delay, performance within 50%) objective function favored the EV6 and EV4. However, if the objective function is allowed to vary over time, or if the workload is more diverse than what we model, heterogeneity wider than 2 cores will be useful. Presumably, objective functions other than those we model may also use more than 2 cores.

IV.F Summary

We introduce, and seek to gain some insights into, the energy benefits available from single-ISA heterogeneous multi-core architectures. The particular opportunity examined is a single application switching among cores to optimize some function of energy and performance.

We show that a sample heterogeneous multi-core design with four complexity-graded cores has the potential to increase energy efficiency (defined as energy-delay product, in this case) by a factor of three, in one experiment, without dramatic losses in performance. Energy efficiency improvements significantly outdistance chip-wide voltage/frequency scaling. It is shown that most of these gains are possible even when using as few as two cores. These results indicate not only that there is significant potential for this style of architecture, but also that reasonable runtime heuristics for switching cores, using limited runtime information, can achieve most of that potential.

IV.G Acknowledgment

The text of Chapter IV is in part a reprint of the material as it appears in the proceedings of the Thirty-sixth International Symposium on Microarchitecture (pp. 81-92, December 2003). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed the research which forms the basis for Chapter IV.

V

Holistic Design for Adaptability: Designing Heterogeneous Multi-cores From the Ground Up

This chapter first provides an overview of other proposals related to heterogeneous multi-cores and then discusses a holistic design methodology for heterogeneous multi-core architectures.

V.A Overview of Related Proposals

There have been other proposals studying the advantages of on-chip heterogeneity. This chapter provides an overview of other work directly dealing with heterogeneous multi-core architectures. We also discuss previous work with similar goals – i.e., adapting to workload diversity to improve processor efficiency.

Morad et al. [102] explore the theoretical advantages of placing asymmetric core clusters in multiprocessor chips. The asymmetric processor executes the serial phases of a parallel program on large high-performance cores, while the parallel phases get executed on the simpler, smaller cores. They show that

asymmetric core clusters are expected to achieve higher performance per area and higher performance for a given power envelope. The analysis is extended in [103] and experimental results are presented through emulation of such processors.

Annavaram et al. [23] evaluate the benefits of heterogeneous multiprocessing to minimize the execution times of multi-threaded programs containing nontrivial parallel and sequential phases, while keeping the CMP's total power consumption within a fixed budget. Whenever a parallel program is in its serial phase, it is executed on a core running at high frequency. During the parallel phases, threads are executed on cores after reducing their frequency to meet the power budget. They report significant speedups over a chip multiprocessor where all the cores are on during the entire execution of the program and their frequency is set constant to meet the power budget.

Balakrishnan et al. [25] seek to understand the impact of such an architecture on software. They show, using a hardware prototype, that asymmetry can have significant impact on the performance, stability, and scalability of a wide range of commercial applications. They also demonstrate that in addition to heterogeneity-aware kernels, several commercial applications may themselves need to be aware of heterogeneity at the hardware level.

The energy benefits of heterogeneous multi-core architectures are also explored by Ghiasi and Grunwald [50]. They consider single-ISA, heterogeneous cores of different frequencies belonging to the x86 family for controlling the thermal characteristics of a system. Applications run simultaneously on multiple cores, and the operating system monitors and directs applications to the appropriate job queues. They report significantly better thermal and power characteristics for heterogeneous processors.

Grochowski et al. [55] compare voltage/frequency scaling, asymmetric (heterogeneous) cores, variable-sized cores, and speculation as means to reduce the energy per instruction (EPI) during the execution of a program. They find that the EPI range for asymmetric chip multiprocessors using x86 cores was 4-6X, significantly more than the next best technique (which incidentally was voltage/frequency scaling).

There have also been proposals for multi-ISA heterogeneous multi-core architectures. The proposed Tarantula processor [45] is one such example of integrated heterogeneity. It consists of a large vector unit sharing the die with an EV8 core. The Alpha ISA is extended to include vector instructions that operate on the new architectural state. The unit is targeted towards applications with high data-level parallelism. IBM Cell [75] is another example of a heterogeneous chip multiprocessor with cores belonging to different ISAs (more details on Cell in Section II.B).

Having custom ISAs for different kinds of workloads can result in potentially higher benefits as compared to single-ISA heterogeneous multi-core architectures, especially when an “exact” workload-to-core mapping is possible. However, the downsides of this approach are that there is little design reuse between the cores, and also poor resource utilization when the application mix contains a balance different than that ideally suited to the underlying heterogeneous architecture. Single-ISA heterogeneous multi-core architectures do not have the same disadvantages because of the use of off-the-shelf cores which can all execute the same code, though with different performance.
Having the same ISA also enables them to adjust to dynamic changes in program behavior or workload mix. Also, compilation effort would arguably be larger for multi-ISA chips.

The main objective of heterogeneous multi-core architectures is to adapt to workload diversity. An alternative way of handling different amounts of parallelism is to configure the processor to the demands of the workload. There have been several papers on processor reconfiguration for minimizing power [22, 46, 49, 56, 71, 97, 99, 54, 108, 72, 114].

Gating-based power optimizations [22, 46, 49, 56, 71, 97, 99] provide the option to turn off (gate) portions of the processor core that are not useful for the currently executing phases of the workload. Similarly, voltage and frequency scaling reduces the parameters of the entire core [54, 108] or portions of the core [72, 114] to adapt to reduced workload demands. However, for all these techniques, benefits are limited by the granularity of structures that can be gated or scaled down, and by the inability to change the overall size and complexity of the processor. Also, these designs are still susceptible to static leakage inefficiencies. Furthermore, voltage and frequency scaling is fundamentally limited by the process technology in which the processor is built. Heterogeneous multi-core designs address both these deficiencies.

Single-ISA heterogeneous multi-core architectures also enhance throughput by addressing both low-TLP and high-TLP workloads using the same architecture. There have been other architectures proposed for the same purpose. Speculative multithreading architectures [98, 122, 121, 134, 35, 60, 129, 59], which employ additional contexts of a multi-context processor (chip multiprocessors or simultaneously multithreaded processors) to enhance single-thread performance, fall into this category. When TLP is high, all the contexts are used to run different threads; when TLP is low, the spare contexts are used to run speculative/helper threads to enhance the single-thread performance of the main thread(s). Single-ISA heterogeneous architectures differ from speculative multithreading architectures in that when TLP is low, single-thread performance is enhanced by running the application(s) on more powerful processors, and hence ensuring better performance, instead of relying on the goodness of speculative/helper threads to accelerate the main thread. However, we believe that speculative multithreading is a complementary approach, and one could envision heterogeneous speculative multithreading architectures where the speculative threads run on the slower processors while the primary threads run on the more powerful ones.

Little work has been done previously on reconfiguration of multi-purpose processors for maximizing throughput. The TRIPS [112] architecture contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for different granularities and types of parallelism. To adapt to small and large-grain concurrency, the architecture contains some number of out-of-order, wide-issue Grid Processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. Our work is different in that we primarily target commodity heterogeneity with no special demands on the compiler or the programmer. Also, we exploit not only inter-thread diversity, by matching each thread to the right core, but also intra-thread diversity, by dynamically switching applications between cores depending on the current job-execution profile. The potential and overheads of exploiting intra-thread diversity are unclear for TRIPS architectures.

Overall, having heterogeneous processor cores provides potentially greater power savings compared to previous approaches, and greater flexibility and scalability of architecture design. Moreover, these previous approaches can still be used in a multi-core processor to greater advantage.

V.B Benefits of Ground-up Design

While the previous proposals demonstrated the benefits of heterogeneity, they gave no insight into what constitutes, or how to arrive at, a good heterogeneous design. Previous work (including Section III.B and Chapter IV of this thesis) assumed a given heterogeneous architecture. More specifically, those architectures were composed of existing architectures, either different generations of the same processor family [90, 87, 50, 55], or voltage- and frequency-scaled editions of a single processor [23, 25, 51, 81]. While these architectures surpassed similar homogeneous designs, they failed to reach the full potential of heterogeneity, for three reasons. First, the use of pre-existing designs presents low flexibility in the choice of cores. Second, the core choices maintain a monotonic relationship, both in design and performance – for example, the most powerful core is bigger or more complex in every dimension, and the performance ordering of the cores is the same for every application. Third, all cores considered perform well for a wide variety of applications — we show that the best heterogeneous designs are composed of specialized core architectures.

A heterogeneous architecture, and particularly a fully custom heterogeneous processor not necessarily composed of pre-existing cores, incurs additional costs in design, verification, and testing. A key goal of the research presented in this chapter is to evaluate the full benefits of these architectures, so that this tradeoff can be more appropriately evaluated by processor manufacturers.

In actually deriving the best designs for a variety of multiprogramming workloads, power and area constraints, levels of threading, etc., this chapter makes three significant contributions. First, it re-evaluates the benefits of heterogeneity in power- and area-efficient architectures, showing new benefits and higher gains. Performance improvements of up to 40% are shown. Second, it demonstrates methodologies for arriving at good heterogeneous designs – we examine both those that find the best designs but do not scale well to larger design spaces, and those that scale yet still find good architectures. Third, by actually finding the best designs across many different assumptions and constraints, it identifies a number of key principles critical to the effective design of future chip multiprocessors.

V.C From Workloads to Multi-core Design

The goal of this research is to identify the characteristics of cores that combine to form the best heterogeneous architectures, and also demonstrate principles for designing such an architecture. Such a methodology would start with a set of applications and a set of constraints on the processor. It should then identify the best architecture for that workload, given some objective function to evaluate the goodness of an architecture.

Because this methodology requires that we accurately reflect the wide diversity of applications (their parallelism, their memory behavior), running on widely varying architectural parameters, there is no real shortcut to using simulation to characterize these combinations. The design space for even a single processor is large, given the flexibility to change various architectural parameters; however, the design space explodes when considering the combined performance of multiple different cores on arbitrary permutations of the applications. Hence, we make some simplifying assumptions that make this problem tractable so that we can navigate through the search space faster; however, we show that the resulting methodology still results in the discovery of very effective multi-core design points.

First, we assume that the performance of individual cores is separable – that is, that the performance of a four-core design, running four applications, is the sum (or the sum divided by a constant factor) of the individual cores running those applications in isolation. This is an accurate assumption if the cores do not share L2 caches or memory controllers (which we validate in Section V.G). However, we also show in Section V.G that this methodology still makes good design decisions with shared L2 caches for our workloads. This assumption dramatically accelerates the search because now the single-thread performance of each core (found using simulation) can be used to estimate the performance of the processor as a whole without the need to simulate all 4-thread permutations.

Since we are interested in the highest performance that a processor can offer, we assume good static scheduling of threads to cores. Thus, the performance of four particular threads on four particular cores is the performance of the best static mapping. However, this actually represents a lower bound on performance. Section III.B has shown that the ability to migrate threads dynamically during execution only increases the benefits of heterogeneity as it exploits intra-thread diversity – we show in Section V.F that it continues to hold true for the best heterogeneous designs that we come up with under the static scheduling assumption.

To further accelerate the search, we consider only major blocks to be configurable, and only consider discrete points. For example, we consider 2 instruction queue sizes (rather than all the intermediate values) and 4 cache configurations (per cache). But we consider only a single branch predictor, because the area/performance tradeoffs of different sizes had little effect in our experiments. Values that are expected to be correlated (e.g., size of re-order buffer and number of physical registers) are scaled together instead of separately. This methodology might appear to be crude for an important commercial design, but we believe that even in that environment this methodology would find a design very much in the neighborhood of the best design.
Then a more careful analysis could be done of the immediate neighborhood, considering structure sizes at a finer granularity and considering particular choices for smaller blocks we did not vary.

We only consider and compare processors with a fixed number (4) of cores. It would be interesting to also relax that constraint in our designs, but we did not do so for the following reasons. First, accurate comparisons would be more difficult, because the interconnect and cache costs would vary. Second, it is shown both in this work (Section V.F) and in previous work (Section III.B) that heterogeneous designs are much more tolerant than homogeneous designs when running a different number of threads than the processor is optimized for. However, the methodology shown here need only be applied multiple times (once for each possible core count) to fully explore the larger design space, assuming that an accurate model of the off-core resources is available.

The above assumptions allow us to model performance for various combinations of cores on various permutations of our benchmarks, thus evaluating the expected performance of the possible homogeneous and heterogeneous processors for various area and power budgets.

To search through the design space for a given set of workloads we follow two techniques – exhaustive search and efficient search. Our algorithm for finding the best design typically is an exhaustive search of all core combinations, accounting for every permutation of our benchmarks on each combination. This approach ensures that we do indeed find the best combination in each case. While this approach works for our workloads and architectural variables, considering more benchmarks and more architectural options will quickly make the exhaustive approach impractical. In Section V.F, we examine more efficient search algorithms and quantify how closely they come to identifying the best design.
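Under the separability and oracular static-scheduling assumptions above, the exhaustive search reduces to arithmetic over single-thread numbers. The sketch below (our own naming) shows that reduction; perf[core][app] would come from single-thread simulation, and area/power are the per-core estimates.

    from itertools import combinations_with_replacement, permutations

    def best_static_mapping(design, workload, perf):
        # Separable performance: score a 4-core design on a 4-thread workload as the best
        # (oracular static) assignment, i.e. the maximum over thread-to-core assignments
        # of the sum of single-thread scores.
        return max(sum(perf[core][app] for core, app in zip(design, assignment))
                   for assignment in permutations(workload))

    def exhaustive_search(core_library, workloads, perf, area, power,
                          area_budget, power_budget):
        # Enumerate all 4-core combinations (with repetition) that fit the budgets and
        # return the one with the best average score over the workloads.
        best_design, best_score = None, float("-inf")
        for design in combinations_with_replacement(core_library, 4):
            if sum(area[c] for c in design) > area_budget:
                continue
            if sum(power[c] for c in design) > power_budget:
                continue
            score = sum(best_static_mapping(design, w, perf) for w in workloads) / len(workloads)
            if score > best_score:
                best_design, best_score = design, score
        return best_design, best_score

With 480 candidate cores, combinations_with_replacement yields roughly 2.2 billion distinct 4-core designs, consistent with the design-space size quoted later in this chapter.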

V.D Customizing Cores to Workloads

One of the biggest advantages of creating a heterogeneous processor from the ground up is that the cores can be chosen in an unconstrained manner as long as the processor budgetary constraints are satisfied. We define monotonicity to be a property of a multi-core architecture where there is a total ordering among the cores in terms of performance and this ordering remains the same for all applications. For example, a multiprocessor consisting of EV5 and EV6 cores is a monotonic multiprocessor. This is because EV6 is strictly superior to EV5 in terms of hardware resources and virtually always performs better than EV5 for a given application, given the same cycle time and latencies. Similarly, for a multi-core architecture with identical cores, if the voltage/frequency of a core is set lower than the voltage/frequency of some other core, it will always provide less performance, regardless of application. Fully customized monotonic designs represent the upper bound (albeit a high one) on the benefits possible through previously proposed heterogeneous architectures.

As we show in this chapter, monotonic multiprocessors may not provide the “best fit” for various workloads and hence result in inefficient mapping of applications to cores. For example, in the results shown in [87], mcf, despite having very low ILP, consistently gets mapped to the EV6 or EV8- core for various energy-related objective functions, because of the larger caches on these cores. Yet it fails to take advantage of the complex execution capabilities of these cores, and thus still wastes energy unnecessarily.

Doing a custom design of a heterogeneous multi-core architecture allows the monotonicity constraint to be relaxed. That is, it is possible for a particular core of the multiprocessor to be the highest performing core for some applications but not for others. For example, if one core is in-order, scalar, with 8KB caches, and another core is out-of-order, dual-issue, with 16KB caches, applications will always run best on the latter. However, if the scalar core had 64KB L1 caches, then it might perform better for applications with low ILP and large working sets, while the other would likely be best for jobs with high ILP and smaller working sets. The advantage of non-monotonicity is that different cores on the same die can now be customized to different classes of applications, which was not the case with previously studied designs.
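Stated operationally, a set of cores is monotonic exactly when ranking the cores by performance yields the same ordering for every application. A small check of that property from a single-thread performance table (the notation and function name are ours):

    def is_monotonic(cores, apps, perf):
        # True if the performance ordering of the cores is identical for every application,
        # i.e. there is a single total order among the cores (a monotonic design).
        def ranking(app):
            return tuple(sorted(cores, key=lambda c: perf[c][app], reverse=True))
        reference = ranking(apps[0])
        return all(ranking(app) == reference for app in apps[1:])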

V.E Methodology

This section discusses the various methodological challenges of this study, including modeling power, real estate, and performance of the heterogeneous multi-core architectures.

V.E.1 Modeling of CPU Cores

For all our studies in this chapter, we model 4-core multiprocessors assumed to be implemented in 0.10 micron, 1.2V technology. Each core on a multiprocessor, either homogeneous or heterogeneous, has a private L2 cache, and each L2 bank has a corresponding memory controller. The ITRS roadmap [15] confirms that sufficient pins are available to support four memory controllers for the assumed technology. Assuming private L2 caches reduces the dimensions of the design space; however, we also consider a shared L2 cache (of the same total size) in Section V.G.

We consider both in-order cores and out-of-order cores for this study. We base our OOO processor microarchitecture model on the MIPS R10000, and our in-order cores on the Alpha EV5 (21164). We evaluate 480 cores as possible building blocks for constructing the multiprocessors. This represents all possible distinct cores that can be constructed by changing the parameters listed in Table V.1. The various values that were considered are listed in the table as well. We assumed a gshare branch predictor with 8K entries for all the cores. Out of these 480 cores, there are 96 distinct in-order cores and 384 distinct out-of-order cores. The number of distinct 4-core multiprocessors that can be constructed out of 480 distinct cores is over 2.2 billion. Other parameters that are kept fixed for all the cores are also listed in Table V.1. The various miss penalties and L2 cache access latencies for the simulated cores were determined using CACTI [119].

Table V.1: Various Parameters and their possible values for configuration of the cores

Issue-width                   1, 2, 4
I-Cache                       8KB-DM, 16KB-2way, 32KB-4way, 64KB-4way
D-Cache                       8KB-DM, 16KB-2way, 32KB-4way, 64KB-4way dual ported
FP-IntMul-ALU units           1-1-2, 2-2-4
IntQ-FpQ (OOO)                32-16, 64-32
Int-FP PhysReg-ROB (OOO)      64-64-32, 128-128-64
L2 Cache                      1MB/core, 4-way, 12-cycle access
Memory Channel                533MHz, doubly-pumped, RDRAM
ITLB-DTLB                     64, 128 entries
Ld/St Queue                   32 entries
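The 480 cores follow directly from Table V.1: 3 issue widths x 4 I-cache sizes x 4 D-cache sizes x 2 functional-unit mixes gives 96 in-order cores, and the out-of-order cores additionally vary 2 queue sizes and 2 register-file/ROB sizes, giving 384 more. A sketch of that enumeration (the Python names are ours):

    from itertools import product

    ISSUE_WIDTHS = [1, 2, 4]
    ICACHES = ["8KB-DM", "16KB-2way", "32KB-4way", "64KB-4way"]
    DCACHES = ["8KB-DM", "16KB-2way", "32KB-4way", "64KB-4way"]
    FU_MIXES = ["1-1-2", "2-2-4"]            # FP-IntMul-ALU units
    OOO_QUEUES = ["32-16", "64-32"]          # IntQ-FpQ
    OOO_REG_ROB = ["64-64-32", "128-128-64"] # Int-FP PhysReg-ROB

    in_order_cores = list(product(ISSUE_WIDTHS, ICACHES, DCACHES, FU_MIXES))
    out_of_order_cores = list(product(ISSUE_WIDTHS, ICACHES, DCACHES, FU_MIXES,
                                      OOO_QUEUES, OOO_REG_ROB))

    assert len(in_order_cores) == 96 and len(out_of_order_cores) == 384   # 480 cores total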

All evaluations are done for multiprocessors satisfying a given aggregate area and power budget for the 4 cores. We do not expect the memory and interconnection subsystem to vary significantly with the core type for a given number of cores. We also confirmed that the contribution of L2s to overall power consumption did not vary significantly between four-core designs taking up the same area, even when the total number of serviced memory requests differed. Hence, we do not concern ourselves with the area and power consumption of anything other than the cores for this study.

V.E.2 Modeling Power and Area

In this chapter, the area budget refers to the sum of the area of the 4 cores of a processor (the L1 cache being part of the core), and the power budget refers to the sum of the worst-case power of the cores of a processor. Specifically, we consider peak activity (dynamic) power, as this is a critical constraint in the architecture and design phase of a processor. Static power is not considered explicitly in this chapter (though it is typically proportional to area, which we do consider).

We model the peak activity power and area consumption of each of the key structures in a processor core using a variety of techniques. Table V.2 lists the methodology and assumptions used for estimating the area and power overheads of the various structures. Table V.3 shows the area and power values for various parameterized hardware structures. Notice that some of the structures listed are for out-of-order (OOO) cores only.

Table V.2: Area and power estimation methodology and relevant assumptions for various hardware structures. Renaming for OOO cores is assumed to be done using RAM tables. IW refers to issue width, WP to a write port, and RP to a read port.

Structure            Methodology       Assumptions

L1 caches            [119]             Parallel data/tag access
TLBs                 [119], [58]
RegFiles             [119], [104]      2 × IW RP, IW WP
Execution Units      [58]
RenameTables         [119], [104]      3 × IW RP, IW WP
ROBs                 [119]             IW RP, IW WP, 20b entry, 6b tag
IQs (CAM arrays)     [119]             IW RP, IW WP, 40b entry, 8b tag
Ld/St Queues         [119]             64b addressing, 40b data

To get total area and power estimates, we assume that the area and power of a core can be approximated as the sum of its major pieces. In reality, we expect that the unaccounted-for overheads will scale our estimates by constant factors; in that case, all our results will still be valid.

Figure V.1 shows the area and power of the 480 cores used for this study. As can be seen, the cores represent a significant range in terms of power (4.1-16.3W) as well as area (3.3-22mm2). For this study, we consider 4-core multiprocessors with different area and peak power budgets.

Table V.3: Derived Area and Power Estimates for Processor Components

Structure                              Area (mm2)                            Power (W)

8KB-DM cache                           0.4                                   0.638
16KB-2-way cache                       0.745                                 1.018
32KB-4-way cache                       1.495                                 1.744
64KB-4-way cache                       2.6                                   1.869
64KB-4-way dual-ported cache           5.05                                  3.932
64-entry ITLB                          0.119                                 0.126
128-entry ITLB                         0.238                                 0.186
16-entry InstQ (Int/FP)                0.063, 0.203, 0.721 (IW = 1, 2, 4)    0.129, 0.266, 0.565 (IW = 1, 2, 4)
32-entry InstQ (Int/FP)                0.086, 0.273, 0.991 (IW = 1, 2, 4)    0.144, 0.301, 0.655 (IW = 1, 2, 4)
64-entry InstQ                         0.16, 0.505, 2.596 (IW = 1, 2, 4)     0.186, 0.394, 0.899 (IW = 1, 2, 4)
32-entry lsQ single-port (Int/FP)      0.1                                   0.161
32-entry lsQ dual-ported (Int/FP)      0.319                                 0.333
Branch Predictor                       0.2                                   0.3
1 ALU                                  0.385                                 0.45
1 IntMul                               0.295                                 0.45
1 FPU                                  0.728                                 0.9
32-entry Regfile                       0.1, 0.339, 1.244 (IW = 1, 2, 4)      0.212, 0.439, 0.953 (IW = 1, 2, 4)
64-entry Regfile                       0.137, 0.411, 1.5 (IW = 1, 2, 4)      0.367, 0.854, 1.897 (IW = 1, 2, 4)
128-entry Regfile                      0.192, 0.611, 2.15 (IW = 1, 2, 4)     0.517, 1.154, 2.788 (IW = 1, 2, 4)
32-entry RAM Rename Table              0.049, 0.176, 0.668 (IW = 1, 2, 4)    0.137, 0.284, 0.606 (IW = 1, 2, 4)
32-entry ROB                           0.04, 0.158, 0.533 (IW = 1, 2, 4)     0.07, 0.157, 0.311 (IW = 1, 2, 4)
64-entry ROB                           0.06, 0.218, 0.753 (IW = 1, 2, 4)     0.1, 0.209, 0.451 (IW = 1, 2, 4)

There is a significant range in the area and power budgets of the 4-core multiprocessors that can be constructed out of these cores: area can range from 13.2mm2 to 88mm2, and power can range from 16.4W to 65.2W.

V.E.3 Modeling Performance

This section describes the workloads used for evaluation, the performance evaluation methodology, and the evaluation metric.

All our evaluations are done for multiprogrammed workloads. Table V.4 lists the ten benchmarks used for constructing workloads. Seven benchmarks are from the SPEC suite. These benchmarks are chosen in the following way.


Figure V.1: Area and Power of the cores

We simulated all 26 benchmarks from the SPEC suite for 250 million cycles using the EV5 processor model, after fast-forwarding for an appropriate number of instructions [117]. The benchmarks were then classified as processor bound or bandwidth bound based on the number of main memory references per instruction. Seven benchmarks were then chosen from these two sets in proportion to the occurrence of these classes of benchmarks in the SPEC suite. Hence, the chosen SPEC benchmarks are intended to represent the entire SPEC suite. We also chose groff, deltablue and adpcmc from the IBS, OOCSB and Mediabench suites, respectively. Choosing these three additional benchmarks recognizes the existence of other kinds of application behavior that are not displayed by the SPEC benchmarks, while still considering SPEC representative of a wide variety of applications.

Table V.4: Benchmarks used for design space exploration of heterogeneous multi-cores

Program     Description

ammp        Computational Chemistry
crafty      Game Playing: Chess
eon         Computer Visualization
mcf         Combinatorial Optimization
twolf       Place and Route Simulator
mgrid       Multi-grid Solver: 3D Potential Field
mesa        3-D Graphics Library
groff       Typesetting package
deltablue   Constraint Hierarchy Solver
adpcmc      Encoder for Adaptive Differential Pulse Code Modulation

Every multiprocessor is evaluated on two classes of workloads. The all different class consists of all possible 4-threaded combinations that can be constructed such that each of the 4 threads running at a time is different. The all same class consists of all possible 4-threaded combinations that can be constructed such that all of the 4 threads running at a time are the same. For example, a,b,c,d is an all different workload while a,a,a,a is an all same workload. This effectively brackets the expected diversity in any workload – including server, parallel, and multithreaded workloads. Hence, we expect our results to be generalizable across a wide range of environments.

As discussed before, there are over 2.2 billion distinct 4-core multiprocessors that can be constructed using our 480 distinct cores. We assume that the performance of a multiprocessor is the sum of the performance of each core of the multiprocessor, as described in Section V.C. Note that this is a reasonable assumption (and validated in Section V.G) because each core is assumed to have a private L2 cache as well as a memory channel. This is the same architecture (private L2s) assumed in [67] and is supported by recent research comparing private and shared L2 caches for multi-core architectures [91]. We also validate that the conclusions drawn under these assumptions still apply with shared L2 caches for our benchmarks (see Section V.G).

We find the single-thread performance of each application on each core. This represents 4800 simulations. Simulations use a modified version of SMTSIM [127]. Scripts are used to calculate the performance of the multiprocessors using these single-thread performance numbers.

All results are presented for the best (oracular) static mapping of applications to cores. Note that realistic dynamic mapping can do better (Section III.B) – we show in Section V.F that dynamic mapping continues to be useful for the best heterogeneous designs that our methodology produces. However, evaluating 2.2 billion multiprocessors becomes intractable if dynamic mapping is assumed.

We use weighted speedup [120] for our evaluations. In this chapter, weighted speedup is the arithmetic sum, over all running threads, of each thread's IPC divided by its IPC when running alone on the simplest core considered in this study. The IPC is derived by running a thread for a fixed amount of time. We believe that this metric guards against multiprocessor design points that produce artificial speedups by simply favoring high-IPC threads.
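Written out explicitly (the notation is ours), the metric used throughout this chapter for an n-thread workload is

\[
\text{Weighted Speedup} \;=\; \sum_{i=1}^{n} \frac{\mathrm{IPC}_i(\text{core thread } i \text{ is mapped to})}{\mathrm{IPC}_i(\text{simplest core, running alone})},
\]

where both IPC values are measured over the same fixed amount of time.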

V.F Analysis and Results

This section presents the results of our heterogeneous multi-core architecture design space search. We present these results for a variety of different area and power constraints, allowing us to observe how the benefits of heterogeneity vary across area and power domains. We also examine the effect of different levels of thread-level parallelism and the impact of dynamic thread switching mechanisms. We quantify the gains observed due to allowing non-monotonic cores on the processor. Last, we show that the results we observe are applicable not only for the private L2 cache configurations we assume, but even when a shared L2 cache is considered.

V.F.1 Fixed Area Budget

This section presents results for fixed area budgets. For every fixed area limit, a complete design space exploration is done to find the highest performing 4-core multiprocessor. In fact, for each area budget, we find the best architectures across a range of power constraints — the best architecture overall for a given area limit, regardless of power, will always be the highest line on the graph. Figure V.2 shows the weighted speedup for the highest performing 4-core multiprocessors within an area budget of 40mm2. The three lines correspond to different power budgets for the cores.


Figure V.2: Throughput for all-same (top) and all-different (bottom) workloads, area budget = 40mm2

The top line represents the highest performing 4-core multiprocessors with total power due to the cores not exceeding 50W. The middle line corresponds to 4-core multiprocessors with total power due to the cores not exceeding 40W. The line at the bottom corresponds to 4-core multiprocessors with total power due to the cores not exceeding 30W. Figure V.3 shows the results for different area budgets.

The performance of these 4-core multiprocessors is shown for different amounts of on-chip diversity. One core type, for example, implies that all cores on the die are of the same type, and hence refers to a homogeneous multiprocessor. There are two ways to construct 4-core multiprocessors with two core types. One is when three cores are of one type and the fourth core is of another type. Alternatively, one can have two cores of one type and the other two cores of another type. We graph these separately, and refer to these possibilities as 2 core types (3-1) and 2 core types (2-2) respectively. Similarly, 3 core types refers to multiprocessors with three types of cores on the die, and 4 core types refers to multiprocessors with all different cores. Note that the 4 core types result, for example, only considers processors with four unique cores. Thus, if the best heterogeneous configuration has two unique cores, the three-core and four-core results will show as lower. Performance, as discussed in the methodology section, is expressed in terms of weighted speedup, where the baseline is the performance of the simplest core considered in this study.

The results in Figures V.3 and V.2 lead to several interesting observations. First, we notice that the advantages of diversity are lower with the all same workloads than with the all different workloads, but they do exist. This is non-intuitive for our artificially homogeneous workloads; however, we find that even these workloads achieve their best performance when at least one of the cores is well-suited for the application — a heterogeneous design ensures that whatever application is being used for the homogeneous runs, such a core likely exists. For example, the best homogeneous CMP for all same workloads for an area budget of 40mm2 and a power budget of 30W consists of 4 single-issue OOO cores with 16KB L1 caches and double the functional units of the simplest core. This multiprocessor does not perform well when running applications with high cache requirements. On the other hand, the best heterogeneous multiprocessor with 3 core types for all same workloads for identical budgets consists of two single-issue in-order cores with

Figure V.3: Throughput for all-different workloads for an area budget of (left to right) 20mm2, 30mm2, 50mm2, and 60mm2. Each plot shows weighted speedup against the number of core types (1 to 4) for several core power budgets.

On the other hand, the best heterogeneous multiprocessor with 3 core types for all-same workloads under identical budgets consists of two single-issue in-order cores with 8KB ICache and 16KB DCache, one scalar OOO core with 32KB ICache, 16KB DCache, and double the functional units, and one scalar OOO core with 64KB ICache and 32KB DCache. The three core types cover the spectrum of application requirements better and hence outperform the best homogeneous CMP by 3.6%. In general, if there is benefit to be had from heterogeneity (as shown in the all-different results), it typically also exists in the all-same case, but to a lesser degree.

Second, we observe that the advantages due to heterogeneity for a fixed area budget depend largely on the power budget available, as shown by the shape of the lines corresponding to different power budgets. In this case (Figure V.2), heterogeneity buys little additional performance with a generous power budget (50W), but becomes increasingly important as the budget becomes more tightly constrained. We see this pattern throughout our results, whenever either power or area is constrained. What we find is that without constraints, the homogeneous architecture can create "envelope" cores: cores that are over-provisioned for any single application, but able to run most applications with high performance. For example, for an area budget of 40mm2, if the power budget is set high (50W), the best homogeneous architecture consists of 4 OOO cores with 64KB ICache, 32KB DCache, and double the number of functional units of the simplest core. This architecture is able to run both memory-bound and processor-bound applications well. When the design is more constrained, we can only meet the needs of each application through heterogeneous designs that are customized to subsets of the applications. It is likely that in the space where homogeneous designs are most effective, a heterogeneous design that contained more cores would be even better; however, we did not explore this axis of the design space.

We see these same trends in Figure V.3, which shows results for four other area budgets. There is significant benefit to a diversity of cores as long as either area or power is reasonably constrained. The power and area budgets also determine the amount of diversity needed for a multi-core architecture. In general, the more constrained the budget, the more benefit accrues from increased diversity. For example, considering the all-different results in Figure V.2, while having 4 core types results in the best performance when the power limit is 30W, two core types (or one) are sufficient to capture all the potential benefits for higher power limits. Similar trends can be observed for the other area budgets as well. In some of the regions where moderate diversity is sufficient, two unique cores not only match configurations with higher diversity, but even beat them. In cases where higher diversity is optimal, the gains must still be compared against the design and test costs of more unique cores. In the example above, for instance, the marginal performance gain of 4 core types over the best 2-type result is 2.5%, which probably does not justify the extra effort in this particular case.

These results underscore the increasing importance of single-ISA heterogeneous multi-core architectures for current and future processor designs. As designs become more aggressive, we will want to place more cores on the die (placing area pressure on the design), and power budgets per core will likely tighten even more severely.
Our results show that while having two core types is sufficient for getting most of the potential out of moderately power-limited designs, increased diversity results in significantly better performance for highly power-limited designs. Even when two core types are sufficient, the best combination of those types varies. There is often a substantial difference in the performance of the best 2 core types (3-1) and 2 core types (2-2) processors for a given area and power budget, and which is better varies.

Another way to interpret these results is that heterogeneous designs significantly dampen the effects of constrained power budgets. For example, in the 40mm2 results, both homogeneous and heterogeneous solutions provide good performance with a 50W budget. However, the homogeneous design loses 9% performance with a 40W budget and 23% with a 30W budget. In contrast, with a heterogeneous design, we can drop to 40W with only a 2% penalty and to 30W with a 9% loss.

Perhaps more illuminating than the raw performance of the best designs is which architectures actually provide the best design for a given area and power budget. We observe that there can be a significant difference between the cores of the best heterogeneous multiprocessor and the cores constituting the best homogeneous CMP. That is, the best heterogeneous multiprocessors cannot be constructed simply by making slight modifications to the best homogeneous CMP design; rather, they need to be designed from the ground up. Consider, for example, the best multiprocessors for an area budget of 40mm2 and a power budget of 30W. The best homogeneous CMP consists of single-issue OOO cores with 16KB L1 caches, few functional units (1-1-2), and a large number of registers (128). On the other hand, the best heterogeneous CMP with two types of cores, for all-different workloads, consists of two single-issue in-order cores with 8KB L1 caches and two single-issue OOO cores with 64KB ICache, 32KB DCache, and double the number of functional units. Clearly, these cores are significantly different from each other.

Another interesting observation is the reliance on non-monotonicity. Prior work on heterogeneous multi-core architectures [87, 51, 95, 55] assumed configurations where every core was either a subset or a superset of every other core (in terms of processor parameters). However, in several of our best heterogeneous configurations, we see that no core is a subset of any other core. For example, at the same design point as above, the best heterogeneous CMP with two core types for all-same workloads consists of superscalar in-order cores (issue-width=2) and scalar out-of-order cores (issue-width=1). Even when all the cores are different, the best multiprocessor for all-different workloads consists of one single-issue in-order core with 16KB L1 caches, one single-issue OOO core with 32KB ICache and 16KB DCache, one single-issue in-order core with 32KB L1 caches, and one single-issue OOO core with 64KB ICache and 16KB DCache. Thus, the real power of heterogeneity lies not in combining "big" and "little" cores, but rather in providing cores each well tuned for a class of applications. This was a common result, and we will explore the importance of non-monotonic designs further shortly.

A tangential observation is that certain core types keep cropping up in our results. For example, scalar out-of-order cores occur frequently in the configurations quoted above. Increasing the issue width results in a significant jump in the area and power consumption of a core, and hence there is resistance to going superscalar under very constrained area and power budgets; additional performance is instead gained through out-of-order issue and larger queues rather than superscalar issue. While our search methodology finds this to be a fairly robust engine, no general-purpose microprocessor has ever used this design point.

V.F.2 Fixed Power Budget

While the previous section presented results for fixed area budgets, this section discusses results for fixed power budgets. Figure V.4 (power budget of 30W, both homogeneous and heterogeneous workloads) and Figure V.5 (other power budgets, heterogeneous workloads) give these results. Note that these figures are drawn from the same set of data used for the results in the previous section.


Figure V.4: Throughput for all-same (left) and all-different (right) workloads, power budget=30W

These results show many of the same trends described in the previous section, and in other cases illustrate a corresponding principle. For example, here we can see that in most cases heterogeneity also dampens the effect of a decreased area budget; one implication is that it should be easier to move to more cores when the design is heterogeneous. With a 50W power budget, the cost of dropping the area budget from 50 to 30 mm2 is much greater for the homogeneous designs. Notice that there are a couple of points where there appears to be no loss for the homogeneous architecture when reducing the power budget. This is because both power budgets actually select the exact same homogeneous design, a consequence of the coarser granularity of choices available for homogeneous designs. Thus, as was the case with the power budget, the area budget also determines the amount of diversity on the chip.

V.F.3 Impact of Non-monotonic Design

As discussed above, we observed non-monotonicity in several of the highest performing multiprocessors for various area and power budgets. In this section, we analyze this phenomenon further and try to quantify the advantages of non-monotonic design. The reason this feature is particularly interesting is that any design that starts with pre-existing cores from a given architectural family is likely to be monotonic [90, 87]. Additionally, a heterogeneous design that is achieved with multiple copies of a single core, each run at a separate frequency, is also by definition monotonic. Thus, the monotonic results we show in this section serve as a generous upper bound (given the much greater number of configurations we consider) on what can be achieved with an architecture constructed from existing same-ISA cores that are monotonic.

Figure V.6 shows the results for a single set of area and power budgets. In this case, we see that for the all-same workload the benefits from non-monotonic configurations are small, but with the heterogeneous workload the non-monotonic designs outperform the monotonic ones much more significantly. More generally (results not shown here), we find that the cost of monotonicity in terms of performance is greater when budgets are constrained. In fact, diversity beyond two core types has benefits only for non-monotonic designs under very constrained power and area budgets.
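
As an illustration of what monotonicity means here, the following sketch tests whether a set of cores is monotonic, assuming each core is summarized by a small vector of sizing parameters; the parameter names and example values are ours, not taken from the thesis's configuration files.

    # A multiprocessor is monotonic if, for every pair of cores, one core's
    # parameters are all >= the other's (i.e., every core is a subset or a
    # superset of every other core).  Parameter names here are illustrative.
    CORE_PARAMS = ("issue_width", "icache_kb", "dcache_kb", "int_alus", "ooo")

    def dominates(a, b):
        return all(a[p] >= b[p] for p in CORE_PARAMS)

    def is_monotonic(cores):
        return all(dominates(a, b) or dominates(b, a)
                   for i, a in enumerate(cores) for b in cores[i + 1:])

    # Example from the text: a 2-wide in-order core and a scalar out-of-order
    # core are incomparable, so a CMP containing both is non-monotonic.
    io_2wide   = {"issue_width": 2, "icache_kb": 8,  "dcache_kb": 8,  "int_alus": 2, "ooo": 0}
    ooo_scalar = {"issue_width": 1, "icache_kb": 32, "dcache_kb": 16, "int_alus": 2, "ooo": 1}
    print(is_monotonic([io_2wide, ooo_scalar]))   # False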

V.F.4 Varying Thread-Level Parallelism

All results shown thus far in this thesis for heterogeneous architectures have been conservative for two reasons: they ignore two axes of diversity that also drive the need for heterogeneity, namely varying thread-level parallelism (TLP) and intra-program (phase) diversity. These two effects are explored in this section and the next.

While all our studies are done for 4-core processors, it is unrealistic to assume that the processor will always have four threads to run. Thus, we want the same processor to run efficiently with TLP less than four. We see this effect in Figure V.7. Here, TLP=1 means one thread is running at all times, TLP=4 means four threads are always running, and TLP=2 and TLP=3 represent a mix of environments with 1, 2, 3, and 4 threads such that the average TLP is 2.0 or 3.0. Those two results represent the most realistic expected workloads.

Here we see that the benefits of heterogeneity are definitely more pronounced with mixed TLP, particularly with the all-same workload. In these particular experiments, we see nearly a 50% advantage from heterogeneity. Of particular interest is the marked gain for a single thread (TLP=1) when going from 2 to 3 unique core types. It would not be unexpected for the best one-thread configuration to be one large core and three tiny cores, with the tiny cores never being used; this is similar to a large monolithic uniprocessor running a single thread. However, we beat such a processor with a more balanced heterogeneous design, with different cores being used by different threads when they run alone.

V.F.5 Dynamic Switching

The other source of diversity that has been ignored thus far in this study is intra-thread diversity. This is a significant factor in prior studies (see Sections III.B, IV, [51]), and is exploited by allowing threads to move between cores as they run. Even finding the best static mapping of threads to cores is difficult without the ability to move threads to discover the best long-term mapping. If threads can switch cores, we can exploit intra-thread diversity by moving a thread to a more suitable core when it changes execution characteristics due to a change in phase.

We did not fully explore this characteristic because the number of simulations would have been enormous: our current methodology treats performance on the cores as separable, and is thus driven by single-thread simulations. Finding the best architectures for dynamic switching would require a full multithreaded simulation for every processor configuration and every workload mix, which is completely impractical even given significantly larger computational resources than we had available. Although our methodology does not necessarily find the best set of cores given dynamic switching, it is still a reasonable design methodology. This is because we know that (1) the designs we arrive at are good designs even when dynamic switching is applied, and (2) the gains over homogeneous designs will only increase. Point (1) is true because in the worst case the dynamic switching performance mirrors the static case, which we have seen typically beats homogeneous designs. Point (2) is true because the homogeneous processor never gains from dynamic switching.

The results for the case where the applications are mapped to cores dynamically using the bounded event-triggered heuristic (see Section III.D for details) are shown in Figure V.8. The results are for a simulation time of 1 billion cycles, a sampling phase of 10 million cycles (with five samples of 2 million cycles), and a steady phase of 125 million cycles. As can be seen, dynamic switching results in higher performance, and larger marginal gains from higher levels of diversity.
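
The overall structure of such a dynamic scheme is sketched below. This is only a schematic: the real trigger is the bounded event-triggered heuristic of Section III.D (which uses five 2-million-cycle samples per 10-million-cycle sampling phase), whereas the sketch uses a purely periodic trigger and a made-up IPC model, so every name and number in it is illustrative.

    # Schematic of the sample-then-run structure behind dynamic core switching.
    SAMPLE_CYCLES = 2_000_000      # per-core sample length
    STEADY_CYCLES = 125_000_000    # steady phase between sampling phases
    TOTAL_CYCLES  = 1_000_000_000

    def fake_ipc(thread, core):
        # Stand-in for a sampled IPC measurement; a real run would see these
        # values shift as the thread changes phase.
        return {"small": 0.7, "medium": 1.1, "large": 1.4}[core] * (1.0 if thread == "mcf" else 1.2)

    def run_with_switching(thread, cores):
        schedule, cycles = [], 0
        while cycles < TOTAL_CYCLES:
            # Sampling phase: briefly run the thread on each candidate core.
            ipc = {core: fake_ipc(thread, core) for core in cores}
            cycles += SAMPLE_CYCLES * len(cores)
            # Steady phase: stay on the best-performing core until the next trigger.
            best = max(ipc, key=ipc.get)
            schedule.append((cycles, best))
            cycles += STEADY_CYCLES
        return schedule

    print(run_with_switching("mcf", ["small", "medium", "large"])[:3])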

V.F.6 Efficient Search Techniques

For this study, we use exhaustive search to find the best multiprocessors for given area and power budgets. This is because one of our goals was to find and analyze the very best designs; if we instead started with non-optimal search heuristics, we would have no means to evaluate the quality of the results. The full search considers over 2.2 billion distinct multiprocessor options, and each multiprocessor is evaluated on thousands of 4-thread workloads. Hence, while the reported best multiprocessors are indeed the best out of the ones considered, the design space exploration itself was slow. This methodology is also not scalable: it becomes impractical when the number of distinct cores used to construct multiprocessors is increased, or when each multiprocessor is evaluated using many more workloads. In such scenarios, alternate methodologies for design space exploration need to be considered.

In those cases, a more efficient search algorithm is needed to navigate the design space. That is, instead of exploring every multiprocessor design, one can evaluate only a subset of the space (e.g., a path through the space, seeking increasingly good designs). The choice of the subset determines the accuracy of the evaluation. As an example, we used a simple, well-known search algorithm, hill climbing [111], to redo the design space exploration for multiprocessors with an area budget of 40mm2 and a power budget of 30W. The starting point was the four simplest cores, and we then made modifications to the multiprocessor, one at a time, such that the chosen modification resulted in the highest incremental performance per unit area out of all the incremental modifications possible. The modifications are made in small steps, and a change once made is never undone. This runs the risk of the search getting stuck in a local optimum, but results in much faster searches than generalized hill climbing, where changes can be undone. The search stops when any further modification would exceed the available area or power budget. We observed that the best multiprocessor yielded by hill climbing is 11% better than the best homogeneous CMP found using exhaustive search and is only 4.5% worse than the best heterogeneous CMP found using exhaustive search.
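
A compact sketch of this hill-climbing procedure is shown below, under simplifying assumptions: the three-entry core table and its numbers are invented, the neighborhood (swapping one core's type) is coarser than the thesis's parameter-level steps, and the stopping rule is simply that no in-budget improving step remains.

    # Greedy hill climbing over 4-core configurations (an illustration only).
    CORES = {"simple": (1.0, 1.5, 1.0), "ooo_s": (4.5, 5.0, 1.6), "ooo_l": (9.0, 9.5, 2.0)}
    #        name: (area in mm^2, power in W, relative performance)

    def area(cfg):     return sum(CORES[c][0] for c in cfg)
    def power(cfg):    return sum(CORES[c][1] for c in cfg)
    def evaluate(cfg): return sum(CORES[c][2] for c in cfg)   # stand-in for weighted speedup

    def neighbors(cfg):
        # One-step modifications: change a single core to any other core type.
        for i, cur in enumerate(cfg):
            for new in CORES:
                if new != cur:
                    yield cfg[:i] + (new,) + cfg[i + 1:]

    def hill_climb(area_budget, power_budget):
        cfg = ("simple",) * 4                      # start: four simplest cores
        perf = evaluate(cfg)
        while True:
            best, best_ratio = None, 0.0
            for cand in neighbors(cfg):
                if area(cand) > area_budget or power(cand) > power_budget:
                    continue
                gain, darea = evaluate(cand) - perf, area(cand) - area(cfg)
                if darea > 0 and gain / darea > best_ratio:
                    best, best_ratio = cand, gain / darea
            if best is None:                       # no in-budget improving step left
                return cfg, perf
            cfg, perf = best, evaluate(best)       # commit the step; never undone

    print(hill_climb(40.0, 30.0))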

Experimenting with other search algorithms, with which we hope to approach the effectiveness of exhaustive search more closely, is the subject of future work. Note that such search algorithms can also be used for fast design space exploration for uniprocessors.

V.G Validating Results

Our results assume that the performance of a multiprocessor is simply the sum of the single-thread performance of the individual cores. Our confidence in this methodology stems from the facts that (1) each core on the multiprocessor gets a private 1MB 4-way L2 cache and a private memory controller, (2) we are using multiprogrammed, not parallel, workloads, and (3) the SPEC benchmarks (and the others used in the study) do not generate heavy off-chip traffic. In this section, we validate the accuracy of this assumption with full multithreaded simulation of particular design points, accounting for interactions beyond the L2 cache. Less obvious is whether these results, and the methodology we follow, apply to the case where L2 caches are shared and interactions between cores are greater. We cannot fully validate all results without full multithreaded simulation of all configurations. Thus, we reduce the full validation to two questions. First, is the performance of the configurations found to be "best" accurate, particularly with respect to the relative performance of homogeneous and heterogeneous designs? Second, is the performance ordering of configurations competing for the designation of "best" preserved under more accurate simulation?

To answer the first question, we consider the highest performing multiprocessors for various numbers of core types for an area budget of 40mm2 and a power budget of 30W. We performed full simulation of these four multiprocessors for various 4-threaded all-different workloads, first assuming that each core has a private 1MB 4-way L2 cache, and then assuming that all cores share a 4MB, 4-banked, 4-way L2 cache. Figure V.9 shows the results. As can be seen from the graph, the relative ordering of the four multiprocessors remains the same. In fact, the simulations assuming private L2 caches lie on top of the line drawn using the methodology assumed in this thesis. Even when the L2 cache is shared, the average performance differs by no more than 2%.

To answer the second question, we chose 5 multiprocessors with two core types (3-1) whose performance, as determined by the assumed methodology, fell within the top 50th percentile. The multiprocessors are chosen such that they are equally spread through that range (one from the 90-100th percentile, one from the 80-90th, etc.). We then performed full simulation of those multiprocessors and compared the results against the performance predicted by the assumed methodology. Choosing multiprocessors with only two core types helped keep the simulation time manageable. We found that the relative ordering of these multiprocessors in terms of performance remains the same in all three cases (performance using the assumed methodology, full simulation assuming private L2 caches, and full simulation assuming a shared L2 cache). We considered other data points as well and observed no significant difference in the trends or analysis.
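
The second check amounts to comparing two rankings, as in the toy example below; the configuration names and scores are placeholders, not measured data.

    # Rank the configurations by the separable (sum of single-thread results)
    # model and by full simulation, and verify that the orderings agree.
    predicted = {"cmp_A": 6.9, "cmp_B": 6.5, "cmp_C": 6.2, "cmp_D": 5.8, "cmp_E": 5.4}
    full_sim  = {"cmp_A": 6.8, "cmp_B": 6.4, "cmp_C": 6.3, "cmp_D": 5.7, "cmp_E": 5.3}

    def ranking(scores):
        return sorted(scores, key=scores.get, reverse=True)

    print(ranking(predicted) == ranking(full_sim))   # True -> ordering preserved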

V.H Acknowledgment

The text of Chapter V is in part a reprint of the material as it appears in the proceedings of the Fifteenth International Conference on Parallel Architectures and Compilation Techniques (September 2006). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed the research which forms the basis for Chapter V.

Figure V.5: Throughput for all-different workloads for a power budget of (left to right) 20W, 40W, 50W, and 60W. Each plot shows weighted speedup against the amount of core diversity (1 core type through 4 core types, including the 3-1 and 2-2 two-type splits) for several area budgets.

Figure V.6: Benefits due to non-monotonicity of cores; area budget=40mm2, power budget=30W

Figure V.7: Throughput for all-same and all-different workloads for different TLPs, area budget=40mm2, power budget=30W. Each plot shows weighted speedup per thread against the number of core types for TLP=1 through TLP=4.

Figure V.8: Benefits due to dynamic switching; area budget=40mm2, power budget=30W. The plot compares weighted speedup under static and dynamic mapping across the amount of core diversity (1 core type through 4 core types, including the 3-1 and 2-2 splits).

Figure V.9: Comparing the results using the assumed methodology against full simulation results; area budget = 40mm2, power budget = 30W

VI

Obviating Overprovisioning: Conjoined-core Multiprocessing Architectures

Most modern processors are highly overprovisioned. Designers usually provision the CPU for a few important applications that stress a particular resource. For example, the vector processing unit (VMX) of a processor can take up more than 10% of the die area and gets used by only a few applications, but the functionality still needs to be there.

"Multi-core-oblivious" multi-core designs exacerbate the overprovisioning problem because the blind replication of cores multiplies the cost of overprovisioning by the number of cores. What is really needed is the same level of overprovisioning for any single thread without multiplying its cost by the number of cores. In this study, we propose a holistic approach to designing chip multiprocessors in which adjacent cores of a multi-core share large, over-provisioned resources.

There are several benefits to sharing hardware between more than one processor or thread. Time-sharing a lightly-utilized resource saves area, increases efficiency, and reduces leakage. Dynamically sharing a large resource can also yield better performance than statically partitioning it into small, private, distributed resources [129, 41].

Topology is a significant factor in determining which resources are feasible to share and what the area, complexity, and performance costs of sharing are. For example, in the case of sharing entire floating-point units (FPUs), since processor floorplans often have the FPU on one side and the integer datapath on the other side, FPU sharing could be achieved by mirroring adjacent processors with minimal disruption to the floorplan. For the design of a resource-sharing core, the floorplan must be co-designed with the architecture; otherwise the architecture may specify sharings that are not physically possible or that have high communication costs. In general, resources to be shared should be large enough that the additional wiring needed to share them does not outweigh the area benefits obtained by sharing.

With these factors in mind we have investigated the possible sharing of FPUs, crossbar ports, first-level instruction caches, and first-level data caches between adjacent pairs of processors. Resources could potentially be shared among more than two processors, but this creates more topological problems. Because we primarily investigate sharing between pairs of processors, we call our approach conjoined-core chip multiprocessors.

There are many ways that the shared resources can be allocated to the processors in a conjoined configuration. We consider both simple mechanisms, such as fixed allocation based on odd and even cycles, as well as more intelligent sharing arrangements developed as part of this work. All of these sharing mechanisms must respect the constraints imposed by long-distance on-chip communication. In fact, we assume in all of our sharing mechanisms that core-to-core delays are too long to enable cycle-by-cycle arbitration of any shared resource.

It is also possible that the best organization of a shared resource differs from the best organization of a private resource. For example, the right banking strategy may be different for shared memory structures than for private ones. Therefore, we also examine tradeoffs in the design of the shared resources as part of this study.

The chief advantage of our proposal is a significant reduction in per-core real estate with minimal impact on per-core performance, providing higher computational capability per unit area. This can either be used to decrease the area of the whole die, increasing yield, or to support more cores for a fixed die size. Ancillary benefits include a reduction in leakage power due to the smaller number of transistors needed for a given computational capability.

VI.A Related Work

Prior work has evaluated design space issues for allocating resources to thread execution engines, both at a higher level and at a lower level than is the target of this chapter. At a higher level, CMPs, SMTs, and CMPs composed of SMT cores have been compared. At a lower level, previous work has investigated both multithreaded and single-threaded clustered architectures that break out portions of a single core and make them more or less accessible to certain instructions or threads within the core.

Krishnan and Torrellas [83] study the tradeoffs of building multithreaded processors as either a group of single-threaded CMP cores, a monolithic SMT core, or a hybrid design of multiple SMT cores. Burns and Gaudiot [31] study this as well. Both studies conclude that the hybrid design, a chip multiprocessor in which the individual cores are SMT, represents a good performance-complexity design point. They do not share resources between cores, however.

There has been some work on exploring clustering and hardware partitioning for multithreaded processors. Collins and Tullsen [34] evaluate various clustered multithreaded architectures to enhance both IPC and cycle time. They show that the synergistic combination of clustering and simultaneous multithreading minimizes the performance impact of the clustered architecture, and even permits more aggressive clustering of the processor than is possible with a single-threaded processor.

Dolbeau and Seznec [41] propose the CASH architecture as an intermediate design point between CMP and SMT architectures for improving performance. This is probably the closest prior work to ours. CASH shares caches, branch predictors, and divide units between dynamically-scheduled cores, pooling resources from two to four cores to create larger dynamically shared structures with the goal of higher per-core performance. However, the CASH work did not evaluate the area and latency implications of the wire routing required by sharing. In our work we consider sharing entire FPUs and crossbar ports as well as caches, and attempt to accurately account for the latency and area of the wiring required by sharing. We also consider more sophisticated scheduling techniques for sharing that are consistent with the limitations of global chip communication.

VI.B Baseline Architecture

Conjoined-core chip multiprocessing deviates from a conventional chip multiprocessor design by sharing selected hardware structures between adjacent cores to improve processor efficiency. The choice of the structures to be shared depends not only on the area occupied by those structures but also on whether sharing them is topologically feasible without significant disruption to the floorplan or significant wiring overhead. In this section, we discuss the baseline chip multiprocessor architecture and derive a reasonable floorplan for the processor, estimating the area of the various on-chip structures.

VI.B.1 Baseline processor model

For our evaluations, we assume a processor similar to Piranha [26], with eight cores sharing a 4MB, 8-banked, 4-way set-associative L2 cache with 128-byte lines. The cores are modeled after the Alpha 21164 (EV5), a 4-issue in-order processor. The various parameters of the processor are given in Table VI.1. The processor is assumed to be implemented in 0.07 micron technology and clocked at 3.0 GHz.

Table VI.1: Simulated Baseline Processor for studying Conjoining

2K-gshare branch predictor
Issues 4 integer instructions per cycle, including up to 2 load/store
Issues 2 FP instructions per cycle
4 MSHRs
64-byte linesize for L1 caches, 128-byte linesize for L2 cache
64KB 2-way 3-cycle L1 instruction cache (1 access/cycle)
64KB 2-way 3-cycle L1 data cache (2 accesses/cycle)
4MB 4-way set-associative, 8-bank, 10-cycle L2 cache (3 cycles/access)
4-cycle L1-L2 data transfer time plus 3-cycle transfer latency
450-cycle memory access time
64-entry DTLB, fully associative; 256-entry L2 DTLB
48-entry ITLB, fully associative
8KB pages

For the baseline processor, each core has 64KB, 2-way associative L1 instruction and data caches. The ICache is single-ported, while the DCache is dual-ported (2 R/W ports). The L1 cache sizes are similar to those of the Piranha cores. A maximum of 4 instructions can be fetched from the ICache in a given cycle. The linesize for both L1 caches is 64 bytes.

Each core has a private FPU. Floating-point divide and square root are non-pipelined; all other floating-point operations are fully pipelined. The latencies of all operations are modeled after EV5 latencies. Cores are connected to the L2 cache using a point-to-point, fully-connected, blocking matrix crossbar such that each core can issue a request to any of the L2 cache banks every cycle. However, a bank can entertain a request from only one of the cores in any given cycle. Crossbar link latency is assumed to be 3 cycles, and the data transfer time is 4 cycles.

Each bank of the L2 cache has a memory controller and an associated RDRAM channel. The memory bus is assumed to be clocked at 750MHz, with data transferred on both edges of the clock for an effective frequency of 1.5GHz and an effective bandwidth of 3GB/s per bank (considering that each RDRAM memory channel supports 30 pins and 2 data bytes). Note that for any reasonable assumption about power and ground pins, the total number of pins that this memory organization would require is well within the ITRS [15] limits for the cost/performance market. Memory latency is set to 150ns.
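
The per-bank bandwidth figure follows directly from these bus parameters, and the eight banks together therefore provide 24 GB/s in aggregate:

    \[
      750\,\mathrm{MHz} \times 2\ \text{transfers/cycle} \times 2\,\mathrm{B/transfer} = 3\,\mathrm{GB/s}\ \text{per bank},
      \qquad 8 \times 3\,\mathrm{GB/s} = 24\,\mathrm{GB/s}.
    \]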

VI.B.2 Die floorplan and area model

The baseline architecture and its floorplan are shown in Figure VI.1. We use CACTI to estimate the size and dimensions of the L2 cache. Each 512KB bank is 8.08mm2, and CACTI gives its aspect ratio as 2.73, so each bank is 1.7mm × 4.7mm. The total L2 cache area is 64.64mm2. The area of the EV5-like core (excluding L1 caches) was calculated using assumptions and methodology similar to those used in [87], which also featured multiple Alpha cores on a die and technology scaling. Each core, excluding caches, is 2.12mm2. CACTI gives the area of the 64KB, 2-way L1 ICache as 1.15mm2 and that of the 64KB, 2-way DCache as 2.59mm2.

P0 P1 P2 P3

M0 M1 M2 M3

M4 M5 M6 M7

P4 P5 P6 P7

Figure VI.1: Baseline die floorplan for studying conjoining, with L2 cache banks in the middle of the cluster and processor cores (including L1 caches) distributed around the outside

Hence, including the area occupied by the private L1 caches, the core area is 5.86mm2; assuming an aspect ratio of 1, each core is 2.41mm × 2.41mm. The total area for the eight cores is 46.9mm2.

The crossbar area calculations measure the area occupied by the interconnect wires. Each link from a core to a cache bank consists of roughly 300 lines. Of those, 256 lines correspond to a 128-bit unidirectional data bus from the L2 to the cores and another 128-bit data bus from the cores to the L2 cache. We assume 20 lines correspond to the 20-bit unidirectional addressing signals, while the rest correspond to control signals. Since each of the cores needs to be able to talk to each of the banks, there is a switched repeater corresponding to each core-to-bank interface. Therefore, the number of horizontal tracks required per link is approximately 300. The total number of input ports is equal to the number of cores, so the number of horizontal tracks required is 8 × 300 = 2400. This determines the height of the crossbar. For the layout of the baseline processor (shown in Figure VI.1), the crossbar lies between the cores and the L2 banks. Also, there are two clusters of interconnects. The clusters are assumed to be connected by vertical wires in the crossbar routing channels and by vertical wires in upper metal layers that run over the top of the L2 cache banks (see Figure VI.1). We assume that all the (horizontal) connecting lines are implemented in the M3/M5 layer. ITRS [15] and the "Future of Wires" paper by Horowitz et al. [64] predict that the wire pitch for a semi-global layer is 8-10λ. Assuming 10λ at 0.07 micron, the pitch is 350nm. The width of the crossbar is then 2400 × 350nm = 0.84mm. Therefore, the area occupied by the crossbar for the baseline processor is 16.22mm2. This methodology of crossbar area estimation is similar to that used in [76]. Therefore, the total area of the processor is 127.76mm2, of which 46.9mm2 is occupied by the cores, 16.22mm2 by the crossbar, and 64.64mm2 by the L2 cache.
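
The quoted crossbar width follows from the track count and wire pitch; dividing the quoted crossbar area by that width gives the total routed span it implies. The last step is our own consistency check (roughly the width of the eight-core row), not a figure stated in the text:

    \[
      8 \times 300 = 2400\ \text{tracks}, \qquad
      2400 \times 350\,\mathrm{nm} = 0.84\,\mathrm{mm}, \qquad
      \frac{16.22\,\mathrm{mm}^2}{0.84\,\mathrm{mm}} \approx 19.3\,\mathrm{mm}.
    \]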

VI.C Conjoined-core Architectures

For the conjoined-core chip multiprocessor, we consider four optimizations: instruction cache sharing, data cache sharing, FPU sharing, and crossbar sharing. For each kind of sharing, two adjacent cores share the hardware structure. In this section, we investigate the mechanism for each kind of sharing and discuss the area benefits that accrue. We discuss the performance impact of sharing in Section VI.E.

The usage of the shared resource can be based on a policy decided either statically, such that it can be accessed only during fixed cycles by a certain core, or dynamically, based on conditions visible to both cores (given adequate propagation time). The initial mechanisms discussed in this section all assume the simplest and most naive static scheduling, where one of the cores gets access to the shared resource during odd cycles while the other core gets access during even cycles. More intelligent sharing techniques and policies are discussed in Section VI.F. All of our sharing policies, however, maintain the assumption that communication distances between cores are too great to allow any kind of dynamic cycle-level arbitration of shared resources.

Note that in modern high-performance pipelines (beginning with the DEC Alpha 21064), variable operation latency past the issue point as a result of FIFO-type structures is not possible. This is because in modern pipelines each pipestage is only a small number of FO4 delays, and the global communication plus control logic overhead of implementing stalling on a cycle-by-cycle basis would drastically increase the cycle time. Such stalling is required by variable delays because instructions must be issued assuming the results of previous operations are available when expected. Instead, any delay (such as that caused by a DCache miss instead of an expected hit) results in a flush and replay of the missing reference plus the following instructions. Although flush-and-replay overhead is acceptable for rare long-latency events such as cache misses, it is unacceptable for routine operation of the pipeline. By assuming very simple fixed scheduling in the baseline sharing case, we guarantee that the later pipe stages do not need to be stalled and that the cycle time of the pipeline is not adversely affected. Later we examine more complex techniques for scheduling sharing that remain compatible with high-speed pipeline design.

Due to wiring overheads, it only makes sense to share relatively large structures that already have routing overhead. FPUs, crossbars, and caches all have this property. In contrast, ALUs in a datapath normally fit under the datapath operand and result busses. Thus, placing something small like an individual ALU remotely would actually result in a very significant increase in bus wiring and chip area instead of a savings, as well as increased latency and power dissipation.
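
The naive static schedule can be captured in a few lines; this is a behavioral illustration (the real arbitration is fixed wiring, not software), with the even/odd assignment chosen arbitrarily:

    # Fixed-parity access to a structure shared by a conjoined-core pair:
    # core 0 owns even cycles, core 1 owns odd cycles, decided at design time,
    # so no arbitration signal ever needs to cross between the two cores.
    def can_access(core_id, cycle):
        return cycle % 2 == core_id

    assert can_access(0, 10) and not can_access(1, 10)
    assert can_access(1, 11) and not can_access(0, 11)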

VI.C.1 ICache sharing

We implement ICache sharing between two cores by providing a shared fetch path from the ICache to both pipelines. Figure VI.2 shows a floorplan of two adjacent cores sharing a 64KB, 2-way associative ICache. Because the layout of memories is a function of the number of rows and columns, we have increased the number of columns but reduced the number of rows in the shared memory. This gives a wider aspect ratio that can span two cores.

As mentioned, the ICache is time-shared every other cycle. We investigate two ICache fetch widths. In the double fetch width case, the fetch width is changed to 8 instructions every other cycle (compared to 4 instructions every cycle in the unshared case); the time-averaged effective fetch bandwidth of all cores (ignoring branch effects) remains unchanged. In the original fetch width case, we leave the fetch width the same; the effective per-core fetch bandwidth is then halved. Finally, we also investigate a banked architecture, where cores can fetch 4 instructions every cycle, but only if the desired bank is allocated to them that cycle (a given core gets access to the low bank on even cycles and the high bank on odd cycles).

In the double fetch width case, sharing requires a wider instruction fetch path, wider multiplexors, and extra instruction buffers before decode in the instruction front end. We have modeled this area increase, and we also assume that sharing increases the access latency by 1 cycle. The double fetch width solution would also result in higher power consumption per fetch. Furthermore, since longer fetch blocks are more likely to include taken branches out of the block, the fetch efficiency is somewhat reduced. We also evaluate two cases corresponding to a shared instruction cache with an unchanged fetch width: one with the access time extended by a cycle and another where it remains unchanged.

Based on modeling with CACTI, in the baseline case each ICache takes up 1.15mm2. In the double fetch width case, the ICache has double the bandwidth (BITOUT=256) and requires 1.16mm2. However, instead of 8 ICaches on the die, there are just four, which results in a core area savings of 9.8%. In the normal fetch width case (BITOUT=128), sharing results in core area savings of 9.9%.

VI.C.2 DCache sharing

Even though the DCaches occupy a significant area, the DCache is not an obvious candidate for sharing because of its relatively high utilization. In our DCache sharing experiments, two adjacent cores share a 64KB, 2-way set-associative L1 DCache. Each core can issue memory instructions only every other cycle. Sharing entails lengthened wires that increase access latency slightly. This latency may or may not be possible to hide in the pipeline. Thus, we evaluate two cases: one where the access time is lengthened by one cycle and another where the access time remains unchanged.

Based on modeling with CACTI, each dual-ported DCache takes up 2.59mm2 in the baseline processor. In the shared case, there is just one cache for every two cores, but with some additional wiring. This results in core area savings of 22.09%.

VI.C.3 Crossbar sharing

As shown in VI.B, the crossbar occupies a significant fraction (13%) of the die area. The configuration and complexity of the crossbar is strongly tied to 118 the number of cores. Therefore, we also study how crossbar sharing can be used to free up die area. Crossbar sharing involves two adjacent cores sharing an input port to the L2 cache’s crossbar interconnect. This halves the number of rows (or columns) in the crossbar matrix resulting in linear area savings. Crossbar sharing entails that only one of the two conjoined cores can issue a request to a particular L2 cache bank in a given cycle. Again, we assume a baseline implementation where one of the conjoined cores can issue requests to a bank every odd cycle, while the other conjoined core can issue requests only on even cycles. There would also be some overhead in routing signal and data to the shared input port. Hence, we assume the point-to-point communication latency will be lengthened by one cycle for the conjoined core case. Figure VI.10 shows conjoined core pairs sharing input ports to the crossbar. Crossbar sharing results in halving the area occupied by the interconnect and results in 6.43% die area savings. This is equivalent to 1.38 times the size of a single core. Note that this is not the only way to reduce the area occupied by the crossbar interconnect. One can alternatively halve the number of wires for a given point-to-point link to (approximately) halve the area occupied by that link. This would, though, double the transfer latency for each connection. In Section VI.E, we compare both these approaches and show that this performs worse than our port-sharing solution. Finally, if the DCache and ICache are already shared between two cores, sharing the crossbar port between the same two cores is very straightforward since the cores have already been joined together before reaching the crossbar. 119

VI.C.4 FPU sharing

Processor floorplans often have the FPU on one side and the integer datapath on the other side, so FPU sharing can be enabled by simply mirroring adjacent processors without significant disruption to the floorplan. Wires connecting the FPU to the left core and the right core can be interdigitated, so no additional horizontal wiring tracks are required (see Figure VI.2). This also does not significantly increase the length of the wires in comparison to the non-conjoined case.

In our baseline FPU sharing model, each conjoined core can issue floating-point instructions to the fully-pipelined floating-point sub-units only every other cycle. Based on our design experience, we believe that there would be no operation latency increase when sharing pipelined FPU sub-units between the cores. This is because for arithmetic operations the FP registers remain local to the FPU, and for transfers and load/store operations the routing distances from the integer datapath and caches to the FPU remain largely unchanged (see Figure VI.2).

For the non-pipelined sub-units (e.g., divide and square root) we assume alternating three-cycle scheduling windows for each core. If a non-pipelined unit is available at the start of its three-cycle window, the core may start using it, and has the remainder of the scheduling window to communicate this to the other core. Thus, when the non-pipelined units are idle, each core can only start a non-pipelined operation once every six cycles. However, since these operations have a known long latency, there is no additional scheduling overhead needed at the end of non-pipelined operations. Thus, when a non-pipelined unit is in use, another core waiting for it can begin using it on the first cycle it becomes available.

The FPU area for EV5 is derived from published die photos, scaling the numbers to 0.07 micron technology and then subtracting the area occupied by the FP register file. The EV5 FPU takes up 1.05mm2 including the FP register file.

We estimate the area taken up by a 5-exclusive-read-port (ERP), 4-exclusive-write-port (EWP), 32-entry FP register file using register-bit equivalents (rbe). The total area of the FPU (excluding the register file) is 0.72mm2. Sharing halves the number of FPUs and results in area savings of 6.1%. We also consider a case where each core has its own copy of the divide sub-unit while the other FPU sub-units are shared. We estimate the area of the divide sub-unit to be 0.0524mm2; the total area savings in that case are 5.7%.

VI.C.5 Summary of sharing

To sum up, ICache sharing results in core area savings of 9.9%, DCache sharing results in core area savings of 22%, FPU sharing saves 6.1% of the core area, and sharing the input ports to the crossbar can result in a savings equivalent to 1.4 cores. Statically deciding to let each conjoined core access a shared hardware structure only every other cycle provides an upper bound on the possible degradation. As our results in Section VI.E indicate, even these conservative assumptions lead to relatively small performance degradation, and hence reinforce the argument for conjoined-core chip multiprocessing.

VI.D Experimental Methodology

Benchmarks are simulated using SMTSIM [127]. The simulator was modified to simulate the various chip multiprocessor (conjoined as well as conventional) architectures.

Several of our evaluations are done for various numbers of threads, ranging from one up to the maximum number of available processor contexts. Each result corresponds to one of three sets of eight benchmarks, where each data point is the average of several permutations of those benchmarks. Table VI.2 shows the subset of the SPEC CPU2000 benchmark suite that was used.

Table VI.2: Benchmarks simulated for evaluating conjoining

Program    Description                               FF Dist (in millions)
bzip2      Compression                               5200
crafty     Game Playing: Chess                       100
eon        Computer Visualization                    1900
gzip       Compression                               400
mcf        Combinatorial Optimization                3170
perl       PERL Programming Language                 200
twolf      Place and Route Simulator                 3200
vpr        FPGA Circuit Placement and Routing        7200
applu      Parabolic/Elliptic Partial Diff. Eqn.     1900
apsi       Meteorology: Pollutant Distribution       4700
art        Image Recognition/Neural Networks         6800
equake     Seismic Wave Propagation Simulation       19500
facerec    Image Processing: Face Recognition        13700
fma3d      Finite-element Crash Simulation           29900
mesa       3-D Graphics Library                      9000
wupwise    Physics/Quantum Chromodynamics            58500

The benchmarks are chosen such that, of the 8 CINT2000 benchmarks, half (vpr, crafty, eon, twolf) have datasets of less than 100MB while the remaining half have datasets bigger than 100MB. Similarly, for the CFP2000 benchmarks, half (wupwise, applu, apsi, fma3d) have datasets bigger than 100MB while the remaining half have datasets of less than 100MB. We also perform all our evaluations for mixed workloads, which are generated using 4 integer benchmarks (bzip2, mcf, crafty, eon) and 4 floating-point benchmarks (wupwise, applu, art, mesa). Again, the subsetting was done based on application datasets.

All the data points are generated by evaluating 8 workloads for each case and then averaging the results. A workload consisting of n threads is constructed by selecting the benchmarks using a sliding window (with wraparound) of size n and then shifting the window right by one, as sketched at the end of this section. Since there are 8 distinct benchmarks, the window selects eight distinct workloads (except when the window size is a multiple of 8, in which case all the selected workloads have identical composition). All of these workloads are run, ensuring that each benchmark is equally represented at every data point. This methodology for workload construction is similar to that used in [120].

We also perform evaluations using the parallel benchmark water from the SPLASH benchmark suite, and use the STREAM benchmark for crossbar evaluations. We change the problem size of STREAM to 16,384 elements. At this size, when running eight copies of STREAM, the working set fits into the L2 cache, and hence it acts as a worst-case test of L1-L2 bandwidth (and hence of the crossbar interconnect). We also removed the timing statistics collection routines.

The Simpoint tool [118] was used to find good representative fast-forward distances for each SPEC benchmark. Early simpoints are used. Table VI.2 also shows the distance to which each benchmark was fast-forwarded before beginning simulation. For water, fast-forwarding is done just enough so that the parallel threads get forked. We do not fast-forward for STREAM. All simulations involving n threads are preceded by a warmup of 10 × n million cycles. Simulation length was 800 million cycles. All the SPEC benchmarks are simulated using ref inputs. All the performance results are in terms of throughput.
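
A minimal sketch of this workload construction, assuming the eight integer benchmarks as the benchmark list (the same procedure applies to the FP and mixed sets):

    # Sliding-window workload construction (with wraparound) over a fixed
    # benchmark list, as described above.
    BENCHMARKS = ["bzip2", "crafty", "eon", "gzip", "mcf", "perl", "twolf", "vpr"]

    def workloads(n, benchmarks=BENCHMARKS):
        k = len(benchmarks)
        return [[benchmarks[(start + i) % k] for i in range(n)] for start in range(k)]

    # e.g. workloads(3)[0] == ['bzip2', 'crafty', 'eon']
    #      workloads(3)[7] == ['vpr', 'bzip2', 'crafty']   (wraparound)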

VI.E Simple Sharing

This section examines the performance impact of conjoining cores assuming simple time-slicing of the shared resources on alternate cycles. More intelligent sharing techniques are discussed in the next section.

In this section, we show results for various threading levels. We schedule the workloads statically and randomly, such that two threads are run together on a conjoined-core pair only if one of them cannot be placed elsewhere. Hence, for the given architecture, with 1 to 4 threads there is no other thread competing for the shared resources. With 5 runnable threads, one of the threads must be put on a conjoined-core pair that is already running a thread, and so on. However, even if there is no other thread running on the other core of a conjoined-core pair, we still assume, in this section, that a core can access the shared resource only every other cycle.

VI.E.1 Sharing the ICache

Results are shown as performance degradation relative to the baseline conventional CMP architecture. Performance degradation experienced with ICache sharing comes from three sources: increased access latency, reduced effective fetch bandwidth, and inter-thread conflicts. Effective fetch bandwidth can be reduced even if the fetch width is doubled because of the decreased likelihood of filling an eight-wide fetch with useful instructions, relative to a four-wide fetch.

Figure VI.4 shows the performance impact of ICache sharing at various threading levels for SPEC-based workloads. The results are shown for a fetch width of 8 instructions, assuming that there is an extra cycle of latency for ICache access due to sharing. We assume the extra cycle is required since, in the worst case, the round-trip distance to read an ICache bit has gone up by two times the original core width due to sharing. We observe a performance degradation of 5% for integer workloads, 1.2% for FP workloads, and 2.2% for mixed workloads. The performance degradation does not change significantly when the number of threads is increased from 1 to 8. This indicates that inter-thread conflicts are not a problem for this workload and these caches; the SPEC benchmarks are known to have relatively small instruction working sets.

To identify the main cause of performance degradation with ICache sharing, we also show results assuming that there is no extra cycle of latency. Figure VI.5 shows the 8-thread results for both integer and floating-point workloads. Performance degradation becomes less than 0.25%. Two conclusions can be drawn from this. First, the extra latency is the main reason for degradation with ICache sharing (note that the latency does not introduce a bubble in the pipeline; the performance degradation comes from the increased branch mispredict penalty due to the pipeline being extended by a cycle). Second, the integer benchmarks are most affected by the extra cycle of latency, being more sensitive to the branch mispredict penalty.

Increasing the fetch width to 8 instructions ensures that the potential fetch bandwidth remains the same in the sharing case as in the baseline case, but it increases the size of the ICache (relative to a single ICache in the base case) and results in increased power consumption. This is because doubling the output width doubles both the number of sense amps and the data output lines being driven, and these structures account for much of the power in the original cache. Thus, we also investigate the case where the fetch width is kept the same. In that case, only up to 4 instructions can be fetched every other cycle (effectively halving the per-core fetch bandwidth). Figure VI.5 shows the results for 8-thread workloads. As can be seen, degradation jumps to 16% for integer workloads and 10.2% for floating-point workloads. This is because, at an effective fetch bandwidth of 2 instructions per cycle (per core), execution starts to become fetch limited.

We also investigate the impact of partitioning the ICache vertically into two equal-sized banks. A core can alternate accesses between the two banks: it can fetch 4 instructions every cycle, but only if the desired bank is available. A core has access to bank 0 one cycle, bank 1 the next, and so on, with the other core having the opposite allocation. This allows both threads to access the cache in some cycles. It is also possible for both threads to be blocked in some cycles.

However, bandwidth is guaranteed to exceed that of the previous case (ignoring cache miss effects) of one 4-instruction fetch every other cycle, because every cycle in which both threads fail to get access is immediately followed by a cycle in which they both can access the cache. Figure VI.5 shows the results. Degradation goes down by 55% for integer workloads and 53% for FP workloads due to the overall improvement in fetch bandwidth.
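
The banked arrangement can be sketched as follows; the bank-ownership function is our own rendering of the allocation described above, and the asserts illustrate why two simultaneously blocked fetches are always followed by a cycle in which both cores succeed:

    # Toy model of the banked ICache alternative: each core owns one of the two
    # banks per cycle, with opposite allocations, and the ownership swaps every
    # cycle.  A core fetches only when the bank it wants is the one it owns.
    def owned_bank(core_id, cycle):
        return (core_id + cycle) % 2

    def can_fetch(core_id, cycle, wanted_bank):
        return owned_bank(core_id, cycle) == wanted_bank

    # If both cores are blocked in some cycle (each wants the other's bank),
    # the swap on the next cycle lets both fetch, so worst-case bandwidth still
    # beats one 4-wide fetch every other cycle.
    assert not can_fetch(0, 4, wanted_bank=1) and not can_fetch(1, 4, wanted_bank=0)
    assert can_fetch(0, 5, wanted_bank=1) and can_fetch(1, 5, wanted_bank=0)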

VI.E.2 DCache sharing

Similar to the ICache, performance degradation due to DCache sharing comes from three sources: increased access latency, reduced cache bandwidth, and inter-thread conflicts. Unlike the ICache, the DCache latency has a direct effect on performance, as the latency of a load is effectively increased if it cannot issue on the first cycle it is ready.

Figure VI.6 shows the impact on performance of DCache sharing for SPEC workloads at various threading levels. We observe a performance degradation of 4-10% for integer workloads, 1-9% for floating-point workloads, and 2-13% for mixed workloads. Degradation is higher for integer workloads than for floating-point workloads at small numbers of threads. This is because the typically higher ILP of the FP workloads allows them to hide a small increase in latency more effectively. Also, inter-thread conflicts increase with more threads, resulting in larger performance degradation at higher threading levels.

We also studied the case where the shared DCache has the same access latency as the unshared DCache. Figure VI.7 shows the results for the 8-thread case. Degradation lessens for both integer and floating-point workloads, but less so for FP workloads, as conflict misses and cache bandwidth pressure remain.

VI.E.3 FPU sharing

FPUs may be the most obvious candidates for sharing. For the SPEC CINT2000 benchmarks, only 0.1% of instructions are floating-point, and even for the CFP2000 benchmarks, only 32.3% of instructions are floating-point [20]. Also, FPU bandwidth is a performance bottleneck only for specialized applications.

We evaluated FPU sharing for integer workloads, FP workloads, and mixed workloads, but only present the FP and mixed results (Figure VI.8) here. The degradation is less than 0.5% for all levels of threading, even in these cases. One reason for these results is that the competition for the non-pipelined units (divide and square root) is negligible in the SPEC benchmarks. To illustrate code where non-pipelined units are more heavily used, Figure VI.9 shows the performance of water (which has a non-trivial number of divides) running eight threads. It compares performance with a shared FP divide unit against unshared FP divide units. In this case, unless each core has its own copy of the FP divide unit, performance degradation can be significant.

VI.E.4 Crossbar sharing

We implement the L1-L2 interconnect as a blocking, fully-connected matrix crossbar, based on the initial Piranha design. As the volume of traffic between L1 and L2 increases, the utilization of the crossbar goes up. Since there is a single path from a core to a bank, high utilization can result in contention and queuing delays. As discussed in Section VI.C, the area of the crossbar can be reduced by decreasing the width of the crossbar links or by sharing the ports of the crossbar, thereby reducing the number of links. We examine both techniques. Crossbar sharing involves the conjoined cores sharing an input port of the crossbar. Figure VI.10 shows the results for eight copies of the STREAM benchmark. It must be noted that this is a component benchmark we have tuned for worst-case utilization of the crossbar. The results are shown as the performance degradation incurred to achieve a given area savings. For example, to achieve crossbar area savings of 75% (area/4), we assume that the latency of every crossbar link has been doubled for the crossbar sharing case, while the transfer latency has been quadrupled for the crossbar width reduction case. We observe that crossbar sharing outperforms crossbar width reduction in all cases. Even though sharing results in increased contention at the input ports, it is the latency of the links that is primarily responsible for queuing of requests and hence overall performance degradation. We also conducted crossbar exploration experiments using SPEC benchmarks. However, most of the benchmarks do not exercise L1-L2 bandwidth much, resulting in relatively low crossbar utilization rates. The performance degradation in the worst case was less than 5% for an area reduction factor of 2.

VI.E.5 Simple sharing summary

Note that for all the results in this section, we assume that the shared resource is accessible only every other cycle even if the other core of a conjoined-core pair is idle. This was done to expose the factors contributing to overall performance degradation. However, in a realistic case, if there is no program running on the other core, the shared resources can be made fully accessible to the core running the program, and hence there would be no (or much smaller) degradation. Thus, for the above sharing cases, the degradation values for threading levels of four are overstated. In fact, the performance degradation due to conjoining will be minimal for light as well as medium loads. This section indicates that, even in the absence of sophisticated sharing techniques, conjoined-core multiprocessing is a reasonable approach. Optimizations in the next section make it even more attractive. It might be argued that this is simply evidence of the over-provisioning of our baseline design. There are two reasons why that is the wrong conclusion. First, our baseline is based on real designs, and is not at all aggressive compared to modern processor architectures. Second, real processors are over-provisioned – to some extent that is the point of this study. Designers provision the CPU for the few important applications that really stress a particular resource. What this research shows is that we can maintain that same level of provisioning for any single thread, without multiplying the cost of that provisioning by the number of cores.

VI.F Intelligent Sharing of Resources

The previous section assumed a very basic sharing policy and hence gave an upper bound on the degradation for each kind of sharing. In this section, we discuss more advanced techniques for minimizing performance degradation.

VI.F.1 ICache sharing

In this section, we focus on the configuration that minimized area but maximized slowdown — the four-wide-fetch shared ICache with an extra cycle of latency. In that case, both access latency and fetch bandwidth contribute to the overall degradation. We propose two techniques for minimizing this degradation. Most of these results would also apply to the other shared-ICache configurations, taking them even closer to zero degradation.

Assertive ICache Access

Section VI.E discussed sharing such that the shared resource gets accessed evenly, irrespective of the access needs of the individual cores. Instead, control of a shared resource can be granted assertively, based on the resource needs. We explore assertive ICache access where, whenever there is an L1 miss, the other core can take control of the cache after miss detection. We assume that a miss can be detected and communicated to the other core in 3 cycles. Control starts being shared again when the data returns. This does not incur any additional latency since the arrival cycle of the data is known well in advance of its return. Figure VI.11 shows the results for assertive ICache access. Like all graphs in this section, we show results for eight threads, where contention is highest. We observe a 13.7% improvement in the degradation of integer workloads and an improvement of 22.5% for floating-point workloads. The performance improvement comes from improved effective fetch bandwidth. These results are for eight threads, so there is no contribution from threads that are not sharing an ICache. A minor tweak to assertive access (for the ICache as well as the DCache and FPU) can ensure that the shared resource becomes a private resource when the other core of the conjoined pair is idle.
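A minimal sketch of the assertive-access arbitration described above (Python; the alternation default and the bookkeeping are illustrative choices, while the idea that a core cedes its slots between miss notification and data return follows the text):

    def icache_owner(cycle, fill_return_cycle):
        """Decide which core of a conjoined pair may access the shared ICache.

        fill_return_cycle[c] is None if core c has no outstanding L1 miss,
        otherwise the cycle at which its fill data returns. The caller is
        assumed to set this entry only after the 3-cycle miss-notification
        delay mentioned in the text.
        """
        for c in (0, 1):
            other = 1 - c
            stalled = fill_return_cycle[c] is not None and cycle < fill_return_cycle[c]
            other_active = fill_return_cycle[other] is None or cycle >= fill_return_cycle[other]
            if stalled and other_active:
                return other            # the stalled core's slots go to its partner
        return cycle % 2                # default: even sharing by alternation

A symmetric policy applies to the shared DCache discussed later; making the resource fully private when the partner core is idle is the "minor tweak" mentioned above.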

Fetch combining

Most parallel code is composed of multiple threads, each executing code from the same or similar regions of the shared executable (possibly synchronizing occasionally to ensure they stay in the same region). Hence, it is not uncommon for two or more threads to be fetching from the same address in a particular cycle. In a conjoined-core architecture with a shared ICache, this property can be exploited to improve overall fetch bandwidth. We propose fetch combining – when two threads running on the same conjoined-core pair have the same nextPC cache index, both can receive data from the cache that cycle. The overhead for fetch combining is minimal, under the following assumptions. We assume that the fetch units are designed so that, in the absence of sharing (no thread assigned to the alternate core), one core can fetch every cycle. Thus, each core has the ability to generate a nextPC cache index every cycle, and to consume a fetch line every cycle. In sharing mode, however, only one request is filled. Thus, in sharing mode with fetch combining, both cores can present a PC to the ICache, but only the PC associated with the core with access rights that cycle is serviced. However, if the presented PCs are identical, the alternate core also reads the data presented on the (already shared) output port and bus. This is simplified if there is some decoupling of the branch predictor from the fetch unit. If a queue of nextPC cache indices is buffered close to the ICache, we can continue to present new PCs to the cache every cycle, even if it takes more than a cycle for the result of the PC comparison to get back to the core. We find that the frequency of coincident indices is quite high – this is because once the addresses match, they tend to stay synchronized until the control flow diverges. Figure VI.12 shows the performance of fetch combining for water running eight threads. We observed a 25% reduction in performance degradation. Note that fetch combining is also applicable to other multithreading schemes, such as SMT.
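The cycle-by-cycle decision reduces to one index comparison at the shared cache interface. The sketch below (Python, illustrative names only) returns the set of cores that receive instructions in a cycle: the core holding access rights is always serviced, and its partner gets a free copy whenever the two presented cache indices match.

    def shared_fetch(cycle, cache_index):
        """cache_index[c] is the ICache index of core c's nextPC this cycle.

        Returns the set of cores whose fetch is satisfied. Only the core with
        access rights drives the cache, but if both cores present the same
        index, the line on the shared output port is consumed by both
        (fetch combining).
        """
        owner = cycle % 2                 # simple alternation of access rights
        served = {owner}
        other = 1 - owner
        if cache_index[other] == cache_index[owner]:
            served.add(other)
        return served

    # Example: two threads of a parallel loop fetching the same block
    print(shared_fetch(cycle=7, cache_index=[0x1a4, 0x1a4]))   # {0, 1}
    print(shared_fetch(cycle=7, cache_index=[0x1a4, 0x2c0]))   # {1}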

VI.F.2 DCache sharing

Performance loss from DCache sharing is due to three factors – inter-thread conflict misses, reduced bandwidth, and increased latency (if applicable). We propose two techniques for minimizing degradation due to DCache sharing.

Assertive DCache Access

Assertive access can also be used for the shared DCaches. Whenever there is an L1 miss on some data requested by a core, if the load is determined to be on the right path, the core relinquishes control over the shared DCache. There may be some delay between detection of the L1 miss and the determination that the load is on the right path. Once the core relinquishes control, the other core takes over full control and can then access the DCache whenever it wants. The timings are the same as with assertive ICache access. This policy is still somewhat naive, assuming that the processor will stall for this load (recall, these are in-order cores) before another load is ready to issue – more sophisticated policies are possible. Figure VI.13 shows the results. Assertive access leads to a 29.6% improvement in the degradation for integer workloads and a 23.7% improvement for floating-point workloads. The improvements are due to improved data bandwidth.

I/O partitioning

The DCache interface consists of two R/W ports. In the basic DCache sharing case, the DCache (and hence both ports) can be accessed only every other cycle. Instead, one port can be statically assigned to each of the cores, which makes the DCache accessible every cycle. Figure VI.13 shows the results comparing the baseline sharing policy against static port-to-core assignment. We observed a 33.6% reduction in degradation for integer workloads, while the difference for FP workloads was only 3%. This outperforms the cycle-slicing mechanism, particularly for integer benchmarks, for the following reason: when load port utilization is not high, the likelihood (with port partitioning) of a port being available when a load becomes ready is high. However, with cycle-by-cycle slicing, the likelihood of the port being available that cycle is only 50%.
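The 50% argument can be made concrete with a toy availability model. The sketch below (Python, illustrative; it ignores queuing and assumes a load becomes ready in a uniformly random cycle, which is a simplification) contrasts the expected extra wait under cycle-by-cycle slicing with that under static port partitioning at a given port utilization.

    import random

    def wait_cycle_sliced(rng):
        # The core may touch the shared DCache only every other cycle, so with
        # probability 1/2 a ready load must wait one cycle for its slot.
        return 0 if rng.random() < 0.5 else 1

    def wait_port_partitioned(rng, port_utilization):
        # The core owns one port outright; the load waits only if that port is
        # already busy, which happens with probability ~port_utilization.
        return 1 if rng.random() < port_utilization else 0

    rng = random.Random(1)
    trials = 100000
    sliced = sum(wait_cycle_sliced(rng) for _ in range(trials)) / trials
    for util in (0.1, 0.3, 0.5):
        part = sum(wait_port_partitioned(rng, util) for _ in range(trials)) / trials
        print(f"utilization {util}: slicing ~{sliced:.2f} extra cycles, "
              f"partitioning ~{part:.2f}")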

VI.F.3 Symbiotic assignment of threads

The previous techniques involved either using additional hardware to minimize performance impact or scheduling accesses to the shared resources intelligently. Alternatively, high-level scheduling of applications can be done such that “friendly” threads run on conjoined cores. Symbiotic scheduling has been shown previously to result in significant benefits on an SMT architecture [120] and involves co-scheduling threads to minimize competition for shared resources. Since conflict misses are a significant source of performance degradation for DCache sharing, we evaluated the impact of scheduling applications intelligently on the cores instead of using a random mapping. Intelligent mapping involves placing programs on a conjoined-core pair that would not cause as many conflict misses, and hence lessens the degradation. For symbiotic scheduling with 8 threads, we found the degradation decreased by 20% for integer workloads and 25.5% for FP workloads.

VI.G A Unified Conjoined-Core Architecture

This section examines various combinations of FPU, crossbar, ICache, and DCache sharing. For all these experiments, we assume a shared doubly-banked ICache with a fetch width of 16 bytes (similar to that used in Section VI.E.1), an I/O-partitioned shared DCache (similar to that used in Section VI.F.2), a fully-shared FPU and a shared crossbar input port for every conjoined-core pair. The shared ICache, DCache, and crossbar are each assumed to have one cycle of extra access overhead. We assume that each shared structure can be assertively accessed. Assertive access for the I/O-partitioned dual-ported DCache involves accessing the other port (the one not assigned to the core) assertively. Table VI.3

Table VI.3: Results with multiple types of sharing.

Units Shared                    Perf. Degradation          Core Area
                                Int Apps      FP Apps      Savings
Crossbar+FPU                    0.97%         1.2%         23.1%
Crossbar+FPU+ICache             4.7%          3.9%         33.0%
Crossbar+FPU+DCache             6.1%          6.8%         45.2%
ICache+DCache                   11.4%         7.6%         32.0%
Crossbar+FPU+ICache+DCache      11.9%         8.5%         55.1%

shows the resulting area savings and performance for various sharing combinations. We map the applications to the cores such that “friendly” threads run on the conjoined cores where possible. All performance numbers are for the worst case, when all cores are busy with threads. The combination with all four types of sharing results in 38.1% core-area savings (excluding crossbar savings). In absolute terms, this is equivalent to the area occupied by 3.76 cores. If crossbar savings are included, then the total area saved is equivalent to 5.14 times the area of a core. We observed an 11.9% degradation for integer workloads and an 8.5% degradation for floating-point workloads. Note that the degradation is significantly less than the sum of the individual degradation values that we observed for each kind of sharing. This is because a stall due to one bottleneck often either tolerates or obviates a stall due to some other bottleneck. Another attractive configuration utilizes only FPU and crossbar sharing. This configuration provides a 23.1% reduction in core area while degrading performance by around 1% in the worst case with all cores busy, and provides the highest marginal utility for sharing. This configuration also has the advantage of being simpler to implement than configurations that share caches. The results in Table VI.3 show that conjoined-core architectures can give superior computational efficiency over conventional non-conjoined cores. The area savings they provide can be used either to reduce die area, and hence increase yield, or can be leveraged to provide a significant increase in performance by implementing more cores in the same area. Finally, besides providing an area-efficiency advantage, conjoining can also result in more power-efficient computation. Since memory cells can be engineered to have low leakage, leakage power is primarily a function of the amount of high-performance logic. Thus, by sharing FPUs and/or the peripheral logic of caches, the number of logic circuits, and hence the leakage power of a multiprocessor, can be significantly reduced. Moreover, dynamic power per instruction can also be reduced, since the crossbar interconnect lengths that computation must traverse can be reduced by reducing the area of the cores.

VI.H Acknowledgment

The text of Chapter VI is in part a reprint of the material as it appears in the proceedings of the Thirty-Seventh International Symposium on Microarchitecture (pp. 195-206, December 2004). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed the research which forms the basis for Chapter VI.

Figure VI.2: (a) Floorplan of the original core; (b) layout of a conjoined-core pair, both showing FPU routing. Routing and register files are schematic and not drawn to scale.

Figure VI.3: A die floorplan with crossbar sharing

Figure VI.4: Impact of ICache sharing for various threading levels

Figure VI.5: ICache sharing when no extra latency overhead is assumed, cache structure bandwidth is not doubled, and cache is doubly banked

Figure VI.6: Impact of DCache sharing for various threading levels

Figure VI.7: DCache sharing when no extra latency overhead is assumed

Figure VI.8: Impact of FPU sharing for various threading levels

Figure VI.9: Impact of private FP divide sub-units

Figure VI.10: Reducing crossbar area through width reduction and port sharing

Figure VI.11: ICache assertive access results when the original structure bandwidth is not doubled

Figure VI.12: Fetch-combining results

Figure VI.13: Effect of assertive access and static assignment

VII The Interconnect Problem and the Need for Co-design

This chapter examines the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures. It shows that the design choices for the interconnect have significant effect on the rest of the chip, potentially consuming a significant fraction of the real estate and power budget. This research shows that designs that treat interconnect as an entity that can be independently architected and optimized (“multi-core oblivious”) would not arrive at the best multi-core design. Several examples are presented showing the need for a holistic approach to design (or careful co-design). For instance, increasing interconnect bandwidth requires area that then constrains the number of cores or cache sizes, and does not necessarily increase performance. Also, shared level-2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for. A hierarchical bus structure is examined which negates some of the performance costs of the assumed baseline architecture.


VII.A Related Work

There have been several proposals and implementations of high-performance chip multiprocessor architectures [26, 59, 68, 69]. The proposed interconnect for Piranha [26] was a fast, high-bandwidth switch. Cores in Hydra [59] are connected to the L2 cache through a crossbar. In both cases, the L2 cache is fully shared. IBM Power4 [68] has two cores sharing a triply-banked L2 cache. Connection is through a crossbar-like structure called the CIU (core-interface unit). In this chapter, we consider bus-based and crossbar-based interconnections to illustrate the value of holistic design. There have been recent proposals for packet-based on-chip interconnection networks [61, 36, 107], i.e. networks where data is sent in the form of packets that are reassembled at the destination. Packet-based networks structure the top-level wires on a chip and facilitate modular design. Modularity results in enhanced control over electrical parameters and hence can result in higher performance or reduced power consumption. These interconnections can be highly effective in particular environments where most communication is local, explicit core-to-core communication. However, the cost of distant communication is high. Due to their scalability, these architectures are attractive for a large number of cores. The crossover point where these architectures become superior to the more conventional interconnects studied in this chapter is not clear, and probably depends on implementation details. There is a large body of related work evaluating tradeoffs between bus-based and scalable shared memory multiprocessors, in the context of conventional (multiple-chip) multiprocessors. Some earlier implementations of interconnection networks for multiprocessors are described in [47, 133, 96, 113, 109, 48, 16, 94, 21]. However, on-chip interconnects have different sets of tradeoffs and design issues. We will show that on-chip interconnects can often affect the number, size, and design of cores and memory, and vice versa. Thus, the conclusions of those prior studies must be re-evaluated in the context of on-chip multiprocessors with on-chip interconnects.

VII.B Interconnection Mechanisms

In this section, we detail three interconnection mechanisms that may serve particular roles in an on-chip interconnect hierarchy – a shared bus fabric (SBF) that provides a shared connection to various modules that can source and sink coherence traffic, a point-to-point link (P2P link) that connects two SBFs in a system with multiple SBFs, and a crossbar interconnection system. In the subsequent sections, we will demonstrate the need for co-design using these mechanisms as a baseline. Many different modules may be connected to these fabrics, and they use them in different ways. But from the perspective of the core, an L2 miss goes out over the SBF to be serviced by higher levels of the memory hierarchy, another L2 on the same SBF, or possibly an L2 on another SBF connected to this one by a P2P link. If the core shares its L2 cache with another core, there is a crossbar between the cores/L1 caches and the shared L2 banks. Our initial discussion of the SBF in this section assumes private L2 caches. The results in this chapter are derived from a detailed model of a complex system, which is described in the next few sections.

VII.B.1 Shared Bus Fabric

A shared bus fabric is a high-speed link needed to communicate data between processors, caches, IO, and memory within a CMP system in a coherent fashion. It is the on-chip equivalent of the system bus for snoop-based shared memory multiprocessors [47, 133, 96]. We model a MESI-like snoopy write-invalidate protocol with write-back L2s for this study [24, 68]. Therefore, the


Figure VII.1: The assumed shared bus fabric for our interconnection study

SBF needs to support several coherence transactions (request, snoop, response, data transfer, invalidates, etc.) as well as arbitrate access to the corresponding buses. Due to large transfer distances on the chip and high wire delays, all buses must be pipelined, and therefore unidirectional. Thus, these buses appear in pairs; typically, a request traverses from the requester to the end of one bus, where it is queued up to be re-routed (possibly after some computation) across a broadcast bus that every node will eventually see, regardless of its position on the bus and distance from the origin. In the following discussion a bidirectional bus is really a combination of two unidirectional pipelined buses. We assume, for this discussion, that all cores have private L1 and L2 caches, and that the shared bus fabric connects the L2 caches (along with other units on the chip and off-chip links) to satisfy memory requests and maintain coherence. Below we describe a typical transaction on the fabric.

Typical transaction on the SBF

A load that misses in the L2 cache will enter the shared bus fabric to be serviced. First, the requester (in this case, one of the cores) signals the central address arbiter that it has a request. Upon being granted access, it sends the request over an address bus (AB in Figure VII.1). Requests are taken off the end of the address bus and placed in a snoop queue, awaiting access to the snoop bus (SB). Transactions placed on the snoop bus cause each snooping node to place a response on the response bus (RB). Logic and queues at the end of the response bus collect these responses and generate a broadcast message that goes back over the response bus, identifying the action each involved party should take (e.g., source the data, change coherence state). Finally, the data is sent over a bidirectional data bus (DB) to the original requester. If there are multiple SBFs (e.g., connected by a P2P link), the address request will be broadcast to the other SBFs via that link, and a combined response from the remote SBF returned to the local one, to be merged with the local responses. Note that the above transactions are quite standard for any shared memory multiprocessor implementing a snoopy write-invalidate coherence protocol [24].
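The flow above can be summarized as an ordered list of arbitrate/queue/broadcast steps. The sketch below (Python; the stage names follow Figure VII.1, but the per-stage cycle counts are placeholders for illustration, not the latencies of the modeled design) simply accumulates a no-contention latency — queuing delays under load come on top of this.

    # Ordered stages of an L2-miss transaction on the SBF (cycle counts are
    # placeholders, not the modeled values).
    STAGES = [
        ("address arbitration",              4),
        ("address bus (AB)",                 3),
        ("snoop queue + snoop bus (SB)",     3),
        ("snoop response generation",       10),
        ("combined response broadcast (RB)", 2),
        ("data transfer (DB)",               6),
    ]

    def no_contention_latency(stages=STAGES):
        total = 0
        for name, cycles in stages:
            total += cycles
        return total

    print(no_contention_latency(), "bus cycles (illustrative only)")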

Elements of the SBF

The composition of the SBF allows it to support all the coherence transactions mentioned above. We now discuss the primary buses, queues and logic that would typically be required to support these transactions. Figure VII.1 illustrates a typical SBF. Details of the modeled design are based heavily on the shared bus fabric in the Power5 multi-core architecture [69]. Each requester on the SBF interfaces with it via request and data queues. It takes at least one cycle to communicate information about the occupancy of the request queue to the requester. The request queue must then have at least two entries to maintain the throughput of one request every cycle. Similarly, all the units that can source data need to have data queues of at least two entries. Requesters connected to the SBF include cores, L2 and L3 caches, IO devices, memory controllers, and non-cacheable instruction units. All requesters interface to the fabric through an arbiter for the address bus. The minimum latency through the arbiter depends on (1) the physical distance from the central arbiter to the most distant unit, and (2) the levels of arbitration. Caches are typically given higher priority than other units, so arbitration can take multiple levels based on priority. Distance is determined by the actual floorplan. Since the address bus is pipelined, the arbiter must account for the location of a requester on the bus in determining the cycle in which access is granted. The overhead of the arbiter includes control signals to/from the requesters, arbitration logic and some latches. After receiving a grant from the central arbiter, the requester unit puts the address on the address bus. Each address request goes over the address bus and is then copied into multiple queues, corresponding to outgoing P2P links (discussed later) and to off-chip links. There is also a local snoop queue that queues up the requests and participates in the arbitration for the local snoop bus. Every queue in the fabric incurs at least one bus cycle of delay. The minimum size of each queue in the interconnect (there are typically queues associated with each bus) depends on the delay required for the arbiter to stall further address requests if the corresponding bus gets stalled. Thus it depends on the distance and communication protocol to the device or queue responsible for generating the requests that are sunk in the queue, and the latency of requests already in transit on the bus. We therefore compute queue size based on floorplan and distance.
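The queue-sizing rule can be captured by a simple flow-control bound. The helper below (Python) is a sketch of that idea rather than the exact rule used in the model: the queue must absorb whatever is launched during the round trip of the stall signal, plus the requests already latched on the bus, with one bus cycle of travel per latch segment (latch spacings as in Table VII.1).

    import math

    def min_queue_entries(distance_mm, latch_spacing_mm, issue_rate=1.0):
        """Lower bound on queue depth for a stall-based flow-control loop.

        distance_mm: wire distance between the queue and the unit it throttles.
        latch_spacing_mm: latch-to-latch spacing of the metal plane used
        (one bus cycle of travel per latch segment).
        issue_rate: requests injected per bus cycle while the stall propagates.
        """
        one_way = math.ceil(distance_mm / latch_spacing_mm)   # bus cycles each way
        round_trip = 2 * one_way + 1                          # +1 to raise the stall
        in_flight = one_way                                   # requests latched on the bus
        return math.ceil(issue_rate * round_trip) + in_flight

    # e.g. a queue 12 mm from its requester on the 8X plane (8 mm latch spacing):
    print(min_queue_entries(12, 8.0))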

The snoop bus can be shared, for example by off-chip links and other SBFs, so it also must be accessed via an arbiter, with associated delay and area overhead. Since the snoop queue is at one end of the address bus, the snoop bus must run in the opposite direction of the address bus, as shown in Figure VII.1. Each module connected to the snoop bus snoops the requests. Snooping involves comparing the request address with the address range allocated to that module (e.g., memory controllers) or checking the directory (tag array) for caches. A response is generated after a predefined number of cycles by each snooper and goes out over the response bus. The delay can be significant, because it can involve tag-array lookups by the caches, and we must account for possible conflicts with other accesses to the tag arrays. Logic at one end of the bidirectional response bus collects all responses and broadcasts a message to all nodes, directing their response to the access. This may involve sourcing the data, invalidating, changing coherence state, etc. Some responders can initiate a data transfer on a read request simultaneously with generating the snoop response, when the requested data is in the appropriate coherence state. The responses are collected in queues. All units that can source data to the fabric need to be equipped with a data queue. A central arbiter interfacing with the data queues is needed to grant one of the sources access to the bus at a time. Bidirectional data buses source the data. They support two different data streams, one in either direction. Data bandwidth requirements are typically high. It should be noted that designs are possible with fewer buses, with the various types of transactions multiplexed onto the same bus. However, that would require higher-bandwidth (e.g., wider) buses to support the same level of traffic at the same performance, so the overheads are unlikely to change significantly. We assume for the purpose of this study that only the above queues, logic and buses form a part of the SBF and contribute to the interconnection latency, power, and area overheads.

VII.B.2 P2P Links

If there are multiple SBFs in the system, the connection between the SBFs is accomplished using P2P links. Multiple SBFs might be required to increase bandwidth, decrease signal latencies, or to ease floorplanning (all connections to a single SBF must be on a line). For example, if a processor has 16 cores as shown in Figure VII.3, it becomes impossible to maintain a die aspect ratio close to 1 unless there are two SBFs, each supporting 8 cores. Each P2P link should be capable of transferring all kinds of transactions (request/response/data) in both directions. Each P2P link is terminated with multiple queues at each end. There needs to be a queue and an arbiter for each kind of transaction described above.

VII.B.3 Crossbar Interconnection System

The previous section assumed private L2 caches, with communication and coherence only occurring on L2 misses. However, if our architecture allows two or more cores to share L2 cache banks, a high-bandwidth connection is required between the cores and the cache banks. This is typically accomplished by using a crossbar. It allows multiple core ports to launch operations to the L2 subsystem in the same cycle. Likewise, multiple L2 banks are able to return data or send invalidates to the various core ports in the same cycle. The crossbar interconnection system consists of crossbar links and crossbar interface logic. A crossbar consists of address lines going from each core to all the banks (required for loads, stores, prefetches, TLB misses), data lines going from each core to the banks (required for writebacks) and data lines going from every bank to the cores (required for data reload as well as invalidate addresses).


Figure VII.2: A typical crossbar

A typical implementation, shown in Figure VII.2, consists of one address bus per core from which all the banks feed. Each bank has one outgoing data bus from which all the cores feed. Similarly, corresponding to each write port of a core is an outgoing data bus that feeds all the banks. Crossbar interface logic presents a simplified interface to the instruction fetch unit and the load-store unit in the cores. It typically consists of a load queue corresponding to each core sharing the L2. The load queue sends a request to the L2 bank appropriate to the request, where it is enqueued in a bank load queue (BLQ) (one per core for each bank, to avoid conflicts between cores accessing the same bank). The BLQs must arbitrate for the L2 tags and arrays, both among themselves and with the snoop queue, the writeback queue, and the data reload queue — all of which may be trying to access the L2 at the same time. After L2 access (on a load request), the data goes through the reload queue, one per bank, and over the data bus back to the core. The above description of the crossbar interface logic is based on the crossbar implementation (also called the core interface unit) in Power4 [68] and Power5 [69]. Note that even when the caches (or cache banks) are shared, an SBF is required to maintain coherence between the various units in the CMP system.

Table VII.1: Design parameters for wires in different metal planes

Metal   Pitch   Signal Pitch   Repeater       Repeater     Latch          Latch         Channel Leakage     Gate Leakage
Plane   (µm)    (µm)           Spacing (mm)   Width (µm)   Spacing (mm)   Height (µm)   per repeater (uA)   per repeater (uA)

1X      0.2     0.5            0.4            0.4          1.5            120           10                  2
2X      0.4     1.0            0.8            0.8          3.0            60            20                  4
4X      0.8     2.0            1.6            1.6          5.0            30            40                  8
8X      1.6     4.0            3.2            3.2          8.0            15            80                  10

VII.C Modeling Area, Power, and Latency

Both wires and logic contribute to interconnect overhead. This section describes our methodology for computing various overheads for 65nm technology. The scaling of overheads with technology as well as other design parameters is discussed in Section VII.G.

VII.C.1 Wiring Area Overhead

We address the area overheads of wires and logic separately. The latency, area, and power overhead of a metal wire depends on the metal layer used for that wire. The technology that we consider provides 10 layers of metal: 4 layers in the 1X plane and 2 layers in each of the higher planes (2X, 4X and 8X) [74]. The 1X metal layers are typically used for macro-level wiring [74]. Wiring tracks in higher layers of metal are very expensive and only used for time-critical signals running over a considerable distance (several millimeters of wire). We evaluate crossbar implementations for the 1X, 2X and 4X metal planes, where both data and address lines use the same metal plane. For our SBF evaluations, the address bus, snoop bus, and control signals always use the 8X plane. Response buses preferably use the 8X plane, but can use the 4X plane. Data buses can be placed in the 4X plane (as they have more relaxed latency considerations). All buses for P2P links are routed in the 8X plane. The area occupied by a bus is determined by the number of wires times the effective pitch of the wires times the length. We account for the case where the area of one bus (partially) subsumes the area of some other bus in a different plane. When buses are wired without logic underneath, repeaters and latches are placed under the buses without incurring any additional area overhead. However, when interconnection buses are routed over array structures (e.g. cache arrays, directories, etc.), we account for the fact that the sub-arrays (or memory macros) have to be shifted to make space for the placement of wire repeaters and latches. In this case the minimal repeater and latch spacing is an essential parameter determining the area overhead. We believe that 4X and 8X wires can be routed over memory arrays. However, in Section VII.F, we also evaluate routing 1X and 2X wires over memory, even though we believe it to be technologically more difficult. Table VII.1 shows the signal wiring pitch for wires in different metal planes at 65nm. These pitch values are estimated by conforming to the considerations mentioned in [125]. The table also shows the minimum spacing for repeaters and latches, as well as their heights, for computing the corresponding area overheads. We model the height of the repeater macro to be 15 µm. The height of the latch macro given in the table includes the overhead of the local clock buffer and local clock wiring, but excludes the overhead of rebuffering the latch output, which is counted separately. The values in Table VII.1 are for a bus frequency of 2.5 GHz and a bus voltage of 1.1 V. Analysis for different bus frequencies can be found in Section VII.G.
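As a concrete check on the wires × pitch × length rule, the helper below (Python) applies it with the signal pitches from Table VII.1; the 20 mm bus length is inferred from the 24.32 mm2 data-bus figure quoted later in Section VII.E.1 rather than stated explicitly in the text.

    def bus_area_mm2(width_bytes, signal_pitch_um, length_mm):
        """Area of a pipelined bus: one signal track per bit."""
        wires = width_bytes * 8
        return wires * (signal_pitch_um / 1000.0) * length_mm

    # 76-byte data buses in the 4X plane (2.0 um signal pitch), ~20 mm long:
    print(bus_area_mm2(76, 2.0, 20.0))   # -> 24.32 mm^2, matching Section VII.E.1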

VII.C.2 Logic Area Overhead

Area overhead due to interconnection-related logic comes primarily from queues. Queues are assumed to be implemented using latches. We estimate the area of a 1-bit latch used for implementing the queues to be 115 µm2 at 65nm [132]. This size includes the local clock driver and the area overhead of local clock distribution. We also estimate that there is a 30% area overhead due to the logic needed to maintain the queues (such as head and tail pointers, queue bypass, overflow signaling, request/grant logic, etc.) [33]. The interconnect architecture can typically be designed such that buses run over interconnection-related logic. The area taken up by wiring is usually big enough that it (almost) subsumes the area taken up by the logic. Because queues overwhelmingly dominate the logic area, we ignore the area (but not the latency) of multiplexors and arbiters. It should be noted that the assumed overheads can be reduced by implementing queues using custom arrays instead of latches.
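A rough sketch of the queue-area arithmetic (Python): latch count times the 115 µm2 latch cell, plus the 30% control overhead. The formula omits some contributors of the full model, so it only approximates the totals reported later in Table VII.2.

    def queue_area_mm2(num_latches, latch_area_um2=115.0, control_overhead=0.30):
        """Latch-built queue area: latch bits plus ~30% queue-control logic."""
        return num_latches * latch_area_um2 * (1.0 + control_overhead) / 1e6

    # Cross-check against the latch counts in Table VII.2 (approximate only):
    for cores, latches in ((4, 33824), (8, 52000), (16, 108288)):
        print(cores, "cores:", round(queue_area_mm2(latches), 2), "mm^2")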

VII.C.3 Power

Power overhead comes from wires, repeaters, and latches. For calculating dynamic dissipation in the wires, we optimistically estimate the capacitance per unit length of wire (for all planes) to be 0.2 pF/mm [66]. Repeater capacitance is assumed to be 30% of the wire capacitance [15]. The dynamic power per latch is estimated to be 0.05 mW at 2.5 GHz and 65 nm [132]. This includes the power of the local clock buffer and the local clock distribution, but does not include the rebuffering that typically follows latches. The total dynamic power of a bus depends on the utilization of the bus as well as the efficacy of clock gating. We assume that in 30% of the unused cycles the latches will be gated off, to be consistent with clock gating efficiencies typically quoted for high-end microprocessors [33]. Even though cycles of inactivity are easier to predict in the fabric than in the core, the physical distance between latches of neighboring clock stages is much larger in the fabric, which complicates the timing of the clock gating signals.

Repeater leakage is computed using the parameters given in Table VII.1. For latches, we estimate channel leakage to be 20 uA per bit in all planes (again not counting the repeaters following a latch). Gate leakage for a latch is estimated to be 2 uA per bit in all planes [15]. For computing dynamic and leakage power in the queues, we use the same assumptions as for the wiring latches.
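The pieces above can be combined into a rough per-bus power estimate. The sketch below (Python) is only illustrative: it treats bus utilization as the wire activity factor, applies the stated 30%-of-idle-cycles clock-gating assumption to the latches, uses the 20 uA + 2 uA per-bit latch leakage, and omits repeater leakage for brevity; the example bus width, length, and utilization are assumptions, not modeled values.

    def bus_power_mw(width_bits, length_mm, latch_spacing_mm, utilization,
                     freq_ghz=2.5, vdd=1.1, c_wire_pf_per_mm=0.2):
        """Rough dynamic + latch-leakage power for one pipelined, latched bus."""
        latches_per_wire = int(length_mm / latch_spacing_mm)
        n_latches = width_bits * latches_per_wire

        # Dynamic wire power: activity * C * V^2 * f, repeaters adding 30% to C.
        c_total_pf = width_bits * length_mm * c_wire_pf_per_mm * 1.3
        wire_dyn_mw = utilization * c_total_pf * 1e-12 * vdd ** 2 * freq_ghz * 1e9 * 1e3

        # Latches burn 0.05 mW when clocked; 30% of the idle cycles are gated off.
        active_fraction = utilization + 0.7 * (1.0 - utilization)
        latch_dyn_mw = n_latches * 0.05 * active_fraction

        # 20 uA channel + 2 uA gate leakage per latch bit.
        latch_leak_mw = n_latches * (20 + 2) * 1e-3 * vdd
        return wire_dyn_mw + latch_dyn_mw + latch_leak_mw

    # e.g. one direction of a 38-byte data bus, 20 mm long in the 4X plane
    # (5 mm latch spacing), at 30% utilization:
    print(round(bus_power_mw(38 * 8, 20.0, 5.0, 0.30)), "mW (illustrative)")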

VII.C.4 Latency

The latency of a signal traveling through the interconnect is primarily due to wire latencies, wait time in the queues for access to a bus, arbitration latencies, and the latching that is required between stages of interconnection logic. The latency of wires is determined by the spacing of latches, which is given in Table VII.1. Arbitration can take place in multiple stages (where each stage involves arbitration among units of the same priority) and latching needs to be done between every two stages. For 65 nm technology, we estimate that no more than four units can be arbitrated in a cycle. The latency of arbitration also comes from the travel of control signals between a central arbiter and the interfaces corresponding to the request/data queues. Other than arbiters, every time a transaction has to be queued, there is at least a bus cycle of delay — additional delays depend on the utilization of the outbound bus.

VII.D Modeling Multi-core Architectures

For this study, we consider a stripped version of out-of-order Power4-like cores [68]. We determine the area taken up by such a core at 65nm to be 10mm2. The area and power determination methodology is similar to the one presented in Chapter IV. The power taken up by the core is determined to be 10W, including leakage.

For calculating on-chip memory sizes, we use the Power5 cache density, as measured from die photos [69], scaled to 65nm. We determine it to be 1 bit per square micron, or 0.125MB/mm2. For the purpose of this study, we consider L2 caches as the only type of on-chip memory. We do not assume an off-chip L3 cache; in 65nm systems, it is likely that L3 chips would be present (the number of L3 chips would be limited, however, due to the large number of pins that every L3 chip would require), but we account for that effect using somewhat optimistic estimates for effective bandwidth and memory latency. Off-chip bandwidth was modeled carefully based on pin count [15] and the number of memory channels (a Rambus RDRAM interface was assumed). Our models include the effects of not only L2 banks and memory controllers, but also DMA controllers and non-cacheable instruction units (NCUs) on the die that can generate transactions. NCUs handle instructions like syncs, eieios (enforce in-order execution of I/Os), TLB invalidates, partial read/writes, etc., that are not cached and are directly put on the fabric. Each core has a corresponding NCU. The assumptions about these units are taken from the Power4 and Power5 designs [68, 69]. We simulate a MESI-like [106, 68] coherence protocol, and all transactions required by that protocol are faithfully modeled in our simulations. We also model weak consistency [42] for the multiprocessor, so there is no impact on CPI due to the latency of stores and writebacks. For performance modeling, we use a combination of detailed functional simulation and queuing simulation tools [92]. The functional simulator is used for modeling the memory subsystem as well as the interconnection between modules. It takes instruction traces from an SMP system as input and generates coherence statistics for the modeled memory/interconnect subsystem. The queuing simulator takes as input the modeled subsystem, its latencies, the coherence statistics, and the inherent CPI of the modeled core assuming a perfect L2. It then generates the CPI of the entire system, accounting for real L2 miss rates and real interconnection latencies. Traffic due to syncs, speculation, and MPL (message passing library) effects is accounted for as well. The tools and our interconnection models have been validated against a real, implemented design. The cache access times are calculated using assumptions similar to those made in CACTI [119]. Memory latency is set to 500 cycles. The average CPI of the modeled core over all the workloads that we use, assuming a perfect L2, is measured to be 2.65. Core frequency is assumed to be 5GHz for the 65nm studies. The buses as well as the L2 are assumed to be clocked at half the CPU speed.

VII.D.1 Workload

All our performance evaluations have been done using commercial workloads, including TPC-C, TPC-W, TPC-H, Notesbench and others further described in [92]. The server workloads represent such market segments as on-line transaction processing (OLTP), business intelligence, enterprise resource planning, web serving, and collaborative groupware. These applications are large and function-rich; they use a large number of operating system services and access large databases. These characteristics make the instruction and data working sets large. These workloads are also inherently multiuser and multitasking, with frequent read-write sharing. We use PowerPC instruction and data reference traces of the workloads running under AIX. The traces are taken in a non-intrusive manner by attaching a hardware monitor to a processor [43, 92]. This enables the traces to be gathered while the system is fully loaded with the normal number of users, and captures the full effects of multitasking, data sharing, interrupts, etc. These traces contain even DMA instructions and non-cacheable accesses.

VII.E Shared Bus Fabric: Overheads and Design Issues

This section examines the various overheads of the shared bus fabric, and the implications these have for the entire multi-core architecture. We examine floorplans for several design points, and characterize the impact of the area, power, and latency overheads on the overall design and performance of the processor. This section demonstrates that the overheads of the SBF can be quite significant. It also illustrates the tension between the desire to have more cores, more cache, and more interconnect bandwidth, and how that plays out in total performance. In this section, we assume private L2 caches and that all the L2s (along with NCUs, memory controllers, and IO devices) are connected using a shared bus fabric. We consider architectures with 4, 8, and 16 cores. Total die area is assumed to be constant at 400mm2 due to yield considerations. Hence, the amount of L2 per core decreases with an increasing number of cores. For 4, 8 and 16 cores, we evaluate multiple floorplans and choose those that maximize cache size per core while maintaining a die aspect ratio close to 1. In the default case, we consider the widths of the address, snoop, response and data buses of the SBF to be 7, 12, 8, and 38 bytes (the last in each direction), respectively — these widths are determined such that no more than 0.15 requests get queued up, on average, for the 8-core case. We also evaluate the effect of varying bandwidths. We can lay out 4 or 8 cores with a single SBF, but for 16 cores, we need two SBFs connected by a P2P link. In that case, we model two half-width SBFs and a 76-byte-wide P2P link. Figure VII.3 shows the floorplans arrived at for the three cases. The amount of L2 cache per core is 8MB, 3MB and 0.5MB for the 4, 8 and 16 core processors, respectively. It must be mentioned that the 16-core configuration is somewhat unrealistic for this technology, as it would result in inordinately high power consumption. However, we present the results here for completeness. Wires are slow and hence cannot be clocked at very high speeds without inserting an inordinately large number of latches. For our evaluations, the SBF buses are cycled at half the core frequency.

VII.E.1 Area

The area consumed by the shared bus fabric comes from wiring and interconnection-related logic, as described in Section VII.C. Wiring overhead depends on the architected buses and the control wires that are required for flow control and arbitration. Control wires are needed for each multiplexor connected to the buses and for signals to and from every arbiter. Flow control wires are needed from each queue to the units that control traffic to the queue. Since data buses run in the 4X plane, the area taken up by the (76-byte) data buses is calculated as 2µm × 76 × 8 × length, which results in 24.32mm2. The address (7 bytes), snoop (12 bytes), and response (8 bytes) buses, as well as the control wires (at least 198 of them for the stated control architecture for 8 cores), run in the 8X plane and can be routed over the data buses. In this case, the area overhead of the data buses is subsumed by that of the remaining buses. For 16 cores, the P2P links do not result in direct area overhead because they can be routed over the L2 caches. However, the reduction in cache density (due to bus latches and repeaters) does result in an area overhead. Figure VII.4 shows the wiring area overhead for the various processors. The graph shows the area overhead due to architected wires, control wires, and the total. We see that the area overhead due to interconnections in a CMP environment can be significant. For the assumed die area of 400mm2, the area overhead for the interconnect with 16 cores is 13%. The area overhead for 8 cores and 4 cores is 8.7% and 7.2% of the die area, respectively. Considering that each core is 10mm2, the area taken up by the SBF is sufficient to place 3-5 extra cores or 4-6 MB of extra cache. The graph also shows that the area overhead increases quickly with the number of cores. This result assumes constant-width architected buses, even when the number of cores is increased. If the effective bandwidth per core were kept constant, the overhead would increase even faster. The overhead due to control wires is high. Control takes up at least 37% of the SBF area for 4 cores and at least 62.8% for 16 cores. This is because the number of control wires grows linearly with the number of connected units, in addition to the linear growth in the average length of the wires. Reducing SBF bandwidth does not reduce the control area overhead, so it constrains how much area can be regained with narrower buses. Note that this argues against very lightweight (small, low-performance) cores on this type of chip multiprocessor, because a lightweight core does not amortize the incremental cost to the interconnect of adding each core. Interconnect area due to logic comes primarily from the various queues, as described in Section VII.C. Table VII.2 shows the area overhead due to interconnect-related logic and the corresponding breakdown. The area taken up by interconnection-related logic increases superlinearly with the number of connected units (note that the number of connected units is 14, 22 and 38, respectively, for the 4, 8 and 16 core processors). When going from 8 to 16 cores, the logic-area overhead jumps because queues are required to support two SBFs and a P2P link. Note, however, that the logic can typically be placed underneath the SBF wires. Thus, under these assumptions the SBF area is dominated by wires, but only by a small amount.

Table VII.2: Interconnection-related Logic overhead

Number of cores                              4         8         16

Number of data queue latches                 28672     45056     92160
Number of request queue latches              3136      4928      8512
Number of snoop queue latches                336       336       896
Number of latches for response bus queue     1680      1680      6720
Total number of latches                      33824     52000     108288

Area (in mm2)                                5.6       8.6       17.94

VII.E.2 Power

The power dissipated by the interconnect is the sum of the power dissipated by the wires and the logic. Figure VII.5 shows a breakdown of the total power dissipation of the interconnect. The graph shows that the total power due to the interconnect can be significant. The interconnect power overhead for the 16-core processor is more than the combined power of two cores. It is equal to the power dissipation of one full core even for the 8-core processor. Power increases superlinearly with the number of connected units. This is because of the (at least linear) increase in the number of control wires as well as the (at least linear) increase in the number of queuing latches. There is also a considerable increase in bus traffic with the growing number of cores. Half the power due to wiring is leakage (mostly from repeaters). Contrary to popular belief, interconnect power is not always dominated by the wires. The power due to logic can be, as in this case, more than the power due to wiring.

VII.E.3 Performance

For the chip multiprocessor consisting of four cores, the end-to-end latency of the architected buses, except the data buses, would be 6 processor cycles (as three latches would be required for 20mm wires running in the 8X plane). Similarly, the latency of the data buses (routed in the 4X plane) would be 12 processor cycles. Address arbitration would take 8 processor cycles, and the request transfer between the central address arbiter and the request queues would take another 8 processor cycles (two latches each way – we assume central arbiters to be placed around the middle of the chip). Every queue would take at least one bus cycle. Snoop response generation would take at least 20 processor cycles (cache tag access and two latches). Generating a final response would take 4 cycles (two latches). We also estimate that an 8MB cache can be accessed in 46 cycles (this includes array access time, time to go through queues, arbitration overhead for getting access to the array, etc.; note that the L2 is assumed to be cycled at half the core frequency). Accounting for all arbitration, bus latencies, queues, and the cache access, the minimum (no contention) latency of a load is 124 cycles for a four-core multi-core. Even under these optimistic assumptions, the interconnect accounts for over half the total latency to the L2 cache. Figure VII.6 shows the per-core performance for the 4, 8 and 16 core architectures, both assuming no interconnection overhead (zero-latency interconnection) and with the interconnection overheads modeled carefully. Single-thread performance (even assuming no interconnection overhead) goes down as the number of cores increases, due to the reduced cache size per core. If interconnect overhead is considered, then performance decreases much faster. In fact, the performance overhead due to interconnection is more than 10% for 4 cores, more than 13% for 8 cores and more than 26% for 16 cores. In the results to this point, we keep bus bandwidth constant. In Figure VII.7, we show the single-thread performance of a core in the 8-core processor case when the width of the architected buses is varied by factors of two. The graph also shows the real estate saved compared to the baseline. We see that with wide buses, the area costs are significant, and the incremental performance is minimal. On the other hand, with narrow buses, the area saved by small changes in bandwidth is small, but the performance impact is significant. Alternatively, we could put the area saved through bandwidth reduction to use. We ran simulations that assume that we put that area back into the caches. We find that over certain ranges, if the bandwidth is reduced by small factors, the performance degradation can be recovered using bigger caches. For example, decreasing the bandwidth by a factor of 2 decreases performance by 0.57%, but it saves 8.64mm2. This can be used to increase the per-core cache size by 135KB. When we ran simulations using the new cache sizes, we observed a performance improvement of 0.675%. Thus, we can decrease bus bandwidth and improve performance (if only by small margins in this example), because the resulting bigger caches protect the interconnect from a commensurate increase in utilization. On the other hand, when bandwidth is decreased by a factor of 8, performance decreases by 31%, while the area saved is 15.12mm2. That area savings is sufficient to increase the per-core cache size by only 240KB, which was not enough to offset the performance loss in this case.
Similarly, when doubling interconnect bandwidth over our baseline configuration, total performance decreased by 1.2% due to the reduced cache sizes. This demonstrates the importance of co-designing the interconnect and the memory hierarchy. Neither the biggest caches nor the widest interconnect gives the best performance; designing each of these subsystems independently is unlikely to result in the best design. Similarly, the core itself should be co-architected with the caches and interconnect, but for this study we treat the cores as a constant. Chapter V presented our explorations with core designs.
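The cache-capacity side of this tradeoff follows directly from the 0.125 MB/mm2 density quoted in Section VII.D. A small conversion helper (Python) reproduces the per-core figures used above; the exact rounding behind the quoted 135 KB and 240 KB values is not specified in the text, so the outputs are only approximate.

    def reclaimed_cache_kb_per_core(saved_mm2, cores, mb_per_mm2=0.125):
        """Convert die area reclaimed from the interconnect into per-core L2 capacity."""
        return saved_mm2 * mb_per_mm2 * 1000.0 / cores

    # Halving SBF bandwidth in the 8-core design saves 8.64 mm^2:
    print(reclaimed_cache_kb_per_core(8.64, 8))    # ~135 KB per core
    # Cutting bandwidth by 8x saves 15.12 mm^2:
    print(reclaimed_cache_kb_per_core(15.12, 8))   # ~236 KB per core (~240 KB quoted)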

VII.F Shared Caches and the Crossbar

The previous section presented evaluations with private L1 and L2 caches for each core, but many proposed chip multiprocessors have featured shared L2 caches, connected with crossbars. Shared caches allow the cache space to be partitioned dynamically rather than statically, typically improving overall hit rates. Also, shared data does not have to be duplicated. To fully understand the tradeoffs between private and shared L2 caches, however, we find that it is absolutely critical that we account for the impact of the interconnect.

VII.F.1 Area and power overhead

The crossbar, shown in Figure VII.2, connects the cores (with their L1 caches) to the shared L2 banks. The data buses are 32 bytes wide, while the address bus is 5 bytes wide. Lower-bandwidth solutions were found to adversely affect performance and render sharing highly unfruitful. In this section we focus on an 8-core processor with 8 cache banks, giving us the options of 2-way, 4-way, and full (8-way) sharing of cache banks. Crossbar wires can be implemented in the 1X, 2X or 4X plane. For almost a 2x reduction in latency, the wire thickness doubles every time we go to a higher metal plane. Figure VII.8 shows the area overhead for implementing the different mechanisms of cache sharing. The area overhead is shown for two cases – one where the crossbar runs between the cores and the L2, and the other where the crossbar can be routed over the L2. When the crossbar is placed between the L2 and the cores, interfacing is easy, but all wiring tracks result in area overhead. When the crossbar is routed over the L2, area overhead is only due to the reduced cache density needed to accommodate repeaters and latches. However, the implementation is relatively complex, as vertical wires are needed to interface the core with the L2. We show the results assuming that the L2 density is kept uniform (i.e. even if repeaters/latches are dropped only over the top region of the cache, sub-arrays are displaced even in the other regions to maintain uniform density). Cache sharing carries a heavy area overhead. If the total die area is around 400 mm2, then the area overhead for an acceptable latency (2X) is 11.4% for 2-way sharing, 22.8% for 4-way sharing and 46.8% for full sharing (nearly half the chip!). The overhead increases as we go to higher metal layers due to increasing signal pitch values. When we assume that the crossbar can be routed over the L2, the area overhead is still substantial; however, in that case it improves as we move up in metal layers, because at the lower levels the number of repeaters/latches, which must displace cache, is highest. The point of sharing caches is to get the effect of having more cache space. In this case, the cores can gain significant real cache space by foregoing sharing, raising doubts about whether sharing has any benefit. This is seen even more clearly in the next subsection. The high area overhead again suggests that issues of interconnect/cache/core co-design must be considered. For crossbars sitting between the cores and the L2, just two-way sharing results in an area overhead equivalent to more than the area of two cores. Four-way sharing results in an area overhead of 4 cores, and 8-way sharing results in an area overhead of 9 cores. If the same area were devoted to caches, one could instead add 2.75 MB, 5.5 MB and 11.6 MB of extra cache, respectively. Figure VII.9 shows the corresponding power overhead. A breakdown is also provided for the various sources of power dissipation. The graph shows that the power overhead due to crossbars is very significant. The overhead can be more than the power taken up by three full cores for a completely shared cache, and more than the power of one full core for 4-way sharing. Even for 2-way sharing, the power overhead is more than half the power dissipation of a single core. Hence, even if power is the primary constraint, the benefits of shared caches must be weighed against the possibility of more cores or significantly more cache. Leakage is a smaller fraction of the total power for crossbars than for SBFs; however, it is still significant — 18-20%, depending on the metal layer.
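The scaling of the crossbar with the degree of sharing can be seen from a simple track count. The sketch below (Python) counts wiring tracks for one shared-L2 cluster using the bus widths quoted above (5-byte address, 32-byte data each way); the physical span the crossbar must cover depends on the floorplan and grows with the cluster, so the per-mm figure is only a scaling illustration, not a reproduction of Figure VII.8.

    def crossbar_tracks(cores, banks, addr_bytes=5, data_bytes=32):
        """Wiring tracks for one cluster: per-core address and writeback buses
        plus one reload bus per bank."""
        return 8 * (cores * (addr_bytes + data_bytes) + banks * data_bytes)

    # 2-way, 4-way and 8-way sharing of banks, wired in the 2X plane (1.0 um pitch):
    for n in (2, 4, 8):
        tracks = crossbar_tracks(n, n)
        area_per_mm = tracks * 1.0 / 1000.0      # mm^2 per mm of crossbar span
        print(f"{n}-way sharing: {tracks} tracks, {area_per_mm:.2f} mm^2 per mm of span")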

VII.F.2 Performance

Because of the high area overhead for cache sharing, the total amount of on-chip cache decreases with sharing. We performed our evaluations for the most tightly packed floorplans that we could find for 8-core processors with different levels of sharing. When the crossbar wires are routed in the 2X plane between the cores and the L2, the total cache size is 20MB, 14MB and 4MB for 2-way, 4-way and full sharing, respectively. When the crossbar is routed over the L2 (assuming uniform cache density), the total cache size is 22MB for 4X and 18.2MB for 2X. We also conducted experiments assuming no crossbar area overhead, to isolate the benefit of sharing.

Figure VII.10 shows results for a fixed on-chip cache size (i.e., assuming no crossbar area overhead; crossbar latency overhead is still assumed). Figure VII.11 presents results for a fixed die area, with cache sizes varied accordingly (i.e., taking crossbar area overhead into account). Figure VII.10 shows that cache sharing generally results in higher performance than private caches if interconnection area overheads are not considered. It also shows that crossbar performance is very sensitive to the metal plane used to implement it.

Figure VII.11 assumes a constant die area and considers interconnection area overhead. It shows that performance, even without considering the interconnection latency overhead (and hence purely the effect of cache sharing), either does not improve or improves only by a slight margin, because the on-chip caches shrink to accommodate the crossbar. If interconnect latencies are accounted for (higher sharing means longer crossbar latencies), sharing degrades performance even between two cores. Note that, in this case, the conclusion reached ignoring interconnect area effects is the opposite of that reached when those effects are considered.

The performance loss due to increased L2 hit latency can be mitigated by L2 latency-hiding techniques, such as overlapping of L2 accesses or prefetching. Crossbar area overhead can also be reduced (and hence performance improved) by implementing caches with non-uniform density. In fact, we observed that 2-way and 4-way sharing improve performance for the 4X crossbar implementation if the crossbar is routed over memory and the L2 is allowed to have non-uniform density. Sharing might also be beneficial for other workloads with different working-set and sharing behavior. Also, the lower the relative frequency of the bus, the less prohibitive large-scale L2 sharing becomes; for example, if the crossbars are routed over the L2 and are clocked at one-fourth the core frequency, we observe that 2-way and 4-way sharing improve performance in the 4X metal plane.

However, our results clearly show that shared caches become significantly less desirable than commonly assumed once interconnection overheads are considered. We believe this conclusion holds, in general, for uniform-access-time caches, and it calls for evaluation of caching strategies with careful consideration of interconnect overheads. Further analysis is needed for intelligent NUCA (non-uniform cache access) caches [78]. The results also emphasize the need for holistic design, as the best processor designs are not the ones that ignore interconnection costs.
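The fixed-die-area comparison in Figure VII.11 is essentially an area-accounting exercise: whatever area the crossbar takes is subtracted from the cache budget. The sketch below illustrates that bookkeeping with assumed, illustrative numbers for core area and cache density; it is not the floorplanning methodology used to obtain the cache sizes reported above.

    # Sketch of the fixed-die-area accounting behind Figure VII.11.
    # Core area and cache density are stand-in assumptions used only to
    # illustrate the bookkeeping, not values from our actual floorplans.

    DIE_AREA_MM2     = 400.0  # from the text
    CORE_AREA_MM2    = 10.0   # assumed area of one core + L1 + NCU
    CACHE_MM2_PER_MB = 4.0    # assumed L2 density

    def cache_budget_mb(n_cores, crossbar_area_mm2):
        """On-chip L2 capacity left once cores and the crossbar are placed."""
        leftover = DIE_AREA_MM2 - n_cores * CORE_AREA_MM2 - crossbar_area_mm2
        return max(leftover, 0.0) / CACHE_MM2_PER_MB

    # Private caches (no crossbar) vs. sharing with progressively larger crossbars
    # (the crossbar areas below are illustrative, not measured values):
    for label, xbar_mm2 in [("private", 0.0), ("2-way", 46.0), ("full", 187.0)]:
        print(label, round(cache_budget_mb(8, xbar_mm2), 1), "MB total L2")

Under a fixed budget, a crossbar costing tens of mm2 necessarily starts the shared configuration off with several megabytes less cache, which is where the performance loss in Figure VII.11 comes from.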

VII.G Scaling with Technological Parameters

All results to this point in the chapter have been carefully parameterized to the 65 nm process. This section extends those results to future technologies, and also considers the effect of deeper pipelining (i.e., faster CPU clocks). As technology shrinks, the repeater and latch spacings decrease, thereby increasing the latency overhead due to wires. Going to deeper pipelines/faster clocks also decreases repeater and latch spacings, as well as increasing the logic overhead (in terms of cycles). Technology scaling and deeper pipelines also increase perceived memory and cache access times. We used ITRS scaling trends to compute the latencies for 45 nm and 32 nm technologies.

Figure VII.12 shows the results for a fixed die area (400 mm2) for 65 nm, 45 nm and 32 nm. The results are shown for the 8-core case. Each core has a private cache of 3MB, 6MB and 10MB, respectively, for the three technologies. We assume the base CPI of the core (apart from memory behavior) remains the same for this study.

Figure VII.12 shows that, for a given technology, deeper pipelines increase the interconnection overhead on performance. For example, at 65 nm, if the pipeline depth is changed from 26 FO4 to 10 FO4, the interconnection overhead increases by 30%; for 45 nm the corresponding change is 55%, and for 32 nm it is 44.4%. Also, for the same die area and the same pipeline depth, interconnection overhead increases as technology shrinks, due to increased latencies, increased arbitration overhead, etc. Note that the larger caches enabled by newer technologies lead to higher hit rates as well as reduced traffic on the interconnect, but this is not sufficient to hide the effect of the increased interconnect latencies. It must be mentioned, however, that overall performance still improves because of the higher frequencies.
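The interaction between pipeline depth and wire delay can be illustrated with a toy calculation: the same physical wire delay occupies more cycles as the cycle time (expressed in FO4 delays) shrinks. The FO4 and wire-delay numbers below are rough assumptions chosen only for illustration; the results in this chapter are derived from ITRS-based wire models, not from this sketch.

    import math

    # Toy model of how interconnect latency (in cycles) grows with technology
    # and pipeline depth.  FO4 delays and the wire delay are assumed values.

    FO4_PS = {65: 17.0, 45: 12.0, 32: 9.0}   # assumed FO4 delay per node (ps)

    def wire_cycles(wire_delay_ns, tech_nm, pipeline_fo4):
        """Cycles needed to traverse a wire with delay wire_delay_ns when the
        cycle time is pipeline_fo4 FO4 delays at the given technology node."""
        cycle_ns = pipeline_fo4 * FO4_PS[tech_nm] / 1000.0
        return math.ceil(wire_delay_ns / cycle_ns)

    # An assumed 2 ns fabric traversal: deeper pipelines (fewer FO4 per cycle)
    # and smaller nodes both turn the same physical wire into more cycles.
    for tech in (65, 45, 32):
        for depth in (26, 10):
            print(f"{tech}nm, {depth} FO4/cycle:",
                  wire_cycles(2.0, tech, depth), "cycles")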

VII.H An Example Holistic Approach to Interconnection

The intent of this section is to apply one lesson learned from the large volume of data gathered in the research described in this chapter. Our interconnect architectures to this point were driven primarily by layout: the SBF spans the width of the chip, allowing us to connect as many units as possible in a straight line across the chip. However, the latency overheads of a long SBF encourage us to consider alternatives. This section describes a more hierarchical approach to interconnects, which can exploit shorter buses with shorter latencies when traffic remains local. We again consider the 8-core processor.

The effectiveness of such an approach depends on the probability that an L2 miss is serviced by a local cache (an L2 connected to the same SBF), rather than by a cache on a remote SBF. We refer to this probability as “thread bias”. A workload with high thread bias means that we can identify and map “clusters” of threads that principally communicate with each other onto the same SBF. In this section, we split the single SBF that spans the chip into two SBFs, with a P2P link between them. Local accesses benefit from decreased distances; remote accesses suffer because they travel the same distances and see additional queuing and arbitration overheads between interconnects.

Figure VII.14 shows the performance of the split SBF for various thread-bias levels. The SBF is split vertically into two, such that each SBF piece now supports 4 cores, 4 NCUs, 2 memory controllers and 1 I/O device. The X-axis shows the thread bias in terms of the fraction of misses satisfied by an L2 connected to the same SBF; a 25% thread bias means that one out of four L2 misses is satisfied by an L2 connected to the same SBF piece. These results are obtained through statistical simulation by synthetically skewing the traffic pattern.

The figure also shows the system performance for a single monolithic SBF (the one used in previous sections). As can be seen, if the thread bias is more than 17%, the performance of the split SBF overtakes that of the monolithic SBF. Note that 17% is lower than the probability of locally satisfying an L2 miss under a uniform distribution (3/7, since three of the seven other caches sit on the same SBF piece). Hence, the split SBF, in this case, is clearly a good idea.
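The break-even point can be understood with a simple expected-latency argument: the split fabric wins once the weighted average of its shorter local latency and longer remote latency drops below the monolithic latency. The sketch below works through that arithmetic with assumed latencies; the 17% break-even reported above comes from the full statistical simulation, not from these numbers.

    # Simple expected-latency model behind the break-even thread bias.
    # All three latencies are assumed values chosen only for illustration.

    MONOLITHIC_NS = 8.0   # assumed average transfer over one long SBF
    LOCAL_NS      = 5.0   # assumed transfer within one SBF half
    REMOTE_NS     = 10.0  # assumed transfer crossing the P2P link
                          # (extra queuing and arbitration)

    def split_sbf_latency(thread_bias):
        """Expected transfer latency when a fraction thread_bias of L2
        misses are satisfied by a cache on the same SBF half."""
        return thread_bias * LOCAL_NS + (1.0 - thread_bias) * REMOTE_NS

    # Break-even bias: where the split fabric matches the monolithic one.
    break_even = (REMOTE_NS - MONOLITHIC_NS) / (REMOTE_NS - LOCAL_NS)
    print("break-even thread bias:", break_even)           # 0.4 here
    print("latency at uniform bias 3/7:", split_sbf_latency(3 / 7))

With these illustrative numbers, a uniform 3/7 bias already lies past the break-even point, mirroring the behavior observed in Figure VII.14.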

VII.I Acknowledgment

The text of Chapter VII is in part a reprint of the material as it appears in the proceedings of the Thirty-second International Symposium on Computer Architecture (pp. 408-419, June 2005). The dissertation author was the primary researcher and author, and the co-authors involved in the submission directed, supervised, and assisted the research which forms the basis for Chapter VII.

Figure VII.3: Floorplans for 4, 8 and 16 core processors (cores, NCUs, L2 data banks, memory controllers and IOX units connected by the shared bus fabric).

Figure VII.4: Area overhead for shared bus fabric (overhead in mm^2 vs. number of cores: 4, 8, 16; series: architected busses, control wires, total overhead).

Figure VII.5: Power overhead for shared bus fabric (overhead in W vs. number of cores: 4, 8, 16; broken down into leakage and dynamic power of wires, repeaters, and wiring/logic latches).

Figure VII.6: Performance overhead due to shared bus fabric (performance with and without interconnection overhead, and performance degradation in %, vs. number of cores: 4, 8, 16).

Figure VII.7: Trading off interconnection bandwidth with area (performance degradation in % and saved area in mm^2, from 2x base bandwidth down to base bandwidth/8).

Figure VII.8: Area overhead for cache sharing (overhead in mm^2 vs. metal plane: 1X, 2X, 4X; two-way, four-way and all-sharing) – results for crossbar routed over L2 assume uniform cache density.

Figure VII.9: Power overhead for cache sharing (overhead in W vs. metal plane: 1X, 2X, 4X; the three bars, left to right, correspond to 2-way, 4-way and full sharing).

Figure VII.10: Evaluating cache sharing for a fixed cache size for different crossbar implementations (performance for all_private, two_shared, four_shared and all_shared; 1X, 2X and 4X crossbars) – no area overhead is assumed.

Figure VII.11: Evaluating cache sharing for a fixed die area (performance for all_private, two_shared, four_shared and all_shared; 2X crossbar between cores and L2, and 2X/4X crossbars routed over L2) – area overhead taken into account.

Figure VII.12: Scaling of interconnection overhead with pipelining and technology (performance with and without interconnection overhead, and performance degradation in %, at 65nm, 45nm and 32nm across a range of pipeline depths in FO4).

Figure VII.13: Hierarchical approach (splitting SBFs) – each half of the chip (4 cores, 4 NCUs, 2 memory controllers and 1 IOX unit, with the associated L2 banks) has its own SBF, and the two SBFs are connected by a P2P link.

Figure VII.14: Split vs monolithic SBF (performance as a function of thread bias, 0-100%).

VIII Summary and Future Work

The decreasing marginal utility of transistors and the increasing complexity of design have led to the advent of multi-core architectures. However, fundamental questions remain regarding the right form, methodology, and implementation for multi-core designs. This thesis seeks to address these questions. Specifically, the thesis shows that a “multi-core oblivious” design methodology, where the processor subsystems are designed and optimized without any cognizance of the overall chip multiprocessing system that they will become part of, results in processors that are inefficient in terms of area and power. The thesis shows that this inefficiency is due to the inability of such processors to react to (1) workload diversity, (2) processor overprovisioning, and (3) the high cost of connecting cores and caches.

This thesis recommends a holistic approach to multi-core design, where the processor subsystems are designed from the ground up to be parts of a chip multiprocessing system. Specifically, we propose single-ISA heterogeneous multi-core architectures for adapting to workload diversity. These architectures consist of multiple types of processing cores on the same die. The cores all execute the same ISA, but represent different points in the power-performance continuum. Applications are mapped to cores such that the resource demands of an application match the resources provided by the assigned core. This results in increased computational efficiency and hence higher throughput for a given area and/or power budget.

We also propose conjoined-core chip multiprocessing architectures. These architectures consist of multiple cores on the die where adjacent cores share large, overprovisioned structures. Sharing reduces the area requirement of such processors at the cost of a minimal loss in performance. Reduced area, in turn, results in higher yield, lower leakage, and potentially higher overall throughput per unit area.

The thesis also studies conventional interconnection mechanisms for multi-core architectures and demonstrates that the interconnection overheads are significant enough to affect the number, size, and design of cores and caches. This thesis shows the need to co-design the cores, caches, and interconnects, and also presents an example holistic approach to interconnection design for multi-core architectures.

VIII.A Holistic Design for Adaptability

We have demonstrated that heterogeneous multi-core architectures can provide significant throughput advantages over equivalent-area homogeneous architectures. This throughput advantage results from the ability of heterogeneous processors to better exploit both variations in thread-level parallelism and inter- and intra-thread diversity. We also propose and evaluate a set of thread scheduling mechanisms to best realize the potential performance gain available from heterogeneity.

Over a wide range of threading parallelism, the representative heterogeneous architecture we study performs 18% better on average than a homogeneous CMP architecture of the same area on SPEC workloads. For an open system with random task arrivals and exits, our results showed that heterogeneous architectures can have much lower response times than corresponding homogeneous configurations. The heterogeneous systems were also stable at job arrival rates up to 43% higher.

Having a diversity of cores with varying resources and pipeline architectures enables the system to efficiently leverage application diversity at both the inter-thread and intra-thread level. Applications least able to derive significant benefit from large and complex cores can instead be run on smaller, less complex cores with much better area efficiency. This work demonstrates effective yet relatively simple task scheduling mechanisms to best match applications to cores. Our best core assignment strategy achieves more than a 30% performance improvement over a naive heuristic, while still being straightforward to implement.

This thesis also seeks to gain insight into the energy benefits available from the new architecture. The particular opportunity examined is a single application switching among cores to optimize some function of energy and performance. We show that a sample heterogeneous multi-core design with four complexity-graded cores has the potential to increase energy efficiency (defined as energy-delay product, in this case) by a factor of three, in one experiment, without dramatic losses in performance. Energy efficiency improvements significantly outdistance chip-wide voltage/frequency scaling. Most of these gains are possible even when using as few as two cores.

This work demonstrates that there can be a great power advantage to diversity within an on-chip multiprocessor, allowing the architecture to adapt to the workload in ways that a uniform CMP cannot. A multi-core heterogeneous architecture can support a range of execution characteristics not possible in an adaptable single-core processor, even one that employs aggressive gating. Such an architecture can adapt not only to changing demands in a single application, but also to changing demands between applications, changing priorities or objective functions within a processor or between applications, or even changing operating environments. The results indicate not only that there is significant potential for this style of architecture, but also that reasonable runtime heuristics for switching cores, using limited runtime information, can achieve most of that potential.

The thesis also examines the design of cores for a heterogeneous CMP. The goal is to determine how to design a heterogeneous CMP for a given set of workloads and given area and power budgets. We try to identify the characteristics of the cores of a heterogeneous multiprocessor for the highest area or power efficiency, and quantify the benefit that can be obtained from a ground-up design. We also present a methodology for the design and optimization of the constituent processor cores given a set of target applications, as well as specific design budgets. We call a set of cores non-monotonic if their performance is not fully ordered over a range of different applications.

We show that the best way to design a heterogeneous CMP is not to find individual cores that are well suited for the entire universe of applications, but rather to tune the cores to different classes of applications. We find that customizing cores to subsets of the workload results in processors with greater performance and power benefits than heterogeneous designs with an ordered set of cores. An example of such a design outperformed the best homogeneous CMP design by 15.4% and the best fully-customized monotonic design by 7.5%. There were performance and power improvements for fully homogeneous workloads as well as for single-threaded workloads. Benefits are even greater when power and area budgets are increasingly constrained; performance improvements of up to 40% are shown.

Given current trends in processor design, we expect dynamic power (and static power, which is roughly proportional to area) to become increasingly constrained in the future, even in desktop and server markets. This means that the value of custom core architectures, as well as the benefits of aggressive heterogeneity, will only increase with time.

VIII.B Obviating Overprovisioning in Multi-cores

This thesis also examines conjoined-core multiprocessing, selectively targeting opportunities to share resources on an otherwise statically partitioned chip multiprocessor. In particular, we seek to achieve area savings, dynamic power reduction, and leakage reduction by sharing resources that have sufficient bandwidth and/or capacity to service multiple cores. We add the additional constraint that the sharing be topologically feasible with minimal impact to a conventional core layout. This thesis examines sharing of the floating-point units, the crossbar network ports, and the first-level ICache and DCache.

We show that, given a set of novel optimizations that reduce the negative impacts of this sharing, we can reduce area requirements by more than 50% while achieving performance within 9-12% of conventional cores without conjoining. Alternatively, by sharing only the floating-point units and crossbar ports, core area can be reduced by more than 23% while achieving performance within 2% of conventional cores without conjoining. These gains are a combination of the inherent advantage of sharing resources provisioned for worst-case utilization, and the application of new sharing policies that allow high-bandwidth access to these resources without additional complexity.

VIII.C Interconnection-aware Co-design

This thesis presents the results of a detailed modeling of the impact of the interconnection fabric on a hypothetical chip multiprocessor. These results show that the architecture of the interconnect interacts with the design and architecture of the cores and caches to a much greater degree than conventional off-chip interconnects. Thus, any design that hopes to achieve high performance, or even energy efficiency, must be the result of a careful co-design of all three elements.

This study shows several examples of this need for co-design. The interconnect fabric itself is large and power-hungry, consuming resources that would otherwise be available for more cores and caches; even without the sharing of L2 caches, it can take the area of three cores and the power of one. We show examples where decreasing interconnection bandwidth can improve performance, due to the constrained window on total resources. In the same way, large caches can also decrease performance when they constrain the interconnect to too small an area. Finally, while it is generally believed that shared L2 caches improve cache hit rates, we show that the implications for the interconnect are extreme. For example, sharing four caches among four cores can require a quarter of the chip area just for the crossbar network. When the area overheads and the latency of the long interconnect are accounted for, the desirability of shared L2 caches is significantly lower than is assumed when the interconnect is not accounted for.

VIII.D Future Work

While the processor industry has seemingly embraced the technology shift towards multi-core architectures, continued success along the multi-core path requires addressing several fundamental research issues.

• Most applications and tools are written for the uniprocessor world. With the advent of multi-core architectures, the entire computing stack – from the applications, to the compilers, to the operating system, to the hardware – needs to be re-examined and “fixed”. For example, scheduling is a difficult task even for a uniprocessor OS; with the multiplicity of cores on the die, the task of scheduling becomes very complex, more so if the cores are not identical. Hence, new schedulers will be needed.

• Similarly, applications need to be parallelized in order to take advantage of multi-core architectures. However, programmers today are used to writing sequential programs. Writing parallel code will require changing the way programmers are trained. Even if a programmer is trained to write parallel code, not all programs are (easily) parallelizable. So, new models of programming or of extracting parallelism might be needed.

• Alternatively, the complexity of parallelization can be pushed to the compiler. In that case, the compiler either needs to be able to automatically parallelize the code, or to annotate the code sufficiently so that the hardware can do the parallelization.

• Correspondingly, support needs to be present in hardware either to dynamically parallelize/compile programs, or to make the job of programmers and compiler writers easier. For example, one major reason for the complexity of parallel code is the complexity of locking and synchronization; hardware support can possibly be provided for lock-free synchronization or a transactional style of programming.

• One question that also needs to be addressed is how to make effective use of multiple cores. While there are scenarios with abundant thread-level parallelism (e.g., web servers) or multiprogramming loads, other scenarios might make it difficult to keep the cores busy, even after careful parallelization of applications. One possible approach might be to use the cores to enhance usability and provide additional functionality such as security, debuggability, and reliability. Creative new ways to provide these functionalities in a timely and efficient manner are needed.

• Then there are technological issues as well. The more cores on the die, the greater the bandwidth requirement. However, for cost reasons, only a fixed number of pins (and hence a fixed amount of bandwidth) can be supported for a given process technology, and current processors are already bandwidth-limited [67]. Breakthroughs will be required to enhance the bandwidth of processors, and novel software and hardware techniques will be required for effective utilization of that bandwidth.

• Worst-case power is another issue that multi-cores will have to face. The number of cores of a given core type will be limited less by the area budget and more by the power budget. Effective worst-case power reduction techniques will be needed to put an increasing number of cores on the die.

• Communication between cores will also have increasing overhead as the number of cores increases; in certain scenarios, it might even start to dominate overall overheads. New interconnection architectures, mechanisms, and technologies will be required to manage the interconnection overhead. Similarly, multi-core-aware coherence policies are needed in order to make effective use of the underlying interconnect.

The preceding chapters of this thesis address several of the above-mentioned issues and advocate holistic design of multi-cores to push the yellow and the red walls further.

Bibliography

[1] http://www.amd.com/us-en/processors/productinformation/ 0 30 118 12651 00.html. [2] http://www.amd.com/us-en/processors/productinformation/ 0 30 118 8825 00.html. [3] http://www.amd.com/us-en/processors/productinformation/ 0 30 118 9484 00.html. [4] http://www.arm.com/products/cpus/arm11mpcoremultiprocessor.html. [5] http://www.broadcom.com/products/enterprise-small- office/communications-processors. [6] http://www.cavium.com/octeon mips64.html. [7] http://www.geek.com/procspec/hp/pa8800.htm. [8] http://www.intel.com/pressroom/kits/quickreffam.htm. [9] http://www.intel.com/products/processor/coreduo/. [10] http://www.intel.com/products/processor/pentium d/index.htm. [11] http://www.intel.com/products/processor/pentiumxe/index.htm. [12] http://www.intel.com/products/processor/xeon/index.htm. [13] http://www.razamicroelectronics.com/products/xlr.htm. [14] http://www.xbox.com/en-us/hardware/xbox360/powerplay.htm. [15] International Technology Roadmap for Semiconductors 2003, http://public.itrs.net. [16] Butterfly parallel processor overview. In BBN Report No 6148,, March 1986.


[17] Alpha 21064 and Alpha 21064A: Hardware Reference Manual. Digital Equipment Corporation, 1992. [18] Alpha 21164 Microprocessor:Hardware Reference Manual. Digital Equip- ment Corporation, 1998. [19] Alpha 21264/EV6 Microprocessor:Hardware Reference Manual. Compaq Corporation, 1998. [20] Measuring processor performance with SPEC2000- a white paper, Intel Cor- poration. 2002. [21] A. Agarwal, J. Kubiatowicz, D. Kranz, B-H. Lim, D. Yeung, G. D’Souza, and M. Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, June 1993. [22] D. H. Albonesi. Selective cache-ways: On demand cache resource allocation. In International Symposium on Microarchitecture, November 1999. [23] Murali Annavaram, Ed Grochowski, and John Shen. Mitigating Amdahl’s Law Through EPI Throttling. In Proceedings of International Symposium on Computer Architecture, 2005. [24] James Archibald and Jean-Loup Baer. Cache coherence protocols: evalua- tion using a multiprocessor simulation model. ACM Trans. Comput. Syst., 4(4):273–298, 1986. [25] Saisanthosh Balakrishnan, Ravi Rajwar, Mike Upton, and Konrad Lai. The impact of performance asymmetry in emerging multicore architectures. In Proceedings of International Symposium on Computer Architecture, 2005. [26] L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable ar- chitecture based on single-chip multiprocessing. In Proceedings of the 27th Annual International Symposium on Computer Architecture, 2000. [27] O. Beaumont, A. Legrand, and Y. Robert. The master-slave paradigm with heterogeneous processors. In Proceedings of the IEEE International Conference on Cluster Computing (Cluster’01), October 2001. [28] William Bowhill. A 300-MHz 64-b quad-issue CMOS microprocessor. In ISSCC Digest of Technical Papers, February 1995. [29] D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A framework for architectural-level power analysis and optimizations. In 27th Annual In- ternational Symposium on Computer Architecture, June 2000. 185

[30] David Brooks, Vivek Tiwari, and Margaret Martonosi. Wattch: A frame- work for architectural-level power analysis and optimizations. In Interna- tional Symposium on Computer Architecture, June 2000. [31] James Burns and Jean-Luc Gaudiot. Area and system clock effects on smt/cmp processors. In Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques, page 211. IEEE Com- puter Society, 2001. [32] James Burns and Jean-Luc Gaudiot. SMT layout overhead and scalability. IEEE Transactions on Parallel and Distributed Systems, 13(2), February 2002. [33] Joachim Clabes, Joshua Friedrich, Mark Sweet, Jack DiLullo, Sam Chu, Donald Plass, James Dawson, Paul Muench, Larry Powell, Michael Floyd, Balaram Sinharoy, Mike Lee, Michael Goulet, James Wagoner, Nicole Schwartz, Steve Runyon, Gary Gorman, Phillip Restle, Ronald Kalla, Joseph McGill, and Steve Dodson. Design and implementation of the power5 microprocessor. In ISSCC, 2004. [34] Jamison Collins and Dean Tullsen. Clustered multithreaded architectures – pursuing both IPC and cycle time. In Proceedings of IPDPS, April 2004. [35] J.D. Collins, H. Wang, D.M. Tullsen, C.J. Hughes, Y.-F. Lee, D. Lav- ery, and J.P. Shen. Speculative precomputation: Long-range prefetching of delinquent loads. In 28th Annual International Symposium on Computer Architecture, July 2001. [36] William J. Dally and Brian Towles. Route packets, not wires: On-chip interconnection networks. In DAC-38, pages 684–689, 2001. [37] John D. Davis, James Laudon, and Kunle Olukotun. Maximizing cmp throughput with mediocre cores. In PACT ’05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Tech- niques, pages 51–62, Washington, DC, USA, 2005. IEEE Computer Society. [38] R. H. Denard. Design of ion-implanted MOSFETs with very small physical dimensions. In IEEE Journal of Solid-state Circuits, volume 98, 1974. [39] Keith Diefendorff. Compaq chooses SMT for Alpha. In Microprocessor Report, Vol 13, No. 16, December 1999. [40] Daniel W Dobberpuhl. A 200-MHz 64-b dual-issue CMOS Microprocessor. In IEEE Journal of Solid-State Circuits, Vol 27, No. 11, November 1992. [41] R. Dolbeau and A. Seznec. CASH: Revisiting hardware sharing in single- chip parallel processor. IRISA Report 1491, November 2002. 186

[42] M. Dubois, C. Scheurich, and F.A. Briggs. Synchronization, coherence, and event ordering in multiprocessors. IEEE Computer, 21(2), 1988. [43] Richard J. Eickemeyer, Ross E. Johnson, Steven R. Kunkel, Mark S. Squil- lante, and Shiafun Liu. Evaluation of multithreaded uniprocessors for com- mercial application environments. In ISCA-23, 1996. [44] Joel Emer. EV8:The Post-ultimate Alpha. In PACT Keynote Address, September 2001. [45] R. Espasa, F. Ardanaz, J. Emer, S. Felix, J. Gago, R. Gramunt, I. Her- nandez, T. Juan, G. Lowney, M. Mattina, and A. Seznec. Tarantula: A vector extension to the alpha architecture. In International Symposium on Computer Architecture, May 2002. [46] D. Folegnani and A. Gonzalez. Reducing power consumption of the issue logic. In Workshop on Complexity-Effective Design, June 2000. [47] S. J. Frank. Tightly coupled multiprocessor systems speed memory access times. In Electron, January 1984. [48] D. Gajski, D. Kuck, D. Lawrie, and A. Sameh. Cedar - a large scale mul- tiprocessor. In ICPP, August 1983. [49] S. Ghiasi, J. Casmira, and D. Grunwald. Using IPC variation in workloads with externally specified rates to reduce power consumption. In Workshop on Complexity Effective Design., June 2000. [50] Soraya Ghiasi and Dirk Grunwald. Aide de camp: Asymmetric dual core design for power and energy reduction. In University of Colorado Technical Report CU-CS-964-03, 2003. [51] Soraya Ghiasi, Tom Keller, and Freeman Rawson. Scheduling for heteroge- neous processors in server systems. In Proceedings of Computing Frontiers, 2005. [52] B Gieseke. A 600-MHz Superscalar RISC Microprocessor with Out-of-Order Execution. In ISSCC Digest of Technical Papers, February 1997. [53] R. Gonzalez and M. Horowitz. Energy dissipation in general purpose micro- processors. In IEEE International Symposium on Low Power Electronics 1995, October 1995. [54] K. Govil, E. Chan, and H. Wasserman. Comparing algorithms for dynamic speed-setting of a low-power cpu. In International Conference on Mobile Computing and Networking, November 1995. 187

[55] Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang. Best of both latency and throughput. In Proceedings of IEEE International Conference on Computer Design, 2004. [56] Dirk Grunwald, Artur Klauser, Srilatha Manne, and Andrew Pleskun. Con- fidence estimation for speculation control. In 25th Annual International Symposium on Computer Architecture, June 1998. [57] Stephen H. Gunther, Frank Binns, Douglas Carmean, and Jonathan C. Hall. Managing the Impact of Increasing Microprocessor Power Consumption. In Intel Technology Journal, 1st Quarter 2001. [58] S. Gupta, S.W. Keckler, and D.C. Burger. Technology independent area and delay estimates for microprocessor building blocks. In University of Texas at Austin Technical Report TR-00-05, 1998. [59] Lance Hammond, Basem A. Nayfeh, and Kunle Olukotun. A single-chip multiprocessor. IEEE Computer, 30(9), 1997. [60] Lance Hammond, Mark Willey, and Kunle Olukotun. Data speculation sup- port for a chip multiprocessor. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Op- erating Systems (ASPLOS VIII), October 1998. [61] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Oberg, M. Millberg, and D. Lindqvist. Network on chip: An architecture for billion transistor era. In IEEE NorChip Conference, November 2000. [62] J. Hennessy and D. Patterson. Computer Architecture a Quantitative Ap- proach. Morgan Kaufmann Publishers, Inc., 2002. [63] J. L. Hennessy and N. P. Jouppi. Computer technology and architecture: An evolving interaction. Computer, 24(9):18–29, September 1991. [64] R. Ho, K.W. Mai, and M.A Horowitz. The future of wires. Proceedings of the IEEE, 89(4):490–504, 2001. [65] M. Horowitz, E. Alon, D. Patil, S. Naffziger, R. Kumar, and K. Bernstein. Scaling, power, and the future of cmos. In IEEE International Electron Devices Meeting, December 2005. [66] M. Horowitz, R. Ho, and K. Mai. The future of wires. 1999. [67] Jaehyuk Huh, Stephen W. Keckler, and Doug Burger. Exploring the design space of future CMPs. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), 2001. 188

[68] IBM. Power4:http://www.research.ibm.com/power4. [69] IBM. Power5: Presentation at microprocessor forum. 2003. [70] Intel Corp. Intel Pentium 4 Processor in the 423-pin Package Thermal Design Guidelines, November 2000. [71] A. Iyer and D. Marculescu. Power aware microarchitecture resource scaling. In IEEE Design, Automation and Test in Europe Conference, 2001. [72] A. Iyer and D. Marculescu. Power-performance exaluation of globally- asynchronous, locally-synchronous processors. In International Symposium on Computer Architecture, 2001. [73] Russ Joseph and Margaret Martonosi. Run-time Power Estimation in High- Performance Microprocessors. In The International Symposium on Low- Power Estimation and Design, August 2001. [74] C. Kaanta, W. Cote, J. Cronin, K. Holland, P.I. Lee, and T. Wright. Sub- micron wiring technology with tungsten and planarization. In Fifth VLSI Multilevel Interconnection Conference, 1988. [75] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy. Introduction to the cell multiprocessor. In IBM Journal of Research and Development, September 2005. [76] Faraydon Karim, Anh Nguyen, Sujit Dey, and Ramesh Rao. On-chip com- munication architecture for oc-768 network processors. In Proceedings of the 2001 Design and Automation Conference, 2001. [77] R.E. Kessler, E.J. McLellan, and D.A. Webb. The alpha 21264 micro- processor architecture. In International Conference on Computer Design, December 1998. [78] C. Kim, D. Burger, and S. Keckler. An adaptive, non-uniform cache struc- ture for wire-delay dominated on-chip caches. In ASPLOS, 2002. [79] Arthur Klauser. Trends in high-performance microprocessor design. In Telematik-2001, 2001. [80] Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun. Nia- gara: A 32-way multithreaded sparc processor. In IEEE MICRO Magazine, March 2005. [81] Ramakrishna Kotla, Anirudh Devgan, Soraya Ghiasi, Tom Keller, and Free- man Rawson. Characterizing the impact of different memory-intensity lev- els. In Proceedings of IEEE 7th Annual Workshop on Workload Character- ization (WWC-7), 2004. 189

[82] John Kowaleski. Implementation of an Alpha Microprocessor in SOI. In ISSCC Digest of Tecnical Papers, February 2003. [83] V. Krishnan and J. Torrellas. A clustered approach to multithreaded pro- cessors. In Proceedings of the International Parallel Processing Symposium, pages 627–634, March 1998. [84] Ashok Kumar. The HP PA-8000 RISC CPU. In Hot Chips VIII, August 1996. [85] Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ran- ganathan, and Dean M. Tullsen. A multi-core approach to address- ing the energy-complexity problem in microprocessors. In Workshop on Complexity-Effective Design, June 2003. [86] Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ran- ganathan, and Dean M. Tullsen. Processor power reduction via single-ISA heterogeneous multi-core architectures. In Computer Architecture Letters, Vol 2, April 2003. [87] Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ran- ganathan, and Dean M. Tullsen. Single-ISA Heterogeneous Multi-core Ar- chitectures: The Potential for Processor Power Reduction. In MICRO-36, December 2003. [88] Rakesh Kumar, Norman P. Jouppi, and Dean M. Tullsen. Conjoined-core chip multiprocessing. In International Symposium on Microarchitecture, December 2004. [89] Rakesh Kumar, Dean M. Tullsen, and Norman Jouppi. Core architecture optimization for heterogeneous chip multiprocessors. In Proceedings of In- ternational Symposium on Parallel Architectures and Computing Technolo- gies (PACT), 2006. [90] Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. Single-ISA Heterogeneous Multi-core Archi- tectures for Multithreaded Workload Performance. In International Sym- posium on Computer Architecture, June 2004. [91] Rakesh Kumar, Victor Zyuban, and Dean M. Tullsen. Interconnections in multi-core architectures: Understanding mechanisms, overheads and scal- ing. In Proceedings of International Symposium on Computer Architecture, 2005. 190

[92] S.R. Kunkel, R.J. Eickemeyer, M.H. Lipasti, T.J. Mullins, B.O. Krafka, H. Rosenberg, S.P. VanderWiel, P.L. Vitale, and L.D. Whitley. A per- formance methodology for commercial servers. In IBM Journal of R&D, November 2000. [93] Jim Laudon. Performance/watt the new server focus. In Proceeed- ings of First Workshop on Design, Architecture, and Simulation of Chip- Multiprocessors (dasCMP), November 2005. [94] D. Lenoski, J. Laudon, K. Gharachorloo, W.D. Weber, A. Gupta, J. Henessy, M. Horowitz, and M.S. Lam. The stanford DASH multipro- cessor. In IEEE Computer, 1992. [95] J. Li and J.F. Martinez. Power-performance implications of thread-level parallelism in chip multiprocessors. In Proceedings of International Sympo- sium on Performance Analysis of Systems and Software, 2005. [96] T. Lovett and S. Thakkar. The symmetry multiprocessor system. In ICPP, August 1988. [97] S. Manne, A. Klauser, and D. Grunwald. Pipeline gating: Speculation control for energy reduction. In International Symposium on Computer Architecture, June 1998. [98] Pedro Marcuello and Antonio Gonzalez. Clustered speculative multi- threaded processors. In International Conference on Supercomputing, 1999. [99] R. Maro, Y. Bai, and R.I. Bahar. Dynamically reconfiguring processor resources to reduce power consumption in high-performance processors. In Workshop on Power-aware Computer Systems, November 2000. [100] Rick Merritt. Designers cut fresh paths to parallelism. In EE Times., October 1999. [101] Gordon Moore. Cramming more components onto integrated circuits. vol- ume 38, 1965. [102] Tomer Morad, Uri Weiser, and Avinoam Kolodny. ACCMP - assymetric cluster chip-multiprocessing. In CCIT Technical Report 488, 2004. [103] Tomer Y. Morad, Uri C. Weiser, Avinoam Kolodny, Mateo Valero, and Ed- uard Ayguade. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors. In Computer Architecture Letters, Vol 4, July 2005. 191

[104] J. M. Mulder, N. T. Quach, and M. J. Flynn. An area model for on-chip memories and its applications. In IEEE Journal of Solid State Circuits, Vol 26, No. 2, February 1991. [105] Hyunok Oh and Soonhoi Ha. A static scheduling heuristic for heterogeneous processors. In Euro-Par, Vol. II, pages 573–577, 1996. [106] M. Papamarcos and J. Patel. A low overhead coherence solution for multi- processors with private cache memories. In ISCA-15, 1988. [107] Li-Shiuan Peh. Flow control and microarchitectural mechanisms for ex- tending the performance of interconnection networks. PhD Thesis, , 2001. [108] T. Pering, T. Burd, and R. Brodersen. The simulation and evaluation of dynamic voltage scaling algorithms. In International Symposium on Low Power Electronics and Design, August 1998. [109] G.F Pfister, W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfelder, K. P. McAuliffe, E. A. Melton, V. A. Norton, , and J. Weiss. The IBM Re- search Parallel Processor prototype (RP3): Introduction and Architecture. In ICPP, August 1985. [110] Jan M. Rabaey. The quest for ultra-low energy computation opportunities for architectures exploiting low-current devices. April 2000. [111] Elaine Rich and Kevin Knight. Artificial Intelligence, 2nd Edition. Morgan Kaufmann, 1991. [112] Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore. Exploiting ILP, TLP, and DLP with the Poly- morphous TRIPS Architecture. In International Symposium on Computer Architecture, June 2003. [113] Charles L. Seitz. The cosmic cube. In Communications of ACM, 1985. [114] G Semeraro, G Maklis, R Balasubramonian, D H Albonesi, S Dwarkadas, and M L Scott. Energy efficient processor design using multiple clock do- mains with dynamic voltage and frequency scaling. In International Sym- posium on High-Performance Computer Architecture, February 2002. [115] T. Sherwood and B. Calder. Time varying behavior of programs. In UC San Diego Technical Report UCSD-CS-99-630, August 1999. 192

[116] T. Sherwood, E. Perelman, G. Hammerley, and B. Calder. Automatically characterizing large-scale program behavior. In International Conference on Architectural Support for Programming Languages and Operating Systems, October 2002. [117] Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. Au- tomatically characterizing large scale program behavior. In Tenth Inter- national Comference on Architectural Support for Programming Languages and Operating Systems(ASPLOS 2002), October 2002. [118] Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, and Brad Calder. Discovering and exploiting program phases. In IEEE Mi- cro: Micro’s Top Picks from Computer Architecture Conferences, December 2003. [119] Premkishore Shivakumar and Norm Jouppi. CACTI 3.0: An integrated cache timing, power and area model. In Technical Report 2001/2, Compaq Computer Corporation, August 2001. [120] A. Snavely and D.M. Tullsen. Symbiotic jobscheduling for a simultane- ous multithreading architecture. In Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, November 2000. [121] Gurindar S. Sohi, Scott E. Breach, and T. N. Vijaykumar. Multiscalar processors. In 25 Years ISCA: Retrospectives and Reprints, pages 521–532, 1998. [122] Gurinder S. Sohi and Amir Roth. Speculative multithreaded processors. volume 34, pages 66–33, 2001. [123] Sun. UltrasparcIV: http://siliconvalley.internet.com/news/ print.php/3090801. [124] Steven Swanson, Ken Michelson, Andrew Schwerin, and Mark Oskin. Wavescalar. In MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, page 291, Washington, DC, USA, 2003. IEEE Computer Society. [125] T. N. Theis. The future of interconnection technology. In IBM Journal of R&D, May 2000. [126] Marc Tremblay. Majc-5200: A vliw convergent mpsoc. In Microprocessor Forum, October 1999. 193

[127] D.M. Tullsen. Simulation and modeling of a simultaneous multithreading processor. In 22nd Annual Computer Measurement Group Conference, De- cember 1996. [128] D.M. Tullsen and J.A. Brown. Handling long-latency loads in a simulta- neous multithreading processor. In 34th International Symposium on Mi- croarchitecture, December 2001. [129] D.M. Tullsen, S.J. Eggers, and H.M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In 22nd Annual International Symposium on Computer Architecture, June 1995. [130] Elliot Waingold, Michael Taylor, Devabhaktuni Srikrishna, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, Matthew Frank, Peter Finch, Rajeev Barua, Jonathan Babb, Saman Amarasinghe, and Anant Agarwal. Baring it all to software: Raw machines. Computer, 30(9):86–93, 1997. [131] D.W. Wall. Limits of instruction-level parallelism. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 176–188, April 1991. [132] J.D. Warnock, J.M. Keaty, J. Petrovick, J.G. Clabes, C.J. Kircher, B.L. Krauter, P.J. Restle, B.A. Zoric, and C.J. Anderson. The circuit and phys- ical design of the Power4 microprocessor. In IBM Journal of R&D, January 2002. [133] A. Wilson. Hierarchical cache/bus architecture for shared memory multi- processors. In ISCA-14, June 1987. [134] C. Zilles and G. Sohi. Execution-based prediction using speculative slices. In 28th Annual International Symposium on Computer Architecture, July 2001. [135] Victor Zyuban. Unified architecture level energy-efficiency metric. In 2002 Great Lakes Symposium on VLSI, April 2002.