C 2012 Neal Clayton Crago ENERGY-EFFICIENT LATENCY TOLERANCE for 1000-CORE DATA PARALLEL PROCESSORS with DECOUPLED STRANDS
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Chip Multi-Cluster Architecture with Locality Aware Task Distribution
A CHIP MULTI-CLUSTER ARCHITECTURE WITH LOCALITY AWARE TASK DISTRIBUTION A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences 2007 Matthew James Horsnell School of Computer Science Contents List of Tables 7 List of Figures 8 List of Algorithms 12 Abstract 13 Declaration 14 Copyright 15 Acknowledgements 16 1 Introduction 17 1.1 Motivation . 17 1.2 Microprocessor Design Challenges . 19 1.2.1 Wire Delay . 20 1.2.2 Memory Gap . 21 1.2.3 Limits of Instruction Level Parallelism . 22 1.2.4 Power Limits . 23 1.2.5 Design Complexity . 24 1.3 Design Solutions . 26 1.3.1 Exploiting Parallelism . 26 2 1.3.2 Partitioned Designs . 29 1.3.3 Bridging the Memory Gap . 30 1.3.4 Design Abstraction and Replication . 31 1.4 Summary . 33 1.5 Research Aims . 33 1.6 Contributions . 33 1.7 Thesis Structure . 34 1.8 Publications . 35 2 Parallelism 36 2.1 Application Parallelism . 36 2.1.1 Amdahl's Law . 37 2.1.2 Implicit and Explicit Parallelism . 37 2.1.3 Granularity of Parallelism . 39 2.2 Architectural Parallelism . 42 2.2.1 Bit Level Parallelism . 42 2.2.2 Data Level Parallelism . 43 2.2.3 Instruction Level Parallelism . 44 2.2.4 Multithreading . 47 2.2.5 Simultaneous Multithreading . 50 2.2.6 Chip Multiprocessors . 50 2.3 Summary . 51 3 Jamaica CMP and Software Environment 52 3.1 The Jamaica Chip Multiprocessor . 52 3.1.1 Multithreading . 53 3.1.2 Register Windows . -
Chip Multithreading: Opportunities and Challenges
Chip Multithreading: Opportunities and Challenges Lawrence Spracklen & Santosh G. Abraham Scalable Systems Group Sun Microsystems Inc., Sunnyvale, CA lawrence.spracklen,santosh.abraham @sun.com f g Abstract be improved significantly due to low ILP and off-chip CPI is large and growing, due to relative increases in memory latency. How- Chip Multi-Threaded (CMT) processors provide support for ever, typical server applications concurrently serve a large number many simultaneous hardware threads of execution in various of users or clients; for instance, a contemporary database server ways, including Simultaneous Multithreading (SMT) and Chip may have hundreds of active processes, each associated with a dif- Multiprocessing (CMP). CMT processors are especially suited to ferent client. Furthermore, these processes are currently multi- server workloads, which generally have high levels of Thread- threaded to hide disk access latencies. This structure leads to high Level Parallelism (TLP). In this paper, we describe the evolution levels of TLP. Thus, it is extremely attractive to couple the high of CMT chips in industry and highlight the pervasiveness of CMT TLP in the application domain with support for multiple threads designs in upcoming general-purpose processors. The CMT de- of execution on a processor chip. sign space accommodates a range of designs between the extremes Chip Multi-Threaded (CMT) processors support many simulta- represented by the SMT and CMP designs and a variety of attrac- neous hardware strands (or threads) of execution via a combination tive design options are currently unexplored. Though there has of support for multiple cores (Chip Multiprocessors (CMP)) [1] been extensive research on utilizing multiple hardware threads to and Simultaneous Multi-Threading (SMT) [2]. -
Reducing Exception Management Overhead with Software Restart
Reducing Exception Management Overhead with Software Restart Markers by Mark Jerome Hampton Bachelor of Science in Computer and Systems Engineering Bachelor of Science in Computer Science Rensselaer Polytechnic Institute, May 1999 Master of Science in Electrical Engineering and Computer Science Massachusetts Institute of Technology, June 2001 Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY February 2008 c Massachusetts Institute of Technology 2008. All rights reserved. Author............................................................................ Department of Electrical Engineering and Computer Science February 1, 2008 Certified by........................................................................ Krste Asanovi´c Associate Professor Thesis Supervisor Accepted by....................................................................... Terry P. Orlando Chair, Department Committee on Graduate Students 2 Reducing Exception Management Overhead with Software Restart Markers by Mark Jerome Hampton Submitted to the Department of Electrical Engineering and Computer Science on February 1, 2008, in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract Modern processors rely on exception handling mechanisms to detect errors and to implement various features such as virtual memory. However, these mechanisms are typically hardware- intensive because of the need -
Runahead Threads
ADVERTIMENT . La consulta d’aquesta tesi queda condicionada a l’acceptació de les següents condicions d'ús: La difusió d’aquesta tesi per mitjà del servei TDX ( www.tesisenxarxa.net ) ha estat autoritzada pels titulars dels drets de propietat intel·lectual únicament per a usos privats emmarcats en activitats d’investigació i docència. No s’autoritza la seva reproducció amb finalitats de lucre ni la seva difusió i posada a disposició des d’un lloc aliè al servei TDX. No s’autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant al resum de presentació de la tesi com als seus continguts. En la utilització o cita de parts de la tesi és obligat indicar el nom de la persona autora. ADVERTENCIA . La consulta de esta tesis queda condicionada a la aceptación de las siguientes condiciones de uso: La difusión de esta tesis por medio del servicio TDR ( www.tesisenred.net ) ha sido autorizada por los titulares de los derechos de propiedad intelectual únicamente para usos privados enmarcados en actividades de investigación y docencia. No se autoriza su reproducción con finalidades de lucro ni su difusión y puesta a disposición desde un sitio ajeno al servicio TDR. No se autoriza la presentación de su contenido en una ventana o marco ajeno a TDR (framing). Esta reserva de derechos afecta tanto al resumen de presentación de la tesis como a sus contenidos. En la utilización o cita de partes de la tesis es obligado indicar el nombre de la persona autora. -
Architectural Support for Parallel Computers with Fair Reader/Writer Synchronization
Universidad de Cantabria Departamento de Electrónica y Computadores Tesis Doctoral SOPORTE ARQUITECTÓNICO A LA SINCRONIZACIÓN IMPARCIAL DE LECTORES Y ESCRITORES EN COMPUTADORES PARALELOS Enrique Vallejo Gutiérrez Santander, Marzo de 2010 Universidad de Cantabria Departamento de Electrónica y Computadores Dr. Ramón Beivide Palacio Catedrático de Universidad Y Dr. Fernando Vallejo Alonso Profesor Titular de Universidad Como directores de la Tesis Doctoral: ARCHITECTURAL SUPPORT FOR PARALLEL COMPUTERS WITH FAIR READER/WRITER SYNCHRONIZATION realizada por el doctorando Don Enrique Vallejo Gutiérrez, Ingeniero de Telecomunicación, en el Departamento de Electrónica y Computadores de la Universidad de Cantabria, autorizan la presentación de la citada Tesis Doctoral dado que reúne las condiciones necesarias para su defensa. Santander, Marzo de 2010 Fdo: Ramón Beivide Palacio Fdo. Fernando Vallejo Alonso Agradecimientos Son tantas las personas a las que tengo que agradecer su ayuda y apoyo que esta sección debiera ser la más larga de la tesis. Por ello, no pretendo convertirla en una lista exhaustiva, que además, siempre se olvidaría de alguien importante. Ahí van, por tanto, mis agradecimientos condensados. A Mónica, por su apoyo incondicional, culpable última de que tengas esta tesis en tus manos. A mi familia, especialmente a mis padres, por el soporte y apoyo recibido. A mis directores de tesis, Mon y Fernando. Aunque esto sea una carrera en solitario, es necesaria una buena guía para llegar al final. A los revisores de la tesis, tanto los revisores externos oficiales, Rubén González y Tim Harris, como todos aquellos que se molestaron en leer el trabajo y ayudaron con sugerencias y posibles mejoras. A los compañeros del Grupo de Arquitectura y Tecnología de Computadores de la Universidad de Cantabria, por la ayuda y colaboración en el desarrollo del trabajo, en especial a Carmen Martínez. -
Chip Multithreading: Opportunities and Challenges
Chip Multithreading: Opportunities and Challenges Lawrence Spracklen & Santosh G. Abraham Scalable Systems Group Sun Microsystems Inc., Sunnyvale, CA lawrence.spracklen,santosh.abraham @sun.com f g Abstract be improved significantly due to low ILP and off-chip CPI is large and growing, due to relative increases in memory latency. How- Chip Multi-Threaded (CMT) processors provide support for ever, typical server applications concurrently serve a large number many simultaneous hardware threads of execution in various of users or clients; for instance, a contemporary database server ways, including Simultaneous Multithreading (SMT) and Chip may have hundreds of active processes, each associated with a dif- Multiprocessing (CMP). CMT processors are especially suited to ferent client. Furthermore, these processes are currently multi- server workloads, which generally have high levels of Thread- threaded to hide disk access latencies. This structure leads to high Level Parallelism (TLP). In this paper, we describe the evolution levels of TLP. Thus, it is extremely attractive to couple the high of CMT chips in industry and highlight the pervasiveness of CMT TLP in the application domain with support for multiple threads designs in upcoming general-purpose processors. The CMT de- of execution on a processor chip. sign space accommodates a range of designs between the extremes Chip Multi-Threaded (CMT) processors support many simulta- represented by the SMT and CMP designs and a variety of attrac- neous hardware strands (or threads) of execution via a combination tive design options are currently unexplored. Though there has of support for multiple cores (Chip Multiprocessors (CMP)) [1] been extensive research on utilizing multiple hardware threads to and Simultaneous Multi-Threading (SMT) [2].