Reducing Exception Management Overhead with Software Restart Markers

Total Pages: 16

File Type: PDF, Size: 1020 KB

Reducing Exception Management Overhead with Software Restart Markers

by Mark Jerome Hampton

Bachelor of Science in Computer and Systems Engineering
Bachelor of Science in Computer Science
Rensselaer Polytechnic Institute, May 1999

Master of Science in Electrical Engineering and Computer Science
Massachusetts Institute of Technology, June 2001

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology, February 2008.

© Massachusetts Institute of Technology 2008. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, February 1, 2008
Certified by: Krste Asanović, Associate Professor, Thesis Supervisor
Accepted by: Terry P. Orlando, Chair, Department Committee on Graduate Students

Abstract

Modern processors rely on exception handling mechanisms to detect errors and to implement various features such as virtual memory. However, these mechanisms are typically hardware-intensive because of the need to buffer partially-completed instructions to implement precise exceptions and enforce in-order instruction commit, often leading to issues with performance and energy efficiency. The situation is exacerbated in highly parallel machines with large quantities of programmer-visible state, such as VLIW or vector processors. As architects increasingly rely on parallel architectures to achieve higher performance, the problem of exception handling is becoming critical.

In this thesis, I present software restart markers as the foundation of an exception handling mechanism for explicitly parallel architectures. With this model, the compiler is responsible for delimiting regions of idempotent code. If an exception occurs, the operating system will resume execution from the beginning of the region. One advantage of this approach is that instruction results can be committed to architectural state in any order within a region, eliminating the need to buffer those values. Enabling out-of-order commit can substantially reduce the exception management overhead found in precise exception implementations, and enable the use of new architectural features that might be prohibitively costly with conventional precise exception implementations. Additionally, software restart markers can be used to reduce context switch overhead in a multiprogrammed environment.

This thesis demonstrates the applicability of software restart markers to vector, VLIW, and multithreaded architectures. It also contains an implementation of this exception handling approach that uses the Trimaran compiler infrastructure to target the Scale vector-thread architecture. I show that using software restart markers incurs very little performance overhead for vector-style execution on Scale. Finally, I describe the Scale compiler flow developed as part of this work and discuss how it targets certain features facilitated by the use of software restart markers.
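To make the restart-region idea concrete, here is a minimal sketch in C of the kind of idempotent region the compiler could delimit. The begin_restart_region()/end_restart_region() markers are hypothetical illustrations introduced for this sketch, not the thesis's actual ISA mechanism.

    #include <stddef.h>

    /* Hypothetical stand-ins for the restart markers the compiler would emit;
     * modeled as no-ops here so the sketch compiles and runs. */
    static inline void begin_restart_region(void) { /* mark region entry */ }
    static inline void end_restart_region(void)   { /* region completes  */ }

    /* Element-wise add over non-overlapping arrays. The region never
     * overwrites its own inputs (a, b, n, and the pointer values stay
     * unchanged), so it is idempotent: if an exception such as a page fault
     * occurs partway through, the operating system can restart execution at
     * begin_restart_region(), and the re-executed stores to dst simply
     * overwrite any partial results. No uncommitted values need to be
     * buffered, and stores within the region may commit in any order. */
    void vadd(double *dst, const double *a, const double *b, size_t n)
    {
        begin_restart_region();
        for (size_t i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
        end_restart_region();
    }

The restriction this sketch relies on is the one the compiler must enforce when forming regions: no value that is live into the region may be overwritten before the region's end marker, which is what makes re-execution from the region start safe and allows out-of-order commit within the region.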
Thesis Supervisor: Krste Asanović
Title: Associate Professor

Acknowledgments

As I come to the end of this phase of my life, it seems insufficient to try to distill several years of interactions and relationships into a handful of paragraphs. Thankfully this isn’t like an awards show where the producers will start playing music to cut off winners who take too long with their speeches. I’m free to ramble as long as I want. But in the spirit of “less is more,” I will try to succinctly encapsulate my gratitude to those who have helped me get to this point. Of course, given my penchant for being long-winded, my “less” will almost certainly be somebody else’s “more.”

First and foremost, I have to give thanks to God the Father, the Giver of every good and perfect gift, and to Jesus Christ, who strengthens me daily. They richly deserve glory for anything I have achieved during my tenure in graduate school, and for anything that I may achieve in the future.

In this earthly realm, I of course have to start with my advisor, Professor Krste Asanović. When I first came to MIT, I had no idea who he was, and was initially intent on trying to secure another professor as my advisor. However, after I saw Krste give a presentation on the work he was doing, I knew which group I wanted to join. And I’ve never regretted my decision. Many of my former group members and colleagues in other groups have used all kinds of superlatives to describe Krste in their own thesis acknowledgments, and all of those descriptions hold true. Krste’s breadth of knowledge, his ability to digest and analyze new material at a rapid rate, and his seemingly inexhaustible supply of energy that allows him to juggle a countless number of projects, are all, in a word, “scary.” And yet he was willing to tolerate the inadequacies of a grad student like myself, who came to MIT with relatively little understanding of computer architecture. Even though I may know much more now (in large part thanks to him), I still can’t hold a candle to his knowledge about such a vast array of subjects. I want to thank him for all of his support through the years (not just financial, although that is greatly appreciated as well), for fostering a great research group environment, and for indulging my fascination with exception handling.

I thank Professor Saman Amarasinghe and Professor Arvind for serving on my thesis committee and for providing feedback on my work.

I have interacted with many fellow students over the years who have enriched my experience in grad school. I want to begin by acknowledging my two original officemates, with whom I have developed valuable relationships. Since I listed them in one order in the Acknowledgments section of my Master’s thesis, I list them in reverse order here.

Jessica Tseng was the first person in Krste’s group whom I contacted when I came to MIT. She showed me how to use the simulator that I needed for my work, and then she subsequently became one of my officemates—and a very tolerant one at that, as she put up with my singing in the office. Even after we moved to different offices, I enjoyed being able to walk downstairs and strike up a conversation when I didn’t feel like doing any work (which occurred frequently). I want to thank her for her friendship and support.
Mike Zhang, my other original officemate, is truly passionate about his convictions, including his love of the Yankees. Our frequent arguments about creation vs. evolution were definitely interesting, and while it’s probably not the effect he intended, they also strengthened my faith. But at the end of the day, even though he may have viewed my religious convictions as silly, and I may have viewed his atheist beliefs as silly, we still respected one another, which was great.

Ronny Krashinsky entered MIT at the same time I did, earned his Master’s degree at the same time I did, but then decided to break the mold and become a doctor before me. Given the vast amount of work that he did for his thesis, the fact that he was able to complete it when he did is truly awe-inspiring. I’m sure that no matter what lies in store for him, he will have a very successful future, and who knows, maybe someday I’ll be working for “Ronny, Inc.” after all.

Heidi Pan, even though you decided you didn’t want to be my officemate anymore (just kidding), I appreciate the time that we did share an office—even if you didn’t want to get roped into the crazy discussions that the rest of us had.

Steve Gerding, I’m very glad that Jessica, Mike, and I had to kick out our imaginary officemate and actually let a real person occupy the fourth spot in our office. Your devotion to all things related to Los Angeles—especially UCLA—is pretty scary, but also admirable in its steadfastness.

Sam Larsen was responsible for the bulk of the compiler front end work that made it possible for me to use SUIF and Trimaran together. His thesis work also provided a template for my parallelization phase. Beyond those contributions, I thank him for the sense of humor he brought to the various discussions which took place in our office, even though he frequently looked so serious.

I thank Mark Stephenson—a.k.a. “The Fastest Man in Computer Architecture”—not only for allowing me to witness his speed on the track (unofficially), but also for facilitating the only opportunity I had to practice my thesis defense. While I had hoped for a smoother presentation than actually occurred, the feedback I received was invaluable.

Ian Bratt created the initial port of Trimaran for the RAW architecture that I was subsequently able to adapt for Scale.

I thank all of the past and present members of Krste’s group, including Seongmoo Heo, Chris Batten, Ken Barr, Emmett Witchel, Albert Ma, Rose Liu, Jae Lee, and Asif Khan. I also thank other lab members whom I have had the pleasure to meet, including Rodric Rabbah, Mike Gordon, and Michal Karczmarek.

Outside of the lab, my primary social and support network on campus consisted of members of the Black Graduate Student Association (BGSA) during the first half of my graduate career, and members of the Academy of Courageous Minority Engineers (ACME) during the second half of my graduate career. I thank BGSA for giving me a much-needed outlet for relieving stress, and I especially thank the members who helped convince me to stay at MIT beyond my Master’s degree and to pursue a Ph.D.
Recommended publications
  • A Chip Multi-Cluster Architecture with Locality Aware Task Distribution
    A CHIP MULTI-CLUSTER ARCHITECTURE WITH LOCALITY AWARE TASK DISTRIBUTION. A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences, 2007. Matthew James Horsnell, School of Computer Science.
    Contents: List of Tables; List of Figures; List of Algorithms; Abstract; Declaration; Copyright; Acknowledgements.
    1 Introduction: 1.1 Motivation; 1.2 Microprocessor Design Challenges (1.2.1 Wire Delay, 1.2.2 Memory Gap, 1.2.3 Limits of Instruction Level Parallelism, 1.2.4 Power Limits, 1.2.5 Design Complexity); 1.3 Design Solutions (1.3.1 Exploiting Parallelism, 1.3.2 Partitioned Designs, 1.3.3 Bridging the Memory Gap, 1.3.4 Design Abstraction and Replication); 1.4 Summary; 1.5 Research Aims; 1.6 Contributions; 1.7 Thesis Structure; 1.8 Publications.
    2 Parallelism: 2.1 Application Parallelism (2.1.1 Amdahl's Law, 2.1.2 Implicit and Explicit Parallelism, 2.1.3 Granularity of Parallelism); 2.2 Architectural Parallelism (2.2.1 Bit Level Parallelism, 2.2.2 Data Level Parallelism, 2.2.3 Instruction Level Parallelism, 2.2.4 Multithreading, 2.2.5 Simultaneous Multithreading, 2.2.6 Chip Multiprocessors); 2.3 Summary.
    3 Jamaica CMP and Software Environment: 3.1 The Jamaica Chip Multiprocessor (3.1.1 Multithreading, 3.1.2 Register Windows ...)
  • Energy-Efficient Latency Tolerance for 1000-Core Data Parallel Processors with Decoupled Strands (© 2012 Neal Clayton Crago)
    © 2012 Neal Clayton Crago. ENERGY-EFFICIENT LATENCY TOLERANCE FOR 1000-CORE DATA PARALLEL PROCESSORS WITH DECOUPLED STRANDS. Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering in the Graduate College of the University of Illinois at Urbana-Champaign, 2012. Urbana, Illinois. Doctoral Committee: Professor Sanjay J. Patel, Chair; Professor Wen-mei W. Hwu; Associate Professor Steven S. Lumetta; Associate Professor Deming Chen.
    Abstract: This dissertation presents a novel decoupled latency tolerance technique for 1000-core data parallel processors. The approach focuses on developing instruction latency tolerance to improve performance for a single thread. The main idea behind the approach is to leverage the compiler to split the original thread into separate memory-accessing and memory-consuming instruction streams. The goal is to provide latency tolerance similar to high-performance techniques such as out-of-order execution while leveraging low hardware complexity similar to an in-order execution core. The research in this dissertation supports the following thesis: Pipeline stalls due to long exposed instruction latency are the main performance limiter for cached 1000-core data parallel processors. Leveraging natural decoupling of memory-access and memory-consumption, a serial thread of execution can be partitioned into strands providing energy-efficient latency tolerance. This dissertation motivates the need for latency tolerance in 1000-core data parallel processors and presents decoupled core architectures as an alternative to currently used techniques. This dissertation discusses the limitations of prior decoupled architectures, and proposes techniques to improve both latency tolerance and energy efficiency.
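    As a rough sketch of the strand partitioning described in this abstract (my own illustration under stated assumptions, not code from the dissertation), the C fragment below splits a serial sum-of-squares loop into a memory-accessing strand that fills a queue and a memory-consuming strand that only computes on queued values. The QDEPTH buffer and the two helper functions are hypothetical stand-ins for compiler-generated strands; the strands alternate block by block here, whereas the proposed hardware would run them concurrently through a real inter-strand queue.

        #include <stddef.h>

        #define QDEPTH 64  /* models the hardware queue between the two strands */

        /* Memory-accessing strand: performs the loads for one block and deposits
         * the values into the inter-strand queue. On decoupled hardware this
         * strand runs ahead of the consumer, so cache-miss latency overlaps with
         * the consumer's work instead of stalling a single in-order pipeline. */
        static size_t access_strand(const double *a, size_t i, size_t n,
                                    double *queue)
        {
            size_t count = 0;
            while (count < QDEPTH && i + count < n) {
                queue[count] = a[i + count];   /* the only loads of program data */
                count++;
            }
            return count;
        }

        /* Memory-consuming strand: computes only on queued values and performs
         * no loads of its own, so it sees no exposed memory latency. */
        static double consume_strand(const double *queue, size_t count, double acc)
        {
            for (size_t j = 0; j < count; j++)
                acc += queue[j] * queue[j];
            return acc;
        }

        /* The original serial loop (sum of squares of a[0..n-1]) expressed as two
         * cooperating strands. They alternate block by block in this sketch; the
         * decoupled cores described above would run them concurrently. */
        double sum_of_squares(const double *a, size_t n)
        {
            double queue[QDEPTH];
            double acc = 0.0;
            for (size_t i = 0; i < n; ) {
                size_t count = access_strand(a, i, n, queue);
                acc = consume_strand(queue, count, acc);
                i += count;
            }
            return acc;
        }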
  • Chip Multithreading: Opportunities and Challenges
    Chip Multithreading: Opportunities and Challenges. Lawrence Spracklen & Santosh G. Abraham, Scalable Systems Group, Sun Microsystems Inc., Sunnyvale, CA. {lawrence.spracklen,santosh.abraham}@sun.com
    Abstract: Chip Multi-Threaded (CMT) processors provide support for many simultaneous hardware threads of execution in various ways, including Simultaneous Multithreading (SMT) and Chip Multiprocessing (CMP). CMT processors are especially suited to server workloads, which generally have high levels of Thread-Level Parallelism (TLP). In this paper, we describe the evolution of CMT chips in industry and highlight the pervasiveness of CMT designs in upcoming general-purpose processors. The CMT design space accommodates a range of designs between the extremes represented by the SMT and CMP designs, and a variety of attractive design options are currently unexplored. Though there has been extensive research on utilizing multiple hardware threads to ...
    ... be improved significantly due to low ILP, and off-chip CPI is large and growing due to relative increases in memory latency. However, typical server applications concurrently serve a large number of users or clients; for instance, a contemporary database server may have hundreds of active processes, each associated with a different client. Furthermore, these processes are currently multithreaded to hide disk access latencies. This structure leads to high levels of TLP. Thus, it is extremely attractive to couple the high TLP in the application domain with support for multiple threads of execution on a processor chip. Chip Multi-Threaded (CMT) processors support many simultaneous hardware strands (or threads) of execution via a combination of support for multiple cores (Chip Multiprocessors (CMP)) [1] and Simultaneous Multi-Threading (SMT) [2].
  • Runahead Threads
    NOTICE. Access to this thesis is subject to acceptance of the following terms of use: dissemination of this thesis via the TDX/TDR service (www.tesisenxarxa.net, www.tesisenred.net) has been authorized by the holders of its intellectual property rights solely for private use within research and teaching activities. Reproduction for profit is not authorized, nor is its dissemination or availability from a site outside the TDX service. Presentation of its content in a window or frame outside TDX (framing) is not authorized. These restrictions apply both to the thesis presentation summary and to its contents. When using or citing parts of the thesis, the name of the author must be indicated.
  • Architectural Support for Parallel Computers with Fair Reader/Writer Synchronization
    Universidad de Cantabria, Departamento de Electrónica y Computadores. Doctoral thesis: Architectural Support for Fair Reader/Writer Synchronization in Parallel Computers (Soporte Arquitectónico a la Sincronización Imparcial de Lectores y Escritores en Computadores Paralelos). Enrique Vallejo Gutiérrez. Santander, March 2010.
    Dr. Ramón Beivide Palacio, Full Professor, and Dr. Fernando Vallejo Alonso, Associate Professor, as supervisors of the doctoral thesis ARCHITECTURAL SUPPORT FOR PARALLEL COMPUTERS WITH FAIR READER/WRITER SYNCHRONIZATION, carried out by the doctoral candidate Enrique Vallejo Gutiérrez, Telecommunications Engineer, in the Departamento de Electrónica y Computadores of the Universidad de Cantabria, authorize the submission of this doctoral thesis, as it meets the conditions required for its defense. Santander, March 2010. Signed: Ramón Beivide Palacio. Signed: Fernando Vallejo Alonso.
    Acknowledgments: There are so many people whose help and support I have to acknowledge that this section ought to be the longest in the thesis. For that reason, I will not try to turn it into an exhaustive list, which would in any case always leave out someone important. So here, then, are my condensed thanks. To Mónica, for her unconditional support, ultimately responsible for this thesis being in your hands. To my family, especially my parents, for their support and encouragement. To my thesis supervisors, Mon and Fernando: although this is a solo race, a good guide is needed to reach the finish. To the reviewers of the thesis, both the official external reviewers, Rubén González and Tim Harris, and all those who took the trouble to read the work and helped with suggestions and possible improvements. To my colleagues in the Computer Architecture and Technology Group (Grupo de Arquitectura y Tecnología de Computadores) of the Universidad de Cantabria, for their help and collaboration in carrying out this work, especially Carmen Martínez.