
MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY

Technical Report 1459 August, 1993


The Named-State Register File

Peter Robert Nuth

Bachelor of Science in Electrical Engineering and Science, MIT 1983
Master of Science in Electrical Engineering and Computer Science, MIT 1987
Electrical Engineer, MIT 1987

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical Engineering and Computer Science

Massachusetts Institute of Technology, August 1993

Copyright © 1993 Massachusetts Institute of Technology All rights reserved

The research described in this report was supported in part by the Defense Advanced Research Projects Agency under contracts N00014-88K-0783 and F19628-92C-0045, and by a National Science Foundation Presidential Young Investigator Award with matching funds from General Electric Corporation, IBM Corporation, and AT&T.

The Named-State Register File

Peter Robert Nuth

Submitted to the Department of Electrical Engineering and Computer Science on August 31, 1993 in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

A register file is a critical resource of modern processors. Most hardware and software mechanisms to manage registers across procedure calls do not efficiently support multithreaded programs. To switch between parallel threads, a conventional processor must spill and reload contexts from registers to memory. If context switches are frequent and unpredictable, a large fraction of execution time is spent saving and restoring registers.

This thesis introduces the Named-State Register File (NSF), a fine-grain, fully-associative register organization. The NSF uses hardware and software mechanisms to manage registers among many concurrent activations. The NSF enables both fast context switching and efficient sequential program performance. The NSF holds more live data than conventional register files, and requires much less spill and reload traffic to switch between concurrent active contexts. The NSF speeds execution of some sequential and parallel programs by 9% to 17% over alternative register file organizations. The access time of the NSF is only 6% greater than that of a conventional register file. The NSF adds less than 5% to the area of a typical processor chip.

This thesis describes the structure of the Named-State Register File, evaluates the cost of its implementation and its benefits for efficient context switching. The thesis demonstrates a prototype implementation of the NSF and estimates the access time and chip area required for different NSF organizations. Detailed architectural simulations running large sequential and parallel application programs were used to evaluate the effect of the NSF on register usage, register traffic, and execution time.

Keywords: multithreaded, register, thread, fully-associative, parallel processor.

Author: Peter R. Nuth [email protected]

Supervisor: William J. Dally Associate Professor of Electrical Engineering and Computer Science

Acknowledgments

Looking back at almost half a lifetime spent at MIT, it is impossible to thank all of the individuals who have helped me over the years. I am grateful to have been able to work with and learn from so many truly exceptional people.

My thanks first and foremost to Professor Bill Dally, who has taught me more than I would have thought possible. Bill's drive, enthusiasm, and technical ability encouraged me during my time in the CVA group. Bill created an exciting environment, gave us the opportunity to work on challenging projects, and stayed committed to them to the end.

I would like to thank my readers, Professors Tom Knight and Greg Papadopoulos, for their insight and perspective which greatly improved the quality of this thesis. I thank Dr. Bert Halstead for his gentle teachings and crucial support. He introduced me to parallel processors, and first started me on this line of research.

Mike Noakes has been a source of wit and wisdom, a thoughtful critic of all poorly formed ideas, and an ally against the perversity of the modern world. Stuart Fiske gave me good advice, encouragement, and constant reminders not to take myself too seriously. Steve Keckler endured several drafts of this thesis with good humor and an uncompromising pen.

David Harris built the prototype chip in one summer, in an impressive display of skill and single-minded devotion. Theodore Tonchev wrote the first version of the TL0 translator.

I would like to thank my family for their constant encouragement and faith in my ability. Although we have pursued different paths, my sisters supported my decisions, and applauded every achievement. I especially thank my parents, whose love and discipline instilled in me a determined nature and a sense of curiosity. They helped nurture my dreams, and gave me the rare opportunity to pursue them.

Most of all, I would like to thank my wife, Julia. She has been a caring friend and a tireless supporter. She has taught me more than any teacher, by teaching me about myself. Through her I have found commitment, love, and great joy.

This thesis is dedicated to my parents, Robert and Lois Nuth

Table of Contents

List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
    1.1.1 Problem Statement
    1.1.2 Proposal
    1.1.3 Methodology
    1.1.4 Research Results
  1.2 Justification
    1.2.1 Parallel and Sequential Programs
    1.2.2 Parallel Program Behavior
    1.2.3 Context Switching
    1.2.4 Thread Scheduling
  1.3 Multithreaded Processors
    1.3.1 Segmented Register Files
  1.4 The Named-State Register File
  1.5 Related Work
    1.5.1 Memory to Memory Architectures
    1.5.2 Preloading
    1.5.3 Software Techniques
    1.5.4 Alternative Register Organizations
  1.6 Thesis Outline
Chapter 2 The Named-State Register File
  2.1 NSF Operation
    2.1.1 Structure
    2.1.2 Register addressing
    2.1.3 Read, write, allocate and deallocate
    2.1.4 Reloading and spilling
    2.1.5 Context switching
  2.2 Justification
    2.2.1 Registers and memory
    2.2.2 NSF and memory hierarchy
    2.2.3 Properties of memory structures
    2.2.4 Advantages of the NSF
  2.3 NSF System Issues
    2.3.1 Context Identifiers
    2.3.2 Managing Context IDs
    2.3.3 Data Performance
  2.4 Design Alternatives
  2.5 Analysis
Chapter 3 Implementation
  3.1 NSF Components
    3.1.1 Inputs and Outputs
    3.1.2 Valid bits
    3.1.3 Victim select logic
  3.2 Fixed Register File Decoders
    3.2.1 Decoder circuit design
  3.3 Programmable Decoders
    3.3.1 A programmable decoder
    3.3.2 Alternative decoder designs
  3.4 Performance Comparison
  3.5 Area comparison
  3.6 Design Alternatives
  3.7 A Prototype Chip
    3.7.1 Design Decisions
Chapter 4 Experimental Method
  4.1 Overview
  4.2 Register File Simulator
    4.2.1 Simulator Instructions
    4.2.2 Thread control
    4.2.3 Context ID management
    4.2.4 Opfetch and Opstore
    4.2.5 Register file
  4.3 Sequential Simulations
    4.3.1 S2NSP
    4.3.2 Limitations
    4.3.3 Tracing sequential programs
    4.3.4 Anomalies
  4.4 Parallel Simulations
    4.4.1 TAM code
    4.4.2 TLTRANS
    4.4.3 Tracing TAM programs
    4.4.4 Limitations
Chapter 5 Experimental Results
  5.1 Benchmark Programs
  5.2 Named-State Advantages
  5.3 Performance by Application
    5.3.1 Concurrent contexts
    5.3.2 Register file utilization
    5.3.3 Register hit rates
    5.3.4 Register reload traffic
  5.4 Experiment Parameters
    5.4.1 Register file size
    5.4.2 Register line width
    5.4.3 Valid bits
    5.4.4 Explicit allocation & deallocation
  5.5 Program Performance
  5.6 Results Summary
    5.6.1 Goals of the NSF
    5.6.2 Alternatives
    5.6.3 Evaluation
Chapter 6 Conclusion
  6.1 Overview
    6.1.1 Background
    6.1.2 Contribution of this work
    6.1.3 Scope of this work
  6.2 Summary of Results
    6.2.1 Implementation
    6.2.2 Performance
  6.3 Future Work
Appendix A A Prototype Named-State Register File
Bibliography

List of Figures

FIGURE 1-1. Call graphs of sequential and parallel programs.
FIGURE 1-2. Parallelism profile of Simple(50), a typical dataflow application.
FIGURE 1-3. Advantage of fast context switching.
FIGURE 1-4. A multithreaded processor using a segmented register file.
FIGURE 1-5. A multithreaded processor using a Named-State Register File.
FIGURE 2-1. Structure of the Named-State Register File.
FIGURE 2-2. Memory hierarchy of a modern processor.
FIGURE 2-3. The Named-State Register File and memory hierarchy.
FIGURE 2-4. Probability of register spilling for a segmented register file and the NSF.
FIGURE 3-1. Input and output signals to the Named-State Register File.
FIGURE 3-2. A row of a conventional register decoder.
FIGURE 3-3. A two-stage, fixed address decoder.
FIGURE 3-4. A row of a programmable address decoder.
FIGURE 3-5. A single cell of a programmable address decoder.
FIGURE 3-6. Access times of segmented and Named-State register files.
FIGURE 3-7. Relative area of segmented and Named-State register files in 2µm CMOS.
FIGURE 3-8. Relative area of segmented and Named-State register files in 1.2µm CMOS.
FIGURE 3-9. Area of 6-ported segmented and Named-State register files in 1.2µm CMOS.
FIGURE 3-10. A prototype Named-State Register File.
FIGURE 4-1. Simulation environment.
FIGURE 4-2. The CSIM register file simulator.
FIGURE 5-1. Average contexts generated per application, and average contexts resident in NSF and segmented register files.
FIGURE 5-2. Average number of registers accessed by a resident context.
FIGURE 5-3. Percentage of NSF and segmented registers that contain active data.
FIGURE 5-4. Register misses divided by total instructions executed.
FIGURE 5-5. Registers reloaded as a percentage of instructions executed.
FIGURE 5-6. Average contexts resident in various sizes of segmented and NSF register files.
FIGURE 5-7. Average percentage of registers that contain live data in different sizes of NSF and segmented register files.
FIGURE 5-8. Registers reloaded as a percentage of instructions executed on different sizes of NSF and segmented register files.
FIGURE 5-9. Register read and write miss rates as a function of line size for NSF register files.
FIGURE 5-10. Registers reloaded on read and write misses as a percentage of instructions.
FIGURE 5-11. Registers reloaded as a percentage of instructions.
FIGURE 5-12. Register spill and reload overhead as a percentage of program execution time.
FIGURE A-1. Pinout of the prototype chip.

List of Tables

TABLE 2-1. A comparison of several fast local processor memory alternatives.
TABLE 2-2. A comparison of different register file structures.
TABLE 3-1. Signals linking the Named-State register file to the processor pipeline.
TABLE 4-1. Icode instruction types.
TABLE 4-2. Register access operations.
TABLE 5-1. Characteristics of benchmark programs used in this chapter.
TABLE 5-2. Estimated cycle counts of instructions and operations.
TABLE A-1. Valid bit logic.
TABLE A-2. Simulated signal delays from each clock phase.

CHAPTER 1 Introduction

1.1 Overview

1.1.1 Problem Statement

Most sequential and parallel programming models divide an application into procedures that invoke other procedures in a data-dependent manner. Each procedure activation requires a small amount of run-time state for local variables. While some of this local state may reside in memory, the rest occupies the processor's register file. The register file is a critical resource in most modern processors [34,66]. Operating on local data in registers rather than memory speeds access to that data, and allows a short instruction to access several operands [75,28].

There have been many proposals for hardware and software mechanisms to manage the register file and to efficiently switch between activations [67,86]. These techniques work well when the activation sequence is known, but behave poorly if the order of activations is unpredictable [42]. Dynamic parallel programs [60,21,39,52], in which a processor may switch between many concurrent activations, or threads, run particularly inefficiently on conventional processors. To switch between parallel threads, conventional processors must spill a thread’s context from the processor registers to memory, then load a new context. This may take hundreds of cycles [36]. If context switches are frequent and unpredictable, a large fraction of execution time is spent saving and restoring registers.

1.1.2 Proposal

This thesis introduces the Named-State Register File (NSF), a register file organization that permits fast switching among many concurrent activations while making efficient use of register space. It does this without sacrificing sequential thread performance, and can often run sequential programs more efficiently than conventional register files.

The NSF does not significantly increase register file access time. While the NSF requires more chip area per bit than conventional register files, that storage is used more effectively, leading to significant performance improvements over alternative register files.

The goals of this research are:
• To reduce the cost of context switching by reducing the frequency and number of registers which must be saved and restored from memory.


• To make more effective use of processor registers, which are the most critical memory in a computer system.
• To run both sequential and parallel code efficiently.

1.1.3 Methodology

This thesis describes the structure of the Named-State Register File, evaluates the cost of its implementation, and its benefits for efficient context switching. A prototype chip was built to estimate the access time and VLSI chip area required for several different NSF organizations. Detailed architectural simulations running large sequential and parallel application programs are used to evaluate the effect of the NSF on register usage, register traffic, and execution time.

1.1.4 Research Results

• The NSF holds more active data than a conventional register file with the same number of registers. For the large sequential and parallel applications tested, the NSF holds 30% to 200% more active data than an equivalent register file.
• The NSF holds more concurrent active contexts than conventional files of the same size. The NSF holds 20% more contexts while running parallel applications. For sequential programs, the NSF holds twice as many procedure call frames as a conventional file.
• The NSF is able to support more resident contexts with less register spill and reload traffic. The NSF can hold the entire call chain of a large sequential application, spilling registers at 10⁻⁴ times the rate of a conventional file. On parallel applications, the NSF reloads 10% as many registers as a conventional file.
• The NSF speeds execution of sequential applications by 9% to 18%, and parallel applications by 17% to 35%, by eliminating register spills and reloads.
• The NSF's access time is only 5% greater than conventional register file designs. This should have no effect on processor cycle time.
• The NSF requires 30% to 50% more chip area to build than a conventional file. This amounts to less than 5% of a typical processor's chip area.

1.2 Justification

This thesis makes several assumptions about the behavior of parallel programs:
• Parallel threads are spawned dynamically by the program.
• Parallel programs contain phases of high parallelism and of sequential critical paths.


• Parallel processors must frequently switch threads to avoid idling during long communication and synchronization delays.
• Parallel thread schedulers must balance parallelism and locality.

The next few sections illustrate these assumptions and the effect they have on processor architecture.

1.2.1 Parallel and Sequential Programs

The NSF is designed to execute both parallel and sequential programs efficiently. Figure 1-1 shows example call graphs of sequential and parallel applications. Both are dynamic models of computation, in which a procedure may invoke one of several other procedures, depending on program data. In the sequential program, only one sequential procedure may be running at a time, and the call chain can be allocated on a stack.

(Left) Sequential call chain. (Right) Parallel activation tree.

FIGURE 1-1. Call graphs of sequential and parallel programs. Shaded activations are waiting at synchronization points, and cannot proceed. Cross-hatched activations are running on some processor. Only one sequential procedure activation can run at a time. Many parallel threads may be ready to run at a time, and a processor may interleave execution of several threads.

The parallel program, on the other hand, may dynamically spawn parallel procedure invocations, or threads¹. A parent may be able to spawn a number of child threads before having to wait for a result to be returned [60]. Since several threads may be able to run at the same time, the activation tree is heap allocated. Threads may also interact through shared variables [9] or message passing [39].

Since parallel threads are spawned dynamically in this model, and threads may synchronize with other processors in a data-dependent manner, the order in which threads are executed on a single processor of a parallel computer cannot be determined in advance. A compiler may be able to schedule the execution order of a local group of threads [42], but in general will not be able to determine a total ordering of threads across all processors.

1. Contrast with other models of parallelism, in which the number of parallel tasks is fixed at compile time [29].

1.2.2 Parallel Program Behavior

While most dynamic parallel programs generate significant numbers of parallel tasks, that parallelism is not sustained during the entire lifetime of the program. Figure 1-2 shows the parallelism profile [45] for a timestep of Simple(50), a typical Dataflow application [60]. The initial phase of the program produces an extremely large number of parallel tasks, but the program ends in a long tail with low parallelism, to merge the results of the computation. Speeding up that sequential critical path is as important as exploiting the total parallelism of the application. For this reason, processors for parallel computers must be able to efficiently run both parallel and sequential code.

[Plot: ALU operations profile of SIMPLE; vertical axis 0 to 12,000 ALU operations, horizontal axis 0 to 2600 time steps.]

FIGURE 1-2. Parallelism profile of Simple(50), a typical dataflow application. The program consists of phases of very high parallelism, as well as long sequential tails.

1.2.3 Context Switching

In spite of the large amount of parallelism available in many applications, there are several problems in running programs on large scale multicomputer systems [8].

The first problem is that most applications must pass data between physically separate components of a parallel computer system. As ever larger systems are built, the time required to communicate across the computer network increases. This communication latency has not kept pace with decreasing processor cycle times. Even very low-latency networks [62,83] have round trip message latencies greater than 100 instruction cycles. Fine-grain programs send messages every 75 to 100 instructions [38,15]. If processors must wait for each message to complete, they will spend an ever-increasing amount of time idle.

Another problem is synchronization between threads, since highly parallel programs consist of short threads that frequently exchange data. The average run length of such a thread between synchronization points may be 20 to 80 instructions [21]. Each synchronization point may require an unbounded amount of time to resolve [55]. Stalling at every synchronization point would waste a large fraction of the processor's performance.

Figure 1-3 illustrates one alternative to idling a processor on communication and synchro- nization points. Efficient context switching allows a processor to very quickly switch to another thread and continue running.


FIGURE 1-3. Advantage of fast context switching. A processor idling on remote accesses or synchronization points (top), compared with rapid context switching between threads (bottom).

The less time spent context switching, the greater a processor's utilization [1]. Equation 1-1 shows the utilization M_thread of a processor as a function of the average context switch time T_switch and the average thread run length T_run, assuming enough concurrent threads.

M_thread = T_run / (T_run + T_switch)        (EQ 1-1)
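Equation 1-1 is easy to evaluate for concrete numbers. The sketch below (plain Python, with illustrative values drawn from the run lengths and latencies quoted earlier) compares a processor that needs on the order of a hundred cycles to switch against one that switches in a few cycles:

    def utilization(t_run, t_switch):
        """Processor utilization M_thread from EQ 1-1, assuming there is
        always another ready thread to switch to."""
        return t_run / (t_run + t_switch)

    # A 50-instruction run length between synchronization points:
    print(utilization(50, 100))  # ~0.33 if a context switch costs 100 cycles
    print(utilization(50, 5))    # ~0.91 if a context switch costs 5 cycles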

1.2.4 Thread Scheduling

Scheduling threads to run in parallel computer systems is an active area of research. This thesis makes no specific assumptions about the order in which threads are run. However, most dynamic thread scheduling algorithms must balance parallelism against resources.


As illustrated by Figure 1-2, dynamic parallel programs may generate excessive parallelism. Since each active thread consumes memory, many thread scheduling policies must limit the number of concurrent active threads [20,38,48]. The goal is to spawn enough parallelism to keep processors busy without completely swamping the system.

A second goal of thread scheduling is to exploit temporal locality among threads. An efficient algorithm [55] for switching threads in response to synchronization delays is to switch between a small number of active threads. If a thread halts on a synchronization point, the processor switches to another thread in the active set. A thread is only swapped out of the active set if it has been stalled for a long time. Then another thread from outside the active set is loaded in its place. This helps ensure temporal locality in the execution of the active threads. This scheduling policy can be extended to several more levels, such that the processor gives preference to threads in the inner set, but must occasionally load a thread from outer sets in order to make progress [61].
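A minimal sketch of this two-level policy is shown below. It is my own illustration, not an algorithm taken from [55] or [61]; thread objects are assumed to expose a runnable() method and a stalled_rounds counter:

    from collections import deque

    class ActiveSetScheduler:
        """Prefer threads in a small active set; swap a thread out only
        after it has stalled for stall_limit consecutive rounds."""

        def __init__(self, active_threads, stall_limit=4):
            self.active = list(active_threads)  # contexts kept loaded
            self.outer = deque()                # ready threads outside the set
            self.stall_limit = stall_limit

        def pick_next(self):
            # Temporal locality: always try the active set first.
            for t in self.active:
                if t.runnable():
                    t.stalled_rounds = 0
                    return t
            # The whole active set is blocked on synchronization points.
            for t in self.active:
                t.stalled_rounds += 1
            if self.active and self.outer:
                worst = max(self.active, key=lambda t: t.stalled_rounds)
                if worst.stalled_rounds >= self.stall_limit:
                    self.active.remove(worst)   # swap its context out,
                    self.outer.append(worst)
                    self.active.append(self.outer.popleft())  # load another
            return None                         # idle this round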

The sections that follow outline different solutions to the problem of efficient context switching. The Named-State Register File is proposed as an alternative register file organi- zation that meets the goals of fast context switching, good sequential performance, and efficient use of processor resources.

1.3 Multithreaded Processors

Multithreaded processors [19,76,82,29] reduce context switch time by holding the state of several threads in the processor’s high speed memory. Typically, a multithreaded processor divides its local registers among several concurrent threads. This allows the processor to quickly switch among those threads, although switching outside of that small set is no faster than on a conventional processor.

Multithreaded processors may interleave successive instructions from different threads on a cycle-by-cycle basis [76,47,65,54]. This prevents pipeline bubbles due to data dependencies between instructions, or long memory latencies. Other processors interleave blocks of instructions from each concurrent thread [29,4,23]. This exploits conventional processor pipelines, and performs well when there is insufficient parallelism. While the techniques introduced in this research are equally applicable to both forms of multithreading, we will usually discuss them in terms of block interleaving.

1.3.1 Segmented Register Files

Figure 1-4 describes a typical implementation of a multithreaded processor [76,47,4,5]. This processor partitions a large register set among a small set of concurrent threads. Each register frame holds the registers of a different thread. A frame pointer selects the current active frame. Instructions from the current thread refer to registers using short offsets from the frame pointer.



FIGURE 1-4. A multithreaded processor using a segmented register file. The register file is segmented into equal sized frames, one for each concurrent thread. The processor spills and restores thread contexts from register frames into main memory.

Switching between the resident threads is very fast, since it only requires setting the frame pointer. However, often none of these resident threads will be able to make progress, and the processor must switch to a thread outside of this small set. To switch to a non-resident thread, the processor must spill the contents of a register frame out to memory, and load the registers of a new thread in its place.
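The cost of the slow path is easy to model. The sketch below (class and field names are mine, not from any particular machine) charges a full frame of spill traffic plus a full frame of reload traffic on every switch to a non-resident thread:

    class SegmentedRegisterFile:
        """Fixed partitioning: one thread per frame. Switching to a
        non-resident thread spills and reloads an entire frame, whether
        or not the threads actually use all of those registers."""

        def __init__(self, n_frames=4, frame_size=32):
            self.frame_size = frame_size
            self.resident = [None] * n_frames  # thread held by each frame
            self.frame_ptr = 0                 # selects the current frame
            self.memory_traffic = 0            # registers spilled + reloaded

        def switch_to(self, thread_id):
            if thread_id in self.resident:     # fast path: move the pointer
                self.frame_ptr = self.resident.index(thread_id)
                return
            victim = self.frame_ptr            # victim choice doesn't change the cost
            self.memory_traffic += self.frame_size  # spill the whole frame...
            self.memory_traffic += self.frame_size  # ...and reload a whole frame
            self.resident[victim] = thread_id
            self.frame_ptr = victim

        def register_index(self, offset):
            """Instructions address registers as offsets from the frame pointer."""
            return self.frame_ptr * self.frame_size + offset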

This static partitioning of the register file is an inefficient use of processor resources. In order to load a new context, the processor must spill and reload an entire register frame. Some threads may not use all the registers in a frame. Also, if the processor switches contexts frequently, it may not access all the registers in a context before it must spill them out to memory again. In both cases, the processor will waste memory bandwidth loading and storing registers that are not needed.

Dividing the register file into large, fixed-size frames also wastes space in the register file. At any time, some fraction of each register frame holds live variables, data that will soon be accessed by an instruction. The remainder of the frame's registers are not used. Since each thread is allocated the same fixed-size frame, many registers in the register file will contain dead or unused variables. This is a serious inefficiency, since the register file, as the memory nearest to the processor's ALU, is the most precious real-estate in the machine. A more efficient scheme would hold only current, live data in the register file.

As noted in Section 1.2, communication latencies in parallel computers may be long and variable, and synchronization delays may be frequent and unbounded. As larger parallel computers are built, processors require more active threads to stay busy. A processor for such a system will spend considerable time spilling and reloading threads.


However, since all parallel programs contain sequential critical paths, a processor must also be able to run sequential code efficiently. Devoting large amounts of chip area to holding many active frames may be an inefficient use of the processor chip. And many highly multithreaded machines cannot run sequential code efficiently [76,47].

The fundamental problem with this segmented register file organization is the same as with conventional register files. The processor is binding a set of variable names (for a thread) to an entire block of registers (a frame). A more efficient organization would bind variable names to registers at a finer granularity.

1.4 The Named-State Register File

The Named-State Register File (NSF) is an alternative register file organization. It is not divided into large frames for each thread. Instead, a thread’s registers may be distributed anywhere in the register array, not necessarily in one continuous block. An active thread may have all of its registers resident in the array, or none. The NSF dynamically allocates the register set among the active threads.


FIGURE 1-5. A multithreaded processor using a Named-State Register File. The NSF may hold registers from a number of different contexts resident simultaneously. The processor spills and restores individual registers to main memory as needed by the active threads.

As shown in Figure 1-5, the NSF may hold data from a large number of contexts resident simultaneously. The NSF does not explicitly flush and reload contexts after a thread switch. Registers are loaded on demand by the new thread. Registers are only spilled out of the NSF as needed to clear space in the register file.


The NSF allows a processor to interleave many more threads than segmented files, since it sets no fixed limit on the number of resident threads. The NSF keeps more active data resident than segmented files, since it is not coarsely fragmented among threads. It spills and reloads far fewer registers than segmented files, since it only loads registers as they are needed.

Instructions refer to registers in the NSF using a short offset, just as in conventional register files. However, rather than using a frame pointer to distinguish between thread registers, the NSF assigns a unique Context ID to each concurrently running thread. Individual registers in the NSF are addressed using both Context ID and register offset. This allows registers from different threads to mix freely in the register file.

The NSF uses hardware and software mechanisms to dynamically bind registers in the register file. The NSF is a fully-associative structure, with very small lines. In this way, the NSF binds variable names to individual registers.

Chapter 2 describes the structure and operation of the Named-State Register File in more detail. It compares the NSF to conventional register files, and discusses the advantages of different memory structures.

1.5 Related Work

1.5.1 Memory to Memory Architectures

Much of the motivation for this research originated with Iannucci's work on Hybrid Dataflow [42]. The compilation techniques he introduced for grouping dataflow instructions into sequential threads for execution on conventional processors were also adopted by the Berkeley TAM project [21]. But while Hybrid Dataflow used a processor's registers for temporary variables, it was unable to keep any register state across context switch boundaries. Even potential context switch points forced registers to be flushed to memory. Given the short thread run lengths involved, this was clearly an inefficient use of processor resources.

The Monsoon processor [65], on the other hand, explicitly switched contexts in response to each Dataflow token, potentially on every instruction. Monsoon used no general registers, but fetched operands from a large, high-speed local storage. Monsoon compensated for the slower access time by using a very deep processor pipeline.

Omondi [63] has proposed a memory-to-memory processor to support multiple concurrent tasks. The machine fetched operands directly from the data cache. This organization also used a very deep processor pipeline to cope with long operand fetch latencies.


Each of these machines avoided storing operands in registers as a way of speeding up context switching. Yet, as noted by Sites [75], and Goodman [28], registers are a more effective use of on-chip area than data caches. Register files are faster and easier to multiport than caches. Instructions can directly address a register set, but not the entire memory space mapped into a data cache. And finally, compilers are able to use register allocation techniques [16] to effectively use that precious real-estate. Chapter 2 revisits these issues in more detail.

1.5.2 Preloading

Arvind [59] and Agarwal [77] have proposed register file organizations that either preload contexts before a task switch, or spill contexts in the background. Usually this requires extra register file ports, and some control logic. While these techniques speed context switching, since the target context has already been preloaded, they do not make efficient use of processor resources. Neither technique reduces the register spill and reload traffic of conventional multithreaded register files, but only re-schedules it. These techniques also use processor space inefficiently, since only a fraction of the register file contains live data.

1.5.3 Software Techniques

Several software techniques have been proposed to run multithreaded programs more effi- ciently on existing hardware. Keppel [49] and Hidaka [36] both proposed different ways of running multiple concurrent threads in the register windows of a Sparc [14] processor. Both propose variations on standard Sparc window overflow traps. The Sparcle chip built by Agarwal [4] modifies a Sparc chip by adding trap hardware. Sparcle also uses tuned trap handlers to speed context switching. At best, register windows when used in this way are very similar to the segmented register file described in Section 1.3.1, and have the same disadvantages. This large, fixed partitioning leads to poor utilization of the register file, and high register spill traffic.

Waldspurger [85] has proposed small modifications to a processor pipeline, and compiler and runtime software to allow different threads on a multithreaded processor to share the register set. This technique allows each thread to declare the number of registers it will use, so that different threads have different frame sizes in the register file. Runtime software is responsible for dynamically packing these different frame sizes into the register file. Such small register frames can improve the utilization of the register file at the cost of more register-to-memory traffic. It remains to be seen how well compilers can determine the optimum frame size for a thread, and how well runtime software can allocate those frames. In contrast, the NSF allows a much more dynamic binding of registers to contexts, so that an active thread can use a larger proportion of the register file.


The TAM [21] project takes a different approach in compiling fine-grain parallel Dataflow applications. The TAM compiler groups parallel threads into activations, in order to reduce the number of context switches by running a longer instruction stream between synchronization points. This is a useful technique regardless of the underlying hardware. This thesis uses the TAM compiler to produce parallel code for NSF simulations.

1.5.4 Alternative Register Organizations

There have been many proposals for alternative register file structures to run sequential code. Most of these approaches attempt to minimize register spills and reloads by supporting variable-sized procedure activation frames. None are well designed to handle arbitrary switching between multiple concurrent threads, but instead assume that register frames will be allocated and released in strict FIFO order.

Ditzel and McLellan proposed the C-machine [25,11] as a register-less, stack-based architecture. The C-machine stores the top of stack in a multiported stack buffer on chip. The processor maps references to the stack into offsets in the stack buffer. Russell and Shaw [70] propose a stack as a register set, using pointers to index into the buffer. These structures might improve performance on sequential code, but are very slow to context switch, because of the implicit FIFO ordering.

Huguet and Lang [40], Miller and Quammen [56], and Kiyohara [51] have each proposed other register file designs that use complex indirection to add additional register blocks to a basic register set. It is not clear if any of these designs are technically feasible. They do not make good use of register area, and may significantly slow down sequential execution.

1.6 Thesis Outline

Chapter 2 describes the structure of the Named-State Register File in more detail. The chapter reviews NSF operation as well as some design alternatives. The chapter compares different memory structures and discusses the benefits of register files for sequential and parallel programs. It concludes with an analysis of the NSF and conventional register files running multiple concurrent threads.

Chapter 3 describes how to build the NSF, and the logic and circuitry required to make it efficient. It outlines circuit simulations that compare the access time of the NSF to conventional register files. It also uses the layout of a prototype chip to determine the area required to build an NSF, relative to a conventional register file.

Chapter 4 outlines the strategy used for simulating the NSF in running real sequential and parallel programs. It describes the software environment, and decisions that were made in modelling performance on the benchmarks.


Chapter 5 describes the results of these software simulations. It compares the utilization and register traffic of NSF and conventional register files. It reveals how this performance scales with the size of the register file. It investigates which factors contribute most to performance. And it computes the overall effect of Named-State on program execution time.

Chapter 6 concludes with a discussion of the costs and benefits of the NSF, and some directions for future research.

CHAPTER 2 The Named-State Register File

This chapter describes the organization and operation of the Named-State Register File. It compares the NSF to other memory structures, and shows that the NSF is effective for managing processor local data. The chapter describes some issues in managing contexts, and alternatives in the design of the NSF. The chapter ends with an analysis comparing the NSF to a conventional register file in running multiple concurrent threads.

2.1 NSF Operation

2.1.1 Structure

Figure 2-1 outlines the structure of the Named-State Register File. The NSF is composed of two components: the register array itself, and a fully-associative address decoder. The NSF is multi-ported, as are conventional register files, to allow simultaneous read and write operations. Figure 2-1 shows a three-ported register file that supports two register reads and one write per cycle. While the NSF could be built with more ports, to allow many simultaneous accesses, the remainder of this chapter will concentrate on three-ported register files.

Recall that a segmented register file, as described in Section 1.3.1, is composed of several distinct register frames. The Named-State Register File is instead divided into many short register lines. Depending on the design, an NSF line may consist of a single register, or a small set of consecutive registers. Typical register organizations may have line sizes between one and four registers wide.

A conventional register file is a non-associative, indexed memory, in that a register address is a physical location, a line number in the register array. Once a register variable has been written to a location in the register file, it does not move until the context is swapped out. Multithreaded register files use frame pointers to cycle through the available frames in the array. The block size of this register file is an entire frame, since a range of register indices is bound to a frame as a single unit.

The Named-State Register File, on the other hand, is fully-associative, since a register address may be assigned to any line of the register file. During the lifetime of a context, a register variable may occupy a number of different locations within the register array. The unit of associativity of the NSF (its block size) is a single line. Each line is allocated or deallocated as a unit from the NSF.



FIGURE 2-1. Structure of the Named-State Register File.

The NSF uses an associative address decoder to achieve this flexibility. The address decoder translates each register address to the line of the array that contains that register. The NSF binds a register name to a location in the register file by programming the address decoder with the register's address. Subsequent reads and writes of that register match the register address against each address programmed into the decoder. Chapter 3 describes how such an address decoder can be built.

2.1.2 Register addressing

As in any general register architecture, instructions refer to registers in the NSF using a short register offset. This identifies the register within the current procedure or thread activation. However, instead of using a frame pointer to identify the current context, the processor tags each context with a Context ID (CID). This is a short integer that uniquely identifies the current context from among those resident in the register file.

The NSF does not impose any restrictions on how Context IDs are used by different programming models. In the NSF, each Context ID simply defines a separate, fixed-size set of registers. The width of the offset field determines the size of the register set (typically 32 registers). In some programming models, a context may be the state of a procedure activation, or the local state of a single parallel thread. Section 2.3 describes some issues related to the management of Context IDs.


A register address in the NSF is the concatenation of its Context ID and offset. The current instruction specifies the register offset, and a processor status word supplies the current CID. In effect, the CID increases the size of the register name space. While a segmented register file may refer to a few (say 4) register frames, the NSF may address the registers of many contexts (say 32 or 64) simultaneously.

While most instructions only refer to registers in the current context, some load and store instructions can copy values from one context to another. A load_from_context instruction treats its source operands as the Context ID and offset of a register from another context. The instruction fetches that register and writes it to a register in the current context. A store_to_context instruction behaves in a similar manner. This allows a procedure to load arguments from its caller's context, or to return the result of that procedure invocation.
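The addressing scheme can be sketched in a few lines. The 5-bit offset, giving 32 registers per context, matches the typical size mentioned above; the nsf object stands for any register-file model with read and write keyed by Context ID and offset, such as the one sketched in Section 2.1.4:

    OFFSET_BITS = 5                      # 32 registers per context

    def reg_name(cid, offset):
        """A full register name is the concatenation of CID and offset."""
        return (cid << OFFSET_BITS) | offset

    def load_from_context(nsf, cur_cid, src_cid, src_off, dst_off):
        """Sketch of load_from_context: read a register of another context
        and write it into the current one, e.g. a caller's argument."""
        value = nsf.read(src_cid, src_off)
        nsf.write(cur_cid, dst_off, value)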

2.1.3 Read, write, allocate and deallocate

Binding an address to a line of the register file is known as allocating a register. The first write to a new register allocates that register in the array. Once a register is resident in the array, subsequent reads and writes of that register simply access the appropriate line of the register file.

An instruction may explicitly allocate a register on a write operation by setting an allocate bit in the register address. This is a hint from the compiler that the line containing that register has not yet been allocated in the file. The hint can speed up register allocation, since the NSF does not need to search for that address in the register file before writing the data.

In a similar manner, the NSF can deallocate a register after it is no longer needed by the program. A bit in the register read address informs the NSF to read and simultaneously deallocate the register. When all registers in the line have been deallocated, the line is free to be allocated to a new set of register variables.

The NSF can also deallocate all registers associated with a particular context. The dealloc_CID instruction will delete any registers from the current context that are resident in the register file. It handles the case where a compiler is unable to determine the last usage of a register in a context. Thus deallocating a register after its last use is not necessary for correct operation of the NSF. It merely makes more efficient use of the register file.

2.1.4 Reloading and spilling

The NSF holds a fixed number of registers. After a register write operation has allocated the last available register line in the register file, the NSF must spill a line out of the register file and into memory. The NSF could pick this victim to spill based on a number of different strategies. This study simulates a Least Recently Used (LRU) strategy, in which the NSF spills the line that has been least recently accessed.

If an instruction attempts to read a register that has already been spilled out of the register file, that read operation will miss on that register. The NSF signals a miss to the processor pipeline, stalling the instruction that issued the read. Then the register file reloads that register from memory. Depending on the organization of the NSF, it may reload only the register that missed, or the entire line containing that register.

Writes may also miss in the register file. A write miss may cause a line to be reloaded into the file (fetch on write), or may simply allocate a line for that register in the file (write-allocate). Section 2.4 discusses the alternatives in more detail.

This implicit register spilling and reloading is the most significant difference between the NSF and conventional register files. Any instruction may miss in the register file, and require a line to be reloaded. However, the NSF will only reload the line that caused that miss, and not the entire register context. While this strategy may cause several instructions to stall during the lifetime of a context, it ensures that the register file never loads registers that are not needed. As shown in Chapter 5, better utilization of the NSF register file more than compensates for the additional misses on register fetches.
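The allocation, spilling, and reloading rules of Sections 2.1.3 and 2.1.4 can be pulled together in a small behavioral model. This is a sketch under simplifying assumptions (one register per line, write-allocate on write misses, LRU victim selection), not the logic of the prototype:

    from collections import OrderedDict

    class NamedStateFile:
        """Fully-associative register file model. Lines are named by
        (Context ID, offset); an OrderedDict tracks LRU order."""

        def __init__(self, n_lines=64):
            self.n_lines = n_lines
            self.lines = OrderedDict()   # (cid, offset) -> value, LRU order
            self.backing = {}            # spilled registers in memory
            self.spills = self.reloads = 0

        def _make_room(self):
            if len(self.lines) >= self.n_lines:
                name, value = self.lines.popitem(last=False)  # LRU victim
                self.backing[name] = value                    # spill it
                self.spills += 1

        def write(self, cid, offset, value, allocate=False):
            name = (cid, offset)
            if name not in self.lines:   # write miss: allocate, no fetch
                self._make_room()        # (an explicit allocate hint lets
            self.lines[name] = value     # hardware skip the search here)
            self.lines.move_to_end(name)

        def read(self, cid, offset, deallocate=False):
            name = (cid, offset)
            if name not in self.lines:   # read miss: stall, reload on demand
                self._make_room()
                self.lines[name] = self.backing.pop(name)  # assumes written before read
                self.reloads += 1
            self.lines.move_to_end(name)
            value = self.lines[name]
            if deallocate:               # compiler-marked last use frees the line
                del self.lines[name]
            return value

        def dealloc_cid(self, cid):
            """dealloc_CID: drop every resident register of a context."""
            for name in [n for n in self.lines if n[0] == cid]:
                del self.lines[name]

In this model a context switch is a no-op: the new thread's first accesses simply miss and reload individual lines on demand.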

2.1.5 Context switching

Context switching is very fast with the NSF, since no registers must be saved or restored. There is no need to explicitly flush a context out of the register file after a switch. Regis- ters are only spilled or reloaded on demand. After a context switch, the processor simply issues instructions from the new context. These instructions may miss in the register file and reload registers as needed by the new context.

While register allocation and deallocation in the NSF use explicit addressing modes, spilling and reloading are implicit. The instruction stream creates and destroys contexts and local variables, which are known at compile time. The NSF hardware manages register spilling and reloading in response to run-time events. In particular, there are no instructions to flush a register or a context from the register file.

The only concession to thread scheduling routines is the probe instruction. This is similar to the load_from_context instruction described in Section 2.1.2, but rather than loading the value of a register, it only checks that the register is resident. A probe instruction will never cause a line to be reloaded into the register file, nor another line spilled. This allows run-time software to check the status of lines in another context, without disturbing the other contents of the register file. A thread scheduling routine might use probe instructions to check if a context has been spilled out of the NSF.
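In terms of the model above, probe reduces to a residency check that touches neither the backing store nor the LRU state (again a sketch, not the prototype's interface):

    def probe(nsf, cid, offsets):
        """Check whether a context's registers are resident, without
        causing any line to be reloaded or spilled."""
        return all((cid, off) in nsf.lines for off in offsets)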


2.2 Justification

The Named-State Register File is an unusual memory structure. This section describes the aspects of the NSF that differentiate it from other memory structures, and how the NSF design allows it to run parallel and sequential code efficiently.

2.2.1 Registers and memory

In papers on the optimal use of processor chip area for local memory, Sites [75] and Patel [27] cite several advantages to allocating that space as registers rather than a cache:
• Registers are typically multi-ported, whereas caches are not.
• Register files are usually small and very fast to access.
• Registers can be identified using short indices in an instruction word. A single instruction may refer to several register operands.
• Registers can be managed by the compiler, which assigns variables to registers based on lifetime analysis within code blocks.

Some of these issues are implementation dependent:
• It is possible to build multi-ported data caches [30].
• The access time of both caches and register files depends on the size and organization.
• Some register-less architectures refer to operands as fixed offsets from a stack pointer [11].

This thesis argues that a critical distinction between caches and register files is in how registers are managed. This results from the difference between the register name space and the virtual address space.

Figure 2-2 illustrates the memory structures used by most modern processors. Note that programs refer to data stored in memory using virtual addresses. A data or instruction cache transparently captures frequently used data from this virtual address space¹. In a similar manner, the processor's physical memory stores portions of that virtual address space under control of the operating system.

It is often very difficult for a compiler to manage the virtual address space used for local data [53]. Programs may create aliases to memory locations, index through large structures in a data-dependent manner, and dynamically allocate and deallocate memory. Depending on the data set, a particular program may have good or very poor paging and caching performance.

1. The cache may be virtually or physically addressed. A virtual cache is shown here for simplicity.



FIGURE 2-2. Memory hierarchy of a modern processor. The register file is a fast local memory for the processor. It also defines a register name space distinct from the virtual address space of main memory.

The processor’s register file, on the other hand, is not part of this virtual address space. In effect, the register file defines a new name space, addressed only by register number. Since the register set is separate from the rest of memory, a compiler may efficiently manage this space [16], without regard to the data references being made by the program.

Note that a program typically spills and reloads variables from the register set into main memory. This movement from register space to virtual address space is under program control, and a compiler may use local knowledge about variable usage to optimize this movement [78]. A program may use a number of different strategies to hold spilled regis- ters, such as stack and heap frames. When the program determines that a set of variables in the registers are no longer needed, it may overwrite them.

In contrast, a data cache uses a fixed, hardware strategy to capture data from main memory. Both cache and main memory refer to data using the same virtual address. The cache is not the primary home for this data, but must ensure that data is always saved out to memory to avoid inconsistency. Although some caches allow programs to avoid caching some data, or to explicitly allocate data in the cache [68], the program typically has no control over how memory operands are mapped into the cache.

2.2.2 NSF and memory hierarchy

Figure 2-3 illustrates how the Named-State Register File fits into the memory hierarchy. As with conventional register files, the NSF defines a register name space separate from that of main memory. But now the name space consists of a <Context ID, offset> pair. In effect, the use of Context IDs significantly increases the size of the register name space. Since the NSF is an associative structure, it can hold any registers from this large address space in a small, efficient memory.


FIGURE 2-3. The Named-State Register File and memory hierarchy. The NSF addresses registers using a <Context ID, offset> pair. This defines a large register name space for the NSF. The Ctable is a short indexed table to translate Context IDs to virtual addresses.

The NSF can use the same compiler techniques as conventional register files to effectively manage the register name space. A program may explicitly copy registers to and from the virtual memory space (or backing store) as with conventional register files. But the NSF provides additional hardware to help manage the register file under very dynamic programming models, where compiler management may be less effective [41].

As described in Section 2.1.4, the NSF spills registers to memory when it becomes full. It also reloads registers on demand, as required by instructions from a running program. Figure 2-3 shows how the NSF hardware maps registers into the virtual address space to support spills and reloads. The block labelled Ctable is a simple indexed table that translates Context IDs to virtual addresses. This allows the NSF to spill registers directly into the data cache. A user program or thread scheduler may use any strategy for mapping register contexts to structures in memory, simply by writing the translation into the Ctable. This mechanism permits dynamic register spilling by the NSF hardware, under program control.

Note that since CID is a short field, the Ctable can be a short table indexed by CID, rather than an associative lookup. Both read and write operand CIDs are translated through the Ctable in the pipeline stage before operand fetch, in case either operand misses in the NSF.


If a register must be spilled out of the NSF, its CID is translated through the Ctable so that the register line can be written directly into the data cache.
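A minimal model of the Ctable mechanism follows; field widths and names are illustrative only:

    LINE_REGS = 2              # registers per line
    WORD_BYTES = 4             # bytes per register

    class Ctable:
        """Short table indexed directly by Context ID (no associative
        lookup), mapping each CID to the virtual base address of that
        context's backing store."""

        def __init__(self, n_contexts=64):
            self.base = [0] * n_contexts

        def map_context(self, cid, vaddr):
            # Runtime or user software chooses the mapping: a stack
            # frame, a heap frame, or any other per-context structure.
            self.base[cid] = vaddr

        def spill_address(self, cid, offset):
            """Virtual address at which the line holding (cid, offset) is
            spilled; hardware can write it straight into the data cache."""
            line = offset // LINE_REGS
            return self.base[cid] + line * LINE_REGS * WORD_BYTES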

2.2.3 Properties of memory structures

Table 2-1 enumerates different properties of data caches, conventional register files, and the Named-State Register File. A processor architect may use one of these alternatives as fast local memory for temporary data.

Property               Determines    Data Cache         Register File       Named-State File
Latency                Performance   ≥ 1 cycle          < 1 cycle           < 1 cycle
Bandwidth              Performance   1-2 words/cycle    ≥ 3 words/cycle     ≥ 3 words/cycle
Selection              Performance   Associative        Direct select       Fully-associative
Contents               Program       All program data   Register variables  Register variables
Name Space             ISA           Virtual Address    Register name       Register name
Addressing             ISA           VM/PM address      Short index         CID + index
Sequential Allocation  ISA           Hardware/Compiled  Compiled            Compiled/Hardware
Sequential Management  Program       Hardware           Compiled preload    Hardware/Compiled
Parallel Management    Program       Hardware           Runtime software    Hardware/Runtime software

TABLE 2-1. A comparison of several fast local processor memory alternatives. Conventional and Named-State register files define a register name space separate from the virtual address space used by the rest of the memory hierarchy.

Some of the properties, such as latency and bandwidth, are implementation dependent. It is relatively easy to build fast, multiported register files, but it is also possible to build fast, multiported caches. The method used by the hardware to select lines within each memory is also implementation dependent. Data caches typically use direct-mapped or set-associative lookup, while register files directly select lines in the file. The Named-State Register File described here uses a fully-associative lookup to associate names with register lines.

An important distinction between data caches and register files is that registers occupy a separate address space from the rest of the memory hierarchy. A data cache is addressed using a virtual or physical memory address, and may cache any program data. Conven- tional register files use a short index to select a register, while the NSF uses an expanded name space consisting of Context ID and offset.

Cache lines are typically allocated and managed using a fixed hardware algorithm. Some data caches allow the program to explicitly allocate cache lines [68]. This allocation is only useful in writing new data spaces, since the compiler must guarantee that the entire cache line will be written, to avoid incorrectly allocating partial lines [46].


A conventional register file allocates registers for sequential code blocks under compiler control [16]. In cases where the sequential call sequence is well known, the compiler can also manage registers across sequential procedure calls. For very dynamic or parallel programs, a conventional file must use runtime software routines to share the register file among many concurrent contexts [85].

The Named-State Register File may use the same compiler techniques to allocate registers for sequential code. A program could manage those registers in the same way as conventional register files for code where the activation order is known. But for dynamic and parallel programs, the NSF can dynamically manage registers in hardware, spilling and restoring variables as needed. This added flexibility can significantly improve the performance of many sequential and parallel programs.

Table 2-2 shows a similar comparison between a conventional register file, the Named- State Register File, register windows [31,66], and the C-machine stack cache [25,11].

Property               Register File       Named-State File   Windowed Register File  CRISP Stack Cache
Latency                < 1 cycle           < 1 cycle          < 1 cycle               1 cycle
Bandwidth              ≥ 3 words/cycle     ≥ 3 words/cycle    ≥ 3 words/cycle         3 words/cycle
Selection              Direct select       Fully-associative  Direct select           Direct select
Contents               Register variables  Register variables Register variables      Stack Frames
Name Space             Register name       Register name      Register name           Virtual Address
Addressing             Short index         CID + index        Window + index          Short index
Sequential Allocation  Compiled            Compiled/Hardware  Compiled                Compiled
Sequential Management  Compiled preload    Hardware/Compiled  Runtime preload         Runtime preload
Parallel Management    Runtime software    Hardware           Runtime software        Runtime software

TABLE 2-2. A comparison of different register file structures.

A windowed register file selects registers within the current active window, much like a segmented register file. It also manages frames for sequential programs with a combination of hardware support and runtime trap handlers [14]. Several researchers have proposed ways of multithreading between several concurrent threads using register windows [36,49,4]. All of these schemes use some form of runtime trap handler to switch between threads, and to swap threads into and out of the register file.

The C-machine stack cache is a variant of a traditional top of stack buffer. The C-machine pre-decodes instructions to turn stack references into offsets in the stack buffer. Since the stack buffer caches a region of virtual memory, locations in the stack are addressable as memory locations. A program may generate pointers to locations in the stack or the stack cache, and the processor must detect references to cached regions of the stack. The C-machine is slow to switch contexts between concurrent threads, since the entire stack cache must be flushed to memory, and a new stack must be reloaded for the new thread.

2.2.4 Advantages of the NSF

The Named-State Register File uses a combination of hardware and software to dynamically map a large register name space into a small, fast register file. In effect, it acts as a cache for the register name space. It has several advantages for running sequential and parallel applications:

• The NSF has low access latency and high bandwidth.
• Instructions refer to registers in the NSF using short compiled register offsets, and may access several register operands in a single instruction.
• The NSF can use traditional compiler analysis [16] to allocate registers in sequential code, and to manage registers across code blocks [78, 86].
• The NSF expands the size of the register name space, without increasing the size of the register file.
• The register name space is separate from the virtual address space, and mapping between the two is under program control.
• The NSF uses an associative decoder, small register lines, and hardware support for register spill and reload to dynamically manage registers from many concurrent contexts.
• The NSF uses registers more effectively than conventional files, and requires less register traffic to support a large number of concurrent active contexts.

2.3 NSF System Issues

This section discusses some issues involved in managing the register name space and Context IDs, and the impact of data caches on NSF performance.

2.3.1 Context Identifiers

The NSF addresses registers using the concatenation of two short fields: a Context ID and register offset. The Context ID serves several purposes:

• Each Context ID uniquely identifies one of many concurrent activations. The processor can switch activations by switching to a new CID.
• Context IDs expand the size of the register name space. The NSF reserves a contiguous set of registers in that name space for each CID.


• The processor can manipulate registers associated with a CID as a unit. For instance, the dealloc_CID instruction deallocates all registers from a particular activation.
• CIDs decouple the register name space from the virtual address space. A programming model can enforce any mapping of CIDs to addresses by setting up entries in the Ctable.
• CIDs decouple the register name space from a potentially large context name space. Context IDs do not identify all contexts in existence on the processor, but only a small subset of those contexts that may have registers in the NSF. It is the responsibility of the program or the thread scheduler to allocate CIDs to procedure or thread activations. In this way, a choice of CID size in the NSF does not constrain the number of contexts that a program can spawn, nor the scheduling of those contexts.
• Context IDs and offsets are both short fields, which simplifies and speeds up the NSF decoder and Ctable hardware. A typical pair might be 10 bits wide.

The NSF provides a mechanism to handle multiple activations, but does not enforce any particular strategy. Since Context IDs are neither virtual addresses, nor global thread identifiers, they can be assigned to contexts in any way needed by the programming model. NSF design decisions should have no effect on the procedure and thread scheduling that the NSF can support.

The penalty for all this flexibility is that the set of Context IDs is yet one more limited resource that a program or thread scheduler must handle. The scheduler must map a potentially unbounded set of activations into a very few CIDs. The section that follows will argue that this should not be a serious problem for most practical programming models.

2.3.2 Managing Context IDs

The NSF does not constrain how Context IDs are assigned. While details of CID mapping and thread scheduling are beyond the scope of this thesis, this section suggests some examples of how programming models might use CIDs.

Sequential programs:

The compiler may allocate a new CID for each procedure invocation. Calling routines can create a register context for a callee routine, then pass arguments directly into the callee’s registers. This is the model used by sequential programs in this study. Each called procedure is assigned a CID one greater than its caller. If the call chain is very deep, such that all CIDs are already in use, the program must reclaim some CIDs from the start of the chain. In order to reuse a CID, an activation must ensure that no registers belonging to that CID are still resident in the NSF.


A sequential program would normally spill registers from the NSF into stack-allocated, fixed-sized frames in memory. Each new procedure invocation simply increments the stack pointer, and writes that address into the Ctable entry for its CID. This is very similar to the way that register windows [31,67] are used on some RISC processors.
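
A minimal sketch of this calling convention follows, reusing the hypothetical ctable_write, NUM_CIDS, and WORD_BYTES names from the earlier sketch; the real sequence would be emitted by the compiler, and the frame size is an assumption:

    #define FRAME_WORDS 16       /* assumed fixed spill-frame size */

    /* Caller creates a register context for its callee: the callee CID
     * is one greater than the caller's, and its Ctable entry points at
     * a freshly pushed, fixed-size stack frame. */
    unsigned call_context(unsigned caller_cid, uint32_t *stack_ptr) {
        unsigned callee_cid = (caller_cid + 1) % NUM_CIDS;
        *stack_ptr += FRAME_WORDS * WORD_BYTES;  /* bump the stack pointer */
        ctable_write(callee_cid, *stack_ptr);    /* bind CID to the frame  */
        return callee_cid;
    }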

If all CIDs are in use, the run-time software must reuse a CID in order to allocate a new context. A CID that is no longer resident in the register file can be reused simply by updating its entry in the Ctable. But if a CID still has some lines resident in the register file, they must be explicitly copied out to memory and the CID deallocated from the register file before it can be reused. Even with the probe instruction of Section 2.1.5, this is still an expensive operation. Using a wide Context ID field can reduce the frequency at which context management software re-uses active CIDs. Certain software conventions can also reduce this burden.

However, typical sequential applications spend considerable time within a range of procedure nesting depth [35]. While call chains may become quite deep, the calling depth may not change at a rapid rate. This is the principle that allows register windows to capture a significant fraction of the procedure call frames without excessive window traps. A windowed register file with 10 register banks may overflow on less than 5% of calls for many simple applications [35]. By analogy, an NSF with a 4 bit wide CID field may only need to re-use a CID every 30 procedure calls. This is borne out by the simulations in Chapter 5, in which a 5 bit CID field holds most of the call chain of large sequential applications.

Fine-grained parallel programs:

A parallel language might allocate a new context for every thread activation. As discussed in Section 1.2, dynamic parallel programs may spawn many parallel threads. Many threads are able to run simultaneously, and a processor may rapidly switch among a small active set of threads in order to avoid idling. This is the model used by the parallel programs in this study [21].

Since threads in dynamic parallel models may be halted and then restarted, the thread scheduler must allocate room to store the thread’s state in memory. These thread activations are usually heap allocated in a wide activation tree. In order to spill registers from the NSF, the scheduler need only write the activation address for each new Context ID into the Ctable. This adds very little cost to the thread setup time.

Managing Context IDs for parallel code is more difficult than for sequential code, since an entire tree of contexts may be active simultaneously, rather than simply a sequential call chain. For most dynamic parallel programs, the scheduling of threads is data dependent and cannot be predicted at compile time. As in the sequential case, a thread scheduler may occasionally need to reuse Context IDs.


However, as noted in Section 1.2, any realistic thread scheduler must exploit locality in spawning threads and in running them. Since each thread consumes resources, the scheduler must often limit parallelism after a number of threads have been spawned [20,48]. In addition, efficient schedulers may exploit temporal locality by giving preference to a small active set of threads, before allowing any thread from outside that active set to be scheduled to run. This matches the model used in the NSF: a small number of contexts are resident and running in the register file, and a somewhat larger set of potentially active threads can be swapped in if all resident threads halt on synchronization points. A relatively small CID field should still allow an efficient thread scheduler to pick threads to run.

Large-grain parallel programs:

A programming model may support a mix of parallel tasks and sequential procedures. This may require a mixture of the two approaches discussed above. A large parallel task may run many procedures, and maintain its own stack. A processor could run that large task for some time, allocating contexts for each procedure activation. Then, if the task blocks, the processor could switch to another task, and run its procedures. This is the model used by Mul-T [52].

One way of allocating CIDs to large tasks is to devote a range of CIDs to each task, so that it can use that range of contexts for procedure calling. A trivial case is when a parallel program has been statically divided into a fixed number of large tasks. The compiler or linker may divide the CIDs on a single processor by the number of static tasks allocated to that processor. Thereafter, each task would behave much like a sequential program.

An alternative is to dynamically allocate CIDs to tasks and procedures, without any fixed division among the tasks. The choice of approach depends upon the number and average run length of tasks, and the frequency of task switches. The NSF itself does not limit the task scheduling options, but merely provides mechanisms to exploit them.

Other scheduling options:

The NSF is not limited to the programming models and scheduling strategies presented above. Many other strategies are possible. A programming model may allocate two Context IDs to a single procedure or thread activation. This would be useful if a compiler could not determine an execution order for basic blocks within that activation, and could not efficiently allocate registers among those basic blocks. (Such is the case for TAM [72] programs). Devoting two CIDs to the procedure would make register allocation easier, yet still allow basic blocks to pass data within the register file.

It would be inefficient to dedicate several frames of a segmented register file to a single procedure activation, and only use a few registers per frame. But in the NSF, since registers are only allocated as they are used by a program, there is relatively little cost to giving a small set of registers a unique CID. Only the active registers of that context will consume space in the NSF.

Task scheduling for parallel programs is an active area of research. Managing a set of Context IDs should not add much to the cost of thread scheduling and context allocation [55,85,61]. In many cases, CID handling should only require reading and writing entries in the NSF Ctable. The short time spent in this run-time software must be balanced against the cost of larger CID fields, which reduce the frequency of Context ID reuse.

2.3.3 Data Cache Performance

A processor’s data cache affects the performance of the Named-State Register File in several ways. Because of cache interference effects, data caches miss more frequently while supporting multiple concurrent threads than while supporting a single instruction stream. In addition, the data cache must be able to respond quickly to register spills and reloads from the NSF, to prevent long pipeline stalls. This section addresses each of these issues in turn.

Several recent studies have investigated the effect of multithreading on cache miss rates and processor utilization. Agarwal [1] has shown that for data caches much larger than the total working set of all processes, the miss rate due to multithreading increases linearly with the number of processes being supported. Typical cache miss rates due to multithreading range from 1% to 3%, depending on the application. Weber and Gupta’s experiments [29] confirm this behavior.

However, these experiments were for large, coarse grain applications. In some cases, to simulate higher levels of multithreading, the experimenters merely ran additional copies of the original sequential program trace, rather than dividing that program into smaller concurrent threads. Although there are no studies of the effect of multithreading for fine grain programming models, experiments have shown that cache miss rates are strongly dependent on a program’s working set size. Since fine grain parallel programs have much smaller working sets than coarse grain programs [69], miss rates due to multithreading should not be a dominant component of execution time.

All multithreaded machines spill and reload registers to the cache. However, the small number of registers per context does not take much room in a data cache. The NSF caches fewer registers than segmented register files, since it only allocates and spills registers for live data, and explicitly deallocates registers when possible. Finally, to minimize interference between concurrent contexts, [1] suggests hashing Context IDs to addresses, and accessing the data cache with the resulting hashed address. The NSF Ctable easily supports this technique.

An important distinction between the NSF and a conventional segmented register file is that when the latter switches contexts, it loads all of the registers from the context in one operation. While this loads unnecessary data, it also provides opportunities for pipelining those reloads. In addition, a segmented register file only reloads contexts after switching threads or calling subroutines. The NSF, on the other hand, may potentially miss and reload a register on any instruction. The effect that this has on processor performance depends on the design of the data cache.

Suppose that the data cache can respond to a fetch request that hits in the cache in time Thit. Suppose that each successive access to that cache line takes time Tpipe. If the register write port is only one word wide, then the time required to reload a context of C words1 into a segmented register file is:

$T_{segreload} = T_{hit} + T_{pipe} (C - 1)$  (EQ 2-1)

If after returning to a context, a Named-State Register File must reload N words in order to finish the current thread, this will take time:

$T_{nsfreload} = T_{hit} \cdot N$  (EQ 2-2)

For many modern RISC processors with on-chip data caches [57], Tpipe is equal to Thit. But since cache access time determines the cycle time of many of these processors, Tpipe may be less than Thit on future designs [13]. The Named-State Register File must then reload proportionally fewer registers in order to perform as well as the segmented register file. In other words, the NSF will perform better if:

$\frac{N}{C} < \frac{T_{pipe}}{T_{hit}}$  (EQ 2-3)
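
As a worked check of this inequality, assume (purely for illustration) $T_{hit} = 2$ cycles, $T_{pipe} = 1$ cycle, and contexts of $C = 16$ words:

    $T_{segreload} = 2 + 1 \cdot (16 - 1) = 17$ cycles
    $T_{nsfreload} = 2N$ cycles

The NSF reload is then cheaper whenever $N \le 8$, in agreement with Equation 2-3, since $N/C < T_{pipe}/T_{hit} = 1/2$.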

The same arguments apply for register spilling, which could use write buffers into the cache to pipeline writes. Chapter 3 revisits this issue. But note that this analysis assumes that conditional register misses and reloads can be performed as efficiently as load instructions on a conventional processor. Given the level of pipelining in most modern processors, this may add additional complexity or latency to the implementation.

2.4 Design Alternatives

Line Size

Building a Named-State Register File requires a number of design decisions. The most basic issue is the width of register lines. As discussed earlier, a line may contain a single register, or a short vector of registers. The line is the basic unit of granularity of the register file. While instructions may access individual registers within a line, an address is associated with each line in the register file.

1. Ignoring the effect of instruction fetching and dispatching.

An NSF register file could be organized with lines as large as an entire context. Such a register file would be very similar to the segmented register file described in Section 1.3. However, an NSF with large lines is still a fully-associative structure. Lines can be allocated and deallocated at any position in the register file. Any line in the file can hold any address, and an address may move from place to place in the NSF during the lifetime of a program.

A segmented register file, on the other hand, is a non-associative or indexed structure. Each context is assigned a frame in the register file, and a fixed address. Contexts do not automatically spill and reload to arbitrary frames in the file. Some segmented files support a fixed, round-robin frame allocation scheme in hardware [67]. While this performs well for sequential programs, a round-robin frame allocation scheme performs poorly when interleaving multiple threads [36].

The line size of an NSF register file is a trade-off between area and granularity. One advantage of large lines is that fewer fully associative decoder cells are needed to address the register array. Another is that each decoder compares fewer address bits as line size increases. Thus, amortizing a decoder over several registers in a line can significantly reduce the size of the register file. However, as discussed in Section 1.3, large lines reduce the register file utilization, since only a few registers in each line will be active at any time. Reloading a large line may also load a number of inactive and unnecessary registers.

Multi-word lines and wide ports into the register file can improve register spill and reload times. As discussed in Section 2.3.3, each data cache access requires a significant initial latency. If the data path from the cache into the register file is wide enough, it can amortize the cost of that access over several words. This may be enough to justify building Named-State Register Files with wide lines.

Register reloads

As mentioned in Section 2.1.4, there are several possible strategies for handling register reloads in the NSF. The most efficient reload policy depends on the register line size, the complexity of NSF logic, and the performance of the memory system. Chapter 5 investigates the relative performance of different reload policies.

For misses on register reads, the NSF may reload the entire line containing the missing register (block reload), or simply allocate a line and reload the one missing register (sub-block reload). In order to reload individual registers into a line, each register in the line must be tagged with a valid bit. The valid bit indicates that the register contains live data. Every register reload or write must set the valid bit of its target. An NSF could then allocate a line in the register file on a read miss, and only reload one register, setting its valid bit.


The NSF could avoid reloading the remainder of the line until a read miss on another register in that line.

An NSF could assign a valid bit to each line in the file, each register, or some sub-block size in between the two. As discussed above, the choice of sub-block size may depend in part on the width of the path to the data cache, and the initial latency and pipelined performance of the data cache. If reloading two adjacent registers requires no more time than reloading one, it may be best to reload a two word sub-block on every miss.
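
The following sketch contrasts the two read-miss policies described above, assuming hypothetical types and helpers (reg_line_t, mem_read) and an illustrative two-register line with a four-byte word:

    #include <stdbool.h>
    #include <stdint.h>

    #define REGS_PER_LINE 2

    typedef struct {
        uint32_t data[REGS_PER_LINE];
        bool     valid[REGS_PER_LINE];   /* one valid bit per register */
    } reg_line_t;

    extern uint32_t mem_read(uint32_t va);   /* data cache access */

    void read_miss(reg_line_t *line, uint32_t line_va, unsigned r,
                   bool sub_block) {
        if (sub_block) {
            /* sub-block reload: fetch only the missing register */
            line->data[r]  = mem_read(line_va + r * 4);
            line->valid[r] = true;
        } else {
            /* block reload: fetch the whole line, mark every word live */
            for (unsigned i = 0; i < REGS_PER_LINE; i++) {
                line->data[i]  = mem_read(line_va + i * 4);
                line->valid[i] = true;
            }
        }
    }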

As shown in Section 5.4.3, valid bits and sub-block reloads may dramatically reduce register reload traffic. Valid bits can also reduce register spill traffic, since only valid registers need be spilled when a line is replaced. However, as shown in Section 3.5, valid bit logic requires a significant fraction of the register file area. Reloading an entire line on misses is simpler to implement, and does not require any additional logic inside the register file itself.

Writes may also miss in the register file. In a similar manner, a write miss may cause a line or sub-block to be reloaded (a fetch-on-write policy), or may simply allocate a line and write a single register (a write-allocate policy) [46]. Write-allocate has the advantage that the compiler can reuse one or two registers in a line without paying the cost of reloading the entire line if it has been swapped out. It also means that only register reads cause lines to be reloaded. On the other hand, it complicates the NSF spill and reload logic. A write-allocate policy normally requires valid bits on each register in the NSF.

A fetch-on-write policy requires merging the newly written register with the rest of the reloaded line or sub-block. Fetch-on-write can be very inefficient, especially on the first write to a new line. A fetch-on-write policy forces the NSF to reload an empty line on the first write to the line.

One way to improve the performance of fetch-on-write without adding valid bits to the NSF is to have the compiler explicitly tag each write to a new line. This informs the NSF that it should just allocate a new line for that register, rather than missing and trying to fetch the line from memory. As shown in Section 5.4.2, tagging writes can eliminate a large fraction of write misses and reloads. Tagging to allocate lines is easy if lines are the size of an entire context, but much more difficult if lines are a few registers wide [46].
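
Continuing the same illustrative sketch (reusing reg_line_t and read_miss from above), a write miss under the two policies discussed here might look as follows:

    void write_miss(reg_line_t *line, uint32_t line_va, unsigned r,
                    uint32_t value, bool write_allocate) {
        if (write_allocate) {
            /* claim a fresh, empty line; no memory traffic */
            for (unsigned i = 0; i < REGS_PER_LINE; i++)
                line->valid[i] = false;
        } else {
            /* fetch-on-write: reload the line first, then merge */
            read_miss(line, line_va, r, false);
        }
        line->data[r]  = value;   /* merge the newly written word */
        line->valid[r] = true;
    }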

However, if lines in the NSF are only a single register wide, there is no need to ever reload a line on a write miss. Each write simply allocates a new line and writes the register. All writes to such a register file are explicitly write-allocate. As shown in Section 3.6, single register lines also simplify register allocation logic.

Explicitly deallocating a register after it is no longer needed frees up space in the register file. This can increase the utilization of the register file, ensuring that it only contains live data. The NSF provides a tag line for read-deallocate, to read a register and immediately deallocate it. As in the case of write tagging, identifying the last use of a register in a loop requires effort by the register allocator. A compiler may need to generate two versions of a code-block: one for execution within a loop, and one for the final iteration to deallocate registers that are no longer needed. In case the compiler is not able to explicitly deallocate each register, the NSF also provides a signal to deallocate all registers belonging to a particular Context ID in one cycle.

Register spilling

The NSF spills lines to memory when it becomes full of live data. One way of managing the register file is to allow the file to fill completely, and then stall on an instruction that allocates a new line. A simpler method, from a hardware standpoint, is to always maintain a free line in the register file. This ensures that register writes always complete, regardless of which register is being written. The NSF can then stall the following instruction, if necessary, to spill a line from the file. The NSF prototype chip described in Section 3.7 used the latter method. However, for simplicity, the software simulations in this thesis followed the former strategy.
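
A sketch of the free-line discipline, under the assumption of hypothetical helpers file_full, pick_victim, and spill_line:

    #include <stdbool.h>

    extern bool     file_full(void);      /* every line has a valid bit set? */
    extern unsigned pick_victim(void);    /* PLRU, round-robin, random, ...  */
    extern void     spill_line(unsigned line);

    /* Invoked after every register write: if the write consumed the last
     * free line, stall the following instruction and spill a victim so
     * that one line is always free and writes always complete. */
    void maintain_free_line(void) {
        if (file_full())
            spill_line(pick_victim());
    }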

Context ID

Another design decision is the width of the Context ID field used to address registers in the NSF. As mentioned earlier, larger CID fields allow more contexts to be active simultaneously, without reusing CIDs. Large CIDs increase the size of the fully-associative register address decoders. As shown in Section 3.5, depending on the organization, the address decoders may consume 8% to 15% of the NSF chip area. For multi-ported register files, decoder area is not proportional to the number of CID bits, since several ports may share the same CID.

Larger CID fields also increase the size of the Ctable which translates Context IDs to Context Addresses. However, this single ported lookup table is relatively inexpensive to build, and is not in the critical path for register accesses.

Registers per context

There have been several studies of the working set of sequential programs, and of the optimum size of register windows for RISC processors [26,12]. This research indicates that it is often more efficient to divide a large register set into windows of 12 or 16 registers each, rather than half as many windows of 32 registers each. There is little improvement in program performance as additional registers are added to each window.

The Named-State Register File, on the other hand, can support many registers per context without requiring each context to use those registers. In effect, each procedure activation, or each thread, uses only as many registers as it needs. No activation will spill live data out of the register file without replacing it with more recent data. This may contradict those earlier studies, and argue for allowing large register sets per context in the NSF.

However, even in the NSF, there is a cost to allowing each context to address many registers. Larger register offset fields increase the size of instruction words, since each instruction must be able to directly address three operands. In the same way, larger offsets increase the size of the NSF address decoder.

This thesis will not fully investigate the optimum number of registers per context for different applications. To effectively study this trade-off would require modifying the register allocation phase of a compiler, and generating code optimized for different maximum context sizes. Those compiler modifications for both sequential and parallel code are outside the scope of this thesis.

Additional register ports

Additional ports into the register file have proved useful for super-scalar processors, which must feed several functional units simultaneously. They could also be useful to improve the performance of the Named-State Register File. An additional read port allows the NSF to spill lines from the file as it fills, without interrupting the instruction stream. This port does not require additional complexity in the associative decoder, since the NSF decides internally which victim lines to spill.

Additional write ports to the NSF are not as useful, since the NSF does not support any form of register pre-loading. At best, increasing the width of the existing write port allows the NSF to reload several registers at a time from the data cache.

2.5 Analysis1

Intuitively, a fine-grain binding of names to locations in the Named-State Register File would seem to require fewer loads and stores than a segmented register file to support the same number of active threads. The simple analysis in this section bears this out.

Consider a processor with N words of fast register storage2. Assume that the number of live registers per task is an exponentially distributed random variable with mean µ. To support i tasks on a conventional multithreaded processor, the registers are divided into i frames of N/i words each. With this arrangement a spill occurs in a single frame with a probability of $e^{-N/(i\mu)}$. So the probability of a spill in any frame is given by:

1. This analysis was first presented by Prof. William Dally.
2. This analysis does not consider name spilling, which occurs when the number of names that can be bound to fast register storage is exhausted. Nor does it consider the impact of different replacement strategies in the NSF.


$P_{MT}(spill) = 1 - \left(1 - e^{-N/(i\mu)}\right)^{i}$  (EQ 2-4)

The probability density function for the total number of live registers is obtained by convolving the density functions of the tasks together giving an Erlang distribution:

$f_{liveregs}(x) = \frac{(x/\mu)^{i-1}}{\mu \, (i-1)!} \, e^{-x/\mu}$  (EQ 2-5)

In a Named-State Register File a spill occurs only when the total number of live registers exceeds N. The probability of such a spill is given by integrating Equation 2-5 from N to ∞ giving:

$P_{CC}(spill) = e^{-N/\mu} \sum_{r=0}^{i-1} \frac{(N/\mu)^{i-1-r}}{(i-1-r)!}$  (EQ 2-6)

Figure 2-4 compares the spill probabilities of a segmented register file and a Named-State Register File as the number of tasks, i, is varied from one to 32, with the number of registers, N, fixed at 128, and the average number of live registers per task, µ, fixed at 8.
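
The curves of Figure 2-4 can be reproduced numerically from Equations 2-4 and 2-6; the following small C program (a checking sketch, not part of the thesis tooling) does so for N = 128 and µ = 8:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const double N = 128.0, mu = 8.0;
        for (int i = 1; i <= 32; i++) {
            /* EQ 2-4: segmented file, i frames of N/i words each */
            double p_seg = 1.0 - pow(1.0 - exp(-N / (i * mu)), i);

            /* EQ 2-6: NSF spills only when total live registers exceed N.
             * term holds (N/mu)^k / k!, accumulated for k = 0 .. i-1. */
            double sum = 0.0, term = 1.0;
            for (int k = 0; k < i; k++) {
                sum  += term;
                term *= (N / mu) / (k + 1);
            }
            double p_nsf = exp(-N / mu) * sum;

            printf("%2d tasks: segmented %.3f  NSF %.3f\n", i, p_seg, p_nsf);
        }
        return 0;
    }

At i = 8 this prints roughly 0.687 and 0.010, matching the 69% and 1% figures quoted below.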


FIGURE 2-4. Probability of register spilling for a segmented register file and the NSF. Number of registers: N=128, Average live registers per task: µ=8.


Figure 2-4 shows that the NSF has a significantly lower probability of spills than a conventional segmented register file. With eight active tasks, half of the register storage holds live variables. At this 50% utilization level the NSF has only a 1% probability of spilling while the segmented register file has a 69% spill probability. Alternatively, the NSF could attain this spill probability while accommodating eighteen active tasks, many more than a segmented register file.


CHAPTER 3 Implementation

The preceding two chapters introduced the Named-State register file, described how it is accessed, and how it supports multiple contexts efficiently. But in order to be useful, the NSF must also provide fast access to data and must be feasible to build in existing VLSI technology. This chapter describes how to build a NSF, compares its size and speed to ordinary register files, and demonstrates a prototype implementation.

Preface

Register files are multi-ported, meaning that they read and write several operands simultaneously. A typical processor reads two register operands and writes a single destination register per cycle. The register file could be built with fewer ports by time-multiplexing within a processor cycle. This is how the MIPS-X register file [18], for example, can support two reads and a write per cycle with a two-ported register file. Other high performance processors require additional read and write ports to support multiple functional units [24]. However, most of this discussion will describe single cycle, three ported register files.

3.1 NSF Components

Figure 3-1 describes the detailed structure of the Named-State register file. It also highlights some differences between the NSF and conventional register files. The primary difference is the NSF’s fully-associative address decoder. Before discussing the details of register decoders in Section 3.2, this section reviews the other major components of the NSF.

3.1.1 Inputs and Outputs

Table 3-1 lists the signals that connect the Named-State register file to a processor pipeline and memory system. The Address buses tell the register file which registers to read and write, while Data buses carry those operands to and from the ALU. Register addresses for the NSF require more bits, but otherwise these buses are the same as in conventional register files. However, the NSF also uses four Alloc signals in addition to the address bits to indicate when to allocate and deallocate registers.


FIGURE 3-1. Input and output signals to the Named-State Register File.

Register File Signal   Type           Definition
R1Adr, R2Adr, WAdr     Address Input  Address of register read operands 1, 2, and write operand. Each address has the form: Context ID + Offset.
WData                  Data Input     Data value to write into register.
R1Data, R2Data         Data Output    Register read operands.
R1Dealloc, R2Dealloc   Alloc Input    Deallocate a register after reading its value.
WAlloc                 Alloc Input    Allocate this write operand, do not fetch from memory.
CIDDealloc             Alloc Input    Deallocate all registers from the context specified in R1Adr.
R1Miss, R2Miss, WMiss  Status Output  An operand was not resident in the register file. Stall pipeline while fetching a line from memory.
SpillReq               Spill Output   The register file is full. Spill a line to memory.
SpillAdr               Spill Output   The register address of a spilled line.
SpillData              Spill Output   Contents of spilled line, output through a read port.

TABLE 3-1. Signals linking the Named-State register file to the processor pipeline.

The NSF also uses three Status signals to indicate whether an operand was resident in the register file. In response to an access miss, the pipeline will usually stall while fetching the appropriate register line from the data cache. After reloading the line into the register file, the instruction can proceed. The probe instruction, introduced in Section 2.1.5, is an exception to this schedule. Probe instructions simply test whether an operand is in the register file, but never reload or spill lines. Also note that the WMiss signal is not needed for files whose lines are only one register wide.

Finally, the NSF uses Spill signals to unload data from the register file when it becomes full. The SpillReq signal stalls the processor in order to spill a line. Just as the NSF reloads data through the register write port, it spills lines through one or both of the register read ports. The SpillData simply shares those ports with the pipeline. However, the NSF uses one additional port (SpillAdr) to spill a line’s Context ID and Offset at the same time as its data. This register address is translated to a virtual memory address by a table lookup as described in Section 2.3.2. This allows the NSF to spill registers into the data cache.

3.1.2 Valid bits

Each register in the NSF must be tagged with a valid bit. This indicates that the register has been allocated in the file, and contains a valid data word. If a line of the file contains several registers, each must have a valid bit. When a line is first allocated for a register, only that register’s valid bit will be set. (This is known in cache terminology as sub-blocking). Subsequent writes to other registers in the line set their valid bits as well. As described in Section 2.1.3, a read_deallocate operation will clear the valid bit associated with a register after reading the value of that register. Finally, the NSF must be able to deallocate all registers belonging to a single context by clearing all their valid bits in one operation.

3.1.3 Victim select logic

The victim select logic determines when the Named-State register file becomes full, and also picks which line to spill to make room for new registers. To simplify the pipeline, the NSF should always maintain one free line in the register file. This ensures that register writes always succeed, and allows instructions to drain from the pipeline. If an instruction writes the last free line in the register file, succeeding instructions must stall to allow the NSF to spill a line. Full detection logic simply checks that all register lines have a valid bit set.

The NSF uses a Pseudo Least Recently Used (PLRU) strategy [37] to pick which line to spill from the file. It approximates a true LRU strategy using a single access bit per line. The NSF sets this bit on any read or write to a line. A rotating victim pointer points to the first line whose access bit is not set. When all access bits are set, they are all cleared, and the victim pointer is advanced by one line. This ensures that the NSF never flushes registers that have recently been accessed. This strategy takes advantage of the temporal locality of register accesses among several different contexts.
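
A software model of this PLRU scheme might look as follows (illustrative only; NUM_LINES and the helper names are assumptions):

    #include <stdbool.h>

    #define NUM_LINES 64

    static bool     access_bit[NUM_LINES];
    static unsigned victim_ptr;

    /* set on any read or write to a line */
    void touch(unsigned line) { access_bit[line] = true; }

    unsigned plru_pick_victim(void) {
        for (unsigned n = 0; n < NUM_LINES; n++) {
            unsigned line = (victim_ptr + n) % NUM_LINES;
            if (!access_bit[line]) {            /* first unaccessed line */
                victim_ptr = (line + 1) % NUM_LINES;
                return line;
            }
        }
        /* all access bits set: clear them all, advance by one line */
        for (unsigned i = 0; i < NUM_LINES; i++)
            access_bit[i] = false;
        victim_ptr = (victim_ptr + 1) % NUM_LINES;
        return victim_ptr;
    }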

The NSF could use other victim selection strategies, such as round-robin, which steps through each line in sequence, moving to a new line every time the register file fills up.


This is simple to implement, but completely ignores the order in which registers were accessed. A simple variant of the round-robin strategy never flushes a line that has just been written, to avoid thrashing in the register file. Finally, a random or pseudo-random strategy simply picks a victim at random from among the register lines.

3.2 Fixed Register File Decoders

The most significant difference between the Named-State register file and conventional register files is the NSF’s fully associative address decoder. Before describing how to build such a programmable decoder, this section reviews how to build address decoders for conventional register files.

Conventional register files use fixed address decoders. The register file is divided into N rows, each M registers wide. An address R bits wide (where R = log₂N) selects one of the N rows. If each row contains several registers, other address bits select one of the registers to read or write.

Figure 3-2 shows a single row of such a register file. Each of three decoders in each row checks if an operand address matches the index of this row. If so, the decoder drives the appropriate word line to read or write this row of the register array1.


FIGURE 3-2. A row of a conventional register decoder.

There are many ways to decode a register address. The one-stage decoder, shown in Figure 3-2, checks to see if the address bits encode its row number by performing an R bit wide AND. Buffers drive R bits of an operand address and R bits of its inverse down across each row decoder.

1. Logic external to the register file ensures that it will never simultaneously read and write a single register within the array. Instead, the logic enables bypass paths to directly carry the output of one instruction to the input of the next.

An alternative is to use a tree of decoders for each row, in which each AND gate is only B bits wide, rather than R bits. Each stage of the tree is faster than a single stage decoder, since gate switching speed is a function of fan-in. This structure also allows successive levels of the decoder to buffer up the signal in order to drive a large capacitance word line.

A final alternative is to pre-decode the register address bits, and use a two-level decoder structure. If the R bit operand address is composed of M fields of B bits each, the first decoder stage converts each field of B bits into 2^B decoded signals, and drives these signals down across each row. Each row decoder AND gate is then only M bits wide.


FIGURE 3-3. A two-stage, fixed address decoder. Each of M pre-decoded fields is 2^B bits wide, in which only one line is active.

Since the first stage of decoding is combined with address drivers, this structure may be as fast as a one-level decoder, with much lower fan-in per gate. This design can also be built with the same number of buffered address lines as in the one-level decoder. If B = 2, the first stage converts each field of two address bits into four decoded signals, for a total of 2R signals. Because of its speed and compactness, a pre-decoded structure is often used to address large register files.
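
A bit-level software model of this two-stage decode, with B = 2 and M = 3 (so R = 6), is sketched below; predecode_field and row_select are illustrative names, not part of any real decoder:

    #include <stdbool.h>

    #define B 2
    #define M 3                       /* R = M*B = 6 address bits */

    /* first stage: a B bit field value becomes a one-hot group of 2^B lines */
    unsigned predecode_field(unsigned field) {
        return 1u << field;
    }

    /* row decoder: an M input AND, selecting one pre-decoded line per group */
    bool row_select(unsigned addr, unsigned row_index) {
        for (unsigned m = 0; m < M; m++) {
            unsigned a = (addr      >> (m * B)) & ((1u << B) - 1);
            unsigned r = (row_index >> (m * B)) & ((1u << B) - 1);
            if ((predecode_field(a) & (1u << r)) == 0)
                return false;         /* this field does not match */
        }
        return true;                  /* all M fields matched this row */
    }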

3.2.1 Decoder circuit design

Although the decoders shown above perform a NAND across several address bits, that NAND need not be a simple combinational gate. A combinational M input NAND gate requires 2M transistors in CMOS technology, and does not scale well with the number of inputs.


Decoder gates can also be built as precharged NAND structures. These designs pull up on an output line during one clock phase, and then pull down on that output through a chain of pass transistors during the next phase. While precharging requires more complicated clocking strategies, and is only suitable for synchronous circuitry, it only requires 2+M transistors. A precharged design can be faster than a fully combinational design for small M.

In an alternative precharged NOR gate, each input can discharge the output through a single transistor. While this design adds more capacitive load to the output signal, it scales better than NAND structures for large M.

3.3 Programmable Decoders

The Named-State register file uses a programmable address decoder to build a fully associative structure. While each row of a fixed decoder is wired to recognize its index on the address lines, the NSF decoder is first programmed with an address, and then matches against that address on subsequent accesses. Figure 3-4 describes the behavior of such a programmable decoder. Each row of the decoder contains a latch that holds the address of this register line. An address is written into the latch when this row is allocated to a register variable. Each of three operand decoders performs a multi-bit comparison of the latched address with an operand address.

FIGURE 3-4. A row of a programmable address decoder.


Since a register address in the NSF consists of a Context ID and an Offset, each row decoder must latch that full address. However, as shown in Figure 3-4, the two read operands share a common CID, while the write operand uses a different CID. This allows the NSF to copy registers from one context to another, and allows successive instructions to come from different contexts. But since no instruction needs to read from different contexts at the same time, the NSF can economize on address lines.

In the structure shown here, each row matches against CID and offset separately. This allows an instruction to deallocate all rows that belong to a particular context. It also has the advantage that each XNOR gate has fewer inputs. A two-level structure can be as fast as a single level for this number of address bits.
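
In software terms, one decoder row behaves like the following sketch (illustrative C; decoder_row_t and the helper names are assumptions, not thesis names):

    #include <stdbool.h>

    typedef struct {
        unsigned cid;        /* latched Context ID      */
        unsigned offset;     /* latched register offset */
        bool     allocated;
    } decoder_row_t;

    /* on allocation, load the CAM latches with the new address */
    void row_alloc(decoder_row_t *row, unsigned cid, unsigned offset) {
        row->cid       = cid;
        row->offset    = offset;
        row->allocated = true;
    }

    /* on each access, compare CID and offset separately, then combine;
     * a CID-only match is what lets dealloc_CID clear a whole context */
    bool row_match(const decoder_row_t *row, unsigned cid, unsigned offset) {
        return row->allocated && row->cid == cid && row->offset == offset;
    }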

3.3.1 A programmable decoder cell

While Figure 3-4 describes a row of a decoder as separate latch and match logic, it is smaller and faster to build as an array of Content Addressable Memory cells [33]. A CAM cell is a latch that matches against its address lines. Each row of the NSF decoder consists of C CAM cells to match against Context ID bits, and O cells to match Offset bits.

Figure 3-5 is a schematic diagram of an address match unit for Offset bits, showing one bit of a simple CAM cell. This cell contains a one bit address latch, as well as three sets of operand address match logic.

FIGURE 3-5. A single bit of a programmable address decoder.

Initially this row of the register file is empty. When the NSF allocates this line, it asserts alloc for this row. This loads a new address into each latch bit CDLatch in the row. On subsequent accesses, if this bit of any of the operands matches the value in CDLatch, the match logic enables a bit of the precharged NAND gate. If all the bits match, the decoder asserts a word line into the register array to read or write that register.

The bit cell shown in Figure 3-5 matches one of three operand offsets against the address offset of this row. A reduced form of this bit cell compares two Context IDs with the CID of this row. A final set of NOR gates combines the two match results and drives the word lines. Each NAND chain is short enough that the decoder is not significantly slower than conventional fixed decoders.

3.3.2 Alternative decoder designs

Figure 3-5 describes a very simple decoder circuit design. Other designs may be faster or more efficient. The thesis does not investigate those circuit designs, since the goal is to evaluate the effect of different architectural decisions on register file performance. This section discusses some alternatives.

The precharged NAND circuit is as fast as a NOR gate for a small number of address bits. Since CID and offset should each be 5-6 bits wide, there is no advantage to using a NOR structure, which requires additional signal lines to clock properly. Alternatively, a programmable decoder could use pre-decoded address lines, using a NOR across 2^B decoded bits and then a NAND of M results. However, this would require M×2^B bits of address latch, rather than M×B bits for a single-level decode.

This design uses a single transistor clocked with cclk to discharge the entire NAND chain of match transistors. An alternative for both NAND and NOR designs would be to gate operand addresses with a clock. If both op1adr and op1adr# are held low until the discharge phase, then the final pull-down transistor to ground on op1match# is not needed. This might speed up decode, and eliminate some signal lines, but would significantly complicate clocking strategy for the circuit.

Another technique for speeding up the decoders is to attach a sense amplifier to each match line. The sense amplifier could detect when the line begins to discharge, and drive the word line much faster. Otherwise, a simple NOR gate must wait for the match lines to completely discharge through a long chain of pass transistors.

By way of example, a recent design exploits some of these techniques to build a 64 entry translation look-aside buffer [33]. The fully associative address decoder matched across 22 address bits in a precharged NOR structure with sense amps. The TLB had an access time of 4.35 ns in a foundry ASIC process.


3.4 Performance Comparison

The Spice circuit simulator [58] was used to evaluate the performance of the Named-State register file as compared to conventional register files. Several different designs were tested for each file. This section describes the results of those simulations.

The goal of this study was not to build the very fastest register file decoders. Instead, the simulations compared two decoders built in a conservative process, with very simple logic. No complex clocking schemes or fancy circuit techniques were used in this comparison. Rather, both decoders were designed using the same level of effort, in hopes that the relative performance would scale to more aggressive designs.

Each simulation traced the time required to decode a register address and drive a word line into the register array, and to read out the contents of a register. The simulations did not count the time required to drive signals to and from the processor pipeline, or additional logic on the data path. The circuits were synchronous with a global clock cclk, which allowed them to use precharged decoding logic. The simulations assumed that register addresses were stable and could be driven across the decoders before the start of the decode phase. All decoder logic was allowed to settle before gating with cclk.

These simulations used a relatively conservative, 1.2 µm CMOS process. It is a two-level metal process, with an average capacitance of 0.3 fF/µm for minimum width first level metal. The simulations used the size of an SRAM cell in this process to estimate the height and width of different register file organizations. The simulations estimated the total capacitance on signal lines by adding the gate or drain capacitance of transistor loads on the lines to the wire capacitance.

The decoder for the “segmented” register file used pre-decoded address lines and a precharged NAND pull-down decoder for each operand. Two organizations were tested: a decoder for a 128 row by 32 bit wide register file, as well as a 64 row by 64 bit word design. The latter design required each decoder to match across fewer address lines, but added additional load to the row word line drivers.

The same two register file organizations were simulated with programmable, “Named-State” address decoders. These designs used pre-charged, NAND pull-down circuits to decode the bits. The associative decoder did not pre-decode any address bits, since each row decoder must match across all bits of the address anyway. Pre-decoding does not save area for programmable decoders. For the number of address bits required, NAND pull-down structures were somewhat more compact than pre-charged NOR circuits, and were just as fast.

Figure 3-6 shows the results of these Spice simulations. The NSF required 12% to 23% longer to decode addresses than a pre-decoded, segmented register file, since it must compare more address bits. It also took 15% more time to combine Context ID and Offset address match signals and drive a word line into the register array.

FIGURE 3-6. Access times of segmented and Named-State register files. Files are organized as 128 lines of 32 bits each, and 64 lines of 64 bits each. Each file was simulated by Spice in a 1.2µm CMOS process.

For both register file sizes, the time required to access the Named-State register file was only 5% or 6% greater than for a conventional register file. This may affect the processor’s cycle time, if the register file is in the critical path. Currently, many processors are instead limited by cache access time [34].

3.5 Area comparison

This section compares the VLSI chip area required to build a Named-State register file with that required for a conventional register file. The area estimates were derived using cell layouts from a prototype NSF implementation described in Section 3.7. That chip was built using an inefficient 2µm N-well CMOS process, with two levels of metal interconnect.

Figure 3-7 illustrates the relative area of the different register files. In 2µm technology, a 128 row by 32 bit wide Named-State register file with one write and two read ports is 78% larger than the equivalent segmented register file. An NSF that holds 64 rows of two registers each requires 46% more area than a segmented register file.

FIGURE 3-7. Relative area of segmented and Named-State register files in 2um CMOS. Area is shown for register file decoder, word line and valid bit logic, and data array. File sizes are expressed as bits wide x registers high. All register files have one write and two read ports.

Figure 3-8 shows the same area comparison, if the register files were built using a more aggressive 1.2µm CMOS process. Because the NSF programmable decoder can be laid out more compactly under the address lines, and because metal pitch shrunk more than minimum feature size, the decoder consumes a smaller fraction of the register file area in this technology. However, the random logic for valid bits and victim selection is a larger fraction of chip area. In this technology, a 128 row by 32 bit wide Named-State register file is 54% larger than the equivalent segmented register file. An NSF that holds 64 rows of two registers each requires 30% more area than the equivalent segmented register file.

3.6 Design Alternatives

Line Width

Figure 3-8 shows the effect of line width on register file area. An NSF file with two registers per line is 20% smaller than a file with single register lines. This comes from amortizing decoders and valid bit logic across more bits of registers. A file with wide lines also matches fewer address bits in the programmable decoder. A two register wide file is only slightly slower than a file with one register per line.


FIGURE 3-8. Relative area of segmented and Named-State register files in 1.2um CMOS. Area is shown for register file decoder, word line and valid bit logic, and data array. All register files have one write and two read ports.

Register spilling

Another alternative concerns how to flush registers. When the NSF spills a register, it must spill both data and address. One way of reading out the address bits is through the write operand bit lines wadr and wadr#. Another is to widen the register array, and store address bits alongside register data. A final alternative is to provide additional ports out of the register file, so that a register can be spilled while others are being read and written. All the simulations and area estimates in this thesis use the first alternative.

Additional ports

Many recent processors require more than two read ports and one write port into the register file. Extra ports may be used to service additional functional units, to support speculative execution, or to allow spilling and reloading of registers in the background. Figure 3-9 estimates the relative area of segmented register files and the NSF, each with two write ports and four read ports. Note that the NSF files in this comparison can read from one context and write to two other contexts in a single cycle (the CID decoders are three-ported).

As ports are added to the register file, the area of an NSF decreases relative to segmented register files. In this comparison, a 128 row by 32 bit wide Named-State register file is only 28% larger than the equivalent segmented register file.



FIGURE 3-9. Area of 6 ported segmented and Named-State register files in 1.2um CMOS. Area is shown for register file decoder, word line and valid bit logic, and data array. These register files have two write and four read ports.

A 64 by 64 bit wide NSF is only 16% larger than the equivalent segmented register file. In these estimates, the area of a multiported register cell increases as the square of the number of ports. Decoder width increases in proportion to the number of ports, while the miss and spill logic remains constant.

3.7 A Prototype Chip

A prototype chip was built to evaluate the Named-State register file. The goals in building this chip were:
• To design and evaluate the NSF logic in detail.
• To show that the NSF could be built in conventional VLSI technology.
• To investigate the performance of the NSF, and to verify circuit simulations.
• To validate area estimates of different NSF implementations.

The chip logic was designed by Peter Nuth and David Harris. The layout was done by David Harris over the summer of 1992. The chip was fabricated by the MOSIS fast prototyping service. Figure 3-10 shows a photograph of the completed chip.


FIGURE 3-10. A prototype Named-State Register File. This prototype chip includes a 32 bit by 32 line register array, a 10 bit wide fully-associative decoder, and logic to handle misses, spills and reloads. The register file has two read ports and a single write port.


3.7.1 Design Decisions

• The chip is fabricated in the MOSIS double metal, 2µm N-well CMOS process. While this is not a dense process, it was the only one that our CAD tools could support at the time.
• The chip holds a 32 line by 32 bit register array, with one register per line. This was the smallest realistic register file that we could build. Using single register lines simplified the decoder and valid bit logic.
• The prototype chip used 10 bit wide register addresses, 5 bits of CID and 5 bits of Offset. This is clearly too large for such a small register array, but it allowed us to evaluate the performance of larger register files. A 10 bit address would be appropriate for a 128 register NSF.
• The address decoder was built as described by Figure 3-5, using two 5 bit wide precharged NAND chains. The prototype does not use any unusual circuit techniques.
• The prototype could read two registers and write a third on every cycle. Unlike the NSF designs described here, the prototype chip could be addressed with three distinct CIDs simultaneously. Since most realistic processors would share a single CID between two read operands, the prototype address decoder is larger than necessary.
• Rather than a pseudo-LRU victim selection strategy, the prototype chip used a round-robin scheme to pick a register to spill. A simple rotating counter selected the next victim.
• The chip used a conservative, static logic design for the valid and miss logic.
• Finally, since the prototype used single register lines, every write operation attempted to allocate and write a new register in the array. A write never missed, since the chip logic always maintained at least one free line in the array. On every register write, the NSF wrote a data word into that free line. If the write address did not match any existing line in the file, the NSF set a valid bit on the free line, allocating that register. Otherwise, if the write address already existed in the file, the data word was written to that existing line. (A software sketch of this write policy follows the list.)
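For clarity, the following C fragment sketches the prototype's write-allocate policy in software. It assumes single-register lines and a file that always keeps one free line; the names are illustrative and are not taken from the chip's actual logic.

    /* Sketch of the prototype's write policy: single-register lines,
       one line always free. Illustrative only. */
    #include <stdbool.h>

    #define LINES 32

    typedef struct {
        unsigned addr;    /* 10-bit {CID, offset} tag */
        int      data;
        bool     valid;
    } PLine;

    static PLine file[LINES];
    static int   free_line;   /* the chip logic always keeps one line free */

    void proto_write(unsigned addr, int data)
    {
        int i;
        for (i = 0; i < LINES; i++)
            if (file[i].valid && file[i].addr == addr) {
                file[i].data = data;      /* address matched: update in place */
                return;
            }
        file[free_line].addr  = addr;     /* no match: claim the free line */
        file[free_line].data  = data;
        file[free_line].valid = true;     /* allocate the register */
        /* the real chip would now spill a victim to create the next
           free line; that step is omitted here */
    }

The hardware performs the match and the write to the free line in parallel within one cycle; this sequential sketch only illustrates the policy.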

Appendix A describes the structure and operation of the prototype chip in more detail. That appendix also describes the purpose and logic design of each of the basic cells in the prototype.


CHAPTER 4 Experimental Method

4.1 Overview

This chapter outlines the software simulation strategy used to evaluate the Named-State Register File. The results of those simulations are detailed in Chapter 5.

Figure 4-1 describes the simulation environment. At its core is a flexible register file simulator called CSIM. The simulator is driven by traces of parallel and sequential benchmarks. The filters S2NSP and TLTRANS translate the assembly code of those benchmarks to NSF instructions. The filters also annotate the assembly code so that when it is run on a conventional architecture, it generates a full trace of the program’s execution. CSIM reads both translated NSF code and program traces, and simulates accesses to a register file of arbitrary size and organization.

4.2 Register File Simulator

Figure 4-2 outlines the modules of CSIM, the register file simulator. The core of the simulator is a flexible register module, and a pipeline that issues reads and writes to the registers. Other modules fetch instructions, interleave streams, and handle code and trace files.

4.2.1 Simulator Instructions

In order to simulate application programs, we generate traces of basic blocks, not full instruction traces. A basic block is a contiguous sequence of instructions between branch points. The code translators S2NSP and TLTRANS must be able to parse native assembly code and recognize branch instructions and branch targets. In return for this extra complexity, block traces are much smaller and faster to generate than instruction traces. Typical basic block traces for the applications used in this study are 80 MB to 200 MB long. A full instruction trace would be approximately 20 times longer. More importantly, this strategy allows CSIM to read in the entire program as NSF code before beginning the simulation. By paging in the block trace during execution, CSIM requires far less disk bandwidth than an instruction trace approach. As a final optimization, the block tracing routines compress traces on the fly to further reduce their size by a factor of eight.

CSIM executes a very simple instruction set known as Icode. Icode abstracts away all details of the instructions executed. In fact, Icode does not specify ALU operations at all, merely distinguishing register to register instructions from loads, stores, and branch instructions. Table 4-1 describes the Icode instruction types.



FIGURE 4-1. Simulation environment.

The only unusual Icode instructions are the context control instructions NEWCID, POPCID, and SWITCHCID. An ordinary subroutine call does not automatically create a new context for the child procedure’s register frame. Rather, it is the called procedure’s responsibility to allocate and deallocate a context as needed. NEWCID creates a new context identifier, but does not explicitly allocate any registers for the new context. Subsequent instructions from this thread will use this new context identifier to allocate, read, and write registers. The POPCID instruction, on the other hand, explicitly flushes any of the current context’s registers that remain in the register file. The SWITCHCID instruction switches to a new context as determined by the thread control routines (see Section 4.2.2).



FIGURE 4-2. The CSIM register file simulator.

Instruction   Description
NOP           Do nothing
REG           Any register to register instruction
LOAD          Load word from memory into register file
STORE         Store register into memory
BRANCH        Jump to a new basic block
CALL          Subroutine call to a new basic block
RETURN        Return from subroutine
NEWCID        Create new context
POPCID        Deallocate this context from register file
SWITCHCID     Switch to an existing context

TABLE 4-1. Icode instruction types.
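To make the instruction set concrete, the following C fragment sketches one plausible representation of an Icode instruction. The type and field names are illustrative; they are not taken from the CSIM source.

    /* Illustrative representation of an Icode instruction. */
    typedef enum {
        I_NOP, I_REG, I_LOAD, I_STORE, I_BRANCH, I_CALL, I_RETURN,
        I_NEWCID, I_POPCID, I_SWITCHCID
    } IcodeOp;

    typedef struct {
        IcodeOp op;
        int     src1, src2;  /* source register offsets, -1 if unused      */
        int     dst;         /* destination register offset, -1 if unused  */
        int     src_parent;  /* nonzero: sources address the caller's frame */
        int     dst_parent;  /* nonzero: destination is in the caller's frame */
    } IcodeInstr;

The parent flags reflect the fact, noted below, that each operand may address the current context or its parent.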

Each Icode instruction may specify several source register operands and a destination register. CSIM permits the register accesses described in Table 4-2.

Register access   Operation
READ              Normal register read. Miss if register is not resident.
READ_DEALLOC      Read register, then deallocate.
WRITE             Write register. Miss if register is not resident.
WRITE_ALLOC       Allocate register line, then write.

TABLE 4-2. Register access operations.


A normal READ access simply reads the value of a register. If the register is not resident in the register file, the pipeline stalls and must fetch the register from memory. READ_DEALLOC explicitly deallocates the register when it is no longer needed. If all the registers in a line have been deallocated, the entire line is cleared from the register file.

WRITE_ALLOC allocates a new register line and writes a register within that line. A normal WRITE operation assumes that the register addressed is resident in the register file. If it is not, the simulator must fetch that register line from memory. This distinction allows a compiler to label first writes to a register line, so that the instruction will never cause a write miss on an unallocated location. However, if the register file is organized with only a single register per line, there is never any reason to miss on writes. The simulations take this into account, executing all WRITE operations as WRITE_ALLOC for single word register lines.
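The following C fragment sketches the semantics of these four access operations over a single register line, assuming a resident bit and a live-register count per line. It is a simplified illustration, not CSIM's actual implementation.

    /* Sketch of the four register access operations. Illustrative. */
    #include <stdbool.h>

    typedef struct {
        bool resident;   /* line is present in the register file  */
        int  live;       /* live registers remaining in the line  */
        int  data[4];    /* register cells in this line           */
    } RegLine;

    typedef enum { READ, READ_DEALLOC, WRITE, WRITE_ALLOC } Access;

    /* Returns true on a hit. A false return means the pipeline stalls,
       fetches or allocates the line, and re-issues the access. */
    bool access_reg(RegLine *l, Access op, int cell, int *val)
    {
        switch (op) {
        case READ:
            if (!l->resident) return false;   /* read miss */
            *val = l->data[cell];
            return true;
        case READ_DEALLOC:
            if (!l->resident) return false;
            *val = l->data[cell];
            if (--l->live == 0)
                l->resident = false;          /* last register deallocated: */
            return true;                      /* clear the whole line       */
        case WRITE:
            if (!l->resident) return false;   /* write miss: reload line */
            l->data[cell] = *val;
            return true;
        case WRITE_ALLOC:
            if (!l->resident) {               /* first write never misses: */
                l->resident = true;           /* allocate a fresh line     */
                l->live = 0;
            }
            l->data[cell] = *val;
            l->live++;    /* simplification: each allocating write adds one */
            return true;
        }
        return false;
    }

As the text notes, with single-register lines every WRITE behaves as WRITE_ALLOC, so writes never miss.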

As mentioned in Section 2.1.3, each read or write may access a register in the current context, or in its parent context. Thus a single instruction may copy a value from a caller’s register frame to the callee’s frame. Each access can also set or clear a register full bit for synchronization on a data word.

4.2.2 Thread control

The CSIM simulator can run three types of programs:
• Sequential code.
• Coarse grain parallel code, formed by interleaving several sequential programs.
• Pure parallel programs.

CSIM can issue register accesses from a single instruction stream or from several concurrent streams. To emulate multithreaded code, CSIM can interleave instructions from a number of distinct sequential programs. By switching threads in response to run-time events, CSIM can simulate the behavior of parallel programs. Alternatively, an external parallel processor simulator may generate traces of a true parallel program running on a single processing node. These traces specify both basic block and context, and so can represent any number of thread scheduling policies. If such a parallel simulation environment is available, it relieves CSIM of the burden of specifying a realistic scheduler.

The benchmarks described in the chapters that follow are all either purely sequential programs traced on a conventional processor, or parallel programs traced by a parallel


processor simulator. When running a purely sequential program, CSIM allocates a new context to each new procedure invocation. This allows register contexts to be used as stack frames, in a manner similar to register windows [66]. Instructions may copy data from a parent’s context to its child for argument passing, or from child to parent to return the result of a procedure call.

When interleaving multiple sequential threads, CSIM creates a new context for each procedure invocation by each of the threads. Instructions may pass data between parent and child procedures, or between concurrently running threads1. CSIM may switch between threads in response to a number of run-time events, such as a procedure call or return, or a register operand miss. CSIM can simulate an external data cache in order to switch threads on a cache miss, or may simply force a cache miss on some fraction of cache accesses. CSIM tracks full/empty bits on register locations in order to set and clear synchronization points on those registers. However, no provision is made to automatically restart a thread when a synchronization variable has been resolved. Finally, CSIM may emulate a HEP style [76] multithreaded pipeline by running the threads in a round-robin manner, issuing a single instruction from each thread before switching to the next.

In order to run a pure parallel program on CSIM, the program must have generated a trace file that indicates both which basic blocks to run and when to switch contexts. Otherwise, CSIM would have to fully emulate the thread scheduling policy, as well as the semantics of that parallel program language. The parallel program traces used in this study consist of basic block and context identifier pairs. The TAM parallel programming language [21] used in this study ensures that synchronization points always occur at the end of basic blocks, so that each block runs to completion, and context switches occur only at basic block boundaries.

4.2.3 Context ID management

In order to support the three different thread execution models described in Figure 4-1, CSIM tags each instruction executed with a unique context number. The context number is a three-word tuple: {Task, Iteration, Call_Depth}. The task field distinguishes statically defined task traces. This allows CSIM to run multiple copies of the same trace, each rooted at a different context number. A task may fork any number of child threads at run-time. Each child is assigned a new iteration number within the current task. Finally, a sequential or parallel thread may call and return from subroutines. Specific NSF instructions in the code trace increment the context number’s call_depth field after a procedure call, to allocate a new context for each call frame.

1. Note that this requires a thread to know the context number of a particular invocation of another thread. This is not well supported by existing sequential languages.


A given instruction stream, labelled with a unique {Task, Iteration} pair, has a single call stack. As the stream calls and returns from procedures, it increments and decrements the call_depth. Since each stream has only one call stack, the tuple always uniquely identifies an existing context.

While a context number can address an unlimited number of contexts, the number of contexts that may be resident in the register file is quite small. So the NSF uses a short context identifier, or CID, to address contexts in the register file. The field used to represent the CID need only be large enough to refer to all the contexts in use by concurrently running threads. Depending on the size of the register file, a typical CID field might be 5 or 6 bits wide, large enough to refer to 32 or 64 concurrent contexts.

The context map module of CSIM translates context numbers associated with threads to CIDs for addressing the register file. It also translates a CID to the address of a context in memory when flushing a register out of the file. This is the role of a Ctable, as described in Section 2.3.2. Every time the simulator switches to a new thread, it checks that the CID assigned to that thread is still valid. Since there are a very large number of potential context numbers competing for relatively few CIDs, the context map may have to reassign CIDs among the currently running threads. In practice this happens infrequently, due to the locality of threads scheduled for execution.
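The following C fragment sketches such a context map: a small Ctable that binds full context numbers to CIDs and reassigns the least recently used CID when none is free. All names are illustrative, and the flush of the victim context's registers is elided.

    /* Sketch of a Ctable mapping context numbers to CIDs. Illustrative. */
    #include <string.h>

    typedef struct {
        unsigned task, iteration, call_depth;  /* three-word context number */
    } ContextNum;

    #define NCIDS 32                           /* a 5-bit CID field */

    typedef struct {
        ContextNum owner;
        int        valid;
        unsigned   last_used;
    } CtableEntry;

    static CtableEntry ctable[NCIDS];
    static unsigned    clock_tick;

    int map_context(ContextNum ctx)
    {
        int i, victim = 0;
        for (i = 0; i < NCIDS; i++)            /* already mapped? */
            if (ctable[i].valid &&
                memcmp(&ctable[i].owner, &ctx, sizeof ctx) == 0) {
                ctable[i].last_used = ++clock_tick;
                return i;
            }
        for (i = 1; i < NCIDS; i++)            /* else pick an invalid or */
            if (!ctable[i].valid ||            /* least recently used CID */
                ctable[i].last_used < ctable[victim].last_used)
                victim = i;
        /* a real implementation would flush the victim context's
           registers to memory here before reassigning the CID */
        ctable[victim].owner     = ctx;
        ctable[victim].valid     = 1;
        ctable[victim].last_used = ++clock_tick;
        return victim;
    }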

4.2.4 Opfetch and Opstore

Ordinarily, CSIM will fetch and execute successive instructions from a single thread, until hitting an exception as described above. Every NSF instruction may read two source registers and write a third register. CSIM decodes each instruction’s source registers, builds a register address, and fetches each in turn from the register file. If any source register is not available in the register file, the instruction has missed and must be re-issued when the operands become available1. At this point, the CSIM pipeline may elect to switch contexts and fetch instructions from a new stream, or it may simply stall the current instruction while fetching the missing register line. All the simulations described in Chapter 5 simply stall the pipeline long enough to fetch a line from the data cache.
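A sketch of this operand fetch sequence appears below; reg_read is a hypothetical helper that returns 1 on a hit. Probing the sources in order also yields the re-issue behavior described in the footnote: the same instruction may miss more than once.

    /* Sketch: probe each source register in turn, stalling on a miss. */
    extern int reg_read(int cid, int offset, int *val);  /* hypothetical: 1 = hit */

    int opfetch(int cid, int src1, int src2, int *v1, int *v2)
    {
        if (src1 >= 0 && !reg_read(cid, src1, v1))
            return 0;   /* first source missed: stall, fetch line, re-issue */
        if (src2 >= 0 && !reg_read(cid, src2, v2))
            return 0;   /* second source may still miss after the re-issue */
        return 1;       /* both operands available; the instruction issues */
    }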

4.2.5 Register file

The central module of the CSIM simulator is the model of the register file itself. This module allocates and deallocates registers, tracks all accesses, and reports hits and misses to the opfetch and opstore units.

1. Note that currently, CSIM fetches the registers in sequence. The same instruction may miss and be re-issued several times if neither source register is available. Real hardware would fetch both missing registers before re-issuing the instruction.


The register file is organized in a hierarchical structure. At the lowest level is the register cell. For every cell in the register file, CSIM tracks whether the cell contains valid data, whether the data is more recent than data in memory, and whether the FULL bit is set for synchronization on that register value.

Register cells are organized into lines. A register line is the unit of granularity of the register file. Lines are flushed out and reloaded into the register file as a unit. CSIM keeps track of whether a line is resident in the register file. It also tracks the number of live registers in each register line. When a line is flushed out of the register file, CSIM does not modify any register cell contents except the dirty bits. In this way, when the line is reloaded into the register file, valid registers are restored correctly.

A register frame contains all the register lines associated with a particular context. Unlike cells and lines, register frames are not physical structures. They exist only to make the simulation more efficient. At any time, any or none of the lines belonging to a particular context or frame may be resident in the register file. The NEWCID instruction creates a frame for a new context, but does not load any of its lines into the register file. The POPCID instruction deallocates all the frame’s resident lines, and destroys the frame.
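The cell, line, and frame hierarchy might be modeled with structures like the following; the field names are illustrative rather than CSIM's own.

    /* Sketch of the three-level hierarchy CSIM models. Illustrative. */
    #include <stdbool.h>

    typedef struct {             /* register cell */
        int  value;
        bool valid;              /* holds live data                 */
        bool dirty;              /* newer than the copy in memory   */
        bool full;               /* full/empty synchronization bit  */
    } RegCell;

    typedef struct {             /* register line: the unit of flush/reload */
        RegCell cells[4];        /* line width is a simulation parameter    */
        bool    resident;
        int     live;            /* live registers in this line             */
    } Line;

    typedef struct {             /* register frame: bookkeeping, not hardware */
        Line *lines;             /* all the lines of one context              */
        int   nlines;
    } RegFrame;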

The top level register file structure in CSIM handles register accesses, flushes and reloads register lines, and counts a large number of statistics during a simulation.

The register file can hold a fixed number of register lines. Each line could belong to any existing register frame. A number of different policies are possible for selecting which register line to flush when the register file is full. A specific module of the simulator handles this victim selection. Currently, CSIM uses a Least Recently Used strategy, in which the victim is the register line that has not been read or written for the longest time. The register file miss handler keeps pointers to all register lines, sorted by the order in which they were accessed. The line at the tail of the list is the victim to be flushed. CSIM could support other victim selection strategies by using the same structures as the current miss handler. A Round-Robin policy orders the lines by their position in the register file. A Random policy picks a victim at random from the resident register lines. All of the simulations in Chapter 5 use the LRU strategy.
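The following C fragment sketches this LRU mechanism: a doubly linked list of resident lines, reordered on every access, whose tail is the victim. It illustrates the policy, not the simulator's actual code.

    /* Sketch of LRU victim selection over resident register lines. */
    #include <stddef.h>

    typedef struct LruLine LruLine;
    struct LruLine {
        LruLine *newer, *older;   /* neighbors in the LRU list */
        /* ... register line data ... */
    };

    static LruLine *head, *tail;  /* head = most recent, tail = LRU victim */

    /* Move a line to the head of the list on every read or write. */
    void touch(LruLine *l)
    {
        if (l == head) return;
        if (l->newer) l->newer->older = l->older;   /* unlink */
        if (l->older) l->older->newer = l->newer;
        if (l == tail) tail = l->newer;
        l->older = head;                            /* relink at head */
        l->newer = NULL;
        if (head) head->newer = l;
        head = l;
        if (!tail) tail = l;
    }

    /* The victim to flush is whatever has drifted to the tail. */
    LruLine *select_victim(void) { return tail; }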

Before each CSIM simulation run, a user must specify the parameters of the register file to be simulated (a configuration sketch follows the list). These include:
• The number of registers in the register file. In this way, simulations compare equal sized register files that are organized differently.
• The number of registers in each register frame. Different programming languages and applications may access a different number of registers per context. All simulations set the size of the register frame to the maximum register offset addressed by any context.


• The number of register cells in each register line. This is the granularity of binding of variable names to registers. A register line may range in size from a single register to the size of a register frame. The former simulates a fully-associative Named-State register file, while the latter simulates a segmented register file.
• The maximum number of contexts that may be addressed concurrently. This is limited by the size of the Context ID field used to address the register file.
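Collected into a structure, these parameters might look like the sketch below. The field names and example values are illustrative; the example assumes a 5-bit CID and the NSF organization used for the parallel runs in Chapter 5.

    /* Sketch of the register file parameters given to CSIM. Illustrative. */
    typedef struct {
        int total_registers;   /* size of the register file                 */
        int frame_size;        /* registers per context (max offset used)   */
        int line_size;         /* registers per line: 1 = fully associative */
                               /* NSF, frame_size = segmented file          */
        int max_contexts;      /* limited by the CID field width            */
    } CsimConfig;

    /* Example: 128 one-register lines, 32-register contexts, 5-bit CID. */
    static const CsimConfig nsf_parallel = { 128, 32, 1, 32 };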

4.3 Sequential Simulations

This section describes the method used to translate sequential programs to run under CSIM, and some of the implications of that translation process. CSIM requires two input files: the translated Icode for an application program, and the trace of basic blocks executed by a run of that program. The code translator S2NSP performs both functions for a sequential program.

4.3.1 S2NSP

The S2NSP translator converts Sparc [79] assembly code to Icode. It also inserts instructions into the Sparc code that trace the basic blocks executed by the program. While S2NSP could translate any program compiled for the Sparc, we have only used C [50] language source programs for sequential benchmarks.

With some minor modifications, S2NSP could translate the assembly code of any conventional RISC processor to Icode. The reason for choosing Sparc code is that the Sparc architecture uses register windows [67]. Register windows, being fixed-size register frames, are similar to contexts in the Named-State register file. In addition, since Sparc programs pass procedure arguments in registers, they are a good test for passing values between contexts. Finally, Sparc code can be converted to acceptable quality Icode with a relatively simple translator.

To convert a Sparc program to Icode, S2NSP allocates a new context for each new register window required. In a manner similar to the Sparc, it allocates the context after entering the first code block of a subroutine. Only procedures within the user’s program allocate contexts, not operating system routines, which cannot be translated by S2NSP. However, an operating system routine may call a procedure in the user program. Allocating contexts at the time of the call would therefore lead to incorrect execution.

S2NSP translates Sparc instructions to the simple Icode instructions described in Table 4-1. It also translates references to Sparc registers to the corresponding registers in a CSIM context. The global, local and output registers of the Sparc register window become registers within the current context. A window’s input registers become accesses to output registers in the parent of the current context.
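The following C fragment sketches this register mapping. The Sparc register numbering (globals 0-7, outputs 8-15, locals 16-23, inputs 24-31) is the real architecture; the Icode offsets chosen here are illustrative, and the real translator additionally reserves some globals for tracing and the operating system, as described below.

    /* Sketch of S2NSP's Sparc register mapping. Offsets are illustrative. */
    typedef struct {
        int parent;    /* 1: register lives in the caller's context */
        int offset;    /* register offset within that context       */
    } IcodeReg;

    IcodeReg map_sparc_reg(int r)
    {
        IcodeReg m = { 0, r };
        if (r < 8) {
            m.offset = r;          /* globals: scratch in current context */
        } else if (r < 16) {
            m.offset = r;          /* outputs: current context            */
        } else if (r < 24) {
            m.offset = r;          /* locals: current context             */
        } else {
            m.parent = 1;          /* inputs: %iN is the caller's %oN,    */
            m.offset = r - 16;     /* so address the parent's outputs     */
        }
        return m;
    }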


While a Sparc instruction can address any one of 32 registers, the translated program will address at most 20 registers per context under CSIM. Only the 16 local and output registers belong to the current register window. Also, S2NSP reserves two Sparc global registers to help trace the program execution. Some other global registers are reserved for use by the operating system. The remainder can be used as scratch registers by any procedure, and are translated to registers in the current context.

4.3.2 Limitations

The Icode produced by S2NSP is not as efficient as that which a compiler built specifically for the Named-State register file would produce. The most serious omission is that code produced by S2NSP does not explicitly allocate and deallocate registers. A true Icode compiler would deallocate registers after their last use within a procedure. However, S2NSP would have to be far more complex in order to perform this register usage analysis. The translator would have to record register usage within basic blocks and within the entire procedure to see when it was safe to deallocate a register. In addition, S2NSP would have to generate prologue and epilogue code blocks to allocate or deallocate registers used in a code block within a loop. But even this analysis would not be enough to generate efficient Icode, since a compiler would usually assign registers differently to accommodate CSIM.

S2NSP does not reassign registers, but merely translates Sparc register references to the equivalent NSF registers. Since the NSF can support a large number of registers per context without the cost of saving and restoring large register frames, a true NSF compiler would probably use many more registers than the 20 used by the Sparc compiler. In fact, an NSF compiler might never reuse registers within a procedure, if it were always possible to deallocate registers after their last use.

Compiler construction and register assignment for the NSF is an interesting research topic in its own right. As always in computer architecture, there is a tension between hardware and software techniques to solve a particular problem. More sophisticated software could simplify the design of the NSF. Unfortunately, the scope of this thesis does not allow for such investigations.

Rather than building a more sophisticated compiler to generate Icode, the CSIM simulator was instrumented to see the effect of more efficient software. CSIM counts all occasions when a line is flushed out of the register file, and then never reloaded before the end of the context. If the line contains live data, it could have been deallocated after the last read of those registers, freeing up space in the register file. CSIM counts all such unnecessary flushes that any reasonable compiler would have prevented.

CSIM also counts all writes to new or previously allocated registers. Some writes miss on a register, and cause an empty line to be loaded into the register file. Had the compiler tagged this WRITE as a WRITE_ALLOC instead, it could have prevented this unnecessary register reload.


The statistics gathered by CSIM count obvious situations where deallocating on register reads and allocating on writes would improve performance. But since deallocating registers in one context could greatly reduce the number of loads and stores required to run another context, CSIM cannot determine the true performance of tagged reads and writes without actually simulating good code.

4.3.3 Tracing sequential programs

In order to trace an actual run of a program, S2NSP must insert tracing instructions into the Sparc assembly code stream. Then all the files are compiled and linked with additional tracing routines to generate an executable.

After identifying a basic block in the assembly code, S2NSP inserts a sequence of 14 instructions in-line at the beginning of the basic block. The instructions insert the unique index of this basic block into a buffer in memory. Just before every procedure return, or before the program exits, S2NSP inserts a call to a logging routine which writes out the block trace buffer. The logging routine prints the traces as ASCII digits in order to make debugging easier. However, it also pipes the output through a Lempel-Ziv [44] compression routine to reduce the size of the traces and the amount of disk I/O required.
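The scheme might look like the following C sketch. The buffer size, file name, and the use of popen() with gzip are assumptions made for illustration; the thesis pipes the traces through a Lempel-Ziv compressor, and the inserted Sparc code cannot even test the buffer index (see Section 4.3.4), which is why the buffer is flushed at procedure returns instead.

    /* Sketch of the in-line tracing scheme. Illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>

    #define TRACE_BUF 4096
    static int   trace_buf[TRACE_BUF];
    static int   trace_len;
    static FILE *trace_pipe;

    /* Inserted at the head of each basic block. Note: no overflow
       test, since the inserted code must not touch condition codes. */
    void log_block(int block_index)
    {
        trace_buf[trace_len++] = block_index;
    }

    /* Called just before every procedure return and at program exit:
       write the buffered indices as ASCII digits through a compressor. */
    void flush_trace(void)
    {
        int i;
        if (!trace_pipe) {
            trace_pipe = popen("gzip -c > trace.gz", "w");  /* assumed */
            if (!trace_pipe) exit(1);
        }
        for (i = 0; i < trace_len; i++)
            fprintf(trace_pipe, "%d\n", trace_buf[i]);
        trace_len = 0;
    }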

The tracing code inserted in-line increases the execution time of the program by 20%. However, printing out the traces, which calls a standard C library output routine to write out each trace index, increases the program’s execution time by a factor of 20. The final phase, piping the traces through a compression routine, adds only an additional 20% to the execution time.

4.3.4 Anomalies

The Sparc instruction set contains a number of anomalies that make it difficult to translate to Icode.
• The Sparc processor uses condition codes for all testing and branching. Since an arbitrary number of instructions may separate setting a condition code from the branch that depends on it, any instructions inserted into the code stream must not modify any condition codes. In particular, block tracing code cannot test an index into the trace buffer to determine if it should flush out the traces.
• Most branch instructions on the Sparc are delayed branches. S2NSP must re-order instructions around delayed branches when generating Icode. This is especially difficult in the case of “branch or squash” instructions, in which the instruction in the branch delay slot is not executed if the branch is not taken.
• The CSIM register file supports two register reads and a single write per cycle. Most Sparc instructions are simple, 3 operand instructions that execute in a single cycle. However, a store instruction may read three registers: the data to be stored, an address


in memory, and an offset from that address. In addition, the Sparc instruction set includes instructions to load and store double words. A double word load specifies a target register for the load. Both that register and the register that follows it in the register window are written in successive cycles. S2NSP must translate these instructions into multiple instruction Icode sequences.
• Sparc call and return instructions implicitly read and write the stack pointer register, one of the input registers in the register window. The Sparc instructions to push and pop register windows may read an arbitrary register in the parent window and write another register in the child window.
• Finally, the Sparc architecture uses a separate set of registers for floating point operations. CSIM does not model these registers at all. For this reason, none of the applications chosen for this study use much floating point arithmetic.

4.4 Parallel Simulations

Converting parallel programs to run under CSIM is similar to converting sequential programs. In both cases, the source program is translated to Icode, and a run of the application generates a basic block trace. However, there are not many true parallel programming environments that could be converted to run under CSIM. Some programming environments consist of a message-passing library called from within ordinary C programs [73]. Other approaches statically partition the application into a small number of very large threads that are each run on a separate processor [74]. Neither of these programming environments could generate code well-suited for execution on a multi-threaded processor. Evaluating the context switch performance of the NSF required applications with many short threads and frequent switching between threads. For our parallel experiments, we used Dataflow programs written in Id [60], as compiled by the Berkeley TAM compiler [72].

4.4.1 TAM code

A goal of the TAM (Threaded Abstract Machine) project is to evaluate the performance of Dataflow programs on a number of different machine architectures. The traditional approach to running Dataflow code has been to build processors customized for that language [7,65]. Many of these machines treat each Dataflow instruction as an independent task, and synchronize between instructions. The Berkeley TAM compiler instead produces code for general purpose parallel computers [64,81]. The TAM compiler groups instructions into short sequential code blocks. The TAM run-time scheduler attempts to group together a set of code blocks to run as one unit. This technique reduces the number of context switches required to execute the Dataflow program, and produces code that runs on conventional sequential processors [79].


In order to produce code for a number of different processors, the TAM compiler compiles Id programs to a common intermediate language known as TL0 [84]. TL0 represents the assembly language of a Threaded Abstract Machine, much like Pascal’s p-code. A number of translators have been written from TL0 to different instruction set architectures. A portable translator converts TL0 to C code [50] and then compiles the resulting program for a specific machine.

Every TL0 subroutine is partitioned into a number of short threads1. Each thread executes to completion without suspending or branching. It is similar to the basic blocks of sequential programs. A thread may fork another thread by pushing a pointer to that thread onto a local thread queue known as a continuation vector. Threads on the thread queue are executed in LIFO order, so if ThreadA forks ThreadB and then terminates, it ensures that the processor will simply branch to ThreadB.
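The following C fragment sketches a continuation vector and its LIFO discipline; the names are illustrative, not TAM's.

    /* Sketch of a continuation vector: a LIFO queue of thread pointers
       local to one activation. Forking pushes; the scheduler pops. */
    typedef void (*Thread)(void);

    #define CV_MAX 64
    static Thread cv[CV_MAX];
    static int    cv_top;

    void fork_thread(Thread t) { cv[cv_top++] = t; }

    void run_activation(void)
    {
        while (cv_top > 0)
            cv[--cv_top]();     /* LIFO: the most recent fork runs next */
    }

With this discipline, a thread that forks one successor and terminates causes the processor simply to branch to that successor.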

The only conditional statements in TL0 are conditional forks, which fork one of two threads onto the local queue. Threads may also be synchronizing, meaning that they are controlled by an entry count variable. Each fork of a synchronizing thread decrements and tests its entry count. The thread will not execute until its entry count reaches zero. This is useful to ensure that all variables used in the thread are available before running the thread. The compiler is responsible for allocating and initializing all entry count variables for an invocation of a subroutine.
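An entry count can be sketched the same way; fork_thread here is the hypothetical push routine from the sketch above, declared extern so the fragment stands alone.

    /* Sketch of a synchronizing thread's entry count: each fork
       decrements and tests the count, and the thread is enqueued
       only when the count reaches zero. Illustrative. */
    typedef void (*Thread)(void);
    extern void fork_thread(Thread t);   /* push onto the continuation vector */

    typedef struct {
        Thread body;
        int    entry_count;   /* initialized per activation by the compiler */
    } SyncThread;

    void fork_sync(SyncThread *t)
    {
        if (--t->entry_count == 0)
            fork_thread(t->body);        /* all inputs ready: schedule it */
    }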

Every TL0 subroutine also contains a number of inlets, or message handler routines. Inlets are similar to threads, in that they are short blocks of code that run to completion. However, since inlets respond to messages sent by other processors, they run asynchronously, and may interrupt other threads. An inlet can read and write variables within the current context, and may also fork threads. Inlets are used to receive arguments to a function, the results returned by child function calls, and responses to global memory requests.

Since TL0 is an explicitly parallel language, it assumes that any number of invocations of a subroutine may be running at one time. Each subroutine invocation is known as an activation. An activation frame is the context of a single activation. It contains all local variables for the activation, some local queues for scheduling threads within this activation, and a pointer to the code for this subroutine. Every thread forked by the activation is pushed onto its local thread queue. The TAM run-time system will run all threads on the local queue before deactivating the current activation frame and switching to a new activation. This ensures that the processor will run for as long as possible within a single context before switching to a new context. TAM will never schedule an activation for execution unless it contains threads that are ready to run.

1. Note that TAM “threads” are short code blocks, not the general parallel activations described in the rest of this thesis.


Since the threads of an activation are run sequentially, the only way of spawning parallelism in the TAM system is by sending a message. Sending a message to Inlet0 of a subroutine allocates and initializes a new activation frame for that subroutine. Inlet0 allocates all local variables, sets up all entry counts for synchronizing threads, and initializes the local task queue. Similarly, Thread0 of a subroutine terminates the current activation. It sends the result of the subroutine call in a message to its parent activation, and deallocates the current activation frame.

4.4.2 TLTRANS

The TLTRANS filter translates TL0 code to Icode for execution under CSIM. Every TL0 inlet or thread becomes an Icode basic block. Every TL0 instruction is translated into one or more Icode instructions. References to TL0 local variables and synchronization counters become references to CSIM registers.

Each TL0 synchronizing thread is translated into two Icode blocks. The first simply decrements the entry count for this thread and exits. The second decrements the entry count and then runs the thread to completion. Since CSIM does not know the initial value of each synchronization counter, it relies on the trace of the TAM program to indicate each decrement and test of the counter, and the final successful execution of the thread.

A TAM activation frame is equivalent to a CSIM context. Unlike conventional stack frames, CSIM contexts can be saved to memory and resumed in any order. TLTRANS allocates a register in the context for each TAM local variable or synchronization counter. The size of the context required for each subroutine depends on the number of local variables in the subroutine. In order to simulate the resulting Icode, we must set the context size in CSIM to the size of the largest subroutine in the application.

4.4.3 Tracing TAM programs

While TL0 code is inherently parallel, it can also run on a sequential processor. A compiler known as TLC translates the TL0 code of an application to C code. All thread scheduling and message sending is handled by calls to a TAM run-time library. The resulting compiled C program is a simulation of a TAM machine running the application.

TLC can also produce annotated C code, which traces every thread and inlet as the program runs. It will also trace every synchronization counter, and every failed attempt to run a synchronizing thread. We have written a program to translate this verbose tracing output to a block trace for CSIM. The resulting trace specifies every basic block that was executed, and the context in which it ran. Running this trace under CSIM describes one execution of a parallel program on a sequential processor. Unfortunately, since thread


scheduling decisions are compiled in with the application program, TLC does not allow a user to vary that execution at all.

4.4.4 Limitations

Register allocation

There is no register allocator yet available for the Threaded Abstract Machine. Register allocation in TAM is difficult, since threads may execute in any order within a quantum. While this allows threads to be grouped together and minimizes context switches, it makes variable lifetime analysis very difficult.

TAM sets no hard limit on the number of local variables per activation. As a result, there is a greater variance in the size of contexts in TL0 programs than in Sparc programs. Some TL0 subroutines will use only a few local variables, but TLTRANS will create a large, fixed size context for each routine. This means that many TAM programs will run very inefficiently with some register file configurations under CSIM. In particular, when CSIM is used to simulate a segmented register file, every context switch will require loading and storing very large contexts, even though the compiler may know that most of those registers were never used by a particular subroutine.

There are several ways of overcoming this problem with our translation system. The first would be to set a maximum context size in TLTRANS, and for the translator to only map some local variables to those registers, leaving the rest in the context’s memory segment. However, since TLTRANS cannot do register usage analysis on TL0 code, it will not be able to pick the best subset of local variables to assign to registers.

The approach used in our simulations is to set a limit on the maximum context size in TLTRANS, and to simply map all local variable references into this small number of registers. So even if the application uses hundreds of local variables per thread, the simulator only touches 32 or 64 registers per context. This mapping is completely simplistic, and does not take into account variable usage frequency, or locality of references.

Thread scheduling

The preceding section highlights another problem with this method of tracing TAM code. We trace an execution of a parallel program on a sequential processor. All messages that the processor receives were generated by its own threads. In the absence of these messages, threads would all run on this processor, in depth-first order, just like a sequential program. Since inlets are asynchronous, they will interrupt this execution order. While this is not the most interesting run of that program, it is not clear whether a true parallel execution would show different register miss and reload rates under CSIM.


When a TL0 procedure calls another subroutine, it sends a message to that routine’s Inlet0. The inlet handler creates the child’s context, and runs within that context. Subsequent message sends pass arguments to the child routine through other inlets. A sequential processor, after running each inlet, will return to the caller’s context, and continue to run threads in that context. The processor will not switch to run any threads in the child context until all ready threads in the parent have completed.

This thread schedule is exactly that of a sequential program in which the caller creates a stack frame for the child, and initializes local variables in that frame. The child does not actually run until the parent is forced to wait for the result of the procedure call. Of course, a TL0 procedure may spawn several child routines concurrently, and may pass arguments to each, before invoking them in turn.

Since TAM software will compile and run on parallel processors, we could try to generate traces on different parallel machines to see if the code performs any differently than on a sequential processor. We could trace the threads and inlets executed on one node of a multi-computer and run that trace through CSIM. Unfortunately, the TAM software library does not readily generate thread traces on parallel processors. Modifying the TAM software or instrumenting other parallel simulators to produce these traces is outside the scope of this thesis.


CHAPTER 5 Experimental Results

This chapter describes the performance of the Named-State Register File running a number of sequential and parallel applications. The first section describes the benchmark programs used in this study. Later sections review the advantages of the NSF relative to conventional register files, and present simulation results that support these claims. Succeeding sections analyze the behavior of several different register file organizations, to see which factors contribute most to performance. The chapter closes by computing the overall effect of Named-State on processor performance.

5.1 Benchmark Programs

The performance results in this chapter are simulations of three sequential and six parallel programs. As described in Chapter 4, the sequential applications were written in C [50], compiled to Sparc assembly code [79], and translated to assembly code for the CSIM simulator. The sequential benchmarks are:
• ZipFile  A public domain compression utility, based on the Lempel-Ziv [44] algorithm. Zip is compressing a single 8,017 byte file.
• RTLSim  A register transfer language simulator for the Message Driven Processor [22]. RTLSim is running a 198 cycle network diagnostic on a single MDP node.
• GateSim  A gate level simulation of the MDP. This program is a superset of RTLSim, using the RTL simulator to trigger a much more intensive simulation of the MDP gates. GateSim is running a 735 cycle suite of network diagnostics on the gate level model of the MDP router.

The parallel programs were written in Id [60], compiled and traced using the Berkeley TAM simulator [21], and translated to CSIM code. The parallel benchmarks are:
• AS  A simple array selection sort.
• DTW  Dynamic time warp algorithm.
• Gamteb  Monte Carlo photon transport in a cylinder of carbon.
• QS  Quick-sort using accumulation lists.
• Paraffins  Generates paraffin molecules from radicals.
• Wavefront  A multi-wave matrix manipulation algorithm.


As described in Section 4.3.2, the sequential programs were only able to use a maximum of 20 registers per procedure call. However, each parallel program defined many local variables per context, which were then mapped into 32 registers. Thus all the simulations in this chapter assume 20 registers per context for sequential code, and 32 registers for parallel code.

Table 5-1 lists the static and dynamic instruction counts of the different programs.

Benchmark   Type         Source      CSIM          Instructions   Avg instr per
                         code lines  instructions  executed       context switch
GateSim     Sequential   51,032      76,009        487,779,328    39
RTLSim      Sequential   30,748      46,000        54,055,907     63
ZipFile     Sequential   11,148      12,400        1,898,553      53
AS          Parallel     52          1,096         265,158        18,940
DTW         Parallel     104         2,213         2,927,701      421
Gamteb      Parallel     653         10,721        1,386,805      16
Paraffins   Parallel     175         5,016         464,770        76
Quicksort   Parallel     40          1,137         104,284        20
Wavefront   Parallel     109         1,425         2,202,186      8,280

TABLE 5-1. Characteristics of benchmark programs used in this chapter. Lines of C or Id source code in each program, CSIM instructions in the translated program, instructions executed by CSIM, and average instructions executed between context switches.

5.2 Named-State Advantages

The Named-State Register File:
• Uses registers more effectively than conventional register files, by only holding live, active data in the register file.
• Supports more concurrent tasks with less register spill and reload traffic than conventional files.
• Uses software and hardware management to run both sequential and parallel code efficiently.

5.3 Performance by Application

The results in this section compare the Named-State Register File to a segmented register file with the same number of registers. The segmented register file consists of 4 frames of 20 registers each for sequential programs, or 32 registers each for parallel simulations. On a register miss, the segmented register file spills and reloads an entire frame of registers, even if they do not all contain live data. The segmented file selects frames to spill using a


Least Recently Used strategy. The NSF is organized as in Chapter 3: a fully associative register file containing 80 or 128 lines of one register each, with an LRU strategy for selecting victim lines.

5.3.1 Concurrent contexts

The average number of contexts resident in a register file indicates the amount of concurrency that an application can maintain in a given register organization. In some cases, the register file limits the number of contexts resident, while in others, the application is unable to generate many concurrent contexts.

Figure 5-1 illustrates the average number of contexts resident in the NSF and in a segmented register file. It also shows the average number of concurrent contexts produced by each application.


FIGURE 5-1. Average contexts generated per application, and average contexts resident in NSF and segmented register files. Each register file contains 80 registers for sequential simulations, or 128 registers for parallel simulations.


Among the sequential applications, both GateSim and RTLSim generate many active contexts. These two programs have deep call chains, requiring a context for each procedure invocation. While the segmented register file can hold only four contexts, the NSF holds as many contexts as can share 80 lines. For both of these applications, the NSF holds three times as many contexts as the segmented file.


The ZipFile application, on the other hand, has a relatively shallow call chain, and does not require many concurrent active contexts. Here the NSF register file is able to hold most of the call chain resident at one time. Thus ZipFile will require very little register spilling and reloading on the NSF, while the segmented register file is only able to hold an average of half of the call chain.

Our simulations of parallel programs are hampered by the poor task scheduling of the uniprocessor TAM emulator (see Section 4.4.4). Since this emulator does not model communication delays or synchronization with other processors, it does not accurately trace the thread execution of fine-grained parallel programs. As a result, most of the Id applications studied here do not exhibit much parallelism. Table 5-1 shows the average number of instructions executed between context switches for each application. But there is a very large variance in each of these numbers. Unlike sequential applications, most of the Id applications intersperse a long series of instructions from one context with a very few instructions from another context.

The parallel programs all have fewer resident contexts than the sequential programs. Some of the programs, such as AS and Wavefront, show remarkably little parallelism. For these two programs, even a four frame segmented register file is able to hold the entire activation tree. DTW and QuickSort spawn somewhat more parallelism, but even these two produce fewer concurrent contexts than a typical sequential application. Only Gamteb and Paraffins generate very deep activation trees.

Even those Id programs which generate ample parallelism do not produce as many resident contexts as a typical sequential application. The reason is that Id programs all touch more registers than the average C program. Figure 5-2 shows the average number of registers accessed per context for each application. The figure shows that while an Id context is resident in a register file, it accesses three times as many registers as a typical C procedure. Since Id contexts consume more space than C contexts, the NSF cannot keep as many resident at one time.

The large number of registers accessed by Id contexts may be a property of Id programs, or it might be solely an artifact of the simulation environment. As noted in Section 4.4, the TLTRANS translator folds a very large number of Id local variables into a set of 32 CSIM registers. The translator does not use the number of accesses per variable, or the lifetime of those variables, in mapping them to registers. A true register allocator would do a much better job of mapping variables to registers, and should reduce the number of registers accessed per context. In the absence of a decent compiler, the results shown here only approximate the performance of a real register file.

5.3.2 Register file utilization

One way of measuring the fraction of a register file that is in use is to count registers that contain live data. A register is considered live if it has been written by an earlier instruction, and not yet deallocated. But a register that contains live data might be reloaded and spilled from the register file without being accessed by any instructions. A register might contain live data, but there is no reason to load it into the register file unless it will soon be read or written. A better measure of register file usage is the proportion of registers that are active: registers that have been read or written since they were loaded into the file.

[Figure: bar chart of active registers per context (0 to 25), for GateSim, RTLSim, ZipFile, AS, DTW, Gamteb, Paraffins, Qsort, and Wave.]

FIGURE 5-2. Average number of registers accessed by a resident context.

[Figure: bar chart of the percentage of registers active (0% to 100%) per application; legend: NSF Max, NSF Avg, Segment Avg.]

FIGURE 5-3. Percentage of NSF and segmented registers that contain active data. Shown are maximum and average registers accessed in the NSF, and average accessed in a segmented file. Each register file contains 80 registers for sequential simulations, or 128 registers for parallel simulations.


Figure 5-3 shows the average fraction of active registers in the NSF and segmented register files, as well as the maximum number of registers that are ever active. Each of the sequential programs generates enough concurrent activations to fill the NSF register file with valid data. Since sequential programs switch contexts frequently, and touch few registers per context, a segmented register file contains many empty registers. Sequential programs use almost 80% of the NSF registers, more than twice as many active registers as on the segmented register file.

The situation is different for the parallel applications. Each context may touch many different registers, without reusing any of them. Some of the registers are initialized when the context is first allocated, and then not touched again until the context is deallocated. The percentage of live registers in each context is therefore very high, while the percentage of active registers is much lower.

While some parallel programs do not generate much parallelism, those that do can fill the NSF with data. Such programs access approximately 80% of the NSF registers, better than most sequential programs. In contrast, parallel programs are only able to use 50% of the segmented register file.
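The live/active bookkeeping behind these measurements can be summarized in a short C sketch. The structure and function names below are hypothetical, not taken from the CSIM simulator; the sketch only illustrates how the two counts diverge when a context is reloaded but not touched.

    #include <stdbool.h>

    /* Hypothetical per-register bookkeeping for the live/active statistics.
     * A register is live once written and not yet deallocated; it is active
     * only if it has been read or written since it was loaded into the file. */
    struct reg_stat {
        bool live;    /* written earlier, not yet deallocated     */
        bool active;  /* touched since this copy entered the file */
    };

    /* A reload brings in data that may be live but is not yet active. */
    static void on_reload(struct reg_stat *r)  { r->active = false; }

    /* Any read or write by an instruction makes the register active;
     * a write also (re)creates live data. */
    static void on_access(struct reg_stat *r, bool is_write)
    {
        if (is_write)
            r->live = true;
        r->active = true;
    }

    /* Deallocation ends both properties. */
    static void on_dealloc(struct reg_stat *r) { r->live = r->active = false; }

Under this accounting, a parallel context that initializes its registers once and does not touch them again keeps them live for its whole lifetime while they are rarely active, which is exactly the gap Figure 5-3 shows.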

5.3.3 Register hit rates

Figure 5-4 shows the number of register accesses that missed in the NSF and segmented register files, as a percentage of instructions executed. This is greater than the percentage of instructions that were retried, since some instructions miss on several operands.

[Figure: bar chart of register miss rate, misses as a percentage of instructions (0.0 to 2.0), per application; legend: NSF, Segment.]

FIGURE 5-4. Register misses divided by total instructions executed. Each register file contains 80 registers for sequential simulations, or 128 registers for parallel simulations.


For all of the sequential applications, the miss rate in the NSF is almost non-existent. While some sequential applications have deep call chains, the LRU replacement strategy and the many lines in the NSF allow the file to capture most of the procedure chain. Miss rates in the segmented register file, on the other hand, are two orders of magnitude greater than in the NSF. Building larger segmented register files, which hold more frames, will reduce the miss rate somewhat, but as shown in Section 5.4.1, a segmented register file must be very large to reduce misses to the level of an NSF.

There is little difference in the miss rates of the NSF and segmented files for AS and Wavefront, the two parallel applications that produce so little parallelism. On the other hand, DTW, Gamteb and Quicksort have twice as many misses on the NSF as on a segmented file. This higher miss rate is to be expected, since the segmented file reloads an entire context on a miss, ensuring that successive instructions from that context activation will not miss.

5.3.4 Register reload traffic

The NSF spills and reloads dramatically fewer registers than a segmented register file. Every miss in the NSF reloads a single register, while a miss in the segmented file reloads an entire frame. Figure 5-5 shows the number of registers reloaded by the NSF and segmented files for each of the benchmarks; this count includes reloads of empty lines. Also shown is the number of registers reloaded by the segmented file that actually contained valid data.
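The traffic difference follows directly from the reload granularity. In the sketch below, a simplification using the frame sizes from this chapter's simulations, each NSF miss is charged one register and each segmented-file miss a whole frame.

    /* Registers reloaded per miss under the two policies compared here:
     * the NSF reloads a single register, a segmented file a full frame
     * (20 registers for sequential runs, 32 for parallel runs). */
    enum { NSF_LINE = 1, SEQ_FRAME = 20, PAR_FRAME = 32 };

    static double reloads_per_instr(double misses_per_instr, int regs_per_miss)
    {
        return misses_per_instr * regs_per_miss;
    }

    /* Example: at one miss per 100 instructions on sequential code, the
     * segmented file reloads 20/100 = 0.2 registers per instruction,
     * while an NSF with the same miss rate reloads only 0.01. */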

[Figure: bar chart of registers reloaded as a percentage of instructions, logarithmic scale from 0.0001 to 100, per application; legend: NSF, Segment, Segment live reg.]

FIGURE 5-5. Registers reloaded as a percentage of instructions executed. Also shown are the registers containing live data that are reloaded by the segmented register file. Each register file contains 80 registers for sequential simulations, or 128 registers for parallel simulations.


For sequential applications, the segmented register file reloads 1,000 to 10,000 times as many registers as the NSF. As shown by the miss rates of Section 5.3.3, a segmented register file must reload a frame of 20 registers every 100 instructions. Note that if the segmented register file only reloaded registers that contained valid data, it would reduce the number of registers reloaded by a factor of 30. But because sequential programs generate so many procedure activations, and cycle through them so quickly, a segmented register file would still reload 2 to 3 orders of magnitude more registers than the NSF.

The NSF does not have as great an advantage over segmented register files in running parallel code. Certainly, the NSF does not reload any registers while running the AS or Wavefront applications, which simply do not produce many parallel threads. But for most parallel applications, the NSF reloads 10 to 40 times fewer registers than a segmented file. This is because the NSF only reloads registers as needed, while the segmented file reloads an entire frame. But even if the segmented file only reloaded registers that contained valid data, it would still load 6 to 7 times as many registers as the NSF, because registers that contain valid data may still not be accessed during the current activation of a context.
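As a rough worked check on these ratios, using only the figures quoted above:

    \frac{20\ \text{registers}}{100\ \text{instructions}} = 0.2\ \text{reloads per instruction (segmented file, sequential code)}

    \frac{0.2}{30} \approx 0.007\ \text{reloads per instruction, if only live registers were reloaded}

Even the reduced figure remains 2 to 3 orders of magnitude above the NSF's reload rate, consistent with the comparison above.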

This chapter does not show register spill results, but for most applications the number of registers spilled closely matches the number reloaded.

5.4 Experiment Parameters

The results introduced in Section 5.3 compared a four-frame segmented register file to a Named-State Register File of the same size. This section analyzes different design decisions, including the size of the register files, the number of words per line, and spill and reload strategies.

Rather than report separate performance results for each of the benchmarks, this section only shows results for two typical applications. Since the parallel benchmarks behave so differently, no attempt was made to average over the applications. Gamteb was selected as the representative parallel benchmark, since it generates enough parallel tasks to fill any size of register file, and its program code is larger than that of any of the other Id applications. The shorter Id benchmarks are simply toy programs.

For the same reasons, GateSim, the largest sequential application, was selected to characterize register file performance on sequential programs. GateSim is a large program that runs several hundred thousand instructions.

5.4.1 Register file size

The most basic design decision is the size of the register file, expressed as the number of frames that it can hold. The larger the register file, the more contexts it can hold simultaneously. This translates into lower miss rates and fewer registers spilled and reloaded.


The simulations evaluate segmented files of 2 to 10 frames. For the parallel experiments, each frame holds 32 registers, while for the sequential runs, each frame is 20 registers long. For each experiment, the NSF holds the same number of registers as the equivalent segmented file, but organized as single-register lines.
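The swept configurations can be captured in a few lines of C; this is a descriptive sketch of the parameter space, not code from the simulator.

    /* Parameter sweep of Section 5.4.1: segmented files of 2..10 frames,
     * and an NSF of equal total capacity organized as one-register lines. */
    struct rf_config {
        int frames;          /* 2 .. 10                                 */
        int regs_per_frame;  /* 20 for sequential runs, 32 for parallel */
        int line_size;       /* regs_per_frame (segmented), or 1 (NSF)  */
    };

    static struct rf_config segmented(int frames, int parallel)
    {
        int fr = parallel ? 32 : 20;
        return (struct rf_config){ frames, fr, fr };
    }

    static struct rf_config nsf_equivalent(struct rf_config seg)
    {
        /* Same number of registers, but single-register lines. */
        return (struct rf_config){ seg.frames, seg.regs_per_frame, 1 };
    }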

Resident Contexts

Figure 5-6 shows the average number of contexts resident in NSF and segmented register files of different sizes. The average number of contexts resident in a segmented file grows linearly with the size of the file, but slowly: a segmented file big enough for 10 contexts holds an average of only 7 contexts resident. Since the register file only reloads contexts on demand, it fills on deep calls but empties on returns. The larger the segmented register file, the lower its average utilization.

[Figure: line chart of average resident contexts (0 to 20) versus number of frames in the register file (2 to 10); series: Parallel NSF, Parallel Segment, Sequential NSF, Sequential Segment.]

FIGURE 5-6. Average contexts resident in various sizes of segmented and NSF register files. Size is shown in context-sized frames of 20 registers for sequential programs, 32 registers for parallel code.

In contrast, the number of active registers per context determines the average contexts resident in an NSF. For sequential programs, which have an average of 6 active registers per context, the NSF holds twice as many contexts resident as there are frames in the register file. Unlike the segmented file, the average utilization of the NSF does not decrease as the register file becomes larger.

However, the NSF only performs slightly better than a segmented file on parallel code, holding one more resident context. A parallel program, which touches 20 registers per context, tends to spill out earlier contexts. As a consequence, the NSF often empties, and register file utilization is lower than with sequential code.

Register file utilization

Figure 5-7 shows the average number of registers that are active in different sizes of segmented and NSF register files. Note that the utilization of segmented files decreases as the size increases. This is because the average number of contexts resident in segmented files is not proportional to the size of the file. Since each context has the same average fraction of active registers, the coarse binding of names to frames reduces the utility of segmented files.

[Figure: line chart of the percentage of registers active (0% to 90%) versus number of frames in the register file (2 to 10); series: Parallel NSF, Parallel Segment, Sequential NSF, Sequential Segment.]

FIGURE 5-7. Average percentage of registers that contain live data in different sizes of NSF and segmented register files.

For sequential applications, which frequently switch contexts and do not touch many registers per context, the NSF touches more than twice as many active registers as a segmented file. And because it can hold a few registers from many different contexts, the NSF has better utilization with larger files.

For parallel applications, the NSF uses 20% more registers than the equivalent segmented file. Since Id applications touch so many registers per context, both register files are less efficient as they become larger.


Register reload traffic

The best argument in favor of the Named-State Register File is the number of register spills and reloads required to run applications. In fact, the smallest NSF requires an order of magnitude fewer register reloads than any practical size of segmented register file. As shown by Figure 5-8, very small segmented files reload a register every 2 instructions for sequential code. That rate decreases rapidly to a register every 30 instructions for moderate-sized segmented files, and to one every 4000 instructions for very large register files.

Parallel programs are more expensive, reloading a register every 8 instructions for average-sized files. Even on very large segmented files, parallel programs reload a register every 40 instructions.

[Figure: line chart of registers reloaded as a percentage of instructions, logarithmic scale from 0.001 to 100, versus number of frames in the register file (2 to 10); series: Parallel NSF, Parallel Segment, Sequential NSF, Sequential Segment.]

FIGURE 5-8. Registers reloaded as a percentage of instructions executed on different sizes of NSF and segmented register files.

Sequential code running on the smallest NSF, on the other hand, reloads a register only once every 500 instructions. For larger register files, the NSF reloads almost no registers at all. A typical NSF reloads only 10⁻⁴ times as many registers as a segmented register file of equivalent size on sequential code.

The NSF does not perform quite as well on parallel code, reloading a register as often as every 20 instructions, and as infrequently as every 150 instructions. This is due to the large number of registers used by each Id context. But the NSF still reloads 5 to 6 times fewer registers than a comparable segmented register file.

91 Experimental Results 5.4.2

5.4.2 Register line width

This section studies the performance of register files with different line sizes, ranging from an NSF with one register per line to a segmented register file with a line per context. Line size affects register miss and reload rates in several ways. First, larger lines reload more registers on every miss. This can lower the miss rate, since subsequent instructions might access registers within the same line. However, it also increases the register reload rate, reloading registers that will not be accessed or contain no valid data. Finally, as discussed in Section 2.4, in an NSF with single word lines, every write simply allocates a register, without needing to miss or fetch a new line.
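A simplified statement of these effects, covering only the cases this section discusses (a sketch, not simulator code):

    #include <stdbool.h>

    /* Registers fetched when an access misses in a file with
     * line_width registers per line. With line_width == 1, a write
     * never fetches: it simply allocates the line (allocate-on-write).
     * With wider lines, a miss fetches the whole line, including
     * registers that hold no valid data or will never be accessed. */
    static int regs_fetched_on_miss(int line_width, bool is_write)
    {
        if (is_write && line_width == 1)
            return 0;
        return line_width;
    }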

Miss ratio vs. line size

Figure 5-9 shows the effect of register line size on read and write miss rates for sequential and parallel applications. Write miss rates behave as expected, showing a gradual decline with increasing line size, as subsequent register writes are more likely to hit in a line that was recently loaded into the file. Of course, the write miss rate is zero for lines that are a single word wide, since the file need never miss and reload a register that will only be overwritten. Since write misses occur at least 3 times as frequently as read misses, they tend to dominate the register file miss and reload behavior.

[Figure: line chart of register miss rate, misses as a percentage of instructions on a logarithmic scale from 0.001 to 10, versus registers per line (0 to 35); series: Parallel Read, Parallel Write, Sequential Read, Sequential Write.]

FIGURE 5-9. Register read and write miss rates as a function of line size for NSF register files. Note that the register write miss rate is 0 for a line size of 1. Each file holds 80 registers for sequential simulations, 128 for parallel code.

Note that in this analysis, the first write to a line is counted as a write miss, although the line does not yet contain any data. These first writes to a line should just reserve space in the file, perhaps spilling registers, but should not reload any registers. The compiler could ensure this either by tagging allocating writes as described in Section 2.1.3, or by explicitly allocating new lines before touching any registers in that line. While the latter is easy for segmented files with large lines, it is inefficient for small line sizes.

Parallel read misses behave in a similar manner to write misses. For parallel programs, increasing line size reduces the number of register misses, since each context is likely to touch many registers within that line. A line that holds an entire context may reduce the miss rate by a factor of 10 for parallel code.

However, for sequential code, read misses increase dramatically with line size. Segmented register files, in which the line is the size of a context, have 100 times the read miss rate of files with single word lines. For sequential code, in which contexts only touch a few registers between switches, loading many registers on a miss is a liability. Large lines reload registers that are not needed by the current activation, and cause other useful data to be spilled out of the file.

Register reloading vs. line size

Figure 5-10 shows the effect of miss rates on register reload traffic. Three different alternatives are shown. The most expensive scheme does not tag the first write to a new line as discussed above, and blindly loads empty lines. The large number of resulting empty register loads shows that some form of allocation for new lines is necessary. A more practical approach tags writes, and does not load any empty lines. Finally, Figure 5-10 also shows the number of registers containing live data that are reloaded by each application. While not all of these live registers might be accessed by the current context, this illustrates how many registers reloaded with each line are empty and simply waste memory bandwidth.

While most miss rates decrease with increasing line size, each miss is also more expensive. The net effect is that the number of registers reloaded always increases with line size. A sequential program reloads almost no registers for single word lines, but reloads a register every 30 instructions on segmented register files. Parallel programs require more reloads, since each context touches more registers. For parallel applications, small lines reload a register every 20 instructions, while the large lines of a segmented file cause a register to be reloaded every 10 instructions.

This figure also demonstrates the advantages of single word lines in the NSF. On parallel code, an NSF with double word lines reloads 3 times as many registers as an NSF with single word lines. For sequential code, the difference is a factor of 10. This easily justifies the additional cost of single word lines.

Note that a register file with multiple registers per line, which maintained valid bits for each of the registers, could eliminate many write misses and subsequent reloads. The reloads due to read misses are a much lower percentage of program run time. The next section discusses this effect in more detail.


[Figure: line chart of registers reloaded as a percentage of instructions (0 to 30) versus registers per line (0 to 35); series: Parallel max, Sequential max, Parallel, Sequential, Parallel live, Sequential live.]

FIGURE 5-10. Registers reloaded on read and write misses as a percentage of instructions. Three sets of experiments are shown. Maximum values include loads of empty lines on the first write to a new line. Regular experiments do not reload empty lines. Live register reloads count only those registers that contain valid data. Shown as a function of line size. Each file holds 80 registers for sequential simulations, 128 for parallel code.

5.4.3 Valid bits

The simulations described in this chapter model large register lines by spilling and reloading an entire line at a time. For such an organization, the register file need only tag each line with a valid bit. An alternative organization might group registers into large lines, but tag each register with a valid bit. The address decoders for such a register file organization would be simpler than for a fully-associative NSF with one register per line. However, as shown in Section 3.5, register valid bits and victim selection logic consume almost as much area as a fully-associative decoder.

This section highlights the benefits of tagging each register with a valid bit, for files with large register lines. For such an organization, a register write would never reload a line, but simply allocate the register and set its valid bit. However, a write might spill registers from the file to make room for a new line. Similarly, a register read may cause a register to be reloaded into the file, but would not reload an entire line.


Valid bit hardware is useful in tracking both live and active registers. A register is live if it has been written by a previous instruction, and is active if it will be accessed while the current line is still resident. If the register file only reloads registers as they are needed, it only loads active registers. Otherwise, it could reload an entire line at a time, but only load live registers in the line.
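A sketch of the per-register valid-bit policy described here, with hypothetical types and a deliberately small line width:

    #include <stdbool.h>

    #define LINE_W 8             /* hypothetical line width */

    struct line {
        bool     valid[LINE_W];  /* one valid bit per register */
        unsigned data[LINE_W];
    };

    /* A write allocates the register and sets its valid bit; it never
     * reloads the rest of the line, though making room for a new line
     * may still spill registers from the file. */
    static void write_reg(struct line *l, int i, unsigned v)
    {
        l->data[i]  = v;
        l->valid[i] = true;
    }

    /* A read miss reloads just the one register, not the whole line. */
    static unsigned read_reg(struct line *l, int i, unsigned (*reload)(int))
    {
        if (!l->valid[i]) {
            l->data[i]  = reload(i);
            l->valid[i] = true;
        }
        return l->data[i];
    }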

A sophisticated compiler might try to keep track of which registers are valid during each basic block of a program. By loading special trap handlers, or explicitly loading and storing a word of register tags, the runtime software might be able to reduce the number of registers that it spills and reloads. At best, software approaches might be able to spill and reload only live registers. It seems unlikely that a processor could track active registers without a tag on each word in the register file.

Figure 5-11 shows the effect of valid bits on register reloading. The figure compares basic register reloads with live registers reloaded and estimates of active registers reloaded. For parallel code, loading only active registers instead of all registers in a line reduces the reload traffic by as much as one third. The savings is highest for large lines. Loading only live registers accounts for half of that savings. For sequential code, live or active register reloads can be as little as one third of total reloads.

[Figure: line chart of registers reloaded as a percentage of instructions (0 to 14) versus registers per line (0 to 30); series: Parallel Reload, Parallel Live Reload, Parallel Active Reload, Sequential Reload, Sequential Live Reload, Sequential Active Reload.]

FIGURE 5-11. Registers reloaded as a percentage of instructions. Basic reloads count registers reloaded on non-empty lines. Live register reloads count only those registers that contain valid data. Active reloads count only registers that will be accessed while the line is resident. Shown as a function of line size. Each file holds 80 registers for sequential simulations, 128 for parallel code.

However, an NSF with single word lines is still much more efficient than a segmented register file with valid bits. The NSF requires only 25% of the register-to-memory traffic required by a tagged, segmented file for parallel code. On sequential code, the NSF reloads 3 orders of magnitude fewer registers than a segmented file with valid bits. By using both valid bits and single word lines, the NSF performs much better than a file using valid bits alone. Since valid bits are a significant fraction of the NSF area, it does not seem worthwhile to build a register file without fully-associative decoders.

5.4.4 Explicit allocation & deallocation

As shown in the previous sections, explicit compiler allocation of register lines (allocate-on-write) significantly decreases the number of empty lines reloaded on register writes. Explicit deallocation of individual registers by the compiler should further reduce misses and register traffic, by freeing registers in the NSF after they are no longer needed.

This study did not perform the compiler modifications required to deallocate registers after their last use. Nor did it include modifications to the register allocator to pack active registers within a single small line.

5.5 Program Performance

Most of the performance results in this chapter express register spill and reload traffic as a fraction of instructions executed by each benchmark. This section attempts to quantify the net effect of different register file organizations on processor performance, by counting the cycles executed by each instruction in the program and estimating the cycles required for each register spill and reload. This analysis is highly dependent on the relative costs of processor instructions and on the design of the memory system. Since these costs are outside the scope of this thesis, the analysis that follows is at best an estimate of true performance.

Table 5-2 shows the latency in cycles for different instructions and for register file spills and reloads. Most of these numbers were taken from timing simulations [43] of a Sparc [79] processor, with Sparc2 cache sizes and hit rates. Three different sets of cycle counts are shown: timing for the NSF; for a segmented file with hardware assist for spills and reloads; and for a segmented file that spills and reloads using software trap routines.

Instruction       NSF   Segment  Software
ALU               1     1        1
LOAD              2.1   2.1      2.1
STORE             3     3        3
BRANCH            1.3   1.3      1.3
CALL              1     1        1
RETURN            1     1        1
NEWCID            1     1        1
POPCID            1     1        1
Context Switch    2     2        8

Operation         NSF   Segment  Software
Read Miss         0.2   0.2      1
Write Miss        0.2   0.2      1
Register Flush    3     2.3      5.31
Register Reload   2.2   1.8      3.44

TABLE 5-2. Estimated cycle counts of instructions and operations. Compares times for the NSF, a segmented register file with hardware assisted spilling and reloading, and a segmented file with software spill and reload.

Figure 5-12 shows the resulting proportion of execution time spent spilling and reloading registers for sequential and parallel code. The NSF completely eliminates this overhead on sequential programs, where it accounts for 8% of execution time with a hardware-assisted segmented file. The difference is almost as dramatic for parallel programs, cutting overhead from 28% for the segmented file to 12% for the NSF.

[Figure: bar chart of spill and reload overhead as a percentage of execution time for NSF, Segment, and Software register files on serial and parallel code; plotted values include 0.01%, 8.47%, and 15.54% for serial code and 12.12%, 26.67%, and 38.12% for parallel code.]

FIGURE 5-12. Register spill and reload overhead as a percentage of program execution time. Overhead shown for NSF, segmented file with hardware assisted spilling and reloads, and segmented file with software traps for spilling and reloads. All files hold 128 registers.
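Using the per-event cycle costs of Table 5-2, the overhead fractions of Figure 5-12 can be approximated with a back-of-the-envelope model such as the sketch below. Only the cycle costs come from the table (the NSF column is shown); the event counts would come from a simulation trace.

    /* Rough overhead model: cycles spent on misses, spills and reloads,
     * divided by total execution cycles. Costs are the NSF column of
     * Table 5-2; the n_* counts are placeholders from a trace. */
    struct trace {
        double base_cycles;  /* cycles for the instructions themselves */
        double n_read_miss, n_write_miss, n_flush, n_reload;
    };

    static double nsf_overhead(struct trace t)
    {
        double extra = 0.2 * t.n_read_miss
                     + 0.2 * t.n_write_miss
                     + 3.0 * t.n_flush
                     + 2.2 * t.n_reload;
        return extra / (t.base_cycles + extra);
    }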


5.6 Results Summary

5.6.1 Goals of the NSF

The simulations in this chapter show that the NSF has met all of its design goals.

Register file utilization

The NSF holds more active data in its registers than a conventional file of the same size. The register utilization of both segmented and Named-State Register Files depends strongly on the application. Due to poor task scheduling and register allocation, some parallel applications generate very few contexts. But for large sequential and parallel applications, the NSF fills an average of 80% of the register file with active data. In comparison, the segmented file is only able to keep 25% to 50% of its registers busy.

Active Contexts

The NSF holds more concurrent active contexts than conventional files. For sequential code, the NSF holds more than twice as many procedure call frames as a conventional file, without having to spill and reload registers.

Register traffic

The NSF is able to support more resident contexts with less register spill and reload traffic than a segmented register file. Because of the deep call trees and the few registers used per procedure by sequential applications, the NSF almost never spills registers. It requires only 0.1% to 1% of the register reload traffic of segmented files. On parallel applications, the NSF reloads an average of 6 to 10 times fewer registers than a segmented file.

Fast context switching

Since the NSF has much lower register traffic than a segmented register file and comparable miss rates, it is able to switch contexts more efficiently. For parallel applications, the NSF may miss on twice as many accesses as a segmented file; for sequential programs, the NSF hardly misses at all. The net effect is much lower reload traffic, and the ability to switch quickly between a large number of concurrent contexts.


5.6.2 Alternatives

Large segmented files

One alternative to the high register utilization of the NSF is to build larger segmented register files. But because segmented files are coarsely partitioned, they become less efficient with size. As a result, on sequential code the NSF holds as many active registers as a segmented register file twice its size. For parallel code, the NSF holds as many active registers as a 20% larger segmented file.

For register reloads, the difference is even more dramatic. A typical NSF reloads fewer registers than any practical size of segmented file. In fact, for sequential code, the NSF reloads 100 times fewer registers than a segmented file twice its size.

Valid bits

Two effects contribute to the performance of the NSF. The first is that when an access misses in the register file, only that register is reloaded, rather than an entire set of registers. This is possible because each register is tagged with a valid bit. The second is that the NSF is fully-associative with single word lines. Thus the NSF is able to fill its registers with active data, and to hold only the most recently used data in the register file.

The experiments in this chapter illustrate that fully-associative, fine-grain addressing of registers is much more important than valid bits on each word. An NSF with single word lines may completely eliminate register spilling and reloading on sequential programs, and cut the reload traffic in half for parallel programs. Single word lines also perform much better than double word lines.

The benefits of valid bits alone are not as significant. A segmented register file with large frames and valid bits for each register may spill and reload 35% to 65% as many registers as a file without valid bits. However, an NSF with single word lines reloads only 25% as many registers as a tagged segmented file on parallel code, and 1,000 times fewer registers on sequential code.

5.6.3 Evaluation

The net effect of the Named-State Register File is significant for both sequential and parallel programs. The NSF speeds up sequential applications by 8% by eliminating all register spill overhead. For parallel applications, the NSF cuts the overhead in half, from 26% of execution time with a segmented file to 12% with the NSF.

Chapter 3 estimated that the NSF may require 30% to 50% more area than a segmented register file. This may amount to 7% of the processor chip area. As transistor budgets increase, and performance becomes more critical, the Named-State Register File is an inexpensive means of improving performance on both sequential and parallel applications.

CHAPTER 6 Conclusion

6.1 Overview

6.1.1 Background

Registers have become a critical resource in modern computer architectures. Virtually all modern processors provide some form of fast, random-access local memory for temporary variables. Alternatives range from top-of-stack buffers [32,25,11] to vector registers [71] to overlapping register windows [31,67,79]. But by far the dominant form of temporary storage is a small, directly accessed register set [6,82,75].

Several factors are responsible for the popularity of register files:

• High performance: Register files are small local memories that are tightly integrated with the processor pipeline. They are typically accessed as simple indexed arrays. This reduces access time and allows them to be easily multiported.

• Short addresses: Instructions refer to registers by short offsets. This allows a single instruction to refer to several register operands. The ability to operate on registers efficiently, combined with better compiler technology, has led to the popularity of modern load-store architectures [34].

• Separate register name space: Registers define a region of memory separate from the virtual address space of memory operands¹. A compiler can manage this name space without regard to access patterns and aliasing among memory allocated data.

However, while these factors encourage the use of registers for temporary variables in high performance processors, they also make it more difficult to manage registers in dynamic and data dependent programs.

Since sequential program performance depends on effective use of the register set, researchers have spent much effort on register allocation strategies [16,17]. Modern compilers are able to allocate registers among basic blocks within a procedure, but it is much more difficult to allocate registers across procedure call boundaries [86,78]. The problem is not only to efficiently pack a larger number of variables into a fixed-size register set, but to anticipate the call graph of a program at compile or link time. When a number of procedure activations compete for a limited register name space, and the order of those activations is not known at compile time, the register space will not be used efficiently. Such a dynamic program will spend a large fraction of its time spilling and reloading registers from memory.

1. Unlike stack-based machines [32,11], or architectures with memory-mapped registers [10].

This resource allocation problem is even more severe in multi-threaded environments, when a processor rapidly switches between several concurrent threads [76,47,41]. Parallel programming models use multi-threading to cope with long communication latencies and frequent, unbounded synchronization delays. When programs can dynamically spawn parallel procedure activations across multiple processors, it is impossible to statically predict a thread schedule.

Existing parallel processors deal with dynamic scheduling either by not using registers at all [65,21], by not preserving registers across synchronization points [42], or by segmenting the register space among a few resident threads [76,47,4]. Since effective use of registers can dramatically improve program performance, processors need a form of register file that speeds both statically scheduled code and dynamic context switching.

6.1.2 Contribution of this work

This thesis introduces the Named-State Register File, an associative, fine-grain register structure. The NSF uses registers more effectively than conventional register files, and supports a large, dynamic set of local variables with less traffic to memory.

The NSF uses a combination of compile-time and runtime management to map variables into a limited physical register set. The NSF is a fully-associative structure with small register lines, which binds variables to register locations at much finer granularity than conventional register files. The NSF provides a large register name space by appending a Context ID to each register reference.

This thesis presents an implementation of the NSF, and analyzes its access time and chip area. The thesis also simulates a number of register file organizations running both sequential and fine-grained parallel code, to evaluate the performance of the NSF relative to conventional register files. This research has shown that the NSF can significantly improve program performance for a variety of programming models.

6.1.3 Scope of this work

The Named-State Register File provides a mechanism for a large number of concurrent activations or contexts to share a register name space. The NSF does not impose any policy on how those contexts share the registers. The activations may be executed in any order. The NSF spills and reloads registers from backing store as needed.

Depending on how a programming model defines a context, context switching may take several forms. Some common examples of context switching are:

• A process switch in a multiprogrammed operating system.
• Light-weight task switching in user applications [3].
• A procedure call in a sequential language.
• Processor multithreading to handle I/O [80].
• Block multithreading among several concurrent threads [29,2].
• Cycle-by-cycle multithreading among a few threads resident on a processor [76,47].

While the NSF can efficiently support most of these forms of context switching, it performs best when switches are unpredictable, and the time between switches is comparable to the cost of loading and unloading a context. It would be inefficient to allocate a separate Context ID for each user process in an operating system that switched processes every few milliseconds. This would consume CIDs unnecessarily, and any benefit would be hidden in the time required to switch between such heavy-weight processes.

On the other hand, as shown by the simulations in this thesis, it is easy to compile code which allocates a new CID for each procedure activation. The NSF can hold a significant fraction of a sequential call chain without spilling and reloading registers. An application might combine context switching on procedure calls and among concurrent tasks. By assigning CIDs among the tasks, the processor could run each task with efficient procedure calls, and also rapidly switch between tasks.

Finally, an application might use several contexts within a single procedure. Since allocating and deallocating contexts is inexpensive in the NSF, this provides a mechanism for expanding the range of registers accessible to a single procedure. It also simplifies the task of register allocation across basic blocks. The penalty for allocating additional contexts is low, since only active registers will be assigned space in the NSF.

6.2 Summary of Results

6.2.1 Implementation

The area of an NSF relative to conventional register files depends on its organization. A fully-associative Named-State Register File with single word lines is 50% larger than a conventional register file with the same number of registers. This increases a typical processor's chip area by less than 5%. Much of the additional area for the NSF is devoted to valid bits and logic for handling register misses, spills and reloads.

Many modern processor architectures have introduced wide data words [57], and additional ports into the register file for multiple functional units [24]. Both organizations should reduce the relative area devoted to associative decoders and spill logic in the NSF. The NSF's fully-associative address decoders consume only 15% of the register file area, and increase its access time by only 6%. A Named-State File should be only marginally larger than a conventional register file for these advanced organizations.

6.2.2 Performance

The small register lines of the Named-State Register File lead to much better utilization of the register set for typical applications. Small lines also significantly reduce the register traffic required to support these applications. For large sequential applications, a moderate sized NSF is able to capture most of the procedure call chain. As a result, the NSF spills and reloads only 1/4000 as many registers as a conventional register file of the same size. On sequential code, the NSF has less register traffic than a conventional file with 4 times as many registers.

The NSF's behavior on parallel applications is highly dependent on efficient register allocation. For the applications studied here, which did not allocate registers effectively, the NSF reloaded 1/6 as many registers as a conventional register file. For these parallel programs, the NSF has less register traffic than a conventional file with more than twice as many registers.

Most of the performance benefit of the NSF comes from being fully-associative at the granularity of a single register. Conventional register files may use explicit register allocation or valid bits on individual registers to allow sub-block reloading and reduce traffic to memory. But an NSF with valid bits and single word lines outperforms register files with valid bits and large lines. The NSF requires only 1/4 the register traffic of the best segmented register file on parallel code, and only 1/1000 the register traffic for sequential programs.

Finally, while the effect of register reloading on program performance depends on the processor’s instruction set and memory system, this study estimates that the NSF speeds execution of parallel applications by 17% to 35%, and of sequential applications by 9% to 18%.


6.3 Future Work

Initial studies and simulations indicate that the NSF has the potential to significantly improve the performance of dynamic sequential and parallel programs. However, a number of issues in the software needed to exploit this register file organization were not addressed by this thesis.

Register allocation

The programs simulated in this thesis were compiled with very simple register allocation policies. In fact, as discussed in Section 4.4.4, the Berkeley TAM compiler used to compile the parallel Dataflow applications did no register allocation at all [72]. This severely compromised the performance of those applications. It is hoped that a more realistic register allocator would use fewer active registers per context. For such code, the Named-State Register File would use registers much more efficiently, and would perform much better than conventional register files, as indicated by its behavior on sequential applications.

The NSF has the ability to explicitly allocate and deallocate individual registers. None of the simulations in this study explicitly deallocated registers after their last use. While freeing up register space in the NSF could improve register utilization and reduce spill traffic, it is unclear what percentage of registers could be explicitly deallocated in this manner. Conventional register usage analysis must often apply conservative estimates of register lifetimes. Further study is needed to investigate the efficacy of register deallocation algorithms and their effect on NSF performance.

Another register allocation issue is how to group active registers within fixed-sized lines. The simulations in this thesis compared register files with different line sizes and measured the register spill traffic they required. An NSF with single register lines was found to perform much better than files with larger line sizes. But these simulations all used the same register allocation strategy. If a compiler could target its register usage strategy to a file with a particular line size, it might be able to group active registers together in a single line, so several active registers are reloaded at a time [85]. Since memory design often supports pipelining and block reloads, this might make Named-State Files with line sizes of two or four registers more efficient than those with single register lines. However, it is not clear how well a compiler could operate with these additional constraints.

Thread scheduling

As discussed in Section 4.4.3, the Dataflow applications simulated in this study were traced by a simple, uniprocessor emulator. The emulator did not realistically portray how Id programs might run on a large-scale parallel processor. The resulting thread trace does not accurately model the effects of communication latency, synchronization between processors, and data distribution across a multicomputer. To evaluate the effect of an NSF on parallel processor performance, the simulations in this thesis should be re-run with actual thread traces from those parallel machines. Preliminary results indicate that Id programs switch contexts much more frequently on multiprocessors than on a single processor [21]. This suggests that the NSF can significantly improve the performance of large-scale parallel processors.

Context management

The NSF uses Context IDs to identify concurrent procedure or thread activations. As discussed in Section 6.1.3, there are many ways of allocating CIDs among a set of activations. One alternative that was not investigated in this thesis was to allow a single procedure or thread activation to use several distinct CIDs. This provides a flexible method of assigning large numbers of registers to a particular context, without burdening all contexts with large register sets. It may also simplify register allocation among basic blocks in a procedure, especially for parallel code in which those basic blocks may run in any order [72].

In order to take advantage of the larger register space defined by Context IDs, an application must manage those CIDs among the active contexts. There are a limited number of CIDs, and either procedure call code or thread schedulers must assign them to dynamically created contexts. While the simulations in this thesis indicate that a small set of CIDs is large enough to support many contexts without conflict, this deserves further study. In particular, simulating specific scheduling algorithms would ensure that a small set of CIDs does not interfere with efficient thread scheduling.

The preceding sections illustrate the importance of the interaction between hardware and software in computer architecture. This thesis has proposed mechanisms that use a combination of hardware and software to efficiently manage processor resources. This combination exploits static knowledge of program structure and adapts well to dynamic runtime behavior.

APPENDIX A

A Prototype Named-State Register File [1]

[1] This appendix was extracted from Internal Memo 46 of the MIT Concurrent VLSI Architecture Group and slightly edited for inclusion in the thesis.

David Harris
September 24, 1992

A.1 Abstract

This paper summarizes the work that I have done on the Named-State Register File project as a UROP student over the summer of 1992. It describes the functionality of the Named-State Register File and the objectives of this project. It then discusses implementation details at both the schematic and layout levels, including a pinout diagram of the chip that has been sent for fabrication. It details the functionality, timing, testing, and pinouts of that chip, and concludes with a set of preliminary results.

A.2 Operation

Register 0 of every context ID is always present in the register file and is hardwired to 0, as is the convention in many processors. Write operations to this register are ignored, making it a suitable write address for operations such as NOP.

When a write operation occurs, the register is allocated if it does not already exist or is not currently in the physical register file. To guarantee that writes will succeed, the Named-State Register File always maintains one empty physical register. If necessary, another register may be flushed from the cache to make room. This is done by halting the processor pipeline, performing a special read cycle that reads the victim out of the register file, and directing the CPU to write it to main memory for storage, at the same time invalidating the register in the cache.

When a read operation occurs, read request lines may flag that either of the two read operations failed because the desired register is not currently in the cache. If this happens, the processor must fetch the registers from storage in main memory.

Three additional control lines allow tasks to deallocate registers when they are no longer needed. Either of the two registers read may be freed at the end of the read. In addition, the entire set of registers used by a given Context ID may be released at once when the process using that ID terminates.
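To make the protocol above concrete, the following C sketch models the register file at the behavioral level. It is a software analogy, not the circuit: all of the names (nsf_t, nsf_write, and so on) are invented for this sketch, and the model flushes a victim lazily at write time rather than eagerly maintaining the hardware's one-empty-register invariant.

    /* Behavioral C model of the Named-State Register File protocol
     * described above.  All names are invented for this sketch. */
    #include <stdbool.h>
    #include <stdint.h>

    #define NROWS 32

    typedef struct {
        bool     valid;
        uint8_t  cid, offset;              /* 5-bit CID, 5-bit offset */
        uint32_t data;
    } row_t;

    typedef struct {
        row_t row[NROWS];                  /* row 0: offset 0, data 0 */
        int   victim;                      /* round-robin, skips row 0 */
    } nsf_t;

    void nsf_reset(nsf_t *f) {
        for (int i = 0; i < NROWS; i++) f->row[i].valid = false;
        f->row[0].valid = true;            /* always present, reads 0 */
        f->row[0].data  = 0;
        f->victim = 1;
    }

    /* Fully-associative lookup; offset 0 of any CID maps to row 0. */
    static int match(const nsf_t *f, uint8_t cid, uint8_t off) {
        if (off == 0) return 0;
        for (int i = 1; i < NROWS; i++)
            if (f->row[i].valid && f->row[i].cid == cid &&
                f->row[i].offset == off)
                return i;
        return -1;                         /* miss */
    }

    /* A write allocates the register if it is absent.  If the file is
     * full, the victim row is first spilled to memory; the callback
     * stands in for the processor's special flush cycle. */
    void nsf_write(nsf_t *f, uint8_t cid, uint8_t off, uint32_t data,
                   void (*spill)(uint8_t, uint8_t, uint32_t)) {
        if (off == 0) return;              /* writes to reg 0 ignored */
        int i = match(f, cid, off);
        if (i < 0) {
            for (i = 1; i < NROWS && f->row[i].valid; i++)
                ;
            if (i == NROWS) {              /* full: flush the victim */
                row_t *v = &f->row[f->victim];
                spill(v->cid, v->offset, v->data);
                v->valid = false;
                i = f->victim;
                f->victim = f->victim % (NROWS - 1) + 1;  /* wrap to 1 */
            }
            f->row[i].valid  = true;
            f->row[i].cid    = cid;
            f->row[i].offset = off;
        }
        f->row[i].data = data;
    }

    /* A read returns false on a miss, corresponding to a raised
     * ReadRequest line; the free flag models the FreeR1/FreeR2 lines. */
    bool nsf_read(nsf_t *f, uint8_t cid, uint8_t off, uint32_t *out,
                  bool free_after_read) {
        int i = match(f, cid, off);
        if (i < 0) return false;
        *out = f->row[i].data;
        if (free_after_read && i != 0) f->row[i].valid = false;
        return true;
    }

    /* FreeCID: release every register of a terminated context. */
    void nsf_free_cid(nsf_t *f, uint8_t cid) {
        for (int i = 1; i < NROWS; i++)
            if (f->row[i].cid == cid) f->row[i].valid = false;
    }

In the hardware, of course, the linear scan inside match is replaced by the programmable decoders of all rows matching the address in parallel.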

A.3 Objectives

The objective of this project was to design and build a Named-State Register File, to verify that the idea is efficiently realizable, and to obtain estimates of the timing and chip area required.

I designed a Named-State Register File with the following specifications. The register file contains 32 physical 32-bit registers. The file supports two reads and a write every cycle; each register address is a 10-bit quantity consisting of a 5-bit offset and a 5-bit context ID (CID). The entire circuit should run as fast as possible and consume as little area as possible. In addition, the circuit should scale to 128 physical registers without requiring complete redesign.
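The 10-bit register name can be pictured as a packed pair of fields. The helpers below are a minimal sketch: the specification fixes only the field widths, so the ordering of CID and offset within the 10 bits is an assumption here.

    #include <stdint.h>

    /* Pack and unpack the 10-bit register name: 5-bit CID plus 5-bit
     * offset.  Field order within the word is assumed, not specified. */
    static inline uint16_t reg_name(uint8_t cid, uint8_t offset) {
        return (uint16_t)(((cid & 0x1f) << 5) | (offset & 0x1f));
    }
    static inline uint8_t reg_cid(uint16_t name)    { return (name >> 5) & 0x1f; }
    static inline uint8_t reg_offset(uint16_t name) { return (uint8_t)(name & 0x1f); }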

Peter Nuth and Bill Dally provided advice on a variety of implementation problems and Whay Lee and Noble Larson gave me great help with the fabrication process.

A.4 Circuit Description

The circuitry in the Named-State Register File is moderately complex. A series of figures at the end of this document shows the complete details of the design. This section provides an overview of the entire chip's logic, then delves into timing, naming conventions, a precise description of the Named-State Register File functionality, and a schematic-level description of each cell.

A.4.1 Overview

Figure 1 shows a block diagram of the entire Named-State Register File. At this level of detail, it is very similar to a conventional register file: the only difference is that the simple address decoder of a conventional file is replaced by more sophisticated logic to maintain the cache information. As always, there are three data busses: a 32-bit write bus and two 32-bit read busses. There are also three 10-bit address busses specifying the offsets and Context IDs for read and write operations. The remaining control lines are specific to the Named-State Register File operation. A Reset line clears all of the valid bits in the cache, initializing the file to an empty state and marking register 1 as the first victim to flush. Three free lines control deallocating registers when they are no longer needed. The FlushRequest line indicates that the cache is full and directs the processor to stall the pipeline while the current victim is flushed from the file. The two ReadRequest lines indicate that a read operation missed and that the processor must fetch the desired operand from main memory before continuing execution of the given process.
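For reference, the external interface just enumerated can be transcribed into a single C structure. This is only a notational summary of the port list; the type name and field grouping are inventions of this sketch.

    #include <stdbool.h>
    #include <stdint.h>

    /* Port list of the prototype, transcribed from the text above. */
    typedef struct {
        uint16_t R1, R2, W1;                /* three 10-bit address busses */
        uint32_t Write;                     /* 32-bit write data bus       */
        uint32_t Read1, Read2;              /* two 32-bit read data busses */
        bool Reset;                         /* clear all valid bits        */
        bool FreeR1, FreeR2, FreeCID;       /* deallocation controls       */
        bool FlushRequest;                  /* cache full: stall and flush */
        bool ReadRequest1, ReadRequest2;    /* read misses                 */
    } nsf_ports_t;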

108 A.4.2 Clocking

The Named-State Register File is driven by a two phase non-overlapping clock. During Phi1, the register file lines precharge while the Decode/Match logic performs its dirty work. During Phi2, the register file performs the appropriate read and write operations while the cache logic precharges in anticipation of the next phase. Also during Phi2, the allocate logic chooses the next empty line to write.

A.4.3 Naming Conventions

Some naming conventions reduce confusion, given the number of signals and cells in the Named-State Register File. All names begin with an uppercase letter, and each succeeding word in a name also begins with an uppercase letter. The cache logic is divided into seven main functional units. The name of each unit ends with the word Block. Each block consists of many rows of cells containing logic, one for each register in the file. The name of each logic cell ends with the word Cell. There are a number of large amplifiers for driving signals down long lines. The name of each of these cells ends with the word Driver.

The three ten-bit address busses are named R1[i], R2[i], and W1[i] (0 ≤ i < 10). The three 32-bit data busses are named Read1[i], Read2[i], and Write[i] (0 ≤ i < 32). There are four input lines, Reset, FreeR1, FreeR2, and FreeCID. There are three outputs, FlushRequest, ReadRequest1, and ReadRequest2. The non-overlapping clock signals are named Phi1, Phi2, Phi1Bar, and Phi2Bar.

In addition, there are a number of internal signals, shown in Figure 2 on the overview of the Decode/Match circuitry. Each of these signals is used on every row of the logic; in a 32 register Named-State Register File, a signal named MeltDown will have instances MeltDown[i] (0 ≤ i < 32). MeltDown[0] corresponds to the bottom row, while MeltDown[31] corresponds to the topmost row. There are six match lines used by the programmable decoders: R1OffMatchBar, R1CIDMatchBar, R2OffMatchBar, R2CIDMatchBar, WOffMatchBar, and WCIDMatchBar. These match lines are used to compute three word lines: R1Word, R2Word, and WWord. In turn, these word lines control three lines which drive the register file: R1Out, R2Out, and WOut. The R2CIDMatch line is also derived from the R2CIDMatchBar signal. There are two valid lines, ValidBar and ValidA. The FlushMe line is used on flush cycles and the Allocate line chooses the next row in which to perform a write operation.

A.4.4 Functionality

Now we can precisely describe the functionality of the Named-State Register File and the responsibilities imposed upon a processor which uses this Named-State Register File.

The Named-State Register File contains N (in this case N=32) rows of Decode/Match logic and registers. Row 0 is hardwired to always be present and to match on Offset 0 of any CID. Each row contains a valid bit which determines if the register on that row contains valid data, and a programmable decoder which matches the virtual address of the register stored on that row. A shift register running the height of the cache (except row 0) chooses the next victim for flush operations. A tree structure can identify in logarithmic time the first free line to allocate for write operations.

To initialize the cache, the Reset line must be raised for two consecutive cycles. This clears all of the valid bits and sets the victim select shift register to point at row 1.

Let us begin by examining the two phases of an ideal cycle where both read operations succeed and the cache is not close to full. During Phi1, the bitlines of the register file portion are precharged for read operations and properly set up for write operations. The Decode/Match logic matches both read addresses to raise the appropriate R1Word and R2Word lines. The write function is slightly more complex. Because the Named-State Register File does not know in advance if the write address is already in the cache, it must perform a write on both the actual register (if the register is present) and on another empty line which should be allocated in the event that the desired register was not already in the cache. The programmable decoder on the allocated line latches the write address and the valid bit of that line is initially set on the assumption that the register was not already in the cache. During Phi2, the valid bit of the newly allocated line is cleared if it is found that the write address already was in the cache. Also, the topmost empty row is chosen for the next allocate operation and the cache logic is precharged again. Meanwhile, the register file performs the desired read and write operations.
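The two-phase write path, with its speculative allocation, can be expressed in terms of the behavioral model sketched at the end of Section A.2. The helpers first_empty_row and latch_address are invented for this fragment, and the flush path is omitted.

    /* Two-phase write with speculative allocation, reusing the nsf_t
     * model from Section A.2.  The flush path is not shown. */
    static int first_empty_row(const nsf_t *f) {
        for (int i = 1; i < NROWS; i++)
            if (!f->row[i].valid) return i;
        return -1;                        /* full: FlushRequest case */
    }

    static void latch_address(nsf_t *f, int row, uint8_t cid, uint8_t off) {
        f->row[row].cid    = cid;
        f->row[row].offset = off;
    }

    void write_cycle(nsf_t *f, uint8_t cid, uint8_t off, uint32_t wdata) {
        /* Phi1: match the write address while also latching it into an
         * empty row, whose valid bit is set on the assumption that the
         * address is not already present. */
        int hit_row   = match(f, cid, off);
        int alloc_row = first_empty_row(f);
        if (alloc_row < 0) return;        /* stall for a flush cycle */
        latch_address(f, alloc_row, cid, off);
        f->row[alloc_row].valid = true;

        /* Phi2: if the write actually hit, the speculatively allocated
         * row is deallocated again and the data lands in the hit row. */
        if (hit_row >= 0) {
            f->row[alloc_row].valid = false;
            f->row[hit_row].data = wdata;
        } else {
            f->row[alloc_row].data = wdata;
        }
    }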

There are a few less-ideal situations where extra processing must occur. When the last empty row is allocated (even if it is not actually needed), the Named-State Register File must flush a valid row (called the victim) from the cache to guarantee an empty row for the next write operation. This is done by raising the FlushRequest signal near the start of Phi2. During the next cycle, the processor should stall the pipeline. The cache reads the victim onto the Read1 bus and clears the valid bit in the row that contained the victim. The processor must write this value to an appropriate location in main memory to store it until it is needed in the cache again. This also bumps the victim select shift register up by one row, until it reaches the top of the column, at which point it wraps around to row 1 again. In order to perform a flush operation, the address of the flushed line must be available, something not provided in this design due to space limitations. This can be done either by running a tap out of each programmable decoder or by adding an additional ten bits to each register file row to hold the address of the register as well as the data.

If a read operation misses, a ReadRequest line is raised by the end of Phi1. The processor must restore the missed register from main memory before continuing execution of this process. Typically, this may be done by sending a memory request off to a memory unit, then performing a context switch and continuing execution of another process for a while.

Care must be taken that this other process is not also waiting for a failed read. When the register's contents are fetched from main memory, the processor should stall the pipeline and insert a special cycle to write the fetched value into the register file. Then the instruction that caused the miss should be re-executed.

When a process is done with a register, it should raise the appropriate free line on the last read of that register. This clears the valid bit of that register. When the process terminates, it should raise the FreeCID line. This releases all of the registers with the same context ID as that used in the R2 address.

A.4.5 Cell Descriptions

Figure 2 illustrates how the various blocks in the Decode/Match logic communicate. This section describes each of those blocks in more detail.

The DecodeBlock (Figure 3) matches the virtual address of various registers using a programmable decoder. Five columns match the offset; another five match the CID. When a register is initially allocated, the decoder latches for that line are loaded with the write address. On future reads and writes, all of the decoder latches must match the address in order to pull down the match lines that indicate that the register may be in the cache (the valid bit must also be set, to indicate that the register has not been flushed and its row reused). The match lines are in effect a pair of six-transistor precharged NAND pulldown chains; offset and CID are matched separately and combined in the GateBlock to prevent the excessive delays of an 11-transistor pulldown. The offset and CID match cells must be interleaved; otherwise, one block will perform faster and the other slower due to uneven capacitance distribution. Note the danger of charge sharing; a hefty capacitive load must be present at the right end of the array to prevent the output from accidentally being brought low via charge sharing. Note also that row 0 of this logic should be hardwired to match offset 0 of any CID.

The GateBlock (Figure 4) consists primarily of NOR gates to drive a word line if the offset and context ID of the address match and the valid bit is set on that row. An extra gate is required for handling flushes, to drive the R1Word line corresponding to the line to flush.

The MissBlock (Figure 5) has three lines to detect misses on the read or write addresses. If a read fails, the appropriate ReadRequest line is raised to ask the processor to fetch the missing register; if a write address is missed, a new line is allocated during the same phase for the new register. Note that the block is structured as an N-bit precharged NOR gate with latches at the bottom to record the result when it is valid by the end of Phi1.

The ValidBlock (Figure 6) contains some messy ad-hoc logic to set and clear a valid bit latch. On cycles where a register should be flushed or the cache is reset, the valid bit must be cleared. During Phi1 of every cycle, the topmost empty line is allocated; if the write succeeded that cycle, the line is then deallocated during Phi2. Two more latches sample the valid bit at appropriate times to ensure valid values when they are needed elsewhere in the circuit. Note that row 0 should be hardwired to always be valid. Some equations might clarify this:

ValidLatch set on:      Allocate · Phi1
ValidLatch cleared on:  FreeR1 · R1Word · Phi1
                        or FreeR2 · R2Word · Phi1
                        or FreeCID · R2CIDMatch · Phi1
                        or FlushMe · Phi1
                        or WHit · Phi2 · Allocate
ValidA   = ValidLatch sampled during Phi1
ValidBar = ValidLatchBar sampled during Phi2

TABLE A-1. Valid bit logic.
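In C-style boolean form, the latch update of Table A-1 reads as follows, evaluated once per row per cycle. Giving the set term priority over the clear terms is an assumption of this sketch; the table does not specify what happens if both hold at once.

    #include <stdbool.h>

    /* Valid-bit latch update per Table A-1, one call per row per cycle.
     * Set is given priority over clear, an assumption of the sketch. */
    bool next_valid(bool valid,
                    bool Phi1, bool Phi2, bool Allocate, bool WHit,
                    bool FreeR1, bool R1Word, bool FreeR2, bool R2Word,
                    bool FreeCID, bool R2CIDMatch, bool FlushMe) {
        bool set   = Allocate && Phi1;
        bool clear = (FreeR1  && R1Word     && Phi1) ||
                     (FreeR2  && R2Word     && Phi1) ||
                     (FreeCID && R2CIDMatch && Phi1) ||
                     (FlushMe && Phi1) ||
                     (WHit && Phi2 && Allocate);
        return set || (valid && !clear);
    }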

The VictimBlock (Figure 7) is a shift register that chooses the next victim to be flushed when the Named-State Register File fills. At reset, a single 1 is loaded into the shift register in row 1. During flush cycles, the register cell containing the 1 raises the FlushMe line for the corresponding row to flush that register, and the shift register advances by one upward, looping back when the top is reached. Row 0 is never a candidate to flush. A true LRU (Least Recently Used) replacement policy would of course be somewhat more efficient, but this approach uses a minimum of chip area.

The FullBlock (Figure 8) determines when the cache is full. It is built from an N-input precharged NOR gate with a latch at the bottom to grab the FlushRequest signal at the end of Phi1. It is possible to request a flush even when one line is still empty, if that line is allocated on the given cycle but not actually needed.

Finally, the AllocateBlock (Figure 9) is a binary tree structure used to choose the topmost empty line to allocate. It is built from a chain of alternating positive and negative logic. I have considered a number of other designs including 4:1 cells and variants on the Manchester carry chain, but simulations show that this circuit performs better than the alternatives that I have explored.
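As a software analogy, the AllocateBlock computes a find-topmost operation over the valid bits, organized as a binary OR tree so that the selection settles in a number of gate levels proportional to log2 N. The C sketch below mirrors that structure; taking "topmost" to mean the highest row index is an assumption, and in the circuit all tree levels evaluate in parallel rather than in a loop.

    #include <stdbool.h>

    #define NROWS 32                      /* a power of two */

    /* Find the topmost empty row via a binary OR tree, as in the
     * AllocateBlock.  Node i of the heap-style array ORs the "empty"
     * flags of the rows beneath it; the descent prefers the upper
     * (higher-index) half at each level.  Returns -1 if full. */
    int topmost_empty(const bool valid[NROWS]) {
        bool empty[2 * NROWS];
        for (int i = 0; i < NROWS; i++)            /* leaves    */
            empty[NROWS + i] = !valid[i];
        for (int i = NROWS - 1; i >= 1; i--)       /* OR nodes  */
            empty[i] = empty[2 * i] || empty[2 * i + 1];
        if (!empty[1]) return -1;                  /* no empty row */
        int node = 1;
        while (node < NROWS)                       /* log2(NROWS) steps */
            node = empty[2 * node + 1] ? 2 * node + 1 : 2 * node;
        return node - NROWS;
    }

Row 0 is hardwired valid, so it is never selected, matching the circuit's behavior.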

A.4.6 Critical Paths

Addresses are presumed to be stable at the DecodeBlock throughout Phi1. During Phi1, the decoders must pull the match lines low. These match lines must be combined with the valid bits in the GateBlock to control the word lines. The word lines must be latched by the end of Phi1. Also, the miss lines and valid bits may be cleared by word lines and must be ready to be latched at the end of Phi1. This sets a lower bound on the length of the Phi1 cycle.

Register access dominates the Phi2 cycle. The word lines must be driven across the register file. The proper registers must respond, either being written or being read onto the bitlines. The two read bitlines must be latched by the end of Phi2 so that they may be precharged again during Phi1.

Of course, precharge of the large NOR structures in the decoder must also occur during Phi2, but this is unlikely to be a bottleneck until N becomes very large.

The other critical path is the selection of the next empty line to allocate. At some point during Phi1, the valid bits become valid (once it is known whether the last allocated line was actually needed). The ValidA signal must propagate up and down a tree in the AllocateBlock. This may be a slow step because the tree has a depth proportional to the logarithm of the number of registers (i.e., 5 for 32 registers, 7 for 128 registers). Finally, it must be latched by the end of Phi2 for use in the next cycle.

A.5 Layout Considerations

The Named-State Register File was implemented in a standard 2 micron N-well CMOS technology. I used the GDT tools to lay out and simulate the cache. While I can make no claims that the layout is the best possible realization of the Named-State Register File's functionality, I did make my best effort, given the time constraint of one summer, to minimize chip area and optimize for speed.

The Named-State Register File fits sideways in a 4600x6800 micron frame. Metal level 1 lines carry signals along the rows; metal level 2 lines bring VDD and GND, clocks, address lines, and control lines vertically down blocks. Typically, the bottom half of each cell contains the metal 1 signal busses and the top half contains logic. Each row is 83 microns tall. The largest arrays of logic are the registers themselves (32 columns, each 70 x 83 microns) and the decoder cells (10 columns, each 171 x 83 microns).

After completing a significant amount of layout, I revised the design to eliminate an extra line, formerly called ValidF. The ValidF logic is still in the Valid cell, and the unused line still runs through the Victim and Full cells; in a redesign, some space may be saved by removing the unneeded line and utilizing the freed space.

A.6 Scaling Considerations

Some effort was made in designing the Named-State Register File to scale well to larger sizes of register files (such as 128 registers). However, there are a number of changes that must be made to maintain performance after scaling.

On account of the capacitance of large bit lines, the register file itself degrades in performance as the number of registers goes up. To compensate, it is standard practice to place sense amplifiers on the bit lines to measure small changes in voltage on reads and more rapidly determine the output of the file. The register file is already designed with complementary read lines to facilitate adding sense amplifiers.

The victim select shift register has a lengthy wire from the topmost row wrapping around to row 1. It is important to simulate this register carefully and determine that the signal can propagate all the way from row N-1 down to row 1 during Phi2. If this becomes a significant problem, one can rearrange the shift register by interspersing up and down shift registers so that every shift operation only has to propagate by the height of two rows.

The FlushRequest and ReadRequest pulldown chains are likely to be a limiting factor in the timing of very large register files. For instance, for N=128, the chain is effectively a 128 input NOR gate. It may be necessary to rearrange this as a logarithmic tree of NOR gates to overcome the tremendous capacitance problems and the large amount of power dissipated by the precharged logic.

A number of large buffers are used to drive signals down the columns. These buffers will have to be scaled to efficiently drive the larger capacitances of longer lines.

Finally, electromigration may become a serious problem for larger Named-State Register Files. The current design just uses very rough estimates of currents and duty cycles to size the clock and power supply lines. For a practical system, these numbers should be given more consideration lest the circuit tear itself apart during normal usage.

A.7 Timing Simulations

To get some information on cycle times, I ran a large simulation of the entire chip using Lsim. Unfortunately, Lsim is notoriously bad for obtaining accurate results; I have typically observed results 50% to 150% slower than Spice produces for the equivalent circuit.

I measured the following numbers from Lsim. Each includes the delay through a three-inverter buffer to drive the potentially long line from the cache to the output pads.

From Phi1 to:
  ValidA rises on allocated line:   9.1 ns
  R2CIDMatch:                      10.5 ns
  R2OffMatchBar:                   11.3 ns
  R2Word:                          16.3 ns
  ReadRequest2Bar falls:           22.8 ns
From Phi2 to:
  R2Out:                            4.0 ns
  Read bitlines pulled low:         5.4 ns

TABLE A-2. Simulated signal delays from each clock phase.

The timing on the read bitlines appears suspiciously fast to me; possibly the measurement was bad. Clearly, the time for the register array during Phi2 is much faster than the time for the Named-State Register File logic. I was unable to measure the timing to compute FlushRequest with the outputs available; however, it is a function of ValidA, so it has at least 13 ns to be computed. I also was unable to measure the time to compute the next Allocate line, but that can occupy all of Phi2, so it should have plenty of time. Therefore, computing ReadRequest is probably the critical path. Assuming 24 ns phases and 1 ns of non-overlap after each phase, a full cycle takes 2 × (24 + 1) = 50 ns, so the Named-State Register File can run at 20 MHz in 2 micron technology. Given the inaccuracies of Lsim, I would expect that the actual chip would run at between 20 and 40 MHz.

A.8 Pinouts

I designed and carefully simulated a 4 row Named-State Register File. For the final chip, I generated a 32 row cache with 32-bit registers. In addition, this version contains the various drivers for the columns and address lines. This logic has not received the same amount of simulation due to time constraints and is a potential source of critical failures.

The completed logic was placed in a 64 pin pad frame, and a number of inputs and outputs were connected in order to perform some tests and timing of the actual chip. Inputs are connected directly to the column drivers. Outputs use a 3 stage buffer to drive the capacitance of the lines going to the pins; therefore, all outputs are actually the complements of the signal. Finally, there are three VDD pins, three GND pins, four clock pins, and three pins for testing very simple circuitry.

Only a few of the 30 read and write address pins were connected; all the rest are tied to 0. Likewise, only a few of the 96 data lines are connected. All of the major input and output control lines are available; in addition, a few of the match, valid, and flush lines are tapped for timing information. Figure A-1 below shows the pinouts of the chip:

A.9 Epilogue

The prototype chip was fabricated by MOSIS, and we received first silicon in October '92. In spite of our best efforts, the chips did not function. The pad frame used for this prototype contained numerous errors, including hard power-to-ground shorts. This pad frame was donated by another group and imported into our CAD tools for this project. The GDT tool's design rule checker was unable to verify the pad frame prior to fabrication. As a result, we have been unable to test any logic on the prototype chip.

FIGURE A-1. Pinout of the prototype chip.

(The pinout and the cell schematics, Figures 1 through 9, appear on the following pages of the original memo. The annotations from those schematics are reproduced below.)

DecodeCell:

This cell must come in two sexes: DecodeOff and DecodeCID. DecodeOff is shown in the schematic; its signals include the address lines (WAdr, WAdrBar, R1Adr, R1AdrBar, R2Adr, R2AdrBar), the six match lines, Allocate, and Phi2Bar.

There is a danger that, due to charge sharing, the match lines could accidentally be pulled low. Thus, there must be a large amount of capacitance at the pullup on the right side of the block.

A set of AND gates gates Allocate with Phi1 to guarantee that the write address is only latched when stable. The block instantiates DecodeOff[0..31][0..4], DecodeCID[0..31][0..4], DecodePullups[0..31], and DecodeAnds[0..31].

DecodeOff[0][*] must have the latches forced to match 0 and DecodeCID[0][*] must be tied low to always match.


AllocateCell:

This cell consists of a tree of three species of subcells (classified by Silicon Biologists as Allocatus Primus, Secundus, and Tertius) that alternate positive and negative logic. The Valid inputs change starting late in Phi1, and the new Allocate line must be latched at the end of Phi2.

NOT and NOR gates are sized N3/2, P6/2; NAND gates are sized N6/2, P6/2.

WordDriver and BitDriver:

Three word drivers are used on each row in the WordBlock to produce the three word line signals that are driven across the register array. Each column of the register array contains drivers to precharge the read bitlines and to drive the write data and its complement down the write bitlines.

Register array:

The Register block is an array of triple-ported SRAM cells that support two differential read operations and a single write operation each cycle. Each cell is controlled by the word lines R1Out, R2Out, and WOut and connects to the bitlines WBit, WBitBar, R1BitBar, R2Bit, and R2BitBar.

ColumnDriver and Buffer:

The column drivers are used to drive signals down the tall columns. Four instances of ColumnDriver are used for the three Free lines and the Reset signal. The AddressBlock consists of an array of three instances of ColumnDriver and three instances of ColumnDriverBar above each of the ten Decoder columns. These drivers are used to amplify the read and write address lines. The simple Buffer is not used in the ContextCache logic, but is used to drive test outputs along the relatively long lines to the pads.

The final page of the original memo contains schematics for these drivers and the SRAM cell used in the register array; the transistor sizing annotations are omitted here.

Bibliography

[1] Anant Agarwal. “Performance tradeoffs in multithreaded processors.” IEEE Transactions on Parallel and Distributed Systems, 3(5):525–539, September 1992.

[2] Anant Agarwal et al. “The MIT Alewife machine: A large-scale distributed-memory multiprocessor.” In Scalable Shared Memory Multiprocessors. Kluwer Academic Publishers, 1991.

[3] Anant Agarwal and Anoop Gupta. “Memory-reference characteristics of multiprocessor applications under MACH.” To appear in SIGMETRICS ’88.

[4] Anant Agarwal, John Kubiatowicz, David Kranz, Beng-Hong Lim, Donald Yeung, Godfrey D’Souza, and Mike Parkin. “Sparcle: An evolutionary processor design for large-scale multiprocessors.” IEEE Micro, June 1993.

[5] Robert Alverson, David Callahan, Daniel Cummings, Brian Koblenz, Allan Porterfield, and Burton Smith. “The Tera computer system.” In International Symposium on Computer Architecture, pages 1–6. ACM, September 1990.

[6] G. M. Amdahl, G. A. Blaauw, and F. P. Brooks Jr. “Architecture of the IBM System/360.” IBM Journal of Research and Development, 8(2):87–101, April 1964.

[7] Arvind and David E. Culler. “Dataflow Architectures.” Technical report, MIT Laboratory for Computer Science, 1986.

[8] Arvind and Robert A. Iannucci. “Two fundamental issues in multiprocessing.” Technical report, Massachusetts Institute of Technology Laboratory for Computer Science, Cambridge, Massachusetts, May 1987.

[9] Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. “I-structures: Data structures for parallel computing.” Computation Structures Group Memo 269, Massachusetts Institute of Technology Laboratory for Computer Science, Cambridge, Massachusetts, February 1987.

[10] C. Gordon Bell and Allen Newell. Computer Structures: Readings and Examples, chapter The DEC PDP-10, pages 43–45. McGraw-Hill, 1971.

[11] A. D. Berenbaum, D. R. Ditzel, and H. R. McLellan. “Architectural innovations in the CRISP microprocessor.” In CompCon ’87 Proceedings, pages 91–95. IEEE, January 1987.

[12] David G. Bradlee, Susan J. Eggers, and Robert R. Henry. “The effect on RISC performance of register set size and structure vs. code generation strategy.” In International Symposium on Computer Architecture, pages 330–339. IEEE, May 1991.

[13] Brian Case. “Low-end PA7100LC adds dual integer ALUs.” Microprocessor Forum, pages 9–13, November 1992.

[14] Brian Case. “Sparc V9 adds wealth of new features.” Microprocessor Report, 7(2):5–11, February 1993.

[15] David Chaiken. Personal communication, August 1993.

[16] G. J. Chaitin et al. “Register allocation via graph coloring.” Computer Languages, 6:47–57, 1981.

[17] F. C. Chow and J. L. Hennessy. “Register allocation by priority-based coloring.” In Proceedings of the SIGPLAN ’84 Symposium on Compiler Construction, pages 222–232. ACM, 1984.

[18] Paul Chow, editor. The MIPS-X RISC Microprocessor. Kluwer Academic Publishers, 1989.

[19] R. W. Cook and M. J. Flynn. “System design of a dynamic microprocessor.” IEEE Transactions on Computers, C-19(3), March 1970.

[20] David E. Culler and Arvind. “Resource requirements of dataflow programs.” In International Symposium on Computer Architecture, pages 141–150. IEEE, 1988.

[21] David E. Culler, Anurag Sah, Klaus Erik Schauser, Thorsten von Eicken, and John Wawrzynek. “Fine-grain parallelism with minimal hardware support: A compiler-controlled threaded abstract machine.” In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 164–175. ACM, April 1991.

[22] William J. Dally et al. “Architecture of a Message-Driven Processor.” In Proceedings of the 14th International Symposium on Computer Architecture, pages 189–205. IEEE, June 1987.

[23] William J. Dally et al. “The Message-Driven Processor: A multicomputer processing node with efficient mechanisms.” IEEE Micro, 12(2):23–39, April 1992.

[24] Keith Diefendorff and Michael Allen. “Organization of the Motorola 88110 superscalar RISC microprocessor.” IEEE Micro, 12(2):40–63, April 1992.

[25] David R. Ditzel and H. R. McLellan. “Register allocation for free: The C Machine stack cache.” In Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems, pages 48–56, March 1982.

[26] Richard J. Eickemeyer and Janak H. Patel. “Performance evaluation of multiple register sets.” In International Symposium on Computer Architecture, pages 264–271. ACM, 1987.

[27] Richard J. Eickemeyer and Janak H. Patel. “Performance evaluation of on-chip register and cache organizations.” In International Symposium on Computer Architecture, pages 64–72. IEEE, 1988.

[28] James R. Goodman and Wei-Chung Hsu. “On the use of registers vs. cache to minimize memory traffic.” In 13th Annual Symposium on Computer Architecture, pages 375–383. IEEE, June 1986.

[29] Anoop Gupta and Wolf-Dietrich Weber. “Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: Preliminary results.” In Proceedings of the 16th Annual Symposium on Computer Architecture, pages 273–280. IEEE, May 1989. Multithreaded architecture.

[30] Linley Gwennap. “Pentium approaches RISC performance.” Microprocessor Report, 7(4):2–17, March 1993.

[31] D. Halbert and P. Kessler. “Windows of overlapping register frames.” In CS 292R Final Reports, pages 82–100. University of California at Berkeley, 1980.

[32] E. A. Hauck and B. A. Dent. “Burroughs’ B6500/7500 stack mechanism.” In AFIPS SJCC, pages 245–251, 1968.

[33] Raymond A. Heald and John C. Holst. “6ns cycle 256kb cache memory and memory management unit.” In International Solid-State Circuits Conference, pages 88–89. IEEE, 1993.

[34] John L. Hennessy. “VLSI processor architecture.” IEEE Transactions on Computers, C-33(12), December 1984.

[35] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, 1990.

[36] Yasuo Hidaka, Hanpei Koike, and Hidehiko Tanaka. “Multiple threads in cyclic register windows.” In International Symposium on Computer Architecture, pages 131–142. IEEE, May 1993.

[37] Kirk Holden and Steve McMahan. “Integrated memory management for the MC68030.” In International Conference on Computer Design, pages 586–589. IEEE, October 1987.

[38] Waldemar Horwat. “Concurrent Smalltalk on the Message-Driven Processor.” Master’s thesis, MIT, May 1989.

[39] Waldemar Horwat, Andrew Chien, and William J. Dally. “Experience with CST: Programming and implementation.” In Proceedings of the ACM SIGPLAN ’89 Conference on Programming Language Design and Implementation, pages 101–109, 1989.

[40] Miquel Huguet and Tomas Lang. “Architectural support for reduced register saving/restoring in single-window register files.” ACM Transactions on Computer Systems, 9(1):66–97, February 1991.

[41] Robert Iannucci. “Toward a dataflow/von Neumann hybrid architecture.” In International Symposium on Computer Architecture, pages 131–140. IEEE, 1988.

[42] Robert A. Iannucci. “A dataflow/von Neumann hybrid architecture.” Technical Report TR-418, Massachusetts Institute of Technology Laboratory for Computer Science, May 1988.

[43] Gordon Irlam. Spa - A SPARC performance analysis package. [email protected], Wynn Vale, 5127, Australia, 1.0 edition, October 1991.

[44] J. Ziv and A. Lempel. “A universal algorithm for sequential data compression.” IEEE Transactions on Information Theory, 23(3):337–343, 1977.

[45] Chris Joerg. Personal communication, July 1993.

[46] Norman Jouppi. “Cache write policies and performance.” Technical Report 91/12, DEC Western Research Labs, Palo Alto, CA, December 1991.

[47] Robert H. Halstead Jr. and Tetsuya Fujita. “MASA: A multithreaded processor architecture for parallel symbolic computing.” In 15th Annual Symposium on Computer Architecture, pages 443–451. IEEE Computer Society, May 1988.

[48] Robert H. Halstead Jr., Eric Mohr, and David A. Kranz. “Lazy task creation: A technique for increasing the granularity of parallel programs.” IEEE Transactions on Parallel and Distributed Systems, July 1991.

[49] David Keppel. “Register windows and user-space threads on the Sparc.” Technical Report 91-08-01, University of Washington, Seattle, WA, August 1991.

[50] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice-Hall Inc., 1978.

[51] Tokuzo Kiyohara, Scott Mahlke, William Chen, Roger Bringmann, Richard Hank, Sadun Anik, and Wen-mei W. Hwu. “Register Connection: A new approach to adding registers into instruction set architectures.” In International Symposium on Computer Architecture, pages 247–256. IEEE, May 1993.

[52] David A. Kranz, Robert H. Halstead, and Eric Mohr. “Mul-T: A high-performance parallel Lisp.” In SIGPLAN ’89 Symposium on Programming Language Design and Implementation, pages 1–10, June 1989.

[53] James R. Larus and Paul N. Hilfinger. “Detecting conflicts between structure accesses.” In Proceedings of the SIGPLAN ’88 Conference on Programming Language Design and Implementation, pages 22–34. ACM, June 1988.

[54] James Laudon, Anoop Gupta, and Mark Horowitz. “Architectural and implementation tradeoffs in the design of multiple-context processors.” Technical Report CSL-TR-92-523, Stanford University, May 1992.

[55] Beng-Hong Lim and Anant Agarwal. “Waiting algorithms for synchronization in large-scale multiprocessors.” VLSI Memo 91-632, MIT Lab for Computer Science, Cambridge, MA, February 1992.

[56] D. R. Miller and D. J. Quammen. “Exploiting large register sets.” Microprocessors and Microsystems, 14(6):333–340, July/August 1990.

[57] Sunil Mirapuri, Michael Woodacre, and Nader Vasseghi. “The MIPS R4000 processor.” IEEE Micro, 12(2):10–22, April 1992.

[58] L. W. Nagel. “SPICE2: A computer program to simulate semiconductor circuits.” Technical Report ERL-M520, University of California at Berkeley, May 1975.

[59] Rishiyur S. Nikhil and Arvind. “Can dataflow subsume von Neumann computing?” In International Symposium on Computer Architecture, pages 262–272. ACM, June 1989.

[60] Rishiyur S. Nikhil and Arvind. “Id: A language with implicit parallelism.” Technical Report 305, Computation Structures Group, MIT, Cambridge, MA 02139, 1991.

[61] Daniel Nussbaum. Personal communication, August 1993.

[62] Peter R. Nuth and William J. Dally. “The J-Machine network.” In Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors, pages 420–423. IEEE, October 1992.

[63] Amos R. Omondi. “Design of a high performance instruction pipeline.” Computer Systems Science and Engineering, 6(1):13–29, January 1991.

[64] John F. Palmer. “The NCUBE family of parallel supercomputers.” In Proc. IEEE International Conference on Computer Design, page 107. IEEE, 1986.

[65] Gregory M. Papadopoulos and David E. Culler. “Monsoon: An explicit token-store architecture.” In The 17th Annual International Symposium on Computer Architecture, pages 82–91. IEEE, 1990.

[66] David A. Patterson. “Reduced instruction set computers.” Communications of the ACM, 28(1):8–21, January 1985.

[67] David A. Patterson and Carlo H. Sequin. “RISC I: A reduced instruction set VLSI computer.” In Proceedings of the 8th Annual Symposium on Computer Architecture, pages 443–457. IEEE, May 1981.

[68] G. Radin. “The 801 minicomputer.” In Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems, pages 39–47, March 1982.

[69] Edward Rothberg, Jaswinder Pal Singh, and Anoop Gupta. “Working sets, cache sizes, and node granularity issues for large-scale multiprocessors.” In International Symposium on Computer Architecture, pages 14–25. IEEE, May 1993.

[70] Gordon Russell and Paul Shaw. “A stack-based register set.” University of Strathclyde, Glasgow, May 1993.

[71] Richard M. Russell. “The CRAY-1 computer system.” Communications of the ACM, 21(1):63–72, January 1978. Author with Cray Research, Inc.

[72] Klaus E. Schauser, David E. Culler, and Thorsten von Eicken. “Compiler-controlled multithreading for lenient parallel languages.” UCB/CSD 91/640, University of California at Berkeley, July 1991.

[73] Charles L. Seitz. “Mosaic C: An experimental fine-grain multicomputer.” INRIA talk in Paris, December 1992.

[74] Jaswinder Pal Singh, Wolf-Dietrich Weber, and Anoop Gupta. “SPLASH: Stanford parallel applications for shared-memory.” Technical Report CSL-TR-91-469, Stanford University, April 1991.

[75] Richard L. Sites. “How to use 1000 registers.” In Caltech Conference on VLSI, pages 527–532. Caltech Computer Science Dept., 1979.

[76] Burton J. Smith. “Architecture and applications of the HEP multiprocessor computer system.” In SPIE Vol. 298 Real-Time Signal Processing IV, pages 241–248. Denelcor, Inc., Aurora, CO, 1981.

[77] V. Soundararajan. “Dribble-Back registers: A technique for latency tolerance in multiprocessors.” B.S. thesis, MIT EECS, June 1992.

[78] Peter Steenkiste. “Lisp on a reduced-instruction-set processor: Characterization and optimization.” Technical Report CSL-TR-87-324, Stanford University, March 1987.

[79] Sun Microsystems. The SPARC Architecture Manual, v8 #800-1399-09 edition, August 1989.

[80] C. P. Thacker, E. M. McCreight, B. W. Lampson, R. F. Sproull, and D. R. Boggs. Computer Structures: Principles and Examples, chapter Alto: A Personal Computer, pages 549–572. McGraw-Hill, 1982.

[81] Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, October 1991.

[82] J. E. Thornton. Design of a Computer: The Control Data 6600. Scott, Foresman & Co., Glenview, IL, 1970.

[83] Thorsten von Eicken, David Culler, Seth Goldstein, and Klaus Schauser. “Active Messages: A mechanism for integrated communication and computation.” In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 256–266. IEEE, 1992.

[84] Thorsten von Eicken, Klaus E. Schauser, and David E. Culler. “TL0: An implementation of the TAM threaded abstract machine, version 2.1.” Technical report, University of California at Berkeley, 1991.

[85] Carl A. Waldspurger and William E. Weihl. “Register Relocation: Flexible contexts for multithreading.” In International Symposium on Computer Architecture, pages 120–129. IEEE, May 1993.

[86] David W. Wall. “Global register allocation at link time.” In Proceedings of the ACM SIGPLAN ’86 Symposium on Compiler Construction, 1986.