Virtual Global Communication

by

Daniel J. Ernst

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2005

Doctoral Committee:
Todd M. Austin, Chair
Trevor Mudge
David Blaauw
Dennis Sylvester

© Daniel J. Ernst 2005
All Rights Reserved

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Todd Austin. His unparalleled creativity helped spawn many of these ideas, and his boundless enthusiasm was a driving force in getting them investigated. Since taking me as his student, his good real-world advice and his endless patience have helped me develop into a better researcher and a better person.

In addition, I thank the other members of my dissertation committee: Trevor, David, and Dennis. Your comments, suggestions, and ideas have solidified much of this work in very concrete ways. During my time as a graduate student, you’ve all greatly aided my understanding of a great many topics that I came asking about.

Over the course of my 5 years at Michigan, I’ve had the pleasure of sharing projects, meetings, and offices with many other very talented people. In particular, I would like to thank Andrew Hamel for his hard work on Cyclone and the many outstanding students and faculty who put their efforts into the Razor project and prototype. Also, I thank Chris Weaver, Matt Guthaus, and Shidhartha Das for graciously putting up with my endless requests for help with the plethora of CAD tools used in my studies. Of course, the culture of an office goes far beyond work, and I’m extremely grateful to have had people like Eric Larson, Chris Weaver, Rajeev Krishna, Pat Cassleman, Steve Raasch, Dave Greene, Fadi Gebara, Steve Martin, Leyla Nazhandali, and countless others around to discuss important topics like football, video games, and the latest conference gossip.

I have had the fortune to receive funding from several generous organizations, including Intel, the NSF, GSRC, and DARPA. Their support played an important role in the completion of this work.

I thank my parents, Charles and Patricia Ernst. Their dedication to education and to their family has instilled in me an understanding of the tremendous value of both, and I have yet to find a better set of mentors in either.

Finally, I thank my wife, Beth. Her love and support have been unwavering and invaluable throughout this process. Yes, you can finally stop asking, “are you done yet?”

Table of Contents

Acknowledgements
List of Figures
List of Tables
Abstract
CHAPTER 1: INTRODUCTION
  1.1 Motivation and Goals
  1.2 Dynamic Scheduling
  1.3 Control Signals
  1.4 Contributions
  1.5 Overview of the Thesis
CHAPTER 2: BACKGROUND AND RELATED WORK
  2.1 High-Performance Dynamic Scheduling
  2.2 Long Wires
CHAPTER 3: EVALUATION METHODOLOGY
  3.1 Architectural Evaluation
  3.2 Circuit Analysis Methodology
  3.3 Area Estimate Methodology
  3.4 Evaluating Trade-Offs
CHAPTER 4: REDUCING COMPLEXITY IN DYNAMIC INSTRUCTION SCHEDULING
  4.1 Tag Elimination
  4.2 Cyclone
  4.3 Summary
CHAPTER 5: VIRTUAL GLOBAL CONTROL
  5.1 Sending Control Signals Over Multiple Cycles
  5.2 Applying Virtual Global Control to the Razor Prototype Chip
  5.3 Summary
CHAPTER 6: CONCLUSION
  6.1 Summary of the Thesis
  6.2 Future Directions
BIBLIOGRAPHY

List of Figures

Figure 1. Delay cost for increasing instruction window size for a 4-issue processor
Figure 2. Reachable Chip Area by Process Generation (taken from Matzke [45])
Figure 3. Conventional Dynamic Scheduler Pipeline Stages
Figure 4. Reservation Station Datapaths and Control
Figure 5. Diagram of relevant parameters in calculating wire delay
Figure 6. IPns Calculation with Clock Cap
Figure 7. Runtime Distribution of Ready Input Operands
Figure 8. Conventional and Reduced-Tag Reservation Stations
Figure 9. GSHARE-Style Last Tag Predictor
Figure 10. Prediction Accuracy of Last Tag Predictors of Various Sizes
Figure 11. Scheduler Pipeline with Last Tag Speculation
Figure 12. Reduced-Tag Reservation Station with Last Tag Speculation
Figure 13. Half-Price Selective Replay Mechanism (figure from [38])
Figure 14. Intel Selective Replay Mechanism
Figure 15. … for Varying Configurations
Figure 16. Scheduling delays for various tag elimination configurations
Figure 17. Instructions Per ns for Varying Configurations
Figure 18. Impact of Tag-Reduction for Varying Window Sizes
Figure 19. Energy-Delay Product for Varying Configurations
Figure 20. Effect of Selective Replay on Reduced-Tag Schedulers
Figure 21. Effect of Reduced Scheduler Pressure
Figure 22. Compile-time vs. Run-time Instruction Scheduling
Figure 23. Cyclone Scheduler Architecture
Figure 24. Pre-scheduler Design and Example
Figure 25. Switchback Logic
Figure 26. IPC effects of Cyclone optimizations
Figure 27. Performance and area for the 4-wide scheduler design space
Figure 28. Performance and area for the 8-wide scheduler design space
Figure 29. Performance and area overview
Figure 30. Pipeline stalling using global …
Figure 31. Timing diagram for pipelined stalls
Figure 32. “Off-Ramps” for delayed stall signals
Figure 33. Mechanism for VGC micro-rollback
Figure 34. Rollback register design 1
Figure 35. Rollback register design 2
Figure 36. Pipeline recovery using counterflow pipelining
Figure 37. Pipeline augmented with Razor latches and control lines
Figure 38. Timing example of Razor operation
Figure 39. Razor prototype chip and vital statistics
Figure 40. Relative IPC for Various Razor Error Signal Designs
Figure 41. Relative Energy Consumption for Various Razor Error Signal Designs
Figure 42. Relative IPC for various branch flush penalties with not taken branch prediction
Figure 43. Relative IPC for various branch flush penalties with advanced branch prediction
Figure 44. Relative IPns for various branch flush penalties with not taken branch prediction
Figure 45. Relative IPns for various branch flush penalties with advanced branch prediction
Figure 46. Broadcast-free dynamic scheduling element breakdown

List of Tables

Table 1. Key SPICE device characteristics
Table 2. Circuit Characteristics of Studied Scheduler Configurations
Table 3. Critical path latencies calculated with SPICE for different scheduler configurations
Table 4. Component breakdown for scheduler and area
Table 5. Power Consumption of registers with micro rollback

ABSTRACT

Virtual Global Communication

by

Daniel J. Ernst

Chair: Todd M. Austin

As process technologies continue to improve, the number of transistors a designer can put on a chip is growing tremendously. Many techniques have been proposed by architects to use this extra space. However, many of the approaches make heavily-loaded broadcasts and long-distance wires a necessity. With the continued march into deep sub-micron (DSM) technologies, the challenge to architects is to design systems that operate efficiently in a world where global communication is expensive, due to both slower circuits and higher power consumption.

In response to this challenge, many architecture researchers have abandoned traditional architectures in an attempt to create new design paradigms from scratch that are based on only local communication. While these ideas have the possibility of avoiding communication complexity almost completely, they come at the momentous cost of redesigning every aspect of the computer, from chip units to compiler.

The goal of this work is to find and evaluate complexity-effective alternatives to global communication mechanisms in current-generation processors. Secondly, it is my goal to design these alternatives in such a way that they do not radically change the basic architecture of the machine. I call this design goal Virtual Global Communication.

In this dissertation, I examine several specific communication bottlenecks. I then present possible solutions to reduce their communication complexity, remove them from critical timing paths, and reduce power consumption.

I first explore the design space of dynamic instruction scheduling logic. I present a tag elimination mechanism and the Cyclone scheduler as methods for reducing and removing complex broadcast signals, respectively. Tag elimination is shown to reduce broadcast loading by up to 75%, while Cyclone is shown to provide competitive performance with a very fast broadcast-free circuit on a small chip footprint.

I also study some high-complexity global control signals. Micro rollback and counterflow flush techniques are presented to provide mechanisms for allowing global signals to be pipelined effectively. Counterflow flush is shown to be a low-overhead method of performing flushes using only local or constrained signals. Finally, I show that micro rollback is a viable architectural method to allow stall signals to be pipelined, with only a 2-3% power and area overhead.

CHAPTER 1

INTRODUCTION

1.1 Motivation and Goals

In an effort to increase the performance of microprocessors, designers have taken several different approaches. Processors are being made wider and more speculative to take advantage of more instruction-level parallelism (ILP). Pipelines are also being made deeper [26][32][70] to maximize the frequency at which chips can be run. These improvements, largely enabled by advances in process technology, have kept alive the widely-known “Moore’s Law” of processor advancement.

However, all of these innovations have made processors much more complicated. ILP improvements come with complex mechanisms, such as those used for prediction recovery. Deeper pipelining exposes large and slow structures, which often need to be broken up in non-intuitive and inefficient ways.

In addition, these two advancement directions may work against each other. If the ILP improvement needs a large structure, it may cause difficulties in pipelining. Alternatively, to increase pipeline depth, many structures that work for ILP must be broken up or reduced in size.

Figure 1. Delay cost for increasing instruction window size for a 4-issue processor (x-axis: delay in ps, 0-900; y-axis: reservation station entries, 16-128)

In dynamic instruction scheduling, for example, a simple ILP improvement can be gained by increasing the number of reservation stations available. Figure 1 shows, however, that adding more of these entries increases the delay of the structure significantly, which will make it very difficult to include as a single stage in a super-pipelined design with very short clock cycles. The simplest alternative, breaking the dynamic scheduling logic into multiple cycles, greatly reduces the effectiveness of the schedule produced. As has been studied in previous work [74], this break-up hurts performance by an average of 15%.

In a similar vein, increasing pipeline depth to drive up clock frequency also has an adverse effect on chip control signals. As shown in Figure 2 (borrowed from [45]), the distance a signal can travel across a chip becomes severely limited the shorter the clock cycle becomes. This distance is squeezed by the poor RC delay scaling [45] of longer wires. In very aggressive designs, it is possible that the signal’s time-of-flight [79] (its velocity through the material) may be a factor as well. In addition, many of these long signals carry specific challenges to pipelining that may result in steep penalties.

Figure 2. Reachable Chip Area by Process Generation (taken from Matzke [45])

The increased complexity of processors creates several problems, including lower chip yield, longer design time, and a much more difficult verification effort. One of the biggest of these is communication complexity, which brings about both slower circuits and increased power consumption as processes shrink and structures grow.

Communication complexity comes from this need for certain chip signals to travel long distances and communicate with a large number of recipients. These types of signals can cause problems because their highly capacitive loads are difficult to drive quickly and they require large amounts of energy to transition in a timely manner.

In addition, scaling improvements in process technology have introduced conditions that exacerbate the communication complexity problem. In particular, the delay of wire elements, which used to be insignificant compared to logic delays, is not improving at the same rate as logic. Large structures and broadcast signals are becoming exponentially slower due to increasing wire capacitance. This trend towards wire-delay dominance has been noted by several researchers [2][59]. As further evidence, a modern system, the Pentium 4, already has pipeline stages just for wire delays. Because of these scaling effects, it now takes a much longer time and more power to send a signal across several stages of a chip, or to a large group of recipients.

The longest wires are often seen in global processor control mechanisms, such as speculation recovery, dynamic scheduling, and clocking. These instances of communication complexity usually occur where signals need to be sent a long distance across multiple stages, or where a signal has multiple recipient circuits (and therefore, a highly capacitive load).

1.2 Dynamic Scheduling

In an effort to secure higher levels of system performance, designs often employ dynamic scheduling as a technique to extract instruction-level parallelism (ILP) from serial instruction streams. Conventional dynamic scheduler designs house a “window” of candidate instructions from which ready instructions are sent to functional units in an out-of-order data flow fashion. The instruction window is implemented using large monolithic content addressable memories (CAMs) that track instructions and their input dependencies.

While more ILP can be extracted with a larger instruction window (and accordingly a larger CAM structure), this increased parallelism will come at the expense of a slower scheduler clock speed. Recent circuit-level studies of dynamic scheduler logic have shown that the scheduler CAM logic will dominate the latency of the structure [59], and as such, window sizes cannot be increased without commensurate increases in scheduler operation latency. More recent studies [2] also suggest that increasing wire latencies due to parasitic capacitance effects may make these trade-offs even more acute, with future designs seeing little benefit from smaller technologies. The optimal design is dependent on both the degree to which ILP can be harvested from the workload and the circuit characteristics of the technology used to implement the scheduler.

In addition to performance, power dissipation has become an increasing concern in the design of high-performance microprocessors. Increasing clock speeds and diminishing voltage margins have combined to produce designs that are increasingly difficult to cool. Additionally, embedded processors are more sensitive to energy usage, as these designs are often powered by batteries. Empirical [23][44] and analytical [10][21] studies have shown that the scheduler logic consumes a large portion of a microprocessor’s power and energy budgets, making the scheduler a prime target for power optimizations. For example, the scheduler components of the Pentium Pro microarchitecture consume 16% of total chip power. A similar study for Compaq’s Alpha 21264 microprocessor found that 18% of total chip power was consumed by the scheduler. Increasing window sizes and parasitic capacitances will continue to shift more of the power budget towards the scheduler.

1.3 Control Signals

Some of the longest wire communications in processors occur in recovery logic, which is responsible for correcting the repercussions of misspeculations throughout the depth of the pipeline. Flush logic removes or invalidates instructions from the pipeline that the processor speculated on incorrectly. The most obvious example of flush logic is the pipeline flush that occurs when a branch instruction is mispredicted. In this case, all instructions that follow the branch must be invalidated, and fetching resumed at the correct location. Stall logic causes instructions in the pipeline to stop their movement to the next stage so that some dependency further ahead in the pipeline can take extra time to resolve. A simple example of this is the “Load-Use” stall from the standard DLX pipeline [27].

The high complexity of recovery logic comes from the need to tell many stages of the pipeline that some event has occurred. Long wires are needed to distribute this information throughout the chip. Further, the loading at the end of these wires is not small, due to the need to send the signal to several different locations. These two factors combine to make the broadcast of recovery signals very slow.

1.4 Contributions

It is my goal to find and evaluate complexity-effective alternatives to the global control mechanisms which have high communication complexity. Secondly, it is my goal to design these alternatives in such a way that they do not radically change the basic architecture of the machine. I call this design goal Virtual Global Communication. In this work, I will address two different specific communication complexity problems: the high-capacitance multiple-recipient (broadcast) signals present in dynamic instruction scheduling, and the long-distance wire communications necessary for stall and flush signals.

1.4.1 Lowering Complexity in Dynamic Scheduling

Dynamic instruction scheduling logic is a common instance of broadcast communication problems. First, Tag Elimination [17] is presented as a method for reducing the load on high-capacitance scheduler wakeup broadcast busses. This technique takes advantage of the fact that most scheduler bus entries are wasted on operands that are no longer in flight. Using Tag Elimination, the capacitance of the wakeup bus can be reduced by up to 75%, resulting in large benefits for both speed and energy consumption.

Second, the Cyclone scheduler [18], a broadcast-free scheduler based on latency prediction, is presented. This design eschews broadcast busses and instead relies upon neighbor-to-neighbor communication between instructions to route them into their correct issue slots. The approach also includes in-line support for memory scheduling and selective scheduler replay. The Cyclone approach, while somewhat reducing the IPC of the processor, allows much higher clock speeds and uses far less area.

1.4.2 Lowering Complexity in Global Control Signals

To address some common long-wire problems, I demonstrate architectural methods to pipeline various different control signals. First, I present micro rollback as a method to break the single-cycle constraint on processor stall and error signals. By retaining some processor state for an extra cycle or two, this technique can allow long-distance stall signals extra time to propagate across a chip, while keeping the general operation of the processor the same.

Second, I describe a counterflow-style flushing mechanism, which streamlines recovery from processor mispredictions. Instead of requiring flush signals to travel across the chip immediately, this method allows the signal to propagate stage-by-stage through the chip, reducing timing constraints.

The culmination of all of these types of optimizations is a concept I call Virtual Global Control. The idea behind VGC is for a design to emulate the same operation as a processor that includes global control signals, but to do so using only shorter module-to-module connections.

The advantage of this kind of design is two-fold. First, the long, slow, power-hungry wires can be removed. Second, those elements are removed while keeping the processor design logically very similar to a baseline processor. The approach can therefore be very simple to include in a design, and it avoids the problems introduced by changing the architecture drastically, as is done in projects such as TRIPS [56][64] or WaveScalar [75].

1.5 Overview of the Thesis

This thesis contains six chapters. In Chapter 2, I will give background information related to dynamic scheduling and low-complexity mechanisms, and will also discuss relevant related work. In Chapter 3, I will describe the methodologies used to examine the wide array of factors encountered in this design space. In Chapter 4, I will describe two methods of reducing communication complexity in dynamic scheduling: tag elimination and the Cyclone design. In Chapter 5, I will describe two techniques for removing long single-cycle control wires in recovery mechanisms: counterflow flush and micro rollback. Finally, Chapter 6 summarizes the thesis and provides directions for future work.

CHAPTER 2

BACKGROUND AND RELATED WORK

The study of communication complexity is not a new one. In 1997, a study [59][60] by Palacharla, Jouppi, and Smith examined the effect of scaled fabrication processes on the cost of communication. Their conclusion was that long wires and large loads were going to become much more of a design factor than gate delays as processes dropped past 0.25µm and into the “deep sub-micron” range. In 2000, Agarwal et al. [2] studied the scaling effects that architects will have to address as future sub-micron technologies take over. One of their largest findings was that large increases in wire resistance and capacitance would make current-era micro-architectures infeasible due to a slowdown on long communication paths. The impact of global wire scaling has also been the subject of several other papers [78][31].

The processor stage that Palacharla et al. highlighted as a primary bottleneck exacerbated by these scaling issues was the dynamic instruction scheduler. In this Chapter, I will discuss the standard implementation of this structure and the problems created by its nature, and examine work done by others to address these problems.

Figure 3. Conventional Dynamic Scheduler Pipeline Stages (ALLOC, SCHED, REG, EX/MEM; the RS allocator fills a reservation station heap, selected instructions read the register file and access the D-cache, and cache misses signal back to the scheduler)

As Agarwal and others have discussed, long communication paths scale poorly into smaller technologies. Later in this Chapter, I will discuss some of the long paths that fall into this category, and mention some currently proposed solutions.

2.1 High-Performance Dynamic Scheduling

2.1.1 Scheduler Pipeline Overview

Figure 3 details the pipeline stages used to implement a high-performance dynamic scheduler. The first stage, the allocator (ALLOC), is responsible for reserving all resources necessary to house an instruction in the processor instruction window. These resources include reservation stations, re-order buffer entries, and physical registers. Physical registers and re-order buffer entries typically use a FIFO allocation strategy in which the resources are allocated from a circular hardware queue.¹ This approach works well because resources are allocated in program order in ALLOC and held until instruction commit time, where the resources are released in program order. Reservation stations, while allocated in program order, may be released as soon as an instruction has begun execution (or, in a pipeline with replay/re-execution support, as soon as an instruction has begun execution for the last time). As such, a heap-style allocation strategy will result in more efficient use of reservation station resources than can be attained using a FIFO allocation policy.

¹ While it is conceivable that physical registers could delay allocation until writeback, i.e. when the resource is needed to store the result, most designs avoid any type of out-of-order allocation because it introduces many deadlock scenarios. A good treatment of out-of-order register allocation and its potential hazards can be found in [22].

Figure 4. Reservation Station Datapaths and Control. Critical path shown with dashed line.

The scheduler stage (SCHED) houses instructions in reservation stations until they are ready to execute. Reservation stations track the availability of instruction source operands. When all input operands are available, a request is made to the select logic for execution. The selection logic chooses the next instructions to execute from all ready instructions, based on the scheduler policy. The selected instructions receive a grant signal from the selection logic, at which point they will be sent forward to later stages in the pipeline.

Once granted execution, an instruction’s source register tags are used to access the register file in the register read (REG) stage of the pipeline. In the following stage, operand values read from the register file are forwarded to the appropriate functional unit in the execute stage (EX/MEM) of the pipeline. If a dependent operation immediately follows an instruction, it will read a stale value from the physical register file; a bypass MUX is therefore provided in the EX/MEM stage to select between the incoming register operand and a more recent value on the bypass bus. Dependent instructions that execute in subsequent cycles must communicate via the bypass bus. All other instructions communicate by way of the physical register file.

2.1.2 Reservation Stations

Figure 4 illustrates the datapaths and control logic contained in each reservation station. Each new instruction is placed into a reservation station by the allocator. If an instruction’s input operand has already been computed (or if the operand is not used by the instruction), the ready bit for that operand is set as valid. If the operand has not yet been computed, a unique tag for the value is placed into the corresponding source operand tag field, either src1 or src2 depending on which instruction operand is being processed. Since all input operands are renamed to physical storage, the physical register index suffices as a unique tag for each value in flight within the instruction window. Unlike most textbook descriptions of Tomasulo’s algorithm [27], most modern processors, such as the Alpha 21264 [35] and Pentium 4, use value-less reservation stations. Instead of storing instruction operand values and opcodes in the CAM structure, which would require longer tag busses, these designs keep this data in the REG stage where it can be accessed on the way to execution.

When instructions are nearing the completion of their execution, they broadcast their result tag onto the result tag bus. Reservation stations snoop the result tag bus, waiting for a tag to appear that matches either of their source operand tags. If a match is found, the ready bit of the matching operand tag is set. When a valid reservation station has both operands marked ready, a request for execution is sent to the selection logic. The selection logic grants the execution request if the appropriate functional unit is available and the requesting instruction has the highest priority among instructions that are ready to execute.

Policies for determining the highest priority instruction vary. Some proposed approaches include random [59], oldest-first [59], and highest-latency first [73]. However, more capable schedulers require more complex logic and thus run more slowly.

The selection logic sends an instruction to execution by driving its grant signal. The input operand tags are driven onto an output bus where they are latched for use by the REG stage in the following cycle. In addition, the grant signal is latched at the reservation station. In the following cycle, the instruction will drive its result tag onto the result tag bus. If the execution pipeline supports multi-cycle operations, the result tag broadcast must be delayed until the instruction result is produced. This can be implemented by inserting a small delay element into the grant latch, such as a small counter. If the execution time for an instruction is non-deterministic, such as for a memory operation, the scheduler can optimistically predict that the latency will be the most common case, e.g. it predicts that all loads will hit in the data cache. If an instruction’s latency is mispredicted, e.g. a load missed in the cache, dependent instructions were scheduled too soon and must be rescheduled to execute after the operation completes. This rescheduling is sometimes called a scheduler replay [84].
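To make the wakeup and select loop concrete, the following Python sketch models the behavior just described at an architectural level. It is an illustrative model only, not the circuit or any simulator code from this thesis; the class and function names are invented for this example, single-cycle execution latency is assumed, and functional unit availability and replay are elided.

```python
# Behavioral sketch of CAM-style wakeup/select (illustrative only).
# Each reservation station snoops the result tag bus; when both source
# operands are marked ready, it requests execution from the select logic.

class ReservationStation:
    def __init__(self, dst_tag, src_tags):
        self.dst_tag = dst_tag
        self.src_tags = src_tags
        # A source tag of None means the operand is already available
        # (or unused), so its ready bit starts set.
        self.ready = [tag is None for tag in src_tags]

    def snoop(self, result_tags):
        # Wakeup: compare broadcast result tags against both source tags.
        for i, tag in enumerate(self.src_tags):
            if tag in result_tags:
                self.ready[i] = True

    def requesting(self):
        return all(self.ready)

def select(window, issue_width):
    # Select logic: grant up to issue_width ready instructions.
    # Oldest-first priority is modeled here by window order; random or
    # highest-latency-first policies would reorder the candidates.
    ready = [rs for rs in window if rs.requesting()]
    return ready[:issue_width]

def scheduler_cycle(window, issue_width):
    # One scheduler loop iteration: granted instructions broadcast their
    # result tags, waking dependents so they can issue back-to-back.
    granted = select(window, issue_width)
    result_tags = {rs.dst_tag for rs in granted}
    for rs in window:
        rs.snoop(result_tags)
    for rs in granted:
        window.remove(rs)
    return granted
```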

The reservation station wakeup and select logic forms the control critical path in the dynamically scheduled pipeline [59]. This logic forms a critical speed path in most aggressive designs because it limits the rate at which instructions can begin execution. As shown by the dashed lines in Figure 4, the scheduler critical path includes the result tag driver, the result tag bus interconnect, the reservation station comparators, the selection logic, and the grant signal interconnect. It is possible that the operand tag output busses (src1 and src2) are on the critical path of the control loop; however, in aggressive designs this output can be pipelined or wave-pipelined [58] into subsequent scheduler cycles because the output bus value is not required to initiate the next scheduler loop iteration. As noted by Palacharla et al. [59], the CAM structure formed by the result tag drivers, result tag bus, and comparators constitutes the major portion of the control circuit latency, especially for large windows with many reservation stations.

2.1.3 Reducing Broadcast Structures in Dynamic Scheduling

There have been several efforts to reduce the complexity of dynamic schedulers, which have traditionally been very high-complexity processor components due to expensive signal broadcasts.

Some current designs bank their selection logic by having separate groups of reservation stations for each group of functional units. Each of these station groups has its own, smaller selection network. While result tag broadcasts still need to be sent to all of the reservation stations, the latency for selecting instructions for execution can be reduced. As an added consequence of this optimization, the latency of the wakeup path makes up a higher percentage of the total scheduler delay, making it a prime candidate for implementation with wakeup-specific optimizations.

Palacharla, Jouppi, and Smith, in their study on the effects of process scaling on microprocessor design, proposed a complexity-effective superscalar design [59][60]. Their design uses a set of FIFO queues for dynamic scheduling to reduce complexity and allow for very aggressive clocking. Instructions can only be issued from the front of the queues; instructions are steered into them using dependence information. This approach attempts to maximize the number of ready instructions at the issue boundary, while keeping the issue window small.

Stark, Brown, and Patt have proposed two methods for pipelining wakeup and selection logic, allowing for a faster clock. For their first method [74], each reservation station entry carries its own input tags along with its parent instructions’ input tags in order to allow back-to-back dependent instructions to execute consecutively. They also propose speculating on which parent instruction will finish last, reducing the number of “grandparent” tags that must be stored.

Their second method, select-free logic [11], enables pipelining by allowing all instructions that wake up to broadcast back into the window the following cycle, even though some of them may not be selected for execution.

In their studies on energy-effective issue logic, Folegnani and Gonzalez [21] also made the observation that many comparators in the instruction window are unused and unnecessary. In their low-power scheduler design, tags that are marked ready do not precharge their match lines, resulting in lower comparator power consumption. This approach dynamically reduces the power consumption of the window; however, it does not allow for a faster clock. Gonzalez and Canal [13] also propose a way to reduce the overall complexity of scheduling logic by using an N-Use issue scheme. Their optimization takes advantage of the observation that most instruction output values are ready only once.

Michaud and Seznec [48] proposed a method for reordering instructions as they enter the instruction window. By performing dependence analysis in a pre-schedule stage, they are able to place more usable instructions into the window, increasing its effective size.

Kucuk et al. [39] propose an alternate comparator circuit to reduce energy dissipation in dynamic schedulers. Their optimization, like many of the others, could be used in combination with our own tag elimination for improved energy efficiency.

LeBeck’s WIB scheduler identifies instructions dependent on long-latency operations (data cache misses), and directs these operations to a secondary scheduler [40]. When the long-latency operation nears completion, the dependent operations are dumped en masse into a small CAM-based dynamic scheduling window. Morancho used a similar approach to move dependent operations following long-latency instructions out of the instruction window [50]. Unlike the WIB, they record relative instruction latencies to simplify the re-execution of operations once a valid schedule has been built. By removing the long-latency instructions from the window, these designs reduce the pressure on it, which results in needing fewer entries to maintain the same performance. We utilize a similar approach in our Cyclone [18] replay mechanism. As instructions replay, dependencies between dependent operations are maintained by their spacing in the scheduler queues. Unlike the WIB and Morancho's work, our scheduler is completely broadcast-free. We pick a schedule and fully commit to it for the lifetime of the instruction, using the replay mechanism to accommodate any incorrectly scheduled instructions.

Raasch’s segmented instruction queue [61] utilizes coarse-grained timing information to direct instructions to a sequence of small CAM-based instruction windows. As instructions near execution, they move to instruction queues closer to functional unit results. Their solution therefore allows for extremely large instruction windows, while keeping the complexity at a fairly constant value, based on the size of the smaller CAM-based windows.

Various groups have proposed Matrix [25] (or One-Hot [20]) scheduling. Instead of comparing operand “tags”, these designs directly link producer and consumer instructions through a dependence vector. While this structure is more efficient for small instruction windows, it scales poorly as issue queue size grows. The area of the matrix schemes scales quadratically with window size, as opposed to linearly for CAM-based windows. The large area cost makes large matrix windows slower due to long wire distances.
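The contrast in growth rates can be made concrete with a toy calculation; the unit constants below are arbitrary placeholders, and only the linear-versus-quadratic trend is meaningful.

```python
# Toy comparison of scheduler area growth (arbitrary unit constants;
# only the linear vs. quadratic trend is the point).
def cam_area(window):
    # Tag storage and comparators grow linearly with window size.
    return 10 * window

def matrix_area(window):
    # A dependence matrix holds window x window bits.
    return window * window

for w in (16, 32, 64, 128):
    print(f"window {w:>3}: CAM ~{cam_area(w):>5}  matrix ~{matrix_area(w):>6}")
```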

The use of decentralized dependence analysis, and therefore much lower communication complexity, is the central idea of the counterflow pipeline [72][71]. In the counterflow pipeline, instructions and data flow in opposite directions on circular queues. When instructions that are waiting to execute pass input operands, the data needed to execute is captured by the dependent instruction. Once an instruction has captured all its operands, it continues to cycle until it locates an appropriate functional unit, at which time it leaves the queue to begin execution. While this architecture can be implemented in an extremely low-complexity manner, the lack of any actual scheduling of instructions causes a lot of instruction throughput to be lost in the chaos.

2.2 Long Wires

There have been several studies examining the effect of deep sub-micron scaling on the properties of wires. Sylvester and Keutzer did several studies [76][77][78] which concluded that, while local intra-module wires should scale well into deep sub-micron, global wires that must travel across many modules on the chip will have more difficulty. Their work was backed up by Horowitz [31], who suggested that architects look into more modular designs to remove long global wires.

The problem that these studies highlight comes down to the way that wires have typically been constructed as processes shrink. Figure 5 shows the important parameters for finding the RC delay of wire elements.

The delay of a wire can be estimated in a general sense by multiplying its resistance (R) and its capacitance (C). The resulting RC number is directly proportional to the delay of the wire. The R and C values can be estimated very roughly by the following formulas, given the values of the parameters shown in Figure 5.

R = ρL / (tW)    (1)

C = 2 (εLt/h + εLW/v)    (2)
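As a rough numeric illustration of the RC product, the sketch below applies it to wires of several lengths using the per-unit-length Rmetal and Cmetal values that appear later in Table 1 for the 0.18µm process. The script and its framing are illustrative additions, not part of the original analysis.

```python
# Rough lumped-RC estimate for wires in the 0.18um process, using the
# per-unit-length values from Table 1 (Chapter 3). Illustrative only.
R_PER_UM = 0.239      # ohms per um (Rmetal, Table 1)
C_PER_UM = 1.82e-15   # farads per um (Cmetal at 0.478 um spacing, Table 1)

def rc_delay_ps(length_um):
    # The bare R*C product is a pessimistic estimate; a distributed
    # line settles closer to 0.4*R*C, but the trend is the same.
    r = R_PER_UM * length_um
    c = C_PER_UM * length_um
    return r * c * 1e12  # picoseconds

for length in (100, 1000, 5000):  # um
    print(f"{length:>5} um wire: RC ~ {rc_delay_ps(length):8.1f} ps")
# Because both R and C grow with length, RC grows quadratically:
# a 10x longer wire is ~100x slower, which motivates repeaters (below).
```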

Figure 5. Diagram of relevant parameters in calculating wire delay (wire length L, width W, thickness t, spacing to neighboring wires h, and spacing to adjacent layers v)

For short wires, like those found locally within processor modules, R is the much smaller factor, so its scaling is ignorable. C scales down roughly constant per unit L, and since the smaller transistors are more tightly packed, L scales down proportional to k, the overall scaling factor. Therefore, the delay of short wires within modules scales down fairly well as technologies shrink, and is not a large problem.

For long wires, however, such as those found connecting between modules or traveling across the chip, the scaling factors show a very different trend. With sufficient length L, the resistive component is no longer negligible compared to the resistance of gates. Within the R equation, the term tW represents the cross-sectional area of the wire. In the past, while W has always scaled smaller (to improve density), process engineers have kept t from also scaling down, to keep this cross-sectional area as large as possible and reduce resistance. However, as technologies have become smaller, the wire thickness t can no longer be maintained without severely endangering the stability of the wire (aspect ratios are already up to 2/1 and a cap is predicted at about 2.2). It is also infeasible to leave both W and t large in a shrink, as W impacts overall chip density, and both factors have negative effects on the scaling of the capacitive element.

With the resistive component of delay scaling up at roughly k²L, and the capacitive element scaling approximately constant with L, the overall scaling story for the RC delay appears to be that it stays approximately constant with a shrink to a smaller process, assuming L scales down by k.

If the delay of long wires stays the same in the way we have described, and the delay of transistors shrinks by k, this results in the wires taking up a larger portion of the stage delay than they did previously. While this appears to be a problem already, it is further exacerbated by the fact that, due to the tendency of designers to add more functionality to a shrunk design, the length L of a global wire is much less likely to scale down by k, and will likely remain a near-constant length instead. The result of this is that the delay of long wires actually would scale up by k² for a process shrink, while transistor size and speed would scale down by k.

Finally, it is worth noting that, due to its poor scaling qualities, the resistance of a wire is becoming non-negligible for smaller and smaller values of L. This means that more wires which were previously not major contributors to delay will move into the “long wire” category as process technologies shrink.

Poor delay is not the only issue with interconnect, however. With increased complexity comes higher dynamic power consumption from the switching of wires. A recent study by Intel [42] estimated that, in one of their production processors, 50% of the dynamic power in the chip was spent by wires, clock tree excluded.

2.2.1 Reducing and Removing Wire Delays

As a result of these trends, much work has been done in efforts to reduce the latencies of these wires, or to reduce the number of long wires in processor designs.

In the process technology realm, efforts have been made to find new materials to aid in the construction of low-delay wires. As an example, the industry move from aluminum to copper wires greatly reduced their resistance, due to the lower resistivity of copper (1.7 µΩ-cm vs. 2.8 µΩ-cm).

Besides changing the aspect ratio of wires, process technologies also now have more metal layers, allowing for wires to have a larger cross-sectional area without becoming too congested to accomplish a good routing.

At a circuit level, there are several techniques available to help reduce the delay required to send a signal long distances. Repeaters [30] can be used to reduce wire delay scaling to a linear amount per unit length. Techniques such as differential wires, limited swing interconnect [29], and static pulsed buses [36] can also decrease long-distance signaling delay.
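A small sketch of why repeaters linearize wire delay: splitting a wire of length L into n driven segments pays n repeater delays but shrinks each segment's quadratic RC term by n². The repeater delay constant below is an arbitrary placeholder, not a process figure.

```python
# Simplified repeater-insertion model (illustrative, not process data).
R_PER_UM = 0.239      # ohms/um, from Table 1
C_PER_UM = 1.82e-15   # F/um, from Table 1
T_REPEATER_PS = 30.0  # assumed delay of one repeater stage (placeholder)

def wire_delay_ps(length_um, n_segments):
    # Each segment is a distributed RC line (~0.4*R*C) plus one repeater.
    seg = length_um / n_segments
    seg_rc_ps = 0.4 * (R_PER_UM * seg) * (C_PER_UM * seg) * 1e12
    return n_segments * (seg_rc_ps + T_REPEATER_PS)

L = 10000  # a 10 mm cross-chip wire
for n in (1, 2, 4, 8, 16):
    print(f"{n:>2} segments: {wire_delay_ps(L, n):8.1f} ps")
# With enough segments the quadratic RC term stops dominating, and total
# delay grows roughly linearly with L rather than with L squared.
```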

While all these methods aid in keeping delays manageable, none of them are a “silver bullet” solution to the problem. Some have argued [31] that it falls on chip architects to design architectures that are aware of wire limitations in scaling, and make an effort to remove long global wires.

Much work has been done by architects in an effort to reduce the number of long wires in microarchitectures. These efforts have not usually been focused on the general problem, but attack many different structures throughout a chip individually.

In the area of memory systems, Beckman and Wood [8] proposed Transmission Line Caches. To increase speed, their design uses on-chip transmission line technology instead of long-distance global wiring.

To reduce the amount of bypass wires necessary in superscalar processors, Kim and Smith [37] proposed an architecture for Instruction-Level Distributed Processing (ILDP). Their design attempts to keep communication between clusters of a processor at a minimum by keeping register/accumulator values local to each processing element and sending instructions to the different elements based on the location of their input operands.

In the area of pipeline control, our pipeline recovery mechanism, used in the Razor design [19], is inspired by Sproull’s work on asynchronous counterflow pipelines [71], which was later adapted for synchronous systems by Miller [49]. The basic idea of a counterflow pipeline is that instruction and control signals flow in a direction opposite to data values. As such, global control is not necessary, as all control signals will eventually reach the appropriate point in the pipeline.

On a more chip-level scale, many research groups have explored design paradigms that involve small processing elements connected to each other through only local communication nets. The TRIPS project [56][64] at UT-Austin, the Wavescalar project [75] at Washington, and the RAW project at MIT [81][86] are all examples of this kind of architectural approach. While these projects all accomplish a drastic reduction in global communication, they also bring about drastic changes in the architectural design. Much of the burden is put upon the compiler, and well-understood problems in the superscalar world, like memory ordering, can become difficult and painful to deal with.

Micro rollback was discussed at length by Tamir, Trembley, and Rennels [80][82][83] as a method to quickly recover from transient faults that occurred during execution. While this work is not directly related to low-complexity pipelines, their recovery mechanism is accomplished locally, and without a need for single-cycle long wires. I make use of their ideas in Chapter 5 in my discussion of virtual global control.

CHAPTER 3

EVALUATION METHODOLOGY

The issues discussed in this thesis have significant stakes in multiple areas of processor design. Many of the techniques have effects at the architectural level. Due to their nature, their effectiveness must also be measured by evaluating circuit timings for these designs. Finally, changes in these two areas often bring about differences in the power consumption of the circuits.

To evaluate the overall effectiveness of proposed techniques, methodologies need to be developed for each of these fields of study. In this chapter, I give details on the simulators and models used in this thesis to quantitatively examine the effects of the designs presented in the rest of this work.

3.1 Architectural Evaluation

3.1.1 Simulators

The architectural simulators used in these studies are derived from the SimpleScalar/Alpha version 3.0 tool set [6], a suite of functional and timing simulation tools for the Alpha AXP ISA. The timing simulator executes only user-level instructions, performing a detailed timing simulation of the desired processor model. Simulation is execution-driven, including execution down any speculative path until the detection of a fault, TLB miss, or branch misprediction.

For our dynamic scheduling studies (presented in Chapter 4), our baseline simulation configuration models a current-generation out-of-order processor microarchitecture. It can fetch and issue up to 4 instructions per cycle, and it has a 64-entry dynamic scheduler window with a 32-entry load/store buffer. There is a 5-cycle minimum branch misprediction penalty. The processor has 4 integer ALU units, 2 load/store units, 4 FP adders, 4 integer MULT/DIV units, and 4 FP MULT/DIV units. The latencies vary depending on the operation, but all functional units, with the exception of the divide units, are fully pipelined, allowing a new instruction to initiate execution each cycle.

The memory system consists of 32k 4-way set-associative L1 instruction and data caches. The data cache is dual-ported and pipelined to allow up to two new requests each cycle. There is also a 256k 4-way set-associative L2 cache with a 6 cycle hit latency. If there is a second-level cache miss it takes a total of 80 cycles to make the round trip access to main memory.

The instruction fetch stage of the model has a 32-entry instruction queue and operates at twice the frequency of the rest of the processor. While perhaps not realistic, this assures that the IPC results seen when changing scheduler configurations are not attenuated by bottlenecks in the front end. Since improvement in scheduler performance would have to be accompanied by commensurate improvement in fetch bandwidth, we feel that this configuration will accurately portray the benefits of our scheduler optimizations.

The dynamic scheduler distributed with SimpleScalar is overly simplistic compared to modern schedulers. A redesigned scheduler was added to sim-outorder to more accurately reflect the design points presented in Chapter 2. Our new scheduler also includes support for several varieties of scheduler replay, more efficient scheduler resource management, decoupled reorder buffer (ROB) and reservation station (RS) resources, and support for our optimizations detailed in Chapter 4.

Both processor models use a GSHARE branch predictor with an 8-bit global history and an 8k-entry BTB. As the prototype chip we based our control signal models on has only a “not taken” branch predictor, these designs were evaluated both with and without the more advanced branch predictor.

For our control signal studies (presented in Chapter 5), our baseline simulation configuration models a much different design point. The single-issue in-order processor we simulate is as close as possible to the Razor prototype chip design described in [19]. The fairly typical 5-stage pipeline was modeled in great detail, including bypass, stall events, and instruction latencies.

As the Razor test chip has only rudimentary caches (on-chip SRAMs) and no interface to an external DRAM memory, cache designs were chosen arbitrarily to portray more realistic performance. For these designs, we modeled 32k 4-way set-associative L1 instruction and data caches, with no L2. Cache misses require a total of 100 cycles to retrieve data from the modeled main memory.

Because the Razor mechanism requires the architecture to react to dynamic data-dependent circuit evaluations, support for circuit timing is included in this simulator. Verilog models of each logic stage are simulated every cycle to determine the time it takes the underlying circuitry to complete computation for the given input vectors. Since cycle-by-cycle circuit-level simulation carries a very large computation overhead, several optimizations were applied to the circuit simulator, including early circuit simulation termination based on architectural constraints and circuit timing memoization [41].
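The circuit timing memoization cited above can be illustrated with a simple cache keyed on a stage's input vector; this is a sketch of the general idea under assumed interfaces, not the simulator's actual code.

```python
# Sketch of circuit timing memoization (illustrative). Evaluating a
# stage's circuit-level timing model for a given input vector is slow,
# but identical vectors recur often, so results can be cached.

timing_cache = {}

def stage_delay_ps(stage_name, input_vector, simulate_fn):
    """Return this stage's evaluation delay for the given input vector,
    consulting the memoization table before running the slow model."""
    key = (stage_name, input_vector)  # input_vector must be hashable
    if key not in timing_cache:
        timing_cache[key] = simulate_fn(stage_name, input_vector)
    return timing_cache[key]

# Example use with a stand-in for the slow circuit-level model:
def fake_circuit_timing(stage, vector):
    return 300 + 10 * (hash((stage, vector)) % 40)  # placeholder delays

d1 = stage_delay_ps("EX", (0x1234, 0x00FF), fake_circuit_timing)
d2 = stage_delay_ps("EX", (0x1234, 0x00FF), fake_circuit_timing)  # cache hit
assert d1 == d2
```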

3.1.2 Benchmarks

To perform our evaluation, we collected results from the SPEC2000 benchmarks [69]. All SPEC programs were compiled for a Compaq Alpha AXP-21264 processor using the Compaq C and Fortran compilers under the OSF/1 V4.0 operating system using full compiler optimization (-O4).

The high-performance processor we model in Chapter 4 is likely to be useful across a broad range of applications, and so 25 integer and floating point SPEC benchmarks were run (all except perl, for which the input sets break our simulator). The simulations were run for 100 million instructions using the SPEC reference inputs. We used the SimPoint toolset’s Early SimPoints [67] to pinpoint program locations to simulate for peak accuracy.

The processor modeled in Chapter 5 fits more closely in the high-end embedded area, so only a set of the integer SPEC benchmarks were run. Due to the very slow speed of the circuit-aware simulation used for this processor, the simulations were run for only 10 million instructions from the SPEC reference inputs. Again, the SimPoint toolset’s Early SimPoints [67] provided the program locations to simulate for the highest accuracy.

3.2 Circuit Analysis Methodology

To get a full understanding of the effects of a virtual communication mechanism, simply performing architectural simulation isn’t enough. The changes to the circuit structures need to be examined as part of the process of weighing all sides of the complexity picture.

For an accurate evaluation of circuitry, good process models are a necessity. For researchers without access to production processes, the Berkeley Predictive Technology Models [7] provide academic estimates for several generations of production processes. For this work, we make use of manufacturer SPICE models from both Taiwan Semiconductor Corporation (TSMC) and IBM.

To model the circuit structures in this work, we use hand-coded and optimized transistor-level SPICE netlists. Several of the netlists were further optimized using Synopsys’s AMPS circuit optimization tool (version 5.5). AMPS attempts to optimize circuit latency, power, or area under a given set of constraints. We configured AMPS to optimize circuit latency, with the constraint that overall transistor area could not increase.

For simulation of the performance of these circuits, Avant!’s HSPICE circuit tool was run on the models, using level 49 typical transistor parameters supplied by TSMC for their 1.8V 0.18µm fabrication technology. Some of these parameters for this modern production-grade process are shown in Table 1. A complete list is available from MOSIS’s secure website [51].

The overall power consumption of these circuits was estimated through the HSPICE simulation by examining the energy consumption for all input patterns. Average power estimates were determined by factoring in the frequency of these patterns and the overall activity rate of the unit.
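A sketch of how such an average-power estimate can be assembled from per-pattern energies follows; the pattern names, energies, activity rate, and clock frequency are placeholders standing in for measured HSPICE values.

```python
# Sketch of averaging per-pattern energies into a power estimate
# (placeholder numbers; real values would come from HSPICE runs).

# Energy consumed per input pattern (joules) and the relative frequency
# with which each pattern occurs during execution.
pattern_energy_j = {"match": 1.2e-12, "mismatch": 0.4e-12, "idle": 0.1e-12}
pattern_freq = {"match": 0.15, "mismatch": 0.60, "idle": 0.25}

ACTIVITY_RATE = 0.8   # fraction of cycles the unit is active (assumed)
F_CLK_HZ = 1.0e9      # assumed 1 GHz clock

avg_energy_per_cycle = sum(pattern_energy_j[p] * pattern_freq[p]
                           for p in pattern_energy_j)
avg_power_w = avg_energy_per_cycle * ACTIVITY_RATE * F_CLK_HZ
print(f"Average power: {avg_power_w * 1e3:.3f} mW")
```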

3.2.1 Dynamic Scheduler Circuits

The circuit delays and power consumption statistics for scheduler windows used in this thesis were derived from an updated version of the SPICE models used in the work by Palacharla, Jouppi, and Smith [59][60]. Palacharla’s original analyses predate the existence of a functional 0.18µm fabrication technology. Because of this, the device parameters in that work were extrapolated from a Digital Equipment Corporation 0.8µm technology.

Table 1. Key SPICE device characteristics. Parameters shown are for a level 49 HSPICE model. When NMOS device characteristics differ from PMOS devices, the PMOS characteristics are shown in parentheses. Wire capacitance values are shown given the inter-line spacing used in our scheduler design (0.478µm).

Vdd     1.8 V                     cj      0.00100 (0.00112) F/m²
vto     0.445 (-0.437) V          cjsw    2.04e-10 (2.48e-10) F/m
tox     4.08 nm                   cjso    3.66e-10 (3.28e-10) F/m
µo      125 (100) cm²/Vs          cjdo    3.66e-10 (3.28e-10) F/m
Rmetal  0.239 Ω/µm                rsh     6.8 (7.2) Ω/sq.
Cmetal  1.82 fF/µm (0.478 µm spacing)

The new timing and power figures in our work are the result of porting Palacharla’s original design to the TSMC production fabrication technology and performing timing optimizations using commercial tools configured for the implementation technology.

AMPS provided great benefits for the design of the select circuit, producing a re-sized design that was more than 25% faster, and with only 90% of the original area. AMPS also improved wakeup latency by nearly 5% with no change in area.

Overall, the ported design is about 24% faster in the commercial technology. The primary factors leading to the faster design are roughly split between faster transistor speed (due to a lower threshold voltage and gate capacitance) and improved logic performance due to better transistor sizing.

Once transistors were sized, timing analysis was performed on a SPICE representation of the optimized scheduler design, augmented with parasitic wire delays. Wire parasitics were computed in the same fashion as Palacharla’s earlier study, except wire resistance and capacitance were adjusted for the TSMC production process. Finally, timing and power analysis was performed using Avant!’s HSPICE circuit tool.

3.2.2 Storage Element Analysis

In Chapter 5, we will examine the power overhead of some extra register elements needed for a virtualization mechanism. Because these elements spend a lot of time storing data without a large amount of switching, it is necessary to examine the static leakage power of the design as well as its dynamic consumption.

One process-level optimization for reducing static power is the use of high-threshold transistors [55]. Some fabrication processes provide cells and models for 2 or more different thresholds for each type of transistor. The TSMC process models used in other segments of this work do not include models for high-Vt transistor operation. However, another process available to us, the IBM 0.13µm 1.2V process, includes cell parameters for both high- and standard-Vt transistors. For this reason, the smaller and more advanced IBM process was used to evaluate the effectiveness of this specific design.

3.3 Area Estimate Methodology

3.3.1 Register Bit Equivalent

A parameter important to some designs is the chip area used by a processor element. While not many high-performance designs are strictly die-size limited, a smaller structure footprint can also indicate lower circuit capacitance, which can translate into lower power and/or a faster clock speed.

To estimate the chip area of drastically different scheduler designs in Chapter 4, we use the process-independent register bit equivalent (RBE) metric defined by Mulder, et al.

[54], where one RBE equals the area used by one register file bit. Using general equations from their scheme, the RBE size can be computed for several different standard memory circuits, such as register files, set associative caches, and fully associative CAMs. The metric takes into account both the area of the cells themselves, as well as the overhead of control logic, driver logic, and sense amps.

One parameter not accounted for in the original RBE equations given in [54] is the number of access ports for a memory structure. Because the size of each side of a memory bit must scale up linearly with the number of ports, the total effect on area is quadratic [66]. In our model for multi-ported CAM structures, we apply this port scaling factor to the portions of the RBE area equation that pertain to the footprint of the data bits.
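As a concrete illustration of this port scaling, the sketch below applies the quadratic factor to the data-bit portion of a CAM area estimate. It is a minimal sketch under our own assumptions (a base cost of one RBE per single-ported bit and made-up dimensions), not the full model of Mulder et al. [54], which also accounts for control, driver, and sense-amp overhead.

    #include <stdio.h>

    /* Area of the data-bit array of a multi-ported CAM, in register bit
     * equivalents (RBE). The quadratic port factor models each cell side
     * growing linearly with port count, relative to a single-ported cell. */
    double cam_data_area_rbe(int entries, int bits_per_entry,
                             int ports, double rbe_per_bit)
    {
        double port_factor = (double)ports * (double)ports;
        return entries * bits_per_entry * rbe_per_bit * port_factor;
    }

    int main(void)
    {
        /* Hypothetical 64-entry, 8-bit-tag CAM with 4 ports. */
        printf("data-bit area: %.0f RBE\n", cam_data_area_rbe(64, 8, 4, 1.0));
        return 0;
    }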

3.3.2 Area Estimate Using Transistor Sizing

RBE estimates are not a useful tool when evaluating the area of small individual storage elements or combinational logic circuits. To compare circuit areas between these smaller structures, we need to use a metric based on a lower-level parameter than the number of storage bits. In Chapter 5, we compare different flip-flop designs which differ only in the number and sizing of their transistors. To estimate the total relative layout area, we sum the layout widths of all transistors in each design.

3.4 Evaluating Trade-Offs

Once an optimization has been simulated and its circuit changes modeled, we have two separate metrics, each affecting overall performance: instructions per cycle and clock frequency. To compact the performance of the optimization into one metric, the two must be combined.

The overall performance equation, as stated by Hennessy and Patterson in their textbook [27], is:

Perf = IPC × (1 / t_clk)

The resulting metric represents "instructions per time": the pure computational throughput of the machine. Historically, this metric has been given the unit MIPS (Million Instructions Per Second). However, in this thesis, we use the more up-to-date IPns (Instructions Per nanosecond), which is numerically equivalent to BIPS (Billion Instructions Per Second). While looking at the IPns value for an entire processor gives an accurate measure of total throughput, doing so on a stage-by-stage basis may represent a slightly different picture.

For example, if a specific optimization reduces stage A's circuit timing from 5 ns to 3 ns, the IPns value generated for that stage would reflect a 67% gain in clock speed. If stage B's longest circuit path was 4 ns, the clock period of the entire processor, including stage A, would be set at 4 ns, assuming a uniform clock period across the chip. In this case, the processor-wide speedup in throughput should reflect only a 25% gain in clock speed.

[Figure 6. IPns Calculation with Clock Cap]

This, in effect, puts a cap on possible chip-wide IPns gains due to optimizations with high circuit-speed returns. In most cases, this will mean that the optimal design point for a given stage will be the one which beats the clock speed cap and has the highest IPC. A conceptual picture of the trade-off with a capping of circuit gains is shown in Figure 6. An increase in clock frequency shows benefits in total throughput up until the cap is hit, after which the total throughput will track only the architectural IPC.
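To make the capping rule concrete, the following minimal sketch recomputes the 5 ns/3 ns/4 ns example from above; the function and variable names are our own, and an IPC of 1.0 is assumed purely for illustration.

    #include <stdio.h>

    /* A stage's effective clock period is bounded below by the slowest
     * stage anywhere on the chip, so circuit gains past that cap do not
     * show up in chip-wide throughput. */
    double capped_ipns(double ipc, double stage_delay_ns, double chip_cap_ns)
    {
        double period = (stage_delay_ns > chip_cap_ns) ? stage_delay_ns
                                                       : chip_cap_ns;
        return ipc / period;   /* instructions per nanosecond */
    }

    int main(void)
    {
        /* Stage A optimized from 5 ns to 3 ns; stage B pins the chip at 4 ns.
         * Throughput improves by 25% (5 ns -> 4 ns), not 67% (5 ns -> 3 ns). */
        printf("before: %.3f IPns\n", capped_ipns(1.0, 5.0, 4.0));  /* 0.200 */
        printf("after:  %.3f IPns\n", capped_ipns(1.0, 3.0, 4.0));  /* 0.250 */
        return 0;
    }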

Just because further circuit speed improvements will not affect overall chip throughput does not mean that the computation of a stage-by-stage IPns gives no insights. While the global clock speed may make a higher stage IPns impossible in a literal sense, a higher stage IPns may indicate a lower overall stage complexity. The benefits of lower-complexity circuits can be exploited in ways other than pure throughput. Given the example from earlier, the lower-complexity 3 ns circuitry for stage A could possibly be made to function at 4 ns, but using less power than a highly optimized 4 ns circuit.

Overall, the optimal mechanism for a given design will depend on the goals of the designer. While projects focused on the highest possible throughput will want to choose the optimal IPns point given the global cap, designs focused more on energy efficiency may choose to exploit the additional complexity headroom given by more aggressive optimizations.

CHAPTER 4

REDUCING COMPLEXITY IN DYNAMIC INSTRUCTION SCHEDULING

In this research, we examine two different methods for reducing and removing global broadcast signals in dynamic instruction scheduling and replay. The first technique, tag elimination, drastically reduces the size and power consumption of traditional CAM-style instruction scheduling windows while having only a minimal effect on IPC. The second approach, named "Cyclone", replaces the scheduling window with a mechanism that removes the high-complexity load of a CAM window and relies only on local (neighbor-to-neighbor) communication.

4.1 Tag Elimination

In this section, we propose two scheduler tag reduction techniques that work together to improve the performance of dynamic scheduling while at the same time reducing power requirements. First, we propose a reduced-tag scheduler design that assigns instructions to reservation stations with two, one, or zero tag comparators, depending on the number of operands in flight. Second, to reduce tag comparison requirements for instructions with multiple operands in flight, we introduce a last tag speculation technique. This approach predicts which input operand of an instruction will arrive last, and then schedules the execution of that instruction based solely on the arrival of this operand. We also discuss how to integrate selective instruction replay with these mechanisms in a complexity-effective manner.

[Figure 7. Runtime Distribution of Ready Input Operands]

4.1.1 Specialized Windows

When an instruction enters the instruction window, its input operand tag fields are loaded with the index of the physical register that will eventually hold the operand value. It may be the case that some of the operands will be ready at that time, either because the operand was computed in an earlier cycle or because the operand is not required by the operation (e.g., one of the operands is an immediate value). Since these input operands are already available, their reservation station entries do not require tag comparators.

To quantify the degree to which tag comparators are not required by reservation stations, a typical 4-wide superscalar processor was simulated using the SimpleScalar toolset [6] with instruction window sizes of 16, 64, and 256 and a load/store queue half the size of the instruction window. When instructions entered the scheduler, the number of ready input operands was counted. The results in Figure 7 show the dynamic distribution of the number of ready operands for all instructions. Results are shown for eight of the SPEC2000 benchmarks [69]. (More details on our experimental framework and baseline microarchitecture model can be found in Chapter 3.)

Clearly, a significant portion of all operands are marked ready when they enter reservation stations, for all instruction window sizes. Only about 20% of all dynamic instructions require a reservation station with two tag comparators, while the remaining instructions require either one or zero comparators. As expected, larger window sizes result in fewer ready operands, because larger windows permit the front-end to get further ahead of instruction execution. Nonetheless, a window size of 256 instructions has a significant portion of instructions that do not require more than one tag comparator. In general, programs with poor branch prediction such as GCC and Vortex were less affected by the larger window sizes, because branch mispredictions limit the degree to which the front-end can get ahead of instruction execution. In contrast, SWIM has nearly perfect branch predictor accuracy, which results in more slip between fetch and execute for large window sizes. Still, even in this example, more than half of the instructions in a 256-entry instruction window require fewer than two tag comparators. Similar observations were made by Folegnani and Gonzalez [21]; they used this property to design low-power tag comparators.

It is possible to take advantage of ready input operands if the scheduler contains reservation stations with fewer than two tag comparators. Figure 8 illustrates a reservation station design that contains entries with two, one, and zero tag comparators. The design on the left side of the figure is a conventional scheduler configuration, where each reservation station contains two tag comparators. The optimized design, shown on the right side of the figure, eliminates tag comparators from some of the reservation stations. We label the configurations "x/y/z", where x, y, and z indicate the number of two-, one-, and zero-tag stations, respectively.

[Figure 8. Conventional and Reduced-Tag Reservation Stations. The circles represent tag comparators. The bold tag entries include a comparator; the shaded tag entries are not necessary and so do not include comparators.]

When the allocator encounters an instruction with one or more unavailable operands, it will assign the instruction to a reservation station with a matching number of tag comparators. If both operands are ready, we can place the instruction into a reservation station without tag comparators, where it immediately requests execution. If there isn't an available reservation station with the same number of tag comparators, the allocator will assign the instruction to a reservation station with more tag comparators. For example, instructions waiting for one operand can be assigned to reservation stations with one or two tag comparators. Finally, if a reservation station with a sufficient number of tag comparators is not available, the allocator will stall the front-end pipeline until one is available.
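The fallback policy just described amounts to a small greedy search across station classes. The sketch below is a simplified software rendering under our own assumptions (a free-station counter per comparator class), not the actual allocator circuit.

    #include <stdbool.h>

    #define TAGS_MAX 2

    /* free_count[k] = number of free reservation stations with k comparators. */
    bool allocate_station(int free_count[TAGS_MAX + 1], int unready_operands)
    {
        /* Prefer an exact match, then fall back to stations with more
         * comparators than strictly required. */
        for (int k = unready_operands; k <= TAGS_MAX; k++) {
            if (free_count[k] > 0) {
                free_count[k]--;
                return true;
            }
        }
        return false;   /* no suitable station: stall the front-end pipeline */
    }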

This reduced-tag scheduler design has two primary advantages over a conventional design. First, the destination tag bus, which drives a physical register destination tag to all source tag comparators, need only run to the reservation stations with tag comparators. Since result tag drive latency is on the critical path of the scheduler control loop, the latency of this critical path will be reduced in proportion to the number of zero-tag reservation stations. The second advantage is that comparator circuits can be eliminated from the instruction window. With fewer comparators, load capacitance on the result tag bus is reduced, resulting in faster tag drive and lower power requirements. The downside of the reduced-tag design is that additional allocator stalls may be introduced when there are insufficient reservation stations of a required class, potentially reducing extracted ILP and program performance.

4.1.2 Last Tag Speculation

While many instructions enter the instruction window with multiple unavailable operands, it is still possible to eliminate all but one of the tag comparators for these instructions. Since the arrival of all but the last input operand tag will not initiate an execution request, these tag comparators can be safely removed. When the last input operand tag arrives, this sole event can be used to initiate instruction execution. We employ a last-tag predictor to predict the last-arriving operand. As long as the last-tag predictor is correct, the schedule will proceed as in the non-speculative case. A modified reservation station with one tag comparator monitors the arrival of the predicted last input operand.

[Figure 9. GSHARE-Style Last Tag Predictor]

We experimented with a number of last-tag predictors, including a predictor that predicts the last operand to arrive will be the same as in the previous execution, a bimodal-type last-tag predictor (similar to the grandparent predictor employed by Stark et al. [74]), and a GSHARE-style last-tag predictor. The accuracy of these predictors was nearly identical to similarly configured branch predictors. The GSHARE-style last-tag predictor consistently performed the best, with only marginal additional cost over simpler predictors.

Figure 9 illustrates the GSHARE-style last-tag predictor. The predictor is indexed with the PC of an instruction (with multiple unavailable operands) hashed with global control history [46]. The control history is XOR'ed onto the least significant bits of the instruction PC, and that result is used as an index into a table of two-bit saturating counters. The value of the upper counter bit indicates the prediction: one indicates the left operand will arrive last, zero indicates the right operand will arrive last. The predictors are updated when last-tag predictions are validated. Figure 10 shows the prediction accuracy for various sizes of the GSHARE-style last-tag predictor with 8 bits of control history and an instruction window size of 64. Most programs have good predictor performance for sizes larger than 1024 entries.
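For concreteness, the sketch below renders such a GSHARE-style last-tag predictor in software. The 8192-entry table and 8-bit history match the configuration evaluated in Section 4.1.4, but the interface, names, and the plain two-bit-counter update policy are our own assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define PHT_ENTRIES 8192    /* pattern history table size */
    #define HIST_BITS   8       /* global control history length */

    static uint8_t  pht[PHT_ENTRIES];   /* two-bit saturating counters */
    static uint32_t ghist;              /* global control history */

    static uint32_t pht_index(uint32_t pc)
    {
        /* XOR the history onto the least significant bits of the PC. */
        return (pc ^ (ghist & ((1u << HIST_BITS) - 1))) % PHT_ENTRIES;
    }

    /* Upper counter bit: 1 = left operand predicted to arrive last. */
    bool predict_left_last(uint32_t pc)
    {
        return (pht[pht_index(pc)] >> 1) & 1;
    }

    /* Called when a last-tag prediction is validated. */
    void train_last_tag(uint32_t pc, bool left_arrived_last)
    {
        uint8_t *c = &pht[pht_index(pc)];
        if (left_arrived_last) { if (*c < 3) (*c)++; }
        else                   { if (*c > 0) (*c)--; }
    }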

[Figure 10. Prediction Accuracy of Last Tag Predictors of Various Sizes]

[Figure 11. Scheduler Pipeline with Last Tag Speculation]

A confidence estimation technique was also implemented for the predictor, but the results showed very few gains.

As shown in the pipeline of Figure 11, the allocator accesses the last-tag predictor for instructions with multiple unavailable operands and inserts the instruction into a reservation station with a single tag comparator. The predictor will indicate whether the left or right operand for the instruction will complete last. If the last-tag prediction is correct, the instruction will wake up at the exact same time it would have in a window without speculation. In the event that the prediction is incorrect, the instruction will wake up before all of its input operands are ready, and a mispeculation recovery sequence will have to be initiated.

Figure 12 illustrates the datapaths and control logic for a reservation station supporting last-tag speculation. The input operand tags are loaded into the reservation station with the tag predicted to arrive last placed under the comparator (srcL). The other input operand tag (srcF) and the result tag are also loaded into the reservation station. The reservation station operates in a manner similar to a conventional design. Instructions request execution once the predicted-last tag is matched on the result tag bus. When an instruction is granted permission to execute, the source operand register tags are driven out to the register stage of the pipeline. This drive operation requires a pair of muxes to sort the source operands into the original (left, right) instruction order, which is the format used by the register file and later functional units. In addition, the tag predicted to arrive first is forwarded to the register read stage (REG), where it is used to check the correctness of the last-tag prediction.

[Figure 12. Reduced-Tag Reservation Station with Last Tag Speculation]

The last-tag prediction must be validated to ensure that the instruction does not commence execution before all of its operands are available. The prediction is valid if the operand predicted to arrive first (srcF) is available when the instruction enters the register read stage (REG) of the pipeline. In parallel with the register file access, the srcF tag is used to probe a small register scoreboard (RDY). The scoreboard contains one bit per physical register; bits are set if the register value is valid in the physical register file. This scoreboard is already available in the ALLOC stage of the pipeline, where it is used to determine if the valid bit should be set when operand tags are written into reservation stations. An additional port to this scoreboard will suffice for validating last-tag predictions. Alternatively, an additional scoreboard could be maintained specifically for last-tag prediction validation.
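The validation step therefore reduces to a single scoreboard probe in the REG stage. The fragment below is a minimal sketch with hypothetical names; the flush hook stands in for the one-cycle scheduler recovery described next.

    #include <stdbool.h>

    #define NUM_PREGS 256
    static bool reg_ready[NUM_PREGS];   /* one bit per physical register */

    static void flush_scheduler_pipeline(void) { /* one-cycle bubble (stub) */ }

    /* Returns true if the last-tag prediction was correct; otherwise the
     * scheduler pipeline is flushed and the instruction retries. */
    bool validate_last_tag(int srcF_tag)
    {
        if (reg_ready[srcF_tag])
            return true;
        flush_scheduler_pipeline();
        return false;
    }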

If the prediction is found to be correct, instructions may continue through the scheduler pipeline, as the scheduler has made the correct scheduling decision. If the prediction is incorrect, the scheduler pipeline must be flushed and restarted, in a fashion identical to latency mispredictions. Unlike latency mispredictions, which have a three-cycle penalty, last-tag mispeculations cause only a one-cycle bubble in the scheduler pipeline.

The primary advantage of the last-tag scheduler is that more than half of the comparator load on the result tag bus is eliminated, which can result in reduced scheduling latency and significant power reductions for large instruction windows. The drawback of this approach is, of course, the performance impact that results when a last-tag prediction is incorrect. Fortunately, the accuracy of the last-tag predictor, combined with the small penalty for mispredicting, should make this approach an effective technique for improving scheduler speed and energy consumption.

4.1.3 Selective Replay

In Kim and Lipasti's paper "Half-Price Architecture" [38], the authors introduce a clever selective replay implementation that performs parent-child dependence propagation using the scheduler broadcast busses. In addition, they correctly point out that the combination of this tag elimination scheme with broadcast-based selective instruction replay is not practical to implement. In this section, we describe selective replay methods that are compatible with tag elimination. Furthermore, we analyze the performance and power consequences of the replay implementation decision.

[Figure 13. Half-Price Selective Replay Mechanism. (Figure from [38])]

4.1.3.1 Parent-Child Broadcast

In the Half-Price Architecture paper [38], Kim and Lipasti present one possible implementation of selective replay. An illustration of this scheme is shown in Figure 13.

Along with its input tags, an instruction's dependence information is kept in the instruction window in the form of one dependence matrix for each input operand. This matrix consists of W × D bits, where W is the machine width and D is the depth of the load shadow [9], which is defined as the number of stages between instruction issue and notification of a cache hit or miss. In each matrix position, the presence of a "1" indicates that an instruction on which the current instruction depends, either directly or through some intermediate instructions, occupies the corresponding slot in the scheduler pipeline.

Every cycle, the matrix shifts down one row, to keep the information consistent with the movement of the instructions ahead of it in the pipeline.

When a load latency misprediction occurs, the execute stage sets the appropriate bit in a W-bit wide array called the kill bus. These bits are broadcast to every instruction in the scheduler. As is illustrated in part b) of Figure 13, if the kill bus has a 1 in the same column as a 1 in the bottom row of a tag's dependence matrix, that operand's ready bit is reset to zero, as it is dependent on the instruction with the mispredicted latency. In other words, the matching bits indicate that the operand is dependent, either directly or indirectly, on the mis-scheduled instruction.

When an instruction leaves the scheduler window, it merges the dependence matrices that its input operands have received from their parent instructions and marks its own location. It then broadcasts this matrix, along with its destination tag, to the rest of the instructions in the window. (It must also write these bits into a table in the register rename stage for the benefit of dependent instructions that may not have entered the window yet.)

When instructions in the window match the input operand on the tag bus, they also latch the dependence matrix of the broadcasting parent instruction. This process propagates dependence information from parent to child during the wakeup phase, giving them knowledge of ancestor instructions further up the dependence tree.
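To make the bookkeeping concrete, the sketch below implements the three matrix operations just described (the per-cycle shift, the kill-bus match, and the merge-and-mark on issue) for one operand's W × D matrix. The bit layout and names are ours, and W is assumed to fit in a 32-bit word.

    #include <stdint.h>
    #include <stdbool.h>

    #define D 4   /* load shadow depth, in pipeline stages */

    typedef struct { uint32_t row[D]; } dep_matrix_t;  /* one W-bit row per stage */

    /* Shift down one row each cycle, tracking instructions moving ahead. */
    void shift_matrix(dep_matrix_t *m)
    {
        for (int r = D - 1; r > 0; r--)
            m->row[r] = m->row[r - 1];
        m->row[0] = 0;
    }

    /* Kill-bus check: the operand's ready bit is reset if any bottom-row
     * bit matches a kill bit (a direct or transitive dependence). */
    bool operand_killed(const dep_matrix_t *m, uint32_t kill_bus)
    {
        return (m->row[D - 1] & kill_bus) != 0;
    }

    /* On issue: merge both input matrices and mark this instruction's own
     * issue slot before broadcasting to its children. */
    dep_matrix_t merge_and_mark(const dep_matrix_t *left,
                                const dep_matrix_t *right, int my_slot)
    {
        dep_matrix_t out;
        for (int r = 0; r < D; r++)
            out.row[r] = left->row[r] | right->row[r];
        out.row[0] |= 1u << my_slot;
        return out;
    }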

Tag Elimination Compatibility. As is discussed in [38], this selective replay scheme is not compatible with reduced-tag schedulers. Because reduced-tag scheduling makes decisions based on operand availability, problems arise when this availability information is allowed to change after instructions enter the window. Broadcast-based replay relies on every operand in the window tracking its dependencies. Because reduced-tag schedulers gain their complexity benefit by removing some operands from the tag bus, this is not possible.

In schedulers that use non-speculative tag elimination (i.e., they do not use last-tag speculation), instructions entering the window would get the proper dependence matrices from the table in the rename stage and would still be able to monitor the kill bus. However, if the ready bit of an operand that has no comparator needed to be reset, there would be no way for that operand to return to snooping the tag bus.

Furthermore, removing the early-arriving operands from the tag bus in last-tag speculation windows makes it impossible for those operands to receive their propagated dependence information from a broadcasting parent instruction.

4.1.3.2 Replay using Timed Queues

There are, however, other ways to implement selective replay. Some of these mechanisms differ from the parent/child broadcast model in that, instead of re-executing dependent instructions, they insert a delay into an instruction's execution latency, often through the use of a queue or other type of separate instruction storage. The specific mechanism we outline here is derived from a technique proposed in U.S. Patent 6,212,626 [47], held by Intel for inventors Merchant and Sager of the Pentium 4 architecture team [28].1 A block diagram of this design is shown in Figure 14.

[Figure 14. Intel Selective Replay Mechanism.]

As instructions approach execution, they are also processed through a replay check, which consists primarily of a table of register ready bits. Just before entering the execute stage, instructions look up the status of their input operands in the checker. If the table indicates that the operands are ready, the instruction is allowed to enter execution and retire normally.

If, however, the check table indicates that an operand is unavailable, the instruction sets its output operand as not ready in the table and returns to execution via a replay queue and mux. The replay mechanism informs the scheduler of the presence of an approaching replayed instruction, so that nothing is scheduled into that slot in the same cycle. It is important to note that, in order to maintain forward progress, replaying instructions must always have priority over any work that would be coming out of the instruction scheduler.

1. Although we may refer to this as the "Intel" replay mechanism throughout this work, we are making no claim as to whether or not this technique is used in any of their microprocessors, commercial or otherwise. We are only presenting the idea proposed in the publicly available patent documentation.

When the replayed instruction reaches the input mux, it is sent back into the pipeline as if it had just been issued by the scheduler. On reaching execute, it checks its operands again, just as it did before, to determine whether it needs to replay again. (An instruction may have to replay several times to tolerate an L2 cache miss, for example.)

In this scheme, the propagation of dependence information is accomplished by the cascading ready bit manipulations in the check table. If there is a latency misprediction, the offending instruction's output will not be set as ready, which will trigger a replay for its children, which in turn will cause a replay for its children's dependents.
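A minimal sketch of this check-table logic follows. The table layout, the uop record, and the point at which a passing instruction asserts its destination's ready bit are all simplifying assumptions of ours, not details from the patent.

    #include <stdbool.h>

    #define NUM_PREGS 256
    static bool reg_ready[NUM_PREGS];   /* the replay check table */

    typedef struct { int src1, src2, dst; } uop_t;

    /* Probed just before execute: returns true if the uop may proceed,
     * false if it must be routed into the replay queue. */
    bool replay_check(const uop_t *u)
    {
        if (reg_ready[u->src1] && reg_ready[u->src2]) {
            reg_ready[u->dst] = true;    /* result will arrive on schedule */
            return true;
        }
        /* Clearing dst cascades: children that probe the table will see
         * their input not ready and replay in turn. */
        reg_ready[u->dst] = false;
        return false;
    }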

It is not specified in the patent exactly how many issue slots coming from the scheduler are stopped when an instruction replays. In our evaluation, we only prohibit the scheduler from issuing into the specific slot that the replaying instruction will be using. This allows the scheduler to issue instructions in the other issue slots.

Tag Elimination Compatibility. A key feature of the Intel replay mechanism is that it maintains the relative timings of instructions throughout the replay sequence. Once a mis-speculated instruction completes, its dependents are replayed just as they were originally scheduled out of the window; the entire stream has simply been delayed to accommodate the unexpected extra latency. Consequently, there is no need to "re-schedule" instructions individually, as the previously selected schedule is still valid.

Because replayed instructions are not returned to the scheduler window, there is no extraneous dependence information kept in the window itself. Therefore, reduced-tag schedulers are fully compatible with the Intel-style selective replay.

In schedulers that use last-tag speculation, a last-tag misprediction still results in a one-cycle flush. This recovery is necessary to stop the wakeup of instructions dependent on the last-tag misprediction.

4.1.4 Experimental Evaluation

The architectural simulators for this evaluation were derived from the SimpleScalar Toolset [6]. The circuit delays and power consumption statistics for scheduler windows used in this study were derived from an updated version of the SPICE models used in the work by Palacharla, Jouppi, and Smith [60]. All timing results are for the TSMC 0.18 µm process. A more detailed description of our architectural and circuit models can be found in Chapter 3. In addition, we estimated the power consumed by the last-tag prediction array using CACTI II [62]. Table 1 shows the benchmarks, their instructions per cycle (IPC) on the baseline microarchitectural model, and their maximum baseline scheduler performance in instructions per ns (IPns).

The last-tag predictor configuration simulated is a GSHARE-style predictor with an 8-bit global history and an 8192-entry pattern history table. The global history is updated when branch instructions complete, in the same way that the branch predictor is updated.

4.1.5 Performance of Reduced-Tag Schedulers

When reduced-tag reservation stations are introduced, instructions have a new constraint on entering the instruction window. Not only must there be an empty reservation station, but the station must also have at least one tag comparator for each of the instruction's unavailable input operands. If the demand for any particular class of reservation stations is high, the reduced-tag designs may experience extra instruction stalls as the allocator waits for reservation stations to be freed.

[Figure 15. Instructions Per Cycle for Varying Configurations]

These extra stalls reduce the effective number of reservation stations from which the scheduler can choose instructions to execute. The result, as shown in Figure 15, is a small IPC change for most configurations and benchmarks. We label the configurations "x/y/z", where x, y, and z indicate the number of two-, one-, and zero-tag stations, respectively. Only the configurations without two-tag stations employ last-tag speculation. The effects of the stalls show up most prominently in equake. This benchmark makes extremely efficient use of the machine because it has excellent predictor performance, very few stalls, and many tightly coupled dependent instructions. Consequently, many instructions require the full two tag comparators. The configurations using last-tag speculation perform very well, with slightly lower IPCs seen in benchmarks with poor branch predictor accuracy, such as crafty. In these programs, complex program control causes the register dependencies between instructions to change rapidly, making it more difficult to predict which operand will arrive last. The configurations without last-tag speculation slightly outperformed the configurations with speculation, with the exception of the 8/32/24 case, where there are a large number of stalls waiting on the small number of two-tag entries. Overall, the performance impacts amounted to only 1-5%.

Figure 16 and Table 2 illustrate the circuit delay, power, and energy consumption for all analyzed scheduler configurations. In addition, the column in the table labeled ftagload lists the relative tag broadcast bus capacitive load, compared to the same-sized two-tag baseline design. This value indicates the relative decrease in comparator diffusion capacitance, and also reflects the relative reduction in tag bus wire length due to the elimination of 0-tag reservation stations and the denser layout provided by the smaller 1-tag reservation stations.

[Figure 16. Scheduling delays for various tag elimination configurations]

Table 2. Circuit Characteristics of Studied Scheduler Configurations.

Configuration   Total Delay (ps)     Total Power (W)   Total Energy (nJ)   ftagload
                (wakeup + select)
64/0/0          466 (302 + 164)      1.004             0.468               1.0000
20/32/12        383 (219 + 164)      0.820             0.314               0.5625
16/32/16        374 (210 + 164)      0.773             0.289               0.5000
12/32/20        363 (199 + 164)      0.725             0.263               0.4375
8/32/24         355 (191 + 164)      0.673             0.239               0.3750
0/48/16         329 (165 + 164)      0.775             0.255               0.3750
0/40/24         321 (157 + 164)      0.692             0.222               0.3125
0/36/28         315 (151 + 164)      0.657             0.207               0.2813
0/32/32         313 (149 + 164)      0.610             0.191               0.2500
128/0/0         775 (573 + 202)      1.421             1.101               1.0000
40/64/24        552 (350 + 202)      1.308             0.722               0.5625
0/96/32         430 (228 + 202)      1.279             0.550               0.3750
32/0/0          349 (198 + 151)      0.605             0.211               1.0000
10/16/6         317 (166 + 151)      0.492             0.156               0.5625
0/24/8          290 (139 + 151)      0.466             0.135               0.3750

The main benefit of removing tags from the scheduler critical path is the reduction in load capacitance during instruction wakeup. Lower load capacitance allows for more aggressive clocking of scheduler circuitry. Based on our model of the wakeup and select circuitry, the specialized windows should allow for 25-40% faster clock rates, depending on configuration. Figure 17 shows the total performance (measured in instructions per ns for the scheduling stage) of each benchmark. With the exception of equake, the rate at which the scheduler can send instructions to execute measures 20-50% higher, again depending on configuration.

4.1.5.1 Impact of Window Size

Figure 18 shows that reduced-tag scheduler optimizations continue to pay dividends for differing window sizes. The results given are the averages across all benchmarks. The gains become more prominent as total window size grows. The larger windows have fewer allocator stalls due to more reservation station resources. The large windows also bear more of the scheduler latency in result tag broadcasts (as opposed to the select logic); as a result, they show a larger percentage gain when tag comparators are eliminated. For a window with 128 entries, the optimized schedulers were 35-75% faster.

4.1.5.2 Energy and Power Characteristics

Often it is the case that to reduce power consumption, design changes must be made at the cost of lower performance. In our reduced-tag scheduler designs, lower load capacitance on the result tag bus provides both performance and power benefits.

[Figure 17. Instructions Per ns for Varying Configurations]

[Figure 18. Impact of Tag-Reduction for Varying Window Sizes]

Table 2 shows the energy consumption for each configuration. The optimized designs use 30-60% less energy than the standard monolithic scheduler. Table 2 also shows the power used by each configuration if it were to be run at its maximum possible clock speed. The power reductions are not as pronounced as the energy improvements because the optimized designs run at a faster clock rate.

The power usage of the last-tag predictor was also calculated. It was found to consume less than 10% of the power used by the scheduler in all cases.

One way to quantify an architecture's ability to balance both power and performance is through the use of the energy-delay product [23]. This metric is the product of program run-time and total energy consumed to run the program. Figure 19 shows that the energy-delay product of the optimized scheduler is 50-75% lower than the baseline configurations. The 0/32/32 speculative configuration had the best return, with a 65-75% lower energy-delay across all experiments, including equake, which had the largest IPC impacts.

[Figure 19. Energy-Delay Product for Varying Configurations]

The optimized designs show large gains because eliminating tag comparators and tag bus wiring lowers the result tag bus capacitance, which both reduces energy consumption and allows for higher clock speeds. The energy-delay products for 128-entry windows also showed a 70% gain for the optimized configurations, suggesting that these benefits continue with larger window sizes.

4.1.5.3 Replay Evaluation

The baseline simulators discussed in Chapter 3 were modified to simulate either 21264-style flush replay [15] or selective replay using one of the two methods outlined in Section 4.1.3.

4.1.5.4 Tag Elimination and Replay

The SPEC benchmarks were simulated with three different schedulers on both 4- and 8-wide issue configurations, with the results shown in Figure 20. The baseline scheduler ("Monolithic") and the reduced-tag schedulers ("Mixed" and "Last-tag") all gain a 2-3% performance improvement due to the decreased replay penalty. Galgel benefited the most, with a 26% improvement due to a large number of memory references and enough parallelism to suffer from pipeline flushes. No benchmarks saw a performance degradation due to selective replay. The performance improvement of 2-3% would close much of the gap demonstrated in the experiments of Kim and Lipasti [38].

4.1.5.5 Instruction Window Pressure

As is discussed briefly in Borch et al.'s work [9], the parent-child broadcast replay model requires that instructions remain in the scheduler window for several cycles after they are issued, in order to monitor the kill bus for a replay indication. Only when all of an instruction's ancestors are safely into execution will it finally release its reservation station.

[Figure 20. Effect of Selective Replay on Reduced-Tag Schedulers.]

As a result of keeping instructions in the window beyond their issue time, this mechanism can suffer from a reduction in effective scheduler window size. For example, an 8-wide machine with a 4-cycle load shadow could be holding as many as 32 instructions in the scheduler window that have already issued, reducing the number of spots that are available for newer instructions. In the typical case, the number of extra instructions held in the window will not be that large, because the processor will not usually be filling all of its issue slots.

Using the Intel replay technique, instructions never re-execute out of the scheduler window, thus removing the need to stockpile instructions after they’ve issued. This reduces the instruction pressure in the window, allowing more work to flow into the empty slots.

On the other hand, the Intel approach can limit execution bandwidth when too many instructions are in replay, thus preventing new instructions from entering execution. However, if there is a large number of replaying instructions, either they are all waiting for one long-latency load, or there are multiple outstanding latency mispeculations. In either case, it is not likely that much more parallelism could be found anyway.

Schedulers with 32, 64, and 128 entries were simulated using both replay techniques, with the results shown in Figure 21. As intuition would suggest, the most benefit was seen in configurations with smaller windows and wider issue, with the 32-entry 8-wide scheduler receiving a 5% performance improvement from the reduced instruction window pressure.

[Figure 21. Effect of Reduced Scheduler Pressure. Results presented show the relative performance of a scheduler using the Intel replay mechanism with respect to a scheduler of the same size with a broadcast-based mechanism.]

4.1.5.6 A Discussion on Replay Power Consumption

In the parent-child broadcast replay mechanism, the dependence matrix of an issuing operation is sent to all other instructions in the scheduler window. As has been shown, these wire-intensive broadcasts can be very costly from a power standpoint.

The load on the broadcast bus as seen by each dependence matrix bit can be estimated as roughly equivalent to that seen by each destination tag bit. While each bit of the matrix bus could be separated from the matrix latches by a low-capacitance pass gate, the bus line must still be able to drive the input of the latch if this gate is open. While having all of the pass gates open is an extreme case (all instructions depend on the broadcast through both operands), it is necessary to take it into account as a peak case. The kill bus bits will also have the same load as the tag bits, since they are also being driven to comparators for each operand.

In a standard instruction scheduler without this replay mechanism, the number of bits broadcast each cycle is

W × (dest tag bits),

where W is the scheduler issue width and the number of destination tag bits is log2(# of physical regs). If the parent-child broadcast mechanism is incorporated, the number of bits broadcast each cycle is

W × (dest tag bits) + (W × (W × D)) + W,

where D is the number of cycles in the load shadow. The first term of the equation represents the destination tag broadcasts, and it is the same as for the standard window. The second and third terms of the equation represent the dependence matrix bits and the kill bus bits, respectively.

This drastic increase in broadcasts may not directly alter the cycle time (although the layout expansion could have some effect). However, the power consumption will likely be noticeably larger. For example, an 8-wide window with a load shadow of 4 and 256 registers will need to broadcast 328 bits across the scheduler instead of just 64.
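The sketch below simply reproduces this arithmetic from the two formulas above; it is illustrative only.

    #include <stdio.h>

    int standard_bits(int W, int tag_bits)   { return W * tag_bits; }

    int broadcast_replay_bits(int W, int tag_bits, int D)
    {
        /* destination tags + dependence matrices + kill bus */
        return W * tag_bits + W * (W * D) + W;
    }

    int main(void)
    {
        /* 8-wide, 4-cycle load shadow, 256 physical registers (8 tag bits). */
        printf("standard:     %d bits\n", standard_bits(8, 8));            /* 64  */
        printf("parent-child: %d bits\n", broadcast_replay_bits(8, 8, 4)); /* 328 */
        return 0;
    }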

The Intel replay technique requires none of these extra broadcasts. The mechanism does include some extra logic, but the power consumed by the check table should be far less than the amount that would be dissipated across wire-intensive broadcast lines. This comparison is similar in scope to the comparison of the power usage of a last-tag predictor table with the power used by the scheduler window, as discussed in the previous section.

4.1.6 Summary

The tag elimination approach we have presented greatly reduces the complexity per entry of a CAM-based dynamic scheduler. Designs using these techniques not only gain large amounts of breathing room under pipeline timing constraints, but also benefit from significant power savings due to lower bus capacitance.

While much better than a baseline design, tag elimination designs still suffer from the basic problems inherent in all CAM-based designs: poor performance and power scalability when window sizes are increased or when fabricated in newer process technologies.

These issues are created by the basic broadcast nature of the structures, where instructions are required to communicate their results to all other instructions that might need them.

In the following section, we describe a different design, called Cyclone, which removes the broadcast element from dynamic scheduling, allowing us to break out of the poor scalability curve.

4.2 Cyclone

Many researchers have explored ways to mitigate the scalability problem with dynamic schedulers. Work by LeBeck [40] and Morancho [50] relieves pressure on the instruction window by removing long-waiting instructions. The efforts of Raasch [61], Goshima [25], Gonzalez [13][21], and many others have attempted to remove as much unnecessary circuitry as possible from the critical path, in the same vein as the tag elimination work just described.

The principal drawback of all of these designs is that they still depend on the CAM broadcast mechanism to generate their final schedules. This exposes them, to a varying extent, to the same eventual scalability problems that plague this mechanism.

In light of this, it became our goal to devise a way to generate a schedule and implement that schedule without using global broadcast circuitry. The first step in this endeavor was to explore non-broadcast methods for generating instruction schedules.

The most obvious alternative to standard dynamic schedule generation can be found in compiler-scheduled VLIW machines. As shown in Figure 22, it is possible to implement instruction scheduling at compilation or during execution. Compile-time instruction schedulers analyze the flow of control and data in a program, and then re-order instructions in the binary so that during execution they will better utilize processing resources. The primary advantage of compile-time instruction scheduling is that the hardware necessary to carry out the execution of instructions can be very simple. For example, in machines such as the one described in [34], the compiler selects "bundles" of independent instructions that are fetched and executed as a single unit, obviating the need for any complex dependence checking logic in the underlying hardware.

[Figure 22. Compile-time vs. Run-time Instruction Scheduling.]

The primary disadvantage of compile-time scheduling is that it lacks an accurate assessment of dependencies caused by branches and memory operations, as these are a function of program execution. Consequently, any scheduling decisions implemented in the presence of these instructions must be made conservatively. The conservative tendencies of compile-time schedulers are well recognized, and as such, many proposals have been made to lessen their effect. For example, branch boosting [68] and predication [27] mitigate the effects of control dependencies by pre-computing or hiding branch instructions. Advanced loads [34] and run-time disambiguation [57] have been proposed to reduce the impact of ambiguous load/store dependencies.

As has been shown, dynamic scheduling at run-time holds the opposite disposition of compile-time scheduling. Since program dependence analysis occurs at run-time, branch directions and load/store addresses are often available to improve the accuracy of scheduling. Modern designs go a step further and incorporate branch and load/store dependence predictions to further refine the schedules. However, the hardware necessary to implement dynamic scheduling can be quite complex and slow, as has been discussed extensively.

In this section we present the Cyclone scheduler, a design that draws from compile-time and run-time scheduling techniques in an effort to secure the low complexity of a compile-time scheduler and the high-quality instruction schedules of a dynamic scheduler.

The hardware-based instruction scheduler implements a simple scheduling algorithm similar to compile-time list scheduling [52]. Instructions are scheduled based on the predicted latency until their operands become available. To obtain high-quality schedules, we employ our scheduling algorithm at run-time, where it is implemented by a simple hardware unit in the front-end of the processor pipeline. To further improve schedules, the hardware-based list scheduler incorporates branch predictions and load/store dependence predictions to speculatively orchestrate execution in the presence of control and memory dependencies. Once an instruction's latency has been predicted, it enters the Cyclone scheduler queue, which implements a network of locally-synchronized datapaths that route an instruction to its functional unit at (or shortly after) its prescribed execution time. To ensure correct program execution in the presence of latency and dependence speculation, the Cyclone scheduler incorporates a selective replay mechanism that overloads the register forwarding infrastructure to re-execute only those instructions dependent on incorrectly scheduled instructions. This same mechanism is used to identify and flush instructions which have been squashed due to mispeculation.

The Cyclone scheduler makes three substantial contributions in the area of high-performance dynamic instruction scheduling. They are as follows:

• Broadcast-free dynamic scheduling: The Cyclone scheduler contains no global broadcast or control signals. Our design employs distributed methods to generate an instruction schedule, re-synchronize impaired schedules, and recover from mispeculation. The lack of global control enables very fast clocking and requires much less area due to reduced interconnect requirements.

• Efficient dependence-based variable-latency instruction replay: In the event of a mispredicted latency or dependence, the Cyclone scheduler is capable of recovering schedules by selectively replaying only those instructions dependent on the instruction forcing the replay. The latency of a replay may be variable, and it can be easily set by the instruction that initiated the replay.

• First-class scheduling of memory dependencies: Our design incorporates store-set predictions [14] into the scheduling mechanism and treats store/load dependencies in the same fashion as register dependencies. As a result, our simulations and timing analyses fully account for the impacts of memory communication on the throughput of the scheduler.

4.2.1 The Cyclone Scheduler

4.2.1.1 High-level Architecture

Figure 23 illustrates the high-level architecture of the Cyclone scheduler. After register renaming, instructions enter the instruction pre-scheduler, which predicts the number of cycles that must elapse before the instruction's operands are available for execution.

[Figure 23. Cyclone Scheduler Architecture]

Once an instruction's latency has been predicted, it enters at the tail of the countdown queue. In the countdown queue, instructions move over locally-synchronized datapaths toward the end (left side) of the queue at the rate of one queue entry per clock cycle. When one half of the predicted latency until execution has expired, instructions jump to the lower main queue via switchback datapaths. Like the countdown queue, the main queue steps instructions toward execute at the rate of one entry per clock cycle, permitting instructions to arrive at execute at (or near) their predicted execution time.

To facilitate the construction of high-quality instruction schedules, the Cyclone scheduler supports control and data speculation. Branch directions, load/store dependencies, and load latencies are all speculated by the instruction scheduler using a branch predictor and a store-set predictor [14]. If the speculative schedule becomes corrupt (due to, for instance, an input sourced by a load that missed in the data cache, or a late switchback), the Cyclone scheduler is capable of repairing the schedule on the fly using a selective replay mechanism that re-executes only those instructions dependent on incorrectly scheduled instructions.

To implement selective replay, instructions access physical storage ready bits immediately before attempting execution. If an instruction arrives at execute and its operands are ready, it commences execution as planned. If the instruction's operands are not yet ready, the latency until execution is once again predicted, and the instruction is re-inserted into the countdown queue. In addition, the physical register destination of the replayed instruction is marked unavailable in the ready bit table. Subsequent instructions that are dependent on this operation will find this operand unavailable and replay as well. Using this dependence-based replay approach, only those instructions that use the result of an incorrectly scheduled instruction must be replayed. It is important to note that the replay mechanism is sufficiently robust that it can detect and correct any scheduling errors by replaying instructions until their operands arrive. As such, the instruction pre-scheduler need only act as a schedule predictor; any errors it introduces into the instruction schedule will be safely corrected by the replay mechanism.
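A minimal sketch of this replay decision follows; the structure and the two hooks (execution and re-insertion) are hypothetical names for the datapaths described above, not the actual circuit interfaces.

    #include <stdbool.h>

    #define NUM_PREGS 256
    static bool reg_ready[NUM_PREGS];   /* physical storage ready bits */

    typedef struct { int src1, src2, dst; } cyclone_inst_t;

    static void execute_inst(cyclone_inst_t *in)            { (void)in; /* stub */ }
    static void reenter_countdown_queue(cyclone_inst_t *in) { (void)in; /* stub */ }

    /* Checked as an instruction reaches the head of the main queue. */
    void at_execute(cyclone_inst_t *in)
    {
        if (reg_ready[in->src1] && reg_ready[in->src2]) {
            reg_ready[in->dst] = true;     /* schedule held; execute as planned */
            execute_inst(in);
        } else {
            reg_ready[in->dst] = false;    /* dependents will replay in turn */
            reenter_countdown_queue(in);   /* with a freshly predicted latency */
        }
    }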

Wide issue is supported by the Cyclone scheduler by providing additional capacity for scheduling and timing instructions within the Cyclone scheduler queue. It is possible to increase the capacity of the scheduler queues by providing space for multiple instructions in each entry. We term these additional instruction slots rows. The scheduler depicted in Figure 23 contains a 4-row Cyclone scheduler queue.

The Cyclone scheduler has lower circuit complexity than conventional broadcast-based scheduler designs (such as a Tomasulo [27] or matrix scheduler [25]) because it lacks centralized control mechanisms of any kind. All communication occurs locally between instruction entries in the Cyclone scheduler queue. Comparatively, broadcast-based schedulers must send dependence information to all other instructions each cycle, making highly capacitive broadcast wires with long propagation delays a necessity.

[Figure 24. Pre-scheduler Design and Example. The front-end pre-scheduler architecture is shown in a) and an example chain of four dependent instructions (with delays until execute) is shown in b). All instruction operations in the example execute in a single cycle.]

4.2.1.2 Instruction Pre-Scheduler

The instruction pre-scheduler, illustrated in Figure 24a, is responsible for scheduling instructions into the instruction queue such that instructions reach the execute stage immediately after their inputs are computed. Our design is a variant of the dataflow pre-scheduling design presented by Gonzalez [21] and elaborated on by Michaud [48]. The instruction pre-scheduler resides in the front-end of the pipeline, after instruction renaming and before the execution queues. Each cycle, instruction placement is computed by the scheduler for each instruction in program order. Instructions are placed at the tail of the countdown queue such that, if an instruction is expected to execute in N cycles, it will be given a timer value that will prompt it to cross over to the main queue in N/2 steps. For a processor design with a W-wide decode width, the instruction scheduler places the next W dynamic instructions into the tail of the countdown queue each cycle.

Conceptually, the instruction pre-scheduler computes, for each instruction, the delay until all of its inputs are available. This task is accomplished with a simple timing table and a MAX calculation. An instruction with two inputs available at times t0 and t1 can begin execution at MAX(t0, t1). The result of that instruction will be available at MAX(t0, t1) + L, where L is the latency of the instruction operation. As shown in Figure 24a, each instruction accesses the timing table, which produces the delay (in cycles) until each input is available. Each instruction then computes the maximum delay of its inputs, and the result is used as the index to place the instruction into the queue.

The instruction scheduler timing table is indexed with the logical register index, and it returns the delay until the operand is ready. For instructions with immediate operands or no operands (e.g., load-imm), the timing table returns a zero delay. When an instruction is scheduled, its schedule time is written into the scheduler timing table under the destination register index. To allow the scheduling of dependent instructions in back-to-back fetch groups, the destination delay values are also forwarded back into the pre-scheduler logic.
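The core of the pre-scheduler is thus a table lookup followed by a MAX and a writeback; the sketch below captures that loop in software. The table, the encoding of missing operands, and the function names are our own assumptions.

    #define NUM_LREGS 32

    /* Delay (in cycles) until each logical register's value is ready;
     * zero for values that are already available. */
    static int delay_table[NUM_LREGS];

    static int max2(int a, int b) { return a > b ? a : b; }

    /* Returns the predicted delay until this instruction can issue.
     * A source index of -1 denotes an immediate or missing operand. */
    int preschedule(int src1, int src2, int dst, int op_latency)
    {
        int t0 = (src1 >= 0) ? delay_table[src1] : 0;
        int t1 = (src2 >= 0) ? delay_table[src2] : 0;
        int start = max2(t0, t1);                /* MAX(t0, t1) */
        delay_table[dst] = start + op_latency;   /* MAX(t0, t1) + L */
        return start;  /* countdown-queue timer crosses over in start/2 steps */
    }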

4.2.1.3 Limiting Dependence Chains

A complication arises in the design of the pre-scheduler because instructions in an issue group may share dependencies with each other. These dependencies create a recurrence in the scheduler timing computation, such that the input to the MAX computations may be the result of the MAX computation of earlier instructions in the same group. We have performed extensive simulations with a range of designs for dealing with inter-instruction dependency scheduling. One conclusion we reached fairly quickly was that it is insufficient to issue only the independent instructions in each cycle. This insight is not surprising in retrospect, as placing an in-order issue mechanism anywhere before the dynamic scheduler should greatly degrade the throughput of the dynamic schedule.

The scheduler design presented in Figure 24a allows dependent chains of instructions to compute their correct start times within the same cycle. The output of each MAX calculation can be forwarded to any later instruction, to override an input with an earlier-computed schedule time. In the first cycle of scheduling (PSCHED0), the timing table is probed to determine the delay until input operands are available. In the second cycle of scheduling (PSCHED1), the MAX function computes the time at which the instruction's operands are available, and forwards this result to the inputs of the MAX computations of later instructions.

While the datapaths in the circuit would permit arbitrary length chains of dependent instructions (up to four in the figure), the dependence logic limits the computation to at most two cascaded MAX computations. Simulations indicate that dependence chains longer than two instructions occur in less than 3% of all fetch cycles for SPEC2000. Our MAX calculations are limited to 6- or 7-bit subtraction operations (depending on maximum Cyclone queue depth); consequently, the resulting scheduler is both fast and accurate. In the event an instruction chain length is longer than two, the dependence check logic will indicate this case by the end of the PSCHED1 cycle by asserting the reschedule signal, which forces an additional cycle to complete the schedule times of long dependent chains. This additional cycle causes very little degradation in performance, since the instructions at the end of long chains would not be ready to execute immediately anyway.

An example of dataflow instruction scheduling is illustrated in Figure 24b. Instructions I0 and I1 compute their ready times and forward them to the inputs of instruction I3's MAX calculation. Instruction I1's MAX calculation is also forwarded to one input of instruction I2. After two cascaded MAX calculations, all four instructions have correctly computed their schedule times. It is interesting to note that in a VLIW machine, instructions within a fetch group are independent. This would further simplify the instruction pre-scheduler, because MAX computations would never need to be forwarded to later MAX computations in the same fetch group.

To keep the pre-scheduler logic simple, it does not accept new instructions until the current fetch group has been fully scheduled. In addition, the instruction scheduler will stall when the queue entry for any of the scheduled instructions is full. In either of these events, the scheduler will retry on subsequent cycles until the full instruction fetch group is scheduled and inserted, at which point scheduling may continue.

4.2.1.4 Memory Scheduling

It is vital to the production of accurate schedules that loads be scheduled as soon as possible after dependent stores; otherwise, load/store dependencies will cause a significant number of instruction replays. These replays increase the pressure in the Cyclone queue, potentially preventing later instructions from entering the main Cyclone queue, and delaying the execution of loads for a complete trip through the replay loop. To accurately schedule loads, we have adapted the store set dependence predictor [14] to time the execution of load instructions. The store set predictor tracks the stores that are a source for a particular load and assigns to that group a store set identifier. Loads are scheduled after the last store from the store group still in flight when the load is dispatched. The approach can be readily adapted to the Cyclone scheduler by recording with each store set identifier the delay until the last store in flight completes. When scheduling stores, the latency of a store operation is equal to the time to forward a store value to a load instruction, typically the latency of the load/store queue forwarding mechanism (one cycle in our experiments).
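The adaptation can be sketched as follows. StoreSetTimer and its methods are illustrative names under the stated one-cycle forwarding assumption, not the dissertation's actual implementation.

```python
# Sketch of timing loads with a store-set style table. The delay recorded
# per store set identifier is the time until the last in-flight store in
# that set can forward its value.

STORE_FWD_LATENCY = 1  # cycles to forward through the load/store queue

class StoreSetTimer:
    def __init__(self):
        # store set id -> delay until last in-flight store completes
        self.last_store_delay = {}

    def schedule_store(self, ssid, store_start):
        # A store's value is forwardable one cycle after it executes.
        done = store_start + STORE_FWD_LATENCY
        self.last_store_delay[ssid] = max(self.last_store_delay.get(ssid, 0),
                                          done)

    def load_delay(self, ssid):
        # Schedule the load no earlier than its last sourcing store.
        return self.last_store_delay.get(ssid, 0)

    def flush(self):
        # After a mispeculation, all entries are reset to zero
        # (see Section 4.2.1.6).
        self.last_store_delay.clear()
```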

4.2.1.5 Switchback Datapaths

To implement the instruction schedule computed by the pre-scheduler, instructions are injected into the tail of the Cyclone scheduler queue with a prediction of how far the instruction should progress down the countdown queue before turning around and heading back toward execution in the main queue. Switchback datapaths provide the connections over which instructions can turn around and head back toward execution. Their inclusion in the design is vital to keeping the scheduler circuit complexity low. With them, it is possible to eliminate any random access ports into the scheduler queue. Instead, all instructions must enter at a single point into the tail entry of the countdown queue, after which they work their way to execution to meet their predicted delay.

Figure 25 illustrates the switchback datapaths and control logic. Instructions time their entry into the main queue using simple countdown logic. Instructions are injected into the countdown queue with a timer value equal to half of their total predicted latency. When the countdown completes, the instruction will begin attempting to switch back to the main queue. As shown in Figure 25a, instructions with an even numbered latency jump down to the entry directly below (m0), while instructions with odd numbered latencies jump down and to the left (mp). This routing approach inserts an additional step between entry and execute for instructions with odd numbered predicted latencies. Without the diagonal datapaths, all latencies would have to be even, as the same number of queue entries would be traversed down and back to reach execution.

Figure 25. Switchback Logic. The options an instruction has for crossing from the countdown or replay state to the main queue are shown in a). A diagram of this logic for a single instruction, and its critical path logic equation, Vm0 = Vmp | Vr0 | tr1 | (rr1 & Vm0), is shown in b).

To keep countdown logic simple, we employ a cascaded Johnson counter [85], which requires N bits to count a maximum of 2N states. The advantage of the Johnson counter is that it only requires one inverter per entry in the countdown queue. An additional feature of the Johnson counter is that we are able to pick counter states such that the most-significant bit of the counter state transitions when switchback should occur, eliminating the need for any comparison logic on counter values.
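To illustrate why a Johnson counter needs only N bits for 2N states, and only a single inversion in its feedback, consider the small sketch below. It is a purely illustrative software model, not the circuit itself.

```python
# Minimal sketch of an N-bit Johnson (twisted-ring) counter.

def johnson_states(n_bits):
    """Yield the 2*N states of an N-bit Johnson counter as bit tuples."""
    state = [0] * n_bits
    for _ in range(2 * n_bits):
        yield tuple(state)
        # Shift right; the new MSB is the complement of the old LSB
        # (the single inversion per entry mentioned in the text).
        state = [1 - state[-1]] + state[:-1]

for s in johnson_states(3):
    print(s)   # 000 100 110 111 011 001 -> six states from three bits
```

Note that the most-significant bit toggles exactly twice per full sequence, which is why a well-chosen starting state lets the MSB transition itself serve as the switchback trigger.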

Because main queue entries can accept instructions from multiple sources (e.g., mp, r1, and r0 in the figure), request conflicts must be resolved. In the event of multiple requests, highest priority is given to the previous main queue entry (mp). This entry must have highest priority because it has nowhere else to go and it cannot stall. If the previous main queue entry is empty, priority is next given to the switchback channel directly above (r0), followed by the channel diagonally above and to the right (r1). If countdown queue entries cannot enter the main queue due to request conflicts, they will instead enter the following countdown queue entry. This conflict delays the execution of the instruction in the countdown queue by one cycle. Consequently, the instruction will continue to make repeated requests for switchback until either i) it finds an available path, or ii) it reaches the end of the countdown queue, at which point it is guaranteed entry into the tail of the main queue. The topology of the Cyclone scheduler queues is such that every instruction will have at least one empty queue entry to move into for the following cycle. In addition, instructions are also guaranteed to eventually reach execution (by traversing the entire length of the queues). As a result, forward scheduler progress is always maintained and no global synchronization is required, which keeps scheduler circuit speeds fast.

Each Cyclone scheduler row is associated with a specific group of functional units. As shown in Figure 25a, instructions switch back to the row in the main queue corresponding to the row they occupy in the countdown queue. This policy ensures that instructions will find the appropriate functional unit when they reach the execute stage. We found in our circuit analyses that the local nature of switchback control provided significant headroom in circuit performance. As a result, for instructions with multiple functional units available to execute their operations, we permit them to switch back to one of two fixed rows in the main queue (with identical functional units), based on availability of queue space. This policy helps to reduce switchback conflicts and leads to more efficient schedules.

4.2.1.6 Replay and Speculation Recovery

The Cyclone scheduler incorporates a unified mechanism to implement selective instruction replay and mispeculation recovery. Immediately preceding execution, instructions probe a table of physical register ready bits to check the availability of operands. This check is necessary because of the speculative nature of Cyclone scheduling. Any incorrect schedule, due to factors such as latency misprediction, incorrect memory dependence prediction, or switchback conflicts, may result in instructions entering execution before their operands are available. If the ready bits indicate an instruction's operands are ready, its destination register ready bit is set valid and the instruction continues into execution. In the event an instruction cannot complete execution in its predicted execution latency (for example, if a load instruction misses in the cache when a hit was predicted), the instruction sets its destination register ready bit to invalid. Later instructions that access this invalid register will replay and indicate that their result is unavailable, forcing a cascaded dependence-based instruction replay. All instructions that replay enter the countdown queue with a new predicted latency.
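The ready-bit probe and cascaded replay can be sketched as below, assuming one ready bit per physical register. Field names (phys_srcs, phys_dest) are hypothetical.

```python
# Sketch of the ready-bit check performed immediately before execution.

class ReadyBits:
    def __init__(self, num_phys_regs):
        self.ready = [True] * num_phys_regs

    def check_and_issue(self, instr):
        """Return True if instr may execute; otherwise mark it for replay."""
        if all(self.ready[r] for r in instr.phys_srcs):
            # Optimistically mark the destination ready; it is invalidated
            # later if execution cannot complete in the predicted latency
            # (e.g., a predicted cache hit that actually misses).
            self.ready[instr.phys_dest] = True
            return True
        # Operands missing: propagate unavailability so that dependent
        # instructions replay in cascade as well.
        self.ready[instr.phys_dest] = False
        return False
```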

Branch and load/store dependence mispeculations are handled using a similar mechanism. When an instruction behind a mispeculated instruction is squashed, it is marked as such, and the instruction will continue until it reaches execution, at which point the scheduler will drop it. We avoid flushing the scheduler queue, as instructions before mispredicted branches may still be in flight, and we also avoid checkpointing Cyclone queue valid bits, as this would make access ports into the scheduler queue necessary.

Mispeculated instructions are identified using speculation masks, similar to those in the R10000 [87]. The speculation mask of each instruction is included with the instruction in the Cyclone scheduler queues. When instructions reach the end of the main queue, they probe the speculation state. If the table indicates that the instruction has been squashed, it is dropped. For all experiments, we use a five-bit speculation mask. This size mask permits at most 32 speculative paths within the window at once. Instructions requiring additional masks would likely be very speculative and have a low probability of retiring.

When a mispeculation occurs, the instruction pre-scheduler timing table must be updated to reflect that squashed instructions are no longer forwarding values to instructions being decoded. We leverage two observations to simplify recovery of the pre-scheduler timing table. First, there is no strict requirement that the instruction pre-scheduler timing table be correct. The primary motivation for updating the timing table is that it improves the schedule accuracy for instructions after mispredicted branches. Second, we observe that instructions following a mispeculation nearly always arrive at execute with their operands ready, because in long pipelines it takes many cycles for newly fetched instructions to reach execution. During this delay, most instructions in flight will have completed. Simulation of the highly speculative benchmark GCC confirmed this claim: 99.98% of all first-accessed logical operands were ready following a mispredicted branch. As such, we can accurately approximate the new pre-scheduler timing table by simply resetting all delay times to zero whenever a mispeculation occurs.

A similar technique is used to recover from load mispeculations. Loads are assigned a speculation mask at decode. In the event a later store arrives after a speculative load completes (revealing an incorrect store-forward), the load's speculation mask and those that follow it are marked invalid, and instructions following the load in the Cyclone queue are squashed when they reach execute. Like the register pre-scheduler timing table, all entries in the store-set timing table are set to zero after a mispeculation.

4.2.2 Methodology

4.2.2.1 Architecture Simulation

We simulated both the broadcast-based scheduler and the Cyclone scheduler on two different pipeline configurations. First, we simulated a machine with a width of 4 throughout the pipeline, from fetch to commit. Second, we simulated a machine with a width of 8 throughout the pipeline. In addition, we simulated the Cyclone scheduler with an issue width of 8, but with the fetch and commit rates left at 4 instructions per cycle. The number of functional units was kept constant across all configurations. The processor had 5 integer units, 2 of which were capable of multiplication/division; 3 FP units, 2 of which were capable of multiplication/division/square root; and 4 memory ports. FU latencies varied depending on the operation, but all FUs, with the exception of the divide units, were fully pipelined, allowing a new instruction to initiate execution each cycle.

The Cyclone simulator models the architecture discussed in Section 4.2.1. The Cyclone main and countdown queues were fixed at a length of 8, which gave Cyclone an overall loop length of 16. The underlying ROB was varied between 64 and 256 entries. Load/store dependencies are checked within a 32-entry load/store queue.

Our baseline broadcast-based configuration models the current generation of out-of-order processor microarchitectures, with parameters outlined in Chapter 3. We modeled machines with instruction window sizes ranging from 16 to 128 and ROB sizes ranging from 64 to 256 instructions. All configurations had a 32-entry load/store queue.

In order to reduce the effects of unknown stores and harness dependence information between stores and loads, a store-set predictor [14] is included in both our baseline and our Cyclone configurations. The store-set predictor has 128 entries and is 4-way set-associative. The store-set predictor finds links between loads and sourcing stores, allowing loads to speculatively execute before unresolved stores ahead of them in the load-store buffer.

4.2.2.2 Circuit Timing Methodology

To get a full understanding of the consequences of our design decisions, the circuit characteristics of the different scheduler structures must be examined. The circuit delays for the Cyclone scheduler were calculated using the SPICE design flow discussed in Chapter 3. The circuit delays for the CAM-based scheduler windows used in this study were derived from the models used to evaluate the tag elimination mechanism.

4.2.2.3 Area Estimate Methodology

To examine the area footprint of the different scheduling designs, register bit equivalent (RBE) areas were computed, as described in Chapter 3.

4.2.3 Impact on IPC

The IPC extracted by the Cyclone scheduler is consistently below that produced by the traditional broadcast-based scheduler. The most substantial cause (direct or indirect) of IPC loss is switchback conflicts in the Cyclone. If an instruction wishes to cross from the replay queue to the main queue, it cannot do so if another instruction is currently occupying that slot. This will cause the instruction to cross at a different point, which will delay its arrival at the execute stage by at least one cycle. This may not significantly impact the execution of the instruction in question, but it can disrupt the execution of the descendants of that instruction. When parent instructions fail to execute as scheduled, children that are scheduled for that parent's completion time may arrive early, which will force the child instruction, as well as all of that child's descendants, to replay. Not only are these instructions now forced to replay, but they are also now consuming slots at the tail of the replay queue into which the decode stage is trying to insert new instructions.

Figure 26. IPC effects of Cyclone optimizations. All configurations have an issue width of 8. (The figure plots instructions per cycle, from 0 to 3.5, for a 32-entry 8-wide baseline, Cyclone, Cyclone with double switchback paths, Cyclone with a retiming table, and Cyclone with both, across art, equake, facerec, mesa, swim, bzip2, gap, gcc, gzip, and vortex, plus FP and INT averages.)

To mitigate these effects, we tried several different approaches. First, because the Cyclone structure scales well with issue width, we made the queues wider in hopes of reducing the number of conflicts. Even so, conflicts would still occur when there was a valid entry in the only row that an instruction was allowed to jump to. Because we determined that the switchback logic was far off the critical path, we gave each entry of the replay queue switchback paths to two different row entries in the main queue.

Also, in an effort to improve the accuracy of the Cyclone schedules, we experimented with a Cyclone design which includes variable-length dynamic replay. This design adds a retiming table to the register read stage of the pipeline. As instructions leave the main Cyclone queue, they probe the ready bits of their operands. If any operands are unavailable, the retiming table indicates the number of cycles until each operand is available for use. The instruction then takes the maximum delay of all of its unavailable operands, adds its own latency to the result, and stores this value into the retiming table at the index of its destination register. Finally, the instruction enters the replay queue with a latency equal to the maximum delay of its unavailable operands. In the baseline system, all replays are signaled with a single ready bit, and the new latency is set to one cycle, meaning that replaying instructions immediately attempt to cross back over to the main queue.

The effects of all these optimizations are shown in Figure 26. Using the double switchback logic, the average IPC was only slightly (~1%) higher, with some benchmarks, like gzip, seeing as much as a 5% improvement. Also, simulation shows that the finer-grained variable-length replay support provided little extra scheduler throughput on most benchmarks. However, art and equake were 20% and 10% higher, respectively. Given the added area necessary to implement the multi-ported retiming table (approximately 100608 RBEs for an 8-wide configuration), this small additional precision is not likely to be worth the complexity cost for a real design.

Table 3. Critical path latencies calculated with SPICE for different scheduler configurations

Config              Timing (ps)
Cyclone             193
16-entry/4-wide     284
16-entry/8-wide     345
32-entry/4-wide     349
32-entry/8-wide     448
64-entry/4-wide     466
64-entry/8-wide     671
128-entry/4-wide    775
128-entry/8-wide    1243

Table 4. Component breakdown for scheduler and register file area. Areas are in Register Bit Equivalent (RBE)

Config                  Scheduler   Reg File   Total
Cyclone 8-wide          16682.4     338504     355186.4
CAM 8-wide 64-entry     143527.7    338504     482031.7
Matrix 8-wide 64-entry  58510.1     338504     397014.1

4.2.4 Circuit Speed

The SPICE circuit timing results are shown in Table 3. The critical path for the Cyclone runs through the prescheduling logic. It consists of two 8-bit MAX operations and one 8-bit addition. The switchback logic was also simulated and was determined to be well off the critical path, because all communication is only to each entry's neighbors and involves only a small amount of logic. The Cyclone also benefits from not needing any selection logic, since it issues one instruction each cycle from every row.

4.2.5 Complexity Tradeoff

There are many different factors to take into account when evaluating scheduler designs. The primary goal is high instruction throughput. This can be accomplished either through high IPC or through low-complexity logic which can be run at a higher clock speed. A good scheduler design must also take into account its total power consumption. This factor is tightly related to both the chip area of the design and its circuit complexity.

Figure 27. Performance and area for the 4-wide scheduler design space. The most desirable designs are those above (higher performance) and to the left (lower area use) of other designs. (The figure plots scheduler throughput in IPns against register bit equivalent (RBE) area for Cyclone 4-decode/4-issue, tag elimination (0/32/32 and 16/32/16) 4-issue, and 16-, 32-, 64-, and 128-entry 4-issue broadcast designs.)

The Cyclone scheduler takes advantage of this relation by providing a design that is much smaller and less complex than a broadcast-based window. First, because all signals in the structures are local, the throughput is increased due to much faster logic speeds, at the expense of decreased IPC. Also, the Cyclone scheduler structure has a much smaller chip footprint than a broadcast-based scheduler. For example, an 8-decode, 8-issue Cyclone queue takes up approximately 12% of the area of a 64-instruction 8-issue CAM scheduler, and approximately 28% of the area of a similarly sized matrix scheduler.

Figure 28. Performance and area for the 8-wide scheduler design space. The most desirable designs are those above (higher performance) and to the left (lower area use) of other designs. (The figure plots scheduler throughput in IPns against RBE area for Cyclone 8-decode/8-issue, Cyclone 4-decode/8-issue, and 16-, 32-, 64-, and 128-entry 8-issue broadcast designs.)

Both of these advantages are seen in Figure 27, Figure 28, and Figure 29, which analyze the tradeoffs between throughput and area. Throughput is presented in single-stage instructions per nanosecond (IPns). We examined broadcast-based, tag elimination, and Cyclone-style designs, varying instruction window size (for broadcast designs) and issue width. Figure 27 compares 4-wide execute configurations, while Figure 28 examines 8-wide configurations. Because the size of the register file changes dramatically with the number of ports, its total size is a large factor when making decisions on scheduler design. Figure 29 presents the design space for both widths, with the register file area included. The breakdown of the scheduler and register file area is given in Table 4. Overall, the Cyclone scheduler's simplicity and lack of global control allows for fast clock rates and high throughput, but in far less area than same-width conventional designs.

Figure 29. Performance and area overview. Designs shown are for issue widths of 4 and 8 and include register file area. The most desirable designs are those above (higher performance) and to the left (lower area use) of other designs. (The figure plots scheduler throughput in IPns against RBE area for Cyclone 8-decode/8-issue, 4-decode/4-issue, and 4-decode/8-issue designs alongside 16- to 128-entry broadcast designs at issue widths of 4 and 8.)

4.3 Summary

Traditionally, there have been two primary instruction scheduling mechanisms, both of which have their respective problems. Compile-time scheduling suffers from less-than-ideal schedules due to a lack of information about runtime events. Dynamic schedulers, while able to generate higher quality schedules, are typically implemented with complex broadcast circuitry that can become a bottleneck in the instruction pipeline. We have introduced the Cyclone scheduler, which draws techniques from both these approaches to achieve schedules that rival those of other dynamic schedulers without the need for complex wakeup and selection logic. The Cyclone scheduler relies on a simple one-pass scheduling algorithm to predict the time when instructions should execute. Once decided, this schedule is implemented with a timed queue structure that additionally supports efficient selective replay in the event of an incorrect schedule. Even though the Cyclone scheduler design possesses no global communication to slow clock speeds, it rivals the instruction throughput of similarly wide broadcast-based dynamic schedulers while taking up a much smaller chip area.

CHAPTER 5

VIRTUAL GLOBAL CONTROL

As was stated in Chapter 1, it is our goal to find and evaluate complexity-effective alternatives to global control mechanisms with high communication complexity. In Chapter 4, we addressed dynamic scheduling, a global control signal with a highly capacitive load. In this chapter, we find methods to mitigate long-wire problems, specifically those that arise from certain global control signals.

5.1 Sending Control Signals Over Multiple Cycles

As with other trade-off optimizations that cross the circuit/architecture boundary, removing control signals from the critical path involves a comparison process using information from both fields of design. From the architectural side, we need to know the performance impact (on instruction throughput per cycle) of making a control signal take multiple cycles. In the circuit realm, we need to know the savings or loss in both clock period and power usage. In the following sub-sections, we present a process that can be used to make decisions about pipelining chip control signals.

As part of this process, we present two methods for efficiently allowing control signals to use multiple cycles for transmission. The first method, micro rollback, removes the necessity for inter-stage stall signals to arrive in a single cycle to preserve stall semantics. The second method is a distributed technique based on CounterFlow [71] pipelining, which processes a flush event over multiple cycles using only nearest-neighbor communication.

5.1.1 Identifying Problematic Signals

Before a designer attempts to craft solutions, they must first know where these problems exist in the chip. Given a processor in the later stages of physical design, the slowest paths on a chip can be estimated using CAD tools. This evaluation would likely come fairly late in the logic design cycle, as it would give more accurate timing results, and the data can be mined from standard timing evaluation runs. The most important data to collect includes the delays of a set of the longest paths and the architectural significance of the signals to which those delays correspond.

The delay information can be used in gauging the overall effectiveness of applying an optimization to remove the circuit from the critical path. For example, if designers optimize the longest path in a design, only to have the next-longest path be only infinitesimally shorter, the speedup benefits of the optimization are not likely to be worth the effort.

The information on the signal’s logical significance is needed to determine what effect it might have on other related data and control signals, including back-up overheads and architectural slowdown. It is also important in determining the signal’s frequency of use.

5.1.2 Methods for Virtualizing Signals

Some of the longest wires in processors occur in recovery logic, which is responsible for correcting the repercussions of mispeculations throughout the depth of the pipeline.

Flush logic removes or invalidates instructions from the pipeline that the processor speculated on incorrectly. The most obvious example of flush logic is the pipeline flush that occurs when a branch instruction is mispredicted. In this case, all instructions that follow the branch must be invalidated and fetching resumed at the correct location. Stall logic causes instructions in the pipeline to stop their movement to the next stage so that some dependency further ahead in the pipeline can take extra time to resolve. A simple example of this is the "load-use" stall from the standard DLX pipeline [27].

The high complexity of recovery logic comes from the need to tell many stages of the pipeline that some event has occurred. Long wires are needed to distribute this information throughout the chip. Further, the loading at the end of these wires is not small, due to the need to send the signal to several different locations. These two factors combine to make the broadcast of recovery signals very slow.

5.1.2.1 Multicycle Stall Logic

Figure 30 illustrates the simple single-cycle approach to pipeline stalling based on global clock gating. In the event that any stage detects a reason to stall, preceding stages of the pipeline are stalled by gating the next global clock edge. After the stalling instruction resolves, the signal is released, and the clock resumes to the affected stages.

Due to the long wires and large number of recipients involved, it may not be possible to implement global clock gating without significantly impacting processor cycle time. The stall signal must propagate from the error-producing stage to all other stages within only a small amount of time. In the Razor design discussed later, this problem is exacerbated by the small amount of time, less than half of a cycle, that the error signal has to make this transition.

Figure 30. Pipeline stalling using global clock gating. (The figure shows a five-stage IF/ID/EX/MEM/WB pipeline with per-stage stall and recover signals gating a global clock.)

Figure 31. Timing diagram for pipelined stalls. (The waveform shows clock, reg_D, stall, stall_pipelined, and reg_Q over four cycles; the pipelined stall arrives after instruction 1 has already been replaced in the register by instruction 2, leaving reg_Q undefined.)

Figure 32. "Off-Ramps" for delayed stall signals. (The figure shows a pipeline register backed by an extra "off-ramp" register, with a delayed stall signal selecting which register drives the next stage of combinational logic.)

The most obvious answer to the problems with this signal, from a circuit designer's standpoint, is to implement it in multiple cycles. While simply pipelining a stall signal seems a reasonable enough idea, the extra delay would cause disastrous results in the architecture. The problem is illustrated in Figure 31. During cycle 2, the bits for instruction 1 are being held in the flip-flop while it is being processed by the next logic stage. Normally, if a stall is signaled from a later stage, the bits for instruction 1 would be held in the register until the stall was finished. If the stall signal is pipelined, however, its arrival during cycle 3 is too late to stop the bits for instruction 1, which have already been replaced by instruction 2. While instruction 1 may have moved on correctly, it is highly likely that it, or one of the other instructions in the stall shadow, received incorrect data or was not latched in a succeeding stage, resulting in the complete loss of an instruction.

One partial solution to the difficulty of global stalling is the use of "off ramps"1 to save previous-cycle instruction bits and allow stall signals more time to propagate across the chip. As shown in Figure 32, the ramp consists of an extra register backing the pipeline register. Each cycle, the ramp register saves the value that was in the pipeline register the cycle before. By keeping this value around an extra cycle, any stall signal needing to reach this stage now has an extra cycle to do so. When the signal is received, the values going into the next stage will be taken from the ramp register instead of the normal pipeline register.

This technique preserves the same instruction timings while giving the stall signals more time to propagate. However, this scheme is only applicable when one extra cycle is sufficient time for the stall signals to propagate to all stages. While this may work for smaller designs, many studies [2][31][78] have shown that, in future processes, it may take several cycles for signals to propagate across chips, even given low loads and optimal repeaters.

1. The "off ramps" described have a similar purpose to the emergency ramps seen on mountainous highways, which are intended for large trucks that are not able to slow themselves quickly enough to avoid burning out their brakes.

Figure 33. Mechanism for VGC micro rollback. (The figure shows a pipeline register backed by a FIFO of backing store registers, selected by a delayed stall signal, feeding the next stage of combinational logic.)

An extension of this technique, micro rollback, was proposed and discussed at length by Tamir, Tremblay, and Rennels [80][82][83] as a method to quickly recover from transient faults that occurred during execution. After examining this design, it is clear that its ideas apply easily to stall logic as well.

As is shown in Figure 33, micro rollback keeps a FIFO queue of length N backing each register. Similar to the "off ramp" technique, the value in the pipeline register is saved each cycle to the backing storage; in the case of micro rollback, it is pushed into the queue. Therefore, the queue will hold the pipeline register values from the last N cycles, allowing a rollback of more than just one cycle. In essence, the processor is making a distributed checkpoint of state each cycle. On a stall event, the stall signal can be given up to N cycles to propagate to all the necessary stages, or the stall signal could reach some stages in different cycles, giving a different rollback latency for each.

The drawback of this technique is the overhead associated with the extra backing storage elements for all registers that need to stall for a late signal. Adding more registers could significantly increase area and power overheads, especially in deeply pipelined designs. It is therefore important to design the backing storage circuitry carefully, to minimize both dynamic and static power without affecting the timing or drive of the baseline register. This can be accomplished through careful sizing and process techniques such as using high-Vt transistors.

In an effort to estimate the magnitude of these overheads, we modeled a flip-flop with two different 1-deep backing store designs using SPICE, with parameters from an IBM 0.13µ process.

An initial design for this structure is shown in Figure 34. An extra flip-flop (A) is attached to the second set of cross-coupled inverters of the baseline flip-flop (B). A 2-input pass-through multiplexer (C), controlled by the stall signal, is then placed between the storage elements and the output drivers. To protect against hold time violations, an extra buffer is inserted between the main flip-flop and the backing flip-flop. Under normal circumstances, the multiplexer passes the output value from the base flip-flop, preserving the same operation as in a default design. However, if the stall signal is asserted, the multiplexer selects the data from the extra flip-flop, passing out the values that the baseline flip-flop had held the cycle before. The resulting effect is that of a 2-cycle stall.

Figure 34. Rollback register design 1. (The backup flip-flop A and multiplexer C sit between the main flip-flop B and the output drivers, with the multiplexer controlled by the delayed stall signal.)

Figure 35. Rollback register design 2. (The multiplexer C is moved into the feed-forward path of the main flip-flop B, selecting between the D input and the backup flip-flop A; the main flip-flop always drives the output.)

To keep the overheads of this design low, the transistor sizings of the backup flip-flop should be reduced as much as possible. However, the direct exposure of the extra storage to the output drivers of the circuit curtails our ability to greatly modify the gates, due to the need for this extra element to provide a drive strength similar to the original flip-flop. This also creates a separate balancing problem, as the additional loading on the output of the main slave latch will reduce its ability to drive the output as well. Further, because the multiplexed output must be driven out for most of a cycle, the delayed stall signal controlling the multiplexer element must be held active for a long period of time, making it extremely difficult for that signal to meet acceptable timing constraints. Finally, in addition to controlling the multiplexer, the delayed stall signal needs to gate off the backup flip-flop clock, allowing the data to stay in place while it is driven out.

A second organization, shown in Figure 35 (with the same labeling as Figure 34), was designed to remove the external exposure of the backup storage and resolve the need for a long hold time. The multiplexer is moved into the feed-forward path of the main storage and chooses between the standard D input value and the value stored in the backup register. In this setup, the output of the primary flip-flop is always sent to the output drivers. In the case of a stall, the value from the backup storage is routed back into the primary register via the mux during the negative edge. In the next cycle, the main register drives out the saved data, resulting in the effect of a 2-cycle stall.

Table 5. Power consumption of registers with micro rollback

Flip-Flop              Energy     Relative Energy   Relative Area
Default                24.74 fJ   1.0               1.0
Micro Rollback         28.17 fJ   1.1387            1.210
Rollback w/ high Vt    27.73 fJ   1.1209            1.210 (a)

(a) A circuit segment with a different threshold voltage requires laying out an extra well area around the transistors using the higher threshold. Therefore, while using high-Vt transistors does not change the area by our metric, the dual-threshold circuit will likely have a slightly larger chip area.

In contrast to the first design, the backup storage is allowed very generous timing and sizing constraints in this model due to its removal from the drive path. This allows us to reduce the size of these elements as much as possible. Also, to reduce leakage power, it is reasonably easy to use high-Vt transistors for the entire backup storage. Besides reducing power overhead, these changes greatly reduce the extra burden on the main register's slave latch, allowing it to stay largely unchanged from a default library design. In this second design, the constraints on the stall signal are also greatly loosened, as it need only meet the setup and hold times of the main flip-flop. To enable this setup to support multi-cycle stalls (stalls which last longer than 1 cycle), the delayed stall signal would also need to gate off the input to the slave of the backup flip-flop, preventing the one cycle of "garbage" data from corrupting the saved value during the first cycle of stalling.

This final design, with a small inverter at its output, was implemented in an IBM 0.13µ process and analyzed using HSPICE, with energy results shown in Table 5. The measurements reflect both the 1-to-0 and 0-to-1 transitions of the flip-flop over 4 ns cycles, along with the static energy dissipation during non-switching cycles. Given reasonable sizings under the constraints discussed above, the backup register adds approximately 13.9% overhead in energy. Using high-Vt transistors (the IBM library's "lppfet" and "lpnfet") for the backup storage reduces this overhead to approximately 12.1%.

One difficulty in designing a system with micro rollback is determining how many cycles' worth of data the backing store must hold for each stage. First, the maximum possible latency for a stall signal to propagate across the chip must be determined. Analysis must also be done to see if there is benefit (or complication) in "staggering" the signal such that some stages receive it earlier than others. Such decisions will obviously vary widely between different chip designs. As a case study, we examined an implementation of the micro rollback mechanism to handle the error-signal stalls in the Razor system, presented in Section 5.2.2.

5.1.2.2 Virtual Flush and Recovery Logic

In contrast to stall logic, flush logic does not need to preserve the previous state of the instructions in the pipeline stages that it affects. The pipeline recovery mechanism must guarantee, however, that register and memory state is not corrupted with an incorrect or invalid value from a flushed instruction. In this section, we highlight a possible approach to implementing low-complexity pipeline error recovery based on counterflow pipelining.

A flush and recovery system in a processor has several major tasks to accomplish. First, it must invalidate all instructions that are caught under the flush event and prevent them from committing anything to the permanent processor state. Second, after the flush is completed, the system needs to restart execution at the correct location. This is usually accomplished by remembering some context of where the processor was in the execution of the program at the time the flushing instruction entered the pipeline. In out-of-order processors, there may also be the additional task of not interfering with the execution of instructions that are unaffected by the flushing event. Most commonly, this is addressed by assigning each instruction some kind of related "mask" [87] of bits that identifies how it should react to flush events generated by certain instructions.

The most difficult portion of these tasks is the need to get flush event information to every preceding stage as quickly as possible. This ends up being a major broadcast event and may also include long-distance signal travel across portions of a chip, depending on the size of the flush and the design's floorplan. It may therefore prove useful to explore distributed, multi-cycle approaches to accomplishing the flush system's tasks.

Again, the most obvious solution to this complexity problem is to pipeline the signal. Unlike stall signals, flushing does not require the preservation of any of the flushed data, and so does not have the same pipelining difficulties. Also, because flush signals are usually associated with "bad" chip events, such as mispredictions, architects already do their best to minimize the number of times the signal is used. This means that any additional pipelining cycle penalty would also be minimized in this effort.

The more pressing issue is then deciding how these signals should be pipelined. Simply giving the signal multiple cycles to broadcast to all flushing stages may be effective on a small scale, but given the scalability issues of broadcast signals discussed in Chapter 2, it may be more advantageous (and interesting) to explore ways to perform flushing using only local (point-to-point) communication.

The use of decentralized (local-only) communication is the central idea of the counterdataflow architecture [72][71][49]. In the counterflow pipeline, instructions and data flow in opposite directions. When instructions that are waiting to execute pass input operands, the data needed to execute is captured by the dependent instruction.

Figure 36. Pipeline recovery using counterflow pipelining. (The figure shows a five-stage pipeline with per-stage flush and bubble signals and a flush control unit propagating a flushID from stage to stage, counter to instruction flow.)

We have designed a fully pipelined flush and recovery mechanism based on these counterflow pipelining techniques. The approach, illustrated in Figure 36, places negligible cycle timing constraints on the baseline pipeline design at the expense of extending pipeline recovery over a few cycles. Instead of data flowing in the opposite direction of instructions, our scheme sends flush information "upstream" in the pipeline, invalidating instructions that are met or passed on the way. When the signal reaches the first stage of the pipeline, it stops and initiates the re-fill of the processor.

When a flush event is detected, two specific actions must be taken by the originating stage. First, if the instruction in the generating stage is part of the flush, it must be nullified. This action is accomplished using the bubble signal, which indicates to the next and subsequent stages that the pipeline slot is empty. Second, the flush train sends the flush-triggering instruction's mask or stage number back to the previous stage, and this mask begins propagating in the opposite direction of instructions.

At each stage visited by the active flush train, the corresponding pipeline stage and the one immediately preceding it are replaced with a bubble if their instructions are a part of the flush event. (Two stages must be nullified to account for the flush signal's doubled speed relative to the main pipeline.) When the flush signal reaches the start of the flushed area of the pipeline, the flush control logic restarts the pipeline at the next correct instruction, based on information sent in the mask or stage number. In the event that multiple stages experience flush events in the same cycle, all will initiate recovery, but only the flush that is deepest into the pipeline will be completed. Earlier recoveries will be flushed by later ones.
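A cycle-by-cycle approximation of the flush train, under a simplified in-order pipeline model, might look like the sketch below. The train squashes its own stage and the one behind it each cycle (capturing the 2x relative speed), while unaffected instructions ahead of the origin keep flowing. All names are illustrative.

```python
# Simplified software model of the counterflow flush train.

def counterflow_flush(pipeline, origin):
    """pipeline: list of instruction ids (index 0 = fetch, -1 = writeback);
    origin: stage index where the mispeculation was detected."""
    train = origin                      # current flush-train position
    cycles = 0
    while train >= 0:
        # Nullify the train's stage and the one immediately behind it:
        # two stages per cycle, since the train moves backward while
        # instructions move forward.
        for s in (train, train - 1):
            if s >= 0:
                pipeline[s] = None      # insert a bubble
        # Instructions advance one stage; nothing is fetched during recovery.
        pipeline.insert(0, None)
        pipeline.pop()
        train -= 1                      # train moves one stage upstream
        cycles += 1
    return cycles                       # cycles until fetch can restart
```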

In smaller designs, where this stage-by-stage approach could be considered overkill for the complexity of the signal, compromise designs may be more suitable. The relative speed of the flush train could easily be doubled or tripled, taking fewer cycles to traverse the pipeline, but also requiring it to communicate with double or triple the number of instructions per cycle. The further the design moves in this direction, however, the closer it comes to the original broadcast model.

Unlike micro rollback, using a counterflow flush mechanism requires very little area and power overhead. Only a small number of registers, used to split up the propagation of the flush signal, are added. It is likely that the reduced cycle-time requirements and smaller loads in each stage will allow a designer to more than make up for the overhead in power savings on the signal wires themselves.

5.1.3 Quantifying Gains and Losses

Once we have identified problematic signals and chosen one of these methods as a possible solution, an analysis must be done to quantify the impact of the changes on several aspects of the design. Architecturally, it must be determined what the performance impact is of adding extra wait cycles under certain circumstances. As mentioned previously, it is important to quantify the cycle time gained by applying these changes. Finally, the area and power overheads of the optimization need to be examined (for the extra micro rollback registers, for example).

5.1.3.1 Impact on Architectural Throughput

Once the role of the critical circuit path within the processor's architecture is determined, it becomes possible to simulate the impact of "cutting" that path. The modeling of the optimizations presented in the previous section can be performed in a fairly simple and straightforward manner.

If the counterflow flush approach is used, the performance impact can be modeled most simply by adding N cycles to the re-run penalty each time the flush event occurs. N, in this context, is the number of cycles the staggered flush takes to get from the flush-generating stage to the earliest logical stage that must be flushed. Note that, given certain floorplans, it may happen that the flush signal reaches the earliest logical stage before flushing some of the later stages. In this case, it is still sufficient to begin re-running instructions after only N cycles, instead of waiting for all stages to be flushed.

If the micro rollback or off-ramp approach is used, the rollback length N needs to be added to the amount of time spent stalling for a given set of stages. Note that the forward movement and subsequent re-loading of pipeline register values does not need to be modeled in detail. For example, if an event would cause a stall in the IF and ID stages of a processor, modeling a 2-cycle micro rollback penalty would simply consist of stalling those stages for 3 cycles instead of 1.

In either case, it is clear that minimizing the performance impact is important in the design. Two factors are critical to the overall performance impact and need to be minimized: the number of extra cycles inserted per event (N) and the frequency of the event's occurrence (k). The N parameter is largely a function of process circuit capabilities and/or advantageous floorplanning. The k factor is difficult to alter in most cases, as it is an intrinsic property of the signal that was "chosen" in the first place. If the signal is a highly used data path (a common path through an ALU, for example), it is very unlikely that much benefit could be gained from applying one of these optimizations, because the cycle penalties would far outweigh the clock speed benefit.
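As a rough first-order model (our own formulation for illustration, not taken from the dissertation's measured results), the trade-off between N and k can be expressed as:

    CPI_eff = CPI_base + k * N

where k is the number of virtualized-signal events per instruction and N is the penalty in cycles per event. Cutting the path is then profitable only when

    (CPI_base + k * N) * T_new < CPI_base * T_old

where T_old and T_new are the processor cycle times before and after the optimization. This makes explicit why a frequently exercised path (large k) is a poor candidate for virtualization.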

5.1.3.2 Impact on Circuit Speed

A primary purpose of implementing virtualizing optimizations is to reduce a processor's cycle time, which is accomplished by removing a slow signal from the critical path. As was discussed earlier in this chapter, when designers identify the slowest signals in a chip, they should also collect information on the other long paths in the design. When a signal is singled out for virtualization, the cycle time margin between its path and the next-slowest path can then be determined. When evaluating the feasibility of optimizing that path, this new cycle time, combined with an adjusted architectural performance measurement, can be weighed against the original performance and cycle time. Further, this process can be repeated iteratively on each new slowest path, resulting in an overall cycle time gain and an overall architectural performance impact.

5.1.3.3 Impact on Processor Power

A third factor that is affected by virtualization optimizations is processor power consumption. There are several ways in which the amount of energy consumed by a system changes with these structures.

First, relaxing the cycle count requirement of signals involving long wire delays may greatly reduce the need for strong drivers, large repeaters, and other circuit techniques that would be required to make these signals work under tighter time constraints. These structures generally consume more power and area in order to reduce the time needed for signals to traverse the distance.

The power consumption of the processor can also be impacted in the other direction. Micro rollback requires the addition of extra storage elements for all pipeline registers in a given stage that need to be backed up. While we showed in a previous section that these extra registers can be made as power-friendly as possible, especially in processes that allow for high-Vt transistor areas, the overhead is still non-trivial.

5.1.4 Goal-Oriented Decision Making

After examining the impact that virtualizing a signal will have on all of these factors, a designer has the information needed to decide which design point is most desirable. This decision can be made based on the particular processor goal that the designer has in mind, whether it is getting the design under a power budget, providing the highest performance possible, or some middle ground between the two (such as ED2).

Figure 37. Pipeline augmented with Razor latches and control lines. (Each Razor flip-flop pairs a main flip-flop, clocked by clk, with a shadow latch clocked by a delayed clock clk_del; a comparator on the two outputs generates the Error_L signal between logic stages L1 and L2.)

5.2 Applying Virtual Global Control to the Razor Prototype Chip

The easiest place to experiment with such a paradigm is a simple in-order processor, and that will be the focus of this section. As a primary test case, the virtual global control idea will be used to improve the error recovery signal and branch misprediction signal from the chip design discussed in our work on Razor [19]. Beyond these, examining a real industry design throughout the timing-closure phase could expose other global communication signals that we had not considered previously. This sort of analysis, while certainly interesting, is outside both the scope of this work and my personal means.

5.2.1 The Razor System

In a paper in MICRO-36 [19], we proposed a new approach to dynamic voltage scaling (DVS), referred to as Razor, which is based on dynamic detection and correction of speed path failures in digital designs. The key idea of Razor is to dynamically detect and correct timing errors in circuits, and then to tune the chip supply voltage by monitoring the error rate observed during operation. Since this error detection provides in-situ monitoring of the actual circuit delay, it accounts for both global and local delay variations and does not suffer from voltage scaling disparities. Because of this, it eliminates the need for the voltage margins that are necessary for "always-correct" circuit operation in traditional designs. Conceptually, the key feature of Razor is that operation at sub-critical supply voltages does not constitute a catastrophic failure, but instead represents a trade-off between the power penalty incurred from error correction and the additional power savings obtained from operating at a lower supply voltage.

Figure 38. Timing example of Razor operation. (The waveform shows clock, the delayed clock clock_d, the D input, the Error signal, and the Q output over four cycles; instruction 1 misses the main flip-flop's setup time in cycle 2, is caught by the shadow latch, and is restored in cycle 4.)

Razor relies on a combination of architectural and circuit level techniques for efficient error detection and correction of delay path failures. The implementation concept of Razor is illustrated in Figure 37 for a pipeline stage. Each flip-flop in the design is augmented with a so-called shadow latch, which is controlled by a delayed clock. We illustrate the operation of a Razor flip-flop in Figure 38. In clock cycle 1, the combinational logic L1 meets the setup time of the rising edge of the clock, and both the main flip-flop and the shadow latch will latch the correct data. In this case, the error signal at the output of the XOR gate remains low and the operation of the pipeline is unaltered.

In cycle 2, we show an example of the operation when the combinational logic exceeds the intended delay due to sub-critical voltage scaling. In this case, the data is not latched by the main flip-flop, but since the shadow latch operates using a delayed clock, it successfully latches the data some time in cycle 3. To guarantee that the shadow latch will always latch the input data correctly, the allowable operating voltage is constrained at design time such that, under worst-case conditions, the logic delay does not exceed the setup time of the shadow latch. By comparing the valid data of the shadow latch with the data in the main flip-flop, an error signal is generated in cycle 3, and in the subsequent cycle, cycle 4, the valid data in the shadow latch is restored into the main flip-flop and becomes available to the next pipeline stage L2. Note that the local error signals Error_l are OR'ed together to ensure that the data in all flip-flops is restored even when only one of the Razor flip-flops generates an error.
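A behavioral sketch of a single Razor flip-flop follows, assuming the shadow latch always captures the correct value (the design-time voltage constraint described above). It is a software illustration only; the class and method names are hypothetical.

```python
# Behavioral sketch of one Razor flip-flop across a clock cycle.

class RazorFF:
    def __init__(self):
        self.main = None     # value latched at the rising clock edge
        self.shadow = None   # value latched at the delayed clock edge

    def clock_edge(self, din_at_edge):
        self.main = din_at_edge             # may be wrong if timing failed

    def delayed_edge(self, din_at_delayed_edge):
        self.shadow = din_at_delayed_edge   # correct by construction
        error = (self.main != self.shadow)  # the XOR comparator
        if error:
            # In the following cycle, the shadow value is restored into
            # the main flip-flop, incurring a one-cycle penalty.
            self.main = self.shadow
        return error
```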

If an error occurs in pipeline stage L1 in a particular clock cycle, the data in L2 in the following clock cycle is incorrect and must be flushed from the pipeline. However, since the shadow latch contains the correct output data of pipeline stage L1, the instruction does not need to be re-executed through this failing stage. Thus, a key feature of Razor is that if an instruction fails in a particular pipeline stage, it is re-executed through the following pipeline stage while incurring only a one cycle penalty. The proposed approach therefore guarantees forward progress of a failing instruction, which is essential to avoid the perpetual failure of an instruction at a particular stage in the pipeline.

Figure 39. Razor prototype chip and vital statistics. (The 3.3mm x 3.6mm die photo shows the Icache, Dcache, register file, writeback logic, and IF/ID/EX/MEM stages, alongside the following statistics.)

Technology Node                          0.18µm
Max. Clock Frequency                     140MHz
DVS Supply Voltage Range                 1.2-1.8V
Total Number of Transistors              1.58 million
Die Size                                 3.3mm x 3.6mm
Measured Chip Power at 1.8V              130mW
Icache Size                              8KB
Dcache Size                              8KB
Total Number of Flip-Flops               2408
Total Number of Razor Flip-Flops         207
Number of Delay Buffers Added            2498

Error Free Operation (Simulation Results)
Standard FF Energy (Static/Switching)    49fJ/124fJ
RFF Energy (Static/Switching)            60fJ/185fJ
Total Delay Buffer Power Overhead        3.7mW
% Total Chip Power Overhead              2.9%

Error Correction and Recovery Overhead
Energy of a RFF per error event          260fJ

In addition to invalidating the data in the following pipeline stage, an error must also stall the preceding pipeline stages while the shadow latch data is restored into the main flip-flop. In the original work, a number of different methods, such as clock gating or flushing the instruction in the preceding stages, were proposed to accomplish this.

The Razor approach also raises a number of circuit-related issues. The Razor flip-flop must be constructed such that the power and delay overhead is minimized. Also, the presence of the delayed clock introduces a new short-path constraint in the design. Finally, allowing the setup time of the main flip-flop to be exceeded raises the possibility of metastability. These issues are further discussed in the Razor-specific publications [19][41].

The proposed Razor technique was included as part of a prototype 64-bit Alpha processor design, which was implemented in TSMC's 0.18µm process. This prototype layout was used to obtain a realistic prediction of the power overhead for Razor's in-situ error correction and detection. We also studied the error-rate trends for datapath components using both circuit-level simulation and silicon measurements of a full-custom multiplier block. Architectural simulations were then performed to analyze the overall throughput and power characteristics of Razor-based DVS for different benchmark test programs. It was demonstrated that, on average, Razor reduced power consumption by more than 40% compared to traditional design-time DVS and delay-chain based approaches.

The Razor prototype chip is a simple design synthesized from Verilog HDL by the Synopsys Design Compiler and laid out by Cadence's Silicon Ensemble place-and-route tool. A layout picture of the original design is shown in Figure 39, along with some properties of the chip.

5.2.2 Updating the Razor System with Micro Rollback

In the original work on Razor [19], we noted an obvious complexity issue with the global control signal required to stall the processor when an error was detected. In less than half of a cycle, the error signal would have to propagate through a very high fan-in OR gate and across several stages of the processor, with enough drive to gate off the clock to those stages. To address this complexity issue, we proposed the counterflow flush mechanism discussed in Section 5.1.2.2.

While that approach accomplishes the goal of removing the critical timing path through this logic, the resulting loss of valid computation that had already been completed correctly in previous stages is inefficient for both performance and energy consumption. The most natural control option for this situation is not a flush but a stall, a realm more suitable for micro rollback.
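Before quantifying the costs, it may help to sketch what a rollback-enabled register looks like. The following is a minimal behavioral model in the spirit of Tamir and Tremblay's micro rollback [80], not the prototype's actual RTL; the class name and its interface are invented for illustration.

    class RollbackRegister:
        """Pipeline register with an N-deep value history for micro rollback."""

        def __init__(self, depth, initial=0):
            self.depth = depth
            self.value = initial
            self.history = [initial] * depth  # history[0] = previous cycle's value

        def clock(self, next_value):
            # Each clock edge pushes the current value into the history FIFO,
            # then latches the new input.
            self.history = [self.value] + self.history[:-1]
            self.value = next_value

        def rollback(self, k):
            # Restore the value held k cycles ago (1 <= k <= depth).
            assert 1 <= k <= self.depth
            self.value = self.history[k - 1]

On a Razor error, every rollback-enabled register in the earlier pipeline stages would receive rollback(k) for the appropriate depth, after which execution resumes from the restored state.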

[Figure: relative IPC, from 0.6 to 1.1, plotted against supply voltage from 1.8 V down to 1.4 V for the perfect-stall, CF, MRB-1, and MRB-2 designs.]

Figure 40. Relative Instructions per Cycle for Various Razor Error Signal Designs

The first impact of switching the recovery mechanism to micro rollback is the change in architectural throughput (IPC). To examine this impact, several designs were simulated: a design that flushes the pipeline on an error, two micro rollback designs with one- and two-cycle delays, and a design with a perfect, instantaneous clock-gated stall signal.

The designs were evaluated using Razor's circuit-aware architectural simulator [41], running a set of SPEC integer benchmarks. The simulator, which closely models the test chip design, uses simulation of Verilog models in conjunction with program data to simulate where timing errors occur given certain voltage and frequency information. Details on our simulation infrastructure can be found in Chapter 3.

As would be expected, the performance gap between these techniques changes depending on the error rate of the processor, that is, the rate at which the error signal is asserted. Figure 40 shows the IPC of the four design points across a range of run-time voltages. There is very little difference in performance between the designs when the voltage is high and the error rate is low. However, as error rates begin to increase, there is a clear divergence in the designs' rates of degradation. The techniques with a lower penalty per error assertion lose performance much more gracefully than the flush design. In the context of a deeper pipeline, this difference would be even more pronounced.
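The shape of these curves can be captured with a back-of-the-envelope model. This is not the circuit-aware simulator; the penalty values below are assumptions chosen only to illustrate the trend.

    def relative_ipc(error_rate, recovery_penalty):
        # First-order model: each error event adds `recovery_penalty` extra
        # cycles, so throughput scales as 1 / (1 + rate * penalty).
        return 1.0 / (1.0 + error_rate * recovery_penalty)

    for rate in (0.001, 0.01, 0.05):          # assumed errors per cycle
        print(rate,
              relative_ipc(rate, 1),          # stall / 1-cycle rollback
              relative_ipc(rate, 2),          # 2-cycle rollback
              relative_ipc(rate, 5))          # flush and refill a 5-stage pipe

At a 0.1% error rate all of the designs sit within half a percent of each other, while at a 5% rate the flush-style penalty opens a gap of more than 15%, mirroring the divergence seen in Figure 40.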

The micro rollback power overhead for the entire processor can be estimated with the equation

Power Overhead = P_OFF × k_MRB × P_FF,

where P_OFF is the relative power overhead of a flip-flop with micro rollback compared to the baseline, k_MRB is the fraction of the chip's flip-flops enabled for micro rollback, and P_FF is the percentage of overall chip power consumed by the flip-flops.

For an implementation of micro rollback on the Razor error signal, all of the pipeline registers in stages prior to the one where the error occurs need to include rollback support. In the prototype design, the latest an error can occur is in the memory stage of the pipeline. Therefore, rollback registers are needed at the end of the fetch, decode, and execute stages, which together account for about 60% of the chip's pipeline registers. Because the Razor design does not write results to permanent state while an error could still occur in front of them, there is no need to add micro rollback to the register file or to memory. (Had it been necessary, both could have been handled easily by buffering writes.)

[Figure: relative energy, from 0.5 to 0.9, plotted against supply voltage from 1.6 V down to 1.4 V for the perfect-stall, CF, MRB-1, and MRB-2 designs.]

Figure 41. Relative Energy Consumption for Various Razor Error Signal Designs

Using the 12.8% overhead per rollback register from Section 4.1.2.1, the 60% fraction of rollback-enabled registers estimated above, and a total flip-flop power share of 20%, the equation above yields a power overhead of 1.5% for a single rollback cycle and 3.0% for a two-cycle rollback.
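As a quick check on these numbers, the sketch below plugs the quoted values into the overhead equation; scaling the per-register overhead linearly with rollback depth is an assumption of this sketch.

    def mrb_power_overhead(p_off, k_mrb, p_ff, cycles=1):
        # Per-register overhead (scaled by rollback depth) x fraction of
        # rollback-enabled flip-flops x flip-flops' share of chip power.
        return p_off * cycles * k_mrb * p_ff

    print(mrb_power_overhead(0.128, 0.60, 0.20, cycles=1))  # 0.01536 -> ~1.5%
    print(mrb_power_overhead(0.128, 0.60, 0.20, cycles=2))  # 0.03072 -> ~3.0%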

Figure 41 shows the simulated energy consumption of the processor across a range of voltages. At voltages with error-free operation, their small overhead places the micro rollback designs slightly higher in energy than the baseline or flush designs. However, the higher-penalty techniques quickly pass them by at low voltages.

If the energy curve were a continuous function of voltage, the reduced penalty that micro rollback provides would effectively move the optimal voltage point lower. However, because the curve has piecewise-linear segments, which arise from the discrete number of additional circuit paths failing at each lower voltage step, the large jump in error rate at that point is too much for any recovery mechanism to handle, with even a perfect stall losing ground below 1.44 volts. The one-cycle micro rollback design does, however, effectively provide much more “cushion” beyond the optimal point. This is still beneficial, since simulations of the Razor chip with a control system have shown that the processor will often spend large amounts of time just outside of the optimal zone.

5.2.3 Design Bottlenecks in the Razor Prototype

As a further study into applying our techniques to chip signals, the Razor prototype chip design was examined to find other bottlenecks. To find the limitations of the chip, we ran the HDL design aggressively through the synthesis and place-and-route tools, pushing toward the fastest clock cycle possible. When this layout was completed, logic and wire timings were given to the Synopsys PrimeTime tool to compute path timings.

At the end of this analysis, the longest path was found to be the signal triggered when a branch misprediction is detected in the EX stage of the pipeline. This path passed across two stages to the IF stage, where it prompted the processor to change the location of the next instruction fetched from memory; its estimated timing was 2.96 ns.

The timing differential between this path and the next-longest signal, a datapath through the core of the EX stage, was approximately 0.15 ns. If the branch signal could be removed from the critical path, the clock frequency could therefore be increased by 5.0% (shortening the 2.96 ns cycle by 0.15 ns).

The counterflow-style pipeline flush was described in Section 5.1.2.2. That system was originally proposed as a basic alternative to the clock-gated stall in the error recovery mechanism of Razor. However, the true purpose of this mechanism is not stalling but aiding in flushing. This design therefore shows promise in providing a low-complexity method for pipeline flush situations, such as this one.

Using a mechanism such as counterflow flush incurs penalties in the architectural throughput of the design. As discussed earlier, this decrease in IPC should correspond directly to two factors: the percentage of time the signal is switched during computation, and the number of extra cycles added each time the signal is used.

One of the difficulties in designing a counterflow flush system is determining how many stages to flush in each cycle. Obviously, there is a physical limit to the distance a signal can travel each cycle, but there is a definite instruction-throughput advantage in traversing the stages in as few cycles as possible.
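A toy model makes this latency/speed trade-off explicit; the token-walking scheme and stage names below are illustrative, not the prototype's implementation.

    def flush_latency(stages, origin, stages_per_cycle):
        # A flush token starts at the detecting stage and walks toward
        # fetch a fixed number of stages per cycle, killing younger work.
        assert 0 <= origin < len(stages)
        position, cycles = origin, 0
        while position > 0:
            position = max(0, position - stages_per_cycle)
            cycles += 1
        return cycles

    pipeline = ["IF", "ID", "EX", "MEM", "WB"]
    for speed in (1, 2):
        print(speed, "->", flush_latency(pipeline, pipeline.index("EX"), speed))

With a misprediction detected in EX, a token moving one stage per cycle needs two cycles to reach fetch, while one moving two stages per cycle needs only one; faster tokens, however, imply driving longer wires in a single cycle.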

Because the Razor test chip has only rudimentary branch prediction (always predict not-taken), the frequency of mispredictions is very high: around 52% for the benchmarks run. Luckily, in most modern systems, branch mispredictions are not only fairly rare, but chip designers actively attempt to reduce them further, since they already carry a stiff penalty.

Most commercial processors, even in the embedded realm (such as the ARM10 [3] and ARM11 [4]), now come with reasonably advanced branch prediction. By contrast, an 8K-entry two-level predictor is correct for 95% of branches in the same set of benchmarks.

Figure 42 shows the simulated relative IPC for configurations with 1-, 2-, and 3-cycle flush trains under the default not-taken branch prediction of the Razor prototype chip. Using a multicycle counterflow flush, and therefore increasing the penalty to 2 or 3 cycles, decreases the average IPC by 3.3% or 6.4%, respectively.

Figure 43 demonstrates the much smaller IPC penalty the processor suffers with advanced branch prediction. Using a counterflow flush mechanism on this design would reduce average IPC by only 0.2% or 0.5% for a 2- or 3-cycle flush penalty.

[Figure: relative IPC, from 0.84 to 1.02, for 1-, 2-, and 3-cycle flush trains on bzip2, eon, gcc, mcf, vortex, vpr, and their average.]

Figure 42. Relative IPC for various branch flush penalties with not taken branch prediction

[Figure: relative IPC, from 0.98 to 1.005, for 1-, 2-, and 3-cycle flush trains on the same benchmarks.]

Figure 43. Relative IPC for various branch flush penalties with advanced branch prediction

[Figure: relative IPns, from 0.8 to 1.1, for 1-, 2-, and 3-cycle flush trains on bzip2, eon, gcc, mcf, vortex, vpr, and their average.]

Figure 44. Relative IPns for various branch flush penalties with not taken branch prediction

[Figure: relative IPns, from 0.8 to 1.1, for 1-, 2-, and 3-cycle flush trains on the same benchmarks.]

Figure 45. Relative IPns for various branch flush penalties with advanced branch prediction

Figure 44 and Figure 45 show the total processor throughput rate in instructions per nanosecond, using the 5.0% frequency boost made possible by removing the branch signal from the critical path. With very poor branch prediction, and therefore heavy use of the misprediction signal, the 3-cycle counterflow flush design would lose ground in performance, while the 2-cycle design would roughly break even, depending on the running program. For the prototype chip with the addition of advanced branch prediction, the possible increase in clock speed far outweighs the small IPC penalties, improving overall throughput by 4.4-4.7%. As this was the signal limiting the clock speed of the entire chip, these totals reflect whole-chip speedup, not merely the complexity of the single signal.
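These throughput numbers combine the two effects multiplicatively. A minimal first-order check follows; applying the frequency boost uniformly and using the average IPC penalty are assumptions of this sketch.

    def relative_ipns(relative_ipc, frequency_gain):
        # Instructions per nanosecond scale as relative IPC times the
        # relative clock frequency.
        return relative_ipc * (1.0 + frequency_gain)

    # Advanced prediction, 2-cycle flush: ~0.2% IPC loss, 5.0% clock gain.
    print(relative_ipns(1 - 0.002, 0.05))  # ~1.048, close to the 4.4-4.7% range

Per-benchmark variation accounts for the small spread between this estimate and the reported gains.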

5.3 Summary

As smaller chip features are fabricated, delays from long cross-stage signals will become prohibitive. Counterflow flush and micro rollback show promise as general mechanisms for mitigating the circuit latencies associated with these control signals. As test cases, the two techniques were applied successfully to the Razor prototype chip, pipelining two long signals off the critical path with very small impact on throughput per cycle.

The advantage of this kind of pipelining approach is twofold. First, the long, slow, power-hungry wires are removed from the critical path. Second, it removes those elements while keeping the processor design logically very similar to a baseline processor. It would therefore be very simple to include in a design, and would avoid the problems introduced by drastically changing the architecture, as is done in projects such as TRIPS [56][64] or WaveScalar [75].

CHAPTER 6

CONCLUSION

6.1 Summary of the Thesis

There are three main contributions of this thesis: the tag elimination approach to improve efficiency in content-addressable memory schedulers, Cyclone, a competitive broadcast-free scheduler, and a pair of methods for reducing complexity in long broadcast control signals. In addition, we studied the trade-offs that must be examined to determine the viability and performance of all of these techniques.

Our tag elimination approach was shown to reduce the cycle time and power consumption of a high-speed dynamic scheduler by up to 75%. Instead of increasing clock speed by attempting to break up the scheduling stage, as was done in some previous work, this technique improves the efficiency of the circuit while keeping it in a single stage.

By removing traditionally over-designed comparison circuitry from the critical path, the loads on the tag broadcast signals were greatly reduced, at very little cost in instruction throughput per cycle. Further gains were demonstrated by reducing the scheduling process to react to only the most specific event: the readiness of an instruction's final input. While tag elimination can greatly reduce the complexity of dynamic scheduling, it is still, at its core, a broadcast technique. Because of this, it is bound to the broadcast model of window scalability, growing quadratically in time and power with the desired size of the instruction window.

We introduced Cyclone as an alternative to the broadcast model of dynamic scheduling. Instead of sending result information to all waiting instructions, it adopts a more distributed, peer-to-peer model of communication. The resulting structure was drastically smaller in area and faster than schedulers using the broadcast model, with modest losses in instruction throughput per cycle. Further, the Cyclone scheduler includes memory instructions as a primary element in determining latencies, which allows the schedule to reflect a more complete picture of execution.

Cyclone replaced a high-complexity structure with a more efficient option while leaving most of the rest of the processor architecture unchanged. To accomplish a similar goal for long wires, we presented two simple methods: micro rollback and counterflow flush remove high-load signals from the critical path, replacing them with multi-cycle signals that do not change the overall operation of the rest of the processor.

As smaller chip features are fabricated, delays from long cross-stage signals will become prohibitive. Counterflow flush and micro rollback show promise as general mechanisms for mitigating the circuit latencies associated with these control signals. The changes they make to the microarchitecture accomplish this task with minimal impact on the normal operation of the processor. As test cases, the two techniques were applied successfully to the Razor prototype chip, pipelining two long signals and removing them from the critical path, with very small impacts on throughput per cycle.

[Figure: an instruction I flows through a dependence check and a timing predictor, which assigns a predicted issue time tx; a routing/timing network then delivers the instruction to a functional unit (FU), with late arrivals (tx + ε) rescheduled to a corrected time tx′.]

Figure 46. Broadcast-free dynamic scheduling element breakdown

6.2 Future Directions

There are several available avenues for future work on these topics. We have divided them into two categories: improvements to broadcast-free dynamic scheduling, and optimizations to the design-time process for reducing communication delays.

6.2.1 Broadcast-free Dynamic Scheduling

The Cyclone design is only one point in the broadcast-free scheduling design space. As is diagrammed conceptually in Figure 46, there are two major “black box” elements of the design which can be improved.

The timing network element, because it impacts IPC drastically, is a prime candidate for improvement, as evidenced by the several researchers [1][16][33] who published work in this area shortly after our original publication. Since this structure is conceptually very close to a routed network, it is possible that ideas from fields such as network architecture and large memory-system architecture could hold value in increasing the accuracy of this stage in constructing a predicted schedule.

The timing prediction element is also a candidate for optimization, as it was the clock-cycle bottleneck of the Cyclone design. One possible direction of work is to examine how much of the timing prediction could be off-loaded to a compiler, allowing the predictor unit to become simpler or to be removed completely.

6.2.2 Improving the Control Signal Design Process

There are several interesting directions that can be explored in the streamlining of a design process that supports optimizations such as micro rollback and counterflow flush, and in finding new ways to use these trade-off designs.

The process we used in this work to examine circuit trade-offs relied on a full, completed chip design to expose its bottleneck signals. In a real design, if decisions about using these pipelining techniques are put off until the end of the design cycle, any changes may create major problems. One potentially beneficial track of research would be to examine ways to identify problem signals much earlier in the process. This could be accomplished with quantitative estimates of the required loading and length of each signal, possibly based on a coarse-grained weighting of signals in a first-run HDL netlist, combined with signal-usage data from architectural simulation.

Another possible research tack would be to automate portions of the design analysis. One approach would be to allow CAD-type tools to insert multi-cycle delays on signals meeting certain criteria. Another option would be to have architectural simulators flag rarely used logical signals as candidates for optimization. We anticipate that the most difficult problem would be finding automated methods of combining the candidates from these two searches.

The focus of the optimizations in Chapter 5 is on increasing clock speeds. If the power consumed by wires is found to be very significant in a design, another possible application of micro rollback and counterflow flushing would be to pipeline wires that are not on the critical path, solely to reduce power consumption.

Finally, there may be other ways to exploit a processor's ability to roll back. While micro rollback, by itself, cannot react to a processor's dynamic state as easily as a mechanism like Razor can, it may offer ways to reduce some design-time complexities. For example, there may be a race condition that is fairly simple to detect but very difficult to handle properly. Micro rollback could be used to break up this race condition when it is detected, removing the need for a designer to handle it.

BIBLIOGRAPHY

[1] Jaume Abella and Antonio González. “Low-Complexity Distributed Issue Queue,” Proceedings of the 10th Annual International Symposium on High-Performance Computer Architecture, Feb. 2004.

[2] V. Agarwal, M. Hrishikesh, S. Keckler, and D. Burger. “Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures,” Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.

[3] ARM, Ltd. ARM1022E Technical Reference Manual, Ref: DDI 0237A, December 10, 2001.

[4] ARM, Ltd. ARM1136 r0p2 Technical Reference Manual, Ref: DDI 0211E, February 14, 2003.

[5] T. Austin. “DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design,” Proceedings of the 32nd Annual International Symposium on Microarchitecture, Nov. 1999.

[6] Todd Austin, Eric Larson, and Dan Ernst. “SimpleScalar: An Infrastructure for Computer System Modeling,” IEEE Computer, Volume 35, Issue 2, Feb. 2002.

[7] The Berkeley Predictive Technology Models, http://www-device.eecs.berkeley.edu/~ptm/.

[8] Bradford M. Beckmann and David A. Wood. “TLC: Transmission Line Caches,” Proceedings of the 36th International Symposium on Microarchitecture (MICRO-36), December 2003.

[9] E. Borch, E. Tune, S. Manne, and J. Emer. “Loose Loops Sink Chips,” Proceedings of the 8th International Symposium on High Performance Computer Architecture (HPCA-8), January 2002.

[10] D. Brooks, V. Tiwari, and M. Martonosi. “Wattch: A Framework for Architectural-Level Power Analysis and Optimizations,” Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.

[11] M. Brown, J. Stark, and Y. Patt. “Select-Free Instruction Scheduling Logic,” Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture, December 2001.

[12] D. Burger and T. M. Austin. “The SimpleScalar Tool Set, Version 2.0,” Tech. Rep. CS-1342, University of Wisconsin-Madison, June 1997.

[13] Ramon Canal and Antonio González. “Reducing the Complexity of the Scheduling Logic,” Proceedings of the 15th Annual International Conference on Supercomputing (ICS-01), June 2001.

[14] George Z. Chrysos and Joel S. Emer. “Memory Dependence Prediction Using Store Sets,” Proceedings of the 25th International Symposium on Computer Architecture (ISCA-25), June 1998.

[15] Compaq Computer Corporation. Alpha 21264 Microprocessor Hardware Reference Manual, June 2001.

[16] Todd Ehrhart and Sanjay Patel. “Reducing the Scheduling Critical Cycle Using Wakeup Prediction,” Proceedings of the 10th Annual International Symposium on High-Performance Computer Architecture, Feb. 2004.

[17] Dan Ernst and Todd Austin. “Efficient Dynamic Scheduling through Tag Elimination,” Proceedings of the 29th International Symposium on Computer Architecture (ISCA-29), May 2002.

[18] Dan Ernst, Andrew Hamel, and Todd Austin. “Cyclone: A Broadcast-free Dynamic Instruction Scheduler with Selective Replay,” Proceedings of the 30th International Symposium on Computer Architecture (ISCA-30), June 2003.

[19] Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, Conrad Ziesler, David Blaauw, Todd Austin, Krisztian Flautner, and Trevor Mudge. “Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation,” Proceedings of the 36th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-36), December 2003.

[20] J. A. Farrell and T. C. Fischer. “Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor,” IEEE Journal of Solid-State Circuits, 33(5), 1998.

[21] Daniele Folegnani and Antonio González. “Energy-Effective Issue Logic,” Proceedings of the 28th Annual International Symposium on Computer Architecture, June 2001.

[22] Antonio González, José González, and Mateo Valero. “Virtual-Physical Registers,” Proceedings of the 4th Annual International Symposium on High-Performance Computer Architecture (HPCA-4), Feb. 1998.

[23] Ricardo González and Mark Horowitz. “Energy Dissipation in General Purpose Microprocessors,” IEEE Journal of Solid-State Circuits, 31(9):1277-1284, September 1996.

[24] R. González, B. Gordon, and M. Horowitz. “Supply and Threshold Voltage Scaling for Low Power CMOS,” IEEE Journal of Solid-State Circuits, 32(8), August 1997.

[25] Masahiro Goshima et al. “A High-Speed Dynamic Instruction Scheduling Scheme for Superscalar Processors,” Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-34), December 2001.

[26] A. Hartstein and T. Puzak. “The Optimum Pipeline Depth for a Microprocessor,” Proceedings of the 29th International Symposium on Computer Architecture (ISCA-29), May 2002.

[27] J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach, 3rd edition, Morgan Kaufmann Publishers, 2002.

[28] Glenn Hinton, Dave Sager, Mike Upton, Darrell Boggs, Doug Carmean, Alan Kyker, and Patrice Roussel. “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal, 1st quarter 2001.

[29] R. Ho, K. Mai, and M. Horowitz. “Efficient On-Chip Global Interconnects,” IEEE Symposium on VLSI Circuits, June 2003.

[30] R. Ho, K. Mai, and M. Horowitz. “Managing Wire Scaling: A Circuit Perspective,” IEEE Interconnect Technology Conference, June 2003.

[31] Mark Horowitz, Ron Ho, and Ken Mai. “The Future of Wires,” Semiconductor Research Corporation Workshop on Interconnects for Systems on a Chip, May 1999.

[32] M. Hrishikesh, N. Jouppi, K. Farkas, D. Burger, S. Keckler, and P. Shivakumar. “The Optimal Logic Depth per Pipeline Stage is 6 to 8 FO4 Inverter Delays,” Proceedings of the 29th International Symposium on Computer Architecture (ISCA-29), May 2002.

[33] Jie S. Hu, N. Vijaykrishnan, and Mary Jane Irwin. “Exploring Wakeup-Free Instruction Scheduling,” Proceedings of the 10th Annual International Symposium on High-Performance Computer Architecture, Feb. 2004.

[34] Intel Corporation. Intel Itanium Architecture Reference Manual, http://www.intel.com/design/itanium/manuals.htm.

[35] J. Keller. “The 21264: A Superscalar Alpha Processor with Out-of-Order Execution,” presented at the 9th Annual Microprocessor Forum, San Jose, CA, 1996.

[36] Muhammad Khellah, James Tschanz, Yibin Ye, Siva Narendra, and Vivek De. “Static Pulsed Bus for On-Chip Interconnects,” Proceedings of the 2002 Symposium on VLSI Circuits, June 2002.

[37] Ho-Seop Kim and James E. Smith. “An Instruction Set Architecture and Microarchitecture for Instruction Level Distributed Processing,” Proceedings of the International Symposium on Computer Architecture (ISCA-29), May 2002.

[38] Ilhyun Kim and Mikko Lipasti. “Half-Price Architecture,” Proceedings of the 30th International Symposium on Computer Architecture (ISCA-30), June 2003.

[39] G. Kucuk, K. Ghose, D. Ponomarev, and P. Kogge. “Energy-Efficient Instruction Dispatch Buffer Design for Superscalar Processors,” Proceedings of the 2001 International Symposium on Low Power Electronics and Design (ISLPED ’01), August 2001.

[40] Alvin R. Lebeck, Jinson Koppanalil, Tong Li, Jaidev Patwardhan, and Eric Rotenberg. “A Large, Fast Instruction Window for Tolerating Cache Misses,” Proceedings of the International Symposium on Computer Architecture (ISCA-29), May 2002.

[41] Seokwoo Lee, Shidhartha Das, Valeria Bertacco, Todd Austin, David Blaauw, and Trevor Mudge. “Circuit-Aware Architectural Simulation,” Proceedings of the 41st Design Automation Conference (DAC-2004), June 2004.

[42] N. Magen et al. “Interconnect-Power Dissipation in a Microprocessor,” Proceedings of the 2004 International Workshop on System Level Interconnect Prediction, pages 7-13, February 2004.

[43] S. A. Mahlke, W. Y. Chen, R. A. Bringmann, R. E. Hank, W. M. W. Hwu, B. R. Rau, and M. S. Schlansker. “Sentinel Scheduling for VLIW and Superscalar Processors,” ACM Transactions on Computer Systems, 11(4):376-408, 1993.

[44] S. Manne, D. Grunwald, and A. Klauser. “Pipeline Gating: Speculation Control for Energy Reduction,” Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998.

[45] D. Matzke. “Will Physical Scalability Sabotage Performance Gains?” IEEE Computer, pages 37-39, September 1997.

[46] Scott McFarling. “Combining Branch Predictors,” Technical Report TN-36, Digital Western Research Laboratory, June 1993.

[47] United States Patent #6,212,626, “Computer Processor Having a Checker,” Amit A. Merchant and David J. Sager, assigned to Intel Corporation, issued April 3, 2001.

[48] P. Michaud and A. Seznec. “Data Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors,” Proceedings of the 7th Annual International Symposium on High-Performance Computer Architecture (HPCA-7), January 2001.

[49] Michael F. Miller, Kenneth J. Janik, and Shih-Lien Lu. “Non-stalling Counterflow Architecture,” Proceedings of the Conference on High Performance Computer Architecture (HPCA-4), May 1998.

[50] E. Morancho, J. M. Llaberia, and A. Olive. “Recovery Mechanism for Latency Misprediction,” Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT-2001), September 2001.

[51] The MOSIS Service, http://www.mosis.com.

[52] Steven S. Muchnick. Advanced Compiler Design & Implementation, Morgan Kaufmann Publishers, 1997.

[53] T. Mudge. “Power: A First-Class Design Constraint,” IEEE Computer, vol. 34, no. 4, pp. 52-57, April 2001.

[54] J. M. Mulder, N. T. Quach, and M. J. Flynn. “An Area Model for On-Chip Memories and its Application,” IEEE Journal of Solid-State Circuits, Volume 26, Issue 2, Feb. 1991.

[55] S. Mutoh. “1V Power Supply High-Speed Digital Circuit Technology with Multithreshold Voltage CMOS,” IEEE Journal of Solid-State Circuits, 30(8):847-853, August 1995.

[56] R. Nagarajan, K. Sankaralingam, D. Burger, and S. Keckler. “A Design Space Evaluation of Grid Processor Architectures,” Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-34), December 2001.

[57] Alexandru Nicolau. “Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies,” IEEE Transactions on Computers, Volume 38, No. 5, May 1989.

[58] Kevin J. Nowka. “High-Performance CMOS System Design Using Wave Pipelining,” Ph.D. thesis, Stanford University, September 1995.

[59] Subbarao Palacharla, Norman P. Jouppi, and J. E. Smith. “Complexity-Effective Superscalar Processors,” Proceedings of the 24th Annual International Symposium on Computer Architecture, May 1997.

[60] Subbarao Palacharla, Norman P. Jouppi, and J. E. Smith. “Quantifying the Complexity of Superscalar Processors,” Tech. Rep. CS-1328, University of Wisconsin-Madison, May 1997.

[61] S. Raasch, N. Binkert, and S. Reinhardt. “A Scalable Instruction Queue Design Using Dependence Chains,” Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA-29), May 2002.

[62] Glenn Reinman and Norm Jouppi. “An Integrated Cache Timing and Power Model,” Compaq Technical Report, http://www.research.compaq.com/wrl/people/jouppi/cacti2.pdf.

[63] Anne Rogers and Kai Li. “Software Support for Speculative Loads,” Proceedings of the Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’92), 1992.

[64] K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh, D. Burger, S. Keckler, and C. Moore. “Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture,” Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA-30), June 2003.

[65] E. Schnarr and J. Larus. “Fast Out-Of-Order Processor Simulation Using Memoization,” Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 283-294, October 1998.

[66] A. Seznec, E. Toullec, and O. Rochecouste. “Register Write Specialization Register Read Specialization: A Path to Complexity-Effective Wide-Issue Superscalar Processors,” Proceedings of the 35th International Symposium on Microarchitecture (MICRO-35), November 2002.

[67] Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. “Automatically Characterizing Large Scale Program Behavior,” Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2002), October 2002.

[68] Michael D. Smith, Mark Horowitz, and Monica S. Lam. “Efficient Superscalar Performance Through Boosting,” Proceedings of the Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’92), 1992.

[69] SPEC System Performance Evaluation Committee, www.spec.org.

[70] Eric Sprangle and Doug Carmean. “Increasing Processor Performance by Implementing Deeper Pipelines,” Proceedings of the 29th International Symposium on Computer Architecture (ISCA-29), May 2002.

[71] R. Sproull, I. Sutherland, and C. Molnar. “Counterflow Pipeline Processor Architecture,” Sun Microsystems Laboratories Inc. Technical Report SMLI-TR-94-25, April 1994.

[72] R. F. Sproull, I. E. Sutherland, and C. E. Molnar. “The Counterflow Pipeline Processor Architecture,” IEEE Design and Test of Computers, Vol. 11, No. 3, Fall 1994.

[73] S. T. Srinivasan and A. R. Lebeck. “Load Latency Tolerance in Dynamically Scheduled Processors,” Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, pages 148-159, 1998.

[74] J. Stark, M. Brown, and Y. Patt. “On Pipelining Dynamic Instruction Scheduling Logic,” Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture, December 2000.

[75] Steve Swanson, Ken Michelson, Andrew Schwerin, and Mark Oskin. “WaveScalar,” Proceedings of the 36th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-36), December 2003.

[76] D. Sylvester and K. Keutzer. “Microarchitectures for Systems on a Chip in Small Process Geometries,” Proceedings of the IEEE, pp. 467-489, April 2001.

[77] D. Sylvester and K. Keutzer. “A Global Wiring Paradigm for Deep Submicron Design,” IEEE Transactions on CAD, pp. 242-252, February 2000.

[78] D. Sylvester and K. Keutzer. “Rethinking Deep-Submicron Circuit Design,” IEEE Computer, 32(11):25-33, November 1999.

[79] D. Sylvester and K. Keutzer. “Getting to the Bottom of Deep Submicron II: A Global Wiring Paradigm,” Proceedings of the International Symposium on Physical Design (ISPD-99), pages 193-200, 1999.

[80] Yuval Tamir, Marc Tremblay, and David A. Rennels. “The Implementation and Application of Micro Rollback in Fault-Tolerant VLSI Systems,” Proceedings of the 18th Fault-Tolerant Computing Symposium, Tokyo, Japan, June 1988.

[81] M. Taylor et al. “The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs,” IEEE Micro, Mar./Apr. 2002.

[82] Marc Tremblay and Yuval Tamir. “Fault-Tolerance for High-Performance Multi-Module VLSI Systems Using Micro Rollback,” Advanced Research in VLSI: Decennial Caltech Conference on VLSI, Pasadena, CA, March 1989.

[83] Marc Tremblay and Yuval Tamir. “Support for Fault Tolerance in VLSI Processors,” International Symposium on Circuits and Systems, Portland, OR, May 1989.

[84] Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm. “Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor,” Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 191-202, May 1996.

[85] John F. Wakerly. Digital Design Principles and Practices, 3rd edition, Prentice Hall, 2001.

[86] E. Waingold et al. “Baring It All to Software: Raw Machines,” IEEE Computer, September 1997.

[87] K. C. Yeager. “The MIPS R10000 Superscalar Microprocessor,” IEEE Micro, Volume 16, Issue 2, April 1996.
