SUPPORT FOR SPECULATIVE EXECUTION IN HIGH-PERFORMANCE PROCESSORS

Michael David Smith

Technical Report: CSL-TR-93456
November 1992

Computer Systems Laboratory
Departments of Electrical Engineering and Computer Science
Stanford University
Stanford, California 94305-4055

Abstract

Superscalar and superpipelining techniques increase the overlap between the instructions in a pipelined processor, and thus these techniques have the potential to improve processor performance by decreasing the average number of cycles between the execution of adjacent instructions. Yet, to obtain this potential performance benefit, an instruction scheduler for this high-performance processor must find the independent instructions within the instruction stream of an application to execute in parallel. For non-numerical applications, there is an insufficient number of independent instructions within a basic block, and consequently the instruction scheduler must search across basic block boundaries for the extra instruction-level parallelism required by the superscalar and superpipelining techniques. To exploit instruction-level parallelism across a conditional branch, the instruction scheduler must support the movement of instructions above a conditional branch, and the processor must support the speculative execution of these instructions. We define boosting, an architectural mechanism for speculative execution, that allows us to uncover the instruction-level parallelism across conditional branches without adversely affecting the instruction count of the application or the cycle time of the processor. Under boosting, the compiler is responsible for analyzing and scheduling instructions, while the hardware is responsible for ensuring that the effects of a speculatively-executed instruction do not corrupt the program state when the compiler is incorrect in its speculation. To experiment with boosting, we built a global instruction scheduler, which is specifically tailored for the non-numerical environment, and a simulator, which determines the cycle-count performance of our globally-scheduled programs. We also analyzed the hardware requirements for boosting in a typical load/store architecture. Through the cycle-count simulations and an understanding of the cycle-time impact of the hardware support for boosting, we found that only a small amount of hardware support for speculative execution is necessary to achieve good performance in a small-issue, superscalar processor.

Key Words and Phrases: computer architecture, instruction scheduling, superscalar processor, trace-driven simulation

Copyright © 1992 by Michael David Smith

Acknowledgments

Six and a half years at Stanford. Nine thousand hours in front of a workstation. Eleven million keystrokes and mouse clicks. Is this toughening of the fingertips the essence of a graduate career? Fortunately not. I can honestly say that I enjoyed my graduate career because of the people I met between those keystrokes and mouse clicks.

Certainly, the one person who has the biggest influence on any graduate career is the principal thesis advisor. I consider myself lucky to have had Mark Horowitz as my advisor, for he is a truly unique individual. As a principal thesis advisor, I guess that Mark is obligated to listen to the crazy ideas of his students, but he always listened to the craziest of my ideas with genuine interest and unfaltering patience.
Of course, he never listened for long because he has this uncanny ability to understand your entire idea, the ramifications of your idea, and the problems with your idea from the first two sentences out of your mouth. I thank him for all that he has taught me and for the time that he has spent with me.

Actually, I am one of those fortunate individuals with more than one interested advisor. Monica Lam graciously acted as my alternate advisor, answering whatever compiler questions I had. I am not sure that Monica realized just how little I knew of compiler technology when she first agreed to support my research, but in a short period of time, she helped me learn more about compilers than I would have ever imagined possible.

I would also like to particularly acknowledge the support and guidance of three other Stanford professors. The first of these professors is John Hennessy. John helped get me started at Stanford, he sat on my orals committee, and he basically kept me sharp throughout my graduate career. John continually referred his external visitors to my cubicle, and he often stopped by to suggest that I volunteer for yet another talk. Though I first viewed these activities as an unwelcome distraction, I later realized that they were opportunities which had an immeasurable effect on my research and on my development. I also wish to thank Professor Anoop Gupta for treating me as a colleague from my very first hour at Stanford. I hope that our discussions were as helpful to him as they were to me. Finally, I want to thank Professor Teresa Meng, who chaired my orals committee and acted as a reader for this dissertation.

Besides my professors, I wish to acknowledge the support of the staff of the Center for Integrated Systems and the help and friendship of my fellow students in the DASH, SUIF, and TORCH research groups. Without their aid, none of the research in this thesis would have been possible. I should especially thank the original members of the TORCH group (Tom Chanak, Phil Lacroute, John Maneatis, Don Ramsey, and Drew Wingard) for believing in my work long enough to make it a reality. Also, I need to particularly thank Wolf Weber and Kourosh Gharachorloo for so honestly reviewing my papers and talks.

Like many projects at Stanford, my research was also supported by many generous individuals outside the university. Of all of these individuals, four deserve special recognition. I want to thank Peter Davies, Mike Johnson, and Earl Killian, who each in some way contributed to the simulation environment used in this research. I also want to thank Neil Wilhelm for his understanding and guidance during those difficult years when I was searching for a research topic.

My final thanks must go to my family. My family has grown enormously since my first days at Stanford, and I cherish the understanding and compassion that they all showed me throughout the years. Of course, my deepest thanks must go to my wife Chris, who more than anyone else has supported me both financially and emotionally. Chris never once questioned me as to when I would be done, and she did a wonderful job of filling in those few hours that I was away from my workstation.

This work was supported by the Defense Advanced Research Projects Agency (DARPA) under contract N00039-91-C-0138. The author's support by Digital Equipment Corporation through the CIS Fellow-Mentor-Advisor program is also gratefully acknowledged.

This dissertation is dedicated to the loving memory of my brother Andrew Fairman Smith.
Table of Contents

Chapter 1 Introduction .......................................................................... 1
1.1 Constraints on ILP .......................................................................... 2
1.2 Background .......................................................................... 5
1.2.1 Current approaches to instruction scheduling .......................................................................... 5
1.2.2 Instruction scheduling with speculative execution .......................................................................... 7
1.3 An integrated approach .......................................................................... 9
Chapter 2 Opportunistic Instruction Scheduling .......................................................................... 13
2.1 Branch speculation .......................................................................... 14
2.1.1 Achieving branch speculation .......................................................................... 15
2.1.2 Boosting .......................................................................... 17
2.1.3 Handling exceptions .......................................................................... 20
2.1.3.1 Restart from a speculative exception .......................................................................... 20
2.1.3.2 Restart from a non-speculative exception .......................................................................... 25
2.1.4 Existing mechanisms .......................................................................... 25
2.2 Building mechanisms for speculation .......................................................................... 27
2.3 Speculative memory disambiguation .......................................................................... 29
2.4 Summary .......................................................................... 32
Chapter 3 Global Instruction Scheduling .......................................................................... 35
3.1 Background .......................................................................... 35
3.1.1 Issues in basic block scheduling .......................................................................... 36
3.1.2 Issues in global scheduling .......................................................................... 39
3.1.3 Existing global schedulers .......................................................................... 43
3.2 Issues in our global scheduling algorithm .......................................................................... 46
3.3 A trace-scheduling framework ..........................................................................