Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1902

Rethinking Dynamic Instruction Scheduling and Retirement for Efficient Microarchitectures

MEHDI ALIPOUR

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-513-0868-5 UPPSALA urn:nbn:se:uu:diva-403675 2020 Dissertation presented at Uppsala University to be publicly examined in VIII, Universitetshuset, Biskopsgatan 3, 753 10 Uppsala, Friday, 20 March 2020 at 09:00 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Professor Mikko H. Lipasti (University of Wisconsin-Madison).

Abstract Alipour, M. 2020. Rethinking Dynamic Instruction Scheduling and Retirement for Efficient Microarchitectures. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1902. 76 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0868-5.

Out-of-order execution is one of the main micro-architectural techniques used to improve the performance of both single- and multi-threaded processors. The application of such processors ranges from mobile devices to server computers. The technique achieves higher performance by finding independent instructions, hiding execution latency, and using cycles that would otherwise be wasted in a CPU stall. To accomplish this, it uses scheduling resources, including the ROB, IQ, LSQ, and physical registers, to store and prioritize instructions. The pipeline of an out-of-order processor has three macro-stages: the front-end, the scheduler, and the back-end. The front-end fetches instructions, places them in the out-of-order resources, and analyzes them to prepare for their execution. The scheduler identifies which instructions are ready for execution and prioritizes them for scheduling. The back-end updates the processor state with the results of the oldest completed instructions, deallocates the resources, and commits the instructions in program order to maintain correct execution. Since out-of-order execution must be able to choose any available instruction for execution, its scheduling resources need complex circuits for identifying and prioritizing instructions, which makes them expensive and therefore constrained in size. This limited size leads to two stall points, at the front-end and the back-end of the pipeline. The front-end can stall when the resources are fully allocated and no new instructions can be placed in the scheduler. The back-end can stall when the unfinished execution of the instruction at the head of the ROB prevents other resources from being deallocated, keeping new instructions from being inserted into the pipeline. To address these two stalls, this thesis focuses on reducing the time instructions occupy the scheduling resources.
Our front-end technique tackles IQ pressure, while our back-end approach considers the rest of the resources. To reduce front-end stalls, we reduce the pressure on the IQ for both storing (depth) and issuing (width) instructions by bypassing instructions to cheaper storage structures. To reduce back-end stalls, we explore how we can retire instructions earlier, and out-of-order, to reduce the pressure on the out-of-order resources.

Keywords: Out-of-Order Processors, Energy-Efficient, High-Performance, Instruction Scheduling

Mehdi Alipour, Department of Information Technology, Computer Architecture and Computer Communication, Box 337, Uppsala University, SE-75105 Uppsala, Sweden.

© Mehdi Alipour 2020

ISSN 1651-6214 ISBN 978-91-513-0868-5 urn:nbn:se:uu:diva-403675 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-403675)

To my parents, who always valued their children, prioritized my sport and education, and supported me throughout

List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Mehdi Alipour, Trevor E. Carlson, Stefanos Kaxiras, "A Taxonomy of Out-of-Order Instruction Commit". In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Santa Rosa, California, USA, April 2017.

II Mehdi Alipour, Trevor E. Carlson, Stefanos Kaxiras, "Exploring the Performance Limits of Out-of-order Commit". In Proceedings of the 2017 ACM International Conference on Computing Frontiers (CF) Siena, Italy, May 2017.

III Mehdi Alipour, Trevor E. Carlson, David Black-Schaffer, Stefanos Kaxiras, "Maximizing Limited Resources: A Limit-based Study and Taxonomy of Out-of-order Commit". Journal of Signal Processing Systems 91(3-4): 379-397, 2019 (an extension of Paper II).

IV Mehdi Alipour, Rakesh Kumar, Stefanos Kaxiras, David Black-Schaffer, "FIFOrder: Ready-Aware Instruction Scheduling for OoO Processors". In Proceedings of the Design, Automation and Test in Europe (DATE) Florence, Italy, March 2019.

V Mehdi Alipour, Rakesh Kumar, Stefanos Kaxiras, David Black-Schaffer, "Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors". In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA) San Diego, CA, USA, February 2020.

Reprints were made with permission from the publishers. Other publications not included in this thesis:

• Alberto Ros, Trevor E. Carlson, Mehdi Alipour, Stefanos Kaxiras, "Non-Speculative Load-Load Reordering in TSO". In Proceedings of the IEEE International Symposium on Computer Architecture (ISCA) Toronto, Canada, June 2017.

• Sizhuo Zhang, Muralidaran Vijayaraghavan, Andrew Wright, Mehdi Alipour, Arvind, "Constructing a Weak Memory Model". In Proceedings of the IEEE International Symposium on Computer Architecture (ISCA) Los Angeles, CA, USA, June 2018.

• Stefanos Kaxiras, Trevor E. Carlson, Mehdi Alipour, Alberto Ros, "Non-Speculative Load Reordering in Total Store Ordering". IEEE Micro Top Picks, June 2018.

• Rakesh Kumar, Mehdi Alipour, David Black-Schaffer, "Freeway: Maximizing MLP for Slice-Out-of-Order Execution". In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA) Washington D.C., USA, February 2019.

• Christos Sakalis, Mehdi Alipour, Alberto Ros, Alexandra Jimborean, Stefanos Kaxiras, Magnus Själander, "Ghost loads: what is the cost of invisible speculation?". In Proceedings of the ACM International Conference on Computing Frontiers (CF) Alghero, Italy, May 2019.

Contents

1 Preface
 1.1 Contributions: Front-End
 1.2 Contribution: Back-End
 1.3 Thesis Organization
2 Overview of Out-of-Order Processors
 2.1 The In-order Front-end
 2.2 Out-of-order Scheduling
  2.2.1 Scheduling Resources
  2.2.2 Scheduling Steps
  2.2.3 Cost Evaluation: the Width and the Depth of the IQ
  2.2.4 Research Problem I: Inefficient Scheduling
 2.3 The In-order Back-end
  2.3.1 Architectural vs. Speculative State of a Processor
  2.3.2 Overview of the Back-end
  2.3.3 Research Problem II: In-order Commit is Overly Conservative
3 Efficient Resource Allocation: Scheduling Considering Readiness and Criticality of Instructions
 3.1 Research Problem: Inefficient Scheduling
 3.2 Insight I: Some Instructions do not Need OoO Scheduling
 3.3 Solution I: FIFOrder, Ready-Aware Instruction Scheduling
  3.3.1 Instruction Criticality and Limits of Ready-Aware Instruction Scheduling
 3.4 Insight II: Overlap Between Readiness and Criticality
 3.5 Potential of Combining Readiness and Criticality
 3.6 Solution II: DNB, Ready- and Criticality-Aware Instruction Scheduling
 3.7 Conclusion
4 High performance resource deallocation: early release and out-of-order commit
 4.1 Out-of-order commit conditions
 4.2 Contribution I: Relaxing Out-of-Order Commit Conditions
 4.3 Contribution II: Category/taxonomy of out-of-order commit
  4.3.1 Safe_OOC
  4.3.2 Unsafe_OOC
  4.3.3 Reluctant
  4.3.4 Aggressive
  4.3.5 Commit Width and Depth
 4.4 Performance evaluation
  4.4.1 Performance evaluation based on the taxonomy
  4.4.2 Performance evaluation based on the OOC conditions
 4.5 Out-of-order Commit and Memory Level Parallelism (MLP)
 4.6 Early release vs. out-of-order commit
 4.7 Conclusion
5 Summary
6 Svensk Sammanfattning
7 Acknowledgements
References

1. Preface

Modern processors apply many different micro-architectural techniques to improve performance. Among them, out-of-order execution has been used widely, from server to mobile processors. Out-of-order execution finds independent instructions outside the program order, hides their latency, and uses cycles that would otherwise be wasted in a CPU stall. The latency source can be rather short, just a couple of cycles, as for a division instruction, or long, up to hundreds of cycles, as for a load instruction that misses in the last level of cache. In both cases the out-of-order processor's contributions are two-fold: 1) finding independent instructions by eliminating data hazards and 2) increasing the size of the dynamic window thanks to the out-of-order bookkeeping resources such as the Re-Order Buffer (ROB), the Instruction Queue (IQ), and the Load Store Queue (LSQ). Register renaming allows the processor to eliminate artificial data hazards1 between instructions caused by the limited number of architectural registers available to the compiler. The out-of-order bookkeeping resources allow the processor to keep many instructions in flight at the same time.

Figure 1.1 shows a typical pipeline of an out-of-order processor. The processor has three macro-stages: the front-end, the scheduler, and the back-end. The front-end fetches instructions and analyzes them to prepare for their execution. It includes instruction fetch (IF), instruction decode (ID), register renaming (RR), and instruction dispatch (DIS). The IF stage fetches instructions from the instruction cache. The ID stage decodes the instructions' op-codes. The RR stage renames the architectural registers to physical ones, and the DIS stage places instructions in the out-of-order resources. The out-of-order scheduler schedules instructions for execution; this includes identifying which instructions are ready for execution. Once an instruction commits, all of its resources, except the IQ, are deallocated2.

The out-of-order scheduling resources require the ability to identify and prioritize instructions across a large execution window. This leads to complex content-addressable memories (CAMs) that consume significant area and energy. Out-of-order resources are thus very expensive and therefore limited. Due to this limitation, the processor has two stall points, at the front-end and the back-end.

1 False dependency.
2 The IQ is deallocated right after issue unless the instruction might need a replay (re-issue due to a cache miss, for example).




[Figure 1.1 appears here. It maps the thesis contributions onto the pipeline: Problem I (allocation inefficiency, front-end) is addressed by Paper IV (instruction readiness reduces the IQ width) and Paper V (instruction readiness and criticality reduce both the IQ depth and width); Problem II (in-order commit is overly conservative, back-end) is addressed by Paper I (when to commit OOO), Paper II (the performance effect of commit conditions), and Paper III (the potential of OOO commit over early release of registers).]

Figure 1.1. Overview of the out-of-order processor and the thesis contributions to the front-end and back-end research problems. The front-end spans the fetch stage to dispatch, and the back-end spans the write-back stage to instruction commit. The out-of-order scheduler includes the out-of-order resources. The front-end research problem is the inefficiency of resource allocation, which Papers IV and V focus on. Being overly conservative is the research problem in the back-end, which Papers I, II, and III focus on.

The front-end can stall when the resources are fully allocated and no new instructions can be placed in the scheduler. The back-end can stall when the unfinished execution of the instruction at the head of the ROB causes the resources to be exhausted. The front-end and back-end stalls are labeled problem I and problem II in Figure 1.1, respectively. To address these two stalls, this thesis focuses on reducing the time that instructions occupy the scheduling resources. To reduce front-end stalls, we reduce the pressure on the IQ for both storing (depth) and issuing (width) instructions by bypassing instructions to cheaper storage structures. To reduce back-end stalls, we explore how we can retire instructions earlier, and out-of-order, to reduce pressure on the rest of the resources.

1.1 Contributions: Front-End

Research Problem: Inefficient Scheduling. To support wake-up and select, designers build the IQ with complex logic that comprises comparators and reduction trees for all entries. These circuits enable parallel comparisons across all instructions to detect which are ready and to identify the prioritized ones. This complexity makes the IQ energy-expensive, yet in OoO processors all instructions must be placed in it (or the LQ/SQ) for execution.

Our Insight. Allocating IQ entries to all instructions is inefficient. Some instructions do not benefit from the expensive IQ scheduling and can therefore bypass the IQ. Bypassing the IQ has the further benefit of reducing both capacity and issue pressure, which allows the use of a smaller (shallower and narrower) IQ, further reducing energy.

Observations and Contributions. We observed that some instructions are ready before being placed in the IQ. Because these instructions are ready to execute when they are placed in the IQ, they do not benefit from IQ features such as wake-up and select and can bypass it. To avoid placing such instructions in the IQ, we need another structure that can buffer them until they execute. This bypass structure can be built more cheaply (as it does not require complex wake-up and select logic for already-ready instructions) and therefore enables the IQ to be reduced in depth and width, hence reducing energy consumption.

• Paper IV. We observe that ready instructions can bypass the IQ and we propose FIFOrder, a ready-aware instruction scheduling microarchitecture. We identify R@D instructions, instructions that have their operands ready before being placed in the IQ. We bypass the IQ for R@D instructions and place them in a FIFO instead, which enables a smaller IQ and lower scheduler energy consumption. We also observe that some instructions are almost ready, that is, they are likely to become ready soon because one operand is already ready. We use them to increase the number of instructions that bypass the IQ. Altogether we bypass the IQ for the majority of instructions and reduce the IQ width to one instruction per cycle, which saves a significant amount of energy. However, reducing the IQ depth hurts FIFOrder's performance, as it is unable to identify instructions that should be prioritized.

• Paper V. We observe that the ready-aware scheduler of FIFOrder cannot reduce IQ depth without hurting performance due to its inability to prioritize critical instructions. The criticality of an instruction determines its effect on performance when its scheduling is delayed or expedited: an instruction is more critical when delaying its execution degrades performance and expediting it improves performance. To address both instruction readiness and criticality, we propose the first ready- and criticality-aware instruction classification. We propose the Delay and Bypass (DNB) microarchitecture, which builds on this classification: it bypasses the IQ for ready instructions (as with FIFOrder) but delays non-critical, non-ready instructions to reduce pressure on the IQ. Delaying instructions gives them time to become ready (e.g., because the instructions producing their input operands have already finished executing). As a result, by bypassing ready and delaying non-critical instructions, only critical and memory instructions are placed in the IQ, which allows us to reduce both its depth and width and save energy without hurting performance.
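The steering policy sketched in these two papers can be illustrated with a small dispatch routine. This is a hypothetical sketch, not the thesis implementation: the structure names (`FIFO`, `DELAY`), the instruction encoding, and the `critical` flag are inventions for illustration only.

```python
# Illustrative ready- and criticality-aware dispatch steering in the spirit
# of FIFOrder/DNB. All names and data shapes here are assumptions, not the
# actual microarchitecture.
from collections import deque

IQ, FIFO, DELAY = deque(), deque(), deque()

def steer(instr, ready_regs):
    """Steer one renamed instruction to a scheduling structure."""
    ready = all(src in ready_regs for src in instr["srcs"])
    if ready:
        FIFO.append(instr)    # ready at dispatch: bypass the expensive IQ
    elif not instr["critical"]:
        DELAY.append(instr)   # non-ready, non-critical: park it cheaply
    else:
        IQ.append(instr)      # critical and non-ready: needs full wakeup/select

ready_regs = {"r1", "r2"}
steer({"srcs": ["r1"], "critical": False}, ready_regs)   # goes to FIFO
steer({"srcs": ["r9"], "critical": True}, ready_regs)    # goes to IQ
steer({"srcs": ["r9"], "critical": False}, ready_regs)   # goes to DELAY
assert len(FIFO) == 1 and len(IQ) == 1 and len(DELAY) == 1
```

With such a split, only the critical non-ready instructions pay for the associative IQ, which is what allows the IQ to shrink in both depth and width.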

1.2 Contribution: Back-End

Research Problem: In-order Commit is Overly Conservative. When instructions are completed, at the final stage of the back-end, the commit stage, all their resources (the ROB, LSQ, and PRF) are deallocated except the IQ3. Traditional in-order deallocation in the back-end can be inefficient, as instructions that are not ready to commit may block the deallocation of those that are, leading to resource starvation and stalling. While in-order commit has its advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it prevents the processor from freeing scheduling resources until the instructions complete in program order. A more relaxed approach, out-of-order commit, can trade off the complexity of ensuring correctness for improved performance by freeing resources earlier.

3 IQ deallocation occurs right after execution.

Observations and Contributions. We observe that in-order instruction commit limits performance. In addition, the potential performance improvement from relaxing in-order commit varies depending on the processor and the relaxation applied.

• Paper I. In this paper we introduce two approaches to out-of-order commit: aggressive, in which instructions are committed as soon as possible, and reluctant, in which instructions are only committed out-of-order if the processor would otherwise stall. In addition, the introduced approaches include safe and unsafe out-of-order commit, which capture how relaxing or preserving the commit conditions affects performance. Safe means that although the instructions are committed out-of-order, the execution is still correct and the program output is the same as with in-order commit.

• Paper II. Multiple conditions are required to commit instructions. In this work, the performance impact of each of the commit conditions is explored, both in isolation and in combination with the other conditions.

• Paper III. The performance improvement of out-of-order commit for different scheduler sizes4 is explored, and the commit conditions are ranked by their impact on performance. In addition, the potential of out-of-order commit over the early release of resources is explored. This comparison is made because the early release of resources has been shown to be an effective approach to improve the performance of smaller processors.
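The contrast between conventional in-order commit and an aggressive out-of-order commit can be sketched in a few lines. This is purely illustrative, not the papers' mechanism: the `safe` predicate is a hypothetical stand-in for the commit conditions the papers study.

```python
# Illustrative contrast between in-order and aggressive out-of-order commit.
# `safe` is a hypothetical predicate standing in for the real commit
# conditions (no unresolved branches, no possible exceptions, etc.).

def commit_inorder(rob):
    """Commit from the head only; stop at the first unfinished instruction."""
    committed = []
    while rob and rob[0]["done"]:
        committed.append(rob.pop(0)["id"])
    return committed

def commit_aggressive(rob, safe):
    """Commit any finished instruction whose commit conditions hold,
    regardless of program order, freeing its resources early."""
    committed = [i["id"] for i in rob if i["done"] and safe(i)]
    rob[:] = [i for i in rob if not (i["done"] and safe(i))]
    return committed

rob = [{"id": 0, "done": False}, {"id": 1, "done": True}, {"id": 2, "done": True}]
assert commit_inorder(list(rob)) == []              # unfinished head blocks all
assert commit_aggressive(rob, lambda i: True) == [1, 2]
```

The example shows the core inefficiency: with an unfinished instruction at the ROB head, in-order commit frees nothing, while a relaxed commit can already release the resources of instructions 1 and 2.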

1.3 Thesis Organization

Chapter 2 provides an overview of out-of-order processors, explaining the allocation of the out-of-order scheduling resources in the front-end and their deallocation in the back-end. The chapter also covers the research problems associated with both resource allocation and deallocation in out-of-order processors. In chapter 3 we introduce our two efficient resource allocation techniques, FIFOrder (paper IV), which addresses readiness, and Delay and Bypass (DNB) (paper V), which addresses both readiness and criticality, and we show how bypassing or delaying the IQ for some instructions can reduce IQ pressure and size, and thereby improve efficiency without hurting performance. We focus on out-of-order resource deallocation in chapter 4 and introduce a new approach to explore the effectiveness of out-of-order commit on performance (papers I, II, and III).

4 ROB, IQ, LSQ, and PRF sizes.

2. Overview of Out-of-Order Processors

Out-of-order execution is an approach used in high-performance processors. An out-of-order processor is faster than an in-order one because it can execute more instructions in parallel, as it is not limited by the program order. Figure 2.1 shows an overview of an in-order processor. This processor can achieve a throughput of one instruction per cycle only if the execution latency of every instruction is one cycle. On a long-latency instruction, such as a memory access (load/store) that can take on the order of 100 cycles, this pipeline stalls and executes no further instructions. This stall happens even if there are instructions after the long-latency operation that are independent and ready to execute.

An out-of-order processor, on the other hand, considers many instructions at the same time, called in-flight instructions. The scheduler of an out-of-order processor finds schedulable (ready) instructions and issues them to the functional units for execution. By doing so, an out-of-order processor covers the delay of long-latency events by finding independent instructions. An overview of an out-of-order processor is shown in figure 2.2. In this figure, any of the (ready) instructions can be issued to the execution units regardless of their order in the program.

The pipeline of an out-of-order processor has three parts, shown in Figure 2.3: the in-order front-end, instruction scheduling (the out-of-order core), and the in-order back-end and retirement. Resource allocation occurs at the end of the front-end, the dispatch stage, and resource deallocation occurs at the end of the back-end, the commit stage. The resources are the instruction queue (IQ), the reorder buffer (ROB), the load-store queue (LSQ), and the physical registers. The rest of this chapter provides details of the pipeline stages in addition to the research problems associated with resource allocation and deallocation.
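The difference between the two pipelines can be shown with a toy single-issue model. This is purely illustrative, not a cycle-accurate simulator: the `run` function, its latency/dependence tuples, and the issue policy are all inventions for this sketch.

```python
# Toy single-issue model: in-order issue stalls behind a long-latency load,
# while out-of-order issue covers the latency with independent instructions.
# One instruction may issue per cycle; this is an assumption of the sketch.

def run(instrs, out_of_order):
    """instrs: list of (latency, deps). Returns the cycle when all finish."""
    done_at, cycle, pending = {}, 0, list(range(len(instrs)))
    while pending:
        cycle += 1
        # in-order may only consider the oldest pending instruction
        candidates = pending if out_of_order else pending[:1]
        for i in candidates:
            lat, deps = instrs[i]
            if all(done_at.get(d, float("inf")) <= cycle for d in deps):
                done_at[i] = cycle + lat    # issue at most one per cycle
                pending.remove(i)
                break
    return max(done_at.values())

# i0: 100-cycle load; i1 depends on it; i2 and i3 are independent 1-cycle ops
prog = [(100, []), (1, [0]), (1, []), (1, [])]
assert run(prog, out_of_order=False) > run(prog, out_of_order=True)
```

In-order issue serializes i2 and i3 behind the dependent i1, while out-of-order issue slips them under the load's latency, finishing earlier.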


Figure 2.1. Overview of an in-order pipeline. In this pipeline, instructions are issued to the execution units in program order, first i1, then i2 and so on. If the instruction at the head of the FIFO (i1) is not ready for execution, the pipeline stalls and the CPU is not fully utilized.



Figure 2.2. Overview of an out-of-order pipeline. In this pipeline, (ready) instructions can be issued to the execution units out of program order. At each cycle, a few ready instructions are issued to the execution units.

2.1 The In-order Front-end

We assume an in-order front-end that consists of four stages: Fetch, Decode, Rename, and Dispatch. Instructions enter the front-end in program order. The front-end handles the bookkeeping necessary to ensure that instructions chosen for out-of-order issue and execution have their operands ready.

Instruction Fetch is the first stage of the pipeline, where the processor accesses the instruction cache using the program counter (PC), brings new instruction(s) into the pipeline, and places them in the fetch queue. It also calculates the address of the next instruction, typically with the assistance of a branch predictor. In an N-way superscalar processor, in theory, N instructions can be fetched at the same time.

[Figure 2.3 appears here. It depicts the pipeline stages Fetch, Decode, Rename, Dispatch, Issue, Execution, Write-back, and Commit, together with the I-cache, branch prediction table, rename map, ARF/PRF, ROB, IQ, LSQ, D-cache, and the functional units (ALU, AGU, BRU, FPU).]

Figure 2.3. An overview of an out-of-order processor pipeline. It includes three macro-stages: the front-end, the scheduler, and the back-end. The front-end and the back-end are typically in-order while the scheduler is out-of-order.

Instruction Decode takes the instructions from the fetch queue and places them in decoders that analyze their opcodes to determine the instruction type. To simplify execution, most processors split instructions (macro instructions) at the decode stage into simpler micro-operations (μ-ops). For example, a load instruction may be split into two μ-ops, one to calculate the address and the other to access the memory.

Register Renaming

Data hazards, WAW (write after write), WAR (write after read), and RAW (read after write), can slow down processor performance because they force instructions to execute serially, reducing the potential for parallel execution. WAW and WAR are false dependencies caused by the compiler reusing the limited architectural registers provided by the ISA. RAW, however, is a true dependency, since it is a producer-consumer dependency. False dependencies are name dependencies, and changing the names eliminates those hazards. Register renaming enables as much ILP as possible even though instructions reuse the architectural registers; to do so, it detects and respects true dependencies among instructions while eliminating the false ones.

In a processor that renames registers to avoid false dependencies, there are two register files: the architectural register file (ARF), defined by the ISA and visible to the programmer, and the physical register file (PRF), defined by the implementation, whose registers are mapped to ISA registers as needed. Different implementations of an ISA will have the same ARF but different PRF sizes. The PRF must be at least as large as the ARF so that each architectural register can be renamed at least once.

Register renaming (RR) comprises three tasks: reading the source operands, allocating the destination register, and register update. Reading the source operands includes identifying the operands and fetching them. Identification occurs at the rename stage (or decode). Fetching can be done in the rename stage or be postponed until after dispatch. In this thesis, we consider fetching the operands after dispatch, which is more common in modern processors. Fetching the operands after dispatch has the benefit of being less costly, because the operands do not have to be stored with the instruction in the instruction queue.

Figure 2.4 depicts an overview of the ARF and PRF. After identifying the operands, the register renaming logic accesses the ARF to fetch them. Each entry in the ARF has a busy bit. If the busy bit is zero, there is no pending write to this operand, the ARF contains the latest version, and register renaming fetches the value. If the busy bit is one, there is a pending write to that register and the content of the ARF is obsolete. The PRF has a valid bit for each entry, identifying whether the operand is ready. Two possibilities can occur based on the valid bit in the PRF.

If the busy bit of the ARF entry is set, the map table identifies which physical register the pending write targets. Each entry in the PRF has a tag, or identification number. The corresponding PRF entry is accessed through this tag, and two possibilities can occur. If the valid bit is set, the producer of this entry has finished execution, so the operand can be fetched from the PRF. If the valid bit is zero, the execution of the producer is ongoing. In this case, instead of fetching the non-ready register, register renaming forwards the tag from the map table to the consumer. This tag is later used by the scheduler to get the operand when it becomes available. After finishing execution, each instruction broadcasts the tag of its destination register to all instructions in the IQ to indicate that the register is now ready. Instructions that were waiting on that register are then marked as ready and can be selected for execution. Figure 2.4 depicts all the alternatives of source reading in register renaming.

Destination allocation comprises three sub-tasks: setting the busy bit, assigning a tag, and updating the map table. Free physical registers are kept in a free list. The destination register of each instruction is used to access the ARF, and since that register now has a pending write, its busy bit is set. The selected architectural register must also be mapped (renamed) to a free physical register. The busy bit of the physical register is set, and its tag (an index) is used to update the map table to record that the architectural register has been mapped to this particular physical register, so that later instructions can refer to it if they need that register. Subsequent dependent instructions learn of this mapping when they go through the source-reading phase of register renaming.

The third task, register update, affects the ARF and occurs at the back-end of the pipeline. Register update has two phases.
In the first phase, when an instruction finishes execution, it writes its result to the PRF entry indicated by its tag. In the second phase, when the instruction is at the commit stage, the result is copied from the PRF to the ARF. These two phases can occur back-to-back if the updating instruction I is the oldest among all the instructions, or the second phase can happen many cycles later if there are unfinished older instructions ahead of instruction I. After a physical register is copied to an architectural register, its busy bit is reset and it is added to the free list, from which it can be used to rename another architectural register. This is called register retirement; it occurs at the commit stage, and chapter 4 describes it in more detail.
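The map-table and free-list interplay described above can be sketched in a few lines. This is a minimal illustration, not the hardware design: the register names, the flat `map_table` dictionary, and the `ready` set are assumptions of the sketch.

```python
# Minimal register-rename sketch: a map table from architectural to physical
# registers plus a free list. Names (r*, p*) are illustrative only.

free_list = ["p0", "p1", "p2", "p3"]
map_table = {}          # architectural register -> physical register
ready = set(free_list)  # physical registers whose value has been produced

def rename(instr):
    """Rewrite sources via the map table, then allocate a fresh physical
    register for the destination, eliminating WAR/WAW name hazards."""
    srcs = [map_table.get(s, s) for s in instr["srcs"]]
    dst = free_list.pop(0)
    map_table[instr["dst"]] = dst
    ready.discard(dst)            # pending write: value not yet produced
    return {"srcs": srcs, "dst": dst}

i1 = rename({"dst": "r1", "srcs": []})        # r1 is mapped to p0
i2 = rename({"dst": "r2", "srcs": ["r1"]})    # reads p0, writes p1
i3 = rename({"dst": "r1", "srcs": []})        # WAW on r1 gets a fresh p2
assert i2["srcs"] == ["p0"] and i3["dst"] == "p2"
```

Note how the second write to r1 receives a fresh physical register, so i2's read of p0 is unaffected: the WAW hazard is gone while the true RAW dependency (i2 on i1) is preserved through the mapping.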

Instruction Dispatch and In-order Allocation

Allocation is the task of placing instructions into the out-of-order scheduling resources: the Re-Order Buffer (ROB), the Instruction Queue (IQ), and the Load-Store Queue (LSQ). At dispatch, the pipeline places all instructions in the ROB in program order. The ROB keeps the order among the instructions and makes sure they commit in order. At the same time as the

Figure 2.4. Register renaming.

ROB, the instructions are also placed in the IQ to wait for their operands to become ready and then be issued to the functional units for execution. In typical out-of-order processors, all instructions are considered non-ready and are therefore inserted into the IQ, which later determines when they become ready. Moreover, the out-of-order scheduler does not necessarily consider the effect of individual instructions on performance. In chapter 3 we discuss how many instructions are actually ready before they are inserted into the IQ, which provides the opportunity for them to bypass the IQ and avoid both the energy of writing them into the IQ and the capacity required to hold them. If a dispatched instruction is a memory reference, e.g., a load or store, the dispatcher also places it in the LSQ. This is necessary to maintain the order among the memory instructions. The LSQ also allows older stores to forward their results to younger (newer) loads, thereby avoiding the need to wait for the older stores to be written to the cache first.

2.2 Out-of-order Scheduling

Once instructions are placed in the ROB, their execution can be re-ordered while maintaining correct program semantics, as the instructions will commit in the program order recorded in the ROB. The ability to execute instructions out-of-order is one of the most important techniques for achieving higher performance in modern processors. Out-of-order scheduling has two elements: the scheduling steps (the pipeline stages involved) and the scheduling resources. The scheduling steps consist of four phases: wakeup, select, issue, and execution. The resources are the ROB, IQ, LSQ, and physical registers (register renaming was already covered in the previous section).

2.2.1 Scheduling Resources

Reorder Buffer (ROB)
Even though instructions are executed out-of-order, the processor needs to keep track of their original order to ensure correct semantics. The ROB does this. The ROB is a first-in-first-out (FIFO) queue that maintains the program order of the instructions. The instruction at the head of the ROB is the oldest and the one at the tail is the youngest. Accordingly, the visible architectural state of the processor matches the last instruction that has been removed from the ROB.

The ROB contains instructions in different stages, from the rename stage until the commit stage. Essentially, the ROB captures the next set of instructions that will be, have been, or are being executed but have not yet been committed. The status of an instruction in the ROB can be: not ready to issue; ready and waiting to be issued; issued but not yet finished; executed but not committed; or ready to commit.

The ROB contains instructions whose effects will soon be visible to the program unless execution is disrupted, for example by interrupts or exceptions. It also implements the support for recovering from mispredicted branches. When a branch misprediction occurs, the ROB allows the speculatively (wrongly) executed instructions, those younger than the mispredicted branch, to be squashed (removed) from the pipeline.
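The ROB's two jobs, in-order commit from the head and squashing younger instructions on a misprediction, can be sketched as a FIFO. This is an illustration under simplifying assumptions (integer instruction IDs, a bare `done` flag), not the hardware structure.

```python
# Illustrative ROB sketch: a FIFO that commits finished instructions from
# the head in program order, and squashes everything younger than a
# mispredicted branch.
from collections import deque

rob = deque()

def dispatch(instr_id):
    rob.append({"id": instr_id, "done": False})

def commit():
    """Remove finished instructions from the head, in program order."""
    committed = []
    while rob and rob[0]["done"]:
        committed.append(rob.popleft()["id"])
    return committed

def squash_after(branch_id):
    """Drop all entries younger than the mispredicted branch."""
    while rob and rob[-1]["id"] != branch_id:
        rob.pop()

for i in range(4):
    dispatch(i)
rob[0]["done"] = rob[1]["done"] = True
assert commit() == [0, 1]     # 2 is unfinished, so commit stops at the head
squash_after(2)               # branch 2 mispredicted: instruction 3 squashed
assert [e["id"] for e in rob] == [2]
```

The head-blocking behavior visible here (commit stops at the first unfinished entry) is exactly the back-end stall that chapter 4 attacks with out-of-order commit.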

Instruction Queue (IQ) The IQ is the pillar of instruction scheduling: it holds instructions until they become ready for execution and then issues them to the functional units. The IQ selects among the ready instructions. The selection is done by arbiters that implement a priority policy, such as oldest-first (age-based). The other functionality of the IQ is instruction wakeup. Wakeup is the process of "marking" instructions as ready, which occurs when all of an instruction's operands are ready. On each cycle, the IQ performs wakeup and select to choose the instructions to issue in that cycle. Instructions spend a shorter lifetime in the IQ compared to the ROB since they leave the IQ after being issued, while the ROB keeps them until they commit1. The IQ is typically an associative hardware structure implemented as a Content Addressable Memory (CAM). The essence of a CAM is that it performs a parallel comparison "internally" across all entries. Figure 2.5 shows a high-level overview of a single entry of an IQ. Based on this figure, the IQ has three sections: the wakeup logic, the tags and ready logic, which hold the instructions as well, and the select logic.

1Lifetime is the number of cycles that an instruction spends in a queue. Instruction statuses related to the execution stage and beyond apply only to the ROB and not to the IQ.

Figure 2.5. IQ and details of an entry (wakeup logic; tags, select logic, and ready logic; left operand/ready-left; right operand/ready-right). The IQ has three circuit sections, the wakeup logic, the select logic, and the tag-and-ready logic, which holds the instructions as well. At each cycle, instructions are woken up and selected in a loop that requires expensive wiring between all entries (fully associative) in the IQ, which makes it the most complex and energy-hungry part of instruction scheduling.

The Load Store Queue (LSQ) The LSQ holds loads and stores after they have been issued until they commit. This module monitors the ordering between memory accesses to enforce the ISA's memory model. For example, x86 uses a "total store order" memory model, and the LSQ is responsible for ensuring that loads and stores do not bypass each other in ways that would violate the ordering imposed by that memory model. The LSQ ensures that the programmer experiences the correct ordering between older stores and younger loads, making sure no load is executed before an older store that might alias with it. Aliasing occurs when a load and a store have the same address; reordering the execution of such instructions causes the load to read the wrong value. Processors commonly use memory dependency predictors to allow them to reorder the execution of loads and stores that are unlikely to depend on each other, thereby executing them earlier and improving performance. Memory dependency prediction predicts whether a store might alias with load(s). If a load is predicted not to alias with earlier stores, it can be issued even before those older stores, which improves the performance of the out-of-order processor. On a memory dependency misprediction, the LSQ replays the load(s). With memory dependency prediction, processors often reorder loads and stores, but the LSQ and ROB are responsible for ensuring that the instructions and the memory system experience the operations in an order that respects the ISA's memory model.
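The conservative alias check the LSQ enforces can be sketched as follows (a hypothetical toy model, not the thesis's LSQ; `may_issue_load` and its argument shapes are invented for illustration). A real design would additionally consult a memory dependency predictor to speculate past stores with unresolved addresses, replaying on a misprediction.

```python
# Hypothetical LSQ ordering check: a load may issue only if no older store
# has an unknown or matching (aliasing) address.
def may_issue_load(load_addr, older_stores):
    """older_stores: list of (addr_or_None, value); None = address not yet computed."""
    for addr, _ in older_stores:
        if addr is None or addr == load_addr:
            return False          # possible alias: conservatively hold the load
    return True

stores = [(0x100, 7), (None, 3)]                     # second store's address unknown
assert may_issue_load(0x200, stores) is False        # unknown address may alias
assert may_issue_load(0x200, [(0x100, 7)]) is True   # provably disjoint: safe to issue
assert may_issue_load(0x100, [(0x100, 7)]) is False  # true alias: must wait or forward
```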

2.2.2 Scheduling Steps
Instruction Wakeup.
Instructions wait in the IQ until their operands become ready and they are selected for execution. The IQ wakes up the waiting instructions (marks them as ready for execution) as their producers finish execution and generate the required operands. As a waiting instruction can be anywhere in the IQ, the results2 of executed instructions must be broadcast to all entries in the IQ. As the scheduler can execute multiple instructions every cycle, multiple results need to be broadcast simultaneously. For every completed operand broadcast, every instruction in the IQ needs to compare its input operands to see if there is a match, showing that the operand is ready. This requires multiple comparators per IQ entry and as many broadcast busses as operands that can be generated in a cycle. On a match, the IQ marks the operand as available. The instruction itself becomes ready when all of its operands are ready.

Instruction Select.
The instruction issue logic prioritizes and selects instructions for execution from the ready instructions in the IQ. As ready instructions can be anywhere in the IQ, all IQ entries need to be examined in parallel to select among them. Ready instructions are prioritized based on their type (often memory accesses first, to increase MLP) or age (oldest first, to avoid chains of stalled instructions). Computing these priorities requires complex comparison trees of instruction opcodes and tags. In addition to the priority logic, the IQ requires as many output ports as the maximum issue width to enable the selected instructions to be read. Because of this complexity, the size and energy of the IQ increase super-linearly with the number of entries and the issue width [22].
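The wakeup/select loop of the two steps above can be sketched in Python (a hypothetical functional model; the entry fields and function names are invented for this sketch, and the hardware does the compare and prioritization in parallel rather than in a loop):

```python
def broadcast_wakeup(iq, produced_tags):
    """Wakeup: compare every entry's source tags against the broadcast results (CAM-style)."""
    for entry in iq:
        entry["ready_srcs"] -= set(produced_tags)   # operand becomes available on a match

def select(iq, issue_width):
    """Select: oldest-first among fully ready entries, up to the issue width."""
    ready = [e for e in iq if not e["ready_srcs"]]
    ready.sort(key=lambda e: e["age"])              # age-based priority policy
    issued = ready[:issue_width]
    for e in issued:
        iq.remove(e)
    return [e["name"] for e in issued]

iq = [{"name": "i1", "age": 0, "ready_srcs": {"r1"}},
      {"name": "i2", "age": 1, "ready_srcs": set()},
      {"name": "i3", "age": 2, "ready_srcs": set()}]
broadcast_wakeup(iq, ["r1"])          # producer of r1 finished this cycle
assert select(iq, 2) == ["i1", "i2"]  # oldest two of the three ready entries
assert select(iq, 2) == ["i3"]
```

The per-entry set difference stands in for the per-entry comparators; the sort stands in for the selection tree — both of which exist physically for every entry, which is why IQ cost scales so badly.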

Instruction Issue.
This phase consists of sending the selected instructions to the functional units. In modern processors, the functional units are mostly pipelined, so each functional unit can typically accept a new instruction each cycle. Out-of-order issue allows processors to issue instructions based on their readiness and priority, which is often different from their age. Out-of-order processors issue instructions from anywhere in the IQ. We consider a unified issue queue (IQ), where all instructions are placed in the same IQ and issued from it to the functional units.

Instruction Execution.
In this stage, the operands of each instruction are sent to the functional units and the operation is performed.3 To improve performance, dependent back-to-back instructions can read the generated results from the forwarding path even before they are written back to the PRF. Without a forwarding path, a dependent instruction has to wait until the producer writes its results into the PRF.

2Depending on the implementation, the broadcast information can be the result, the destination register number, the instruction id, or a combination of the above. 3Although the results are generated in this stage, they are not written to the PRF until the write-back stage.

2.2.3 Cost Evaluation: the Width and the Depth of the IQ
A significant portion of the total core energy is consumed during instruction scheduling. The heart of instruction scheduling is the IQ, as it is responsible for storing the instructions while they wait to become ready, detecting when they are ready, and choosing which ones to issue. The size of the IQ (depth and issue width) significantly affects the energy consumption of the scheduler.

• IQ width/issue width: the number of instructions that can be issued in the same cycle. It determines the number of arbiters and selectors required to select among the ready instructions. It also defines the maximum number of broadcasting wakeup wires.

• IQ capacity/IQ depth: the number of instructions that the IQ can store. It determines how many entries the IQ has to observe at each cycle to find ready instructions. It also determines how many entries each wake-up signal has to be wired to.

There are two competing issues: the circuit cost (energy, area, delay) and the ability to extract ILP (a larger window). Changing the dimensions of the IQ affects the delay and cost of instruction scheduling. More entries increase the number of in-flight instructions; however, as the size of the IQ increases, its efficiency decreases since the circuit delay increases. Increasing the depth of the IQ affects the wakeup delay quadratically [22]. Selecting from the ready instructions requires reduction trees whose complexity grows linearly with the issue width of the IQ and logarithmically with its depth [23, 29, 28]. Because of this complexity, the power consumption of the IQ grows dramatically with depth and width, as shown in Figure 2.6. Conversely, the energy of simpler scheduling structures, such as in-order FIFO queues, scales more gracefully [22]. However, this comes at the cost of less flexible scheduling and lower performance: FIFOs can only consider instructions at the head of the queue. The focus of Chapter 3 is how we can use these cheaper scheduling structures to achieve the performance of the more expensive IQs. Moreover, when the IQ depth is quadrupled, the out-of-order window in which ready instructions can be found only doubles on average [22]. As a result of this combination of a large increase in cost and a much smaller increase in performance as the depth is increased, designers are always looking for energy-efficient techniques to increase the capacity of the IQ.
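The scaling trends quoted above can be made concrete with a toy cost model (illustrative constants only, no measured data; the functions and unit-free numbers are assumptions made for this sketch, reflecting only the quadratic-wakeup and width-times-log-depth select trends cited from [22, 23, 29, 28]):

```python
import math

# Toy cost model: wakeup delay ~ depth^2, select-tree cost ~ width * log2(depth).
def wakeup_delay(depth):
    return depth ** 2

def select_cost(width, depth):
    return width * math.log2(depth)

assert wakeup_delay(128) / wakeup_delay(32) == 16    # 4x depth -> 16x wakeup delay
assert select_cost(8, 64) / select_cost(4, 64) == 2  # 2x width -> 2x select cost
```

Contrasted with a window-size benefit that only doubles when the depth quadruples, this super-linear cost growth is what motivates augmenting a small IQ with cheap FIFOs rather than simply enlarging it.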

2.2.4 Research Problem I: Inefficient Scheduling
All instructions pass through the IQ, although it is one of the most energy- and area-expensive components of the processor. We have identified that not all instructions need, nor benefit from, the expensive IQ for scheduling. Indeed,

some instructions can either be scheduled directly (because they are ready at dispatch) or can have their scheduling delayed (because they are non-critical). We tackle this problem by looking into how we can classify instructions based on how much they will benefit from the IQ, and how we can use this information to use simpler scheduling structures. In Chapter 3 we provide two efficient schedulers in which instruction classification (readiness and criticality) is used to determine how much instructions will benefit from out-of-order scheduling.

Figure 2.6. Energy consumption per access of IQ (CAM-based) and queue (FIFO-based) instruction storage as a function of width (color) and depth (x-axis). IQ energy grows very significantly with both width and depth, while the simpler FIFO implementation is enormously more efficient.

2.3 The In-order Back-end
The back-end includes the write-back and commit stages. Instructions may enter the back-end out-of-order, but they leave it in program order.

2.3.1 Architectural vs. Speculative State of a Processor
A processor needs two states to allow instructions to execute out-of-order while still making the execution look in-order to the programmer.

The architectural state is updated at the commit stage in program order. Architectural state updates essentially remove the mapping between the physical and architectural registers and write the data from the physical registers to the architectural ones.

The speculative state contains the last architectural state in addition to the changes created by the in-flight instructions. These changes are called speculative since they may not become part of the architectural state if the instructions are squashed. Squashing means that instructions leave the pipeline while their finished results are discarded or their execution is canceled. For example, all instructions after a predicted, but unresolved, branch are speculative as they may turn out to be the wrong instructions if the prediction was incorrect. It is only once the prediction is verified to be correct that the effects of these instructions can be committed to the architectural state. Until then they remain speculative. To execute instructions out-of-order, but still provide an execution that is functionally equivalent to the in-order original program, the processor must correctly convert speculative (out-of-order) instructions into architectural (in-order) ones. This speculation is required to improve the performance of an out-of-order processor.

2.3.2 Overview of the Back-end
From a programmer's point of view, a processor starts the execution of each instruction only after the previous one has completed. However, pipelining leads to overlapping instructions, which means that the processor does not wait for the previous instructions to complete (write back) before starting new ones. In a pipelined processor there are several instructions in flight in different phases of their execution. As a result, instructions may update the processor state in any order, in particular in orders that do not respect the one specified by the programmer. To see the difficulties that arise due to this re-ordering, consider two instructions, inst1 and inst2, where inst1 is older. Inst1 may cause an exception in the last phase of its execution while inst2 might have updated some register in its write-back stage (before inst1). While the correct execution from the program's point of view would be that inst2 never wrote back to the register because inst1 caused an exception, in the pipelined processor this has already happened by the time the exception is raised. In this case, the processor cannot provide a state that includes the effects of all instructions older than inst1 but excludes those of inst2. The main solution is to hold speculative state separate from architectural state, essentially emulating sequential execution in an additional stage, called commit, at the end of the pipeline. Speculative state is not part of the architectural state until the instruction commits. Instructions flow through this stage in the original program order. Indeed, any effects that the instructions have on the processor state before reaching the commit stage are considered speculative. In the example above, the exception triggered by inst1 is handled in the commit stage. Since instructions are committed in program order, inst2 does not commit unless inst1 does; therefore, its changes will not become part of the architectural state since inst1 fails to commit.
In this way, the commit stage handles the exception as if no instruction younger than inst1 had ever been executed. Such a processor provides precise interrupts/exceptions since its architectural state is always the state just before the instruction that raises the exception [25].
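The commit-stage handling of inst1 and inst2 above can be sketched as a toy model (hypothetical helper names, not the thesis's hardware): retirement walks the ROB from the head, stops at an unfinished entry, and on an excepting entry squashes it together with everything younger.

```python
# Hypothetical sketch: at commit, an excepting instruction stops retirement;
# everything younger in the ROB is squashed so the architectural state stays precise.
def commit_until_exception(rob):
    """rob: oldest-first list of (name, done, excepts). Returns (committed, squashed)."""
    committed = []
    for i, (name, done, excepts) in enumerate(rob):
        if not done:
            break                       # cannot commit past an unfinished instruction
        if excepts:
            return committed, [n for n, _, _ in rob[i:]]  # squash exception + younger
        committed.append(name)
    return committed, []

rob = [("inst0", True, False), ("inst1", True, True), ("inst2", True, False)]
done, squashed = commit_until_exception(rob)
assert done == ["inst0"]                # only instructions older than the exception commit
assert squashed == ["inst1", "inst2"]   # inst2's write-back never reaches architectural state
```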

Instruction Completion and Commit
Instructions finish executing when their results have been computed in a functional unit, but these results are not written back to the architecturally visible processor state until the instructions commit. While regular flows of instructions pass through the front-end, scheduler, and back-end, there are important events, interrupts and exceptions, which disrupt this flow and have to be handled. When an interrupt occurs, the program execution must be suspended to allow the operating system to service the interrupt. One way to do this is to stop fetching new instructions and allow the instructions that are already in the pipeline to finish execution, at which time the state of the machine can be saved. Once the interrupt has been serviced by the operating system, the saved machine state can be restored and the original program can resume execution. When exceptions occur, the results of the computation may no longer be valid and the operating system may need to intervene to handle them. The architectural state of the machine present at the time of the excepting instruction must be saved so that the program can resume execution after the exception is serviced. Machines that are capable of supporting this are said to have precise exceptions. A precise exception involves being able to checkpoint the state of the machine just before the execution of the excepting instruction and then resume execution by restoring the checkpointed state and restarting at the excepting instruction. To support precise exceptions, the processor must maintain its architectural state and evolve this machine state as if the instructions in the program were executed one at a time in the original program order.
The reason is that when an exception occurs, the state the machine is in at that time must reflect the condition that all instructions preceding the excepting instruction have completed while no instruction following the excepting instruction has completed. For an out-of-order processor to have precise exceptions, this sequential evolution of the architectural state must be maintained even though instructions are actually executed out of program order. To support precise exceptions, instruction completion must occur in program order to update the architectural state of the machine in program order. To accommodate instructions finishing execution out-of-order but committing in-order, a reorder buffer is needed in the instruction completion stage of the parallel pipeline. As instructions finish execution, they enter the reorder buffer out of order, but they exit the reorder buffer in program order. As they exit the reorder buffer, they are considered architecturally completed. Precise exceptions are handled by the instruction completion stage by using the reorder buffer to ensure that only instructions before the exception in program order affect the architectural state. When an exception occurs, the excepting instruction is tagged in the reorder buffer. The completion stage checks whether each instruction before that instruction has completed. When the tagged instruction is found, it is not allowed to complete. All the instructions before the tagged instruction are allowed to complete.

The machine state is then checkpointed or saved. The machine state includes all the architectural registers and the program counter. The remaining instructions in the pipeline, some of which may have already finished execution, are discarded. After the exception has been serviced, the checkpointed machine state is restored and execution resumes with the fetching of the instruction that triggered the original exception. In Chapter 4 we provide more detail about the research problem associated with in-order commit.

2.3.3 Research Problem II: In-order Commit is Overly Conservative
Because instructions can only commit in-order, speculative instructions must keep their state in the processor until all instructions before them have committed. This means that speculatively executed instructions may tie up resources (reorder buffer (ROB) entries, load/store queue (LSQ) entries, and physical registers) for longer than it takes to execute the instruction. The processor may stall even though there are completed instructions that could be committed (and thereby free up resources) without breaking program order. The idea of out-of-order commit (OOC) is to overcome this problem by allowing completed instructions to commit out-of-order (and thereby free up resources) as long as doing so would not result in an incorrect execution. However, the potential of OOC has not been explored in detail. In this thesis we revisit out-of-order commit by examining the potential performance benefits of lifting the commit conditions [5] one by one and in combination, for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to explore the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs. Chapter 4 introduces the work in this thesis in evaluating OOC as an avoidance or prevention technique, including its safety and the risk of not being able to roll back when needed. The commit conditions described by [5] are also analyzed in that chapter to observe their effect on performance when they are evaded, while considering the cost of rolling back to a precise machine state.

3. Efficient Resource Allocation: Scheduling Considering Readiness and Criticality of Instructions

In this chapter, we propose ideas to reduce both the width and the depth of the IQ to reduce its energy; however, doing this blindly hurts performance. Our insight is that by carefully skipping IQ placement for some instructions and by delaying some other instructions, we can respectively reduce the issue ports (width) and the pressure on the IQ's capacity (depth) without hurting performance. However, we need to be able to identify the instructions that can skip or delay the IQ. To do so, we categorize them as "need IQ scheduling", "don't need IQ scheduling", and "hopefully won't need it if we wait a bit". In the end, we propose hardware that takes advantage of this instruction classification to avoid expensive IQ-based instruction scheduling when possible.

3.1 Research Problem: Inefficient Scheduling
Traditional processors place all instructions in the IQ, the most energy-hungry part of out-of-order scheduling. However, we have identified classes of instructions that do not benefit from this energy expenditure. Placing all renamed instructions in the IQ for scheduling is inefficient since some instructions do not benefit from the out-of-order functionalities of the IQ, namely select and wakeup. This is because some have their operands ready before dispatch, some are missing one operand that will be ready in just a few cycles, and some others have one operand ready while the other might take a few hundred cycles to become ready. Indeed, designers have accepted the cost of complex decode and rename stage implementations, but the information provided by this complexity is ignored by the scheduler, even though it could help to delay or bypass IQ placement for some of the instructions. Our data shows that a significant number of instructions fall into these categories, leading to a considerable reduction in IQ accesses and capacity pressure.

The Importance of Tackling Inefficient Scheduling
One of the main determinants of processor performance is how many instructions it can inspect to find ready instructions to execute. This is a function of the number of in-flight instructions, which are all stored in the IQ in a traditional design. Single-threaded processors require more in-flight instructions as an approach to improving their performance. The challenge is how to increase the number of in-flight instructions without paying the cost of a larger IQ. Of the in-flight instructions, only some are ready to be issued. Unfortunately, increasing the number of in-flight instructions by a factor of four only delivers a 2X increase in ready instructions [29, 28]. An additional challenge is the escalation in circuit complexity (and decrease in clock speed) of larger IQs. The IQ consumes a significant part of the core energy [15, 18]. However, it is vital for the performance improvement of a single-threaded processor to have more in-flight instructions, which traditionally requires increasing the IQ depth and width. This raises the challenge of how we can increase the number of in-flight instructions without increasing the size of the IQ. For a standard OoO processor, the number of in-flight instructions is equivalent to the depth (number of entries) of the IQ. In our proposed design the instructions are distributed between the IQ and a FIFO queue. By distributing the instructions among the IQ and the FIFO, the size of the dynamic window is increased, but at far lower cost than making the IQ larger. Alternatively, this insight can allow us to reduce the energy by reducing the size of the IQ and augmenting it with a FIFO to retain the same in-flight window size at lower energy. Dividing instructions into cheap and expensive can reduce the IQ width as well. Since cheap instructions bypass the IQ and are issued from the FIFO, fewer instructions will be issued from the IQ and the width of the IQ can be reduced. As shown in Figure 3.1, reducing the IQ width is also very effective in reducing the energy consumption of CAM-based IQs.

Figure 3.1. Energy consumption per access of IQ (CAM-based) and queue (FIFO-based) instruction storage as a function of width (color) and depth (x-axis). IQ energy grows very significantly with both width and depth, while the simpler FIFO implementation is enormously more efficient.

Table 3.1. Readiness status of instructions based on the availability of operands at dispatch time.

  Left operand    Right operand    Instruction readiness
  pending         pending          non-ready
  pending         ready            half-ready
  ready           pending          half-ready
  ready           ready            ready
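Table 3.1 amounts to a two-input truth table, which can be encoded directly (a hypothetical helper written for this summary, not thesis code):

```python
# Direct encoding of Table 3.1: readiness of a two-operand micro-op
# from the availability of its operands at dispatch time.
def readiness(left_ready, right_ready):
    if left_ready and right_ready:
        return "ready"
    if left_ready or right_ready:
        return "half-ready"
    return "non-ready"

assert readiness(False, False) == "non-ready"
assert readiness(False, True) == "half-ready"
assert readiness(True, False) == "half-ready"
assert readiness(True, True) == "ready"
```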

3.2 Insight I: Some Instructions do not Need OoO Scheduling
In x86 processors, macro instructions are broken down into micro-operations (μ-ops) with a maximum of two operands (inputs). Two-input μ-ops can be classified into four categories based on the readiness of their operands at the rename stage, as shown in Table 3.1.

Observation 1: Ready at Dispatch (R@D)
Based on Table 3.1, whenever both operands of an instruction are ready before it is dispatched, we call it R@D, and it can bypass the IQ because it does not benefit from out-of-order wakeup, select, or value forwarding. It does not require wakeup since all of its operands are already available. R@D instructions do not necessarily need selection either, since they can be issued directly to the functional units. Applying out-of-order wakeup and select across all instructions in an IQ is precisely what makes the IQ expensive and hard to scale; R@D instructions, however, can bypass it. Instruction I1 in Figure 3.3 is an R@D instruction since both of its operands are provided by ready registers. The out-of-order scheduler places all the instructions in the IQ; however, based on Figure 3.2 we find that on average across the SPEC CPU2006 benchmarks, over 20% of instructions are R@D.

Observation 2: Almost Ready at Dispatch (AR@D)
Some instructions have one ready operand and one almost-ready operand, i.e., one that becomes ready soon. This includes the second and third rows of Table 3.1. We call these instructions AR@D. The non-ready operand of an AR@D instruction is almost ready since it comes from either an R@D or an AR@D instruction, but not from a load. As a result, AR@D instructions are also good candidates for scheduling via a FIFO queue. The reason is their short lifetime in the dynamic instruction window. The effect of this short lifetime is twofold: firstly, out-of-order instruction wakeup does not help AR@D instructions execute earlier, and secondly, they are likely to be ready soon, before reaching the head of the FIFO. In Figure 3.3, instruction I3 is AR@D. We do not consider instruction I4 to be AR@D as it is likely to take longer to be ready since neither of its operands is ready. Based on Figure 3.2, 22% of total instructions are AR@D.

Figure 3.2. Classification of instructions at the dispatch stage based on the availability of their operands, across the SPEC CPU2006 benchmarks (average shown): both operands ready (ready-at-dispatch, R@D, 20%), one operand ready (almost-ready-at-dispatch, AR@D, 22%), one operand ready and one coming from a load (load-tail, LDTail, 25%), and no operands ready (not-ready-at-dispatch, NR@D, 33%).

Observation 3: Load Tail (LDTail)
We classify instructions that are dependent on load instructions as load tails (LDTail), as a load is not a cheap instruction: an LDTail instruction has one operand ready while the other comes from a load, i.e., from a register that has not yet been written. LDTail instructions have a relatively longer lifetime compared to the rest of the cheap instructions since their pending operand is provided by a load instruction, and placing them in the IQ would occupy its entries for a long time. Based on this insight, we place LDTail instructions in a FIFO too. Load and store instructions themselves should always be dispatched to the IQ so they execute as early as possible to expose both MLP and ILP. In Figure 3.3, the load instruction (I2) produces an operand consumed by its dependents. I6 is considered expensive because it also has a source coming from another expensive instruction, and I7 is not included as it is likely to take longer since its non-load operand is not yet ready as well. Based on Figure 3.2, 25% of total instructions are LDTail.

Exploiting Instruction Readiness
To take advantage of the readiness of instructions at the dispatch stage, we logically divide instructions into two categories, cheap and expensive to schedule, respectively "no-OOO-benefit" and "OOO-benefit". Cheap and expensive instructions are shown in Figure 3.4 in green and red, respectively. Cheap-to-schedule instructions bypass the IQ and the rest are placed in the IQ. This results in a design with two units for holding the in-flight instructions: an IQ for OoO execution of expensive instructions and a FIFO queue for in-order issuing of cheap instructions, with arbiters at issue selecting among them. As a result, we replace the energy cost of inserting the cheap instructions into the IQ and removing them from it with the cost of a much simpler FIFO, reducing the


Figure 3.3. Instruction dependency graph showing R@D (green), AR@D (blue), and LDTail (red) instructions. The register rename table indicates that physical registers 1-3 have been written (inputs to the R@D and AR@D instructions) while physical register 5 (PR5) has not. I7 has PR5 and the result of I6 as input operands. Apart from PR5, I6 is pending as well, which is why I7 is not classified as LDTail. I6 is a load-dependent instruction, but it is not an LDTail since it has two pending operands.
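The dispatch-time classification illustrated by Figure 3.3 can be sketched as a toy classifier (a hypothetical model written for this summary; the producer-class encoding is an assumption, not the thesis's hardware): an instruction is R@D when no operand is pending, AR@D when its single pending operand comes from a cheap producer, LDTail when that operand comes from a load, and expensive (NR@D) otherwise.

```python
# Hypothetical dispatch-time classifier following the text's rules.
def classify(pending_producers):
    """pending_producers: producer class ('R@D'/'AR@D'/'load'/'NR@D') of each
    not-yet-ready operand; an empty list means all operands are ready."""
    if not pending_producers:
        return "R@D"
    if len(pending_producers) == 1:
        if pending_producers[0] in ("R@D", "AR@D"):
            return "AR@D"
        if pending_producers[0] == "load":
            return "LDTail"
    return "NR@D"                       # two pending operands, or an expensive producer

assert classify([]) == "R@D"                   # like I1: both operands ready
assert classify(["AR@D"]) == "AR@D"            # like I3: pending operand from a cheap producer
assert classify(["load"]) == "LDTail"          # pending operand comes straight from a load
assert classify(["load", "NR@D"]) == "NR@D"    # like I6: two pending operands -> expensive
```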

total energy. While R@D instructions can be sent directly to the functional units for execution, in practice a buffer is needed when all functional units are occupied. They can be placed in a FIFO queue since FIFOs are cheaper (than CAMs) to implement. Placing instructions in FIFO queues is not free, but it is significantly more energy efficient, as shown in Figure 3.1. The other instructions, which require out-of-order scheduling, are placed in the IQ. This essentially shapes the idea of our first solution to the problem of inefficient scheduling, called FIFOrder [4], a ready-aware instruction scheduling microarchitecture (see Figures 3.4 and 3.6).


Figure 3.4. Insight: categorizing instructions into cheap and expensive, respectively "no-OOO-benefit" and "OOO-benefit" (panels: instruction classification and instruction scheduling). Cheap instructions, i.e. R@D, AR@D and LDTail, bypass the IQ while the rest, the expensive ones, are placed in the IQ. Cheap instructions also will not be issued from the IQ.

3.3 Solution I: FIFOrder, Ready-Aware Instruction Scheduling
The goal of FIFOrder is to deliver the performance of an out-of-order processor with the energy budget of an in-order processor. Using the above instruction classification, we hope to be able to bypass the IQ for the majority of the instructions and offload them to cheaper FIFOs, which should allow us to reduce the IQ dimensions and its activity (reads and writes) and thereby save energy. Such a processor will have minimal out-of-orderness, since the majority of instructions bypass the IQ and both the IQ width and depth are reduced significantly. A minimal out-of-order processor in this context means an issue width of one, i.e., the IQ issues a single instruction per cycle. The baseline has an issue width of four; therefore, in the minimal out-of-order processor, three instructions will be issued from the FIFO while in total the processor still issues four instructions per cycle.

First Design: Single FIFO
In the first design for FIFOrder, R@D, AR@D and LDTail instructions are placed in a single FIFO while all other instructions (including loads and stores) are placed in the IQ. This results in the IQ being largely dedicated to memory instructions. In this implementation 66% of the instructions are issued from the FIFO. As a result, the issue width of the IQ can be reduced by three quarters, from 4 to 1. In the baseline processor, AR@D and LDTail instructions can potentially read their missing operand from the value forwarding path(s), but if this value is not yet available at issue time, these instructions need to wait for some time. In FIFOrder, this waiting time is spent in a FIFO instead of a CAM-based IQ. In FIFOrder, when an R@D instruction reaches the head of the FIFO, it is issued to the functional units upon their availability. R@D instructions read their operands from the register files after issue. When AR@D and LDTail instructions arrive at the head of the FIFO, if their pending operand is ready they read it and are then issued to the functional unit(s); otherwise, they wait until the operand becomes available. This waiting causes stalls, due to the in-order nature of the FIFO, which hurts performance.
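The head-of-queue stall of the single-FIFO design can be seen in a toy model (hypothetical names and a simplified readiness test, invented for this sketch): issue is strictly from the head, so one not-yet-ready AR@D/LDTail instruction blocks every ready instruction behind it.

```python
from collections import deque

# Toy single-FIFO issue: only the head may issue, up to the FIFO's bandwidth.
def issue_from_fifo(fifo, ready, bandwidth):
    issued = []
    while fifo and len(issued) < bandwidth and ready(fifo[0]):
        issued.append(fifo.popleft())   # strictly in-order: head first
    return issued

fifo = deque(["ld_tail", "r@d_1", "r@d_2"])
ready = lambda inst: inst.startswith("r@d")     # pretend the load result is still pending
assert issue_from_fifo(fifo, ready, 3) == []    # non-ready head stalls the whole queue
fifo[0] = "r@d_0"                               # once the pending operand arrives...
assert issue_from_fifo(fifo, ready, 3) == ["r@d_0", "r@d_1", "r@d_2"]
```

This is precisely the 15% performance loss mechanism: younger, ready R@D instructions sit behind a stalled head.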

Results and Limitations of the First Design
The fundamental bottleneck of this design is that, because of the single FIFO, AR@D and LDTail instructions are mixed with R@D instructions. This causes the FIFO to stall frequently when non-ready AR@D or LDTail instructions reach its head and block younger, ready instructions from issuing. This hurts performance by 15% compared to the baseline (see Paper IV [4] for more details). The breakdown of stalls for the single-FIFO design is shown in Figure 3.5: the FIFO was stalled (issued zero instructions, or fewer than its bandwidth of three) for 72%


of the execution cycles: 40% of cycles were due to AR@D and the remaining 32% due to LDTail. This suggests that keeping the AR@D and LDTail instructions out of the FIFO that holds R@D instructions would reduce R@D stalls.

Figure 3.5. The distribution of R@D FIFO stalls caused by AR@D and LDTail instructions. Placing the instruction classes in separate queues provides for more out-of-orderness between them and allows different classes of instructions to bypass each other when they are ready, thereby reducing stalls.

Second Design: Double FIFOs
To tackle the problem of R@D stalls caused by non-ready AR@D and LDTail instructions at the FIFO head, we add a second FIFO for AR@D and LDTail instructions. This leaves the first FIFO exclusively for R@D instructions, which will therefore never stall. As in the previous design, the maximum issue width is 1 for the OoO IQ and 3 across both FIFOs. When selecting between the FIFOs, higher priority is given to instructions from the R@D FIFO: they entered the dynamic window with their operands already available, and not giving them the highest priority would add cycles to their lifetime, which could hurt performance.

Results and Limitations of the Second Design
By eliminating the stalls that AR@D and LDTail instructions caused in R@D instruction execution, the dual-FIFO design outperforms the single-FIFO design and is even better than the baseline 4-wide OoO IQ design in a few benchmarks (see Paper IV [4] for more details). On average, the dual-FIFO design matches the baseline performance but does so with more energy-efficient scheduling, thanks to its 1-wide OoO IQ and two 3-wide FIFOs. From the energy point of view, the dual-FIFO design outperforms the baseline in terms of EDP by over 30% on average (see Figure 3.8). Despite the second FIFO, there are still many FIFO stalls, as seen in the middle bar of Figure 3.5. For this design, the majority of the stalls, 35% of all cycles, come from LDTail instructions blocking the second FIFO. To tackle this problem, we separate the LDTail instructions from the AR@D ones by placing them in a third FIFO.

Table 3.2. Microarchitectural parameters (based on Intel Nehalem [10])
Freq, ISA:                3.4 GHz, x86-64
L1i/d:                    32KiB, 8-way, 4clk
L2:                       256KiB, 8-way, 12clk
L3:                       1MiB, 8-way, 36clk
DRAM:                     200clk
Branch predictor:         two-level, front-end penalty 10clk
ROB/IQ/RF(Int,FP)/LQ/SQ:  128/56/(68,68)/48/36
FIFO queues:              32 entries, issue up to 3 from head
Technology/VDD/temp:      22nm itrs-hp / 0.8 / 360K

Table 3.3. FIFO and IQ configurations
Design      # FIFOs   IQ issue width   RF ports
Baseline    0         4                8
Design #1   1         1                8
Design #2   2         1                8
Design #3   3         1                8
FXA [24]    1         2                10

Final Design: Triple FIFO
The final design (Figure 3.6) provides three separate FIFOs: one for each of the instruction classes R@D, AR@D, and LDTail. This design is intended to prevent the different classes from stalling each other. All other instructions are placed in the 1-wide OoO IQ. For instruction dispatch, the highest priority is given to R@D instructions, then AR@D, and the lowest to LDTail. This order gives priority to the instructions that are most likely to be ready soon.


Figure 3.6. FIFOrder microarchitecture. Instructions are classified in the rename stage based on the operand ready bits in the Rename Map Table. In the dispatch stage, they are steered to the IQ or the FIFOs, depending on the instruction classification. The issue stage stores the instructions in either the FIFOs or the IQ and selects ready instructions across the queues for execution. The shaded and gray parts are the parts of the pipeline affected by FIFOrder. Functional units are shared between the IQ and the FIFOs to bring more flexibility at issue time.
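The triple-FIFO dispatch and priority rules can be sketched as follows. The queue names follow the text, but the selection logic is our simplified illustration: real hardware would also check operand readiness at the AR@D and LDTail FIFO heads before issuing.

```python
from collections import deque

# One FIFO per instruction class, plus the 1-wide OoO IQ for everything else.
queues = {"R@D": deque(), "AR@D": deque(), "LDTail": deque()}
iq = deque()

def dispatch(cls, tag):
    """Steer a renamed instruction to its class FIFO, or to the IQ."""
    queues.get(cls, iq).append(tag)

def issue(width=3):
    """Issue up to `width` instructions per cycle from the FIFO heads,
    honoring the priority R@D > AR@D > LDTail (the classes most likely
    to be ready first)."""
    issued = []
    for cls in ("R@D", "AR@D", "LDTail"):
        while queues[cls] and len(issued) < width:
            issued.append(queues[cls].popleft())
    return issued

for cls, tag in [("AR@D", "i1"), ("R@D", "i2"), ("LDTail", "i3"), ("Other", "i4")]:
    dispatch(cls, tag)
# issue() drains the R@D FIFO first: ['i2', 'i1', 'i3']; 'i4' sits in the IQ
```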

Figure 3.7. IPC comparison between the three designs of FIFOrder and a related work, FXA [24], normalized to the baseline (4-wide OoO), across the SPEC CPU2006 benchmarks. Our designs: 1-wide OoO plus 1, 2, or 3 FIFOs; FXA has a 2-wide OoO (see Tables 3.2 and 3.3 for more details).

Figure 3.8. Normalized performance per energy (IPC per energy of the dynamic instruction window) across the SPEC CPU2006 benchmarks. Both the baseline and FXA [24] spend more energy on issuing instructions due to their issue widths of 4 and 2, respectively, compared to 1 in our designs.

Results
The performance and performance-per-energy results of the triple-FIFO implementation are shown in the third set of bars in Figures 3.7 and 3.8. As a result of reducing FIFO stalls (Figure 3.5, right bar), this design outperforms the baseline out-of-order by 8% on average and is 55% higher in energy efficiency.

3.3.1 Instruction Criticality and Limits of Ready-Aware Instruction Scheduling
To the best of our knowledge, there are two ready-aware instruction scheduling techniques: FIFOrder and FXA (Front-End Execution Architecture) [24] (see Figure 1.1.C). Neither of these is aware of the instructions' effect on performance, i.e., instruction criticality. Before describing the limits of ready-aware instruction scheduling, we define instruction criticality.

Instruction criticality
Critical instructions are those whose delayed execution will hurt performance. These instructions often lead up to or generate memory-level parallelism (MLP) [23]. Conversely, non-critical instructions can be delayed to a significant degree without penalty. Non-critical instructions typically depend on long-latency operations and often spend a long time in the IQ before becoming ready, thereby increasing IQ pressure. The long lifetime of non-critical instructions presents the opportunity to delay their placement in the IQ and thereby reduce IQ pressure. Reducing the number of non-critical instructions in the IQ also naturally prioritizes the critical ones. This thesis borrows the definition of criticality from Long-Term Parking [23] but provides a more efficient way of scheduling these instructions since, for the first time, their overlap with readiness is detected and considered in the scheduling.

3.4 Insight II: Overlap Between Readiness and Criticality
We define readiness [4] as having both operands available before dispatch, and criticality as contribution to MLP [23, 8]. As these properties are largely independent, they lead to four categories of instructions that should be handled differently.
Critical & Ready: As these instructions contribute directly to MLP, they are critical for performance and should not be delayed. However, as they are also ready for execution, they do not benefit from the expensive wake-up and select mechanism of the IQ. Therefore, these instructions can bypass the IQ and be issued directly to the functional units for execution.
Critical & Non-Ready: Due to their criticality, these instructions should not be delayed. However, as they do not have all their operands available, they

37                              # #& " & !   &        &  !& .20* ! $ ' (.* (.* "' .20*   .20*!!!   * "   */ "#!   (.*#" #  (.*"&" ! " *! . ! #& *" $  *-/&-/ *" $  *!  !    !&*!1,  %    !&*   #"# )   &

cannot be issued for execution immediately. Therefore, they should be allocated IQ entries so that they can take advantage of the IQ's expensive out-of-order wake-up and select mechanism and be selected for execution as soon as they become ready.
Non-Critical & Ready: As these instructions are ready for execution, they do not require IQ allocation and can be issued directly for execution. However, since they are non-critical, executing them early does not improve performance; on the contrary, eager execution might hurt performance if they take slots from critical instructions. Therefore, these instructions can both bypass IQ placement and have their execution delayed.
Non-Critical & Non-Ready: These instructions are neither performance-critical nor ready for execution. Placing them in the IQ would occupy entries until their operands become available, potentially at the cost of more critical instructions. On the other hand, delaying their scheduling will not harm performance, as they are not performance-critical. These instructions should, therefore, be delayed.

Figure 3.9. Distribution of the instruction classification in an out-of-order processor.
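The four-way policy above can be summarized in a small decision function. The action strings are our own shorthand for the handling the text describes:

```python
def schedule_action(critical: bool, ready: bool) -> str:
    """Map an instruction's (criticality, readiness) to a scheduling action."""
    if critical and ready:
        return "bypass IQ, issue directly"       # no wake-up/select needed
    if critical and not ready:
        return "allocate IQ entry"               # needs OoO wake-up/select
    if not critical and ready:
        return "bypass IQ, delay execution"      # don't steal critical slots
    return "delay, re-check readiness later"     # may become ready meanwhile

print(schedule_action(True, False))   # allocate IQ entry
```

Only one of the four categories ends up paying for an IQ entry; the other three are candidates for bypassing, delaying, or both.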

Limits of FIFOrder [4]
FIFOrder can reduce the IQ issue width but is very sensitive to the IQ depth, for two reasons. First, fewer loads and stores are placed in the IQ, which means lower MLP. Second, address-generating instructions are blocked in the FIFOs due to the in-order nature of these queues: instructions that are critical to performance cannot issue until they reach the head of their FIFO. Figure 3.10 shows the effect of reducing the IQ size on the performance of FIFOrder. Based on this figure, halving the IQ depth in FIFOrder reduces performance by 20 percentage points; however, if FIFOrder were aware of the criticality of instructions and could issue instructions from anywhere in the FIFOs, the performance degradation would have been only 8%.


Figure 3.10. Effect of reducing the IQ size on the performance of FIFOrder. In the oracle version, critical instructions can be issued as soon as they are ready, regardless of their position in the FIFO. Although such a design is impossible to implement, it shows the potential of combining readiness and criticality.

Performance-critical instructions are distributed over the FIFOs in FIFOrder, as shown in Figure 3.11. Based on this figure, critical instructions are delayed in the AR@D and LDTail queues, which affects performance. In conclusion, a ready-aware scheduler alone is not effective in reducing both the IQ depth and width; a combination with criticality awareness is required.

Limits of FXA [24]
FXA (see Figure 1.1.C) consists of two pipelines: the IXU (a 3-stage in-order pipeline with execution units) and the OXU (the out-of-order execution unit). The IXU is embedded between the rename and dispatch stages of the out-of-order pipeline. In FXA, all instructions pass through the IXU in order, and it directly executes those whose operands become available within three cycles. This includes the R@D instructions, plus another 4% of total instructions.


Figure 3.11. Overlap between criticality and readiness in FIFOrder. Many of the critical instructions are placed in FIFO queues and are therefore delayed. Delaying them, due to the in-order nature of a FIFO, degrades performance and makes FIFOrder more sensitive to a smaller IQ size.

FXA does not consider instruction criticality. Ignoring criticality has performance and energy implications: many non-critical instructions (15% of total instructions) still pass through the filter and occupy space in the IQ. The insight is that we can delay placing non-critical instructions into the IQ to reduce IQ capacity pressure without hurting performance. By delaying insertion into the IQ (OXU), non-critical instructions are given a further opportunity to become ready, which turns them from candidates for delayed IQ placement into IQ bypass candidates, which are more efficient. Figure 3.12 shows that such a delay would significantly increase the number of Non-Critical & Non-Ready instructions that become ready, if FXA were aware of instruction criticality (see more discussion of this opportunity in Paper V [3]).

Criticality-Aware Instruction Scheduling
Ready-only-aware instruction scheduling ignores the criticality of instructions and misses some efficiency opportunities. However, a criticality-only-aware scheduler likewise ignores readiness. Long-Term Parking (LTP) [23] (see Figure 1.1.B) is one of the state-of-the-art criticality-aware microarchitectures. LTP classifies instructions in the decode stage based on their contribution to MLP. The critical instructions (the MLP-generating ones) are placed in the IQ directly, but non-critical instructions are placed in a separate "long-term parking" FIFO, which delays their insertion into the IQ. As Figure 3.9 shows, non-critical (Non-Critical & Ready + Non-Critical & Non-Ready) instructions constitute about 22% of the dynamic instruction stream. LTP delays the IQ placement of non-critical instructions by "parking" them in an energy-efficient FIFO queue (see Figure 3.13B). LTP can leverage the reduced IQ pressure to reduce the IQ depth by half with a minimal performance penalty [23]. Non-critical instructions are eventually inserted into the IQ when they reach the head of the parking FIFO, which would otherwise stall.
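LTP-style parking as described above can be sketched in a few lines. The structure names and the single-instruction unpark step are our own simplification, not the mechanism's actual implementation:

```python
from collections import deque

iq, parking = deque(), deque()

def ltp_dispatch(tag, critical):
    """Critical instructions go straight to the IQ; non-critical ones are
    parked in a cheap FIFO, delaying their IQ insertion."""
    (iq if critical else parking).append(tag)

def unpark():
    """Insert the head of the parking FIFO into the IQ (so the FIFO head
    does not block indefinitely)."""
    if parking:
        iq.append(parking.popleft())

ltp_dispatch("ld1", critical=True)    # MLP-generating: IQ directly
ltp_dispatch("add1", critical=False)  # parked first
unpark()
# iq now holds ld1 and add1; add1's IQ entry was merely allocated later
```

The point of the delay is that add1 occupies a cheap FIFO slot, rather than an expensive IQ entry, during the time it would otherwise have spent waiting in the IQ.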

Limits of LTP [23]
LTP places Critical & Ready instructions in the IQ, even though they could bypass it. Critical & Non-Ready instructions that become ready while waiting in the parking queue are also re-inserted into the IQ, when they too could bypass it. The ready instructions come from three sources: Critical & Ready and Non-Critical & Ready instructions account for almost 21% of total instructions based on Figure 3.9, and the third source, delayed instructions that become ready while parked (see Figure 3.12), accounts for another 11% of total instructions. Altogether, LTP could bypass IQ placement for about 32% of total instructions if it were aware of ready instructions.1

1We emphasize that LTP defined "ready" to mean instructions that do not depend on other long-latency instructions. Our definition is stricter in that ready instructions can be executed immediately, and they are easier to detect in hardware.

Figure 3.12. Readiness of non-critical instructions: before issue (blue), after the 3 cycles of the FXA IXU, and after being delayed. On average, 3% of instructions are Non-Critical & Ready and can bypass the IQ. 4% are Non-Critical & Non-Ready that become ready within 3 cycles, and 11% are Non-Critical & Non-Ready that become ready after being delayed. Combined, 18% of instructions become ready and could therefore bypass the IQ.

3.5 Potential of Combining Readiness and Criticality
Leveraging both criticality and readiness has the potential to increase the number of instructions that can be offloaded from the IQ without hurting performance. This is because only Critical & Non-Ready instructions require the sophisticated scheduling of the IQ, whereas the remaining instructions can either be delayed or bypassed. By taking advantage of both characteristics, only 61% of instructions (the Critical & Non-Ready ones) require immediate IQ allocation (Figure 3.9), compared to 78% (Critical & Ready + Critical & Non-Ready) when considering only criticality and 80% (Critical & Non-Ready + Non-Critical & Non-Ready) when considering only readiness. This drastic increase in offloadable instructions can be leveraged to reduce the IQ depth, which lowers the per-access and total energy consumption of the IQ and therefore of the scheduler. In addition, as ready instructions can be issued directly (bypassing the IQ) in parallel with instructions from the IQ, the IQ issue width itself can also be reduced. Reducing both dimensions of the IQ provides a significant power reduction, as shown in Figure 3.1.

Combining FXA and LTP
There are two naive approaches for adding readiness or criticality to existing criticality- or readiness-aware schedulers: add IQ bypassing of ready instructions to LTP's parking of non-critical ones (LTP+Bypass), or add delaying of non-critical instructions to FXA's filtering of ready ones (FXA+Delay, or, as we call it in this thesis, FXA+LTP). These approaches significantly improve the energy efficiency of the baseline designs by reducing the IQ activity, from

41   "+ &  "+ ( & ( "+ & ( "+ ( &  "+ &  ( "+ & 

       )"!$ *

          %#" %#" %#"  

    %#"

!" #"!  !" #"!  !" #"! %#" !" #"!  ! """ ' & '  " ' &  & !!'     #  "  # "  # " "

Figure 3.13. Instruction placement based on readiness/criticality for the four designs. Ready instructions are shown with solid lines and non-ready ones with dashed lines. Non-ready instructions may become ready after a delay and are handled differently across the designs. Note that instructions can only be issued from the head of the FIFO queues (dark gray areas in LTP, CR, and DLQ), while any instruction in the IQs/OXU can be issued.

74% to 44% of the baseline for LTP and from 53% to 46% of the baseline for FXA, as shown in Figure 3.14 (performance is discussed in Section ??). LTP+Bypass needs support for detecting R@D instructions as well as instructions that become ready after their delay. FXA+LTP (FXA+Delay), on the other hand, naturally executes R@D and ready-after-delay instructions in the IXU. However, this design has significant shortcomings, which are covered in the next section.

Limitations and Potentials of FXA + LTP
• Non-ready instructions in the IXU: In the FXA+LTP design, the 61% of instructions that are Critical & Non-Ready pass through the IXU, while fewer than 1% become ready during its 3 stages. (Overall, only 4% of non-ready instructions become ready during those 3 cycles; see Figure 3.12.) To address this, Critical & Non-Ready instructions should also bypass the IXU and be placed directly in the IQ.
• Ready instructions placed in the IQ: To minimize area overhead, the IXU has no floating-point units. As a result, all ready floating-point instructions must be placed in the IQ before they can be executed, increasing both capacity and port pressure in the IQ. The IXU is also inefficient at executing memory instructions, which means that many that are ready spend the energy of going through the IXU without being able to execute: the IXU only executes load/store instructions according to the outcome of the arbitration of the shared caches and memory ports between the IXU and the OXU (the out-of-order backend, i.e., the IQ). Memory instructions thus first try to execute in the IXU, and those not executed there are placed in the OXU, from which they can issue once the memory ports are not busy with the IXU.
• Area: FXA has separate functional units for the in-order IXU and the out-of-order OXU, which leads to underutilization, as many instructions are unable to execute in the IXU.
• Pipeline depth: The IXU increases the pipeline depth by three cycles, which potentially increases the branch misprediction penalty and delays the execution of instructions that require a functional unit present only in the OXU.
• Register file port pressure: FXA requires that operands be read for instruction execution in the IXU and then again for execution in the OXU if the IXU fails to execute them.

Figure 3.14. Instruction scheduling energy reductions for naïve extensions to LTP and FXA.

3.6 Solution II: DNB, Ready- and Criticality-Aware Instruction Scheduling
To address the aforementioned inefficiencies of the FXA+LTP design, we propose the Delay and Bypass (DNB) design. DNB bypasses the IQ for Critical & Ready instructions (reducing both IQ capacity and port pressure, as well as IQ reads/writes), delays Non-Critical & Non-Ready and Non-Critical & Ready instructions by placing them in a FIFO (reducing IQ capacity pressure), and bypasses the IQ for delayed instructions that are ready when they reach the head of the delay FIFO (reducing IQ reads/writes). In DNB, only Critical & Non-Ready instructions are placed in the expensive IQ, as these are the ones that are critical to performance as soon as they become ready. (See Figure 3.13-D and a comparison across all designs in Figure 3.15.)
Figure 3.16 provides an overview of the DNB architecture. DNB adds a FIFO queue for delaying instructions (the Delay Queue, DLQ), a FIFO queue for decoupling the back-end execution of Critical & Ready instructions from front-end fetch (the Critical-Ready Queue, CRQ), and a Critical Instruction Table (CIT) for Iterative Backwards Dependency Analysis (IBDA) [30] to determine instruction criticality. The components common to both LTP and DNB are shown in light gray and the additions in white.

Fetch, Decode, and Rename: The DNB front-end determines instruction readiness and criticality. As in LTP, after instruction fetch, the CIT is accessed with the PC to check whether the instruction is critical. In rename, operand availability is used to detect whether the instruction is ready. This incurs no overhead, as the register-file check is part of renaming.

Detecting Critical Instructions: As with LTP, DNB uses Iterative Backward Dependency Analysis (IBDA) [30] to iteratively identify chains of instructions that lead to MLP-generating instructions. In addition, we mark all loads as critical to avoid delaying them and hurting performance.

Instruction Dispatch: DNB places Critical & Non-Ready instructions directly into the IQ, for execution as soon as they are ready, and places all non-critical instructions into the DLQ to be delayed. Critical & Ready instructions bypass the IQ for immediate execution via the CRQ. The CRQ is needed because ready instructions may not be able to execute immediately due to structural hazards (lack of available functional units) and/or instruction priorities (newer ready instructions should not be prioritized over older ready instructions). The CRQ decouples the bypassing of ready instructions from the front-end to the back-end: it allows fetch to continue even if a ready instruction cannot be executed immediately, and it is much cheaper than inserting these instructions into the CAM-based IQ.
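The DNB dispatch rule reduces to a three-way routing decision. A minimal sketch, with the queue names following the text but the routing code being our own illustration:

```python
from collections import deque

iq, crq, dlq = deque(), deque(), deque()

def dnb_dispatch(tag, critical, ready):
    """Route one instruction per the DNB rule: only Critical & Non-Ready
    pay for the expensive IQ; Critical & Ready bypass via the CRQ; all
    non-critical instructions are delayed in the DLQ."""
    if critical and not ready:
        iq.append(tag)
    elif critical:          # critical and ready
        crq.append(tag)
    else:                   # non-critical, ready or not
        dlq.append(tag)

for tag, c, r in [("i1", True, True), ("i2", True, False),
                  ("i3", False, True), ("i4", False, False)]:
    dnb_dispatch(tag, c, r)
# iq: ['i2']; crq: ['i1']; dlq: ['i3', 'i4']
```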

Design  Technique        Aware of      Performance  Energy
OoO     (baseline)       —             100%         100%
LTP     delay (park)     criticality   91%          74%
FXA     filter (bypass)  readiness     89%          53%
DNB     delay & bypass   both          95%          34%

Figure 3.15. Comparison of the four designs. Limitations that increase energy: FXA is only able to handle non-ready instructions that become ready within 3 cycles (6% in total) and filters even non-ready instructions, costing energy and adding 3 cycles of delay; LTP inserts all parked instructions into both the LTP parking FIFO and the IQ; DNB inserts Non-Critical & Non-Ready instructions that do not become ready after their delay into both the delay queue and the IQ. (R: Ready, C: Critical, NR: Non-Ready, NC: Non-Critical).

Instruction Issue: The DNB back-end can select instructions to issue from the IQ, CRQ, and/or DLQ. The total issue width remains the same as in the out-of-order baseline but is distributed across the three sources: up to two instructions from the IQ and a combination of up to two instructions from the CRQ and DLQ, for a total issue width of four. All queues, including the IQ, apply an age-based issue policy; note that the CRQ and DLQ are inherently age-based due to their FIFO nature. Among the three queues, the DLQ has the highest priority, as its instructions are generally the oldest and might cause CPU stalls if they reach the head of the ROB without having issued. The IQ has the second priority, as it contains memory instructions. The CRQ has the lowest priority, as all its instructions are ready to execute and have a very short lifetime in this queue unless there are insufficient functional units available.
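The issue-width partitioning and priority order can be sketched as follows. Readiness checks at the queue heads are elided, and the exact arbitration here is our simplification of the policy the text describes:

```python
from collections import deque

def dnb_issue(dlq, iq, crq):
    """Select up to four instructions per cycle: at most two from the
    CRQ/DLQ pair (DLQ first, as its instructions are generally the
    oldest) and at most two from the IQ."""
    issued = []
    for _ in range(2):                 # CRQ+DLQ share an issue width of 2
        if dlq:
            issued.append(dlq.popleft())
        elif crq:
            issued.append(crq.popleft())
    issued += [iq.popleft() for _ in range(min(2, len(iq)))]
    return issued

picked = dnb_issue(deque(["d1"]), deque(["q1", "q2", "q3"]), deque(["c1"]))
# picked == ['d1', 'c1', 'q1', 'q2']; 'q3' waits for the next cycle
```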

  " 

#  !    $          !  !           #   ! 

 "    ! # CRQ , DLQ  !  Placement in  

 "!   ROB, IQ and LSQ,    ! "!  # #!      !                  

Figure 3.16. DNB microarchitecture. The Critical Instruction Table (CIT) uses IBDA to identify critical instructions, while the Rename Map Table is used to identify ready ones. Instructions are placed in the appropriate scheduling structure based on their readiness and criticality.

Evaluation
• Performance: Figure 3.17 shows that reducing the IQ size of our baseline out-of-order processor from 64 to 32 entries reduces performance to 84% (BaselineHalf), while the readiness/criticality-aware designs fare much better, at 91% (LTP), 89% (FXA), and 95% (DNB) of the baseline performance. DNB delivers 4 and 6 percentage points better performance than LTP and FXA, respectively, by offloading 32% of the instructions, vs. 22% and 24% for LTP and FXA. This is quite close to the maximum potential of offloading 39% of instructions, given that 61% are Critical & Non-Ready and should go directly into the IQ.
• Energy: The energy-saving potential of these designs comes from two sources: reducing the energy cost of each IQ access by making the IQ itself smaller and narrower, and avoiding inserting instructions into the IQ via bypassing. The three designs have different abilities to achieve these, as outlined in Figure 3.15.

  "!#  "!#  "!! "!!  ( ' & % $ # "

 %## !        -1/ ! $ ' (- (- "' -1/   -1/!!!   "   . "#!   (-#" #  (-"&" ! " ! - ! #& " $  ,.&,. " $  !  ! &    !&!0+  %    !&   #"# ) 

Figure 3.17. IPC comparison between all designs, normalized to the baseline.


Figure 3.18. Instruction scheduling energy reductions. (Not shown: the naïve designs achieve 44%/46%; see Figure 3.14.)

3.7 Conclusion
We explored how to reduce the energy cost of out-of-order instruction scheduling while maintaining performance. We observed that delaying the IQ placement of instructions helps reduce the IQ depth, while bypassing the IQ is effective in reducing its width. Our final approach is to both reduce the width and depth of the IQ and avoid inserting instructions into the IQ that do not benefit from its expensive scheduling. To accomplish this, we apply two complementary approaches to scheduling instructions: delaying, to reduce IQ pressure and allow only important instructions to access the expensive IQ, and bypassing, to reduce IQ pressure and avoid the cost of accessing the IQ altogether. However, to maintain performance while reducing the IQ width and depth, the right instructions must be delayed and/or bypassed. To achieve this, we classify instructions based on their criticality and readiness: ready instructions neither need nor benefit from the IQ, and so should bypass it, while non-critical instructions are likely to block the IQ for a long time with little benefit, and so should be delayed to make space for more important ones. We further observe that delayed instructions may become ready during their delay, providing yet another opportunity for bypassing.
While these classes have been used separately to reduce scheduling costs previously, we demonstrate that combining them leads to significantly better performance, lower energy, and reduced sensitivity to IQ depth and width.

4. High-performance resource deallocation: early release and out-of-order commit

Dynamically-scheduled superscalar processors execute instructions out of program order but commit them in order, to present to the programmer the illusion that instructions execute atomically and sequentially as intended by the program. In this context, precise interrupts are also easily provided, as the processor verifies correct execution before each instruction is committed [26]. The disadvantage of in-order commit is that an instruction ties up the out-of-order resources (ROB, LSQ, and physical registers) for much longer than is necessary for correct execution: it can only release them once it is the oldest instruction, i.e., has reached the head of the ROB. This means that execution halts when any of these resources is fully occupied. Out-of-order commit (OOC) is a solution to this hurdle. Figure 4.1 shows this with an example. The commit stage of an in-order-commit processor is blocked by an unresolved instruction at the head of the ROB. A four-wide superscalar, shown in Figure 4.1-a, tries to commit four instructions per cycle. While the first instruction can commit, the second-oldest instruction causes the commit stage to block, and instead of four, only one instruction commits (in order). Figure 4.1-b, however, shows that OOC can bypass the blocking instruction(s) and commit two more instructions out of order.
In-order commit experiences many stalls, which can be avoided by OOC thanks to better utilization of the commit stage. Figure 4.2 shows the distribution of the number of committed instructions per cycle for three different microarchitectures, comparing in-order and out-of-order commit across all SPEC CPU2006 benchmarks. The three microarchitectures were selected to see where OOC is more effective. Intel Haswell (HSW) is a 4-way superscalar and is supposed to commit 4 instructions per cycle; however, based on this figure, this happens less than 20% of the time on average over the SPEC CPU2006 benchmarks.
We see a large number of commit stalls (zero instructions committed per cycle). With out-of-order commit, the distribution shifts toward four instructions per cycle, reflecting the improved commit performance (and the resulting improvement in overall performance). Considering the size of the out-of-order resources, this figure shows that for smaller, less aggressive cores (with fewer entries in the resources, such as Intel Silvermont (SLM); see Table 4.1), OOC improves performance significantly. The incentive for pursuing OOC lies in the promise of higher performance with fewer resources. A turning point in our understanding of out-of-order

Figure 4.1. Conceptual comparison of in-order and out-of-order commit (commit depth=4). C: Ready to commit, X: Not ready to commit, E: Executing, I: Issued, O: Empty entry.

commit came with the work of Bell and Lipasti [5], who articulated the limiting factors for committing an instruction. The necessary conditions to allow an instruction to be committed range from the completion status of the instruction itself to the branch prediction and exception state of intervening instructions. Several proposals for out-of-order commit, implicitly or explicitly, abide by these conditions, providing cost-efficient means to enforce them.

Table 4.1. Microarchitecture configuration with reorder buffer (ROB), instruction queue (IQ), load and store queues (LQ/SQ) and integer and floating-point register file (RF) details. Dispatch width (D), commit width (CW) and out-of-order commit depth (CD) are the same across configurations to enable a fair comparison (SLM hardware has a D/CW of 2). Register values are additional physical registers above architectural state.

Microarchitecture       D/CW/CD   ROB   IQ   LQ/SQ   RF(INT,FP)
Silvermont-like (SLM)   4/4/8      32   32   10/16    32,32
Nehalem-like (NHM)      4/4/8     128   56   48/36    68,68
Haswell-like (HSW)      4/4/8     192   60   72/42   130,130


Figure 4.2. Commit bandwidth distribution for the SPEC CPU2006 benchmarks of a 4-wide core with in-order commit and out-of-order commit respecting all commit conditions (Safe_OOC). Out-of-order commit increases commit pressure even without aggressive speculation.

4.1 Out-of-order commit conditions
The necessary conditions to allow an instruction to be committed out-of-order are:
1. The instruction is completed. Instructions can commit only after their completion.
2. The instruction is not involved in memory replay traps. This condition simply says that we cannot commit speculative loads or their dependent instructions. It relates to unresolved memory dependencies or memory consistency enforcement. For example, total store order (TSO) requires a replay of speculative loads that violate load→load ordering when this reordering is detected by other cores.
3. Register WAR hazards are resolved (i.e., a write to a particular architectural register cannot be permitted to commit before all prior reads of that register have completed).
4. Previous branches are successfully predicted. This condition simply says that we can commit only while on the correct path of execution.
5. No prior instruction in program order is going to raise an exception. This condition provides precise interrupts and is essential in easing the handling of, e.g., page faults.
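The five conditions above can be summarized as a single eligibility predicate. The sketch below is illustrative only: the `Instr` fields and the `window` list are hypothetical simplifications of real scheduler state (not the thesis's simulator), and only the store→load sub-case of condition 2 is modeled.

```python
# Illustrative sketch of the five out-of-order commit conditions.
# All names (Instr, window) are hypothetical, not from the thesis.
from dataclasses import dataclass, field

@dataclass
class Instr:
    completed: bool = False           # condition 1
    is_load: bool = False
    is_store: bool = False
    addr_resolved: bool = True        # stores: address known?
    is_branch: bool = False
    resolved: bool = True             # branches: outcome known?
    may_except: bool = False          # could still raise an exception
    reads: set = field(default_factory=set)   # architectural regs read
    writes: set = field(default_factory=set)  # architectural regs written

def can_commit_ooo(i: int, window: list) -> bool:
    """True iff window[i] may commit ahead of the older entries window[:i]."""
    instr = window[i]
    if not instr.completed:                                  # 1. completion
        return False
    for older in window[:i]:
        if instr.is_load and older.is_store and not older.addr_resolved:
            return False                                     # 2. memory replay traps
        if instr.writes & older.reads and not older.completed:
            return False                                     # 3. WAR hazards
        if older.is_branch and not older.resolved:
            return False                                     # 4. unresolved branches
        if older.may_except:
            return False                                     # 5. possible exceptions
    return True
```

The sections that follow relax each of these checks individually; each relaxation removes one `return False` branch at the cost of extra recovery support.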

4.2 Contribution I: Relaxing Out-of-Order Commit Conditions
Our first contribution is a study of how to relax the commit conditions. Relaxing or evading the out-of-order commit conditions potentially improves performance. Relaxing does not mean ignoring or avoiding; it means being less conservative than in-order commit while still providing correct execution.

Due to this importance, in this section the thesis examines the possibilities of relaxing each of the conditions, together with the relevant circumstances.
• Instruction complete. The core waits for an instruction to finish executing before commit can occur. We do not examine early commit of loads [12, 13] that miss in the cache; instead, we consider them available for commit only after the data returns and is bound to the destination register.
• Memory replay traps (safe_ST and safe_LD). We describe two sub-cases for this condition:
  – Store-Load (safe_ST): This condition applies to same-thread memory dependencies involving a store and a load. In particular, we cannot commit a load out-of-order in the presence of a prior store with an unresolved address: if the store and the load prove to be dependent (the load should have taken the value of the store), the commit would have been incorrect. The safe_ST condition disallows the commit of a load and its dependent instructions until all prior stores resolve their addresses and all memory dependencies are correctly enforced. By relaxing this condition, we can commit loads and their dependent instructions even if prior non-aliasing stores have unresolved addresses.
  – Load-Load (safe_LD): This concerns memory consistency models that enforce load→load ordering (e.g., Sequential Consistency or TSO). Under this ordering constraint, it is possible to allow loads out-of-order as long as this is not observed in the memory system. The safe_LD condition disallows the out-of-order commit of loads unless it is guaranteed that the correct order will be observed by the memory system. To relax this condition we allow load→load re-orderings that are not observed by other cores. A very specific case would be a memory-mapped IO (MMIO) request that might change the order of memory operations; the MMIO device effectively acts as another observer, as in a multi-processor system. We ignore memory requests from other cores (IO/Coprocessor).
• WAR hazards. WAR hazards are already handled by the out-of-order core within the ROB, and we assume a solution such as the Value Buffer [19] for committing out-of-order. Thus, we do not consider this condition further.
• Unresolved branches (safe_BR). This condition guarantees that we commit only from the correct path of execution: out-of-order commit should not proceed past unresolved branches until they are correctly resolved. We can relax this condition and commit past an unresolved branch if we can undo the commit. To evaluate the maximum performance potential, we assume a zero rollback cost for out-of-order commit mispredictions; however, the normal branch misprediction cost (10 cycles) is faithfully accounted for.

• Exceptions (safe_EXC). This condition caters to precise interrupts. Enforcing it requires that we do not commit past an instruction (floating-point, memory access, or any instruction that may cause an exception) unless we make sure that the instruction will not cause an exception. To relax the safe_EXC condition, we assume the code regions are exception-free.

4.3 Contribution II: Category/taxonomy of out-of-order commit
Our second contribution, covered in Paper I [1], is categorizing out-of-order commit based on aggressiveness and safety. Aggressiveness concerns how frequently the processor commits out-of-order: always (Aggressive) or only just before a commit stall (Reluctant). Safety concerns whether commit conditions are evaded: none evaded (Safe) or at least one evaded (Unsafe).
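As a minimal illustration, the two axes and the four resulting category names can be written down directly. The enum value strings paraphrase the definitions above; only the category names themselves come from the papers.

```python
# The taxonomy's two axes and four combined categories (illustrative).
from enum import Enum

class Safety(Enum):
    SAFE = "no commit condition evaded"
    UNSAFE = "at least one commit condition evaded"

class Aggressiveness(Enum):
    RELUCTANT = "commit out-of-order only just before a stall"
    AGGRESSIVE = "commit out-of-order every cycle"

def category(safety: Safety, mode: Aggressiveness) -> str:
    """Map a (safety, aggressiveness) pair to its category name."""
    prefix = "Safe" if safety is Safety.SAFE else "Unsafe"
    suffix = "ROOC" if mode is Aggressiveness.RELUCTANT else "AOOC"
    return f"{prefix}_{suffix}"
```

The four resulting combinations (Safe_ROOC, Safe_AOOC, Unsafe_ROOC, Unsafe_AOOC) are exactly the configurations evaluated in Section 4.4.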

4.3.1 Safe_OOC
In Safe_OOC, all out-of-order commit conditions are preserved. This case provides the minimum potential performance improvement of out-of-order commit, but also requires the minimum hardware to implement, as it does not rely on speculation and rollback beyond what is already available in the out-of-order core.

4.3.2 Unsafe_OOC
In Unsafe_OOC, one or more (or all) of the out-of-order commit conditions are evaded (apart from true dependencies). Doing so achieves the maximum potential performance improvement of out-of-order commit, but it may require extra support for speculation and rollback to revert changes to the architectural state that are found to be incorrect after commit. Aside from the limiting conditions described above, a separate dimension is the aggressiveness of committing out-of-order: essentially, how frequently OOC is engaged, always or only sometimes. Thus, concerning the mechanics of out-of-order commit, we distinguish two versions: Reluctant OOC (ROOC) and Aggressive OOC (AOOC).

4.3.3 Reluctant
In reluctant out-of-order commit (ROOC), the out-of-order commit mechanisms are engaged only when needed to avoid impending stalls.

Figure 4.3. Functionality of AOOC and ROOC. In this example AOOC commits one more instruction than ROOC (commit depth=4). C: Ready to commit, X: Not ready to commit, E: Executing, I: Issued, O: Empty entry.

In other words, we engage reluctant out-of-order commit only when one of the critical resources (ROB entries, registers, load-store queue entries) is all but exhausted and cannot support the fetching of new instructions in the front end of the pipeline. As such, reluctant out-of-order commit acts as a safety valve that releases the pressure on resources (just before this pressure reaches a critical point), rather than aggressively trying to keep resource pressure low. ROOC could therefore lead to more efficient hardware implementations. In more detail, it aims to find the minimum number of instructions that need to commit (out-of-order) so that no resource is exhausted and the front end can continue to issue instructions at its peak bandwidth. The reason for seeking the minimum number of instructions to commit out-of-order is that this minimizes the perturbation of instruction order. Figure 4.3 compares ROOC with AOOC functionality. While Figure 4.3 only considers the ROB, in the general case all out-of-order resources are treated in the same way. The goal here is to have four empty entries for the next cycle(s). In Figure 4.3-a, the first instruction is committed in-order, which frees one entry for the next cycle. The ROB already contains two free entries. To provide another free entry, only a single instruction has to be committed out-of-order by ROOC.
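The safety-valve behavior above can be sketched as a small calculation: commit out-of-order only the minimum number of instructions needed so that the front end can dispatch at full width next cycle. The function and parameter names are made up for illustration.

```python
# ROOC "safety valve" (illustrative sketch): find the minimum number of
# out-of-order commits needed to keep the front end dispatching.

def rooc_commits_needed(free_entries: int, inorder_committable: int,
                        dispatch_width: int) -> int:
    """Entries freed in-order plus already-free entries may not cover the
    next cycle's dispatch; commit just enough extra instructions OOO."""
    free_after_inorder = free_entries + inorder_committable
    return max(0, dispatch_width - free_after_inorder)
```

In the Figure 4.3-a scenario (two free ROB entries, one in-order commit, a four-wide front end), this yields a single out-of-order commit, matching the example in the text.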

4.3.4 Aggressive
In aggressive out-of-order commit (AOOC), the out-of-order commit mechanisms are continuously active, looking for opportunities to commit instructions as early as possible. AOOC attempts to find up to commit-width instructions so that it can commit at the highest possible bandwidth. In our example in Figure 4.3-b, it aims to make four free entries for the next cycle's instructions. AOOC commits three instructions, of which two are out-of-order. In contrast to ROOC, aggressive out-of-order commit releases resources more eagerly, but disregards the following issues:
• It might prove wasteful, as traditional in-order commit may still be able to provide sufficient resources for forward progress;
• It may be futile, as the chance of encountering an instruction that restricts further commit (e.g., an unresolved branch) tends to increase with aggressiveness (see Section 4.3.5 on the depth of out-of-order commit);
• It creates a significant management problem, as out-of-order commit can create more gaps (compared to ROOC) in several structures, including the ROB as well as the load queue and store queue (which is not completely addressed in prior works [19, 20, 5]).

4.3.5 Commit Width and Depth
In addition to the number of instructions that can be committed each cycle (commit width), OOC must define how many instructions it scans (commit depth) to find committable instructions. While the commit width is the number of instructions that can be committed simultaneously per cycle, the commit depth is a measure of how far (from the oldest to the youngest instruction) the core can scan looking for instructions to commit out-of-order in a given cycle.
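Width and depth act as two independent cut-offs on the same scan. A hypothetical sketch, where `ready` stands in for the full set of commit conditions:

```python
# Illustrative depth- and width-limited commit scan from the ROB head.

def scan_and_commit(rob: list, width: int, depth: int, ready) -> list:
    """Examine at most `depth` entries from the head (oldest first) and
    commit at most `width` of those that satisfy `ready`."""
    committed = []
    for instr in rob[:depth]:        # commit depth: how far we may look
        if len(committed) == width:  # commit width: per-cycle limit
            break
        if ready(instr):
            committed.append(instr)
    return committed
```

With a ROB such as `["C", "X", "C", "C", "X", "C"]` (C = ready to commit, as in Figure 4.1), a width of 4 and depth of 4 commits three instructions, while a depth of 2 commits only the head entry.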

4.4 Performance evaluation
We introduced aggressiveness (aggressive and reluctant) and safety (safe and unsafe) for out-of-order commit; combining them yields four categories. In this section, we first evaluate the overall effect of each category and then explore the impact of each of the commit conditions separately.

Simulation setup
To see how relaxing OOC affects performance, we simulate small, medium, and large cores, similar to Intel's Silvermont (SLM), Nehalem (NHM), and Haswell (HSW) respectively (see Table 4.1 for details). As an overview, Figure 4.4 shows the performance improvement for each microarchitecture assuming Safe_OOC (all conditions respected), averaged over all benchmarks. We can see that with a narrow commit depth (four in this figure), a small out-of-order processor (SLM) has more potential for relative improvement than the medium and large, aggressive microarchitectures. In medium and large aggressive cores, thanks to a larger ROB and other hardware resources, more intrinsic ILP is extracted by traditional in-order commit, leaving less potential for out-of-order commit with a narrow commit depth. The reason is that


Figure 4.4. Performance comparison (harmonic mean of IPCs across SPEC CPU2006) of in-order commit and two types of safe out-of-order commit, reluctant (ROOC) and aggressive (AOOC), with a commit depth of 4 and of ROB size, for increasingly aggressive microarchitectures. These experiments respect all traditional commit conditions, and show that aggressive out-of-order commit can reach the performance of the next class of processor. SLM: Intel Silvermont, NHM: Intel Nehalem, HSW: Intel Haswell.

the smaller processor (with a smaller dynamic window size and, given a balanced design, smaller hardware structures) will more likely stall, as it exposes a smaller amount of the potential ILP in an application. For a larger commit depth, the more aggressive cores (NHM and HSW) have higher potential performance improvement: out-of-order commit frees the processor from the traditional limits, reducing the number of times the processor experiences exhausted resources.

4.4.1 Performance evaluation based on the taxonomy
The minimum and maximum performance improvements are provided by Safe_OOC and Unsafe_OOC, respectively. In Figure 4.5, we show the improvement provided by both safe and unsafe out-of-order commit, for both aggressive and reluctant modes, for all three microarchitectures.

• Safe_AOOC. When Safe_AOOC (safe and aggressive) is evaluated, the size of the processor is the key factor for achieving performance. Based on Figure 4.5a, the processors with more entries in their out-of-order resources, in this case NHM and HSW, benefit more from AOOC.

• Safe_ROOC. For Safe_ROOC (safe and reluctant), the performance improvement is lower for all three microarchitectures compared to AOOC, yet the trend is the same: the bigger the out-of-order resources, the more effective ROOC is.

Unsafe_OOC
By relaxing all conditions, Unsafe_OOC provides the maximum potential for performance improvement. To understand the effectiveness of all conditions together, we assume zero recovery cost for any mis-speculated out-of-order commit condition. In real implementations, Unsafe_OOC will require recovery mechanisms for these techniques, which can reduce the performance potential because of recovery costs.

• Unsafe_AOOC. This technique provides the highest performance improvement for all three architectures. As before, the bigger the out-of-order scheduler, the more benefit Unsafe_AOOC provides.

• Unsafe_ROOC. Reluctant out-of-order commit provides less performance improvement because it is not continuously looking to commit additional instructions. Among the three microarchitectures, the limited SLM benefits the most from ROOC because of the large number of stalls seen by this core. Therefore ROOC, especially Unsafe_ROOC, is an interesting methodology for improving the performance of relatively small but energy-efficient CPUs, as we see a relatively high performance improvement for a less aggressive commit implementation.

4.4.2 Performance evaluation based on the OOC conditions
In the previous section, by analyzing safe and unsafe out-of-order commit, we observed a large gap between the performance improvements of these two implementations. Understanding the cause of this gap, by looking at individual commit conditions in isolation, allows us to better understand where to focus future hardware efforts.


Figure 4.5. IPC improvement of safe and unsafe out-of-order commit relative to in-order commit as a baseline, for both reluctant and aggressive versions, applied to the SPEC CPU2006 benchmarks on three microarchitectures.

Positive Contribution of Out-of-Order Commit Conditions
To study the gap between safe and unsafe out-of-order commit (Figure 4.5), we analyze the effect of relaxing each condition in the presence of the other, preserved conditions in Figure 4.6.¹ We analyze the SLM microarchitecture in detail and provide averages across all microarchitectures for both AOOC and ROOC. Each out-of-order commit condition is analyzed in isolation, and we consider Unsafe_OOC (all conditions relaxed) as the 100% potential improvement budget. In the case of the mcf benchmark in Figure 4.5a, the safe and unsafe OOC performance improvements are 33% and 71%, respectively (46% of the potential improvement budget is provided by Safe_OOC). We also observe that by relaxing the LD condition (unsafe_LD), 52% of the potential improvement budget is achievable (see Figure 4.6a). In Figure 4.6, we can see for some applications (like namd in AOOC mode and leslie3d in ROOC mode) that relaxing just a single condition is not sufficient to fill the gap between safe and unsafe OOC. This does not mean that a single condition is unimportant, but rather that other, preserved conditions are preventing out-of-order commit from achieving its full potential.
• AOOC. We observe that for most of the applications, Unsafe_BR and Unsafe_LD are the conditions that affect performance the most (Figure 4.6a). Additionally, the more aggressive the core, the more important the Unsafe_LD condition becomes. In the SLM, NHM and HSW CPUs, Unsafe_LD fills 4%, 10%, and 12%, and Unsafe_BR fills 9%, 8% and 7%, respectively, of the gap between safe and unsafe OOC.
Unsafe_ST is not very effective because of the rarity of this condition and because the conservative memory dependence predictor used does not provide flexibility in issuing memory instructions out-of-order, let alone committing them out-of-order. Unsafe_EXC, i.e., relaxing exceptions, is not very effective either, because exceptions are very rare, especially in the integer benchmarks.
• ROOC. ROOC is less effective in reducing the gap between safe and unsafe OOC because of the nature of this on-demand OOC mode, which is engaged and needed more in SLM and much less in NHM and HSW (see Figure 4.5b). Indeed, if there are enough empty entries in the resources, ROOC does not engage, regardless of the safe or unsafe mode of instruction commit.

¹ See Paper II for the negative effect of out-of-order commit conditions.


Figure 4.6. Contribution of safe and selectively unsafe out-of-order commit on three different microarchitectures. Unsafe_XX is equivalent to activating (enforcing) all out-of-order commit conditions except XX (the XX condition is relaxed). By relaxing the specific XX condition, the dependence between the other conditions is also observed.

4.5 Out-of-order Commit and Memory Level Parallelism (MLP)
Overlapping cache misses to service them in parallel, in particular long-latency accesses to DRAM, can deliver significant performance benefits [9]. This memory level parallelism (MLP) is typically achieved through the use of multiple Miss Status Holding Registers (MSHRs) [17], which track outstanding memory requests and allow them to execute in parallel. In this section, we compare in-order commit and out-of-order commit in terms of memory parallelism (both to DRAM (MLP) and within the memory hierarchy (MHP) [8]) by changing the number of L1 MSHRs and observing the effect on performance. To explore these effects, we select three applications, shown in Figure 4.7, that are highly memory-bound (mcf), moderately memory-bound (lbm), and largely not memory-bound (gcc) [16]. One key observation is that out-of-order commit, in both reluctant and aggressive modes, is significantly better than in-order commit at exploiting intrinsic application memory parallelism. Figure 4.7 shows that the gap between in-order and out-of-order commit is much larger in the case of HSW, which means that the more aggressive the microarchitecture, the more MLP is exposed by out-of-order instruction commit. Overall, out-of-order commit outperforms in-order commit by exposing additional memory parallelism (both to DRAM and in the hierarchy). In summary, out-of-order commit provides more MLP benefit for more aggressive microarchitectures and more memory-bound applications.
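A back-of-the-envelope model shows why the MSHR count matters: with n MSHRs, at most n long-latency misses can overlap, so the misses are serviced in ceil(misses/MSHRs) rounds. The function name and the numbers below are invented for illustration and do not come from the thesis's simulations.

```python
# Toy MLP model (illustrative): MSHRs bound how many misses overlap.
import math

def miss_service_cycles(n_misses: int, n_mshrs: int, dram_latency: int) -> int:
    """Idealized cycles to service n_misses DRAM accesses when at most
    n_mshrs may be outstanding simultaneously (no queueing modeled)."""
    rounds = math.ceil(n_misses / n_mshrs)
    return rounds * dram_latency
```

For example, 8 misses at a (made-up) 200-cycle DRAM latency take 1600 cycles with one MSHR but only 400 with four, provided the core's window is large enough to expose the misses in the first place, which is exactly where out-of-order commit helps.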

4.6 Early release vs. out-of-order commit
Register renaming avoids false dependencies through the register file, thereby increasing ILP. It does this by translating architectural registers to a larger set of physical registers. This renaming is done early in the pipeline, and the physical registers are not released until the commit stage. This release may happen long after the last consumer reads the data. As a result, physical register entries may be kept alive (occupied) for longer than the dependencies alone require, and new instructions might not be able to rename, which causes front-end stalls. To address this resource constraint, techniques such as delayed allocation and Early Release of the Physical Register (ERPR) [21] have been developed. In this section, we compare ERPR with the OOC taxonomy [2], since both are solutions for releasing resources earlier. OOC and ERPR are similar in that they release physical registers as early as possible. Aggressive OOC outperforms ERPR in general because it releases ROB and LSQ entries earlier, as well as physical registers. However, neither ERPR nor OOC can release IQ

Table 4.2. The contribution of an exhausted (full) register file (RF) and reorder buffer (ROB) to CPU stalls. Other reasons for CPU stalls can be the instruction queue (IQ), the load-store queue (LSQ), instruction cache misses, or not having a free functional unit to allocate to the ready-to-dispatch instructions.

Microarchitecture   RF% (min/avg/max)   ROB% (min/avg/max)   Other% (min/avg/max)
SLM                 6 / 59 / 86         13 / 36 / 92         0 / 5 / 27
NHM                 2 / 68 / 91          8 / 30 / 98         0 / 2 / 7
HSW                 2 / 69 / 91          8 / 29 / 98         0 / 2 / 7

entries earlier than usual, because both of them only address the processor state after the issue stage of the pipeline, when the instructions have left the IQ.
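The contrast between conventional release and ERPR-style early release can be sketched as two predicates over a physical register's state. This is a deliberately simplified model with invented field names; real early release needs recovery support such as the Value Buffer mentioned earlier.

```python
# Illustrative contrast: when may a physical register be freed?
from dataclasses import dataclass

@dataclass
class PhysReg:
    pending_readers: int               # consumers that have not yet read the value
    redefiner_committed: bool = False  # has the overwriting instruction committed?

def release_conventional(r: PhysReg) -> bool:
    # Freed only when the instruction that redefines the architectural
    # register commits -- possibly long after the last consumer read it.
    return r.redefiner_committed

def release_early(r: PhysReg) -> bool:
    # ERPR-style: freed as soon as no consumer can still read the value,
    # assuming rollback support (e.g., a value buffer) exists.
    return r.pending_readers == 0
```

In this model a register with no pending readers is releasable early but not conventionally, which is exactly the occupancy gap that both ERPR and out-of-order commit attack; neither predicate involves the IQ, matching the observation above.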

Comparing AOOC, ROOC and ERPR based on the Out-of-Order Commit Conditions Analysis
The overall trend in Figure 4.8 shows that AOOC outperforms ROOC and ERPR in all configurations. This is not surprising, as AOOC is always enabled, while ROOC is enabled only when a resource is exhausted. ERPR is essentially a subset of AOOC that only focuses on the register file (RF). Therefore, the higher the program's sensitivity to the size of the RF, the higher the effect of ERPR. To better understand this behavior, we analyzed the reasons for CPU stalls across the different microarchitectures (Table 4.2). The table shows that increasing the effective size of the RF (e.g., what ERPR does conceptually) is less effective in SLM than in NHM and HSW. Based on this table, in SLM the CPU stalls are more often due to other resources, such as the ROB size. As a result, since in the SLM architecture the size of the RF contributes less to CPU stalls, ROOC outperforms ERPR in terms of performance improvement: ROOC can virtually extend the effective size of the ROB and LSQ as well as the RF, whereas ERPR only does this for the RF. The RF in the NHM and HSW architectures contributes more to CPU stalls than in the SLM architecture; therefore, ERPR outperforms ROOC in these two architectures.

A more detailed analysis of Figure 4.8, specifically based on commit depth, can be found in Paper III; Figure 4.9, however, uses this analysis to rank the relative importance of each out-of-order commit condition for the three architectures. Overall, relaxing the load→load condition is the most promising for performance improvement. This information provides a guideline for which areas to work on to achieve the best performance improvement, depending on the baseline out-of-order execution.

4.7 Conclusion
To obtain higher performance, extending the reach of the processor core has been a focus of much microarchitecture research. One promising direction is out-of-order commit, which releases precious processor resources early to allow the processor to extend its reach past typical hardware limits. In this thesis, we presented a limit study for out-of-order commit through the introduction of reluctant and aggressive out-of-order commit modes. We show how smaller processors, even with a limited commit scan depth, can benefit from out-of-order commit strategies, but that larger, aggressive cores require deeper commit scan depths to achieve improved performance. In addition, we provide a detailed breakdown of the contributions of each out-of-order commit condition for the SPEC CPU2006 benchmark suite, and compare against similar works such as early release of physical registers. Our results show a very high potential for performance improvement, above 2.25x for some benchmarks, and we believe that out-of-order commit strategies can play an important role in future energy-efficient and high-performance processor designs.

Figure 4.7. Memory hierarchy parallelism comparison between in-order commit and safe and unsafe out-of-order commit for MSHR sizes of 1 to 10. The results are normalized to in-order commit with an MSHR size of one. Sub-graphs (a), (b) and (c) are based on the harmonic mean across SPEC CPU2006. The mcf, lbm and gcc benchmarks are representative of the high, medium and low memory-boundedness categories, respectively.

Figure 4.8. Comparison between AOOC, ROOC and ERPR for different commit depths and microarchitecture setups in six different modes. Unsafe_XX is equivalent to activating (enforcing) all out-of-order commit conditions except condition XX. By relaxing the specific XX condition, the dependence between the other conditions is also observed.

Figure 4.9. Ranking of the benefits of the out-of-order commit conditions for different microarchitectures, based on the depth of commit.

5. Summary

In this thesis, we tackled the pipeline stalls of an out-of-order processor. For the front-end, we proposed techniques to delay and skip inserting instructions into the IQ. We did this by characterizing instructions based on their readiness and criticality. As a result, we reduced the capacity (depth) and issue (width) pressure without affecting performance, which enabled us to significantly improve overall energy efficiency. For the back-end, we explored the factors and conditions limiting the out-of-order release of pipeline resources at the commit stage. This provided insight into when and how these conditions should be evaded or preserved, resulting in a ranking of the conditions based on their effect on performance, in addition to showing how out-of-order commit reduces the occupancy pressure on the out-of-order resources.

Paper I: A Category for Out-of-Order Commit
Problem: When to commit out-of-order.
Solution: Introducing a category for out-of-order commit.
Results: The bigger the out-of-order scheduler size, the more aggressive the out-of-order commit needs to be to achieve higher performance.

Paper II: Exploiting the Potential of Out-of-order Commit Conditions
Problem: What is the potential of out-of-order commit?
Solution: Analyzing the performance effect of evading out-of-order commit conditions.
Results: Memory and branch conditions are the most effective conditions.

Paper III: Out-of-order Commit, MLP and Early-Release of Resources
Problem: Limits of out-of-order commit.
Solution: Categorizing the applications based on MLP sensitivity and comparing different techniques to release the resources earlier.
Results: Out-of-order commit is in synergy with MLP. Out-of-order commit outperforms early release of resources.

Paper IV: FIFOrder, Ready-Aware Instruction Scheduling
Problem: Inefficient scheduling.
Solution: Some instructions do not need out-of-order scheduling and can therefore bypass the IQ.
Results: The IQ is bypassed by the majority of instructions. IQ width is reduced without hurting performance.

Paper V: DNB, Ready- and Criticality-Aware Instruction Scheduling
Problem: Inefficient scheduling; ready-aware scheduling is not effective in reducing IQ depth.
Solution: Categorizing instructions based on readiness and criticality so they can bypass the IQ.
Results: Not only IQ width but also IQ depth is reduced without hurting performance.

6. Svensk Sammanfattning (Swedish Summary)

Oordnad exekvering är en av de främsta mikroarkitekturella teknikerna som används för att förbättra prestandan hos både enkel- och flertrådade proces- sorer. Applikationen av den här typen av processorer sträcker sig från mobil- till serverprocessorer. Oordnad exekvering når högre prestanda genom att hitta fristående instruktioner och gömmer exekveringslatens genom att använda cykler som annars skulle vara bortkastade eller orsaka en CPU-fördröjning. För att uppnå detta använder oordnad exekvering schemaläggningsresurser för att lagra och prioritera instruktioner, inklusive ROB, IQ, LSQ och fysiska reg- ister. En typisk pipeline hos en oordnad processor har tre makrosteg: framsi- dan, schemaläggaren och baksidan. Framsidan hämtar instruktioner, placerar dem i oordningsresurserna och analyserar dem för att förbereda deras exekver- ing. Schemaläggaren identifierar vilka instruktioner som är redo för exekver- ing och prioriterar dem för schemaläggning. Baksidan uppdaterar processor- tillståndet med resultatet från de äldsta avklarade instruktionerna, avallokerar resurserna och slutför instruktionerna i programordning för att bibehålla kor- rekt exekvering. Oordnad exekvering måste ha möjligheten att fritt välja bland tillgängliga instruktioner att exekvera. Schemaläggningsresurserna måste därför ha kom- plexa kretsar för att identifiera och prioritera instruktioner, vilket gör dem väldigt dyra och därför begränsade. På grund av deras kostnad är schemaläg- gningsresurserna begränsade i storlek. Denna begränsade storlek leder till två fördröjningspunkter, en i framsidan och en i baksidan av pipelinen. Framsidan kan fördröjas på grund av att resurser är fullt allokerade och att inga fler in- struktioner som följd kan läggas i schemaläggaren. Baksidan kan fördröjas på grund av att exekveringen av en instruktion längst fram i ROB:n är oavslutad, vilket förhindrar andra resurser från att bli avallokerade som i sin tur gör att nya instruktioner inte kan läggas in. 
To address these sources of stalls, this thesis focuses on reducing the time instructions occupy the scheduling resources. Our front-end technique targets the pressure on the IQ, while our back-end solution focuses on the remaining resources. To reduce front-end stalls, we relieve the pressure on the IQ for both storing (depth) and issuing (width) instructions by steering instructions directly to cheaper storage structures. To reduce back-end stalls, we explore how instructions can be completed earlier, and out of order, to relieve pressure on the out-of-order resources other than the IQ.
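The front-end idea of steering instructions to cheaper storage can be sketched as follows (a hypothetical illustration in the spirit of ready-aware scheduling; the class, method names, and sizes are invented for this example): an instruction whose source operands are already available needs none of the IQ's wakeup logic and can go to a cheap in-order FIFO, leaving the small, expensive IQ for instructions that must wait.

```python
from collections import deque

class Scheduler:
    """Hypothetical sketch of ready-aware instruction steering.
    Ready instructions bypass to a cheap in-order FIFO; only
    instructions waiting on an operand occupy the small, expensive
    out-of-order issue queue (IQ)."""

    def __init__(self, iq_size=2):
        self.iq = []             # expensive CAM-based issue queue
        self.fifo = deque()      # cheap in-order bypass structure
        self.iq_size = iq_size
        self.ready_regs = set()  # registers whose values are available

    def dispatch(self, srcs):
        """Steer one instruction, given its source registers."""
        if all(s in self.ready_regs for s in srcs):
            self.fifo.append(srcs)   # ready now: issue in FIFO order
            return "fifo"
        if len(self.iq) < self.iq_size:
            self.iq.append(srcs)     # must wait: needs IQ wakeup/select
            return "iq"
        return "stall"               # IQ full: the front-end stalls

s = Scheduler(iq_size=1)
s.ready_regs = {"r1", "r2"}
print(s.dispatch(("r1", "r2")))  # both operands ready -> cheap FIFO
print(s.dispatch(("r9",)))       # waits for r9        -> expensive IQ
print(s.dispatch(("r8",)))       # IQ already full     -> front-end stall
```

In this sketch the IQ fills only with instructions that genuinely need wakeup logic, so a much smaller IQ sustains the same dispatch stream, which is the depth and width pressure reduction described above.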

7. Acknowledgements

During my PhD, I worked on out-of-order processors, trying to make them in-order without hurting performance. It seems, however, that this work has not been influential enough, because you are (probably) reading this thesis out of order: you skipped many chapters and came here to read this acknowledgment. Anyway, it has been a long journey, and now it is time to thank the people who made this thesis possible.

First and foremost, I would like to express my sincere gratitude to my advisors, Prof. Stefanos Kaxiras and Prof. David Black-Schaffer, for their continuous support of my PhD study and related research, and for their patience, motivation, and immense knowledge. Your guidance helped me throughout the research and the writing of this thesis. I have been very lucky to work with you two.

Stefanos, I will miss both your excitement and your reluctance about a research topic. The excitement in your first email replies, while I was looking for a PhD position in Sweden, convinced me to come to Sweden even though I had a few other admissions in the US and Canada. You are the one who believed in me and let me pursue all the crazy ideas and challenging projects, especially at the beginning of my PhD. What I will always remember about you is "we can" and "always give credit to the people who have looked at it before".

David, I learned a lot from you, and not only about research. The most important lesson was how to accept failures. I will miss the weekly meetings, the discussions, the planning. I appreciate you for always being available and supportive. I could not have imagined having a better advisor, and especially a better mentor, for my PhD study. What I will always remember about you is "be committed", "be dedicated to the job", "be on time", and "apologize whenever it is needed, ASAP, even if you are the boss".

I would like to thank Prof. Erik Hagersten for creating such a nice group and environment.
Also, thank you for making it easy to get hired into UART by asking exactly the right questions during my interview, even though they were the hardest ones among all the interviews I had back then. I would like to thank my co-advisors, Trevor Carlson, Rakesh Kumar, and Magnus Själander. It has been a pleasure working with you. Trevor, it is amazing how quickly you can come up with an answer to a research question. Rakesh, you work very professionally; the way you look into the "whys" of a research problem is impressive. Magnus, your critical view on research topics, even on papers published at top-ranked conferences, is impressive. It has been a pleasure working with such a modest person and having you as a friend! I have great memories of the trips with you to Austria and Canada.

Next, I would like to thank the entire UART group: Alberto, Alexandra, Andreas Sembrant, Nikos, Vasileios, Mahdad, Moncef, Konstantinos, German, Ricardo, Johan, Kim, Greg, Mihail, Andra, Hassan, Marina, Per, Chang-Hyun, and Yuan. It has been an awesome experience being among you guys. Thank you, Ricardo, for the fun technical discussions; German, for the despacito chats; Per, for helping me with the Swedish translation of my thesis abstract; Hassan, for all the confusions you have; and Marina, for being a very nice office-mate. Chris, many thanks for being a trustworthy friend. It has been a pleasure working with you as a co-author. You know so much, and you are so generous in sharing it. Johan and Kim, my peers from the beginning of my PhD, thank you for always being supportive and for all the good memories. Johan, you have been a very trustworthy friend, a true one, a hardware geek. Many thanks for always understanding me and being there when I needed to talk. Kim, I am not sure what to write about you. A friend? A sister? A buddy? You have been my closest friend during my PhD in Uppsala, not only when you used to be "Kim-Anh" but also when you became "Kim-Mom". Thank you for being very supportive and for helping me not only through the PhD journey but also with getting integrated into a new society, especially in the beginning. Escape rooms, friendly chats, and running over the hills gave me unforgettable memories. Thank you, genius. Many thanks to my non-UART friends: Saleh, Malihe, Alireza, Amin, Aala, Marcus, Reza, and Niloo. Emilia, you appeared in my life during a particular period of time; who knows, perhaps for a reason. I was achieving many things while I was losing a very important person, my mother. Since then you have been very supportive and always available. You have also been a great gym partner. Special thanks to my best friend since childhood, Mohammad Sattari.
I am so proud of my youth because of my sports career and its achievements, and I owe all of them to you, since you took me to the track for the first time. What you have done for me and my family is invaluable. Thank you for making fun of me when I deserve it, and for loving me when I don't. Thank you for staying constant in a world full of change, and for keeping some normalcy and modesty in a world full of chaos.

Furthermore, I would like to thank my family: my two older brothers and my parents. Thank you, Dad, for making it possible for me to continue my professional sports career and go to university while you had to work hard to support us. Thank you, Mom, for being the best mother and a true friend. You sacrificed yourself for us; you loved your family. You are the one who taught me the importance of love in life, how to never give up, how to accept the challenges in life, and to always work hard for the ones we love. You were waiting for this moment, for my PhD defense, and unfortunately you are not among us anymore (RIP), but I will always love you, Madar jan.

All in all, many thanks to the people above and to all those who have helped but have not been mentioned by name.

References

[1] M. Alipour, T. E. Carlson, and S. Kaxiras. A taxonomy of out-of-order instruction commit. In 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 135–136, 2017.
[2] Mehdi Alipour, Trevor E. Carlson, and Stefanos Kaxiras. Exploring the performance limits of out-of-order commit. In Proceedings of the Computing Frontiers Conference, CF '17, pages 211–220, 2017.
[3] Mehdi Alipour, Rakesh Kumar, Stefanos Kaxiras, and David Black-Schaffer. Delay and bypass: Ready and criticality aware instruction scheduling in out-of-order processors. In 26th IEEE International Symposium on High Performance Computer Architecture, HPCA 2020, San Diego, CA, USA, February 22-26, 2020, pages 558–569, 2020.
[4] Mehdi Alipour, Rakesh Kumar, Stefanos Kaxiras, and David Black-Schaffer. FIFOrder microarchitecture: Ready-aware instruction scheduling for OoO processors. In Proceedings of the 25th International Symposium on Design, Automation and Test in Europe, DATE '19, pages 710–715, 2019.
[5] G. B. Bell and M. H. Lipasti. Deconstructing commit. In Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS '04, pages 68–77, 2004.
[6] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–7, August 2011.
[7] Ramon Canal and Antonio González. A low-complexity issue logic. In Proceedings of the 14th International Conference on Supercomputing, ICS '00, pages 327–335, 2000.
[8] Trevor E. Carlson, Wim Heirman, Osman Allam, Stefanos Kaxiras, and Lieven Eeckhout. The load slice core microarchitecture. In Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA '15, pages 272–284, 2015.
[9] Yuan Chou, Brian Fahs, and Santosh Abraham. Microarchitecture optimizations for exploiting memory-level parallelism. In Proceedings of the 31st Annual International Symposium on Computer Architecture, ISCA '04, pages 76–, Washington, DC, USA, 2004. IEEE Computer Society.
[10] Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual. http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html, June 2016.
[11] Michael K. Gowan, Larry L. Biro, and Daniel B. Jackson. Power considerations in the design of the Alpha 21264 microprocessor. In Proceedings of the 35th Annual Design Automation Conference, DAC '98, pages 726–731, 1998.

[12] L. Gwennap. Digital leads the pack with 21164. Microprocessor Report, 8(12):249–260, September 1994.
[13] T. Ham, J. L. Aragón, and M. Martonosi. DeSC: Decoupled supply-compute communication management for heterogeneous architectures. In MICRO, pages 191–203, 2015.
[14] John L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34(4):1–17, September 2006.
[15] Kuo-Su Hsiao and Chung-Ho Chen. An efficient wakeup design for energy reduction in high-performance superscalar processors. In Proceedings of the 2nd Conference on Computing Frontiers, CF '05, pages 353–360, 2005.
[16] A. Jaleel. Memory characterization of workloads using instrumentation driven simulation. http://www.glue.umd.edu/ajaleel/workload, 2010.
[17] David Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th Annual Symposium on Computer Architecture, ISCA '81, pages 81–87, 1981.
[18] Srilatha Manne, Artur Klauser, and Dirk Grunwald. Pipeline gating: Speculation control for energy reduction. In Proceedings of the 25th Annual International Symposium on Computer Architecture, ISCA '98, pages 132–141, 1998.
[19] S. Marti, J. Borras, P. Rodriguez, R. Tena, and J. Marin. A complexity-effective out-of-order retirement microarchitecture. IEEE Transactions on Computers, 58(12):1626–1639, 2009.
[20] J. F. Martinez, J. Renau, M. C. Huang, and M. Prvulovic. Cherry: Checkpointed early resource recycling in out-of-order microprocessors. In MICRO, pages 3–14, 2002.
[21] Teresa Monreal, Victor Vinals, Jose Gonzalez, Antonio Gonzalez, and Mateo Valero. Late allocation and early release of physical registers. IEEE Trans. Comput., 53(10):1244–1259, October 2004.
[22] Subbarao Palacharla, Norman P. Jouppi, and J. E. Smith. Complexity-effective superscalar processors. SIGARCH Comput. Archit. News, 25(2):206–218, May 1997.
[23] Andreas Sembrant, Trevor Carlson, Erik Hagersten, David Black-Schaffer, Arthur Perais, André Seznec, and Pierre Michaud. Long Term Parking (LTP): Criticality-aware resource allocation in OoO processors. In Proceedings of the 48th International Symposium on Microarchitecture, MICRO-48, pages 334–346, 2015.
[24] Ryota Shioya, Masahiro Goshima, and Hideki Ando. A front-end execution architecture for high energy efficiency. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-47, pages 419–431, 2014.
[25] J. E. Smith and A. R. Pleszkun. Implementing precise interrupts in pipelined processors. IEEE Transactions on Computers, 37(5):562–573, 1988.
[27] G. S. Sohi and S. Vajapeyam. Instruction issue logic for high-performance, interruptable pipelined processors. In ISCA, pages 27–34, 1987.
[28] Henry Wong, Vaughn Betz, and Jonathan Rose. Microarchitecture and circuits for a 200 MHz out-of-order soft processor memory system. ACM Trans. Reconfigurable Technol. Syst., 10(1):7:1–7:22, December 2016.
[29] Henry Wong, Vaughn Betz, and Jonathan Rose. High-performance instruction scheduling circuits for superscalar out-of-order soft processors. ACM Trans. Reconfigurable Technol. Syst., 11(1):1:1–1:22, January 2018.
[30] Craig B. Zilles and Gurindar S. Sohi. Understanding the backward slices of performance degrading instructions. In Proceedings of the 27th Annual International Symposium on Computer Architecture, ISCA '00, pages 172–181, 2000.


Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1902
Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-403675 2020