The Multiscalar Architecture

THE MULTISCALAR ARCHITECTURE by MANOJ FRANKLIN A thesis submitted in partial ful®llment of the requirements for the degree of DOCTOR OF PHILOSOPHY (Computer Sciences) at the UNIVERSITY OF WISCONSIN Ð MADISON 1993 THE MULTISCALAR ARCHITECTURE Manoj Franklin Under the supervision of Associate Professor Gurindar S. Sohi at the University of Wisconsin-Madison ABSTRACT The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits ®ne-grain parallelism by executing multiple, possibly (control and/or data) depen- dent tasks in parallel using multiple processing elements. Splitting the instruction stream at statically determined boundaries allows the compiler to pass substantial information about the tasks to the hardware. The processing paradigm can be viewed as extensions of the superscalar and multiprocess- ing paradigms, and shares a number of properties of the sequential processing model and the data¯ow processing model. The multiscalar paradigm is easily realizable, and we describe an implementation of the multiscalar paradigm, called the multiscalar processor. The central idea here is to connect multiple sequential processors, in a decoupled and decentralized manner, to achieve overall multiple issue. The multiscalar processor supports speculative execution, allows arbitrary dynamic code motion (facilitated by an ef®cient hardware memory disambiguation mechanism), exploits communication localities, and does all of these with hardware that is fairly straightforward to build. Other desirable aspects of the implementation include decentralization of the critical resources, absence of wide associative searches, and absence of wide interconnection/data paths. i The multiscalar processor design has novel features such as the Address Resolution Buffer (ARB), which is a powerful hardware mechanism for enforcing memory dependencies. The ARB allows arbitrary dynamic motion of memory reference instructions by providing an ef®cient way of detecting memory hazards, and instructing the multiscalar hardware to restart portions of the dynamic instruction stream when hazards are violated. We also present the results of a detailed simulation study showing the performance improve- ments offered by the multiscalar processor. These preliminary results are very encouraging. The multiscalar paradigm seems to be an ideal candidate to reach our goal of 10+ instructions per cycle, and has tremendous potential to be the paradigm of choice for a circa 2000 (and beyond) ILP processor. ii Acknowledgements First of all, I offer praise and thanks with a grateful heart to my Lord JESUS CHRIST, to whom this thesis is dedicated, for HIS in®nite love and divine guidance all through my life. Everything that I am and will ever be will be because of HIM. It was HE who bestowed me with the ability to do this piece of research. HE led and inspired me during the past six years over which I learned many spiritual matters, without which all these achievements are meaningless, and a mere chasing after the wind. So, to God be the praise, glory, and honor, for ever and ever. I am deeply indebted to my PhD advisor, Professor Gurindar S. Sohi, for the constant guidance he provided in the course of this research. Guri shared with me many of his ideas, and gave extremely insightful advice aimed at making me aim higher and higher in doing research. His ability to identify signi®cant ideas, and suppress insigni®cant ones was invaluable. I feel honored and privileged to have worked with him for this thesis. I am very grateful to my MS advisor, Professor Kewal K. Saluja, for his encouragement and advice throughout my graduate studies, and for the opportunity to do research with him in many areas. I am deeply indebted to Professor James R. Goodman for his help and encouragement in steer- ing me into Computer Sciences. I would like to thank Professor Mark Hill for his insightful com- ments on this thesis. I am very grateful to Professor Wilson Pearson for providing all facilities at Clemson University, where I can continue my career. Members of my family, each in his/her own way, have taught me and loved me far beyond what I could ever repay. I would like to acknowledge my father Dr. G. Aruldhas and mother Mrs. Myrtle Grace Aruldhas, for their unfailing concern, care and support throughout my life. They have always encouraged me to strive for insight and excellence, while staying in the spiritual path. A heartfelt thanks to my sister, Dr. A. Ann Adeline, for her continuous love and support over the years. I am grateful to my uncle, Dr. G. Sundharadas, for his help and encouragement, especially in the beginning iii stages of my graduate studies, a time when it was most needed. I would like to acknowledge my friend Dr. B. Narendran, who kindled my interest in recreative mathematics and puzzles. Over the years, he has made many contributions to my papers on testing. I thank him for many helpful mathematical discussions and interesting puzzle solving sessions. On a different note, I thank my friend, Neena Dakua, for many stimulating philosophical discussions years back, which put me on a thinking path. I owe a special debt to her, D. Chidambara Krishnan, and my colleagues in Bangalore, for helping me to come to the USA for higher studies. I would like to acknowledge my friend Dr. Rajesh Mansharamani for taking care of all the logistics in connection with my thesis submission. Dionisios N. Pnevmatikatos took care of all my UNIX questions. I would like to acknowledge my close friends in India, R. Rajendra Kumar and D. Sanil Kumar, who have demonstrated the reality of ``friend in need'', by the warmth and affection of their sel¯ess love, which has accompanied me through the ups and downs of life. Rajendran's letters from far away have always been a constant source of encouragement. I would very much like to thank my brothers and sisters in Christ, for their prayers and care over the years in Madison. In particular, I would like to thank Chellapandi Subramanian, Adharyan T. Dayan, Dr. Suseela Rajkumar, Mathews Abraham, Shelin Mathews, Paul Lanser, Karin Lanser, Professor Wayne M. Becker, and the members of the Faith Community Bible Church and Indonesian Christian Fellowship. They have made my time in Madison meaningful and enjoyable. The daily prayer meetings with Chella will always stay in my mind. Finally, the arrival of my better half Dr. Bini Lizbeth George into my life inspired me to come out of the dissertator cocoon. If not for her, I would still be analyzing every nook and corner of the multiscalar architecture. She has shared every nuance of the last two years with me, and I express my deep gratitude to her for her love and affection. My research as a graduate student was mostly supported by IBM through an IBM Graduate Fel- lowship, and also by Professor Gurindar Sohi through his NSF grant (contract CCR-8706722). iv Table of Contents Abstract ........................................................................................................................................ i Acknowledgements ..................................................................................................................... iii Table of Contents ........................................................................................................................ v Chapter 1. Introduction ................................................................................................................ 1 1.1. Why ILP Processors? ............................................................................................ 2 1.2. Challenges .............................................................................................................. 3 1.3. Contributions of this Thesis ................................................................................... 4 1.4. Organization of the Thesis ..................................................................................... 6 Chapter 2. ILP Processing ........................................................................................................... 8 2.1. Program Execution Basics ..................................................................................... 8 2.1.1. Constraints on ILP .................................................................................. 9 2.1.2. Program Representations ........................................................................ 9 2.2. Program Speci®cation and Execution Models ....................................................... 10 2.2.1. Program Speci®cation Order .................................................................. 12 2.2.1.1. Control-Driven Speci®cation ................................................. 12 2.2.1.2. Data-Driven Speci®cation ...................................................... 13 2.2.2. Program Execution Order ....................................................................... 14 2.3. Fundamentals of ILP Processing ............................................................................ 16 2.3.1. Expressing ILP ....................................................................................... 18 2.3.2. Extracting ILP by Software .................................................................... 19 2.3.2.1. Establishing a Window of Operations .................................... 19 2.3.2.2. Determining and Minimizing Dependencies ......................... 21 2.3.2.3.

The Multiscalar Architecture

GPU Implementation Over IPTV Software Defined Networking

Computer Hardware Architecture Lecture 4

Superscalar Fall 2020

Trends in Processor Architecture

Intro Evolution of Superscalar Processor

Parallel Architectures MICHAEL J

Modern Processor Design: Fundamentals of Superscalar

Multithreading

A VHDL Model of a Superscalar Implementation of the DLX Instruction Set Architcture

Superscalar Processors and Parallel Computer Systems Objective

Memory Data Flow, VLIW and EPIC Processors

Pipelined Instruction Dispatch Unit in a Superscalar Processor