Speculative Execution in High Performance Computer Architectures CHAPMAN & HALL/CRC COMPUTER and INFORMATION SCIENCE SERIES Series Editor: Sartaj Sahni
Total Page:16
File Type:pdf, Size:1020Kb
CHAPMAN & HALL/CRC COMPUTER and INFORMATION SCIENCE SERIES Speculative Execution in High Performance Computer Architectures CHAPMAN & HALL/CRC COMPUTER and INFORMATION SCIENCE SERIES Series Editor: Sartaj Sahni PUBLISHED TITLES HANDBOOK OF SCHEDULING: ALGORITHMS, MODELS, AND PERFORMANCE ANALYSIS Joseph Y-T. Leung THE PRACTICAL HANDBOOK OF INTERNET COMPUTING Munindar P. Singh HANDBOOK OF DATA STRUCTURES AND APPLICATIONS Dinesh P. Mehta and Sartaj Sahni DISTRIBUTED SENSOR NETWORKS S. Sitharama Iyengar and Richard R. Brooks SPECULATIVE EXECUTION IN HIGH PERFORMANCE COMPUTER ARCHITECTURES David Kaeli and Pen-Chung Yew CHAPMAN & HALL/CRC COMPUTER and INFORMATION SCIENCE SERIES Speculative Execution in High Performance Computer Architectures Edited by David Kaeli Northeastern University Boston, MA and Pen-Chung Yew University of Minnesota Minneapolis, MN Boca Raton London New York Singapore C4479_Discl.fm Page 1 Wednesday, April 20, 2005 8:02 AM Published in 2005 by CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2005 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group No claim to original U.S. Government works Printed in the United States of America on acid-free paper 10987654321 International Standard Book Number-10: 1-58488-447-9 (Hardcover) International Standard Book Number-13: 978-1-58488-447-7 (Hardcover) Library of Congress Card Number 2005041310 This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Kaeli, David R. Speculative execution in high performance computer architectures / David Kaeli and Pen Yew. p. cm. -- (Chapman & Hall/CRC computer and information science series) Includes bibliographical references and index. ISBN 1-58488-447-9 (alk. paper) 1. Computer architecture. I. Yew, Pen-Chung, 1950- II. Title. III. Series. QA76.9.A73K32 2005 004'.35--dc22 2005041310 Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com Taylor & Francis Group and the CRC Press Web site at is the Academic Division of T&F Informa plc. http://www.crcpress.com About the Editors David Kaeli received his B.S. in Electrical Engineering from Rutgers Uni- versity, his M.S. in Computer Engineering from Syracuse University, and his Ph.D. in Electrical Engineering from Rutgers University. He is currently an Associate Professor on the faculty of the Department of Electrical and Com- puter Engineering at Northeastern University. Prior to 1993, he spent 12 years at IBM, the last 7 at IBM T.J. Watson Research in Yorktown Heights, N.Y. In 1996 he received the NSF CAREER Award. He currently direct- s the Northeastern University Computer Architecture Research Laboratory (NUCAR). Dr. Kaeli’s research interests include computer architecture and organization, compiler optimization, VLSI design, trace-driven simulation and workload characterization. He is an editor for the Journal of Instruction Level Parallelism,theIEEE Computer Architecture Letters, and a past editor for IEEE Transactions on Computers. HeisamemberoftheIEEEandACM. URL: www.ece.neu.edu/info/architecture/nucar.html Pen-Chung Yew is a professor of the Department of Computer Science and Engineering, University of Minnesota. Previously, he was an Associate Di- rector of the Center for Supercomputing Research and Development (CSRD) at the University of Illinois at Urbana-Champaign. From 1991 to 1992, he served as the Program Director of the Microelectronic Systems Architecture Program in the Division of Microelectronic Information Processing Systems at the National Science Foundation, Washington, D.C. Pen-Chung Yew is an IEEE Fellow. He served as the Editor-in-Chief of the IEEE Trans. on Parallel and Distributed Systems between 2002-2005. He has served on the program committee of major conferences. He also served as a co-chair of the 1990 In- ternational Conference on Parallel Processing, a general co-chair of the 1994 International Symposium on Computer Architecture, the program chair of the 1996 International Conference on Supercomputing, and a program co-chair of the 2002 International Conference on High Performance Computer Architec- ture. He has also served on the editorial boards of the IEEE Transactions on Parallel and Distributed Systems from 1992 to 1996, and Journal of Par- allel and Distributed Computing from 1989 to 1995. He received his Ph.D. from the University of Illinois at Urbana-Champaign, M.S. from University of Massachusetts at Amherst, and B.S. from National Taiwan University. v Acknowledgments Professors Kaeli and Yew would like to thank their students for their help and patience in the preparation of the chapters of this text. They would also like to thank their families for their support on this project. vii Contents 1 Introduction 1 David R. Kaeli1 and Pen-Chung Yew2 Northeastern University,1 Uni- versity of Minnesota2 2 Instruction Cache Prefetching 9 Glenn Reinman UCLA Computer Science Department 3 Branch Prediction 29 Philip G. Emma IBM T.J. Watson Research Laboratory 4TraceCaches 87 Eric Rotenberg North Carolina State University 5 Branch Predication 109 David August Princeton University 6 Multipath Execution 135 Augustus K. Uht University of Rhode Island 7 Data Cache Prefetching 161 Yan Solihin1 and Donald Yeung2 North Carolina State University;1 University of Maryland at College Park2 8 Address Prediction 187 Avi Mendelson Intel Mobil Micro-Processor Architect 9 Data Speculation 215 Yiannakis Sazeides,1 Pedro Marcuello,2 James E. Smith,3 and An- tonio Gonz´alez2,4 1University of Cyprus; 2Intel-UPC Barcelona Research Center; 3University of Wisconsin-Madison; 4Universitat Polit`ecnica de Catalunya 10 Instruction Precomputation 245 Joshua J. Yi,1 Resit Sendag,2 and David J. Lilja3 1Freescale Semi- conductor Inc.; 2University of Rhode Island; 3University of Minnesota at Twin Cities ix xContents 11 Profile-Based Speculation 269 Youfeng Wu and Jesse Fang Intel Microprocessor Technology Labs 12 Compilation and Speculation 301 Jin Lin, Wei-Chung Hsu, and Pen-Chung Yew University of Minnesota 13 Multithreading and Speculation 333 Pedro Marcuello, Jesus Sanchez, and Antonio Gonz´alez Intel-UPC Barcelona Research Center; Intel Labs; Universitat Politecnica de Catalun- ya; Barcelona (Spain) 14 Exploiting Load/Store Parallelism via Memory Dependence Prediction 355 Andreas Moshovos University of Toronto 15 Resource Flow Microarchitectures 393 David A. Morano,1 David R. Kaeli,1 and Augustus K. Uht2 1Northeastern University; 2University of Rhode Island Index 421 List of Tables 4.1Growthininstruction-levelparallelism(ILP).......... 87 5.1IMPACTEPICpredicatedeposittypes............. 113 5.2 Five comparison type combinations for IA-64 . 124 6.1 Multipath taxonomy and machine/model characteristics . 139 10.1 Characteristics of the 2048 most frequent unique computations 249 10.2 Number of unique computations . 250 10.3 Key performance parameters . 254 10.4 Selected SPEC CPU 2000 benchmarks and input sets . 254 10.5 Processor performance bottlenecks . 261 12.1 Using control and data speculation to hide memory latency. 302 12.2Example12.2........................... 312 12.3 Enhanced Φ insertion allows data speculation. 317 12.4 Enhanced renaming allows data speculation . 318 12.5 An example of speculative load and check generation . 321 12.6 Different representations used to model check instructions and recoverycode........................... 322 12.7 Two examples of multi-level speculation . 323 12.8 Examples of check instructions and their recovery blocks in multi-levelspeculation...................... 324 12.9 Examples of check instructions and their recovery blocks in multi-levelspeculation...................... 325 12.10Example of recovery code generation in speculative PRE . 328 12.11The recovery code introduced in the speculative PRE interacts with the instruction scheduling . 329 14.1 Program input data sets . 376 14.2 Default configuration for superscalar timing simulations . 377 14.3 Misspeculations with speculation/synchronization . 387 xi List of Figures 2.1 Instruction delivery mechanism . 10 2.2Differentinstructioncachedesigns............... 12 2.3Sampleinstructionaddressstream............... 15 2.4Thestreambufferinaction................... 17 2.5 Out-of-order