Intel® Technology Journal | Volume 13, Issue 4, 2009

Intel Technology Journal

Publisher: Richard Bowles
Managing Editor: David King
Content Architect: Jim Held

Program Manager: Stuart Douglas
Technical Editor: Marian Lacey
Technical Illustrators: InfoPros

Technical and Strategic Reviewers: Terry A. Smith, Jim Hurley, Ali-Reza Adl-Tabatabai, Jesse Fang, Sridhar Iyengar, Joe Schutz, Shekhar Borkar, Greg Taylor


Intel Technology Journal

Copyright © 2009 Intel Corporation. All rights reserved. ISBN 978-1-934053-23-2, ISSN 1535-864X

Intel Technology Journal Volume 13, Issue 4

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Publisher, Intel Press, Intel Corporation, 2111 NE 25th Avenue, JF3-330, Hillsboro, OR 97124-5961. E-mail: [email protected].

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.

Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Intel may make changes to specifications, product descriptions, and plans at any time, without notice.

Third-party vendors, devices, and/or software are listed by Intel as a convenience to Intel's general customer base, but Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality, or compatibility of these devices. This list and/or these devices may be subject to change without notice.

Fictitious names of companies, products, people, characters, and/or data mentioned herein are not intended to represent any real individual, company, product, or event.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel, the Intel logo, Intel Centrino, Duo, Intel NetBurst, MMX, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

†Other names and brands may be claimed as the property of others.

This book is printed on acid-free paper.

Publisher: Richard Bowles
Managing Editor: David King

Library of Congress Cataloging in Publication Data:

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

First printing: December, 2009


INTEL® TECHNOLOGY JOURNAL
Addressing the Challenges of Tera-scale Computing

Articles

Foreword .......................................................................................................... 4

A Design Pattern Language for Engineering (Parallel) Software ......................... 6

Hardware and Software Approaches for Deterministic Multi-processor Replay of Concurrent Programs ......................... 20

A Programming Model for Heterogeneous Intel® Platforms ......................... 42

Flexible and Adaptive On-chip Interconnect for Tera-scale Architectures ......................... 62

Tera-scale Memory Challenges and Solutions ......................... 80

Ultra-low Voltage Technologies for Energy-efficient Special-purpose Hardware Accelerators ......................... 102

Lessons Learned from the 80-core Tera-scale Research Processor ......................... 118


Foreword

Jim Held, PhD
Intel Fellow
Director, Tera-scale Computing Research
Intel Labs, Intel Corporation

The Intel® Tera-scale Computing Research Program is Intel's overarching effort to shape the future of Intel processors and platforms, in order to accelerate the shift from frequency to parallelism for performance improvement. Intel researchers worldwide are already working on R&D projects to address the hardware and software challenges of building and programming systems with teraFLOPS of parallel performance that can process tera-bytes of data. This level of performance will enable exciting new and emerging applications, but will also require addressing challenges in everything from program architecture to circuit technologies. This issue of the Intel Technology Journal includes results from a range of research that walks down the 'stack' from application design to circuits.

Emerging visual-computing applications require tera-scale performance in order to simulate worlds based on complex physical models. They use rich user interfaces with video recognition and 3D graphics synthesis, and they are highly parallel. How can we build them? Architecting designs for such applications that fully exploit their inherent parallelism is a major software engineering challenge. As with most kinds of architecture, new programs will be based on a combination of preexisting patterns and an exploitation of application frameworks that support them. Tim Mattson and Kurt Keutzer describe their work to find the parallel patterns that are needed for concurrent software in their article entitled "A Design Pattern Language for Engineering (Parallel) Software."

The non-deterministic nature of concurrent execution has made debugging one of the toughest parts of delivering a parallel program. Gilles Pokam and his colleagues, in their article "Hardware and Software Approaches for Deterministic Multi-processor Replay of Concurrent Programs," describe their work on hardware and software to support debugging by recording and replaying execution in order to allow analysis and discovery of the subtle timing errors that come with the many possible executions of parallelism.

Future tera-scale platforms may be heterogeneous with a mixture of types of compute elements. Our August 2007 issue of the Intel Technology Journal included articles that described support for mixed-ISA co-processing. In "A Programming Model for Heterogeneous Intel® x86 Platforms" in this issue, Bratin Saha and his colleagues describe work in the IA-ISA to provide support for shared memory with a mixture of cache coherence models.


Mani Azimi and his colleagues' article "Flexible and Adaptive On-Chip Interconnect for Tera-scale Architectures" describes research into on-die network fabric, and they show our evolution from an analysis of the challenges and alternatives to the development of the protocols to exploit the potential of a network on chip. Effective use of a mesh network will require sophisticated support to provide the routing and configuration management for fairness, load balancing, and congestion management.

Perhaps the largest platform hardware challenge for tera-scale computing is the longstanding one of access to memory to match the tremendous compute density of many cores on a die. Moreover, an effective solution must also meet the declining cost and power consumption targets of the mainstream market segments. Dave Dunning and his colleagues, in the article "Tera-scale Memory Challenges and Solutions," outline the problems and our research agenda in this critical area.

The continuing challenge for the core of tera-scale platforms is how to continue to increase energy efficiency. As process technology advances continue to give us more transistors, we can add more cores, but unless we improve their efficiency, we won't be able to use them. Ram Krishnamurthy's team continues to make progress in improving the energy efficiency of computations with designs for ALUs that exploit near-threshold voltage circuits and extremely fine-grained power management. Their work is described in the article "Ultra-low Voltage Technologies for Energy-efficient Special-purpose Hardware Accelerators."

Finally, for some research questions there is no substitute for a silicon implementation: therefore, we built the Tera-scale Research Processor to explore a tile-based design methodology as well as to understand the performance and power efficiency that is possible with intensive floating-point engines and an on-die network. In our final article “Lessons Learned from the 80-core Tera-scale Research Processor,” Saurabh Dighe and his colleagues review these results and discuss what conclusions we draw from them. They summarize what we learned from many experiments with this chip. We recently announced our second-generation many-core research prototype, the Single-chip Cloud Computer, which builds on this work.

I hope you find these articles informative, and the future they are part of creating, as exciting as we do at Intel Labs. We look forward to continuing work with academia and the industry to meet the challenges of mainstream parallel computing.


A Design Pattern Language for Engineering (Parallel) Software

Contributors

Kurt Keutzer
UC Berkeley

Tim Mattson
Intel Corporation

Abstract

The key to writing high-quality parallel software is to develop a robust software design. This applies not only to the overall architecture of the program, but also to the lower layers in the software system where the concurrency and how it is expressed in the final program is defined. Developing technology to systematically describe such designs and reuse them between software projects is the fundamental problem facing the development of software for tera-scale processors. The development of this technology is far more important than programming models and their supporting environments, since with a good design in hand, most any programming system can be used to actually generate the program's source code.

In this article, we develop our thesis about the central role played by the software architecture. We show how design patterns provide a technology to define the reusable design elements in software engineering. This leads us to the ongoing project centered at UC Berkeley's Parallel Computing Laboratory (Par Lab) to pull the essential set of design patterns for parallel software design into a Design Pattern Language. After describing our pattern language, we present a case study from the field of machine learning as a concrete example of how patterns are used in practice.

Index Words: Design Pattern Language, Software Architecture, Parallel Algorithm Design, Application Frameworks

The Software Engineering Crisis

The trend has been well established [1]: parallel processors will dominate most, if not every, niche of computing. Ideally, this transition would be driven by the needs of software. Scalable software would demand scalable hardware and that would drive CPUs to add cores. But software demands are not driving parallelism. The motivation for parallelism comes from the inability of integrated circuit designers to deliver steadily increasing frequency gains without pushing power dissipation to unsustainable levels. Thus, we have a dangerous mismatch: the semiconductor industry is banking its future on parallel microprocessors, while the software industry is still searching for an effective solution to the parallel programming problem.

The parallel programming problem is not new. It has been an active area of research for the last three decades, and we can learn a great deal from what has not worked in the past.


•• Automatic parallelism. Compilers can speculate, prefetch data, and reorder instructions to balance the load among the components of a system. However, they cannot look at a serial algorithm and create a different algorithm better suited for parallel execution.

•• New languages. Hundreds of new parallel languages and programming environments have been created over the last few decades. Many of them are excellent and provide high-level abstractions that simplify the expression of parallel algorithms. However, these languages have not dramatically grown the pool of parallel programmers. The fact is, in the one community with a long tradition of parallel computing (high-performance computing), the old standards of MPI [3] and OpenMP [2] continue to dominate. There is no reason to believe new languages will be any more successful as we move to more general-purpose programmers; i.e., it is not the quality of our programming models that is inhibiting the adoption of parallel programming.

The central cause of the parallel programming problem is fundamental to the enterprise of programming itself. In other words, we believe that our challenges in programming parallel processors point to deeper challenges in programming software in general. We believe the only way to solve the programming problem in general is to first understand how to architect software. Thus, we feel that the way to solve the parallel programming problem is to first understand how to architect parallel software. Given a good software design grounded in solid architectural principles, a software engineer can produce high-quality and scalable software. Starting with an ill-suited sense of the architecture for a software system, however, almost always leads to failure. Therefore, it follows that the first step in addressing the parallel programming problem is to focus on software architecture. From that vantage point, we have a hope of choosing the right programming models and building the right software frameworks that will allow the general population of programmers to produce parallel software.

In this article, we describe our work on software architecture. We use the device of a pattern language to write our ideas down and put them into a systematic form that can be used by others. After we present our pattern language [4], we present a case study to show how these patterns can be used to understand software architecture.

Software Architecture and Design Patterns

Productive, efficient software follows from good software architecture. Hence, we need to better formalize how software is architected, and in order to do this we need a way to write down architectural ideas in a form that groups of programmers can study, debate, and come to consensus on. This systematic process has at its core the peer review process that has been instrumental in advancing scientific and engineering disciplines.


The prerequisite to this process is a systematic way to write down the design elements from which an architecture is defined. Fortunately, the software community has already reached consensus on how to write these elements down in the important work Design Patterns [5]. Our aim is to arrive at a set of patterns whose scope encompasses the entire enterprise of software development, from architectural description to detailed implementation.

Design Patterns

Design patterns give names to solutions to recurring problems that experts in a problem-domain gradually learn and take for granted. It is the possession of this tool-bag of solutions, and the ability to easily apply these solutions, that precisely defines what it means to be an expert in a domain.

For example, consider the Dense-Linear-Algebra pattern. Experts in fields that make heavy use of linear algebra have worked out a family of solutions to these problems. These solutions have a common set of design elements that can be captured in a Dense-Linear-Algebra design pattern. We summarize the pattern in the sidebar, but it is important to know that in the full text of the pattern [4] there would be sample code, examples, references, invariants, and other information needed to guide a software developer interested in dense linear algebra problems.

Computational Pattern: Dense-Linear-Algebra
Solution: A computation is organized as a sequence of arithmetic expressions acting on dense arrays of data. The operations and data access patterns are well defined mathematically, so data can be pre-fetched and CPUs can execute close to their theoretically allowed peak performance. Applications of this pattern typically use standard building blocks defined in terms of the dimensions of the dense arrays, with vector (BLAS level 1), matrix-vector (BLAS level 2), and matrix-matrix (BLAS level 3) operations.

The Dense-Linear-Algebra pattern is just one of the many patterns a software architect might use when designing an algorithm. A full design includes high-level patterns that describe how an application is organized, mid-level patterns about specific classes of computations, and low-level patterns describing specific execution strategies. We can take this full range of patterns and organize them into a single integrated pattern language — a web of interlocking patterns that guide a designer from the beginning of a design problem to its successful realization [6, 7].
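To make the BLAS building blocks named in the sidebar concrete, the following sketch writes the three BLAS levels as plain C++ loops over dense, row-major arrays. The function names and the storage layout are illustrative assumptions, not material from the pattern text in [4]; a tuned library would block these loops for cache and vectorize them.

#include <cstddef>
#include <vector>

// Illustrative dense-linear-algebra building blocks over row-major arrays.

// BLAS level 1 (vector-vector): y <- a*x + y ("axpy").
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] += a * x[i];
}

// BLAS level 2 (matrix-vector): y <- A*x for an n-by-n matrix A.
void gemv(const std::vector<double>& A, const std::vector<double>& x,
          std::vector<double>& y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}

// BLAS level 3 (matrix-matrix): C <- C + A*B for n-by-n matrices. The regular,
// mathematically well-defined access pattern is what lets tuned libraries
// prefetch data and run close to peak, as the pattern description notes.
void gemm(const std::vector<double>& A, const std::vector<double>& B,
          std::vector<double>& C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double a_ik = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a_ik * B[k * n + j];
        }
}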

To represent the domain of software engineering in terms of a single pattern language is a daunting undertaking. Fortunately, based on our studies of successful application software, we believe software architectures can be built up from a manageable number of design patterns. These patterns define the building blocks of all software engineering and are fundamental to the practice of architecting parallel software. Hence, an effort to propose, argue about, and finally agree on what constitutes this set of patterns is the seminal intellectual challenge of our field.


Our Pattern Language

Software architecture defines the components that make up a software system, the roles played by those components, and how they interact. Good software architecture makes design choices explicit, and the critical issues addressed by a solution clear. A software architecture is hierarchical rather than monolithic. It lets the designer localize problems and define design elements that can be reused in other architectures.

The goal of Our Pattern Language (OPL) is to encompass the complete architecture of an application from the structural patterns (also known as architectural styles) that define the overall organization of an application [8, 9] to the basic computational patterns (also known as computational motifs) for each stage of the problem [10, 1], to the low-level details of the parallel algorithm [7]. With such a broad scope, organizing our design patterns into a coherent pattern language was extremely challenging.

Our approach is to use a layered hierarchy of patterns. Each level in the hierarchy addresses a portion of the design problem. While a designer may in some cases work through the layers of our hierarchy in order, it is important to appreciate that many design problems do not lend themselves to a top-down or bottom-up analysis. In many cases, the pathway through our patterns will be to bounce around between layers, with the designer working at whichever layer is most productive at a given time (so-called opportunistic refinement). In other words, while we use a fixed layered approach to organize our patterns into OPL, we expect designers will work through the pattern language in many different ways. This flexibility is an essential feature of design pattern languages.

As shown in Figure 1, we organize OPL into five major categories of patterns. Categories 1 and 2 sit at the same level of the hierarchy and cooperate to create one layer of the software architecture.

1. Structural patterns: Structural patterns describe the overall organization of the application and the way the computational elements that make up the application interact. These patterns are closely related to the architectural styles discussed in [8]. Informally, these patterns correspond to the "boxes and arrows" an architect draws to describe the overall organization of an application. An example of a structural pattern is Pipe-and-Filter, described in the sidebar.

2. Computational patterns: These patterns describe the classes of computations that make up the application. They are essentially the thirteen motifs made famous in [10] but described more precisely as patterns rather than simply computational families. These patterns can be viewed as defining the "computations occurring in the boxes" defined by the structural patterns. A good example is the Dense-Linear-Algebra pattern described in an earlier sidebar. Note that some of these patterns (such as Graph-Algorithms or N-Body-Methods) define complicated design problems in their own right and serve as entry points into smaller design pattern languages focused on a specific class of computations. This is yet another example of the hierarchical nature of the software design problem.

Structural Pattern: Pipe-and-Filter
Solution: Structure an application as a fixed sequence of filters that take input data from preceding filters, carry out computations on that data, and then pass the output to the next filter. The filters are side-effect free; i.e., the result of their action is only to transform input data into output data. Concurrency emerges as multiple blocks of data move through the Pipe-and-Filter system so that multiple filters are active at one time.


In OPL, the top two categories, the structural and computational patterns, are placed side by side with connecting arrows. This shows the tight coupling between these patterns and the iterative nature of how a designer works with them. In other words, a designer thinks about his or her problem, chooses a structural pattern, and then considers the computational patterns required to solve the problem. The selection of computational patterns may suggest a different overall structure for the architecture and may force a reconsideration of the appropriate structural patterns. This process, moving between structural and computational patterns, continues until the designer settles on a high-level design for the problem.

Concurrent Algorithm Strategy Pattern: Data-Parallelism
Solution: An algorithm is organized as operations applied concurrently to the elements of a set of data structures. The concurrency is in the data. This pattern can be generalized by defining an index space. The data structures within a problem are aligned to this index space and concurrency is introduced by applying a stream of operations for each point in the index space.

Structural and computational patterns are used in both serial and parallel programs. Ideally, the designer working at this level, even for a parallel program, will not need to focus on parallel computing issues. For the remaining layers of the pattern language, parallel programming is a primary concern.

Parallel programming is the art of using concurrency in a problem to make the problem run to completion in less time. We divide the parallel design process into the following three layers.

3. Concurrent algorithm strategies: These patterns define high-level strategies to exploit concurrency in a computation for execution on a parallel computer. They address the different ways concurrency is naturally expressed within a problem by providing well-known techniques to exploit concurrency. A good example of an algorithm strategy pattern is the Data-Parallelism pattern.

4. Implementation strategies: These are the structures that are realized in source code to support (a) how the program itself is organized and (b) common data structures specific to parallel programming. The Loop-Parallel pattern is a well-known example of an implementation strategy pattern.

5. Parallel execution patterns: These are the approaches used to support the execution of a parallel algorithm. This includes (a) strategies that advance a program counter and (b) basic building blocks to support the coordination of concurrent tasks. The single instruction multiple data (SIMD) pattern is a good example of a parallel execution pattern.

Patterns in these three lower layers are tightly coupled. For example, software designs using the Recursive-Splitting algorithm strategy often utilize a Fork/Join implementation strategy pattern, which is typically supported at the execution level with the Thread-Pool pattern. These connections between patterns are a key point in the text of the patterns.

Implementation Strategy Pattern: Loop-Parallel
Solution: An algorithm is implemented as loops (or nested loops) that execute in parallel. The challenge is to transform the loops so that iterations can safely execute concurrently and in any order. Ideally, this leads to a single source code tree that generates a serial program (by using a serial compiler) or a parallel program (by using compilers that understand the parallel loop constructs).

Parallel Execution Pattern: SIMD
Solution: An implementation of a strictly data parallel algorithm is mapped onto a platform that executes a single sequence of operations applied uniformly to a collection of data elements. The instructions execute in lockstep by a set of processing elements but on their own streams of data. SIMD programs use specialized data structures, data alignment operations, and collective operations to extend this pattern to a wider range of data parallel problems.
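To show how these lower layers fit together, here is a minimal C++ sketch in which the Data-Parallelism strategy is realized with the Loop-Parallel implementation pattern through an OpenMP pragma. The loop body is a placeholder chosen for illustration; the point is only that each iteration is an independent update over an index space, so iterations may run concurrently (and, on suitable hardware, map to SIMD execution), while the same source still builds as a correct serial program.

#include <cstddef>
#include <vector>

// Data-Parallelism realized with the Loop-Parallel pattern: the same operation
// is applied to every point of an index space, and the iterations may execute
// concurrently and in any order. Compiled without OpenMP, the pragma is
// ignored and the code runs serially, as the Loop-Parallel sidebar describes.
void scale_and_offset(std::vector<double>& data, double a, double b) {
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        data[i] = a * data[i] + b;   // independent per-element update
}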


OPL draws from a long history of research on software design. The structural patterns of Category 1 are largely taken from the work of Garlan and Shaw on architectural styles [8, 9]. That these architectural styles could also be viewed as design patterns was quickly recognized by Buschmann [11]. We added two structural patterns that have their roots in parallel computing to Garlan and Shaw's architectural styles: Map-Reduce, influenced by [12], and Iterative-Refinement, influenced by Valiant's bulk-synchronous-processing pattern [13]. The computation patterns of Category 2 were first presented as "dwarfs" in [10], and their role as computational patterns was only identified later [1]. The identification of these computational patterns in turn owes a debt to Phil Colella's unpublished work on the "Seven Dwarfs of Parallel Computing." The lower three categories within OPL build on earlier and more traditional patterns for parallel algorithms by Mattson, Sanders, and Massingill [7]. This work was somewhat inspired by Gamma's success in using design patterns for object-oriented programming [5]. Of course, all work on design patterns has its roots in Alexander's ground-breaking work identifying design patterns in civil architecture [6].

[Figure 1 diagram: beneath an Applications layer, the five categories of design patterns in OPL and the patterns in each category.]

Structural Patterns: Pipe-and-Filter, Agent and Repository, Process Control, Event-Based/Implicit Invocation, Puppeteer, Model-View-Controller, Iterative Refinement, Map-Reduce, Layered Systems, Arbitrary Static Task Graph

Computational Patterns: Graph Algorithms, Dynamic Programming, Dense Linear Algebra, Sparse Linear Algebra, Unstructured Grids, Structured Grids, Graphical Models, Finite State Machines, Backtrack Branch and Bound, N-Body Methods, Circuits, Spectral Methods, Monte Carlo

Algorithm Strategy Patterns: Task Parallelism, Data Parallelism, Geometric Decomposition, Recursive Splitting, Pipeline, Discrete Event, Speculation

Implementation Strategy Patterns: Program Structure (SPMD, Strict Data Parallel, Fork/Join, Actors, Master/Worker, Task Queue, Loop Parallel, BSP) and Data Structure (Shared Queue, Shared Hash Table, Distributed Array, Shared Data, Graph Partitioning)

Parallel Execution Patterns: Advancing "Program Counters" (MIMD, SIMD, Thread Pool, Speculation, Task Graph, Data Flow, Digital Circuits) and Coordination (Message Passing, Collective Communication, Mutual Exclusion, Point-to-Point Synchronization, Collective Synchronization, Transactional Memory)

Figure 1: The Structure of OPL and the Five Categories of Design Patterns. Details About Each of the Patterns can be Found in [4]. Source: UC Berkeley ParLab, 2009


Case Study: Content-based Image Retrieval

Experience has shown that an easy way to understand patterns and how they are used is to follow an example. In this section we describe a problem and its parallelization by using patterns from OPL. In doing so we describe a subset of the patterns and give some indication of the way we make transitions between layers in the pattern language.

In particular, to understand how OPL can help software architecture, we use a content-based image retrieval (CBIR) application as an example. From this example (drawn from [14]), we show how structural and computational patterns can be used to describe the CBIR application and how the lower-layer patterns can be used to parallelize an exemplar component of the CBIR application.

[Figure 2 diagram: new images flow into the feature extractor; chosen examples and user feedback on the results feed the trainer; the classifier uses the trained parameters to produce the results.]
Figure 2: The CBIR Application Framework
Source: UC Berkeley ParLab, 2009

In Figure 2 we see the major elements of our CBIR application as well as the data flow. The key elements of the application are the feature extractor, the trainer, and the classifier components. Given a set of new images, the feature extractor will collect features of the images. Given the features of the new images, chosen examples, and some classified new images from user feedback, the trainer will train the parameters necessary for the classifier. Given the parameters from the trainer, the classifier will classify the new images based on their features. The user can classify some of the resulting images and give feedback to the trainer repeatedly in order to increase the accuracy of the classifier.

This top-level organization of CBIR is best represented by the Pipe-and-Filter structural pattern. The feature extractor, trainer, and classifier are filters or computational elements that are connected by pipes (data communication channels). Data flows through the succession of filters, which do not share state and only take input from their input pipe(s). Each filter performs the appropriate computation on those data and passes the output to the next filter(s) via its output pipe. The choice of the Pipe-and-Filter pattern to describe the top-level structure of CBIR is not unusual. Many applications are naturally described by Pipe-and-Filter at the top level.
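The following C++ sketch is one illustrative way to express this top-level Pipe-and-Filter organization. The data types, function names, and stub bodies are placeholders for the real feature-extraction, training, and classification computations; they are not taken from the CBIR implementation in [14].

#include <string>
#include <vector>

// Placeholder data types flowing through the pipes (illustrative only).
struct Image    { std::string path; };
struct Features { std::vector<float> values; };
struct Model    { std::vector<float> parameters; };
struct Labels   { std::vector<int> classes; };

// Each stage is a side-effect-free filter: it only transforms the data taken
// from its input pipe and passes the result to the next filter. The stub
// bodies stand in for the real computations.
Features extract_features(const std::vector<Image>& images) {
    return Features{std::vector<float>(images.size(), 0.0f)};
}
Model train(const Features& features, const Labels& /*feedback*/) {
    return Model{std::vector<float>(features.values.size(), 1.0f)};
}
Labels classify(const Features& features, const Model& /*model*/) {
    return Labels{std::vector<int>(features.values.size(), 0)};
}

// One pass through the pipeline; the full application repeats this, feeding
// user feedback on the results back into the trainer.
Labels run_cbir_once(const std::vector<Image>& new_images, const Labels& feedback) {
    Features features = extract_features(new_images);   // filter 1
    Model    model    = train(features, feedback);      // filter 2
    return classify(features, model);                   // filter 3
}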

In our approach we architect software by using patterns in a hierarchical fashion. Each filter within the CBIR application contains a complex set of computations. We can parallelize these filters using patterns from OPL. Consider, for example, the classifier filter. There are many approaches to classification, but in our CBIR application we use a support-vector machine (SVM) classifier. SVM is widely used for classification in image recognition, bioinformatics, and text processing. The SVM classifier evaluates the function:

\hat{z} = \operatorname{sgn}\Big( b + \sum_{i=1}^{l} y_i \, \alpha_i \, \Phi(x_i, z) \Big)

where x_i is the ith support vector, z is the query vector, Φ is the kernel function, α_i is the weight, y_i ∈ {-1, 1} is the label attached to support vector x_i, b is a parameter, and sgn is the sign function. In order to evaluate the function quickly, we identified that the kernel functions operate on the products and norms of x_i and z. We can compute the products between a set of query vectors and the support vectors by a BLAS level-3 operation with higher throughput. Therefore, we compute the products and norms first, use the results for computing the kernel values, and sum up the weighted kernel values.

We architect the SVM classifier as shown in Figure 3. The basic structure of the classifier filter is itself a simple Pipe-and-Filter structure with two filters: the first filter takes the test data and the support vectors needed to calculate the dot products between the test data and each support vector. This dot product computation is naturally performed by using the Dense-Linear-Algebra computational pattern. The second filter takes the resulting dot products, and the following steps are to compute the kernel values, sum up all the kernel values, and scale the final results if necessary. The structural pattern associated with these computations is Map-Reduce (see the Map-Reduce sidebar).

[Figure 3 diagram: test data and support vectors (SV) feed a "compute dot products" stage (Dense Linear Algebra), whose output feeds a "compute kernel values, sum, and scale" stage (MapReduce) that produces the output.]
Figure 3: The Major Computations of the SVM Classifier
Source: UC Berkeley ParLab, 2009

In a similar way, the feature-extractor and trainer filters of the CBIR application can be decomposed. With that elaboration we would consider the "high-level" architecture of the CBIR application complete. In general, to construct a high-level architecture of an application, we decompose the application hierarchically by using the structural and computational patterns of OPL.
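As a concrete reading of the decision function above, the following sketch evaluates ẑ for one query vector from precomputed dot products and squared norms, assuming a Gaussian (RBF) kernel. The kernel choice and the data layout are assumptions made for illustration; they are not details of the classifier in [14].

#include <cmath>
#include <cstddef>
#include <vector>

// Evaluates z_hat = sgn( b + sum_i y_i * alpha_i * Phi(x_i, z) ) for one query
// vector z, assuming a Gaussian kernel
//   Phi(x_i, z) = exp( -gamma * (||x_i||^2 - 2*(x_i . z) + ||z||^2) ),
// so only precomputed dot products and squared norms are needed.
int svm_decision(const std::vector<double>& dot_products,  // x_i . z
                 const std::vector<double>& sv_norms_sq,   // ||x_i||^2
                 double query_norm_sq,                      // ||z||^2
                 const std::vector<double>& alpha,          // weights
                 const std::vector<int>& y,                 // labels in {-1, +1}
                 double b, double gamma) {
    double sum = b;
    for (std::size_t i = 0; i < dot_products.size(); ++i) {
        const double dist_sq =
            sv_norms_sq[i] - 2.0 * dot_products[i] + query_norm_sq;
        sum += y[i] * alpha[i] * std::exp(-gamma * dist_sq);
    }
    return (sum >= 0.0) ? 1 : -1;   // the sgn step
}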

Constructing the high-level architecture of an application is essential, and this effort improves not just the software viability but also eases communication regarding the organization of the software. However, there is still much work to be done before we have a working software application. To perform this work we move from the top layers of OPL (structural and computational patterns) down into lower layers (concurrent algorithmic strategy patterns etc.). To illustrate this process we provide additional detail on the SVM classifier filter.

Concurrent Algorithmic Strategy Patterns

After identifying the structural patterns and the computational patterns in the SVM classifier, we need to find appropriate strategies to parallelize the computation. In the Map-Reduce pattern the same computation is mapped to different non-overlapping partitions of the state set. The results of these computations are then gathered, or reduced. If we are interested in arriving at a parallel implementation of this computation, then we define the Map-Reduce structure in terms of a Concurrent Algorithmic Strategy. The natural choices for Algorithmic Strategies are the Data-Parallelism and Geometric-Decomposition patterns. By using the Data-Parallelism pattern we can compute the kernel value of each dot product in parallel (see the Data-Parallelism sidebar). Alternatively, by using the Geometric-Decomposition pattern (see the Geometric-Decomposition sidebar) we can divide the dot products into regular chunks of data, apply the dot products locally on each chunk, and then apply a global reduce to compute the summation over all chunks for the final results. We are interested in designs that can utilize large numbers of cores. Since the solution based on the Data-Parallelism pattern exposes more concurrent tasks (due to the large numbers of dot products) compared to the more coarse-grained geometric decomposition solution, we choose the Data-Parallelism pattern for implementing the map-reduce computation.

Structural Pattern: Map-Reduce
Solution: A solution is structured in two phases: (1) a map phase where items from an "input data set" are mapped onto a "generated data set," and (2) a reduction phase where the generated data set is reduced or otherwise summarized to generate the final result. It is easy to exploit concurrency in the map phase, since the map functions are applied independently for each item in the input data set. The reduction phase, however, requires synchronization to safely combine partial solutions into the final result.
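For contrast with the data-parallel choice made above, the following sketch shows what the Geometric-Decomposition alternative could look like for the kernel-value summation: the dot products are split into regular chunks, each chunk is reduced locally in parallel, and a global reduction combines the per-chunk results. The chunking scheme and the placeholder kernel routine are assumptions for illustration only.

#include <cstddef>
#include <vector>

// Placeholder kernel routine; the real classifier computes this from the
// dot products and norms as described in the text.
static double compute_kernel_value(double dot_product) { return dot_product; }

// Geometric-Decomposition sketch: divide the index space into regular chunks,
// reduce each chunk locally (one chunk per concurrent worker), then combine
// the partial sums in a global reduce.
double sum_kernel_values(const std::vector<double>& dot_products,
                         std::size_t chunk_size) {
    const std::size_t n = dot_products.size();
    const std::size_t num_chunks = (n + chunk_size - 1) / chunk_size;
    std::vector<double> partial(num_chunks, 0.0);

    #pragma omp parallel for
    for (long c = 0; c < static_cast<long>(num_chunks); ++c) {
        const std::size_t begin = static_cast<std::size_t>(c) * chunk_size;
        const std::size_t end = (begin + chunk_size < n) ? begin + chunk_size : n;
        for (std::size_t i = begin; i < end; ++i)
            partial[c] += compute_kernel_value(dot_products[i]);
    }

    double total = 0.0;                  // global reduce over the chunks
    for (double p : partial) total += p;
    return total;
}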


The use of the Data-Parallelism algorithmic strategy pattern to parallelize the Map-Reduce computation is shown in the pseudo code of the kernel value calculation and the summation. These computations can be summarized as shown in Code Listing 1. Line 1 to line 4 is the computation of the kernel value on each dot product, which is the map phase. Line 5 to line 13 is the summation over all kernel values, which is the reduce phase. Function NeedReduce checks whether element "i" is a candidate for the reduction operation. If so, the ComputeOffset function calculates the offset between element "i" and another element. Finally, the Reduce function conducts the reduction operation on element "i" and "i+offset".

Algorithm Strategy Pattern: Geometric-Decomposition
Solution: An algorithm is organized by (1) dividing the key data structures within a problem into regular chunks, and (2) updating each chunk in parallel. Typically, communication occurs at chunk boundaries, so an algorithm breaks down into three components: (1) exchange boundary data, (2) update the interiors of each chunk, and (3) update boundary regions. The size of the chunks is dictated by the properties of the memory hierarchy to maximize reuse of data from local memory/cache.

Implementation Strategy Patterns

To implement the data parallelism strategy from the Map-Reduce pseudo code, we need to find the best Implementation Strategy Pattern. Looking at the patterns in OPL, both the Strict-Data-Parallel and Loop-Parallel patterns are applicable.

Whether we choose the Strict-Data-Parallel or the Loop-Parallel pattern in the implementation layer, we can use the SIMD pattern for realizing the execution. For example, we can apply SIMD on line 2 in Code Listing 1 for calculating the kernel value of each dot product in parallel. The same concept can be used on line 7 in Code Listing 1 for conducting the checking procedure in parallel. Moreover, in order to synchronize the computations on different processing elements on line 4 and line 12 in Code Listing 1, we can use the barrier construct described within the Collective-Synchronization pattern.

function ComputeMapReduce(DotProdAndNorm, Result) {
1    for i ← 1 to n {
2        LocalValue[i] ← ComputeKernelValue(DotProdAndNorm[i]);
3    }
4    Barrier();
5    for reduceLevel ← 1 to MaxReduceLevel {
6        for i ← 1 to n {
7            if (NeedReduce(i, reduceLevel)) {
8                offset ← ComputeOffset(i, reduceLevel);
9                LocalValue[i] ← Reduce(LocalValue[i], LocalValue[i+offset]);
10           }
11       }
12       Barrier();
13   }
14 }

Code Listing 1: Pseudo Code of the Map-Reduce Computation
Source: Intel Corporation, 2009

Implementation Strategy Pattern: Strict-Data-Parallel
Solution: Implement a data parallel algorithm as a single stream of instructions applied concurrently to the elements of a data set. Updates to each element are either independent, or they involve well-defined collective operations such as reductions or prefix scans.
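As a hedged illustration of the realization discussed above, the following C++ sketch expresses the same map and reduce phases with OpenMP. The implicit barrier at the end of the first parallel loop plays the role of Barrier() on line 4, and a reduction clause replaces the explicit tree reduction of lines 5 through 13; this is one possible rendering, not the implementation used in [14].

#include <cstddef>
#include <vector>

// Placeholder kernel routine, standing in for ComputeKernelValue in Listing 1.
static double compute_kernel_value(double dot_prod_and_norm) {
    return dot_prod_and_norm;
}

// Map phase: a parallel loop over the dot products (lines 1-3 of Listing 1);
// each iteration is independent, so it can also be vectorized (SIMD).
// Reduce phase: an OpenMP reduction replaces the barrier-synchronized tree
// reduction of lines 5-13.
double map_reduce_kernel_sum(const std::vector<double>& dot_prod_and_norm) {
    const long n = static_cast<long>(dot_prod_and_norm.size());
    std::vector<double> local_value(dot_prod_and_norm.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        local_value[i] = compute_kernel_value(dot_prod_and_norm[i]);
    // implicit barrier here corresponds to Barrier() on line 4

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += local_value[i];
    return sum;
}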


In summary, the computation of the SVM classifier can be viewed as a composition of the Pipe-and-Filter, Dense-Linear-Algebra, and Map-Reduce patterns. To parallelize the Map-Reduce computation, we used the Data-Parallelism pattern. To implement the Data-Parallelism Algorithmic Strategy, both the Strict-Data-Parallel and Loop-Parallel patterns are applicable. We chose the Strict-Data-Parallel pattern, since it seemed a more natural choice given the fact that we wanted to expose large amounts of concurrency for use on many-core chips with large numbers of cores. It is important to appreciate, however, that this is a matter of style, and a quality design could have been produced by using the Loop-Parallel pattern as well. To map the Strict-Data-Parallel pattern onto a platform for execution, we chose a SIMD pattern. While we did not show the details of all the patterns used, along the way we used the Shared-Data pattern to define the synchronization protocols for the reduction and the Collective-Synchronization pattern to describe the barrier construct. It is common that these functions (reduction and barrier) are provided as part of a parallel programming environment; hence, while a programmer needs to be aware of these constructs and what they provide, it is rare that they will need to explore their implementation in any detail.

Other Patterns

OPL is not complete. Currently OPL is restricted to those parts of the design process associated with architecting and implementing applications that target parallel processors. There are countless additional patterns that software development teams utilize. Probably the best-known example is the set of design patterns used in object-oriented design [5]. We made no attempt to include these in OPL. An interesting framework that supports common patterns in parallel object-oriented design is Threading Building Blocks (TBB) [15].

OPL focuses on patterns that are ultimately expressed in software. These patterns do not, however, address methodological patterns that experienced parallel programmers use when designing or optimizing parallel software. The following are some examples of important classes of methodological patterns.

•• Finding Concurrency patterns [7]. These patterns capture the process that experienced parallel programmers use when exploiting the concurrency available in a problem. While these patterns were developed before our set of Computational patterns was identified, they appear to be useful when moving from the Computational patterns category of our hierarchy to the Parallel Algorithmic Strategy category. For example, applying these patterns would help to indicate when geometric decomposition is chosen over data parallelism as a dense linear algebra problem moves toward implementation.


•• Parallel Programming "Best Practices" patterns. This describes a broad range of patterns we are actively mining as we examine the detailed work in creating highly-efficient parallel implementations. Thus, these patterns appear to be useful when moving from the Implementation Strategy patterns to the Concurrent Execution patterns. For example, we are finding common patterns associated with optimizing software to maximize data locality.

There is a growing community of programmers and researchers involved in the creation of OPL. The current status of OPL, including the most recent updates of patterns, can be found at: http://parlab.eecs.berkeley.edu/wiki/patterns/patterns. This website also has links to details on our shorter monthly patterns workshop as well as our longer, two-day, formal patterns workshop. We welcome your participation.

Summary, Conclusions, and Future Work

We believe that the key to addressing the challenge of writing software is to architect the software. In particular, we believe that the key to addressing the new challenge of programming multi-core and many-core processors is to carefully architect the parallel software. We can define a systematic methodology for software architecture in terms of design patterns and a pattern language. Toward this end we have taken on the ambitious project of creating a comprehensive pattern language that stretches all the way from the initial software architecture of an application down to the lowest-level details of software implementation.

OPL is a work in progress. We have defined the layers in OPL, listed the patterns at each layer, and written text for many of the patterns. Details are available online [4]. On the one hand, much work remains to be done. On the other hand, we feel confident that our structural patterns capture the critical ways of composing software, and our computational patterns capture the key underlying computations. Similarly, as we move down through the pattern language, we feel that the patterns at each layer do a good job of addressing most of the key problems for which they are intended. The current state of the textual descriptions of the patterns in OPL is somewhat nascent. We need to finish writing the text for some of the patterns and have them carefully reviewed by experts in parallel applications programming. We also need to continue mining patterns from existing parallel software to identify patterns that may be missing from our language. Nevertheless, last year's effort spent in mining five applications netted (only) three new patterns for OPL. This shows that while OPL is not fully complete, it is not, with the caveats described earlier, dramatically deficient.

Complementing the efforts to mine existing parallel applications for patterns is the process of architecting new applications by using OPL. We are currently using OPL to architect and implement a number of applications in areas such as machine learning, computer vision, computational finance, health, physical modeling, and games. During this process we are watching carefully to identify where OPL helps us and where OPL does not offer patterns to guide the kind of design decisions we must make. For example, mapping a number of computer-vision applications to new generations of many-core architectures helped identify the importance of a family of data layout patterns.

The scope of the OPL project is ambitious. It stretches across the full range of activities in architecting a complex application. It has been suggested that we have taken on too large of a task; that it is not possible to define the complete software design process in terms of a single design pattern language. However, after many years of hard work, nobody has been able to solve the parallel programming problem with specialized parallel programming languages or tools that automate the parallel programming process. We believe a different approach is required, one that emphasizes how people think about algorithms and design software. This is precisely the approach supported by design patterns, and based on our results so far, we believe that patterns and a pattern language may indeed be the key to finally resolving the parallel programming problem.

While this claim may seem grandiose, we have an even greater aim for our work. We believe that our efforts to identify the core computational and structural patterns for parallel programming have led us to begin to identify the core computational elements (computational patterns, analogous to atoms) and means of assembling them (structural patterns, analogous to molecular bonding) of all electronic systems. If this is true, then these patterns not only serve as a means to assist software design but can be used to architect a curriculum for a true discipline of computer science.

References

[1] K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. "A View of the Parallel Computing Landscape." Communications of the ACM, volume 51, pages 56-67, 2009.

[2] B. Chapman, G. Jost, and R. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming. MIT Press, Cambridge, Massachusetts, 2008.

[3] W. Gropp, E. Lusk, A. Skjellum. Using MPI: Portable Parallel Programming with the Message Passing Interface. 2nd edition, MIT Press, Cambridge, Massachusetts, 1999.

[4] http://parlab.eecs.berkeley.edu/wiki/patterns/patterns

[5] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object Oriented Software. Addison-Wesley, Reading, Massachusetts, 1994.


[6] C. Alexander, S. Ishikawa, M. Silverstein. A Pattern Language: Towns, Buildings, Construction. Oxford University Press, New York, New York, 1977.

[7] T. G. Mattson, B. A. Sanders, B. L. Massingill. Patterns for Parallel Programming. Addison Wesley, Boston, Massachusetts, 2004.

[8] D. Garlan and M. Shaw. "An introduction to software architecture." Carnegie Mellon University Software Engineering Institute Report CMU SEI-94-TR-21, Pittsburgh, Pennsylvania, 1994.

[9] Mary Shaw and David Garlan. Software Architecture: Perspectives on an Emerging Discipline. Prentice Hall, Upper Saddle River, New Jersey, 1995.

[10] K. Asanovic, et al. “The landscape of parallel computing research: A view from Berkeley.” EECS Department, University of California, Berkeley, Technical Report UCB/EECS-2006-183, 2006.

[11] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal. Pattern-Oriented Software Architecture - A System of Patterns. Wiley, Hoboken, New Jersey, 1996.

[12] J. Dean and S. Ghemawat. “MapReduce: Simplified Data Processing on Large Clusters.” In Proceedings of OSDI ’04: 6th Symposium on Operating System Design and Implementation. San Francisco, CA, December 2004.

[13] L. G. Valiant, “A Bridging Model for Parallel Computation.” Communication of the ACM, volume 33, pages 103-111, 1990.

[14] B. Catanzaro, B. Su, N. Sundaram, Y. Lee, M. Murphy, and K. Keutzer. "Efficient, High-Quality Image Contour Detection." IEEE International Conference on Computer Vision (ICCV09), pages 2381-2388, Kyoto, Japan, 2009.

[15] J. Reinders. Intel Threading Building Blocks. O'Reilly Press, Sebastopol, California, 2007.

Acknowledgments The evolution of OPL has been strongly influenced by the collaborative environment provided by Berkeley’s Par Lab. The development of the language has been positively impacted by students and visitors in two years of graduate seminars focused on OPL: Hugo Andrade, Chris Batten, Eric Battenberg, Hovig Bayandorian, Dai Bui, Bryan Catanzaro, Jike Chong, Enylton Coelho, Katya Gonina, Yunsup Lee, Mark Murphy, Heidi Pan, Kaushik Ravindran, Sayak Ray, Erich Strohmaier, Bor-yiing Su, Narayanan Sundaram, Guogiang Wang, and Youngmin Yi. The development of OPL has also received a boost from Par Lab faculty — particularly Krste Asanovic, Jim Demmel, and David Patterson. Monthly pattern workshops in 2009 also helped to shape the language. Special thanks to veteran workshop moderator Ralph Johnson as well as to Jeff Anderson-Lee, Joel Jones, Terry Ligocki, Sam Williams, and members of the Silicon Valley Patterns Group.


Authors' Biographies

Kurt Keutzer. After receiving his Ph.D. degree in Computer Science from Indiana University in 1984, Kurt joined AT&T Bell Laboratories, where he was a Member of Technical Staff in the last of the golden era of Bell Labs Research. In 1991 he joined Synopsys, Inc., where he served in a number of roles culminating in his position as Chief Technical Officer and Senior Vice President of Research. Kurt left Synopsys in January 1998 to become Professor of Electrical Engineering and Computer Science at the University of California at Berkeley. At Berkeley he worked with Richard Newton to initiate the MARCO-funded Gigascale Silicon Research Center and was Associate Director of the Center from 1998 until 2002. He is currently a principal investigator in Berkeley's Universal Parallel Computing Research Center.

Kurt has researched a wide number of areas related to both the design and programming of integrated circuits, and his research efforts have led to four best-paper awards. He has published over 100 refereed publications and co-authored six books, his latest being Closing the Power Gap Between ASIC and Custom. Kurt was made a Fellow of the IEEE in 1996.

Tim Mattson. Tim received a Ph.D. degree for his work on quantum molecular scattering theory from UC Santa Cruz in 1985. Since then he has held a number of commercial and academic positions working on the application of parallel computers to mathematics libraries, exploration geophysics, computational chemistry, molecular biology, and bioinformatics.

Dr. Mattson joined Intel in 1993. Among his many roles he was applications manager for the ASCI Red Computer (the world’s first TeraFLOP computer), helped create OpenMP, founded the Open Cluster Group, led the applications team for the first TeraFLOP CPU (the 80-core tera-scale processor), launched Intel’s programs in computing for the Life Sciences, and helped create OpenCL.

Currently, Dr. Mattson is a Principal Engineer in Intel’s Visual Applications Research Laboratory. He conducts research on performance modeling and how different programming models map onto many-core processors. Design patterns play a key role in this work and help keep the focus on technologies that help the general programmer solve real parallel programming problems.

Copyright

Copyright © 2009 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.


Hardware and Software Approaches for Deterministic Multi-processor Replay of Concurrent Programs

Contributors

Gilles Pokam
Intel Corporation

Cristiano Pereira
Intel Corporation

Klaus Danne
Intel Corporation

Lynda Yang
University of Illinois at Urbana-Champaign

Sam King
University of Illinois at Urbana-Champaign

Josep Torrellas
University of Illinois at Urbana-Champaign

Abstract

As multi-processors become mainstream, software developers must harness the parallelism available in programs to keep up with multi-core performance. Writing parallel programs, however, is notoriously difficult, even for the most advanced programmers. The main reason for this lies in the non-deterministic nature of concurrent programs, which makes it very difficult to reproduce a program execution. As a result, reasoning about program behavior is challenging. For instance, debugging concurrent programs is known to be difficult because of the non-determinism of multi-threaded programs. Malicious code can hide behind non-determinism, making software vulnerabilities much more difficult to detect in multi-threaded programs. In this article, we explore hardware and software avenues for improving the programmability of Intel® multi-processors. In particular, we investigate techniques for reproducing a non-deterministic program execution that can efficiently deal with the issues just mentioned. We identify the main challenges associated with these techniques, examine opportunities to overcome some of these challenges, and explore potential usage models of program execution reproducibility for debugging and fault tolerance of concurrent programs.

Index Words: Concurrent Programs, Deterministic Replay, Debugging, Fault Tolerance, Non-determinism, Memory Race Recording, Chunks

Introduction
A common assumption of many application developers is that software behaves deterministically: given program A, running A on the same machine several times should produce the same outcome. This assumption is important for application performance, as it allows one to reason about program behavior. Most single-threaded programs executing on uni-processor systems exhibit this property because they are inherently sequential. However, when executed on multi-core processors, these programs need to be re-written to take advantage of all available computing resources to improve performance. Writing parallel programs, however, is a very difficult task because parallel programs tend to be non-deterministic by nature: running the same parallel program A on the same multi-core machine several times can potentially lead to different outcomes for each run. This makes both improving performance and reasoning about program behavior very challenging.


Deterministic multi-processor replay (DMR) can efficiently deal with the non-deterministic nature of parallel programs. The main idea behind DMR is reproducibility of program execution. Reproducing a multi-threaded program execution requires recording all sources of non-determinism, so that during replay, these threads can be re-synchronized in the same way as in the original execution. On modern chip multi-processor (CMP) systems, the sources of non-determinism can be either input non-determinism (data inputs, keyboard, interrupts, I/O, etc.) or memory non-determinism (access interleavings among threads). These sources of non-determinism can be recorded by using either software or hardware, or a combination of both.

Software-only implementations of DMR can run on legacy machines without hardware changes, but they suffer from performance slowdowns that can restrict the applicability of DMR. To achieve performance levels comparable to hardware schemes, software approaches can be backed up with hardware support. In this article, we describe what software-only approaches for DMR may look like and what types of hardware support may be required to mitigate their performance overhead. Our discussion starts with the details of DMR: we focus on the usage models and on the main challenges associated with recording and replaying concurrent programs. We then describe several ways in which DMR schemes can be implemented in software, and we elaborate on the various tradeoffs associated with these approaches. Finally, we describe hardware extensions to software-only implementations that can help mitigate performance costs and improve the applicability of DMR.

Why Record-and-Replay Matters
Recording and deterministically replaying a program execution gives computer users the ability to travel backward in time, recreating past states and events in the computer. Time travel is achieved by recording key events when the software runs, and then restoring to a previous checkpoint and replaying the recorded log to force the software down the same execution path.

This mechanism enables a wide range of applications in modern systems, especially in multi-processor systems in which concurrent programs are subject to non-deterministic execution: such execution makes it very hard to reason about or reproduce a program behavior.

• Debugging. Programmers can use time travel to help debug programs [36, 39, 15, 4, 1], including programs with non-determinism [20, 33], since time travel can provide the illusion of reverse execution and reverse debugging.
• Security. System builders can use time travel to replay the past execution of applications looking for exploits of newly discovered vulnerabilities [19], to inspect the actions of an attacker [12], or to run expensive security checks in parallel with the primary computation [9].
• Fault tolerance. System designers can use replay as an efficient mechanism for recreating the state of a system after a crash [5].


Non-determinism of Concurrent Programs
The goal of deterministic replay is to be able to reproduce the execution of a program in the way it was observed during recording. In order to reproduce an execution, each instruction should see the same input operands as in the original run. This should guarantee the same execution paths for each thread. During an execution, a program reads data from either memory or register values. Some of the input is not deterministic across different runs of the program, even if the program’s command line arguments are the same. Hence, in order to guarantee determinism, these inputs need to be recorded in a log and injected at replay. In this section, we describe these sources of non-determinism.

Deterministic replay can be done at different levels of the software stack. At the top level, one can replay only the user-level instructions that are executed. These include application code and system library code. This is the approach taken by BugNet [26], Capo [25], iDNA [3], and PinPlay [29]. At the lowest level, a system can record and replay all instructions executed in the machine, including both system-level and user-level instructions. Regardless of the level one is looking at, the sources of non-determinism can be divided into two sets: input read by the program and memory interleavings across different threads of execution. We now describe each source in more detail.

Input Non-determinism
Input non-determinism differs, depending on which layer of the system is being recorded for replay. User-level replay has different requirements from those of system-level replay. Conceptually, the non-deterministic inputs are all the inputs that are consumed by the system layer being recorded but not produced by that same layer. For instance, for user-level replay, all inputs coming from the operating system are non-deterministic, because there is no guarantee of repeatability across two runs. A UNIX* system call, such as gettimeofday, is inherently non-deterministic across two runs, for instance. For system-level recording, all inputs that are external to the system are non-deterministic inputs. External inputs are inputs coming from external devices (I/O, interrupts, DMAs). We now discuss the sources of non-determinism at each level.

For user-level replay, the sources of non-determinism are as follows:

• System calls. Many system calls are non-deterministic. An obvious example is a timing-dependent call, such as the UNIX call gettimeofday. Other system calls can also be non-deterministic: a system call reading information from a network card may return different results, or a system call reading from a disk may return different results.
• Signals. Programs can receive asynchronous signals that can be delivered at different times across two runs, making the control flow non-deterministic.


• Special architectural instructions. On the x86 architecture, some instructions are non-deterministic, such as RDTSC (read timestamp) and RDPMC (read performance counters). Across processor generations of the same architecture, CPUID will also return different values if the replay happens on a processor other than the one on which the recording happened.

In addition to the non-deterministic inputs just mentioned, other sources of non-determinism at the user level are the location of the program stack, which can change from run to run, and the locations where dynamic libraries are loaded during execution. Although these are not inputs to the program, they also change program behavior and need to be taken care of for deterministic replay.

At the system level, the major sources of non-determinism are the following:

• I/O. It is common for most architectures to allow memory-mapped I/O: loads and stores effectively read from and write to devices. If one is replaying the operating system code, the reads from I/O devices are not guaranteed to be repeatable. As a result, the values read by those load instructions need to be recorded.
• Hardware interrupts. Hardware interrupts trigger the execution of an interrupt service routine, which changes the control flow of the execution. Interrupts are used to notify the processor that some data (e.g., a disk read) are available to be consumed. An interrupt can be delivered at any point in time during the execution of the operating system code. A recorder needs to log the point at which the interrupt arrived and the content of the interrupt (what its source is: e.g., disk I/O, network I/O, timer interrupt, etc.).
• Direct Memory Access (DMA). Direct memory accesses perform writes directly to memory without the intervention of the processor. The values written by DMA, as well as the timestamps at which those values were written, need to be recorded to be reproducible during replay.

In addition, the results of processor-specific instructions, such as x86 RDTSC, also need to be recorded, as is the case with user-level code, in order to ensure repeatability.
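To make the log-and-inject idea concrete, the following is a minimal sketch of how a user-level recorder might capture one class of non-deterministic input, the RDTSC timestamp, during recording and feed the logged value back during replay. The log format, the global state, and the function names are hypothetical illustrations for this article, not code from any of the systems surveyed here.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical log entry for one non-deterministic input value. */
typedef struct {
    uint64_t seq;    /* order in which the input was consumed    */
    uint64_t value;  /* value observed during the original run   */
} input_log_entry;

static FILE *log_file;     /* opened elsewhere: "wb" when recording, "rb" when replaying */
static int   replay_mode;  /* 0 = record, 1 = replay                                     */
static uint64_t next_seq;

/* Read the x86 timestamp counter (the RDTSC instruction; x86 only). */
static uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* During recording, execute RDTSC and append its result to the log.
 * During replay, skip the instruction and return the logged value,
 * so the replayed run observes exactly the same "time" as the original. */
uint64_t replayed_rdtsc(void)
{
    input_log_entry e;

    if (!replay_mode) {
        e.seq = next_seq++;
        e.value = read_tsc();
        fwrite(&e, sizeof e, 1, log_file);
        return e.value;
    }
    if (fread(&e, sizeof e, 1, log_file) != 1) {
        fprintf(stderr, "replay log exhausted\n");
        exit(1);
    }
    return e.value;   /* injected value from the original run */
}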


Memory Interleaving
Input non-determinism is present on single-core and multi-core machines. However, in multi-core machines, an additional source of non-determinism is present: the order in which all threads in the system access shared memory. This is typically known as memory races, where different runs of a program may result in different threads winning the race when trying to access a piece of shared memory. Memory races occur between synchronization operations (synchronization races) or between data accesses (data races). At the user level, threads access memory in a different order because the operating system may schedule them in a different order. This is due to interrupts being delivered at different times, because of differences in the architectural state (cache line misses, memory latencies, etc.), and also because of the load on the system. As a result, the shared memory values seen by each thread in different runs can change, resulting in different behavior for each thread across runs. This is the major source of non-determinism in multi-threaded programs. Races also occur among threads within the operating system, and the behavior across two runs is also not guaranteed to be the same. Hence the order in which races occur within the operating system code also needs to be recorded to guarantee deterministic replay.

Software Approaches for Deterministic Replay
Software-only approaches to record-and-replay (R&R) can be deployed on current commodity hardware at no cost. As described in the previous section, an R&R solution needs to tackle two issues: logging and replaying non-deterministic inputs, and enforcing memory access interleavings. We describe software-only solutions to both of these challenges next, and we provide details on the techniques used in recent deterministic replay approaches in the literature. Because there are more software-only R&R systems than can possibly be discussed in this article, we choose to mention those that best characterize our focus. Once we have surveyed the literature, we discuss the remaining open challenges in software-only solutions.

Reproducing Input Non-determinism
Systems and programs execute non-deterministically due to the external resources they are exposed to and the timing of these resources. Thus, these external resources can all be viewed as inputs, whether they are user inputs, interrupts, system call effects, etc. Given the same inputs and the same initial state, the behavior of the system or application is deterministic. The approach to R&R, therefore, is to log these inputs during the logging phase and inject them back during replay.


Table 1 summarizes the replay systems under discussion in terms of the level of replay (user-level or system-level), usage model, and how they are implemented for replaying inputs.

Replay System | Level of Replay | Usage Model | Implementation
Bressoud and Schneider [5] | System | Fault tolerance | Virtual machine
CapoOne [25] | User | General notion of “time travel” for multiple purposes | Kernel modifications, ptrace
Flashback [36] | User | Debugging | Kernel modifications
iDNA [3] | User | Debugging, profiling | Dynamic instrumentation
Jockey [34] | User | Debugging | Library-based, rewrites system calls
Liblog [16] | User | Debugging | Library-based, intercepts calls to libc
ODR [2] | User | Debugging | Kernel modifications, ptrace
PinPlay [29] | User | Debugging, profiling | Dynamic instrumentation
R2 [17] | User | Debugging | Library-based, stubs for replayed function calls
ReVirt [13] | System | Security | Virtual machine
TTVM [20] | System | Debugging | Virtual machine
VMWare [38] | System | General replay | Virtual machine

Table 1: Summary of Approaches to Replaying Input Non-determinism
Source: Intel Corporation, 2009

User-level Input Non-determinism
First, let us consider user application replay. For the most part, we discuss how several approaches handle system calls and signals, since together they represent a large part of the non-deterministic external resources exposed to the application. They also represent resources that have inherently deterministic timing and non-deterministic timing, respectively.

System Calls
An application’s interaction with the external system state is generally confined to its system calls. We discuss in detail how two recent replay systems — Flashback [36] and CapoOne [25] — handle these system calls. Flashback can roll back the memory state of a process to user-defined checkpoints, and it supports replay by logging the process’s interaction with the system. Flashback’s usage model is for debugging software. CapoOne can log and replay multiple threads and processes in a given replay group, cohesively, while concurrently supporting multiple independent replay groups. It re-executes the entire application during replay. CapoOne requires additional hardware to support multi-processor replay; however, its technique for enforcing an application’s external inputs is completely software-based.


Both Flashback and CapoOne interpose on system call routines: they log the inputs, results, and side effects (copy_to_user) of each system call, and they inject the data back during re-execution at system call entry and exit points. If the effect of a given system call is isolated to only the user application (e.g., getpid()), the actual call is bypassed during replay, and its effects are emulated by injecting the values retrieved from the log. On the other hand, if a system call modifies system state that is outside of the replayed application (e.g., fork()), the system call is re-executed during replay in a manner such that its effect on the application is the same as during the logging phase. CapoOne interposes on system calls in user space via the ptrace mechanism, while Flashback does so with kernel modifications. Another replay scheme called ODR [2] describes similar techniques to handle system calls, using both ptrace and kernel modules. Jockey [34], a replay debugger, is slightly different from Flashback and CapoOne in that Jockey links its own shared-object file into the replayed application and then rewrites the system calls of interest.
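As a rough illustration of ptrace-based interposition, the mechanism CapoOne and ODR use in user space, the sketch below stops a traced child at every system call entry and exit on Linux/x86-64 and prints the return value that a recorder would log and later inject. It is a simplified outline under those assumptions, not code from any of the cited systems; a real recorder would also capture the data copied back to user memory and handle signal stops.

#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* let the parent trace us */
        execvp(argv[1], &argv[1]);
        _exit(1);
    }

    int status;
    waitpid(child, &status, 0);                  /* initial stop after exec */

    while (!WIFEXITED(status)) {
        /* Run to the next system-call entry. */
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        waitpid(child, &status, 0);
        if (WIFEXITED(status)) break;

        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        long long nr = (long long)regs.orig_rax;  /* syscall number at entry */

        /* Run to the matching system-call exit. */
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        waitpid(child, &status, 0);
        if (WIFEXITED(status)) break;             /* e.g., exit_group never returns */

        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        /* regs.rax now holds the result a recorder would log and later inject. */
        printf("syscall %lld returned %lld\n", nr, (long long)regs.rax);
    }
    return 0;
}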

While all of these approaches automatically define the interface at which logging and replay occur, namely the system call boundary, R2 [17] is a library-based replay debugger tool that allows the user to choose this demarcation. Functions above the user-defined interface are re-executed during replay, while those below it are emulated by using data from log files. Implementation-wise, R2 generates, and later calls, the stub associated with each function that needs to be logged or replayed. The authors of R2 also address the issue of preserving order between function calls that are executed by different threads. R2 uses a Lamport clock [21] to either serialize all calls or allow them to occur concurrently, as long as causal order is maintained.
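The Lamport clock rule involved here is the standard one [21]: tick on a local event, and when observing another thread’s timestamped event, advance past the maximum of the local and received values. A minimal sketch of those rules, with hypothetical helper names and C11 atomics, follows; it is not R2’s actual implementation.

#include <stdatomic.h>
#include <stdint.h>

/* One Lamport clock per thread: it ticks on local events and is advanced
 * past the sender's timestamp whenever a call or message from another
 * thread is observed, so comparing timestamps preserves causal order.   */
typedef struct {
    _Atomic uint64_t time;
} lamport_clock;

/* Local event (e.g., a call being logged): tick the clock. */
static uint64_t lamport_local_event(lamport_clock *c)
{
    return atomic_fetch_add(&c->time, 1) + 1;
}

/* Observation of another thread's event carrying timestamp msg_time:
 * advance to max(local, msg_time), then tick for the local observation. */
static uint64_t lamport_receive(lamport_clock *c, uint64_t msg_time)
{
    uint64_t t = atomic_load(&c->time);
    while (t < msg_time &&
           !atomic_compare_exchange_weak(&c->time, &t, msg_time))
        ;                                   /* t is refreshed on failure */
    return atomic_fetch_add(&c->time, 1) + 1;
}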

Signals
With system calls, we are only interested in recording their effects, since they always execute at the same point in a given application. This is, however, untrue for signals. The purpose of a signal is to notify an application of a given event, and since signals are asynchronous and can occur at any point during the application’s execution, they are a good example of a non-deterministic input that is time-related. Although Flashback does not support signal replay, Flashback’s developers suggest using the approach described in [35]: i.e., use the processor’s instruction counter to log exactly when the signal occurred. During replay, the signal would be re-delivered when the instruction counter reaches the logged value. Jockey, on the other hand, delays all signals encountered during the logging phase until the end of the next system call, which it logs with that system call. Thus, during replay, the signal is re-delivered at the end of the same system call. CapoOne and liblog [16], another replay debugger, use a similar technique.
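The delay-until-the-next-system-call strategy can be pictured with a small sketch: the handler only notes that a signal arrived, and a hypothetical wrapper invoked at each system call boundary delivers and logs it there, tagged with a system-call sequence number so replay can re-deliver it at the same boundary. The record layout and helper names are our own, not Jockey’s.

#include <signal.h>
#include <stdint.h>
#include <string.h>

/* A deferred-signal record: which signal arrived, and after which
 * system call (by sequence number) it was delivered and logged.    */
typedef struct {
    int      signo;
    uint64_t syscall_seq;
} deferred_signal;

static volatile sig_atomic_t pending_signo;   /* set by the handler        */
static uint64_t syscall_seq;                  /* bumped by syscall wrappers */
static deferred_signal sig_log[1024];
static size_t sig_log_len;

/* Signal handler: just remember that the signal arrived. */
static void defer_handler(int signo)
{
    pending_signo = signo;
}

/* Called by every wrapped system call during recording: deliver (and
 * log) any deferred signal at this deterministic point.              */
static void at_syscall_boundary(void)
{
    syscall_seq++;
    if (pending_signo != 0 && sig_log_len < 1024) {
        sig_log[sig_log_len].signo = (int)pending_signo;
        sig_log[sig_log_len].syscall_seq = syscall_seq;
        sig_log_len++;
        /* ...run the application's real handler here... */
        pending_signo = 0;
    }
}

/* Install the deferring handler for a signal of interest. */
static int install_deferred(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = defer_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}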


Dynamic Instrumentation
PinPlay [29] and iDNA [3] are replay systems that focus on the application debugging usage model: they are based on dynamic binary instrumentation of a captured program trace. Non-deterministic input is logged and replayed by tracking and restoring changes to registers and main memory. PinPlay replays asynchronous signals by logging the instruction count of where signals occur.

Full-system Input Non-determinism
We move on to consider approaches for software-based, full-system replay, which include ReVirt [13], TTVM [20], the system described in [5], and VMWare [38]. The first three were designed for the usage models of security, debugging, and fault tolerance, respectively. Perhaps unsurprisingly, all of these methods take advantage of virtual machines.

ReVirt uses UMLinux [6], a virtual machine that runs as a process on the host. Hardware components and events of the guest are emulated by software analogues. For example, the guest hard disk is a host file, the guest CD-ROM is a host device, and guest hardware interrupt events are simulated by the host delivering a signal to the guest kernel. With these abstractions, ReVirt is able to provide deterministic replay by checkpointing the virtual disk and then logging and replaying the inputs that are external to the virtual machine. Similar to user-application replay, each external input may require that only the data associated with it be logged, or, for inputs that are asynchronous, it may additionally require that a timing factor be logged as well. ReVirt logs the input from external devices such as the keyboard and CD-ROM, non-deterministic results returned by system calls from the guest kernel to the host kernel, and non-deterministic hardware instructions such as RDTSC. Guest hardware interrupts, emulated by signals, are asynchronous, and thus ReVirt has to ensure that these are delivered at the same point in the execution path. The authors chose to use the program counter and the hardware retired-branches counter to uniquely identify the point at which to deliver the signal.

TTVM uses ReVirt for its logging and replaying functionality, but makes changes that make it more suitable for its debugging usage model; for example, TTVM provides support for greater and more frequent checkpoints.

Reproducing Memory Access Non-determinism
The techniques we just described guarantee determinism for replaying single-threaded applications or multi-threaded applications whose threads are independent from one another. Deterministic replay of multi-threaded applications with threads communicating via synchronization or through shared memory requires additional support.


Table 2 summarizes the replay systems we describe next in terms of usage model, multi-processor support, support for replaying data races without additional analysis, and support for immediate replay without a state-exploration stage.

Replay System | Usage Model | Multiprocessor Support? | Data Race Support? | Immediate Replay Support? (no offline state-exploration stage)
DejaVu [8] | Debugging | No | Yes | Yes
iDNA [3] | Debugging, profiling | Yes | No | Yes
Instant Replay [23] | Debugging | Yes | No | Yes
Kendo [27] | Debugging, fault tolerance | Yes | No | Yes
Liblog [16] | Debugging | No | Yes | Yes
ODR [2] | Debugging | Yes | Yes | No
PinPlay [29] | Debugging | Yes | Yes | Yes
PRES [28] | Debugging | Yes | Yes | No
RecPlay [32] | Debugging | Yes | No | Yes
Russinovich and Cogswell [33] | Debugging | No | Yes | Yes
SMP-ReVirt [11] | General replay | Yes | Yes | Yes

Table 2: Summary of Approaches to Replaying Memory Access Non-determinism
Source: Intel Corporation, 2009

Replay in Uniprocessors
In a uni-processor system, it was observed that since only one thread can run at any given time, recording the order in which the threads were scheduled on the processor is sufficient for later replaying the memory access interleaving [16, 8, 33]. These solutions have been implemented at the operating-system level [33], the virtual-machine level [8], and the user level [16].

Replay of Synchronized Accesses
On a multi-processor, thread-scheduling information is not sufficient for deterministic replay, since different threads can be running on different processors or cores concurrently. Earlier proposals, such as Instant Replay [23] and RecPlay [32], recorded the order of operations at a coarse granularity; that is, at the level of user-annotated shared objects and synchronization operations, respectively. Therefore, these schemes were only able to guarantee deterministic replay for data-race-free programs. Both proposals were designed with debugging in mind. As an illustrative example, Instant Replay used the concurrent-read-exclusive-write (CREW) protocol [10] when different threads wanted access to a shared object. CREW guarantees that when a thread has permission to write to a shared object, no other threads are allowed to write to or read from that object. On the other hand, multiple threads can read from the object concurrently. Instant Replay uses the recorded sequence of write operations and the “version” number of the object for each read operation during replay.
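The CREW discipline can be pictured as a readers-writer lock around each annotated shared object, with a version number that writers increment and readers record. The sketch below is a schematic reconstruction with our own names, not Instant Replay’s actual implementation.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* A shared object under the CREW protocol: either one writer or many
 * concurrent readers.  The version counts completed writes, so each
 * logged read can later be replayed against the right write.          */
typedef struct {
    pthread_rwlock_t lock;
    uint64_t version;          /* number of writes performed so far */
} crew_object;

void crew_init(crew_object *obj)
{
    pthread_rwlock_init(&obj->lock, NULL);
    obj->version = 0;
}

/* Record phase: a write is exclusive and logs the version it produces. */
void crew_write(crew_object *obj, void (*do_write)(void *), void *arg)
{
    pthread_rwlock_wrlock(&obj->lock);     /* no readers, no other writers */
    do_write(arg);
    obj->version++;
    /* pthread_self() is printed as an integer only for illustration */
    printf("log: thread %lu wrote version %llu\n",
           (unsigned long)pthread_self(), (unsigned long long)obj->version);
    pthread_rwlock_unlock(&obj->lock);
}

/* Record phase: reads may run concurrently and log the version they saw,
 * so replay can schedule each read after the write that produced it.    */
void crew_read(crew_object *obj, void (*do_read)(void *), void *arg)
{
    pthread_rwlock_rdlock(&obj->lock);
    do_read(arg);
    printf("log: thread %lu read version %llu\n",
           (unsigned long)pthread_self(), (unsigned long long)obj->version);
    pthread_rwlock_unlock(&obj->lock);
}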


Some recent proposals also do not support deterministic replay of programs with data races. iDNA [3] schedules a thread’s execution trace according to instruction sequences that are ordered via synchronization operations. Kendo [27] offers deterministic multi-threading in software by ensuring the same sequence of lock acquisitions for a given input. While not technically a replay system, Kendo also requires that programs be correctly synchronized. Kendo’s usage models include debugging and support for fault-tolerant replicas.

Replay with State-exploration
ODR [2] and PRES [28] are two novel approaches that facilitate replay debugging but are not able to immediately replay an application given only the data collected during the logging phase. Instead, they intelligently explore the space of possible program execution paths until the original output or bug is reproduced. Such analysis must be done off-line, but ODR and PRES gain by having smaller logging-phase overheads (since they log less data) compared to software schemes that provide for immediate replay.

PinPlay and SMP-ReVirt
PinPlay [29] and SMP-ReVirt [11, 14] provide for immediate replay, and they order shared memory operations rather than coarse-grained objects.

PinPlay’s approach is to implement a software version of the flight data recorder (FDR) [37]. FDR exploits cache coherence messages to find memory access dependencies and to order pairs of instructions.

SMP-ReVirt is a generalization of the CREW protocol for shared objects in Instant Replay [23] to shared pages of memory. A given page in memory can only be in a state that is concurrently read or exclusively written during the logging phase. These access controls are implemented by changing a thread’s page permissions — read-access, write-access, or no-access for a given page — during the system’s execution. For example, if a thread wants to write to a page and thus needs to elevate its permission to write-access, all other threads must have their permissions reduced to no-access first. Each thread has its own log. When a thread has its page permission elevated during logging, it logs the point at which it received the elevated permission and the points where the other threads reduced their page permissions. Additionally, the threads that had their permissions reduced log the same points where their permissions were reduced. SMP-ReVirt specifies these “points” in the execution of the system by means of instruction counts. The instruction count of each processor is also updated in a globally visible vector. Thus, during replay, when a thread encounters a page-permission elevation entry, it waits until the other permission-reducing threads reach the instruction count value indicated in the log. On the other hand, when a thread encounters a page-permission reduction entry, it updates the global vector with its instruction count.
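Page-granularity access control of this kind can be approximated in user space with mprotect and a SIGSEGV handler, as in the simplified fragment below. It shows only the permission-elevation path with a hypothetical logging helper; SMP-ReVirt itself operates in the hypervisor on guest page tables and also coordinates the permission reductions of the other threads.

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;

/* Hypothetical logging hook: record a permission change on a page.
 * A real recorder would also log an instruction count here.         */
static void log_permission_event(void *page, const char *what)
{
    fprintf(stderr, "log: %s page %p\n", what, page);
}

/* Fault handler: an access hit a page this thread had no permission on.
 * Elevate the permission and log the event.  (Simplified for a demo: a
 * complete implementation would first downgrade other threads, and
 * calling fprintf/mprotect from a handler is not strictly safe.)       */
static void crew_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    void *page = (void *)((uintptr_t)info->si_addr & ~(uintptr_t)(page_size - 1));
    log_permission_event(page, "elevate");
    if (mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crew_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* A "shared" page that starts in the CREW no-access state. */
    char *shared = mmap(NULL, (size_t)page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return 1;

    shared[0] = 42;                 /* faults, is logged, then succeeds */
    printf("value: %d\n", shared[0]);
    return 0;
}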


Challenges in Software-only Deterministic Replay
It has been shown that designing a software-only solution for recording and replaying input non-determinism is reasonable in terms of execution speed, and it can be done with an overhead of less than ten percent [20, 28, 36]. It is difficult to compare and summarize input log size growth rates for the different approaches discussed here, since different approaches log different events, may compress the log differently, and use different applications as their benchmarks. However, it can be noted that Flashback’s [36] log size is linear in the number of system call invocations. Other similar input logging techniques will likely exhibit similar behavior. In short, enforcing input determinism in software seems to be a reasonable approach, considering the low overhead.

Conversely, the overhead incurred in enforcing memory access interleaving in software is a different story. SMP-ReVirt [11, 14] and PinPlay [29] allow for the most flexible and immediate replay, but they incur a huge overhead. Since SMP-ReVirt instruments and protects shared memory at page granularity, it has issues with false sharing and page contention [28], especially as the number of processors increases [14]. With four CPUs, the logging-phase runtime of an application in SMP-ReVirt can be up to 9 times that of a native run [14]. PinPlay, like iDNA (which uses dynamic instrumentation and has a 12 to 17 times slowdown [3]), cannot be turned on all the time.

The rest of the schemes previously described for replaying multi-threaded applications are either less flexible (uniprocessor only [8, 16, 33], data-race-free programs only [3, 23, 27, 32]), or they trade short on-line recording times for potentially long off-line state-exploration times during replay [2, 28]. Another challenge with software-based schemes is their ability to pinpoint asynchronous events during replay. This issue was exemplified earlier in reference to asynchronous signals and interrupts. While some replay schemes choose to use hardware performance counters in their implementation [36, 35, 13], others choose to delay the event until a later synchronous event occurs [25, 34, 16]. The latter solution, though simpler, can theoretically affect program correctness, while the former solution requires the use of performance counters that are often inaccurate and non-deterministic [11, 27].

In the end, the selection of an appropriate replay system depends on the usage model. If we assume a debugging model where a programmer may not mind waiting a while for a bug to be reproduced, large replay overheads, though not desirable, may be reasonable. In fact, for most of the methods described here, the developers assumed a debugging usage model. Alternatively, a fault-tolerance replay model would require that backup replicas be able to keep up with the production replica, and thus good performance would be much more important. Note that performance is not the only factor that should be considered when determining which replay system works best with a usage model. For example, if the usage model is to replay system intrusions, it would be more suitable to use a full-system replay scheme rather than a user-application replay scheme.


Hardware Support for Recording Memory Non-determinism
Deterministically replaying a program execution is a very difficult problem, as we just described. In addition to logging input non-determinism, existing software approaches have to record the interleavings of shared memory accesses. This can be done at various levels of granularity (e.g., at the page level or at individual memory operations), but as discussed previously, the overhead incurred can be prohibitive and therefore detrimental to applications of R&R such as fault tolerance. For this reason, there has been a lot of emphasis on providing hardware support for logging the interleavings of shared memory accesses more efficiently. We call the proposed mechanisms for logging the order in which memory operations interleave memory race recorders (MRR).

Prior work on hardware support for MRR piggybacks timestamps on cache coherence messages and logs the outcome of memory races by using either a point-to-point or a chunk-based approach. In this section we describe these two approaches and suggest directions for making them practical in modern multi-processor systems.

Point-to-point Approach
In point-to-point approaches [26, 37], memory dependencies are tracked at the granularity of individual shared memory operations. In this approach, each memory block has a timestamp, and each memory operation updates the timestamp of the accessed block. In general, a block can be anything ranging from a single memory word to multiple memory words [37, 31]. We now describe FDR [37], a state-of-the-art implementation of a point-to-point MRR approach.

FDR augments each core in a multi-processor system with an instruction counter (IC) that counts retired instructions. FDR further augments each cache line with a cache instruction count (CIC) that stores the IC of the last store or load instruction that accessed the cache line (see Figure 1). When a core receives a remote coherence request to a cache line, it includes the corresponding CIC and its core ID in the response message. The requesting core can then log a dependency by storing the ID and CIC of the responding core and the current IC of the requesting core. To reduce the amount of information logged by the requesting core, a dependency is logged only if it cannot be inferred from a previous one. This optimization is called transitive reduction. For example, in Figure 1, only the dependency from T1:W(b) to T2:R(b) is logged, as T1:R(a) to T2:W(a) is consequentially implied by T1:W(b) to T2:R(b). Transitive reduction is implemented by augmenting each core with a vector instruction count that keeps track of the latest CIC received from each core.

Figure 1: Point-to-point Approach
Source: Intel Corporation, 2009
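The per-core and per-line bookkeeping just described can be modeled in a few lines of software. The sketch below is a simplified model with our own field names (one metadata record per cache line); it shows how a dependency is logged only when the requester’s vector of received CICs does not already imply it.

#include <stdint.h>
#include <stdio.h>

#define NUM_CORES 4

/* Per-core recorder state in a software model of FDR. */
typedef struct {
    uint64_t ic;                    /* retired-instruction counter          */
    uint64_t last_seen[NUM_CORES];  /* latest CIC received from each core,
                                       used for transitive reduction        */
} core_state;

/* Per-cache-line metadata: last accessor and its instruction count. */
typedef struct {
    int      owner;                 /* core ID of the last accessor */
    uint64_t cic;                   /* that core's IC at the access */
} line_meta;

/* Core `dst` is about to access a line last touched by another core.
 * The dependency src:cic -> dst:ic is logged only if it is not already
 * implied by a previously received CIC (transitive reduction).         */
static void log_dependency(core_state cores[], int dst, const line_meta *line)
{
    int src = line->owner;
    if (src == dst)
        return;                                   /* same thread: nothing to log */
    if (cores[dst].last_seen[src] >= line->cic)
        return;                                   /* implied by an earlier entry  */

    cores[dst].last_seen[src] = line->cic;
    printf("log: core %d inst %llu depends on core %d inst %llu\n",
           dst, (unsigned long long)cores[dst].ic,
           src, (unsigned long long)line->cic);
}

/* Every load or store bumps the accessor's IC and stamps the line. */
static void touch_line(core_state cores[], int core, line_meta *line)
{
    cores[core].ic++;
    log_dependency(cores, core, line);
    line->owner = core;
    line->cic   = cores[core].ic;
}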


Chunk-based Approach
A chunk defines a block of memory instructions that executes in isolation, i.e., without a remote coherence request intervening and causing a conflict. Chunks are represented by using signatures, which are hardware implementations of Bloom filters. Signatures are used to compactly represent sets of locally accessed read or write memory addresses and to disambiguate a remote shared memory reference against them. A conflict with the signatures ends a chunk and clears the signatures.

Similar to point-to-point approaches, chunk-based approaches can also take advantage of transitive reduction to reduce the amount of logged information. As shown in Figure 2, the remote read T2:R(b) conflicts with the write signature of T1 and causes T1 to end its chunk and to clear its signatures. Consequently, the request T2:W(a) does not conflict, and the dependency T1:R(a) to T2:W(a) is implied. In contrast to point-to-point approaches, in which a timestamp is stored with each memory block, chunk-based approaches only need to store a timestamp per core to order chunks between threads.

Figure 2: Chunk-based Approach
Source: Intel Corporation, 2009
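A signature of this kind is essentially a Bloom filter over block-aligned addresses. The sketch below, with sizes and hash functions chosen only for illustration, shows the three operations a chunk-based recorder needs: inserting a locally accessed address, testing a remote address for a (possibly false-positive) conflict, and clearing the signature when a chunk ends.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIG_BITS 1024                       /* e.g., a 1024-bit read signature */

/* A signature: a Bloom filter that over-approximates the set of addresses
 * accessed in the current chunk (false positives only end a chunk early;
 * they never miss a real conflict). */
typedef struct {
    uint64_t bits[SIG_BITS / 64];
} signature;

/* Two cheap hash functions over the block-aligned address. */
static unsigned h1(uint64_t addr) { return (unsigned)((addr >> 6) % SIG_BITS); }
static unsigned h2(uint64_t addr)
{
    return (unsigned)((((addr >> 6) * 0x9E3779B97F4A7C15ULL) >> 32) % SIG_BITS);
}

static void sig_set(signature *s, unsigned bit)
{
    s->bits[bit / 64] |= 1ULL << (bit % 64);
}

static bool sig_test(const signature *s, unsigned bit)
{
    return (s->bits[bit / 64] >> (bit % 64)) & 1ULL;
}

/* Record a locally accessed address in the signature. */
void sig_insert(signature *s, uint64_t addr)
{
    sig_set(s, h1(addr));
    sig_set(s, h2(addr));
}

/* Disambiguate a remote coherence request against the signature:
 * a hit means the current chunk must end. */
bool sig_conflict(const signature *s, uint64_t addr)
{
    return sig_test(s, h1(addr)) && sig_test(s, h2(addr));
}

/* Ending a chunk clears its signatures. */
void sig_clear(signature *s)
{
    memset(s->bits, 0, sizeof s->bits);
}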

We now describe two similar implementations of chunk-based approaches.

Rerun
In Rerun [18], episodes are like chunks. Rerun records a memory dependency by logging the length of an episode along with a timestamp. To identify the episodes that need to be logged, Rerun augments each core with a read and a write signature that keep track of the cache lines read and written by that core during the current episode, respectively. When a cache receives a remote coherence request, it checks its signatures to detect a conflict. If a conflict is detected, the core ends its current episode, which involves clearing the read and write signatures, creating a log entry containing the length of the terminating episode along with its timestamp, and updating the timestamp value. The timestamp represents a scalar clock maintained by each core to provide a total order among the episodes. The cores keep their timestamps up to date by piggybacking them on each cache coherence reply.
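Putting the signatures and the scalar clock together, a conflict-driven episode end can be modeled as below. This is a simplified software model with our own names (and a deliberately tiny one-hash signature standing in for the real ones); in Rerun the equivalent logic lives in the cache controller.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A minimal one-hash signature, standing in for the Bloom-filter
 * signatures sketched earlier (1024 bits). */
typedef struct { uint64_t bits[16]; } signature;

static unsigned sig_hash(uint64_t addr) { return (unsigned)((addr >> 6) % 1024); }
static void sig_insert(signature *s, uint64_t a) { s->bits[sig_hash(a) / 64] |= 1ULL << (sig_hash(a) % 64); }
static bool sig_conflict(const signature *s, uint64_t a) { return (s->bits[sig_hash(a) / 64] >> (sig_hash(a) % 64)) & 1ULL; }
static void sig_clear(signature *s) { memset(s, 0, sizeof *s); }

/* Per-core Rerun-style episode state. */
typedef struct {
    signature reads, writes;
    uint64_t  episode_len;   /* instructions retired in the current episode */
    uint64_t  timestamp;     /* scalar clock used to order episodes          */
} episode_state;

/* Local memory access: grow the episode and mark the signature. */
void on_local_access(episode_state *e, uint64_t addr, bool is_write)
{
    e->episode_len++;
    sig_insert(is_write ? &e->writes : &e->reads, addr);
}

/* Remote coherence request carrying the requester's timestamp.  A conflict
 * ends the episode: log (length, timestamp), clear the signatures, and
 * advance the scalar clock past the remote timestamp. */
void on_remote_request(episode_state *e, uint64_t addr,
                       bool is_write, uint64_t remote_ts)
{
    bool conflict = sig_conflict(&e->writes, addr) ||
                    (is_write && sig_conflict(&e->reads, addr));
    if (!conflict)
        return;

    printf("log: episode of %llu instructions at timestamp %llu\n",
           (unsigned long long)e->episode_len,
           (unsigned long long)e->timestamp);

    sig_clear(&e->reads);
    sig_clear(&e->writes);
    e->episode_len = 0;
    e->timestamp = (e->timestamp > remote_ts ? e->timestamp : remote_ts) + 1;
}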

Deterministic replay is achieved by sequentially executing the episodes in order of increasing timestamps. To do so, a replayer typically examines the logs to identify which thread should be dispatched next and how many instructions it is allowed to execute until an episode of a different thread needs to be replayed.
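With the episode log in hand, the replay loop itself is simple. The sketch below assumes a hypothetical run_thread_for hook that resumes a thread and stops it again after a given number of instructions (for example, under binary instrumentation), and a log already sorted by timestamp.

#include <stddef.h>
#include <stdint.h>

/* One logged episode: the thread that executed it, its length in
 * instructions, and the timestamp that orders it across threads. */
typedef struct {
    int      thread_id;
    uint64_t length;
    uint64_t timestamp;
} episode_entry;

/* Hypothetical runtime hook: resume `thread_id` and stop it again after
 * exactly `length` instructions. */
void run_thread_for(int thread_id, uint64_t length);

/* Sequential replay: the log is assumed sorted by timestamp, so simply
 * dispatching episodes in order reconstructs the recorded interleaving. */
void replay_episodes(const episode_entry *log, size_t n)
{
    for (size_t i = 0; i < n; i++)
        run_thread_for(log[i].thread_id, log[i].length);
}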

DeLorean
Similar to Rerun, DeLorean [24] also logs chunks by using signatures, but does so in a different multi-processor execution environment. In this environment, cores continuously execute chunks that are separated by register checkpoints. A chunk execution in this environment is speculative, i.e., its side effects are visible to other cores only after commit. Before a chunk can commit, however, its signatures are compared against the signatures of other chunks to detect conflicts. If a conflict is detected with a chunk, its signatures are cleared and the chunk is squashed and re-executed. While such an execution environment is not standard in today’s multi-processors, it has been shown to perform well [7]. The required hardware extensions are similar to hardware-supported transactional memory systems [22].

32 | Hardware and Software Approaches for Deterministic Multi-processor Replay of Concurrent Programs Intel® Technology Journal | Volume 13, Issue 4, 2009

To enable deterministic replay, DeLorean logs the total order of chunk commits. Because in this execution environment chunks have a fixed size, e.g., 1000 dynamic instructions, no additional information needs to be logged, except in the rare cases where chunks need to end early because of events such as interrupts. Consequently, the log size of DeLorean is about one order of magnitude smaller than in Rerun. DeLorean can even reduce the log size by another order of magnitude when operating in PicoLog mode. In this execution mode, the architecture commits chunks in a deterministic order, e.g., round robin. Although this execution mode sacrifices performance, DeLorean only needs to log the chunks that end due to non-deterministic events, such as interrupts.

Making Memory Race Recorders Practical for Modern CMPs
The MRR approaches discussed so far are effective in logging the events required for deterministic replay, but they also impose a non-negligible amount of complexity and performance cost that can preclude hardware vendors from deploying similar solutions in real products. In this section, we pinpoint some of these issues and discuss possible ways to remedy them.

Implementation Complexity
The showstoppers with previous MRR approaches are the implementation complexity of the proposed techniques and their associated hardware cost.

With point-to-point approaches, for instance, hardware real estate for storing the timestamp of each accessed memory block is required. If the granularity of a memory block is a cache line, then each cache line must be augmented with storage for the timestamp. Because a cache line eviction throws away the information stored in it, the timestamp must also be stored at the next cache level to reduce logging frequency. FDR estimates this cost to be ~6 percent of the capacity of a 32KB L1 cache.

The main hardware cost associated with chunk-based approaches lies in the storage required for signatures. In Rerun, for instance, the authors suggest using 1024-bit signatures to store read memory addresses and 256-bit signatures to store written memory addresses. In contrast to point-to-point approaches, these changes do not require modifications to the cache sub-system and are therefore less invasive. However, there is some complexity involved in implementing signatures. The authors in [30] show that implementing signatures in modern CMPs involves subtle interactions with the hierarchy and policy decisions of on-chip caches. They show that signature placement in a multi-level cache hierarchy can degrade performance by increasing the traffic to the caches, and they propose hybrid L1/L2 signature placement strategies to mitigate this performance degradation.


Performance Overhead
With the exception of DeLorean, all MRR approaches discussed in this section must piggyback information on cache coherence messages to maintain ordering among events in the system. For instance, in FDR, the core ID and the CIC are piggybacked on each coherence reply to log a point-to-point dependency between two instructions from different threads, whereas in Rerun a timestamp is piggybacked on each coherence reply to maintain causal ordering between chunks. This overhead can hurt performance by putting a lot of pressure on the coherence bandwidth. FDR and Rerun, for instance, report a performance cost of ~10 percent, an estimate based on functional simulation. Ideally, we want this coherence traffic overhead to be nonexistent in real implementations of MRR. One way to attain this objective with a chunk-based approach has recently been proposed in [30]. The authors make the observation that maintaining causality at chunk boundaries is all that is needed to order chunks. Doing so eliminates the requirement to piggyback a timestamp on each coherence message. Using this approach, they show that the coherence traffic overhead can be reduced by several orders of magnitude compared to Rerun or FDR.

Replay Performance
As discussed previously, there are plenty of applications that can benefit from deterministic replay. Each of these applications places different replay speed requirements on the system. For instance, while an application developer can easily accommodate slow replay during debugging, this is not the case for high-availability applications, in which the downtime window during recovery must be short. Slow replay in this case can have devastating effects on the system. Instead, we would like a second machine to continuously replay the execution of the primary machine at a similar speed, and to be able to take over instantly if the primary fails. Ideally, we do not want an R&R system to be constrained by speed, because such a constraint would limit the system’s scope and restrict its applicability. Therefore, techniques are needed to improve the replay speed of MRR approaches.

DeLorean and FDR can replay a program at production-run speed. In DeLorean, this is achieved by replaying chunks in parallel and re-synchronizing them according to the same commit order as recorded during their original execution. With FDR, threads are replayed in parallel and are only re-synchronized at the locations corresponding to the point-to-point dependencies recorded during their original execution. Neither FDR nor DeLorean, however, is a likely choice for a practical MRR implementation today. As discussed previously, the complexity of FDR is a major showstopper in modern CMPs. For DeLorean, the execution environment it assumes is not standard in today’s multi-processors.


Replaying episodes in Rerun is done sequentially, following increasing timestamp order. As such, Rerun cannot meet the replay speed requirements of DMR usage models such as fault tolerance. As an alternative to Rerun, the authors in [30] have proposed a chunk-based replay scheme called a concurrent chunk region. A concurrent chunk region defines a set of chunks that can be replayed in parallel, because each chunk in such a region features the same timestamp as the chunks in other regions. To build such concurrent chunk regions, whenever a chunk must terminate, due to a conflict for instance, all chunks with similar timestamps must also be terminated simultaneously. Therefore, concurrent chunk regions trade off replay speed for log size. The authors in [30] have shown that, by using concurrent chunk regions, replay speed can be improved by several orders of magnitude at the cost of moderate log size increases.

Conclusions
In this article we presented a comprehensive survey of DMR techniques for dealing with multi-threaded program execution on CMP machines. We showed that software-only implementations of DMR are quite effective in recording and replaying concurrent programs, but they suffer from performance limitations that can restrict their applicability. To improve performance, the memory non-determinism of multi-threaded programs must be recorded more efficiently. We described the hardware support needed to deal with fine-grained logging of memory interleavings more efficiently, using either point-to-point approaches or chunk-based approaches. Combined with software approaches, these hardware techniques can provide better performance and address a wider range of usage models. However, several challenges remain to be met before a complete solution can be deployed on real hardware. One such challenge involves recording memory non-determinism under non-sequentially-consistent memory models. We hope that the discussions presented here help foster research on DMR and stimulate a broader interest in DMR usage models.


References

[1] H. Agrawal. “Towards automatic debugging of computer programs.” PhD thesis, Department of Computer Sciences, Purdue University, West Lafayette, Indiana, 1991.

[2] G. Altekar and I. Stoica. “ODR: Output-deterministic replay for multicore debugging.” In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pages 193-206, 2009.

[3] S. Bhansali, W.-K. Chen, S. de Jong, A. Edwards, R. Murray, M. Drinic, D. Mihocka, and J. Chau. “Framework for instruction-level tracing and analysis of program executions.” In Proceedings of the 2nd International Conference on Virtual Execution Environments, pages 154-163, 2006.

[4] B. Boothe. “A fully capable bidirectional debugger.” ACM SIGSOFT Software Engineering Notes, 25(1), pages 36-37, 2000.

[5] T. C. Bressoud and F. B. Schneider. “Hypervisor-based fault tolerance.” In Proceedings of the 15th ACM Symposium on Operating Systems Principles, pages 1-11, 1995.

[6] K. Buchacker and V. Sieh. “Framework for testing the fault-tolerance of systems including OS and network aspects.” In Proceedings of the 6th IEEE International Symposium on High-Assurance Systems Engineering: Special Topic: Impact of Networking, pages 95-105, 2001.

[7] L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas. “BulkSC: Bulk enforcement of sequential consistency.” ACM SIGARCH Computer Architecture News, 35(2), pages 278-289, 2007.

[8] J. Choi and H. Srinivasan. “Deterministic replay of Java multithreaded applications.” In Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, pages 48-59, 1998.

[9] J. Chow, T. Garfinkel, and P. M. Chen. “Decoupling dynamic program analysis from execution in virtual environments.” In Proceedings of the USENIX Annual Technical Conference, pages 1–14, 2008.

[10] P. Courtois, F. Heymans, and D. Parnas. “Concurrent control with readers and writers.” Communications of the ACM, 14(10), pages 667-668, 1971.

[11] G. Dunlap. “Execution replay for intrusion analysis.” PhD thesis, EECS Department, University of Michigan, Ann Arbor, Michigan, 2006. Available at http://www.eecs.umich.edu/~pmchen/papers/dunlap06.pdf


[12] S. King, G. Dunlap, and P. Chen. “Operating system support for virtual machines.” In Proceedings of the 2003 USENIX Technical Conference, pages 71-84, 2003.

[13] G. Dunlap, S. King, S. Cinar, M. Basrai, and P. Chen. “ReVirt: Enabling intrusion analysis through virtual-machine logging and replay.” ACM SIGOPS Operating Systems Review, 36(SI), pages 211-224, 2002.

[14] G. Dunlap, D. Lucchetti, M. Fetterman, and P. Chen. “Execution replay of multiprocessor virtual machines.” In Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 121-130, 2008.

[15] S. Feldman and C. Brown. “IGOR: a system for program debugging via reversible execution.” In Proceedings of the 1988 ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, pages 112-123, 1988.

[16] D. Geels, G. Altekar, S. Shenker, and I. Stoica. “Replay debugging for distributed applications.” In Proceedings of the USENIX ‘06 Annual Technical Conference, 27, 2006.

[17] Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, F. Kaashoek, and Z. Zhang. “R2: An application-level kernel for record and replay.” In Proceedings of the 8th USENIX Symposium on Operating System Design and Implementation, pages 193-208, 2008.

[18] D. Hower and M. Hill. “Rerun: Exploiting episodes for lightweight memory race recording.” ACM SIGARCH Computer Architecture News, 36(3), pages 265-276, 2008.

[19] A. Joshi, S. King, G. Dunlap, and P. Chen. “Detecting past and present intrusions through vulnerability-specific predicates.” In Proceedings of the Twentieth ACM Symposium on Operating Systems Principles, pages 91-104, 2005.

[20] S. King, G. W. Dunlap, and P. M. Chen. “Debugging operating systems with time-traveling virtual machines.” In Proceedings of the USENIX Annual Technical Conference, 1, 2005.

[21] L. Lamport. “Time, clocks and the ordering of events in a distributed system.” Communications of the ACM, 21(7), pages 558-565, 1978.

[22] J. Larus and R. Rajwar. Transactional Memory. Morgan & Claypool, 2006.


[23] T. LeBlanc and J. Mellor-Crummey. “Debugging parallel programs with instant replay.” IEEE Transactions on Computers, 36(4), pages 471-482, 1987.

[24] P. Montesinos, L. Ceze, and J. Torrellas. “DeLorean: Recording and deterministically replaying shared-memory multiprocessor execution efficiently.” In Proceedings of the 35th International Symposium on Computer Architecture, pages 289-300, 2008.

[25] P. Montesinos, M. Hicks, S. T. King, and J. Torrellas. “Capo: Abstractions and software-hardware interface for hardware-assisted deterministic multiprocessor replay.” In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 73-84, 2009.

[26] S. Narayanasamy, G. Pokam, and B. Calder. “Bugnet: Continuously recording program execution for deterministic replay debugging.” In Proceedings of the 32nd Annual International Symposium on Computer Architecture, pages 284-295, 2005.

[27] M. Olszewski, J. Ansel, and S. Amarasinghe. “Kendo: Efficient deterministic multithreading in software.” In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 97-108, 2009.

[28] S. Park, Y. Zhou, W. Xiong, Z. Yin, R. Kaushik, K. Lee, and S. Lu. “PRES: Probabilistic replay with execution sketching on multiprocessors.” In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pages 177-192, 2009.

[29] C. Pereira. “Reproducible user-level simulation of multi-threaded workloads.” PhD thesis, Department of Computer Science and Engineering, University of California – San Diego, San Diego, California, 2007. Available at http://cseweb.ucsd.edu/~calder/papers/thesis-cristiano.pdf.

[30] G. Pokam, C. Pereira, K. Danne, R. Kassa, and A. Adl-Tabatabai. “Architecting a chunk-based memory race recorder in modern CMPs.” In Proceedings of the 42nd International Symposium on Microarchitecture, 2009.

[31] M. Prvulovic. “CORD: Cost-effective (and nearly overhead-free) order-recording and data race detection.” In Proceedings of the Twelfth International Symposium on High-Performance Computer Architecture, 2006.


[32] M. Ronsse and K. De Bosschere. “RecPlay: a fully integrated practical record/replay system.” ACM Transactions on Computer Systems, 17(2), pages 133–152, 1999.

[33] M. Russinovich and B. Cogswell. “Replay for concurrent non-deterministic shared-memory applications.” In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, pages 258-266, 1996.

[34] Y. Saito. “Jockey: A user-space library for record-replay debugging.” In Proceedings of the Sixth International Symposium on Automated Analysis- Driven Debugging, pages 69-76, 2005.

[35] J. Slye and E. Elnozahy. “Supporting nondeterministic execution in fault-tolerant systems.” In Proceedings of the Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing, page 250, 1996.

[36] S. Srinivasan, S. Kandula, C. Andrews, and Y. Zhou. “Flashback: A lightweight extension for rollback and deterministic replay for software debugging.” In Proceedings of the USENIX Annual Technical Conference, 3, 2004.

[37] M. Xu, R. Bodik, and M. Hill. “A flight data recorder for enabling full-system multiprocessor deterministic replay.” In Proceedings of the 30th Annual International Symposium on Computer Architecture, pages 122-135, 2003.

[38] M. Xu, V. Malyugin, J. Sheldon, G. Venkitachalam, B. Weissman, and VMWare Inc. “Retrace: Collecting execution trace with virtual machine deterministic replay.” In Proceedings of the 3rd Annual Workshop on Modeling, Benchmarking and Simulation, 2007.

[39] M. Zelkowitz. “Reversible execution.” Communications of the ACM, 16(9), page 566, 1973.


Authors’ Biographies

Gilles Pokam is a Senior Research Scientist in the Microprocessor & Programming Research Group at Intel Labs. His research interests are in multi-core architectures and software, with a current focus on concurrent programming and programmer productivity. Before joining Intel Labs, he was a researcher at the IBM T.J. Watson Research Center in NY and a postdoctoral research scientist at the University of California, San Diego. He received a PhD degree from INRIA and the University of Rennes I in France. He is a recipient of the IEEE MICRO Top Picks Award 2006, which recognizes the most significant papers in computer architecture. Gilles is a member of IEEE and ACM. His e-mail is gilles.a.pokam at intel.com.

Cristiano Pereira is an Engineer in the Technology Pathfinding and Innovation team at Intel’s Software and Services Group. His research interests are in the areas of hardware support for better programmability of multi-core architectures and software tools to improve programmers’ productivity. He received a PhD degree from the University of California, San Diego, in 2007 and a Masters degree from the Federal University of Minas Gerais, Brazil, in 2000. Prior to that, Cristiano worked for a number of small companies in Brazil. He is a member of IEEE. His e-mail is cristiano.l.pereira at intel.com.

Klaus Danne is an Engineer in the Microprocessor & Programming Research Group at Intel Labs. His research interests are in multi-core architectures, deterministic replay, design emulation, and reconfigurable computing systems. He received a PhD degree and a Masters degree from the University of Paderborn, Germany, in 2006 and 2002, respectively. His e-mail is klaus.danne at intel.com.

Lynda Yang is a graduate student in Computer Science at the University of Illinois at Urbana-Champaign. Her research interests are in multi-processor architectures and operating systems. She received a BS degree in Computer Science in 2008 from the University of North Carolina at Chapel Hill. Her e-mail is yang61 at illinois.edu.

Samuel T. King is an Assistant Professor in the Computer Science Department at the University of Illinois. His research interests include security, experimental software systems, operating systems, and computer architecture. His current research focuses on defending against malicious hardware, deterministic replay, designing and implementing secure web browsers, and applying machine learning to systems problems. Sam received his PhD degree in Computer Science and Engineering from the University of Michigan in 2006.

40 | Hardware and Software Approaches for Deterministic Multi-processor Replay of Concurrent Programs Intel® Technology Journal | Volume 13, Issue 4, 2009

Josep Torrellas is a Professor of Computer Science and Willett Faculty Scholar at the University of Illinois, Urbana-Champaign. He received a PhD degree from Stanford University in 1992. His research area is multi-processor computer architecture. He has participated in the Stanford DASH and the Illinois Cedar experimental multi-processor projects, and in several DARPA initiatives in novel computer architectures. Currently, he leads the Bulk Multicore Architecture project for programmability in collaboration with Intel. He has published over 150 papers in computer architecture and received several best-paper awards. He has graduated 27 PhD students, many of whom are now leaders in academia and industry. He is an IEEE Fellow.

Copyright
Copyright © 2009 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.


A Programming Model for Heterogeneous Intel® x86 Platforms

Contributors

Bratin Saha, Intel Corporation
Xiaocheng Zhou, Intel Corporation
Hu Chen, Intel Corporation
Ying Gao, Intel Corporation
Shoumeng Yan, Intel Corporation
Sai Luo, Intel Corporation

Index Words: Heterogeneous Platforms, Programming Model, Larrabee, Shared Virtual Memory, Runtime, GPGPU

Abstract
The client computing platform is moving towards a heterogeneous architecture that consists of a combination of cores focused on scalar performance and a set of throughput-oriented cores. The throughput-oriented cores (such as those in the Intel® microarchitecture codename Larrabee processor) may be connected over both coherent and non-coherent interconnects, and they may have different instruction set architectures (ISAs). This article describes a programming model for such heterogeneous platforms. We discuss the language constructs, runtime implementation, and the memory model for such a programming environment. We implemented this programming environment in an Intel® x86 heterogeneous platform simulator and we ported a number of workloads to our programming environment. We present the performance of our programming environment on these workloads.

Introduction
Client computing platforms are moving towards a heterogeneous architecture with some processing cores focused on scalar performance and other cores focused on throughput performance. For example, desktop and notebook platforms may ship with one or more central processing units (CPUs), primarily focused on scalar performance, along with a graphics processing unit (GPU) that can be used for accelerating highly-parallel data kernels. These heterogeneous platforms can be used to provide a significant performance boost on highly-parallel non-graphics workloads in image processing, medical imaging, data mining [6], and other domains [10]. Several vendors have also come out with programming environments for such platforms, such as CUDA [11], CTM [2], and OpenCL [12].

Heterogeneous platforms have a number of unique architectural constraints:

•• The throughput-oriented cores (e.g., GPU) may be connected in both integrated and discrete forms. A system may also have a hybrid configuration where a low-power, lower-performance GPU is integrated with the chipset, and a higher-performance GPU is attached as a discrete device. Finally, a platform may also have multiple discrete GPU cards. The platform configuration determines many parameters, such as bandwidth and latency between the different kinds of processors, the cache coherency support, etc.


•• The scalar and throughput-oriented cores may have different operating systems. For example, Intel’s upcoming processor, codename Larrabee [16], can have its own kernel. (The Larrabee processor is a general-throughput computing device that includes a software stack for high-performance graphics rendering.) This means that the virtual memory translation schemes (virtual to physical address translation) can be different between the different kinds of cores. The same virtual address can be simultaneously mapped to two different physical addresses, one residing in CPU memory and the other residing in the Larrabee processor memory. This also means that the system environment (loaders, linkers, etc.) can be different between the two cores. For example, the loader may load the application at different base addresses on different cores.
•• The scalar and throughput-oriented cores may have different ISAs, and hence the same code cannot be run on both kinds of cores.

A programming model for heterogeneous platforms must address all of the aforementioned architectural constraints. Unfortunately, existing programming models such as Nvidia Compute Unified Device Architecture (CUDA) and ATI Close To the Metal (CTM) address only the ISA heterogeneity, by providing language annotations to mark code that must run on GPUs; they do not take other constraints into account. For example, CUDA does not address the memory management issues between the CPU and GPU. It assumes that the CPU and GPU are separate address spaces and that the programmer uses separate memory allocation functions for the CPU and the GPU. Further, the programmer must explicitly serialize data structures, decide on the sharing protocol, and transfer the data back and forth.

In this article, we propose a new programming model for heterogeneous Intel® x86 platforms that addresses all the issues just mentioned. First, we propose a uniform programming model for different platform configurations. Second, we propose using a shared memory model for all the cores in the platform (e.g., between the CPU and the Larrabee cores). Instead of sharing the entire virtual address space, we propose that only a part of the virtual address space be shared to enable an efficient implementation. Finally, like conventional programming models, we use language annotations to demarcate code that must run on the different cores, but we improve upon conventional models by extending our language support to include features such as function pointers.

We break from existing CPU-GPU programming models and propose a shared memory model, since a shared memory model opens up a completely new programming paradigm that improves overall platform performance. A shared memory model allows pointers and data structures to be seamlessly shared between the different cores (e.g., CPU and Larrabee cores) without requiring any marshalling. For example, consider a game engine that includes physics, artificial intelligence (AI), and rendering. A shared memory model allows the physics, AI, and game logic to be run on the scalar cores (e.g., CPU), while the rendering runs on the throughput cores (e.g., Larrabee cores), with both the scalar and throughput cores sharing the scene graph. Such an execution model is not possible in current programming environments, since the scene graph would have to be serialized back and forth.

We implemented our full programming environment, including the language and runtime support, and ported a number of highly parallel non-graphics workloads to this environment. In addition, we ported a full gaming application to our system. Using existing models, we spent one and a half weeks of data-management coding for each new game feature, and the game itself included about a dozen features. The serialization arises since the rendering is performed on the Larrabee processor, while the physics and game logic are executed on the CPU. All of this coding (and the associated debugging, etc.) is unnecessary in our system, since the scene graph and associated structures are placed in shared memory and used concurrently by all the cores in the platform. Our implementation works with different operating system kernels running on the scalar and throughput-oriented cores.

We ported our programming environment to a heterogeneous Intel x86 platform simulator that simulates a set of Larrabee throughput-oriented cores attached as a discrete PCI-Express device to the CPU. We used such a platform for two reasons. First, we believe the Larrabee core is more representative of how GPUs are going to evolve into throughput-oriented cores. Second, the platform poses greater challenges, due to the heterogeneity in the system software stack, as opposed to ISA heterogeneity alone. Later in this article, we present performance results on a variety of workloads.

To summarize, in this article, we discuss the design and implementation of a new programming model for heterogeneous Intel x86 platforms. We make the following contributions:

•• Provide shared memory semantics between the CPU and the Larrabee processor by allowing pointers and data structures to be shared seamlessly. This extends previous work in the areas of distributed shared memory (DSM) and partitioned global address space (PGAS) languages by providing shared memory semantics in a platform with heterogeneous ISAs, operating system kernels, etc. We also improve application performance by allowing user-level communication between the CPU and the Larrabee core.
•• Provide a uniform programming model for different platform configurations.

In the remainder of this article, we first provide a brief overview of the Larrabee architecture; then we discuss the proposed memory model, and we describe the language constructs for programming this platform. We then describe our prototype implementation, and finally we present performance numbers.


Larrabee Architecture
The Larrabee architecture is a many-core x86 visual computing architecture that is based on in-order cores that run an extended version of the x86 instruction set, including wide-vector processing instructions and some specialized scalar instructions. Each of the cores contains a 32 KB instruction cache and a 32 KB L1 data cache, and each core accesses its own subset of a coherent L2 cache to provide high-bandwidth L2 cache access. The L2 cache subset is 256 KB, and the subsets are connected by a high-bandwidth, on-die ring interconnect. Data written by a CPU core are stored in their own L2 cache subset and are flushed from other subsets, if necessary. Each ring data path is 512 bits wide in each direction. The fixed function units and memory controller are spread across the ring to reduce congestion.

Each core has four hardware threads with separate register sets for each thread. Instruction issue alternates between the threads and covers cases where the compiler is unable to schedule code without stalls. The core uses a dual-issue decoder, and the pairing rules for the primary and secondary instruction pipes are deterministic. All instructions can issue on the primary pipe, while the secondary pipe supports a large subset of the scalar x86 instruction set, including loads, stores, simple ALU operations, vector stores, etc. The core supports 64-bit extensions and the full Intel Pentium® processor x86 instruction set. The Larrabee architecture also includes a 16-wide vector processing unit that executes integer, single precision float, and double precision float instructions. The vector unit supports gather-scatter and masked instructions, and it supports instructions with up to three source operands.

Memory Model
The memory model for our system provides a window of shared addresses between the CPU and Larrabee cores, such as in PGAS [17] languages, but enhances it with additional ownership annotations. Any data structure that is shared between the CPU and Larrabee core must be allocated by the programmer in this space. The system provides a special malloc function that allocates data in this space. Static variables can be annotated with a type qualifier so that they are allocated in the shared window. However, unlike PGAS languages, there is no notion of affinity in the shared window. This is because data in the shared space must migrate between the CPU and Larrabee caches as they get used by each processor. Also, the representation of pointers does not change between the shared and private spaces.

The remaining virtual address space is private to the CPU and Larrabee processor. By default, data get allocated in this space, and they are not visible to the other side. We choose this partitioned address space approach since it cuts down on the amount of memory that needs to be kept coherent, and it enables a more efficient implementation for discrete devices.


The proposed memory model can be extended in a straightforward way to multiple Larrabee processor configurations. The window of shared virtual addresses extends across all the devices. Any data structures allocated in this shared address window are visible to all agents, and pointers in this space can be freely exchanged between the devices. In addition, every agent has its own private memory, as shown in Figure 1.

Figure 1: CPU-Larrabee Processor Memory Model (a shared space plus private CPU and LRB spaces)
Source: Intel Corporation, 2009

We propose using release consistency in the shared address space for several reasons. First, the system only needs to remember all the writes between successive release points, not the sequence of individual writes. This makes it easier to do bulk transfers at release points (e.g., several pages at a time). This is especially important in the discrete configuration, since it is more efficient to transfer bulk data over PCI-Express. Second, release consistency allows memory updates to be kept completely local until a release point, which is again important in a discrete configuration. In general, the release consistency model is a good match for the programming patterns in CPU-GPU platforms, since there are natural release and acquisition points in such programs. For example, a call from the CPU into the GPU is one such point. Making any of the CPU updates visible to the GPU before the call does not serve any purpose, and neither does it serve any purpose to enforce any order on how the CPU updates become visible, as long as all of them are visible before the GPU starts executing. Finally, the proposed C/C++ memory model [5] can be mapped easily to our shared memory space.

We augment our shared memory model with ownership rights to enable further coherence optimizations. Within the shared virtual address window, the CPU or Larrabee processor can specify at a particular point in time that it owns a specific chunk of addresses. If an address range in the shared window is owned by the CPU, then the CPU knows that the Larrabee processor cannot access those addresses and hence does not need to maintain coherence of those addresses with the Larrabee processor: for example, it can avoid sending any snoops or other coherence information to the Larrabee processor. The same is true of addresses owned by the Larrabee processor. If a CPU-owned address is accessed by the Larrabee processor, then the address becomes un-owned (with symmetrical behavior for those addresses owned by the Larrabee processor). We provide these ownership rights to leverage common usage models. For example, the CPU first accesses some data (e.g., initializing a data structure), then hands them over to the Larrabee processor (e.g., computing on the data structure in a data parallel manner), and then the CPU analyzes the results of the computation, and so on. The ownership rights allow an application to inform the system of this temporal locality and to optimize the coherence implementation. Note that these ownership rights are optimization hints only, and it is legal for the system to ignore these hints.


Language Constructs
To deal with platform heterogeneity, we add constructs to C that allow the programmer to specify whether a particular data item should be shared or private, and to specify whether a particular code chunk should be run on the CPU or on the Larrabee processor.

The first construct is the shared type qualifier, similar to UPC [17], which specifies a variable that is shared between the CPU and Larrabee processor. The qualifier can also be associated with pointer types to imply that the target of the pointer is in shared space. For example, one can write:

shared int var1;          // var1 is in shared space
int var2;                 // var2 is not in shared space
shared int *ptr1;         // ptr1 points to a shared location
int *ptr2;                // ptr2 points to private space
shared int *shared ptr3;  // ptr3 points to shared space and is itself shared

The compiler allocates globally shared variables in the shared memory space, while the system provides a special malloc function to allocate data in the shared memory. The actual virtual address range in each space is decided by the system and is transparent to the user.

It is legal for a language implementation to allocate all data in the shared space — that is, map all malloc calls to the sharedMalloc and allocate all globals in the shared space. A programmer then deals only with shared data. The key point is that our system provides the hooks needed for programmers to demarcate private and shared data, should they want to do that.
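A minimal sketch of the two allocation paths (sharedMalloc is the special allocator mentioned above; its exact prototype is not given in the article, so the signature below is an assumption):

#include <stdlib.h>

shared void *sharedMalloc(size_t size);   // assumed prototype of the shared-space allocator

void build_tables(int n)
{
    // Private by default: visible only on the side that allocated it.
    int *scratch = (int *) malloc(n * sizeof(int));

    // Allocated in the shared virtual window: usable by both CPU and Larrabee code.
    shared int *table = (shared int *) sharedMalloc(n * sizeof(int));

    for (int i = 0; i < n; i++)
        table[i] = scratch[i] = i;

    free(scratch);
}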

We use an attribute, __attribute(Larrabee), to mark functions that should be executed on the Larrabee processor. For such functions, the compiler generates code that is specific to the Larrabee processor. When a non-annotated function calls a Larrabee-annotated function, it implies a call from the CPU to the Larrabee processor. The compiler checks that all pointer arguments have shared type and invokes a runtime API for the remote call. Function pointer types are also annotated with the attribute notation, implying that they point to functions that are executed on the Larrabee processor. Non-annotated function pointer types point to functions that execute on the CPU. The compiler checks type equivalence during an assignment; for example, a function pointer with the Larrabee attribute must always be assigned the address of a function that is annotated for the Larrabee processor.
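As an illustration (the kernel and all names below are ours, not the article's), a Larrabee-annotated function is called from un-annotated CPU code; the compiler checks that the pointer argument has shared type and turns the call into a remote invocation:

#include <stddef.h>

shared void *sharedMalloc(size_t size);       // assumed allocator prototype

__attribute(Larrabee)
void scale(shared float *data, int n, float factor)
{
    for (int i = 0; i < n; i++)               // executes on the Larrabee cores
        data[i] *= factor;
}

void host_side(int n)                         // un-annotated: executes on the CPU
{
    shared float *data = (shared float *) sharedMalloc(n * sizeof(float));
    for (int i = 0; i < n; i++)
        data[i] = (float) i;
    scale(data, n, 2.0f);                     // CPU-to-Larrabee remote call; also a release/acquire point
}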


Our third construct denotes functions that execute on the CPU but can be called from the Larrabee processor. These functions are denoted by using __attribute(wrapper). We use this attribute in two ways. First, many programs link with precompiled libraries that can execute on the CPU. The functions in these libraries are marked as wrapper calls so that they execute on the CPU if called from Larrabee code. Second, while porting large programs from a CPU-only execution mode to a CPU plus Larrabee processor mode, it is very helpful to port the program incrementally. The wrapper attribute allows the programmer to stop the porting effort at any point in the call tree by calling back into the CPU. When a Larrabee function calls a wrapper function, the compiler invokes a runtime API for the remote call from the Larrabee processor to the CPU. Making Larrabee-to-CPU calls explicit allows the compiler to check that any pointer arguments have the shared type.
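A sketch of the wrapper attribute with illustrative names: a precompiled, CPU-resident routine is exposed to Larrabee code through a wrapper declaration, so calling it from a Larrabee function produces a remote call back to the CPU:

// Precompiled library routine that executes on the CPU.
__attribute(wrapper) void log_status(shared char *msg);

__attribute(Larrabee)
void simulate_step(shared float *state, int n, shared char *msg)
{
    for (int i = 0; i < n; i++)
        state[i] += 1.0f;                     // throughput-heavy work on the Larrabee cores
    log_status(msg);                          // Larrabee-to-CPU remote call through the wrapper
}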

We also provide a construct and the corresponding runtime support for making asynchronous calls from the CPU to the Larrabee processor. This allows the CPU to avoid waiting for the Larrabee computation to finish. Instead, the runtime system returns a handle that the CPU can query for completion. Since this does not introduce any new design issues, we focus mostly on synchronous calls in the remainder of this article.

Data Annotation Rules
These rules apply to data that can be allocated in the shared virtual space:

•• Shared can be used to qualify the type of variables with global storage. Shared cannot be used to qualify a variable with automatic storage unless it qualifies a pointer’s referenced type.
•• A pointer in private space can point to any space. A pointer in shared space can only point to shared space, not to private space.
•• A structure or union type can have the shared qualifier, which then requires all fields to have the shared qualifier as well.

The following rules apply to pointer manipulations (a short sketch follows the list):

•• A binary operator (+, -, ==, etc.) is only allowed between two pointers pointing to the same space. The system provides API functions that perform dynamic checks. When an integer expression is added to or subtracted from a pointer, the result has the same type as the pointer.
•• Assignment/casting from pointer-to-shared to pointer-to-private is allowed. If a type is not annotated, we assume that it denotes a private object. This makes it difficult to pass shared objects to legacy functions, since their signatures require private objects. The cast allows us to avoid copying between private and shared spaces when passing shared data to a legacy function.
•• Assignment/casting from pointer-to-private to pointer-to-shared is allowed only through a dynamic_cast. The dynamic_cast checks at runtime that the pointer-to-shared actually points to shared space. If the check fails, an error is thrown and the user has to explicitly copy the data from private space to shared space. This cast allows legacy code to efficiently return values.
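The sketch below illustrates both casting directions. The article does not give the concrete syntax of the checked private-to-shared cast, so the dynamic_cast form shown here, like the helper names, is an assumption:

#include <stddef.h>

shared void *sharedMalloc(size_t size);      // assumed allocator prototype
void legacy_fill(int *p, int n);             // precompiled legacy function expecting private pointers

void casting_example(void)
{
    shared int *sp = (shared int *) sharedMalloc(4 * sizeof(int));

    // Shared-to-private cast: allowed, so legacy code can operate on shared data in place.
    int *pp = (int *) sp;
    legacy_fill(pp, 4);

    // Private-to-shared: only through a checked cast that verifies the target lies in the shared window.
    shared int *sp2 = dynamic_cast<shared int *>(pp);
    (void) sp2;
}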


Our language can allow casting between the two spaces (with possibly a dynamic check) since our data representation remains the same regardless of whether the data are in shared or private space. Even pointers have the same representation regardless of whether they are pointing to private or shared space. Given any virtual address V in the shared address window, both the CPU and Larrabee processor have their own local physical address corresponding to this virtual address. Pointers on the CPU and Larrabee processor read from this local copy of the address, and the local copies get synced up as required by the memory model.

Code Annotation Rules
These rules govern where code and functions execute:

•• A __attribute(Larrabee) function is not allowed to call a non-annotated function. This ensures that the compiler knows about all the CPU functions that can be called from the Larrabee processor.
•• A __attribute(wrapper) function is not allowed to call into a __attribute(Larrabee) function. This is primarily an implementation restriction in our system.
•• Any pointer parameter of a Larrabee- or wrapper-annotated function must point to shared space.

The calling rules for functions also apply to function pointers. For example, a __attribute(Larrabee) function pointer called from a non-annotated function results in a CPU-to-Larrabee processor call. Similarly, un-annotated function pointers cannot be called from Larrabee functions.

The runtime also provides APIs for mutexes and barriers to allow the application to perform explicit synchronization. These constructs are always allocated in the shared area.

Acquire and release points follow naturally from the language semantics. For example, a call from the CPU to the Larrabee processor is a release point on the CPU followed by an acquire point on the Larrabee processor. Similarly, a return from the Larrabee processor is a release point on the Larrabee and an acquire point on the CPU. Taking ownership of a mutex and releasing a mutex are acquire and release points, respectively, for the processor doing the mutex operation, while hitting a barrier and getting past a barrier are release and acquire points as well.


Ownership Constructs
We also provide an API that allows the user to allocate chunks of memory (hereafter called arenas) inside the shared virtual region. The programmer can dynamically set the ownership attributes of an arena. Acquiring ownership of an arena also acts as an acquire operation on the arena, and releasing ownership of an arena acts as the release operation on the arena. The programmer can allocate space within an arena by passing the arena as an argument to a special malloc function. The runtime grows the arena as needed. The runtime API is shown here:

// Owned by the caller or shared
Arena *allocateArena(OwnershipType type);
// Allocate and free within an arena
shared void *arenaMalloc(Arena *, size_t);
void arenaFree(Arena *, shared void *);
// Ownership for an arena. If null, changes ownership of the entire shared area
OwnershipType acquireOwnership(Arena *);
OwnershipType releaseOwnership(Arena *);
// Consistency for an arena
void arenaAcquire(Arena *);
void arenaRelease(Arena *);

The programmer can optimize the coherence implementation by using the ownership API. For example, in a gaming application, while the CPU is generating a frame, the Larrabee processor may be rendering the previous frame. To leverage this pattern, the programmer can allocate two arenas, with the CPU acquiring ownership of one arena and generating the frame into that arena, while the Larrabee processor acquires ownership of the other arena and renders the frame in that arena. This prevents coherence messages from being exchanged between the CPU and Larrabee processor while the frames are being processed. When the CPU and Larrabee processor are finished with their current frames, they exchange ownership of their arenas, so that they continue to work without incurring coherence overhead.
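A sketch of this double-buffered frame pattern using the arena API just listed; the ownership-type constant and the frame functions are placeholders of ours:

// Placeholders: generate_frame() fills an arena on the CPU (via arenaMalloc),
// and submit_render() makes the Larrabee-annotated call whose callee acquires
// ownership of the arena it renders from.
void generate_frame(Arena *a);
void submit_render(Arena *a);

void run_frames(int nframes)
{
    Arena *frameA = allocateArena(OWNED_BY_CPU);   // illustrative OwnershipType value
    Arena *frameB = allocateArena(OWNED_BY_CPU);

    for (int i = 0; i < nframes; i++) {
        Arena *gen  = (i & 1) ? frameB : frameA;   // CPU generates the new frame here
        Arena *draw = (i & 1) ? frameA : frameB;   // Larrabee renders the previous frame

        acquireOwnership(gen);    // also an acquire on gen; no coherence traffic while the CPU writes
        generate_frame(gen);
        releaseOwnership(gen);    // also a release; the frame is now visible to the other side

        submit_render(draw);      // the Larrabee side takes ownership of draw and renders it
    }
}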

Implementation
The compiler generates two binaries: one for execution on the Larrabee processor and another for CPU execution. We generate two different executables since the two operating systems can have different executable formats. The Larrabee binary contains the code that will execute on the Larrabee processor (functions annotated with the Larrabee attribute), while the CPU binary contains the CPU functions, which include all un-annotated and wrapper-annotated functions. Our runtime library has CPU and Larrabee components that are linked with the CPU and Larrabee application binaries to create the CPU and Larrabee executables. When the CPU binary starts executing, it calls a runtime function that loads the Larrabee executable. Both the CPU and Larrabee binaries create a daemon thread that is used for communication.


Implementing Shared Memory between the CPU and Larrabee Processor
Our implementation reuses some ideas from software distributed shared memory [9, 4] schemes, but there are some significant differences as well. Unlike DSMs, our implementation is complicated by the fact that the CPU and Larrabee processor can have different page tables and different virtual-to-physical memory translations. Thus, when we want to sync up the contents of virtual address V between the CPU and Larrabee processor (e.g., at a release point), we may need to sync up the contents of different physical addresses, such as P1 on the CPU and P2 on the Larrabee processor. Unfortunately, the CPU does not have access to the Larrabee processor’s page tables (hence the CPU has no knowledge of P2), and the Larrabee processor cannot access the CPU’s page tables and has no knowledge of P1.

We solve the aforementioned problem by leveraging the PCI aperture in a novel way. During initialization we map a portion of the PCI aperture space into the user space of the application and instantiate it with a task queue, a message queue, and copy buffers. When we need to copy pages, for example, from the CPU to the Larrabee processor, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier. On the Larrabee side, the daemon thread copies the contents of the buffers into its address space by using the virtual address tag. Thus, we perform the copy in a two-step process: the CPU copies from its address space into a common buffer (the PCI aperture) that both the CPU and Larrabee processor can access, while the Larrabee processor picks up the pages from the common buffer into its address space. Copies from the Larrabee processor to the CPU are done in a similar way. Note that since the aperture is pinned memory, the contents of the aperture are not lost if the CPU or Larrabee processor gets context-switched out. This allows the two processors to execute asynchronously, which is critical, since the two processors can have different operating systems and hence the context switches cannot be synchronized. Finally, note that we map the aperture space into the user space of the application, thus enabling user-level communication between the CPU and Larrabee processor. This makes the application stack vastly more efficient than going through the operating system driver stack. To ensure security, the aperture space is partitioned among the CPU processes that want to use the Larrabee processor. At present, a maximum of eight processes can use the aperture space.
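A rough sketch of the kind of copy-buffer descriptor this two-step scheme implies; the field names and sizes are our assumptions, not the article's:

#include <stdint.h>

#define COPY_BUF_PAGES 16                     // illustrative capacity

// One copy buffer in the pinned PCI aperture, filled by the sending runtime
// and drained by the receiving side's daemon thread.
typedef struct {
    uint64_t virt_addr;                       // virtual address tag of the first page
    uint32_t pid;                             // process identifier of the sender
    uint32_t num_pages;                       // number of pages staged in 'data'
    uint8_t  data[COPY_BUF_PAGES * 4096];     // page contents
} CopyBuffer;

// Step 1: the sender copies pages from its own address space into 'data'.
// Step 2: the receiving daemon matches virt_addr/pid and copies the pages into
//         its own address space, using its own virtual-to-physical mapping.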


We exploit one other difference between traditional software DSMs and CPU plus Larrabee processor platforms. Traditional DSMs were designed to scale on medium-to-large clusters. In contrast, CPU plus Larrabee processor systems are very small-scale clusters. We expect that no more than a handful of Larrabee cards and CPU sockets will be used well into the future. Moreover, the PCI aperture provides a convenient shared physical memory space between the different processors. Thus, we are able to centralize many data structures and make the implementation more efficient. For example, we put a directory in the PCI aperture that contains metadata about the pages in the shared address region. The metadata record whether the CPU or Larrabee processor holds the golden copy of a page (the home for the page), a version number that tracks the number of updates to the page, mutexes that are acquired before updating the page, and other miscellaneous information. The directory is indexed by the virtual address of a page. Both the CPU and the Larrabee runtime systems maintain a private structure that contains local access permissions for the pages and the local version numbers of the pages.
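A directory entry of this kind might look roughly as follows; the layout is our guess, since the article only lists what the metadata contain:

#include <stdint.h>

typedef enum { HOME_CPU, HOME_LRB } HomeNode;

// One entry per shared page, stored in the PCI aperture and indexed by the
// page's virtual address. Each side also keeps a private table of local
// access permissions and local version numbers for the same pages.
typedef struct {
    uint64_t page_vaddr;      // virtual address of the page
    HomeNode home;            // which side holds the golden copy
    uint32_t version;         // incremented on every released update
    uint32_t lock;            // mutex word taken before updating the page/entry
} DirectoryEntry;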

When the Larrabee processor performs an acquire operation, the corresponding pages are set to no-access on the Larrabee processor. At a subsequent read operation, the page fault handler on the Larrabee processor copies the page from the home location, if the page has been updated and released since the last Larrabee acquire. The directory and private version numbers are used to determine this. The page is then set to read-only. At a subsequent write operation, the page fault handler creates a backup copy of the page, marks the page as read-write, and increments the local version number of the page. At a release point, we perform a diff with the backup copy of the page and transmit the changes to the home location, while incrementing the directory version number. The CPU operations are symmetrical. Thus, between acquire and release points, the Larrabee processor and CPU operate out of their local memory and communicate with each other only at the explicit synchronization points.
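In pseudocode, the Larrabee-side sequence looks roughly like this (the CPU side is symmetrical); every helper below is a stand-in for runtime internals, not a real API:

typedef struct Page Page;                         // opaque handle for one shared page
enum { NO_ACCESS, READ_ONLY, READ_WRITE };

// Stand-in declarations for the runtime's internals.
void set_all_shared_pages(int prot);
void set_page_protection(Page *pg, int prot);
int  directory_version(Page *pg);
int  local_version(Page *pg);
void fetch_page_from_home(Page *pg);
void make_backup_copy(Page *pg);
void bump_local_version(Page *pg);
void diff_and_send(Page *pg);
void bump_directory_version(Page *pg);

void on_acquire(void)
{
    set_all_shared_pages(NO_ACCESS);              // force a fault on the first touch of each page
}

void on_read_fault(Page *pg)
{
    if (directory_version(pg) > local_version(pg))
        fetch_page_from_home(pg);                 // page was updated and released since our last acquire
    set_page_protection(pg, READ_ONLY);
}

void on_write_fault(Page *pg)
{
    make_backup_copy(pg);                         // needed for the diff at the next release
    set_page_protection(pg, READ_WRITE);
    bump_local_version(pg);
}

void on_release(Page *dirty[], int ndirty)
{
    for (int i = 0; i < ndirty; i++) {
        diff_and_send(dirty[i]);                  // transmit only the changed bytes to the home copy
        bump_directory_version(dirty[i]);
    }
}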

At startup the implementation decides the address range that will be shared between the CPU and Larrabee processor, and it ensures that this address range always remains mapped. This address range can grow dynamically and does not have to be contiguous, although in a 64-bit address space the runtime system can reserve a contiguous chunk upfront.

Implementing Shared Memory Ownership
Every arena has associated metadata that identify the pages that belong to the arena. Suppose the Larrabee processor acquires ownership of an arena; we then make the corresponding pages non-accessible on the CPU. We copy from the home location any arena pages that have been updated and released since the last time the Larrabee processor performed an acquire operation. We set the pages to read-only so that subsequent Larrabee writes will trigger a fault, and the system can record which Larrabee pages are being updated. In the directory, we note that the Larrabee processor is the home node for the arena pages. On a release operation, we simply make the pages accessible again on the other side and update the directory version number of the pages. The CPU ownership operations are symmetrical.


Note the performance advantages of acquiring ownership. First, at a release point we no longer need to perform diff operations, and we do not need to create a backup copy at a write fault, since we know that the other side is not updating the page. Second, since the user provides specific arenas to be handed over from one side to the other, the implementation can perform a bulk copy of the required pages at an acquire point. This leads to a more efficient copy operation, since the setup cost is incurred only once and gets amortized over a larger copy.

Implementing Remote Calls
A remote call between the CPU and Larrabee cores is complicated by the fact that the two processors can have different system environments: for example, they can have different loaders. Larrabee and CPU binary code is also loaded separately and asynchronously. Suppose that the CPU code makes some calls into the Larrabee processor. When the CPU binary code is loaded, the Larrabee binary code has still not been loaded, and hence the addresses for Larrabee functions are not yet known. Therefore, the operating system loader cannot patch up the references to Larrabee functions in the CPU binary code. Similarly, when the Larrabee binary code is being loaded, the Larrabee loader does not know the addresses of any CPU functions being called from Larrabee code and hence cannot patch those addresses.

We implement remote calls by using a combination of compiler and runtime techniques. Our language rules ensure that any function involved in remote calls (Larrabee- or wrapper-attribute functions) is annotated by the user. When compiling such functions, the compiler adds a call to a runtime API that registers function addresses dynamically. The compiler creates an initialization function for each file that invokes all the different registration calls. When the binary code gets loaded, the initialization function in each file gets called. The shared address space contains a jump table that is populated dynamically by the registration function. The table contains one slot for every annotated function. The format of every slot is <funcName, funcAddr>, where funcName is a literal string of the function name, and funcAddr is the runtime address of the function.
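A sketch of the jump table slot and the registration/lookup calls this implies; the names and layout are assumptions:

// One slot per annotated function, living in the shared address window.
typedef struct {
    const char *funcName;     // literal string name of the annotated function
    void       *funcAddr;     // address of the function in the binary that registered it
} JumpSlot;

// Emitted by the compiler into each file's initialization function and invoked
// when that binary is loaded.
void registerFunction(const char *funcName, void *funcAddr);

// Used by compiler-generated call stubs on the other side to resolve a name.
void *lookupFunction(const char *funcName);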

The translation scheme works as follows (a sketch of a generated stub follows the list):

•• If a Larrabee (CPU) function is being called within a Larrabee (CPU) function, the generated code will do the call as is.
•• If a Larrabee function is being called within a CPU function, the compiler-generated code will do a remote call to the Larrabee processor:
▪▪ The compiler-generated code will look up the jump table with the function name and obtain the function address.
▪▪ The generated code will pack the arguments into an argument buffer in shared space. It will then call a dispatch routine on the Larrabee side, passing in the function address and the argument buffer address.
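For concreteness, a CPU-side stub for a call to a Larrabee-annotated function foo(shared float *data, int n) might look like the following; the packed-argument layout and the runtime entry points are ours, not the article's:

#include <stddef.h>

typedef struct { shared float *p0; int i1; } FooArgs;        // illustrative packed-argument layout

shared void *sharedMalloc(size_t size);                      // assumed allocator prototype
void *lookupFunction(const char *funcName);                  // jump table lookup (see sketch above)
void dispatchOnLarrabee(void *funcAddr, shared void *args);  // assumed runtime dispatch entry point

void foo_cpu_stub(shared float *data, int n)
{
    void *addr = lookupFunction("foo");                      // resolve the Larrabee-side address by name
    shared FooArgs *args = (shared FooArgs *) sharedMalloc(sizeof(FooArgs));
    args->p0 = data;                                         // pack arguments into shared space
    args->i1 = n;
    dispatchOnLarrabee(addr, args);                          // hand off to the Larrabee daemon
}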


There is a similar process for a wrapper function: if a wrapper function is called in Larrabee code, a remote call is made to the CPU.

For function pointer invocations, the translation scheme works as follows. When a function pointer with the Larrabee annotation is assigned, the compiler-generated code will look up the jump table with the function name and assign the function pointer the obtained function address. Although the lookup can be optimized when a Larrabee-annotated function pointer is assigned within Larrabee code, we forsake such optimization to use a single strategy for all function pointer assignments. If a Larrabee function pointer is being called within a Larrabee function, the compiler-generated code will do the call as is. If a Larrabee function pointer is being called within a CPU function, the compiler-generated code will do a remote call to the Larrabee side. The process is similar for a wrapper function pointer: if a wrapper function pointer is called in a Larrabee function, a remote call is made to the CPU side.

The signaling between CPU and Larrabee processor happens with task queues in the PCI aperture space. The daemon threads on both sides poll their respective task queues and when they find an entry in the task queue, they spawn a new thread to invoke the corresponding function. The API for remote invocations is described in this code:

// Synchronous and asynchronous remote calls
RPCHandler callRemote(FunctionType, RPCArgType);
int resultReady(RPCHandler);
Type getResult(RPCHandler);
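A usage sketch of the asynchronous path; FunctionType, RPCArgType, and Type are left schematic in the listing above, so the surrounding helper functions here are placeholders:

void do_independent_cpu_work(void);          // placeholder CPU-side work
void consume(Type result);                   // placeholder consumer of the result

void overlap_work(FunctionType lrb_func, RPCArgType packed_args)
{
    RPCHandler h = callRemote(lrb_func, packed_args);   // returns immediately

    do_independent_cpu_work();                          // overlap CPU work with the Larrabee call

    while (!resultReady(h))
        ;                                               // poll for completion (a real program might yield)

    consume(getResult(h));
}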

Finally, the CPU and Larrabee processor cooperate while allocating memory in the shared area. Each processor allocates memory from either side of the shared address window. When one processor consumes half of the space, the two processors repartition the available space.

Experimental Evaluation
We used a heterogeneous platform simulator for measuring the performance of different workloads on our programming environment. This platform simulates a modern out-of-order CPU and a Larrabee system. The CPU simulation uses a memory and architecture configuration similar to that of the Intel® Core™2 Duo processor. The Larrabee system was simulated as a discrete PCI-Express device with an interconnect latency and bandwidth similar to those of PCI-Express 2.0, and the instruction set was modeled on the Intel Pentium® processor. It did not simulate the new Larrabee instructions and parts of the Larrabee memory hierarchy, such as the ring interconnect. The simulator ran a production-quality software stack on the two processors. The CPU ran Windows* Vista*, while the Larrabee processor ran a lightweight operating-system kernel.


We used a number of well-known parallel non-graphics workloads [6] to measure the performance of our system. These include the Black Scholes* financial workload, which does option pricing by using the Black Scholes method; the fast Fourier transform (FFT) workload, which implements a radix-2 FFT algorithm used in many domains, such as signal processing; the Equake* workload, part of SpecOMP*, which performs earthquake modeling and is representative of high-performance computing (HPC) applications; and Art, also part of SpecOMP, which performs image recognition. The reported numbers are based on using the standard input sets for each of the applications. All these workloads were rewritten by using our programming constructs and were compiled with our tool chain.

Figure 2 shows the fraction of total memory accesses that were to shared data in the aforementioned workloads. The vast majority of the accesses were to private data. Note that read-only data accessed by multiple threads were privatized manually. This manual privatization helped in certain benchmarks, such as Black Scholes. It is not surprising that most of the accesses are to private data, since the computation threads in the workloads privatize the data that they operate on to get better memory locality. We expect workloads that scale to a large number of cores to behave similarly, since the programmer must be conscious of data locality and avoid false sharing in order to get good performance. The partial virtual address sharing memory model lets us leverage this access pattern by cutting down on the amount of data that need to be kept coherent.

We next show the performance of our system on the set of workloads. We ran the workloads on a simulated system with 1 CPU core and varied the number of Larrabee cores from 6 to 24. The workload computation was split between the CPU and Larrabee cores, with the compute-intensive portions executed on the Larrabee cores. For example, all the option pricing in Black Scholes and the earthquake simulation in Equake are offloaded to the Larrabee cores. We present the performance improvement relative to a single CPU and a single Larrabee core. Figure 3 compares the performance of our system when the application does not use any ownership calls to the performance when the user optimizes the application further by using ownership calls. The bars labeled “Mine/Yours” represent the performance with ownership calls (Mine implies pages were owned by the CPU, and Yours implies pages were owned by the Larrabee processor). The bars labeled “Ours” represent the performance without any ownership calls.

Figure 2: Percent of Shared Data in Memory Accesses Source: Intel Corporation, 2009


Figure 3 panels: FFT, Art, BlackScholes, and Equake speedups, comparing “Mine/Yours” with “Ours,” for 1 CPU plus 6, 12, or 24 Larrabee cores.

Figure 3: Ownership Performance Comparison Source: Intel Corporation, 2009

As expected, the applications perform better with ownership calls than without. To understand the reason for this, we broke down the overhead of the system when the application was not using any ownership calls. Figure 4 shows the breakdown for Black Scholes. We show the breakdown for only one benchmark, but the ratios of the different overheads are very similar in all the benchmarks.

We break up the overhead into four categories. The first relates to handling page faults: since we use a virtual memory-based shared memory implementation, a read or write to a page after an acquire point triggers a fault. The second relates to the diff operation performed at release points to sync up the CPU and Larrabee copies of a page. The third is the amount of time spent in copying data from one side to the other. The copy operation is triggered from the page fault handler when either processor needs the latest copy of a page. We do not include the copy overhead as part of the page fault overhead, but present it separately, since we believe different optimizations can be applied to optimize it. Finally, the fourth shows the overhead spent in synchronizing messages. Note that in a discrete setting, the Larrabee processor is connected to the CPU over PCI-Express. The PCI-Express protocol does not include atomic read-modify-write operations. Therefore, we have to perform some synchronization and handshaking between the CPU and Larrabee processor by passing messages.

Figure 4: Overhead Breakdown without Ownership for BlackScholes (Page Protect, Diff, Wait for Copy, Send Request)
Source: Intel Corporation, 2009

When the application uses ownership of arenas, the diff overhead is completely eliminated. The page fault handling is reduced, since the write page-fault handler does not have to create a backup copy of the page. Moreover, since we copy all the pages in one step when we acquire ownership of an arena, we do not incur read page faults.


This also significantly reduces the synchronization message overhead, since the CPU and Larrabee processor perform the handshaking only at ownership acquisition points rather than at many intermediate points (e.g., whenever pages are transferred from one side to the other). Figure 5 shows the overhead breakdown with ownership calls.

Figure 5: Overhead Breakdown with Ownership for BlackScholes
Source: Intel Corporation, 2009

Finally, Figure 6 shows the overall performance of our system. All the workloads used the ownership APIs. The “ideal” bar represents hardware-supported cache coherence between the CPU and Larrabee cores; in other words, this is the best performance that our shared memory implementation can provide. For Equake, since the amount of data transferred is very small compared to the computation involved, we notice that the “ideal” and “discrete” times are almost identical.

In all cases our shared memory implementation has low overhead and performs almost as well as the ideal case. Black Scholes shows the highest comparable overhead, since it has the lowest compute density: i.e., the amount of data transferred per unit computation time was the highest. Using Black Scholes we transfer about 13 MB of data per second of computation time, while we transfer about 0.42 MB of data per second of computation time when using Equake. Hence, the memory coherence overhead is negligible in Equake. The difference between the ideal scenario and our shared memory implementation increases with the number of cores, mainly due to synchronization overhead. In our implementation, synchronization penalties increase non-linearly with the number of cores.

Figure 6: Overall Performance Comparison (FFT, Art, BlackScholes, and Equake speedups, “Ideal” vs. “Discrete,” for 1 CPU plus 6, 12, or 24 Larrabee cores)
Source: Intel Corporation, 2009


Related Work
There exists other research that is closely related to the work described in this article. Most closely related are the CUDA [11], OpenCL [12], and CTM [2] programming environments. Just as our technology does, OpenCL uses a weakly consistent shared memory model, but it is restricted to the GPU. Our work differs from CUDA, OpenCL, and CTM in several ways: unlike these environments, we define a model for communication between the CPU and Larrabee processor; we provide direct user-level communication between the CPU and Larrabee processor; and we consider a bigger set of C language features, such as function pointers. Implementing a similar memory model is challenging on current GPUs due to their inherent limitations.

The Cell processor [8] is another heterogeneous platform. While the power processor unit (PPU) is akin to a CPU, the synergistic processing units (SPUs) are much simpler than the Larrabee cores. For example, they do not run an operating system kernel. Unlike the SPU-PPU pair, the Larrabee processor and CPU pair is much more loosely coupled, since the Larrabee processor can be paired as a discrete GPU with any CPU running any operating system. Unlike our model, Cell programming involves explicit direct memory access (DMA) between the PPU and SPU. Our memory model is similar to that of PGAS languages [14, 17], and hence our language constructs are similar to those of the Unified Parallel C (UPC) language [17]. However, UPC does not consider ISA or operating system heterogeneity. Higher-level PGAS languages such as X10 [14] do not support the ownership mechanism that is crucial for a scalable, coherent implementation in a discrete scenario. Our implementation has similarities to software distributed shared memory [9, 4], which also leverages virtual memory. Many of these S-DSM systems also use release consistency, and they copy pages lazily on demand. The main difference from S-DSM systems is the level of heterogeneity. Unlike S-DSM systems, our system needs to consider a computing system where the processors have different ISAs and system environments. In particular, we need to support different processors with different virtual-to-physical page mappings. Finally, the performance tradeoffs between S-DSMs and CPU with Larrabee processor systems are different: S-DSMs were meant to scale on large clusters, while CPU with Larrabee processor systems should remain small-scale clusters for some time in the future. The CUBA* [7] architecture proposes hardware support for faster communication between the CPU and GPU. However, its programming model assumes that the CPU and GPU are separate address spaces. The EXO* [18] model provides shared memory between a CPU and accelerators, but it requires the page tables to be kept in sync, which is not feasible for a discrete accelerator.


Conclusions
Heterogeneous computing platforms composed of a general-purpose, scalar-oriented CPU and throughput-oriented cores (e.g., a GPU) are increasingly being used in client computing systems. These platforms can be used for accelerating highly parallel workloads. There have been several programming model proposals for such platforms, but none of them address the CPU-GPU memory model. In this article we propose a new programming model for a heterogeneous Intel x86 system with the following key features:
• A shared memory model for all the cores in the platform.
• A uniform programming model for different configurations.
• User annotations to demarcate code for CPU and Larrabee execution.
We implemented the full software stack for our programming model, including compiler and runtime support. We ported a number of parallel workloads to our programming model and evaluated the performance on a heterogeneous Intel x86 platform simulator. We show that our model can be implemented efficiently so that programmers are able to benefit from shared memory programming without paying a performance penalty.

References [1] Adve S, Adve V, Hill M.D. and Vernon M.K. “Comparison of Hardware and Software Cache Coherence Schemes.” International Symposium on Computer Architecture (ISCA), 1991.

[2] AMD CTM Guide. “Technical Reference Manual.” 2006 Version 1.01. Available at http://www.amd.com

[3] AMD Stream SDK. Available at ati.amd.com/technology/ streamcomputing.

[4] Amza C., Cox A.L., Dwarkadas S., Keleher P., Lu H., Rajamony R., Yu W., Zwaenepoel W. “TreadMarks: Shared Memory Computing on Networks of Workstations.” IEEE Computer, 29(2): pages 18-28, February 1996.

[5] Boehm H., Adve S. “Foundations of the C++ memory model.” Programming Language Design and Implementation (PLDI), 2008.

[6] Dubey P. “Recognition, Mining, and Synthesis moves computers to the era of tera.” Available at Technology@Intel, February 2005.

[7] Gelado I., Kelm J.H., Ryoo S., Navarro N., Lumetta S.S., Hwu W.W. “CUBA: An Architecture for Efficient CPU/Co-processor Data Communication.” ICS, June 2008.

[8] Gschwind M., Hofstee H.P., Flachs B., Hopkins M., Watanabe Y., Yamakazi T. “Synergistic Processing in Cell’s Multicore Architecture.” IEEE Micro, April 2006.


[9] Kontothanasis L., Stets R., Hunt G., Rencuzogullari U., Altekar G., Dwarkadas S., Scott M.L. “Shared Memory Computing on Clusters with Symmetric Multiprocessors and System Area Networks.” ACM Transactions on Computer Systems, August 2005.

[10] Luebke, D., Harris, M., Krüger, J., Purcell, T., Govindaraju, N., Buck, I., Woolley, C., and Lefohn, A. “GPGPU: general purpose computation on graphics hardware.” SIGGRAPH, 2004.

[11] Nvidia Corporation. “CUDA Programming Environment.” Available at www.nvidia.com/

[12] OpenCL 1.0. Available at http://www.khronos.org/opencl/

[13] Ryoo S., Rodrigues C.I., Baghsorkhi S.S., Stone S.S., Kirk D.B., Hwu W.W. “Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA.” Principles and Practice of Parallel Programming (PPoPP), 2008.

[14] Saraswat, V. A., Sarkar, V., and von Praun, C. “X10: concurrent programming for modern architectures.” PPoPP, 2007.

[15] Saha, B., Adl-Tabatabai, A., Ghuloum, A., Rajagopalan, M., Hudson, R. L., Petersen, L., Menon, V., Murphy, B., Shpeisman, T., Sprangle, E., Rohillah, A., Carmean, D., and Fang, J. ”Enabling scalability and performance in a large scale CMP environment.” Eurosys, 2007.

[16] Seiler L., Carmean D., Sprangle E., Forsyth T., Abrash M., Dubey P., Junkins S., Lake A., Sugerman J., Cavin R., Espasa R., Grochowski E., Juan T., Hanrahan P. “Larrabee: A Many-Core x86 Architecture for Visual Computing.” ACM Transactions on Graphics, August 2008.

[17] “UPC Consortium, UPC language specifications.” Lawrence Berkeley National Lab Tech Report LBNL-59208, 2005.

[18] Wang P., Collins J.D., Chinya G. N., Jiang H., Tian X., Girkar M., Yang N. Y., Lueh G., Wang H. “Exochi: Architecture and programming environment for a heterogeneous multi-core multithreaded system.” Programming Language Design and Implementation (PLDI), 2007.

Acknowledgments We thank the following people who provided us with much support in this work: Jesse Fang, Ian Lewis, Antony Arciuolo, Kevin Myers, Robert Geva, Ravi Narayanaswamy, Milind Girkar, Rajiv Deodhar, Stephen Junkins, David Putzolu, Allen Hux, Peinan Zhang, Patrick Xi, Dean Macri, Nikhil Rao, Rob Kyanko, Madan Venugopal, Srinivas Chennupaty, Stephen Van Doren, Allen Baum, Martin Dixon, Ronny Ronen, Stephen Hunt, Larry Seiler, and Mike Upton.


Authors’ Biographies Bratin Saha is a Principal Engineer in Intel Labs and leads a team developing a new programming model for heterogeneous platforms. Prior to this he was a key architect and implementer for a scalable many-core runtime, the x86 memory model, transactional memory, and Nehalem synchronization. He holds a PhD degree in Computer Science from Yale University and a BS degree in Computer Science from the Indian Institute of Technology, Kharagpur. His e-mail is bratin.saha at intel.com.

Xiaocheng Zhou is a Researcher at Intel Labs, China. His current research mainly focuses on advanced programming language and runtime techniques for Intel many-core and heterogeneous platforms. Xiaocheng joined Intel in 2007. He received his PhD degree in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences, in 2007 and his BS degree in Computer Science from Xiangtan University of China in 2001. His e-mail is xiaocheng.zhou at intel.com.

Hu Chen is a Research Scientist in Intel Labs, China. His current research focuses on system runtime, memory, and execution models in many-core architectures, and GPGPU. Hu joined Intel in 2004, where he has worked in the field of high-performance computing. He received MS and BS degrees from the University of Science and Technology of China. His e-mail is hu.tiger.chen at intel.com.

Ying Gao is a Research Scientist in Intel Labs, China. His current research areas are programming models, system software, and runtime implementation. Ying received a BS degree in Mathematics in 2002 and a PhD degree in Computer Science from the University of Science and Technology of China in 2007. His e-mail is ying.gao at intel.com.

Shoumeng Yan is a Researcher in Intel Labs, China and works on language design, compiler implementation, and system optimization. His research interests include programming model design, language/compiler/runtime implementation, and many-core architectures. He received his PhD degree from Northwestern Polytechnical University in late 2005, and then joined Intel in early 2006. His e-mail is shoumeng.yan at intel.com.

Sai Luo is a Research Scientist in Intel Labs, China. His current research focuses on system runtime, memory models, and hardware design. Sai joined Intel in 2006 where he has worked on many-core runtimes. He received PhD, MS, and BS degrees from the University of Science and Technology of China. His e-mail is sai.luo at intel.com.

Copyright
Copyright © 2009 Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Atom are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.


Flexible and Adaptive On-chip Interconnect for Tera-scale Architectures

Contributors

Mani Azimi, Intel Corporation
Donglai Dai, Intel Corporation
Akhilesh Kumar, Intel Corporation
Andres Mejia, Intel Corporation
Dongkook Park, Intel Corporation
Roy Saharoy, Intel Corporation
Aniruddha S. Vaidya, Intel Corporation

Index Words
Tera-scale Architecture
On-chip Interconnect
Flexible Topology
Adaptive Routing

Abstract
Tera-scale architectures represent a set of designs that enable high levels of parallelism to address the demands of existing and, mostly, emerging workloads. The on-chip interconnect is an essential element of this design space and must provide the flexibility and adaptivity to meet the requirements of various designs. Highly integrated heterogeneous designs depend on a flexible interconnect topology to satisfy many different constraints. Demanding workloads create hot-spots and congestion in the network; thus, they call for an adaptive interconnect that responds gracefully to such transients. Other challenges, such as manufacturing defects, on-chip variation, and dynamic power management, can also be better served through a flexible and adaptive interconnect.

In this article we present the design of an on-chip interconnect with aggressive latency, bandwidth, and energy characteristics that is also flexible and adaptive. We present the design choices and policies within the constraints of an on-chip interconnect and demonstrate the effectiveness of these choices for different usage scenarios.

Introduction
Tera-scale architecture provides the foundation that can be used across a wide array of application domains, taking advantage of the increasing device densities offered by Moore's law. A high degree of system integration, along with an architecture that can exploit different types of parallelism, characterizes this evolution. A typical implementation of such an architecture may include tens to hundreds of general-purpose compute elements, suitable for different types of parallelism, multiple levels of memory hierarchy to mitigate memory latency and bandwidth bottlenecks, and interfaces to off-chip memory and I/O devices.

One tractable implementation for such an architecture is a modular design where the building blocks are implemented with well-defined physical and logical interfaces and are connected through an on-chip interconnect to realize a specific product. The definition of specific products is determined by cost, power, and performance goals. A large set of products targeting the needs of a variety of applications can be derived from different combinations of a few building blocks. A flexible and powerful on-chip interconnect is an essential building block to realize this vision.


There are a few possible interconnect design options for the implementation space just suggested. One could partition the overall design into smaller sub- systems, design interconnects suitable for each sub-system, and use a more suitable interconnect for interconnecting the sub-systems in their entirety. Such a methodology requires the design, analysis, and composition of multiple interconnects fitting well together to provide a cohesive overall connectivity. This approach takes a static design time view of interconnects, which may be acceptable in some contexts, but is specific to the need and not general- purpose.

An alternative approach is to use a parameterized single interconnect design and apply it across all the sub-systems with specific parameters tuned for each sub-system. The design of such an interconnect is substantially more challenging. On the other hand, a flexible and adaptive general-purpose interconnect has the potential to meet the needs of such systems in a more systematic and tractable manner. We outline such an interconnect in this article.

In this context, we first discuss a few example usage scenarios and show how these favor the use of such a general-purpose interconnect.

Usage Scenarios As indicated earlier, the on-chip interconnect architecture discussed here is targeted towards a general-purpose design that can meet the needs of different sub-systems in a highly integrated and highly parallel architecture. The challenge of the underlying architecture is to be competitive with the application-specific designs in the specific context of the applications. Here we discuss a few example scenarios that are potential targets for tera-scale architecture.

Cloud Computing or a Virtualized Data Center
The aggregation of compute capacity in tera-scale architectures can be partitioned and virtualized to provide cloud computing services to multiple applications sharing the infrastructure. Such an environment should allow dynamic allocation and management of compute, memory, and I/O resources with as much isolation between different partitions as possible. A long sequence of resource allocations and de-allocations can create fragmentation, so that there may not be a clean and regular boundary between resources allocated for different purposes. The interconnect linking these resources should be flexible enough to handle such cases without degrading service levels and without causing undue interference between different partitions.



Scientific Computing Scientific computing applications cover the space from molecular dynamics to cosmic simulations, fluid dynamics, signal and image processing, data visualization, and other data and compute-intensive applications. These demanding applications are typically executed over large clusters of systems. The compute density per chip and energy per operation provided by tera- scale architecture can greatly enhance the capability of systems targeting this application domain. A typical design for this domain would be dominated by a large number of processing elements on chip with either hardware- or software- managed memory structures and provision for very high off-chip memory and system bandwidth. An interconnect designed for such applications should provide high bandwidth across all traffic patterns, even patterns that may be adversarial to a given topology.

Visual Computing Visual computing applications are similar to scientific computing applications in some aspects. However, these applications may additionally have real-time requirements including bounded-time and/or bandwidth guarantees. Also, different portions of an application (e.g., AI, physical simulation, rendering, etc.) have distinct requirements. This heterogeneity, reflected architecturally, would create topological irregularities and traffic hot-spots that must be handled to provide predictable performance for visual workloads.

Irregular Configurations
Cost and yield constraints for products with large numbers of cores may create a requirement for masking manufacturing failures or in-field failures of on-die components, which in turn may result in configurations that deviate from the ideal topology of the on-chip interconnect. Another usage scenario that can create configurations that are less than ideal is an aggressive power-management strategy where certain segments of a chip are powered down at low utilization. Such scenarios can be enabled only when the interconnect is capable of handling irregular topologies in a graceful manner.

Attributes of On-chip Interconnect The on-chip interconnect for tera-scale architecture has to be designed keeping the usage scenarios just outlined in mind. Apart from the typical low-latency and high-bandwidth design goals, topology flexibility and predictable performance under a variety of loads are essential to enable the usage scenarios just described. We describe in this article the mechanisms to provide these features within the constraints of an on-chip interconnect for tera-scale architecture.


One of the main drivers for tera-scale architecture is the power and thermal constraints on a chip that have necessitated the shift from optimizing for scalar performance to optimizing for parallel performance. This shift has resulted in the integration of a higher number of relatively simpler cores on a chip instead of driving towards higher frequency and higher micro-architectural complexity. This trend also has implications for the on-chip interconnect. Because of the weak correlation of frequency with process technology generation, logic delay is becoming a diminishing component of the critical path, and wire delay is becoming the dominant component in the interconnect design. This implies that topologies that optimize for the reduction of wire delay will become preferred topologies for tera-scale architectures. Given the two-dimensional orientation of a chip, two-dimensional network topologies are obvious choices.

Another desirable attribute of an on-die interconnect is the ability to scale up or scale down by the addition or reduction of processor cores and other blocks in a cost-performance-optimal and simple manner, thereby enabling the design to span several product segments. Based on the usage scenarios described at the beginning of this article, a latency-optimized interconnect is critical for minimizing memory latency and ensuring good performance in a cache-coherent chip multi-processor (CMP). In addition, partitioning and isolation, as well as fault tolerance, require support, based on the envisaged usage scenarios. Because of all of these considerations, two-dimensional mesh, torus, and their variants are good contenders for tera-scale designs (Figure 1). The design discussed in this article assumes mesh and torus as primary topologies and allows variations to enable implementation flexibility.

Even though on-chip wiring channels are abundant, shrinking core and memory structure dimensions and increasing numbers of agents on a die will put pressure on global wiring channels on the chip. This implies that wiring efficiency, i.e., the ability to effectively utilize a given number of global wires, will be an important characteristic of on-chip interconnect.

In order to support good overall throughput and latency characteristics with a modest buffering cost, our design assumes a buffered interconnect, based on wormhole switching [1] and virtual channel flow control [2]. It supports multiple virtual channels to allow different types of traffic to share the physical wires. The details of this design are discussed in the following section.

The organization of the rest of this article is as follows. In the next section we present the details of the on-chip interconnect design, primarily focusing on the micro-architecture of the router and highlighting the choices made to optimize for latency and energy efficiency. We then focus on the support for flexibility and adaptivity, discussing the options and our preferred direction. The performance results are then presented for different usage scenarios, indicating the benefit derived from the various features in the interconnect. Next we discuss some of the related work and conclude our article with a summary of our work and next steps.


On-chip Interconnect Design Before getting into the specifics of the micro-architecture of this design, let us look at the topological aspects.

Supported Interconnect Topologies
The two-dimensional on-die interconnect architecture we present is optimized for a CMP that is based on a tiled design supporting the mesh and torus topologies and a few variants. Some representative topologies are shown in Figure 1. Each processor tile may have a router that is connected to the local tile as well as to four other routers. Agents on the periphery can be connected to the local port of the router or directly to an unused router port of a neighboring tile. Other variations include two or more cores sharing a router, resulting in a concentrated topology that uses fewer routers for interconnecting the same number of cores. Another configuration is a torus with links that fold back along one or both of the horizontal and vertical directions. To balance wire delays in the torus topologies, links connect routers in alternate rows and/or columns.

[Figure 1: Examples of 2D Mesh and Torus Variants: (a) 2D Mesh, (b) Mesh-Torus, (c) Concentrated Mesh-Torus, (d) 2D Torus. Source: Intel Corporation, 2009]


Router Micro-architecture
The principal component of the 2D interconnect fabric is a pipelined, low-latency router with programmable routing algorithms that include support for performance isolation, fault tolerance, and dynamic adaptive routing based on network conditions.

Figure 2 shows the stages of the router pipeline as well as the key functions performed in each stage. The functionality in our adaptive router includes most of the standard functionality that one can expect to see in a wormhole-switched router [3].

Route Compute
Route computation is performed for the header flit to determine the route, i.e., the output port of the router that a packet must take towards its destination tile. In an adaptive routing scheme, this can imply that the packet may have a choice of more than one output port towards its destination (we support up to two output-port choices). Our router architecture uses route pre-computation [4] for the route decision of neighboring routers, thereby removing route computation from the critical path of the router pipeline. (Route pre-compute options and design tradeoffs are discussed in "Flexible and Adaptive On-chip Interconnect.")
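The sketch below illustrates the idea of look-ahead (pre-computed) routing on a 2D mesh: while a packet traverses the current router, that router computes the output port its downstream neighbor should use and carries the decision in the header, so the neighbor does not need a route-compute stage on its critical path. The XY route function and the port encoding are illustrative assumptions, not the exact logic or tables used in the design.

#include <stdio.h>

enum port { PORT_LOCAL, PORT_EAST, PORT_WEST, PORT_NORTH, PORT_SOUTH };

/* Simple XY route: correct X first, then Y (deterministic baseline). */
static enum port xy_route(int cur_x, int cur_y, int dst_x, int dst_y)
{
    if (dst_x > cur_x) return PORT_EAST;
    if (dst_x < cur_x) return PORT_WEST;
    if (dst_y > cur_y) return PORT_NORTH;
    if (dst_y < cur_y) return PORT_SOUTH;
    return PORT_LOCAL;
}

/* Look-ahead routing: the current router already knows (from the header)
 * which output port it will use; it pre-computes the port the *next*
 * router should use and writes it back into the header. */
struct header { int dst_x, dst_y; enum port next_port; };

static void precompute_next_hop(struct header *h, int cur_x, int cur_y,
                                enum port out_port_at_current)
{
    int nx = cur_x, ny = cur_y;
    switch (out_port_at_current) {          /* coordinates of downstream router */
    case PORT_EAST:  nx++; break;
    case PORT_WEST:  nx--; break;
    case PORT_NORTH: ny++; break;
    case PORT_SOUTH: ny--; break;
    default: return;                        /* delivered locally */
    }
    h->next_port = xy_route(nx, ny, h->dst_x, h->dst_y);
}

int main(void)
{
    struct header h = { 3, 2, PORT_EAST };  /* header already carries this hop */
    precompute_next_hop(&h, 1, 2, h.next_port);
    printf("port pre-computed for router (2,2): %d\n", h.next_port);
    return 0;
}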

[Figure 2: Adaptive Router Pipeline: Local Arbiter (LA), Switch Arbiter (SA), Switch Traversal (ST) / Link Traversal. Source: Intel Corporation, 2009]


Virtual Channels and Buffer Management
Our router architecture uses virtual channel flow control [2] both to improve performance and to enable support for deadlock-free routing in various flavors of deterministic, fault-tolerant, and adaptive routing. The set of virtual channels (VCs) can be flexibly partitioned into two logical sets: routing VCs and performance VCs. Routing VCs are also logically grouped into virtual networks (VNs). VCs belonging to the same VN are used for message-class separation, but they use the same routing discipline. Routing VCs are used for satisfying the deadlock-freedom requirements of the particular routing algorithm employed. Performance VCs belong to a common shared pool of VCs used for performance improvement (for both adaptive and deterministic routing schemes). Figure 3 shows an example of potential mappings of VCs to routing VCs belonging to specific message classes and VNs, as well as the pool of performance VCs, for 2D mesh and 2D torus topologies with minimal deadlock-free XY routing algorithms for each network. The configuration is assumed to support 12 VCs and 4 message classes; the mesh requires a single VN (VN0) for deadlock freedom, whereas the torus requires 2 VNs (VN0, VN1).

[Figure 3: Mapping of Virtual Networks to Virtual Channels for a 2D Mesh and 2D Torus Topology. (a) Mesh: VC0-VC3 carry message classes Mc0-Mc3 in VN0, and VC4-VC11 are performance VCs. (b) Torus: VC0-VC3 carry Mc0-Mc3 in VN0, VC4-VC7 carry Mc0-Mc3 in VN1, and VC8-VC11 are performance VCs. Source: Intel Corporation, 2009]

To support flexibility and optimal usage of packet buffering resources, our router supports a shared buffer at the input port that is used by all VNs in that port. The space in the buffer is dynamically managed by using a set of linked lists that track the flits belonging to a given packet in a VC, as well as a list of free buffer slots for incoming flits. The operation of the flit buffer is shown in Figure 4.
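A minimal sketch of the shared input buffer just described: one physical flit array per input port, a free list, and a per-VC linked list that chains the queued flits of that VC. The sizes and field names are assumptions made for illustration, not the hardware data structures.

#include <stdio.h>

#define NUM_SLOTS 16
#define NUM_VCS   12
#define NIL       -1

struct flit_buffer {
    int next[NUM_SLOTS];          /* per-slot link to the next flit of the same VC */
    int vc_head[NUM_VCS];         /* first queued flit of each VC                  */
    int vc_tail[NUM_VCS];         /* last queued flit of each VC                   */
    int free_head;                /* head of the free-slot list                    */
};

static void init(struct flit_buffer *b)
{
    for (int i = 0; i < NUM_SLOTS - 1; i++) b->next[i] = i + 1;
    b->next[NUM_SLOTS - 1] = NIL;
    b->free_head = 0;
    for (int v = 0; v < NUM_VCS; v++) b->vc_head[v] = b->vc_tail[v] = NIL;
}

/* Take a slot from the free list and append it to the VC's flit chain.
 * Returns the slot index, or NIL if the shared buffer is full. */
static int enqueue_flit(struct flit_buffer *b, int vc)
{
    int slot = b->free_head;
    if (slot == NIL) return NIL;
    b->free_head = b->next[slot];
    b->next[slot] = NIL;
    if (b->vc_tail[vc] == NIL) b->vc_head[vc] = slot;
    else                       b->next[b->vc_tail[vc]] = slot;
    b->vc_tail[vc] = slot;
    return slot;
}

/* Remove the VC's head flit and return its slot to the free list. */
static int dequeue_flit(struct flit_buffer *b, int vc)
{
    int slot = b->vc_head[vc];
    if (slot == NIL) return NIL;
    b->vc_head[vc] = b->next[slot];
    if (b->vc_head[vc] == NIL) b->vc_tail[vc] = NIL;
    b->next[slot] = b->free_head;
    b->free_head = slot;
    return slot;
}

int main(void)
{
    struct flit_buffer b;
    init(&b);
    enqueue_flit(&b, 3);
    enqueue_flit(&b, 3);
    printf("dequeued slot %d from VC3\n", dequeue_flit(&b, 3));
    return 0;
}

Because every VC draws from the same pool of slots, a lightly loaded VC consumes no buffer space, which is what allows the shared buffer to be smaller than a statically partitioned one.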

Depending on the size of the buffer, a banked implementation is used where each bank can be individually put into low-leakage states to save power at low levels of buffer utilization. Active power-management strategies to trade off performance and power can also be implemented.

Flow Control
The router supports credit-based flow control to manage the downstream buffering resources optimally. The flow control protocol is a typical scheme, except that it needs to handle the shared flit buffer resources and ensure the reservation of at least one buffer slot for each routing VC and for each active performance VC. This is required to guarantee resource deadlock freedom with wormhole switching.
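The sketch below shows one way the credit check can be expressed: a flit may consume a slot from the shared pool only if doing so still leaves one slot reserved for every routing VC and every active performance VC at the downstream port; otherwise it may only take its own VC's reserved slot. This is a simplified model built on assumed counters, not the exact hardware logic.

#include <stdbool.h>
#include <stdio.h>

struct downstream_port {
    int free_slots;            /* total unused slots in the shared flit buffer   */
    int reserved_vcs;          /* routing VCs + currently active performance VCs */
    bool vc_reserved_used[16]; /* has VC v already consumed its reserved slot?   */
};

/* Can a flit on VC `vc` be forwarded to the downstream port? */
static bool credit_available(const struct downstream_port *p, int vc)
{
    /* Shared slots beyond the per-VC reservations are fair game. */
    if (p->free_slots > p->reserved_vcs)
        return true;
    /* Otherwise the flit may only take the one slot reserved for its VC,
     * which guarantees forward progress and deadlock freedom. */
    return p->free_slots > 0 && !p->vc_reserved_used[vc];
}

int main(void)
{
    struct downstream_port p = { .free_slots = 5, .reserved_vcs = 5 };
    printf("VC2 may send: %s\n", credit_available(&p, 2) ? "yes" : "no");
    p.vc_reserved_used[2] = true;
    printf("VC2 may send again: %s\n", credit_available(&p, 2) ? "yes" : "no");
    return 0;
}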

Router Pipeline and Arbitration Mechanisms
Our adaptive router pipeline is a high-performance pipeline with the capability to bypass one pipeline stage based on network conditions.

A flit belonging to a packet enters the router at the input port. For a new packet, a message context is created to track relevant status information. The key functionality in the local arbitration stage is to determine a single packet at a given input that can contend for a router's output port with other potential candidates from other input ports.


[Figure 4: Shared Flit Buffer Management Across a Set of Virtual Channels. Per-VC context records head/tail buffer pointers; each buffer slot is tagged as head (H), mid (M), tail (T), or single-flit (HT) and linked to the next flit of its packet; unused slots form a free list. Source: Intel Corporation, 2009]

Our router applies a filter on less viable packets to improve the performance of the local and global arbitration stages. Each packet may have up to two output-port candidates that may be used to route the packet to its desired destination. Both input and output side-arbiters make arbitration decisions, based on a round-robin priority.

Each output port selects a single winning input port from amongst one or more candidates from all input ports for the following cycle by using a rotating priority. A multi-flit packet may make a “hold” request, and the arbiter grants the output port to such an input port for an additional cycle.

VC allocation, and the allocation of a downstream buffer slot for the granted flit, also occur after switch arbitration. Route pre-computation or route-table lookup for the next router downstream occurs concurrently with arbitration. The final stage of the router pipeline is switch traversal from inputs to outputs. Appropriate sideband information for credit management (and the setup of look-ahead routing information for header flits) is generated at this stage, and the flit flows out of the router over the interconnect links to the input port of a downstream router.


Flexible and Adaptive On-chip Interconnect
In this section we describe the support for various routing algorithms to enable a flexible, configurable, and adaptive interconnect, and we discuss the design implications.

Support for Multiple Routing Algorithms
The router architecture supports distributed routing, wherein subsets of the routing decisions are made at each router along the path taken by a given packet.

In two-dimensional networks like mesh and torus, given a source node, the set of shortest paths to the destination falls into nine possible regions, as shown in Figure 5. The regions are labeled in terms of the sign of the minimal X and Y offsets of the shortest paths: (-,+), (0,+), (+,+), (-,0), (0,0), (+,0), (-,-), (0,-), (+,-).

[Figure 5: The Nine Regions a Destination Node can Belong to Relative to a Given Router. Source: Intel Corporation, 2009]

The baseline routing strategies supported by our router use this underlying classification of the destination for narrowing the set of paths to the destination [4, 5]. For minimal adaptive routing, up to two distinct directions may be permitted based on the region a destination node falls into. The routing decision (i.e., the output ports and VN choices permitted) at each router is based on the current input port and the VN a packet belongs to, as well as on the desired destination. For each VN, we can support the flexible algorithms with a small 9-entry table [4, 5] or with even more economical storage of only a few bits per port [6].
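As a concrete illustration of the 9-entry table, the sketch below classifies a destination into one of the nine regions from the signs of its X and Y offsets and looks up the (up to two) permitted output ports. The port sets programmed into the table here encode plain minimal adaptive routing and are assumptions for illustration; the real table contents depend on the deadlock-avoidance rules of the configured algorithm.

#include <stdio.h>

enum port { P_NONE = 0, P_E = 1, P_W = 2, P_N = 4, P_S = 8, P_LOCAL = 16 };

/* Map the sign of an offset to 0 (negative), 1 (zero), or 2 (positive). */
static int sign_index(int d) { return (d > 0) ? 2 : (d < 0) ? 0 : 1; }

/* 9-entry region table, indexed by [sign(dy)][sign(dx)].  Each entry is a
 * bit-mask of permitted output ports (at most two for minimal adaptive
 * routing).  Illustrative contents for a plain minimal-adaptive scheme: */
static const unsigned region_table[3][3] = {
    /* dy<0 */ { P_W | P_S, P_S,     P_E | P_S },
    /* dy=0 */ { P_W,       P_LOCAL, P_E       },
    /* dy>0 */ { P_W | P_N, P_N,     P_E | P_N },
};

static unsigned permitted_ports(int cur_x, int cur_y, int dst_x, int dst_y)
{
    int sx = sign_index(dst_x - cur_x);
    int sy = sign_index(dst_y - cur_y);
    return region_table[sy][sx];
}

int main(void)
{
    /* Destination is north-east of the current router: E and N are allowed. */
    unsigned ports = permitted_ports(2, 2, 5, 4);
    printf("permitted port mask: 0x%x\n", ports);
    return 0;
}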

With the baseline routing support alone, our implementation supports minimal-path deterministic routing in mesh and torus, as well as partially and fully adaptive minimal-path routing algorithms, such as those based on the turn model [7]. Our adaptive router architecture also supports the Duato protocol [8], which reduces the VC resource requirements while providing full adaptivity. Table 1 compares the minimum number of VCs required for a deadlock-free, fully-adaptive routing implementation based on the turn model versus one based on the Duato protocol.

Table 1: Minimum Virtual Channels Required for Fully-adaptive Routing (Turn Model vs. Duato Protocol)

Topology    2 Msg Classes                    4 Msg Classes
            Turn Model    Duato Protocol     Turn Model    Duato Protocol
2D Mesh     4             3                  8             5
2D Torus    6             5                  12            9

Source: Intel Corporation, 2009

Baseline routing support also enables a deterministic fault-tolerant routing algorithm based on fault-region marking and fault-avoidance, such as in [9], as well as adaptive fault-tolerant routing algorithms [10]. Incomplete or irregular topologies caused by partial shutdown of the interconnect because of power- performance tradeoffs can be treated in a manner similar to a network with faults for routing re-configuration purposes.


Our router architecture also supports a novel two-phase routing protocol where a packet is routed to an intermediate destination and then to the final destination. This can be used to implement load-balancing and fault-tolerant routing algorithms, including non-minimal routing algorithms.
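One well-known way to realize such a two-phase scheme is Valiant-style load balancing: route each packet minimally to an intermediate router, then minimally to its real destination. The article does not specify how the intermediate node is chosen, so the random selection in the sketch below is an assumption used only to illustrate the structure.

#include <stdio.h>
#include <stdlib.h>

#define MESH_X 6
#define MESH_Y 6

struct node { int x, y; };

struct two_phase_header {
    struct node intermediate;   /* phase-1 target                      */
    struct node final;          /* phase-2 (real) destination          */
    int phase;                  /* 1 while heading to the intermediate */
};

/* Build a two-phase header with a randomly chosen intermediate node. */
static struct two_phase_header make_header(struct node dst)
{
    struct two_phase_header h;
    h.intermediate.x = rand() % MESH_X;
    h.intermediate.y = rand() % MESH_Y;
    h.final = dst;
    h.phase = 1;
    return h;
}

/* Called at every router: returns the node the packet is currently routed
 * toward, switching from phase 1 to phase 2 at the intermediate node. */
static struct node current_target(struct two_phase_header *h, struct node here)
{
    if (h->phase == 1 && here.x == h->intermediate.x && here.y == h->intermediate.y)
        h->phase = 2;
    return (h->phase == 1) ? h->intermediate : h->final;
}

int main(void)
{
    struct node dst = { 5, 0 };
    struct two_phase_header h = make_header(dst);
    struct node here = { 0, 0 };
    struct node t = current_target(&h, here);
    printf("first routing toward (%d,%d), final destination (%d,%d)\n",
           t.x, t.y, dst.x, dst.y);
    return 0;
}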

Finally, we also implement support for performance isolation amongst partitions on a mesh which may or may not have rectangular geometries. This is implemented through a hierarchical routing protocol that helps isolate communication of each partition including for non-rectangular partitions. This is purely a routing-based approach to providing isolation, and no additional resource management mechanisms, such as bandwidth reservation, are required with this approach.

Design Implications
Our flexible design had to have a minimal cost overhead in the definition phase. We outline next the various impacts of our design choices.

We maintain a high-performance router design by using a shared pool of VCs, as well as a shared buffer pool. The shared buffer pool reduces the overall buffer size and power requirements. Support for up to two output port choices with adaptive routing has little impact on the router performance. With the decoupled local and switch arbitration stages in the router pipeline (as described earlier), each packet arbitrates for a single output port candidate after applying the arbitration filter and path selection. The criteria used could be based on several congestion-prediction heuristics that use locally available information, such as resource availability and/or resource usage history (see [4] for examples of such criteria). Path selection can be implemented without a significant impact on the arbitration stages.
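As one example of a locally available congestion-prediction criterion, the sketch below picks between the two candidate output ports by comparing the free downstream buffer space (credits), falling back to the first, deterministic candidate on a tie. This particular heuristic is an assumption for illustration; [4] surveys several alternatives.

#include <stdio.h>

struct candidate {
    int port;            /* output port index                         */
    int credits;         /* free downstream buffer slots (local info) */
};

/* Select one of up to two candidate output ports.  Prefer the port whose
 * downstream buffer has more free space; on a tie, keep the first
 * (deterministic) candidate. */
static int select_output_port(struct candidate a, struct candidate b, int have_b)
{
    if (!have_b || a.credits >= b.credits)
        return a.port;
    return b.port;
}

int main(void)
{
    struct candidate east  = { 1, 2 };   /* port 1, 2 free slots downstream */
    struct candidate north = { 4, 7 };   /* port 4, 7 free slots downstream */
    printf("selected port: %d\n", select_output_port(east, north, 1));
    return 0;
}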

Support for adaptivity and flexible routing configuration requires the use of configurable route tables. While the table storage requirements are small for the 9-entry table (approximately 3-4 percent per VN, when compared to the flit buffer storage requirement), they can be reduced by an order of magnitude by using the approaches in [6]. These table storage optimizations, which are independent of the network size, considerably reduce the area and power requirements for supporting flexibility features in the on-chip interconnect.

In the next section we describe some representative performance results for flexible interconnection network architecture.

Performance Results
We present the performance results for different configurations and traffic patterns to demonstrate the effectiveness of the adaptive routing scheme just described. However, before presenting the results on adaptive routing, we present the results that establish the optimal design parameters for typical operation.


Figure 6 shows the effect of buffer size and number of VNs on each input port on the network capacity for a uniform random traffic pattern on a 6x6, two-dimensional mesh network. This traffic pattern uses a mix of single flit and five-flit packets divided between two message classes, based on an expected distribution for cache coherency traffic. The target of each packet is randomly selected with equal probability, and the injection rate is increased until the network is close to saturation. For different buffer depths and VC settings, the network saturates at different points. Each plot in Figure 6 represents different buffer depths and shows the delivered throughput in terms of flits accepted per cycle per node for different numbers of VCs on the X-axis. The plots indicate that a larger number of buffers and VCs result in increased throughput; however, the improvement tapers off beyond sixteen buffers and ten virtual channels. These results were obtained by using a deterministic dimension-order routing scheme, since adaptive schemes are not beneficial for uniform random traffic patterns. We use sixteen buffers and twelve virtual channels as the baseline to evaluate the effectiveness of an adaptive routing scheme against that of a deterministic scheme.

To illustrate the effectiveness of an adaptive routing scheme, we use a traffic pattern that is adversarial to two-dimensional mesh topology. Adversarial traffic for a given topology and routing scheme illustrates a worst-case scenario, which shows the extent of degradation in network performance for some traffic patterns. A matrix transpose is a perfect example for a mesh with X-Y deterministic routing, since it results in an uneven traffic load across the links along the diagonal.

[Figure 6: Effect of Number of Buffers and Virtual Channels on Delivered Throughput for a 6x6 Mesh. Y-axes: throughput (flits/node/cycle) and fraction of bisection bandwidth; X-axis: number of virtual channels (2 to 16, including 2 reserved VCs); curves for buffer depths B=8, 12, 16, 20, 24. Source: Intel Corporation, 2009]


Figure 7 illustrates the source-destination relationship for a transpose operation. For this traffic pattern, a node labeled (i,j) communicates with node (j,i) and vice-versa. This results in nodes in the upper triangle (highlighted in orange) communicating with nodes in the lower triangle (highlighted in pink). The nodes on the diagonal (highlighted in blue) do not generate any traffic. Figure 7 also illustrates the routes taken when a deterministic XY routing scheme is used, along with some possible alternatives allowed by an adaptive routing scheme. Paths highlighted with thick lines do not follow the XY routing scheme. There are many other paths possible between these source and destination nodes that are not shown in the figure. A deterministic routing scheme tends to concentrate load on a few links in the network for this traffic pattern and does not utilize the alternative paths between source-destination pairs. An adaptive routing scheme that allows more flexibility in path selection among multiple alternatives avoids congested routes and improves the overall capacity of the network.

[Figure 7: Traffic Pattern for a Transpose Operation with Deterministic and Adaptive Routing on a 6x6 mesh. Source: Intel Corporation, 2009]

The effect of path diversity in increasing the overall network capacity is illustrated through the load-throughput plot in Figure 8 for transpose traffic using different routing schemes. XY represents a deterministic routing scheme, whereas Adaptive is an implementation of a minimal fully-adaptive routing scheme using the Duato protocol. As shown in the plot, the network capacity is severely restricted for this pattern when a deterministic routing scheme is used. The adaptive routing scheme delivers much higher throughput for this traffic pattern.

[Figure 8: Network Throughput with a Deterministic and an Adaptive Routing Scheme for Transpose Traffic on a 6x6 Mesh. Throughput (flits/cycle/node) vs. load (flits/cycle/node) for XY and Adaptive routing. Source: Intel Corporation, 2009]
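For reference, the transpose pattern used in Figures 7 and 8 can be generated as in the sketch below: node (i,j) sends to node (j,i), and diagonal nodes stay idle. The mesh size matches the 6x6 configuration used in the experiments; everything else is illustrative scaffolding, not the authors' simulator.

#include <stdio.h>

#define MESH_DIM 6

/* Transpose traffic: node (i,j) targets node (j,i); diagonal nodes (i,i)
 * generate no traffic.  Returns 1 and fills *dst_i/*dst_j if the node
 * injects traffic, 0 otherwise. */
static int transpose_destination(int i, int j, int *dst_i, int *dst_j)
{
    if (i == j)
        return 0;
    *dst_i = j;
    *dst_j = i;
    return 1;
}

int main(void)
{
    for (int i = 0; i < MESH_DIM; i++) {
        for (int j = 0; j < MESH_DIM; j++) {
            int di, dj;
            if (transpose_destination(i, j, &di, &dj))
                printf("(%d,%d) -> (%d,%d)\n", i, j, di, dj);
        }
    }
    return 0;
}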


When all the agents on the on-chip network are not of the same type, non-homogeneous traffic patterns can be created with the potential for transient hot-spots. One example of this is the traffic going to and generated from memory controllers, I/O interfaces, and the system interconnect. Depending on the phases of execution or type of workload, some of these agents may be more heavily used than others and may become bottlenecks on the interconnect, thereby affecting other traffic that shares the path with the congested traffic. Such scenarios can be crudely approximated with a mix of traffic patterns between different sets of agents. To illustrate this scenario, we set up an experiment with a 6x6 mesh network where six agents at the periphery (three in the top row and three in the bottom row) are considered hot-spots, with 30 percent of the traffic targeting these nodes with equal probability and the rest of the traffic targeting the remaining nodes with equal probability. Throughput delivered to each node in terms of flits per cycle was measured for all nodes combined, for hot-spot nodes, and for nodes excluding hot-spot nodes. The result is illustrated in Figure 9, which shows that the overall throughput, as well as the throughput delivered to hot-spot nodes and to the other nodes excluding the hot-spot nodes, improved with adaptive routing.

[Figure 9: Network Capacity with and without Adaptive Routing in a Heterogeneous Environment. Throughput (flits/cycle/node) for all nodes, hot-spot nodes, and other nodes under XY and Adaptive routing. Source: Intel Corporation, 2009]

The interconnection network can be reconfigured in the presence of faults through the adoption of fault-tolerant routing algorithms. In the absence of fault-tolerant routing, the network can become virtually unusable in the presence of high levels of identified and isolated faulty components. It may be possible to operate the network at a significantly diminished capacity by avoiding complete rows and/or columns of nodes containing a faulty node; i.e., by using the nodes in that column or row as pass-through nodes. Here, we do not address a comparative performance benefit of fault-tolerant routing. However, in Figure 10 we show the fraction of usable nodes in the network and the resulting impact on network latency as a sensitivity study against the number of faults in the network. The data are derived from an implementation of a fault-tolerant XY routing [9] which restricts the faulty regions to being rectangular in shape. The node failures in this context can be the result of core or router failure. Figure 10 (a) shows that when the number of randomly located faulty nodes is small, few or no additional nodes need to be turned off to satisfy the rectangular-shape constraint of faulty regions and to safely support the use of the rest of the network. As the number of faults increases, additional nodes that are unsafe and prevent deadlock-free routing need to be turned off. Figure 10 (b) shows that the increase in the average number of hops in the network is also small for a small number of faults. As the number of faults increases, the average number of unsafe nodes increases and then begins to drop off as the overall working cluster size (the actual remaining functional system) diminishes, compared to the size of the full network. The actual region of interest is a single to a small number of node failures, for which fault-tolerant routing provides graceful degradation in performance.

[Figure 10: Fault-tolerant XY Routing (XYFT4 algorithm, 6x6 mesh, uniform traffic): (a) failed, unsafe, and safe nodes vs. number of failed nodes; (b) average hops vs. number of failed nodes. Source: Intel Corporation, 2009]


Related Work
With increasing interest in CMP and system-on-chip (SoC) designs, on-chip networks are a very active area of research. The current generation of CMP and SoC designs have taken very different approaches: CMP designs favor customized, implementation-specific, on-chip networks that change from generation to generation, whereas SoC designs use standardized networks that allow quick integration of different design blocks. As the complexity and level of integration increase with CMPs, it is likely that CMP designs will also move towards standardized networks. Some research prototypes, such as Intel's tera-FLOP processor [11], and other recent designs from different vendors have indicated a move towards this trend. The design presented in this article is a leap forward from the design used in [11]: it addresses some of its performance bottlenecks, such as the limited number of VCs; it is a more optimized design with lower latency, area, and power; and it adds advanced capabilities, such as flexible and adaptive routing.

Research into flexible and adaptive networks has been ongoing for a long time, and the techniques developed have been applied in various networks in operation. The closest interconnect type to on-chip networks, where similar requirements and tradeoffs are at play, is in the area of multi-processor networks. However, on-chip networks have to be further optimized for latency and storage overheads, a factor that drove the decisions for the design presented in this article. For example, off-chip multi-processor networks have typically used virtual cut-through techniques [12, 13, 14], but these designs could result in larger buffering overheads as the number of VCs increases. The design presented in this article uses wormhole switching [1], which does not require buffers to be allocated for the entire packet.

The route specification mechanism used in this work relies on either a compressed table or an LBDR [4, 5, 6] mechanism for look-ahead routing [4], such that route determination does not add to the pipeline delay at each router. Off-chip networks, such as the IBM Blue Gene/L* torus network [12], avoid using any table and rely on X, Y, and Z offsets to the destination to make routing decisions. However, the Blue Gene/L network does not allow any topological irregularities and cannot tolerate link or router failures. The routing schemes used in the Cray T3E* [13] system allow more flexibility and can tolerate link and router failures, but adaptive routing is disabled when minimal paths from source to destination contain broken links. The Cray T3E implementation requires a table as large as the system size (up to 544 entries) to determine the routing tags. As the number of agents grows on a chip, large table sizes result in area and power overheads.

Note that the Cray T3E also uses an adaptive routing scheme with escape VCs [8], similar to the scheme used in our implementation. However, the Cray T3E implementation has only one VC for adaptive routing, and it uses virtual cut-through in the adaptive channels, which requires space for the entire packet. Such an implementation limits the benefit of adaptivity. Our implementation uses wormhole switching, even in the adaptive channels, and it allows multiple VCs to be used for packets with the use of adaptive routing.


Conclusions
We have presented the details of an on-die fabric with an adaptive router that supports various 2D interconnect topologies for tera-scale architectures. 2D mesh and torus topologies provide good latency and bandwidth scaling for tens to more than a hundred processor cores, as well as the additional connectivity that enables multiple paths between source-destination pairs that can be exploited by adaptive routing algorithms.

The architecture presented has an aggressive low-latency router pipeline. It can also provide high throughput in the presence of adversarial traffic patterns and hot-spots through the use of adaptive routing. The adaptive routing is supported without impacting the router pipeline performance, and it is supported through very economical and configurable routing table storage requirements. This router also supports very efficient use of resources by making use of shared (performance) VC and buffer pools.

The proposed fabric architecture and routing algorithms also support the ability to provide partitioning with performance isolation and the ability to tolerate several faults or irregularities in the topology, such as those caused by the partial shutdown of processing cores and other components for power management.

References [1] Dally, W. J. and Seitz, C. L. 1987. “Deadlock Free Message Routing in Multiprocessor Interconnection Networks.” IEEE Trans. Computers, pages 547-553.

[2] Dally, W. J. 1992. “Virtual channel flow control.” IEEE Trans. Parallel and Distributed Systems, pages 194-205.

[3] Dally, W. J. and Towles, B. 2004. Principles and Practices of Interconnection Networks. Morgan Kaufmann.

[4] Vaidya, A., Sivasubramaniam A. and Das, C. R. 1999. “LAPSES: A recipe for high performance adaptive router design.” In Proceedings International Symposium on High Performance Computer Architecture (HPCA-5), pages 236-243.

[5] Flich, J., Mejia, A., Lopez, P. and Duato, J. “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Network on Chips.” International Symposium on Networks on Chip, 2007.

[6] Flich, J., Rodrigo, S. and Duato, J. “An Efficient Implementation of Distributed Routing Algorithms for NoCs.” International Symposium on Networks on Chip, 2008.


[7] Glass, C. J and Ni, L. M. “The Turn Model for Adaptive Routing.” Journal of the ACM, pages 874-902, 1994.

[8] Duato, J. “A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks.” IEEE Transactions on Parallel and Distributed Systems, 1055-1067, 1995.

[9] Boppana, R.V. and Chalasani, S. “Fault-Tolerant Wormhole Routing Algorithms for Mesh Networks.” IEEE Transactions on Computers, pages 848–864, 1995.

[10] Duato, J. “A theory of fault-tolerant routing in wormhole networks.” IEEE Transactions on Parallel and Distributed Systems, pages 600-607, 1994.

[11] Vangal, S., et al. “An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS.” IEEE International Solid-State Circuits Conference, 2007.

[12] Adiga, N. R., et. al. “Blue Gene/L torus interconnection network.” IBM Journal on Research and Development, 2005.

[13] Scott, S. and Thorson, G. “The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus.” In Proceedings of Hot Interconnects IV, pages 147-156, 1996.

[14] Maddox, R. A., Singh, G., and Safranek, R. J. Weaving High Performance Multiprocessor Fabric. Intel Press, Beaverton, OR, 2009.

Acknowledgments The work presented here has benefited from the contributions and insight of our colleagues Partha Kundu and Gaspar Mora Porta as well as our former colleagues Jay Jayasimha and the late David James.

We are also deeply thankful to our senior leaders Joe Schutz, Jim Held, Justin Rattner, and Andrew Chien for their continued support and encouragement for our work on scalable on-die interconnect fabrics through the tera-scale initiative.


Author Biographies Mani Azimi is a Senior Principal Engineer and Director of the Platform Architecture Research team in the Microprocessor and Programming Research (MPR) group in Intel Labs. He received his PhD degree from Purdue University. He joined Intel in 1990 and has worked on a wide range of platform architecture topics including system protocol, processor interface, MP cache controller architecture, and performance modeling/analysis. He is currently focusing on tera-scale computer architecture challenges. His e-mail is mani.azimi at intel.com.

Donglai Dai is a Senior Research Scientist at Intel Labs. He joined Intel in 2007 and has worked on on-chip interconnect architecture and design for Intel’s tera-scale computing initiative. Previously, he worked as an architect and lead designer of the threaded vector coprocessor, memory sub-system and coherence protocol, and interconnection network of high-performance computing (HPC) systems in Cray Inc. and Silicon Graphics Inc. He received a Ph.D. degree in Computer Science from Ohio State University. His e-mail is donglai.dai at intel.com.

Akhilesh Kumar is a Principal Engineer in Intel Labs. His research interests include cache organization, on-chip and off-chip interconnects, and interface protocols. He received his PhD degree in Computer Science from Texas A&M University. He joined Intel in 1996 and has contributed to the architecture and definition of system interfaces and cache hierarchies. His e-mail is akhilesh.kumar at intel.com.

Andres Mejia is a Senior Research Scientist at Intel Labs. His research interests include high-speed interconnects, cache organization, and performance modeling/analysis. He received his PhD degree in Computer Science from the Technical University of Valencia. His e-mail is andres.mejia at intel.com.

Dongkook Park is a Research Scientist in Intel Labs in Santa Clara, California. He received his PhD degree in Computer Science and Engineering from the Pennsylvania State University and joined Intel in 2007. His current research interests include on-chip interconnects and the microarchitecture of on-chip routers for future many-core processors. His e-mail is dongkook.park at intel.com.


Roy Saharoy is a Research Scientist at Intel Labs in Santa Clara, California. His contributions at Intel include the development of micro-architecture performance models, performance analysis of interconnects, pre-silicon validation of platform architecture, performance projection, and performance tuning of workloads. His current focus is on performance modeling, simulation methodology, and workload-based characterization of scalable on- die interconnects. Roy has many years of work experience in diverse areas of software development. He holds an M.Tech degree from the Indian Institute of Technology (Kharagpur) India and conducts industry-sponsored research at the Electrical and Computer Engineering Department of the University of Illinois, Urbana-Champaign. His e-mail is roy.saharoy at intel.com.

Aniruddha S. Vaidya is a Research Scientist at Intel Labs in Santa Clara, California. His contributions at Intel include workload characterization, performance analysis, and architecture of server platforms. His current focus is on router and interconnection network architecture for Intel’s tera-scale computing initiative. Ani has B.Tech and M.Sc. (Engg.) degrees from Banaras Hindu University and the Indian Institute of Science, and a PhD degree in Computer Science and Engineering from Pennsylvania State University. His e-mail is aniruddha.vaidya at intel.com.

Copyright
Copyright © 2009 Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Atom are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.


Tera-scale Memory Challenges and Solutions

Contributors

Dave Dunning, Intel Corporation
Randy Mooney, Intel Corporation
Pat Stolt, Intel Corporation
Bryan Casper, Intel Corporation

Abstract
Integrated circuit processing technology and computer architectures continue to mature. Single-chip CPUs have been demonstrated to exceed one TeraFLOP [1]. This high level of computation, concentrated in a small area, creates many system design challenges. One of these major challenges is designing and building a memory sub-system that allows these CPUs to perform well while staying within reasonable system cost, volume, and power constraints.

This article describes the challenges that tera-scale computing presents to the memory sub-system, including performance metrics such as memory bandwidth, capacity, and latency, as well as the physical challenges of packaging and memory channel design. New technologies that need to be developed and matured for tera-scale memory sub-systems are also discussed.

Index Words
DRAM
DDR
I/O Channels
Bandwidth
Power Efficiency

Introduction
The architecture and construction of computers have gone through many changes. The first electronic digital computer was invented in 1939, and it is usually credited to John V. Atanasoff and Clifford Berry from Iowa State University. It consisted of vacuum tubes, capacitors, and a rotating drum memory. In 1945, the ENIAC was built: it weighed over 20 tons and filled a large room. In 1948, a team of engineers at Manchester University built a machine nicknamed “the baby.” This was the first computer that was able to store its own programs, and it is usually considered to be the forerunner to the computers we use today. A type of altered cathode ray tube was used to store data.

The design constraints of these early electronic computers were the required computer room volume and the power consumed, primarily due to the use of thousands of vacuum tubes. In 1959, Jack Kilby, then at Texas Instruments, and Robert Noyce, then at Fairchild Semiconductor, invented the monolithic integrated circuit. The transition to integrated circuits was the start of an impressive treadmill for the computer industry, where more and more transistors could be built and connected within a monolithic “chip.” There have been many different architectures, often categorized as mainframes and microprocessors (microcomputers). The level of integration for these computers has increased over time, but the basic architecture comprising a central processing unit (CPU) and memory has remained.


In the early years of electronic computers, drum memory was often used. This consisted of a rotating drum coated with ferromagnetic material and containing a row of read and write heads. Drum memory was followed by core memory that used magnetic rings to store information in the orientation of the magnetic field. As transistors matured, core memory was replaced with microchips, by using configurations of transistors and, often, capacitors to store digital information.

CPUs and memory have evolved to use high transistor count integrated circuits; they both use complementary metal oxide semiconductor (CMOS) technology. Although the components of CPUs and memory are built basically in the same way, memory and CPU continue to be partitioned into different sub-systems within a computer. Such a method has worked well up to this point: the transistor density (transistors per area of silicon substrate) for CPUs and the density of those used to create memory chips continue to follow Moore’s Law [2]. These increases in integration have allowed for increased functionality and increased clock frequency, both leading to impressive improvements in system performance and reductions in system costs.

In the last decade, however, double data rate (DDR) memory has emerged as the dominant memory technology (in terms of number of units sold). Some of the key features that have made DDR memory appealing are the low cost per bit, sufficient bandwidth to supply instructions and data to the CPUs, and the shared bus aspect of the interfaces to the memory chips: more memory can be added to a memory channel interface if more capacity is needed, a form of “pay as you go” method of memory acquisition.

Recently, however, we are starting to see some strain in the ability of DDR-based memory to meet the needs of higher- and higher-performing tera-scale CPUs. The two constraints being felt most are the increasing power consumption that comes with the increased power density, as well as the electro-mechanical challenges (signal integrity) associated with exchanging data between two chips: i.e., not allowing bit rates per pin or trace to increase at a fast enough rate.


Memory Fundamentals
In this article we focus on the memory in microprocessor-based systems. More precisely, we focus on memory that is outside of the CPU; that is, we do not focus on on-die caches, or on non-volatile storage. This memory outside of the CPU is circled in the simple drawing in Figure 1. (The figure’s labeled blocks include the cores, cache(s), memory controller, I/O, memory, storage, PCIe, power delivery, and cooling.)

Figure 1: System Block Diagram Source: Intel Corporation, 2009

Key Metrics
The key metrics for memory sub-systems are bandwidth, capacity, latency, power, system volume, and cost. We examine these next.

Bandwidth. Bandwidth is defined as the bytes moved per second between the memory and the CPU’s memory controller. Bandwidth is usually the most talked-about performance parameter. The bandwidth required for a system is usually market segment and application (working set size, code arrangement, and structure) dependent. Amdahl’s rule of thumb [3] suggests there should be one Bps of memory bandwidth for each instruction per second executed. This ratio is sometimes converted to one Bps per floating point operation per second. That high a ratio of memory bandwidth to CPU performance has not been achieved for most general-purpose computer systems today.

Capacity. The capacity is the total number of bytes stored. A weak correlation exists between the capacity of memory and performance. There is a large range for memory capacity compared to bandwidth to memory. When comparing capacity (bytes) to bandwidth (Bps) as a ratio, the range goes from very small (0.1) to large (10).


Latency. This is the time it takes to read a word from memory. The focus is usually on read latency. Write latency can be posted (put in a queue) and is therefore considered less important. The latencies to DRAM devices have been reduced slowly over the last decade.

Power (energy per time). Power equals the energy consumed divided by the time in which that energy is consumed. Sometimes power is a more useful metric than energy per bit moved. In effect, to be accurate in assessing the power and energy efficiency of memory, both metrics are useful. Although the majority of the power consumed is often dependent on how much data are moved, power is also consumed that is not directly dependent on the data read and written. Lastly, the power and energy per bit moved for memory are also dependent on how the DRAM devices are physically connected to the memory controller.

System volume. This is the space taken up by the DRAM chips (often called DRAM devices). It is common for DRAM devices to be mounted on a small printed circuit board, called dual inline memory modules (DIMMs). Therefore, system volume is sometimes measured in mm3. Board area (mm2) is often considered, and although technically not a volume, board area is often lumped into the “volume” category. Volume reductions must be accompanied with commensurate power reduction so that power densities are not increased.

Cost. The cost refers to the money it costs to implement the memory sub-system of choice.
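As a rough numerical illustration of the bandwidth and capacity metrics, consider a hypothetical 1-TFLOP tera-scale CPU (an illustrative Python sketch; the round numbers are chosen only for the example and are not measurements of any particular system):

# Illustrative only: Amdahl's rule of thumb and the capacity-to-bandwidth
# ratio applied to a hypothetical 1-TFLOP CPU.
PEAK_OPS_PER_SEC = 1.0e12                        # hypothetical tera-scale CPU

# Amdahl's rule of thumb: ~1 byte per second of memory bandwidth
# per operation (or instruction) per second.
amdahl_bandwidth_Bps = 1.0 * PEAK_OPS_PER_SEC    # 1 TBps

# Observed capacity-to-bandwidth ratios span roughly 0.1 to 10.
for ratio in (0.1, 1.0, 10.0):
    capacity_bytes = ratio * amdahl_bandwidth_Bps
    print(f"ratio {ratio:>4}: ~{capacity_bytes / 1e12:.1f} TB of capacity "
          f"per {amdahl_bandwidth_Bps / 1e12:.0f} TBps of bandwidth")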

Memory Sub-system Scaling
Most DRAM processes are very efficient at implementing small-sized trench capacitors in their substrate, as well as low leakage transistors. These processes tend to favor fast n-channel transistors used as charge switches. DRAMs often implement the p-channel transistors to be slow (longer switching time) relative to the n-channel transistors. Therefore, a DRAM process is not good for general logic functions when compared to the processes used to implement CPUs. Because of these choices, the evolution of DRAM has seen a high percentage of logic functions managed in the DRAM controller, not the DRAM itself. These choices, coupled with process shrinks, have resulted in an impressive reduction in per-memory-bit cost.

As CPU performance has increased, the bit rate per pin or bit rate per bump (depending on the CPU and packaging technology) has also increased. Unfortunately for system designers, the ability to increase performance within a chip has increased much faster than the ability to increase the bit rate per pin, bit rate per bump, and bit rate per trace. The result has been a reduction in the number of DIMMs that can be attached to a memory channel while still maintaining good signal integrity. Although these challenges and limitations may be well understood, they have proven to be difficult to solve within the current cost constraints of most systems.


To summarize, memory sub-systems are mostly constrained by four main factors: bandwidth, power, capacity, and cost. System main memory is typically implemented with commodity DRAM. The bandwidth per pin is not increasing as quickly as the compute capability of CPUs: this imbalance causes an increase in CPU pin count, and it either adds cost, or reduces the bandwidth per compute operation and decreases the system performance. The capacity-dependent power for DRAM, and the energy per bit which is read and written for DRAM, are not decreasing as quickly as the compute capability of the CPUs is increasing. Tera-scale CPUs increase performance by allowing many threads to run simultaneously. Concurrently running threads are expected to maintain or increase the demand for bandwidth beyond the traditional twice-every-two-year curve [4].

Evolutionary Memory Scaling The interface to dual inline memory modules and the architecture of double data rate memory do not scale well for future tera-scale CPUs. The following sections address the areas where scaling to the future is problematic.

Bandwidth Traditionally, volume microprocessor-based systems have demanded an increase of two times per two years in off-chip bandwidth. This trend of increased bandwidth demand is based on the bandwidth demand of a single microprocessor core. As we add cores to the microprocessor, and increase the parallel applications running on those cores, the bandwidth demand will increase beyond this traditional trend. The increased demand has historically been met by using commodity DRAM (DDRx) technology, and in some high-end applications, by using specialized memory such as graphics DRAM (GDDRx). The bandwidth supplied by these devices is limited by the width of the external interface, the pin speed of that interface, and the cycle time and banking of the DRAM itself.

The cycle time of the DRAM is limited by both the DRAM process technology and by the architecture of the chip, i.e., the size of the physical arrays and the parasitics associated with them. DRAM technology has been optimized for capacity, not for bandwidth. The transistors are designed for low-leakage power, and the metal stack is composed of two to three layers that are optimized for density. The cycle time for large arrays is thus limited by metal parasitics and by the transistors (optimized for DRAM) that are used for control logic and the datapath. Decreasing the sub-array size to reduce these parasitics, as is typically done for GDDRx, means replicating peripheral circuits, such as decoders and sense amps, and increasing the wiring required to get the data from the increased number of sub-arrays off of the chip. These factors decrease the memory density, thereby increasing the cost per bit, while also increasing power consumption.


Figure 2: DDR2 Four-DIMM Topology Source: Intel Corporation, 2009

The pin speed of the external interface is limited by the performance of the low-leakage transistors available, as well as by the configuration of the physical components in the path between the memory controller and memory: packages, connectors, sockets, wires, etc. The width of the interface is limited by package size and cost, as well as by the interconnect components, such as module connectors. A major limitation in pin speed for DDRx solutions is the physical configuration. Packaged DRAM devices are placed on DIMMs. Multiple DIMMs are then connected to a common set of wires going to a memory controller. This common set of wires forms a DRAM “channel.” DIMMs are connected to this channel by inserting the DIMMs into edge connectors on a motherboard. This “multi-drop” style of connection creates an electrical discontinuity, or impedance change, at each DIMM connector on the common motherboard wires. Such a design limits the overall speed of the wires. An electrical illustration of this type of connection is shown for a DDR2 four-DIMM topology in Figure 2.

Figure 3 shows a “pulse response,” or representation, of the distortion introduced by propagating a voltage pulse representing one bit of data across this interconnect.

Figure 3: Pulse Response Source: Intel Corporation, 2009


In order to increase the pin speed, the number of DIMMs connected to the channel must decrease in each generation. DDR2 has four DIMMs while DDR3 data rates often allow just two; a future DDR4 will likely be limited to one DIMM per channel, and eventually only have a single “rank” of memory on that DIMM, meaning there will be a DRAM device on only one side of the DIMM. Once we move to one DIMM per channel in DDR4, the pin speed will be capped at 2.5-3.5 Gigabits per second (Gbps) per pin, due to the signaling scheme used and the physical channel components. At that point, DDRx bandwidth scaling via pin speed will cease. This will correspond to a bandwidth of 24 Gigabytes per second (GBps) at a 3 Gbps pin speed for a 64-bit-wide interface. Adding more bandwidth then means adding memory pins and channels to the system, which will be limited by component package sizes, pin counts, and DIMM connector pin counts.
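The 24 GBps figure follows directly from the projected pin speed and the interface width; a minimal sketch of the arithmetic, using the numbers quoted above:

# Peak bandwidth of one DDRx channel at the projected DDR4 pin-speed cap.
pin_speed_gbps = 3.0        # Gigabits per second per pin (middle of 2.5-3.5)
interface_width_bits = 64   # data width of a standard DDR channel

peak_gbps = pin_speed_gbps * interface_width_bits   # 192 Gbps
peak_GBps = peak_gbps / 8                           # 24 GBps
print(f"Peak channel bandwidth: {peak_GBps:.0f} GBps")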

In order to get more bandwidth without adding channels, we then must move to GDDRx-type topologies. In this technology design, the memory device is soldered to the motherboard, and there is one chip or set of chips per channel, thus removing the DIMM-related discontinuity. However, this severely limits capacity, since the multiple DIMM, multiple ranks per DIMM scheme, is now gone. To improve the cycle time of the memory, the sub-array size is decreased, again trading reduced capacity for increased bandwidth. This design will provide a solution for a pin speed increase of up to 5 – 6 Gbps, at which time DRAM process technology and package parasitics will prevent further increases. Exotic solutions such as the so-called XDR scheme proposed by RAMBUS Corporation would involve a two-pin per signal, or differential scheme, to increase the pin speed further. To justify the extra pins, pins whose speed is twice that of GDDRx would be necessary. Such a design will severely push the capability of DRAM process technology and result in unacceptable increases in power and perhaps cost.

Power
DRAM power is composed of three main components: power consumed by the storage array, power consumed by the peripheral circuits, and power consumed by the datapath from the array to the I/O pins. Approximately 50 percent of the power is in the datapath, with the other 50 percent split between the peripheral circuits and the array. Historical power trends for DDRx-based systems show overall power to be between 40 – 200 milliwatts (mW) per Gbps. This number is dependent on the particular type of memory used (DDRx or GDDRx, for example); the power supply; the process technology node; the physical interconnect channel configuration; and the memory usage model. Smaller sub-arrays are more power efficient at the expense of capacity. This makes GDDRx, for example, two to three times more power efficient than DDRx, but with a substantial density penalty.

As mentioned previously, DRAM process technology is not amenable to high-speed functions. In order to keep the datapath relatively narrow and save external pins, the datapath becomes the highest speed portion of the design, thus consuming 50 percent of the device power.


The physical channel for GDDRx, which connects to one device per channel, is different from the channel for fully buffered DIMMs, which have added active devices on the DIMM to form connections to other DIMMs in a “point-to-point” fashion. This design configuration is the least power efficient of all configurations, due to the added power of the added active device.

Usage policies such as “page open” or “page closed” have a large impact on power efficiency, depending on the “activation efficiency” of the device, or on the amount of data actually used when a row of the memory array is “activated,” or read to the sense amplifiers in preparation for making the data available to send to the external memory pins.

As a quick case study, assume that a future high-end system needs a terabyte per second (TBps) of external bandwidth. This corresponds to 8 terabits per second (Tbps) of bandwidth. At 40 mW per Gbps this memory sub-system then consumes 320 Watts. At 200 mW per Gbps, it consumes 1600 Watts. These efficiencies will scale somewhat with further decreases in the DRAM power supply, but reducing the supply below 1.2 Volts (it is currently at 1.5 Volts) will become very difficult, due to process technology and circuit constraints. Clearly, an evolutionary solution to providing high bandwidth will become a show-stopper due to power.
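The arithmetic behind this case study can be reproduced directly from the quoted efficiency range (a sketch assuming the 1 TBps target and the 40 – 200 mW per Gbps figures above):

# Memory sub-system power for a hypothetical 1 TBps (8 Tbps) requirement.
target_gbps = 1.0 * 8 * 1000          # 1 TBps = 8 Tbps = 8000 Gbps

for efficiency_mw_per_gbps in (40, 200):
    power_watts = target_gbps * efficiency_mw_per_gbps / 1000
    print(f"{efficiency_mw_per_gbps:>3} mW/Gbps -> {power_watts:.0f} W")
# Prints 320 W and 1600 W, matching the case study.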

Capacity
We have already discussed the tradeoff for bandwidth versus capacity in traditional memory sub-systems. One solution is fully buffered DIMMs, which, as previously discussed, will be limited by power. Interim solutions, such as a “buffer on board,” will help in the short term by placing a single buffering component on the motherboard rather than on the DIMMs, but will still be limited, due to both expandability and power constraints.

Latency
Latency is trending down somewhat in absolute terms with process technology improvements, but when measured in number of processor cycles, the latency has been increasing. Latency improvements (reductions) would come from reducing the size of the memory sub-array to limit on-chip wire parasitics, and from moving the memory as close to the memory controller as possible to limit external wire length. Both of these solutions are limited in effectiveness and/or trade capacity for latency. The memory controller, due to its complex association with rows, columns, pages, and ranks of DDR devices, has the largest impact on latency in the system. If the architecture of DRAM devices does not change, it will be very difficult to reduce the complexity of the memory controllers. Therefore, it will be very difficult to reduce the latency to read from and write to memory while maintaining the existing cost per bit and existing bandwidths.


Evolutionary Summary In summary, the key trends for evolutionary memory sub-system scaling are these:

• Bandwidth scaling for traditional DDRx-based systems will end at about 24 GBps for a single channel.
• To get this bandwidth, capacity per channel will be limited to one DIMM without extra components, such as buffer on board (motherboard).
• GDDRx gives increased bandwidth but at the cost of capacity. Pin speed will be limited to 5 – 6 Gbps for GDDR channels being constructed today.
• Power in the memory sub-system varies from 40 – 200 mW per Gbps, translating to hundreds of Watts for a TBps of bandwidth.
• Adding capacity to evolutionary memory sub-systems is limited to adding channels, fully buffered DIMMs, or putting a buffer on the motherboard. All of these add cost and power to the system. Fully buffered DIMM memory is the least power efficient of all DRAM memory technologies.
• Latency improvements for evolutionary systems will be minimal.

Tera-scale Memory Challenges
Tera-scale CPUs put additional stress on the memory sub-system and the technologies used to implement them. In the following section, we describe some of those challenges.

Memory Technology
As we look forward to the era of tera-scale computing, the first question we need to ask is which memory technology(s) will fill the needs of these systems. DRAM technology has long dominated the market for off-chip memory bandwidth solutions in computing systems. While non-volatile memory technologies such as NAND Flash and Phase Change Memory are vying for a share of this market, they are at a disadvantage with respect to bandwidth, latency, and power. Given this, DRAM technology will continue to be the solution of choice in these applications for the foreseeable future. We, therefore, discuss future DRAM memories.

We discussed basic DRAM technology earlier in this article. Since the technology is fundamentally dependent on a fixed amount of energy storage on a capacitor, density scaling will become problematic in DRAM technology. At that point, 3-D technology will become a viable path for further scaling.


Key Metric Equals Bytes per FLOP
The key metric for the performance of memory sub-systems is bandwidth (Bps) per computational performance. Performance is usually measured as floating point operations (FLOPs) per second, sometimes instructions executed per second, and [rarely] clock frequency (cycles per second). Typically, the seconds unit in both the denominator and the numerator is removed, and the metric is expressed as bytes per FLOP. Available bandwidth within a platform has increased twice every two years for many years. Multi-core technology continues to increase the bandwidth demand of CPUs, while at the same time, the available bandwidth as well as the capability of external memory sub-systems are increasing at a lower rate, due to the increased DRAM limitations described earlier.
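For a concrete sense of the metric, the bytes-per-FLOP ratio can be computed from the numbers used earlier in this article (an illustrative sketch; the 1-TFLOP CPU and 24 GBps channel are the running examples, not a specific product):

# Bytes per FLOP for a hypothetical tera-scale CPU fed by DDRx channels.
peak_flops = 1.0e12        # 1 TFLOP CPU
channel_GBps = 24.0        # one DDRx channel at its projected scaling limit

for channels in (1, 4, 8):
    bytes_per_flop = channels * channel_GBps * 1e9 / peak_flops
    print(f"{channels} channel(s): {bytes_per_flop:.3f} bytes/FLOP")
# Even 8 channels (~192 GBps) supply only ~0.19 bytes/FLOP, well short of
# Amdahl's one byte per FLOP rule of thumb.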

Future Memory Sub-systems Changing or fixing one of the problems with bandwidth will not alter the fundamental challenges we face with memory sub-systems. We need a holistic approach to achieve the required results. The main factors that will need to be addressed to achieve the optimal solution for increased bandwidth and lower energy per bit of future tera-scale memory sub-systems are the channel materials, the IO density, the memory density, and the memory device architecture. We examine the changes required in each of these areas.

Channel Materials First we look at the materials that could be used to construct channels between CPUs and memory modules.

Typically, in order to increase rates as much as possible at the longer lengths, complexity is added to the I/O solution in the form of additional equalization, more complex clocking circuits, and possibly, data coding. These added features increase the power consumed by the I/O solution. More complex interconnects, such as flex cabling, improved board materials, such as Rogers or high-density interconnect (HDI), and eventually, optical solutions, must be considered. These features can increase the cost of the I/O solution. To reduce the impact of the increased cost of the channel, the size of the channel should be reduced, and this has additional electrical signaling benefits. In order to minimize both power and cost, locality in the data movement should be exploited to the greatest possible extent. Exploiting locality of data movement leads to the next desired characteristic in our future memory solutions: I/O density.

(Figure 4 compares achievable data rate versus trace length for flex cabling with two connectors, a Rogers backplane, an FR4 single board, and an FR4 backplane, at a 10^-12 bit error rate measured with a 32-bit LFSR pattern.)

Figure 4: Data Rate vs. Trace Length for Different Materials Source: Intel Corporation, 2009


I/O Density In looking at the desired characteristics for future memory sub-systems that require very high bandwidth, it is instructive to divide the challenge of building a high I/O density solution into two parts: I/O and memory core components. We first look at the I/O solution. In Figure 5, we plot the power efficiency in mW per Gbps of various I/O options, including DDR3, GDDR5, on-package I/O, two experimental Intel interfaces, and an interface that may be enabled through short, dense interconnects.


Figure 5: Power Efficiency of Various I/O Options Plotted in mW per Gbps Source: Intel Corporation, 2009

Note that the traditional memory interfaces achieve power efficiencies that vary from 15 mW/Gbps to 20mW/Gbps. These numbers are dependent on the physical channel between the CPU’s memory controller and the memory, the style of the I/O, and the rate at which the I/O is running. Typically, the faster the I/O, the less power efficient it will be, as the process technology is pushed harder and features are added to the I/O solution to increase speed. The Intel research presented at the International Solid State Circuits Conference in 2006 [5] is an interface built to achieve 20 Gbps on 90 nm CMOS. The interface labeled “Intel VLSI 07” [6] in Figure 5, on the other hand, was built to operate across a wide performance range with optimum power efficiency at all performance points. Note that, for that interface, the power efficiency does indeed improve at lower data rates, 2.7 mW per Gbps at 5 Gbps, versus 15 mW per Gbps at 15 Gbps. This improvement is achieved by optimizing both the I/O circuits and the power supply for each data rate to achieve the desired bit error rate on the interface, given the non-linear characteristic of reduced power efficiency as the bit rate increases. By utilizing shorter, more optimized interconnects, we can potentially further improve this style of interface to achieve approximately 1 mW per Gbps at 10 Gbps. Note that this is approximately an order of magnitude better than the I/O solution on traditional memory technologies.
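Folding these efficiency points into the earlier 1 TBps case study shows how much the operating point matters (an illustrative sketch; the 17.5 mW/Gbps entry is simply the midpoint of the 15 – 20 mW/Gbps range quoted above):

# Total I/O power to deliver 8 Tbps (1 TBps) at the efficiency points cited above.
aggregate_gbps = 8000
efficiency_points = [
    ("traditional memory interface (midpoint)", 17.5),
    ("scalable interface at 5 Gbps",             2.7),
    ("proposed target at 10 Gbps",               1.0),
]
for label, mw_per_gbps in efficiency_points:
    power_w = aggregate_gbps * mw_per_gbps / 1000
    print(f"{label:<40} -> ~{power_w:.0f} W of I/O power")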


From the data given, we can draw the conclusion that, in order to minimize the power consumed, any data movement should be across as wide an interface as possible. For a given aggregate desired bandwidth, this means that we can operate each I/O line at the lowest possible rate, thereby improving the power efficiency. In order to improve power efficiency without increasing the size or form factor of the system or the associated distances that data must move, we need to increase the density of the interconnect of the channel between the memory controller and the memory.
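A quick sizing exercise (a sketch reusing the 1 TBps aggregate from the earlier case study) shows why the interconnect must become denser as the per-lane rate is lowered:

# Interface width needed for 1 TBps as the per-lane rate is lowered to save power.
aggregate_gbps = 8000
for lane_rate_gbps in (15, 10, 5):
    lanes = aggregate_gbps / lane_rate_gbps
    print(f"{lane_rate_gbps:>2} Gbps per lane -> ~{lanes:.0f} signal lanes")
# Slower lanes are more power efficient, but the resulting hundreds to
# thousands of lanes only fit if the channel uses dense, package-level
# interconnect rather than motherboard traces and connectors.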

The density of traditional chip-to-chip interconnects is limited by several factors, chief of which is, of course, the cost of the selected interconnect. Traditional printed circuit board technology is, for example, much less dense than microprocessor package interconnect, as measured by the number of wires per cross-sectional area. Given this knowledge, one obvious way to increase the interconnect density is to mount the memory on the microprocessor package, as shown in Figure 6. Using this configuration, no connections between the CPU and the memory go to the motherboard, and the improved channel material yields the density increase. This configuration helps to solve the problem of moving bits from the microprocessor to the edge of the memory die; however, we also need to explore more efficient methods of moving those bits in and out of the memory array.

Figure 6: Memory Mounted on Microprocessor Package Source: Intel Corporation, 2009

Memory Density Earlier in this article, we discussed the limitations that DRAM technology imposes on high bandwidth solutions, as well as the fact that density scaling may become an issue at some point in the future. We need a technology that will solve both of these issues. 3-D technology, based on through silicon vias (TSVs), offers one such possible solution [7, 8, 9]. 3-D stacked memory will provide an increase in memory density through stacking, and it will enable a wide datapath from the memory to the external pins, relaxing the per-pin bandwidth requirement in the memory array as shown in Figure 7.

(Figure 7 shows stacked dies with signal layers, die-to-die vias, thermal through-silicon vias (TTSVs), through-silicon vias (TSVs), and C4 bumps.)

Figure 7: 3-D Stacked Memory Module [7] Source: Intel Corporation, 2009


This design achieves six objectives:

• We provide a method for further scaling of DRAM density.
• We enable a relatively wide datapath from the memory array to the memory pins, relaxing the speed constraints on the DRAM technology.
• We maintain a high density connection from the memory module to the memory controller, which makes for more efficient use of power. We never connect to low-density motherboards, sockets, or connectors.
• We eliminate many of the traditional interconnect components from the electrical path, including memory controller package vertical path, socket, and memory edge connectors.
• We can separate the high bandwidth I/O solution from the microprocessor and memory controller power delivery path, since we are using the top of the package rather than the bottom to deliver bandwidth.
• The increased density eliminates the need for the electrically-challenged and energy-inefficient, multi-drop DIMM bus. The new stacked memory will be seen as a single load device.

Finally, we need a way to move the data from the wide datapath from the memory array to the memory device pins. There are several possible ways to move the data: the general characteristics necessary for an optimal solution are the ability to efficiently multiplex the data at a rate that matches the data rate of the increased device pins (Gbps), rather than a rate that matches the slower, wider memory datapath, at an efficient energy level (picojoules per bit) that closely matches the characteristics of the CPU generating the memory requests.
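As a rough illustration of this multiplexing step (a sketch with hypothetical widths and rates; the actual ratios would be set by the DRAM array and packaging technology):

# Matching a wide, slow internal datapath to fewer, faster device pins.
# All numbers are hypothetical, chosen only to illustrate the serialization math.
internal_width_bits = 1024      # wide array datapath enabled by TSVs
internal_rate_gbps = 0.2        # slow array-side rate per line (200 Mbps)
pin_rate_gbps = 6.4             # faster external pin rate

aggregate_gbps = internal_width_bits * internal_rate_gbps   # 204.8 Gbps
pins_needed = aggregate_gbps / pin_rate_gbps                # 32 pins
mux_ratio = internal_width_bits / pins_needed               # 32:1
print(f"{aggregate_gbps:.1f} Gbps over {pins_needed:.0f} pins "
      f"({mux_ratio:.0f}:1 multiplexing)")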

Now that we have an efficient method of moving the bits, and a means to scale device capacity, we look at what we can do with the memory device architecture.

Memory Device Architecture
DDR-based DRAM devices use an architecture of rows of bits that are activated via a row address strobe (RAS). This causes small charges stored on rows of capacitors to pull up or down a bit bus that is attached to sense amplifiers. These sense amplifiers amplify the small voltage from the memory bit storage capacitor into an externally readable voltage and they buffer them in temporary storage. The relatively few bits that are being read or written within that row are then selected via a column address strobe (CAS). This architecture provides a good set of choices when low cost per DRAM bit is the highest priority. This architecture also allows for low pin-count devices and minimizes the area required for logic and data paths within the die. A simplified drawing of this architecture is shown in Figure 8.


(Figure 8 shows the banks, sense amps and drivers, data in/out, RAS/CAS inputs, and the VREF = VCC/2 reference.)

Figure 8: Simplified DRAM Architecture Source: Intel Corporation, 2009

Unfortunately, this architecture is not conducive to low energy per bit read or written, since energy is used to amplify all of the bits in the entire row even though only a few are read. Therefore energy is wasted on the unused bits. Row access sizes, called pages, are generally in the 1-Kilobyte to 2-Kilobyte range, while the memory requests that activated the page are commonly only 8 bytes, a 128:1 or 256:1 ratio, respectively. Although operating systems and memory controllers try to optimize data placement and access patterns to maximize the reuse of an activated page, most of the data in an activated page are not used. To make matters worse, as the number of cores in CPUs increases and more threads are executing concurrently, the address locality of data accesses, and therefore the ability to maximize activated page reuse, is decreasing. This will lead to further wasted energy in the memory sub-system.
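The activation overhead can be expressed as a simple ratio (illustrative arithmetic using the page and request sizes quoted above):

# Ratio of bits activated (and amplified) to bits actually requested.
request_bytes = 8
for page_kilobytes in (1, 2):
    page_bytes = page_kilobytes * 1024
    ratio = page_bytes // request_bytes
    print(f"{page_kilobytes} KB page / {request_bytes} B request -> {ratio}:1")
# 128:1 and 256:1, so less than 1 percent of the activated bits are used
# unless the open page is reused by subsequent requests.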

As DRAM technology has advanced and capacity per device has increased over the last 15 years, the page size, and therefore the activation energy, has not been reduced as quickly as the number of bits has increased. Adding banks enables multiple rows to be activated and held open while other banks are being accessed. This feature has an additional benefit of reducing the need for consecutive spatial locality of accesses to reuse an activated page, but it does little to address the overall ratio of activation energy to request size.


Reducing the energy of the memory device architecture will require a reduction in the size of the pages to reduce the activation energy per request. Various page sizes must be investigated against many workloads. To further mitigate the impact of the loss of spatial locality, due to advanced multi-threaded, multi-core, tera-scale CPUs, the number of banks can be greatly increased, which should also simplify the routing of the bit lines and sense amplifiers in the memory device [8].

TSVs, as discussed previously, can be used to route many pages from a large number of banks within the DRAM memory array to the device pins and connect them to the improved channel between the DRAM memory module and the tera-scale CPU. A discussion of how proposed memory array architecture is implemented and routed with TSVs to the device pins is beyond the scope of this article, and, moreover, is dependent on the individual technology developed by the memory array manufacturers; however, the individual technologies discussed in this article can be combined to achieve an overall improved memory device architecture.

Memory Hierarchy
Given a memory of the type we describe, we must also examine the entire memory hierarchy. For example, it may be advantageous to add a level of memory to the hierarchy. We may have some amount of high-bandwidth memory, while the rest of the memory capacity has a low-bandwidth requirement. There will be cost, performance, and power tradeoffs when deciding how the high-bandwidth memory is connected to the CPU. The same is true for the lower-bandwidth, higher-capacity, slower “bulk” memory. Since the majority of the memory is low bandwidth, we have several options for connecting it to the CPU, such as connecting the memory to the printed circuit board and using copper traces.

The reasons for inserting an additional level in the memory hierarchy are the same as those for looking into new memory technologies: performance, power, and cost. More precisely, a new memory hierarchy could provide increased performance, low power, or both increased performance and lower power, within a set cost constraint. Equally appealing is reducing cost but maintaining the same performance and power consumption levels, something that may also be possible with the addition of another level of memory hierarchy.

Analyzing different memory hierarchies is a huge challenge. All the metrics mentioned previously need to be evaluated in the context of the applications of interest (see “Key Metrics”). Adding to the challenge is anticipating the effect tera-scale CPUs will have on the memory traffic to and from memory.


The techniques to estimate the performance of memory are often broken into cache management and memory management. For comparison purposes, we choose the following starting point.

• Caches are on the same CPU die as the processing cores.
• The capacity of the caches is in the range of 1 – 4 Megabytes per core for the last level of cache (if there is more than one level of cache present).
• The unloaded latency to read from a cache is in the range of 10 – 30 nanoseconds (ns).
• The replacement policy of the cache is managed by an algorithm implemented in hardware in the CPU. Cache lines are the unit of transfer (size of data that is placed and replaced). The cache line size (number of bytes) is dependent on the architecture of the core. Most current CPUs usually have two or more levels of cache as well as a (very low-latency) register file per core. In a thorough analysis of memory sub-system performance, these features would have to be considered.
• The main memory is DRAM, and it is on separate chips from the CPU.
• The capacity of main memory varies greatly from system to system, with the average range within an order of magnitude of 1 Gigabyte per core.
• The unloaded latency to read from main memory is on the order of magnitude of 50 ns.
• The “replacement” policy is pages, managed by the operating system (OS). The size of pages varies; it is commonly 4 Kilobytes or larger.

When considering additional levels of the memory hierarchy, the key decisions are where to add a level or levels in the memory hierarchy and how the levels of memory are managed.

Memory Hierarchy — Where to Add Memory When designing a memory hierarchy, we use the following guidelines. Looking out from a CPU, the closer (lower) level of memory must have the lowest latency and the highest bandwidth, and it can have the lowest capacity. As the levels of memory increase, the latency increases; the bandwidth decreases while the capacity increases. Hidden within these guidelines is the fact that the lower the latency to the memory, the higher the cost per bit. Also, the lower the latency to the memory, the higher the energy per bit read and written.

Earlier, we concluded that to meet the needs of tera-scale systems, designers should investigate new architectures and manufacturing techniques for DRAM, with an emphasis on 3-D stacking with TSVs. We are confident that these techniques will lead to improved DRAM products, while maintaining a low cost per bit stored. We also realize that when the new technologies are introduced, it will take time for the price per bit to drop. Therefore, early use of 3-D stacked memory as near memory, backed up by DDR-based DRAM or other low cost per bit memory technologies, may be an appealing and cost- effective choice for designers.


Memory Hierarchy — How Layers are Managed
Usage and management policies for near memory, and where these policies are implemented, are difficult research questions to answer. We chose to separate these questions into two approaches: manage the near memory as a cache, or present the memory to the OS as two regions of memory with different performance and energy characteristics.

Using policies implemented in hardware to manage the near memory as a cache is appealing, because this kind of management will cause the minimum disruption to application and OS software. The effectiveness of this management scheme will depend on how well the replacement policy leads to a high hit rate into the near memory. A high hit rate is essential to make this memory hierarchy meet the performance-to-cost goal. Target application data set size and memory access patterns, and how those interact with the replacement policy, will have a large impact on the hit rate. Other factors that will come into play are how the near memory is arranged: its associativity and number of ways (if more than one way in the cache). Adding to the challenge is that the size of the unit of data that is replaced also affects the performance and complexity (a form of cost) of the memory sub-system. As the associativity increases, the size of the tags increases. As the size of the replacement unit decreases, the hit rate should increase, but the number of tags also increases.
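The sensitivity to hit rate can be sketched with a simple weighted-average model (the near- and far-memory latency and energy values below are hypothetical, chosen only for illustration, and are not measurements of any proposed design):

# Effective latency and energy per access for a hardware-managed near memory,
# modeled as a weighted average over the near-memory hit rate.
NEAR_LATENCY_NS, FAR_LATENCY_NS = 20.0, 50.0          # hypothetical values
NEAR_ENERGY_PJ_BIT, FAR_ENERGY_PJ_BIT = 2.0, 10.0     # hypothetical values

def effective(hit_rate):
    latency = hit_rate * NEAR_LATENCY_NS + (1 - hit_rate) * FAR_LATENCY_NS
    energy = hit_rate * NEAR_ENERGY_PJ_BIT + (1 - hit_rate) * FAR_ENERGY_PJ_BIT
    return latency, energy

for hit_rate in (0.5, 0.8, 0.95):
    latency, energy = effective(hit_rate)
    print(f"hit rate {hit_rate:.0%}: ~{latency:.0f} ns, ~{energy:.1f} pJ per bit")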

The other extreme for a near and far memory management choice is to provide mechanisms in hardware to move data to and from either range of memory, and to leave the replacement policy to software. Due to the design complexity, we expect that the majority of people writing application code and OS code will not have enough incentive to tackle this problem for many years. Tera-scale computing will provide an increasing number of threads that are available for running applications. Effectively using and coordinating those threads will require that applications and the OSs that those applications run on evolve further. Code that is specialized for a lightweight OS and written to specific hardware will be the only code that can take advantage of two-level memory sub-systems managed by software.

Power and Energy Considerations
As the hierarchy of the memory sub-system is examined, more emphasis must be placed on the power (energy used per time) that is consumed to run the applications. The simple statement that data movement must be minimized will take on additional importance as tera-scale CPUs are built. For example, if a memory sub-system is built with near and far memory, the near memory will consume less energy to read and write bits, and will have lower latency, higher bandwidth, and less capacity than the far memory. The bandwidth to and from the near memory must be considerably higher than the bandwidth to and from the far memory, but the cost per bit of the far memory must be lower than the cost per bit of the near memory. This creates an additional incentive for designers to choose replacement policies for the near memory, such that the hit rate is high and the eviction rate is low.


Summary and Conclusions Tera-scale CPUs will demand a large amount of bandwidth from their memory sub-systems. Their demand for memory bandwidth will exceed one TBps. Current DRAM architectures and the DDR-based interfaces to those chips will not meet the needs of tera-scale CPUs. There is no simple solution to the challenges associated with bandwidth to memory; many aspects of the memory sub-system will require further research. Three key metrics have been identified that need improvement: bandwidth, power, and cost. There are strong dependencies between these metrics.

Increasing the bit rate per pin conflicts with lowering the power of the memory sub-system. I/O circuits have demonstrated impressively high bit rates. However, I/O circuits achieve higher power efficiency (lower mW per Gbps) at slower data rates. Therefore, to achieve high-bandwidth interfaces that are power efficient, wide slow interfaces need to be developed. To support those interfaces, new packaging and new materials will need to be used. In order to keep costs down, the use of these new technologies needs to be localized to small areas.

The use of TSVs as a third dimension of connection between DRAM chips is a promising technology to increase the capacity of DRAM products and reduce power consumption. TSVs effectively provide many more I/O connection points into and out of the DRAM chips. These connection points allow for different ways to access and manage the bit storage arrays that can lead to more accesses in parallel, thereby improving performance. TSVs also allow for different access groupings within the DRAM chips that can reduce the power consumed.

The introduction of different architectures of DRAM chips creates questions of what memory sub-systems will look like in the future. As new types of memory are built, the memory sub-systems will evolve to use the new memory products in ways that best meet the market segment’s performance, power, and cost requirements. Not all memory sub-systems will look the same.


References
[1] Vangal, Sriram et al. “An 80-Tile 1.28 TFLOPs Network-on-Chip in 65nm CMOS.” In IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, February 2007, page 98.

[2] Moore, G. “Cramming more components onto integrated circuits.” Electronics Magazine, 19 April 1965.

[3] Gene Amdahl. “Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities.” In Proceedings of AFIPS Conference, (30), pages 483-485, 1967.

[4] Brian Rogers, Anil Krishna, Gordon Bell, Ken Vu, Xiaowei Jiang, Yan Solihin. “Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling,” ISCA’09, June 20–24, 2009.

[5] B. Casper, et al. “A 20Gb/s forwarded clock transceiver in 90nm CMOS.” ISSCC 2006 Digest of Technical Papers, pages 90-91, February 2006.

[6] G. Balamurugan, J. Kennedy, G. Banerjee, J. Jaussi, M. Mansuri, F. O’Mahony, B. Casper and R. Mooney. “A scalable 5–15 Gbps, 14–75 mW low power I/O transceiver in 65 nm CMOS.” Symposium on VLSI Circuits Digest of Technical Papers, 2007, page 270. Also published in IEEE Journal of Solid-State Circuits, vol. 43, pages 1010-1019, April 2008.

[7] Deshpande, A., Natarajan, V. and Methani, J. “Pareto-Optimal Orientations for 3-D Stacking of Identical Dies.” 10th Electronics Packaging Technology Conference, IEEE, 2008, pages 193–199.

[8] Loh, G. “3D-Stacked Memory Architectures for Multi-core Processors.” 35th ACM International Symposium on Computer Architecture (ISCA), pages 453-464. June 21-25, 2008. Beijing, China.

[9] Black, B., Annavaram, M., Brekelbaum, N., DeVale, J., Jiang, L., Loh, G., McCauley, D., Morrow, P., Nelson, D., Pantuso, D., Reed, P., Rupley, J., Shankar, S., Shen, J. and Webb, C. “Die Stacking (3D) Microarchitecture.” In 39th IEEE/ACM International Symposium On Microarchitecture, (Micro). pages 469-479.

[10] D. Burger, J. R. Goodman, and A. Kagi. “Memory bandwidth limitations of future microprocessors.” In ISCA ‘96: 23rd Annual International Symposium on Computer Architecture, pages 78-89, New York, NY, USA, 1996. ACM.

[11] R. Kumar, D. Tullsen, N. Jouppi, and P. Ranganathan. “Heterogeneous chip multiprocessors.” Computer, 38(11):32-38, 2005.


[12] H. Q. Le, W. J. Starke, J. S. Fields, F. P. O’Connell, D. Q. Nguyen, B. J. Ronchetti, W. M. Sauer, E. M. Schwarz, and M. T. Vaden. “IBM POWER6 microarchitecture.” IBM Journal of Research and Development.51(6):639-662, 2007.

[13] M. K. Qureshi, M. A. Suleman, and Y. N. Patt. “Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines.” In HPCA ‘07: 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pages 250-259, Washington, DC, USA, 2007. IEEE Computer Society.

[14] Y. Solihin, F. Guo, T. R. Puzak, and P. G. Emma. “Practical Cache Performance Modeling for Computer Architects.” In Tutorial with HPCA-13, 2007.

[15] J. Tseng, H. Yu, S. Nagar, N. Dubey, H. Franke, and P. Pattnaik. “Performance Studies of Commercial Workloads on a Multi- core System.” IEEE 10th International Symposium on Workload Characterization, 2007. IISWC 2007, pages 57-65.

[16] Polka, L., Kalyanam, H., Hu, G. and Krishnamoorthy, S. “Package Technology to Address the Memory Bandwidth Challenge for Tera- Scale Computing.” Intel Technology Journal, Vol. 11, no.3, 2007, pages 196-205.

[17] P. G. Emma and E. Kursun. “Is 3D Chip Technology the Next Growth Engine for Performance Improvement?” IBM Journal of Research & Development. 52(6):541-552, 2008.

[18] A. Hartstein, V. Srinivasan, T. Puzak, and P. Emma. “On the Nature of Cache Miss Behavior: Is It √2?” In The Journal of Instruction-Level Parallelism, volume 10, 2008.

[19] M. D. Hill and M. R. Marty. “Amdahl’s law in the multicore era.” IEEE Computer, 41(7):33-38, 2008.


Acknowledgement We thank James Jaussi for his contributions to the data provided in this paper.

Authors’ Biographies
Dave Dunning is a Senior Principal Engineer in the Systems and Circuits Research Lab within Intel Labs. Dave’s career has ranged from parallel supercomputer system architecture to design of full custom CMOS circuits and chips. Dave has participated in high-speed, serial-based, inter-chip designs and high-speed clocking structures. Dave is currently concentrating on energy-efficient memory and memory controllers. He holds a BA degree in Physics from Grinnell College, Grinnell, Iowa, a BS degree in Electrical Engineering from Washington University, St. Louis, MO, and an MS degree in Electrical Engineering from Portland State University, Portland, OR. Dave joined Intel in 1988. His e-mail is dave.dunning at intel.com.

Randy Mooney is an Intel Fellow and director of I/O research in Intel’s Circuits and Systems Research Lab, part of Intel Labs. He joined Intel in 1992 and is responsible for circuit and interconnect research, focusing on solutions for multi-Gbps, chip-to-chip connections on microprocessor platforms. Another current research focus is memory bandwidth solutions for these platforms. He has authored several technical papers and is listed as an inventor or co-inventor on more than 60 patents. Randy previously worked in Intel’s former Supercomputer Systems Division, developing interconnect components for parallel processor communications, as well as a method of bidirectional signaling that was used for the interconnect in Intel’s teraFLOPs supercomputer. Prior to joining Intel, Randy developed products in bipolar, CMOS, and BiCMOS technologies for Signetics Corporation. He received an MSEE degree from Brigham Young University. His e-mail is randy.mooney at intel.com.

Patrick Stolt joined Intel in 1991. He is a Senior Technology Strategist in Intel Labs focused on Memory, IO, System On a Chip (SOC), and emulation technologies. Over the last 18 years, he has worked as a board and chip design engineer, silicon and system architect, and technology and product strategist in many areas, including in-circuit emulators, server memory chip sets, server memory architectures, front-side bus architecture, small business network Application Specific Standard Part (ASSP) SOC chips, Internet tablets, video processing, SOC for digital TV and video analytics, cognitive radios, and tera-scale memories. Prior to Intel, Patrick worked in the aerospace and test-and-measurement industries. Patrick has five patents issued and several pending. He holds a BSEE degree from the University of Wisconsin-Madison. His e-mail is patrick.stolt at intel.com.


Bryan Casper received his MS degree in Electrical Engineering from Brigham Young University, Provo, UT. He is currently leading the high-speed signaling team of Intel’s Circuit Research Lab, based in Hillsboro, Oregon. In 1998, he joined the Performance Microprocessor Division of Intel Corporation and contributed to the development of the Intel® Pentium® 4 and Xeon® processors. Since 2000, he has been a circuit researcher at the Intel Circuit Research Lab, responsible for the research, design, validation, and characterization of high-speed mixed-signal circuits and I/O systems. During his time in the Circuit Research Lab he has developed analytical signaling analysis methods, high-speed I/O circuits and architectures, and on-die oscilloscope technology. His e-mail is bryan.k.casper at intel.com.

Copyright
Copyright © 2009 Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Atom are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.


Ultra-low Voltage Technologies for Energy-efficient Special-purpose Hardware Accelerators

Contributors

Ram K. Krishnamurthy, Intel Corporation
Himanshu Kaul, Intel Corporation

Abstract
Key enablers to achieving tera-scale processor performance under a constant power envelope will be the addition of special-purpose hardware accelerators and their ability to operate at ultra-low supply voltages. Special-purpose hardware accelerators can improve energy efficiency by an order of magnitude, compared to general-purpose cores, for key compute-intensive applications. Performance per Watt increases as the supply voltage is reduced, but the degraded transistor on/off current ratios at the lower supply voltages can limit the minimum operational supply voltage. Circuit solutions for robust operation at ultra-low supply voltages will also need to minimize any performance impact on the nominal supply operation for a highly scalable design.

This article describes ultra-low voltage design techniques and learnings from a video motion estimation engine fabricated in 65 nm CMOS technology. This chip is targeted for special-purpose, on-die acceleration of SAD computation in real-time video encoding workloads on power-constrained mobile microprocessors. Various datapath circuit innovations within the accelerator improve energy efficiency for SAD calculations at nominal supplies, while ultra-low voltage optimizations enable robust circuit operation for further efficiency gains at ultra-low supply voltages, with minimal impact on nominal supply performance. Silicon measurements of the accelerator demonstrate performance of 2.055 GHz at the nominal supply voltage of 1.2 V, with scalable performance of up to 2.4 GHz at 1.4 V, 50° C. Robust, ultra-low voltage, optimized circuits enable operation measured down to 230 mV (sub-threshold). Across this wide range of operational supplies, maximum energy efficiency of 411 GOPS/W or 12.8 macro-block SADs/nJ is achieved by operating the accelerator at a near-threshold voltage of 320 mV, for 23 MHz frequency and 56 μW power consumption. This represents a 9.6X higher efficiency than at the nominal 1.2 V operation.

Index Words: Motion Estimation, Sum of Absolute Difference (SAD), Ultra-low Voltage, Variation Compensation, Video Encoding

Introduction
As transistor integration density continues to increase, the number of cores in a microprocessor will also increase to handle higher performance demands. Improvements in future microprocessor energy efficiency will necessitate that not all the additional cores be targeted for general-purpose processing. For specific applications, such as video processing, graphics, encryption, and communication, dedicated hardware accelerators can provide 10 – 100x higher energy efficiency or performance per Watt. Future multi-core processors will need to be heterogeneous so that key applications that are computationally intensive or inefficiently handled by general-purpose cores can be processed by the dedicated accelerators to improve the overall energy efficiency of the processor. The range of applications that can be mapped to a particular accelerator is determined by the degree of flexibility or reconfigurability of the accelerator, which also provides a design tradeoff against the achievable energy efficiency. Figure 1 shows the conceptual organization of such a future microprocessor to achieve tera-scale performance within a constant power envelope.

The increasing number of video-capture applications in mobile devices motivates the need for energy-efficient compression of raw video data to meet the storage and transmission constraints of mobile platforms. Motion estimation (ME) is the most performance- and power-critical operation in video encoding, and it can benefit from accelerator circuits to increase throughput in a power-efficient manner. Furthermore, encoder specifications have a wide range of requirements for throughput and power to handle a variety of video resolution, frame rate, and application specifications [1-6], resulting in the need for tunable power and performance capabilities for such accelerators. ME algorithms remove inter-frame redundancies to achieve video compression. Similarities between consecutive frames are used to locate movement of pixel blocks in the current frame with respect to the reference frames. Motion vectors for block translations are used to encode the frames, which can be reconstructed by the decoder from the reference frames. The error metric, used for the purposes of block matching by the algorithms, is the SAD between the respective pixel blocks of the two frames.
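To make the role of the SAD error metric concrete, the following is a minimal C sketch of full-search block matching. The block size, search range, and function names are illustrative assumptions, not details of the accelerator or of any particular encoder.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BLK   16   /* illustrative block size */
#define RANGE  8   /* illustrative +/- search range in pixels */

/* SAD between a current-frame block and a candidate reference block. */
static unsigned block_sad(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            sad += abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
    return sad;
}

/* Full search: the motion vector is the offset of the reference block with
 * the minimum SAD. The caller must keep the search window inside the frame. */
static void full_search(const uint8_t *cur, const uint8_t *ref, int stride,
                        int *best_dx, int *best_dy)
{
    unsigned best = UINT_MAX;
    for (int dy = -RANGE; dy <= RANGE; dy++)
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            unsigned sad = block_sad(cur, ref + dy * stride + dx, stride);
            if (sad < best) { best = sad; *best_dx = dx; *best_dy = dy; }
        }
}
```

Every candidate position requires a full block SAD, which is why the SAD datapath dominates the performance and power of motion estimation.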

[Figure 1 contrasts IA cores with special-purpose hardware accelerators (search/speech, graphics/video, and encryption accelerators) and illustrates supply voltage scaling: one logic core at Vdd (frequency = 1, throughput = 1, power = 1, area = 1, power density = 1) versus two logic cores at 0.67 · Vdd (frequency = 0.5 each, throughput = 1, power = 0.45, area = 2, power density = 0.225).]

Figure 1: Conceptual Organization of a Tera-scale Processor and Supply Voltage Scaling Impact
Source: Intel Corporation, 2009


Further improvements in energy efficiency can be achieved by lowering the supply voltage of a special-purpose accelerator or core, as power consumption decreases by a greater percentage than the corresponding decrease in performance. Various signal-processing and compute-intensive applications have a high degree of parallelism. Increased hardware-level parallelism for these applications can enable the same throughput at the lower supply voltage, while taking advantage of the increased energy efficiency for lowering overall power consumption. This effect is illustrated in Figure 1: the supply voltage is reduced for a processing core by one-third, for a 50 percent reduction in the normalized frequency. Two cores at the lower supply voltage operate at half the frequency, but they maintain the same throughput as that of a single core at the nominal supply, while consuming less than half the total power. The power density, which determines thermal hot spots on the die, is also reduced in this example by more than 75 percent. Further gains in energy efficiency can be obtained by operating at aggressively lower supply voltages.
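The Figure 1 numbers can be checked with the first-order dynamic-power relation (a back-of-the-envelope sketch that ignores leakage and threshold-voltage effects):

$$ P \propto C\,V^{2}f \;\Rightarrow\; \frac{P_{\mathrm{core}}(0.67\,V_{dd},\,0.5f)}{P_{\mathrm{core}}(V_{dd},\,f)} \approx (0.67)^{2}\times 0.5 \approx 0.22 $$

Two such cores therefore deliver the original throughput at roughly 2 × 0.22 ≈ 0.45 of the original power, and, spread over twice the area, at a power density of about 0.22 of the original value, consistent with the figure.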

Processors operating at ultra-low and sub-threshold voltages have been demonstrated for niche applications with low throughput requirements [7-10]. However, the robustness of circuit operation decreases at lower supply voltages, and design optimizations that enable robust functionality at ultra-low supply voltage can severely degrade performance at the nominal voltage. ME accelerators, targeted for integration in high-performance microprocessors, require both high performance at the nominal supply voltages and energy-efficient robust performance in the presence of increased variations at ultra-low supply voltages. Though body biasing and circuit sizing have been used to address variations at sub-threshold supplies [8-10], these techniques can incur considerable area and power penalties. Circuit operation at ultra-low supply voltages also requires level-shifting circuits for interfacing with other processing or memory units that operate at higher supply voltages. These level shifters add power and performance overhead that must be reduced to maximize the energy-efficiency gains from low-voltage operation.

An ME engine, targeted for special-purpose, on-die acceleration of SAD computation [11] in real-time video encoding workloads on power-constrained microprocessors, is fabricated in 65 nm CMOS [12]. Intensive SAD computations required for ME block searches benefit from SAD datapath optimizations that improve energy efficiency. The accelerator increases ME energy efficiency with circuit techniques: among these are speculative difference computation with parallel sign generation for SADs, optimal reuse of sum XOR min-terms in static 4:2 compressor carry gates, and distributed accumulation of input carries for efficient negation. Robust, ultra-low voltage optimized circuits result in a wide operational range of supply voltages (1.4 V – 230 mV) with minimal performance and power overhead at nominal operation, enabling the accelerator to achieve an optimal energy-delay tradeoff by dynamically varying operating conditions based on target performance demands and power budgets. Nominal performance (measured at 1.2 V, 50° C) for the accelerator is 2.055 GHz, with 48 mW of power consumption. Performance is scalable up to 2.4 GHz, 82 mW (measured at 1.4 V, 50° C), while the lowest power consumption of 14.4 μW (4.3 MHz performance) is achieved in deep sub-threshold operation at 230 mV. Ultra-low supply operation enables increased energy efficiency, with maximum energy efficiency of 411 GOPS per Watt (measured at 320 mV, 50° C), for 23 MHz of performance and 56 μW of power consumption. Two-stage cascaded, split-output level shifter circuits provide energy- and area-optimized up-conversion of ultra-low voltage signals to the nominal supply. The accelerator can tolerate up to ±2x process- and temperature-induced performance variations at ultra-low supplies by using supply voltage compensation of ±50 mV.

For the remainder of this article, we first describe the accelerator organization and SAD8 circuits; we then describe the circuit designs and optimizations that enable ultra-low voltage operation for achieving higher energy efficiency. We detail the cascaded split-output level shifter circuit next, and then we provide power, performance, and energy-efficiency measurements on silicon. We end our discussion with measurement results that demonstrate the use of supply compensation to counter large performance variations at ultra-low voltages.

Motion Estimation Accelerator Organization
The motion estimation accelerator (Figure 2) uses SAD calculations to locate a block in the current frame that is minimally different from a block in a reference frame. Pixels from the reference and current frames, stored in local memory, are provided as inputs to the SAD8 unit, which computes the SAD for eight pairs of 8b pixels in a single cycle. The SAD8 unit operates at a lower supply voltage to improve the energy efficiency of the critical computation. This operation is highly parallel for motion estimation, and the SAD8 unit can be scaled accordingly for iterative inter-frame SAD calculations. The output of the SAD8 unit is up-converted to the higher supply by a level shifter circuit. Supply compensation is used to maintain constant performance for the SAD8 unit under increased temperature and process-induced variations at low supply voltages.

[Figure 2 shows the local memory (reference and current frames, 8 pixels x 8b each) feeding four SAD2 units and two stages of 4:2 compressors that produce Σ8|ai-bi| within the low-voltage SAD8 tile, followed by a level shifter on the output and supply voltage compensation of the low-voltage supply.]

Figure 2: Motion Estimation Accelerator Organization
Source: Intel Corporation, 2009


[Figure 3 shows the SAD8 organization and circuits: speculative SAD2 units built from paired 4:2 compressors with 4:1 output multiplexers (a 50 percent reduction in wires and multiplexers), the propagate-generate (PG) and carry-merge (CM) sign-generation tree, the 4:2 compressor summation stages, and the final adder with its PG stage, carry-merge tree, and conditional 2b/3b sum generators producing Σ8|ai-bi|.]

Figure 3: SAD8 Organization and Circuits
Source: Intel Corporation, 2009

Within the SAD8, absolute differences between pixel pairs are computed and summed. Each pixel pair contains one pixel from the reference frame and one pixel from the current frame. The SAD2 unit computes the SAD for two pixel pairs, followed by two stages of compressor units, to provide SAD computation for eight pixel pairs in carry-save format. Absolute difference calculations, as well as summations, are accomplished with 4:2 compressors to increase energy efficiency. A carry propagation adder converts the carry-save SAD output to a normal 11b unsigned binary number, providing the metric for block matching, required by the ME algorithms. The accelerator architecture can also be extended to handle wider SAD computations, e.g., SAD16, SAD32, etc., by increasing the number of compressor stages before the final carry propagating adder.
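As a quick check on the stated output width: eight absolute differences of 8b pixels are bounded by

$$ \mathrm{SAD8}_{\max} = 8 \times (2^{8}-1) = 2040 < 2^{11} = 2048, $$

so the 11b unsigned result of the final carry-propagate adder cannot overflow, and each doubling of the number of pixel pairs (SAD16, SAD32, and so on) adds one more output bit.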

SAD8 Circuits
Several circuit optimizations improve the performance and energy efficiency of the SAD8 unit (Figure 3). We discuss some of these here.

Speculative SAD2
The SAD is composed of the operations of computing the difference, the absolute value, and the summation of the absolute values, in sequential order. In a conventional SAD2 circuit, the difference between each pixel pair is computed and, based on the sign of this difference, either the true or negated output is selected to obtain the non-negative (absolute) difference. Following this, a summation stage adds the two absolute differences. The requirement that the sign of the difference must be resolved prior to summation results in a dependency that prevents further computation from being completed before the 2:1 multiplexer selects between the positive or negative difference values.


The speculative SAD2 circuit (Figure 3) enables more computation ahead of the sign and before the multiplexer stage. The difference and sum stages are combined to produce the sums of differences, by using 4:2 compressors. Speculation on the signs of differences requires all four possible sums of differences, ±(a0 - b0) + ±(a1 - b1), to be computed before the late-arriving signs, by negating the appropriate inputs. Two of these sums are computed by using a pair of 4:2 compressors, while compressor symmetry is used to obtain the other two sums of differences by inverting the outputs of these 4:2 compressors. The late-arriving signs of the differences select the correct SAD with the 4:1 multiplexer. Summation of differences prior to sign computation and four-way speculation, using only two 4:2 compressors, provides a performance advantage that can be translated to a 30 percent energy reduction compared to a conventional non-speculative design for the same delay; thus, energy efficiency is improved for SAD2 computation.
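The selection can be summarized with a short behavioral sketch in C. This models only the four-way speculation and the late, sign-based 4:1 select, not the carry-save datapath or compressor symmetry used on the chip.

```c
#include <stdint.h>

/* Behavioral model of speculative SAD2: all four candidate sums of
 * differences are formed up front, and the late-arriving signs of
 * (a0-b0) and (a1-b1) act as the 4:1 multiplexer select. The selected
 * candidate always equals |a0-b0| + |a1-b1|. */
static uint32_t sad2_speculative(uint8_t a0, uint8_t b0, uint8_t a1, uint8_t b1)
{
    int d0 = (int)a0 - (int)b0;
    int d1 = (int)a1 - (int)b1;

    int cand[4] = { d0 + d1,        /* both differences non-negative */
                    d0 - d1,        /* only d1 negative              */
                   -d0 + d1,        /* only d0 negative              */
                   -d0 - d1 };      /* both differences negative     */

    int sel = ((d0 < 0) << 1) | (d1 < 0);
    return (uint32_t)cand[sel];
}
```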

Efficient Carry Insertion for 2's Complement
The SAD operation negates one input in each pixel pair for the difference computation. Negating a number in 2's complement arithmetic requires a bit-wise inversion of the input, followed by the addition of 1 at the least significant bit (LSB). Irrespective of whether the true or negative output of the difference is selected to obtain the absolute difference, the addition of a 1 is required for every pixel pair, as exactly one input is negated. In conventional designs, this is implemented by setting the carry-in (Cin) for the difference to 1 and selecting the true output of the difference. Selection of the inverted output is the equivalent of inverting all the inputs with Cin=0. In this case, a 1 is added after the multiplexer. Conditional negation to obtain the non-negative difference requires the use of a half adder stage after the multiplexer to insert a conditional carry for every pair, adding to the critical path of the conventional design.

The speculative SAD circuit removes the half adder stage from the critical path. Selection of the inverted output of any of the 4:2 compressors to obtain the SAD in the SAD2 circuit is equivalent to inverting all the inputs and setting the LSB Cin of the compressor to 0. A carry is conditionally inserted in the LSB, after the 4:1 multiplexer, by setting the output carry c[0] to 1 with the use of a parallel 2:1 multiplexer. In a conventional compressor tree this is an unused signal, set to 0. This speculative SAD2 circuit accommodates the addition of a single 1 for every two pixel-pairs and only half the negation carries required for all eight pairs. Carry insertion for the other four pixel-pairs is accomplished by adding a constant 4 in the compressor tree in a distributed manner: this is done by setting the LSB Cin of the three 4:2 compressor units and the final adder to 1. Optimized distribution of all the carries required for negation results in a zero-delay penalty; consequently, the critical path of the SAD8 unit is reduced.
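The arithmetic identity being exploited can be checked with a few lines of plain C (an illustrative check, not the carry-save hardware): because ~x equals -x - 1 in 2's complement, the eight "+1" negation carries of a SAD8 can be inserted later as constants rather than immediately after each multiplexer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* For each pixel pair, the smaller operand is bitwise inverted instead of
 * negated, and the eight missing "+1" carries are added once at the end. */
static void deferred_carry_demo(const uint8_t a[8], const uint8_t b[8])
{
    int reference = 0;      /* straightforward sum of absolute differences    */
    int inverted_sum = 0;   /* larger operand plus bitwise inverse of smaller */

    for (int i = 0; i < 8; i++) {
        int hi = a[i] > b[i] ? a[i] : b[i];
        int lo = a[i] > b[i] ? b[i] : a[i];
        reference    += abs((int)a[i] - (int)b[i]);
        inverted_sum += hi + ~lo;               /* ~lo == -lo - 1 */
    }
    assert(inverted_sum + 8 == reference);      /* eight deferred carries */
}
```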


Sign Generation
The computation to determine the sign of the difference is part of the critical path in the SAD2 circuit. The sign-generation circuit uses a propagate-generate (PG) stage followed by a 3-stage, radix-2, logarithmic carry tree for delay reduction. Since the most significant bit (MSB) carry provides the sign, the carry tree is optimized to compute only the carry-out of the 8b difference. The signs of differences are decoded and buffered to provide the 4:1 multiplexer selects within the SAD2. Sign computation of the difference is carried out in parallel with the 4:2 summation, with the path from SAD2 inputs to multiplexer selects being more critical than the 4:2 compressor, by one gate stage.
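In other words, the sign is the complement of the carry-out of a + ~b + 1. The sketch below evaluates the same propagate-generate recurrence serially for clarity; the hardware collapses it into the 3-stage radix-2 logarithmic tree described above, and the function name is illustrative.

```c
#include <stdint.h>

/* Returns 1 when (a - b) is negative, using the carry-out of a + ~b + 1. */
static int diff_is_negative(uint8_t a, uint8_t b)
{
    uint8_t nb = (uint8_t)~b;
    unsigned g = a & nb;        /* per-bit generate  */
    unsigned p = a ^ nb;        /* per-bit propagate */

    unsigned carry = 1;         /* the "+1" of the 2's-complement negation */
    for (int i = 0; i < 8; i++)
        carry = ((g >> i) & 1u) | (((p >> i) & 1u) & carry);

    return carry == 0;          /* no carry-out means a < b */
}
```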

Compressor
An energy-efficient 4:2 compressor circuit (Figure 4) is a key building block for the SAD8 design, as it is used throughout the datapath for difference calculations and summations. The first stage of sum XORs in the compressor is expanded to two stages of NAND gates, by using differential inputs, to enable reuse of XOR min-terms in the carry logic. The remaining XORs and multiplexer are implemented without buffering or series connected transmission gates, and they provide differential outputs. The differential outputs are directly compatible with next-stage compressor inputs and also provide inverted outputs without any delay penalty. This property enables generation of both true and inverted outputs of the 4:2 compressor pair in the SAD2 circuit, without affecting the critical path. Sharing of min-terms by using the first stage of NAND gates, between the XOR and carry logic, allows the 4:2 compressor design to be optimized further within the SAD8, by relocating these NAND gates in front of the inter-compressor wires. This not only results in improved drive strength for the long wires, as they are driven by NAND gates rather than the transmission gates of the compressor, but also eliminates long differential interconnects, thereby reducing wiring and associated energy by 50 percent. Without this optimization, extra inverters would be required in the critical path to achieve the same result. Also, the first stage of NAND gates of each 4:2 compressor in the second summation stage of the SAD8 is moved behind the multiplexers in each pair of SAD2 circuits (Figure 3). This provides a 50 percent reduction in the number of multiplexers, in addition to the wire reduction benefit.
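Functionally, a word-level 4:2 compressor column behaves like two chained 3:2 carry-save steps; the sketch below models only that arithmetic behavior (the differential, min-term-sharing gate structure described above is not modeled).

```c
#include <stdint.h>

typedef struct { uint32_t sum, carry; } cs_t;

/* 3:2 carry-save step: a + b + c == sum + carry, with carries shifted left. */
static cs_t csa3(uint32_t a, uint32_t b, uint32_t c)
{
    cs_t r = { a ^ b ^ c, ((a & b) | (b & c) | (a & c)) << 1 };
    return r;
}

/* Word-level 4:2 compression: a + b + c + d == result.sum + result.carry,
 * so four partial results collapse to two without a carry-propagate adder. */
static cs_t compress42(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    cs_t t = csa3(a, b, c);
    return csa3(t.sum, t.carry, d);
}
```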

[Figure 4 shows the 4:2 compressor logic and its circuit implementation, with inputs a0, b0, a1, b1 and carry-in Cin, and outputs S, C, and Cout.]

Figure 4: 4:2 Compressor Logic and Circuit Design
Source: Intel Corporation, 2009


Final Adder
Following the compressor tree in the SAD8 circuit, a final 11b adder (Figure 3) converts the SAD from carry-save format to a normal binary number. The adder design uses a radix-2 sparse carry-merge tree that generates carries at alternate bit positions only. The sums are pre-computed by the conditional 2b and 3b sum generators, with the correct sum being selected by the late-arriving carry from the carry-merge tree. This design results in a 50 percent wiring reduction within the carry tree compared to that of a conventional Kogge-Stone adder, while maintaining the same number of gate stages in the critical path.

The combined result of the circuit optimizations just described translates into a 20 percent reduction in delay for the SAD8 unit compared to a conventional non-speculative carry-propagating, adder-based implementation. This performance advantage translates to a 24 percent energy-reduction at iso performance.

Ultra-Low Voltage Circuit Optimizations
Though the performance-per-Watt can be significantly improved for parallelizable workloads, such as ME, by lowering the supply voltage, transistor on/off current ratios degrade considerably at ultra-low supply voltages. As a result, DC node voltages drift away from full-rail values, thereby affecting noise margins and reliability across process skews. Circuits such as flip-flops, wide multiplexers, deep-stack logic, and series connected transmission gates have nodes with weak on-current paths and large off-current paths that are more vulnerable to this effect, unless optimized for reliable low-supply operation [7].

The storage nodes in flip-flops have weak keepers and large transmission gates. When the transmission gate for the slave stage of a conventional master-slave flip-flop circuit (Figure 5) is turned off (Φ=0), the weak on-current from the slave-keeper contends with the large off-current through the transmission gate. This causes the node voltage to droop, affecting the stability of the storage node. Ultra-low voltage reliability of the flip-flops can be improved by the use of non-minimum channel length devices in the transmission gates to reduce off-currents exponentially. Furthermore, upsized keepers improve on-currents that restore the charge that is lost due to leakage at this node. The write operation remains unaffected since the keepers can be interrupted. At reduced supply voltages, the worst-case static droop (Figure 5) for the conventional flip-flop increases considerably to more than 40 percent of the supply voltage at 200 mV, severely affecting functionality. The circuit modifications just described reduce the worst-case droop by 4X in the ultra-low voltage optimized design.


[Figure 5 shows the conventional and ultra-low voltage optimized flip-flop circuits (upsized keepers, non-minimum channel length transmission gates) and the one-hot versus encoded 4:1 multiplexer circuits, with worst-case droop as a percentage of supply voltage plotted against 200-500 mV supplies in 65 nm CMOS at 50° C.]

Figure 5: Flip-flop and Multiplexer Optimizations for Ultra-low Voltage Operation
Source: Intel Corporation, 2009

Wide multiplexers are also prone to static droops on nodes shared by transmission gates at ultra-low supplies. Such structures are typical for one-hot multiplexers, where the on-current of one of the selected inputs contends with the off-current of the remaining unselected inputs. To avoid this situation, wide multiplexers have been remapped in the datapath by the use of 2:1 multiplexers, thereby reducing worst-case, off-current contention. Remapping a one-hot 4:1 multiplexer to an encoded 4:1 multiplexer composed of 2:1 multiplexers results in up to a 3X reduction in worst-case static droop (Figure 5).

Other optimizations in the datapath include remapping deep stacked combinational logic and series connected transmission gates to a maximum of three transistor stacks. All the circuit optimizations to enable reliable ultra-low voltage operation result in a 1 percent area, a 1 percent power, and a 2 percent performance penalty for the accelerator at the nominal 1.2 V supply, representing a favorable tradeoff for enabling ultra-low voltage operation.


Level Shifter Circuits
The use of multiple supply voltage domains results in the need for level shifter circuits at the low-to-high voltage domain boundaries. Conventional level shifters use a cascode voltage switch logic (CVSL) stage to provide the up-conversion functionality, with the associated contention currents contributing to a significant portion of power of the level shifter. Driving the output load directly with the CVSL stage increases its size, while use of additional gain stages at the output of the level shifter, to reduce CVSL stage loading, results in increased delay. The low-voltage output of the ME accelerator is up-converted to the nominal voltage level by using a two-stage cascaded split-output level shifter (Figure 6). An intermediate supply voltage for up-conversion over such a large voltage range limits the maximum current ratio between the higher-supply PMOS pull-up and lower-supply NMOS pull-down devices for correct CVSL stage functionality. Energy-efficient up-conversion from sub-threshold voltage levels to nominal supply outputs is achieved by decoupling the CVSL stage of this level shifter from the output, thereby enabling a downsized CVSL stage for the same load without the need for extra gates in the critical path. Reduced contention currents in a downsized CVSL stage enable the split-output design to achieve up to a 20 percent energy reduction for equal fan-out and delay (Figure 6). Furthermore, simultaneous reductions of 11 percent in level shifter area and 14 percent fan-in loading (see the comparison in Figure 6) are achieved with the split-output design.

[Figure 6 shows the two-stage cascaded split-output level shifter circuit (VCCLOW input, VCCMID intermediate stage, VCCHIGH output) and its normalized energy versus delay against a conventional CVSL level shifter, with about a 20 percent energy reduction at equal delay in 65 nm CMOS at 50° C.]

Normalized comparison (conventional CVSL vs. split-output CVSL):
Fan-in Capacitance: 1.0 vs. 0.86
Total Area: 1.0 vs. 0.89
CVSL-stage Contention Energy: 1.0 vs. 0.46

Figure 6: Two-stage Cascaded Split-output Level Shifter and Comparisons to Conventional CVSL Level Shifter
Source: Intel Corporation, 2009


[Figure 7 is the die photograph, with the motion estimation accelerator, level shifters, I/O, control, and clock regions labeled.]

Figure 7: ME Accelerator Die Photograph
Source: Intel Corporation, 2009

Measurement Results
The accelerator operates at a nominal supply voltage of 1.2 V and is implemented in a 65 nm CMOS technology, with one poly and eight layer copper interconnects [12]. Figure 7 shows the die micrograph of the chip, with the ME accelerator occupying an area of 0.089 mm² (Table 1). The total die area is 0.96 mm², with a pad count of 50. The total number of transistors in the ME accelerator and test circuits is 70,000.

Process: 65 nm CMOS
Nominal Supply: 1.2 V
Interconnect: 1 poly, 8 metal Cu
Accelerator Area: 0.089 mm²
Die Area: 0.960 mm²
Number of Transistors: 70 K
Pad Count: 50

Table 1: Implementation Details
Source: Intel Corporation, 2009

Frequency and power measurements (Figure 8) of the ME accelerator were obtained by sweeping the supply voltage in a temperature-stabilized environment of 50° C. The accelerator is fully functional over a wide operating range of 1.4 V to 230 mV (sub-threshold region). At the nominal supply of 1.2 V, the ME accelerator operates at a maximum frequency of 2.055 GHz, consuming a total power of 48 mW. This represents an energy-efficiency metric of 43 GOPS/Watt, where one operation is a complete SAD of eight pixel pairs. Performance can be scaled up to 2.4 GHz at 1.4 V, with a total power consumption of 82 mW, for purposes such as accelerating high-resolution video streams.

[Figure 8 plots measured maximum frequency, total power, energy efficiency, and active leakage power against supply voltage (0.2 V to 1.4 V) in 65 nm CMOS at 50° C, with the sub-threshold region and the 320 mV peak-efficiency point (9.6X) marked.]

Figure 8: Maximum Frequency, Total Power, Energy Efficiency and Active Leakage Power Measurements Versus Supply Voltage
Source: Intel Corporation, 2009


Figure 8 also shows the energy-efficiency and leakage-power measurements over the same range of supply voltages at a temperature of 50° C. The total active leakage power component is 1.6 mW (3 percent of total power) at the nominal 1.2 V supply, increasing to 3.4 mW at 1.4 V. As the supply voltage is reduced, the power consumption drops at a faster rate than the maximum frequency, resulting in improved energy efficiency. The ME algorithms have a high degree of parallelism, as well as a wide range of performance requirements, to take advantage of the increased SAD unit energy efficiency at the ultra-low voltage supplies. Ultra-low voltage circuit optimizations enable reliable operation at deep sub-threshold supply voltages as low as 230 mV, with a frequency of 4.3 MHz and a total power consumption of 14.4 μW. However, accelerator performance in the deep sub-threshold region degrades at a higher rate than total power, resulting in sub-optimal energy efficiency. Peak energy efficiency of 411 GOPS/Watt is measured at a near-threshold supply voltage of 320 mV, with a maximum frequency of 23 MHz and total power of 56 μW (9.6X higher energy efficiency compared to nominal voltage operation at 1.2 V). Processing a 16x16-pixel macro-block requires 32 SAD8 operations, resulting in a peak SAD efficiency of 12.8 macro-block SADs/nJ. Although absolute leakage power scales with supply voltage (Figure 8), the leakage component of total power increases to 44 percent at the energy-optimal 320 mV supply. ME accelerator performance and energy-efficiency measurements are summarized in Table 2.

Power: 48 mW at 2.055 GHz, 1.2 V, 50° C (nominal)
Active leakage power: 1.6 mW at 1.2 V, 50° C (3% of total power)
Nominal performance: 2.055 GHz, 43 GOPS/Watt at 1.2 V, 50° C
Peak performance: 2.4 GHz, 82 mW at 1.4 V, 50° C
Ultra-low voltage mode total power: 56 μW at 23 MHz, 320 mV, 50° C
Ultra-low voltage energy-efficiency: 411 GOPS/Watt at 320 mV, 50° C (9.6X higher than nominal)
Minimum supply voltage operation: 4.3 MHz, 14.4 μW at 230 mV, 50° C

Table 2: Measured Performance and Power Summary
Source: Intel Corporation, 2009

[Figure 9 plots measured maximum frequency versus supply voltage for a typical die at 0, 50, and 110° C (±5 percent variation at the 1.2 V nominal supply, growing to ±2X at 320 mV), and the normalized frequency distribution across fast-slow process skews (±18 percent at 1.2 V, ±2X at 320 mV).]

Figure 9: Frequency Variations Across 0 – 110° C Temperature and Fast-slow Process Skews
Source: Intel Corporation, 2009
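The reported efficiency figures are mutually consistent with the measured operating points (a worked check, with one operation being a complete eight-pixel-pair SAD):

$$ \frac{2.055\ \text{GHz} \times 1\ \text{SAD8/cycle}}{48\ \text{mW}} \approx 43\ \text{GOPS/W}, \qquad \frac{23\ \text{MHz}}{56\ \mu\text{W}} \approx 411\ \text{GOPS/W}, $$

and, with 32 SAD8 operations per 16x16 macro-block, 411 GOPS/W divided by 32 gives approximately 12.8 macro-block SADs/nJ.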


Supply Compensation
On-currents are exponentially related to process parameters and temperature at the ultra-low supply voltages, which increases the process- and temperature-induced performance variations. Frequency measurements of the accelerator (typical die) over a supply voltage range of 1.4 V to 0.25 V for constant temperatures of 0, 50, and 110°C (Figure 9), show a temperature-based frequency variation of ±5 percent around the 50° C temperature point, at the 1.2 V nominal supply. This variation increases to ±2x at the peak efficiency supply voltage of 320 mV. Figure 9 also shows the normalized performance distributions due to process variations for the accelerator at the 1.2 V and 320 mV supplies. These distribution curves are obtained from Monte-Carlo-based variation analysis and simulations, with the performance for the two supplies normalized to their respective medians. Frequency spread between fast and slow skews increases from ±18 percent at 1.2 V, to ±2x at the 320 mV supply. While circuit monitoring and compensation techniques have been demonstrated for nominal operation [13, 14], ultra-low voltage operation requires compensation solutions to address a much larger performance spread.

[Figure 10 plots iso-frequency operation under supply compensation: at 320 mV, the 23 MHz operating point is maintained across 0-110° C with ±50 mV supply adjustments and across slow-typical-fast process skews with +40 mV/-50 mV; at 1.2 V, the 2.055 GHz operating point is maintained with -20 mV/+60 mV across temperature and +135 mV/-200 mV across process skews.]

Figure 10: Supply Voltage Compensation Measurements for Temperature and Process Variation at 320 mV and 1.2 V Operation
Source: Intel Corporation, 2009


The exponential relationship between on-currents and supply voltage in the ultra-low voltage region enables the use of supply voltage as a powerful knob to compensate for the increased process and temperature-induced variations. Compensation for the ±2x performance difference across the extreme temperature range of 0-110°C, for a typical die operating at 320 mV, is provided by increasing the supply voltage at the lower temperature by 50 mV and decreasing the supply voltage at the higher temperature by 50 mV to maintain the nominal performance of 23 MHz (Figure 10). Compensation across process skews at 320 mV, at a constant 50°C temperature, requires a supply voltage increase of 40 mV at the slow process skew and a decrease of 50 mV at the fast process skew to maintain the nominal performance. Thus, a measured range of ±50 mV above/below 320 mV is adequate to compensate for the larger performance variations across a wide range of process or temperature corners. In comparison, at the nominal supply of 1.2 V, supply voltage adjustments of -20 mV and +60 mV at 0 and 110° C, respectively, are required to address the ±5 percent performance variation below/above 50° C to maintain the nominal 2.055 GHz performance (Figure 10). To address the ±18 percent performance variations due to process skew at 1.2 V, the accelerator at the slow/fast process skews requires supply compensation of +135/-200 mV to maintain iso-frequency. The larger range of supply voltage compensation for process skews at nominal supplies is due to the lower sensitivity of performance to supply voltage in this region.

Summary and Conclusions

Special-purpose hardware accelerators and robust ultra-low voltage operation are key enablers for improved energy efficiency in multi-core processors with tera-scale performance. We demonstrate these two technologies in this article by describing the design, challenges, solutions, and silicon measurements for an ultra-low voltage ME accelerator, fabricated in 65 nm CMOS. Energy-efficient circuits enable computation of eight pixel-pair SADs with an energy efficiency of 411 GOPS/Watt at 320 mV, consuming 56 μW of power for 23 MHz operation. Ultra-low voltage circuit optimizations enable robust operation over a wide range of supply voltages from 1.4 V down to sub-threshold operation at 230 mV. At the nominal supply of 1.2 V, the accelerator operates at 2.055 GHz consuming 48 mW at 50° C, and it scales to 2.4 GHz operation at 1.4 V. Supply adjustments of ±50 mV compensate for ±2x process- and temperature-induced variations at the ultra-low supply voltage of 320 mV, enabling the accelerator to achieve high energy efficiency at the ultra-low supplies with constant performance.


References
[1] H.-C. Chang et al. “A 7mW-to-183mW Dynamic Quality-Scalable H.264 Video Encoder Chip.” ISSCC Digest of Technical Papers, pages 280-281, Feb. 2007.

[2] C.-P. Lin et al. “A 5mW MPEG4 SP Encoder with 2D Bandwidth-Sharing Motion Estimation for Mobile Applications.” ISSCC Digest of Technical Papers, pages 412-413, Feb. 2006.

[3] Y.-K. Lin et al. “A 242mW 10mm² 1080p H.264/AVC High-Profile Encoder Chip.” ISSCC Digest of Technical Papers, pages 314-315, Feb. 2008.

[4] H. Yamauchi et al. “An 81MHz, 1280 x 720pixels x 30frames/s MPEG-4 Video/Audio Codec Processor.” ISSCC Digest of Technical Papers, pages 130-131, Feb. 2005.

[5] Y.-W. Huang et al. “A 1.3TOPS H.264/AVC Single-Chip Encoder for HDTV Applications.” ISSCC Digest of Technical Papers, pages 128-129, Feb. 2005.

[6] T. Fujiyoshi et al. “An H.264/MPEG-4 Audio/Visual Codec LSI with Module-Wise Dynamic Voltage/Frequency Scaling.” ISSCC Digest of Technical Papers, pages 132-133, Feb. 2005.

[7] A. Wang and A. Chandrakasan. “A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology.” IEEE Journal Solid-State Circuits, pages 310-319, Jan. 2005.

[8] J. Kwong et al. “A 65nm Sub-Vt Microcontroller with Integrated SRAM and Switched-Capacitor DC-DC Converter.” ISSCC Digest of Technical Papers, pages 318-319, Feb. 2008.

[9] S. Hanson et al. “Exploring Variability and Performance in a Sub‑200‑mV Processor.” IEEE Journal Solid-State Circuits, pages 881‑891, April 2008.

[10] M.-E. Hwang et al. “A 85mV 40nW Process-Tolerant Subthreshold 8x8 FIR Filter in 130nm Technology.” Symposium on VLSI Circuits, pages 154-155, June 2007.

[11] H. Kaul et al. “A 320mV 56μW 411GOPS/Watt Ultra-Low Voltage Motion Estimation Accelerator in 65nm CMOS.” IEEE Journal Solid-State Circuits, pages 107-114, Jan. 2009.

[12] P. Bai et al. “A 65nm Logic Technology Featuring 35nm Gate Lengths, Enhanced Channel Strain, 8 Cu Interconnect Layers, Low-k ILD and 0.57μm2 SRAM Cell.” IEDM Technical Digest, pages 657-660, Dec. 2004.


[13] R. McGowen et al. “Power and Temperature Control on a 90-nm Itanium Family Processor.” IEEE Journal Solid-State Circuits, pages 229-237, Jan. 2006.

[14] C. Kim et al. “An On-Die CMOS Leakage Current Sensor for Measuring Process Variation in Sub-90nm Generations.” Symposium on VLSI Circuits, pages 250-251, June 2004.

Acknowledgments
We thank the Pyramid Probe Card Division of Cascade Microtech Inc. for high bandwidth membrane probe solutions; and M. Haycock, J. Schutz, S. Borkar, G. Taylor, S. Pawlowski, A. Chien, C. Riley, E. Hannah, Prof. V. Oklobdzija, Prof. K. Roy, and Prof. A. Chandrakasan for encouragement and discussions.

Authors’ Biographies
Ram K. Krishnamurthy received a BE degree in Electrical Engineering from the Regional Engineering College, Trichy, India, in 1993, and a PhD degree in Electrical and Computer Engineering from Carnegie Mellon University, Pittsburgh, PA, in 1998. Since 1998, he has been with Intel Corporation’s Circuits Research Labs in Hillsboro, Oregon, where he is currently a Senior Principal Engineer and heads the high-performance and low-voltage circuits research group. He holds 80 issued patents and has published over 75 conference/journal papers. He served as the Technical Program Chair/General Chair for the 2005/2006 IEEE International Systems-on-Chip Conference. He has received two Intel Achievement Awards, in 2004 and 2008, for the development of novel arithmetic circuit technologies and hardware encryption accelerators. His email is ram.krishnamurthy at intel.com.

Himanshu Kaul received a B.Eng. (Hons.) degree in Electrical and Electronics Engineering from the Birla Institute of Technology and Science, Pilani, India, in 2000, and an MS and PhD degree in Electrical Engineering from the University of Michigan, Ann Arbor, in 2002 and 2005, respectively. Since 2004, he has been with Intel Corporation’s Circuits Research Labs in Hillsboro, OR, where he is currently a Research Engineer in the High-Performance and Low Voltage Circuits Research Group. His research interests include on-chip signaling techniques and low-power and high-performance circuit design. His e-mail is himanshu.kaul at intel.com.

Copyright
Copyright © 2009 Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Atom are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.


LESSONS LEARNED FROM THE 80-CORE TERA-SCALE RESEARCH PROCESSOR

Contributors

Saurabh Dighe, Intel Corporation
Sriram Vangal, Intel Corporation
Nitin Borkar, Intel Corporation
Shekhar Borkar, Intel Corporation

Abstract
Sustained tera-scale-level performance within an affordable power envelope is made possible by an energy-efficient, power-managed simple core, and by a packet-switched, two-dimensional mesh network on a chip. From our research, we learned that (1) the network consumes almost a third of the total power, clearly indicating the need for a new approach, (2) fine-grained power management and low-power design techniques enable peak energy efficiency of 19.4 GFLOPS/Watt and a 2X reduction in standby leakage power, and (3) the tiled design methodology quadruples design productivity without compromising design quality.

Index Words: TeraFLOPS, Many-core, 80-tile, Network-on-chip, 2D Mesh Network

Introduction
Intel’s Tera-scale Research Computing Program [1] lays out a vision for future computing platforms and underscores the need for tera-scale performance. We envision hundreds of networked cores running complex parallel applications under a highly constrained energy budget. Consequently, one of the important research areas in this initiative is to develop a scalable tera-scale processor architecture that can address the needs of our future platforms. The Teraflops Research Processor is a key first step in this direction. Focusing on some of the vital ingredients of a tera-scale architecture: a power-optimized core, a scalable on-chip interconnect, and a modular global clocking solution, we established the following research goals for our project:

• Achieve teraFLOPS performance under 100 W.
• Prototype a high-performance and scalable on-chip interconnect.
• Demonstrate an energy-efficient architecture with fine-grained power management.
• Develop design methodologies for network-on-chip (NoC) architectures.

Our intent in this article is to focus on key lessons we learned from the research prototype. We present our findings in a structured format. We first provide an overview of the chip and briefly describe key building blocks. We then highlight the novel design techniques implemented on the chip and the tiled design approach. Next, we summarize measured silicon results. Finally, we discuss the pros and cons of certain design decisions, including our recommendations for future tera-scale platforms.


Architecture Overview
Rapid advancement in semiconductor process technology and a quest for increased energy efficiency have fueled the popularity of multi-core and NoC architectures [2]. The teraFLOPS research processor contains 80 tiles arranged as an 8 x 10, 2-D mesh network, shown in Figure 1. Each tile consists of a processing engine (PE) connected to a 5-port router with mesochronous interfaces (MSINT), which forwards packets between the tiles. More detailed information on the chip architecture and interconnect can be found in [3, 4].

[Figure 1 shows one tile: a 5-port router with mesochronous interfaces (MSINT), crossbar, and 40 GB/s links, connected through a router interface block (RIB) to the processing engine (PE), which contains the 3-KB instruction memory (IMEM), the 2-KB data memory (DMEM), a 6-read, 4-write 32-entry register file, and the two FPMAC units with normalize stages, along with PLL and JTAG support.]

Figure 1: 80-core Processor Tile Architecture
Source: Intel Corporation, 2009

Processing Engine
The PE contains two independent fully-pipelined, single-precision, floating-point multiply-accumulator (FPMAC) units capable of providing an aggregate performance of 20 GFLOPS. The key to achieving this high performance is a fast, single-cycle accumulation algorithm [5], developed by analyzing each of the critical operations involved in conventional floating-point units (FPUs) with the intent of eliminating, reducing, or deferring the amount of logic operations inside the accumulate loop.

We came up with the following three optimizations. First, the accumulator retains the multiplier output in carry-save format and uses an array of 4-2 carry-save adders to accumulate the results in an intermediate format. This removes the need for a carry-propagate adder in the critical path. Second, accumulation is performed in base 32, converting expensive variable shifters in the accumulate loop to constant shifters. Third, we moved the costly normalization step outside the accumulate loop, where the accumulation result in carry-save is added, and the sum is normalized and converted back to base 2. These optimizations allow accumulation to be implemented in just fifteen FO4 stages. This approach also reduces the latency of dependent FPMAC instructions and enables a sustained multiply-add result (2 FLOPS) every cycle. Moving to 64-bit arithmetic results in a wider mantissa for increased throughput.
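The flavor of the first optimization can be illustrated with a simplified, word-level model. This is illustrative only: it keeps an accumulator in redundant sum/carry form and resolves it once at the end, but it does not model the base-32 alignment or the floating-point normalization of the real FPMAC.

```c
#include <stdint.h>

typedef struct { uint64_t sum, carry; } csa_acc_t;

/* 3:2 carry-save step: a + b + c == sum + carry (no carry-propagate chain). */
static csa_acc_t csa3(uint64_t a, uint64_t b, uint64_t c)
{
    csa_acc_t r = { a ^ b ^ c, ((a & b) | (b & c) | (a & c)) << 1 };
    return r;
}

/* Accumulate a product that is itself still in carry-save form (p_sum,
 * p_carry), as the multiplier delivers it, using two chained 3:2 steps
 * (the role of the 4-2 carry-save adder array in the accumulate loop). */
static void accumulate(csa_acc_t *acc, uint64_t p_sum, uint64_t p_carry)
{
    csa_acc_t t = csa3(acc->sum, acc->carry, p_sum);
    *acc = csa3(t.sum, t.carry, p_carry);
}

/* The redundant result is converted only once, outside the loop,
 * mirroring how normalization is deferred until the end. */
static uint64_t resolve(const csa_acc_t *acc)
{
    return acc->sum + acc->carry;
}
```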


The PE includes a 3-KB, single-cycle instruction memory (IMEM) and 2 KB of data memory (DMEM). This amounts to a total distributed on-die memory of 400 KB. The capacity of the local memory was enough to support blocked execution of a select few LAPACK kernels. With a 10-port (6-read, 4-write) register file, we allow scheduling to both FPMACs, simultaneous DMEM load/store, and packet send/receive from the mesh network. A router interface block (RIB) handles packet encapsulation between the PE and router.

On-chip Interconnect
The 80-tile, on-chip network is a 2D mesh that provides a bisection bandwidth of 2 Terabits/s. The key communication block for the NoC is a 5-port, pipelined, packet-switched router with two virtual lanes (see Figure 2) capable of operating at 5 GHz [6] at a nominal supply of 1.2 V. It has a 6-cycle latency, or 1.2 ns/hop at 5 GHz. It connects to each of its neighbors and the PE by using phase-tolerant mesochronous links that can deliver data at 20 GBytes/sec. The network uses a source-directed routing scheme, based on wormhole switching, with two virtual lanes for deadlock-free routing and an on-off flow-control scheme that uses almost-full signals. The width of the links and the router frequency were chosen to transfer a single-precision FPU operand at high speed and approximately 1 ns/hop latency.
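Because routes are computed at the source, a sending tile can determine the per-hop output ports before injecting a packet. The sketch below shows dimension-ordered (X-then-Y) route construction for an 8 x 10 mesh; the port names, header encoding, and routing order are illustrative assumptions rather than the chip's actual packet format.

```c
/* Illustrative source-directed route computation for an 8 x 10 mesh.
 * The chip carries per-hop routing information in the packet header; the
 * encoding and routing policy shown here are assumptions for illustration. */
enum port { PORT_PE, PORT_N, PORT_E, PORT_S, PORT_W };

static int build_route(int sx, int sy, int dx, int dy,
                       enum port route[], int max_hops)
{
    int n = 0;
    while (sx != dx && n < max_hops) {            /* traverse X first */
        route[n++] = (dx > sx) ? PORT_E : PORT_W;
        sx += (dx > sx) ? 1 : -1;
    }
    while (sy != dy && n < max_hops) {            /* then traverse Y */
        route[n++] = (dy > sy) ? PORT_N : PORT_S;
        sy += (dy > sy) ? 1 : -1;
    }
    if (n < max_hops)
        route[n++] = PORT_PE;                     /* final hop: exit to PE */
    return n;                                     /* hops encoded at source */
}
```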

[Figure 2 shows the 5-port, two-lane router: 39-bit links to the PE and the N, E, S, and W neighbors through MSINT blocks and a shared crossbar switch, with pipeline stages for incoming-packet synchronization, buffer write, buffer read, route compute, port and lane arbitration, switch traversal, and link traversal.]

Figure 2: 5-port Two-lane Shared Crossbar Router Architecture
Source: Intel Corporation, 2009


Instruction Set and Programming Model
We define a 96-bit Very Long Instruction Word (VLIW) that allows up to eight operations to be issued every cycle. The instructions fall into one of five categories: instruction issue to both floating-point units, simultaneous data memory loads and stores, packet send/receive via the on-die mesh network, program control that uses jump and branch instructions, and synchronization primitives for data transfer between PEs. With the exception of FPU instructions, which have a pipeline latency of nine cycles, most other instructions execute in one to two cycles.
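The bit-level layout of the word is not specified in the article, so the sketch below is a hypothetical packing of the 96-bit bundle into eight equal 12-bit slots, one per operation named above; the field names and widths are illustrative assumptions only.

```python
# Hypothetical packing of the 96-bit VLIW into eight operation slots.
from dataclasses import dataclass

@dataclass
class VLIWBundle:
    fpu0: int      # FPMAC0 operation
    fpu1: int      # FPMAC1 operation
    load: int      # DMEM load
    store: int     # DMEM store
    send: int      # packet send to the mesh
    receive: int   # packet receive from the mesh
    branch: int    # jump/branch (program control)
    sync: int      # synchronization primitive

    WIDTH_BITS = 96
    SLOT_BITS = 12          # 8 slots x 12 bits = 96 bits (assumed split)

    def encode(self) -> int:
        """Pack the eight slots into a single 96-bit integer."""
        slots = (self.fpu0, self.fpu1, self.load, self.store,
                 self.send, self.receive, self.branch, self.sync)
        word = 0
        for op in slots:
            assert 0 <= op < (1 << self.SLOT_BITS)
            word = (word << self.SLOT_BITS) | op
        return word

bundle = VLIWBundle(fpu0=0x1A, fpu1=0x1B, load=0x2, store=0x3,
                    send=0x4, receive=0x5, branch=0, sync=0)
assert bundle.encode().bit_length() <= VLIWBundle.WIDTH_BITS
```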

To aid with power management, the instruction set includes support for dynamic sleep and wakeup of each floating-point unit. The architecture allows any PE to issue sleep packets to any other tile or to wake it up for processing tasks.

The architecture supports a message-passing programming model by providing special instructions to exchange messages to coordinate execution and share data [7]. The fully symmetric architecture allows any PE to send or receive instruction and data packets to or from any other tile.
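The sketch below is a minimal software model of this style of communication, not the processor's actual instructions (those are described in [7]): any tile can push a block of words toward another tile's data memory, and the receive side drains its inbound queue into DMEM.

```python
# A minimal software model of the message-passing style described above.
# The routines, packet format, and sizes are illustrative assumptions.
from collections import deque

class Tile:
    def __init__(self, tid, dmem_words=512):   # 2 KB DMEM = 512 x 32-bit words
        self.tid = tid
        self.dmem = [0] * dmem_words
        self.inbox = deque()                   # models the RIB receive queue

    def send(self, dst, base_addr, payload):
        """Send a data packet to tile dst, targeting its DMEM at base_addr."""
        dst.inbox.append((base_addr, list(payload)))

    def receive(self):
        """Drain one packet into local DMEM (in hardware, the incoming packet
        is written into local memory without interrupting the running core)."""
        base_addr, payload = self.inbox.popleft()
        self.dmem[base_addr:base_addr + len(payload)] = payload

a, b = Tile(0), Tile(1)
a.send(b, base_addr=16, payload=[7, 8, 9])
b.receive()
assert b.dmem[16:19] == [7, 8, 9]
```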

Novel Circuit and Design Techniques
We used several circuit techniques to achieve high performance, low power, and a short design cycle. The fifteen-FO4 design uses a balanced core and router pipeline, with critical stages employing performance-setting, semi-dynamic flip-flops. In addition, a robust, scalable, mesochronous clock distribution is employed in a 65-nanometer, 8-metal CMOS process that enables high integration and single-chip realization of the teraFLOP processor.

Circuit Design Style
To enable 5-GHz operation, we designed the entire core by using hand-optimized datapath macros. For quick turnaround we used CMOS static gates to implement most of the logic. However, critical registers in the FPMAC and at the router crossbar output utilize implicit-pulsed, semi-dynamic flip-flops (SDFF) [8, 9], which have a dynamic master stage coupled with a pseudo-static slave stage. When compared to a conventional static, master-slave flip-flop, the SDFF provides both shorter latency and the capability of incorporating logic functions with minimum delay penalty, each of which is a desirable property in high-performance digital designs. However, pulsed flip-flops have several important disadvantages. The worst-case hold time of this flip-flop can exceed the clock-to-output delay because of pulse-width variations across process, voltage, and temperature conditions. Therefore, pulsed flip-flops must be carefully designed to avoid failures due to min-delay violations.


Fine-grain Power Management
To achieve the goal of demonstrating teraFLOPS performance below 100 watts of power, we had to adopt and combine various power-saving features and innovative power-management technologies. To this end, we used fine-grained clock gating and sleep-transistor circuits [10] to reduce active and standby leakage power; these are controlled at full-chip, tile-slice, and individual-tile levels, based on workload. Figure 3 shows clock and power gating in the FPMAC, router, and instruction/data memories. Approximately 90 percent of FPU logic and 74 percent of each PE is sleep enabled. Each tile is partitioned into twenty-one smaller sleep regions, and dynamic control of individual blocks is based on instruction type. Each FPMAC can be controlled through NAP/WAKE instructions. The router is partitioned into ten smaller sleep regions, and control of individual router ports depends on network traffic patterns. We inserted sleep transistors in the register-file cells without significantly impacting area; an additional track had to be used to route the sleep signal. Special attention was paid to sleep/non-sleep interfaces, and intelligent data gating at flip-flop boundaries ensured that additional firewall circuits were not required.
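A minimal sketch of instruction-driven gating is shown below; the region names and the mapping from VLIW slots to sleep regions are assumptions for illustration, standing in for the tile's twenty-one hardware-controlled regions.

```python
# Illustrative model of instruction-driven sleep control. Region names and
# the slot-to-region mapping are assumptions; in the chip this decision is
# made in hardware, per cycle, for each of the tile's sleep regions.
IDLE = None

def sleep_controls(bundle):
    """Given the operation occupying each VLIW slot (or IDLE), decide which
    blocks can be clock/power gated for this cycle."""
    return {
        "fpmac0": "WAKE" if bundle.get("fpu0") is not IDLE else "NAP",
        "fpmac1": "WAKE" if bundle.get("fpu1") is not IDLE else "NAP",
        "dmem":   "on" if bundle.get("load") is not IDLE
                       or bundle.get("store") is not IDLE else "gated",
        "rib":    "on" if bundle.get("send") is not IDLE
                       or bundle.get("receive") is not IDLE else "gated",
    }

# A cycle that only issues one FPMAC operation and a load:
print(sleep_controls({"fpu0": "MULADD", "load": "LD"}))
# {'fpmac0': 'WAKE', 'fpmac1': 'NAP', 'dmem': 'on', 'rib': 'gated'}
```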

Figure 3: Fine-grain Power Management in the Tile. Source: Intel Corporation, 2009


[Figure 4 shows the mesochronous interface (MSINT): transmit data is written into a 4-deep FIFO under a delayed transmit clock, while a synchronized write-pointer/read-pointer pair and a low-latency circuit in the receive clock domain read the data out after a 1–2 cycle synchronization delay.]
Figure 4: Phase-tolerant Mesochronous Interface. Source: Intel Corporation, 2009

Mesochronous Clocking
The chip uses a scalable, global, mesochronous clocking technique that allows clock-phase-insensitive communication across tiles and synchronous operation within each tile. The on-chip PLL output is routed by using horizontal (Metal-8) and vertical (Metal-7) spines. Each spine carries differential clocks for low duty-cycle variation along the worst-case clock route of 26 mm. An op-amp at each tile converts the differential clock inputs to a single-ended clock with a 50 percent duty cycle prior to distribution by using an H-tree. The 2-mm long, point-to-point, unidirectional router links implement a phase-tolerant, mesochronous interface, as shown in Figure 4. This allows clock-phase-insensitive communication across tiles and enables a scalable, on-die communication fabric that simplifies global clock distribution.

Double-pumped Crossbar
The crossbar switch area increases as a square function O(n²) of the total number of I/O ports and the number of bits per port. Consequently, the crossbar can dominate a large percentage of the area. To alleviate this problem, we double pump the crossbar data buses by interleaving alternate data bits, as shown in Figure 5. We use dual-edge-triggered flip-flops to do this, thereby effectively reducing the crossbar hardware cost by half.

Figure 5: Double-pumped Crossbar. Source: Intel Corporation, 2009
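A minimal model of the double-pumping idea follows, with an arbitrary toy flit width: alternate bits share one crossbar wire by travelling on opposite clock edges, so the switch needs roughly half the wires.

```python
# A minimal model of double pumping: alternate bits of a flit share one
# crossbar wire by travelling on opposite clock edges. The flit width used
# here is only for illustration.
def split_for_double_pump(bits):
    """Interleave a flit into (rising-edge half, falling-edge half)."""
    return bits[0::2], bits[1::2]   # even-indexed bits, odd-indexed bits

def recombine(rising, falling):
    """Reassemble the original flit on the receiving side of the crossbar."""
    bits = []
    for r, f in zip(rising, falling):
        bits += [r, f]
    if len(rising) > len(falling):   # odd flit width leaves one extra rising bit
        bits.append(rising[-1])
    return bits

flit = [1, 0, 1, 1, 0, 1, 0, 0, 1]         # 9-bit toy flit
rise, fall = split_for_double_pump(flit)   # 5 wires instead of 9
assert recombine(rise, fall) == flit
```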


Tiled-design Methodology
While implementing the teraFLOPS processor, we followed a "tiled design methodology" in which each tile is completely self-contained, including power bumps, power tracks, and global clock routing. This design enabled us to seamlessly array all tiles at the top level, simply by abutment. This methodology enabled rapid completion of a fully custom design with less than 400 person-months of effort.

Results

[Figure 6 shows die micrographs of the full chip (12.64 mm × 21.72 mm) and of a single 1.5 mm × 2.0 mm tile, with the DMEM, register file, FPMAC0/1, RIB, IMEM, MSINT, router, and clock-spine blocks labeled. Chip characteristics: 65-nm, 1-poly, 8-metal (Cu) technology; 100 million transistors full-chip (1.2 million per tile); 275 mm² die area (3 mm² per tile); 8390 C4 bumps.]

Figure 6: Full-chip and Tile Micrograph and Characteristics. Source: Intel Corporation, 2009

We fabricated the teraFLOPS processor in 65-nm process technology. The die photographs in Figure 6 identify the chip's functional blocks and individual tiles. The 275-mm², fully custom design contains 100 million transistors. The chip supports a wide dynamic range of operation, from 1 GHz at 670 mV up to 5.67 GHz at 1.35 V (at 80°C), as shown in Figure 7. Increased performance at higher voltage and frequency can be achieved at the cost of power. As we scale Vcc/frequency, power consumption ranges from 15.6 W to 230 W, as shown in Figure 8. Fine-grain sleep transistors limit the leakage power to between 9.6 percent and 15.6 percent of the total power. With all 80 tiles actively performing single-precision, block-matrix operations, the chip achieves a peak performance of 1.0 TFLOPS at 3.16 GHz while dissipating 97 W. By reducing voltage and operating close to the threshold voltage of the transistor, energy efficiency for the stencil application can be improved from 5.8 GFLOPS per Watt to a maximum of 19.4 GFLOPS per Watt, as shown in Figure 9.


Figure 7: Measured Chip Fmax and Peak Performance (0.32 TFLOP at 1 GHz, 1.0 TFLOP at 3.16 GHz, 1.63 TFLOP at 5.1 GHz, and 1.81 TFLOP at 5.67 GHz). Source: Intel Corporation, 2009
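These peak numbers follow directly from the tile count and issue rate: each tile's two FPMACs each retire one multiply-add (2 FLOPs) per cycle, so

\[
\text{peak FLOPS} = 80~\text{tiles} \times 2~\tfrac{\text{FPMACs}}{\text{tile}} \times 2~\tfrac{\text{FLOPs}}{\text{FPMAC}\cdot\text{cycle}} \times f = 320\,f,
\]

which gives 20 GFLOPS per tile at 5 GHz, 1.01 TFLOPS chip-wide at 3.16 GHz, and 1.81 TFLOPS at 5.67 GHz, matching the measured points in Figure 7.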


The measured global clock distribution power is 2 W at 1.2 V, 5.1 GHz, and 80°C (N=80) operation, and it accounts for just 1.3 percent of the total chip power. At the tile level, the power breakdown shows that the dual FPMACs account for 36 percent of total power, the router and links for 28 percent, the IMEM and DMEM for 21 percent, the tile-level synchronous clock distribution for 11 percent, and the multi-ported register file for 4 percent. In sleep mode, the nMOS sleep transistors are turned off, reducing chip leakage by 2X while preserving the logic state in all memory arrays. Total network power per tile can be lowered from a maximum of 924 mW, with all router ports active, to 126 mW, resulting in a 7.3X reduction. The network leakage power per tile when all ports and the global clock buffers to the router are disabled is 126 mW. This number includes power dissipated in the router, the MSINT, and the links.

Figure 8: Measured Power Versus Vcc for the Stencil Application (1 TFLOP at 97 W, 1.33 TFLOP at 230 W). Source: Intel Corporation, 2009

Discussion and Tradeoffs
The goal of achieving teraFLOPS performance under 100 W entails studying the traditional tradeoffs between performance, power, and die size, but it is equally important to look at issues such as multi-generation scalability, modular design and validation, and support for parallel programming models.

Today's general-purpose cores are capable of performance on the order of tens of GFLOPS. However, achieving teraFLOPS performance with these cores in the current process technology is prohibitive from an area and power perspective. Our work corroborates that a computational fabric built by using programmable, special-purpose cores provides high levels of performance in an energy-efficient manner. Power-optimized fast computation hardware, simply decoded VLIW instruction words, and low-power memories ensure that a large percentage of the energy consumed goes towards computing FLOPS. While architecting the core, we were aware of the importance of balancing data memory bandwidth with compute/communication bandwidth, which entailed adding a single-cycle, 6-read, 4-write register file. As data transfer on chip costs significant energy, larger caches will be required to keep data local. Maintaining coherency across many cores is a significant challenge as well. Hardware costs and increased coherency traffic on the mesh will pose hurdles for completely hardware-based coherent systems. Instead, future tera-scale processors will explore message-passing architectures. Special on-die, message-passing hardware is very efficient for core-to-core communication, making software-based coherency with hardware assists a viable solution for the future. In addition to support for message passing, another enhancement that proved important is the ability to overlap compute and communication. A core can directly transfer instructions or data into the local memory of another core without interrupting that core. This resulted in improved FPMAC utilization with fewer idle cycles and enabled performance numbers close to the maximum achievable.

Figure 9: Measured Chip Energy Efficiency for the Stencil Application. Source: Intel Corporation, 2009


With increasing demand for interconnect bandwidth, on-chip networks are taking up a substantial portion of the system's power budget. The router on our teraFLOPS processor consumes 28 percent of tile power. Our goal for a compelling solution is to use compact, low-power routers that consume less than 10 percent of the chip power and die budget. At the same time they must deliver high on-die bisection bandwidth and low latency. Techniques such as speculation and bypass are well known, but they add to the power consumption and are therefore undesirable. Future routers would also need to incorporate extensive fine-grained power-management techniques to enable dynamic operation that adapts to differing traffic patterns. Heterogeneous NoCs [11] that allocate resources as needed, and circuit-switched networks [12, 13], are promising approaches. Traffic patterns and bandwidth requirements are going to dictate on-die network architectures in the future. Hybrid approaches to on-die networks can save communication power by utilizing fewer fully-connected crossbar routers at the expense of reduced bandwidth. Instead of one router per core in each tile, we could amortize the power and area of the router by having two or more cores on a shared bus connected to the local port of the router in each tile.

The two popular clocking techniques for on-die networks are 1) a completely synchronous system with closely matched skews, and 2) a globally asynchronous, locally synchronous (GALS) system with handshaking signals for data transfer. Synchronous systems are the simplest to implement and are well understood, but they can consume significant power for high-frequency clock distribution. With increased within-die variation, matching skews across large dies is becoming difficult, which also results in excessive timing guard bands. GALS suffers from area overhead due to additional handshaking circuits, a lack of mature design tools, and increased design complexity. The mesochronous clocking scheme tries to address these problems by distributing a single-frequency clock without the overhead of matching clock skews. This causes phase differences between the clocks distributed to individual routers, which need to be accounted for by synchronization circuitry in the data paths. This technique scales well as tiles are added or removed. Multiple cycles are required for the global clock to propagate to all 80 tiles; this systematic skew inherent in the distribution helps spread the peak currents caused by simultaneous clock switching. To support mesochronous, or phase-tolerant, communication across tiles, we pay a synchronization latency penalty for the benefit of a lightweight global clock distribution. The area and power overhead of the synchronizers can be significant for wide links. It is important to understand these tradeoffs before abandoning a synchronous implementation in favor of mesochronous clocking.


Tera-scale computing platforms need to be efficient to meet the energy constraints of future data centers. We employed per-tile, fine-grained power management with clock and power gating. By exposing WAKE/NAP instructions to software, we could put FPMACs to sleep during idle windows. This enabled us to reach an energy efficiency of 19.4 GFLOPS/Watt. In stark contrast, a 3-GHz, general-purpose CPU provides an energy efficiency of 0.07 GFLOPS/Watt. As different applications, with different compute/communication profiles and performance requirements, are invoked over time, the optimal number of cores and Vcc/frequency to achieve maximum energy efficiency varies. Hence, to further improve workload efficiency, we recommend dynamic voltage and frequency scaling with independent voltage and frequency islands for future tera-scale processors.

To operate across a wide dynamic voltage range, it is important to implement circuits with robust static CMOS logic that operates at low voltages. Operating close to the threshold voltage of the transistor increases energy efficiency; however, contention circuits in register files and small-signal arrays typically limit the lowest operating voltage (Vccmin) of a processor. It is critical for tera-scale processors to operate at the lowest energy point, and this makes research into Vccmin-lowering techniques a vital part of the tera-scale research agenda. Designs should also be optimized for power, with extensive use of low-leakage transistors and selective use of nominal transistors in critical paths. It is important to strike a balance between delay penalty and leakage savings during device-type selection for sleep transistors. We chose to utilize nominal devices for 5-GHz operation with a 2X leakage savings.

As we integrate more cores on a single die, adopting a scalable design methodology is critical for design convergence, validation, product segmentation, and time-to-market. The proposed tiled design methodology enabled faster convergence in timing verification and physical design. Global wires that do not scale well with technology could be avoided. We ensured the tiles were small, completely self-contained, and could be assembled by abutment. This also ensures uniform metal/via density, which helps manufacturability and yield. Consequently, we achieved high levels of integration with a small design team and low overhead. The pre- and post-silicon debug effort was greatly reduced, with the first silicon stepping fully functional. In addition, a standardized communication fabric with a predefined interface, combined with a tiled design approach, provides the flexibility to integrate any number of homogeneous or heterogeneous cores and facilitates product segmentation.


Conclusion
Tera-flop performance is possible within a mainstream power envelope. Careful co-design at the architecture, logic, circuit, and physical design levels pays off, with silicon achieving an average performance of 1 TFLOP at 97 W and a peak power efficiency of 19.4 GFLOPS/Watt. The tile-based methodology fulfilled its promise, and the design was completed with half the team in half the time. Communication power accounts for almost one-third of the total power, highlighting the need for further research into low-power, scalable networks that can satisfy the requirements of a tera-scale platform. On a final note, to successfully exploit the computing capability of a tera-scale processor, research into parallel programming is vital.

References
[1] J. Held, J. Bautista, and S. Koehl. "From a few cores to many: A tera-scale computing research overview." 2006. Available at http://www.intel.com

[2] W.J. Dally and B. Towles. "Route Packets, Not Wires: On-Chip Interconnection Networks." In Proceedings of the 38th Design Automation Conference (DAC 01), ACM Press, 2001, pages 681-689.

[3] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, C. Roberts, V. Erraguntla, Y. Hoskote, N. Borkar, and S. Borkar. “An 80-Tile Sub-100W TeraFLOPS Processor in 65-nm CMOS.” IEEE Journal of Solid-State Circuits, vol. 43, pages 29–41, January 2008.

[4] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar. "A 5GHz Mesh Interconnect for a Teraflops Processor." IEEE Micro, Vol. 27, pages 51-61, 2007.

[5] S. Vangal, Y. Hoskote, N. Borkar and A. Alvandpour. “A 6.2-GFlops Floating-Point Multiply-Accumulator with Conditional Normalization.” IEEE Journal of Solid-State Circuits, pages 2314–2323, Oct., 2006.

[6] S. Vangal, A. Singh, J. Howard, S. Dighe, N. Borkar, A. Alvandpour. "A 5.1GHz 0.34mm² Router for Network-on-Chip Applications." IEEE Symposium on VLSI Circuits, 14-16 June 2007.

[7] T. Mattson, R. Van der Wijngaart, M. Frumkin. “Programming the Intel 80-core network-on-a-chip terascale processor.” In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, November 15-21, 2008, Austin, Texas.

[8] F. Klass. “Semi-Dynamic and Dynamic Flip-Flops with Embedded Logic.” 1998 Symposium on VLSI Circuits, Digest of Technical Papers, pages 108–109, 1998.


[9] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, V. De. “Comparative Delay and Energy of Single Edge-Triggered & Dual Edge-Triggered Pulsed Flip-Flops for High-Performance Microprocessors.” ISLPED, pages 147-151, 2001.

[10] J. Tschanz, S. Narendra, Y. Ye, B. Bloechel, S. Borkar and V. De. “Dynamic sleep transistor and body bias for active leakage power control of microprocessors.” IEEE Journal of Solid-State Circuits, pages 1838–1845, Nov. 2003.

[11] K. Rijpkema et al. "Trade-Offs in the Design of a Router with Both Guaranteed and Best-Effort Services for Networks on Chip." In IEEE Proceedings on Computers and Digital Techniques, vol. 150, no. 5, 2003, pages 294-302.

[12] P. Wolkette et al. "An Energy-Efficient Reconfigurable Circuit-Switched Network-on-Chip." In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 05), IEEE CS Press, 2005, page 155a.

[13] M. Anders, H. Kaul, M. Hansson, R. Krishnamurthy, S. Borkar. "A 2.9Tb/s 8W 64-core circuit-switched network-on-chip in 45nm CMOS." 34th European Solid-State Circuits Conference (ESSCIRC 2008), pages 182-185, 15-19 September 2008.

Acknowledgements
We thank the entire Advanced Microprocessor Research team at Intel for flawless execution of the chip; we thank Tim Mattson and Rob Van Der Wijngaart for workloads; and we thank Joe Schutz, Greg Taylor, Matt Haycock, and Justin Rattner for guidance and encouragement.

Copyright
Copyright © 2009 Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Atom are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.
