Automatic SIMD Vectorization

Total Page:16

File Type:pdf, Size:1020Kb

Automatic SIMD Vectorization Die approbierte Originalversion dieser Dissertation ist an der Hauptbibliothek der Technischen Universität Wien aufgestellt (http://www.ub.tuwien.ac.at). The approved original version of this thesis is available at the main library of the Vienna University of Technology (http://www.ub.tuwien.ac.at/englweb/). TECHNISCHE UNIVERSITÄT WIEN ru VIENNA UNIVERSITY OF VIENNA TECHNOLOGY Dissertation Automatic SIMD Vectorization ausgeführt zum Zwecke der Erlangung des akademischen Grades eines Doktors der technischen Wissenschaften unter der Leitung von Ao. Univ.-Prof. Dipl.-Ing. Dr. techn. Christoph W. Überhuber E101 - Institut für Analysis und Scientific Computing eingereicht an der Technischen Universität Wien Fakultät für Informatik von Dipl.-Ing. Jürgen Lorenz Matrikelnummer 9525135 Zenogasse 3/4 1120 Wien Wien, am 16. März 2004 Kurzfassung Automatisch generierter numerischer Quellcode kann von handelsüblichen Compilern nicht optimal verarbeitet werden. Algorithmische Strukturen werden nur unzureichend erkannt und dementsprechend schlecht ausgenutzt. Es werden daher spezialisierte Com- pilertechniken benötigt um zu einem zufriedenstellend effizienten Zielcode zu gelangen. Die immer weiter fortschreitende Zunahme der Bedeutung von Multimedia- Anwendungen hat zur Entwicklung von SIMD-Befehlssatzerweiterungen auf allen gängigen Mikroprozessoren geführt. Die konventionellen Vektorisierungsmethoden der handelsüblichen Compiler sind jedoch nicht in der Lage, Parallelismus in einem längeren numerischen Straight-Line-Code zu erkennen. In dieser Dissertation werden Techniken, die speziell die Übersetzung numerischer Rou- tinen zum Ziel haben, vorgestellt. Die automatische 2-fach-SIMD-Vektorisierung ent- deckt den in einer Routine vorhandenen SIMD-Parallelismus und garantiert einen zu- friedenstellenden Grad der Nutzung von SIMD-Befehlen. Die Gucklock-Optimierung ersetzt bestimmte Kombinationen von Instruktionen durch effizientere. Für den letz- ten Teil der Übersetzung werden synthèse-spefizische Techniken eingesetzt, die deutlich zur Steigerung der Qualität des übersetzten Programmcodes - über das Maß han- delsüblicher Übersetzer hinaus - beitragen. Die in dieser Dissertation präsentierten Konzepte stellen die Grundlage des Special Purpose Compilers MAP [70, 71, 72, 73, 75] dar, der speziell für die effiziente Vekto- risierung großer numerischer Straight-Line-Codes entwickelt wurde. MAP unterstützt die führenden automatischen Performance-Tuning-Softwareprodukte: FFTW1, SPIRAL2 und ATLAS3. Damit werden die wichtigsten Algorithmen der numerischen linearen Al- gebra und der digitalen Signalverarbeitung abgedeckt. Zu den praktischen Ergebnissen dieser Dissertation zählen: (i) die schnellsten derzeit verfügbaren FFT-Codes für AMD Athlon basierte Systeme; (ii) automatisch erzeug- te FFT-Codes für den Intel Pentium 4, die vergleichbare Leistung wie die schnellsten handoptimierten Codes liefern und {in) die schnellsten derzeit verfügbaren FFT-Codes für den IBM PowerPC 440 FP2 Prozessor, der im derzeit von IBM entwickelten Super- computer BlueGene/L Verwendung findet. Die experimentellen Resultate wurden durch Anwendung des MAP Special Purpose Compilers auf von FFTW (Frigo and Johnson [35]), SPIRAL (Moura et al. [88]) und ATLAS (Whaley et al. [115]) erzeugten Code erzielt. 1FFTW ist die Abkürzung von "Fastest Fourier Transform in the West" (schnellste Fourier- Transformation des Westens). 2SPIRAL ist die Abkürzung von "Signal Processing Algorithms Implementation Research for Adaptive Libraries" (Implementierungsforschung an Signalverarbeitungsalgorithmen für adap- tive Software). 3ATLAS ist die Abkürzung von "Automatically Tuned Linear Algebra Software" (automa- tisch angepasste Software für Probleme der Linearen Algebra). Summary General purpose compilers produce suboptimal object code when applied to automat- ically generated numerical source code. Moreover, general purpose compilers have natural limits in deducing and utilizing information about the structure of the imple- mented algorithms. Specialized compilation techniques, as introduced in this thesis, are needed to realize such structural transformations. Increasing focus on multimedia applications has resulted in the addition of short vector SIMD extensions to most existing general-purpose microprocessors. This added functionality comes primarily with the addition of short vector SIMD instructions. Un- fortunately, access to these instructions is limited to proprietary language extensions, in-line assembly, and library calls. Generally, it has been assumed that vector com- pilers provide the most promising means of exploiting multimedia instructions. But although vectorization technology is well understood, it is inherently complex and frag- ile. Conventional vectorizers are incapable of locating SIMD style parallelism within a basic block without introducing unacceptably large overhead. Without the adoption of SIMD extensions, 50% to 75% of a machine's possible performance is wasted, even in conjunction with state-of-the-art performance tuning software. Automatic exploitation of SIMD instructions therefore requires new compiler technology. This thesis presents techniques tailor-made for the compilation of numerical straight line code: (i) Automatic 2-way SIMD vectorization extracts SIMD parallelism out of numerical straight line code in a way that guarantees a satisfactory level of SIMD utilization, (ii) Peephole optimization rewrites the SIMD vectorized code in a do- main specific manner to further improve efficiency. (iii) Backend specific techniques are used for compiling the vectorized and optimized code, yielding object code whose performance is significantly better than the performance of code produced by general purpose compilers. The new compiler techniques presented in this thesis form the basis of the "MAP Vectorizer and Backend" [70, 71, 72, 73, 75], a special purpose compiler, which was designed especially for the efficient vectorization and backend optimization of large basic blocks of numerical straight line code. MAP is currently applicable to the state- of-the-art performance tuning systems FFTW4, SPIRAL5 and ATLAS6. Thus, MAP is covering a broad range of highly important algorithms in the fields of digital signal processing and numerical linear algebra. The experimental results were obtained using the MAP vectorizer and backend in connection with FFTW [35], SPIRAL [88] and ATLAS [115]. 4FFTW is the abbreviation of "Fastest Fourier Transform in the West". 5SPIRAL is the abbreviation of "Signal Processing Algorithms Implementation Research for Adaptive Libraries". 6ATLAS is the abbreviation of "Automatically Tuned Linear Algebra Software ". Contents Kurzfassung / Summary 2 Introduction 8 1 Hardware vs. Algorithms 14 1.1 Current Hardware Trends 14 1.2 Performance Implications 16 1.3 Automatic Performance Tuning 21 2 Standard Hardware 27 2.1 Processors • • • 27 2.2 Advanced Architectural Features 31 2.3 The Memory Hierarchy 33 3 Short Vector Hardware 36 3.1 Short Vector Extensions 37 3.2 Intel's Streaming SIMD Extensions 42 3.3 AMD's 3DNow! 47 3.4 The IBM BlueGene/L Supercomputer 48 3.5 Vector Computers vs. Short Vector SIMD 48 4 Fast Algorithms for Linear Transforms 52 4.1 Discrete Linear Transforms . 52 5 Matrix Multiplication 58 5.1 The Problem Setting 58 6 A Portable SIMD API 61 6.1 Definition of the Portable SIMD API 62 7 Automatic Vectorization of Straight-Line Code 67 7.1 Vectorization of Straight Line Code 68 7.2 The Vectorization Approach 68 7.3 Benefits of The MAP Vectorizer 69 7.4 Virtual Machine Models 70 7.5 The Vectorization Engine 75 7.6 Pairing Rules 83 CONTENTS 5 8 Rewriting and Optimization 93 8.1 Optimization Goals 94 8.2 Peephole Optimization 94 8.3 Transformation Rules 96 8.4 The Scheduler 101 9 Backend Techniques for Straight-Line Code 107 9.1 Optimization Techniques Used in MAP 107 9.2 Backend Optimization Goals 108 9.3 One Time Optimizations 109 9.4 Feedback Driven Optimizations 114 10 Experimental Results 118 10.1 Experimental Layout 118 10.2 BlueGene/L Experiments 119 10.3 Experiments on IA-32 Architectures 121 Conclusion and Outlook 129 A The Kronecker Product Formalism 130 A.I Notation 131 A.2 Extended Subvector Operations 134 A.3 Kronecker Products 134 A.4 Stride Permutations 138 A.5 Twiddle Factors and Diagonal Matrices 142 B Compiler Techniques 144 B.I Register Allocation and Memory Accesses 144 B.2 Instruction Scheduling 145 C Performance Assessment 147 C.I Short Vector Performance Measures 149 C.2 Empirical Performance Assessment 149 D Short Vector Instruction Set 157 D.I The Intel Streaming SIMD Extensions 2 157 E The Portable SIMD API 161 E.I Intel Streaming SIMD Extensions 2 161 E.2 BG/L Double FPU SIMD Extensions 163 F SPIRAL Example Code 167 F.I Scalar C Code 167 F.2 Short Vector Code 170 6 CONTENTS G FFTW Example Code 175 G.I Scalar C Code 175 G.2 Short Vector Code 176 H ATLAS Example Code 179 H.I Scalar C Code 179 H.2 Short Vector Code 181 Table of Abbreviations 186 Bibliography 187 Curriculum Vitae 197 Acknowledgements First of all, I would like to express my sincere gratitude to my Ph. D. advisor Christoph Ueberhuber, who gave me the opportunity of working in the ASCOT team operating in the interesting research field of scientific computing. I espe- cially want to express my gratitude for leading my first steps in the area of high performance computing and for giving me the possibility to work at the forefront of scientific research. He supported my efforts with great dedication, not only during the work on my thesis, but also
Recommended publications
  • Beyond BIOS Developing with the Unified Extensible Firmware Interface
    Digital Edition Digital Editions of selected Intel Press books are in addition to and complement the printed books. Click the icon to access information on other essential books for Developers and IT Professionals Visit our website at www.intel.com/intelpress Beyond BIOS Developing with the Unified Extensible Firmware Interface Second Edition Vincent Zimmer Michael Rothman Suresh Marisetty Copyright © 2010 Intel Corporation. All rights reserved. ISBN 13 978-1-934053-29-4 This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought. Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Intel may make changes to specifications, product descriptions, and plans at any time, without notice. Fictitious names of companies, products, people, characters, and/or data mentioned herein are not intended to represent any real individual, company, product, or event. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel, the Intel logo, Celeron, Intel Centrino, Intel NetBurst, Intel Xeon, Itanium, Pentium, MMX, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
    [Show full text]
  • Class-Action Lawsuit
    Case 3:20-cv-00863-SI Document 1 Filed 05/29/20 Page 1 of 279 Steve D. Larson, OSB No. 863540 Email: [email protected] Jennifer S. Wagner, OSB No. 024470 Email: [email protected] STOLL STOLL BERNE LOKTING & SHLACHTER P.C. 209 SW Oak Street, Suite 500 Portland, Oregon 97204 Telephone: (503) 227-1600 Attorneys for Plaintiffs [Additional Counsel Listed on Signature Page.] UNITED STATES DISTRICT COURT DISTRICT OF OREGON PORTLAND DIVISION BLUE PEAK HOSTING, LLC, PAMELA Case No. GREEN, TITI RICAFORT, MARGARITE SIMPSON, and MICHAEL NELSON, on behalf of CLASS ACTION ALLEGATION themselves and all others similarly situated, COMPLAINT Plaintiffs, DEMAND FOR JURY TRIAL v. INTEL CORPORATION, a Delaware corporation, Defendant. CLASS ACTION ALLEGATION COMPLAINT Case 3:20-cv-00863-SI Document 1 Filed 05/29/20 Page 2 of 279 Plaintiffs Blue Peak Hosting, LLC, Pamela Green, Titi Ricafort, Margarite Sampson, and Michael Nelson, individually and on behalf of the members of the Class defined below, allege the following against Defendant Intel Corporation (“Intel” or “the Company”), based upon personal knowledge with respect to themselves and on information and belief derived from, among other things, the investigation of counsel and review of public documents as to all other matters. INTRODUCTION 1. Despite Intel’s intentional concealment of specific design choices that it long knew rendered its central processing units (“CPUs” or “processors”) unsecure, it was only in January 2018 that it was first revealed to the public that Intel’s CPUs have significant security vulnerabilities that gave unauthorized program instructions access to protected data. 2. A CPU is the “brain” in every computer and mobile device and processes all of the essential applications, including the handling of confidential information such as passwords and encryption keys.
    [Show full text]
  • Christophe Lécuyer Revival Of
    CONFRONTING THE JAPANESE CHALLENGE: THE REVIVAL OF MANUFACTURING AT INTEL Christophe Lécuyer Abstract: 1 Like many other American corporations, Intel was outcompeted in manufacturing by Japanese firms in the late 1970s and the first half of the 1980s. By 1985, it became clear that the corporation’s weakness in production endangered its long-term survival. Responding to the Japanese challenge, Intel’s upper management instigated a fundamental reform of manufacturing. At their behest, production engineers and managers adopted Japanese manufacturing technologies and operating procedures. They put microchip fabrication on a scientific footing. They developed new ways of transfering processes from development to production and they standardized the firm’s factories. This major transformation enabled Intel to reach manufacturing parity with Japanese chip-makers by the early 1990s. About the author: Christophe Lécuyer teaches business history and the history of technology at Sorbonne Université. Acknowledgments: The author would like to thank David Hodges, Patrick Fridenson, Walter Friedman, and three anonymous referees for their insightful comments on previous versions of this article. His thanks also go to the Science History Institute for its support for this project. Introduction 2 The United States experienced a manufacturing crisis in the 1970s and during much of the 1980s. The competitiveness of American plants declined significantly vis à vis factories located in Japan. The crisis was so acute that, in many sectors, US corporations were forced to close down a significant fraction of their production facilities. Losing marketshare to Japanese competitors, integrated steel manufacturers shut down more than half of their mills and laid off 250,000 workers in the 1970s and early 1980s.
    [Show full text]
  • Intel® Technology Journal the Original 45Nm Intel Core™ Microarchitecture
    Volume 12 Issue 03 Published October 2008 ISSN 1535-864X DOI: 10.1535/itj.1203 Intel® Technology Journal The Original 45nm Intel Core™ Microarchitecture Intel Technology Journal Q3’08 (Volume 12, Issue 3) focuses on Intel® Processors Based on the Original 45nm Intel Core™ Microarchitecture: The First Tick in Intel’s new Architecture and Silicon “Tick-Tock” Cadence The Technical Challenges of Transitioning Original 45nm Intel® Core™ 2 Processor Performance Intel® PRO/Wireless Solutions to a Half-Mini Card Power Management Enhancements Greater Mobility Through Lower Power in the 45nm Intel® Core™ Microarchitecture Improvements in the Intel® Core™ 2 Penryn Processor Family Architecture Power Improvements on 2008 Desktop Platforms and Microarchitecture Mobility Thin and Small Form-Factor Packaging for Intel® Processors Based on Original 45nm The First Six-Core Intel® Xeon™ Microprocessor Intel Core™ Microarchitecture More information, including current and past issues of Intel Technology Journal, can be found at: http://developer.intel.com/technology/itj/index.htm Volume 12 Issue 03 Published October 2008 ISSN 1535-864X DOI: 10.1535/itj.1203 Intel® Technology Journal The Original 45nm Intel Core™ Microarchitecture Articles Preface iii Foreword v Technical Reviewers vii Original 45nm Intel® Core™ 2 Processor Performance 157 Power Management Enhancements in the 45nm Intel® Core™ Microarchitecture 169 Improvements in the Intel® Core™ 2 Penryn Processor Family Architecture 179 and Microarchitecture Mobility Thin and Small Form-Factor Packaging
    [Show full text]
  • A Complete Bibliography of Publications in the Intel Technology Journal
    A Complete Bibliography of Publications in the Intel Technology Journal Nelson H. F. Beebe University of Utah Department of Mathematics, 110 LCB 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Tel: +1 801 581 5254 FAX: +1 801 581 4148 E-mail: [email protected], [email protected], [email protected] (Internet) WWW URL: http://www.math.utah.edu/~beebe/ 14 October 2017 Version 1.12 Title word cross-reference .NET [135]. /Fortran [99]. 10 [179, 178]. 130nm [104]. 2.0Volts [10]. 21st [9, 24]. 300mm [105, 21]. 30nm [108]. 3945ABG [268]. 3D [41]. 4 [246, 72, 73, 168, 171, 75, 76, 170, 169, 77]. 60nm [104]. 64 [26, 46, 45, 27, 28, 38, 44]. 64-bit [48]. 1 2 802.11b [49]. 802.16 [190, 189]. 90nm [168, 171, 170]. 915GMS [210]. 945GMS [267]. AAL2 [119]. Abstractions [137]. Access [190]. Achieving [115, 19]. Across [84, 204, 6]. Adaptive [199]. Adding [116]. Addressing [66, 160]. Advanced [162, 16, 241, 247, 254]. AdvancedTCA [159, 158]. AdvancedTCA* [254]. AdvancedTCA/CGL [159]. Advancements [202]. Advantage [159]. Air [153]. Algorithms [14]. All-IP [118]. Always [55]. Analysis [223, 220, 12, 16, 173, 114, 247, 170]. Antenna [146, 191]. AOAC [55]. Applicability [197]. Application [204, 179, 60, 158]. Applications [42, 217, 100, 25, 202, 198, 180, 117, 277, 135]. Architectural [62]. Architecture [48, 64, 204, 26, 172, 80, 254, 161, 65, 27, 23, 263, 28, 269, 56, 231, 96, 3, 34, 206, 83, 268, 38, 44, 271, 125, 29, 41]. Architectures [158]. Area [154]. Arithmetic [46]. Array [246, 61].
    [Show full text]
  • Intel Technology Journal
    Intel Technology Journal Volume 06 Issue 01 Published February 14, 2002 ISSN 1535766X Hyper-Threading Technology Table of Contents Articles Preface 2 Foreword 3 Hyper-Threading Technology Architecture and Microarchitecture 4 Pre-Silicon Validation of Hyper-Threading Technology 16 Speculative Precomputation: Exploring the Use of Multithreading for Latency Tools 22 Intel® OpenMP C++/Fortran Compiler for Hyper-Threading Technology: Implementation and Performance 36 Media Applications on Hyper-Threading Technology 47 Hyper-Threading Technology: Impact on Compute-Intensive Workloads 58 Preface q1. 2002 By Lin Chao, Publisher Intel Technology Journal This February 2002 issue of the Intel Technology Journal (ITJ) is full of new things. First, there is a new look and design. This is the first big redesign since the inception of the ITJ on the Web in 1997. The new design, together with inclusion of the ISSN (International Standard Serial Number), makes it easier to index articles into technical indexes and search engines. There are new “subscribe,” search ITJ, and “e-mail to a colleague” features in the left navigation tool bar. Readers are encouraged to subscribe to the ITJ. The benefit is subscribers are notified by e-mail when a new issue is published. The focus of this issue is Hyper-Threading Technology, a new microprocessor architecture technology. It makes a single processor look like two processors to the operating system. Intel's Hyper-Threading Technology delivers two logical processors that can execute different tasks simultaneously using shared hardware resources. Hyper-Threading Technology effectively looks like two processors on a chip. A chip with this technology will not equal the computing power of two processors; however, it will seem like two, as the performance boost is substantial.
    [Show full text]
  • Vision-Based Vehicle Tracking Using Image Alignment With
    Motion Tracking on Embedded Systems - Vision-based Vehicle Tracking using Image Alignment with Symmetrical Function CHEUNG, Lap Chi A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Philosophy in Computer Science and Engineering © The Chinese University of Hong Kong August 2007 The Chinese University of Hong Kong holds the copyright of this thesis. Any person(s) intending to use a pari or whole of the materials in the thesis in a proposed publication must seek copyright release from the Dean of the Graduate School. Iti^Hj u 謹 ITY y^/j ^XLIBRARY SYSTEM^纷 Thesis/Assessment Committee Professor WONG Kin Hong (Chair) Professor MOON Yiu Sang (Thesis Supervisor) Professor JIA Jiaya Leo (Committee Member) Professor FAIRWEATHER Graeme (External Examiner) i Abstract The large number of rear end collisions due to driver inattention has been identified as a major automotive safety issue in Intelligent Transportation Systems (ITS), part of whose goals is to apply automotive technology to improve driver safely with efficiency. In this thesis, we describe a 3-phase vehicle tracking methodology with a single moving camera mounted on the driver's automobile as input for detecting rear vehicles on highways and city streets for the purpose of preventing potential rear-end collisions. In the first phase, a rear vehicle is detected by symmetrical measurement and Haar transform. A template is also created for tracking the vehicle using image alignment techniques in the second phase. The template is continuously monitored and updated to handle abrupt changes in surrounding conditions in the third phase. Different from most previous methods which simply delect vehicles in each frame without correlating the successive frames, our methodology gives precise tracking, an essential requirement for estimating the distances between the rear vehicles and the driver's car using the newly proposed formula.
    [Show full text]
  • Jon Stokes Jon
    Inside the Machine the Inside A Look Inside the Silicon Heart of Modern Computing Architecture Computer and Microprocessors to Introduction Illustrated An Computers perform countless tasks ranging from the business critical to the recreational, but regardless of how differently they may look and behave, they’re all amazingly similar in basic function. Once you understand how the microprocessor—or central processing unit (CPU)— Includes discussion of: works, you’ll have a firm grasp of the fundamental concepts at the heart of all modern computing. • Parts of the computer and microprocessor • Programming fundamentals (arithmetic Inside the Machine, from the co-founder of the highly instructions, memory accesses, control respected Ars Technica website, explains how flow instructions, and data types) microprocessors operate—what they do and how • Intermediate and advanced microprocessor they do it. The book uses analogies, full-color concepts (branch prediction and speculative diagrams, and clear language to convey the ideas execution) that form the basis of modern computing. After • Intermediate and advanced computing discussing computers in the abstract, the book concepts (instruction set architectures, examines specific microprocessors from Intel, RISC and CISC, the memory hierarchy, and IBM, and Motorola, from the original models up encoding and decoding machine language through today’s leading processors. It contains the instructions) most comprehensive and up-to-date information • 64-bit computing vs. 32-bit computing available (online or in print) on Intel’s latest • Caching and performance processors: the Pentium M, Core, and Core 2 Duo. Inside the Machine also explains technology terms Inside the Machine is perfect for students of and concepts that readers often hear but may not science and engineering, IT and business fully understand, such as “pipelining,” “L1 cache,” professionals, and the growing community “main memory,” “superscalar processing,” and of hardware tinkerers who like to dig into the “out-of-order execution.” guts of their machines.
    [Show full text]
  • One Big PDF Volume
    Proceedings of the Linux Symposium July 14–16th, 2014 Ottawa, Ontario Canada Contents Btr-Diff: An Innovative Approach to Differentiate BtrFs Snapshots 7 N. Mandliwala, S. Pimpale, N. P. Singh, G. Phatangare Leveraging MPST in Linux with Application Guidance to Achieve Power and Performance Goals 13 M.R. Jantz, K.A. Doshi, P.A. Kulkarni, H. Yun CPU Time Jitter Based Non-Physical True Random Number Generator 23 S. Müller Policy-extendable LMK filter framework for embedded system 49 K. Baik, J. Kim, D. Kim Scalable Tools for Non-Intrusive Performance Debugging of Parallel Linux Workloads 63 R. Schöne, J. Schuchart, T. Ilsche, D. Hackenberg Veloces: An Efficient I/O Scheduler for Solid State Devices 77 V.R. Damle, A.N. Palnitkar, S.D. Rangnekar, O.D. Pawar, S.A. Pimpale, N.O. Mandliwala Dmdedup: Device Mapper Target for Data Deduplication 83 V. Tarasov, D. Jain, G. Kuenning, S. Mandal, K. Palanisami, P. Shilane, S. Trehan, E. Zadok SkyPat: C++ Performance Analysis and Testing Framework 97 P.H. Chang, K.H. Kuo, D.Y. Tsai, K. Chen, Luba W.L. Tang Computationally Efficient Multiplexing of Events on Hardware Counters 101 R.V. Lim, D. Carrillo-Cisneros, W. Alkowaileet, I.D. Scherson The maxwell(8) random number generator 111 S. Harris Popcorn: a replicated-kernel OS based on Linux 123 A. Barbalace, B. Ravindran, D. Katz Conference Organizers Andrew J. Hutton, Linux Symposium Emilie Moreau, Linux Symposium Proceedings Committee Ralph Siemsen With thanks to John W. Lockhart, Red Hat Robyn Bergeron Authors retain copyright to all submitted papers, but have granted unlimited redistribution rights to all as a condition of submission.
    [Show full text]
  • Intel® Technology Journal the Spectrum of Risk Management in a Technology Company
    Volume 11 Issue 02 Published, May 16, 2007 ISSN 1535-864X DOI:10.1535/itj.1102.04 Intel® Technology Journal The Spectrum of Risk Management in a Technology Company Using Forecasting Markets to Manage Demand Risk More information, including current and past issues of Intel Technology Journal, can be found at: http://developer.intel.com/technology/itj/index.htm Using Forecasting Markets to Manage Demand Risk Jay W. Hopman, Information Technology Innovation & Research, Intel Corporation Index words: prediction markets, forecasting, planning, performance incentives generations of product transitions discovered that ABSTRACT producing high-quality demand forecasts is difficult to Intel completed a study of several generations of products achieve consistently and that mistakes can be quite costly to learn how product forecasts and plans are managed, [1]. how demand risks manifest themselves, and how business Managing demand risk is critical to Intel’s success, but it processes contend with, and sometimes contribute to, is only one of many business challenges the company demand risk. The study identified one critical area prone faces. Across the organization, teams grapple with to breakdown: the aggregation of market insight from questions such as how many units of products x, y, and z customers. Information collected from customers and then customers will demand at certain prices, how much rolled up through sales, marketing, and business planning factory capacity should be funded, which products should teams is often biased, and it can lead to inaccurate be brought to market, which features and technologies forecasts, as evidenced by historical results. should be included in new products, and when new A research effort launched in 2005 sought to introduce products will be ready for production and distribution.
    [Show full text]
  • Recollections of Early Chip Development at Intel
    Recollections of Early Chip Development at Intel Andrew M. Volk, Desktop Platforms Group, Intel Corp. Peter A. Stoll, Technology & Manufacturing Group, Intel Corp. Paul Metrovich, Desktop Platforms Group, Intel Corp. Index words: history, products, naming, definition, validation, debug marketing decided to jazz it up with 4’s and 8’s” [1]. Dr. ABSTRACT Gordon Moore also was “one of the cooks” that In the early days of Intel, between the late 1960s and the developed the naming system [2]. So that’s how it late 1970s, there was a regular product naming scheme by started. which a process, product type, or product family could be Intel started with two processes: a PMOS polysilicon gate easily known. Few remain at Intel who remember this and a Schottky barrier diode bipolar process. One goal of scheme, and its source is all but forgotten. The naming the early products was to replace magnetic core memory scheme and many stories of early products were in computers with silicon memories. To that end, the first uncovered through interviews and reminiscences by the products were a 64-bit bipolar memory and a 256-bit authors, who among them have over 80 years of PMOS memory. The PMOS products were given experience at Intel. This is their story. numbers starting with 1xxx, and the bipolar products were given numbers starting with 3xxx. The second digit INTRODUCTION was a “1” for Random Access Memory (RAM), and the The genesis for this paper came from a seemingly simple last two digits were the product sequence number. The inquiry to the Intel® Technology Journal.
    [Show full text]
  • Addressing the Challenges of Tera-Scale Computing
    Intel® Technology Journal | Volume 13, Issue 4, 2009 Intel Technology Journal Publisher Managing Editor Content Architect Richard Bowles David King Jim Held Program Manager Technical Editor Technical Illustrators Stuart Douglas Marian Lacey InfoPros Technical and Strategic Reviewers Terry A. Smith Jim Hurley Ali-Reza Adl-Tabatabai Jesse Fang Sridhar Iyengar Joe Schutz Shekhar Borkar Greg Taylor Intel® Technology Journal | 1 Intel® Technology Journal | Volume 13, Issue 4, 2009 Intel Technology Journal Copyright © 2009 Intel Corporation. All rights reserved. ISBN 978-1-934053-23-2, ISSN 1535-864X Intel Technology Journal Volume 13, Issue 4 No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Publisher, Intel Press, Intel Corporation, 2111 NE 25th Avenue, JF3-330, Hillsboro, OR 97124-5961. E mail: [email protected]. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought. Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter.
    [Show full text]