Approche Logicielle Pour Améliorer La Fiabilité D'applications Parallèles


Approche logicielle pour améliorer la fiabilité d'applications parallèles implémentées sur des processeurs multi-cœur et many-cœur

Thesis submitted for the degree of Doctor of the Communauté Université Grenoble Alpes, speciality Nanoelectronics and Nanotechnologies (ministerial decree of 25 May 2016). Presented by Vanessa Carolina VARGAS VALLEJO. Supervised by Raoul VELAZCO, Directeur de Recherche, CNRS, and co-supervised by Jean-François MÉHAUT, Professor, Grenoble UJF. Prepared at the TIMA laboratory (Techniques de l'Informatique et de la Microélectronique pour l'Architecture des Systèmes Intégrés) and the doctoral school EEATS (Électronique, Électrotechnique, Automatique, Traitement du Signal).

Defended publicly on 28 April 2017 before a jury composed of:
Frédéric PÉTROT, Professor, Grenoble INP (President)
Luc GIRAUD, Directeur de Recherche, INRIA Délégation Aquitaine (Rapporteur)
Olivier ROMAIN, Professor, Université de Cergy-Pontoise (Rapporteur)
Nacer-Eddine ZERGAINOH, Maître de Conférences, Université Grenoble Alpes (Examinateur)

Acknowledgements

First and above all, I would like to thank God, the Almighty, for all the blessings He has shed on my life; without Him and His support this work would not have been possible.

Special thanks to the Ecuadorian government, through the Secretaría de Educación Superior, Ciencia, Tecnología e Innovación del Ecuador and the Universidad de las Fuerzas Armadas ESPE, for the confidence placed in me and in my ability to meet this challenge.

I would like to express my deep thankfulness and appreciation to my academic advisor, Dr. Raoul Velazco, for his helpful guidance, valuable availability, and precious optimism throughout this research. I also want to express my deep gratitude to my academic co-advisor, Prof. Jean-François Méhaut, for his constant involvement, constructive advice, and priceless readiness during this thesis. Special thanks to my thesis reviewers, Prof. Luc Giraud and Prof. Olivier Romain, for their invaluable comments and relevant questions, which helped me improve this document.

I thank all the staff of the TIMA and LIG laboratories for their hospitality and cooperation, with special thanks to Anne-Laure Fourneret, Frédéric Chevrot, and Christian Seguy for their availability and service. I am deeply grateful to Maud Baylac, Francesca Villa, and Solène Rey from the LPSC laboratory for their long-lasting cooperation in carrying out the radiation-test experiments. I want to express my gratitude to Stéphane Gailhard and Renaud Stevens from the Kalray company for their support in the applications' development and the experiments on the MPPA-256 many-core processor.

I would like to thank all the friends I met during this stage of my life, especially Robert, Jessica, Alejandro, Audrey, Martin, Anaïs, Adrien, Cristina, and Edison.

Deep special thanks to my mother, my mother-in-law, and my aunt Talita for all the efforts and sacrifices they have made on behalf of my family, and for lovingly taking care of my children during different periods of these research years. I want to thank all my family for their prayers, invaluable support, and love; especially my parents, who have always trusted me and encouraged me to fight for my dreams, and my siblings, for their unconditional availability, confidence, and help during these years. I want to thank my little children for their great understanding and tenderness, for the smiles, kisses, and hugs that fortify my soul. Lastly, I want to thank my wonderful husband for his motivation, encouragement, and patience during this thesis, for supporting me at every turn, especially in the tough moments, and for loving me and making me happy.
Vanessa Vargas

Dedication

To the love of my life, Pablo, and my treasures, Matías and Emmanuel...
To my beloved parents, Guido and Lorgia...
"The Lord is my strength and my shield; my heart trusts in him, and he helps me" (Ps 28:7)

Contents

List of Tables
List of Figures
List of Abbreviations
1 Introduction
  1.1 Research issues
  1.2 Context of the thesis
  1.3 Thesis outline
2 Background
  2.1 Radiation effects on electronic circuits
    2.1.1 Radiation environment
    2.1.2 Single Event Effects
    2.1.3 Consequences caused by transient effects
    2.1.4 Evaluation of the radiation effects on electronic circuits
    2.1.5 Mitigation of radiation effects
  2.2 Multi-core and many-core processors
    2.2.1 Architectural concept
    2.2.2 Memory hierarchy models
    2.2.3 Software issues
    2.2.4 Sensitivity to neutron radiation
  2.3 Reliability of multi-core and many-core
    2.3.1 Conceptual basis
    2.3.2 Redundancy techniques
    2.3.3 Partitioning
  2.4 Discussion
3 Experimental methodology to evaluate the impact of SEE sensitivity
  3.1 Generalities
  3.2 Evaluation strategies
    3.2.1 Fault-injection strategy
    3.2.2 Neutron-radiation facility
  3.3 Experimental platforms
    3.3.1 P2041RDB development board
    3.3.2 MPPA Developer
  3.4 Benchmark applications
    3.4.1 Traveling Salesman Problem
    3.4.2 Matrix Multiplication
    3.4.3 Intrinsic characteristic issues
  3.5 Concluding remarks
4 Case studies based on the multi-core processor
  4.1 Description of the P2041 multi-core processor
  4.2 Case study A: multi-core processor running each core independently
    4.2.1 System configuration
    4.2.2 Experimental evaluation by neutron radiation
  4.3 Case study B: multi-core implementing TMR as a fault-tolerant technique
    4.3.1 System configuration
    4.3.2 Fault-injector details
    4.3.3 Experimental evaluation by fault injection
  4.4 Case study C: multi-core sharing resources and maximizing inter-core communication
    4.4.1 System configuration
    4.4.2 Fault-injector details
    4.4.3 Experimental evaluation by fault injection
    4.4.4 Experimental evaluation by neutron radiation
  4.5 Discussion of the obtained results
  4.6 Concluding remarks
5 Case studies based on the many-core processor
  5.1 Description of the MPPA-256 many-core processor
    5.1.1 Sensitive zones
    5.1.2 Programming MPPA issues
  5.2 Case study D: many-core execution with minimal use of Network-on-Chip (NoC) services
    5.2.1 System configuration
    5.2.2 Fault-injector details
    5.2.3 Experimental evaluation by fault injection
    5.2.4 Experimental evaluation by neutron radiation
  5.3 Case study E: many-core execution with intensive use of NoC services
    5.3.1 System configuration
    5.3.2 Benchmark details
    5.3.3 Fault-injector details
    5.3.4 Experimental evaluation by fault injection
  5.4 Discussion of the overall results
  5.5 Concluding remarks
6 NMR-MPar: a fault-tolerant approach
  6.1 Description
  6.2 Case study F: NMR-MPar implemented on the MPPA-256 processor
    6.2.1 System configuration
    6.2.2 Fault-injector details
    6.2.3 Experimental evaluation by fault injection
  6.3 Overall result comparison
  6.4 Concluding remarks
7 Related work
  7.1 SEE sensitivity of multi-core and many-core processors
  7.2 Reliability in multi-core and many-core processors
    7.2.1 Fault tolerance by redundancy
    7.2.2 Fault tolerance by partitioning
  7.3 Contributions to the state of the art
8 Conclusions and perspectives
  8.1 Contributions
  8.2 Future work
Annex
  A Implementation details of case study F
  B Publications during this thesis
Extended Abstract in French
Project references
Bibliography

List of Tables

4.1 Sensitive areas of the P2041 targeted in this work
4.2 Results of radiation experiments for the P2041 working in the AMP scenario
4.3 Results of fault injection at register level for TMR MM AMP
4.4 Results of fault injection in variables for TMR MM AMP
4.5 Injection error rates in variables for TMR MM AMP
4.6 Applications summary for the TMR scenario
4.7 Fault-injection campaign details for the TMR scenario
4.8 Results of fault injection in registers for SMP scenarios
4.9 Results of fault injection in memory for SMP scenarios
4.10 Error rates for SMP scenarios in the P2041
4.11 Test campaign characteristics for the TSP-SMP-P2041-NR scenario
4.12 Results of radiation experiments for the TSP-SMP-P2041-NR scenario
4.13 Test campaign characteristics for the MM-SMP-P2041-NR scenario
4.14 Results of neutron radiation tests for MM-SMP-P2041-NR
5.1 Memory cells of the MPPA-256 many-core processor
5.2 Results of the fault-injection campaigns for MM-AMP-MPPA
5.3 Results of the dynamic radiation tests for MPPA scenarios
5.4 Standard execution time for different configurations of TSP on the MPPA
5.5 Fault-injection campaign details for the Multi-Purpose Processing Array (MPPA)
5.6 Results of the fault-injection campaigns on applications running on the MPPA
5.7 Error rate by fault injection for POSIX scenarios in the MPPA
5.8 Summary of the different fault-injection scenarios evaluated on the MPPA
6.1 Execution-time comparison for different Traveling Salesman Problem (TSP) configurations on the MPPA
6.2 NoC connections on applications running NMR-MPar on the MPPA
6.3 Fault-injection campaign details of the 4MR-8Par applications running on the MPPA
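Chapters 4 and 6 of the thesis evaluate redundancy-based fault tolerance: triple modular redundancy (TMR) and its N-modular generalization, NMR-MPar. As a rough sketch of the underlying idea only — not the thesis's actual implementation, and with invented function names — a majority voter over three redundant results can be written as:

```python
from collections import Counter

def tmr_vote(r1, r2, r3):
    """Majority vote over three redundant computation results.

    Returns the value that at least two replicas agree on; raises if
    all three disagree (an uncorrectable error in plain TMR).
    """
    value, count = Counter([r1, r2, r3]).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: uncorrectable TMR disagreement")
    return value

# A fault corrupting one replica is outvoted by the other two:
print(tmr_vote(42, 42, 1065353216))  # -> 42
```

TMR masks any single-replica corruption but, as the sketch shows, cannot decide between three disagreeing values — which is one motivation for studying higher-order (N-modular) schemes.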
Recommended publications
  • Development of a Dynamically Extensible Spinnaker Chip Computing Module
    FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO. Development of a Dynamically Extensible SpiNNaker Chip Computing Module. Rui Emanuel Gonçalves Calado Araújo. Master in Electrical and Computers Engineering. Supervisor: Jörg Conradt. Co-Supervisor: Diamantino Freitas. January 27, 2014.

    Abstract (translated from the Portuguese): The SpiNNaker project developed an architecture capable of building a system with more than one million cores, with the goal of simulating more than one billion neurons in biological real time. The core of this system is the SpiNNaker chip, a multiprocessor System-on-Chip with a high degree of interconnection between its processing units. Although it is a computing platform with great potential, even for general-purpose applications, it is currently only available in fixed configurations and requires a workstation, such as a desktop or laptop machine connected over an Ethernet link, for its initialization and to receive the program and the data to be processed. To exploit the capabilities of the SpiNNaker chip in other areas, such as robotics, and in particular for flying or small-sized robots, a new hardware solution with configurable software has to be designed so that the amount of processing power can be selected at a fine granularity. These new capabilities allow the SpiNNaker architecture to be used in more applications beyond those for which it was originally designed. This dissertation presents a dynamically extensible computing module based on SpiNNaker chips, intended to overcome the above-mentioned limitations of the currently available SpiNNaker machines. The solution consists of a single board with a microcontroller, which emulates a SpiNNaker chip with an Ethernet link, accessible through a serial port, and with a SpiNNaker chip.
  • NVIDIA CUDA on IBM POWER8: Technical Overview, Software Installation, and Application Development
    Redpaper. Dino Quintero, Wei Li, Wainer dos Santos Moschetta, Mauricio Faria de Oliveira, Alexander Pozdneev. NVIDIA CUDA on IBM POWER8: Technical overview, software installation, and application development.

    Overview: The exploitation of general-purpose computing on graphics processing units (GPUs) and modern multi-core processors in a single heterogeneous parallel system has proven highly efficient for running several technical computing workloads. This applies to a wide range of areas such as chemistry, bioinformatics, molecular biology, engineering, and big data analytics. The recently launched IBM® Power System S824L comes into play to explore the use of the NVIDIA Tesla K40 GPU, combined with the latest IBM POWER8™ processor, providing a unique technology platform for high performance computing. This IBM Redpaper™ publication discusses the installation of the system and the development of C/C++ and Java applications using the NVIDIA CUDA platform for IBM POWER8. Note: CUDA stands for Compute Unified Device Architecture. It is a parallel computing platform and programming model created by NVIDIA and implemented by the GPUs that they produce.

    The following topics are covered: advantages of NVIDIA on POWER8; the IBM Power Systems S824L server; software stack; system monitoring; application development; tuning and debugging; application examples. © Copyright IBM Corp. 2015. All rights reserved. ibm.com/redbooks

    Advantages of NVIDIA on POWER8: The IBM and NVIDIA partnership was announced in November 2013, for the purpose of integrating IBM POWER®-based systems with NVIDIA GPUs, and enablement of GPU-accelerated applications and workloads. The goal is to deliver higher performance and better energy efficiency to companies and data centers. This collaboration produced its initial results in 2014 with the announcement of the first IBM POWER8 system featuring NVIDIA Tesla GPUs (IBM Power Systems™ S824L).
  • Openpower AI CERN V1.Pdf
    [Slide-deck extract; the diagrams are not recoverable. Surviving points: POWER8 offers up to 12 cores and up to 96 threads per socket, L1/L2/L3 plus L4 caches, up to 1 TB of memory per socket, and up to 230 GB/s sustained memory bandwidth. NVLink connects each POWER8 CPU to NVIDIA P100 GPUs at 80 GB/s, and system memory at 115 GB/s per CPU, versus 16 GB/s for a conventional PCIe CPU-GPU link, which the slides label the system bottleneck; for ImageNet/AlexNet with minibatch size 128, the POWER8 + Tesla P100 + NVLink system is quoted at 78 ms versus 170 ms for an x86-based GPU system. The Coherent Accelerator Processor Interface (CAPI) attaches FPGAs, networking, and memory over PCIe through the CAPP unit, replacing the typical I/O flow (copy or pin, MMIO notify, poll/interrupt, copy or unpin, return from device driver) with a shared-memory notify/completion flow. A roadmap slide spans POWER6 (2007) through POWER7, POWER8, and POWER9 to POWER10 (2020+). Source: https://www.ibm.com/blogs/systems/power-systems-openpower-enable-acceleration/]
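The practical point of the recovered bandwidth figures is data-movement time. A back-of-the-envelope comparison, using the slides' 80 GB/s NVLink and 16 GB/s PCIe numbers and a made-up 32 GB payload chosen purely for illustration:

```python
# Rough CPU-to-GPU transfer times from the bandwidths quoted on the
# slides: NVLink at 80 GB/s vs. a conventional PCIe path at 16 GB/s.
# The 32 GB payload is an invented figure for illustration only.
payload_gb = 32

nvlink_s = payload_gb / 80  # seconds over NVLink
pcie_s = payload_gb / 16    # seconds over PCIe

print(f"NVLink: {nvlink_s:.2f} s, PCIe: {pcie_s:.2f} s "
      f"({pcie_s / nvlink_s:.0f}x longer over PCIe)")
```

With these peak figures the same payload takes five times longer over PCIe, which is the "system bottleneck" the slides call out.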
  • IBM Power Systems Performance Report Apr 13, 2021
    IBM Power Performance Report, Power7 to Power10. September 8, 2021.

    Table of Contents: Introduction to Performance of IBM UNIX, IBM i, and Linux Operating System Servers; Section 1 – SPEC® CPU Benchmark Performance (1a – Linux Multi-user SPEC® CPU2017 Performance (Power10); 1b – Linux Multi-user SPEC® CPU2017 Performance (Power9); 1c – AIX Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8); 1d – Linux Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8)); Section 2 – AIX Multi-user Performance (rPerf) (2a – Power8, Power9 and Power10; 2b – Power9 in Non-default Processor Power Mode Setting; 2c – Power7 and Power7+; 2d – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power8); 2e – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power7 and Power7+)); Section 3 – CPW Benchmark Performance (3a – Power8, Power9 and Power10; 3b – Power7 and Power7+); Section 4 – SPECjbb®2015 Benchmark Performance (4a – Power9; 4b – Power8); Section 5 – AIX SAP® Standard Application Benchmark Performance (5a – SAP® Sales and Distribution (SD) 2-Tier – AIX (Power7 to Power8); 5b – SAP® Sales and Distribution (SD) 2-Tier – Linux on Power (Power7 to Power7+))
  • Probabilistic Study of End-To-End Constraints in Real-Time Systems Cristian Maxim
    Probabilistic study of end-to-end constraints in real-time systems. Cristian Maxim. Systems and Control [cs.SY]. Université Pierre et Marie Curie - Paris VI, 2017. English. NNT: 2017PA066479. HAL Id: tel-02003251, https://tel.archives-ouvertes.fr/tel-02003251, submitted on 1 Feb 2019.

    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

    Doctoral thesis of the Université Pierre et Marie Curie, speciality Computer Science, doctoral school Informatique, Télécommunications et Électronique (Paris). Presented by Cristian MAXIM to obtain the degree of Doctor of the Université Pierre et Marie Curie. Thesis subject: probabilistic study of end-to-end constraints in real-time systems. Defended on 11 December 2017 before a jury composed of: Mme. Liliana CUCU-GROSJEAN (thesis director), M. Benoit TRIQUET (industrial coordinator), Mme. Christine ROCHANGE (reviewer), M. Thomas NOLTE (reviewer), Mme. Alix MUNIER (examiner), M. Victor JEGU (examiner), M. George LIMA (examiner), M. Sascha UHRIG (examiner).

    Contents: 1 Introduction; 1.1 General Introduction; 1.1.1 Embedded systems; 1.1.2 Real-time domain; 1.1.3 Avionics industry; 1.2 Context
  • Masahiro Sano (2010), "The IPS Value as a Performance Metric for a Single CPU Module", course material for the 2010 Technology Strategy course
    Masahiro Sano (2010), "The IPS Value as a Performance Metric for a Single CPU Module", course material for the 2010 Technology Strategy course. (Translated from the Japanese.)

    One index for measuring the performance of a single CPU module is how many instructions it can execute per second. The basic unit for this is IPS (Instructions Per Second). The IPS values of representative CPUs are listed below. Reading the table with attention to the distinction between CPUs for dedicated game consoles and CPUs for PCs, and, among game-console CPUs, between home consoles and handhelds, it is clear that CPU MIPS figures have, as one would expect, improved steadily year by year.

    Comparison of IPS values, in chronological order (instruction throughput in MIPS, clock in MHz):

    Processor | MIPS | MHz | Year
    Intel 4004 | 0.09 | 0.74 | 1971
    Intel 8080 | 0.64 | 2 | 1974
    Sega Mega Drive / Motorola 68000 | 1 | 8 | 1979
    Intel 80286 | 2.66 | 12 | 1982
    Motorola 68030 | 11 | 33 | 1987
    Intel 80386DX | 8.5 | 25 | 1988
    NEC PC-FX / NEC V810 | 16 | 25 | 1994
    Motorola 68040 | 44 | 40 | 1990
    Intel 80486DX | 54 | 66 | 1992
    Sega Saturn / Hitachi SH-2 | 25 | 28.6 | 1994
    Sony PS / MIPS R3000A | 30 | 34 | 1994
    Nintendo N64 / MIPS R4300 | 125 | 93.75 | 1996
    Intel Pentium Pro | 541 | 200 | 1996
    PowerPC G3 | 525 | 233 | 1997
    Sega Dreamcast / Hitachi SH-4 | 360 | 200 | 1998
    Intel Pentium III | 1,354 | 500 | 1999
    Sony PS2 (Emotion Engine) / MIPS IV (R5900) | 435 | 300 | 2000
    Nintendo GameCube (Gekko) / IBM PowerPC G3 (1) | 1,125 | 485 | 2001
    Microsoft Xbox / Intel Mobile Celeron (Pentium III) | 1,985 (estimated) | 733 | 2001
    Nintendo Game Boy Advance / ARM ARM7TDMI | 15 | 16.8 | 2001
    Pentium 4 Extreme Edition | 9,726 | 3,200 | 2003
    Nintendo DS / ARM ARM946E-S | 74.5 | 67 | 2004
    Microsoft Xbox 360 (Xenon) / IBM PowerPC G5 | 6,400 | 3,200 | 2005
    Nintendo Wii (Broadway) / IBM PowerPC G3 | 2,250 | 729 | 2006
    Sony PS3 (Cell PPE) / IBM PowerPC G5 | 10,240 | 3,200 | 2006
    Intel Core 2 Extreme QX6700 | 49,161 | 2,660 | 2006

    [Source] Except for values indicated in the notes at the end of the original document: "Instructions per second", Wikipedia, http://en.wikipedia.org/wiki/Instructions_per_second, (2008
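Because the table lists both MIPS and clock frequency in MHz, dividing the two gives the average instructions retired per clock cycle (IPC), which makes the architectural progress visible separately from clock-speed growth. A small sketch using a few rows from the table above (for the later multi-core entries the result is an aggregate across all cores, not per-core IPC):

```python
# Derive average instructions-per-cycle from the table's MIPS and MHz
# columns: IPC = MIPS / MHz (both are "millions per second" units, so
# the factors cancel). Values are copied from rows of the table above.
rows = {
    "Intel 4004": (0.09, 0.74),                  # (MIPS, MHz)
    "Intel 80486DX": (54, 66),
    "Intel Pentium III": (1354, 500),
    "Intel Core 2 Extreme QX6700": (49161, 2660),  # aggregate of 4 cores
}

for name, (mips, mhz) in rows.items():
    print(f"{name}: ~{mips / mhz:.2f} instructions per cycle")
```

The progression from well below one instruction per cycle (4004) to several per cycle per chip is the superscalar and multi-core story the table tells year by year.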
  • Computer Architectures an Overview
    Computer Architectures: An Overview.

    Contents: Microarchitecture; x86; PowerPC; IBM POWER; MIPS architecture; SPARC; ARM architecture; DEC Alpha; AlphaStation; AlphaServer; Very long instruction word; Instruction-level parallelism; Explicitly parallel instruction computing.

    Microarchitecture. In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design.

    Relation to instruction set architecture. The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers to complete arithmetic logic units (ALUs) and even larger elements.
  • Modern Processor Design: Fundamentals of Superscalar
    Fundamentals of Superscalar Processors. John Paul Shen, Intel Corporation; Mikko H. Lipasti, University of Wisconsin. Waveland Press, Inc., Long Grove, Illinois.

    To our parents, Paul and Sue Shen, Tarja and Simo Lipasti; our spouses, Amy C. Shen and Erica Ann Lipasti; our children, Priscilla S. Shen, Rachael S. Shen, and Valentia C. Shen, Emma Kristiina Lipasti and Elias Joel Lipasti.

    For information about this book, contact: Waveland Press, Inc., 4180 IL Route 83, Suite 101, Long Grove, IL 60047-9580, (847) 634-0081, info@waveland.com, www.waveland.com. Copyright © 2005 by John Paul Shen and Mikko H. Lipasti; 2013 reissued by Waveland Press, Inc. 10-digit ISBN 1-4786-0783-1, 13-digit ISBN 978-1-4786-0783-0. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without permission in writing from the publisher. Printed in the United States of America.

    Table of Contents: Preface; About the Authors; 1 Processor Design; 1.1 The Evolution of Microprocessors; 1.2 Instruction Set Processor Design; 1.2.1 Digital Systems Design; 1.2.2 Architecture, Implementation, and Realization; 1.2.3 Instruction Set Architecture; 1.2.4 Dynamic-Static Interface; 1.3 Principles of Processor Performance; 1.3.1 Processor Performance Equation; 1.3.2 Processor Performance Optimizations; 1.3.3 Performance Evaluation Method; 1.4 Instruction-Level Parallel Processing; 1.4.1 From Scalar to Superscalar; 1.4.2 Limits of Instruction-Level Parallelism; 1.4.3 Summary; 1.5 Machines for Instruction-Level
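Section 1.3.1 of the contents names the processor performance equation, often called the "iron law": execution time equals instruction count times cycles per instruction (CPI) divided by clock rate. A toy sketch with invented numbers (not figures from the book) shows why a wider-issue design can win even at a lower clock:

```python
# The processor performance equation ("iron law"):
#   time = instruction_count * CPI / clock_rate
# The two designs below use made-up numbers for illustration only.

def cpu_time(instructions, cpi, clock_hz):
    """Execution time in seconds for a program of `instructions`."""
    return instructions * cpi / clock_hz

scalar = cpu_time(1_000_000_000, 2.0, 2.0e9)       # 2.0 CPI at 2 GHz
superscalar = cpu_time(1_000_000_000, 0.5, 1.5e9)  # 0.5 CPI at 1.5 GHz

print(f"scalar: {scalar:.2f} s, superscalar: {superscalar:.2f} s, "
      f"speedup {scalar / superscalar:.1f}x")
```

Here the superscalar design's 4x CPI advantage outweighs its 25% clock deficit, the kind of trade-off the book's chapter 1 sets up.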
  • Copy-Back Cache Organisation for An
    COPY-BACK CACHE ORGANISATION FOR AN ASYNCHRONOUS MICROPROCESSOR. A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science & Engineering, 2002. DARANEE HORMDEE, Department of Computer Science.

    Contents: Contents; List of Figures; List of Tables; Abstract; Declaration; Copyright; The Author; Acknowledgements; Dedication; Chapter 1: Introduction; 1.1 Thesis organisation
  • POWER8® Processor-Based Systems RAS Introduction to Power Systems™ Reliability, Availability, and Serviceability
    POWER8® Processor-Based Systems RAS: Introduction to Power Systems™ Reliability, Availability, and Serviceability. March 17th, 2016. IBM Server and Technology Group. Daniel Henderson, Senior Technical Staff Member, RAS.

    Information in this document is intended to be generally descriptive of a family of processors and servers. It is not intended to describe particular offerings of any individual servers. Not all features are available in all countries. Information contained herein is subject to change at any time without notice.

    Trademarks, Copyrights, Notices and Acknowledgements. IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml. The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: Active Memory™, AIX®, POWER®, POWER Hypervisor™, Power Systems™, Power Systems Software™, POWER6®, POWER7®, POWER7+™, POWER8™, PowerHA®, PowerLinux™, PowerVM®, System x®, System z®. Additional trademarks may be identified in the body of this document. The following terms are trademarks of other companies: Intel, Intel Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
  • Lawrence Berkeley National Laboratory Recent Work
    Lawrence Berkeley National Laboratory Recent Work. Title: Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms. Permalink: https://escholarship.org/uc/item/48w9z3w5. Journal: International Journal of High Performance Computing Applications, 27(4). ISSN 1094-3420. Authors: Ibrahim, KZ; Madduri, K; Williams, S; et al. Publication date: 2013-11-01. DOI: 10.1177/1094342013492446. Peer reviewed. eScholarship.org — powered by the California Digital Library, University of California.

    Analysis and Optimization of Gyrokinetic Toroidal Simulations on Homogenous and Heterogenous Platforms. Khaled Z. Ibrahim (1), Kamesh Madduri (2), Samuel Williams (1), Bei Wang (3), Stephane Ethier (4), Leonid Oliker (1). (1) CRD, Lawrence Berkeley National Laboratory, Berkeley, USA; (2) The Pennsylvania State University, University Park, PA, USA; (3) Princeton Institute of Computational Science and Engineering, Princeton University, Princeton, NJ, USA; (4) Princeton Plasma Physics Laboratory, Princeton, NJ, USA. {KZIbrahim, SWWilliams, LOliker} [email protected], [email protected], [email protected], [email protected].

    Abstract—The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This work presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems, and scales efficiently to tens of thousands of cores.

    1 Introduction. The continued scaling and increasing efficiency of high performance machines are exacerbating the complexity and diversity of computing node architectures.
  • Organization of the Motorola 88110 Superscalar RISC Microprocessor
    Organization of the Motorola 88110 Superscalar RISC Microprocessor. Keith Diefendorff and Michael Allen, Motorola.

    Motorola's second-generation RISC microprocessor employs advanced techniques for exploiting instruction-level parallelism, including superscalar instruction issue, out-of-order instruction completion, speculative execution, dynamic instruction rescheduling, and two parallel, high-bandwidth, on-chip caches. Designed to serve as the central processor in low-cost personal computers and workstations, the 88110 supports demanding graphics and digital signal processing applications.

    Motorola designers conceived of the 88000 RISC (reduced instruction-set computer) architecture to simplify construction of microprocessors capable of exploiting high degrees of instruction-level parallelism without sacrificing clock speed. The designers held architectural complexity to a minimum to eliminate pipeline bottlenecks and remove limitations on concurrent instruction execution. The 88100/200 is the first implementation of the 88000 architecture. It is a three-chip set, requiring one CPU (88100) chip and two (or more) cache (88200) chips. The CPU's simple scalar...

    ...puters and workstation systems. Thus, our design objective was good performance at a given cost, rather than ultimate performance at any cost. We recognized that the personal computer environment is moving toward highly interactive software, user-oriented interfaces, voice and image processing, and advanced graphics and video, all of which would place extremely high demands on integer, floating-point, and graphics processing capabilities. At the same time, we realized the 88110 would have to meet these performance demands while operating with the inexpensive DRAM systems typically found in low-cost personal computers.