Robustness Evaluation of Operating Systems

Total Pages: 16

File Type: PDF, Size: 1020 KB

Robustness Evaluation of Operating Systems

Dissertation approved by the Department of Computer Science of Technische Universität Darmstadt in fulfillment of the requirements for the degree of Doktor-Ingenieur (Dr.-Ing.), submitted by Andréas Johansson of Falkenberg, Sweden. Referees: Prof. Neeraj Suri, Ph.D. and Prof. Christof Fetzer, Ph.D. Date of submission: 19.11.2007. Date of oral examination: 14.01.2008. Darmstadt 2008, D17.

Summary

The premise behind this thesis is the observation that Operating Systems (OS), being the foundation behind operations of computing systems, are complex entities and also subject to failures. Consequently, when they do fail, the impact is the loss of system services and of the applications running thereon. While a multitude of sources for OS failures exist, device drivers are often identified as a prominent cause. In order to characterize the impact of driver failures, at both the OS and application levels, this thesis develops a framework for error-propagation-based robustness profiling of an OS. The framework is first developed conceptually and then experimentally validated on a real OS, namely Windows CE .Net. The choice of Windows CE is driven by its representativeness for a multitude of OSs, as well as the ability to customize the OS components for particular needs.

For experimental validation, fault injection is a prominent technique: faults (or errors) are simulated by inserting synthetic ones into the system and studying their effects. Three key questions with such a technique are where, what and when to inject. This thesis shows how injecting errors at the interface between drivers and the OS can be very effective in evaluating the effects driver faults can have.

To quantify the OS's robustness, this thesis defines a series of error propagation measures, specifically tailored for device drivers. These measures allow for quantifying and comparing both individual services and device drivers on their susceptibility and diffusing abilities.

This thesis compares three contemporary error models on their suitability for robustness evaluation. The classical bit-flip model is found to identify a higher number of severe failures in the system. It also identifies failures for more services than both other models, data type and fuzzing. However, its main drawback is that it requires substantially more injections than the other two. Fuzzing, even though it does not give rise to as many failures, is able to find additional services with severe failures. A careful study of the injections performed with the bit-flip model shows that only a few bits are generally useful for identifying new services with robustness weaknesses. Consequently, a new composite model is proposed, combining the most effective bits of the bit-flip model with the fuzzing model's ability to identify new services; the composite model loses no important information while incurring only a moderate number of injections.

To answer the question of when to inject an error, this thesis proposes a novel model of a driver's usage profile, focusing on the high-level operations being carried out. It guides the injection of errors to instances when the driver is carrying out specific operations. Results from extensive fault injection experiments show that more service vulnerabilities can be discovered this way. Furthermore, a priori profiling of the drivers can show how effective the proposed approach will be.
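To make the three error models concrete, the following minimal C sketch shows how each could perturb a single 32-bit parameter intercepted at the OS-driver interface. The function and type names are illustrative assumptions for this page, not the thesis's actual injection framework.

    #include <stdint.h>
    #include <stdlib.h>

    typedef enum { BIT_FLIP, DATA_TYPE, FUZZING } error_model_t;

    /* Corrupt one 32-bit parameter value according to the chosen error model. */
    uint32_t inject(uint32_t value, error_model_t model, unsigned bit)
    {
        switch (model) {
        case BIT_FLIP:
            /* Flip one chosen bit; covering one parameter takes 32 injections. */
            return value ^ (UINT32_C(1) << bit);
        case DATA_TYPE:
            /* Substitute a boundary value of the parameter's data type
             * (here the maximum; 0 and 1 are other typical choices). */
            return UINT32_MAX;
        case FUZZING:
            /* Substitute a random value; cheap, and able to hit new services. */
            return (uint32_t)rand();
        }
        return value;
    }

Under this sketch, the proposed composite model would correspond to running BIT_FLIP only for the few bits found to be effective, complemented by a batch of FUZZING injections.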
Kurzfassung

This dissertation rests on the observation that the operating system, which forms the foundation for the operation of computing systems, is a highly complex structure, and this complexity can frequently lead to faults within the operating system. When these OS-internal faults cause service failures, the applications running on top of the operating system are endangered as well. Even though many fault sources exist in general, faulty drivers are often cited as the most frequent cause.

To characterize the effects of driver defects at the operating-system and application levels, this dissertation develops a framework for robustness evaluation based on error propagation. The framework is both developed conceptually and validated experimentally on a real operating system. The chosen operating system, Windows CE .Net, is representative of many other operating systems. It has a modular design, which considerably simplifies adapting operating-system components to different needs.

Fault injection is an important technique for experimental validation: faults are simulated by injecting them into the system and observing their consequences. Three important aspects to consider here are which faults to inject, where, and when. This dissertation shows that fault injection at the interface between the operating system and the drivers is an effective way to assess the consequences of driver faults.

To quantify the robustness of an operating system, a series of error propagation metrics is defined, tailored specifically to driver faults. Using these metrics, services and drivers can be compared with respect to their susceptibility and propagation capability.

This dissertation compares three contemporary error models with respect to their suitability for robustness evaluation. The classical bit-flip model detects severe system failures most frequently. It also identifies more services that could lead to failures than the two other models, data type and fuzzing. Its biggest drawback, however, is that it requires very many injections. Fuzzing identifies fewer services, but among them new failure-prone services not detected by bit-flip. A careful examination of the bit-flip model's results shows that a subset of the bits already suffices to identify new services that lead to robustness failures. A new, composite model is therefore proposed, combining the good properties of the bit-flip model with the fuzzing model's ability to identify new services. The new model loses no important information and requires considerably fewer injections overall.

To answer the question of when it is useful to inject faults, a new timing model based on the driver's usage profile is proposed. The new model is based on the execution of operations in a higher layer. Specific fault injections are performed at the moment particular operations are carried out. The results of the fault injection experiments show that many times more failure-prone services can be found.
Moreover, the driver's usage profile provides advance insight into how effective the new method will be.

Acknowledgements

The path to a Ph.D. is a long, winding one. At the beginning it is wide and spacious, only to become narrower and narrower. Sometimes it goes steeply upwards, sometimes downwards. Sometimes you think you see an opening and light after the next turn, only to find a dead end. There are crossings, where one has to choose which path to pursue. Some paths look more promising than others, but you quickly learn that the easy path is not always the shortest. I am now at the end of the path, only to realize that it is the beginning of a new one.

I would not have gotten here without the assistance and encouragement of several people. First of all I would like to thank my guide and mentor, Prof. Neeraj Suri. Thanks for your guidance and support. I would also like to thank all present and former members of the DEEDS group: Martin, Vilgot, Örjan, Arshad, Robert, Adina, Dinu, Marco, Dan, Ripon, Brahim, Faisal, Majid and Matthias. Many thanks also to Birgit, Ute, Sabine and Boyan. A great many thanks also to Prof. Christof Fetzer for accepting to be my opponent.

I am also grateful for the funding of my research, which came from the projects EC IP DECOS and EC NoE ReSIST, and from research grants from Microsoft Research. A part of the research validation was accomplished during an internship at Microsoft Research, Cambridge. A special thanks to Brendan Murphy at MSR for hosting my internship and for all the discussions and help writing papers.

Finally, to my beautiful and supportive wife, Mia. Thank you for everything!

Contents

1 Introduction
  1.1 Dependability: The Basic Concepts
    1.1.1 Dependability Attributes
    1.1.2 Dependability Threats
    1.1.3 Dependability Means
    1.1.4 Alternate Terminology: Software Engineering
    1.1.5 Bohrbugs and Heisenbugs
  1.2 Robustness Evaluation
  1.3 Thesis Research Questions & Contributions
  1.4 Thesis Structure
2 Background and Context
  2.1 A Short Operating System History
    2.1.1 OS Design
    2.1.2 Device Drivers
    2.1.3 What is the Problem?
  2.2 Sources of Failures of Operating Systems
    2.2.1 Hardware Related
    2.2.2 Software Related
    2.2.3 User Related
    2.2.4 Device Drivers
  2.3 Fault Injection
Recommended publications
  • EEMBC and the Purposes of Embedded Processor Benchmarking (Markus Levy, President)
    EEMBC and the Purposes of Embedded Processor Benchmarking. Markus Levy, President. ISPASS 2005. Certified Performance Analysis for Embedded Systems Designers.

    EEMBC: A Historical Perspective
    • Began as an EDN Magazine project in April 1997
    • Goals: replace Dhrystone; provide a meaningful measure for explaining processor behavior
    • Developed a business model and invited worldwide processor vendors
    • A consortium was born

    EEMBC Membership
    • Board Member: membership dues $30,000 (1st year), $16,000 (subsequent years); access to and participation on all subcommittees; full voting rights
    • Subcommittee Member: membership dues are subcommittee specific; access to specific benchmarks; technical involvement within the subcommittee; help determine next-generation benchmarks
    • Special academic membership

    EEMBC Philosophy: Standardized Benchmarks and Certified Scores
    • Member-derived benchmarks: members determine the standard, the process, and the benchmarks; open to industry feedback; ensures all processor/compiler vendors are running the same tests
    • Certification process ensures credibility: all benchmark scores are officially validated before publication; the entire benchmark environment must be disclosed

    Embedded Industry: Represented by Very Diverse Applications
    • Networking: storage, low- and high-end routers, switches
    • Consumer: games, set-top boxes, car navigation, smartcards
    • Wireless: cellular, routers
    • Office Automation: printers, copiers, imaging
    • Automotive: engine control, telematics

    Traditional Division of Embedded Applications (slide figure: applications arranged along a low-to-high power axis)
  • Cloud WorkBench: A Web-Based Framework for Benchmarking Cloud Services
    Bachelor thesis, August 12, 2014. Cloud WorkBench: A Web-Based Framework for Benchmarking Cloud Services. Joel Scheuner of Muensterlingen, Switzerland (10-741-494). Supervised by Prof. Dr. Harald C. Gall, Philipp Leitner, Jürgen Cito. Software Evolution & Architecture Lab, Department of Informatics, University of Zurich. Author: Joel Scheuner, [email protected]. Project period: 04.03.2014 - 14.08.2014.

    Acknowledgements: This bachelor thesis constitutes the last milestone on the way to my first academic graduation. I would like to thank all the people who accompanied and supported me in the last four years of study and work in industry, paving the way to this thesis. My personal thanks go to my parents, who supported me in many ways. The decision to choose a complex topic in an area where I had no personal experience in advance, neither theoretically nor technologically, made the past four and a half months challenging, demanding and work-intensive, but also very educational, which is what remains in good memory afterwards. Regarding this thesis, I would like to offer special thanks to Philipp Leitner, who assisted me during the whole process and gave valuable advice. Moreover, I want to thank Jürgen Cito for his mainly technology-related recommendations, Rita Erne for her time spent with me on language review sessions, and Prof. Harald Gall for giving me the opportunity to write this thesis at the Software Evolution & Architecture Lab at the University of Zurich and for providing funding and infrastructure to realize this thesis.
  • Automatic Benchmark Profiling Through Advanced Trace Analysis (Alexis Martin, Vania Marangozova-Martin)
    Automatic Benchmark Profiling through Advanced Trace Analysis. Alexis Martin, Vania Marangozova-Martin. [Research Report] RR-8889, Inria - Research Centre Grenoble – Rhône-Alpes; Université Grenoble Alpes; CNRS. 2016. HAL Id: hal-01292618, https://hal.inria.fr/hal-01292618, submitted on 24 Mar 2016. Project-Team Polaris. Research Report N° 8889, March 23, 2016, 15 pages.

    Abstract: Benchmarking has proven to be crucial for the investigation of the behavior and performances of a system. However, the choice of relevant benchmarks still remains a challenge. To help the process of comparing and choosing among benchmarks, we propose a solution for automatic benchmark profiling. It computes unified benchmark profiles reflecting benchmarks' duration, function repartition, stability, CPU efficiency, parallelization and memory usage.
  • Introduction to Digital Signal Processors
    Introduction to Digital Signal Processors. Prof. Brian L. Evans, in collaboration with Niranjan Damera-Venkata and Magesh Valliappan. Embedded Signal Processing Laboratory, The University of Texas at Austin, Austin, TX 78712-1084. http://anchovy.ece.utexas.edu/ (Slide sidebar: accumulator, memory-register, and load-store architectures.)

    Outline
    • Signal processing applications
    • Conventional DSP architecture
    • Pipelining in DSP processors
    • RISC vs. DSP processor architectures
    • TI TMS320C6x VLIW DSP architecture
    • Signal and image processing applications
    • Signal processing on general-purpose processors
    • Conclusion

    Signal Processing Applications
    • Low-cost embedded systems: modems, cellular telephones, disk drives, printers
    • High-throughput applications: halftoning, radar, high-resolution sonar, tomography
    • PC-based multimedia: compression/decompression of audio, graphics, video
    • Embedded processor requirements: inexpensive with small area and volume; deterministic interrupt service routine latency; low power: ~50 mW (TMS320C5402/20: 0.32 mA/MIP)

    Conventional DSP Architecture
    • High data throughput: Harvard architecture (separate data memory/bus and program memory/bus; three reads and one or two writes per instruction cycle)
    • Short, deterministic interrupt service routine latency
    • Multiply-accumulate (MAC) in a single instruction cycle
    • Special addressing modes supported in hardware: modulo addressing for circular buffers (e.g. FIR filters); bit-reversed addressing (e.g. fast Fourier transforms)
    • Instructions to keep the
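    As a concrete illustration of the modulo-addressing point above, here is a minimal C sketch of a FIR filter over a circular delay line; the explicit index wrap-around below is what a DSP's modulo-addressing mode performs for free in hardware. The struct and function names are illustrative assumptions.

        #include <stddef.h>

        #define TAPS 8

        /* Circular-buffer FIR filter: a software sketch of what DSP modulo
         * addressing does in hardware. On a DSP the index wrap-around costs
         * nothing; here it is the explicit "% TAPS" step. */
        typedef struct {
            float delay[TAPS];   /* circular delay line */
            size_t head;         /* index of the newest sample */
        } fir_t;

        float fir_step(fir_t *f, const float coeff[TAPS], float x)
        {
            /* Store the new sample, wrapping the index modulo the buffer size. */
            f->head = (f->head + 1) % TAPS;
            f->delay[f->head] = x;

            /* Multiply-accumulate over the delay line, newest to oldest.
             * A DSP issues one MAC per cycle with modulo addressing built in. */
            float acc = 0.0f;
            for (size_t k = 0; k < TAPS; k++)
                acc += coeff[k] * f->delay[(f->head + TAPS - k) % TAPS];
            return acc;
        }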
  • The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor
    The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor. Christopher Celio, David A. Patterson, Krste Asanović. Electrical Engineering and Computer Sciences, University of California at Berkeley. Technical Report No. UCB/EECS-2015-167, http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-167.html, June 13, 2015. Copyright © 2015, by the author(s). All rights reserved.

    BOOM is a work-in-progress. Results shown are preliminary and subject to change as of June 2015. (Figure: core block diagram showing the L1 instruction cache, 32 KiB L1 data cache, register file, issue and execute units, and a shared 256 KiB L2 cache.)

    1. The Berkeley Out-of-Order Machine. BOOM is a synthesizable, parameterized, superscalar out-of-order RISC-V core designed to serve as the prototypical baseline processor for future micro-architectural studies of out-of-order processors. Our goal is to provide a readable, open-source implementation for use in education, research, and industry. BOOM is written in roughly 9,000 lines of the hardware construction language Chisel.
  • MiBench: A Free, Commercially Representative Embedded Benchmark Suite
    MiBench: A free, commercially representative embedded benchmark suite. Matthew R. Guthaus, Jeffrey S. Ringenberg, Dan Ernst, Todd M. Austin, Trevor Mudge, Richard B. Brown. {mguthaus,jringenb,ernstd,taustin,tnm,brown}@eecs.umich.edu. The University of Michigan, Electrical Engineering and Computer Science, 1301 Beal Ave., Ann Arbor, MI 48109-2122.

    Abstract: This paper examines a set of commercially representative embedded programs and compares them to an existing benchmark suite, SPEC2000. A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors. Several characteristics distinguish the representative embedded programs from the existing SPEC benchmarks including instruction distribution, memory behavior, and available parallelism. The embedded benchmarks, called MiBench, are freely available to all researchers.

    1. Introduction. Performance based design has made benchmarking a critical part of the design process [1]. Among the common features of these microarchitectures are deep pipelines, significant instruction level parallelism, sophisticated branch prediction schemes, and large caches. Although this class of machines has been the chief focus of the computer architecture community, relatively few microprocessors are employed in this market segment. The vast majority of microprocessors are employed in embedded applications [8]. Although many are just inexpensive microcontrollers, their combined sales account for nearly half of all microprocessor revenue. Furthermore, the embedded application domain is the fastest growing market segment in the microprocessor industry. The wide range of applications makes it difficult to characterize the embedded domain. In fact, an embedded benchmark suite should reflect this by
  • Low-Power High Performance Computing
    Low-Power High Performance Computing. Panagiotis Kritikakos. August 16, 2011. MSc in High Performance Computing, The University of Edinburgh. Year of Presentation: 2011.

    Abstract: The emerging development of computer systems to be used for HPC requires a change in processor architecture. New design approaches and technologies need to be embraced by the HPC community to make new approaches in system design viable for Exascale supercomputers within the next two decades, as well as to reduce the CO2 emissions of supercomputers and scientific clusters, leading to greener computing. Power is listed as one of the most important issues and constraints for future Exascale systems. In this project we build a hybrid cluster, investigating, measuring and evaluating the performance of low-power CPUs, such as Intel Atom and ARM (Marvell 88F6281), against a commodity Intel Xeon CPU that can be found within standard HPC and data-center clusters. Three main factors are considered: computational performance and efficiency, power efficiency, and porting effort.

    Contents: 1 Introduction (1.1 Report organisation); 2 Background (2.1 RISC versus CISC; 2.2 HPC Architectures: 2.2.1 System architectures, 2.2.2 Memory architectures; 2.3 Power issues in modern HPC systems; 2.4 Energy and application efficiency); 3 Literature review (3.1 Green500; 3.2 Supercomputing in Small Spaces (SSS); 3.3 The AppleTV Cluster; 3.4 Sony Playstation 3 Cluster; 3.5 Microsoft XBox Cluster; 3.6 IBM BlueGene/Q)
  • Exploring CoreMark™ – A Benchmark Maximizing Simplicity and Efficacy, by Shay Gal-On and Markus Levy
    Exploring CoreMark™ – A Benchmark Maximizing Simplicity and Efficacy By Shay Gal-On and Markus Levy There have been many attempts to provide a single number that can totally quantify the ability of a CPU. Be it MHz, MOPS, MFLOPS - all are simple to derive but misleading when looking at actual performance potential. Dhrystone was the first attempt to tie a performance indicator, namely DMIPS, to execution of real code - a good attempt, which has long served the industry, but is no longer meaningful. BogoMIPS attempts to measure how fast a CPU can “do nothing”, for what that is worth. The need still exists for a simple and standardized benchmark that provides meaningful information about the CPU core - introducing CoreMark, available for free download from www.coremark.org. CoreMark ties a performance indicator to execution of simple code, but rather than being entirely arbitrary and synthetic, the code for the benchmark uses basic data structures and algorithms that are common in practically any application. Furthermore, EEMBC carefully chose the CoreMark implementation such that all computations are driven by run-time provided values to prevent code elimination during compile time optimization. CoreMark also sets specific rules about how to run the code and report results, thereby eliminating inconsistencies. CoreMark Composition To appreciate the value of CoreMark, it’s worthwhile to dissect its composition, which in general is comprised of lists, strings, and arrays (matrixes to be exact). Lists commonly exercise pointers and are also characterized by non-serial memory access patterns. In terms of testing the core of a CPU, list processing predominantly tests how fast data can be used to scan through the list.
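    The excerpt's point about run-time provided values deserves a concrete illustration. The following C sketch (not CoreMark's actual code, just the principle it describes) seeds all computation from the command line, so an optimizing compiler cannot fold the benchmark's work into a compile-time constant; the loop bound and CRC polynomial are illustrative choices.

        #include <stdio.h>
        #include <stdlib.h>

        /* One step of a CRC-16 (0xA001 polynomial), used both as workload
         * and as a validity check on the result, as CoreMark-style
         * benchmarks do. */
        static unsigned crc16(unsigned crc, unsigned char byte)
        {
            crc ^= byte;
            for (int i = 0; i < 8; i++)
                crc = (crc >> 1) ^ ((crc & 1) ? 0xA001u : 0);
            return crc;
        }

        int main(int argc, char **argv)
        {
            /* The seed arrives at run time, so the loop below cannot be
             * evaluated during compilation and then eliminated. */
            unsigned seed = (argc > 1) ? (unsigned)atoi(argv[1]) : 1u;
            unsigned crc = 0;
            for (unsigned i = 0; i < 100000u; i++)
                crc = crc16(crc, (unsigned char)(seed + i));

            /* Printing the CRC keeps the work live and validates the run. */
            printf("crc = 0x%04x\n", crc);
            return 0;
        }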
  • CPU and FFT Benchmarks of ARM Processors
    Proceedings of SAIP2014. Affordable and power efficient computing for high energy physics: CPU and FFT benchmarks of ARM processors. Mitchell A Cox, Robert Reed and Bruce Mellado. School of Physics, University of the Witwatersrand, 1 Jan Smuts Avenue, Braamfontein, Johannesburg, South Africa, 2000. E-mail: [email protected]

    Abstract. Projects such as the Large Hadron Collider at CERN generate enormous amounts of raw data which presents a serious computing challenge. After planned upgrades in 2022, the data output from the ATLAS Tile Calorimeter will increase by 200 times to over 40 Tb/s. ARM System on Chips are common in mobile devices due to their low cost, low energy consumption and high performance and may be an affordable alternative to standard x86 based servers where massive parallelism is required. High Performance Linpack and CoreMark benchmark applications are used to test ARM Cortex-A7, A9 and A15 System on Chips CPU performance while their power consumption is measured. In addition to synthetic benchmarking, the FFTW library is used to test the single precision Fast Fourier Transform (FFT) performance of the ARM processors and the results obtained are converted to theoretical data throughputs for a range of FFT lengths. These results can be used to assist with specifying ARM rather than x86-based compute farms for upgrades and upcoming scientific projects.

    1. Introduction. Projects such as the Large Hadron Collider (LHC) generate enormous amounts of raw data which presents a serious computing challenge. After planned upgrades in 2022, the data output from the ATLAS Tile Calorimeter will increase by 200 times to over 41 Tb/s (Terabits/s) [1].
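    One plausible way to convert a measured FFT time into a theoretical data throughput, as the abstract describes, is bytes processed per transform divided by seconds per transform. The C sketch below shows that arithmetic; the formula, the sample size, and all timings are assumptions for illustration, not the paper's measured results.

        #include <stdio.h>

        int main(void)
        {
            const double bytes_per_sample = 8.0;  /* single-precision complex: 2 x 4 B */

            /* Example inputs: FFT lengths and made-up times per transform. */
            const unsigned lengths[] = { 1024, 4096, 16384 };
            const double seconds[] = { 18e-6, 85e-6, 400e-6 };

            for (int i = 0; i < 3; i++) {
                double bytes = lengths[i] * bytes_per_sample;
                /* Convert bytes/transform and s/transform to gigabits per second. */
                double gbps = bytes * 8.0 / seconds[i] / 1e9;
                printf("N = %5u: %.2f Gb/s\n", lengths[i], gbps);
            }
            return 0;
        }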
  • DSP Benchmark Results of the GR740 Rad-Hard Quad-Core LEON4FT
    DSP Benchmark Results of the GR740 Rad-Hard Quad-Core LEON4FT. Cobham Gaisler, June 16, 2016. Presenter: Javier Jalle. ESA DSP DAY 2016.

    Overview: GR740 high-level description
    • GR740 is a new general-purpose processor component for space, developed by Cobham Gaisler with partners on the STMicroelectronics C65SPACE 65nm technology platform; development of the GR740 has been supported by ESA.
    • Newest addition to the existing Cobham LEON product portfolio (GR712, UT699, UT700). The GR740 works with the Cobham Gaisler ecosystem: GRMON2, OS/compilers, etc.
    • Higher computing performance and performance/watt ratio than earlier-generation products, through process improvements as well as architectural improvements.
    • Current work is under the ESA contract "NGMP Phase 2: Development of Engineering Models and radiation testing".
    • Development boards and prototype parts are already available for purchase (contact: [email protected]).
    (Slides: architecture block diagram, full and simplified.)

    Features summary: Core components
    • 4 x LEON4 fault-tolerant CPUs, each with 16 KiB L1 instruction cache, 16 KiB L1 data cache, Memory Management Unit (MMU), IEEE-754 Floating Point Unit (FPU), and integer multiply and divide unit.
    • 2 MiB Level-2 cache, shared between the 4 LEON4 cores.

    Features summary: Floating point unit
    • Each LEON4FT core comprises a high-performance FPU, as defined in the IEEE-754 and SPARC V8 (IEEE-1754) standards.
  • Scalable Vector Media-Processors for Embedded Systems
    Scalable Vector Media-processors for Embedded Systems. Christoforos Kozyrakis. Report No. UCB/CSD-02-1183, May 2002. Computer Science Division (EECS), University of California, Berkeley, California 94720. Grad. (University of Crete, Greece) 1996; M.S. (University of California at Berkeley) 1999. A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California at Berkeley. Committee in charge: Professor David A. Patterson, Chair; Professor Katherine Yelick; Professor John Chuang. Spring 2002.

    Abstract: Over the past twenty years, processor designers have concentrated on superscalar and VLIW architectures that exploit the instruction-level parallelism (ILP) available in engineering applications for workstation systems. Recently, however, the focus in computing has shifted from engineering to multimedia applications and from workstations to embedded systems. In this new computing environment, the performance, energy consumption, and development cost of ILP processors render them ineffective despite their theoretical generality. This thesis focuses on the development of efficient architectures for embedded multimedia systems. We argue that it is possible to design processors that deliver high performance, have low energy consumption, and are simple to implement. The basis for the argument is the ability of vector architectures to exploit efficiently the data-level parallelism in multimedia applications.
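    The abstract's key claim, that multimedia code exposes data-level parallelism which vector hardware exploits, can be seen in a toy kernel like the following C sketch; the function name and the 50/50 pixel blend are illustrative assumptions, not taken from the thesis.

        #include <stddef.h>
        #include <stdint.h>

        /* Blend two 8-bit pixel buffers into a third. Every iteration is
         * independent of the others, so a vector (SIMD) architecture can
         * process many pixels per instruction: that independence is the
         * data-level parallelism the abstract refers to. */
        void blend_pixels(uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t n)
        {
            for (size_t i = 0; i < n; i++) {
                /* Rounded average of two pixels, computed in a wider type
                 * to avoid 8-bit overflow. */
                dst[i] = (uint8_t)(((unsigned)a[i] + (unsigned)b[i] + 1) >> 1);
            }
        }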