UNICORE OPTIMIZATION William Jalby


William Jalby
LRC ITACA (CEA DAM / University of Versailles St-Quentin-en-Yvelines), FRANCE

Outline

- The stage
- Key unicore performance limitations (excluding caches)
- Multimedia extensions
- Compiler optimizations

Abstraction Layers in Modern Systems

  Application
  Algorithm/Libraries
  Programming Language
  Compilers/Interpreters
  Operating System/Virtual Machines
  Instruction Set Architecture (ISA)    <- original domain of the computer architect ('50s-'80s)
  Microarchitecture                     <- domain of recent computer architecture ('90s)
  Gates/Register-Transfer Level (RTL)
  Circuits
  Devices
  Physics

The upper layers belong to CS, the lower layers to EE.

Key issue: understand the relationship/interaction between architecture/microarchitecture and applications/algorithms. We have to take the intermediate layers into account, and don't forget the lowest layers either. KEY TECHNOLOGY: performance measurement and analysis.

Performance Measurement and Analysis

AN OVERLOOKED ISSUE.
- HARDWARE VIEW: mechanism descriptions and a few portions of code where each mechanism works well (a positive view).
- COMPILER VIEW: aggregate performance numbers (SPEC), with little correlation to the hardware.
The result is a lack of guidelines for writing efficient programs.

Uniprocessor Performance

(Figure from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006.)
- VAX: 25%/year, 1978 to 1986
- RISC + x86: 52%/year, 1986 to 2002
- RISC + x86: ??%/year, 2002 to present

(Figure: trends in unicore performance. REF: Mikko Lipasti, University of Wisconsin; source: Intel.)

Modern Unicore Stage

KEY PERFORMANCE INGREDIENTS
- ILP (Instruction Level Parallelism):
  - Pipeline: interleaves the execution of several instructions; degree of (partial) parallelism: 10 to 20
  - Superscalar: parallel execution of several instructions; degree of parallelism: 3 to 6
  - Out-of-order execution
- Specialized units: multimedia units; degree of parallelism: 2 to 4
- Compiler optimizations
- Large set of registers (really true only on Itanium)

ILP limitations

In the previous slide, wherever the term "degree of parallelism = k" is used, it means that the compiler has to find k independent instructions.

Key unicore performance limitations (excluding caches)

The compiler and the hardware are in charge of finding such ILP.
- An easy case: the basic block, a sequence of instructions executed in a continuous flow (no branches, no labels). Basic block sizes are a good indicator of easy ILP.
- A more complex case: in a loop, the hardware will try to overlap the execution of several iterations. In general this is not a big issue, BUT ...

Hard ILP limitation (1): data dependency

  DO I = 1, 100
    S = S + A(I)
  ENDDO

Iteration I+1 is strictly dependent upon iteration I: very limited ILP. In fact, the code computes a sum; using associativity, it is easy to rewrite it, for example by computing 4 partial sums:

  DO I = 1, 100, 4
    S1 = S1 + A(I)
    S2 = S2 + A(I+1)
    S3 = S3 + A(I+2)
    S4 = S4 + A(I+3)
  ENDDO
  S = S1 + S2 + S3 + S4

The code above has much more ILP, but unfortunately, since FP additions are not associative, accuracy may suffer. The associativity trick works on simple cases; it might be complex to use in general. Remark: MAX is an associative operator even in FP.

Hard ILP limitation (2): branches

CASE 1: subroutine calls. Easy to overcome: SOLUTION = inlining. At the call site, insert the subroutine's code.
- ADVANTAGES OF INLINING: increases basic block size; allows specialization (see later).
- DRAWBACK OF INLINING: severely increases code size.

CASE 2: conditional branches.

  DO I = 1, 100
    IF (A(I) > 0) THEN
      B(I) = C(I)      (statement S1(I))
    ELSE
      F(I) = G(I)      (statement S2(I))
    ENDIF
  ENDDO

Basically, you need to know the outcome of the test before proceeding with the execution of S1(I) or S2(I). SOLUTION: the hardware bets on the result of the test and executes one of the two outcomes of the branch ahead of time (branch prediction); it uses the past to predict the future. DRAWBACK: if the hardware has mispredicted, recovery might be costly. On the code above, we might think that the hardware has a 50/50 chance of mispredicting (which is true). In practice, branch predictors perform very well: they can reach 97% correct predictions (and even more), largely because there are many loops. REAL SOLUTION: avoid branches in your code.

A subtle ILP limitation

  DO I = 1, 100
    A(I) = B(I)
  ENDDO

In FORTRAN, the use of different identifiers implies that the memory regions are distinct: A(1:100) and B(1:100) do not overlap in memory, so we are sure that "logically" the iterations are independent. In C, the situation is radically different: there is no guarantee that the regions A(1:100) and B(1:100) do not overlap, so the iterations might depend upon each other. Furthermore, the loads and the stores have to be executed in the strict order B(1), A(1), B(2), A(2), etc. This imposes strong limitations on the use of out-of-order mechanisms.
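A standard C-side remedy, not shown in the original slides, is the C99 restrict qualifier, by which the programmer asserts that the pointed-to regions do not overlap. A minimal sketch (the function names are hypothetical):

  #include <stddef.h>

  /* Without restrict, the compiler must assume a and b may alias and
     must preserve the strict B(1), A(1), B(2), A(2), ... order. */
  void copy_may_alias(float *a, const float *b, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          a[i] = b[i];
  }

  /* With restrict, the programmer guarantees that the regions do not
     overlap, so the compiler may reorder, unroll, and vectorize freely. */
  void copy_no_alias(float *restrict a, const float *restrict b, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          a[i] = b[i];
  }

In effect, restrict restores in C the independence guarantee that FORTRAN gives for free.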
Trick 1 for overcoming ILP limitations: increase basic block size

Use unrolling:

  DO I = 1, 100
    A(I) = B(I)
  ENDDO

is replaced by

  DO I = 1, 100, 2
    A(I) = B(I)
    A(I+1) = B(I+1)
  ENDDO

ADVANTAGES: increases basic block sizes and is always applicable. Can be combined with jamming, i.e. reordering the statements within the enlarged loop body.
DRAWBACKS:
- Choosing the right degree of unrolling is not always obvious.
- Might increase register usage.
- Increases code size.
- Does not always improve performance, in particular on modern out-of-order machines: on such machines the hardware is, in effect, already performing loop unrolling dynamically.
- Don't forget the tail code when the iteration count is not a multiple of the unrolling degree; the tail code is in charge of dealing with the "remainder".

Trick 2: specialization

The code generation/optimization strategy is completely dependent upon the number of iterations: if the number of iterations is systematically equal to 5, choose full unrolling; if the number of iterations is large, then don't use full unrolling. This goes against genericity (against sound principles of software engineering).

STANDARD DAXPY CODE:

  DO I = 1, N
    Y(INCY*I) = Y(INCY*I) + a * X(INCX*I)
  ENDDO

Never use such a code. Instead, notice that very often INCX = INCY = 1, and generate a specific version for this very common case (see the C sketch below). SIMPLIFY THE COMPILER'S LIFE: try to give it as much information as possible.
ADVANTAGE: almost always usable. DRAWBACKS: increases code size; can lead to too many specialized versions.
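To make the trick concrete, here is a minimal C sketch, not from the original slides: the function name is hypothetical, the strides are kept unsigned for simplicity, and a real BLAS daxpy would also handle negative strides.

  #include <stddef.h>

  /* Generic strided DAXPY: y[i*incy] += a * x[i*incx].
     The strides hide the memory access pattern from the compiler. */
  void daxpy(size_t n, double a, const double *x, size_t incx,
             double *y, size_t incy)
  {
      if (incx == 1 && incy == 1) {
          /* Specialized version for the very common unit-stride case:
             a simple loop the compiler can unroll and vectorize. */
          for (size_t i = 0; i < n; i++)
              y[i] += a * x[i];
      } else {
          /* Generic fallback, kept only for the rare strided calls. */
          for (size_t i = 0; i < n; i++)
              y[i * incy] += a * x[i * incx];
      }
  }

The unit-stride branch is exactly the kind of loop a vectorizer handles well; the generic branch merely keeps the routine correct.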
Multimedia instructions

IDEA: provide special instructions to speed up multimedia code (don't forget: multimedia = mass market). Very often, multimedia code contains loops operating on arrays (each element of the array being accessed in turn) and performing the same operation on each element:

  DO I = 1, 100
    A(I) = B(I) + C(I)      (regular, contiguous array access)
  ENDDO

Mediaprocessing: vectorizable? What vector lengths?

  Kernel                              Vector length
  Matrix transpose/multiply           # vertices at once
  DCT (video, communication)          image width
  FFT (audio)                         256-1024
  Motion estimation (video)           image width, iw/16
  Gamma correction (video)            image width
  Haar transform (media mining)       image width
  Median filter (image processing)    image width
  Separable convolution (img. proc.)  image width

(from Pradeep Dubey - IBM, http://www.research.ibm.com/people/p/pradeep/tutor.html)

Principle of multimedia instructions

Instead of operating on a single operand at a time, operate on a group of operands and, for each element of the group, perform the same (or a similar) operation: SIMD (Single Instruction Multiple Data). KEY INGREDIENT: extend the notion of scalar register to multimedia registers. For example, with four-element multimedia registers MMR4, MMR5, MMR6:

  addps MMR6, MMR4, MMR5   computes   MMR6(i) = MMR4(i) + MMR5(i)   for i = 0..3

Multimedia Extensions: INTEL SSE

- A unique size for the multimedia registers (%XMM): 128 bits, which can be viewed as 2 x 64b, 4 x 32b, or 8 x 16b.
- NO VECTOR LENGTH: an operation applies either to a single element (S: Scalar) or to all of the elements (P: Packed). The operation mode is specified in the opcode: addps performs the addition on all (4) of the single-precision elements.
- NO STRIDE: vectors must be stride 1.
- RESTRICTION ON ALIGNMENT: when loading/storing, operating on blocks of operands lined up on a 128-bit boundary (i.e. the starting address of the block is a multiple of 16 bytes) is much faster, BUT IT REQUIRES THE USE OF EXPLICIT INSTRUCTIONS (the aligned versions). It is a nightmare to deal with all of the possible combinations if there is no information on address alignment.

Intel started with MMX, then SSE, SSE2, SSE3 and SSE4 (and it keeps on going): the size of the instruction set manual has doubled, from 700 pages up to 1400 pages. An extremely rich instruction set (back to CISC), BUT NO REAL SCATTER/GATHER. BIG BENEFIT IN USING MULTIMEDIA INSTRUCTIONS: for SP (resp. DP), a 4x (resp. 2x) speedup over the scalar equivalent. You need to rely on the compiler (the vectorizer) to get such instructions, or use "assembly-like" instructions called "intrinsics" (see the C sketch at the end).

MULTIMEDIA INSTRUCTIONS ARE A "DEGENERATE" (much more restricted) FORM OF THE OLD VECTOR INSTRUCTIONS (cf. CRAY), but that is OK for multimedia applications (less so for HPC applications).

A Vicious One

  addsubps MMR6, MMR4, MMR5   computes   MMR6(0) = MMR4(0) - MMR5(0)
                                         MMR6(1) = MMR4(1) + MMR5(1)
                                         MMR6(2) = MMR4(2) - MMR5(2)
                                         MMR6(3) = MMR4(3) + MMR5(3)

It is a miracle if the compiler succeeds in using this one!

Multimedia instructions elsewhere

Everybody has them now: besides Intel, IBM has AltiVec and SUN has VIS. The characteristics of VIS and AltiVec are very similar to SSE. It is a simple architectural mechanism: easy to implement and not very power hungry.

The compiler

A key actor in unicore performance: it can get in the way (i.e. ...
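As a closing illustration of the "intrinsics" route mentioned above (not part of the original slides), here is a minimal C sketch using the SSE intrinsics that map to the aligned move and addps instructions. It assumes n is a multiple of 4 and that all three arrays are 16-byte aligned, so the aligned load/store forms are legal; the function name is hypothetical.

  #include <stddef.h>
  #include <xmmintrin.h>   /* SSE intrinsics */

  /* c[i] = a[i] + b[i], four single-precision elements at a time. */
  void vec_add(float *c, const float *a, const float *b, size_t n)
  {
      for (size_t i = 0; i < n; i += 4) {
          __m128 va = _mm_load_ps(&a[i]);   /* aligned 128-bit load        */
          __m128 vb = _mm_load_ps(&b[i]);   /* aligned 128-bit load        */
          __m128 vc = _mm_add_ps(va, vb);   /* addps: 4 SP adds in one go  */
          _mm_store_ps(&c[i], vc);          /* aligned 128-bit store       */
      }
  }

If n were not a multiple of 4, a scalar tail loop would handle the remainder, exactly as in the unrolling discussion; and without the alignment guarantee, the slower unaligned forms _mm_loadu_ps/_mm_storeu_ps would have to be used instead.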