Computer Organization and Components

IS1500, fall 2015
Lecture 12: Parallelism, Concurrency, Speedup, and ILP

David Broman, [email protected]
Associate Professor, KTH Royal Institute of Technology
Assistant Research Engineer, University of California, Berkeley
Slides version 1.0

Course Structure
• Module 1: C and Assembly Programming (LE1, LE2, LE3, EX1, LAB1, LE4, S1, LAB2)
• Module 2: I/O Systems (LE5, LE6, EX2, LAB3)
• Module 3: Logic Design (LE7, LE8, EX3, LAB4)
• Module 4: Processor Design (LE9, LE10, S2, LAB5)
• Module 5: Memory Hierarchy (LE11, EX4, LAB6)
• Module 6: Parallel Processors and Programs (LE12, LE13, EX5, S3)
• Proj. Expo, LE14

Abstractions in Computer Systems
• Computer System: Networked Systems and Systems of Systems
• Software: Application Software, Operating System
• Hardware/Software Interface: Instruction Set Architecture
• Digital Hardware Design: Microarchitecture, Logic and Building Blocks, Digital Circuits
• Analog Design and Physics: Analog Circuits, Devices and Physics

Agenda
• Part I: Multiprocessors, Parallelism, Concurrency, and Speedup
• Part II: Instruction Level Parallelism

How is this computer revolution possible? (Revisited)
Moore’s law:
• Integrated circuit resources (transistors) double every 18-24 months.
• By Gordon E. Moore, Intel’s co-founder, 1960s.
• Possible because of refined manufacturing processes. E.g., 4th generation Intel Core i7 processors use a 22 nm manufacturing process.
• Sometimes considered a self-fulfilling prophecy. Served as a goal for the semiconductor industry.

Part I: Multiprocessors, Parallelism, Concurrency, and Speedup
Acknowledgement: The structure and several of the good examples are derived from the book “Computer Organization and Design” (2014) by David A. Patterson and John L. Hennessy.

Have we reached the limit? (Revisited)
During the last decade, the clock rate has increased dramatically.
• 1989: 80486, 25 MHz
• 1993: Pentium, 66 MHz
• 1997: Pentium Pro, 200 MHz
• 2001: Pentium 4, 2.0 GHz
• 2004: Pentium 4, 3.6 GHz
• 2013: Core i7, 3.1 GHz - 4 GHz
Why? The Power Wall: increased clock rate implies increased power. We cannot cool the system enough to increase the clock rate anymore…
“New” trend since 2006: Multicore
• Moore’s law still holds
• More processors on a chip: multicore
• “New” challenge: parallel programming
Image credits: Eric Gaba, CC BY-SA 3.0, no modifications made; http://www.publicdomainpictures.net/view-image.php?image=1281&picture=tegelvagg

What is a multiprocessor?
A multiprocessor is a computer system with two or more processors. By contrast, a computer with one processor is called a uniprocessor.
Multicore microprocessors are multiprocessors where all processors (cores) are located on a single integrated circuit.
A cluster is a set of computers that are connected over a local area network (LAN). May be viewed as one large multiprocessor.
Image credit: photo by Robert Harker.

Different Kinds of Computer Systems (Revisited)
• Embedded Real-Time Systems
• Personal Computers and Personal Mobile Devices
• Warehouse Scale Computers
Concerns shown on the slide: Dependability, Energy, Performance.
Image credits: photos by Kyro and Robert Harker.

Why multiprocessors?
• Performance: Possible to execute many computation tasks in parallel.
• Energy: Replace energy inefficient processors in data centers with many efficient smaller processors.
• Dependability: If one out of N processors fails, still N-1 processors are functioning.

Parallelism and Concurrency – what is the difference?
Concurrency is about handling many things at the same time. Concurrency may be viewed from the software viewpoint.
Parallelism is about doing (executing) many things at the same time. Parallelism may be viewed from the hardware viewpoint.
Software (sequential or concurrent) versus hardware (serial or parallel):
• Sequential software, serial hardware. Example: matrix multiplication on a unicore processor.
• Concurrent software, serial hardware. Example: A Linux OS running on a unicore processor.
• Sequential software, parallel hardware. Example: matrix multiplication on a multicore processor.
• Concurrent software, parallel hardware. Example: A Linux OS running on a multicore processor.
Note: As always, everybody does not agree on the definitions of concurrency and parallelism. The matrix is from H&P 2014 and the informal definitions above are similar to what was said in a talk by Rob Pike.

Speedup
How much can we improve the performance using parallelization?
$\text{Speedup} = \frac{T_{\text{before}}}{T_{\text{after}}}$
• $T_{\text{before}}$: execution time of one program before improvement.
• $T_{\text{after}}$: execution time after improvement.
[Figure: speedup (1 to 4) plotted against number of processors (1 to 4)]
• Linear speedup (or ideal speedup).
• Superlinear speedup. Either wrong, or due to e.g. cache effects.
• Still increased speedup, but less efficient.
Danger: Relative speedup measures only the same program. True speedup compares also with the best known sequential program.
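The speedup definition above (execution time before improvement divided by execution time after improvement) can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the names `speedup` and `classify` and the example timings are assumptions made here for illustration only.

```python
import math


def speedup(t_before: float, t_after: float) -> float:
    """Speedup = T_before / T_after (both in the same time unit)."""
    if t_before <= 0 or t_after <= 0:
        raise ValueError("execution times must be positive")
    return t_before / t_after


def classify(s: float, n_processors: int) -> str:
    """Compare a measured speedup with the ideal (linear) speedup,
    which equals the number of processors."""
    if math.isclose(s, n_processors):
        return "linear"
    if s > n_processors:
        return "superlinear"  # either wrong, or due to e.g. cache effects
    return "sublinear"        # still faster, but less efficient


# Hypothetical timings (assumed values): 80 ms on 1 processor, 40 ms on 4.
s = speedup(80.0, 40.0)
print(s, classify(s, 4))
```

Note that this computes a relative speedup if both timings come from the same program; for a true speedup, `t_before` would be the time of the best known sequential program.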
Amdahl’s Law (1/4)
Can we achieve linear speedup?
Divide the execution time before improvement into two parts:
$T_{\text{before}} = T_{\text{affected}} + T_{\text{unaffected}}$
• $T_{\text{affected}}$: time affected by the improvement of parallelization.
• $T_{\text{unaffected}}$: time unaffected by the improvement (sequential part).
Execution time after improvement, where N is the amount of improvement (N times improvement):
$T_{\text{after}} = \frac{T_{\text{affected}}}{N} + T_{\text{unaffected}}$
$\text{Speedup} = \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{T_{\text{before}}}{\frac{T_{\text{affected}}}{N} + T_{\text{unaffected}}}$
This is sometimes referred to as Amdahl’s law.

Amdahl’s Law (2/4)
Exercise: Assume a program consists of an image analysis task, sequentially followed by a statistical computation task. Only the image analysis task can be parallelized. How much do we need to improve the image analysis task to be able to achieve 4 times speedup? Assume that the program takes 80 ms in total and that the image analysis task takes 60 ms out of this time.
Solution:
$4 = \frac{80}{\frac{60}{N} + 80 - 60}$
$\frac{60}{N} + 20 = 20$
$\frac{60}{N} = 0$
It is impossible to achieve this speedup!

Amdahl’s Law (3/4)
Assume that we perform 10 scalar integer additions, followed by one matrix addition, where matrices are 10x10. Assume additions take the same amount of time and that we can only parallelize the matrix addition.
Exercise A: What is the speedup with 10 processors?
Exercise B: What is the speedup with 40 processors?
Exercise C: What is the maximal speedup (the limit when N → infinity)?
Solution A: (10+10*10) / (10*10/10 + 10) = 5.5
Solution B: (10+10*10) / (10*10/40 + 10) = 8.8
Solution C: (10+10*10) / ((10*10)/N + 10) = 11 when N → infinity

Amdahl’s Law (4/4)
Example continued. What if we change the size of the problem (make the matrices larger)?
Speedup by size of matrices and number of processors:
• 10x10 matrices: 5.5 with 10 processors, 8.8 with 40 processors
• 20x20 matrices: 8.2 with 10 processors, 20.5 with 40 processors
But was not the maximal speedup 11 when N → infinity? That limit holds only for the 10x10 problem size.
Strong scaling = measuring speedup while keeping the problem size fixed.
Weak scaling = measuring speedup when the problem size grows proportionally to the increased number of processors.

Main Classes of Parallelisms
Data-Level Parallelism (DLP): Many data items can be processed at the same time.
Example – Sheep shearing: Assume that sheep are data items and the task for the farmer is to do sheep shearing (remove the wool). Data-level parallelism would be the same as using several farm hands to do the shearing.
Task-Level Parallelism (TLP): Different tasks of work that can work independently and in parallel.
Example – Many tasks at the farm: Assume that there are many different things that can be done on the farm (fix the barn, sheep shearing, feed the pigs etc.) Task-level parallelism would be to let the farm hands do the different tasks in parallel.

SISD, SIMD, and MIMD
An old (from the 1960s) but still very useful classification of processors uses the notion of instruction and data streams.
• SISD (single instruction stream, single data stream): E.g. Intel Pentium 4
• SIMD (single instruction stream, multiple data streams): E.g. SSE instruction in x86
• MISD (multiple instruction streams, single data stream): No examples today
• MIMD (multiple instruction streams, multiple data streams): E.g. Intel Core i7
SIMD: Data-level parallelism. Examples are multimedia extensions (e.g., SSE, streaming SIMD extension), vector processors.
MIMD: Task-level parallelism. Examples are multicore and cluster computers.
Graphics Processing Units (GPUs) are both SIMD and MIMD.
Physical Q/A: What is a modern Intel CPU, such as Core i7? Stand for MIMD, on the table for SIMD.

Part II: Instruction Level Parallelism
What is Instruction Level Parallelism?
Instruction Level Parallelism (ILP) may increase performance without involvement of the programmer.
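Looking back at the Amdahl's law slides, the formula and both exercises can be checked with a short Python sketch. The function names (`amdahl_speedup`, `max_speedup`, `required_improvement`) are hypothetical and chosen here for illustration; the numbers are the ones from the exercises (80 ms program with a 60 ms parallelizable part, and 10 scalar additions followed by one matrix addition).

```python
import math
from typing import Optional


def amdahl_speedup(t_affected: float, t_unaffected: float, n: float) -> float:
    """Speedup = T_before / (T_affected / N + T_unaffected)."""
    t_before = t_affected + t_unaffected
    return t_before / (t_affected / n + t_unaffected)


def max_speedup(t_affected: float, t_unaffected: float) -> float:
    """Limit of the speedup when N goes to infinity."""
    if t_unaffected == 0:
        return math.inf
    return (t_affected + t_unaffected) / t_unaffected


def required_improvement(t_affected: float, t_unaffected: float,
                         target: float) -> Optional[float]:
    """Solve target = T_before / (T_affected / N + T_unaffected) for N.
    Returns None when no finite N reaches the target speedup."""
    t_before = t_affected + t_unaffected
    budget = t_before / target - t_unaffected  # time left for the affected part
    if budget <= 0:
        return None
    return t_affected / budget


# Image analysis exercise: 60 ms parallelizable, 20 ms sequential, target 4x.
print(required_improvement(60, 20, 4))   # None: 4x speedup is impossible

# Matrix addition exercise: 10 sequential scalar additions, size*size parallel ones.
for size in (10, 20):
    for n in (10, 40):
        print(f"{size}x{size}, {n} processors:",
              amdahl_speedup(size * size, 10, n))
print("limit for 10x10:", max_speedup(10 * 10, 10))  # 11.0
```

Keeping the matrix size fixed while varying `n` corresponds to strong scaling; the 20x20 rows show how a larger problem raises the achievable speedup because the sequential part stays at 10 additions.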