NXP Semiconductors

Total Page:16

File Type:pdf, Size:1020Kb

NXP Semiconductors Real-Time Open Industrial Linux Jeff Steinheider Marketing Director Roy Zang Software Architect May 2019 | AMF-IND-T3538 Company Public – NXP, the NXP logo, and NXP secure connections for a smarter world are trademarks of NXP B.V. All other product or service names are the property of their respective owners. © 2019 NXP B.V. Agenda • Industrial Application Requirements • Deterministic Computing • Protecting Industrial Devices • Time Synchronization • Deterministic Networking COMPANY PUBLIC 1 Manufacturing Automation/Smart Grid Requirements Processor Requires Real-Time Performance Control Traditionally supported via RTOS Processor (LS, I.MX) PCIe or 16 bit parallel bus Depends on data sizes and system architecture Communications IC Control loops run every 25-150 usecs Requires low, deterministic latency All elements must be synchronized Control loop period determines how fast and how Motors, smoothly a mechanical system can run Drives (LPC15xxx, Kinetis) Communications IC Will be replaced by TSN COMPANY PUBLIC 2 OpenIL.org Built-in TSN and Hard real-time IEEE1588 Support applications support G A Open Open Software F Built-in industrial-grade Repository and Industrial security Community B Support for various Built-in graphic display E industrial networking support for HMI protocols Applications C D Manageable via industrial control and networking protocols COMPANY PUBLIC 3 OpenIL for Industrial Automation Scalable Hardware Customer Applications Open Industrial Linux SDK 4G TSN,IEEE1588 Networking, Security drivers ARM CPUs up to 100K Coremark PCIe Available 802.11ac/n Bare Metal Xenomai SE Linux Framework Packet Engine Security BLE/Zigbee/ 2-20Gbps Engine Thread Trust Architecture Ethernet Controllers 2x 1GE -> 2x 10GE NFC Determinism Xenomai Linux, Bare Metal Framework LS1046 IEEE 1588, TSN LS1043 LS1028 Security LS1021 SE Linux LS1012 OP-TEE I.MX 6 COMPANY PUBLIC 4 OpenIL Running on Scalable Portfolio of Devices Currently Supported Devices New Device Support Single to Quad Core Adding 3D GPU 32 and 64 bit Arm Adding Integrated TSN LS1043A LS1046A • Cortex-A53 • Cortex-A72 • 2-4 cores • 2-4 cores i.MX 6Dual/6Quad LS1028A • 1.6GHz • 1.8GHz • Cortex-A9 • Cortex-A72 • 1/10G Ethernet, • 1/10 G Ethernet, • 2-4 cores • 2 cores USB, PCI USB, PCI • 800 MHz • 1.3GHz • 5-10W • 10-12W (Industrial) • 4-9W LS1012A LS1021A • 2D/3D GPU • Integrated TSN switch • Cortex-A53 • Cortex-A7 • 2D/3D GPU • 1 core • 2 cores • 1GHz • 1.2GHz • 1-2W • 2W • Ethernet, USB, PCI • Ethernet, USB, PCI COMPANY PUBLIC 5 One Package – Four SoC Options 4x A53 1.6 GHz LS1043 LS1046 4x A72 1.8 GHz 4.2 W Typical 8.5 W Typical 26,650 Coremark 45,330 Coremark Per core SpecINT Per core SpecINT Per core SpecFP Per core SpecFP 2x A53 1.0 GHz LS1023 LS1026 2x A72 1.2 GHz 2.5 W Typical 5.6 W Typical 23mm x 23mm 8,360 Coremark 15,000 Coremark 780 pin Per core SpecINT Per core SpecINT FC-PBGA Package Per core SpecFP Per core SpecFP COMPANY PUBLIC 6 Deterministic Computing COMPANY PUBLIC 7 Deterministic Computing for Industrial Workloads 3 Levels of Real-Time Performance: Heterogeneous Software Model • Xenomai Mercury (PREEMPT-RT Patches) SoC Core 1 Core 2 Core 3 Core 4 • Xenomai Cobalt (Real-Time Co- Bare- Linux RTOS RTOS Kernel) Metal • Bare-Metal Framework Acceleration Acceleration • Run management, communication I/O - Ethernet, PCIe software in Linux on 1 core • Real-time applications running with RTOS (Xenomai) or Bare-Metal on other cores COMPANY PUBLIC 8 Xenomai Latency Distribution on LS1043A Max Latency Samples Distribution 400 353 • Xenomai Cobalt 64-bit 350 mode on LS1043A @ 300 250 1.6 GHz 200 • Measured using 141 150 Xenomai latency tool 100 84 • Jitter < 450 ns 50 • Max latency of 680 ns 1 10 7 3 1 0 0.4 0.44 0.48 0.52 0.56 0.6 0.64 0.68 Max Latency (us) latency min (us) latency avg (us) latency max (us) Duration 0.24 0.279 0.68 00:10:00 COMPANY PUBLIC 9 Protecting Industrial Devices COMPANY PUBLIC 10 Trust Architecture Provides a Trusted Platform Hardware based security features to ease the All QorIQ SoCs support Trust development of trustworthy systems Architecture General- General- DDR Purpose Purpose Controller Manufacturing Secure SoC CPU Core CPU Core Protection Boot HV MMU HV MMU 8 1 Battery Back-up Strong Secure Partitioning Storage Security Fuses 7 2 PreBoot Loader Coherent Interconnect Security Monitor IOMMU IOMMU Internal BootROM 6 3 Secure Debug Tamper Key Power Mgmt SEC Engine Mgmt Control Controller Detection Protection Tamper SD/MMC Crypto, RNG Real Time Debug 5 4 Detect(s) SPI DUART I2C QMan FMAN WRIOP Watchpoint Secure Keys IFC USB SATA BMan Perf CoreNet Secure Key Monitor Trace Clocks/Reset UID, Runtime Eth, Debug Revocation Integrity Check AIOP PCI Aurora CCSR GPIO Security Data-path sub-system sub-system COMPANY PUBLIC 11 Runtime Access Control With SELinux • Improved access control • Policies control file access, network resources, and IPC − Finer grain access control • Use cases: − Prevent remote login for certain types of users − Restrict access to files from the web COMPANY PUBLIC 12 Time Synchronization COMPANY PUBLIC 13 IEEE 1588 for Timing Synchronization linuxptp support: LS1021A LS1043A LS1046A LS1028A Master/Slave Boundary Clock Mode 802.1AS End Station Synchronization within +/- 23 nsec for back to back boards Example configurations and test results COMPANY PUBLIC 14 1588 Performance Offset from Master, Startup Offset from Master, Stable State Timing settles within 5 seconds Accuracy within ±23 nsec COMPANY PUBLIC 15 Deterministic Networking COMPANY PUBLIC 16 Single Board TSN Demonstration • 3 host Linux machines connected through a switch • 2 TCP flows competing for bandwidth • Flows bottlenecked because they are sharing the same link towards Host 2 • Combined throughput cannot exceed 1000Mbps • Utilize TSN features to isolate flows − Ingress Policing: rate-limit traffic coming from Host 3 − Time Gating: schedule the 2 flows on different time slots COMPANY PUBLIC 17 Demonstration Setup LS1012A-FRDM LS1021ATSN ubuntu COMPANY PUBLIC 18 Standard Ethernet Switch Settings Standard Switch Settings Both streams compete for bandwidth High variation Roughly equal distribution standard.xml COMPANY PUBLIC 19 Start TSN on LS1021A-TSN – Enhance with LS1028A LS1021A-TSN LS1028A TSN Features New TSN Features • Time Aware Shaper • Frame Pre-emption (802.1Qbv) (802.1Qbu) • Per-Stream Filtering & • Frame Replication and Policing (802.1Qci) Elimination (802.1CB) • Credit Based Shaper • Cut-through Switching (802.1Qav) • Cyclic Queuing and • Time Synchronization Forwarding (802.1Qch) (802.1AS) • 802.1AS-Rev Supported by one SDK – Open Industrial Linux COMPANY PUBLIC 20 OPC UA over TSN for Industry 4.0 Communications • OpenIL integrates with Open62541 − Open source C implementation of OPC UA − Mozilla Public License v2.0 − server side capabilities • LS1021A Running OPC UA Server − Providing switch statistics − Access via FreeOpcUa Client GUI COMPANY PUBLIC 21 LS1028A: Dual ARM Cortex A72 Processor Core Complex Security Core Complex • 2x 64-bit Cortex-A72 with Neon SIMD engine Security Engine • Speed up to 1300 MHz ARM Trust Zone ARM ARM • Parity and ECC protected 48 KB L1 instruction and 32 KB L1 data cache Cortex –A72 Cortex –A72 Secure Boot • 1 MB L2 cache with ECC protection 16/32-bit DDR3L/4 Basic Peripheral and Interconnect Power Management 48 KB L1-I 48 KB L1-I Memory Controller (ECC) 32 KB L1-D 32 KB L1-D • 2x USB 3.0 OTG controllers with integrated PHY Standard interfaces • 2x eSDHC controllers supporting SD/SDIO 4.0 • 2x CAN-FD controllers 2x SD/SDIO/eMMC 1MB L2 - Cache • 8x UART serial ports 3x SPI Coherent Interconnect (CCI-400) Networking Elements 2x UART, 6x LPUART Accelerators and Memory High Speed Networking • Four Port TSN Ethernet Switch up to 2.5 Gbps on each port Control interfaces Elements 8x I2C, GPIO • Up to four SGMII supporting 1 Gbps 2.5 GbE 6x SAI, 2x CAN-FD 256KB SRAM SATA • Up to one USXGMII supporting 2.5 Gbps PCIe 3 FlexSPI • Up to one QSGMII 3.0 2.5 GbE USB • Up to one RGMII 8x Flex Timer 2.5 GbE 3.0 TSN Switch TSN • 2x PCI Express Gen 3 controllers w/PHY Multimedia interfaces 3D GPU 2.5 GbE • 1x SATA Gen 3.0 controller 4K LCD Controller PCIe USB 2.5 TSN GbE Accelerators and Memory Control 3.0 3.0 eDP/DP Phy w/PHY 1 GbE • 1x 16/32-bit DDR3L/4 Controller with ECC support up to 1.6 GT/s • Time Sensitive Networking (TSN) Ethernet Switch • Security Engine (SEC) Target Applications: • QorIQ Trust architecture: Secure boot, ARM Trust zone and security monitor • Industrial Control, PLCs, Gateways • IoT Gateways Qualification • Automotive • Human Machine Interface • Commercial and extended temperature (support for 125C Tj) • Professional Audio/Video Power Package • 5W TDP • 17x17mm, 0.75mm pitch FC-PBGA COMPANY PUBLIC 22 LS1028A Reference Design Front Panel Back Panel Full Size SD Card Slot Up to 4K 4 Switched Display via 1G/100M/10M DisplayPort TSN Ethernet 2x CAN FD Interfaces 2x UART Ports USB 3.0 1G/100M/10M • Internal M.2 PCIe, SATA slots Type C and Type TSN Ethernet • 2x mikroBUS™ sockets for Click A Controller Boards Compelling Combination of IO, Computing and TSN COMPANY PUBLIC 23 Industrial Solution COMPANY PUBLIC 24 EtherCAT Over TSN IOT device secure management • EtherCAT IGH Master OpenIL and Industrial Reference Solution • Industrial control system OpenIL Distributed IOT Apps Background Flows Build and deployment based on EtherCAT over LinuxCNC LinuxCNC Host Generator1 IGH EtherCAT TSN Master Real-time Linux • Real-time Linux system Platform TSN Switch1 EtherCAT over TSN Switch2 support – Xenomai (LS1028ARDB)
Recommended publications
  • Memory Centric Characterization and Analysis of SPEC CPU2017 Suite
    Session 11: Performance Analysis and Simulation ICPE ’19, April 7–11, 2019, Mumbai, India Memory Centric Characterization and Analysis of SPEC CPU2017 Suite Sarabjeet Singh Manu Awasthi [email protected] [email protected] Ashoka University Ashoka University ABSTRACT These benchmarks have become the standard for any researcher or In this paper, we provide a comprehensive, memory-centric charac- commercial entity wishing to benchmark their architecture or for terization of the SPEC CPU2017 benchmark suite, using a number of exploring new designs. mechanisms including dynamic binary instrumentation, measure- The latest offering of SPEC CPU suite, SPEC CPU2017, was re- ments on native hardware using hardware performance counters leased in June 2017 [8]. SPEC CPU2017 retains a number of bench- and operating system based tools. marks from previous iterations but has also added many new ones We present a number of results including working set sizes, mem- to reflect the changing nature of applications. Some recent stud- ory capacity consumption and memory bandwidth utilization of ies [21, 24] have already started characterizing the behavior of various workloads. Our experiments reveal that, on the x86_64 ISA, SPEC CPU2017 applications, looking for potential optimizations to SPEC CPU2017 workloads execute a significant number of mem- system architectures. ory related instructions, with approximately 50% of all dynamic In recent years the memory hierarchy, from the caches, all the instructions requiring memory accesses. We also show that there is way to main memory, has become a first class citizen of computer a large variation in the memory footprint and bandwidth utilization system design.
    [Show full text]
  • Overview of the SPEC Benchmarks
    9 Overview of the SPEC Benchmarks Kaivalya M. Dixit IBM Corporation “The reputation of current benchmarketing claims regarding system performance is on par with the promises made by politicians during elections.” Standard Performance Evaluation Corporation (SPEC) was founded in October, 1988, by Apollo, Hewlett-Packard,MIPS Computer Systems and SUN Microsystems in cooperation with E. E. Times. SPEC is a nonprofit consortium of 22 major computer vendors whose common goals are “to provide the industry with a realistic yardstick to measure the performance of advanced computer systems” and to educate consumers about the performance of vendors’ products. SPEC creates, maintains, distributes, and endorses a standardized set of application-oriented programs to be used as benchmarks. 489 490 CHAPTER 9 Overview of the SPEC Benchmarks 9.1 Historical Perspective Traditional benchmarks have failed to characterize the system performance of modern computer systems. Some of those benchmarks measure component-level performance, and some of the measurements are routinely published as system performance. Historically, vendors have characterized the performances of their systems in a variety of confusing metrics. In part, the confusion is due to a lack of credible performance information, agreement, and leadership among competing vendors. Many vendors characterize system performance in millions of instructions per second (MIPS) and millions of floating-point operations per second (MFLOPS). All instructions, however, are not equal. Since CISC machine instructions usually accomplish a lot more than those of RISC machines, comparing the instructions of a CISC machine and a RISC machine is similar to comparing Latin and Greek. 9.1.1 Simple CPU Benchmarks Truth in benchmarking is an oxymoron because vendors use benchmarks for marketing purposes.
    [Show full text]
  • 3 — Arithmetic for Computers 2 MIPS Arithmetic Logic Unit (ALU) Zero Ovf
    Chapter 3 Arithmetic for Computers 1 § 3.1Introduction Arithmetic for Computers Operations on integers Addition and subtraction Multiplication and division Dealing with overflow Floating-point real numbers Representation and operations Rechnerstrukturen 182.092 3 — Arithmetic for Computers 2 MIPS Arithmetic Logic Unit (ALU) zero ovf 1 Must support the Arithmetic/Logic 1 operations of the ISA A 32 add, addi, addiu, addu ALU result sub, subu 32 mult, multu, div, divu B 32 sqrt 4 and, andi, nor, or, ori, xor, xori m (operation) beq, bne, slt, slti, sltiu, sltu With special handling for sign extend – addi, addiu, slti, sltiu zero extend – andi, ori, xori overflow detection – add, addi, sub Rechnerstrukturen 182.092 3 — Arithmetic for Computers 3 (Simplyfied) 1-bit MIPS ALU AND, OR, ADD, SLT Rechnerstrukturen 182.092 3 — Arithmetic for Computers 4 Final 32-bit ALU Rechnerstrukturen 182.092 3 — Arithmetic for Computers 5 Performance issues Critical path of n-bit ripple-carry adder is n*CP CarryIn0 A0 1-bit result0 ALU B0 CarryOut0 CarryIn1 A1 1-bit result1 B ALU 1 CarryOut 1 CarryIn2 A2 1-bit result2 ALU B2 CarryOut CarryIn 2 3 A3 1-bit result3 ALU B3 CarryOut3 Design trick – throw hardware at it (Carry Lookahead) Rechnerstrukturen 182.092 3 — Arithmetic for Computers 6 Carry Lookahead Logic (4 bit adder) LS 283 Rechnerstrukturen 182.092 3 — Arithmetic for Computers 7 § 3.2 Addition and Subtraction 3.2 Integer Addition Example: 7 + 6 Overflow if result out of range Adding +ve and –ve operands, no overflow Adding two +ve operands
    [Show full text]
  • I.MX 8Quadxplus Power and Performance
    NXP Semiconductors Document Number: AN12338 Application Note Rev. 4 , 04/2020 i.MX 8QuadXPlus Power and Performance 1. Introduction Contents This application note helps you to design power 1. Introduction ........................................................................ 1 management systems. It illustrates the current drain 2. Overview of i.MX 8QuadXPlus voltage supplies .............. 1 3. Power measurement of the i.MX 8QuadXPlus processor ... 2 measurements of the i.MX 8QuadXPlus Applications 3.1. VCC_SCU_1V8 power ........................................... 4 Processors taken on NXP Multisensory Evaluation Kit 3.2. VCC_DDRIO power ............................................... 4 (MEK) Platform through several use cases. 3.3. VCC_CPU/VCC_GPU/VCC_MAIN power ........... 5 3.4. Temperature measurements .................................... 5 This document provides details on the performance and 3.5. Hardware and software used ................................... 6 3.6. Measuring points on the MEK platform .................. 6 power consumption of the i.MX 8QuadXPlus 4. Use cases and measurement results .................................... 6 processors under a variety of low- and high-power 4.1. Low-power mode power consumption (Key States modes. or ‘KS’)…… ......................................................................... 7 4.2. Complex use case power consumption (Arm Core, The data presented in this application note is based on GPU active) ......................................................................... 11 5. SOC
    [Show full text]
  • Part 1 of 4 : Introduction to RISC-V ISA
    PULP PLATFORM Open Source Hardware, the way it should be! Working with RISC-V Part 1 of 4 : Introduction to RISC-V ISA Frank K. Gürkaynak <[email protected]> Luca Benini <[email protected]> http://pulp-platform.org @pulp_platform https://www.youtube.com/pulp_platform Working with RISC-V Summary ▪ Part 1 – Introduction to RISC-V ISA ▪ What is RISC-V about ▪ Description of ISA, and basic principles ▪ Simple 32b implementation (Ibex by LowRISC) ▪ How to extend the ISA (CV32E40P by OpenHW group) ▪ Part 2 – Advanced RISC-V Architectures ▪ Part 3 – PULP concepts ▪ Part 4 – PULP based chips | ACACES 2020 - July 2020 Working with RISC-V Few words about myself Frank K. Gürkaynak (just call me Frank) Senior scientist at ETH Zurich (means I am old) working with Luca Studied / Worked at Universities: in Turkey, United States and Switzerland (ETHZ and EPFL) Involved in Digital Design, Low Power Circuits, Open Source Hardware Part of PULP project from the beginning in 2013 | ACACES 2020 - July 2020 Working with RISC-V RISC-V Instruction Set Architecture ▪ Started by UC-Berkeley in 2010 SW ▪ Contract between SW and HW Applications ▪ Partitioned into user and privileged spec ▪ External Debug OS ▪ Standard governed by RISC-V foundation ▪ ETHZ is a founding member of the foundation ISA ▪ Necessary for the continuity User Privileged ▪ Defines 32, 64 and 128 bit ISA Debug ▪ No implementation, just the ISA ▪ Different implementations (both open and close source) HW ▪ At ETH Zurich we specialize in efficient implementations of RISC-V cores | ACACES 2020 - July 2020 Working with RISC-V RISC-V maintains basically a PDF document | ACACES 2020 - July 2020 Working with RISC-V ISA defines the instructions that processor uses C++ program translated to RISC-V instructions defined by ISA.
    [Show full text]
  • Asustek Computer Inc.: Asus P6T
    SPEC CFP2006 Result spec Copyright 2006-2014 Standard Performance Evaluation Corporation ASUSTeK Computer Inc. (Test Sponsor: Intel Corporation) SPECfp 2006 = 32.4 Asus P6T Deluxe (Intel Core i7-950) SPECfp_base2006 = 30.6 CPU2006 license: 13 Test date: Oct-2008 Test sponsor: Intel Corporation Hardware Availability: Jun-2009 Tested by: Intel Corporation Software Availability: Nov-2008 0 3.00 6.00 9.00 12.0 15.0 18.0 21.0 24.0 27.0 30.0 33.0 36.0 39.0 42.0 45.0 48.0 51.0 54.0 57.0 60.0 63.0 66.0 71.0 70.7 410.bwaves 70.8 21.5 416.gamess 16.8 34.6 433.milc 34.8 434.zeusmp 33.8 20.9 435.gromacs 20.6 60.8 436.cactusADM 61.0 437.leslie3d 31.3 17.6 444.namd 17.4 26.4 447.dealII 23.3 28.5 450.soplex 27.9 34.0 453.povray 26.6 24.4 454.calculix 20.5 38.7 459.GemsFDTD 37.8 24.9 465.tonto 22.3 470.lbm 50.2 30.9 481.wrf 30.9 42.1 482.sphinx3 41.3 SPECfp_base2006 = 30.6 SPECfp2006 = 32.4 Hardware Software CPU Name: Intel Core i7-950 Operating System: Windows Vista Ultimate w/ SP1 (64-bit) CPU Characteristics: Intel Turbo Boost Technology up to 3.33 GHz Compiler: Intel C++ Compiler Professional 11.0 for IA32 CPU MHz: 3066 Build 20080930 Package ID: w_cproc_p_11.0.054 Intel Visual Fortran Compiler Professional 11.0 FPU: Integrated for IA32 CPU(s) enabled: 4 cores, 1 chip, 4 cores/chip, 2 threads/core Build 20080930 Package ID: w_cprof_p_11.0.054 CPU(s) orderable: 1 chip Microsoft Visual Studio 2008 (for libraries) Primary Cache: 32 KB I + 32 KB D on chip per core Auto Parallel: Yes Secondary Cache: 256 KB I+D on chip per core File System: NTFS Continued on next page Continued on next page Standard Performance Evaluation Corporation [email protected] Page 1 http://www.spec.org/ SPEC CFP2006 Result spec Copyright 2006-2014 Standard Performance Evaluation Corporation ASUSTeK Computer Inc.
    [Show full text]
  • Modeling and Analyzing CPU Power and Performance: Metrics, Methods, and Abstractions
    Modeling and Analyzing CPU Power and Performance: Metrics, Methods, and Abstractions Margaret Martonosi David Brooks Pradip Bose VET NOV TES TAM EN TVM DE I VI GE T SV B NV M I NE Moore’s Law & Power Dissipation... Moore’s Law: ❚ The Good News: 2X Transistor counts every 18 months ❚ The Bad News: To get the performance improvements we’re accustomed to, CPU Power consumption will increase exponentially too... (Graphs courtesy of Fred Pollack, Intel) Why worry about power dissipation? Battery life Thermal issues: affect cooling, packaging, reliability, timing Environment Hitting the wall… ❚ Battery technology ❚ ❙ Linear improvements, nowhere Past: near the exponential power ❙ Power important for increases we’ve seen laptops, cell phones ❚ Cooling techniques ❚ Present: ❙ Air-cooled is reaching limits ❙ Power a Critical, Universal ❙ Fans often undesirable (noise, design constraint even for weight, expense) very high-end chips ❙ $1 per chip per Watt when ❚ operating in the >40W realm Circuits and process scaling can no longer solve all power ❙ Water-cooled ?!? problems. ❚ Environment ❙ SYSTEMS must also be ❙ US EPA: 10% of current electricity usage in US is directly due to power-aware desktop computers ❙ Architecture, OS, compilers ❙ Increasing fast. And doesn’t count embedded systems, Printers, UPS backup? Power: The Basics ❚ Dynamic power vs. Static power vs. short-circuit power ❙ “switching” power ❙ “leakage” power ❙ Dynamic power dominates, but static power increasing in importance ❙ Trends in each ❚ Static power: steady, per-cycle energy cost ❚ Dynamic power: power dissipation due to capacitance charging at transitions from 0->1 and 1->0 ❚ Short-circuit power: power due to brief short-circuit current during transitions.
    [Show full text]
  • BOOM): an Industry- Competitive, Synthesizable, Parameterized RISC-V Processor
    The Berkeley Out-of-Order Machine (BOOM): An Industry- Competitive, Synthesizable, Parameterized RISC-V Processor Christopher Celio David A. Patterson Krste Asanović Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2015-167 http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-167.html June 13, 2015 Copyright © 2015, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor Christopher Celio, David Patterson, and Krste Asanovic´ University of California, Berkeley, California 94720–1770 [email protected] BOOM is a work-in-progress. Results shown are prelimi- nary and subject to change as of 2015 June. I$ L1 D$ (32k) L2 data 1. The Berkeley Out-of-Order Machine BOOM is a synthesizable, parameterized, superscalar out- exe of-order RISC-V core designed to serve as the prototypical baseline processor for future micro-architectural studies of uncore regfile out-of-order processors. Our goal is to provide a readable, issue open-source implementation for use in education, research, exe and industry. uncore BOOM is written in roughly 9,000 lines of the hardware L2 data (256k) construction language Chisel.
    [Show full text]
  • Continuous Profiling: Where Have All the Cycles Gone?
    Continuous Profiling: Where Have All the Cycles Gone? JENNIFER M. ANDERSON, LANCE M. BERC, JEFFREY DEAN, SANJAY GHEMAWAT, MONIKA R. HENZINGER, SHUN-TAK A. LEUNG, RICHARD L. SITES, MARK T. VANDEVOORDE, CARL A. WALDSPURGER, and WILLIAM E. WEIHL Digital Equipment Corporation This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them. Categories and Subject Descriptors: C.4 [Computer Systems Organization]: Performance of Systems; D.2.2 [Software Engineering]: Tools and Techniques—profiling tools; D.2.6 [Pro- gramming Languages]: Programming Environments—performance monitoring; D.4 [Oper- ating Systems]: General; D.4.7 [Operating Systems]: Organization and Design; D.4.8 [Operating Systems]: Performance General Terms: Performance Additional Key Words and Phrases: Profiling, performance understanding, program analysis, performance-monitoring hardware An earlier version of this article appeared at the 16th ACM Symposium on Operating System Principles (SOSP), St.
    [Show full text]
  • Efficient Online Memory Error Assessment and Circumvention For
    Int. J. Critical Computer-Based Systems, Vol. 4, No. 3, 227{247 1 Efficient Online Memory Error Assessment and Circumvention for Linux with RAMpage Horst Schirmeier*, Ingo Korb, Olaf Spinczyk and Michael Engel Department of Computer Science 12, Technische Universit¨atDortmund, Otto-Hahn-Str. 16, 44221 Dortmund, Germany E-mail: [email protected] E-mail: [email protected] E-mail: [email protected] E-mail: [email protected] *Corresponding author Abstract: Memory errors are a major source of reliability problems in computer systems. Undetected errors may result in program termination or, even worse, silent data corruption. Recent studies have shown that the frequency of permanent memory errors is an order of magnitude higher than previously assumed and regularly affects everyday operation. To reduce the impact of memory errors, we designed RAMpage, a purely software-based infrastructure to assess and circumvent permanent memory errors in a running commodity x86-64 Linux-based system. We briefly describe the design and implementation of RAMpage and present new results from an extensive qualitative and quantitative evaluation. These results show the efficiency of our approach { RAMpage is able to provide a smooth graceful degradation in the presence of permanent memory errors while requiring only a small overhead in terms of CPU time, energy, and memory space. Keywords: Memory errors; Software-based fault tolerance; DRAM chips; Silent data corruption; Operating systems; Reliable operation Biographical notes: Horst Schirmeier received his Diploma in Computer Science from Friedrich-Alexander-Universit¨atErlangen, Germany. He worked as a researcher in the System Software group at FAU Erlangen, and is currently working in the field of resilient infrastructure software and fault resilience assessment for the DanceOS project in the Embedded System Software group at Technische Universit¨atDortmund, Germany.
    [Show full text]
  • Specfp Benchmark Disclosure
    SPEC CFP2006 Result Copyright 2006-2016 Standard Performance Evaluation Corporation Cisco Systems SPECfp2006 = Not Run Cisco UCS C220 M4 (Intel Xeon E5-2667 v4, 3.20 GHz) SPECfp_base2006 = 125 CPU2006 license: 9019 Test date: Mar-2016 Test sponsor: Cisco Systems Hardware Availability: Mar-2016 Tested by: Cisco Systems Software Availability: Dec-2015 0 30.0 60.0 90.0 120 150 180 210 240 270 300 330 360 390 420 450 480 510 540 570 600 630 660 690 720 750 780 810 840 900 410.bwaves 572 416.gamess 43.1 433.milc 72.5 434.zeusmp 225 435.gromacs 63.4 436.cactusADM 894 437.leslie3d 427 444.namd 31.6 447.dealII 67.6 450.soplex 48.3 453.povray 62.2 454.calculix 61.6 459.GemsFDTD 242 465.tonto 52.1 470.lbm 799 481.wrf 120 482.sphinx3 91.6 SPECfp_base2006 = 125 Hardware Software CPU Name: Intel Xeon E5-2667 v4 Operating System: SUSE Linux Enterprise Server 12 SP1 (x86_64) CPU Characteristics: Intel Turbo Boost Technology up to 3.60 GHz 3.12.49-11-default CPU MHz: 3200 Compiler: C/C++: Version 16.0.0.101 of Intel C++ Studio XE for Linux; FPU: Integrated Fortran: Version 16.0.0.101 of Intel Fortran CPU(s) enabled: 16 cores, 2 chips, 8 cores/chip Studio XE for Linux CPU(s) orderable: 1,2 chips Auto Parallel: Yes Primary Cache: 32 KB I + 32 KB D on chip per core File System: xfs Secondary Cache: 256 MB I+D on chip per core System State: Run level 3 (multi-user) Continued on next page Continued on next page Standard Performance Evaluation Corporation [email protected] Page 1 http://www.spec.org/ SPEC CFP2006 Result Copyright 2006-2016 Standard Performance
    [Show full text]
  • An Evaluation of Soft Processors As a Reliable Computing Platform
    Brigham Young University BYU ScholarsArchive Theses and Dissertations 2015-07-01 An Evaluation of Soft Processors as a Reliable Computing Platform Michael Robert Gardiner Brigham Young University - Provo Follow this and additional works at: https://scholarsarchive.byu.edu/etd Part of the Electrical and Computer Engineering Commons BYU ScholarsArchive Citation Gardiner, Michael Robert, "An Evaluation of Soft Processors as a Reliable Computing Platform" (2015). Theses and Dissertations. 5509. https://scholarsarchive.byu.edu/etd/5509 This Thesis is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected], [email protected]. An Evaluation of Soft Processors as a Reliable Computing Platform Michael Robert Gardiner A thesis submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Master of Science Michael J. Wirthlin, Chair Brad L. Hutchings Brent E. Nelson Department of Electrical and Computer Engineering Brigham Young University July 2015 Copyright © 2015 Michael Robert Gardiner All Rights Reserved ABSTRACT An Evaluation of Soft Processors as a Reliable Computing Platform Michael Robert Gardiner Department of Electrical and Computer Engineering, BYU Master of Science This study evaluates the benefits and limitations of soft processors operating in a radiation-hardened FPGA, focusing primarily on the performance and reliability of these systems. FPGAs designs for four popular soft processors, the MicroBlaze, LEON3, Cortex- M0 DesignStart, and OpenRISC 1200 are developed for a Virtex-5 FPGA. The performance of these soft processor designs is then compared on ten widely-used benchmark programs.
    [Show full text]