Electronic Design
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Investigations on Hardware Compression of IBM Power9 Processors
Investigations on hardware compression of IBM Power9 processors Jérome Kieffer, Pierre Paleo, Antoine Roux, Benoît Rousselle Outline ● The bandwidth issue at synchrotrons sources ● Presentation of the evaluated systems: – Intel Xeon vs IBM Power9 – Benchmarks on bandwidth ● The need for compression of scientific data – Compression as part of HDF5 – The hardware compression engine NX-gzip within Power9 – Gzip performance benchmark – Bitshuffle-LZ4 benchmark – Filter optimizations – Benchmark of parallel filtered gzip ● Conclusions – on the hardware – on the compression pipeline in HDF5 Page 2 HDF5 on Power9 18/09/2019 Bandwidth issue at synchrotrons sources Data analysis computer with the main interconnections and their associated bandwidth. Data reduction Upgrade to → Azimuthal integration 100 Gbit/s Data compression ! Figures from former generation of servers Kieffer et al. Volume 25 | Part 2 | March 2018 | Pages 612–617 | 10.1107/S1600577518000607 Page 3 HDF5 on Power9 18/09/2019 Topologies of Intel Xeon servers in 2019 Source: intel.com Page 4 HDF5 on Power9 18/09/2019 Architecture of the AC922 server from IBM featuring Power9 Credit: Thibaud Besson, IBM France Page 6 HDF5 on Power9 18/09/2019 Bandwidth measurement: Xeon vs Power9 Computer Dell R840 IBM AC922 Processor 4 Intel Xeon (12 cores) 2 IBM Power9 (16 cores) 2.6 GHz 2.7 GHz Cache (L3) 19 MB 8x 10 MB Memory channels 4x 6 DDR4 2x 8 DDR4 Memory capacity → 3TB → 2TB Memory speed theory 512 GB/s 340 GB/s Measured memory speed 160 GB/s 270 GB/s Interconnects PCIe v3 PCIe v4 NVlink2 & CAPI2 GP-GPU co-processor 2Tesla V100 PCIe v3 2Tesla V100 NVlink2 Interconnect speed 12 GB/s 48 GB/s CPU ↔ GPU Page 8 HDF5 on Power9 18/09/2019 Strength and weaknesses of the OpenPower architecture While amd64 is today’s de facto standard in HPC, it has a few competitors: arm64, ppc64le and to a less extend riscv and mips64. -
Bob Pease, Analog Guru, Died 5 Years Ago - Media Rako.Com/Media 1 of 12
Bob Pease, analog guru, died 5 years ago - Media Rako.com/Media 1 of 12 Rako Studios » Media » Tech » Electronics » Bob Pease, analog guru, died 5 years ago Bob Pease, analog guru, died 5 years ago Bob Pease was a true standout, the gentleman genius. Notorious analog engineer Bob Pease died five Saturday, Bob had come to the service from years ago, on June 18, 2011. His passing was his office at National Semiconductor, now all the more tragic since he died driving home Texas Instruments. My buddy has a from a remembrance for fellow analog saying, "Everyone wants to be somebody, no great Jim Williams. Although it was a one wants to become somebody." 1 of 12 9/2/2018 6:49 PM 1 of 12 Bob Pease, analog guru, died 5 years ago - Media Rako.com/Media 2 of 12 Bob's being the most famous analog designer After coming to National Semi, Bob learned was the result of his hard work becoming a analog IC design, back in the days ofhand-cut brilliant engineer, with a passion for helping Rubylith masks. There was no Spice simulator others. Fran Hoffart, retired Linear Tech apps back then, and Pease had deep ridicule for engineer and former colleague recalls, "Citing a engineers that relied on computer simulations, need for educating fellow engineers in the insteadof thinking through the problem and design of bandgap references, Bob anointed making some quickback-of-envelope himself 'The Czar of Bandgaps,' complete with calculations. He accepted that Spice was useful a quasi-military suit with a sword and a especially for inexperienced engineers, but was necklace made from metal TO-3 packages." He concerned thatengineers were substituting would help any engineer with a problem, even computer smarts for real smarts. -
POWER® Processor-Based Systems
IBM® Power® Systems RAS Introduction to IBM® Power® Reliability, Availability, and Serviceability for POWER9® processor-based systems using IBM PowerVM™ With Updates covering the latest 4+ Socket Power10 processor-based systems IBM Systems Group Daniel Henderson, Irving Baysah Trademarks, Copyrights, Notices and Acknowledgements Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: Active AIX® POWER® POWER Power Power Systems Memory™ Hypervisor™ Systems™ Software™ Power® POWER POWER7 POWER8™ POWER® PowerLinux™ 7® +™ POWER® PowerHA® POWER6 ® PowerVM System System PowerVC™ POWER Power Architecture™ ® x® z® Hypervisor™ Additional Trademarks may be identified in the body of this document. Other company, product, or service names may be trademarks or service marks of others. Notices The last page of this document contains copyright information, important notices, and other information. Acknowledgements While this whitepaper has two principal authors/editors it is the culmination of the work of a number of different subject matter experts within IBM who contributed ideas, detailed technical information, and the occasional photograph and section of description. -
IBM Power Systems Performance Report Apr 13, 2021
IBM Power Performance Report Power7 to Power10 September 8, 2021 Table of Contents 3 Introduction to Performance of IBM UNIX, IBM i, and Linux Operating System Servers 4 Section 1 – SPEC® CPU Benchmark Performance 4 Section 1a – Linux Multi-user SPEC® CPU2017 Performance (Power10) 4 Section 1b – Linux Multi-user SPEC® CPU2017 Performance (Power9) 4 Section 1c – AIX Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8) 5 Section 1d – Linux Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8) 6 Section 2 – AIX Multi-user Performance (rPerf) 6 Section 2a – AIX Multi-user Performance (Power8, Power9 and Power10) 9 Section 2b – AIX Multi-user Performance (Power9) in Non-default Processor Power Mode Setting 9 Section 2c – AIX Multi-user Performance (Power7 and Power7+) 13 Section 2d – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power8) 15 Section 2e – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power7 and Power7+) 20 Section 3 – CPW Benchmark Performance 19 Section 3a – CPW Benchmark Performance (Power8, Power9 and Power10) 22 Section 3b – CPW Benchmark Performance (Power7 and Power7+) 25 Section 4 – SPECjbb®2015 Benchmark Performance 25 Section 4a – SPECjbb®2015 Benchmark Performance (Power9) 25 Section 4b – SPECjbb®2015 Benchmark Performance (Power8) 25 Section 5 – AIX SAP® Standard Application Benchmark Performance 25 Section 5a – SAP® Sales and Distribution (SD) 2-Tier – AIX (Power7 to Power8) 26 Section 5b – SAP® Sales and Distribution (SD) 2-Tier – Linux on Power (Power7 to Power7+) -
Towards a Portable Hierarchical View of Distributed Shared Memory Systems: Challenges and Solutions
Towards A Portable Hierarchical View of Distributed Shared Memory Systems: Challenges and Solutions Millad Ghane Sunita Chandrasekaran Margaret S. Cheung Department of Computer Science Department of Computer and Information Physics Department University of Houston Sciences University of Houston TX, USA University of Delaware Center for Theoretical Biological Physics, [email protected],[email protected] DE, USA Rice University [email protected] TX, USA [email protected] Abstract 1 Introduction An ever-growing diversity in the architecture of modern super- Heterogeneity has become increasingly prevalent in the recent computers has led to challenges in developing scientifc software. years given its promising role in tackling energy and power con- Utilizing heterogeneous and disruptive architectures (e.g., of-chip sumption crisis of high-performance computing (HPC) systems [15, and, in the near future, on-chip accelerators) has increased the soft- 20]. Dennard scaling [14] has instigated the adaptation of heteroge- ware complexity and worsened its maintainability. To that end, we neous architectures in the design of supercomputers and clusters need a productive software ecosystem that improves the usability by the HPC community. The July 2019 TOP500 [36] report shows and portability of applications for such systems while allowing how 126 systems in the list are heterogeneous systems confgured every parallelism opportunity to be exploited. with one or many GPUs. This is the prevailing trend in current gen- In this paper, we outline several challenges that we encountered eration of supercomputers. As an example, Summit [31], the fastest in the implementation of Gecko, a hierarchical model for distributed supercomputer according to the Top500 list (June 2019) [36], has shared memory architectures, using a directive-based program- two IBM POWER9 processors and six NVIDIA Volta V100 GPUs. -
Upgrade to POWER9 Planning Checklist
Achieve your full business potential with IBM POWER9 Future-forward infrastructure designed to crush data-intensive workloads Upgrade to POWER9 Planning Checklist Using this checklist will help to ensure that your infrastructure strategy is aligned to your needs, avoiding potential cost overruns or capability shortfalls. 1 Determine current and future capacity 5 Identify all dependencies for major database requirements. Bring your team together, platforms, including Oracle, DB2, SAP HANA, assess your current application workload and open-source databases like EnterpriseDB, requirements and three- to five-year MongoDB, neo4j, and Redis. You’re likely outlook. You’ll then have a good picture running major databases on the Power of when and where application growth Systems platform; co-locating your current will take place, enabling you to secure servers may be a way to reduce expenditure capacity at the appropriate time on an and increase flexibility. as-needed basis. 6 Understand current and future data center 2 Assess operational efficiencies and identify environmental requirements. You may be opportunities to improve service levels unnecessarily overspending on power, while decreasing exposure to security and cooling and space. Savings here will help your compliancy issues/problems. With new organization avoid costs associated with data technologies that allow you to easily adjust center expansion. capacity, you will be in a much better position to lower costs, improve service Identify the requirements of your strategy for levels, and increase efficiency. 7 on- and off-premises cloud infrastructure. As you move to the cloud, ensure you 3 Create a detailed inventory of servers have a strong strategy to determine which across your entire IT infrastructure. -
ESE 150 – Tasks? DIGITAL AUDIO BASICS ESE150 Spring 2021 Based on Slides © 2009--2021 Dehon 1 2
3/31/21 ESE150 Spring 2021 OBSERVATION Ò We want our phones (and computers) to do many things at once. Ò If we dedicate a processor to MP3 decoding É It will sit idle most of the time Ò MP3 decoding (and many other things) do not consume a modern processor Lecture #18 – Operating Systems (OS) Ò Idea: Maybe we can share the processor among ESE 150 – tasks? DIGITAL AUDIO BASICS ESE150 Spring 2021 Based on slides © 2009--2021 DeHon 1 2 ESE150 Spring 2021 ESE150 Spring 2021 BIG IDEA OUTLINE Ò Provide the illusion of (virtually) unlimited Ò Review processors by sharing single processor in time Ò Worksheet: Virtualization In Action Ò Strategy É Time-share processor Ð Store all process (virtual processors) state in memory Ð Iterate through processes × Restore process state × Run for a number of cycles × Save process state 3 4 ESE150 Spring 2021 ESE150 Spring 2021 COURSE MAP – WEEK 10 ROLE OF OPERATING SYSTEM MIC Ò Higher-level, shared support for all programs A/D É Could put it in program, but most programs need it! É Needs to be abstracted from program Music 1 domain conversion 5,6 compress Ò ResourCe sharing Numbers É Processor, memory, “devices” (net, printer, audio) correspond to pyscho- course weeks sample freq Ò acoustics 3 Polite sharing 2 4 É Isolation and protection EULA É Fences make Good Neighbors – R. Frost -------- D/A 10101001101 -------- Ò Idea: Expensive/limited resources can be shared in click time – OS manages tHis sHaring OK speaker MP3 Player / iPhone / Droid 5 6 1 3/31/21 ESE150 Spring 2021 ESE150 Spring 2021 IDEA Ò Virtualize -
AC922 Data Movement for CORAL
AC922 Data Movement for CORAL Roberts, Steve Ramanna, Pradeep Walthour, John IBM IBM IBM Cognitive Systems Cognitive Systems Cognitive Systems Austin, TX Austin, TX Austin, TX [email protected] [email protected] [email protected] Abstract—Recent publications have considered the challenge of and GPU processing elements associated with their respective movement in and out of the high bandwidth memory in attempt DRAM or high bandwidth memory (HBM2). Each processor to maximize GPU utilization and minimize overall application element creates a NUMA domain which in total encompasses wall time. This paper builds on previous contributions, [5] [17], which simulate software models, advocated optimization, and over > 2PB worth of total memory (see Table I for total suggest design considerations. This contribution characterizes the capacity). data movement innovations of the AC922 nodes IBM delivered to Oak Ridge National Labs and Lawrence Livermore National TABLE I Labs as part of the 2014 Collaboration of Oak Ridge, Argonne, CORAL SYSTEMS MEMORY SUMMARY and Livermore (CORAL) joint procurement activity. With a single HPC system able to perform up to 200PF of processing Lab Nodes Sockets DRAM (TB) GPUs HBM2 (TB) with access to 2.5PB of memory, this architecture motivates a ORNL 4607 9216 2,304 27648 432 careful look at data movement. The AC922 POWER9 system LLNL 4320 8640 2,160 17280 270 with NVIDIA V100 GPUs have cache line granularity, more than double the bandwidth of PCIe Gen3, low latency interfaces Efficient programming models call for accessing system and are interconnected by dual-rail Mellanox CAPI/EDR HCAs. memory with as little data replication as possible and As such, the bandwidth and latency assumptions from previous with low instruction overhead. -
POWER10 Processor Chip
POWER10 Processor Chip Technology and Packaging: PowerAXON PowerAXON - 602mm2 7nm Samsung (18B devices) x x 3 SMT8 SMT8 SMT8 SMT8 3 - 18 layer metal stack, enhanced device 2 Core Core Core Core 2 - Single-chip or Dual-chip sockets + 2MB L2 2MB L2 2MB L2 2MB L2 + 4 4 SMP, Memory, Accel, Cluster, PCI Interconnect Cluster,Accel, Memory, SMP, Computational Capabilities: PCI Interconnect Cluster,Accel, Memory, SMP, Local 8MB - Up to 15 SMT8 Cores (2 MB L2 Cache / core) L3 region (Up to 120 simultaneous hardware threads) 64 MB L3 Hemisphere Memory Signaling (8x8 OMI) (8x8 Signaling Memory - Up to 120 MB L3 cache (low latency NUCA mgmt) OMI) (8x8 Signaling Memory - 3x energy efficiency relative to POWER9 SMT8 SMT8 SMT8 SMT8 - Enterprise thread strength optimizations Core Core Core Core - AI and security focused ISA additions 2MB L2 2MB L2 2MB L2 2MB L2 - 2x general, 4x matrix SIMD relative to POWER9 - EA-tagged L1 cache, 4x MMU relative to POWER9 SMT8 SMT8 SMT8 SMT8 Core Core Core Core Open Memory Interface: 2MB L2 2MB L2 2MB L2 2MB L2 - 16 x8 at up to 32 GT/s (1 TB/s) - Technology agnostic support: near/main/storage tiers - Minimal (< 10ns latency) add vs DDR direct attach 64 MB L3 Hemisphere PowerAXON Interface: - 16 x8 at up to 32 GT/s (1 TB/s) SMT8 SMT8 SMT8 SMT8 x Core Core Core Core x - SMP interconnect for up to 16 sockets 3 2MB L2 2MB L2 2MB L2 2MB L2 3 - OpenCAPI attach for memory, accelerators, I/O 2 2 + + - Integrated clustering (memory semantics) 4 4 PCIe Gen 5 PCIe Gen 5 PowerAXON PowerAXON PCIe Gen 5 Interface: Signaling (x16) Signaling -
Product Datasheet [email protected]
0800 064 64 64 Product Datasheet [email protected] Apple iPhone 11 Pro Specification BATTERY IP RATING Battery: 3,046 mAh IP Rating: IP68 CAMERA MEMORY Main Camera: 12MP / 12MP / 12MP Memory: Internal Selfie Camera: 12MP MEMORY TYPE CHARGING 64 GB 4 GB RAM / 256 GB 4 Memory Type: GB RAM / 512 GB 4GB RAM Wireless Yes Charging: Then there was Pro. Pro Cameras. Pro Charging 2.0, proprietary reversible OS display. Pro performance. Type: connector iOS 13, upgradable to iOS OS: Fast Charging: Yes 13.3 A transformative triple-camera system that adds tons of capability without CHIPSET SIM TYPE complexity. An unprecedented leap in Chipset: Apple A13 Bionic (7 nm+) Sim Type: Nano-SIM and/or eSIM battery life. And a mind-blowing chip that elevates machine learning and pushes COLOURS SOUND the boundaries. Welcome to the first Contact your Account Colours: Sound: Yes iPhone powerful enough to be called Pro. Manager The iPhone 11 Pro 5.8” (diagonal) all- TECHNOLOGY CONNECTIVITY screen OLED Super Retina XDR Display GSM / CDMA / HSPA / Technology: makes it easy to view documents, read Wi-Fi: Yes EVDO / LTE emails and brings videos to life. GPS: Yes NFC: Yes VIDEO iPhone 11 Pro's Ultra-Wide Bluetooth: 5.0, A2DP, LE 2160p@24/30/60fps, camera shows users what’s happening Video: 1080p@30/60/120fps, gyro- outside the frame — and lets them CPU EIS capture it. Unleash creativity or capture Hexa-core (2x2.65 GHz high-quality video for that any occasion. CPU: Lightning + 4x1.8 GHz DIMENSIONS AND WEIGHT Thunder) Shoots sharp 4K video with cinematic Weight: 188g features. -
Analog Circuits: World Class Designs Robert A
Analog Circuits World Class Designs Newnes World Class Designs Series Analog Circuits: World Class Designs Robert A. Pease ISBN: 978-0-7506-8627-3 Embedded Systems: World Class Designs Jack Ganssle ISBN: 978-0-7506-8625-9 Power Sources and Supplies: World Class Designs Marty Brown ISBN:978-0-7506-8626-6 For more information on these and other Newnes titles, visit: www.newnespress.com Analog Circuits World Class Designs Robert A. Pease, Editor with Bonnie Baker Richard S. Burwen Sergio Franco Phil Perkins Marc Thompson Jim Williams Steve Winder AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Newnes is an imprint of Elsevier Newnes is an imprint of Elsevier 30 Corporate Drive, Suite 400, Burlington, MA 01803, USA Linacre House, Jordan Hill, Oxford OX2 8DP, UK Copyright © 2008, by Elsevier Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (ϩ44) 1865 843830, fax: (ϩ44) 1865 853333, E-mail: [email protected]. You may also complete your request online via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.” Recognizing the importance of preserving what has been written, Elsevier prints its books on acid-free paper whenever possible. Library of Congress Cataloging-in-Publication Data Application Submitted British Library Cataloguing-in-Publication Data A Catalogue record for this book is available from the British Library. -
Bob Pease Lab Notes Part 9.Pdf
934 PROCEEDINGS OF THE IEEE, VOL. 68, NO. 7, JULY 1980 A test circuit of Fig. 1 was constructed using standard 741 type OA’s, monolithic transistorarrays and 0.1 percent resistors. Initially,with i = 0, the mirror gains were equalized by removing R1 and R4 (thus assuring I+ = I-) ad adjusting R 5 to obtain io = 0. For VCCS opera- tion, G, was measured leaving port B-E’ open (i = 0), while for CCCS operation, port A-A’ was shorted to ground making u1 = u2 = 0. To minimize scaling errordue to frnite gain Aof the OA’s, theratio (Rz + R3)/R1 was restricted to a maximum value of 100. ForA - lo5, this assures less than 0.1 percent error at dc due to finite OA gain. Using various resistance combinations, values of G, up to 0.5 mho and Ai as large as 400 were obtained with a scaling error of the order of 1 percent, which was largely due to nonunity gain of the mirrors. Indivi- dual trimmingof the mirrors to precisely unity gain (tO.1 percent) Fig. 1. Oscillator network in the above letter tuned for an oscillation resulted in scaling errorsof the orderof resistance tolerances. The frequency of 10 Hz. (x-axis; 50 ms/cm; y-axis: 2 V/cm) small signal performance of the convertor is shown in Fig. 2(a) for the case where R1 = - and Rz = R3 = 0. Here the circuit is a VCCS with G, = (l/R4). For large R4 the bandwidth is equal to the GBW of the the input of the next “G” block, will cause phase errors, causing the OA’s (-1 MHz).