Power Architecture® ISA 2.06 Stride N Prefetch Engines to Boost Application's Performance
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Wind Rose Data Comes in the Form >200,000 Wind Rose Images
Making Wind Speed and Direction Maps Rich Stromberg Alaska Energy Authority [email protected]/907-771-3053 6/30/2011 Wind Direction Maps 1 Wind rose data comes in the form of >200,000 wind rose images across Alaska 6/30/2011 Wind Direction Maps 2 Wind rose data is quantified in very large Excel™ spreadsheets for each region of the state • Fields: X Y X_1 Y_1 FILE FREQ1 FREQ2 FREQ3 FREQ4 FREQ5 FREQ6 FREQ7 FREQ8 FREQ9 FREQ10 FREQ11 FREQ12 FREQ13 FREQ14 FREQ15 FREQ16 SPEED1 SPEED2 SPEED3 SPEED4 SPEED5 SPEED6 SPEED7 SPEED8 SPEED9 SPEED10 SPEED11 SPEED12 SPEED13 SPEED14 SPEED15 SPEED16 POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 WEIBC1 WEIBC2 WEIBC3 WEIBC4 WEIBC5 WEIBC6 WEIBC7 WEIBC8 WEIBC9 WEIBC10 WEIBC11 WEIBC12 WEIBC13 WEIBC14 WEIBC15 WEIBC16 WEIBK1 WEIBK2 WEIBK3 WEIBK4 WEIBK5 WEIBK6 WEIBK7 WEIBK8 WEIBK9 WEIBK10 WEIBK11 WEIBK12 WEIBK13 WEIBK14 WEIBK15 WEIBK16 6/30/2011 Wind Direction Maps 3 Data set is thinned down to wind power density • Fields: X Y • POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 • Power1 is the wind power density coming from the north (0 degrees). Power 2 is wind power from 22.5 deg.,…Power 9 is south (180 deg.), etc… 6/30/2011 Wind Direction Maps 4 Spreadsheet calculations X Y POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 Max Wind Dir Prim 2nd Wind Dir Sec -132.7365 54.4833 0.643 0.767 1.911 4.083 -
Copyrighted Material
CHAPTER 1 MULTI- AND MANY-CORES, ARCHITECTURAL OVERVIEW FOR PROGRAMMERS Lasse Natvig, Alexandru Iordan, Mujahed Eleyat, Magnus Jahre and Jorn Amundsen 1.1 INTRODUCTION 1.1.1 Fundamental Techniques Parallelism hasCOPYRIGHTED been used since the early days of computing MATERIAL to enhance performance. From the first computers to the most modern sequential processors (also called uni- processors), the main concepts introduced by von Neumann [20] are still in use. How- ever, the ever-increasing demand for computing performance has pushed computer architects toward implementing different techniques of parallelism. The von Neu- mann architecture was initially a sequential machine operating on scalar data with bit-serial operations [20]. Word-parallel operations were made possible by using more complex logic that could perform binary operations in parallel on all the bits in a computer word, and it was just the start of an adventure of innovations in parallel computer architectures. Programming Multicore and Many-core Computing Systems, 3 First Edition. Edited by Sabri Pllana and Fatos Xhafa. © 2017 John Wiley & Sons, Inc. Published 2017 by John Wiley & Sons, Inc. 4 MULTI- AND MANY-CORES, ARCHITECTURAL OVERVIEW FOR PROGRAMMERS Prefetching is a 'look-ahead technique' that was introduced quite early and is a way of parallelism that is used at several levels and in different components of a computer today. Both data and instructions are very often accessed sequentially. Therefore, when accessing an element (instruction or data) at address k, an auto- matic access to address k+1 will bring the element to where it is needed before it is accessed and thus eliminates or reduces waiting time. -
Power4 Focuses on Memory Bandwidth IBM Confronts IA-64, Says ISA Not Important
VOLUME 13, NUMBER 13 OCTOBER 6,1999 MICROPROCESSOR REPORT THE INSIDERS’ GUIDE TO MICROPROCESSOR HARDWARE Power4 Focuses on Memory Bandwidth IBM Confronts IA-64, Says ISA Not Important by Keith Diefendorff company has decided to make a last-gasp effort to retain control of its high-end server silicon by throwing its consid- Not content to wrap sheet metal around erable financial and technical weight behind Power4. Intel microprocessors for its future server After investing this much effort in Power4, if IBM fails business, IBM is developing a processor it to deliver a server processor with compelling advantages hopes will fend off the IA-64 juggernaut. Speaking at this over the best IA-64 processors, it will be left with little alter- week’s Microprocessor Forum, chief architect Jim Kahle de- native but to capitulate. If Power4 fails, it will also be a clear scribed IBM’s monster 170-million-transistor Power4 chip, indication to Sun, Compaq, and others that are bucking which boasts two 64-bit 1-GHz five-issue superscalar cores, a IA-64, that the days of proprietary CPUs are numbered. But triple-level cache hierarchy, a 10-GByte/s main-memory IBM intends to resist mightily, and, based on what the com- interface, and a 45-GByte/s multiprocessor interface, as pany has disclosed about Power4 so far, it may just succeed. Figure 1 shows. Kahle said that IBM will see first silicon on Power4 in 1Q00, and systems will begin shipping in 2H01. Looking for Parallelism in All the Right Places With Power4, IBM is targeting the high-reliability servers No Holds Barred that will power future e-businesses. -
POWER® Processor-Based Systems
IBM® Power® Systems RAS Introduction to IBM® Power® Reliability, Availability, and Serviceability for POWER9® processor-based systems using IBM PowerVM™ With Updates covering the latest 4+ Socket Power10 processor-based systems IBM Systems Group Daniel Henderson, Irving Baysah Trademarks, Copyrights, Notices and Acknowledgements Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: Active AIX® POWER® POWER Power Power Systems Memory™ Hypervisor™ Systems™ Software™ Power® POWER POWER7 POWER8™ POWER® PowerLinux™ 7® +™ POWER® PowerHA® POWER6 ® PowerVM System System PowerVC™ POWER Power Architecture™ ® x® z® Hypervisor™ Additional Trademarks may be identified in the body of this document. Other company, product, or service names may be trademarks or service marks of others. Notices The last page of this document contains copyright information, important notices, and other information. Acknowledgements While this whitepaper has two principal authors/editors it is the culmination of the work of a number of different subject matter experts within IBM who contributed ideas, detailed technical information, and the occasional photograph and section of description. -
Openpower AI CERN V1.Pdf
Moore’s Law Processor Technology Firmware / OS Linux Accelerator sSoftware OpenStack Storage Network ... Price/Performance POWER8 2000 2020 DRAM Memory Chips Buffer Power8: Up to 12 Cores, up to 96 Threads L1, L2, L3 + L4 Caches Up to 1 TB per socket https://www.ibm.com/blogs/syst Up to 230 GB/s sustained memory ems/power-systems- openpower-enable- bandwidth acceleration/ System System Memory Memory 115 GB/s 115 GB/s POWER8 POWER8 CPU CPU NVLink NVLink 80 GB/s 80 GB/s P100 P100 P100 P100 GPU GPU GPU GPU GPU GPU GPU GPU Memory Memory Memory Memory GPU PCIe CPU 16 GB/s System bottleneck Graphics System Memory Memory IBM aDVantage: data communication and GPU performance POWER8 + 78 ms Tesla P100+NVLink x86 baseD 170 ms GPU system ImageNet / Alexnet: Minibatch size = 128 ADD: Coherent Accelerator Processor Interface (CAPI) FPGA CAPP PCIe POWER8 Processor ...FPGAs, networking, memory... Typical I/O MoDel Flow Copy or Pin MMIO Notify Poll / Int Copy or Unpin Ret. From DD DD Call Acceleration Source Data Accelerator Completion Result Data Completion Flow with a Coherent MoDel ShareD Mem. ShareD Memory Acceleration Notify Accelerator Completion Focus on Enterprise Scale-Up Focus on Scale-Out and Enterprise Future Technology and Performance DriVen Cost and Acceleration DriVen Partner Chip POWER6 Architecture POWER7 Architecture POWER8 Architecture POWER9 Architecture POWER10 POWER8/9 2007 2008 2010 2012 2014 2016 2017 TBD 2018 - 20 2020+ POWER6 POWER6+ POWER7 POWER7+ POWER8 POWER8 P9 SO P9 SU P9 SO 2 cores 2 cores 8 cores 8 cores 12 cores w/ NVLink -
From Blue Gene to Cell Power.Org Moscow, JSCC Technical Day November 30, 2005
IBM eServer pSeries™ From Blue Gene to Cell Power.org Moscow, JSCC Technical Day November 30, 2005 Dr. Luigi Brochard IBM Distinguished Engineer Deep Computing Architect [email protected] © 2004 IBM Corporation IBM eServer pSeries™ Technology Trends As frequency increase is limited due to power limitation Dual core is a way to : 2 x Peak Performance per chip (and per cycle) But at the expense of frequency (around 20% down) Another way is to increase Flop/cycle © 2004 IBM Corporation IBM eServer pSeries™ IBM innovations POWER : FMA in 1990 with POWER: 2 Flop/cycle/chip Double FMA in 1992 with POWER2 : 4 Flop/cycle/chip Dual core in 2001 with POWER4: 8 Flop/cycle/chip Quadruple core modules in Oct 2005 with POWER5: 16 Flop/cycle/module PowerPC: VMX in 2003 with ppc970FX : 8 Flops/cycle/core, 32bit only Dual VMX+ FMA with pp970MP in 1Q06 Blue Gene: Low frequency , system on a chip, tight integration of thousands of cpus Cell : 8 SIMD units and a ppc970 core on a chip : 64 Flop/cycle/chip © 2004 IBM Corporation IBM eServer pSeries™ Technology Trends As needs diversify, systems are heterogeneous and distributed GRID technologies are an essential part to create cooperative environments based on standards © 2004 IBM Corporation IBM eServer pSeries™ IBM innovations IBM is : a sponsor of Globus Alliances contributing to Globus Tool Kit open souce a founding member of Globus Consortium IBM is extending its products Global file systems : – Multi platform and multi cluster GPFS Meta schedulers : – Multi platform -
Progress Codes
Power Systems Progress codes Power Systems Progress codes Note Before using this information and the product it supports, read the information in “Notices,” on page 109, “Safety notices” on page v, the IBM Systems Safety Notices manual, G229-9054, and the IBM Environmental Notices and User Guide, Z125–5823. This edition applies to IBM Power Systems™ servers that contain the POWER6® processor and to all associated models. © Copyright IBM Corporation 2007, 2009. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Safety notices ............v Chapter 13. (CAxx) Partition firmware progress codes ...........79 Chapter 1. Progress codes overview . 1 Chapter 14. (CF00) Linux kernel boot Chapter 2. AIX IPL progress codes . 3 progress codes ...........91 Chapter 3. AIX diagnostic load Chapter 15. (D1xx) Service processor progress indicators .........29 firmware progress codes .......93 Chapter 4. Dump progress indicators Chapter 16. (D1xx) Service processor (dump status codes) .........33 status progress codes ........95 Chapter 5. AIX crash progress codes Chapter 17. (D1xx) Service processor (category 1) ............35 dump status progress codes .....97 Chapter 6. AIX crash progress codes Chapter 18. (D1xx) Platform dump (category 2) ............37 status progress codes .......101 Chapter 7. AIX crash progress codes Chapter 19. (D2xx) Partition status (category 3) ............39 progress codes ..........103 Chapter 8. (C1xx) Service processor Chapter 20. (D6xx) General status progress codes ...........41 progress codes ..........105 Chapter 9. (C2xx) Virtual service Chapter 21. (D9xx) General status processor progress codes ......63 progress codes ..........107 Chapter 10. (C3xx, C5xx, C6xx) IPL Appendix. Notices .........109 status progress codes ........67 Trademarks ..............110 Electronic emission notices .........111 Chapter 11. -
Power1.Ps (Mpage)
Low Energy & Power Design Issues Low Power Design Problem • Processor trends Microprocessor Power • Circuit and Technology Issues (source ISSCC) 30 • Architectural optimizations • Low power µP research project 20 10 Power (Watt) 0 75 80 85 90 95 Year When supply voltage drops to 1Volt, then 100Watts = 100 Amps Slide 2 Portable devices Two Kinds of Computation Required • General purpose processing (what you have been Portable Functions studying so far) • Multimodal radio • Bursty - mostly idle with bursts of computation • Protocols, ECC, ... • Maximum possible throughput required during active • Voice I/O compression & periods decompression • Handwriting recognition • Signal processing (for multimedia, wireless Battery • Text/Graphics processing communications, etc.) (40+ lbs) • Video decompression • Stream based computation • Speech recognition • No advantage in increasing processing rate above • Java interpreter required for real-time requirements How to get 1 month of operation? Slide 3 Slide 4 Optimizing for Energy Consumption Switching Energy • Conventional General Purpose processors (e.g. Vdd Pentiums) • Performance is everything ... somehow we’ll get the Vin Vout power in and back out • 10-100 Watts, 100-1000 Mips = .01 Mips/mW CL • Energy Optimized but General Purpose • Keep the generality, but reduce the energy as much as 2 possible - e.g. StrongArm Energy/transition = CL * Vdd • .5 Watts, 160 Mips = .3 Mips/mW 2 Power = Energy/transition * f = CL * Vdd * f • Energy Optimized and Dedicated • 100 Mops/mW Slide 5 Slide 6 Low Power -
Chapter 1-Introduction to Microprocessors File
Chapter 1 Introduction to Microprocessors Expected Outcomes Explain the role of the CPU, memory and I/O device in a computer Distinguish between the microprocessor and microcontroller Differentiate various form of programming languages Compare between CISC vs RISC and Von Neumann vs Harvard architecture NMKNYFKEEUMP Introduction A microprocessor is an integrated circuit built on a tiny piece of silicon It contains thousands or even millions of transistors which are interconnected via superfine traces of aluminum The transistors work together to store and manipulate data so that the microprocessor can perform a wide variety of useful functions The particular functions a microprocessor perform are dictated by software The first microprocessor was the Intel 4004 (16-pin) introduced in 1971 containing 2300 transistors with 46 instruction sets Power8 processor, by contrast, contains 4.2 billion transistors NMKNYFKEEUMP Introduction Computer is an electronic machine that perform arithmetic operation and logic in response to instructions written Computer requires hardware and software to function Hardware is electronic circuit boards that provide functionality of the system such as power supply, cable, etc CPU – Central Processing Unit/Microprocessor Memory – store all programming and data Input/Output device – the flow of information Software is a programming that control the system operation and facilitate the computer usage Programming is a group of instructions that inform the computer to perform certain task NMKNYFKEEUMP Introduction Computer -
Introduction to the Cell Multiprocessor
Introduction J. A. Kahle M. N. Day to the Cell H. P. Hofstee C. R. Johns multiprocessor T. R. Maeurer D. Shippy This paper provides an introductory overview of the Cell multiprocessor. Cell represents a revolutionary extension of conventional microprocessor architecture and organization. The paper discusses the history of the project, the program objectives and challenges, the design concept, the architecture and programming models, and the implementation. Introduction: History of the project processors in order to provide the required Initial discussion on the collaborative effort to develop computational density and power efficiency. After Cell began with support from CEOs from the Sony several months of architectural discussion and contract and IBM companies: Sony as a content provider and negotiations, the STI (SCEI–Toshiba–IBM) Design IBM as a leading-edge technology and server company. Center was formally opened in Austin, Texas, on Collaboration was initiated among SCEI (Sony March 9, 2001. The STI Design Center represented Computer Entertainment Incorporated), IBM, for a joint investment in design of about $400,000,000. microprocessor development, and Toshiba, as a Separate joint collaborations were also set in place development and high-volume manufacturing technology for process technology development. partner. This led to high-level architectural discussions A number of key elements were employed to drive the among the three companies during the summer of 2000. success of the Cell multiprocessor design. First, a holistic During a critical meeting in Tokyo, it was determined design approach was used, encompassing processor that traditional architectural organizations would not architecture, hardware implementation, system deliver the computational power that SCEI sought structures, and software programming models. -
Foundation Overview February 2014
OpenPOWER Overview May 2015 Keith Brown Director, IBM Systems Technical Strategy & Product Security [email protected] http://openpowerfoundation.org/ © 2015 OpenPOWER Foundation What is the OpenPOWER Ecosystem? Cloud Software Existing ISV community of 800+ Standard Operating Open Environment Source All major Linux distros (System Mgmt) Software Communities Operating Open sourced Power8 System / KVM firmware stack New OSS Firmware OpenPOWER Resources for porting and Firmware Community optimizing on Hardware OpenPOWER OpenPOWERFoundation.org Technology 2 © 2015 OpenPOWER Foundation A Fast Start for OpenPOWER! The year • Collaborative solutions, standards, and reference designs available • Independent members solutions and systems ahead • Sector growth in technical computing and cloud • Global growth with increasing depth in all layers • Broad adoption across hardware, software, and end users 3 © 2015 OpenPOWER Foundation Fueling an Open Development Community 4 © 2015 OpenPOWER Foundation Critical workloads run on Linux on Power Web, Java Apps and Infrastructure Analytics & Research HPC applications for Life Sciences • Highly threaded • Compute intensive • Throughput oriented • High memory bandwidth • Scale out capable • Floating point • High quality of service • High I/O rates Business Applications Database • High quality of service • Handle peak workloads • Scalability • Scalability • Flexible infrastructure • High quality of service • Large memory footprint • Resiliency and security 5 © 2015 OpenPOWER Foundation IBM, Mellanox, and NVIDIA -
POWER8: the First Openpower Processor
POWER8: The first OpenPOWER processor Dr. Michael Gschwind Senior Technical Staff Member & Senior Manager IBM Power Systems #OpenPOWERSummit Join the conversation at #OpenPOWERSummit 1 OpenPOWER is about choice in large-scale data centers The choice to The choice to The choice to differentiate innovate grow . build workload • collaborative • delivered system optimized innovation in open performance solutions ecosystem • new capabilities . use best-of- • with open instead of breed interfaces technology scaling components from an open ecosystem Join the conversation at #OpenPOWERSummit Why Power and Why Now? . Power is optimized for server workloads . Power8 was optimized to simplify application porting . Power8 includes CAPI, the Coherent Accelerator Processor Interconnect • Building on a long history of IBM workload acceleration Join the conversation at #OpenPOWERSummit POWER8 Processor Cores • 12 cores (SMT8) 96 threads per chip • 2X internal data flows/queues • 64K data cache, 32K instruction cache Caches • 512 KB SRAM L2 / core • 96 MB eDRAM shared L3 • Up to 128 MB eDRAM L4 (off-chip) Accelerators • Crypto & memory expansion • Transactional Memory • VMM assist • Data Move / VM Mobility • Coherent Accelerator Processor Interface (CAPI) Join the conversation at #OpenPOWERSummit 4 POWER8 Core •Up to eight hardware threads per core (SMT8) •8 dispatch •10 issue •16 execution pipes: •2 FXU, 2 LSU, 2 LU, 4 FPU, 2 VMX, 1 Crypto, 1 DFU, 1 CR, 1 BR •Larger Issue queues (4 x 16-entry) •Larger global completion, Load/Store reorder queue •Improved branch prediction •Improved unaligned storage access •Improved data prefetch Join the conversation at #OpenPOWERSummit 5 POWER8 Architecture . High-performance LE support – Foundation for a new ecosystem . Organic application growth Power evolution – Instruction Fusion 1600 PowerPC .