Understanding IBM Pseries Performance and Sizing

Total Page:16

File Type:pdf, Size:1020Kb

Understanding IBM Pseries Performance and Sizing Understanding IBM pSeries Performance and Sizing Comprehend IBM RS/6000 and IBM ^ pSeries hardware architectures Get an overview of current industry benchmarks Understand how to size your system Nigel Trickett Tatsuhiko Nakagawa Ravi Mani Diana Gfroerer ibm.com/redbooks SG24-4810-01 International Technical Support Organization Understanding IBM ^ pSeries Performance and Sizing February 2001 Take Note! Before using this information and the product it supports, be sure to read the general information in Appendix A, “Special notices” on page 377. Second Edition (February 2001) This edition applies to IBM RS/6000 and IBM ^ pSeries as of December 2000, and Version 4.3.3 of the AIX operating system. This document was updated on January 24, 2003. Comments may be addressed to: IBM Corporation, International Technical Support Organization Dept. JN9B Building 003 Internal Zip 2834 11400 Burnet Road Austin, Texas 78758-3493 When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you. © Copyright International Business Machines Corporation 1997, 2001. All rights reserved. Note to U.S Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp. Contents Preface. 9 The team that wrote this redbook. 9 Comments welcome. 11 Chapter 1. Introduction . 1 Chapter 2. Background . 5 2.1 Performance of processors . 5 2.2 Hardware architectures . 6 2.2.1 RISC/CISC concepts . 6 2.2.2 Superscalar architecture: pipeline and parallelism . 7 2.2.3 Memory management . 11 2.2.4 PCI. 19 2.2.5 MP implementation specifics . 21 2.2.6 NUMA. 24 2.2.7 Logical partitioning (LPAR) . 26 2.3 AIX kernel . 28 2.3.1 Description . 28 2.3.2 Executable file formats . 29 2.3.3 Kernel and user mode . 30 2.3.4 I/O . 30 2.3.5 Context/Thread switches . 31 2.3.6 Virtual address space . 31 2.3.7 Demand paging . 32 2.3.8 Kernel scalability enhancements . 32 2.3.9 References. 32 2.4 64-bit architecture . 33 2.4.1 Concepts . 33 2.4.2 Addressability. 34 2.4.3 Advantages of 64-bit architecture . 35 2.4.4 Performance of 64-bit architecture . 36 2.4.5 Software considerations for 64-bit architecture . 37 2.4.6 64-bit operating system capabilities . 37 Chapter 3. IBM RS/6000 and IBM pSeries architectures . 39 3.1 POWER2 Super Chip . 39 3.2 POWER3 . 41 3.2.1 POWER3 execution core . 43 3.2.2 Memory access section . 44 3.2.3 POWER 3 II chip . 47 3.3 PowerPC . 50 © Copyright IBM Corp. 2001 3 3.3.1 PowerPC 604 and 604e . 50 3.3.2 Differences between 604 and 604e processors . 51 3.3.3 RS64 II processor. 52 3.3.4 RS64 III processor . 60 3.3.5 POWER4 . 72 Chapter 4. IBM RS/6000 and IBM pSeries products . 77 4.1 Symmetrical Multiprocessor (SMP). 77 4.1.1 Migrating to SMP . 77 4.1.2 Symmetrical Multiprocessor (SMP) concepts and architecture . 78 4.1.3 Software . 86 4.1.4 Scaling . 93 4.1.5 References. 98 4.2 Scalable POWERparallel (SP) . 98 4.2.1 Parallel architecture . 98 4.2.2 IBM SP (Scalable POWERparallel) system. 100 4.2.3 SP switch performance. 104 4.2.4 Shared disk components of Parallel System Support Programs 110 4.2.5 Sizing and configuring a control workstation . 113 4.2.6 Sizing and configuring an SP system . 115 4.2.7 Resources . 131 Chapter 5. Hardware. 133 5.1 Processors . 133 5.2 Memory . 134 5.2.1 Cache memory . 134 5.2.2 Addressing considerations . 135 5.2.3 Memory cycles . 140 5.2.4 Uniprocessor vs. symmetric multiprocessor memory cycles . 141 5.2.5 Miss rate penalty . 143 5.2.6 Effect of L2 cache. 144 5.2.7 Effect of processor speed . 146 5.3 Storage . 148 5.3.1 Performance view. 149 5.3.2 Levels of storage . 150 5.3.3 How an I/O request is processed . 151 5.3.4 How a disk works . 153 5.3.5 SCSI technology. 155 5.3.6 Serial Storage Architecture (SSA) . 160 5.3.7 RAID levels overview and performance considerations. 162 5.3.8 IBM Enterprise Storage Server (ESS). 167 5.3.9 Logical Volume Manager (LVM) concepts. 168 5.3.10 Raw logical volumes versus Journaled File Systems (JFS). 174 4 RS/6000 and IBM ^ pSeries Performance and Sizing 5.4 Asynchronous Communication adapters. 176 5.4.1 Terms used in serial communication. 176 5.4.2 Flow control . 178 5.4.3 Asynchronous adapter overview . 179 5.4.4 Evaluating asynchronous communications options . 179 5.4.5 Product selection considerations . 183 5.4.6 Topology considerations. 187 5.5 LAN/WAN Adapters . 187 5.5.1 Ethernet . 187 5.5.2 Token Ring. 190 5.5.3 Fibre Channel. 191 5.5.4 ATM . 192 5.5.5 General network tuning recommendations . 194 5.6 Graphics accelerators . 196 5.6.1 Currently available RS/6000 graphics accelerators. 199 5.6.2 IBM’s graphics workstations . ..
Recommended publications
  • IBM Systems Group
    IBM Systems Group IBM ~ p5 520 IBM ~ p5 550 Milan Patel Offering Manager, pSeries Entry Servers v 3.04 IBM Confidential © 2004 IBM Corporation This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to clients. IBM Systems Group Agenda ❧Learning Objectives ❧Offering Description – p5-520 1.65GHz (2 way) – p5-550 1.65GHz (2-4way) ❧Selling Scenarios ❧Pricing and Positioning – The Value Paks New – Portfolio Positioning – Market Positioning – Retention Positioning ❧Target Sectors and Applications ❧Speeds and Feeds © 2004 IBM Corporation page 2 Template Version 3.04 IBM Systems Group Field Skills & Education IBM Confidential IBM Systems Group Learning Objectives At the conclusion of this material, you should be able to: ❧ Articulate the key messages and value-prop of ~ p5-520, ~ p5-550 ❧ Identify the opportunities and target sectors for the ~ p5-520, ~ p5-550 ❧ Articulate the enhancements and benefits of the ~ p5-520, ~ p5-550 ❧ Explain how Advanced POWER™ virtualization can help to reduce costs and simplify customer environments © 2004 IBM Corporation page 3 Template Version 3.04 IBM Systems Group Field Skills & Education IBM Confidential IBM Systems Group Offerings Overview p5-520 and p5-550 © 2004 IBM Corporation page 4 Template Version 3.04 IBM Systems Group Field Skills & Education IBM Confidential IBM Systems Group IBM ~ p5 520 What is p5-520? ❧ p5-520 is a 2 way entry server that complements the p5-550 in the entry space ❧ p5-520 delivers - – Outstanding performance –
    [Show full text]
  • Understanding IBM RS/6000 Performance and Sizing February 1997
    SG24-4810-00 Understanding IBM RS/6000 Performance and Sizing February 1997 IBML International Technical Support Organization SG24-4810-00 Understanding IBM RS/6000 Performance and Sizing February 1997 Take Note! Before using this information and the product it supports, be sure to read the general information in Appendix A, “Special Notices” on page 297. First Edition (February 1997) This edition applies to IBM RS/6000 for use with the AIX Operating System Version 4. Comments may be addressed to: IBM Corporation, International Technical Support Organization Dept. JN9B Building 045 Internal Zip 2834 11400 Burnet Road Austin, Texas 78758-3493 When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you. Copyright International Business Machines Corporation 1997. All rights reserved. Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp. Contents Figures . ix Tables . xiii Preface . xv How This Redbook Is Organized ........................... xv The Team That Wrote This Redbook ........................ xvi Comments Welcome . xvii Chapter 1. Introduction . 1 1.1 Meaningless Indicators of Performance .................... 2 1.2 Meaningful Indicators of Performance ..................... 4 Chapter 2. Background . 5 2.1 Performance of Processors ............................ 5 2.2 Hardware Architecture ............................... 5 2.2.1 RISC/CISC Concepts . 6 2.2.2 Superscalar Architecture: Pipeline & Parallelism ............ 7 2.2.3 Memory Management . 8 2.2.4 MP Implementation Specifics ........................ 16 2.3 The Kernel . 18 2.3.1 Responsibilities .
    [Show full text]
  • Power4 Focuses on Memory Bandwidth IBM Confronts IA-64, Says ISA Not Important
    VOLUME 13, NUMBER 13 OCTOBER 6,1999 MICROPROCESSOR REPORT THE INSIDERS’ GUIDE TO MICROPROCESSOR HARDWARE Power4 Focuses on Memory Bandwidth IBM Confronts IA-64, Says ISA Not Important by Keith Diefendorff company has decided to make a last-gasp effort to retain control of its high-end server silicon by throwing its consid- Not content to wrap sheet metal around erable financial and technical weight behind Power4. Intel microprocessors for its future server After investing this much effort in Power4, if IBM fails business, IBM is developing a processor it to deliver a server processor with compelling advantages hopes will fend off the IA-64 juggernaut. Speaking at this over the best IA-64 processors, it will be left with little alter- week’s Microprocessor Forum, chief architect Jim Kahle de- native but to capitulate. If Power4 fails, it will also be a clear scribed IBM’s monster 170-million-transistor Power4 chip, indication to Sun, Compaq, and others that are bucking which boasts two 64-bit 1-GHz five-issue superscalar cores, a IA-64, that the days of proprietary CPUs are numbered. But triple-level cache hierarchy, a 10-GByte/s main-memory IBM intends to resist mightily, and, based on what the com- interface, and a 45-GByte/s multiprocessor interface, as pany has disclosed about Power4 so far, it may just succeed. Figure 1 shows. Kahle said that IBM will see first silicon on Power4 in 1Q00, and systems will begin shipping in 2H01. Looking for Parallelism in All the Right Places With Power4, IBM is targeting the high-reliability servers No Holds Barred that will power future e-businesses.
    [Show full text]
  • Eserver P5 Capacity on Demand
    IBM Systems Group eServer p5 Capacity On Demand Dave Nypaver Marketing Manager, On Demand and Software © 2004 IBM Corporation This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers. IBM Systems Group Agenda Topics to be covered are: – Why Capacity on Demand is Value to Clients – Differences in Capacity on Demand between eServer p5 and pSeries – eServer p5 Capacity on Demand offering details © 2004 IBM Corporation page 2 IBM Systems Group Field Skills & Education IBM Systems Group IBM eServer pSeries and p5 Capacity on Demand After completed this topic, you should be able to: ❧ Explain why On Demand features are important to your clients ❧ Highlight eServer p5 Capacity on Demand abilities ❧ Describe differences between pSeries and eServer p5 on demand features page 3 © 2004 IBM Corporation page 3 IBM Systems Group Field Skills & Education IBM Systems Group Why is On Demand Important to you? ❧ Server capacity you need, when you need it ❧ Address your non-disruptive growth needs ❧ Build in flexibility to address spikes in demand ❧ Increased configuration flexibility ❧ Increased reliablity ❧ Deploy new services quickly ❧ Gain additional workload throughput and automated systems performance leveling Customer Capacity Growth Defer 70-80% of Temporary Capacity payment for on Demand inactive capacity… Permanent Capacity Upgrade Planned on Demand (CUoD) Actual *Capacity on Demand Features available on selected models of eServer p5 © 2004 IBM Corporation page 4 IBM
    [Show full text]
  • Examining the Viability of FPGA Supercomputing
    1 Examining the Viability of FPGA Supercomputing Stephen D. Craven and Peter Athanas Bradley Department of Electrical and Computer Engineering Virginia Polytechnic Institute and State University Blacksburg, VA 24061 USA email: {scraven,athanas}@vt.edu Abstract—For certain applications, custom computational hardware created using field programmable gate arrays (FPGAs) produces significant performance improvements over processors, leading some in academia and industry to call for the inclusion of FPGAs in supercomputing clusters. This paper presents a comparative analysis of FPGAs and traditional processors, focusing on floating- point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general High-Performance Computing (HPC). Index Terms— computational accelerator, digital arithmetic, Field programmable gate arrays, high- performance computing, supercomputers. I. INTRODUCTION Supercomputers have experienced a recent resurgence, fueled by government research dollars and the development of low-cost supercomputing clusters. Unlike the Massively Parallel Processor (MPP) designs found in Cray and CDC machines of the 70s and 80s, featuring proprietary processor architectures, many modern supercomputing clusters are constructed from commodity PC processors, significantly reducing procurement costs. In an effort to improve performance, several companies offer machines that place one or more FPGAs in each node of the cluster. Configurable logic devices, of which FPGAs are one example, permit the device’s hardware to be programmed multiple times after manufacture. A wide body of research over two decades has repeatedly demonstrated significant performance improvements for certain classes of applications when implemented within an FPGA’s configurable logic [1]. Applications well suited to speed-up by FPGAs typically exhibit massive parallelism and small integer or fixed-point data types.
    [Show full text]
  • Overview of the SPEC Benchmarks
    9 Overview of the SPEC Benchmarks Kaivalya M. Dixit IBM Corporation “The reputation of current benchmarketing claims regarding system performance is on par with the promises made by politicians during elections.” Standard Performance Evaluation Corporation (SPEC) was founded in October, 1988, by Apollo, Hewlett-Packard,MIPS Computer Systems and SUN Microsystems in cooperation with E. E. Times. SPEC is a nonprofit consortium of 22 major computer vendors whose common goals are “to provide the industry with a realistic yardstick to measure the performance of advanced computer systems” and to educate consumers about the performance of vendors’ products. SPEC creates, maintains, distributes, and endorses a standardized set of application-oriented programs to be used as benchmarks. 489 490 CHAPTER 9 Overview of the SPEC Benchmarks 9.1 Historical Perspective Traditional benchmarks have failed to characterize the system performance of modern computer systems. Some of those benchmarks measure component-level performance, and some of the measurements are routinely published as system performance. Historically, vendors have characterized the performances of their systems in a variety of confusing metrics. In part, the confusion is due to a lack of credible performance information, agreement, and leadership among competing vendors. Many vendors characterize system performance in millions of instructions per second (MIPS) and millions of floating-point operations per second (MFLOPS). All instructions, however, are not equal. Since CISC machine instructions usually accomplish a lot more than those of RISC machines, comparing the instructions of a CISC machine and a RISC machine is similar to comparing Latin and Greek. 9.1.1 Simple CPU Benchmarks Truth in benchmarking is an oxymoron because vendors use benchmarks for marketing purposes.
    [Show full text]
  • The Interactive Performance of SLIM: a Stateless, Thin-Client Architecture ✽ ✽ ✝ Brian K
    17th ACM Symposium on Operating Systems Principles (SOSP’99) Published as Operating Systems Review, 34(5):32–47, December 1999 The interactive performance of SLIM: a stateless, thin-client architecture ✽ ✽ ✝ Brian K. Schmidt , Monica S. Lam , J. Duane Northcutt ✽ Computer Science Department, Stanford University {bks, lam}@cs.stanford.edu ✝ Sun Microsystems Laboratories [email protected] Abstract 1 Introduction Taking the concept of thin clients to the limit, this paper Since the mid 1980’s, the computing environments of proposes that desktop machines should just be simple, many institutions have moved from large mainframe, stateless I/O devices (display, keyboard, mouse, etc.) that time-sharing systems to distributed networks of desktop access a shared pool of computational resources over a machines. This trend was motivated by the need to provide dedicated interconnection fabric — much in the same way everyone with a bit-mapped display, and it was made as a building’s telephone services are accessed by a possible by the widespread availability of collection of handset devices. The stateless desktop design high-performance workstations. However, the desktop provides a useful mobility model in which users can computing model is not without its problems, many of transparently resume their work on any desktop console. which were raised by the original UNIX designers[14]: This paper examines the fundamental premise in this “Because each workstation has private data, each system design that modern, off-the-shelf interconnection must be administered separately; maintenance is technology can support the quality-of-service required by difficult to centralize. The machines are replaced today’s graphical and multimedia applications.
    [Show full text]
  • Sun Ultratm 2 Workstation Just the Facts
    Sun UltraTM 2 Workstation Just the Facts Copyrights 1999 Sun Microsystems, Inc. All Rights Reserved. Sun, Sun Microsystems, the Sun Logo, Ultra, SunFastEthernet, Sun Enterprise, TurboGX, TurboGXplus, Solaris, VIS, SunATM, SunCD, XIL, XGL, Java, Java 3D, JDK, S24, OpenWindows, Sun StorEdge, SunISDN, SunSwift, SunTRI/S, SunHSI/S, SunFastEthernet, SunFDDI, SunPC, NFS, SunVideo, SunButtons SunDials, UltraServer, IPX, IPC, SLC, ELC, Sun-3, Sun386i, SunSpectrum, SunSpectrum Platinum, SunSpectrum Gold, SunSpectrum Silver, SunSpectrum Bronze, SunVIP, SunSolve, and SunSolve EarlyNotifier are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. OpenGL is a registered trademark of Silicon Graphics, Inc. UNIX is a registered trademark in the United States and other countries, exclusively licensed through X/Open Company, Ltd. Display PostScript and PostScript are trademarks of Adobe Systems, Incorporated. DLT is claimed as a trademark of Quantum Corporation in the United States and other countries. Just the Facts May 1999 Sun Ultra 2 Workstation Figure 1. The Sun UltraTM 2 workstation Sun Ultra 2 Workstation Scalable Computing Power for the Desktop Sun UltraTM 2 workstations are designed for the technical users who require high performance and multiprocessing (MP) capability. The Sun UltraTM 2 desktop series combines the power of multiprocessing with high-bandwidth networking, high-performance graphics, and exceptional application performance in a compact desktop package. Users of MP-ready and multithreaded applications will benefit greatly from the performance of the Sun Ultra 2 dual-processor capability.
    [Show full text]
  • 18-741 Advanced Computer Architecture Lecture 1: Intro And
    18-742 Fall 2012 Parallel Computer Architecture Lecture 10: Multithreading II Prof. Onur Mutlu Carnegie Mellon University 9/28/2012 Reminder: Review Assignments Due: Sunday, September 30, 11:59pm. Mutlu, “Some Ideas and Principles for Achieving Higher System Energy Efficiency,” NSF Position Paper and Presentation 2012. Ebrahimi et al., “Parallel Application Memory Scheduling,” MICRO 2011. Seshadri et al., “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,” PACT 2012. Pekhimenko et al., “Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency,” CMU SAFARI Technical Report 2012. 2 Feedback on Project Proposals In your email General feedback points Concrete mechanisms, even if not fully right, is a good place to start testing your ideas 3 Last Lecture Asymmetry in Memory Scheduling Wrap up Asymmetry Multithreading Fine-grained Coarse-grained 4 Today More Multithreading 5 More Multithreading 6 Readings: Multithreading Required Spracklen and Abraham, “Chip Multithreading: Opportunities and Challenges,” HPCA Industrial Session, 2005. Kalla et al., “IBM Power5 Chip: A Dual-Core Multithreaded Processor,” IEEE Micro 2004. Tullsen et al., “Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor,” ISCA 1996. Eyerman and Eeckhout, “A Memory-Level Parallelism Aware Fetch Policy for SMT Processors,” HPCA 2007. Recommended Hirata et al., “An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads,” ISCA 1992 Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978. Gabor et al., “Fairness and Throughput in Switch on Event Multithreading,” MICRO 2006. Agarwal et al., “APRIL: A Processor Architecture for Multiprocessing,” ISCA 1990. 7 Review: Fine-grained vs.
    [Show full text]
  • Performance of a Computer (Chapter 4) Vishwani D
    ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2013 Performance of a Computer (Chapter 4) Vishwani D. Agrawal & Victor P. Nelson epartment of Electrical and Computer Engineering Auburn University, Auburn, AL 36849 ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 1 What is Performance? Response time: the time between the start and completion of a task. Throughput: the total amount of work done in a given time. Some performance measures: MIPS (million instructions per second). MFLOPS (million floating point operations per second), also GFLOPS, TFLOPS (1012), etc. SPEC (System Performance Evaluation Corporation) benchmarks. LINPACK benchmarks, floating point computing, used for supercomputers. Synthetic benchmarks. ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 2 Small and Large Numbers Small Large 10-3 milli m 103 kilo k 10-6 micro μ 106 mega M 10-9 nano n 109 giga G 10-12 pico p 1012 tera T 10-15 femto f 1015 peta P 10-18 atto 1018 exa 10-21 zepto 1021 zetta 10-24 yocto 1024 yotta ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 3 Computer Memory Size Number bits bytes 210 1,024 K Kb KB 220 1,048,576 M Mb MB 230 1,073,741,824 G Gb GB 240 1,099,511,627,776 T Tb TB ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 4 Units for Measuring Performance Time in seconds (s), microseconds (μs), nanoseconds (ns), or picoseconds (ps). Clock cycle Period of the hardware clock Example: one clock cycle means 1 nanosecond for a 1GHz clock frequency (or 1GHz clock rate) CPU time = (CPU clock cycles)/(clock rate) Cycles per instruction (CPI): average number of clock cycles used to execute a computer instruction.
    [Show full text]
  • Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux
    Hindawi Publishing Corporation EURASIP Journal on Embedded Systems Volume 2008, Article ID 582648, 16 pages doi:10.1155/2008/582648 Research Article Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux Emiliano Betti,1 Daniel Pierre Bovet,1 Marco Cesati,1 and Roberto Gioiosa1, 2 1 System Programming Research Group, Department of Computer Science, Systems, and Production, University of Rome “Tor Vergata”, Via del Politecnico 1, 00133 Rome, Italy 2 Computer Architecture Group, Computer Science Division, Barcelona Supercomputing Center (BSC), c/ Jordi Girona 31, 08034 Barcelona, Spain Correspondence should be addressed to Roberto Gioiosa, [email protected] Received 30 March 2007; Accepted 15 August 2007 Recommended by Ismael Ripoll Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high responsive- ness, open-source hard real-time operating system for multiprocessor systems capable of providing high real-time performance while maintaining the code simple and not impacting on the performances of the rest of the system. Moreover, ASMP-LINUX does not require code changing or application recompiling/relinking. In order to assess the performances of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations. Copyright © 2008 Emiliano Betti et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 1. INTRODUCTION nificantly higher than that of single-core processors, we can expect that in a near future many embedded systems will This article describes a modified Linux kernel called ASMP- make use of multicore processors.
    [Show full text]
  • Power Architecture® ISA 2.06 Stride N Prefetch Engines to Boost Application's Performance
    Power Architecture® ISA 2.06 Stride N prefetch Engines to boost Application's performance History of IBM POWER architecture: POWER stands for Performance Optimization with Enhanced RISC. Power architecture is synonymous with performance. Introduced by IBM in 1991, POWER1 was a superscalar design that implemented register renaming andout-of-order execution. In Power2, additional FP unit and caches were added to boost performance. In 1996 IBM released successor of the POWER2 called P2SC (POWER2 Super chip), which is a single chip implementation of POWER2. P2SC is used to power the 30-node IBM Deep Blue supercomputer that beat world Chess Champion Garry Kasparov at chess in 1997. Power3, first 64 bit SMP, featured a data prefetch engine, non-blocking interleaved data cache, dual floating point execution units, and many other goodies. Power3 also unified the PowerPC and POWER Instruction set and was used in IBM's RS/6000 servers. The POWER3-II reimplemented POWER3 using copper interconnects, delivering double the performance at about the same price. Power4 was the first Gigahertz dual core processor launched in 2001 which was awarded the MicroProcessor Technology Award in recognition of its innovations and technology exploitation. Power5 came in with symmetric multi threading (SMT) feature to further increase application's performance. In 2004, IBM with 15 other companies founded Power.org. Power.org released the Power ISA v2.03 in September 2006, Power ISA v.2.04 in June 2007 and Power ISA v.2.05 with many advanced features such as VMX, virtualization, variable length encoding, hyper visor functionality, logical partitioning, virtual page handling, Decimal Floating point and so on which further boosted the architecture leadership in the market place and POWER5+, Cell, POWER6, PA6T, Titan are various compliant cores.
    [Show full text]