The Powerpc 405 Core, Contact an IBM Microelectronics Sales Representative

Total Page:16

File Type:pdf, Size:1020Kb

The Powerpc 405 Core, Contact an IBM Microelectronics Sales Representative The PowerPC 405TM Core IBM Microelectronics Division Research Triangle Park, NC 27709 11/2/98 Overview The PowerPC 405 CPU Core is a new addition to the 32-bit RISC PowerPC Embedded Processor family. The 405 Core possesses all of the qualities necessary to make system-on-a-chip designs a reality. This core occupies a small die area, consumes minimal power, and provides a high performance 100% PowerPC compatible platform capable of taming the most demanding embedded applications. Target Applications The 405 Core enables high performance designs in which low cost, low power and versatility are the critical selection criteria. Target market segments for the 405 core include: • Consumer video applications including digital cameras, video games, set-top boxes, and soft modems • Portable products such as cellular phones, PDAs, and handheld GPS receivers • Office automation products such as printer, X-terminals, and fax machines • Networking and storage products such as disk drive controllers, routers, ATM switches, high performance modems, and network cards Typical Application A typical system on a chip design with the 405 Core uses a two level bus structure for system level communication. High bandwidth peripherals and the 405 Core communicate with one another over the processor local bus (PLB). Less demanding peripherals share the on-chip peripheral bus (OPB) and communicate to the PLB through the OPB Bridge. The PLB and OPB provide common interfaces for peripherals and enable quick turnaround, custom solutions for high volume applications. Figure 2 shows an example 405 Core-based system on a chip, illustrating the two-level bus structure and modular core-based design. External Bus Interface MMU Ext. Bus PCI RAM/ROM/ Bridge SDRAM Master Ctr PCMCIA PowerPC Controller 405 CPU Controller ATM 622 SAR I-Cache Code Decompression D-Cache MAL USB Timers P L B JTAG Trace Ethernet OPB DMA PowerPC 405x3 Bridge Controller Embedded Core Arbiter OPB OCM Interrupt Arbiter Control Control APU 2 IC GPIO UART UART HDLC HDLC HDLC 4KB HDLC SRAM Figure 2. Example 405x3 core+ASIC PowerPC 405 Core Page 2 of 14 11/2/98 Specifications • Technology: 0.25 µm CMOS process (0.18 µm Leff) • Frequency: 0-200MHz (200MHz at WC process, 85C, 2.3V) – 276MHz nominal • Performance: 228 Dhrystone 2.1 MIPS@200MHz (est.) • Supply Voltage: 2.5V • Die Size: 2.0 mm2 for CPU only (est.) • Power (typ.): 400mW @ 200MHz (est.), CPU only • 3.04 mW/MHz with 16KB ICU/8KB DCU (est.) • Qualified operating range –40C to 125C, 2.3V to 2.7V Embedded Design Support The 405 Core, as a member of the PowerPC 400 Series, is supported by the IBM PowerPC Embedded ToolsTM program, in which over 80 third party vendors have combined with IBM to provide a complete tools solution. Development tools for the 405 Core include C/C++ compilers, debuggers, bus functional models and real-time operating systems. As part of the tools program, IBM maintains a complete set of development tools by offering the High C/C++ Compiler, RISCWatchTM debugger with RISCTraceTM, OS OpenTM real-time operating system, VHDL and Verilog simulation models and a 405 Core reference design kit. PowerPC 405 Core Page 3 of 14 11/2/98 405 CPU Core Organization PLB Master Instruction Interface OCM MMU 405 CPU Timers I-Cache I-Cache 3-Element (FIT, Array Controller Fetch Fetch PIT, Instruction Shadow and Queue Watchdog) Instruction TLB Decode (PFB1, Cache (4 Entry) Logic PFB0, Unit DCD) Timers & Cache Units Unified TLB (64 Entry) Debug Data Debug Logic Cache Unit Data Shadow Execute Unit (EXU) (4 IAC, TLB 2 DAC, (8 Entry) D-Cache D-Cache 32x32 2 DVC) Array ALU MAC Controller GPR PLB Master Data APU/FPU JTAG Instruction Interface OCM Trace Figure 1. PowerPC 405 CPU core block diagram 405 CPU The 405 CPU operates on instructions in a five stage pipeline consisting of a fetch, decode, execute, write- back, and load write-back stage. A five-stage pipeline increases throughput and makes operating at 200 MHz or more possible. Figure 1 contains a high-level block diagram of the 405 CPU core. Fetch and Decode Logic The fetch and decode logic maintains a steady flow of instructions to the execute unit by placing up to two instructions in the fetch queue. The fetch queue consists of three buffers: pre-fetch buffer 1 (PFB1), pre- fetch buffer 0 (PFB0) and decode (DCD). When the queue is empty, instructions flow directly into the decode stage. Static branch prediction as implemented on the 405 CPU core takes advantage of some standard statistical properties of code. Branches with negative address displacement are by default assumed taken. Branches that do not test the condition or count registers are also predicted as taken. The 405 CPU core bases branch prediction upon these default conditions when a branch is not resolved and speculatively fetches along the predicted path. The default prediction can be overridden by software at assembly or compile time. Branches are examined in the decode and pre-fetch buffer 0 fetch queue stages. Two branch instructions can be handled simultaneously. If the branch in decode is not taken, the fetch logic fetches along the predicted path of the branch instruction in pre-fetch buffer 0. If the branch in decode is taken, the fetch logic ignores the branch instruction in pre-fetch buffer 0. PowerPC 405 Core Page 4 of 14 11/2/98 Execution Unit The 405 CPU core has a single issue execute unit which contains the register file, arithmetic logic unit (ALU) and the multiply-accumulate (MAC) unit.. The execution unit performs all 32-bit PowerPC integer instructions in hardware and is compliant with the IBM PowerPC Embedded Environment specification. Table 1 lists instruction latencies and throughputs. The register file is comprised of thirty-two 32 bit general purpose registers (GPR) which are accessed with three read ports and two write ports. During the decode stage, data is read out of the GPR’s and feed to the execute unit. Likewise, during the write-back stage, results are written to the GPR. The use of the five port on the register file enables either a load or a store operation to execute in parallel with an ALU operation. Instruction Latency Throughput (Clock Cycles) (Clock Cycles) Arithmetic 1 - Traps 2 - Logical 1 - Shifts/rotates 1 - Multiply two sign extended 16 bit half words in 32 bit registers 2- (16 x 16 W 32) Multiply half word (16 x 16 W 32) 2 1a Multiply one sign extended 16 bit half word in a 32 bit register with a 32a 32 bit word (16 x 32 W least significant 32 bits of a 48 bit product) Multiply (32 x 32 W least significant 32 bits of a 64 bit product) 4 3a Multiply (32 x 32 W least significant 32 bits of a 64 bit product) with 54a overflow detection Multiply (32 x 32 W most significant 32 bits of 64 bit product) with 54a or without overflow detection enabled MAC (16 x 16 + 32 W 32) 2 1a Divide 35 - Loadsb 1b 1b Load multiples/strings (caches hit) 1/transfer - Stores 1 - Store multiples/strings (caches hit) 1/transfer - Move to/from Device Control Register 3 - Move to/from Special Purpose Register 1 - Branch known taken 1d or 2e - Branch known not taken 1 - Taken branch predicted takenc 1d or 2e - Taken branch predicted not takenc 2e or 3f - Not taken branch predicted takenc 2e or 3f - Not taken branch predicted not-takenc 1- Notes: a. Throughput applies when no dependencies exist between the result of the multiplication and the source of the subsequent multiplication. However, the target accumulator from one MAC instruction may be used as the source accumulator for the next MAC instruction. b. Load instructions have a 1 cycle use penalty. Data acquired during the execute pipeline stage is not available until after the completion of the write-back pipeline stage. Write-back takes one clock cycle. c. These cycles are for conditional branches that have a dependency. d. The branch is resolved during the fetch pipeline stage while in PFB0. e. The branch is resolved during the decode pipeline stage while in DCD. f. The branch is resolved during the execute pipeline stage. PowerPC 405 Core Page 5 of 14 11/2/98 Table 1. Instruction Latencies and Throughputs The MAC unit is an auxiliary processor unit (APU) which adds 9 operations to the 405 CPU core instruction set. MAC instructions operate on either signed or unsigned 16 bit operands and accumulate the results in a 32 bit GPR. All MAC unit instructions have a single cycle throughput. Table 2 lists the six MAC operations and three halfword multiply operations supported by the MAC unit. RA and RB in Table 2 are the 16 bit operands and RT is the 32-bit accumulate source and target. Most of the instructions have four flavors. Instructions may be unsigned, signed, modulo or saturate. Modulo and saturate determine the behavior of the instruction when the accumulated value overflows. Description Operation Instruction mnemonic multiply accumulate high Prod0:31 U (RA)0:15 x (RB)0:15 machhw – modulo signed halfword to word RT U Prod0:31 + (RT) machhwu - modulo unsigned machhws – saturate signed machhwsu - saturate unsigned multiply accumulate low Prod0:31 U (RA)16:31 x (RB)16:31 maclhw - modulo signed halfword to word RT U Prod0:31 + (RT) maclhwu - modulo unsigned maclhws - saturate signed maclhwsu - saturate unsigned multiply accumulate cross Prod0:31 U (RA)16:31 x (RB)0:15 macchw - modulo signed halfword to word RT U Prod0:31 + (RT) macchwu - modulo unsigned macchws - saturate signed macchwsu - saturate unsigned negative multiply accumulate Prod0:31 U (RA)0:15 x (RB)0:15 nmachhw – modulo signed high halfword to word RT U - Prod0:31 + (RT) nmachhws – saturate signed negative multiply accumulate Prod0:31 U (RA)16:31 x (RB)16:31 nmaclhw – modulo signed low halfword to word RT U - Prod0:31 + (RT) nmaclhws – saturate signed negative multiply accumulate Prod0:31 U (RA)0:15 x (RB)0:15 nmacchw – modulo signed cross halfword to word RT U - Prod0:31 + (RT) nmacchws – saturate signed multiply high halfword to word RTU (RA)0:15 x (RB)0:15 mulhhw – modulo signed mulhhwu – modulo unsigned multiply low halfword to word RT U (RA)16:31 x (RB)16:31 mullhw – modulo signed mullhwu - modulo unsigned multiply cross halfword to word RT U (RA)16:31 x (RB)0:15 mulchw – modulo signed mulchwu -modulo unsigned Table 2.
Recommended publications
  • PPC400 Debugger C++ and JAVAC++ Aswell Asthe Debugging of the TPU
    BDM-PPC400 Technical Information Technical PPC400 Debugger ■ Full HLL and ASM support available ■ Batch processing ■ Supports ELF/DWARF format ■ 3.3 volt support ■ Support for internal triggers ■ Break on code or data ■ Unlimited software breakpoints ■ Fast download (ETHERNET or PARALLEL) up to 600KB/sec The debugger for IBM PowerPC 400 family allows fast access to the BDM interface of the chip. Up to 450 KByte can be downloaded in 1 second. The systems supports C, C++ and JAVA as well as the debugging of the TPU. BDM-PPC400 14.06.13 TRACE32 - Technical Information 2 Features ❏ Active and Passive JTAG ❏ Variable Debug Clock Speed Debugger available ■ 10 kHz...5 MHz ■ 1/4 CPU Clock ■ 1/8 CPU Clock ❏ Software Compatible to In-Circuit ■ Variable up to 100 MHz (Pow- Emulator and Monitor erDebug only) ■ Operation System ■ PRACTICE ❏ ■ ASM Debugger Tr igger ■ HLL Debugger for C and C++ ■ Input from PODBUS ■ Peripheral Windows ■ Output to PODBUS ❏ High-Speed Download ❏ Support for EPROM/FLASH ■ Up to 450 KByte/sec Simulator ■ Breakpoints in ROM Area ■ 8, 16 and 32 Bit EPROM/ FLASH Emulation BDM-PPC400 Features TRACE32 - Technical Information 3 Connector Connector Type stanard 100 mil connector (BETRG, AMP, etc.) Connector 16 pin Signal Pin Pin Signal TDO 1 2 N/C TDI 3 4 TRST- (*) N/C 5 6 VCCS TCK 7 8 N/C TMS 9 10 N/C HALT- 11 12 N/C N/C 13 14 KEY N/C 15 16 GND BDM-PPC400 Connector TRACE32 - Technical Information 4 Operation Voltage Operation Voltage This list contains information on probes available for other voltage ranges.
    [Show full text]
  • IBM Power System POWER8 Facts and Features
    IBM Power Systems IBM Power System POWER8 Facts and Features April 29, 2014 IBM Power Systems™ servers and IBM BladeCenter® blade servers using IBM POWER7® and POWER7+® processors are described in a separate Facts and Features report dated July 2013 (POB03022-USEN-28). IBM Power Systems™ servers and IBM BladeCenter® blade servers using IBM POWER6® and POWER6+™ processors are described in a separate Facts and Features report dated April 2010 (POB03004-USEN-14). 1 IBM Power Systems Table of Contents IBM Power System S812L 4 IBM Power System S822 and IBM Power System S822L 5 IBM Power System S814 and IBM Power System S824 6 System Unit Details 7 Server I/O Drawers & Attachment 8 Physical Planning Characteristics 9 Warranty / Installation 10 Power Systems Software Support 11 Performance Notes & More Information 12 These notes apply to the description tables for the pages which follow: Y Standard / Supported Optional Optionally Available / Supported N/A or - Not Available / Supported or Not Applicable SOD Statement of General Direction announced SLES SUSE Linux Enterprise Server RHEL Red Hat Enterprise Linux a One x8 PCIe slots must contain a 4-port 1Gb Ethernet LAN available for client use b Use of expanded function storage backplane uses one PCIe slot Backplane provides dual high performance SAS controllers with 1.8 GB write cache expanded up to 7.2 GB with c compression plus Easy Tier function plus two SAS ports for running an EXP24S drawer d Full benchmark results are located at ibm.com/systems/power/hardware/reports/system_perf.html e Option is supported on IBM i only through VIOS.
    [Show full text]
  • Chapter 2-1: Cpus
    Chapter 2-1: CPUs Soo-Ik Chae © 2007 Elsevier 1 Topics CPU metrics. Categories of CPUs. CPU mechanisms. High Performance Embedded Computing © 2007 Elsevier 2 Performance as a design metric Performance = speed: Latency. Throughput. Average vs. peak performance. Worst-case and best- case performance. High Performance Embedded Computing © 2007 Elsevier 3 Other metrics Cost (area). Energy and p ower. Predictability: important for embedded systems Pipelining: branch penalty. Memory system (Cache) : cache miss penalty Security: difficult to measure because of the fact that we do not know of a successful attack. High Performance Embedded Computing © 2007 Elsevier 4 Flyyypnn’s taxonomy of processors Single-instruction single-data (SISD): RISC, etc. Single-instruction multiple-data (SIMD): all processors perform the same operations. Multiple-instruction multiple-data (MIMD): homogeneou s or heterogeneou s multiprocessor. Multiple-instruction multiple data (MISD). High Performance Embedded Computing © 2007 Elsevier 5 Other axes of comparison RISC. Emphasis on software Sing le-cyclilittile, simple instructions Register to register: LOAD" and "STORE“ are independent instructions Low cycles per second, Large code sizes Spends more transistors on memory registers CISC. Emphasis on hardware multi-cycle, complex instructions Memory-to-memory: LOAD" and "STORE“ incorporated in instructions High cycles per second Small code sizes Transistors used for storing complex instructions High Performance Embedded Computing © 2007 Elsevier 6 RISC CISC 1. 1-cycle simple instructions 1. multi-cycle complex instructions 2. only LD/ST can access memory 2. any instruction may access memory 3. designed around pipeline 3. designed around instn. set 4. instns. executed by h/w 4. instns interpreted by micro-program 5.
    [Show full text]
  • Coverstory by Robert Cravotta, Technical Editor
    coverstory By Robert Cravotta, Technical Editor u WELCOME to the 31st annual EDN Microprocessor/Microcontroller Di- rectory. The number of companies and devices the directory lists continues to grow and change. The size of this year’s table of devices has grown more than NEW PROCESSOR OFFERINGS 25% from last year’s. Also, despite the fact that a number of companies have disappeared from the list, the number of companies participating in this year’s CONTINUE TO INCLUDE directory has still grown by 10%. So what? Should this growth and change in the companies and devices the directory lists mean anything to you? TARGETED, INTEGRATED One thing to note is that this year’s directory has experienced more compa- ny and product-line changes than the previous few years. One significant type PERIPHERAL SETS THAT SPAN of change is that more companies are publicly offering software-programma- ble processors. To clarify this fact, not every company that sells processor prod- ALL ARCHITECTURE SIZES. ucts decides to participate in the directory. One reason for not participating is that the companies are selling their processors only to specific customers and are not yet publicly offering those products. Some of the new companies par- ticipating in this year’s directory have recently begun making their processors available to the engineering public. Another type of change occurs when a company acquires another company or another company’s product line. Some of the acquired product lines are no longer available in their current form, such as the MediaQ processors that Nvidia acquired or the Triscend products that Arm acquired.
    [Show full text]
  • IBM POWER8 High-Performance Computing Guide: IBM Power System S822LC (8335-GTB) Edition
    Front cover IBM POWER8 High-Performance Computing Guide IBM Power System S822LC (8335-GTB) Edition Dino Quintero Joseph Apuzzo John Dunham Mauricio Faria de Oliveira Markus Hilger Desnes Augusto Nunes Rosario Wainer dos Santos Moschetta Alexander Pozdneev Redbooks International Technical Support Organization IBM POWER8 High-Performance Computing Guide: IBM Power System S822LC (8335-GTB) Edition May 2017 SG24-8371-00 Note: Before using this information and the product it supports, read the information in “Notices” on page ix. First Edition (May 2017) This edition applies to: IBM Platform LSF Standard 10.1.0.1 IBM XL Fortran v15.1.4 and v15.1.5 compilers IBM XLC/C++ v13.1.2 and v13.1.5 compilers IBM PE Developer Edition version 2.3 Red Hat Enterprise Linux (RHEL) 7.2 and 7.3 in little-endian mode © Copyright International Business Machines Corporation 2017. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . ix Trademarks . .x Preface . xi Authors. xi Now you can become a published author, too! . xiii Comments welcome. xiv Stay connected to IBM Redbooks . xiv Chapter 1. IBM Power System S822LC for HPC server overview . 1 1.1 IBM Power System S822LC for HPC server. 2 1.1.1 IBM POWER8 processor . 3 1.1.2 NVLink . 4 1.2 HPC system hardware components . 5 1.2.1 Login nodes . 6 1.2.2 Management nodes . 6 1.2.3 Compute nodes. 7 1.2.4 Compute racks . 7 1.2.5 High-performance interconnect.
    [Show full text]
  • POWER8: the First Openpower Processor
    POWER8: The first OpenPOWER processor Dr. Michael Gschwind Senior Technical Staff Member & Senior Manager IBM Power Systems #OpenPOWERSummit Join the conversation at #OpenPOWERSummit 1 OpenPOWER is about choice in large-scale data centers The choice to The choice to The choice to differentiate innovate grow . build workload • collaborative • delivered system optimized innovation in open performance solutions ecosystem • new capabilities . use best-of- • with open instead of breed interfaces technology scaling components from an open ecosystem Join the conversation at #OpenPOWERSummit Why Power and Why Now? . Power is optimized for server workloads . Power8 was optimized to simplify application porting . Power8 includes CAPI, the Coherent Accelerator Processor Interconnect • Building on a long history of IBM workload acceleration Join the conversation at #OpenPOWERSummit POWER8 Processor Cores • 12 cores (SMT8) 96 threads per chip • 2X internal data flows/queues • 64K data cache, 32K instruction cache Caches • 512 KB SRAM L2 / core • 96 MB eDRAM shared L3 • Up to 128 MB eDRAM L4 (off-chip) Accelerators • Crypto & memory expansion • Transactional Memory • VMM assist • Data Move / VM Mobility • Coherent Accelerator Processor Interface (CAPI) Join the conversation at #OpenPOWERSummit 4 POWER8 Core •Up to eight hardware threads per core (SMT8) •8 dispatch •10 issue •16 execution pipes: •2 FXU, 2 LSU, 2 LU, 4 FPU, 2 VMX, 1 Crypto, 1 DFU, 1 CR, 1 BR •Larger Issue queues (4 x 16-entry) •Larger global completion, Load/Store reorder queue •Improved branch prediction •Improved unaligned storage access •Improved data prefetch Join the conversation at #OpenPOWERSummit 5 POWER8 Architecture . High-performance LE support – Foundation for a new ecosystem . Organic application growth Power evolution – Instruction Fusion 1600 PowerPC .
    [Show full text]
  • Powerpc 400 Series Caches: Programming and Coherency Issues
    PowerPC 400 Series Caches: Microcontroller Applications Programming and Coherency IBM Microelectronics Research Triangle Park, NC Issues [email protected] Version: 1.0 January 22, 1999 Abstract – The PowerPC™ instruction set provides opcodes that allow programmers to explicitly move items in or out of the processor’s instruction and data caches. This application note examines these cache control instructions in terms of initializing the caches at power-on and for enforcing coherency in systems with devices capable of directly reading and writing cacheable system memory. High performance processors, such as the IBM family of Embedded PowerPC Processors, require access to instructions and data at the clock rate of the processor. In most circumstances, external memory cannot provide this level of data throughput and on-chip memory cannot fulfill a system’s memory requirements. An effective solution is to exploit the locality of instruction and data accesses present in most programs and retain frequently accessed items in an on-chip cache memory. Typically, caches greatly increase performance while minimally affecting system and program design. However, there are methods of accessing memory that can cause on-chip caches and external memory to become incoherent, resulting in data errors and possible system failure. Key to preventing this type of problem is an understanding of what may cause these problems and how to write software to avoid them. Cache Structure The PPC40x series of Embedded Processors and Cores employ separate instruction and data caches. Separate caches provide a performance advantage by allowing simultaneous instruction and data accesses. Equivalent performance is possible through a dual-ported unified cache with the same total number of storage bits, but the unified cache has the disadvantage of occupying a larger amount of silicon.
    [Show full text]
  • Release History
    Release History TRACE32 Online Help TRACE32 Directory TRACE32 Index TRACE32 Technical Support ........................................................................................................... Release History ............................................................................................................................. 1 General Information ................................................................................................................... 4 Code 4 Release Information ................................................................................................................... 4 Software Release from 01-Feb-2021 5 Build 130863 5 Software Release from 01-Sep-2020 8 Build 125398 8 Software Release from 01-Feb-2020 11 Build 117056 11 Software Release from 01-Sep-2019 13 Build 112182 13 Software Release from 01-Feb-2019 16 Build 105499 16 Software Release from 01-Sep-2018 19 Build 100486 19 Software Release from 01-Feb-2018 24 Build 93173 24 Software Release from 01-Sep-2017 27 Build 88288 27 Software Release from 01-Feb-2017 32 Build 81148 32 Build 80996 33 Software Release from 01-Sep-2016 36 Build 76594 36 Software Release from 01-Feb-2016 39 Build 69655 39 Software Release from 01-Sep-2015 42 Build 65657 42 Software Release from 02-Feb-2015 45 Build 60219 45 Software Release from 01-Sep-2014 48 Build 56057 48 Software Release from 16-Feb-2014 51 ©1989-2021 Lauterbach GmbH Release History 1 Build 51144 51 Software Release from 16-Aug-2013 54 Build 50104 54 Software Release from 16-Feb-2013 56
    [Show full text]
  • A História Da Família Powerpc
    A História da família PowerPC ∗ Flavio Augusto Wada de Oliveira Preto Instituto de Computação Unicamp fl[email protected] ABSTRACT principal atingir a marca de uma instru¸c~ao por ciclo e 300 Este artigo oferece um passeio hist´orico pela arquitetura liga¸c~oes por minuto. POWER, desde sua origem at´eos dias de hoje. Atrav´es deste passeio podemos analisar como as tecnologias que fo- O IBM 801 foi contra a tend^encia do mercado ao reduzir ram surgindo atrav´esdas quatro d´ecadas de exist^encia da dr´asticamente o n´umero de instru¸c~oes em busca de um con- arquitetura foram incorporadas. E desta forma ´eposs´ıvel junto pequeno e simples, chamado de RISC (reduced ins- verificar at´eos dias de hoje como as tend^encias foram segui- truction set computer). Este conjunto de instru¸c~oes elimi- das e usadas. Al´emde poder analisar como as tendencias nava instru¸c~oes redundantes que podiam ser executadas com futuras na ´area de arquitetura de computadores seguir´a. uma combina¸c~ao de outras intru¸c~oes. Com este novo con- junto reduzido, o IBM 801 possuia metade dos circuitos de Neste artigo tamb´emser´aapresentado sistemas computacio- seus contempor^aneos. nais que empregam comercialmente processadores POWER, em especial os videogames, dado que atualmente os tr^es vi- Apesar do IBM 801 nunca ter se tornado um chaveador te- deogames mais vendidos no mundo fazem uso de um chip lef^onico, ele foi o marco de toda uma linha de processadores POWER, que apesar da arquitetura comum possuem gran- RISC que podemos encontrar at´ehoje: a linha POWER.
    [Show full text]
  • Real-Time Instruction Trace in the Powerpc 400 Family of Embedded
    IBM Microelectronics Real-Time Instruction Trace in Dept PDXA/Bldg 060 3039 Cornwallis Road the PowerPC 400 Family of Research Triangle Park, NC 27709 Embedded Controllers Version: 1.0 April 10, 1998 HISTORY Software development and debug time have become the critical factors in the embedded system development cycle. But the embedded controllers of today are far different animals from those of the past. Previously, controllers ran at speeds less than 10 MHz, and contained no caches. These controllers were built by only a few companies, and enjoyed a long product life. With a long product life, manufacturers could justify the cost of producing a special “bond-out” version of a chip. This bond-out chip contained additional signals not found in the production chip, and was attached to emulators and logic analyzers to perform debug and real-time trace. But because of their high pin-out and cost of manufacturing, it was cost prohibitive to ship them in a production product, and instead they were only used for initial system development and test. THE PROBLEM Today’s environment has changed dramatically. Now, in the 32-bit embedded RISC1 market, controllers are running in excess of 100 MHz, and contain large caches -- 16 KB and higher. There are a large number of suppliers making many different controllers with short product lives. Some manufacturers offer system-on-a-chip (also known as Core+ASIC2) embedded solutions, in which customers can choose the peripherals and packaging that surround the embedded RISC processor core. Meanwhile, the size and complexity of the code executed by these controllers have increased.
    [Show full text]
  • IBM POWER8 CPU Architecture
    POWER8 Jeff Stuecheli IBM Power Systems IBM Systems & Technology Group Development © 2013 International Business Machines Corporation 1 POWER7+ POWER7 2012 POWER6 2010 POWER5 2007 2004 45nm SOI 32nm SOI Technology 130nm SOI 65nm SOI eDRAM eDRAM Compute Cores 2 2 8 8 Threads SMT2 SMT2 SMT4 SMT4 Caching On-chip 1.9MB 8MB 2 + 32MB 2 + 80MB Off-chip 36MB 32MB None None Bandwidth Sust. Mem. 15GB/s 30GB/s 100GB/s 100GB/s Peak I/O 3GB/s 10GB/s 20GB/s 20GB/s © 2013 International Business Machines Corporation 2 POWER8 POWER7+ POWER7 2012 POWER6 2010 POWER5 2007 2004 45nm SOI 32nm SOI Technology 130nm SOI 65nm SOI eDRAM eDRAM Compute Cores 2 2 8 8 Today’s Threads SMT2 SMT2 SMT4 SMT4 Topic Caching On-chip 1.9MB 8MB 2 + 32MB 2 + 80MB Off-chip 36MB 32MB None None Bandwidth Sust. Mem. 15GB/s 30GB/s 100GB/s 100GB/s Peak I/O 3GB/s 10GB/s 20GB/s 20GB/s © 2013 International Business Machines Corporation 3 Leadership System Open System Performance Innovation Innovation • Increase core throughput • Higher capacity cache hierarchy • CAPI at single thread, SMT2, and highly threaded processor • Memory interface SMT4, and SMT8 level • Enhanced memory bandwidth, • Open system software • Large step in per socket capacity, and expansion performance • Flexible SMT • Enable more robust • Dynamic code optimization multi-socket scaling • Hardware-accelerated virtual memory management © 2013 International Business Machines Corporation 4 Technology • 22nm SOI, eDRAM, 15 ML 650mm2 Caches Cores • 512 KB SRAM L2 / core • 12 cores (SMT8) • 96 MB eDRAM shared L3 • 8 dispatch, 10 issue, Local SMP Links SMP Local • Up to 128 MB eDRAM L4 Accelerators 16 exec pipe Core Core Core Core Core Core (off-chip) • 2X internal data flows/queues L2 L2 L2 L2 L2 L2 Memory 8M L3 • Enhanced prefetching Region • Up to 230 GB/s • 64K data cache, Mem .
    [Show full text]
  • Powerpc 405 Embedded Processor Core User’S Manual
    PowerPC 405 Embedded Processor Core User’s Manual SA14-2339-04 Fifth Edition (December 2001) This edition of IBM PPC405 Embedded Processor Core User’s Manual applies to the IBM PPC405 32-bit embedded processor core, until otherwise indicated in new versions or application notes. The following paragraph does not apply to the United Kingdom or any country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS MANUAL “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you. IBM does not warrant that the products in this publication, whether individually or as one or more groups, will meet your requirements or that the publication or the accompanying product descriptions are error-free. This publication could contain technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or program(s) described in this publication at any time. It is possible that this publication may contain references to, or information about, IBM products (machines and programs), programming, or services that are not announced in your country. Such references or information must not be construed to mean that IBM intends to announce such IBM products, programming, or services in your country. Any reference to an IBM licensed program in this publication is not intended to state or imply that you can use only IBM’s licensed program.
    [Show full text]