CO200 - Computer Organization & Architecture

Basavaraj Talawar, [email protected]

Course Syllabus
● Processor Basics – CPU organization, data representation and instruction sets
● Datapath Design – Fixed-point arithmetic – adders, subtracters, multipliers, dividers; ALU; floating-point arithmetic
● Control Design – Hardwired control, microprogrammed control, pipeline control
● Memory Organization – Serial vs. random-access memories; caches, virtual memory
● Principles of Pipelining
● Principles of Parallel Computing

Course Structure
● Textbooks
  – J. P. Hayes, Computer Architecture and Organization, 3rd ed., McGraw-Hill.
  – Hwang and Briggs, Computer Architecture and Parallel Processing, McGraw-Hill.
  – D. Patterson and J. Hennessy, Computer Organization and Design, 3rd ed., Morgan Kaufmann.
● Other References
  – NPTEL course on "High Performance Computing" by Matthew Jacob, IISc.
● Guest Lectures
● About the Course
  – Surprise quizzes – 15%, assignments – 10%, mid-semester exam – 25%, final exam – 50%

Course Objectives
● To understand how a computer works
● To know the architecture and working of the components inside a computer – processor, control unit, ALU, memory, I/O

Course Objectives – Expanded
● How is a machine-language program executed by a computer?
● How does the software instruct the hardware to perform a desired action? How does the hardware instruct a desired unit to perform its corresponding operation?
● Why study all of this?
  – To gain insight into the setting in which our programs execute
  – To improve the setting in which our programs execute – to improve the performance of the system

What is a Computer?
● An electronic device capable of receiving information (data) in a particular form and of performing a sequence of operations, in accordance with a predetermined but variable set of procedural instructions (a program), to produce a result in the form of information or signals.

Basic Computer Organization
● Machine instructions
  – Description of a primitive operation that the machine hardware is able to understand
  – Expressed in binary
  – Example of a 32-bit machine-language instruction: 00110011101100000100001110101011
● Instruction Set
  – Complete specification of all the kinds of instructions that the processor hardware was built to execute
  – e.g., ADD, SUB, XOR, JUMP, ...
● How are programs written in high-level languages such as C translated into a language that the machine understands?

The Computer Program
● A description of algorithms and data structures to achieve a specific objective
● A compiler translates the high-level language into assembly language.
● An assembler translates the assembly into machine code.
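The 32-bit word in the "Basic Computer Organization" slide above only acquires meaning once an instruction format is fixed, and the slides do not specify one. As a purely illustrative Python sketch (the 6/5/5/16-bit field split and the names opcode, rs, rt, imm are assumptions, not the encoding of any real ISA), this is the kind of bit-field slicing that instruction decode performs:

    # Illustrative only: slice a 32-bit instruction word into hypothetical fields.
    # The 6/5/5/16 split and the field names are assumed, not any real ISA's format.
    word = 0b00110011101100000100001110101011   # example word from the slide

    opcode = word >> 26              # top 6 bits: which operation
    rs     = (word >> 21) & 0x1F     # next 5 bits: a source register number
    rt     = (word >> 16) & 0x1F     # next 5 bits: another register number
    imm    = word & 0xFFFF           # low 16 bits: an immediate constant

    print(f"opcode={opcode:06b} rs={rs} rt={rt} imm=0x{imm:04X}")

The control unit performs this kind of field extraction in hardware in order to decide what the datapath should do with each instruction.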
Basic Computer Organization
● Processor – executes programs
● Main Memory – holds programs and data
● I/O – for communication and data
● [Diagram: the processor (CPU), containing the ALU, registers and control unit, connected over a bus to memory and several I/O devices.]

Inside the Processor
● Control hardware: hardware to manage instruction execution
● ALU: Arithmetic and Logic Unit (hardware to do arithmetic and logic operations)
● Registers: small units of memory that hold data/instructions temporarily during execution
● Memory: stores information being processed by the CPU
● Input: allows the user to supply information to the computer
● Output: allows the user to receive information from the computer

Computer Architecture
● Computer architecture is the design of the abstraction/implementation layers that allow us to execute information-processing applications efficiently using manufacturing technologies.
● The layers, from the application down to physics in the real world: Application, Algorithm, Programming Language, Operating System/Virtual Machines, Instruction Set Architecture, Organization/Microarchitecture, Register-Transfer Level, Gates, Circuits, Devices, Physics.

Architecture vs. Organization
● Architecture / Instruction Set Architecture (ISA)
  – Programmer-visible state (memory and registers)
  – Operations (instructions and how they work)
  – Input/Output
  – Data representation – types and sizes
● Microarchitecture / Organization
  – The way a given ISA is implemented on a processor

Same Architecture, Different Organizations
● AMD Athlon II X4 – x86 ISA; quad core, 2.9 GHz, 125 W; 3 instructions/cycle/core; 64 KB L1 cache, 512 KB L2 cache
● Intel Atom – x86 instruction set; single core, 1.6 GHz, 2 W; 2 instructions/cycle/core; 32 KB/24 KB L1 I/D cache, 512 KB L2 cache

Different Architectures, Organizations
● AMD Vishera – x86 ISA; 8 cores, 4.7 GHz, 125 W; 64 KB L1 cache, 2 MB L2 cache, 8 MB L3 cache
● IBM POWER8 – Power ISA; 12 cores, 4.5 GHz, 250 W; 64 KB L1 cache, 512 KB L2 cache, 8 MB L3 cache

Recap
● What is a computer?
● Computer organization and architecture – registers, control unit, ALU, memory, I/O, bus
● ISA, machine language
● Organization vs. architecture

Coming up ...
● Processor performance
● Machine models

Concept of Time and Speed
● Frequency: the number of occurrences of a repeating event per unit time.
  – SI unit: hertz (Hz)
● The period is the duration of one cycle of a repeating event.
  – Period = Cycle Time
  – Cycle Time = 1 / Frequency

On Processor Performance
● How is frequency related to performance?
  – Program Execution Time = Execution Time per Instruction × Total Program Instructions
  – CPU Time = Execution Time per Instruction × Instruction Count
  – Execution Time per Instruction = Cycles per Instruction × Cycle Time
  – CPU Time = IC × Cycles per Instruction × Cycle Time

Example
● What is the execution time of a program containing a million instructions, each taking 4 cycles, on a 2 GHz processor? (Worked below.)

Iron Law of Processor Performance
● CPU Time = IC × Cycles per Instruction × Cycle Time
● Time per Cycle = 1 / Frequency
● CPU Time = (IC × CPI) / Frequency
● CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)

On Processor Performance
● CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)
● The instructions-per-program factor is shaped by the architecture and the compiler; the cycles-per-instruction and seconds-per-cycle factors are shaped by the organization.

The GNU C Compiler
● $ gcc hello.c
● The compiler and its working: guest lecture by Dr. Janakiraman, IBM, August 2.

Operations and Operands
● C = A + B
● Operation: addition. Operands: A and B. Result: C.
● Instruction: ADD C, A, B
● Where do operands come from and where do results go?
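Before moving on to where operands live, here is a quick check of the Example above using the Iron Law. The inputs (10^6 instructions, 4 cycles per instruction, a 2 GHz clock) come straight from the slide, and the short Python sketch below is simply that arithmetic written out:

    # Iron Law: CPU Time = IC * CPI / Frequency
    IC   = 1_000_000    # instructions in the program (from the example)
    CPI  = 4            # cycles per instruction (from the example)
    freq = 2e9          # 2 GHz clock

    cycle_time = 1 / freq                # seconds per cycle
    cpu_time   = IC * CPI * cycle_time   # = IC * CPI / freq

    print(f"cycle time = {cycle_time * 1e9:.1f} ns")   # 0.5 ns
    print(f"CPU time   = {cpu_time * 1e3:.1f} ms")     # 2.0 ms

Halving the CPI or doubling the frequency would each halve the 2 ms result – exactly the trade-offs the Iron Law makes explicit.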
Operations and Operands (contd.)
● Where operands live and where results go is an architectural decision.

Memory – Toy Example
● Byte addressable
● Linearly increasing addresses; the diagram shows locations 0x0000 through 0xFFFF, with memory 'growing down'
● Any location can be read from or written into
● How many bytes can be stored in this example memory? (Worked below.)

Recap
● Processor performance
● Abstract view of memory

Example
● Your desktop has 4 GB of memory. How long (in bits) is its address? (Worked below.)

Operations and Operands
● [Diagram: inside the processor, a register file (inputs i1, i2), the control unit and the ALU; the processor is connected to memory. Operands can be supplied from the register file or from memory.]

Machine Model – Stack
● A stack is a form of memory
● The top of the stack (TOS) is marked by the stack pointer
● Two operations: push and pop
● [Diagram: a stack occupying locations 0x00 through 0xFF, with the TOS pointer marking the current top.]

Stack – Worked Example
● Instruction sequence: PUSH 10, PUSH 12, POP 13, PUSH 7
● [Diagram sequence: the stack and the memory locations referenced (0x07, 0x10, 0x12, 0x13) after each instruction, with the TOS pointer moving up on a push and down on a pop.]

Machine Model – Stack (contd.)
● [Diagram: the processor now holds a stack (with TOS) and the ALU, and is connected to memory, which holds a stack region of its own.]
● Where do operands come from and where do results go?

Machine Model – Stack (contd.)
● The operands are always TOS and TOS – 1.
● The result always goes into TOS – 1.
● Operands are implicit.
● Instruction: ADD
● Example equation: d = (a + b) * c

Postfix Expressions
● a + b → ab+
● (a + b) * c → X * c → Xc*, where X = (a + b); the postfix form of (a + b) is ab+, giving ab+c*
● a + (b * c) → abc*+
● (a + b) * (c - d) → X * Y → XY*, where X = (a + b) and Y = (c – d); replacing Y with its postfix form gives Xcd-*, and replacing X gives ab+cd-*
● (((a + b) * c) + d) * e → ((X * c) + d) * e, where X = (a + b) → (Y + d) * e, where Y = (X * c) → Z * e → Ze*, where Z = (Y + d); replacing Z gives Yd+e*, replacing Y gives Xc*d+e*, and replacing X gives ab+c*d+e*

Reverse Polish Notation
● A way of expressing arithmetic expressions that avoids the use of brackets.
● Evaluated left to right – natural on a stack.
● Devised by the Polish philosopher and mathematician Jan Łukasiewicz (1878–1956).

Infix Notation → RPN
● a + b → ab+
● (a + b) * c → ab+c*
● a + (b * c) → abc*+
● (a + b) * (c - d) → ab+cd-*
● (((a + b) * c) + d) * e → ab+c*d+e*

RPN Example
● Postfix form: ab+
● [Diagram sequence: a is pushed onto the stack, then b is pushed, ...]
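To tie the stack machine model and RPN together, here is a minimal Python sketch of a stack machine evaluating d = (a + b) * c via its postfix form ab+c*. The instruction names (PUSH, ADD, MUL, POP) and the dictionary standing in for memory are assumptions made for illustration; the point is that ADD and MUL take their operands implicitly from the top two stack entries, exactly as described on the slides above:

    # Minimal stack-machine sketch for d = (a + b) * c, i.e. the RPN form ab+c*.
    # Instruction names and the dictionary-based "memory" are illustrative assumptions.
    memory = {"a": 2, "b": 3, "c": 4, "d": None}
    stack = []

    program = [("PUSH", "a"), ("PUSH", "b"), ("ADD", None),   # ab+
               ("PUSH", "c"), ("MUL", None),                  # c*
               ("POP", "d")]                                  # store the result in d

    for op, operand in program:
        if op == "PUSH":                  # explicit operand: a memory location
            stack.append(memory[operand])
        elif op == "POP":                 # pop TOS back into memory
            memory[operand] = stack.pop()
        elif op == "ADD":                 # implicit operands: TOS and TOS - 1
            stack.append(stack.pop() + stack.pop())
        elif op == "MUL":
            stack.append(stack.pop() * stack.pop())

    print(memory["d"])                    # (2 + 3) * 4 = 20

Note that the arithmetic instructions carry no operand fields at all – the property that makes postfix notation 'natural on a stack'.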
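A quick check of the two memory questions posed earlier: how many bytes the toy memory with addresses 0x0000 through 0xFFFF can hold, and how many bits are needed to address a byte-addressable 4 GB memory (treating 4 GB as 2^32 bytes). The Python below is just the arithmetic written out:

    import math

    # Toy memory: 16-bit addresses 0x0000 .. 0xFFFF, one byte per address.
    toy_locations = 0xFFFF - 0x0000 + 1
    print(toy_locations)                 # 65536 bytes = 64 KB

    # 4 GB byte-addressable memory: how many address bits are needed?
    capacity_bytes = 4 * 2**30           # 4 GiB
    address_bits = int(math.log2(capacity_bytes))
    print(address_bits)                  # 32 bits

In general, n address bits select one of 2^n byte locations, so memory capacity and address width are two views of the same quantity.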