University of Wisconsin-Madison CS/ECE 752 Advanced Computer Architecture I

Total Page:16

File Type:pdf, Size:1020Kb

University of Wisconsin-Madison CS/ECE 752 Advanced Computer Architecture I University of Wisconsin-Madison CS/ECE 752 Advanced Computer Architecture I Professor Matt Sinclair Unit 1: Technology, Cost, Performance, & Power Slides developed by Milo Martin & Amir Roth at the University of Pennsylvania with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, and David Wood. Slides enhanced by Milo Martin, Mark Hill, and David Wood with sources that included Profs. Asanovic, Falsafi, Hoe, Lipasti, Sankaralingam, Shen, Smith, Sohi, Vijaykumar, and Wood CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 1 Announcements • Manually enrolled everyone in the course on Piazza • Using this for all announcements and questions moving forward • Poll: did everyone receive email sent to class mailing list? • Poll: who is interested in lecture from the Condor team? • Will post poll on Piazza • Lecture slides posted – see Course Schedule • Reminder: HW0 due next Wednesday CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 2 This Unit • What is a computer and what is computer architecture • Forces that shape computer architecture • Applications (covered last time) • Semiconductor technology • Evaluation metrics: parameters and technology basis • Cost • Performance • Power • Reliability CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 3 Readings • H&P • 1.4 – 1.7 (technology) • Paper • G. Moore, “Cramming More Components onto Integrated Circuits” CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 4 What is Computer Architecture? (review) • Design of interfaces and implementations… • Under constantly changing set of external forces… • Applications: change from above (discussed last time) • Technology: changes transistors characteristics from below • Inertia: resists changing all levels of system at once • To satisfy different constraints • This course mostly about performance • Cost • Power • Reliability • Iterative process driven by empirical evaluation • The art/science of tradeoffs (“it depends”) CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 5 Abstraction and Layering • Abstraction: only way of dealing with complex systems • Divide world into objects, each with an … • Interface: knobs, behaviors, knobs → behaviors • Implementation: “black box” (ignorance + apathy) • Specialists deal with implementations; others interface • Example: car drivers vs. mechanics • Layering: abstraction discipline makes life even easier • Removes need to even know interfaces of most objects • Divide objects in system into layers • Layer X objects • Implemented in terms of interfaces of layer X-1 objects • Don’t even need to know interfaces of layer X-2 objects CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 6 Abstraction, Layering, & Computers • Computers are complex systems, built in layers • Applications • O/S, compiler • Firmware, device drivers • Processor, memory, raw I/O devices • Digital circuits, digital/analog converters • Gates • Transistors • 99% of users don’t know hardware layers implementation • 90% of users don’t know implementation of any layer • That’s ok – computers still work just fine • But unfortunately, the layers sometimes breakdown • Someone needs to understand what’s “under the hood” CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 7 Gray box: peering through the layers • Layers of abstraction in a car • Interface (drivers): steering wheel, clutch, shift, brake • Implementation (mechanic): engine, fuel injection, transmission • But high-performance drivers know the torque curve • Achieve maximum performance • Similar exs. for computers Keep RPM in range where • Cache organization/locality torque is maximized • Pipeline scheduling/interlocks • Power users peek across layers CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 8 A Computer Architecture Picture Application OS Software Compiler Firmware Instruction Set Architecture (ISA) CPU I/O Memory Hardware Digital Circuits Gates & Transistors • Computer architecture • Definition of ISA to facilitate implementation of software layers • This course mostly on computer micro-architecture • Design CPU, Memory, I/O to implement ISA … CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 9 Technology Basis of Clock Frequency CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 10 Semiconductor Technology Background • Transistor (1947) Application • A key invention of 20th century OS • Fabrication Compiler Firmware CPU I/O Memory Digital Circuits Gates & Transistors CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 11 Shaping Force: Technology Drain • Basic technology element: MOSFET Gate Channel • MOS: metal-oxide-semiconductor • Conductor, insulator, semi-conductor • FET: field-effect transistor Source • Solid-state component acts like electrical switch • Channel conducts source → drain when voltage applied to gate • Channel length: characteristic parameter (short → fast) • Aka “feature size” or “technology” • Currently: 10-12 nm (0.010 – 0.012 micron) • Continued miniaturization (scaling) known as “Moore’s Law” • Won’t last forever, physical limits approaching (or are they?) CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 12 Transistor implementations are complex CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 13 Simple CMOS Interface • Voltage as values • Power (VDD) = 1, Ground = 0 Power (1) • Two kinds of MOSFETs • N-transistors p-transistor Input Output (“node”) • Conduct when gate voltage is 1 • Good at passing 0s n-transistor • P-transistors • Conduct when gate voltage is 0 • Good at passing 1s Ground (0) • CMOS: complementary n-/p-networks form boolean logic CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 14 CMOS Examples • Example I: inverter • Case I: input = 0 • P-transistor closed, n-transistor open • Power charges output (1) • Case II: input = 1 • P-transistor open, n-transistor closed • Output discharges to ground (0) • Example II: look at truth table • 0, 0 → 1 0, 1 → 1 • 1, 0 → 1 1, 1 → 0 • Result: this is a NAND (NOT AND) • NAND is universal (can build any logic function) CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 15 Transistor Speed, Power, and Reliability • Transistor characteristics and scaling impact • Switching speed • Power • Reliability • “Undergrad” gate delay model for architecture • Each Not, NAND, NOR, AND, OR gate has a delay of “1” • Reality: much more complex • Full answer requires 2nd order differential equations • Need something easier to reason about CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 16 Simple RC Delay Model • Switching time is a RC circuit (charge or discharge) 1 • R – Resistance: slows rate of current flow • Depends on material, length, cross-section area 1→0 • C – Capacitance: electrical charge storage • Depends on material, area, distance • Voltage affects speed, too 1 I 1→0 0→1 1→0 CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 17 Resistance • Transistor channel resistance 1 • Function of Vg (gate voltage) • Wire resistance 1→0 • Negligible for short wires 1 I 1→0 0→1 1→0 CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 18 Capacitance • Source/drain capacitance 1 • Gate capacitance • Wire capacitance 1→0 • Negligible for short wires 1 I 1→0 0→1 1→0 CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 19 RC Delay Model Ramifications • Want to reduce resistance • Wide drive transistors (width specified per device) 1 • Short gate length • Short wires • Want to reduce capacitance 1→0 • Number of connected devices • Less-wide transistors (gate capacitance 1 of next stage) • Short wires I 1→0 0→1 1→0 CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 20 Improving RC Delay • Exploit good effects of scaling • Fabrication technology improvements • Use copper instead of aluminum for wires (ρ↓ → Resistance ↓) CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 21 Transistors and Wires ©IBM From slides © Krste Asanović, MIT CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 22 Transistor Geometry: Width Length Gate Drain Source Gate Width Source Drain Width Bulk Si Length Diagrams © Krste Asanovic, MIT • Transistor width, set by designer for each transistor • Wider transistors: • Lower resistance of channel (increases drive strength) – good! • But, increases capacitance of gate/source/drain – bad! • Result: set width to balance these conflicting effects CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 23 Transistor Geometry: Length & Scaling Length Gate Drain Source Gate Width Source Drain Width Bulk Si Length Diagrams © Krste Asanovic, MIT • Transistor length: characteristic of “process generation” • 45nm refers to the transistor gate length, same for all transistors • Shrink transistor length: • Lower resistance of channel (shorter) – good! • Lower gate/source/drain capacitance – good! • Result: switching speed improves linearly as gate length shrinks CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc. 24 Wire Geometry Pitch Height Length Width IBM CMOS7, 6 layers of copper wiring • Transistors 1-dimensional for design purposes: width • Wires 4-dimensional: length, width, height, “pitch” • Longer wires have more resistance • “Thinner” wires have more resistance • Closer wire spacing (“pitch”) increases capacitance CS/ECE 752 (Sinclair): Technology, Cost, Performance, Power, etc.From slides © Krste Asanovic,
Recommended publications
  • IBM Z Systems Introduction May 2017
    IBM z Systems Introduction May 2017 IBM z13s and IBM z13 Frequently Asked Questions Worldwide ZSQ03076-USEN-15 Table of Contents z13s Hardware .......................................................................................................................................................................... 3 z13 Hardware ........................................................................................................................................................................... 11 Performance ............................................................................................................................................................................ 19 z13 Warranty ............................................................................................................................................................................ 23 Hardware Management Console (HMC) ..................................................................................................................... 24 Power requirements (including High Voltage DC Power option) ..................................................................... 28 Overhead Cabling and Power ..........................................................................................................................................30 z13 Water cooling option .................................................................................................................................................... 31 Secure Service Container .................................................................................................................................................
    [Show full text]
  • 45-Year CPU Evolution: One Law and Two Equations
    45-year CPU evolution: one law and two equations Daniel Etiemble LRI-CNRS University Paris Sud Orsay, France [email protected] Abstract— Moore’s law and two equations allow to explain the a) IC is the instruction count. main trends of CPU evolution since MOS technologies have been b) CPI is the clock cycles per instruction and IPC = 1/CPI is the used to implement microprocessors. Instruction count per clock cycle. c) Tc is the clock cycle time and F=1/Tc is the clock frequency. Keywords—Moore’s law, execution time, CM0S power dissipation. The Power dissipation of CMOS circuits is the second I. INTRODUCTION equation (2). CMOS power dissipation is decomposed into static and dynamic powers. For dynamic power, Vdd is the power A new era started when MOS technologies were used to supply, F is the clock frequency, ΣCi is the sum of gate and build microprocessors. After pMOS (Intel 4004 in 1971) and interconnection capacitances and α is the average percentage of nMOS (Intel 8080 in 1974), CMOS became quickly the leading switching capacitances: α is the activity factor of the overall technology, used by Intel since 1985 with 80386 CPU. circuit MOS technologies obey an empirical law, stated in 1965 and 2 Pd = Pdstatic + α x ΣCi x Vdd x F (2) known as Moore’s law: the number of transistors integrated on a chip doubles every N months. Fig. 1 presents the evolution for II. CONSEQUENCES OF MOORE LAW DRAM memories, processors (MPU) and three types of read- only memories [1]. The growth rate decreases with years, from A.
    [Show full text]
  • Comparing the Power and Performance of Intel's SCC to State
    Comparing the Power and Performance of Intel’s SCC to State-of-the-Art CPUs and GPUs Ehsan Totoni, Babak Behzad, Swapnil Ghike, Josep Torrellas Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA E-mail: ftotoni2, bbehza2, ghike2, [email protected] Abstract—Power dissipation and energy consumption are be- A key architectural challenge now is how to support in- coming increasingly important architectural design constraints in creasing parallelism and scale performance, while being power different types of computers, from embedded systems to large- and energy efficient. There are multiple options on the table, scale supercomputers. To continue the scaling of performance, it is essential that we build parallel processor chips that make the namely “heavy-weight” multi-cores (such as general purpose best use of exponentially increasing numbers of transistors within processors), “light-weight” many-cores (such as Intel’s Single- the power and energy budgets. Intel SCC is an appealing option Chip Cloud Computer (SCC) [1]), low-power processors (such for future many-core architectures. In this paper, we use various as embedded processors), and SIMD-like highly-parallel archi- scalable applications to quantitatively compare and analyze tectures (such as General-Purpose Graphics Processing Units the performance, power consumption and energy efficiency of different cutting-edge platforms that differ in architectural build. (GPGPUs)). These platforms include the Intel Single-Chip Cloud Computer The Intel SCC [1] is a research chip made by Intel Labs (SCC) many-core, the Intel Core i7 general-purpose multi-core, to explore future many-core architectures. It has 48 Pentium the Intel Atom low-power processor, and the Nvidia ION2 (P54C) cores in 24 tiles of two cores each.
    [Show full text]
  • Performance of a Computer (Chapter 4) Vishwani D
    ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2013 Performance of a Computer (Chapter 4) Vishwani D. Agrawal & Victor P. Nelson epartment of Electrical and Computer Engineering Auburn University, Auburn, AL 36849 ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 1 What is Performance? Response time: the time between the start and completion of a task. Throughput: the total amount of work done in a given time. Some performance measures: MIPS (million instructions per second). MFLOPS (million floating point operations per second), also GFLOPS, TFLOPS (1012), etc. SPEC (System Performance Evaluation Corporation) benchmarks. LINPACK benchmarks, floating point computing, used for supercomputers. Synthetic benchmarks. ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 2 Small and Large Numbers Small Large 10-3 milli m 103 kilo k 10-6 micro μ 106 mega M 10-9 nano n 109 giga G 10-12 pico p 1012 tera T 10-15 femto f 1015 peta P 10-18 atto 1018 exa 10-21 zepto 1021 zetta 10-24 yocto 1024 yotta ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 3 Computer Memory Size Number bits bytes 210 1,024 K Kb KB 220 1,048,576 M Mb MB 230 1,073,741,824 G Gb GB 240 1,099,511,627,776 T Tb TB ELEC 5200-001/6200-001 Performance Fall 2013 . Lecture 4 Units for Measuring Performance Time in seconds (s), microseconds (μs), nanoseconds (ns), or picoseconds (ps). Clock cycle Period of the hardware clock Example: one clock cycle means 1 nanosecond for a 1GHz clock frequency (or 1GHz clock rate) CPU time = (CPU clock cycles)/(clock rate) Cycles per instruction (CPI): average number of clock cycles used to execute a computer instruction.
    [Show full text]
  • Technology & Energy
    This Unit: Technology & Energy • Technology basis • Fabrication (manufacturing) & cost • Transistors & wires CIS 501: Computer Architecture • Implications of transistor scaling (Moore’s Law) • Energy & power Unit 3: Technology & Energy Slides'developed'by'Milo'Mar0n'&'Amir'Roth'at'the'University'of'Pennsylvania'' with'sources'that'included'University'of'Wisconsin'slides' by'Mark'Hill,'Guri'Sohi,'Jim'Smith,'and'David'Wood' CIS 501: Comp. Arch. | Prof. Milo Martin | Technology & Energy 1 CIS 501: Comp. Arch. | Prof. Milo Martin | Technology & Energy 2 Readings Review: Simple Datapath • MA:FSPTCM • Section 1.1 (technology) + 4 • Section 9.1 (power & energy) Register Data Insn File PC s1 s2 d Mem • Paper Mem • G. Moore, “Cramming More Components onto Integrated Circuits” • How are instruction executed? • Fetch instruction (Program counter into instruction memory) • Read registers • Calculate values (adds, subtracts, address generation, etc.) • Access memory (optional) • Calculate next program counter (PC) • Repeat • Clock period = longest delay through datapath CIS 501: Comp. Arch. | Prof. Milo Martin | Technology & Energy 3 CIS 501: Comp. Arch. | Prof. Milo Martin | Technology & Energy 4 Recall: Processor Performance • Programs consist of simple operations (instructions) • Add two numbers, fetch data value from memory, etc. • Program runtime = “seconds per program” = (instructions/program) * (cycles/instruction) * (seconds/cycle) • Instructions per program: “dynamic instruction count” • Runtime count of instructions executed by the program • Determined by program, compiler, instruction set architecture (ISA) • Cycles per instruction: “CPI” (typical range: 2 to 0.5) • On average, how many cycles does an instruction take to execute? • Determined by program, compiler, ISA, micro-architecture Technology & Fabrication • Seconds per cycle: clock period, length of each cycle • Inverse metric: cycles per second (Hertz) or cycles per ns (Ghz) • Determined by micro-architecture, technology parameters • This unit: transistors & semiconductor technology CIS 501: Comp.
    [Show full text]
  • Computer “Performance”
    Computer “Performance” Readings: 1.6-1.8 BIPS (Billion Instructions Per Second) vs. GHz (Giga Cycles Per Second) Throughput (jobs/seconds) vs. Latency (time to complete a job) Measuring “best” in a computer Hyper 3.0 GHz The PowerBook G4 outguns Pentium Pipelined III-based notebooks by up to 30 percent.* Technology * Based on Adobe Photoshop tests comparing a 500MHz PowerBook G4 to 850MHz Pentium III-based portable computers 58 Performance Example: Homebuilders Builder Time per Houses Per House Dollars Per House Month Options House Self-build 24 months 1/24 Infinite $200,000 Contractor 3 months 1 100 $400,000 Prefab 6 months 1,000 1 $250,000 Which is the “best” home builder? Homeowner on a budget? Rebuilding Haiti? Moving to wilds of Alaska? Which is the “speediest” builder? Latency: how fast is one house built? Throughput: how long will it take to build a large number of houses? 59 Computer Performance Primary goal: execution time (time from program start to program completion) 1 Performance ExecutionTime To compare machines, we say “X is n times faster than Y” Performance ExecutionTime n x y Performancey ExecutionTimex Example: Machine Orange and Grape run a program Orange takes 5 seconds, Grape takes 10 seconds Orange is _____ times faster than Grape 60 Execution Time Elapsed Time counts everything (disk and memory accesses, I/O , etc.) a useful number, but often not good for comparison purposes CPU time doesn't count I/O or time spent running other programs can be broken up into system time, and user time Example: Unix “time” command linux15.ee.washington.edu> time javac CircuitViewer.java 3.370u 0.570s 0:12.44 31.6% Our focus: user CPU time time spent executing the lines of code that are "in" our program 61 CPU Time CPU execution time CPU clock cycles =*Clock period for a program for a program CPU execution time CPU clock cycles 1 =* for a program for a program Clock rate Application example: A program takes 10 seconds on computer Orange, with a 400MHz clock.
    [Show full text]
  • Trends in Processor Architecture
    A. González Trends in Processor Architecture Trends in Processor Architecture Antonio González Universitat Politècnica de Catalunya, Barcelona, Spain 1. Past Trends Processors have undergone a tremendous evolution throughout their history. A key milestone in this evolution was the introduction of the microprocessor, term that refers to a processor that is implemented in a single chip. The first microprocessor was introduced by Intel under the name of Intel 4004 in 1971. It contained about 2,300 transistors, was clocked at 740 KHz and delivered 92,000 instructions per second while dissipating around 0.5 watts. Since then, practically every year we have witnessed the launch of a new microprocessor, delivering significant performance improvements over previous ones. Some studies have estimated this growth to be exponential, in the order of about 50% per year, which results in a cumulative growth of over three orders of magnitude in a time span of two decades [12]. These improvements have been fueled by advances in the manufacturing process and innovations in processor architecture. According to several studies [4][6], both aspects contributed in a similar amount to the global gains. The manufacturing process technology has tried to follow the scaling recipe laid down by Robert N. Dennard in the early 1970s [7]. The basics of this technology scaling consists of reducing transistor dimensions by a factor of 30% every generation (typically 2 years) while keeping electric fields constant. The 30% scaling in the dimensions results in doubling the transistor density (doubling transistor density every two years was predicted in 1975 by Gordon Moore and is normally referred to as Moore’s Law [21][22]).
    [Show full text]
  • Advanced Computer Architecture Lecture No. 30
    Advanced Computer Architecture-CS501 ________________________________________________________ Advanced Computer Architecture Lecture No. 30 Reading Material Vincent P. Heuring & Harry F. Jordan Chapter 8 Computer Systems Design and Architecture 8.3.3, 8.4 Summary • Nested Interrupts • Interrupt Mask • DMA Nested Interrupts (Read from Book, Jordan Page 397) Interrupt Mask (Read from Book, Jordan Page 397) Priority Mask (Read from Book, Jordan Page 398) Examples Example # 123 Assume that three I/O devices are connected to a 32-bit, 10 MIPS CPU. The first device is a hard drive with a maximum transfer rate of 1MB/sec. It has a 32-bit bus. The second device is a floppy drive with a transfer rate of 25KB/sec over a 16-bit bus, and the third device is a keyboard that must be polled thirty times per second. Assuming that the polling operation requires 20 instructions for each I/O device, determine the percentage of CPU time required to poll each device. Solution: The hard drive can transfer 1MB/sec or 250 K 32-bit words every second. Thus, this hard drive should be polled using at least this rate. Using 1K=210, the number of CPU instructions required would be 250 x 210 x 20 = 5120000 instructions per second. 23 Adopted from [H&P org] Last Modified: 01-Nov-06 Page 309 Advanced Computer Architecture-CS501 ________________________________________________________ Percentage of CPU time required for polling is (5.12 x 106)/ (10 x106) = 51.2% The floppy disk can transfer 25K/2= 12.5 x 210 half-words per second. It should be polled with at least this rate.
    [Show full text]
  • Xilinx Running the Dhrystone 2.1 Benchmark on a Virtex-II Pro
    Product Not Recommended for New Designs Application Note: Virtex-II Pro Device R Running the Dhrystone 2.1 Benchmark on a Virtex-II Pro PowerPC Processor XAPP507 (v1.0) July 11, 2005 Author: Paul Glover Summary This application note describes a working Virtex™-II Pro PowerPC™ system that uses the Dhrystone benchmark and the reference design on which the system runs. The Dhrystone benchmark is commonly used to measure CPU performance. Introduction The Dhrystone benchmark is a general-performance benchmark used to evaluate processor execution time. This benchmark tests the integer performance of a CPU and the optimization capabilities of the compiler used to generate the code. The output from the benchmark is the number of Dhrystones per second (that is, the number of iterations of the main code loop per second). This application note describes a PowerPC design created with Embedded Development Kit (EDK) 7.1 that runs the Dhrystone benchmark, producing 600+ DMIPS (Dhrystone Millions of Instructions Per Second) at 400 MHz. Prerequisites Required Software • Xilinx ISE 7.1i SP1 • Xilinx EDK 7.1i SP1 • WindRiver Diab DCC 5.2.1.0 or later Note: The Diab compiler for the PowerPC processor must be installed and included in the path. • HyperTerminal Required Hardware • Xilinx ML310 Demonstration Platform • Serial Cable • Xilinx Parallel-4 Configuration Cable Dhrystone Developed in 1984 by Reinhold P. Wecker, the Dhrystone benchmark (written in C) was Description originally developed to benchmark computer systems, a short benchmark that was representative of integer programming. The program is CPU-bound, performing no I/O functions or operating system calls.
    [Show full text]
  • Introduction to Cpu
    microprocessors and microcontrollers - sadri 1 INTRODUCTION TO CPU Mohammad Sadegh Sadri Session 2 Microprocessor Course Isfahan University of Technology Sep., Oct., 2010 microprocessors and microcontrollers - sadri 2 Agenda • Review of the first session • A tour of silicon world! • Basic definition of CPU • Von Neumann Architecture • Example: Basic ARM7 Architecture • A brief detailed explanation of ARM7 Architecture • Hardvard Architecture • Example: TMS320C25 DSP microprocessors and microcontrollers - sadri 3 Agenda (2) • History of CPUs • 4004 • TMS1000 • 8080 • Z80 • Am2901 • 8051 • PIC16 microprocessors and microcontrollers - sadri 4 Von Neumann Architecture • Same Memory • Program • Data • Single Bus microprocessors and microcontrollers - sadri 5 Sample : ARM7T CPU microprocessors and microcontrollers - sadri 6 Harvard Architecture • Separate memories for program and data microprocessors and microcontrollers - sadri 7 TMS320C25 DSP microprocessors and microcontrollers - sadri 8 Silicon Market Revenue Rank Rank Country of 2009/2008 Company (million Market share 2009 2008 origin changes $ USD) Intel 11 USA 32 410 -4.0% 14.1% Corporation Samsung 22 South Korea 17 496 +3.5% 7.6% Electronics Toshiba 33Semiconduc Japan 10 319 -6.9% 4.5% tors Texas 44 USA 9 617 -12.6% 4.2% Instruments STMicroelec 55 FranceItaly 8 510 -17.6% 3.7% tronics 68Qualcomm USA 6 409 -1.1% 2.8% 79Hynix South Korea 6 246 +3.7% 2.7% 812AMD USA 5 207 -4.6% 2.3% Renesas 96 Japan 5 153 -26.6% 2.2% Technology 10 7 Sony Japan 4 468 -35.7% 1.9% microprocessors and microcontrollers
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • AI Acceleration – Image Processing As a Use Case
    AI acceleration – image processing as a use case Masoud Daneshtalab Associate Professor at MDH www.idt.mdh.se/~md/ Outline • Dark Silicon • Heterogeneouse Computing • Approximation – Deep Neural Networks 2 The glory of Moore’s law Intel 4004 Intel Core i7 980X 2300 transistors 1.17B transistors 740 kHz clock 3.33 GHz clock 10um process 32nm process 10.8 usec/inst 73.4 psec/inst Every 2 Years • Double the number of transistors • Build higher performance general-purpose processors ⁻ Make the transistors available to masses ⁻ Increase performance (1.8×↑) ⁻ Lower the cost of computing (1.8×↓) Semiconductor trends ITRS roadmap for SoC Design Complexity Trens ITRS: International Technology Roadmap for Semiconductors Expected number of processing elements into a System-on-Chip (SoC). 4 Semiconductor trends ITRS roadmap for SoC Design Complexity Trens 5 Semiconductor trends Network-on-Chip (NoC) –based Multi/Many-core Systems Founder: Andreas Olofsson Sponsored by Ericsson AB 6 Dark Silicon Era The catch is powering exponentially increasing number of transistors without melting the chip down. 10 10 Chip Transistor Count 2,200,000,000 109 Chip Power 108 107 106 105 2300 104 103 130 W 102 101 0.5 W 100 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 If you cannot power them, why bother making them? 7 Dark Silicon Era The catch is powering exponentially increasing number of transistors without melting the chip down. 10 10 Chip Transistor Count 2,200,000,000 109 Chip Power 108 107 Dark Silicon 106 105 (Utilization Wall) 2300 104 Fraction of transistors that need to be 103 powered off at all times due to power 130 W 2 constraints.
    [Show full text]