CIS 501 Computer Architecture – Unit 3: Technology


CIS 501 Computer Architecture
Unit 3: Technology

This Unit
• Technology basis
• Transistors & wires
• Cost & fabrication
• Implications of transistor scaling (Moore's Law)

Slides originally developed by Amir Roth with contributions by Milo Martin at the University of Pennsylvania, with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, and David Wood.
CIS 501 (Martin/Roth): Technology

Readings
• Chapter 1.1 of MA:FSPTCM
• Paper: G. Moore, "Cramming More Components onto Integrated Circuits"

Review: Simple Datapath
[Figure: datapath – PC feeding the instruction memory, register file with ports s1/s2/d, data memory, and a "+4" next-PC adder]
• How are instructions executed?
• Fetch instruction (program counter into instruction memory)
• Read registers
• Calculate values (adds, subtracts, address generation, etc.)
• Access memory (optional)
• Calculate next program counter (PC)
• Repeat
• Clock period = longest delay through datapath

Semiconductor Technology
[Figure: MOSFET cross-section – gate over an insulator, source and drain in the substrate, channel between them]
• Basic technology element: the MOSFET
• Solid-state component that acts like an electrical switch
• MOS: metal-oxide-semiconductor (conductor, insulator, semi-conductor)
• FET: field-effect transistor

Recall: Processor Performance
• Programs consist of simple operations (instructions)
• Add two numbers, fetch a data value from memory, etc.
• Program runtime = "seconds per program" =
  (instructions/program) * (cycles/instruction) * (seconds/cycle)
• Instructions per program: "dynamic instruction count"
• Runtime count of instructions executed by the program
• Determined by program, compiler, and instruction set architecture (ISA)
• Cycles per instruction: "CPI" (typical range: 2 to 0.5)
• On average, how many cycles does an instruction take to execute?
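The runtime equation, seconds/program = (instructions/program) * (cycles/instruction) * (seconds/cycle), can be sanity-checked with a few lines of Python; the workload numbers below (instruction count, CPI, clock rate) are made-up values for illustration, not measurements:

```python
# Runtime = dynamic instruction count * CPI * clock period.
# All inputs are assumed example values.
insn_count = 2_000_000_000   # dynamic instruction count (assumed)
cpi = 1.5                    # cycles per instruction (assumed, within the 2-to-0.5 range)
clock_hz = 3_000_000_000     # 3 GHz clock, so seconds/cycle = 1 / clock_hz

runtime = insn_count * cpi / clock_hz  # (insns) * (cycles/insn) * (seconds/cycle)
print(runtime)  # 1.0 second
```

Note that halving CPI or doubling the clock rate each halve the runtime, which is why all three factors matter equally.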
Recall: Processor Performance (continued)
• Determined by program, compiler, ISA, and micro-architecture
• Seconds per cycle: clock period, the length of each cycle
• Inverse metric: cycles per second (Hertz) or cycles per nanosecond (GHz)
• Determined by micro-architecture and technology parameters
• This unit: transistors & semiconductor technology

Semiconductor Technology (continued)
• Channel conducts source→drain only when voltage is applied to the gate
• Channel length: the characteristic parameter (short → fast)
• Aka "feature size" or "technology"
• Currently: 0.032 micron (µm), i.e., 32 nanometers (nm)
• Continued miniaturization (scaling) is known as "Moore's Law"
• Won't last forever; physical limits approaching (or are they?)

Complementary MOS (CMOS)
• Voltages as values: power (VDD) = "1", ground = "0"
• Two kinds of MOSFETs
• N-transistors: conduct when gate voltage is 1 – good at passing 0s
• P-transistors: conduct when gate voltage is 0 – good at passing 1s
• CMOS: complementary n-/p- networks form boolean logic (i.e., gates)
• And some non-gate elements too (important example: RAMs)

Basic CMOS Logic Gate
[Figure: CMOS inverter – p-transistor between power (1) and the output "node", n-transistor between the output and ground (0)]
• Inverter: NOT gate, built from one p-transistor and one n-transistor
• Basic operation
• Input = 0: p-transistor closed, n-transistor open; power charges the output (1)
• Input = 1: p-transistor open, n-transistor closed; the output discharges to ground (0)

Another CMOS Gate Example
[Figure: two p-transistors in parallel from power to the output, two n-transistors in series from the output to ground, inputs A and B]
• What is this? Look at the truth table
• 0, 0 → 1
• 0, 1 → 1
• 1, 0 → 1
• 1, 1 → 0
• Result: NAND (NOT AND)
• NAND is "universal"
• What function is this? [a second gate is shown in the figure]
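The NAND truth table falls directly out of the complementary networks: two p-transistors in parallel form the pull-up (each conducts when its gate is 0), and two n-transistors in series form the pull-down (both must see 1 to conduct). A small Python sketch of that switch-level view, with 0/1 standing for ground/VDD:

```python
def nand(a, b):
    # Pull-up network: two p-transistors in parallel; each conducts when its gate is 0.
    pull_up = (a == 0) or (b == 0)      # connects the output to power (1)
    # Pull-down network: two n-transistors in series; each conducts when its gate is 1.
    pull_down = (a == 1) and (b == 1)   # connects the output to ground (0)
    assert pull_up != pull_down         # complementary: exactly one network conducts
    return 1 if pull_up else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand(a, b))   # reproduces the truth table: 1, 1, 1, 0
```

Swapping the topologies (p-transistors in series, n-transistors in parallel) would give NOR instead, the same construction with the networks exchanged.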
Technology Basis of Clock Frequency

• Physics 101: delay through an electrical component ~ RC
• Resistance (R) ~ length / cross-section area
• Slows the rate of charge flow
• Capacitance (C) ~ length * area / distance-to-other-plate
• Stores charge
• Voltage (V): electrical pressure
• Threshold voltage (Vt): the voltage at which a transistor turns "on"
• A property of the transistor, determined by the fabrication technology
• Switching time ~ (R * C) / (V – Vt)
• Two kinds of electrical components: CMOS transistors (gates) and wires

Resistance
• Channel resistance
• Wire resistance: negligible for short wires

Capacitance
• Gate capacitance
• Source/drain capacitance
• Wire capacitance: negligible for short wires

Transistor Geometry: Width
[Figure: MOSFET top view and cross-section showing gate, source, drain, bulk Si, width, and length – diagrams © Krste Asanovic, MIT]
• Transistor width is set by the designer for each transistor
• Wider transistors:
• Lower channel resistance (increases drive strength) – good!
• But increased gate/source/drain capacitance – bad!
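The width tradeoff can be made concrete with a toy RC model: channel resistance falls as 1/width, while the gate capacitance the transistor presents to the stage driving it grows with width. All constants below are invented for illustration, not real process data:

```python
# Toy model of transistor sizing. Widening a device lowers its channel
# resistance but raises the gate capacitance its driver must charge.
R_UNIT = 1000.0      # channel resistance at minimum width, ohms (assumed)
C_GATE_UNIT = 1e-15  # gate capacitance per unit of width, farads (assumed)
R_PREV = 1000.0      # resistance of the previous (driving) stage, ohms (assumed)
C_LOAD = 10e-15      # fixed wire + fanout load on this gate, farads (assumed)

def stage_delay(width):
    charge_me = R_PREV * (C_GATE_UNIT * width)  # cost: my wider gate loads my driver
    drive_load = (R_UNIT / width) * C_LOAD      # benefit: I drive my own load faster
    return charge_me + drive_load

best = min(range(1, 101), key=stage_delay)
print(best)  # an intermediate width minimizes total delay
```

The minimum sits at an intermediate width: widening past it keeps helping this stage but costs the previous stage more than it saves, which is exactly the balancing act the slide describes.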
• Result: set the width to balance these conflicting effects
• Implication: the number of "outputs" of a gate matters

Transistor Geometry: Length & Scaling
• Transistor length: characteristic of the "process generation"
• "45nm" refers to the transistor gate length, the same for all transistors
• Shrinking transistor length:
• Lower channel resistance (a shorter channel) – good!
• Lower gate/source/drain capacitance – good!
• Result: switching speed improves linearly as gate length shrinks

Wire Geometry
[Figure: wire cross-sections showing width, height, and pitch; IBM CMOS7 with 6 layers of copper wiring – from slides © Krste Asanovic, MIT]
• Transistors are 1-dimensional for design purposes: width
• Wires are 4-dimensional: length, width, height, "pitch"
• Longer wires have more resistance
• "Thinner" wires have more resistance
• Closer wire spacing ("pitch") increases capacitance

RC Delay Model Ramifications
• Want to reduce resistance
• Wide drive transistors (width is specified per device)
• Short gate length
• Short wires
• Want to reduce capacitance
• Fewer connected devices
• Less-wide transistors (gate capacitance of the next stage)
• Short wires

Increasing Problem: Wire Delay
• RC delay of wires
• Resistance proportional to resistivity * length / (cross-section area)
• Wires with a smaller cross-section have higher resistance
• Resistivity depends on the type of metal (copper vs. aluminum)
• Capacitance proportional to length
• And to wire spacing (closer wires have larger capacitance)
• And to the permittivity or "dielectric constant" of the material between wires
• Result: the delay of a wire is quadratic in its length
• Insert "inverter" repeaters for long wires
• Why?
• To bring the delay back to linear in length… but the repeaters themselves still add delay
• Trend: wires are getting slow relative to transistors
• And it takes relatively longer to cross relatively larger chips

Cost
• Metric: $
• In the grand scheme, the CPU accounts for only a fraction of system cost
• Some of that is profit (Intel's, Dell's)

                Desktop      Laptop       Netbook     Phone
  $             $100–$300    $150–$350    $50–$100    $10–$20
  % of total    10–30%       10–20%       20–30%      20–30%

• Other costs: memory, display, power supply/battery, storage, software
• We are concerned with chip cost
• Unit cost: the cost to manufacture individual chips
• Startup cost: the cost to design the chip and build the manufacturing facility

Cost versus Price
• Cost: cost to the manufacturer, the cost to produce
• What is the relationship of cost to price?
• Complicated; it has to do with volume and competition
• Commodity: high-volume, un-differentiated, un-branded
• "Un-differentiated": copper is copper, wheat is wheat
• "Un-branded": consumers aren't allied to a manufacturer's brand
• Commodity prices track costs closely
• Example: DRAM (used for main memory) is a commodity
• Do you even know who manufactures DRAM?
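The quadratic-delay point, and why repeaters bring it back to linear, can be sketched with a toy RC model: a wire's resistance and capacitance each grow linearly with length, so their product grows quadratically. The per-millimeter constants and the repeater delay below are assumed round numbers, not real process data:

```python
# Toy wire-delay model. Both R and C scale linearly with wire length,
# so the unrepeated RC delay scales with length squared.
R_PER_MM = 100.0    # wire resistance per mm, ohms (assumed)
C_PER_MM = 0.2e-12  # wire capacitance per mm, farads (assumed)
T_REP = 5e-12       # delay added by each inserted repeater, seconds (assumed)

def unrepeated(length_mm):
    # Total R times total C: quadratic in length.
    return (R_PER_MM * length_mm) * (C_PER_MM * length_mm)

def repeated(length_mm, segments):
    # Cut the wire into short segments; each segment's RC delay is small,
    # and the total grows linearly in length (plus the repeaters' own delay).
    seg = length_mm / segments
    return segments * (unrepeated(seg) + T_REP)

print(unrepeated(10))      # long wire: quadratic blow-up
print(repeated(10, 10))    # same wire with repeaters: far smaller, linear in length
```

Doubling the wire length quadruples the unrepeated delay but only doubles the repeated one, which is why long global wires on real chips are broken up by repeaters even though each repeater costs area, power, and some delay of its own.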
• Microprocessors are not commodities
• Specialization, compatibility, different cost/performance/power points
• Complex relationship between price and cost

Manufacturing Steps
[Figure: wafer fabrication steps – source: P&H]
• Multi-step photo-/electro-chemical process
• More steps, higher unit cost
+ Fixed-cost mass production ($1 million+ for a "mask set")

Transistors and Wires
[Figure: die micrographs of transistors and wiring layers – © IBM; from slides © Krste Asanović, MIT]

Manufacturing Defects
[Figure: wafer maps showing correct, defective, and slow dies]
• Defects can arise
• Under-/over-doping
• Over-/under-dissolved insulator
• Mask mis-alignment
• Particle contaminants
• Try to minimize defects
• Process margins
• Design rules (minimum transistor size, separation)
• Or, tolerate defects
• Redundant or "spare" memory cells
• Can substantially improve yield

Unit Cost: Integrated Circuit (IC)
• Chips are built on wafers in a multi-step chemical process
• Cost per wafer is constant, a function of wafer size and the number of steps
• Chip (die) cost is related to area
• Larger chips mean fewer of them per wafer
• Cost is more than linear in area. Why? Random defects
• Larger chips mean fewer working ones
• Chip cost ~ chip area^α, with α = 2 to 3
• Wafer yield: % of the wafer that is chips
• Die yield: % of chips that work
• Yield is increasingly non-binary: fast vs. slow chips

Additional Unit Cost
• After manufacturing, there are additional unit costs
• Testing: how do you know the chip is working?

Fixed Costs
• For a new chip design
• Design & verification: ~$100M (500 person-years @ $200K
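The chip cost ~ area^α rule (α = 2 to 3, from the slides) is easy to see numerically; the baseline die area and cost below are assumed reference values, not real figures:

```python
# Sketch of super-linear die cost: cost ~ area^alpha, alpha in [2, 3].
# Baseline values are assumed for illustration only.
BASE_AREA = 100.0   # mm^2, reference die area (assumed)
BASE_COST = 10.0    # dollars at the reference area (assumed)

def chip_cost(area_mm2, alpha=2.0):
    return BASE_COST * (area_mm2 / BASE_AREA) ** alpha

# Doubling die area more than doubles cost: fewer dies per wafer,
# and a larger die is more likely to contain a random defect.
print(chip_cost(200, alpha=2.0) / chip_cost(100, alpha=2.0))  # 4.0
print(chip_cost(200, alpha=3.0) / chip_cost(100, alpha=3.0))  # 8.0
```

This is why architects fight for every square millimeter: a die twice as large costs four to eight times as much to manufacture, not twice as much.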