<<

What is ?

• “Computer Architecture is the science and art of selecting and interconnecting hardware components to create that meet functional, performance and cost CIS 501 goals.” - WWW Computer Architecture Page Computer Architecture • An analogy to architecture of buildings…

Unit 0: Introduction

Slides developed by Milo Martin & Amir Roth at the University of Pennsylvania with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, and David Wood.

CIS 501 (Martin): Introduction 1 CIS 501 (Martin): Introduction 2

What is Computer Architecture? What is Computer Architecture? The role of a building architect: The role of a computer architect: Construction Buildings Manufacturing Computers Materials “Technology” Desktops Plans Houses Plans Steel Design Offices Logic Gates Design Servers Concrete Apartments SRAM Mobile Phones Brick Goals Stadiums DRAM Goals Wood Function Museums Circuit Techniques Function Game Consoles Glass Cost Packaging Performance Embedded Safety Magnetic Storage Reliability Ease of Construction Flash Memory Cost/Manufacturability Energy Efficiency Energy Efficiency Fast Build Time Time to Market Aesthetics Important differences: age (~60 years vs thousands), rate of change, automated mass production (magnifies design)

CIS 501 (Martin): Introduction 3 CIS 501 (Martin): Introduction 4 Computer Architecture Is Different… Design Goals

• Age of discipline • Functional • 60 years (vs. five thousand years) • Needs to be correct • And unlike software, difficult to update once deployed • What functions should it support (Turing completeness aside) • Rate of change • All three factors (technology, applications, goals) are changing • Reliable • Quickly • Does it continue to perform correctly? • Hard fault vs transient fault • Automated mass production • Google story - memory errors and sun spots • Space satellites vs desktop vs server reliability • Design advances magnified over millions of chips • High performance • Boot-strapping effect • “Fast” is only meaningful in the context of a set of important tasks • Better computers help design next generation • Not just “Gigahertz” – truck vs sports car analogy • Impossible goal: fastest possible design for all programs CIS 501 (Martin): Introduction 5 CIS 501 (Martin): Introduction 6

Design Goals Shaping Force: Applications/Domains

• Low cost • Another shaping force: applications (usage and context) • Per unit manufacturing cost (wafer cost) • Applications and application domains have different requirements • Cost of making first chip after design (mask cost) • Domain: group with similar character • Design cost (huge design teams, why? Two reasons…) • Lead to different designs • (Dime/dollar joke)

• Low power/energy • Scientific: weather prediction, genome sequencing • Energy in (battery life, cost of electricity) • First application domain: naval ballistics firing tables • Energy out (cooling and related costs) • Need: large memory, heavy-duty floating point • Cyclic problem, very much a problem today • Examples: CRAY T3E, IBM BlueGene

• Challenge: balancing the relative importance of these goals • Commercial: /web serving, e-commerce, Google • And the balance is constantly changing • Need: data movement, high memory + I/O bandwidth • No goal is absolutely important at expense of all others • Examples: Sun Enterprise Server, AMD Opteron, Intel Xeon • Our focus: performance, only touch on cost, power, reliability CIS 501 (Martin): Introduction 7 CIS 501 (Martin): Introduction 8 More Recent Applications/Domains Application Specific Designs

• Desktop: home office, multimedia, games • This class is about general-purpose CPUs • Need: integer, memory bandwidth, integrated graphics/network? • that can do anything, run a full OS, etc. • Examples: Intel Core 2, Core i7, AMD Athlon • E.g., Intel Core i7, AMD Athlon, IBM Power, ARM, Intel • Mobile: laptops, mobile phones • In contrast to application-specific chips • Need: low power, integer performance, integrated wireless • Or ASICs (Application specific integrated circuits) • Laptops: Intel Core 2 Mobile, Atom, AMD Turion • Also application-domain specific processors • Smaller devices: ARM chips by Samsung and others, Intel Atom • Implement critical domain-specific functionality in hardware • Embedded: in automobiles, door knobs • Examples: video encoding, 3D graphics • Need: low power, low cost • General rules • Examples: ARM chips, dedicated processors (DSPs) - Hardware is less flexible than software • Over 1 billion ARM cores sold in 2006 (at least one per phone) + Hardware more effective (speed, power, cost) than software + Domain specific more “parallel” than general purpose • Deeply Embedded: disposable “smart dust” sensors • But general mainstream processors becoming more parallel • Need: extremely low power, extremely low cost • Trend: from specific to general (for a specific domain) CIS 501 (Martin): Introduction 9 CIS 501 (Martin): Introduction 10

Constant Change: Technology “Technology” Applications/Domains Logic Gates Desktop SRAM Servers DRAM Mobile Phones Circuit Techniques Supercomputers Packaging Game Consoles Magnetic Storage Goals Embedded Technology Trends Flash Memory Function Performance Reliability Cost/Manufacturability Energy Efficiency Time to Market • Absolute improvement, different rates of change • New application domains enabled by technology advances CIS 501 (Martin): Introduction 11 CIS 501 (Martin): Introduction 12 “Technology” Technology Trends gate • Basic element • Moore’s Law • Solid-state (i.e., electrical switch) source drain • Continued (up until now, at least) transistor miniaturization • Building block of integrated circuits (ICs) channel • Some technology-based ramifications • What’s so great about ICs? Everything • Absolute improvements in density, speed, power, costs + High performance, high reliability, low cost, low power • SRAM/logic: density: ~30% (annual), speed: ~20% + Lever of mass production • DRAM: density: ~60%, speed: ~4% • Several kinds of IC families • Disk: density: ~60%, speed: ~10% (non-transistor) • SRAM/logic: optimized for speed (used for processors) • Big improvements in flash memory and network bandwidth, too • DRAM: optimized for density, cost, power (used for memory) • Flash: optimized for density, cost (used for storage) • Changing quickly and with respect to each other!! • Increasing opportunities for integrating multiple technologies • Example: density increases faster than speed • Non-transistor storage and inter-connection technologies • Trade-offs are constantly changing • Disk, optical storage, ethernet, fiber optics, wireless • Re-evaluate/re-design for each technology generation CIS 501 (Martin): Introduction 13 CIS 501 (Martin): Introduction 14

Technology Change Drives Everything Revolution I: The

• Computers get 10x faster, smaller, cheaper every 5-6 years! • Microprocessor revolution • A 10x quantitative change is qualitative change • One significant technology threshold was crossed in 1970s • Plane is 10x faster than car, and fundamentally different travel mode • Enough (~25K) to put a 16- processor on one chip • Huge performance advantages: fewer slow chip-crossings • New applications become self-sustaining market segments • Even bigger cost advantages: one “stamped-out” component • Recent examples: mobile phones, digital cameras, mp3 players, etc. • have allowed new market segments • Low-level improvements appear as discrete high-level jumps • Desktops, CD/DVD players, laptops, game consoles, set-top boxes, • Capabilities cross thresholds, enabling new applications and uses mobile phones, digital camera, mp3 players, GPS, automotive

• And replaced incumbents in existing segments • Microprocessor-based system replaced supercomputers, “mainframes”, “minicomputers”, etc.

CIS 501 (Martin): Introduction 15 CIS 501 (Martin): Introduction 16 First Microprocessor Pinnacle of Single-Core Microprocessors

• Intel 4004 (1971) • Intel Pentium4 (2003) • Application: calculators • Application: desktop/server • Technology: 10000 nm • Technology: 90nm (1/100x)

• 55M transistors (20,000x) • 2300 transistors • 101 mm2 (10x) • 13 mm2 • 3.4 GHz (10,000x) • 108 KHz • 1.2 Volts (1/10x) • 12 Volts • 32/64-bit data (16x) • 4-bit data • 22-stage pipelined • Single-cycle datapath • 3 (superscalar) • Two levels of on-chip cache • data-parallel vector (SIMD) instructions, hyperthreading

CIS 501 (Martin): Introduction 17 CIS 501 (Martin): Introduction 18

Tracing the Microprocessor Revolution Revolution II: Implicit Parallelism

• How were growing transistor counts used? • Then to extract implicit instruction-level parallelism • Hardware provides parallel resources, figures out how to use them • Software is oblivious • Initially to widen the datapath • 4004: 4 ! Pentium4: 64 bits • Initially using pipelining … • Which also enabled increased clock frequency • … and also to add more powerful instructions • … caches … • To amortize overhead of fetch and decode • Which became necessary as processor clock frequency increased • To simplify programming (which was done by hand then) • … and integrated floating-point • Then deeper pipelines and branch speculation • Then multiple instructions per cycle (superscalar) • Then dynamic (out-of-order execution)

• We will talk about these things CIS 501 (Martin): Introduction 19 CIS 501 (Martin): Introduction 20 Pinnacle of Single-Core Microprocessors Modern Multicore Processor

• Intel Pentium4 (2003) • Intel Core i7 (2009) • Application: desktop/server • Application: desktop/server • Technology: 90nm (1/100x) • Technology: 45nm (1/2x)

• 55M transistors (20,000x) • 774M transistors (12x) • 101 mm2 (10x) • 296 mm2 (3x) • 3.4 GHz (10,000x) • 3.2 GHz to 3.6 Ghz (~1x) • 1.2 Volts (1/10x) • 0.7 to 1.4 Volts (~1x)

• 32/64-bit data (16x) • 128-bit data (2x) • 22-stage pipelined datapath • 14-stage pipelined datapath (0.5x) • 3 instructions per cycle (superscalar) • 4 instructions per cycle (~1x) • Two levels of on-chip cache • Three levels of on-chip cache • data-parallel vector (SIMD) instructions, hyperthreading • data-parallel vector (SIMD) instructions, hyperthreading • Four-core multicore (4x)

CIS 501 (Martin): Introduction 21 CIS 501 (Martin): Introduction 22

Revolution III: Explicit Parallelism To ponder…

• Then to support explicit data & level parallelism • Hardware provides parallel resources, software specifies usage • Why? diminishing returns on instruction-level-parallelism

• First using (subword) vector instructions…, Intel’s SSE • One instruction does four parallel multiplies Is this decade’s

• … and general support for multi-threaded programs “multicore revolution” • Coherent caches, hardware synchronization primitives comparable to the original • Then using support for multiple concurrent threads on chip “microprocessor revolution”? • First with single-core multi-threading, now with multi-core

• Graphics processing units (GPUs) are highly parallel • Converging with general-purpose processors (CPUs)?

CIS 501 (Martin): Introduction 23 CIS 501 (Martin): Introduction 24 Technology Disruptions Recap: Constant Change “Technology” Applications/Domains • Classic examples: Logic Gates Desktop • The transistor SRAM Servers • Microprocessor DRAM Mobile Phones • More recent examples: Circuit Techniques Supercomputers • Multicore processors Packaging Game Consoles • Flash-based solid-state storage Magnetic Storage Goals Embedded • Near-term potentially disruptive technologies: Flash Memory Function • Phase-change memory (non-volatile memory) Performance • Chip stacking (also called 3D die stacking) Reliability • Disruptive “end-of-scaling” Cost/Manufacturability Energy Efficiency • “If something can’t go on forever, it must stop eventually” Time to Market • Can we continue to shrink transistors for ever? • Even if more transistors, not getting as energy efficient as fast

CIS 501 (Martin): Introduction 25 CIS 501 (Martin): Introduction 26

Managing This Mess Pervasive Idea: Abstraction and Layering

• Architect must consider all factors • Abstraction: only way of dealing with complex systems • Goals/constraints, applications, implementation technology • Divide world into objects, each with an… • Interface: knobs, behaviors, knobs ! behaviors • Implementation: “black box” (ignorance+apathy) • Questions • Only specialists deal with implementation, rest of us with interface • How to deal with all of these inputs? • Example: car, only mechanics know how implementation works • How to manage changes? • Layering: abstraction discipline makes life even simpler • Divide objects in system into layers, layer n objects… • Answers • Implemented using interfaces of layer n – 1 • Don’t need to know interfaces of layer n – 2 (sometimes helps) • Accrued institutional knowledge (stand on each other’s shoulders) • Inertia: a dark side of layering • Experience, rules of thumb • Layer interfaces become entrenched over time (“standards”) • Discipline: clearly defined end state, keep your eyes on the ball – Very difficult to change even if benefit is clear (example: Digital TV) • Abstraction and layering • Opacity: hard to reason about performance across layers

CIS 501 (Martin): Introduction 27 CIS 501 (Martin): Introduction 28 Abstraction, Layering, and Computers Why Study Computer Architecture?

Application Application Application • Understand where computers are going Software • Future capabilities drive the (computing) world , Device Drivers • Real world-impact: no computer architecture no computers! Instruction Set Architecture (ISA) ! Processor Memory I/O • Understand high-level design concepts Hardware • The best architects understand all the levels Circuits, Devices, Materials • Devices, circuits, architecture, , applications • Computer architecture • Understand computer performance • Writing well-tuned (fast) software requires knowledge of hardware • Definition of ISA to facilitate implementation of software layers • Get a (design or research) hardware job • Intel, AMD, IBM, ARM, Motorola, Sun/Oracle, NVIDIA, Samsung • This course mostly on computer micro-architecture • Get a (design or research) software job • Design Processor, Memory, I/O to implement ISA • Best software designers understand hardware • Need to understand hardware to write fast software • Touch on & OS (n +1), circuits (n -1) as well

CIS 501 (Martin): Introduction 29 CIS 501 (Martin): Introduction 30

Penn Legacy Course Goals

• ENIAC: electronic numerical integrator and calculator • See the “big ideas” in computer architecture • Pipelining, parallelism, caching, locality, abstraction, etc. • First operational general-purpose stored-program computer • Designed and built here by Eckert and Mauchly • Go see it (Moore building) • Exposure to examples of good (and some bad) engineering

• Understanding computer performance and metrics • First seminars on computer design • Experimental evaluation/analysis (“science” in ) • Moore School Lectures, 1946 • Gain experience with simulators (architect’s tool of choice) • “Theory and Techniques • Understanding quantitative data and experiments for Design of Electronic Digital Computers” • Get exposure to “research” and cutting edge ideas • Read some research literature (i.e., papers) • Course project

• My role: trick you into learning something

CIS 501 (Martin): Introduction 31 CIS 501 (Martin): Introduction 32 Computer Science as an Estuary Course Topics Where does architecture fit into computer science? Engineering Engineering, some Science • Revisiting “undergraduate” computer architecture topics Design • Evaluation metrics and trends Handling complexity • ISAs (instruction set ) Real-world impact • and pipelining Examples: Internet, • Memory hierarchies & microprocessor

Science • Parallelism Experiments • Instruction: multiple issue, dynamic scheduling, speculation Hypothesis • Data: vectors and streams Mathematics Examples: Limits of computation Internet behavior, • Thread: and synchronization, multicore & analysis Protein-folding Human/computer interaction • More fun stuff if we get to it Logic Proofs of correctness Other Issues Public policy, ethics, CIS 501 (Martin): Introduction law, security 33 CIS 501 (Martin): Introduction 34

CIS501: Administrivia Resources

• Instructor: Prof. Milo Martin ([email protected]) • Readings • TAs: Christian DeLozier & Abhishek Udupa • “Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors” by Jean-Loup Baer • Lectures • Penn Bookstore or Amazon ($68) or Kindle ($54) • Research papers (online) • Please do not be disruptive (I’m easily distracted as it is)

• Three different web sites • Free resources • Course website: syllabus, schedule, lecture notes, assignments • ACM digital : http://www.acm.org/dl/ • http://www.cis.upenn.edu/~cis501/ • Computer architecture page: http://www.cs.wisc.edu/~arch/www/ • “Piazza”: announcements, questions & discussion • http://www.piazza.com/upenn/fall2011/cis501 • Local resources: • The way to ask questions/clarifications • Architecture & Compilers Group: http://www.cis.upenn.edu/acg/ • Can post to just me & TAs or anonymous to class • As a general rule, no need to email me directly • “Blackboard”: grade book, turning in some assignments • https://courseweb.library.upenn.edu/

CIS 501 (Martin): Introduction 35 CIS 501 (Martin): Introduction 36 Prerequisites The Students of CIS501

• Basic computer organization an absolute must • Three different constituencies, different backgrounds • Basic digital logic: gates, boolean functions, latches • PhD students • Binary arithmetic: adders, hardware mul/div, floating-point • More research focused • Basic datapath: ALU, , memory interface, muxes • WPE-I PhD qualifying exam • Basic control: single-cycle control, • Familiarity with • MSE students (CIS, EMBS, Robotics, others) • “Computer Organization and Design: Hardware/Software Interface” • Expand on undergraduate coursework • Which, unfortunately, varies widely • http://www.cis.upenn.edu/~cis371/ • BSE (undergraduate) students • Significant programming experience • Expand on undergraduate coursework (CIS371) • For those considering graduate school • No specific language required • Why? assignments require writing code to simulate hardware • Extremely difficult to tailor course for all three constituencies • Not difficult if competent ; extremely difficult if not

CIS 501 (Martin): Introduction 37 CIS 501 (Martin): Introduction 38

For Non-CIS Students… Coursework

• Registration priority is given to CIS students • Homework assignments • Written questions and programming • For non-CIS students • Due at beginning of class • As the class is already extremely large… • 2 total “grace” periods (next class period), max one per assignment • Hand in late, no questions asked • I’ll only consider admitting students not in their first semester • No assignments accepted after solutions posted • For non-CIS students not in their first semester, if you • Individual work want to be considered, send me via email (milom@cis): • Paper reviews 1. Your name & Penn email address • Short response to papers we’ll read for class 2. What program you’re enrolled in • Discuss and write up in groups of four • Twist: can’t work with the same group member 3. A transcript of all your Penn courses with grades 4. Description of prior courses on computer architecture • Exams • Midterm, in class, Thursday, October 27th 5. A brief description of the largest programming project you’ve • Cumulative final completed (lines of code, overall complexity, language used, etc.) • Thursday, December 15th 12-2pm • WPE I for PhD students

CIS 501 (Martin): Introduction 39 CIS 501 (Martin): Introduction 40 Coursework Grading

• Mini-research project • Tentative grade contributions: • Topic • Homework assignments: 20% • Validate data in some paper studied in class (default) • Paper reviews: 5% • Examine modest extension to paper (more ambitious) • Mini-research group project: 15% • Your own idea (great!) • Use tools • Exams: 60% • Homework will help you get ready • Midterm: 25% • Groups of four (keep an eye out for potential partners) • Final: 35% • Proposal + final report • More detail later • Typical grade distributions • A: 40%, B: 40%, /D/F: 20%

CIS 501 (Martin): Introduction 41 CIS 501 (Martin): Introduction 42

Academic Misconduct Full Disclosure

• Cheating will not be tolerated • Potential sources of bias or conflict of interest

• General rule: • Most of my funding governmental (your tax $$$ at work) • Anything with your name on it must be YOUR OWN work • National Science Foundation (NSF) • Example: individual work on homework assignments • DARPA & ONR • Possible penalties • Zero on assignment (minimum) • My non-governmental sources of research funding • Fail course • NVIDIA (sub-contract of large DARPA project) • Note on permanent record • Intel • Suspension • Sun/Oracle (hardware donation) • Expulsion • Collaborators and colleagues • Penn’s Code of Conduct • Intel, IBM, AMD, Oracle, Microsoft, Google, VMWare, ARM, etc. • http://www.vpul.upenn.edu/osl/acadint.html • (Just about every major company)

CIS 501 (Martin): Introduction 43 CIS 501 (Martin): Introduction 44 First Assignment – Paper Review #1 Paper Review #1 Questions

• Read “Cramming More Components onto Integrated Circuits” by • Q1: The figure on page 2 graphs relative manufacturing Gordon Moore cost per component against the number of components per . Why do the chips become less cost • As a group of four, meet and discuss the paper effective per component for both very large and very small • Briefly answer the questions on the next slide numbers of components per chip? • The goal of these questions is to get you reading, thinking about, and discussing the paper • Your answers should be short but insightful. For most questions, a single short paragraph will suffice • Q2: One of the potential problems which Moore raises (and dismisses) is heat. Do you agree with Moore's conclusions? • E-mail the answers to me: Either justify or refute Moore's conclusions. • Text only, no html or attachments, please • Send to: [email protected] • Q3: A popular misconception of Moore's law is that it states • The “+reviews” is important, don’t leave it out • Carbon copy (CC) all group members that the speed of computers increases exponentially, • Include the names of all group member at the start of the e-mail however, that is not what Moore foretells in this paper. Explain what Moore's law actually says based on this • Due: “last thing” Wednesday, Sept 14th paper.

CIS 501 (Martin): Introduction 45 CIS 501 (Martin): Introduction 46

For Next Week…

• Read Chapter 1 for Thursday

• Read “Cramming More Components onto Integrated Circuits” by Moore, 1965 • Group discussion responses for “last thing” Wednesday

• If you’re a non-CIS student wanting to take this course • Send me email as discussed earlier

• See me right now if: • You’re an undergraduate taking this course • Any other questions about prerequisites or the course

CIS 501 (Martin): Introduction 47