CSE2021 Computer Organization CSE2021 Computer Organization
Instructor: Gulzar Khuwaja, PhD
Email: [email protected] Tel: (416) 736-2100 x 77874 Instructor: Gulzar Khuwaja, PhD Course Website: Department of Electrical Engineering https://wiki.eecs.yorku.ca/course_archive/2014-15/S/2021/ & Computer Science Schedule: Lassonde School of Engineering Lectures: R 7:00 – 10:00 pm Room CLH M (206) York University Labs: M 7:00 – 10:00 pm Room LAS 1006 Office Hrs: R 5:00 – 06:30 pm Room LAS 2018
Chapter 1 — Computer Abstractions and Technology — 2
COMPUTER ORGANIZATION AND DESIGN 5th Edition CSE2021 Computer Organization The Hardware/Software Interface Text Books:
Computer Organization and Design The Hardware/Software Interface, 5th Edition (2014) by David A. Patterson and John L. Hennessy Chapter 1 Morgan Kaufmann Publishers (Elsevier) ISBN 978-0-12-407726-3 Computer Abstractions MIPS Assembly Language Programming 2003 by Robert Britton, Pearson Publishers ISBN-10:0131420445 and Technology Assessment:
Quizzes: 11%
Labs: 24%
Midterm Exam: 25%
Final Exam: 40%
Chapter 1 — Computer Abstractions and Technology — 3 Chapter 1 — Computer Abstractions and Technology — 4
§ 1.1 Introduction COMPUTER ORGANIZATION AND DESIGN 5th Edition The Hardware/Software Interface Moore’s Law
Moore’s Law states that integrated circuit Computer Abstractions resources double every 18–24 months.
and Technology Moore’s Law resulted from a 1965 prediction of such growth in IC capacity - Introduction - Eight Great Ideas in Computer Architecture made by Gordon Moore, one of the - Below Your Program - Under the Covers founders of Intel. - Technologies for Building Processors and Memory Moore’s Law graph to represent - Performance - The Power Wall designing for rapid change - The Switch from Uniprocessors to Multiprocessors - Concluding Remarks Chapter 1 — Computer Abstractions and Technology — 5 Chapter 1 — Computer Abstractions and Technology — 6
1
§ §
1.1 Introduction 1.1 1.1 Introduction Moore’s Law Moore’s Law
Year of introduction Transistors 4004 1971 2,250 8008 1972 2,500 8080 1974 5,000 8086 1978 29,000 286 1982 120,000 386™ 1985 275,000 486™ DX 1989 1,180,000 Pentium® 1993 3,100,000 Pentium II 1997 7,500,000 Pentium III 1999 24,000,000 Pentium 4 2000 42,000,000 Source: ISSCC 2003 G. Moore “No exponential is forever, but ‘forever’ can be delayed”
Chapter 1 — Computer Abstractions and Technology — 7 Chapter 1 — Computer Abstractions and Technology — 8
§ §
1.1 Introduction 1.1 1.1 Introduction Is Moore’s Law Ending? Is Moore’s Law Ending? (2)
Intel’s former chief architect Bob Colwell says: Moore’s law will be dead within a decade (August 2013).
The end of Moore's Law is on the horizon, says AMD.
Theoretical physicist Michio Kaku believes Moore's Law has about 10 years of life left before ever-shrinking transistor sizes smack up against limitations imposed by the laws of thermodynamics and quantum physics (April 2013).
Chapter 1 — Computer Abstractions and Technology — 9 Chapter 1 — Computer Abstractions and Technology — 10
§ §
1.1 Introduction 1.1 Introduction Bell’s Law Bell’s Law
Bell's Law for the birth and death of computer classes:
Bell's law of computer classes formulated by Gordon Bell in 1972 describes how computing systems (computer classes) form, evolve and may eventually die out.
Roughly every decade a new, lower priced computer class forms based on a new programming platform, network, and interface resulting in new industry.
In 1951, men could walk inside a computer and now, computers are beginning to “walk” inside of us .
Source: B Bell, “Bell’s Law for the Birth and Death of Computer Classes”, Comms of ACM, 2008
Chapter 1 — Computer Abstractions and Technology — 11 Chapter 1 — Computer Abstractions and Technology — 12
2
§ §
1.1 Introduction 1.1 1.1 Introduction The 1st Generation Computer Computers Today
Source: http://www.computerhistory.org
Chapter 1 — Computer Abstractions and Technology — 13 Chapter 1 — Computer Abstractions and Technology — 14
§ §
1.1 Introduction 1.1 1.1 Introduction The Computer Revolution The Computer Revolution (2)
Progress in computer technology Computers in Automobiles
Underpinned by Moore’s Law reduce pollution, improve fuel efficiency via engine controls, and increase safety through Makes novel applications feasible blind spot warnings, lane departure warnings, Computers in Automobiles moving object detection, and air bag inflation Cell Phones to protect occupants in a crash Human Genome project Cell Phones World Wide Web more than half of the planet having mobile Search Engines phones, allowing person-to-person Computers are pervasive communication to almost anyone anywhere in the world
Chapter 1 — Computer Abstractions and Technology — 15 Chapter 1 — Computer Abstractions and Technology — 16
§ §
1.1 Introduction 1.1 Introduction The Computer Revolution (3) The Computer Revolution (4)
Human Genome Project World Wide Web a global effort to identify the estimated 30,000 has transformed our society. Web has genes in human DNA to figure out the replaced libraries and newspapers sequences of the chemical bases that make up human DNA to address ethical, legal, and Search Engines social issues As the content of the web grew in size and in value, many people rely on search engines The cost of computer equipment to map and analyze human DNA sequences was for such a large part of their lives that it would hundreds of millions of dollars. Since, costs be a hardship to go without them continue to drop, we will soon be able to acquire our own genome, allowing medical care to be tailored to us Chapter 1 — Computer Abstractions and Technology — 17 Chapter 1 — Computer Abstractions and Technology — 18
3
§ §
1.1 Introduction 1.1 1.1 Introduction Classes of Computers Classes of Computers (2)
Personal computers Personal computers Computers designed for use by an individual Server computers General purpose, variety of software Supercomputers Subject to cost/performance tradeoff Embedded computers Server computers
Computers used for running larger programs for multiple users, often simultaneously
Network based
High capacity, performance, reliability
Range from small servers to building sized
Chapter 1 — Computer Abstractions and Technology — 19 Chapter 1 — Computer Abstractions and Technology — 20
§ §
1.1 Introduction 1.1 1.1 Introduction Classes of Computers (3) The PostPC Era
Supercomputers Smart phones represent Consist thousands of processors and many the recent growth in cell terabytes of memory phone industry, and they passed PCs in 2011. High-level scientific and engineering calculations Tablets are the fastest Highest capability and cost but represent a small growing category, nearly fraction of the overall computer market doubling between 2011 Embedded computers and 2012.
Hidden as components of systems Recent PCs and traditional cell phone Largest class of computers and span the widest range of applications categories are relatively flat or declining. Strict power/performance/cost limitations
Chapter 1 — Computer Abstractions and Technology — 21 Chapter 1 — Computer Abstractions and Technology — 22
§ §
1.1 Introduction 1.1 Introduction The PostPC Era (2) The PostPC Era (3)
Personal Mobile Devices (PMDs) Cloud computing
Small wireless devices Term cloud essentially used for the Internet
Battery operated Portion of software run on a PMD and portion
Connects to the Internet run in the Cloud
Hundreds of dollars Warehouse Scale Computers (WSCs) Big datacenters containing 100,000 servers Smart phones, tablets, electronic glasses Amazon and Google cloud vendors
Software as a Service (SaaS)
Delivers software and data as a service over the Internet
Web search and social networking
Chapter 1 — Computer Abstractions and Technology — 23 Chapter 1 — Computer Abstractions and Technology — 24
4
§ §
1.1 Introduction 1.1 1.1 Introduction What You Will Learn Understanding Performance
How programs are translated into Algorithm machine language determines number of operations executed
and how hardware executes them Programming language, compiler, architecture
The hardware/software interface determine number of machine instructions executed per operation What determines program performance Processor and memory system and how it can be improved determine how fast instructions are executed How hardware designers improve I/O system (including OS) performance determine how fast I/O operations are What is parallel processing executed
Chapter 1 — Computer Abstractions and Technology — 25 Chapter 1 — Computer Abstractions and Technology — 26
§ §
1.2 1.2 Eight Great Ideas in Computer Architecture 1.2 Eight Great Ideas in Computer Architecture Eight Great Design Ideas Eight Great Design Ideas (2)
Design for Moore’s Law Performance via Parallelism
Computer designs can take years, resources available A form of computation in which many calculations per chip can easily double or quadruple between start are carried out simultaneously and finish of project Performance via Pipelining Anticipate where technology will be when design finishes A particular pattern of parallelism A set of data processing elements connected in Use Abstraction to simplify design series, so that the output of one element is the input Hide lower-level details to offer a simpler model at higher of the next one levels Performance via Prediction Make the Common Case Fast It can be faster on average to guess and start To enhance performance better than optimizing the working rather than wait rare case
Chapter 1 — Computer Abstractions and Technology — 27 Chapter 1 — Computer Abstractions and Technology — 28
§ §
1.2 Eight Great Ideas in Computer Architecture 1.3 Below YourProgram Eight Great Design Ideas (3) Below Your Program
Hierarchy of memories Application software
The closer to the top, the faster and more expensive Written in high-level language per bit of memory System software The wider the base of the layer, the bigger the Compiler: translates HLL code to memory machine code Dependability via redundancy Operating System: Handling input/output Design systems dependable by including redundant components Managing memory and storage Scheduling tasks & sharing resources to help detect failures, and Hardware to take over when failure occurs Processor, memory, I/O controllers
Chapter 1 — Computer Abstractions and Technology — 29 Chapter 1 — Computer Abstractions and Technology — 30
5
§
§
1.4 1.4 Under the Covers 1.3 1.3 Below YourProgram Levels of Program Code Components of a Computer
High-level language The BIG Picture Same components for Level of abstraction closer all kinds of computers to problem domain Desktop, Server, Provides for productivity Embedded and portability Input/Output includes Assembly language User-interface devices Textual representation of instructions Display, keyboard, mouse Hardware representation Storage devices Hard disk, CD/DVD, flash Binary digits (bits) Network adapters Encoded instructions and data For communicating with other computers
Chapter 1 — Computer Abstractions and Technology — 31 Chapter 1 — Computer Abstractions and Technology — 32
§ §
1.4 1.4 Under the Covers 1.4 Under the Covers Touchscreen Through the Looking Glass
PostPC device LCD screen: picture elements (pixels) Each coordinate in the frame buffer on the left Supersedes keyboard and mouse determines the shade of the corresponding coordinate for the raster scan CRT display on the right. Resistive and Pixel (X0,Y0) contains the bit pattern 0011, which is a Capacitive types lighter shade on the screen than the bit pattern 1101 in
Most tablets, smart pixel (X1,Y1). phones use capacitive
Capacitive allows multiple touches simultaneously
Chapter 1 — Computer Abstractions and Technology — 33 Chapter 1 — Computer Abstractions and Technology — 34
§ §
1.4 1.4 Under the Covers 1.4 Under the Covers Opening the Box Inside the Processor (CPU)
Capacitive multitouch LCD screen Datapath: performs operations on data 3.8 V, 25 Watt-hour battery Control: sequences datapath, memory, ... Computer board Cache memory Apple A5 dual ARM Small fast SRAM memory for immediate processor cores chip access to data
Components of the Apple iPad 2 32 GB flash memory chip Chapter 1 — Computer Abstractions and Technology — 35 Chapter 1 — Computer Abstractions and Technology — 36
6
§ §
1.4 1.4 Under the Covers 1.4 Under the Covers Inside the Processor (CPU)-2 Abstractions
The processor IC The BIG Picture inside Apple A5 Abstraction helps us deal with complexity package Hides lower-level details Size of chip is 12.1 by 10.1 mm Instruction Set Architecture (ISA) or Graphical processor unit (GPU) with four Computer Architecture datapaths The hardware/software interface Two identical ARM processors Includes instructions, registers, memory
Interfaces to main access, I/O, and so on memory (DRAM) Operating system hides details of doing I/O, allocating memory from programmers
Chapter 1 — Computer Abstractions and Technology — 37 Chapter 1 — Computer Abstractions and Technology — 38
§ §
1.4 1.4 Under the Covers 1.4 Under the Covers A Safe Place for Data Networks
Volatile main memory Backbone of computer systems Loses instructions and data when power off Communication Non-volatile secondary memory Information is exchanged between computers at high Flash memory speeds Optical disk (CDROM, DVD)
Magnetic disk Resource sharing
Rather than each computer having its own I/O devices, computers on the network can share I/O devices
Nonlocal access
By connecting computers over long distances, users need not be near the computer they are using
Chapter 1 — Computer Abstractions and Technology — 39 Chapter 1 — Computer Abstractions and Technology — 40
§
§
1.4 1.4 Under the Covers 1.5 1.5 Technologies for Building Processors and Memory Networks (2) Technology Trends
Electronics Local area network (LAN): Ethernet technology A network designed to carry data within a continues geographically confined area, typically within to evolve a single building – 40 gigabits/s Increased capacity and performance
Reduced cost Wide area network (WAN): the Internet DRAM Capacity per chip over time
A network extended over hundreds of Year Technology Relative performance/cost kilometers that can span a continent 1951 Vacuum tube 1 1965 Transistor 35 Wireless network: WiFi, Bluetooth 1975 Integrated circuit (IC) 900 transmission rates from 1 to100 million bits 1995 Very large scale IC (VLSI) 2,400,000 per second 2013 Ultra large scale IC 250,000,000,000
Chapter 1 — Computer Abstractions and Technology — 41 Chapter 1 — Computer Abstractions and Technology — 42
7
§ §
1.5 1.5 Technologies for Building Processors and Memory 1.5 Technologies for Building Processors and Memory Semiconductor Technology Manufacturing ICs
Silicon: Semiconductors
Add materials to transform properties:
Conductors
adding copper or aluminum wire
Insulators
like plastic or glass
Switches (transistors)
conduct or insulate under special conditions
Chapter 1 — Computer Abstractions and Technology — 43 Chapter 1 — Computer Abstractions and Technology — 44
§ §
1.5 1.5 Technologies for Building Processors and Memory 1.5 Technologies for Building Processors and Memory Manufacturing ICs Intel Core i7 Wafer
300mm wafer, 280 chips, 32nm technology
Yield: proportion of working dies per wafer Each chip is 20.7 by 10.5 mm
Chapter 1 — Computer Abstractions and Technology — 45 Chapter 1 — Computer Abstractions and Technology — 46
§
§
1.6 1.6 Performance 1.5 1.5 Technologies for Building Processors and Memory Integrated Circuit Cost Defining Performance
Which airplane has the best performance? Cost per wafer Cost per die Dies per wafer Yield Boeing 777 Boeing 777 Dies per wafer Wafer area Die area Boeing 747 Boeing 747 BAC/Sud BAC/Sud Concorde Concorde 1 Douglas Douglas DC- DC-8-50 8-50 Yield 2 (1 (Defects per areaDie area/2)) 0 100 200 300 400 500 0 2000 4000 6000 8000 10000 Passenger Capacity Cruising Range (miles)
Nonlinear relation to die area and defect rate Boeing 777 Boeing 777
Boeing 747 Boeing 747 Wafer cost and area are fixed BAC/Sud BAC/Sud Concorde Concorde Defect rate determined by manufacturing process Douglas Douglas DC- DC-8-50 8-50 Die area determined by architecture and circuit design 0 500 1000 1500 0 200000 400000
Cruising Speed (mph) Passengers x mph
Chapter 1 — Computer Abstractions and Technology — 47 Chapter 1 — Computer Abstractions and Technology — 48
8
§ §
1.6 1.6 Performance 1.6 Performance Response Time and Throughput Relative Performance
Response time Performance = 1/Execution Time How long it takes to do a task “X is n times faster than Y” Throughput Performanc eX Performanc eY Total work done per unit time e.g., tasks/transactions/… per hour Execution time Y Execution time X n
How are response time and throughput affected Example: time taken to run a program by 10s on A, 15s on B Replacing the processor with a faster version? Execution Time / Execution Time Adding more processors? B A = 15s / 10s = 1.5 = 1½ We’ll focus on response time for now… So A is 1½ times faster than B
Chapter 1 — Computer Abstractions and Technology — 49 Chapter 1 — Computer Abstractions and Technology — 50
§ §
1.6 Performance 1.6 Performance Measuring Execution Time CPU Clocking
Elapsed time Operation of digital hardware governed by a Total response time, including all aspects constant-rate clock Processing, I/O, OS overhead, idle time Clock period Determines system performance Clock (cycles) CPU time Data transfer Time spent processing a given job and computation Minus I/O time, other jobs’ shares Update state Includes user CPU time and system CPU time
Different programs are affected differently by CPU Clock period: duration of a clock cycle and system performance –12 Running on servers – I/O performance – hardware and e.g., 250ps = 0.25ns = 250×10 s software Clock frequency (rate): cycles per second Total elapsed time is of interest 9 Define performance metric and then proceed e.g., 4.0GHz = 4000MHz = 4.0×10 Hz
Chapter 1 — Computer Abstractions and Technology — 51 Chapter 1 — Computer Abstractions and Technology — 52
§ §
1.6 1.6 Performance 1.6 Performance CPU Time CPU Time Example
Computer A: 2GHz clock, 10s CPU time CPU Time CPU Clock CyclesClock Cycle Time Designing Computer B CPU Clock Cycles Aim for 6s CPU time Clock Rate Can do faster clock, but causes 1.2 × clock cycles How fast must Computer B clock be? Performance can be improved by Clock Cycles 1.2Clock Cycles Clock Rate B A Reducing number of clock cycles B CPU TimeB 6s Increasing clock rate Clock CyclesA CPU Time A Clock Rate A Hardware designer must often trade off clock 10s 2GHz 20109 rate against cycle count 1.2 20109 24109 Clock Rate 4GHz B 6s 6s
Chapter 1 — Computer Abstractions and Technology — 53 Chapter 1 — Computer Abstractions and Technology — 54
9
§ §
1.6 1.6 Performance 1.6 Performance Instruction Count and CPI CPI Example
Computer A: Cycle Time = 250ps, CPI = 2.0 Clock Cycles Instruction Count Cycles Per Instruction Computer B: Cycle Time = 500ps, CPI = 1.2 CPU Time Instruction Count CPIClock Cycle Time Same ISA Instruction Count CPI Which is faster? by how much? Clock Rate CPU Time Instruction Count CPI Cycle Time Instruction Count for a program A A A I 2.0 250ps I500ps A is faster… Determined by program, ISA, and compiler CPU Time Instruction Count CPI Cycle Time Average cycles per instruction B B B I1.2500ps I 600ps Determined by CPU hardware CPU Time If different instructions have different CPI B I 600ps 1.2 …by this much Average CPI gets affected by instruction mix CPU Time I500ps (dynamic frequency of instructions) A
Chapter 1 — Computer Abstractions and Technology — 55 Chapter 1 — Computer Abstractions and Technology — 56
§ §
1.6 Performance 1.6 Performance CPI in More Detail CPI Example
If different instruction classes take different Alternative compiled code sequences using numbers of cycles instructions in classes A, B, C
n Class A B C CPI for class 1 2 3 Clock Cycles (CPIi Instruction Count i ) i1 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1 Weighted average CPI
n Sequence 1: IC = 5 Sequence 2: IC = 6 Clock Cycles Instruction Count i CPI CPIi Clock Cycles Clock Cycles Instruction Count Instruction Count i1 = 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3 = 10 = 9 Relative frequency Avg. CPI = 10/5 = 2.0 Avg. CPI = 9/6 = 1.5
Chapter 1 — Computer Abstractions and Technology — 57 Chapter 1 — Computer Abstractions and Technology — 58
§
§
1.6 1.6 Performance 1.7 1.7 ThePower Wall Performance Summary
The BIG Picture
Instructions Clock Cycles Seconds CPU Time Program Instruction Clock Cycle
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
Chapter 1 — Computer Abstractions and Technology — 59 Chapter 1 — Computer Abstractions and Technology — 60
10
§ §
1.7 1.7 ThePower Wall 1.7 ThePower Wall Power Trends Reducing Power
Suppose a new CPU has
85% of capacitive load of old CPU
15% voltage and 15% frequency reduction
2 Pnew Cold 0.85(Vold 0.85) Fold 0.85 4 2 0.85 0.52 Pold Cold Vold Fold Clock rate and Power for Intel x86 microprocessors over eight generations The power wall In CMOS IC technology We can’t reduce voltage further 2 Power Capacitive load Voltage Frequency We can’t remove more heat
×30 5V → 1V ×1000 How else can we improve performance?
Chapter 1 — Computer Abstractions and Technology — 61 Chapter 1 — Computer Abstractions and Technology — 62
§ §
1.8 The Sea Change: The Switchto Multiprocessors 1.8 1.8 TheSea Change: The Switchto Multiprocessors Uniprocessor Performance Multiprocessors
Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction level parallelism
Hardware executes multiple instructions at once
Hidden from the programmer
Hard to do (Why?)
Technology driven Technology Programming for performance
Load balancing Advanced architectural and organizational ideas Optimizing communication and synchronization Constrained by power, instruction-level parallelism, memory latency
Chapter 1 — Computer Abstractions and Technology — 63 Chapter 1 — Computer Abstractions and Technology — 64
§ 1.9 Concluding Remarks Concluding Remarks Acknowledgement
Cost/performance is improving The slides are adopted from Computer
Due to underlying technology development Organization and Design, 5th Edition by David A. Patterson and John L. Hennessy Hierarchical layers of abstraction 2014, published by MK (Elsevier) In both hardware and software Instruction set architecture
The hardware/software interface Execution time
The best performance measure Power is a limiting factor
Use parallelism to improve performance
Chapter 1 — Computer Abstractions and Technology — 65 Chapter 1 — Computer Abstractions and Technology — 66
11