CSE2021 Computer Organization CSE2021 Computer Organization

 Instructor: Gulzar Khuwaja, PhD

 Email: [email protected] Tel: (416) 736-2100 x 77874 Instructor: Gulzar Khuwaja, PhD  Course Website: Department of Electrical Engineering https://wiki.eecs.yorku.ca/course_archive/2014-15/S/2021/ & Computer Science  Schedule: Lassonde School of Engineering Lectures: R 7:00 – 10:00 pm Room CLH M (206) York University Labs: M 7:00 – 10:00 pm Room LAS 1006 Office Hrs: R 5:00 – 06:30 pm Room LAS 2018

Chapter 1 — Computer Abstractions and Technology — 2

COMPUTER ORGANIZATION AND DESIGN 5th Edition CSE2021 Computer Organization The Hardware/Software Interface Text Books:

 Computer Organization and Design The Hardware/Software Interface, 5th Edition (2014) by David A. Patterson and John L. Hennessy Chapter 1 Morgan Kaufmann Publishers (Elsevier) ISBN 978-0-12-407726-3 Computer Abstractions  MIPS Assembly Language Programming 2003 by Robert Britton, Pearson Publishers ISBN-10:0131420445 and Technology Assessment:

 Quizzes: 11%

 Labs: 24%

 Midterm Exam: 25%

 Final Exam: 40%

Chapter 1 — Computer Abstractions and Technology — 3 Chapter 1 — Computer Abstractions and Technology — 4

§ 1.1 Introduction COMPUTER ORGANIZATION AND DESIGN 5th Edition The Hardware/Software Interface Moore’s Law

 Moore’s Law states that integrated circuit Computer Abstractions resources double every 18–24 months.

and Technology  Moore’s Law resulted from a 1965 prediction of such growth in IC capacity - Introduction - Eight Great Ideas in Computer Architecture made by Gordon Moore, one of the - Below Your Program - Under the Covers founders of . - Technologies for Building Processors and Memory  Moore’s Law graph to represent - Performance - The Power Wall designing for rapid change - The Switch from Uniprocessors to Multiprocessors - Concluding Remarks Chapter 1 — Computer Abstractions and Technology — 5 Chapter 1 — Computer Abstractions and Technology — 6

1

§ §

1.1 Introduction 1.1 1.1 Introduction Moore’s Law Moore’s Law

Year of introduction Transistors 4004 1971 2,250 8008 1972 2,500 8080 1974 5,000 8086 1978 29,000 286 1982 120,000 386™ 1985 275,000 486™ DX 1989 1,180,000 Pentium® 1993 3,100,000 Pentium II 1997 7,500,000 Pentium III 1999 24,000,000 2000 42,000,000 Source: ISSCC 2003 G. Moore “No exponential is forever, but ‘forever’ can be delayed”

Chapter 1 — Computer Abstractions and Technology — 7 Chapter 1 — Computer Abstractions and Technology — 8

§ §

1.1 Introduction 1.1 1.1 Introduction Is Moore’s Law Ending? Is Moore’s Law Ending? (2)

 Intel’s former chief architect Bob Colwell says: Moore’s law will be dead within a decade (August 2013).

 The end of Moore's Law is on the horizon, says AMD.

 Theoretical physicist Michio Kaku believes Moore's Law has about 10 years of life left before ever-shrinking transistor sizes smack up against limitations imposed by the laws of thermodynamics and quantum physics (April 2013).

Chapter 1 — Computer Abstractions and Technology — 9 Chapter 1 — Computer Abstractions and Technology — 10

§ §

1.1 Introduction 1.1 Introduction Bell’s Law Bell’s Law

Bell's Law for the birth and death of computer classes:

 Bell's law of computer classes formulated by Gordon Bell in 1972 describes how computing systems (computer classes) form, evolve and may eventually die out.

 Roughly every decade a new, lower priced computer class forms based on a new programming platform, network, and interface resulting in new industry.

 In 1951, men could walk inside a computer and now, computers are beginning to “walk” inside of us .

Source: B Bell, “Bell’s Law for the Birth and Death of Computer Classes”, Comms of ACM, 2008

Chapter 1 — Computer Abstractions and Technology — 11 Chapter 1 — Computer Abstractions and Technology — 12

2

§ §

1.1 Introduction 1.1 1.1 Introduction The 1st Generation Computer Computers Today

Source: http://www.computerhistory.org

Chapter 1 — Computer Abstractions and Technology — 13 Chapter 1 — Computer Abstractions and Technology — 14

§ §

1.1 Introduction 1.1 1.1 Introduction The Computer Revolution The Computer Revolution (2)

 Progress in computer technology  Computers in Automobiles

 Underpinned by Moore’s Law  reduce pollution, improve fuel efficiency via engine controls, and increase safety through  Makes novel applications feasible blind spot warnings, lane departure warnings,  Computers in Automobiles moving object detection, and air bag inflation  Cell Phones to protect occupants in a crash  Human Genome project  Cell Phones  World Wide Web  more than half of the planet having mobile  Search Engines phones, allowing person-to-person  Computers are pervasive communication to almost anyone anywhere in the world

Chapter 1 — Computer Abstractions and Technology — 15 Chapter 1 — Computer Abstractions and Technology — 16

§ §

1.1 Introduction 1.1 Introduction The Computer Revolution (3) The Computer Revolution (4)

 Human Genome Project  World Wide Web  a global effort to identify the estimated 30,000  has transformed our society. Web has genes in human DNA to figure out the replaced libraries and newspapers sequences of the chemical bases that make up human DNA to address ethical, legal, and  Search Engines social issues  As the content of the web grew in size and in value, many people rely on search engines  The cost of computer equipment to map and analyze human DNA sequences was for such a large part of their lives that it would hundreds of millions of dollars. Since, costs be a hardship to go without them continue to drop, we will soon be able to acquire our own genome, allowing medical care to be tailored to us Chapter 1 — Computer Abstractions and Technology — 17 Chapter 1 — Computer Abstractions and Technology — 18

3

§ §

1.1 Introduction 1.1 1.1 Introduction Classes of Computers Classes of Computers (2)

 Personal computers  Personal computers  Computers designed for use by an individual  Server computers  General purpose, variety of software  Supercomputers  Subject to cost/performance tradeoff  Embedded computers  Server computers

 Computers used for running larger programs for multiple users, often simultaneously

 Network based

 High capacity, performance, reliability

 Range from small servers to building sized

Chapter 1 — Computer Abstractions and Technology — 19 Chapter 1 — Computer Abstractions and Technology — 20

§ §

1.1 Introduction 1.1 1.1 Introduction Classes of Computers (3) The PostPC Era

 Supercomputers  Smart phones represent  Consist thousands of processors and many the recent growth in cell terabytes of memory phone industry, and they passed PCs in 2011.  High-level scientific and engineering calculations  Tablets are the fastest  Highest capability and cost but represent a small growing category, nearly fraction of the overall computer market doubling between 2011  Embedded computers and 2012.

 Hidden as components of systems  Recent PCs and traditional cell phone  Largest class of computers and span the widest range of applications categories are relatively flat or declining.  Strict power/performance/cost limitations

Chapter 1 — Computer Abstractions and Technology — 21 Chapter 1 — Computer Abstractions and Technology — 22

§ §

1.1 Introduction 1.1 Introduction The PostPC Era (2) The PostPC Era (3)

 Personal Mobile Devices (PMDs)  Cloud computing

 Small wireless devices  Term cloud essentially used for the Internet

 Battery operated  Portion of software run on a PMD and portion

 Connects to the Internet run in the Cloud

 Hundreds of dollars  Warehouse Scale Computers (WSCs)  Big datacenters containing 100,000 servers  Smart phones, tablets, electronic glasses  Amazon and Google cloud vendors

 Software as a Service (SaaS)

 Delivers software and data as a service over the Internet

 Web search and social networking

Chapter 1 — Computer Abstractions and Technology — 23 Chapter 1 — Computer Abstractions and Technology — 24

4

§ §

1.1 Introduction 1.1 1.1 Introduction What You Will Learn Understanding Performance

 How programs are translated into  Algorithm machine language  determines number of operations executed

 and how hardware executes them  Programming language, compiler, architecture

 The hardware/software interface  determine number of machine instructions executed per operation  What determines program performance  Processor and memory system  and how it can be improved  determine how fast instructions are executed  How hardware designers improve  I/O system (including OS) performance  determine how fast I/O operations are  What is parallel processing executed

Chapter 1 — Computer Abstractions and Technology — 25 Chapter 1 — Computer Abstractions and Technology — 26

§ §

1.2 1.2 Eight Great Ideas in Computer Architecture 1.2 Eight Great Ideas in Computer Architecture Eight Great Design Ideas Eight Great Design Ideas (2)

 Design for Moore’s Law  Performance via Parallelism

 Computer designs can take years, resources available  A form of computation in which many calculations per chip can easily double or quadruple between start are carried out simultaneously and finish of project  Performance via Pipelining  Anticipate where technology will be when design finishes  A particular pattern of parallelism  A set of data processing elements connected in  Use Abstraction to simplify design series, so that the output of one element is the input  Hide lower-level details to offer a simpler model at higher of the next one levels  Performance via Prediction  Make the Common Case Fast  It can be faster on average to guess and start  To enhance performance better than optimizing the working rather than wait rare case

Chapter 1 — Computer Abstractions and Technology — 27 Chapter 1 — Computer Abstractions and Technology — 28

§ §

1.2 Eight Great Ideas in Computer Architecture 1.3 Below YourProgram Eight Great Design Ideas (3) Below Your Program

 Hierarchy of memories  Application software

 The closer to the top, the faster and more expensive  Written in high-level language per bit of memory  System software  The wider the base of the layer, the bigger the  Compiler: translates HLL code to memory machine code  Dependability via redundancy  Operating System:  Handling input/output  Design systems dependable by including redundant components  Managing memory and storage  Scheduling tasks & sharing resources  to help detect failures, and  Hardware  to take over when failure occurs  Processor, memory, I/O controllers

Chapter 1 — Computer Abstractions and Technology — 29 Chapter 1 — Computer Abstractions and Technology — 30

5

§

§

1.4 1.4 Under the Covers 1.3 1.3 Below YourProgram Levels of Program Code Components of a Computer

 High-level language The BIG Picture  Same components for  Level of abstraction closer all kinds of computers to problem domain  Desktop, Server,  Provides for productivity Embedded and portability  Input/Output includes  Assembly language  User-interface devices  Textual representation of instructions  Display, keyboard, mouse  Hardware representation  Storage devices  Hard disk, CD/DVD, flash  Binary digits (bits)  Network adapters  Encoded instructions and data  For communicating with other computers

Chapter 1 — Computer Abstractions and Technology — 31 Chapter 1 — Computer Abstractions and Technology — 32

§ §

1.4 1.4 Under the Covers 1.4 Under the Covers Touchscreen Through the Looking Glass

 PostPC device  LCD screen: picture elements (pixels)  Each coordinate in the frame buffer on the left  Supersedes keyboard and mouse determines the shade of the corresponding coordinate for the raster scan CRT display on the right.  Resistive and  Pixel (X0,Y0) contains the bit pattern 0011, which is a Capacitive types lighter shade on the screen than the bit pattern 1101 in

 Most tablets, smart pixel (X1,Y1). phones use capacitive

 Capacitive allows multiple touches simultaneously

Chapter 1 — Computer Abstractions and Technology — 33 Chapter 1 — Computer Abstractions and Technology — 34

§ §

1.4 1.4 Under the Covers 1.4 Under the Covers Opening the Box Inside the Processor (CPU)

Capacitive multitouch LCD screen  Datapath: performs operations on data 3.8 V, 25 Watt-hour battery  Control: sequences datapath, memory, ... Computer board  Cache memory Apple A5 dual ARM  Small fast SRAM memory for immediate processor cores chip access to data

Components of the Apple iPad 2 32 GB flash memory chip Chapter 1 — Computer Abstractions and Technology — 35 Chapter 1 — Computer Abstractions and Technology — 36

6

§ §

1.4 1.4 Under the Covers 1.4 Under the Covers Inside the Processor (CPU)-2 Abstractions

 The processor IC The BIG Picture inside Apple A5  Abstraction helps us deal with complexity package  Hides lower-level details  Size of chip is 12.1 by 10.1 mm  Instruction Set Architecture (ISA) or  Graphical processor unit (GPU) with four Computer Architecture datapaths  The hardware/software interface  Two identical ARM processors  Includes instructions, registers, memory

 Interfaces to main access, I/O, and so on memory (DRAM)  Operating system hides details of doing I/O, allocating memory from programmers

Chapter 1 — Computer Abstractions and Technology — 37 Chapter 1 — Computer Abstractions and Technology — 38

§ §

1.4 1.4 Under the Covers 1.4 Under the Covers A Safe Place for Data Networks

 Volatile main memory Backbone of computer systems  Loses instructions and data when power off  Communication  Non-volatile secondary memory  Information is exchanged between computers at high  Flash memory speeds  Optical disk (CDROM, DVD)

 Magnetic disk  Resource sharing

 Rather than each computer having its own I/O devices, computers on the network can share I/O devices

 Nonlocal access

 By connecting computers over long distances, users need not be near the computer they are using

Chapter 1 — Computer Abstractions and Technology — 39 Chapter 1 — Computer Abstractions and Technology — 40

§

§

1.4 1.4 Under the Covers 1.5 1.5 Technologies for Building Processors and Memory Networks (2) Technology Trends

 Electronics  Local area network (LAN): Ethernet technology  A network designed to carry data within a continues geographically confined area, typically within to evolve a single building – 40 gigabits/s  Increased capacity and performance

  Reduced cost Wide area network (WAN): the Internet DRAM Capacity per chip over time

 A network extended over hundreds of Year Technology Relative performance/cost kilometers that can span a continent 1951 Vacuum tube 1 1965 Transistor 35  Wireless network: WiFi, Bluetooth 1975 Integrated circuit (IC) 900  transmission rates from 1 to100 million bits 1995 Very large scale IC (VLSI) 2,400,000 per second 2013 Ultra large scale IC 250,000,000,000

Chapter 1 — Computer Abstractions and Technology — 41 Chapter 1 — Computer Abstractions and Technology — 42

7

§ §

1.5 1.5 Technologies for Building Processors and Memory 1.5 Technologies for Building Processors and Memory Semiconductor Technology Manufacturing ICs

 Silicon: Semiconductors

 Add materials to transform properties:

 Conductors

 adding copper or aluminum wire

 Insulators

 like plastic or glass

 Switches (transistors)

 conduct or insulate under special conditions

Chapter 1 — Computer Abstractions and Technology — 43 Chapter 1 — Computer Abstractions and Technology — 44

§ §

1.5 1.5 Technologies for Building Processors and Memory 1.5 Technologies for Building Processors and Memory Manufacturing ICs Intel Core i7 Wafer

 300mm wafer, 280 chips, 32nm technology

 Yield: proportion of working dies per wafer  Each chip is 20.7 by 10.5 mm

Chapter 1 — Computer Abstractions and Technology — 45 Chapter 1 — Computer Abstractions and Technology — 46

§

§

1.6 1.6 Performance 1.5 1.5 Technologies for Building Processors and Memory Integrated Circuit Cost Defining Performance

 Which airplane has the best performance? Cost per wafer Cost per die  Dies per wafer  Yield Boeing 777 Boeing 777 Dies per wafer  Wafer area Die area Boeing 747 Boeing 747 BAC/Sud BAC/Sud Concorde Concorde 1 Douglas Douglas DC- DC-8-50 8-50 Yield  2 (1 (Defects per areaDie area/2)) 0 100 200 300 400 500 0 2000 4000 6000 8000 10000 Passenger Capacity Cruising Range (miles)

 Nonlinear relation to die area and defect rate Boeing 777 Boeing 777

Boeing 747 Boeing 747  Wafer cost and area are fixed BAC/Sud BAC/Sud Concorde Concorde  Defect rate determined by manufacturing process Douglas Douglas DC- DC-8-50 8-50  Die area determined by architecture and circuit design 0 500 1000 1500 0 200000 400000

Cruising Speed (mph) Passengers x mph

Chapter 1 — Computer Abstractions and Technology — 47 Chapter 1 — Computer Abstractions and Technology — 48

8

§ §

1.6 1.6 Performance 1.6 Performance Response Time and Throughput Relative Performance

 Response time  Performance = 1/Execution Time  How long it takes to do a task  “X is n times faster than Y”  Throughput Performanc eX Performanc eY  Total work done per unit time e.g., tasks/transactions/… per hour  Execution time Y Execution time X  n

 How are response time and throughput affected  Example: time taken to run a program by  10s on A, 15s on B  Replacing the processor with a faster version?  Execution Time / Execution Time  Adding more processors? B A = 15s / 10s = 1.5 = 1½  We’ll focus on response time for now…  So A is 1½ times faster than B

Chapter 1 — Computer Abstractions and Technology — 49 Chapter 1 — Computer Abstractions and Technology — 50

§ §

1.6 Performance 1.6 Performance Measuring Execution Time CPU Clocking

 Elapsed time  Operation of digital hardware governed by a  Total response time, including all aspects constant-rate clock  Processing, I/O, OS overhead, idle time Clock period  Determines system performance Clock (cycles)  CPU time Data transfer  Time spent processing a given job and computation  Minus I/O time, other jobs’ shares Update state  Includes user CPU time and system CPU time

 Different programs are affected differently by CPU  Clock period: duration of a clock cycle and system performance –12   Running on servers – I/O performance – hardware and e.g., 250ps = 0.25ns = 250×10 s software  Clock frequency (rate): cycles per second  Total elapsed time is of interest 9  Define performance metric and then proceed  e.g., 4.0GHz = 4000MHz = 4.0×10 Hz

Chapter 1 — Computer Abstractions and Technology — 51 Chapter 1 — Computer Abstractions and Technology — 52

§ §

1.6 1.6 Performance 1.6 Performance CPU Time CPU Time Example

 Computer A: 2GHz clock, 10s CPU time CPU Time  CPU Clock CyclesClock Cycle Time  Designing Computer B CPU Clock Cycles  Aim for 6s CPU time  Clock Rate  Can do faster clock, but causes 1.2 × clock cycles  How fast must Computer B clock be?  Performance can be improved by Clock Cycles 1.2Clock Cycles Clock Rate  B  A  Reducing number of clock cycles B CPU TimeB 6s  Increasing clock rate Clock CyclesA  CPU Time A Clock Rate A  Hardware designer must often trade off clock  10s 2GHz  20109 rate against cycle count 1.2 20109 24109 Clock Rate    4GHz B 6s 6s

Chapter 1 — Computer Abstractions and Technology — 53 Chapter 1 — Computer Abstractions and Technology — 54

9

§ §

1.6 1.6 Performance 1.6 Performance Instruction Count and CPI CPI Example

 Computer A: Cycle Time = 250ps, CPI = 2.0 Clock Cycles  Instruction Count Cycles Per Instruction  Computer B: Cycle Time = 500ps, CPI = 1.2 CPU Time  Instruction Count CPIClock Cycle Time  Same ISA Instruction Count CPI   Which is faster? by how much? Clock Rate CPU Time  Instruction Count CPI Cycle Time  Instruction Count for a program A A A  I 2.0  250ps  I500ps A is faster…  Determined by program, ISA, and compiler CPU Time  Instruction Count CPI Cycle Time  Average cycles per instruction B B B  I1.2500ps  I 600ps  Determined by CPU hardware CPU Time  If different instructions have different CPI B I 600ps   1.2 …by this much  Average CPI gets affected by instruction mix CPU Time I500ps (dynamic frequency of instructions) A

Chapter 1 — Computer Abstractions and Technology — 55 Chapter 1 — Computer Abstractions and Technology — 56

§ §

1.6 Performance 1.6 Performance CPI in More Detail CPI Example

 If different instruction classes take different  Alternative compiled code sequences using numbers of cycles instructions in classes A, B, C

n Class A B C CPI for class 1 2 3 Clock Cycles  (CPIi Instruction Count i ) i1 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1  Weighted average CPI

n  Sequence 1: IC = 5  Sequence 2: IC = 6 Clock Cycles  Instruction Count i  CPI   CPIi    Clock Cycles  Clock Cycles Instruction Count  Instruction Count i1   = 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3 = 10 = 9 Relative frequency  Avg. CPI = 10/5 = 2.0  Avg. CPI = 9/6 = 1.5

Chapter 1 — Computer Abstractions and Technology — 57 Chapter 1 — Computer Abstractions and Technology — 58

§

§

1.6 1.6 Performance 1.7 1.7 ThePower Wall Performance Summary

The BIG Picture

Instructions Clock Cycles Seconds CPU Time    Program Instruction Clock Cycle

 Performance depends on

 Algorithm: affects IC, possibly CPI

 Programming language: affects IC, CPI

 Compiler: affects IC, CPI

 Instruction set architecture: affects IC, CPI, Tc

Chapter 1 — Computer Abstractions and Technology — 59 Chapter 1 — Computer Abstractions and Technology — 60

10

§ §

1.7 1.7 ThePower Wall 1.7 ThePower Wall Power Trends Reducing Power

 Suppose a new CPU has

 85% of capacitive load of old CPU

 15% voltage and 15% frequency reduction

2 Pnew Cold 0.85(Vold 0.85) Fold 0.85 4  2  0.85  0.52 Pold Cold  Vold Fold Clock rate and Power for Intel x86 microprocessors over eight generations  The power wall  In CMOS IC technology  We can’t reduce voltage further 2 Power  Capacitive load  Voltage Frequency  We can’t remove more heat

×30 5V → 1V ×1000  How else can we improve performance?

Chapter 1 — Computer Abstractions and Technology — 61 Chapter 1 — Computer Abstractions and Technology — 62

§ §

1.8 The Sea Change: The Switchto Multiprocessors 1.8 1.8 TheSea Change: The Switchto Multiprocessors Uniprocessor Performance Multiprocessors

 Multicore microprocessors

 More than one processor per chip

 Requires explicitly parallel programming

 Compare with instruction level parallelism

 Hardware executes multiple instructions at once

 Hidden from the programmer

 Hard to do (Why?)

Technology driven Technology  Programming for performance

 Load balancing Advanced architectural and organizational ideas  Optimizing communication and synchronization Constrained by power, instruction-level parallelism, memory latency

Chapter 1 — Computer Abstractions and Technology — 63 Chapter 1 — Computer Abstractions and Technology — 64

§ 1.9 Concluding Remarks Concluding Remarks Acknowledgement

 Cost/performance is improving The slides are adopted from Computer

 Due to underlying technology development Organization and Design, 5th Edition by David A. Patterson and John L. Hennessy  Hierarchical layers of abstraction 2014, published by MK (Elsevier)  In both hardware and software  Instruction set architecture

 The hardware/software interface  Execution time

 The best performance measure  Power is a limiting factor

 Use parallelism to improve performance

Chapter 1 — Computer Abstractions and Technology — 65 Chapter 1 — Computer Abstractions and Technology — 66

11