
ECE473 Computer Architecture and Organization
Technology Trends

Outline
• Technology Trends
• Introduction to Computer Architecture

Lecturer: Prof. Yifeng Zhu

Fall, 2009

Portions of these slides are derived from Dave Patterson, © UCB.

Birth of the Revolution: The Intel 4004
• First in 1971 (introduced November 15, 1971)
• 2300 transistors
• Barely a processor
• Could access 300 bytes of memory
108 kHz, 50 KIPS, 2300 10μ transistors @intel

What If Your Salary?
• Parameters
– $16 base
– 59% growth/year
– 40 years
• Initially $16 → buy book
• 3rd year's $64 → buy computer game
• 16th year's $27,000 → buy car
• 22nd year's $430,000 → buy house
• 40th year's > billion dollars → buy a lot
You have to find fundamental new ways to spend money!
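The salary analogy is ordinary compound growth: a $16 base that grows 59% per year is worth 16 × 1.59ⁿ after n years. The minimal Python sketch below (the function name `salary_after` is our own, not from the slides) reproduces the milestones listed above.

```python
def salary_after(years: int, base: float = 16.0, growth: float = 0.59) -> float:
    """Value of `base` after `years` of compound growth at rate `growth` per year."""
    return base * (1.0 + growth) ** years


if __name__ == "__main__":
    # Milestones from the slide: book, computer game, car, house, "a lot".
    for year in (0, 3, 16, 22, 40):
        print(f"year {year:2d}: ${salary_after(year):,.0f}")
```

Running it prints roughly $64, $27,000, $430,000 and $1.8 billion for years 3, 16, 22 and 40, in line with the slide's figures.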

2002 - Intel® Itanium® 2 Processor for Servers
• 64-bit processors
• .18μm bulk, 6-layer Al process
• 8-stage, fully stalled in-order pipeline
• Symmetric six-integer-issue design
• IA32 execution engine integrated
• 3 levels of cache on-die, totaling 3.3 MB
• 221 million transistors
• 130 W @ 1 GHz, 1.5 V
• 421 mm² die
• 142 mm² CPU core
[Die photo, 21.6 mm × 19.5 mm; labels: branch unit, floating-point unit, IA32 pipeline control, L1I cache, ALAT, integer datapath, multimedia unit, integer RF, L1D cache, CLK, HPW, DTLB, L2D array and control, L3 tag, bus logic, L3 cache] @intel

2002 - Intel® Pentium® 4 Processor
• November 14, 2002
• @ 3.06 GHz, 533 MT/s bus
• 1099 SPECint_base2000*
• 1077 SPECfp_base2000*
• 55 million transistors
• 130 nm process
*Source: http://www.specbench.org/cpu2000/results/ @intel

2006 - Intel® Core™2 Duo Processors for Desktop
• Performance: 40% higher
• Power: 40% lower
…relative to Intel® Pentium® D 960

When compared to the Intel® Pentium® D processor 960. Performance measured using SPECint* rate base2000. Actual performance may vary. Energy efficiency based on Thermal Design Power (TDP) measurement. See http://www.intel.com/performance for more information.

2008 - Intel Core i7
• 64-bit
• Successor to the Intel® Core™2 family
• Max CPU clock: 2.66 GHz to 3.33 GHz
• Cores: 4 (physical), 8 (logical)
• 45 nm CMOS process
• Adding a GPU into the processor

Technology Trends: Moore's Law
• Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors on a chip was doubling at a steady pace; in 1975 he revised the period to about every 24 months.
• In practice, the doubling period is often quoted as about 18 months.

Amazing Underlying Technology Change
• In 1965, Gordon Moore sketched out his prediction of the pace of silicon technology.

• Moore's Law: The number of transistors incorporated in a chip will approximately double every 24 months.

• Decades later, Moore's Law remains true. From Intel

From Intel
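The two doubling periods quoted above imply quite different yearly rates: doubling every T months corresponds to an annual growth factor of 2^(12/T). The small Python sketch below (helper names are our own) converts in both directions. It shows that an 18-month doubling is about 59% per year, the same rate used in the salary example and the technology-trend summary, while a 24-month doubling is about 41% per year.

```python
import math


def annual_growth_from_doubling(months: float) -> float:
    """Annual growth rate implied by doubling every `months` months."""
    return 2.0 ** (12.0 / months) - 1.0


def doubling_months_from_annual(rate: float) -> float:
    """Doubling period in months implied by an annual growth `rate`."""
    return 12.0 * math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    for months in (18, 24):
        rate = annual_growth_from_doubling(months)
        print(f"doubling every {months} months -> {rate:.0%} per year")
```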

How did we do so far? Moore's Law applied to the travel industry
• A flight from New York to Paris

Intel processor technology: 2-year cycle
• 1999: 180 nm
• 2001: 130 nm
• 2003: 90 nm
• 2005: 65 nm
• 2007: 45 nm
• 2009: 32 nm
• 2011: 22 nm
(Chart annotation: "Reach a wall at this point")

AMD64 Dual Core Processor
• Two AMD CPU cores on one single die, each with 1 MB L2 cache
• 90 nm, ~205 million transistors*
– Approximately same die size as 130 nm single-core AMD Opteron processor*
• 95 watt power envelope fits into 90 nm power infrastructure
• Dual-core processors for client market are expected to follow
[Die photo labels: Core 0, Core 1, 1-MB L2 per core, Northbridge]
*Based on current revisions of the design
*Other names and brands may be claimed as the property of others

IBM Power4: Dual Processor on a Chip
• Two cores (~30M transistors each)
• Large shared L2: multi-ported, 3 independent slices
• L3 & memory controller: L3 tags on-die for full-speed coherency checks
• Chip-to-chip & MCM-to-MCM fabric: glueless SMP
@IBM
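The Intel roadmap listed earlier (180 nm in 1999 down to 22 nm in 2011) shrinks the feature size by roughly 0.7× per two-year node. Assuming, as an idealization not stated on the slides, that the area per transistor scales with the square of the feature size, each step roughly halves that area (0.7² ≈ 0.49), which is how a two-year process cycle keeps pace with a 24-month Moore's Law. A Python sketch that checks this against the listed nodes:

```python
# Intel process nodes (nm) by year, as listed on the roadmap slide.
NODES = {1999: 180, 2001: 130, 2003: 90, 2005: 65, 2007: 45, 2009: 32, 2011: 22}


def node_ratios(nodes):
    """For each node transition, return (year, linear shrink, area shrink).

    Assumes (idealization) that area per transistor scales with feature size squared.
    """
    years = sorted(nodes)
    result = []
    for prev, cur in zip(years, years[1:]):
        linear = nodes[cur] / nodes[prev]
        result.append((cur, linear, linear ** 2))
    return result


if __name__ == "__main__":
    for year, linear, area in node_ratios(NODES):
        print(f"{year}: feature x{linear:.2f}, area per transistor x{area:.2f}")
```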

Niagara: Multithreaded SPARC Processor

Niagara Architecture

Cell Overview
Cell Prototype Die (Pham et al, ISSCC 2005)
[Die photo labels: eight SPUs, PPU, MIB, MIC, BIC, RRAC]
• IBM/Toshiba/Sony joint project: 4-5 years, 400 designers
• 234 million transistors, ~80 watts at 4+ GHz
• 256 Gflops (billions of floating-point operations per second)
• Used in Sony PlayStation 3

Cell Overview - Main Processor
• One 64-bit PowerPC processor
– 4+ GHz, dual issue, two threads
– 512 kB of second-level cache
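Peak throughput is the number of units × operations per cycle per unit × clock rate. As a sketch, assuming (this breakdown is not given on the slides) that the 256 Gflops figure counts only the eight SPEs, each completing one 4-wide single-precision fused multiply-add (2 floating-point operations per lane) per cycle at 4 GHz:

```python
def peak_gflops(units: int, simd_lanes: int, flops_per_lane: int, clock_ghz: float) -> float:
    """Peak Gflops = units x SIMD lanes x flops per lane per cycle x clock (GHz)."""
    return units * simd_lanes * flops_per_lane * clock_ghz


if __name__ == "__main__":
    # Assumed SPE parameters: 8 SPEs, 4 single-precision lanes,
    # 2 flops per lane per cycle (fused multiply-add), 4 GHz clock.
    print(peak_gflops(units=8, simd_lanes=4, flops_per_lane=2, clock_ghz=4.0))
```

It prints 256.0. A peak figure like this assumes every unit does useful work on every cycle; sustained throughput on real code is lower.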

Cell Overview - SPE
• Eight Synergistic Processor Elements
– Or "Streaming Processor Elements"
– Co-processors with dedicated 256 kB of memory (not cache)

Cell Overview - Memory and I/O
• Dual Rambus XDR memory controllers (on chip)
– 25.6 GB/sec of memory bandwidth
• 76.8 GB/s chip-to-chip bandwidth (to off-chip GPU)

What else besides desktop and server processors?
Slides from ECE692 of Jie Hu@NJIT

Embedded Processors
Intel® Atom™
• 45 nm CMOS
• Used mostly for netbooks
• Ultra-low power, small form factor
• Embedded applications
[Die photo, 13x14 mm; labels: ADDR IO, FUSE, BUS, L2, CORE, PLL, DATA IO]

World Embedded Systems Market, 2003, 2004 and 2009
Segment              2003      2004      2009      AAGR% 2004-2009
Embedded Software    1,401     1,641     3,448     16.0
Embedded IC          34,681    40,539    78,746    14.2
Embedded Boards      3,401     3,693     5,950     10.0
Total                39,483    45,873    88,144    14.0
Source: "High Growth Expected in the Worldwide Embedded System Market in the Next Five Years", 04/28/2005
Slides from ECE692 of Jie Hu@NJIT
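The AAGR column matches a compound annual growth rate over the five years from 2004 to 2009, (V₂₀₀₉ / V₂₀₀₄)^(1/5) − 1. A short Python sketch (the `cagr` name is our own) recomputes it from the table; the table does not state its monetary unit, and the ratio does not depend on it.

```python
# 2004 and 2009 values from the embedded-systems market table (units as in the table).
MARKET = {
    "Embedded Software": (1_641, 3_448),
    "Embedded IC": (40_539, 78_746),
    "Embedded Boards": (3_693, 5_950),
    "Total": (45_873, 88_144),
}


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate from `start` to `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    for segment, (v2004, v2009) in MARKET.items():
        print(f"{segment}: {cagr(v2004, v2009, 5):.1%}")
```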


Tear-down of iPod Touch 2nd Gen
• The NAND is a Micron MLC chip: 29F64G08TAA
• The processor is an Apple-branded, Samsung-manufactured ARM with SDRAM on the package
• Rumor: ARM Cortex A8

Tear-down of iPod Shuffle 3rd Gen
• Weighed 10.7 grams
• About 10% more than two single sheets of letter paper

photo from ifixit.com

Performance Trend
Based on SPEED, the CPU has increased dramatically, but memory and disk have increased only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.

CPU determines performance?
• Doubling the number of people on a project doesn't speed it up by 2x.
• Similarly, 2x transistors don't magically get you 2x performance.
• Performance gains are possible because of continued advances in computer architecture.

Much of computer architecture is about how to organize these additional resources to get more done.


Memory Technology
• DDR: Double Data Rate SDRAM
• Bandwidth of a memory module

SBmax = SBbus × fbus × 2, where

– SBmax: max. memory bandwidth

– SBbus: Width of the memory bus in bytes (64 bits = 8 bytes)

– fbus: Frequency of the memory bus
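As a worked example of the formula, take a DDR-400 (PC-3200) module, chosen here only for illustration: its 64-bit (8-byte) bus is clocked at 200 MHz, and DDR transfers data on both clock edges, so SBmax = 8 B × 200 MHz × 2 = 3.2 GB/s. The same calculation in Python (the function name is our own):

```python
def ddr_peak_bandwidth(bus_bytes: int, bus_hz: float) -> float:
    """SBmax = SBbus * fbus * 2 (two transfers per clock for DDR), in bytes/s."""
    return bus_bytes * bus_hz * 2


if __name__ == "__main__":
    # Illustrative example: DDR-400 (PC-3200), 8-byte bus clocked at 200 MHz.
    sb_max = ddr_peak_bandwidth(bus_bytes=8, bus_hz=200e6)
    print(f"{sb_max / 1e9:.1f} GB/s")
```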

Memory speed improves ~10% per year. http://www.kingston.com/newtech

http://en.wikipedia.org/wiki/DDR_SDRAM

Photo of Disk Head, Arm, Actuator
[Photo labels: spindle, arm, head, actuator, platters (12)]

Disk Technology

Disk Device Terminology
[Diagram labels: sector, inner track, outer track, head, platter, arm, actuator]

Disk Latency = Seek Time + Rotation Time + Transfer Time

2007: Hitachi releases the 1 TB Hitachi Deskstar 7K1000 (1024 Gigabytes (GB) = 1 Terabyte (TB))

Order-of-magnitude times for 4 KB transfers:
• Seek: 8 ms or less
• Rotate: 4.2 ms @ 7200 rpm
• Transfer: 1 ms @ 7200 rpm
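The 4.2 ms rotation figure is the average rotational latency, i.e. half a revolution: at 7200 rpm one revolution takes 60,000 / 7200 ≈ 8.33 ms. The Python sketch below (function names are our own) combines this with the seek and transfer figures above using the Disk Latency formula.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def disk_latency_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Disk Latency = Seek Time + Rotation Time + Transfer Time."""
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


if __name__ == "__main__":
    # Order-of-magnitude figures from the slide for a 4 KB transfer at 7200 rpm.
    print(f"rotation: {avg_rotational_latency_ms(7200):.1f} ms")
    print(f"total:    {disk_latency_ms(8.0, 7200, 1.0):.1f} ms")
```

It prints about 4.2 ms for rotation and 13.2 ms in total; for a small 4 KB transfer the mechanical seek and rotation delays make up over 90% of the latency.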

Disk capacity improves about 60% per year.

Technology ⇒ dramatic change

• Processor
– Transistor count per chip: about 59% per year
– Clock rate: about 20% per year
• Memory
– DRAM capacity: about 60% per year (4x every 3 years)
– Memory speed: about 10% per year
– Cost per bit: improves about 25% per year
• Disk
– Capacity: about 60% per year
– Total use of data: 100% per 9 months!
• Network bandwidth
– 10 years: 10 Mb → 100 Mb
– 5 years: 100 Mb → 1 Gb
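To see why the CPU-memory gap keeps widening, take the clock-rate (about 20% per year) and memory-speed (about 10% per year) figures above at face value; this is a simplification, since clock rate is only one factor in processor performance. Their ratio grows by 1.20 / 1.10 ≈ 1.09 per year, so it doubles roughly every eight years. A Python sketch (function names are our own):

```python
import math


def gap_after(years: float, fast_rate: float = 0.20, slow_rate: float = 0.10) -> float:
    """Ratio between two quantities growing at fast_rate and slow_rate after `years`."""
    return ((1.0 + fast_rate) / (1.0 + slow_rate)) ** years


def gap_doubling_years(fast_rate: float = 0.20, slow_rate: float = 0.10) -> float:
    """Years for the ratio between the two growth curves to double."""
    return math.log(2.0) / math.log((1.0 + fast_rate) / (1.0 + slow_rate))


if __name__ == "__main__":
    print(f"gap after 10 years: {gap_after(10):.2f}x")
    print(f"gap doubles every {gap_doubling_years():.1f} years")
```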


What is Architecture?

• Original sense:
– Taking a range of building materials, putting them together in desirable ways to achieve a building suited to its purpose
• In computer engineering:
– Similar: how parts are put together to achieve some overall goal
– Examples: the architecture of a chip, of the Internet, of an enterprise database system, an email system, a cable TV distribution system

Adapted from David Clark, “What is ‘Architecture’?”

What is “Computer Architecture”?
• Instruction Set Architecture (ISA)
– Visible to the programmer
– E.g., IA-32, IA-64, SPARC, ARM, …
• Organization
– High-level detail of the system
» Does it have a cache, full FP support, etc.?
• Hardware
– Specifics
» E.g., at 3GHz vs. Core Duo at 2 GHz

The Rest of this Course
• How are modern ISAs arranged?
• How do you organize these millions/billions of transistors to implement the ISA?
– Data processing (workers)
– Control logic (managers)
– Memory (warehouse)
– Parallel systems (multiple worksites)
• How to bridge the performance gap between CPU and memory?
– Cache
– Redundant Array of Inexpensive Disks (RAID)


Computer Engineering Methodology
[Diagram: iterative cycle of Evaluate Existing Systems for Bottlenecks, Simulate New Designs and Organizations, and Implement Next Generation System; inputs: Benchmarks, Workloads, Technology Trends, Implementation Complexity]
Architecture design is an iterative process: searching the space of possible designs at all levels of computer systems.

Summary
1. Moore's law: The number of transistors incorporated in a chip will approximately double every 18 months.
2. CPU speed increases dramatically, but the speed of memory, disk and network increases slowly.
3. Architecture design is an iterative process.
– Measure performance: Benchmarks