ECE473 Computer Architecture and Organization
Technology Trends
Lecturer: Prof. Yifeng Zhu
Fall, 2009
Portions of these slides are derived from: Dave Patterson © UCB
ECE473 Lec 1.1

Outline
• Technology Trends
• Introduction to Computer Architecture

Birth of the Revolution -- The Intel 4004
First Microprocessor in 1971
• Intel 4004
• 2300 transistors
• Barely a processor
• Could access 300 bytes of memory
Introduced November 15, 1971
108 kHz, 50 KIPS, 2300 10 μm transistors
@intel

What If Your Salary?
• Parameters
– $16 base
– 59% growth/year
– 40 years
• Initially $16 → buy book
• 3rd year's $64 → buy computer game
• 16th year's $27,000 → buy car
• 22nd year's $430,000 → buy house
• 40th year's > billion dollars → buy a lot
You have to find fundamental new ways to spend money!

2002 - Intel Itanium 2 Processor for Servers
• 64-bit processors
• 0.18 μm bulk, 6 layer Al process
• 8 stage, fully stalled in-order pipeline
• Symmetric six integer-issue design
• IA32 execution engine integrated
• 3 levels of cache on-die totaling 3.3MB
• 221 Million transistors
• 130W @1GHz, 1.5V
• 421 mm² die
• 142 mm² CPU core
[Die photo, 21.6 mm × 19.5 mm. Labels: Branch Unit, Floating Point Unit, IA32, Pipeline Control, L1I cache, ALAT, Integer Datapath, Multi-Media unit, Int RF, L1D cache, CLK, HPW, DTLB, L2D Array and Control, L3 Tag, Bus Logic, L3 Cache]
@intel

2002 – Pentium® 4 Processor
• November 14, 2002
• @3.06 GHz, 533 MT/s bus
• 1099 SPECint_base2000*
• 1077 SPECfp_base2000*
• 55 Million transistors, 130 nm process
Source: http://www.specbench.org/cpu2000/results/
@intel

2006 - Intel Core Duo Processors for Desktop
• PERFORMANCE: 40%*
• POWER: 40%*
…relative to Intel® Pentium® D 960

2008 - Intel Core i7
• 64-bit x86-64
• Successor to the Intel Core 2 family
• Max CPU clock: 2.66 GHz to 3.33 GHz
• Cores: 4 (physical), 8 (logical)
• 45 nm CMOS process
• Adding GPU into the processor

* When compared to the Intel® Pentium® D processor 960.
Performance measured using SPECint* rate base2000. Actual performance may vary. Energy efficiency based on Thermal Design Power (TDP) measurement. See http://www.intel.com/performance for more information.

Technology Trends: Moore's Law
• Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors on a chip was doubling about every year; in 1975 he revised the pace to about every 24 months.
• In practice, the number of transistors on a chip is often quoted as doubling about every 18 months.
From Intel

Amazing Underlying Technology Change
• In 1965, Gordon Moore sketched out his prediction of the pace of silicon technology.
• Moore's Law: The number of transistors incorporated in a chip will approximately double every 24 months.
• Decades later, Moore's Law remains true.
From Intel

Intel processor technology: 2 years cycle
• 180 nm: 1999
• 130 nm: 2001
• 90 nm: 2003
• 65 nm: 2005
• 45 nm: 2007
• 32 nm: 2009
• 22 nm: 2011
[Chart annotation: "Reach a wall at this point"]

How did we do so far?
Moore's Law applied to the travel industry
• A flight from New York to Paris

AMD64 Dual Core Processor
• Two AMD Opteron™ CPU cores on one single die, each with 1MB L2 cache
• 90nm, ~205 million transistors*
– Approximately same die size as 130nm single-core AMD Opteron processor*
• 95 watt power envelope fits into 90nm power infrastructure
• Dual-core processors for client market are expected to follow
[Die diagram labels: Core 0, 1-MB L2, Northbridge, 1-MB L2, Core 1]
*Based on current revisions of the design
*Other names and brands may be claimed as the property of others

IBM Power4 Dual Processor on a Chip
• Two cores (~30M transistors each)
• Large Shared L2: Multi-ported: 3 independent slices
• L3 & Mem Controller: L3 tags on-die for full-speed coherency checks
• Chip-to-Chip & MCM-to-MCM Fabric: Glueless SMP
@IBM

Niagara: Multithreaded SPARC Processor
[Figure]

Niagara Architecture
[Figure]

Cell Overview
Cell Prototype Die (Pham et al, ISSCC 2005)
[Die photo labels: SPU ×8, PPU, MIC, RRAC, BIC, MIB]
• IBM/Toshiba/Sony joint project - 4-5 years, 400 designers
– 234 million transistors, ~80 watts at 4+ GHz
– 256 Gflops (billions of floating point operations per second)
– Used in Sony PlayStation 3

Cell Overview - Main Processor
• One 64-bit PowerPC processor
– 4+ GHz, dual issue, two threads
– 512 kB of second-level cache

Cell Overview - SPE
• Eight Synergistic Processor Elements
– Or "Streaming Processor Elements"
– Co-processors with dedicated 256kB of memory (not cache)

Cell Overview - Memory and I/O
• Dual Rambus XDR memory controllers (on chip)
– 25.6 GB/sec of memory bandwidth
• 76.8 GB/s chip-to-chip bandwidth (to off-chip GPU)

What else except desktop and server processors?

Embedded Processors
Slides from ECE692 of Jie Hu@NJIT
World Embedded Systems Market, 2003, 2004 and 2009

                      2003     2004     2009   AAGR% 2004-2009
Embedded Software    1,401    1,641    3,448   16.0
Embedded IC         34,681   40,539   78,746   14.2
Embedded Boards      3,401    3,693    5,950   10.0
Total               39,483   45,873   88,144   14.0

from "High Growth Expected in the Worldwide Embedded System Market in the Next Five Years", 04/28/2005

Intel® Atom™
• 13x14mm
• 45 nm CMOS
• Used most for Netbook
• Ultra-Low Power, Small Form Factor, Embedded Applications
[Die photo labels: ADDR IO, FUSE, BUS, L2, CORE, PLL, DATA IO]

Tear-down of iPod Touch 2nd Gen
[Photos from ifixit.com]
• The NAND flash memory is a Micron MLC chip: 29F64G08TAA
• The processor is an Apple-branded Samsung-manufactured ARM with SDRAM on the package
• Rumor: ARM Cortex A8

Tear-down of iPod Shuffle 3rd Gen
• Weighed 10.7 grams
• 10% more than two single letter papers
[Photo from ifixit.com]

Performance Trend
In terms of speed, the CPU has improved dramatically, but memory and disk have improved only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.

CPU determines performance?
• Doubling the number of people on a project doesn't speed it up by 2x
• Similarly, 2x transistors don't magically get you 2x performance
• Performance gains are possible because of continued advances in computer architecture.
• Much of computer architecture is about how to organize these additional resources to get more done

Memory Technology
• Memory speed improves ~10% per year.
• DDR: Double Data Rate SDRAM
• Bandwidth of a memory module: SB_max = SB_bus × f_bus × 2, where
– SB_max: max. memory bandwidth
– SB_bus: width of the memory bus (64 Bit = 8 Bytes)
– f_bus: frequency of the memory bus
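As a worked instance of the DDR bandwidth formula from the Memory Technology slide, here is a minimal Python sketch. The function name and the example module (a 64-bit bus clocked at 200 MHz, i.e. a DDR-400 style part) are assumptions chosen for illustration and are not taken from the slides.

```python
def ddr_peak_bandwidth(bus_width_bytes: int, bus_clock_hz: float) -> float:
    """Peak bandwidth in bytes/s: SB_max = SB_bus * f_bus * 2.

    The factor 2 reflects that DDR transfers data on both the rising
    and the falling edge of the bus clock.
    """
    return bus_width_bytes * bus_clock_hz * 2


# Assumed example: 64-bit (8-byte) bus at a 200 MHz bus clock.
peak = ddr_peak_bandwidth(bus_width_bytes=8, bus_clock_hz=200e6)
print(f"{peak / 1e9:.1f} GB/s")  # prints "3.2 GB/s"
```

This is a peak figure only; sustained bandwidth is lower once refresh, bank conflicts, and command overhead are taken into account.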
Sources (Memory Technology): http://www.kingston.com/newtech http://en.wikipedia.org/wiki/DDR_SDRAM

Photo of Disk Head, Arm, Actuator
[Photo labels: Spindle, Arm, Head, Actuator, Platters (12)]

Disk Technology
• 2007: Hitachi releases the 1TB (1024 Gigabytes (GB) = 1 Terabyte (TB)) Hitachi Deskstar 7K1000
• Disk capacity improves about 60% per year.

Disk Device Terminology
[Diagram labels: Sector, Inner Track, Outer Track, Head, Platter, Arm, Actuator]
Disk Latency = Seek Time + Rotation Time + Transfer Time
Order-of-magnitude times for 4K byte transfers:
• Seek: 8 ms or less
• Rotate: 4.2 ms @ 7200 rpm
• Transfer: 1 ms @ 7200 rpm

Technology ⇒ dramatic change
• Processor
– transistor number in a chip: about 59% per year
– clock rate: about 20% per year
• Memory
– DRAM capacity: about 60% per year (4x every 3 years)
– Memory speed: about 10% per year
– Cost per bit: improves about 25% per year
• Disk
– capacity: about 60% per year
– Total use of data: 100% per 9 months!
• Network Bandwidth
– 10 years: 10Mb → 100Mb
– 5 years: 100Mb → 1 Gb
[Figure, from IBM]

What is Architecture?
• Original sense:
– Taking a range of building materials, putting them together in desirable ways to achieve a building suited to its purpose
• In Computer Engineering:
– Similar: how parts are put together to achieve some overall goal
– Examples: the architecture of a chip, of the Internet, of an enterprise database system, an email system, a cable TV distribution system
Adapted from David Clark's, What is "Architecture"?

What is "Computer Architecture"?
• Instruction Set Architecture (ISA)
– Visible to the programmer
– E.g., IA-32, IA-64, SPARC, ARM,…
• Organization
– High-level detail of the system
» Does it have a cache, full FP support, etc?
• Hardware
– Specifics
» E.g., Pentium 4 at 3GHz vs.

The Rest of this Course
• How are modern ISAs arranged?
• How do you organize these millions/billions of transistors to implement the ISA
– data-processing (workers)
– control-logic (managers)
– memory (warehouse)
– parallel systems (multiple worksites)
• How to bridge the performance gap between CPU and memory?
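Two pieces of arithmetic from the slides can be checked with a short Python sketch: the disk latency formula (Seek Time + Rotation Time + Transfer Time) and the compound growth rates quoted under "Technology ⇒ dramatic change" and "What If Your Salary?". The function names are hypothetical, and treating "Rotation Time" as half of one revolution is an assumption; it happens to reproduce the 4.2 ms figure quoted for 7200 rpm.

```python
def rotation_time_ms(rpm: float) -> float:
    """Average rotational delay in ms, assumed to be half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def disk_latency_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Disk Latency = Seek Time + Rotation Time + Transfer Time."""
    return seek_ms + rotation_time_ms(rpm) + transfer_ms


def compound(base: float, rate_per_year: float, years: int) -> float:
    """Value after `years` of growth at `rate_per_year` (0.59 means 59%)."""
    return base * (1.0 + rate_per_year) ** years


# Slide figures: 8 ms seek, 7200 rpm, 1 ms transfer for a 4K byte access.
print(f"rotate: {rotation_time_ms(7200):.1f} ms")        # about 4.2 ms
print(f"total:  {disk_latency_ms(8, 7200, 1):.1f} ms")   # about 13.2 ms

# 60% per year compounds to roughly 4x in 3 years, as the slide states.
print(f"3-year DRAM growth: {compound(1, 0.60, 3):.2f}x")

# The $16 salary at 59% per year after 3 years, about $64.
print(f"salary after 3 years: ${compound(16, 0.59, 3):.0f}")
```

The mechanical terms (seek and rotation) dominate the total, which is one way to see why disk speed has lagged so far behind CPU speed.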