Andrew Eggers & Ding Zhuo

• CPU Overview
  ◦ What is a CPU?
  ◦ History of CPU
• Single-Core Processor Limitations
• Multi-Core Processors
  ◦ Hyper-Threading
  ◦ Intel and AMD CPUs
• Performance
• Multi-Core Processor Limitations
• Future Developments
  ◦ Many-Core Processors
• Questions
• References

• What is a CPU?
  ◦ Stands for Central Processing Unit.
  ◦ Is the brain of a computer.
    ▪ Tells the computer what to do and when to do it.
  ◦ Also called a processor; each independent processing unit inside it is a "core".

• Intel 4004: a 4-bit CPU released by Intel Corporation in 1971
• The first complete CPU on one chip
• Instruction execution rate: about 92,600 instructions per second

Intel 4004
Produced: Late 1971
Manufacturer: Intel
Max. CPU clock rate: 108 kHz to 740 kHz
Min. feature size: 10 µm
Instruction set: 4-bit BCD oriented
Successor: Intel 4040
Package(s): 16-pin DIP

• Clock Speed Roadblock:
  ◦ Problem:
    ▪ Single-core processor
    ▪ Power consumption and heat make it impractical to run mainstream processors much faster than about 4 GHz.
  ◦ Solution:
    ▪ Multi-core processor
    ▪ Increase the number of cores instead of the clock speed.
• The idea behind multiple cores is to split the work across several processors, each of which behaves like a single processor, so that the work as a whole finishes sooner.
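The multi-core idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prime-counting workload and the function names `count_primes`, `split_range`, and `parallel_count_primes` are hypothetical and not from the slides. The standard-library `ProcessPoolExecutor` is used so that each chunk of work can run in its own process, and therefore potentially on its own core.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(lo, hi):
    """Count primes in the half-open range [lo, hi) by trial division."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(limit, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def parallel_count_primes(limit, workers=4):
    """Count primes below `limit`, handing one chunk to each worker process."""
    chunks = split_range(limit, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # zip(*chunks) turns [(lo, hi), ...] into (all los, all his).
        partial_counts = pool.map(count_primes, *zip(*chunks))
        return sum(partial_counts)
```

To try it, call `parallel_count_primes(200_000, workers=4)` from inside an `if __name__ == "__main__":` guard, which process-based parallelism in Python generally requires. The chunks are independent, so no core has to wait for another until the final sum.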

• Software has to be written in a multi-threaded or multi-process manner to take full advantage of the hardware.

• Hyper-Threading
  ◦ Intel's implementation of simultaneous multithreading (SMT)
    ▪ Allows multiple threads to run simultaneously on one core
    ▪ Used on several Intel CPU families, including the "Core i Series".
  ◦ Combines thread I/O to increase efficiency
  ◦ Takes advantage of superscalar architecture
    ▪ Instruction-level parallelism

Code name | Key products | Release date | Cores | Threads | Last-level cache size | Process node (nm) | Estimated transistors (millions) | Die area (mm²) | TDP (W)
Lynnfield | Core i5, i7 | September 2009 | 4 | 8 | 8 MB | 45 | 774 | 296 | 95
 | Core i7-970, 990X | July 2010 | 6 | 12 | 12 MB | 32 | 1168 | 248 | 130
Sandy Bridge | Core i5, i7 | January 2011 | 4 | 8 | 8 MB | 32 | 995 | 216 | 95
Ivy Bridge | Core i5, i7 | April 2012 | 4 | 8 | 8 MB | 22 | 1400 | 160 | 77
Deneb | Phenom II | January 2009 | 4 | 4 | 6 MB | 45 | 758 | 258 | 140
Thuban | Phenom II X6 | April 2010 | 6 | 6 | 6 MB | 45 | 904 | 346 | 125
Bulldozer | FX | October 2011 | 8 | 8 | 8 MB | 32 | 1200 | 315 | 140

▪ The i7 Ivy Bridge was released in April 2012.
▪ Clock rate: 2.5 – 3.5 GHz
▪ TDP: 77 watts

L1 cache: 64 kB per core
L2 cache: 256 kB per core
L3 cache: 3 MB to 8 MB shared

▪ Clock rate: 3.6 – 4.2 GHz
▪ TDP: 140 watts

• Does having "more cores" mean "faster"?

  ◦ Yes, for task-level parallelism.
    ▪ Multi-core processors can run multiple programs concurrently, as independent processes.

  ◦ Not necessarily, for single applications.
    ▪ Limited by how much of the application's work can be done in parallel (data parallelism).
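One common way to put numbers on the "not necessarily" case is Amdahl's law, which is not named in the slides and is brought in here only as an illustrative model. It assumes a fixed fraction of a program can be parallelized perfectly and the rest runs serially. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores when `parallel_fraction` of the work
    parallelizes perfectly and the remainder stays serial (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Speedup for a program that is 90% parallelizable.
    for cores in (1, 2, 4, 8, 64):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

Under this model, a program that is 90% parallelizable can approach but never reach a 10x speedup, no matter how many cores are added, because the serial 10% always remains.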

• Problems:
    ▪ Speedup does not grow linearly with the number of processors, because of lock contention when accessing shared or dependent resources.
    ▪ Higher memory bandwidth needed
    ▪ Inter-core communication and caching
    ▪ Power consumption and cooling
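The lock-contention problem listed above can be illustrated with a shared counter. This is a minimal sketch, not a benchmark: the function name `locked_increments` is hypothetical, and in CPython the global interpreter lock already serializes bytecode, so the example shows the structure of the problem rather than realistic timings.

```python
import threading


def locked_increments(num_threads, increments_per_thread):
    """Have several threads update one shared counter under a single lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments_per_thread):
            # Only one thread at a time can hold the lock, so every
            # update to the shared counter is effectively serial.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

The result is correct for any number of threads, but the time spent inside the lock cannot be divided among cores, which is one reason speedup falls short of linear.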

• Solutions:
    ▪ Code optimization
    ▪ Create a network between cores.
    ▪ Reduce clock speed to reduce power consumption.

• Multi-core Processors:
  ◦ Currently used across many application domains, such as general-purpose, embedded, network, DSP (digital signal processing), and graphics.
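A rough sketch of why the "reduce clock speed" solution pairs well with adding cores. It uses the common textbook approximation that dynamic power scales as capacitance × voltage² × frequency. That model, the further assumption that supply voltage can be lowered in proportion to frequency, and the function name `relative_dynamic_power` are not from the slides.

```python
def relative_dynamic_power(freq_scale, voltage_scale=None):
    """Dynamic power relative to a baseline core, using P ~ C * V^2 * f.

    freq_scale and voltage_scale are ratios to the baseline (1.0 = unchanged).
    """
    if voltage_scale is None:
        # Assumption: voltage is scaled down in proportion to frequency.
        voltage_scale = freq_scale
    return voltage_scale ** 2 * freq_scale


if __name__ == "__main__":
    one_fast_core = relative_dynamic_power(1.0)
    two_slower_cores = 2 * relative_dynamic_power(0.8)
    print(one_fast_core, round(two_slower_cores, 3))
```

Under these assumptions, two cores at 80% of the clock speed draw about 2 × 0.8³ ≈ 1.02 times the power of one full-speed core, while offering up to 1.6 times the throughput for work that parallelizes well.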

• Many-core Processors:
  ◦ Research and network applications.
  ◦ Like cluster computing, but with lower latency between cores.
  ◦ Not very common yet.

• Purpose:
  ◦ Provide communication between on-chip modules.
  ◦ Reduce latency for memory caching.
  ◦ I/O

• Design Choices:
  ◦ Dynamic vs. Static
  ◦ Electrical vs. Wireless vs. Optical

• Some companies working on many-core processors:
  ◦ Intel – intended for supercomputer use
  ◦ Tilera – intended for network applications
  ◦ Adapteva – focused on low power, low cost

Intel
• Number of Cores: 64
• Type of Core:
• Clock speed: 1.2 GHz
• Caching: Shared L3
• Network: Ring Bus
• Performance: 1 GFLOPS/W
• Purpose: High Performance

Tilera
• Number of Cores: 36, 64, 100
• Type of Core: VLIW (Very Long Instruction Word)
• Clock speed: 700 MHz
• Caching: No L3 cache, shareable L2.
• Network: Dynamic Mesh of Routers
• Purpose: Networking, Cloud Computing

Adapteva
• Number of Cores: 16 (planning for a 4096-core version)
• Type of Core: Floating Point RISC
• Clock speed: 1 GHz
• Caching: No L3 cache, shareable L2.
• Network: Dynamic Mesh of Routers.
• Performance: 35 GFLOPS/W
• Purpose: Low power, low cost (~$99), and scalability.
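The specifications above mention two on-chip network styles, a ring bus and a mesh of routers. A small sketch can show how they differ in distance between cores. The slides do not give topology details, so the following are assumptions for illustration only: a bidirectional ring, a square 2D mesh with cores numbered row by row, and shortest-path (Manhattan) routing. The function names are hypothetical.

```python
def ring_hops(src, dst, n):
    """Shortest hop count between two nodes on a bidirectional ring of n nodes."""
    d = abs(src - dst) % n
    return min(d, n - d)


def mesh_hops(src, dst, width):
    """Hop count between two nodes on a 2D mesh with row-major numbering,
    moving along one axis and then the other (Manhattan distance)."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def average_hops(hop_fn, n, size):
    """Average hop count over all ordered pairs of distinct nodes.

    `size` is passed through to hop_fn: the node count for a ring,
    the row width for a mesh.
    """
    pairs = [(s, d) for s in range(n) for d in range(n) if s != d]
    return sum(hop_fn(s, d, size) for s, d in pairs) / len(pairs)


if __name__ == "__main__":
    print("ring, 64 cores:", round(average_hops(ring_hops, 64, 64), 2))
    print("8x8 mesh, 64 cores:", round(average_hops(mesh_hops, 64, 8), 2))
```

For 64 cores under these assumptions, the worst case is 32 hops on the ring and 14 hops on the 8×8 mesh, which suggests why mesh networks are attractive as core counts grow.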

• www.intel.com/technology/architecture-silicon/core/
• http://www.intel.com/content/www/us/en/processor-comparison/intel-processor-ratings.html
• http://en.wikipedia.org/wiki/Central_processing_unit
• http://en.wikipedia.org/wiki/Intel_4004
• http://www.eetimes.com/electronics-news/4217220/Tilera-launches-many-core-processor