CS 433 - Computer Architecture

Intel

Murph, Kevin, and Eric

History

Intel Low Power Processor History
● Earlier
  ○ Low-power processors were mostly the same as normal models
● Pentium M
  ○ First explicitly mobile processor
  ○ Built off Pentium III
● Atom
  ○ Built from the ground up

History

Emerging Market
● Tablets
● Designed to compete with ARM
● ARM dominated the mobile market
  ○ Had the necessary low power consumption
  ○ Used in phones, GPS units, PDAs
● Much of existing ARM software compatible
  ○ and Windows CE (and now Win8)

Intro to the Atom

● x86 mobile processor
  ○ Low power consumption
  ○ Integrated components
● Low power consumption
  ○ Reduced speed/cores/cache
  ○ 0.65-13 W (most 2.5-7 W)
● In-Order (what??)
● 16 stages

Branch Prediction

● Two-level adaptive predictor
● Global history table
● 128-entry BTB

Micro-ops
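
To make the Branch Prediction slide concrete, here is a minimal Python sketch of a two-level adaptive predictor driven by a global history register. The 4-bit history length, the 2-bit saturating counters, and all class and function names are illustrative assumptions, not Atom's actual parameters, and the 128-entry BTB is left out entirely.

```python
class TwoLevelGlobalPredictor:
    """Global-history two-level predictor (illustrative sizes, not Atom's)."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0  # level 1: shift register of recent branch outcomes
        # Level 2: one 2-bit saturating counter per history pattern,
        # initialised to 1 ("weakly not taken").
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        """Return True if the next branch is predicted taken."""
        return self.counters[self.history] >= 2

    def update(self, taken):
        """Train on the real outcome, then shift it into the history."""
        c = self.counters[self.history]
        self.counters[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run(pattern, repeats, predictor):
    """Feed a repeating outcome pattern; return mispredictions per pass."""
    misses = []
    for _ in range(repeats):
        wrong = 0
        for taken in pattern:
            if predictor.predict() != taken:
                wrong += 1
            predictor.update(taken)
        misses.append(wrong)
    return misses


# A loop branch taken three times, then not taken once, repeated.
misses = run([True, True, True, False], repeats=20,
             predictor=TwoLevelGlobalPredictor())
print(misses)
```

After a short warm-up the per-pass misprediction count drops to zero, because every 4-outcome history identifies a unique position in the repeating pattern. A single 2-bit counter without history would keep mispredicting the loop exit.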

● Elementary building blocks which make up instructions
  ○ Used to control small parts of the processor
  ○ Allow for more freedom with ILP and ordering
●
  ○ Allowed Intel to expose the x86 CISC interface
  ○ Internally it was RISC
● Atom - a step in the other direction
  ○ In-Order
  ○ Most instructions aren't split into micro-ops
  ○ Lower issue width

Pipeline

In-Order, 16 Stages
● 3 Fetch
● 3 Decode
● 2 Dispatch
● 1 Determine Source Operands
● 3 Data Cache Access
● 1 Execute
● 2 Exception & Multithread Handling
● 1 Commit

Sleep States

C0
● Basically everything is still running
● Still has multiple frequency modes to reduce power
C1/2
● Core clock off (not executing instructions)
● L1 Cache flushed
C4
● PLLs off
● L2 Cache flushed
C6
● L1/L2 off
● Very low power consumption

Wakeup time increases with deeper (higher-numbered) C-states.

Clock Distribution

Traditional Grid Clock Distribution
● Useful for high frequency chips
● Often accounts for 30-35% of power
Atom's Gridless System
● Grid consumed too much power
● Atom has lower requirements
● Less than 10% of power

Asymmetrical L1 Cache

● 56 kB cache
  ○ 24 kB data cache (6-way)
  ○ 32 kB instruction cache (8-way)
  ○ 64 B line size
● 8 transistors per bit
  ○ Compare to 6 transistors typically
  ○ Allows lower power consumption
● Change made late in design phase
● Required 25% reduction in D-cache
● Shared between two threads

Variable L2 Cache Size

● 512 kB capacity
  ○ 8-way associativity
  ○ 16 cycle latency
● Can shut down 75% of cache
  ○ Go from 8-way to 2-way
  ○ From 512 kB to 128 kB
  ○ Reduces power consumption

L1 & L2 Caches
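
The L1 and L2 figures on the preceding slides can be sanity-checked with a little arithmetic. The Python sketch below derives the number of sets from capacity, associativity, and line size, and shows that dropping L2 from 8 ways to 2 keeps the set count while cutting capacity to a quarter. The helper name `num_sets` is made up for illustration, and the 64 B line size, which the slides state only for L1, is assumed for L2 as well.

```python
KB = 1024
LINE_BYTES = 64  # stated for L1 on the slide; assumed here for L2 too


def num_sets(capacity_bytes, ways, line_bytes=LINE_BYTES):
    """Sets = capacity / (ways * line size) for a set-associative cache."""
    sets, remainder = divmod(capacity_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity is not a whole number of sets")
    return sets


l1d_sets = num_sets(24 * KB, ways=6)   # data cache
l1i_sets = num_sets(32 * KB, ways=8)   # instruction cache
l2_sets = num_sets(512 * KB, ways=8)   # full L2

# Powering down 6 of the 8 ways leaves the sets untouched.
l2_reduced_bytes = l2_sets * 2 * LINE_BYTES

# A 25% reduction is consistent with shrinking the D-cache from
# 32 kB (assumed original size) to the 24 kB on the slide.
dcache_reduction = 1 - (24 * KB) / (32 * KB)

print(l1d_sets, l1i_sets, l2_sets, l2_reduced_bytes // KB, dcache_reduction)
```

Both L1 caches come out with the same number of sets despite their different sizes, which is what makes the 6-way/8-way asymmetry possible without changing the indexing.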

● Instructions with L1-cached memory operands have the same latency as those with register operands
● However, memory operands are still costly
  ○ Limit IPC due to consumption of execution ports
  ○ All instructions using memory operands use the same port
  ○ Can also increase instruction length
  ○ Instruction fetch rate is limited to 8 B per cycle

L1 & L2 Caches

● Memory forwarding
  ○ Memory written in cycle N can be read back at cycle N+1
  ○ The Atom can forward even when the following read is larger or aligned differently
  ○ However, there is a heavy performance penalty when a cache line boundary is crossed
● Thanks to forwarding, integer operations have low cache latency (1 cycle)
● Latency is higher for other ops due to memory unit positioning (4-5 cycles)

Caching
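
The penalty noted on the forwarding slide applies when an access straddles a cache line boundary. As a small Python sketch (the function name is hypothetical, and the 64 B line size is the one the slides give for L1), an access crosses a boundary when its first and last bytes fall in different lines:

```python
LINE_BYTES = 64  # L1 line size from the slides


def crosses_line(address, size, line_bytes=LINE_BYTES):
    """True if the access [address, address + size) spans two cache lines."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    return first_line != last_line


print(crosses_line(0x1000, 8))  # aligned 8-byte access
print(crosses_line(0x103C, 8))  # starts 4 bytes before the end of a line
```

The first access stays inside one line; the second begins 4 bytes before a line ends and spills into the next one, which is the case the slide warns about. Keeping data naturally aligned avoids it.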

● IA-32/64 use the MESI protocol
  ○ Modified, Exclusive, Shared, Invalid
  ○ A.K.A. the Illinois protocol
● Each cache line has a 2-bit state
  ○ Modified - block present only in current cache but is dirty
  ○ Exclusive - block present only in current cache but is clean
  ○ Shared - clean and possibly stored in other caches
  ○ Invalid - unused data

Main Memory
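
The four MESI states from the Caching slide fit in two bits per line. Below is a simplified Python sketch of the transitions for a single line in a single cache. The event names and the `others_have_copy` flag are illustrative assumptions; real implementations add bus transactions, write-backs, and transient states that are omitted here.

```python
from enum import Enum


class State(Enum):
    MODIFIED = 0
    EXCLUSIVE = 1
    SHARED = 2
    INVALID = 3  # four states, so two bits per line suffice


def next_state(state, event, others_have_copy=False):
    """Simplified MESI transition for one line in one cache.

    Events (names are illustrative): 'local_read', 'local_write',
    'remote_read', 'remote_write'.
    """
    if event == "local_read":
        if state is State.INVALID:
            # Miss: load as Shared if another cache holds it, else Exclusive.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # read hits do not change the state
    if event == "local_write":
        return State.MODIFIED  # other copies are invalidated first
    if event == "remote_read":
        if state in (State.MODIFIED, State.EXCLUSIVE):
            return State.SHARED  # a Modified line is written back first
        return state
    if event == "remote_write":
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


state = State.INVALID
for event in ["local_read", "local_write", "remote_read", "remote_write"]:
    state = next_state(state, event)
    print(event, "->", state.name)
```

The loop walks one line through Exclusive, Modified, Shared, and back to Invalid, touching each state once.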

● Memory access limited to a single read or write per cycle

● Cannot read/write simultaneously

On Die GPUs

● Pros
  ○ Cheap
  ○ Low Power Consumption

● Cons
  ○ Die Usage
  ○ Performance

On Die GPUs

● Cedar Trail - 32 nm (2012)
  ○ DirectX 9
  ○ 400-640 MHz
  ○ Hardware video decoding, including H.264
  ○ Intel Wireless Display

Dual Core

Two Dies on One Chip
2x power
2x L2 Cache
Allows 'binning'

Image from AnandTech.com 2009 - 2010

Hyperthreading

● Two logical processors per physical core

● Per-Thread Hardware
  ○ Prefetch buffer
  ○ Integer and FP register files
  ○ Instruction queue and prefetch
● 10% Power Consumption Increase

● Requires Software Support

References

Intel Atom CPU Review http://www.tomshardware.com/reviews/intel-atom-cpu,1947.html

Intel's Atom Architecture: The Journey Begins http://www.anandtech.com/show/2493

The microarchitecture of Intel and AMD CPUs http://www.agner.org/optimize/microarchitecture.pdf

Intel 64 and IA-32 Architectures Software Developer's Manual http://download.intel.com/products/processor/manual/325462.pdf

Discussion