
CS 433 - Computer Architecture
Intel Atom
Murph, Kevin, and Eric

History

Intel Low Power Processor History
● Earlier x86
  ○ Low-power processors were mostly the same as the normal models
● Pentium M
  ○ First explicitly mobile processor
  ○ Built off the Pentium III
● Atom
  ○ Built from the ground up

Emerging Market
● Netbooks
● Tablets
● Smartphones

Designed to Compete with ARM
● ARM dominated the mobile market
  ○ Had the necessary low power consumption
  ○ Used in phones, GPS units, and PDAs
● Much existing software was already ARM-compatible
  ○ Linux and Windows CE (and now Windows 8)

Intro to the Microarchitecture

Atom
● x86 mobile processor
  ○ Low power consumption
  ○ Integrated components
● Low power consumption
  ○ Reduced speed/cores/cache
  ○ 0.65-13 W (most 2.5-7 W)
● In-order (what??)
● 16 stages

Branch Prediction
● Two-level adaptive predictor
● Global history table
● 128-entry BTB

Micro-ops
● Elementary building blocks that make up instructions
  ○ Used to control small parts of the processor
  ○ Allow more freedom with ILP and ordering
● Pentium Pro
  ○ Allowed Intel to keep exposing the x86 CISC interface
  ○ Internally it was RISC
● Atom - a step in the other direction
  ○ In-order
  ○ Most instructions aren't split into micro-ops
  ○ Lower issue width

Pipeline
In order, 16 stages
● 3 Fetch
● 3 Decode
● 2 Dispatch
● 1 Determine Source Operands
● 3 Data Cache Access
● 1 Execute
● 2 Exception & Multithread Handling
● 1 Commit

Sleep States
C0
● Basically everything is still running
● Still has multiple frequency modes to reduce power
C1/2
● Core clock off (not executing instructions)
● L1 cache flushed
C4
● PLLs off
● L2 cache flushed
C6
● L1/L2 off
● Very low power consumption
Wakeup time increases with deeper (higher-numbered) C-states.
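The sleep-state ladder above can be sketched as a small lookup that picks the deepest state a given wakeup budget allows. This is only an illustration, not Intel's power-management logic: the `CState` class, the `deepest_allowed` function, and the `wake_cost` ranks are hypothetical names and placeholder values. The ranks are not measured latencies; they only encode the slide's point that deeper C-states take longer to wake from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    powered_down: str  # what the slide lists as off or flushed in this state
    wake_cost: int     # HYPOTHETICAL relative rank, not a measured latency


# Ordered from shallow to deep, following the slide.
# wake_cost is a placeholder that only expresses "deeper = slower to wake".
C_STATES = [
    CState("C0", "nothing (frequency scaling only)", 0),
    CState("C1/2", "core clock off, L1 flushed", 1),
    CState("C4", "PLLs off, L2 flushed", 2),
    CState("C6", "L1/L2 off", 3),
]


def deepest_allowed(states, wake_budget):
    """Return the deepest state whose wake_cost fits within wake_budget.

    Assumes `states` is sorted from shallow to deep and that the first
    entry (C0) is always acceptable.
    """
    chosen = states[0]
    for state in states:
        if state.wake_cost <= wake_budget:
            chosen = state
    return chosen


if __name__ == "__main__":
    for budget in range(4):
        state = deepest_allowed(C_STATES, budget)
        print(budget, state.name, "-", state.powered_down)
```

A real governor would weigh expected idle time against measured entry and exit latencies; the sketch keeps only the ordering so that it stays within what the slide states.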
Clock Distribution
Traditional Grid Clock Distribution
● Useful for high-frequency chips
● Often accounts for 30-35% of power
Atom's Gridless System
● Grid consumed too much power
● Atom has lower requirements
● Less than 10% of power

Asymmetrical L1 Cache
● 56 kB cache
  ○ 24 kB data cache (6-way)
  ○ 32 kB instruction cache (8-way)
  ○ 64 B line size
● 8 transistors per bit
  ○ Compared with the typical 6 transistors
  ○ Allows lower power consumption
● Change made late in the design phase
● Required a 25% reduction in D-cache size
● Shared between two threads

Variable L2 Cache Size
● 512 kB capacity
  ○ 8-way associativity
  ○ 16-cycle latency
● Can shut down 75% of the cache
  ○ Goes from 8-way to 2-way
  ○ From 512 kB to 128 kB
  ○ Reduces power consumption

L1 & L2 Caches
● Instructions with L1-cached memory operands have the same latency as those with register operands
● However, memory operands are still costly
  ○ Limit IPC because they consume execution ports
  ○ All instructions using memory operands use the same port
  ○ Can also increase the length of the instruction
  ○ Instruction fetch rate is limited to 8 B per cycle

L1 & L2 Caches
● Memory forwarding
  ○ Memory written in cycle N can be read back in cycle N+1
  ○ The Atom can forward even when the subsequent read is larger or aligned differently
  ○ However, there is a heavy performance penalty when a cache line boundary is crossed
● Thanks to forwarding, integer operations have low cache latency (1 cycle)
● Latency is higher for other ops due to memory unit positioning (4-5 cycles)

Caching
● IA-32 and Intel 64 processors use the MESI protocol
  ○ Modified, Exclusive, Shared, Invalid
  ○ A.K.A. the Illinois protocol
● Each cache line has a 2-bit state
  ○ Modified - block present only in the current cache and is dirty
  ○ Exclusive - block present only in the current cache and is clean
  ○ Shared - clean and possibly stored in other caches
  ○ Invalid - line holds no valid data

Main Memory
● Memory access limited to a single read or write per cycle
● Cannot read and write simultaneously

On Die GPUs
● Pros
  ○ Cheap
  ○ Low power consumption
● Cons
  ○ Die usage
  ○ Performance

On Die GPUs
● Cedar Trail - 32 nm (2012)
  ○ DirectX 9
  ○ 400-640 MHz
  ○ Hardware video decoding incl. H.264
  ○ Intel Wireless Display

Dual Core
Two dies in one package
2x power
2x L2 cache
Allows 'binning'
Image from AnandTech.com

2009 - 2010

Hyperthreading
● Two logical processors per physical core
● Per-thread hardware
  ○ Prefetch buffer
  ○ Integer and FP register files
  ○ Instruction queue and prefetch
● 10% power consumption increase
● Requires software support

References
● Intel Atom CPU Review
  http://www.tomshardware.com/reviews/intel-atom-cpu,1947.html
● Intel's Atom Architecture: The Journey Begins
  http://www.anandtech.com/show/2493
● The microarchitecture of Intel and AMD CPUs
  http://www.agner.org/optimize/microarchitecture.pdf
● Intel 64 and IA-32 Architectures Software Developer's Manual
  http://download.intel.com/products/processor/manual/325462.pdf

Discussion