CS 433 - Computer Architecture

Intel Atom
Murph, Kevin, and Eric

History
Intel Low Power Processor History
● Earlier x86
  ○ Low power processors were mostly the same as normal models
● Pentium M
  ○ First explicitly mobile processor
  ○ Built off Pentium III
● Atom
  ○ Built from the ground up

History
Emerging Market
● Netbooks
● Tablets
● Smartphones
Designed to Compete with ARM
● ARM dominated the mobile market
  ○ Had the necessary low power consumption
  ○ Used in phones, GPS units, PDAs
● Much existing software is ARM compatible
  ○ Linux and Windows CE (and now Win8)

Intro to the Microarchitecture
Atom
● x86 mobile processor
  ○ Low power consumption
  ○ Integrated components
● Low power consumption
  ○ Reduced speed/cores/cache
  ○ 0.65-13 W (most 2.5-7 W)
● In-Order (what??)
● 16 stages

Branch Prediction
● Two-level adaptive predictor
● Global history table
● 128-entry BTB

Micro-ops
● Elementary building blocks which make up instructions
  ○ Used to control small parts of the processor
  ○ Allow more freedom with ILP and ordering
● Pentium Pro
  ○ Allowed Intel to expose the x86 CISC interface
  ○ Internally it was RISC
● Atom - a step in the other direction
  ○ In-order
  ○ Most instructions aren't split into micro-ops
  ○ Lower issue width

Pipeline
In Order, 16 Stages
● 3 Fetch
● 3 Decode
● 2 Dispatch
● 1 Determine Source Operands
● 3 Data Cache Access
● 1 Execute
● 2 Exception & Multithread Handling
● 1 Commit

Sleep States
C0
● Basically everything is still running
● Still has multiple frequency modes to reduce power
C1/2
● Core clock off (not executing instructions)
● L1 cache flushed
C4
● PLLs off
● L2 cache flushed
C6
● L1/L2 off
● Very low power consumption
Wakeup time increases with deeper (higher-numbered) C-states.
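
Returning to the Branch Prediction slide: the following is a minimal sketch of a generic two-level adaptive predictor driven by a global history register. It is not Atom's actual design. The 4-bit history length, the 16-entry table of 2-bit saturating counters, and the names `TwoLevelPredictor` and `accuracy` are assumptions chosen for illustration, and the 128-entry BTB is not modeled.

```python
class TwoLevelPredictor:
    """Generic two-level adaptive predictor (illustrative, not Atom's)."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0  # level 1: global history of recent outcomes
        # Level 2: one 2-bit saturating counter per history pattern,
        # initialised to 1 (weakly not-taken).
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the actual outcome into the global history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(pattern, repeats, history_bits=4):
    """Fraction of correct predictions over a repeated branch pattern."""
    predictor = TwoLevelPredictor(history_bits)
    correct = total = 0
    for _ in range(repeats):
        for taken in pattern:
            correct += predictor.predict() == taken
            total += 1
            predictor.update(taken)
    return correct / total


if __name__ == "__main__":
    # A loop branch: taken three times, then not taken on exit.
    print(accuracy([True, True, True, False], repeats=100))
```

With a history as long as the pattern's period, each history value is followed by exactly one outcome, so the predictor stops mispredicting once the counters have warmed up. A shorter history cannot tell the loop's last iteration apart from the earlier ones.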

Clock Distribution
Traditional Grid Clock Distribution
● Useful for high frequency chips
● Often accounts for 30-35% of power
Atom's Gridless System
● Grid consumed too much power
● Atom has lower requirements
● Less than 10% of power

Asymmetrical L1 Cache
● 56 kB cache
  ○ 24 kB data cache (6-way)
  ○ 32 kB instruction cache (8-way)
  ○ 64 B line size
● 8 transistors per bit
  ○ Compare to 6 transistors typically
  ○ Allows lower power consumption
● Change made late in design phase
● Required 25% reduction in D-cache (from 32 kB to 24 kB)
● Shared between two threads

Variable L2 Cache Size
● 512 kB capacity
  ○ 8-way associativity
  ○ 16 cycle latency
● Can shut down 75% of cache
  ○ Go from 8-way to 2-way
  ○ From 512 kB to 128 kB
  ○ Reduces power consumption

L1 & L2 Caches
● Instructions with L1 cached memory operands have the same latency as those with register operands
● However, memory operands are still costly
  ○ Limit IPC due to consumption of execution ports
  ○ All instructions using memory operands use the same port
  ○ Can also increase the length of the instruction
  ○ Instruction fetch rate is limited to 8 B per cycle

L1 & L2 Caches (continued)
● Memory forwarding
  ○ Memory written in cycle N can be read back at cycle N+1
  ○ The Atom can forward even when the subsequent read is larger or aligned differently
  ○ However, there is a heavy performance penalty when a cache line boundary is crossed
● Thanks to forwarding, integer operations have low cache latency (1 cycle)
● Latency is higher for other ops due to memory unit positioning (4-5 cycles)

Caching
● IA-32/Intel 64 processors use the MESI protocol
  ○ Modified, Exclusive, Shared, Invalid
  ○ A.K.A. the Illinois protocol
● Each cache line has a 2-bit state
  ○ Modified - block present only in current cache but is dirty
  ○ Exclusive - block present only in current cache but is clean
  ○ Shared - clean and possibly stored in other caches
  ○ Invalid - line holds no valid data

Main Memory
● Memory access limited to a single read or write per cycle
● Cannot read and write simultaneously

On Die GPUs
● Pros
  ○ Cheap
  ○ Low power consumption
● Cons
  ○ Die usage
  ○ Performance

On Die GPUs
● Cedar Trail - 32 nm (2012)
  ○ DirectX 9
  ○ 400-640 MHz
  ○ Hardware video decoding incl. H.264
  ○ Intel Wireless Display

Dual Core
● Two dies on one chip
● 2x power
● 2x L2 cache
● Allows 'binning'
Image from AnandTech.com
2009 - 2010

Hyperthreading
● Two logical processors per physical core
● Per-thread hardware
  ○ Prefetch buffer
  ○ Integer and FP register files
  ○ Instruction queue and prefetch
● 10% power consumption increase
● Requires software support

References
● Intel Atom CPU Review
  http://www.tomshardware.com/reviews/intel-atom-cpu,1947.html
● Intel's Atom Architecture: The Journey Begins
  http://www.anandtech.com/show/2493
● The microarchitecture of Intel and AMD CPUs
  http://www.agner.org/optimize/microarchitecture.pdf
● Intel 64 and IA-32 Architectures Software Developer's Manual
  http://download.intel.com/products/processor/manual/325462.pdf

Discussion
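
As a possible starting point for the discussion, the four MESI states from the Caching slide can be sketched as a small state machine for a single cache line. This is a textbook-style simplification, not Intel's implementation: the numeric encoding of the 2-bit state, the event names, and the `others_have_copy` parameter are assumptions made for illustration, and bus transactions and write-backs are not modeled.

```python
from enum import Enum


class Mesi(Enum):
    # Four states fit in 2 bits; these particular values are arbitrary.
    MODIFIED = 0
    EXCLUSIVE = 1
    SHARED = 2
    INVALID = 3


def next_state(state, event, others_have_copy=False):
    """Return the new state of one cache line after an event.

    event is one of the hypothetical names "local_read", "local_write",
    "snoop_read" (another cache reads the line) or "snoop_write"
    (another cache wants to write the line).
    """
    if event == "local_read":
        if state is Mesi.INVALID:
            # A miss fills the line: Shared if another cache holds it,
            # otherwise Exclusive.
            return Mesi.SHARED if others_have_copy else Mesi.EXCLUSIVE
        return state  # read hits do not change the state
    if event == "local_write":
        # Writing makes the line dirty and the only valid copy.
        return Mesi.MODIFIED
    if event == "snoop_read":
        if state in (Mesi.MODIFIED, Mesi.EXCLUSIVE):
            # Another cache now holds the line too (a Modified line
            # would be written back first in a real system).
            return Mesi.SHARED
        return state
    if event == "snoop_write":
        # Another cache takes ownership, so this copy becomes stale.
        return Mesi.INVALID
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    line = Mesi.INVALID
    for event in ("local_read", "local_write", "snoop_read", "snoop_write"):
        line = next_state(line, event)
        print(event, "->", line.name)
```

Walking one line through a read miss, a write, a remote read, and a remote write visits Exclusive, Modified, Shared, and Invalid in turn.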
