<<

History of Performance

  )NTELª8EON ªª'(Z  BITª)NTELª8EON ªª'(Z  !-$ª/PTERON ªª'(Z  )NTELª0ENTIUMª ª'(Z   !-$ª!THLON ªª'(Z )NTELª0ENTIUMª))) ªª'(Z  !LPHAª! ªª'(Z  !LPHAª ªª'(Z   !LPHAª ªª'(Z  !LPHAª ªª'(Z   !LPHAª ªª'(Z  !LPHAª! ªª'(Z   0OWER0#ª ª'(Z   !LPHAª ªª'(Z  (0ª0! 2)3# ªª'(Z  )"-ª23  YEAR

0ERFORMANCEªVS6!8  -)03ª-ª  -)03ª-   3UN   6!8ª 

6!8  YEAR  6!8                 

&)'52%ää 'ROWTHäINäPROCESSORäPERFORMANCEäSINCEäTHEäMID S 4HISäCHARTäPLOTSäPERFORMANCEäRELATIVEäTOäTHEä6!8ää ASäMEASUREDäBYäTHEä30%#INTäBENCHMARKSäSEEä3ECTIONä ä0RIORäTOäTHEäMID S äPROCESSORäPERFORMANCEäGROWTHäWASäLARGELYäTECHNOLOGY DRIVENäANDäAVERAGEDäABOUTääPERäYEARä4HEäINCREASEäINäGROWTHäTOäABOUTääSINCEäTHENäISäATTRIBUTABLEäTOäMOREäADVANCEDäARCHITECTURALäANDä ORGANIZATIONALäIDEASä"Yä  äTHISäGROWTHäLEDäTOä AäDIFFERENCEäINäPERFORMANCEäOFäABOUTäAäFACTORäOFäSEVENä0ERFORMANCEäFORämäOATING POINT ORIENTEDäCALCULATIONSäHASäINCREASEDäEVENäFASTERä3INCEä äTHEäLIMITSäOFäPOWER äAVAILABLEäINSTRUCTION LEVELäPARALLELISM äANDäLONGäMEMORYäLATENCYä HAVEäSLOWEDäUNIPROCESSORäPERFORMANCEäRECENTLY äTOäABOUTääPERäYEARä#OPYRIGHTäÚää%LSEVIER ä)NCä!LLäRIGHTSäRESERVED

1 History of Processor Performance

  )NTELª8EON ªª'(Z  BITª)NTELª8EON ªª'(Z  !-$ª/PTERON ªª'(Z  )NTELª0ENTIUMª ª'(Z   CSEE 3827 !-$ª!THLON ªª'(Z )NTELª0ENTIUMª))) ªª'(Z  !LPHAª! ªª'(Z  !LPHAª ªª'(Z   !LPHAª ªª'(Z  !LPHAª ªª'(Z   !LPHAª ªª'(Z  !LPHAª! ªª'(Z   0OWER0#ª ª'(Z   !LPHAª ªª'(Z  (0ª0! 2)3# ªª'(Z  )"-ª23  YEAR

0ERFORMANCEªVS6!8  -)03ª-ª  -)03ª-   3UN   6!8ª 

6!8  YEAR  6!8                 

&)'52%ää 'ROWTHäINäPROCESSORäPERFORMANCEäSINCEäTHEäMID S 4HISäCHARTäPLOTSäPERFORMANCEäRELATIVEäTOäTHEä6!8ää ASäMEASUREDäBYäTHEä30%#INTäBENCHMARKSäSEEä3ECTIONä ä0RIORäTOäTHEäMID S äPROCESSORäPERFORMANCEäGROWTHäWASäLARGELYäTECHNOLOGY DRIVENäANDäAVERAGEDäABOUTääPERäYEARä4HEäINCREASEäINäGROWTHäTOäABOUTääSINCEäTHENäISäATTRIBUTABLEäTOäMOREäADVANCEDäARCHITECTURALäANDä ORGANIZATIONALäIDEASä"Yä  äTHISäGROWTHäLEDäTOä AäDIFFERENCEäINäPERFORMANCEäOFäABOUTäAäFACTORäOFäSEVENä0ERFORMANCEäFORämäOATING POINT ORIENTEDäCALCULATIONSäHASäINCREASEDäEVENäFASTERä3INCEä äTHEäLIMITSäOFäPOWER äAVAILABLEäINSTRUCTION LEVELäPARALLELISM äANDäLONGäMEMORYäLATENCYä HAVEäSLOWEDäUNIPROCESSORäPERFORMANCEäRECENTLY äTOäABOUTääPERäYEARä#OPYRIGHTäÚää%LSEVIER ä)NCä!LLäRIGHTSäRESERVED

2 History of Processor Performance

  )NTELª8EON ªª'(Z  BITª)NTELª8EON ªª'(Z  !-$ª/PTERON ªª'(Z  )NTELª0ENTIUMª ª'(Z   !-$ª!THLON ªª'(Z )NTELª0ENTIUMª))) ªª'(Z  !LPHAª! ªª'(Z  !LPHAª ªª'(Z   !LPHAª ªª'(Z  !LPHAª ªª'(Z   !LPHAª ªª'(Z  !LPHAª! ªª'(Z   0OWER0#ª ª'(Z   !LPHAª ªª'(Z  (0ª0! 2)3# ªª'(Z  )"-ª23  YEAR

0ERFORMANCEªVS6!8  -)03ª-ª  -)03ª-   3UN   6!8ª 

6!8  COMS 4824 YEAR  6!8                 

&)'52%ää 'ROWTHäINäPROCESSORäPERFORMANCEäSINCEäTHEäMID S 4HISäCHARTäPLOTSäPERFORMANCEäRELATIVEäTOäTHEä6!8ää ASäMEASUREDäBYäTHEä30%#INTäBENCHMARKSäSEEä3ECTIONä ä0RIORäTOäTHEäMID S äPROCESSORäPERFORMANCEäGROWTHäWASäLARGELYäTECHNOLOGY DRIVENäANDäAVERAGEDäABOUTääPERäYEARä4HEäINCREASEäINäGROWTHäTOäABOUTääSINCEäTHENäISäATTRIBUTABLEäTOäMOREäADVANCEDäARCHITECTURALäANDä ORGANIZATIONALäIDEASä"Yä  äTHISäGROWTHäLEDäTOä AäDIFFERENCEäINäPERFORMANCEäOFäABOUTäAäFACTORäOFäSEVENä0ERFORMANCEäFORämäOATING POINT ORIENTEDäCALCULATIONSäHASäINCREASEDäEVENäFASTERä3INCEä äTHEäLIMITSäOFäPOWER äAVAILABLEäINSTRUCTION LEVELäPARALLELISM äANDäLONGäMEMORYäLATENCYä HAVEäSLOWEDäUNIPROCESSORäPERFORMANCEäRECENTLY äTOäABOUTääPERäYEARä#OPYRIGHTäÚää%LSEVIER ä)NCä!LLäRIGHTSäRESERVED

3 Abstract Stages of Execution

Instruction Fetch (Instructions fetched from memory into CPU)

Instruction Issue / Execution (Instructions executed on ALU or other functional unit)

Instruction Completion / Commit (Architectural state updated, i.e., regfile or memory)

4 Multiple Instruction Issue Processors

Multiple instructions fetched, executed, and committed in each cycle

F: In superscalar processors instructions E: are scheduled by the HW C:

In VLIW processors instructions are F: E: scheduled by the SW C:

In all cases, the goal is to exploit instruction-level parallelism (ILP)

5 Flynn’s Taxonomy

Single Instruction, Single Data (SISD) Single Instruction, Multiple Data (SIMD)

Multiple Instruction, Single Data (MISD) Multiple Instruction, Multiple Data (MIMD) Exploits instr-level parallelism (ILP)

Exploits data-level parallelism (DLP)

6 Out-of-order execution

In in-order execution, In out-of-order execution (OOO), instructions are fetched, instructions are fetched, and executed, and committed in committed in compiler, order; may be compiler order executed in some other order

F: F: One stalls, independent One stalls, instrs may they all stall proceed E: E:

Relatively Additional simple hardware HW required for C: C: reordering

Another way to exploit instruction-level parallelism (ILP)

7 Speculation

Speculation is executing an instruction before it is known that it should be executed Speculate If correct If incorrect

F: F: F:

Mis- E: E: speculation executes E: excess instructions, C: C: costing time and power

C:

8 The Memory Wall Performance CPU speeds increase approx 25-20% annually DRAM speeds increase approx 2-11% annually

Time A result of this gap is that design has increased in importance over the years. This has resulted in innovations such as victim caches and trace caches.

9 Modern Processor Performance

While single threaded performance has leveled, multithreaded performance potential scaling.   )NTELª8EON ªª'(Z  BITª)NTELª8EON ªª'(Z  !-$ª/PTERON ªª'(Z  )NTELª0ENTIUMª ª'(Z   !-$ª!THLON ªª'(Z )NTELª0ENTIUMª))) ªª'(Z  !LPHAª! ªª'(Z  !LPHAª ªª'(Z   !LPHAª ªª'(Z  !LPHAª ªª'(Z   !LPHAª ªª'(Z  !LPHAª! ªª'(Z   0OWER0#ª ª'(Z   !LPHAª ªª'(Z  (0ª0! 2)3# ªª'(Z  )"-ª23  YEAR COMS 6824 0ERFORMANCEªVS6!8  -)03ª-ª  -)03ª-   + 3UN   6!8ª  COMS 4130 6!8  YEAR  6!8  (Parallel Programming)                

&)'52%ää 'ROWTHäINäPROCESSORäPERFORMANCEäSINCEäTHEäMID S 4HISäCHARTäPLOTSäPERFORMANCEäRELATIVEäTOäTHEä6!8ää ASäMEASUREDäBYäTHEä30%#INTäBENCHMARKSäSEEä3ECTIONä ä0RIORäTOäTHEäMID S äPROCESSORäPERFORMANCEäGROWTHäWASäLARGELYäTECHNOLOGY DRIVENäANDäAVERAGEDäABOUTääPERäYEARä4HEäINCREASEäINäGROWTHäTOäABOUTääSINCEäTHENäISäATTRIBUTABLEäTOäMOREäADVANCEDäARCHITECTURALäANDä ORGANIZATIONALäIDEASä"Yä  äTHISäGROWTHäLEDäTOä AäDIFFERENCEäINäPERFORMANCEäOFäABOUTäAäFACTORäOFäSEVENä0ERFORMANCEäFORämäOATING POINT ORIENTEDäCALCULATIONSäHASäINCREASEDäEVENäFASTERä3INCEä äTHEäLIMITSäOFäPOWER äAVAILABLEäINSTRUCTION LEVELäPARALLELISM äANDäLONGäMEMORYäLATENCYä HAVEäSLOWEDäUNIPROCESSORäPERFORMANCEäRECENTLY äTOäABOUTääPERäYEARä#OPYRIGHTäÚää%LSEVIER ä)NCä!LLäRIGHTSäRESERVED

10   )NTELª8EON ªª'(Z  BITª)NTELª8EON ªª'(Z  !-$ª/PTERON ªª'(Z  )NTELª0ENTIUMª ª'(Z   5 !-$ª!THLON ªª'(Z Area Performance Power MIPS/Watt (%) )NTELª0ENTIUMª))) ªª'(Z  4 !LPHAª! ªª'(Z  3 !LPHAª ªª'(Z   2 !LPHAª ªª'(Z  

Increase !LPHAª ªª'(Z 1  !LPHAª ªª'(Z 0  Pipelined Superscalar OOO-Speculation!LPHAª! ªª'(ZDeep Pipelined -1   0OWER0#ª ª'(Z   !LPHAª ªª'(Z  (0ª0! 2)3# ªª'(Z  )"-ª23  YEAR

0ERFORMANCEªVS6!8  -)03ª-ª  -)03ª-   3UN   6!8ª 

6!8  YEAR  6!8                 

&)'52%ää 'ROWTHäINäPROCESSORäPERFORMANCEäSINCEäTHEäMID S 4HISäCHARTäPLOTSäPERFORMANCEäRELATIVEäTOäTHEä6!8ää ASäMEASUREDäBYäTHEä30%#INTäBENCHMARKSäSEEä3ECTIONä ä0RIORäTOäTHEäMID S äPROCESSORäPERFORMANCEäGROWTHäWASäLARGELYäTECHNOLOGY DRIVENäANDäAVERAGEDäABOUTääPERäYEARä4HEäINCREASEäINäGROWTHäTOäABOUTääSINCEäTHENäISäATTRIBUTABLEäTOäMOREäADVANCEDäARCHITECTURALäANDä ORGANIZATIONALäIDEASä"Yä  äTHISäGROWTHäLEDäTOä AäDIFFERENCEäINäPERFORMANCEäOFäABOUTäAäFACTORäOFäSEVENä0ERFORMANCEäFORämäOATING POINT ORIENTEDäCALCULATIONSäHASäINCREASEDäEVENäFASTERä3INCEä äTHEäLIMITSäOFäPOWER äAVAILABLEäINSTRUCTION LEVELäPARALLELISM äANDäLONGäMEMORYäLATENCYä HAVEäSLOWEDäUNIPROCESSORäPERFORMANCEäRECENTLY äTOäABOUTääPERäYEARä#OPYRIGHTäÚää%LSEVIER ä)NCä!LLäRIGHTSäRESERVED

Source: Hennessy and Patterson, “ Architecture: A Quantitative Approach”   )NTELª8EON ªª'(Z  BITª)NTELª8EON ªª'(Z  !-$ª/PTERON ªª'(Z  )NTELª0ENTIUMª ª'(Z   5 !-$ª!THLON ªª'(Z Area Performance Power MIPS/Watt (%) )NTELª0ENTIUMª))) ªª'(Z  4 !LPHAª! ªª'(Z  3 !LPHAª ªª'(Z   2 !LPHAª ªª'(Z  

Increase !LPHAª ªª'(Z 1  !LPHAª ªª'(Z 0  Pipelined Superscalar OOO-Speculation!LPHAª! ªª'(ZDeep Pipelined -1   0OWER0#ª ª'(Z   !LPHAª ªª'(Z  (0ª0! 2)3# ªª'(Z  )"-ª23  YEAR

0ERFORMANCEªVS6!8  -)03ª-ª  -)03ª-   3UN   6!8ª 

6!8  YEAR  6!8                 

&)'52%ää 'ROWTHäINäPROCESSORäPERFORMANCEäSINCEäTHEäMID S 4HISäCHARTäPLOTSäPERFORMANCEäRELATIVEäTOäTHEä6!8ää ASäMEASUREDäBYäTHEä30%#INTäBENCHMARKSäSEEä3ECTIONä ä0RIORäTOäTHEäMID S äPROCESSORäPERFORMANCEäGROWTHäWASäLARGELYäTECHNOLOGY DRIVENäANDäAVERAGEDäABOUTääPERäYEARä4HEäINCREASEäINäGROWTHäTOäABOUTääSINCEäTHENäISäATTRIBUTABLEäTOäMOREäADVANCEDäARCHITECTURALäANDä ORGANIZATIONALäIDEASä"Yä  äTHISäGROWTHäLEDäTOä AäDIFFERENCEäINäPERFORMANCEäOFäABOUTäAäFACTORäOFäSEVENä0ERFORMANCEäFORämäOATING POINT ORIENTEDäCALCULATIONSäHASäINCREASEDäEVENäFASTERä3INCEä äTHEäLIMITSäOFäPOWER äAVAILABLEäINSTRUCTION LEVELäPARALLELISM äANDäLONGäMEMORYäLATENCYä HAVEäSLOWEDäUNIPROCESSORäPERFORMANCEäRECENTLY äTOäABOUTääPERäYEARä#OPYRIGHTäÚää%LSEVIER ä)NCä!LLäRIGHTSäRESERVED

Source: Hennessy and Patterson, “: A Quantitative Approach” The Power Wall

         #LOCK2ATE          0OWER  0OWER7ATTS

#LOCK2ATE-(Z                     0ENTIUM  #ORE  0RESCOTT 0ENTIUM +ENTSFIELD 0ENTIUM 0ENTIUM 7ILLAMETTE 0RO

1, Ê£°£xÊÊÊÊ œŽÊÀ>ÌiÊ>˜`Ê*œÜiÀÊvœÀʘÌiÊÝnÈʓˆVÀœ«ÀœViÃÜÀÃʜÛiÀÊiˆ} ÌÊ}i˜iÀ>̈œ˜ÃÊ >˜`ÊÓxÊÞi>ÀðÊ4HE0ENTIUMMADEADRAMATICJUMPINCLOCKRATEANDPOWERBUTLESSSOINPERFORMANCE 4HE0RESCOTTTHERMALPROBLEMSLEDTOTHEABANDONMENTOFTHE0ENTIUMLINE4HE#ORELINEREVERTSTOA SIMPLERPIPELINEWITHLOWERCLOCKRATESANDMULTIPLEPROCESSORSPERCHIP#OPYRIGHTÚ%LSEVIER )NC!LL RIGHTSRESERVED

12 Much of it goes back to the

Source: press foils Much of it goes back to the transistor

individual atoms!

Source: Intel press foils Much of it goes back to the transistor

individual atoms! = leakage current + defects Source: Intel press foils A model of power

P = Pswitch + Pleakage 2 Pswitch = Eswitch x F = (C x Vdd ) x F Pleakage = Vdd x I Voltage Scaling: DVFS + Near-Threshold Computing

[Source: Dreslinski et al.: Near-Threshold Computing: Recplaiming Moore’s Law Through Energy Efficient Integrated Circuits] Voltage Scaling: DVFS + Near-Threshold Computing

[Source: Dreslinski et al.: Near-Threshold Computing: Recplaiming Moore’s Law Through Energy Efficient Integrated Circuits] Chip Area and Power Consumption

1500 Active Power Leakage Power 1000 With leakage power dominating, power envelope to remain constant 500 power consumption roughly proportional to

0 Power Density (Watts/cm2) 90nm 65nm 45nm 32nm 22nm 16nm

Source: Shekhar Borkar (Intel)

1000

100 Pollack’s Law: Processor performance grows 10 with sqrt of area Integer Performance

1 1 10 100 1000 10000 100000 Processor Area Source: Shekhar Borkar (Intel) The Resulting Shift to Multicore

Perf = 1 Power = 1 The Resulting Shift to Multicore

Perf = 1 Perf = 2 Power = 1 Power = 4 The Resulting Shift to Multicore

Perf = 1 Perf = 2 Perf = 4 Power = 1 Power = 4 Power = 4 Sea Change in Architecture: Multicore

(4ª0(9 ªLINKª 3LOWª)/ &USES

 BITª&05

,OAD ,ª$ATA K" -" 3TORE #ACHE , #OREª 3HARED , #ACHE , %XECUTION #TL #ACHE &ETCH (4ª0(9 ªLINKª $ECODE ,ª)NSTR "RANCH #ACHE $ $ 2 .ORTHBRIDGE 0 ( 9

#OREª #OREª (4ª0(9 ªLINKª

(4ª0(9 ªLINKª 3LOWª)/ &USES

&)'52%ää )NSIDEäTHEä!-$ä"ARCELONAäMICROPROCESSORä4HEäLEFT HANDäSIDEäISäAäMICROPHOTOGRAPHäOFäTHEä!-$ä"ARCELONAäPROCESSORä CHIP äANDäTHEäRIGHT HANDäSIDEäSHOWSäTHEäMAJORäBLOCKSäINäTHEäPROCESSORä4HISäCHIPäHASäFOURäPROCESSORSäORähCORESv ä4HEäMICROPROCESSORäINäTHEä LAPTOPäINä&IGUREääHASäTWOäCORESäPERäCHIP äCALLEDäANä)NTELä#OREää$UOä#OPYRIGHTäÚää%LSEVIER ä)NCä!LLäRIGHTSäRESERVED

18 64- Architecture Evolution

2003 2005 2007 2008 2009 2010

AMD AMD “Barcelona” “Shanghai” “Istanbul” “Magny-Cours” ™ Opteron™

Mfg. 90nm SOI 90nm SOI 65nm SOI 45nm SOI 45nm SOI 45nm SOI K8 K8 Greyhound Greyhound+ Greyhound+ Greyhound+ CPU Core

L2/L3 1MB/0 1MB/0 512kB/2MB 512kB/6MB 512kB/6MB 512kB/12MB

Hyper Transport™ 3x 1.6GT/.s 3x 1.6GT/.s 3x 2GT/s 3x 4.0GT/s 3x 4.8GT/s 4x 6.4GT/s Technology

Memory 2x DDR1 300 2x DDR1 400 2x DDR2 667 2x DDR2 800 2x DDR2 1066 4x DDR3 1333

Max Power Budget Remains Consistent

3 | The AMD “Magny-Cours” Processor | Hot Chips | August 2009

[Source: HotChips ‘09] 19 Tick%Tock'Development'Model

Sandy Merom Penryn Nehalem Westmere Bridge

NEW NEW NEW NEW NEW 1 Process Microarchitecture Process Microarchitecture 65nm 45nm 45nm 32nm 32nm

TOCK TICK TOCK TICK TOCK

Forecast

Nehalem-EX Architecture All products, dates, and figures are preliminary and are subject to change without notice. 6 Hot Chips 2009

[Source: HotChips ‘09] 20 Nehalem'Core/Uncore'Modularity

C C C O O O Core R R R E E … E

L3 Cache DRAM Uncore Power … IM QPI … QPI & C Clock QPI … Common%core%for%client%and%%CPUs – http://www.intel.com/technology/architecture

Nehalem-EX Architecture

7 Hot Chips 2009

[Source: HotChips ‘09] 21 Nehalem'EX*CPU

Monolithic)single)die)CPU 8)Nehalem)cores,)16)threads 3MB*LLC 3MB LLC 24MB)shared)L3)cache 3MB*LLC 3MB*LLC 2)integrated)memory)controllers Scalable)Memory)Interconnect)(SMI))with) support)for)up)to)8)DDR)channels) 3MB*LLC 3MB*LLC 4)Quick)Path)Interconnect)(QPI))links)with)up)to)) 6.4GT/s 3MB LLC 3MB*LLC Supports)2,)4)and)8)socket)in)glueless)configs) and)larger)systems)using)Node)Controller)(NC) Intel)45nm)process)technology 2.3)Billion)

Nehalem-EX Architecture

8 Hot Chips 2009

[Source: HotChips ‘09] 22 POWER7: IBM’s Next Generation, Balanced POWER Server Chip

45nm

20+ Years of POWER Processors 65nm Next Gen.

POWER7 RS64IV Sstar 130nm -Multi-core POWER6TM RS64III Pulsar 180nm -Ultra High Frequency .18um RS64II North Star .25um POWER5TM -SMT RS64I Apache .35um BiCMOS .5um POWER4TM Major POWER® Innovation .5um Muskie A35 .22um -Dual Core -1990 RISC Architecture .5um -1994 SMP -Cobra A10 -1995 Out of Order Execution -64 bit -1996 64 Bit Enterprise Architecture POWER3TM -1997 Hardware Multi-Threading .35um -630 -2001 Dual Core Processors -2001 Large System Scaling -2001 Shared Caches .72um -2003 On Chip Memory Control POWER2TM P2SC -2003 SMT .25um -2006 Ultra High Frequency RSC .35um -2006 Dual Scope Coherence Mgmt 1.0um -2006 Decimal Float/VSX .6um -2006 Processor Recovery/Sparing 604e -2009 Balanced Multi-core Processor -603 POWER1 -2009 On Chip EDRAM -AMERICA’s -601 1990 1995 2000 2005 2010

3 * Dates represent approximate processor power-on dates, not system availability [Source: HotChips ‘09] 23 POWER7: IBM’s Next Generation, Balanced POWER Server Chip

POWER7 Processor Chip

567mm2 Technology: 45nm lithography, Cu, SOI, eDRAM 1.2B transistors Equivalent function of 2.7B eDRAM efficiency Eight processor cores 12 execution units per core 4 Way SMT per core 32 Threads per chip 256KB L2 per core 32MB on chip eDRAM shared L3 Dual DDR3 Memory Controllers 100GB/s Memory per chip sustained Scalability up to 32 Sockets 360GB/s SMP bandwidth/chip 20,000 coherent operations in flight Advanced pre-fetching Data and Instruction Binary Compatibility with POWER6

* Statements regarding SMP servers do not imply that IBM will introduce a system with this capability. 4 [Source: HotChips ‘09] 24 Challenges in Multicore

1. Performance dependent on parallel codes ! 2. 3. Communication and (“feeding the beast”) coherence ! ! ! ! ! ! ! ! ! ! ! ! ! ! !

[Source: Sandia NL]

25 Technology: Logic gates Application domains: SRAM PCs DRAM Servers Circuit technologies PDAs Packaging Mobile phones Magnetic storage Game consoles Biochips Embedded 3D stacking

Goals: Functional Performance Reliability Cost Energy efficiency Time to market [Credit: Milo Martin, UPenn] Technology: Logic gates Application domains: SRAM PCs DRAM Servers Circuit technologies Computer PDAs Packaging architecture Mobile phones Magnetic storage Supercomputers Flash memory is at the Game consoles Biochips intersection Embedded 3D stacking

Goals: Functional Performance Reliability Cost Energy efficiency Time to market [Credit: Milo Martin, UPenn]