Vincent Coffey, Mike Albert
Overview
Origins of the Performance Gap
The separation of CPU and Memory performance
Reasons for CPU performance increases
Reasons memory has lagged behind
Measures already taken to close the gap (caches)
How to counteract the problem
Industry Origins of the Gap
● Memory development focused on cost and capacity
  ○ Driven by the early scarcity of storage space
  ○ Visible in early game development, where space was at a premium
● CPU development focused on performance
  ○ Single-core performance ran into limits
  ○ Led to multi-core architectures
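The move to multi-core only pays off when software can use the extra cores. Amdahl's law is not named in the slides and is added here as an illustration. It bounds the speedup of a program when a fraction p of its work parallelizes across n cores: speedup = 1 / ((1 - p) + p / n). The following is a minimal Python sketch, and the workload fractions in it are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the work
    can run in parallel (Amdahl's law); the serial part runs at 1x."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: 50%, 90%, and 99% parallelizable.
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (2, 4, 8, 64))
        print(f"p={p}: {row}")
```

Even with 64 cores, a workload that is 90% parallel gets less than a 9x speedup. The serial 10% caps the speedup at 1 / (1 - p) = 10x no matter how many cores are added.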
Limiting Factors
Bandwidth
● Buses limit the amount of data that can be transmitted per unit of time
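Peak bus bandwidth is roughly the bus width times the transfer rate. The sketch below assumes a simple peak-rate model that ignores protocol overhead and refresh. The two example buses are illustrative assumptions rather than figures from the slides. Their values resemble a 64-bit DDR4-3200 channel and a 1024-bit first-generation HBM stack.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_gt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s:
    width (bits) x transfers per second (GT/s) / 8 bits per byte."""
    return bus_width_bits * transfer_rate_gt_s / 8


# Hypothetical buses: a narrow, fast bus vs. a wide, slower bus.
narrow = peak_bandwidth_gb_s(64, 3.2)   # 64-bit bus at 3.2 GT/s -> 25.6 GB/s
wide = peak_bandwidth_gb_s(1024, 1.0)   # 1024-bit bus at 1.0 GT/s -> 128.0 GB/s
print(f"narrow: {narrow:.1f} GB/s, wide: {wide:.1f} GB/s")
```

A wide, slow interface can outrun a narrow, fast one. That is the idea behind stacking memory next to the processor, where many more wires fit.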
Latency/Distance (proximity)
● Space on chip is at a premium
  ○ Faster memory takes more space (SRAM vs. DRAM)
● Growing demand for memory capacity
Attempts to Address the Gap - Reduction
HBM (High Bandwidth Memory)
● Trades higher latency for higher bandwidth and lower power consumption
● Developed by AMD with SK Hynix; adopted on high-end video cards
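A simple model shows when trading latency for bandwidth pays off. In it, each transfer costs a fixed access latency plus its size divided by bandwidth. This is a minimal sketch. The latency and bandwidth figures for both memories are hypothetical, chosen only to show the crossover, and are not measured HBM or DDR values.

```python
def transfer_time_ns(size_bytes: float, latency_ns: float, bandwidth_gb_s: float) -> float:
    """Simple model: fixed access latency plus streaming time.
    Note that 1 GB/s == 1 byte/ns, so size / bandwidth is already in ns."""
    return latency_ns + size_bytes / bandwidth_gb_s


def crossover_bytes(lat_a: float, bw_a: float, lat_b: float, bw_b: float) -> float:
    """Transfer size at which memory B (higher latency, higher bandwidth)
    becomes as fast as memory A."""
    if bw_b <= bw_a:
        raise ValueError("memory B must have higher bandwidth than memory A")
    return (lat_b - lat_a) / (1 / bw_a - 1 / bw_b)


# Hypothetical parameters (latency in ns, bandwidth in GB/s), not measured values.
LOW_LAT = (50.0, 25.6)
HIGH_BW = (100.0, 128.0)

for size in (64, 1600, 65536):
    a = transfer_time_ns(size, *LOW_LAT)
    b = transfer_time_ns(size, *HIGH_BW)
    print(f"{size:>6} B: low-latency {a:8.1f} ns, high-bandwidth {b:8.1f} ns")
print(f"crossover ~ {crossover_bytes(*LOW_LAT, *HIGH_BW):.0f} B")
```

Small, latency-bound accesses favor the lower-latency memory. Large streaming transfers favor the wider interface, which fits the slide's point that HBM was adopted on high-end video cards.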
HMC (Hybrid Memory Cube)
● Similar to HBM, but slower to evolve and not adopted in the mainstream
Attempts to Address the Gap - Hiding
Faster Cache
● Z-RAM (Zero-Capacitor RAM)
● T-RAM (Thyristor RAM)
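Caches hide memory latency. The standard average memory access time (AMAT) formula shows why faster or denser cache cells help: AMAT = hit time + miss rate × miss penalty. A faster cell lowers the hit time. A denser cell fits a larger cache in the same die area, which usually lowers the miss rate. This is a minimal sketch in which all timings and miss rates are hypothetical. The halved miss rate for the larger cache is an assumption, not a measured property of Z-RAM or T-RAM.

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical cache in front of DRAM; all numbers are illustrative.
DRAM_PENALTY_NS = 80.0
baseline = amat_ns(hit_time_ns=4.0, miss_rate=0.10, miss_penalty_ns=DRAM_PENALTY_NS)
# Assumption: a denser cell allows a larger cache in the same area,
# modeled here as halving the miss rate at the same hit time.
larger_cache = amat_ns(hit_time_ns=4.0, miss_rate=0.05, miss_penalty_ns=DRAM_PENALTY_NS)
print(f"baseline {baseline:.1f} ns, larger cache {larger_cache:.1f} ns")
```

Because the miss penalty to DRAM dominates, cutting misses often helps more than shaving the hit time.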
Faster Non-volatile/Permanent Memory
● 3D XPoint
● NVRAM
  ○ MRAM (Magnetoresistive RAM)
  ○ FeRAM (Ferroelectric RAM)
Zero-Capacitor RAM
● Developed by Innovative Silicon
● Smaller cell size
  ○ 5x cell density
● Uses the floating-body effect of SOI (silicon-on-insulator) transistors
  ○ Well suited to chips built on SOI processes
  ○ 1.8 ns cell read time
Thyristor RAM
Uses a thyristor-based cell in place of the 6-transistor (6T) SRAM cell
● Developed by T-RAM Semiconductor
● Read speed < 1.7 ns, write speed < 2.0 ns
  ○ At 170 nm
● 4x cell density
3D XPoint Memory
Fills the gap between NAND permanent storage and main memory (RAM)
● 3D XPoint
  ○ 95k IOPS
  ○ 9 µs latency
● Flash
  ○ 13.4k IOPS
  ○ 73 µs latency
MRAM and FeRAM
● MRAM
  ○ In development since the 1990s
  ○ Suffers from low density
  ○ Access times similar to SRAM
  ○ Density similar to DRAM
● FeRAM
  ○ In production
  ○ Integrated into the TI MSP430 microcontroller
    ■ Replaces EEPROM and Flash
    ■ 126 µA/MHz power consumption vs. 200 µA/MHz with Flash and EEPROM
If the Gap Widens...
Is it worth improving CPU performance?
Bottlenecked CPU performance
Need for highly parallelizable programs
Diminishing single-core performance gains
Summary
The different needs for CPU and Memory
CPU performance bottlenecking
Questions?
Sources
● Dr. John C. McCallum: http://www.jcmit.com
● https://www.amd.com/Documents/High-Bandwidth-Memory-HBM.pdf
● http://www.cs.columbia.edu/~sedwards/classes/2012/3827-spring/advanced-arch-2011.pdf
● http://techreport.com/review/28751/intel-core-i7-6700k-skylake-processor-reviewed/4
● http://www.hotchips.org/wp-content/uploads/hc_archives/hc18/2_Mon/HC18.S3/HC18.S3T1.pdf
● EETimes: http://img.deusm.com/eetimes/2014/02/1320947/snia-nvdimm-carousel.jpg, http://www.eetimes.com/document.asp?doc_id=1328682
● Trolomite, Wikipedia (3D XPoint image)
● http://www.ti.com/lsds/ti/microcontrollers_16-bit_32-bit/msp/ultra-low_power/msp430frxx_fram/overview.page
● http://www.digitimes.com/bits_chips/a20060328PR202.html
● http://www.hotchips.org/wp-content/uploads/hc_archives/hc19/3_Tues/HC19.05/HC19.05.02.pdf
● http://www.dailytech.com/Hynix+Licenses+ISi+ZRAM+Technology+for+Future+DRAM+Chips/article8395.htm
● https://www.micron.com/
● FRAM image: Cyferz at English Wikipedia
● https://www.everspin.com/parallel-interface-mram