Vincent Coffey, Mike Albert

Overview

Origins of the Performance Gap

The separation of CPU and Memory performance

The reason for CPU performance increases

The reason for Memory lagging behind

Measures already taken to close the gap

How to counteract the problem

Industry Origins of the Gap

● Memory focus on Cost/Size
  ○ Caused by early lack of storage space
  ○ Can be seen in early game development, space was at a premium
● CPU focus on performance
  ○ Limitations on single core performance
  ○ Multi core architecture

Limiting Factors

Bandwidth

● Buses limit the amount of data that can be transmitted
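As a rough illustration of the bus limit, theoretical peak bandwidth is the bus width in bytes multiplied by the transfer rate. The sketch below assumes two hypothetical bus configurations (the widths and transfer rates are illustrative, not figures from these slides) and ignores protocol overhead, so real throughput would be lower.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# Hypothetical narrow bus: 64 bits wide at 2400 million transfers per second.
narrow = peak_bandwidth_gb_s(64, 2400e6)

# Hypothetical wide bus: 1024 bits wide at 1000 million transfers per second.
wide = peak_bandwidth_gb_s(1024, 1000e6)

print(f"64-bit bus:   {narrow:.1f} GB/s")   # 19.2 GB/s
print(f"1024-bit bus: {wide:.1f} GB/s")     # 128.0 GB/s
```

The wider bus wins even at a lower transfer rate, which is the trade a wide, slower interface makes.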

Latency/Distance (proximity)

● Space on chip is at a premium
  ○ Faster memory takes more space (SRAM vs DRAM)
● Need for more memory storage

Attempts to Address Gap - Reduction

HBM

● Trades higher latency for better bandwidth and lower power consumption
● AMD developed, adopted on high-end video cards

HMC (Hybrid Memory Cube) - Similar to HBM, but slower to evolve and not adopted in the mainstream

Attempts to Address Gap - Hiding

Faster Cache

● Z-RAM - Zero-capacitor RAM
● T-RAM - Thyristor RAM
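One common way to reason about how a faster or denser cache hides memory latency is the average memory access time (AMAT): hit time plus miss rate times miss penalty. The sketch below uses hypothetical numbers chosen only for illustration. None of them are measurements of Z-RAM or T-RAM, and the idea that a denser cell would lower the miss rate (by fitting a larger cache in the same area) is an assumption.

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical baseline cache: 2 ns hit time, 5% miss rate, 100 ns main-memory penalty.
baseline = amat_ns(2.0, 0.05, 100.0)

# Hypothetical faster cache cell: hit time halves, miss rate unchanged.
faster_cell = amat_ns(1.0, 0.05, 100.0)

# Hypothetical denser cache cell: same hit time, larger cache so fewer misses.
denser_cell = amat_ns(2.0, 0.02, 100.0)

for name, value in [("baseline", baseline),
                    ("faster cell", faster_cell),
                    ("denser cell", denser_cell)]:
    print(f"{name:12s} AMAT = {value:.1f} ns")
```

With a large miss penalty, lowering the miss rate can matter more than shaving the hit time, which is why cell density is listed alongside raw speed.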

Faster Non-volatile/Permanent Memory

● 3D XPoint
● NVRAM
  ○ MRAM - Magnetoresistive RAM
  ○ FeRAM - Ferroelectric RAM

Zero Capacitor RAM

● Developed by Innovative Silicon
● Smaller cell size
  ○ 5x cell density
● Use of floating body effects of SOI
  ○ Ideal for SOI produced chips
  ○ 1.8 ns cell read time

Thyristor RAM

Use of a thyristor to replace the 6-transistor SRAM cell

● Developed by T-RAM Semiconductor
● Read speed <1.7 ns, write speed <2.0 ns
  ○ At 170 nm
● 4x cell density

3D XPoint Memory

Fills the gap between NAND permanent storage and main memory (RAM)

● 3D XPoint
  ○ 95k IOPS
  ○ 9 µs latency
● Flash
  ○ 13.4k IOPS
  ○ 73 µs latency

MRAM and FeRAM

● MRAM
  ○ In development since 1990s
  ○ Suffers from low density
  ○ Access times similar to SRAM
  ○ Density similar to DRAM
● FeRAM
  ○ In production
  ○ Integrated onto TI MSP430 chip
    ■ Replaces EEPROM and Flash
    ■ 126 µA/MHz power consumption vs 200 µA/MHz with Flash and EEPROM

If the Gap Widens...

Is it worth improving CPU performance?

Bottlenecked CPU performance

Need for highly parallelizable programs
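Amdahl's law is one standard way to see why extra cores only help highly parallelizable programs: the serial fraction caps the achievable speedup. The sketch below is illustrative only. The parallel fractions and core counts are hypothetical, and the model ignores memory contention, which would reduce the speedups further.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical programs that are 50%, 90%, and 99% parallelizable.
for p in (0.50, 0.90, 0.99):
    row = ", ".join(
        f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (2, 8, 64)
    )
    print(f"parallel fraction {p:.2f} -> {row}")
```

Under this model a program that is half serial can never exceed a 2x speedup, however many cores are added.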

Reduced single core performance increases

Summary

The different needs for CPU and Memory

CPU performance bottlenecking

Questions?
