Low Power DRAM Evolution
Osamu Nagashima, Executive Professional, Micron Memory Japan

JEDEC Mobile & IOT Forum

How We Got Here

• Low Power DRAM evolved from a lower-voltage, lower-performance version of PC-DRAM designed for mobile packages to become one of the highest bandwidth-per-pin DRAMs available
• High-resolution displays, high-resolution cameras, and 3D-rendered content are the primary drivers for increased bandwidth in mobile devices

Mainstream DRAM Datarate by Type and Year of Introduction

[Chart: mainstream DRAM data rate (Mbps, 0–4500) by year of introduction, 2006–2017, for the LPDDR and PC-DDR families]

Evolution of Mainstream DRAM Energy

[Chart: "Power Evolution" – energy (pJ/bit) vs. data rate (Mbps, 500–4500) for DDR2, DDR3, DDR4, LPDDR2, LPDDR3, and LPDDR4]

Typical Mobile Device Usage

• The percentage of active usage has greatly increased in recent years, driving an increase in memory bandwidth
• This has shifted limitations from standby battery life to active battery life and thermal limits

[Chart: DRAM energy breakdown for a heavy user vs. a light user – Read, Write, Activate, Standby, Refresh, and Self Refresh energy as a share of the total]

Near-Term Future

• This evolution of system limitations is driving future LPDRAM architectures, beginning with the evolution of the LPDDR4 standard
• Responding to the need for lower power, JEDEC is developing a reduced-I/O-power version of LPDDR4, called LPDDR4X
• LPDDR4X will reduce the Vddq level from 1.1 V to 0.6 V
• Signaling swing will remain similar to LPDDR4
  – This allows the same receiver designs and specifications to be used for both LPDDR4 and LPDDR4X

LPDDR4X: I/O Energy Reduction

• Reducing Vddq from 1.1 V to 0.6 V produces about 40% I/O energy savings (a rough scaling sketch follows below)
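As a rough sanity check of that figure: with the signaling swing held roughly constant, the energy drawn from Vddq by the driver and termination scales about linearly with Vddq. The short Python sketch below illustrates this; the swing, termination, activity factor, and data rate are assumed example values, not numbers from this presentation.

    # Rough sanity check of why dropping Vddq from 1.1 V to 0.6 V saves I/O energy.
    # Illustrative only -- the swing, termination, and activity values below are
    # assumptions, not numbers from the presentation.

    V_SWING  = 0.30   # assumed receiver swing, volts (kept similar for LP4 and LP4X)
    R_TERM   = 60.0   # assumed ODT value, ohms
    ACTIVITY = 0.5    # assumed fraction of bit times that draw termination current

    def channel_energy_pj(vddq: float, rate_mbps: float) -> float:
        """Driver + termination energy drawn from Vddq, in pJ per bit."""
        t_bit = 1.0 / (rate_mbps * 1e6)            # seconds per bit
        i_term = V_SWING / R_TERM                  # current while driving high, amps
        return vddq * i_term * t_bit * ACTIVITY * 1e12

    lp4, lp4x = channel_energy_pj(1.1, 3200), channel_energy_pj(0.6, 3200)
    print(f"LP4  channel energy ~{lp4:.2f} pJ/bit")
    print(f"LP4X channel energy ~{lp4x:.2f} pJ/bit")
    print(f"channel-only saving ~{100 * (1 - lp4x / lp4):.0f}%")
    # With the swing unchanged, this component scales linearly with Vddq (about
    # 45% lower).  The pre-driver, assumed here to run from the periphery rail
    # rather than Vddq, is unaffected, which pulls the total I/O saving down
    # toward the ~40% quoted on the slide.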

[Chart: "LPDDR4 vs. LPDDR4X I/O Energy, Including Pre-Driver" – I/O energy (pJ/bit, 0–4) vs. data rate (Mbps, 1000–4500) for LP4 and LP4X at 60 Ohm, 120 Ohm, and unterminated settings; LPDDR4X shows a ~40% decrease]

LPDDR4X I/O

• LPDDR4X reduces I/O channel energy to the point where it can sometimes be lower than pre-driver energy
• With LPDDR4X we have reached the point where further reductions in Vddq offer limited opportunity for energy savings
  – The DRAM Rx limits the minimum signal swing
  – Significant reductions in channel loading would be required to reduce pre-driver energy
  – The DRAM core energy is now much larger than the DRAM I/O energy

LPDDR4X: I/O Energy Breakdown

• Notice that at the most efficient operating points (toward the right end of each line), the pre-driver energy is comparable to the channel energy
  – Reducing Vddq below 0.6 V would have limited impact, and may even increase total I/O energy (see the breakdown sketch below)
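To see why further Vddq cuts run out of steam, the earlier toy model can be split into a Vddq-dependent channel term and a pre-driver term that switches capacitance from the periphery rail. Everything here (C_PREDRIVER, VDD_PERIPHERY, the 4266 Mbps operating point) is an assumed illustration, not measured data.

    # Illustrative breakdown of LPDDR4X I/O energy into a Vddq-dependent channel
    # term and a periphery-powered pre-driver term.  All constants are assumed
    # values used only to show the trend, not measurements.

    V_SWING, R_TERM, ACTIVITY = 0.30, 60.0, 0.5    # same assumptions as above
    C_PREDRIVER = 1.5e-12   # assumed pre-driver + driver-gate capacitance, farads
    VDD_PERIPHERY = 1.1     # assumed pre-driver supply, volts (not Vddq)

    def io_energy_pj(vddq: float, rate_mbps: float) -> tuple[float, float]:
        """Return (channel, pre_driver) energy in pJ/bit at the given data rate."""
        t_bit = 1.0 / (rate_mbps * 1e6)
        channel = vddq * (V_SWING / R_TERM) * t_bit * ACTIVITY * 1e12
        # Pre-driver energy is switched capacitance on the periphery rail,
        # so it does not shrink when Vddq is lowered.
        pre_driver = 0.5 * C_PREDRIVER * VDD_PERIPHERY**2 * ACTIVITY * 1e12
        return channel, pre_driver

    for vddq in (0.6, 0.5, 0.4):                   # hypothetical further Vddq cuts
        ch, pre = io_energy_pj(vddq, rate_mbps=4266)
        print(f"Vddq={vddq:.1f} V: channel {ch:.2f} + pre-driver {pre:.2f} "
              f"= {ch + pre:.2f} pJ/bit")
    # At 4266 Mbps the pre-driver term is already comparable to the channel term,
    # so pushing Vddq below 0.6 V shrinks only part of the total -- and the DRAM
    # receiver also limits how far the swing itself could be reduced.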

[Chart: "LPDDR4X I/O Energy, Channel+Termination vs. Pre-Driver" – I/O energy (pJ/bit, 0–1.5) vs. data rate (Mbps, 1000–4500) for channel + termination at 60 Ohm, 120 Ohm, and unterminated, and for the corresponding pre-drivers]

I/O vs. DRAM Core Energy

• LPDDR4X energy is dominated by the core
• Future energy reductions should focus on core efficiency in order to be significant

[Chart: "LPDDR4X Core Energy vs. I/O Energy" – energy (pJ/bit, 0–8) vs. data rate (Mbps, 1000–4500) for the core and for I/O at 48 Ohm, 60 Ohm, 120 Ohm, and unterminated settings]

The Future – All About Power Efficiency

• Power efficiency across a range of bandwidths is a more important attribute than peak bandwidth
  – And cost is still very important
• JEDEC is beginning to consider LPDDR5
  – Data rates of 6.4 Gbps or even higher are being considered
• Pushing DRAM performance to extreme speeds has consequences
  – Higher I/O speeds than LPDDR4 will reduce power efficiency at all speeds

Effect of Increasing Speed Capability

• 6.4 Gbps is approaching the limits of the DRAM process
  – Pre-driver fanouts must be reduced
  – As a consequence, deploying a DRAM I/O circuit capable of 6.4 Gbps will cause an increase in pre-driver power at all speeds
  – This will degrade the energy efficiency of the LP5 I/O compared to LPDDR4X at the lower speeds, where it matters most (a rough fanout sketch of this effect follows below)
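The fanout argument can be illustrated with a simple tapered-buffer model: sizing the pre-driver chain for a higher target speed means a smaller fanout per stage, hence more stages and more total switched capacitance, and that cost is paid per transition regardless of the data rate actually used. The capacitances, supply, and fanout values below are assumptions chosen only to show the trend.

    # Why a 6.4 Gbps-capable output stage costs energy even at low data rates.
    # Tapered-buffer model: each pre-driver stage is 1/f the size of the stage it
    # drives.  A faster target forces a smaller per-stage fanout f, which means
    # more (and larger) pre-driver stages -- more switched capacitance at every
    # data rate.  The capacitance, supply, and fanout values are assumptions.

    import math

    C_FINAL = 2.0e-12   # assumed input capacitance of the final output driver, F
    C_MIN   = 0.02e-12  # assumed minimum (first-stage) input capacitance, F
    VDD     = 1.1       # assumed pre-driver supply, volts

    def predriver_energy_pj(fanout: float) -> float:
        """Energy per output transition for a tapered pre-driver chain, pJ."""
        stages = math.ceil(math.log(C_FINAL / C_MIN) / math.log(fanout))
        # Total pre-driver capacitance: C_FINAL/f + C_FINAL/f^2 + ... (stages terms)
        c_total = sum(C_FINAL / fanout**k for k in range(1, stages + 1))
        return 0.5 * c_total * VDD**2 * 1e12

    for label, fanout in (("LP4X-class (relaxed fanout ~4)", 4.0),
                          ("6.4 Gbps-class (reduced fanout ~2)", 2.0)):
        print(f"{label}: ~{predriver_energy_pj(fanout):.2f} pJ per transition")
    # With these assumptions the reduced-fanout chain switches roughly 3x the
    # capacitance, and that penalty is paid per transition -- so it degrades
    # efficiency even when the faster I/O is run at LPDDR4X-era data rates.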

[Chart: "LPDDR5 vs. LPDDR4X I/O Energy, Including Pre-Driver" – I/O energy (pJ/bit, 0–3) vs. data rate (Mbps, 1000–7000) for LP5 and LP4X at 48 Ohm, 60 Ohm, and unterminated settings; LP5 shows a >50% increase at 1600 Mbps]

Energy Cost for Higher Speeds

• Yes, DRAM can be made to function with very high data rates
  – GDDR5 is a good example
• Adding GDDR5 to our power evolution chart, we see that energy/bit increases vs. LPDDR4

[Chart: "Power Evolution" – energy (pJ/bit, 0–25) vs. data rate (Mbps, 500–8500) for DDR3, DDR4, LPDDR3, LPDDR4, and GDDR5]

Alternatives to LPDDR5 in Mobile Devices

• The challenge of LPDDR5 is to minimize I/O energy while supporting data rates of 6.4 Gbps or higher
• Alternative solutions include going wider
  – More LPDDR4X I/Os could be a more power-efficient scheme
  – Maintaining LPDDR4X data rates could enable wider PoP solutions that don't require significant changes in packaging
• There is discussion about using LPDDR3 I/O speeds, but this would:
  – Require twice the I/O pins of LP4X for equivalent bandwidth, increasing packaging costs and risks
  – Be unlikely to be more power efficient
• Another solution that could increase data per pin without a corresponding increase in pin count is multi-level signaling (the options are compared in the pin-count sketch below)
  – This could be applied to both PoP and xMCP configurations
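The pin-count arithmetic behind these options is straightforward; the sketch below compares them for an assumed aggregate bandwidth target (the target and the per-pin rates are illustrative, not proposals from this presentation).

    # Rough pin-count comparison of ways to reach the same aggregate bandwidth.
    # Purely illustrative arithmetic; the target bandwidth and the rates below
    # are assumed example values.

    TARGET_GBPS = 102.4   # assumed aggregate DQ bandwidth target, Gbit/s

    def dq_pins(per_pin_gbps: float, bits_per_symbol: int = 1) -> int:
        """DQ pins needed to hit the target at a given per-pin symbol rate."""
        return round(TARGET_GBPS / (per_pin_gbps * bits_per_symbol))

    options = [
        ("LPDDR5-class, 6.4 Gbps/pin, NRZ",          6.4,   1),
        ("LPDDR4X-class, 4.266 Gbps/pin, NRZ",       4.266, 1),
        ("LPDDR3-class speeds, 2.133 Gbps/pin, NRZ", 2.133, 1),
        ("4.266 Gsym/s with 2 bits/symbol (PAM-4)",  4.266, 2),
    ]
    for name, rate, bits in options:
        print(f"{name}: ~{dq_pins(rate, bits)} DQ pins")
    # Going wider at LPDDR4X rates costs ~1.5x the pins of a 6.4 Gbps interface,
    # LPDDR3-era rates cost ~3x (twice the LP4X pin count, as noted above), while
    # multi-level signaling raises bits per pin without raising the pin count.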

Attacking the Largest Component of DRAM Power

• The non-I/O portion of DRAM power now dominates
  – DRAM manufacturers have already highly optimized designs to maximize power efficiency while meeting user requirements for high frequencies and low latencies
  – Significant reduction in the DRAM array voltage is not practical
    • Lower voltage reduces the maximum charge that can be stored
    • The resulting loss in performance and necessary increase in refresh rate would offset any power improvements

DRAM DVFS?

• Dynamic Voltage Frequency Scaling (DVFS) has been used by many mobile components for years
  – DVFS presents significant challenges for DRAM
• Building a DRAM that meets all of the demanding performance and reliability requirements at a wider voltage range is unlikely to improve efficiency
• Array core voltage must remain static
  – Only peripheral circuits can operate at a wider range
  – This means the array and periphery voltages must be separated

DRAM DVFS

• Additionally, verification and test of wide voltage ranges for peripheral circuits could be expensive
  – This requirement is driven by the need in mobile applications to quickly change from low-frequency to higher-frequency operation while active
• Expecting the DRAM to continue operating at the lower frequency while the voltage is ramped is required to avoid a 'stall'
  – The DRAM process does not scale to lower voltage well – transistor performance decreases much faster than with today's logic processes
• Timing closure across an increased voltage range for DRAM is a very complex challenge

Two-Step DRAM DVFS

• More palatable to DRAM manufacturers could be a scheme that allows DRAM periphery operation at two discrete voltage levels, while leaving the array at one fixed level
  – Much of the power savings can be realized
  – Switching between these two discrete voltages must be fast enough that DRAM operation during the voltage ramp can be prohibited

Two-Step DRAM DVFS

• Low-to-high switching could be performed within the LPDDR4 tFC spec, so operation could be disallowed during the switch time
• This means DVFS can be applied automatically when the user switches operation between Frequency Set Points (FSPs)
• Power efficiency can be improved by >30% (a hypothetical controller sequence is sketched below)
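A controller-side sequence for such a scheme might look like the sketch below. FSP and tFC are the LPDDR4 concepts referred to above, but the voltage levels, the timing budget, and the helper functions (set_periphery_rail, switch_fsp) are assumed placeholders rather than spec values or a real driver API.

    # Hypothetical two-step DVFS switch sequence, written as a controller-side
    # sketch.  The ordering follows the usual DVFS rule: raise the rail before
    # raising the clock, lower the clock before lowering the rail.

    import time

    T_SWITCH_US = 200.0                        # assumed budget for ramp + FSP swap
    VDD_PERIPHERY = {"low": 0.9, "high": 1.1}  # assumed discrete periphery rails, V

    def set_periphery_rail(level: str) -> None:
        """Placeholder for a PMIC call that moves the periphery rail."""
        print(f"  PMIC: periphery rail -> {VDD_PERIPHERY[level]:.1f} V")

    def switch_fsp(target: str) -> None:
        """Placeholder for the mode-register sequence that swaps the active FSP."""
        print(f"  DRAM: switch active FSP -> {target}")

    def change_operating_point(current: str, target: str) -> str:
        """Block traffic, order the rail/FSP steps by direction, resume traffic."""
        if target == current:
            return current
        print(f"blocking DRAM traffic ({current} -> {target})")
        if target == "high":
            set_periphery_rail("high")       # raise the rail first...
            time.sleep(T_SWITCH_US / 1e6)    # ...wait out the assumed switch window
            switch_fsp("high")               # ...then run at the faster set point
        else:
            switch_fsp("low")                # drop the clock first...
            set_periphery_rail("low")        # ...then it is safe to lower the rail
            time.sleep(T_SWITCH_US / 1e6)
        print("resuming DRAM traffic")
        return target

    state = "low"
    state = change_operating_point(state, "high")
    state = change_operating_point(state, "low")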

Future Challenges

• DRAM scaling challenges will add complexity to future memory systems
  – tWR will increase
  – Native refresh times will shorten
    • Especially if DRAM vendors reduce core voltage
  – Error detection and correction will become a requirement
    • ECC can reduce power and mitigate the performance impact of DRAM scaling challenges (a minimal error-correction sketch follows below)
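As a minimal illustration of the detect-and-correct mechanism, the sketch below implements a toy Hamming(7,4) single-error-correcting code; production DRAM ECC uses much wider codewords (for example 128 data bits plus 8 check bits), but the principle is the same.

    # Toy Hamming(7,4) single-error-correcting code, for illustration only.

    def hamming74_encode(data: list[int]) -> list[int]:
        """Encode 4 data bits [d1, d2, d3, d4] into 7 bits with 3 parity bits."""
        d1, d2, d3, d4 = data
        p1 = d1 ^ d2 ^ d4          # parity over codeword positions 1, 3, 5, 7
        p2 = d1 ^ d3 ^ d4          # parity over codeword positions 2, 3, 6, 7
        p3 = d2 ^ d3 ^ d4          # parity over codeword positions 4, 5, 6, 7
        return [p1, p2, d1, p3, d2, d3, d4]   # codeword positions 1..7

    def hamming74_correct(word: list[int]) -> list[int]:
        """Locate and flip a single-bit error, then return the 4 data bits."""
        w = word[:]
        s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
        s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
        s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
        syndrome = s1 + 2 * s2 + 4 * s3       # 0 means no error detected
        if syndrome:
            w[syndrome - 1] ^= 1              # syndrome = 1-based error position
        return [w[2], w[4], w[5], w[6]]

    data = [1, 0, 1, 1]
    code = hamming74_encode(data)
    code[5] ^= 1                              # inject a single-bit error
    assert hamming74_correct(code) == data    # the error is found and corrected
    print("corrected data:", hamming74_correct(code))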

Heterogeneous Memory Space

• Mobile memory density continues to increase – do we really need maximum DRAM performance for the entire memory space?
• Would mobile systems be better served with a smaller, high-speed "Local Main Memory" and a larger, non-volatile array?
• A system like the one below could leverage emerging non-volatile memories that promise to be >1000x faster than NAND and much less costly than DRAM

[Diagram: example heterogeneous memory system – storage holding videos, music, apps, and the OS]

Logic-to-Logic Signaling

• DRAM I/O performance is nearing its limits
• Continuing to push I/O performance will decrease energy efficiency
• Adding a logic die to the memory subsystem could enable higher-speed signaling, and therefore either higher performance or reduced SoC pincount
• A wide, slower interface to multiple DRAM and/or NVRAM die would be restricted to the inside of the memory package
• High-pincount packaging would not be required

Thank You