Challenges and Solutions for Future Main Memory
Total Page:16
File Type:pdf, Size:1020Kb
White Paper May 26, 2009 Challenges and Solutions for Future Main Memory DDR3 SDRAM is used in many computing systems today and offers data rates of up to 1600Mbps. To achieve performance levels beyond DDR3, future main memory subsystems must attain faster data rates while maintaining low power, high access efficiency and competitive cost. This whitepaper will outline some of the key challenges facing next generation main memory and the Rambus innovations that can be applied to advance the main memory roadmap. Challenges and Solutions for Future Main Memory 1 Introduction Rambus Innovations Enable: Demand for an enriched end-user • 3200Mbps Data Rates experience and increased performance in • 40% Lower Power next-generation computing applications is • Up to 50% Higher never-ending. Driven by recent multi-core computing, virtualization and processor Throughput integration trends, the computing industry is • 2-4x Higher Capacity in need of a next-generation main memory solution with anticipated data rates of up to In order to address these challenges and 3200 Megabits per second (Mbps) at a meet the increased performance demanded similar or lower power envelope as that of by recent multi-core computing, DDR3-1600 solutions. The divergence of virtualization and chip integration trends, these two requirements – increased Rambus has developed an architectural performance and reduced power – presents concept for future main memory. The a difficult challenge for the design of future architecture builds upon existing Rambus power-efficient memories. innovations and designs, such as FlexPhase™ circuitry, FlexClocking™ and In addition to power-efficiency challenges, Dynamic Point-to-Point (DPP) technologies, future memory solutions face potential and introduces new concepts, which include bottlenecks in access efficiency and Near Ground Signaling and Module capacity, both of which become more Threading. When used in combination, difficult to solve as memory speeds rise. these innovations can enable a future main For example, while memory module memory system to achieve double the upgrades are the traditional way to increase current data rates, with 50% higher memory capacity, the maximum number of modules access efficiency, while consuming 40% allowed on a DDR3 memory channel lower power than present techniques would decreases as data rates increase. In fact, provide. due to degraded signal integrity, DDR3 memory channels can reliably support only This white paper will describe the key one module at data rates beyond 1333Mbps. challenges facing future main memory, and In addition, memory access granularity the details and benefits of the Rambus suffers as data rates increase due to the 8x innovations that can be applied to go to 16x multiple between the interface and beyond DDR3 and advance the main core access speeds. The result is an memory roadmap. Many of these increase to the core prefetch and a sub- innovations are patented inventions optimal minimum access size for future available for licensing by Rambus. computing applications. Challenges and Solutions for Future Main Memory 2 Industry Trends and Key virtualization, a capability that allows multiple concurrent "virtual machines", each Issues capable of running its own independent operating system and applications 2.1 Computer Architecture Trends simultaneously on the same hardware. A growing number of enterprise computing Fueled by the progress of Moore’s law, the systems are harnessing the power of computational power of personal computers virtualization to consolidate and dynamically (PCs) has increased exponentially over the repurpose hardware in the data center. This past two decades. Transistor count has in turn results in better utilization of existing increased from 275 thousand in the Intel hardware and reduced operating costs. 80386 processor to 731 million in the current Intel Core™ i7 processor with quad cores. A third recent trend in advanced computing This abundance of high-performance is the integration of the graphics and general transistors has increased computational purpose processing cores onto a single chip. power by nearly four orders of magnitude, The approach to integration varies from the from 8.5 million of instructions per second combination of discrete graphics and (MIPS) in 1988 to greater than 90,000MIPS general purpose cores, to high throughput in processors found in typical PCs today. architectures where workloads are computed in parallel to better utilize the In addition to increasing clock speed, other large number of available processor cores. recent trends have developed that both Throughput computing can dramatically advance computing performance and accelerate the performance of computing increase associated demands on the systems, especially when combined with memory system. The first is multi-core vector processing or specialized graphics computing which incorporates several cores that handle operations faster than processing cores into a single chip. Some traditional general-purpose cores. systems today incorporate 4 processors, each with 8 cores per chip, for a combined total of 32 processing cores. Each core allows for several simultaneous threads, providing an unprecedented ability to multi- Key Computing Trends: task applications and rapidly execute • Multi-Core Computing programs. Future multi-core systems are • Virtualization forecasted to incorporate an increasing • Converged CPU-GPU number of processing cores for even greater Architecture performance. A second recent trend which increases the demand on main memory performance is Challenges and Solutions for Future Main Memory 2.2 Memory Data Rates 2.3 Pin Count Considerations for Total System Bandwidth All of these recent trends advancing compute performance require an associated In addition to higher memory data rates, increase in memory bandwidth. Over the computing systems have added more past decade, main memory data rates have parallel memory channels per processor to doubled on average every four to five years, meet the growing demand for memory with memory suppliers introducing 33% bandwidth. Starting in 1999, dual-channel faster main memory components every 18 memory systems were introduced in months. Today’s DDR3 main memory is workstations and subsequently in shipping at typical data rates of 1066Mbps mainstream PC platforms. The trend with a roadmap to 1600Mbps. Projecting continues today with many processors able from these historical trends, the next to interface with up to four independent generation of main memory will support data memory channels. rates up to 3200Mbps. Challenges and Solutions for Future Main Memory Ultimately, the number of memory channels higher memory power. In many computing is limited by the package pin count of the systems, the power consumption of the chipset or processor. Current cost-effective memory subsystem is second only to the packages for chipsets are limited to power consumed by the processor. approximately 1000 pins whereas the LGA packages used in mainstream processors The push to increase performance are transitioning now from 776 to 1366 pins. competes directly with the desire to reduce Given this limitation, only three to four DDR3 power. As performance rises, future memory channels can be implemented on memory systems are forecasted to consume these packages, even if up to 75% of the an even larger portion of overall system pins are allocated to the memory interfaces. power, as per-pin data rates and the number Based on projected processor pin-count of memory channels increases. Computing trends, future processors could incorporate platforms vary in system configuration and three to six memory channels for workloads. In order to reduce overall mainstream and high-end products memory power in all configurations, two respectively. principal power modes of compute memory need to be addressed: Active Read/Write, An additional method used to improve and Stand-by Idle. overall performance is the integration of the memory controller with the processor. This 2.4.1 Active Power integration increases performance by eliminating the added latency of the chipset, When in active mode, a DRAM is actively but can be difficult due to differences in reading or writing, and consumes power in operating voltages between the CPU and three areas: IO signaling, system clocking, memory subsystem. and DRAM core access. This peak power state is described in DRAM datasheets as 2.4 Power Considerations the IDD7 power state. For example, a single unbuffered 1GB DDR3 module in In addition to increased performance, power active mode can consume over 3A, is a significant issue for nearly all current dissipating nearly 5W of memory power. IO computing systems. Notebooks and signaling further adds 2W of memory power. netbooks require reduced power to extend The resulting active memory power for a battery life, while desktop PCs and servers single processor with four memory channels require reduced power to lower overall cost can exceed 25W, while server platforms of ownership and to comply with green containing up to 4 such processors can initiatives for reduced energy consumption. consume greater than 100W of memory power in active mode. Increased memory bandwidth requires higher data rates and a greater number of Despite a drop in power supply voltage from memory channels, both of which result in 3.3V in SDR SDRAM to 1.5V in DDR3, Challenges and Solutions for Future Main Memory active memory power has increased between successive DRAM generations. As