APPLICATIONS NOTE

AN504: Memory Options and Performance on the Intel 955X Express Chip Set

John Beekley, VP Applications Engineering, Corsair Memory, Inc.

Introduction

This white paper will examine memory options for the Intel 955X Express chip set. We will give a brief summary of chip set features. A benchmark setup to test memory for this chip set will be described, and BIOS parameters and frequency settings will be explained. We will then evaluate several memory modules that are suitable for use with this chip set. We will examine performance for these modules, and provide guidance for which memory types should be used based on target performance and cost of the system. Finally, we will overclock the platform and explore its maximum performance capabilities.

Chip Set Overview

The Intel 955X Express Chip Set (abbreviated throughout the article as “955X”) is Intel’s highest performance desktop chip set. A block diagram is shown in Figure 1. The 955X enables Intel’s highest performance platforms, with support for dual core processors with hyperthreading technology, 8 GB of memory address space, and other advanced features. The 955X utilizes DDR2 memory. As the chip set uses a dual channel architecture, memory should be used in pairs of similar modules to obtain optimal performance and stability. The 955X is Intel’s primary enthusiast chip set; most motherboards that use the chip set provide excellent flexibility in adjusting memory latency settings. See Corsair’s AN501 application note for more information on the impact of latency settings on performance.

Figure 1. 955X Block Diagram (source: Intel Corp.)

Test Setup

We decided to construct a test setup which would highlight the performance and features of the chip set and memory. With this goal in mind, an aggressive test platform was constructed using the following components:

• Asus P5WD2 Premium motherboard, BIOS version 0408
• Intel Pentium 4 550, 3.4 GHz, 800 MHz front side bus
• GeFORCE 6800 Ultra PCI Express video card
• Western Digital “Raptor” SATA hard drive

As CPU speed has a substantial impact on benchmark results, we attempted to keep the internal CPU speed constant throughout the testing.

May, 2005 Page 1

BIOS Settings - Overview and Terminology

The Asus P5WD2 BIOS uses the following settings to control memory and CPU operating characteristics:

• CPU Frequency - Found in the “Advanced/Jumper Free” submenu, this frequency is used as the base frequency for the Front Side Bus (“FSB”) and the Memory Bus. FSB frequency is set to CPU Frequency times four. The memory bus frequency is selectable, but is always a multiple of CPU Frequency.
• DRAM Frequency - This parameter is also found in the “Advanced/Jumper Free” submenu. It determines memory bus frequency; the BIOS will force you to select values which are a valid multiple of the CPU Frequency.
• AI CPU Lock Free - This parameter allows you to determine the internal speed of the CPU. For the P4 550, if “Enabled” is selected, the internal CPU speed will equal fourteen times the CPU Frequency. If “Disabled” is selected, the internal CPU speed will equal seventeen times the CPU Frequency. Also found in “Advanced/Jumper Free”.
• Memory Latency Settings - These are configured in the “Advanced/Chipset” submenu. While a general discussion of latency settings is outside the scope of this document, we will note that “RAS# Activate to Precharge” (known as tRAS) was determined to have no real impact on performance, regardless of the value selected. We will also note that the P5WD2 does not allow the user to manually set a value for Command Rate; it is automatically set to 2T.
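The relationships among these settings can be sketched in a few lines of Python. This is a minimal illustration rather than vendor code: the function and parameter names are hypothetical, and treating the 3.33 and 2.667 memory multipliers as the exact fractions 10/3 and 8/3 is an assumption.

```python
# Clock relationships described above for the Asus P5WD2 BIOS with a Pentium 4 550.
# All names here are illustrative; they are not BIOS or vendor terminology.

FSB_MULTIPLIER = 4  # FSB frequency = CPU Frequency x 4


def derived_clocks(cpu_frequency_mhz, lock_free_enabled, memory_ratio):
    """Return (fsb_mhz, cpu_core_mhz, memory_bus_mhz) for one BIOS configuration."""
    # "AI CPU Lock Free": Enabled -> 14x, Disabled -> 17x (values given for the P4 550).
    core_multiplier = 14 if lock_free_enabled else 17
    fsb_mhz = FSB_MULTIPLIER * cpu_frequency_mhz
    cpu_core_mhz = core_multiplier * cpu_frequency_mhz
    memory_bus_mhz = memory_ratio * cpu_frequency_mhz
    return fsb_mhz, cpu_core_mhz, memory_bus_mhz


if __name__ == "__main__":
    # 200 MHz base, Lock Free disabled (17x), memory ratio 3.33 (assumed to be 10/3).
    print(derived_clocks(200, False, 10 / 3))
    # 250 MHz base, Lock Free enabled (14x), memory ratio 2.667 (assumed to be 8/3).
    print(derived_clocks(250, True, 8 / 3))
```

Both calls land on a DDR2-667 memory bus, but with different front side bus and CPU core speeds, which is the comparison made in the next section.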

Optimum Front Side Bus Selection

Some of the memory frequencies that we were interested in testing could be achieved in more than one way, based on the multipliers available in the P5WD2 BIOS. For example, DDR2-667 could be selected by using a CPU frequency of 200 MHz and a multiplier of 3.33, as well as with a CPU frequency of 250 MHz and a multiplier of 2.667. Similarly, DDR2-800 could be reached with a CPU frequency of either 200 MHz or 240 MHz, with 200 MHz actually providing a slightly higher CPU speed (3.4 GHz vs. 3.36 GHz).

Our findings were somewhat surprising. Benchmark results for the two DDR2-667 configurations are shown in Table 1. As you can clearly see, the 250 MHz CPU frequency provided a sizeable jump in performance across all the benchmarks, ranging from 2.6% to 19.1%. Spot checks at other speeds confirmed these results. So, the 250 MHz CPU frequency was used in the balance of the testing.

Test Name                   fCPU = 250 MHz        fCPU = 200 MHz        Increase
PCMark2004 - Memory         5810                  5287                  9.8%
SiSoft Sandra 2005 - Int    5823 MB/s             4923 MB/s             18.3%
SiSoft Sandra 2005 - Float  5875 MB/s             4933 MB/s             19.1%
Lavalys Everest - Read      6399 MB/s             5710 MB/s             12.1%
Lavalys Everest - Write     2139 MB/s             2085 MB/s             2.6%
Lavalys Everest - Latency   82.4 ns               90.1 ns               8.5%
Super Pi 2M digits          37.75 sec             38.81 sec             2.7%
ScienceMark2 Membench       5374.28 MB/s          4565.79 MB/s          17.7%
Doom3 demo1 640 x 480       97.8 fps              93.8 fps              4.3%
Front Side Bus (4*fCPU)     1000 MHz              800 MHz               20%
CPU Speed                   3.50 GHz (14*fCPU)    3.40 GHz (17*fCPU)    2.9%
Memory Bus                  667 MHz (2.667*fCPU)  667 MHz (3.33*fCPU)   0%
Latency Settings            5-5-5-15              5-5-5-15              0%

Table 1. Optimum Front Side Bus Testing

Benchmark Parameters

Benchmarking this memory was somewhat problematic. The BIOS of the P5WD2 allows the CPU multiplier to be set to either 14 or 17. Most of the desired memory bus frequencies could be obtained by setting the CPU frequency to 250 MHz (a 1000 MHz front side bus) and setting the multiplier to 14. The only clock speed that could not be achieved using these settings was an 800 MHz memory bus. With a CPU frequency of 250 MHz, memory frequencies of 750 MHz and 833 MHz are allowed, but 800 MHz would require a multiplier of 3.2, which is not available.

In order to achieve a memory bus speed of 800 MHz, two combinations could be used: either [1] CPU frequency of 200 MHz and a multiplier of 4.0, or [2] CPU frequency of 240 MHz and a multiplier of 3.33. Since our front side bus testing (above) showed that a higher CPU frequency caused dramatically improved performance, we selected option [2], a CPU frequency of 240 MHz and a multiplier of 3.33. This configuration was confirmed to be the higher performance of the two. However, this results in a somewhat lower front side bus speed (960 MHz vs. 1000 MHz), and more importantly slows down the CPU from 3.50 GHz to 3.36 GHz, a slowdown of four percent. This had a significant impact on the XMS6400, as you will see in the discussion of results.

Five Corsair memory pairs were benchmarked in this system using the closest spec possible to their guaranteed speed settings (see Table 2) with the constraints described above. As with all benchmarks, system settings and component characteristics can greatly affect the benchmark scores measured. These tests represent results achieved in our lab under the conditions outlined in the paper. Your own results, of course, may vary substantially.
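The multiplier constraint discussed under Benchmark Parameters can be checked with a short Python sketch. It is an illustration under stated assumptions: the ratio list contains only the multipliers mentioned or implied in this note (3.0 is inferred from the 750 MHz option at 250 MHz), the fractions 8/3 and 10/3 stand in for 2.667 and 3.33, and the function names are hypothetical. The actual BIOS may offer other ratios.

```python
# Memory ratios mentioned or implied in this note (an assumed, possibly incomplete list).
MEMORY_RATIOS = {"2.667": 8 / 3, "3.0": 3.0, "3.33": 10 / 3, "4.0": 4.0}


def memory_options(cpu_frequency_mhz):
    """Memory bus speeds (rounded to whole MHz) reachable at one CPU frequency."""
    return {label: round(ratio * cpu_frequency_mhz)
            for label, ratio in MEMORY_RATIOS.items()}


def configs_for(target_mhz, cpu_frequencies_mhz, tolerance_mhz=1.0):
    """List (cpu_frequency, ratio_label) pairs that hit the target memory speed."""
    matches = []
    for cpu_frequency in cpu_frequencies_mhz:
        for label, ratio in MEMORY_RATIOS.items():
            if abs(ratio * cpu_frequency - target_mhz) <= tolerance_mhz:
                matches.append((cpu_frequency, label))
    return matches


if __name__ == "__main__":
    # At 250 MHz none of the assumed ratios produces an 800 MHz memory bus.
    print(memory_options(250))
    # Searching the CPU frequencies used in this note finds the two 800 MHz options.
    print(configs_for(800, [200, 240, 250]))
```

With the 14x CPU multiplier, the 240 MHz option gives a 3.36 GHz core instead of 3.50 GHz, which is the four percent slowdown noted above.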

Benchmark Descriptions

The following benchmarks will be used to measure system performance:

• PCMark 2004 - Memory test suite. PCMark is designed to measure relative performance in general computing functions. The PCMark memory test suite focuses on system memory, so it makes a good measure of memory subsystem performance.
• SiSoft Sandra 2005 - This system diagnostic has a memory benchmarking tool that is designed to measure memory bandwidth. It provides two output values; one for integer processing, and one for floating point processing.
• Lavalys Everest Ultimate Edition - This program is a powerful system diagnostic utility, and provides a suite of memory benchmarking tools. Everest provides three output values; memory READ bandwidth, memory WRITE bandwidth, and memory latency. The latency measurement represents the typical delay between the time that the CPU requests memory data and the time that data is available.
• Super Pi - Super Pi is a simple application which calculates pi to a specified number of digits. The one million digit calculation was used for these benchmarks, using a modified version of Super Pi which measures to a precision of 0.001 seconds. Benchmark results indicate the time in seconds required to calculate pi to one million digits.
• ScienceMark 2 Membench - This is another synthetic memory performance benchmark, which tests a series of different memory bandwidth algorithms. It provides a single memory bandwidth measurement score.
• Doom 3 timedemo, demo1 - This demo is included with the retail version of Doom 3, and provides a measurement of frames per second. By setting display resolution to 640x480 pixels, the benchmark score focuses on CPU/memory performance, rather than video card performance. This is a real-world benchmark, completely based on a retail game that is available to the public.

Memory Overview

Corsair offers a wide variety of DDR2 memory modules, all of which are completely compatible with the 955X chip set. Since the chip set has dual memory channels, matched memory pairs are ideal for use on boards using this chip set. Table 2 summarizes Corsair’s dual channel DDR2 product offerings.

Module    Module    Part Number             Performance
Speed     Latency   (1 GByte Module Pair)   Characteristics
1000 MHz  5-4-4-9   TWIN2X1024-8000UL       Hyper-overclocked memory for ultimate performance at high clock speeds
800 MHz   5-5-5-12  TWIN2X1024A-6400        Solid performance at very high
675 MHz   3-2-2-8   TWIN2X1024A-5400UL      Very low latency; very strong performance with enhanced stability
675 MHz   4-4-4-12  TWIN2X1024-5400C4       Tightened latency settings for strong performance at DDR2-667
667 MHz   5-5-5-15  VS1GBKIT667D2           Low cost 667MHz solution, JEDEC latency settings

Table 2. Memory Module Options for Intel 955X Chip Set, In Order of Performance

Product features are summarized below:

• TWIN2X1024-8000UL: This XMS memory is optimized for extremely high clock speed. Operation is guaranteed at 1 GHz with 5-4-4-12 latency settings at an operating voltage of 2.2 volts. The part also typically operates at 800 MHz with 4-3-3-8 latency settings. This part number represents a matched pair of single rank DIMMs based on very tightly screened 64Mx8 DDR2 DRAMs.
• TWIN2X1024A-6400: This module pair is overclocked to 800 MHz and is tested at JEDEC-type (5-5-5-12) latencies at a VDIMM level of 1.9 volts. It is not as fast as the 5400UL or 8000UL, but still provides excellent overclocking performance. This part number represents a matched pair of single rank DIMMs based on specially screened 64Mx8 DDR2 DRAMs.
• TWIN2X1024A-5400UL: Operation for this XMS memory is guaranteed at 675 MHz and 3-2-2-8 latency settings at an operating voltage of 2.1 volts. However, while there is no guaranteed operational specification at relaxed latencies, results of well over 800 MHz at 4-4-4-12 latencies are typically achieved. This part number represents a matched pair of single rank DIMMs based on very tightly screened 64Mx8 DDR2 DRAMs.
• TWIN2X1024-5400C4: This module pair is guaranteed at 675 MHz and is tested at 4-4-4-12 latencies at a VDIMM level of 1.9 volts. This speed grade represents an ideal combination of high bus speed, low latency, and excellent value. This part number represents a matched pair of dual rank DIMMs based on specially screened 32Mx8 DDR2 DRAMs.
• VS1GBKIT667D2: This part number represents a pair of matched 512 MByte DDR2 modules which utilize JEDEC standard DDR2-667 clock speeds and latencies. The modules may be either single rank (based on eight 64Mx8 RAMs) or dual rank (based on sixteen 32Mx8 RAMs). Overclocking is not tested or recommended for these modules.

Test Results - Performance vs. Memory Settings

Test results are shown in Table 3, with bus speed settings included for reference. These results are also illustrated in Figures 2a through 2h. Once again, when reviewing these results it is important to note that the 800 MHz testing had to be performed at modified CPU settings. This results in benchmark scores that are on the pessimistic side for this speed grade. This leads us to conclude that if an overclocked CPU with native 800 MHz front side bus is being used, best performance will be achieved using memory speeds of 667 MHz or 1 GHz, rather than the intermediate 800 MHz speed.

Test Name                   XMS8000        XMS5400UL      XMS5400C4      XMS6400        VS667D2
                            1000 MHz       667 MHz        667 MHz        800 MHz        667 MHz
                            5-4-4-9        3-2-2-8        4-4-4-12       5-5-5-12       5-5-5-15
PCMark2004 - Memory         6298           6112           5892           5800           5810
SiSoft Sandra 2005 - Int    6274 MB/s      6220 MB/s      6026 MB/s      5787 MB/s      5823 MB/s
SiSoft Sandra 2005 - Float  6264 MB/s      6203 MB/s      6027 MB/s      5797 MB/s      5875 MB/s
Lavalys Everest - Read      7372 MB/s      6892 MB/s      6710 MB/s      6539 MB/s      6399 MB/s
Lavalys Everest - Write     2894 MB/s      2648 MB/s      2453 MB/s      2413 MB/s      2139 MB/s
Lavalys Everest - Latency   69.7 ns        73.6 ns        77.8 ns        81.3 ns        82.4 ns
Super Pi 2M digits          36.63 sec      36.86 sec      37.44 sec      38.72 sec      37.75 sec
ScienceMark2 Membench       5758.02 MB/s   5604.67 MB/s   5472.31 MB/s   5368.10 MB/s   5374.28 MB/s
Doom3 demo1 640 x 480       103.4 fps      101.9 fps      98.9 fps       95.6 fps       97.8 fps
CPU Frequency (fCPU)        250 MHz        250 MHz        250 MHz        240 MHz        250 MHz
Front Side Bus (4*fCPU)     1000 MHz       1000 MHz       1000 MHz       960 MHz        1000 MHz
CPU Speed (14*fCPU)         3.50 GHz       3.50 GHz       3.50 GHz       3.36 GHz       3.50 GHz
Memory Bus                  1000 MHz       667 MHz        667 MHz        800 MHz        667 MHz
                            (4*fCPU)       (2.667*fCPU)   (2.667*fCPU)   (3.33*fCPU)    (2.667*fCPU)

Table 3. Benchmark Results

Figure 2. Benchmark Performance, by Memory Type
Panels: a. PC Mark 2004; b. SiSoft Sandra 2005, Int/Float, MB/s; c. Lavalys Everest 2.0, Read, MB/s; d. Lavalys Everest 2.0, Write, MB/s; e. Lavalys Everest 2.0, Latency, nanoseconds (short bars are better); f. SuperPi, 2 million digits, seconds (short bars are better); g. ScienceMark 2, Membench, MB/s; h. DOOM3 timedemo demo1, frames per second.
Legend: TWIN2X1024A-8000: 1000 MHz, 5-4-4-12; TWIN2X1024A-6400: 800 MHz, 5-5-5-12; TWIN2X1024A-5400UL: 667 MHz, 3-2-2-8; TWIN2X1024-5400C4: 667 MHz, 4-4-4-12; VS1GBKIT667D2: 667 MHz, 5-5-5-15.

We were interested to determine whether optimum performance for this chip set is provided by [a] high memory bus speed or [b] low latency settings. The test results clearly showed that running at high bus speed provides improved scores across all the benchmarks, when using the same CPU settings. Performance gains when moving from DDR2-667 3-2-2-8 to DDR2-1000 5-4-4-9 ranged from less than 1% to as high as 9.3%.

As expected, benchmarks which specifically focus on the memory subsystem showed the most significant performance variance between the various configurations. Performance variations between the highest performance platform (DDR2-1000, 5-4-4-9) and the value platform (DDR2-667, 5-5-5-15) were substantial, typically ranging from eight to fifteen percent on the synthetic tests, and from three to six percent on the real-world benchmarks.
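The gain figures quoted for DDR2-667 3-2-2-8 versus DDR2-1000 5-4-4-9 can be reproduced from Table 3 with a short Python sketch. The scores are copied from the two relevant columns of the table. The formula (difference divided by the baseline score, with the sign flipped for lower-is-better tests) is an assumption, chosen because it appears consistent with the “Increase” column of Table 1; the names are illustrative.

```python
# Scores copied from Table 3.
# name -> (DDR2-1000 5-4-4-9 score, DDR2-667 3-2-2-8 score, higher_is_better)
TABLE3 = {
    "PCMark2004 - Memory": (6298, 6112, True),
    "SiSoft Sandra 2005 - Int": (6274, 6220, True),
    "SiSoft Sandra 2005 - Float": (6264, 6203, True),
    "Lavalys Everest - Read": (7372, 6892, True),
    "Lavalys Everest - Write": (2894, 2648, True),
    "Lavalys Everest - Latency": (69.7, 73.6, False),
    "Super Pi": (36.63, 36.86, False),
    "ScienceMark2 Membench": (5758.02, 5604.67, True),
    "Doom3 demo1 640 x 480": (103.4, 101.9, True),
}


def gain_percent(new, baseline, higher_is_better=True):
    """Improvement of `new` over `baseline`, as a percentage of the baseline score."""
    # Assumed convention: for times and latencies, a smaller number is an improvement.
    delta = new - baseline if higher_is_better else baseline - new
    return 100.0 * delta / baseline


def gains_over_baseline(table):
    """Map each benchmark name to its percentage gain."""
    return {name: gain_percent(new, base, better)
            for name, (new, base, better) in table.items()}


if __name__ == "__main__":
    for name, gain in gains_over_baseline(TABLE3).items():
        print(f"{name}: {gain:.1f}%")
```

Under this convention the smallest gain is on Super Pi and the largest on the Everest write test, in line with the range stated above.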

Latency Testing at 1000MHz

While experimenting with our test platform, we found that the system allowed us a great deal of flexibility in configuring latencies, even at high clock speeds. We eventually discovered that we could run the system as aggressively as 5-3-3 latency settings at DDR2-1000. We ran an additional series of tests to determine the impact that modifying these settings would have on benchmark performance. The impact, which was surprisingly substantial, is shown in Table 4a.

Overclocking Results

After testing the memory configurations at equivalent CPU speeds, we decided to try overclocking our test platform to its maximum stable clock speed, at both relaxed and tight latencies. For this testing, the CPU was water cooled to allow maximum overclockability. Other than this change, the platform was unchanged, and the same memory modules and CPU used to generate the previous results were installed.

The test platform showed impressive overclockability at all latency settings. An overclock was considered to be stable if Windows could be booted and the full benchmark suite could be run successfully. Results were particularly impressive at low latency settings, where we achieved 3-2-2-8 latency settings with a memory bus speed of 733 MHz. At the “relaxed” latency settings of 5-3-3-12, speeds of 1060 MHz were achieved. No improvement in clock speed was realized by relaxing the latency settings to 5-4-4 or 5-5-5. Benchmark scores are shown in Table 4b.

                            a. Latency at 1000MHz                        b. Overclocking
Test Name                   XMS8000        XMS8000        XMS8000
                            1000 MHz       1000 MHz       1000 MHz       1060 MHz       733 MHz
                            5-5-5-15       5-4-4-9        5-3-3-8        5-3-3-8        3-2-2-8
PCMark2004 - Memory         6250           6298           6355           6703           6770
SiSoft Sandra 2005 - Int    6214 MB/s      6274 MB/s      6296 MB/s      6681 MB/s      6830 MB/s
SiSoft Sandra 2005 - Float  6179 MB/s      6264 MB/s      6298 MB/s      6687 MB/s      6830 MB/s
Lavalys Everest - Read      7319 MB/s      7372 MB/s      7422 MB/s      7864 MB/s      7628 MB/s
Lavalys Everest - Write     2815 MB/s      2894 MB/s      2953 MB/s      3124 MB/s      2945 MB/s
Lavalys Everest - Latency   71.0 ns        69.7 ns        69.6 ns        65.5 ns        66.7 ns
Super Pi 2M digits          36.80 sec      36.63 sec      36.39 sec      34.53 sec      33.48 sec
ScienceMark2 Membench       5780.20 MB/s   5758.02 MB/s   5797.65 MB/s   6188.50 MB/s   6171.46 MB/s
Doom3 demo1 640 x 480       102.0 fps      103.4 fps      105.0 fps      110.5 fps      113.9 fps
CPU Frequency (fCPU)        250 MHz        250 MHz        250 MHz        265 MHz        275 MHz
Front Side Bus (4*fCPU)     1000 MHz       1000 MHz       1000 MHz       1060 MHz       1100 MHz
CPU Speed (14*fCPU)         3.50 GHz       3.50 GHz       3.50 GHz       3.71 GHz       3.85 GHz
Memory Bus                  1000 MHz       1000 MHz       1000 MHz       1060 MHz       733 MHz
                            (4*fCPU)       (4*fCPU)       (4*fCPU)       (4*fCPU)       (2.667*fCPU)

Table 4. Benchmark Results: a. Latency at 1000MHz, b. Overclocking

Summary

We can draw several useful conclusions from the results of this testing. First, we have discovered that selection of the system memory should be tightly coupled to selection of the front side bus speed. 1000 MHz memory looks like an excellent choice if you are planning to overclock a processor with native 800 MHz front side bus; if you do not plan to overclock, 800 MHz memory is a good choice. The 667 MHz memory offers a good compromise solution, providing strong performance and excellent flexibility.

Overclocking results tend to be highly dependent on several hardware factors. The 955X platform demonstrated excellent overclockability both at tight latency settings and at relaxed latency settings. Optimum performance settings are likely to vary from system to system.

Overall, the XMS8000UL and XMS5400UL are excellent ultra-performance modules for this platform; the XMS6400 (for non-overclocked 800 MHz front side bus) and XMS5400C4 (for overclocked 800 MHz or 1066 MHz front side bus) provide solid performance and value. Finally, the VS667D2 represents excellent value and stability at JEDEC-standard speed and latency settings.

© Corsair Memory Incorporated, May, 2005. Corsair and the Corsair Logo are trademarks of Corsair Memory Incorporated. All other trademarks are the property of their respective owners. Corsair reserves the right to make changes without notice to any products herein. Corsair makes no warranty, representation, or guarantee regarding the suitability of its products for any particular purpose, nor does Corsair assume any liability arising out of the application of any product, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. Corsair does not convey any license under its patent rights nor the rights of others. Corsair products are not designed, intended, or authorized for use in applications intended to support or sustain life, or for any other application for which the failure of the Corsair product could create a situation in which personal injury or death may occur.
