Overcoming DDR Challenges in High-Performance Designs

Mazyar Razzaz, Applications Engineering
Jeff Steinheider, Product Marketing

September 2018 | AMF-NET-T3267

Company Public – NXP, the NXP logo, and NXP secure connections for a smarter world are trademarks of NXP B.V. All other product or service names are the property of their respective owners. © 2018 NXP B.V.

Agenda

• Basic DDR SDRAM Structure
• DDR3 vs. DDR4 SDRAM Differences
• DDR Bring up Issues
• Configurations and Validation via QCVS Tool

BASIC DDR SDRAM STRUCTURE

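Single Transistor Memory Cell

[Figure: single-transistor DRAM cell. The gate (G) of the access transistor connects to the row (word) line, and its source (S) and drain (D) sit between the column (bit) line and the storage capacitor Cbit, which is referenced to Vcc/2. A stored “1” => Vcc and a stored “0” => Gnd. The column line, with parasitic line capacitance Ccol, is “precharged” to Vcc/2.]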

Memory Arrays

[Figure: memory array. The row address decoder drives word lines W0, W1, W2; bit lines B0 to B7 connect to the sense amps and write drivers, which connect to the column address decoder.]

Internal Memory Banks

• Multiple arrays organized into banks
• Multiple banks per memory device
− DDR3 – 8 banks, and 3 bank address (BA) bits
− DDR4 – 16 banks with 4 banks in each of 4 sub bank groups
− Can have one active row in each bank at any given time
• Concurrency
− Can be opening or closing a row in one bank while accessing another bank

[Figure: Bank 0 to Bank 3, each holding Row 0, Row 1, Row 2, Row 3, Row … above its row buffers]

Memory Access

• A requested row is ACTIVATED and made accessible through the bank’s row buffers

• READ and/or WRITE are issued to the active row in the row buffers

• The row is PRECHARGED and is no longer accessible through the bank’s row buffers

Example: DDR4-2133
− Open Page = 2.133Gb/s maximum bandwidth
− Closed Page = 199Mb/s maximum bandwidth
− 10x performance advantage to read and write from an open page

Example – 8Gb DDR4 SDRAM

• Micron MT40A1G8
• 1024M x 8 (64M x 8 x 16 banks)
• 8 Gb total
• 16-bit row address
− 64K rows
• 10-bit column address
− 1K columns/row (1KB per row with x8 data width)
• 2-bit bank group and 2-bit bank address
• DATA bus: DQ, DQS, /DQS, DM (DBI)
• ADD bus: A, BA, BG, ACT, /CS, /RAS, /CAS, /WE, ODT, CKE, CK, /CK, PAR, /ALERT

[Figure: DRAM device with separate ADD and DATA buses]
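The address widths in this example are enough to reproduce the stated density. The following is a minimal sketch of that arithmetic; the function names are illustrative and not taken from any NXP or Micron tool.

```python
def dram_capacity_bits(row_bits, col_bits, bank_group_bits, bank_bits, width):
    """Total device capacity in bits from its address widths and data width."""
    rows = 1 << row_bits
    cols = 1 << col_bits
    banks = 1 << (bank_group_bits + bank_bits)
    return rows * cols * banks * width


def page_size_bytes(col_bits, width):
    """Bytes held in one open row (page) of a single device."""
    return (1 << col_bits) * width // 8


# Values from the slide: 16 row bits, 10 column bits,
# 2 bank group bits, 2 bank address bits, x8 data width.
capacity = dram_capacity_bits(16, 10, 2, 2, 8)
print(capacity // 2**30, "Gb")                    # 8 Gb
print(page_size_bytes(10, 8), "bytes per row")    # 1024 bytes per row
```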

Example – DDR4 UDIMM

• Micron MTA9ASF51272AZ
• 9 each 512M x 8 DRAM devices
• 512M x 72 overall
• 4 GB total, single “rank”
• 9 “byte lanes”

Two Signal Bus

• 1- Address, command, control, and clock signals are shared among all 9 DRAM devices
• 2- Data, strobe, data mask not shared

[Figure: nine x8 DRAM devices sharing /CSn, ODTn, A, BA, /RAS, /CAS, /WE, CKE, CK, and /CK. Each device has its own byte lane: MDQ[0:7], MDQS0, MDM0; MDQ[8:15], MDQS1, MDM1; MDQ[16:23], MDQS2, MDM2; MDQ[24:31], MDQS3, MDM3; MDQ[32:39], MDQS4, MDM4; MDQ[40:47], MDQS5, MDM5; MDQ[48:55], MDQS6, MDM6; MDQ[56:63], MDQS7, MDM7; ECC[0:7], MDQS8, MDM8]
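The module figures above follow from the device count and organization. A small sketch of the arithmetic, assuming one of the nine devices carries ECC and does not count toward the 4 GB total (the function name is illustrative):

```python
def module_summary(num_devices, device_width_bits, device_depth, ecc_devices):
    """Return (total bus width, data bus width, data capacity in bytes)."""
    total_width = num_devices * device_width_bits
    data_width = (num_devices - ecc_devices) * device_width_bits
    data_bytes = device_depth * data_width // 8
    return total_width, data_width, data_bytes


# Nine 512M x 8 devices, one of them assumed to hold ECC.
total_width, data_width, data_bytes = module_summary(9, 8, 512 * 2**20, 1)
print(total_width, "bits wide overall")      # 72
print(data_width, "data bits")               # 64
print(data_bytes // 2**30, "GB of data")     # 4
```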

DRAM Module Type

DDR3 VS. DDR4 SDRAM DIFFERENCES

DDR SDRAM Highlights and Comparison

Feature/Category | DDR3 | DDR4
Package | BGA only | BGA only
Densities | 512Mb-8Gb | 2Gb-16Gb
Voltage | DDR3: 1.5V Core & I/O; DDR3L: 1.35V Core & I/O | 1.2V Core, 1.2V I/O, also 2.5V external VPP
Data I/O | Center Tap Termination (CTT) | Pseudo Open Drain (POD)
CMD, ADDR I/O | CTT | CTT
Internal Memory Banks | 8 | 16 for x4/x8, 8 for x16
Data Rate | DDR3/3L: 800–2133/1866 Mbps | 1600–3200 Mbps
VREF | VREFCA & VREFDQ external | VREFCA external, VREFDQ internal
Data Strobes/Prefetch/Burst Length/Burst Type | Differential/8-bits/BC4, BL8/Fixed, OTF | Same as DDR3
Additive/read/write Latency | 0, CL-1, CL-2/ AL+CL/ AL+CWL | Same as DDR3

DDR SDRAM Highlights and Comparison (cont’d)

Feature/Category | DDR3 | DDR4
CRC Data Bus & C/A Parity | No | Yes (Parity is supported, but CRC is NOT supported in QorIQ)
Connectivity test (TEN pin) | No | Yes (TEN is not supported in QorIQ)
Bank Grouping | No | Yes
Data Bus Inversion | No | Yes (DBI_n pin)
Write Leveling / ZQ / Reset | Yes | Yes
ACT_n new pin & command | No | Yes
Mirroring & DQ swizzle | Yes | Yes
VREFDQ calibration | No | Yes
CMD / ADDR Latency (CAL) | No | Yes

DDR3/DDR3L/DDR4 Power Saving

• DDR3 DRAM provides 20% power savings over DDR2

• DDR3L DRAM provides 10% power savings over DDR3

• DDR4 DRAM provides 37% power savings over DDR3L
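To compare generations that are not adjacent, the per-generation figures can be chained. The sketch below assumes the savings compound multiplicatively, which the slide does not state; the result is therefore only a rough estimate.

```python
def combined_savings(*savings):
    """Chain fractional savings, assuming each applies to the remaining power."""
    remaining = 1.0
    for s in savings:
        remaining *= (1.0 - s)
    return 1.0 - remaining


# DDR3 -> DDR3L (10%) -> DDR4 (37%)
ddr4_vs_ddr3 = combined_savings(0.10, 0.37)
# DDR2 -> DDR3 (20%) -> DDR3L (10%) -> DDR4 (37%)
ddr4_vs_ddr2 = combined_savings(0.20, 0.10, 0.37)

print(f"DDR4 vs. DDR3: {ddr4_vs_ddr3:.1%}")   # 43.3%
print(f"DDR4 vs. DDR2: {ddr4_vs_ddr2:.1%}")   # 54.6%
```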

DDR3 vs. DDR4 DRAM Pinouts

• DDR4 Pins Added
− VDDQ (2): 1.2V pins to DRAM
− VPP (2): 2.5V external voltage source for DRAM internal word line driver
− BG (2): Bank Group pins to identify the bank groups
− DBI_n: Data Bus Inversion
− ACT_n: Activate command
− PAR: Parity error signal for address bus
− ALERT_n: Both parity error on C/A and CRC error on data bus
− TEN: Connectivity test mode

• DDR3 Pins Eliminated
− VREFDQ
− Bank Address (1): one less BA pin
− VDD (1), VSS (3), VSSQ (1)

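DRAM Densities DDR3 vs. DDR4

• 16 Banks for x4 and x8 DRAM DDR4, 8 Banks for x16
• 8Gb is the DRAM vendors' choice for starting DDR4 density
• Larger memory size is one reason to use x4 vs. x8 vs. x16 DRAM
• Data mask or data bus inversion (DBI) is not available in x4 DRAM

DDR3
Density | Width | Banks | Rows (address bits) | Columns (address bits) | Page Size (KB)
1Gb | x4 | 8 | 14 | 11 | 1
1Gb | x8 | 8 | 14 | 10 | 1
1Gb | x16 | 8 | 13 | 10 | 2
2Gb | x4 | 8 | 15 | 11 | 1
2Gb | x8 | 8 | 15 | 10 | 1
2Gb | x16 | 8 | 14 | 10 | 2
4Gb | x4 | 8 | 16 | 11 | 1
4Gb | x8 | 8 | 16 | 10 | 1
4Gb | x16 | 8 | 15 | 10 | 2
8Gb | x4 | 8 | 16 | 12 | 2
8Gb | x8 | 8 | 16 | 11 | 2
8Gb | x16 | 8 | 16 | 11 | 2

DDR4
Density | Width | Banks | Rows (address bits) | Columns (address bits) | Page Size (KB)
2Gb | x4 | 16 | 15 | 10 | 0.5
2Gb | x8 | 16 | 14 | 10 | 1
2Gb | x16 | 8 | 14 | 10 | 2
4Gb | x4 | 16 | 16 | 10 | 0.5
4Gb | x8 | 16 | 15 | 10 | 1
4Gb | x16 | 8 | 15 | 10 | 2
8Gb | x4 | 16 | 17 | 10 | 0.5
8Gb | x8 | 16 | 16 | 10 | 1
8Gb | x16 | 8 | 16 | 10 | 2
16Gb | x4 | 16 | 18 | 10 | 0.5
16Gb | x8 | 16 | 17 | 10 | 1
16Gb | x16 | 8 | 17 | 10 | 2

Modules DDR3 vs. DDR4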

Module Feature | DDR3 | DDR4
U/RDIMM Pin Count | 240 (1.0mm pin pitch) | 288 (0.85mm pin pitch)
Bottom Edge | Flat | Step Ramp (+ ~1mm on height and width)
DRAM ball count and ball pitch | Same ball count and ball pitch (DDR3 and DDR4)
DIMM topology | Fly-by for address/command bus (DDR3 and DDR4)
SoDIMM Pin Count | 204 | 260
SoDIMM ECC Support | Non-compatible pin out | Native (pin compatible for ECC or without ECC)

Why DDR4 Over DDR3

• Save power
− DDR4 can reduce power by up to 40%

• Run faster
− DDR4 offers double the data rate
− DDR4 doubles the number of internal banks, increasing bandwidth
− New options to increase performance

• Better reliability & manufacturing capabilities
− Connectivity test
− Data bus inversion (DBI)
− Internal VREF calibration

• Larger densities
• Longevity

DDR BRING UP ISSUES

List of products and DDR capabilities

Product | DDR type | Data bus width | Data rate | # of MC
T1023/13 | DDR3L / 4 | 32-bit + 4bit ECC | 1600 MT/s | 1
T1040/42, T1020/22, T1024/14 | DDR3L / 4 | 64-bit + 8bit ECC | 1600 MT/s | 1
T2080/81 | DDR3 / 3L | 64-bit + 8bit ECC | 2133 MT/s | 1
T4240 | DDR3 / 3L | 64-bit + 8bit ECC | 1866 MT/s | 3
LS1024 | DDR3 | 32-bit + 8bit ECC | 1066 MT/s | 1
LS1012 | DDR3L | 16-bit + 8bit ECC | 1000 MT/s | 1
LS1021/20/22, LS1043/23, LS1017/18/27/28 | DDR3L / 4 | 32-bit + 4bit ECC | 1600 MT/s | 1
LS1088/84/48/44, LS1046/26 | DDR4 | 64-bit + 8bit ECC | 2100 MT/s | 1
LS2088/all derivatives | DDR4 | 64-bit + 8bit ECC | 2133 MT/s | 2
LX2160/all derivatives | DDR4 | 64-bit + 8bit ECC | 3200 MT/s | 2

List of DDR Bring up issues: Top HW and SW DDR Issues

• Incorrect DQn_MAP setting
• WRLVL_START registers were set incorrectly
• QCVS was not used, incorrect setting used
• Erratum was not implemented
• Incorrect data rate, not matching the generated setting

Bring up:
• DRAM reset not matched to HRESET
• MDM pin, incorrect connection
• A/C layout causing ECC errors
• ACTn signal not connected
• DQS and DQS_B swapped
• Incorrect bit swapping in layout
• Manufacturing issue on 2 out of 20 boards

[Chart: DDR bring-up issues split between HW and SW (legend: “SW + Reset”, “HW”, “SW”), with segments labelled 37% and 63%]

Initialization failure

It is an initialization failure when:
1) ERR_DETECT[ACE] is set, or
2) SDRAM_CFG_2[D_INIT] does not clear
=> DDR Initialization Failed

Example:
[0x01080110] E5000000 00401011 (2: D_INIT still set in the second word)
[0x01080E40] 00000080 00000000 00000000 00000000 (1: ACE set in the first word)
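The two conditions can be checked mechanically against a register dump. This is a minimal sketch; the bit masks are assumptions inferred from the example dumps in this deck (0x00000080 in ERR_DETECT, 0x00000010 in SDRAM_CFG_2) and should be confirmed against the device reference manual.

```python
# Assumed masks, inferred from the example dumps (not from a reference manual).
ERR_DETECT_ACE = 0x00000080      # automatic calibration error
SDRAM_CFG_2_D_INIT = 0x00000010  # data initialization still in progress


def ddr_init_failed(err_detect, sdram_cfg_2):
    """True if either failure condition holds.

    sdram_cfg_2 must be sampled after initialization has had time to finish;
    D_INIT is expected to read as set while initialization is still running.
    """
    ace_set = bool(err_detect & ERR_DETECT_ACE)
    d_init_stuck = bool(sdram_cfg_2 & SDRAM_CFG_2_D_INIT)
    return ace_set or d_init_stuck


# Values from the failing example dump.
print(ddr_init_failed(0x00000080, 0x00401011))   # True
# ERR_DETECT clear and D_INIT clear.
print(ddr_init_failed(0x00000000, 0x00401001))   # False
```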

DDR Bring up HW checklist:

• Schematics review:
− Design checklist document
− Layout/HW guideline application note AN5097
− HW specs
• Check all voltages: GVDD, VREF, VTT, and VPP
• Check input and output DDR clocks
• Verify DRAM reset signal is matched to HRESET for UDIMM, SoDIMM, and discrete DRAM (AN5097 appendix B)
• Verify correct DRAM type strap
• Verify DQ pin swapping is per allowed limitation
• Have more than one board for bring up
• Check for manufacturing/fabrication/assembly issues

DDR Bring up SW checklist

• Generate the setting via QCVS: use SPD if available, otherwise Auto generation
• Select the DDR data rate based on the measured output clock
• RCW needs to be valid and correct
• Enter MCK to DQS skews in the DDR wizard
• Verify the DQn_MAP registers are correct
• Verify all related errata are implemented
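When bring-up software waits for D_INIT to clear, it can help to know roughly how long that should take. The DDR4 initialization flow in this deck estimates D_INIT time as total memory size divided by bus bandwidth, and gives 8GB at 1600Mbps on a 64-bit bus as 625ms. A minimal sketch of that estimate, using decimal gigabytes to match the slide (the function name is illustrative):

```python
def d_init_time_ms(total_bytes, data_rate_mtps, bus_width_bits):
    """Estimated time to write the whole DRAM space once, in milliseconds."""
    byte_lanes = bus_width_bits // 8
    bytes_per_second = data_rate_mtps * 1_000_000 * byte_lanes
    return total_bytes / bytes_per_second * 1000


# 8 GB (decimal), 1600 MT/s, 64-bit data bus, as in the slide's example.
print(d_init_time_ms(8 * 10**9, 1600, 64))   # 625.0
```

This is only the data-initialization portion; the slide quotes a separate 3ms to 4ms for the rest of the initialization sequence.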

DDR4 Initialization Flow

Before the controller is enabled:
• Power-up: GVDD & VPP ramped & stable (VPP ramped with or before GVDD)
• DRAM reset asserted at least 200us (reset signal controlled by board logic)
• DDR clocks stable
• Configure DDR registers
• Need at least 500us from reset de-assertion to the controller being enabled (timed loop may be needed)
• Controller started: MEM_EN = 1

Automatically handled by the controller:
• Begin CLKS (when CS_n_EN = 1)
• CKE = HIGH
• DRAMs initialized: Mode Register commands issued
• DLL locks in DRAM
• ZQ calibration: ZQCL issued (512 clocks)
• Then internal VREF is trained
• Write leveling
• Automatic CAS-to-Preamble (aka Read Leveling)
• Read adjust: per bit Data-to-Strobe centering for read cycle
• Write DRAM data bus VREF training
• Write adjust: per bit Data-to-Strobe centering for write cycle
• D_INIT, data initialized (optional)
• Init complete, ready for user accesses

Notes:
• The initialization takes between 3ms and 4ms.
• D_INIT time (the time it takes to write to the entire DRAM space) depends on total size of memory, data rate, and bus width. For example, 8GB at 1600Mbps with a 64-bit data bus will take 8GB / (1.6GB/s x 8 byte lanes) = 625ms.

How to bypass DQ mapping

• This is for debug use only.
− The following steps bypass the DQ mapping. This is a debug method to determine whether DQ mapping is causing the memory controller initialization failure, or for use when a design has violated the DQ bit swap rules in its layout.

1. Set the DDR data rate between 1000MT/s and 1200MT/s.
2. Clear all DQn_MAP registers.
3. Set DDR_SDRAM_CFG_2[DDR_SLOW] = 1.
4. Set DEBUG_2[27] = 1 (i.e. 0x1080F04 = 0x10).
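Step 4 equates bit 27 with the value 0x10, which is consistent with a bit-numbering convention where bit 0 is the most significant bit of the 32-bit register. A small sketch of that conversion, assuming this convention applies to the register in question (the helper name is illustrative):

```python
def be_bit_mask(bit, width=32):
    """Mask for a bit index where bit 0 is the most significant bit."""
    if not 0 <= bit < width:
        raise ValueError(f"bit must be in 0..{width - 1}")
    return 1 << (width - 1 - bit)


print(hex(be_bit_mask(27)))   # 0x10, matching DEBUG_2[27] in step 4
```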

General Hardware Guidelines

• Examine the DDR4 Layout Guidelines for QorIQ devices App. Note (AN5097)
• Run pre and post board simulation
− IBIS models are available for both controller and DRAM
• Employ industry standard practices
• Minimize crosstalk, ISI, Vref noise, impedance mismatches
• Eliminate return path discontinuities (RPD)
• Minimize the simultaneous switching output (SSO) effects
− Proper distribution of power and ground planes
− Proper capacitance decoupling
• Examine the reference design boards with DDR4 implemented
− Both discrete and DIMM DDR4 are available for QorIQ devices

Important HW Considerations for DDR4 Transition

• VPP supply
▪ VPP = 2.5V required for each DRAM
▪ Follow DRAM vendor specification for power/current requirements
▪ VPP ramped with or before GVDD

• VrefDQ reference input is removed

• New signals added to each DRAM
▪ ACT_n
▪ DBI
▪ PAR
▪ TEN (Pull to GND when not used)
▪ ALERT

[Figure: QorIQ with DDR3L/DDR4 memory controller]

UDIMM vs. RDIMM DDR4 Reset

• UDIMM requires CKE to be low before RESET is de-asserted.
• RDIMM requires CKE to be low and clock to be present before RESET is de-asserted.
• Details available in AN5097.

Confirmed it is not a DDR issue

When:
1. ECC is enabled, and
2. ERR_DETECT = 0x0, and
3. ERR_SBE[SBEC] = 0x0, and
4. SDRAM_CFG_2[D_INIT] = 0x0
=> No DDR failure

Example:
[0x01080110] E5000000 00401001 (1: ECC enabled; 4: D_INIT clear)
[0x01080E20] 00000000 00000000 00000000 00000000
[0x01080E30] 00000000 00000000 00000000 00000000
[0x01080E40] 00000000 00000000 00000000 00000000 (2: ERR_DETECT)
[0x01080E50] 00000000 00000000 00000000 00000000 (3: ERR_SBE)
[0x01080E60] 00000000 00000000 00000000 00000000

Memory controller ECC errors

When:
1. ERR_DETECT ≠ 0x0, or
2. ERR_SBE[SBEC] ≠ 0x0
=> DDR ECC failure

Example:
[0x01080E20] 12345678 12345678 84848484 00000000
[0x01080E30] 00000000 00000000 00000000 00000000
[0x01080E40] 00000000 00000000 00000000 00000000 (1: ERR_DETECT)
[0x01080E50] 00000000 00000000 00000040 00000000 (2: ERR_SBE)
[0x01080E60] 00000000 00000000 00000000 00000000
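The "no DDR failure" and "ECC failure" conditions can be expressed as two small checks on the dumped register values. This is a sketch only: the SBEC and D_INIT masks are assumptions inferred from the example dumps, and the function names are illustrative.

```python
# Assumed masks, inferred from the example dumps in this deck.
ERR_SBE_SBEC_MASK = 0x000000FF   # single-bit error counter field
SDRAM_CFG_2_D_INIT = 0x00000010  # data initialization in progress


def ddr_ecc_failure(err_detect, err_sbe):
    """True if any error is flagged or any single-bit error was counted."""
    return err_detect != 0 or (err_sbe & ERR_SBE_SBEC_MASK) != 0


def no_ddr_failure(ecc_enabled, err_detect, err_sbe, sdram_cfg_2):
    """True only when all four 'not a DDR issue' conditions hold."""
    return (
        ecc_enabled
        and err_detect == 0
        and (err_sbe & ERR_SBE_SBEC_MASK) == 0
        and (sdram_cfg_2 & SDRAM_CFG_2_D_INIT) == 0
    )


# Values from the ECC-error example: ERR_DETECT = 0, ERR_SBE = 0x40.
print(ddr_ecc_failure(0x00000000, 0x00000040))               # True
# Values from the clean example dump.
print(no_ddr_failure(True, 0x0, 0x0, 0x00401001))            # True
```

Without ECC enabled the second check returns False, since the error registers cannot confirm that data is intact.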

How to get ECC register dump via ccs

This is for debug use only.

1) Open a CCS window (C:\Freescale\CW4NET_v2016.01\Common\CCS\bin\ccs.exe)
2) Physical connection: USB to PC, JTAG to the customer board.
3) SW connection: in the ccs window type the commands for your device:

(for LS1043 or LS1046)
delete all
config cc cwtap
ccs::config_chain {ls1043a dap sap2}
display ccs::read_mem 32 0x1080000 4 0 1024
ccs::write_mem 32 0x1080FB0 4 0 0x10000000
display ccs::read_mem 32 0x1080000 4 0 1024

(for LS2088)
delete all
config cc cwtap
ccs::config_chain {ls2085a dap}
display ccs::read_mem 326 0x1080000 4 0 1024
ccs::write_mem 326 0x1080FB0 4 0 0x10000000
display ccs::read_mem 326 0x1080000 4 0 1024

(for T1)
delete all
config cc cwtap
ccs::config_chain t1040
display ccs::read_mem 0 0x30000 0x8000 4 2 1024
ccs::write_mem 0 0x30000 0x8FB0 4 2 0x10000000
display ccs::read_mem 0 0x30000 0x8000 4 2 1024

(for LS1088)
delete all
config cc cwtap
ccs::config_chain {ls1088a dap}
display ccs::read_mem 119 0x1080000 4 0 1024
ccs::write_mem 119 0x1080FB0 4 0 0x10000000
display ccs::read_mem 119 0x1080000 4 0 1024

(for LS1021A)
delete all
config cc cwtap
ccs::config_chain {ls1020a dap sap2}
display ccs::read_mem 17 0x1080000 4 0 1024
ccs::write_mem 17 0x1080FB0 4 0 0x10000000
display ccs::read_mem 17 0x1080000 4 0 1024

LS1024 DDR3 and LS1012 DDR3L

• LS1024:
− DDR3, 32-bit + ECC
− Follows a strict layout policy; any new board design needs to be approved
− Specific register settings will be generated by the factory

• LS1012:
− DDR3L, 16-bit, no ECC
− One chip select
− Only one x16 DRAM or two x8 DRAMs
− QCVS will generate settings plus a simple write-read-compare test

CONFIGURATIONS AND VALIDATION VIA QCVS TOOL

Optimize/Validate the DDR Interface on your Board

• The board dependent parameters are optimized by connecting to your board and running targeted tests
• After this stage, the DDR interface in your board is optimized/validated

Register Configuration

Two general types of registers are configured in the memory controller:

• The first type is set to the DRAM related parameter values that are provided via SPD or the DRAM datasheet. Over 100 register fields fall under this category.

• The second type holds the non-SPD values that are set based on the customer’s application. For example:
− On-die-termination (ODT) settings for DRAM and controller
− Driver impedance setting for DRAM and controller
− Clock adjust value selection
− Write leveling start value (WRLVL_START)

Using QCVS DDRv Tool

Configure and optimize your DDR interface in a matter of hours

1. Use the tool to generate the DDR register settings
• Use the latest revision
• Select the SPD option in configuration wizard when DIMM is used
• Select Auto Configuration when Discrete DRAM is used

2. Optimize the DDR register setting on your QorIQ board
• Run the clock centering test
• Optimize the ODT and drive strength for read and write

DDRv DEMO: https://www.nxp.com/video/configure-qoriq-ddr-in-3-minutes:QRIQ-DDR-CONFIGURATION

Generate the DDR Register Settings

• Using the DDR wizard, select the SPD option for DIMMs, or Auto configure for discrete DRAM
• Press finish and you have generated DDR register settings

DDR Interface ADD/CMD Bus Margins via QCVS Tool

• The clock signal is stepped across the address bus eye unit interval, and the tool reconstructs a pass/fail address bus eye.
▪ In the example below, the address eye passes from 1/8 to 7/8 of the clock. This is 80% of open eye from the maximum available address bus unit interval.

• The write leveling margin table provides the reconstructed pass/fail margins for each byte lane.

Data Write Cycle

[Figure: differential strobe and data waveforms (ideal condition) at the controller, across the interconnects, and at the memory]

Write Data Eye on the Scope

• QCVS shifts the strobe in from right to left in small timing steps.
• At each step a DMA write-read-compare test is performed and each cell is marked as pass or fail.
• This process is repeated for each byte lane.
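The sweep described above can be sketched as a loop that records pass or fail per timing step and then locates the widest passing window. This is an illustration of the idea only, not QCVS's implementation: `run_test` is a hypothetical callback standing in for the DMA write-read-compare test, and the step index here simply counts steps without modelling the right-to-left direction.

```python
def sweep_byte_lane(run_test, num_steps):
    """Return a pass/fail list, one entry per strobe timing step."""
    return [bool(run_test(step)) for step in range(num_steps)]


def widest_passing_window(results):
    """Return (start, length) of the longest run of passes, or None."""
    best = None
    start = None
    # A trailing False closes a run that reaches the last step.
    for i, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            length = i - start
            if best is None or length > best[1]:
                best = (start, length)
            start = None
    return best


def fake_test(step):
    """Hypothetical stand-in: this lane passes for steps 5 through 20."""
    return 5 <= step <= 20


results = sweep_byte_lane(fake_test, 32)
start, length = widest_passing_window(results)
center = start + (length - 1) // 2
print(start, length, center)   # 5 16 12
```

Repeating the sweep once per byte lane yields one row of the margin table per lane.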

Write Margin Table in QCVS Tool

Data Read Cycle

[Figure: differential strobe and data waveforms (ideal condition) at the controller, across the interconnects, and at the memory]

Read Data Eye on the Scope

• Purple: data signal
• Yellow: strobe signal
• Probe is connected close to DRAM
• Strobe is aligned with the data eye
• Setup and hold can NOT be measured
• Approximate margin can be estimated by using a required functional mask

Read Margin Table in QCVS Tool

• Blue line indicates the beginning and end of the theoretical data eye
• Estimated timing for each step = theoretical data eye / number of steps

On the left are the data eyes for each byte lane. This is available for LS2, LS1088, and LS1046.
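The per-step timing formula can be worked through with numbers. The sketch below assumes the theoretical data eye equals one unit interval (1 / data rate), and the step count of 32 is purely hypothetical; the actual number of steps comes from the tool's margin table.

```python
def unit_interval_ps(data_rate_mtps):
    """One data bit time in picoseconds (assumed theoretical data eye)."""
    return 1_000_000 / data_rate_mtps


def step_time_ps(data_rate_mtps, num_steps):
    """Estimated timing for each step = theoretical data eye / number of steps."""
    return unit_interval_ps(data_rate_mtps) / num_steps


print(unit_interval_ps(1600))    # 625.0 ps at 1600 MT/s
print(step_time_ps(1600, 32))    # 19.53125 ps per step with 32 hypothetical steps
```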

QCVS & corresponding CW

• All P, B, & T series QorIQ devices: CW for PA 10.5.1 installer, update to 10.5.2 using update site; QCVS 4.5 available using update site

• LS1021/20/22: CW4NET2017.03 installer, install CW for ARMv7 and update it to 10.0.9 using update site; QCVS 4.9.1 available using update site

• LS1024 is not supported under QCVS tool

• All other LS devices: CW4NET2018.01 installer, install CW for ARMv8 and update it to 10.3.1 using update site; QCVS 4.13 available using update site

Depending on each customer's Flexera account, the corresponding purchased SW will be available at the link: https://nxp.flexnetoperations.com/control/frse/index

Summary

The majority of customer DDR issues can be resolved by:

• Reviewing the schematics for any errors
• Verifying the DRAM reset signal is correct
• Using the QCVS tool to generate and validate DDR settings
• Verifying DQ mapping is correct
• Verifying write leveling by entering the correct CLK to DQS skew in the QCVS tool
