Memory in SystemVerilog

Prof. Stephen A. Edwards

Columbia University

Spring 2020 Implementing Memory Memory = Storage Element Array + Addressing

Bits are expensive They should dumb, cheap, small, and tighly packed

Bits are numerous Can’t just connect a long to each one

CRT-based random access memory, 1946. Used on the Manchester Mark I. 2048 bits. Mercury acoustic delay line

Used in the EDASC, 1947.

32 × 17 bits Selectron Tube

RCA, 1948.

2 × 128 bits

Four-dimensional addressing

A four-input AND gate at each bit for selection Magnetic Core

IBM, 1952. Magnetic

1950s & 60s. Secondary storage. Modern Memory Choices Family Programmed Persistence

Mask ROM at fabrication ∞

PROM once ∞

EPROM 1000s, UV 10 years

FLASH 1000s, 10 years

EEPROM 1000s, byte 10 years

NVRAM ∞ 5 years

SRAM ∞ while powered

DRAM ∞ 64 ms Implementing ROMs

0 Z: “not 0/1 Bitline 2 Bitline 1 Bitline 0 connected” 0 Wordline 0 1 0 1 1

0 0 1 Wordline 1

A1 2-to-4 1 1 0 1 A0 Decoder 2 Wordline 2 1 1 1 0 0

Add. 3 Wordline 3 00 011 0 1 0 01 110 10 100 D2 D1 D0 11 010 Implementing ROMs

0 Z: “not 0/1 Bitline 2 Bitline 1 Bitline 0 connected” 0 0 Wordline 0 1 0 1 1

0 0 0 1 Wordline 1 1 A1 2-to-4 1 1 0 0 1 A Decoder 0 1 2 Wordline 2 1 1 1 0 0

0 Add. Data 3 Wordline 3 00 011 0 1 0 01 110 1 0 0 10 100 D2 D1 D0 11 010 Implementing ROMs

0 Z: “not 0/1 connected” 0 1

0 0 1

A1 2-to-4 1 A0 Decoder 2 1 1

Add. Data 3 00 011 01 110 10 100 D2 D1 D0 11 010 Implementing ROMs

0 Z: “not 0/1 connected” 1 0 1

0 0 1 1

A1 2-to-4 1 A Decoder 0 0 2 1 1

1 Add. Data 3 00 011 01 110 1 0 0 10 100 D2 D1 D0 11 010 Mask ROM Die Photo A Floating Gate MOSFET

Cross section of a NOR FLASH . Kawai et al., ISSCC 2008 (Renesas) Floating Gate n-channel MOSFET

SiO2 Control Gate

Floating Gate

Drain Source Channel

Floating gate uncharged; Control gate at 0V: Off Floating Gate n-channel MOSFET

SiO2 Control Gate +++++++++ − − − − − − −− Floating Gate +++++++++ − − − − − − −− Drain Source Channel

Floating gate uncharged; Control gate positive: On Floating Gate n-channel MOSFET

SiO2 Control Gate ++++ − − −− Floating Gate − − −− ++++ Drain Source Channel

Floating gate negative; Control gate at 0V: Off Floating Gate n-channel MOSFET

SiO2 Control Gate ++++++++ − − − − − − − Floating Gate −− ++ Drain Source Channel

Floating gate negative; Control gate positive: Off and FLASH use Floating-Gate Static Random-Access

Bit line Bit line

Word line Layout of a 6T SRAM Cell



! $% $%&$ !



'()

"# 

Weste and Harris. Introduction to CMOS VLSI Design. Addison-Wesley, 2010. Intel’s 2102 SRAM, 1024 × 1 bit, 1972 2102 Block Diagram SRAM Timing

A12 A11 . D7 CS1 . . D6 . CS2 A2 6264 . A1 8K × 8 . A0 SRAM D1 WE CS1 D0 OE CS2 WE Addr 1 2 OE Data write 1 read 2 6264 SRAM Block Diagram

I/O0 INPUT BUFFER I/O1

A1 I/O2 A2 A3 I/O3 A4 256 x 32 x 8 A5 ARRAY I/O4 A6 A7 I/O A8 5

I/O6

POWER CE1 DOWN I/O7 CE2 COLUMN DECODER WE OE

CY6264-1 Toshiba TC55V16256J 256K × 16

A17 A16

. D15 . A2. D14 A1 . 256K × 16 . A0 . SRAM D1 UB D0 LB WE OE CE Dynamic RAM Cell

Column

Row Ancient (c. 1982) DRAM: 4164 64K × 1

A7 A6 . . . A2 A1 4164 A0 64K × 1 DRAM Din Dout WE CAS RAS Basic DRAM read and write cycles

RAS CAS Addr Row Col Row Col WE Din to write Dout read Page Mode DRAM read cycle

RAS CAS Addr Row Col Col Col WE Din Dout read read read Samsung 8M × 16 SDRAM

BA1 BA0 I/O Control A11 LWE A10 Data Input Register . LDQM . Bank Select . Refresh Counter Output Buffer o eoe Col. Buffer Row Decoder A2 8M x 4 / 4M x 8 / 2M x 16 Sense AMP DQ15 Row Buffer A1 8M x 4 / 4M x 8 / 2M x 16 Address Register DQi DQ14 8M x 4 / 4M x 8 / 2M x 16 A0 CLK × . 8M x 4 / 4M x 8 / 2M x 16 8M 16 . SDRAM . ADD

UDQM DQ1 LRAS LCBR Column Decoder

LDQM DQ0 Latency & Burst Length WE LCKE Programming Register CAS LRAS LCBR LWE LCAS LWCBR LDQM RAS CS Timing Register CKE CLK CKE CS RAS CAS WE L(U)DQM CLK SDRAM: Control Signals

RAS CAS WE Action 1 1 1 NOP 0 0 0 Load mode register 0 1 1 Active (select row) 1 0 1 Read (select column, start burst) 1 0 0 Write (select column, start burst) 1 1 0 Terminate Burst 0 1 0 Precharge (deselect row) 0 0 1 Auto Refresh

Mode register: selects 1/2/4/8-word bursts, CAS latency, burst on write SDRAM: Timing with 2-word bursts

Load Active Write Read Refresh Clk

RAS

CAS

WE

Addr Op R C C

BA B B B

DQ W W R R Using Memory in SystemVerilog Synchronous SRAM

Clock Address A0 A1 A1 Address Read A0 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Synchronous SRAM

Clock Address A0 A1 A1 Address Write A1 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Synchronous SRAM

Clock Address A0 A1 A1 Address Read A1 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Memory Is Fundamentally a Bottleneck

Plenty of bits, but

You can only see a small window each clock cycle

Using memory = scheduling memory accesses

Software hides this from you: sequential programs naturally schedule accesses

You must schedule memory accesses in a hardware design Modeling Synchronous Memory in SystemVerilog

module memory( Write enable input logic clk , input logic write , 4-bit address input logic [3:0] address , 8-bit input bus input logic [7:0] data_in , output logic [7:0] data_out);data_out 8-bit output bus logic [7:0] mem [15:0]; The memory array: 16 8-bit bytes always_ff @(posedge clk)clk begin Clocked if (write) mem[address] <= data_in;data_in data_out <= mem[address];mem[address] Write to array when asked end

endmodule Always read (old) value from array M10K Blocks in the Cyclone V

10 kilobits (10240 bits) per block Dual ported: two addresses, write enable signals Data busses can be 1–20 bits wide Our Cyclone 5CSEMA5 has 397 of these blocks = 496 KB Memory in Quartus: the Megafunction Wizard Memory: Single- or Dual-Ported Memory: Select Port Widths Memory: One or Two Clocks Memory: Output Ports Need Not Be Registered Memory: Wizard-Generated Verilog Module

This generates the following SystemVerilog module:

module memory ( // PortA: input logic [12:0] address_a, // 8192 1-bit words input logic clock_a, input logic [0:0] data_a, input logic wren_a, // Write enable output logic [0:0] q_a, // PortB: input logic [8:0] address_b, // 512 16-bit words input logic clock_b, input logic [15:0] data_b, input logic wren_b, // Write enable output logic [15:0] q_b);

Instantiate like any module; Quartus treats specially Two Ways to Ask for Memory

1. Use the Megafunction Wizard + Warns you in advance about resource usage − Awkward to change 2. Let Quartus infer memory from your code + Better integrated with your code − Easy to inadvertantly ask for garbage The Perils of Memory Inference Failure: Exploded! module twoport( Synthesized to an 854-page input logic clk, input logic [8:0] aa, ab, schematic with 10280 input logic [19:0] da, db, registers (no M10K blocks) input logic wa, wb, output logic [19:0] qa, qb); Page 1 looked like this:

logic [19:0] mem [511:0];

always_ff @(posedge clk) begin if (wa) mem[aa] <= da; qa <= mem[aa]; if (wb) mem[ab] <= db; qb <= mem[ab]; end

endmodule The Perils of Memory Inference Failure module twoport2( input logic clk, input logic [8:0] aa, ab, Still didn’t work: input logic [19:0] da, db, input logic wa, wb, output logic [19:0] qa, qb); RAM logic “mem” is uninferred due to logic [19:0] mem [511:0]; unsupported always_ff @(posedge clk) begin read-during-write behavior if (wa) mem[aa] <= da; qa <= mem[aa]; end

always_ff @(posedge clk) begin if (wb) mem[ab] <= db; qb <= mem[ab]; end

endmodule The Perils of Memory Inference Finally! module twoport3( input logic clk, Took this structure from a input logic [8:0] aa, ab, template: Edit→Insert input logic [19:0] da, db, input logic wa, wb, Template→Verilog HDL→Full output logic [19:0] qa, qb); Designs→RAMs and logic [19:0] mem [511:0]; ROMs→True Dual-Port RAM (single clock) always_ff @(posedge clk) begin

clk qa[0]~reg[19..0] if (wa) begin mem DATAOUT[19..0] D Q qa[19..0] mem[aa] <= da; PORTBDATAOUT[0] CLK PORTBDATAOUT[1] qa <= da; PORTBDATAOUT[2] PORTBDATAOUT[3] qb[0]~reg[19..0] end else qa <= mem[aa]; CLK0 PORTBDATAOUT[4] D Q qb[19..0] da[19..0] DATAIN[19..0] PORTBDATAOUT[5] CLK end PORTBCLK0 PORTBDATAOUT[6] db[19..0] PORTBDATAIN[19..0] PORTBDATAOUT[7] ab[8..0] PORTBRADDR[8..0] PORTBDATAOUT[8] always_ff @(posedge clk) begin PORTBWADDR[8..0] PORTBDATAOUT[9] wb PORTBWE PORTBDATAOUT[10] if (wb) begin aa[8..0] RADDR[8..0] PORTBDATAOUT[11] WADDR[8..0] PORTBDATAOUT[12] mem[ab] <= db; wa WE PORTBDATAOUT[13] PORTBDATAOUT[14] qb <= db; PORTBDATAOUT[15] PORTBDATAOUT[16] end else qb <= mem[ab]; PORTBDATAOUT[17] PORTBDATAOUT[18] end PORTBDATAOUT[19] SYNC_RAM endmodule The Perils of Memory Inference Also works: separate read module twoport4( and write addresses input logic clk, input logic [8:0] ra, wa, q[0]~reg[19..0] D Q q[19..0] input logic write, clk CLK mem input logic [19:0] d, CLK0 output logic [19:0] q); d[19..0] DATAIN[19..0] ra[8..0] RADDR[8..0] DATAOUT[19..0] wa[8..0] WADDR[8..0] logic [19:0] mem [511:0]; write WE SYNC_RAM always_ff @(posedge clk) begin if (write) mem[wa] <= d; Conclusion: q <= mem[ra]; end Inference is fine for single port or one read and one endmodule write port. Use the Megafunction Wizard for anything else.