Memory in SystemVerilog
Prof. Stephen A. Edwards
Columbia University
Spring 2015 Implementing Memory Memory = Storage Element Array + Addressing
Bits are expensive They should dumb, cheap, small, and tighly packed
Bits are numerous Can’t just connect a long wire to each one Williams Tube
CRT-based random access memory, 1946. Used on the Manchester Mark I. 2048 bits. Mercury acoustic delay line
Used in the EDASC, 1947.
32 × 17 bits Selectron Tube
RCA, 1948.
2 × 128 bits
Four-dimensional addressing
A four-input AND gate at each bit for selection Magnetic Core
IBM, 1952. Magnetic Drum Memory
1950s & 60s. Secondary storage. Modern Memory Choices Family Programmed Persistence
Mask ROM at fabrication ∞
PROM once ∞
EPROM 1000s, UV 10 years
FLASH 1000s, block 10 years
EEPROM 1000s, byte 10 years
NVRAM ∞ 5 years
SRAM ∞ while powered
DRAM ∞ 64 ms Implementing ROMs
0 Z: “not 0/1 Bitline 2 Bitline 1 Bitline 0 connected” 0 Wordline 0 1 0 1 1
0 0 1 Wordline 1
A1 2-to-4 1 1 0 1 A0 Decoder 2 Wordline 2 1 1 1 0 0
Add. Data 3 Wordline 3 00 011 0 1 0 01 110 10 100 D2 D1 D0 11 010 Implementing ROMs
0 Z: “not 0/1 Bitline 2 Bitline 1 Bitline 0 connected” 0 0 Wordline 0 1 0 1 1
0 0 0 1 Wordline 1 1 A1 2-to-4 1 1 0 0 1 A Decoder 0 1 2 Wordline 2 1 1 1 0 0
0 Add. Data 3 Wordline 3 00 011 0 1 0 01 110 1 0 0 10 100 D2 D1 D0 11 010 Implementing ROMs
0 Z: “not 0/1 connected” 0 1
0 0 1
A1 2-to-4 1 A0 Decoder 2 1 1
Add. Data 3 00 011 01 110 10 100 D2 D1 D0 11 010 Implementing ROMs
0 Z: “not 0/1 connected” 1 0 1
0 0 1 1
A1 2-to-4 1 A Decoder 0 0 2 1 1
1 Add. Data 3 00 011 01 110 1 0 0 10 100 D2 D1 D0 11 010 Mask ROM Die Photo A Floating Gate MOSFET
Cross section of a NOR FLASH transistor. Kawai et al., ISSCC 2008 (Renesas) Floating Gate n-channel MOSFET
SiO2 Control Gate
Floating Gate
Drain Source Channel
Floating gate uncharged; Control gate at 0V: Off Floating Gate n-channel MOSFET
SiO2 Control Gate +++++++++ − − − − − − −− Floating Gate +++++++++ − − − − − − −− Drain Source Channel
Floating gate uncharged; Control gate positive: On Floating Gate n-channel MOSFET
SiO2 Control Gate ++++ − − −− Floating Gate − − −− ++++ Drain Source Channel
Floating gate negative; Control gate at 0V: Off Floating Gate n-channel MOSFET
SiO2 Control Gate ++++++++ − − − − − − − Floating Gate −− ++ Drain Source Channel
Floating gate negative; Control gate positive: Off EPROMs and FLASH use Floating-Gate MOSFETs Static Random-Access Memory Cell
Bit line Bit line
Word line Layout of a 6T SRAM Cell
! $% $%&$ !
'()
"#
Weste and Harris. Introduction to CMOS VLSI Design. Addison-Wesley, 2010. Intel’s 2102 SRAM, 1024 × 1 bit, 1972 2102 Block Diagram SRAM Timing
A12 A11 . D7 CS1 . . D6 . CS2 A2 6264 . A1 8K × 8 . A0 SRAM D1 WE CS1 D0 OE CS2 WE Addr 1 2 OE Data write 1 read 2 6264 SRAM Block Diagram
I/O0 INPUT BUFFER I/O1
A1 I/O2 A2 A3 I/O3 A4 256 x 32 x 8 A5 ARRAY I/O4 A6 A7 I/O A8 5
I/O6
POWER CE1 DOWN I/O7 CE2 COLUMN DECODER WE OE
CY6264-1 Toshiba TC55V16256J 256K × 16
A17 A16
. D15 . A2. D14 A1 . 256K × 16 . A0 . SRAM D1 UB D0 LB WE OE CE Dynamic RAM Cell
Column
Row Ancient (c. 1982) DRAM: 4164 64K × 1
A7 A6
. . A2. A1 4164 A0 64K × 1 DRAM Din Dout WE CAS RAS Basic DRAM read and write cycles
RAS CAS Addr Row Col Row Col WE Din to write Dout read Page Mode DRAM read cycle
RAS CAS Addr Row Col Col Col WE Din Dout read read read Samsung 8M × 16 SDRAM
BA1 BA0 I/O Control A11 LWE A10 Data Input Register . LDQM . Bank Select . Refresh Counter Output Buffer o eoe Col. Buffer Row Decoder A2 8M x 4 / 4M x 8 / 2M x 16 Sense AMP DQ15 Row Buffer A1 8M x 4 / 4M x 8 / 2M x 16 Address Register DQi DQ14 8M x 4 / 4M x 8 / 2M x 16 A0 CLK × . 8M x 4 / 4M x 8 / 2M x 16 8M 16 . SDRAM . ADD
UDQM DQ1 LRAS LCBR Column Decoder
LDQM DQ0 Latency & Burst Length WE LCKE Programming Register CAS LRAS LCBR LWE LCAS LWCBR LDQM RAS CS Timing Register CKE CLK CKE CS RAS CAS WE L(U)DQM CLK SDRAM: Control Signals
RAS CAS WE Action 1 1 1 NOP 0 0 0 Load mode register 0 1 1 Active (select row) 1 0 1 Read (select column, start burst) 1 0 0 Write (select column, start burst) 1 1 0 Terminate Burst 0 1 0 Precharge (deselect row) 0 0 1 Auto Refresh
Mode register: selects 1/2/4/8-word bursts, CAS latency, burst on write SDRAM: Timing with 2-word bursts
Load Active Write Read Refresh Clk
RAS
CAS
WE
Addr Op R C C
BA B B B
DQ W W R R Using Memory in SystemVerilog Basic Memory Model
Clock Address A0 A1 A1 Address Read A0 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Basic Memory Model
Clock Address A0 A1 A1 Address Write A1 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Basic Memory Model
Clock Address A0 A1 A1 Address Read A1 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Memory Is Fundamentally a Bottleneck
Plenty of bits, but
You can only see a small window each clock cycle
Using memory = scheduling memory accesses
Software hides this from you: sequential programs naturally schedule accesses
You must schedule memory accesses in a hardware design Modeling Synchronous Memory in SystemVerilog
module memory( Write enable input logic clk , input logic write , 4-bit address input logic [3:0] address , 8-bit input bus input logic [7:0] data_in , output logic [7:0] data_out);data_out 8-bit output bus logic [7:0] mem [15:0]; The memory array: 16 8-bit bytes always_ff @(posedge clk)clk begin Clocked if (write) mem[address] <= data_in;data_in data_out <= mem[address];mem[address] Write to array when asked end
endmodule Always read (old) value from array M10K Blocks in the Cyclone V
10 kilobits (10240 bits) per block Dual ported: two addresses, write enable signals Data busses can be 1–20 bits wide Our Cyclone V 5CSXFC6 has 557 of these blocks (696 KB) Memory in Quartus: the Megafunction Wizard Memory: Single- or Dual-Ported Memory: Select Port Widths Memory: One or Two Clocks Memory: Output Ports Need Not Be Registered Memory: Wizard-Generated Verilog Module
This generates the following SystemVerilog module:
module memory ( // PortA: input logic [12:0] address_a, // 8192 1-bit words input logic clock_a, input logic [0:0] data_a, input logic wren_a, // Write enable output logic [0:0] q_a, // PortB: input logic [8:0] address_b, // 512 16-bit words input logic clock_b, input logic [15:0] data_b, input logic wren_b, // Write enable output logic [15:0] q_b);
Instantiate like any module; Quartus treats specially Two Ways to Ask for Memory
1. Use the Megafunction Wizard + Warns you in advance about resource usage − Awkward to change 2. Let Quartus infer memory from your code + Better integrated with your code − Easy to inadvertantly ask for garbage The Perils of Memory Inference Failure: Exploded! module twoport( Synthesized to an 854-page input logic clk, input logic [8:0] aa, ab, schematic with 10280 input logic [19:0] da, db, registers (no M10K blocks) input logic wa, wb, output logic [19:0] qa, qb); Page 1 looked like this:
logic [19:0] mem [511:0];
always_ff @(posedge clk) begin if (wa) mem[aa] <= da; qa <= mem[aa]; if (wb) mem[ab] <= db; qb <= mem[ab]; end
endmodule The Perils of Memory Inference Failure module twoport2( input logic clk, input logic [8:0] aa, ab, Still didn’t work: input logic [19:0] da, db, input logic wa, wb, output logic [19:0] qa, qb); RAM logic “mem” is uninferred due to logic [19:0] mem [511:0]; unsupported always_ff @(posedge clk) begin read-during-write behavior if (wa) mem[aa] <= da; qa <= mem[aa]; end
always_ff @(posedge clk) begin if (wb) mem[ab] <= db; qb <= mem[ab]; end
endmodule The Perils of Memory Inference Finally! module twoport3( input logic clk, Took this structure from a input logic [8:0] aa, ab, template: Edit→Insert input logic [19:0] da, db, input logic wa, wb, Template→Verilog HDL→Full output logic [19:0] qa, qb); Designs→RAMs and logic [19:0] mem [511:0]; ROMs→True Dual-Port RAM (single clock) always_ff @(posedge clk) begin
clk qa[0]~reg[19..0] if (wa) begin mem DATAOUT[19..0] D Q qa[19..0] mem[aa] <= da; PORTBDATAOUT[0] CLK PORTBDATAOUT[1] qa <= da; PORTBDATAOUT[2] PORTBDATAOUT[3] qb[0]~reg[19..0] end else qa <= mem[aa]; CLK0 PORTBDATAOUT[4] D Q qb[19..0] da[19..0] DATAIN[19..0] PORTBDATAOUT[5] CLK end PORTBCLK0 PORTBDATAOUT[6] db[19..0] PORTBDATAIN[19..0] PORTBDATAOUT[7] ab[8..0] PORTBRADDR[8..0] PORTBDATAOUT[8] always_ff @(posedge clk) begin PORTBWADDR[8..0] PORTBDATAOUT[9] wb PORTBWE PORTBDATAOUT[10] if (wb) begin aa[8..0] RADDR[8..0] PORTBDATAOUT[11] WADDR[8..0] PORTBDATAOUT[12] mem[ab] <= db; wa WE PORTBDATAOUT[13] PORTBDATAOUT[14] qb <= db; PORTBDATAOUT[15] PORTBDATAOUT[16] end else qb <= mem[ab]; PORTBDATAOUT[17] PORTBDATAOUT[18] end PORTBDATAOUT[19] SYNC_RAM endmodule The Perils of Memory Inference Also works: separate read module twoport4( and write addresses input logic clk, input logic [8:0] ra, wa, q[0]~reg[19..0] D Q q[19..0] input logic write, clk CLK mem input logic [19:0] d, CLK0 output logic [19:0] q); d[19..0] DATAIN[19..0] ra[8..0] RADDR[8..0] DATAOUT[19..0] wa[8..0] WADDR[8..0] logic [19:0] mem [511:0]; write WE SYNC_RAM always_ff @(posedge clk) begin if (write) mem[wa] <= d; Conclusion: q <= mem[ra]; end Inference is fine for single port or one read and one endmodule write port. Use the Megafunction Wizard for anything else.