Memory in Systemverilog
Total Page:16
File Type:pdf, Size:1020Kb
Memory in SystemVerilog Prof. Stephen A. Edwards Columbia University Spring 2015 Implementing Memory Memory = Storage Element Array + Addressing Bits are expensive They should dumb, cheap, small, and tighly packed Bits are numerous Can’t just connect a long wire to each one Williams Tube CRT-based random access memory, 1946. Used on the Manchester Mark I. 2048 bits. Mercury acoustic delay line Used in the EDASC, 1947. 32 × 17 bits Selectron Tube RCA, 1948. 2 × 128 bits Four-dimensional addressing A four-input AND gate at each bit for selection Magnetic Core IBM, 1952. Magnetic Drum Memory 1950s & 60s. Secondary storage. Modern Memory Choices Family Programmed Persistence Mask ROM at fabrication 1 PROM once 1 EPROM 1000s, UV 10 years FLASH 1000s, block 10 years EEPROM 1000s, byte 10 years NVRAM 1 5 years SRAM 1 while powered DRAM 1 64 ms Implementing ROMs 0 Z: “not 0/1 Bitline 2 Bitline 1 Bitline 0 connected” 0 Wordline 0 1 0 1 1 0 0 1 Wordline 1 A1 2-to-4 1 1 0 1 A0 Decoder 2 Wordline 2 1 1 1 0 0 Add. Data 3 Wordline 3 00 011 0 1 0 01 110 10 100 D2 D1 D0 11 010 Implementing ROMs 0 Z: “not 0/1 Bitline 2 Bitline 1 Bitline 0 connected” 0 0 Wordline 0 1 0 1 1 0 0 0 1 Wordline 1 1 A1 2-to-4 1 1 0 0 1 A Decoder 0 1 2 Wordline 2 1 1 1 0 0 0 Add. Data 3 Wordline 3 00 011 0 1 0 01 110 1 0 0 10 100 D2 D1 D0 11 010 Implementing ROMs 0 Z: “not 0/1 connected” 0 1 0 0 1 A1 2-to-4 1 A0 Decoder 2 1 1 Add. Data 3 00 011 01 110 10 100 D2 D1 D0 11 010 Implementing ROMs 0 Z: “not 0/1 connected” 1 0 1 0 0 1 1 A1 2-to-4 1 A Decoder 0 0 2 1 1 1 Add. Data 3 00 011 01 110 1 0 0 10 100 D2 D1 D0 11 010 Mask ROM Die Photo A Floating Gate MOSFET Cross section of a NOR FLASH transistor. Kawai et al., ISSCC 2008 (Renesas) Floating Gate n-channel MOSFET SiO2 Control Gate Floating Gate Drain Source Channel Floating gate uncharged; Control gate at 0V: Off Floating Gate n-channel MOSFET SiO2 Control Gate +++++++++ − − − − − − −− Floating Gate +++++++++ − − − − − − −− Drain Source Channel Floating gate uncharged; Control gate positive: On Floating Gate n-channel MOSFET SiO2 Control Gate ++++ − − −− Floating Gate − − −− ++++ Drain Source Channel Floating gate negative; Control gate at 0V: Off Floating Gate n-channel MOSFET SiO2 Control Gate ++++++++ − − − − − − − Floating Gate −− ++ Drain Source Channel Floating gate negative; Control gate positive: Off EPROMs and FLASH use Floating-Gate MOSFETs Static Random-Access Memory Cell Bit line Bit line Word line Layout of a 6T SRAM Cell ! $% $%&$ ! '() "# Weste and Harris. Introduction to CMOS VLSI Design. Addison-Wesley, 2010. Intel’s 2102 SRAM, 1024 × 1 bit, 1972 2102 Block Diagram SRAM Timing A12 A11 . D7 CS1 . D6 . CS2 A2 6264 . A1 8K × 8 . A0 SRAM D1 WE CS1 D0 OE CS2 WE Addr 1 2 OE Data write 1 read 2 6264 SRAM Block Diagram I/O0 INPUT BUFFER I/O1 A1 I/O2 A2 A3 I/O3 A4 256 x 32 x 8 A5 ARRAY I/O4 A6 A7 I/O A8 5 I/O6 POWER CE1 DOWN I/O7 CE2 COLUMN DECODER WE OE CY6264-1 Toshiba TC55V16256J 256K × 16 A17 A16 . D15 . A2. D14 A1 . 256K × 16 . A0 . SRAM D1 UB D0 LB WE OE CE Dynamic RAM Cell Column Row Ancient (c. 1982) DRAM: 4164 64K × 1 A7 A6 . A2. A1 4164 A0 64K × 1 DRAM Din Dout WE CAS RAS Basic DRAM read and write cycles RAS CAS Addr Row Col Row Col WE Din to write Dout read Page Mode DRAM read cycle RAS CAS Addr Row Col Col Col WE Din Dout read read read Samsung 8M × 16 SDRAM BA1 BA0 I/O Control A11 LWE A10 Data Input Register . LDQM . Bank Select . Refresh Counter Output Buffer Output Row Decoder Col. Buffer A2 8M x 4 / 4M x 8 / 2M x 16 Sense AMP DQ15 Row Buffer A1 8M x 4 / 4M x 8 / 2M x 16 Address Register Address DQi DQ14 8M x 4 / 4M x 8 / 2M x 16 A0 CLK × . 8M x 4 / 4M x 8 / 2M x 16 8M 16 . SDRAM . ADD UDQM DQ1 LRAS LCBR Column Decoder LDQM DQ0 Latency & Burst Length WE LCKE Programming Register CAS LRAS LCBR LWE LCAS LWCBR LDQM RAS CS Timing Register CKE CLK CKE CS RAS CAS WE L(U)DQM CLK SDRAM: Control Signals RAS CAS WE Action 1 1 1 NOP 0 0 0 Load mode register 0 1 1 Active (select row) 1 0 1 Read (select column, start burst) 1 0 0 Write (select column, start burst) 1 1 0 Terminate Burst 0 1 0 Precharge (deselect row) 0 0 1 Auto Refresh Mode register: selects 1/2/4/8-word bursts, CAS latency, burst on write SDRAM: Timing with 2-word bursts Load Active Write Read Refresh Clk RAS CAS WE Addr Op R C C BA B B B DQ W W R R Using Memory in SystemVerilog Basic Memory Model Clock Address A0 A1 A1 Address Read A0 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Basic Memory Model Clock Address A0 A1 A1 Address Write A1 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Basic Memory Model Clock Address A0 A1 A1 Address Read A1 Data In Data Out Memory Data In D1 Write Write Clock Data Out D0 old D1 D1 Memory Is Fundamentally a Bottleneck Plenty of bits, but You can only see a small window each clock cycle Using memory = scheduling memory accesses Software hides this from you: sequential programs naturally schedule accesses You must schedule memory accesses in a hardware design Modeling Synchronous Memory in SystemVerilog module memory( Write enable input logic clk , input logic write , 4-bit address input logic [3:0] address , 8-bit input bus input logic [7:0] data_in , output logic [7:0] data_out);data_out 8-bit output bus logic [7:0] mem [15:0]; The memory array: 16 8-bit bytes always_ff @(posedge clk)clk begin Clocked if (write) mem[address] <= data_in;data_in data_out <= mem[address];mem[address] Write to array when asked end endmodule Always read (old) value from array M10K Blocks in the Cyclone V 10 kilobits (10240 bits) per block Dual ported: two addresses, write enable signals Data busses can be 1–20 bits wide Our Cyclone V 5CSXFC6 has 557 of these blocks (696 KB) Memory in Quartus: the Megafunction Wizard Memory: Single- or Dual-Ported Memory: Select Port Widths Memory: One or Two Clocks Memory: Output Ports Need Not Be Registered Memory: Wizard-Generated Verilog Module This generates the following SystemVerilog module: module memory ( // PortA: input logic [12:0] address_a, // 8192 1-bit words input logic clock_a, input logic [0:0] data_a, input logic wren_a, // Write enable output logic [0:0] q_a, // PortB: input logic [8:0] address_b, // 512 16-bit words input logic clock_b, input logic [15:0] data_b, input logic wren_b, // Write enable output logic [15:0] q_b); Instantiate like any module; Quartus treats specially Two Ways to Ask for Memory 1. Use the Megafunction Wizard + Warns you in advance about resource usage − Awkward to change 2. Let Quartus infer memory from your code + Better integrated with your code − Easy to inadvertantly ask for garbage The Perils of Memory Inference Failure: Exploded! module twoport( Synthesized to an 854-page input logic clk, input logic [8:0] aa, ab, schematic with 10280 input logic [19:0] da, db, registers (no M10K blocks) input logic wa, wb, output logic [19:0] qa, qb); Page 1 looked like this: logic [19:0] mem [511:0]; always_ff @(posedge clk) begin if (wa) mem[aa] <= da; qa <= mem[aa]; if (wb) mem[ab] <= db; qb <= mem[ab]; end endmodule The Perils of Memory Inference Failure module twoport2( input logic clk, input logic [8:0] aa, ab, Still didn’t work: input logic [19:0] da, db, input logic wa, wb, output logic [19:0] qa, qb); RAM logic “mem” is uninferred due to logic [19:0] mem [511:0]; unsupported always_ff @(posedge clk) begin read-during-write behavior if (wa) mem[aa] <= da; qa <= mem[aa]; end always_ff @(posedge clk) begin if (wb) mem[ab] <= db; qb <= mem[ab]; end endmodule The Perils of Memory Inference Finally! module twoport3( input logic clk, Took this structure from a input logic [8:0] aa, ab, template: Edit!Insert input logic [19:0] da, db, input logic wa, wb, Template!Verilog HDL!Full output logic [19:0] qa, qb); Designs!RAMs and logic [19:0] mem [511:0]; ROMs!True Dual-Port RAM (single clock) always_ff @(posedge clk) begin clk qa[0]~reg[19..0] if (wa) begin mem DATAOUT[19..0] D Q qa[19..0] mem[aa] <= da; PORTBDATAOUT[0] CLK PORTBDATAOUT[1] qa <= da; PORTBDATAOUT[2] PORTBDATAOUT[3] qb[0]~reg[19..0] end else qa <= mem[aa]; CLK0 PORTBDATAOUT[4] D Q qb[19..0] da[19..0] DATAIN[19..0] PORTBDATAOUT[5] CLK end PORTBCLK0 PORTBDATAOUT[6] db[19..0] PORTBDATAIN[19..0] PORTBDATAOUT[7] ab[8..0] PORTBRADDR[8..0] PORTBDATAOUT[8] always_ff @(posedge clk) begin PORTBWADDR[8..0] PORTBDATAOUT[9] wb PORTBWE PORTBDATAOUT[10] if (wb) begin aa[8..0] RADDR[8..0] PORTBDATAOUT[11] WADDR[8..0] PORTBDATAOUT[12] mem[ab] <= db; wa WE PORTBDATAOUT[13] PORTBDATAOUT[14] qb <= db; PORTBDATAOUT[15] PORTBDATAOUT[16] end else qb <= mem[ab]; PORTBDATAOUT[17] PORTBDATAOUT[18] end PORTBDATAOUT[19] SYNC_RAM endmodule The Perils of Memory Inference Also works: separate read module twoport4( and write addresses input logic clk, input logic [8:0] ra, wa, q[0]~reg[19..0] D Q q[19..0] input logic write, clk CLK mem input logic [19:0] d, CLK0 output logic [19:0] q); d[19..0] DATAIN[19..0] ra[8..0] RADDR[8..0] DATAOUT[19..0] wa[8..0] WADDR[8..0] logic [19:0] mem [511:0]; write WE SYNC_RAM always_ff @(posedge clk) begin if (write) mem[wa] <= d; Conclusion: q <= mem[ra]; end Inference is fine for single port or one read and one endmodule write port.