Via-Programmable Read-Only Memory Design for Full Code Coverage Using A Dynamic Bit-Line Shielding Technique

Meng-Fan Chang1, 2, Ding-Ming Kwai2, and Kuei-Ann Wen1 1 Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan 2 Intellectual Property Library Company, Hsinchu, Taiwan [email protected]

Abstract 0-cell 1-cell 0-cell

Crosstalk between bit lines leads to read-1 failure Metal-3 (BL) in a high-speed via-programmable read only memory (ROM) and limits the coverage of applicable code Via-2 (Code) patterns. Due to the fluctuations in bit-line intrinsic Metal-2 and coupling capacitances, the amount of noise coupled to a selected bit line may vary, resulting in the Via-1 reduction of sensing margin. In this paper, we propose a dynamic bit-line shielding (DBS) technique, suitable Metal-1 to be implemented in compliable ROM, to eliminate the Contact crosstalk-induced read failure and to achieve full code coverage. Experiments of the 256Kb instances with Diffusion Diffusion Diffusion and without the DBS circuit were undertaken using STI STI STI STI 0.25µm and 0.18µm standard CMOS processes. The test results demonstrate the read-1 failures and Fig. 1. Cross-sectional view of ROM array with via-2 confirm that the DBS technique can remove them codes and metal-3 bit lines. successfully, allowing the ROM to operate under a wide range of supply voltage. sectional view of the memory array with via-2 codes 1. Introduction and metal-3 bit lines. This leads to more severe bit-line crosstalk, since the contact and via at the lower levels Mask-programmable read-only memory (ROM) must be stacked and the metal islands for them to land macros are commonly embedded into system-on-chip upon contribute considerable side-wall capacitances to (SoC) designs today. To meet the instance neighboring bit lines. requirements for a variety of configurations, ROM In a configurable design, where the memory array is compilers targeting at CMOS baseline processes have partitioned into at most two banks, the length of a bit been developed and the available code layers include line is determined by the choice of column multiplexers, diffusion, metal, and via. Compared to those coded by such as 16, 32, and 64. Code patterns on a via- diffusion and metal layers, via-programmable ROMs programmable ROM can produce wide variations in shorten the turnaround time in manufacturing [1]. bit-line loading and coupling capacitances. To achieve Consequently, it is preferred that the coding via can be a high speed, the word-line pulse width is short, and the promoted to an upper level. sensing margin becomes small and vulnerable to noise. We have seen synchronous ROM compilers The crosstalk-induced read failure limits the coverage designed to be coded by via-1 at the 0.25µm generation of applicable code patterns. and by via-2 at the 0.18µm generation. However, the To reduce power consumption and to improve speed coding via at an upper level implies that the bit lines performance of ROMs, previous works employed must be promoted as well. Fig. 1 shows a cross- schemes, such as precharge-discharge dynamic CMOS

Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE logic [2], charge recycling [3], restricted bit-line swing 0-cell [4]-[6], and divided wordlines [7]-[9]. These studies have not addressed their solutions to the crosstalk- induced read failure across different code patterns. In Code (Via) this paper, we investigate this problem and present a dynamic bit-line shielding (DBS) technique to achieve WL (Poly) 100% code coverage for high-speed via-programmable ROMs. To our knowledge, this is the first to deal with VSS the code-dependent read failure induced by crosstalk (Diffusion) between bit lines. The remainder of our presentation is organized as follows. Section 2 analyzes the behavior of the code- dependent crosstalk-induced read failure. Section 3 BL (Metal) presents the proposed dynamic bit-line shielding technique. Section 4 gives the experimental results on 0.25µm via-1 and 0.18µm via-2 programmable ROMs. 1-cell Section 5 contains our conclusion. Fig.2. The circuit and layout of a cell array in via- 2. Crosstalk Induced Read Failure programmable ROMs. The word line (WL), ground line (VSS), and bit line (BL) are implemented by poly, 2.1. Sensing Margin diffusion, via, and metal, respectively.

In typical NOR-type via-programmable ROMs, the stored in a bit cell is determined by whether the reading a 1-cell is VDD – 'V1 – VREF. To detect the data electrical connection between the transistor of the bit correctly, the voltage drops 'V0 and 'V1 must satisfy cell and the bit line is present or not. As illustrated in the inequalities 'V0 > VDD – VREF and 'V1 < VDD – VREF Fig. 2, a bit cell with its transistor connected to the bit simultaneously by noting that both SM0 and SM1 must line (BL) stores a 0 (0-cell), and a bit cell without such be larger than zero under the parametric variations. a connection stores a 1 (1-cell). When a row is Thus, the minimum of 'V0, denoted as 'V0min, and the activated by applying a high voltage to the relevant maximum of 'V1, denoted as 'V1max, deserve our word line (WL), the 0-cell sinks current from the bit attention, since 'V0min – 'V1max = SM0 + SM1. In other line, while the 1-cell does not sink any current. The 0- words, once 'V0min and 'V1max are determined, no cell also adds the parasitic capacitance on the drain matter how we adjust VREF, the total sensing margin side of its transistor to the bit line. cannot be improved. Correspondingly, the capacitive loading on a bit line may vary by different code patterns. The bit line of a 2.2. Crosstal-Induced Read Failure in ROMs column whose cells are all 0-cells experiences the largest loading effect. This fluctuation in parasitic Failure to sense a 1-cell correctly is one of the major capacitance causes various voltage swings on a bit line consequences when crosstalk between bit lines occurs. when a 0-cell is accessed. Fig. 3 shows the behaviors without and with crosstalk In a conventional synchronous ROM design, all bit between bit lines in a 256Kb ROM macro. The lines are precharged to the power supply voltage VDD memory array is configured as 512 u 512 in a single prior to the sensing phase of a cycle [5]. During the bank. Without crosstalk in Fig. 3(a), the bit-line sensing phase, a bit line is either discharged or floating. waveforms are BL-A1 and BL-A2 when reading a 0- Here, we assume that the bit line develops a voltage cell with light (with 511 1-cells) and heavy (with 512 swing of 'V0 when reading a 0-cell, and that of 'V1 0-cells) bit-line loads, and BL-D when reading a 1-cell. when reading a 1-cell. To differentiate a 0-cell from a Recall that a configurable design activates all bit 1-cell, the sense amplifier then needs a reference cells on the same row by a word line, while reading out voltage VREF whose value must be set between VDD – only those on the bit lines selected by the column 'V0 and VDD – 'V1. The sensing scheme with a multiplexers. A 0-cell of an unselected bit line, which predefined VREF [10] is employed in this work. shares the same word line, becomes an aggressor. Its

Clearly, the sensing margin SM0 for reading a 0-cell voltage drop 'V0 generates various amounts of is VREF – VDD + 'V0 and the sensing margin SM1 for coupling noise on the selected bit line, as we notice that

Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE BL-D Sense a 1-cell BL-V2 BL[j-2] BL[j-1] BL[j] BL[j+1] BL[j+2] WL [j]

VREF

CC[j-2,j-1] CC[j-1,j] CC[j, j+1] CC[j+1, j+2] ICELL BL-A2 BL-A2 Sense a WL 0-cell WL CBL[j] BL-V1 Read-1 Failure Precharge 1-cell BL-A1

(a) (b) 0-cell

Fig. 3. Simulated waveforms of a 512-cell bit line in a Y[j-1] Y[j] Y[j+1] conventional 0.25µm via-1 ROM design (a) without and (b) with crosstalk. Sense Amplifier

in Fig. 3(a), 'V0 can vary widely from a full VDD swing to a few hundred millivolts. Fig. 4. A simplified structure of bit-lines and In Fig. 3(b), two such aggressors (i.e., 0-cells) are multiplexing circuit in a conventional ROM. the nearest neighbors. With light bit-line load, a significant voltage drop occurs which causes the read failure of 1-cell. The voltage drop 'V1 on the victim word line WL[i] is turned on. Voltage swings on BL[j – bit line BL-V1 for reading a 1-cell is larger than VDD – 1] and BL[j +1] generate coupling noise through CC[j – 1, j] and CC[j, j + 1] onto BL[j]. VREF. As a result, the sense amplifier mistakenly detects a 0, rather than the expected 1. With heavy bit- Given word-line pulse width TWL and cell current ICELL, the voltage drop 'V0[j] of an aggressor bit line line load, 'V1 is smaller than VDD – VREF, as on BL- V2, and thus, the sense amplifier still detects the correct BL[j] can be derived as data. We find that it does not work to prevent such a large I CELL ˜TWL (1) 'V0 >@j voltage drop by adding a static bit-line pull-up C BL >@j  C C >j 1, j @ C C >j , j 1 @ (bleeder), if 'V1max is much larger than 'V0min. Because the cell transistor is already drawn with the minimum With two aggressors present at the neighboring bit channel width (in our case, 0.58µm for the 0.25µm lines, the coupled voltage 'V1[j] on a victim bit line process and 0.42µm for the 0.18µm process), coupling BL[j] can be derived as capacitance between bit lines and intrinsic bit-line load play a key role in the crosstalk effect. Unfortunately, as C C C>@j1,j C>@j,j1 (2) the feature size scales down, the coupling capacitance 'V1>@j 'V0 >j1 @˜ 'V0>@j1˜ increases, even through the intrinsic bit-line CBL>@j CC >j1,j @ CBL>@j CC >j,j1 @ capacitance decreases [11]. For the 0.25µm and 0.18µm processes that we used in this work, their metal Intuitively, a victim bit line suffers the largest thickness and inter-metal dielectric (IMD) thickness are coupling noise when it has no 0-cell and its neighbors virtually kept the same. The trend limits the number of on the same row are all 0-cells which turn into bit cells connected to a bit line in a conventional aggressors. Moreover, the aggressor bit lines have light synchronous ROM using advanced technologies. loading to gain the maximum strength. In practice, Fig. 4 depicts a simplified bit-line structure with the however, we are not able to obtain the value simply parasitic capacitances. The unselected bit lines BL[j–1] from (1) and (2), because they are intertwined by a and BL[j+1], which are the nearest neighbors to the more complex RC network with the metal (bit-line) selected bit line BL[j], have 0-cells activated when the sheet resistance ranges from 55m:/ to 100m:/

Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE 1.2 BL[j-2] BL[j-1] BL[j] BL[j+1] BL[j+2]

1 WL [j]

0.8 C CC[j-1,j] C C I Worst Case C[j-2,j-1] C[j, j+1] C[j+1, j+2] CELL 0.6

0.4 CBL[j] Typical Case

Maximum Voltage Drop (V) Drop Voltage Maximum 0.2 Pre_Odd 1-cell 0 0 64 128 192 256 320 384 448 512 Pre_Even 0-cell

Number of 0-Cells on the Victim Bit Line Y[j-1] Y[j] Y[j+1] Fig. 5. Maximum voltage drop versus number of 0- cells on a victim bit-line across code-patterns. Sense Amplifier

and the via resistance from 4: to 8:. Parasitic Fig. 6. A simplified structure of bit-lines and extraction and circuit simulation are used to obtain the multiplexing circuit of the DBS ROM. worst case. Fig. 5 shows the simulation results for the worst and typical cases in the 0.25µm via-1 ROM design. It selected bit lines and are dynamically specified isnoteworthy to point out that if the number of 0-cells according to the input address of each cycle. on the victim bit line is larger than 256 or if the number In Fig. 6, we let the clamping transistors be shared of 0-cells on the aggressor bit line is smaller than 256, with the precharge ones. Thus, the control signals for a correct sensing may still be achieved. both precharging and clamping are encoded with the least significant bit of the column address. Accordingly, 3. Dynamic Bit-line Shielding Technique the area overhead of the clamping circuit and the control for odd/even precharge signals is Since the fluctuations in coupling capacitance and relatively minor. intrinsic loading on the bit lines across various code If the bit line BL[j] is selected, then the column patterns are inevitable, restricting the voltage swing of select signal Y[j] turns on the path to pass the signal on the aggressors helps to reduce the coupled noise on the BL[j] to the sense amplifier. The Pre_Odd signal victims. The dynamic bit-line shielding (DBS) enables the clamping function on odd-numbered bit technique is proposed to eliminate the crosstalk lines (i.e., BL[jr1], BL[jr3], and so on). This clamping between bit lines and to prevent the read failure. scheme results in no voltage swing on the odd- The behavior of the DBS ROM in the precharge numbered bit lines, during the sensing phase. However, phase is the same as that of a conventional one: All bit the unselected even-numbered bit lines (i.e., BL[jr2] lines are precharged to VDD before a word line is turned and so on) still have voltage swings as in a on. After the input address is decoded, the selected bit conventional ROM design, if 0-cells are activated line BL[j] is connected to the sense amplifier through during the sensing phase. the multiplexer controlled by the column select signal This clamping eliminates the major coupling noise Y[j]. source for the selected bit lines and mitigates the However, the behavior of the sensing phase is crosstalk induced read failure. A variant to clamp only different. In the DBS ROM, a half of the bit lines, the nearest neighbors of the selected bit lines can either odd- or even-numbered, are clamped at a fixed certainly be implemented, but requires a more complex voltage during the sensing phase. The unselected bit control scheme. lines, whose voltage level is maintained by the In summary, a bit line during the sensing phase by clamping transistors, now act as the shielding for the the DBS technique is in one of the following three

Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE states: selected read, unselected read, or unselected Table I. 0.25µm and 0.18µm Via-Programmable ROM clamp. In the unselected clamp state, the nearest Macros neighbors of the selected bit line are put into the Technology 0.25 0.18 crosstalk-elimination mode. The clamped bit lines (µm) (even or odd) are dynamically selected based on the Capacity 256K 256K least significant bit of the column address. (bits) Macro Area 0.53 0.31 4. Experimental Results (mm2) Bit Cell Size 1.44 0.75 Two test chips have been fabricated using 0.25µm (µm2) and 0.18µm CMOS processes, respectively. Each test Code Layer Via-1 Via-2 chip contains two 256Kb ROM macros implementing a VDD (V) 2.5 1.8 conventional synchronous design and its DBS VREF (V) 2.35 2.25 2.15 1.65 1.55 1.45 counterpart, respectively. The memory array is configured as 512 u 512 in a single bank; the 0.25µm ROMs are coded by the via-1 layer and the 0.18µm ones by the via-2. For crosstalk analysis, the macros are 100000 Raised Reference equipped with two external pins to raise or to lower the

. Vo lt age 10000 reference voltage VREF by 100mV. Table I lists the features of the ROM macros.mail addresses if possible. A realistic code pattern was applied to the memory 1000 array to explore the crosstalk-induced read failures. Original Reference Volt age The code pattern implements three cases of bit-line 100 loads, respectively, with 1, 256, and 511 0-cells on the Lowered Reference bit lines, while making the victim bit lines read 1-cells Number of Bit Failures 10 and the aggressor bit lines read 0-cells. Vo lt age The conventional ROM design fails reading the 1- 1 cells with the three V settings in the nominal V REF DD 1.5 1.75 2 2.25 2.5 2.75 3 range, even though lowering VREF helps to reduce the number of failing bits. Fig. 7 shows an example using Sup ply Voltage (V) the 0.25µm test chip. Functional tests were performed by varying V from 2.9V to 1.5V in a step of 0.05V, DD Fig. 7. Number of read-1 failures in the conventional and in each step, the number of failing bits is recorded. 0.25µm via-1 ROM. The read failures appear to depend on VDD. As VDD is lowered, the aggressor strength becomes weaker and

the number of failing bits is reduced. With the original 1.5V 2.0V 2.5V 3.0V 5ns 5ns VREF setting, the test chip can eventually pass at VDD = * 1.55V and by lowering VREF, the operable supply ***** voltage is improved to 2.1V. ******** *********** On the other hard, the DBS ROM passes the 10ns *************10ns functional tests with all V settings. Fig. 8 shows the *************** REF ***************** Shmoo plot of the 0.25µm DBS counterpart with VDD ****************** ******************** sweeping from 3.0V to 1.5V in a step of 0.05V. The 15ns *********************15ns

0.25µm DBS ROM is shown to be able to operate at tACC ********************** *********************** the supply voltage down to 1.6V. The test results ************************ demonstrate that the DBS technique indeed eliminates ************************* 20ns **************************20ns the read-1 failures and also provides headroom for the *************************** *************************** ROM to have a higher VREF value. **************************** ***************************** 25ns *****************************25ns 5. Conclusion 1.5V 2.0V 2.5V 3.0V VDD We have investigated the crosstalk between bit lines and read-1 failures across various code patterns in

Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE Fig. 8. Shmoo plot of the 0.25µm DBS ROM by VDD [3] B. D. Yang and L. S. Kim, “A low-power ROM using sweep charge recycling and charge sharing techniques,” IEEE synchronous via-programmable ROMs. The J. Solid-State Circuits, pp. 641-653, Apr. 2003. conventional design was shown to be dependent of V [4] M.-F. Chang, K.-A. Wen and D.-M. Kwai, “Supply and DD substrate noise tolerance using dynamic tracking due to the crosstalk-induced read failures. Then, the clusters in configurable memory designs,” Proc. IEEE DBS technique was proposed to eliminate the crosstalk Int’l Symp. Quality Electronic Design, pp. 225- 230, and to preserve the sensing margin. The test chips Mar. 2004. demonstrated its effectiveness and achieved 100% code [5] R. L. Barry et al., “A high-performance ROM compiler coverage under high-speed operation. This DBS circuit, for 0.50µm and 0.36µm CMOS technologies,” Proc. albeit simple, has been implemented in our 0.25µm and IEEE Int’l ASIC Conf., Austin, TX, pp. 370-373, Sep. 0.18µm compilers, which aim primarily at area 1995. minimization, and proven successfully in mass [6] T. Tsang, “A compilable read-only-memory library for production. ASIC deep sub-micron applications” Proc. IEEE 11th Int’l Conf. VLSI Design, Chennai, India, pp. 490–494, Jan. 1998. Acknowledgments [7] M. Yoshimoto et al., “A Divided Wordline Structure in the Static RAM and its Application to a 64K Full The authors would like to thank S.-M. Yang, S.-J. CMOS RAM,” IEEE J. Solid-State Circuits., vol. 18, no. Shen, H.-Y. Pan, C.-H. Lee, Y.-C. Liao, Agnes Hsu , 5, Oct. pp. 479–485. 1983. and G.-S. Chaung of Intellectual Property Library [8] K. Itoh et al. “Trends in low-power RAM circuit Company (IPLib) for the implementation and testing of technologies,” in Proc. IEEE, vol. 83 no. 4, Apr. pp. 524–543, 1995. the testchips. [9] M. -F. Chang, M. J. Irwin, and R. M. Owens, “Power- area tradeoff in dual word line arrays,” J. References Circuits, Systems and Computers, vol. 7, no. 1, pp. 49– 67, Feb. 1997. [1] H. Takahashi, S. Muramatsu, and M. Itoigawa , “A new [10] A. Tuminaro, “A 400Mhz, 144Kb CMOS ROM macro contact programming ROM architecture for digital for an IBM S/390-class microprocessor,” Proc. IEEE signal processor,” Proc. IEEE Int’l Symp. VLSI Circuits, Int’l Conf. Computer Design, pp. 253-255 Oct. 1997. pp. 158-161, June 1998. [11] J. A. Davis et al., “Interconnect limits on gigascale [2] C. -R. Chang, J. -S. Wang, and C. -H. Yang, “Low- Integration (GSI) in the 21st century,” Proc. IEEE, vol. power and high-speed ROM modules for ASIC 89, no. 3, pp. 305-324, Mar. 2001. applications,” IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1516-1523, Oct. 2001.

Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT’05) 1087-4852/05 $20.00 © 2005 IEEE