%*$ *OUFSOBUJPOBM%4ZTUFNT*OUFHSBUJPO$POGFSFODF %*$

High Bandwidth Memory (HBM) and High Bandwidth NAND (HBN) with the Bumpless TSV Technology

Koji Sakui1),2) Takayuki Ohba1) 1)Tokyo Institute of Technology, IIR, WOW Alliance 1)Tokyo Institute of Technology, IIR, WOW Alliance Yokohama, Japan [email protected] Yokohama, Japan 2)Honda Research Institute Japan Co., Ltd. [email protected] Wako-shi, Japan [email protected]

Abstract— This paper proposes a fundamental architecture for the High Bandwidth Memory (HBM) with the bumpless TSV for the Wafer-on-Wafer (WOW) technology. The bumpless interconnects technology can increase the number of TSVs per chip with fine pitch of TSVs, and reduce the impedance of the TSV interconnects with no bumps. Therefore, a further higher speed and higher density HBM can be realized. Also, the High Bandwidth NAND (HBN), which can read and program by plane instead of by line by using the bumpless TSV, has been proposed.

Keywords—wafer-on-wafer; bumpless; thinning; TSV; high density integration; HBM; HBN

I. INTRODUCTION TSVs with micro-bumps are conventionally used for the High Bandwidth Memory (HBM) [1-4]. However, there are Fig. 1 Wiring length comparison between the 2D and 3D. several issues using the micro-bumps. One major problem is that it should be difficult for even the HBM to catch up with the 2. Thin Die Thickness speed of the GPU or CPU. For example, the Pascal of is 1TB/s, so that four sets of the HBM with 256GB/s are forced Fig. 2 shows the die thickness comparison between the 2D to use for the operation of the Pascal. The GPU/CPU venders are and 3D. Because the bump is very large, the die thickness of at challenging their products speed up much more, like 2TB/s and least 30 μm is needed for the robust stiffness. As a result, the 4TB/s, for emphasizing the AI system, such as an advanced stacked chip pitch is something like 100 μm. If 16 chips are automated driving. However, the HBM speed should be getting stacked, the total height should be 1600 μm. On the contrary, almost saturated to 341GB/s [4]. thanks to the bumpless technology, the chip pitch becomes one tenths. Therefore, the total height of the 16 chip stacked case On the other hand, the 3D stacking, in combination with the would be only 160 μm. conventional 2D integration, has been studied extensively [5-8], because it is well understandable that the conventional two- dimensional (2D) scaling should be forced to face with a severe economic crisis due to the expensive lithography process and facilities required. However, reducing the TSV pitch and impedance has not been focused on in details. This paper is devoted to increasing the number of TSVs per chip with fine pitch of TSVs, and reducing the impedance of the TSV interconnects with no bumps [9-14]. As a result, a further higher speed and higher density HBM, and the High Bandwidth NAND (HBN) by using the bumpless TSV, can been proposed [15-16].

II. BENEFITS OF 3D STACKED MEMORIES WITH BUMPLESS TSV

1. Shorten Physical Wiring Length Fig. 1 illustrates the wiring length comparison between the Fig. 2 Die thickness comparison between the 2D and 3D. 2D and 3D. In the 3D layout, the -to-block length is 10 ~ 100 μm, while that of the 2D layout is 1 mm ~ 1 cm.

!*&&&

Authorized licensed use limited to: Fondren Library Rice University. Downloaded on May 17,2020 at 03:57:53 UTC from IEEE Xplore. Restrictions apply. %*$ *OUFSOBUJPOBM%4ZTUFNT*OUFHSBUJPO$POGFSFODF %*$

3. Low Impedance Interconnects temperature rising problem, which should be more significant The total length for the bumpless interconnect is 10 μm, according as the number of stacked dies increases over four dies. while that for the bump interconnect is 100 μm, as shown in Fig. 3. Therefore, the impedance of the bumpless TSV is drastically reduced to at least one tenths of that of the bump TSV.

Fig. 3 Comparison between bump and bumpless interconnects. Fig. 5 Thermal design and reduction.

4. Massive Parallelism Bumpless TSVs With respect to the bump TSV, the bump pitch is limited by the bump-to-bump short failure due to the stress migration, as shown in Fig. 4. Furthermore, the non-contact open failure, and Back-End-of-Line destruction would be occurred. Thereby, the TSV pitch cannot be tightened, so that the number of channels is limited. Conversely, the bumpless technology can tighten the TSV pitch, which increases the number of channels.

Fig. 6 Temperature rise simulation result.

III. COMPETITIVE 3D STRUCTURE

1. HBM: High Bandwidth Memory (RAM)

Fig. 4 Massive parallelism bumpless TSVs.

5. Low Heat Generation The bumpless technology can lower the interconnection thermal resistance, as shown in Fig. 5 [17]. The Cu TSVs penetrate all layers and act like a thermal “highway”. In the bumpless stacked system, the total can be small, so that

the density of the interconnect metal becomes high. Therefore, Fig. 7 HBM with bumpless TSVs. the total thermal resistance by the bumpless TSVs becomes very small. Fig. 6 shows a temperature rise simulation result. While The first target for the competitive 3D structure enables the the temperature rise of layer 0 of the micro-bump type is 20 8-die stacks with bumpless TSVs, as illustrated in Fig. 7. By degree C, that of the bumpless type is only 5.8 degree C, so that increasing the number of channels and lowering the TSV the bumpless type allows a sophisticated design that can reduce impedance, an ultra-high bandwidth of 1, 4, and 8 TB/s should the temperature rise into 30 %. As a result, the eight die stacked be achieved. HBM can be designed with this bumpless technology, and no modification in the refresh time of layer-by-layer is needed. Fig. 8 shows the roadmap for the HBM bandwidth. However, the conventional TSVs with micro-bumps have a Thanks to the parallelism enhancement due to the increase in the

Authorized licensed use limited to: Fondren Library Rice University. Downloaded on May 17,2020 at 03:57:53 UTC from IEEE Xplore. Restrictions apply. %*$ *OUFSOBUJPOBM%4ZTUFNT*OUFHSBUJPO$POGFSFODF %*$

number of I/O’s, the bandwidth of the bumpless HBM is Figs. 13 and 14 have proposed the HBN (High Bandwidth expected to be ever-increasing. With respect to the I/O power NAND) with multiple BL layers. The read and program speed consumption, the first target of the bumpless HBM is one of the conventional NAND Flash was limited by the number of thirtieths that of the current HBM2 [18], as shown in Fig. 9. Fig. the page buffers, because the data were read or 10 compares the HBM2 with the bump and the bumpless HBM programmed through one line of the page buffers. Thanks to the with respect to their data bandwidth and I/O buffer power, 3D NAND original architecture, a word line has expanded to a according as the I/O number. The bumpless HBM can word plate, so that the 3D NAND can read and program by plane drastically enhance the I/O parallelism, which should achieve instead of by line. the ultra-high speed of 8.5TB/s with the relatively low I/O buffer power of 5.4W.

Fig. 11 BiCS: presented at VLSI Technology ’07 [20].

Fig. 8 Data bandwidth roadmap. Fig. 9 I/O power efficiency.

Fig. 12 Future 3D stacked memory system [21].

Fig. 10 Data bandwidth and I/O buffer power comparison.

2. HBN: High Bandwidth NAND (ROM)~ Word Plate Access ~ In the conventional word line access scheme, if we would make the through-put double, double page buffer, or half bit line pitch, is needed [19]. Of course, additional page buffers in CUA, CMOS Under Array, and divided bitlines in sub-arrays might Fig. 13 HBN accessed by word plate with multiple BL layers. have a solution. But both the CUA area and the number of sub- arrays have limitations. At VLSI Technology ’07, Toshiba proposed an original 3D NAND at first, as shown in Fig. 11 [20]. Multiple CGs and Lower SGs should be merged in each plate for reducing the number of HV-driver transistors, as well as for tightening the pillar pitch. Right now, the poly silicon word plate is replaced by the damascene tungsten metal gate. But the basic 3D NAND structure of the merged multiple CGs and Lower SG is the same as the original one. In the past, a future 3D stacked memory design was issued as patents, as presented in Fig. 12 [21]. Thanks to increasing the number of TSVs, a part of a peripheral circuit of the first memory chip can be located in the second memory chip. An example shows the stacked DRAM in combination with the NAND . Fig. 14 Circuit diagram of HBN composed of two chips.

Authorized licensed use limited to: Fondren Library Rice University. Downloaded on May 17,2020 at 03:57:53 UTC from IEEE Xplore. Restrictions apply. %*$ *OUFSOBUJPOBM%4ZTUFNT*OUFHSBUJPO$POGFSFODF %*$

IV. CONCLUSIONS Bank Group Data Control,”in ISSCC Dig. Tech. Papers, pp. 208-209, Feb. 2018. WOW technology with bumpless interconnects using TSVs [5] J. U. Knickerbocker, G. S. Patel, P. S. Andry, C. K. Tsang, L. P. for three-dimensional stacking in wafer form has been Buchwalter, E. J. Sprogis, H. Gan, R. R. Horton, R. J. Polastre, S. L. described. An optimized thinning wafer thickness of 4 μm can Wright, and J. M. Cotte, “3-D Silicon Integration and Silicon Packaging increase the number of TSVs per chip with fine pitch of TSVs, Technology Using Silicon Through-Vias,” IEEE J. Solid-State Circuits and reduce the impedance of the TSV interconnects with no 41 [8] pp. 1718-1725, 2006. bumps. Therefore, a further higher speed and higher density [6] M. Koyanagi, T. Nakamura, Y. Yamada, H. Kikuchi, T. Fukushima, T. Tanaka, and H. Kurino, “Three-Dimensional Integration Technology HBM can be realized. Also, the High Bandwidth NAND (HBN), Based on Wafer Bonding With Vertical Buried Interconnections,” IEEE which can read and program by plane instead of by line by using Trans. Electron Devices 53 [11] pp. 2799-2808, 2006. the bumpless TSV, has been proposed. The HBM of RAM and [7] E. Beyne, P. De Moor, W. Ruythooren, R. Labie, A. Jourdain, H. Tilmans, the HBN of ROM are sister memories with the high bandwidth. D. S. Tezcan, P. Soussan, B. Swinnen, and R. Cartuyvels, “Through- silicon via and die stacking technologies for microsystems-integration,” According as the number of the stacked memory chips is in IEEE IEDM Tech. Dig., pp. 495-498, Dec. 2008. increased, the total memory density should be huge like an [8] J.-Q. Lu, “3-D Hyperintegration and Packaging Technologies for Micro- enterprise. Therefore, as an example, an AI robotic bee, which Nano Systems,” Proc. IEEE vol. 97 No. 1, pp. 18-30, Jan. 2009. has a CPU, ultra-small enterprise, HBM, HBN, and sensors, [9] T. Ohba, “Three-Dimensional (3D) Integration Technology,” doi: should be eventually realized in 50mm3 with 0.5mW for human 10.1149/1.3567707 ECS Trans. volume 34, issue 1, pp.1011-1016, 2011. assistant, as proposed in Fig. 15. [10] T. Ohba, N. Maeda, H. Kitada, K. Fujimoto, A. Kawai, K. Arai, K. Suzuki, and T. Nakamura, IEICE Trans Electron., Electron. Soc. J93-C [11] p.464, 2010 in Japanese. [11] N. Maeda, H. Kitada, K. Fujimoto, K. Suzuki, T. Nakamura, and T. Ohba, in Proc. Advanced Metallization Conference 2008, p. 501, 2009 [12] Y. S. Kim, A. Tsukune, N. Maeda, H. Kitada, A. Kawai, K. Arai, K. Fujimoto, K. Suzuki, Y. Mizushima, T. Nakamura, T. Ohba, T. Futatsugi, and M. Miyajima, “Ultra Thinning 300-mm Wafer down to 7-μm for 3D Wafer Integration on 45-nm Node CMOS using Strained Silicon and Cu/Low-k Interconnects,” in IEEE IEDM Tech. Dig., pp. 365-368, Dec. 2009. [13] T. Ohba, N. Maeda, H. Kitada, K. Fujimoto, K. Suzuki, T. Nakamura, A. Kawai, and K. Arai, “Thinned wafer multi-stack 3DI technology,” Microelectron. Eng. Vol. 87, 3, pp.485-490, Mar. 2010. [14] T. Ohba, Y. S. Kim, Y. Mizushima, N. Maeda, K. Fujimoto, and S. Kodama, “Review of wafer-level three-dimensional integration (3DI) using bumpless interconnects for tera-scale generation,” IEICE Electronics Express, Vol. 12, No. 7, pp. 1-14, Apr. 2015. [15] K. Sakui and T. Ohba, “Three-dimensional Integration (3DI) with Bumpless Interconnects for Tera-scale Generation,” in IEEE CICC Dig. Tech. Papers, 22-6, Apr. 2019. 3 Fig. 15 AI Robotic Bee (50mm , 0.5mW) for a human assistant. [16] K. Sakui and T. Ohba, “High Speed, Low Power, and Ultra-small Operating Platform with Three-dimensional Integration (3DI) by REFERENCES Bumpless interconnects,” in IEEE IMW Dig. Tech. Papers, pp.60-63, May 2019. [1] D. U. Lee, K. W. Kim, K. W. Kim, H. Kim, J. Y. Kim, Y. J. Park, J. H. [17] H. Ryoson, K. Fujimoto, and T. Ohba, “A design guide of thermal Kim, D. S. Kim, H. B. Park, J. W. Shin, J. H. Cho, K. H. Kwon, M. J. resistance down to 30% for 3D multi-stack devices,” in 2017 Kim, J. Lee, K. W. Park, B. Chung, and S. Hong, “A 1.2V 8Gb 8-Channel International Conference on Electronics Packaging (ICEP) Dig. Tech. 128GB/s High-Bandwidth Memory (HBM) Stacked DRAM with Papers, pp. 522-525, 2017. Effective Microbump I/O Test Methods Using 29nm Process and TSV,” in ISSCC Dig. Tech. Papers, pp. 432-433, Feb. 2014. [18] J. C. Lee, J. kim, K. W. Kim, Y. J. Ku, D. S. Kim, C. Jeong, T. S. Yun, H. Kim, H. S. Cho, S. Oh, H. S. Lee, K. H. Kwon, D. B. Lee, Y. J. Choi, [2] J. C. Lee, J. Kim, K. W. Kim, Y. J. Ku, D. S. Kim, C. Jeong, T. S. Yun, J. Lee, H. G. Kim, J. H. Chun, J. Oh, and S. H. Lee, “High Bandwidth H. Kim, H. S. Cho, Y. O. Kim, J. H. Kim, J. H. Kim, S. Oh, H. S. Lee, K. Memory (HBM) with TSV Technique,” in 2016 International SoC H. Kwon, D. B. Lee, Y. J. Choi, J. Lee, H. G. Kim, J. H. Chun, J. Oh, and Design Conference (ISOCC) Dig. Tech. Papers, pp. 181-182, 2016. S. H. Lee, “A 1.2V 64Gb 8-Channel 256GB/s HBM DRAM with Peripheral-Base-Die Architecture and Small-Swing Technique on Heavy [19] K.Imamiya, Y.Sugiura, H.Nakamura, T.Himeno, K.Takeuchi, Load Interface,” in ISSCC Dig. Tech. Papers, pp. 318-319, Feb. 2016. T.Ikehashi, K.Kanda, K.Hosono, R.Shirota, S.Aritome, K.Shimizu, K.Hatakeyama, and K.Sakui, “A 130mm2 256Mb NAND Flash with [3] K. Sohn, W.-J. Yun, R. Oh, C.-S. Oh, S.-Y. Seo, M.-S. Park, D.-H. Shin, Shallow Trench Isolation Technology,” in ISSCC Dig. Tech. Papers, W.-C. Jung, S.-H. Shin, J.-M. Ryu, H.-S. Yu, J.-H. Jung, K.-W. Nam, S.- pp.112-113, Feb. 1999. K. Choi, J.-W. Lee, U. Kang, Y.-S. Sohn, J.-H. Choi, C.-W. Kim, S.-J. Jang, and G.-Y. Jin, “A 1.2V 20nm 307GB/s HBM DRAM with At-Speed [20] H.Tanaka, M.Kido, K.Yahashi, M.Oomura, R.Katsumata, M.Kito, Wafer-Level I/O Test Scheme and Adoptive Refresh Considering Y.Fukuzumi, M.Sato, Y.Nagata, Y.Matsuoka, Y.Iwata, H.Aochi and Temperature Distribution,”in ISSCC Dig. Tech. Papers, pp. 316-317, Feb. A.Nitayama, “Bit Cost Scalable Technology with Punch and Plug Process 2016. for Ultra High Density Flash Memory,” in 2007 Symp. on VLSI Technology Dig., pp.14-15, Jun. 2007. [4] J. H. Cho, J. Kim, W. Y. Lee, D. U. Lee, T. K. Kim, H. B. Park, C. Jeong, M.-J. Park, S. G. Baek, S. Choi, B. K. Yoon, Y. J. Choi, K. Y. Lee, D. [21] K. Sakui, “ Device and Memory System,” US Shim, J. Oh, J. Kim, and S.-H. Lee, “A 1.2V 64Gb 341GB/s HBM2 6,594, 169 B2, Jul. 15, 2003. Stacked DRAM with Spiral Point-to-Point TSV Structure and Improved

Authorized licensed use limited to: Fondren Library Rice University. Downloaded on May 17,2020 at 03:57:53 UTC from IEEE Xplore. Restrictions apply.