ISSCC 2018

SESSION 20: Flash-Memory Solutions

20.1: A 512Gb 3b/Cell 3D Flash Memory on a 96-Word-Line-Layer Technology

Hiroshi Maejima¹, Kazushige Kanda¹, Susumu Fujimura¹, Teruo Takagiwa¹, Susumu Ozawa¹, Jumpei Sato¹, Yoshihiko Shindo¹, Manabu Sato¹, Naoaki Kanagawa¹, Junji Musha¹, Satoshi Inoue¹, Katsuaki Sakurai¹, Naohito Morozumi¹, Ryo Fukuda¹, Yuui Shimizu¹, Toshifumi Hashimoto¹, Xu Li¹, Yuuki Shimizu¹, Kenichi Abe¹, Tadashi Yasufuku¹, Takatoshi Minamoto¹, Hiroshi Yoshihara¹, Takahiro Yamashita¹, Kazuhiko Satou², Takahiro Sugimoto¹, Fumihiro Kono¹, Mitsuhiro Abe¹, Tomoharu Hashiguchi¹, Masatsugu Kojima¹, Yasuhiro Suematsu², Takahiro Shimizu¹, Akihiro Imamoto¹, Naoki Kobayashi¹, Makoto Miakashi¹, Kouichirou Yamaguchi¹, Sanad Bushnaq¹, Hicham Haibi¹, Masatsugu Ogawa¹, Yusuke Ochi¹, Kenro Kubota², Taichi Wakui², Dong He¹, Weihan Wang¹, Hiroe Minagawa¹, Tomoko Nishiuchi¹, Hao Nguyen³, Kwang-Ho Kim³, Ken Cheah³, Yee Koh³, Feng Lu³, Venky Ramachandra³, Srinivas Rajendra³, Steve Choi³, Keyur Payak³, Namas Raghunathan³, Spiros Georgakis³, Hiroshi Sugawara³, Seungpil Lee³, Takuya Futatsuyama¹, Koji Hosono¹, Noboru Shibata¹, Toshiki Hisada¹, Tetsuya Kaneko¹, Hiroshi Nakamura¹

¹Toshiba Memory Corporation, Yokohama, Japan
²Toshiba Memory Systems Corporation, Yokohama, Japan
³Western Digital Corporation, Milpitas, CA

© 2018 IEEE International Solid-State Circuits Conference. 20.1: A 512Gb 3b/Cell 3D Flash Memory on a 96-Word-Line-Layer Technology

Outline

• Introduction

• String based start bias control scheme(SSBC)

• Smart Vt tracking read(SVTR)

• Low pre-charge sense amp bus scheme(LPSAB)

• Conclusion

10 years from 1st proposed BiCS technology

[Figure: BiCS cell-string structure in 2007 and in 2017. Labels: bit line (BL), upper select gate (SGD), control gates (WL), lower select gate (SGS), memory array, cell-source. The 2017 structure is the 96-WL-layer BiCS with WL0 to WL95.]

■ Multi-layered 3D flash was reported as BiCS FLASH technology in 2007.
– [1] H. Tanaka, et al., IEEE Symp. VLSI Technology, pp. 14-15, Jun. 2007.

Chip Architecture

| Item | Value |
|---|---|
| Capacity | 512Gb (3bit/cell) |
| Technology | 96-WL-layer BiCS |
| Die size (bit density) | 86.1mm2 (5.95Gb/mm2) |
| Organization | (16KB + ECC) / page, 1152 pages / block, 18MB / block, (1822 + EXT) blocks / plane, 2 planes |
| Throughput | Read (tR): 58µs (ABL); Program: 57MB/s |
| I/O | 533Mbps DDR, x8 |
| Power supply | Vcc: 2.3V to 3.6V; Vccq: 1.8V |

[Figure: chip floorplan. Plane-0 and Plane-1, each with a 256Gb cell array, a row decoder and S/A groups (16KB); peripheral circuits and pads.]
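The organization figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes binary prefixes (1KB = 1024 bytes), ignores the ECC area and the EXT blocks, and assumes 4 strings per block (the String0 to String3 shown on the SSBC slides); those assumptions are not stated on this slide.

```python
KB = 1024
MB = 1024 * KB

page_bytes = 16 * KB        # user data per page, ECC area excluded
pages_per_block = 1152
blocks_per_plane = 1822     # EXT blocks excluded
planes = 2

# Assumption: 96 WL layers x 4 strings x 3 pages (3b/cell) per block
assumed_pages_per_block = 96 * 4 * 3

block_mb = page_bytes * pages_per_block / MB
die_gbit = page_bytes * pages_per_block * blocks_per_plane * planes * 8 / 2**30

print(f"pages per block (assumed layout): {assumed_pages_per_block}")
print(f"block size: {block_mb:.0f} MB")
print(f"die capacity: {die_gbit:.1f} Gb")
```

Under these assumptions the block size comes out at 18MB and the two planes together hold slightly more than 512Gb, which agrees with the table.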

3b/Cell 3D Flash Memory Comparison

| | [2] J.W. Im, et al. | [3] D. Kang, et al. | [7] T. Tanaka, et al. | [4] R. Yamashita, et al. | [5] C. Kim, et al. | Our work |
|---|---|---|---|---|---|---|
| ISSCC | 2015 | 2016 | 2016 | 2017 | 2017 | 2018 |
| Technology | 32-WL-layer, VNAND | 48-WL-layer, VNAND | Floating gate, CMOS under array | 64-WL-layer, BiCS | 64-WL-layer, VNAND | 96-WL-layer, BiCS |
| Capacity | 128Gb | 256Gb | 768Gb | 512Gb | 512Gb | 512Gb |
| # of planes | 2 | 2 | 4 | 2 | 2 | 2 |
| Die size [mm2] | 68.9 | 97.6 | 179.2 | 132 | 128.5 | 86.1 |
| Bit density [Gb/mm2] | 1.86 | 2.62 | 4.29 | 3.88 | 3.98 | 5.95 |

■ Bit density: 53% up.
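The bit-density row follows directly from capacity divided by die size. The sketch below recomputes it from the table values; the dictionary keys are labels chosen here for illustration. The arithmetic suggests that the "53% up" annotation corresponds to the comparison with the 64-WL-layer BiCS chip [4], which is an inference from the numbers rather than something the slide states.

```python
# (capacity in Gb, die size in mm2), taken from the comparison table
chips = {
    "[2] 2015": (128, 68.9),
    "[3] 2016": (256, 97.6),
    "[7] 2016": (768, 179.2),
    "[4] 2017": (512, 132.0),
    "[5] 2017": (512, 128.5),
    "our work 2018": (512, 86.1),
}

density = {name: gb / mm2 for name, (gb, mm2) in chips.items()}
for name, d in density.items():
    print(f"{name:14s} {d:.2f} Gb/mm2")

gain_vs_4 = (density["our work 2018"] / density["[4] 2017"] - 1) * 100
gain_vs_5 = (density["our work 2018"] / density["[5] 2017"] - 1) * 100
print(f"gain vs [4]: {gain_vs_4:.0f}%, gain vs [5]: {gain_vs_5:.0f}%")
```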

Bit Density Trend (3b/cell 3D Flash Memory)

[Figure: bit density [Gb/mm2] on a log axis (1 to 10) vs. year (2015 to 2018), with a trend of about 50% increase per year. Data points: Samsung 32 layers, 128Gb, 1.86 [2]; Samsung 48 layers, 256Gb, 2.62 [3]; 768Gb (CuA), 4.29 [7]; WD/Toshiba 64 layers, 512Gb, 3.88 [4]; Samsung 64 layers, 512Gb, 3.98 [5]; 96 layers, 512Gb, 5.95 (our work).]


String based start bias control scheme

• Motivation: Program time (tPROG) improvement

• Solution: String based start bias control scheme (SSBC)
– tPROG is improved by reducing the number of program loops.
– Suitable for the 3D structure and the 3D programming method.

Conventional start bias control scheme

[2D-NAND] 2b/cell case: start bias control (conventional)
– [6] S. Lee, et al., ISSCC 2016.

Programming order (page addresses 0, 1, ...):

| WL | Lower page | Upper page |
|---|---|---|
| WL2 | 3 | 6 |
| WL1 | 1 | 4 |
| WL0 | 0 | 2 |

[Figure: ISPP (incremental step-up program pulse) sequences, VPGM vs. number of loops. Lower-page program (e.g., page 0): acquire the optimal start VPGM. Upper-page program (e.g., page 2): apply it; performance gain with start bias control compared to without.]

■ The optimal start VPGM for upper-page programming is measured during the lower-page program and applied to the upper-page program of the same cell. Upper-page programming time can thus be shortened.

Program Method of 3D-FLASH

[3D-FLASH] 3b/cell case: String based start bias control (our work)

• Full sequence (1-step) program
– Small cell-to-cell interference.
– All page data is programmed simultaneously.
– No points at which to acquire the optimal start VPGM.

[Figure: Vt distributions of a block (WL0 to WL95, SGS) before the 3-page program (Er, initial) and after it (Er, A to G); VPGM vs. number of loops for the 3-page program.]

■ How can the start bias control scheme be applied to the 3D-FLASH full sequence program?

3D FLASH Program speed feature

[Figure: memory hole through WL0 to WL95; Vth after a VPGM = 20V pulse vs. WL number, with FAST and SLOW regions marked.]

■ Program speed characteristics of 3D flash memory
– Strong dependency on the WL layer, because the memory-hole size varies gradually from layer to layer.
– The program speed of cells on a common WL layer is almost the same.

String based start bias control scheme (SSBC)

[3D-FLASH] 3b/cell case (L: lower page, M: middle page, U: upper page)

Programming order (page addresses of each L/M/U program):

| WL | String0 | String1 | String2 | String3 |
|---|---|---|---|---|
| WL2 | 24-26 | 27-29 | 30-32 | 33-35 |
| WL1 | 12-14 | 15-17 | 18-20 | 21-23 |
| WL0 | 0-2 | 3-5 | 6-8 | 9-11 |

[Figure: VPGM [V] vs. number of loops. String0 program: acquire. String1 to String3 programs: apply. Performance gain with this scheme compared to without it.]

■ The optimal start VPGM for strings 1-3 is acquired during string 0 programming.
■ The optimal start VPGM is applied to the cells of the same WL layer in the block.
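The acquire/apply idea can be illustrated with a toy ISPP model. This is a minimal sketch, not the chip's algorithm: it assumes one pass voltage per WL layer, a fixed step size, and a fixed back-off margin, and every voltage value, function name and parameter is hypothetical. Its loop savings are exaggerated compared with the measured results because a real 3b/cell full-sequence program needs many further loops for the higher states.

```python
def ispp_loops(start_mv, pass_mv, step_mv=500):
    """Count program pulses until the pulse voltage reaches pass_mv."""
    loops, vpgm = 1, start_mv
    while vpgm < pass_mv:
        vpgm += step_mv
        loops += 1
    return loops


def program_wl_layer(pass_mv, default_start_mv=15000, margin_mv=1000, n_strings=4):
    """Return per-string loop counts (without SSBC, with SSBC) for one WL layer."""
    # Without the scheme every string starts from the conservative default.
    without = [ispp_loops(default_start_mv, pass_mv)] * n_strings
    # With the scheme, string 0 still starts from the default and its result
    # is used to derive a start bias for the remaining strings of this layer.
    acquired_start_mv = max(default_start_mv, pass_mv - margin_mv)
    with_ssbc = [ispp_loops(default_start_mv, pass_mv)]
    with_ssbc += [ispp_loops(acquired_start_mv, pass_mv)] * (n_strings - 1)
    return without, with_ssbc


# Two hypothetical WL layers with different program speeds
slow_layer = program_wl_layer(pass_mv=18000)
fast_layer = program_wl_layer(pass_mv=16500)
print("slow layer:", slow_layer)
print("fast layer:", fast_layer)
```

The start bias is kept per WL layer because the slides report that program speed depends strongly on the layer but is almost the same for cells on a common layer.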

Measured Data

[Figure, left: measured tPROG [a.u.] with SSBC for string-0 to string-3, without and with the scheme; 7% improvement (string 0-3 average), 9% improvement (string 1-3 average). Figure, right: measured Vt distribution with SSBC, log bit count [a.u.] vs. Vth [a.u.], without and with the scheme.]

■ String based start bias control achieved a 7% shorter program time.
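The two averages are consistent with each other if string 0, which is the string used to acquire the start bias, sees no gain. The check below makes that assumption explicit, and also assumes that all four strings have the same baseline tPROG; neither is stated on the slide.

```python
gain_strings_1_to_3 = 9.0   # percent, from the slide (string 1-3 average)
gain_string_0 = 0.0         # assumption: the acquiring string is not sped up

avg_gain_all = (gain_string_0 + 3 * gain_strings_1_to_3) / 4
print(f"average gain over strings 0-3: {avg_gain_all:.2f}%")
```

The result, 6.75%, rounds to the 7% quoted for the string 0-3 average.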


Smart Vt tracking read

• Motivation: Retry read performance improvement

• Solution: Smart Vt tracking read (SVTR)
– A high state option was implemented to reduce the number of tracking cycles.
– Read latency was improved by supporting the program suspend operation.

Background

3bit/cell (middle-page read case)

[Figure: selected-WL voltage V vs. time t. Normal read: the “MP1”, “MP2” and “MP3” state levels are read, then data out. If ECC fails, the conventional retry read is repeated N times; the bit count (B.c) vs. Vt gives the optimal read level.]

■ If the system executes Vt tracking, it takes on the order of milliseconds.

Smart Vt tracking read (SVTR)

3bit/cell (middle-page read case)

[Figure: selected-WL voltage V vs. time t. Normal read of the “MP1”, “MP2” and “MP3” states, then data out. If ECC fails, SVTR (our work) runs in one of two modes, each followed by read data out: “Normal mode”, covering the “MP1”, “MP2” and “MP3” states, and “High state option”, covering the “MP3” state.]

■ Improved to “300us” (3-page avg.).


“Normal mode” waveform

SVTR “Normal mode” (middle-page read case)

[Figure: selected-WL voltage V vs. time t. Vt tracking read (VTR) by SBL read on the “MP1”, “MP2” and “MP3” states, with an N-stage read in each state; then calibrating read (CALR) by ABL read on the “MP1”, “MP2” and “MP3” states; then data out. Inset: bit count (B.c) vs. Vt, from which the optimal read level is found.]

ABL (All BL) read and SBL (Shielded BL) read

• ABL read: all BLs (full page) are active.
• SBL read: half of the BLs (half page) are active; the others are shielding BLs.

[Figure: sense amps, active BLs (even/odd), shielding BLs and cells for ABL read and SBL read; in SBL read the odd S/A latch is empty.]

■ SBL suppresses BL coupling noise, realizing a 20% shorter read time.
– [4] R. Yamashita, et al., ISSCC 2016.
■ SBL halves the number of S/A latches in use compared with ABL.


“High state option” concept

[Figure: Vt distributions of states Er, A to G before and after data retention, with the Vt shift ∆Vt of each state and the “MP1”, “MP2” and “MP3” read levels; the shifts are related by a Vt shift correlation formula.]

■ The Vt shift of each cell due to data retention is correlated with its level.
■ The Vt shift of the lower states can be determined from the measured Vt shifts of the higher states by using the Vt shift correlation formula. The optimal read voltage for the calibrating read of the lower states is calculated from the high-state Vt tracking read result.
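The slides do not give the correlation formula itself. The sketch below assumes the simplest possible form, a fixed proportional factor per read level, purely to show how a single tracked level could yield all three calibrated read levels. The factors, the default read levels and the function name are hypothetical.

```python
# Hypothetical default read levels in mV for the middle-page read
DEFAULT_LEVEL_MV = {"MP1": 1000, "MP2": 3000, "MP3": 5000}

# Hypothetical correlation factors, in percent of the shift measured at MP3
CORRELATION_PCT = {"MP1": 40, "MP2": 70, "MP3": 100}


def calibrated_levels(mp3_tracked_mv):
    """Derive all CALR levels from the MP3 level found by Vt tracking read."""
    shift_mv = mp3_tracked_mv - DEFAULT_LEVEL_MV["MP3"]
    return {
        state: DEFAULT_LEVEL_MV[state] + round(shift_mv * CORRELATION_PCT[state] / 100)
        for state in DEFAULT_LEVEL_MV
    }


levels = calibrated_levels(mp3_tracked_mv=4800)   # retention moved MP3 down by 200 mV
print(levels)

# Tracking effort: normal mode tracks every read level, the option only one
tracked_states = {"normal mode": 3, "high state option": 1}
print(tracked_states["high state option"] / tracked_states["normal mode"])
```

Whatever the real formula is, the structural point is the same: only one state needs the multi-stage tracking read, and the remaining levels are computed.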

“High state option” waveform

SVTR “High state (speed) option” (middle-page read case)

[Figure: selected-WL voltage V vs. time t. VTR by SBL read on the “MP3” state only, with bit counting; then CALR by ABL read on the “MP1”, “MP2” and “MP3” states; then data out.]

■ VTR is carried out only on the highest state. In the middle-page read case, the CALR levels for the 3 states are determined from the MP3-state VTR result, so the tracking time is reduced to 1/3.
■ The read time reaches “300us” (3-page average).

Smart Vt tracking read

• Motivation: Retry read performance improvement

• Solution: Smart Vt tracking read (SVTR)
– The high state (speed) option was implemented to reduce the number of tracking cycles.
– Read latency was improved by supporting the program suspend operation.

Program suspend with SVTR

• Without program suspend: wait time of max. 1.85ms
– Commands: Program, Read (SVTR), Data out.
– Internal operation: Program runs to program end, then SVTR.

• With program suspend (implemented): wait time of 50us
– Commands: Program, Suspend, Read (SVTR), Data out, Resume.
– Internal operation: Program, program suspend, SVTR, then Program runs to program end.

■ SVTR read latency during a program operation is improved to 50us.

Data Latch operation in program suspend SVTR

[Figure: sense-amplifier structure for the even I/O side and the odd I/O side. Each side has a BL control and bit-line sense part, an S/A latch and data latches.]

Data Latch operation in program suspend SVTR

1. Program suspend
2. Data transfer (Even → Odd)
3. Vt tracking read
4. Data transfer (Odd → Even)
5. Calibrating read

[Figure: contents of the S/A latches and data latches on the even and odd sides at each step, showing where the program data (E) and (O), the Vt tracking read data and the calibrated read data are held.]


Low pre-charge sense amp bus scheme

• Motivation: Improvement of the data transfer time between the sense amplifier and the cache data.

• Solution: Low pre-charge sense amp bus scheme (LPSAB)
– The data throughput is doubled without increasing the peak current.

Background

[Figure: long data bus (DBUS), about 900um in the BL direction, between the BL hookup with 16 S/A groups and the 16 cache data latches (XDL) per 16-BL pitch, under the S/A groups (16KB) of each plane.]

Icc waveform of data transfer via DBUS (I [mA] vs. time):
• Conventional (single DBUS per 16-BL pitch): 16 cycles.
• Our work (twin DBUS per 16-BL pitch): 8 cycles.

S/A group and Cache-DL (XDL) structure

[Figure: 8-S/A structure with data latches and bus system. Labels: VPC generator (IVPC) with a replica transistor of the DPC transistor; DPC; 8 S/As, each with a BL control and sense part and an SADL; ATI; DSW; ENB; LBUS; DBUS; 8 XDLs; XTI; XBUS. Bias levels: VPC = 1.0V + VTH, VTG = 0.5V + VTH.]

Low pre-charge S/A bus scheme (LPSAB)

Step 1: Pre-charge
• The DBUS pre-charger (DPC) is driven with VPC. The DBUS pre-charge level is one VTH drop below VPC: VPC - VTH = 1.0V.
• DSW is “OFF” and XTI is “OFF”.
• XDL (data receiver latch): “1” reset. SADL (data sender latch): “0” data or “1” data.

[Figure: DBUS waveform [V] over one cycle, t [a.u.], comparing the conventional scheme with our work; voltage axis from 0.5 to 2.0V.]

Low pre-charge S/A bus scheme (Cont’d)

One cycle (SADL → XDL) consists of Step 1 (pre-charge), Step 2 (data send) and Step 3 (data receive).

• Step 2, data send: DPC is “OFF” and DSW is driven with VTG. The DBUS goes to VSS if the data is “0” and stays at 1V if the data is “1”.
• Step 3, data receive: XTI is driven with VTG and XDL (data receiver latch) takes the “0” or “1” data sent by SADL (data sender latch).
• Bias levels: VPC = 1.0V + VTH, VTG = 0.5V + VTH.

The advantage of LPSAB

Data transfer time and ICC waveform (I [mA] vs. time):
• Conventional (single DBUS per 16-BL pitch): 16 cycles.
• LPSAB scheme (single DBUS per 16-BL pitch).
• LPSAB scheme (twin DBUS per 16-BL pitch): 8 cycles.

Read latency (tR) improvement:
[Figure: normal read operation timeline with WLs, sense time, data transfer (S/A → cache), recovery and RnB; the shorter data transfer gives a 3.5us gain in tR.]

■ Data throughput is doubled without increasing the peak current.
■ tR is improved by 3.5us with LPSAB.
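A rough charge-based argument shows why halving the pre-charge level leaves room for a second bus. The sketch below assumes that the pre-charge current per cycle scales with bus capacitance times pre-charge voltage, and that the conventional pre-charge level is about 2.0V. The latter is read off the waveform axis and is an assumption, as are the function and parameter names; the 1.0V level and the 16 and 8 cycle counts are from the slides.

```python
def dbus_transfer(n_bus, precharge_v, sa_groups=16, c_bus=1.0):
    """Return (cycles, relative charge per cycle) for moving 16 S/A groups of data.

    c_bus is a normalized bus capacitance; it cancels out in any comparison.
    """
    cycles = sa_groups // n_bus
    charge_per_cycle = n_bus * c_bus * precharge_v
    return cycles, charge_per_cycle


conventional = dbus_transfer(n_bus=1, precharge_v=2.0)   # 2.0 V is assumed
lpsab_single = dbus_transfer(n_bus=1, precharge_v=1.0)
lpsab_twin = dbus_transfer(n_bus=2, precharge_v=1.0)

print("conventional :", conventional)
print("LPSAB single :", lpsab_single)
print("LPSAB twin   :", lpsab_twin)
```

In this simplified model the twin-DBUS case finishes in half the cycles while drawing the same charge per cycle as the conventional single bus, which matches the slide's claim in spirit.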


Conclusion

• A 512Gb 3b/cell flash memory on a 96-WL-layer BiCS technology has been developed successfully.

– A die size of 86.1mm2 and a bit density of 5.95Gb/mm2 were achieved.

– The string based start bias control scheme achieved a 7% shorter program time.

– Smart Vt tracking read improves retry read performance:
  1. 300us read time by minimizing the tracking time.
  2. Read latency improved by supporting a program suspend read.

– Low-pre-charge sense-amp bus scheme reduces the data-transfer time between the sense amplifier and the cache data by half (=3.5us gain).

A Flash Memory Controller for 15µs Ultra-Low-Latency SSD Using High-Speed 3D NAND Flash with 3µs Read Time

Wooseong Cheong, Chanho Yoon, Seonghoon Woo, Kyuwook Han, Daehyun Kim, Chulseung Lee, Youra Choi, Shine Kim, Dongku Kang, Geunyeong Yu, Jaehong Kim, Jaechun Park, Ki-Whan Song, Ki-Tae Park, Sangyeun Cho, Hwaseok Oh, Daniel DG LEE, Jin-Hyeok Choi, Jaeheon Jeong

Samsung Electronics, Korea

© 2018 IEEE International Solid-State Circuits Conference. 20.2: A Flash Memory Controller for 15µs Ultra-Low-Latency SSD Using High-Speed 3D NAND Flash with 3µs Read Time

Outline

• Introduction
– Motivation
– Definition of Latency
– Main Contribution
• Proposed Solutions
– Controller Architecture
– Split DMA Scheme
– Suspend / Resume DMA Scheme
• Experimental Results
– Controller & Z-SSD Implementation
– Random Read Performance
– Benchmark Performance
• Conclusions


Motivation

• Need for an SCM-class storage device → ultra-low-latency SSD solution with NAND flash (Z-NAND)

[Figures: memory hierarchy; Z-NAND.]

Motivation

• Three key components of an ultra-low-latency SSD: Z-NAND + Controller + S/W

Characteristics of NAND:

| | 3rd Gen. V-NAND | 4th Gen. V-NAND | Z-NAND |
|---|---|---|---|
| Technology | 3D NAND, 48 stacked | 3D NAND, 64 stacked | 3D NAND, 48 stacked |
| tR | 45µs | 60µs | 3µs |
| tPROG | 660µs | 700µs | 100µs |
| Capacity | 256Gb | 512Gb | 64Gb |
| Page size | 16kB/page | 16kB/page | 2kB/page |

• Controller architecture proposal for an ultra-low-latency SSD solution

Definition of Latency

• Latency: total time consumed from the host command (①) to the data transfer (⑥)

[Figure: data path through the SSD controller between the host and the NAND flash, via host manager, system and media manager. ① command from the host, ② logical information, ③ NAND command, ④ data from NAND, ⑤ data check, ⑥ data to the host.]

Definition of Latency

• Total latency: Read latency = t_Host + t_Controller + t_Media + t_Transfer

| Component | Description | Conventional SSD (µs) |
|---|---|---|
| t_Host | Time from the host issuing a command to the controller recognizing it | 5 ~ 10 |
| t_Controller | Time from decoding the command to starting the media operation | 18 ~ 22 |
| t_Media | Time to read 4kB of data from the media and transfer it to the controller | 53 ~ 68 |
| t_Transfer | Time to transfer 4kB of data from the controller to the host | 3 ~ 7 |

Main Contribution

• Components of t_Media: t_Media = t_R + t_DMA

| Component | Description | Conventional SSD* (µs) | Proposed SSD (µs) |
|---|---|---|---|
| t_R | Time to transfer 4kB of data from the cell array to the register in the media | 45 | 3 |
| t_DMA | DMA time to transfer the read data from the media to the controller | 8 | 4 |

* Conventional SSD: Samsung PM963

• Main idea of this paper: how to reduce t_DMA

Main Contribution

• Improvement goal setting (values in µs; t_Media = t_R + t_DMA)

| | t_Host/Transfer | t_Controller | t_R | t_DMA | Total read latency |
|---|---|---|---|---|---|
| Conventional SSD* | 8.5 | 15 | 45 | 8 | 76.5 |
| Proposed SSD | 5.5 | 3.4 | 3 | 4 | 15.9 |

– t_R is reduced by Z-NAND.
– t_DMA is reduced by co-operation of Z-NAND and the controller.

* Conventional SSD: Samsung PM963
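The totals on this slide are the plain sums of the four components. The sketch below reproduces them; the dictionary keys are names chosen here, and the split of 5.5 and 3.4 between host/transfer and controller follows the order in which the values appear on the slide, which is an assumption (the totals do not depend on it).

```python
# Latency budget in microseconds, values from the slide
conventional = {"t_host_transfer": 8.5, "t_controller": 15, "t_R": 45, "t_DMA": 8}
proposed = {"t_host_transfer": 5.5, "t_controller": 3.4, "t_R": 3, "t_DMA": 4}

for name, budget in (("conventional", conventional), ("proposed", proposed)):
    t_media = budget["t_R"] + budget["t_DMA"]
    total = round(sum(budget.values()), 1)
    print(f"{name:12s} t_Media = {t_media} us, total = {total} us")

total_conventional = round(sum(conventional.values()), 1)
total_proposed = round(sum(proposed.values()), 1)
```

The media part alone drops from 53µs to 7µs, which is where most of the reduction comes from.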


Controller Architecture

• Block diagram

[Figure: PCIe / NVMe host interface; host manager with HW engine and HCPU (host FW); BUS; media manager with HW engine and FCPU (FTL); NAND channels 0 to 7.]

Methods of Reducing Data Transfer Time

Signal Implementation Power Size Software Integrity Risk Consumption Cost supportability Cost

IO Frequency Low High High High Low

IO Width High Medium Medium Medium Medium

Split DMA Scheme Medium Medium Low Low Medium

* Lower is better

Split DMA Scheme

• Conventional: a 4kB read uses one channel (CH0): tR, then tDMA (4kB) = 8µs.
• Split DMA: the 4kB read is split into two 2kB transfers on CH0 and CH1 in parallel: tR, then tDMA (2kB) = 4µs on each channel. The difference is the gain.

[Figure: split DMA management engine between the DRAM mapping table (LPN0 → PPN0, LPN1 → PPN1, LPN2 → PPN2, ..., LPNx → PPNx) and the NAND, with 2kB units on channels CH0, CH2, CH4, CH6, CH1, CH3, CH5, CH7.]

* LPN: Logical Page Number / PPN: Physical Page Number
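The timing gain of the split can be sketched as follows. The model assumes that DMA time scales linearly with the bytes moved per channel and that the two channels run fully in parallel; the per-channel rate is derived from the slide's 4kB in 8µs, while the function and parameter names are hypothetical.

```python
CHANNEL_KB_PER_US = 4 / 8   # one channel moves 4 kB in 8 us (from the slide)


def split_dma(read_kb=4, channels=("CH0", "CH1")):
    """Split one logical read evenly over the given channels.

    Returns the per-channel transfer plan and the resulting DMA time in us.
    """
    chunk_kb = read_kb / len(channels)
    plan = [(channel, chunk_kb) for channel in channels]
    # Channels work in parallel, so the slowest chunk sets the DMA time.
    t_dma_us = max(size for _, size in plan) / CHANNEL_KB_PER_US
    return plan, t_dma_us


plan_single, t_single = split_dma(channels=("CH0",))
plan_split, t_split = split_dma(channels=("CH0", "CH1"))
print(plan_single, t_single)
print(plan_split, t_split)
```

With a 2kB page size on Z-NAND, each half of the 4kB read maps naturally to one page on one channel.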

Challenge of Split DMA scheme

• Bad block management / channel data skew

[Figure: split DMA management engine between the CPU and the per-channel flash controllers. Blocks: LPN translator, PPN decoder, channel splitter, block re-configurator (PBA → PBA’), remap checker. Each channel (CH0, CH1) has a flash controller and PHY connected to blocks BLK0 to BLK3; a bad block is replaced by a remap block.]

* PBA: Physical Block Address

Suspend / Resume DMA

• Read QoS improvement @ mixed read/write

* QoS: Quality of Service

Without suspend / resume DMA:
– Way 0: DMA for 16kB program (24µs).
– Way 1: tR CMD issue, tR (3µs), ready; data out follows the program DMA on way 0.

With suspend / resume DMA:
– Way 0: the program DMA is suspended, then resumed for the remaining program data.
– Way 1: tR CMD issue, tR (3µs), ready, data out. The earlier data out is the gain.
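The effect on read completion time can be sketched with a small timeline model. It assumes that the program DMA starts at t = 0 and occupies the channel for 24µs, that the read command can be issued at any time, and that the read data-out takes 4µs (the split-DMA value from the earlier slide). The function name and the simplifications are hypothetical.

```python
def read_done_us(t_cmd_us, suspend, prog_dma_us=24, t_r_us=3, read_dma_us=4):
    """Time at which the read data-out finishes on a channel shared with a program DMA."""
    ready = t_cmd_us + t_r_us
    # Without suspend, data out has to wait until the program DMA has finished.
    start = ready if suspend else max(ready, prog_dma_us)
    return start + read_dma_us


without_suspend = read_done_us(t_cmd_us=0, suspend=False)
with_suspend = read_done_us(t_cmd_us=0, suspend=True)
print(without_suspend, with_suspend, without_suspend - with_suspend)
```

In this model the program DMA finishes later by the length of the read data-out, which is the price paid for the shorter read latency.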


Controller Implementation

[Die photo labels: PCIe PHY, Logic, SRAM, NAND IO.]

• Single port PCIe Gen3 4-lane
• NVMe v1.2
• NAND 8CH
• Samsung Foundry FinFET process

Z-SSD Implementation

| Item | Value |
|---|---|
| Capacity | 800GB |
| Form factor | HHHL (half-height, half-length) |
| Host interface | Single PCIe Gen3 x 4 lanes |
| Spec compliance | NVMe v1.2, PCI Express v3.0 |
| DRAM | LPDDR4 1.5GB |
| NAND | Samsung Z-NAND |

[Photos: Z-SSD with controller; Dell PowerEdge R730 server with Z-SSD.]

QD1 Random Read NVMe Trace

[Figure: NVMe packet trace between host and device (packet #124) with time markers ① to ⑦.]

Intervals between successive markers, in trace order: 1.664μs, 0.590μs, 9.908μs, 2.067μs, 0.157μs, 1.200μs.

| Item | Markers | Value |
|---|---|---|
| t_Host | ①/⑦ | 2.864μs |
| t_Device | ②~⑥ | 12.722μs |
| t_Total | | 15.586μs |
| Performance | | 64.2KIOPS |
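The summary values follow from the six measured intervals. The sketch below assumes that the first and last intervals make up the host part and the four in between make up the device part, which is how the sums work out; at queue depth 1 the IOPS figure is simply the inverse of the total latency.

```python
# Intervals in microseconds, in the order they appear in the trace
intervals_us = [1.664, 0.590, 9.908, 2.067, 0.157, 1.200]

t_host_us = round(intervals_us[0] + intervals_us[-1], 3)
t_device_us = round(sum(intervals_us[1:-1]), 3)
t_total_us = round(t_host_us + t_device_us, 3)

# QD1: one command in flight, so IOPS = 1 / latency
kiops = 1e6 / t_total_us / 1000

print(f"t_Host   = {t_host_us} us")
print(f"t_Device = {t_device_us} us")
print(f"t_Total  = {t_total_us} us")
print(f"QD1 performance = {kiops:.1f} KIOPS")
```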

4KB Random Read Latency Distribution

[Figure: latency distributions on a log axis from 1 to 1000μs for the QD 1 case and the QD 16 case. Curves: Z-SSD, PRAM based SSD, conventional CTRL with Z-NAND, conventional CTRL with conventional NAND.]

Benchmark Performance (Client)

[Figure: PCMark 8 results.]

Benchmark Performance (DC/Enterprise)

[Figure: four bar charts comparing conventional SSD (C), PRAM based SSD (P) and Z-SSD (Z): DBMS (PostgreSQL+TPC-C), key value DB (Rocks DB), cache (Fatcached+twemperf), real time analytics (Pagerank). Each chart is marked either “higher is better” or “lower is better”.]

Demo Session

[Figure: persistent cache on Rocks DB; latency comparison @ Rocks DB, latency (μs) vs. time (sec).]

* Latency: lower is better.

Demo Session

• Real-time read latency benchmark by FIO visualizer
– Case 1: Core 1 / QD 1 with host delay, for latency check
– Case 2: Core 1 / QD 1 without host delay, for latency check
– Case 3: Core 2 / QD 64 each, for random read IOPS check

[Figure: example screenshot of Case 3.]

Conclusions

• Controller design for ultra-low-latency SSD
– Achieves an average 4kB read latency of 15µs with Z-NAND
– Overall t_Media: 53µs → 7µs
• Experimental result
– The 4kB random read distribution @QD16 is improved:
  39% compared to conventional controller + Z-NAND
  20% compared to PRAM based SSD
• Performance improvement in various applications
– For client: PCMark 8
– For DC/Enterprise: DBMS, Memcache, in-memory DB, real-time analytics

A 1Tb 4b/Cell 64-Stacked-WL 3D NAND Flash Memory with 12MB/s Program Throughput

Seungjae Lee, Chulbum Kim, Minsu Kim, Sung-min Joe, Joonsuc Jang, Seungbum Kim, Kangbin Lee, Jisu Kim, Jiyoon Park, Han-Jun Lee, Minseok Kim, Seonyong Lee, SeonGeon Lee, Jinbae Bang, Dongjin Shin, Hwajun Jang, Deokwoo Lee, Nahyun Kim, Jonghoo Jo, Jonghoon Park, Sohyun Park, Youngsik Rho, Yongha Park, Ho-joon Kim, Cheon An Lee, Chungho Yu, Youngsun Min, Moosung Kim, Kyungmin Kim, Seunghyun Moon, Hyunjin Kim, Youngdon Choi, YoungHwan Ryu, Jinwon Choi, Minyeong Lee, Jungkwan Kim, Gyo Soo Choo, Jeong-Don Lim, Dae-Seok Byeon, Kiwhan Song, Ki-Tae Park, Kye-hyun Kyung

Samsung Electronics, Korea

© 2018 IEEE International Solid-State Circuits Conference. 20.3: A 1Tb 4b/Cell 64-Stacked-WL 3D NAND Flash Memory with 12MB/s Program Throughput

Outline

• Introduction
• Chip architecture & Key parameters
• Challenges
• Proposed solutions
– Adaptive Unselected String Pre-charge (USP) Operation
– Re-program scheme
– Slow bit bypass scheme
– Fast read retry scheme
• Conclusion


Introduction

[Figure, left: performance gap (SSD vs. HDD), random read [IOPs] vs. $/bit, with a huge gap of more than x1000 between HDD and SSD; QLC SSD shown alongside TLC SSD. Figure, right: storage tiers from performance-centric to capacity-driven vs. TCO [$/GB]: Z-SSD (real time analytics), NVMe SSD, SAS SSD, TLC SSD, QLC SSD, 10K/15K HDD, near line HDD, Blu-ray; annotations “50% Performance” and “Cost Gap > 20%”; the cost gap is reducing. Source: Flash Memory Summit 2015.]

© 2018 IEEE 20.3: A 1Tb 4b/Cell 64-Stacked-WL 3D NAND Flash Memory with 12MB/s Program Throughput International Solid-State Circuits Conference 4of 32 Introduction Areal density increases by QLC tech. in the same 3D-NAND technology node

[Figure: Areal density (Gb/mm2, log scale from 0.10 to 10.00) versus ISSCC year (2009, 2015, 2016, 2017, 2018). Data points: Toshiba/SanDisk 43nm 64Gb 2D QLC; Samsung 3D TLC with 24 stacked WLs (128Gb), 48 stacked WLs (256Gb) and 64 stacked WLs (512Gb); this work (3D QLC). Annotations: WL stacking (24→48), WL stacking (48→64), TLC→QLC with the same number of stacked WLs; increase of up to 2100%. All data is based on ISSCC.]

Chip Architecture

[Figure: chip architecture; chip size 181.9mm2]

Key Parameters

                      ISSCC09*             ISSCC17**            This Work
Bits per cell         4                    3                    4
Density               64Gb                 512Gb                1Tb
Chip size             244.45mm2            128.5mm2             181.9mm2
Areal density         0.26Gb/mm2           3.98Gb/mm2           5.63Gb/mm2
Technology            2D NAND              3D NAND              3D NAND
                      43nm CMOS            64 stacked WL        64 stacked WL
Organization          8KB / Page           16KB / Page          16KB / Page
                      256 Pages / Block    768 Pages / Block    1024 Pages / Block
                      2 Planes             2 Planes             2 Planes
Program performance   5.6MB/s              51MB/s               12MB/s

Slide annotation between the ISSCC17 and This Work columns: 41.5%

*C. Trinh, et al., “A 5.6MB/s 64Gb 4b/Cell NAND Flash memory in 43nm CMOS,” ISSCC Dig. Tech. Papers, pp. 246-247, Feb. 2009.
**Chulbum Kim, et al., “A 512Gb 3b/cell 64-stacked WL 3D V-NAND flash memory,” ISSCC Dig. Tech. Papers, pp. 202-203, Feb. 2017.
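The areal-density figures in the Key Parameters table follow from the density and chip-size entries. The sketch below recomputes them as a quick check. It assumes 1Tb is counted as 1024Gb (an assumption, chosen because it reproduces the quoted 5.63Gb/mm2), and it reads the 41.5% annotation as the areal-density gain over ISSCC17 computed from the rounded table values, which is only one plausible reading of the slide.

```python
# Recompute areal density (Gb/mm2) from the Key Parameters table.
# Assumption: 1Tb is counted as 1024Gb.
chips = {
    "ISSCC09": {"density_gb": 64, "area_mm2": 244.45},
    "ISSCC17": {"density_gb": 512, "area_mm2": 128.5},
    "This work": {"density_gb": 1024, "area_mm2": 181.9},
}


def areal_density(density_gb, area_mm2):
    """Return areal density in Gb/mm2."""
    return density_gb / area_mm2


densities = {name: areal_density(**c) for name, c in chips.items()}

# Gain over ISSCC17 using the rounded values printed on the slide.
rounded_gain = 5.63 / 3.98 - 1

if __name__ == "__main__":
    for name, value in densities.items():
        print(f"{name}: {value:.2f} Gb/mm2")
    print(f"areal density gain vs ISSCC17 (rounded inputs): {rounded_gain:.1%}")
```
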

Challenges

Solutions for improving program performance and reliability are required

[Figure: TLC versus QLC; the number of states is doubled (x2)]

• Increasing the number of states → program performance & cell reliability get worse
• Limited Vth window → read-window margin decreases
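The two challenges above come down to simple counting. As a rough illustration, the sketch below compares 3b/cell and 4b/cell under an idealized assumption that all states are spaced evenly across a fixed Vth window. Real state placement is not uniform, so the pitch figure is indicative only, and the function names are made up for this example.

```python
def num_states(bits_per_cell):
    """Number of Vth states needed to store bits_per_cell bits."""
    return 2 ** bits_per_cell


def num_read_levels(bits_per_cell):
    """Read levels needed to separate adjacent states."""
    return num_states(bits_per_cell) - 1


def relative_state_pitch(bits_per_cell, vth_window=1.0):
    """Idealized Vth budget per state, assuming evenly spaced states."""
    return vth_window / num_states(bits_per_cell)


if __name__ == "__main__":
    for name, bits in (("TLC", 3), ("QLC", 4)):
        print(name, num_states(bits), num_read_levels(bits),
              relative_state_pitch(bits))
```

With the same window, going from TLC to QLC halves the Vth budget per state in this model, which is why both program precision and read margin become harder.
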

Program disturbance in unselected strings

Unselected strings suffer from program disturbance due to low VCH

Bias: Program BL = GND, Inhibit BL = VDD, Selected SSL = VDD + Vth, Unselected SSLs = GND

              Sel. SSL         Unsel. SSL
Program BL    VCH = GND        VCH = Floating
Inhibit BL    VCH = VDD        VCH = Floating

[Figure: strings with VPGM applied; boosted channel labels "VDD + VBoost" and "Unknown + VBoost"; annotation A < B]

Conventional operation

The channel potential of inhibit strings is lower and unsettled by coupling

[Figure: timing diagram (t0, t1, t2) and string bias at t1. Inhibit BL = VDD, Program BL = GND; selected SSL = VDD + Vth, unselected SSL = GND; WLs at Vpass and VPGM. Channel labels: VDD, Floating, Unknown, VDD + VBoost, Unknown + VBoost.]

Conventional USP operation

All inhibit strings are pre-charged (VDD) during USP operation

[Figure: timing diagram (t0, t1, t2, t3, with tUSP) and string bias at t1. Inhibit BL = VDD, Program BL = VDD; selected SSL = VDD + Vth, unselected SSL = VDD + Vth; WLs at Vpass and VPGM. Channel labels: VDD, VDD + VBoost.]

[6] R. Yamashita, et al., ISSCC, 2017.

Fast USP operation

All inhibit strings are pre-charged (0V/VDD) during fast USP without increasing tPROG

[Figure: timing diagram (t0, t1, t2, t3, with tUSP) and string bias at t1. Inhibit BL = VDD, Program BL = GND; selected SSL = VDD + Vth, unselected SSL = VDD + Vth; WLs at Vpass and VPGM. Channel labels: VDD, 0V, VDD + VBoost, VBoost.]

Comparison of Unselected String Pre-charge (USP) operations

Scheme              VCH of unselected strings    Time label on slide
No USP              Floating                     t1
Fast USP            GND or VDD                   t1
Conventional USP    VDD                          t2

[Figure: BL and SSL waveforms for the three schemes, with Sel. SSL / Unsel. SSL and Inhibit BL / Program BL traces]
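The channel labels on the USP slides can be summarized as a small lookup. The sketch below is a toy model, not the device behavior: it returns the pre-charge level of an unselected string for each scheme and then adds a boost term once the word lines rise. The numeric values of `VDD` and `V_BOOST` and the function names are hypothetical placeholders, and `None` stands for the "Unknown" floating case.

```python
VDD = 2.5      # hypothetical supply voltage, volts
V_BOOST = 8.0  # hypothetical self-boosting gain, volts


def unselected_channel_precharge(scheme, bl_is_inhibit):
    """Pre-charge level of an unselected string, or None if floating."""
    if scheme == "none":
        # Unselected SSL is at GND, so the channel floats at an unknown level.
        return None
    if scheme == "conventional":
        # All BLs are held at VDD during USP.
        return VDD
    if scheme == "fast":
        # BLs already carry program data: inhibit = VDD, program = 0 V.
        return VDD if bl_is_inhibit else 0.0
    raise ValueError(f"unknown scheme: {scheme}")


def boosted_channel(precharge):
    """Channel potential after boosting; None means 'Unknown + VBoost'."""
    return None if precharge is None else precharge + V_BOOST


if __name__ == "__main__":
    for scheme in ("none", "fast", "conventional"):
        for inhibit in (True, False):
            pre = unselected_channel_precharge(scheme, inhibit)
            print(scheme, "inhibit" if inhibit else "program",
                  pre, boosted_channel(pre))
```

In this model the only string that ends below VDD + VBoost with a known level is the fast-USP string on a program BL, which ends at VBoost, matching the labels on the fast USP slide.
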

Adaptive Unselected String Pre-charge (USP) operation

tPROG can be reduced by ~8% with the adaptive USP scheme

[Figure: PGM disturbance versus PGM voltage for fast USP and conv. USP. The two are nearly the same up to PGM voltage A. Fast USP is used while VPGM ≦ A, conv. USP once VPGM > A.]

• In early program loops, PGM disturbance is sufficiently overcome by using the fast USP
• Re-program scheme & increasing the number of states → number of PGM loops ↑ → tPROG decreases more effectively
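The adaptive USP rule (fast USP while VPGM ≦ A, conventional USP once VPGM > A) can be sketched as a per-loop selection. The code below is a minimal illustration only: the start voltage, step, loop count, threshold A and per-loop overheads are hypothetical values chosen for the example, and the zero overhead for fast USP simply reflects the slide's statement that fast USP does not increase tPROG.

```python
def select_usp(vpgm, threshold_a):
    """Fast USP while VPGM <= A, conventional USP once VPGM > A."""
    return "fast" if vpgm <= threshold_a else "conventional"


def usp_schedule(vpgm_start, vpgm_step, num_loops, threshold_a):
    """USP mode chosen for each program loop of an incremental-step program."""
    return [select_usp(vpgm_start + i * vpgm_step, threshold_a)
            for i in range(num_loops)]


def total_usp_overhead(schedule, t_fast, t_conv):
    """Sum of per-loop USP time overheads (arbitrary units)."""
    return sum(t_fast if mode == "fast" else t_conv for mode in schedule)


# Hypothetical operating point, for illustration only.
schedule = usp_schedule(vpgm_start=14.0, vpgm_step=0.5,
                        num_loops=8, threshold_a=16.0)
adaptive = total_usp_overhead(schedule, t_fast=0.0, t_conv=5.0)
always_conv = total_usp_overhead(["conventional"] * 8, t_fast=0.0, t_conv=5.0)

if __name__ == "__main__":
    print(schedule)
    print("adaptive overhead:", adaptive, "always-conventional:", always_conv)
```

The more loops that fall below A, the larger the saving, which is consistent with the slide's point that the benefit grows as the number of PGM loops increases.
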

Two degradation factors in 3D NAND

• Initial charge loss → Vth decreases
  – ① Vertical tunneling
  – ② Lateral spreading
  – ③ Recombination
• WL interference → Vth increases

[Figure: cell cross-sections (WL, oxide, trap layer, poly-Si) for both factors, showing the WL space region between WLn and WLn+1, trapped electrons after WLn PGM, and trapped electrons after WLn+1 PGM]

Re-Program Scheme

[Figure: cell cross-sections (oxide, trap layer, poly-Si) after coarse PGM of WLn, after coarse PGM of WLn+1, and after fine PGM of WLn. Legend: electrons at normal trap, electrons at shallow trap, electrons at WL space trap. Sequence diagram over WLn-1 to WLn+2: coarse PGM of WLn, then coarse PGM of WLn+1, then fine PGM of WLn, with WL interference, PGM disturbance and initial charge loss marked between the steps.]

Re-program scheme is adopted to compensate for the degradation factors
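The slide shows the order for one pair of word lines: coarse PGM of WLn, coarse PGM of WLn+1, then fine PGM of WLn. A minimal sketch of how that ordering might extend to a whole block is given below. The generalization to all word lines and the function name `reprogram_order` are assumptions made for illustration; the slides do not spell out the full-block sequence.

```python
def reprogram_order(num_wls):
    """Return (step, wl) tuples so that fine PGM of WLn follows coarse PGM of WLn+1."""
    order = []
    for wl in range(num_wls):
        order.append(("coarse", wl))
        if wl >= 1:
            # The neighbour below has now seen interference from this WL,
            # so its fine program can place the final Vth.
            order.append(("fine", wl - 1))
    if num_wls > 0:
        # The last WL has no upper neighbour left to program.
        order.append(("fine", num_wls - 1))
    return order


if __name__ == "__main__":
    for step, wl in reprogram_order(4):
        print(f"{step} PGM of WL{wl}")
```

Deferring the fine pass lets it absorb both the initial charge loss after the coarse pass and the interference from the neighbouring word line's coarse pass.
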

Measurement results

Re-program scheme can realize a narrow Vth distribution in a limited Vth window

BER decreases by 84% when using the proposed schemes

[Figure: Vth distribution, No. of cells [A.U.] versus Vth [A.U.], without and with the schemes; normalized BER without and with the schemes, showing the 84% reduction]

Degradation due to slow bits

Program disturbance & tPROG increase due to slow bits

[Figure: Vth distribution (No. of cells versus Vth) showing the erase state, the P15 state, and the slow bits]

• The number of overall programming loops is limited by only a few slow bits
• Number of programming loops and maximum programming voltage increase

Slow bit bypass scheme

tPROG is reduced by 1.74% with the scheme

[Figure: PGM voltage staircase. Without the scheme, # of Ref. bits = A throughout. With the scheme, # of Ref. bits = A for the earlier states and # of Ref. bits = B after P14 pass; the shortened staircase is marked as the performance gain.]

• Number of P15 reference bits can be set larger than other states
• The number of programming loops & program disturbance are reduced
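One way to read the slow bit bypass slide is as a relaxed verify-pass criterion for the last state. The sketch below assumes that "reference bits" means the number of still-unverified cells tolerated when declaring a state passed; that interpretation, the cell counts per loop, and the values of A and B are all hypothetical and serve only to show why a larger B ends programming earlier.

```python
def loop_where_state_passes(unverified_per_loop, ref_bits):
    """First program loop (1-based) at which unverified cells <= ref_bits."""
    for loop, unverified in enumerate(unverified_per_loop, start=1):
        if unverified <= ref_bits:
            return loop
    return None  # never passed within the given loops


# Hypothetical count of P15 cells still failing verify after each loop.
# The long tail represents a few slow bits.
p15_unverified = [5000, 900, 60, 7, 3, 1, 0]

REF_BITS_A = 2  # hypothetical criterion used for the other states
REF_BITS_B = 8  # hypothetical, larger criterion used for P15 after P14 pass

loops_with_a = loop_where_state_passes(p15_unverified, REF_BITS_A)
loops_with_b = loop_where_state_passes(p15_unverified, REF_BITS_B)

if __name__ == "__main__":
    print("loops with A:", loops_with_a)
    print("loops with B:", loops_with_b)
```

Fewer loops also means a lower maximum program voltage, which is how the scheme would reduce program disturbance as well as tPROG.
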

Measured Vth distribution

Vth window is expanded by 4% with the scheme

[Figure: measured Vth distribution, No. of cells [A.U.] versus Vth [A.U.], without and with the scheme]

Retention issue

[Figure: Vth versus log(tRET). The P14 upper and P15 lower distribution edges are shown at the initial point and after retention, together with the default read level set at the initial point. Annotation: a new read level is required!]

• Upper & lower Vth distributions of program states dramatically shift in proportion to the retention time
• We need a new read scheme to change the read level adaptively

Fast read retry scheme

[Figure: Vth distributions at the initial point and after retention, with read voltages VDET_RET and VDET; the cells between them are the error bits caused by retention]

# of error bits ∝ charge loss due to retention
# of error bits is the criterion to correct the read level

• Sensing with two different voltages (VDET_RET, VDEF)
• Timing overhead is inevitable due to reading with two read levels

Fast read retry scheme (Cont’d)

[Figure: conventional versus fast read retry. Waveforms of the selected WL (VDET_RET, VDET) and of the sensing node (trip level, on-cell and off-cell traces) against Vth. Conventional: sensing time tSen for each of the two WL levels. Fast read retry: sensing times tSen′ and tSen.]

• Sensing with two different voltages (VDET_RET, VDEF)
• Sensing with two different sensing times (tSen, tSen′)
• Timing overhead for reading is negligible

Flow chart

Fast read retry can extend the lifetime of a NAND flash while minimizing time overhead

NC: counted bits; NTH: reference; LUT: look-up table

[Flow chart: Read → Count error bits (NC) → NC ≦ NTH? Yes: data out. No (NC > NTH): change read level based on LUT.]

• NC is extracted by read & page buffer operation
• NC ≦ NTH → read levels are unchanged & data out
• NC > NTH → charge loss read errors have been detected → read levels are changed according to the predefined look-up table (LUT)
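The decision in the flow chart can be sketched as a single function. The code below is an illustration under stated assumptions: the LUT layout (a mapping from a minimum counted-bit value to a read-level offset in mV), the offsets themselves, and the name `choose_read_levels` are hypothetical, since the slides only say that read levels are changed according to a predefined look-up table.

```python
def choose_read_levels(n_c, n_th, default_levels, lut):
    """Return the read levels to use given the counted bits NC.

    lut maps a minimum NC value to a read-level offset in mV (hypothetical
    layout). The entry with the largest key not exceeding NC is applied.
    """
    if n_c <= n_th:
        # No significant charge loss detected: keep the default levels.
        return list(default_levels)
    for min_count in sorted(lut, reverse=True):
        if n_c >= min_count:
            return [level + lut[min_count] for level in default_levels]
    return list(default_levels)


# Hypothetical values, for illustration only.
N_TH = 100
DEFAULT_LEVELS_MV = [1000, 2000]
LUT = {101: -20, 301: -50}

if __name__ == "__main__":
    for n_c in (50, 150, 400):
        print(n_c, choose_read_levels(n_c, N_TH, DEFAULT_LEVELS_MV, LUT))
```

Because NC comes from the same read and page-buffer operation, the level correction needs no separate calibration pass in this model.
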

Conclusion

1Tb 4b/cell 3D NAND flash memory
• Die size of 181.9mm2 (5.63Gb/mm2), 12MB/s program throughput
• Re-program scheme achieved an 84% reduction in BER
• Adaptive USP and slow bit bypass schemes helped improve program performance and cell reliability
• Fast read retry scheme expanded the read-window margin while minimizing time overhead
