Design and Modeling of Nonvolatile Memories by Resistive Switching Elements


A Dissertation Presented

by

Pilin Junsangsri

to

The Department of Electrical and Computer Engineering

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in the field of

Computer Engineering

Northeastern University

Boston, Massachusetts

December 2017

ABSTRACT

With continued scaling into the nanometer range, the technology roadmap predicted by Moore’s Law is becoming difficult to meet. So-called emerging technologies have been widely reported as candidates to supersede or complement CMOS. A design style that combines emerging technologies with CMOS is commonly referred to as “hybrid” because it exploits the different characteristics of each technology. This approach is very attractive for memories, whose modular (cell-based) organization is well suited to new technologies and innovative design paradigms. This research presents new hybrid memory designs that combine CMOS with emerging technologies such as the memristor, phase change memory (PCM), the programmable metallization cell (PMC), and racetrack memory (RM). New HSPICE macromodels of these emerging technologies are introduced and applied to memory designs such as the nonvolatile memory cell, CAM, TCAM, NVSRAM, and crossbar array to produce hybrid nonvolatile memory cells. With its nonvolatile storage element, fast switching time, low power consumption, and good scalability, the hybrid memory cell combining emerging technologies and CMOS is one of the most promising candidates for the next generation of nonvolatile memory.


ACKNOWLEDGEMENTS

I would like to thank the many people who have supported me. Thank you to Prof. Fabrizio Lombardi, my research advisor, for his guidance and support. It has been an honor to be your student, and your advice has been invaluable to me. Thank you to my dissertation committee members, Prof. Carmine Vittoria and Prof. Matteo Rinaldi, for their time, interest, and suggestions regarding my dissertation. Thank you to Prof. Amir Farhat, my teaching assistant supervisor, for his guidance and suggestions on many matters. Thank you to the faculty and staff of the Department of Electrical and Computer Engineering, Northeastern University, for their great support. I had a great experience at this university.

Moreover, I would like to thank my family and friends, especially my aunt, Ms. Patchara Duangpatra, for their encouragement and support throughout my life. I wouldn’t have made it without them.

TABLE OF CONTENTS

Page

ABSTRACT……………………………………………………………………………… ii

ACKNOWLEDGEMENTS………………………………………………………………iii

TABLE OF CONTENTS………………………………………………………………... iv

LIST OF FIGURES……………………………………………………………………… vi

LIST OF TABLES…………………………………………………………………..... xviii

1. INTRODUCTION………… 1
   1.1. Overview………… 1
   1.2. Types of Memory………… 4
      1.2.1. Random Access Memory………… 5
      1.2.2. Flash Memory………… 6
      1.2.3. Content Addressable Memory………… 10
   1.3. Emerging Technology………… 13
      1.3.1. Ambipolar Transistor………… 13
      1.3.2. Magnetic Tunnel Junction (MTJ)………… 15
   1.4. Hybrid Memory………… 16
      1.4.1. Memristor-based Content Addressable Memory (MCAM)………… 17
      1.4.2. MTJs-based Nonvolatile SRAM………… 19
      1.4.3. MTJs-based Logic in Memory………… 20
   1.5. Conclusion………… 26

2. MEMRISTOR………… 27
   2.1. Introduction………… 27
   2.2. Fundamental of Memristor………… 27
   2.3. Applications of Memristor………… 30
      2.3.1. Memristor-Based Nonvolatile Memory Cell………… 30
      2.3.2. Memristor-Based Ternary Content Addressable Memory (TCAM) cell………… 48
   2.4. Conclusion………… 68

3. PROGRAMMABLE METALLIZATION CELL (PMC)………… 70
   3.1. Introduction………… 70
   3.2. Fundamental of Programmable Metallization Cell (PMC)………… 70
   3.3. Macromodel of PMC………… 73
      3.3.1. Proposed Macromodel………… 73
      3.3.2. Model Simulation………… 80
   3.4. Applications of PMC………… 88
      3.4.1. PMC-Based Nonvolatile SRAM (7T1P)………… 89
      3.4.2. Crossbar Memory………… 101
      3.4.3. PMC-Based NVSRAM with Concurrent SEU Detection and Correction………… 107
      3.4.4. PMC-Based Logic in Memory………… 128
   3.5. Conclusion………… 150

4. PHASE CHANGE MEMORY………… 151
   4.1. Introduction………… 151
   4.2. Fundamental of Phase Change Memory (PCM)………… 151
      4.2.1. Drift Behavior of PCM………… 154
   4.3. Macromodel of PCM………… 155
      4.3.1. Operational Model………… 157
      4.3.2. Drift Model………… 165
      4.3.3. Model Simulation………… 177
   4.4. Applications of Phase Change Memory (PCM)………… 184
      4.4.1. PCM-Based CAM and TCAM Cells………… 185
      4.4.2. Multilevel Phase Change Memory………… 215
   4.5. Conclusion………… 248

5. RACETRACK MEMORY………… 252
   5.1. Introduction………… 252
   5.2. Fundamental of Racetrack Memory (RM)………… 253
   5.3. Macromodel of Racetrack Memory………… 254
      5.3.1. Proposed Macromodel of Racetrack Memory………… 255
      5.3.2. Model Simulation………… 273
   5.4. Applications of Racetrack Memory………… 280
      5.4.1. Racetrack-based nonvolatile memories………… 280
      5.4.2. Racetrack-based CAM and TCAM cells………… 301
   5.5. Conclusion………… 320

6. REFERENCES…………………………………………………………………..… 322

LIST OF FIGURES Page

Fig. 1. Traditional memory hierarchy of the computer system [1]…………………………….. 1

Fig. 2. A Performance gap between memory and processor [2]……………………………….. 2

Fig. 3. A plot of CPU transistor counts against dates of introduction [3]……………………... 3

Fig. 4. Traditional Static Random Access Memory (SRAM) or 6T-SRAM……………...…… 5

Fig. 5. Dynamic Random Access Memory (DRAM)……………………...…………………... 6

Fig. 6. NOR Flash memory a) Circuit Schematic b) Erase c) Write d) Read Operations……... 7

Fig. 7. Circuit Schematic of NAND Flash memory………………………………………….... 8

Fig. 8. SRAM-Based Content Addressable Memory (CAM) [23]…………………………… 11

Fig. 9. SRAM-Based Ternary Content Addressable Memory (TCAM) [23]………………… 12

Fig. 10. Ambipolar transistor, a) Symbol, b) Characteristic……………………………….…... 14

Fig. 11. Model of an ambipolar transistor…………………………………………………...… 15

Fig. 12. Resistance variation of the MTJ according to the storage layer magnetization state…. 16

Fig. 13. 7T NOR-type memristor-based CAM cell [39]…………………………………….…. 17

Fig. 14. 4T2MTJs Nonvolatile SRAM [40]…………………………………………………… 19

Fig. 15. a) Input Images b) Output Images when AND, OR, XOR operations between the two input images and the inverse operation of input 1 are executed………… 21

Fig. 16. General Structure of LiM cell of [41]…………………………………………………. 22

Fig. 17. MTJ-Based LiM when implementing AND Gate (Z = XY) [41]…………………..… 23

Fig. 18. Relationship between fundamental circuit elements………………………………….. 28

Fig. 19. TiO2 film sandwiched between two Pt electrodes…………………………………..… 29

Fig. 20. Proposed memristor-based nonvolatile memory cell…………………………………. 30

Fig. 21. Plot of voltage, resistance (y-axis) and time (x-axis) for the proposed memory cell.… 33

Fig. 22. Plot of current, resistance (y-axis) and time (x-axis) for the proposed memory cell…. 33

Fig. 23. Plot of voltage difference between the bitlines (y-axis) and READ time (x-axis) for the proposed memory cell………… 36

Fig. 24. Driver circuit for WRITE and READ operations…………………………………...… 36

Fig. 25. Plot of WRITE time (ns) vs memristance range (kΩ)………………………………… 37

Fig. 26. Plot of voltage difference across memristor vs READ time………… 38

Fig. 27. Memristance of the proposed memory cell (y-axis) vs time (x-axis) for consecutive WRITE ‘1’ and ‘0’ operations………… 39

Fig. 28. Memristance of the proposed memory cell (y-axis) vs READ time (x-axis) for a READ operation………… 41

Fig. 29. READ time of the proposed memory cell vs number of consecutive READ operations for state change………… 41

Fig. 30. Plot of WRITE and READ times Vs transistor size (NMOS)………………………… 43

Fig. 31. Proposed TCAM design using memristors……………………………………………. 48

Fig. 32. a) First Step of Write ‘0’ Operation b) Second Step of Write ‘0’ Operation…………. 52

Fig. 33. a) First Step of Write ‘1’ Operation b) Second Step of Write ‘1’ Operation…………. 53

Fig. 34. Write ‘2’ Operation………………………………………………………………….... 54

Fig. 35. Match line voltage of TCAM in figure 32 during the search ‘0’ operation when TCAM data is state ‘0’………… 56

Fig. 36. Match line voltage of TCAM in figure 32 during the search ‘0’ operation when TCAM data is state ‘1’………… 57

Fig. 37. Match line voltage of TCAM of figure 32 during the search ‘0’ operation when TCAM data is state ‘2’………… 57

Fig. 38. Match line voltage of TCAM in figure 32 during the search ‘1’ operation when TCAM data is state ‘0’………… 58

Fig. 39. Match line voltage of TCAM in figure 32 during the search ‘1’ operation when TCAM data is state ‘1’………… 58

Fig. 40. Match line voltage of TCAM in figure 32 during the search ‘1’ operation when TCAM data is state ‘2’………… 59

Fig. 41. Match line voltage of TCAM in figure 32 during the search ‘2’ operation when TCAM data is state ‘0’………… 59

Fig. 42. Match line voltage of TCAM in figure 32 during the search ‘2’ operation when TCAM data is state ‘1’………… 60

Fig. 43. Match line voltage of TCAM in figure 32 during the search ‘2’ operation when TCAM data is state ‘2’………… 60

Fig. 44. Write time (ns) of proposed TCAM cell vs supply voltage at 32nm technology……... 63

Fig. 45. Write operation of the proposed TCAM cell when operating as a MCAM cell a) Write ‘0’ operation, b) Write ‘1’ operation………… 66

Fig. 46. Switching processes in the PMC: a) the CF grows vertically before set occurs, b) the CF dissolves laterally before reset………… 70

Fig. 47. Flowchart of the proposed PMC Macromodel………………………………………... 74

Fig. 48. Circuit model of programmable metallization cell (PMC)……………...…………….. 74

Fig. 49. Voltage polarity checking circuit……………………………………………………... 76

Fig. 50. Previous CF volume stored circuit………………………………………………...….. 77

Fig. 51. a) Voltage pulse sequence across PMC b) CF height c) CF radius of PMC vs simulation time (ms)………… 81

Fig. 52. Voltage difference across PMC for generating I-V and R-V plots………………….... 82

Fig. 53. I-V characteristics of the proposed PMC macromodel……………………………….. 82

Fig. 54. R-V characteristics of the proposed PMC macromodel………………………………. 82

Fig. 55. Percentage errors between the switching time of the proposed PMC macromodel and the experimental results [60] for the set and reset processes vs pulse amplitude (V)………… 83

Fig. 56. I-V characteristics of the proposed PMC macromodel when the ramp rate of the DC sweep is 1, 3 and 5 V/s………… 84

Fig. 57. I-V characteristics of the proposed PMC macromodel when the data of [68] is employed and the ramp rate of the DC sweep is 1 V/s………… 87

Fig. 58. Relationship between switching time and pulse amplitude of voltage drop across PMC from simulation and experimental data [68]………… 88

Fig. 59. Percentage difference between the switching time of experimental and simulated data versus pulse amplitude………… 88

Fig. 60. The proposed non-volatile 7T1P Cell…………………………………………………. 89

Fig. 61. Least PMC resistance (ON-state resistance) vs ratio of largest CF radius and height of PMC………… 91

Fig. 62. Store time of 7T1P cell when its supply is changed………………………………...… 92

Fig. 63. Voltages at D and DN of 7T1R cell vs. supply voltage (voltages at Ctrl1 and Ctrl2 are 0)………… 92

Fig. 64. Voltages at D and DN during the restore operation when a ‘1’ is stored in the 7T1P cell………… 94

Fig. 65. Restore time of 7T1P cell and restore voltages at Ctrl1 and Ctrl2, when data ‘1’ is stored in the PMC (the PMC resistance is 169.697kΩ)………… 94

Fig. 66. Least restore voltage (voltage at Ctrl1 and Ctrl2) vs. ON-state resistance of PMC…... 95

Fig. 67. Read-out scheme I………………………………………………………………….... 102

Fig. 68. Read-out scheme II……………………………………………………………...…… 103

Fig. 69. Ratio of sense voltage and supply voltage versus sense resistance of the crossbar…. 105

Fig. 70. Relative noise margin versus crossbar dimension………………………………….... 107

Fig. 71. Proposed non-volatile SRAM (7T1P) cell…………………………………………... 108

Fig. 72. Connection between 7T1P Array and CED circuit………………………………….. 111

Fig. 73. Proposed XOR gate using ambipolar transistors…………………………………….. 112

Fig. 74. Dual-rail checker for CED………………………………………………………...…. 113

Fig. 75. Ambipolar-based dual-rail checker………………………………………………….. 113

Fig. 76. Model of ambipolar transistor.………………………………………………………. 115

Fig. 77. Input and output voltages of inverter……………………………………………….... 116

Fig. 78. Input and output voltages of the proposed XOR gate……………………………….. 116

Fig. 79. Voltages at D and DN, and PMC resistance value of 7T1P cell when '0' and '1' are written to the proposed memory cell………… 117

Fig. 80. Plot of voltage at DP versus read time of 7T1P cell when data stored in the PMC is read………… 119

Fig. 81. Layout of the proposed NVSRAM (7T1P)……………………………………..…… 126

Fig. 82. Layout of the proposed Ambipolar-based CED…………………………………...… 126

Fig. 83. Layout of the CMOS-Based CED………………………………………………….... 126

Fig. 84. General structure of the proposed (PMC-based) LiM cell………………………...… 128

Fig. 85. Voltage at node D in the read operation for a '1' as data stored in the PMC……….... 129

Fig. 86. First proposed (ambipolar-based) LiM cell………………………………………….. 130

Fig. 87. AND operation between a ‘1’ stored in the PMC and '0' as input data…………...… 131

Fig. 88. Full adder using first proposed LiM cell…………………………………………….. 134

Fig. 89. Voltages at Pre1, Pre2, Cout and Sum when A, B, and Cin are in states '1', '1', and '0' respectively………… 135

Fig. 90. Second proposed (CMOS-based) LiM cell (9T1P)………………………………….. 136

Fig. 91. Full adder using second proposed LiM cell…………………………………………. 139

Fig. 92. Voltages at Ctrl2 and Pre of cells A, B, C, and D of full adder………...…………… 140

Fig. 93. Thermal physical model of PCM device (Vertical Section)……………………….... 152

Fig. 94. Temperature/time dependence of phase change process [85]……………………….. 153

Fig. 95. Measured I-V characteristics for a PCM cell in either a Set or a Reset state, corresponding to a crystalline or amorphous phase of the active chalcogenide [86]………… 153

Fig. 96. Flowchart of the proposed PCM macromodel………………………...……………... 156

Fig. 97. Circuit model of a PCM cell………………………………...……………………….. 158

Fig. 98. Integrator Circuit [98]…………………………..…………...……………………….. 159

Fig. 99. Decision circuit as temperature comparator……………………………………...….. 161

Fig. 100. Crystalline fraction calculation circuit……………………………………………..... 162

Fig. 101. I-V curve of a PCM cell [94]………………………………………………...…..…... 163

Fig. 102. Control switch circuit……………………………………………...………………… 164

Fig. 103. Measured VT versus RPCM (at variable Toff, fixed reset pulse and fixed measured Toff = 5s) [86]………… 166

Fig. 104. Flow Chart for Drift Parameter Calculation……………………………...………….. 167

Fig. 105. ∆R circuit……………………………………...……………………………………... 168

Fig. 106. Circuit for ∆R Before Drift……………………………………...…………………… 169

Fig. 107. Crystalline fraction of PCM cell at different circuit behavior (Cx,rd)…………...…… 171

Fig. 108. Crystalline fraction calculation circuit under drift behavior………...………………. 173

Fig. 109. Toff Calculation Circuit………………………...…………………………………….. 175

Fig. 110. I-R curve of PCM cell……………………………...………………………………... 179

Fig. 111. I-V curve of PCM cell……………………………………...………………………... 179

Fig. 112. Resistance drift of PCM at different crystalline fraction (Cx)…………...…………... 180

Fig. 113. Threshold voltage drift of PCM cell at different crystalline fraction (Cx)……...…… 181

Fig. 114. Resistance drift of experimental data [88] and simulation results of proposed macromodel (at R0=500 kΩ)………… 182

Fig. 115. Plot of threshold voltage (Vth) versus PCM resistance of experimental data [86] and simulation results of proposed macromodel………… 182

Fig. 116. Block diagram of proposed PCM-based CAM/TCAM cells…………..……………. 185

Fig. 117. The proposed 1T1P memory core…………………..…...……..……………………. 186

Fig. 118. Differential sense amplifier [9]…………………..……..…..……………………….. 187

Fig. 119. 1T1P memory core and differential sense amplifiers for TCAM operation……..….. 188

Fig. 120. CMOS-based CAM comparator circuit…………………..…………...……………... 189

Fig. 121. CMOS-based TCAM comparator circuit…………………..………………...……… 190

Fig. 122. Ambipolar-based CAM comparator circuit………………..…………...……………. 190

Fig. 123. Ambipolar-based TCAM comparator circuit………………..………………...…….. 191

Fig. 124. Model of an ambipolar transistor………………..……………………………..……. 194

Fig. 125. Write time vs PCM resistance range of 1T1P memory core when the PCM is programmed from amorphous (‘0’) to crystalline phase (‘1’)………… 194

Fig. 126. Bitline voltage of a 1T1P core for a read operation (the bitline capacitance is 0.03pF)………… 195

Fig. 127. Bitline voltage of 1T1P memory core when the intermediate PCM resistance is varied………… 196

Fig. 128. Output voltage of differential sense amplifier when Vths = 0.15V…………...……… 197

Fig. 129. Threshold voltage of differential sense amplifier (Vths) and its switching voltage...... 198

Fig. 130. Input and output voltages of differential sense amplifier versus simulation time.…... 199

Fig. 131. Power dissipation of the 1T1P memory core during a read operation…………...….. 203

Fig. 132. Average power dissipation of differential sense amplifier………………………...… 204

Fig. 133. Bitline voltage vs number of 1T1P cores per bitline (read time of 0.294ns and CAM operation)………… 206

Fig. 134. Bitline voltage vs number of 1T1P cores per bitline (read time of 0.83ns and TCAM operation)………… 207

Fig. 135. CAM and TCAM cells of [23]………………………………...…………………….. 212

Fig. 136. Current differential amplifier [9]…………………………...………………………... 212

Fig. 137. Phase Change Memory (PCM) Memory Cell……………………...………………... 215

Fig. 138. PCM resistance distribution over time………………………...…………………….. 216

Fig. 139. Iterative scheme of programming and verification for a multilevel PCM cell [107]... 218

Fig. 140. Row of PCM cells for calculating the threshold resistance of a level………...……... 219

Fig. 141. Separation of initial resistance of a PCM cell……………………...………………... 219

Fig. 142. Flowchart of initial resistance separation for a cell with N levels………………..…. 220

Fig. 143. Flowchart of reference threshold resistance process………………………...………. 223

Fig. 144. Flowchart of reference threshold resistance process for each level of a PCM cell in a memory array………… 223

Fig. 145. Maximum (largest) and minimum (least) percentage accuracies vs adjustment of percentage resistance (4 levels/cell, write region is 1% and 5%)………… 230

Fig. 146. Number of simulations required for finding a flat initial level separation (4 levels/cell, write region is 1%)………… 230

Fig. 147. Flowchart of resistance drift calculation……………………...……………………... 232

Fig. 148. Average resistance of PCM vs Toff at different values of crystalline fraction (Cx)...... 233

Fig. 149. Flat initial level separation of a PCM cell with 4 levels………………...…………… 235

Fig. 150. Average percentage accuracy in each level of a PCM array at 4 levels per cell and Toff at 1 year………… 236

Fig. 151. Average percentage accuracy of levels in a PCM array at 4 levels per cell and Toff at 1 month and 1 year………… 236

Fig. 152. Average percentage accuracy of PCM (4 levels per cell and Toff at 1 ms, 1 second, 1 minute, 1 hour, 1 month and 1 year)………… 236

Fig. 153. Average percentage accuracy of a PCM cell with 4 levels, 1% write region (Toff time of 1 ms, 1 second, 1 minute, 1 hour, 1 day, 1 week, 1 month and 1 year)………… 237

Fig. 154. Average percentage accuracy of PCM cell at 1% write region, and Toff time is 1 second, 1 month, and 1 year vs number of read operations for proposed median method………… 238

Fig. 155. Average percentage accuracy of PCM cell at 1% write region……………...………. 239

Fig. 156. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 1% write region, and 16 PCM cells per row to find the resistance levels………… 240

Fig. 157. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 1% write region, and 32 PCM cells per row to find the resistance levels………… 241

Fig. 158. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 1% write region, and 64 PCM cells per row to find the resistance levels………… 241

Fig. 159. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 5% write region, and 16 PCM cells per row to find the resistance levels………… 241

Fig. 160. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 5% write region, and 32 PCM cells per row to find the resistance levels………… 242

Fig. 161. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 5% write region, and 64 PCM cells per row to find the resistance levels………… 242

Fig. 162. Average number of faulty cells in a 1M PCM array versus time (Toff); write region is 1% and mission time is 1 year………… 243

Fig. 163. Average number of faulty cells in a 1M PCM array versus time (Toff); write region is 5% and mission time is 1 year………… 243

Fig. 164. PCM resistance separation by using resistance margin [89]……………………….... 244

Fig. 165. Write time of PCM cell from crystalline to amorphous phases vs. Toff; initial PCM resistance range is from 7kΩ to 200kΩ………… 245

Fig. 166. Read time of RESET resistance vs Toff; the initial PCM resistances of the SET and RESET are 7kΩ and 200kΩ………… 246

Fig. 167. Write and Read times of RESET resistance when initial PCM resistance (R0) is changed; Toff is 1 year and initial PCM resistance of SET state is 7kΩ………… 247

Fig. 168. The cross-section structure of racetrack memory. At the back-end process, the magnetic stripe is implemented above the CMOS/MTJ interfacing circuits, nodes Rin and Rout are for reading, Win and Wout for writing, and Pin and Pout for the shift operation [121]………… 253

Fig. 169. Flowchart of the proposed HSPICE macromodel of a RM………………...………... 255

Fig. 170. Track separation; the track layer labeled F is fixed while the other tracks (V) are varied………… 256

Fig. 171. Main circuit of the propagation part………………………………...……………….. 257

Fig. 172. a) Shift voltage comparison circuit; b) Shift polarity checking circuit…………...…. 258

Fig. 173. Previous shift percentage circuit…………………………………………..………… 260

Fig. 174. Previous index write head circuit………………………………...………………….. 261

Fig. 175. Shift percentage controlled circuit…………………………...………………………. 262

Fig. 176. Main circuit of the write head…...... ………………………………………………… 263

Fig. 177. a) Write polarity checking circuit b) Write voltage comparison circuit………..…… 264

Fig. 178. Previous write percentage circuit…………………………...……………………….. 266

Fig. 179. Data stored in each track of the RM…………...…………………………………….. 268

Fig. 180. Write head data circuit…………………...…………………………………………... 269

Fig. 181. Write Percentage estimation circuit……………………………...…………………... 270

Fig. 182. Main model of the read head………………………………...………………………. 272

Fig. 183. Simulated Racetrack Memory (RM)…………...……………………………………. 274

Fig. 184. Precharged circuit used for the read head……………………………………………. 275

Fig. 185. Write operation of RM when the racetrack index 2 is connected to the write head…. 275

Fig. 186. Shift operation of the Racetrack Memory (RM)……………………...……………... 276

Fig. 187. Read operation of the Racetrack Memory (RM)………………………...…………... 277

Fig. 188. Simulation of RM; W0 and W1 denote the write '0' and '1' operations, Sf denotes the shift-forward operation to the next index, Sb the shift-backward operation to the previous index, P the precharge operation, and R the read operation………… 278

Fig. 189. Relationship between DW motion speed (V) and current density (Jp) at a total length of racetrack (c) of 1µm………… 279

Fig. 190. Percentage error of the proposed macromodel and the simulation results……...…… 279

Fig. 191. Proposed write hardware (racetrack cells and write head circuit)……………..……. 281

Fig. 192. Proposed read hardware (racetrack cells and read head circuit)…………...………... 282

Fig. 193. Proposed propagation hardware (racetrack cells and propagation circuit)……..…… 283

Fig. 194. Write operation of the proposed write hardware………………………...…………... 284

Fig. 195. Shift operation of RM when the proposed propagation circuit is employed……….... 285

Fig. 196. Delay vs. number of read hardware circuits connected to the same bitline……...….. 287

Fig. 197. Number of precharged circuits vs delay when the number of read hardware (racetrack cells and a read head circuit) is fixed at 256………… 288

Fig. 198. Array of Read Hardware (Racetrack Cells + Read Head Circuit) when adding a LATCH cell at each pair of bitline voltages………… 288

Fig. 199. Plot of write delay vs. supply voltage at different CMOS feature sizes……..……… 291

Fig. 200. Plot of shift delay vs. supply voltage at different CMOS feature sizes……..…….… 291

Fig. 201. Plot of read delay vs nominal variation of racetrack resistance (denoted by Δ); racetrack R1 and R2 are in state '0' and '1' respectively………… 292

Fig. 202. Power dissipation of the proposed read hardware when varying the racetrack resistance (R1 and R2 are in state '0' and '1' respectively)………… 292

Fig. 203. Power Delay Product (PDP) of the proposed read hardware when varying the racetrack resistance (R1 and R2 are in state '0' and '1' respectively)………… 293

Fig. 204. Racetrack Memory…………………………...……………………………………… 294

Fig. 205. Write duration of a RM when increasing the number of racetrack bits per cell. The write voltage is fixed at 0.9V………… 295

Fig. 206. Write and read resistances of a RM when increasing the number of racetracks per cell (N). Write and read voltages are fixed at 0.9V………… 295

Fig. 207. The proposed detection circuit………………………………………...…………….. 299

Fig. 208. RM-based CAM cell of [108]……………………………………………...………… 301

Fig. 209. Proposed write circuit (racetrack cells and write head circuit) of a CAM (for a TCAM: 2 write circuits are required)………… 303

Fig. 210. Proposed read circuit (RM cells, read head circuit) of a) CAM and b) TCAM…...… 304

Fig. 211. a) Comparison circuit of RM-based CAM and TCAM b) Balancing circuit for sense amplifier read operation of TCAM………… 304

Fig. 212. Array of proposed RM-based CAM/TCAM cells……………………...……………. 305

Fig. 213. Write operation of the write circuit at 32nm CMOS feature size (supply voltage at 3V)………… 307

Fig. 214. Precharged circuit used in the read circuit………………………..…………………. 307

Fig. 215. Write and search times of the proposed RM-based CAM and TCAM cells when changing the length of RM track………… 317

Fig. 216. Write time of the proposed CAM and TCAM cells when changing the RM track length………… 317

LIST OF TABLES Page

Table 1. Comparison between NOR and NAND Flash memories [13]…………………..…… 9

Table 2. Truth Table of CAM…………………………………………………………..…….. 10

Table 3. Truth Table of TCAM…………...………………………………………………..… 12

Table 4. State of each SRAM in the SRAM-based TCAM cell…………………...…………. 13

Table 5. Truth table of MTJ-Based LiM when performing the AND operation…………...………. 24

Table 6. WRITE/READ times at different supply voltages and feature sizes…………...…... 44

Table 7. Comparison of proposed memory cell with MCAM of [39] at same parameters and operating voltage………… 47

Table 8. Comparison of proposed memory cell with NAND/NOR flash memories [55-59].... 47

Table 9. Write time (TW) of proposed TCAM cell………………………...………………..... 55

Table 10. Simulation results of the search operation in the TCAM cell; D denotes a discharged match line and S denotes a stable (unchanged) match line………… 61

Table 11. Searching time (TS) of proposed TCAM design……………...…………………….. 62

Table 12. Comparison between transistor size and write time for a memristor range of 100 – 19kΩ………… 63

Table 13. Parameters used in simulation (from [60])……………………...…………………... 80

Table 14. Parameter sensitivity when the film thickness of the solid electrolyte (L) is changed by ±5%………… 85

Table 15. Parameter sensitivity when the radius at the bottom of the CF (R) is changed by ±5%………… 86

Table 16. Time comparison of 7T1P, 7T1R [69] and 7T1M cells in which a PMC, an OxRRAM, and a memristor are used as storage element (32nm feature size)………… 96

Table 17. Comparison of delays in the 7T1P cell when the CMOS feature size and supply voltage are varied………… 98

Table 18. Comparison of average power dissipation and Power Delay Product (PDP) for each operation of a 7T1P cell when varying the CMOS feature size………… 99

Table 19. Percentage variation (3σ/µ) of store and restore times of 7T1P and 7T1M cells when each parameter is varied………… 100

Table 20. Sense voltage (V) of crossbar memories...... 104

Table 21. ON/OFF current ratio of MOSFET and PMC-based crossbar memories………..... 106

Table 22. Voltages at nodes D, DN, and DP of proposed 7T1P cell and output voltages of a

dual-rail checker...... 114

Table 23. Delay, power dissipation, and Power Delay Product (PDP) of the proposed XOR

gate….……………………………………………………………………………... 116

Table 24. Delay, power dissipation, and Power Delay Product (PDP) of proposed 7T1P cell for

write '0' and '1' operations (when the PMC resistance is 100MegΩ and 70kΩ

respectively)………..……………………………………………………………… 118

Table 25. Delay, power dissipation, and Power Delay Product (PDP) for read operation of the

SRAM core in the proposed cell...... 119

Table 26. Voltages at D, DN, and DP of 7T1P cell and output voltage, delay time, power

dissipation, and PDP of dual-rail checker...... 120

Table 27. Voltages at D, DN, and DP of a 7T1P cell and output voltage, delay time, power

dissipation, and PDP of a dual-rail checker implemented in CMOS...... 121

Table 28. Delay, power dissipation, and Power Delay Product (PDP) of proposed 7T1P cell for

write '0' and '1' operations (when the PMC resistance is 100MegΩ and 70kΩ

respectively)………..……………………………………………………………… 122

Table 29. Delay (ps) of each operation of the proposed NVSRAM cell using high performance

(HP) CMOS PTMs at different feature sizes...... 122

Table 30. Charge of nodes D, DN and DP of the proposed 7T1P cell...... 124

Table 31. Voltages at D, DN, and DP of 7T1P cell and output voltage, delay time, power

dissipation, and PDP of dual-rail checker...... 124

Table 32. Voltages at D, DN, and DP of a 7T1P cell and output voltage, delay time, power

dissipation, and PDP of a dual-rail checker implemented in CMOS...... 125

Table 33. Performance of proposed LiM cell when operating the AND function………...…. 132

Table 34. Performance of proposed LiM cell when operating the OR function………...…… 132

Table 35. Performance of proposed LiM cell when operating the XOR function…...………. 133

Table 36. Performance of full adder when implemented using proposed LiM cells……...…. 135

Table 37. Performance of proposed CMOS-based LiM cell for AND function…………...… 137

Table 38. Performance of proposed CMOS-based LiM cell for OR function……………...... 137

Table 39. Performance of proposed CMOS-based LiM cell for XOR function…………..…. 138

Table 40. Performance of proposed CMOS-based LiM cell for inverse function……...……. 139

Table 41. Metrics of the full adder cell when implemented using proposed LiM cell…...…... 140

Table 42. Percentage variation (3σ/µ) of delay of the proposed 7T2A1P LIM (AND

operation)…….……………………………………………………………………..141

Table 43. Percentage variation (3σ/µ) of delay of the proposed 7T2A1P LIM (OR

operation)…………………………………………………………………………...141

Table 44. Percentage variation (3σ/µ) of delay of the proposed 7T2A1P LIM (XOR

operation)…….……………………………………………………………………..141

Table 45. Performance of the proposed 7T2A1P LiM varying CMOS feature size (supply

voltage is fixed at 0.9V)…………………..……………………………………….. 142

Table 46. Percentage variation (3σ/µ) of delay of the proposed 9T1P LiM cell (AND

operation)…...………………………………………………………………………143

Table 47. Percentage variation (3σ/µ) of delay of the proposed 9T1P LiM cell (OR

operation)……..…………………………………………………………………….143

Table 48. Percentage variation (3σ/µ) of delay of the proposed 9T1P LiM cell (XOR

operation)…...………………………………………………………………………143

Table 49. Percentage variation (3σ/µ) of delay of the proposed 9T1P LiM cell (inverter

operation)...…………………………………………………………………………143

Table 50. Performance of the proposed 9T1P LiM at different CMOS feature sizes where its

supply voltage is fixed at 0.9V…………………...………………………………... 144

Table 51. AND function comparison……………...…………………………………………. 145

Table 52. OR function comparison…………………………...……………………………… 145

Table 53. XOR function comparison………………………...………………………………..145

Table 54. Full adder comparison………………...…………………………………………… 146

Table 55. Ranking of 7T1R NVSRAM cells by resistive element……………...……………. 148

Table 56. Ranking of the nonvolatile Logic in Memory…………...………………………… 150

Table 57. Truth table of crystalline fraction calculation circuit (provided switches sw_cx1 and

sw_cx4 are ON)……………………………………………………………………. 162

Table 58. Physical parameters for PCM simulation………………………………………….. 178

Table 59. Calibrated parameters for PCM simulation………………………………………... 178

Table 60. Comparison of the resistance drift at different Toff………………………………... 183

Table 61. Comparison of the threshold voltage drift at different PCM resistance values……. 183

Table 62. Output voltages of differential sense amplifiers for CAM and TCAM operations... 188

Table 63. Voltages at nodes O1, O2, S1, S2, and match line voltage of proposed TCAM

comparator circuit…………………………………………………………………..193

Table 64. Read time and bitline voltage difference (between state ‘0’ and intermediate state and

between state ‘1’ and intermediate state) at intermediate PCM resistance values… 196

Table 65. Bitline voltage of 1T1P core and output voltage of differential sense amplifier for

CAM operation at read times for the two states…………………………………… 198

Table 66. Bitline voltage of 1T1P core and output voltages of differential sense amplifiers for

TCAM operation at read times for the three states………………………………... 198

Table 67. Search time of CMOS and ambipolar-based CAM comparator circuits at a supply

voltage (VDD) of 0.9V……………………………………………………………… 200

Table 68. Search time of the CMOS and ambipolar-based TCAM comparator circuits at a

supply voltage (VDD) of 0.9V……………………………………………………… 201

Table 69. Delay of proposed CAM/TCAM cells for a search operation……………………... 202

Table 70. Average power dissipation, average miss delay and power delay product of each

circuit in the proposed CAM and TCAM cells……………………………………. 204

Table 71. 1T1P core performance under different PCM resistance ranges; (at 32nm feature size

and a supply voltage of 0.9V)……………………………………………………... 205

Table 72. Delay of the proposed CAM and TCAM cells for the search operation when the

CMOS feature size is changed (supply voltage is 0.9V)…………………………... 207

Table 73. Delay of proposed CAM/TCAM cells for the search operation when both CMOS

feature size and supply voltage are changed………………………………………. 208

Table 74. Write time, read times and number of read operations prior to refresh for 1T1P and

1T1M cores…………………………………………………………………………210

Table 75. Match line current (IML) of CAM cell of [23] during the search operation, PCM

resistance range is 7kΩ – 200kΩ at 32nm CMOS feature size……………………. 211

Table 76. Comparison between proposed 1T1P CAM/TCAM cells and CAM/TCAM cells of

[23] at 32nm CMOS feature size and supply voltage of 0.9V…………………….. 212

Table 77. Comparison of the proposed and CMOS-based CAM/TCAM cells (in which a 6T

SRAM is used as storage core)……………………………………………………..213

Table 78. Comparison of proposed CAM cell and MTJ-based CAM cells [102] (32nm CMOS

feature size, supply voltage of 0.9V, match line capacitance of 0.03pF)………….. 214

Table 79. Initial Percentage PCM Resistance Separation……………………………………. 222

Table 80. Initial resistances when the PCM resistance range is from 7kΩ – 200kΩ………… 226

Table 81. Percentage accuracies when the PCM resistance is initially separated (from Table

79)…………………………………………………………………………………. 226

Table 82. Adjustments to the initial percentage resistance separation……………………….. 228

Table 83. Flat initial percentage resistance Separation and percentage accuracy of each level for

a PCM cell with 3 levels……………………………………………………………229

Table 84. Parameters for simulating PCM cell………………………………………………. 232

Table 85. Flat initial PCM level separation, 4 levels/cell and write resistance is 1%...... 234

Table 86. Flat initial level separation for a PCM cell with 8 levels and write region is 1%..... 234

Table 87. Ranking of non-volatile CAM cells……………………………………………….. 250

Table 88. Ranking of non-volatile TCAM cells……………………………………………… 250

Table 89. Simulation Parameters [121]………………………………………………………. 274

Table 90. Performance comparison between the proposed write circuit and the write circuit of

[123] at 32nm CMOS feature size and W/L = 10…………………………………. 285

Table 91. Performance comparison between the proposed hardware and the circuits of [122]

and [124] for read operation……………………………………………………….. 286

Table 92. Delay of the proposed read hardware when LATCH is connected to the bitlines; the

number of precharged circuits is given by 2………………………………………. 289

Table 93. Performance of the proposed write, read and propagation circuits when varying the

CMOS feature size and supply voltage……………………………………………. 290

Table 94. Percentage variation (3σ/µ) of delays of the proposed write, read, and propagation

hardware when varying threshold voltage of CMOS……………………………… 293

Table 95. Percentage variation (3σ/µ) of delays of the proposed write, read, and propagation

circuits when varying threshold voltage and channel length of CMOS…………… 296

Table 96. Critical Charge in each node of the proposed write and propagation circuits resulting

in an error in the stored and shifted data respectively when voltage at nodes Cont and

WLshf are at supply voltage………………………………………………………. 297

Table 97. Critical charge in each node of the proposed read hardware with/without LATCH at

the bitline voltages………………………………………………………………….298

Table 98. Output voltage of the proposed detection circuit………………………………….. 300

Table 99. Delay, power dissipation, and Power Delay Product (PDP) of the proposed detection

circuit………………………………………………………………………………. 300

Table 100. States of the CAM and TCAMs cells when using RMs to store data……………... 302

Table 101. Store and Search Voltage of CAM when using comparison circuit……………….. 304

Table 102. Store and Search Voltage of TCAM when using comparison circuit……………... 305

Table 103. Macromodel parameters for RM…………………………………………………... 306

Table 104. Delay, Power dissipation, and PDP of the RM-based CAM cell (search operation)

……………………………………………………………………………………... 308

Table 105. Delay, Power dissipation, and PDP of the RM-based TCAM cell (search

operation)…………………………………………………………………………...308

Table 106. Charge in the proposed RM-based CAM and TCAM cells when the stored and search

data are ‘0’…………………………………………………………………………. 309

Table 107. Percentage variation (3σ/µ) of mismatch delay of the proposed RM-based CAM and

TCAM cells………………………………………………………………………... 310

Table 108. Critical Transistor and Percentage variation (3σ/µ) of mismatch delay of the proposed

RM-based CAM and TCAM cells………………………………………………… 310

Table 109. Mismatch delay, power dissipation and PDP of the proposed CAM cell. Bitline

capacitance is 1fF, the number of precharged circuits is 2………………………... 311

Table 110. Mismatch delay, power dissipation and PDP of the proposed TCAM cell. Bitline

capacitance is 1fF, the number of precharged circuits is set to 2………………….. 312

Table 111. Array-level CAM delay (ns) when varying the CMOS feature size. Bitline

capacitance is 1fF, the number of precharged circuits is 2………………………... 313

Table 112. Array-level TCAM delay (ns) when varying the CMOS feature size. Bitline

capacitance is 1fF, the number of precharged circuits is 2………………………... 313

Table 113. Critical charge at node BLB of array of proposed CAM cells at different CMOS

feature sizes with '0' as stored data. The bitline capacitance is 1fF. The number of

precharged circuits is 2.……………………………………………………………. 314

Table 114. Critical charge at node BLB of array of proposed TCAM cells at different CMOS

feature size with '0' as stored data. The bitline capacitance is 1fF. The number of

precharged circuits is 2…………………………………………………………….. 314

Table 115. Percentage variation (3σ/µ) of mismatch delays of proposed CAM cell. The bitline

capacitance is 1fF. The number of precharged circuits is 2……………………….. 315

Table 116. Percentage variation (3σ/µ) of mismatch delays of the proposed TCAM cell. The

bitline capacitance is 1fF. The number of precharged circuits is 2………………... 316

Table 117. Comparison between proposed CAM cell and CAM cell of [108]………………... 318

Table 118. Mismatch delay, power dissipation and PDP of the RM-based CAM cell [108] at

32nm HP-CMOS feature size, 0.9V supply voltage and 1fF line capacitance……. 319

Table 119. Comparison between proposed RM-based TCAM cell and other non-volatile TCAM

cells………………………………………………………………………………… 320

I. INTRODUCTION

1.1. Overview

Memory is a fundamental component of computers and electronic devices. It is used to store program instructions and data values. Because memory technologies differ widely in their characteristics, different types of memory are used within a computer system. Static Random Access Memory (SRAM) is the fastest type of memory; it is located on the processor chip to improve the speed of the computing system. Dynamic Random Access Memory (DRAM) is a high-speed memory that is used as the main memory of a computer system. DRAM is placed close to the processor to reduce the data transfer time between main memory and the Central Processing Unit (CPU). Hard disk drives and solid-state drives (magnetic disk and Flash memory) provide long-term storage. These memories are inexpensive, but they are slow and require a very large area. The CPU does not load and store data directly from a hard disk drive or solid-state drive.

Figure 1 presents the traditional memory hierarchy of the current computer system.

Fig. 1. Traditional memory hierarchy of the computer system [1]

As shown in figure 1, the L1 cache is a small memory with the fastest speed; it stores only a few kB. The L2 and L3 caches are slower and larger. The size of the L2 cache is around a few MB, while the L3 cache is larger than the L2 cache. These caches are located on the processor chip. Static Random Access Memory (SRAM) is used to implement these caches because it is very fast and can be readily integrated on the same IC as the processor. Nevertheless, SRAM has a low density and a high cost, so it is not a good choice for main memory or long-term data storage.

Dynamic Random Access Memory (DRAM) is the type of memory used for main memory. A DRAM cell consists of one transistor and one capacitor. Although DRAM is fast and its cell area is small, it is a volatile storage, like SRAM: neither SRAM nor DRAM can keep its data when the supply voltage is interrupted. A nonvolatile storage element is needed to retain data when power is not available. Hard disk drives and solid-state drives (magnetic and Flash memories) are examples of nonvolatile memories. These memories are inexpensive, but they are slow, require a large area, and consume high power. To use these nonvolatile devices efficiently, their data is loaded into main memory, caches, and registers before it is processed by the CPU.

Fig. 2. Performance gap between memory and processor [2]

Due to the limitations of current memory technology, the performance gap between the Central Processing Unit (CPU) and main memory is very large. As shown in figure 2, processor performance is much better than memory performance. One option for improving the overall performance of a computer is to scale down the feature size of Complementary Metal-Oxide Semiconductor (CMOS) technology. At a small CMOS feature size, signals propagate over shorter distances and the parasitic capacitance is low. Moreover, the supply voltage at a small feature size is low, so a signal takes less time to reach its logic level. As a result, the switching speed of CMOS is fast and the circuit density on the integrated circuit is high.

In 1965, Gordon Moore, the co-founder of Fairchild Semiconductor and Intel, predicted that the number of transistors that can be integrated on a single die would grow exponentially with time [3]. Figure 3 plots the number of transistors on an integrated circuit (IC) against the year of introduction. As can be observed, the circuit density of an integrated circuit doubles approximately every 18 months. This increase in circuit density shows how fast IC technology has grown.

Fig. 3. A plot of CPU transistor counts against dates of introduction [3]

Nowadays, CMOS has been scaled down to the nano ranges, and the technology roadmap predicted by Moore’s Law is approaching its limit. To continue improving the performance of ICs, emerging technologies have been widely reported to supersede or complement CMOS. The integration of significantly different technologies such as spintronics [4], carbon nanotube field effect transistors [5], metanano material-based optical circuits [6], and more recently, the memristor [7] has gained attention, thus creating new possibilities for designing innovative circuits and systems. This type of design style is commonly referred to as “hybrid” because it exploits different characteristics of emerging technologies (provided they show compatible features, inclusive of manufacturing and fabrication). A hybrid approach relies on partially utilizing CMOS, while introducing emerging technologies for performance improvement. This is very attractive for memories in which the modular cell-based organization of these systems is well suited to new technologies and innovative design paradigms.

This research explores different types of emerging devices and their memory applications. The memristor, Programmable Metallization Cell (PMC), Phase Change Memory (PCM), and Racetrack Memory (RM) are the emerging devices used in this research. Various types of memory circuits are considered, such as Random Access Memory (RAM), Content Addressable Memory (CAM), and Logic in Memory (LiM). This chapter presents brief introductions to current memory circuits, brief reviews of emerging devices, and a survey of the hybrid memories present in the technical literature. Moreover, the soft error tolerance of memory circuits is considered at the end of this chapter.

1.2. Types of Memory

Currently, there are different types of electronic memories on the market. These memories vary depending on the needs of a given application, such as the required memory size, the time it takes to access the stored data, the access pattern, and the system requirements [8].

In this research, three types of electronic memories are explored.

- Random Access Memory

- Flash Memory

- Content Addressable Memory

1.2.1. Random Access Memory

Random Access Memory (RAM) is the most common type of memory found in computers and other electronic devices. Any bit of data stored in RAM can be accessed (read and written) in a random order [8]. The two main types of RAM used in modern computers are Static Random Access Memory (SRAM) and Dynamic Random Access Memory (DRAM).

1.2.1.1. Static Random Access Memory (SRAM)

Static Random Access Memory (SRAM) is a volatile semiconductor memory that uses a bi-stable element, such as an inverter loop, to store data in the cell. The term static refers to the fact that SRAM retains its value as long as power is applied; no refresh operation is needed [9]. The traditional SRAM cell used in today's microprocessors consists of six CMOS transistors (6T) [10]. Transistors M5 and M6 control the operations of the SRAM cell, while transistors M1-M4 maintain the data in the cell as the voltage at node D. When the voltage at line WL is GND, transistors M5 and M6 are OFF; the SRAM cell is disconnected from the bitlines (BL and BLB) and its data is maintained. If the word line voltage (VWL) is set to the supply voltage (VDD), a write or read operation is performed depending on the voltages at lines BL and BLB. Figure 4 presents the circuit of the traditional SRAM cell.

Fig. 4. Traditional Static Random Access Memory (SRAM) or 6T-SRAM
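The word-line gating described above can be sketched as a small behavioral model. This is only an illustration at the logic level, not a circuit simulation: the class name `Sram6TCell`, the 0/1 encoding of GND/VDD, and the simplified read (the cell simply drives BL and BLB) are assumptions introduced for this example.

```python
from dataclasses import dataclass

VDD = 1  # assumed encoding: logic high on a line
GND = 0  # assumed encoding: logic low on a line


@dataclass
class Sram6TCell:
    """Logic-level stand-in for a 6T SRAM cell (hypothetical helper)."""

    d: int = 0  # value held at node D by the cross-coupled inverters (M1-M4)

    def write(self, wl: int, bl: int, blb: int) -> None:
        # Access transistors M5 and M6 conduct only when the word line is high.
        if wl != VDD:
            return  # cell is isolated from the bitlines; data is retained
        if bl == blb:
            raise ValueError("BL and BLB must be complementary for a write")
        self.d = bl

    def read(self, wl: int):
        # Returns the (BL, BLB) pair driven by the cell, or None when isolated.
        if wl != VDD:
            return None
        return (self.d, 1 - self.d)
```

With WL at GND a write attempt leaves node D unchanged, while with WL at VDD the value on BL is latched and can be read back as a complementary bitline pair.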

1.2.1.2. Dynamic Random Access Memory (DRAM)

Dynamic Random Access Memory (DRAM) is a type of RAM that stores each bit of data as the charge on a capacitor [10]. The term dynamic refers to the need to periodically refresh the charge on the capacitors. Every time a read operation is performed, the charge on the capacitor changes. The refresh operation of DRAM consists of a read of the cell contents followed by a write. DRAM must be refreshed frequently to ensure that the data in the memory cells is not corrupted by leakage [8]. Figure 5 presents the schematic of a DRAM cell. Capacitor CS maintains the data stored in the cell as the voltage at node D. Transistor M1 is the selector, while capacitor CBL represents the bitline capacitance. Write and read operations of DRAM are performed by controlling the voltages at lines WL and BL.

Fig. 5. Dynamic Random Access Memory (DRAM)

1.2.2. Flash Memory

Flash memory is a nonvolatile solid-state memory that is used in today's electronic devices. It has been used in different types of products such as mobile phones, portable music players, USB flash memory, and flash-based solid-state disks (SSDs) [11]. Flash memory stores data as the presence or absence of electric charge on the floating gate of its transistor. If there is no electric charge on the floating gate, the device behaves like a normal NMOS transistor and data ‘1’ is stored in the cell. However, if the floating gate is negatively charged, it is harder to form a channel between source and drain; the threshold voltage increases and data ‘0’ is stored in the cell. Flash memory can be electrically programmed and erased. The threshold voltage of the floating gate transistor can be changed repeatedly between a high and a low state, corresponding to the states of the memory cell [12]. The read operation of flash memory is performed by applying voltages to its terminals and measuring the current that flows into the cell [13].
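The relation between floating-gate charge, threshold voltage, and the stored bit can be illustrated with a short sketch. The function name `flash_read` and all voltage values below are hypothetical and chosen only for illustration; the sketch reduces the current measurement to a simple "channel conducts or not" decision.

```python
def flash_read(cell_vth: float, v_read: float) -> int:
    """Sense one flash cell: 1 if the channel conducts at v_read, else 0."""
    # A channel forms (and read current flows) only when the applied gate
    # voltage exceeds the threshold voltage of the floating gate transistor.
    conducts = v_read > cell_vth
    return 1 if conducts else 0


# Hypothetical values, not taken from the dissertation:
VTH_UNCHARGED = 1.0  # no charge on the floating gate: low threshold, data '1'
VTH_CHARGED = 4.0    # negatively charged floating gate: raised threshold, data '0'
V_READ = 2.5         # read voltage placed between the two thresholds

print(flash_read(VTH_UNCHARGED, V_READ), flash_read(VTH_CHARGED, V_READ))
```

Placing the read voltage between the two threshold levels is what lets a single current measurement distinguish the two states.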

Flash memory can be categorized into two types, NOR and NAND. NOR flash memories use a parallel array architecture in which each cell can be accessed through a contact [14]. NOR flash memories are mainly used for code storage, which has a relatively short block length such as 16 or 64 user bits per block. NAND flash memories, on the other hand, are used for mass data storage, which has a long block length such as 8192 or 16,384 user bits per block [15].

1.2.2.1. NOR Flash Memory

NOR Flash memory consists of an array of floating gate transistors arranged in a NOR-gate-like structure [16]. The gates of these transistors are connected to the wordline (WL). Every two cells share a drain, while the source terminals of the floating gate transistors are common to all of the cells. Figure 6 presents the circuit schematic of NOR flash memory. Write, read, and erase operations of NOR Flash memory are performed by controlling the voltages at each terminal of the floating gate transistor [8].

Fig. 6. NOR Flash memory a) Circuit Schematic b) Erase c) Write d) Read Operations [12]

NOR flash memory has fast random read access times, but its erasure and programming times are slow due to the need for precise control of the thresholds [8]. It is therefore attractive for applications such as program-code storage [8].

1.2.2.2. NAND Flash Memory

NAND Flash memory consists of an array of floating gate transistors and two selection transistors. The selection transistors are located at the edges of the string. They are used to connect the string to the source line (through MSL) and to the bitline (through MDL). The bitline (BL) is shared with the other NAND strings, while the wordlines (WLs) are connected to the control gates of the floating gate transistors [14]. Figure 7 presents the circuit schematic of NAND flash memory. An erased NAND Flash cell has a negative threshold voltage, while a programmed cell has a positive threshold voltage (less than 4V) [13]. Write, read, and erase operations of NAND Flash memory are performed by controlling the voltage at each node of the array.

Fig. 7. Circuit Schematic of NAND Flash memory

Table 1 presents a comparison between NOR and NAND flash memories over different figures of merit. In table 1, output parallelism is the number of bits that the memory can transfer to the output at the same time, where a Dword is equal to 32 bits [13]. Read access time is the execution time of a read operation; the transfer time that shifts the read data to the output is not included [13]. Read and write parallelism are the numbers of bits addressable at the same time during read and program operations, respectively.

Table 1. Comparison between NOR and NAND Flash memories [13]

Figure of merit      NOR                NAND
Memory size          ≤ 512 Mbit         1-8 Gbit
Sector size          ~ 1 Mbit           ~ 1 Mbit
Output parallelism   Byte/Word/Dword    Byte/Word
Read parallelism     8-16 Word          2 Kbyte
Write parallelism    8-16 Word          2 Kbyte
Read access time     < 80 ns            20 µs
Program time         9 µs/word          400 µs/page
Erase time           1 s/sector         1 ms/sector

As shown in table 1, NOR Flash memory has a fast read access time but slow programming and erase times. Moreover, the cell area of NOR Flash memory is large because of its parallel architecture. NAND Flash memory, on the other hand, has a small area and a low cost per bit. However, its random access performance is slow because it has no direct contact to each memory cell [14]. In general, the cell sizes of NOR and NAND Flash memory are in the range of 10F² and 4F², respectively, where F is the design rule of the chip [14].

NAND Flash memory has a fast page write capability. It can write 4-8 kB simultaneously, so its sequential write throughput is very high. Because of its serial architecture and small area, NAND Flash memory is used for low-cost mass storage, while NOR Flash memory is used for high-performance code storage and execution [14]. The read technique is another factor that affects the read access time of NAND and NOR Flash memory. The NOR architecture reads data with a differential reading method, and its read access time is only a few tens of nanoseconds. The NAND architecture employs a charge integration technique, which is slower than the differential method. The read access time of NAND Flash is a few tens of microseconds, which is slower than that of NOR Flash memory [13].
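As a rough worked example of the page-write advantage, the program times of Table 1 can be compared for the same amount of data. The sketch below rests on assumptions that are not stated in the source: a word is taken as 16 bits (half of the 32-bit Dword), a NAND page is taken as the 2 Kbyte write parallelism of Table 1, and NOR programming is treated as strictly word-by-word, ignoring its 8-16 word write parallelism.

```python
PAGE_BYTES = 2 * 1024  # NAND page, taken from the "2 Kbyte" entry of Table 1
WORD_BYTES = 2         # assumption: 16-bit word (Dword = 32 bits)

NOR_PROGRAM_US_PER_WORD = 9.0     # Table 1
NAND_PROGRAM_US_PER_PAGE = 400.0  # Table 1


def nor_program_time_us(num_bytes: int) -> float:
    """Serial word-by-word NOR programming time (simplifying assumption)."""
    words = -(-num_bytes // WORD_BYTES)  # ceiling division
    return words * NOR_PROGRAM_US_PER_WORD


def nand_program_time_us(num_bytes: int) -> float:
    """NAND programming time, counted in whole pages."""
    pages = -(-num_bytes // PAGE_BYTES)  # ceiling division
    return pages * NAND_PROGRAM_US_PER_PAGE


nor_us = nor_program_time_us(PAGE_BYTES)    # 1024 words * 9 us
nand_us = nand_program_time_us(PAGE_BYTES)  # 1 page * 400 us
print(f"NOR: {nor_us:.0f} us, NAND: {nand_us:.0f} us, ratio: {nor_us / nand_us:.2f}x")
# prints "NOR: 9216 us, NAND: 400 us, ratio: 23.04x"
```

Under these assumptions, programming one page of data takes roughly 23 times longer in NOR than in NAND, which is consistent with the use of NAND for sequential mass storage.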

1.2.3. Content Addressable Memory (CAM)

A Content Addressable Memory (CAM) is a special type of memory device that implements a lookup-table function; it compares the input search data against a table of stored data and returns the address of the matching data [17]. CAMs have been used in a variety of applications that require a fast search capability such as parametric curve extraction [18], Hough transformation [19], Huffman coding and decoding [20], Lempel-Ziv compression [21], and image coding [22]. A significant commercial application of CAMs is to classify and forward Internet Protocol (IP) packets in network routers [17]. A CAM can be classified into two types, binary CAM and ternary CAM (TCAM).

1.2.3.1. Binary Content Addressable Memory (Binary CAM, CAM)

Binary CAM, or simply CAM, is an associative memory that stores only two states (i.e. ‘0’ and ‘1’). It is suitable for applications that require an exact match between the input data and the stored data, such as an instruction or data cache. Table 2 presents the truth table of the binary CAM. If the stored data and the search data are the same, the match line voltage (VML) remains at VDD and a match outcome is reported. If they are different, the match line is discharged to GND and a mismatch outcome is reported.

Table 2. Truth Table of CAM

Store Data   Search Data   Match line Voltage   Outcome
0            0             VDD                  Match
0            1             GND                  Mismatch
1            0             GND                  Mismatch
1            1             VDD                  Match

Fig. 8. SRAM-Based Content Addressable Memory (CAM) [23]

Figure 8 presents the SRAM-based CAM cell. As in the SRAM, data is stored as the voltage at node D, while the search data is applied to lines SL and SLB. If there is a mismatch between the stored data and the search data, the match line is discharged to GND and a mismatch outcome is reported.
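The lookup behavior of a binary CAM can be sketched in software as follows. The sketch assumes that the cells of one stored word share a match line, so that a single mismatching bit discharges it; the function names `match_line` and `cam_search` are hypothetical, and the loop only emulates a comparison that the hardware performs in parallel.

```python
VDD, GND = "VDD", "GND"


def match_line(stored: str, search: str) -> str:
    """Match-line level of one CAM word after a search operation."""
    if len(stored) != len(search):
        raise ValueError("stored and search words must have the same width")
    # Any mismatching bit opens a pull-down path and discharges the match line.
    mismatch = any(s != q for s, q in zip(stored, search))
    return GND if mismatch else VDD


def cam_search(table: list[str], search: str) -> list[int]:
    """Return the addresses of all stored words that match the search word."""
    return [addr for addr, word in enumerate(table)
            if match_line(word, search) == VDD]


table = ["0101", "1100", "0101"]
print(cam_search(table, "0101"))  # addresses 0 and 2 hold the search word
```

For single-bit words, `match_line` reproduces the four rows of the truth table of the binary CAM.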

1.2.3.2. Ternary Content Addressable Memory (TCAM)

A Ternary Content Addressable Memory (TCAM) is a type of CAM that stores three states (i.e. ‘1’, ‘0’, and ‘2’). The additional state ‘2’ is also referred to as the “mask” or “don’t care” state; it matches either a ‘0’ or a ‘1’ in the input search data. Hence, a TCAM is used for applications that allow both exact and partial matches, such as longest prefix matching in network search engines. A TCAM is essential in an application that requires a high-performance search capability in a database, a list, or a pattern; in a TCAM, the search operations are performed by comparing in parallel the input (search) data against the entire list of entries stored in memory.

The truth table of the TCAM is presented in table 3. As with the CAM, if the stored and search data match, the match line voltage (VML) remains at VDD and a match outcome is reported. If they do not match, the match line is discharged to GND and a mismatch outcome is reported. Data ‘2’ represents the “don’t care” state; if data ‘2’ is stored or searched, a match outcome is always reported.

Table 3. Truth Table of TCAM

Store Data   Search Data   Match line Voltage   Outcome
0            0             VDD                  Match
0            1             GND                  Mismatch
0            2             VDD                  Match
1            0             GND                  Mismatch
1            1             VDD                  Match
1            2             VDD                  Match
2            0             VDD                  Match
2            1             VDD                  Match
2            2             VDD                  Match
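The ternary matching rule of Table 3 can be written compactly in software. In the sketch below, the names `ternary_symbol_match` and `tcam_match_line` are hypothetical, and the extension from a single cell to a multi-symbol word sharing one match line is an assumption made for illustration.

```python
DONT_CARE = "2"  # the "mask" or "don't care" state of Table 3


def ternary_symbol_match(stored: str, search: str) -> bool:
    """One TCAM cell: a '2' on either side matches any value."""
    return stored == DONT_CARE or search == DONT_CARE or stored == search


def tcam_match_line(stored_word: str, search_word: str) -> str:
    """Match-line level for one stored word: 'VDD' on match, 'GND' otherwise."""
    if len(stored_word) != len(search_word):
        raise ValueError("stored and search words must have the same width")
    matched = all(ternary_symbol_match(s, q)
                  for s, q in zip(stored_word, search_word))
    return "VDD" if matched else "GND"


# Enumerate the nine single-cell combinations of stored and search data.
for stored in "012":
    for search in "012":
        print(stored, search, tcam_match_line(stored, search))
```

A stored word such as "1022" then matches every search word that begins with "10", which is the kind of partial match used in prefix lookups.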

Fig. 9. SRAM-Based Ternary Content Addressable Memory (TCAM) [23]

Figure 9 presents the SRAM-based TCAM cell. Two SRAM cells are employed to store the tri-state data. Write and search operations of the SRAM-based TCAM cell are performed by controlling the voltage at each line of the TCAM. Table 4 presents the data stored in each SRAM when the TCAM stores data ‘0’, ‘1’, and ‘2’.

Table 4. State of each SRAM in the SRAM-based TCAM cell

Store Data   SRAM1 State   SRAM2 State
0            0             0
1            1             1
2            1             0

As presented in table 4, when data ‘2’ is stored, SRAM 1 holds ‘1’ and SRAM 2 holds ‘0’. In this state, transistors MS3 and MS4 are always OFF, so no direct path exists between the match line and GND. The voltage at the match line (VML) remains at its value and a match outcome is always reported.

1.3. Emerging Technology

Emerging technologies are new types of devices that are used to complement or supersede CMOS. As Moore’s law approaches its limit, different types of emerging devices such as spintronics [4], carbon nanotube field effect transistors [5], metanano material-based optical circuits [6], and more recently, the memristor [7] have gained attention, thus creating new possibilities for designing innovative circuits and systems. In this section, different types of emerging technologies and their macromodels are considered.

1.3.1. Ambipolar Transistor

Unlike a traditional (unipolar silicon CMOS) device, whose behavior (either p-type or n-type) is determined at fabrication, ambipolar devices can be switched between modes (from p-type to n-type, or vice versa) by changing the gate bias [24, 25]. Ambipolar conduction is characterized by the superposition of electron and hole currents; this behavior has been experimentally reported in different emerging technologies such as carbon nanotubes [26], graphene [27], silicon nanowires [24, 28], organic single crystals [29], and organic semiconductor heterostructures [30]. An ambipolar transistor can be used to control the direction of the current based on the voltage at the so-called polarity gate.

In this dissertation, a 4-terminal ambipolar transistor (Double Gate MOSFET, or DG-FET) is utilized. The second gate (referred to as the Polarity Gate, PG) controls its polarity, i.e. when PG is set to logic ‘0’, the ambipolar transistor behaves like an NMOS; when PG is set to logic ‘1’, it behaves like a PMOS [31]. The symbol and the modes of operation of the ambipolar transistor used in this dissertation are shown in Figure 10.


Fig. 10. Ambipolar transistor, a) Symbol, b) Characteristic
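The effect of the polarity gate can be summarized with a switch-level sketch. It treats the device as an ideal switch with logic-level inputs, which is an assumption made only for illustration (it is not the HSPICE macromodel used in this work); the function name `ambipolar_conducts` is hypothetical.

```python
def ambipolar_conducts(gate: int, polarity_gate: int) -> bool:
    """Switch-level view: does the channel conduct for these logic inputs?"""
    if gate not in (0, 1) or polarity_gate not in (0, 1):
        raise ValueError("inputs must be logic 0 or 1")
    if polarity_gate == 0:
        return gate == 1  # NMOS-like: ON when the gate is high
    return gate == 0      # PMOS-like: ON when the gate is low


for pg in (0, 1):
    for g in (0, 1):
        print(f"PG={pg} G={g} ON={ambipolar_conducts(g, pg)}")
```

In this idealized view the device is ON exactly when the gate and polarity gate values differ, which is why a single ambipolar transistor is convenient for XOR-style comparison logic.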

To the best knowledge of the authors, the technical literature offers no HSPICE-compatible model for simulating the behavior of an ambipolar transistor; therefore, in this dissertation, the model of Figure 11 is used at a macroscopic level to simulate the characteristics of an ambipolar transistor with two transmission gates and two MOSFETs.

Fig. 11. Model of an ambipolar transistor

1.3.2. Magnetic Tunnel Junction (MTJ)

The Magnetic Tunnel Junction (MTJ) is a nanostructure device composed of two ferromagnetic (FM) layers and a thin insulator layer. The ferromagnetic layers can be made of different materials such as cobalt, iron, or nickel. The thin insulator layer is usually made of aluminum oxide or magnesium oxide. It is sandwiched between the two ferromagnetic layers to create the tunnel barrier of the MTJ [32].

Data in the MTJ is kept in the form of the magnetization direction of the ferromagnetic layers. One of the ferromagnetic layers is a hard layer; it has a fixed magnetization orientation and behaves as the reference layer of the MTJ. The other ferromagnetic layer is a soft layer; it has a free magnetization orientation and behaves as the storage layer of the MTJ [32]. The magnetization of the soft layer can be switched between two states, either parallel (P) or antiparallel (AP) with respect to the reference layer. If the magnetization direction of the soft layer is parallel (P) to the reference layer, the MTJ resistance is low. If it is antiparallel (AP), the MTJ resistance is high. These two configurations are considered as two logic states, state ‘0’ and state ‘1’, as presented in figure 12 [32].

Fig. 12. Resistance variation of the MTJ according to the storage layer magnetization state.

The explanation of the MTJ resistance variation is presented in [33]. The relative difference between the two MTJ resistance values is defined as the Tunneling Magnetoresistance (TMR) ratio:

$$TMR = \frac{R_{AP} - R_{P}}{R_{P}} \qquad (1)$$

According to [32], the highest TMR value with aluminum oxide insulators is ~70% at room temperature. The highest TMR value found in the literature is around 600% at room temperature and more than 1100% at 4.2 K; this MTJ is fabricated by using CoFeB/MgO/CoFeB [34]. The benefit of a large TMR value is that the data stored in the MTJ is easier to detect. For memory applications, a single MTJ or two MTJs can be used to store one bit of binary data. The two-MTJ method keeps data in the form of opposite magnetic states. This method receives more attention than a single-MTJ memory because it has a large TMR value and a simpler read operation [32].
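As a small numerical illustration of (1), the following Python sketch computes the TMR ratio from a pair of resistance values. The function name tmr_ratio and the resistance values are hypothetical; they are chosen only so that the result matches the ~70% figure quoted above and are not taken from [32] or [34].

```python
def tmr_ratio(r_ap: float, r_p: float) -> float:
    """Tunneling magnetoresistance ratio of Eq. (1): (R_AP - R_P) / R_P."""
    if r_p <= 0:
        raise ValueError("R_P must be positive")
    return (r_ap - r_p) / r_p


# Hypothetical resistances (in ohms), chosen only to illustrate a 70% TMR.
r_p_ohm = 1000.0
r_ap_ohm = 1700.0
print(f"TMR = {tmr_ratio(r_ap_ohm, r_p_ohm):.0%}")
```

A larger ratio means a larger separation between the two resistance states, which is what makes the stored bit easier to sense.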

1.4. Hybrid Memory

Hybrid memory is a type of memory device that exploits different characteristics of emerging technologies to supersede or complement CMOS. A hybrid approach relies on partially utilizing CMOS, while introducing emerging technologies for performance improvement. This is very attractive for memories in which the modular cell-based organization of these systems is well suited to new technologies and innovative design paradigms. Different types of emerging technologies have been used to improve the performance of the memory cell, such as the memristor [36], phase change memory (PCM) [37], and the carbon nanotube field effect transistor (CNTFET) [38]. This section presents different types of hybrid memories found in the technical literature.

1.4.1. Memristor-based Content Addressable Memory (MCAM)

The memristor is an emerging device that behaves as a variable resistor. Its resistance, also called “memristance”, changes depending on the direction of the current through it or the voltage across it. If data ‘0’ is stored in the memristor, its memristance is high. If data ‘1’ is stored, its memristance is low. A CAM using memristors has been presented in [39]. As shown in Figure 13, two memristors are employed as storage elements and seven transistors are used as control elements, so the number of transistors in the CAM of [39] is less than for a CMOS CAM (10 transistors).

Fig. 13. 7T NOR-type memristor-based CAM cell [39]

1.4.1.1. Write Operation

The write operation of the 7T NOR-type MCAM in Figure 13 is performed by applying the write voltages at lines D and $\overline{D}$ while setting the voltage at line WS to VDD. The voltage at line VL is set to half of the supply voltage. The memristor ME1 is programmed based on the data on bitline D, while the complementary data is stored in ME2 [39]. When the voltage at line WS is set to VDD, transistors M1 and M2 are ON, so nodes SB and $\overline{SB}$ are connected to bitlines D and $\overline{D}$, respectively. A voltage difference then exists across memristors ME1 and ME2, and each memristor is programmed to its state.

1.4.1.2. Search Operation

The match line (ML) is precharged to VDD prior to any search operation. During a search, the search data is applied at lines S and $\overline{S}$ while the voltage at line SS is set to VDD, so transistors M5 and M6 are ON. When the voltage at line VL is set to VDD, transistors M3 and M4 are ON or OFF depending on the data stored in the memristors. If data '1' is stored in this MCAM, memristor ME1 is in the RON state (low resistance) while memristor ME2 is in the ROFF state (high resistance). With line VL at VDD, the voltage at node SB is high while the voltage at node $\overline{SB}$ is low; transistor M3 is ON while transistor M4 is OFF. Since transistors M3 and M5 are ON, the state of transistor ML depends on the search voltage (the voltage at line S). If data '0' is searched, the voltage at line S is at GND (the voltage at line $\overline{S}$ is at VDD). Transistor ML is OFF and the match-line voltage retains its value; a mismatch outcome is generated. On the other hand, if data '1' is searched, the voltages at lines S and $\overline{S}$ are at VDD and GND, respectively. Transistor ML is ON and a match outcome is generated.

One of the main advantages of the memristor-based CAM (MCAM) is its nonvolatile capability. However, the nonvolatile MCAM has slow write and read times when compared to a volatile CMOS-based CAM. Since the MCAM keeps its data in the form of a memristance (resistance), its write time depends on the memristance range, the rate of change of the memristance, and the voltage difference across each memristor. Its read time depends on the discharging rate of the match-line voltage, which is related to the gate voltages of the transistors ML, M3, and M4 in Figure 13. The gate voltages of transistors M3 and M4 (nodes SB and $\overline{SB}$) are not held constantly at VDD and GND; they vary depending on the memristance. The driving capability of M3 and M4 therefore also varies (resulting also in a variable gate voltage for ML). Moreover, during the read operation, the voltage drop across a memristor slightly changes the value of its memristance; so, following multiple consecutive read operations, a refresh operation is required. However, the power dissipation of this CAM cell is less than the power dissipation of a CMOS CAM cell.

1.4.2. MTJs-based Nonvolatile SRAM

The MTJs-based nonvolatile SRAM [40] is a memory that utilizes MTJs as the nonvolatile storage elements of an SRAM. As presented in Figure 14, four transistors and two MTJs are employed in this memory cell. Its data is stored in the form of the MTJ resistances, where the magnetization directions of MTJ1 and MTJ2 are opposite. The concept of this MTJs-based nonvolatile SRAM is discussed as follows.

Fig. 14. 4T2MTJs Nonvolatile SRAM [40]

The structure of the 4T2MTJs nonvolatile SRAM is similar to that of a resistive-load SRAM whose resistors are replaced with STT-MTJs. It is important to note that the free layers of both STT-MTJs must connect to the driver NFETs’ sources and drains in order to avoid data violation during data hold (top-pin structure) [40]. When the gate voltage of the NFET is high, the current through the NFET is large enough to change the MTJ from RAP to RP [40].

1.4.2.1. Write Operation

The write operation of this 4T2MTJs nonvolatile SRAM is performed by setting the voltage at line PL to half of the supply voltage while the voltage at line WL is at VDD, so transistors M1 and M2 are ON. The MTJ resistances then change depending on the write voltages at bitlines BL and $\overline{BL}$. If the voltages at bitlines BL and $\overline{BL}$ are VDD and GND, respectively, MTJ1 is switched to RP (low resistance) and MTJ2 is switched to RAP (high resistance); data ‘1’ is written into this memory cell. On the other hand, if the voltages at bitlines BL and $\overline{BL}$ are GND and VDD, respectively, MTJ1 is switched to RAP (high resistance) and MTJ2 is switched to RP (low resistance); data ‘0’ is written.
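The mapping between the bitline levels and the resulting MTJ states can be summarized by the following behavioral sketch in Python. It only restates the two cases described above; it is not a circuit-level model of [40], and the function name write_4t2mtj and the string encoding of the voltage levels are assumptions made for illustration.

```python
VDD, GND = "VDD", "GND"  # symbolic voltage levels (illustrative encoding)


def write_4t2mtj(bl: str, bl_bar: str) -> dict:
    """Behavioral write of the 4T2MTJs cell, with PL at VDD/2 and WL at VDD.

    Returns the resistance state of each MTJ and the stored bit,
    following the two cases described in the text.
    """
    if (bl, bl_bar) == (VDD, GND):
        return {"MTJ1": "RP", "MTJ2": "RAP", "data": 1}
    if (bl, bl_bar) == (GND, VDD):
        return {"MTJ1": "RAP", "MTJ2": "RP", "data": 0}
    raise ValueError("BL and its complement must carry opposite levels")


print(write_4t2mtj(VDD, GND))
print(write_4t2mtj(GND, VDD))
```

In both cases the two MTJs end up in opposite states, which is the complementary storage that the read operation relies on.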

1.4.2.2. Read Operation

The read operation of this 4T2MTJs cell is similar to the read operation of an SRAM. Bitlines BL and $\overline{BL}$ are precharged to VDD prior to any read operation. During the read operation, the voltages at lines PL and WL are at VDD. Transistors M1 and M2 are ON, and the bitline voltages (BL and $\overline{BL}$) vary depending on the data stored in this memory cell.

If data ‘1’ is stored in the cell, the resistance of MTJ1 is low and the resistance of MTJ2 is high. Since the resistance of MTJ1 is low, the voltage at node SN is close to the voltage at line PL. When the voltage at line PL is at VDD, the voltage at node SN is also at VDD, so the bitline voltage (VBL) is at VDD. As presented in Figure 14, node SN is connected to the gate of transistor M4. When the voltage at node SN is at VDD, transistor M4 is ON and node $\overline{SN}$ is connected to GND, so the voltage at bitline $\overline{BL}$ is set to GND.

1.4.3. MTJs-based Logic in Memory

Logic-In-Memory (LiM) is a processing paradigm that exploits the large volume of storage found in today’s computing systems for performance improvements of specific computational applications. An application suitable for LiM is image processing; the pixels of an image are stored in memory and data from another image (that could be also stored in memory) is then provided as input for processing.

Fig. 15. a) Input Images b) Output Images when AND, OR, XOR operations between the two input images and the inverse operation of input 1 are executed

Figure 15 shows two input images; the output images obtained by processing the two input images on a pixel basis using different logic operations (such as AND, OR, XOR, and NOT) are also shown. The advantage of LiM is that processing is performed locally in the memory, so it does not incur any delay due to the movement of data to and from the processor. However, only limited processing capabilities can be provided in each memory cell, so applications that compute in a SIMD fashion are best suited for LiM.
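The pixel-wise processing illustrated in Figure 15 can be sketched in Python as follows. The two small binary images are hypothetical stand-ins for the input images of the figure (one bit per pixel), and the helper names pixelwise and invert are assumptions introduced for illustration; in a LiM array each pixel operation would be carried out inside the memory cell rather than in software.

```python
from operator import and_, or_, xor


def pixelwise(op, img_a, img_b):
    """Apply a two-input bitwise operator to each pair of pixels."""
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def invert(img):
    """Pixel-wise NOT of a binary image."""
    return [[1 - p for p in row] for row in img]


# Hypothetical 2x4 binary images (0 = black, 1 = white).
input1 = [[0, 0, 1, 1],
          [0, 1, 0, 1]]
input2 = [[0, 1, 0, 1],
          [1, 1, 0, 0]]

print("AND:", pixelwise(and_, input1, input2))
print("OR: ", pixelwise(or_, input1, input2))
print("XOR:", pixelwise(xor, input1, input2))
print("NOT:", invert(input1))
```

Every pixel is processed with the same operation and independently of its neighbors, which is why SIMD-style workloads map well onto LiM.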

LiM has also been analyzed with respect to non-volatile memories such as those utilizing magnetic tunnel junctions (MTJs) [41]. Non-volatile memories can then be utilized together with CMOS-based gates for LiM.

Fig. 16. General Structure of LiM cell of [41]

Figure 16 shows the general structure of the MTJ-based LiM of [41]; it consists of three parts: a cross-coupled keeper (CCK), a logic-circuit tree, and a dynamic current source (DCS). The CCK generates the complementary binary outputs (z and z') in accordance with a magnitude comparison between two current signals (IZ and IZ'). The precise current difference is found by using the feedback circuit. The use of the DCS makes it possible to cut off the steady current from VDD to GND, thus resulting in a low power dissipation.

Logic circuits are realized by programming the configuration of the logic-circuit tree [41]; 14 transistors, 2 MTJ devices and a capacitor are required to implement a two-input AND gate or a two-input OR gate. These two different gates are generated by changing the wired-connection points of the logic-circuit tree.

Fig. 17. MTJ-Based LiM when implementing AND Gate (Z = XY) [41]

Figure 17 presents the MTJ-based LiM when an AND gate is implemented. One operand (Y) is kept in the form of the MTJ resistances, while the other operand (X) is applied as the input voltage of the LiM cell. The write operation and the logic function of this MTJ-based LiM are discussed as follows.

1.4.3.1. Write Operation

In the write operation, the data stored in the MTJs (Y and Y’) is written. The voltages at lines Clk, X, and X’ are at GND, so transistors M5 to M9 are OFF and the MTJs are isolated from VDD and GND. To write data into the MTJ devices, the voltages at lines BL1 and BL2 are set to VDD while the write voltages are applied to lines WL1 and WL2, respectively. Transistors M11 to M14 are ON, so the write voltages from lines WL1 and WL2 are dropped across the MTJs, and the MTJs are programmed with the written data.

The write ‘1’ (‘0’) operation is performed by setting the voltages at lines WL1 and WL2 to GND and VDD (VDD and GND), respectively. A negative (positive) voltage is dropped across MTJ Y while a positive (negative) voltage is dropped across Y’. Y is set to state ‘1’ (‘0’) while Y’ is set to state ‘0’ (‘1’). Data ‘1’ (‘0’) is written into the MTJ cells.

1.4.3.2. Logic Function – AND Operation

The logic-in-memory (LiM) circuit in Figure 17 requires 2 clock cycles to perform its logic function. When performing the logic function between the input data (X) and the stored data (Y), transistors M11 to M14 have to be OFF. This can be done by setting the voltages at lines BL1 and BL2 to GND. During the first cycle, the voltage at line Clk is set to GND. The output voltages (Z and Z’) are precharged to VDD while the capacitor is precharged to GND. When Clk is at VDD (second cycle), transistor M9 is ON and the AND operation between data X and Y is performed. Table 5 presents the voltages at nodes X and X’, and the MTJ resistances Y and Y’, when they are in state ‘0’ and ‘1’.

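Table 5. Truth table of the MTJ-based LiM when performing the AND operation

Data X | Voltage at X | Voltage at X’ | Data Y | Resistance of MTJ Y | Resistance of MTJ Y’ | Voltage at Z | Voltage at Z’ | Output
0      | GND          | VDD           | 0      | High                | Low                  | GND          | VDD           | 0
0      | GND          | VDD           | 1      | Low                 | High                 | GND          | VDD           | 0
1      | VDD          | GND           | 0      | High                | Low                  | GND          | VDD           | 0
1      | VDD          | GND           | 1      | Low                 | High                 | VDD          | GND           | 1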

Both X and Y are in state ‘0’

When X is in state ‘0’, the voltages at nodes X and X’ are at GND and VDD, respectively. Transistors M5 and M7 are OFF while transistors M6 and M8 are ON. Since the data stored in the MTJ is also in state ‘0’, the resistance of MTJ Y is larger than the resistance of MTJ Y’, and node Z is connected to GND via transistor M6 and MTJ Y’. The voltage at node Z is set to GND. As shown in Figure 17, node Z is connected to the gate of transistor M3. Transistor M3 is ON and the voltage at node Z’ is at VDD. The voltages at nodes Z and Z’ are at GND and VDD, respectively; the output for this condition is in state ‘0’.

Data X is in state ‘0’, Y is in state ‘1’

When data X is in state ‘0’ and Y is in state ‘1’, transistors M6 and M8 are ON while the resistance of MTJ Y is lower than the resistance of MTJ Y’. GND is connected to node Z via transistor M8 and MTJ Y. Figure 17 shows that node Z is connected to the gate of transistor M3. When the voltage at node Z is at GND, transistor M3 is ON, so a direct path exists between the supply voltage (VDD) and node Z’. The voltages at nodes Z and Z’ are at GND and VDD, respectively; the output for this condition is in state ‘0’.

Data X is in state ‘1’, Y is in state ‘0’

When data X is in state ‘1’, the voltage at node X is at VDD while the voltage at node X’ is at GND. Transistors M5 and M7 are ON while transistors M6 and M8 are OFF. Since the resistance of MTJ Y is larger than the resistance of MTJ Y’, GND is connected to node Z via transistor M5 and MTJ Y’. The voltage at node Z is at GND while the voltage at node Z’ is at VDD; the output for this condition is in state ‘0’.

Both X and Y are in state ‘1’

When both X and Y are in state ‘1’, transistors M5 and M7 are ON while the resistance of MTJ Y is lower than the resistance of MTJ Y’. GND is connected to node Z’ through MTJ Y and transistor M7, so the voltage at node Z’ is at GND. As shown in Figure 17, node Z’ is connected to the gate of transistor M2. When Z’ is at GND, transistor M2 is ON and the voltage at node Z is at VDD. The output for this condition is in state ‘1’ (the voltages at nodes Z and Z’ are at VDD and GND, respectively).
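The four cases discussed above, together with Table 5, can be cross-checked with the following behavioral sketch in Python. It models only the logic-level encoding (input voltages, MTJ resistance states and output voltages) and not the transistor-level behavior of the circuit in [41]; the function names and the string encoding of voltages and resistances are assumptions made for illustration.

```python
VDD, GND = "VDD", "GND"  # symbolic voltage levels (illustrative encoding)


def encode_input(x: int):
    """Voltages at nodes (X, X') for input bit x, as listed in Table 5."""
    return (VDD, GND) if x else (GND, VDD)


def encode_stored(y: int):
    """Resistance states of MTJs (Y, Y') for stored bit y, as listed in Table 5."""
    return ("Low", "High") if y else ("High", "Low")


def lim_and(x: int, y: int):
    """Return (voltage at Z, voltage at Z', output bit) for the AND configuration."""
    out = x & y
    z, z_bar = (VDD, GND) if out else (GND, VDD)
    return z, z_bar, out


# Regenerate the rows of Table 5.
for x in (0, 1):
    for y in (0, 1):
        print(x, *encode_input(x), y, *encode_stored(y), *lim_and(x, y))
```

Only the last row (X = 1, Y = 1) drives Z to VDD, in agreement with the AND function Z = XY.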

[41] implements a full adder by using the Logic-in-Memory (LiM) concept presented in Figure 16. Since the logic function of the LiM depends on its wired connections, the logic operations of [41] are fixed, resulting in a considerable circuit complexity (as measured by the number of required transistors). [42] has presented a 2-input look-up table (LUT) using LiM to address these concerns. This circuit requires 16 CMOS transistors, 4 MTJ devices, 1 reference resistor and a capacitor. Flexibility in logic operations is therefore improved, but the issue of circuit complexity still remains.

1.5. Conclusion

This chapter reviews the current memory technologies, emerging devices, and their memory applications. The operations of different memory types, such as Random Access Memory (RAM), Flash memory, and Content Addressable Memory (CAM), are considered. Due to the limitations of current memory cells, such as the volatile storage of RAM and CAM, the slow write and read operations of Flash memory, and the high power dissipation of these memories, a new type of memory called “hybrid memory” is considered. The hybrid memory utilizes the advantages of CMOS and emerging devices to improve the performance of current memory circuits. Different types of devices, such as the memristor, the Magnetic Tunnel Junction (MTJ), and the carbon nanotube field effect transistor (CNTFET), have been used to complement or supersede CMOS. A review of hybrid memories found in the technical literature is presented at the end of this chapter.

II. MEMRISTOR

2.1. Introduction

The memristor (or memory resistor) is the fourth fundamental circuit element; its operation is based on the relationship between flux and electric charge. The memristor is considered to be one of the possible alternatives to current CMOS technology. Since its physical realization in 2008 [7], the memristor has received considerable attention in both industry and academia. Different applications of the memristor, such as nonvolatile memory, resistive switching, and high-density crossbar arrays, can be found in the technical literature. Memristor-based technology provides much better scalability, higher utilization when used as memory, and lower power consumption.

This chapter presents the fundamentals of the memristor and its memory applications. Applications such as the memristor-based nonvolatile memory and the memristor-based Ternary Content Addressable Memory (TCAM) are presented in this chapter.

2.2. Fundamental of Memristor

In circuit theory, the memristor (or memory resistor) is the fourth fundamental element; its operation is based on the relationship between flux and electric charge. This element was postulated by Leon Chua in 1971 [43] based on the concept of symmetry with the other circuit elements, i.e. the resistor, inductor and capacitor (Figure 18). However, it remained of theoretical interest for more than 30 years until HP Labs provided a physical implementation [7] based on a nano-scale thin film of titanium dioxide (TiO2).

The relationship between the flux and the electric charge of a memristor is given by [7]

$$d\phi = M \, dq \qquad (2)$$

where M is the memristance or memristor value (in Ω), ϕ is the magnetic flux, and q is the electric charge, i.e. the electric charge moving through the memristor is proportional to the flux of the magnetic field that flows through the material. Therefore, the magnetic flux between the terminals is a function of the amount of charge (i.e. q) that flows through the device. Dividing both sides of (2) by dt and using V = dϕ/dt and I = dq/dt, (2) is equivalent to V = MI, where V and I are the voltage across and the current through the memristor, respectively [7].

Fig. 18. Relationship between fundamental circuit elements

A memristor operates as a variable resistor whose value depends on the direction of the current or the voltage across it, i.e. if there is a positive voltage across the memristor, its memristance reduces to a small value (given by RON); if there is a negative voltage across the memristor, its memristance increases to a high value (given by ROFF). Hereafter, the memristor is considered as a switching resistance device; as applicable to the HP Labs implementation [7], the rate of change for the memristance is usually linear provided its value is not close to the extreme values. If the memristance value is close to the extreme values, non-linearity is likely to occur for its rate of change [39].

As a physical implementation of a memristor, HP Labs has fabricated a device based on a titanium dioxide film sandwiched between two platinum (Pt) electrodes (Figure 19) [7]. A size of 10 nm is assumed (as reported in [7]).

Fig. 19. TiO2 film sandwiched between two Pt electrodes

As shown in Figure 19, the memristor consists of two parts (or regions), the doped region and the undoped region. The widths of the doped region (w) and the undoped region (L−w) change depending on the direction of the current or voltage across it. Let RON be the resistance for a completely doped memristor and ROFF be the resistance for a completely undoped memristor; so, the current-voltage relationship of a memristor is given as follows.

$$v(t) = \left\{ R_{ON} \frac{w(t)}{L} + R_{OFF} \left( 1 - \frac{w(t)}{L} \right) \right\} i(t) \qquad (3)$$

where w(t) is the width of the doped region, and L is the TiO2 thickness [7]. As a function of time, the width of the doped region is given by

$$w(t) = \mu_v \frac{R_{ON}}{L} q(t) \qquad (4)$$

where μv denotes the average dopant mobility ($\sim 10^{-10}$ cm$^2$/s/V) [7]. By differentiating w(t) in (4) with respect to time, the rate of change for the width of the doped region is given by

$$\frac{dw(t)}{dt} = \mu_v \frac{R_{ON}}{L} i(t) \qquad (5)$$
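To illustrate how (3) and (5) interact, the following Python sketch integrates (5) with a forward-Euler step under a constant current and evaluates the memristance term of (3) at each step. It is a minimal sketch of the linear drift relations only, not the HSPICE macromodel used in this dissertation. The resistance values (100 Ω and 19 kΩ), the constant 10 µA current, the clamping of w(t) to [0, L], and the function name are assumptions made for illustration; L = 10 nm and the dopant mobility follow the values quoted above.

```python
def simulate_linear_drift(current_a, t_end_s, steps, w0_frac=0.0,
                          r_on=100.0, r_off=19e3,
                          length_m=10e-9, mu_v=1e-14):
    """Forward-Euler integration of Eq. (5) under a constant current.

    mu_v is given in m^2/(s*V): 1e-10 cm^2/(s*V) = 1e-14 m^2/(s*V).
    Returns a list of (time, doped width w, memristance M) tuples.
    """
    dt = t_end_s / steps
    w = w0_frac * length_m
    trace = []
    for k in range(steps + 1):
        x = w / length_m
        memristance = r_on * x + r_off * (1.0 - x)      # bracketed term of Eq. (3)
        trace.append((k * dt, w, memristance))
        w += mu_v * r_on / length_m * current_a * dt    # Eq. (5)
        w = min(max(w, 0.0), length_m)                  # assumption: clamp at the film edges
    return trace


# Assumed stimulus: 10 uA constant current for 5 s, starting fully undoped.
trace = simulate_linear_drift(current_a=10e-6, t_end_s=5.0, steps=1000)
print(f"initial memristance: {trace[0][2]:.0f} ohm")
print(f"final memristance:   {trace[-1][2]:.0f} ohm")
```

With a positive current the doped region widens and the memristance falls from ROFF towards RON; reversing the sign of the current moves the boundary in the opposite direction. The time scale obtained here follows only from the assumed current and should not be compared with the simulated WRITE times reported later.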

Nowadays, different types of material can be used as the intermediate layer of a memristor, such as organic insulators [44], amorphous silicon [45], ferroelectric materials [46], zinc oxide [47], and titanium dioxide [7, 48, 49]. The material used in the fabrication is selected based on the application of the memristor [50].

2.3. Applications of memristor

This section presents nonvolatile memory applications of the memristor. A memristor-based nonvolatile memory cell and a memristor-based TCAM cell are proposed in this dissertation. These memory cells use memristors as storage elements and CMOS transistors as control elements. The results of this research show that the proposed cells improve the performance of current memories while reducing the power dissipation of current nonvolatile memory cells.

2.3.1. Memristor-Based Nonvolatile Memory Cell

The proposed memristor-based nonvolatile memory cell utilizes a memristor as its nonvolatile storage element and ambipolar transistors as its control elements. An ambipolar transistor is a transistor that behaves as an NMOS or a PMOS depending on the voltage at its polarity gate (PG). As mentioned previously, the value of the memristance changes based on the voltage across the memristor and the direction of the bias current; so in the proposed cell, ambipolarity is used to control the memristance, limiting its change during a READ operation and providing bidirectional control during a WRITE operation.

A novel design of a memristor-based nonvolatile memory cell is proposed in this section. As shown in Figure 20, the new memory cell is obtained by connecting the memristor and two ambipolar transistors in series.

Fig. 20. Proposed memristor-based nonvolatile memory cell

In the proposed memory cell, data is sent through the bit line (BL) and the inverse bit line (BL’), while the word line (WL) is used for line selection. When the memory cell is selected, the voltage at WL is set to VDD so that the transistors NT1 and NT2 of the selected memory cell are ON. Data is sent through BL and BL’. BL’ and L2 are connected by transistor NT2; L2 is also connected to the polarity gate (PG) of the ambipolar transistors. Therefore, in the proposed design, BL’ is used for polarity selection during the READ/WRITE operation. When BL’ is ‘0’, the ambipolar transistors behave as NMOS; then current flows from node A to node B (Figure 20), or from the drain to the source of the ambipolar transistors (node A is the source of AMB1 and node B is the drain of AMB2). If BL’ is ‘1’, the ambipolar transistors behave as PMOS. The current flows from node B to node A, or from the source to the drain of the ambipolar transistors. Hence, the memristor can be written in both directions, i.e. the WRITE operation is bidirectional due to the ambipolar feature.

To understand the characteristics of the memory circuit of Figure 20, let WL be high (logic ‘1’) when selecting the memory cell. The memristance is equal to RON when the boundary of the TiO2 film in the memristor moves to the right side; this is accomplished by forward biasing the memristor. A memristance of ROFF is obtained by reverse biasing the memristor, which moves the boundary of the TiO2 film to the left side. The READ and WRITE operations of the proposed memory cell are as follows.

2.3.1.1. Write Operation

In the proposed cell, the memristor is written with the data on BL and BL’. Due to the nearly symmetric conductance of the n- and p-types of the ambipolar transistors, the currents that flow in and out of the memristor, are equal.

Write ‘1’

To WRITE a ‘1’, the memristance must be biased to RON, i.e. BL and BL’ are set to VDD and GND, respectively. The NMOS transistors (NT1 and NT2) can pass the GND value with no significant voltage drop across them, so the GND voltage from bit line BL’ is passed to L2. L2 is connected to the polarity gate of the ambipolar transistors; so, when the voltage at line L2 is GND, both ambipolar transistors behave as NMOS. Due to the voltage drop across transistor NT1, the voltage at line L1 (VL1) is given by

$$V_{L1} = V_{DD} - V_{NT1} \qquad (6)$$

where VDD is the supply voltage and VNT1 is the threshold voltage of the transistor NT1.

Consider the ambipolar transistor AMB1; in Figure 20, the gate and drain of AMB1 are connected together by the line L1, so VGS and VDS of AMB1 are equal; therefore, AMB1 operates in the saturation region (VDS ≥ VGS − VT and VGS > VT). The voltage at node A of the memory cell in Figure 20 (VA) is

$$V_{A} = V_{L1} - V_{DS1} \qquad (7)$$

where VDS1 is the voltage difference between the drain and the source of AMB1.

Next, consider the ambipolar transistor AMB2; assume the memristor holds as data a ‘0’ (ROFF as memristance) prior to a WRITE ‘1’ operation. The memristance ROFF is relatively high, so the voltage drop across the memristor (Vmem) is also high, thus resulting in a high voltage difference between nodes A and B, where VA is given in (7) and VB is nearly GND. For AMB2, VGS2 is also relatively high (nearly VL1), while VDS2 is relatively low (nearly zero); so, AMB2 operates in the linear region. Due to the voltage drop across the memristor, the boundary of the memristor moves and its memristance is reduced during the WRITE ‘1’ operation. When the memristance is reduced, Vmem is also reduced, so the voltage at node B increases. The voltage at node B increases until AMB2 operates in the saturation region (VDS2 ≥ VGS2 − VT2), where VDS2 is the voltage difference between node B and L2, VGS2 is the voltage difference between lines L1 and L2, and VT2 is the threshold voltage of AMB2. The total current that passes through AMB1, AMB2 and the memristor then increases suddenly, and VB (VA) increases (decreases) at a higher rate, as shown in Figure 21.
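The operating-region conditions used in the discussion above can be checked numerically with the following Python sketch, which applies the standard NMOS conditions (cutoff if VGS ≤ VT, saturation if VDS ≥ VGS − VT, linear otherwise). The values VDD = 0.9 V and a threshold of 0.16 V are the ones reported for the simulation setup of this chapter; applying the same 0.16 V threshold to NT1 and to the ambipolar transistors, and the function name nmos_region, are assumptions made for illustration.

```python
def nmos_region(v_gs: float, v_ds: float, v_t: float) -> str:
    """Classify the operating region of an n-type transistor."""
    if v_gs <= v_t:
        return "cutoff"
    if v_ds >= v_gs - v_t:
        return "saturation"
    return "linear"


VDD = 0.9    # supply voltage (V)
V_T = 0.16   # assumed threshold of NT1, AMB1 and AMB2 (V)

# Line L1 after the drop across NT1, using the relation V_L1 = VDD - V_NT1.
v_l1 = VDD - V_T

# AMB1: gate and drain are both tied to L1, so VGS equals VDS for any node A voltage.
v_a = 0.3  # hypothetical node A voltage (V)
print("AMB1:", nmos_region(v_gs=v_l1 - v_a, v_ds=v_l1 - v_a, v_t=V_T))

# AMB2 at the start of the WRITE: VGS2 is close to V_L1 and VDS2 is close to zero.
print("AMB2 (start):", nmos_region(v_gs=v_l1, v_ds=0.0, v_t=V_T))

# AMB2 once node B has risen enough that VDS2 >= VGS2 - VT2.
print("AMB2 (later):", nmos_region(v_gs=v_l1, v_ds=0.6, v_t=V_T))
```

A diode-connected device (VGS = VDS) always satisfies the saturation condition once it is ON, which is why AMB1 stays in saturation while AMB2 moves from the linear to the saturation region as node B rises.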

Fig. 21. Plot of voltage, resistance (y-axis) and time (x-axis) for the proposed memory cell

Fig. 22. Plot of current, resistance (y-axis) and time (x-axis) for the proposed memory cell

Figure 21 shows the plot of the voltage at nodes A and B, and the memristance of the memory cell, versus time (ns). As explained previously, the voltage at node A (B) decreases (increases) slightly at the beginning; however, when the memristance reduces and VDS2 ≥ VGS2 − VT2, AMB2 operates in the saturation region, and the voltage at node A (B) suddenly decreases (increases).

Consider the current flowing through the memristor (Imem); Figure 22 shows that Imem is related to the memristance of the proposed memory cell (Rmem) and the voltage difference across the memristor (i.e. VA and VB). When the voltage at node A (B) of the proposed memory cell is suddenly decreased (increased), the current increases and the memristance of the proposed memory cell is switched to the ON state. Figure 22 shows that the ambipolar transistor provides excellent control over this process as the memristance is kept within the desired range following the initial increase and decay of the current.

Write ‘0’

The process for writing a ‘0’ is similar to the one for writing a ‘1’, but with the inverted logic values. For writing a ‘0’, the memristor must be in the ROFF state. The bit line (BL) is at GND (i.e. 0), while the inverse bit line (BL’) is at VDD (i.e. 1). As BL’ is at VDD, the ambipolar transistors behave as PMOS and are ON (because BL is ‘0’). Then the memristor is set to the ROFF state, because the voltage at node B is higher than the voltage at A (or, the current flows from L2 to the drain of AMB2).

2.3.1.2. Read Operation

Recall that the memristor changes its memristance value if there is a current through it or a voltage across it. To prevent this from happening, the READ operation must be fast. [39] has shown that if the duration of the voltage drop across the memristor (corresponding to the READ time) is less than 12ns, then the memristance will not change. In the proposed cell, the READ operation is performed by precharging the bit lines (BL and BL’) to VDD and GND, respectively; then WL is set high. Similar to the WRITE ‘1’ operation, both AMB1 and AMB2 are ON during the first part of the READ operation. When both ambipolar transistors (AMB1 and AMB2) are ON, the bitlines are connected through the ambipolar transistors and the memristor; so, the voltages of the bitlines tend to balance, i.e. the voltage of BL is transferred to BL’. When the voltage of BL’ rises above the threshold voltage of the polarity gate of the ambipolar transistors, they behave as PMOSs and turn OFF, i.e. L1 and L2 are disconnected; hereafter, the voltage difference across the bitlines does not affect the memristance, because the memristor is nearly isolated. Moreover, the voltage difference between BL and BL’ depends on the data in the memory cell, i.e. the memristance value. At the beginning of the READ operation, the voltage difference between BL and BL’ is higher when the memristance of the memory cell is ROFF than when it is RON, because the transfer of the voltage from BL to BL’ is more difficult at a high memristance value than at a low one. After a READ operation has been performed, the voltage difference between BL and BL’ for ROFF is less than for RON, because the ambipolar transistors are OFF and the voltages of BL and BL’ decrease down to zero. For RON, the voltage from BL is easier to transfer to BL’, so the voltages of BL and BL’ balance quickly. When the ambipolar transistors operate as PMOS and are OFF, the bitline voltages (i.e. BL and BL’) decrease. Since the voltages of BL and BL’ are close to each other, the voltage difference between BL and BL’ for RON is nearly constant.

This is plotted in Figure 23; the data of the memory cell can be found by considering the voltage difference between the bitlines. A sense amplifier faster than 5ns will be able to detect the voltage across the memristor (corresponding to RON or ROFF).

Fig. 23. Plot of voltage difference between the bitlines (y-axis) and READ time (x-axis) for the proposed memory cell

2.3.1.3. Simulation Results

HSPICE [51] has been used to simulate the proposed memory cell; the memristor model from [52] (with a memristance range of 100-19kΩ) is employed. The macroscopic model of Figure 11 is utilized for the ambipolar transistor; the transistor sizes are adjusted to obtain symmetric conduction between the PMOS and NMOS behaviors. In this research, the CMOS transistors of the macroscopic model of Figure 11 have a feature size of 32 nm [53]. The circuit is designed by setting Leff = 12.6nm, Vth = 0.16V (NMOS) or -0.16V (PMOS), VDD = 0.9V, and Tox = 1nm. The driver of Figure 24 [54] is utilized for the two memory operations (READ and WRITE).

Fig. 24. Driver circuit for WRITE and READ operations

Write Operation

In this section, the WRITE operation is simulated; recall that the memristance retains its value when the voltage difference across the memristor lasts less than 12ns [39]. In simulation, the WRITE time of a memristor can be found by monitoring the memristance of the memory cell. The WRITE time is the time to fully bias the memristor to its desired state, so it is a function of the value of the memristance and its range. Figure 25 shows the plot of the WRITE time (y-axis) versus the memristance range (x-axis) of the proposed memory cell when using the model of [52]. Figure 25 shows that the WRITE time increases with the memristance range; the WRITE time is 219ns for the considered memristor (with a memristance range of 100-19kΩ) [7].

Fig. 25. Plot of WRITE time (ns) vs memristance range (kΩ)

Read Operation

Using the driver circuit (Figure 24), when the voltage at node IN is VDD and WE is high (VDD), BL and BL’ are precharged to VDD and GND, respectively, prior to the READ operation. Next, WE is set low to isolate the input voltage from node IN and the bit lines (BL and BL’). Then, WL is set high to start the READ process.

Simulation of the READ operation (Figure 23) shows that the voltage difference between BL and BL’ for the ‘1’ (RON) and ‘0’ (ROFF) states is not the same; for a 1ns READ operation, the voltage difference between BL and BL’ in the ‘1’ state (RON) is about 0.1205V, while it is 0.5817V for the ‘0’ state (ROFF). If the READ operation is slower than 5ns, the ambipolar transistors are OFF and the voltage difference between the bitlines (BL and BL’) for the ‘1’ and ‘0’ states (RON and ROFF) will change again; the voltage difference between BL and BL’ at 12ns is 0.1167V for the ‘1’ state (RON) and 0.0303V for the ‘0’ state (ROFF).
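The effect of the READ time on the sensing window can be illustrated with a short calculation. The following is a minimal sketch, not part of the simulated design: the voltage values are those quoted above, while the midpoint reference used in `read_bit` is an assumption introduced only for illustration.

```python
# Bitline voltage differences (V) quoted in the text for the proposed cell.
# Keys are READ times in ns; values are (VDIFF for '1'/RON, VDIFF for '0'/ROFF).
VDIFF = {
    1: (0.1205, 0.5817),
    12: (0.1167, 0.0303),
}


def sense_margin(read_time_ns):
    """Absolute gap between the bitline differences of the two states."""
    v_on, v_off = VDIFF[read_time_ns]
    return abs(v_off - v_on)


def read_bit(v_diff):
    """Hypothetical sense decision for a 1 ns READ.

    The reference is placed midway between the two quoted 1 ns values;
    this choice is an assumption of the sketch, not taken from the design.
    """
    v_on, v_off = VDIFF[1]
    reference = (v_on + v_off) / 2
    # At 1 ns the '0' (ROFF) state produces the larger bitline difference.
    return 0 if v_diff > reference else 1


for t in sorted(VDIFF):
    print(f"READ at {t} ns: sense margin = {sense_margin(t):.4f} V")
```

Under these numbers the margin between the two states is much wider for a 1ns READ than for a 12ns READ, which is consistent with the preference for a fast sense amplifier.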

Even though a READ operation is not intended to cause a state change, in some cases (such as for a relatively slow READ operation or by selecting the wrong threshold voltage for the ambipolar transistor when operating as a PMOS), the memristance may slightly change its value during a READ ‘0’ (ROFF) operation. After precharging BL and BL’, every time the memory cell is read, it is slightly biased toward the ‘1’ state (RON); its value remains unchanged only if it is already in the ‘1’ state.

Fig. 26. Plot of voltage difference across memristor vs READ time

The change of the memristance must consider the voltage difference across the memristor during a READ operation. As shown in Figure 26, when in state ‘0’ (ROFF), the voltage drop across the memristor is high at the beginning of the READ operation; then it gradually decreases and drops to 0V at 40ns. The memristance does not change its value if the READ time is faster than 12ns [39] (Figure 26); else, there is a voltage drop across the memristor resulting in a change in memristance for a READ operation in the ‘0’ state (ROFF).

However, the model [52] cannot simulate the timing when the memristance value remains unchanged (i.e. when the time of the voltage difference across the memristor is less than 12ns). Therefore in this paper, the READ time is established by considering the rate of change in memristance from ROFF to RON (rather than the so-called threshold time [39]). Let the threshold time of the memristor be defined as the time needed to reach the threshold level. Because the model cannot represent the scenario in which the memristance value is unchanged (i.e. the READ time is less than 12ns), every READ is taken to change the memristance. Therefore, a refresh operation is required. The value of ROFF/2 is chosen as the threshold level for the memristance in the READ operation and for a refresh operation to take place.

Fig. 27. Memristance of the proposed memory cell (y-axis) vs time (x-axis) for consecutive WRITE ‘1’ and‘0’ operations

As shown in Figure 27, the shaded region corresponds to the WRITE time (TW) when the memristance is changed from ROFF to RON. The memristance is slightly changed every time that a READ operation occurs; then following multiple and consecutive READ operations, the state finally changes, i.e. from ‘0’ to ‘1’ (ROFF to RON). So, the READ time is given by

$$T_R = \frac{T_W}{2N} \qquad (8)$$

where TR is the READ time, TW is the WRITE time, and N is the number of consecutive READ operations. By considering ROFF/2 as the threshold level of the memristance, half of the WRITE time (TW/2) corresponds to the time to reach the threshold level, i.e. when the memristance reaches the threshold level, a refresh operation is required. To reach the threshold value while still utilizing [52], simulation is performed by considering the change in memristance for each consecutive READ operation (as worst case condition). Note that the time to reach the threshold level is dependent on using values of ROFF and RON that preserve linearity in the memristor characteristics [7].
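The READ time relation can be traced in two steps. The sketch below assumes, as stated above, that the threshold level ROFF/2 is reached after half of the WRITE time and that the N consecutive READ operations contribute equally; the numerical values are those reported for the 32nm cell at VDD = 0.9V.

```latex
N \, T_R = \frac{T_W}{2}
\quad\Longrightarrow\quad
T_R = \frac{T_W}{2N},
\qquad
T_W = 219\,\text{ns},\; N = 100
\;\Longrightarrow\;
T_R = \frac{219\,\text{ns}}{200} = 1.095\,\text{ns}.
```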

Refresh Operation

In the ‘0’ state (ROFF), the READ operation slightly changes the memristance value. If the READ operation is performed N consecutive times, its memristance value decreases to cause a possible change of state, i.e. from ROFF to RON by reaching the threshold level depending on the READ time and the number of consecutively performed READ operations. Therefore, the relationship between the READ time and the reduction of memristance during a READ operation must be established. As shown in Figure 28, when the READ time is increased, the memristance is decreased prior to reaching the saturation region (in which case the memristance is constant). The simulation results of Figure 28 show that if, for example, the READ operation takes nearly 50ns, the memristance for the ‘0’ state (ROFF) is reduced by approximately 0.1447 kΩ (i.e. 18.96kΩ - 18.8153kΩ). If ROFF is 19kΩ, the memristance will be reduced to the value of RON after 130 consecutive READ operations (hence, a state change will occur).
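The count of 130 READ operations can be checked with a back-of-the-envelope estimate. This is a sketch that assumes every 50ns READ removes the same amount of memristance (a constant decrement per READ is a simplification introduced here, not a property of the model of [52]):

```latex
\Delta R = 18.96\,\text{k}\Omega - 18.8153\,\text{k}\Omega = 0.1447\,\text{k}\Omega,
\qquad
N \approx \frac{R_{OFF} - R_{ON}}{\Delta R}
  = \frac{19\,\text{k}\Omega - 0.1\,\text{k}\Omega}{0.1447\,\text{k}\Omega}
  \approx 130.6 .
```

So about 130 complete READ operations fit before the memristance reaches RON, in agreement with the value quoted above.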

Fig. 28. Memristance of the proposed memory cell (y-axis) vs READ time (x-axis) for a READ operation

Fig. 29. READ time of the proposed memory cell Vs Number of Consecutive READ Operations for state change

Figure 29 presents the plot of the READ time versus the number of consecutive READ operations such that the state ‘0’ of the proposed memory cell will not change to state ‘1’ (by considering ROFF/2 as the threshold level separating state ‘0’ from state ‘1’). The simulation results of Figure 29 show that the READ time must be fast to have a high number of consecutive READ operations prior to causing a state change.

A refresh operation is required to avoid the state change due to consecutive READ ‘0’ operations. The refresh operation is based on setting the threshold resistance at half of the ROFF value and using a comparator, i.e. to compare the memristance of the memory cell with the threshold value. The memristance of the memory cell is detected from the voltage difference between BL and BL’ (VDIFF); let the memristor be in the ‘0’ state (ROFF) and VDIFF be the voltage such that the memristance of the memory cell is less than the threshold resistance. A refresh operation then starts by re-writing a ‘0’ to the memory cell. In the proposed memory cell, the READ operation is affected only for the ‘0’ state (i.e. the ROFF value) because the voltage difference between the bitlines biases the memristor to RON.
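The refresh policy described above can be sketched as a simple behavioral loop. This is an illustration only: the per-READ decrement of 144.7Ω is the value quoted for a READ of nearly 50ns, and treating it as constant for every READ is an assumption of the sketch; the function and parameter names are hypothetical.

```python
R_OFF = 19_000.0          # ohms, '0' state
THRESHOLD = R_OFF / 2     # refresh threshold (ROFF/2)


def count_refreshes(n_reads, delta_ohm=144.7):
    """Count refresh operations over n_reads consecutive READ '0' operations.

    Assumes each READ lowers the memristance by a constant delta_ohm
    (a simplification of this sketch). When the comparator finds the
    memristance below ROFF/2, a '0' is re-written (memristance back to ROFF).
    """
    memristance = R_OFF
    refreshes = 0
    for _ in range(n_reads):
        memristance -= delta_ohm          # disturbance caused by one READ
        if memristance < THRESHOLD:       # comparator decision
            memristance = R_OFF           # refresh: re-write '0'
            refreshes += 1
    return refreshes


print(count_refreshes(200))
```

A faster READ corresponds to a smaller `delta_ohm`, so more READ operations are possible between two refresh operations.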

Transistor Sizing

Consider next the size of the transistors; tradeoffs are possible between transistor size (NT1, NT2 and the ambipolar transistors), WRITE time, and the number of consecutive READ operations for assessing the impact of the change in memristance. For a large transistor size, the voltage from BL or BL’ is biased faster to the memristor, thus resulting also in a fast WRITE time. In this case, the number of consecutive READ operations for a state change is also small, because there is a high voltage across the memristor, i.e. a high rate of change is applicable to the memristance.

Fig. 30. Plot of WRITE and READ times Vs transistor size (NMOS)

Figure 30 shows the plots of WRITE and READ times (ns) versus transistor size (nm) when the number of consecutive READ operations is fixed at 100. This graph shows that if the transistor size increases, the WRITE time decreases because the NMOS transistors allow a higher bias from the bit lines (BL and BL’) to the memristor. The READ time is nearly 200 times less than the WRITE time and has the same dependency on transistor size; by increasing the transistor size, the voltage across the memristor will increase, and the rate of change for the memristance will also be high. Therefore, by considering that the memory cell must be read for nearly 100 consecutive times before changing its state, the utilization of small sized (nanometric) transistors results in slow WRITE and READ operations (Figure 30). So, the data in the memory cell can be read more times prior to the point at which the memristor changes its state, as shown previously by the plot between the READ time and the number of consecutive READ operations (Figure 29).

Moreover, the proposed memory cell is analyzed with respect to its CMOS feature size; designs using 32nm, 45nm, and 65nm [53] have been simulated. Two values of supply voltage (i.e. 0.9 and 1.0 V) have been used. The results are given in Table 6.

Table 6. WRITE/READ times at different supply voltages and feature sizes

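| Feature size | VDD (V) | WRITE time (ns) | READ time (ns) |
|---|---|---|---|
| 32 nm | 0.9 | 219 | 1.095 |
| 32 nm | 1.0 | 210 | 1.05 |
| 45 nm | 0.9 | 260 | 1.30 |
| 45 nm | 1.0 | 245 | 1.225 |
| 65 nm | 0.9 | 290 | 1.45 |
| 65 nm | 1.0 | 275 | 1.375 |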

Table 6 shows that when the feature size is reduced, the WRITE and READ times decrease significantly too; moreover, the supply voltage significantly affects the WRITE and READ times. Note that in Table 6 the READ time corresponds to the largest time with no state change incurred over 100 consecutive READ operations. By increasing the supply voltage, the WRITE time decreases; at a low supply voltage, the voltage across the memristor is also low and the change of memristance state can occur at a slower rate. Consider the number of consecutive READs prior to requiring a refresh operation; the READ time of a cell at a low supply voltage must be sufficiently long to move the boundary of the TiO2 film to the same point as for a cell with a higher supply voltage (and with a fast READ time), i.e. after many consecutive READ operations, the memristance of a memory cell at a lower supply voltage (and slower READ) is equal to the memristance of a memory cell at a higher supply voltage (and a faster READ). This feature is employed in simulation to assess the READ time for reaching the threshold level of the memristance, i.e. if the READ time is faster than the one specified in Table 6, the memristor value (memristance) can be retained longer and the number of consecutive READ operations before reaching the threshold level is also higher (more than 100 for the considered cell).
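The READ times of Table 6 follow from the WRITE times through Equation (8) with N = 100. The short check below is a sketch for illustration; the dictionary layout and function names are hypothetical, while the numbers are those of Table 6.

```python
# (feature size in nm, VDD in V) -> (WRITE time in ns, READ time in ns), from Table 6
TABLE_6 = {
    (32, 0.9): (219, 1.095),
    (32, 1.0): (210, 1.05),
    (45, 0.9): (260, 1.30),
    (45, 1.0): (245, 1.225),
    (65, 0.9): (290, 1.45),
    (65, 1.0): (275, 1.375),
}
N_READS = 100  # consecutive READ operations before the threshold is reached


def read_time(write_time_ns, n_reads=N_READS):
    """Equation (8): T_R = T_W / (2N)."""
    return write_time_ns / (2 * n_reads)


def table_is_consistent(tol=1e-9):
    """True if every READ time in Table 6 equals T_W / (2N)."""
    return all(abs(read_time(tw) - tr) < tol for tw, tr in TABLE_6.values())


print(table_is_consistent())
```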

Power

Next, consider the power dissipation of the proposed memory cell; the memory cell consists of a small number of transistors, i.e. 2 NMOS transistors and 2 ambipolar transistors. Also, the proposed circuit does not require standby power to retain the memristance value, and there is no direct path connecting VDD to GND. However, the power dissipation due to switching is still present (also referred to as dynamic power dissipation). This type of power dissipation is dependent on the clock frequency and is given by the well-known equation

$$P_{Dyn} = \frac{1}{2} C V^2 f_{clk} \qquad (9)$$

where PDyn denotes the dynamic power dissipation, V is the supply voltage (i.e. VDD), C is the capacitance and fclk is the clock frequency. HSPICE has been used to simulate the power dissipation of the proposed memory cell using the macroscopic model presented previously. The simulation results show that the average power of the proposed memory cell is only 5.9% of the power in the MCAM cell of [39] (also a memristor-based design); the main reason for the reduction in power dissipation compared with [39] is that the supply voltage of the proposed memory cell (0.9V) is significantly less than the MCAM cell (3V).
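The quadratic dependence of Equation (9) on the supply voltage can be illustrated numerically. The sketch below isolates the supply-voltage term only; the capacitance and clock frequency used in the example call are arbitrary placeholder values (assumptions, not taken from the design), and the result is not meant to reproduce the simulated power figures, which also depend on the circuit structure.

```python
def dynamic_power(capacitance_f, vdd_v, f_clk_hz):
    """Equation (9): P_Dyn = 0.5 * C * V^2 * f_clk (result in watts)."""
    return 0.5 * capacitance_f * vdd_v ** 2 * f_clk_hz


def supply_scaling_ratio(vdd_low, vdd_high):
    """Ratio of dynamic power at two supply voltages for equal C and f_clk."""
    return (vdd_low / vdd_high) ** 2


# Placeholder values for illustration only: 1 fF switched at 1 GHz.
print(dynamic_power(1e-15, 0.9, 1e9))
# Contribution of the supply voltage alone when going from 3 V to 0.9 V.
print(supply_scaling_ratio(0.9, 3.0))
```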

2.3.1.4. Comparative Discussion and Evaluation

In this section, a comparative discussion and evaluation between the proposed memory cell and the MCAM of [39] are pursued. It should be noted that in [39], the MCAM has a different functionality because the search data operation is needed during the READ operation, i.e. if the stored data is matched with the search data, the match line voltage will be discharged, else the original value will be preserved. As the proposed memory cell is binary, for a READ operation the stored value shows as a voltage difference between the bitlines. The MCAM of [39] requires two voltage sources, VDD and VDD/2 (VDD = 3V). This is significantly higher than the supply voltage of the proposed cell (0.9-1.0 V as based on CMOS scaling in Table 6). Simulation shows also that at the same supply voltage (i.e. 0.9V), the proposed memory cell requires only 33.037% of the power of [39]. This is a direct result of the dynamic power dissipation of the proposed memory cell and the lower number of components in the circuit, i.e. 4 transistors and 1 memristor, while [39] uses 7 transistors and 2 memristors for a 7-T NOR type, and 5 transistors and 2 memristors for a 5-T NOR type.

As for the memory operations, using the same transistor size and memristor model, the simulation results show that the WRITE time of the proposed memory cell is slower than that of [39] because the voltage across the memristor of [39] (1.5V) is higher than in the proposed memory cell (0.9V), i.e. the rate of change of the memristance in the MCAM [39] is faster than in the proposed memory cell. However, at the same supply voltage (0.9V), the simulation results (Table 7) show that the WRITE time of the proposed memory cell is slightly slower than the MCAM [39]. This occurs because in the proposed memory cell, the voltage drop across the transistors during a WRITE operation is high; so, the voltage across the memristor of the proposed memory cell is less than half of the supply voltage. In the MCAM [39], the supply voltage (VDD/2) is provided to the memristor for the WRITE operation; so the WRITE time of the MCAM [39] is not significantly affected by the voltage drop across the transistor. By reducing the voltage drop across the transistors, the WRITE time of the proposed memory cell can be improved with respect to the MCAM [39].

Consider next the READ time. An MCAM needs to initially compare its state with the search data as part of its operation; therefore, for state checking, the match line of the MCAM will either be discharged or keep its value, and the outcome is a match or mismatch signal with respect to the search data. So, the READ time of the MCAM is slower than that of the proposed memory cell, because the READ time of the proposed memory cell is dependent on the voltage difference between the bitlines (BL and BL’) only. Based on the simulation results of Figure 23, the voltage difference between the bitlines of the proposed memory cell can be used to check the state of the memristor at a READ time of 1ns (the READ time of the MCAM [39] is nearly 12ns).

Next, the ability to retain the memristance value during a READ operation is considered. As explained in Section 2.3.1.2, when the proposed memory cell is read with a few nanoseconds of delay, the ambipolar transistors are turned off and the voltage across the memristor drops to zero. Hence, the memristance of the proposed memory cell changes only when the voltage drop across the memristor is functionally needed. For the MCAM cell [39], during the search operation, the supply voltage (VDD/2) is biased directly to the memristor to check its state and keep its value until the search operation is completed. The search operation of the MCAM cell [39] (12ns) is slower than the READ operation of the proposed memory cell (1ns), so the memristor of the MCAM cell [39] needs to be refreshed more frequently than for the proposed memory cell. Table 7 shows the comparison for the READ/WRITE times between the proposed memory cell and the MCAM of [39] at different values of supply voltage.

Furthermore, the proposed memory cell is compared with NAND and NOR CMOS-based flash memories (Table 8). Based on both predicted and recently manufactured NAND and NOR flash memories reported in [55 - 59], it is shown that the READ and the WRITE times as well as the WRITE and READ operating voltages of the proposed memory cell are significantly better than for NOR/NAND flash memories.

Table 7. Comparison of proposed memory cell with MCAM of [39] at same parameters and operating voltage

| Design | VDD (V) | WRITE time (ns) | READ time (ns) |
|---|---|---|---|
| Proposed memory | 0.9 | 219 | 1.095 |
| Proposed memory | 3 | 60.1 | 0.3005 |
| MCAM [39] | 0.9 | 201 | 12 |
| MCAM [39] | 3 | 51.2 | 12 |

Table 8. Comparison of proposed memory cell with NAND/NOR flash memories [55-59]

| | Proposed memory | NAND flash [55, 56, 58] | NOR flash [55, 57, 59] |
|---|---|---|---|
| WRITE/Erase time | 219 ns | 1/0.1 ms | 1 µs/10 ms |
| READ time | 1.095 ns | 0.1 ms | 15 ns |
| WRITE operating voltage | 0.9 V | 15 V | 10 V |
| READ operating voltage | 0.9 V | 1.8 V | 1.8 V |

2.3.2. Memristor-Based Ternary Content Addressable Memory (TCAM) cell

In this section, a new hybrid design for a ternary CAM (TCAM) that utilizes both MOSFETs and memristors is proposed. The TCAM cell requires two memristors in series to perform the traditional memory operations (read and write) as well as the search and matching operations for TCAM. Due to its non-volatile characteristic, the memristor can be used as a storage device. This new TCAM cell is shown in Figure 31. In Figure 31, the three states of the TCAM are defined using 2 memristors as follows.

Fig. 31. Proposed TCAM design using memristors

• For state ‘0’, both memristors must be fully biased to the ROFF state. The memristance range between RON and ROFF is assumed to be large (as experimentally found in [36]), so if both memristors are in the ROFF state, then the total resistance of the memory cell is 2ROFF.

• In state ‘1’, both memristors must be in the RON state. So, the total resistance of the memory cell is 2RON, i.e. a very low value compared with the resistance in state ‘0’.

• For state ‘2’ (i.e. the don’t care state), one memristor must be in the RON state, while the other memristor must be in the ROFF state. Therefore, the total resistance of the TCAM cell in state ‘2’ is ROFF + RON. As the value of ROFF is significantly larger than RON, the total resistance of state ‘2’ TCAM is approximately equal to ROFF, i.e. a value in the middle of the range between those for state ‘0’ and state ‘1’.
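The three-state encoding can be summarized numerically. The following is a minimal sketch using the memristance limits employed elsewhere in this chapter (100Ω and 19kΩ); the assignment of mem1 to ROFF and mem2 to RON in state ‘2’ follows the write ‘2’ description of Section 2.3.2.1, and the identifiers are hypothetical.

```python
R_ON = 100.0       # ohms
R_OFF = 19_000.0   # ohms

# TCAM state -> (mem1, mem2); state 2 is the "don't care" state.
STATES = {
    0: ("OFF", "OFF"),
    1: ("ON", "ON"),
    2: ("OFF", "ON"),
}


def total_resistance(state):
    """Series resistance of the two memristors for a given TCAM state."""
    return sum(R_ON if m == "ON" else R_OFF for m in STATES[state])


for s in sorted(STATES):
    print(f"state {s}: {total_resistance(s):.0f} ohms")
```

With these values, state ‘2’ (ROFF + RON) lies close to ROFF, i.e. about halfway between 2RON and 2ROFF.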

Next, the detailed treatment of the write and matching operations of the proposed TCAM cell is presented.

2.3.2.1. Write Operation

The proposed TCAM cell has two memristors connected in series. The write operation consists of two distinct halves. In Figure 31, the write line (WL) is high during the write operation, and data is provided through bit line 1 (BL1), bit line 2 (BL2), and the input 3 line (in3) as follows.

Write ‘0’

To write a ‘0’, both memristors have to be in the ROFF state. Then, WL must be enabled (ON or high), while BL1 and in3 are low and high, respectively. During the first half of the write operation, BL2 is low and therefore mem2 is in the ROFF state. In the second half of the write operation, BL2 is high for mem1 to be in the ROFF state. Hence at completion of this process, both memristors are in the ROFF state.

Write ‘1’

For writing a ‘1’, both memristors must be in the RON state. This is similar to the write ‘0’ operation; so, WL is high. BL1 is also high, while in3 is low. BL2 is low during the first half of the write operation, so that mem1 is in the RON state. During the second half of the write operation, BL2 is high such that mem2 is in the RON state also. Hence, both memristors are in the RON state.

Write ‘2’

In this case, one memristor must be in the RON state while the other memristor must be in the ROFF state. So the write line is high, while BL1, BL2, and in3 are low, high and low respectively. Therefore, mem1 is in the ROFF state and mem2 is in the RON state.
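The three write sequences can be tabulated as signal levels and their effect on each memristor. This is a behavioral sketch only: the signal levels and resulting states are those listed above (WL is high throughout), while the data layout and function name are hypothetical, and no electrical behavior is modeled.

```python
# For each value to write: list of steps, each step being
# (signal levels with WL high, resulting memristor states).
WRITE_STEPS = {
    0: [
        ({"BL1": 0, "BL2": 0, "in3": 1}, {"mem2": "OFF"}),  # first half
        ({"BL1": 0, "BL2": 1, "in3": 1}, {"mem1": "OFF"}),  # second half
    ],
    1: [
        ({"BL1": 1, "BL2": 0, "in3": 0}, {"mem1": "ON"}),   # first half
        ({"BL1": 1, "BL2": 1, "in3": 0}, {"mem2": "ON"}),   # second half
    ],
    2: [
        ({"BL1": 0, "BL2": 1, "in3": 0}, {"mem1": "OFF", "mem2": "ON"}),
    ],
}


def write(cell, value):
    """Apply the write sequence for `value` to a copy of `cell` and return it."""
    new_cell = dict(cell)
    for _signals, effect in WRITE_STEPS[value]:
        new_cell.update(effect)
    return new_cell


cell = {"mem1": "ON", "mem2": "ON"}   # cell currently storing '1'
print(write(cell, 0))
print(write(cell, 2))
```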

2.3.2.2. Search Operation

The search operation in a TCAM cell checks whether there is a match between the searched (provided as input) and stored data. Two match lines (MLL and MLR) are used (Figure 31); these two lines are shown to better understand the operations of the proposed TCAM cell and the discharge process; in practice these two lines can be combined into a single line. The search operation starts by precharging the voltage on MLL and MLR to high. Then, the searched data is input through BL1 and BL2. An input is provided at in3 to compare the data stored in the TCAM cell with the searched data. If the data stored in the TCAM cell is equal (matched) to the searched data, the match line is discharged. Else (no match), its voltage is kept unaltered.

Search ‘0’

The match lines (MLs) must be precharged to VDD prior to starting the search operation. For the search ‘0’ operation, BL1 and BL2 are high and low respectively, i.e. ML1 is ON and MR1 is OFF. Then, the data input is placed through in3 to check the state of the TCAM cell. If the state of the TCAM cell is matched with the searched data (i.e. the TCAM state is ‘0’ or ‘2’), MLL will be discharged. However, if the TCAM cell state is ‘1’ (no match with the searched ‘0’), MLL remains the same and MLR is not affected by the search ‘0’ operation.

The search operation can be better understood by considering Figure 31; each memristor is fully biased to its required state (RON or ROFF). When a memristor is in the RON state, the voltage drop across it has a very low value (especially when compared with the ROFF state). So, consider in Figure 31 the scenario when mem1 is in the ROFF state, and mem2 is in the RON state. During the search operation, a high voltage (VDD) is applied from in3 to both mem1 and mem2. Since mem2 is in the RON state, the voltage drop across mem2 has a low value and the voltage at node Y (VY) is approximately equal to VDD. However, mem1 has a very high memristance (19kΩ in this case), so the voltage at node X (VX) will only slightly increase. As the search time is small, this increase is not significant. The following cases can be distinguished.

• If the TCAM cell is in state ‘0’, both memristors must be in the ROFF state, so VX and VY are very low (i.e. ML2 is ON and MR2 is OFF). For the search ‘0’ operation, ML1 is ON and MR1 is OFF, so a direct path exists from MLL to GND and MLL is discharged.

• For state ‘2’, mem1 must be in the ROFF state and mem2 must be in the RON state, so VX and VY are low and high respectively. Then, ML2 and MR2 are ON, a direct path from VDD to GND exists via ML1 and ML2; MLL is discharged.

• For state ‘1’, both memristors must be in the RON state; so during the search operation, VX and VY are high. Then, ML2 and MR2 are OFF and ON respectively. As MR1 is OFF, there is no direct path from VDD to GND; MLL and MLR retain their values, as a result of the no-match.

Search ‘1’

For the search ‘1’ operation, BL1 and BL2 are low and high respectively. So, ML1 is OFF and MR1 is ON. As mentioned previously, if the TCAM state is matched with the searched data, ML is discharged, else no change will occur.

• If the TCAM cell is in state ‘0’, both memristors are in the ROFF state; therefore, VX and VY are very low (ML2 is ON and MR2 is OFF). So, if the TCAM cell is in state ‘0’, then there is no direct path from VDD to GND (i.e. no match is found).

• If the TCAM cell is in state ‘1’ or ‘2’, mem2 is in the RON state and VY is high. Therefore, as MR2 is ON, there is a direct path from MLR to GND, thus causing MLR to discharge.

Search ‘2’

For the search ‘2’ operation, the result is always a match because it is the “don’t care” state. So, both BL1 and BL2 are high and ML1 and MR2 are ON. A direct path exists from the MLL and MLR to GND, thus always resulting in a match.
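The search cases described in this section can be condensed into a small behavioral model. This is a logic-level sketch under stated assumptions, not a circuit simulation: ML1 and MR1 are taken to be ON when BL1 and BL2 are high, respectively; ML2 is taken to be ON when mem1 is in the ROFF state (VX low), and MR2 when mem2 is in the RON state (VY high). All identifiers are hypothetical.

```python
# TCAM state -> (mem1, mem2)
STATES = {0: ("OFF", "OFF"), 1: ("ON", "ON"), 2: ("OFF", "ON")}
# Searched value -> (BL1, BL2)
SEARCH_INPUTS = {0: (1, 0), 1: (0, 1), 2: (1, 1)}


def search(stored, key):
    """Return (MLL discharged, MLR discharged, match) for a search operation."""
    mem1, mem2 = STATES[stored]
    bl1, bl2 = SEARCH_INPUTS[key]
    ml1_on = bl1 == 1            # left pull-down enabled by BL1
    mr1_on = bl2 == 1            # right pull-down enabled by BL2
    ml2_on = mem1 == "OFF"       # VX stays low when mem1 is ROFF
    mr2_on = mem2 == "ON"        # VY is high when mem2 is RON
    mll = ml1_on and ml2_on
    mlr = mr1_on and mr2_on
    return mll, mlr, (mll or mlr)


for key in (0, 1, 2):
    for stored in (0, 1, 2):
        mll, mlr, match = search(stored, key)
        print(key, stored, "D" if mll else "S", "D" if mlr else "S",
              "Match" if match else "Not-Match")
```

Under these assumptions, only the two combinations (search ‘0’, state ‘1’) and (search ‘1’, state ‘0’) leave both match lines unchanged.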

2.3.2.3. Simulation Results

In this section, the performance evaluation of the TCAM cell of Figure 31 is presented using HSPICE at 32nm CMOS technology. The model of [52] is employed for the memristor with a memristance range of 100-19kΩ.

Write time

The write time is the time for the memristor to be in the desired state. By setting the voltage across the memristor to a constant value (equal to 0.9 V), it has been found that the time for fully biasing a single memristor to its state is approximately 200 ns. To fully charge both memristors, the write time can be estimated as follows (under the assumption that the voltage drop across transistors M1 or M2 is given by 0.45V).

Write ‘0’ Operation

Fig. 32. a) First Step of Write ‘0’ Operation b) Second Step of Write ‘0’ Operation

For the write ‘0’ operation, both memristors must be in the ROFF state. Figure 32 shows the voltages of the TCAM cell of Figure 31 when the write ‘0’ operation is performed. There are two steps for writing a ‘0’: (1) the first step is used to bias mem2 to ROFF, (2) the second step is used to bias mem1 to ROFF. From Figure 32, the time required for both memristors to be in the correct states is given by 600 ns (200 ns for the first step and 400 ns for the second step). However, this is a rather pessimistic estimate; as shown in Figure 32, during the write ‘0’ operation, VY is equal to 0.45V due to the voltage drop across M2 (Figure 31). Hence, the write ‘0’ operation can be accomplished by using only the second step, because the voltage drop across each memristor is equal to 0.45V, and both memristors are biased to the ROFF state. In this case, the write time is 400ns. However, if the voltage drop across a transistor is reduced (VY is increased to 0.72V), the time for the write ‘0’ operation increases. The first step of the write ‘0’ operation takes 200ns, while the second step takes 280ns for writing to mem1 and mem2 respectively; so, the total time of the write ‘0’ operation is nearly 480ns. However, by using step 2 for the write ‘0’ operation, the times to bias mem1 and mem2 are 280ns and 520ns, i.e. the total write time of this operation is 520ns.

Write ‘1’ Operation

Fig. 33. a) First Step of Write ‘1’ Operation b) Second Step of Write ‘1’ Operation

Similarly to the write ‘0’ operation, there are also two steps for the write ‘1’ operation. As shown in Figure 33, the first step is used to bias mem1 to RON, while the second step is used to bias mem2 to RON. Due to the voltage drops across M1 and M2 in Figure 31, VX and VY are less than 0.9V. When BL1 and BL2 are both high (0.9V), VX and VY drop to 0.45V. The time for the write ‘1’ operation is 800ns (400ns for the first step and 400ns for the second step). To reduce the time for the write ‘1’ operation, the voltage drop across M1 and M2 must be reduced, i.e. increasing VX and VY. When BL1 and BL2 are high and VX and VY are both equal to 0.72V, the time is 280ns for each step, i.e. the total write time of this process is 560ns.

Write ‘2’ Operation

Fig. 34. Write ‘2’ Operation

For the write ‘2’ operation (as shown in Figure 34), mem1 must be in the ROFF state, while mem2 must be in the RON state. Since there is a voltage drop across M2, VY is equal to 0.45V. The time for the write ‘2’ operation is 400ns. To reduce this time, the voltage drop across M2 must be decreased, i.e. VY must be increased. If VY is equal to 0.72V, the write time of this operation is 280ns.

From the above discussion, if the voltage drops across M1 and M2 are both 0.45V, the write time (TW) of the proposed TCAM cell is at most 800ns. However, if the voltage drop is reduced to 0.18V (0.9V - 0.72V), then the write time is 560 ns.

Table 9. Write time (TW) of proposed TCAM cell

| Current State | Next State | TW (ns) |
|---|---|---|
| 0 | 1 | 400 |
| 0 | 2 | 200 |
| 1 | 0 | 400 |
| 1 | 2 | 200 |
| 2 | 0 | 200 |
| 2 | 1 | 200 |

It has been found that the time for changing the memristance from RON to ROFF is given by 200ns, while the time for changing from ROFF to RON is given by 165ns. Hereafter, the worst case analysis is pursued, i.e. 200ns is considered. The write ‘0’ (‘1’) operation requires twice this amount, i.e. 400 ns as shown by simulation in Table 9. The maximum write time occurs when the TCAM cell changes from state 0 to 1 or vice versa, i.e. both memristors must change state, so taking more time than in the other cases.
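The entries of Table 9 can be reproduced by counting how many memristors change state in a transition. The sketch below assumes the worst-case 200ns per memristor stated above and that the memristors are switched one after the other; the state encoding (mem1, mem2) follows the write ‘2’ description, and the identifiers are hypothetical.

```python
T_SINGLE_NS = 200  # worst-case time to switch one memristor (RON -> ROFF)

# TCAM state -> (mem1, mem2)
STATES = {0: ("OFF", "OFF"), 1: ("ON", "ON"), 2: ("OFF", "ON")}


def write_time_ns(current, nxt):
    """Worst-case write time: 200 ns per memristor that must change state."""
    changed = sum(a != b for a, b in zip(STATES[current], STATES[nxt]))
    return changed * T_SINGLE_NS


for cur in STATES:
    for nxt in STATES:
        if cur != nxt:
            print(cur, "->", nxt, ":", write_time_ns(cur, nxt), "ns")
```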

Threshold Voltage Selection

Consider the threshold voltage of ML2, as related to VX in Figure 31. Simulation has shown that when the TCAM cell is in state ‘1’, both memristors must be in the RON state, i.e. VX is nearly equal to VDD. When the TCAM cell is in state ‘0’ or ‘2’ (the total memristance of these states is very high), VX is just slightly higher than 0V.

For selecting the threshold voltage of ML2, consider VX during a search operation. VX slightly increases when mem1 is in state ‘0’ (the TCAM cell is in state ‘0’ or ‘2’). It is equal to 0.899V if mem1 is in state ‘1’ (i.e. the TCAM cell is also in state ‘1’). In the proposed design, the threshold voltage of ML2 is set to 0.735V because during the search operation the increase of VX from 0 to 0.735V is sufficiently large to allow the match line to discharge.

For the threshold voltage of MR2, MLR is discharged if ‘1’ or ‘2’ is searched in the TCAM cell. If the data in the TCAM cell is ‘1’ or ‘2’, mem2 is in the RON state; then during the search operation, VY is approximately equal to VDD (0.899V in this case). So, the threshold voltage of MR2 can be selected to be any value lower than the supply voltage; hereafter the threshold voltage of MR2 is selected to be 0.735V to allow the match line to discharge as for ML2. If the search ‘1’ or ‘2’ operation occurs, MLR is then discharged.

Search Operation

The search operation for the TCAM cell of Figure 31 is simulated. The memristor model of [52] cannot reverse bias and keep the memristor state if the voltage across it is zero; so, a memristance value is directly used in the circuit simulation, i.e. a value of 100Ω for RON and 19kΩ for ROFF (with the threshold voltage as found previously).

Fig. 35. Match line voltage of TCAM in figure 31 during the search ‘0’ operation when TCAM data is state ‘0’

Fig. 36. Match line voltage of TCAM in figure 31 during the search ‘0’ operation when TCAM data is state ‘1’

Fig. 37. Match line voltage of TCAM of figure 31 during the search ‘0’ operation when TCAM data is state ‘2’

Fig. 38. Match line voltage of TCAM in figure 31 during the search ‘1’ operation when TCAM data is state ‘0’

Fig. 39. Match line voltage of TCAM in figure 31 during the search ‘1’ operation when TCAM data is state ‘1’

Fig. 40. Match line voltage of TCAM in figure 31 during the search ‘1’ operation when TCAM data is state ‘2’

Fig. 41. Match line voltage of TCAM in figure 31 during the search ‘2’ operation when TCAM data is state ‘0’

Fig. 42. Match line voltage of TCAM in figure 31 during the search ‘2’ operation when TCAM data is state ‘1’

Fig. 43. Match line voltage of TCAM in figure 31 during the search ‘2’ operation when TCAM data is state ‘2’

Table 10. Simulation results of the search operation in the TCAM cell; D denotes a discharged match line and S denotes a stable (unchanged) match line

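| Search | TCAM State | MLL | MLR | MLL∩MLR | Output |
|---|---|---|---|---|---|
| 0 | 0 | D | S | D | Match |
| 0 | 1 | S | S | S | Not-Match |
| 0 | 2 | D | S | D | Match |
| 1 | 0 | S | S | S | Not-Match |
| 1 | 1 | S | D | D | Match |
| 1 | 2 | S | D | D | Match |
| 2 | 0 | D | S | D | Match |
| 2 | 1 | S | D | D | Match |
| 2 | 2 | D | D | D | Match |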

The HSPICE simulation results shown in Figures 35 to 43 are summarized in Table 10. In Table 10, MLL and MLR are separate to show the result of the match operation on both sides of the TCAM cell. By combining these two lines together into a single line (i.e. the output of an AND gate with MLL and MLR as inputs), the simulation results show that if the data in the TCAM cell is matched with the input (searched) data, then a match line (MLL or MLR) is correctly discharged; else, the match lines keep their values unchanged (as shown in Figures 36 and 38).

Consider next the search time; the search time depends on the discharging rate of the match lines. However, as mentioned previously, during the search operation, if the TCAM cell is in state ‘0’ or ‘2’, VX will increase from 0V to VDD at a rate dependent on the memristance range. Consider state ‘2’ of the TCAM cell during a search ‘0’ operation. In the search ‘0’ operation, BL1 is high and BL2 is low; also, ML1 is ON. To discharge MLL for the outcome of the match operation, ML2 must be ON also. In Figure 31, ML2 (PMOS) is ON only when VX is less than the threshold voltage. So, the match result of this search operation is accurate if and only if the time required for VX to increase from 0 to the threshold voltage of ML2 is greater than the time for the match line to be discharged to GND, i.e. ML2 must remain ON until the discharge is complete.

Table 11. Searching time (TS) of proposed TCAM design

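| Search | TCAM State | Output | TS (ns) |
|---|---|---|---|
| 0 | 0 | Match | 7 |
| 0 | 1 | Not-Match | N/A |
| 0 | 2 | Match | 7 |
| 1 | 0 | Not-Match | N/A |
| 1 | 1 | Match | 8 |
| 1 | 2 | Match | 7 |
| 2 | 0 | Match | 7 |
| 2 | 1 | Match | 7 |
| 2 | 2 | Match | 8 |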

Table 11 shows the search time (TS) in the proposed TCAM cell; the simulation results show that the search time of the proposed TCAM cell is at most 8ns (i.e. less than the 12ns required for reaching the threshold level of a memristor [39]). So during the search operation, the memristors keep their states.

Transistor Sizing

The design of the proposed TCAM cell has been evaluated using different feature sizes.

The simulation results are shown in Table 12 for different technology scaling. At the same supply voltage, the write time at 32nm is less than at 45 and 65nm; moreover as shown in Figure 44, when the supply voltage is increased, the write time of the proposed TCAM cell decreases. Since the voltage across a memristor is increased, the rate of change in memristance is also faster. Therefore, the write time at a higher supply voltage is lower.

As for the search time, the simulation results in Table 12 show that the search time at 32nm is less than at 45 and 65nm; however at the same scaling, the simulation results show that the search time is higher at a higher supply voltage. This occurs because the match lines will take longer to completely discharge. Also, a memristor will slowly change its voltage as result of a search 63 operation; however as shown in Table 12, the search time is less than the time for reaching the threshold level, so a state change cannot occur.

Table 12. Comparison between transistor size and write time for a memristor range of 100 – 19kΩ

Technology VDD (V) Writing time (ns) Searching time (ns) 1 270 6.08 32nm 1.1 210 6.28 1 300 6.70 45nm 1.1 230 7.20 1 330 6.81 65nm 1.1 270 7.30

Fig. 44. Write time (ns) of proposed TCAM cell vs supply voltage at 32nm technology

The simulation results in Table 12 and Figure 44 show that the proposed TCAM cell can be implemented at 32nm with a low write time. Also correct operation is achieved because its search time is faster than the time required for reaching the threshold level of the memristor. The 64 rate of change of the memristors is very low if the supply voltage is reduced at low scaling (the supply voltage of 32nm is less than for 45nm and 65nm). This feature is also advantageous for power dissipation as at 32nm, less power will be also required.

2.3.2.4. Discussion

In this section, several tradeoffs in the TCAM cell design are presented. By using it as a storage element, the state of a memristor may change during the search operation, thus resulting in an incorrect output. To avoid this from resulting in an erroneous outcome, the following tradeoffs must be considered in the design of the cell.

Memristor Range Vs Write Time

Consider the write time, i.e. the time to fully bias each memristor to its state; if the memristor range is large, the write time could be large. Since the memristor can change its state during the search operation the number of search operations could be large too if it has a large memristance range (provided the threshold level of the memristor is not taken into account).

Transistor Size (M1 and M2) Vs Write Time

When the size of these transistors is increased, the voltage drop across them decreases.

Therefore, VX and VY of the TCAM cell in Figure 31 during the write operation increase; hence, the write time decreases.

Transistor Size (M1 and M2) Vs Search Time

When the size of these transistors size is increased, the discharge of the voltage from MLL and MLR to GND will be faster, thus resulting also in a faster search time.

Memristance Range Vs Search Time

The search time corresponds to the time that MLL or MLR will completely drop from its current value to GND; so if the rate of increase of VX during the search operation is higher than the decreasing rate of the match line, the match line will not completely discharge, thus resulting in a wrong output. Hence, the memristance range must be large, such that the rate of increase of VX will be slower (and the match line will have more time to discharge).

Supply Voltage Vs Search Time

If the supply voltage is increased, the search time will also increase as at a higher value of supply voltage, a match line will require more time to fully discharge its value (VDD) to GND.

2.3.2.5. MCAM Operation

In this section, the TCAM design of Figure 31 is extended to MCAM operation and compared with the cell of [39]; a MCAM cell stores data (‘0’ and ‘1’), and searches (for ‘0’, and

‘1’) to perform a match/no-match operation. In this section, two approaches are presented for operating as a MCAM cell.

The first approach utilizes the proposed TCAM cell unchanged, i.e. the same operations as discussed previously are used to store and search ‘1’ and ‘0’ as data. In this case, for writing ‘0’

(‘1’), both memristors must be in the ROFF (RON) state. By using the same search operation as outlined previously, the write time of the TCAM cell is slower than for the MCAM cell of [39], because two clock cycles are required to completely bias both memristors to the desired states (as shown previously).

To improve the write time, a modification is required to the cell; since a MCAM cell stores only a bit (‘0’ or ‘1’), so the second approach proposed for MCAM operation changes the proposed

TCAM cell such that only one memristor is needed. In this case, mem1 must be in the RON state, 66

while mem2 is in a state depending on the write operation. As the memristance of RON is very low, then VX and VY are very close. So for the search operation, VX and VY depend on mem2 as follows.

 If mem2 is in the ROFF state, during the search operation, VX and VY are both low.

 If mem2 is RON, then VX and VY are approximately equal to VDD

Therefore, the MCAM operation can be accomplished by using the search ‘0’ and ‘1’ features of the TCAM.

Fig. 45. Write operation of the proposed TCAM cell when operating as a MCAM cell a) Write ‘0’ Operation b) Write ‘1’ Operation

Figure 45 shows the write operation of the TCAM cell by operating as a MCAM cell. By forcing BL1 to VDD, mem1 is placed in the RON state or kept at the same (constant) value. By adjusting the voltage at BL2 and in3 to control the memristance of mem2, the data (‘0’ and ‘1’) is written in the TCAM. The write time of the proposed TCAM is the same as the MCAM of [39] requiring only one clock cycle.

Next, a comparison is pursued between the proposed TCAM cell when operated as MCAM and the MCAM cell of [39]. 67

Voltage Supplies

푉퐷퐷 The MCAM of [39] requires two voltage supplies (VDD and ) while the proposed 2

TCAM uses only VDD; an additional circuitry for generating half VDD is therefore required for the

MCAM design of [39]. Moreover these cells utilize different voltages for VDD: the supply voltage of the MCAM is to 3V while the TCAM is only 0.9V (based on CMOS technology).

Power Dissipation

The power dissipation of the proposed TCAM is significantly less than the MCAM of [39] due to the lower supply voltage that it is used. Moreover, the number of transistors in the proposed

TCAM cell (6T) is less than the MCAM (7T) thus resulting in a lower power dissipation. When compared with a 5T MCAM, the proposed TCAM cell (6T) uses a lower number of input lines (5 lines) thus resulting in a lower noise during the write and read operations.

Write/Search Times

When comparing the proposed TCAM operation with the MCAM, the write time of the proposed TCAM cell is higher than for the MCAM cell of [39] because two memristors are connected in series. For the search operation, simulation has shown that due to the lower supply voltage, the search time of the proposed TCAM cell (at most 8ns) is less than the one for the MCAM

(12ns). However when comparing the proposed TCAM for CAM operation and the MCAM of [39], by simulating at the same power supply, the write time of the proposed TCAM is less than the

MCAM because the voltage across the memristor is higher than for the MCAM. The write time of the proposed TCAM (when used as CAM) is faster than the MCAM of [39].

2.4. Conclusion

This chapter presents applications of memristor as the nonvolatile storage element. Two types of nonvolatile memory are presented, the memristor-based nonvolatile memory and the memristor-based Ternary Content Addressable Memory (TCAM) cell. The proposed memristor- based nonvolatile memory circuit consists of a memristor, two ambipolar transistors, and two transistors, hence its hybrid nature is proposed. In this cell, the memristor is utilized as storage element due to excellent features such as non-volatility, linearity, low power and good scalability.

The proposed hybrid cell also utilizes ambipolar transistors for the control of the memristance in the operation of the memory cell. This is very important during the READ operation; the ambipolar transistors keep nearly separate the voltages of the bitlines (BL and BL’) and the memristor, such that the memristance of the proposed memory cell is not significantly affected (while still allowing a bidirectional WRITE operation). The hybrid memory cell has been analyzed with respect to the two memory operations (READ and WRITE) and the characteristics of the memristor range for its

ON/OFF states.

Macroscopic models have been utilized to characterize the non-volatile feature of the memory cell (for example a ambipolar transistor is modeled by two transmission gates and two

MOSFETs). In the proposed memory cell, the voltage across the memristor is low, so every time that a READ operation is performed, the memristor slightly changes its value. Therefore a refresh operation may be required for multiple consecutive READ operations. This operational feature is also related to the substantial difference in READ and WRITE times (nearly two orders of magnitude) and the memristance range.

Extensive simulation results using HSPICE have been provided to substantiate the performance of the proposed memory cell; different metrics (READ time, WRITE time, and power dissipation) have also been assessed under different operating conditions (such as by varying feature size and supply voltage). Simulation results show that the READ operation of the proposed memory cell is very fast, requiring low operating voltages with significant saving in power. The 69 results show also that the proposed memory cell is significantly better than NAND/NOR (CMOS- based) flash memories.

Another memristor application that is proposed in this chapter is a memristor-based

Ternary Content Addressable Memory (TCAM) cell. The proposed memristor-based TCAM cell is designed by using memristors as nonvolatile storage elements and CMOSs as controlled elements. Since ternary logic is used, two memristors which are connected in series are utilized to represent each state of the TCAM. The proposed TCAM has been extensively analyzed by considering memristance range, threshold voltage, transistor size and supply voltage with respect to memory operations such as write and search. The proposed memory cell operates robustly and design considerations involving memristance range and threshold selection have been analyzed to achieve fast operation for writing and searching at 32nm feature size. Simulation results using

HSPICE have confirmed that the proposed design offers significant performance improvements compared with other CAM designs utilizing memristors. 70

III. PROGRAMMABLE METALLIZATION CELL (PMC)

3.1. Introduction

Programmable Metallization Cell (PMC) is a device technology that uses the phenomenon of resistive switching in design [60]. It has excellent speed (<10 ns), scalability to a sub-22-nm regime, extremely low power consumption (in nanowatts), good retention and endurance [60]. This chapter presents the fundamental of PMC device, HSPICE macromodel, and memory applications of PMC.

3.2. Fundamental of Programmable Metallization Cell (PMC)

The Programmable Metallization Cell (PMC) also known as the Conducting Bridge

Random Access Memory (CBRAM), or solid-electrolyte memory is a resistive switching memory element based on the migration of metallic ions through a solid electrolyte and the subsequent formation and dissolution of a metallic conductive filament (CF) connecting the two electrodes [61,

62]. The switching process of a PMC device must be considered to better understand its electrical characteristics.

Fig. 46. Switching processes in the PMC a) The CF vertically grows prior to set occurs, b) the CF laterally dissolves prior to reset

The set (OFF to ON state transition) and the reset (ON to OFF state transition) processes of a PMC device are shown in Figure 46.

 Under a positive bias, the top active electrode is oxidized, and the fast metal ions (Ag+ or

Cu2+) drift toward the bottom electrode and form the CF. Thus, the CF vertically grows

until it reaches the top electrode, at which time the set occurs. Following the set, the CF

grows laterally and its diameter continues to increase, because more metal ions are present

around it [60, 61].

 For the reset process, a negative voltage bias occurs across the PMC (Figure 46b); the CF

tends to laterally dissolve, because the enhanced lateral electric field is at the top of the CF

[63]. The reset process is completed when the diameter of the conductive filament shrinks

down to zero at the top electrode. After the reset, the CF vertically dissolves and its height

keeps decreasing.

So, the switching process of a PMC has a transition point that occurs whenever the tip of the CF touches or separates from the top electrode.

The resistance of a PMC is dependent on the CF height (h) and CF radius (r) for finding the ON and OFF-state resistance (Ron and Roff). The OFF state occurs when the tip of the conductive filament is separated from the top electrode; in this case, h is less than the film thickness of the solid electrolyte or the height of the PMC (L). Once h is found, the OFF-state resistance (Roff) is given by the sum of two resistors in series [60].

(휌 ℎ + 휌 (퐿 − ℎ)) 푅 = 표푛 표푓푓 (10) 표푓푓 퐴 where ρon is the CF resistivity, ρoff is the non-conducting solid-electrolyte resistivity, L is the film thickness of the solid electrolyte and A is the area at the bottom of the CF (on the assumption that it is cylindrical before the set process). 72

The ON-state resistance of a PMC (Ron) occurs when the tip of CF touches the top electrode; the resistance value is based on the CF radius (r). As the shape of the conductive filament is conical, then the cell resistance of a PMC in the ON state is as follows

휌 퐿 푅 = 표푛 (11) 표푛 휋푟푅 where R is the radius at the bottom of the CF.

Since h and r vary based on time and the bias voltage across the PMC cell, the evolution rates of the CF height and radius are given by [60].

푑ℎ −퐸 훼푉 = 푣 푒푥푝 ( 푎)푠𝑖푛ℎ ( ) (12) 푑푡 ℎ 푘푇 푘푇

푑푟 −퐸 훽푉 = 푣 푒푥푝 ( 푎)푠𝑖푛ℎ ( ) (13) 푑푡 푟 푘푇 푘푇 where α is a fitting parameter [60], Ea is the activation energy, kT is the thermal energy, vh is the

CF vertical growth velocity, r is the radius of CF at the top of the filament, vr and β are the fitting parameters for the evolution velocity and the electric field dependence respectively, and V is the bias voltage across the PMC [60]. h and r are found based on the evolution rates of (12) and (13) and the PMC resistance is calculated by using (10) and (11).

However, [64, 65, 66] have reported that the variation of the compliance current (Icomp) at the set point can modulate the resistance of the PMC; the relationship between the compliance current and the ON-state resistance of the PMC is given by [60]

퐶 푅푠푒푡 = 푅표푛 = (14) 퐼푐표푚푝

where C (equal to 0.08V) is a fitting parameter. At a larger value of compliance current

(Icomp as given by (14)), metal ions from the top electrode are supplied to the solid electrolyte at a higher rate, then a strong CF is formed when the set occurs. To dissolve the stronger CF during the reset operation, a large current or voltage is needed to reduce the metal ions in the conductive filament so that the PMC can be switched to the OFF state [60].

3.3. Macromodel of PMC

To simulate the electrical characteristics of a PMC, different models have been proposed in the technical literature. [60] has presented a physics-based compact modeling of a PMC in which its resistance is calculated based on (10)-(14). There are considerable concerns with the model of

[60], because [60] is not fully HSPICE compatible. Also, the evolution rates of the CF height (in

(12)) and the CF radius (in (13)) require complex calculations that are not matched with a HSPICE simulation. Moreover, the initial CF radius when the PMC is switched from the OFF to the ON states is suddenly increased in value, while its evolution rate (i.e. (13)) is incorrectly kept at a nearly constant value.

[67] has presented a different PMC model using Verilog-A; this model can be used to generate the I-V characteristics of a PMC, but its resistance is not continuously characterized.

Switching time is not considered in the model [67], its switched operation from the set to the reset states (or vice versa) is only based on the voltage drop across the PMC. Moreover, there is still no

HSPICE compatibility for the model of [67].

3.3.1. Proposed Macromodel

A new HSPICE macromodel of a PMC is proposed in this manuscript. The CF height (h), the CF radius (r) and the state of the PMC are found when considering the CF volume, hence it is a geometry-based model. As shown in the flowchart of Figure 47, the proposed PMC macromodel has two terminals, in and out: in is the input terminal, while out is the output terminal. The switching time of the PMC is found when there is a voltage difference across these terminals. The instantaneous volume of the CF (Vol(t)) is then found from the switching time, based on the values of the instantaneous volume of the CF (Vol(t)), the CF height, the CF radius and the state of PMC are found. Finally, the PMC resistance is calculated based on (10) and (11).

Fig. 47. Flowchart of the proposed PMC Macromodel

Fig. 48. Circuit model of programmable metallization cell (PMC)

Figure 48 presents the circuit model of a PMC, basically it is a variable resistor. The PMC resistance is given by

푅푃푀퐶 = 푅표푓푓푉퐶표푓푓 + 푅표푛(1 − 푉퐶표푓푓) (15) where VCoff represents the state of the PMC, i.e. if the PMC is in the OFF (ON) state, then VCoff is

1 (0). Roff and Ron are the OFF and the ON-state resistances of the PMC (given previously in (10) and (11) respectively). Since the OFF and ON-state resistances of the PMC are based on its CF height (h) and radius (r) (as parameters in (10) and (11) respectively), the largest CF height (hth) and the least CF radius (rth) must be calibrated for switching the PMC resistance from the OFF to the ON states. During switching, the CF height and radius reach the largest (hth) and the least (rth) values respectively. The resistance in the proposed macromodel is made to be continuous by selecting the value of hth in which the OFF-state resistance is close to the ON-state resistance at a

CF radius of rth,. 75

The relationships between the switching time of the set and reset processes and the pulse amplitude are given as follows

푡푠푒푡 = 푡푝 = 훼 ∗ 푒푥푝(훽 ∗ |푉푖푛,표푢푡|) (16)

푡푟푒푠푒푡 = 푡푛 = 훾 ∗ 푒푥푝(훿 ∗ |푉푖푛,표푢푡|) (17)

Where tset or tp is the switching time of the set process that occurs when a positive voltage drop exists across the PMC. treset or tn is the switching time of the reset process that occurs when a negative voltage drop exists across the PMC. Curve fitting is then utilized for the other parameters.

Their values are based on experimental data; if the results of [60] are utilized, the following values are found.

 α and β are the fitting parameters of the set process that are equal to 679.27 and -16.73

respectively.

 γ and δ are the fitting parameters of the reset process that are equal to 149.97 and -14.86

respectively.

The change in the CF volume at each time step (denoted by tstart) can be found by considering the switching time of the set and reset processes, i.e. the time that the CF of a PMC changes from an height equal to zero to a point where the CF radius at the top electrode increases up to a value equal to the radius at the bottom of the CF (R) (or vice versa). In this paper, the shape of the CF is assumed to be conical; it is then converted to a cylindrical form when the CF radius at the top electrode (r) reaches the radius at the bottom of the CF (R). As the metal ions drift toward the bottom electrode at a constant rate, then the changing rate of the CF volume (dVol) is also constant and given by

휋푅2∗ℎ푡ℎ∗푡푠푡푎푟푡 dVolp= (18) 푡푝

−휋푅2∗ℎ푡ℎ∗푡푠푡푎푟푡 dVoln= (19) 푡푛

(18) and (19) give the changing rates of the CF volume during the set and reset processes

(positive and negative). When a positive voltage exists across a PMC, the CF volume is increased; 76 however if a negative voltage exists across the PMC, the CF volume decreases. The instantaneous

CF volume can be found from the changing rate of the CF volume as follows.

푉표푙(푡) = 푉표푙푝푟푒푣 + 푑푉표푙(푡) − 푉표푙푎푑푗 (20) where Volprev is the CF volume at the previous (simulated) time step, Voladj is the adjusted CF volume that is used to control the CF volume (i.e. between 0 and the largest value).

The changing rate of the CF volume is dependent on (18) and (19); so, a circuit is employed to find the changing rate of the CF volume at a specific time (dVol(t)).

Fig. 49. Voltage polarity checking circuit

Figure 49 shows the circuit that is used to check the polarity of the voltage difference across the PMC.

 If a positive voltage is dropped across the PMC, switch swp1 is ON, while switch swp2 is

OFF, the voltage at node p0 is 1V.

 If the voltage difference across the PMC is negative, switches swp1 and swp2 are OFF and

ON respectively, the voltage at node p0 is zero

The changing rate of the CF volume at a specific time (dVol(t)) is given by

푑푉표푙(푡) = 푑푉표푙푝 ∗ 푉푝0 + 푑푉표푙푛 ∗ (1 − 푉푝0) (21)

where Vp0 is the voltage at node p0 of the voltage polarity checking circuit (Figure 49).

Fig. 50. Previous CF volume stored circuit

The CF volume at the previous time step (Volprev) is simulated in the proposed macromodel using the circuit in Figure 50. The voltage source Evolt generates the instantaneous CF Volume (in

(20)), while the initial voltages at nodes vtp1 and vtp2 are given by the initial CF volume of the

PMC (Volini). Two capacitors are employed to store the value of the CF volume at the previous time step (Volprev). Volprev is found by generating a voltage pulse whose value changes at every time step. This is accomplished as follows.

 Switches swv1 and swv2 are ON when the voltage pulse is equal to 0 and 1 respectively.

 The instantaneous CF volume of the PMC is stored at vtp1 and vtp2 based on the value of

the voltage pulse (note that the previous CF volume is found from different nodes).

 If the voltage pulse is 1, the CF volume at the previous time step is equal to the voltage at

vtp1.

 If the voltage pulse is zero, the previous CF volume of the PMC is found from the voltage

atvtp2.

So,

푉표푙푝푟푒푣 = 푉푉푇푃1 ∗ 푉푝푢푙푠푒 + 푉푉푇푃2(1 − 푉푝푢푙푠푒) (22)

Where Vpulse is the pulse voltage that is generated to control switches swv1 and swv2. 78

After finding Volprev and dVol(t), the adjusted CF volume Voladj is considered (based on

(20)) to ensure that the CF volume remains in range.

 When the CF volume is larger than the largest value, Voladj is given by the difference between

the instantaneous CF volume (Volprev + dVol(t), or Vvols) and its largest value (VVols – Volmax).

 If the CF volume is negative, the adjusted CF volume is given by a value that is equal to the

instantaneous CF volume (i.e. VVols).

Hence,

푉표푙푎푑푗 = 푉푎푑푗퐻 ∗ (푉푉표푙푠 − 푉표푙푚푎푥) + 푉푎푑푗퐿 ∗ 푉푉표푙푠 (23) where VVols is the instantaneous CF volume; it is equal to the sum of the previous CF volume and the changing rate of the CF volume at each time step (Volprev + dVol(t)), Volmax is the largest allowed CF volume, VadjH (VadjL) is the voltage to control the adjusted CF volume, it is equal to 1 when VVols – Volmax is positive (VVols is negative); otherwise, it is set to zero. Therefore, the instantaneous CF volume of the PMC (Vol(t)) is given by (20).

After finding the CF volume, the state of the PMC (OFF or ON) can be established. As the shape of the CF is conical, the largest CF volume in the OFF state is equal to the volume of the cone when its height is at the largest value (hth). This is also referred to as the threshold volume of the CF (Volth). If the CF volume is larger than its threshold value, the PMC is switched to the ON state, otherwise the PMC is in the OFF state. (24) gives the threshold volume of the CF as used for the state of the PMC, i.e. by comparing it with the instantaneous CF volume.

1 푉표푙 = 휋푅2ℎ (24) 푡ℎ 3 푡ℎ

Next, the CF height and radius of the PMC are found. In the OFF state, the shape of the CF is conical and its volume is given by (25), while the CF height is given by (26)

1 푉표푙(푡) = 휋푅2ℎ (25) 3

푉표푙(푡)∗3 ℎ = (26) 휋푅2 79 where h is the instantaneous CF height whose value is bound between 0 and the largest CF height

(hth).

However if the PMC is in the ON state, the CF radius (r) must be also considered. In the

ON state, the CF shape is in frustum cone form whose volume is given by (27). The CF radius is found in (28).

휋ℎ 푉표푙(푡) = (푅2 + 푅푟 + 푟2) (27) 3

2 2 2 √3(4휋ℎ푡ℎ푉표푙(푡)− ℎ푡ℎ 휋 푅 )− 휋푅ℎ푡ℎ 푟 = (28) 2휋ℎ푡ℎ

where the CF radius has its least and largest values at rth and R respectively. After the CF height and radius are found, the resistance is calculated using (10) and (11) respectively. Finally, the electrical characteristics of the PMC can be simulated.

Based on (16) and (17), the switching time is still constant when the size of the PMC varies; this is however incorrect. Since the relationship between the switching time and the PMC size is not precisely known and for simplicity of analysis the switching time is made to be linearly dependent with the CF volume. Therefore, the relationships between switching time of the set and reset processes and the pulse amplitude of the PMC are given by

푉표푙푚푎푥 푡푠푒푡 = 푡푝 = 훼 푒푥푝(훽 ∗ |푉𝑖푛, 표푢푡|) ∗ (29) 푉표푙푑푒

푉표푙푚푎푥 푡푟푒푠푒푡 = 푡푛 = 훾 푒푥푝(훿 ∗ |푉𝑖푛, 표푢푡|) ∗ (30) 푉표푙푑푒

2 Where Volmax is the largest CF volume (equal to πR hth), Volde is the default value of the CF volume.

This is found using experimental data, i.e. if [60] is used, the CF height is 49.5nm and the CF radius is 20.57114nm.

So, the changing rate of the CF volume at each time step is related to the voltage difference across the PMC; the voltage difference across the PMC (that is used to calculate the switching time in (29) and (30)) is between the least and largest voltage values (to limit the CF volume within the 80

range of 0 and Volmax). The largest and least voltage differences across the PMC are given in (31) and (32) respectively and tp and tn are equal to tstart.

1 푡푠푡푎푟푡푉표푙푑푒 푉푃푀퐶,푀푎푥 = − 푙푛( ) (31) 훽 푉표푙푚푎푥∗훼

1 푡푠푡푎푟푡푉표푙푑푒 푉푃푀퐶,푀푖푛 = − 푙푛( ) (32) 훿 푉표푙푚푎푥∗훾

3.3.2. Model Simulation

In this section, the proposed HSPICE macromodel of a PMC is assessed; the data of [60]

(shown in Table 13) is initially utilized for the physical parameters. The electrical characteristics of a PMC are simulated and assessed as follows.

Table 13. Parameters used in simulation (from [60])

Parameters Value Parameters Value 7 ρon (Ω•nm) 4*10 R (nm) 20.57114 11 ρoff (Ω•nm) 1.33*10 α 679.27

hth (nm) 49.5 β -16.73

rth (nm) 0.75 γ 149.97 A (nm2) 1330 δ -14.86

3.3.2.1. CF Height and Radius

The variation of the CF height (h) and radius (r) must be considered to evaluate the electrical characteristics of a PMC; they are shown in Figures 51b and 51c respectively by utilizing the voltage pulse sequence of Figure 51a.

Fig. 51. a) Voltage pulse sequence across PMC b) CF height c) CF radius of PMC Vs Simulation time (ms)

The CF height increases (Figure 51) when a positive voltage is dropped across the PMC.

The CF radius is found when the CF height reaches its largest value. However, if a negative voltage is dropped across the PMC, the CF radius is reduced; when the CF radius is reduced to its least value the tip of the CF is separated from the top electrode and the CF height decreases. These characteristics are well matched with the variations of CF height and radius (Figure 51).

3.3.2.2. I-V and R-V Plots

The I-V and R-V plots of a PMC are generated next. These are found by simulating the DC double sweep of Figure 52.

Fig. 52. Voltage difference across PMC for generating I-V and R-V plots

Fig. 53. I-V characteristics of the proposed PMC macromodel

Fig. 54. R-V characteristics of the proposed PMC macromodel

Figure 53 and 54 present the I-V and R-V characteristics of the proposed PMC macromodel when the voltage difference across the PMC is swept (Figure 52). The initial CF height starts at

0nm; as a positive voltage drop occurs across the PMC, the CF height increases (and its increasing rate is dependent on the voltage difference). If the voltage difference across the PMC is high, then the changing rates of the CF height and radius are also high and the PMC is suddenly switched to the ON state. The reset process is similar to the set process; when the negative voltage difference across the PMC is high, the changing rates of the CF height and radius are also high. Therefore, the

PMC is suddenly switched to the OFF state. The I-V and R-V curves (Figures 53 and 54 respectively) clearly show these characteristics.

 When the positive voltage difference across the PMC is larger than 0.5V, the changing rate

of the CF height and radius are high. So, the PMC is suddenly switched to the ON state

(low resistance).

 When the negative voltage difference across the PMC is larger than 0.35V, the PMC is

suddenly switched to the OFF state (high resistance).

3.3.2.3. Switching Time

A comparison between the switching times of the set and reset processes using the proposed macromodel and the experimental results [60] is pursued.

Fig. 55. Percentage errors between the switching time of the proposed PMC macromodel and the experimental results [60] for the set and reset processes vs pulse amplitude (V)

Figure 55 shows the percentage errors of the switching time for both the set and reset processes of the proposed PMC macromodel and the experimental results [60] versus pulse amplitude. The proposed PMC macromodel simulates the switching time for both processes very closely to the experimental data of [60], i.e. at a largest error less than 0.08%.

3.3.2.4. Set Voltage and Ramp Rate

Next, the relationship between the set voltage and the ramp rate of the proposed PMC macromodel is established. The ramp rate of the DC sweep (Figure 52) is defined as the voltage step divided by the duration time; so, when the ramp rate is increased, the set voltage also increases.

Fig. 56. I-V characteristics of the proposed PMC macromodel when the ramp rate of the DC Sweep is 1, 3 and 5V/s

Figure 56 presents the I-V characteristics of the proposed PMC macromodel when the ramp rate of the double DC sweep is changed. When the ramp rate is increased, the set voltage of the proposed PMC macromodel also increases. This is similar to the experimental results of [60].

3.3.2.5. Sensitivity

The sensitivity of the proposed PMC macromodel to different parameters is assessed with respect to the I-V characteristics, the R-V curve and the switching times of the set and reset processes; in all cases, only a single parameter is changed at a ±5% level (i.e. all other parameters are left unchanged) for sensitivity analysis.

Film thickness of the solid electrolyte (L)

The variation of the film thickness of the solid electrolyte (L) is related to several parameters of the proposed macromodel. Table 14 presents the sensitivity of the parameters that are affected by the variation of L.

Table 14. Parameter sensitivity when the film thickness of the solid electrolyte (L) is changed by ±5%

Parameter                  Percentage Variation
                           -5%        0%         5%
L (nm)                     47.025     49.5       51.975
VPMC (V)                   ±0.7 (all columns)
Switching time (set)       5.30ms     5.58ms     5.86ms
Switching time (reset)     4.33ms     4.56ms     4.79ms
ROFF,MAX (MegΩ)            4752.6     5000       5247.6
RON,MIN (MegΩ)             1.4294     1.5038     1.5782

When L is changed by ±5%, the switching times of the set and reset processes also change; the switching time increases when L increases. Also, when the film thickness of the solid electrolyte is increased, the largest and least PMC resistances increase, because resistance is related to the length of the PMC (as per the well-known equation R = ρl/A).
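As a quick check of this scaling, assume that ρ and A stay fixed while only the length changes (an idealization; the macromodel may include further dependencies). The equation R = ρl/A then predicts resistances proportional to L:

```latex
\frac{R(0.95\,L)}{R(L)} = 0.95 \;\Rightarrow\; R_{OFF,MAX} \approx 0.95 \times 5000 = 4750\ \text{Meg}\Omega,
\qquad
\frac{R(1.05\,L)}{R(L)} = 1.05 \;\Rightarrow\; R_{OFF,MAX} \approx 1.05 \times 5000 = 5250\ \text{Meg}\Omega
```

These idealized values are within about 0.1% of the simulated entries of Table 14 (4752.6 and 5247.6 MegΩ).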

CF Radius at the bottom (R)

Table 15. Parameter sensitivity when the radius at the bottom of the CF (R), is changed by ±5%

Parameter                  Percentage Variation
                           -5%        0%         5%
R (nm)                     19.5426    20.57114   21.60
VPMC (V)                   ±0.7 (all columns)
Switching time (set)       5.03ms     5.58ms     6.15ms
Switching time (reset)     4.12ms     4.56ms     5.03ms
ROFF,MAX (MegΩ)            5540.3     5000       4535.1
RON,MIN (MegΩ)             1.6663     1.5038     1.3639

Table 15 shows the sensitivity of the PMC parameters that are affected by changing the radius at the bottom of the CF (R) by ±5%. With a voltage pulse amplitude of 0.7V, the switching times and the largest and least resistance values change as a function of R. When R is increased, the switching times are longer, because the CF volume increases. However, when R is increased, the largest and least PMC resistances are smaller, because the area of the CF that is in contact with the top electrode is larger.
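The entries of Table 15 can be checked against a simple scaling rule. The sketch below assumes that the switching times scale with R² (CF volume) and the resistances with 1/R² (contact area); this rule is an observation about the tabulated values, not a formula stated in the text, and the names used are illustrative only.

```python
# Nominal values and simulated entries copied from Table 15.
NOMINAL = {"t_set_ms": 5.58, "t_reset_ms": 4.56, "r_off_meg": 5000.0, "r_on_meg": 1.5038}
TABLE_15 = {
    -0.05: {"t_set_ms": 5.03, "t_reset_ms": 4.12, "r_off_meg": 5540.3, "r_on_meg": 1.6663},
    +0.05: {"t_set_ms": 6.15, "t_reset_ms": 5.03, "r_off_meg": 4535.1, "r_on_meg": 1.3639},
}


def scale_with_radius(nominal, delta):
    """Assumed scaling (not from the source): time ~ R^2, resistance ~ 1/R^2."""
    k = (1.0 + delta) ** 2
    return {
        name: value * k if name.startswith("t_") else value / k
        for name, value in nominal.items()
    }


def max_relative_error(delta):
    """Largest relative gap between the assumed scaling and Table 15."""
    predicted = scale_with_radius(NOMINAL, delta)
    return max(abs(predicted[n] - v) / v for n, v in TABLE_15[delta].items())


for delta in (-0.05, 0.05):
    print(f"{delta:+.0%}: max relative error = {max_relative_error(delta):.3%}")
```

Under this assumption, the predicted values agree with the tabulated ones to within the rounding of the table.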

3.3.2.6. Additional Experimental Assessment

In this section, the proposed PMC macromodel is used to simulate the electrical characteristics of a different PMC found in [68]. The experimental results of [68] are utilized; the parameters α and β that are used to simulate the relationship between the switching time and the voltage drop across the PMC (as in (29)) are now given by 5.0853 and -5.947 respectively. Since the switching times of the set and reset processes of [68] are very close, γ and δ are set to the same values as α and β respectively. The largest and least PMC resistances are now 1GΩ and 1kΩ respectively, while the ρon and ρoff values (found from (10) and (11)) are equal to 78.57 Ω•nm and 78.57×10^6 Ω•nm respectively. The PMC of [68] operates at 90nm, so its optimum thickness is 3nm. The values of hth, rth, and R are 89.95nm, 0.01nm, and 1.5nm respectively.

Fig. 57. I-V characteristics of the proposed PMC macromodel when the data of [68] is employed and the ramp rate of the DC sweep is 1 V/s

Figure 57 shows the I-V characteristics of the proposed PMC macromodel when the experimental data of [68] is employed. The set voltage of the PMC in [68] is approximately 0.45V and is close to the simulation results (Figure 57). Figure 58 presents the plot of switching time versus pulse amplitude of the proposed PMC macromodel and the experimental data of [68]; the percentage difference between simulated and analytical values (using (4.7)) for the switching time versus the voltage drop across the PMC is shown in Figure 59. The proposed PMC macromodel can simulate the relationship between switching time and pulse amplitude close to the experimental results with an error of less than 1%.

Fig. 58. Relationship between switching time and pulse amplitude of voltage drop across PMC from simulation and experimental data [68]

Fig. 59. Percentage difference between the switching time of experimental and simulated data versus pulse amplitude

3.4. Applications of PMC

Applications of the Programmable Metallization Cell (PMC) are considered next. By using a PMC as the nonvolatile storage element, different types of PMC-based nonvolatile memory cells are generated.

3.4.1. PMC-Based Nonvolatile SRAM (7T1P)

This memory cell utilizes a resistive RAM (RRAM) together with a 6T SRAM core. The RRAM is used as the nonvolatile storage element of the SRAM. When the power supply is OFF, the data in the SRAM is stored and can be recovered when the power supply is back. This memory cell improves the performance of the current SRAM.

Fig. 60. The proposed non-volatile 7T1P Cell

Figure 60 presents the non-volatile memory cell; it consists of a 6T SRAM core and a RRAM made of a MOSFET and a PMC (1T1P), hence this is referred to as a 7T1P cell. The 1T1P RRAM is connected to node D of the 6T core; the data in the 6T core is also stored in the nonvolatile element, such that it can be retained if there is no supply voltage. During normal operation, the data is stored at nodes D and DN (Figure 60) and the cell operates as an SRAM. Data is recovered from a loss of the supply voltage by using the non-volatile memory element.

3.4.1.1. Store Operation

The store operation (from SRAM to PMC) requires two steps. The voltage at node Ctrl2 is at VDD and GND during the first and second steps respectively, while the voltage at node Ctrl1 is at VDD to turn ON the transistor M7. In the first step, the PMC resistance is in the Reset state (high resistance value), then the voltage at Ctrl2 is at GND to program the data into the PMC.

• If the data to be stored is a ‘1’, the voltage at D is at VDD. During the first step, the PMC resistance is unchanged. However, in the second step, there is a positive voltage drop across the PMC, so its resistance is switched to a low value (RON).

• If the data to be stored is a ‘0’, the voltage at D is at GND. During the first step (so the voltage at node Ctrl2 is at VDD), the PMC resistance is switched to a high value (ROFF). In the second step (the voltage at node Ctrl2 is at GND), the PMC resistance remains unchanged, because the voltage drop across it is nearly zero.
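The two-step store sequence can be summarized with a small behavioral sketch. It is not the HSPICE macromodel: the drop across the PMC is taken as V(D) − V(Ctrl2) with M7 treated as an ideal switch, and the switching threshold `V_SWITCH` is a hypothetical value chosen only for illustration.

```python
VDD, GND = 0.9, 0.0   # supply levels; 0.9 V is the 32nm supply quoted in the text
V_SWITCH = 0.45       # hypothetical switching threshold (assumption, illustration only)


def pmc_next_state(state, v_d, v_ctrl2):
    """Return the PMC state after a bias; the drop is assumed to be V(D) - V(Ctrl2)."""
    v_pmc = v_d - v_ctrl2
    if v_pmc >= V_SWITCH:
        return "RON"      # positive drop: set (low resistance)
    if v_pmc <= -V_SWITCH:
        return "ROFF"     # negative drop: reset (high resistance)
    return state          # near-zero drop: resistance unchanged


def store(bit, initial_state):
    """Apply the two store steps for the given data bit."""
    v_d = VDD if bit == 1 else GND
    state = pmc_next_state(initial_state, v_d, VDD)  # step 1: Ctrl2 at VDD
    state = pmc_next_state(state, v_d, GND)          # step 2: Ctrl2 at GND
    return state


for bit in (0, 1):
    for initial in ("RON", "ROFF"):
        print(f"store {bit} from {initial}: {store(bit, initial)}")
```

In this sketch the final state depends only on the stored bit and not on the previous PMC state, which is the purpose of the two-step sequence.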

3.4.1.2. Restore Operation

The restore operation (from PMC to SRAM) transfers the data stored in the PMC to D and DN. The voltages of Ctrl1 and Ctrl2 are at VDD, while the voltages at the lines BL, BLN and WL are at GND; so, M7 is ON and the voltage from Ctrl2 is passed through the PMC to D.

• For a ‘1’ stored in the PMC (low resistance value), the voltage at D is at VDD, while the voltage at DN is discharged through M3.

• For a ‘0’ stored in the PMC (high resistance value), instability in circuit operation may occur due to the discharge of a node between D and DN. However, the small values of the ON-state resistance of M1 and M7 compared to the high resistance of the PMC result in a low voltage at D. This turns M3 OFF, thus preventing the discharge of DN. So, the voltage at node D is at GND.

As the ON-state resistance of the PMC (RON) is very large (about 1.5MΩ), during the restore operation for a ‘1’, the voltage at D is incorrectly at GND. Hence, to avoid this from occurring, the ON-state resistance of the PMC must be appropriately adjusted. As shown in (11), the value of RON is related to the CF height and radius. So, the ratio of the CF radius to height must be considered to reduce the ON-state resistance of a PMC.

Figure 61 shows that when the ratio of CF radius to height is increased, the ON-state resistance of the PMC decreases. For a low ON-state resistance, the CF height (L) and CF radius (R) are selected as 3 nm and 15 nm respectively (the values of hth and rth are 2.975nm and 0.5226481nm respectively), so RON is now given by 169.697kΩ, while ROFF is 564.2424MegΩ.

Fig. 61. Least PMC resistance (ON-state resistance) vs ratio of largest CF radius and height of PMC

The switching time is related to the voltage difference across the PMC; so, the supply and the voltages at Ctrl1 and Ctrl2 (VDH) must be increased during this operation to reduce the store time. However, when the supply voltage is increased, the voltages at D and DN also increase. During the store operation, if the supply voltage is increased to VDH, the voltage at D is increased to a value greater than zero volts when a ‘0’ is stored in the 7T1P cell. In the first step of the store operation (so the voltage at Ctrl2 is at VDH), the PMC is biased to the OFF state. However, in the second step, the voltage at Ctrl2 is switched to GND; as the voltage at D is not equal to zero, a positive voltage is dropped across the PMC. If VDH has a very large value, the voltage difference across the PMC will bias its resistance to the ON-state, yielding an incorrect operation. Therefore, the voltage difference across the PMC during the store operation and its store time must be considered in more detail.

Fig. 62. Store time of 7T1P cell when its supply is changed

Fig. 63. Voltages at D and DN of 7T1R cell vs. supply voltage (voltages at Ctrl1 and Ctrl2 are 0)

Figure 62 shows the time for the store ‘0’ and ‘1’ operations when the supply and the voltages at Ctrl1 and Ctrl2 (VDH) are increased; as VDH is increased, the time of the store ‘0’ operation first decreases, but then it starts to increase. Moreover, when a '0' is stored in the memory cell (Figure 63), the voltage at D rises above 0V once VDH is larger than 1.8V. So when VDH increases, the voltage difference across the PMC initially increases; however, when VDH is further increased, the voltage difference across the PMC reaches its largest value and then decreases, so the store ‘0’ operation becomes slower. For the store '1' operation, the voltage at D is related to VDH and the voltage at Ctrl2 is at GND (in the second step of the store operation), so when VDH increases, the voltage difference across the PMC also increases and the store ‘1’ operation becomes faster.

For equal store times, the voltage difference across the PMC during the store ‘0’ and ‘1’ operations must be close. Consider Figure 62; the least difference in values occurs when VDH is at 2.375V, i.e. the time of the store '0' operation is about 149.5ps, while the time of the store '1' operation is 173.7ps. So, VDH for the store operation is selected to be 2.375V when a 32nm CMOS feature size is used. The supply voltage (VDD) is 0.9V during normal operation, but it must increase to 2.375V for the store operation of the 7T1P cell. Since the store time for a '0' is 149.5ps while the store time for a ‘1’ is 173.7ps, the width of the voltage pulse at Ctrl2 is 149.5ps when the voltage at Ctrl2 is VDH (2.375V), and 173.7ps when the voltage at Ctrl2 is GND. So, the store operation of the proposed 7T1P cell takes 323.2ps (i.e. the sum of the two values for the steps).
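The pulse schedule at Ctrl2 described above can be written down as a short sketch; the `Step` type and the function name are illustrative and not part of the original design flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    ctrl2_volts: float  # level applied at Ctrl2 during the step
    width_ps: float     # pulse width of the step


V_DH = 2.375  # store voltage selected in the text for the 32nm feature size

STORE_SEQUENCE = (
    Step(ctrl2_volts=V_DH, width_ps=149.5),  # step 1: sized for the store '0' time
    Step(ctrl2_volts=0.0, width_ps=173.7),   # step 2: sized for the store '1' time
)


def total_store_time_ps(sequence=STORE_SEQUENCE):
    """Total store time as the sum of the two step widths."""
    return sum(step.width_ps for step in sequence)


print(f"total store time: {total_store_time_ps():.1f} ps")
```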

As mentioned previously, the restore operation starts with the appropriate voltage at Ctrl1 and Ctrl2, while the supply voltage is at its default value (0.9V at 32nm). As the value of the resistance of the PMC in the proposed 7T1P cell is very large, the voltages required at Ctrl1 and Ctrl2 must be at values higher than VDD. The data in the 7T1P cell is restored at D and DN (Figure 64) depending on the resistance value in the resistive element.

Fig. 64. Voltages at D and DN during the restore operation when a ‘1’ is stored in the 7T1P cell

Figure 64 shows the voltages at D and DN of the proposed 7T1P cell during the restore operation for a ‘1’; the voltage at D is increased to VDD during the restore operation (while the voltage at node DN decreases to GND). The voltage at D is retrieved using the voltage at Ctrl2 and the PMC resistance; the restore time is reduced by increasing the so-called restore voltage provided at Ctrl2.

Fig. 65. Restore time of 7T1P cell and restore voltages at Ctrl1 and Ctrl2, when data ‘1’ is stored in the PMC (the PMC resistance is 169.697kΩ)

Figure 65 shows the relationship between the restore time and voltage (i.e. the values of the voltages at Ctrl1 and Ctrl2 during the restore operation). When the restore voltage is increased, the restore time decreases. However, the restore voltage is restricted to the least value of 1.035V (when the PMC resistance is 169.697kΩ) while its largest value is given by 2.11V. If the restore voltage is less than 1.035V, a ‘1’ cannot be retrieved, because the voltage at D cannot be increased to VDD. However, if the restore voltage is larger than 2.11V, the PMC resistance will change from state ‘1’ to state ‘0’, because there will be a very large voltage difference across the PMC.
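The permitted restore-voltage window can be expressed as a simple check. The limits below are the values quoted above for a PMC ON-state resistance of 169.697kΩ; treating the limits themselves as valid is an assumption of this sketch, and the function name is illustrative.

```python
V_RESTORE_MIN = 1.035  # below this, D cannot be raised to VDD for a stored '1'
V_RESTORE_MAX = 2.11   # above this, the PMC is disturbed from '1' to '0'


def classify_restore_voltage(v_restore):
    """Classify a restore voltage (volts) against the window quoted in the text."""
    if v_restore < V_RESTORE_MIN:
        return "too low: '1' cannot be retrieved"
    if v_restore > V_RESTORE_MAX:
        return "too high: PMC state is disturbed"
    return "valid"


for v in (0.9, 1.85, 2.375):
    print(f"{v} V -> {classify_restore_voltage(v)}")
```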

Fig. 66. Least restore voltage (voltage at Ctrl1 and Ctrl2) vs. ON-state resistance of PMC

Figure 66 shows the least restore voltage of the proposed 7T1P cell when the ON-state resistance of the PMC is changed; when the ON-state resistance of the PMC is increased, the least restore voltage of the 7T1P cell also increases.

3.4.1.3. Store/Restore time

In this section, the 7T1P non-volatile cell is compared with the 7T1R cell [69] (using an oxide-based resistive RAM (OxRRAM)) and a 7T1M cell (using a memristor-based RRAM) [7, 57]. The comparison between the 7T1P, 7T1R and 7T1M cells is performed at 32nm CMOS feature size and a supply voltage of 0.9V.

The difference between 7T1P, 7T1R and 7T1M is the type of resistive element used in the cell, so the resistance ranges employed in the ON and OFF states of the non-volatile elements are also different. The store and restore voltage values (supply voltage, voltages at Ctrl1 and Ctrl2 for the store operation, voltages at Ctrl1 and Ctrl2 for the restore operation) are adjusted to accomplish the fastest store and restore times at 32nm CMOS feature size.

Table 16. Time comparison of 7T1P, 7T1R [69] and 7T1M cells in which a PMC, an OxRRAM, and a memristor are used as storage element (32nm feature size)

Operation                  Nonvolatile Cell
                           7T1P          7T1R          7T1M
Store ‘0’ time             149.5ps       20.03ns       159.1µs
Store ‘1’ time             173.7ps       20.031ns      1.4653µs
Restore ‘0’ time           21.58ps       36.93ps       50.04ps
Restore ‘1’ time           19.66ps       38.54ps       44.49ps
ON-State Resistance        169.697kΩ     1kΩ           100Ω
OFF-State Resistance       564.24MegΩ    1MegΩ         16kΩ
Store Voltage (V)          2.375         0.9           1.85
Restore Voltage (V)        1.85          0.9           0.78

As shown in Table 16 (bold entries have the least values), the store and restore times of the proposed 7T1P cell are faster than those of the 7T1R and 7T1M cells. As the ON and OFF-state resistances of a PMC have larger values than the corresponding resistances of an OxRRAM and a memristor, the voltage drop across the resistive element during the store operation is larger. So the resistance switching in a PMC occurs at a rate higher than for either an OxRRAM or a memristor.

For the restore operation (Table 16), the restore voltage of a 7T1P cell is larger than the restore voltages of the 7T1R and the 7T1M cells; the restore time of 7T1P is faster than for 7T1R and 7T1M. This is the result of the resistive feature of the 7T1P, in which its OFF-state resistance is much larger than for the 7T1R and the 7T1M cells and hence, the restore voltage has a larger value. For a 7T1M cell, the OFF-state resistance of a memristor is very low compared with a PMC and OxRRAM; so if the restore voltage is large (such as VDD), the '0' data cannot be restored to the cell, hence the restore voltage of the 7T1M cell must be reduced and its restore time is significantly slower.

The store and restore times of the proposed 7T1P cell are also faster than for the 7T1R cell [69]. The read and write times of the proposed 7T1P and 7T1R cells are the same, because the same SRAM core (at the same supply voltage and CMOS transistor sizes) is employed. However, when the supply voltage is OFF, the supply and store voltages (i.e. the voltages at Ctrl1 and Ctrl2) in the proposed 7T1P cell must be adjusted for a store operation. In [69], the supply and control voltages of a 7T1R cell during the store operation are not changed, so its store time is slower than for a 7T1P cell.

For the restore operation, the PMC resistance range is very large; the restore voltage (i.e. the voltages at Ctrl1 and Ctrl2) in the 7T1P cell must be increased for the ‘1’ state. With an increase of the restore voltage, the restore time of the 7T1P cell is faster than a 7T1R cell in which the restore voltage is equal to the default value of the supply voltage.

3.4.1.4. CMOS Feature Size

Next, the store and restore operations of the 7T1P cell are simulated when the CMOS feature size is changed. At each CMOS feature size, the performance and the supply voltage change too. A comparison of the store and restore operations of the 7T1P cell at different CMOS feature sizes is performed when the largest store voltage (VDH) and the restore voltage of 7T1P cell are constant.

Table 17. Comparison of delays in the 7T1P cell when the CMOS feature size and supply voltage are varied.

Operation                    CMOS Feature Size
                             32nm      32nm      45nm      45nm      65nm
Supply Voltage (V)           0.9       1.1       1         1.1       1.1
Store time ‘0’ (ps)          149.5     146.3     8.252     7.483     6.644
Store time ‘1’ (ps)          173.7     163.44    155.4     155.4     43.21
Restore time ‘0’ (ps)        21.58     15.45     21.10     18.50     22.50
Restore time ‘1’ (ps)        19.66     22.82     21.91     23.69     25.81
Restore Voltage (V)          1.85 (all columns)
Largest Store Voltage (V)    2.375 (all columns)

Table 17 presents the comparison between the store and restore operations of a 7T1P cell when its CMOS feature size is varied. When the CMOS feature size is reduced, the store time of the 7T1P cell is slower. VDD is reduced at a smaller CMOS feature size; so, the rate for the supply voltage to increase from VDD to VDH is less. The voltage difference across the PMC during the store operation takes longer to reach the largest value, thus causing the store time to be slower. The PDP and power dissipation are related to the store time, so at a smaller CMOS feature size, they are higher than at a larger CMOS feature size. For the restore operation, the supply and restore voltages are constant; the restore time is related to the feature size of CMOS, i.e. a smaller CMOS feature size results in a faster restore time. The power dissipation of a cell is larger at a small CMOS feature size for the same supply voltage (1.1V).

Table 18. Comparison of average power dissipation and Power Delay Product (PDP) for each operation of a 7T1P cell when varying the CMOS feature size

Operation                        CMOS Feature Size
                                 32nm       32nm       45nm       45nm       65nm
Supply Voltage (V)               0.9        1.1        1          1.1        1.1
Store '0' Power (µW)             314.843    315.264    185.683    184.794    118.124
Store '1' Power (µW)             322.025    322.209    193.79     193.824    49.5424
Restore ‘0’ Power (µW)           9.8345     23.6606    12.863     18.8156    16.1862
Restore ‘1’ Power (µW)           19.9928    31.283     21.9135    26.7557    24.674
PDP (Store '0') (×10^-15)        47.069     46.123     1.53226    1.3828     0.7848
PDP (Store '1') (×10^-15)        55.9357    52.6618    30.115     30.12      2.1407
PDP (Restore '0') (×10^-15)      0.2122     0.36556    0.2714     0.34809    0.3642
PDP (Restore '1') (×10^-15)      0.39306    0.71388    0.4801     0.6338     0.6368
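The PDP entries follow from the product of the average power and the corresponding delay of Table 17. The sketch below checks this for the 32nm, 0.9V column; the unit conversion assumes that the PDP is expressed in units of 10^-15 J, and the names are illustrative.

```python
# 32nm / 0.9V column: (delay in ps from Table 17, power in uW and PDP in 1e-15 J from Table 18)
ROWS = {
    "store 0": (149.5, 314.843, 47.069),
    "store 1": (173.7, 322.025, 55.9357),
    "restore 0": (21.58, 9.8345, 0.2122),
    "restore 1": (19.66, 19.9928, 0.39306),
}


def pdp_femtojoules(delay_ps, power_uw):
    """Power-delay product; ps * uW = 1e-18 J, so divide by 1000 to get 1e-15 J."""
    return delay_ps * power_uw / 1000.0


for name, (delay_ps, power_uw, reported) in ROWS.items():
    computed = pdp_femtojoules(delay_ps, power_uw)
    print(f"{name}: computed {computed:.4f}, reported {reported}")
```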

3.4.1.5. Process Variation

The process variations of the 7T1P and 7T1M cells are evaluated next. The variation of each parameter is assumed to be Gaussian distributed with a variation of 3σ/µ in percentage where the mean is µ and the standard deviation is σ. The store and restore times are evaluated.
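The variation metric 3σ/µ can be illustrated with a small Monte Carlo sketch. The sample count, the seed and the function names are assumptions made for illustration; the evaluation in this work is carried out by circuit simulation, not by this code.

```python
import random
import statistics


def sigma_from_variation(mean, pct):
    """Standard deviation implied by a 3*sigma/mu variation given in percent."""
    return mean * pct / 100.0 / 3.0


def variation_percent(samples):
    """3*sigma/mu of a set of samples, in percent."""
    return 300.0 * statistics.pstdev(samples) / statistics.fmean(samples)


def sample_parameter(mean, pct, n, seed=0):
    """Draw n Gaussian samples of a parameter with the given 3*sigma/mu variation."""
    rng = random.Random(seed)
    sigma = sigma_from_variation(mean, pct)
    return [rng.gauss(mean, sigma) for _ in range(n)]


# Example: CF height L = 3 nm varied at a 3% (3*sigma/mu) level.
heights = sample_parameter(mean=3.0, pct=3.0, n=20000)
print(f"recovered variation: {variation_percent(heights):.2f}%")
```

The same `variation_percent` function could be applied to the simulated store or restore times to obtain percentage variations in the form reported for each cell.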

Variability of Non-Volatile Memory Element

The variations of the PMC and memristor dimensions are initially considered at a 3% value. The threshold height (hth) and radius (rth) must be considered in the PMC macromodel. The threshold height of the PMC (hth) is given at 95% of its largest CF height (L); the threshold radius (rth) is found when the ON state resistance of the PMC is equal to the OFF state resistance when the CF height is at the threshold height value (hth), so

$$r_{th} = \frac{\rho_{on}\, L\, A}{\pi R \left(\rho_{on} h_{th} + \rho_{off}\,(L - h_{th})\right)} \qquad (33)$$

A 32nm CMOS feature size and the default values of Table 4-4 are considered.


Table 19. Percentage variation (3σ/µ) of store and restore times of 7T1P and 7T1M cells when each parameter is varied.

Parameter                              Operation      Nonvolatile Cell
                                                      7T1P             7T1M
L (7T1P - Store),                      Store ‘0’      0.4749           0.1121
D (7T1M - Store) (3%)                  Store ‘1’      0.5206           1.1335
                                       Restore ‘0’    0.0352           0.17859
                                       Restore ‘1’    1.806            15.887
CF Radius (7T1P) (3%)                  Store ‘0’      1.24556          N/A
                                       Store ‘1’      0.892859         N/A
                                       Restore ‘0’    0.046715         N/A
                                       Restore ‘1’    11.26218         N/A
Store Voltage (5%)                     Store ‘0’      46.1372          0.239557
                                       Store ‘1’      13.475           3.60087
Restore Voltage (5%)                   Restore ‘0’    0.26394          21.0443
                                       Restore ‘1’    4.74122          43.788
Threshold Voltage (Vth) (3%)           Store ‘0’      0.002228         0
                                       Store ‘1’      0                0
                                       Restore ‘0’    4.5289×10^-3     89.096×10^-15
                                       Restore ‘1’    55.579×10^-15    111.24×10^-15

Table 19 shows the percentage variations of the store and restore times of the 7T1P and 7T1M cells when each parameter is varied. Consider first the store time; when L is varied, the percentage variation of the store time is less than when the CF radius is varied. However, the percentage variation for the 7T1P cell is similar to the percentage variation when the dimension of the memristor [52] is varied by the same level (i.e. 3%) in a 7T1M cell.

For the restore operation, the time of the restore '0' operation is barely changed when the PMC dimension varies. The OFF state resistance of a PMC is very large, so a change of this parameter will not significantly affect the restore '0' time. For the restore '1' operation, its resistance is low, so a resistance change will significantly affect the restore times of both the 7T1P and 7T1M cells. Moreover, when the dimension of the memristor is changed, the variation in restore '0' time of a 7T1M cell is larger than for a 7T1P cell (when the dimension of the PMC is changed). The OFF-state resistance of a memristor is significantly less than for a PMC, hence resulting in a high variability in restore time.

Variability of Store and Restore Voltages

Next, the variations of the store and restore voltages for the 7T1P and 7T1M cells are considered at a 5% value; the results are shown in Table 19. The variation of the store voltage strongly affects the store time of a 7T1P cell, because the rate of change of the PMC resistance is highly dependent on the voltage difference across it, i.e. the store time is significantly changed by varying the store voltage. As for the restore voltage, its variation affects a 7T1M cell more than a 7T1P cell, because the resistance of the memristor is smaller than that of the PMC.

Variability of the Threshold Voltage of MOSFET

The percentage variation of the MOSFET threshold voltage [70] at 32nm CMOS feature size is 3%. The store and restore times of the 7T1P and 7T1M cells show a small variability with respect to the threshold voltage, because the gate voltages of the 7T1P and 7T1M cells during the store and restore operations are very large.

3.4.2. Crossbar Memory

The crossbar memory is a memory array made of horizontal and vertical conducting wires; a switch is placed at each wire crossing. A switch of a crossbar array has two distinct states, the ON and the OFF states, corresponding to the low and the high resistance values respectively [71]. Each switch acts as a memory element and is programmed by a sufficiently high voltage pulse on the corresponding word and bit lines; its state is read by sensing either the corresponding voltage, or current [72].

Molecular switches have been proposed as memory elements of a crossbar array [71]. [72] has shown that by substituting molecular switches with CNTFETs, a crossbar array shows improvements in both the sense voltage on/off ratio and the noise margin compared with a molecular-based implementation. A PMC is investigated next as a switching element in a crossbar; it is assessed in this section due to its ease of fabrication and large on/off resistance ratio. Two read-out schemes are considered next for such an implementation.

3.4.2.1. Read-Out Schemes

The first read-out scheme is shown in Figure 67; the supply voltage is biased to the selected switch and the current that flows from the selected switching element to ground is then measured [71]. A resistor (with a value given by Rsense) is connected to the selected column; the current of a column is detected, while the voltage drop across the resistor (Vsense) is compared with a reference voltage (Vref) to determine the state of the selected switch. All other switches on the unselected bit and word lines are connected to ground.

Fig. 67. Read-out scheme I

As analyzed in [71], the sense voltage of a crossbar memory is affected only by the resistances connected to the same bitline as the selected switch. A larger array size results in a smaller absolute value of the sense voltage. Moreover, when there is a large number of ON switches in the array, the sense voltage is also smaller. Therefore, switches with a high ROFF/RON ratio are best for larger memory arrays [72]. For an n x n array, the worst case scenario occurs when the state of Rn (Figure 67) is read and all other switches on the same column are ON. Let the sense voltage on/off ratio be defined as the sense voltage ratio between the ON and OFF states for the selected switch while all other switches in the array are ON. The read-out margin of the crossbar can then be found using the sense voltage on/off ratio.
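A first-order picture of this read-out scheme is a resistive divider. The sketch below is an idealization and not the simulated circuit: it treats all switches as ideal resistors, neglects wire resistance and device nonlinearity, and uses hypothetical resistance values, so it is not expected to reproduce simulated sense voltages. The function names are illustrative.

```python
def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def sense_voltage(v_supply, r_selected, r_others, r_sense):
    """Ideal divider for read-out scheme I.

    r_others lists the unselected switches on the selected column; their far
    ends sit on grounded lines, so they appear in parallel with r_sense.
    """
    r_load = parallel(r_sense, *r_others)
    return v_supply * r_load / (r_selected + r_load)


# Hypothetical values for illustration: 100 rows, RON = 100 kOhm, ROFF = 1 GOhm.
N, R_ON, R_OFF, R_SENSE, V = 100, 1e5, 1e9, 1e5, 0.9

v_on_worst = sense_voltage(V, R_ON, [R_ON] * (N - 1), R_SENSE)
v_off_worst = sense_voltage(V, R_OFF, [R_ON] * (N - 1), R_SENSE)
print(f"worst-case sense voltage on/off ratio: {v_on_worst / v_off_worst:.1f}")
```

In this idealized model, ON switches on the selected column shunt the sense resistor and lower the sense voltage, which is consistent with the qualitative behavior described above.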

The second read-out scheme is shown in Figure 68. All unselected bitlines and/or the wordlines [72] are biased to the voltage source V', while an ideal current measurement is assumed for the selected column, i.e. by measuring the current on a zero-resistance meter between the crossbar and ground instead of the voltage drop across the sense resistor.

Fig. 68. Read-out scheme II

For this read-out scheme, the relative noise margin of the sense current is considered as figure of merit [71]. The ON state current (ION) is the sense current when the selected switch is in the ON state and the other switches are in the OFF state. Equivalently, the OFF state current (IOFF) is the sense current when all switches are in the OFF state. Therefore, the noise margin of the sense current is given by

$$\text{Noise Margin} = \frac{I_{ON} - I_{OFF}}{2\,(I_{ON} + I_{OFF})} \qquad (34)$$
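As a short worked step (added here for illustration), dividing the numerator and denominator of the noise margin expression by ION shows that the margin depends only on the current ratio IOFF/ION and is bounded by 1/2:

```latex
\text{Noise Margin}
= \frac{I_{ON} - I_{OFF}}{2\,(I_{ON} + I_{OFF})}
= \frac{1}{2}\cdot\frac{1 - I_{OFF}/I_{ON}}{1 + I_{OFF}/I_{ON}}
\;\longrightarrow\; \frac{1}{2}
\quad \text{as} \quad \frac{I_{OFF}}{I_{ON}} \to 0
```

So, once the ON current is several orders of magnitude larger than the OFF current, the relative noise margin is close to 0.5 and is only weakly sensitive to further changes in the current ratio.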

3.4.2.2. Simulation Results

In this section, a 100x100 PMC-based crossbar is initially evaluated and compared with a MOSFET-based crossbar array [72]. The CF height (L) and the CF radius (R) of the PMC are 3nm and 15nm respectively (the values of hth and rth are 2.975nm and 0.5226481nm respectively). The ON state resistance of the PMC (RON) is now given by 169.697kΩ, while the OFF state resistance of the PMC (ROFF) is 564.2424MegΩ. In this paper, a 32nm NMOS transistor is employed as a switch and its ON and OFF states are determined by the gate voltage (at VDD and GND respectively). At this feature size, it is also assumed that the metal interconnect of the crossbar is about 32nm in width, with a unit resistance of 16.526Ω/µm and a capacitance of 276.214aF/µm [72].

For the first read-out scheme, a comparison is pursued between crossbar memories using PMCs, MOSFETs and memristors as switches. The sense resistance (Rsense) of the crossbar is set at 100kΩ.

Table 20. Sense voltage (V) of crossbar memories

Scenarios                                  MOSFET          PMC             Memristor
All switches OFF                           504.1×10^-6     0.156×10^-3     7.20×10^-3
All switches ON                            2.204×10^-3     8.58×10^-3      4.5623×10^-3
One switch ON, remaining switches OFF      0.3551807       0.329           0.3639
One switch OFF, remaining switches ON      5.419×10^-6     10.3×10^-6      4.3948×10^-3
Sense voltage on/off ratio                 406.8422        827.29          1.0381
(worst case scenario)


The highest value of the sense voltage occurs when the selected switch is in the ON state, while the remaining switches are in the OFF state; when the selected switch is in the ON state, its resistance is low, so the voltage drop across it is low and therefore the sense voltage of the crossbar is large. When the remaining switches are in the OFF state, no leakage is encountered; therefore, the sense voltage of the crossbar is very large. However, if the remaining switches in the crossbar are in the ON state, a significant leakage to GND exists and the sense voltage is very low. Table 20 shows also the worst case scenario of the sense voltage on/off ratio.
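The worst-case ratio can be recomputed from the tabulated sense voltages. The sketch below reads the ratio as the "All switches ON" entry divided by the "One switch OFF, remaining switches ON" entry, following the definition given for read-out scheme I; this reading is an interpretation of Table 20, and small differences from the reported ratios are expected from rounding of the tabulated voltages.

```python
# Sense voltages (V) from Table 20:
# (all switches ON, one switch OFF with the remaining switches ON, reported ratio)
TABLE_20 = {
    "MOSFET": (2.204e-3, 5.419e-6, 406.8422),
    "PMC": (8.58e-3, 10.3e-6, 827.29),
    "Memristor": (4.5623e-3, 4.3948e-3, 1.0381),
}


def worst_case_ratio(v_selected_on, v_selected_off):
    """Sense voltage on/off ratio with all unselected switches ON."""
    return v_selected_on / v_selected_off


for device, (v_on, v_off, reported) in TABLE_20.items():
    print(f"{device}: computed {worst_case_ratio(v_on, v_off):.4f}, reported {reported}")
```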

Consider the two scenarios when “all switches are ON” and “one switch is OFF and the remaining switches are ON”. The values of the sense voltage on/off ratio for these scenarios are very different. The sense voltage on/off ratio for a PMC is significantly larger than for a MOSFET, while the sense voltage on/off ratio of a memristor is very small (due to the smaller resistance range). Hence, the memristor is not well suited to operate as a switch in a crossbar memory. A PMC has a very large resistance range and its sense voltage on/off ratio is very large too and significantly better than a MOSFET based crossbar, thus showing the best performance among the devices considered as switches in this manuscript. Hence, a memristor-based crossbar memory will not be further considered in the evaluation.

Fig. 69. Ratio of sense voltage and supply voltage versus sense resistance of the crossbar


Figure 69 shows the sense voltage of the PMC and MOSFET crossbars of 100x100 size, when the sense resistance is varied under the worst case scenario; the read-out margin of a PMC-based crossbar is larger than the MOSFET-based crossbar. Moreover, the sense resistance must be increased for a larger read-out margin.

Next, consider the second read-out scheme by biasing V' to all unselected bitlines and/or the wordlines [72]; in this paper, the voltage V' is set to 0V and an ideal current measurement is utilized (i.e. at a zero resistance level for the selected bitline).

Table 21. ON/OFF current ratio of MOSFET and PMC-based crossbar memories

                   MOSFET                        PMC
                   20x20          100x100        20x20          100x100
ON Current (A)     3.2034×10^-5   3.1899×10^-5   5.3029×10^-6   5.3002×10^-6
OFF Current (A)    5.2037×10^-9   5.2032×10^-9   1.595×10^-9    1.595×10^-9
ON/OFF ratio       6155.95        6130.57        3324.49        3323.01

Table 21 presents the on/off current ratio of the MOSFET and PMC based crossbar memories; when the size of the crossbar memory is increased, the on/off current ratios of both the PMC and MOSFET-based crossbars decrease. Moreover, the on/off current ratio of a MOSFET-based crossbar is larger than for the PMC-based crossbar, because a MOSFET is an active device (the PMC is a passive device), i.e. its ON current is larger than the ON current of a PMC.

Furthermore, the simulation results in Figure 70 show that when the relative noise margins of the PMC and MOSFET based crossbars are considered, an increase in array dimension has no implication as a nearly constant value is attained.
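The on/off current ratios of Table 21 and the corresponding noise margins can be recomputed with a short sketch. Applying the noise margin expression (34) to the Table 21 currents assumes that these currents correspond to ION and IOFF as defined for the second read-out scheme; the text does not state this explicitly, and the values of Figure 70 are not reproduced here.

```python
# (ON current, OFF current) in amperes, copied from Table 21
TABLE_21 = {
    ("MOSFET", "20x20"): (3.2034e-5, 5.2037e-9),
    ("MOSFET", "100x100"): (3.1899e-5, 5.2032e-9),
    ("PMC", "20x20"): (5.3029e-6, 1.595e-9),
    ("PMC", "100x100"): (5.3002e-6, 1.595e-9),
}


def on_off_ratio(i_on, i_off):
    """Ratio of the ON-state to the OFF-state sense current."""
    return i_on / i_off


def noise_margin(i_on, i_off):
    """Relative noise margin of the sense current, as in (34)."""
    return (i_on - i_off) / (2.0 * (i_on + i_off))


for (device, size), (i_on, i_off) in TABLE_21.items():
    print(f"{device} {size}: ratio {on_off_ratio(i_on, i_off):.2f}, "
          f"noise margin {noise_margin(i_on, i_off):.5f}")
```

Under this assumption, the computed noise margins stay just below 0.5 for both array sizes, in line with the nearly constant behavior reported for Figure 70.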

Fig. 70. Relative noise margin versus crossbar dimension

3.4.3. PMC-Based NVSRAM with Concurrent SEU Detection and Correction

As shown in Figure 60, the commonly used NVSRAM scheme is utilized; it consists of two parts: a volatile (6T) SRAM core and a RRAM circuitry (consisting of a 1T and a 1X, where X denotes the type of resistive element). In the 7T1R cell of [69] (X=R), non-volatile data is kept as resistance in an oxide-based resistive element.

It has been shown [69] that in a NVSRAM cell, the non-volatile storage node has a very large charge, so it is extremely tolerant to a SEU; this implies also that a NVSRAM cell has an inherent redundancy in stored data and the data stored in the non-volatile (resistive) element still holds correct data if the SRAM cell is affected by a SEU. As corruption of the data stored in the non-volatile element of a cell due to a SEU is highly unlikely (if not impossible), the data stored in the RRAM is a reliable duplicate of the one stored in the SRAM core; moreover, the resistive element in a NVSRAM is usually placed on a different plane in the chip layout, thus ensuring that multiple upsets are highly unlikely to affect both versions of the same data and preserving data independence in the storage functions. A detailed assessment of the critical charge at the volatile and the non-volatile storage nodes has been pursued in [69]; in all cells, it has been shown that the non-volatile storage node has a charge an order of magnitude larger than the critical charge, thus making the data stored in the RRAM very reliable. The utilization of this feature however requires modifying existing operations (such as read and write) as well as introducing new ones (such as restore).

Fig. 71. Proposed non-volatile SRAM (7T1P) cell

Figure 71 presents the proposed nonvolatile 7T1P cell; it consists of a 6T SRAM core and a Resistive RAM (RRAM). The RRAM is a 1T1P circuit, i.e. it uses one transistor (M7) and a PMC element, X=P. The proposed 7T1P cell has the following operations (different from the only instant-on behavior of the cell of [69]):

• Write (Store): the data is written to both the SRAM core and the PMC.

• Read: the data is read from the SRAM core, provided that no SEU has occurred.

• Restore: if an SEU occurs, it is detected by Concurrent Error Detection (CED) and this operation is invoked, such that the data stored in the PMC is transferred to the SRAM core for correction.

• Instant-On: the data stored in the SRAM core is volatile, i.e. it is lost when there is no power supply. Once the supply voltage is made available again, the instant-on operation is started and the data stored in the PMC is transferred to the SRAM core.

In the 7T1P cell, non-volatile data is kept in the form of PMC resistance. The proposed cell utilizes the same circuitry as the NVSRAM of [69] but its operations are different. The NVSRAM cell of this paper utilizes the resistive element as a reliable back-up, such that an SEU affecting the SRAM cell can be detected using a novel CED circuit and correction is implemented by the restore operation. The instant-on operation is still possible; it is invoked after a loss of power, when the data stored in the resistive element is transferred to the SRAM core once power is available again. Moreover, for the occurrence of an SEU, the node of critical charge is considered [73, 74, 75]; as shown in a later section (and consistent with other works on NVSRAMs [69]), this node is DN. Once an SEU occurs, it results in a state change at DN, thus also causing D to change accordingly (due to the cross-coupled inverter scheme of the SRAM core).

Next, the three operations of store, restore, and instant-on are presented in more detail.

3.4.3.1. Store Operation

During the store (write) operation, data is written in both the PMC and the SRAM core.

Write '0' Operation

To write ‘0’ as data, the voltage at D must be at GND, while the PMC resistance must be ROFF (high resistance). The voltages at BL and BLB are at GND and VDD respectively. The memory cell is selected by setting the voltage at WL to VDD. The rate of change of the PMC resistance is related to the voltage difference across it [76]; transistor M7 is turned ON by increasing the supply voltage and the voltage at Ctrl1 to Vdh during the write operation, so the PMC is written with the data corresponding to the voltages at D and Ctrl2. As the voltage at node D is at GND and the voltage at Ctrl2 is at Vdh, a negative voltage is dropped across the PMC and its resistance is set to the OFF state (high resistance).

Write '1' Operation

In this case, the voltage at node D must be at VDD while the PMC resistance must be placed in the ON state (low resistance). So, the voltage at WL must be at VDD for selecting the memory cell, while the voltages at BL and BLB are at VDD and GND respectively. The voltage at D is increased by increasing the supply voltage of the 7T1P cell to Vdh. M7 must be ON to generate the voltage difference across the PMC. So, the voltage at Ctrl1 is Vdh, while the voltage at Ctrl2 is at GND. A voltage difference across the PMC exists and the write ‘1’ operation is executed.

The write operation of the 7T1P requires one clock cycle; this is better than [76] in which two clock cycles are needed. The write operation in the proposed cell stores data at the same time in both the RRAM and the core.
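The biasing of the two store cases can be summarized in a small behavioural sketch. It is not a circuit simulation: the sign convention (the PMC voltage taken as V(D) minus V(Ctrl2), with a positive value setting the ON state), the nominal VDD value, and the function names are assumptions made for illustration; only the node levels follow the description above, and Vdh takes the 3.5V value used in the simulation section.

```python
# Behavioural sketch of the store (write) biasing, not a circuit simulation.
GND = 0.0
VDD = 0.9   # assumed nominal supply (V); illustrative only
VDH = 3.5   # boosted write voltage Vdh (V), as used in the simulations


def store_bias(bit):
    """Return the node voltages applied while storing `bit` (0 or 1)."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    return {
        "WL": VDD,                     # cell selected
        "BL": VDD if bit else GND,
        "BLB": GND if bit else VDD,
        "Ctrl1": VDH,                  # turns M7 ON
        "D": VDH if bit else GND,      # supply boosted to Vdh for a '1'
        "Ctrl2": GND if bit else VDH,  # opposite of the data
    }


def pmc_state_after_store(bit):
    """Assumed convention: V(D) - V(Ctrl2) > 0 sets the PMC to the ON state."""
    bias = store_bias(bit)
    v_pmc = bias["D"] - bias["Ctrl2"]
    return "ON" if v_pmc > 0 else "OFF"


for bit in (0, 1):
    print(bit, store_bias(bit), pmc_state_after_store(bit))
```

Because both the SRAM node D and the PMC are driven in the same step, the sketch also reflects why a single cycle suffices for the store.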

3.4.3.2. Restore Operation

The restore operation transfers (copies) the data stored in the PMC to the SRAM cell, i.e. at node D. The data stored in the PMC is read by setting the voltages at Ctrl1 and Ctrl2 to GND and VDD respectively. If a '0' ('1') is stored in the PMC, the voltage at DP is at GND (VDD). For the restore operation, VWL is set at VDD, while the voltages at BL and BLB are varied depending on the stored data, i.e. for a '0' ('1') in the PMC, the voltages at BL and BLB are given by GND and VDD (VDD and GND) respectively and the voltage at D is at GND (VDD).

3.4.3.3. Instant-On Operation

The proposed 7T1P cell can still operate in an instant-on mode as presented in [76]; when the supply voltage is lost, the voltage at D is also lost due to the volatile nature of the SRAM core. However, the non-volatile element retains the stored data. The instant-on operation is employed to bring back the value stored in the resistive element to the SRAM core. The instant-on operation is started by setting the voltages at Ctrl1 and Ctrl2 (Vcont) to values larger than VDD, so M7 is turned ON, while the voltage at D varies depending on the value of the PMC resistance. Due to the high value of the PMC resistance in state '0', an uncertainty may exist due to a discharging node between D and DN. However, the small ON-state resistances of M1 and M7 (compared to the high resistance of the PMC) result in a low voltage at node D. This finally turns OFF M3, thus preventing the discharge of node DN; so, the voltage at D is at 0V. If a '1' is stored in the PMC (with a low value of resistance), the voltage at D is at VDD while the voltage at DN is discharged through M3. Therefore, the data in the PMC is correctly restored to D.

The proposed design is hybrid in nature, because it has a further novelty in the circuit, namely the use of ambipolar transistors in the XOR gates for the dual-rail checker. The dual-rail checker is used for concurrent error detection (CED) and is shared among the cells of the 7T1P memory array. A selector is used to control the connection between the 7T1P cells and the CED circuit. Figure 72 presents the connection between the 7T1P array and the CED circuit.

Fig. 72. Connection between 7T1P Array and CED circuit

3.4.3.4. Proposed XOR Gate

A CMOS XOR gate requires at least 8 transistors; for a two-input CMOS XOR gate, two more inverters are needed to generate the reverse logic. Therefore, the total number of transistors is increased to 12. Ambipolar transistors are employed in this paper to reduce the number of transistors in an XOR gate by exploiting their ability to behave as either NMOS or PMOS. The reduction in the number of transistors also improves the power dissipation [77].

Figure 73 presents the proposed XOR gate using ambipolar transistors and inverters. The two input signals are given by IN1 and IN2, while the output of the XOR gate is Out. The following cases are possible in the operation of the XOR gate.

Fig. 73. Proposed XOR gate using ambipolar transistors

Both IN1 and IN2 are ‘0’

In this case, node IN2 is connected to the polarity gate of the ambipolar transistors; when IN2 is set to GND, both ambipolar transistors behave as NMOS. So, the ambipolar transistors operate based on the voltage at IN1. IN1 is at GND, so the ambipolar transistors AMB1 and AMB2 are ON and OFF respectively. The voltage at O1 is given by the difference between the supply voltage and the threshold voltage drop across AMB1 (VDD–Vth). Therefore, the output voltage (VOut) is at GND.

IN1 and IN2 are ‘0’ and ‘1’ respectively

In this case, both ambipolar transistors behave as PMOS; AMB1 is OFF, while AMB2 is ON. The voltage at O1 of the proposed XOR gate is given by the threshold voltage of the ambipolar transistor (Vth), so the output voltage is given by VDD.

IN1 is ‘1’ and IN2 is ‘0’

In this case, both ambipolar transistors behave as NMOS; AMB1 is OFF, while AMB2 is ON. The voltage at O1 is at GND, so the voltage at Out is at VDD.

IN1 and IN2 are ‘1’

Both ambipolar transistors behave as PMOS. AMB1 is ON, while AMB2 is OFF. The voltage at O1 is at VDD and its output voltage is at GND.

Hence, the circuit of Figure 73 correctly operates as an XOR gate.
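The four cases can be cross-checked with a short truth-table sketch. The table entries are transcribed from the cases above; the assumption that Out is simply the logical inverse of the level at O1 is inferred from those cases (a high O1 gives Out at GND, a low O1 gives Out at VDD) and is not a statement about the exact circuit of Figure 73.

```python
# (IN1, IN2) -> (transistor behaviour, AMB1, AMB2, logic level at O1)
CASES = {
    (0, 0): ("NMOS", "ON", "OFF", "high"),   # O1 = VDD - Vth
    (0, 1): ("PMOS", "OFF", "ON", "low"),    # O1 = Vth
    (1, 0): ("NMOS", "OFF", "ON", "low"),    # O1 = GND
    (1, 1): ("PMOS", "ON", "OFF", "high"),   # O1 = VDD
}


def xor_out(in1, in2):
    """Logic value at Out, assuming Out is the inverse of the level at O1."""
    _behaviour, _amb1, _amb2, o1 = CASES[(in1, in2)]
    return 0 if o1 == "high" else 1


for (in1, in2), (behaviour, amb1, amb2, o1) in CASES.items():
    print(in1, in2, behaviour, amb1, amb2, o1, xor_out(in1, in2))
```

In every case exactly one of AMB1 and AMB2 conducts, and the polarity (NMOS or PMOS behaviour) depends only on IN2.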

3.4.3.5. Dual-rail checker

The proposed XOR gates are connected in parallel (Figure 74) in a dual-rail checker circuit [78][79]. In the absence of an SEU, the copies of the data stored in the SRAM at D and in the RRAM at DP are the same. Two comparisons, one between D and DP and one between DN and DP, are executed to establish the CED feature. The dual-rail checker is connected to the proposed NVSRAM cell (Figure 71) via a selector circuit; for CED, M7 is turned OFF while the voltage at Ctrl2 is at VDD, and the voltages at the three nodes are provided as inputs to the two XOR gates.

Fig. 74. Dual-rail checker for CED

Fig. 75. Ambipolar-based dual-rail checker

Table 22. Voltages at nodes D, DN, and DP of the proposed 7T1P cell and output voltages of the dual-rail checker

Input Voltage (V)       Output Voltage (V)    Status
VD     VDN    VDP       VER1    VER2
0      VDD    0         0       VDD           No SEU
0      VDD    VDD       VDD     0             SEU
VDD    0      0         VDD     0             SEU
VDD    0      VDD       0       VDD           No SEU

Table 22 shows the input and output voltages for the dual-rail checker. Every store operation writes to both the RRAM and the SRAM; so, the SRAM core and the RRAM are monitored by the dual-rail checker. As also applicable to hardened memory cells found in the technical literature [80, 81], the condition of logic inversion always applies to VDN and VD. Two cases are applicable.

• If either an SEU does not cause a logic inversion in the SRAM or there is no SEU, then VDP = VD.

• If an SEU causes a logic inversion in the SRAM, then VDP = VDN.

The outputs of the dual-rail checker also ensure that a single fault occurring in one of the XOR gates will be detected as generating an invalid code at the output, i.e. this circuit is self-checking too. The restore operation therefore is required when VER1 = VDD and VER2 = 0. As described previously, the restore operation permits the data stored in the PMC to be written back in the SRAM core, thus correcting the SEU. Figure 75 presents the ambipolar-based dual-rail checker that utilizes the proposed XOR gates. Node DP is inserted at node IN1 of both the proposed XOR gates while nodes D and DN are connected to node IN2 of XOR1 and XOR2 respectively.
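The detection and correction flow can be illustrated with a logic-level sketch. It assumes ER1 = D XOR DP and ER2 = DN XOR DP, which is consistent with Table 22; the class and function names are hypothetical, and an SEU is modelled simply as an inversion of the SRAM node pair.

```python
def dual_rail_check(d, dn, dp):
    """Return (ER1, ER2), assuming ER1 = D xor DP and ER2 = DN xor DP."""
    return d ^ dp, dn ^ dp


def classify(er1, er2):
    if (er1, er2) == (0, 1):
        return "no SEU"
    if (er1, er2) == (1, 0):
        return "SEU"
    return "invalid code"          # e.g. a fault inside one of the XOR gates


class Cell7T1P:
    """Logic-level stand-in for the cell: SRAM node D plus the PMC copy DP."""

    def __init__(self):
        self.d = 0
        self.dp = 0

    @property
    def dn(self):
        return self.d ^ 1          # cross-coupled inverters keep DN = not D

    def store(self, bit):
        self.d = bit               # SRAM core
        self.dp = bit              # PMC, written in the same operation

    def upset(self):
        self.d ^= 1                # SEU flips the SRAM state (D and DN)

    def read(self):
        status = classify(*dual_rail_check(self.d, self.dn, self.dp))
        if status == "SEU":
            self.d = self.dp       # restore from the PMC
        return self.d, status


cell = Cell7T1P()
cell.store(1)
cell.upset()
print(cell.read())
print(cell.read())
```

The first read detects the upset and restores the value from the PMC copy; the second read then reports the fault-free code.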

3.4.3.6. Simulation Results

In this section, the proposed NVSRAM cell is evaluated by simulation. HSPICE is utilized as the simulation tool, while the model in section 3.3 [76] is employed for simulating the PMC; the resistance range of the PMC is given by 30kΩ – 100MegΩ [76]. The largest values for the CF height (L) and CF radius (R) of the PMC are given by 1.5nm and 25.2nm respectively, while the threshold CF height (hth) and the radius (rth) of the PMC [76] are selected at the values of 1.45nm and 0.225nm respectively. Therefore, the OFF state resistance of the PMC is given by 99.958MegΩ, while the ON state resistance of the PMC is given by 30.063kΩ. The macroscopic model of Figure 76 is utilized for an ambipolar transistor; the transistor sizes are adjusted to generate symmetric conduction between the PMOS and NMOS behaviors at 32nm CMOS feature size.

Fig. 76. Model of ambipolar transistor

Ambipolar-based XOR gate

In Figure 73, two inverters and two ambipolar transistors are needed in the proposed XOR gate. Figure 77 shows the input and output voltages of an inverter at 32nm CMOS feature size; the delay is 18.37ps for the '1' to '0' transition and 17.41ps for the '0' to '1' transition. Figure 78 shows the input and output voltages of the proposed XOR gate.

Table 23. Delay, power dissipation, and Power Delay Product (PDP) of the proposed XOR gate

IN1   IN2   OUT   Delay (ps)   Power Dissipation (µW)   PDP (*10^-15 J)
0     0     0     324.2        5.1496                   1.6695
0     1     1     338.3        14.1325                  4.781
1     0     1     7.9          105.353                  0.8323
1     1     0     36.5         27.4792                  1.003

Fig. 77. Input and output voltages of inverter

Fig. 78. Input and output voltages of the proposed XOR gate

Table 23 shows the delay, power dissipation and PDP for the proposed XOR gate under the four input combinations (bold entries identify the worst cases). The proposed XOR gate encounters a larger delay when the voltage at IN1 is at GND due to the threshold voltage drop across the ambipolar transistor. The worst cases for the power dissipation and the power delay product (PDP) of the proposed XOR gate occur when exactly one of the inputs is at '1'.
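As a consistency check on Table 23, the PDP column can be recomputed as delay times power (ps × µW = 10^-18 J) and the worst cases located programmatically. The sketch below only re-uses the tabulated values; the variable and function names are illustrative.

```python
# Rows of Table 23: (IN1, IN2, delay in ps, power in uW, tabulated PDP in 1e-15 J)
ROWS = [
    (0, 0, 324.2, 5.1496, 1.6695),
    (0, 1, 338.3, 14.1325, 4.781),
    (1, 0, 7.9, 105.353, 0.8323),
    (1, 1, 36.5, 27.4792, 1.003),
]


def pdp_fj(delay_ps, power_uw):
    """Power delay product in units of 1e-15 J (ps * uW = 1e-18 J)."""
    return delay_ps * power_uw * 1e-3


for in1, in2, delay, power, pdp in ROWS:
    print(in1, in2, round(pdp_fj(delay, power), 4), pdp)

worst_delay = max(ROWS, key=lambda r: r[2])[:2]
worst_power = max(ROWS, key=lambda r: r[3])[:2]
worst_pdp = max(ROWS, key=lambda r: r[4])[:2]
print(worst_delay, worst_power, worst_pdp)
```

The recomputed products agree with the tabulated PDP values to within rounding.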

NVSRAM Write Operation

To write to the 7T1P cell, data must be written to D and the PMC. As mentioned previously, the supply voltage and the voltage at Ctrl1 must be increased to Vdh, while the voltage at Ctrl2 must have the opposite value of the data to be stored. Vdh is related to the voltage difference across the PMC; in this paper, Vdh is 3.5V.

Fig. 79. Voltages at D and DN, and PMC resistance value of 7T1P cell when '0' and '1' are written to the proposed memory cell

Figure 79 shows the voltages at D and DN as well as the PMC resistance when '0' and '1' are written into the memory cell. The simulation is divided into five parts marked as follows: N/A, Write '0', N/A, Write '1', N/A. For N/A, the voltages of WL, Ctrl1 and Ctrl2 are at GND and no read or write operation is executed. However, when data is written into the memory cell, the voltages at D and DN are increased and the PMC resistance changes as follows.

Table 24. Delay, power dissipation, and Power Delay Product (PDP) of proposed 7T1P cell for write '0' and '1' operations (when the PMC resistance is 100MegΩ and 70kΩ respectively)

Write Operation           '0'         '1'
Delay (ps)                0.023       3.827
Power dissipation (µW)    871.2       795.271
PDP (*10^-15 J)           0.020038    3.0443

• The PMC resistance has the highest value for '0'; after the write '0' operation, the voltage at D decreases to GND.

• The PMC resistance is switched to the lowest value for '1' and the voltage at D after the write '1' operation is at VDD.

Table 24 shows the delay, power dissipation and power delay product (PDP) of the proposed 7T1P cell for both cases of the store (write) operation. The write '0' operation is faster than the write '1' operation; during the write '1' operation, the PMC resistance is reduced and the voltage difference across the PMC also decreases, thus the switching time of the PMC is slower.

The power dissipation (PDP) of the write '0' operation is higher (lower) than the write '1' operation for the same reasons. The write operation of DICE takes 5.011 ps at 32nm feature size; despite the presence of the RRAM, the proposed cell has better performance than DICE because the SRAM core in the NVSRAM utilizes the 6T configuration rather than the feedback arrangement of [74].

NVSRAM Read Operation

In the proposed scheme the read operation requires reading both the SRAM core and the RRAM.

- SRAM Read

The bitline voltages (BL and BLB) are precharged to VDD prior to the read operation; the word line voltage (VWL) of the selected memory cell is then set to VDD, such that the voltage stored in the SRAM cell is made available at both BL and BLB.

Table 25. Delay, power dissipation, and Power Delay Product (PDP) for read operation of the SRAM core in the proposed cell

Read Operation            '0'         '1'
Delay (ps)                7.81        8.61
Power dissipation (µW)    9.68425     9.38908
PDP (*10^-15 J)           0.075634    0.08084

Table 25 shows the delay, the power dissipation, and the power delay product (PDP) of this read operation; the read ‘0’ operation has the smaller delay, while the read ‘1’ operation has the lower power dissipation (but the higher PDP). For comparison purposes, the read operation for DICE takes 10.041ps at 32nm, again longer than in the proposed scheme.

- RRAM Read

Fig. 80. Plot of voltage at DP versus read time of 7T1P cell when data stored in the PMC is read

For reading the RRAM, the PMC resistance is monitored as the voltage at node DP. The data stored in the PMC is found by having the voltage of node Ctrl1 at GND (to turn OFF transistor M7 and separate D and DP); also the voltage of Ctrl2 must be at VDD. Figure 80 plots the voltage at node DP when the data stored in the PMC is read. If a '0' is stored in the RRAM (i.e. the PMC resistance is very large), the voltage at DP is very small; if a '1' is stored in the RRAM (so the PMC resistance is very small), when the voltage at Ctrl2 is at VDD, the voltage at DP increases up to VDD.

Measuring the voltage at DP during the read operation gives a delay of 1.923ps, which is smaller than the delay for reading the SRAM core.

Dual-rail Checker

Next, the performance of the dual-rail checker is established. By using the proposed XOR gate (Figure 73) and connecting the voltage at DP to both the polarity gates of the ambipolar transistors, the results of Table 26 are found for delay, power dissipation and power delay product (PDP).

Table 26. Voltages at D, DN, and DP of 7T1P cell and output voltage, delay time, power dissipation, and PDP of dual-rail checker

Input (V)             Output (V)
VD     VDN    VDP     VER1    VER2    Delay (ps)   Power Dissipation (µW)   PDP (*10^-15 J)
0      VDD    0       0       VDD     365.4        5.43125                  1.98458
0      VDD    VDD     VDD     0       321.6        4.03484                  1.2976
VDD    0      0       VDD     0       364.5        4.19548                  1.52925
VDD    0      VDD     0       VDD     301.4        5.77171                  1.73959

The worst case for the delay and the PDP occurs when a ‘0’ is stored at both D and DP (this corresponds to one of the fault free cases); the other fault free case (i.e. for ‘1’ at both D and DP) accounts for the worst power dissipation.

For comparison purposes, consider a CMOS implementation of a dual-rail checker. The CMOS XOR gate of [9] is used in place of the proposed ambipolar-based XOR gate. This CMOS gate requires 8 transistors, so the total number of transistors in a CMOS-based implementation of a dual-rail checker is 18 (one inverter is used to find the inverse voltage of node DP). The delay, power dissipation and PDP of a CMOS-based dual-rail checker are shown in Table 27. This circuit is faster and has a better PDP; however, it incurs a larger power dissipation and requires a larger number of transistors compared with the proposed ambipolar-based implementation.

Table 27. Voltages at D, DN, and DP of a 7T1P cell and output voltage, delay time, power dissipation, and PDP of a dual-rail checker implemented in CMOS

Input (V)             Output (V)
VD     VDN    VDP     VER1    VER2    Power Dissipation (µW)   Delay (ps)   PDP (*10^-15 J)
0      VDD    0       0       VDD     58.46                    22.4873      1.31461
0      VDD    VDD     VDD     0       51.12                    15.6201      0.798497
VDD    0      0       VDD     0       57.92                    22.6245      1.31041
VDD    0      VDD     0       VDD     52.82                    15.3855      0.812661

Restore Operation

Data correction occurs when a SEU has affected the SRAM core and its occurrence is detected by the dual-rail checker. So following the detection for the two faulty cases (i.e. DN=DP), a restore operation takes place to copy back the value of the data stored in the RRAM to the SRAM core. The voltage at WL (VWL) is at VDD, while VBL and VBLB are selected depending on the value to be restored, i.e. the voltage at D is made the same as the voltage at DP.

Table 28 shows the delay, power dissipation and power delay product (PDP) of the 7T1P SRAM for both cases of restored data from the RRAM to the SRAM core. The worst values (bold entries) for the delay and PDP (power dissipation) are encountered when a ‘1’ (‘0’) is restored.

Table 28. Delay, power dissipation, and Power Delay Product (PDP) of proposed 7T1P cell for restore '0' and '1' operations (when the PMC resistance is 100MegΩ and 70kΩ respectively)

Restore Operation         Data '0'     Data '1'
Delay (ps)                18.90        22.56
Power dissipation (µW)    25.86157     21.32144
PDP (*10^-15 J)           0.4678357    0.4810118

It should be noted that, as commonly found in coding circuits [79], a dual-rail checker is used for the word output of a memory; in this arrangement, error detection and correction are invoked once a read operation is executed and the voltages at D, DN and DP are checked. Correcting an SEU takes more time with the proposed scheme than with hardening [74, 81] due to the delay in the CED circuitry.

CMOS Feature Size

In the previous sections, the NVSRAM cell has been simulated by using the (basic) CMOS Predictive Technology Model (PTM) at a feature size of 32nm. Next the high performance (HP) PTMs at feature sizes of 16, 22, and 32nm are utilized to assess the proposed NVSRAM cell.

Table 29. Delay (ps) of each operation of the proposed NVSRAM cell using high performance (HP) CMOS PTMs at different feature sizes

Delay of each           Feature Size (nm)              Feature Size (nm)
Operation (ps)          16        22        32         16        22        32
Supply Voltage (V)      0.7       0.8       0.9        0.9       0.9       0.9
Write ‘1’               2.843     0.907     0.791      2.858     0.896     0.791
SRAM Read ‘1’           6.237     7.823     8.767      4.978     6.872     8.767
SRAM Read ‘0’           5.809     7.318     8.231      4.653     6.438     8.231
Dual-Rail Checker       31.602    44.849    60.495     21.402    36.856    60.495
Restore ‘1’             10.89     15.28     20.04      7.97      13.62     20.04
Restore ‘0’             9.54      14.32     19.13      7.31      12.98     19.13

Table 29 shows the delay of each operation of the proposed NVSRAM cell; a reduction in the CMOS feature size causes an increase of the write time and the delay of the dual-rail checker, but a decrease in all other operations. At a lower CMOS feature size, performance is overall improved, but the reduction in supply voltage affects the write operation, i.e. the write time of the proposed NVSRAM cell at a lower CMOS feature size is higher. As the capacitance of CMOS at lower feature sizes (such as 16nm and 22nm) is less than the capacitance of CMOS at a larger feature size (32nm), when the data stored in the PMC must be read, the voltage at the node DP increases at a higher rate for a lower CMOS feature size. The read '0' operation requires a longer delay to allow the voltage at DP to have the correct value. Moreover, as DP is connected to the polarity gate of the ambipolar transistor and the threshold voltage of the ambipolar transistors is set to half of the value of the supply voltage, the voltage at DP for a read '0' operation slightly increases from GND to VDD, thus degrading the performance of the dual-rail checker when comparing the '0' data. So at a lower CMOS feature size, the increase of the voltage at DP affects the performance of the dual-rail checker, i.e. the dual-rail checker at a lower CMOS feature size is slower.

Single Event Upset (SEU) Tolerance

A Single Event Upset (SEU) in an SRAM cell occurs when a charged particle strikes the most sensitive node and flips the state of the SRAM cell, causing a change in stored data. The sensitivity of an SRAM to radiation is quantified by the critical charge parameter, Qcrit, defined as the least amount of charge required to change the state of the cell [73, 82]. Table 30 shows the charge of the 7T1P cell at the three nodes D, DN and DP for ‘0’ and ‘1’ as data stored in the cell. The critical charge is given by the bold entries and always occurs at node DN. Table 30 confirms the findings of [69], namely that the node at the resistive element has a very high charge and the data stored in the resistive element is not connected to the node of critical charge, i.e. it is unlikely to be affected by an SEU. The charge at DP is many orders of magnitude higher than the critical charge; this is caused by the resistance value and the voltage across the PMC.

Table 30. Charge of nodes D, DN and DP of the proposed 7T1P cell

        Charge for Stored Data Value
Node    '0'              '1'
D       6.1504*10^-16    6.1802*10^-16
DN      5.9049*10^-16    5.9536*10^-16
DP      9.5062*10^-15    9.3147*10^-9
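The identification of the critical node can be reproduced directly from the values of Table 30. In the sketch below the charges are taken as tabulated (the table states no unit, so none is assumed), and the function names are illustrative.

```python
# Node charges from Table 30, indexed by stored data value (0 or 1).
CHARGE = {
    "D": (6.1504e-16, 6.1802e-16),
    "DN": (5.9049e-16, 5.9536e-16),
    "DP": (9.5062e-15, 9.3147e-9),
}


def critical_node(stored):
    """Node with the smallest charge for the given stored value."""
    return min(CHARGE, key=lambda node: CHARGE[node][stored])


def dp_margin(stored):
    """Ratio between the charge at DP and the critical (smallest) charge."""
    return CHARGE["DP"][stored] / CHARGE[critical_node(stored)][stored]


for stored in (0, 1):
    print(stored, critical_node(stored), dp_margin(stored))
```

For both stored values the smallest charge is found at DN, and the charge at DP exceeds it by the margin printed by the sketch.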

The Soft Error Rate (SER) is considered next for the proposed cells; it is derived from the critical charge by using the analytical model of [41]. In this model, the SER is given by

\[
\mathrm{SER} = K \left( A_{\text{diff\_PMOS}} \exp\!\left( \frac{-Q_{\text{Crit-PMOS}}}{\eta_{\text{hole}}} \right) + A_{\text{diff\_NMOS}} \exp\!\left( \frac{-Q_{\text{Crit-NMOS}}}{\eta_{\text{hole}}} \right) \right) \tag{35}
\]

where K is the overall scaling factor and η is the measured charge collection efficiency at a given radiation. Using (35) at 32nm CMOS feature size, the SERs of the proposed 7T1P cells using the ambipolar-based CED and the CMOS-based CED are 3.96 a.u. and 6.32 a.u. respectively.
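Equation (35) can be written as a small function to show how the SER depends on the critical charge. This is a sketch of the expression as printed (the same collection efficiency appears in both exponential terms); the parameter names are illustrative, and the numbers in the usage lines are placeholders in arbitrary units, not values from this work.

```python
import math


def soft_error_rate(k, a_diff_pmos, q_crit_pmos, a_diff_nmos, q_crit_nmos, eta):
    """SER per equation (35), with all quantities in consistent (arbitrary) units."""
    return k * (
        a_diff_pmos * math.exp(-q_crit_pmos / eta)
        + a_diff_nmos * math.exp(-q_crit_nmos / eta)
    )


# Placeholder values only: a larger critical charge lowers the SER exponentially.
low_qcrit = soft_error_rate(1.0, 1.0, 1.0, 1.0, 1.0, eta=1.0)
high_qcrit = soft_error_rate(1.0, 1.0, 2.0, 1.0, 2.0, eta=1.0)
print(low_qcrit, high_qcrit)
```

The exponential dependence is why the node with the smallest charge dominates the soft error rate of the cell.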

Table 31. Voltages at D, DN, and DP of 7T1P cell and output voltage, delay time, power dissipation, and PDP of dual-rail checker

Input (V)             Output (V)
VD     VDN    VDP     VER1    VER2    Delay (ps)   Power Dissipation (µW)   PDP (*10^-15 J)
0      VDD    0       0       VDD     11.811       8.7933                   0.10386
0      VDD    VDD     VDD     0       66.88        35.491                   1.2703
VDD    0      0       VDD     0       11.808       8.7967                   0.095976
VDD    0      VDD     0       VDD     66.904       35.491                   2.3745

Next, the performance of the dual-rail checker is established. By using the proposed XOR gate (Figure 73) and connecting the voltage at DP to both the polarity gates of the ambipolar transistors, the results of Table 31 are found for delay, power dissipation and power delay product (PDP). The worst case for the delay, the power dissipation and the PDP occurs when a ‘1’ is stored in the PMC and the voltage at node DP must change from GND to VDD.

For comparison purposes, consider a CMOS implementation of a dual-rail checker. The CMOS XOR gate of [9] is used in place of the proposed ambipolar-based XOR gate. This CMOS gate requires 12 transistors, so the total number of transistors in a CMOS-based implementation of a dual-rail checker is 24. The delay, power dissipation and PDP of a CMOS-based dual-rail checker are shown in Table 32. This circuit is faster and has a better PDP; however, it incurs a larger power dissipation and requires a larger number of transistors compared with the proposed ambipolar-based implementation.

Table 32. Voltages at D, DN, and DP of a 7T1P cell and output voltage, delay time, power dissipation, and PDP of a dual-rail checker implemented in CMOS

Input (V)             Output (V)
VD     VDN    VDP     VER1    VER2    Power Dissipation (µW)   Delay (ps)   PDP (*10^-15 J)
0      VDD    0       0       VDD     58.46                    22.4873      1.31461
0      VDD    VDD     VDD     0       51.12                    15.6201      0.798497
VDD    0      0       VDD     0       57.92                    22.6245      1.31041
VDD    0      VDD     0       VDD     52.82                    15.3855      0.812661

Area

The area of the proposed 7T1P SRAM cell is found by using Cadence to design the layout of the proposed cell while the PMC and the ambipolar transistors are stacked on a different plane [83].

Fig. 81. Layout of the proposed NVSRAM (7T1P)

Fig. 82. Layout of the proposed Ambipolar-based CED

Fig. 83. Layout of the CMOS-Based CED

Figures 81 to 83 show the layout of the proposed 7T1P NVSRAM cell and its ambipolar-based and CMOS-based CED circuits respectively. The PMC and the ambipolar transistors are stacked on a different plane than this circuit, hence only the area of the MOSFETs is considered. The total area of the proposed 7T1P NVSRAM cell is 2523.4598λ² while the areas of the ambipolar-based CED and the CMOS-based CED are 2525.6173λ² and 6472.3457λ² respectively (where λ denotes half of the CMOS feature size).
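To relate the λ-based areas to physical dimensions, the sketch below converts them to square micrometres, taking λ as half of the 32nm feature size used for the layouts (i.e. 16nm); the dictionary keys and function name are illustrative.

```python
FEATURE_NM = 32.0
LAMBDA_NM = FEATURE_NM / 2.0   # lambda is half of the CMOS feature size

AREAS_LAMBDA2 = {
    "7T1P cell": 2523.4598,
    "ambipolar-based CED": 2525.6173,
    "CMOS-based CED": 6472.3457,
}


def to_um2(area_lambda2, lambda_nm=LAMBDA_NM):
    """Convert an area expressed in lambda^2 to square micrometres."""
    return area_lambda2 * lambda_nm ** 2 * 1e-6   # nm^2 -> um^2


for name, area in AREAS_LAMBDA2.items():
    print(name, round(to_um2(area), 4), "um^2")

ced_ratio = AREAS_LAMBDA2["CMOS-based CED"] / AREAS_LAMBDA2["ambipolar-based CED"]
print("CMOS / ambipolar CED area ratio:", round(ced_ratio, 2))
```

The ratio printed at the end quantifies the MOSFET-area saving of the ambipolar-based CED over the CMOS-based CED.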

Circuit Complexity

The proposed cell is analyzed with respect to the number of CMOS transistors (or equivalent for the non-CMOS elements such as the PMC and the ambipolar transistors) as measure of circuit complexity. The three parts of the proposed NVSRAM cells require the following number of transistors:

1. SRAM core: 6T

2. RRAM: 1T and 1 PMC; the PMC has a length commensurate with a 1T at the feature size of 32nm (as reported in a previous section), so even though it does not appear in the layout due to the conducting bridge nature of this resistive element, the PMC is at most equivalent to 1T. Therefore, the RRAM has a circuit complexity of 2T.

3. CED: in the proposed hybrid implementation, the dual-rail checker requires two XOR gates. As presented in Figure 75, 3 inverters (6T) and 4 ambipolar transistors are required for the CED circuits. Implementations of an ambipolar transistor in [77] are at the same scale as the one used here for the CMOS-based MOSFET. Hence, the CED requires an equivalent complexity of 10T. A CMOS implementation requires 18 transistors; it is faster and has a better average PDP, while incurring a higher power dissipation.

The total number of transistors in the proposed NVSRAM inclusive (exclusive) of the CED circuitry is therefore 18 (8). As stated previously, the 10T CED circuit is provided at the output of the memory so it is shared among all cells; hence, the overhead of the proposed scheme is mostly associated with the added non-volatile function, i.e. 2T. This is significantly less than the 12T or 11T required by [74] and [81], respectively.
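The transistor-equivalent tally above can be restated as a few lines of arithmetic; the variable names are illustrative and the counts are those given in the list.

```python
SRAM_CORE = 6                      # 6T SRAM core
RRAM = 1 + 1                       # access transistor M7 + PMC counted as 1T
CED = 3 * 2 + 4                    # 3 inverters (2T each) + 4 ambipolar transistors

total_without_ced = SRAM_CORE + RRAM
total_with_ced = total_without_ced + CED
per_cell_overhead = RRAM           # the CED is shared among all cells

print(total_without_ced, total_with_ced, per_cell_overhead)
```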

3.4.4. PMC-Based Logic in Memory

In this section, two PMC-based LiM cells are proposed; the programmable metallization cell (PMC) is used as nonvolatile storage element, while CMOS transistors (as well as ambipolar transistors) are used as control/processing elements. The operations of these cells are voltage-based, so different from the current-mode of previous LiM schemes [41][42].

Fig. 84. General structure of the proposed (PMC-based) LiM cell

Figure 84 presents the general structure of the proposed PMC-based LiM cell. The memory is a Resistive RAM (RRAM) that consists of a transistor and a programmable metallization cell (PMC), so 1T1P. The voltage at node D corresponds to the data stored in the PMC, while its complementary value (DN) is generated by using an inverter. The logic circuit of the LiM cell is then designed using different schemes. In the first scheme, ambipolar transistors are employed in the proposed cell to implement some of the logic functions for LiM. The second scheme is CMOS-based and implements the AND/OR/XOR/Inverter (AOXI) functions as part of the logic circuit of Figure 84.

Throughout this manuscript, the proposed cells are simulated using HSPICE as the simulation tool, while the model presented in the previous section is employed for simulating the PMC. The resistance range of the PMC is given by 30kΩ – 100MegΩ. The largest values for the CF height (L) and CF radius (R) of the PMC are given by 1.5nm and 25.2nm respectively, while the threshold CF height (hth) and the radius (rth) of the PMC are selected as 1.45nm and 0.225nm respectively. Therefore, the OFF state resistance of the PMC is given by 99.958MegΩ, while the ON state resistance of the PMC is given by 30.063kΩ. Unless otherwise specified, a 32nm CMOS feature size is assumed (with a supply voltage of 0.9V).

3.4.4.1. Write Operation

The write operation for LiM starts by setting the voltages at BL and Ctrl2; the voltage at WL is at VDD. Once the required voltage difference is present across the PMC, the write operation starts. To improve the write time of the PMC, the supply voltage must be increased. In this paper, the supply voltage used in the simulation is 2.45V and the time of the write '1' (write '0') operation is 17.301ps (20.628ps).

3.4.4.2. Read Operation

Fig. 85. Voltage at node D in the read operation for a '1' as data stored in the PMC

Figure 85 shows the voltage at D of the proposed cell when a '1' is stored as data in the PMC. As the PMC resistance in state '1' ('0') is low (high), the voltage at D increases to VDD (remains at GND). The read delay is 20.43ps.

3.4.4.3. Ambipolar-Based LiM

Fig. 86. First proposed (ambipolar-based) LiM cell

Figure 86 shows the first proposed LiM cell; two ambipolar transistors are utilized together with MOSFETs. In addition to the ambipolar transistors, 7 MOSFETs and 1 PMC are required in the cell of Figure 86, i.e. it is a 7T2A1P cell. The LiM cell operates as follows. The data stored as PMC resistance is read as voltage at node D by setting the voltage at lines WL and Ctrl2 to GND and VDD respectively. If a '0' ('1') is stored in the cell, the voltage at D is at GND (VDD). The input data is given by the voltages at nodes XA and XO and by precharging the voltage at node OUT (VOUT) to VDD prior to starting any logic operation. Next, the simulation results for the cell in Figure 86 are presented.

AND Function

For the AND operation, the voltages at XCont and XO are always set to GND (0V), so transistors MXOR and ML2 are OFF and ON respectively. The input (as voltage at XA) is then ANDed with the stored data (voltage at D). The only case for the voltage at OUT to remain at its value (VDD) occurs when both voltages at D and XA are at VDD. So, when the voltages at D and XA are at VDD, DN and XAB are at GND and transistors ML1 and ML3 are OFF. As transistor MXOR is also OFF, there is no direct path between the match line (OUT) and GND, thus the voltage at OUT retains its value.

Fig. 87. AND operation between a '1' stored in the PMC and '0' as input data

Figure 87 shows the voltage at D, the precharge voltage and the output voltage when a '1' is stored in the PMC cell and a '0' is provided as input data. A PMOS transistor is used to precharge OUT to VDD before a logic operation starts, and the RRAM is read. So after reading the data stored in the RRAM (occurring at 20ps), the gate voltage of the precharge transistor (i.e. the voltage at node Pre) is at VDD. The OUT node is then separated from the precharge voltage and, depending on the stored and input data, the AND operation is performed.

Table 33. Performance of proposed LiM cell when operating the AND function

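| D | XA | OUT | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 43.471 | 12.793 | 5.5613 |
| 0 | 1 | 0 | 32.207 | 11.437 | 3.6835 |
| 1 | 0 | 0 | 37.066 | 17.634 | 6.5363 |
| 1 | 1 | 1 | 20.02 | 19.201 | 3.8439 |
| Average | | | 33.191 | 15.266 | 4.906 |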

Table 33 shows the delay, power dissipation and power delay product (PDP) of the proposed LiM cell when the AND operation is executed. Note that the delay is measured from the start of the read operation for the PMC until the output voltage reaches a stable state. The worst case delay is 43.471ps and occurs when both the stored and input data values are '0'.
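The relationship between the columns of Table 33 can be checked with a short script. The following is only a sketch: it assumes that the PDP column is the product of the delay and power columns (1 ps × 1 µW = 10^-18 J), and the names `TABLE_33` and `pdp_1e16_joule` are introduced here for illustration.

```python
# Rows of Table 33 (AND function):
# (D, XA, delay in ps, power in uW, reported PDP in units of 1e-16 J)
TABLE_33 = [
    (0, 0, 43.471, 12.793, 5.5613),
    (0, 1, 32.207, 11.437, 3.6835),
    (1, 0, 37.066, 17.634, 6.5363),
    (1, 1, 20.02, 19.201, 3.8439),
]


def pdp_1e16_joule(delay_ps, power_uw):
    """Power-delay product in units of 1e-16 J (1 ps * 1 uW = 1e-18 J)."""
    return delay_ps * power_uw * 1e-18 / 1e-16


def mean(values):
    values = list(values)
    return sum(values) / len(values)


recomputed_pdp = [pdp_1e16_joule(delay, power) for _, _, delay, power, _ in TABLE_33]
worst_case_delay = max(delay for _, _, delay, _, _ in TABLE_33)
average_delay = mean(delay for _, _, delay, _, _ in TABLE_33)
average_power = mean(power for _, _, _, power, _ in TABLE_33)

if __name__ == "__main__":
    for (d, xa, _, _, reported), value in zip(TABLE_33, recomputed_pdp):
        print(f"D={d} XA={xa}: recomputed PDP={value:.4f}, reported PDP={reported}")
    print(f"worst case delay: {worst_case_delay} ps")
    print(f"average delay: {average_delay:.3f} ps, average power: {average_power:.3f} uW")
```

Under this assumption the recomputed PDP values agree with the reported ones to within rounding of the last digit.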

OR Function

For the OR operation, the voltages at XA and XCont are at VDD and GND respectively, and transistors ML1 and MXOR are OFF. The only condition for which the voltage at OUT is discharged to GND is when the stored and input data are both '0'. In this case, the voltages at DN and XOB are at VDD and transistors ML2 and ML3 are ON, so the voltage at OUT is discharged to GND.

Table 34. Performance of proposed LiM cell when operating the OR function

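| D | XA | OUT | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 32.207 | 11.437 | 3.6835 |
| 0 | 1 | 1 | 20.05 | 0.38001 | 0.076192 |
| 1 | 0 | 1 | 20.02 | 19.201 | 3.8439 |
| 1 | 1 | 1 | 20.03 | 12.029 | 2.4094 |
| Average | | | 23.077 | 10.762 | 2.503 |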

Table 34 shows the delay, power dissipation and power delay product (PDP) when the OR operation is executed; the worst case delay is 32.207ps.

XOR Function

For the XOR operation, the voltages at XCont and XOB are at VDD and GND respectively, and the transistors MXOR and ML2 are ON and OFF respectively. As mentioned previously, the behavior of an ambipolar transistor is regulated by the voltage at its polarity gate. If the voltage at the polarity gate is VDD (GND), then the ambipolar transistor behaves as a PMOS (NMOS). Hence, an ambipolar transistor can operate as an XOR gate [77]. When an ambipolar transistor operates as a PMOS and its gate voltage is at GND, the voltage at OUT is not discharged to GND. However, there is still a voltage drop across the ambipolar transistor; a second ambipolar transistor is used to address this problem, i.e. an NMOS-behaving ambipolar transistor and a PMOS-behaving ambipolar transistor are used together, such that in the discharging process the voltage at OUT is at GND.
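The XOR property of a single ambipolar transistor can be illustrated at switch level. The sketch below uses the polarity convention stated above (polarity gate at VDD gives PMOS-like behavior, at GND gives NMOS-like behavior) and assumes ideal switches, so it does not capture the voltage drop mentioned in the text or the behavior of node OUT in the cell; the function name `ambipolar_conducts` is hypothetical.

```python
def ambipolar_conducts(polarity_gate: int, control_gate: int) -> bool:
    """Switch-level model of an ambipolar transistor (1 = VDD, 0 = GND).

    Convention taken from the text: polarity gate at VDD -> PMOS-like,
    polarity gate at GND -> NMOS-like. Ideal switches are assumed.
    """
    if polarity_gate == 1:
        # PMOS-like device: the channel conducts when the control gate is low.
        return control_gate == 0
    # NMOS-like device: the channel conducts when the control gate is high.
    return control_gate == 1


if __name__ == "__main__":
    for polarity in (0, 1):
        for gate in (0, 1):
            print(polarity, gate, ambipolar_conducts(polarity, gate))
```

In this idealized model the channel conducts exactly when the two gate voltages differ, i.e. the conduction state is the XOR of the polarity gate and the control gate.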

Table 35. Performance of proposed LiM cell when operating the XOR function

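| D | XA | OUT | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 40.273 | 12.437 | 5.009 |
| 0 | 1 | 1 | 20.05 | 0.39056 | 0.078308 |
| 1 | 0 | 1 | 20.05 | 17.215 | 3.4517 |
| 1 | 1 | 0 | 36.095 | 14.546 | 5.2505 |
| Average | | | 29.117 | 11.147 | 3.447 |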

Table 35 shows the delay, power dissipation and power delay product (PDP) for the XOR operation. The worst case delay is 40.273 ps.

Full Adder

Next, the proposed LiM cell is utilized to design a full adder.

$$\mathrm{Sum} = A \oplus B \oplus C_{in} \tag{36}$$

$$C_{out} = (A \cdot B) + [C_{in} \cdot (A \oplus B)] \tag{37}$$

Equations (36) and (37) give the logic equations of a full adder, where A and B are the (one-bit) input numbers, Cin is the carry-in input, Sum is the sum output and Cout is the carry-out bit. Four proposed LiM cells (shown in Figure 88) must be utilized to design a full adder.
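As a purely logical check of (36) and (37), independent of the circuit implementation, the truth table of the full adder can be enumerated as follows. This is a minimal sketch; the names `full_adder` and `TRUTH_TABLE` are introduced here for illustration.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int):
    """Return (Sum, Cout) for one-bit inputs, following (36) and (37)."""
    total = a ^ b ^ cin                  # Eq. (36): Sum = A xor B xor Cin
    cout = (a & b) | (cin & (a ^ b))     # Eq. (37): Cout = A.B + Cin.(A xor B)
    return total, cout


# Map every input combination (A, B, Cin) to its (Sum, Cout) pair.
TRUTH_TABLE = {
    (a, b, cin): full_adder(a, b, cin)
    for a, b, cin in product((0, 1), repeat=3)
}

if __name__ == "__main__":
    for (a, b, cin), (total, cout) in TRUTH_TABLE.items():
        print(f"A={a} B={b} Cin={cin} -> Cout={cout} Sum={total}")
```

Each row satisfies A + B + Cin = 2·Cout + Sum, which is the arithmetic definition of a one-bit full adder.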

Fig. 88. Full adder using first proposed LiM cell

Since the input data of a cell is inverted and the output data from cells A and C are used as inputs to cells B and D respectively, the outputs of cells A and C must be inverted. As shown in Figure 88, Sum is calculated by using cells A and B, while Cout is calculated from cells C and D. Cell A generates the XNOR operation between the input bits A and B by setting the voltages at XCont and XOB to VDD and GND. This output voltage is connected to the input XAB of cell B; the XOR operation between A, B, and Cin is executed by setting the voltages at XCont and XOB to VDD and GND while Cin is provided as the voltage at D. Two cells in series are required to generate Cout. As shown in Figure 88, the output of cell C is connected to XOB of cell D. The operation of the full adder is generated by controlling the voltages at D, XAB, XCont and XOB of each cell, as shown in Figure 88.

Since cells B and D are connected in series to cells A and C respectively (Figure 88), the simulation must take these two steps into account.

Fig. 89. Voltages at Pre1, Pre2, Cout and Sum when A, B, and Cin are in states '1', '1', and '0' respectively

Figure 89 shows the voltages at nodes Pre1, Pre2, Cout and Sum when the inputs A, B, and Cin are '1', '1', and '0' respectively. Pre1 is connected to cells A and C, while Pre2 is connected to cells B and D (Ctrl2 of all cells is at VDD).

Table 36. Performance of full adder when implemented using proposed LiM cells

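| A | B | Cin | Cout | Sum | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 59.947 | 33.448 | 20.051 |
| 1 | 0 | 0 | 0 | 1 | 58.194 | 40.620 | 23.639 |
| 0 | 1 | 0 | 0 | 1 | 51.593 | 35.522 | 18.327 |
| 1 | 1 | 0 | 1 | 0 | 59.136 | 34.118 | 20.176 |
| 0 | 0 | 1 | 0 | 1 | 60.010 | 25.263 | 15.161 |
| 1 | 0 | 1 | 1 | 0 | 54.743 | 44.776 | 24.512 |
| 0 | 1 | 1 | 1 | 0 | 54.751 | 39.177 | 21.450 |
| 1 | 1 | 1 | 1 | 1 | 40.04 | 39.811 | 15.940 |
| Average | | | | | 54.8018 | 36.5919 | 19.907 |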

Table 36 shows the delay, power dissipation and power delay product (PDP) of a full adder implemented using the proposed LiM cells. The worst case delay is 60.01ps.

3.4.4.4. CMOS-Based LiM

This section presents the second proposed LiM cell; this cell still utilizes a PMC as a nonvolatile storage element, while only CMOS transistors are used as control and processing elements.

Fig. 90. Second proposed (CMOS-based) LiM cell (9T1P)

Figure 90 shows the proposed cell that implements the AND/OR/XOR/Inverter (AOXI) logic function. This cell requires 9 MOSFETs and 1 PMC, i.e. it is 9T1P. The data stored in the PMC is read by setting the voltages at lines WL and Ctrl2 to GND and VDD respectively; if this data is '0' ('1'), the voltage at D is at GND (VDD). As in the previously proposed design, prior to any logic operation, the voltage at OUT (VOUT) is precharged to VDD. The input voltages are provided at XA, XO, Cinv and ContX, such that the AND/OR/XOR/Inverter function between the stored and input data is generated.

AND Function

For the AND operation, the voltages at Cinv, ContX, and XO are always at GND (0V). Transistors Minv and ML5 are OFF while transistor ML2 is ON. The AND operation depends on the value of the input data given by the voltage at XA.

Table 37. Performance of proposed CMOS-based LiM cell for AND function

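| D | XA | OUT | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 29.94 | 14.411 | 4.3148 |
| 0 | 1 | 0 | 32.67 | 11.234 | 3.6701 |
| 1 | 0 | 0 | 34.942 | 16.779 | 5.8629 |
| 1 | 1 | 1 | 20.03 | 16.664 | 3.3379 |
| Average | | | 29.3955 | 14.772 | 4.29643 |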

The voltage at OUT remains at its value (VDD) only when D and XA are at VDD (DN and XAB are at GND). Transistors ML1 and ML3 are then OFF; as transistor ML5 is also OFF, there is no direct path between OUT and GND, and OUT retains its value. For the other conditions, transistor ML2 is always ON; so depending on the voltages at DN and XAB, if transistor ML1 or ML3 is ON, then a direct path between the supply voltage and GND exists, i.e. the output voltage is discharged to GND. The simulation results (Table 37) show that the worst case delay of the proposed cell is 34.942ps, which is lower than that of the first proposed cell (43.471ps in Table 33).

OR Function

For the OR operation, the voltages at Cinv and ContX are always at GND (0V), while the voltage at XA is at VDD. Transistors Minv, ML3, and ML5 are OFF, while the input signal is provided at XO.

Table 38. Performance of proposed CMOS-based LiM cell for OR function

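| D | XA | OUT | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 32.67 | 11.234 | 3.6701 |
| 0 | 1 | 1 | 20.05 | 0.4428 | 0.088781 |
| 1 | 0 | 1 | 20.03 | 16.664 | 3.3379 |
| 1 | 1 | 1 | 20.03 | 9.4527 | 1.8934 |
| Average | | | 23.195 | 9.448 | 2.24755 |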

Table 38 shows the simulation results for the OR operation. VOUT is discharged to GND if and only if the voltages at D and XO are at GND. Otherwise, a direct path between VDD and GND does not exist and the output voltage retains its value. The simulation results in Table 38 show that the worst case delay of the proposed LiM cell for the OR function is 32.67ps.
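The precharge-and-discharge behavior described for the AND and OR configurations of the 9T1P cell can be summarized by a switch-level sketch. The topology used below is an assumption inferred from the text, not taken from Figure 90: ML1 is assumed to be driven by DN, ML3 by XAB and ML2 by XOB, with ML2 in series with the parallel pair ML1/ML3, and Minv and ML5 kept OFF (Cinv and ContX at GND).

```python
def evaluate_out(d: int, xa: int, xo: int) -> int:
    """Logic value at OUT after evaluation (1 = VDD, 0 = GND).

    Assumed pull-down network (hypothetical, inferred from the text):
    ML2 in series with (ML1 parallel to ML3); Minv and ML5 are OFF.
    """
    dn, xab, xob = 1 - d, 1 - xa, 1 - xo   # complemented nodes DN, XAB, XOB
    ml1_on = dn == 1     # assumption: ML1 is driven by DN
    ml3_on = xab == 1    # assumption: ML3 is driven by XAB
    ml2_on = xob == 1    # assumption: ML2 is driven by XOB
    out = 1              # OUT is precharged to VDD before the logic operation
    if ml2_on and (ml1_on or ml3_on):
        out = 0          # a path to GND exists, so OUT is discharged
    return out


def and_mode(d: int, xa: int) -> int:
    # AND configuration: XO is held at GND, the input is applied at XA.
    return evaluate_out(d, xa, xo=0)


def or_mode(d: int, xo: int) -> int:
    # OR configuration: XA is held at VDD, the input is applied at XO.
    return evaluate_out(d, xa=1, xo=xo)


if __name__ == "__main__":
    for d in (0, 1):
        for x in (0, 1):
            print(f"D={d} input={x}: AND={and_mode(d, x)} OR={or_mode(d, x)}")
```

Under these assumptions OUT keeps its precharged value only for D = XA = '1' in the AND configuration, and is discharged only for D = XO = '0' in the OR configuration, in agreement with the description above.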

XOR Function

For the XOR operation, the voltages at XA and Cinv are at VDD and GND respectively, so transistors Minv and ML3 are OFF. The voltage at ContX is the same as the voltage at XO.

Table 39. Performance of proposed CMOS-based LiM cell for XOR function

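| D | XA | OUT | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 32.670 | 11.234 | 3.6701 |
| 0 | 1 | 1 | 20.05 | 0.44755 | 0.089734 |
| 1 | 0 | 1 | 20.03 | 16.664 | 3.3379 |
| 1 | 1 | 0 | 33.090 | 12.764 | 4.2236 |
| Average | | | 26.46 | 10.2774 | 2.83033 |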

Table 39 shows the simulation results. The operation of transistors ML1, ML2, ML4, and ML5 is dependent on the input signal and the comparison with the data stored in the cell.

Inverter

The proposed cell requires the implementation of the inverter function for the stored data; transistor Minv is provided for this purpose. Transistors ML2 and ML5 are OFF while transistor Minv is ON by setting the voltages at XO and Cinv to VDD (and GND for ContX).

Table 40. Performance of proposed CMOS-based LiM cell for inverter function

| D | OUT | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|
| 0 | 1 | 20.05 | 0.40612 | 0.081427 |
| 1 | 0 | 35.820 | 10.057 | 3.6024 |
| Average | | 27.935 | 5.23156 | 1.8419 |

Table 40 shows the delay, power dissipation and PDP for the inverter function. The worst case delay occurs when a '1' is stored and is given by 35.82ps.

Full Adder

Next, the proposed 9T1P cells are connected as a full adder (Figure 91). The full adder requires the lines WL, BL and Cinv to be at GND, the voltage at Ctrl2 to be controlled, and the voltage at Pre to be precharged. (Note: Figure 92 shows the timing diagram of the full adder of Figure 91.)

Fig. 91. Full adder using second proposed LiM cell


Fig. 92. Voltages at Ctrl2 and Pre of cells A, B, C, and D of full adder

Table 41. Metrics of the full adder cell when implemented using proposed LiM cell

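| A | B | Cin | Cout | Sum | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 54.511 | 44.269 | 24.132 |
| 1 | 0 | 0 | 0 | 1 | 55.774 | 41.442 | 23.114 |
| 0 | 1 | 0 | 0 | 1 | 53.507 | 27.092 | 14.496 |
| 1 | 1 | 0 | 1 | 0 | 55.034 | 44.598 | 24.544 |
| 0 | 0 | 1 | 0 | 1 | 52.214 | 45.431 | 23.721 |
| 1 | 0 | 1 | 1 | 0 | 52.340 | 43.918 | 22.987 |
| 0 | 1 | 1 | 1 | 0 | 52.151 | 27.830 | 14.513 |
| 1 | 1 | 1 | 1 | 1 | 40.03 | 59.876 | 23.969 |
| Average | | | | | 51.945 | 41.807 | 21.4345 |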

Table 41 shows the delay, power dissipation and PDP of the full adder when using the proposed cells. The delay of a full adder employing four 9T1P LiM cells is smaller than that of a full adder using four 7T2A1P LiM cells; however, the number of transistors in this design is larger, i.e. 41 transistors and 4 PMCs are now utilized.

3.4.4.5. Variation Analysis

The proposed LiM cell designs are analyzed next with respect to statistical variations in the transistors as well as the PMC with respect to the different logic operations.

Ambipolar-based LiM

The percentage variations of the MOSFET parameters (threshold voltage (Vth) and channel length (L)) [70] and of the PMC dimensions (CF height (h) and CF radius (r)) in the ambipolar-based (7T2A1P) LiM cell are established first. They are evaluated using a Gaussian distribution with a variation of 3σ/µ (in percentage) at 32nm feature size.
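The 3σ/µ figure of merit used in this analysis can be illustrated with a small Monte Carlo sketch. The example below only demonstrates how a parameter with a specified 3σ/µ is sampled from a Gaussian distribution and how the resulting 3σ/µ of a metric is computed; the delay function `toy_delay_ps` is a hypothetical linear placeholder and is not the HSPICE model used in this work, and all names are introduced here for illustration.

```python
import random
import statistics


def sample_parameter(nominal, three_sigma_percent, rng):
    """Draw one Gaussian sample whose 3*sigma/mean equals three_sigma_percent."""
    sigma = nominal * (three_sigma_percent / 100.0) / 3.0
    return rng.gauss(nominal, sigma)


def three_sigma_over_mean_percent(samples):
    """Percentage variation 3*sigma/mu of a list of samples."""
    return 100.0 * 3.0 * statistics.pstdev(samples) / statistics.mean(samples)


def toy_delay_ps(channel_length_nm):
    # Hypothetical placeholder: delay taken as proportional to channel length.
    return 1.0 * channel_length_nm


def run_monte_carlo(num_samples=20000, seed=1):
    """Return (3sigma/mu of the channel length, 3sigma/mu of the toy delay)."""
    rng = random.Random(seed)
    lengths = [sample_parameter(32.0, 2.0, rng) for _ in range(num_samples)]
    delays = [toy_delay_ps(length) for length in lengths]
    return (
        three_sigma_over_mean_percent(lengths),
        three_sigma_over_mean_percent(delays),
    )


if __name__ == "__main__":
    length_variation, delay_variation = run_monte_carlo()
    print(f"3sigma/mu of L: {length_variation:.3f}%")
    print(f"3sigma/mu of toy delay: {delay_variation:.3f}%")
```

In an actual flow the placeholder function would be replaced by a circuit simulation for each sampled parameter set, and the same statistic would be computed for the delay, power dissipation and PDP.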

Table 42. Percentage variation (3σ/µ) of delay of the proposed 7T2A1P LiM (AND operation)

| Metric | CMOS: Vth (3%) | CMOS: L (2%) | PMC (5%): h | PMC (5%): r |
|---|---|---|---|---|
| Delay | 1.361×10^-13 | 2.919 | 3.058×10^-3 | 22.87×10^-3 |
| Power dissipation | 2.02×10^-13 | 4.286 | 6.968×10^-3 | 94.23×10^-3 |
| PDP | 0 | 0.0024 | 3.9104×10^-3 | 96.42×10^-3 |

Table 43. Percentage variation (3σ/µ) of delay of the proposed 7T2A1P LiM (OR operation)

| Metric | CMOS: Vth (3%) | CMOS: L (2%) | PMC (5%): h | PMC (5%): r |
|---|---|---|---|---|
| Delay | 1.225×10^-13 | 1.9899 | 1.962×10^-5 | 1.962×10^-5 |
| Power dissipation | 4.5195×10^-14 | 4.915 | 1.809×10^-5 | 1.809×10^-5 |
| PDP | 0 | 3.245 | 7.346×10^-6 | 7.346×10^-6 |

Table 44. Percentage variation (3σ/µ) of delay of the proposed 7T2A1P LiM (XOR operation)

| Metric | CMOS: Vth (3%) | CMOS: L (2%) | PMC (5%): h | PMC (5%): r |
|---|---|---|---|---|
| Delay | 9.7956×10^-14 | 1.8935 | 1.563×10^-5 | 0.2015 |
| Power dissipation | 4.156×10^-14 | 2.939 | 1.531×10^-5 | 4.3125 |
| PDP | 0 | 2.243 | 2.3855×10^-6 | 4.2675 |

Tables 42 to 44 show the results under these variations for the delay, power dissipation and PDP for the three operations of this cell. Variation of the threshold voltage affects performance substantially less than variation of the channel length. For the PMC dimensions, the simulation results show that a variation of the CF height affects the delay, power dissipation and PDP less than a variation of the CF radius. Since the CF height (radius) is related to the OFF (ON) state resistance of the PMC and the value of this resistance is very large (small), its change has a very marginal (significant) effect on the performance of the proposed cell.

Next, the performance of the proposed PMC-based LiM cell is considered at different CMOS feature sizes; high performance CMOS (HP-CMOS) PTMs are employed.

Table 45. Performance of the proposed 7T2A1P LiM varying CMOS feature size (supply voltage is fixed at 0.9V)

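| Logic Operation | Feature Size | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|
| AND | 16nm | 33.492 | 13.413 | 4.4921 |
| AND | 22nm | 39.414 | 12.863 | 5.0696 |
| AND | 32nm | 44.502 | 15.052 | 6.6983 |
| OR | 16nm | 26.8 | 11.921 | 3.1949 |
| OR | 22nm | 29.792 | 11.268 | 3.3569 |
| OR | 32nm | 32.437 | 13.011 | 4.2205 |
| XOR | 16nm | 31.364 | 15.450 | 4.8458 |
| XOR | 22nm | 36.424 | 15.395 | 5.6074 |
| XOR | 32nm | 41.193 | 17.672 | 7.2795 |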

Table 45 presents the performance of the proposed 7T2A1P LiM cell when varying the CMOS feature size (at a 0.9V supply voltage). At a lower CMOS feature size, both the delay and PDP are smaller. Therefore, the performance of the proposed 7T2A1P LiM cell is better when a lower CMOS feature size is employed.
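The trend in Table 45 can be quantified as a relative reduction when moving from the 32nm to the 16nm node. The sketch below uses only the delay and PDP values of Table 45; the names `TABLE_45`, `reduction_percent` and `scaling_gain` are introduced here for illustration.

```python
# Table 45: operation -> feature size (nm) -> (delay in ps, PDP in 1e-16 J)
TABLE_45 = {
    "AND": {16: (33.492, 4.4921), 22: (39.414, 5.0696), 32: (44.502, 6.6983)},
    "OR": {16: (26.8, 3.1949), 22: (29.792, 3.3569), 32: (32.437, 4.2205)},
    "XOR": {16: (31.364, 4.8458), 22: (36.424, 5.6074), 32: (41.193, 7.2795)},
}


def reduction_percent(old, new):
    """Relative reduction of a metric, in percent of the old value."""
    return 100.0 * (old - new) / old


def scaling_gain(operation, from_nm=32, to_nm=16):
    """Return (delay reduction %, PDP reduction %) between two feature sizes."""
    delay_old, pdp_old = TABLE_45[operation][from_nm]
    delay_new, pdp_new = TABLE_45[operation][to_nm]
    return reduction_percent(delay_old, delay_new), reduction_percent(pdp_old, pdp_new)


if __name__ == "__main__":
    for operation in TABLE_45:
        delay_gain, pdp_gain = scaling_gain(operation)
        print(f"{operation}: delay -{delay_gain:.2f}%, PDP -{pdp_gain:.2f}%")
```

For all three operations both reductions are positive, which is the quantitative form of the statement that delay and PDP decrease at a lower feature size.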


CMOS-based LiM

The percentage variations of the MOSFET parameters (threshold voltage (Vth) and channel length (L)) [70] and of the PMC dimensions (CF height (h) and CF radius (r)) in the 9T1P CMOS-based cell are also found at 32nm CMOS feature size.

Table 46. Percentage variation (3σ/µ) of delay of the proposed 9T1P LiM cell (AND operation)

| Metric | CMOS: Vth (3%) | CMOS: L (2%) | PMC (5%): h | PMC (5%): r |
|---|---|---|---|---|
| Delay | 1.9759×10^-13 | 1.543 | 4.023×10^-9 | 0.0846 |
| Power dissipation | 1.076×10^-13 | 3.569 | 6.715×10^-8 | 0.3348 |
| PDP | 0 | 2.534 | 7.113×10^-8 | 0.4089 |

Table 47. Percentage variation (3σ/µ) of delay of the proposed 9T1P LiM cell (OR operation)

| Metric | CMOS: Vth (3%) | CMOS: L (2%) | PMC (5%): h | PMC (5%): r |
|---|---|---|---|---|
| Delay | 0 | 2.107 | 4.669×10^-9 | 4.669×10^-9 |
| Power dissipation | 1.38×10^-13 | 5.100 | 8.875×10^-8 | 8.875×10^-8 |
| PDP | 0 | 3.304 | 9.302×10^-8 | 9.302×10^-8 |

Table 48. Percentage variation (3σ/µ) of delay of the proposed 9T1P LiM cell (XOR operation)

| Metric | CMOS: Vth (3%) | CMOS: L (2%) | PMC (5%): h | PMC (5%): r |
|---|---|---|---|---|
| Delay | 1.192×10^-13 | 2.107 | 4.669×10^-9 | 0.295 |
| Power dissipation | 1.62×10^-13 | 5.1005 | 8.875×10^-8 | 2.356 |
| PDP | 0 | 3.304 | 9.302×10^-8 | 2.091 |

Table 49. Percentage variation (3σ/µ) of delay of the proposed 9T1P LiM cell (inverter operation)

| Metric | CMOS: Vth (3%) | CMOS: L (2%) | PMC (5%): h | PMC (5%): r |
|---|---|---|---|---|
| Delay | 0 | 1.690 | 1.969×10^-13 | 0.202 |
| Power dissipation | 1.028×10^-13 | 3.565 | 2.859×10^-6 | 1.476 |
| PDP | 0 | 2.278 | 2.859×10^-6 | 1.291 |

Tables 46 to 49 show that the variation of the threshold voltage has less of an effect on the delay, power dissipation and PDP of the proposed CMOS-based LiM cell than the variation of the channel length. The performance of this cell is strongly related to the operation of the MOSFETs, hence the variation of the channel length seriously affects both the delay and the power dissipation.

As for the PMC dimensions, variation of the CF height of the PMC has less of an effect on the delay, power dissipation and PDP than the variation of the CF radius. The same reasons as previously presented for the ambipolar-based cell are also applicable in this case.

Next, the performance of the proposed 9T1P LiM cell at different CMOS feature sizes is also found under the same conditions as for the first proposed cell.

Table 50. Performance of the proposed 9T1P LiM at different CMOS feature sizes (supply voltage is fixed at 0.9V)

| Logic Operation | Feature Size | Delay (ps) | Power (µW) | PDP (×10^-16 J) |
|---|---|---|---|---|
| AND | 16nm | 25.622 | 14.990 | 3.8406 |
| AND | 22nm | 28.192 | 14.342 | 4.0431 |
| AND | 32nm | 30.439 | 16.703 | 5.0842 |
| OR | 16nm | 27.099 | 11.781 | 3.1925 |
| OR | 22nm | 30.192 | 11.096 | 3.3500 |
| OR | 32nm | 32.919 | 12.784 | 4.2085 |
| XOR | 16nm | 26.975 | 13.474 | 3.6346 |
| XOR | 22nm | 30.091 | 12.522 | 3.7679 |
| XOR | 32nm | 33.190 | 13.574 | 4.5054 |
| Inverter | 16nm | 28.912 | 11.776 | 3.4046 |
| Inverter | 22nm | 32.457 | 10.561 | 3.4278 |
| Inverter | 32nm | 35.679 | 11.084 | 3.9545 |

Table 50 shows the performance of the proposed 9T1P LiM cell when its CMOS feature size is varied, while its supply voltage is kept constant at 0.9V. As expected, both the delay and the PDP of the proposed 9T1P LiM cell improve at a lower CMOS feature size.

3.4.4.6. Comparison

In this section, the proposed cells are compared with the LiM cell presented in Chapter 1 [41]. The worst case delay, power dissipation, PDP, write time and circuit complexity are considered for the three functions (AND, OR, XOR) as well as the full adder design.

Table 51. AND function comparison

| Performance Metric | Ambipolar-based | CMOS-based | [41] |
|---|---|---|---|
| Delay (ps) | 43.471 | 34.942 | 81.365 |
| Power (µW) | 19.201 | 16.779 | 10.688 |
| PDP (×10^-16 J) | 6.5363 | 5.8629 | 8.6237 |
| Write time | 3.827ps | 3.827ps | 2ns |
| Circuit Complexity | 7CMOS+2AMB+1PMC | 9CMOS+1PMC | 14CMOS+2MTJs+1C |
| Full Swing Output | Yes | Yes | No |

Table 52. OR function comparison

| Performance Metric | Ambipolar-based | CMOS-based | [41] |
|---|---|---|---|
| Delay (ps) | 32.207 | 32.67 | 78.125 |
| Power (µW) | 19.201 | 16.664 | 10.649 |
| PDP (×10^-16 J) | 3.8439 | 3.6701 | 8.2596 |
| Write time | 3.827ps | 3.827ps | 2ns |
| Circuit Complexity | 7CMOS+2AMB+1PMC | 9CMOS+1PMC | 14CMOS+2MTJs+1C |
| Full Swing Output | Yes | Yes | No |

Table 53. XOR function comparison

| Performance Metric | Ambipolar-based | CMOS-based | [41] |
|---|---|---|---|
| Delay (ps) | 40.273 | 33.090 | 78.445 |
| Power (µW) | 17.215 | 16.664 | 10.644 |
| PDP (×10^-16 J) | 5.2505 | 4.2236 | 8.3215 |
| Write Delay | 3.827ps | 3.827ps | 2ns |
| Circuit Complexity | 7CMOS+2AMB+1PMC | 9CMOS+1PMC | 14CMOS+2MTJs+1C |
| Full Swing Output | Yes | Yes | No |

Tables 51 to 53 compare these three LiM cells; the proposed LiM cells are superior to [41] in most figures of merit. The proposed cells have advantages such as lower delay, lower PDP, higher switching speed (as reflected in the lower write delay of the resistive element), reduced circuit complexity and full output voltage swing. The LiM cell of [41] has the lowest power dissipation.

Table 54. Full adder comparison

| Performance Metric | Ambipolar-based | CMOS-based | [41] |
|---|---|---|---|
| Delay (ps) | 60.01 | 55.774 | 92.894 |
| Power (µW) | 44.776 | 59.876 | 17.573 |
| PDP (×10^-16 J) | 24.512 | 24.544 | 16.148 |
| Write time | 3.827ps | 3.827ps | 2ns |
| Circuit Complexity | 28CMOS+8AMB+4PMC | 41CMOS+4PMC | 32CMOS+4MTJs+2C |
| Full Swing Output | Yes | Yes | No |

Table 54 presents the comparison between full adders made of the proposed LiM cells (so requiring 4 PMC-based LiM cells) and the MTJ-based cells of [41]. The delay and write time using the proposed cells are improved compared with [41], and the outputs of the corresponding full adders have a full voltage swing. However, the power dissipation and PDP of these full adders are worse due to the larger circuit complexity encountered for these designs compared with [41]. The proposed CMOS-based LiM is better than the ambipolar-based LiM when the performance of the logic functions is considered; when implementing the full adder, the ambipolar-based LiM uses a smaller number of transistors than the CMOS-based LiM. Therefore, circuit complexity and power dissipation of the ambipolar-based LiM cell are better than for the CMOS-based LiM cell.
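The trade-off stated for the full adders can be expressed as a relative change with respect to [41]. The sketch below uses only the values reported in Table 54; the names `TABLE_54` and `relative_change_percent` are introduced here for illustration.

```python
# Table 54: worst case metrics of the full adder designs (PDP in units of 1e-16 J)
TABLE_54 = {
    "ambipolar": {"delay_ps": 60.01, "power_uw": 44.776, "pdp": 24.512},
    "cmos": {"delay_ps": 55.774, "power_uw": 59.876, "pdp": 24.544},
    "ref41": {"delay_ps": 92.894, "power_uw": 17.573, "pdp": 16.148},
}


def relative_change_percent(design, metric, baseline="ref41"):
    """Change of a metric relative to the baseline design (negative = lower)."""
    base = TABLE_54[baseline][metric]
    return 100.0 * (TABLE_54[design][metric] - base) / base


if __name__ == "__main__":
    for design in ("ambipolar", "cmos"):
        for metric in ("delay_ps", "power_uw", "pdp"):
            change = relative_change_percent(design, metric)
            print(f"{design:9s} {metric:9s} {change:+.1f}% vs [41]")
```

A negative value indicates an improvement over [41] (lower delay), while positive values for power dissipation and PDP reflect the penalty discussed above.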


3.5. Conclusion

In this chapter, the HSPICE macromodel of a programmable metallization cell (PMC) has been proposed. The electrical characteristics of the PMC have been generated by considering a geometry-based model such that the vertical and lateral growth/dissolution of the metallic filament has been simulated. The I-V and R-V plots and the relationship between the switching time and the pulse amplitude of the PMC have been modeled with a very small error compared with experimental data. This chapter has also shown that, different from other models found in the technical literature, the switching time and voltage of the proposed macromodel are interrelated with each other as well as with the voltage drop across the PMC. The selection of the parameters in the proposed model is based on the basic operational features of a cell (such as resistance and the relationship between switching time and pulse amplitude), so the electrical characterization of a PMC is simple, easy to simulate and intuitive. Simulation results have shown that when the CF height or radius of the proposed PMC macromodel is varied, its electrical characteristics also change and the simulated results are close in value to experimental data.

Additionally, the application of a PMC as a nonvolatile element in a 7T1R NVSRAM cell has been presented; the simulation results have shown that substantial improvements for the store '0' and store '1' operations are possible compared with a cell utilizing a memristor as storage element. For the restore operation, the proposed cell using the PMC as storage element is also faster than a 7T1R cell in which an OxRRAM is used as storage element [69], by 41.56% and 48.99% for the restore '0' and '1' operations respectively. Table 55 shows the ranking of a 7T1R NVSRAM when the resistive element is changed.

Table 55. Ranking of 7T1R NVSRAM cells by resistive element

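| | Metric | PMC | OxRRAM | Memristor |
|---|---|---|---|---|
| Delay | Store ‘0’ Operation | 1 | 2 | 3 |
| Delay | Store ‘1’ Operation | 1 | 2 | 3 |
| Delay | Restore ‘0’ Operation | 1 | 2 | 3 |
| Delay | Restore ‘1’ Operation | 1 | 2 | 3 |
| Voltage | Store Operation | 3 | 1 | 2 |
| Voltage | Restore Operation | 3 | 2 | 1 |
| | Resistance Range | 1 | 2 | 3 |
| | Operating Voltages | 3 | 1 | 2 |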

In most cases, the PMC outperforms the other two resistive technologies (OxRRAM and memristor) under two read-out schemes; a PMC-based crossbar shows improvements in both the sense voltage on/off ratio and the read-out margin when read-out scheme I is employed. However, when using read-out scheme II, the on/off current ratio of a PMC-based crossbar is large, but its value is lower than the on/off current ratio of a MOSFET-based crossbar. As for the relative noise margin, the simulation results in this chapter show that the relative noise margin of a PMC-based crossbar is close to the one for the MOSFET-based crossbar and its value is not affected by the crossbar dimension.

Moreover, this chapter has also presented a novel approach to concurrent error detection and correction of a SEU in a new memory cell. The proposed memory cell is hybrid in nature because it utilizes the following circuits: a) a 6T SRAM core, b) a RRAM consisting of a 1T and a Programmable Metallization Cell (PMC) as non-volatile resistive element, c) two XOR gates in a dual-rail checker scheme (in which each XOR gate consists of an implementation based on two ambipolar transistors). Different from other SEU tolerant cells [74, 85, 86], the proposed memory cell is non-volatile and utilizes a dual-rail checker for concurrent error detection and the so-called restore operation for correction. The operational principles of the proposed NVSRAM have been discussed and extensive simulation results have been presented for all of its operations. In the absence of a SEU, the proposed cell has faster read and write times compared with designs using hardening [74, 81]; however, the utilization of the restore operation accounts for a higher delay in SEU correction.

The utilization of a PMC results in a very large resistive range, low hardware overhead (due to the bridging nature of this type of resistive element), fast switching, but at the expense of the requirement of higher voltage values for the store/restore operations and consequently higher power dissipation and PDP value. This requirement suggests that the proposed cell is best suited for memories requiring non-volatile operation with very frequent read operations (but infrequent write), such as in the new generation of look-up tables (LUTs) in FPGAs. The implications of the proposed approach to memory operation at system-level for FPGAs with multi-context configurability are under investigation.

Another application of PMC that is presented in this chapter is PMC-based Logic in Memory (LiM). Logic-In-Memory (LiM) is a processing paradigm that exploits the large volume of storage found in today’s computing systems for performance improvements of specific computational applications. This chapter has proposed two novel designs for a non-volatile LiM cell; in this type of cell, a resistive RAM (RRAM) that consists of a transistor and a Programmable Metallization Cell (PMC) is utilized as storage element. The first cell employs ambipolar transistors and CMOS in its logic circuit (7T2A1P), while the second proposed LiM cell uses only MOSFETs (9T1P) to implement logic functions such as AND, XOR and OR. The ranking of these cells and the current-based cell of [41] according to different circuit-level figures of merit is shown in Table 56.

Table 56. Ranking of the nonvolatile Logic in Memory

| Performance Metric | Ambipolar-based (7T2A1P) | CMOS-based (9T1P) | [41] |
|---|---|---|---|
| Delay | 2 | 1 | 3 |
| Power dissipation | 3 | 2 | 1 |
| PDP | 2 | 1 | 3 |
| Write time | 1 | 1 | 3 |
| Circuit Complexity | 2 | 1 | 3 |
| Full Swing Output | 1 | 1 | 3 |

As shown in Table 56, [41] shows the best performance in terms of power dissipation. The proposed ambipolar-based LiM cell design improves over [41], but it shows lower performance under most metrics when compared with the proposed CMOS-based cell. The larger circuit complexity due to the ambipolar transistors results in a slight degradation in a few figures of merit, such as delay and PDP. Therefore, the proposed CMOS-based cell has the best performance in all metrics except power dissipation.

IV. PHASE CHANGE MEMORY

4.1. Introduction

The phase change memory (PCM) has emerged in recent years as one of the most promising technologies for future non-volatile solid-state memories with significant implications on the entire storage hierarchy [84]. PCM has attracted considerable attention due to its low latency, good endurance, long retention and high scalability compared to other non-volatile memories.

Phase change memories have been advocated for replacing flash memories [85] because a PCM cell is not only significantly faster and smaller, but it is also more reliable (up to 100 million write cycles) [86]. The PCM cell was first proposed by S. R. Ovshinsky in the early 1960’s based on the chalcogenide alloy GST [87], but at that time its excessive costs did not allow its immediate commercialization. Currently, GST can be integrated into ICs, so manufacturing costs have been considerably reduced [88]. Also following recent advances, a PCM cell has attained fast read/write times, high scalability, low power operation and good reliability. PCM performance improves with scaling, so the development and design of this type of NVM has been very aggressive. Additionally, its high resistance ratio (>10^3) provides a further advantage for multilevel storage operation.

This chapter discusses the fundamental concepts of phase change memory (PCM). The HSPICE macromodel of PCM and its memory applications, such as the PCM-based CAM cell and the PCM-based TCAM cell, are introduced. Furthermore, the multilevel storage of PCM and the effect of drift on the multilevel storage element are considered. The concept of separating the PCM resistance into different levels for PCM-based multilevel storage is proposed at the end of this chapter.

4.2. Fundamentals of Phase Change Memory (PCM)

The phase change memory (PCM) is regarded as one of the most promising alternatives among emerging technologies for non-volatile memory design. PCM has a high density, good speed, low operating voltage, excellent scaling capabilities and compatibility with a complementary metal oxide semiconductor (CMOS) process. Data storage in a PCM is related to the phase transformation of the chalcogenide alloy (e.g. Ge2Sb2Te5, GST) that exhibits an amorphous and a crystalline phase [89]. The amorphous state has a high resistance and is commonly referred to as the reset state; the crystalline phase has a low resistance and is referred to as the set state [84].

Fig. 93. Thermal physical model of PCM device (Vertical Section)

A PCM device is fabricated by using a thin film chalcogenide layer in contact with a metallic heater. When a programming voltage/current pulse is applied to the PCM, a high current density flows in the resistive heater, thus raising the temperature of the active region as per the Joule effect [89]. The Joule heat generated in this region melts or crystallizes the phase change material to an amorphous, or a crystalline state. Ge2Sb2Te5 has a suitable melting point and a high crystallization state, so it is used as phase change material. The temperature dependence of the phase change (PC) process is shown in Figure 94 [85].

Fig. 94. Temperature/time dependence of phase change process [85]

In Figure 94, the pulse with a high amplitude is used to melt and quench the PC element to an amorphous state (Reset State), while the longer pulse with a low amplitude is used to crystallize the PC element (Set State) [85]. Since switching between amorphous and crystalline states is based on the crystalline fraction of the PCM, the electrical resistance of the PCM cell is given as

$$R_{PCM} = (1 - C_x) R_a + C_x R_c \tag{38}$$

where Rc and Ra are the resistances of the PCM when it is fully crystalline and fully amorphous, respectively. Cx denotes the so-called crystalline fraction: when Cx is zero, the PCM is fully amorphous; when Cx is equal to one, the PCM is fully crystalline [89].
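Equation (38) can be evaluated directly, as in the following minimal sketch. The function name `pcm_resistance` and the example resistance values (1 MΩ amorphous, 1 kΩ crystalline) are assumptions chosen only for illustration and are not parameters of the proposed macromodel.

```python
def pcm_resistance(crystalline_fraction, r_amorphous, r_crystalline):
    """PCM resistance from Eq. (38): R = (1 - Cx) * Ra + Cx * Rc."""
    if not 0.0 <= crystalline_fraction <= 1.0:
        raise ValueError("crystalline fraction Cx must lie in [0, 1]")
    return (1.0 - crystalline_fraction) * r_amorphous + crystalline_fraction * r_crystalline


if __name__ == "__main__":
    # Hypothetical example values: Ra = 1 Mohm, Rc = 1 kohm.
    for cx in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"Cx={cx:.2f}: R={pcm_resistance(cx, 1e6, 1e3):.0f} ohm")
```

The two limiting cases Cx = 0 and Cx = 1 return Ra and Rc respectively, as stated above.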

Fig. 95. Measured I-V characteristics for a PCM cell in either a Set or a Reset state, corresponding to a crystalline or amorphous phase of the active chalcogenide [86]

Figure 95 shows the I-V characteristics of the PCM cell [86]; the two phases of crystalline (Set) and amorphous (Reset) states are clearly exhibited. If the PCM is in the Reset state (amorphous) and the voltage across the PCM cell is higher than the threshold value (Vth), then a snapback behavior occurs and the resistance of the PCM is switched to the RON (ON state) value. If the PCM is in the ON state, it will switch back to the OFF state if and only if the voltage across the PCM is less than the so-called ON/OFF intersection point. In Figure 95, the threshold voltage of the PCM cell is approximately 1.2V, while the ON/OFF intersection point is approximately 0.7V.
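The threshold switching described above behaves as a hysteresis between two voltage levels. The sketch below is a simplified state model that uses the approximate values read from Figure 95 (1.2V and 0.7V); it assumes ideal, instantaneous transitions and does not model the phase change itself. The names `next_state` and `sweep` are introduced here for illustration.

```python
V_THRESHOLD = 1.2      # approximate threshold voltage from Figure 95 (V)
V_INTERSECTION = 0.7   # approximate ON/OFF intersection point from Figure 95 (V)


def next_state(state, voltage):
    """Update the electrical state ("OFF" or "ON") of a Reset-state PCM cell."""
    if state == "OFF" and voltage > V_THRESHOLD:
        return "ON"    # snapback: the cell switches to the ON state
    if state == "ON" and voltage < V_INTERSECTION:
        return "OFF"   # the cell falls back to the OFF state
    return state       # otherwise the state is retained


def sweep(voltages, state="OFF"):
    """Apply a sequence of voltages and return the state after each step."""
    states = []
    for voltage in voltages:
        state = next_state(state, voltage)
        states.append(state)
    return states


if __name__ == "__main__":
    applied = [0.5, 1.0, 1.3, 1.0, 0.8, 0.6]
    for voltage, state in zip(applied, sweep(applied)):
        print(f"V={voltage:.1f} V -> {state}")
```

In this example the cell stays ON while the voltage is reduced from 1.3V to 0.8V and returns to OFF only once the voltage drops below the intersection point.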

4.2.1. Drift Behavior of PCM

Based on its electrical characteristics [86, 87, 88], the resistance and the threshold voltage of a PCM cell change when the cell is not being programmed. This phenomenon is commonly referred to as drift. The resistance drift is believed to be the result of structural relaxation (SR) phenomena that are thermally activated as an atomic rearrangement of the amorphous structure [89]. Under the assumption that the annealing temperature of the PCM cell is constant [88], the drifts of the PCM resistance and the threshold voltage are given [89] as follows

$$R(t) = R_0 \left(\frac{T_{off}}{T_0}\right)^{\upsilon_r} \tag{39}$$

$$V_T = V_{T0} + \log\left(\frac{T_{off}}{T_0}\right)^{\upsilon_t} \tag{40}$$

where R(t) and VT are the resistance and threshold voltage of the PCM cell during a drift of time length Toff, i.e. Toff denotes the time during which the PCM cell is not being programmed; υr and υt are referred to as the drift coefficients of the PCM resistance and the threshold voltage. R0 and VT0 are the initial values (prefactors) for the resistance and the threshold voltage respectively (as set by PCM specifications). T0 is the time constant of the drift [88].
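The power-law form of the resistance drift in (39) can be sketched as follows. The function name `drifted_resistance` and the numerical values in the example (R0 = 1 MΩ, drift coefficient 0.1, T0 = 1 in arbitrary time units) are assumptions chosen only for illustration.

```python
def drifted_resistance(r0, t_off, drift_coefficient, t0=1.0):
    """Resistance after a drift time t_off, following the power law of Eq. (39)."""
    if t_off <= 0 or t0 <= 0:
        raise ValueError("t_off and t0 must be positive")
    return r0 * (t_off / t0) ** drift_coefficient


if __name__ == "__main__":
    # Hypothetical example: R0 = 1 Mohm, drift coefficient 0.1, T0 = 1 time unit.
    for t_off in (1.0, 10.0, 100.0, 1000.0):
        print(f"Toff={t_off:7.1f}: R={drifted_resistance(1e6, t_off, 0.1):.3e} ohm")
```

For a positive drift coefficient the resistance increases monotonically with Toff and equals R0 at Toff = T0.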

The resistance drift exponent of the PCM (υr) varies depending on R0 [87], i.e. at a larger value of R0, the mean value of the drift exponent (υr) tends to have a larger value. The relationship between the resistance drift exponent (υr) and the initial resistance of the PCM cell (R0) can be found as follows. By rearranging (39),

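$$\frac{R_T}{R_0} = \left(\frac{T_{off}}{T_0}\right)^{\upsilon_r} \tag{39.1}$$

$$\ln\left(\frac{R_T}{R_0}\right) = \upsilon_r \cdot \ln\left(\frac{T_{off}}{T_0}\right) \tag{39.2}$$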

As T0 is normalized (for simplicity of analysis, it is made equal to a unit time) and R0 is given (as per PCM specifications), (39.2) can be rewritten as

$$\ln(R_T) - \ln(R_0) = \upsilon_r \cdot \ln(T_{off}) \tag{39.3}$$

So, ln(RT) and ln(Toff) have constant values (denoted by A and B respectively) at a specified Toff [89]; hence, (39.3) is now written as

$$A - \ln(R_0) = \upsilon_r \cdot B \tag{39.4}$$

$$\upsilon_r = \frac{A}{B} - \frac{1}{B}\ln(R_0) \tag{39.5}$$

1 퐴 Let 훼 (훽) denote − (− ), i.e. α and β are constant. For example using the simulation 퐵 퐵 data and parameters of [87], α and β are found by curve fitting to be equal to 0.0153 and 0.1138 respectively. The relationship between 휐푟 and R0 is therefore given by

휐푟 = 훼푙푛 (푅0) − 훽 (41)
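As a quick numerical check of (39) and (41), the following Python sketch evaluates the drift exponent and the drifted resistance. It assumes that R0 is expressed in ohms in the curve fit and that T0 is normalized to one time unit; the function names are illustrative and are not part of the macromodel.

```python
import math

ALPHA = 0.0153  # fitted constant alpha quoted in the text (data of [87])
BETA = 0.1138   # fitted constant beta quoted in the text (data of [87])


def drift_exponent(r0_ohm):
    """Resistance drift exponent from (41); r0_ohm assumed to be in ohms."""
    return ALPHA * math.log(r0_ohm) - BETA


def drifted_resistance(r0_ohm, t_off, t0=1.0):
    """Resistance after an idle time t_off from (39); t_off and t0 share units."""
    return r0_ohm * (t_off / t0) ** drift_exponent(r0_ohm)


if __name__ == "__main__":
    r0 = 200e3  # reset resistance used later in the chapter
    print(drift_exponent(r0))
    print(drifted_resistance(r0, 1e3))
```

With these constants a larger R0 yields a larger exponent, which agrees with the trend reported in [87].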

4.3. Macromodel of PCM

To simulate the electrical characteristics of a PCM, different models have been proposed in the technical literature. [84] has presented a compact SPICE model with Verilog-A; the resistance of the PCM is based on (38). However, [84] does not propose any circuit, but only a few equations for each model parameter, so its results cannot necessarily be validated by HSPICE simulation. Moreover, the model of [84] cannot simulate the I-V curve of a PCM, because it does not consider the holding voltage and the crystallization rate [84] at different resistance ranges (so, it differs from the physical implementation of a PCM cell). [85] has presented a more complete HSPICE model. In this model, the change in resistance is not continuous, again not fully matching the physical phenomena encountered in a PCM cell. Moreover, the holding voltage is not considered, so the I-V curve cannot be generated. [90] has presented an HSPICE macromodel; this model can be used to generate the I-V characteristics of a PCM in four distinct regions; however, the resistance of the PCM is still not continuously characterized. As for the holding voltage, voltage sources are added to the model to adjust the voltage and match it to the I-V curve; however, the adjustment does not follow a specific methodology. In addition to the above three, other models can be found in the technical literature [91-97]; however, these models encounter the same limitations and disadvantages, such as ignoring the holding voltage, discontinuous behavior of the PCM resistance, lack of verification against simulation data, or use of model parameters that cannot be matched to an HSPICE simulation environment.

Fig. 96. Flowchart of the proposed PCM macromodel

This dissertation proposes a new HSPICE macromodel of a PCM cell; its flowchart is shown in Figure 96. The proposed macromodel is based on the electrical characteristic that the resistance of a PCM changes linearly in the ON-state (RPCM = RON) [86], and keeps its value if the PCM is in the OFF-state. The electrical characteristics of this PCM model are similar to those of previously presented models [84, 91, 90-97]; however, the following characteristics are now considered:

1. The dependence on the programming time and PCM resistance range

2. The holding voltage

3. The variation of threshold voltage when the crystalline fraction is varied.

The proposed macromodel can simulate the electrical characteristics of a PCM cell, and the issues left partially unaddressed by previous models are resolved, such that the drift behaviors of the resistance and threshold voltage of a PCM cell can be simulated.

As shown in the flowchart of Figure 96, the proposed macromodel has two terminals, in and out; in is the input terminal, while out is the output terminal. When there is a voltage difference across these terminals, the resistance value is calculated based on the initial crystalline fraction (Cx), also taking into account the time during which the PCM was not programmed or read prior to simulation (Toff). Different blocks are utilized. The voltage difference between in and out is used to calculate the temperature (Temperature Estimation); once the temperature of the PCM cell is found, the decision circuit establishes the behavior of the PCM as either programming (Set or Reset state) or keeping its value (read behavior). Following this step, the crystalline fraction of the PCM cell is found (Crystalline Fraction). Based on the voltage difference across the PCM, if the cell is not being programmed, the drift behavior is found by the proposed drift model, which follows the operational model. The two models of the proposed macromodel are described next.

4.3.1. Operational Model

As shown in Figure 96, this model consists of different blocks; for HSPICE simulation, these blocks are implemented as circuits relating the features of the PCM cell to the electrical characteristics. The function of each circuit is discussed next.


4.3.1.1. Main PCM Circuit

The circuit model of Figure 97 characterizes the electrical characteristics of the PCM cell. When the PCM is in the ON-state, the resistance is given by RON; however, if the PCM is in the OFF-state, the resistance of the cell is based on the crystalline fraction (Cx) (and ultimately, its drift behavior).

Fig. 97. Circuit model of a PCM cell

The input and output voltages (nodes in and out) of the circuit in Figure 97 determine the state (ON or OFF) and the resistance of the cell. When there is a voltage difference across in and out, the Control ON State (Figure 96) establishes the state of the PCM.

1. When the PCM is in the ON state, the I-V curve does not pass through the starting value, so it is shifted by the value of the holding voltage (as provided by the voltage source Vh). The switch sw2 is ON, while the switch sw1 is OFF. The resistance of the PCM is reduced to RON.

2. In the OFF state, sw1 is ON, while the switch sw2 is OFF. The value of the resistance is based on the crystalline fraction (the drift behavior of the PCM will be described in more detail in later sections).


4.3.1.2. Temperature Calculation

After establishing the state and the resistance, the temperature must also be found. [84] assumes that the temperature in the active region is uniform, while the temperature outside the dispersed-heat region is close to room temperature (300K). [84] also assumes that the temperature of the dispersed-heat region is constant. According to the Joule/dispersed heat phenomena and the experimental finding that 30% of the total heat is dissipated in the BEC, the temperature of the PCM cell is given by [84]

$$ T = \frac{0.7\, W_j (r_2 - r_1)}{2 \pi k r_1 r_2} \left( 1 - \exp\left( -\frac{3 k r_2}{(r_2 - r_1)\, r_1^2\, C}\, t \right) \right) + 300 \tag{42} $$

where r1 and r2 are the radii of the active region and the PCM cell respectively (r2 is equal to the thickness of the GST cell), Wj is the Joule heat, C is the thermal capacity of the phase change material, k is the thermal conductivity, and t is the time.
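To give a feel for the magnitudes involved in (42), the sketch below evaluates the temperature in Python using the physical parameters listed later in Table 58, converted to centimeters. The function name and the 1 mW input power are illustrative assumptions, and Wj is treated as a power in watts (consistent with its definition as a current-voltage product).

```python
import math

K_TH = 4.63e-3   # thermal conductivity k in J/(cm K s), from Table 58
C_TH = 1.25      # thermal capacity C in J/(cm^3 K), from Table 58
R1 = 50e-7       # radius of the active region r1: 50 nm expressed in cm
R2 = 100e-7      # radius of the PCM cell r2: 100 nm expressed in cm
T_ROOM = 300.0   # room temperature in K, as in (42)


def cell_temperature(w_joule, t):
    """Temperature (K) from (42) for a Joule power w_joule (W) applied for t (s)."""
    gain = 0.7 * w_joule * (R2 - R1) / (2.0 * math.pi * K_TH * R1 * R2)
    rate = 3.0 * K_TH * R2 / ((R2 - R1) * R1 ** 2 * C_TH)
    return gain * (1.0 - math.exp(-rate * t)) + T_ROOM


if __name__ == "__main__":
    for t in (0.0, 1e-9, 10e-9, 1e-6):
        print(t, cell_temperature(1e-3, t))  # 1 mW is a hypothetical input power
```

Under these assumptions the exponential settles within a few nanoseconds, so the steady-state term dominates for pulses of the durations used in this chapter.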

The Joule heat is the product of the current through the PCM cell and the voltage across it, i.e.

$$ W_j = I_R V_R \tag{43} $$

where IR and VR are the current and the voltage across the active region, respectively. From (43), Wj is simulated by the voltage controlled voltage source VCVS (i.e. the power is equal to the product of the voltage between in and out and the current that passes through the resistor Rtest1).

The simulation time is calculated using an integrator circuit [98].

Fig. 98. Integrator Circuit [98]


The output voltage of the integrator in Figure 98 is given by

$$ V_{out} = -\frac{gain}{R_i C_i} \int_0^t V_{in}\, dt + V_{out}(0) \tag{44} $$

To find the simulation time, a very fast input pulse (from 0 to 1V, so not a DC value) is placed at the input of the integrator. After finding the simulation time, the temperature of the cell is found by using a VCVS, i.e. the value of the VCVS is equal to the temperature (given in (42)).
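The time-measurement idea behind (44) can be sketched in a few lines of Python. The sketch assumes a unity gain, a zero initial output, and a constant 1V input after the step; Ri = 1Ω and Ci = 10*tstart are the calibrated values listed later in Table 59, while tstart = 1ns and the function names are illustrative.

```python
def integrator_output(v_in, t, r_i, c_i, gain=1.0, v_out0=0.0):
    """Output of the integrator in (44) for a constant input v_in held for time t."""
    return -gain / (r_i * c_i) * v_in * t + v_out0


def elapsed_time(v_out, r_i, c_i, gain=1.0, v_in=1.0):
    """Invert (44) to recover the elapsed time from the integrator output."""
    return -v_out * r_i * c_i / (gain * v_in)


if __name__ == "__main__":
    tstart = 1e-9          # hypothetical initial time step
    r_i = 1.0              # Ri from Table 59
    c_i = 10.0 * tstart    # Ci from Table 59
    v_out = integrator_output(1.0, 5e-9, r_i, c_i)
    print(v_out, elapsed_time(v_out, r_i, c_i))
```

With these values the output voltage falls by 0.1V per tstart, so the elapsed time is recovered as -Vout*tstart/0.1.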

4.3.1.3. Decision and Control Circuits

After finding the temperature, a decision circuit is used to determine the cell state (Figure 99). Let the voltage at node Tr correspond to the temperature of the PCM; to check whether the cell is in the programming state, the temperature must be compared with the glass transition point (Tx) and the melting point (Tm). A temperature comparator (Figure 99) is used to simulate this process. The output nodes of this circuit are out1 and out2.

• The voltage at node TmTr corresponds to the difference between the melting point and the temperature of the PCM cell (Tm − TPCM).

• The voltage at node TxTr corresponds to the difference between the glass transition point and the temperature of the PCM (Tx − TPCM).

Consider the switches. When the voltage at node TxTr is positive, sw_dc3 is ON, while switch sw_dc1 is OFF. The other switches are controlled by the voltage at node TmTr. When the voltage at node TmTr is positive, sw_dc2 and sw_dc6 are ON, while switches sw_dc4 and sw_dc5 are OFF. Next, the PCM temperature must be considered; the characteristics of the temperature comparator circuit (Figure 99) allow the following conditions to be established (as applicable to the operation of a cell).


Fig. 99. Decision circuit as temperature comparator

• If the PCM temperature is less than Tx (VTxTr and VTmTr are positive), there is no programming; sw_dc2, sw_dc3, and sw_dc6 are ON. Both output voltages (out1 and out2) are at GND.

• If the temperature of the PCM cell is higher than Tx but less than Tm (programming for the crystalline phase), then the voltages at nodes TmTr and TxTr are positive and negative, respectively. So switches sw_dc1, sw_dc2, and sw_dc6 are ON. Hence, out1 and out2 are 1 (i.e. Vd) and 0, respectively.

• If the temperature of the PCM cell is higher than Tm (Tr > Tm) (programming for the amorphous phase), the voltages at nodes TmTr and TxTr are negative; switches sw_dc1, sw_dc4, and sw_dc5 are ON. The output voltage at node out1 is GND, while the voltage at node out2 is Vd.
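The three cases above amount to a two-threshold comparison, which can be summarized behaviorally in Python. This is only a sketch of the input/output behavior of the decision circuit, not of its switch-level implementation; the function name, the default Vd = 1V, and the handling of temperatures exactly equal to Tx or Tm are assumptions.

```python
def decision_circuit(t_cell, t_x, t_m, v_d=1.0):
    """Return (out1, out2) of the temperature comparator for a cell temperature.

    t_cell, t_x and t_m must share the same temperature unit.
    """
    v_txtr = t_x - t_cell  # voltage at node TxTr
    v_tmtr = t_m - t_cell  # voltage at node TmTr
    if v_txtr > 0 and v_tmtr > 0:
        return 0.0, 0.0    # below Tx: no programming, both outputs at GND
    if v_tmtr > 0:
        return v_d, 0.0    # between Tx and Tm: crystalline programming
    return 0.0, v_d        # above Tm: amorphous programming


if __name__ == "__main__":
    for temp in (100.0, 400.0, 700.0):
        print(temp, decision_circuit(temp, t_x=200.0, t_m=600.0))
```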

After comparing the temperatures, the programming behavior of the PCM is found from the output voltages at out1 and out2 of the decision circuit. The circuit of Figure 100 is utilized and the crystalline fraction of the PCM is calculated as follows: if the voltage at node Cx is equal to Vd (0), the PCM is fully in the crystalline (amorphous) phase.

Fig. 100. Crystalline fraction calculation circuit

The voltages at out1 and out2 of the decision circuit are used to find the crystalline fraction of the PCM cell and control the circuit behavior (switches sw_cx2 and sw_cx3), following the state model of [84]. Provided sw_cx1 and sw_cx4 are ON, the truth table for the voltage at node Cx is given in Table 57.

Table 57. Truth table of crystalline fraction calculation circuit (provided switches sw_cx1 and sw_cx4 are ON)

out1   out2   Voltage at node Cx
0      0      Hold
1      0      Charged
0      1      Discharged

Consider the I-V characteristics of the PCM (Figure 101). If the PCM is in the amorphous (crystalline) phase, the voltage across the cell must be higher than Vth (Vx) for programming to take place, so the state model of [84] must be adjusted. Switches sw_cx1 and sw_cx4 are therefore added to the circuit (Figure 100); sw_cx1 and sw_cx4 depend on the voltage across the PCM (Vin,out). Assume the PCM is in the OFF state. If the cell is in the crystalline phase, the voltage across the PCM must be higher than Vx (Figure 101) [94] to change to the ON (programming) state. If the cell is in the amorphous phase, the voltage across the PCM must be higher than Vth to change the state from OFF to ON. However, if the PCM is in the ON state, to switch the state back to OFF (i.e. no programming), the voltage across the PCM cell (Vin,out) must be lower than Vx (as shown in Figure 101). Therefore, sw_cx1 and sw_cx4 control the ON state, i.e. if the PCM is in the ON state, both switches are ON, else they are OFF.

Fig. 101. I-V curve of a PCM cell [94]

Based on the I-V characteristic of the PCM (Figure 101), the partial Set or Reset state may be possible. The threshold voltage of the partial Set or Reset state is not the same as the threshold voltage of the PCM when it is in a fully amorphous phase, because it depends on the crystalline fraction (Figure 101). The relationship between the threshold voltage and the crystalline fraction is given by

$$ V_{th,new} = V_{th} + (V_x - V_{th}) * V_{Cx} \tag{45} $$

where Vth,new is the threshold voltage of the PCM cell, Vth is the threshold voltage of the PCM cell when it is in the amorphous phase (the crystalline fraction of the cell is zero), and VCx is the voltage at node Cx (corresponding to the crystalline fraction of the PCM cell). Vx is the intersection point of the ON and OFF states (Figure 101); its value is given by

$$ V_x = \frac{V_h R_{set}}{R_{set} - R_{ON}} \tag{46} $$

where Vh is the holding voltage, Rset is the resistance of the PCM cell when it is in the fully crystalline phase, and RON is the ON state resistance of the PCM.
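As a worked example of (45) and (46), the Python sketch below uses Vh = 0.45V, Rset = 7kΩ and RON = 1kΩ from Table 58. The amorphous threshold voltage of 1.2V is a hypothetical value chosen only for illustration, and the voltage at node Cx is assumed to be normalized so that it ranges from 0 (amorphous) to 1 (crystalline).

```python
def intersection_voltage(v_h, r_set, r_on):
    """Intersection of the ON and OFF branches of the I-V curve, from (46)."""
    return v_h * r_set / (r_set - r_on)


def threshold_voltage(v_th, v_x, v_cx):
    """Threshold voltage for a partially crystalline cell, from (45)."""
    return v_th + (v_x - v_th) * v_cx


if __name__ == "__main__":
    v_x = intersection_voltage(0.45, 7e3, 1e3)  # parameters from Table 58
    v_th = 1.2                                  # hypothetical amorphous threshold
    for cx in (0.0, 0.5, 0.9, 1.0):
        print(cx, threshold_voltage(v_th, v_x, cx))
```

The threshold moves linearly from Vth for a fully amorphous cell down to Vx for a fully crystalline cell, which is the behavior required by Figure 101.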

Fig. 102. Control switch circuit

A Schmitt trigger [99] is used to control the ON state of the PCM. Figure 102 shows the control switch circuit. The voltage difference between the input and output nodes of the PCM (Vin,out) is provided as input to the Schmitt trigger. The threshold voltage of this circuit varies with the crystalline fraction (as given previously in (45)). The control of the ON state must be compatible with the I-V curve (Figure 101). This is accomplished as follows: the control switch voltage (out_sm) is determined, then out_sm is used to control the switches in the main PCM circuit (Figure 97): when out_sm is equal to VDD (GND), the PCM is in the ON (OFF) state, sw2 is ON (OFF), and sw1 is OFF (ON); sw_cx1 and sw_cx4 in Figure 100 are ON if out_sm is equal to VDD.
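The hysteresis implemented by the Schmitt trigger can be sketched as a two-state update rule in Python. This is a behavioral illustration only: the function names are hypothetical, the thresholds in the example (1.2V and 0.525V) are illustrative values, and the treatment of a voltage exactly equal to a threshold is an assumption.

```python
def next_on_state(is_on, v_cell, v_th_new, v_x):
    """Update the ON/OFF state of the PCM from the voltage across the cell.

    OFF -> ON requires v_cell above the threshold v_th_new from (45);
    ON -> OFF requires v_cell below the intersection voltage v_x from (46).
    """
    if is_on:
        return v_cell >= v_x
    return v_cell > v_th_new


def sweep(voltages, v_th_new, v_x):
    """Apply a sequence of cell voltages starting from the OFF state."""
    state, states = False, []
    for v in voltages:
        state = next_on_state(state, v, v_th_new, v_x)
        states.append(state)
    return states


if __name__ == "__main__":
    print(sweep([0.0, 0.6, 1.3, 0.6, 0.5], v_th_new=1.2, v_x=0.525))
```

In the sweep, 0.6V leaves the cell OFF on the way up but keeps it ON on the way down, which is the snapback-compatible behavior that the control switch circuit must provide.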

The operational model is therefore complete; no drift is considered in this model. In the next section, the drift model of the proposed HSPICE macromodel of PCM is presented. Prior to presenting the drift simulation model as part of the proposed macromodel, an analysis of the drift is pursued first.

4.3.2. Drift Model

The operational model has been presented in a previous section; no drift has been considered in this model. In this section, the drift model is added as part of the proposed HSPICE macromodel. The resistance and threshold voltage behaviors of a PCM cell due to drift are added to the flowchart (as shown previously in Figure 96).

For establishing the drift behavior, the initial crystalline fraction and the initial Toff (i.e. the time for which the PCM cell has not been programmed prior to simulation) must be specified; these values are used to find the initial resistance and the initial threshold voltage of the PCM. During drift, Vth and RPCM increase as a function of the time after programming [86]. The drift behavior is simulated in the Drift Behavior and Estimated Rreset, Vth blocks (Figure 96).

During drift, the PCM resistance (RPCM) in (38) must be changed to account for the drift. In this dissertation, the resistance of the PCM cell during drift is given by

$$ R_{PCM} = (1 - C_{x,bd}) R_a + C_{x,bd} R_C + \Delta R \tag{47} $$

where ∆R is the total change in resistance due to drift and Cx,bd is the crystalline fraction before the drift behavior. During drift, the resistance of the PCM cell is given by (39) and is not related to the crystalline fraction. However, the crystalline fraction is used to calculate R0 (as given previously in (38)). Therefore, ∆R is given by

$$ \Delta R = R(t) - R_0 + \Delta R_{BD} \tag{48} $$

where R(t) is the time-dependent resistance of the PCM cell during drift (as given by (39)), R0 is the initial resistance (i.e. before the drift behavior), and ∆RBD is the resistance due to the past drift (i.e. prior to the current drift). Note that ∆RBD must be added to ∆R, because ∆RBD is already included in R0 and is therefore subtracted along with it.
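The bookkeeping in (47) and (48) can be checked with a short Python sketch. It assumes that R0 is the pre-drift resistance of the cell and already contains the past drift ∆RBD; all function names and the numerical values in the example are illustrative.

```python
def initial_resistance(cx_bd, r_a, r_c, delta_r_bd):
    """Pre-drift resistance R0, assumed to already include the past drift."""
    return (1.0 - cx_bd) * r_a + cx_bd * r_c + delta_r_bd


def delta_r(r_t, r0, delta_r_bd):
    """Total resistance change due to drift, from (48)."""
    return r_t - r0 + delta_r_bd


def pcm_resistance(cx_bd, r_a, r_c, d_r):
    """Resistance of the PCM cell during drift, from (47)."""
    return (1.0 - cx_bd) * r_a + cx_bd * r_c + d_r


if __name__ == "__main__":
    cx_bd, r_a, r_c = 0.2, 200e3, 7e3   # hypothetical state, resistances in ohms
    past_drift = 5e3                    # hypothetical past drift in ohms
    r0 = initial_resistance(cx_bd, r_a, r_c, past_drift)
    r_t = 180e3                         # hypothetical R(t) obtained from (39)
    d_r = delta_r(r_t, r0, past_drift)
    print(d_r, pcm_resistance(cx_bd, r_a, r_c, d_r))
```

Under this assumption the resistance obtained from (47) coincides with R(t), so the drifted resistance follows (39) independently of the crystalline fraction.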

Consider next the drift behavior of the threshold voltage (VT). Figure 103 shows that the resistance and the threshold voltage are closely related to Toff. When the resistance of the PCM increases, its threshold voltage also increases.

Fig. 103. Measured VT versus RPCM (at variable Toff, fixed reset pulse and fixed measured Toff = 5s) [86]

This linear relationship between the resistance and the threshold voltage is expressed as

$$ V_T = \gamma R_{PCM,A} + V_{T0} \tag{49} $$

where VT is the threshold voltage when the PCM is in a totally amorphous phase (Cx = 0), VT0 is the intercept on the VT axis, and γ is the slope; from Figure 103, VT0 = 0.5623 and γ = 3.64 × 10^-7. RPCM,A is the resistance of the PCM cell when it is in a totally amorphous phase. It is given by (47) for Cx,bd = 0, i.e.

$$ R_{PCM,A} = R_a + \Delta R \tag{50} $$
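A short Python sketch shows how (49) and (50) combine. The fitted values of γ and VT0 are those quoted above; it is assumed here that they are expressed in V/Ω and V respectively, and the function name is illustrative.

```python
GAMMA = 3.64e-7  # slope of the fit in Figure 103 (assumed to be in V per ohm)
VT0 = 0.5623     # intercept of the fit in Figure 103 (assumed to be in V)


def amorphous_threshold(r_a, delta_r):
    """Threshold voltage of a totally amorphous cell from (49) and (50)."""
    r_pcm_a = r_a + delta_r        # (50)
    return GAMMA * r_pcm_a + VT0   # (49)


if __name__ == "__main__":
    print(amorphous_threshold(200e3, 0.0))    # no drift, Ra = 200 kOhm
    print(amorphous_threshold(200e3, 100e3))  # hypothetical drift of 100 kOhm
```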

4.3.2.1. Changing Drift Parameter

The initial values of some parameters (such as Cx and Toff) must be specified; for the other parameters a simple adjustment suffices. The initial resistance of the PCM (R0) is given by (47) with 0 as the initial value of ∆R. Once the initial Toff value is also specified, the initial resistance and threshold voltage drifts of the PCM can be calculated. The resistance drift of the PCM cell (∆R) and the threshold voltage of the PCM cell during Toff are calculated by using (39), (41), (48), and (49). Figure 104 shows the drift parameter calculation in flowchart form; the calculation of R(t), ∆R, and VT utilizes a voltage controlled voltage source (VCVS).

Fig. 104. Flow Chart for Drift Parameter Calculation

Some parameters (RPCM, VT, and ∆R) change during simulation. The change of ∆R is considered first, because the other two parameters are changed by the drift. Crystalline programming is utilized to find the decrease in ∆R when the drift does not occur (else, ∆R is increased). Figure 105 shows the circuit for simulating the change of ∆R; the value of ∆R is given at node ∆R (as the voltage across the capacitor CR1).

Fig. 105. ∆R circuit

The initial value of ∆R is generated by the voltage controlled voltage source (∆Rinit). Switch sw1 is ON only at the beginning; it is OFF for the remaining part of the simulation time to ensure that the initial value of ∆R does not disturb the current behavior of the simulation circuit. If the PCM cell is in crystalline programming, ∆R can be reduced at the same rate as the crystalline programming rate. However, if the PCM cell is not being programmed or read, the drift behavior occurs and a new ∆R value must be added. In this case (Figure 105), the new ∆R value is passed to the ∆R node by sw2 and its new value is given by

$$ \Delta R_{new,drift} = \Delta R + \Delta R_{BD} \tag{51} $$

where ∆Rnew,drift is the new ∆R during the drift behavior and ∆RBD is the resistance drift of the PCM cell prior to simulating the current drift behavior (Figure 106). ∆R is equal to R(t) − R0 (where R0 is the initial resistance of the PCM cell prior to the drift behavior); if there is a resistance drift in R0 (denoted by ∆RBD), then this resistance value must also be included in the new ∆R, which gives (51). sw2 is used to control the drift behavior because ∆Rnew,drift is transferred to node ∆R; so, sw2 is ON only when the voltage difference across the PCM is less than 0.1V, i.e. confirming that the PCM cell is not being programmed.

During crystalline programming, ∆R decreases at a rate equal to the crystallization rate. As shown in Figure 105, the resistance RR1 adjusts the crystalline programming rate of ∆R. sw3 and sw4 establish whether the circuit is in crystalline programming as follows.

• sw3 is controlled by the voltage at node out_sm of the control switch circuit (for the ON/OFF state of the PCM cell in Figure 102). If the voltage at out_sm is equal to VDD (i.e. the cell is in the ON state), then sw3 is ON.

• sw4 determines the crystalline programming by checking the voltages at nodes out1 and out2 of the decision circuit (Figure 99): if the voltage difference between out1 and out2 is equal to VDD (i.e. the crystalline programming behavior), then this switch is ON.

So, ∆R changes depending on the state of the PCM cell. As discussed previously, during drift a new value of ∆R (given in (51)) is sent to node ∆R (Figure 105). Let ∆RBD be the value of ∆R prior to the drift behavior; ∆RBD is found using the circuit in Figure 106.

Fig. 106. Circuit for ∆R Before Drift

The initial value of ∆R is kept at node ∆RBD by utilizing switch SWBD1; SWBD1 is ON at the beginning of the simulation process and is turned OFF immediately once the simulation starts. Its purpose is to keep the value of ∆R prior to the drift; so, ∆R is kept constant in this circuit during the drift behavior, but it changes during programming. During programming, ∆R is equal to the voltage at node ∆R (Figure 105). SWBD2 checks the cell state: SWBD2 is ON when the PCM is in the ON state (out_sm = VDD), and OFF when the PCM is in the OFF state. Therefore, ∆R is kept unaltered, or it changes as needed.

The circuit in Figure 106 is also used to find the value of the crystalline fraction prior to the drift behavior (CX,BD), as well as the resistance drift and the crystalline fraction prior to programming (∆RBP, CX,BP), as follows.

• CX,BD is retained by changing the initial ∆R (∆Rinit) to the initial Cx value and ∆R for the ON state to Cx. SWBD1 and SWBD2 behave the same as in the circuit of Figure 106.

• The circuit of Figure 106 is also used to establish and retain the crystalline fraction prior to programming (CX,BP), by changing the behavior of the switch SWBD2 in the CX,BD circuit to be ON when the voltage difference between the nodes in and out is less than 0.1V; this ensures that the PCM cell is not being programmed.

• The resistance drift prior to programming (∆RBP) is also established by the circuit of Figure 106. During drift, a new ∆R is added to the circuit (it is constant during programming). ∆Rnew,drift in (51) is used as ∆R and added during the drift behavior. The value of the PCM resistance drift prior to the programming behavior (∆RBP) is then found.

The model for the drift behavior can be established after finding ∆RBD, ∆RBP, CX,BD, and CX,BP. So, the resistance of the PCM cell is now given by

$$ R_{PCM} = (1 - C_{x,rd}) R_a + C_{x,rd} R_C + \Delta R \tag{52} $$

(52) denotes the resistance of the PCM cell under different behaviors, as dependent on the crystalline fraction (Cx,rd). If the PCM cell is not being programmed, the drift behavior must be considered. Initially, the resistance of the PCM cell is equal to (39), i.e. dependent on Toff and R0. The circuit in Figure 106 is used to change the crystalline fraction as follows.

• If the PCM is in amorphous programming, the crystalline fraction is unchanged (equal to the voltage at node Cx in the crystalline fraction calculation circuit of Figure 108).

• If the PCM is in crystalline programming, Cx,rd is equal to the crystalline fraction when ∆R is less than 0 (i.e. there is no valid ∆R). However, if there is a resistance drift (i.e. a value of ∆R greater than 0), the crystalline fraction is equal to its value prior to programming (CX,BP). Then, ∆R is reduced at the same rate as the crystalline programming rate.

Fig. 107. Crystalline fraction of PCM cell at different circuit behavior (Cx,rd)

4.3.2.2. Crystalline Programming

Figure 107 shows the simulation circuit for finding the crystalline fraction at different behaviors. In Figure 107, sw1 is used to set the initial value of the crystalline fraction, so sw1 is only ON at the start of the simulation. The initial crystalline fraction is given at node Cx,rd. During the drift behavior, sw2 is ON to retain the crystalline fraction value prior to the drift behavior (Cx,bd).

The resistance of the PCM cell is then calculated as follows. From (48),

$$ \Delta R = R(t) - R_0 + \Delta R_{bd} \tag{48} $$

where R0 is the (initial) resistance of the PCM cell prior to the drift behavior, given by

$$ R_0 = (1 - C_{x,bd}) R_a + C_{x,bd} R_C + \Delta R_{bd} \tag{53} $$

∆Rbd is the resistance drift of the PCM cell prior to the drift behavior; by combining (52) (with Cx,rd equal to Cx,bd) with (48) and (53), the resistance of the PCM cell is written as

$$ R_{PCM} = (1 - C_{x,bd}) R_a + C_{x,bd} R_C + \Delta R $$

$$ R_{PCM} = (1 - C_{x,bd}) R_a + C_{x,bd} R_C + R(t) - R_0 + \Delta R_{bd} $$

R0 is found from (53), so the resistance of the PCM cell is made equal to (39), i.e.

$$ R_{PCM} = R(t) \tag{54} $$

Switch sw3 is used to check the PCM behavior. When the PCM cell is being programmed, sw3 is ON; sw3 is controlled by the output voltage of the control switch circuit, i.e. when the voltage at node out_sm is equal to VDD (1V), sw3 is ON (else, it is OFF). The change in the crystalline fraction is found after establishing the behavior. If the voltage difference between out2 and out1 is equal to 1V (VDD), amorphous programming has occurred and sw4 is ON (else it is OFF). When the PCM is in amorphous programming (Figure 107), the crystalline fraction is equal to Cx (Figure 108), because ∆R does not change during amorphous programming, and

$$ R_{PCM} = (1 - C_x) R_a + C_x R_C + \Delta R \tag{55} $$

Two conditions can occur for crystalline programming (Figure 107). Switch sw7 is ON when ∆R is higher than zero, while switch sw6 is ON when ∆R is less than zero (i.e. there is no resistance drift). In Figure 107, switch sw5 is used to check the crystalline programming behavior; it is ON when crystalline programming occurs. Having established that the PCM is in crystalline programming, ∆R is found as follows.

• If ∆R is higher than zero, then there is a resistance drift during crystalline programming; ∆R is reduced at the same rate as the crystalline programming rate (note that the crystalline fraction during crystalline programming is initially fixed to Cx,BP).

• If the resistance drift is equal to zero, the current crystalline fraction (Cx in the crystalline fraction calculation circuit) is used.

As per the previous discussion, if ∆R reduces to zero during crystalline programming and crystalline programming still continues, the crystalline fraction (Cx,rd) changes from Cx,BP to Cx. This may result in a non-continuous simulated behavior of the resistance of the PCM cell; hence, a circuit must be included to avoid this erroneous feature in the model. Figure 108 shows the crystalline fraction calculation circuit when considering the drift behavior. The circuit in dashed lines is added to the basic crystalline fraction calculation circuit to prevent the non-continuous modeling of the resistance. The crystalline fraction during drift is changed in the circuit of Figure 108 as follows.

Fig. 108. Crystalline fraction calculation circuit under drift behavior

The crystalline fraction (Cx) is equal to zero (one) (given previously in (38)) when the PCM cell is in a totally amorphous (crystalline) phase, so

$$ C_x = \frac{R_{Reset} - R_{PCM}}{R_{Reset} - R_{set}} \tag{56} $$

During the drift behavior, RPCM is calculated from (52) (and it is dependent on ∆R and Cx). When RPCM increases, Rreset should increase too, else the crystalline fraction of the PCM cell (Cx) could be negative. To avoid this erroneous state, in this dissertation Rreset is assumed to increase at the same rate as ∆R; so, Cx in (56) is no longer negative. The increase of Rreset during the drift behavior is given by

$$ R_{Reset,drift} = R_a + \Delta R \tag{50} $$

where Ra is the initial reset resistance of the PCM cell. From (50), the crystalline fraction during drift (CX,dd) is

$$ C_{X,dd} = \frac{R_{Reset,drift} - R_{PCM,drift}}{R_{Reset,drift} - R_{Set}} \tag{57} $$

where RPCM,drift is the resistance of the PCM cell during the drift behavior and RReset,drift is the maximum resistance of the PCM during the drift behavior (and therefore the crystalline fraction is between 0 and 1).
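The effect of scaling the reset resistance can be illustrated with a minimal Python sketch of (56) and (57). The resistances Ra = 200kΩ and Rset = 7kΩ follow Table 58, while the drift of 50kΩ and the function names are hypothetical.

```python
def crystalline_fraction(r_pcm, r_reset, r_set):
    """Crystalline fraction without drift correction, from (56)."""
    return (r_reset - r_pcm) / (r_reset - r_set)


def crystalline_fraction_drift(r_pcm_drift, r_a, r_set, delta_r):
    """Crystalline fraction during drift, from (57), with the reset
    resistance increased by the drift as in (50)."""
    r_reset_drift = r_a + delta_r
    return (r_reset_drift - r_pcm_drift) / (r_reset_drift - r_set)


if __name__ == "__main__":
    r_a, r_set, d_r = 200e3, 7e3, 50e3   # d_r is a hypothetical drift
    r_drifted = r_a + d_r                # fully amorphous cell after drift
    print(crystalline_fraction(r_drifted, r_a, r_set))             # negative
    print(crystalline_fraction_drift(r_drifted, r_a, r_set, d_r))  # zero
```

Applying (56) directly to a drifted amorphous cell gives a negative fraction, whereas (57) keeps the result within the range from 0 to 1.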

An additional circuit is added for a comprehensive calculation of RPCM under the different behaviors; its function is as follows. During the drift behavior, the crystalline fraction changes as in (57). Switch sw_d2 checks the behavior; so if the PCM is not being programmed (Vin,out is less than 0.1V), switch sw_d2 is ON. The new crystalline fraction is added to the main crystalline fraction calculation circuit. If the voltage across the PCM is higher than or equal to 0.1V, sw_d2 is OFF.

Switch sw_d1 is ON when ∆R is 0, so the PCM can be programmed according to the operational model of the previous section. However, if ∆R is higher than zero, sw_d1 is OFF and the crystalline programming from Vd is not passed to the Cx node; in this case, sw_d3 and sw_d4 are ON. Switch sw_d3 is ON when crystalline programming is occurring, while switch sw_d4 is ON when ∆R is higher than zero. For the circuit in Figure 108, if ∆R is higher than zero and the cell is in crystalline programming, the new Cx value (given by (57)) is added to the crystalline fraction circuit. If the resistance drift (∆R) is zero, sw_d4 is OFF and sw_d1 is ON, and the crystalline programming continues.

As for the threshold voltage drift of a PCM cell, it depends on the crystalline fraction (as given in (45)); in (45), the drifted threshold voltage is used as the value of Vth when the PCM is in a totally amorphous state (Cx = 0). The threshold voltage of the PCM cell at a different crystalline fraction and drift time can then be found.

4.3.2.3. Toff Calculation

An important parameter for simulating the drift behavior is Toff, i.e. the drift time that elapses while the PCM cell is not being programmed.

Fig. 109. Toff Calculation Circuit

An integrator with a reset switch (Figure 109) is used as the Toff calculation circuit. Vtoff is the input voltage and is defined as follows.

• Vtoff is equal to 1V when the voltage difference across the PCM is less than 0.1V. This starts the Toff calculation process.

• If the voltage difference across the PCM is higher than 0.1V, Vtoff is equal to zero, and no drift behavior is simulated.

The reset switch is ON when the PCM is in the ON-state (out_sm = 1V); in this case, the value of Toff is reset because no Toff is calculated during programming. Since Toff is found by an integrator, the simulation must start by programming/reading the PCM cell and setting the initial input of the integrator to 0. So, the resistance of the PCM cell during the drift behavior is found from (39), and if Toff is less than T0, the correct initial conditions are provided, i.e. Toff is made equal to T0 and ∆R to 0.

4.3.2.4. Programming Time and PCM Range

Based on the previously described simulation model, the programming time depends on Rc and Rd in the crystalline fraction calculation circuit (Figure 100) and RR1 in the ∆R circuit (Figure 105). These resistances are not constant; they vary with the PCM range (defined as the difference in resistance between the fully amorphous and the fully crystalline phases, Rreset − Rset). For example, in [91] Rreset and Rset are given as 200 kΩ and 7 kΩ respectively; the programming time of the Reset state (Treset) is 10ns, while the programming time of the Set state (Tset) is 200ns. The values of Rc and Rd in the crystalline fraction calculation circuit and RR1 in the ∆R circuit (Figure 105) are given as follows.

$$ R_c = \frac{37 \times 10^{-3} \times range}{193 \times 10^{3}} \tag{58} $$

$$ R_d = \frac{3 \times 10^{-3} \times range}{193 \times 10^{3}} \tag{59} $$

$$ R_{R1} = \frac{37 \times 10^{-3} \times \Delta R_{BP}}{193 \times 10^{3}} \tag{60} $$

where “range” in (58) and (59) is defined as the difference between the expected maximum resistance when the PCM is in a totally amorphous state (RReset,drift, given by (50)) and the resistance when the PCM is in a totally crystalline state, i.e. RReset,drift − RC, and ∆RBP is the resistance drift prior to programming.
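The scaling in (58)-(60) can be sketched in Python as follows. The function and argument names are illustrative; the reference range of 193kΩ corresponds to Rreset − Rset = 200kΩ − 7kΩ from [91], and the value of ∆RBP in the example is hypothetical.

```python
def programming_resistors(r_reset_drift, r_crystalline, delta_r_bp):
    """Return (Rc, Rd, RR1) from (58), (59) and (60).

    r_reset_drift: expected maximum (amorphous) resistance, in ohms
    r_crystalline: resistance of the fully crystalline cell, in ohms
    delta_r_bp:    resistance drift prior to programming, in ohms
    """
    pcm_range = r_reset_drift - r_crystalline
    r_c = 37e-3 * pcm_range / 193e3    # (58)
    r_d = 3e-3 * pcm_range / 193e3     # (59)
    r_r1 = 37e-3 * delta_r_bp / 193e3  # (60)
    return r_c, r_d, r_r1


if __name__ == "__main__":
    # Reference case: no drift in the reset resistance, hypothetical delta_r_bp
    print(programming_resistors(200e3, 7e3, 19.3e3))
```

For the reference range of 193kΩ the expressions reduce to Rc = 37 × 10^-3 and Rd = 3 × 10^-3, and a wider range scales both resistances proportionally.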


4.3.3. Model Simulation

This section presents the simulation results of the proposed macromodel. The sources of the experimental data are as follows: [84] for the temperature calculation, [91] for the physical parameters and [86, 87, 88] for degradation calculation. The proposed macromodel can be adjusted to start as specified by the initial time step; this corresponds to the parameter tstart. Then, tstart is used to calculate the simulation time from the integrator circuit. Table 58 shows the physical parameters of the PCM cell used in the simulation; Table 59 shows the calibrated parameters at electrical level in HSPICE. These values are selected as follows.

1. To find the simulation time (or Toff), a very fast input voltage must be provided to the integrator or the integrator with reset (Figure 109). This pulse (from 0 to 1V) is provided as input to the in node. By using the calibrated values in Table 59, the initial simulation time is given by

$$ t = -\frac{V_{out,integ} * tstart}{0.1} \tag{61} $$

where t is the initial simulation time or Toff, Vout,integ is the output voltage of the integrator, and tstart is the initial time step that is used in simulation.

2. The value of the simulation time is given by multiplying the initial time in (61) by 1000. This is subsequently divided back to its original value by the temperature calculation circuit.

3. In the simulations of this dissertation, t0 is selected as 1ns; Toff is divided by t0 (as per (39)).

4. 0.1V is used as the threshold. If the voltage difference across the PCM is smaller than 0.1V, it is assumed that the cell is not being programmed and the drift behavior occurs. If Vin,out is higher than 0.1V, the drift behavior does not occur.


Table 58. Physical parameters for PCM simulation

Parameter Value

Radius of active region (r1) 50 nm

Radius of PCM cell (r2) 100 nm

Thermal conductivity of Ge2Sb2Te5 (k) 4.63*10^-3 J.cm^-1.K^-1.s^-1

Thermal capacity of Ge2Sb2Te5 (C) 1.25 J.cm^-3.K^-1

Volume of PCM cell (V) 7*10^-14 cm^3

Glass Transition Point (Tx) 200 °C

Melting Point (Tm) 600 °C

Static resistance of reset (Rreset) 200 kΩ

Static resistance of set (Rset) 7 kΩ

Dynamic-On Resistance (Ron) 1 kΩ

Holding Voltage (Vh) 0.45 V

Programming time of Reset (Treset) 10 ns

Programming time of Set (Tset) 200 ns

Table 59. Calibrated parameters for PCM simulation

Figure Circuit Name Parameter Value

101 Main Rtest1 1Ω

102, 113 Integrator Ri 1Ω

102, 113 Integrator Ci 10*tstart

104, 112 Cry. Fraction Cm 1µF

109 ∆R Estimation CR1 1µF

110 ∆R Before Drift CR2 15µF

111 Cx,rd C1 1µF

By selecting the model parameters as detailed above, the electrical characteristics of a PCM cell are simulated and assessed as follows.

4.3.3.1. R-I and I-V Curves

To assess the electrical characteristics of a PCM cell, the so-called R-I curve must be generated; this plot allows the validity of the proposed macromodel to be tested against the measured behavior of fabricated devices.

Fig. 110. I-R curve of PCM cell

Fig. 111. I-V curve of PCM cell

A pulse sequence must be provided for generating the R-I curve; this sequence consists of Reset, Read and Set pulses with an increasing amplitude for the Set pulse until it reaches the same amplitude as the Reset pulse. The simulated R-I curve is given in Figure 110. The I-V plot is generated next, using the same programming pulse sequence. Figure 111 shows the I-V curve of the PCM cell when its initial state is amorphous (Cx = 0), crystalline (Cx = 1), or partial (Cx = 0.9). The simulation results show that the snapback behavior of the PCM cell is generated by the proposed macromodel. Moreover, the threshold voltage is accurately characterized as a function of the crystalline fraction; this plot closely resembles Figure 101 (taken from [94]).

4.3.3.2. Evaluation of Drift Behaviors

Figures 112 and 113 present the plots of the drift behaviors. The dependence of the resistance or threshold voltage drift on Toff and the crystalline fraction is evident. When the PCM cell is nearly amorphous (Cx is close to 0), the drift behavior is more pronounced than when the PCM cell is nearly in the crystalline state (Cx is close to 1).

Fig. 112. Resistance drift of PCM at different crystalline fraction (Cx)

Fig. 113. Threshold voltage drift of PCM cell at different crystalline fraction (Cx)

The threshold voltage drift (Figure 113) increases when Toff is increased. Crystalline programming is simulated by providing an input voltage higher than the threshold value for the PCM cell to be in the ON state; then, the input voltage is decreased until the temperature of the PCM cell lies between the glass transition point and the melting point.

Next, a comparison is made between the resistance and threshold voltage drifts generated by the proposed macromodel and experimental data.

• For the resistance drift at an annealing temperature of 65 °C, the simulation results show that for a reset resistance (Rreset) of 500kΩ and a crystalline fraction (Cx) of 0 (totally amorphous phase), the simulated values are very close to the experimental results of [88] (Figure 114).

• For the threshold voltage drift, the experimental data of [86] (Figure 103) is utilized to establish the relationship between the resistance of the PCM and the threshold voltage. As shown in Figure 115, simulation using the proposed macromodel captures with significant accuracy the relationship between the threshold voltage and the PCM resistance encountered in the experimental data [86].

Fig. 114. Resistance drift of experimental data [88] and simulation results of proposed macromodel (at R0 = 500 kΩ)

Fig. 115. Plot of threshold voltage (Vth) versus PCM resistance of experimental data [86] and simulation results of proposed macromodel

Tables 60 and 61 present the percentage errors of the resistance and threshold voltage drifts of the PCM cell.

Table 60. Comparison of the resistance drift at different Toff

Resistance of PCM cell (MΩ)

TOFF (s) Experimental [88] Simulated Error %

1*10^-4 1.4 1.3609 -2.79286

1*10^-3 1.65 1.6627 0.769697

1*10^-2 2.05 2.0313 -0.9122

1*10^-1 2.5 2.4817 -0.732

1 3.1 3.0319 -2.19677

Table 61. Comparison of the threshold voltage drift at different PCM resistance values

Threshold Voltage (V)

RPCM (kΩ) Experimental [86] Simulated Error %

410 0.71 0.71154 0.216901

510 0.75 0.74794 -0.27467

650 0.8 0.7989 -0.1375

800 0.85 0.8535 0.411765

These Tables show that the experimental values for the resistance and threshold voltage drifts of [86, 88] are very close to the simulation results using the proposed macromodel. In both cases, the error is at most 3%, thus confirming the accuracy of the macromodel of this paper.
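The error columns of Tables 60 and 61 can be reproduced from the experimental and simulated values. The short Python sketch below does this, assuming the error is defined as the simulated value minus the experimental value, relative to the experimental value (a definition that matches the tabulated numbers); the function and variable names are illustrative only.

```python
def percent_error(simulated, experimental):
    """Relative error of the simulated value, in percent."""
    return 100.0 * (simulated - experimental) / experimental


# (Toff in s, experimental MΩ, simulated MΩ), copied from Table 60
TABLE_60 = [
    (1e-4, 1.4, 1.3609),
    (1e-3, 1.65, 1.6627),
    (1e-2, 2.05, 2.0313),
    (1e-1, 2.5, 2.4817),
    (1.0, 3.1, 3.0319),
]

# (R_PCM in kΩ, experimental Vth in V, simulated Vth in V), copied from Table 61
TABLE_61 = [
    (410, 0.71, 0.71154),
    (510, 0.75, 0.74794),
    (650, 0.8, 0.7989),
    (800, 0.85, 0.8535),
]


def max_abs_error(rows):
    """Largest error magnitude (in percent) over all rows of a table."""
    return max(abs(percent_error(sim, exp)) for _, exp, sim in rows)


if __name__ == "__main__":
    for label, rows in (("Table 60", TABLE_60), ("Table 61", TABLE_61)):
        for key, exp, sim in rows:
            print(label, key, round(percent_error(sim, exp), 5))
        print(label, "max |error| %:", round(max_abs_error(rows), 5))
```

Both maxima remain below the 3% bound stated above.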

4.3.3.3. Model Comparison

Next, the proposed PCM macromodel is compared with the macromodel of [98] by the same authors. The macromodel presented in this manuscript is very different from [98]. The first difference is its circuit simplicity; this feature is evident when considering the multiplier circuits (such as for the computation of 0.1υr) required to simulate the drift behavior of the PCM cell. In [98] this important computational step is also dependent on the simulation time, making the PCM macromodel of [100] more complicated. In the proposed macromodel, a single voltage controlled voltage source (VCVS) is used to generate the drift behavior. The complexity of macromodelling is also reflected in the accuracy, because the additional circuits utilized in [98] generate a larger percentage of error in the simulation process [100]. For example, the error between simulated and experimental data for the resistance drift grows exponentially as a function of Toff (at values of Toff greater than 500ns the error exceeds 4%). By comparison, the proposed macromodel has an error below 1% even at a value of Toff of 1ms.

As for the drift behaviors of the resistance and threshold voltage, the model of [98] assumes that when the PCM cell is not being programmed, the reset (amorphous) resistance (Ra) increases with time, while the crystalline fraction of the PCM cell (Cx) remains constant [100]. This is a simplifying assumption of the macromodel of [98] and it further contributes to the previously described error. In the macromodel proposed here, a different scheme is utilized for the drift behavior; the rate of increase of the PCM resistance is based on the initial PCM resistance (R0) and Toff (as given in (39)). Moreover, in [98] the drift exponent of the PCM cell (υr) used to model the resistance drift has a constant value, whereas in the proposed macromodel υr depends on R0. Therefore, the resistance drift in the proposed macromodel is more accurate and a smaller error with respect to experimental data is encountered. The constant value of the drift exponent is also used in [98] to model the threshold voltage drift of the PCM [100], thus incurring a similar error for this drift behavior as well.

4.4. Applications of Phase Change Memory (PCM)

Applications of Phase Change Memory (PCM) are considered next. By using PCM as nonvolatile storage elements, different types of PCM-based nonvolatile memory cells are generated. In this dissertation, Phase Change Memory (PCM) is used as the nonvolatile storage element of CAM and TCAM cells.

4.4.1. PCM-Based CAM and TCAM Cells

In this section, the basic principles of the proposed cells are presented. The basic memory core consists of a phase change memory (PCM) as storage element and a CMOS transistor as control element, i.e. this is a 1T1P memory core.

Fig. 116. Block diagram of proposed PCM-based CAM/TCAM cells

Figure 116 presents the block diagram of the proposed CAM and TCAM cells inclusive of the write and search circuitry. The write operation is performed by the write driver. For the search operation, the data in the 1T1P memory core must be read and its value is established using a differential sense amplifier. It is then compared with the search data by using the comparison circuit, such that the output of the search operation is generated. If the stored data is the same as the search data, a match outcome is generated. However if the stored data is different from the search data, the output generates a mismatch outcome. So in the proposed cells, circuits are added to the 1T1P memory core; a differential sense amplifier is used to read the stored data, while comparison between stored and search data is accomplished using a dedicated circuit.

1T1P Memory Core

Fig. 117. The proposed 1T1P memory core

Figure 117 shows the 1T1P core in which a PCM is used as storage element and an NMOS transistor (M1) is used as control element. The write and read operations of this proposed 1T1P memory core are established by controlling the voltages at the bitline (BL) and the word line (WL).

Write Operation

For this operation, the write voltage is obtained as input from BL, while WL is used as selection line. When the word line voltage (VWL) is at VDD, the transistor M1 is ON; the voltage of BL (VBL) is passed through M1 and drops across the PCM. The 1T1P core can be written based on the value of VBL. The relationship between the data value and the resistance of the PCM is given as follows.

• The state ‘0’ corresponds to the amorphous phase of the PCM (high resistance value).

• The state ‘1’ corresponds to a low resistance value when the PCM is in the crystalline phase.

• The state ‘2’ or don’t care state (as required for TCAM operation) is given by an intermediate resistance programmed by an intermediate phase, i.e. between the amorphous and the crystalline phases.

Read Operation

Initially, the bitline is precharged to Vread; as the word line is at VDD, M1 is ON. VBL is therefore applied through M1 across the PCM, and the data stored in the core is found by checking the value of VBL. If a ‘1’ (low PCM resistance) is stored in the 1T1P core, the bitline is easily discharged to GND, so the value of VBL is very low. However if a ‘0’ is stored, the value of VBL is higher than for state ‘1’. Therefore, the data stored in the memory core is correctly read.

The PCM must not switch from the OFF to the ON state during a read operation; the voltage drop across the PCM must therefore remain below the threshold voltage. Accordingly, the read voltage to which the bitline of the 1T1P core is precharged is limited to the value of Vh; as the holding voltage of the PCM is less than VDD, the value of VBL must change to reflect the data stored in the memory core.

Differential Sense Amplifier

The match or mismatch outcome of the proposed CAM and TCAM cells is generated by using a differential sense amplifier for the data stored in the memory core and by employing at least one ambipolar transistor to compare the stored data with the search data. At the designated read time, a differential sense amplifier is required for changing VBL to a two-valued voltage (i.e. either GND (0V) or VDD), corresponding to the state stored in the 1T1P core.

Fig. 118. Differential sense amplifier [9]

Figure 118 shows the differential sense amplifier of [9]; the difference in values is found by comparing VBL with the threshold voltage of the differential sense amplifier (Vths), then inverters are employed to drive the voltage difference to the output (Vout). If a ‘0’ is stored in the 1T1P core, VBL is high; if VBL is higher than Vths, then the voltage at node out is at GND (0V). If a ‘1’ is stored in the 1T1P core, the input voltage of the differential sense amplifier (VBL) is less than Vths, so the voltage at node out is VDD.

Two differential sense amplifiers are required for the intermediate state of the TCAM operation; the values of the threshold voltages of the differential sense amplifiers (Vths1, Vths2) are set between the ‘0’ and ‘2’ and the ‘2’ and ‘1’ states respectively, while the output voltages are given at nodes O1 and O2 (Figure 119).

Fig. 119. 1T1P memory core and differential sense amplifiers for TCAM operation

Table 62. Output voltages of differential sense amplifiers for CAM and TCAM operations

State CAM Vout (V) TCAM VO1 (V) TCAM VO2 (V)

0 0 0 0

2 N/A 0 VDD

1 VDD VDD VDD

Table 62 shows the output voltages of the differential sense amplifiers used for CAM (Vout) and TCAM operation (VO1, VO2). The data stored in the 1T1P memory core is detected from these output voltages.
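The mapping of Table 62 can be read in reverse to recover the stored state from the sense amplifier outputs. The following Python sketch illustrates this; the function names and the use of VDD/2 as the logic decision level are assumptions made for illustration, while the supply voltage of 0.9V is the value used in the simulations of this chapter.

```python
VDD = 0.9  # supply voltage (V) used in the simulations


def is_high(voltage, vdd=VDD):
    """Treat a node as logic high above VDD/2 (assumed decision level)."""
    return voltage > vdd / 2


def decode_cam(v_out):
    """CAM: Vout = 0 V -> state 0, Vout = VDD -> state 1 (Table 62)."""
    return 1 if is_high(v_out) else 0


def decode_tcam(v_o1, v_o2):
    """TCAM: (O1, O2) = (0, 0) -> 0, (0, VDD) -> 2, (VDD, VDD) -> 1."""
    o1, o2 = is_high(v_o1), is_high(v_o2)
    if o1 and o2:
        return 1
    if not o1 and o2:
        return 2
    if not o1 and not o2:
        return 0
    raise ValueError("(O1, O2) = (VDD, 0) is not listed in Table 62")


if __name__ == "__main__":
    print(decode_cam(0.0), decode_cam(VDD))
    print(decode_tcam(0.0, 0.0), decode_tcam(0.0, VDD), decode_tcam(VDD, VDD))
```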

4.4.1.1. Comparator Circuits

After the data stored in the 1T1P memory core is adjusted by the differential sense amplifier to a two-value voltage (0V and VDD), a comparison circuit is used to compare the stored voltage with the search voltage. The match or mismatch outcome of the proposed CAM and TCAM cells is generated by using the match line voltage (VML) for the output of the comparator circuit.

CMOS-Based Comparator Circuits

Figure 120 presents the CMOS comparator circuit for CAM. By precharging VML to VDD prior to the search operation, the stored voltage is provided at node out (while the search voltage is at node search). If there is a match with the stored data, VML retains its value, else it is discharged.

Fig. 120. CMOS-based CAM comparator circuit

The TCAM comparison circuit is given in Figure 121; the stored voltages from the outputs of the two differential sense amplifiers are provided at nodes O1 and O2, while the search voltages are at nodes S1 and S2. This circuit operates in the same way as its CAM counterpart.

Fig. 121. CMOS-based TCAM comparator circuit

Ambipolar-Based Comparator Circuit

The comparator circuit for CAM operation is shown in Figure 122; the stored voltage (as output voltage of the differential sense amplifier) is connected to the gate of the ambipolar transistor (AMB) while the search voltage is connected to the polarity gate of the ambipolar transistor.

Fig. 122. Ambipolar-based CAM comparator circuit

The search voltage (Vsearch) is connected to the polarity gate of AMB for control, i.e. for a search ‘0’ (‘1’) operation, Vsearch is at GND (VDD), so AMB behaves as NMOS (PMOS). The (match or mismatch) outcome of the CAM is established by precharging the match line voltage (VML) to VDD and using the output voltage from the differential sense amplifier (Vout) as follows.

• Search ‘0’ operation: Vsearch is at GND, AMB behaves as an NMOS and VML is precharged to VDD. Based on Table 62, if a ‘0’ is stored in the 1T1P core, Vout is at GND, so AMB is OFF and VML does not change its value, i.e. the match outcome is generated. If state ‘1’ is stored in the 1T1P core, Vout is at VDD. As AMB operates as an NMOS and its gate voltage is at VDD, AMB is ON and VML is discharged; so, the mismatch outcome is generated.

• Search ‘1’ operation: Vsearch is at VDD and AMB behaves as a PMOS. If a ‘1’ is stored in the 1T1P core, Vout is also at VDD; as the gate voltage of AMB is at VDD, AMB is OFF. VML retains its value, so the match outcome is generated. If a ‘0’ is stored in the 1T1P core, Vout is at GND; as the gate voltage of AMB is at GND, AMB is ON and VML is discharged. A mismatch outcome is then generated.
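The two search cases above reduce to a simple rule: the ambipolar transistor conducts, and the match line is discharged, only when the stored and search bits differ. A minimal behavioral sketch in Python follows; it is a logic-level abstraction with illustrative names, not a circuit-level model of the ambipolar device.

```python
def amb_is_on(gate_high, polarity_high):
    """Ambipolar transistor abstraction.

    Polarity gate low  -> n-type behavior: ON when the gate is high.
    Polarity gate high -> p-type behavior: ON when the gate is low.
    """
    return gate_high != polarity_high


def cam_search(stored_bit, search_bit):
    """Return the outcome on the precharged match line."""
    v_out_high = stored_bit == 1      # sense amplifier output drives the gate
    v_search_high = search_bit == 1   # search voltage drives the polarity gate
    discharged = amb_is_on(v_out_high, v_search_high)
    return "mismatch" if discharged else "match"


if __name__ == "__main__":
    for search in (0, 1):
        for stored in (0, 1):
            print("search", search, "stored", stored, "->", cam_search(stored, search))
```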

Fig. 123. Ambipolar-based TCAM comparator circuit

Figure 123 shows the comparator circuit for TCAM operation; it employs 2 ambipolar transistors. As for CAM, the match line voltage (VML) is precharged to VDD. The search data is provided at the polarity gate of each ambipolar transistor (i.e. the search voltage of each state is at lines S1 (VS1) and S2 (VS2)), while each stored data value from the differential sense amplifiers is provided to the gate of each ambipolar transistor. Table 63 shows the operation of this circuit for TCAM.

• Search ‘0’ operation: the voltages at S1 and S2 are both at GND. If a ‘0’ is stored in the 1T1P core, the voltages at O1 and O2 are at GND (Table 62). Both ambipolar transistors (AMB1 and AMB2) behave as NMOS and their gate voltages are at GND; so AMB1 and AMB2 are OFF, VML retains its value and the match outcome is generated. If a ‘1’ is stored in the 1T1P core, the voltages at O1 and O2 are at VDD. As both ambipolar transistors behave as NMOS and their gate voltages are at VDD, both ambipolar transistors are ON and VML is discharged; the mismatch outcome is generated. If a ‘2’ is stored, the voltages at O1 and O2 are at GND and VDD respectively. Both ambipolar transistors behave as NMOS and their gate voltages are at GND and VDD, so AMB1 is OFF while AMB2 is ON. VML retains its value and the match outcome is generated.

• Search ‘1’ operation: the voltages at S1 and S2 are at VDD. If a ‘0’ is stored, the voltages at O1 and O2 are at GND. Both ambipolar transistors behave as PMOS and their gate voltages are at GND; the ambipolar transistors are then ON and the match line voltage (VML) is discharged. A mismatch outcome is generated. If a ‘1’ (‘2’) is stored, the voltages at O1 and O2 are at VDD (GND and VDD respectively). AMB1 and AMB2 are OFF (ON and OFF respectively); VML retains its value and a match outcome is generated.

• Search ‘2’ operation: the voltages at S1 and S2 are at GND and VDD respectively. When a ‘0’ (‘1’) is stored in the 1T1P memory core, the voltages at O1 and O2 are at GND (VDD), AMB1 behaves as NMOS and is OFF (ON) while AMB2 behaves as PMOS and is ON (OFF). Since only one ambipolar transistor is ON, VML retains its value and a match outcome is generated. When a ‘2’ is stored in the 1T1P core, the voltages at O1 and O2 are at GND and VDD respectively. AMB1 behaves as NMOS, while AMB2 behaves as PMOS. Their gate voltages are at GND and VDD respectively, so both AMB1 and AMB2 are OFF and VML retains its value. The match outcome is generated.

Table 63. Voltages at nodes O1, O2, S1, S2, and match line voltage of proposed TCAM comparator circuit

Search VS1 VS2 Stored VO1 VO2 VML Outcome

0 0 0 0 0 0 VDD Match

0 0 0 1 1 1 GND Mismatch

0 0 0 2 0 1 VDD Match

1 1 1 0 0 0 GND Mismatch

1 1 1 1 1 1 VDD Match

1 1 1 2 0 1 VDD Match

2 0 1 0 0 0 VDD Match

2 0 1 1 1 1 VDD Match

2 0 1 2 0 1 VDD Match
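The nine rows of Table 63 can be regenerated from a logic-level abstraction of the two ambipolar transistors. The Python sketch below assumes that the match line is discharged only when both transistors conduct, which is inferred from the statement that VML retains its value when only one transistor is ON; the dictionary and function names are illustrative.

```python
# Stored state -> logic levels at (O1, O2), from Table 62
SENSE = {0: (0, 0), 2: (0, 1), 1: (1, 1)}
# Search state -> logic levels at (S1, S2), from Table 63
SEARCH = {0: (0, 0), 1: (1, 1), 2: (0, 1)}


def amb_is_on(gate, polarity):
    """n-type when the polarity gate is low, p-type when it is high."""
    return gate != polarity


def tcam_search(stored, search):
    """Outcome on the precharged match line for one stored/search pair."""
    o1, o2 = SENSE[stored]
    s1, s2 = SEARCH[search]
    # Assumption: VML discharges only if both AMB1 and AMB2 are ON.
    discharged = amb_is_on(o1, s1) and amb_is_on(o2, s2)
    return "Mismatch" if discharged else "Match"


if __name__ == "__main__":
    for search in (0, 1, 2):
        for stored in (0, 1, 2):
            print("search", search, "stored", stored, tcam_search(stored, search))
```

Under this abstraction a mismatch is reported only for search ‘0’ with stored ‘1’ and for search ‘1’ with stored ‘0’; every case involving state ‘2’ matches.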

Simulation Results

The simulation results of the proposed CAM/TCAM cells are presented in this section.

HSPICE is used as simulation tool and the basic PCM model described in section 4.3.1 is employed for the PCM; its resistance range is initially given by 7kΩ-200kΩ. The macromodel of Figure 124 is utilized for the ambipolar transistor; its transistor sizes are adjusted to obtain symmetric conduction between the PMOS and NMOS behaviors. The different circuits of the proposed CAM/TCAM cells are initially evaluated separately. Unless explicitly stated, simulation is performed at a CMOS feature size of 32nm and a supply voltage (VDD) of 0.9V. The performance of the memory cells is obtained by combining the performance of the different circuits.

Fig. 124. Model of an ambipolar transistor

1T1P Core

The two basic operations (read and write) of the 1T1P core (Figure 117) are considered first.

Write Operation

For the write operation, the bitline voltage (VBL) is controlled while the word line voltage (VWL) is set to VDD; VBL is passed through M1 and is dropped across the PCM. During the write operation, the PCM is switched to the ON state and its crystalline fraction (Cx) is changed. Figure 125 shows the write time versus the range of the PCM resistance (i.e. from the amorphous to the crystalline phase); as expected, this relationship is linear.

Fig. 125. Write time vs. PCM resistance range of 1T1P memory core when the PCM is programmed from amorphous (‘0’) to crystalline phase (‘1’)

Read Operation

The read operation requires precharging VBL to Vread; when VWL is at VDD, the data in the memory core is read and detected from the bitline voltage. Figure 126 shows the read time versus VBL of the 1T1P core for a read operation; the bitline voltage varies with the read time based on the PCM resistance stored in the core. If a ‘0’ is stored, the PCM is in an amorphous phase and its resistance is high (200kΩ); so the bitline voltage for the read operation of state ‘0’ is higher than for state ‘1’ (in this case, its resistance has the lower value of 7kΩ). Hence at a low PCM resistance (state ‘1’), VBL is easily passed through the PCM cell to GND.

Fig. 126. Bitline voltage of a 1T1P core for a read operation (the bitline capacitance is 0.03pF)

Intermediate PCM Resistance

In a CAM, the bitline voltages (VBL) for the PCM resistances of the ‘0’ (200kΩ) and ‘1’ (7kΩ) states are different; the VBL of each state is detected when the 1T1P core is read between 0.2ns and 1ns. If ternary data is stored, state ‘2’ (and its intermediate PCM resistance) must also be considered. In this section, the selection of the intermediate PCM resistance of the 1T1P core for the “don’t care” state (‘2’) in TCAM operation is considered.

Fig. 127. Bitline voltage of 1T1P memory core when the intermediate PCM resistance is varied

Figure 127 shows that VBL of the 1T1P core during the read operation varies based on the PCM resistance. The criterion for selecting the intermediate PCM resistance for state ‘2’ is that its bitline voltage should be in the middle of the voltages for state ‘0’ (200kΩ) and state ‘1’ (7kΩ). Such a value should not be biased toward either of these two states; therefore for the read operation, the voltage difference between state ‘0’ and the intermediate state must be nearly equal to the voltage difference between state ‘1’ and the intermediate state.

Table 64. Read time and bitline voltage difference (between state ‘0’ and intermediate state and between state ‘1’ and intermediate state) at intermediate PCM resistance values

Intermediate PCM Resistance Read Time (ns) Bitline Voltage Difference (V)

20kΩ 0.378 0.135

30kΩ 0.83 0.19

50kΩ 1.64 0.178

70kΩ 2.514 0.15

80kΩ 3.015 0.14

Table 64 shows the read time of a 1T1P memory core at different values of possible intermediate PCM resistance. At 30kΩ, the bitline voltage difference between state ‘0’ and the intermediate state (equal to the difference between state ‘1’ and the intermediate state) has the highest value (0.19V), at a read time of 0.83ns. Therefore, based also on the simulation results of Figure 127, 30kΩ is the appropriate intermediate PCM resistance to represent state ‘2’ of a 1T1P memory core.

The read time of the TCAM cell is then selected such that the values of the voltage differences between the states are high and nearly the same; this occurs at 0.83ns. Recall that the read time of the CAM is taken as the time at which the bitline voltage difference between states ‘0’ and ‘1’ (200kΩ and 7kΩ) is half of the holding voltage (Vh/2), i.e. 0.294ns; so the read operation of the CAM is faster than that of the TCAM due to the constraint of the intermediate PCM resistance.
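The selection of the intermediate resistance can be expressed as a small search over the rows of Table 64. The Python sketch below picks the candidate with the largest bitline voltage difference and, as an added tie-breaking assumption, prefers the shorter read time; the names are illustrative.

```python
# (intermediate resistance in kΩ, read time in ns, bitline voltage difference in V),
# copied from Table 64
TABLE_64 = [
    (20, 0.378, 0.135),
    (30, 0.83, 0.19),
    (50, 1.64, 0.178),
    (70, 2.514, 0.15),
    (80, 3.015, 0.14),
]


def best_intermediate(rows):
    """Largest voltage difference first; shorter read time breaks ties."""
    return max(rows, key=lambda row: (row[2], -row[1]))


if __name__ == "__main__":
    resistance, read_time, difference = best_intermediate(TABLE_64)
    print("selected:", resistance, "kΩ at", read_time, "ns with", difference, "V")
```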

Differential Sense Amplifier

After reading the 1T1P core, the bitline voltage takes the value of the stored data through either a single, or two differential sense amplifiers (Figure 118).

Fig. 128. Output voltage of differential sense amplifier when Vths = 0.15V

Figure 128 shows the output voltage of the differential sense amplifier when changing the input voltage; the threshold voltage of the differential amplifier (Vths) is 0.15V. The simulation results of Figure 128 show that the voltage at node out of the differential sense amplifier switches at 0.25V. So when the input voltage of the differential sense amplifier (VBL) is less than 0.25V, the voltage at out is VDD; however if VBL is higher than 0.25V, the voltage at out is at GND (0V). Table 65 (66) presents the voltage at out (O1 and O2) when the 1T1P core is connected to the differential sense amplifier(s) in a CAM (TCAM).

Table 65. Bitline voltage of 1T1P core and output voltage of differential sense amplifier for CAM operation at read times for the two states

State PCM (kΩ) VBL (0.294ns) Vout (V)

0 200 0.4372 0

1 7 0.2072 0.9

Table 66. Bitline voltage of 1T1P core and output voltages of differential sense amplifiers for TCAM operation at read times for the three states

State PCM (kΩ) VBL (0.83ns) VO1 (V) VO2 (V)

0 200 0.407 0 0

2 30 0.230611 0 0.9

1 7 0.0386 0.9 0.9
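The entries of Tables 65 and 66 can be reproduced with an idealized sense amplifier that outputs VDD below its switching voltage and GND above it. In the Python sketch below, the 0.25V switching voltage for the CAM is the value reported for Figure 128; the two TCAM switching voltages are hypothetical, chosen here as midpoints between the bitline voltages of Table 66, since their actual values are not stated.

```python
VDD = 0.9  # supply voltage (V)


def sense_amp_out(v_bl, v_switch, vdd=VDD):
    """Idealized sense amplifier: VDD below the switching voltage, GND above."""
    return vdd if v_bl < v_switch else 0.0


def cam_read(v_bl):
    """CAM read with the 0.25 V switching voltage reported for Vths = 0.15 V."""
    return sense_amp_out(v_bl, 0.25)


# Hypothetical TCAM switching voltages (midpoints of Table 66 bitline voltages).
V_SWITCH_O1 = (0.0386 + 0.230611) / 2   # separates state '1' from state '2'
V_SWITCH_O2 = (0.230611 + 0.407) / 2    # separates state '2' from state '0'


def tcam_read(v_bl):
    """Return the pair (VO1, VO2) for a given bitline voltage."""
    return sense_amp_out(v_bl, V_SWITCH_O1), sense_amp_out(v_bl, V_SWITCH_O2)


if __name__ == "__main__":
    print(cam_read(0.4372), cam_read(0.2072))
    for v_bl in (0.407, 0.230611, 0.0386):
        print(v_bl, tcam_read(v_bl))
```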

As shown in Figure 128, the switching voltage of the differential sense amplifier is different from the threshold voltage of the differential amplifier (Vths), so the relationship between Vths and its switching voltage must be established. Figure 129 shows that the switching voltage changes linearly with the threshold voltage of the differential sense amplifier.

Fig. 129. Threshold voltage of differential sense amplifier (Vths) and its switching voltage

Fig. 130. Input and output voltages of differential sense amplifier versus simulation time

Consider next the delay due to the differential sense amplifier; as shown in Figure 130, when the input voltage of the differential sense amplifier is changed from GND to the holding voltage (Vh), the voltage at node out does not suddenly change, i.e. there is a delay in the switching process. The delay of the differential sense amplifier at a 32nm CMOS feature size is 0.067ns.

Comparator Circuit

The data stored in the 1T1P core is converted to a two-valued voltage by a differential sense amplifier; for the search operation of the CAM/TCAM cells, a circuit is then required for comparing the stored data with the search data. In this work, CMOS-based circuits (Figures 120 and 121) and ambipolar-based circuits (Figures 122 and 123) are considered for the operation of data comparison. The results are presented for both CAM and TCAM cells.

Comparator Circuit of CAM

The model in Figure 124 is employed to simulate the ambipolar-based comparison circuit of the CAM of Figure 122. The initial values for the voltages at nodes A1 and B1 in the model for the ambipolar transistor are given by VDD and GND respectively; the characteristics of the ambipolar transistor are then generated and the delay of the comparison circuit is established.

Table 67. Search time of CMOS and ambipolar-based CAM comparator circuits at a supply voltage (VDD) of 0.9V

State Stored Voltage (V) Search Voltage (V) Search time, CMOS (ns) Search time, Ambipolar (ns)

0 GND (0V) 0 N/A N/A

0 GND (0V) 1 0.74 0.731

1 VDD (0.9V) 0 0.74 0.238

1 VDD (0.9V) 1 N/A N/A

Table 67 presents the CAM search time for the CMOS (Figure 120) and ambipolar-based (Figure 122) circuits, where the search time is defined as the time taken for the match line voltage (VML) to discharge to less than half of the supply voltage (VDD/2). The search times for ‘0’ and ‘1’ are equal for the CMOS-based circuit. The ambipolar-based comparison circuit is overall faster than the CMOS-based one, especially for the search ‘0’ operation. For the ambipolar-based cell, the search ‘1’ operation is slower than the search ‘0’ operation. During the search ‘1’ operation, the ambipolar transistor behaves as a PMOS; so if a ‘0’ is stored in the 1T1P core, a mismatch outcome is generated and VML is discharged. However, the match line voltage is not discharged down to 0V due to the threshold voltage drop across the ambipolar transistor. Under the above definition, the search ‘1’ operation is therefore slower than the search ‘0’ operation.

Comparator Circuit of TCAM

For the comparison circuit of the TCAM cell (Figures 121 and 123) and using the same definition as given previously, the search time of the TCAM comparison circuit is given in Table 68.

Table 68. Search time of the CMOS and ambipolar-based TCAM comparator circuits at a supply voltage (VDD) of 0.9V

Search VS1 VS2 Stored VO1 VO2 Search Time, CMOS (ns) Search Time, Ambipolar (ns)

0 0 0 0 0 0 N/A N/A

0 0 0 1 1 1 4.814 0.347

0 0 0 2 0 1 N/A N/A

1 1 1 0 0 0 4.814 1.55

1 1 1 1 1 1 N/A N/A

1 1 1 2 0 1 N/A N/A

2 0 1 0 0 0 N/A N/A

2 0 1 1 1 1 N/A N/A

2 0 1 2 0 1 N/A N/A

The search time of the TCAM ambipolar-based comparison circuit is longer than the search time of the CAM comparison circuit. This is caused by the comparison circuit (of larger complexity for TCAM) and the discharging process of the match line voltage. Moreover, as in the CAM case, the comparison circuit based on ambipolar transistors is faster than the CMOS-based circuit. It should also be noted that the ambipolar-based comparison circuits require a significantly lower number of transistors (at most two for TCAM operation, if the ambipolar transistors are implemented by CNTFETs [101]).

Delay

The total delay for the search operation is given by adding the delay of each circuit in the cell; the results are shown in Table 69.

Table 69. Delay of proposed CAM/TCAM cells for a search operation

Circuit CAM Delay (ns) TCAM Delay (ns)

1T1P Memory Core 0.294 0.83

Differential Sense Amplifier [9] 0.067 0.067

Comparison Circuit 0.731 1.55

Total Delay 1.092 2.447

As expected, the proposed CAM cell is faster than its TCAM counterpart; this is mostly caused by the comparison circuit that must take into account the third state for TCAM operation.

However, if the comparison circuit is implemented with an SB-CNTFET [101] (as equivalent to the macromodel of the ambipolar transistor of Figure 124), its delay can be significantly reduced, because [101] has shown that the inverter delay of an SB-CNTFET with a diameter of 1nm is nearly 1ps.
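The totals of Table 69 follow from summing the delays of the three circuits on the search path. A minimal Python check is given below; the dictionary layout and function name are illustrative, and the values are copied from Table 69.

```python
# Per-circuit delays in ns, copied from Table 69
DELAYS_NS = {
    "CAM": {
        "1T1P memory core": 0.294,
        "differential sense amplifier": 0.067,
        "comparison circuit": 0.731,
    },
    "TCAM": {
        "1T1P memory core": 0.83,
        "differential sense amplifier": 0.067,
        "comparison circuit": 1.55,
    },
}


def total_delay(cell):
    """Search delay of a cell as the sum of its circuit delays (ns)."""
    return sum(DELAYS_NS[cell].values())


if __name__ == "__main__":
    for cell in DELAYS_NS:
        print(cell, round(total_delay(cell), 3), "ns")
```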

Power Dissipation

The power dissipation of each circuit in the proposed cells is found next.

1T1P Core

Only the power dissipation during a read operation is considered when assessing the power dissipation of the 1T1P core. The resistive element is simulated as a variable resistor because the proposed PCM macromodel does not simulate the power dissipation of a PCM cell. Hence the power dissipation of the write operation is not presented.

Fig. 131. Power dissipation of the 1T1P memory core during a read operation

Figure 131 shows the power dissipation of the proposed 1T1P core during the read operation for the three values of stored data; irrespective of the state (and the PCM resistance), the power dissipation of the 1T1P memory cell is high at the beginning of the operation, but it decays at higher read times. The high value of the initial power dissipation is due to the switching (from OFF to ON) of the transistor M1. When the state of the transistor is stable (i.e. the ON state) at higher read times, the power dissipation decays, reaching a constant and low value. Moreover, Figure 131 shows that the average power dissipation of state ‘1’ is higher than for states ‘2’ and ‘0’ respectively, i.e. when the data in the core is ‘1’ (so the PCM resistance is 7kΩ), VBL is easily passed to GND, thus dissipating more power than for the other two states of larger PCM resistance values.

Differential Sense Amplifier

A differential sense amplifier (Figure 118) consists of 9 transistors; therefore, its power dissipation is larger than that of the 1T1P core (with only 1 PCM and 1 transistor). However, only one (two) differential sense amplifier(s) is (are) used per column of a CAM (TCAM).

Fig. 132. Average power dissipation of differential sense amplifier

Figure 132 shows the average power dissipation of the differential sense amplifier when the input voltage is switched from 0 to the holding voltage Vh (Figure 130); the average power dissipation of the differential sense amplifier is high at the beginning but it rapidly decreases. This occurs because initially the input voltage is switched from GND to Vh (Figure 130) and the output voltage is switched from VDD to GND.

Table 70. Average power dissipation, average miss delay and power delay product of each circuit in the proposed CAM and TCAM cells

| Circuit | State/outcome | Average Power (µW) | Average Miss Delay (ns) | PDP (fJ) |
|---|---|---|---|---|
| 1T1P (CAM) | 0 | 2.38 | 0.294 | 0.6998 |
| 1T1P (CAM) | 1 | 4.269 | 0.294 | 2.9642 |
| 1T1P (TCAM) | 0 | 1.3542 | 0.83 | 1.1240 |
| 1T1P (TCAM) | 1 | 4.4175 | 0.83 | 3.6726 |
| 1T1P (TCAM) | 2 | 3.4963 | 0.83 | 2.9019 |
| Differential Sense Amplifier | N/A | 22.3939 | 0.067 | 1.5004 |
| Comparator (CAM) | mismatch | 43.728 | 0.731 | 31.965 |
| Comparator (TCAM) | mismatch | 24.696 | 1.55 | 38.2788 |

Table 70 shows the average power dissipation, average miss delay and power delay product (PDP) of each circuit in the proposed CAM and TCAM cells. In both cells, state ‘1’ consumes the highest power, while state ‘0’ consumes the least due to the high resistance value (200kΩ). Moreover, in state ‘1’ (7kΩ), the bitline voltage can be passed more easily to GND. For the average power dissipation of the comparison circuit (CAM and TCAM cells), the macromodel of the ambipolar transistor (Figure 124) is used; this yields a very pessimistic value, because the power dissipation in Table 70 accounts for the 10 transistors used in this macromodel (Figure 124) rather than the power dissipation of a fabricated device (using for example a single CNTFET [101]). The average power dissipation and the PDP of both comparator circuits made of a single CNTFET should be even lower than the values obtained for the macromodel of the ambipolar transistor (Figure 124).
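The relation between the columns of Table 70 can be illustrated with a short calculation. The sketch below assumes that the PDP is simply the product of the average power and the average miss delay, and applies it to three rows of Table 70; the function name `pdp_fj` is only illustrative.

```python
# Power-delay product: PDP = average power x delay.
# With power in microwatts and delay in nanoseconds the product is in
# femtojoules (1e-6 W * 1e-9 s = 1e-15 J).

def pdp_fj(power_uw: float, delay_ns: float) -> float:
    """Return the power-delay product in fJ for power in uW and delay in ns."""
    return power_uw * delay_ns

# Three rows copied from Table 70: (average power in uW, average miss delay in ns).
rows = {
    "Differential Sense Amplifier": (22.3939, 0.067),
    "Comparator (CAM)": (43.728, 0.731),
    "Comparator (TCAM)": (24.696, 1.55),
}

pdp = {name: pdp_fj(power, delay) for name, (power, delay) in rows.items()}
for name, value in pdp.items():
    print(f"{name}: {value:.4f} fJ")
```

The three products agree with the PDP column of Table 70 to within rounding.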

PCM Resistance Range

In this section, the influence of changing the PCM resistance range (i.e. by varying the highest resistance value) is assessed for the read/write times as well as the PDP. In the CAM/TCAM cells, the resistance for state ‘0’ is changed to 100kΩ and 300kΩ.

Table 71. 1T1P core performance under different PCM resistance ranges; (at 32nm feature size and a supply voltage of 0.9V)

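| Cell | PCM Resistance Range | Write time (ns) | Read time (ns) | PDP (fJ) |
|---|---|---|---|---|
| CAM | 7kΩ-100kΩ | 94.81 | 0.318 | 3.0869 |
| CAM | 7kΩ-200kΩ | 199.34 | 0.294 | 2.9642 |
| CAM | 7kΩ-300kΩ | 301.78 | 0.279 | 2.9237 |
| TCAM | 7kΩ-100kΩ | 94.81 | 1.046 | 3.693 |
| TCAM | 7kΩ-200kΩ | 199.34 | 0.83 | 3.6726 |
| TCAM | 7kΩ-300kΩ | 301.78 | 0.795 | 3.6666 |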

Table 71 presents the write time of the 1T1P core (when the data in the memory core is changed from state ‘0’ to state ‘1’), the read time, and the power delay product (PDP) of the 1T1P core for the read ‘1’ operation. As shown in Table 71, the write time of the 1T1P core increases with the PCM resistance range (refer also to Figure 125). For both the CAM and the TCAM, a smaller PCM resistance range results in a longer read time. The same effect is observed for the PDP.

1T1P Core/Bitline

In this section, the number of 1T1P cores connected by a single bitline is initially considered at a read time of 0.294ns (0.83ns) for CAM (TCAM) operation as established before.

Fig. 133. Bitline voltage vs number of 1T1P cores per bitline, (read time of 0.294ns and CAM operation)

Figure 133 shows that the bitline voltage of state ‘1’ (7kΩ) increases when the number of 1T1P cores connected to the bitline is increased. For state ‘0’ (i.e. the PCM resistance is high at 200kΩ), the bitline voltage is almost constant because its value is close to Vh (the voltage to which the bitline is precharged during the read operation). Nevertheless, the difference between the bitline voltages of states ‘0’ and ‘1’ remains large, hence the read operation can still be executed correctly.

Fig. 134. Bitline voltage vs number of 1T1P cores per bitline, (read time of 0.83ns and TCAM operation)

Figure 134 shows the same plots for TCAM operation. In this case, the lower PCM resistance values (7kΩ and 30kΩ) exhibit the same dependence on the number of 1T1P cores connected to a bitline. Overall, the same considerations as for the CAM case are applicable, even though the bitline voltage differences between adjacent states are smaller to account for the three states of a TCAM.

CMOS Feature Size

In the previous sections, the CMOS feature size of the proposed CAM and TCAM cell designs has been fixed to 32nm. Next, these designs are also assessed when different HP (high performance) PTMs are utilized at the lower feature sizes of 22nm and 16nm.

Table 72. Delay of the proposed CAM and TCAM cells for the search operation when the CMOS feature size is changed (supply voltage is 0.9V)

| Circuit (delay in ns) | CAM 16nm | CAM 22nm | CAM 32nm | TCAM 16nm | TCAM 22nm | TCAM 32nm |
|---|---|---|---|---|---|---|
| 1T1P Memory Cell | 0.247 | 0.265 | 0.294 | 0.738 | 0.78 | 0.83 |
| Differential Sense Amplifier | 0.024 | 0.045 | 0.067 | 0.024 | 0.045 | 0.067 |
| Comparator | 0.208 | 0.32 | 0.731 | 0.334 | 0.53 | 1.55 |
| Total Delay | 0.479 | 0.63 | 1.092 | 1.096 | 1.355 | 2.447 |

Table 72 presents the delay of the proposed CAM and TCAM cells for the search operation (the supply voltage is kept at the constant value of 0.9V). The search time of the CAM is still faster than that of the TCAM, but the impact of the reduction in feature size is more pronounced for the proposed TCAM cell. Table 73 shows the same results when VDD is also reduced at the lower feature sizes. As the PCM resistance values remain the same, a reduction in supply voltage is not as beneficial to the search time as a reduction in feature size.
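The total delay reported in Table 72 is the sum of the delays of the three stages of the search path. The following sketch reproduces the totals from the stage delays and compares the absolute reduction in search time when moving from 32nm to 16nm; the dictionary layout and the helper `saved_ns` are illustrative choices only.

```python
# Stage delays in ns at 0.9 V, copied from Table 72.
# Tuple order: (1T1P memory cell, differential sense amplifier, comparator).
table72 = {
    ("CAM", 16): (0.247, 0.024, 0.208),
    ("CAM", 22): (0.265, 0.045, 0.32),
    ("CAM", 32): (0.294, 0.067, 0.731),
    ("TCAM", 16): (0.738, 0.024, 0.334),
    ("TCAM", 22): (0.78, 0.045, 0.53),
    ("TCAM", 32): (0.83, 0.067, 1.55),
}

# Total search delay = sum of the three stage delays.
totals = {key: round(sum(stages), 3) for key, stages in table72.items()}

def saved_ns(cell: str, small_nm: int, large_nm: int = 32) -> float:
    """Absolute reduction in search delay (ns) from large_nm to small_nm."""
    return round(totals[(cell, large_nm)] - totals[(cell, small_nm)], 3)

for cell in ("CAM", "TCAM"):
    print(cell, "total at 32nm:", totals[(cell, 32)], "ns,",
          "saved at 16nm:", saved_ns(cell, 16), "ns")
```

In absolute terms the TCAM gains more from scaling than the CAM, which is the sense in which the impact of the feature size is more pronounced for the TCAM cell.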

Table 73. Delay of proposed CAM/TCAM cells for the search operation when both CMOS feature size and supply voltage are changed

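| Circuit (delay in ns) | CAM 16nm | CAM 22nm | CAM 32nm | TCAM 16nm | TCAM 22nm | TCAM 32nm |
|---|---|---|---|---|---|---|
| 1T1P Memory Cell | 0.338 | 0.309 | 0.294 | 1.023 | 0.93 | 0.83 |
| Differential Sense Amplifier | 0.039 | 0.053 | 0.067 | 0.039 | 0.053 | 0.067 |
| Comparator | 0.394 | 0.5 | 0.731 | 0.698 | 0.87 | 1.55 |
| Total Delay | 0.771 | 0.862 | 1.092 | 1.76 | 1.853 | 2.447 |
| Voltage Supply (V) | 0.7 | 0.8 | 0.9 | 0.7 | 0.8 | 0.9 |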

Comparison

In this section, the proposed CAM/TCAM cells are compared with different schemes found in the technical literature and employing PCM, memristor, MTJ and CMOS. The simulation results are presented as follows.

1T1M Memristor-Based Cell

The proposed memory cell using a PCM is compared with the 1T1M (memristor-based) memory cell of [7]. Consider a comparison according to the following figures of merit.

Write Time

The memristance changes based on the direction of the current and the voltage across the memristor. The write operation of the 1T1M memory cell is accomplished by varying VBL based on the value of the data to be written.

• For a write ‘0’ operation, VBL is given by -VDD, while VWL is at VDD; M1 is ON and a voltage drop exists across the memristor. The memristance is changed to the higher value (given by ROFF) due to the negative voltage drop across the memristor.

• For a write ‘1’ operation, VBL and VWL are at VDD; so M1 is ON, and there is a positive voltage drop across the memristor. The memristance is biased to the smaller value (RON) and the write ‘1’ operation is completed.

• As for state ‘2’, the memristance must take an intermediate value, so the write operation of the 1T1M is similar to the 1T1P counterpart.

Simulation has been performed by using the same resistance range (7kΩ - 200kΩ) and supply voltage (VDD is 0.9V) for both cores. The (programming) temperatures of the PCM during the write operation from state ‘0’ to state ‘1’ and from state ‘1’ to state ‘0’ are fixed to 705K and 1200K respectively, while the threshold resistance of the memristor for a refresh operation is given by 96.5kΩ. Table 74 shows that the 1T1P core has a faster write time than the 1T1M core because the PCM resistance changes at a faster rate than the memristance. When the CMOS feature size is reduced, the write times for both the 1T1P and 1T1M cores decrease.

Moreover, the write times of the CAM and TCAM cores have similar values, because the write time in Table 74 is for the operation from state '0' ('1') to state '1' ('0'). Table 74 also shows the number of (successive) read operations that the 1T1M core can undertake prior to a refresh operation (not applicable to the 1T1P core).


Table 74. Write time, read times and number of read operations prior to refresh for 1T1P and 1T1M cores

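| Metric | Core | CAM 16nm | CAM 22nm | CAM 32nm | TCAM 16nm | TCAM 22nm | TCAM 32nm |
|---|---|---|---|---|---|---|---|
| Write time (‘0’ to ‘1’) | PCM | 198.66ns | 198.93ns | 199.34ns | 198.66ns | 198.93ns | 199.34ns |
| Write time (‘0’ to ‘1’) | Memristor | 1.837µs | 2.023µs | 2.202µs | 1.837µs | 2.023µs | 2.202µs |
| Write time (‘1’ to ‘0’) | PCM | 6.49ns | 6.50ns | 6.51ns | 6.49ns | 6.50ns | 6.51ns |
| Write time (‘1’ to ‘0’) | Memristor | 1.263µs | 1.385µs | 1.435µs | 1.263µs | 1.385µs | 1.435µs |
| Read time (ns) | PCM | 0.247 | 0.265 | 0.294 | 0.738 | 0.78 | 0.83 |
| Read time (ns) | Memristor | 0.247 | 0.265 | 0.294 | 0.738 | 0.78 | 0.83 |
| Number of reads prior to refresh (×10³) | PCM | N/A | N/A | N/A | N/A | N/A | N/A |
| Number of reads prior to refresh (×10³) | Memristor | 8.17 | 9.01 | 9.36 | 3.08 | 3.17 | 3.24 |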

Read Time

The read times of the 1T1P and 1T1M cores are the same because in both cases data is stored in terms of resistance (of equal value and range). Moreover as mentioned previously, the read time of the CAM core is faster than the TCAM core. When the CMOS feature size is reduced, the simulation results of Table 74 show that the read times of the 1T1P and 1T1M core decrease as expected.

Number of Read Operations Prior to Refresh

A memristor-based core requires a refresh operation to be performed following a number of consecutive read '0' operations; this is required to prevent the stored value of the memristance from reaching the threshold value (in this paper, the threshold resistance is given by the mid-value between ROFF and RON, i.e. 96.5kΩ). The simulation results of Table 74 show that the TCAM cell requires the refresh operation more often than a CAM cell. This occurs because the read time of a TCAM is slower than that of a CAM; hence, the memristance in the 1T1M core changes by a larger amount following each read operation. Furthermore, when the CMOS feature size is reduced, the number of consecutive read operations prior to the refresh operation of a 1T1M core is also reduced, i.e. at a lower CMOS feature size, the bitline voltage is transferred more easily to the memristor, so the read time is faster. Note that for the proposed 1T1P core, no refresh operation is required because the read voltage is limited to the holding voltage (Vh) and the PCM resistance retains its value.

PCM-Based CAM/TCAM Cell of [23]

In this section, the comparison between the proposed CAM and TCAM cells and the PCM-based cells of [23] is pursued.

Table 75. Match line current (IML) of CAM cell of [23] during the search operation, PCM resistance range is 7kΩ – 200kΩ at 32nm CMOS feature size

| Stored | Search | IML (A) |
|---|---|---|
| 0 (200kΩ) | 0 (VSL = 0) | -1.38×10^-9 |
| 0 (200kΩ) | 1 (VSL = 0.4) | -1.97×10^-6 |
| 1 (7kΩ) | 0 (VSL = 0) | -1.38×10^-9 |
| 1 (7kΩ) | 1 (VSL = 0.4) | -4.15×10^-5 |

The cells of [23] utilize the circuit shown in Figure 135; in these cells, the output of the search operation is presented in the form of a match line current (IML). Table 75 shows the match line current of the 1T1P core [23] when the PCM resistance values for CAM operation are given by 7kΩ and 200kΩ. The match or mismatch outcome of the CAM requires the adjustment of the search voltage VSL of Table 75 as a function of IML. Hence, a current differential amplifier is required.


Fig. 135. CAM and TCAM cells of [23]

Fig. 136. Current differential amplifier [9]

Figure 136 presents the current differential amplifier for comparing the IML of the 1T1P core of [23] with the reference current. IML (reference current) is provided as input at node i1 (i2); the output voltage Vout is then generated, i.e. if IML is less than the reference current (Iref), the voltage at node Out is given by 0V; else, it is given by VDD.
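The decision rule of the current differential amplifier can be written as a one-line behavioral model. The sketch below is not a circuit model: it assumes that the comparison is made on current magnitudes (the values of Table 75 are negative by sign convention) and uses a hypothetical reference current of 10µA, which is not a value taken from [23] or [9].

```python
VDD = 0.9          # supply voltage in volts, as used throughout this section
I_REF = 1e-5       # hypothetical reference current in amperes (assumption)

def current_comparator(i_ml: float, i_ref: float = I_REF, vdd: float = VDD) -> float:
    """Behavioral output of the amplifier: 0 V if i_ml < i_ref, else VDD."""
    return 0.0 if i_ml < i_ref else vdd

# Magnitudes of the match line currents of Table 75 for VSL = 0.4 V.
i_ml_stored0 = 1.97e-6   # stored '0' (200 kOhm)
i_ml_stored1 = 4.15e-5   # stored '1' (7 kOhm)

print(current_comparator(i_ml_stored0))  # below the reference: output at 0 V
print(current_comparator(i_ml_stored1))  # above the reference: output at VDD
```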

Table 76. Comparison between proposed 1T1P CAM/TCAM cells and CAM/TCAM cells of [23] at 32nm CMOS feature size and supply voltage of 0.9V

| Circuit | CAM [23] | CAM Proposed | TCAM [23] | TCAM Proposed |
|---|---|---|---|---|
| Write Time (ns) | 199.34 | 199.34 | 209.53 | 199.34 |
| Search Time (ns) | 1.326 | 1.092 | 1.346 | 2.447 |
| Number of Transistors/Core | 1 | 1 | 2 | 1 |
| Number of PCMs/Core | 1 | 1 | 2 | 1 |
| PDP of Search Operation (fJ) | 46.6886 | 36.4296 | 48.41 | 43.4518 |

Table 76 shows the comparison results; the write time of the proposed TCAM cell is faster than that of the TCAM of [23], because the TCAM of [23] requires a higher number of 1T1P cores to represent a state. There is no difference in write time for CAM operation, because both 1T1P cores use a transistor and a PCM. As for the search time, Table 76 shows that the search time of the proposed CAM cell is faster than for the CAM of [23] due to the use of a voltage rather than a current sense amplifier [23]. However for TCAM, the search time of the proposed TCAM cell is slower than the TCAM of [23]. One of the reasons is that in the proposed TCAM cell, the search time is based on the selection of the time at which the values of the bitline voltage differences between state pairs of the TCAM are closest. Moreover, the comparison circuit of the proposed TCAM consists of two ambipolar transistors, so the discharging rate of the match line is slower. As for the PDP, the proposed cells have better values than in [23]. Hence, apart from the search time for the TCAM, all figures of merit are matched or improved using the proposed cells.
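The search PDP values of the proposed cells in Table 76 coincide with the sum of the per-circuit PDP values of Table 70 (1T1P core in state ‘1’, differential sense amplifier, and comparator). The following check is a minimal sketch under the assumption that the search PDP is obtained as this sum; the dictionary names are illustrative.

```python
# Per-circuit PDP values in fJ, copied from Table 70 (1T1P core in state '1').
component_pdp_fj = {
    "CAM": {"1T1P core": 2.9642, "sense amplifier": 1.5004, "comparator": 31.965},
    "TCAM": {"1T1P core": 3.6726, "sense amplifier": 1.5004, "comparator": 38.2788},
}

# Assumed composition: search PDP = sum of the PDPs of the circuits on the search path.
search_pdp_fj = {
    cell: round(sum(parts.values()), 4) for cell, parts in component_pdp_fj.items()
}

for cell, value in search_pdp_fj.items():
    print(f"{cell}: {value} fJ")
```

The sums reproduce the 36.4296 fJ (CAM) and 43.4518 fJ (TCAM) entries of Table 76, and show that the comparator dominates the search PDP in both cells.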

CMOS-Based CAM/TCAM Cells

In this section, the proposed CAM/TCAM cells are compared with CMOS-based CAM/TCAM cells [23] at 32nm feature size and at a supply voltage of 0.9V.

Table 77. Comparison of the proposed and CMOS-based CAM/TCAM cells (in which a 6T SRAM is used as storage core)

| Circuit | CAM PCM | CAM CMOS | TCAM PCM | TCAM CMOS |
|---|---|---|---|---|
| Write time (ns) | 199.34 | 0.045 | 199.34 | 0.033 |
| Search Time (ns) | 1.092 | 0.589 | 2.447 | 0.562 |
| Operating Voltage (V) | 0.9 | 0.9 | 0.9 | 0.9 |
| PDP of Search Operation (fJ) | 36.4296 | 14.1285 | 43.4518 | 12.4515 |
| Number of Transistors/Core | 1 | 10 | 1 | 16 |


Table 77 shows the simulation results; the write and search times of the proposed CAM/TCAM cells are slower than the CMOS-based counterparts, because the crystallization rate of the PCM is slow. As expected, the deterioration in write and search times of the proposed cell with respect to a CMOS cell is compensated by the lower number of transistors required for implementation and the non-volatile nature of the proposed memory cells.

MTJ-Based CAM Cells [102]

In this section, a comparison between the proposed CAM cell and the magnetic tunneling junction (MTJ)-based CAM cells of [102] is presented at 32nm CMOS feature size and a supply voltage of 0.9V.

Table 78. Comparison of proposed CAM cell and MTJ-based CAM cells [102] (32nm CMOS feature size, supply voltage of 0.9V, match line capacitance of 0.03pF)

| CAM Circuit | PCM | MTJ NAND [102] | MTJ NOR [102] |
|---|---|---|---|
| Write time (ns) | 199.34 | 1.5 | 1.5 |
| Search Time (ns) | 1.092 | 0.576 | 1.044 |
| PDP of Search Operation (fJ) | 36.4296 | 52.367 | 79.7632 |
| Number of Transistors/Core | 1 | 6 | 5 |
| Number of PCMs (MTJs)/Core | 1 | 2 | 2 |

The results are shown in Table 78. The search times of the MTJ-based CAM cells are faster than that of the proposed CAM cell; however, the numbers of transistors and resistive elements in the core are higher in [102]. Moreover, the PCM-based cell proposed in this paper achieves a lower PDP for the search operation (36.4296fJ, against 52.367fJ and 79.7632fJ for the MTJ-based NAND and NOR cells respectively).


4.4.2. Multilevel Storage of Phase Change Memory

This section analyzes the multilevel storage (ML) of a Phase Change Memory (PCM) in the presence of drift in its resistance. The circuit of a PCM memory cell shown in figure 137 is considered; the data is kept as the PCM resistance, while a MOSFET is used as the selection device. During a programming (write) operation, a current is sent through the bitline (BL), while the word line (WL) is ON to select the memory cell. If the PCM cell has a resistance RRESET (high value), the voltage at line BL is higher than when the PCM has RSET (low value) as resistance.

Fig. 137. Phase Change Memory (PCM) Memory Cell

A PCM cell can be used as a multilevel memory to increase capacity; this is made possible by its high resistance range, i.e. the difference between the resistances of the SET and RESET states. However, after a PCM cell is programmed, its resistance increases with time; this phenomenon is generally referred to as the resistance drift. The resistance drift is believed to be the result of structural relaxation (SR) phenomena that are thermally activated as an atomic rearrangement of the amorphous structure [89]. It has been observed that the drift is significant in the high resistance state (RESET state), in which the phase change material is programmed to the amorphous phase. The low resistance state (SET state) shows a nearly negligible time-dependence of resistance [89]. The rate of resistance increase exhibits a behavior that is strongly related to the time elapsed after programming; this relationship is given by [89]

$$R(t) = R_0 \left( \frac{T_{off}}{T_0} \right)^{\upsilon_r} \qquad (39)$$

$$\upsilon_r = \alpha \ln(R_0) - \beta \qquad (41)$$

Equations (39) and (41) present the PCM resistance drift and the relationship between the resistance drift exponent (υr) and the initial resistance of the PCM cell (R0) respectively. Consider as an example the experimental data of [103]; by using curve fitting, α and β of (41) are found to be equal to 0.0153 and 0.1138 respectively.
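The drift model of (39) and (41) can be evaluated numerically as in the sketch below. Two points are assumptions rather than statements of the text: R0 is expressed in ohms inside the logarithm of (41), and the normalization time T0 is taken as 1s. The function names are illustrative.

```python
import math

ALPHA = 0.0153   # fitted coefficient of (41), from the text
BETA = 0.1138    # fitted offset of (41), from the text

def drift_exponent(r0_ohm: float) -> float:
    """Drift exponent of (41); r0 in ohms is an assumption."""
    return ALPHA * math.log(r0_ohm) - BETA

def drifted_resistance(r0_ohm: float, t_off: float, t0: float = 1.0) -> float:
    """Resistance after a time t_off according to (39); t0 = 1 s is an assumption."""
    return r0_ohm * (t_off / t0) ** drift_exponent(r0_ohm)

# Relative increase of the two extreme resistances after an (arbitrary) drift time.
for r0 in (7e3, 200e3):
    ratio = drifted_resistance(r0, t_off=1e4) / r0
    print(f"R0 = {r0 / 1e3:.0f} kOhm: exponent = {drift_exponent(r0):.4f}, "
          f"R(t)/R0 = {ratio:.3f}")
```

Under these assumptions the exponent is larger for the higher initial resistance, so the amorphous (RESET) state drifts more than the crystalline (SET) state.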

R0 and υr depend on various features [104], so it is very difficult to accurately model their variability through a theoretical analysis [103]. Based upon extensive measurements with PCMs [105] [106], R0 and υr appear to approximately follow a Gaussian distribution. The drift exponent υr tends to increase as the initial resistance increases due to more pronounced structural relaxation effects [103]. Hence, a multilevel PCM experiences a difference in resistance drift over time, leading to a significant degradation in data integrity [103].

Fig. 138. PCM resistance distribution over time

Figure 138 shows the basic principles of resistance drift in a multilevel PCM cell. The resistance at each level varies according to a Gaussian distribution and undergoes less (more) drift when the PCM is in the crystalline (amorphous) phase. The resistance of each level changes during Toff (equal to T in figure 138) following the selection of the so-called threshold resistances (which separate the levels). This results in an erroneous output following a read operation due to overlapping levels in a PCM cell.

The erroneous effects of the resistance drift in a PCM cell can be alleviated if the threshold resistances could also vary with time. In [103], a time-aware fault-tolerant scheme is used for correcting the resistance drift of a PCM. The drift of the threshold resistances is taken into account by recording the so-called lifetime of the PCM (td) in the form of time tag bits and using these bits to find the threshold resistances; however, computing the lifetime and the threshold resistances requires extensive calculation, at the cost of added circuit complexity and an overall degradation in figures of merit such as the delay and power dissipation of the memory.

4.4.2.1. Multilevel Storage

Multilevel storage is achieved through an accurate programming of the PCM cell into intermediate resistance values, i.e. the values between the SET and RESET states [107]. This scheme, however, is susceptible to process and material variability; for example, the temperature that is generated in the PCM by the same programming pulse varies from cell to cell. Therefore, a single pulse programming arrangement is not a viable option for multilevel PCM storage, because the resulting resistance level distributions are rather broad and difficult to predict [107]. A possible solution is to employ an iterative programming strategy that starts by reading the most recent resistance value of a PCM and comparing it with a reference value; then, a programming current is utilized to bias and adjust the resistance of the PCM cell to the desired value. The PCM cell is then read again to ensure that the new resistance is as expected; if not, the above process is repeated. The process of iteratively programming and verifying a multilevel PCM cell (through multiple read and write operations) is shown in figure 139; at each iteration, the programming pulse varies, based on the difference between the programmed (actual) and the target resistances [144].
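The iterative program-and-verify strategy described above can be sketched as a simple loop. The cell model used here is hypothetical: each pulse is assumed to close a cell-dependent fraction (`cell_gain`) of the remaining gap to the target resistance, which stands in for the cell-to-cell variability; the tolerance, the starting resistance and the pulse limit are also illustrative values, not parameters of [107].

```python
import random

def program_and_verify(target_ohm, cell_gain, r_start=7e3,
                       tolerance=0.02, max_pulses=50):
    """Read, compare with the target, and apply corrective pulses until verified.

    cell_gain is a hypothetical per-cell factor: the fraction of the intended
    resistance change that one programming pulse actually achieves.
    Returns the final resistance and the number of pulses applied.
    """
    r = r_start
    pulses = 0
    # Verify step: stop once the read value is within the tolerance band.
    while abs(target_ohm - r) > tolerance * target_ohm and pulses < max_pulses:
        r += cell_gain * (target_ohm - r)   # pulse scaled by the remaining error
        pulses += 1
    return r, pulses

rng = random.Random(0)                      # fixed seed for repeatability
target = 55.25e3                            # e.g. an intermediate level, in ohms
for cell in range(5):
    gain = rng.uniform(0.5, 1.2)            # hypothetical cell-to-cell spread
    r_final, n = program_and_verify(target, gain)
    print(f"cell {cell}: gain = {gain:.2f}, pulses = {n}, R = {r_final / 1e3:.2f} kOhm")
```

A single pulse leaves a different residual error in every cell, whereas the loop brings all cells within the same tolerance at the cost of a variable number of pulses.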


Fig. 139. Iterative scheme of programming and verification for a multilevel PCM cell [107]

A reference resistance (Rref) is required for each level in the execution of the iterative programming process and the verification of the programmed PCM cell (figure 139). However, the reference resistance is not constant due to the drift behavior. In this paper, the reference resistances of the PCM cell under a drift behavior are generated as follows. A row of PCM cells (figure 140) is utilized. Initially, the resistance of every PCM cell in the row is set to the threshold value of that level. During a read or write operation, the input current is provided to each line in the so-called row of threshold resistance (denoted by L1, L2, … , LN). The transistor (denoted by Mth in figure 140) is then turned ON, and every PCM cell on the row (figure 140) is read by monitoring the voltage of each line (i.e. L1, L2, … , LN). The resistance of each PCM cell in the row (figure 140) is therefore given by

$$R_N = \frac{V_N}{I_{read}} \qquad (62)$$

where RN is the PCM resistance at line N, and VN is the voltage at line N when the read current (Iread) is provided to every line.


Fig. 140. Row of PCM cells for calculating the threshold resistance of a level

After reading the resistances of the lines (figure 140), the threshold resistance is established by calculating the median value, i.e. the median resistance is selected as the threshold resistance of a level.
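The selection of the threshold resistance from the row of figure 140 amounts to applying (62) to every line and taking the median. A minimal sketch follows; the read current and the line voltages are hypothetical values chosen only to exercise the calculation.

```python
from statistics import median

def row_resistances(line_voltages, i_read):
    """Apply (62) to every line of the row: R_N = V_N / I_read."""
    return [v / i_read for v in line_voltages]

def threshold_resistance(line_voltages, i_read):
    """Threshold resistance of a level: median resistance of the row."""
    return median(row_resistances(line_voltages, i_read))

# Hypothetical example: a 5-cell row read with a 1 uA current.
i_read = 1e-6                                         # amperes (assumption)
voltages = [0.0550, 0.0562, 0.0549, 0.0580, 0.0553]   # volts (assumption)

th = threshold_resistance(voltages, i_read)
print(f"threshold resistance = {th / 1e3:.2f} kOhm")
```

The median is insensitive to a few cells that drift much more or much less than the rest of the row, which is a plausible reason for preferring it to the mean.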

The threshold resistances are used to partition the resistance of each PCM cell into regions (figure 141). During a read or write operation, the threshold resistances are used as reference resistances (RREF) and compared to the corresponding resistances of a PCM cell (figure 139). All resistances however drift; so by selecting the threshold resistance between regions, the drift of the threshold resistances still occurs, but its effects are mitigated. However, an additional criterion (referred to as resistance separation) must be considered too.

Fig. 141. Separation of initial resistance of a PCM cell

Figure 141 shows the resistance separation of a PCM cell used in this manuscript; the initial resistance of the PCM is divided into several so-called regions. The following regions are defined for a level: the write region (w), the read region (r) and the blank region (b). The write region (w) is the region in which the PCM resistance can be written at the designated level; it starts at SwX and ends at wX, where SwX and wX are the starting and ending values of the write region of level X respectively. ThX is the threshold resistance of level X. The range in which the resistance of the PCM is higher than SwX and less than ThX is referred to as the read region of level X.

The threshold resistance of a level (ThX) is selected from a row of PCM cells (figure 140). Since the drift behavior of each PCM cell is different, it is possible that ThX drifts higher than the starting value of the write region of the next level (Sw(X+1)), thus resulting in an incorrect write/read operation. A blank region (b) is used to prevent this from occurring. As shown in figure 141, each level of a PCM is separated into 3 regions; the last level (level N) has only a single (write) region, because the PCM is totally amorphous; at this high resistance value, no threshold resistance is needed. Due to the non-linear nature of the resistance drift behavior (as evidenced by (39)), the read and blank regions must be increased at least by the same rate as the PCM resistance drift. The process for finding the level separation is discussed in the next section.
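The region definitions above can be summarized as a small classification routine. This is a sketch only: the function name and the returned labels are illustrative, the write region is reported separately although it lies inside the read region as defined in the text, and the example boundaries are the initial level-1 values that appear later in Table 80 (in kΩ), before any drift.

```python
def classify(r, sw, w, th, sw_next):
    """Locate resistance r within level X.

    sw, w   : start and end of the write region of level X
    th      : threshold resistance of level X
    sw_next : start of the write region of level X+1
    """
    if sw <= r <= w:
        return "write"      # inside the write band (also part of the read region)
    if w < r < th:
        return "read"       # still read back as level X
    if th <= r < sw_next:
        return "blank"      # guard band between level X and level X+1
    return "outside"        # not in level X at all

# Initial boundaries of level 1 in kOhm (values of Table 80).
SW1, W1, TH1, SW2 = 7.0, 8.93, 55.25, 103.5

for r in (8.0, 30.0, 80.0, 110.0):
    print(f"{r:6.1f} kOhm -> {classify(r, SW1, W1, TH1, SW2)}")
```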

Fig. 142. Flowchart of initial resistance separation for a cell with N levels


4.4.2.2. Level Separation

Figure 142 shows the flowchart of the process for finding the initial level separation of the PCM. The PCM cell is separated into levels; each level is simulated and compared with its value for the read region. If the PCM resistance is in the read region of that level, then the data is valid and retained. However if the PCM resistance is not in the read region of the level, then the data in the cell is erroneous due to the drift behavior. A comprehensive process referred to as the flat initial level separation must be undertaken. This is found by calculating the percentage accuracy in each level and adjusting the level resistances till the percentage accuracies are very close.

The level separation and threshold resistance selection are analyzed in more detail next. The level separation resistance is given in percentage form, i.e. it is equal to 0% (100%) when the PCM is totally crystalline (amorphous). The percentage resistance of the PCM is converted into the crystalline fraction (CX) by the following equation

$$C_X = 1 - \frac{\%\text{Resistance}}{100} \qquad (63)$$

As shown in figure 141, the initial level separation is generated by dividing the PCM cell into resistance levels because the range of the PCM resistance increases over time due to the drift behavior.

Example: Consider the threshold resistances of a 3-level PCM cell; assume that the initial threshold resistances of levels 1 and 2 (Th1 and Th2) are placed at 25% and 75% of the resistance range respectively. So for a PCM resistance range of 7kΩ – 200kΩ, the initial threshold resistance of level 1 (Th1) is given by 7kΩ + 0.25 × (200kΩ − 7kΩ) = 55.25kΩ.
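The conversion used in this example, together with (63), can be sketched as follows. The linear mapping between percentage resistance and absolute resistance is inferred from the example (25% of the 7kΩ – 200kΩ range gives 55.25kΩ) and is therefore an assumption; the function names are illustrative.

```python
R_LOW_KOHM = 7.0      # fully crystalline resistance (0%)
R_HIGH_KOHM = 200.0   # fully amorphous resistance (100%)

def percent_to_resistance(percent, r_low=R_LOW_KOHM, r_high=R_HIGH_KOHM):
    """Percentage resistance -> resistance in kOhm (linear mapping, an assumption)."""
    return r_low + (percent / 100.0) * (r_high - r_low)

def crystalline_fraction(percent):
    """Crystalline fraction C_X of (63)."""
    return 1.0 - percent / 100.0

for pct in (25, 75):
    print(f"{pct}% -> {percent_to_resistance(pct):.2f} kOhm, "
          f"C_X = {crystalline_fraction(pct):.2f}")
```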

As shown in figure 142, the flat initial level separation requires a few operations (Initial Resistance Separation, Resistance Simulation, Percent Accuracy Calculation, and Level Separation Adjustment), as discussed in more detail next.


Initial Resistance Separation

The PCM resistance is initially separated into regions (figure 141) in a non-linear fashion, i.e. the read and blank regions increase in percentage resistance. The initial PCM resistance separation is given as follows:

• The starting values of level 1 (Sw1) and level N (SwN) are given by 0 and 100-%write respectively (where %write denotes the constant percentage of the write region at any level).

• The ending value of the write resistance of level N (wN) is given by 100%.

• The PCM resistances of the other levels are set to values that are evenly separated.

Example: Consider a PCM cell with 3 levels and assume that the write region is given by 1%. Table 79 shows the initial level separation as generated using the above process. Note that the percentage resistances of Th1, Sw2, and Th2 are selected so that the read and blank regions are nearly equally spaced.

Table 79. Initial Percentage PCM Resistance Separation

| | Sw1 | W1 | Th1 | Sw2 | W2 | Th2 | Sw3 | W3 |
|---|---|---|---|---|---|---|---|---|
| % | 0 | 1 | 25 | 50 | 51 | 75 | 99 | 100 |
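The initial separation of Table 79 can be generated programmatically. The sketch below is one possible reading of the rules above, generalized from the 3-level example: the interval from 0% to 100% is divided into 2(N-1) equal steps, the thresholds and write-region starts alternate on these steps, and the last level starts at 100-%write. The function name and the dictionary layout are illustrative.

```python
def initial_separation(n_levels, write_pct):
    """Initial percentage boundaries (Sw, W, Th) of each level.

    Assumption: Th and Sw values are evenly spaced over 0..100 %, except
    that the last level starts at 100 - write_pct and has no threshold.
    Requires n_levels >= 2.
    """
    step = 100.0 / (2 * (n_levels - 1))
    levels = []
    for x in range(1, n_levels + 1):
        if x < n_levels:
            sw = (2 * x - 2) * step      # start of the write region of level x
            th = (2 * x - 1) * step      # threshold of level x
        else:
            sw = 100.0 - write_pct       # last level: write region ends at 100 %
            th = None                    # no threshold for the last level
        levels.append({"Sw": sw, "W": sw + write_pct, "Th": th})
    return levels

for x, level in enumerate(initial_separation(3, 1.0), start=1):
    print(f"level {x}: {level}")
```

For three levels and a 1% write region this reproduces the entries of Table 79.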

PCM Resistance Simulation

After partitioning into levels (as shown in figure 141), the drift must be assessed and therefore the PCM cell and its resistance are simulated. Two cases are possible for a resistance outside of the read region.

• The threshold resistance of the previous level (Th(X-1)) drifts until its value is higher than the starting value of the write resistance in the current level (SwX). This error is referred to as the lower band error.

• The resistance drifts until its value is higher than its threshold resistance (ThX); this error is referred to as the higher band error.

The worst case of resistance drift at a level is considered by simulating the lower and higher band errors. The lower and higher band errors at a level are found by considering the starting value of the write resistance (SwX), the ending value of the write resistance (wX) and the threshold resistance (ThX) of the level. The threshold resistances are generated using a row of PCM cells and finding the median value (figure 140). The PCM cell is simulated after the initial resistance of a level is found based on the initial level separation.

Fig. 143. Flowchart of reference threshold resistance process

Fig. 144. Flowchart of reference threshold resistance process for each level of a PCM cell in a memory array

Figure 143 shows the flowchart of the proposed threshold resistance process; the initial threshold resistance (ThX) is selected from the level separation. The threshold resistances are found by setting the initial PCM resistance of every cell in a row to the initial threshold value (ThX) and finding the median resistance among the PCM cells on the row after a drift time Toff.

After the threshold resistances are established for the levels, these values are used as reference resistances (figure 144) for the read and write operations.

• During the read operation, the reference resistances (RREF) are compared with the PCM resistances; if a PCM resistance is in the read region of the same level, then the PCM cell stores the correct data; if the PCM resistance is not in the read region of the level, then this cell cannot be used to hold data and a rewrite operation (as corrective action) is required.

• For the write operation, the reference resistances are used as the bands of the write region. The least band is selected as the starting value of the write region of a level (SwX), while the largest band is selected as the ending value of the write region (wX). So if the PCM resistance is in this write band, the write operation can be correctly performed.

Consider next the errors that are possible in this process. The lower band error occurs when the threshold resistance of the previous level (RTH(X-1)) is higher than the starting value of the write resistance of the current level (RTMIN). Consider the starting value of the resistance of the write operation at a designated level (SwX). The initial resistance is set to SwX to find the lower band error of a level; the threshold resistance of the previous level (RTH(X-1)) is then considered. After a time Toff, the threshold resistance of the previous level and RTMIN change their values. It is possible to establish whether the PCM can correctly hold data by comparing the drifted value of RTH(X-1) with RTMIN.

The drift behavior of a PCM cell varies depending on several factors; moreover, each PCM cell has a different value of initial resistance (R0), so its drift behavior differs from cell to cell. The percentage error of the lower band (Emin) is given by

$$E_{min,i} = \frac{L_i}{M} \times 100 \qquad (64)$$

where Emin,i is the percentage error of the lower band for level i, Li is the number of PCM cells at level i for which RTH(i-1) is higher than RTMIN,i, and M is the number of PCM cells in the memory array. Only the first level of the PCM has no percentage error for the lower band (Emin,1 = 0), because no threshold resistance precedes level 1 (so L1 = 0 in (64)).

The percentage error of the higher band can be found by using a method similar to that for the lower band error. The resistance of the PCM array is initially set to the ending value of the write region (wX); after a time Toff, this resistance (RTMAX) is compared with the threshold value ThX. If RTMAX is less than its threshold value, then the PCM cell can retain its value as data; else, the PCM cell cannot keep the data. The percentage error of the higher band is given by

$$E_{max,i} = \frac{H_i}{M} \times 100 \qquad (65)$$

where Emax,i is the percentage error of the higher band at level i, and Hi is the number of PCM cells at level i whose resistance (RTMAX,i) is higher than its threshold value (RTH,i). As the write region of the last level (WN) is at 100% resistance, the threshold resistance of the last level (ThN) can be ignored; so, hereafter the higher band percentage error of level N is also ignored.

Percentage Accuracy Calculation

The percentage accuracies of each level of a PCM are calculated as follows.

$$A_{min,i} = 100 - E_{min,i} = \left(1 - \frac{L_i}{M}\right) \times 100 \qquad (66)$$

$$A_{max,i} = 100 - E_{max,i} = \left(1 - \frac{H_i}{M}\right) \times 100 \qquad (67)$$

where Amin,i and Amax,i are the percentage accuracies of the PCM at level i for the lower and higher band errors respectively.

Due to the variation in PCM resistance, the percentage accuracy of a cell is not the same in each simulation run; so, the lower and higher band percentage accuracies (Amin,i and Amax,i) are simulated NS times to find their average values as well as the average percentage accuracy of each level. The flat percentage accuracy for both the lower and higher bands is then established. At a high percentage accuracy, a PCM can better tolerate a resistance drift for that level; the level resistances must therefore be adjusted by flattening the percentage accuracy. If the least and highest values of percentage accuracy are substantially different, part of the resistance of the region with the highest percentage accuracy is allocated to the region with the least percentage accuracy. This effectively balances the percentage accuracies among the levels of the PCM cell; this process is referred to as the flat initial level separation.
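The calculation of (64) to (67) and the averaging over NS runs can be sketched as a small Monte Carlo experiment. Several elements are assumptions and not part of the proposed method: the drift exponent of each cell is drawn from a Gaussian centred on (41) with a hypothetical 30% relative spread, R0 is in ohms, T0 is 1s, the row used for the threshold has the same size as the array, and percentages are mapped linearly onto the 7kΩ – 200kΩ range.

```python
import math
import random
from statistics import mean, median

ALPHA, BETA = 0.0153, 0.1138      # fitted coefficients of (41), from the text
R_LOW, R_HIGH = 7e3, 200e3        # PCM resistance range in ohms

def to_ohm(percent):
    """Percentage resistance -> ohms (linear mapping, an assumption)."""
    return R_LOW + percent / 100.0 * (R_HIGH - R_LOW)

def drifted(r0, t_off, rng, sigma_rel=0.3, t0=1.0):
    """Apply (39) with a Gaussian drift exponent centred on (41)."""
    nu_mean = ALPHA * math.log(r0) - BETA
    nu = rng.gauss(nu_mean, sigma_rel * abs(nu_mean))   # hypothetical spread
    return r0 * (t_off / t0) ** nu

def band_accuracy(sw, w, th_prev, th, t_off, m, rng):
    """Return (A_min, A_max) in percent for one level and one simulation run."""
    def population(r0):
        return [drifted(r0, t_off, rng) for _ in range(m)]
    l_count = h_count = 0
    if th_prev is not None:                       # lower band error, (64)
        r_th_prev = median(population(th_prev))
        l_count = sum(r_th_prev > r for r in population(sw))
    if th is not None:                            # higher band error, (65)
        r_th = median(population(th))
        h_count = sum(r > r_th for r in population(w))
    e_min, e_max = 100.0 * l_count / m, 100.0 * h_count / m
    return 100.0 - e_min, 100.0 - e_max           # (66) and (67)

def average_accuracy(level, t_off, m=200, n_runs=20, seed=0):
    """Average (A_min, A_max) over n_runs simulation runs (NS in the text)."""
    rng = random.Random(seed)
    runs = [band_accuracy(*level, t_off, m, rng) for _ in range(n_runs)]
    return mean(a for a, _ in runs), mean(b for _, b in runs)

# Level 2 of the 3-level example, from the percentages of Table 79:
# (Sw2, W2, Th1, Th2).
level2 = (to_ohm(50), to_ohm(51), to_ohm(25), to_ohm(75))
a_min, a_max = average_accuracy(level2, t_off=1e4)
print(f"A_min,2 = {a_min:.2f}%, A_max,2 = {a_max:.2f}%")
```

The numerical output depends entirely on the assumed spread and drift time, so it is not expected to match the accuracies reported in this section; the sketch only illustrates how Li, Hi and the averages are obtained.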

Example: Based on the initial percentage resistance of each level (Table 79), the threshold resistance (ThX) and starting value of every level (SwX) are found. Consider an initial resistance range from 7kΩ – 200kΩ.

Table 80. Initial resistances when the PCM resistance range is from 7kΩ – 200kΩ

         Sw1   W1     Th1     Sw2     W2       Th2      Sw3      W3
R (kΩ)   7     8.93   55.25   103.5   105.43   151.75   198.07   200
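The conversion from a percentage resistance to an absolute resistance can be sketched as follows. The sketch assumes a linear mapping over the 7kΩ – 200kΩ range and uses the non-adjusted percentages (0, 1, 25, 50, 51, 75, 99, 100) listed in Table 82; the function name is hypothetical.

```python
def percent_to_resistance(percent, r_min=7.0, r_max=200.0):
    """Map a percentage of the resistance range to kΩ (a linear mapping is assumed)."""
    return r_min + percent / 100.0 * (r_max - r_min)


regions = [("Sw1", 0), ("W1", 1), ("Th1", 25), ("Sw2", 50),
           ("W2", 51), ("Th2", 75), ("Sw3", 99), ("W3", 100)]
for name, pct in regions:
    print(name, round(percent_to_resistance(pct), 3))
```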

Table 80 shows the initial resistances; the threshold resistances are found next. After time Toff, the resistances of the PCM cells in a row vary and increase; Th1 is found by selecting the median value of a row of cells and using it as the threshold resistance Th1. This process is also applied to the other threshold values. After selecting the threshold resistance of every level by using the method described above, the percentage accuracy of a level is found by comparing the PCM resistance with these values. The values of the percentage accuracy of the lower and higher bands are shown in table 81.

Table 81. Percentage accuracies when the PCM resistance is initially separated (from Table 79)

Level          1     2         3
%Lower Band    -     99.0079   80.5079
%Higher Band   100   90.4539   -


The percentage accuracy is dependent on the level separation (table 80), i.e. the lower band percentage accuracy is related to the blank region of the previous level, while the higher band percentage accuracy is related to the read region of the considered level. As shown in table 81, the percentage accuracy of the lower band of level 3 has the least value, while the percentage accuracy of the higher band of level 1 has the largest value. Moreover, the blank region of level 2 is small, while the read region of level 1 is rather large. The PCM can better tolerate the drift behavior by reducing the resistance from the read region of level 1 and increasing the resistance of the blank region of level 2. If the least and largest values of the percentage accuracies of this level separation are not very close, the process of adjusting the level resistances results in a flat percentage accuracy.

4.4.2.3. Adjustment and Selection

As presented previously, the level resistances must be adjusted to balance the average percentage accuracies of a PCM. This adjustment process is treated in more detail in this section.

If the values of the least and largest percentage accuracies are close together, tolerance to the resistance drift is said to be balanced and the so-called flat initial level separation is accomplished. Otherwise, an adjustment in resistances must be considered.

The percentage accuracy of the higher band is related to the read region of a level. As the write region (w) has a constant value, the higher band percentage accuracy is related to the resistance difference between the ending value of the write region (WX) and its threshold value (ThX). If the higher band percentage accuracy of level X is the highest (maximum), a portion of the resistance from this region can be allocated to the region with a lower percentage accuracy. The reverse scenario can be handled by the same adjustment. By using this method, the percentage accuracies of the levels of a PCM are made close in value.

Example: Based on table 81, the percentage accuracy of the higher band of level 1 (corresponding to its read region) is the highest, whereas the percentage accuracy of the lower band of level 3 has the least value (as the blank region of level 2 is small). So, the initial resistances must be adjusted. If the difference between the least and largest percentage accuracies of the levels (table 81) is high (i.e. 5% or more), then the percentage resistance in each region is adjusted by 0.1%; if the difference is less than 2%, then the adjustment is 0.01%; if the difference is higher than 2% but less than 5%, the percentage resistance is adjusted by 0.05%. So based on the average percentage accuracy of each level (table 81), the initial percentage resistance of the read region of level 1 is reduced by 0.1%, while the percentage resistance of the blank region of level 2 is increased by the same amount. These adjustments are shown in table 82.
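The selection of the adjustment step can be sketched as a small function. The function name is hypothetical, and the treatment of the boundary values (a difference of exactly 2% or exactly 5%) is an assumption, since the text does not specify it.

```python
def adjustment_step(least: float, largest: float) -> float:
    """Return the percentage-resistance adjustment for one simulation run.

    Boundary handling is assumed: a difference of 5% or more is treated as
    high, and a difference of exactly 2% falls into the middle band.
    """
    diff = largest - least
    if diff >= 5.0:
        return 0.1
    if diff < 2.0:
        return 0.01
    return 0.05


# Least and largest percentage accuracies from table 81.
print(adjustment_step(80.5079, 100.0))
```

For the values of table 81 the difference is well above 5%, so the step is 0.1%, in agreement with the example.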

Table 82. Adjustments to the initial percentage resistance separation

                   Sw1   W1   Th1    Sw2    W2     Th2    Sw3   W3
% (Non-Adjusted)   0     1    25     50     51     75     99    100
% (Adjusted)       0     1    24.9   49.9   50.9   74.9   99    100

Based on the initial percentage level separation, the percentage resistance of Th1 is changed to 24.9%. For the resistance range of the other regions to be the same, Sw2 must be 49.9%, because the blank region of this level is still equal to 25% (50-25 = 49.9-24.9). The percentage for the write region is fixed at 1%, so W2 is 50.9%. Next, the percentage resistance of Th2 is considered, because Sw3 and W3 are fixed at 99% and 100% respectively. The percentage accuracy of the lower band of level 3 has the least value; so the blank region of level 2 must be increased by 0.1%. From the non-adjusted level separation, the resistance difference between Sw3 and Th2 is 24%. By increasing this resistance range by 0.1%, the percentage resistance of Th2 is equal to 74.9% and Sw3 – Th2 = 24.1%. Therefore, the initial level separation is completed. By simulating the PCM cell again, the percentage accuracy is found for each region and the initial PCM resistance is adjusted until the percentage accuracies are close in value.
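The single adjustment step of this example can be reproduced with the following sketch for a 3-level cell. The dictionary layout and the function name are hypothetical; the sketch only mirrors the bookkeeping described above (read region of level 1 reduced, blank region of level 2 enlarged, write regions and Sw3/W3 fixed).

```python
def adjust_three_levels(sep, step):
    """Move `step` percent from the read region of level 1 to the blank region of level 2."""
    out = dict(sep)
    blank1 = sep["Sw2"] - sep["Th1"]  # blank region of level 1 (kept unchanged)
    write = sep["W2"] - sep["Sw2"]    # write region width (fixed)
    blank2 = sep["Sw3"] - sep["Th2"]  # blank region of level 2 (to be enlarged)
    out["Th1"] = round(sep["Th1"] - step, 4)
    out["Sw2"] = round(out["Th1"] + blank1, 4)
    out["W2"] = round(out["Sw2"] + write, 4)
    out["Th2"] = round(sep["Sw3"] - (blank2 + step), 4)
    return out


non_adjusted = {"Sw1": 0, "W1": 1, "Th1": 25, "Sw2": 50,
                "W2": 51, "Th2": 75, "Sw3": 99, "W3": 100}
print(adjust_three_levels(non_adjusted, 0.1))
```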


Table 83. Flat initial percentage resistance Separation and percentage accuracy of each level for a PCM cell with 3 levels

Level          1         2         3
Sw             0         9.15      99
W              1         10.16     100
Th             3.16      22.16     -
%Lower Band    -         99.9346   99.9952
%Higher Band   99.9016   99.8909   -

Table 83 shows the flat initial level separation when the write region is fixed at 1%; the percentage resistance of each level is adjusted until the difference in percentage accuracy (between the least and the largest values) is close. The value of the adjustment of the percentage resistance is very small, especially when a PCM cell is separated into a large number of levels. In this case, the percentage accuracy of a level is reduced, because the read and blank regions in a level cannot tolerate a large resistance drift, i.e. a very small change in percentage resistance results in a large change in percentage accuracy for that level.

Example: Consider a PCM cell with 8 levels and assume that when starting the simulation, the higher band percentage accuracy of level 1 has the least value (66.3145%), while the lower band percentage accuracy of level 4 has the largest value (85.7085%). The PCM resistance is adjusted by 0.25 per simulation for the regions that have the least and largest percentage accuracies. After the next simulation, the higher band percentage accuracy of level 1 has the largest value (97.2452%), while the percentage accuracy of the lower band of level 2 has the least value (69.4807%). The simulation shows that even a 0.25 percentage resistance change is too high, because the percentage accuracy of the higher band of level 1 moves from the least (66.3145%) to the highest percentage accuracy (97.2452%) in a single simulation. A flat level separation requires the difference between the largest and the least values of the percentage accuracy to be less than a specified requirement (1% for example); if the adjustment value is high, a flat level separation often cannot be found. So, the adjustment in each simulation should be selected appropriately.

Fig. 145. Maximum (largest) and minimum (least) percentage accuracies vs adjustment of percentage resistance (4 levels/cell, write region is 1% and 5%)

Figure 145 shows the largest and least values of percentage accuracy when a flat level separation is found and the adjustment to the percentage resistances in each simulation run is varied. The difference in percentage accuracy is high at a higher value of adjustment, while it is low when a lower value of adjustment is used in each simulation run.

Fig. 146. Number of simulations required for finding a flat initial level separation (4 levels/cell, write region is 1%)

Figure 146 shows the relationship between the number of simulations and the adjusted percentage resistance. The flat level separation is found when the difference between the largest and least values of percentage accuracy is less than 3%. The simulation results of figure 146 confirm that at a low value of adjustment in percentage resistance, the number of simulations for finding the flat level separation is higher than at a higher value of adjustment in percentage resistance. So, a PCM cell can better tolerate the resistance drift at lower values of adjustment in percentage resistance. Furthermore, when the adjusted value of the percentage resistance is increased by 0.5, the flat PCM level separation cannot often be found.

The threshold resistance of each level must be selected after finding the level separation for flattening the percentage accuracy. As discussed previously, the threshold resistance of each level is selected from a row of cells (figure 143) by using the previously described median method.

As found in the literature, the use of a method based on the mean (average) value for selecting the threshold resistances requires that every cell in a row must be continuously read, thus resulting in a high power consumption. Moreover, the resistance does not drift when the PCM is being read, so the threshold resistances found by using a mean-based method may not have correct values.

However by using the median, only one PCM cell in a row is selected as holding the reference resistance; hence, the power dissipation is significantly less. Moreover, the proposed median method is not significantly affected when considering the drift behavior of the threshold resistances.
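The difference between a median-based and a mean-based threshold can be illustrated with a short sketch. The row values are hypothetical, and the use of the lower median (so that the threshold is always the resistance held by one cell of the row) is an assumption about how the median cell is picked for an even number of cells.

```python
import statistics


def median_threshold(row):
    """Select the threshold as the resistance held by a single cell (lower median)."""
    return statistics.median_low(row)


# Hypothetical drifted resistances (kΩ) of a row of 8 cells, one of them an outlier.
row_kohm = [54.1, 55.0, 55.3, 55.9, 56.4, 57.2, 58.0, 140.0]
print(median_threshold(row_kohm))
print(round(statistics.mean(row_kohm), 2))
```

In this hypothetical row the single outlier shifts the mean by more than 10 kΩ, while the median remains at the resistance of a typical cell.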

4.4.2.4. Cell-Level Simulation

In this section, the simulation and discussion of the proposed schemes for flat level separation and threshold resistance selection are pursued at cell-level. The data of [98] is used to simulate the resistance drift of a PCM cell; MATLAB is used as simulation tool.


Table 84. Parameters for simulating PCM cell

Parameter                                     Value
Reset Resistance (Ra)                         200 kΩ
Set Resistance (Rc)                           7 kΩ
α Constant of υr (3)                          0.0153
β Constant of υr (3)                          0.1138
SDMR of υr                                    20%
Time constant (T0)                            1 ns
Percent write region of each level (w)        1%
Number of PCM cells in each row (Nth)         16
Number of PCM cells in the memory array (M)   10,000

Table 84 shows the parameters used to simulate the level separation, the threshold resistance selection and the percentage accuracy of each level. The resistance value (RT) of the PCM prior to drift is calculated using (39); based upon experimental measurements [105, 106], the two parameters (R0 and υr) approximately follow a Gaussian distribution and the drift exponent (υr) increases as R0 increases [98]. υr is calculated from (5.4) using a constant R0 and varies according to a Gaussian distribution $\mathcal{N}(\mu_\upsilon, \sigma_\upsilon^2)$; the calculated υr is used as the mean value of the Gaussian distribution (i.e. μυ). The resistance drift is calculated using the flowchart of Figure 147 by assuming that all levels have the same standard deviation to mean ratio (SDMR) [98].

Fig. 147. Flowchart of resistance drift calculation

Initially, the percentage resistance (denoted as %resistance) is selected and the crystalline fraction of a PCM cell (CX) is calculated. The initial PCM resistance (R0) is then generated. The drift coefficient (υr) is calculated using (41); as υr varies according to a Gaussian distribution, the mean of this distribution is set to υr. Its deviation is given by 0.2*υr as required for a 20% SDMR (Table 84) [98]. (39) is used to calculate the resistance drift of a PCM cell (RT) at time Toff (where Toff is the time that the PCM cell is allowed to drift). t0 is the time constant at which R0 is read, i.e. t0 is nearly equal to zero, because the initial resistance is read following a write operation (in this paper, t0 has a value given by 1ns).
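The last steps of this procedure can be sketched as a small Monte Carlo routine. The sketch assumes that (39) has the commonly used power-law form RT = R0 (Toff/t0)^υr; the function names, the mean drift exponent (0.05) and the initial resistance (100 kΩ) are hypothetical, and the calculation of υr from (41) is not reproduced here.

```python
import random


def drifted_resistance(r0, nu, t_off, t0=1e-9):
    """Power-law drift R_T = R0 * (t_off / t0) ** nu (assumed form of the drift law)."""
    if t_off < t0:
        return r0  # no drift is applied before the initial read time
    return r0 * (t_off / t0) ** nu


def sample_drift(r0, nu_mean, t_off, sdmr=0.2, rng=random):
    """Draw the drift exponent from a Gaussian with sigma = sdmr * mean, then drift."""
    nu = rng.gauss(nu_mean, sdmr * nu_mean)
    return drifted_resistance(r0, nu, t_off)


rng = random.Random(0)  # fixed seed so that the run is repeatable
samples = [sample_drift(100e3, 0.05, 1.0, rng=rng) for _ in range(1000)]
print(sum(samples) / len(samples))  # average drifted resistance after 1 second
```

Averaging over many samples corresponds to the average resistance under drift that is plotted versus Toff.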

Fig. 148. Average resistance of PCM vs Toff at different values of crystalline fraction (Cx)

The resistance drift behavior of the PCM cell is found by simulating and making its variation to follow a Gaussian distribution. Figure 148 presents the average resistance under drift as found at different crystalline fraction values (CX).

Flat Initial Level Separation

In this section, the PCM resistance is divided into regions. The following conditions are applicable to this process.

• The percentage of the write region at every level (w) is constant (1% is assumed).

• The number of PCM cells in each row for finding the threshold resistances (N) is given by 100.

• The number of PCM cells in the memory array (M) is given by 10,000.

The flat initial level separation is generated as follows. The percentage accuracy in a level changes during each simulation run as per the Gaussian distribution in PCM resistance; the PCM cell is simulated by using 100 runs and the average percentage accuracy of each level is found. The resistances in the regions are adjusted to balance the average percentage accuracies of the levels of a cell (figure 142); the least and largest values of percentage accuracy are then found.

Table 85. Flat initial PCM level separation, 4 levels/cell and write resistance is 1%

Level          1         2          3          4
Sw             0         5.2036     21.85697   99
W              1         6.2036     22.85697   100
Th             2.5       10.66245   41.00371   -
%Lower Band    -         98.424     98.9862    99.136
%Higher Band   98.8204   98.2047    97.9745    -

Table 85 shows the flat initial level separation of a cell with 4 levels at a 1% write region. Based on the simulation results of table 85, the starting value of the write resistance in level 1 (Sw1) is at 0% (the percentage accuracies of the lower band of level 1 and the higher band of level 4 are not considered). The percentage accuracies of both the lower and higher bands are close together at nearly 98%. Hence, a flat initial level separation is found.
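The flatness of this separation can be checked with a short sketch that computes the least and largest percentage accuracies of table 85 and their difference. The function name is hypothetical; bands that are not considered are marked with None.

```python
def accuracy_spread(lower_band, higher_band):
    """Return (least, largest, difference) over all considered band accuracies."""
    values = [v for v in lower_band + higher_band if v is not None]
    return min(values), max(values), max(values) - min(values)


# Percentage accuracies of table 85 (4 levels/cell, 1% write region).
lower = [None, 98.424, 98.9862, 99.136]
higher = [98.8204, 98.2047, 97.9745, None]
least, largest, spread = accuracy_spread(lower, higher)
print(least, largest, round(spread, 4))
```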

Table 86. Flat initial level separation for a PCM cell with 8 levels and write region is 1%

Level          1       2       3       4       5       6        7       8
Sw             0       2.12    5.21    9.99    17.67   30.73    54.13   99
W              1       3.12    6.21    10.99   18.67   31.73    55.13   100
Th             1.49    4.03    7.79    13.61   23.32   40.03    70.98   -
%Lower Band    -       79.34   80.13   81.08   81.49   81.60    81.28   81.12
%Higher Band   79.78   79.65   79.54   78.63   78.37   78.476   78.74   -

Table 86 shows the flat initial level separation of a cell with 8 levels and a 1% write region; the average percentage accuracies of the levels of this PCM are very close together, so the tolerance of every level due to drift is nearly the same.

Write Region

In a previous section, the PCM resistance is divided into multiple levels using the method of figure 142. In this section, the effects of the write region (w) (as found in the flat initial level separation) and the average percentage accuracy are considered.

Fig. 149. Flat initial level separation of a PCM cell with 4 levels

Figure 149 presents the percentage resistance of each region of a PCM cell with 4 levels. The initial level resistances change when the percentage of the write region (w) changes, because the percentage accuracies of the levels must be balanced to generate the flat initial level separation.


Fig. 150. Average percentage accuracy in each level of a PCM array at 4 levels per cell and Toff at 1 year

Fig. 151. Average percentage accuracy of levels in a PCM array at 4 levels per cell and Toff at 1 month and 1 year

Fig. 152. Average percentage accuracy of PCM (4 levels per cell and Toff at 1 ms, 1 second, 1 minute, 1 hour, 1 month and 1 year)


Figures 150 and 151 show the average percentage accuracy of each level when the PCM resistance is initially separated using the proposed technique. At a higher resistance percentage of the write region, the average percentage accuracy is lower due to the adjustment in resistance among regions; the simulation results of figure 152 show that the average percentage accuracy at a lower value of Toff is higher (however, the write operation at a higher percentage value of the write region (w) is faster than at a lower value).

Threshold Resistance

The threshold resistance is calculated from a row of PCM cells (figure 140) using the median method described previously. In this section, the number of PCM cells in a row for finding the threshold resistance of each level is furthermore considered. The relationship between the average percentage accuracy (A) and the number of PCM cells in a row (N) is also found.

Fig. 153. Average percentage accuracy of a PCM cell with 4 levels, 1% write region (Toff time of 1ms, 1second, 1 minute, 1 hour, 1 day, 1 week, 1 month and 1 year)

Figure 153 plots N versus A when finding the threshold resistances by using the proposed median method (at a constant write region percentage of 1%); when N is reduced, A is also substantially reduced, thus showing that the proposed method is viable for large memory arrays. At a higher number of PCM cells per row, the median value is more likely to account appropriately for the resistance drift. Moreover, as shown in figure 153, a further increase of N does not significantly affect A once it reaches a saturated value (albeit at different Toff values).

Median method for threshold resistance selection

As described in a previous section, a median method is used in this manuscript to select the threshold resistance of each level of a PCM cell. However, its value does not change following its calculation to account for the drift behavior; so, a corrective action in the form of a rewrite is required. The average percentage accuracy is increased by rewriting those cells on a row with the least and largest resistance values.
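The corrective rewrite can be sketched as follows. The function name and the row values are hypothetical; the sketch assumes that the cells holding the least and the largest resistances of the row are rewritten to the median value of that row.

```python
import statistics


def rewrite_extremes(row):
    """Return a copy of the row with its least and largest cells rewritten to the median."""
    med = statistics.median_low(row)
    out = list(row)
    out[row.index(min(row))] = med
    out[row.index(max(row))] = med
    return out


# Hypothetical row of reference resistances (kΩ) after drift.
print(rewrite_extremes([10, 50, 52, 55, 300]))
```

After the rewrite, the two outlying cells no longer widen the resistance distribution of the row, while the median itself is unchanged.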

Fig. 154. Average percentage accuracy of PCM cell at 1% write region, and Toff time is 1 second, 1 month, and 1 year vs number of read operations for proposed median method

Next, the percentage accuracies of the adjusted (denoted as Adjustment) and non-adjusted (also referred to as Basic) PCM resistances are compared. Figure 154 plots the average percentage accuracy when increasing the number of read operations, i.e. every time that a PCM cell is read, Toff is reset to zero. Due to the Gaussian distribution of the drift, the resistance value varies when the cell is read or written; hence, its average percentage accuracy is reduced if a PCM cell is read several times. The simulation results of figure 154 show that the adjusted PCM resistance can better tolerate the drift. Moreover, figure 154 shows that Toff affects the percentage accuracy of a PCM cell and the number of read operations that a PCM can tolerate, i.e. at a low Toff value, due to the smaller variation in resistance, the PCM can better tolerate the drift and the number of read operations. At a large number of read operations, the average percentage accuracies of both the adjusted and non-adjusted PCM resistance values are reduced, because the resistance variation is increased, i.e. after many read operations, the percentage accuracy drops to nearly zero, because the PCM resistance variation is very high.

4.4.2.5. Array-Level Simulation

After finding the level separation and flattening the percentage accuracy, the average percentage accuracy of a PCM cell must be considered at different number of levels. The PCM resistance is separated into 3, 4, and 8 levels per cell and the relationship between the average percentage accuracy and the time Toff is established (initially using 1% as a constant percentage write region).

Fig. 155. Average percentage accuracy of PCM cell at 1% write region

Figure 155 shows the average percentage accuracy of a PCM with multiple levels; the average percentage accuracy decreases at a higher number of levels, because less of the resistance range is available in each level for tolerating the drift in the storage medium.

Next, a comparison between PCM arrays with cells separated into 4 and 8 levels is pursued. The simulation results are shown in figure 155; at a Toff of 1 year, the average percentage accuracy of a 4-level cell is 98.3245%, while the average percentage accuracy of an 8-level cell is 79.1117%. So, an increase in memory density of the array causes a decrease in percentage accuracy. The percentage accuracy of an 8-levels/cell array is significantly less than for a 4-levels/cell array, so the number of rewrite operations in an 8-levels/cell array must be significantly higher than for a 4-levels/cell array. This results in slower write and read operations.
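The trade-off between density and accuracy can be made concrete with a rough estimate. The sketch assumes that a cell is in error with probability 1 − A/100, where A is the average percentage accuracy at a Toff of 1 year; this is an illustrative assumption and may differ from the way in which faulty cells are counted in the array-level simulations. The function names are hypothetical.

```python
import math


def bits_per_cell(levels: int) -> float:
    """Number of bits stored in one cell with the given number of levels."""
    return math.log2(levels)


def expected_faulty(accuracy_percent: float, cells: int) -> int:
    """Rough estimate of faulty cells, assuming an error probability of 1 - A/100."""
    return round((1 - accuracy_percent / 100) * cells)


for levels, accuracy in [(4, 98.3245), (8, 79.1117)]:
    print(levels, bits_per_cell(levels), expected_faulty(accuracy, 10**6))
```

Under this assumption, moving from 2 to 3 bits per cell raises the estimated number of cells requiring a rewrite in a 1M-cell array by more than an order of magnitude.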

Next, the number of faulty cells in a PCM array of 1M (10^6) cells (4 levels/cell) is considered. The number of cells per row (N) is 16 (as in a previous section); a mission time is also considered and Toff is now expressed as a percentage of the mission time.

Fig. 156. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 1% write region, and 16 PCM cells per row to find the resistance levels


Fig. 157. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 1% write region, and 32 PCM cells per row to find the resistance levels

Fig. 158. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 1% write region, and 64 PCM cells per row to find the resistance levels

Fig. 159. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 5% write region, and 16 PCM cells per row to find the resistance levels

Fig. 160. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 5% write region, and 32 PCM cells per row to find the resistance levels

Fig. 161. Average number of faulty cells in the 1M PCM array versus Toff (%); 4 levels/cell, 5% write region, and 64 PCM cells per row to find the resistance levels

Figures 156 to 161 show the number of faulty cells for different write regions (1% or 5%) and numbers of PCM cells in a row (16, 32, and 64); when Toff increases, the number of faulty cells also increases. Moreover, when changing the percentage of the write region from 1% to 5%, the number of faulty cells increases too.


Fig. 162. Average number of faulty cells in a 1M PCM array versus time (Toff); write region is 1% and mission time is 1 year

Fig. 163. Average number of faulty cells in a 1M PCM array versus time (Toff); write region is 5% and mission time is 1 year

As shown in figures 162 and 163, the number of faulty cells in an array made of PCM cells with 8 levels is very high when compared with arrays made of 3 and 4 levels/cell; so the number of re-write operations that must be executed to correct the cells affected by the drift behavior in an 8-levels/cell array is also higher.

Lifetime

In this section, a comparison of the so-called lifetime is pursued using the proposed flat PCM level separation and the resistance margin scheme [89]. The resistance margin between any adjacent states is increased in [89] to prevent the post-drift resistance levels from overlapping (figure 164 [89]). The margins between any two adjacent states are non-uniform and increase significantly; for example, a 5-fold resistance difference between any pair of adjacent states (i.e. Rstate00/Rstate01 = Rstate01/Rstate10 = Rstate10/Rstate11 = 5) allows data to be valid for 2 years at room temperature [89].
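The resistance range implied by a fixed ratio between adjacent states can be sketched as follows. The lowest state resistance of 7 kΩ is a hypothetical choice, made only so that the result can be compared with the 7kΩ – 200kΩ range used in this chapter; it is not taken from [89].

```python
def margin_levels(r_lowest, ratio=5.0, states=4):
    """State resistances when each adjacent pair differs by a fixed ratio."""
    return [r_lowest * ratio ** k for k in range(states)]


levels_kohm = margin_levels(7.0)
print(levels_kohm)                       # from the lowest to the highest state
print(levels_kohm[-1] / levels_kohm[0])  # ratio between highest and lowest states
```

With 4 states and a ratio of 5, the highest state is 125 times the lowest one, whereas the 7kΩ – 200kΩ range spans a factor of about 29.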

Fig. 164. PCM resistance separation by using resistance margin [89]

The resistance margin scheme of [89] has a few disadvantages. One of the most evident concerns the lifetime, i.e. the time for which the PCM is viable as storage (and its stored data is correct); the lifetime of a PCM cell is dependent on the resistance margin. If the resistance margin is set at a high value, the lifetime of the PCM cell is also high. However, the write time of [89] is rather slow due to the very high difference between the highest and lowest resistance levels. When using the flat PCM level separation scheme proposed in this paper, the write time is dependent on the resistance range of the PCM cell; this is faster than [89], because the resistance difference between levels of the proposed method is smaller than for [89].

Moreover, by using the proposed method, tolerance to drift is excellent and, unlike [89], it does not show a significant dependency on Toff. For a Toff of 15 years, a PCM cell (with 4 levels and a 1% write region) has an average percentage accuracy of 98.047% using the proposed method. This is significantly better than [89], in which the same PCM cell can tolerate the drift behavior for only 2 years. One of the reasons for this improvement is that the threshold resistance in the proposed scheme increases with time, while the threshold resistance of [89] remains unchanged.

Speed

Due to the drift behavior, when the PCM is operative for a long time, the difference in resistance between levels increases, causing a slower write time (when Toff is equal to zero, the write time from SET to RESET is 10ns [98]). For example after drifting for 1 second, the write time from SET to RESET is increased to 46.63ns. A refresh operation is then required. Reference resistances are however required for a correct execution of the refresh operation; each reference resistance is given by the value of its corresponding threshold resistance for the initial flat level separation (as described previously). So after the PCM has been used for a long time, the refresh operation resets the resistance of each level using a row of PCM cells (as used for threshold resistance, Figure 140) to the initial reference value. So the write time of the PCM cell is restored to nearly 10ns again, because the resistance difference between levels is comparatively very small.

Fig. 165. Write time of PCM cell from crystalline to amorphous phases vs. Toff; initial PCM resistance range is from 7kΩ to 200kΩ


Figure 165 plots the write time of a PCM cell after time Toff; the write time from the crystalline to the amorphous phase increases due to the resistance drift. To alleviate this problem, the value of the PCM must be refreshed by rewriting the PCM resistance to its initial value.

Consider next the read time; this is found by connecting the PCM cell to a sense amplifier. Assume that the bit line capacitance (CBL) is 5fF, and ignore the resistance of the transistor (its value is significantly smaller than the RESET resistance of the PCM). When the read current (Iread) is sent to the bit line (BL), CBL charges. When the bit line capacitance is fully charged, the voltage at BL is compared with the reference value. So at a read time tr,

$$I_{read} = \frac{Q}{t_r} = \frac{C_{BL} V_{sense}}{t_r} \qquad (68)$$

Since Vsense = Iread*RPCM, (68) is rewritten as

$$I_{read} = \frac{C_{BL} I_{read} R_{PCM}}{t_r} \qquad (69)$$

$$t_r = C_{BL} \times R_{PCM} \qquad (70)$$
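As a numerical check of (70), the read time can be evaluated for the initial SET and RESET resistances with the 5fF bit line capacitance assumed above. The function name is hypothetical, and the values are for the resistances prior to drift.

```python
def read_time(c_bl: float, r_pcm: float) -> float:
    """Read time t_r = C_BL * R_PCM, as in (70); inputs in farads and ohms."""
    return c_bl * r_pcm


C_BL = 5e-15  # 5 fF bit line capacitance
print(read_time(C_BL, 200e3))  # RESET state, 200 kΩ
print(read_time(C_BL, 7e3))    # SET state, 7 kΩ
```

Prior to drift, the read time is about 1 ns for the RESET state and about 35 ps for the SET state; as the resistance drifts upward, (70) predicts a proportional increase of the read time.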

Fig. 166. Read time of RESET resistance vs Toff; the initial PCM resistances of the SET and RESET are 7kΩ and 200kΩ

Figure 166 plots the read time of a PCM cell. The PCM resistance increases when Toff increases, so the read time also increases. The read time of PCM can be reduced by refreshing the PCM resistance to its initial value.

Fig. 167. Write and Read times of RESET resistance when initial PCM resistance (R0) is changed; Toff is 1 year and initial PCM resistance of SET state is 7kΩ

Figure 167 shows the write time when a PCM cell is programmed from the SET to the RESET state and the read time (i.e. the PCM cell is in the RESET state) for a 1-year drift and an initial resistance of the SET state of 7kΩ. The write and the read times of a PCM cell increase as the initial PCM resistance is increased.

Multilevel PCM Comparison

Next, the proposed threshold resistance selection method is compared with the time aware fault-tolerant scheme of [103]. [103] monitors the PCM memory and its lifetime using time tags; the lifetime information is utilized to adjust the quantization of the memory cell resistance and for ECC decoding. The quantization and analysis of the PCM resistance at each level require considering the lifetime (td) and performing complex calculations. Additional circuits are needed to calculate td, establish the relationship between the lifetime and the resistance drift and finally calculate the new values of the resistance for each PCM cell. Moreover, the lifetime of a PCM [103] is limited by the number of bits that are used to represent the time tag, i.e. if the number of bits is low, the lifetime estimate of [103] is not very accurate. The proposed method uses the PCM resistance as the lifetime indicator of the whole PCM array, while [103] uses a time tag per data word, because the time tag in each word is different. These overheads are not present in the proposed method, because only the median circuit is required to find the threshold resistances.

4.5. Conclusion

This chapter presents the concept of Phase Change Memory (PCM), an HSPICE macromodel of PCM, and applications of the phase change device. The proposed HSPICE macromodel matches the electrical characteristics with the operational features of a PCM; so it is able to comprehensively assess the cell with respect to the crystalline fraction during the programming operation, the continuous parameter change (such as the resistance) and the drift behavior commonly encountered in its operation. The proposed macromodel consists of two interrelated models that are used to capture different phenomena (such as the programming time and the temperature) that affect the drift of threshold voltage and resistance. The electrical based modeling by HSPICE allows the different relationships exhibited in the drift behavior to be accurately established.

An analytical framework has been proposed for the drift; this framework is versatile and it accounts for timing considerations such as the drift occurrence when the PCM is not being read or programmed. Simulation results have been presented; they show that the proposed macromodel is very accurate. The simulation results have been compared with experimental data (taken from [86] and [88]); the largest error is only 2%, thus confirming the viability and correctness of the proposed macromodel.

Once the HSPICE macromodel of PCM is established, the applications of the phase change device are considered. Two novel CAM and TCAM cells have been proposed in this chapter; the designs of these cells utilize a single phase change memory (PCM) as storage element. Compared with other PCM-based cells [23], the proposed cells operate on a voltage basis, hence making the search/comparison operations considerably simpler. A further novelty of this paper is the comparator circuit; in the proposed cells, this circuit is designed by using 1 (2) ambipolar transistor(s) for CAM (TCAM). The proposed comparator circuits have been shown to be superior to their CMOS-based counterparts. Also it should be noted that the ambipolar-based comparison circuit requires a significantly lower number of transistors (at most two, if the ambipolar transistors are implemented by CNTFETs for TCAM operation [101]). The search time of the TCAM comparison circuit is larger than the search time of the CAM comparison circuit; this is caused by the comparison circuit (of larger complexity for TCAM) and the discharging process of the match line voltage.

Different from [23], both proposed cells utilize a 1T1P core; it has been shown that, as expected, the write time of the core is linear in the range of the PCM resistance (i.e. from the amorphous to the crystalline phase). The power dissipation of the 1T1P memory cell is high at the beginning of the read operation, but it decays at higher read times; the high value of the initial power dissipation is due to transistor switching (from OFF to ON). When the state of the transistor is stable (i.e. the ON state) at higher read times, the power dissipation decays, reaching a low constant value. Also it has been shown that the write time of the 1T1P core changes depending on the PCM resistance range. As for the CAM and TCAM, the read time of a smaller PCM resistance range results in a larger value. The same effect has been observed for the PDP.

For TCAM operation, the value of the intermediate PCM resistance has been selected such that it is not biased toward either of the other two states, i.e. for the read operation, the voltage difference between state ‘0’ (with a resistance of 200kΩ) and the intermediate state must be nearly equal to the voltage difference between state ‘1’ (with a resistance of 7kΩ) and the intermediate state. This selection affects the performance of the proposed TCAM cell, but it ensures that its operation is highly scalable for array-level operation when multiple 1T1P cores are connected to a bitline.

An extensive assessment and comparison of the proposed memory cells with other CAM/TCAM cells using emerging technologies for non-volatile operation have been presented. Extensive simulation results have been obtained using HSPICE for the cells of [7, 23, 102]; the qualitative ranking of the simulation results is reported in Table 87 (88) for CAM (TCAM). These tables show that for CAM, the proposed cell offers advantages in circuit complexity as well as PDP performance. For TCAM, the proposed cell is superior in terms of write time too. The proposed cells have significant novelties; the use of a single PCM as non-volatile storage element for CAM/TCAM design is therefore viable in applications in which circuit complexity and power requirements are stringent.

Table 87. Ranking of non-volatile CAM cells

                        | Proposed | PCM [23] | Memristor (1T1M) [7] | MTJ [102] NAND | MTJ [102] NOR
Write time              | 2        | 2        | 3                    | 1              | 1
Search time             | 3        | 4        | 3                    | 1              | 2
PDP (Search)            | 1        | 2        | N/A                  | 3              | 4
Transistors/core        | 1        | 1        | 1                    | 3              | 2
Storage elements/core   | 1        | 1        | 1                    | 2              | 2
Reads prior to refresh  | 1        | 1        | 2                    | 1              | 1
Circuit complexity      | 1        | 5        | 2                    | 4              | 3

Table 88. Ranking of non-volatile TCAM cells

                        | Proposed | PCM [23] | Memristor (1T1M) [7]
Write time              | 1        | 2        | 3
Search time             | 2        | 1        | 2
PDP (Search)            | 1        | 2        | N/A
Transistors/core        | 1        | 1        | 1
Storage elements/core   | 1        | 2        | 1
Reads prior to refresh  | 1        | 1        | 2
Circuit complexity      | 1        | 3        | 2

Another application of PCM that is considered in this chapter is the PCM-based multilevel storage element. This paper has proposed a system-level scheme for alleviating the resistance drift in a multilevel phase change memory (PCM). The proposed scheme relies on separating the levels of a PCM cell and checking the correctness of the stored data in the presence of drift. The effects of time are assessed and a solution based on calculating the median value is proposed. The resistance of a PCM cell is initially divided into three regions (write, read, and blank) to account for the correct write/read operations as well as the drift (as occurring over time at higher resistance values). The percentage accuracy of a level of the PCM has been considered and balanced between levels to generate the so-called flat initial level separation. The impact of the write region has been analyzed based on this initial level separation; when the write region is increased, the blank region will have to decrease to compensate for the resistance newly allocated to the write region. Hence, the percentage accuracy will decrease at a larger write region. Moreover, it has been shown that the largest and least values of resistance in the row of the PCM cells that are used to find the threshold resistance of each region must also be rewritten to the median value to account for the resistance variation due to drift (as modeled by a Gaussian distributed process).

As for multiple storage, this paper has shown that when the number of levels is increased, the average percentage accuracy of a PCM cell decreases, thus also resulting in more cells holding erroneous data. Hence, the increase in storage density also causes a decrease in drift tolerance due to the smaller separation between levels. The proposed threshold resistance selection and the time aware fault tolerance of [103] have been compared. It has been shown that the proposed method is significantly simpler than [103], because only the median calculation circuit is required to find the PCM resistance at each level. It has been shown that the proposed method is significantly better than the approach of [89] in which the margin of PCM resistance at each level is predefined, thus having a limited tolerance to the resistance drift. The impact of the increase in write/read time due to the resistance drift has also been investigated; a refresh operation has been proposed to address this problem. This operation returns the PCM resistance to its initial value, thus also decreasing the write/read time.

V. RACETRACK MEMORY

5.1. Introduction

Racetrack Memory (RM) (also commonly known as a domain-wall memory) is another example of an emerging technology for storage; it is based on the current-induced domain wall motion in magnetic nanowires [108, 109, 110]. A hybrid cell made of a CMOS circuit and a RM has the potential to provide substantial advantages over a flash memory for non-volatile storage.

Features such as fast switching time, lower power consumption, good endurance, excellent retention, and multilevel capability [111] make RM one of the most promising candidates for the next generation of stand-alone and embedded memories [108]. Moreover, CMOS integration and fast data-access can be achieved [108] by combining the RM with magnetic tunnel junction (MTJ) nanopillars as read and write heads.

The first RM prototype (with a capacity of 256 bits) was based on in-plane magnetic anisotropy [112]. However, the two in-plane magnetization directions of the storage layer were separated by a low energy barrier, thus leading to short data retention at nanometric feature sizes [113]. This limitation however has been overcome by utilizing perpendicular magnetic anisotropy (PMA) in CoFeB/MgO structures as a high energy barrier [114, 115]. Density, speed and power dissipation of a PMA-based RM are significantly improved compared to an in-plane RM [115, 116]. Moreover, the write head circuitry can take advantage of the nucleation current and higher switching speed, while the read head circuitry [117] relies on the high energy barrier for data retention.

This chapter presents the fundamentals of racetrack memory, its HSPICE macromodel, and its memory applications.


5.2. Fundamental of Racetrack Memory (RM)

Racetrack Memory is a new concept of a Magnetic RAM (MRAM); it is based on controlling the domain wall (DW) motion in ferromagnetic nanowires [118, 119, 120]. Its data is kept in the form of a magnetization direction between two artificial potentials (or constrictions) to pin the DW (so, no current pulse is applied) [121]. The distance between two constrictions (W) can be extremely small (in the range of a few nanometers) for extremely high density and compact storage (in excess of a GB within a small die). The scalability potential of a RM is one of the most pronounced advantages compared to other nonvolatile memories [121]. A RM consists of three parts: the write head, the read head and the propagation part.

• The write head nucleates a local domain in the magnetic stripe by spin-transfer torque (STT).

• The read head detects the stored data in the racetrack cell through the tunnel magneto-resistance (TMR) effect [108].

• The propagation part controls the motion of the domain wall and drives the stored data from the write to the read heads.

The basic cross sectional structure of a RM is presented in Figure 168.

Fig. 168. The cross-section structure of racetrack memory. At the back-end process, the magnetic stripe is implemented above the CMOS/MTJ interfacing circuits, nodes Rin and Rout are for reading, Win and Wout for writing, and Pin and Pout for the shift operation [121].


A CoFeB magnetic stripe is separated by constrictions to store data; two CoFeB/MgO MTJs are used as write and read heads. CMOS circuits are required to write, read, or shift data in the cell and generate the corresponding voltages. The number of data bits stored in the cell is equal to the number of constrictions. The CMOS circuit dominates the whole area of a RM as the magnetic stripe is implemented at the back-end through a 3D integration in a manner similar to a MRAM [121]. The widths of the write and read heads are different. For the write operation, a lower resistance with a larger width is required to reduce the rate of the oxide barrier breakdown, which is one of the most significant constraints in a high-speed STT switching mechanism [121]. A high resistance of the MTJ1 with a smaller width is needed for reading, because it can greatly improve sensing [121, 122].

5.3. Macromodel of Racetrack Memory

A macromodel is needed to simulate the electrical characteristics of a RM; [121] presents a Verilog-A model of a RM consisting of three parts, i.e. the write head, the read head and the propagation part.

• The write head operates in the write operation, i.e. when the write voltage is larger than its critical value and the write time is larger than the write duration; the write data varies depending on the polarity of the write voltage.

• The read head operates in the read operation of a RM and is dependent on the value of the stored data.

• The propagation part implements the shift operation of a RM. The shift operation operates when the shift voltage is larger than the critical shift value. The shift direction depends on the polarity of the shift voltage.

The macromodel of [121] can simulate some of the electrical characteristics of a RM; however, there are a few substantial limitations in this model.

• In the write head, the write voltage is limited only to a pulse waveform. If the write voltage consists of a triangle waveform, the model of [121] generates incorrect results.

• The shift operation of [121] is only partially completed. Its track is shifted if the shift voltage is larger than the critical value, even when the shift voltage is only a voltage spike. Moreover, the shift duration of pulses of a different amplitude is the same.

Furthermore, the model of [121] is not compatible with HSPICE and cannot be used for a full electrical-based simulation.

5.3.1. Proposed Macromodel of Racetrack Memory

This chapter introduces a new HSPICE macromodel of a RM, such that the write, read and shift operations can be simulated. MATLAB is used as a tool to generate the HSPICE code, thus making it flexible also with respect to operational features.

Fig. 169. Flowchart of the proposed HSPICE macromodel of a RM

Figure 169 shows the flowchart of the proposed HSPICE macromodel of a RM; the macromodel has 6 terminals: the input write head (Win), the output write head (Wout), the input read head (Rin), the output read head (Rout), the input propagation (Pin) and the output propagation (Pout). The write, read and shift operations are simulated depending on the track (index) that is connected to the write and read heads.

• In the write operation, the data stored in the track that is connected to the write head is sent to the write circuit. When the write voltage (Vwrite) is provided, the proposed model finds the write frequency and the write percentage in each time step. When the total write percentage is larger than 100, the write operation is executed (and the write data is dependent on the polarity of the write voltage). When the data is written into the memory cell, its value is also written into the main circuit. The data of the racetrack that is connected to the write head is adjusted.

• The read operation executes on the data stored in the track (index) that is connected to the read head; the data is then provided at the output.

• The shift operation closely resembles the write operation. When a shift voltage is provided, the proposed macromodel establishes the shifting speed and the shift percentage in each time step. When the total shift percentage is larger than a bound (in this case given by 100), the shift operation starts. The write and read head indices are then changed.

Fig. 170. Track separation; the tracks labeled F are fixed, while the other tracks (V) are variable

The total number of tracks must be considered, because the racetrack indices of the write and read heads change in the macromodel. The racetrack in Figure 170 is divided into N tracks. Nb denotes the number of variable bits in the racetrack (each with a binary value of '1' or '0'). FL and FR denote the numbers of tracks of fixed layer; they are used to connect the write and read heads when the first (index 5) and the last (index 12) variable tracks are read and written respectively.

The values of FL and FR depend on the locations of the write and read heads. If the read head is to the right of the write head, so for example the write head has an index 1 (1) while the read head has an index 2 (3), then FL is equal to 1 (2). The number of fixed layers on the right side (FR) is equal to the number of fixed layers on the left side (FL); in the example, the number of tracks in FL is 1 (2), so FR is also 1 (2).

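The total number of tracks (N) of the proposed macromodel is given as follows.

$$F_L = F_R = \left|\, \mathit{index}[\mathit{write\ head}] - \mathit{index}[\mathit{read\ head}] \,\right| \qquad (71)$$

$$N = N_b + (2 \cdot F_L) \qquad (72)$$

where FL is the number of fixed tracks on the left and FR is the number of fixed tracks on the right. The magnitude is taken in (71) so that FL is positive when the read head is to the right of the write head, as in the example above.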

So, the HSPICE macromodel of a RM can be computed after the total number of tracks (N), the number of fixed tracks on the left (FL) and the number of fixed tracks on the right (FR) are found.
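The track bookkeeping of (71) and (72) can be illustrated with a minimal Python sketch. The function name `track_layout` and its arguments are hypothetical and are not part of the proposed macromodel; the head separation is taken as a magnitude, which is an assumption consistent with the example in the text (write head at index 1, read head at index 2, FL = 1).

```python
def track_layout(nb, write_index, read_index):
    """Return (FL, FR, N) for a racetrack with nb variable bits.

    Hypothetical helper: the head separation is taken as a magnitude,
    matching the worked example in the text.
    """
    fl = abs(write_index - read_index)  # (71): fixed tracks on the left
    fr = fl                             # (71): same number on the right
    n = nb + 2 * fl                     # (72): total number of tracks
    return fl, fr, n


if __name__ == "__main__":
    # Assumed inputs: 8 variable bits, heads 4 tracks apart.
    fl, fr, n = track_layout(8, 1, 5)
    first_variable = fl + 1   # index of the first variable track
    last_variable = n - fr    # index of the last variable track
    print(fl, fr, n, first_variable, last_variable)
```

With these assumed inputs the first and last variable tracks fall at indices 5 and 12, the same index values that the text quotes for the variable tracks.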

5.3.1.1. Propagation Part

The propagation part controls the movement of the index in the RM by tracking the shift voltage (Vshift); the shift operation is performed by considering the voltage difference between Pin and Pout.

Fig. 171. Main circuit of the propagation part


Figure 171 presents the circuit of the shift operation in a RM. The resistor Rprop denotes the propagation resistance (shift resistance) whose value is given by [121].

$$R_{prop} = \frac{\mathit{rau} \cdot c}{b \cdot \mathit{thick\_f}} \qquad (73)$$

where rau is its resistivity, c is the total length of the racetrack, b is the width of the racetrack and thick_f is the thickness of the free layer.

The index of the RM is shifted if the shift voltage is larger than the critical shift voltage; so, the comparison between the (nominal) shift and the critical shift voltages must be generated. A comparison between the shift and the critical shift voltages is found by utilizing a voltage controlled voltage source (VShfC) with a value given by (74), i.e.

$$V_{ShfC} = |V_{shift}| - |I_{c0} \cdot R_{prop}| \qquad (74)$$

where

$$V_{shift} = V_{Pin} - V_{Pout} \qquad (74.1)$$

$$I_{c0} = b \cdot \mathit{thick\_f} \cdot J_{c0} \qquad (74.2)$$

and Ic0 is the critical shift current of the RM [121]. VPin and VPout are the voltages at nodes Pin and Pout. Jc0 is the critical shift current density.
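As a hedged illustration of (73) to (74.2), the following Python sketch evaluates the shift voltage comparison numerically. The function name and the 1/0 output encoding of node ShfOut are assumptions made for this example; the HSPICE macromodel realizes the same comparison with controlled sources and switches.

```python
def shift_compare(v_pin, v_pout, rau, c, b, thick_f, jc0):
    """Return (V_ShfC, ShfOut) for the given terminal voltages.

    Hypothetical helper: ShfOut is modeled as 1.0 when the shift voltage
    exceeds the critical shift voltage and 0.0 otherwise.
    """
    r_prop = rau * c / (b * thick_f)            # (73) propagation resistance
    ic0 = b * thick_f * jc0                     # (74.2) critical shift current
    v_shift = v_pin - v_pout                    # (74.1) shift voltage
    v_shfc = abs(v_shift) - abs(ic0 * r_prop)   # (74) comparison voltage
    shf_out = 1.0 if v_shfc > 0 else 0.0
    return v_shfc, shf_out


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise both branches.
    print(shift_compare(4.0, 0.0, rau=2.0, c=3.0, b=1.0, thick_f=1.0, jc0=0.5))
    print(shift_compare(0.0, 2.0, rau=2.0, c=3.0, b=1.0, thick_f=1.0, jc0=0.5))
```

Note that the product of (73) and (74.2) reduces to Jc0 times rau times c, so in this sketch the critical shift voltage does not depend on the cross-section of the stripe.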


Fig. 172. a) Shift voltage comparison circuit; b) Shift polarity checking circuit


Figure 172a presents the shift voltage comparison circuit; the switches ShCu and ShCd are ON when the voltage at node ShfC is positive and negative respectively. The voltage at node ShfOut is 1V (0V) if the shift voltage is larger (smaller) than the critical shift voltage.

The shift direction is dependent on the polarity of the shift voltage (Figure 168); so, if the shift voltage is positive, the RM moves to the left (i.e. the index increases). However if the shift voltage is negative, the track of the RM moves to the right and the index value decreases. This is implemented by the circuit in Figure 172b; the switches ShP1 and ShP2 are ON when the shift voltage is positive and negative respectively, so the polarity of the shift voltage is given by

$$V_{ShfPol} = 2V_{ShfPol1} - 1 \qquad (75)$$

For implementing (75) in the simulation model, a voltage controlled voltage source is utilized. VShfPol is 1V if the shift voltage (Vshift) is positive; otherwise, the voltage at node ShfPol (VShfPol) is -1V. As shown in [121], the shift operation occurs if the shifting time is longer than the shift duration (as given in (76)). The shift percentage in each time step must also be considered; from [121],

$$\mathit{Shift\_Duration} = \left| \frac{c \cdot b \cdot \mathit{thick\_f} \cdot R_{prop}}{P \cdot F_a \cdot V_{shift}} \right| \qquad (76)$$

$$V_{Perp} = \frac{t_{start} \cdot 100 \cdot V_{ShfPol}}{\mathit{shift\_duration}} \qquad (77)$$

where P is the polarization rate, Fa is the factor of the velocity, and shift_duration is the time in which the RM is shifted to the next index. VPerp is the shift percentage in each time step of the RM and tstart is the initial time step of the simulation.

As given in (76), the shift duration is dependent on the shift voltage; therefore, the total shift percentage must be considered after finding the shift percentage in each time step.
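The dependence of the per-step shift percentage on the shift voltage, (75) to (77), can be sketched in Python as follows. The function name, the zero-voltage guard and the numerical values are assumptions for illustration only.

```python
def shift_percentage_step(v_shift, t_step, c, b, thick_f, r_prop, p, fa):
    """Shift percentage contributed by one time step, following (75)-(77).

    Hypothetical helper: a zero shift voltage is treated as contributing
    nothing, which avoids the division by zero in (76).
    """
    if v_shift == 0:
        return 0.0
    # (76): time needed to move the domain wall by one index
    shift_duration = abs(c * b * thick_f * r_prop / (p * fa * v_shift))
    # (75): polarity is +1 for a positive shift voltage, -1 otherwise
    pol = 1.0 if v_shift > 0 else -1.0
    # (77): signed percentage of one index covered in this time step
    return t_step * 100.0 * pol / shift_duration


if __name__ == "__main__":
    common = dict(t_step=0.25, c=1.0, b=1.0, thick_f=1.0, r_prop=2.0, p=1.0, fa=1.0)
    for v in (2.0, 4.0, -2.0):
        print(v, shift_percentage_step(v, **common))
```

In this sketch a pulse of twice the amplitude contributes twice the percentage per step, which is the amplitude dependence that the text reports as missing in the model of [121].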


Fig. 173. Previous shift percentage circuit

Figure 173 shows the previous shift percentage circuit. The voltage source Pershift is equal to the voltage at node Pershift; switches Spsh1 and Spsh2 are ON when the pulse voltage (Vpulse) is 1V and 0V respectively. Switches Spi1 and Spi2 are ON only at the beginning of the simulation such that the initial voltages at nodes Pshf1 and Pshf2 are 0V; following this step, the switches are OFF. The previous shift percentage of the proposed macromodel of a RM is found as

$$V_{Pshprev} = V_{Pshf2} \cdot V_{pulse} + (1 - V_{pulse}) \cdot V_{Pshf1} \qquad (78)$$

where VPshprev is the shift percentage of the previous time step and Vpulse toggles between 1V and 0V at every time step.
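The two-node arrangement behind (78) behaves like a ping-pong buffer: while one node is being updated, the other still holds the value of the previous time step. The Python sketch below illustrates this idea. The class name is hypothetical, and the choice of which node is overwritten for each value of Vpulse is an assumption inferred from (78), not a detail confirmed by the circuit description.

```python
class PreviousValueHolder:
    """Ping-pong storage of the previous time step value, as in (78)."""

    def __init__(self, initial=0.0):
        self.node1 = initial   # models node Pshf1
        self.node2 = initial   # models node Pshf2
        self.pulse = 1         # models Vpulse, toggling every time step

    def previous(self):
        # (78): read the node that is NOT being written in this step
        return self.node2 * self.pulse + (1 - self.pulse) * self.node1

    def store(self, value):
        # Assumption: node1 is written while pulse is 1, node2 otherwise.
        if self.pulse == 1:
            self.node1 = value
        else:
            self.node2 = value
        self.pulse = 1 - self.pulse  # advance to the next time step


if __name__ == "__main__":
    holder = PreviousValueHolder()
    for current in (10.0, 25.0, 40.0):
        print("previous:", holder.previous(), "current:", current)
        holder.store(current)
```

The same pattern is reused in the text for the previous write head index (82) and the previous write percentage (91).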

After finding the previous shift percentage of a RM, the total shift percentage of the current time step is given by

$$V_{Pshfs1} = (V_{Pshprev} + V_{Perp}) \cdot V_{ShfOut} \qquad (79)$$

The total shift percentage is found using (79) if the shift voltage is larger than the critical shift voltage; otherwise, its value is 0V. The RM is shifted if the total shift percentage is larger than 100 (with its shift direction controlled by the polarity of the shift voltage); the number of shifted bits (VNbshf) is then found from (80), while the shift percentage after the shift operation (VPershift) is given in (81).

$$V_{Nbshf} = \mathrm{int}\!\left( \frac{V_{Pshfs1}}{100} \right) \qquad (80)$$

$$V_{Pershift} = \left( \frac{V_{Pshfs1}}{100} - V_{Nbshf} \right) \cdot 100 \qquad (81)$$

The shift operation changes the index of the RM; so for control, the racetrack indices that are connected to the write and read heads must also be found. Since the distance between the write and read heads is known, only the racetrack index that connects to the write head must be considered (a process similar to the one used for finding the total shift percentage is utilized). The write head index of the previous time step is therefore found by using the previous write head index circuit (Figure 174).

Fig. 174. Previous index write head circuit

A voltage source LocW is used for finding the current RM index that is connected to the write head (Figure 174). Switches SLoc1 and SLoc2 are ON when Vpulse is 1V and 0V respectively.

By setting the initial voltages at nodes LocWP1 and LocWP2 to the initial value of the write head index, the write head index of the previous time step is then given by

$$V_{Locwprev} = V_{LocWP2} \cdot V_{pulse} + (1 - V_{pulse}) \cdot V_{LocWP1} \qquad (82)$$

The current write head index is found as the sum of the write head index of the previous time step and the number of shift indices in the same (current) time step. The new write head index of a RM is therefore given by

$$V_{LocW} = V_{Locwprev} + V_{Nbshf} \qquad (83)$$

The least and largest voltages at node LocW are 1 and N-FR respectively, because the write head index of a RM is limited to this range. Another condition is required when calculating the shift percentage: the total shift percentage is set to 0 if the write head index reaches its least or largest value and the shift voltage is negative or positive respectively. Voltage controlled voltage sources are employed; the value of the voltage source VLmin (VLmax) is 1 if the index of the write head is also 1 (N-FR), otherwise it is 0.

Fig. 175. Shift percentage controlled circuit

Figure 175 shows the circuit that is used to control the shift percentage; this circuit resets the total shift percentage to 0 when the write head index reaches its least or largest value (i.e. the shift voltage is negative, or positive respectively). Switches ShLmn and ShLmx are ON when the voltages at nodes Lmin and Lmax are 1V, otherwise they are OFF. Switches ShLp and ShLn are ON when the voltage at node ShPol1 is positive and negative respectively. The switches ShLmni and ShLmxi are ON when the voltages at nodes Lmin and Lmax are 0V, i.e. the write head index is at neither its least nor its largest value; otherwise, they are OFF. The total shift percentage (given previously in (81)) is now given by

$$V_{Pershift} = \left( \frac{V_{Pshfs1}}{100} - V_{Nbshf} \right) \cdot 100 \cdot V_{ShfBound} \qquad (84)$$

where VShfBound is 1V or 0V, i.e. equal to the voltage at node ShfLB.

As there is a voltage drop across a switch in the shift percentage controlled circuit, the output voltage (i.e. the voltage at node ShfLB) does not fully swing between 1 and 0V. So, a new voltage source (for the voltage at node ShfBound, VShfBound) is needed to adjust the controlled voltage for a full swing.
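The accumulation and index update of (79) to (84) can be summarized in one Python function. This is a sketch under stated assumptions: `int()` in (80) is taken to truncate toward zero, the index is clamped to the range 1 to N-FR, and the bound check uses the updated index. The function name and argument names are hypothetical.

```python
import math


def shift_update(prev_pct, step_pct, shf_out, pol, loc_w_prev, n, fr):
    """Return (new write head index, carried-over shift percentage)."""
    total = (prev_pct + step_pct) * shf_out         # (79) total percentage
    nb_shf = math.trunc(total / 100.0)              # (80) whole indices shifted
    loc_w = loc_w_prev + nb_shf                     # (83) new write head index
    loc_w = max(1, min(n - fr, loc_w))              # index limited to 1 .. N-FR
    at_min = loc_w == 1 and pol < 0                 # cannot move further down
    at_max = loc_w == n - fr and pol > 0            # cannot move further up
    shf_bound = 0.0 if (at_min or at_max) else 1.0  # models V_ShfBound
    pershift = (total / 100.0 - nb_shf) * 100.0 * shf_bound   # (84)
    return loc_w, pershift


if __name__ == "__main__":
    # 80 % carried over plus 45 % in this step: one shift, 25 % remains.
    print(shift_update(80.0, 45.0, 1.0, +1, loc_w_prev=3, n=16, fr=4))
    # Same step at the upper bound: index stays, percentage is reset.
    print(shift_update(80.0, 45.0, 1.0, +1, loc_w_prev=12, n=16, fr=4))
```

Carrying the remainder forward, rather than discarding it, is what allows shift pulses of different amplitudes to accumulate toward the next index at different rates.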

5.3.1.2. Write Head

The write head of a RM is required in the write operation. The main model of the write head is shown in Figure 176. Nodes Win and Wout are the input and output nodes of the write head, while the resistor Rwsen is used to detect the current that passes through the write head. Resistors Rw1, Rw2, …, Rw(X) denote the resistances in each track of the RM, while Sw1, Sw2, …, Sw(N-1) are the selection switches (ON only when the corresponding track is connected to the write head).

The write head operates as follows.

Fig. 176. Main circuit of the write head

As mentioned previously, the least value of the write head index is 1 (i.e. it occurs when the first bit of RM is read), while the largest value of the write head index is N-FR (occurring when the last variable track of RM is written). The value of the index X in Figure 176 is N-FR. The write head index must be considered next. The switches in the main circuit of the write head are ON when the selected index is connected to the write head. The switches are controlled by

$$V_{Contw} = V_{LocW} \qquad (85)$$

Equation (85) gives the voltage that controls the write head switches. For example, if VContw is 3V, then switch Sw3 is ON and all other switches are OFF.

Similar to the shift operation, the polarity of the write voltage and the comparison between the write voltage and its critical write voltage must be considered next.


Fig. 177. a) Write polarity checking circuit b) Write voltage comparison circuit

Figure 177a shows the write voltage polarity checking circuit; switches SwPol1 and SwPol2 are ON when the write voltage is positive and negative respectively. The voltage at node wPol1 is 1V (0V) if a positive (negative) voltage is dropped across the write head. Therefore, using voltage controlled voltage sources,

$$V_{write} = V_{win} - V_{wout} \qquad (86)$$

$$V_{wPol} = 2V_{wPol1} - 1 \qquad (87)$$

where Vwrite is the write voltage and VwPol is the polarity of the write voltage. In (86) and (87), the voltage at node wPol is 1 (-1) if a positive (negative) voltage is dropped across the write head.

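Voltage controlled voltage sources are employed for the write and critical write voltage values [121]. The voltage at node WrC is equal to (88), so in the simulation model the comparison is given by

$$V_{WrC} = \left( |V_{write}| - |I_{cP} \cdot V_{Rwrite}| \right) \cdot 100 \qquad (88)$$

$$I_{cP} = g_p \cdot \mathit{surface} \qquad (88.1)$$

$$g_p = \alpha \, \frac{\gamma \, e}{40\pi \cdot \mu_B \cdot \mathit{Polar}} \, M_s H_k t_{sl} \qquad (88.2)$$

$$\mathit{Polar} = \frac{\sqrt{TMR \cdot (TMR + 2)}}{2 \cdot (TMR + 1)} \qquad (88.3)$$

$$V_{Rwrite} = V_{RwP} \cdot V_{DataW} + (1 - V_{DataW}) \cdot V_{RwAP} \qquad (88.4)$$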

where IcP is the critical current for the write operation (88.1), gp is the critical current density (88.2) [121], and surface denotes the value of the MTJ surface. α is the Gilbert damping coefficient, γ is the gyromagnetic ratio (Hz/Oe), e is the elementary charge, Ms is the saturation field in the free layer (Oersteds), Hk is the perpendicular anisotropy field (Oersteds), and tsl is the height of the free layer. µB is the Bohr magneton constant, Polar denotes the polarization state, TMR is the tunnel magnetoresistance with zero voltage bias, and VRwrite is the racetrack resistance when its index is connected to the write head. VDataW denotes the data stored in the track whose index is connected to the write head. VRwP and VRwAP are the write head resistances of a RM when its track is parallel and anti-parallel to the fixed track respectively. A constant factor (given by 100) is multiplied in (88) to the difference between the write voltage and the critical write voltage values for ease of detection. The write voltage comparison circuit is shown in Figure 177b. Switches SwCu and SwCd are ON when the voltage at node WrC is positive and negative respectively; therefore, the voltage at node WrOut is 1V if the write voltage is larger than the critical write value. Otherwise, the voltage at node WrOut is 0V.
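A minimal Python sketch of (88) to (88.4) is given below to make the chain from TMR to the write comparison explicit. The function names are hypothetical, all parameters must be supplied by the caller in the unit system of [121], and the 1/0 output stands in for the voltage at node WrOut.

```python
import math


def polarization(tmr):
    """(88.3): polarization state derived from the zero-bias TMR."""
    return math.sqrt(tmr * (tmr + 2.0)) / (2.0 * (tmr + 1.0))


def critical_write_current(alpha, gamma, e, ms, hk, tsl, mu_b, tmr, surface):
    """(88.1)-(88.2): critical current of the write head."""
    gp = alpha * gamma * e * ms * hk * tsl / (40.0 * math.pi * mu_b * polarization(tmr))
    return gp * surface


def write_resistance(r_wp, r_wap, data_w):
    """(88.4): parallel resistance for data 1, anti-parallel for data 0."""
    return r_wp * data_w + (1 - data_w) * r_wap


def write_compare(v_write, icp, r_write):
    """(88): returns 1.0 if |V_write| exceeds the critical write voltage."""
    v_wrc = (abs(v_write) - abs(icp * r_write)) * 100.0
    return 1.0 if v_wrc > 0 else 0.0


if __name__ == "__main__":
    # Placeholder values, chosen only to exercise the functions.
    r = write_resistance(r_wp=500.0, r_wap=1100.0, data_w=1)
    print(polarization(1.0), write_compare(1.0, 1e-3, r), write_compare(0.4, 1e-3, r))
```

Because the resistance in (88.4) depends on the stored data, the same write voltage can be above the critical value for one stored state and below it for the other.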

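The write operation is performed if the write time is longer than its write duration [121].

The write duration and the write percentage in each time step of a racetrack cell are given by

$$\mathit{write\_duration} = \left| \left[ CC + \log\!\left( \frac{\pi^2}{4} \cdot \frac{E_m}{K_b T \cdot 40\pi} \right) \right] \cdot \frac{e \cdot 1000 \cdot M_s \cdot \mathit{surface} \cdot t_{sl} \cdot (1 + PP^2)}{4\pi \cdot 2 \cdot \mu_B \cdot PP \cdot 10^4 \cdot \left|\, |I_{senw}| - I_{cP} \,\right|} \right| \qquad (89)$$

$$E_m = \frac{M_s \cdot t_{sl} \cdot \mathit{surface} \cdot H_k}{2} \qquad (89.1)$$

$$V_{Perw} = \frac{t_{start} \cdot 100 \cdot V_{WPol}}{\mathit{write\_duration}} \qquad (90)$$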

where CC is Euler's constant, Em is the variable of the Slonczewski model. Kb is Boltzmann’s constant. T is the room temperature (in Kelvin). PP is the electron polarization percentage. Isenw is the current that passes through the write head; its value can be found from the current that passes through the resistor Rwsen in the main circuit of the write head. VPerw is the write percentage in each time step. tstart is the initial time step. The write_duration is the write time that is required before the RM is written.
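The write duration of (89) and the per-step write percentage of (90) can be evaluated with the Python sketch below. Several points are assumptions: `log` is taken as the natural logarithm, CC is taken as the Euler-Mascheroni constant, and the function names are hypothetical. The expression is undefined when the sensed current equals IcP, which the sketch does not guard against.

```python
import math

EULER_GAMMA = 0.5772156649015329  # assumed meaning of "Euler's constant" CC


def barrier_energy(ms, tsl, surface, hk):
    """(89.1): E_m, the energy term of the Slonczewski model."""
    return ms * tsl * surface * hk / 2.0


def write_duration(i_senw, icp, em, kb, temp, e, ms, surface, tsl, pp, mu_b):
    """(89): time required before the track is written."""
    log_term = math.log((math.pi ** 2 / 4.0) * (em / (kb * temp * 40.0 * math.pi)))
    numerator = e * 1000.0 * ms * surface * tsl * (1.0 + pp ** 2)
    denominator = 4.0 * math.pi * 2.0 * mu_b * pp * 1e4 * abs(abs(i_senw) - icp)
    return abs((EULER_GAMMA + log_term) * numerator / denominator)


def write_percentage_step(t_step, w_pol, duration):
    """(90): signed write percentage contributed by one time step."""
    return t_step * 100.0 * w_pol / duration


if __name__ == "__main__":
    # Placeholder parameters, not device values.
    args = dict(icp=1.0, em=1000.0, kb=1.0, temp=1.0, e=1.0, ms=1.0,
                surface=1.0, tsl=1.0, pp=0.5, mu_b=1.0)
    d_low = write_duration(2.0, **args)
    d_high = write_duration(3.0, **args)
    print(d_low > d_high)  # a larger overdrive current shortens the write
```

Since the duration is recomputed from the sensed current in every time step, a non-rectangular write waveform (for example a triangle) accumulates percentage at a varying rate.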

Fig. 178. Previous write percentage circuit

The write percentage in each time step is given in (90) after finding the write duration of the RM (89). For simulation, the write percentage of the previous time step must be retained to calculate the total write percentage. Figure 178 presents the previous write percentage circuit. The value of the voltage source Perwrite corresponds to the voltage at node Perwrite. The switches SperW1 and SperW2 are ON when Vpulse is 1V and 0V respectively. The previous write percentage is found by setting the initial voltages of nodes Pw1 and Pw2 to 0V, i.e.

$$V_{Pwprev} = V_{Pw2} \cdot V_{pulse} + (1 - V_{pulse}) \cdot V_{Pw1} \qquad (91)$$

The total write percentage is the sum of the write percentages of the previous and the current time steps. When the total write percentage reaches the upper bound (in this case 100), the write operation starts and the data is written. The total write percentage is related to several parameters and the operations in the RM. Control of its value is therefore required; the following conditions are required to control the total write percentage of the RM.

Reset on Shift

The percentage of the write operation at the write head must be reset to 0 when the track is shifted. So, a comparison between the current and the previous write head indices is performed, i.e.

$$V_{LocComp} = V_{LocW} - V_{Locwprev} \qquad (92)$$

Equation (92) is the difference between the current and the previous write head indices. If the voltage at node LocComp is 0, the current write head index is the same as the write head index of the previous time step; otherwise, the voltage at node LocComp is nonzero. The voltage at node LocpRes depends on VLocComp; if VLocComp is 0V, VLocpRes is 1V, otherwise VLocpRes is 0V. The shift operation of a RM is initiated and the total percentage write operation is reset to 0 when the RM is shifted to the other index.

Reset on Matching

This condition is related to the total percentage of the write operation when the stored data is matched with the write data. When consecutively applying the required write voltages (in a sequence of positive, negative and positive values) to the same racetrack index, the total write percentage changes, i.e. according to the above sequence, it increases, decreases and increases. As the write percentage is not at 0% when the polarity of the write voltage changes, the write operation may take longer; this condition addresses this issue.


Fig. 179. Data stored in each track of the RM

So, the stored and the write data must be monitored; Figure 179 shows the circuit to store all data in the RM. X and Y are the index numbers of the racetrack. The values of X and Y in Figure 179 are given by (FL+1) and (N-FR) respectively, i.e. same as the values of the indices of the free layer of the RM. Switch ScX controls the racetrack index that connects to the write head; it is ON when VContw is equal to its index (X), otherwise it is OFF. Switches SpX and SnX control the total write percentage of the RM (VPerwrite). They are ON when VPerwrite is equal or larger than 100% for SpX and equal or less than -100% for SnX.

Due to the voltage drop across the switches in the circuit for storing data (Figure 179), the voltage at node DataXc may not be at 1V or 0V if a '1' ('0') is written. To establish the value of the data stored in each track, the values of the voltage sources Vs1 and Vs0 are 2V and -1V respectively. The voltage at node DataXc is larger than 1V if a '1' is stored and lower than 0V if a '0' is stored. The data stored in each track is established using the voltage controlled voltage source (VdataX) with a value of 1V (0V) if the voltage at node DataXc is larger (lower) than 0.5V.


Fig. 180. Write head data circuit

The write head data circuit (Figure 180) is employed to find the value of the data currently stored in the track that is connected to the write head. The voltage source is denoted by dataX where X is the index number of the racetrack; switch SdwX controls the index of the racetrack that is connected to the write head, i.e. it is ON when VContw is equal to the index value X, otherwise it is OFF. A voltage controlled voltage source is employed to compare the data stored in the racetrack index with the write data, so for simulation

$$V_{WswC} = V_{dataW} - V_{wPol1} \qquad (93)$$

where VdataW denotes the current stored data and VwPol1 denotes the polarity of the write voltage. Hence, if the free layer of a RM is parallel to the fixed layer (data '1') and the write voltage is positive (VwPol1 = 1V), this track is written to state '0' (anti-parallel) [121]. The reverse scenario is also applicable, i.e. if the free layer of the RM is anti-parallel with the fixed layer (data '0') and the write voltage is negative (VwPol1 = 0V), then this track is written to state '1'.

The execution of the write operation depends on the voltage difference between VdataW and VwPol1 (93), i.e. a write operation is executed if the voltage at node WswC (VWswC) is 0V. A voltage controlled voltage source is employed, such that the control voltage is 1V when performing the write operation. The control write voltage is generated with a voltage source (Vwswitch) of 1V if VWswC is 0V; otherwise Vwswitch is 0.

Preventing the Write Operation

The write operation cannot be executed when the write voltage is 0V; this condition protects the RM from an unintentional write operation. A voltage controlled voltage source (VVwrite0) is used; its value is 1V if the write voltage is not equal to 0V, otherwise VVwrite0 is 0V.

Based on the above presented conditions, the percentage write operation of a RM is controlled and given by

$$V_{PerWrite1} = V_{Pwprev} + V_{PerW} \qquad (94)$$

$$V_{WriteCont} = V_{WrOut} \cdot V_{LocpRes} \cdot V_{wswitch} \cdot V_{Vwrite0} \qquad (95)$$

An AND function is utilized for multiplying the voltages VWrOut, VLocpRes, Vwswitch, and VVwrite0 (their values are at 1 or 0V). The write percentage of a RM is found by using the circuit in Figure 181.
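The three control conditions and the AND function of (95) can be sketched in Python as follows. The function name and argument names are hypothetical; each flag is modeled as an integer 1 or 0 in place of the 1V and 0V levels of the controlled sources.

```python
def write_control(wr_out, loc_w, loc_w_prev, data_w, w_pol1, v_write):
    """Return V_WriteCont (1 or 0) from the conditions behind (95).

    wr_out  : 1 if the write voltage exceeds its critical value, else 0
    data_w  : data currently stored in the track at the write head (1 or 0)
    w_pol1  : 1 for a positive write voltage, 0 for a negative one
    """
    # Reset on shift, (92): the index must not have changed in this step.
    loc_p_res = 1 if (loc_w - loc_w_prev) == 0 else 0
    # Reset on matching, (93): a write proceeds only if V_WswC is 0.
    w_switch = 1 if (data_w - w_pol1) == 0 else 0
    # Preventing the write operation: no write at exactly 0V.
    v_write0 = 1 if v_write != 0 else 0
    # (95): AND of all conditions, implemented as a product of 1/0 flags.
    return wr_out * loc_p_res * w_switch * v_write0


if __name__ == "__main__":
    print(write_control(1, 3, 3, data_w=1, w_pol1=1, v_write=0.8))  # accumulate
    print(write_control(1, 3, 2, data_w=1, w_pol1=1, v_write=0.8))  # just shifted
    print(write_control(1, 3, 3, data_w=0, w_pol1=1, v_write=0.8))  # V_WswC nonzero
```

Expressing the AND as a product keeps the control signal compatible with the multiplicative form used elsewhere in the macromodel, for example in (79) and (84).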

Fig. 181. Write Percentage estimation circuit

Switches GPwrite1 and GPwrite0 are ON when the voltage at node WriteCont is 1 and 0 respectively. Switch SGPi is ON only at the beginning of simulation to set the initial voltage at node Perwrite. The write percentage of a RM is found from the voltage at node Perwrite.

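After storing the data into the RM, the tunnel magneto-resistance (TMR) effect is employed. The values of the write head resistances of a RM when a '1' (RwP) and a '0' (RwAP) are stored are given by [121].

$$V_{RwP} = R_0 \qquad (96)$$

$$V_{RwAP} = R_0 \cdot (1 + V_{TMRRw}) \qquad (97)$$

where

$$R_0 = \frac{t_{ox} \cdot 10^{10} \cdot \exp\!\left( 1.025 \cdot t_{ox} \cdot 10^{10} \cdot \sqrt{\mathit{PhiBas}} \right)}{FAA \cdot \sqrt{\mathit{PhiBas}} \cdot \mathit{surface} \cdot 10^{12}} \qquad (96.1)$$

$$V_{TMRRw} = \frac{TMR}{1 + \dfrac{V_{write}^2}{V_h^2}} \qquad (96.2)$$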

and R0 is the MTJ resistance when the bias is 0V, VRwP and VRwAP denote the write head resistances for the data values of '1' and '0' respectively. VTMRRw is the real tunnel magnetoresistance of the write head. tox denotes the height of the oxide barrier, Phibas denotes the energy barrier height of MgO (eV), FAA is the factor for calculating the resistance. Vh denotes the voltage bias when the TMR (real) is equal to 1/2TMR(0).
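The following Python sketch evaluates (96), (97), (96.1) and (96.2) numerically. It is an illustration only: the parameter values are taken from Table 89, a square head surface (a*b) is assumed, and the function names are hypothetical rather than part of the macromodel.

```python
import math

# Parameter values from Table 89; a square head (surface = a*b) is assumed.
T_OX = 8.5e-10    # oxide barrier height tox (m)
PHI_BAS = 0.4     # energy barrier height of MgO (eV)
FAA = 332.253     # factor for calculating the resistance
A_LEN = 65e-9     # track length a (m)
B_WID = 65e-9     # track width b (m)
TMR0 = 1.2        # zero-bias TMR
V_H = 0.5         # bias at which the real TMR is TMR0/2 (V)


def r0(surface: float = A_LEN * B_WID) -> float:
    """Zero-bias MTJ resistance, following (96.1)."""
    t = T_OX * 1e10
    numerator = t * math.exp(1.025 * t * math.sqrt(PHI_BAS))
    denominator = FAA * math.sqrt(PHI_BAS) * surface * 1e12
    return numerator / denominator


def tmr_real(v_bias: float) -> float:
    """Bias-dependent (real) TMR, following (96.2)."""
    return TMR0 / (1.0 + (v_bias / V_H) ** 2)


def write_head_resistances(v_write: float) -> tuple:
    """Return (V_RwP, V_RwAP) following (96) and (97)."""
    r_p = r0()
    return r_p, r_p * (1.0 + tmr_real(v_write))
```

With these values R0 evaluates to a few kilohms, and the anti-parallel resistance approaches the parallel one as the write bias grows, because the real TMR falls with bias.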

The write head resistance is obtained from the parallel and anti-parallel resistances of the RM (i.e. (96) and (97)) as

$$V_{RwV} = V_{RwP} \cdot V_{DataX} + V_{RwAP} \cdot (1 - V_{DataX}) \tag{98}$$

where VRwV is the write head resistance in each index of the RM. The write head index starts at 1 with a constant resistance value; however, the resistances of the fixed tracks (FL and FR) are given by

$$V_{RwFL} = V_{RwP} \cdot V_{DataFL} + V_{RwAP} \cdot (1 - V_{DataFL}) \tag{99}$$

$$V_{RwFR} = V_{RwP} \cdot V_{DataFR} + V_{RwAP} \cdot (1 - V_{DataFR}) \tag{100}$$

where VDataFL and VDataFR are the voltages for data at FL and FR. Therefore, the simulation of the write head is completed by using (98)-(100), and the resistances in the main circuit of the write head (RwX) are established.

5.3.1.3. Read Head

The read head is used to read the data stored in the RM by utilizing the tunnel magnetoresistance (TMR) effect. Figure 182 shows the main model of the read head; resistors RrY, Rr(Y+1), Rr(Y+2), … , RrN correspond to the data stored in each track. SrY, Sr(Y+1), Sr(Y+2), … , SrN are the selection switches; a switch is ON when the corresponding track is connected to the read head, otherwise it is OFF.

Fig. 182. Main model of the read head

The racetrack index to which the read head is connected must be considered for controlling a selection switch in the main model of the read head. Using (71) and FL, the index of the racetrack that is connected to the read head is given by

$$V_{Contr} = V_{Contw} + FL \tag{101}$$

So, the value of Y in the main model of the read head is given by FL+1, while the value of VContr is within the range given by FL+1 and N. Using (101), each switch in the main model of the read head is ON when VContr is equal to its index (such as Y, Y+1, …, N); otherwise it is OFF.
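The selection rule of (101) can be sketched as follows. This is an illustrative Python model in which indices are treated as integers; the function names are hypothetical and do not appear in the macromodel.

```python
def read_index(v_contw: int, fl: int) -> int:
    """Index of the track under the read head, following (101)."""
    return v_contw + fl


def read_switch_states(v_contw: int, fl: int, n: int) -> dict:
    """Map each switch index Y..N (Y = FL+1) to True (ON) or False (OFF)."""
    v_contr = read_index(v_contw, fl)
    return {idx: idx == v_contr for idx in range(fl + 1, n + 1)}
```

For example, with the write head at index 2 and FL = 1, the read head is at index 3, so only the switch with index 3 is ON.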

The read and write resistances for the same data are not equal, because the area of the read head is different from that of the write head. The following equations are employed to find the read resistance of a RM.

$$V_{RrV} = V_{RrP} \cdot V_{DataX} + V_{RrAP} \cdot (1 - V_{DataX}) \tag{102}$$

where

$$V_{RrP} = \frac{R_0}{1 + \left(\frac{t_{ox}^{2} \cdot 0.528 \cdot 10^{20}}{2 \cdot PhiBas}\right) \cdot V_{read}^{2}} \tag{102.1}$$

$$V_{RrAP} = \frac{R_0 \cdot (1 + V_{TMRRr})}{1 + \left(\frac{t_{ox}^{2} \cdot 0.528 \cdot 10^{20}}{2 \cdot PhiBas}\right) \cdot V_{read}^{2}} \tag{102.2}$$

$$V_{read} = V_{Rin} - V_{Rout} \tag{102.3}$$

$$V_{TMRRr} = \frac{TMR}{1 + \frac{V_{read}^{2}}{V_h^{2}}} \tag{102.4}$$

VRrV is the read resistance of the free layer in a RM. VRrP (VRrAP) is the read resistance of a racetrack when a '1' ('0') is stored. VTMRRr is the real TMR of the read head. For a fixed layer racetrack, the read resistances in the FL and FR zones are given by

$$V_{RrFL} = V_{RrP} \cdot V_{DataFL} + V_{RrAP} \cdot (1 - V_{DataFL}) \tag{103}$$

$$V_{RrFR} = V_{RrP} \cdot V_{DataFR} + V_{RrAP} \cdot (1 - V_{DataFR}) \tag{104}$$

where VDataFL and VDataFR are the voltages of the data at the fixed layer of a racetrack for FL and FR. The racetrack resistances in FL and FR are constant, because the data stored in the FL and FR zones is fixed.
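A minimal Python sketch of (102)-(102.4) is given below. It assumes that the bias term in (102.1) and (102.2) has the grouping shown in the code comments, takes R0 as an input, and uses the Table 89 values as defaults; the function names are hypothetical.

```python
def read_resistances(r0, v_rin, v_rout, t_ox=8.5e-10, phi_bas=0.4, tmr0=1.2, v_h=0.5):
    """Return (V_RrP, V_RrAP) for a given zero-bias resistance r0."""
    v_read = v_rin - v_rout                        # (102.3)
    tmr_real = tmr0 / (1.0 + (v_read / v_h) ** 2)  # (102.4)
    # Assumed grouping of the bias term: 1 + (tox^2 * 0.528e20 / (2*PhiBas)) * Vread^2
    rolloff = 1.0 + (t_ox ** 2 * 0.528e20 / (2.0 * phi_bas)) * v_read ** 2
    r_p = r0 / rolloff                             # (102.1)
    r_ap = r0 * (1.0 + tmr_real) / rolloff         # (102.2)
    return r_p, r_ap


def read_resistance(data, r_p, r_ap):
    """Select the read resistance from the stored data (1 or 0), following (102)."""
    return r_p * data + r_ap * (1 - data)
```

At zero read bias the sketch returns R0 for a stored '1' and R0*(1 + TMR) for a stored '0'; both values decrease as the read bias increases.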

5.3.2. Model Simulation

Simulation results of the proposed HSPICE macromodel of a RM are reported in this section. MATLAB is used to generate the HSPICE code. Table 89 shows the simulation parameters employed by the proposed macromodel of a RM; c is the total length of a racetrack, while a is the length of each track of a RM. So, the largest number of possible tracks is given by c/a. Every equation in the model, except those for the write_duration (89), the shift_duration (76), and R0 (96.1), is generated by using a voltage controlled voltage source; (76), (89), and (96.1) are evaluated directly from these parameters. The capacitance of every capacitor that is used in the model is given by 100*tstart, which determines the execution of the simulation.

Table 89. Simulation Parameters [121]

| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| a | 65*10^-9 | Ms | 15800 | T | 300 |
| b | 65*10^-9 | Hk | 0.1734*10^4 | PP | 0.52 |
| c | 1*10^-6 | tsl | 1.3*10^-9 | tox | 8.5*10^-10 |
| thick_f | 1.3*10^-9 | µB | 9.27*10^-28 | Phibas | 0.4 |
| P | 0.72 | TMR | 1.2 | FAA | 332.253 |
| Jc0 (A/m2) | 0.62*10^12 | Surface (Square) | a*b | Surface (Ellipse) | πab/4 |
| Fa | 30*10^-12 | Kb | 1.38*10^-23 | Vh | 0.5 |
| α | 0.025 | rau | 100*10^-9 | e | 1.6*10^-19 |
| γ | 1.76*10^7 | CC | 0.577 | | |

The racetrack shape is assumed to be square. Initially, 4 bits per cell are assumed and the write and read heads are initially connected at indices 2 and 3 of the RM respectively. Therefore, the RM is divided into 6 tracks consisting of 4 variable tracks, 1 fixed left track (FL), and 1 fixed right track (FR). The polarities of FL and FR are anti-parallel ('0') and parallel ('1') respectively, as shown in Figure 183. At the beginning of the simulation, the data stored in every variable track (V) of the racetrack cell is assumed to be '0'.
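The track budget described above can be checked with a short Python sketch. It assumes that FL occupies index 1, the variable tracks follow, and FR occupies the last index; the function name and the returned dictionary layout are hypothetical.

```python
import math


def track_layout(bits_per_cell: int, total_length: float = 1e-6, track_length: float = 65e-9) -> dict:
    """Assign track indices for a cell with one FL and one FR track."""
    max_tracks = math.floor(total_length / track_length)  # largest possible c/a
    n_tracks = bits_per_cell + 2                          # variable tracks + FL + FR
    if n_tracks > max_tracks:
        raise ValueError("cell does not fit in a racetrack of length c")
    return {
        "FL": 1,                                          # assumed index of the fixed left track
        "variable": list(range(2, bits_per_cell + 2)),
        "FR": n_tracks,
        "max_tracks": max_tracks,
    }
```

For 4 bits per cell and the Table 89 values of c and a, the sketch gives variable tracks at indices 2 to 5 and FR at index 6, well within the c/a limit.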

Fig. 183. Simulated Racetrack Memory (RM)

The read operation of a RM is executed using a pre-charged sense amplifier (PCSA) for data sensing [118]. The precharge circuit in Figure 184 is employed to precharge the bitline voltages (BL and BLB) to GND at the beginning of the read operation; this is accomplished by setting the voltage at node Pre to VDD. When VSen in the read hardware is at VDD, the voltage at node Pre is at GND.

Fig. 184. Precharged circuit used for the read head

In this chapter, a 32nm CMOS feature size is initially employed, so its supply voltage is 0.9V. The HSPICE evaluation of the write, read and shift operations of a RM is presented next.

5.3.2.1. Write Operation

The write operation of a RM requires the control of the write voltage.

Fig. 185. Write operation of RM when the racetrack index 2 is connected to the write head


Figure 185 presents the simulation results for the write operation. By controlling the voltage difference across the RMs, Figure 185 shows that R1 stores a '0' ('1') if its write voltage is positive (negative), while R2 stores a '1' ('0'), i.e. the polarity of R2 is the opposite of R1. If the write voltage of both RMs is 0.9V, then the write time for writing a '0' is 6.2ns while the write time for writing a '1' is 11.8ns. Hence the write time is given by the worst case, i.e. 11.8ns for writing a '1'.

5.3.2.2. Shift Operation

The shift operation is required for the write and read operations; it is evaluated by observing the racetrack index that is connected to the head (the shift voltage is assumed to be fixed at 0.9V).

Fig. 186. Shift operation of the Racetrack Memory (RM)

Figure 186 shows the simulation results for the shift operation. When the shift voltage is positive (negative), the shift percentage increases (decreases). Its value restarts at 0 if the absolute shift percentage reaches 100, i.e. the index of the racetrack cell that connects to the write head is shifted to a different value. If the racetrack index at the write head reaches its boundary, then the value of the total shift percentage is 0 (and its value is not changed until the reverse shift voltage is applied). The delay of the RM when its track is shifted to the next index is 5.2ns.
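The behaviour of the shift percentage can be summarised with a small state-update sketch in Python. It is a behavioural illustration only: it assumes that a positive shift moves the head to the next higher index, that the percentage restarts exactly at 0 after each completed shift, and that the boundaries lo and hi are given; none of these names come from the macromodel.

```python
def step_shift(index: int, percent: float, delta: float, lo: int, hi: int) -> tuple:
    """Advance the shift state by a signed percentage increment delta."""
    # At a boundary, a shift in the outward direction is ignored and the
    # percentage stays at 0 until the reverse shift voltage is applied.
    if (index == hi and delta > 0) or (index == lo and delta < 0):
        return index, 0.0
    percent += delta
    if percent >= 100.0:       # forward shift completed
        return index + 1, 0.0
    if percent <= -100.0:      # backward shift completed
        return index - 1, 0.0
    return index, percent
```

Calling the function repeatedly with a positive increment moves the head forward one index each time the accumulated percentage reaches 100.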

5.3.2.3. Read Operation

A pre-charged sense amplifier (PCSA) for data sensing [118] is employed to read the data stored in a RM. The precharge circuit in Figure 184 is used to precharge the voltages of both bitlines prior to the read operation. The voltages of both bitlines are precharged to GND by setting the voltage at node Pre to VDD when the voltage at node Sen is 0V. The data stored in the RM is read when VSen is VDD and the voltage at node Pre is GND. The simulation of the read hardware is shown in Figure 187.

Fig. 187. Read operation of the Racetrack Memory (RM)


This simulation (Figure 187) is divided into 4 steps. In the first step, R1 and R2 are written with a '0' and a '1' respectively, while the precharge operation is executed. In the second step, the read operation is executed when Vsen is VDD and the precharged node (VPre) is GND. The racetrack resistance is high (low) because a '0' ('1') is stored as data. The voltages at BL and BLB are given by VDD and GND respectively. In step 3, R1 and R2 are written with a '1' and a '0' respectively. The read operation in step 4 biases the voltages of BL and BLB to GND and VDD. So if a '1' ('0') is read, the voltages of BL and BLB are given by GND and VDD (VDD and GND) respectively.

5.3.2.4. Operational Sequence

This section simulates the sequence of write, shift and read operations of a RM. A 4-bit racetrack cell is used and the sequence 1101 is the data written into the memory. The indices 2 and 3 are connected to the write and read heads respectively.

Fig. 188. Simulation of RM; W0 and W1 denote the write '0' and write '1' operations, Sf denotes the shift forward operation to the next index while Sb denotes the shift backward operation to the previous index, P denotes the precharge operation while R denotes the read operation.


Figure 188 shows the simulated voltage at each node of the RM in a transient simulation. After writing data into the RM, the shift operation is executed by setting the voltage at node Pin to VDD, so that the racetrack is shifted to the next index. When all data bits are written in the cell, the read operation starts by shifting the racetrack back until the read head is connected to the first free layer (index 2) of the RM. Figure 188 shows that the proposed HSPICE macromodel correctly simulates the write, read, and shift operations of a RM.

5.3.2.5. Comparison with Experimental Results

A comparison between the proposed HSPICE macromodel of a RM and the results of [121] is also pursued.

Fig. 189. Relationship between DW motion speed (V) and current density (Jp) at a total length of racetrack (c) of 1µm

Fig. 190. Percentage error of the proposed macromodel and the simulation results


The velocity of the domain wall (racetrack) depends linearly on the current density once the current density is larger than its critical value (Figure 189). Compared with [117][121], the results in Figures 189 and 190 show that the proposed macromodel is in good agreement with the micro-magnetic simulation of the domain wall (DW) motion speed of [121].

5.4. Applications of Racetrack Memory

Racetrack Memory (RM) has been advocated as a promising candidate for future nonvolatile memories. Its features such as fast switching time, low power consumption, good endurance, excellent retention, nonvolatile storage, and multilevel capability make racetrack memory very attractive for many applications requiring both stand-alone and embedded memories [108]. In this section, applications of racetrack memory are presented.

5.4.1. Racetrack-based Nonvolatile Memory

Due to the nonvolatile and multilevel storage capabilities of the racetrack device, racetrack memories (RMs) are utilized as nonvolatile storage elements while CMOS transistors are used as control elements. New circuits for the write, read and propagation operations (i.e. to store, read, and shift data in a RM) are proposed in this section. These circuits are connected to the write head, the read head and the propagation part, such that these operations are properly controlled and executed.

5.4.1.1. Write Circuit

The write circuit is connected to the write head; two complementary RMs store a data bit to avoid the difficulty encountered when reading a single RM. In the write operation, a voltage difference is established across the racetracks (R1 and R2) by controlling the voltages at nodes in and inB, while the transistors MWC1 and MWC2 are ON.


Fig. 191. Proposed write hardware (racetrack cells and write head circuit)

Figure 191 shows the proposed write circuit; it consists of 6 transistors and 2 complementary RMs. The values of the voltages at nodes in and inB are opposite. The polarities of R1 and R2 are also different. Due to the very low resistance of the RM, the write voltage (Vdw) must be larger than VDD. The write operation of the proposed write circuit relies on controlling the input voltages (Vin and VinB). The control voltage (VCont) must be equal to Vdw to turn ON the transistors MWC1 and MWC2. The data stored in the cell varies depending on the voltages at nodes in and inB. When the voltages at nodes in and inB are Vdw and GND (GND and Vdw), the transistors MWP1 and MWN2 are OFF (ON) while the transistors MWN1 and MWP2 are ON (OFF) respectively. The voltage difference across a RM is negative (positive) and data is written. The width to length ratio of a transistor must be large, because a large voltage difference is required across a racetrack cell to write the data. Moreover, the voltage difference across the RM in the proposed write circuit is large too.

5.4.1.2. Read Hardware (Racetrack Cells and Read Head Circuit)

Stored data is difficult to read due to the low resistance of a RM. Therefore, a complementary scheme (made of two RMs) is used; this permits a sense amplifier to be used for the read operation. In this work, a precharged sense amplifier circuit is used; this circuit consists of 5 transistors and 2 RMs.

Fig. 192. Proposed read hardware (racetrack cells and read head circuit)

Figure 192 shows the proposed read hardware (consisting of the RMs and read head circuit); this hardware is connected to the read head. R1 and R2 are complementary RMs, i.e. they store opposite values. Transistors MP1 and MP2 are turned ON by precharging the voltages at BL, BLB, and the sense voltage (VSen) to GND; in this condition, the inverters are in an unstable state. When VSen is VDD, the voltages at BL and BLB vary depending on the resistances of R1 and R2: if the resistance of R1 is less (higher) than that of R2, the voltages at BL and BLB are GND (VDD) and VDD (GND) respectively. The data stored in the RMs is then read.
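The outcome of the sensing step can be expressed as a simple decision rule. The Python sketch below is an idealised behavioural model (no timing, no analog settling) with hypothetical function names; it assumes VDD = 0.9V as used at 32nm in this chapter.

```python
VDD, GND = 0.9, 0.0


def sense(r1: float, r2: float) -> tuple:
    """Return the settled (V_BL, V_BLB) for complementary resistances r1 and r2."""
    if r1 == r2:
        # Equal resistances correspond to a non-complementary (faulty) pair.
        raise ValueError("complementary RMs are assumed to have different resistances")
    # R1 lower than R2 -> BL at GND and BLB at VDD; otherwise the reverse.
    return (GND, VDD) if r1 < r2 else (VDD, GND)


def read_bit(v_bl: float, v_blb: float) -> int:
    """Decode the bit: BL = GND and BLB = VDD corresponds to a read '1'."""
    return 1 if (v_bl, v_blb) == (GND, VDD) else 0
```

In this sketch a low R1 (parallel state, data '1') drives BL to GND, which is decoded as a '1'.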

5.4.1.3. Propagation Circuit

The propagation circuit is used to shift the domain wall of the RM; a new circuit is proposed. The voltage difference across the shift terminals is very small due to the very low shift resistance of a RM. To increase the shift voltage, the supply voltage of the shift circuit must be increased; the width to length ratio of the transistors (W/L) must also be increased. The shift operation is performed by connecting the proposed propagation circuit to the terminals Pin and Pout of the RM.

Fig. 193. Proposed propagation hardware (racetrack cells and propagation circuit)

Figure 193 shows the proposed propagation circuit for controlling the domain wall motion of a RM. The shift operation requires the voltage at WLshf to be at the supply voltage of the shift operation (Vdsf). The voltages at lines BLshf and BLBshf are varied depending on the shift direction, i.e. the shift direction is to the left (right) when the voltages at the bitlines BLshf and BLBshf are GND and Vdsf (Vdsf and GND).

5.4.1.4. Circuit Evaluation

In this section, the three proposed circuits for the write, read and shift operations are assessed by HSPICE simulation.

Write Operation

The write operation using the proposed write circuit is performed by setting Vdw at 3V and the width to length ratio of the transistors (W/L) in the proposed write circuit to 10.


Fig. 194. Write operation of the proposed write hardware

Figure 194 shows the voltages at nodes in, inB and Cont of the proposed write circuit and the data stored in the RM. The voltage at node Cont controls the write operation; the write operation executes only if VCont is equal to Vdw. The stored data depends on the voltages at in and inB; when the voltages at in and inB are GND (Vdw) and Vdw (GND), R1 and R2 are in state '0' ('1') and '1' ('0') respectively, so the complementary RMs are written. The write time of the first operation (i.e. when R1 and R2 are in state '0') is 6ns. The write times of '1' and '0' are 6.5ns and 4.5ns respectively. So, the worst case of the write time is 6.5ns.

Next, the proposed write circuit is compared with the write circuit of [123]. Delay, power dissipation and power delay product (PDP) are the evaluated figures of merit. The write circuit of [123] uses 4 transistors and 2 RMs (connected in series); the RMs in the proposed write circuit are connected in parallel, yielding a larger voltage drop. In [123], the voltage drop across each RM is reduced due to the series connection and therefore, its write time is longer.


Table 90. Performance comparison between the proposed write circuit and the write circuit of [123] at 32nm CMOS feature size and W/L = 10

| Metric | Proposed | [123] |
|---|---|---|
| Initial Write Delay (ns) | 6 | 62.6 |
| Write Delay (ns) | 6.5 | 18.7 |
| Power Dissipation (mW) | 4.736 | 7.5662 |
| PDP (*10^-11) | 3.07837 | 14.1488 |
| Number of Transistors | 6 | 4 |
| Write Voltage (V) | 3 | 3 |

As shown in Table 90 (bold entries identify the best metrics), the delay of the proposed write circuit is shorter than for [123]; improvements are also found in power dissipation and PDP. However, the number of transistors in the proposed write circuit is larger than in [123]. The write operation is controlled by the two additional transistors (MWC1 and MWC2); this capability is not available using the circuit of [123].

Shift Operation

The shift operation requires Vdsf to be 3V and W/L in the shift circuit to be equal to 10.

Fig. 195. Shift operation of RM when the proposed propagation circuit is employed


So the shift operation requires the voltage at WLshf to be Vdsf (Figure 195). The racetrack is shifted left (right) when the voltage at BLshf is GND (Vdsf) and the voltage at BLBshf is Vdsf (GND). The delay of the proposed circuit for shifting a track is 5.17ns, while the power dissipation and power delay product (PDP) are 2.4527mW and 1.268*10^-11 respectively.

Read Operation

Next, a comparison between the proposed read hardware and other circuits found in the technical literature is pursued. These circuits are the seven-transistor (7T) precharged sense amplifier circuit (PCSA) of [122] and the five-transistor (5T) conventional LATCH-based sense amplifier (SA) of [124]. The simulation results are presented in Table 91.

Table 91. Performance comparison between the proposed hardware and the circuits of [122] and [124] for read operation

| Metric | Proposed | 7T-PCSA [122] | 5T-SA [124] |
|---|---|---|---|
| Read Delay (ps) | 56.55 | 46.98 | 56.05 |
| Power Dissipation (µW) | 4.827 | 8.0027 | 5.64934 |
| PDP (*10^-16) | 2.72968 | 3.7597 | 3.16646 |
| Number of Transistors/RM | 5 | 7 | 5 |
| Number of Racetrack Cells | 2 | 2 | 2 |
| Voltage Swing | 0 - 0.9 | 0 - 0.9 | 0.25 - 0.66 |

Table 91 shows that the read delay of the proposed read hardware (56.55ps) is comparable to that of the 5T-SA [124] (56.05ps) and longer than that of the 7T-PCSA [122] (46.98ps). However, its power dissipation and power delay product (PDP) are the smallest; moreover, the number of transistors in the proposed scheme is less than in the 7T-PCSA, while a full voltage swing is provided at the output. Therefore, the proposed scheme offers many advantages such as a reduced number of transistors, full voltage swing, low power dissipation and improved PDP.
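The PDP entries reported for the write, shift and read circuits can be cross-checked as the product of power dissipation and delay. The short Python sketch below uses only the values reported in Table 90, Table 91 and the shift results; the dictionary and function names are illustrative, and the result is expressed in joules (W*s).

```python
def pdp(power_w: float, delay_s: float) -> float:
    """Power delay product in joules."""
    return power_w * delay_s


# (power in W, delay in s) as reported in the text and tables
CASES = {
    "write, proposed": (4.736e-3, 6.5e-9),
    "write, [123]": (7.5662e-3, 18.7e-9),
    "shift, proposed": (2.4527e-3, 5.17e-9),
    "read, proposed": (4.827e-6, 56.55e-12),
    "read, 7T-PCSA [122]": (8.0027e-6, 46.98e-12),
    "read, 5T-SA [124]": (5.64934e-6, 56.05e-12),
}

if __name__ == "__main__":
    for name, (power, delay) in CASES.items():
        print(f"{name}: PDP = {pdp(power, delay):.4e} J")
```

The products agree with the tabulated PDP values to the reported precision, which also confirms that the write PDP uses the 6.5ns (18.7ns) write delay rather than the initial write delay.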


5.4.1.5. Array Evaluation

Consider next the evaluation of a RM array; in the array, it is assumed that each RM has its own write, read and propagation circuits. Performance at array level for the write and propagation operations is not related to the size of a memory array, so, this section concentrates only on the read operation; in this case, the array consists of a number of read hardware circuits connected to the same bitlines (also referred to as the array size).

Fig. 196. Delay vs. number of read hardware circuits connected to the same bitline

Figure 196 shows the plot of the delay for the four worst cases of data values; the delay increases linearly by increasing the array dimension. However, at most 121 circuits can be connected to the same bitline; a further increase causes a failure in the operation of the memory due to the increased capacitance on the bitline, i.e. reducing the driving capability of the read hardware. An additional precharge circuit is therefore needed to address this problem.


Fig. 197. Number of precharged circuits vs delay when the number of read hardware (racetrack cells and a read head circuit) is fixed at 256

Figure 197 shows the plot of the delay of the proposed read hardware and the number of precharged circuits that are connected to the same bitline; the delay is not linearly related to the number of precharged circuits, i.e. it is dependent on the number of read hardware circuits that are connected to the same bitline. Compared with the 7T-PCSA circuit [122], the proposed read hardware circuit has better performance because only one 7T-PCSA can be used for each bitline.

Fig. 198. Array of Read Hardware (Racetrack Cells + Read Head Circuit) when adding a LATCH cell at each pair of bitline voltages


Therefore, LATCH cells must be connected in an array of read hardware circuits (consisting of racetrack cells and a read head circuit) to improve performance (Figure 198). The delay at array level is improved by setting the voltage at node SenP to VDD and Vsen to VDD.

Table 92. Delay of the proposed read hardware when LATCH is connected to the bitlines; the number of precharged circuits is given by 2

| Array Size | Delay (ns), No LATCH | Delay (ns), With LATCH |
|---|---|---|
| 32x32 | 0.876 | 0.3793 |
| 64x64 | 1.691 | 0.6775 |
| 128x128 | 3.326 | 1.2788 |
| 256x256 | 6.789 | 2.5357 |

Table 92 shows the results for the delay at different array sizes with/without the LATCH cells; the improvement in delay is due to the better driving capability of the bitlines in the presence of the LATCH cells.
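The improvement can be quantified as the ratio of the two delay columns of Table 92. The following Python sketch uses only the tabulated values; the names are illustrative.

```python
# Array dimension -> (delay without LATCH, delay with LATCH), both in ns (Table 92)
DELAYS_NS = {
    32: (0.876, 0.3793),
    64: (1.691, 0.6775),
    128: (3.326, 1.2788),
    256: (6.789, 2.5357),
}


def latch_speedup(size: int) -> float:
    """Ratio of the read delay without LATCH to the read delay with LATCH."""
    no_latch, with_latch = DELAYS_NS[size]
    return no_latch / with_latch


if __name__ == "__main__":
    for size in sorted(DELAYS_NS):
        print(f"{size}x{size}: {latch_speedup(size):.2f}x")
```

The ratio grows from roughly 2.3x at 32x32 to roughly 2.7x at 256x256, i.e. the benefit of the LATCH cells increases with the array size.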

5.4.1.6. Variation Analysis

In this section, a variation analysis of different features is pursued for the RM and related circuits.

CMOS Feature Size

The proposed write, read and propagation circuits are assessed next when varying the CMOS feature size, using high performance (HP-CMOS) PTMs [58].


Table 93. Performance of the proposed write, read and propagation circuits when varying the CMOS feature size and supply voltage (entries separated by "/" correspond to the two read supply voltages listed for that feature size)

| Parameter | 16nm | 22nm | 32nm |
|---|---|---|---|
| Write Supply Voltage (V) | 2.5 | 2.5 | 2.5 |
| Write Delay (ns) | 40.8 | 12.1 | 6.4 |
| Read Supply Voltage (V) | 0.7 / 0.9 | 0.8 / 0.9 | 0.9 |
| Read Delay (ps) | 39.84 / 18.62 | 46.12 / 29.69 | 45.31 |
| Read Delay + LATCH (ps) | 43.99 / 19.90 | 50.94 / 31.93 | 48.6 |
| Shift Supply Voltage (V) | 2.5 | 2.5 | 2.5 |
| Propagation Delay (ns) | 7.59 | 6.83 | 6.03 |

Table 93 shows the delay of the proposed write, read and propagation circuits; the delays of the proposed write and propagation circuits increase when reducing the feature size due to the reduction of the CMOS capacitance and the lower voltage difference across the RM. As for the proposed read circuit, its delay is related to sensing; so, at a lower CMOS feature size, its performance improves. The simulation results of Table 93 also show that the delay of the proposed read hardware is closely related to the supply voltage, i.e. when a higher value is utilized for the supply voltage, the read delay is reduced considerably. The plots of the write and propagation delays and supply voltage values are presented in Figures 199 and 200. At low feature sizes, the write delay reaches a minimum value before increasing its value, i.e. the operation of the proposed write circuit is closely related to the supply voltage. At a lower feature size, the capacitance is reduced, so a larger supply voltage is required for the write voltage to be greater than the critical value.


Fig. 199. Plot of write delay vs. supply voltage at different CMOS feature sizes

Fig. 200. Plot of shift delay vs. supply voltage at different CMOS feature sizes

Figure 200 shows that the propagation (shift) delay is reduced at a larger supply voltage; the plots in Figure 199 show the same trend, nearly irrespective of feature size. Moreover, the lowest shift supply voltages (i.e. those just higher than the critical value) at 16nm, 22nm, and 32nm HP CMOS are 0.65, 0.645 and 0.63V respectively.


Read Resistance

Fig. 201. Plot of read delay vs nominal variation of racetrack resistance (denoted by Δ); racetrack R1 and R2 are in state '0' and '1' respectively

Figure 201 shows the plot of read delay versus percentage variation of the racetrack resistance (Δ) when R1 and R2 are in states '0' and '1' respectively. For a positive Δ, the read delay is high (low) for R1−Δ and R2+Δ (R1+Δ and R2−Δ). This is due to the resistance difference between R1 and R2: the delay at a large resistance difference (i.e. at R1+Δ and R2−Δ) is lower.

When Δ is negative, the reverse scenario applies.

Fig. 202. Power dissipation of the proposed read hardware when varying the racetrack resistance (R1 and R2 are in state '0' and '1' respectively)


Fig. 203. Power Delay Product (PDP) of the proposed read hardware when varying the racetrack resistance (R1 and R2 are in state '0' and '1' respectively)

Therefore, the power dissipation of the proposed read hardware behaves inversely with Δ (Figure 202); the PDP shows a trend similar to the delay (Figure 203), because it is mostly affected by the delay rather than by the power dissipation.

Threshold Voltage

The percentage variation of the MOSFET threshold voltage in the proposed write, read and propagation circuits at 32nm CMOS feature size is given by 3% [77]. The delays of the write, read and propagation circuits are evaluated by using a Gaussian distribution with a variation of 3σ/µ in percentage.

Table 94. Percentage variation (3σ/µ) of delays of the proposed write, read, and propagation hardware when varying threshold voltage of CMOS

| Hardware | Data 0 | Data 1 |
|---|---|---|
| Write Hardware | 8.611*10^-3 | 47.652*10^-3 |
| Read Hardware | 0.279*10^-12 | 0.279*10^-12 |
| Propagation Hardware | 0 | 0 |


Table 94 shows the results; the percentage variation of the write delay is larger than the others, because a variation in the CMOS threshold voltage is related to the write voltage. The percentage variation of the proposed read circuit is not dependent on the value of the stored data, because complementary RMs are used. The percentage variation of the propagation circuit is equal to zero due to the large supply voltage (3V). Similar conclusions are also applicable at lower feature sizes.
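The 3σ/µ metric used in this analysis can be illustrated with a short Python sketch. It is not the HSPICE Monte Carlo flow: it only shows how Gaussian threshold-voltage samples could be drawn, assuming the 3% figure is the 3σ/µ spread of Vth, and how the percentage variation is computed from a list of samples. The nominal threshold voltage of 0.5V and all function names are hypothetical.

```python
import random
import statistics


def three_sigma_over_mu_percent(samples) -> float:
    """Percentage variation 3*sigma/mu of a list of samples."""
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    return 100.0 * 3.0 * sigma / mu


def sample_vth(nominal: float = 0.5, pct_3sigma: float = 3.0, n: int = 1000, seed: int = 0) -> list:
    """Draw n Gaussian Vth samples whose 3*sigma/mu equals pct_3sigma percent."""
    rng = random.Random(seed)
    sigma = nominal * pct_3sigma / 100.0 / 3.0
    return [rng.gauss(nominal, sigma) for _ in range(n)]
```

In a complete flow, each Vth sample would be passed to a circuit simulation and `three_sigma_over_mu_percent` would then be applied to the resulting list of delays.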

Number of Racetracks per Cell

Previously, it has been assumed that the RM consists of 4 variable tracks and 2 fixed tracks (FL and FR). The height, width, and length of the RM are fixed to the values given previously in Table 89. Performance of the racetrack cell is affected when the racetrack dimension is changed.

Fig. 204. Racetrack Memory

Consider the variation of racetrack dimension as in Figure 204; the length of a track of the RM (a) changes by increasing the number of racetracks per cell (N) and keeping constant the total length of the RM (c). Other features such as the write time, shift time and racetrack resistance in both the write and read heads also change with a variation of the racetrack length (a).


Fig. 205. Write duration of a RM when increasing the number of racetrack bits per cell. The write voltage is fixed at 0.9V

Figure 205 shows the plot of the write time versus N when the read head is located next to the write head; the write time decreases when the number of racetracks per cell increases. This occurs because the area of a racetrack is smaller. Moreover, the shift duration is related to c, so an increase in N does not affect the shift delay.

Fig. 206. Write and read resistances of a RM when increasing the number of racetracks per cell (N). Write and read voltages are fixed at 0.9V

The write and read resistances change when changing the racetrack dimension; Figure 206 shows the write and read resistances when increasing N. When the number of racetracks per cell is increased, the areas of the write and read heads are reduced, thus increasing the racetrack resistance. So, the write and read resistances of a high density RM are larger than for a low density RM.
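This trend can be sketched with a simple scaling argument. It rests on two assumptions that are not stated explicitly in the text: the total length c is divided equally among the N_t tracks of the cell (the variable tracks plus FL and FR), and the head area equals the square track surface a*b of Table 89. Under these assumptions, (96.1) gives a zero-bias resistance that is inversely proportional to the head area:

```latex
a = \frac{c}{N_t}, \qquad
R_0 \propto \frac{1}{a \cdot b} = \frac{N_t}{c \cdot b}, \qquad
\frac{R_0(N_t')}{R_0(N_t)} = \frac{N_t'}{N_t}
```

For example, going from 6 tracks to 10 tracks with c and b fixed would raise R0 by a factor of 10/6 ≈ 1.67 under these assumptions.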

Process Variation

Next, the effect of variations on the MOSFETs in the proposed write, read and propagation circuits is evaluated. The percentage variations of the MOSFET threshold voltage (Vth) and the channel length (L) of a MOSFET at 32nm CMOS feature size are given by 3% and 2% respectively [77]. The delays of the write, read and propagation circuits are simulated by using a Gaussian distribution with a variation of 3σ/µ (in percentage).

Table 95. Percentage variation (3σ/µ) of delays of the proposed write, read, and propagation circuits when varying threshold voltage and channel length of CMOS

| Hardware | Vth (3%), Data 0 | Vth (3%), Data 1 | L (2%), Data 0 | L (2%), Data 1 |
|---|---|---|---|---|
| Write | 8.61*10^-3 | 47.65*10^-3 | 1.853 | 2.43 |
| Read | 0.28*10^-12 | 0.28*10^-12 | 4.234 | 3.164 |
| Propagation | 0 | 0 | 0.317 | 0.317 |

Table 95 shows the results; when varying the threshold voltage, the delays of the write and read circuits are affected. This is not applicable to the delay of the proposed propagation circuit, i.e. it is constant.

• The percentage variation of the write delay is larger, because a variation in the CMOS threshold voltage is directly related to the write voltage.

• The percentage variation for the read circuit is not dependent on the value of the stored data, because complementary RMs are employed.

• The percentage variation of the propagation circuit is equal to zero due to the large supply voltage (3V).

Note that similar conclusions are also applicable at lower feature sizes.

The simulation results in Table 95 show that the variation of the CMOS channel length (L) highly affects the delay of the proposed write, read, and propagation circuits. The delay of the proposed read circuit is more strongly related to the variation of the channel length; this occurs because, while the delays of the proposed write and propagation circuits are also related to the switching speed of the RM, the performance of the proposed read circuit is related only to the CMOS features.

Tolerance

Two features related to tolerance to SEU are assessed next.

Tolerance: Critical Charge

When using the LATCH cell, if a charged particle strikes the most sensitive node, then the stored data will change its value. The minimum amount of charge to change the state of cell is usually referred to as the critical charge (Qcrit).

Table 96. Critical charge in each node of the proposed write and propagation circuits resulting in an error in the stored and shifted data respectively when the voltages at nodes Cont and WLshf are at the supply voltage

| Circuit | Critical Node | Store Data '0' | Store Data '1' |
|---|---|---|---|
| Write | Win | N/A | -9.3636*10^-11 |
| Write | Wout | -9.3636*10^-11 | N/A |
| Propagation | Pin | -1.1022*10^-11 | -1.1022*10^-11 |
| Propagation | Pout | -1.1022*10^-11 | -1.1022*10^-11 |

Table 96 presents the critical charges of the proposed write and propagation circuits. For the write circuit, the most sensitive nodes are Win and Wout. If particles strike at node Win or Wout while the voltage at node Cont is Vdw, the data stored in the RM changes its value. However, if the controlling voltage is at GND, the particle will not affect the data stored in the RM. For the propagation circuit, Pin and Pout are the most sensitive nodes. Particles striking at these nodes may shift the domain wall of the RM. Similar to the proposed write hardware, the shift operation is executed by the propagation circuit only when the voltage at WLshf is at the supply voltage. If the voltage at WLshf is at Vdsf and the amount of charge that strikes this node is larger than the critical value, the shift operation is executed and an error is encountered in the operation of the cell.

Table 97. Critical charge in each node of the proposed read hardware with/without LATCH at the bitline voltages

| Node | Store Data '1', Basic (without LATCH) | Store Data '1', With LATCH | Store Data '0', Basic (without LATCH) | Store Data '0', With LATCH |
|---|---|---|---|---|
| BL | -6.3295*10^-17 | -5.9626*10^-17 | 6.477*10^-17 | 6.0315*10^-17 |
| BLB | 6.477*10^-17 | 6.0315*10^-17 | -6.3295*10^-17 | -5.9626*10^-17 |
| INV1 | N/A | -6.2277*10^-17 | N/A | 6.2164*10^-17 |
| INV2 | N/A | 6.2164*10^-17 | N/A | -6.2277*10^-17 |

For the critical charges of the proposed read hardware, the stored data does not change during a read operation; the critical charge of the proposed read hardware is the least charge that changes the output voltage. The simulation results in Table 97 show that the critical charges of data '1' and '0' are reversed, while the read hardware with the LATCH cell inserted at the end of the bitlines can tolerate a charge strike to the cell better than the read hardware without the LATCH cell.

Tolerance: Error detection in stored data

If the data stored in R1 and R2 are the same, i.e. both are in state '1' or both are in state '0', then an error indicator must be generated.

This error is detected through a read operation by setting the voltage at line SenP of the LATCH to GND. If the data stored in R1 and R2 are both in state '1' ('0'), the proposed read hardware is unstable and the bitlines BL and BLB have voltage values of 0.4779V and 0.4801V.

As the bitline voltages are at neither VDD nor GND, this condition can be used to detect the presence of an error.

Fig. 207. The proposed detection circuit

Detection of a fault in a complementary RM can be accomplished using the circuit in Figure 207. Prior to executing the read operation, the voltage at line PreB is at GND to precharge the match line voltage (VML); it will be at VDD when the read operation is executed. An error in a complementary RM is detected depending on the bitline voltages (BL and BLB).

• If there is no error in the cell, the voltages of BL and BLB are at VDD or GND. The match line voltage (VML) retains its value.

• If there is an error in the cell, the voltages at BL and BLB are at half of the supply voltage and the corresponding transistors (MD1-MD4) are ON. The match line voltage is discharged.

As NMOS transistors are employed in the detection circuit, VML may not have a full swing; so, an inverter is used to address this problem and a full swing is observed as output voltage (VOut).


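Table 98. Output voltage of the proposed detection circuit

| Racetrack State R1 | Racetrack State R2 | Bitline Voltage BL | Bitline Voltage BLB | VOut (V) |
|---|---|---|---|---|
| 0 | 0 | 0.479 | 0.479 | VDD |
| 0 | 1 | VDD | GND | GND |
| 1 | 0 | GND | VDD | GND |
| 1 | 1 | 0.477 | 0.477 | VDD |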

Table 98 shows the output voltage of the proposed detection circuit when varying stored data; if there is an error, the output voltage is at VDD, else it is at GND. Delay, power dissipation, and power delay product (PDP) of the proposed detection circuit are presented in Table 99.
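The decision made by the detection circuit can be illustrated with a short behavioural sketch in Python. It is not a model of the transistor-level circuit of Figure 207: the supply voltage of 0.9V, the 25% rail margin, and the function name detect_error are assumptions introduced only for this example; the bitline voltages are those listed in Table 98.

```python
VDD = 0.9  # assumed supply voltage (V); not stated alongside Table 98


def detect_error(v_bl, v_blb, vdd=VDD, margin=0.25):
    """Return True when a bitline is stuck near mid-rail, i.e. R1 and R2 hold the same state.

    margin is a hypothetical threshold: a voltage within margin*vdd of a rail
    is treated as a valid logic level.
    """
    def is_rail(v):
        return v <= margin * vdd or v >= (1.0 - margin) * vdd

    return not (is_rail(v_bl) and is_rail(v_blb))


# Bitline voltages of Table 98, indexed by the racetrack states (R1, R2).
table_98 = {
    (0, 0): (0.479, 0.479),
    (0, 1): (VDD, 0.0),
    (1, 0): (0.0, VDD),
    (1, 1): (0.477, 0.477),
}

for (r1, r2), (v_bl, v_blb) in table_98.items():
    vout = "VDD" if detect_error(v_bl, v_blb) else "GND"
    print(f"R1={r1} R2={r2} -> VOut={vout}")
```

With these assumed thresholds the sketch reports an error only for the two rows in which R1 and R2 store the same value, in agreement with the VOut column of Table 98.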

Table 99. Delay, power dissipation, and Power Delay Product (PDP) of the proposed detection circuit

| Racetrack (R1R2) | 00 | 01 | 10 | 11 |
|---|---|---|---|---|
| Delay (ps) | 105.74 | 64.10 | 64.10 | 108.55 |
| Power Dissipation (µW) | 6.6568 | 6.5786 | 6.5786 | 6.7477 |
| PDP (*10^-16 J) | 7.0389 | 4.2169 | 4.2169 | 7.324 |

As the match line voltage (VML) has a constant value when there is no fault in the cell, the delay of the proposed detection circuit is the same as the delay of the proposed read circuit. As for the power dissipation and PDP, there is no direct path between VDD and GND when the resistances R1 and R2 are different. Therefore, the power dissipation (power delay product) of the proposed detection circuit in the absence of a fault in the cell is less than the power dissipation (power delay product) in the presence of a fault (R1 = R2).
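The PDP entries of Table 99 follow directly from the product of the delay and the power dissipation. The Python sketch below recomputes them from the tabulated delay and power values; the function name pdp_joules and the dictionary layout are introduced here for illustration only.

```python
def pdp_joules(delay_ps, power_uw):
    """Power delay product in joules from a delay in ps and a power in microwatts."""
    return (delay_ps * 1e-12) * (power_uw * 1e-6)


# (delay in ps, power dissipation in microwatts) for each racetrack state R1R2 of Table 99.
table_99 = {
    "00": (105.74, 6.6568),
    "01": (64.10, 6.5786),
    "10": (64.10, 6.5786),
    "11": (108.55, 6.7477),
}

for state, (delay_ps, power_uw) in table_99.items():
    scaled = pdp_joules(delay_ps, power_uw) / 1e-16
    print(f"R1R2={state}: PDP = {scaled:.4f} * 10^-16 J")
```

The recomputed products agree with the PDP row of Table 99 to the precision printed there, which also confirms that the PDP values are expressed in units of 10^-16 J.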


5.4.2. Racetrack-based CAM and TCAM cells

This section introduces new RM-based CAM and TCAM cells; the racetrack memories are utilized as (non-volatile) storage elements, while CMOS transistors are used for control and for performing the required operations. In the proposed designs, the search operations of the CAM and TCAM cells are executed by novel circuits that read the RMs and compare data at the cell and array levels. An extensive analysis of variation in both the RM and the MOSFET shows that these cells continue to operate correctly in the presence of variations in threshold voltage, feature size and RM dimension. Cell and array-level design considerations and performance are assessed using HSPICE. A comparison with a previously published RM-based CAM cell [108] shows that the proposed design requires a smaller number of transistors, thus achieving a lower power dissipation and PDP, albeit at the cost of a longer search delay.

5.4.2.1. Previous RM-Based CAM Cell Design

RM has been proposed as a storage element for a binary CAM cell in [108].

Fig. 208. RM-based CAM cell of [108]


The circuit (inclusive of the comparison operation) of the RM-based CAM of [108] is shown in Figure 208; 13 transistors and 2 RM cells are employed. The search operation of this CAM is executed by comparing the data stored in the complementary RM cells (R1 and R2) with the input voltage. However, the number of transistors in the comparison circuit is large; hence, the circuit complexity and power dissipation of the RM-based CAM of [108] are also high. Novel cells that address these figures of merit are proposed next.

5.4.2.2. Proposed Racetrack-based CAM and TCAM cells

The data stored in the cell must be considered when implementing CAM and TCAM designs. The proposed CAM and TCAM cells consist of 2 and 4 RMs respectively, i.e. 2 RMs per state pair.

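Table 100. States of the CAM and TCAM cells when using RMs to store data

| Cell | State | R1 | R2 | R3 | R4 |
|---|---|---|---|---|---|
| CAM | 0 | 0 | 1 | N/A | N/A |
| CAM | 1 | 1 | 0 | N/A | N/A |
| TCAM | 0 | 0 | 1 | 0 | 1 |
| TCAM | 1 | 1 | 0 | 1 | 0 |
| TCAM | 2 | 1 | 0 | 0 | 1 |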

Table 100 presents the states of each RM in the CAM and TCAM designs. The operations are discussed next. The memory operations are based on the circuitry presented in the previous section for the write, read and propagation operations.


Write Operation

The write operation of an RM-based CAM (TCAM) cell requires a single (double) write circuit.

Fig. 209. Proposed write circuit (racetrack cells and write head circuit) of a CAM (for a TCAM: 2 write circuits are required)

Figure 209 shows the write circuit of the RM-based CAM cell; six transistors and two RMs (6T2R) are employed. In the write operation, the data to be stored in the cells are applied as voltages at nodes in and inB when the voltage at node Cont is at VDD. R1 and R2 have opposite values, so the write circuit of a CAM has 2 states ('0' and '1'). For a TCAM, 4 RMs are required (Table 100); hence, the circuit of Figure 209 must be duplicated, i.e. two write circuits are needed.

Search Operation

Figure 210 shows the read circuits for executing the search operation. During a read operation, the sense voltage (VSen) is at VDD and the data stored in the RM cells are provided as output voltages at BL and BLB.


Fig. 210. Proposed read circuit (RM cells, read head circuit) of a) CAM and b) TCAM


Fig. 211. a) Comparison circuit of RM-based CAM and TCAM b) Balancing circuit for sense amplifier read operation of TCAM

Comparison between the stored and searched data is performed after reading the data stored in the cells. Figure 211a) shows the comparison circuit of the proposed CAM and TCAM cells. The search voltages are applied at nodes S1 and S2, while the stored data is detected from the bitline voltages (BL and BLB). The match or mismatch outcome of the proposed CAM and TCAM cells is obtained by precharging the match line voltage (VML) to VDD prior to the search operation.

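Table 101. Store and Search Voltage of CAM when using comparison circuit

| Search | VS1 | VS2 | Stored | VBL | VBLB | VML | Outcome |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1 | 0 | VDD | Match |
| 0 | 0 | 1 | 1 | 0 | 1 | GND | Mismatch |
| 1 | 1 | 0 | 0 | 1 | 0 | GND | Mismatch |
| 1 | 1 | 0 | 1 | 0 | 1 | VDD | Match |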

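Table 102. Store and Search Voltage of TCAM when using comparison circuit

| Search | VS1 | VS2 | Stored | VBL | VBLB | VML | Outcome |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1 | 0 | VDD | Match |
| 0 | 0 | 1 | 1 | 0 | 1 | GND | Mismatch |
| 0 | 0 | 1 | 2 | 0 | 0 | VDD | Match |
| 1 | 1 | 0 | 0 | 1 | 0 | GND | Mismatch |
| 1 | 1 | 0 | 1 | 0 | 1 | VDD | Match |
| 1 | 1 | 0 | 2 | 0 | 0 | VDD | Match |
| 2 | 0 | 0 | 0 | 1 | 0 | VDD | Match |
| 2 | 0 | 0 | 1 | 0 | 1 | VDD | Match |
| 2 | 0 | 0 | 2 | 0 | 0 | VDD | Match |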

Tables 101 and 102 show the outcomes of the CAM/TCAM cells for the search operation.

If the data stored in the cell is mismatched with the search data, VML is discharged, else its value remains unchanged. The outcome of the search operation is given by the voltage of the match line.
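The search outcomes of Tables 101 and 102 can be summarised by a small logic-level sketch in Python. The encodings of the stored data as (BL, BLB) and of the search data as (S1, S2) are read off the two tables; the pull-down condition (S1 AND BL) OR (S2 AND BLB) is an assumption that reproduces every row of the tables and is not derived from the transistor-level circuit of Figure 211a). The names used below are hypothetical.

```python
# Logic levels: 1 = VDD, 0 = GND. State 2 is the third (don't care) state of the TCAM.
STORED_TO_BITLINES = {0: (1, 0), 1: (0, 1), 2: (0, 0)}  # stored data -> (BL, BLB)
SEARCH_TO_INPUTS = {0: (0, 1), 1: (1, 0), 2: (0, 0)}    # search data -> (S1, S2)


def match_line(search, stored):
    """Return the match line level after a search: 'VDD' for a match, 'GND' for a mismatch.

    The match line is precharged to VDD and is assumed to discharge when
    S1 and BL are both high, or when S2 and BLB are both high.
    """
    s1, s2 = SEARCH_TO_INPUTS[search]
    bl, blb = STORED_TO_BITLINES[stored]
    discharged = (s1 and bl) or (s2 and blb)
    return "GND" if discharged else "VDD"


for search in (0, 1, 2):
    for stored in (0, 1, 2):
        vml = match_line(search, stored)
        outcome = "Match" if vml == "VDD" else "Mismatch"
        print(f"search={search} stored={stored} -> VML={vml} ({outcome})")
```

Restricting both arguments to the values 0 and 1 gives the CAM behaviour of Table 101; allowing the value 2 gives the nine TCAM cases of Table 102.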

The circuit shown in Figure 211b) is used to balance the sense amplifier for a correct output voltage of the proposed read circuit (BL and BLB) of the TCAM cell.

Fig. 212. Array of proposed RM-based CAM/TCAM cells


Figure 212 shows the scheme of the write and read circuits of a RM-based CAM/TCAM array; note that the comparison circuit is located outside of the array, thus saving area and power dissipation compared to [108].

5.4.2.3. Simulation Results

This section presents simulation results for the proposed CAM/TCAM cells. HSPICE is utilized as the simulation tool, while the model in the previous section is employed for simulating the RM. A high performance CMOS (HP-CMOS) feature size of 32nm is assumed using the corresponding PTM; the physical parameters used for the RM [121] are listed in Table 103 (with a supply voltage of 3V for the write operation).

Table 103. Macromodel parameters for RM

| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| a | 65*10^-9 m | Ms | 15800 Oe | T | 300 K |
| b | 65*10^-9 m | Hk | 1734 Oe | PP | 0.52 % |
| c | 1*10^-6 m | tsl | 1.3*10^-9 m | tox | 8.5*10^-10 m |
| thick_f | 1.3*10^-9 m | e | 1.6*10^-19 C | Phibas | 0.4 eV |
| P | 0.72 | TMR | 1.2 | FAA | 332.253 |
| Jc0 | 0.62*10^12 A/m^2 | Surface (Square) | a*b | Surface (Ellipse) | πab/4 |
| γ | 1.76*10^11 Hz/Oe | Kb | 1.38*10^-23 J∙K^-1 | Fa | 30*10^-12 m^3/C |
| α | 0.025 | rau | 100*10^-9 Ω∙m | µB | 9.27*10^-24 J∙T^-1 |
| Vh | 0.5 V | CC | 0.577 | | |


Cell Operations (Write and Search)

Consider first the write operation.

Fig. 213. Write operation of the write circuit at 32nm CMOS feature size (supply voltage at 3V)

Figure 213 shows the write operation of the RM-based CAM and TCAM cells. The write operation is executed when the voltage at node Cont is equal to VDD. If the voltages at nodes in and inB are GND (Vdw) and Vdw (GND), the data in the racetrack memories R1 and R2 are in the '0' ('1') and '1' ('0') states respectively. The width to length ratio (W/L) of the transistors in the write circuits is set to 10, resulting in a write time of 6.5ns.

The read circuits of the proposed CAM and TCAM cells are shown in Figures 210a) and b); the operation of these circuits requires precharging the bitline voltages (BL and BLB) to GND prior to the read operation.

Fig. 214. Precharged circuit used in the read circuit


Figure 214 presents the precharged part of the read circuit; the bitline voltages BL and BLB are precharged to GND by setting node Pre to VDD. During the read operation, the sense voltage (VSen) in the read circuit is set to VDD while the voltage at node Pre is at GND. The data stored in the RMs is read. If a '0' ('1') is stored in the cell, the resistances of R1 and R2 are complementary in value (i.e. large and small or vice versa); the bitline voltages BL and BLB are VDD and GND (or GND and VDD) respectively. After reading the data stored in the cells, a comparison between the stored and the search data is performed. The delay, power dissipation and PDP of the search operation for the RM-based CAM and TCAM cells are given in Tables 104 and 105.

Table 104. Delay, Power dissipation, and PDP of the RM-based CAM cell (search operation)

| Search | Stored | Delay (ps) | Power Dissipation (µW) | PDP (*10^-16 J) |
|---|---|---|---|---|
| 0 | 0 | N/A | N/A | N/A |
| 0 | 1 | 98.924 | 3.9806 | 3.9378 |
| 1 | 0 | 99.835 | 3.986 | 3.9802 |
| 1 | 1 | N/A | N/A | N/A |

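Table 105. Delay, Power dissipation, and PDP of the RM-based TCAM cell (search operation)

| Search | Stored | Delay (ps) | Power Dissipation (µW) | PDP (*10^-16 J) |
|---|---|---|---|---|
| 0 | 0 | N/A | N/A | N/A |
| 0 | 1 | 100.61 | 6.7379 | 6.7793 |
| 0 | 2 | N/A | N/A | N/A |
| 1 | 0 | 103.18 | 6.6474 | 6.8586 |
| 1 | 1 | N/A | N/A | N/A |
| 1 | 2 | N/A | N/A | N/A |
| 2 | 0 | N/A | N/A | N/A |
| 2 | 1 | N/A | N/A | N/A |
| 2 | 2 | N/A | N/A | N/A |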

As expected, these figures of merit are better for an RM-based CAM cell than for a TCAM cell, due to the larger number of transistors and RMs in the TCAM cell.

Critical Charge

There is an extensive technical literature on memory design for single event upset (SEU) tolerance. In a memory circuit, the transient voltage change generated by a heavy ion strike may directly lead to an SEU, i.e. a state change of the memory cell [125]. An SEU is said to occur when the collected charge Q at a particular node is greater than the critical charge, Qcrit, i.e. Qcrit is the minimum charge that needs to be deposited at the sensitive node of a storage cell to flip (change) the stored bit (data). In this section, only the critical charges of the read and comparison circuits of the proposed CAM and TCAM cells are considered (the critical charges of the write and propagation circuits have been presented in the previous section); this is the charge that causes VML of the proposed CAM and TCAM cells to change value.
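The upset criterion can be sketched numerically as follows. The double-exponential current pulse is a commonly used approximation for a particle strike and is an assumption of this example, not the stimulus used in the HSPICE simulations of this work; the pulse amplitude and time constants are hypothetical, and only the magnitudes of the charges are compared because the sign depends on the direction of the injected current.

```python
def double_exp_charge(i0, tau_fall, tau_rise):
    """Total charge of I(t) = i0 * (exp(-t/tau_fall) - exp(-t/tau_rise)) for t >= 0.

    Integrating each exponential from 0 to infinity gives i0 * (tau_fall - tau_rise).
    """
    return i0 * (tau_fall - tau_rise)


def causes_upset(q_collected, q_crit):
    """An SEU is assumed to occur when the collected charge exceeds Qcrit in magnitude."""
    return abs(q_collected) > abs(q_crit)


# Hypothetical strike: 1 mA amplitude, 200 ps fall time constant, 50 ps rise time constant.
q_strike = double_exp_charge(1e-3, 200e-12, 50e-12)

# Example critical charge (in C): the CAM entry for node BLB in Table 106.
q_crit_blb = -1.4433e-10

print(f"collected charge = {q_strike:.3e} C")
print(f"upset at BLB: {causes_upset(q_strike, q_crit_blb)}")
```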

Table 106. Critical charge in the proposed RM-based CAM and TCAM cells when the stored and search data are '0'

| Node | Critical Charge (C): CAM | Critical Charge (C): TCAM |
|---|---|---|
| BL | -1.4674*10^-10 | 3.2185*10^-11 |
| BLB | -1.4433*10^-10 | 3.2045*10^-11 |
| Q2 | N/A | 1.5639*10^-10 |
| Q3 | N/A | -1.4195*10^-10 |

Table 106 presents the critical charge at the relevant nodes of the proposed CAM and TCAM cells when the stored and search data are '0', i.e. the charge at which the bitline voltages VBL and VBLB change and the outcome of the operation of the TCAM/CAM cells becomes erroneous (the worst case). As shown in Table 106, node BLB is the critical node (when a '1' is stored and searched in the proposed cells, the inverse scenario applies, such that BL becomes the critical node).


Threshold Voltage Variation

The threshold voltage variation is considered next. Only the search operation is considered, because variations of the write and propagation circuits have already been presented in the previous section. The variation of the threshold voltage of a MOSFET at 32nm CMOS feature size is given by 3% [77]; a Gaussian distribution with a variation of 3σ/µ (expressed in percentage, where the mean is µ and the standard deviation is σ) is utilized to assess the mismatch delay of the proposed cells.
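As an illustration of the sampling procedure described above, the following Python sketch draws threshold voltages from a Gaussian distribution and recovers the 3σ/µ spread from the samples. It assumes that the 3% figure corresponds to the 3σ/µ value; the nominal threshold voltage of 0.5V, the sample count, the seed and the function names are hypothetical choices made for this example and are not taken from the PTM or from the HSPICE setup used in this work.

```python
import random
import statistics


def sample_vth(mean_vth, spread, n, seed=0):
    """Draw n Gaussian threshold voltages with 3*sigma/mu equal to spread."""
    rng = random.Random(seed)
    sigma = spread * mean_vth / 3.0
    return [rng.gauss(mean_vth, sigma) for _ in range(n)]


def three_sigma_over_mu(samples):
    """Estimate 3*sigma/mu (as a fraction) from a list of samples."""
    return 3.0 * statistics.stdev(samples) / statistics.mean(samples)


samples = sample_vth(mean_vth=0.5, spread=0.03, n=20000)
print(f"estimated 3sigma/mu = {100.0 * three_sigma_over_mu(samples):.2f} %")
```

In an actual variation study each sampled threshold voltage would be applied to a transistor in a circuit simulation and the same 3σ/µ statistic would then be computed on the resulting mismatch delays.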

Table 107. Percentage variation (3σ/µ) of mismatch delay of the proposed RM-based CAM and TCAM cells

| Store | Search | Percentage Variation (%): CAM | Percentage Variation (%): TCAM |
|---|---|---|---|
| 0 | 1 | 5.173*10^-13 | 0 |
| 1 | 0 | 0 | 1.559*10^-13 |

As shown in Table 107, the percentage variation of the mismatch delay of the proposed CAM and TCAM cells is very small due to the complementary scheme employed in this memory circuit. There is no significant dependency on the data value either.

Table 108. Critical Transistor and Percentage variation (3σ/µ) of mismatch delay of the proposed RM-based CAM and TCAM cells

| Cell | Store | Search | Critical Transistors | Percentage Variation (%) |
|---|---|---|---|---|
| CAM | 0 | 1 | MSen, MS1 | 1.959*10^-13 |
| CAM | 1 | 0 | MP1, MP2, MN2, MSen, MSt1 | 1.959*10^-13 |
| TCAM | 0 | 1 | Mpre3 | 2.603*10^-13 |
| TCAM | 1 | 0 | MP1, Mpre3 | 2.596*10^-13 |

The transistor whose threshold voltage variation has the highest effect on the mismatch delay (also referred to as the critical transistor) is also found. At 32nm CMOS feature size, the simulation results of Table 108 show that the transistors MSen and MP1 are critical in the proposed CAM cell, while MP2 is the critical transistor in the proposed TCAM cell.

5.4.2.4. Array-Level Evaluation

The evaluation of arrays (Figure 212) made of proposed CAM and TCAM cells is pursued in this section. Different parametric features (such as critical charge and CMOS feature size) are assessed next with respect to an array consisting of proposed CAM/TCAM cells. Note that for the delay, the match case is the default condition in this operation; hence, the mismatch delay is reported.

CAM/TCAM Performance

The proposed RM-based CAM and TCAM cells are evaluated within a square array. As presented in the previous section, the largest number of read circuits that can be connected to the same bitline is limited to 128. So, two precharged circuits are required when the array size is increased to 256. During the read operation, the values of the bitline voltages (BL and BLB) increase prior to reaching the stable states. To prevent the match line voltage from discharging during the read operation, the search voltages (VS1 and VS2) must be provided only once the bitline voltages are stable.

Table 109. Mismatch delay, power dissipation and PDP of the proposed CAM cell. Bitline capacitance is 1fF, the number of precharged circuits is 2

| Array Size | Delay (ns) | Power Dissipation (µW) | PDP (*10^-16 J) |
|---|---|---|---|
| 32x32 | 0.868 | 11.313 | 98.233 |
| 64x64 | 1.601 | 8.7801 | 140.53 |
| 128x128 | 3.026 | 8.0852 | 244.68 |
| 256x256 | 5.862 | 8.2368 | 482.86 |


Table 110. Mismatch delay, power dissipation and PDP of the proposed TCAM cell. Bitline capacitance is 1fF, the number of precharged circuits is set to 2

| Array Size | Delay (ns) | Power Dissipation (µW) | PDP (*10^-16 J) |
|---|---|---|---|
| 32x32 | 0.869 | 13.617 | 118.37 |
| 64x64 | 1.597 | 15.414 | 246.21 |
| 128x128 | 3.027 | 16.350 | 494.98 |
| 256x256 | 5.863 | 16.792 | 984.58 |

Tables 109 and 110 present the mismatch delay, the power dissipation and the power delay product (PDP) of the RM-based CAM and TCAM cells when varying the array size. At a larger array size, the delay and PDP of the proposed cells increase; the values of the mismatch delay of the RM-based CAM and TCAM cells are close because the same comparators are used. Due to the additional read circuits, the PDP of the proposed RM-based TCAM cell is higher than that of the proposed RM-based CAM cell, reaching nearly twice its value at the larger array sizes.
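The relation between the PDP values of the two cells can be quantified directly from Tables 109 and 110. The Python sketch below computes the TCAM-to-CAM PDP ratio for each array size; the dictionary and function names are introduced here for illustration only.

```python
# PDP values (in units of 10^-16) from Tables 109 (CAM) and 110 (TCAM), keyed by array dimension.
cam_pdp = {32: 98.233, 64: 140.53, 128: 244.68, 256: 482.86}
tcam_pdp = {32: 118.37, 64: 246.21, 128: 494.98, 256: 984.58}


def pdp_ratio(size):
    """Ratio of the TCAM PDP to the CAM PDP for a size x size array."""
    return tcam_pdp[size] / cam_pdp[size]


for size in sorted(cam_pdp):
    print(f"{size}x{size}: TCAM/CAM PDP ratio = {pdp_ratio(size):.3f}")
```

The ratio grows with the array size and approaches a factor of two for the 128x128 and 256x256 arrays, while it is noticeably smaller for the 32x32 array.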

Feature Size Variation

This section considers an array made of the proposed CAM and TCAM cells when varying the CMOS feature size. Using the PTM for high performance CMOS (HP-CMOS) [58], figures of merit such as the mismatch delay, the critical charge and the threshold voltage variation are assessed by considering arrays of different sizes. In this section, the search voltages (VS1 and VS2) are provided after reading the data stored in the RM cell with stable bitline voltages.

Feature Size Variation: Mismatch Delay

The mismatch delay at array-level (Figure 212) is evaluated by varying the CMOS feature size and supply voltage of the proposed CAM and TCAM cells.


Table 111. Array-level CAM delay (ns) when varying the CMOS feature size and supply voltage. Bitline capacitance is 1fF, the number of precharged circuits is 2

| Array Size | 16nm, 0.7V | 16nm, 0.9V | 22nm, 0.8V | 22nm, 0.9V | 32nm, 0.9V |
|---|---|---|---|---|---|
| 32x32 | 0.761 | 0.578 | 0.857 | 0.674 | 0.868 |
| 64x64 | 1.391 | 1.029 | 1.551 | 1.242 | 1.601 |
| 128x128 | 3.045 | 1.755 | 2.921 | 2.41 | 3.026 |
| 256x256 | 4.784 | 2.856 | 5.741 | 5.038 | 5.862 |

Table 112. Array-level TCAM delay (ns) when varying the CMOS feature size and supply voltage. Bitline capacitance is 1fF, the number of precharged circuits is 2

| Array Size | 16nm, 0.7V | 16nm, 0.9V | 22nm, 0.8V | 22nm, 0.9V | 32nm, 0.9V |
|---|---|---|---|---|---|
| 32x32 | 0.761 | 0.578 | 0.857 | 0.674 | 0.869 |
| 64x64 | 1.391 | 1.029 | 1.551 | 1.245 | 1.597 |
| 128x128 | 3.047 | 1.755 | 2.921 | 2.411 | 3.027 |
| 256x256 | 4.795 | 2.857 | 5.741 | 5.04 | 5.863 |

Tables 111 and 112 present the array-level delay when varying the CMOS feature size. At the same supply voltage, the mismatch delay at a lower CMOS feature size is reduced; moreover by varying the supply voltage, the mismatch delay is lower at a higher value of supply voltage. In general there is no significant difference in delay between CAM and TCAM cells; at 16nm and large array size, the TCAM cell shows a marginal increase in delay compared to its CAM counterpart.


Feature Size Variation: Critical Charge

The critical charge of a CAM/TCAM array is considered next. As shown in Table 106, if a ‘0’ is stored as data in a cell, node BLB of the proposed CAM and TCAM cells is the critical node.

Table 113. Critical charge at node BLB (*10^-16 C) of an array of proposed CAM cells at different CMOS feature sizes and supply voltages with '0' as stored data. The bitline capacitance is 1fF. The number of precharged circuits is 2

| Array Size | 16nm, 0.7V | 16nm, 0.9V | 22nm, 0.8V | 22nm, 0.9V | 32nm, 0.9V |
|---|---|---|---|---|---|
| 32x32 | 4.467 | 3.603 | 6.5497 | 5.742 | 8.971 |
| 64x64 | 12.141 | 10.63 | 18.573 | 16.975 | 26.813 |
| 128x128 | 32.187 | 31.212 | 50.664 | 48.375 | 76.748 |
| 256x256 | 82.312 | 88.324 | 131.37 | 131.24 | 206.74 |

Table 114. Critical charge at node BLB (*10^-16 C) of an array of proposed TCAM cells at different CMOS feature sizes and supply voltages with '0' as stored data. The bitline capacitance is 1fF. The number of precharged circuits is 2

| Array Size | 16nm, 0.7V | 16nm, 0.9V | 22nm, 0.8V | 22nm, 0.9V | 32nm, 0.9V |
|---|---|---|---|---|---|
| 32x32 | 4.4665 | 3.603 | 6.5491 | 5.7356 | 8.995 |
| 64x64 | 12.14 | 10.651 | 18.577 | 17.052 | 26.717 |
| 128x128 | 32.243 | 31.069 | 50.664 | 48.373 | 76.744 |
| 256x256 | 82.327 | 88.298 | 131.37 | 131.24 | 206.73 |

The simulation results are presented in Tables 113 and 114; the critical charge is related to array size, CMOS feature size, and supply voltage. At a larger array size, the total capacitance of the read head array is increased. Therefore, the array of the proposed CAM and TCAM cells can better tolerate an SEU, i.e. the critical charge has a higher value. Moreover, when reducing the CMOS feature size, the ability of the proposed CAM and TCAM cells to tolerate an SEU is reduced (i.e. the critical charge is smaller) because the capacitance of CMOS at a lower feature size is also reduced.

The critical charges of the proposed CAM and TCAM cells have similar values because the same read circuit is utilized.

Feature Size Variation: Threshold Voltage Variation

The threshold voltage variation of an array made of the proposed CAM/TCAM cells is considered in this section. The threshold voltage of a MOSFET is varied as described in a previous section in each cell.

Table 115. Percentage variation (3σ/µ, in *10^-14 %) of mismatch delays of the proposed CAM cell. The bitline capacitance is 1fF. The number of precharged circuits is 2

| Array Size | Stored Data | Search Data | 16nm, 0.7V | 16nm, 0.9V | 22nm, 0.8V | 22nm, 0.9V | 32nm, 0.9V |
|---|---|---|---|---|---|---|---|
| 32x32 | 0 | 1 | 14.074 | 4.7735 | 13.95 | 14.188 | 4.645 |
| 32x32 | 1 | 0 | 9.38 | 9.544 | 9.30 | 0 | 13.95 |
| 64x64 | 0 | 1 | 4.433 | 0 | 8.74 | 8.981 | 0 |
| 64x64 | 1 | 0 | 4.43 | 5.31 | 17.48 | 8.98 | 21.76 |
| 128x128 | 0 | 1 | 0 | 12.88 | 7.813 | 16.272 | 19.375 |
| 128x128 | 1 | 0 | 15.48 | 12.88 | 19.53 | 12.203 | 11.63 |
| 256x256 | 0 | 1 | 6.829 | 15.708 | 19.25 | 13.428 | 0 |
| 256x256 | 1 | 0 | 13.66 | 0 | 19.25 | 20.15 | 12.73 |


Table 116. Percentage variation (3σ/µ, in *10^-14 %) of mismatch delays of the proposed TCAM cell. The bitline capacitance is 1fF. The number of precharged circuits is 2

| Array Size | Stored Data | Search Data | 16nm, 0.7V | 16nm, 0.9V | 22nm, 0.8V | 22nm, 0.9V | 32nm, 0.9V |
|---|---|---|---|---|---|---|---|
| 32x32 | 0 | 1 | 18.766 | 4.774 | 23.247 | 14.188 | 0 |
| 32x32 | 1 | 0 | 14.074 | 9.544 | 18.599 | 14.188 | 9.288 |
| 64x64 | 0 | 1 | 0 | 4.58 | 0 | 17.957 | 8.704 |
| 64x64 | 1 | 0 | 8.863 | 9.153 | 8.74 | 22.445 | 13.057 |
| 128x128 | 0 | 1 | 11.608 | 8.588 | 7.813 | 20.336 | 15.499 |
| 128x128 | 1 | 0 | 11.608 | 4.295 | 0 | 20.336 | 15.499 |
| 256x256 | 0 | 1 | 13.649 | 7.852 | 6.413 | 6.712 | 0 |
| 256x256 | 1 | 0 | 20.480 | 3.927 | 0 | 20.15 | 12.729 |

Tables 115 and 116 show that the average percentage variation of the mismatch delay of the proposed CAM and TCAM cells is about 10.575*10^-14 % and 10.6915*10^-14 % respectively (the table entries are expressed in units of 10^-14 %), and that its value is not significantly related to the array size, CMOS feature size and supply voltage.
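The two averages quoted above can be reproduced from the entries of Tables 115 and 116, which are expressed in units of 10^-14 %. In the Python sketch below each inner list is one table row (five feature size and supply voltage combinations); the entry printed as 05.31 in Table 115 is read as 5.31, an assumption that is consistent with the reported average. The variable and function names are introduced here for illustration only.

```python
from statistics import mean

# Rows of Table 115 (CAM), in units of 10^-14 %.
table_115 = [
    [14.074, 4.7735, 13.95, 14.188, 4.645],
    [9.38, 9.544, 9.30, 0, 13.95],
    [4.433, 0, 8.74, 8.981, 0],
    [4.43, 5.31, 17.48, 8.98, 21.76],
    [0, 12.88, 7.813, 16.272, 19.375],
    [15.48, 12.88, 19.53, 12.203, 11.63],
    [6.829, 15.708, 19.25, 13.428, 0],
    [13.66, 0, 19.25, 20.15, 12.73],
]

# Rows of Table 116 (TCAM), in units of 10^-14 %.
table_116 = [
    [18.766, 4.774, 23.247, 14.188, 0],
    [14.074, 9.544, 18.599, 14.188, 9.288],
    [0, 4.58, 0, 17.957, 8.704],
    [8.863, 9.153, 8.74, 22.445, 13.057],
    [11.608, 8.588, 7.813, 20.336, 15.499],
    [11.608, 4.295, 0, 20.336, 15.499],
    [13.649, 7.852, 6.413, 6.712, 0],
    [20.480, 3.927, 0, 20.15, 12.729],
]


def table_average(rows):
    """Average of all entries of a table given as a list of rows."""
    return mean(value for row in rows for value in row)


print(f"CAM average:  {table_average(table_115):.4f} * 10^-14 %")
print(f"TCAM average: {table_average(table_116):.4f} * 10^-14 %")
```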

Racetrack Dimension Variation

As presented in the previous section, a variation of the racetrack dimension affects the write delay, the shift delay and the write/read resistance of an RM. So, the write and search times of the proposed RM-based CAM and TCAM cells are evaluated by changing the length of the cell track (denoted by a); only the delay of the write ‘1’ operation is presented because the write ‘1’ operation is slower than the write ‘0’ operation.


Fig. 215. Write and search times of the proposed RM-based CAM and TCAM cells when changing the length of RM track

As shown in Figure 215, the write times of the proposed CAM and TCAM cells are related to the track length. The racetrack resistance is reduced at a larger length value; so, the voltage difference across a RM-based cell during a write operation is decreased, thus resulting in a slower operation. As for the search time, the results of Figure 215 show that the racetrack length does not significantly affect the performance of the proposed CAM and TCAM cells, i.e. it is nearly independent.

Fig. 216. Write time of the proposed CAM and TCAM cells when changing the RM track length


The effect of the CMOS feature size on the write time when varying the track length, is also considered; as shown in Figure 216, the write time is reduced at a larger CMOS feature size, because the capacitance of a CMOS transistor is higher at a larger feature size. This occurs because the write voltage (i.e. the voltage drop across the RM during write operation) at a larger CMOS feature size is also higher.

5.4.2.5. Comparison

Two comparisons as related to the proposed CAM and TCAM cells and other schemes found in the technical literature are reported next.

CAM Comparison

A comparison between the proposed cell and the RM-based CAM cell of [108] is presented first; this evaluation assumes a 32nm CMOS feature size (using, as in previous sections, the HP PTM), a 0.9V supply voltage and a 1fF match line capacitance.

Table 117. Comparison between proposed CAM cell and CAM cell of [108]

| Metric | Proposed CAM | CAM [108] |
|---|---|---|
| Mismatch Delay (ps) | 99.835 | 76.935 |
| Power dissipation (µW) | 3.9806 | 7.0097 |
| PDP (*10^-16 J) | 3.9802 | 5.3929 |
| Number of transistors/cell | 5 | 13 |
| Number of racetrack cells | 2 | 2 |
| Cell-level comparison | Yes | No |

Table 117 presents the results; the delay of the CAM of [108] is better, while the proposed cell achieves better power dissipation and PDP. The cell of [108] requires a significantly larger number of MOSFETs and performs the comparison operation at cell rather than at array level, hence incurring a larger circuit complexity.
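The trade-off reported in Table 117 can be expressed as relative changes with respect to the cell of [108]. The Python sketch below computes these percentages from the tabulated values; the function name percent_change and the dictionary layout are introduced here for illustration only.

```python
def percent_change(proposed, reference):
    """Relative change of the proposed value with respect to the reference, in percent."""
    return 100.0 * (proposed - reference) / reference


# (proposed CAM, CAM of [108]) for each metric of Table 117.
table_117 = {
    "mismatch delay (ps)": (99.835, 76.935),
    "power dissipation (uW)": (3.9806, 7.0097),
    "PDP (*10^-16)": (3.9802, 5.3929),
}

for metric, (proposed, reference) in table_117.items():
    print(f"{metric}: {percent_change(proposed, reference):+.1f} %")
```

A positive value indicates that the proposed cell has a larger figure than the cell of [108] (as for the mismatch delay), while a negative value indicates a reduction (as for the power dissipation and the PDP).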

Consider next array-level operation for the cell of [108]. The mismatch delay, the power dissipation, and power delay product (PDP) of the racetrack based CAM [108] are assessed when varying the array size.

Table 118. Mismatch delay, power dissipation and PDP of the RM-based CAM cell [108] at 32nm HP-CMOS feature size, 0.9V supply voltage and 1fF line capacitance

| Array Size | Delay (ns) | Power Dissipation (µW) | PDP (*10^-16 J) |
|---|---|---|---|
| 32x32 | 0.145 | 134.59 | 195.37 |
| 64x64 | 0.145 | 270.53 | 392.71 |
| 128x128 | 0.145 | 542.40 | 787.34 |
| 256x256 | 0.145 | 1087.8 | 1577 |

As shown in Table 118, the mismatch delay for [108] is not related to array size, because the stored and search data are compared by the same circuit. At a larger array size, the values of the power dissipation and the PDP increase. The mismatch delay of an array of CAM cells of [108] is better than that of an array made of the proposed RM-based CAM cells (Table 109); however, its power dissipation and PDP are worse. This is mostly caused by the different circuit complexity of the cells, i.e. 13 MOSFETs for [108] versus only 5 for the proposed design (while still utilizing 2 RMs).

TCAM Comparison

The proposed TCAM cell is compared with the other non-volatile TCAM cells (PCM-based [126] and NAND-flash [127]) found in the technical literature at 32nm feature size.


Table 119. Comparison between proposed RM-based TCAM cell and other non-volatile TCAM cells Proposed Metric PCM NAND-Flash TCAM Write time / Erase time 6.43ns 199.34ns 300µs/2ms Search Time 103.18ps 2.447ns 25µs Write Operating Voltage (V) 3 3 0.9 Read Operating Voltage (V) 0.9 3

The results are shown in Table 119 (as previously, bold entries show the best values in the results); the proposed RM-based TCAM cell is significantly faster than the Phase Change Memory (PCM)-based [126] and the NAND flash based [127] TCAM cells. Moreover, the proposed TCAM cell employs the same write operating voltage as the NAND Flash memory (3V), but only 0.9V (i.e. the supply voltage for 32nm CMOS feature size) for the read/search operation. Note that the PCM-based TCAM always requires 0.9V (i.e. for both the write and read operating voltages).
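The speed advantage can be quantified from the write and search times of Table 119. The Python sketch below converts the tabulated times to seconds and computes the speedup of the proposed cell over each alternative; for the NAND flash cell only the 300µs write time is used (the 2ms erase time is not included), and the names used are introduced here for illustration only.

```python
# Times from Table 119, converted to seconds.
write_time = {"proposed": 6.43e-9, "pcm": 199.34e-9, "nand": 300e-6}
search_time = {"proposed": 103.18e-12, "pcm": 2.447e-9, "nand": 25e-6}


def speedup(times, other):
    """How many times faster the proposed cell is than the given alternative."""
    return times[other] / times["proposed"]


for other in ("pcm", "nand"):
    print(f"write speedup over {other}:  {speedup(write_time, other):.1f}x")
    print(f"search speedup over {other}: {speedup(search_time, other):.1f}x")
```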

5.5. Conclusion

This chapter has presented a detailed model and analysis of a racetrack memory. An HSPICE simulation model that is compatible with a CMOS based design environment has been proposed; by using MATLAB as a tool for generating HSPICE code, different operational and physical features (such as the number of tracks) have been assessed and generated easily using the proposed model. The results show that the proposed model can simulate the operations of a racetrack memory with a small error when compared with experimental device data.

Novel circuits for the write, read and shift operations in a PMA-based RM have been proposed next; the proposed circuits are very efficient in terms of numerous figures of merit, such as delay, power dissipation, and power delay product (PDP). Compared with other circuits found in the technical literature, improvements in performance have been accomplished for the three operations (write, read, shift) required in a PMA-based racetrack memory cell. Moreover, the proposed circuits allow an efficient implementation and execution of array-level operations of a racetrack memory. An extensive analysis of variation and SEU tolerance in the operation of a racetrack memory has also been presented; it has been shown that to further improve performance, each pair of bitlines should be connected to the detection circuit to detect errors in the stored data.

Overall, this manuscript has confirmed the potential of a racetrack memory and its significant advantages for non-volatile storage.

Two racetrack-based CAM and TCAM cells have also been introduced in this chapter. These designs utilize two racetrack memories (RMs) as storage elements for each pair of data states and CMOS transistors to control the fast execution of the store and search operations. Novel circuits to read the RMs and compare data at the cell and array levels have been proposed. Simulation has shown that the proposed cell achieves better power dissipation and PDP than the CAM of [108], but at a degradation in delay. The cell of [108] requires a significantly larger number of MOSFETs and performs the comparison operation at cell rather than at array level, hence incurring a larger circuit complexity. Furthermore, the proposed RM-based TCAM cell is significantly faster than the Phase Change Memory (PCM)-based [126] and the NAND flash based [127] TCAM cells. The proposed TCAM cell employs the same write operating voltage as the NAND Flash memory (3V), but only 0.9V (i.e. the supply voltage for 32nm CMOS feature size) for the read/search operation.


REFERENCES

[1] M. Marinella, “The Future of Memory,” Aerospace Conference, March 2013, pp. 1-11, Montana, USA.

[2] J. L. Hennessy, D. A. Patterson, “Computer Architecture: A Quantitative Approach,” 4th edition, Morgan Kaufmann Publishers, San Francisco, ISBN: 978-0-12-970490-0.

[3] G. Moore, “Cramming more components onto integrated circuits,” Electronics, Vol. 38, No. 8, pp. 114-117, 1965.

[4] G. I. Bourianoff, P. A. Gargini, D. E. Nikonov, “Research Directions Beyond CMOS Computing,” Solid-State Electronics, 51(11-12): 1426-1431, 2007.

[5] D. Akinwande, S. Yasuda, B. Paul, S. Fujita, G. Close, H.S.P. Wong, “Monolithic Integration of CMOS VLSI and CNT for Hybrid Nanotechnology Applications,” Proc. 38th European Solid-State Device Research Conference, ESSDERC’08, pp. 91-94, 2008.

[6] N. Engheta, “Circuits with light at nanoscales: Optical nanocircuits inspired by metamaterials,” Science, 317 (5845): pp. 1698-1702, 2007.

[7] D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, “The missing memristor found,” Nature, vol. 453, pp. 80-83, May 2008.

[8] J. M. Rabaey, A. Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A Design Perspective,” 2nd Edition, Prentice Hall of India Private Limited, ISBN: 978-81-203-2257-

[9] R. J. Baker, “CMOS: Circuit Design, Layout, and Simulation,” revised 2nd edition, Wiley, 2007.

[10] K. Roy, S. C. Prasad, “Low-Power CMOS VLSI Circuit Design,” Wiley-Interscience, 1st edition (February 22, 2000), ch. 6, pp. 253-270.


[11] V. Mohan, T. Bunker, L. Grupp, S. Gurumurthi, M. R. Stan, S. Swanson, “Modeling Power

Consumption of NAND Flash Memories Using FlashPower” IEEE Trans. Computer-aided

design of Integrated Circuits and Systems, Vol. 32, No. 7, pp.1031-1044, July 2013

[12] K. Itoh “VLSI Memory Chip Design” Springer Series in Advanced Microelectronics, Vol.

5, 2001

[13] L. Crippa, R. Micheloni, I. Motta, M. Sangalli “Nonvolatile Memories: NOR vs. NAND

Architectures” Memories in Wireless Systems, Ch. 2, pp.29-53, 2008

[14] R. Micheloni, L. Crippa, A. Marelli “Inside NAND Flash Memories” Springer 2010

[15] F. Sun, S. Devarajan, K. Rose, T. Zhang “Design of on-chip error correction systems for

multilevel NOR and NAND flash memories” IET Circuits Devices System, Vol.1, Issue 3,

pp. 241-249, 2007

[16] R. Bez, E. Camerlenghi, A. Modelli, A. Visconti “Introduction to Flash Memory” Proc.

IEEE Vol. 91, No. 4, pp. 489-502, 2003

[17] K. Pagiamtzis, A. Sheikholeslami “Content-Addressable Memory (CAM) Circuits and

Architectures: A Tutorial and Survey” IEEE Journal of Solid-State Circuits, Vol. 41 No.3

March 2006

[18] M. Meribout, T. Ogura, and M. Nakanishi, “On using the CAM concept for parametric

curve extraction,” IEEE Trans. Image Process., vol. 9, no.12, pp. 2126–2130, Dec. 2000

[19] M. Nakanishi and T. Ogura, “Real-time CAM-based Hough transform and its performance

evaluation,” Machine Vision Appl., vol. 12, no. 2, pp. 59–68, Aug. 2000.

[20] E. Komoto, T. Homma, and T. Nakamura, “A high-speed and compact size JPEG Huffman

decoder using CAM,” in Symp. VLSI Circuits Dig. Tech. Papers, 1993, pp. 37–38.

[21] B. W. Wei, R. Tarver, J.-S. Kim, and K. Ng, “A single chip Lempel-Ziv data compressor,”

in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol.3, 1993, pp. 1953–1955.

[22] S. Panchanathan and M. Goldberg, “A content-addressable memory architecture for image

coding using vector quantization,” IEEE Trans. Signal Process., vol. 39, no. 9, pp. 2066–

2078, Sep. 1991

[23] B. Rajendran, R. W. Cheek, L. A. Lastra, M. M. Franceschini, M. J. Breitwisch, A. G.

Schrott, Jing Li, R. K. Montoye, L. Chang, Chung Lam “Demonstration of CAM and

TCAM using Phase Change Devices” Memory Workshop (IMW), 2011 3rd IEEE, pp.1-4,

May 2011

[24] S. Koo et al., “Enhanced Channel Modulation in Dual-gated Silicon Nanowire

Transistors,” Nano Letters, vol. 5, no. 12, pp. 2519–2523, 2005.

[25] Y.-M. Lin et al., “High-performance Carbon Nanotube field-effect Transistor with Tunable

Polarities,” IEEE Trans. Nanotechnology, vol. 4, pp. 481–489, 2005.

[26] S. Heinze et al., “Unexpected Scaling of the Performance of Carbon Nanotube Schottky-

barrier Transistors,” Physical Review B, vol. 68, p. 235418, 2003.

[27] K. S. Novoselov et al., “Electric field effect in atomically thin carbon films,” Science, vol.

306, no. 5696, pp. 666– 669, 2004.

[28] A. Colli et al., “Top-gated silicon nanowire transistors in a single fabrication step,” ACS

Nano, vol. 3, no. 6, pp. 1587–1593, 2009.

[29] A. Dodabalapur et al., “Organic Heterostructure Field-effect Transistors,” Science, vol.

269, no. 5230, pp. 1560–1562, 1995.

[30] J. H. Schön et al., “Ambipolar Pentacene Field-effect Transistors and Inverters,” Science,

vol. 287, no. 5455, pp. 1022–1023, 2000.

[31] M. H. Ben Jamaa et al., “Novel Library of Logic Gates with Ambipolar CNTFETs:

Opportunities for Multi-Level Logic Synthesis,” in DATE 2009, pp. 622–627

[32] K. Jabeur, G. D. Pendina, G. Prenat, L. D. Buda-Prejbeanu, B. Dieny “Compact Modeling

of a Magnetic Tunnel Junction Based on Spin Orbit Torque” IEEE Trans. Magnetics, Vol.

50, No. 7, July 2014

[33] M. Julliere “Tunneling between ferromagnetic films” Phys. Lett. A., Vol. 54, No. 3, pp.

225-226, 1975

[34] S. Ikeda, J. Hayakawa, Y. Ashizawa, Y. M. Lee, K. Miura, H. Hasegawa, M. Tsunoda, F.

Matsukura, H. Ohno “Tunnel magnetoresistance of 604% at 300K by suppression of Ta

diffusion in CoFeB/MgO/CoFeB pseudo-spin-valves annealed at high temperature” Appl.

Phys. Lett., Vol. 93, No. 8, pp. 082508, 2008

[35] H. Zhao, K. C. Chun, J. D. Harms, T.-H. Kim, J.-P. Wang, and C. H. Kim “A scaling

roadmap and performance evaluation of in-plane and perpendicular MTJ based STT-

MRAMs for high-density cache memory” IEEE J. Solid-State Circuits, Vol. 48, No. 2, pp.

598-610, February 2013

[36] S. Williams, “How We Found the Missing Memristor,” IEEE Spectrum, vol. 45, no. 12,

pp. 28-35, Dec 2008

[37] A. Pronin “Phase Change Memory: Fundamentals and Measurement Techniques” Keithley

Instruments Inc. March 2010

[38] S. Lin, Y. B. Kim, F. Lombardi, and Y.J. Lee, “A New SRAM Cell Design Using

CNTFETs,” in Proceedings of IEEE International SoC Conference 2008, pp. 168 -171,

Nov. 2008

[39] K. Eshraghian, K.R. Cho, O. Kavehei, S.K Kang, D. Abbott, S.M. Steve Kang, “Memristor

MOS Content Addressable Memory (MCAM): Hybrid Architecture for Future High

Performance Search Engines” IEEE Transactions on VLSI Systems, vol. 19, no. 8, pp.

1407-1417, 2011

[40] T. Ohsawa, F. Iga, S. Ikeda, T. Hanyu, H. Ohno, T. Endoh “High-Density and Low-Power

Nonvolatile Static Random Access Memory Using Spin-Transfer-Torque Magnetic Tunnel

Junction” Japanese Journal of Applied Physics 51 (2012) 02BD01

[41] S. Matsunaga, J. Hayakawa, S. Ikeda, K. Miura, T. Endoh, H. Ohno, T. Hanyu “MTJ-Based

Nonvolatile Logic-in-Memory Circuit, Future Prospects and Issues” DATE'09 pp.433-435,

April 2009

[42] D. Suzuki, T. Endoh, T. Hanyu “TMR-Logic-Based LUT for Quickly Wake-up FPGA”

MWSCAS 51st pp. 326-329, 2008

[43] L. O. Chua “Memristor–the Missing Circuit Element” IEEE Transactions on Circuit

Theory. Vol. CT-18 No.5 pp.507-519, Sep 1971

[44] V. Erokhin “Organic Memristors: Basic Principles” IEEE ISCAS, pp. 508, 2010

[45] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, W. Lu “Nanoscale Memristor

Device as Synapse in Neuromorphic Systems” Nano Letters, Vol. 10, pp. 1297-1301, 2010

[46] J. Song, Y. Zhang, C. Xu, W. Wu, Z. L. Wang “Polar Charges Induced Electric Hysteresis

of ZnO Nano/Microwire for Fast Data Storage” Nano Letters Vol. 11 (7) pp. 2829-2834,

May 2011

[47] J. Qiu, A. Shih, W. Zhou, Z. Mi, I. Shih “Effects of metal contacts and dopants on the

performance of ZnO-based memristor devices” Journal of Applied Physics Vol. 110,

014513, 2011

[48] T. Prodromakis, K. Michelakis, C. Toumazou “Practical micro/nano fabrication

implementation of memristive devices” 12th International Workshop on Cellular Nanoscale

Networks and their Applications (CNNA) pp. 1-4, February 2010

[49] S. D. Yoon, A. Widom, K. E. Miller, M. E. McHenry, C. Vittoria, and V. G. Harris,

“Nanogranular metallic Fe-oxygen deficient TiO2-d composite films: a room temperature,

highly carrier polarized magnetic semiconductor,” J. Phys.: Condens. Matter, 20, 195206

(2008)

[50] A. Kumar, Y. Rawal, M. S. Baghini “Fabrication and Characterization of the ZnO-based

Memristor” Emerging Electronics (ICEE) Mumbai pp. 1-3, December 2012

[51] “Star-Hspice User Guide” Avant! Corporation, Release 2002.2 June 2002

[52] D. Batas and H. Fiedler, “A Memristor SPICE Implementation and a New Approach for

Magnetic Flux Controlled Memristor Modeling,” Nanotechnology, IEEE Transactions on,

vol. 10, Issue 2, pp. 250-255, Mar 2011

[53] Predictive Technology Model, http://ptm.asu.edu/

[54] CMOS SRAM Circuit Design and Parametric Test in Nano-Scaled Technologies Frontiers

in Electronic Testing, 2008, Volume 40, 13-38

[55] International Technology Roadmap for Semiconductors (ITRS), 2011 [Online], Emerging

Research Devices Chapter, table ERD3, pp. 6

[56] International Technology Roadmap for Semiconductors (ITRS), 2011 [Online],

http://public.itrs.net, PIDS Chapter Table PIDS7&8

[57] International Technology Roadmap for Semiconductors (ITRS), 2011 [Online],

http://www.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf, System Drivers

Chapter, pp.27

[58] SangBum Kim, Chung H. Lam “Transition of Memory Technologies” VLSI Technology

Systems and Applications (VLSI-TSA), April 2012, pp. 1-3

[59] T. Ogura, M. Mihara, Y. Kawajiri, K. Kobayashi, T. Sakaniwa, K. Nishikawa, S. Shimizu,

S. Shukuri, N. Ajika and M. Nakashima “A Fast Rewritable 90nm 512Mb NOR “B4-Flash”

Memory with 8F2 Cell Size” VLSI Circuits (VLSIC), 2011 Symposium, Technical Papers

pp. 198-199, June 2011

[60] S. Yu, H.S. P. Wong “Compact Modeling of Conducting-Bridge Random-Access Memory

(CBRAM)” IEEE Trans. Electron Devices, Vol. 58, No.5, May 2011

[61] Ugo Russo, Deepak Kamalanathan, Daniele Ielmini, Andrea L. Lacaita, Michael N.

Kozicki “Study of Multilevel Programming in Programmable Metallization Cell (PMC)

Memory” IEEE Trans. Electron Devices, Vol. 56, No.5, May 2009

[62] S. Y. Lee and K. Kim, “Future 1T1C FRAM technologies for highly reliable, high density

FRAM,” in IEDM Tech. Dig., 2002, pp. 547–550.

[63] X. Guo, C. Schindler, S. Menzel, and R. Waser, “Understanding the switching-off

mechanism in Ag+ migration based resistively switching model systems,” Appl. Phys.

Lett., vol. 91, no. 13, p. 133513, Sep. 2007

[64] M. Kund, G. Beitel, C.-U. Pinnow, T. Rohr, J. Schumann, R. Symanczyk, K.-D. Ufert, G.

Muller “Conductive bridging RAM (CBRAM): An emerging non-volatile memory

technology scalable to sub 20nm” IEDM Tech. Dig. 2005, pp. 754-757

[65] M. Tada, T. Sakamoto, Y. Tsuji, N. Banno, Y. Saito, Y. Yabe, S. Ishida, M. Terai, S.

Kotsuji, N. Iguchi, M. Aono, H. Hada, N. Kasai “Highly scalable nonvolatile TiOx/TaSiOy

solid-electrolyte crossbar switch integrated in local interconnect for low power

reconfigurable logic” IEDM Tech. Dig 2009, pp. 943-946

[66] S. Maikap, S.Z. Rahaman, T.Y. Wu, F. Chen, M.-J. Kao, M.-J. Tsai “Low current (5 pA)

resistive switching memory using high-k Ta2O5 solid electrolyte” Proc. ESSDERC, 2009,

pp.217-220

[67] D.A. Dimplu, F. Wang “Behavior Modeling of Programmable Metallization Cell Using

Verilog-A” 9th International Conference on Information Technology - New Generations

2012, IEEE Computer Society

[68] L. Goux, K. Sandaran, G. Kar, N. Jossart, K. Opsomer, R. Degraeve, G. Pourtois, G.-M.

Rignanese, C. Detavernier, S. Clima, Y.-Y. Chen, A. Fantini, B. Govoreanu, D.J. Wouters,

M. Jurczak, L. Altimime, J.A. Kittl “Field-driven ultrafast sub-ns programming in

W\Al2O3\Ti\CuTe-based 1T1R CBRAM system” 2012 Symposium on VLSI Technology

Digest of Technical Papers pp. 69-70

[69] W. Wei, J. Han, F. Lombardi “Design of a Non-Volatile 7T SRAM Cell for Instant-on

Operation” submitted for publication, 2013.

[70] A. Rubio, J. Figueras, E.I. Vatajelu, et al., “Process Variability in sub-16nm bulk CMOS

technology,” 2012, Project: Terascale Reliable Adaptive Memory Systems, FP7-INFSO-

IST-248789, 2012. Available online: http://hdl.handle.net/2117/15667

[71] G. Csaba, P. Lugli “Read-Out Design Rules for Molecular Crossbar Architecture” IEEE

Trans. on Nanotechnology, Vol.8, No.3, May 2009 pp.369-374

[72] S. Lin, Y. B. Kim, F. Lombardi “Read-Out Schemes for a CNTFET-based Crossbar

Memory” GLSVLSI 20th, pp. 167-170, May 2010

[73] M. Omana, D. Rossi, C. Metra, “Latch Susceptibility to Transient Faults and New

Hardening Approach”, IEEE Transactions on Computers, Volume 56, Issue 9, pp. 1255 -

1268, Sept. 2007.

[74] T. Calin, M. Nicolaidis, R. Velazco, “Upset Hardened Memory Design for Submicron

CMOS Technology,” IEEE Transactions on Nuclear Science, Volume 43, Issue 6, Part 1,

pp. 2874 - 2878, Dec. 1996.

[75] J. Gong, Y.B. Kim, J. Han and F. Lombardi “Hardening a Memory Cell for Low Power

Operation by Gate Leakage Reduction,” Proc. IEEE International Symposium on DFT in

VLSI and Nanotechnology Systems, pp.73-78, Austin, October 2012.

[76] P. Junsangsri, J. Han, F. Lombardi, “HSPICE macromodel of a Programmable

Metallization Cell (PMC) and its application to memory design” IEEE/ACM Int.

Symposium on Nanoscale Architectures (NANOARCH) 2014, pp. 45-50, Paris, France,

July 2014

[77] M. H. B. Jamaa, K. Mohanram, G. D. Micheli “An Efficient Gate Library for Ambipolar

CNTFET Logic” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems,

Vol. 30, No.2, Feb 2011

[78] J. F. Wakerly, “Error detecting codes, self-checking circuits and applications”, North-

Holland, 1978

[79] E. Fujiwara, “Code Design for Dependable Systems: Theory and Practical Applications,”

Wiley-Interscience, 2006

[80] M. Nicolaidis, R. Perez, D. Alexandrescu, “Low-Cost Highly-Robust Hardened Cells

Using Blocking Feedback Transistors,” in Proceedings of 26th IEEE VLSI Test

Symposium, 2008. pp. 371 - 376, April 27 2008-May 1 2008

[81] S. Lin, Y.B. Kim and F. Lombardi, “A 11-Transistor Nanoscale CMOS Memory Cell for

Hardening to Soft Errors”, IEEE Transactions on VLSI Systems, Volume 19, Issue 5, pp.

900 - 904, May. 2011.

[82] Y. Sasaki, K. Namba, H. Ito, “Soft Error Masking Circuit and Latch Using Schmitt Trigger

Circuit,” in Proceedings of 21st IEEE International Symposium on Defect and Fault

Tolerance in VLSI Systems, pp. 327 - 335, Oct. 2006.

[83] D. Sacchetto, M. H. Ben-Jamaa, S. Carrara, G. D. Micheli, Y. Leblebici “Memristive

Devices Fabricated with Silicon Nanowire Schottky Barrier Transistors” IEEE Circuits

and Systems (ISCAS 2010), Vol.1 pp. 9-12, 2010

[84] C. Dao-Lin, S. Zhi-Tang, L. Xi, C. Hou-Peng, C. Xiao-Gang “A Compact SPICE Model

with Verilog-A for Phase Change Memory” Chin. Phys. Lett. Vol.28, No.1 (2011) 018501

[85] X. Q. Wei, L.P. Shi, R. Walia, T.C. Chong, R. Zhao, X.S. Miao, B.S. Quek “HSPICE

Macromodel of PCRAM for Binary and Multilevel Storage” IEEE Trans. Electron

Devices. Vol. 53, No.1 Jan 2006

[86] D. Ielmini, A.L. Lacaita, D. Mantegazza “Recovery and Drift Dynamics of Resistance and

Threshold Voltages in Phase-Change Memories” IEEE Trans. Electron Devices. Vol.54

No.2 Feb 2007

[87] W. Xu, T. Zhang “Using Time-Aware Memory Sensing to Address Resistance Drift Issue

in Multi-Level Phase Change Memory” 11th Int’l Symposium on Quality Electronic Design

(ISQED) March 2010 pp. 356-361

[88] S. Kim, B. Lee, M. Asheghi, F. Hurkx, J.P. Reifenberg, K.E. Goodson, H.S. Philip Wong

“Resistance and Threshold Switching Voltage Drift Behavior in Phase-Change Memory

and Their Temperature Dependence at Microsecond Time Scales Studied Using a Micro-

Thermal Stage” IEEE Trans. Electron Devices. Vol.58, No.3 March 2011

[89] W. Zhang, T. Li “Helmet: A Resistance Drift Resilient Architecture for Multi-level Cell

Phase Change Memory System” 41st Dependable Systems & Networks (DSN) IEEE/IFIP

June 2011 pp. 197-208

[90] H. L. Chang, H.C. Chang, S.C. Yang, H.C. Tsai, H.C. Li, C.W. Liu “Improved SPICE

Macromodel of Phase Change Random Access Memory” IEEE VLSI DAT’09, April 2009,

pp. 134-137

[91] Y. B. Liao, J.T. Lin, M.H. Chiang “Temperature-Based Phase Change Memory Model for

Pulsing Scheme Assessment” ICICDT June 2008, pp.199-202

[92] K.C. Kwong, Lin Li, Jin He, M. Chan “Verilog-A Model for Phase Change Memory

Simulation” ICSICT 9th, Oct. 2008 pp. 492-495

[93] Y.B. Liao, Y.K. Chen, M.H. Chiang “An Analytical Compact PCM Model Accounting for

Partial Crystallization” EDSSC 2007, pp. 625-638

[94] K. H. Jo, J. H. Bong, K. S. Min, S. M. Kang “A Compact Verilog-A model for Multi-

Level-Cell Phase-change RAMs” IEICE Electronics Express, Vol.6 No.19, pp. 1414-1420

[95] P. Fantini, A. Benvenuti, A. Pirovano, F. Pellizzer, D. Ventrice, G. Ferrari “A Compact

model for Phase Change Memories” SISPAD 2006, pp. 162-165

[96] R.A. Cobley, C.D. Wright “Parameterized SPICE Model for a Phase-Change RAM

Device” IEEE Trans. Electron Devices Vol. 53 No.1 Jan 2006 pp. 112-118

[97] L. Xi, S. Zhi-Tang, C. Dao-Lin, C. Xiao-Gang, J. Xiao-Ling “An SPICE Model for PCM

Based Arrhenius Equation” Chin. Phys. Lett. Vol. 26, No.12 2009 128501

[98] P. Junsangsri, J. Han and F. Lombardi “Macromodeling a Phase Change Memory (PCM)

Cell by HSPICE,” Proc. IEEE/ACM Int. Symposium on Nanoarchitectures, pp. 77-84,

Amsterdam, July 2012

[99] V.A. Pedroni “Low-Voltage high-speed Schmitt Trigger and compact window

comparator” Electronics Letters 27th Oct 2005 Vol.41 No.22

[100] P. Junsangsri, J. Han and F. Lombardi “On the Drift Behaviors of a Phase Change Memory

(PCM) Cell” Proc. IEEE Int. Symp on Nanotechnology, pp. 1145-1150, Beijing, August

2013.

[101] H.S.P. Wong, Jie Deng, A. Hazeghi, T. Krishnamohan, G. C. Wan “Carbon Nanotube

Transistor Circuits – Models and Tools for Design and Performance Optimization”

ICCAD’06 Nov 2006 pp. 651- 654

[102] K. Chen, J. Han, F. Lombardi “Design and Evaluation of two MTJ-Based Content

Addressable Non-Volatile Memory Cells” Proc. IEEE Int. Symp on Nanotechnology,

August 2013

[103] W. Xu, T. Zhang “A Time-Aware Fault Tolerance Scheme to Improve Reliability of

Multilevel Phase-Change Memory in the Presence of Significant Resistance Drift” IEEE

Trans. VLSI Systems, Vol.19 No.8 August 2011

[104] D. Ielmini, D. Sharma, S. Lavizzari, A. L. Lacaita “Reliability Impact of Chalcogenide-

Structure Relaxation in Phase-Change Memory (PCM) Cells – Part I: Experimental Study”

IEEE Trans. Electron Devices. Vol.56, No.5 pp.1070-1077 May 2009

[105] S. Kang, W. Y. Cho, B. H. Cho, K. J. Lee, C. S. Lee, H. R. Oh, B. G. Choi, Q. Wang, H.

J. Kim, M. H. Park, Y. H. Ro, S. Kim, C. D. Ha, K. S. Kim, Y. R. Kim, D. E. Kim, C. K.

Kwak, H. G. Byun, G. Jeong, H. Jeong, K. Kim, and Y. Shin, “A 0.1-µm 1.8-V 256-Mb

Phase-Change random access memory (PRAM) with 66-MHz synchronous burst-read

operation,” IEEE J. Solid-State Circuits, vol. 42, no. 1, pp. 210–218, Jan. 2007

[106] M. Boniardi, D. Ielmini, S. Lavizzari, A. L. Lacaita, A. Redaelli, and A. Pirovano,

“Statistical and scaling behavior of structural relaxation effects in phase-change memory

(PCM) devices,” in Proc. IEEE Int. Reliab. Phys. Symp., Apr. 2009, pp. 122–127

[107] N. Papandreou, A. Pantazi, A. Sebastian, M. Breitwisch, C. Lam, H. Pozidis, E. Eleftheriou

“Multilevel Phase-Change Memory” Electronics, Circuits, and Systems (ICECS),

December 2010 17th IEEE International Conference pp.1017-1020

[108] Y. Zhang, W. Zhao, J.-O. Klein, D. Ravelosona, C. Chappert “Ultra-High Density Content

Addressable Memory Based on Current Induced Domain Wall Motion in Magnetic Track”

IEEE Trans. Magn. Vol. 48, No. 11, November 2012 pp.3219-3222

[109] C. Chappert, A. Fert, F. Nguyen Van Dau “The emergence of spin electronics in data

storage” Nature Materials Vol.6 pp.813-823, 2007

[110] S. Mangin, D. Ravelosona, J. A. Katine, M. J. Carey, B. D. Terris, E.E. Fullerton “Current-

induced magnetization reversal in nanopillars with perpendicular anisotropy” Nature

Materials Vol.5, pp. 210-215, 2006

[111] S. Ghosh “Design Methodologies for High Density Domain Wall Memory” IEEE/ACM

Nanoarch 2013 pp. 30-31, July 2013, NY USA

[112] A. J. Annunziata, M. C. Gaidis, L. Thomas, C. W. Chien, C. C. Hung, P. Chevalier, E. J.

O’Sullivan, J. P. Hummel, E. A. Joseph, Y. Zhu, T. Topuria, E. Delenia, P. M. Rice, S. S.

P. Parkin, W. J. Gallagher “Racetrack Memory Cell Array with Integrated Magnetic

Tunnel Junction Readout” IEDM 2011, pp.24.3.1-24.3.4, December 2011

[113] M. Gajek, J. J. Nowak, J. Z. Sun, P. L. Trouilloud, E. J. O'Sullivan, D. W. Abraham, M. C.

Gaidis, G. Hu, S. Brown, Y. Zhu, R. P. Robertazzi, W. J. Gallagher, D. C. Worledge “Spin

Torque Switching of 20nm Magnetic Tunnel Junctions with Perpendicular anisotropy”

Appl. Phys Lett. Vol. 100, Issue 13, 132408 (2012)

[114] M. Nakayama, T. Kai, N. Shimomura, M. Amano, E. Kitagawa, T. Nagase, M. Yoshikawa,

T. Kishi, S. Ikegawa, H. Yoda, “Spin transfer switching in

TbCoFe/CoFeB/MgO/CoFeB/TbCoFe magnetic tunnel junctions with perpendicular

magnetic anisotropy” J. Appl. Phys. 103, 07A710 (2008).

[115] S. Ikeda, K. Miura, H. Yamamoto, K. Mizunuma, H. D. Gan, M. Endo, S. Kanai, J.

Hayakawa, F. Matsukura, H. Ohno “A perpendicular-anisotropy CoFeB-MgO magnetic

tunnel junction” Nature Materials, Vol. 9, pp. 721-724, September 2010

[116] D. C. Worledge, G. Hu, David W. Abraham, J. Z. Sun, P. L. Trouilloud, J. Nowak, S.

Brown, M. C. Gaidis, E. J. O’Sullivan, R. P. Robertazzi “Spin torque switching of

perpendicular Ta|CoFeB|MgO-based magnetic tunnel junctions” Appl. Phys. Lett., Vol.

98, 022501.2, 2011

[117] S. Fukami, T. Suzuki, Y. Nakatani, N. Ishiwata, M. Yamanouchi, S. Ikeda, N. Kasai, H.

Ohno “Current-induced domain wall motion in perpendicularly magnetized CoFeB

nanowire” Appl. Phys. Lett., vol. 98, 082504, 2011

[118] W. S. Zhao, Y. Zhang, H.-P. Trinh, J.-O. Klein, C. Chappert, R. Mantovan, A. Kamperti,

R.P. Cowburn, R. Trypiniotis, M. Klaui, J. Heinen, B. Ocker, D. Ravelosona “Magnetic

Domain-Wall Racetrack Memory for high density and fast data storage” ICSICT 11th, pp.

1-4, Oct 2012

[119] M. Hayashi, L. Thomas, R. Moriya, C. Rettner, S. S. P. Parkin “Current-Controlled

Magnetic Domain-Wall Nanowire Shift Register” Science Vol.320 No.5873 pp.209-211,

2008

[120] S. S. P. Parkin, M. Hayashi, L. Thomas “Magnetic Domain-Wall Racetrack Memory”

Science Vol. 320 No. 5873 pp.190-194, 2008

[121] Y. Zhang, W. S. Zhao, D. Ravelosona, J.-O. Klein, J.V. Kim “Perpendicular-magnetic-

anisotropy CoFeB racetrack memory” Journal of Applied Physics 111, 093925, 2012

[122] W. Zhao, C. Chappert, V. Javerliac, J.-P. Noziere “High Speed, High Stability and Low

Power Sensing Amplifier for MTJ/CMOS Hybrid Logic Circuits” IEEE Trans. on

Magnetics, Vol. 45, No. 10, Oct 2009, pp. 3784-3787

[123] H.-P. Trinh, W. Zhao, J.-O. Klein, Y. Zhang, D. Ravelosona, C. Chappert “Magnetic Adder

Based on Racetrack Memory” IEEE Trans. Circuits and Systems Vol. 60, No. 6, June 2013,

pp.1469-1477

[124] W. C. Black Jr., B. Das, “Programmable logic using giant-magneto-resistance and spin-

dependent tunneling devices” J. Appl. Phys. Vol. 87, No. 9, pp. 6674-6679, 2000

[125] P. E. Dodd and L.W. Massengill, “Basic Mechanisms and Modeling of Single-Event Upset

in Digital Microelectronics,” IEEE Transactions on Nuclear Science, pp. 583 - 602, June

2003

[126] P. Junsangsri, F. Lombardi, “A Ternary Content Addressable Cell using a single Phase

Change Memory (PCM)” Proc. ACM/IEEE Great Lakes Symposium on VLSI

(GLSVLSI), pp. 259-264, Pittsburgh, PA, USA, May 2015

[127] http://download.micron.com/pdf/datasheets/flash/nand/2gb_nand_salesbrief_m29a.pdf