Novel Paradigms and Designs of Nanometric Memories

A Dissertation Presented

by

Wei Wei

to

The Department of Electrical and Computer Engineering

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Computer Engineering

Northeastern University Boston, Massachusetts

May, 2015

ABSTRACT

Although extensive time and effort have been devoted to the investigation of digital memories, many new challenges and roadblocks have become apparent in recent years as Complementary Metal-Oxide-Semiconductor (CMOS) technology has been scaled down to nanometer feature sizes. In this dissertation, low-power configurations of new memory cells are proposed and evaluated by analyzing both their static and dynamic behaviors. Hardened memory designs are also included to improve robustness to Soft Errors (SEs); they are characterized from different perspectives, such as Critical Charge and Soft Error Rate (SER).

A novel Static Random Access Memory (SRAM) cell is designed using Single-Electron Transfer, for which an HSPICE-compatible behavioral model is proposed. Extensive simulations verify the model, characterize the performance of the cell, and compare it with the conventional 6T volatile SRAM. Moreover, the proposed SRAM cell is applied as a Ternary Content-Addressable Memory (TCAM) cell; this cell shows notable improvements in power dissipation compared with its CMOS counterpart.

Next, the oxide Resistive RAM (RRAM) is investigated and applied to memory cell design because of its good compatibility with CMOS fabrication processes. A multiple-level memory cell (MLC) is proposed to achieve multiple-value storage, and its performance is evaluated, especially its robustness to noise. The RRAM is also used in different SRAM configurations. Four non-volatile SRAMs (NVSRAMs) are designed in this dissertation: 7T1R, 9T1R, Hardened NVSRAM Type-1 and Hardened NVSRAM Type-2. These memories provide non-volatile storage using an "Instant-on" scheme, and the latter two cells provide improved robustness. For DRAM design, two volatile memory cells are proposed to improve performance and reduce the influence of process variability at nanometric feature sizes. In addition, non-volatile DRAM cell designs that incorporate the RRAM are presented, and relevant evaluations are included to demonstrate their operation. Finally, a novel hybrid memory cell is presented that takes advantage of both SRAM and DRAM circuits for high performance and low power consumption. Complemented by a non-volatile storage function, the two proposed hybrid memory types are assessed with HSPICE at the circuit level and with benchmark simulations at the architecture level, using a custom access pattern scheme for verification.

ACKNOWLEDGMENTS

It is my great pleasure to thank all the people who helped me finish this dissertation.

I would like to express my gratitude to my research advisor, Prof. Fabrizio Lombardi, for his patient guidance and constructive suggestions. I appreciate his vast knowledge in many areas; he introduced me to the art of VLSI design. He has broadened my horizons and helped me develop my professional career. His understanding, wisdom, patience, enthusiasm and encouragement pushed me farther than I thought I could go.

To the members of my dissertation committee, Prof. Yong-Bin Kim and Prof. Marvin Onabajo: I am extremely grateful for the helpful suggestions you provided throughout my dissertation. In particular, the graduate courses taught by Prof. Yong-Bin Kim first stimulated my interest and led me into this field.

To all the faculty and staff of the Department of Electrical and Computer Engineering, who gave me great support: I feel honored to have worked with them, and they facilitated my TA work. My sincere appreciation also goes to the members of the HPVLSI group at Northeastern University for their assistance.

Finally, I would love to dedicate this work to my parents. Many thanks to all my friends, Caihua Xu, Hao Chen, Jianping Gong, Xikang Zhang, Liuxi Zhang, Jing Yang, Jing Lu, Sheng Lin, Yuchi Ni, Yuexi Zhang, Junpeng Feng, Ke Chen, Linbin Chen, Jiliang Liu and Wei Li, for helping me survive all the stress of these years and for not letting me give up.

TABLE OF CONTENTS

Page
ABSTRACT ...... iii
ACKNOWLEDGMENTS ...... v
TABLE OF CONTENTS ...... vi
LIST OF FIGURES ...... ix
LIST OF TABLES ...... xiv

1. INTRODUCTION ...... 1
1.1. Overview ...... 1
1.1.1. Design Issues of Low Power Memory ...... 5
1.1.2. Design Issues of Robust Memory ...... 7
1.2. Emerging Technology and Previous Work ...... 8
1.2.1. Single-Electron (SE) Turnstile ...... 10
1.2.2. Previous SE Turnstile Model ...... 13
1.2.3. Resistive Random-Access-Memory (RRAM) ...... 15
1.2.4. MOSFET-Based Ternary CAM ...... 17
1.2.5. Previous NVSRAMs ...... 18
1.2.6. Previous DRAMs ...... 20
1.3. Dissertation Outline ...... 22

2. SINGLE-ELECTRON TRANSFER ...... 24
2.1. Introduction ...... 24
2.2. Proposed Single-Electron Turnstile Model ...... 25
2.3. Demonstration and Analysis of the Proposed Model ...... 27
2.3.1. Simulation with Experimental Data ...... 28
2.3.2. Functional Operation ...... 30
2.3.3. Comparison with [55] ...... 32
2.4. Proposed Hybrid Memory Cell by Single-Electron Transfer ...... 34
2.4.1. Circuit Elements and Previous Design ...... 34
2.4.2. Proposed SRAM Cell ...... 38
2.5. Evaluation and Analysis of Proposed SRAM Cell ...... 41
2.5.1. SET/MOS Hybrid Circuit Evaluation ...... 41
2.5.2. Circuit Performance ...... 45
2.5.3. Area ...... 47
2.5.4. Static Noise Margin ...... 48
2.6. Proposed Ternary Content-Addressable Memory ...... 49
2.6.1. Ternary Data Matching Using a Dual-Gate SET ...... 49
2.6.2. Proposed TCAM Cell ...... 53
2.7. Evaluation and Analysis of Proposed TCAM Cell ...... 57
2.7.1. Average Write/Read Delay ...... 57
2.7.2. Matching (Mismatching) Delay ...... 58
2.7.3. Power Dissipation ...... 59
2.8. Conclusion ...... 60

3. RESISTIVE RANDOM-ACCESS-MEMORY ...... 61
3.1. Introduction ...... 61
3.2. Proposed Multiple Level Memory Cell (MLC) ...... 62
3.2.1. Modeling RRAM ...... 62
3.2.2. Proposed MLC Design ...... 65
3.3. Multiple Level Memory Cell (MLC) Evaluation ...... 66
3.3.1. Resistive switching behavior ...... 67
3.3.2. Array Evaluation ...... 70
3.4. Proposed 7T1R Non-volatile SRAM Cell ...... 72
3.5. Evaluation and Analysis of Proposed 7T1R NVSRAM Cell ...... 74
3.5.1. RRAM Model Demonstration ...... 74
3.5.2. NVSRAM Demonstrations ...... 76
3.5.3. Energy ...... 79
3.5.4. Average Write(Store)/Read Delays ...... 81
3.5.5. Area ...... 84
3.5.6. Static Noise Margin ...... 84
3.5.7. Multi-context Configurability ...... 86
3.6. Proposed Soft Error and Hardened NVSRAM Cell ...... 87
3.6.1. Soft Error and Critical Charge ...... 87
3.6.2. SEU Tolerance of Existing NVSRAM cells ...... 88
3.6.3. Proposed Hardened NVSRAM Cells ...... 89
3.7. Evaluation and Analysis of Proposed NVSRAM Cells ...... 92
3.7.1. Virtual ground voltage VS ...... 92
3.7.2. Power Dissipation ...... 92
3.7.3. Performance ...... 93
3.7.4. Critical Charge ...... 94
3.7.5. Soft Error Rate ...... 96
3.7.6. Area ...... 98
3.8. Conclusion ...... 99

4. DYNAMIC RANDOM-ACCESS-MEMORY ...... 101
4.1. Introduction ...... 101
4.2. Proposed Volatile DRAM Cells ...... 102
4.2.1. 4TI DRAM Cell ...... 102
4.2.2. 4T1D DRAM Cell ...... 104
4.3. Evaluation and Analysis of Proposed Volatile DRAM Cells ...... 105
4.3.1. Performance ...... 105
4.3.2. Power Dissipation ...... 107
4.3.3. Retention Time ...... 108
4.3.4. Critical Charge ...... 108
4.3.5. Area ...... 109
4.3.6. Process Variability ...... 111
4.4. Proposed Non-volatile DRAM Cells ...... 115
4.5. Evaluation and Analysis of Proposed Non-volatile DRAM Cells ...... 119
4.5.1. Power Dissipation ...... 120
4.5.2. Performance ...... 122
4.5.3. Retention Time ...... 123
4.5.4. Critical Charge ...... 124
4.5.5. Area ...... 124
4.5.6. Process Variability ...... 125
4.6. Conclusion ...... 126

5. EMBEDDED DYNAMIC RANDOM-ACCESS-MEMORY ...... 127
5.1. Introduction ...... 127
5.2. Previous Design of Hybrid Memory Cell Circuits ...... 128
5.3. Proposed Hybrid Memory Circuits ...... 129
5.3.1. Improved macrocell (MCT) ...... 129
5.3.2. Non-volatile hybrid memory ...... 130
5.3.3. Circuit Power Dissipations and Delays ...... 133
5.3.4. Critical Charge ...... 135
5.4. Proposed Hybrid Memory Scheme with Custom Access Pattern ...... 135
5.4.1. Cache Memory ...... 136
5.4.2. Hybrid Cache Accessing ...... 137
5.4.3. Refresh Scheme ...... 140
5.5. Hybrid Memory Scheme Architectural Evaluation ...... 140
5.5.1. SRAM and eDRAM Hit Rate ...... 141
5.5.2. Instruction Per Cycle (IPC) ...... 143
5.5.3. Power Savings of Hybrid Memory Scheme ...... 144
5.5.4. Area Savings ...... 146
5.6. Conclusion ...... 147

6. SUMMARY AND FUTURE WORKS ...... 150
6.1. Summary of Contributions ...... 150
6.2. Future Works ...... 154
6.2.1. Hardened Design with Noise Tolerance ...... 154
6.2.2. Hardened Design with MBU Tolerance ...... 155
6.2.3. Hybrid Memory Cache Design with Process Variability Tolerance ...... 155

7. REFERENCES ...... 156

LIST OF FIGURES

Page

Fig. 1. Moore's Law [1]...... 1

Fig. 2. N-Type Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET)...... 2

Fig. 3. Multiple Types of Digital Storage Devices...... 3

Fig. 4. Conventional 6T Static Random-Access-Memory (SRAM) Cell...... 6

Fig. 5. Conventional 3T Dynamic Random-Access-Memory (DRAM) Cell...... 7

Fig. 6. Conventional 1T Dynamic Random-Access-Memory (DRAM) Cell...... 7

Fig. 7. (a) Equivalent circuit of the MOSFET-based SE turnstile. (b) Circuit symbol of the SE turnstile. It consists of a source terminal, a drain terminal, an input gate voltage terminal, a bias voltage terminal and two clock terminals. The drain is connected to a storage node (SN). (c) Repulsive clock voltage pulses for the turnstile operation. A transfer cycle is composed of four steps. (d) Schematic diagram to accurately inject electrons into the storage node (SN). The four steps correspond to the steps in (c). (e) Schematic diagram to accurately eject electrons from the SN...... 11

Fig. 8. Simulation of SE Turnstile using the model of [55] at 32nm...... 14

Fig. 9. Schematic diagram showing the MIM structure of an OxRRAM, with Set (as formation of a conductive filament) and Reset (as dissolution of the conductive filament)...... 15

Fig. 10. Current-voltage plot of resistance switching for a unipolar RRAM in voltage sweeping mode...... 16

Fig. 11. Current-voltage plot of resistance switching for a bipolar RRAM in voltage sweeping mode...... 17

Fig. 12. Ternary core cells: (a) NOR-type TCAM, (b) NAND-type TCAM, (c) ternary encoding for NOR cell, (d) ternary encoding for NAND cell [75]...... 18

Fig. 13. RRAM-based 8T2R SRAM cell [78]...... 18

Fig. 14. RRAM-based 9T2R SRAM cell [80]...... 19

Fig. 15. 3T1D DRAM cell of [102]...... 20

Fig. 16. 4T DRAM cell of [102]...... 20

Fig. 17. 3T DRAM cell of [121]...... 21

Fig. 18. Proposed HSPICE Model of a SE Turnstile...... 26

Fig. 19. Simulation Timing Diagram of SE Turnstile at 45nm...... 28

Fig. 20. Simulation Timing Diagram of SE Turnstile at 32nm...... 28

Fig. 21. Simulation results using the experimental parameters of [49]...... 30

Fig. 22. Simulation results of the operational modes of the proposed model at 32 nm. (a) Applied gate voltage control pulses. (b) Input pulse to start a SE transfer cycle. (c) Amount of charge in SEB as function of cycle time. (d) Transient current through G1 for a SE transfer...... 31

Fig. 23. Simulation results for the model of [55] at 32 nm. (a) Applied gate voltage control pulses. (b) Input pulse to start a SE transfer process. (c) Amount of charge in SEB as function of time. (d) Transient current through G2 to represent a SE transfer...... 32

Fig. 24. Schematic diagram of the SET/MOS hybrid circuit...... 35

Fig. 25. Proposed SRAM using a MOSFET-based SE turnstile...... 38

Fig. 26. HSPICE timing diagram of proposed SRAM cell at 45nm node...... 39

Fig. 27. Schematic diagram of the single-electron multiple-valued memory (SEMV) circuit [11]...... 42

Fig. 28. Simulation of MOSFET-based electrometer at 32nm node...... 42

Fig. 29. Simulation of SET/MOS hybrid circuit at 32nm node...... 43

Fig. 30. Static inverting transfer characteristics of the SE turnstile circuit at 32nm...... 48

Fig. 31. Static inverting transfer characteristics of the SET/MOS hybrid circuit at 32nm...... 49

Fig. 32. Operation principles of ternary matching: (a) matching circuit consisting of a dual-gate SET and a MOSFET, (b) ternary matching and (c) measured drain current characteristics [77], (d) and (e) simulated drain current characteristics at VG1=0V and VG1=0.9V at the 32 nm node, respectively...... 50

Fig. 33. Proposed TCAM Cell & Precharge Circuit: (a) Cell Structure & Precharge Circuit (b) SE-based memory cell...... 53

Fig. 34. Simulated results of timing diagram of TCAM at 45 nm...... 54

Fig. 35. Simulated results of timing diagram of TCAM at 32 nm...... 55

Fig. 36. An HSPICE macromodel of a single RRAM [79]...... 62

Fig. 37. HSPICE simulation of RRAM model [79] at nanosecond scale...... 64

Fig. 38. HSPICE simulation of a multiple bit RRAM cell at nanosecond scale...... 65

Fig. 39. Proposed multiple level memory cell using RRAMs in an 8-bit, 16-word, 128-cell array...... 66

Fig. 40. Simulation of proposed 1T1R RRAM cell for binary operation at 32 nm...... 67

Fig. 41. Simulation of 1T1R memory cell for multiple level operation at 32 nm...... 69

Fig. 42. Simulation of proposed single RRAM cell array for binary operation (write and read at 32 nm, N=1)...... 70

Fig. 43. Noise margin and output deterioration for binary operations at 32 nm...... 71

Fig. 44. Size (N) of operating linear array versus base of memory...... 72

Fig. 45. Proposed 7T1R NVSRAM cell...... 73

Fig. 46. Modified RRAM model for bipolar operation...... 75

Fig. 47. Simulation of modified bipolar RRAM model...... 76

Fig. 48. Store "1", Power-down and Restore "1" operations of the proposed 7T1R cell at 32nm...... 77

Fig. 49. Store "0", Power-down and Restore "0" operations of the proposed 7T1R cell at 32nm...... 79

Fig. 50. Average "Write"/"Store" energy for the different memory cells versus feature size...... 81

Fig. 51. Average "Restore" energy for four different NVSRAM memory types versus feature sizes...... 81

Fig. 52. Average "Write"/"Store" delay for the different memory cells versus feature size...... 83

Fig. 53. Average "Read" delay for the different memory cells versus feature size...... 83

Fig. 54. Layout of the proposed 7T1R NVSRAM...... 84

Fig. 55. Write Static Noise Margin (WSNM) of 7T1R memory cell for Store "1" at 32 nm...... 85

Fig. 56. Write Static Noise Margin (WSNM) of 7T1R memory cell for Store "0" at 32 nm...... 85

Fig. 57. Multiple-context configuration of proposed 7T1R cell...... 86

Fig. 58. Charge at storage node vs feature size for four memory types...... 89

Fig. 59. Proposed 9T1R NVSRAM cell...... 90

Fig. 60. Proposed Hardened NVSRAM Type-1 cell...... 90

Fig. 61. Proposed Hardened NVSRAM Type-2 cell...... 91

Fig. 62. Power dissipation for four memory cells at 32 nm (VS=0.1V)...... 93

Fig. 63. Performance metrics for memory cells (VS=0.1V)...... 94

Fig. 64. Charges for four memory cells at 32 nm (VS=0.1V)...... 95

Fig. 65. SER vs feature size for various memory cells...... 97

Fig. 66. Layout of the proposed 9T1R memory cell...... 98

Fig. 67. Layout of the proposed Hardened Type-1 memory cell...... 98

Fig. 68. Layout of the proposed Hardened Type-2 memory cell...... 99

Fig. 69. Proposed 4TI DRAM cell circuit...... 102

Fig. 70. Simulated waveforms of 4TI DRAM cell at 45nm...... 103

Fig. 71. Simulated RBL voltage vs Ctrl2 voltage for body of transistor T4 at 45nm...... 104

Fig. 72. Proposed 4T1D DRAM cell circuit...... 104

Fig. 73. Simulated waveform of 4T1D DRAM cell at 45nm...... 105

Fig. 74. Average write delay vs type of memory cell...... 106

Fig. 75. Average read delay vs type of memory cell...... 107

Fig. 76. Power dissipation vs type of memory cell...... 107

Fig. 77. Retention time vs various feature sizes for DRAMs...... 108

Fig. 78. Layout of the 3T1D DRAM cell of [102]...... 110

Fig. 79. Layout of the 4T DRAM cell of [102]...... 110

Fig. 80. Layout of the BBMOS as part of a 4T/4TI DRAM cell...... 110

Fig. 81. Layout of the proposed 4T1D DRAM cell...... 110

Fig. 82. Area vs type of memory cell...... 111

Fig. 83. Proposed non-volatile 4T1D1R DRAM cell...... 116

Fig. 84. Proposed non-volatile 4T1RP DRAM cell...... 118

Fig. 85. Proposed Read Output Circuit for non-volatile DRAM cell...... 118

Fig. 86. Simulated waveforms of 4T1D1R cell for "1" at 45nm...... 119

Fig. 87. Simulated waveforms of 4T1D1R cell for "0" at 45nm...... 120

Fig. 88. Layout of the 4T1D1R NVDRAM cell...... 125

Fig. 89. Proposed improved macrocell (MCT)...... 129

Fig. 90. Proposed non-volatile hybrid memory...... 130

Fig. 91. Simulated waveforms of proposed MCT cell at 22nm...... 132

Fig. 92. Simulated waveforms of proposed non-volatile hybrid memory cell at 22nm...... 132

Fig. 93. Power dissipation vs n at 22nm...... 133

Fig. 94. Write delay vs n at 22nm...... 134

Fig. 95. Read delay vs n at 22nm...... 134

Fig. 96. General hybrid memory scheme...... 136

Fig. 97. Scheme for cache memory address decoding...... 137

Fig. 98. Scheme inside the cache controller...... 138

Fig. 99. Access scheme for conventional and hybrid caches...... 139

Fig. 100. Average static L1 hit rate for the integer and floating-point benchmarks with respect to cache size...... 142

Fig. 101. Average dynamic L1 hit rate for the integer and floating-point benchmarks with respect to cache size...... 142

Fig. 102. IPC performance with respect to the integer benchmarks for 16KB-4way memory cache...... 143

Fig. 103. IPC performance with respect to the floating-point benchmarks for 16KB-4way memory cache...... 144

Fig. 104. Statistical power dissipations of conventional cache with SRAM and hybrid cache with MCC cell...... 145

Fig. 105. Power dissipation with respect to the integer benchmarks for 4way hybrid memory cache...... 145

Fig. 106. Power dissipation with respect to the floating-point benchmarks for 4way hybrid memory cache...... 146

Fig. 107. Entire 4way cache area vs cache size for various memory cells...... 147

LIST OF TABLES

Page

Table I. Device Parameters for HSPICE Simulation ...... 13

Table II. Device Parameters for Comparison ...... 29

Table III. Device Parameters for SRAM HSPICE Simulation ...... 37

Table IV. Device Parameters for SRAM Simulation of Read Operation Comparison ...... 41

Table V. HSPICE Simulation Results for SRAM Cell ...... 45

Table VI. Device Parameters for TCAM HSPICE Simulation ...... 51

Table VII. HSPICE Simulation Results for Single TCAM Cell ...... 57

Table VIII. Device Parameters for RRAM HSPICE Simulation ...... 63

Table IX. Parameters for Multiple Bit RRAM Simulation ...... 64

Table X. Parameters for 1T1R MLC Simulation ...... 67

Table XI. Resistances for Different Base Operation of 1T1R MLC Cell ...... 68

Table XII. Simulation Results of 1T1R MLC Cell For Binary Operation ...... 69

Table XIII. Parameters for NVSRAM Simulation ...... 76

Table XIV. Energy of Memory Cells (32nm) ...... 80

Table XV. Average delay of Memory Cells (32nm) ...... 81

Table XVI. Simulated MCC scenarios for Proposed 7T1R memory cell ...... 86

Table XVII. Performance of 9T1R Memory Cell at 32 nm ...... 92

Table XVIII. Summary of SER (FIT/h•MBit) at Sea-Level obtained from Evaluations at 32nm ...... 97

Table XIX. Parameters for DRAM Cell Simulation ...... 103

Table XX. Charges of DRAM Cell Types ...... 109

Table XXI. Standard deviation for variability (in percentage) of each Technology Node ...... 111

Table XXII. Variability (in percentage) of each transistor in 4T1D cell at 45nm ...... 112

Table XXIII. Variability (in percentage) of each transistor in the other three DRAM cells at 45nm ..... 112

Table XXIV. Variability (in percentage) of four DRAM cells for various MOSFET feature size ...... 114

Table XXV. Signal voltages used of DRAM Cells ...... 115

Table XXVI. Power Dissipation of DRAM Cells ...... 121

Table XXVII. Performance of DRAM Cells ...... 122

Table XXVIII. Retention Time and Charge of DRAM Cells ...... 123

Table XXIX. Variability (in percentage) of NVDRAM cells for various MOSFET feature size ...... 125

Table XXX. Parameter for hybrid memory cell simulation ...... 131

Table XXXI. Critical charge and nodes of Memory Cells (n=2) ...... 135

Table XXXII. Cache Memory Configuration ...... 141

Table XXXIII. Ranking of Cache ...... 148

Table XXXIV. Comparison of Various Cache Associativities ...... 148

1. INTRODUCTION

1.1. Overview

The rapid development of Integrated Circuit (IC) technology over the past several decades has had a wide and significant influence on the way we live. This enabling technology allows systems to integrate an ever larger number of transistors, so that a single system can perform multiple functions and meet real-life requirements. An industry pioneer, Gordon Moore, predicted in his 1965 publication that the number of transistors that can be fabricated on a chip would double approximately every two years, as shown in Fig. 1 [1]. This prediction, now known as Moore's Law, reflects the high-speed development of IC technology.

Fig. 1. Moore's Law [1].

Undoubtedly, the progress of IC technology has benefited from the rapid pace of semiconductor technology and transistor innovation. Although various semiconductor fabrication technologies have been developed over the past several decades [2], Complementary Metal-Oxide-Semiconductor (CMOS) technology has become the dominant digital IC technology for Very-Large-Scale Integration (VLSI), and it has remained the leading technology over the past decade for several reasons [3]. As shown in Fig. 2, the n-type MOS Field-Effect Transistor (MOSFET) typically has four terminals, which are biased at different voltages for its various operational modes [2]. First, compared with other devices, the physical properties of CMOS are easier to understand. Second, its performance improves consistently with the progress of CMOS technology. Finally, it has the lowest Power Delay Product (PDP) among all technologies [2]. Hence, it allows the largest number of devices, and thus the most functions, to be integrated per unit area or on a single chip under a given power consumption limit.

Fig. 2. N-Type Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET).

According to the evolution of CMOS technology reported at recent International Solid-State Circuits Conferences (ISSCC) [4] and in the International Technology Roadmap for Semiconductors (ITRS) report [5], the number of transistors on a chip increases and the power consumption per transistor decreases accordingly. However, the higher integration capacity of CMOS VLSI also exposes several shortcomings once the feature size is scaled down to the nanometer range. In the 32nm technology era, leakage current has increased and become unavoidable, accompanied by process variation during fabrication. Meanwhile, the charge conventionally stored at a circuit node is susceptible to spurious voltage variations generated by cosmic ray neutrons and alpha particles, owing to the smaller supply voltage (Vdd) used. In sum, the situation for CMOS has become serious because phenomena that were historically negligible now appear, and low-power and hardened circuit design techniques become critical in the nanoscale era.

Fig. 3. Multiple Types of Digital Storage Devices.

In the family of silicon digital designs, a large portion is dedicated to the storage of data values and program instructions [3]. As shown in Fig. 3, multiple types of digital storage have appeared over the past half century and have had a wide influence on our daily life. More than half of the transistors in current high-performance microprocessors are devoted to cache memories [3], and this ratio is expected to increase further. The situation is even more dramatic at the system level, where high-performance computers and workstations contain several Gbytes of memory. Normally, CMOS storage elements, such as the CMOS memory, are built on the concept of either positive feedback or capacitive storage [3]. Memory arrays are assembled from single cells, and the area overhead of the peripheral circuitry is minimized to increase storage density. This type of memory circuit is categorized as volatile memory [2], since the stored data is lost when power to the chip is turned off. By contrast, non-volatile memory, such as the Read-Only-Memory (ROM) types, retains data without a power supply at the cost of slow access time. In comparison, Random Access Memory (RAM) is fast and less expensive, so it has been widely used in the cache memory and main memory of the computer storage system [3]. Its random access property allows data to be accessed efficiently in any order [3].

The Single-Electron (SE) transfer technique has been investigated for a long time [6]-[8] and has been successfully implemented in devices for various applications [9]. Because of their ultra-low power dissipation and high density [9], Single-Electron Devices (SEDs) are promising candidates for storage components in circuit design. Moreover, extensive time and effort have been devoted to the characterization and modeling of the quantum phenomena that underlie SE tunneling [10]. New phenomena related to tunneling, such as the Coulomb blockade and Coulomb oscillation, have been investigated [10]. Previous studies [6][11] have demonstrated that the SE transfer technique is compatible with the CMOS fabrication process, which enables its use in modern VLSI design.

In addition, the emergence of a number of resistive storage technologies, including Ferroelectric RAM (FRAM) [12][13], Phase-change RAM (PRAM) [14] and Resistive RAM (RRAM) [15], offers an alternative for non-volatile storage design. The phenomenon of large negative resistance was first observed in the current-voltage (I-V) characteristics of five metal-oxide-metal (MOM) structures [16]. Resistance switching, the defining effect of the RRAM, has been investigated for more than 40 years [15]. Depending on the polarity of the controlling voltage, there are two major types, unipolar and bipolar [15]. Research efforts [17][18] have demonstrated the great potential of the RRAM among the various non-volatile RAMs, including short program/erase times, low program/erase currents, high density and high endurance [18], which make it a good candidate for IC memory design. In particular, different fabrication materials with various combinations of properties [15][18] offer multiple possibilities for implementation.

Modern Random Access Memories (RAMs) typically take two main forms, Static RAM (SRAM) and Dynamic RAM (DRAM) [3]. Although both memory types are volatile, they have different applications in a computer system according to their properties. The SRAM stores a bit of data using the state of a flip-flop and offers faster access than the DRAM at a relatively higher cost [3]. Normally, it is embedded in the cache near the Central Processing Unit (CPU) in modern computers [2]. By contrast, the DRAM-based main memory is placed farther from the CPU and closer to the hard disk, since it has relatively high density and low cost [2]. Therefore, the key parameter of an SRAM circuit is speed, whereas those of a DRAM are size and cost [2]. Conventionally, the SRAM uses a flip-flop circuit structure to store the data, and the connected access transistor acts as a switch that lets the control circuitry on the chip access the storage node for write/read operations. By contrast, although multiple forms of the DRAM cell exist, such as one-transistor (1T) and three-transistor (3T) cells [3], the DRAM is usually implemented with fewer transistors for higher density. In addition, a DRAM chip requires a periodic refresh operation, triggered by a control signal, because leakage would otherwise cause the stored data to be lost [3].

This dissertation concentrates on low-power and hardened designs of the two major memory types:

• Static RAM (SRAM)

• Dynamic RAM (DRAM)

1.1.1. Design Issues of Low Power Memory

The low-power design of memory has attracted significant research effort, since memory occupies an increasingly important portion of the microprocessor [3]. Traditionally, the low-voltage method has been widely used in CMOS VLSI, and it is also applicable to digital memory design [2]. Reducing the power supply voltage is necessary to lower the vertical and lateral electric fields in the Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET) and thus maintain device reliability [2]. Meanwhile, the device count increases sharply as devices are scaled down [2]. Although CMOS has relatively low DC power dissipation, its dynamic power dissipation accumulates quickly as CMOS technology develops [3].

Lowering the power supply voltage is also an effective way to reduce the power dissipation of IC memories [2], because the dynamic power dissipation of a CMOS circuit is given by [3]

$P = C_L V_{dd}^{2} f_{CLK}$    (1)

where $C_L$ is the load capacitance, $V_{dd}$ is the power supply voltage and $f_{CLK}$ is the clock frequency. From this equation, the dynamic power dissipation of the circuit falls to one fourth when the power supply voltage Vdd is halved. Although the clock frequency of digital systems will increase and the load capacitance will decrease as CMOS technology advances, lowering the power supply remains the most straightforward method to save dynamic power efficiently.
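The quadratic dependence in Eq. (1) can be checked numerically. The following is a minimal sketch in Python; the function name and the component values (1 fF load, 1 GHz clock, 1 V nominal supply) are illustrative assumptions and are not taken from the simulations in this dissertation.

```python
def dynamic_power(c_load, v_dd, f_clk):
    """Dynamic power of a CMOS circuit per Eq. (1): P = C_L * Vdd^2 * f_CLK."""
    return c_load * v_dd ** 2 * f_clk


# Illustrative values only (assumptions, not data from the dissertation).
C_LOAD = 1e-15    # load capacitance in farads (1 fF)
F_CLK = 1e9       # clock frequency in hertz (1 GHz)
V_NOMINAL = 1.0   # nominal supply voltage in volts

p_full = dynamic_power(C_LOAD, V_NOMINAL, F_CLK)
p_half = dynamic_power(C_LOAD, V_NOMINAL / 2, F_CLK)
ratio = p_half / p_full

print(f"P(Vdd)   = {p_full:.3e} W")
print(f"P(Vdd/2) = {p_half:.3e} W")
print(f"ratio    = {ratio:.2f}")
```

With the load capacitance and clock frequency held fixed, the ratio depends only on the square of the supply scaling factor, which is why halving Vdd gives one fourth of the dynamic power.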

In contrast, static power dissipation is usually caused by the leakage of the MOSFETs in the respective memory circuit [2]. For the conventional six-transistor (6T) SRAM circuit shown in Fig. 4, the power supply is always connected in standby mode because of the cell's volatile nature. Normally, there are two complementary storage nodes inside the cross-coupled inverters; one of them holds "1", which turns on one of the driving transistors. Hence, although the access transistors, controlled by the Word Line (WL), are turned off, the channel leakage from the respective Bit Line (BL) through the transistors to ground causes significant power dissipation. In particular, this leakage increases as the MOSFET feature size is scaled down [2], making it an important component of SRAM power dissipation.

Fig. 4. Conventional 6T Static Random-Access-Memory (SRAM) Cell.

Compared with the SRAM, the DRAM has a similar storage mechanism but lower circuit complexity [3]. For instance, the conventional 3T and 1T DRAM cells are presented in Fig. 5 and Fig. 6, respectively.

Fig. 5. Conventional 3T Dynamic Random-Access-Memory (DRAM) Cell.

Fig. 6. Conventional 1T Dynamic Random-Access-Memory (DRAM) Cell.

By eliminating the redundant components of the SRAM circuit, the DRAM is still capable of data storage, but it requires a refresh operation that compensates for charge loss by periodically rewriting the cell content. The cell suffers from leakage, which would eventually lead to loss of state unless mitigated by the refresh. Although the reduction in cell complexity more than compensates for the added system complexity imposed by the refresh requirement, the refresh also incurs a power penalty to some extent [3]. Therefore, besides the dynamic power dissipated by the write/read operations of the DRAM, the refresh operation also consumes significant power. The Retention Time is usually introduced to characterize how long the data stored in the DRAM remains valid [2], and designers treat it as an important characteristic to increase.
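The link between retention time and refresh power can be illustrated with a back-of-the-envelope estimate. The sketch below assumes that every cell is rewritten exactly once per retention period and that each refresh costs a fixed energy. This simple model, the function name and all numerical values are illustrative assumptions, not results or methods from this dissertation.

```python
def refresh_power(n_cells, energy_per_refresh, retention_time):
    """Average refresh power when each cell is refreshed once per retention period.

    Assumed model: P_refresh = N * E_refresh / t_retention.
    """
    if retention_time <= 0:
        raise ValueError("retention_time must be positive")
    return n_cells * energy_per_refresh / retention_time


# Hypothetical array and cell parameters, chosen only for illustration.
N_CELLS = 1_000_000   # number of DRAM cells in the array
E_REFRESH = 1e-15     # energy per cell refresh in joules

p_short = refresh_power(N_CELLS, E_REFRESH, 1e-3)  # 1 ms retention time
p_long = refresh_power(N_CELLS, E_REFRESH, 2e-3)   # 2 ms retention time

print(f"refresh power at 1 ms retention: {p_short:.3e} W")
print(f"refresh power at 2 ms retention: {p_long:.3e} W")
```

Under this assumed model, doubling the retention time halves the average refresh power, which is one way to see why a longer retention time is a design goal.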

1.1.2. Design Issues of Robust Memory

Improvements in CMOS at nanoscales have made it possible to achieve an extremely high density in IC design. In the conventional characterization of the SRAM cell circuit, the Static Noise Margin (SNM) is used to represent the robustness of the circuit to noise disturbance [3]. Although it is widely accepted in the technical literature, it has limitations because it is measured by increasing a DC voltage until the stored data flips. Fatal failures, however, occur occasionally and affect the circuit through various mechanisms, which calls for measurements that capture dynamic behavior. Furthermore, with the reduction in feature size, ICs are less robust and their operation can be affected by externally induced phenomena such as cosmic ray neutrons and α-particles [28]. When these energetic particles travel through the silicon bulk, minority carriers are created and collected by the source/drain diffusion, leading to a voltage variation [29]. This may cause a soft error due to a transient fault (TF) in the circuit. At reduced feature sizes, CMOS technology exhibits this type of disruptive phenomenon. Therefore, memory design is confronted with many challenges originating from technology changes, especially with respect to soft errors.

Over the last few years, many approaches have been proposed for tolerating TFs in storage elements, such as error correcting codes, temporal redundancy and hardened design [30][31][32][33]. Among them, hardening has been utilized for low-cost design to tolerate a single event upset (SEU) in memories and latches [34][35][36]. Hardening techniques improve the tolerance to an SEU, in most cases by adding extra transistors to the original design. A few hardened memory cell circuits have been designed to tolerate TFs independently of both the transistor feature size and the node capacitance in the cell [37][38][39]. However, a high design overhead can be incurred due to the modified circuitry. Protection techniques used in an SRAM cell absorb the excess charge of the transient upset phenomenon, but they usually degrade performance at a reduced feature size [40]. In general, hardened and protection designs improve SEU tolerance by increasing the minimum charge needed to upset a node of the memory cell circuit (often referred to as the critical charge), usually at the expense of an increase in power dissipation. Low-power designs of memory cells hardened to SEU have been proposed in [34]. These designs utilize a positive virtual ground technique to lower the gate leakage current in nanometer process technologies [42], while increasing the critical charge. It has been shown that these cells have better performance and better tolerance to SEU (affecting multiple nodes) than schemes such as DICE [38].

Extensive efforts have been devoted to robust memory design; however, as device size shrinks, the spacing between nodes decreases significantly and the charge generated by a single event strike may diffuse to affect adjacent nodes. Novel hardened cell designs and memory schemes are therefore required to address these new phenomena and achieve good tolerance to soft errors.

1.2. Emerging Technology and Previous Work

Based on the previous discussion, low voltage operation is a typical trend in high-density low power memory design [2]. For instance, the SRAM is designed to operate at a low voltage level, below 1 V. However, due to the threshold voltage of the access transistor (considering the body effect), it is much more difficult to write a "1" into the circuit. A notable technique, referred to as "Boosted Level" for the word line voltage, has been implemented to mitigate this problem. This means that a drop appears in the ratio between the driver transistor and the access transistor in the memory implementation. Furthermore, the data in the memory cell is not easy to maintain during the "Read" operation. In addition, a two-step word-voltage (TSW) method has been used in the design [19]. By contrast, the driving source-line (DSL) memory cell architecture [20][21] connects the source of the NMOS driver transistors to an additional Source Line (SL). Because of the small voltage swing in its operation, notable power is saved. Generally, the idea of low-voltage operation is widely applied to SRAM design, and the above techniques are typical examples of reducing power dissipation by improving the conventional CMOS SRAM. By contrast, the Single-Electron transfer and non-volatile RAM techniques [6]-[15] mentioned above represent a different direction for low power memory design.

Meanwhile, the continued growth of semiconductor non-volatile memories will likely rely on advances in both electronic materials and device structures. Extensive efforts have been devoted to addressing these two complementary issues. Resistance switching is the basic physical phenomenon in the operation of a resistive random access memory (RRAM); this phenomenon has been studied for more than 40 years [15]. In addition to its non-volatile operation, one of the most evident advantages of an RRAM is its compatibility with CMOS processes, such that current CMOS processes can be readily applied to its fabrication. Furthermore, the scaling merit of an RRAM permits operation at low power consumption, making it a very competitive technology for large storage at low cost. In the past decade, several novel techniques have been proposed for implementing NVSRAMs, such as ferroelectric capacitors [22], phase change [23], non-polar Resistive Switching Devices (RSDs) [24], nanocrystal PMOS flash [25], spin-transfer-torque MTJs (STT-MTJs) [26] and the approach of [27].


1.2.1. Single-Electron (SE) Turnstile

The MOSFET-based SE turnstile is a promising device that can accurately transfer Single Electrons (SEs) at high speed even at room temperature [49][51][52][53][54]. Its equivalent circuit is shown in Fig. 7(a). The SE turnstile consists of the following elements: a source S, a drain D, a gate voltage terminal G, a bias voltage terminal B, and two clock voltage terminals (CLK1 and CLK2) [55]. In the model proposed in this paper, the source is connected to a supply voltage (Vdd or -Vdd). The drain is connected to an electron storage node (SN). Electrons can be injected into the SN or ejected from the SN by the SE turnstile. The turnstile consists of two FETs (FET1 and FET2). Single electrons are transferred from the source to the drain one by one (i.e. sequentially) by turning FET1 and FET2 ON and OFF alternately [53]. The circuit symbol shown in Fig. 7(b) is commonly used to represent the SE turnstile.

Compared with [49], an additional voltage terminal B is introduced to the circuit. Generally, terminals G and B are the upper or side gates and they control the number of electrons transferred per cycle. G and B are connected to the input voltage VG and the bias voltage VB, respectively. The pulse sequences for CLK1 and CLK2 for controlling the turnstile operation are shown in Fig. 7(c). Fig. 7(d) presents the process by which SEs are transferred from the source to the SN as per steps (i)-(iv) (shown in Fig. 7(c)) when the source (S) of the SE turnstile is connected to -Vdd.

The operation of the SE turnstile can be described by a transfer cycle made of four steps as follows.

When both FET1 and FET2 are turned OFF, a single-electron-box (SEB) is electrically formed. The potential of the SEB is controlled by electrically coupling the voltages of VG and VB [49]. When FET1 is turned ON, the electrons enter the SEB from the source [step (i)]. After FET1 is turned OFF again, the electrons are retained in the SEB [step (ii)]. The number of electrons transferred (N) is dependent on the potential difference between the SEB and the source. At a working temperature T = 0 [53], N is given by

$$N = 0, \quad \text{if } V_G + V_B < -V_{dd} \tag{2}$$

$$N = \left[ C_g (V_G + V_B + V_{dd})/e + 1/2 \right], \quad \text{if } -V_{dd} < V_G + V_B \tag{3}$$

where Cg is the capacitance between the gates and the SEB, and e is the electron charge [53].
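The electron count at T = 0 can be evaluated numerically as a quick check. The sketch below assumes the reading of Eqs. (2)-(3) in which N = 0 for VG + VB < -Vdd and N = [Cg (VG + VB + Vdd)/e + 1/2] otherwise, and it assumes that the brackets denote the integer part. The function name and the value of Cg are hypothetical and are not taken from this dissertation.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def electrons_in_seb(v_g, v_b, v_dd, c_g):
    """Electrons loaded into the SEB at T = 0 with the source tied to -Vdd.

    Assumes [x] in Eq. (3) is the integer part of x.
    """
    v_sum = v_g + v_b
    if v_sum < -v_dd:
        # SEB potential is below the source level: no electron enters.
        return 0
    return math.floor(c_g * (v_sum + v_dd) / E_CHARGE + 0.5)


# Hypothetical example: Cg = 0.5 aF (assumed), Vdd = 0.9 V, VB = 0 V.
for v_g in (-1.0, -0.5, 0.1):
    print(v_g, electrons_in_seb(v_g, 0.0, 0.9, 0.5e-18))
```

Because the result is rounded to an integer, N changes in discrete steps as VG is swept, which is the property that allows the gate voltage to select the number of electrons transferred per cycle.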

Fig. 7. (a) Equivalent circuit of the MOSFET-based SE turnstile. (b) Circuit symbol of the SE turnstile. It consists of a source terminal, a drain terminal, an input gate voltage terminal, a bias voltage terminal and two clock terminals. The drain is connected to a storage node (SN). (c) Repulsive clock voltage pulses for the turnstile operation. A transfer cycle is composed of four steps. (d) Schematic diagram to accurately inject electrons into the storage node (SN). The four steps correspond to the steps in (c). (e) Schematic diagram to accurately eject electrons from the SN.

The electron injection process is as follows. When FET2 is turned ON, the SEB is connected to the SN [step (iii)]. The capacitance of the SN is denoted by CSN and is much larger than the capacitance of the SEB; theoretically, when electrons enter the SN, the variation in voltage does not influence the behavioral model, so it can be neglected and, for simplicity, the voltage of the SN is assumed to be always 0 V.

When simulating with the HSPICE model of [55] and the model proposed in this paper, voltage variations appear when representing the stored electrons. This occurs because the SN is modeled as an ideal capacitor. Hence, when VG + VB < 0, the potential of the SN is lower than that of the SEB, so all SEs flow into the SN [49]. In this case, the number of electrons transferred (N) depends exclusively on VG.

Nevertheless, when VG + VB > 0, not all electrons flow out of the SEB; in this case, N is determined only by the potential difference between the source and the SN. Finally, after FET2 is turned OFF [step (iv)], the transfer cycle is completed. To sum up, when the SE turnstile injects electrons into the SN, N is given by

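$$N = 0, \quad \text{if } V_G + V_B < -V_{dd} \tag{4}$$

$$N = \left[ C_g (V_G + V_B)/e + 1/2 \right], \quad \text{if } -V_{dd} < V_G + V_B < 0 \tag{5}$$

$$N = \left[ C_g V_{dd}/e + 1/2 \right], \quad \text{if } V_G + V_B > 0 \tag{6}$$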

Correspondingly, single electrons can also be ejected from the SN, as shown in Fig. 7(e). In this case, the source of the SE turnstile is connected to VSS and VG is positive. The number of transferred electrons (N) in the SEB depends on the potential difference between the SEB and the SN. In steps (iii) and (iv), electrons flow out of the SEB to the source. When VG + VB < Vdd, the voltage of the SN is lower than that of the SEB, and all electrons flow out of the SEB, so N also depends on VG. However, when VG + VB > Vdd, the voltage of the SN is higher than that of the SEB, and not all electrons flow out of the SEB, i.e. N depends only on Vdd. Therefore, when the SE turnstile ejects electrons from the SN, N is given by

$$\text{if } V_G + V_B < 0 \tag{7}$$

$$\text{if } 0 < V_G + V_B < V_{dd} \tag{8}$$

$$\text{if } V_G + V_B > V_{dd} \tag{9}$$

Using (5)-(8), N can be directly controlled by the gate voltage; this feature permits the SE turnstile to be used in many applications, such as multi-valued memory cells and threshold logic circuits [52][53].
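To illustrate why a gate-controlled electron count lends itself to multi-valued storage, the sketch below computes the voltage of the storage node when it is treated as an ideal capacitor, as in the behavioral models discussed here. It uses CSN = 10 aF from Table I; the function name and the sign convention (injected electrons lower the node voltage from 0 V) are assumptions made for this example.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def storage_node_voltage(n_electrons, c_sn=10e-18):
    """Voltage of an ideal-capacitor SN holding n excess electrons.

    The empty node is taken as 0 V; each injected electron shifts the
    voltage by -e / C_SN (assumed sign convention).
    """
    return -n_electrons * E_CHARGE / c_sn


# Voltage levels for 0 to 3 stored electrons with C_SN = 10 aF (Table I).
for n in range(4):
    print(n, f"{storage_node_voltage(n) * 1e3:.2f} mV")
```

With CSN = 10 aF each electron shifts the node by roughly 16 mV, so distinct values of N map to distinct, evenly spaced voltage levels.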

1.2.2. Previous SE Turnstile Model

For circuit design, it is likely that the SE turnstile will be used together with CMOS devices (with compatible fabrication [49]); hence, an electrical model of its operation is highly desirable. Although the dynamic behavior of the SE transfer process is stochastic in nature, an HSPICE model for the MOSFET-based SE turnstile has been proposed in [55]. The SEB is modeled as an ideal capacitor with capacitance CSEB. G1 is a voltage-controlled source of I1 and is controlled by the output of module P1. The number of electrons stored in the SEB is controlled by Vg and they are stored in CSEB. With the rising edge of CLK2, the current mirror G2 (controlled by FET2) begins to charge CE. When the charge on CE is larger than e, then N*e/CE>VSEB and the comparator P2 resets the charge in CE to "0". Meanwhile, the output of comparator P2 transiently enables the voltage-controlled source G3, such that G3 produces a sharp current pulse as output, and an SE transfer event occurs. Using the current pulse for the SE transfer, G3 is transiently opened N times until all electrons flow to the drain. By considering the above device operations, the dynamic behavior of the SE turnstile has been modeled and evaluated using an HSPICE simulation environment.

Table I. Device Parameters for HSPICE Simulation

Group          Parameter        45 nm      32 nm
               Feature Size     45 nm      32 nm
               Temperature      26 K       26 K
Power Supply   Vdd              2.1 V      0.9 V
Power Supply   -Vdd             -2.1 V     -0.9 V
SE turnstile   WFET1, WFET2     45 nm      32 nm
SE turnstile   LFET1, LFET2     45 nm      32 nm
SE turnstile   Vg1, Vg2         2.1 V      0.9 V
SE turnstile   Vh               2.1 V      0.9 V
SE turnstile   Vl               0 V        0 V
               CSEB             0.5 aF     0.5 aF
               CSN              10 aF      10 aF
               C0               1 aF       1 aF

Fig. 8. Simulation of SE Turnstile using the model of [55] at 32nm.

However, the behavioral MOSFET-based HSPICE model of [55] does not correctly model the SE transfer process at nanometric scales; the model of [55] has been simulated using the parameters listed in Table I with the equivalent circuit of [56] to simulate its operation at a 32 nm feature size. As shown in Fig. 8, the current pulse at G2 appears when FET2 is turned ON by the voltage pulse of CLK2. However, there is no electron accumulation at the Drain (D); hence, the SE transfer operation fails when a 32 nm feature size is utilized for the MOSFETs, because the voltage at the Drain is also influenced by the voltage pulse of CLK1, although its voltage variation is limited to 50 mV. Compared with the simulated results of [55][56], the parameter values used in Table I have been slightly changed to account for the predictive technology model corresponding to the lower feature size of the MOSFETs [57]. The use of a predictive technology model (PTM) [58] at nanoscales makes it possible to take into account both scalability and technology phenomena (such as process variations) that may lead to new insights in both the simulation process and the SE transfer process. Moreover, when the operating frequency is increased, this transient feature may result in an erroneous modeling of the turnstile because the SE transfer process may be affected; hence, a reliable and robust model at circuit level is required if nanometric feature sizes are utilized.


1.2.3. Resistive Random-Access-Memory (RRAM)

The basic scheme of an RRAM employs a normally insulating material that can be made to conduct through a filament or conduction path; the filament or path is formed after applying a sufficiently high voltage. The conduction path can arise from different mechanisms, such as defect and metal migration. Once the filament is formed, it may be reset (broken, resulting in a high resistance) or set (formed again, resulting in a low resistance) by applying an appropriate voltage. Recent data suggest that multiple current paths are probably involved in this process [82]. Fig. 9 presents the integration of this device in a Back End Of Line (BEOL) CMOS process, integrated with a Front End Of Line (FEOL) transistor for standard bulk CMOS.

Fig. 9. Schematic diagram showing the MIM structure of a OxRRAM, with Set (as formation of a conductive filament) and Reset (as dissolution of the conductive filament).

The resistive switching phenomenon has been observed in several transition metal oxides such as TiO2, HfO2, CuxO, NiO, ZnO and some perovskite oxides. The OxRRAM consists of a Metal-Insulator-Metal (MIM) structure with a Transition Metal Oxide (TMO) sandwiched between the Top Electrode (TE) and the Bottom Electrode (BE) contacts (usually it is cohabitant with a "via" [84]). Fig. 10 shows the typical current-voltage plot of the resistance switching of an RRAM in voltage sweeping mode. For this unipolar RRAM, following an initial increase in the applied voltage, the High Resistance State (HRS) of the device changes to a Low Resistance State (LRS). This process is referred to as SET. The LRS changes to the HRS by a voltage sweep of the same polarity; a sudden current drop is observed for the RESET. In bipolar operation (Fig. 11), the RESET and SET processes are achieved by applying voltages of opposite polarity. It has been shown in [83] that the current flows uniformly in the HRS, while it is localized in the LRS. Also, the voltages for the RESET and SET do not depend on the thickness of the oxide; therefore, the corresponding switching effects are associated only with the homogeneous/inhomogeneous transition of the current distribution.
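The bipolar SET and RESET behavior described above can be sketched as a two-state element that switches when the applied voltage crosses a polarity-dependent threshold. This is a minimal behavioral illustration, not the HSPICE model used in this dissertation; the class name, the threshold voltages and the resistance values are all hypothetical, and the initial forming step is not modeled.

```python
class BipolarRRAM:
    """Two-state behavioral sketch of a bipolar RRAM (all values assumed)."""

    def __init__(self, v_set=1.0, v_reset=-1.0, r_lrs=10e3, r_hrs=1e6):
        self.v_set = v_set      # positive threshold for HRS -> LRS (SET)
        self.v_reset = v_reset  # negative threshold for LRS -> HRS (RESET)
        self.r_lrs = r_lrs
        self.r_hrs = r_hrs
        self.state = "HRS"

    def resistance(self):
        return self.r_lrs if self.state == "LRS" else self.r_hrs

    def apply(self, voltage):
        """Apply a voltage, update the state, and return the resulting current."""
        if voltage >= self.v_set:
            self.state = "LRS"   # SET: conductive filament formed
        elif voltage <= self.v_reset:
            self.state = "HRS"   # RESET: filament dissolved
        return voltage / self.resistance()


cell = BipolarRRAM()
for v in (0.2, 1.2, 0.2, -1.2, 0.2):
    current = cell.apply(v)
    print(f"V = {v:+.1f} V -> {cell.state}, I = {current:.2e} A")
```

A small read voltage between the two thresholds leaves the state unchanged, which is what makes the stored resistance usable as a non-volatile value.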

Fig. 10. Current-voltage plot of resistance switching for a unipolar RRAM in voltage sweeping mode.

Resistance switching is the basic physical phenomenon in the operation of a resistive random access memory (RRAM); this phenomenon has been studied for more than 40 years [15]. This negative resistance feature was first observed in the current-voltage (I-V) characteristics of five metal-oxide-metal (MOM) structures: SiOx, Al2O3, Ta2O5, ZrO2 and TiO2 [15]. The general occurrence of a negative resistance effect in a MOM structure, as well as a switching mechanism of NiO, was also reported in [15]; switching was mostly due to the formation and rupture of a metallic filament in a NiO thin film sandwiched between two electrodes. The switching speed of oxide materials and the dependence of the switching voltage on the thickness [15] were analyzed. The filament model was first proposed in 1967 [83]. RRAM-like MOM stacking structures were studied in [83] to support the filamentary nature of the non-volatile resistance switching effect. The dependence of the I-V characteristics on the electrode area (in the order of micrometers) and on the oxide thickness (in the order of nanometers) was also investigated in [83].


Fig. 11. Current-voltage plot of resistance switching for a bipolar RRAM in voltage sweeping mode.

1.2.4. MOSFET-Based Ternary CAM

A TCAM cell serves two basic functions: bit storage and bit comparison. Fig. 12 shows the core NOR-type and NAND-type TCAM cells [75][76]. The bit storage in both cases is an SRAM cell in which cross-coupled inverters implement the bit-storage nodes. The NMOS access transistors and bitlines that are used to read and write the stored bit are omitted in Fig. 12 to simplify the schematic diagram. The bit comparison, which is logically equivalent to an XOR of the stored bit and the searched bit, is implemented in a somewhat different fashion in the NOR and NAND cells. Moreover, for ternary cells, the stored "X" value represents the "don't care", allowing a so-called wildcard operation. A wildcard operation means that an "X" value stored in a cell results in a match regardless of the input bit. To avoid additional power supplies, ternary logic can be encoded in CMOS into two bits (cells) as shown in Fig. 12; the two stored nodes are not required to keep the complementary notation for consistency with the binary CAM cell.
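The search behavior described above can be summarized at the logic level: a cell mismatches only when its stored bit differs from the searched bit, and a stored "X" matches any input. The sketch below is a purely functional illustration and does not model the NOR or NAND circuit implementations or the two-bit encoding of Fig. 12; the function names and the string representation of stored words are assumptions made for this example.

```python
def cell_matches(stored, searched):
    """Ternary cell comparison: stored is '0', '1' or 'X'; searched is '0' or '1'."""
    if stored not in ("0", "1", "X") or searched not in ("0", "1"):
        raise ValueError("invalid ternary or binary symbol")
    # 'X' is the wildcard; otherwise a match is the complement of XOR.
    return stored == "X" or stored == searched


def word_matches(stored_word, search_word):
    """A word matches only if every cell in it matches."""
    if len(stored_word) != len(search_word):
        raise ValueError("word lengths differ")
    return all(cell_matches(s, q) for s, q in zip(stored_word, search_word))


# Hypothetical table contents and search key.
table = ["1X0", "10X", "111"]
key = "100"
print([word for word in table if word_matches(word, key)])
```

A single search key can therefore match several stored words, which is the property that makes ternary storage useful for wildcard lookups.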


Fig. 12. Ternary core cells: (a) NOR-type TCAM, (b) NAND-type TCAM, (c) ternary encoding for NOR cell, (d) ternary encoding for NAND cell [75].

1.2.5. Previous NVSRAMs

This section reviews two NVSRAM cells using RRAM for non-volatile storage. The 8T2R [78] and the 9T2R cells [80] adopt different processes and schemes to program the NVSRAM.

Fig. 13. RRAM-based 8T2R SRAM cell [78].

The 8T2R NVSRAM cell is designed using a complementary circuit (Fig. 13) [78]. Two RRAMs (RRAM1, RRAM2) are used per SRAM cell (M1-M6). The resistive elements are connected to the data nodes of the SRAM cell to store the logical information of the 6T cell during "Power-off". The resistive elements are part of two RRAMs and are accessed using two control transistors (M7, M8). According to the information stored at the data nodes (D, DN) of the 6T core, each RRAM is programmed either to a Low Resistance State (LRS) or a High Resistance State (HRS). When the power is turned on, the data is written back to the 6T SRAM core based on the states stored in the resistive elements. In [78], 22 nm LETI-FDSOI technology and HfO2-based OxRRAMs are used when designing the NVSRAM.
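The store and restore sequence described above can be sketched behaviorally as a volatile bit backed by two resistive states. This is an illustration of the data flow only, not a circuit model of the cell in [78]; the class name and the mapping of D = 1 to RRAM1 in the LRS (with RRAM2 holding the complementary state) are assumptions made for this example.

```python
class NVSRAMCell:
    """Behavioral sketch of a 6T core backed by two RRAMs (mapping assumed)."""

    def __init__(self):
        self.powered = True
        self.d = 0            # storage node D; DN is its complement
        self.rram1 = "HRS"
        self.rram2 = "HRS"

    def write(self, bit):
        if not self.powered:
            raise RuntimeError("cell is powered off")
        self.d = 1 if bit else 0

    def read(self):
        if not self.powered:
            raise RuntimeError("cell is powered off")
        return self.d

    def power_off(self):
        # Store: program the RRAMs to complementary states before power is removed.
        self.rram1 = "LRS" if self.d == 1 else "HRS"
        self.rram2 = "HRS" if self.d == 1 else "LRS"
        self.d = None          # the volatile core loses its state
        self.powered = False

    def power_on(self):
        # Restore: write the data back to the core from the resistive states.
        self.powered = True
        self.d = 1 if self.rram1 == "LRS" else 0


cell = NVSRAMCell()
cell.write(1)
cell.power_off()
cell.power_on()
print(cell.read())
```

The point of the sketch is that the volatile state is discarded at power-off and recovered solely from the two resistance states at power-on.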

Fig. 14. RRAM-based 9T2R SRAM cell [80].

A different 8T2R NVSRAM cell (referred to as Rnv8T) is proposed in [88] by incorporating two fast-write low-current RRAM devices; this cell adds two transistors to the original 6T SRAM core for a differential read operation. Its performance will be investigated in later sections and compared with other memory cell types.

Similarly, the 9T2R memory cell (Fig. 14) also takes advantage of two programmable RRAMs for non-volatile storage during the "Power-down" state [80]. In addition to a 6T SRAM core, an equalization transistor (M9) is introduced. The source and drain of M9 are separately connected to the storage nodes D and DN. The gates of the two access transistors (M5, M6) of the SRAM core and of the equalization transistor are tied together to the Restore signal. Unlike the 8T2R circuit, it utilizes only a single Bit Line (BL) and introduces an extra Source Line (SL) to program the two RRAMs. The intermediate nodes of the two 1T1R RRAM cells are connected to the original Bit Line of the SRAM core and its complement, respectively. The sources of M7 and M8 are tied together to the SL, while the other ends of the 1T1R cells are connected to BL. Each 1T1R cell has its own Word Line (WLL, WLR); therefore, the data is stored in the two RRAMs during the "Power-down" state and restored to the 6T SRAM core when the power turns on.


1.2.6. Previous DRAMs

In this section, the DRAM cells proposed in [102] and relevant techniques are reviewed from different perspectives, including the investigation of their designs and the simulation of their operations. The 3T1D and 4T DRAM cells of [102] are initially simulated using the parameters shown in Table XIX; simulation is performed at a 45 nm MOSFET feature size (as in [102]) and the supply voltage is set to 1 V in HSPICE (corresponding to the PTM model).

Fig. 15. 3T1D DRAM cell of [102].

The 3T1D DRAM cell [102] is shown in Fig. 15. Unlike a traditional 3T DRAM, this cell requires an additional NMOS transistor (corresponding to D1), whose source and drain are both connected to the Read Word Line (RWL). This gated diode configuration acts as a storage device and amplifier for the cell voltage. It raises the voltage at the gated diode to reduce the read time and achieve a better retention as well as a higher tolerance to process variations than a 3T cell at the same voltage and cell size [104][117]. Moreover, it is also capable of restoring the charge to the Storage Node (SN) after a read cycle [102].

Fig. 16. 4T DRAM cell of [102].


In addition to the 3T1D cell presented previously, a 4T DRAM cell has also been presented in [102] (Fig. 16); this cell does not suffer from the significant leakage encountered with the 3T1D cell. The gated diode configuration is replaced with an NMOS pass transistor to decouple the leakage paths in the 3T1D cell. However, this cell requires an extra signal (denoted by "Control") to refresh the stored data. The NMOS pass transistor thus avoids the sub-threshold leakage, which may also improve the retention time of the DRAM.

The 3T1D DRAM cell [102] (Fig. 15) has the potential to overcome some of the issues encountered for a 3T DRAM cell; this cell requires an additional NMOS transistor (i.e. D1), whose source and drain are connected to the Read Word Line (RWL). This gated diode configuration acts as a storage device and amplifier for the cell voltage; it raises the voltage at the gated diode to reduce the read time and achieve a better retention as well as a higher tolerance to process variations than a 3T cell at the same voltage supply and cell size [104][117]. Moreover, it is also capable of restoring the charge back to the Storage Node (SN) after a read cycle [102].

Fig. 17. Gain 3T DRAM cell of [121].

Unlike the 3T1D circuit, the so-called gain 3T DRAM cell proposed in [121] utilizes a boost technique to improve performance (Fig. 17) [121][122]. In this design, the NMOS transistors of a 3T DRAM cell are replaced with PMOS transistors; in a 3T PMOS DRAM gain cell, the drain of transistor T2 is connected to RWL to improve performance. It is well known [3][58] that the NMOS transistor has better performance than the PMOS, for example with respect to the on-resistance. The voltage signals used for the cell operations are presented in a later section.


1.3. Dissertation Outline

With the rapid development of microprocessors, designs have become increasingly large and complex, and a large portion of many digital designs is dedicated to the storage of data and program instructions. Since more than half of the transistors in current high-performance microprocessors are devoted to cache memories, power reduction and robust design are very important for these memories to store correct data. Typically, the SRAM and DRAM account for a significant percentage of the total power and area in many digital chips. Therefore, this study concentrates on both and deals with the relevant low power and robust design issues.

In Chapter 2, the Single-Electron (SE) transfer technique is investigated first because of its good compatibility with the CMOS fabrication process and the low power dissipation of its operations. A new behavioral HSPICE model is proposed for a typical SE device, the SE turnstile, to support the verification and characterization of the memory cell design. Extensive simulations show the good properties and performance of the proposed model. By incorporating the SE technique, a novel SRAM cell is designed with the SE turnstile and the Single-Electron-Tunneling Transistor (SET) to perform the normal "Write" and "Read" operations of a conventional volatile SRAM circuit. The proposed SRAM cell shows an effective reduction in power dissipation compared with the conventional 6T SRAM cell. In addition, a Ternary Content-Addressable-Memory (TCAM) cell is designed around the above SRAM core; it simplifies the circuit structure by using the novel property of the SET to realize the "Don't care" function, utilizing the single-electron transfer (SET) process as the basic principle of its operation. The proposed TCAM employs a novel cell consisting of both Single-Electron Transfer (SET) devices and MOSFET transistors. This cell processes ternary data for searching and matching by utilizing SETs and the mechanism of phase-gate shifting for the match outcome. The proposed cell is evaluated in terms of delay for all three operations as well as power dissipation.

In Chapter 3, instead of the SE transfer technique, a new Oxide Resistive RAM (RRAM) is studied and used to design a multiple-level memory cell (MLC) by utilizing its properties. The proposed MLC circuit is capable of multiple-value storage and its performance is characterized by simulation using a modified HSPICE model for nanoscale feature sizes. Because of the RRAM implemented in the memory cell, the various voltage levels are more sensitive to noise; relevant topics such as crosstalk noise evaluation are therefore also included in this study. After that, the conventional 6T SRAM cell is extended to a 7T1R non-volatile SRAM (NVSRAM) circuit with a 1T1R component comprising one NMOS transistor and one RRAM. A novel "Instant-on" operation scheme is used to achieve non-volatile storage, utilizing the incorporated RRAM. Moreover, a 9T1R NVSRAM cell is designed to further reduce power dissipation with the virtual ground technique. Finally, two hardened NVSRAM circuits are proposed to improve the tolerance to soft errors and are demonstrated by extensive evaluations.

In Chapter 4, the investigation concentrates on the improvement of DRAM cells. First, two voltage DRAMs are designed with novel design techniques to effectively enhance their tolerance to process variability. Their performance is also improved compared with previous designs, as shown by the relevant evaluations. Finally, non-volatile DRAM circuits are designed by adding a non-volatile storage component, and their correct operation is demonstrated by simulation.

In Chapter 5, considering the performance trade-offs of the SRAM and DRAM circuits, a hybrid memory cell is proposed that incorporates both static and dynamic components to achieve lower leakage power and higher density. Through HSPICE simulation and verification at circuit level, the proposed hybrid memory cells are shown to operate correctly and to have good properties. Furthermore, to implement the proposed hybrid memory circuits and demonstrate their applicability, a novel hybrid cache memory scheme is developed at architecture level with a custom compatible access protocol. The verification uses SPEC benchmark simulations and investigates the performance trade-off between various cache implementations.

Finally, Chapter 6 summarizes the conclusions of the above investigations, with the key findings of this dissertation. In addition, further thoughts and recommendations for future work are included.


2. SINGLE-ELECTRON TRANSFER

2.1. Introduction

CMOS technology is steadily reducing its feature size; scaling at 45 and 32 nm is being used to design advanced high performance electronic systems. However, the reduction in feature size is encountering significant problems as this technology moves quickly toward the end of the roadmap, as predicted by the Semiconductor Industry Association. Emerging technologies have been proposed to supersede the basic CMOS device, i.e. the MOSFET. Examples of emerging technologies are carbon nanotubes, quantum-dot cellular automata, single-electron devices and molecular/magnetic electronics [43][44][45][46]. However, CMOS has evolved over many years and a considerable financial investment has been made in the fabrication and manufacturing infrastructure of these devices. Technology changes must therefore be weighed against the economic viability and compatibility of the existing and emerging platforms. It is foreseen that new technologies will, at least initially, transition through so-called "hybrid" implementations in which CMOS will still be used [47].

Emerging technologies utilize novel physical phenomena in their operation; among them, the Coulomb Blockade (CB) can be utilized for memory devices with the potential of very high capacity and scalability. CB allows considerable electrical margins and low power consumption, because it can control the transfer of individual electrons with extremely small statistical fluctuations. Moreover, it can utilize devices of small dimension [48]. The so-called single-electron (SE) turnstile can be used to sequentially transfer electrons to a circuit node; the turnstile is formed by two voltage-controlled CBs using two one-dimensional field effect transistors (FETs).

The single-electron transfer transistor (SET) is yet another type of device that has received considerable interest in the technical literature; it is based on measuring either the current or the voltage across the transistor. This device, however, incurs a large delay because of its relatively large output resistance (typical resistance values are in excess of 100 kΩ) and a cable capacitance of at least 1 nF. Today's technology relies on utilizing radio-frequency SETs to reduce delay and improve sensitivity. These fabrication techniques utilize electron beam lithography and standard two-angle evaporation of aluminum with oxidation between the first and second layers for generating tunnel junctions [49]. It has therefore been advocated that a SET can be used as a readout device in applications requiring very sensitive charge meters. For sensing, the device is formed on the same silicon-on-insulator layer for counting the number of transferred electrons, utilizing processes compatible with those of traditional silicon integrated circuits [49]. These features can also be employed for memory designs with large storage capacity.

SET transistors and circuits are compatible in operation with CMOS and represent viable candidates for implementing hybrid designs [47][50]. One of the circuits that is fundamental for the adoption of a new technology is the memory cell. The design of a SRAM allows assessing the performance parameters that are characteristic of an entire technology platform, such as propagation delay and power consumption.

In this chapter, an HSPICE model of the Single-Electron Turnstile is proposed for the evaluation of nanometric circuits. Its operation is verified and its performance is characterized by extensive HSPICE simulation. By incorporating the SE turnstile and a SET transistor, a hybrid memory cell is proposed and demonstrated. After that, a Ternary Content-Addressable Memory (TCAM) cell is proposed that utilizes the phase-shift property of the dual-gate SET together with the proposed hybrid memory circuit. The proposed TCAM cell shows significant power and area savings.

2.2. Proposed Single-Electron Turnstile Model

Fig. 18 shows the proposed HSPICE model of a SE turnstile. The goal of this circuit model is to reduce the transient effects from the input signals (source) to the output node (drain) as occurring in [55]. Its robust nature refers to the ability to properly model the operational behavior of the turnstile at circuit level; moreover, as evidenced in later sections, stability is also achieved at nanometric feature sizes. The single-electron box (SEB) and the storage node (SN) are modeled as ideal capacitors with capacitance CSEB and CSN, respectively. An electron transfer event is accomplished as follows: when the two transistors Tn1 and Tn2 are turned ON, two separate reference voltage nodes (Vg1 and Vg2) are utilized. There is no direct connection between the transistors; when both the input signal and CLK1 are high, the output does not experience a voltage change. Therefore, the two transistors can separately control the two parts of this model.

Fig. 18. Proposed HSPICE Model of a SE Turnstile.

Consider initially the left part of the proposed model; when CLK1 and the input signal are both "1", the path of the current mirror F1 is enabled by turning ON the transistor Tn1. Meanwhile, an electron is transferred from the source node (S) into the SEB. Then, CSEB is charged and its voltage VSEB is fed back. A voltage-controlled current source G1 is then used to compare VSEB with Vg1. When VSEB > N*e/CSEB, the voltage-controlled current source G1 is turned OFF. Therefore, the electron that has been transferred into the SEB is stored, i.e. it is not returned to the input source node. This is a very stable and precise process. The voltage of the SEB does not decrease back to its initial value, so the equivalent charge stored in the SEB is exactly N*e. This structure eliminates the effect of CLK1 changing back to "0", which causes a negative current pulse in the controlling path in [55]. This ensures the robust nature of the operation in the proposed circuit-level model. Hence, as the number of single electrons stored in the SEB (N) is controlled by Vg1, N electrons are reliably stored in CSEB using CLK1. As for the right part of Fig. 18, a circuit similar to the left part is used in the model.

The right part operates as follows. With the rising edge of CLK2, the current mirror F2 generates a current pulse from the drain to the SEB; this indicates that an electron is transferred from the SEB to the drain. Similarly, a voltage-controlled current source G2 is used for the voltage of the drain node, i.e. the voltage of the storage node (SN). All single electrons are transferred into the SN using a series of pulses of CLK2.
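The two-part operation described above can be summarized by a charge-counting sketch. The Python below is only an illustrative abstraction and not the HSPICE model itself: the class name, method names and the idea of stepping the clocks as function calls are assumptions introduced here, while the capacitance defaults are the simulation values listed in Table II.

```python
from dataclasses import dataclass

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


@dataclass
class TurnstileSketch:
    """Hypothetical charge-counting abstraction of the two-part turnstile model."""
    c_seb: float = 0.5e-18   # SEB capacitance in farads (0.5 aF, Table II)
    c_sn: float = 10e-18     # SN capacitance in farads (10 aF, Table II)
    n_per_cycle: int = 1     # N, the number of electrons loaded per CLK1 pulse
    seb_electrons: int = 0
    sn_electrons: int = 0

    def clk1(self, input_high: bool) -> None:
        """Left part: load N electrons into an empty SEB if the input is high."""
        if input_high and self.seb_electrons == 0:
            self.seb_electrons = self.n_per_cycle

    def clk2(self) -> None:
        """Right part: move every electron held in the SEB into the SN."""
        self.sn_electrons += self.seb_electrons
        self.seb_electrons = 0

    def seb_cutoff_voltage(self) -> float:
        """Magnitude N*e/CSEB at which G1 is described as turning OFF."""
        return self.n_per_cycle * E_CHARGE / self.c_seb

    def sn_voltage(self) -> float:
        """SN voltage, negative because the stored charge is electrons."""
        return -self.sn_electrons * E_CHARGE / self.c_sn


turnstile = TurnstileSketch()
turnstile.clk1(input_high=True)   # electron enters the SEB
turnstile.clk2()                  # electron moves on to the SN
print(turnstile.sn_electrons, turnstile.sn_voltage())
```

In this sketch the two methods never touch each other's clock, which mirrors the separation of the left and right parts; all transient behavior handled by the current mirrors and current sources is deliberately left out.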

The proposed model still has behavioral features; the microscopic processes in steps (i) and (ii) are rather difficult to model in HSPICE. Physical modeling of the SE turnstile requires a detailed investigation of the electrical generation of the Coulomb blockade as well as the electron dynamics. This paper mostly deals with confirming the macroscopic circuit behavior; moreover, from a circuit perspective, steps (i) and (ii) are not of the utmost significance, as only the transfer of the electrons into the SN is relevant at application level. Electrons are transferred into the SN when FET2 is turned ON (corresponding to step (iii) of Fig. 7(c)), and this process is fully simulated by HSPICE, so the behavior of the SE turnstile is accurately modeled by the proposed circuit.

The proposed HSPICE model for the SE turnstile has been simulated and demonstrated to successfully address most of the concerns found in the previous model [55]; as shown in later sections, the evaluation of the proposed model at 45 and 32 nm shows its robust functionality, even in the presence of parameter variations. The significant difference in operational modes between these two models allows capturing the dynamic behavior of the SE turnstile at nanometric scales. In addition, the proposed HSPICE model has been successfully applied to a hybrid memory cell whose design has been assessed under variation of tunnel resistance and capacitance, further showing its robustness [59][60].

2.3. Demonstration and Analysis of the Proposed Model

The proposed SE turnstile model has been simulated at the feature sizes of 45 and 32 nm; Table I shows the values of the device parameters used in the simulation. Vh denotes the bias voltage for CLK1 and CLK2 in Fig. 18 to control the MOSFET switching operations. The temperature is set to 26 K, following the experimental data from the fabrication process of the SE turnstile in [49]. Moreover, for simulation purposes, additional capacitors (C1, C2 and C3) are introduced to connect a few nodes to ground; a value of 1 aF is selected for all these capacitors based on the features of the fabricated device of [49] and to mitigate their influence on the simulated operation of the proposed model. Hereafter in the evaluation, the operational frequency of the turnstile is set to 50 MHz; this value has been selected by considering the transfer rate for the correct operation of the SE turnstile [54].

Fig. 19. Simulation Timing Diagram of SE Turnstile at 45nm.

Fig. 20. Simulation Timing Diagram of SE Turnstile at 32nm.

Fig. 19 and Fig. 20 show the simulation results for the SE turnstile at 45 and 32 nm. Each of these figures shows the two clocks (CLK1 and CLK2) as well as the input (at the source) and the drain (SN). In both cases, the transfer of electrons into the SN occurs sequentially, as shown by the decrease of the voltage level of the SN. Differently from the model of [55], these figures show that the proposed approach correctly models the electron transfer operation and the voltage change resulting from the presence of electrons in the SN.

2.3.1. Simulation with Experimental Data

A multilevel memory using a SE turnstile has been experimentally demonstrated in [49]; the turnstile of [49] uses two one-dimensional field effect transistors (FETs) to sequentially transfer electrons into the Memory Node (MN). The MN is equivalent to the Storage Node (SN) as outlined in the previous discussion.

The fabricated device of [49] is patterned on a 30 nm-thick silicon-on-insulator layer, followed by a pattern-dependent-oxidation (PADOX) [49]. As per [49], the gate length and width of the MOSFETs in the fabricated turnstile are given by 30 and 80 nm, respectively. Similarly to the proposed HSPICE model, a small single-electron box (SEB) was electrically formed in the channel between the two FETs of the fabricated device at a temperature of 26 K. A value of 4 aF was used for the SN capacitance in [49] at input bias and supply voltages of 1 V for both.

Table II. Device Parameters for Comparison

Feature                          Experimental Data   Simulation Model

Temperature                      26 K                26 K
Power Supply    Vdd              1 V                 0.9 V
                -Vdd             1 V                 -0.9 V
SE turnstile    WFET1, WFET2     80 nm               32 nm
                LFET1, LFET2     30 nm               32 nm
                Vg1, Vg2         1 V                 0.9 V
                Vh               1 V                 0.9 V
                Vl               0 V                 0 V
                CSEB             0.5 aF              0.5 aF
                CSN              4 aF                10 aF

As part of the evaluation of the proposed model, the experimental parameters of the fabricated MOSFET-based SE turnstile (as reported in [49]) have been utilized as data. These parameters are used to compare the experimental and simulated turnstiles and are given in Table II. Fig. 21 shows the experimental and the simulated results using the values of [49]; it shows that the proposed model can capture the SE transfer process quite accurately in terms of timing performance as well as the SN voltage.

The difference in SN voltage levels between experimental and simulated results reflects a constant DC voltage that does not affect the correctness of the proposed model. This is mostly related to the difference in temperature and SN capacitance, i.e. the absolute value of the SN voltage increases when the SN capacitance is reduced (in the proposed model, an ideal capacitor is used for simulating its characteristics). However, the reduced MOSFET feature size (affecting mostly the gate width) and the lower supply voltage are also contributing factors. The supply voltage affects the output voltage swing, while the smaller feature size of the MOSFET affects the rate of the charging/discharging process. Also, the proposed HSPICE model utilizes the switching function of the MOSFET to model the electron transfer event, so it is influenced by the difference in these values.
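The dependence of the SN voltage on the SN capacitance follows from treating the SN as an ideal capacitor, so that each transferred electron shifts the node by e/CSN. The short Python sketch below evaluates this for the two CSN values of Table II; the function name is an assumption made for illustration, and parasitic capacitances are ignored.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def sn_step_mv(c_sn_af: float) -> float:
    """Magnitude of the SN voltage change per electron (mV) for CSN given in aF."""
    return E_CHARGE / (c_sn_af * 1e-18) * 1e3


for label, c_sn_af in (("experimental [49]", 4.0), ("simulation model", 10.0)):
    print(f"{label}: CSN = {c_sn_af} aF, step = {sn_step_mv(c_sn_af):.2f} mV per electron")
```

Under this ideal-capacitor assumption the 4 aF node moves by roughly 40 mV per electron and the 10 aF node by roughly 16 mV, which is consistent with the statement that a smaller SN capacitance gives a larger SN voltage magnitude.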

Fig. 21. Simulation results using the experimental parameters of [49].

2.3.2. Functional Operation

To evaluate the proposed model, the modes in the proposed circuit have been analyzed with respect to the functional operation of the SE turnstile. This evaluation is shown in Fig. 22 over two cycles for those pulses required for its operational modes, such as for the applied gate voltage control, the input for the amount of charge in the SEB and the transient current through G1 for a SE transfer.

Fig. 22. Simulation results of the operational modes of the proposed model at 32 nm. (a) Applied gate voltage control pulses. (b) Input pulse to start a SE transfer cycle. (c) Amount of charge in SEB as function of cycle time. (d) Transient current through G1 for a SE transfer.

Fig. 22(a) shows the gate voltage pulses for the turnstile at an operational frequency of 50 MHz; together with these two control pulses, an input pulse is then utilized to start the electron transfer (Fig. 22(b)). Fig. 22(c) shows the amount of charge in the SEB as function of time; this is consistent with the transfer event cycles. After a transfer event, the voltage of the SEB rises back to 0, corresponding to the number of electrons it holds. Therefore, as expected, simulation confirms that there is no electron stored in the SEB after the electron transfer. Using the same scale for the x-axis (time), Fig. 22(d) shows that when the control pulse on CLK1 is high, the voltage-controlled current source G1 is enabled and a transient current flows in the turnstile, which in most cases transfers an electron into the SEB. Similarly, this electron is transferred into the SN through the current source G2 when the control pulse on CLK2 next becomes high.

The operational modes of the proposed HSPICE model have been verified by simulation; based on a 50 MHz operational frequency for the SE turnstile, the input pulse is imposed on the source node with a longer time interval. So for example, in the first cycle of the turnstile operation in Fig. 22(d), the current does not change because the input signal does not vary, i.e. no electron is transferred through G1. As no electron is transferred, the SEB is still empty (Fig. 22(c)). Differently from the first cycle, when the input signal is high, the control pulse on CLK1 enables the current source G1; hence, an electron is transferred into the SEB during the second cycle. So, the changes in IG1 and in the number of electrons stored in the SEB (Fig. 22(c) and Fig. 22(d)) correctly simulate the SE transfer process. This shows that the proposed HSPICE model works correctly by utilizing the control pulses on CLK1 and CLK2, and that the operation of the turnstile is captured by a robust circuit-level simulation and assessment.

2.3.3. Comparison with [55]

A comparison is pursued with respect to [55] based on the above discussion of the operational modes of the proposed model. The simulated operation of [55] is shown in Fig. 23. Based on the simulation results the following two features are considered as part of the comparative discussion presented in this section.

Fig. 23. Simulation results for the model of [55] at 32 nm. (a) Applied gate voltage control pulses. (b) Input pulse to start a SE transfer process. (c) Amount of charge in SEB as function of time. (d) Transient current through G2 to represent a SE transfer.

An HSPICE model of the SE turnstile has been proposed in [55] with the main purpose of evaluating single-electron multiple-valued memories (SEMVs) [11]. The HSPICE model proposed in this paper is shown in Fig. 18. In the proposed model, the SEB and SN are both modeled as ideal capacitors with capacitance CSEB and CSN, respectively. Compared to the previous HSPICE model [55], the proposed model uses a symmetric design (made of two separate and almost identical parts) to robustly simulate the operation of the SE turnstile. When using HSPICE, the control voltage pulses imposed on CLK1 and CLK2 are accompanied by a current pulse, as encountered in the model of [55]; this transient phenomenon must be carefully handled at circuit level because it may affect the electron transfer process and lead to an erroneous operation. The proposed model relies on a nearly disjoint operation of its two parts to mitigate interactions, i.e. each part operates separately and independently in a voltage-based mode, so there is no negative effect or coupling between them (such as the current pulses of [56]), thus confirming the robust feature of the proposed model. The output signal also operates differently in [55] compared to the proposed model: [55] utilizes a current pulse to represent the SE transfer. When the charge on Ce is larger than e (i.e. Ne/Ce>VSEB), the comparator P2 sets the charge on Ce to "0". On a transient basis, the output of the comparator P2 enables the voltage-controlled current source G2, such that G2 outputs a sharp current pulse. These operations are shown in Fig. 23.

Differently from the current pulse for representing the electron transfer process, the output of the proposed model is given by the voltage in the SN (whose level corresponds to the number of transferred electrons). As evidenced by simulation, the proposed model operates correctly in terms of logic and timing. This is in agreement with the expected output (as shown in Fig. 22(c) for the discharge event of a SEB).

The model of [55] utilizes a current pulse to represent the transfer event, but current pulses also occur in the circuit [56]; this phenomenon causes a transient in the path and a pulse signal at the output node. The simulation results of the proposed SE turnstile model in Fig. 19 and Fig. 20 show more stable output signals (in which the negative voltage levels of the SN correspond to the number of electrons that have been moved sequentially through the turnstile). Again, the robust feature of the proposed model is in evidence.

2.4. Proposed Hybrid Memory Cell by Single-Electron Transfer

The objective of this paper is to present a novel design of a SRAM cell; this design utilizes a "hybrid" implementation in which SET-based devices are utilized together with CMOS circuitry to operate a SRAM cell. The proposed memory cell consists of a SE turnstile and a SET/MOS circuit, to respectively transfer and sense (i.e. count) electrons in the storage node; extensive simulation results using HSPICE compatible models [61][62][63] are reported at the nanoscale feature sizes of 45 and 32 nm. The proposed memory cell shows good performance metrics such as propagation delay and power consumption.

2.4.1. Circuit Elements and Previous Design

SE transfer requires the utilization of specific devices, such as pumps and turnstiles. SE pumps and turnstiles have been proposed and experimentally demonstrated by using multiple metal islands separated by metal-oxide tunnel junctions. The MOSFET-based SE turnstile is a promising device that can accurately transfer SEs at high speed even at room temperature [51][52][53]. It consists of a source terminal, a drain terminal, an input gate voltage terminal, a bias voltage terminal and two clock terminals [62][63]. An accurate transfer with an error rate of 10^-8 has been achieved by using a seven-tunnel-junction pump; however, the operating frequency of a SE turnstile is still limited to the order of MHz due to the resistance of the tunnel junctions. [6] has reported an operating frequency of 166 MHz. Although the operating frequency is still limited, modulated tunnel barriers have been proposed to improve performance as well as fabrication. Unless explicitly specified, an operating frequency of 166 MHz is assumed hereafter in this study.
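To give a sense of scale for these operating frequencies, the sketch below computes the clock period and the ideal average current of a turnstile, assuming exactly N electrons are transferred per clock cycle (I = N*e*f) and that no transfer errors occur. The function names are hypothetical; the frequencies are the 50 MHz and 166 MHz values used in this chapter.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def clock_period_ns(freq_hz: float) -> float:
    """Clock period in nanoseconds."""
    return 1e9 / freq_hz


def ideal_turnstile_current_pa(freq_hz: float, electrons_per_cycle: int = 1) -> float:
    """Ideal average current in pA, assuming error-free transfer every cycle."""
    return electrons_per_cycle * E_CHARGE * freq_hz * 1e12


for freq in (50e6, 166e6):
    print(f"{freq / 1e6:.0f} MHz: period = {clock_period_ns(freq):.2f} ns, "
          f"I = {ideal_turnstile_current_pa(freq):.2f} pA")
```

Even at 166 MHz the ideal single-electron current is only a few tens of picoamperes, which is one reason a sensitive readout circuit is needed downstream of the turnstile.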

Single-electron multiple-valued memories (SEMVs) have been proposed [11] for applications in which a novel simulated annealing algorithm (SAA) is utilized to design a single-electron circuit (using a MOSFET-based single-electron turnstile as basic element). The SAA circuit is made of a voltage-controlled single-electron random number generator and the SEMVs, with the design objectives of reducing power dissipation and interconnect delay. SAA finds solutions to exponential time complexity (NP-hard) problems, such as the Traveling Salesman Problem (TSP). SAA is often realized in software, thus limiting real-time application. However, its hardware realization is complex due to the large size of the solution space for the variation operations and the random number range. A transfer speed as high as 100 MHz is reported [11]. In the circuit of the SEMVs, a MOSFET-based electrometer has been used for the read operation. When the transistors are turned on, the difference in current represents the number of electrons stored in the storage node (SN). Therefore, the advantage of using the electrometer is that only one MOSFET is used per memory cell, at a relatively low power dissipation and small fabrication area in the circuit layout. This circuit element will be assessed in more detail by simulation in a later section.

In this study, a SET/MOS hybrid circuit is used to sense (measure) the presence of a SE in the memory cell. This is required during the read operation of the memory cell; the diagram of the SET/MOS hybrid circuit is shown in Fig. 24. Its input node is connected to the SN and its output is controlled by the Switch-FET. The circuit proposed in this paper consists of a dual-gate (input gate and phase-control gate) SE transistor (SET), a PMOS transistor (as a constant current source) and a NMOS transistor (as a cascode device). The circuit has a control voltage terminal Vctrl, a bias voltage Vgg, a phase-control voltage Vpg and a power supply Vdd.

Fig. 24. Schematic diagram of the SET/MOS hybrid circuit.

The SET transistor is a single-electron tunneling device whose operation follows the so-called orthodox theory of single-electronics [6]. Due to the high charging energy resulting from the small total capacitance around the island, the spontaneous junction tunneling is prohibited and the number of electrons in the island becomes discrete under the control of the gate voltage. Therefore, the drain current changes periodically with respect to the gate voltage, showing valleys and peaks respectively at integer and half-integer numbers of electrons in the island. A SET transistor [64][65] exploits the discrete nature of the tunnel current; in the impulse model, a SET junction is modeled by an ideal capacitor in parallel with a voltage-controlled current source. The capacitors Cg and Cpg are used for electrical coupling.
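As a rough numerical check of the charging-energy argument, the sketch below compares the single-electron charging energy e^2/(2*C) with the thermal energy k*T and computes the gate-voltage period e/Cg of the Coulomb oscillation. It assumes that the total island capacitance is simply the sum of the SET capacitances listed in Table III (Cs, Cd, CgSET and CpgSET), with no stray capacitance; the function names are illustrative only.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
K_B = 1.380649e-23          # Boltzmann constant in J/K

# Assumed island capacitance: Cs + Cd + CgSET + CpgSET from Table III, in farads.
C_TOTAL = (0.2 + 0.2 + 0.4 + 0.1) * 1e-18


def charging_energy_j(c_total_f: float) -> float:
    """Single-electron charging energy e^2 / (2*C) in joules."""
    return E_CHARGE ** 2 / (2.0 * c_total_f)


def thermal_energy_j(temperature_k: float) -> float:
    """Thermal energy k*T in joules."""
    return K_B * temperature_k


def gate_period_v(c_gate_f: float) -> float:
    """Gate-voltage period e/Cg of the Coulomb oscillation in volts."""
    return E_CHARGE / c_gate_f


ratio = charging_energy_j(C_TOTAL) / thermal_energy_j(300.0)
print(f"E_C / kT at 300 K = {ratio:.2f}")
print(f"Coulomb oscillation period = {gate_period_v(0.4e-18):.3f} V")
```

With these assumed values the charging energy exceeds the thermal energy at 300 K by a factor of a few, and one oscillation period corresponds to about 0.4 V on the input gate.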

As the SET and the NMOS transistor are connected in series in Fig. 24, this unit is referred to as the serial delay unit. In this paper, the tunneling resistances (Rd, Rs) and capacitances (Cd, Cs) on the drain and source sides of the SET are equal [53]. The structure of the serial delay unit was initially proposed in [53][66]. The experimental results show that this circuit can perform as a universal gate; in the SRAM circuit for this application, the parameters of the device are specified differently to perform as an inverter. The PMOS transistor in the serial delay unit usually operates in its saturation region as a constant current (CC) source. The output current is controlled by its gate voltage Vctrl. The NMOS transistor with a fixed gate voltage Vgg is used to keep the SET drain voltage approximately constant and to generate a large output.

This circuit is utilized for designing a memory cell. The SET transistor has two gates: an input gate and a phase control gate. The input gate receives the output voltage signal from the SN in which the electrons are stored. The received signal induces a Coulomb oscillation of the drain-source current of the SET transistor. The phase control gate (Vpg) is used to adjust the inverting characteristic, but it is generally connected to ground. The operating principles of the serial delay unit are specified as follows. The output current Io of the PMOS transistor is set at the mid-point between the peak and bottom values of the Coulomb oscillating drain-source current IdsSET of the SET. When the number of single electrons in the SN is small, the output voltage signal from the SN is low, so the output voltage of the serial delay unit is high, because the inverting SET operates in the nearly cut-off region and the drain-source current of the SET is also low. When the output voltage signal from the SN gradually increases and IdsSET increases to a value higher than Io, the output voltage drops sharply to a low value.
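The inverting decision described above can be illustrated with a toy numerical sketch. It is not the compact SET model used in the simulations: the sine-squared shape of the Coulomb oscillation, the normalized current levels, the output voltage levels and all function names are assumptions chosen only to show how comparing IdsSET against the mid-point current Io produces a high or low output.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def set_current(v_in: float, c_gate: float = 0.4e-18,
                i_valley: float = 0.0, i_peak: float = 1.0) -> float:
    """Toy Coulomb oscillation: valleys at integer and peaks at half-integer
    values of the gate-induced charge Cg*Vin/e (normalized current units)."""
    n_induced = c_gate * v_in / E_CHARGE
    return i_valley + (i_peak - i_valley) * math.sin(math.pi * n_induced) ** 2


def unit_output(v_in: float, v_high: float = 0.9, v_low: float = 0.0,
                i_valley: float = 0.0, i_peak: float = 1.0) -> float:
    """Output is high while IdsSET stays below the mid-point current Io, else low."""
    i_o = 0.5 * (i_valley + i_peak)  # Io set at the mid-point, as in the text
    i_ds = set_current(v_in, i_valley=i_valley, i_peak=i_peak)
    return v_high if i_ds < i_o else v_low


for v_in in (0.0, 0.05, 0.2):
    print(f"Vin = {v_in:.2f} V -> Vout = {unit_output(v_in):.1f} V")
```

Near zero input the toy SET current sits in a valley and the output stays high; near half a period (about 0.2 V for Cg = 0.4 aF) the current exceeds Io and the output switches low.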

Therefore, the proposed hybrid circuit has three advantages over the SEMVs of [11]: (1) combined with the MOS transistors, the hybrid circuit (especially the SET transistor) can detect the amount of charge at the SE level with a load capability; (2) due to the small gate capacitance of the SET (that electrically couples the SET to the SN), the total amount of transferred charge during operation is significantly smaller than for conventional CMOS devices; (3) by controlling the voltage (Vctrl) of the PMOS, the hybrid circuit can supply a relatively stable output signal to the memory during a read operation, i.e. to the Read Bit Line (RBL).

Table III. Device Parameters for SRAM HSPICE Simulation

Feature Size                       45 nm       32 nm

Temperature                        300 K       300 K
Power Supply      Vdd              2.1 V       0.9 V
                  -Vdd             -2.1 V      -0.9 V
SE turnstile      WFET1, WFET2     45 nm       32 nm
                  LFET1, LFET2     45 nm       32 nm
                  Vg1, Vg2         2.1 V       0.9 V
                  Vh               2.1 V       0.9 V
                  Vl               -2.1 V      -0.9 V
                  CSEB             0.5 aF      0.5 aF
                  CSN              10 aF       10 aF
SET               Cs, Cd           0.2 aF      0.2 aF
                  CgSET            0.4 aF      0.4 aF
                  CpgSET           0.1 aF      0.1 aF
                  Rs, Rd           100 kΩ      100 kΩ
                  Vpg              0 V         0 V
Hybrid Circuit    Vctrl            -2.1 V      -0.9 V
                  Vgg              2.1 V       0.9 V

2.4.2. Proposed SRAM Cell

The circuit diagram of the proposed SRAM cell is shown in Fig. 25; it consists of a SE turnstile, a SET/MOS hybrid circuit (as a charge-voltage converter), a storage node (SN) and a reset MOSFET. The Read Word Line (RWL) and Write Word Line (WWL) are used for the two basic operations of the cell. The Set Line (SETL) is used to reset the stored charge (number of SEs) in the SN. The Write Bit Line (WBL) is the input node and the Read Bit Line (RBL) is the output node of the cell. A single electron is initially assumed in the transfer process. The following operational cycles are analyzed with respect to the simulation result of the proposed memory cell using the parameters in Table III at 45 nm feature size (as in Fig. 26).

Fig. 25. Proposed SRAM using a MOSFET-based SE turnstile.

• When RWL is "0", independently of the value of RBL, no read operation is performed in the SRAM. The cell is said to be in a standby state.

• The "reset" operation is achieved by turning on the reset-FET (Tn5) and Tn2. For example, when the Write Word Line (WWL) and the Set Line (SETL) are "1" simultaneously, Tn2 and Tn5 are both turned on at the same time. All electrons stored in the single-electron box (SEB) and SN flow to ground, i.e. the memory cell is reset to "0".

• The Write "0" operation occurs in Cycle 2; initially, the Write Bit Line (WBL) is "0" when the Read Word Line (RWL) is "1" (to turn the transistor Tn1 on). However, as the information to be written is "0", there is no electron to be transferred into the single-electron box (SEB). Subsequently in Cycle 2, WWL is "1" and RWL is "0"; therefore, Tn1 is turned off and Tn2 is turned on. So, a "0" is written into the SN, i.e. there is no voltage change in the RBL.

• The Read "0" operation occurs at Cycle 3. Following Cycle 2, the information (i.e. "0") is in the SN; the charge-voltage converter (consisting of Tn3, Tp1 and SET1) converts the stored charge into the corresponding voltage. As the SN holds a "0", there is no electron in the SN. In Cycle 3, RWL is "1", so transistor Tn4 is turned on. The "0" is then read through the Read Bit Line (RBL).

• The Write "1" operation proceeds as follows. In Cycle 1, RWL and WBL are "1" simultaneously (turning on the transistor Tn1). In this cycle, as WBL is "1", a single electron is transferred into the SEB. Following this event, WWL is "1" and Tn1 is turned off. Moreover, Tn2 is turned on when RWL is "0". In this process, the SE is transferred into the SN, i.e. the Write "1" operation takes place.

• The process to read a "1" is as follows. As a "1" is stored in the memory, there is a single electron stored in the SN. The converter of the SET/MOS hybrid circuit senses the stored SE in the SN by converting it into a high voltage level for Tn4 to be on. In Cycle 2, RWL is "1" and Tn4 is turned on. So, the "1" is read and a voltage change is observed through the RBL.
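The operations listed above can be collected into a small behavioral sketch. This is an operation-level abstraction written for illustration, not the simulated circuit: the class and method names are hypothetical, transistor-level timing is omitted, and the cell is assumed to store at most the electrons delivered by the write operations.

```python
from dataclasses import dataclass


@dataclass
class HybridCellSketch:
    """Hypothetical electron-counting view of the proposed SRAM cell."""
    seb_electrons: int = 0  # electrons held in the single-electron box
    sn_electrons: int = 0   # electrons held in the storage node

    def reset(self) -> None:
        """WWL and SETL both "1": Tn2 and Tn5 on, all electrons flow to ground."""
        self.seb_electrons = 0
        self.sn_electrons = 0

    def write(self, wbl: int) -> None:
        """Two-phase write: load the SEB from WBL, then move its content to the SN."""
        # Phase 1 (Tn1 on): an electron enters the SEB only if WBL is "1".
        self.seb_electrons = 1 if wbl else 0
        # Phase 2 (Tn1 off, Tn2 on): the SEB content is transferred into the SN.
        self.sn_electrons += self.seb_electrons
        self.seb_electrons = 0

    def read(self) -> int:
        """RWL "1" (Tn4 on): the converter reports whether the SN holds an electron."""
        return 1 if self.sn_electrons > 0 else 0


cell = HybridCellSketch()
cell.write(1)
print(cell.read())
cell.reset()
print(cell.read())
```

In this sketch a write of "0" leaves the SN unchanged, so a previously stored "1" is cleared only by the reset operation; this follows the separate reset step in the list above and is a simplification rather than a statement about the full circuit.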

Fig. 26. HSPICE timing diagram of proposed SRAM cell at 45nm node.

The proposed SRAM cell has been simulated using HSPICE by combining the models of its different elements (such as the SET/MOS hybrid circuit and the MOSFET-based SE turnstile) [61][62] as described in a previous section.

For simulating single-electron devices, few tools are available as reported in the technical literature. Notable among these are SIMON [67] and MOSES [68]. However, these simulation tools cannot include many electronic components, such as transistors. So the SET circuit elements have been simulated by HSPICE models. These models are either phenomenological in nature, or simplifications of the orthodox theory of single-electron tunneling, or specifically tailored extensions of HSPICE. The compact SPICE model based on Lientschnig's SET [61] has been used in this paper to describe the behavior of the SET; its accuracy has been verified by both the Monte Carlo simulator SIMON [67] and experimental results [66].

For simulating the operation of the SE turnstile, [62] has presented a novel HSPICE circuit model applicable at nanometric feature sizes. This SE model is used in this paper and consists of two nearly identical parts whose operation is independent of each other; this feature permits accurate modeling of the sequential transfer of electrons through the turnstile as a change of voltage in the storage node. It therefore avoids the transient (current-based) nature of a previous model. The model has been simulated and results have shown that it can correctly operate at 32 and 45 nm with high stability [62]. With the circuit architecture and functions discussed previously, the SE turnstile model of [62] has been slightly modified to change the voltage levels to negative values. This is required because they correspond to the charges of the SEs and the SET/MOS hybrid circuit operates as an inverter. It can read the SE(s) stored in the SN and its output voltage levels correspond to the number of electrons stored in the SN.

Table III lists the device-level simulation parameters and the values used in the HSPICE simulation of the proposed memory cell. MOS transistors with 45 and 32 nm feature sizes are used. Similar to Fig. 26, the proposed SRAM cell is also demonstrated at 32 nm. As discussed previously for the function of the SET/MOS hybrid circuit, simulation has shown that the voltage swings can be controlled accurately by Vgg, while Vctrl is utilized to modify the output current of RBL.

The simulation plots show that the voltage can be gradually increased when RWL and WBL are both "1". Then with WWL set to "1", the single electron stored in the SEB is transferred into the SN. This process means that the "1" is written in the SN. Each pulse at the input source node can transfer a SE when the paths controlled by RWL and WWL are on. So the increased voltage levels only appear at this time. Finally, for both simulated cases and by taking into account the transfer error rate of the SE turnstile [54], the frequency of operation for the SRAM cell is given by 166 MHz [54].

2.5. Evaluation and Analysis of Proposed SRAM Cell

In this section, a detailed evaluation of different circuit elements (and related performance features) of the proposed memory cell is pursued using simulation by HSPICE.

Table IV. Device Parameters for SRAM Simulation of Read Operation Comparison

Feature Size                                32 nm

Temperature                                 300 K
Power Supply                 Vdd            0.9 V
                             -Vdd           -0.9 V
SET                          Cs, Cd         0.2 aF
                             CgSET          0.4 aF
                             CpgSET         0.1 aF
                             Rs, Rd         100 kΩ
                             Vpg            0 V
Hybrid Circuit               Vctrl          -0.9 V
                             Vgg            0.9 V
MOSFET-based Electrometer    WFET           32 nm
                             LFET           32 nm

2.5.1. SET/MOS Hybrid Circuit Evaluation

Fig. 27. Schematic diagram of the single-electron multiple-valued memory (SEMV) circuit [11].

A detailed comparison of two possible implementations of a circuit element and related mechanisms (the SET/MOS hybrid circuit and the MOSFET-based electrometer [11]) for the read operation is presented. For simulation, a negative voltage pulse is imposed on the Storage Node (SN) to represent the stored (transferred) single electron at 32 nm feature size and 166 MHz operating frequency. Table IV shows all other simulation parameters for comparing the read mechanism.

Fig. 28. Simulation of MOSFET-based electrometer at 32nm node.

Consider first the MOSFET-based electrometer [11] of Fig. 27; the simulation results for the read operation are shown in Fig. 28. When a voltage pulse is provided at the input node (the SN), the output current through the MOSFET shows a ringing behavior (on a scale of nA) during the transitions. This can be explained as follows. There are three operational modes for an n-channel MOSFET. When VGS < Vth (where Vth is the threshold voltage of the device), there is no conduction between drain and source. As per the basic threshold model, the transistor is working in the cutoff mode, so the current between drain and source should ideally be zero (as the transistor is a turned-off switch).

However, there is a weak-inversion current (sometimes also referred to as the subthreshold leakage) due to the presence of a Boltzmann distribution of electron energies. This phenomenon leads some of the more energetic electrons at the source to enter the channel and flow to the drain, resulting in a subthreshold current. In the weak inversion mode, this current varies exponentially with the gate-to-source bias VGS [69]. Therefore, although VGS is below the threshold voltage, a small current still flows between drain and source.
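The exponential dependence mentioned above is commonly written, to first order, as I = I0 * exp((VGS - Vth) / (n * kT/q)). The sketch below evaluates this textbook expression; the reference current I0, the slope factor n and the threshold voltage are hypothetical placeholders and are not parameters of the simulated electrometer.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant in J/K
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def thermal_voltage(temperature_k: float) -> float:
    """Thermal voltage kT/q in volts."""
    return K_B * temperature_k / E_CHARGE


def subthreshold_current(v_gs: float, v_th: float, i0: float,
                         n: float = 1.5, temperature_k: float = 300.0) -> float:
    """First-order weak-inversion current; i0, n and v_th are assumed values."""
    return i0 * math.exp((v_gs - v_th) / (n * thermal_voltage(temperature_k)))


def swing_mv_per_decade(n: float = 1.5, temperature_k: float = 300.0) -> float:
    """Gate voltage change (mV) needed to change the current by a factor of 10."""
    return n * thermal_voltage(temperature_k) * math.log(10.0) * 1e3


print(f"swing (n = 1, 300 K) = {swing_mv_per_decade(n=1.0):.1f} mV/decade")
print(subthreshold_current(v_gs=0.2, v_th=0.3, i0=1e-7))
```

Because of this steep dependence, small voltage changes at the gate translate into large relative changes of a very small current, which is consistent with the nA-scale variations observed at the electrometer output.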

At the beginning of the up-transition, the current changes severely and many pulses appear prior to reaching a stable level. Furthermore, this ringing phenomenon occurs also during the second (downward) transition; these current variations have small values (in the nA range), but they can be still detected. The read operation is effectively achieved by turning on and sensing the current of the MOSFET-based electrometer in the SEMV of Fig. 27, so the output current for the read operation is affected (Fig. 28). The electrometer is capacitively coupled to the SN, and its current changes discretely with the number of electrons in the storage node [70].

Fig. 29. Simulation of SET/MOS hybrid circuit at 32nm node.


Fig. 29 shows the simulation results for the read operation in the proposed SET/MOS hybrid circuit. As in the previous simulation, the SN is directly driven by an input voltage signal; furthermore, the Switch-FET is kept on to mitigate its effect on the output signal. As shown in Fig. 29, the output is stable once a voltage change occurs at the input. This is significantly better than for the MOSFET-based electrometer, and it allows the state of the SN and the presence of an electron to be detected on a voltage basis. The stable high value of the output voltage signal in Fig. 29 shows that the hybrid circuit can monitor the stored (transferred) electrons, i.e. for the read operation, the circuit can correctly sense the negative voltage signal and change it into a positive value (as final output).

These two circuits (Fig. 24 and Fig. 27) have totally different mechanisms for the same output sensing function. In addition to the ringing behavior, the MOSFET-based electrometer of [11] has an output current variation that is relatively small to monitor. The proposed hybrid circuit has a rather sharp voltage signal that is compatible with digital circuit operation. Moreover, memory operations will benefit from a voltage level signal, thus avoiding the severe change in output current experienced by [11] and the likely damage due to stress in the sensing circuit. Therefore, simulation has confirmed that for a SE based operation, the proposed SET/MOS hybrid circuit offers substantial advantages compared with [11] and its use is viable for the memory operations of the proposed cell.

The proposed SET/MOS hybrid circuit is used to sense and measure the number of SEs stored in the memory cell. The circuit performs as a universal gate in which the PMOS and NMOS transistors have specific functions. The PMOS transistor is used as a constant current source by operating in the saturation region; the NMOS transistor generates a large output by applying an appropriate voltage value Vgg. The SET transistor is an important element of the proposed SET/MOS hybrid circuit; it is used to sense the electrons stored in the SN and convert the output signal to a voltage level (as corresponding to the number of stored electrons). Similar to the SE turnstile [62], the SET is formed by the tunneling junctions originating from the phenomenon of Coulomb Blockade (CB). This is advantageous for designing memory cells with low power consumption and large electrical margins, because it is possible to control the transfer of individual electrons with extremely small statistical fluctuations (even with small device dimensions) [49]. So, the internal parameters of the SET are very important for the performance and robustness of the proposed memory cell.

2.5.2. Circuit Performance

Performance evaluation and related metrics are first presented and discussed at circuit level. Table V shows the figures of merit for the proposed SRAM cell as simulated using HSPICE at the two feature sizes of 45 and 32 nm. The number of electrons transferred by the SE turnstile (N) depends on the value of Cg, i.e. the capacitance between the gates and the SEB. In this paper, it is assumed that Cg/C0=N, where C0=1 aF is a unit capacitance and N denotes an integer that is a device parameter in the simulations. So, the voltage level can be controlled by utilizing different values of the capacitance Cg. In this case, the capacitance of the SE turnstile model has been set to C0, i.e. the number of transferred electrons is one per cycle.

Table V. HSPICE Simulation Results for SRAM Cell

Feature Size                      45 nm        32 nm
Temperature (K)                   300 K        300 K
Average Write Delay (ns)          9.04 ns      8.73 ns
Average Read Delay (ns)           3.06 ns      2.96 ns
Vdd (V)                           2.1 V        0.9 V
Average Power Dissipation (W)     1.38E-6 W    1.27E-6 W
Leakage Current (A)               1.27E-6 A    5.25E-7 A

Consider the delay for the two basic memory operations, as reported in Table V; both delays are reduced as the feature size of the MOSFETs decreases. However, the delay is related to the operating frequency of the proposed SRAM cell through two limiting factors. The first limiting factor is the transfer error rate of the SE turnstile. The "write" operation is accomplished through the SE turnstile, so its operating frequency has a significant effect on the write delay. The transfer accuracy deteriorates if the fall time tfall of the clock signal is too small [54]; hence, the write delay of the proposed SRAM cell is higher than the read delay. Furthermore, the mechanism of the "write" operation in the proposed memory cell is different from that of a conventional (CMOS-based) SRAM. In the proposed cell, once an operational cycle is completed, a pulse must be generated to turn on Tn5 (SET-FET) to reset the SN back to "0". So, there is no significant and direct improvement in delay for the write operation of "0"; the write delay is mostly affected by the SE transfer speed, not by the characteristics of the MOSFET (as occurs in a conventional memory circuit).

The relatively slow speed of a SE-based memory design avoids transfer errors in charge-state circuits. For a MOSFET-based turnstile, transfer errors can be categorized into thermal and dynamic errors; at a lower operating speed, thermal errors dominate and dynamic errors can be neglected [70]. This failure mode is inherent to the SE transfer process and can only be mitigated by using a suitable frequency. As discussed in [70], the operating frequency is usually set to 166 MHz.

The second limiting factor is the delay incurred in the SET/MOS hybrid circuit (as charge-voltage converter) for the read operation. As the SET can sense the number of SEs stored in the SN, the circuit converts them into a voltage level; a portion of the read delay is related to the highly reactive response of the SET. The simulation results show that the read delay td is around 3 ns. In the worst case, the delay of the SET/MOS hybrid circuit is given by td = Ctot Vo / Io, where td is the delay time, Ctot is the total load and inherent capacitance and Io is the bias current. As the proposed SRAM requires two repulsive clock signals, the maximum operating frequency is fmax = 1/(2 td), which is approximately 150 MHz using the parameters of Table III.
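As a rough numerical check of the relation fmax = 1/(2 td), the following sketch evaluates it for the simulated read delays of Table V. The function names are illustrative only and are not part of the HSPICE setup. Note that the 150 MHz figure quoted above is obtained from the worst-case td computed with the parameters of Table III, which would correspond to a td of roughly 3.3 ns and not to the simulated average delays.

```python
def hybrid_delay(c_tot_farads: float, v_out_volts: float, i_bias_amps: float) -> float:
    """Worst-case delay of the hybrid circuit, td = Ctot * Vo / Io."""
    if i_bias_amps <= 0:
        raise ValueError("bias current must be positive")
    return c_tot_farads * v_out_volts / i_bias_amps


def max_operating_frequency(t_d_seconds: float) -> float:
    """Maximum frequency with two clock signals per cycle, fmax = 1 / (2 * td)."""
    if t_d_seconds <= 0:
        raise ValueError("delay must be positive")
    return 1.0 / (2.0 * t_d_seconds)


# Average read delays taken from Table V (simulated values).
READ_DELAY_S = {"45 nm": 3.06e-9, "32 nm": 2.96e-9}

for node, t_d in READ_DELAY_S.items():
    f_max_mhz = max_operating_frequency(t_d) / 1e6
    print(f"{node}: td = {t_d * 1e9:.2f} ns -> fmax ~ {f_max_mhz:.1f} MHz")

# Delay implied by the 150 MHz figure quoted in the text.
implied_td_ns = 1.0 / (2.0 * 150e6) * 1e9
print(f"150 MHz corresponds to td ~ {implied_td_ns:.2f} ns")
```

Both simulated delays give a bound close to the 166 MHz operating frequency used for the SE turnstile.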

The power dissipation of the proposed SRAM cell is very low, because most of its operations are achieved by transferring SEs. In theory, the power dissipation is given by W = Ntot e VSS f, where Ntot is the total number of electrons transferred by the SE turnstile in one operating cycle, e is the electron charge and f is the operating frequency. Simulation results show that the average power dissipation (that may include leakage and repulsive clock components) is around 1300 nW, slightly decreasing with feature size. This is expected because the SE turnstile has very small power consumption and most of its power is used to transfer each single electron. Also, the total power dissipation of the proposed SRAM is mostly due to the SET/MOS hybrid circuit; in addition to a SET transistor, there are only three transistors. Tn3 and Tp1 stabilize the output current and modify the output voltage swing. The results of Table V include the power dissipation due to leakage as well as the power dissipated by the electronic components of the HSPICE model of the SE turnstile [62], hence the larger than expected values. Also, as the proposed SRAM cell requires two repulsive clock signals, it is anticipated that extra power will be dissipated by them as determined by the interconnect capacitances and frequency [11]. However, its value is smaller than the power dissipation of the SET/MOS hybrid circuit, but larger than that of the SE turnstile. The reduction in power dissipation is further accomplished by scaling the feature size from 45 to 32 nm, showing only a modest improvement in this figure of merit.
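To illustrate why the simulated values are described as larger than expected, the theoretical expression W = Ntot e VSS f can be evaluated directly. This is only a sketch: the value of VSS is not listed in this section, so its magnitude is assumed here to equal the 32 nm supply voltage (0.9 V), with one electron transferred per cycle at 166 MHz.

```python
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def se_transfer_power(n_tot: int, v_ss_volts: float, freq_hz: float) -> float:
    """Theoretical SE transfer power, W = Ntot * e * VSS * f."""
    return n_tot * ELECTRON_CHARGE_C * v_ss_volts * freq_hz


# Assumptions (not stated in the text): |VSS| = 0.9 V, one electron per cycle.
theoretical_w = se_transfer_power(n_tot=1, v_ss_volts=0.9, freq_hz=166e6)
simulated_w = 1.27e-6  # average power dissipation at 32 nm from Table V

print(f"theoretical SE transfer power: {theoretical_w * 1e12:.1f} pW")
print(f"simulated average power:       {simulated_w * 1e9:.0f} nW")
print(f"ratio simulated / theoretical: {simulated_w / theoretical_w:.1e}")
```

Under these assumptions the SE transfer itself accounts for only a few tens of pW, which is consistent with the statement that the hybrid circuit, leakage and clock components dominate the simulated figure.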

The results show that the leakage currents are in the range of 0.5 μA to 1.5 μA. These results follow from the utilization of the SE transfer mechanism in the proposed cell. The SE transfer process effectively replaces the conventional write mechanism, thus it may also lead to a relatively large decrease of the active leakage component. Hence, the leakage current of the proposed SRAM occurs mostly in the standby mode of the SET/MOS hybrid circuit. Due to the biasing voltages (Vdd, Vgg and Vctrl), there exists a transistor path in the circuit from the power supply to ground, even when there is no input signal. Along this path, Tn3 and Tp1 are turned on during each operation. The use of the PMOS Tp1 is to generate a stable current to the output node. So when the SRAM is in standby, this leakage current cannot be mitigated.

2.5.3. Area

The proposed memory consists of the SE turnstile and the SET/MOS hybrid circuit as experimentally demonstrated and fabricated in [49]. [49] has shown that the turnstile and the coupling SET (so exclusive of the three MOSFETs of the hybrid circuit and the Reset-FET Tn5) can be patterned on a 30 nm thick silicon-on-insulator layer using pattern-dependent-oxidation (PADOX). By considering requirements such as gate length, spacing between Tn1 and Tn2 (as shown in Fig. 25) and the transistors in the hybrid and reset circuits, the total area of the proposed memory cell is approximately 850 λ². A conventional CMOS (6T) SRAM has an area of 1092 λ² [3]; so, the proposed memory cell requires less area, hence accomplishing a higher integration density.
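The relative area saving implied by these two figures can be checked with a short calculation; the helper name below is illustrative only.

```python
def relative_reduction(reference: float, proposed: float) -> float:
    """Fractional reduction of `proposed` with respect to `reference`."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (reference - proposed) / reference


AREA_6T_SRAM = 1092.0   # conventional 6T CMOS SRAM, in units of lambda^2 [3]
AREA_PROPOSED = 850.0   # proposed SE-based cell, in units of lambda^2

saving = relative_reduction(AREA_6T_SRAM, AREA_PROPOSED)
print(f"area saving: {saving * 100:.1f}%")
```

The result is about 22%, in agreement with the figure quoted in the conclusion of this chapter.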


2.5.4. Static Noise Margin

The so-called Static Noise Margin (SNM) must be considered for assessing the stability and robustness of a memory cell. The SNM is defined based on the input to output voltage transfer characteristics (VTC) [72] to characterize the robustness to noise. As the proposed cell utilizes the basic mechanism of electron transfer using single-electron devices, each memory operation must be treated separately.

Fig. 30. Static inverting transfer characteristics of the SE turnstile circuit at 32nm.

Consider first the "write" operation using the SE turnstile, whose operation is dependent on the control gates for RWL and WWL. Different from a conventional 6T SRAM, the proposed memory transfers electrons following the application of sequential pulses on RWL and WWL. The electrons stored in SEB and SN are represented by voltage levels, not a continuous voltage variation of a charging/discharging process. The simulation results of the SE turnstile with both RWL and WWL at a high voltage level are presented in Fig. 30 using the parameters listed in Table III at 32 nm feature size; the measured values are 0.409 V for VL and 0.611 V for VH under a binary memory write operation at a curve slope of -1. So, the Static Noise Margin Low (SNML) and Static Noise Margin High (SNMH) for the write operation are 0.409 V and 0.289 V, respectively, i.e. by subtracting 0 V from VL and VH from Vdd (i.e. 0.9 V) [72]. Compared with 0.390 V as the SNML for a conventional 6T CMOS SRAM [73], the proposed cell has a larger noise margin, thus making it more stable for a write operation.


Fig. 31. Static inverting transfer characteristics of the SET/MOS hybrid circuit at 32nm.

Consider next the "read" operation; this is mainly related to the SET/MOS hybrid circuit. As per the evaluation in a previous section, its transfer characteristics have been simulated using the parameters listed in Table III at 32 nm feature size. The results are plotted in Fig. 31. In this simulation, RWL is set to a constant high voltage level and the SET transistor is characterized for the inverting property. In this case, the measured values of VL and VH for the read operation are 0.158 V and 0.479 V, respectively; hence, the SNML and SNMH are 0.158 V and 0.421 V. As a 6T CMOS SRAM has a SNML of 0.16 V [74], there is an almost negligible penalty in the read static noise margin.
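The noise margins reported for both operations follow from the same two subtractions (SNML = VL - 0 V and SNMH = Vdd - VH). A minimal sketch of this arithmetic is given below; the function name is illustrative, and the inputs are the VL and VH values measured at the slope of -1 in Fig. 30 and Fig. 31.

```python
def static_noise_margins(v_l: float, v_h: float, v_dd: float, v_gnd: float = 0.0):
    """Return (SNML, SNMH) with SNML = VL - Vgnd and SNMH = Vdd - VH."""
    return v_l - v_gnd, v_dd - v_h


V_DD_32NM = 0.9  # supply voltage at the 32 nm feature size, in volts

# (VL, VH) measured on the simulated transfer characteristics.
MEASURED = {"write": (0.409, 0.611), "read": (0.158, 0.479)}

for operation, (v_l, v_h) in MEASURED.items():
    snml, snmh = static_noise_margins(v_l, v_h, V_DD_32NM)
    print(f"{operation}: SNML = {snml:.3f} V, SNMH = {snmh:.3f} V")
```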

2.6. Proposed Ternary Content-Addressable Memory

Unlike SRAM in which for a read operation the input is an address and data is provided as output, a content-addressable memory (CAM) operates such that for an input data word, the entire memory is searched to check whether the data is stored in any of its locations. If so, the CAM returns a list of one or more storage addresses for which the word was found (match); else, a miss (mismatch or no-match) is said to occur.

2.6.1. Ternary Data Matching Using a Dual-Gate SET

The basic principle of the proposed memory is to replace a conventional CMOS-based SRAM cell by a SET-based static memory cell with a SET/MOS hybrid circuit. In the simplest case, this requires the SET-based design to have the following functions: 1) a ternary level data storage function implemented by the SET-based memory cell; and 2) a data matching function utilizing the multi-peak periodic drain-current characteristics of a SET with dynamic phase-shift control.

However, a few changes are required in the design. The biasing voltage imposed on the control gate of a SET transistor in a SET/MOS hybrid circuit must be selected appropriately, as it may lead to totally different I-V characteristics. Also, the arrangement of the Word Line (WL) and the Bit Line (BL) must be changed because a cell using SET-based devices needs two WLs and two BLs for the "read" and "write" operations and must accommodate ternary matching too. Thus, a novel circuit for the cell is required.

Fig. 32. Operation principles of ternary matching: (a) matching circuit consisting of a dual-gate SET and a cascode MOSFET, (b) ternary matching and (c) measured drain current characteristics [77], (d) and (e) simulated drain current characteristics at VG1=0V, VG1=0.9V at 32 nm node, respectively.

Fig. 32(a) shows the matching circuit consisting of a dual-gate SET with a cascode MOSFET, in which VG1 accepts the stored ternary data and VG2 accepts the searched binary data. For the matching operation, VG2 is used to shift the periodic drain-current characteristics of the dual-gate SET, as illustrated in Fig. 32(b) [77]. Initially the following cases are considered.

• Assume VG2=0, i.e. the searched data is "0"; so, the SET is turned ON only when VG1=V2 (the stored data is "1"). When this condition is met, the drain current through the matching SET discharges the ML. Hence, the ML is at a low voltage, indicating a "Mismatch".

• By contrast, when VG1 is either "0" or "X", the matching SET is not turned ON; so, ML is always high with no discharge process. So, both of these conditions show the "Match" result.

Assume VG2=3e/4CG2; the phase is shifted left by 3/4 of the period. The following two cases are possible.

• As the searched data is "1", then based on the I-V characteristics, the SET is turned ON only when VG1=V0 (the stored data is "0"). This operation is similar to the previous condition of "Mismatch".

• Correspondingly, when VG1 is either "X" or "1", the matching SET is not turned ON, so both conditions show the "Match" result.
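The four cases above amount to a small truth table. The following behavioral sketch (purely logical; it does not model the drain current or the phase shift itself, and its function names are illustrative) enumerates the match result for every combination of stored ternary data on VG1 and searched binary data on VG2.

```python
STORED_VALUES = ("0", "X", "1")   # ternary data applied to VG1
SEARCHED_VALUES = ("0", "1")      # binary data applied to VG2


def matching_set_is_on(stored: str, searched: str) -> bool:
    """The matching SET conducts only for (searched "0", stored "1")
    and, after the 3/4-period phase shift, for (searched "1", stored "0")."""
    if stored not in STORED_VALUES or searched not in SEARCHED_VALUES:
        raise ValueError("stored must be 0/X/1 and searched must be 0/1")
    return (searched == "0" and stored == "1") or (searched == "1" and stored == "0")


def match_line_result(stored: str, searched: str) -> str:
    """A conducting SET discharges the ML (Mismatch); otherwise the ML stays high (Match)."""
    return "Mismatch" if matching_set_is_on(stored, searched) else "Match"


for searched in SEARCHED_VALUES:
    for stored in STORED_VALUES:
        print(f"searched={searched} stored={stored}: {match_line_result(stored, searched)}")
```

A stored "X" never turns the SET ON, which is how the "Don't Care" state is obtained without additional encoding transistors.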

Fig. 32(c) shows the measured characteristics of the fabricated dual-gate SET of [77] and demonstrates the capabilities of the TCAM cell using this technique. In addition, our simulations further confirm the correctness of the drain current characteristics in Fig. 32(d) and Fig. 32(e), based on the SET model in [61] and the parameters in Table VI.

Table VI. Device Parameters for TCAM HSPICE Simulation

Group              Parameter        45 nm      32 nm
Temperature                         300 K      300 K
Power Supply       Vdd              2.1 V      0.9 V
                   -Vdd             -2.1 V     -0.9 V
SE turnstile       WFET1, WFET2     45 nm      32 nm
                   LFET1, LFET2     45 nm      32 nm
                   Vg1, Vg2         2.1 V      0.9 V
                   Vh               2.1 V      0.9 V
                   Vl               -2.1 V     -0.9 V
                   CSEB             0.5 aF     0.5 aF
                   CSN              10 aF      10 aF
SET in MEM cell    Cs, Cd           0.2 aF     0.2 aF
                   CgSET            0.4 aF     0.4 aF
                   CpgSET           0.1 aF     0.1 aF
                   Rs, Rd           100 kΩ     100 kΩ
                   Vpg              0 V        0 V
Hybrid Circuit     Vctrl            -2.1 V     -0.9 V
                   Vgg              2.1 V      0.9 V
Matching SET       Cs, Cd           0.9 aF     0.9 aF
                   CG1, CG2         0.05 aF    0.13 aF
                   Rs, Rd           100 kΩ     100 kΩ

Based on the previous discussion, the MOSFET-based TCAM cell and the SET-based TCAM cell have significantly different principles and characteristics in the operational nodes. The MOSFET-based TCAM cell utilizes a traditional complementary circuit to meet the requirement of TCAM search operations. It is comparatively much easier to implement and integrate into a conventional circuit. However, due to the simple switch function of the MOSFET, the number of transistors in each cell increases to accomplish the expected functions, hence resulting in a high power consumption.

The SET-based TCAM cell can reduce the circuit complexity for accomplishing the ternary matching operation by utilizing the phase-shift characteristics of a dual-gate SET. From the previous presentation, this memory cell can correctly meet the requirement of comparing the stored and the searched data. It has relatively low power dissipation due to the employed single electron transfer process. In general, the SET-based TCAM cell has several advantages and may offer a great potential; this discussion motivates the investigation pursued in the next section.


2.6.2. Proposed TCAM Cell

Fig. 33(a) shows the proposed TCAM cell structure using SETs and the Precharge Circuit. The structure of the SET-based TCAM consists of a memory cell, a Local Match Line (LML) and a Search Line (SL) (note that Bit Lines (BLs) and Word Lines (WLs) are omitted in Fig. 33(a)). In the proposed cell, the matching SETs are connected to a LML, such that the LML can be connected to only a number of matching SETs (for a low LML capacitance). Further, several TCAM cells can share the Precharge Circuit depending on specific requirements.

Fig. 33. Proposed TCAM Cell & Precharge Circuit: (a) Cell Structure & Precharge Circuit (b) SE-based memory cell.

The dual-gate SET in Fig. 33(a) is used for the data matching operation. It receives the stored data from the output of the memory cell; as the ternary memory using a SE turnstile [59][60] can be designed with 6 transistors (1 SET and 5 MOSFETs), the total transistor count for the proposed cell is 9. Compared with a traditional CMOS-based TCAM cell that requires two 6-transistor SRAM cells and 4 matching transistors (16 transistors in total), the number of transistors is reduced by about 44%.

Fig. 33(b) illustrates the proposed SET-based memory cell. This circuit consists of a SE turnstile, a SET/MOS hybrid circuit, a storage node (SN) and a reset MOSFET. The Read Word Line (RWL) and Write Word Line (WWL) are used for the two basic operations of the cell. The Set Line (SETL) is used to reset the stored charge (number of SEs) in the SN. The Write Bit Line (WBL) is the input node and the Read Bit Line (RBL) is the output node of the cell. The SN is utilized to store multi-level voltage data.

Fig. 34. Simulated results of timing diagram of TCAM at 45 nm.

[59] has already demonstrated the basic operations of a SE turnstile with a SET/MOS hybrid memory circuit using a multi-level storage element. In this paper, a similar circuit is employed to store the ternary data (i.e. "0", "X" and "1"). The operations and waveforms of the proposed TCAM cell are shown in Fig. 34. The memory cell stores the ternary data ("0", "X" and "1") to be written; the cell is connected to LML by the SET matching circuit. As per the previous discussion, if the searched data is "0", only when the stored data is "1", the SET will be turned ON and a "Mismatch" output is generated. If the searched data is "1", then only when the stored data is "0", as a result of the phase-shift characteristic, the SET will be turned ON, resulting in a "Mismatch" output. When the stored data is "X", then independently of the searched data, a "Match" output is always generated. When the searched data is equal to the stored data, a "Match" output is generated.


Next, the HSPICE simulation results of the proposed TCAM cell are presented. Table VI shows the values of all device parameters as applicable to the simulation of the proposed TCAM cell. As the "write" operation is accomplished through the SE turnstile [59], the transfer error rate is the most significant limiting factor on its operational frequency; based on the error analysis of [59], this frequency is set to 166 MHz. Initially, by controlling the SE turnstile, electrons are transferred into the SN; the output voltage levels are converted as "Write data". Using the matching SET, they are compared with the "Searched data" on the search line. The final output is as expected. By utilizing the phase-shift characteristics of the dual-gate SET, for "X" as stored ternary data, the output always shows a "Match" independently of the "Searched data". Hence through this simulation, the correct write and matching operations of the proposed TCAM cell have been confirmed.

Fig. 35. Simulated results of timing diagram of TCAM at 32 nm.

Fig. 34 and Fig. 35 show the HSPICE simulation of the basic operations of the proposed TCAM cell at 45 nm and 32 nm as feature sizes; the HSPICE device model of [61] is utilized for the SET. Throughout this simulation, the voltages of the ternary signals are initially stored in the memory cell through the write operation; then, the SET/MOS hybrid circuit converts them into positive voltage levels at the RBL (Vout). The matching operation is performed to generate the local match output. Based on the simulation parameters presented in Table VI, an extensive analysis of the matching operation is pursued next. The plots of Fig. 34 and Fig. 35 are applicable to a single TCAM cell that stores the three logic values ("0", "X" and "1"). By considering the combinations of the two different logic values ("0", "1") on the Search Line (SL), six matching operations (one per cycle) must be considered as follows.

• In the first cycle, no "write" operation is performed in the memory cell and its output of the "Write data" is "0". Compared with the "Search data" in SL, the matching operation is achieved by the matching SET; the "Match" output is high. The cell has a "Match" condition based on the comparison of the same data, i.e. "0".

• When the "Search data" is "1" and the "Write data" is still "0", the "Match" output is low (Cycle 2). As previously discussed, in this case, the phase is shifted by 3/4 of a period; this results in a high drain current through the matching SET. Hence, LML is discharged; the output is low, representing the "Mismatch" outcome.

• The "write" operation occurs in Cycle 3; as the information to be written is "X" (a middle voltage level), there is an electron transferred into the Storage Node (SN) of the memory cell. So, "X" is shown in the "Write data" and compared with the data in SL. As the "Searched data" is "0" (Fig. 32(b)), the drain current through the matching SET is low and the "Match" is completed without the discharge of LML.

• As per the phase-shift characteristics of the matching SET and based on the I-V characteristics of the SET, when the "Searched data" is "1", a "Match" will occur at the output (Cycle 4). By utilizing this characteristic (i.e. the phase is shifted by 3/4 of a period), the drain current is low to hold the charge state of the LML. However, there is a small voltage decrease in the output node as a result of the precharge clock (CLK) in the Matching Line. Prior to the evaluation in each cycle, ML is precharged again; a CLK pulse causes a temporary decrease that is mitigated in the next evaluation period.

• In Cycle 5, the second electron is transferred into the SN, causing a voltage increase of the "Write data" (i.e. to "1"); however, the "Searched data" in SL is "0". Following this event, the matching SET is turned on in this cycle and a high drain current passes through it to discharge the LML. So the "Match" is low, indicating a "Mismatch" condition.

• Finally, when the "Searched data" is "1" (same as the "Write data"), the matching SET is turned off again (Cycle 6). LML is high following the precharge process in this cycle because the drain current is low. So, a "Match" occurs as observed through the "Match" output.

The above discussion has considered the six combinations of the "Write data" ("0", "X" and "1") and the "Searched data" ("0", "1"). These results show that the proposed TCAM cell works correctly using HSPICE simulation and employs compatible models for all SET devices in the proposed circuit of the memory cell.
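The six-cycle sequence can be replayed at a purely logical level as a sanity check of the expected "Match" output. In this sketch, the mapping of 0, 1 and 2 stored electrons to "0", "X" and "1" is taken from the cycle descriptions above; the names are illustrative, and no timing, precharge or analog behavior is modeled.

```python
# Number of electrons in the SN and the ternary level it encodes (Cycles 1, 3 and 5).
LEVEL_BY_ELECTRONS = {0: "0", 1: "X", 2: "1"}

# (electrons in the SN, searched data on SL) for Cycles 1 to 6.
CYCLES = [(0, "0"), (0, "1"), (1, "0"), (1, "1"), (2, "0"), (2, "1")]


def match_output(electrons_in_sn: int, searched: str) -> str:
    """LML is discharged only when a definite stored bit differs from the searched bit."""
    stored = LEVEL_BY_ELECTRONS[electrons_in_sn]
    lml_discharged = stored != "X" and stored != searched
    return "Mismatch" if lml_discharged else "Match"


results = [match_output(n, s) for n, s in CYCLES]
for cycle, ((n, s), outcome) in enumerate(zip(CYCLES, results), start=1):
    print(f"Cycle {cycle}: stored={LEVEL_BY_ELECTRONS[n]} searched={s} -> {outcome}")
```

The resulting sequence (Match, Mismatch, Match, Match, Mismatch, Match) agrees with the outcomes described for Cycles 1 to 6.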

2.7. Evaluation and Analysis of Proposed TCAM Cell

The performance evaluation and relevant metrics for assessing a single TCAM cell are presented and discussed in this section. Based on the simulation parameters given in Table VI, the proposed TCAM cell has been simulated using HSPICE; all results are at 32 nm feature size.

2.7.1. Average Write/Read Delay

Table VII. HSPICE Simulation Results for Single TCAM Cell

Feature Size                                      45 nm        32 nm
Temperature (K)                                   300 K        300 K
Average Write Delay (ns)                          9.04 ns      8.73 ns
Average Read Delay (ns)                           3.06 ns      2.96 ns
Vdd (V)                                           2.1 V        0.9 V
Power Dissipation for Each Search Operation (W)   2.63E-9 W    4.83E-10 W
Matching Delay (0,X) (ns)                         3.32 ns      3.21 ns
Matching Delay (0,1) (ns)                         3.44 ns      3.31 ns
Matching Delay (1,X) (ns)                         3.31 ns      3.22 ns
Matching Delay (1,0) (ns)                         3.43 ns      3.33 ns
Mismatching Delay (X,0) (ns)                      2.77 ns      2.69 ns
Mismatching Delay (X,1) (ns)                      2.75 ns      2.68 ns
Mismatching Delay (0,1) (ns)                      2.86 ns      2.73 ns
Mismatching Delay (1,0) (ns)                      2.85 ns      2.71 ns

Table VII shows the simulation results for the two basic memory operations; with a reduction in feature size of the MOSFETs, delays are decreased. This is expected also for the proposed cell due to its hybrid nature [59]. These delays are influenced by two limiting factors: the transfer error rate of the SE turnstile and the speed of the SET/MOS hybrid circuit. In the former, the internal characteristics of the SET process affect the write delay. So, different from the conventional (CMOS-based) SRAM, there is no significant and direct improvement in delay of the write operation for "0"; this operation is mostly influenced by the SE transfer speed, not the characteristics of the MOSFET. For the latter, the speed of the hybrid circuit significantly affects the read delay because the highly reactive response of the SET is used in the memory to measure/sense the number of stored electrons in the "read" function. In the worst case, the delay of the SET/MOS hybrid circuit is given by td = Ctot Vout / Io, where td is the delay time, Ctot is the total load and inherent capacitance and Io is the bias current. The proposed TCAM cell utilizes a SET to accomplish the "match" function by comparing the values on the Read Bit Line (RBL) and the Search Line (SL). Hence, the performance of the SET plays an important role in the matching delay, which will be investigated in more detail next.

2.7.2. Matching (Mismatching) Delay

The device characteristics of a SET require an accurate control; the proposed TCAM employs the periodic drain-current characteristics of a SET for the data matching operation, and hence a deviation in these device characteristics can prevent the correct execution of the matching operation. The matching delay in Table VII characterizes the performance of the TCAM match function; it is defined from the precharge to the response of the match output. The results have been categorized into two types: the matching delay (that refers to the voltage changing from low to high) and the mismatching delay, respectively. In the simulation, a low voltage level corresponds to a "Mismatch", while a high voltage level corresponds to a "Match" as output response.


In the proposed TCAM cell, the matching delay is significantly influenced by the matching SET and the internal characteristics of the cell such as the mechanism of the phase gate for the match outcome when it stores a "Don't Care"; this mechanism is totally different from the encoding mode of a conventional TCAM cell. So, a variation on the Search Line (i.e. a change of searched data) may influence the SET as limited by its circuit features. In the HSPICE simulation of the proposed TCAM cell, the "Searched data" is always changed between "0" and "1" at the beginning of each cycle, which could cause a data dependency for the "Match" at the SL. For example, following the "Match" in Cycle 1, the "Match" should still be high at the beginning of Cycle 2. With a change in "Searched data", the voltage level decreases to a "Mismatch" with a corresponding output delay. However, the simulation results show that there is no data dependency and the memory circuit operates correctly. The operating frequency of 166 MHz as required for a correct transfer rate of the electrons in the SE turnstile (modeled at macroscopic level by HSPICE) allows the proposed memory cell to process data as required by the TCAM.

2.7.3. Power Dissipation

The power dissipation of the proposed TCAM cell is related to the memory circuit and the matching SET as associated with the ML operation. The dynamic power consumed by a single match line for a mismatch is due to the rising edge during the precharge process and the falling edge during the evaluation step; it is given by

$P_{miss} = C_{ML} V_{DD}^{2} f$    (10)

where f is the frequency of the search operation and CML is the capacitance of ML. For a match, the power consumption associated with a single ML depends on its previous state; as in practice (for example in a network router design) only a small number of matches are encountered, then this is not significant.

So, the ML power consumption of a TCAM memory block with n match lines is

$P_{ML} = n P_{miss} = n C_{ML} V_{DD}^{2} f$    (11)
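As an illustration of (10) and (11), the sketch below evaluates the ML power for a hypothetical block. The match-line capacitance (10 fF) and the number of match lines (64) are assumed values chosen only for the example and are not taken from the simulations; the supply voltages and the 166 MHz frequency are those used in this chapter.

```python
def match_line_power(n_lines: int, c_ml_farads: float, v_dd_volts: float, freq_hz: float) -> float:
    """ML power of a block with n match lines, P_ML = n * C_ML * VDD^2 * f."""
    return n_lines * c_ml_farads * v_dd_volts ** 2 * freq_hz


# Hypothetical block: 64 match lines with an assumed C_ML of 10 fF each.
N_LINES = 64
C_ML = 10e-15
FREQ = 166e6

p_32nm = match_line_power(N_LINES, C_ML, 0.9, FREQ)
p_45nm = match_line_power(N_LINES, C_ML, 2.1, FREQ)
print(f"P_ML at Vdd = 0.9 V: {p_32nm * 1e6:.2f} uW")
print(f"P_ML at Vdd = 2.1 V: {p_45nm * 1e6:.2f} uW")

# For fixed C_ML and f, the power scales with the square of the supply voltage.
print(f"voltage-squared ratio: {p_45nm / p_32nm:.2f}")
```

For fixed C_ML and f, the ratio between the two supply voltages squared is about 5.44, which is close to the ratio of the search powers reported in Table VII (2.63E-9 W / 4.83E-10 W, about 5.45).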


Simulation results show that the power dissipation of each search operation is 2630 pW and 483 pW for 45 nm and 32 nm, respectively, so decreasing with feature size. As reported in [59], only the SET/MOS hybrid circuit significantly contributes to the power dissipation of a SET-based memory cell for the traditional memory operations (write and read operations only). So, the performance of a search operation in a TCAM cell must be investigated. The main source of power dissipation in a TCAM cell is the dynamic search process, by which charge/discharge is a significant factor in performance. The proposed cell requires two repulsive clock signals, so it is anticipated that extra power will be dissipated by them as determined by the interconnect, the gate capacitance and the frequency [59]. A reduction in power dissipation is further accomplished by scaling the feature size from 45 to 32 nm, showing however only a modest improvement in this figure of merit.

2.8. Conclusion

This chapter has presented applications of the Single-Electron transfer technique that exploit its low power dissipation. After proposing a novel HSPICE behavioral model for the SE turnstile at nanometer feature sizes, a hybrid memory cell has been proposed to provide the storage function of a conventional SRAM cell. Furthermore, a TCAM cell has been proposed using the phase-shift characteristics of the dual-gate SET transistor. The extensive evaluations show that the proposed memory cells significantly reduce power dissipation and area compared with conventional memory cells implemented with CMOS transistors. In particular, the proposed hybrid memory cell requires nearly 22% less area than a 6T CMOS memory cell and shows an improvement in SNM for the write operation. In addition, the proposed SE turnstile behavioral model is accurate and ensures robust operation at nanometer feature sizes.


3. RESISTIVE RANDOM-ACCESS-MEMORY

3.1. Introduction

Recent advances in memory technology have made possible new modes of operation for nanoscaled Integrated Circuits (ICs). For example, Field Programmable Gate Arrays (FPGAs) have mostly utilized Static Random Access Memories (SRAMs) as programming technology [78][79][80]. However, with the decrease in supply voltage and feature size, the leakage current of a SRAM has considerably increased, thus becoming a major source of power consumption when the IC is in standby mode [78]. Moreover, SRAMs are volatile, so non-volatile storage is required for power-down operation. A non-volatile SRAM (NVSRAM) combines the benefit of a simple access and a nearly unlimited "Write" capability of a SRAM with the non-volatile element provided by an EEPROM (electrically erasable programmable ROM). In the past, non-volatile RAMs had the disadvantages of low density (at most 4 Kbit) and significantly lower speed than a volatile SRAM [81]. Through the years, increased density (64 Kbit) and faster access (30 ns for military standard ICs) have been reported for NVSRAMs [81]. The NVSRAM is normally accessed like any static RAM and a "Restore" signal is utilized to clear the volatile data held in the SRAM and replace it with the data held in the non-volatile storage when a "Restore" on "Power-up" is performed. This operation is also commonly referred to as "instant-on" [78][80].

The continued growth of semiconductor non-volatile memories will likely rely on advances in both electronic materials and device structures, and extensive efforts have been devoted to addressing these two complementary issues. Resistance switching is the basic physical phenomenon in the operation of a resistive random access memory (RRAM); this phenomenon has been studied for more than 40 years [15]. In addition to its non-volatile operation, one of the most evident advantages of a RRAM is its compatibility with CMOS processes, such that the current infrastructure can be readily applied to its fabrication/manufacturing. Furthermore, the scaling merit of a RRAM permits operation at low power consumption, making it a very competitive technology for large storage at low cost. In the past decade, several novel techniques have been proposed for implementing NVSRAMs, such as ferroelectric capacitors [22], phase change [23], non-polar Resistive Switching Devices (RSDs) [24], nanocrystal PMOS flash [25], spin-transfer-torque MTJs (STT-MTJs) [26] and the memristor [27].

In this chapter, the RRAM technique is applied to CMOS digital memory design. First, the RRAM is reviewed for its typical operation and relevant properties. Second, a Multiple Level Memory Cell (MLC) is designed by utilizing the technique. After that, the RRAM is combined with the conventional SRAM cell circuit to achieve non-volatile storage with the "instant-on" scheme, and the proposed 7T1R cell is characterized through extensive evaluations. Finally, the proposed 7T1R circuit is upgraded to 9T1R to further reduce the power dissipation, and two further hardened memory cell designs are included to improve the robustness to soft errors.

3.2. Proposed Multiple Level Memory Cell (MLC)

3.2.1. Modeling RRAM

A HSPICE macromodel is used in this study as an accurate but compact model of the RRAM, shown in Fig. 36. This model uses a small number of circuit elements to describe resistive switching, and it has been shown to yield good agreement with the experimental data for millisecond switching operation [79]. Its description follows.

Fig. 36. A HSPICE macromodel of a single bit RRAM [79].

RSET denotes the low resistance of the RRAM during the SET process. RRESET denotes the high resistance during the RESET process. Two switches (S0 and S1) control the RRAM operation by switching between the Low Resistance State (LRS) and the High Resistance State (HRS), corresponding to RSET and RRESET, respectively. S0 is turned on for the switching transition from HRS to LRS and off for the transition from LRS to HRS. Conversely, S1 is on for the switching transition from LRS to HRS and off for the transition from HRS to LRS. S0 and S1 are utilized to charge or discharge the capacitors C1 and C2, respectively. In the macromodel, the inputs to the SR latch are driven by the outputs of the comparators CMP1 and CMP2, which are modeled as voltage-controlled sources or ideal operational amplifiers.

Table VIII. Device Parameters for RRAM HSPICE Simulation

Parameter       Nano Model       Bipolar Model
Temperature     300 K            300 K
MOSFET (W/L)    64 nm / 32 nm    32 nm
Vdd             3 V              0.9 V
VS              1.5 V            0.8 V
VR              0.7 V            -0.8 V
RSET            1 KΩ             1 KΩ
RRESET          1 MΩ             1 MΩ
R1              100 Ω            100 Ω
R2              100 Ω            100 Ω
C1              5 fF             5 fF
C2              5 fF             5 fF

This section improves the model of [79] for nanosecond scale operation by reducing/calibrating its internal parameter values. This is related to the new RRAM technology fabricated and demonstrated experimentally in [85] using NiO and Ti:NiO. The data in the model of [79] is based on TiO2 thin films; hence, the model operation is restricted to millisecond scales. In this section, the RRAM technology of [85] is assessed for nanosecond scale operation. This is accomplished by modifying the internal resistances and capacitances, as parameters in the cell function, to account for the new material. Therefore, compared with the parameters given in Table VIII, simulation is pursued in this section by reducing R1, R2, C1 and C2 for a faster charge/discharge process in the RC networks, thus increasing the operational frequency. The changes in SET and RESET voltages are also included, i.e. VSET is now 1.5 V and VRESET is 0.7 V (as reported in [85]).
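The effect of reducing the internal resistances and capacitances can be illustrated with the first-order time constant of an RC network, tau = R * C. The following is only a minimal sketch: it uses the 100 Ω and 5 fF entries of Table VIII as a starting point, and the halved values are hypothetical numbers chosen for illustration, not values taken from the model.

```python
def rc_time_constant(resistance_ohm, capacitance_farad):
    """First-order time constant of an RC network, in seconds."""
    return resistance_ohm * capacitance_farad


# Values listed in Table VIII for R1 and C1.
tau_table = rc_time_constant(100.0, 5e-15)

# Hypothetical reduced values (both halved), for illustration only.
tau_reduced = rc_time_constant(50.0, 2.5e-15)

print(f"tau (Table VIII values): {tau_table:.3e} s")
print(f"tau (hypothetical halved values): {tau_reduced:.3e} s")
print(f"speed-up factor: {tau_table / tau_reduced:.1f}x")
```

Halving both R and C shortens the time constant by a factor of four, which is the mechanism by which smaller internal parameters allow a higher operational frequency.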

Fig. 37. HSPICE simulation of RRAM model [79] at nanosecond scale.

The simulation result for the nano model is presented in Fig. 37 for the SET and RESET operations.

In Fig. 37, the I-V characteristics of a RRAM are demonstrated using four cycles (10 ns for each cycle).

Table IX. Parameters for Multiple Bit RRAM Simulation

RSET VG (SET Pulse Voltage)

1 KΩ 0.85 V

3 KΩ 0.80 V

5 KΩ 0.75 V

10 KΩ 0.70 V

Next, the changes in model parameters are presented to demonstrate the multiple bit capability of the nano model. The values in Table IX are used; they are taken from the experimental data of [85]. By biasing at different values of the gate voltage VG, the RRAM shows a variation in the SET resistance for LRS, thus influencing the I-V characteristics. The simulation results are shown in Fig. 38 for the SET/RESET operations obtained by varying the SET resistance. In these simulations, timing consists of two cycles, i.e. one cycle each for SET and RESET. With a reduction of resistance, the current increases under the same external bias voltage. Therefore, multiple bit operation is also possible for a RRAM at a nanosecond scale, providing the basis for the proposed memory design in the next sections.
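The relation between SET resistance and current under a fixed bias follows directly from Ohm's law. The short sketch below applies it to the four RSET values of Table IX; the bias of 0.1 V is a hypothetical value chosen only for illustration and is not taken from the simulations.

```python
# RSET values from Table IX, in ohms.
R_SET_OHM = [1e3, 3e3, 5e3, 10e3]

# Hypothetical bias across the RRAM in LRS (illustrative assumption).
BIAS_V = 0.1


def lrs_current(bias_v, resistance_ohm):
    """Current through the RRAM in LRS from Ohm's law, in amperes."""
    return bias_v / resistance_ohm


currents = [lrs_current(BIAS_V, r) for r in R_SET_OHM]

for r, i in zip(R_SET_OHM, currents):
    print(f"RSET = {r / 1e3:5.1f} kOhm -> I = {i * 1e6:6.1f} uA")
```

Each distinct RSET gives a distinct current level at the same bias, which is what makes the levels distinguishable in a multiple bit cell.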

Fig. 38. HSPICE simulation of a multiple bit RRAM cell at nanosecond scale.

3.2.2. Proposed MLC Design

In this section, a multiple level memory cell is designed using a RRAM. The multiple bit RRAM utilizes the technology of [86], which shows various current-voltage characteristics, including a High Resistance State (HRS) and several different Low Resistance States (LRSs). From experimental results, the difference is mostly due to a change of the switching time needed to form the filaments at different contact areas, almost approaching an electron transfer process. The thin films of [86] implement the expected function of voltage-controlled switching for memory operation.

Hence, the longer the SET pulse width, the smaller RSET is for the next SET; also, a higher VR is required for rupturing the formed current paths in the next RESET. This is mostly because a longer SET pulse contributes to the presence of stronger current paths. As RSET decreases, the value of VR for rupturing the stronger paths must be raised by utilizing a shorter SET pulse width [86]. The switching in the RRAM cell is realized by a backend process with a metal-dielectric-metal structure [85][87]. Based on the above fabrication process, the switching film is placed underneath the source contact as a resistive storage node (Fig. 39). An n-type MOSFET (NMOS) is utilized in the MLC to modify the voltage value for storing the multiple bit information. The proposed 1T1R RRAM cell (Fig. 39) uses a circuit architecture that has been demonstrated experimentally in [85].

Fig. 39. Proposed multiple level memory cell using RRAMs in an 8-bit, 16-word (128-cell) array.

For both cases (i.e. storing single or multiple bits in a cell), the "write" operation is controlled by the Write Word Line (WWL) and the Bit Line (BL) (Fig. 39). When the WWL is "1", the NMOS Mn1 is turned ON, so the data given by the value of BL is written in the Storage Node connected to the Read Bit Line (RBL). The resistive characteristics of a RRAM (as investigated in previous sections) are utilized by changing the resistance for multiple bit function as follows.

• When in HRS, the memory is turned OFF, so it cannot be written.

• When in LRS, an increase in resistance decreases the voltage of the Storage Node to different levels for the "write" operation (so the cell is capable of storing multiple level information). The memory is used for single bit operation when the resistance of the RRAM in LRS has only one level (in addition to 0).

3.3. Multiple Level Memory Cell (MLC) Evaluation

In this section, different aspects related to the evaluation of the proposed 1T1R MLC under different bases (binary, ternary and quaternary) are presented.


3.3.1. Resistive switching behavior

Table X. Parameters for 1T1R MLC Simulation

Parameter Value

Temperature 300 K

MOSFET (W/L) 64 nm / 32 nm

VS 1.5 V

VR 0.7 V

RRESET 1 MΩ

RSET 1 KΩ – 100 KΩ

Simulation results are provided to demonstrate the operation of the proposed 1T1R memory cell under single and multiple bit conditions. The timing diagram of the simulated write operation for single bit (binary) operation is shown in Fig. 40, utilizing the parameters listed in Table X. The supply voltage is 1.8 V, while the feature size of the MOSFET is 32 nm. Each cycle in Fig. 40 is 20 ns; this value is based on the pulse width of the SET/RESET process as previously described. When the Write Word Line (WWL) is "1", the NMOS M1 is turned ON to write the data on the Bit Line (BL) (for the binary base, a "0" or a "1") to the Storage Node.

Fig. 40. Simulation of proposed 1T1R RRAM cell for binary operation at 32 nm.


The performance of the proposed 1T1R cell is mostly influenced by the characteristics of the RRAM; in all simulations, the data used is based on experimental results from RRAMs fabricated with NiO and Ti:NiO [85]. The SET/RESET process of the RRAM and the NMOS transistor also play important roles in assessing the operational speed of the memory. This evaluation is based on the results of Fig. 41, in which different values are utilized for RSET in LRS to demonstrate the multiple levels of the storage node voltage corresponding to the multiple bit operation of the 1T1R cell. The resistance values are listed in Table XI for the different bases of the cell. In all cases, RRESET is kept at 1 MΩ, corresponding to the "0" level, and the RSET for the highest level is 1 KΩ.

Table XI. Resistances for Different Base Operation of 1T1R MLC Cell

Memory Base    Voltage Level    Resistor    Resistance
Binary         "0"              RRESET      1 MΩ
               "1"              RSET        1 KΩ
Ternary        "0"              RRESET      1 MΩ
               "1"              RSET        20 KΩ
               "2"              RSET        1 KΩ
Quaternary     "0"              RRESET      1 MΩ
               "1"              RSET        35 KΩ
               "2"              RSET        8 KΩ
               "3"              RSET        1 KΩ
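The mapping of Table XI between stored level and RRAM resistance can be written as a small lookup, which makes the encoding rule explicit: level "0" always uses RRESET, the highest level always uses 1 KΩ, and intermediate levels use larger RSET values. This is only an illustrative sketch; the names `RESISTANCE_OHM` and `target_resistance` are hypothetical and not part of the HSPICE setup.

```python
# Resistances from Table XI, indexed by stored level (0 = RRESET), in ohms.
RESISTANCE_OHM = {
    "binary": [1e6, 1e3],
    "ternary": [1e6, 20e3, 1e3],
    "quaternary": [1e6, 35e3, 8e3, 1e3],
}


def target_resistance(base, level):
    """Return the RRAM resistance that encodes `level` for a given base."""
    levels = RESISTANCE_OHM[base]
    if not 0 <= level < len(levels):
        raise ValueError(f"level {level} is not valid for a {base} cell")
    return levels[level]


def is_strictly_decreasing(base):
    """Check that a higher stored level always means a lower resistance."""
    levels = RESISTANCE_OHM[base]
    return all(a > b for a, b in zip(levels, levels[1:]))


for base in RESISTANCE_OHM:
    print(base, [target_resistance(base, k) for k in range(len(RESISTANCE_OHM[base]))])
```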


Fig. 41. Simulation of 1T1R memory cell for multiple level operation at 32 nm.

The resistance values for RRESET and RSET depend on the base of the RRAM cell (binary, ternary or quaternary). Simulations at different MOSFET feature sizes and voltage levels of the 1T1R cell are reported in Table XII to evaluate the proposed RRAM cell at a binary base. Due to the high switching speed of the RRAM [85], the write delay of a binary memory cell is in the nanosecond range; moreover, by reducing the MOSFET feature size, the delay decreases (as expected, smaller delays are achieved under the HP model). These results demonstrate that the proposed 1T1R RRAM cell is very fast in its operation. A lower write time is accomplished from the lowest to the highest level rather than the reverse.

Table XII. Simulation Results of 1T1R MLC Cell For Binary Operation

Feature Size    Binary Write Operation    Write Delay (ns)    Power Dissipation (W)
32 nm           "0" to "1"                5.725               1.266E-6
                "1" to "0"                6.590               3.692E-7
                Average                   6.158               8.311E-7
22 nm           "0" to "1"                5.723               2.283E-8
                "1" to "0"                6.075               1.644E-8
                Average                   5.899               1.964E-8

The high base RRAM cells (ternary and quaternary, for example) are also simulated at feature sizes of 32 nm and 22 nm; the value of RSET has a significant influence on the voltage levels and therefore on the performance of the RRAM cell. The write delay at a higher voltage level tends to decrease for a single bit "write"; for example, the write delay for the operation of "0" to "1" for a quaternary cell is smaller than that for a ternary cell due to the decrease in voltage difference between levels (as caused by the resistance of the RRAM for "1"). For multiple bit operation, the "write" operation requires more time, thus resulting in an increase of the delay. In general, the performance of the RRAM cell is mostly affected by the operational speed of the resistive switching mechanism and the base for the different memory operations. The variation in MOSFET feature size does not play a significant role in the delay, because the MOSFET is only used to control the memory operations of the 1T1R cell; on average, the write delay decreases only slightly as the feature size is reduced.
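The size of the feature-size effect on the write delay can be checked directly from the binary write delays reported in Table XII. The sketch below simply averages the two transitions at each node and computes the relative change; it uses only the tabulated numbers.

```python
# Binary write delays from Table XII, in nanoseconds: ("0"->"1", "1"->"0").
WRITE_DELAY_NS = {
    "32 nm": (5.725, 6.590),
    "22 nm": (5.723, 6.075),
}


def average(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


avg_32 = average(WRITE_DELAY_NS["32 nm"])
avg_22 = average(WRITE_DELAY_NS["22 nm"])

# Relative reduction of the average write delay when scaling 32 nm -> 22 nm.
relative_reduction = (avg_32 - avg_22) / avg_32

print(f"average write delay at 32 nm: {avg_32:.4f} ns")
print(f"average write delay at 22 nm: {avg_22:.4f} ns")
print(f"relative reduction: {relative_reduction * 100:.1f} %")
```

The reduction is a few percent, which is consistent with the MOSFET acting only as the access device while the RRAM switching dominates the delay.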

Table XII also shows the simulation results for the power dissipation of the 1T1R MLC cell for the three bases considered in this study. The low values are mostly due to the small current of the RRAM (i.e. less than 100 μA) that controls the switching operation between the resistance states. Therefore, the 1T1R RRAM cell has very low power dissipation. From the simulation results, the change in power is mostly caused by the NMOS feature size, i.e. the power dissipation of a RRAM cell is reduced by decreasing the feature size.

3.3.2. Array Evaluation

This section presents the evaluation of the memory array consisting of the RRAM cells and the PROC. Let N denote the number of RRAM cells connected to the sense amplifier.

Fig. 42. Simulation of proposed single RRAM cell array for binary operation (write and read at 32 nm, N=1).

The array initially consists of one RRAM cell and the amplifier, i.e. N=1, considering the binary case first. The simulation of the write and read operations (four cycles) in the binary array is shown in Fig. 42.

• For the first two cycles, BL is always "0" but WWL and RWL are changed to "1" to sequentially write and read "0". The Read Output voltage is always "0", as expected.

• For the third cycle, BL is "1" when WWL is "1", i.e. the write "1" occurs.

• During the fourth cycle, when RWL is "1", the stored data in the memory cell is read out.


The ternary and quaternary RRAM memory arrays have been demonstrated using the same conditions in the input signals (WWL, RWL and BL) for each operation using the various values of resistance listed in Table XI.

Fig. 43. Noise margin and output signal deterioration for binary operations at 32 nm.

After the above evaluation, the number of RRAM cells in the array is increased so that a comprehensive evaluation can be pursued. Using the parameters described previously, the simulation of a binary RRAM memory is used to characterize the output signal and its possible deterioration as N increases on the same BL. This is shown in Fig. 43. The output signal of "1" is initially still acceptable due to the large noise margin (the noise margin for "1" is defined by dividing the voltage range of 1.8 V into two equal parts). However, when the signal level falls to 0.9 V, an erroneous read operation occurs. Similarly, the noise margins are also characterized for ternary and quaternary memories. The largest value of N yielding correct operation (so an acceptable output signal is present) is shown in Fig. 44; as expected, N decreases with an increase in the number of levels for the high base arrays. Therefore, the binary memory arrays tolerate a larger number of memory cells on the BL than ternary and quaternary arrays.


Fig. 44. Size (N) of operating linear array versus base of memory.

The results show that the proposed memory array is affected by the multi-level signal due to the partitioning of the voltage range as defined by the base of the memory cell. Moreover, the quantization of the noise margin utilizes the average partition for the RRAM memories with the different bases and their relation with the linear array. Finally, unlike a MOSFET-based array, the proposed cell shows consistently good performance at array level and is capable of handling bases higher than binary rather efficiently.
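The partitioning of the voltage range can be made concrete with a short calculation. The sketch below assumes, as an illustration, that the 1.8 V range is divided into equal partitions, one per level; this matches the binary definition given above (two equal parts of 0.9 V), while the extension to ternary and quaternary cells is an assumption of this sketch.

```python
V_RANGE = 1.8  # voltage range used in the binary noise margin definition

BASES = {"binary": 2, "ternary": 3, "quaternary": 4}


def partition_width(v_range, num_levels):
    """Width of each voltage partition when the range is split equally."""
    return v_range / num_levels


widths = {name: partition_width(V_RANGE, n) for name, n in BASES.items()}

for name, width in widths.items():
    print(f"{name:10s}: {BASES[name]} levels, {width:.2f} V per level")
```

Under this assumption the available margin per level shrinks as the base grows, which is in line with the smaller array size N tolerated by the ternary and quaternary arrays.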

3.4. Proposed 7T1R Non-volatile SRAM Cell

The proposed 7T1R NVSRAM design is shown in Fig. 45. In this circuit, only one bipolar RRAM (with a resistive element denoted as RRAM1) is added to the 6T SRAM core (M1-M6). RRAM1 is controlled by the transistor M7; it is connected directly to the data storage node of the memory core and is used to store the logic information of the SRAM during its "Power-down" state. Hence, this is a 7T1R memory cell. The transistor sizing strategy for the 7T1R cell depends on the core of the proposed cell (in this case, a 6T SRAM) and must ensure correct Read/Write operation; the sizing of each transistor (i.e. W/L) is as follows: 3:1 for M5, M6 and M7; 4:1 for M1 and M3; 2:1 for M2 and M4. This design and sizing strategy are evaluated using the layout of the proposed 7T1R cell in the next sections.

Similar to the 8T2R NVSRAM cell and depending on the specific information stored at the SRAM data node, the RRAM element changes its resistance between the Low Resistance State (LRS) and the High Resistance State (HRS). Generally, the SET process changes the resistance element from HRS to LRS; the RESET process is used for the reverse operation.

Fig. 45. Proposed 7T1R NVSRAM cell.

To achieve non-volatile "Instant-on" operation, the proposed memory cell has two basic states: "Power-down" and "Power-up". The "Power-up" state requires the "Reset" (i.e. the RESET process takes place in the RRAM, thus accounting also for the corresponding effects on the memory core), "Store" and "Restore" operations. Each of these operations is considered in more detail below.

• Reset: The RRAM is changed back to HRS ahead of writing to the memory core; this is accomplished by the "Reset" operation. During this step, a "1" is applied on CTRL1 to turn ON M7 and a "1" is also applied to CTRL2; meanwhile, the power Vdd is turned OFF. Hence, the voltage of node D is always "0" and if RRAM1 is in HRS, a RESET is not required. If RRAM1 is in LRS, the negative voltage drop on RRAM1 causes the device to be reset and thus, it changes its state from LRS to HRS.

• Store: When the access transistors M5 and M6 are turned ON in this operation, the complementary data at the Bitline (BL) is "written" into the storage node D. Meanwhile, M7 is turned ON and CTRL2 is at 0 V to program RRAM1 depending on the voltage at node D. Hence, if D is "1", the positive potential drop on RRAM1 changes the state from HRS to LRS. Thus, the operations of writing the data into the 6T core SRAM and programming the RRAM are completed in one step. At completion of this step, the RRAM presents different resistance states corresponding to the stored information: if D is "1" (and DN is "0"), it is in LRS; if D is "0" (and DN is "1"), it is still in HRS, following "Reset".

• Power-down: The non-volatile storage capability is accomplished by the different resistance states of the RRAM. The following signals are applied to ensure the retention of this information during the "Power-down" step. M7 is turned OFF through CTRL1. Meanwhile, Vdd and Vss are both lowered to 0 V, such that D=DN="0". So, no power supply is required in this state.

• Restore: For the "Restore" operation, the transistor M7 is turned ON, CTRL2 is high and the power supply is turned ON. Meanwhile, Vss is low. If RRAM1 is in LRS, the storage node D remains at "1" and DN is discharged through M3. If RRAM1 is in HRS, there is initially a voltage increase at node D; however, the fast voltage increase at node DN turns ON transistor M1. D finally discharges back to "0" because it is connected to CTRL2 through the significantly larger resistance of the RRAM. So, the execution of "Restore" "0" is relatively more complicated than "Restore" "1" due to the asymmetric design of the cell. The correct execution of the above operations is shown in the next section by simulation.
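The sequence of operations described above can be summarized as a purely logical state machine. The following is a minimal behavioral sketch, not the HSPICE netlist: it tracks only the logic value at node D and the resistance state of RRAM1, ignores all analog effects (timing, voltage levels, the transient at D during "Restore" "0"), and the class and method names are hypothetical.

```python
from dataclasses import dataclass

HRS, LRS = "HRS", "LRS"


@dataclass
class Cell7T1R:
    """Logic-level abstraction of the 7T1R cell (illustrative only)."""
    d: int = 0          # logic value at storage node D
    rram: str = HRS     # resistance state of RRAM1
    powered: bool = True

    def reset(self):
        # CTRL1 = 1, CTRL2 = 1, Vdd OFF: D is 0, so an LRS device sees a
        # negative voltage drop and returns to HRS.
        self.powered = False
        self.d = 0
        self.rram = HRS

    def store(self, bit):
        # M5/M6 ON, M7 ON, CTRL2 = 0 V: D is written and, if D is 1,
        # the positive drop across RRAM1 programs it to LRS.
        self.powered = True
        self.d = bit
        if bit == 1:
            self.rram = LRS

    def power_down(self):
        # M7 OFF, Vdd = Vss = 0 V: the volatile nodes are lost,
        # the RRAM keeps its resistance state.
        self.powered = False
        self.d = 0

    def restore(self):
        # M7 ON, CTRL2 high, Vdd ON: LRS restores 1, HRS restores 0.
        self.powered = True
        self.d = 1 if self.rram == LRS else 0
        return self.d


def instant_on_cycle(bit):
    """Run Reset, Store, Power-down and Restore for one data value."""
    cell = Cell7T1R()
    cell.reset()
    cell.store(bit)
    cell.power_down()
    return cell.restore()


for value in (0, 1):
    print(f"stored {value} -> restored {instant_on_cycle(value)}")
```

In this abstraction the only information that survives "Power-down" is the RRAM state, which is why a single RRAM connected to D is sufficient to recover both data values.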

3.5. Evaluation and Analysis of Proposed 7T1R NVSRAM Cell

In this section, the model of the RRAM is first presented, since the model is upgraded for bipolar operation; this is followed by the evaluation of the basic memory operations of the proposed cell as well as the two NVSRAM cells found in the technical literature [78][80].

3.5.1. RRAM Model Demonstration


Fig. 46. Modified RRAM model for bipolar operation.

Non-volatile memory operation requires the use of a resistive element within a RRAM circuit. A compact model has been proposed in [79]; however, it is not compatible with a bipolar RRAM scheme. Hence, a modification to the model of [79] is implemented such that it can be used in memory design. The bipolar model differs from the unipolar one in the use of the comparator CMP2. As in [78], HRS is achieved by the RESET process using a negative RESET voltage. As shown in Fig. 46, during RESET, S1 is "1" and a decrease of VIN results in charging the capacitor C2. The output voltage of CMP2 changes to "1" when the negative value of V2 is lower than the RESET voltage (VR); hence, the state is switched to HRS. The operating supply voltage Vdd is also changed to an appropriate value for the feature size (i.e. 0.9 V for a 32 nm feature size), as listed in the bipolar model column of Table VIII. The variation of the MOSFET feature size also requires a change in the period of charge/discharge, hence the new values for the internal parameters, such as R1, R2, C1 and C2, in Table VIII. The external parameters, such as the SET voltage (VS), the RESET voltage (VR), the SET resistance (RSET) and the RESET resistance (RRESET), vary depending on the resistive element used in the RRAM cell. For nanosecond scale operation, a reduction and calibration of the internal parameter values of the model of [79] are required. This is related to the new RRAM technology fabricated and demonstrated experimentally in [85] using NiO and Ti:NiO. The data in the model of [79] is based on TiO2 thin films; hence, the model operation is restricted to millisecond scales. In this study, the RRAM technology of [85] is used for nanosecond scale operation. The operation of the modified bipolar RRAM model is shown in Fig. 47 using the parameters of Table VIII for the RRAM; the SET and RESET processes are shown in the first and second cycles when S0 or S1 is turned ON. In addition, depending on the biasing input voltage VIN, the model shows the polarity of the current as required for the correct operation of the RRAM.
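The switching rule of the bipolar model can be sketched as a simple threshold-based state machine. This is only an illustrative abstraction under stated assumptions: it uses the bipolar column of Table VIII (VS = 0.8 V, VR = -0.8 V, RSET = 1 KΩ, RRESET = 1 MΩ), it replaces the comparators and SR latch by direct threshold checks, and it ignores the RC charge/discharge delay of the macromodel. The class name `BipolarRRAM` is hypothetical.

```python
class BipolarRRAM:
    """Threshold-only abstraction of a bipolar RRAM (no RC delay)."""

    def __init__(self, v_set=0.8, v_reset=-0.8, r_set=1e3, r_reset=1e6):
        self.v_set = v_set
        self.v_reset = v_reset
        self.r_set = r_set
        self.r_reset = r_reset
        self.state = "HRS"

    @property
    def resistance(self):
        """Present resistance in ohms, according to the stored state."""
        return self.r_set if self.state == "LRS" else self.r_reset

    def apply(self, v_in):
        """Apply a bias, update the state, and return the current in amperes."""
        if v_in >= self.v_set:
            self.state = "LRS"      # SET: positive bias above VS
        elif v_in <= self.v_reset:
            self.state = "HRS"      # RESET: negative bias below VR
        return v_in / self.resistance


cell = BipolarRRAM()
for bias in (0.9, 0.3, -0.9):
    current = cell.apply(bias)
    print(f"VIN = {bias:+.1f} V -> state {cell.state}, I = {current:+.2e} A")
```

The sign of the returned current follows the sign of VIN, which reflects the polarity dependence noted for the bipolar model.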

Fig. 47. Simulation of modified bipolar RRAM model.

3.5.2. NVSRAM Demonstrations

Table XIII. Parameters for NVSRAM Simulation

Parameter               Value
Temperature             25 °C
MOSFET Feature Size     32 nm
Vdd                     0.9 V
Vss                     0 V
CTRL1, CTRL2            0.9 V
Source Line (SL)        0.9 V
RRAM1 or RRAM2          1 KΩ / 1 MΩ (LRS/HRS)

The proposed NVSRAM cell has been simulated using the parameters listed in Table XIII; this evaluation concentrates on the performance of the NVSRAM for the "Power-down" and "Restore" operations.


Fig. 48. Store ―1‖, Power-down and Restore ―1‖ operations of the proposed 7T1R cell at 32nm.

Consider initially the "Store" "1", "Power-down" and "Restore" "1" operations; the simulation results are shown in Fig. 48. Prior to executing these operations, the memory cell requires a "Reset" to change the resistance value of the RRAM to HRS; this is also reflected in the timing diagram. The period of each cycle is 20 ns due to the programmable property of a RRAM (which requires at least 10 ns for the resistance variation) and the timing of the control signals, such as the Word Line (WL). The simulation consists of four cycles as follows.

• In the first cycle, WL and S0 are turned ON and the 6T memory core is written; meanwhile, RRAM1 is programmed, because M7 is turned ON (CTRL1 and CTRL2 are at Vdd and 0 V, respectively). In this case, the value of BL is "1", so the SET process is executed due to the positive voltage drop across RRAM1 from D to CTRL2 and the resistance changes to LRS (i.e. 1 KΩ). In Fig. 49, where the value of BL is "0", RRAM1 is kept in HRS (because the voltage of D is "0") and there is no significant voltage drop across RRAM1. Hence, the "Store" state is completed.

• During the second cycle, when the power supply Vdd is turned OFF (i.e. 0 V), the storage node D in the incorporated 6T memory core discharges to "0", while the programmed RRAM1 stores a "1" (as LRS). During this cycle of the "Power-down" state, the proposed memory cell saves a significant amount of energy compared with a conventional volatile 6T SRAM in "Standby" (in which energy is dissipated mostly by leakage).

• At the start of the third cycle, Vdd is turned ON; M7 turns ON and CTRL2 changes to "1". The storage node D charges back to "1", hence fully achieving the "Restore" process.

• Finally, the last cycle shows that the "Restore" process provides the same voltage value as the "Store" operation. Therefore, the "1" stored before "Power-down" is written again to the memory core, thus correctly executing the "Instant-on" operation.

Similarly, Fig. 49 shows the complete execution of the "Store" "0", "Power-down" and "Restore" "0" operations in each cycle. During the "Restore" cycle, the power supply is turned ON when the voltage values of D and DN have already discharged to "0". When CTRL1 is changed to "1" to turn ON M7 and CTRL2 is changed to "1", the charging of the storage node D is slowed by the large resistance value of HRS for RRAM1; so, DN is charged to "1" faster than D due to the cross-coupled inverter scheme of the 6T SRAM core, while a "0" is finally retained at D (although D also shows an initial charge at the beginning of "Restore", as shown in Fig. 49). These results show that the proposed memory cell successfully and correctly restores the information after the "Power-down" state by utilizing the resistance switching feature of the RRAM to accomplish non-volatile storage.


Fig. 49. Store ―0‖, Power-down and Restore ―0‖ operations of the proposed 7T1R cell at 32nm.

3.5.3. Energy

Energy is assessed using the parameters listed in Table XIII; due to the limitation of the behavioral model for the RRAMs, the evaluation results for the energy combine the HSPICE simulation results with the data of [17], thus taking into consideration the energy of the RRAM for its operations, such as SET/RESET. The average energies of these circuits are presented in Table XIV for all operations (i.e. "Write", "Store" and "Restore") under both values (i.e. "0" and "1"). The energies of the "Reset" operation for the 7T1R and 8T2R cells are also presented in Table XIV for the "Instant-on" cells. The proposed 7T1R memory cell achieves a significant reduction in energy for these operations. This is particularly important for the "Store" and "Restore" operations, because the proposed NVSRAM cell utilizes only a single programmable RRAM. Also, a substantial difference in energy is reported for the "0" and "1" values; this is due to the asymmetric design of the proposed cell (which utilizes only one RRAM, connected to D). Finally, compared to the volatile 6T SRAM cell, the proposed memory cell requires more energy when the data is written during the "Store" operation. However, the proposed cell is non-volatile, so energy dissipation in the "Power-down" state is lower than in the "Standby" state of the 6T SRAM.


Table XIV. Energy of Memory Cells (32nm)

Memory Cell    Operation    Energy for "0" (fJ)    Energy for "1" (fJ)
7T1R           Reset        38.61                  49.55
               Store        163.3                  219.2
               Restore      456.8                  389.9
8T2R [78]      Reset        62.42                  62.42
               Store        379.2                  379.2
               Restore      490.1                  490.1
9T2R [80]      Store        505.3                  487.9
               Restore      523.4                  577.8
6T             Write        139.6                  139.6
Rnv8T [88]     Write        182.5                  182.5
               Store        478.1                  478.1
               Restore      494.7                  494.7
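The energy comparison can be quantified from the entries of Table XIV. The sketch below averages the "0" and "1" energies per operation and expresses the 7T1R values relative to the other NVSRAM cells; it uses only the tabulated numbers, and the helper names are hypothetical.

```python
# Energies from Table XIV in fJ, as (value for "0", value for "1").
ENERGY_FJ = {
    "7T1R": {"Store": (163.3, 219.2), "Restore": (456.8, 389.9)},
    "8T2R": {"Store": (379.2, 379.2), "Restore": (490.1, 490.1)},
    "9T2R": {"Store": (505.3, 487.9), "Restore": (523.4, 577.8)},
    "Rnv8T": {"Store": (478.1, 478.1), "Restore": (494.7, 494.7)},
}


def average_energy(cell, operation):
    """Mean of the "0" and "1" energies for one cell and operation, in fJ."""
    e0, e1 = ENERGY_FJ[cell][operation]
    return (e0 + e1) / 2


def reduction_vs(reference, operation, cell="7T1R"):
    """Fractional energy reduction of `cell` relative to `reference`."""
    return 1 - average_energy(cell, operation) / average_energy(reference, operation)


for op in ("Store", "Restore"):
    for ref in ("8T2R", "9T2R", "Rnv8T"):
        print(f"{op:8s} 7T1R vs {ref:5s}: {reduction_vs(ref, op) * 100:5.1f} % lower")
```

With these numbers the average "Store" energy of the 7T1R cell is roughly half that of the 8T2R cell, while the saving for "Restore" is smaller.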

Energy is also affected by a change in feature size. Using the parameters of Table XIII, Fig. 50 shows the average energy for the ―Store‖ operation; as expected, the average energy decreases at lower feature sizes. The proposed 7T1R cell remains the best among the NVSRAMs (with the 9T2R having the highest value). The Rnv8T cell utilizes a variable supply voltage (CVdd in [88]), to reduce the energy for the ―Store‖ state compared to the 8T2R cell (as also confirmed by the evaluation in this study).


Fig. 50. Average ―Write‖/―Store‖ energy for the different memory cells versus feature size.

Fig. 51 shows the average energy for the "Restore" operation as applicable to the NVSRAM cells only; the simulation results show that the 7T1R cell proposed in this manuscript has the lowest value for rewriting the stored data back to the SRAM core. Although it incurs a penalty during the "Store" operation, this is amply compensated by avoiding the leakage current that occurs in the "Standby" mode of a volatile SRAM cell.

Fig. 51. Average ―Restore‖ energy for four different NVSRAM memory types versus feature sizes.

3.5.4. Average Write(Store)/Read Delays

Table XV. Average delay of Memory Cells (32nm)

Memory Cell    Operation    Average Delay for "0" (ps)    Average Delay for "1" (ps)
7T1R           Store        30.17                         31.40
               Read         36.93                         38.54
8T2R [78]      Store        46.88                         46.88
               Read         39.40                         39.40
9T2R [80]      Store        145.0                         147.1
               Read         127.9                         129.4
6T             Write        29.86                         29.86
               Read         36.56                         36.56
Rnv8T [88]     Write        43.83                         43.83
               Store        66.34                         66.34
               Read         41.72                         41.72

The proposed 7T1R memory cell has been simulated and its performance is compared with that of the other cells. Using the parameters of Table XIII, Table XV shows the simulation results at 32 nm feature size for the "Store"/"Write" and "Read" operations. Note that the "Reset" operation for both the 7T1R and 8T2R memory cells is executed prior to the "Store" operation. The results show the improvement of the 7T1R memory cell, especially compared with the 9T2R cell for the "Store" operation. Except for the 9T2R, all other cells have similar values for the average "Read" delay. Unlike the other cells, the 9T2R cell incurs a significant penalty; this is mainly caused by its circuit implementation, in which BL is connected to both RRAMs, thus slowing down the charge/discharge process.
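The delay figures of Table XV can be compared in the same way. The sketch below computes the average "Store" delay of the 7T1R cell, its penalty relative to the "Write" delay of the volatile 6T cell, and its improvement relative to the 9T2R cell, using only the tabulated values.

```python
# Store/Write delays from Table XV in ps, as (value for "0", value for "1").
STORE_DELAY_PS = {
    "7T1R": (30.17, 31.40),
    "8T2R": (46.88, 46.88),
    "9T2R": (145.0, 147.1),
    "6T": (29.86, 29.86),   # "Write" of the volatile cell
}


def average_delay(cell):
    """Mean of the "0" and "1" delays for one cell, in ps."""
    d0, d1 = STORE_DELAY_PS[cell]
    return (d0 + d1) / 2


# Penalty of the 7T1R "Store" relative to the 6T "Write".
penalty_vs_6t = average_delay("7T1R") / average_delay("6T") - 1

# Improvement of the 7T1R "Store" relative to the 9T2R "Store".
reduction_vs_9t2r = 1 - average_delay("7T1R") / average_delay("9T2R")

print(f"7T1R average Store delay: {average_delay('7T1R'):.3f} ps")
print(f"penalty vs 6T Write: {penalty_vs_6t * 100:.1f} %")
print(f"reduction vs 9T2R Store: {reduction_vs_9t2r * 100:.1f} %")
```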


Fig. 52. Average ―Write‖/―Store‖ delay for the different memory cells versus feature size.

For the "Store" operation, the proposed 7T1R requires a longer time for programming the cell, thus leading to a very small speed penalty compared with a 6T SRAM cell. The Rnv8T has a relatively larger "Read" delay due to its "Write"-assisted transistors [88] that increase the BL load. The "Store" and "Read" delays are given in Fig. 52 and Fig. 53 versus a reduction in the feature size; the proposed 7T1R memory cell accomplishes a significant improvement over the other NVSRAM cells, although it incurs a small penalty with respect to the 6T (volatile) SRAM. Moreover, Fig. 52 shows the simulation results for both the "Write" and "Store" operations of the Rnv8T cell; in this cell, the operations are separate, unlike in the other cells considered in this manuscript. These results confirm that the proposed cell has excellent performance for non-volatile memory operation and related metrics, as based on the previously presented simulation results.

Fig. 53. Average "Read" delay for the different memory cells versus feature size.


3.5.5. Area

For comparison with the layouts of the 8T2R [78] and 9T2R [80] cells, the layout of the proposed 7T1R cell is obtained using Cadence Virtuoso [89] at a 32nm MOSFET feature size. As RRAM1 is placed on a different layer [17] than the MOSFETs (using stacking), its area is not included in this evaluation (this condition is applicable to all NVSRAMs considered in this manuscript). Compared with a volatile CMOS (6T) SRAM with an area of 1092λ² [3], the 8T2R has an area of approximately 3328λ². The proposed non-volatile memory cell further simplifies the design by using only one programmable RRAM; its layout is presented in Fig. 54, with a cell area of approximately 2958λ². The 9T2R cell of [80] requires an equalization transistor, thus incurring a slight area penalty and resulting in an area of 3384λ². Hence, also under this metric, the proposed memory cell has the smallest area among the NVSRAM cells considered (about 11% less than the 8T2R and about 13% less than the 9T2R).
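The area figures quoted above can be compared with a few lines of arithmetic. The following is a minimal sketch that only uses the λ² values stated in the text; the function and variable names are illustrative and not part of the original work.

```python
# Cell areas in units of lambda^2, as quoted in the text.
AREA_LAMBDA2 = {
    "6T": 1092,
    "7T1R": 2958,
    "8T2R": 3328,
    "9T2R": 3384,
}


def area_ratio(cell, reference="6T"):
    """Return the area of `cell` divided by the area of `reference`."""
    return AREA_LAMBDA2[cell] / AREA_LAMBDA2[reference]


def area_saving_percent(cell, reference):
    """Return the percentage by which `cell` is smaller than `reference`."""
    return 100.0 * (1.0 - AREA_LAMBDA2[cell] / AREA_LAMBDA2[reference])


if __name__ == "__main__":
    for name in ("7T1R", "8T2R", "9T2R"):
        print(f"{name}: {area_ratio(name):.2f}x the 6T area")
    print(f"7T1R vs 8T2R: {area_saving_percent('7T1R', '8T2R'):.1f}% smaller")
    print(f"7T1R vs 9T2R: {area_saving_percent('7T1R', '9T2R'):.1f}% smaller")
```

Under these figures, the 7T1R cell occupies roughly 2.7 times the area of the 6T cell, while the two 2R cells occupy slightly more than 3 times that area.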

Fig. 54. Layout of the proposed 7T1R NVSRAM.

3.5.6. Static Noise Margin

The Static Noise Margin (SNM) is considered a metric for the stability and robustness of a memory cell. The SNM is defined based on the voltage transfer characteristics (VTC) of the complementary storage nodes to characterize the tolerance to noise.


Fig. 55. Write Static Noise Margin (WSNM) of 7T1R memory cell for Store "1" at 32 nm.

Consider the proposed 7T1R NVSRAM cell; the 1T1R RRAM is only turned ON and connected to the 6T SRAM core during the "Store" and "Restore" operations. The transistor M7 turns OFF RRAM1 during the "Read" operation; hence, its effect on the Read SNM (RSNM) can be neglected. Therefore, simulation has concentrated on the Write SNM (WSNM) to evaluate the stability of the proposed cell. Fig. 55 and Fig. 56 show the results at 32nm (using the parameters listed in Table XIII) for the "Store" "1" and "Store" "0" operations of the proposed NVSRAM cell.

Fig. 56. Write Static Noise Margin (WSNM) of 7T1R memory cell for Store "0" at 32 nm.

The WSNM is measured using the butterfly plot, obtained from DC simulation by sweeping the inputs of the inverters (DN and D). For a successful "Store", exactly one crossing point must be found on the butterfly plot, indicating that the cell is monostable; the WSNM for writing a "1" is the width of the smallest square that can be embedded between the lower-right half of the curves (the WSNM for writing a "0" can be obtained in a similar fashion). The WSNM of the cell is the smaller of the "Store" "0" and "Store" "1" margins; a cell with a low WSNM has a poor "Store" ability. The measured widths of the smallest embedded square in the lower-right half of Fig. 55 and Fig. 56 are 0.313 V and 0.297 V, respectively. They are less than the SNM of a 6T SRAM cell, i.e. 0.390 V [73]. Using the same method, the WSNMs of the 8T2R and the 9T2R memory cells are found to be 0.322 V and 0.332 V, respectively. These two cells have a better SNM than the proposed cell due to the complementary arrangement used in these NVSRAM cells.
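The rule that the cell WSNM is the smaller of the two "Store" margins can be expressed directly. The sketch below uses only the voltages reported above; the names `wsnm` and `relative_loss_percent` are illustrative helpers, not part of the original simulation flow.

```python
# Store margins of the 7T1R cell in volts (Fig. 55 and Fig. 56).
STORE_MARGINS_V = {"store_1": 0.313, "store_0": 0.297}

# Reference margins in volts quoted in the text for the other cells.
REFERENCE_V = {"6T": 0.390, "8T2R": 0.322, "9T2R": 0.332}


def wsnm(margins):
    """Cell WSNM: the smaller of the Store-0 and Store-1 margins."""
    return min(margins.values())


def relative_loss_percent(value, reference):
    """Percentage by which `value` falls short of `reference`."""
    return 100.0 * (reference - value) / reference


if __name__ == "__main__":
    cell_wsnm = wsnm(STORE_MARGINS_V)
    print(f"7T1R WSNM: {cell_wsnm:.3f} V")
    for name, ref in REFERENCE_V.items():
        loss = relative_loss_percent(cell_wsnm, ref)
        print(f"  vs {name}: {loss:.1f}% lower")
```

With these values, the 7T1R WSNM is limited by the "Store" "0" case (0.297 V).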

3.5.7. Multi-context Configurability

Fig. 57. Multiple-context configuration of proposed 7T1R cell.

In this section, the proposed non-volatile SRAM cell is assessed with respect to its capability to store and operate under multiple sets of configuration data, as required for FPGA operation [80]. This is referred to as multi-context configurability (MCC). MCC is achieved by connecting parallel 1T1R RRAMs to the same storage node D of a memory cell (Fig. 57). The two signals CTRL1 and CTRL2 of a single memory cell are now given by the multiple-bit signals Control Line (CLi) and Source Line (SLi) for controlling the operation of the ith RRAM cell.

Table XVI. Simulated MCC scenarios for the proposed 7T1R memory cell (programmed stored data)

Scenario    RRAM 1    All remaining k-1 RRAMs
1           0         0
2           0         1
3           1         0
4           1         1

Simulation has then been performed to find the largest number of parallel RRAM circuits (denoted by k) for correct operation of the MCC in Fig. 57. Except for "Power-down", the other three operations ("Read", "Store" and "Restore") deteriorate as the number of parallel RRAMs increases. Compared to the other two operations, "Restore" is more affected, because the data for "Restore" depends only on each programmed RRAM and its associated CL (a BL is capable of driving a large number of cells). Therefore, the evaluation of MCC has been pursued with respect to the four scenarios shown in Table XVI. A value of k=42 has been found for the first scenario, i.e. when 43 1T1R RRAMs are connected, the storage node D cannot be written back with a "0". This is mainly due to the signals used for "Restore", in which CL1 and SL1 are both "1" (while all other signals are "0"). This leads to an erroneous operation when the "Power-on" operation causes a large positive current at node D, thus flipping the expected restored data. The other three scenarios are capable of restoring the data back to the 6T memory cell, and a value of k=50 has been found. The above results demonstrate that the proposed NVSRAM cell supports MCC with at least 42 parallel 1T1R RRAMs in all four scenarios.

3.6. Proposed Soft Error and Hardened NVSRAM Cell

This section presents a brief review of soft errors and the proposed hardened NVSRAM cells based on the proposed 7T1R non-volatile cell circuit.

3.6.1. Soft Error and Critical Charge

The amount of charge stored on a circuit node is becoming increasingly smaller due to the lower supply voltage and the smaller node capacitance. This makes circuits more susceptible to spurious voltage and charge variations caused by cosmic ray neutrons and α-particles. These energetic particles travel through the silicon bulk and create minority carriers that may be collected by the source/drain diffusion. They alter voltage values [29], and data integrity can be compromised [28] if storage cells (such as memories and latches) are affected by this type of event. Such an event may result in a transient fault (TF); if a TF is latched by a sampling element (latch), then it may result in a so-called soft error (SE) [28]. The number of soft errors is expected to be significantly higher for CMOS in the deep submicron/nano ranges [29]. In the technical literature, the TF has been extensively modeled as a current pulse [34][35]. The charge deposited at a single node (due to cosmic ray neutron or α-particle hits) generates a large transient current at that node; therefore, a TF at such a node can be modeled as a current pulse for HSPICE simulation.

In a memory circuit, the transient voltage change generated by a heavy ion strike may directly lead to a Single Event Upset (SEU), i.e. a state change of a memory cell [92]. A SEU is said to occur when the charge Q collected at a particular node is greater than the critical charge Qcrit, i.e. Qcrit is the minimum charge that needs to be deposited at the sensitive node of a storage cell to flip (change) the stored bit (data). Usually for a SRAM cell, Qcrit depends not only on the charge collected, but also on the temporal shape of the induced pulse [29].
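To illustrate the current-pulse view of a transient fault, the following minimal sketch assumes a double-exponential pulse shape, which is a common choice in the literature but is not confirmed here as the exact waveform used in the cited works. The time constants, the deposited charge and the two thresholds are hypothetical values chosen only for illustration. The sketch integrates the pulse numerically to recover the injected charge and then applies the simple criterion Q > Qcrit; as noted above, the real Qcrit also depends on the pulse shape, which this comparison ignores.

```python
import math


def double_exp_current(t, q_total, tau_fall, tau_rise):
    """Current (A) at time t (s) of a double-exponential pulse carrying q_total (C).

    Assumed shape: I(t) = Q / (tau_fall - tau_rise) * (exp(-t/tau_fall) - exp(-t/tau_rise)),
    with tau_fall > tau_rise.
    """
    if t < 0.0:
        return 0.0
    scale = q_total / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def injected_charge(q_total, tau_fall, tau_rise, t_end, steps=100_000):
    """Integrate the pulse from 0 to t_end with the trapezoidal rule."""
    dt = t_end / steps
    total = 0.0
    previous = double_exp_current(0.0, q_total, tau_fall, tau_rise)
    for i in range(1, steps + 1):
        current = double_exp_current(i * dt, q_total, tau_fall, tau_rise)
        total += 0.5 * (previous + current) * dt
        previous = current
    return total


def causes_upset(q_collected, q_crit):
    """Simplified SEU criterion: collected charge exceeds the critical charge."""
    return q_collected > q_crit


if __name__ == "__main__":
    # Hypothetical pulse: 2 fC with 200 ps fall and 50 ps rise time constants.
    q = injected_charge(2.0e-15, 200e-12, 50e-12, t_end=5e-9)
    print(f"Injected charge: {q * 1e15:.3f} fC")
    # Hypothetical critical charges of two cells.
    print("Upset at Qcrit = 1.5 fC:", causes_upset(q, 1.5e-15))
    print("Upset at Qcrit = 2.5 fC:", causes_upset(q, 2.5e-15))
```

The integral of the assumed pulse equals the deposited charge, so the pulse amplitude can be swept in simulation until the stored bit flips, which is the usual way a critical charge is extracted.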

3.6.2. SEU Tolerance of Existing NVSRAM cells

Using the current pulse method with the circuit in [93], the simulation of the 7T1R memory cell shows that a change of state due to a soft error is more likely (albeit still highly unlikely) during "Restore 0" than during "Restore 1", because in the former case both signals CTRL1 and CTRL2 are "1". Hence, if a positive pulse occurs due to a heavy ion strike, the NVSRAM cell flips the stored data at node D from "0" to "1". Using the parameters in Table XIII and the circuit in [93], the simulation results show that the charge at the storage node of the 7T1R cell is 1.739 fC, while it is 2.361 fC for a 6T core cell, i.e. a reduction of about 26%. This reduction is caused by the presence of the RRAM at D.


Fig. 58. Charge at storage node vs feature size for four memory types.

Fig. 58 shows the critical charge of the SRAM core (node D) versus feature size for the different NVSRAM cells (7T1R, 8T2R [78] and 9T2R [80]) as well as the unhardened 6T SRAM cell. The charge at D in the 9T2R cell is barely better than that of the 6T; this is mainly caused by the peripheral circuit added in this scheme for non-volatile storage and its operation during "Restore" (consistent with the 6T SRAM). The 9T2R and 8T2R cells of [78][80] have two RRAMs directly connected to the storage nodes, thus affecting the charge stored at D as well as at its complement DN.

3.6.3. Proposed Hardened NVSRAM Cells

In this section, three designs of a NVSRAM cell with improved critical charge at the storage node are proposed. These hardened designs also incorporate the feature of low-power operation of [94] using a positive virtual ground technique to lower the gate leakage current. The three new low-power hardened NVSRAM cells are as follows:


Fig. 59. Proposed 9T1R NVSRAM cell.

9T1R NVSRAM cell (Fig. 59): two pass transistors (M8, M9) are added to the 7T1R cell to provide different ground levels for the standby and active modes of cell operation [42]. The gates of transistors M8 and M9 are controlled by WLN and WL, respectively. The virtual ground VS is at VSS during the "Write" and "Read" operations, while it is connected to a positive voltage larger than VSS during "Restore". With this technique, the gate leakage and the subthreshold currents are significantly reduced when the virtual ground VS is set to a positive value.

Fig. 60. Proposed Hardened NVSRAM Type-1 cell.

Hardened NVSRAM Type-1 cell (Fig. 60): in this design, the virtual ground technique proposed in [94] is used to reduce the power dissipation. As in [34], four transistors are added to control the voltage at storage nodes D and DN. This is a 13T1R NVSRAM cell.


Fig. 61. Proposed Hardened NVSRAM Type-2 cell.

Hardened NVSRAM Type-2 cell (Fig. 61): this is an improved design of the cell in Fig. 60. In the hardened Type-1 cell, the power dissipation increases when the value of VS is above 0V, because M11 and M12 force the voltage to 0V although VS supplies a positive virtual ground. The third proposed cell design also incorporates four transistors to control the node voltages; however, it replaces the connection to VSS with a connection to the virtual ground VS. Therefore, this variation decreases power dissipation, while improving the "Write"/"Read" times due to the lower charge. This is also a 13T1R NVSRAM cell.

It should also be noted that [34] has shown that volatile hardened cells utilizing a positive virtual ground technique to lower the gate leakage current (similar in design to the ones proposed in this manuscript) have a better SEU tolerance as well as better performance metrics (such as "Write"/"Read" times and power dissipation) than other hardened volatile designs, such as DICE [38] and its low-power variants; hence, this aspect is not further treated in this manuscript (the interested reader can refer to [34] for an in-depth presentation).

The transistor sizing strategy for designing the 9T1R depends on the core of the proposed cell (in this case, a 6T SRAM) and must consider its Read/Write operations. As per the guidelines in [3], the sizing of each transistor (i.e. W/L) is as follows: 4:1 for M1 and M3; 3:1 for M5, M6, M7, M8 and M9; 2:1 for M2 and M4. For the two hardened memory types, the sizing of the extra transistors is as follows: 2:1 for M10 and M13; 4:1 for M11 and M12. These designs and the sizing strategy are evaluated using the layouts of the proposed memory cells in the next section.
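The sizing rules above can be captured as a small lookup table. The sketch below records the W/L ratios stated in the text and converts them to widths under the assumption, made here only for illustration, that the drawn channel length equals the 32 nm feature size; the dictionary and function names are hypothetical.

```python
# W/L ratios as stated in the text for the 9T1R cell.
SIZING_9T1R = {
    "M1": 4, "M3": 4,
    "M5": 3, "M6": 3, "M7": 3, "M8": 3, "M9": 3,
    "M2": 2, "M4": 2,
}

# W/L ratios of the four extra transistors in the two hardened cell types.
SIZING_HARDENED_EXTRA = {"M10": 2, "M13": 2, "M11": 4, "M12": 4}


def widths_nm(ratios, channel_length_nm):
    """Convert W/L ratios to widths in nm for a given channel length.

    Assumption: every transistor uses the same drawn channel length.
    """
    return {name: ratio * channel_length_nm for name, ratio in ratios.items()}


if __name__ == "__main__":
    hardened = {**SIZING_9T1R, **SIZING_HARDENED_EXTRA}
    for name, width in sorted(widths_nm(hardened, 32).items(),
                              key=lambda item: int(item[0][1:])):
        print(f"{name}: W = {width} nm")
```

The merged table contains 13 transistors, which matches the 13T1R designation of the two hardened cells.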


3.7. Evaluation and Analysis of Proposed NVSRAM Cells

The three proposed NVSRAM cells are evaluated using different metrics and operational features in this section. Among these metrics, the critical charge is determined for all NVSRAM cells to assess their tolerance to SEUs alongside the non-volatile and low-power functions.

3.7.1. Virtual ground voltage VS

The proposed 9T1R NVSRAM cell is simulated for various values of the virtual ground voltage VS to investigate its effects. A change in VS significantly affects "Write" and "Restore". A higher value of VS leads to substantial power savings when the stored information is written back into the cell.

Table XVII. Performance of 9T1R Memory Cell at 32 nm

VS (V)    Write Time (ps)    Read Time (ps)    Storage Node Charge (fC)
0         32.32              39.45             2.345
0.05      32.38              39.49             2.096
0.1       32.41              39.52             1.844

The variation of the virtual ground voltage also affects the "Write"/"Read" times and the charge of the storage node; the results at 32nm are shown in Table XVII for the 9T1R cell. The "Read"/"Write" times increase slightly with this voltage (by less than 0.3% from 0 V to 0.1 V), while the charge at the storage node decreases by about 21%. The same relationship also applies to the other two proposed NVSRAM cells.
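The trend in Table XVII can be quantified with a short calculation. The sketch below uses only the tabulated values; the helper names are illustrative.

```python
# Rows of Table XVII: (VS in V, write time in ps, read time in ps, charge in fC).
TABLE_XVII = [
    (0.00, 32.32, 39.45, 2.345),
    (0.05, 32.38, 39.49, 2.096),
    (0.10, 32.41, 39.52, 1.844),
]


def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


def summarize(table):
    """Compare the last row of the table against the first (VS = 0 V) row."""
    base, last = table[0], table[-1]
    return {
        "write_time_pct": percent_change(last[1], base[1]),
        "read_time_pct": percent_change(last[2], base[2]),
        "charge_pct": percent_change(last[3], base[3]),
        "charge_slope_fc_per_v": (last[3] - base[3]) / (last[0] - base[0]),
    }


if __name__ == "__main__":
    for key, value in summarize(TABLE_XVII).items():
        print(f"{key}: {value:.3f}")
```

The calculation makes the trade-off explicit: raising VS to 0.1 V costs well under 1% in access time but removes roughly a fifth of the storage-node charge.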

3.7.2. Power Dissipation

The simulation results for the power dissipation are shown in Fig. 62 using the parameters in Table XIII with a VS value of 0.1V and designs at different MOSFET feature sizes. The three proposed NVSRAM cells and the 6T SRAM are evaluated and compared under specific operations. The power dissipation of the three operations of a NVSRAM ("Write", "Power-down" and "Restore") significantly decreases when the value of VS is 0.1V; the cell is connected to the positive virtual ground voltage and the source nodes of the NMOS transistors in the cross-coupled inverters are switched through the pass transistor M8. This leads to an increase of the threshold voltages, reducing the sub-threshold leakage within the cell.

Fig. 62. Power dissipation for four memory cells at 32 nm (VS=0.1V).

The two hardened NVSRAM cell types also increase their power consumption due to the additional four transistors; the hardened Type-2 cell has lower power dissipation, compared to the hardened Type-1 cell, because the internal nodes are also connected to the virtual ground. In all cases, power dissipation is reduced by reducing the feature size.

3.7.3. Performance

Fig. 63 shows the "Write" and "Read" times of the same four cells; the "Write" time is less than the "Read" time for the 9T1R and the 6T cells. However, the 9T1R incurs a performance penalty compared to a 6T SRAM cell due to the additional pass transistors. During the "Write" operation, WL is "1" and turns ON the transistor M9; the charge/discharge capacitance increases by the drain-source capacitance of transistor M9, which also increases the delay of the operation. The proposed 9T1R memory cell shows similar behavior for the charge/discharge process when WL is "1" for the "Read" operation.

Compared to the 9T1R NVSRAM cell, the two hardened NVSRAM cells also incur a performance penalty due to the four additional transistors that improve the SEU tolerance. The simulation results show that the hardened Type-2 cell is faster in operation than the hardened Type-1 cell.


Fig. 63. Performance metrics for memory cells (VS=0.1V).

3.7.4. Critical Charge

In previous sections, the charge of the storage node (D) has been assessed by simulation. However additional circuits are utilized in the proposed NVSRAM cells; these circuits introduce new nodes (i.e. G and H in Fig. 60 and Fig. 61) that must be considered when establishing the critical charge. Fig. 64 shows the charges at D, H and G for the proposed NVSRAM cells by varying the feature size. These nodes are analyzed in more detail next.

Consider first the storage node D. This node has the critical charge of a 6T SRAM; the addition of the RRAM as well as other transistors affects the value of this charge. As expected and shown in Fig. 64, the charge value at node D decreases with the scaling of the MOSFET feature size (the feature size reduction results in a decrease of the MOSFET capacitance and impacts the SEU tolerance). The proposed 9T1R cell marginally lowers the charge at D and slightly deteriorates the SEU tolerance at this node, although it achieves a significant power saving. The two proposed types of hardened NVSRAM cells substantially increase the charge at D, albeit at the expense of a minor reduction in performance. The hardened Type-2 cell has a lower charge value than the hardened Type-1 cell, with an improvement in performance and power consumption when a change in the positive virtual ground voltage occurs.


Fig. 64. Charges for four memory cells at 32 nm (VS=0.1V).

Next, the non-volatile storage node G is considered; G is the internal node between transistor M7 and RRAM1. As shown in Fig. 64, node G has a significantly larger charge value (i.e. by nearly two orders of magnitude). This is particularly attractive for the two proposed hardened cell types (which have a higher charge than the 9T1R cell due to the internal cross-coupled inverters). This result confirms that the non-volatile element is very unlikely to be affected by a SEU due to the resistance provided by RRAM1; hence, the RRAM is capable of restoring the correct value to the volatile memory cell when the 6T core is affected by a soft error.

The addition of the circuitry of [94] also results in the presence of node H; the simulation results for the charge at this node are shown in Fig. 64. The proposed hardened Type-1 cell has a significantly higher SEU tolerance than the 9T1R cell; this is mostly due to the internal cross-coupled inverters introduced in the memory cell. The hardened Type-2 cell has a lower tolerance at the virtual ground node H when VS equals 0.1V, although its charge value is still better than that of the 9T1R cell. A SEU at H will not, however, affect the RRAM, but only the leakage current and the storage node. Hence, the data stored in the RRAM is highly reliable.

In conclusion, based on the results of Fig. 64 at VS=0.1V, the critical charge for the proposed 9T1R and hardened Type-1 NVSRAM cells is still at the volatile storage node D. The hardened Type-2 cell has the critical charge at the virtual ground node H (which does not affect the stored non-volatile data). However, when VS changes to 0V, the charge value at node H for the hardened Type-2 cell is larger than that at node D; so this cell has the same critical charge node as the other memory cells when VS=0V. This occurs because, in the circuit of Fig. 61, H is connected to the ground of the internal cross-coupled inverters, hence affecting both cross-coupled inverters and leading to the failure of the hardened scheme and the loss of the data at D. This feature, however, does not affect the data stored in the RRAM; so even though the storage node D can be affected by a SEU, the copy of the data stored in RRAM1 remains at the correct value.

3.7.5. Soft Error Rate

Besides the critical charge, the Soft Error Rate (SER) is conventionally also a significant parameter for characterizing the robustness of digital systems to soft errors. In particular, the SER of a memory cell circuit depends on the device characteristics and the flux that the device encounters [95].

Various components contribute to the SER, such as alpha particles emitted by decaying radioactive impurities in packaging and interconnect materials, and atmospheric neutrons. However, the proportion of the SER generated by alpha particles and by neutrons depends on the specific conditions. At sea level, the alpha-SER may be dominant, particularly if the IC package has been manufactured from less highly purified materials [96]. In contrast, the neutron-SER is generally dominant at flight altitudes, where the cosmic-neutron flux is roughly two orders of magnitude higher than at terrestrial altitudes [96]. In this manuscript, the SER investigation only considers the condition at sea level.

The empirical model of [97] is used to characterize the different components contributing to the SER; it is given by

$$\mathrm{SER} = \kappa \, \Phi_{rad} \, A_{diff} \, \exp\left(-\frac{Q_C}{\eta}\right) \qquad (12)$$

where κ is an overall scaling factor and η is a measure of the charge collection efficiency for a given , depending on the given IC process technology. Both are obtained experimentally [98]. The model accounts for both the critical charge and the variation of the MOS diffusion area (Adiff) arising from the different circuit implementations. The term Φrad denotes the nominal flux of the given radiation. In this estimation, the alpha-SER and neutron-SER results adopt nominal fluxes of 0.001 alpha/h•cm² and 14 neutron/h•cm², respectively [96]. The critical charge (Qc) values are taken from the HSPICE simulation results, and the other relevant parameters of the empirical model, such as the overall scaling factor and the charge collection efficiency, take their respective values from [96][98].
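The empirical model of Eq. (12) can be evaluated with a one-line function. The sketch below assumes the usual form of this model, in which the exponent carries a negative sign so that a larger critical charge gives a lower SER. The fluxes are the nominal values quoted in the text; the scaling factor, diffusion area, collection parameter and critical charges are hypothetical placeholders, not the values used in this work.

```python
import math


def soft_error_rate(kappa, flux, area_diff, q_crit, eta):
    """Evaluate SER = kappa * flux * area_diff * exp(-q_crit / eta).

    Units are left to the caller; q_crit and eta must share the same unit.
    """
    if eta <= 0.0:
        raise ValueError("eta must be positive")
    return kappa * flux * area_diff * math.exp(-q_crit / eta)


# Nominal fluxes quoted in the text.
ALPHA_FLUX = 0.001   # alpha / (h * cm^2)
NEUTRON_FLUX = 14.0  # neutron / (h * cm^2)

if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    kappa, area_diff, eta_fc = 1.0, 1.0e-10, 2.0
    for q_crit_fc in (1.5, 2.0, 2.5):
        alpha = soft_error_rate(kappa, ALPHA_FLUX, area_diff, q_crit_fc, eta_fc)
        neutron = soft_error_rate(kappa, NEUTRON_FLUX, area_diff, q_crit_fc, eta_fc)
        print(f"Qc = {q_crit_fc} fC: alpha {alpha:.3e}, neutron {neutron:.3e}")
```

Because the dependence on the critical charge is exponential, raising Qc by one unit of η lowers the SER by a factor of e, while the dependence on the diffusion area is only linear.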

Table XVIII. Summary of SER (FIT/h•MBit) at Sea Level obtained from Evaluations at 32nm

                          6T      9T1R    Hardened NVSRAM Type-1    Hardened NVSRAM Type-2
Maximum of Alpha-SER      1531    1658    1438                      1459
Average of Alpha-SER      1442    1562    1355                      1374
Maximum of Neutron-SER    582     606     543                       548
Average of Neutron-SER    509     531     475                       482

Fig. 65. SER vs feature size for various memory cells.

The statistical results at the 32nm feature size are summarized in Table XVIII, which presents the maximum and average SER values in FIT/h•Megabit (FIT/h•MBit). The plots in Fig. 65 show how the SER values vary with the feature size. Notice that the SER in these figures includes the contributions from both alpha particles and neutrons. These evaluations show that the two proposed hardened NVSRAMs reduce the SER of each component by roughly 5% to 7% relative to the 6T cell at 32nm, demonstrating their robustness to soft errors.
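The relative changes implied by Table XVIII can be computed as follows. This is a minimal sketch that only uses the tabulated values; the dictionary layout and function name are illustrative.

```python
# SER values of Table XVIII (FIT/h*MBit) at 32 nm, sea level.
SER_32NM = {
    "6T": {"alpha_max": 1531, "alpha_avg": 1442,
           "neutron_max": 582, "neutron_avg": 509},
    "9T1R": {"alpha_max": 1658, "alpha_avg": 1562,
             "neutron_max": 606, "neutron_avg": 531},
    "Hardened Type-1": {"alpha_max": 1438, "alpha_avg": 1355,
                        "neutron_max": 543, "neutron_avg": 475},
    "Hardened Type-2": {"alpha_max": 1459, "alpha_avg": 1374,
                        "neutron_max": 548, "neutron_avg": 482},
}


def change_vs_6t_percent(cell, metric):
    """Relative SER change of `cell` with respect to the 6T cell, in percent."""
    reference = SER_32NM["6T"][metric]
    return 100.0 * (SER_32NM[cell][metric] - reference) / reference


if __name__ == "__main__":
    for cell in ("9T1R", "Hardened Type-1", "Hardened Type-2"):
        changes = ", ".join(
            f"{metric}: {change_vs_6t_percent(cell, metric):+.1f}%"
            for metric in SER_32NM["6T"]
        )
        print(f"{cell}: {changes}")
```

A negative value indicates a lower SER than the 6T cell; with the tabulated data, both hardened cells are negative for every metric, whereas the 9T1R cell is positive.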


3.7.6. Area

Fig. 66. Layout of the proposed 9T1R memory cell.

Fig. 67. Layout of the proposed Hardened Type-1 memory cell.


Fig. 68. Layout of the proposed Hardened Type-2 memory cell.

In a non-volatile memory [99], the resistive elements of the cells are always placed on a separate layer; hence, this evaluation only considers the layout of the transistors (i.e. no resistive element). For completeness of the analysis, [100][101] have reported Ox-based elements of square area with a side of 40 to 65 nm. These elements can be stacked over multiple planes for efficient layout implementation. Hence at 32nm, the resistive elements account for at most the area occupied by four transistors. Compared with a 6T volatile CMOS SRAM cell with an area of 1092λ² [3], the proposed 9T1R NVSRAM cell requires a larger area, i.e. 2990λ², with its layout (obtained using Cadence Virtuoso [89]) shown in Fig. 66. The two proposed hardened NVSRAM cells require four additional transistors as well as a RRAM, with their layouts shown in Fig. 67 and Fig. 68, respectively; in these cases, the area is approximately 4828λ². The area overhead is expected due to the larger circuit complexity needed to provide higher SEU tolerance and non-volatile operation.

3.8. Conclusion

This chapter has presented several designs for reducing the power dissipation and the leakage component of memory circuits using the Oxide RRAM technique. By implementing the RRAM, the proposed multiple-level memory cell (MLC), 7T1R and 9T1R memory cell circuits effectively reduce power dissipation through multiple-value data storage and the "Instant-on" scheme. While the proposed memory cell offers a significant advantage in area due to the smaller number of MOSFETs and Oxide resistive elements, it also incurs a degradation of the Write SNM (WSNM) due to the asymmetric nature of its design. Although the addition of a RRAM deteriorates the WSNM of the memory cell, its influence can be mitigated by varying the RESET resistance of the resistive element. This chapter has also shown that the proposed NVSRAM cell can implement multiple-context configurability, i.e. it is capable of storing and operating under multiple sets of configuration data for FPGA operation.


4. DYNAMIC RANDOM-ACCESS-MEMORY

4.1. Introduction

Reduced scaling in nanometric technology has the potential to substantially increase the density and performance of digital circuits and systems. However, process variations can severely affect the potential gains of a reduced feature size in CMOS transistors, because they may affect the stability of circuits such as memories and erroneously change a circuit behavior [102][103]. Innovative schemes are required to address these problems at all levels in the design flow.

In today's microprocessors, on-chip memory occupies a significant portion of the overall die area; it is extensively used to provide high system performance, while considering low power requirements. Dynamic memories have been extensively used for data storage structures in the core due to the transient nature of the data flow. Different designs of a Dynamic Random-Access Memory (DRAM) cell have been proposed; among them, the 3T1D DRAM cell [102] is a promising scheme due to its small area, non-destructive read process and good retention time. However, the operation of this cell is heavily influenced by process fluctuations [104][105][106] and externally induced phenomena.

Among these fluctuations, the so-called random dopant fluctuation (RDF) results from a process variation in the implanted impurity concentration and plays a significant role in CMOS performance [107]; the RDF in the channel region may alter the MOSFET properties, especially the threshold voltage [107]. As more advanced process technologies are utilized, the RDF has a stronger effect, because the total number of dopants is small and the addition or deletion of a few impurities can significantly alter the transistor properties [107]. In addition to process fluctuations, DRAMs are also susceptible to so-called soft errors due to externally induced upsets, such as those generated by alpha particles and atmospheric neutrons [108][109][110]. It has been shown in [111][112] that neutron-induced soft errors dominate over alpha particle upsets in the deep sub-micron range.

A further area of concern in memory design is the increase in leakage current (which contributes to power dissipation) due to short channel effects in which the drain potential lowers the source junction barrier to the minority carriers [113]. Although this phenomenon (referred to as drain induced barrier lowering, or DIBL) has been extensively investigated in the past [114][115][116], the leakage problem has become a prominent cause of high power consumption for CMOS technology at nano-scaled feature sizes.

New schemes such as those utilizing the provision of a gated diode in a 3T1D cell [102], have been proposed to mitigate the above problems for designing a reliable, low power and high retention DRAM cell. For example, forward body-biasing is capable of improving performance [102]; additional transistors can be introduced in the cell to improve the retention time and tolerance to process variations [102]. The addition of transistors also contributes to a better tolerance to a Single-Event Upset (SEU) [35][36]; however due to the small number of transistors compared to a SRAM cell, the provision of adding further transistors must be taken into account by considering also the concerns previously outlined for DRAM cell design.

In this chapter, DRAM circuit issues are investigated to improve robustness to soft errors and process variability. In addition, the RRAM is introduced by modifying the previous DRAM circuits to achieve non-volatile storage with the "instant-on" scheme, and extensive evaluations are included to demonstrate the improvements.

4.2. Proposed Volatile DRAM Cells

4.2.1. 4TI DRAM Cell

Fig. 69. Proposed 4TI DRAM cell circuit.

The first proposed DRAM design is shown in Fig. 69 and is referred to as the 4T Improved DRAM cell (denoted as 4TI). Two control signals (Ctrl1 and Ctrl2) are required. Ctrl1 is used in the same way as the Control signal of the 4T DRAM cell to refresh the stored data at the storage node. Ctrl2 is connected to the body of the transistor T4, replacing the connection to RWL originally proposed in [102]. Ctrl2 is therefore equivalent to RWL, but it has a different (lower) voltage value to improve the "Read" operation. The simulation of Fig. 70 shows the correct operation of the proposed 4TI DRAM cell using the parameters in Table XIX and a 45nm MOSFET feature size.

Table XIX. Parameters for DRAM Cell Simulation

Parameter              Value
Temperature            25 °C
MOSFET Feature Size    45 nm
Vdd                    1 V
Control                1 V
Ctrl1                  1 V
Ctrl2                  0.1 V, 1 V

Fig. 70. Simulated waveforms of 4TI DRAM cell at 45nm.


The body biasing voltage of transistor T4 in the 4TI DRAM cell has been varied to characterize its influence on the transistor threshold voltage, as a further parameter affecting the "Read" operation. The simulated results are shown in Fig. 71 (again using the parameters of Table XIX). The voltage of RBL is simulated for the "Read" 0 operation by changing the voltage value of Ctrl2. The RBL voltage decreases from 220mV to 0V when the voltage of Ctrl2 increases from 0V to 0.4V. This makes possible a correct "Read" operation by the Sense Amplifier (SA) at the output of the memory.

Fig. 71. Simulated RBL voltage vs Ctrl2 voltage for body of transistor T4 at 45nm.

From the timing diagrams (Fig. 70), the first two cycles perform the "Write" and "Read" 0 operations, respectively. During the "Read" operation, although RBL is discharged, its voltage is still significantly different from the voltage of the "Read 1" operation. Therefore, this voltage difference can be recognized by the Sense Amplifier to read the stored data at the output. This also shows that the forward body-biasing technique is effective in improving the "Read" operation.
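The effect of forward body biasing on the threshold voltage can be illustrated with the textbook first-order body-effect relation, V_T = V_T0 + γ(√(2φ_F − V_BS) − √(2φ_F)). This relation is not taken from the original work, which relies on HSPICE simulation with PTM models, and all parameter values below are hypothetical. The sketch only shows the qualitative trend: a positive body-to-source voltage (such as the one applied through Ctrl2) lowers the threshold voltage of an NMOS transistor.

```python
import math


def threshold_voltage(v_bs, v_t0=0.4, gamma=0.2, two_phi_f=0.8):
    """First-order NMOS threshold voltage (V) versus body-to-source voltage (V).

    v_t0, gamma (V^0.5) and two_phi_f (V) are hypothetical illustration values.
    """
    if v_bs >= two_phi_f:
        raise ValueError("v_bs must be below two_phi_f for this model")
    return v_t0 + gamma * (math.sqrt(two_phi_f - v_bs) - math.sqrt(two_phi_f))


if __name__ == "__main__":
    for v_bs in (0.0, 0.1, 0.2, 0.3, 0.4):
        print(f"V_BS = {v_bs:.1f} V -> V_T = {threshold_voltage(v_bs):.4f} V")
```

A lower threshold voltage strengthens the transistor for a given gate voltage, which is consistent with the improved "Read" behavior reported for increasing Ctrl2.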

4.2.2. 4T1D DRAM Cell

Fig. 72. Proposed 4T1D DRAM cell circuit.


The proposed 4T1D DRAM cell, shown in Fig. 72, incorporates the gated diode technique; in this cell, an additional gated diode is connected to the SN to improve its retention time and speed of operation. The two signals Ctrl1 and Ctrl2 use the same parameters as in Fig. 69. As for the 4TI cell, the forward body-biasing technique is still used to improve the "Read" operation. The simulated operation of the proposed 4T1D cell is given in Fig. 73. The timing diagrams of the signals are very similar to the ones in Fig. 70, confirming the correct operations of "Write" and "Read". The voltage on RBL changes, because the value of Ctrl2 is now given by 0.1V.

Fig. 73. Simulated waveform of 4T1D DRAM cell at 45nm.

4.3. Evaluation and Analysis of Proposed Volatile DRAM Cells

The volatile DRAM cells are simulated next to evaluate their performance using different figures of merit. Different SRAM cell schemes (6T, 7T [118] and 8T [119]) are also simulated for comparison purposes. The MOSFET feature size is varied from 45nm to 10nm using the corresponding PTMs.

4.3.1. Performance


Fig. 74. Average write delay vs type of memory cell.

Initially, the delay values for the "Read" and "Write" operations for 0 and 1 are established. Fig. 74 and Fig. 75 show the average values, together with each delay value for both the "Write" and "Read" 0/1 operations (bold entries denote the best values). The transistor T4 introduced in a 4T DRAM cell improves the retention time, but it also incurs a penalty in the write access time [102] compared with the 3T1D cell. Moreover, the use of the two control signals for transistor T4 in the 4TI cell results in an improved average "Read" delay compared to a 3T1D cell, because the forward-biased body is beneficial to this operation. However, due to the small voltage value used for the Ctrl2 signal (i.e. 0.1V), the improvement in the "Read" operation is less than for the 4T cell of [102]. The proposed 4T1D cell has the least average "Read" delay among the four DRAM cell types by incorporating both the gated diode and the body biasing voltage of transistor T4. The evaluation of the 6T SRAM is also included. Compared with the DRAM cells, the 6T presents a few advantages, but it also has a few disadvantages, such as a larger number of transistors. The other SRAM cells (such as the 7T and 8T) also show better performance than the DRAMs.


Fig. 75. Average read delay vs type of memory cell.

4.3.2. Power Dissipation

Power dissipation is defined for each operational cycle and is shown in Fig. 76; there is no significant difference among the cells, because they incorporate four devices in their operation. On a marginal basis, the value of the proposed 4TI cell is the best, since it uses two control signals and a low voltage value for one of them, thus reducing the dynamic power consumption during a "Read". Compared with the other DRAM cells, the proposed 4T1D cell has the largest power consumption due to the largest number of devices in its design; however, the difference is very marginal.

Fig. 76. Power dissipation vs type of memory cell.


The three SRAM cells incur larger power dissipation, because they require a larger number of transistors. Therefore, the proposed DRAM cells have the advantage of low power consumption when compared with the SRAM.

4.3.3. Retention Time

Fig. 77. Retention time vs various feature sizes for DRAMs.

The retention time is defined as the time during which the voltage at the storage node remains above the minimum voltage Vmin required to distinguish the (high voltage value) "1" state from the (low voltage value) "0" state [120]. The low retention time of a 3T1D cell (Fig. 77) is mostly caused by the sub-threshold leakage due to the weak read access transistor [102], leading to most of the charge flowing into the bit lines. The retention time of the proposed 4TI cell is significantly higher than that of the 3T1D cell, because it effectively mitigates the leakage. The proposed 4T1D DRAM cell has the highest value, because the gated diode significantly increases the internal capacitance at the Storage Node. As for power dissipation, the reduction in feature size reduces the retention time (due to the decrease of the internal capacitance of the transistors).

4.3.4. Critical Charge

The charges at the internal nodes of the DRAM cells are also evaluated as a measure of their tolerance to soft errors. The two nodes W and SN (Fig. 69 and Fig. 72) represent the critical pair for which the minimum charges (Qw and Qs, respectively) required to flip the stored data are found. The tolerance to soft errors of the SRAM cells (6T, 7T and 8T schemes) is better than that of the DRAMs due to the cross-coupled inverter circuit. Therefore, the discussion in this section concentrates only on the evaluation of the DRAMs.

Table XX. Charges of DRAM Cell Types

Cell   Node   Charge (aC)
              45 nm   32 nm   22 nm   16 nm   14 nm   10 nm
3T1D   Qs     19.55   16.33   14.28   7.303   3.867   2.144
4T     Qs     10.71   7.946   6.043   3.091   1.636   0.907
       Qw     106.4   99.21   92.86   47.49   25.14   13.94
4TI    Qs     18.43   15.67   13.42   6.863   3.634   2.015
       Qw     122.6   115.4   110.8   56.67   30.00   16.63
4T1D   Qs     48.96   44.23   24.75   12.66   6.701   3.715
       Qw     140.3   131.9   73.80   37.74   19.98   11.08

The charges of the critical pairs are shown in Table XX; the value of Qs is significantly smaller than Qw for the 4T, 4TI and 4T1D DRAM cells, i.e. node W has a better tolerance to soft errors than the Storage Node (SN). Note that the 3T1D DRAM cell has only the SN node (no node W). Because the body of transistor T4 is connected to RWL in the 4T cell, its tolerance to soft errors deteriorates compared with the proposed 4TI cell. These results and findings have been confirmed by simulation. Finally, the proposed 4T1D cell has the highest Qs at every feature size and the highest Qw at 45 nm and 32 nm (Table XX), because D1 is connected to the SN as a gated diode and increases the node capacitance, leading to an improved tolerance to soft errors.
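The comparison of critical pairs can be illustrated with a short Python sketch. It is not part of the simulation flow used in this work; it only re-reads the 45 nm column of Table XX and, for each cell, picks the node with the smaller charge as the critical one. The names CHARGES_45NM, critical_charge and rank_by_tolerance are hypothetical and introduced here for illustration only.

```python
# Critical-pair charges at 45 nm, copied from Table XX (in aC).
# None marks a node that does not exist in the cell (3T1D has no node W).
CHARGES_45NM = {
    "3T1D": {"Qs": 19.55, "Qw": None},
    "4T":   {"Qs": 10.71, "Qw": 106.4},
    "4TI":  {"Qs": 18.43, "Qw": 122.6},
    "4T1D": {"Qs": 48.96, "Qw": 140.3},
}


def critical_charge(nodes):
    """Return (node_name, charge) for the node that is easiest to upset."""
    present = {name: q for name, q in nodes.items() if q is not None}
    name = min(present, key=present.get)
    return name, present[name]


def rank_by_tolerance(table):
    """Sort cell names from most to least tolerant (largest critical charge first)."""
    return sorted(table, key=lambda cell: critical_charge(table[cell])[1], reverse=True)


if __name__ == "__main__":
    for cell in rank_by_tolerance(CHARGES_45NM):
        node, charge = critical_charge(CHARGES_45NM[cell])
        print(f"{cell}: critical node {node}, {charge} aC")
```

With the 45 nm data, the critical node is the SN for every cell, and the 4T1D cell ranks first.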

4.3.5. Area


Fig. 78. Layout of the 3T1D DRAM cell of [102].

Fig. 79. Layout of the 4T DRAM cell of [102].

Fig. 80. Layout of the BBMOS as part of a 4T/4TI DRAM cell.

Fig. 81. Layout of the proposed 4T1D DRAM cell.

The layouts of the four DRAM cells have been designed using Cadence Virtuoso [89]. The layout of the 3T1D cell is shown in Fig. 78; in general, the NMOS transistors are constructed on a p-type substrate that has a shared body [89]. However, due to the utilization of the forward body-biasing technique, the transistor T4 in the 4T cell is constructed on a separate p-type well, because the body voltage of T4 will be changed. Hence, the layout of the 4T cell is split into two parts, as shown in Fig. 79 and Fig. 80. Fig. 80 shows the NMOS transistor on a separate p-type well, referred to as the Body Biasing MOS (BBMOS). The 4TI has the same layout as the 4T cell, because the only difference between them is the biasing voltage. Finally, the layout of the proposed 4T1D DRAM is presented in Fig. 81 (to which the BBMOS of Fig. 80 must be added). The comparative results are shown in Fig. 82 based on the areas of the 3T1D, 4T/4TI and 4T1D cells (given by 1162λ², 1304λ² and 1531λ², respectively) as well as the areas of the 3T [3], 6T [3], 7T [118] and 8T [119] cells. SRAMs account for a significant area penalty. The improvements in performance as per different figures of merit and tolerance to soft errors (achieved by using forward body-biasing and/or a gated diode for a 4T1D DRAM cell) incur an area overhead for the DRAMs.

Fig. 82. Area vs type of memory cell.

4.3.6. Process Variability

Process variability is evaluated in this section by mostly focusing on random dopant fluctuation (RDF) and threshold voltage (VT) variation at the nanoscale.

Table XXI. Standard deviation for variability (in percentage) of each Technology Node

Parameter   σ (%)
            45nm   32nm   22nm   16nm
Leff        2%     2%     2.5%   3%
VT          4%     6%     8%     10%

The process variability of the DRAM cells is evaluated in the following by Monte Carlo simulation. Table XXI shows the variability (measured by the standard deviation in percentage) of VT and Leff for the technology nodes as reported in [90]. A Gaussian distribution is assumed, as characterized by the mean value (µ) and the standard deviation (σ) from that mean. Hence, the 3σ/µ ratio (expressed as a percentage of the nominal value) will be used to quantify the variability.
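The 3σ/µ figure of merit can be sketched in a few lines of Python. This is only an illustration of the procedure, not the HSPICE Monte Carlo setup used in this work: the delay function is a hypothetical alpha-power-law placeholder, and the nominal VT of 0.4 V, the supply of 1.0 V and the exponent 1.3 are assumed values chosen for the example. Only the σ(VT) percentages are taken from Table XXI.

```python
import random
import statistics


def three_sigma_over_mu(samples):
    """Return the 3*sigma/mu ratio of the samples, in percent."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return 100.0 * 3.0 * sigma / mu


def toy_delay(vt, vdd=1.0, alpha=1.3):
    """Hypothetical alpha-power-law delay in arbitrary units (NOT the HSPICE model)."""
    return vdd / (vdd - vt) ** alpha


def monte_carlo_vt(vt_nominal, sigma_percent, runs=10_000, seed=0):
    """Draw Gaussian VT samples and return the 3*sigma/mu of the toy delay."""
    rng = random.Random(seed)
    sigma = vt_nominal * sigma_percent / 100.0
    delays = [toy_delay(rng.gauss(vt_nominal, sigma)) for _ in range(runs)]
    return three_sigma_over_mu(delays)


if __name__ == "__main__":
    # sigma(VT) per technology node, taken from Table XXI.
    for node, sigma_pct in [("45nm", 4), ("32nm", 6), ("22nm", 8), ("16nm", 10)]:
        print(node, round(monte_carlo_vt(0.4, sigma_pct), 2))
```

In a circuit-level flow, toy_delay would be replaced by the delay or retention time measured from each simulation run; the 3σ/µ computation itself is unchanged.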

Table XXII. Variability (in percentage) of each transistor in 4T1D cell at 45nm

3σ/µ (%)                  T1      T2      T3      T4      D1
Leff   Write Delay        3.135   0.457   0.256   5.004   0.119
       Read Delay         2.138   4.589   13.47   1.515   1.023
       Retention Time     17.02   1.539   0.213   19.82   8.539
VT     Write Delay        1.308   0.180   0.032   8.244   0.092
       Read Delay         1.302   0.583   27.92   7.749   1.628
       Retention Time     0.942   0.955   0.220   3.485   0.296

Next, simulation is performed to study the influence of the variability of each individual transistor/diode on the performance of the whole DRAM cell. Variations of the effective channel length (Leff) and the threshold voltage (VT) using the data of Table XXI are introduced to determine their effect. The simulation results at 45nm feature size (with the parameters in Table XIX) are reported in Table XXII to show the variability (given in percentage) caused by each device on the performance of the proposed 4T1D DRAM cell as per different metrics. The simulation results for the other three DRAM cells are shown in Table XXIII. For the 4T1D cell, T4 has the most significant impact on the "Write" delay and retention time (followed by T1), while the "Read" delay is mostly influenced by a variation of T3; this applies independently of the varied parameter, i.e. for both Leff and VT. Similar dependencies are exhibited by the 4T and 4TI cells. For the 3T1D cell, T1 is critical, because it has the most significant impact on the "Write" delay and the retention time.

Table XXIII. Variability (in percentage) of each transistor in the other three DRAM cells at 45nm

3σ/µ (%)
Parameter   Cell   Metric           T1      T2      T3      D1(T4)
Leff        3T1D   Write Delay      5.347   0.673   0.241   0.164
                   Read Delay       2.597   5.601   15.38   1.049
                   Retention Time   26.73   3.750   2.611   9.603
            4T     Write Delay      3.983   0.436   0.215   6.218
                   Read Delay       2.197   3.651   20.57   1.476
                   Retention Time   17.59   2.597   1.636   24.33
            4TI    Write Delay      3.849   0.417   0.179   5.955
                   Read Delay       1.959   2.473   16.43   1.571
                   Retention Time   17.30   2.601   1.349   22.61
VT          3T1D   Write Delay      9.469   0.609   0.127   0.289
                   Read Delay       2.394   7.592   29.46   2.677
                   Retention Time   8.212   0.783   0.236   1.576
            4T     Write Delay      5.871   0.780   0.459   11.54
                   Read Delay       1.962   3.583   27.61   2.769
                   Retention Time   3.594   1.796   0.254   6.377
            4TI    Write Delay      5.774   0.732   0.483   10.68
                   Read Delay       1.865   2.203   25.79   2.873
                   Retention Time   3.323   1.664   0.242   6.028

On a comparative basis, the 4T1D cell shows the least impact of process variation at 45 nm; the VT variation significantly affects the "Write" operation, while the Leff variation significantly affects the retention time. Moreover, the "Read" delay is influenced by both parameter variations, and T3/T4 are the most critical transistors for all four memory cells. Hence, T3 for the "Read" operation and T4 for both the "Write" operation and the retention time of the 4T1D, 4T and 4TI cells will be investigated further by reducing the feature size.

The DRAM cells are simulated to characterize the process variability by reducing the feature size and by concentrating on the most critical transistor variations (as reported previously): 1) T4 is varied for both the "Write" delay and the retention time of the 4T1D, 4T and 4TI cells. 2) T1 is varied for both the "Write" delay and the retention time of the 3T1D cell. 3) T3 is varied for the "Read" delay of all four memory cells.

Table XXIV. Variability (in percentage) of four DRAM cells for various MOSFET feature size

3σ/µ (%)
Parameter   Cell   Feature Size   Write Delay   Read Delay   Retention Time
Leff        4T1D   45nm           5.004         13.47        19.82
                   32nm           6.943         19.82        23.03
            3T1D   45nm           5.347         15.38        26.73
                   32nm           7.535         22.72        31.27
            4T     45nm           6.218         20.57        24.33
                   32nm           8.059         29.88        29.77
            4TI    45nm           5.955         16.43        22.61
                   32nm           7.844         23.97        27.12
VT          4T1D   45nm           8.244         27.92        3.485
                   32nm           13.50         35.74        5.820
            3T1D   45nm           9.469         29.46        8.212
                   32nm           14.82         36.88        13.92
            4T     45nm           11.54         27.61        6.377
                   32nm           17.39         35.56        11.34
            4TI    45nm           10.68         25.79        6.028
                   32nm           16.77         34.95        10.89

The results are presented in Table XXIV; the 3σ/µ ratio increases as the MOSFET feature size decreases, thus having a more pronounced impact on cell performance. Also, the proposed 4T1D cell shows the least variability for the "Write" delay and the retention time compared with the other three DRAM cells. This is caused by the utilization of both forward body-biasing and the gated diode. The proposed 4TI cell improves over the 4T cell of [102] (which, together with the 3T1D cell, is affected the most in terms of the three performance metrics considered, i.e. "Write" delay, "Read" delay and retention time). As expected, the "Read" operation is substantially affected by the variation of both Leff and VT, while the retention time is more affected by Leff. Additionally, the reduction in feature size increases the variability in all cases. Finally, in nearly all cases (the exception in Table XXIV being the "Read" delay under VT variation), the proposed 4T1D DRAM cell shows the least variability, thus confirming its viability for implementation.

4.4. Proposed Non-volatile DRAM Cells

In this section, the feature of non-volatile operation is added to a DRAM cell and new cell designs are proposed; the 3T1D and B3T DRAM cells are used as DRAM cores in the proposed non-volatile cells. Also, related considerations (such as the read output circuit and the threshold voltage for the "Refresh" operation) are treated.

Table XXV shows the voltage signal biasing for the B3T DRAM cell [122]; the voltage at WWL during the "Write" operation is given by -500mV to improve this operation [122]. This 3T gain cell achieves high performance and low static power dissipation using a boost technique [121]; it incurs a lower area penalty than the 3T1D cell. Moreover, the cell of [121] has been compared with the 3T and 3T1D cells by implementing all three with PMOS transistors, and it achieves improvements in retention time, power dissipation and operational speed.

Table XXV. Signal voltages used for DRAM Cells

Cell              Operation   WWL      RWL   CTRL1   CTRL2
B3T               Write       -500mV   Vdd   -       -
                  Idle        Vdd      Vdd   -       -
                  Read        Vdd      0     -       -
Proposed 4T1D1R   Reset       Vdd      0     Vdd     Vdd
                  Store       Vdd      0     Vdd     0
                  Idle        0        0     0       0
                  Restore     0        0     Vdd     Vdd
                  Read        0        Vdd   0       0
Proposed 4T1RP    Reset       0        Vdd   0       Vdd
                  Store       -500mV   Vdd   0       0
                  Idle        Vdd      Vdd   Vdd     0
                  Restore     Vdd      Vdd   0       Vdd
                  Read        Vdd      0     Vdd     0

Fig. 83. Proposed non-volatile 4T1D1R DRAM cell.

An Oxide-based RRAM is added to the 3T1D DRAM core reviewed in a previous section for achieving non-volatile operation. In this new cell (Fig. 83), the 3T1D DRAM core is connected to a 1T1R circuit that consists of a non-volatile resistive element (RRAM1) and a control transistor T4; the 1T1R is connected directly to the data storage node of the DRAM core and stores the non-volatile information, i.e. RRAM1 changes its resistance between the LRS and the HRS according to the data written into the Storage Node (SN). Therefore, this non-volatile cell is referred to as a 4T1D1R NVDRAM cell.

To achieve non-volatile operation, the proposed memory cell has two basic states: "Idle" and "Power-up". The "Power-up" state requires the "Reset", "Store" and "Restore" operations. Thus, a full operational cycle of this memory cell utilizes the following sequence: "Reset", "Store", "Idle" and "Restore". In this design, the cell has two Word Lines (WLs) and two Bit Lines (BLs), similar to a conventional (volatile) 3T1D cell [104]. The Write Word Line (WWL) and the Write Bit Line (WBL) are utilized in the "Store" operation, and the Read Word Line (RWL) and the Read Bit Line (RBL) are utilized in the "Read" operation. The operations of this cell are given as follows (using the parameters of Table XXV).

• Reset: RRAM1 is changed back to HRS ahead of writing to the memory core; this is accomplished by the RESET process. During this step, a "1" is applied on CTRL1 to turn ON T4; a "1" is also applied to CTRL2. If the logic value stored at SN is "1" and the state of RRAM1 is HRS, a RESET is not required. If SN is "0", the negative voltage drop on RRAM1 causes the device to be reset and thus, it changes its state from LRS to HRS.

• Store: T4 is turned ON and CTRL2 is at 0V to store the information into the memory core. Hence, the positive voltage drop on RRAM1 changes the state from HRS to LRS. Thus, at completion of this step, RRAM1 presents different resistance states corresponding to the stored information: if SN is "1", it is in LRS; if SN is "0", it is in HRS.

• Idle: Non-volatile storage is accomplished by the different resistance states of RRAM1. All signals are turned OFF to ensure the retention of this information during "Idle". T4 is turned OFF, because it is controlled by CTRL1. This prevents RRAM1 from changing state during the "Restore" operation.

• Restore: In the "Restore" operation, T4 is turned ON by CTRL1; CTRL2 is also high. If RRAM1 is in LRS, then SN is charged to "1". If RRAM1 is in HRS, due to the large resistance value of RRAM1, the SN remains at "0".

• Read: During the "Read" operation, the value of RWL turns ON T3 to sense the data stored at the SN.

By utilizing the above operations and related sequences, the proposed non-volatile cell is capable of storing the data into RRAM1 and restoring it back ahead of the ―Read‖ operation.
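The sequence of operations above can be mirrored by a logic-level Python sketch. It is a behavioral illustration only and does not model voltages, timing or the HSPICE RRAM model. Two simplifications are assumed: "Reset" always leaves RRAM1 in HRS (the stated purpose of that step), and the charge at the SN is lost during "Idle". The class name NVDramCell and the helper full_cycle are hypothetical.

```python
from dataclasses import dataclass

HRS, LRS = "HRS", "LRS"


@dataclass
class NVDramCell:
    """Logic-level sketch of a 4T1D1R cell: one storage node plus one RRAM."""
    sn: int = 0        # logic value at the Storage Node (volatile)
    rram: str = HRS    # resistance state of RRAM1 (non-volatile)

    def reset(self):
        # Assumed simplification: RRAM1 always ends in HRS after "Reset".
        self.rram = HRS

    def store(self, bit):
        # The bit is written to SN; with T4 ON and CTRL2 at 0V,
        # only a stored "1" programs RRAM1 from HRS to LRS.
        self.sn = bit
        if bit == 1:
            self.rram = LRS

    def idle(self):
        # Assumed simplification: the SN charge is lost, RRAM1 keeps its state.
        self.sn = 0

    def restore(self):
        # T4 ON and CTRL2 high: LRS charges SN to "1", HRS leaves SN at "0".
        self.sn = 1 if self.rram == LRS else 0

    def read(self):
        # RWL turns ON T3 to sense the data at SN.
        return self.sn


def full_cycle(bit):
    """Run Reset, Store, Idle, Restore and Read for one data bit."""
    cell = NVDramCell()
    cell.reset()
    cell.store(bit)
    cell.idle()
    cell.restore()
    return cell.read()


if __name__ == "__main__":
    print(full_cycle(0), full_cycle(1))
```

In this sketch, a "Read" issued after "Idle" but before "Restore" returns "0" regardless of the stored bit, which is why "Restore" must precede "Read".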


Fig. 84. Proposed non-volatile 4T1RP DRAM cell.

A 4T1RP DRAM cell is also proposed (Fig. 84); this cell utilizes the B3T as DRAM core. In this non-volatile DRAM cell, all four transistors are implemented with PMOS to take advantage of the high performance of the B3T core. The voltages of the different signals are also shown in Table XXV.

Fig. 85. Proposed Read Output Circuit for non-volatile DRAM cell.

In addition to the DRAM cells, a peripheral Read Output circuit is also proposed in this manuscript (Fig. 85). Because a DRAM cell has a small voltage swing, this Read Output circuit utilizes a differential sense amplifier; it receives small-signal differential inputs and amplifies them to a large-signal single-ended output. The input "+RBL" is connected to the RBL of the cell, while "-RBL" corresponds to its inverse. Also, Signal Enable (SE) is used to sense the voltage on RBL. This circuit is robust, owing to its common-mode rejection of the noise generated by switching spikes on the supply voltages and by the capacitive crosstalk between Word Lines (WLs) and Bit Lines (BLs); the above advantages are also applicable to an SRAM and to the utilization of a single RBL.

In the proposed non-volatile cells, the SN is discharged during the "Idle" state due to leakage. Although the "Restore" operation can be used to correct errors in the stored data, a threshold voltage of the SN is used to trigger the "Refresh" operation and so overcome the voltage reduction due to leakage. Leakage at the SN consists of two components: the subthreshold leakage through transistor T4 and the gate-source leakage through transistor T2. Both impact performance metrics, such as the retention time. In the proposed cells, leakage may cause a significant drop of the voltage when the cell stores a "1" at the SN. So, if RBL is not discharged during the "Read" operation, the Read Output circuit may generate an erroneous output based on the voltage value on RBL. Leakage has no significant effect on the SN voltage when the cell stores a "0"; therefore, the state of storing a "1" is used for defining the threshold voltage for the "Refresh" operation. Using HSPICE and the parameters of Table XIX, this voltage has been found to be 455mV, i.e. in an NVDRAM at 45nm, the stored data must be updated by using a global "Refresh" operation when the SN voltage reduces to 455mV.
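The point at which a "Refresh" becomes necessary can be extracted from a simulated SN waveform by locating the first crossing of the threshold. The Python sketch below shows one way to do this with linear interpolation between samples; the function name time_to_threshold and the sample waveform are hypothetical and do not reproduce simulation data from this work. Only the 455mV threshold is taken from the text.

```python
def time_to_threshold(times, voltages, v_threshold):
    """Return the first time at which the waveform falls to v_threshold.

    The crossing is located by linear interpolation between adjacent samples.
    None is returned if the waveform never reaches the threshold.
    """
    if len(times) != len(voltages) or not times:
        raise ValueError("times and voltages must be non-empty and of equal length")
    if voltages[0] <= v_threshold:
        return times[0]
    for i in range(1, len(times)):
        v_prev, v_now = voltages[i - 1], voltages[i]
        if v_prev > v_threshold >= v_now:
            fraction = (v_prev - v_threshold) / (v_prev - v_now)
            return times[i - 1] + fraction * (times[i] - times[i - 1])
    return None


if __name__ == "__main__":
    # Hypothetical SN discharge samples (time in ns, voltage in V).
    t_ns = [0.0, 200.0, 400.0, 600.0, 800.0]
    v_sn = [1.0, 0.8, 0.6, 0.4, 0.2]
    print(time_to_threshold(t_ns, v_sn, 0.455))
```

The returned time is the latest moment at which a global "Refresh" could still be issued for that waveform.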

4.5. Evaluation and Analysis of Proposed Non-volatile DRAM Cells

The two proposed NVDRAM cells and the volatile DRAM cells of [102] and [121] are evaluated next. These DRAM cells have been simulated using HSPICE. The 3T1D [102] and the B3T [121] cells are used as DRAM cores in the NVDRAM cells proposed in the previous sections. It is assumed that the sizing of a transistor (i.e. W/L) is 1:1 for the 3T1D and 4T1D1R cells and 2:1 for the B3T and 4T1RP cells.

Fig. 86. Simulated waveforms of 4T1D1R cell for ―1‖ at 45nm.

Based on the previously presented sequence, the operation of the 4T1D1R NVDRAM cell is initially simulated (shown in Fig. 86 and Fig. 87, using the parameters of Table XIX and the corresponding predictive technology model (PTM) [58]). In the simulations, the period of the "Idle" state is set to 20ns, the same as for the other states; this period can be extended depending on the discharge process of the SN voltage, until the previously defined threshold voltage is reached. Note that a "Reset" operation is assumed to have been executed ahead of these operations (as also illustrated in the subsequent timing diagrams of the simulations).

Fig. 87. Simulated waveforms of 4T1D1R cell for ―0‖ at 45nm.

Fig. 86 shows the timing diagrams of a cell for the "Store" and "Read" "1" operations; RRAM1 is programmed by changing its state to LRS in the first cycle (i.e. the "Store" state). The voltage of the SN drops to "0" when the WWL is turned OFF, because CTRL1 turns T4 ON and CTRL2 is "0". In the second cycle, the cell is in the "Idle" state. The logic value of "1" is rewritten in the third cycle (the "Restore" state), when CTRL1 and CTRL2 are both "1". The rewritten value is then read out and the RBL is discharged in the fourth cycle; in the "0" case, by contrast, the RBL remains at the precharged value of "1". The timing diagrams for storing and reading a "0" are shown in Fig. 87 for the same sequence of operations as for a "1" in Fig. 86. The 4T1RP NVDRAM cell and the proposed Read Output circuit have also been demonstrated using the data of Table XIX.

4.5.1. Power Dissipation

The 3T1D and B3T DRAM cells and the 4T1D1R and 4T1RP NVDRAM cells are evaluated and compared for power dissipation (using the parameters of Table XIX). The simulation results are shown in Table XXVI. Compared to the "Write" operation of the 3T1D cell [102], the "Store" operation of the NVDRAM cells accounts for an increase in power dissipation. Although the power required for programming the RRAM is less than 0.1pJ [17], the 4T1D1R NVDRAM cell consumes a significantly larger amount of power during "Store" due to the additional transistor when compared with the 3T1D circuit. Similar considerations also apply to the proposed 4T1RP and B3T cells (as implemented by PMOS transistors).

Table XXVI. Power Dissipation of DRAM Cells

Cell     Operation   Power Dissipation (nJ)
                     45 nm   32 nm   22 nm
3T1D     Write       1245    1002    810.9
         Idle        936.1   792.6   625.2
         Read        1376    1095    879.2
B3T      Write       1573    1265    1016
         Idle        1072    908.4   716.5
         Read        1482    1179    946.9
4T1D1R   Store       1384    1054    845.9
         Idle        969.3   811.6   640.7
         Restore     1022    870.1   691.3
         Read        1422    1140    916.4
4T1RP    Store       1748    1331    1059
         Idle        1111    930.2   734.3
         Restore     1249    1063    845.5
         Read        1531    1227    987.1

A similar condition is also encountered for the "Read" operation. Although the second cycle in the timing diagram of Fig. 86 corresponds to "Idle", an NVDRAM cell incurs a power overhead due to leakage through the non-volatile element; the 3T1D core achieves the least power dissipation for the "Idle" state, because this core is effective in reducing leakage. Similarly, the 4T1RP NVDRAM cell has the largest power dissipation for both the "Idle" and the "Read" operations due to the B3T cell as DRAM core.
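The overhead of each NVDRAM cell relative to its DRAM core can be quantified directly from Table XXVI. The following Python sketch does so for the 45 nm column; it assumes that "Store" is compared with the "Write" of the core, and the names CORE, NV, PAIRS, OP_MAP and overhead_percent are hypothetical helpers introduced for this illustration.

```python
# 45 nm values copied from Table XXVI (units as given in the table).
CORE = {
    "3T1D": {"Write": 1245, "Idle": 936.1, "Read": 1376},
    "B3T":  {"Write": 1573, "Idle": 1072,  "Read": 1482},
}
NV = {
    "4T1D1R": {"Store": 1384, "Idle": 969.3, "Read": 1422},
    "4T1RP":  {"Store": 1748, "Idle": 1111,  "Read": 1531},
}
# Each NVDRAM cell is paired with the DRAM core it is built on.
PAIRS = {"4T1D1R": "3T1D", "4T1RP": "B3T"}
# Assumption: "Store" of an NVDRAM cell is compared with "Write" of its core.
OP_MAP = {"Store": "Write", "Idle": "Idle", "Read": "Read"}


def overhead_percent(nv_cell):
    """Return the percentage increase of each operation over the DRAM core."""
    core = CORE[PAIRS[nv_cell]]
    return {
        op: 100.0 * (value / core[OP_MAP[op]] - 1.0)
        for op, value in NV[nv_cell].items()
    }


if __name__ == "__main__":
    for cell in NV:
        print(cell, {op: round(pct, 2) for op, pct in overhead_percent(cell).items()})
```

For both NVDRAM cells the largest relative increase at 45 nm occurs for "Store", in line with the discussion above.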

4.5.2. Performance

The performance of the DRAM and NVDRAM cells under different metrics is assessed and shown in Table XXVII; the resistance of the HRS of the 4T1D1R NVDRAM cell is assumed to be initially 1MΩ. The non-volatile element mostly affects "Store" (which takes longer than the "Write" of the 3T1D DRAM cell) due to the programming of the added RRAM; the B3T cell (and therefore the 4T1RP NVDRAM cell) has the worst "Write"/"Store" performance due to the larger internal resistance of a PMOS compared to an NMOS, which causes a longer charge/discharge process.

Table XXVII. Performance of DRAM Cells

Cell     Operation     Delay (ps)
                       45 nm   32 nm   22 nm
3T1D     Write "0"     120.5   103.1   89.43
         Write "1"     189.1   171.6   158.0
         Read "0"      211.2   187.3   172.1
         Read "1"      253.0   229.1   213.9
B3T      Write "0"     183.9   157.3   136.4
         Write "1"     282.0   255.9   235.6
         Read "0"      203.5   180.4   165.3
         Read "1"      188.3   175.6   162.4
4T1D1R   Store "0"     159.9   141.5   123.8
         Store "1"     250.9   235.5   218.7
         Restore "0"   117.6   99.25   84.71
         Restore "1"   148.4   125.3   106.9
         Read "0"      215.6   191.2   175.7
         Read "1"      260.3   235.9   220.3
4T1RP    Store "0"     224.0   215.9   188.9
         Store "1"     374.2   351.2   326.2
         Restore "0"   156.0   136.3   117.1
         Restore "1"   196.9   171.4   147.8
         Read "0"      207.5   184.2   169.3
         Read "1"      193.7   175.6   164.0

"Read" is also affected by the non-volatile element, but not as significantly as "Store", because during this operation the RRAM is not connected directly to the SN. The "Read" operation of the B3T cell is boosted by the gate capacitance coupling and achieves the smaller delay of the two cores. The 4T1RP NVDRAM cell also achieves good performance for this operation by taking advantage of the fast "Read" of the B3T cell as its core.

The "Restore" operation is evaluated only for the two NVDRAM cells; as expected, the proposed 4T1D1R NVDRAM cell has a faster "Restore" than the 4T1RP cell.

4.5.3. Retention Time

Table XXVIII. Retention Time and Charge of DRAM Cells

Cell     Metric                45 nm   32 nm   22 nm
3T1D     Retention Time (ns)   1112    613.7   319.4
         QS (aC)               26.49   22.13   19.35
B3T      Retention Time (ns)   1263    697.3   362.7
         QS (aC)               17.82   14.88   13.01
4T1D1R   Retention Time (ns)   749.5   428.7   235.9
         QS (aC)               19.55   16.33   14.28
         QG (fC)               48.21   46.95   45.82
4T1RP    Retention Time (ns)   851.2   486.9   267.9
         QS (aC)               13.15   10.98   9.606
         QG (fC)               32.43   31.58   30.82

Retention time is simulated next and the results are provided in Table XXVIII. The retention time of the 4T1D1R NVDRAM cell is less than that of the 3T1D cell, because its SN is also connected to the non-volatile element (although T4 is turned OFF by CTRL1 during the "Idle" state). This results in additional leakage through T4 and RRAM1, leading to a smaller retention time. The B3T cell can significantly improve retention time by boosting the storage voltage via the gate-to-RWL coupling capacitance [122]. The larger internal resistance of the PMOS transistors causes a penalty in performance, but it also leads to a longer period of discharge and hence a longer retention time (Table XXVIII). Consequently, the 4T1RP NVDRAM cell improves over the 4T1D1R cell for this figure of merit.

4.5.4. Critical Charge

The charges are found by simulation at SN for all cells and also at node G for the 4T1D1R and 4T1RP NVDRAM cells; the charge values are denoted as QS and QG, respectively. Node G of the non-volatile storage element has a significantly larger charge than QS, i.e. the critical charge for the 4T1D1R and 4T1RP NVDRAM cells is still at the SN node. In the evaluation, the worst case of storing a "1" during the "Idle" state is used, because CTRL2 is "0" (although T4 is turned OFF by CTRL1); therefore, it is easier for a single event upset (SEU) to change the data from "1" to "0". In this case, the 4T1D1R NVDRAM cell has a smaller charge value than the 3T1D cell, and thus less tolerance to a soft error. Furthermore, because it lacks the gated diode, the B3T cell has a lower critical charge at SN than the 3T1D cell. Finally, the 4T1RP NVDRAM cell has the least value at SN due to its circuit implementation and the addition of the non-volatile element.

4.5.5. Area


Fig. 88. Layout of the 4T1D1R NVDRAM cell.

The layouts of the four DRAM cells have been designed using Cadence Virtuoso [89]. The layout of the proposed 4T1D1R cell is shown in Fig. 88. As RRAM1 is placed on a different layer [17] than the MOSFETs (using stacking), its area is not included in the evaluation. The areas of the 3T1D and 4T1D1R cells are based on NMOS transistors, while the areas of the B3T and 4T1RP cells are based on PMOS transistors. The areas of the 3T1D and 4T1D1R cells are given by 1162λ² and 1356λ², respectively (by comparison, a conventional 3T DRAM cell requires an area of 576λ² [3]). The B3T and 4T1RP cells have areas of approximately 1492λ² and 1856λ², respectively. Therefore, the added non-volatile function of the NVDRAM cells also incurs an area overhead.

4.5.6. Process Variability

The DRAM cells are also simulated using a Monte Carlo method to characterize process variability, using the parameters of Table XXI for both volatile and non-volatile cells.

Table XXIX. Variability (in percentage) of NVDRAM cells for various MOSFET feature size

3σ/µ (%)
Parameter   Cell     Feature Size   Write Delay   Read Delay   Retention Time
Leff        B3T      45nm           8.162         22.51        31.89
                     32nm           11.50         33.25        37.31
            4T1D1R   45nm           4.253         23.76        23.42
                     32nm           5.993         35.10        27.40
            4T1RP    45nm           7.362         30.58        29.35
                     32nm           10.37         45.17        34.33
VT          B3T      45nm           15.61         32.35        12.36
                     32nm           24.43         40.50        20.95
            4T1D1R   45nm           14.21         15.71        4.379
                     32nm           22.24         19.67        7.423
            4T1RP    45nm           21.33         27.46        10.59
                     32nm           33.38         34.37        17.95

As shown in Table XXIX, the 3σ/µ ratio increases with the reduction of the MOSFET feature size, thus having a more pronounced impact on cell performance. As expected, the "Read" operation is substantially affected by the variation of both Leff and VT, while the retention time is more affected by Leff. The reduction in feature size increases the variability in all cases. Moreover, the proposed non-volatile 4T1D1R cell improves over the DRAM core by the provision of the non-volatile element for all metrics when varying Leff. The proposed 4T1RP cell has the largest variation values under the same simulation conditions and shows less tolerance to process variability.

4.6. Conclusion

Recently, new designs of DRAM cells have been proposed in the technical literature [102]. These designs (denoted as 3T1D and 4T) utilize schemes based on a gated diode and forward body-biasing to offer the potential to improve performance (inclusive of retention time) and to be less affected by process variability in the nano ranges. This chapter has presented multiple designs of DRAM cell circuits that use the techniques of gated diode and forward body-biasing. Compared with the previous volatile DRAM circuits, the proposed 4TI and 4T1D cells achieve high performance, low power dissipation and good tolerance to process variability. In addition, the two non-volatile cell designs offer the function of non-volatile storage and have been demonstrated with extensive evaluations.


5. EMBEDDED DYNAMIC RANDOM-ACCESS-MEMORY

5.1. Introduction

Power dissipation of on-chip memory due to leakage is an increasing concern in today's microprocessor design due to the high density and large on-die utilization [123]. While it is well known that on-chip cache is effective in reducing the performance gap between processor and main memory, the design of memory circuits has attracted considerable attention in the technical literature [124][125][126]. For example, cache design is significantly impacted by the deterioration of circuit-level performance at lower feature sizes. Moreover, as CMOS moves deeper into the nanoscale, circuits must face the challenge of operating at lower supply voltages and account for increased short channel effects.

SRAM and DRAM have been widely used as memory cells; an SRAM is implemented with at least six transistors for fast operational speed. A DRAM requires a smaller number of transistors in a cell, thus attaining a higher density. It is therefore not surprising that the former cell has been extensively used for cache [124][125], while the latter cell is mostly used for high volume computer storage. Different cell configurations have been proposed for both SRAM and DRAM to overcome issues related to stability and retention time. The drawbacks of an SRAM cell are the low static noise margin (SNM, as a measure of stability), circuit complexity, transistor sizing limitations and high leakage during standby. Embedded DRAM (eDRAM) cells achieve lower leakage [102][104], because the power supply is disabled following data access. An eDRAM also has a low circuit complexity, thus further improving chip density.

Memory design has radically changed in the last few years; the emergence of new technologies has further improved performance, and the traditional separation of storage levels between SRAM and DRAM is no longer as viable as in the past. Recently, the eDRAM has been proposed for cache utilization to improve density while attempting to retain high performance operation; this scheme is often referred to as hybrid due to the utilization of different technologies in a memory. The eDRAM has been extensively investigated [125] for hybrid memories; different from a conventional cache (implemented using only SRAM cells), a hybrid memory takes advantage of different access schemes. A hit in an eDRAM involves a "Read" operation of a destructive nature; although a hybrid memory cache deals with this shortcoming, significant performance penalties are still encountered [124][125].

Different memory cells are proposed and investigated in this chapter; they are first evaluated with respect to circuit-level figures of merit related to operational features (read, write, static noise margin, power delay product) as well as tolerance to event upsets (critical charge and SER analysis) and variations. Extensive simulation results using nanometric PTMs are provided. It is shown that the proposed designs offer substantial improvements over previous hybrid cells. In addition, a novel hybrid cache memory scheme is proposed and investigated in this manuscript with extensive evaluations, which demonstrate the improvements in power and area. The various design implementations of the hybrid cache are also evaluated in detail at the architecture level, including the cache associativity, access pattern and process variability.

5.2. Previous Design of Hybrid Memory Cell Circuits

The so-called hybrid design of a memory cell has been recently investigated in the technical literature [123][124]. The simplest circuit configuration consists of combining the 6T SRAM and 1T DRAM for multi-context storage [123]. Although it shows good gains in area and energy compared to a conventional SRAM array with the same capacity, this design is mostly used only for multi-thread register files [123].

A different configuration, referred to as the macrocell (MCC), has been proposed for potential utilization as a first-level data cache [124][125]; the general MCC circuit consists of n memory cells, i.e. one SRAM cell and n-1 eDRAM cells.

The Wordline (WL) is used to store the data in the 6T SRAM core with the value of the Bitline (BL). Each storage capacitor has an NMOS pass transistor that is controlled by the corresponding Wordline Read (WLD); this scheme charges the capacitor (with 20 fF as capacitance value [125]) by utilizing a Bitline Read (BLD). The novelty of this design is that the transistors act as bridges to transfer the stored data between the SRAM and the selected eDRAM. Furthermore, the L1 cache is now integrated into a single MCC-based scheme, thus saving in circuit complexity, delay, latency and power [125].

5.3. Proposed Hybrid Memory Circuits

5.3.1. Improved macrocell (MCT)

Fig. 89 shows the first proposed cell, referred to as the improved macrocell (MCT); in this cell, the capacitor within each eDRAM is replaced by an NMOS transistor, i.e. data is stored in the gate capacitance. Similar to MCC [125], the proposed MCT cell has n-1 eDRAM cells; each eDRAM is connected to the SRAM by an NMOS pass transistor (acting as the bridge) and controlled by the corresponding signal from the Selection Decoder (SD). This signal allows the data initially stored in the single SRAM core to be transferred to the Storage Node (SN) of the selected eDRAM. In this scheme, the bridges are bidirectional to allow data transfer between the SRAM and the eDRAM; moreover, no Bitline (BL) is involved in this process, i.e. it is internally controlled.

Fig. 89. Proposed improved macrocell (MCT).

The “Write” operation is controlled by the Wordline (WL) and stores the data at node D. For the “Read” operation, the corresponding Wordline Read (WLD) is used to transfer the stored data to BL Read (BLD). By utilizing this circuit, the “Read” operation is not destructive for the data stored in an eDRAM. Moreover, each pass transistor acting as a bridge between the SRAM core and the eDRAM cell permits a fast transfer of data.

5.3.2. Non-volatile hybrid memory

Fig. 90. Proposed non-volatile hybrid memory.

Different from the MCT cell, a novel non-volatile hybrid memory is proposed for cache implementation (Fig. 90). An embedded RRAM (eRRAM) is added to the eDRAM for non-volatile storage. Although the proposed cell still takes advantage of the bridge, the stored data is controlled such that the RRAM can be programmed during the data transfer from the SRAM core to the eDRAM. To achieve non-volatile storage, the proposed cell requires the execution of a “Reset” operation, i.e. the RESET process of the RRAM. Thus, a full operational cycle of the proposed memory cell utilizes the following sequence: “Reset”, “Write”, “Transfer” and “Read”, which is the same operation scheme as the “Instant-on” operation in the previous chapter.

Compared to a conventional (SRAM-based) cache, the proposed non-volatile hybrid cell has two additional levels for storage, the eDRAMs and eRRAMs. Therefore, although the data at the Storage Node (SN) may be destroyed either after a “Read” operation or following a soft error, the use of an eRRAM is advantageous for data updating and multiple read-out operations.

The proposed memories are simulated using HSPICE and High Performance (HP) Predictive Technology Models (PTMs) [58]. As per the guidelines of [3], the Cell Ratio (RC) and Pull-up Ratio (RP) for the 6T SRAM core are set to 1.5; in addition, the sizing of the NMOS transistors in an eDRAM and an eRRAM is 1:1. These designs are initially evaluated for power dissipation and performance. The value of n is initially set to 2 to assess the basic features of the memory cells (larger values of n are considered later in this section), i.e. 1 SRAM, 1 eDRAM and 1 eRRAM. In general, there is only a single SRAM core, while the number of eDRAMs is increased up to n-1; also, the number of eRRAMs in the proposed hybrid cell is assumed to be the same as the number of eDRAMs, i.e. in a hybrid cell for n, there is a single SRAM core, n-1 eDRAMs and n-1 eRRAMs.

Table XXX. Parameter for hybrid memory cell simulation

Parameter              Value
Temperature            25 °C
Feature Size           22 nm
Vdd                    0.8 V
RRAM Resistances       1 kΩ / 1 MΩ
C1                     20 fF
E2PROM P/E Voltage     1.3 V / -5 V

The operations of the proposed MCT and the hybrid memory circuits are simulated using the parameters of Table XXX (Fig. 91 and Fig. 92); the period of each operation is 10 ns to allow sufficient programming time for the RRAM [17]. In Fig. 91, the proposed MCT memory executes the “Write 0” and “Transfer 0” in the first 20 ns. Then, the “Read” operation is achieved by WLD1 and BLD is discharged to the high voltage level for a “1”. Moreover, due to the improved scheme of this cell, the operation for the data stored in the eDRAM is non-destructive following a “Read”. In the next cycles, the operations of “Write”, “Transfer” and “Read” for “1” are executed to demonstrate its operational correctness.

Fig. 91. Simulated waveforms of proposed MCT cell at 22nm.

Fig. 92. Simulated waveforms of proposed non-volatile hybrid memory cell at 22nm.

The operation of the proposed non-volatile hybrid memory cell and related timing are presented in Fig. 92. As for the eDRAM operation, the simulated timing diagrams concentrate only on the operations of the eRRAM, i.e. all WLDs are “0” to disable the eDRAMs. In Fig. 92, it is assumed that “Reset” has already been completed. So, the “Write 1” operation is executed first and the stored data is transferred from D to SN1 using SD1 (Fig. 90). A “1” turns ON the transistor and starts the process of programming the RRAM. Hence, the RRAM changes its state from HRS to LRS, corresponding to the stored data of “1”; otherwise, it remains in HRS. Then, WLR1 is changed from “0” to “1” for the “Read” operation; the voltage of BLD is kept at “1”. If the stored data is “0”, the RRAM is in HRS and the voltage distribution drops the voltage of BLD to “0” due to the large resistance of the RRAM between WLR1 and BLD, which can be measured by the peripheral sense amplifier. This timing diagram shows that the proposed cell operates correctly.
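
The operating sequence described above (“Reset”, “Write”, “Transfer”, “Read”) can be illustrated with a purely behavioral sketch. This is not the HSPICE netlist used for Fig. 92; the class name, the method names and the read-out convention (LRS read as “1”, HRS read as “0”) are assumptions made only for this example, and all analog effects (timing, voltage levels, sensing) are ignored.

```python
from dataclasses import dataclass, field

HRS, LRS = "HRS", "LRS"  # high- and low-resistance states of the RRAM


@dataclass
class HybridCell:
    """Behavioral sketch: one SRAM core with n-1 eDRAM/eRRAM pairs."""
    n: int = 2
    d: int = 0                                # SRAM storage node D
    sn: list = field(default_factory=list)    # eDRAM storage nodes SN_k
    rram: list = field(default_factory=list)  # state of each eRRAM

    def __post_init__(self):
        self.sn = [0] * (self.n - 1)
        self.rram = [HRS] * (self.n - 1)

    def reset(self, k):
        """'Reset': put the k-th RRAM back into HRS."""
        self.rram[k] = HRS

    def write(self, bit):
        """'Write': store a bit at node D of the SRAM core."""
        self.d = bit

    def transfer(self, k):
        """'Transfer': copy D to SN_k; a stored 1 programs the RRAM to LRS."""
        self.sn[k] = self.d
        if self.sn[k] == 1:
            self.rram[k] = LRS

    def read_rram(self, k):
        """'Read' from the eRRAM (assumed convention: LRS -> 1, HRS -> 0)."""
        return 1 if self.rram[k] == LRS else 0


cell = HybridCell(n=2)
for bit in (1, 0):
    cell.reset(0)
    cell.write(bit)
    cell.transfer(0)
    print(bit, cell.rram[0], cell.read_rram(0))
```

The sketch makes the role of “Reset” explicit: without it, an RRAM left in LRS by a previous “1” would not return to HRS when a “0” is transferred.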

5.3.3. Circuit Power Dissipations and Delays

The power dissipation is evaluated using the parameters listed in Table XXX for n>2 and plotted in Fig. 93. The power dissipation increases with n; however, the rate of increase depends on the cell type. The 6T SRAM has the lowest value when n=2; however, when n is increased, the MCT cell has the lowest values, while the 6T SRAM has the highest. The proposed hybrid cell incurs a higher power dissipation than both the MCT and MCC cells; however, it still remains substantially lower than for a 6T SRAM cell.

Fig. 93. Power dissipation vs n at 22nm.

For n>2, the performance of the “Write” and “Read” operations related to the SRAM does not change significantly across the different memory cells. The simulation results for the “Write” and “Read” are presented in Fig. 94 and Fig. 95, respectively. In Fig. 94, the changes in the “Write” delay are as follows. By increasing n, the most affected delay is for the “SRAM to eDRAM” case, as caused by the increase of the parasitic resistance and capacitance (although the bridge NMOS transistors of the other unselected eDRAMs are not turned ON). The “Write” delay for the data transfer of the “eDRAM to SRAM” case is least affected by n. The “Write” delay for the SRAM core is nearly constant with n, because none of the eDRAMs and eRRAMs is selected during this operation.

Fig. 94. Write delay vs n at 22nm.

Fig. 95. Read delay vs n at 22nm.

As shown in Fig. 95, the “Read” operation has a dependence on n. Similarly to “Write”, the “Read” delay for the SRAM cell is nearly constant, while it increases with n for the eDRAM. In conclusion, the performance of the memory cells is substantially affected by n, i.e. a larger n implies a larger amount of hybrid storage, but this also negatively affects performance.

5.3.4. Critical Charge

The two proposed hybrid memory cell types are evaluated and compared with the 6T and MCC circuits. A node assessment of the cells (including the storage node D for the 6T SRAM core and the SN nodes for the eDRAMs) is pursued to find the critical node and charge. The results are reported in Table XXXI (using the parameters of Table XXX in the HSPICE simulation). From this evaluation, the 6T SRAM presents the lowest value of charge at the storage node (D). The other three memories show improvements in critical charge, because the additional circuits in these cells increase the capacitance at the storage node. Although the MCC cell achieves the least improvement, there is no significant difference between them due to the same features in the added eDRAMs. The charges at the SN nodes for the MCC and hybrid cells are larger than for the D nodes due to the large capacitance.

Table XXXI. Critical charge and nodes of Memory Cells (n=2)

Node   Memory   Charge (fC)
                22 nm    16 nm    10 nm
D      6T       1.321    0.832    0.450
       MCC      1.452    0.915    0.495
       MCT      1.486    0.936    0.507
       Hybrid   1.493    0.941    0.509
SN     MCC      8.845    5.573    3.016
       MCT      9.356    5.895    3.190
       Hybrid   10.15    6.396    3.461

Therefore, the critical node is always at D. In addition, the charge values decrease as the MOSFET feature size is scaled down because the internal capacitances are also reduced; scaling therefore also deteriorates the tolerance to an SEU.
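
The relative improvement in critical charge at node D can be read directly from Table XXXI. The short Python sketch below only post-processes the tabulated values; the helper name `improvement_over_6t` is introduced here for illustration and is not part of the simulation flow.

```python
# Critical charge at node D in fC, copied from Table XXXI (n = 2).
Q_CRIT_D = {
    "6T":     {22: 1.321, 16: 0.832, 10: 0.450},
    "MCC":    {22: 1.452, 16: 0.915, 10: 0.495},
    "MCT":    {22: 1.486, 16: 0.936, 10: 0.507},
    "Hybrid": {22: 1.493, 16: 0.941, 10: 0.509},
}


def improvement_over_6t(cell, node_nm):
    """Percentage increase of the critical charge at D relative to the 6T cell."""
    base = Q_CRIT_D["6T"][node_nm]
    return 100.0 * (Q_CRIT_D[cell][node_nm] - base) / base


for cell in ("MCC", "MCT", "Hybrid"):
    row = ", ".join(
        f"{nm} nm: {improvement_over_6t(cell, nm):.1f}%" for nm in (22, 16, 10)
    )
    print(f"{cell:6s} {row}")
```

At each feature size the ordering MCC < MCT < Hybrid holds, which matches the observation that the MCC cell achieves the least improvement.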

5.4. Proposed Hybrid Memory Scheme with Custom Access Pattern

By incorporating the SRAM and eDRAM, the hybrid memory cell circuit presents significant advantages and great potential for future memory design. Accordingly, the hybrid memory scheme becomes important and has attracted significant research effort on implementing hybrid memory cells for cache design. A novel hybrid cache memory scheme is proposed and investigated in this section with extensive evaluations, which demonstrate improvements in power and area. The various design implementations of the hybrid cache are also evaluated in detail, including the cache associativity, access pattern and process variability.

5.4.1. Cache Memory

With the performance gap between the Central Processing Unit (CPU) and memory continuously increasing, a CPU cache is normally integrated to reduce the average time to access the memory. The cache memory is thus capable of effectively speeding up computer performance [127].

Memory caches are in every computer to speed up instruction execution and data retrieval and updating [127]. Meanwhile, an issue is the fundamental tradeoff between cache latency and hit rate [127]. Larger caches have better hit rates but longer latency. To address this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger, slower caches. Multi-level caches generally operate by checking the fastest, Level 1 (L1) cache first; if it hits, the processor proceeds at high speed. If that smaller cache misses, the next fastest cache (Level 2, L2) is checked, and so on, before external memory is checked.
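
The latency and hit rate tradeoff of a multi-level cache can be made concrete with the standard average memory access time calculation. The following Python sketch assumes that levels are checked strictly one after another, as described above; the function name and the latency and hit rate values in the example are hypothetical and are not taken from the simulations in this chapter.

```python
def average_access_time(levels, memory_latency):
    """Average access time in cycles for a sequentially checked hierarchy.

    levels: list of (hit_latency_cycles, hit_rate), ordered from L1 outward.
    memory_latency: cycles needed when every cache level misses.
    """
    total = 0.0
    reach_prob = 1.0  # probability that an access reaches the current level
    for latency, hit_rate in levels:
        total += reach_prob * latency     # every access that gets here pays the lookup
        reach_prob *= (1.0 - hit_rate)    # only the misses continue outward
    return total + reach_prob * memory_latency


# Hypothetical example: a small fast L1 backed by a larger, slower L2.
print(average_access_time([(2, 0.90), (10, 0.80)], memory_latency=100))
```

A larger L1 would raise the first hit rate but also the first latency, which is the tradeoff that the multi-level organization is meant to soften.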

Fig. 96. General hybrid memory scheme.

Traditionally, the circuit unit of cache memory is implemented with the SRAM cell, utilizing its fast access speed and good static noise margin (SNM) [127]. Based on the existing hybrid memory circuit MCC, the general hybrid memory scheme is extracted and presented in Fig. 96. It also utilizes the conventional 6T SRAM circuit, accompanied by the introduced eDRAMs and the pass transistors that connect the SRAM to the eDRAM cells, for the Level 1 (L1) data cache storage function. Therefore, a circuit block named eDRAM selector is required to search for the target data, store it back, or read it out directly.

5.4.2. Hybrid Cache Accessing

To effectively achieve the address decoding and control the data transfer for the proposed memory types inside the cache, this section deals with the relevant memory architectural scheme, which consists of the following main steps. The first is to access the eDRAM cells only in the case of a hit in the corresponding tag. The second is to increase the percentage of hits in the SRAM blocks to minimize the number of data movements between SRAM and eDRAM cells.

Fig. 97. Scheme for cache memory address decoding.


Fig. 98. Scheme inside the cache controller.

Normally, a memory cache is faster than main memory and allows instructions to be executed and data to be read and written at higher speed. Instructions and data are transferred from main memory to the cache in fixed blocks, known as cache “lines” [127]. Referring to the address decoding scheme in Fig. 97, the cache controller and the two address decoders for the SRAM and the respective incorporated eDRAM are presented with their sizes. The architecture inside the cache controller is shown in Fig. 98; it is composed of an incorporated address decoder, a Content-Addressable Memory (CAM) and comparators.

Fig. 99. Access scheme for conventional and hybrid caches.

Conventionally, modern processors overlap the accesses to the data and tag arrays for high performance, which means that the target data is already available when the tag comparison succeeds. For the 4-way cache of Fig. 99(a), the narrow box represents the tag array and the right box represents the data array in the conventional cache. By contrast, because reading the eDRAM cells is destructive and the previous design does not include refresh logic, the hybrid cache (which consists of hybrid memory cells) utilizes a different scheme [125], shown in Fig. 99(b). In this scheme, to effectively reduce the number of state losses caused by reads, the tags of all the ways in the set are accessed in parallel with the data array of the block located in way-0. If there is a hit in way-0 (SRAM hit), the connected eDRAM is not read. If way-0 misses but there is a hit in another tag associated with an eDRAM block, the corresponding data array is subsequently accessed, as in the case of Fig. 99(b). This scheme therefore requires one more cycle when the tag comparison misses in the SRAM. This scheme has been adopted for the hybrid cache design with the MCC and the proposed non-volatile hybrid memory cells, whose eDRAMs have a destructive read.
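
The access scheme of Fig. 99(b) can be sketched at the level of a single cache set. The Python model below is a simplified illustration under stated assumptions: the class name `HybridSet` and the cycle counts are hypothetical, the move of a block into way-0 on an eDRAM hit follows the SRAM/eDRAM swap discussed in this chapter, and the miss handling (new block placed in way-0, least recently used eDRAM block evicted) is an assumed policy rather than the controller of Fig. 98.

```python
class HybridSet:
    """One cache set: way 0 is SRAM, the remaining ways are eDRAM."""

    SRAM_HIT_CYCLES = 1   # assumed: data of way-0 is read in parallel with the tags
    EDRAM_HIT_CYCLES = 2  # assumed: one extra cycle to access the eDRAM data array

    def __init__(self, ways=4):
        # tags[0] belongs to the SRAM way; tags[1:] are the eDRAM ways,
        # kept in most- to least-recently-used order.
        self.tags = [None] * ways

    def access(self, tag):
        """Return (outcome, cycles) for one access and update the set."""
        if self.tags[0] == tag:
            # SRAM hit: the eDRAM ways are never read, so no state is lost.
            return "sram_hit", self.SRAM_HIT_CYCLES
        if tag in self.tags[1:]:
            # eDRAM hit: the block moves to the SRAM way and the previous
            # SRAM block becomes the most recently used eDRAM block.
            self.tags.remove(tag)
            self.tags.insert(0, tag)
            return "edram_hit", self.EDRAM_HIT_CYCLES
        # Miss (assumed policy): drop the LRU eDRAM block and place the
        # incoming block in the SRAM way; latency depends on the next level.
        self.tags.pop()
        self.tags.insert(0, tag)
        return "miss", None


s = HybridSet(ways=4)
for t in ["A", "A", "B", "A", "C", "D", "E", "B"]:
    print(t, s.access(t))
```

Keeping the most recently used block in way-0 is what makes most hits SRAM hits, so the extra eDRAM cycle is paid only occasionally.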


Moreover, the MCC in [125] has to use a write-through policy, since the read data is destroyed when a hit happens in an eDRAM. Although this solution is capable of addressing this case, it has to read the data from the L2 cache, which brings a large power consumption together with deteriorated performance. By contrast, the proposed non-volatile hybrid memory can efficiently mitigate this drawback, because the data can be transferred back to the eDRAM from the respective eRRAM. In summary, the data can be “exchanged” between the SRAM, eDRAMs and eRRAMs. So, compared with the previous MCC circuit, the proposed hybrid memory can effectively achieve power savings by utilizing the incorporated eRRAM to store and transfer the data.

5.4.3. Refresh Scheme

When the traditional 6T cells are replaced, several architectural approaches accommodate the limited retention time of the hybrid memory, including periodic data refreshing, retention-time driven replacement policies, allowing data to expire without refresh, and combinations of these approaches [106]. Among them, the refresh operation is the typical direct method. Similar to the 6T SRAM cache, the proposed memory scheme suffers from the disadvantage that the frequency is determined by the slowest cells, due to the clock generation and synchronization limitation. Hence, the larger the retention time of the eDRAM circuits, the lower the refresh frequency, which achieves power savings in real applications.

Although the data can be transferred between the incorporated SRAM and eDRAM through the swap operation, and the access scheme discussed above is capable of effectively mitigating the state loss after a read access, the data stored in the eDRAM may still be lost after a long period without access. This phenomenon directly results in a penalty on cache memory performance. Therefore, a suitable method is required to avoid this situation.

5.5. Hybrid Memory Scheme Architectural Evaluation

This section presents the architectural evaluations that characterize the hybrid memory caches incorporating the hybrid memory cells discussed above. The cache memories are modeled with a modified version of the HotLeakage simulation framework [128] and simulated by running the SPEC2000 benchmark suite [129]. The simulated results are then presented and analyzed for different cache organizations at the 45nm technology node, using the configuration in Table XXXII. The CACTI tool [130] is also used to demonstrate the area savings of the hybrid cache compared with the conventional cache. Meanwhile, the HSPICE tool and the models discussed in the previous section are also included to obtain an accurate characterization of the memory arrays.

Table XXXII. Cache Memory Configuration

Parameter                 Value
Data Cache Block Size     64 B
Data Cache Capacity       16 KB, 32 KB, 64 KB, 128 KB
Set Associative           2 Way, 4 Way
Replacement Policy        Least-Recently Used (LRU)
Write Policy              Write-Through

5.5.1. SRAM and eDRAM Hit Rate

As discussed in the previous section, a “Read” in the eDRAM may lead to data loss for the MCC and proposed hybrid memory cells due to their storage capacitance, further resulting in a cache miss. Therefore, to efficiently mitigate unnecessary refresh operations for the eDRAMs, they are only accessed after checking the tags; these accesses are characterized in this section.

The hit rates of the SRAM and eDRAM cells are adopted to define the percentage of cache accesses for the static and dynamic components, respectively. The simulation results presented in Fig. 100 and Fig. 101 are obtained by running the SPEC2000 benchmark suite [129] for different cache sizes (16KB, 32KB, 64KB and 128KB), with the number of ways set as shown in Table XXXII. First, the static component is accessed much more frequently than the dynamic component in both cases. Second, the access frequency of the static component is higher for the integer (Int) benchmarks than for the floating-point (FP) benchmarks, which means that the hybrid cache sees a larger percentage of tag hits for integer programs. Third, an important observation is that the number of hits in this way depends significantly on its storage capacity, since this type of architecture enforces the storage of the Most Recently Used (MRU) block in the SRAM way. Fourth, for a given associativity, a larger cache size leads to a lower eDRAM hit rate but a higher overall L1 hit rate. Finally, increasing the associativity directly increases the total L1 hit rate, with a smaller contribution from the static component but a larger one from the dynamic component.

Fig. 100. Average static L1 hit rate of the integer and floating-point benchmarks with respect to cache size.

Fig. 101. Average dynamic L1 hit rate of the integer and floating-point benchmarks with respect to cache size.

With the continued increase of cache memory size to 128 KB, the statistical results of the L1 hit rate are presented in Fig. 100 and Fig. 101 for the static and dynamic components, respectively. Note that the evaluated results here are averages over the various benchmark simulations. Normally, a larger cache capacity results in a higher hit rate, which is demonstrated by the above results. Although the dynamic part decreases with this variation, the sum of the hit rates (including both static and dynamic components) increases, since a larger cache size provides a higher probability of a hit. In particular, the static hit rate values are the same for 16KB-2way and 32KB-4way, and likewise for 32KB-2way and 64KB-4way and for 64KB-2way and 128KB-4way, since both caches in each pair have a static data array of the same size. Therefore, both arrays will have the same number of misses, the difference being that the incoming block may be fetched from different memory structures (e.g., eDRAM cells or another level of the memory hierarchy). Moreover, the 4way cache has a higher overall hit rate, although the SRAM hit rate decreases as the eDRAM hit rate accumulates.
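
The pairing of configurations with equal static hit rates follows from simple arithmetic on the size of the SRAM way. The Python sketch below assumes that all ways of a cache have equal capacity and that only way-0 is implemented in SRAM; the function name `sram_way_blocks` is introduced for illustration, and the 64 B block size is taken from Table XXXII.

```python
BLOCK_SIZE_B = 64  # data cache block size from Table XXXII


def sram_way_blocks(capacity_kb, ways):
    """Number of 64 B blocks held by the single SRAM way (way-0)."""
    way_bytes = capacity_kb * 1024 // ways
    return way_bytes // BLOCK_SIZE_B


for capacity_kb, ways in [(16, 2), (32, 4), (32, 2), (64, 4), (64, 2), (128, 4)]:
    blocks = sram_way_blocks(capacity_kb, ways)
    print(f"{capacity_kb:3d}KB-{ways}way: SRAM way = {capacity_kb // ways} KB, {blocks} blocks")
```

Each 2-way cache has the same SRAM way size as the 4-way cache of twice the capacity, which is why their static hit rates coincide.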

In general, the lower eDRAM hit rate is relevant for two reasons. On the one hand, an access to the eDRAM cells requires an additional processor cycle for tag comparison; on the other hand, a hit in the eDRAM way incurs a swap between SRAM and eDRAM cells. According to the above simulations, the access frequencies of the SRAM and eDRAM are quite different and the performance of the SRAM is the determining factor. This is the fundamental difference between the proposed hybrid cache memory and a conventional cache implemented with SRAM only.

5.5.2. Instructions Per Cycle (IPC)

The performance of the proposed hybrid memory circuits is further evaluated in this section by incorporating them in the respective memory scheme. The Instructions Per Cycle (IPC) metric is used to characterize the benchmark program executions for the different memory cells; a 16KB-4way cache is used.

Fig. 102. IPC performance with respect to the integer benchmarks for 16KB-4way memory cache.


Fig. 103. IPC performance with respect to the floating-point benchmarks for 16KB-4way memory cache.

The simulation results for the integer and floating-point benchmarks are presented in Fig. 102 and Fig. 103, respectively. According to our measurements, although the IPC values differ across the various benchmark programs, the relative behavior of the different memory circuits is consistent for each program. First, the applied hybrid memory scheme shows its advantages for all three hybrid memory cells, compared with the conventional memory cache. Second, among the three hybrid memory types, the proposed MCT memory cache achieves an important improvement compared with the MCC circuit in [125]. Although the proposed non-volatile hybrid memory presents a limited penalty relative to the MCT circuit, it still achieves better performance than the conventional cache and the MCC memory, because the data is restored from the incorporated eRRAM when a hit in the respective eDRAM causes a state loss.

5.5.3. Power Savings of Hybrid Memory Scheme

The proposed hybrid cache memory scheme is assessed against the conventional cache to show the improvement in power savings, utilizing the HotLeakage simulation framework [128].

As shown in Fig. 104, the various components of power dissipation based on the specific event (leakage, store, load and miss) are included to compare the dynamic power dissipation and leakage of the evaluated memory schemes. The evaluated results are statistical values from the simulations of the same benchmark programs, showing the power variations for different cache schemes and sizes. Here, the reported leakage is the leakage of the whole L1 cache. The sum of all these values shows the major energy consumed by the cache. From these results, the overall power dissipation is dominated by the leakage, so the lower leakage obtained when the hybrid memory cell is incorporated reduces the overall dissipation. Generally, the proposed hybrid cache memory scheme (Hybrid) incorporating the MCC cell for storage achieves approximately 30% power savings compared with the conventional cache memory scheme (Conv). Meanwhile, the leakage component increases with the cache size for both the Conv and Hybrid schemes. The above trends are similar for both integer (Int) and floating-point (FP) benchmarks.

Fig. 104. Statistical power dissipations of conventional cache with SRAM and hybrid cache with MCC cell.

Fig. 105. Power dissipation with respect to the integer benchmarks for 4way hybrid memory cache.


Fig. 106. Power dissipation with respect to the floating-point benchmarks for 4way hybrid memory cache.

The other two hybrid memory cells are also simulated with the above Hybrid scheme, and the results are shown in Fig. 105 and Fig. 106 for the same benchmarks, varying the cache size from 16KB to 64KB with the associativity set to 4. Different from the memory cell evaluations in the previous section, the results here refer to the overall cache memory. As discussed in [125], the destructive “Read” operation for the MCC cell leads to the loss of the capacitor state, and a write-through policy is used to copy the data from L2, which causes a significant power overhead for cache access. This power penalty is confirmed by the simulated results: the proposed MCT cell achieves power savings compared with MCC of 18.9% and 20.1% for the integer and floating-point benchmarks, respectively.

By contrast, the proposed non-volatile hybrid cell shows larger power dissipation values than the MCT cell due to the transfer process from eRRAM to eDRAM; however, it is capable of restoring the data from the eRRAM to the respective eDRAM, thus mitigating the destructive “Read” when a hit occurs in the eDRAM. The power penalty of the non-volatile hybrid cell is limited, and it also reduces the power dissipation by avoiding the capacitor state loss.

5.5.4. Area Savings

Using the CACTI tool [130], the area of the proposed hybrid scheme cache has been estimated. The area overhead of the added bridge transistors has also been taken into consideration; each is assumed to occupy the same area as a 1T-1C cell. Note that the evaluated results in this section have been normalized for comparison, independently of the MOSFET feature size.

For the basic circuit cell, the SRAM and the eDRAM of the MCC cell have areas of 1092λ² [3] and 228λ² [125], respectively. A 4-bit MCC, which consists of one SRAM and three eDRAMs, has an area of 3084λ². By comparison, the 4-bit MCT and non-volatile hybrid circuits have areas of 3582λ² and 4196λ², respectively. The area overhead is generated by the difference in circuit design, i.e. the introduction of the transistors and the non-volatile component.
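
As a cell-level illustration of these figures, the 4-bit areas can be compared against four stand-alone 6T SRAM cells. The Python sketch below takes the 4-bit cell areas exactly as stated above; the baseline of four independent 6T cells (with no peripheral circuitry) and the helper name `saving_vs_6t` are assumptions made for this example, so the percentages are not the cache-level CACTI results of Fig. 107.

```python
SRAM_6T_AREA = 1092  # area of one 6T SRAM cell in lambda^2, from [3]

# 4-bit cell areas in lambda^2, as stated in the text.
AREA_4BIT = {"MCC": 3084, "MCT": 3582, "Hybrid": 4196}


def saving_vs_6t(cell, bits=4):
    """Percentage area saved relative to `bits` stand-alone 6T cells (assumed baseline)."""
    baseline = bits * SRAM_6T_AREA
    return 100.0 * (baseline - AREA_4BIT[cell]) / baseline


for cell in AREA_4BIT:
    print(f"{cell:6s} {AREA_4BIT[cell]} lambda^2, saving {saving_vs_6t(cell):.1f}%")
```

The ordering of the savings (MCC largest, non-volatile hybrid smallest) is consistent with the area ranking reported for the complete caches.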

Fig. 107. Entire 4way cache area vs cache size for various memory cells.

Incorporating the above circuits in the hybrid memory scheme, and the SRAM cell in the conventional memory scheme, the simulation results for the 4way cache areas are presented in Fig. 107, with the above values used as input to CACTI. In general, these results show that the hybrid cache memory achieves area savings of around 11 and 34 percent for two and four ways, respectively. Specifically, the hybrid cache incorporating the MCC cell presents the smallest area among the four caches. By contrast, although the caches with the proposed MCT cell and non-volatile component incur area penalties, they still achieve significant improvements over the conventional cache. Moreover, the proposed MCT cell efficiently mitigates the destructive “Read” problem, and the non-volatile component is capable of retaining the data even when the power is turned off. The above evaluations demonstrate the trade-offs of these hybrid memory circuits.

5.6. Conclusion

These designs strive to improve operational performance while saving power dissipation. This chapter has proposed two additional cell designs, referred to as the MCT and the non-volatile hybrid cell; the latter utilizes a Resistive RAM (RRAM) for non-volatile operation. Figures of merit such as “Write”/“Read” delays, power dissipation, and critical charge have been extensively evaluated. Compared with a conventional 6T SRAM and the macrocell (MCC), the proposed MCT cell brings a penalty in write and read delays due to the connected eDRAM circuits; however, it still achieves the best performance for most of the other metrics considered in this chapter. On a relative basis, the proposed non-volatile hybrid memory represents a compromise in design compared with MCC [125]; moreover, the non-volatile storage in the eRRAM is rather pronounced. Furthermore, the simulated results indicate that the non-volatile cell has the best tolerance to soft errors, though it incurs a penalty in power dissipation due to the added non-volatile storage components. The proposed MCT cell achieves a significant performance improvement and is a good candidate for designing a high performance memory circuit with non-volatile capabilities and excellent tolerance to soft errors.

Table XXXIII. Ranking of Cache

Metrics              MCC    MCT    Hybrid    6T
Power Dissipation    3      1      2         4
IPC                  3      1      2         4
Area                 1      2      3         4

Based on the extensive statistical evaluation results in Table XXXIII, the proposed MCT hybrid memory also achieves the lowest power dissipation by effectively mitigating the problem of destructive reading and saving power without the write-through policy. Meanwhile, its first ranking in IPC performance shows a notable improvement compared with previous caches. By contrast, the cache implemented with the proposed non-volatile hybrid memory cell takes advantage of non-volatile storage with acceptable power and IPC penalties. Finally, the hybrid memory caches achieve significant area savings compared with the conventional 6T-based cache.

Table XXXIV. Comparison of Various Cache Associativities

Associativity    Power Dissipation    IPC     Hit Rate
2-way            High                 Fast    Low
4-way            Low                  Slow    High

The observations for the proposed hybrid memory scheme are categorized in Table XXXIV to show the performance trade-offs with various cache associativities. Compared with the 2-way hybrid memory cache, the 4-way cache shows lower power dissipation and a higher hit rate, with the penalty of a slower access speed for IPC. Therefore, the evaluations demonstrate that the 4-way cache is a good candidate for hybrid memory implementation.

6. SUMMARY AND FUTURE WORKS

6.1. Summary of Contributions

The objective of this study is to propose low power and hardened design approaches that improve the performance metrics of multiple types of memory circuits. Effort has also been focused on investigating the performance trade-offs between conventional circuits and novel schemes that utilize emerging technologies.

Chapter 2 first presented an HSPICE circuit model for a single-electron (SE) turnstile. The proposed model captures the sequential transfer of electrons through the turnstile using a nearly symmetric circuit that is stable at nanometric feature sizes (32 and 45 nm) and uses a voltage level output as its mode of operation. This makes the proposed circuit-level model robust in its operation and avoids the transient (current-based) nature of a previous model [55]. Using the temperature employed in [55], the proposed model captures the single-electron transfer process with excellent accuracy at lower feature sizes as well, in agreement with the experimental data provided by [49]. The proposed model is HSPICE compatible, and its assessment has shown that it can operate at nanometric scales while correctly simulating the process of single-electron transfer. The proposed circuit model has been compared with [55]. It has been shown that the nearly disjoint operation of the proposed circuit model (consisting of two nearly independent parts) results in a stable output; stability has also been maintained when changing parameters such as capacitance, feature size and voltages.
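The dependence of single-electron transfer on capacitance and temperature can be illustrated with the textbook Coulomb-blockade relations: the charging energy e^2/(2C) of an island should exceed the thermal energy k_B*T, and each transferred electron shifts the node voltage by e/C. The sketch below is not the HSPICE model of Chapter 2; the function names and the capacitance and temperature values in the usage lines are assumptions for illustration only.

```python
E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
K_BOLTZMANN = 1.380649e-23   # Boltzmann constant in J/K


def charging_energy(capacitance_f):
    """Energy (J) needed to add one electron to an island of total capacitance C."""
    return E_CHARGE ** 2 / (2.0 * capacitance_f)


def blockade_ratio(capacitance_f, temperature_k):
    """Ratio of charging energy to thermal energy; larger means a cleaner blockade."""
    return charging_energy(capacitance_f) / (K_BOLTZMANN * temperature_k)


def voltage_step_per_electron(capacitance_f):
    """Voltage change (V) on a node of capacitance C when one electron is transferred."""
    return E_CHARGE / capacitance_f


if __name__ == "__main__":
    c_node = 1e-18  # hypothetical 1 aF node capacitance
    for temp in (30.0, 300.0):
        print(temp, "K ratio:", blockade_ratio(c_node, temp))
    print("step per electron (V):", voltage_step_per_electron(c_node))
```

Because the voltage step scales as 1/C, a voltage level output becomes easier to sense as the node capacitance shrinks with feature size.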

This chapter also presented novel designs and implementations of an SRAM cell and a TCAM cell. The SRAM cell utilizes a turnstile (to sequentially transfer SEs in and out of the SN) and an SET/MOS circuit (to sense the charge in the SN). One of the advantages of this cell is that, by utilizing a "hybrid" implementation (i.e., using SE-based components with MOSFETs), it is compatible with CMOS technology. By utilizing hybrid designs (made of SET and CMOS devices), the proposed cells combine different operational features to improve memory performance. The operation of the proposed SRAM cell has been analyzed and simulated at the nano feature sizes of 32 and 45 nm using HSPICE compatible models; the simulation results show that the delay in the basic operations of the memory cell is mostly related to the SE transfer process and the characteristics of the turnstile. Compared to an SRAM with 45 nm MOSFETs, the SRAM at 32 nm achieves reduced write and read delays due to intrinsic parameters such as gate capacitance, oxide layer capacitance, and parasitic capacitance. MOSFETs have been integrated in both the SE turnstile and the SET/MOS hybrid circuit; thus, the MOSFET feature size influences the operating delays. The proposed cell requires nearly 22% less area than a CMOS (6T) memory cell; in terms of stability, the proposed cell shows a significant improvement in SNM for the write operation and a nearly negligible decrease for the read operation compared to the CMOS counterpart.

Finally, the reduction of average power dissipation in the proposed SRAM has been mostly achieved by the lower biasing voltage and leakage encountered at the 32 nm feature size and by the utilization of SE-driven circuitry in the design and operation of the proposed SRAM cell. While further scaling will affect the characteristics of the MOS circuitry, the most significant technological challenges for high performance lie in the improvement of SE-based components, such as the SET and the turnstile. The effects of sequential and simultaneous SE transfers on the operating frequency and on the average delays of the two memory operations (write and read) have also been presented to show their relationship with the number of SEs.

Moreover, an SET-based TCAM cell has been presented. This design utilizes the phase-shift characteristics of the drain current of the proposed SRAM cell (consisting of a dual-gate SET and a cascode MOSFET) for ternary matching; together with a precharge circuit, the proposed TCAM memory cell has been shown by HSPICE simulation to have excellent performance.

Chapter 3 has presented several different applications of RRAM in digital memory designs, including the Multiple Level Memory (MLC), non-volatile SRAMs and the relevant hardened circuit designs that incorporate the "instant-on" scheme. The proposed scheme supplies an efficient implementation that utilizes the RRAM for high density integration of memory circuits.

The proposed 7T1R memory cell achieves a significant reduction in power dissipation for all three operational states (i.e. "Write", "Power-down" and "Restore") required for "Instant-on" operation when compared with other NVSRAM cells. This improvement is especially significant for "Write" and "Power-down". A substantial difference in power dissipation has also been reported between the "0" and "1" values; this is due to the asymmetric design of the proposed cell (it utilizes only one RRAM, connected to D in the 6T SRAM). Although the proposed memory cell requires more power when data is written during normal operation, its power dissipation in the "Power-down" state is lower than that of the "Standby" state of a 6T SRAM cell. As expected, the average power dissipation decreases at lower feature sizes, and the proposed 7T1R cell remains the best among the NVSRAMs (with the 9T2R having the highest value). Simulation has shown the substantial improvement of the 7T1R memory cell, especially compared with the 8T2R cell for the "Write" operation. Except for the 9T2R, all other cells have a similar average "Read" delay.

This chapter also presents novel solutions for SEU tolerance in NVSRAMs. Three different designs of non-volatile SRAM (NVSRAM) cells have been proposed. These cells provide non-volatile operation by using a single resistive element. A virtual ground circuitry is added for low power operation, and additional transistors are included in the design to increase the critical charge. A detailed assessment of the critical charge at the storage node, as well as at the nodes for virtual ground and non-volatile storage, has been pursued; in all cells it has been shown that the non-volatile storage node has a charge orders of magnitude larger than the critical charge, thus making the data stored in the RRAM very reliable and highly unlikely to be affected by an SEU. The first design, consisting of one resistive element and 9 transistors (i.e. 9T1R), has the best performance in all cases except one (the critical charge). The other two NVSRAM cells proposed in this work require more transistors (i.e. both are 13T1R); Type-1 achieves the best tolerance to SEU (i.e. the highest critical charge) but has the worst ranking in all remaining metrics. Type-2 offers the most balanced design, as it ranks in the middle among the proposed NVSRAM cells.
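The notion of critical charge used above can be illustrated with a small sketch: a particle strike is represented by a current pulse, the charge it deposits is obtained by integrating the pulse, and an upset is declared when that charge exceeds the critical charge of the node. The double-exponential pulse shape is a commonly used approximation and is an assumption here, since this summary does not state which pulse model was used in the HSPICE evaluation; the time constants and charge values in the usage lines are likewise hypothetical.

```python
import math


def strike_current(t, q_total, tau_fall, tau_rise):
    """Double-exponential current pulse (A) whose time integral equals q_total."""
    if t < 0:
        return 0.0
    scale = q_total / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def collected_charge(q_total, tau_fall, tau_rise, t_end, steps=20000):
    """Trapezoidal integration of the pulse from 0 to t_end, in coulombs."""
    dt = t_end / steps
    total = 0.0
    prev = strike_current(0.0, q_total, tau_fall, tau_rise)
    for i in range(1, steps + 1):
        cur = strike_current(i * dt, q_total, tau_fall, tau_rise)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


def is_upset(q_collected, q_crit):
    """A node flips when the collected charge exceeds its critical charge."""
    return q_collected > q_crit


if __name__ == "__main__":
    q = collected_charge(10e-15, tau_fall=200e-12, tau_rise=50e-12, t_end=5e-9)
    print("collected charge (fC):", q * 1e15)
    print("upsets a 5 fC node:", is_upset(q, 5e-15))
    print("upsets a 20 fC node:", is_upset(q, 20e-15))
```

In a circuit-level flow the critical charge is found the other way round, by increasing the injected charge until the stored value flips; the sketch only shows the comparison that defines an upset.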

Chapter 4 first proposed two additional cell designs, referred to as 4TI and 4T1D. Moreover, this study has proposed two non-volatile cell designs, referred to as 4T1D1R and 4T1RP, that utilize the volatile DRAMs of [102][121] as the memory core within a non-volatile cell. An extensive evaluation has been pursued at nano feature sizes (from 45 to 10 nm) using HSPICE as the simulation tool with the relevant PTMs [58]. Figures of merit such as "Write"/"Read" delays, retention time, layout area, critical charge and process variability have been extensively evaluated.


In the assessment of the volatile DRAM cells, although the proposed 4T1D DRAM cell brings penalties in area and power dissipation due to the additional transistor, it achieves the best performance for most of the metrics considered when compared with the other three cells; the improvement is particularly pronounced for the retention time. The proposed 4TI represents a compromise in design when compared with the cells of [102], avoiding the bottom performance often encountered for the 4T and 3T1D cells.

By contrast, although the proposed 4T1D1R NVDRAM cell incurs penalties in Read delay and retention time due to the additional non-volatile element, it achieves better performance for most metrics, utilizing the 3T1D DRAM as core. Moreover, it shows the least variability to process. The proposed 4T1RP NVDRAM cell represents a compromise in design when compared to the other cells, achieving a middle ranking for the Read delay and the retention time. However, the utilization of the B3T cell as the DRAM core results in the worst performance.

Chapter 5 proposed two hybrid memory circuits, the MCT and the non-volatile hybrid cell, with the objectives of improving performance and providing a non-volatile storage function. Although the proposed MCT circuit pays a penalty in extra transistors, it offers smaller leakage in its eDRAMs and a longer retention time, which benefits the subsequent low power cache memory design. By contrast, the proposed non-volatile hybrid memory is a compromise design that improves tolerance compared with the previous design.

The novel hybrid memory cache scheme has been designed at the architecture level to utilize the proposed hybrid memory circuits. The proposed scheme takes advantage of the improvements offered by the hybrid circuits, and its performance has been demonstrated with benchmark simulations. Meanwhile, extensive observations of the proposed scheme have shown that the 4-way hybrid memory cache has lower power dissipation and a higher hit rate than the 2-way cache. Therefore, this chapter covers the hybrid memory design from the memory circuit cell to the architectural scheme, and all the designs have been assessed at their respective levels.

In conclusion, this dissertation has investigated low power design techniques and methods to improve tolerance to soft errors, especially SEUs. The research and design work has concentrated on memory circuits in nanoscale CMOS from various perspectives, including SRAM, DRAM, hybrid memory and multiple level memory. Moreover, by implementing novel technologies such as Single-Electron transfer and Resistive RAM (RRAM), the designs are capable of replacing conventional MOSFET-based memory circuits while remaining compatible with the MOSFET fabrication process. The proposed memory circuits have been evaluated with extensive HSPICE simulations. Finally, the novel memory circuit designs have also been demonstrated at the architecture level with SPEC benchmark simulations, which characterize the performance improvement of the implemented cache. In sum, this manuscript presents an investigation of nanometric memory design, verification and characterization, and contributes novel design techniques that implement emerging technologies.

6.2. Future Works

Several directions for future research arising from this dissertation are discussed in this section.

6.2.1. Hardened Design with Noise Tolerance

Noise is a random fluctuation that appears widely in electronic circuits and is one of their basic characteristics. It falls into many categories, such as thermal noise, shot noise, flicker noise, burst noise and avalanche noise, each generated differently by electronic devices. Flicker noise is caused by traps associated with contamination and crystal defects, which randomly capture and release carriers; its energy is concentrated at low frequencies. Meanwhile, specific mechanisms of integrated circuits, such as the resistive switching in Resistive RAM (RRAM), introduce a typical spike noise, against which the circuit must be robust.

Furthermore, crosstalk (XT) noise is the most common source of circuit noise in deep submicron digital designs. This noise is typically a voltage induced on a wire that is at a stable logic value, caused by interconnect capacitive coupling with a switching wire. In sum, when memory arrays are implemented with the proposed memory cell types, especially the MLC circuit, the resistance-switching phenomenon may generate multiple types of noise and degrade performance at array level. Therefore, the memory design must deal with these noise sources and ensure that the memory array is immune to their influence.
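The capacitive coupling described above can be estimated to first order with a charge-sharing divider: a floating victim wire with coupling capacitance Cc to the aggressor and capacitance Cg to ground sees a noise peak of V_swing * Cc / (Cc + Cg). The sketch below assumes this simple model; a victim that is actively driven sees less noise, so the result is an upper bound. All capacitance and margin values are hypothetical.

```python
def coupled_noise_voltage(v_swing, c_coupling, c_ground):
    """Peak noise (V) on a floating victim wire from capacitive charge sharing."""
    return v_swing * c_coupling / (c_coupling + c_ground)


def exceeds_margin(v_noise, noise_margin):
    """True when the induced noise is larger than the available noise margin."""
    return v_noise > noise_margin


if __name__ == "__main__":
    # Hypothetical values: 1 V aggressor swing, 1 fF coupling, 3 fF to ground.
    v_noise = coupled_noise_voltage(1.0, 1e-15, 3e-15)
    print("induced noise (V):", v_noise)
    print("exceeds a 0.2 V margin:", exceeds_margin(v_noise, 0.2))
```

For a multiple-level cell the relevant margin is the spacing between adjacent stored levels rather than a full logic swing, which is why the MLC circuit is singled out as the most sensitive case.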


6.2.2. Hardened Design with MBU Tolerance

In Chapter 3, the research shows effective memory cell designs for dealing with the SEU; however, hardening must address multiple types of upset, including the Multiple Bit Upset (MBU). An MBU that appears at the outputs of the memory (as it is not always fully mitigated by the hardened design of the cells) is usually handled at array level using coding for both error detection and correction, thus adding further overhead in terms of hardware and delay. However, as device size shrinks, the spacing between nodes decreases significantly and the charge generated by a single event strike may diffuse to affect adjacent nodes. Charge sharing/collection causes multiple cells to be upset when high energy cosmic ray neutrons hit a nanoscale memory, resulting in a high number of MBUs. Therefore, a novel approach to MBU tolerance following SEU is required. The novel scheme may utilize the proposed non-volatile storage and rely on adding coding circuitry for detection only (with no error correction, unlike a traditional scheme for MBU tolerance); the correct data in the cell is then retrieved from non-volatile storage using a "Restore" operation. The new scheme is expected to result in a significant reduction in delay and coding hardware.
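The detect-then-restore idea can be sketched at word level as follows. The dissertation does not specify the detection code, so interleaved parity is used here as an assumption: with four interleaved groups, any burst of up to four adjacent flipped bits changes at least one parity bit. The class name, word width and the assumptions that the non-volatile copy and the check bits are immune to the strike are all hypothetical.

```python
def interleaved_parity(word, groups=4, width=16):
    """Return one even-parity bit per interleaved group (bit i is in group i % groups)."""
    bits = [0] * groups
    for i in range(width):
        bits[i % groups] ^= (word >> i) & 1
    return tuple(bits)


class DetectAndRestoreWord:
    """Hypothetical word model: volatile copy, check bits, non-volatile backup."""

    def __init__(self, value):
        self.restores = 0
        self.write(value)

    def write(self, value):
        self.volatile = value
        self.check = interleaved_parity(value)
        self.nonvolatile = value          # assumed immune to the particle strike

    def strike(self, flip_mask):
        self.volatile ^= flip_mask        # upset the volatile bits selected by the mask

    def read(self):
        if interleaved_parity(self.volatile) != self.check:
            self.volatile = self.nonvolatile   # "Restore" replaces error correction
            self.restores += 1
        return self.volatile


if __name__ == "__main__":
    word = DetectAndRestoreWord(0xBEEF)
    word.strike(0b0110)                   # two adjacent bits upset by one strike
    print(hex(word.read()), "restores:", word.restores)
```

Only detection logic sits in the read path; the cost of recovery is paid by the "Restore" operation, and only when an upset is actually detected.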

6.2.3. Hybrid Memory Cache Design with Process Variability Tolerance

The novel hybrid memory cache scheme with a custom access pattern has been proposed in Chapter 5. However, this research can be extended to investigate the influence of process variability on cache performance, since the proposed hybrid memory cell introduces an eDRAM, which has a finite retention time and eventually loses the stored data. A periodic refresh operation rewrites the previously stored state, which brings a power consumption penalty. Meanwhile, process variability also affects the retention time and thereby determines the refresh frequency. This results in variation of the power dissipation and IPC of the hybrid cache. Therefore, the research may begin with the IPC performance distribution for conventional and hybrid cache memories, including the influence of process variability on the eDRAM retention time, to sufficiently characterize the performance trade-offs between different cache implementations.
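The link between retention-time variability and refresh power can be sketched with a small Monte Carlo experiment. The lognormal retention distribution, the single array-wide refresh period set by the worst cell, the guard band and all numeric values are assumptions for illustration; they are not results from this dissertation.

```python
import math
import random


def sample_retention_times(n_cells, median_s, sigma, seed=0):
    """Draw per-cell retention times (s) from a lognormal distribution (an assumption)."""
    rng = random.Random(seed)
    mu = math.log(median_s)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_cells)]


def refresh_power(retention_times, energy_per_cell_refresh_j, guard_band=0.5):
    """Average refresh power (W) when every cell is refreshed at the worst-cell period."""
    period = guard_band * min(retention_times)
    return len(retention_times) * energy_per_cell_refresh_j / period


if __name__ == "__main__":
    # Hypothetical array: 1000 cells, 100 us median retention, 1 fJ per cell refresh.
    for sigma in (0.0, 0.25, 0.5):
        times = sample_retention_times(1000, 100e-6, sigma)
        print("sigma", sigma, "refresh power (W):", refresh_power(times, 1e-15))
```

Because the refresh period is set by the weakest cell, a wider retention distribution raises refresh power even when the median retention time is unchanged, and the same effect would reduce the cycles available to normal accesses and hence the IPC.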


7. REFERENCES

[1] G. Moore, "Cramming more components onto integrated circuit," Electronics, vol. 38, no. 8, pp. 114-117, 1965.
[2] J. B. Kuo and J. H. Lou, "Low-Voltage CMOS VLSI Circuits," John Wiley & Sons, Inc. ISBN: 0-471-32105-2, 1999.
[3] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, "Digital Integrated Circuits: A Design Perspective (Second Edition)," Prentice Hall, ISBN: 0130909963, 2003.
[4] International Solid-State Circuits Conference (ISSCC) Digest, www.isscc.org.
[5] International Technology Roadmap for Semiconductor (ITRS), www.itrs.net.
[6] K.K. Likharev, "Single-electron transistors: Electrostatic analogs of the DC SQUIDS," IEEE Transactions on Magnetics, vol. 23, issue 2, pp. 1142-1145, 1987.
[7] H. Grabert, "Single charge tunneling: A brief introduction," Zietschr. phys. B, vol. 85, pp. 319–325, 1991.
[8] M. H. Devoret, D. Esteve, and C. Urbina, "Single-electron transfer in metallic nanostructures," Nature, vol. 360, pp. 547–553, Dec. 1992.
[9] K. K. Likharev, "Single-electron devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606–632, Apr. 1999.
[10] J. Hoekstra, "On Circuit Theories for Single-Electron Tunneling Devices," IEEE Transactions on Circuit and Systems-I, vol. 54, no. 11, 2007.
[11] W. C. Zhang, and N. J. Wu, "Nanoelectronic Circuit Architectures Based on Single-Electron Turnstiles," 2nd IEEE International Conference, 978-1-4244-1573-1, 2008.
[12] Rick Bailey, Glen Fox, Jarrod Eliason et al., "FRAM Memory Technology – Advantages for Low Power, Fast Write, High Endurance Applications," Proceedings of the 2005 International Conference on Computer Design (ICCD’05), 0-7695-2451-6/05, 2005.
[13] Glen R. Fox, Richard Bailey, William B. Kraus et al., "The Current Status of FeRAM," Topics Appl. Phys. vol. 93, pp. 139–149, 2004.
[14] A.L. Lacaita, "Phase change memories: State-of-the-art, challenges and perspectives," Solid-State Electronics, vol. 50, pp. 24–31, 2006.
[15] H. Akinaga, and H. Shima, "Resistive Random Access Memory (ReRAM) Based on Metal Oxides," Proceedings of the IEEE, pp. 2237-2251, 2010.
[16] T. W. Hickmott, "Low-frequency negative resistance in thin anodic oxide films," J. Appl. Phys., vol. 33, no. 9, pp. 2669–2682, 1962.
[17] B. Govoreanu, G.S. Kar, Y. Y. Chen, et al., "10x10nm Hf/HfOx Crossbar Resistive RAM with excellent performance, reliability and low-energy operation," 2011 IEEE International Electron Devices Meeting (IEDM), pp. 31.6.1-31.6.4, 2011.


[18] S. S. Sheu, et al., "Fast-Write Resistive RAM (RRAM) for Embedded Applications," IEEE Design and Test of Computers, vol. 28, issue 1, pp. 64-71, 2011.
[19] K. Ishibashi, et al., "A 1-V TFT-Load SRAM using a two-step word-volatile method," IEEE J. Sol. St. Ckts., 27(11), 1519-1524, 1992.
[20] H. Mizuno and T. Nagano, "Driving Source-Line (DSL) Cell Architecture for Sub-1-V High-Speed Low-Power Applications," Symp. VLSI Ckts Dig., pp. 25-26, 1995.
[21] H. Mizuno and T. Nagano, "Driving Source-Line Cell Architecture for Sub-1-V High-Speed Low-Power Applications," IEEE J. Sol. St. Ckts., 31(4), 552-557, 1996.
[22] T. Miwa, J. Yamada, H. Koike et al., "NV-SRAM: A Nonvolatile SRAM with Backup Ferroelectric Capacitors," IEEE Journal of Solid-State Circuits, Vol. 36, No. 3, 2001.
[23] M. Takata, K. Nakayama, T. Izumi et al., "Nonvolatile SRAM based on Phase Change," IEEE Non-Volatile Semiconductor Memory Workshop (NVSMW) 2006, pp. 95-96, 2006.
[24] S. Yamamoto, Y. Shuto, S. Sugahara et al., "Nonvolatile SRAM (NV-SRAM) Using Functional MOSFET Merged with Resistive Switching Devices," IEEE 2009 Custom Integrated Circuits Conference (CICC), pp. 531-534, 2009.
[25] S. Rajwade, W. K. Yu, S. Xu et al., "Low Power Nonvolatile SRAM Circuit with integrated low power voltage nanocrystal PMOS flash," 2010 IEEE International SOC Conference (SOCC), pp. 461-466, 2010.
[26] Y. Shuto, S. Yamamoto, S. Sugahara, "Analysis of static noise margin and power-gating efficiency of a new nonvolatile SRAM cell using pseudo-spin-MOSFETs," IEEE 2012 Silicon Nanoelectronics Workshop (SNW), pp. 1-2, 2012.
[27] M. F. Chang, C. H. Chuang, M. P. Chen et al., "Endurance-Aware Circuit Designs of Nonvolatile Logic and Nonvolatile SRAM Using Resistive Memory (Memristor) Device," 2012 17th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 329-334, 2012.
[28] R.C. Baumann, "Soft errors in advanced semiconductor devices-part I: the three radiation sources," IEEE Transactions on Device and Materials Reliability, vol. 5, issue 3, pp. 305-316, 2005.
[29] C. Detcheverry, C. Dachs, E. Lorfevre et al, "SEU Critical Charge and Sensitive Area in a Submicron CMOS Technology," IEEE Transactions on Nuclear Science, vol. 44, pp. 2266-2273, 1997.
[30] Z. Chishti, A. R. Alameldeen, C. Wilkerson, W. Wu, and S.-L. Lu, "Improving cache lifetime reliability at ultra-low voltages," Proc. Annu. IEEE/ACM Int’l Symp. Microarchit., pp. 89–99, 2009.
[31] P. Reviriego, C. Argyrides, J. A. Maestro, and D. K. Pradhan, "Improving memory reliability against soft errors using block parity," IEEE Trans. Nucl. Sci., vol. 58, no. 3, pp. 981–986, Jun. 2011.
[32] A. Sánchez-Macián, P. Reviriego, J. A. Maestro, "Enhanced Detection of Double and Triple Adjacent Errors in Hamming Codes Through Selective Bit Placement," IEEE Trans. Device and Materials Reliability, vol. 12, no. 2, pp. 357–362, Jun. 2012.


[33] D. Radaelli, H. Puchner, S. Wong, S. Daniel, "Investigation of multi-bit upsets in a 150 nm technology SRAM device," IEEE Trans. Nucl. Sci., vol. 52, no. 6, pp. 2433–2437, Dec. 2005.
[34] W. Wei, K. Namba, J. Han and F. Lombardi, "Design of a Non-Volatile 7T SRAM Cell for Instant-on Operation," IEEE Transactions on , vol. 13, issue 5, pp. 905-916, 2014.
[35] W. Wei, K. Namba and F. Lombardi, "Extending Non-Volatile Operation to DRAM Cells," IEEE Access, vol. 1, pp. 758-769, 2013.
[36] W. Wei, K. Namba and F. Lombardi, "New 4T-Based DRAM Cell Designs," Proceedings of IEEE/ACM Great Lakes Symposium on VLSI 2014, pp. 199-204, May 2014.
[37] M. Nicolaidis, R. Perez, and D. Alexandrescu, "Low-cost highly-robust hardened cells using blocking feedback transistors," in Proc. IEEE VTS, pp. 371-376, 2008.
[38] T. Calin, M. Nicolaidis, and R. Velazco, "Upset hardened memory design for submicron CMOS technology," IEEE Tran. Nucl. Sci., vol. 43, no. 12, pp. 2874-2878, 1996.
[39] Y. Shiyanovskii, F. Wolff, and C. Papachristou, "SRAM cell design protected from SEU upsets," 14th IEEE International On-Line Testing Symposium, pp. 169-170, 2008.
[40] Y. Z. Xu, H. Puchner and A. Chatila, O et al, "Process Impact on SRAM alpha-Particle SEU Performance," 42nd IEEE Intern. Reliability Symp., pp. 194-299, 2004.
[41] W. Wei, K. Namba and F. Lombardi, "Designs and Analysis of Non-Volatile Memory Cells for Single Event Upset (SEU) Tolerance," Proceedings of IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems 2014, pp. 69-74, October 2014.
[42] P. Elakkumanan, CC. Tondapu, and R. Sridhar, "A gate leakage reduction strategy for Sub-70 nm memory circuit," in Proc. IEEE Dallas/CAS Workshop, pp. 145-148, 2004.
[43] B. Yu, L. Chang, S. Ahmed, et al., "FinFET scaling to 10 nm gate length," Int. Electron. Devices Meet. 2002; 251–254.
[44] R. Martel, V. Derycke, J. Appenzeller et al., "Carbon nanotube field-effect transistors and logic circuits," ACM SIGDA DAC, 2002.
[45] J. Redwing, T. Mayer, S. Mohney et al., "Semiconductor : building blocks for nanoscale electronics," NSF Nanoscale Science and Engineering Grantees Conference, December 2002.
[46] I. Amlani, A.O. Orlov, G. Toth, et al., "Digital using quantum-dot cellular automata," Science 284 (1999) 289–291.
[47] Y. Chen, G. Y. Jung, D.A.A. Ohlberg, et al., "Nanoscale molecular-switch crossbar circuits," Nanotechnology 14 (2003) 462–468.
[48] Likharev, K.K. "Correlated discrete transfer of single electrons in ultrasmall tunnel junctions," IBM J. Res. Dev. 32, (1), pp. 144–156, 1998.
[49] K. Nishiguchi, H. Inokawa, Y. Ono, A. Fujiwara and Y. Takahashi, "Multilevel memory using single-electron turnstile," Electronics Letters, Vol. 40 No. 4, 2004.


[50] E.S. Soldatov, V.V. Khanin, A.S. Trifonov, S.P. Gubin, et al., "Room temperature molecular single-electron transistor," Phys. Usp. 41 (2) (1998) 202–204.
[51] T. Oya, T. Asai, and Y. Amemiya, "Stochastic resonance in an ensemble of single-electron neuromorphic devices and its application to competitive neural networks," Chaos, Solitons Fractals, vol. 32, pp. 855–861, 2007.
[52] K. C. Smith, "The prospects of multi-valued logic: A technology and applications view," IEEE Trans. Comput., vol. AC-30, no. 9, pp. 619-634, Sep. 1981.
[53] H. Inokawa, A. Fujiwara, and Y. Takahashi, "A multiple-valued logic and memory with combined single-electron and metal-oxide-semiconductor devices," IEEE Trans. Electron Devices, vol. 50, no. 2, pp. 462-470, Feb. 2003.
[54] N. M. Zimmerman, E. Hourdakis, Y. Ono, A. Fujiwara, and Y. Takahashi, "Error mechanisms and rates in tunable-barrier single-electron turnstiles and charge-coupled devices," J. Appl. Phys., vol. 96, pp. 5254–5266.
[55] W. C. Zhang and N. J. Wu, "Smart Universal Multiple-Valued Logic Gates by Transferring Single Electrons," IEEE Transactions on Nanotechnology, vol. 7, no. 4, pp. 440-450, July 2008.
[56] W. C. Zhang and N. J. Wu, "A Novel Hybrid Phase-Locked-Loop Frequency Synthesizer using Single-Electron Devices and CMOS Transistors," IEEE Transactions on Circuits and Systems, vol. 54, no. 11, pp. 2516-2527, Nov. 2007.
[57] Y. Nara, "Scaling challenges of MOSFET for 32 nm node and beyond," VLSI Technology, System, and Application, pp. 72-73, 2009.
[58] Berkeley Predictive Technology Model website [Online], http://www.eas.asu.edu/~ptm
[59] W. Wei, J. Han and F. Lombardi, "A Hybrid Memory Cell Using Single-Electron Transfer," Proc. IEEE/ACM Symposium on Nanoarchitectures, pp. 16-23, San Diego, June 2011.
[60] W. Wei, J. Han and F. Lombardi, "Design and Evaluation of a Hybrid Memory Cell by Single-Electron Transfer," IEEE Transactions on Nanotechnology, vol. 12, issue 1, pp. 57-70, 2013.
[61] G. Lientschnig, I. Weymann, P. Hadley, "Simulating hybrid circuits of single-electron transistors and field-effect transistors," IEEE-NANO, 2002.
[62] W. Wei and F. Lombardi, "A HSPICE model for the single-electron turnstile," Proc. ACM GLSVLSI 2012, pp. 221-226, 2012.
[63] W. Wei, J. Han and F. Lombardi, "Robust HSPICE Modeling of a Single-Electron Turnstile," Journal, vol. 45, issue 4, pp. 394-407, 2014.
[64] J. Hoekstra and J. Guimaraes, "Some outlines of circuit applications for a single-electron 2-island subcircuit," in ProRISC/IEEE 2001, pp. 414-419, Nov. 2001.


[65] R.H. Klunder and J. Hoekstra, "Energy conservation in a circuit with single electron tunnel junctions," in The IEEE international Symposium on Circuits and Systems, Sydney, Australia, May 2001, ISCAS 2001, pp. I-591-I-594.
[66] H. Inokawa and Y. Takahashi, "Experimental and simulation studies of single-electron-transistor-based multiple-valued logic," in Proc. 33rd IEEE Int. Symp. Multiple-Valued-Logic, May 2003, pp. 259-266.
[67] C. Wasshuber, H. Kosina, and S. Selberherr, "SIMON - a simulator for single-electron tunnel devices and circuits," IEEE Trans. Comput. Aided Des., vol. 16, no. 9, pp. 937–944, Sep. 1997.
[68] R. H. Chen, Meeting Abstracts 96-2 (The Electrochem. Soc., Pennington, Pa., 1996) p. 576.
[69] P. R. Gray, P. J. Hurst, S. H. Lewis, and R G Meyer, "Analysis and Design of Analog Integrated Circuits (Fourth Edition ed.)," New York: Wiley. pp. 66–67. ISBN 0-471-32168-0, 2001.
[70] W. C. Zhang, et al., "Transfer and Detection of Single Electrons using Metal-Oxide-Semiconductor Field-Effect-Transistors," IEICE Trans. Electron, vol. E90-C, pp. 943-948, May. 2007.
[71] W. Zwerger and M. Scharpf, "Crossover from Coulomb blockade to ohmic conduction in small tunnel junctions," Zeitschrift für Physik B - Condensed Matter, 85:421-426, 1991.
[72] A. Pavlov, M. Sachdev, "CMOS SRAM Circuit Design and Parametric Test in Nano-Scaled Technologies: Process-Aware SRAM Design and Test," Springer, pp. 40, ISBN: 1402083629, 2008.
[73] J. Wang, S. Nalam, and B. H. Calhoun, "Analyzing Static and Dynamic Write Margin for Nanometer SRAMs," Proc. 2008 ACM/IEEE International Symposium on Low and Design (ISLPED), pp. 129-134, Aug 2008.
[74] S. Nakatal, H. Suzuki et al., "Increasing Static Noise Margin of Single-bit-line SRAM by Lowering Bit-line Voltage during Reading," Proc. 2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1-4, Aug 2011.
[75] K. Pagiamtzis, A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: a tutorial and survey," IEEE J. Solid-State Circuits, vol 41, no. 3, pp. 712-727, Mar. 2006.
[76] S. Choi, K. Sohn, and H.-J. Yoo, "A 0.7 fJ/bit/search, 2.2-ns search time hybrid-type TCAM architecture," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 254–260, Jan. 2005.
[77] K. Degawa, T. Aoki, T. Higuchi, "A high-density ternary content-addressable memory using single-electron transistors," 36th International Symposium on Multiple-Valued Logic, 2006.
[78] O. Turkyilmaz et al., "RRAM-based FPGA for "Normally Off, Instantly On" Applications," Proceedings of 2012 IEEE/ACM International Symposium on Nanoscale Architectures, pp. 101-108, 2012.
[79] J. G. Lee, D. H. Kim et al, "A compact HSPICE macromodel of resistive RAM," IEICE Electronics Express, vol. 4, No. 19, pp. 600-605, 2007.


[80] X. Y. Xue, W. X. Jian et al, "Novel RRAM Programming Technology for Instant-on and High-security FPGAs," 2011 IEEE 9th International Conference on ASIC (ASICON), pp. 291-294, 2011.
[81] C. E. Herdt, "Nonvolatile SRAM – the Next Generation," Nonvolatile Memory Technology Review 1993, pp. 28-31, 1993.
[82] D. Lee et al., "Resistance switching of copper doped MoOx films for nonvolatile memory applications," Appl. Phys. Lett. 90, 122104 (2007).

[83] A. Chen, S. Haddad, Y. C. Wu et al, "Erasing characteristics of Cu2O metal-insulator-metal resistive switching memory," Appl. Phys. Lett., vol. 92, pp. 013503-1-013503-3, 2008.
[84] S. Onkaraiah, J-M Portal, "Using OxRRAM Memories for Improving Communications of Reconfigurable FPGA Architectures," Proceedings of 2011 IEEE/ACM International Symposium on Nanoscale Architecture, pp. 65–69, June 2011.
[85] K. Tsunoda, et al., "Low Power and High Speed Switching of Ti-doped NiO ReRAM under the Unipolar Voltage Source of less than 3V," IEDM Tech. Dig., pp. 767-770, 2007.

[86] B. J. Choi, S. Choi et al, "Study on the resistive switching time of TiO2 thin films," Appl. Phys. Lett., vol. 89, pp. 012906, 2006.
[87] H. Y. Lee, et al., "Low Power and High Speed Bipolar Switching with A Thin Reactive Ti Buffer Layer in Robust HfO2 Based RRAM," IEDM Tech. Dig., 2008.
[88] P. F. Chiu, M. F. Chang, S. S. Sheu, et al., "A low store energy, low VDDmin, nonvolatile 8T2R SRAM with 3D stacked RRAM devices for low power mobile applications," 2010 Symposium on VLSI Circuits (VLSIC), pp. 229-230, 2010.
[89] "Cadence Virtuoso," www.cadence.com.
[90] A. Rubio, J. Figueras, E. I. Vatajelu et al., "Process variability in sub-16nm bulk CMOS technology," Online. Available: http://hdl.handle.neu/2117/15667, 2012.
[91] IEEE IEDM, Presentation, Short Course on Emerging Memories, 2011.
[92] P. E. Dodd and L. W. Massengill, "Basic Mechanisms and Modeling of Single-Event Upset in Digital Microelectronics," IEEE Transactions on Nuclear Science, pp. 583-602, 2003.
[93] F. L. Yang and R. A. Saleh, "Simulation and Analysis of Transient Faults in Digital Circuits," IEEE J. Solid State Circuits, vol. 27, no. 3, pp. 258-264, 1992.
[94] C. Razavipour, A. Afzali-Kusha and M. Pedram, "Design and analysis of two low-power SRAM cell structures," IEEE Trans on VLSI, vol. 17, no. 10, pp. 1551-1555, 2009.
[95] P. Hazucha, C. Svensson, "Impact of CMOS technology scaling on the atmospheric neutron soft error rate," IEEE Transactions on Nuclear Science, vol. 47, no. 6, pp. 2586-2594, 2000.
[96] T. Heijmen, P. Roche, G. Gasiot, "A comprehensive study on the soft-error rate of flip- from 90-nm production libraries," IEEE Transactions on Device and Materials Reliability, vol. 7, no. 1, pp. 84-96, 2007.


[97] T. Heijmen, "Analytical semi-empirical model for SER sensitivity estimation of deep-submicron CMOS circuits," Proceedings of the 11th IEEE International On-Line Testing Symposium (IOTL’05), pp. 3-8, 2005.
[98] T. Heijmen, "Soft-error vulnerability of sub-100-nm flip-flops," Proceedings of the 14th IEEE International On-Line Testing Symposium (IOTL’08), pp. 247-252, 2008.
[99] J. Li, P. Ndai, A. Goel, S. Salahuddin, and K. Roy, "Design paradigm for robust spin-torque transfer magnetic ram (stt mram) from circuit/architecture perspective," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 12, pp. 1710–1723, 2010.
[100] W. Zhao, E. Belhaire, C. Chappert, and J. O. Klein, "Hybrid Spintronic/CMOS circuit design and analysis," LAP LAMBERT Academic Publishing, 2010.
[101] R. Waser, "Resistive Non-Volatile Memory Devices," Microelectronics Engineering, Publ. Co, no. 86, pp. 1925-1928, 2009.
[102] S. Ganapathy, R. Canal, D. Alexandrescu et al., "A Novel Variation-Tolerant 4T-DRAM Cell with Enhanced Soft-Error Tolerance," 2012 IEEE 30th International Conference on ICCD, pp. 472-477, 2012.
[103] B. Cheng, S. Roy, and A. Asenov, "CMOS 6-T SRAM cell design subject to 'atomistic' fluctuations," Solid-State Electronics, vol. 51, no. 4, pp. 565-571, 2007.
[104] W. K. Luk, J. Cai, R. H. Dennard, et al., "A 3-Transistor DRAM Cell with Gated Diode for Enhanced Speed and Retention Time," 2006 Symposium on VLSI Circuits Digest of Technical Papers, pp. 184-185, 2006.
[105] E. Amat, C. G. Almudever, N. Aymerich et al., "Variability mitigation mechanisms in scaled 3T1D-DRAM memories to 22nm and beyond," IEEE Transactions on Device and Materials Reliability (T-DMR), issue 99, pp. 1-6, 2012.
[106] X. Liang, R. Canal, G. Y. Wei et al., "Replacing 6T SRAMs with 3T1D DRAMs in the L1 data cache to combat process variability," IEEE Computer Society on Micro, vol. 28, issue 1, pp. 60-68, 2008.
[107] A. Asenov, A. Huang, "Random dopant induced threshold voltage lowering and fluctuations in sub-0.1 μm MOSFET's: a 3-d 'atomistic' simulation study," IEEE Transactions on Electron Devices, vol. 45, issue 12, pp. 2505-2513, 1998.
[108] J. F. Ziegler, "Terrestrial cosmic rays," IBM Journal of Research and Development, vol. 40, no. 1, 1996.
[109] Y. Tosaka, S. Satoh, K. Suzuki, et al., "Impact of Cosmic Ray Neutron Induced Soft Errors on Advanced Submicron CMOS circuits," 1996 Symposium on VLSI Technology. Digest of Technical Papers, pp. 148-149, 1996.
[110] L. Borucki, G. Schindlbeck, "Impact of DRAM Process Technology on Neutron-Induced Soft Errors," 2007 IEEE International integrated reliability workshop final report, pp. 143-146, 2007.

[111] P. Hazucha and C. Svensson, "Impact of CMOS technology scaling on the atmospheric neutron soft error rate," IEEE Transactions on Nuclear Science, vol. 47, no. 6, pp. 2586-2594, 2000.
[112] G. Schindlbeck, "Types of soft errors in DRAMs," 8th European Conference on Radiation and Its Effects on Components and Systems (RADECS), pp. PE1-1 – PE1-5, 2005.
[113] W. K. Henson, N. Yang, S. Kubicek et al., "Analysis of leakage currents and impact on off-state power consumption for CMOS technology in the 100-nm regime," IEEE Transactions on Electron Devices, vol. 47, no. 7, pp. 1393-1400, 2000.
[114] T. Y. Chan, J. Chen, P. K. Ko, "The impact of gate-induced drain leakage current on MOSFET scaling," 1987 International Electron Devices Meeting, vol. 33, pp. 718-721, 1987.
[115] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits," Proceedings of the IEEE, vol. 91, issue 2, pp. 305-327, 2003.
[116] M. Anis and M. H. Aburahma, "Leakage Current Variability in Nanometer Technologies," Proceedings of the Fifth International Workshop on System-on-Chip for Real-Time Applications, pp. 60-63, 2005.
[117] W. K. Luk and R. H. Dennard, "Gated diode amplifiers," IEEE Transactions on Circuits and Systems II: Express Briefs, pp. 266-270, 2005.
[118] K. Takeda et al., "A Read-Static-Noise-Margin-Free SRAM Cell for Low-Vdd and High-Speed Applications," IEEE JSSC, pp. 113-121, 2006.
[119] L. Chang et al., "Stable SRAM Cell Design for the 32nm Node and Beyond," Symp. VLSI Tech. Dig., pp. 292-293, 2005.
[120] J. C. Koob, S. A. Ung, B. F. Cockburn et al., "Design and Characterization of a Multilevel DRAM," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 9, 2011.
[121] K. C. Chun, P. Jain, J. H. Lee, and C. H. Kim, "A sub-0.9V logic-compatible embedded DRAM with boosted 3T gain cell, regulated bit-line write scheme and PVT-tracking read reference bias," 2009 Symposium on VLSI Circuits Digest of Technical Papers, pp. 134-135, 2009.
[122] K. C. Chun, P. Jain, J. H. Lee, and C. H. Kim, "A 3T gain cell embedded DRAM utilizing preferential boosting for high density and low power on-die caches," IEEE Journal of Solid-State Circuits, vol. 46, no. 6, 2011.
[123] W. S. Yu, R. Huang, S. Xu et al., "SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading," in Proc. ISCA, pp. 247-258, 2011.
[124] A. Valero, J. Sahuquillo, S. Petit et al., "An Hybrid eDRAM/SRAM Macrocell to Implement First-Level Data Caches," Proc. 42nd Ann. IEEE/ACM Int'l Symp. Microarchitecture, 2009.

[125] A. Valero, S. Petit, J. Sahuquillo et al., "Design, performance, and energy consumption of eDRAM/SRAM macrocells for L1 data caches," IEEE Transactions on Computers, vol. 61, no. 9, 2012.
[126] Y. Z. Chang, F. P. Lai, "Dynamic zero-sensitivity scheme for low-power cache memories," IEEE Micro, vol. 25, issue 4, 2005.
[127] J. L. Hennessy, D. Patterson, "Computer Architecture: A Quantitative Approach (Fourth Edition)," Morgan Kaufmann, ISBN: 0123704901, 2006.
[128] Y. Zhang, D. Parikh, K. Sankaranarayanan et al., "HotLeakage: A Temperature-Aware Model of Subthreshold and Gate Leakage for Architects," technical report, Dept. of Computer Science, Univ. of Virginia, 2003.
[129] Standard Performance Evaluation Corporation, http://www.spec.org/cpu2000.
[130] S. Thoziyoor, N. Muralimanohar, J. H. Ahn et al., "CACTI 5.1," technical report, Hewlett-Packard Laboratories, 2008.
