STANDARD CELL LIBRARY DESIGN WITH TRANSISTOR FOLDING USING

65NM TECHNOLOGY BY GLOBAL FOUNDRIES

by

Vibhav Kumarswami Salimath

APPROVED BY SUPERVISORY COMMITTEE:

______Dr. Carl M. Sechen, Chair

______Dr. William Swartz

______Dr. Benjamin Carrion Schaefer

Copyright 2018

Vibhav Kumarswami Salimath

All Rights Reserved

To my family and my teachers

STANDARD CELL LIBRARY DESIGN WITH TRANSISTOR FOLDING USING

65NM TECHNOLOGY BY GLOBAL FOUNDRIES

by

VIBHAV KUMARSWAMI SALIMATH, B.E.

THESIS

Presented to the Faculty of

The University of Texas at Dallas

in Partial Fulfillment

of the Requirements

for the Degree of

MASTER OF SCIENCE IN

ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT DALLAS

May 2018

ACKNOWLEDGMENTS

I want to thank my advisor, Dr. Carl Sechen, for his continuous supervision and guidance. I took Dr. Sechen’s course during my first semester in the master’s degree program. I loved the way he taught and came away from the course with a clear idea of my research interests. Working at the Nanometer Design Lab has been an incredible experience and I am most grateful for this opportunity. Thank you to my friends at Nanometer Design Lab for their valuable input, especially Xiangyu Xu and Qiongdan Huang (Olivia). I wish to specially thank Dr. William Swartz Jr. for providing me with timely help and support. Thank you to Dr. William Swartz Jr. and Dr. Benjamin Carrion Schaefer for serving as the committee members for my defense and providing me with their support and advice. I am extremely grateful to my family and friends for their encouragement, which has motivated me to do my best academically.

April 2018



Vibhav Kumarswami Salimath, MSEE

The University of Texas at Dallas, 2018

ABSTRACT

Supervising Professor: Dr. Carl M. Sechen

We use transistor folding to design some of the cells in the cell library. Transistor folding, also known as fingering of MOSFETs, is used when cells with larger drive strengths are required. With the beta ratio (Wp / Wn) kept fixed, a greater number of transistors are arranged in parallel. This technique is employed whenever a large current must be delivered to the load. Its major advantage is that it drastically reduces resistance: if there are N transistors in parallel, the overall resistance is reduced by a factor of N. Folding is also used to limit the resistance of the gate poly along the width of the transistor. Because the gate poly is driven from one end, design guidelines state a maximum width for a single finger, and folding is the only way to meet this guideline for large transistors. Our physical library has 16 functions, each with several drive sizes, giving a total of 83 cells. These 16 functions comprise simple functions as well as some complex functions. The complex functions were included in the library because they improve synthesis performance. Parameters such as layout area, fall time, rise time, fall transition time, and rise transition time are obtained during library characterization. The designed cells were characterized using Siliconsmart ACE, and we were able to automatically place and route various designs using Cadence Encounter. Static timing analysis was performed using Synopsys PrimeTime.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1 Significance of a Quality Library
1.2 Importance of Transistor Folding
1.3 Literature Review
1.4 Our Work

CHAPTER 2 TRANSISTOR FOLDING
2.1 Introduction to Transistor Folding
2.2 Parasitic Capacitances and Parasitic Resistance
2.3 Two Ways of Transistor Folding

CHAPTER 3 CELL LIBRARY DESIGN
3.1 Inverter
3.2 Buffer
3.3 NAND2
3.4 NAND3
3.5 NOR2
3.6 AOI21
3.7 AOI22


3.8 OAI21
3.9 OAI22
3.10 MUX 2:1
3.11 Mirror-Carry
3.12 Mirror-Sum
3.13 XOR2
3.14 XNOR2
3.15 D Flip-Flop
3.16 Scan Flip-Flop
3.17 Filler

CHAPTER 4 OVERALL DESIGN FLOW
4.1 DRC, LVS, PEX
4.2 Abstract Generation and LEF file
4.3 Library Characterization using Siliconsmart ACE
4.4 Library Compilation using LC Shell
4.5 Gate-Level using Design Vision
4.6 Automatic Place and Route in Encounter
4.7 Static Timing Analysis by Primetime

CHAPTER 5 RESULTS
5.1 DRC, LVS, PEX Results
5.2 Abstract View
5.3 Library Characterization Results


5.4 Library Compilation Results
5.5 Cell Report in Design Vision
5.6 Automatic Place and Route Results

CHAPTER 6 CONCLUSIONS AND FUTURE WORK

REFERENCES

BIOGRAPHICAL SKETCH

CURRICULUM VITAE


LIST OF FIGURES

Figure 2.1 Transistor Layout
Figure 2.2 Transistor Folding Schematic
Figure 2.3 Transistor Folding Layout
Figure 2.4 Parasitic Capacitances
Figure 2.5 (a) Parasitic Resistance of Transistor (b) Parasitic Resistance of Folded Transistor
Figure 2.6 (a) Unfolded Transistor (b) Even Fingered Transistor (c) Odd Fingered Transistor
Figure 3.1 Inverter Schematic
Figure 3.2 (a) Inverter 0.25x (b) Inverter 0.5x (c) Inverter 1x (d) Inverter 2x (e) Inverter 3x (f) Inverter 4x (g) Inverter 6x (h) Inverter 8x (i) Inverter 12x (j) Inverter 16x
Figure 3.3 Buffer Schematic
Figure 3.4 (a) Buffer 0.25x (b) Buffer 0.5x (c) Buffer 1x (d) Buffer 2x (e) Buffer 3x (f) Buffer 4x (g) Buffer 6x (h) Buffer 8x (i) Buffer 12x (j) Buffer 16x
Figure 3.5 NAND2 Schematic
Figure 3.6 (a) NAND2 0.5x (b) NAND2 1x (c) NAND2 2x (d) NAND2 3x (e) NAND2 4x (f) NAND2 6x (g) NAND2 8x
Figure 3.7 NAND3 Schematic
Figure 3.8 (a) NAND3 0.5x (b) NAND3 1x (c) NAND3 2x (d) NAND3 3x (e) NAND3 5x (f) NAND3 7x


Figure 3.9 NOR2 Schematic
Figure 3.10 (a) NOR2 0.5x (b) NOR2 1x (c) NOR2 2x (d) NOR2 3x (e) NOR2 4x (f) NOR2 6x (g) NOR2 8x
Figure 3.11 AOI21 Schematic
Figure 3.12 (a) AOI21 0.5x (b) AOI21 1x (c) AOI21 2x (d) AOI21 3x (e) AOI21 5x (f) AOI21 7x
Figure 3.13 AOI22 Schematic
Figure 3.14 (a) AOI22 0.5x (b) AOI22 1x (c) AOI22 2x (d) AOI22 3x (e) AOI22 5x (f) AOI22 7x
Figure 3.15 OAI21 Schematic
Figure 3.16 (a) OAI21 0.5x (b) OAI21 1x (c) OAI21 2x (d) OAI21 3x (e) OAI21 5x (f) OAI21 7x
Figure 3.17 OAI22 Schematic
Figure 3.18 (a) OAI22 0.5x (b) OAI22 1x (c) OAI22 2x (d) OAI22 3x (e) OAI22 5x (f) OAI22 7x
Figure 3.19 (a) Transmission gate MUX 2:1 (b) Static Timing Analysis viable MUX 2:1 (c) Final Design of MUX 2:1
Figure 3.20 MUX 2:1 Schematic
Figure 3.21 (a) MUX 2:1 0.5x (b) MUX 2:1 1x (c) MUX 2:1 2x (d) MUX 2:1 3x (e) MUX 2:1 5x
Figure 3.22 (a) Most Area Efficient Adder (b) Fast Adder (c) Comparison among different full-adders in a 16 column 16:2 compressor


Figure 3.23 Mirror-Carry Schematic
Figure 3.24 (a) Mirror-Carry 0.5x (b) Mirror-Carry 1x (c) Mirror-Carry 2x (d) Mirror-Carry 3x (e) Mirror-Carry 5x
Figure 3.25 Mirror-Sum Schematic
Figure 3.26 (a) Mirror-Sum 0.5x (b) Mirror-Sum 1x (c) Mirror-Sum 2x (d) Mirror-Sum 3x (e) Mirror-Sum 5x
Figure 3.27 XOR2 Block Diagram
Figure 3.28 XOR2 Schematic
Figure 3.29 XOR2 Layout
Figure 3.30 XNOR2 Block Diagram
Figure 3.31 XNOR2 Schematic
Figure 3.32 XNOR2 Layout
Figure 3.33 D Flip-Flop Design
Figure 3.34 D Flip-Flop Schematic
Figure 3.35 D Flip-Flop Layout
Figure 3.36 Scan Flip-Flop Block Diagram
Figure 3.37 Scan Flip-Flop Schematic
Figure 3.38 Scan Flip-Flop Layout
Figure 3.39 Filler Layout
Figure 4.1 ASIC Design Flow


Figure 5.1 (a) DRC Window with Error
Figure 5.1 (b) DRC Clean Window
Figure 5.2 (a) LVS Errors Window
Figure 5.2 (b) LVS Clean Window
Figure 5.3 Contents of PEX File
Figure 5.4 Abstract View of D Flip-Flop
Figure 5.5 Contents of AOI21 Instance File
Figure 5.6 Library Characterization Output
Figure 5.7 Library Compiler Output
Figure 5.8 Cell Report
Figure 5.9 Terminal Window after Nanoroute
Figure 5.10 (a) Design after Nanoroute
Figure 5.10 (b) Design Zoomed In
Figure 5.11 (a) LEF import successful
Figure 5.11 (b) DEF import successful
Figure 5.12 (a) Imported Design in Cadence
Figure 5.12 (b) Imported Design in Cadence Zoomed In


LIST OF TABLES

Table 3.1 Different Full Adders
Table 3.2 Implementation of XOR2 and XNOR2
Table 3.3 Cell Library Cells Summary
Table 5.1 Area of All 1x Cells in Library


CHAPTER 1

INTRODUCTION

1.1 Significance of a Quality Library

Library quality is the most important factor in getting an ASIC design to converge on its design specifications. When designing a library, one should make sure that each gate is offered with the best possible selection of sizes and beta ratios (the ratio of pMOS width to nMOS width). The library should also contain an optimal set of gate types to implement the required logic functions. To meet the performance requirements of a design, a library with continuous transistor sizes for every gate would be ideal, but a library with a large number of sizes is not pragmatic because it would be expensive to produce and maintain. Thus, only the essential gate sizes should be chosen, such that the impact on design performance is minimal.

1.2 Importance of Transistor Folding

Among the various design metrics, timing delay is an important one. To meet timing requirements, we need transistors with a range of current driving abilities. Increasing transistor size reduces circuit delay but increases chip area, which makes it difficult to strike a balance between the two. To use chip area efficiently, transistor folding should be employed. With transistor folding, the height of the cells is kept uniform, which reduces the total layout area. Additionally, folding significantly reduces the resistance of the transistor: because the transistors are arranged in parallel with one another, the resistance decreases N-fold if there are N transistors in parallel. Every process node also has a guideline that states the maximum width of a single finger, and transistor folding is the only way to meet this guideline for large transistors.

1.3 Literature Review

Research on standard cell library content and transistor folding has long been underway. Various methodologies for optimizing the library have been discussed [1, 10-11]. Using two cell libraries, one for physical design and another for synthesis, a methodology was proposed to determine the contents of the most power-efficient physical library [1]. The results suggested that a library with drive sizes of 0.5x, 1x, 2x, 3x, and 4x and 3-4 beta ratios is the simplest library that has near-optimal power efficiency. Algorithms have been presented to reduce the layout area using the transistor folding technique [2-8]. Designing the physical library with just 8 functions, together with a much more complex synthesis library, resulted in a dynamic power reduction of 25%-35% and a leakage power reduction of 50%-70% for large blocks [10]. Using reversible gates such as the Feynman, Fredkin, Toffoli, and Peres gates, a standard cell library with low power dissipation was characterized [11].

An algorithm with time complexity $O(K^3 L^3)$ was proposed to minimize area, where K is the number of implementations of each transistor due to folding, and L is the channel length [4]. Another efficient algorithm was put forward to find the optimal transistor folding sizes in row-based designs; for a CMOS circuit with m pairs of pMOS and nMOS transistors, this algorithm finds the optimal folding size in $O(m^2 \log m)$ time by effectively reducing the solution space [6]. Another technique has been proposed for pull-up transistor folding [9]. Following the standard cell approach does not fully optimize the layouts of the circuits, due to the restricted number of cells present in the library. This problem is addressed by an open source automatic synthesis tool called ASTRAN, which helps generate layouts with unrestricted cell structures that are very similar to the designed layouts. Transistor folding is an important step in the flow followed by this tool. The results after transistor folding using this tool showed a substantial decrease in cell area [7].

1.4 Our Work

Here, we use the result from [1] that the drive sizes of 0.5x, 1x, 2x, 3x, and 4x are sufficient for a power-efficient library design. Our design consists of 16 functions: Inverter, Buffer, NAND2, NAND3, NOR2, AOI21, AOI22, OAI21, OAI22, MUX 2:1, Mirror-Carry, Mirror-Sum, XOR2, XNOR2, D Flip-Flop, and Scan Flip-Flop. We added further drives of 0.25x, 6x, 8x, 12x, and 16x to the simpler cells, inverter and buffer. As detailed in Chapter 3, we added drives 6x and 8x to NAND2 and NOR2, drives 5x and 7x to NAND3, AOI21, AOI22, OAI21, and OAI22, and drive 5x to MUX 2:1, Mirror-Carry, and Mirror-Sum. These additional drives compensate for the loss of the 3-4 beta ratios required for a power-efficient design: because all of these drives are designed with transistor folding, the library has a single beta ratio. XOR2, XNOR2, D Flip-Flop, and Scan Flip-Flop are designed with just one drive size of 1x, because including larger drive sizes of these cells would not make the library simpler. We designed our physical library in Cadence Virtuoso using the 65nm technology by Global Foundries. A total of 83 cells were designed, with drive sizes from 0.25x to 16x. Library characterization was done using Synopsys Siliconsmart ACE. A number of different designs were automatically placed and routed using Cadence Encounter. The netlist was generated using Synopsys Design Vision. Static timing analysis was done using PrimeTime.
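The total of 83 cells can be cross-checked against the drive sizes listed for each function in Sections 3.1 to 3.16. The following is a minimal sketch of that tally; the dictionary layout and the name `library_size` are illustrative assumptions, not part of the design flow.

```python
# Drive sizes per function, transcribed from Sections 3.1 to 3.16.
TEN_DRIVES = ["0.25x", "0.5x", "1x", "2x", "3x", "4x", "6x", "8x", "12x", "16x"]
SEVEN_DRIVES = ["0.5x", "1x", "2x", "3x", "4x", "6x", "8x"]
SIX_DRIVES = ["0.5x", "1x", "2x", "3x", "5x", "7x"]
FIVE_DRIVES = ["0.5x", "1x", "2x", "3x", "5x"]
ONE_DRIVE = ["1x"]

DRIVES = {
    "Inverter": TEN_DRIVES,
    "Buffer": TEN_DRIVES,
    "NAND2": SEVEN_DRIVES,
    "NAND3": SIX_DRIVES,
    "NOR2": SEVEN_DRIVES,
    "AOI21": SIX_DRIVES,
    "AOI22": SIX_DRIVES,
    "OAI21": SIX_DRIVES,
    "OAI22": SIX_DRIVES,
    "MUX 2:1": FIVE_DRIVES,
    "Mirror-Carry": FIVE_DRIVES,
    "Mirror-Sum": FIVE_DRIVES,
    "XOR2": ONE_DRIVE,
    "XNOR2": ONE_DRIVE,
    "D Flip-Flop": ONE_DRIVE,
    "Scan Flip-Flop": ONE_DRIVE,
}


def library_size(drives):
    """Return the number of cells, i.e. one cell per (function, drive) pair."""
    return sum(len(sizes) for sizes in drives.values())


print(len(DRIVES), library_size(DRIVES))  # 16 83
```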


CHAPTER 2

TRANSISTOR FOLDING

2.1 Introduction to Transistor Folding

The MOS transistor’s performance changes with the change in its channel length (L) and channel width (W). If a transistor is operating in saturation mode, then its drain current (ID) is given by:

$$I_D = K \cdot \frac{W}{L} \cdot (V_{GS} - V_t)^2 \cdot (1 + \lambda V_{DS}) \tag{2.1.1}$$

K and λ can be taken as process technology constants. VGS is the gate-to-source voltage, VDS is the drain-to-source voltage, and Vt is the threshold voltage. Because the current depends on the ratio W/L, this equation indicates how the aspect ratio of the transistor can be modified.

Let us consider a transistor layout with channel width (W) 20 um, as shown in Figure 2.1.

Figure 2.1 Transistor Layout

This has an awkward aspect ratio, which can be fixed using transistor folding. According to the transistor current equation (2.1.1), a single transistor with a width of 20 um is equivalent to four transistors connected in parallel, each with a width of 5 um. Figure 2.2 and Figure 2.3 explain the transistor folding concept through the schematic and the layout, respectively.

Figure 2.2 Transistor Folding Schematic

The layout shown below explains the transistor folding.

Figure 2.3 Transistor Folding Layout
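The equivalence between one 20 um transistor and four 5 um fingers follows directly from Equation (2.1.1), since the current is linear in W. The sketch below illustrates this numerically; the values chosen for K, L, Vt, λ, and the bias voltages are placeholders assumed for illustration and are not taken from the 65nm process.

```python
def drain_current(k, w, l, vgs, vt, lam, vds):
    """Saturation drain current following Equation (2.1.1)."""
    return k * (w / l) * (vgs - vt) ** 2 * (1 + lam * vds)


# Placeholder parameters (assumed for illustration only).
K = 100e-6   # process constant, A/V^2
L = 0.06     # channel length, um
VT = 0.4     # threshold voltage, V
LAM = 0.1    # channel length modulation, 1/V
VGS = 1.0    # gate-to-source voltage, V
VDS = 1.0    # drain-to-source voltage, V

# One unfolded transistor with W = 20 um.
unfolded = drain_current(K, 20.0, L, VGS, VT, LAM, VDS)

# Four fingers in parallel, each with W = 5 um: the currents add.
folded = 4 * drain_current(K, 5.0, L, VGS, VT, LAM, VDS)

print(unfolded, folded)
```

Under this first-order model the two currents agree up to floating-point rounding, so folding changes the shape of the device without changing its drive.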


2.2 Parasitic Capacitances and Parasitic Resistance

The parasitic capacitances associated with a transistor are shown in Figure 2.4.

Figure 2.4 Parasitic Capacitances

Here, the capacitances j, jsw, gb, db, and sb are the junction, junction sidewall, gate-to-bulk, drain-to-bulk, and source-to-bulk capacitances, respectively. The parasitic capacitance values change if transistor folding is employed, especially the source-to-bulk and drain-to-bulk capacitances.

Fingering reduces resistance drastically. If the resistance of the transistor in Figure 2.5 (a) is R and the transistor is folded into N fingers, the resistances of the fingers appear in parallel. Hence, the resistance is reduced by a factor of N, as seen in Figure 2.5 (b).

Figure 2.5 (a) Parasitic Resistance of Transistor (b) Parasitic Resistance of Folded Transistor
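The factor-of-N reduction is the usual parallel-resistance result for N equal branches. A minimal sketch, assuming each of the N branches has the same resistance R as in the text (the function names are illustrative):

```python
from fractions import Fraction


def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1 / sum(Fraction(1) / Fraction(r) for r in resistances)


def folded_resistance(r, n):
    """Equivalent resistance of n parallel branches, each of resistance r."""
    return parallel([r] * n)


# Four parallel branches of 100 ohms each behave like a single 25 ohm branch.
print(folded_resistance(100, 4))  # 25
```

Exact fractions are used so that the R/N relationship can be checked without floating-point error.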


2.3 Two Ways of Transistor Folding

A transistor can be folded in two ways: with an even number of fingers or with an odd number of fingers. A transistor with channel width W has finger widths of W/2, W/4, W/6, etc. when folded into an even number of fingers, and W/3, W/5, W/7, etc. when folded into an odd number of fingers. Figure 2.6 (a) shows the unfolded transistor, Figure 2.6 (b) shows transistor folding with an even number of fingers, and Figure 2.6 (c) shows transistor folding with an odd number of fingers.

Figure 2.6 (a) Unfolded Transistor (b) Even Fingered Transistor (c) Odd Fingered Transistor
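Combining the even and odd folding options with the maximum finger width guideline from Chapter 1, the number of fingers for a given transistor can be chosen as in the sketch below. The 5 um limit used in the example is a hypothetical value, not the guideline of the 65nm process, and the helper names are assumptions.

```python
def min_fingers(width_nm, max_finger_nm, parity=None):
    """Smallest finger count whose finger width does not exceed the limit.

    parity may be None, "even", or "odd" to force the folding style.
    """
    n = -(-width_nm // max_finger_nm)  # integer ceiling division
    if parity == "even" and n % 2 == 1:
        n += 1
    if parity == "odd" and n % 2 == 0:
        n += 1
    return n


def finger_width_nm(width_nm, fingers):
    """Width of each finger when the total width is split evenly."""
    return width_nm / fingers


# A 20 um transistor with a hypothetical 5 um single-finger limit.
n_even = min_fingers(20000, 5000, parity="even")
n_odd = min_fingers(20000, 5000, parity="odd")
print(n_even, finger_width_nm(20000, n_even))  # 4 5000.0
print(n_odd, finger_width_nm(20000, n_odd))    # 5 4000.0
```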


CHAPTER 3

CELL LIBRARY DESIGN

The standard cell libraries of today are difficult to create and maintain because they comprise thousands of standard cells. The growing demand for power minimization makes it very important to examine whether the functions in the library are power efficient. To deal with these issues, a physical library containing only the cells with necessary functions is designed. The cells that must be included in any cell library are Inverter, Buffer, NAND2, NAND3, NOR2, AOI21, AOI22, OAI21, and OAI22, because more complex cells are built from these simple cells. The other cells that are needed are MUX 2:1, a full adder consisting of Mirror-Carry and Mirror-Sum, XOR2, XNOR2, D Flip-Flop, and Scan Flip-Flop. The minimum cell height is found to be 4.94um. The pin pitch was set to 0.26um and the offset to 0.13um. The β ratio (Wp / Wn) is set to 1.5 for the entire design.
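The numbers above are mutually consistent: the 4.94um cell height is exactly 19 pin pitches, and the offset is half a pitch. The sketch below works through this arithmetic in integer nanometers. Treating pin positions as offset plus a whole number of pitches is an assumption about how the grid is used, and the helper names are illustrative.

```python
PITCH_NM = 260         # pin pitch, 0.26um
OFFSET_NM = 130        # offset, 0.13um
CELL_HEIGHT_NM = 4940  # cell height, 4.94um
BETA = 1.5             # Wp / Wn

# Number of pitches that fit in the cell height.
pitches_per_height = CELL_HEIGHT_NM // PITCH_NM


def grid_position_nm(index):
    """Assumed grid position: the offset plus a whole number of pitches."""
    return OFFSET_NM + index * PITCH_NM


def pmos_width_nm(nmos_width_nm):
    """pMOS width implied by the fixed beta ratio."""
    return BETA * nmos_width_nm


print(pitches_per_height, CELL_HEIGHT_NM % PITCH_NM)  # 19 0
print(grid_position_nm(0), grid_position_nm(1))       # 130 390
print(pmos_width_nm(200))                             # 300.0
```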

3.1 Inverter

The inverter function is given as, out = ~in; where in is input and out is the output.

Inverter is designed with 10 different drive sizes: 0.25x, 0.5x, 1x, 2x, 3x, 4x, 6x, 8x, 12x, and 16x.

This has the simplest design. The schematic and layouts are shown in Figure 3.1 and Figure 3.2 respectively.

Figure 3.1 Inverter Schematic


The drives 2x, 3x, 4x, 6x, 8x, 12x, and 16x are designed with transistor folding.

Figure 3.2 (a) Inverter 0.25x (b) Inverter 0.5x (c) Inverter 1x (d) Inverter 2x (e) Inverter 3x (f) Inverter 4x (g) Inverter 6x (h) Inverter 8x (i) Inverter 12x (j) Inverter 16x

3.2 Buffer

The buffer function is given as, out = ~(~in); where in is input and out is the output.

Buffer is designed with 10 different drive sizes: 0.25x, 0.5x, 1x, 2x, 3x, 4x, 6x, 8x, 12x, and 16x.

This has 2 inverters in series. The schematic and layouts are shown in Figure 3.3 and Figure 3.4 respectively.

Figure 3.3 Buffer Schematic

The drives 2x, 3x, 4x, 6x, 8x, 12x, and 16x are designed with transistor folding.

Figure 3.4 (a) Buffer 0.25x (b) Buffer 0.5x (c) Buffer 1x (d) Buffer 2x (e) Buffer 3x (f) Buffer 4x (g) Buffer 6x (h) Buffer 8x (i) Buffer 12x (j) Buffer 16x

3.3 NAND2

The NAND2 function is defined as, out = ~(a & b); where a and b are inputs and out is output.

NAND2 is designed with 7 different drive sizes: 0.5x, 1x, 2x, 3x, 4x, 6x, and 8x. The schematic and layouts are shown in Figure 3.5 and Figure 3.6 respectively.

Figure 3.5 NAND2 Schematic


The drives 2x, 3x, 4x, 6x, and 8x are designed with transistor folding.

Figure 3.6 (a) NAND2 0.5x (b) NAND2 1x (c) NAND2 2x (d) NAND2 3x (e) NAND2 4x (f) NAND2 6x (g) NAND2 8x

3.4 NAND3

The NAND3 function is defined as, out = ~(a & b & c); where a, b, and c are inputs and out is output. NAND3 is designed with 6 different drive sizes: 0.5x, 1x, 2x, 3x, 5x, and 7x. Here, odd fingers 5x and 7x are used because there is a diffusion break at 2x drive in nMOS Euler trail. The schematic and layouts are shown in Figure 3.7 and Figure 3.8 respectively.

Figure 3.7 NAND3 Schematic


The drives 2x, 3x, 5x, and 7x are designed with transistor folding.

Figure 3.8 (a) NAND3 0.5x (b) NAND3 1x (c) NAND3 2x (d) NAND3 3x (e) NAND3 5x (f) NAND3 7x

3.5 NOR2

The NOR2 function is defined as, out = ~(a + b); where a and b are inputs and out is output.

NOR2 is designed with 7 different drive sizes: 0.5x, 1x, 2x, 3x, 4x, 6x, and 8x. The schematic and layouts are shown in Figure 3.9 and Figure 3.10 respectively.

Figure 3.9 NOR2 Schematic


The drives 2x, 3x, 4x, 6x, and 8x are designed with transistor folding.

Figure 3.10 (a) NOR2 0.5x (b) NOR2 1x (c) NOR2 2x (d) NOR2 3x (e) NOR2 4x (f) NOR2 6x (g) NOR2 8x

3.6 AOI21

The AOI21 function is defined as, out = ~((a & b) + c); where a, b, and c are inputs and out is output. AOI21 is designed with 6 different drive sizes: 0.5x, 1x, 2x, 3x, 5x, and 7x. Here, odd fingers 5x and 7x are used because there is a diffusion break at 2x drive in nMOS Euler trail. The schematic and layouts are shown in Figure 3.11 and Figure 3.12 respectively.

Figure 3.11 AOI21 Schematic

The drives 2x, 3x, 5x, and 7x are designed with transistor folding.

Figure 3.12 (a) AOI21 0.5x (b) AOI21 1x (c) AOI21 2x (d) AOI21 3x (e) AOI21 5x (f) AOI21 7x

3.7 AOI22

The AOI22 function is defined as, out = ~((a & b) + (c & d)); where a, b, c, and d are inputs and out is output. AOI22 is designed with 6 different drive sizes: 0.5x, 1x, 2x, 3x, 5x, and 7x. Here, odd fingers 5x and 7x are used because there is a diffusion break at 2x drive in nMOS Euler trail. The schematic and layouts are shown in Figure 3.13 and Figure 3.14 respectively.

Figure 3.13 AOI22 Schematic

The drives 2x, 3x, 5x, and 7x are designed with transistor folding.

Figure 3.14 (a) AOI22 0.5x (b) AOI22 1x (c) AOI22 2x (d) AOI22 3x (e) AOI22 5x (f) AOI22 7x

3.8 OAI21

The OAI21 function is defined as, out = ~((a + b) & c ); where a, b, and c are inputs and out is output. OAI21 is designed with 6 different drive sizes: 0.5x, 1x, 2x, 3x, 5x, and 7x. Here, odd fingers 5x and 7x are used because there is a diffusion break at 2x drive in pMOS Euler trail. The schematic and layouts are shown in Figure 3.15 and Figure 3.16 respectively.

Figure 3.15 OAI21 Schematic

The drives 2x, 3x, 5x, and 7x are designed with transistor folding.

Figure 3.16 (a) OAI21 0.5x (b) OAI21 1x (c) OAI21 2x (d) OAI21 3x (e) OAI21 5x (f) OAI21 7x

3.9 OAI22

The OAI22 function is defined as, out = ~((a + b) & (c + d) ); where a, b, c, and d are inputs and out is output. OAI22 is designed with 6 different drive sizes: 0.5x, 1x, 2x, 3x, 5x, and 7x. Here, odd fingers 5x and 7x are used because there is a diffusion break at 2x drive in pMOS Euler trail.

The schematic and layouts are shown in Figure 3.17 and Figure 3.18 respectively.
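The four compound gates of Sections 3.6 to 3.9 can be written as short reference models, which is convenient for checking a netlist or testbench against the stated functions. The sketch below is an illustrative behavioral model only; it also checks the De Morgan duality between the AOI and OAI forms, which mirrors the fact that the diffusion break moves from the nMOS to the pMOS Euler trail.

```python
from itertools import product


def aoi21(a, b, c):
    return not ((a and b) or c)


def aoi22(a, b, c, d):
    return not ((a and b) or (c and d))


def oai21(a, b, c):
    return not ((a or b) and c)


def oai22(a, b, c, d):
    return not ((a or b) and (c or d))


def duality_holds():
    """Check OAI(x) == NOT AOI(NOT x) over every input combination."""
    for a, b, c in product([False, True], repeat=3):
        if oai21(a, b, c) != (not aoi21(not a, not b, not c)):
            return False
    for a, b, c, d in product([False, True], repeat=4):
        if oai22(a, b, c, d) != (not aoi22(not a, not b, not c, not d)):
            return False
    return True


print(duality_holds())  # True
```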

Figure 3.17 OAI22 Schematic

The drives 2x, 3x, 5x, and 7x are designed with transistor folding.

Figure 3.18 (a) OAI22 0.5x (b) OAI22 1x (c) OAI22 2x (d) OAI22 3x (e) OAI22 5x (f) OAI22 7x

3.10 MUX 2:1

The MUX 2:1 function is given as out = ((~s & a) + (s & b)), where s is the select input, a and b are the data inputs, and out is the output. A MUX 2:1 is usually implemented using either pass transistor logic or transmission gates. The transmission gate MUX 2:1 is shown in Figure 3.19 (a). Static timing analysis tools require purely capacitive inputs, hence the MUX 2:1 is redesigned as in Figure 3.19 (b). The encircled inverter and transmission gate pair is equivalent to a tri-state inverter, so the final design is as shown in Figure 3.19 (c).

Figure 3.19 (a) Transmission gate MUX 2:1 (b) Static Timing Analysis viable MUX 2:1 (c) Final Design of MUX 2:1

MUX 2:1 is designed with 5 different drive sizes: 0.5x, 1x, 2x, 3x, and 5x. Here, odd finger 5x is used because there are 2 diffusion breaks at 2x drive. The schematic and layouts are shown in Figure 3.20 and Figure 3.21 respectively.

Figure 3.20 MUX 2:1 Schematic

The drives 2x, 3x, and 5x are designed with transistor folding.

Figure 3.21 (a) MUX 2:1 0.5x (b) MUX 2:1 1x (c) MUX 2:1 2x (d) MUX 2:1 3x (e) MUX 2:1 5x

Full adders are necessary cells in the library. There are two types of full adders. One is the most area-efficient adder, consisting of mirror-carry and mirror-sum, as shown in Figure 3.22 (a); the other is the fast adder, consisting of 2 XOR2s, as shown in Figure 3.22 (b). The two types are compared in Table 3.1. They were also compared in a 16-column 16:2 compressor, with the results shown in Figure 3.22 (c). Thus, the cell library must include the mirror-based full adder, since it has a substantial power and area advantage for longer delays.

Figure 3.22 (a) Most Area Efficient Adder (b) Fast Adder (c) Comparison among different full-adders in a 16 column 16:2 compressor

Table 3.1 Different Full Adders

Type   Structure                    Transistors   Diffusion Breaks
1      Mirror Carry + Mirror Sum    28            3
2      XOR2                         32            5

3.11 Mirror-Carry

Mirror-carry is used over other designs because it has a smaller number of effective transistors and its energy is 30% lower for larger delays. Its function is defined as out = ~(~((a & b) + (c & (a + b)))), where a, b, and c are inputs and out is the output. Mirror-carry is designed with 5 different drive sizes: 0.5x, 1x, 2x, 3x, and 5x. Here, odd finger 5x was used because there are 2 diffusion breaks. The schematic and layouts are shown in Figure 3.23 and Figure 3.24 respectively.

Figure 3.23 Mirror-Carry Schematic


The drives 2x, 3x, and 5x are designed with transistor folding.

Figure 3.24 (a) Mirror-Carry 0.5x (b) Mirror-Carry 1x (c) Mirror-Carry 2x (d) Mirror-Carry 3x (e) Mirror-Carry 5x

3.12 Mirror-Sum

Similar to mirror-carry, mirror-sum is used because of its smaller number of effective transistors and its 30% lower energy for larger delays. Its function is defined as out = ~(~((a & b & c) + ((~((a & b) + (c & (a + b)))) & (a + b + c)))), where a, b, and c are inputs and out is the output. Mirror-sum is designed with 5 different drive sizes: 0.5x, 1x, 2x, 3x, and 5x. Here, odd finger 5x is used because there are 4 diffusion breaks. The schematic and layouts are shown in Figure 3.25 and Figure 3.26 respectively.
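As a sanity check on the mirror adder, the carry can be modeled as the majority term (a & b) + (c & (a + b)) and the sum as (a & b & c) plus the inverted carry ANDed with (a + b + c). The sketch below is a behavioral model only, with illustrative function names; it compares both outputs against ordinary binary addition for all eight input combinations.

```python
from itertools import product


def mirror_carry(a, b, c):
    """Carry output: true when at least two inputs are true."""
    return (a and b) or (c and (a or b))


def mirror_sum(a, b, c):
    """Sum output built from the inverted carry, as in the mirror adder."""
    carry_bar = not mirror_carry(a, b, c)
    return (a and b and c) or (carry_bar and (a or b or c))


def matches_full_adder():
    """Compare both outputs with a + b + c for every input combination."""
    for a, b, c in product([False, True], repeat=3):
        total = int(a) + int(b) + int(c)
        if mirror_carry(a, b, c) != (total >= 2):
            return False
        if mirror_sum(a, b, c) != (total % 2 == 1):
            return False
    return True


print(matches_full_adder())  # True
```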


Figure 3.25 Mirror-Sum Schematic

The drives 2x, 3x, and 5x are designed with transistor folding.

Figure 3.26 (a) Mirror-Sum 0.5x (b) Mirror-Sum 1x (c) Mirror-Sum 2x (d) Mirror-Sum 3x (e) Mirror-Sum 5x

3.13 XOR2

The function of XOR2 is out = ((~a) & b) + (a & (~b)), where a and b are inputs and out is the output. The implementation using NOR2 and AOI21 follows from XOR2 = a’b + ab’ = ((a’b + ab’)’)’ = (ab + a’b’)’ = (ab + (a+b)’)’ = (ab + c)’, where c = (a + b)’. Table 3.2 shows the implementation of XOR2 and XNOR2. This is a better design because its number of effective transistors is smaller than that of other designs.

Table 3.2 Implementation of XOR2 and XNOR2

Cell    Structure        Transistors   Diffusion Breaks
XOR2    NOR2 + AOI21     10            1
XNOR2   NAND2 + OAI21    10            1

The block diagram is shown in Figure 3.27, and this was designed with only one drive size 1x.

The schematic and layouts are shown in Figure 3.28 and Figure 3.29 respectively.

Figure 3.27 XOR2 Block Diagram

Figure 3.28 XOR2 Schematic


Figure 3.29 XOR2 Layout

3.14 XNOR2

The function of XNOR2 is out = (((~a) & (~b)) + (a & b)), where a and b are inputs and out is the output. The implementation using NAND2 and OAI21 follows from XNOR2 = a’b’ + ab = ((a’b’ + ab)’)’ = ((a’b’)’ (ab)’)’ = (((a’)’ + (b’)’) (ab)’)’ = ((a+b) c)’, where c = (ab)’. The block diagram is shown in Figure 3.30. XNOR2 is designed with only one drive size, 1x. The schematic and layouts are shown in Figure 3.31 and Figure 3.32 respectively.
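The two-gate decompositions of Sections 3.13 and 3.14 (NOR2 feeding AOI21 for XOR2, NAND2 feeding OAI21 for XNOR2) can be confirmed by exhaustive truth-table comparison. The following is a minimal behavioral sketch with illustrative function names, not a model of the actual schematics.

```python
from itertools import product


def nor2(a, b):
    return not (a or b)


def nand2(a, b):
    return not (a and b)


def aoi21(a, b, c):
    return not ((a and b) or c)


def oai21(a, b, c):
    return not ((a or b) and c)


def xor2(a, b):
    """XOR2 = (ab + c)' with c = (a + b)'."""
    return aoi21(a, b, nor2(a, b))


def xnor2(a, b):
    """XNOR2 = ((a + b) c)' with c = (ab)'."""
    return oai21(a, b, nand2(a, b))


for a, b in product([False, True], repeat=2):
    print(int(a), int(b), int(xor2(a, b)), int(xnor2(a, b)))
```

The printed columns are a, b, XOR2, and XNOR2; the last two should always be complements of each other.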

Figure 3.30 XNOR2 Block Diagram


Figure 3.31 XNOR2 Schematic


Figure 3.32 XNOR2 Layout


3.15 D Flip-Flop

The D flip-flop is negative-edge triggered. The design has 2 diffusion breaks and is shown in Figure 3.33. The functionality is given as Q = D. Q is the output, sometimes called the next state; D is the input, also called the present state; R is reset; Φ is the clock input. Only one drive, 1x, is designed for the D flip-flop.

Figure 3.33 D Flip-Flop Design


The schematic and layout are shown in Figure 3.34 and Figure 3.35 respectively.

Figure 3.34 D Flip-Flop Schematic


Figure 3.35 D Flip-Flop Layout


3.16 Scan Flip-Flop

The Scan flip-flop is implemented using a combination of a MUX 2:1 and a D flip-flop. The output of the MUX 2:1 drives the input (D) of the D flip-flop. The design has 3 diffusion breaks. The block diagram is shown in Figure 3.36, the schematic in Figure 3.37, and the layout in Figure 3.38. Only one drive size, 1x, is designed. The functionality of this design is given by Q = ((~s & D) + (s & SI)), where Q is the output (also called the next state), s is the scan enable input, D is the input (or present state), SI is the scan in, R is the reset, and Φ is the clock input. Scan flip-flops are used mainly for testing purposes, as part of design for test (DFT). A Scan flip-flop is used in scan chain testing, a method to detect various manufacturing faults in the silicon.
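The selection function Q = ((~s & D) + (s & SI)) and its use in a scan chain can be sketched as follows. The chain wiring (each flip-flop's Q feeding the next flip-flop's SI) is the usual arrangement and is assumed here; scan_mux and shift_scan_chain are illustrative names.

```python
def scan_mux(s: int, d: int, si: int) -> int:
    """MUX 2:1 ahead of the flip-flop: selects D when s = 0, SI when s = 1."""
    return ((1 - s) & d) | (s & si)

def shift_scan_chain(state, scan_in_bits):
    """Shift bits through a chain of scan flip-flops with scan enable s = 1.

    state[0] is the flip-flop nearest the scan input. One clock edge is
    modeled per input bit.
    """
    state = list(state)
    for bit in scan_in_bits:
        # Each flip-flop captures its predecessor's Q through its SI pin;
        # the functional D input (0 here) is ignored because s = 1.
        state = [scan_mux(1, 0, si) for si in [bit] + state[:-1]]
    return state

print(shift_scan_chain([0, 0, 0], [1, 0, 1]))  # [1, 0, 1]
```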

Figure 3.36 Scan Flip-Flop Block Diagram


Figure 3.37 Scan Flip-Flop Schematic

Figure 3.38 Scan Flip-Flop Layout (cell height 4.94um)


3.17 Filler

In the standard cell automatic place and route (APR) flow, the cells in the design are placed in rows. To make sure that each cell gets a power and ground connection, the cells are abutted so that the VDD and VSS terminals of neighboring cells overlap. This makes it possible to tap power at a single point anywhere in the row. However, it is virtually impossible to fill 100% of the die area with regular cells, due to the need for extra vertical routing space. Filler cells are therefore inserted after placement to fill the gaps between regular library cells and ensure a contiguous power rail. Filler cells are non-functional cells whose primary purpose is to continue the VDD and VSS rails. Figure 3.39 shows the layout of the filler cell. The width of the filler should be equal to one grid size, which in our case is 0.26um.
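Because the filler is exactly one grid wide, the number of fillers needed in a gap is simply the gap width divided by 0.26um. The sketch below works in integer nanometers to avoid floating-point rounding; the (x, width) input format and the function name filler_counts are hypothetical and do not reflect how the place and route tool stores placements.

```python
GRID_NM = 260  # filler width: one placement grid (0.26um), in nanometers

def filler_counts(row_width_nm, cells):
    """Return the number of filler cells needed in each gap of one row.

    cells is a list of (x_nm, width_nm) tuples for the placed cells,
    a hypothetical input format chosen for this sketch.
    """
    counts = []
    cursor = 0  # right edge of the last cell processed
    # A zero-width sentinel at the row end handles the final gap.
    for x, width in sorted(cells) + [(row_width_nm, 0)]:
        gap = x - cursor
        if gap < 0:
            raise ValueError("cells overlap or extend past the row")
        if gap % GRID_NM:
            raise ValueError("gap is not a multiple of the placement grid")
        if gap:
            counts.append(gap // GRID_NM)
        cursor = x + width
    return counts

print(filler_counts(2600, [(0, 520), (1040, 780)]))  # [2, 3]
```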

Figure 3.39 Filler Layout (cell height 4.94um, cell width 0.26um)


Table 3.3 Cell Library Cells Summary

#    Gate           Drive Sizes
1    Inverter       0.25x, 0.5x, 1x, 2x, 3x, 4x, 6x, 8x, 12x, 16x
2    Buffer         0.25x, 0.5x, 1x, 2x, 3x, 4x, 6x, 8x, 12x, 16x
3    NAND2          0.5x, 1x, 2x, 3x, 4x, 6x, 8x
4    NAND3          0.5x, 1x, 2x, 3x, 5x, 7x
5    NOR2           0.5x, 1x, 2x, 3x, 4x, 6x, 8x
6    AOI21          0.5x, 1x, 2x, 3x, 5x, 7x
7    AOI22          0.5x, 1x, 2x, 3x, 4x, 5x, 7x
8    OAI21          0.5x, 1x, 2x, 3x, 5x, 7x
9    OAI22          0.5x, 1x, 2x, 3x, 5x, 7x
10   Mirror-Carry   0.5x, 1x, 2x, 3x, 5x
11   Mirror-Sum     0.5x, 1x, 2x, 3x, 5x
12   MUX 2:1        0.5x, 1x, 2x, 3x, 5x
13   XOR2           1x
14   XNOR2          1x
15   D FF           1x
16   Scan FF        1x

Table 3.3 gives details of all the cells designed with different drive sizes; 83 cells were designed in total.


CHAPTER 4

OVERALL DESIGN FLOW

The design flow of any ASIC runs from the register transfer level (RTL) to graphic database system (GDS). This flow involves various steps, which are shown in Figure 4.1.

Figure 4.1 ASIC Design Flow


4.1 DRC, LVS, PEX

Once the cell layouts are designed, the next step is to check that each cell is DRC (design rule check) and LVS (layout versus schematic) clean. To do so, we use Calibre DRC and Calibre LVS. After the designs are found to be clean, parasitic extraction (PEX) is performed. In our case, resistance, capacitance, and coupling capacitance are extracted. The output format is set to SPICE (.sp). Running PEX generates three new files (with extensions .sp, .pex, and .pxi). These files are used for library characterization.
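Since characterization depends on all three extracted files being present for every cell, a small pre-check can be useful. The following is a minimal sketch; the helper name missing_pex_extensions and the assumption that each file name starts with the cell name followed by a dot are hypothetical and not part of the Calibre flow.

```python
PEX_EXTENSIONS = (".sp", ".pex", ".pxi")  # extensions named in the text

def missing_pex_extensions(cell, available):
    """Return the PEX extensions for which no file of this cell was found.

    available is a list of file names; a file is attributed to the cell
    when its name starts with "<cell>." (an assumed naming scheme).
    """
    own = [name for name in available if name.startswith(cell + ".")]
    return [ext for ext in PEX_EXTENSIONS
            if not any(name.endswith(ext) for name in own)]

files = ["AOI21.sp", "AOI21.sp.pex", "NAND2.sp"]
print(missing_pex_extensions("AOI21", files))  # ['.pxi']
```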

4.2 Abstract Generation and LEF file

After DRC, LVS, and PEX, an abstract view must be generated for each of the cells designed. The abstract view accounts for the following layers: metal1, metal2, and via1. This view is used to create the library exchange format (LEF) file, which contains all the routing details. The LEF file is created by exporting the abstract views from Cadence.

4.3 Library Characterization using Siliconsmart ACE

Siliconsmart ACE is a software tool that creates a library in Liberty (.lib) format from a set of SPICE models, cell functional descriptions, and associated . The created library can then be used for power, timing, and noise analysis with compatible tools such as Library Compiler, IC Compiler, Design Compiler, and PrimeTime. We characterize the library with Siliconsmart ACE using the PEX files.

The configuration file, the instance file (functionality of the cell), the netlists (the 3 generated PEX files), and the characterized library file are stored in a separate folder, which is created using the “create -legacy ” command. The “Config” folder should contain the configuration file (usually a Tcl script specific to the technology). The “Control” folder should contain the instance file (which has the functionality details of the cell). The “Netlists” folder should contain the three PEX files. These are the initial steps before starting the characterization.

To start, source the Synopsys directory and then run the Siliconsmart tool by typing the command “siliconsmart”. The location of the cell to be characterized must then be set using the command “set_location ”. After the location is set, the next step is configuring, followed by characterizing; the commands “configure” and “characterize” are used for these steps. To finish the process, the command “model” is used. If there are any errors executing the characterization step, then there is a problem with the netlists.

For Global Foundries 65nm Technology, the configuration file should include the path “/proj/cad/library/mosis/GF65_LPe/cmos10lpe_CDS_oa_dl064_11_20160415/models/YI-SM00030/Hspice/models/design.inc”, which is specific to this technology. The cell area value and function are entered into the instance file. If the cell is characterized successfully, the characterized library file (.lib) is created in the model folder. All the .lib files can then be combined into one single .lib file, which completes the library characterization process.

4.4 Library Compilation using LC Shell

Library Compiler reads the Liberty (.lib) file and writes it in database (.db) format. This database file is then used to generate the mapped netlist file. The compilation is done using lc_shell. First, we start the compiler using the command “lc_shell” and then read the Liberty file using the command “read_lib /library.lib”. Finally, the library that was read is written in database format using the command “write_lib library -format db -output library.db”.
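When many libraries are compiled, the two commands above can be generated rather than typed by hand. The sketch below only assembles the command text quoted in this section; the helper name lc_shell_script and the example path ./library.lib are hypothetical, and how the resulting text is passed to lc_shell is left to the user.

```python
def lc_shell_script(lib_path: str, library: str = "library") -> str:
    """Build the text of a two-command Library Compiler script.

    lib_path is the Liberty file to read; library is the library name
    used in the write_lib command and in the output .db file name.
    """
    lines = [
        f"read_lib {lib_path}",
        f"write_lib {library} -format db -output {library}.db",
    ]
    return "\n".join(lines) + "\n"

print(lc_shell_script("./library.lib"), end="")
```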

4.5 Gate-Level Netlist using Design Vision

We use Design Vision to convert the behavioral-level code (Verilog or VHDL) to the gate level. This gate-level code is eventually used to automatically place and route the design. First, we set up the database library file (generated using Library Compiler) and then read the behavioral code. Then, we compile the design. After compilation, the gate-level netlist is generated for the design, along with a report of the number of cells. We save the netlist as a Verilog (.v) file so that the design can be read into the automatic place and route tool.

4.6 Automatic Place and Route in Encounter

To automatically place and route the design in Encounter, the first step is to import the design. For this, we import the library exchange format (LEF) file and the mapped Verilog netlist generated using Design Vision. The power and ground pins must be defined while importing the design.

After importing, we specify the floorplan. Floor planning is done based on the pin pitch and offset. In our case, the pin pitch was 0.26um and the offset was 0.13um, so it is better for the design dimensions to be an integral multiple of the pin pitch. The row spacing is set to 4.94um (the same as the height of the designed cells), which provides symmetry to the design. The row orientation is not set to double back rows, and the cell orientation is set to zero degrees (no rotation). After floor planning, we place the standard cells and fillers. The next step is to specify the global nets, VDD and VSS (ground). Power planning is done after setting the global nets. Adding rings around the design makes the routing easier. Everything is now set for routing, for which we use Nanoroute; it performs both global routing and detail routing.

Once routing completes without any DRC violations, the design has been successfully placed and routed. We then save the design as a design exchange format (DEF) file. The saved DEF and LEF files are imported into Cadence for final design verification.
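The floorplan dimensions can be snapped to the pin pitch and row height before they are entered in the tool. The following is a minimal sketch, assuming that the width is rounded up to a multiple of the 0.26um pin pitch and the height to a multiple of the 4.94um row height; rounding up rather than down, and the function names, are illustrative choices. Integer nanometers are used to avoid floating-point rounding.

```python
PIN_PITCH_NM = 260    # 0.26um pin pitch
ROW_HEIGHT_NM = 4940  # 4.94um row spacing, equal to the cell height

def snap_up(value_nm: int, step_nm: int) -> int:
    """Round value_nm up to the nearest multiple of step_nm."""
    return -(-value_nm // step_nm) * step_nm

def snapped_core(width_nm: int, height_nm: int):
    """Return (width, height) snapped to the pin pitch and row height."""
    return snap_up(width_nm, PIN_PITCH_NM), snap_up(height_nm, ROW_HEIGHT_NM)

# A requested 10um x 20um core becomes 39 pitches wide and 5 rows tall.
print(snapped_core(10000, 20000))  # (10140, 24700)
```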

4.7 Static Timing Analysis by PrimeTime

Static timing analysis (STA) is performed using PrimeTime. To perform the STA, five files are needed. The first is the mapped Verilog netlist. The second is the primetime.script file, which contains all the information about the analysis to be performed. The third is the Liberty file (.lib) generated using Siliconsmart ACE. The fourth is the database file (.db) generated using Library Compiler. The fifth is the variables file, which sets the library file, Verilog file, and cell name. It also sets parameters such as the clock period, load, clock name, and reset pin name. The cell name in the variables file must be changed according to the cell to be analyzed. To run PrimeTime, we source the Synopsys profile and then enter the command “pt_shell -f primetime.script”. The tool reports whether the slack is positive or negative. If the slack is negative, we fix it by increasing the clock period until the slack becomes positive. The tool also reports the power consumption.
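The relationship between slack and clock period can be illustrated with a simplified single-path model. This is a sketch under stated assumptions: slack is taken as clock period minus setup time minus data arrival time, with clock skew and uncertainty ignored; the function names and the numbers are illustrative and are not PrimeTime output.

```python
def setup_slack_ps(clock_period_ps: int, arrival_ps: int, setup_ps: int = 0) -> int:
    """Setup slack of one register-to-register path, in picoseconds."""
    return clock_period_ps - setup_ps - arrival_ps

def relaxed_period_ps(clock_period_ps: int, slack_ps: int) -> int:
    """Smallest clock period that raises a negative slack to zero."""
    return clock_period_ps - slack_ps if slack_ps < 0 else clock_period_ps

slack = setup_slack_ps(2000, 2300, 100)
print(slack)                           # -400
print(relaxed_period_ps(2000, slack))  # 2400
```

Any period above the returned value gives a strictly positive slack in this model.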


CHAPTER 5

RESULTS

The results are presented step by step, following the design flow.

5.1 DRC, LVS, PEX Results

Figure 5.1 shows the Calibre DRC output window after the design rule check is done. Figure 5.1 (a) is the output when there are errors in the design, and Figure 5.1 (b) shows the clean output. Figure 5.2 shows the output of the LVS check: Figure 5.2 (a) shows a failed LVS check and Figure 5.2 (b) shows a clean LVS check. Figure 5.3 shows a typical PEX file.

Figure 5.1 (a) DRC Window with Error


Figure 5.1 (b) DRC Clean Window

Figure 5.2 (a) LVS Errors Window


Figure 5.2 (b) LVS Clean Window

Figure 5.3 Contents of PEX File


5.2 Abstract View

Figure 5.4 shows the abstract view of the D flip-flop. The abstract view includes only metal1 (M1), metal2 (M2), and via1 (V1); hence, all other layers are not visible in this view.

Figure 5.4 Abstract View of D Flip-Flop

5.3 Library Characterization Results

The instance file contains the information on the functionality of the cell, which has to be changed for each cell. Figure 5.5 shows the contents of the AOI21 instance file. Table 5.1 lists the area of all the 1x cells in the library. Figure 5.6 shows the window after characterizing the library.


Figure 5.5 Contents of AOI21 Instance File

Figure 5.6 Library Characterization Output


Table 5.1 Area of All 1x Cells in Library

#    Gate              Area (um2)
1    AOI21             5.1376
2    AOI22             6.422
3    Buffer            3.8532
4    D Flip-Flop       25.688
5    Inverter          2.5688
6    NAND2             3.8532
7    NAND3             5.1376
8    NOR2              3.8532
9    Mirror-Carry      8.9908
10   Mirror-Sum        19.266
11   MUX 2:1           10.2752
12   OAI21             5.1376
13   OAI22             6.7184
14   Scan Flip-Flop    35.9632
15   XNOR2             7.7064
16   XOR2              7.7064
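A cell area can be related to the cell geometry by multiplying the 4.94um cell height by the cell width. The sketch below assumes the width is an integer number of 0.26um grid columns; the column counts used in the example are inferred from the tabulated areas and are an assumption, and not every table entry need fit it.

```python
CELL_HEIGHT_UM = 4.94  # cell height
GRID_UM = 0.26         # placement grid (pin pitch)

def cell_area_um2(grid_columns: int) -> float:
    """Area of a cell that is grid_columns placement grids wide."""
    return round(CELL_HEIGHT_UM * GRID_UM * grid_columns, 4)

# Assumed widths: inverter 2 columns, NAND2 3 columns, D flip-flop 20 columns.
for name, columns in [("Inverter", 2), ("NAND2", 3), ("D Flip-Flop", 20)]:
    print(name, cell_area_um2(columns))
```

Under this assumption, the three example cells reproduce the areas 2.5688, 3.8532, and 25.688 um2 listed in Table 5.1.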

5.4 Library Compilation Result

Figure 5.7 shows the successful compilation of the library in the terminal.

Figure 5.7 Library Compiler Output


5.5 Cell Report in Design Vision

The total number of cells used in the design is reported here. The design uses the physical cells from the cell library. Figure 5.8 shows the reported cells.

Figure 5.8 Cell Report

5.6 Automatic Place and Route Results

Figure 5.9 shows the terminal output for our design with zero DRC violations. Figure 5.10 (a) shows the design after Nanoroute, and Figure 5.10 (b) shows a zoomed-in view of the design after Nanoroute. Figure 5.11 (a) and Figure 5.11 (b) show that the designs are imported successfully into Cadence.


Figure 5.9 Terminal Window after Nanoroute

Figure 5.10 (a) Design after Nanoroute


Figure 5.10 (b) Design Zoomed In

Figure 5.11 (a) LEF import successful


Figure 5.11 (b) DEF import successful

Figure 5.12 (a) Imported Design in Cadence


Figure 5.12 (b) Imported Design in Cadence Zoomed in


CHAPTER 6

CONCLUSIONS AND FUTURE WORK

A power efficient standard cell library was designed using the 65nm technology by Global Foundries, and some of the cells use transistor folding. A complete RTL to GDS flow was implemented using this technology, and each step of the flow was successfully completed. The cell library consists of 16 different cell functions with varying drive sizes. Designing larger drive sizes became easier with the use of transistor folding. The overall cell width reduced significantly with the use of transistor folding. With transistor folding, as the drive size increases, the cell height remains the same while the cell width keeps increasing. A total of 83 cells were designed in the cell library. The drive sizes varied from 0.25x to 16x for simpler cells, and as the complexity of the cell design increased, the number of drive sizes was reduced. This was done based on the need for a particular function in a cell library. The D flip-flop and the Scan flip-flop were designed with only one drive size because having these cells with different drive sizes makes no difference; that is, the performance remains the same. XOR2 was designed with NOR2 and AOI21; XNOR2 was designed with NAND2 and OAI21. This was done because fewer transistors are used compared to any other variant; the smaller the number of transistors, the smaller the effective area of the cell. The full adder was designed using the mirror-carry and mirror-sum. This mirror design was the most area efficient design. The XOR based full adder was the fastest design, but it had a greater number of transistors and diffusion breaks, making the effective number of transistors very high compared to the mirror design. Hence, we chose the mirror based full adder over the XOR based full adder.

After the design of the cells, library characterization was done using Synopsys Siliconsmart ACE. The library was then compiled using lc_shell. The Verilog netlist was generated using Synopsys Design Vision. A number of different designs were automatically placed and routed using Cadence Encounter. Static timing analysis was performed using Synopsys PrimeTime. As is apparent, this research was entirely based on design implementation and tool verification. Further work can be done on static timing analysis, that is, power and timing analysis based on changes in clock period and load. In the design of a cell library, more cells with different β ratios can be added to improve the performance where possible.


REFERENCES

[1] R. Afonso, M. Rahman, H. Tennakoon and C. Sechen, “Power Efficient Standard Cell Library Design,” Proceedings of the IEEE Dallas Circuits and Systems Workshop, Dallas, Texas, Oct. 2009.

[2] Jordi Cortadella, “Area-Optimal Transistor Folding for 1-D Gridded Cell Design,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 32, No. 11, Nov. 2013.

[3] Gustavo H. Smaniotto, Matheus T. Moreira, Adriel M. Ziesemer Jr., Felipe S. Marques, Leomar S. da Rosa Jr., “Toward Better Layout Design in ASTRAN CAD Tool by Using an Efficient Transistor Folding,” 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS), Abu Dhabi, UAE, October 2016.

[4] T. W. Her and D. F. Wong, “Cell Area Minimization by Transistor Folding,” Design Automation Conference, with EURO-VHDL '93. Proceedings EURO-DAC '93., European, , 1993.

[5] Avaneendra Gupta, John P. Hayes, “Optimal 2-D Cell Layout with Integrated Transistor Folding,” IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers, 1998.

[6] Jaewon Kim, S.M. Kang, “An Efficient Transistor Folding Algorithm for Row-based Cmos Layout Design,” Proceedings of the 34th Design Automation Conference, June 1997.

[7] Gustavo H. Smaniotto, João J. S. Machado, Matheus T. Moreira, Adriel M. Ziesemer, Felipe S. Marques, Leomar S. da Rosa, “Optimizing Cell Area by Applying an Alternative Transistor Folding Technique in an Open Source Physical Synthesis CAD Tool,” 2016 IEEE 7th Latin American Symposium on Circuits & Systems (LASCAS), Mar. 2016.

[8] K.S. Berezowski, “Transistor Chaining With Integrated Dynamic Folding for 1-D Leaf Cell Synthesis,” Proceedings Euromicro Symposium on Digital Systems Design, Sept. 2001.

[9] C. Lursinsap, D.D. Gajski, “A Technique for Pull-up Transistor Folding,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume: 7, Issue: 8, Aug 1988.

[10] M. Rahman, R. Afonso, H. Tennakoon and C. Sechen, “Design Automation Tools and Libraries for Low Power Digital Design,” Proceedings of the IEEE Dallas Circuits and Systems Workshop, Dallas, Texas, 2010

65

[11] B. P. Bhuvana, B. R. Manohar, V. S. Kanchana Bhaaskaran, “Standard Cell Characterization for Reversible Logic,” International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE) Sept. 2016

66

BIOGRAPHICAL SKETCH

Vibhav Salimath was born in Gulbarga, in the state of Karnataka in India, on November 28, 1993. He is the son of Girija and Kumarswami Salimath. In 2015, he earned his bachelor’s degree in Electronics and Communication from PES Institute of Technology, Bangalore. After completing his undergraduate studies, he worked as an intern in the startup Green Robot Machinery for a year. In fall 2016, he entered the master’s degree program in Electrical Engineering at The University of Texas at Dallas.


CURRICULUM VITAE – VIBHAV SALIMATH

Education

Master of Science in Electrical Engineering (Master’s with Thesis), Aug. 2016 – May 2018
The University of Texas at Dallas, Richardson, TX
GPA: 3.75
Concentration: Digital Systems
Course work: VLSI Design (EECT 6325), (EEDG 6304), Advanced Digital Logic (EEDG 6301), Analog IC Design (EECT 6326), RF & Microwave Systems Engineering (EERF 6395), RF & Microwave Circuits (EERF 6311), Microprocessor Systems (EEDG 6302), Research in Electrical Engineering (EEGR 8v70)

Bachelor of Engineering in Electronics and Communication Engineering, Sept. 2011 – May 2015
PES Institute of Technology, Bangalore, India
CGPA: 8.39
Honor: Distinction awards for all the semesters

Research Experience

Research in Electrical Engineering (Master’s with Thesis), Sept. 2017 – present
Standard Cell Library Design with Transistor Folding using 65 nm Technology by Global Foundries
- Designed the entire cell library with transistor folding, which helps reduce the cell area and power consumption significantly.
- Designed the cell schematics and layouts using Cadence for Inverter, Buffer, NAND2, NAND3, NOR2, AOI21, AOI22, OAI21, OAI22, MUX 2:1, Mirror Carry, Mirror Sum, XOR, XNOR, DFF, and DFF with scan (negative edge triggered DFF configurations).
- Each cell was designed with 12 different foldings, i.e., 0.25x, 0.5x, 1x, 2x, 3x, 4x, 5x, 6x, 7x, 8x, 12x, 16x, for analysis.
- Generated the SPICE files and used HSPICE to observe their responses. Found the minimum Energy-Delay-Product (EDP).
- Characterized the library using Siliconsmart ACE.
- Static Timing Analysis (STA) using Synopsys PrimeTime.
- Automatic Place and Route using Encounter.
- Advisor for Master’s Thesis: Dr. Carl Sechen.

Green Robot Machinery Pvt. Ltd., Bangalore, India, Sept. 2015 – Apr. 2016
- Position: Research Intern, design and development of hardware and software for an IoT based product.
- Devised an instrument named ‘Smart Irrigator’ with Wifi technology using Mediatek’s Linkit One board and Spark’s Particle Photon board.
- Devised an RFID Data Logger and a Temperature & Humidity Data Logger using MQTT and MongoDB.
- Proposed an end-to-end technology based ticketing solution that eliminates the challenges faced by the event manager and adds efficiency.
- Developed a prototype, using Mediatek’s Linkit One board, to monitor and maintain parameters such as temperature and humidity in IVF samples, to avoid the thermal shock that a sample experiences when it is moved out of the cold storage devices.


Academic and Research Projects:

SRAM Design and Layout using Global Foundries 65nm Technology, Jun. 2017 – Aug. 2017
- Design and layout of a 256 word 6T-SRAM with sizing of components.
- Designed the with least possible area, and then designed the memory array, Row Decoder and Column Decoder, and Write Driver; the sizing was done using Logical Effort.
- Designed the Pre-charge, Current mode Sense Amplifier, Clock buffer, and T-gate for the SRAM cell.
- The SRAM was designed for best read and write times, minimum area, and minimum power.
- Used Cadence Virtuoso, Calibre DRC, Calibre LVS, and HSPICE simulation.

Arithmetic Logic Unit using Finite State Machine, Sept. 2016 – Oct. 2016
- Designed and implemented a 32-bit Arithmetic Logic Unit using a Finite State Machine in Verilog HDL.
- Implemented and completed the synthesis, library characterization, floor planning, placement, routing, and static timing analysis using Synopsys and Cadence tools.
- Used these tools: Xilinx ISE, Design Vision, WaveView, Siliconsmart Library Compiler, HSPICE, Cadence for layout (IBM 130 nm Tech), PrimeTime.

Implementation of CAD Tools to design layout and verify the functionality of cells using IBM 130 nm Technology, Nov. 2016
- Designed the cell schematics and layouts using Cadence for Inverter, NAND2, NOR2, MUXI2:1, DFF, OAI, and AOI, with negative edge triggered DFF configurations.
- Generated the SPICE files and used HSPICE to observe their responses. Found the minimum Energy-Delay-Product (EDP).
- Characterized the library using Siliconsmart ACE.
- Static Timing Analysis (STA) using Synopsys PrimeTime.

Vehicular Pre-Crash Safety System, Feb. 2017 – Apr. 2017
- Simulated an RF system on VSS-AWR which would provide a communication link and radar channel to detect the relative positions between cars. The communication link frequency was 12-14GHz, whereas the radar frequency was 76-77GHz.

Temperature Logging Using MQTT and MongoDB, Feb. 2016 – Apr. 2016
- Designed a temperature and humidity monitoring system which senses the data using the DHT11 sensor connected to the MediaTek LinkIt One. Using the MQTT broker called Mosquitto, the data is sent to the MQTT server from the LinkIt One. This data is then sent from the MQTT server to MongoDB (database).

RFID Data Logger, Nov. 2015 – Jan. 2016
- Designed an RFID data logger which reads the data in the card using the MFRC522 card reader connected to the MediaTek LinkIt One. Using the MQTT broker called Mosquitto, the data is sent to the MQTT server from the LinkIt One. This data is then sent from the MQTT server to MongoDB (database).

Smart Farming using Zigbee and GSM, Jan. 2015 – May 2015
- Devised an instrument helpful to farmers which keeps updating them about the moisture level of the soil.
- Used by a farmer to control water sources such as pumps on the farm from home.


Design of antenna using HFSS (High Frequency Structural Simulator), Aug. 2014 – Dec. 2014
- Designed antennas and complex RF elements.
- Included filters, packaging, and transmission lines.
- Used as a part of Pi-Sat, the satellite which was launched recently from ISRO.

Arduino control using Android, Mar. 2014 – May 2014
- Demonstrated the use of a Bluetooth module and MIT’s App Inventor to establish a wireless serial link between an Android phone and an Arduino board. Implemented this further to control household appliances using a relay.

Beta testing of next-gen oscilloscope, Tektronix, Sept. 2013 – Nov. 2013
- Worked on all features and functions of the device to give valuable feedback for improving the quality of the device. Took the initiative of conducting new experiments to explore the devices in detail and submitted a report to the industry people.

Technical Skills

- Programming languages: C, C++, Java, Python, Verilog HDL, Embedded C, MongoDB, Assembly Programming.
- Simulation Tools: Cadence: Virtuoso Design Suite, SoC Encounter; Synopsys: Design Compiler, IC Compiler; Matlab, Multisim, VirSIM, Hspice, Keil uVision, Xilinx, Tanner EDA, Verilog HDL, FPGA, VSS-AWR (Microwave office), SIMULINK, Wireshark, Cisco Packet Tracer.
- Operating Systems: Windows, familiarity with Linux (Ubuntu).
- Document preparation system: LaTeX

Industrial Visits

- ISRO Base station, Bangalore. October 2014: First hand insight into the first ever successful Indian Mars Orbiter Mission (MOM) or ‘Mangalyaan’, which entered the Mars orbit on 24th September 2014, from the scientists involved in the program.
- ISRO Telemetry Tracking and Command networking station, Bangalore. April 2014: Visited this deep space station to get familiarized with satellite communications and satellite monitoring.

Other Academic Achievements/ Exposure

- Highest grades in RF & Microwave Systems Engineering, RF & Microwave Circuits, and Advanced VLSI Design at UTD.
- Participated in a two-day workshop on Network Implementation and Security conducted by the Association for Computing Machinery (IIT, Delhi); learnt the basics of networking, creating networks, and configuring CISCO routers and switches, and worked with software such as Packet Tracer and Wireshark.
- Attended a two-day workshop on Swarm Robotics conducted by BOTRIO.com and built a robot to perform various jobs, including home automation, a convoy of vehicles, and a group of robots performing a task.
