Standard Cell Library Design and Optimization with CDM for Deeply Scaled FinFET Devices.

by

Ashish Joshi, B.E

A Thesis

In

Electrical Engineering

Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCES IN ELECTRICAL ENGINEERING

Approved

Dr. Tooraj Nikoubin Chair of Committee

Dr. Brian Nutter

Dr. Stephen Bayne

Mark Sheridan Dean of the Graduate School

May, 2016

© Ashish Joshi, 2016

Texas Tech University, Ashish Joshi, May 2016

ACKNOWLEDGEMENTS

I would like to sincerely thank my supervisor, Dr. Nikoubin, for giving me the opportunity to pursue my thesis under his guidance. He has been a commendable source of support throughout this journey, and his thoughtful ideas on the problems I faced have been a tremendous help. His immense knowledge of VLSI design has been a rich source that I have drawn on since the beginning of my research.

I am especially indebted to my thesis committee members, Dr. Bayne and Dr. Nutter. They have been very gracious and generous with their time, ideas and support. I appreciate Dr. Nutter’s insights when discussing my ideas and the depth to which he pushes me to think.

I would like to thank Texas Instruments and my colleagues there, Mayank Garg, Jun, Alex, Amber, William, Wenxiao, Shyam, Toshio and Suchi, for giving me the opportunity to do a summer internship with them. I continue to be inspired by their hard work and innovative thinking. I learned a lot during that tenure, and it helped me identify my field of interest. The internship not only helped me with the technical aspects but also built my confidence to accept challenges and come up with innovative solutions.

I have had the great pleasure of working with Dr. Li. He helped me understand the intricacies of analog design, which further strengthened my interest in mixed-signal design and verification. His guidance to his students goes well beyond the regular duty of a course instructor.

I am highly indebted and thankful to my family, Dr. Surinder Kumar Joshi, Mrs. Renu Joshi and Ashinder Joshi, for their continual moral support, encouragement and confidence in me, without which this work would not have been possible. Lastly, to all my friends, thank you for your understanding and encouragement in many moments of crisis. I cannot list all the names here, but you are always on my mind.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
  1.1 Motivation
    1.1.1 Why Do We Need Low Power
    1.1.2 Why Improve the Standard Cell Based Design Flow
  1.2 Contribution of the Thesis
  1.3 Organization of the Thesis
2. DIGITAL DESIGN FLOW
  2.1 Full Custom Design Flow
  2.2 Semi-Custom Design Flow
    2.2.1 Introduction
    2.2.2 ASIC Cell Based Design Flow
    2.2.3 Advantages and Limitations of ASIC
    2.2.4 Application and Trends
    2.2.5 Standard Cell Library Design in Industry
  2.3 Power Dissipation in the CMOS Circuits
    2.3.1 Dynamic Power
      2.3.1.1 Hazards and Glitch Power
    2.3.2 Short Circuit Power Dissipation
    2.3.3 Leakage Power Dissipation
3. FinFET vs PLANAR BULK MOSFET DEVICES
  3.1 I-V Characteristics
  3.2 Drain Induced Barrier Lowering
  3.3 Subthreshold Swing
4. CDM LOGIC STYLE
  4.1 CDM with Complementary Outputs Cells
  4.2 Feedback and Correction Mechanism
  4.3 Performance comparison in CDM and CMOS for complementary outputs
    4.3.1 AND-NAND gate implemented in CDM and CMOS
    4.3.2 OR-NOR gate implemented in CDM and CMOS Logic Style
    4.3.3 3-Input AND-NAND gate in CMOS and CDM Logic Style
    4.3.4 3-Input NOR-OR gate implemented in CMOS and CDM Logic Style
  4.4 Power Saving with CDM
  4.5 CDM with Single Output Cells
    4.5.1 AND Gate
    4.5.2 OR Gate
    4.5.3 3-Input AND gate in CDM single output logic style
    4.5.4 3-Input OR Gate
    4.5.5 Half Adder Comparison
    4.5.6 Full Adder Comparison
    4.5.7 4:2 Compressor comparison
    4.5.8 4 bit by 4 bit Multiplier
  4.6 Data Analysis
5. SINGLE OUTPUT CDM STANDARD CELL LIBRARY DESIGN
  5.1 Standard Cell Library Design Flow
  5.2 Benchmark Circuits
  5.3 Synthesis Results with CDM standard cell library
  5.4 Data Analysis
  5.5 Binary to BCD Converter
    5.5.1 Binary to BCD converter in CBLD
      5.5.1.1 Introduction
      5.5.1.2 Binary to BCD converter architecture
      5.5.1.3 Synthesis Results
      5.5.1.4 Synthesis Results with CDM standard cell library
6. CONCLUSION AND FUTURE WORK
Appendix A
  DESIGN COMPILER SCRIPT
  C-CMOS LOGIC CELL
  CDM LOGIC CELL NETLIST
  SILICON SMART STANDARD CELL LIBRARY CHARACTERIZATION SCRIPT
References

ABSTRACT

In this thesis, we propose a new methodology for implementing logic cells with complementary and single outputs. Logic cells have been designed in the CDM logic style for both single and complementary outputs. The CDM logic style has been analyzed and compared with the conventional CMOS logic style using FinFET devices in super-threshold operation. The thesis presents a comprehensive study of the performance parameters of logic cells with single and complementary outputs implemented in the CDM and C-CMOS logic styles. Standard cell libraries with FinFET logic gates in the CDM and static CMOS logic styles have been developed at five selected technology nodes (7nm, 10nm, 14nm, 16nm and 20nm) and used to synthesize the ISCAS’85 benchmark designs to evaluate the performance improvement. The Silicon Smart and Library Compiler tools were used to generate the standard cell libraries from the PTM FinFET device models, and Design Compiler was used to synthesize the designs with the developed libraries. The simulation results show that the CDM based standard cell library achieves an average power improvement of 17-21% and an average PDP improvement of 7-26% across the benchmark designs, compared with the conventional CMOS standard cell library at the 7nm, 10nm, 14nm, 16nm and 20nm technology nodes. Hence we demonstrate that our low power standard cell design is comparable to the contemporary custom design optimization techniques used to save power.


LIST OF TABLES

4.1. Performance parameter for AND/NAND in CDM & C-CMOS
4.2. Performance parameter for OR/NOR in CDM & C-CMOS
4.3. Performance parameter for 3 input AND/NAND in CDM & C-CMOS
4.4. Performance parameter for 3 input OR/NOR in CDM & C-CMOS
4.5. Performance parameter for 2 input AND in CDM & C-CMOS
4.6. Performance parameter for 2 input OR in CDM & C-CMOS
4.7. Performance parameter for 3 input AND in CDM & C-CMOS
4.8. Performance parameter for 3 input OR in CDM & C-CMOS
4.9. Performance parameter for half adder in CDM & C-CMOS
4.10. Performance parameter for full adder in CDM & C-CMOS
4.11. Performance parameter for 4:2 compressor in CDM & C-CMOS
4.12. Performance parameter for 4 bit multiplier in CDM & C-CMOS
5.1. Logic functions from Single level Single output CDM basic cell
5.2. ISCAS’85 Benchmark Designs
5.3. Synthesis Results with 7nm Standard Cell Library
5.4. Synthesis Results with 10nm Standard Cell Library
5.5. Synthesis Results with 14nm Standard Cell Library
5.6. Synthesis Results with 16nm Standard Cell Library
5.7. Synthesis Results with 20nm Standard Cell Library
5.8. Post-Layout synthesis results with 90nm technology node
5.9. Pre-Layout Synthesis with 90, 45 & 32nm technology node
5.10. Pre-Layout synthesis with 14 (CMOS), 7 & 5nm (FinFET) technology
5.11. Power dissipation with CDM and CMOS in 7nm technology node
5.12. Power Delay Product with CDM and CMOS in 7nm


LIST OF FIGURES

2.1. Full-Custom Design Flow
2.2. ASIC Design Process [20]
2.3. An example of the static hazard
2.4. Effect of load capacitance on short circuit power dissipation
2.5. Short Circuit Energy Dissipation vs Input rise/fall time
2.6. Leakage Power in the inverter before occurrence of the transition at Input
2.7. Leakage Current vs Drain voltage characteristic of the MOSFET
3.1. Different structures of the FinFET Devices
3.2. I-V characteristics of the bulk CMOS (22nm) and FinFET (20nm) devices
3.3. Ion/Ioff variation for CMOS and FinFET with different supply voltages
3.4. Drain Current vs Gate Source Voltage for the FinFET and bulk CMOS
3.5. Drain current versus Gate source voltage while Vds=VDD [13]
3.6. Gate Dielectric tunneling current in NMOS bulk planar device
4.1. Basic Cell representation in CDM logic Style [4]
4.2. Different feedback circuits to get full swing outputs [4]
4.3. Schematic of AND-NAND in CDM logic Style
4.4. Schematic of NAND-AND in CMOS logic Style
4.5. Test Bench for NAND/AND implemented in CMOS and CDM logic style
4.6. Output NAND/AND waveforms for CDM logic gate
4.7. Output waveforms from CMOS NAND-AND gate
4.8. PDP between CMOS and CDM with varying load capacitance
4.9. Delay vs Power plot for AND-NAND in CMOS and CDM logic style
4.10. Schematic of NOR-OR in CDM Logic Style
4.11. Schematic of NOR-OR in CMOS Logic Style
4.12. Test Bench for performance comparison between CMOS and CDM
4.13. Output Waveforms for the CDM Logic Style
4.14. Output Waveform for CMOS logic Style
4.15. PDP between CMOS & CDM OR-NOR with varying load capacitance
4.16. Delay vs Power plot for NOR-OR in CMOS and CDM logic style
4.17. Schematic of 3 input NAND-AND in CDM Logic Style
4.18. Test Bench for the 3 input NAND-AND in CMOS and CDM Logic Style
4.19. Output Waveforms for 3 input CDM NAND-AND logic gate
4.20. Output Waveforms for 3 input CMOS NAND-AND logic gate
4.21. PDP for 3-NAND-AND in CMOS and CDM with varying load cap.
4.22. Delay vs Power comparison for 3 Input CMOS and CDM AND-NAND
4.23. Schematic of 3 input NOR-OR gate in CDM logic Style
4.24. Test Bench for 3 input CMOS and CDM NOR-OR gate
4.25. Output waveforms from the CDM NOR-OR Logic cell
4.26. Output Waveforms from CMOS NOR-OR Logic cell
4.27. PDP for the CMOS and CDM NOR-OR with varying load capacitance
4.28. Delay vs Power comparison for CMOS and CDM OR-NOR
4.29. Single Output CDM basic cells (a) Single Level (b)-(d) Two Level
4.30. Schematic of AND gate with CDM single output logic style
4.31. Test Bench for CMOS and CDM AND gate
4.32. Output Waveforms from CMOS and CDM AND gate
4.33. Schematic of the OR gate in the single output CDM logic style
4.34. Test Bench for CMOS and CDM OR gate
4.35. Output waveforms from the CMOS and CDM OR gate
4.36. Schematic of 3 input AND gate in CDM single output logic style
4.37. Test Bench for CMOS and CDM 3 Input AND gate
4.38. Output waveforms from 3 input AND gate in CDM and CMOS
4.39. Schematic of 3 Input OR gate in CDM single output logic style
4.40. Test Bench for 3 Input CMOS and CDM OR gate
4.41. Output Waveforms from 3 Input OR gate in CMOS and CDM
4.42. Schematic of Half Adder in single output CDM logic style
4.43. Test Bench for CDM and CMOS half adder
4.44. Output waveforms from the half adder in CMOS and CDM logic style
4.45. Schematic of Full Adder in CDM single output logic style
4.46. Test Bench for CDM and CMOS Full adder
4.47. Output waveforms from the Full adder in CMOS and CDM
4.48. Schematic of 4:2 compressor in single output CDM logic style
4.49. Test Bench for CMOS and CDM 4:2 compressor design
4.50. Output Waveforms from 4:2 compressor in CMOS and CDM
4.51. Schematic of 4 bit by 4 bit multiplier in CDM logic style
4.52. Test Bench for CDM and CMOS 4 bit by 4 bit multiplier
4.53. Output Waveforms from the Multiplier in CMOS logic style
4.54. Output Waveforms from the Multiplier in CDM logic style
5.1. CDM Logic Cells (a) Single Level (b)-(d) Two Level
5.2. Standard Cell Library Design Flow
5.3. ISCAS-85 c6288 16x16 multiplier
5.4. Full adder module for ISCAS-85 c6288 16x16 multiplier
5.5. Power Improvement with CDM over CMOS standard cell libraries
5.6. PDP Improvement with CDM over CMOS standard cell libraries
5.7. Binary to BCD converter design with CBLD algorithm
5.8. Normalized Pre-Layout synthesis with 90nm technology
5.9. Normalized Post-Layout Synthesis with 90nm technology
5.10. Pre-Layout delay result for 14, 7 and 5nm technology node
5.11. Pre-Layout PDP results for 14nm (CMOS), 7nm & 5nm (FinFET)
5.12. Power Dissipation with CDM and CMOS in 7nm technology
5.13. Power Delay Product with CDM and CMOS in 7nm technology


CHAPTER 1

INTRODUCTION

The primary contribution of this work is a low power driven, standard cell library based design methodology. We have designed a standard cell library with a new logic style. The results obtained are comparable to the power savings reported for various glitch reduction methodologies tailored to the full custom design flow, thus reducing the performance gap between the two design styles.

1.1 Motivation

1.1.1 Why Do We Need Low Power

The continual decrease in feature size, the corresponding increase in device density and high operating frequencies have made power consumption a major concern in VLSI design. Excessive power consumption in integrated circuits discourages their use in portable systems. It also causes overheating, which reduces the reliability and lifetime of the chip. To control the temperature within the chip, specialized cooling and packaging techniques are used, which further increases the chip cost. The growing need for portable communication and computing systems has increased the need for power optimization within the chip. Hence low power design is a critical technology in the semiconductor industry today. At the same time, we need to decrease the critical path delay while reducing the overall power consumption of the chip.

1.1.2 Why Improve the Standard Cell Based Design Flow

Standard cell design is a semi-custom design style based on a set of predesigned standard cells. The design flow uses highly automated synthesis and place-and-route tools built on highly optimized algorithms. This reduces the manual effort required to complete the design in silicon. The existing semi-custom design flow leaves no flexibility to optimize power consumption by reducing glitch power. Hence there is a strong need to design a standard cell library whose cells have non-skewed (balanced) outputs, so as to minimize glitches and their propagation within the design and thereby minimize the power consumption.

1.2 Contribution of the Thesis

In this thesis, we have designed a standard cell library whose cells have balanced outputs (for multi output cells) using a new logic style called CDM, hence minimizing the glitches within the design. We have applied the proposed technique to the ISCAS’85 benchmark circuits and found that our methodology is capable of producing a minimum transient energy design. The standard cell library has been designed with the PTM FinFET device models, which are based on BSIM-CMG, at five technology nodes (7nm, 10nm, 14nm, 16nm and 20nm) and has been used to synthesize the benchmark designs. Simulation results show a power improvement of 20-21% and an energy (PDP) improvement of 7-21% compared with standard cell libraries designed in the C-CMOS logic style at the same technology nodes.

1.3 Organization of the Thesis

The thesis is organized into six chapters. Chapter 2 reviews the basic digital design flows and the main sources of power consumption in CMOS digital ICs. Chapter 3 demonstrates the advantages of FinFET devices over short channel planar bulk CMOS devices, and hence the reason for using FinFET devices in the standard cell libraries for technology nodes such as 7nm and 10nm. Chapter 4 introduces the new logic style, Cell Design Methodology (CDM). Cells are designed in the CDM logic style with both complementary and single outputs and compared with their C-CMOS implementations. The power and PDP improvements obtained for each cell in CDM over C-CMOS are shown with simulation results. Chapter 5 describes the flow used for the standard cell library design and the experimental setup used to prove the proposed concept, and presents the results from the ISCAS’85 benchmark circuits. Finally, Chapter 6 presents the conclusions from our experiments and the proposed future work.


CHAPTER 2

DIGITAL DESIGN FLOW

2.1 Full Custom Design Flow

In the full custom design flow, the design is divided into smaller submodules and each submodule is designed at the transistor level. The W/L ratio of each transistor, along with other parameters, is chosen to meet or improve on the required module level specifications. The full custom design flow gives the designer full control over all design parameters, so highly efficient designs can be produced with it. Even today, the full custom design flow is used to build analog blocks and analog ICs. It can provide the highest performance but has a longer development time. To complete application specific designs quickly and thereby reduce the time to market, the cell based (semi-custom) design flow is attracting more attention. The steps involved in the full custom design flow are shown in Fig. 2.1.

Fig.2.1. Full-Custom Design Flow


2.2 Semi-Custom Design Flow

2.2.1 Introduction

Cell based design is a widely adopted approach in current Application Specific Integrated Circuit (ASIC) and System on Chip (SoC) designs. Standard cell libraries are a critical component of the cell based design flow. A standard cell library is a collection of primitive gates that are used when synthesizing behavioral RTL into a gate level netlist for cell based designs. Using a standard cell library shortens development time, reduces design errors and makes the design easier to maintain.

2.2.2 ASIC Cell Based Design Flow

An ASIC design flow typically starts with a VHDL or Verilog description of the design. The description is then synthesized into a gate level netlist, which is placed and routed to generate the layout. Standard cells are the building blocks of both the gate level netlist and the layout. The design described in Verilog is first simulated for verification. If it meets the design specifications, it is synthesized with the standard cell library into a gate level netlist and verified again for functionality. If the gate level netlist meets the design specifications, it is imported into Cadence Integrated Circuit Front to Back (ICFB) as the schematic view and into Cadence SoC Encounter to generate the place and route view. Once the layout is imported into Cadence SoC Encounter, physical verification, consisting of DRC checks and LVS matching, is performed to verify that the synthesized netlist matches the layout. The I/O pads are then added manually to the design, and the design is submitted to the foundry for fabrication. Power analysis and static timing analysis (STA) can also be performed after the synthesis stage. Fig. 2.2 shows the complete flow of the ASIC design process.


Fig 2.2. ASIC Design Process [20]

2.2.3 Advantages and Limitations of ASIC

The cell based ASIC top-down approach shortens the design time compared with full custom design and promotes design reuse to lower cost. However, ASIC design performance cannot match full custom design performance in terms of speed. Microprocessors designed with the full custom approach can work at much higher frequencies than microprocessors designed with the ASIC/semi-custom design flow.

2.2.4 Application and Trends

ASIC designs have been used in many applications, including home appliances, telecommunications and medical applications. The growing demand for more capability in smaller designs with shorter development time is transforming the ASIC design methodology from a single application specific processor to multiple systems with a processor on one chip. This new generation of ASIC is called the SoC, the trend that now leads the design community. As designers move from ASIC to SoC, standard cell libraries are regaining interest because they are used extensively in both design approaches and remain the basic building blocks in both.


2.2.5 Standard Cell Library Design in Industry

Independent of the methodology adopted, IC design companies use standard cell libraries to reduce design time and improve the reusability of designs. Because of the large resources needed to develop a standard cell library, many of them use libraries developed by other companies such as Synopsys and Nangate. Some companies develop their own standard cells for internal use. At the logic level, the design is a network of logic gates and their interconnections. This is a good representation of the design but cannot be used to determine the performance precisely. Library mapping binds the logic level netlist to the cells available in the standard cell library, which include primitive gates such as AND, NAND, NOR and OR. The design is still represented as a network, but it now consists of standard cells with known characteristics. This allows the performance of the design to be analyzed accurately, since the power, delay and area of each standard cell are well known. At this level of abstraction we obtain an estimate; a more precise figure comes from the physical level, which also includes information about parasitics. Hence the physical design (placement and routing) that follows, called the back end, can change the design performance drastically.

2.3 Power Dissipation in the CMOS Circuits

There are three main sources of power dissipation in CMOS circuits:

• Dynamic Power
• Short Circuit Power
• Leakage Power

2.3.1 Dynamic Power

Dynamic power in CMOS circuits is due to the charging and discharging of the load capacitance driven by the gate. This capacitance consists of the wiring capacitance of the fanout net and the capacitance of the gate terminals of the transistors driven by the fanout net. The power dissipation can be calculated with the following equation:

$$P_{dyn} = \frac{1}{2}\, C_{load}\, V_{dd}^{2}\, f\, D \tag{1}$$

Where

• Pdyn: Dynamic power dissipation of the gate
• Cload: Load capacitance of the gate
• Vdd: Power supply voltage
• f: Clock frequency
• D: Switching probability

The dynamic power dissipation is thus proportional to the number of transitions occurring at the gate, so an accurate estimate of the switching probability in the circuit gives an estimate of the dynamic power dissipation. In earlier technologies, dynamic power accounted for most of the power dissipated in a circuit. With the advent of deep sub-micron technologies, the other components of the total power consumption have also become significant. Dynamic power can be divided into the switching activity necessary for correct functionality and the unnecessary transitions caused by unbalanced paths and skewed (unbalanced) outputs from the cells in the circuit. The latter component is called glitch power and is explained in the next section.

2.3.1.1 Hazards and Glitch Power

Before the signals of a digital circuit reach steady state, gates can undergo multiple transitions. Since the power consumed is proportional to the number of transitions, the unnecessary transitions increase the power dissipation. These unnecessary transitions are called glitches or hazards. Glitches occur because signals arrive at the inputs of a gate at unequal times. Glitch power contributes significantly to the overall power dissipation in some typical circuits such as adders.
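The effect of unequal arrival times can be illustrated with a short unit-delay simulation. The sketch below is not taken from the thesis tool flow: it assumes the classic hazard circuit Y = A AND (NOT A), treats the AND gate as ideal, and uses hypothetical values for the load capacitance, supply voltage, frequency and switching probability when evaluating Equation (1).

```python
# Unit-delay sketch of a static hazard and of Equation (1).
# Assumed circuit: Y = A AND (NOT A). All names and values are illustrative.

def simulate_and_with_inverter(a_wave, inverter_delay=1):
    """Return the AND output for Y = A AND delayed(NOT A).

    a_wave holds one 0/1 sample per unit of time. The AND gate is treated
    as ideal (zero delay) so only the skew between its inputs is visible.
    """
    y_wave = []
    for t, a in enumerate(a_wave):
        # The inverter output at time t reflects A as it was
        # inverter_delay steps earlier (initial value held before t = 0).
        a_old = a_wave[max(t - inverter_delay, 0)]
        y_wave.append(a & (1 - a_old))
    return y_wave


def count_transitions(wave):
    """Count 0->1 and 1->0 changes in a sampled waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev != cur)


def dynamic_power(c_load, vdd, freq, activity):
    """Equation (1): P_dyn = 0.5 * C_load * Vdd^2 * f * D (SI units)."""
    return 0.5 * c_load * vdd ** 2 * freq * activity


if __name__ == "__main__":
    a = [0, 0, 1, 1, 1, 1]                      # A rises at t = 2
    skewed = simulate_and_with_inverter(a, 1)   # one unit of inverter delay
    ideal = simulate_and_with_inverter(a, 0)    # perfectly balanced inputs
    print("skewed output:", skewed, "transitions:", count_transitions(skewed))
    print("ideal output: ", ideal, "transitions:", count_transitions(ideal))
    # Hypothetical operating point: 1 fF load, 0.7 V supply, 1 GHz, D = 0.2
    print("P_dyn [W]:", dynamic_power(1e-15, 0.7, 1e9, 0.2))
```

With one unit of inverter delay the output, which is logically always 0, shows a one-unit pulse, that is, two transitions that serve no function. Such transitions raise the effective switching probability D in Equation (1) and therefore the dynamic power.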

Fig 2.3. An example of the static hazard


Consider the example of Fig. 2.3, with each gate having one unit of delay. Because the inputs arrive at the AND gate at unequal times, the output of the AND gate shows a glitch: it transmits a pulse one unit wide, equal to the inverter delay. This is known as a static hazard.

2.3.2 Short Circuit Power Dissipation

Short circuit power dissipation occurs when the gate switches. During the transition there is a short time when both the nMOS and pMOS transistors conduct. This is equivalent to shorting the power supply to ground for a brief period. The current flowing during these transitions dissipates power, called the short circuit power dissipation. The value of the short circuit current depends on the capacitance connected to the output of the gate. Consider the example shown in Fig. 2.4. For a large load capacitance, the output fall time is significantly larger than the input rise time; conversely, for a small load capacitance, the output fall time is substantially smaller than the input rise time. The short circuit power dissipation can be calculated with the following formula:

$$P_{short\text{-}circuit} = \frac{\beta}{12}\,\left(V_{dd} - 2V_{th}\right)^{3}\,\frac{\tau}{T} \tag{2}$$

Where

• Pshort-circuit: Short circuit power dissipation
• β: Gain factor of the gate
• Vdd: Power supply voltage
• Vth: Threshold voltage of the transistors
• τ: Rise or fall time of the input signal
• T: Clock period
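As a minimal sketch, Equation (2) can be evaluated directly. The function name and the numeric operating point below are hypothetical and chosen only to show the cubic dependence on the supply voltage and the linear dependence on the input transition time; they are not measurements from this work.

```python
# Sketch of Equation (2): P_sc = (beta / 12) * (Vdd - 2*Vth)^3 * tau / T.
# Function name and example values are illustrative assumptions.

def short_circuit_power(beta, vdd, vth, tau, period):
    """Short circuit power in watts for SI-unit inputs.

    beta   : gain factor of the gate [A/V^2]
    vdd    : supply voltage [V]
    vth    : threshold voltage [V]
    tau    : input rise or fall time [s]
    period : clock period [s]
    """
    overdrive = vdd - 2.0 * vth
    if overdrive <= 0:
        # In this model both transistors never conduct together
        # when the supply is below twice the threshold voltage.
        return 0.0
    return beta / 12.0 * overdrive ** 3 * tau / period


if __name__ == "__main__":
    # Hypothetical gate: beta = 100 uA/V^2, Vdd = 0.9 V, Vth = 0.3 V, T = 1 ns
    for tau in (10e-12, 20e-12, 40e-12):
        p = short_circuit_power(1e-4, 0.9, 0.3, tau, 1e-9)
        print(f"tau = {tau * 1e12:.0f} ps -> P_sc = {p * 1e9:.2f} nW")
```

Doubling the input transition time doubles the result, which matches the trend that slower input edges increase short circuit dissipation.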


Fig 2.4. Effect of load capacitance on short circuit power dissipation

Fig 2.5. Short Circuit Energy Dissipation vs Input rise/fall time

Fig. 2.5 plots the energy dissipation against the ratio of the input rise/fall time to the propagation delay. The graph shows that the short circuit power dissipation increases as the input rise and fall times increase. For most ICs, the short circuit power dissipation is 5-10% of the overall power dissipation.

2.3.3 Leakage Power Dissipation

There are two types of leakage current: reverse bias leakage at the transistor drains, and sub-threshold leakage through the channel of the device. The magnitude of these currents is predominantly determined by the processing technology. However, there are some things the designer can do to minimize their contribution. Diode leakage current flows when a transistor is turned off and another active device charges the drain up or down with respect to the bulk potential. Consider the inverter in Fig. 2.4 when it is given a high input. The pMOS transistor is off, and the nMOS transistor makes the drain-to-bulk potential of the pMOS equal to -Vdd. The diode current flowing through the junction is given by:

$$I = A_{d}\, J_{s} \tag{3}$$

Where

• Ad: Area of the diffusion at the junction of drain and body
• Js: Leakage current density, set by the technology

It is desirable to reduce both quantities. The leakage current density increases with temperature. The other component of the leakage power dissipation is sub-threshold conduction. In the inverter shown in Fig. 2.6, when a transistor is turned off there is still a current flowing through the channel due to the drain-source voltage Vds. This current is called the subthreshold current. The plot of the drain current against Vds is shown in Fig. 2.7; it exhibits an exponential relation in the sub-threshold conduction region. This is due to the decrease in the threshold voltage as Vds increases. In other words, the width of the drain junction depletion region increases with Vds. This effect is known as Drain Induced Barrier Lowering and causes a sharp increase in the current. The magnitude of the sub-threshold current is a function of the process, device size and supply voltage, and is given by the following formula:

$$I = \mu_{0}\, C_{ox}\, \frac{W_{eff}}{L_{eff}}\, V_{T}^{2}\, e^{1.8}\, e^{\frac{V_{gs} - V_{th}}{n V_{T}}} \left[ 1 - e^{-V_{ds}/V_{T}} \right] \tag{4}$$

Here µ0 is the carrier mobility, Cox is the gate oxide capacitance per unit area, Weff and Leff are the effective channel width and length, Vth is the threshold voltage, VT is the thermal voltage (kT/q) and n is the subthreshold slope factor. In this equation, the process parameter that most strongly affects the sub-threshold current is the threshold voltage: reducing the threshold voltage significantly increases the subthreshold current. The subthreshold current also increases with the supply voltage and the device size.
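The sensitivity of the leakage components to their parameters can be checked numerically. The following is a rough sketch of Equations (3) and (4) under stated assumptions: the function names, the slope factor n = 1.5 and all device values are hypothetical and are not extracted from the PTM models used in this work.

```python
import math

# Sketch of Equations (3) and (4). All device values are illustrative.
BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def diode_leakage(area, current_density):
    """Equation (3): I = A_d * J_s."""
    return area * current_density


def subthreshold_current(mu0, cox, w_eff, l_eff, vgs, vth, vds,
                         n=1.5, temp_kelvin=300.0):
    """Equation (4) with V_T as the thermal voltage and V_th as the threshold."""
    vt = thermal_voltage(temp_kelvin)
    prefactor = mu0 * cox * (w_eff / l_eff) * vt ** 2 * math.exp(1.8)
    gate_term = math.exp((vgs - vth) / (n * vt))
    drain_term = 1.0 - math.exp(-vds / vt)
    return prefactor * gate_term * drain_term


if __name__ == "__main__":
    # Hypothetical device: off state (Vgs = 0), Vds = 0.7 V
    args = dict(mu0=0.03, cox=0.02, w_eff=1e-7, l_eff=2e-8, vgs=0.0, vds=0.7)
    i_high_vth = subthreshold_current(vth=0.30, **args)
    i_low_vth = subthreshold_current(vth=0.20, **args)
    print("leakage ratio for a 100 mV lower Vth:", i_low_vth / i_high_vth)
```

Because the threshold voltage appears in the exponent, lowering it by 100 mV in this sketch multiplies the off-state current by exp(0.1 / (n VT)), more than an order of magnitude at room temperature for the assumed n.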


Fig 2.6. Leakage Power in the inverter before occurrence of the transition at Input.

Fig 2.7. Leakage Current vs Drain voltage characteristic of the MOSFET


CHAPTER 3

FinFET vs PLANAR BULK MOSFET DEVICES

As CMOS devices shrink into the nanometer regime, short channel effects, subthreshold swing, gate induced drain leakage and variation in the process parameters all become more severe, which degrades the reliability as well as the performance of the circuit. FinFET is one of the promising technologies for addressing these issues without sacrificing the reliability and performance of the design. Moore’s law motivates technology scaling in order to improve power, area and speed. While circuits and systems benefit from scaling down the technology, undesired features such as short channel effects (SCE) and sensitivity to process parameters increase. SCE include the limitation imposed on the electron drift characteristics in the channel and threshold voltage variation. Together with the reduction in Ion/Ioff and the increase in leakage current, they have made the use of bulk CMOS transistors impractical in sub-22nm technologies. The increase in leakage current also raises the static power consumption of the circuits. SCE can be reduced by using a thinner gate oxide, but this leads to a higher gate leakage current due to tunneling, which increases the total power consumption and reduces the device reliability. FinFET is considered the best candidate compared with bulk CMOS in deep sub-micron technologies. The thin silicon fin in a FinFET acts as the channel and conducts electrons between source and drain. Fig. 3.1 shows the different structures of the FinFET. As shown, the channel is surrounded by the gate on three sides, which results in superior control of the channel. The fully depleted channel also reduces short channel effects and makes the device less sensitive to process variations.


Fig 3.1. Different structures of the FinFET Devices

3.1 I-V Characteristics

Fig. 3.2 shows the I-V characteristics of the bulk CMOS (22nm) and FinFET (20nm) transistors when Vgs is swept from 0 to 0.9 V. In the strong inversion region, the ON current of the FinFET is higher than that of bulk CMOS. The FinFET also has a higher output resistance, because the gate surrounds the channel on three sides and provides better control.

Fig 3.2. I-V characteristics of the bulk CMOS (22nm) and FinFET (20nm) devices

Fig. 3.3 shows the Ion/Ioff ratio versus supply voltage for both devices. For lower supply voltages the Ion/Ioff ratio is higher for the FinFET than for CMOS, while for higher supply voltages (above 0.72 V) it is higher for CMOS. This is because at higher supply voltages CMOS has the lower Ioff, whereas at lower supply voltages the Ioff of CMOS is comparable with that of the FinFET while the FinFET has the higher Ion.

Fig 3.3. Ion/Ioff variation for CMOS and FinFET with different supply voltages

3.2 Drain Induced Barrier Lowering

Fig 3.4. Drain Current vs Gate Source Voltage for the FinFET and bulk CMOS

Fig. 3.4 shows the variation of the drain current with gate to source voltage for the FinFET and bulk CMOS devices when Vds is 0.1 V and 1.1 V. It can be observed from the figure that, for short channel devices, the threshold voltage decreases as the drain to source voltage increases. This effect is called Drain Induced Barrier Lowering (DIBL). An increase in the drain voltage causes the depletion region at the drain to penetrate further into the channel. DIBL is higher for the bulk CMOS devices (124 mV/V) than for the FinFET devices (58 mV/V), which shows that the FinFET devices have lower threshold variation due to the short channel. Another important observation from the figure is the lower threshold voltage of the FinFET devices compared with the CMOS devices, which is one of the reasons for their higher Ion/Ioff ratio.

3.3 Subthreshold Swing

Fig. 3.4 also shows that the subthreshold swing of the FinFET is 21% lower than that of bulk CMOS at room temperature. The subthreshold swing of a device is defined as the change in gate voltage required to change the drain current by one decade. A lower swing indicates a stronger dependence of the drain current on the gate voltage, so the drain current of the FinFET devices increases at a faster rate with the gate source voltage.

3.4 Gate Induced Drain Leakage

Gate leakage current is a major concern in nanoscale devices. Gate Induced Drain Leakage (GIDL) in bulk CMOS devices arises from the lateral diffusion of the source and drain regions. To evaluate GIDL, the gate voltage is swept from negative to positive values.
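The DIBL and subthreshold swing figures quoted in Sections 3.2 and 3.3 are both simple slopes extracted from the Id-Vgs curves. The following is a minimal sketch of how such numbers could be computed from two points on each curve. The function names and the sample threshold voltages and currents are hypothetical illustration values; only the two drain biases (0.1 V and 1.1 V) are taken from the text.

```python
import math


def dibl_mv_per_v(vth_low_vds, vth_high_vds, vds_low, vds_high):
    """DIBL in mV/V: drop in threshold voltage per volt of added drain bias."""
    return (vth_low_vds - vth_high_vds) / (vds_high - vds_low) * 1e3


def subthreshold_swing_mv_per_decade(vgs1, id1, vgs2, id2):
    """Gate voltage change (mV) needed to change the drain current by one decade.

    Both points must lie in the subthreshold region of the Id-Vgs curve.
    """
    decades = math.log10(id2) - math.log10(id1)
    return (vgs2 - vgs1) / decades * 1e3


# Hypothetical threshold voltages, chosen only to illustrate the arithmetic.
dibl = dibl_mv_per_v(vth_low_vds=0.300, vth_high_vds=0.242,
                     vds_low=0.1, vds_high=1.1)

# Hypothetical subthreshold points: current rises one decade over 70 mV.
swing = subthreshold_swing_mv_per_decade(0.10, 1e-9, 0.17, 1e-8)

print(f"DIBL = {dibl:.1f} mV/V, swing = {swing:.1f} mV/decade")
```

A smaller value is better for both metrics: a lower DIBL means the threshold voltage is less sensitive to the drain bias, and a lower swing means the device switches off more sharply.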

Fig 3.5. Drain current versus Gate source voltage while Vds=VDD [13]


It can be seen from Fig. 3.5 that the FinFET devices behave differently from the bulk CMOS devices for negative gate source voltages and show better GIDL characteristics. As Vgs becomes negative, the drain current of both devices decreases, but for CMOS it remains constant for Vgs below -0.1 V and increases rapidly for Vgs below -0.3 V. Larger negative gate voltages in the bulk CMOS devices cause band bending at the polysilicon, oxide and p-well interface, as shown in Fig. 3.6. Electrons from the valence band in the p-well then tunnel to the conduction band of the n+ region, increasing the gate leakage current.

Fig 3.6. Gate Dielectric tunneling current in NMOS bulk planar device.

From all the above results, it can be observed that FinFET devices possess better characteristics than bulk CMOS devices for short channels. FinFETs provide better control over the channel, with less leakage and less subthreshold conduction, both of which contribute significantly to the overall power dissipation. In addition, with independent gate FinFET devices, the threshold voltage of the device can be changed dynamically by applying a reverse bias to the back gate, further reducing the off state current. The better off state current performance of the FinFET devices is the motivation for designing standard cell libraries in sub-micron technologies with FinFET devices.


CHAPTER 4

CDM LOGIC STYLE

Power reduction is a serious concern nowadays. With MOS devices so widespread, there is a need for low power designs, mainly for portable devices that run on batteries. CDM is a better way to implement circuits designed for low power applications. The CDM logic style can have single or double (complementary) outputs. The complementary outputs of the CDM logic cells are balanced with each other, so both outputs are available at the same instant and no glitches result. It has also been observed that cells with multiple outputs (e.g. half adder, full adder) in the CDM logic style have non-skewed outputs [9]. Non-skewed outputs help minimize glitch propagation and hence save power. We have analyzed the performance of primitive logic cells with complementary outputs and single outputs implemented in the CDM logic style and compared them with their C-CMOS logic style implementations.

The CDM logic style allows the inputs to be tied to the source and drain of the transistors, which creates situations where an NFET has to drive logic 1 and a PFET has to drive logic 0. Since an NFET is not a good pull-up device, the output voltage suffers from a threshold voltage drop. Different feedback mechanisms have therefore been devised to achieve a full swing output with sufficient drive capability.

4.1 CDM with Complementary Outputs Cells

Fig. 4.1 shows the typical CDM design with complementary outputs.

Fig 4.1. Basic Cell representation in CDM logic Style [4]


In the process of designing balanced complementary circuits we deal with two independent inputs and two complementary outputs. The elementary basic cell presented in Fig. 4.1 has four elements that determine the two outputs (each output is driven by two elements). Each element is a transistor with two input controls, i.e., the gate and either the drain or the source. The input signals applied to these two terminals and the choice of pMOS or nMOS transistors determine the various output states. As presented in Fig. 4.1, the input pins (IN1 to IN4) are A or B, or their complements. We also assume that pins G1 to G4 can be A or B or their complements. This form of the circuit (the elementary basic cell) is power-less and ground-less (P-/G-). Therefore, the complementary outputs are charged or discharged only through the inputs and are affected only by the input drivability.

4.2 Feedback and Correction Mechanism

All circuits with complementary outputs have the ability to optionally determine the state of one output, or amplify it, through the use of the other output and a suitable transistor. The transistor or transistors placed between the two outputs, so that activating the first output influences the second, are called feedback networks. The feedback network is placed between the two complementary outputs and causes the high impedance output states to be eliminated and replaced by the desired levels. It also makes it possible to ensure full swing operation at the outputs. As the different basic cell versions presented in this work come with different shortcomings, the required feedback networks differ. In Fig. 4.2 we present four such networks: Fp, Fn, Fc and Fnp. Fp is a feedback network using two pMOS transistors. Fn is a feedback network with two nMOS transistors. Fc is a complementary feedback network, and Fnp includes nMOS and pMOS transistors placed between the two complementary outputs $Y$ and $\overline{Y}$. Note that the driving capability of the feedback networks is improved because they use VDD and GND connections.


Fig 4.2. Different feedback circuits to get full swing outputs [4]

In the next section, various logic cells are designed in the complementary output CDM logic style and compared with the same cells designed in the C-CMOS logic style.

4.3 Performance comparison in CDM and CMOS for complementary outputs

Basic logic cells have been implemented with 7nm FinFET devices in both the CDM and CMOS logic styles with complementary outputs. The schematics, output waveforms, PDP variation with load, power consumption and delay comparison for the various cells are shown in the following figures.

4.3.1 AND-NAND gate implemented in CDM and CMOS

Fig 4.3. Schematic of AND-NAND in CDM logic Style


Fig 4.4. Schematic of NAND-AND logic gate in CMOS logic Style.

Fig 4.5. Test Bench for NAND/AND implemented in CMOS and CDM logic style


Fig 4.6. Output NAND/AND waveforms for CDM logic gate.

Fig 4.7. Output waveforms from CMOS NAND-AND gate.

Fig 4.8. PDP between CMOS and CDM with varying load capacitance.


Fig 4.9. Delay vs Power plot for AND-NAND in CMOS and CDM logic style.

Table 4.1 shows the performance parameters of the AND-NAND logic cell in CMOS and CDM with a 1 fF load capacitance.

Table 4.1. Performance parameter for AND/NAND in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 22.98 | 18.22 |
| Power (nW) | 129.6 | 136.8 |
| PDP (×10^-18 J) | 2.979 | 2.492 |
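The PDP rows in the tables of this chapter follow from the delay and power rows by a unit conversion. As a minimal sketch (the function name is ours, and the thesis does not state how its PDP values were rounded), the product of a delay in ps and a power in nW can be expressed in units of 10^-18 J as follows:

```python
def pdp_in_1e18_joule(delay_ps, power_nw):
    """Power-delay product, returned in units of 1e-18 J (attojoules)."""
    delay_s = delay_ps * 1e-12   # picoseconds to seconds
    power_w = power_nw * 1e-9    # nanowatts to watts
    return delay_s * power_w / 1e-18


# Delay and power of the AND-NAND cell, copied from Table 4.1.
cdm_pdp = pdp_in_1e18_joule(22.98, 129.6)
cmos_pdp = pdp_in_1e18_joule(18.22, 136.8)

print(f"CDM PDP  = {cdm_pdp:.3f} x 1e-18 J")
print(f"CMOS PDP = {cmos_pdp:.3f} x 1e-18 J")
```

The computed values agree with the tabulated PDP to within rounding, which confirms that the PDP column is expressed in units of 10^-18 J.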

4.3.2 OR-NOR gate implemented in CDM and CMOS Logic Style

Fig 4.10. Schematic of NOR-OR in CDM Logic Style


Fig 4.11. Schematic of NOR-OR in CMOS Logic Style

Fig 4.12. Test Bench for performance comparison between CMOS and CDM


Fig 4.13. Output Waveforms for the CDM Logic Style

Fig 4.14. Output Waveform for CMOS logic Style

Fig 4.15. PDP between CMOS & CDM OR-NOR with varying load capacitance


Fig 4.16. Delay vs Power plot for NOR-OR in CMOS and CDM logic style.

Table 4.2 shows the performance parameter comparison of the 2 input OR-NOR between CMOS and CDM with a 1 fF load capacitance.

Table 4.2. Performance parameter for OR/NOR in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 7.609 | 12.08 |
| Power (nW) | 100.1 | 98.25 |
| PDP (×10^-18 J) | 0.7617 | 1.037 |

4.3.3 3-Input AND-NAND gate in CMOS and CDM Logic Style

Fig 4.17. Schematic of 3 input NAND-AND in CDM Logic Style


Fig 4.18. Test Bench for the 3 input NAND-AND in CMOS and CDM Logic Style

Fig 4.19. Output Waveforms for 3 input CDM NAND-AND logic gate.


Fig 4.20. Output Waveforms for 3 input CMOS NAND-AND logic gate.

Fig 4.21. PDP for 3-NAND-AND in CMOS and CDM with varying load cap.

Fig 4.22. Delay vs Power comparison for 3 Input CMOS and CDM AND-NAND.


Table 4.3 shows the performance parameter comparison of the 3 input AND/NAND between CMOS and CDM with a 1 fF load capacitance.

Table 4.3. Performance parameter for 3 input AND/NAND in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 13.74 | 16.71 |
| Power (nW) | 114.5 | 89.83 |
| PDP (×10^-18 J) | 1.573 | 1.501 |

4.3.4 3-Input NOR-OR gate implemented in CMOS and CDM Logic Style

Fig 4.23. Schematic of 3 input NOR-OR gate in CDM logic Style


Fig 4.24. Test Bench for 3 input CMOS and CDM NOR-OR gate

Fig 4.25. Output waveforms from the CDM NOR-OR Logic cell.

Fig 4.26. Output Waveforms from CMOS NOR-OR Logic cell.

Fig 4.27. PDP for the CMOS and CDM NOR-OR with varying load capacitance


Fig 4.28. Delay vs Power comparison for CMOS and CDM OR-NOR.

Table 4.4 shows the performance parameter comparison of the 3 input OR/NOR between CMOS and CDM with a 1 fF load capacitance.

Table 4.4. Performance parameter for 3 input OR/NOR in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 10.59 | 16.04 |
| Power (nW) | 131.9 | 104 |
| PDP (×10^-18 J) | 1.398 | 1.667 |

From the above simulation results, it can be seen that CDM logic is more efficient than CMOS logic for complementary outputs, with balanced and symmetrical designs. The CMOS logic style works better for the NAND-AND logic gates, hence CMOS standard cell libraries are more efficient for NAND/AND intensive design synthesis.

4.4 Power Saving with CDM

The power consumption of a circuit can be reduced by addressing the following parameters:

• Switching activity in the circuit
• Switching capacitance of each node
• Supply voltage
• Short circuit current
• Leakage current


The advantage of CDM comes from the fact that it is well suited to all of the above power reduction techniques:

1. Switching activity in the circuit can be reduced by eliminating glitches. CDM designs provide balanced complementary outputs, and non-skewed outputs for cells with multiple outputs, which reduces the chance of glitches and the power dissipated by glitch propagation.
2. The switching capacitance of a node in CDM is small compared with the same node in the CMOS design, because the CDM implementation uses smaller transistors and has fewer transistors in the critical path (less parasitic capacitance).
3. As in CMOS, the supply voltage can be reduced, at the cost of increased circuit delay.
4. There are few power and ground connections, which means fewer VDD to GND paths during switching. A CDM implementation should therefore draw the least short circuit power.
5. Leakage current contributes significantly as the feature size shrinks. To address this problem, FinFET devices are used in place of bulk MOS transistors to minimize the leakage power, since FinFET devices have better Ioff performance than bulk MOS transistors.
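The five parameters above map onto the terms of the commonly used first-order power model for digital CMOS-style logic. The expression below is a sketch of that standard model, not a formula taken from this thesis; the symbols $\alpha$ (switching activity), $C_L$ (switched node capacitance), $f$ (clock frequency), $I_{sc}$ (average short circuit current) and $I_{leak}$ (leakage current) are introduced here for illustration.

```latex
P_{total} =
  \underbrace{\alpha \, C_L \, V_{DD}^{2} \, f}_{\text{switching}}
  + \underbrace{I_{sc} \, V_{DD}}_{\text{short circuit}}
  + \underbrace{I_{leak} \, V_{DD}}_{\text{leakage}}
```

Points 1 to 3 of the list act on the switching term (through $\alpha$, $C_L$ and $V_{DD}$), point 4 acts on the short circuit term, and point 5 acts on the leakage term.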

4.5 CDM with Single Output Cells

Fig. 4.29 shows the basic single level and two level logic cells in the single output CDM logic style.


Fig 4.29. Single Output CDM basic cells (a) Single Level (b)-(d) Two Level

A single output CDM basic cell can be seen as one half of the complementary output CDM basic cell. All the feedback networks used with the complementary output CDM cells require the outputs to be complementary, which is not the case for the single output CDM cells. Hence we have used an inverter at the output of the cells to obtain full swing outputs with enough drive capability. Single output CDM is observed to be more efficient than static CMOS for implementing arithmetic circuits such as adders, multipliers and other XOR intensive circuits. This is supported by the following simulation results for arithmetic modules (half adder, full adder, 4 bit multiplier) and primitive gates (2 input and 3 input AND, NAND, OR, NOR and XOR gates).


4.5.1 AND Gate

Fig 4.30. Schematic of AND gate with CDM single output logic style

Fig 4.31. Test Bench for CMOS and CDM AND gate


Fig 4.32. Output Waveforms from CMOS and CDM AND gate

Table 4.5 shows the delay and power consumption of the AND gate implemented in both logic styles.

Table 4.5. Performance parameter for 2 input AND in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 13.15 | 12.53 |
| Power (nW) | 75.96 | 75.95 |
| PDP (×10^-18 J) | 0.998 | 0.951 |

4.5.2 OR gate

Fig 4.33. Schematic of the OR gate in the single output CDM logic style


Fig 4.34. Test Bench for CMOS and CDM OR gate.

Fig 4.35. Output waveforms from the CMOS and CDM OR gate.

Table 4.6 shows the delay and power consumption of the OR gate implemented in both logic styles.

Table 4.6. Performance parameter for 2 input OR in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 12.21 | 12.92 |
| Power (nW) | 92.81 | 87.88 |
| PDP (×10^-18 J) | 1.103 | 1.135 |


4.5.3 3-Input AND gate in CDM single output logic style

Fig 4.36. Schematic of 3 input AND gate in CDM single output logic style.

Fig 4.37. Test Bench for CMOS and CDM 3 Input AND gate.


Fig 4.38. Output waveforms from 3 input AND gate in CDM and CMOS.

Table 4.7 shows the delay and power consumption of the 3 input AND gate implemented in both logic styles.

Table 4.7. Performance parameter for 3 input AND in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 20.89 | 13.14 |
| Power (nW) | 66.1 | 59.36 |
| PDP (×10^-18 J) | 1.381 | 0.780 |

4.5.4 3-Input OR Gate

Fig 4.39. Schematic of 3 Input OR gate in CDM single output logic style.


Fig 4.40. Test Bench for 3 Input CMOS and CDM OR gate.

Fig 4.41. Output Waveforms from 3 Input OR gate in CMOS and CDM.

Table 4.8 shows the delay and power consumption of the 3 input OR gate implemented in both logic styles.

Table 4.8. Performance parameter for 3 input OR in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 18.1 | 12.81 |
| Power (nW) | 92.95 | 102.4 |
| PDP (×10^-18 J) | 1.6 | 1.3 |


4.5.5 Half Adder Comparison

Fig 4.42. Schematic of Half Adder in single output CDM logic style

Fig 4.43. Test Bench for CDM and CMOS half adder


Fig 4.44. Output waveforms from the half adder in CMOS and CDM logic style.

Table 4.9 shows the delay and power consumption of the half adder implemented in both logic styles.

Table 4.9. Performance parameter for half adder in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 16.23 | 14.16 |
| Power (nW) | 164 | 197 |
| PDP (×10^-18 J) | 2.66 | 2.79 |

4.5.6 Full Adder Comparison

Fig 4.45. Schematic of Full Adder in CDM single output logic style.


Fig 4.46. Test Bench for CDM and CMOS Full adder.

Fig 4.47. Output waveforms from the Full adder in CMOS and CDM.

Table 4.10 shows the delay and power consumption of the full adder implemented in both logic styles.

Table 4.10. Performance parameter for full adder in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 28.66 | 19.08 |
| Power (nW) | 462.5 | 690.2 |
| PDP (×10^-18 J) | 12.7 | 13.17 |


4.5.7 4:2 Compressor comparison

The 4:2 compressor has been designed using full adders in both the CMOS and CDM logic styles. The following figure shows the compressor schematic in the CDM logic style, built from full adders that are themselves implemented in the CDM logic style.

Fig 4.48. Schematic of 4:2 compressor in single output CDM logic style.

Fig 4.49. Test Bench for CMOS and CDM 4:2 compressor design


Fig 4.50. Output Waveforms from 4:2 compressor in CMOS and CDM.

From Fig. 4.50, it can be observed that the CMOS outputs have more glitches than those of the CDM logic cell, hence the CMOS implementation consumes more power. Table 4.11 shows the delay and power consumption of the 4:2 compressor implemented in both logic styles.

Table 4.11. Performance parameter for 4:2 compressor in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 26.71 | 20.65 |
| Power (nW) | 668.4 | 1092 |
| PDP (×10^-18 J) | 17.85 | 22.54 |

4.5.8 4 bit by 4 bit Multiplier


Fig 4.51. Schematic of 4 bit by 4 bit multiplier in CDM logic style.

Fig 4.52. Test Bench for CDM and CMOS 4 bit by 4 bit multiplier.

Fig 4.53. Output Waveforms from the Multiplier in CMOS logic style.

Fig 4.54. Output Waveforms from the Multiplier in CDM logic style


For the multiplier design as well, the above figures show that the CMOS multiplier has more glitches at the output than the CDM multiplier and hence consumes more power. Table 4.12 shows the delay and power consumption of the multiplier implemented in both logic styles.

Table 4.12. Performance parameter for 4 bit multiplier in CDM & C-CMOS

| Parameters | CDM | CMOS |
|---|---|---|
| Delay (ps) | 30 | 24 |
| Power (nW) | 2000 | 2800 |
| PDP (×10^-18 J) | 60 | 67.2 |

4.6 Data Analysis

Single output CDM is not universally better than static CMOS for all types of designs. For NAND/AND intensive circuits with small load capacitance, static CMOS can give a better implementation than single output CDM. For larger load capacitance, however, the CDM implementation of the 2 and 3 input NAND gate works better than the C-CMOS logic style. This is demonstrated in the simulation results for the 2 and 3 input AND/NAND gates (Fig. 4.8 and Fig. 4.21). Single output CDM is better than the static CMOS logic style in XOR rich or MUX rich designs.

Motivated by the performance of the single output CDM logic gates, there is a need to design a standard cell library with these cells and integrate them into the existing design flows for optimal performance. Single output CDM and CMOS each have their own advantages and disadvantages in terms of performance characteristics like area, power and delay. Standard cell libraries with C-CMOS and single output CDM logic cells have therefore been designed in this thesis on various technology nodes, and the performance of the designs synthesized with each library is analyzed. Alongside the CMOS standard cell library, the proposed single output CDM standard cell library contains a wide variety of complicated logic cells constructed from only a few basic one and two level single output CDM cells. Note that a traditional CMOS standard cell library usually consists of thousands of logic cells with individual layouts, whereas the single output CDM standard cell library consists of complicated logic functions that are derived from the same layout by changing the signals (VDD, GND, and variable) at the input lines. Hence designs synthesized with single output CDM standard cell libraries are symmetrical. Because different logic functions can be generated with the same footprint simply by changing the input signals, the manual design effort for the standard cell library is reduced.


CHAPTER 5

SINGLE OUTPUT CDM STANDARD CELL LIBRARY DESIGN

We have already explored various CDM logic cells with complementary outputs and compared their performance parameters with those of CMOS logic cells with complementary outputs. We observed that the CDM logic cells are more efficient than the CMOS logic cells, but the industry standard tools available for the ASIC design flow are designed for single output (not complementary output) standard cell libraries. To enjoy the benefits of the complementary output CDM logic cells, algorithmic level modifications to the standard tools are required, which is outside the scope of this thesis. Hence, we worked on designing a single output CDM standard cell library with FinFET devices and compared the performance parameters after synthesizing the benchmark designs with both the designed CDM and CMOS standard cell libraries. Fig. 5.1 shows the basic single level and two level logic cells in the single output CDM logic style.

Fig 5.1. CDM Logic Cells (a) Single Level (b)-(d) Two Level

The basic expressions for the outputs of the single level and two level single output CDM logic cells can be written as follows:


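$$Y_{(a)} = \overline{A}\,(\mathrm{In1}) + A\,(\mathrm{In2}) \qquad (5)$$

$$Y_{(b)} = \overline{A}\,(\mathrm{In1}) + A\,\big(\overline{B}\,(\mathrm{In2}) + B\,(\mathrm{In3})\big) \qquad (6)$$

$$Y_{(c)} = \overline{A}\,\big(\overline{B}\,(\mathrm{In1}) + B\,(\mathrm{In2})\big) + A\,(\mathrm{In3}) \qquad (7)$$

$$Y_{(d)} = \overline{A}\,\big(\overline{B}\,(\mathrm{In1}) + B\,(\mathrm{In2})\big) + A\,\big(\overline{B}\,(\mathrm{In3}) + B\,(\mathrm{In4})\big) \qquad (8)$$

The major advantage of the CDM logic cells is that they support automatic logic design. We can define different algorithms to extract the different logic functionalities with just the above two basic cells. From the single level CDM basic cell, a total of $3^2 = 9$ different logic functions can be generated, as shown in the following Table 5.1.

Table 5.1. Logic functions from Single level Single output CDM basic cell

| In1 | In2 | Y |
|---|---|---|
| 0 | 0 | $0$ |
| 0 | 1 | $A$ |
| 1 | 0 | $\overline{A}$ |
| 1 | 1 | $1$ |
| B | 0 | $\overline{A}B$ |
| B | 1 | $\overline{A}B + A$ |
| 0 | C | $AC$ |
| 1 | C | $\overline{A} + AC$ |
| B | C | $\overline{A}B + AC$ |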
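The count of nine functions for the single level cell can be checked by brute force. The sketch below assumes the single level cell behaves as a two-input multiplexer controlled by A, i.e. Y = A'·In1 + A·In2, and that each data input is tied to 0, 1 or a free variable (named B for In1 and C for In2 here). Both the polarity of the select signal and the variable names are assumptions made for illustration.

```python
from itertools import product


def truth_table(in1, in2):
    """Output of the assumed single level cell for every (A, B, C) combination.

    in1 and in2 are one of "0", "1", "B", "C". The cell is modelled as a
    multiplexer: In1 is passed when A = 0, In2 is passed when A = 1.
    """
    outputs = []
    for a, b, c in product((0, 1), repeat=3):
        signal = {"0": 0, "1": 1, "B": b, "C": c}
        outputs.append(signal[in2] if a else signal[in1])
    return tuple(outputs)


def single_level_functions():
    """Map every (In1, In2) assignment to the truth table it produces."""
    choices_in1 = ("0", "1", "B")
    choices_in2 = ("0", "1", "C")
    return {(i1, i2): truth_table(i1, i2)
            for i1, i2 in product(choices_in1, choices_in2)}


functions = single_level_functions()
distinct = set(functions.values())
print(f"{len(functions)} input assignments, {len(distinct)} distinct functions")
```

The same enumeration could be extended to the four data inputs of the two level cell; how many of those assignments count as distinct depends on which equivalences (for example, renaming of variables) are treated as redundant, so no count is claimed here for that case.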

Similarly, with the two level CDM logic cell, a total of $3^4 = 81$ different logic functions can be generated. Even after removing the repeating/redundant logic cells, a total of 55 different logic cells can be generated in the CDM logic style with just the two level implementation. The layout of the basic cells remains the same; changing only the input lines changes the functionality of the cell. This results in a CDM cell library that is richer than the CMOS cell library, with reduced area and power consumption and less manual effort. We have confined the CDM logic cells to two levels, though it is feasible to extend them to three levels to make the library richer in terms of logic functions.

A CMOS standard cell library has also been generated, consisting of the primitive logic gates along with arithmetic modules like the full adder and half adder. All the cells in both standard cell libraries have a single instance; device sizes have not been scaled to multiples, which would give better drive capability at the cost of more power consumption. The standard cell libraries are optimized for energy (PDP), so the logic cells in both the static CMOS and the single output CDM library are sized for minimum PDP. Since the libraries are designed using the FinFET device models from PTM, only the number of fins of the FinFET devices can be changed; the other parameters are fixed by the model library. The number of fins for the PFETs and NFETs used in the logic cells is therefore selected for minimum PDP. The SEA algorithm [6] has been used for FinFET sizing in the single output CDM standard cell library to minimize the PDP. The FinFETs at identical positions in the basic cells are grouped together as one fin sizing variable, and a sweep is then performed over all the variables to find the combination with minimum PDP.

5.1 Standard Cell Library Design Flow

Fig. 5.2 shows the complete flow for the standard cell library characterization of both the CDM and C-CMOS logic cells.

Fig 5.2. Standard Cell Library Design Flow
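The fin sizing step that precedes this flow (grouping FinFETs at identical positions into one variable and sweeping all variables for minimum PDP) can be illustrated with a plain exhaustive sweep. This is a minimal sketch and not the SEA algorithm of [6]. The group names, the fin range and the `toy_pdp` cost model are hypothetical; in the actual flow the PDP of each candidate would come from a circuit simulation.

```python
from itertools import product


def sweep_min_pdp(groups, fin_choices, evaluate_pdp):
    """Try every fin count combination and return the one with the lowest PDP.

    groups       -- names of the fin sizing variables (one per transistor group)
    fin_choices  -- iterable of allowed fin counts for every group
    evaluate_pdp -- callable mapping {group: fins} to a PDP value
    """
    fin_choices = list(fin_choices)
    best_sizes, best_pdp = None, float("inf")
    for combo in product(fin_choices, repeat=len(groups)):
        sizes = dict(zip(groups, combo))
        pdp = evaluate_pdp(sizes)
        if pdp < best_pdp:
            best_sizes, best_pdp = sizes, pdp
    return best_sizes, best_pdp


def toy_pdp(sizes, load=6.0):
    """Hypothetical stand-in for a simulated PDP (arbitrary units).

    More fins lower the drive resistance but add self-loading capacitance,
    so the product of delay and energy has an interior minimum.
    """
    total_cap = load + sum(sizes.values())
    delay = sum(1.0 / fins for fins in sizes.values()) * total_cap
    energy = total_cap
    return delay * energy


best, pdp = sweep_min_pdp(["pfet_group", "nfet_group"], range(1, 6), toy_pdp)
print(best, pdp)
```

The sweep grows exponentially with the number of groups, which is why grouping transistors at identical positions into a single variable keeps the search tractable.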


Once the number of fins has been decided for the single output CDM and CMOS logic cells, the netlists for the logic cells are generated, formatted in HSPICE format and then fed into SiliconSmart together with the device models for standard cell library characterization using the HSPICE simulator. SiliconSmart generates the standard cell library in the Liberty format (.lib), which contains all the timing and power information of the cells in the library. The Liberty file is then converted into the database format (.db) using Library Compiler. All the tools mentioned are from Synopsys. The standard cell libraries in database (.db) format are then used to synthesize various benchmark circuits with Synopsys Design Vision. All the scripts used in this flow for the various tools, and the logic cell netlists for both the CMOS and single output CDM cells, are included in Appendix A.

BSIM-CMG FinFET device models for the 7nm, 10nm, 14nm, 16nm and 20nm feature sizes are available from PTM for HP (high performance) and LSTP (low standby power). Standard cell libraries have therefore been designed for all the available device models in the different feature sizes. All the designed standard cell libraries have been used to synthesize the benchmark designs, to show that the design methodology is independent of the technology feature size.

5.2 Benchmark Circuits

Benchmark circuits are collections of circuits used to evaluate the performance of synthesis tools more objectively. Popular benchmark suites include ISCAS’85, ISCAS’89 and ITC’02. ISCAS’85 is the suite generally used for combinational circuits. Since the designed standard cell libraries consist of combinational cells only, we use ISCAS’85 for our experiments. Table 5.2 shows the functionality of the benchmark designs synthesized with the designed standard cell libraries for the performance comparison.


Table 5.2. ISCAS’85 Benchmark Designs

5.3 Synthesis Results with CDM standard cell library

Synthesis results for the circuits in the benchmark designs with both the C-CMOS and single output CDM standard cell libraries are shown in the following tables. Table 5.3 shows the synthesis results for the benchmark circuits with the 7nm CDM and CMOS FinFET standard cell libraries.

Table 5.3 Synthesis Results with 7nm Standard Cell Library

| Architecture | CMOS Power | CMOS Delay | CDM Power | CDM Delay | CMOS PDP | CDM PDP |
|---|---|---|---|---|---|---|
| c1355 | 14.27 | 124.68 | 9.3 | 136.27 | 1779.18 | 1267.31 |
| c1908a | 7.2 | 110.77 | 5.6 | 152.77 | 797.54 | 855.51 |
| c3540a | 13.84 | 197.46 | 11.21 | 241.38 | 2732.85 | 2705.87 |
| c499 | 14.7 | 103.85 | 9.8 | 120.79 | 1526.60 | 1183.74 |
| c432 | 2.72 | 162.16 | 2.06 | 211.99 | 441.08 | 436.70 |
| c6288 | 95.94 | 608.74 | 82.89 | 754.88 | 58402.52 | 62572.00 |
| c880 | 4.6 | 130.21 | 4.15 | 165.74 | 598.97 | 687.82 |
| c17 | 0.064 | 16 | 0.057 | 18.61 | 1.02 | 1.06 |
| c2670 | 10.63 | 148 | 8.5 | 159.33 | 1573.24 | 1354.31 |
| c5315 | 27.5 | 146.5 | 22.5 | 175.5 | 4028.75 | 3948.75 |
| c7552 | 39.8 | 337.19 | 31.3 | 251.53 | 13420.16 | 7872.89 |

Table 5.4 shows the synthesis results for the benchmark circuits with the 10nm CDM and CMOS FinFET standard cell libraries.

Table 5.4 Synthesis Results with 10nm Standard Cell Library

| Architecture | CMOS Power | CMOS Delay | CDM Power | CDM Delay | CMOS PDP | CDM PDP |
|---|---|---|---|---|---|---|
| c1355 | 22.03 | 146 | 13.7 | 153.12 | 3216.38 | 2097.744 |
| c1908a | 10.6 | 129.45 | 8 | 158.56 | 1372.17 | 1268.48 |
| c3540a | 19.4 | 226.6 | 16.4 | 214.34 | 4396.04 | 3515.176 |
| c499 | 16.8 | 125.09 | 14.2 | 132.42 | 2101.512 | 1880.364 |
| c432 | 3.83 | 183.86 | 3 | 213.51 | 704.1838 | 640.53 |
| c6288 | 121 | 683.8 | 121.9 | 825.75 | 82739.8 | 100658.9 |
| c880 | 6.73 | 143.34 | 5.78 | 171.31 | 964.6782 | 990.1718 |
| c17 | 0.0832 | 18.24 | 0.0814 | 19.36 | 1.517568 | 1.575904 |
| c2670 | 15.4 | 171.86 | 12.2 | 222.58 | 2646.644 | 2715.476 |
| c5315 | 40.15 | 166.12 | 32.77 | 175.69 | 6669.718 | 5757.361 |
| c7552 | 59 | 382.08 | 45 | 322.64 | 22542.72 | 14518.8 |

Table 5.5 shows the synthesis results for the benchmark circuits with the 14nm CDM and CMOS FinFET standard cell libraries.

Table 5.5 Synthesis Results with 14nm Standard Cell Library

| Architecture | CMOS Power | CMOS Delay | CDM Power | CDM Delay | CMOS PDP | CDM PDP |
|---|---|---|---|---|---|---|
| c1355 | 28.1 | 171 | 17.1 | 171.77 | 4805.1 | 2937.267 |
| c1908a | 14.3 | 148.57 | 10.44 | 182.26 | 2124.551 | 1902.794 |
| c3540a | 26.4 | 263.08 | 22.1 | 262.57 | 6945.312 | 5802.797 |
| c499 | 22.8 | 135.84 | 19.3 | 144.72 | 3097.152 | 2793.096 |
| c432 | 5.23 | 207 | 4 | 229.43 | 1082.61 | 917.72 |
| c6288 | 162.9 | 765.18 | 164.2 | 868.42 | 124647.82 | 142594.6 |
| c880 | 9 | 162.41 | 7.94 | 170.6 | 1461.69 | 1354.564 |
| c17 | 0.1118 | 19.98 | 0.11 | 20.56 | 2.233764 | 2.2616 |
| c2670 | 20.8 | 193.13 | 17.1 | 179.3 | 4017.104 | 3066.03 |
| c5315 | 53.6 | 180.57 | 43.8 | 194.39 | 9678.552 | 8514.282 |
| c7552 | 80 | 441.12 | 60 | 311.24 | 35289.6 | 18674.4 |


Table 5.6 shows the synthesis results for the benchmark circuits with the 16nm CDM and CMOS FinFET standard cell libraries.

Table 5.6 Synthesis Results with 16nm Standard Cell Library

| Architecture | CMOS Power | CMOS Delay | CDM Power | CDM Delay | CMOS PDP | CDM PDP |
|---|---|---|---|---|---|---|
| c1355 | 46.3 | 234.48 | 26.4 | 220.8 | 10856.424 | 5829.12 |
| c1908a | 21.8 | 213.79 | 15.7 | 237.77 | 4660.622 | 3732.989 |
| c3540a | 40.6 | 355.32 | 33.3 | 338.85 | 14425.992 | 11283.71 |
| c499 | 35.2 | 189.45 | 28.4 | 196.28 | 6668.64 | 5574.352 |
| c432 | 8.23 | 282.97 | 6.52 | 304.22 | 2328.8431 | 1983.514 |
| c6288 | 252.2 | 1038.66 | 259.3 | 1204.05 | 261950.05 | 312210.2 |
| c880 | 14.1 | 225.92 | 12 | 220.7 | 3185.472 | 2648.4 |
| c17 | 0.1736 | 25.92 | 0.19918 | 24.49 | 4.499712 | 4.877918 |
| c2670 | 32.4 | 264.69 | 24.2 | 281.84 | 8575.956 | 6820.528 |
| c5315 | 83.6 | 260.24 | 67.3 | 250.28 | 21756.064 | 16843.84 |
| c7552 | 126 | 611.92 | 90 | 398.18 | 77101.92 | 35836.2 |

Table 5.7 shows the synthesis results for the benchmark circuits with 20nm CDM and CMOS FinFET standard cell libraries.

Table 5.7 Synthesis Results with 20nm Standard Cell Library

Architecture  CMOS Power  CMOS Delay  CDM Power   CDM Delay   CMOS PDP      CDM PDP
C1355         72.6        336.57      40          301.41      24434.982     12056.4
c1908a        33.8        298.92      24.4        309.5       10103.496     7551.8
c3540a        65.6        528.88      49.4        418.72      34694.528     20684.77
c499          55.3        273.83      41.9        254.83      15142.799     10677.38
c432          13.2        403.29      10.12       431.25      5323.428      4364.25
c6288         396.7       1463.13     371.1       1499.75     580423.67     556557.2
c880          22.1        325.96      18.4        295.56      7203.716      5438.304
c17           0.272       35.25       0.305       31.67       9.588         9.65935
c2670         51.7        375.79      39          390.43      19428.343     15226.77
c5315         133         363.04      100.2       360.83      48284.32      36155.17
c7552         195.1       918.48      140         627.01      179195.45     87781.4

From the above simulation results, obtained with the designed standard cell libraries in different technologies for both the single-output CDM and CMOS logic styles, it can be observed that the CDM standard cell library results in more power- and PDP-efficient designs than the CMOS standard cell libraries.

5.4 Data Analysis

Power and PDP (Power Delay Product) savings with CDM compared with C-CMOS have been calculated from the data presented in the tables and are shown in Fig 5.5 and Fig 5.6. From the figures it can be observed that CDM saves power and PDP for all the benchmark designs except c17 and c6288. Design c17 is a small circuit of six NAND gates, and c6288 is a 16x16-bit multiplier whose schematic is shown in Fig 5.3. We have already observed that the CDM NAND/AND gates are not optimized compared with the C-CMOS logic style; hence c17, being a NAND-intensive design, is not optimized in terms of power and energy compared with the C-CMOS standard cell libraries.

Fig 5.3. ISCAS-85 c6288 16x16 multiplier

The full adder and half adder logic cells used in the multiplier design have been implemented as shown in Fig. 5.4.


Fig 5.4. Full adder module for ISCAS-85 c6288 16x16 multiplier

The full adder module is implemented with primitive logic gates (NOR) in the design. Hence, during synthesis with the CDM standard cell library, even though there is a full adder cell in the library, the compiler chose to implement the full adder with logic gates and to use that in the multiplier design. If the compiler had chosen the CDM full adder directly from the standard cells rather than building it from logic gates, power and PDP savings would be possible with the CDM standard cell libraries for c6288 as well. This requires a change in the HDL model of c6288, but since we are using only the original benchmark design, the results are not optimized compared with C-CMOS. For the rest of the benchmark designs, the percentage savings in power and PDP are significant. Synthesis with the CDM standard cell libraries resulted in an average power saving of 17-21% and an average PDP saving of 7-26% compared with the C-CMOS standard cell libraries across the benchmark designs.
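The savings quoted above can be recomputed directly from the tabulated power and delay values. The following Python sketch does this for the 10nm results of Table 5.4. The function names are illustrative, and taking the plain arithmetic mean over the eleven benchmarks is an assumption, since the text does not state how the averages were formed.

```python
# Rows of Table 5.4 (10nm): name, CMOS power, CMOS delay, CDM power, CDM delay.
TABLE_5_4 = [
    ("C1355", 22.03, 146.0, 13.7, 153.12),
    ("c1908a", 10.6, 129.45, 8.0, 158.56),
    ("c3540a", 19.4, 226.6, 16.4, 214.34),
    ("c499", 16.8, 125.09, 14.2, 132.42),
    ("c432", 3.83, 183.86, 3.0, 213.51),
    ("c6288", 121.0, 683.8, 121.9, 825.75),
    ("c880", 6.73, 143.34, 5.78, 171.31),
    ("c17", 0.0832, 18.24, 0.0814, 19.36),
    ("c2670", 15.4, 171.86, 12.2, 222.58),
    ("c5315", 40.15, 166.12, 32.77, 175.69),
    ("c7552", 59.0, 382.08, 45.0, 322.64),
]


def saving_percent(cmos, cdm):
    """Relative saving of CDM over CMOS in percent (negative means CDM is worse)."""
    return 100.0 * (cmos - cdm) / cmos


def summarize(rows):
    """Map each benchmark to (power saving %, PDP saving %), with PDP = power * delay."""
    result = {}
    for name, p_cmos, d_cmos, p_cdm, d_cdm in rows:
        power_saving = saving_percent(p_cmos, p_cdm)
        pdp_saving = saving_percent(p_cmos * d_cmos, p_cdm * d_cdm)
        result[name] = (power_saving, pdp_saving)
    return result


summary = summarize(TABLE_5_4)
# Assumption: unweighted arithmetic mean over all benchmarks.
avg_power = sum(p for p, _ in summary.values()) / len(summary)
avg_pdp = sum(e for _, e in summary.values()) / len(summary)

for name, (p, e) in summary.items():
    print(f"{name:8s} power saving {p:6.1f} %   PDP saving {e:6.1f} %")
print(f"average  power saving {avg_power:6.1f} %   PDP saving {avg_pdp:6.1f} %")
```

A negative entry marks a benchmark for which the CDM result is worse than the CMOS one, which is how c17 and c6288 show up in the PDP column.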


Fig 5.5. Power Improvement with CDM over CMOS standard cell libraries

Fig 5.6. PDP Improvement with CDM over CMOS standard cell libraries

5.5 Binary to BCD Converter

Other than the benchmark circuits, we have also used the designed standard cell libraries for the synthesis of various binary to BCD converter designs [30-36]. A new architecture for the binary to BCD converter has also been devised using the Complement Based Logic Design (CBLD) algorithm [29]. The CBLD algorithm tends to make a design more XOR-gate intensive, and the CDM standard cell library has proved to be efficient compared with C-CMOS for XOR-intensive designs. Hence the synthesis of the CBLD based designs with the CDM standard cell libraries results in more savings in terms of energy and power. The following sections explain the design of the binary to BCD converter with CBLD and its comparison with various other state-of-the-art designs. The comparison has been completed in various technologies, and the results show that the CBLD algorithm is capable of producing fast and energy-efficient modules. Later, synthesis of all the binary to BCD architectures has been completed with the designed C-CMOS and CDM standard cell libraries, and the results show that the CBLD designs with the CDM standard cell libraries achieve 50% energy saving compared with the design nearest in performance.

5.5.1 Binary to BCD converter in CBLD

5.5.1.1 Introduction

The goal of this new method of converter design is to optimize the conversion speed, power dissipation and area consumed. Most of the recently proposed multiplier designs use 7-bit binary to BCD converters. The binary to BCD converter is a critical component of these multiplier designs, and hence the proposed converter for such multipliers has been designed with Complement Based Logic Design. For better understanding, let us assume an arbitrary truth table for three inputs A, B, C and three outputs Y1, Y2, Y3, such that:

\[ Y_1 = f(A, B, C) \tag{9} \]
\[ Y_2 = f(A, B, C) \tag{10} \]
\[ Y_3 = f(A, B, C) \tag{11} \]

Therefore, so as to implement all our output functions in terms of the inputs, we can use the identity matrix multiplied with all our outputs and further multiplying the output identity matrix with the inputs as shown in the following equation:

\[
\begin{bmatrix} Y_1\oplus & 0 & 0 \\ 0 & Y_2\oplus & 0 \\ 0 & 0 & Y_3\oplus \end{bmatrix}
*
\begin{bmatrix} A & B & C \\ B & C & A \\ C & A & B \end{bmatrix}
=
\begin{bmatrix} Y_1\oplus A & Y_1\oplus B & Y_1\oplus C \\ Y_2\oplus B & Y_2\oplus C & Y_2\oplus A \\ Y_3\oplus C & Y_3\oplus A & Y_3\oplus B \end{bmatrix}
\cong
\begin{bmatrix} F_{11} & F_{12} & F_{13} \\ F_{21} & F_{22} & F_{23} \\ F_{31} & F_{32} & F_{33} \end{bmatrix}
\]


With functions F11, F12 and F13, output Y1 is expressed in terms of the inputs A, B, C, and out of those three functions the one giving the smallest area, lowest power and highest speed is selected. The proposed algorithm is scalable, and all the possible functions with respect to the inputs can be generated with the help of the integration of MATLAB and Quine-McCluskey software. Let us assume we select functions F11, F21 and F32 out of all the available ones, as they are simpler and smaller than the others. The final outputs can then be expressed with the following equations:

\[ Y_1 = F_{11} \oplus A \tag{12} \]
\[ Y_2 = F_{21} \oplus B \tag{13} \]
\[ Y_3 = F_{32} \oplus A \tag{14} \]

Using the conventional methods of logic realization, we can only get SOP and POS functions, but here we are defining our outputs with the help of the final XOR gate, which either buffers or complements the input line as per the output function, with the help of the functions defined above.
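The decomposition behind equations (12)-(14) can be illustrated with a small Python sketch. For one output Y it builds every candidate control function F = Y ⊕ X (one per input X), picks one, and confirms that Y is recovered as F ⊕ X. The example output function and the selection rule (fewest minterms) are assumptions made only for illustration; the thesis selects among candidates using MATLAB and Quine-McCluskey minimization, which is not reproduced here.

```python
from itertools import product


def truth_table(func, n_inputs):
    """Evaluate func on all input combinations, first input as most significant bit."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]


def control_functions(y_table, n_inputs):
    """Return one truth table per input X_i, holding F_i = Y xor X_i."""
    rows = list(product((0, 1), repeat=n_inputs))
    return [[y ^ bits[i] for y, bits in zip(y_table, rows)]
            for i in range(n_inputs)]


def pick_simplest(candidates):
    """Index of the candidate with the fewest minterms.

    Assumption: minterm count is a crude stand-in for the minimized cost
    that a Quine-McCluskey run would report.
    """
    return min(range(len(candidates)), key=lambda i: sum(candidates[i]))


# Hypothetical example output: Y = A xor (B and C).
y_table = truth_table(lambda a, b, c: a ^ (b & c), 3)
candidates = control_functions(y_table, 3)
chosen = pick_simplest(candidates)

# Rebuild Y from the chosen control function and its paired input line.
rows = list(product((0, 1), repeat=3))
rebuilt = [f ^ bits[chosen] for f, bits in zip(candidates[chosen], rows)]

print("chosen input index:", chosen)
print("control function  :", candidates[chosen])
print("Y recovered       :", rebuilt == y_table)
```

In this example the control function paired with input A reduces to B·C, so the final XOR gate either buffers or complements A depending on B·C.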

5.5.1.2 Binary to BCD converter architecture

Let ABCDEFG be the seven binary bits to be converted into two BCD digits (C3C2C1C0 B3B2B1B0). We applied the above algorithm to the binary to BCD converter, so that each output is expressed in terms of the 7-bit binary input. The optimized functions, each selected out of the seven possible functions, are as follows:

F1 = AC                                                      (15)
F2 = AC' + BC'D'                                             (16)
F3 = BCG' + CD'E' + BD' + A                                  (17)
F4 = BCG + DFG' + (A'CD'E' + C'DE + BC'D) + AD               (18)
F5 = CDE' + CEF + B'CD'F' + A'B'DE'F'                        (19)
F6 = AC'D' + BFG + A'B'CF' + B'DEF' + CD                     (20)
F7 = BCD' + E'FG + C'D'F + (A'CD'E' + C'DE + BC'D)F' + AD    (21)
F8 = G                                                       (22)

The final architecture of binary to BCD converter is represented in Fig. 5.7.


The final output functions are shown in the following equations (23-30):

C3 = AC                                                               (23)
C2 = (AC' + BC'D') ⊕ B                                                (24)
C1 = (BCG' + CD'E' + BD' + A) ⊕ C                                     (25)
C0 = (BCG + DFG' + (A'CD'E' + C'DE + BC'D) + AD) ⊕ B                  (26)
B3 = (CDE' + CEF + B'CD'F' + A'B'DE'F') ⊕ C                           (27)
B2 = (AC'D' + BFG + A'B'CF' + B'DEF' + CD) ⊕ E                        (28)
B1 = (BCD' + E'FG + C'D'F + (A'CD'E' + C'DE + BC'D)F' + AD) ⊕ B       (29)
B0 = G                                                                (30)

This architecture is based on three stages: the first two stages have a Sum of Products (SOP) structure for producing the control functions (Fi), and the last stage contains two-input XOR gates.
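As a spot check of the control-function-plus-XOR structure, the Python sketch below evaluates three of the output bits (C3, C2 and B0) against a reference conversion obtained with integer division. Two things here are assumptions rather than statements from the text: the input range is taken to be the products of two decimal digits (0 to 81), as would occur in a decimal multiplier, and only these three outputs are checked, the remaining equations being left out of the sketch.

```python
def bits7(n):
    """Return (A, B, C, D, E, F, G) for a 7-bit value, A being the most significant bit."""
    return tuple((n >> shift) & 1 for shift in range(6, -1, -1))


def c3(A, B, C, D, E, F, G):
    """Most significant bit of the tens digit: C3 = AC."""
    return A & C


def c2(A, B, C, D, E, F, G):
    """C2 = (AC' + BC'D') xor B, with F2 as the control function."""
    f2 = (A & (1 - C)) | (B & (1 - C) & (1 - D))
    return f2 ^ B


def b0(A, B, C, D, E, F, G):
    """Least significant bit of the units digit: B0 = G."""
    return G


# Assumed input domain: products of two decimal digits.
domain = sorted({a * b for a in range(10) for b in range(10)})

mismatches = []
for n in domain:
    tens, units = divmod(n, 10)  # reference binary to BCD conversion
    inputs = bits7(n)
    if c3(*inputs) != (tens >> 3) & 1:
        mismatches.append((n, "C3"))
    if c2(*inputs) != (tens >> 2) & 1:
        mismatches.append((n, "C2"))
    if b0(*inputs) != units & 1:
        mismatches.append((n, "B0"))

print("values checked:", len(domain))
print("mismatches    :", mismatches)
```

The same loop can be extended with the other output equations to check a complete netlist-level model against the reference conversion.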

Fig. 5.7. Binary to BCD converter design with CBLD algorithm

5.5.1.3 Synthesis Results

We have compared nine different designs for binary to BCD conversion (Tables 5.8 to 5.10). These designs are: (i) four different architectures proposed in [30], (ii) Three-Four split [31], (iii) Four-Three split [31], (iv) the design proposed in [33], (v) a version of the architecture of [32], and (vi) the proposed CBLD design [29]. We described all architectures using Verilog HDL structural modelling. The designs were verified with all possible input combinations using ISim. All designs were synthesized using Synopsys Design Vision and IC Compiler: with the saed90nm_typ_ht cell library from Synopsys for both the pre-layout and post-layout analyses (Fig. 5.8 & 5.9), with saed32hvt_tt1p05v25c for 32nm (from Synopsys), with the OSU standard cell libraries [37] for 45nm, and with the standard cell libraries from USC [28] for 14nm, 7nm and 5nm for the pre-layout analysis (Fig. 5.10 & 5.11). The synthesis results are shown in Table 5.8 and Tables 5.9-5.10. From the simulation results, we can confirm that the proposed architecture is consistently fast and energy efficient for all the technologies from 90nm to 14nm (using CMOS based standard cells) and for 7nm and 5nm using the FinFET based standard cell library.

Table 5.8. Post-Layout synthesis results with 90nm technology node.

Architecture              Area      Power     Delay
3-4 [31]                  293       90        1.23
4-3 [31]                  295       67.2      1.52
Binary New (BN) [32]      524       85.7      1.18
Shift Add by 3 [30]       257       84.8      2.52
3-3-1 Design [33]         411       97.8      1.62
331 modified 1 [30]       405.5     110.2     1.95
331 modified 2 [30]       387.0     98.1      2.06
Range Detect (RD) [30]    330.8     69        1.8
CBLD [29]                 348       70.3      1.03

Table 5.9. Pre-Layout Synthesis with 90, 45 & 32nm technology node.

                     90nm                        45nm                        32nm
Architecture         Area     Pow      Del       Area     Pow      Del       Area     Pow      Del
                     (µm2)    (µW)     (ns)      (µm2)    (µW)     (ns)      (µm2)    (µW)     (ns)
3-4 [31]             324      94.53    1.41      122.4    51       0.44      93.52    8.13     0.57
4-3 [31]             294      68       1.47      114      40       0.48      76.75    6.42     0.69
Binary New [32]      509      84       1.15      232      62.25    0.43      135      10.4     0.54
Sh-Add-3 [30]        267.2    81.36    2.53      95.2     43.14    0.71      74.71    6.99     1.13
3-3-1 [33]           434      103      1.76      193.3    69       0.58      117.2    10.1     0.77
331 mod 1 [30]       451      108.5    1.62      185      68.4     0.49      124      10.47    0.75
331 mod 2 [30]       428.5    103.8    1.71      174.1    62.12    0.55      119      9.92     0.77
Range Detect [30]    304      54       1.81      151      45       0.64      86.66    6.97     0.74
CBLD [29]            354      76       1.08      158.6    46.24    0.32      98.60    6.38     0.52


Table 5.10. Pre-Layout synthesis with 14(CMOS), 7 & 5nm (FinFET) technology.

                     14nm                        7nm                         5nm
Architecture         Area     Pow      Del       Area     Pow      Del       Area     Pow      Del
                     (µm2)    (µW)     (ps)      (µm2)    (µW)     (ps)      (µm2)    (µW)     (ps)
3-4 [31]             11.44    2.65     121.48    1.27     1.05     63.98     0.468    0.45     12.51
4-3 [31]             11.03    2.48     181       1.28     0.98     74.59     0.476    0.42     13.18
Binary New [32]      18.44    2.4      130.32    2.08     1.38     70.4      0.898    0.52     11.79
Sh-Add-3 [30]        9.62     2.67     266.8     1.08     0.78     120.6     0.396    0.4      22.46
3-3-1 [33]           15.85    3.26     165.8     1.7      1.31     73        0.696    0.61     13.73
331 mod 1 [30]       16.67    4.12     211.14    1.87     1.35     68.74     0.72     0.60     14.44
331 mod 2 [30]       15.35    3.13     175.6     1.72     1.22     68.35     0.706    0.61     14.79
Range Detect [30]    10.81    2.32     160.07    1.25     0.84     70.09     0.498    0.32     17.13
CBLD [29]            12.76    1.50     119.9     1.47     0.99     56.19     0.58     0.30     7.28

[Bar chart "Pre-Layout_90nm": normalized (0 to 1) Area, Power, Delay, PDP, APDP and EDP for 3-4 [31], 4-3 [31], Binary New [32], Shift Add by 3 [30], 3-3-1 Design [33], 331 modified 1 [30], 331 modified 2 [30], Range Detection [30] and CBLD [29].]

Fig 5.8. Normalized Pre-Layout synthesis with 90nm technology.

[Bar chart "Post_Layout_90nm": normalized (0 to 1) Area, Power, Delay, PDP, APDP and EDP for 3-4 [31], 4-3 [31], Binary New [32], Shift Add by 3 [30], 3-3-1 Design [33], 331 modified 1 [30], 331 modified 2 [30], Range Detection [30] and CBLD [29].]

Fig 5.9 Normalized Post-Layout Synthesis with 90nm technology.
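The derived metrics plotted in the normalized charts can be recomputed from the post-layout figures of Table 5.8. The Python sketch below is one way to do so. The definitions used (PDP = power × delay, EDP = PDP × delay, APDP = area × PDP) and the normalization to the largest value of each metric are assumptions, since this section does not spell them out.

```python
# Table 5.8 (post-layout, 90nm): design -> (area, power, delay).
TABLE_5_8 = {
    "3-4 [31]": (293, 90, 1.23),
    "4-3 [31]": (295, 67.2, 1.52),
    "Binary New (BN) [32]": (524, 85.7, 1.18),
    "Shift Add by 3 [30]": (257, 84.8, 2.52),
    "3-3-1 Design [33]": (411, 97.8, 1.62),
    "331 modified 1 [30]": (405.5, 110.2, 1.95),
    "331 modified 2 [30]": (387.0, 98.1, 2.06),
    "Range Detect (RD) [30]": (330.8, 69, 1.8),
    "CBLD [29]": (348, 70.3, 1.03),
}


def metrics(area, power, delay):
    """Raw and derived figures of merit for one design (assumed definitions)."""
    pdp = power * delay
    return {
        "Area": area,
        "Power": power,
        "Delay": delay,
        "PDP": pdp,            # power-delay product
        "APDP": area * pdp,    # area-power-delay product
        "EDP": pdp * delay,    # energy-delay product
    }


def normalize(table):
    """Scale every metric so that the largest value across designs equals 1."""
    raw = {name: metrics(*values) for name, values in table.items()}
    keys = next(iter(raw.values())).keys()
    peak = {key: max(m[key] for m in raw.values()) for key in keys}
    return {name: {key: m[key] / peak[key] for key in keys}
            for name, m in raw.items()}


normalized = normalize(TABLE_5_8)
best_pdp = min(normalized, key=lambda name: normalized[name]["PDP"])
best_edp = min(normalized, key=lambda name: normalized[name]["EDP"])

for name, m in normalized.items():
    print(name.ljust(24), "  ".join(f"{key}={m[key]:.2f}" for key in m))
print("lowest PDP:", best_pdp)
print("lowest EDP:", best_edp)
```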

[Bar chart "Delay with different Technologies": delay (in ps) of the nine designs at 14nm, 7nm and 5nm.]

Fig 5.10 Pre-Layout delay result for 14, 7 and 5nm technology node.

[Bar chart "PDP with different Technologies": PDP (µW.ps) of the nine designs at 14nm, 7nm and 5nm.]

Fig 5.11 Pre-Layout PDP results for 14nm (CMOS), 7nm & 5nm (FinFET).

5.5.1.4 Synthesis Results with CDM standard cell library

Further synthesis results with the newly designed CDM and C-CMOS standard cell libraries in the 7nm technology node for all the binary to BCD converter designs are shown in Table 5.11 and Table 5.12 and in Fig 5.12 and Fig 5.13. From the following figures it can be observed that synthesis with the CDM standard cell library results in power savings for all the referenced architectures, and achieves 10% more PDP efficiency than the C-CMOS logic style for the proposed CBLD based binary to BCD converter design.

Table 5.11. Power dissipation with CDM and CMOS in 7nm technology node

Power Dissipation (7nm) (µW)    CMOS      CDM
CBLD[29]                        0.878     0.676
3-4[31]                         1.268     0.864
BN[32]                          1.466     1.078
RD[30]                          0.877     0.741
3-3-1[33]                       1.434     1.129
331mod2[30]                     1.32      1.078
4-3[31]                         0.896     0.773
331mod1[30]                     1.489     1.266
Shiftadd[30]                    1.085     0.919


Table 5.12. Power Delay Product with CDM and CMOS in 7nm

PDP (7nm) (µW.ps)    CMOS      CDM
CBLD[29]             33.90     31.30
3-4[31]              79.49     51.84
BN[32]               70.87     75.63
RD[30]               60.96     62.93
3-3-1[33]            108.87    96.47
331mod2[30]          93.34     99.55
4-3[31]              59.10     72.73
331mod1[30]          107.31    113.76
Shiftadd[30]         122.06    139.46
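The per-design comparison between the two logic styles at 7nm can be tabulated with a few lines of Python. The sketch below uses the values of Tables 5.11 and 5.12 as given; the helper name and the layout of the printout are illustrative only.

```python
# design -> (CMOS, CDM), power in µW from Table 5.11.
POWER_7NM = {
    "CBLD[29]": (0.878, 0.676),
    "3-4[31]": (1.268, 0.864),
    "BN[32]": (1.466, 1.078),
    "RD[30]": (0.877, 0.741),
    "3-3-1[33]": (1.434, 1.129),
    "331mod2[30]": (1.32, 1.078),
    "4-3[31]": (0.896, 0.773),
    "331mod1[30]": (1.489, 1.266),
    "Shiftadd[30]": (1.085, 0.919),
}

# design -> (CMOS, CDM), PDP in µW.ps from Table 5.12.
PDP_7NM = {
    "CBLD[29]": (33.90, 31.30),
    "3-4[31]": (79.49, 51.84),
    "BN[32]": (70.87, 75.63),
    "RD[30]": (60.96, 62.93),
    "3-3-1[33]": (108.87, 96.47),
    "331mod2[30]": (93.34, 99.55),
    "4-3[31]": (59.10, 72.73),
    "331mod1[30]": (107.31, 113.76),
    "Shiftadd[30]": (122.06, 139.46),
}


def savings(table):
    """Percentage saving of CDM relative to CMOS for every design."""
    return {name: 100.0 * (cmos - cdm) / cmos
            for name, (cmos, cdm) in table.items()}


power_saving = savings(POWER_7NM)
pdp_saving = savings(PDP_7NM)
# Designs for which the CDM library also lowers the PDP.
pdp_improved = [name for name, value in pdp_saving.items() if value > 0]

for name in POWER_7NM:
    print(f"{name:14s} power {power_saving[name]:6.1f} %   PDP {pdp_saving[name]:6.1f} %")
print("PDP improved with CDM:", pdp_improved)
```

A positive number means the CDM library gives the lower value; a negative PDP entry indicates that the added delay outweighs the power saving for that design.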

[Bar chart "Power Dissipation with CDM & CMOS (7nm)": power of each binary to BCD design with the CMOS and CDM libraries.]

Fig 5.12 Power Dissipation with CDM and CMOS in 7nm technology.

[Bar chart "PDP with CDM and CMOS Cells(7nm)": PDP (µW.ps) of each binary to BCD design with the CMOS and CDM libraries.]

Fig 5.13. Power Delay Product with CDM and CMOS in 7nm technology.


CHAPTER 6

CONCLUSION AND FUTURE WORK

FinFET technology is becoming a prominent VLSI technology due to its extraordinary properties and advantages compared to planar bulk transistors. A standard cell library facilitates circuit synthesis and the timing and power analyses of circuits. Standard cell libraries with primitive gates and arithmetic modules such as the full adder and half adder have been designed in the CMOS and CDM logic styles in various technology nodes using FinFET devices. Simulation results from the benchmark circuits show an average power improvement of 17-21% and an average PDP improvement of 7-26% with the CDM standard cell library compared with the CMOS standard cell library across the 7nm, 10nm, 14nm, 16nm and 20nm nodes. It can be observed that the CDM standard cell library results in more power- and PDP-efficient designs than the CMOS standard cell libraries.

The CDM implementation is area efficient compared with CMOS standard cells because the CDM logic style results in a symmetrical implementation of the logic cells, and with a few basic cells (single level, two level and three level) a CDM standard cell library with a massive number of logic cells and complicated functionality can be generated, making the CDM standard cell library richer and more area efficient than its CMOS counterparts.

Simulation results from the synthesis of the binary to BCD converter designs with the designed CDM standard cell libraries show a power efficiency of 15-31% for all referenced architectures and a further 10% PDP saving compared with the C-CMOS logic style for the proposed binary to BCD converter design using the CBLD algorithm. The proposed architecture is XOR intensive because of the CBLD algorithm, which is the reason for its power and PDP efficiency when synthesized with the CDM standard cell library.

Future work includes making the CDM cell library richer with sequential cells along with combinational cells, and synthesizing the benchmark designs using both combinational and sequential cells.
We have seen that the CDM NAND/AND gates are not efficient compared with the C-CMOS AND/NAND gates; hence the design of hybrid standard cell libraries has been planned, which include logic cells from both the C-CMOS and CDM logic styles. Hybrid standard cell libraries can provide PDP-efficient designs by exploiting the power efficiency of CDM cells and the timing efficiency of C-CMOS cells during logic synthesis.


Appendix A

DESIGN COMPILER SCRIPT

#/**************************************************/
#/* Compile Script for Synopsys */
#/* dc_shell-t -f */
#/**************************************************/
#/* All verilog files, separated by spaces */
set my_verilog_files [list c6288.v]

#/* Top-level Module */
set my_toplevel c6288
#/* Reserved time for output signals (Holdtime etc.) */
#/**************************************************/
#/* No modifications needed below */
#/**************************************************/
set link_library /home/Ashish/lib/tsmc018/imp/cmos7nm_hp.db
set target_library /home/Ashish/lib/tsmc018/imp/cmos7nm_hp.db
define_design_lib WORK -path ./WORK
analyze -f verilog $my_verilog_files
elaborate $my_toplevel
current_design $my_toplevel
link
ungroup -all -flatten -simple_names
compile -map_effort medium
set filename [format "%s%s" $my_toplevel ".vh"]
write -f verilog -output $filename
set filename [format "%s%s" $my_toplevel ".sdc"]
write_sdc $filename
redirect timing.rep { report_timing }
redirect cell.rep { report_cell }
redirect power.rep { report_power }
quit


C-CMOS LOGIC CELL NETLIST

Inverter
.subckt INV in VSS VDD out
M0 out in VDD in pfet nfin=1
M1 out in VSS in nfet nfin=1
.ends INV

2 Input AND gate
.subckt ANDx2 A B VSS VDD AND
M1 net07 B VDD B pfet nfin=1
M0 net07 A VDD A pfet nfin=1
M3 net19 B VSS B nfet nfin=2
M2 net07 A net19 A nfet nfin=2
X0 net07 VSS VDD AND INV
.ends ANDx2

3 Input AND gate
.subckt ANDx3 A B C VSS VDD AND
M2 net07 C VDD C pfet nfin=1
M1 net07 B VDD B pfet nfin=1
M0 net07 A VDD A pfet nfin=1
M5 net31 C VSS C nfet nfin=3
M4 net32 B net31 B nfet nfin=3
M3 net07 A net32 A nfet nfin=3
X0 net07 VSS VDD AND INV
.ends ANDx3

2 Input XOR gate
.subckt XORx2 A B VSS VDD XOR
M3 net29 B VSS B nfet nfin=2
M2 net05 A_Bar net29 A_Bar nfet nfin=2
M1 net28 B_Bar VSS B_Bar nfet nfin=2
M0 net05 A net28 A nfet nfin=2
M7 net05 A_Bar net5 A_Bar pfet nfin=2
M6 net05 B net5 B pfet nfin=2
M5 net5 B_Bar VDD B_Bar pfet nfin=2
M4 net5 A VDD A pfet nfin=2
X8 net05 VSS VDD XOR INV
X1 B VSS VDD B_Bar INV
X0 A VSS VDD A_Bar INV
.ends XORx2


2 Input OR gate
.subckt ORx2 A B VSS VDD OR
M1 net07 B net27 B pfet nfin=2
M0 net27 A VDD A pfet nfin=2
M3 net07 B VSS B nfet nfin=1
M2 net07 A VSS A nfet nfin=1
X0 net07 VSS VDD OR INV
.ends ORx2

3 Input OR gate
.subckt ORx3 A B C VSS VDD OR
M2 net010 C net34 C pfet nfin=3
M1 net34 B net35 B pfet nfin=3
M0 net35 A VDD A pfet nfin=3
M5 net010 C VSS C nfet nfin=1
M4 net010 B VSS B nfet nfin=1
M3 net010 A VSS A nfet nfin=1
X0 net010 VSS VDD OR INV
.ends ORx3

Full Adder
.subckt full_adder1 A B C VSS VDD Carry Sum
X1 net19 C VSS VDD Sum XORx2
X0 A B VSS VDD net19 XORx2
X8 net19 C VSS VDD net25 ANDx2
X9 A B VSS VDD net24 ANDx2
X10 net25 net24 VSS VDD Carry ORx2
.ends full_adder1

Half Adder
.subckt half_adder1 A B VSS VDD Carry Sum
X0 A B VSS VDD Sum XORx2
X7 A B VSS VDD Carry ANDx2
.ends half_adder1

CDM LOGIC CELL NETLIST

2 Input OR gate
.subckt ORx2 A B VSS VDD OR
M10 net6 B A_bar B pfet nfin=2
M7 net6 B VSS B nfet nfin=2
X0 A VSS VDD A_bar INV
X2 net6 VSS VDD OR INV
.ends ORx2


2 Input XOR gate
.subckt XORx2 A B VSS VDD XOR
M3 net7 A_bar B_bar A_bar nfet nfin=2
M0 net7 A B A nfet nfin=2
X10 B VSS VDD B_bar INV
X2 A VSS VDD A_bar INV
X8 net7 VSS VDD XOR INV
.ends XORx2

2 Input AND gate
.subckt ANDx2 A B VSS VDD AND
M14 A_bar B net4 B nfet nfin=2
M13 net4 B VDD B pfet nfin=2
X0 A VSS VDD A_bar INV
X3 net4 VSS VDD AND INV
.ends ANDx2

3 Input AND gate
.subckt ANDx3 A B C VSS VDD AND
M6 net02 A VDD A pfet nfin=1
M5 net02 A_bar net1 A_bar pfet nfin=4
M4 net1 B VDD B pfet nfin=4
M20 net1 B_bar C_bar B_bar pfet nfin=4
X2 C VSS VDD C_bar INV
X1 B VSS VDD B_bar INV
X0 A VSS VDD A_bar INV
X3 net02 VSS VDD AND INV
.ends ANDx3

3 Input OR gate
.subckt ORx3 A B C VSS VDD OR
M7 net02 A VSS A nfet nfin=1
M6 net02 A_bar net2 A_bar nfet nfin=4
M5 net2 B_bar C_bar B_bar nfet nfin=4
M4 net2 B VSS B nfet nfin=4
X4 net02 VSS VDD OR INV
X2 C VSS VDD C_bar INV
X1 B VSS VDD B_bar INV
X0 A VSS VDD A_bar INV
.ends ORx3

Full Adder
.subckt full_adder1 A B C VSS VDD Carry Sum
M17 net012 B_bar VSS B_bar pfet nfin=4
M16 net012 B C_bar B pfet nfin=4
M15 net07 B_bar C_bar B_bar pfet nfin=4
M14 net07 B VDD B pfet nfin=4
M13 net072 A_bar net012 A_bar pfet nfin=4
M12 net072 A net07 A pfet nfin=4
M11 net6 B_bar C_bar B_bar pfet nfin=4
M10 net071 A_bar net6 A_bar pfet nfin=4
M9 net6 B C B pfet nfin=4
M8 net071 A net1 A pfet nfin=2
M7 net1 B_bar C B_bar pfet nfin=4
M1 net1 B C_bar B pfet nfin=2
X13 net072 VSS VDD Carry INV
X11 net071 VSS VDD Sum INV
X9 B VSS VDD B_bar INV
X8 A VSS VDD A_bar INV
X16 C VSS VDD C_bar INV
.ends full_adder1

Half Adder
.subckt half_adder1 A B VSS VDD Carry Sum
M8 net36 A_bar B_bar A_bar pfet nfin=2
M7 net36 A VDD A pfet nfin=2
M4 net35 A_bar B A_bar pfet nfin=2
M3 net35 A B_bar A pfet nfin=2
X11 net36 VSS VDD Carry INV
X13 net35 VSS VDD Sum INV
X5 B VSS VDD B_bar INV
X4 A VSS VDD A_bar INV
.ends half_adder1
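One rough way to compare cell netlists of the two logic styles is to count transistors and fins per cell. The Python sketch below parses the inverter and the two 2-input XOR netlists listed in this appendix and expands the inverter instances hierarchically. Using the device and fin count as a stand-in for cell area is an assumption made for illustration; it is not the area figure reported by the synthesis or layout tools.

```python
import re

INV = """
.subckt INV in VSS VDD out
M0 out in VDD in pfet nfin=1
M1 out in VSS in nfet nfin=1
.ends INV
"""

CMOS_XOR = INV + """
.subckt XORx2 A B VSS VDD XOR
M3 net29 B VSS B nfet nfin=2
M2 net05 A_Bar net29 A_Bar nfet nfin=2
M1 net28 B_Bar VSS B_Bar nfet nfin=2
M0 net05 A net28 A nfet nfin=2
M7 net05 A_Bar net5 A_Bar pfet nfin=2
M6 net05 B net5 B pfet nfin=2
M5 net5 B_Bar VDD B_Bar pfet nfin=2
M4 net5 A VDD A pfet nfin=2
X8 net05 VSS VDD XOR INV
X1 B VSS VDD B_Bar INV
X0 A VSS VDD A_Bar INV
.ends XORx2
"""

CDM_XOR = INV + """
.subckt XORx2 A B VSS VDD XOR
M3 net7 A_bar B_bar A_bar nfet nfin=2
M0 net7 A B A nfet nfin=2
X10 B VSS VDD B_bar INV
X2 A VSS VDD A_bar INV
X8 net7 VSS VDD XOR INV
.ends XORx2
"""


def parse_subckts(text):
    """Return {subckt name: list of element lines} for every .subckt block."""
    subckts, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        lower = line.lower()
        if lower.startswith(".subckt"):
            current = line.split()[1]
            subckts[current] = []
        elif lower.startswith(".ends"):
            current = None
        elif current is not None:
            subckts[current].append(line)
    return subckts


def count(subckts, name):
    """Return (transistors, fins) of a cell, expanding X instances recursively."""
    devices = fins = 0
    for line in subckts[name]:
        tokens = line.split()
        kind = tokens[0][0].upper()
        if kind == "M":
            devices += 1
            match = re.search(r"nfin=(\d+)", line)
            fins += int(match.group(1)) if match else 1
        elif kind == "X":
            # The last token of an instance line is the referenced subcircuit.
            sub_devices, sub_fins = count(subckts, tokens[-1])
            devices += sub_devices
            fins += sub_fins
    return devices, fins


cmos_count = count(parse_subckts(CMOS_XOR), "XORx2")
cdm_count = count(parse_subckts(CDM_XOR), "XORx2")
print("C-CMOS XORx2 (transistors, fins):", cmos_count)
print("CDM XORx2    (transistors, fins):", cdm_count)
```

The two netlists are parsed separately because both libraries use the same cell name XORx2.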

Different functions generated by changing the input lines to single output CDM single level and two level basic cell

Func1
.subckt func1 A B E VSS VDD output
M4 net11 B VDD B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 VSS B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func1

Func2
.subckt func2 A B E VSS VDD output
M4 net11 B E B nfet nfin=1
M0 net12 A VDD A nfet nfin=1
M2 VSS B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func2

Func3
.subckt func3 A B E D VSS VDD output
M4 net11 B D B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 VSS B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func3

Func4
.subckt func4 A B E D VSS VDD output
M4 net11 B VDD B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 D B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func4

Func5
.subckt func5 A B E D VSS VDD output
M4 net11 B E B nfet nfin=1
M0 net12 A VDD A nfet nfin=1
M2 D B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func5

Func6
.subckt func6 A B E C D VSS VDD output
M4 net11 B D B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 C B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func6


Func7
.subckt func7 A B E C D F VSS VDD output
M5 net012 B F B nfet nfin=1
M4 net11 B D B nfet nfin=1
M0 net12 A net012 A nfet nfin=1
M3 E B net012 B pfet nfin=1
M2 C B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func7

Func8
.subckt func8 A B VSS VDD output
M0 net12 A B A nfet nfin=1
M1 VSS A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func8

Func9
.subckt func9 A B VSS VDD output
M0 net12 A B A nfet nfin=1
M1 VDD A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func9

Func10
.subckt func10 A B VSS VDD output
M0 net12 A VSS A nfet nfin=1
M1 B A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func10

Func11
.subckt func11 A B VSS VDD output
M0 net12 A VDD A nfet nfin=1
M1 B A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func11

Func12
.subckt func12 A B E VSS VDD output
M4 net11 B VSS B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 VDD B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func12

Func13
.subckt func13 A B E VSS VDD output
M4 net11 B E B nfet nfin=1
M0 net12 A VSS A nfet nfin=1
M2 VDD B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func13

Func14
.subckt func14 A B E VSS VDD output
M4 net11 B E B nfet nfin=1
M0 net12 A VDD A nfet nfin=1
M2 VDD B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func14

Func15
.subckt func15 A B E D VSS VDD output
M4 net11 B D B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 VDD B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func15

Func16
.subckt func16 A B E D VSS VDD output
M4 net11 B VSS B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 D B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func16

Func17
.subckt func17 A B E VSS VDD output
M4 net11 B VDD B nfet nfin=1
M0 net12 A VSS A nfet nfin=1
M2 E B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func17

SILICON SMART STANDARD CELL LIBRARY CHARACTERIZATION SCRIPT

# See SiliconSmart User Guide Appendix B for a complete list of parameters and definitions
#################################
# OPERATING CONDITIONS DEFINITION
#################################
#
# Create one or more operation conditions here. Example:
#
create_operating_condition op_cond
set_opc_process op_cond {
    {.lib "~/standard/PTM-MG/model" ptm7hp}
}
add_opc_supplies op_cond VDD 0.7
add_opc_grounds op_cond VSS 0.0
set_opc_temperature op_cond 25
#
#################################
# GLOBAL CONFIGURATION PARAMETERS
#################################
define_parameters default {

set_parameter pmos_model_names pfet
set_parameter nmos_model_names nfet

# List of operating conditions as defined by create_operation_condition
set active_pvts { op_cond }

# HSPICE
set simulator hspice
set simulator_cmd {hspice -o }

# HSPICE (client/server mode)
# set simulator hspice_cs
# set simulator_cmd {hspice -CC -port -o }


# SPECTRE
# set simulator spectre6
# set simulator_cmd {spectremdl -tab -batch -design >&/dev/null}

# ELDO
# set simulator eldo
# set simulator_cmd {eldo -compat -i > >&/dev/null}

# MSIM
# set simulator msim
# (csh)
# set simulator_cmd {msim -hsp -i -o >&/dev/null}
# (sh)
# set simulator_cmd {msim -hsp -i -o 2>/dev/null}

# Default simulator options for Finesim, Hspice, Spectre, Msim, and Eldo
set simulator_options {
    "common,finesim: finesim_mode=spicehd finesim_method=gear finesim_speed=0 finesim_dvmax=0.1"

    "common,hspice: probe=1 runlvl=5 numdgt=7 measdgt=7 acct=1 nopage"

    "common,spectre6: compression=yes step=10ps maxstep=1ns relref=allglobal"
    "common,spectre6: method=trap lteratio=4 gmin=1e-18 autostop=0 save=none"

    "common,msim: probe=1 accurate=1"

    "common,eldo: gmindc=1n gmin=1p itl1=500 ingold=1 numdgt=4 measout=0 cptime=18000 relvar=0.01"
    "op,eldo: dv=0.5 method=gear"
    "tran,eldo: brief=0 relvar=0.001"
    "optimize,eldo: lvltim=3 relvar=0.001"
    "power,eldo: method=gear"
}

# Simulation resolution
set time_res_high 1e-12

# Controls which supplies are measured for power consumption
set power_meas_supplies { VDD }


# list of ground supplies used (required for Functional Recognition)
set power_meas_grounds { VSS }

# specifies which multi-rail format to be used in Liberty model; none, v1, or v2.
set liberty_multi_rail_format none

# LOAD SHARE PARAMETERS
# job_scheduler: 'lsf' (Platform), 'grid' (SunGrid), or 'standalone' (local machine)
set job_scheduler standalone
set run_list_maxsize 1
set normal_queue "lsf_queue_name"
}

############################
# DEFAULT PINTYPE PARAMETERS
############################
pintype default {

set logic_high_name VDD
set logic_high_threshold 0.8

set logic_low_name VSS
set logic_low_threshold 0.2

set prop_delay_level 0.5

# Number of slew and load indices
# (when importing with -use_default_slews -use_default_loads)
set numsteps_slew 5
set numsteps_load 5
set constraint_numsteps_slew 3

# Operating load ranges
set smallest_load 1e-15
set largest_load 50e-15

# Operating slew ranges
set smallest_slew 10e-12
set largest_slew 1.2e-9
set max_tout 1.0e-9

# Automatically determine largest_load based on max_tout; off or on
set autorange_load off


# Noise of points in for noise height
set numsteps_height 8

# Input noise width.
set numsteps_width 5

# driver model: pwl, emulated, active, active-waveform, custom
set driver_mode pwl

# driver cell name (relevant only when driver_mode is "active")
set driver pwl
}

#####################################
# LIBERTY MODEL GENERATION PARAMETERS
#####################################
define_parameters liberty_model {
# Add Liberty header attributes here for use with "model -create_new_model"
set_parameter liberty_time_unit "1ps"
set delay_model "table_lookup"
set default_fanout_load 0.0
set default_inout_pin_cap 0.0
set default_input_pin_cap 0.0
set default_output_pin_cap 0.0
set default_cell_leakage_power 0.0
set default_leakage_power_density 0.0
}
#######################
# VALIDATION PARAMETERS
#######################
define_parameters validation {
# Add validation parameters here
}
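To make the threshold settings in the script above concrete, the following Python sketch measures a transition time and a 50% crossing on a sampled rising waveform. It assumes the conventional reading of the parameters: logic_low_threshold and logic_high_threshold (0.2 and 0.8) are taken as fractions of the 0.7 V supply that bound the slew measurement, and prop_delay_level (0.5) as the fraction at which delay is referenced. The waveform and function names are hypothetical and the sketch is not part of the characterization flow itself.

```python
VDD = 0.7                  # supply from add_opc_supplies
LOGIC_LOW_THRESHOLD = 0.2  # fraction of VDD, from the pintype block
LOGIC_HIGH_THRESHOLD = 0.8
PROP_DELAY_LEVEL = 0.5


def crossing_time(times, volts, level):
    """Linearly interpolate the first time a rising waveform reaches `level`."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 <= level <= v1 and v1 != v0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def slew_and_t50(times, volts):
    """Return (20%-80% transition time, 50% crossing time) of a rising edge."""
    t_low = crossing_time(times, volts, LOGIC_LOW_THRESHOLD * VDD)
    t_high = crossing_time(times, volts, LOGIC_HIGH_THRESHOLD * VDD)
    t_mid = crossing_time(times, volts, PROP_DELAY_LEVEL * VDD)
    return t_high - t_low, t_mid


# Hypothetical waveform: an ideal 0 V to 0.7 V ramp lasting 100 ps.
ramp_times = [0.0, 100e-12]
ramp_volts = [0.0, VDD]
slew, t50 = slew_and_t50(ramp_times, ramp_volts)
print(f"slew = {slew * 1e12:.1f} ps, 50% crossing at {t50 * 1e12:.1f} ps")
```

A propagation delay would then be the difference between the 50% crossing times of the output and input waveforms.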


References

[1] Q. Xie, X. Lin, Y. Wang, S. Chen, M.J. Dousti, and M. Pedram, “Performance Comparisons between 7nm FinFET and Conventional Bulk CMOS Standard Cell Libraries,” IEEE Trans. on Circuits and Systems II, Vol. 62, No. 8, Aug. 2015, pp. 761-765.
[2] Q. Xie, X. Lin, Y. Wang, M.J. Dousti, A. Shafaei, M. Ghasemi-Gol, and M. Pedram, “5nm FinFET standard cell library optimization and circuit synthesis in near- and super-threshold voltage regimes,” Proc. of IEEE Computer Society Annual Symp. on VLSI, Jul. 2014.
[3] Shen-Fu Hsiao, Ming-Yu Tsai, and Chia-Sheng Wen, “Low Area/Power Synthesis Using Hybrid Pass Transistor/CMOS Logic Cells in Standard Cell-Based Design Environment,” IEEE Trans. on Circuits and Systems II: Express Briefs, Vol. 57, No. 1, January 2010.
[4] T. Nikoubin, F. Eslami, A. Baniasadi, and K. Navi, “A new cell design methodology for balanced XOR-XNOR circuits for hybrid-CMOS logic,” Journal of Low Power Electronics 5, 2 (2009).
[5] T. Nikoubin, M. Grailoo, and H. Mozafari, “Cell design methodology based on transmission gate for low-power high-speed balanced XOR-XNOR circuits in hybrid-CMOS logic,” Journal of Low Power Electronics, 6, 1-10 (2010).
[6] Tooraj Nikoubin, Poona Bahrebar, Sara Pouri, Keivan Navi, and Vaez Iravani, “Simple Exact Algorithm for Transistor Sizing of Low-Power High-Speed Arithmetic Circuits,” Hindawi Publishing Corporation, VLSI Design, Volume 2010, Article ID 264390.
[7] K. Yano, Y. Sasaki, K. Rikino, and K. Seki, “Top-down pass-transistor logic design,” IEEE J. Solid-State Circuits, vol. 31, no. 6, pp. 792-803, Jun. 1996.
[8] C. Yang and M. Ciesielski, “Bds: a bdd-based system,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 21, No. 7, July 2002.


[9] T. Nikoubin, M. Grailoo, and C. Li, “Cell design methodology (cdm) for balanced carry-inversecarry circuits in hybrid-cmos logic style,” International Journal of Electronics, vol. 101, no. 10, pp. 1357-1374, 2014.
[10] http://web.eecs.umich.edu/~jhayes/iscas.restore/c6288.html
[11] Siri Uppalapati, Michael L. Bushnell, and Vishwani D. Agrawal, “Glitch-free design of low power ASICS using customized resistive feedthrough cells,” Proc. of the 9th VLSI Design and Test Symposium, 2005.
[12] http://venividiwiki.ee.virginia.edu/mediawiki/index.php/Main_Page
[13] Hooman Farkhani, et al., “Comparative study of FinFETs versus 22nm bulk CMOS technologies: SRAM design perspective,” 27th IEEE International System-on-Chip Conference (SOCC), IEEE, 2014.
[14] http://ptm.asu.edu/
[15] E. Brunvand, “Digital VLSI Chip Design with Cadence and Synopsys CAD Tool,” Addison-Wesley, 2010.
[16] Synopsys, Design Compiler User Guide, Product Version 13.3, April 2013.
[17] Liberty User Guides and Reference Manual Suite, Version 2013.03.
[18] Synopsys Inc., “Liberty™ ncx user guide,” F-2011.06 ed., 2011.
[19] https://www.coursera.org/course/vlsicad
[20] Standard Cell Library design, Lecture Notes, Advanced VLSI Design, CMPE-641, UMBC.
[21] http://www.ecs.umass.edu/ece/labs/vlsicad/bds/bds.html
[22] https://embedded.eecs.berkeley.edu/pubs/downloads/sis/
[23] J. Rabaey, Low Power Design Essentials (Integrated Circuits and Systems), 2009.
[24] Sung-Mo Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits (Analysis and Design), 2nd Edition.
[25] T. Nikoubin, N. Navi, and O. Kavei, “A new method in reorganization of the timing behavior of symmetric XOR/XNOR circuits,” CSI J. Computer Science and Engineering 5, 276 (2007).

[26] K. Yano et al., “A 3.8ns CMOS 16×16-b multiplier using complementary pass-transistor logic,” IEEE J. Solid-State Circuits 25, 388 (1990).
[27] S. Rapolu and T. Nikoubin, “Fast and energy efficient FinFET full adders with Cell Design Methodology (CDM),” 2015 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Denton, TX, 2015, pp. 1-5.
[28] http://sportlab.usc.edu/

[29] Ashish Joshi, Sri Rathan Rangisetti, and Tooraj Nikoubin, “Fast and Energy efficient binary to BCD converter with Complement based logic design,” IEEE Trans. on Circuits and Systems II: Express Briefs (submitted).
[30] Sri Rathan Rangisetti, Ashish Joshi, and Tooraj Nikoubin, “Area-Efficient and Power-Efficient Binary to BCD Converters,” IEEE Sixth International Conference on Computing, Communications and Networking Technologies, 6th ICCCNT-35239, Denton, U.S.A., July 13-15, 2015.
[31] Osama Al-Khaleel, Zakaria Al-Qudah, and Mohammad Al-Khaleel, “Fast and compact binary-to-BCD conversion circuits for decimal multiplication,” IEEE 29th International Conf. on Computer Design, pp. 226-231, Oct. 2011.
[32] Tso-Bing Juang and Yu-Ming Chiu, “Fast Binary to BCD Converters for Decimal Communications Using New Recoding Circuits,” IEEE International Symposium on Integrated Circuits (ISIC), pp. 188-191, 2014.
[33] Arvind Kumar Mehta, Mukesh Gupta, Vipin Jain, and Sudhir Kumar, “High Performance Vedic BCD Multiplier and Modified Binary to BCD Converter,” IEEE Annual India Conference (INDICON), pp. 1-6, 2013.
[34] J. Bhattacharya, A. Gupta, and A. Singh, “A high performance binary to BCD converter for decimal multiplication,” IEEE International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 315-318, 2010.
[35] G. Jaberipur and A. Kaivani, “Improving the Speed of Parallel Decimal Multiplication,” IEEE Transactions on Computers, vol. 58, issue 11, pp. 1539-1552, 2009.

[36] Tso-Bing Juang and Yu-Ming Chiu, “High-speed binary to binary-coded-decimal converters for decimal multiplications,” IEEE International SoC Design Conference (ISOCC), pp. 370-371, 2013.
[37] 45nm Standard Cell Library. [Online]. Available: http://www.eda.ncsu.edu/wiki/FreePDK45:Contents
