Test and Testability of Asynchronous Circuits

Nastaran Nemati

A thesis submitted for the degree of Doctor of Philosophy

University of New South Wales

June 2017

To Mum and Dad

ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed ......

Date ......

COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.’

Signed ......

Date ......

Acknowledgments

Firstly, I would like to express my gratitude to my supervisor Prof. Mark C. Reed for the continuous support of my PhD study and related research. His steady guidance and motivation helped me greatly at all stages of my work, from early research through to thesis-writing.

Besides my supervisor, I would like to give my heartfelt thanks to Prof. Paul Beckett, Mr. Karl Fant and Prof. Sri Parameswaran for their insightful comments and encouragement, but also for asking me the hard questions which encouraged me to broaden my research. I must thank Ms. Dominique Kazan for her patience, support and advice during my project, Ms. Denise Russell for generously donating her time before my submission to proofreading my thesis, and Ms. Maya Gunawardena, from the Academic Language and Learning Unit of UNSW Canberra, who helped me improve my writing and presentation skills at several stages throughout the project.

I would also like to thank my wonderful friends, Mona, Sanaz, Maryam, Negin, Nadia, Mansoureh, Elham and Saba; I’ve known some of you for most of my life and knowing you are always there helps me in difficult times.

I am most grateful to my partner John for supporting me during my PhD studies in every possible way, from reviewing my paper while he himself was on a conference trip to spending so many weekends looking after me while I was writing up my thesis. His support, encouragement, patience and unwavering love were unquestionably the bedrock upon which the past couple of years of my life have been built.

I would like to thank my parents and my brothers for supporting me throughout writing this thesis and in my life in general. I thank my parents, for their faith in me and for giving me the freedom to be as ambitious as I wished. I thank them for being loving, understanding and patient when I couldn’t look after them as much as I would have liked. I thank my brother and sister-in-law Nima and Homa for being there for me, every single time. I am grateful to my other brother Mani and his wife Arsha, for being such lovely and lively people in my life. I also thank John’s parents, Ian and Jane, and his sister and her partner Katherine and Betsie, who provided me with unending encouragement and support.

Through all of this love, patience and support I’ve been able to complete this long dissertation journey. Thank you all. I couldn’t have done it without you.


Abstract

Ever-shrinking transistors and higher clock frequencies are causing serious clock distribution, power management, and reliability issues. Asynchronous design is predicted to have a significant role in tackling these challenges because of its distributed control mechanism and on-demand, rather than continuous, switching activity. Null Convention Logic (NCL) is a robust and low-power asynchronous paradigm that introduces new challenges to test and testability algorithms because 1) the lack of deterministic timing in NCL complicates the management of test timing, 2) all NCL gates are state-holding and even simple combinational circuits show sequential behaviour, and 3) stuck-at faults on gate internal feedback (GIF) of NCL gates do not always cause an incorrect output and therefore are undetectable by automatic test pattern generation (ATPG) algorithms.

Existing test methods for NCL use clocked hardware to control the timing of test. Such test hardware could introduce metastability issues into otherwise highly robust NCL devices. Also, existing test techniques for NCL handle the high statefulness of NCL circuits by excessive incorporation of test hardware, which imposes additional area, propagation delay and power consumption.

This work first proposes a clockless self-timed ATPG that detects all faults on the gate inputs and a share of the GIF faults with no added design for test (DFT). Then, the efficacy of quiescent current (IDDQ) test for detecting GIF faults undetectable by a DFT-less ATPG is investigated. Finally, asynchronous test hardware, including test points, a scan cell, and an interleaved scan architecture, is proposed for NCL-based circuits. To the best of our knowledge, this is the first work that develops clockless, self-timed test techniques for NCL while minimising the need for DFT, and also the first work conducted on IDDQ test of NCL.

The proposed methods are applied to multiple NCL circuits with up to 2,633 NCL gates (10,000 CMOS Boolean gates), in 180 and 45 nm technologies, and show average fault coverage of 88.98% for ATPG alone, 98.52% including IDDQ test, and 99.28% when incorporating test hardware. Given that this fault coverage includes detection of GIF faults, our work has 13% higher fault coverage than previous work. Also, because our proposed clockless test hardware eliminates the need for double-latching, it reduces the average area and delay overhead of previous studies by 32% and 50%, respectively.

Contents

Acknowledgments
Abstract
List of Figures
List of Tables
Peer-Reviewed Publications based on this Thesis
Acronyms

1 Introduction
  1.1 Roadmap of semiconductor technology
  1.2 Past, present and future of asynchronous designs
    1.2.1 Processors
    1.2.2 System-on-Chips (SoCs)
  1.3 Pros and cons of asynchronous design
  1.4 Key challenges involved in asynchronous design
    1.4.1 CAD tools
    1.4.2 Test and testability
  1.5 Description of Null Convention Logic (NCL)
    1.5.1 Challenges of testing Null Convention Logic (NCL)
  1.6 Outline of thesis
    1.6.1 Objectives
    1.6.2 Implementation and evaluation methods
    1.6.3 Contributions of this research
  1.7 Previous work
    1.7.1 Test and testability of asynchronous circuits
    1.7.2 Test and testability of delay-insensitive circuits
    1.7.3 Test and testability of Null Convention Logic (NCL)
      1.7.3.1 Previous work on IDDQ testing of NCL circuits
  1.8 Structure of thesis
  1.9 Summary

2 Background
  2.1 Null Convention Logic (NCL)
    2.1.1 Structure of NCL gates
      2.1.1.1 Static NCL gates
      2.1.1.2 Semi-static NCL gates
    2.1.2 NCL gate library
    2.1.3 Timing and control in NCL
      2.1.3.1 NCL is hazard-free and glitch-less
  2.2 Test and testability
    2.2.1 Basics of test
      2.2.1.1 Fault, error, failure
      2.2.1.2 Cost of test - rule of 10
      2.2.1.3 Fault model - single stuck-at fault
      2.2.1.4 Test technique evaluation
    2.2.2 Automatic test pattern generation (ATPG)
      2.2.2.1 Combinational ATPG
      2.2.2.2 Sequential ATPG
    2.2.3 Design For Test (DFT)
      2.2.3.1 Test point insertion
      2.2.3.2 Scan insertion
      2.2.3.3 Built-in Self-test (BIST)
    2.2.4 IDDQ test
  2.3 Summary

3 Automatic Test Pattern Generation for NCL
  3.1 The number of required test vectors per fault
    3.1.1 Fault activation in NCL using one Null or one Data
    3.1.2 Fault propagation in NCL using one Null or one Data
  3.2 ATPG timing
    3.2.1 ATPG timing using a clock
    3.2.2 From the DUT
    3.2.3 From a golden model
    3.2.4 From both the DUT and a golden model
  3.3 Effectiveness of mixed timing in detecting faults
  3.4 Cases where mixed timing does not work for NCL
    3.4.1 Example of faults causing delay on primary outputs
    3.4.2 Example of faults causing no voltage defect
  3.5 Deterministic test generation - N-PODEM
    3.5.1 Basics of PODEM for Boolean logic
      3.5.1.1 An example of PODEM
    3.5.2 Adaptation of PODEM for NCL (N-PODEM)
      3.5.2.1 An example of N-PODEM
  3.6 Experimental results
  3.7 Summary

4 IDDQ Test for Null Convention Logic
  4.1 Behaviours of static NCL gates with faulty GIF lines
    4.1.1 Stuck-at0 fault on Null feedback line (Zn*-at0)
    4.1.2 Stuck-at1 fault on Data feedback line (Zd*-at1)
    4.1.3 Stuck-at0 fault on Data feedback line (Zd*-at0)
    4.1.4 Stuck-at1 fault on Null feedback line (Zn*-at1)
  4.2 Behaviours of semi-static NCL gates with faulty GIF lines
    4.2.1 Stuck-at0 on GIF line (GIF-at0)
    4.2.2 Stuck-at1 on GIF line (GIF-at1)
  4.3 Experimental results
  4.4 Summary

5 Design for Test for Null Convention Logic
  5.1 Test point insertion for NCL circuits
    5.1.1 Controllability point for NCL circuits
    5.1.2 Observability point for NCL circuits
  5.2 Scan insertion for NCL circuits
    5.2.1 NCL scan cell for NCL circuits
    5.2.2 Asynchronous interleaved scan architecture for NCL circuits (AISA)
    5.2.3 Timing and control of scan chain for NCL circuits
  5.3 NCL Built-in Self-test (BIST)
    5.3.1 Control structure (BIST controller)
  5.4 Experimental results
  5.5 Summary

6 Conclusions and Recommendations for Future Work
  6.1 Future work
    6.1.1 Techniques for improving ATPG
    6.1.2 Techniques for improving IDDQ testing
    6.1.3 Techniques for improving DFT

Bibliography

A HDL/PLI Environment
  A.1 PLI functions
    A.1.1 Fault injection and removal
    A.1.2 Fault collapsing

B Effect of Sizing on NCL Gate Delays

C Probability-based Controllability for NCL Circuits

List of Figures

1.1 Moore’s Law regarding number of transistors and processing power (Nature News) [1]
1.2 Increasing cost per transistor after 28 nm node (source: The Linley Group, 2013 [2])
1.3 Energy and peak currents for (a) clocked and (b) clockless ARM cores [3]
1.4 Electromagnetic emissions for (a) clocked and (b) clockless ARM cores [3]
1.5 Photon emission images of synchronous (left) and asynchronous (right) 80C51 microcontroller executing same program (red dots indicate levels and distributions of power dissipation which are apparently lower and more localised in the asynchronous 80C51 chip) [4]
1.6 Photomicrographs showing: (a) low-noise characteristics of NCL; and (b) measured substrate-induced noise of synchronous and NCL circuits [5]
1.7 Impact of NCL on application’s energy requirements [5]

2.1 NCL threshold m-of-n gate - THmn (k = l = 1)
2.2 Hysteresis behaviour of TH22 gate
2.3 Static transistor-level implementation of NCL gates
2.4 Semi-static transistor-level implementation of NCL gates
2.5 NCL registration stage (bold) and completion detection (grey)
2.6 Multi-level model of system reliability [6]
2.7 Examples of possible defects in electronic circuits: (a) top-down and (b) cross-section SEM images of open defect [7], (c) watermark in water immersion lithography (features under watermark are not printed because of defect) [8], (d) residues of processed film or dissolved contaminant from tool construction material causing short between multiple lines [8], (e) incomplete connection between copper trace and PTH barrel [9], (f) shorts and breaks in metal lines caused by scratch in photoresist [10], and (g) inter-layer short between two Al interconnects [11]
2.8 (a) NOR gate symbol (b) transistor-level NOR - ‘A’ stuck-at0 (c) truth table of the NOR gate without fault (Out_g) and with fault (Out_f)
2.9 Test cost pyramid [12]
2.10 Simplest implementation of a controllability point
2.11 Implementation of a test point using a flip-flop allows cascading the CPs
2.12 Simplest implementation of observability point
2.13 (a) sequential circuit with functional loop and (b) scan chain breaking functional loop
2.14 Scan cell
2.15 Most common scan architectures: (a) full; (b) partial; and (c) multiple
2.16 Configurable LFSR structure
2.17 Configurable MISR
2.18 BEST architecture
2.19 RTS architecture
2.20 IDDQ test for generalised CMOS IC [13]

3.1 Part of NCL 4-bit up/down counter calculating the third bit of count [14]
3.2 Test control for the test equipment program
3.3 Flow of test control procedure
3.4 Example of early detection of Null
3.5 Behaviour of static NCL gates with Zn*-at0
3.6 PODEM flowchart [15]
3.7 Example of PODEM
3.8 Example of PODEM binary decision tree
3.9 Example of N-PODEM

4.1 Static transistor-level implementation of NCL gates
4.2 Zn*-at1 fault models
4.3 Monotonic transitions of NCL pipelines between Null and Data, in four phases
4.4 IDDQ test for static NCL gates - Zn*-at0 fault
4.5 IDDQ test for static NCL gates - Zd*-at1 fault
4.6 IDDQ graphs of golden, and Zn*-at0 and Zd*-at1 faults
4.7 IDDQ test for static NCL gates - Zd*-at0 fault
4.8 IDDQ test for static NCL gates - Zn*-at1 fault
4.9 IDDQ graphs of golden, and Zd*-at0 and Zn*-at1 faults
4.10 Semi-static transistor-level implementation of NCL gates
4.11 Models of GIF-at0 and GIF-at1 faults
4.12 IDDQ test for semi-static NCL gates - GIF-at0 fault
4.13 IDDQ test for semi-static NCL gates - GIF-at1 fault
4.14 IDDQ graphs of golden, and GIF-at0 and GIF-at1 fault models

5.1 NCL read/write memory [16]
5.2 NCL controllability point
5.3 Waveform of CP insertion (three cascaded CPs)
5.4 NCL observability point
5.5 Timing diagram of OP insertion (three cascaded OPs)
5.6 Functional loop in NCL design
5.7 NCL scan cells
5.8 Monotonic changes of Null and Data in NCL pipeline stages
5.9 Using traditional scan architecture for NCL - all registers have Data
5.10 Using traditional scan architecture for NCL - all registers have Null
5.11 Interleaved scan structure for NCL
5.12 AISA for NCL BIST: (a) RTS; and (b) STUMPS
5.13 Data application and output collection in NCL AISA
5.14 Combinational part of asynchronous LFSR
5.15 Asynchronous LFSR

6.1 Upper bounds of sizes of NCL circuits testable by traditional IDDQ test

A.1 Building blocks of HDL/PLI test environment [17]

B.1 Effect of changes in Wp on TH22 gate delay
B.2 Effect of changes in Wph on TH22 gate delay
B.3 Effect of changes in Wp/Wph on TH22 gate delay

List of Tables

1.1 Percentages of asynchronous circuits in whole fabricated chip area from 2007 to 2026 (data from ITRS report 2007 [18] and ITRS report 2011 [19] merged)
1.2 Comparison of NCL and two synchronous microcontrollers at same effective frequency [5]
1.3 Comparison of previous work and proposed methods

2.1 NCL gate library

3.1 Values for fault propagation on NCL gate inputs
3.2 Early detection of Null for GIF stuck-at0 fault
3.3 Example of PODEM steps for l9 stuck-at0 fault
3.4 Karnaugh map of Boolean AND gate
3.5 Karnaugh map of TH22 gate including X value
3.6 Karnaugh map of TH22 gate in N-PODEM
3.7 N-PODEM example steps - l9 stuck-at1
3.8 N-PODEM example steps - l9 stuck-at0
3.9 Experimental results obtained from proposed random ATPG for static NCL
3.10 Experimental results obtained from proposed random ATPG for semi-static NCL

4.1 Behaviour of generic fault-free static NCL gate
4.2 Behaviour of generic faulty static NCL gate - Zn*-at0 fault
4.3 Behaviour of generic faulty static NCL gate - Zn*-at1 fault
4.4 Behaviour of generic faulty static NCL gate - Zd*-at0 fault
4.5 Behaviour of generic faulty static NCL gate - Zd*-at1 fault
4.6 Voltage and IDDQ tests of static and semi-static NCL gates - VDD = 1.8 V, tech = 1.8 µm, Lmin = 1.8 µm
4.7 Voltage and IDDQ tests of static and semi-static NCL gates - VDD = 1.1 V, tech = 45 nm, Lmin = 45 nm
4.8 Experimental results obtained from traditional ATPG + IDDQ tests for NCL circuits

5.1 Control and data flow of NCL AISA
5.2 Delay and area of NCL storage elements and the proposed asynchronous DFT elements
5.3 Fault coverage and overhead (compared to the fault coverage and transistor count of the original DUT) for the proposed scan architecture

Peer-Reviewed IEEE Publications based on this Thesis

1. N. Nemati, P. Beckett, M. C. Reed, K. Fant, “Clock-less DFT-less Test Strategy for Null Convention Logic,” IEEE Transactions on Emerging Topics in Computing (TETC), Accepted Jun. 2016 - Issue 99, DOI: 10.1109/TETC.2016.2593628 – based on Chapters 3 and 4.

2. N. Nemati, M. C. Reed, S. Parameswaran, K. Fant, “Self-Timed Automatic Test Pattern Generation for Null Convention Logic,” in IEEE Int. Midwest Symposium on Circuits and Systems (MWSCAS), Abu Dhabi, UAE, Oct. 2016, DOI: 10.1109/MWSCAS.2016.7870032 – based on Chapter 3.

3. N. Nemati, M. C. Reed, K. Fant, P. Beckett, “Asynchronous Interleaved Scan Architecture for On-line Built-in Self-test of Null Convention Logic,” in Proc. IEEE Int. Symp. on Circuits and Systems (ISCAS), Montreal, Canada, May 2016, pp. 746 - 749, DOI: 10.1109/ISCAS.2016.7527348 – based on Chapter 5.

4. N. Nemati, M. C. Reed, M. R. Frater, “Asynchronous Test Hardware for NULL Convention Logic,” in Proc. IEEE Int. Symp. on Circuits and Systems (ISCAS), Melbourne, Australia, Jun. 2014, pp. 1744 - 1747, DOI: 10.1109/ISCAS.2014.6865492 – based on Chapter 5.

Acronyms

• ADC: Analogue to Digital Converter

• AISA: Asynchronous Interleaved Scan Architecture

• ATE: Automatic Test Equipment

• ATPG: Automatic Test Pattern Generation

• BCI: Brain-Computer Interfacing

• BDT: Binary Decision Tree

• BICS: Built-in Current-Sensors

• BIST: Built-in Self-test

• CDC: Completion Detection Circuitry

• CEI: Cambridge Electronics Inc.

• CMOS: Complementary Metal Oxide Semiconductor

• CP: Control Point

• CPU: Central Processing Unit

• CUT: Circuit Under Test

• DFT: Design for Test

• DI: Delay Insensitive

• DOE: Department of Energy

• DSP:

• DTG: Deterministic Test Generation

• DUT: Design Under Test

• EDA: Electronic Design Automation

• EMI: Electromagnetic Interference

• FAN: FANout oriented test pattern generation


• FC: Fault Coverage

• FI: Fault Injection

• FL: Functional Loop

• FPGA: Field-Programmable Gate Array

• FR: Fault Removal

• GALS: Globally Asynchronous Locally Synchronous

• GIF: Gate Internal Feedback

• GT: Golden Timing

• HDL: Hardware Description Language

• IC: Integrated Circuit

• IDDQ: Quiescent Current

• ITRS: International Technology Roadmap for Semiconductors

• LP PTM: Low-Power Predictive Technology Model

• LFSR: Linear Feedback Shift Registers

• MISR: Multiple Input Signature Registers

• MT: Mixed Timing

• NCL: Null Convention Logic

• NMOS: Negative Metal Oxide Semiconductor

• ORA: Output Response Analyser

• OP: Observation Point

• PCB: Printed Circuit Boards

• PI: Primary Input

• PMOS: Positive Metal Oxide Semiconductor

• PO: Primary Output

• PLI: Procedural Language Interface

• PODEM: Path Oriented DEcision Making test pattern generation

• PVT: Process, Voltage, Temperature

• RTL: Register Transfer Level

• RTS: Random Test Socket

• SCOAP: Sandia Controllability/Observability Analysis Program

• SoC: System on Chip

• SOCRATES: Structure-Oriented Cost-Reducing Automatic TESt pattern generation system

• SNDR: Signal to Noise and Distortion Ratio

• TOPS: TOPological Search algorithm

• TP: Test Point

• TPI: Test Point Insertion

• TRAN: TRANsitive closure algorithm

• TV: Test Vector

• QDI: Quasi Delay Insensitive

Chapter 1

Introduction

This chapter provides an introduction to this thesis, beginning with the motivations for conducting research on the test and testability of Null Convention Logic (NCL). Then, the objectives and methodology are explored, a literature review presented and a summary of the structure of this thesis provided.

Section 1.1 considers the trends in semiconductor technology and how significantly different approaches are required to keep up with the constant demand for generating faster, smaller and cheaper circuits. Then, Section 1.2 introduces one of these possible approaches, asynchronous design, and explores its achievements and predictions for its place in the future of the semiconductor industry. Section 1.3 discusses the advantages and disadvantages of asynchronous circuits, and Section 1.4 highlights the main challenges of this design method. Then, in Section 1.5, the focus of this study on one of these main challenges, the test and testability of asynchronous circuits, is described. Section 1.6 explains the research objectives and methods chosen for the implementation and evaluation of the ideas in this study. Section 1.7 presents a comprehensive literature review that investigates the test and testability of asynchronous circuits, focusing on a specific delay-insensitive one called NCL. Section 1.8 provides a brief overview of the structure and organisation of this thesis and, finally, Section 1.9 summarises this chapter.

1.1 Roadmap of semiconductor technology

In 1965, Gordon Moore, who later co-founded Intel™, observed a regular increase in the number of transistors in integrated circuits (ICs). Based on this observation, he determined that the number of components per chip doubled every 12 months and predicted that this trend would continue. Since then, his remark regarding this trend has been known as “Moore’s Law”. However, when experimental data revealed problems with the original formulation in 1975, Moore updated it to state that the number of components would double every 24 months rather than the initially stated 12. Since then, Moore’s Law has been considered more a guideline than a prediction for the semiconductor industry. Hundreds of researchers and manufacturers have continually been working to remain in step with the exponential demands of Moore’s Law, which has resulted in a design approach called “More Moore” [20].

However, recently, significant manufacturers, including Intel itself, have been unable to follow the regular two-year cycle for the chip manufacturing process which was taken for granted over the past few decades. Currently, it appears that shrinking transistors does not automatically result in higher performance, with ICs becoming too hot at feature sizes of 90 nm and smaller. This significant increase in heat is due to the faster movement of electrons in smaller circuits and the greater silicon density in their geometries. Consequently, as microprocessors became incapable of tolerating the heat caused by higher clock frequencies, designers halted their attempts to increase clock rates. As illustrated in Fig. 1.1, after steady growth since the early 1970s, clock frequencies plateaued at around 4 GHz in 2004 [1].

Fig. 1.1. Moore’s Law regarding number of transistors and processing power (Nature News) [1]
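The gap between the original 12-month doubling period and the revised 24-month period compounds quickly. The following sketch is an illustration only: it assumes idealised exponential growth, and the function name `growth_factor` and the ten-year horizon are choices made for this example rather than values taken from the cited sources.

```python
def growth_factor(years: float, doubling_period_months: float) -> float:
    """Return the multiplicative growth in components per chip after `years`,
    assuming idealised doubling every `doubling_period_months` months."""
    doublings = years * 12.0 / doubling_period_months
    return 2.0 ** doublings


if __name__ == "__main__":
    horizon_years = 10  # illustrative horizon, not a figure from the thesis
    for period in (12, 24):
        factor = growth_factor(horizon_years, period)
        print(f"doubling every {period} months: x{factor:,.0f} in {horizon_years} years")
```

Under these assumptions, ten years of 12-month doubling gives a 1,024-fold increase, whereas 24-month doubling gives a 32-fold increase.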

To achieve better performance from the same silicon area, circuit designers have adopted approaches such as pipelining and/or multi-core processors. However, the efficiency of pipelining is compromised because large wire delays do not shrink along with transistor delays down the technology curve. Furthermore, the effectiveness of multi-core processors is application-dependent, and this technique can only improve the performance of those parts of an application that are suitable for parallel processing. Thus, the majority of applications cannot use all the performance-enhancing capabilities of multi-core processors.

Another indication of a major turning point in semiconductor technology is cost per component. As seen in Fig. 1.2, since the development of the 28 nm technology node in 2012, the cost per transistor on an IC has not decreased. This is the first time in the past fifty years that the number of transistors that can be bought per dollar has stopped increasing. This shift indicates a significant new challenge for the electronics industry.

Fig. 1.2. Increasing cost per transistor after 28 nm node (source: The Linley Group, 2013 [2])

Furthermore, leakage power is rapidly increasing and reaching a hard-to-manage limit. This increase has encouraged engineers to pursue new structures and materials for transistors as well as new manufacturing processes. Although somewhat promising, these new techniques could impose high costs on the already expensive process of down-sizing.

Clock networks consume an ever-increasing part of the power budget and silicon area. Methods for managing this budget, such as innovative clock trees and clock gating, do not change the fact that a high-quality clock signal, routed through an IC, exhausts a large share of the power budget of the circuit. Logic glitch power is another dominant source of dynamic power in Boolean logic. In clocked circuits, erroneous outputs and glitches are masked by sampling the data signal only after it has stabilised at the tick of the clock. However, this clock-driven sampling cannot stop the energy wastage resulting from the chaotic switching of transistors. Studies show that this chaotic glitching in circuits is responsible for 20 to 70% of their total power consumption [21].

All the above-mentioned facts indicate a turning point in the trend of transistor down-sizing. However, the relentless influence of Moore’s Law still sets expectations for the future of the semiconductor industry. Today, this increases the pressure on researchers and designers to investigate all cutting-edge technologies for the design of electronic circuits in the hope that a technology shift might return the industry to the two-year cadence of Moore’s Law. The technologies under investigation cover a vast range, from alternative materials to exotic transistor structures, and from quantum effects to superconducting properties.

For example, several research institutions, including Intel, the Department of Energy (DOE) and an MIT spin-out, Cambridge Electronics Inc. (CEI), have recently considered using a layer of gallium nitride (GaN) on top of silicon [22]. These studies propose that GaN transistors have significantly lower resistances than their silicon counterparts, which results in higher performance and power savings for electronic components. Other alternative materials include 2D graphene-like compounds and spintronic materials which, instead of moving electrons, use flipping electron spins to perform computations [1].

Another example of an emerging technology is the considerable investment of companies such as Qualcomm in monolithic 3D die structures. In this unusual structure, multiple layers of electronic components are built on top of each other on a single piece of the silicon die. Based on empirical results from Qualcomm, a monolithic 3D IC can lead to 40% better performance, 30% lower power consumption and 5 to 10% lower cost for the same technology node [23]. However, currently, this 3D technique is fundamentally more suitable for building memory chips which, as they are usually accessed sparingly over time, do not, by nature, have a heat problem or excessive power consumption. A team at Stanford University has managed to stack memory and processing units on top of each other to make a potentially 1000 times faster architecture. However, this design is yet to be tested outside a laboratory because a solution that works well for cooling a 3D stacked CPU-memory unit is not yet available [24].

Two other potential solutions are quantum computing [25], which demonstrates huge increases in speed for some classes of applications, and neuromorphic computing [26], which attempts to make electronic circuits based on the processing elements of the brain. However, none of these solutions has yet been used in a commercial product [20].

One of the many design alternatives at this critical point in the history of semiconductors is a fundamentally different design method that would not be significantly affected by down-sizing. Asynchronous design is an example of such a method, and it has the potential to modify the fundamentals of electronics design. In a synchronous electronic circuit, one or more global clock signals are responsible for managing the correctness of communications among its components. In contrast, the components of asynchronous circuits eliminate clock signals from their communication protocols and, instead, use handshaking protocols to communicate with other components.

The centralised clocked high-activity regions in synchronous circuits are replaced by distributed active-on-demand computational areas in asynchronous circuits. As a result, asynchronous circuits have fewer hotspots and lower current peaks [3] and their reliability is significantly higher. Moreover, the unpredictability of the timing and trend of data communications makes it more difficult for attackers to access data and steal information. Consequently, asynchronous circuits are intrinsically suitable for high-security applications (e.g., smart cards [27]).

Delay-insensitive asynchronous design methods, such as NCL, are glitch-free, which means that they inherently eliminate the unnecessary toggling of signals that accounts for 20 to 70% of total power consumption [21]. This power saving is in addition to the dynamic power reduction achieved by the elimination of a continuously operating clock signal.

Running a clean low-skew clock tree through a chip is very challenging and expensive and, despite all the care taken, there is always a non-zero probability of clock races and skew. Therefore, designers must always leave a margin for the setup and hold times of state-holding elements which, in some cases, accounts for more than 25% of the clock period. Also, the maximum clock frequency is always determined by the worst-case delay path in the circuit, which slows down all the other paths, whereas an asynchronous circuit's performance is based on the average-case delay, with no delay penalty required for the setup and hold times. Therefore, by nature, asynchronous design has the potential to achieve better performance.

Undoubtedly, these benefits of asynchronous design come at a cost; for example, asynchronous circuits usually occupy larger areas than their synchronous counterparts and their physical testing is more challenging. We provide more details of the trade-offs involved in asynchronous design in Sections 1.3 and 1.4. First, however, we investigate the history of this circuit design method and discuss the reasons for it being a good candidate for the technology shift in semiconductors required at this point in their roadmap.
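To make the worst-case versus average-case argument above concrete, the following minimal sketch compares a clock period fixed by the slowest path (plus a setup/hold margin) with the expected completion time of a self-timed circuit. The path delays, their usage probabilities and the 25% margin are illustrative assumptions, not measurements from this thesis, and handshake overhead is ignored.

```python
# Hypothetical (delay in ns, probability of use) pairs; illustrative only.
PATHS = [
    (2.0, 0.70),   # short path, exercised most of the time
    (4.0, 0.25),
    (10.0, 0.05),  # critical path, rarely exercised
]
SETUP_HOLD_FRACTION = 0.25  # assumed share of the clock period lost to margins


def synchronous_period(paths, margin_fraction):
    """Clock period set by the slowest path plus the setup/hold margin."""
    worst = max(delay for delay, _ in paths)
    # The margin is a fraction of the whole period: period = worst / (1 - margin).
    return worst / (1.0 - margin_fraction)


def asynchronous_mean_latency(paths):
    """Expected completion time when each operation ends as soon as its path settles."""
    return sum(delay * prob for delay, prob in paths)


if __name__ == "__main__":
    print("synchronous period (ns):", synchronous_period(PATHS, SETUP_HOLD_FRACTION))
    print("asynchronous mean latency (ns):", asynchronous_mean_latency(PATHS))
```

Under these assumed numbers the clocked design pays for the rare 10 ns path on every cycle, whereas the self-timed design pays for it only when that path is actually exercised.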

1.2 Past, present and future of asynchronous designs

Recent International Technology Roadmap for Semiconductors (ITRS) reports [28], [18], [19] predict a possible change from the synchronous to the asynchronous design style in the near future, with the increasing percentages of asynchronous designs in the electronics world illustrated in Table 1.1. These reports show that, in 2007 and 2011, 7% and 19% of whole fabricated chip areas, respectively, were in the form of asynchronous circuits, a trend that the ITRS estimates will increase steadily and reach 54% by 2026 [18], [19] (ITRS2011 tables). Also, the 2011 ITRS report expects that the main approach to low-power design in 2025 will be an asynchronous one [29]. Moreover, the critical need for investment in developing asynchronous electronic design automation (EDA) tools and asynchronous test methodologies has been mentioned in ITRS reports over the past few years [30].

Currently, asynchronous design is practised in almost all categories of electronic systems. Biomedical applications are clear examples of signal-processing algorithms concerned with energy management, the use of a clock and the risk of signal corruption due to electromagnetic interference (EMI) [31], [32]. Asynchronous data acquisition methods have been employed in many studies to address these issues in biomedical systems; for example, in the decomposition of heart signals [32], the brain-computer interfacing (BCI) used in brain monitoring and treatment [31], [33] and the control of human robots [34]. Clockless implementations of channel access methods are also repeatedly addressed in the literature [35], [36], [37], [38]. Audio signal processing systems, such as audio sample-rate converters, have also benefited from the delay-insensitivity characteristic of clockless design [39], [40]. In addition, some research has been conducted on improving the components of signal processing systems; for instance, in [41], an asynchronous design helped to significantly decrease the signal to noise and distortion ratio (SNDR) in a clockless buffer-free Analogue to Digital Converter (ADC) which, in turn, decreased the energy and area used. Finally, because its non-deterministic behaviour and timing make its characteristics unpredictable, asynchronous design has recently been shown to work well in the areas of data encryption and security [42], [27].

1.2.1 Processors

The achievements regarding asynchronous hardware are not limited to signal-processing and specific-purpose devices. Asynchronous design was first applied to create clockless processors [43], [44] and improved their performance, reliability and power efficiency. However, the lack of proper EDA tools and sufficient expert designers, as well as the hard-to-test nature of the resulting circuits, prevented these processors from replacing commercial clocked central processing units (CPUs) and digital signal processors (DSPs) [45].

Since the early 1990s, several asynchronous field programmable gate arrays (FPGAs) have been designed [46], [47], [48], [49] to facilitate the development of asynchronous design methods and the realisation of proposed clockless devices. Achronix™ is an asynchronous FPGA company that emerged from Cornell's VLSI research group under the supervision of Dr. Rajit Manohar in 2004 [50]. The company's 65 nm high-speed FPGA reaches 1.5 GHz, approximately three times faster than the FPGA devices made by the two biggest FPGA companies, Altera™ and Xilinx™. Achronix™ recently fabricated its latest FPGA in 22 nm in Intel's fab using the tri-gate process [51]. It is the first model of its kind with embedded hard IP cores for high-performance wireless applications [52].

The same trend is evident for asynchronous digital processors [44]. Recently, Philips™ Handshake Solutions fabricated an asynchronous ARM™ with a 40% lower power consumption than its equivalent synchronous device [3].
The effect of asynchronous design on the reductions in power and EMI in an ARM processor is clearly shown in Fig. 1.3 and 1.4, respectively. As can be seen, the current peaks are considerably lower in the asynchronous ARM9 processor which results in much less power being consumed. Furthermore, as the result of the lower EMI, the asynchronous ARM9 core has significantly better reliability than its clocked counterpart.

Table 1.1: Percentages of asynchronous circuits in whole fabricated chip area from 2007 to 2026 (data from ITRS report 2007 [18] and ITRS report 2011 [19] merged)

Asynchronous global signalling: % of a design driven by handshake clocking

Year of production   2007     2008   2009   2010   2011   2012   2013   2014   2015
% of design          7%       11%    15%    17%    19%    20%    22%    23%    25%

Year of production   2016-18  2019   2020   2021   2022   2023   2024   2025   2026
% of design          30%      35%    40%    43%    45%    47%    49%    52%    54%

Legend (cell shading in the original table): manufacturable solutions exist and are being optimised; manufacturable solutions are known; interim solutions are known; manufacturable solutions are NOT known.

Fig. 1.3. Energy and peak currents for (a) clocked and (b) clockless ARM cores [3]

Fig. 1.4. Electromagnetic emissions for (a) clocked and (b) clockless ARM cores [3]

Also, an asynchronous 80C51 microcontroller, which demonstrates a very high tolerance to process and environmental variations [53], was implemented by Handshake Solutions in 2004 [4], with asynchronous and synchronous versions of it shown in Fig. 1.5. Both ICs were realised by Philips™ in 0.5 µm technology under the same operating conditions. These images were taken using photon emission microscopy, with the red parts indicating the hotspots on the chips due to highly localised activity. The synchronous version (Fig. 1.5 (left)) has greater and more widespread activity than the asynchronous one (Fig. 1.5 (right)). The distributions of hotspots on these chips are indicators of power dissipation as well as the reliability of the end device. The asynchronous chip is only activated where and when needed whereas the synchronous one is activated across the whole IC due to the activity of the clock signal.

Fig. 1.5. Photon emission images of synchronous (left) and asynchronous (right) 80C51 microcontrollers executing the same program (red dots indicate levels and distributions of power dissipation, which are visibly lower and more localised in the asynchronous 80C51 chip) [4]

1.2.2 System-on-Chips (SoCs)

Combining IP blocks from different IP providers is a common practice in today's electronics industry. Nevertheless, this is usually not straightforward in a synchronous design because of the many concerns about the different clock frequencies and process variations in different IP cores. However, as fewer such considerations are required for an asynchronous design and these circuits do not have strict timing assumptions, they are better candidates for IP re-usability and plug-and-play operations.

Even when the cores of the system are synchronous, asynchronous design may provide more reliable communication among them. When clocked cores are connected with asynchronous interconnections, the result is a Globally Asynchronous Locally Synchronous (GALS) system. A GALS design develops coarse-grained functional modules using conventional design techniques and then adds local clock generation and self-timed wrappers. As a result, the modules in a GALS system can communicate using asynchronous handshake protocols [54]. In a clockless system on chip (SoC), regardless of the timing control method used in its cores, its communication elements are always based on some handshaking request and acknowledgement signals and, sometimes, buffers.

So far, we have seen that asynchronous design could be a potential solution to some of the obstacles in the way of downsizing electronic components. We have learnt about the history and future of asynchronous design and explored some of the advantages and disadvantages of this design technique. The next section provides a list of the pros and cons of asynchronous design compared with those of synchronous design for easy future reference.

1.3 Pros and cons of asynchronous design

The following summarises the benefits of the asynchronous design method over the synchronous one.

• Lower power: eliminating the clock may lead to reductions in the switching power.

• Average-case performance: there is no worst-case delay penalty, as the delay depends on the input data.

• Technology-independent: as delay-insensitive, can be independent of physical properties, such as technology, scale, ageing and manufacturing variations.

• Tolerant of physical changes: in voltage, temperature, age, manufacturing variations and implementation environment.

• Reliability: no race, hazard or skew and all delay-related issues automatically eliminated.

• Security: as, unlike synchronous design, valid data appears on the outputs from asynchronous circuits at irregular intervals, it is harder to interfere with or decode the information that has been stored and transformed using this method.

The advantages of clockless design are not without cost and come with a multitude of trade-offs. The following list itemises the costs of using asynchronous techniques for circuit design at this time in the history of the semiconductor industry.

• The area overhead of asynchronous design may be up to 4 times greater than that of a synchronous one due to the addition of completion detection and design-for-test circuits.

• Fewer people are trained in this technology than in that of synchronous design.

• Being independent of the worst-case propagation delay, asynchronous circuits have non-deterministic performance and non-deterministic timing behaviour [29]. Although this can be an advantage in security-related applications, it also adds to the challenges of designing, verifying and testing asynchronous circuits.

• These clockless circuits also lack well-established design and verification EDA tools [55], [29], [56].

• Due to higher statefulness, asynchronous designs are inherently more challenging to test and debug than synchronous ones [29].

In order to integrate asynchronous designs in the electronics industry, it is necessary to tackle the above-mentioned issues. The next section concentrates on the main challenges in asynchronous design.

1.4 Key challenges involved in asynchronous design

Two of the main challenges of the asynchronous design method are the lack of well-established CAD tools and testing methods. Firstly, there are no automated design flows and tools that can extend the specifications of an asynchronous design to those required for a fabrication-ready circuit. Secondly, even if a designer manages to turn an asynchronous design into an actual IC, the methods currently available for testing asynchronous designs are immature and not adapted to the specific structure and behaviour of asynchronous circuits. These two significant shortcomings limit the acceptance of asynchronous designs among designers and engineers. Investments in improving the design and test methods for this important circuit design technique could break a vicious cycle which has continued for a few decades. The following subsections explain these limitations in more detail and thereby the reasons for conducting the work in this thesis.

1.4.1 CAD tools

The lack of high-level and user-friendly EDA tools has prevented the widespread use of asynchronous designs and slowed their development. Over time, several issues related to asynchronous CAD tools have been addressed, and a few solved, mainly in universities. These efforts have concentrated primarily on prototyping or defining the flow of an asynchronous behavioural design.

There are two main approaches to the development of asynchronous EDA tools [57]. The first builds on existing synchronous tools. Since very few designers are familiar with asynchronous concepts, providing them with the opportunity to work with tools with which they are familiar increases the possibility of their embracing this new design methodology. In this way, research can easily and naturally become closer to fulfilling the requirements for practical use. The UNCLE CAD tool [14] from Mississippi State University is an example of this group of tools.

In the second approach, the core and engine of the tools are developed from scratch, with their developers believing that, to benefit from all the advantages of asynchronous design, the core of the related CAD tools must be customised to this methodology rather than being a patchwork of existing ones. The Haste/TiDE tool from Philips' Handshake Solutions [3] and Balsa [58] from the University of Manchester are two well-known examples.

1.4.2 Test and testability

ITRS test reports [30] mention the lack of proper test and testability methods for asynchronous design and the necessity for investment in this area. Asynchronous test techniques face a higher level of complexity than synchronous ones. An asynchronous system exhibits non-deterministic behaviour, and it is not always clear when its results are valid and can be checked. Another factor is the high number of feedback loops added to a circuit for handshaking. Finally, similar to synchronous circuits, an asynchronous one can include functional loops (FL)¹ which are significant concerns when designing test hardware. The need to integrate test hardware in an asynchronous circuit under test (CUT) raises the issue of controlling the timing of the hardware when there is no clock in the original design under test (DUT). Because of these complexities, test techniques for asynchronous circuits are not nearly as well-established or advanced as those for synchronous ones [59], which is one of the main obstacles to popularising asynchronous systems.

Physical test and testability decisions must be made in the earliest possible stages of the design of a circuit as the longer they are postponed, the higher the price of their implementation. Therefore, it is important to invest in defining proper physical test techniques for asynchronous design methods as soon as possible. This work is dedicated to developing test and testability methods for a specific category of asynchronous circuits called NCL. More details about NCL and why we focus on testing this clockless design method are provided in the following section.

1.5 Description of Null Convention Logic (NCL)

NCL [16] is a delay-insensitive, self-timed asynchronous paradigm. Like most asynchronous design methods, it can result in robust clockless systems with lower power consumption and higher performance than their synchronous counterparts. However, it has a few additional benefits over other asynchronous design paradigms. An NCL-based circuit has great potential for heavy pipelining which can result in significant performance improvements [60], [61]. Its need for little or no timing analysis makes it well suited to component re-usability and also a suitable candidate for the integration of multi-rate circuits. Finally, it has very little crosstalk between the analogue and digital parts of the circuit. As a trade-off for all these advantages, it usually has a larger area than the synchronous version of a circuit.

Table 1.2 compares the area and energy consumption of an NCL 80C51 microcontroller and two different synchronous implementations of it in 0.25 µm [5]. As shown, the area of the NCL circuit is almost 1.6 to 1.7 times those of the synchronous ones while its energy consumption at the same effective frequency is 3 to 4 times less.

¹ e.g., the loop that reads the current count value of a counter from a register and writes the next count value back into the register.

Table 1.2: Comparison of NCL and two synchronous microcontrollers at same effective frequency [5]

Design     Routed Area (mm²)   Energy (mW/MHz)   Processor Speed
NCL8051    259,935             47                36.5 MHz (effective)
DW8051A    179,772             157.4             36.5 MHz (actual)
DW8051B    169,740             182.4             36.5 MHz (actual)

It is unquestionable that the area overhead of NCL is a disadvantage for commercial products with a limited budget. However, for applications in which power consumption is a critical factor, its use of 3 to 4 times less energy may be worth the cost of its 70% larger area [5].

An NCL-based design, when compared with its synchronous counterparts, has lower power, EMI and substrate noise, and a higher tolerance of PVT variations. Fig. 1.6 (a) depicts an NCL version and a synchronous version of a pseudo-random generator circuit next to analogue circuitry. The impact of the substrate noise introduced by each of these two circuits on the analogue circuitry is shown in Fig. 1.6 (b), which demonstrates that the coupled noise generated by the NCL-based circuit is 25 dB less than that of the synchronous pseudo-random generator.
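As a quick check of two figures quoted above, the following sketch recomputes the energy ratios from the values in Table 1.2 and converts the 25 dB noise difference into linear factors. The text does not state whether the 25 dB refers to an amplitude or a power quantity, so both conversions are shown; the function and variable names are introduced here for illustration only.

```python
# Energy per MHz values copied from Table 1.2 (mW/MHz).
ENERGY_MW_PER_MHZ = {"NCL8051": 47.0, "DW8051A": 157.4, "DW8051B": 182.4}


def energy_ratio(sync_name, ncl_name="NCL8051"):
    """How many times more energy the synchronous design uses than the NCL one."""
    return ENERGY_MW_PER_MHZ[sync_name] / ENERGY_MW_PER_MHZ[ncl_name]


def db_to_amplitude_ratio(db):
    """Linear factor if the dB value describes an amplitude (voltage/current)."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Linear factor if the dB value describes a power quantity."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    for name in ("DW8051A", "DW8051B"):
        print(name, "uses", round(energy_ratio(name), 2), "x the NCL energy")
    print("25 dB as amplitude ratio:", round(db_to_amplitude_ratio(25), 1))
    print("25 dB as power ratio:", round(db_to_power_ratio(25), 1))
```

Both energy ratios fall between 3 and 4, in line with the "3 to 4 times less" statement.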

Fig. 1.6. Photomicrographs showing: (a) low-noise characteristics of NCL; and (b) measured substrate-induced noise of synchronous and NCL circuits [5]

The graphs in Fig. 1.7 illustrate the changes in the energy consumption of NCL and synchronous versions of a safety-critical circuit over a temperature profile whereby, as the temperature increases, the NCL circuit consumes 7× less energy than the synchronous one.

Fig. 1.7. Impact of NCL on application’s energy requirements [5]

Having considered the significant benefits of NCL design, the current work concentrates on proposing a complete test strategy for this robust asynchronous technique. The next subsection discusses the challenges of testing NCL-based circuits.

1.5.1 Challenges of testing Null Convention Logic (NCL)

To implement testing techniques for NCL circuits, firstly we need to investigate the challenges of developing testing procedures for this asynchronous design method, including:

1. the lack of a clock signal and deterministic timing in an NCL circuit results in the time management for testing it being more complex as it is not clear when new test data can be applied to the DUT and when the output from the DUT is ready to be captured and checked against the expected values; and

2. as the state-holding NCL gates exhibit unexpected responses to stuck-at faults on their gate internal feedback (GIF), in the process of testing an NCL circuit, detecting these faults has always been extremely difficult [62], [63].
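The second challenge can be illustrated with a purely behavioural sketch of an m-of-n NCL threshold gate with hysteresis: the output is set when at least m inputs are asserted, is reset only when all inputs return to 0, and otherwise holds its previous value through the feedback. The model below, including the function names and the way a stuck-at fault on the feedback is injected, is an assumption made for illustration; it abstracts away the transistor-level behaviour and is not the fault model implementation used in this thesis.

```python
def thmn_next(inputs, prev_out, m, feedback_fault=None):
    """Next output of an m-of-n threshold gate with hysteresis.

    feedback_fault: None (fault-free), 0 (feedback stuck-at-0) or
    1 (feedback stuck-at-1).
    """
    feedback = prev_out if feedback_fault is None else feedback_fault
    set_cond = sum(inputs) >= m   # at least m inputs asserted
    hold_cond = any(inputs)       # not every input has returned to 0
    return int(set_cond or (feedback and hold_cond))


def run(sequence, m, feedback_fault=None):
    """Apply a sequence of input tuples, starting from output 0."""
    out, trace = 0, []
    for inputs in sequence:
        out = thmn_next(inputs, out, m, feedback_fault)
        trace.append(out)
    return trace


# One NULL -> DATA -> NULL cycle for a 2-of-2 gate (TH22).
SEQ = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]

if __name__ == "__main__":
    print("fault-free :", run(SEQ, 2))
    print("stuck-at-0 :", run(SEQ, 2, feedback_fault=0))
    print("stuck-at-1 :", run(SEQ, 2, feedback_fault=1))
```

In this abstraction, the three traces differ only while the inputs are partially asserted; at the complete DATA input (1, 1) and the complete NULL input (0, 0) the faulty gates produce the same output as the fault-free one, which gives some intuition for why such faults are hard to observe.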

Previous work conducted on testing NCL gates has at least one of the following shortcomings:

• ignoring the faults on the GIF and, therefore, decreasing the reported fault coverage by 20% [62], [64];

• changing the internal structure of NCL gates which imposes an area overhead, slows down the circuit and/or leaves an average of 13% of faults in the design undetected in testing due to the additional test hardware [63], [65]; and

• using clocked hardware for testing a clockless design [62], [63] which introduces clock-tree distribution, synchronous/asynchronous interfaces and the possibility of metastability issues and timing violations in highly reliable NCL systems [61]. Also, this hardware may decrease the tolerance of an NCL system to PVT variations. Unlike synchronous circuits that work based on the worst-case delay, NCL circuits operate according to the average-case timing. Therefore, an NCL design involves minimal timing assumptions, which makes it highly tolerant of PVT variations. Introducing a clocked design-for-test (DFT) structure into an NCL design imposes timing assumptions that can limit the PVT variation tolerance of the original circuit, especially if the DFT-inserted design is to be tested online.

Having looked at the motivations behind this study, we describe in the next section the outline of this thesis, its objectives, evaluation criteria and contributions.

1.6 Outline of thesis

This study proposes a complete testing strategy for the highly robust asynchronous circuit design technique NCL [16]. Although we focus on testing NCL circuits, the proposed techniques are fundamentally applicable to all delay-insensitive asynchronous designs. This section looks at the objectives of this work and its measurable evaluation parameters, selected implementation and assessment methods, and main contributions to the literature.

1.6.1 Objectives

The main objective of this research is to provide a complete test and testability methodology for the physical testing of NCL. The chosen fault model is the single stuck-at fault model, which is the most general one capable of approximating the majority of transistor-level and transient faults [12]. Also, other fault types, such as bridging and multiple stuck-at ones, are detectable by test vectors generated for single stuck-at faults [12].

The purpose behind this work is to develop testing techniques that can detect more than 99% of the single stuck-at faults in an NCL circuit. The measurable factors for evaluating the quality of the proposed test techniques are the fault coverage, the number of test vectors, test time and area overhead (for a definition of these terms, please refer to Chapter 2, Section 2.2). Part of the objectives of this work is to maximise the fault coverage while minimising the number of test vectors, test time and area overhead.
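To illustrate how fault coverage relates to a set of test vectors under the single stuck-at fault model, the following minimal sketch injects each stuck-at-0 and stuck-at-1 fault into a tiny, hypothetical combinational circuit (n1 = a AND b, z = n1 OR c) and counts the faults for which at least one vector produces an output different from the fault-free response. It is a Python toy written for this explanation only; the circuit, net names and functions are assumptions and do not represent the Verilog/PLI fault simulator used in this work.

```python
NETS = ["a", "b", "c", "n1", "z"]  # hypothetical circuit nets


def simulate(vector, fault=None):
    """Evaluate z for one input vector; fault is (net, stuck_value) or None."""
    values = {}

    def drive(net, value):
        # A stuck-at fault overrides whatever the net would normally carry.
        values[net] = fault[1] if fault and fault[0] == net else value

    a, b, c = vector
    drive("a", a)
    drive("b", b)
    drive("c", c)
    drive("n1", values["a"] & values["b"])
    drive("z", values["n1"] | values["c"])
    return values["z"]


def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by at least one vector."""
    faults = [(net, v) for net in NETS for v in (0, 1)]
    detected = [
        f for f in faults
        if any(simulate(vec, f) != simulate(vec) for vec in vectors)
    ]
    return len(detected) / len(faults)


if __name__ == "__main__":
    print(fault_coverage([(1, 1, 0)]))
    print(fault_coverage([(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]))
```

The example also shows the trade-off named in the objectives: adding vectors raises the coverage, so a good test set is one that reaches the target coverage with as few vectors as possible.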

1.6.2 Implementation and evaluation methods

Most of the methods proposed in this study are implemented using testbenches written in the hardware description language (HDL) Verilog and simulated using Mentor Graphics' ModelSim™ CAD simulation tool. For the purpose of HDL simulations, a delay-annotated NCL gate library is developed in 0.18 µm and 45 nm. The fault injection, fault removal and fault collapsing functions are the same as those implemented in previous work [17] using the procedural language interface (PLI) of Verilog. The reasons for choosing an HDL/PLI for implementation and assessment of the proposed methods are explained in Appendix A. The test hardware developed (please see Chapter 2, subsection 2.2.3 and Chapter 5) is implemented using Verilog and its test programs are written using Verilog testbenches and PLI functions, with the fault coverage, number of test vectors and test time measured in the testbenches.

For IDDQ testing (more details are provided in Chapter 2, subsection 2.2.4 and Chapter 4), Hspice is used to implement good and faulty NCL gate libraries, and also to simulate and measure the power-supply current. Python programming is used to automate the conversion of the HDL NCL circuit into Hspice, inject faults into the DUT and run the Hspice simulation in an automatic manner. Also, the resultant fault coverage, number of test vectors and test time are collected using Python.

In this study, “fault” means “single stuck-at fault”. Also, in order to avoid confusion, the inputs and outputs of circuits are called “primary inputs (PIs)” and “primary outputs (POs)” to distinguish them from the inputs and outputs of gates.

1.6.3 Contributions of this research

The main contributions of this work include the following.

• Managing the timing of automatic test pattern generation (ATPG) and DFT for NCL circuits without using a clock signal. This is achieved using a proposed mixed-timing method extracted from the handshaking signals of the DUT and a fault-free model (golden model) (Chapter 3, subsection 3.2.4).

• Demonstrating that some faults in an NCL circuit are untestable using any given voltage test (Chapter 3, subsection 3.3).

• Both theoretically and experimentally showing that an IDDQ test can complement the proposed ATPG to achieve a greater than 99% fault coverage for NCL GIF faults (Chapter 4).

• Proposing DFT techniques specifically designed for NCL when the specifications of a circuit require it for a better test performance (Chapter 5).

• Achieving 13% more fault coverage, 50% less delay and 32% less area overhead compared to previous work.

1.7 Previous work

In this section, we summarise some of the most important studies conducted on the test and testability of asynchronous circuits, focusing more on those on the testing of delay-insensitive circuits, especially NCL.

It is recommended that readers who are unfamiliar with the concepts of NCL and/or test and testability read the related background information in Chapter 2 before proceeding.

1.7.1 Test and testability of asynchronous circuits

This section concentrates on the existing work published on the testing of non-delay-insensitive asynchronous circuits. Although the focus of this study is on the test and testability of a delay-insensitive asynchronous design method, these other test methods help us gain a better understanding of the general challenges inherent in testing asynchronous circuits.

Sequential ATPG methods with hazard identification were proposed in [66], [67] and [68]. In most asynchronous methods, metastability, hazards and races can occur and, due to the lack of a clock signal, their effects cannot be masked. However, if sufficient care is taken to make an NCL circuit observable and orphan-free (Chapter 2, subsection 2.1.3.1), hazards and metastability will not occur in it, so this is not an issue for the development of its ATPG.

Reference [69] focused mainly on developing a DFT for an asynchronous interconnect architecture called CHAIN which, it was hoped, could be applied to all types of asynchronous interconnections used for GALS, focusing on C-element-based circuits or circuits with one feedback. The drawback of this study was its use of a clocked LSSD-based DFT for an asynchronous design.

The method proposed by Efthymiou et al. [59], which was based on synchronous sequential ATPG algorithms, initialises the internal state of the DUT before applying test vectors to its primary inputs. The first step in this initialisation-based TPG algorithm is breaking the global loops using scan chains. In the second step, as all the state-holding elements (C-elements) are considered equivalent to the flip-flops in synchronous design, they define the internal state of the CUT. The third step involves finding a set of {test pattern, internal state} that detects each fault using deterministic ATPG (PODEM [15]). Finally, the fourth step determines the sequence of test vectors that put the DUT into the internal state established in the third step.
The main contribution of the test approach presented by Efthymiou et al. was finding hazard-free test patterns for asynchronous DUTs. However, as previously mentioned, as an NCL circuit is hazard-free by nature, it would not benefit from hazard-free test generation. This method used scan cells, but the area overhead and synchrony/asynchrony of the scan cells were not reported. Moreover, the method proposed in [59] was based on the scarcity of state-holding elements and functional loops in asynchronous circuits, an assumption that may not hold at the scale of real-size NCL circuits.

Kang et al. [70] proposed a scan latch with a low area overhead for efficiently detecting stuck-at and delay faults in asynchronous micro-pipeline circuits. Their method improved the fault coverage for delay faults which, in turn, increased the controllability of the second pattern needed to detect a delay fault. Their proposed method demonstrated a 4% improvement in average fault coverage for five asynchronous ISCAS benchmarks over that in previous work. However, its reported fault coverage is still low and, for one circuit (AS13207), only 21.40%. Furthermore, the clock tree added to the asynchronous circuit may cause metastability issues in the DUT.

1.7.2 Test and testability of delay-insensitive circuits

Cheng et al. [64] proposed a fully asynchronous scan cell for testing dual-rail circuits by modifying an original latch composed of two C-elements and a NOR gate into a scan cell with two C-elements, an XOR gate and three hazard-free multiplexers, on the inputs and output of the latch. Then, they designed a scan architecture which, instead of connecting the closest asynchronous registers of the DUT, joins the registration bits with the next registration stages in an interleaving manner to retain a periodic Null/Data arrangement for the scan cells. As their method is a DFT one that does not include gate-internal feedback faults, it did not cover approximately 20% of the faults of the NCL DUT. The experimental results obtained from this work indicated a 55.52% area overhead for the data path of an 80C51 microcontroller.

The scheme presented by LaFrieda and Manohar [65] increased the fault tolerance of quasi-delay-insensitive (QDI) circuits without developing an ATPG. The authors suggested modifications to the gates and the completion detection circuitry. Their main goal was to translate each possible fault into a deadlock in the QDI system. Although their approach covers faults on the feedback paths of the QDI gates and decreases the test time, the reported results showed that it imposes up to a 32% area overhead and 50% slower timing on the circuit.

1.7.3 Test and testability of Null Convention Logic (NCL)

In their test method for NCL circuits, Kondratyev et al. [62] first proved that, in acyclic NCL circuits (i.e., those with no computational feedback loops), fault detection in the registration stages and completion detection circuitry is made redundant by detecting faults in the combinational components. Based on this idea, the authors proposed that the test algorithm should eliminate the registration and completion detection circuitry, concatenate the combinational parts of an NCL circuit, and replace the NCL gates with their Bool-set and Bool-reset equivalents for the GoToData and GoToNull phases, respectively. As the original NCL DUT thereby changes to a purely combinational circuit, conventional combinational ATPG algorithms can be used for the resultant circuit. For cyclic NCL DUTs, Kondratyev et al. proposed that the test algorithm should first convert them to acyclic circuits using clocked scan cells to break their functional loops. Their experimental results showed a maximum area overhead of 23% for a Viterbi decoder. Such large clocked DFT hardware can introduce an extensive clock tree into an asynchronous DUT. This method concentrated on gate input faults and did not consider the GIF faults of NCL gates.

Satagopan et al. [71] designed an automated test-point insertion tool for increasing the controllability of feedback loops in NCL circuits. They first improved the observability of the DUT using XOR trees but, despite the resultant high area overhead, the fault coverage achieved was very low. Therefore, they proposed using scan latches to improve the observability of the NCL DUT and improved the path delay of their scan chain using a balanced tree structure for scan latches. To implement their proposed automatic DFT insertion tool, they used standard CAD tools from vendors such as Mentor Graphics™ and Synopsys™. Their final design converts NCL gates into their equivalent Boolean gates that can be understood by conventional ATPG tools. Their experimental results showed a 75% improvement in fault coverage with less than a 5% area overhead. Given the lack of automatic CAD tools for designing and testing NCL circuits, this tool is very valuable. However, using a clocked DFT in it could cause metastability issues in an otherwise highly robust NCL DUT. Also, given that their method replaces NCL gates with their GoToData part before test generation, it seems that their proposed ATPG does not cover faults on the GIF of NCL gates.

In [72], Satagopan et al.
improved the automatic DFT insertion tool they proposed in [71], firstly by enhancing its ATPG gate library: a feedback line was added to the Boolean equivalent of each NCL gate so that the hysteresis behaviour of the NCL gates could be considered at the time of test generation. However, using the techniques in [71] and adding testability points using XOR gates to the CUT had a very insignificant effect on the fault coverage; for example, that reported for a 4x4 dual-rail full-word pipelined multiplier with no DFT was 16% and adding XOR gates increased it to just above 21%. The contribution of its scannable latch insertion to fault detection was also trivial and resulted in a still unacceptable 45% fault coverage. Finally, in [72], by breaking the internal feedback loop of all NCL gates using a scannable latch, almost 100% of faults were detected. However, this test strategy imposed an unreasonably high gate overhead of at least 85% on the DUT. This is unacceptable, particularly because the additional test hardware is synchronous, which could significantly compromise the reliability and power efficiency of the NCL CUT.

The test methods proposed by Al-Assadi et al. in [73], [74], and [63] offer improvements over those presented in [72]. Here, we explain the contributions of only [63] as they also cover those of the methods presented in [73] and [74]. The unacceptably high area overheads of the methods in [72] were reduced in [63] by replacing the scannable latch on the feedback loop of NCL gates with an AND gate that combines the GIF with an added test-enable signal. The automatic DFT insertion tool ADIF was improved to reduce the area overhead. One improvement was using preliminary SCOAP observability figures and adding testability points to unobservable lines of the circuit by XORing them in groups using 2-, 3- and/or 4-input XOR gates (based on the criticality of the unobservable point). The experimental results showed that the 85% minimum area overhead for the NCL circuit examined in [72] was improved to an average of 37% in [63]. This test method uses clocked logic to test a clockless system and adds an average area overhead of 17% to each type of NCL gate. Furthermore, it may not cover faults on the added test-enable signal, which means that, on average, 13% of the faults in the circuit will not be covered.

Table 1.3 shows a comparison of our proposed techniques and the features of existing methods for testing the delay-insensitive and quasi-delay-insensitive categories of asynchronous design. As described in this section, all the existing approaches have at least one drawback (listed in the last row in Table 1.3) that our method overcomes. In this table, tick (✓), cross (×) and question mark (?) mean “included”, “not included”, and “not clear”, respectively.

1.7.3.1 Previous work on IDDQ testing of NCL circuits

For decades, the quiescent current (IDDQ) test has been used in synchronous design to detect particular types of faults, including hard-to-detect stuck-at ones [13], [75]. Roncken et al. [76] used it for the QDI handshake control logic of 4-phase handshaking single-rail data asynchronous circuits in 0.8 µm technology. However, to the best of our knowledge, no work has been undertaken on testing NCL circuits using the IDDQ test. The main differences between the work by Roncken et al. and that in this study are:

• Roncken et al.’s work was on the QDI handshake control logic of 4-phase handshaking single-rail data asynchronous circuits while ours is on NCL;

• we show that for a clockless DFT-less test strategy, the IDDQ test is essential for some GIF faults, and we apply it to only these faults while Roncken et al. used it as a complement to a voltage test for all faults;

• Roncken et al. added DFT to increase the effectiveness of IDDQ test while we do not need DFT to test NCL circuits; and

• Roncken et al.’s work was on 0.8 µm and ours is on 45 nm technology.

As stated above, all the previous studies conducted on the test and testability of NCL circuits have some room for improvement. This thesis is the result of our attempt to learn from the advantages of those studies, overcome their drawbacks and provide a complete testing strategy for NCL. The following section presents the structure of this dissertation.

Table 1.3: Comparison of previous work and proposed methods

[Table body not recoverable from the extracted text. Attributes compared: reference, year published, asynchronous method, DFT insertion (full/partial scan, clockless DFT, test point insertion, DFT overhead (%)), sequential ATPG, test tools used, faults on GIF, largest test case, number of gates, fault coverage and drawbacks. Works compared: the proposed method, [64], [59], [63], [69], [65] and [62], published between 2002 and 2016.]

1.8 Structure of thesis

The remainder of this thesis is organised as follows. Chapter 2 presents a brief introduction to the background material required for this work, with Section 2.1 explaining details of NCL and Section 2.2 the standard concepts of test and testability. Chapter 3 describes the proposed self-timed ATPG for NCL and the motivation for combining it with an IDDQ test. Chapter 4 explains our proposed clockless test hardware and architecture, and Chapter 5 discusses the IDDQ test for faults on the GIF of static and semi-static NCL gates. The experimental results obtained in each part of this work are included at the end of the related chapter. Finally, Chapter 6 draws conclusions and offers suggestions for future work.

1.9 Summary

In this chapter, by looking at trends in the semiconductor industry, we described how different methods are now required to keep up with the relentless demand for cheaper, faster, smaller circuits. Asynchronous design is one method that offers the potential for low-power, highly reliable and high-performance circuits. It has a long history of successful products, such as a clockless ARM™ processor by Philips Handshake Solutions™ (that requires almost 3× less power than a clocked version) and high-performance FPGAs by Achronix™ (that are 3× faster than those by Xilinx™ and Altera™). However, despite its great potential, asynchronous design still has not been adequately integrated into the electronics industry. Two major reasons for this are the lack of commercial CAD tools and the shortcomings of the test and testability techniques available for asynchronous circuits. By examining previous work on the physical testing of asynchronous circuits, it became apparent that, despite the great deal of valuable research undertaken, there is an opportunity to develop a complete test solution that is clockless, considers all faults in a DUT and does not require extra hardware for testing combinational asynchronous circuits. The possibility of conducting research on the test and testability of asynchronous circuits is exciting as it is one step towards developing an electronics industry in which, when suitable, asynchronous design can be embraced as an option for low-power and high-reliability design.

This study attempts to provide a complete testing framework for a delay-insensitive class of asynchronous circuits called NCL. Apart from the general benefits of asynchronous circuits, by nature, NCL is glitch-less, hazard-free, very tolerant of process, voltage and temperature (PVT) variations, and has been successfully used for high-security data transfer applications. These advantages, described in detail in this chapter, are the reasons for the topic of this study being the testing of NCL circuits. Also, the motivations for, background to, objectives of, and criteria and framework for developing a thorough testing technique, including ATPG, IDDQ testing and DFT specifically designed for delay-insensitive circuits, were discussed.

Chapter 2

Background

As, in the author’s experience, it is uncommon for readers to have an extensive knowledge of both NCL and test and testability concepts, some background material on these two topics is provided in this chapter. This background will also help the reader to gain a better understanding of the challenges involved in testing NCL circuits and the benefits and shortcomings of the methods proposed in this thesis.

In this chapter, Section 2.1 presents a summary of the fundamentals of NCL, introduces its three core conventions, explains the structure of its gates and, finally, discusses the timing and control of NCL circuits. Section 2.2 explains the preliminaries of the physical post-fabrication test and testability of electronic circuits. It firstly clarifies the basic concepts of test, such as the differences between a fault, error and failure, the cost of testing, the fault model used and the criteria for evaluating the proposed techniques. Then, it explains the fundamentals of automatic test pattern generation (ATPG), design for test (DFT), and quiescent current (IDDQ) test, laying the foundation for the techniques presented in Chapters 3, 4 and 5, respectively. Finally, Section 2.3 summarises this chapter.

2.1 Null Convention Logic (NCL)

NCL is a delay-insensitive asynchronous method whereby the outputs are independent of the inputs’ arrival times [16]. Its first principle is the Null convention: for a wire that can carry a high or a low voltage, one state is assigned as Data and the other as Not Data or Null. This is in contrast to the common practice of one state meaning Data value 1 and the other Data value 0.

With only one state indicating Data, the second principle of NCL is the multi-rail convention. Under it, a variable with more than one value needs multiple rails, i.e., a separate rail for each possible value; for example, a binary variable (A) needs two rails (two wires), A1 and A0, representing Data values of 1 and 0, respectively, only one of which can be Data at a time. Variable A is then encoded on wires A1 and A0 as {A = Data0: A1A0 = 01} and {A = Data1: A1A0 = 10}. Note that {A1A0 = 11} is considered illegal because the rails are mutually exclusive while {A1A0 = 00} indicates that A is empty or Null. This value is used as a spacer to “wash” the circuit into a known state before subsequent Data is introduced.

The third principle of NCL is the completion convention whereby the patterns of completeness for the two states are Null-completeness and Data-completeness. It is considered that a pattern of Null-completeness is detected on a set of variables (A1, A2, ..., An) if each variable has Null on both its rails, i.e., Ai0 Ai1 = 00. A Data-completeness pattern is detected when each variable has one and only one Data on its dual rails, i.e., Ai0 Ai1 = 01 or Ai0 Ai1 = 10. The inputs to an NCL circuit must change monotonically between Data-completeness (in a Data wavefront) and Null-completeness (in a Null wavefront).
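The multi-rail and completion conventions can be illustrated with a minimal Python sketch. The function names (`encode`, `is_legal`, `null_complete`, `data_complete`) and the tuple representation (A1, A0) are assumptions made here for illustration only; they are not part of any NCL tool.

```python
from typing import Optional, Sequence, Tuple

# A dual-rail variable is modelled as a pair (A1, A0); this layout is an
# assumption of this sketch.
Rails = Tuple[int, int]


def encode(bit: Optional[int]) -> Rails:
    """Map a Boolean value (or None for Null) onto the rails (A1, A0)."""
    if bit is None:
        return (0, 0)                      # Null spacer: both rails low
    return (1, 0) if bit else (0, 1)       # Data1 -> A1A0 = 10, Data0 -> 01


def is_legal(value: Rails) -> bool:
    """The rails are mutually exclusive, so A1A0 = 11 is illegal."""
    return value != (1, 1)


def null_complete(word: Sequence[Rails]) -> bool:
    """Null-completeness: every variable has Null on both rails."""
    return all(v == (0, 0) for v in word)


def data_complete(word: Sequence[Rails]) -> bool:
    """Data-completeness: every variable has exactly one rail at Data."""
    return all(v in ((0, 1), (1, 0)) for v in word)


data_word = [encode(1), encode(0)]          # complete Data wavefront
spacer = [encode(None), encode(None)]       # complete Null wavefront
partial = [encode(1), encode(None)]         # wavefront still arriving
```

A partially arrived wavefront such as `partial` is neither Data-complete nor Null-complete, which is the situation that completion detection has to wait out.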

2.1.1 Structure of NCL gates

NCL achieves its delay-insensitivity feature using a set of fundamental gates called threshold gates which are usually defined as THmnwkl. Fig. 2.1 shows THmn where the output is only asserted when at least m of n inputs are Data. If there is a weights portion (wkl), the first and second inputs have weights equal to k and l times that of the other inputs, respectively. Only weights greater than 1 are specifically mentioned after the w in a gate’s name. The output changes to Data when the condition of the gate, which is a Boolean expression, is satisfied. However, the asserted output remains as Data until all its inputs are Null. This behaviour, which is called hysteresis, forms the basis of NCL’s asynchronous property. As an example, TH34w3 is an NCL gate with four inputs (A, B, C, D), where A is weighted as 3, and the Boolean expression for asserting the output of this gate is “A + BCD”.

Fig. 2.1. NCL threshold m-of-n gate - THmn (k = l = 1)

Fig. 2.2 demonstrates the hysteresis behaviour of the TH22 gate. When its inputs are {A,B} = (0,0), it has an output value of 0. When either A or B transitions to 1 and the other remains at 0, the output remains at 0. When both A and B have values of 1, the output transitions to 1. Once the value of the gate’s output is 1, it remains 1 until both A and B transition to 0. The arcs between the (1,1)1 and (1,0)1 nodes, and between the (1,1)1 and (0,1)1 nodes, show that, once the output is 1, it remains 1 regardless of the transitions of the gate’s inputs until both transition to 0, which results in a value of 0 for the gate’s output.

Fig. 2.2. Hysteresis behaviour of TH22 gate
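The behaviour of a weighted threshold gate with hysteresis can be sketched behaviourally as follows. This is a simplified model under the assumptions stated in the comments (a hypothetical `ThresholdGate` class, inputs given as 0/1 values); it is not the transistor-level implementation discussed in the next subsections.

```python
from typing import Sequence


class ThresholdGate:
    """Behavioural sketch of an NCL THmn gate with optional input weights."""

    def __init__(self, threshold: int, weights: Sequence[int]):
        self.threshold = threshold
        self.weights = tuple(weights)
        self.output = 0                       # assume the gate starts at Null

    def step(self, inputs: Sequence[int]) -> int:
        if len(inputs) != len(self.weights):
            raise ValueError("one input value is needed per weight")
        weighted_sum = sum(w * x for w, x in zip(self.weights, inputs))
        if weighted_sum >= self.threshold:
            self.output = 1                   # threshold met: go to Data
        elif not any(inputs):
            self.output = 0                   # all inputs Null: go to Null
        # otherwise the gate holds its previous output (hysteresis)
        return self.output


# TH22: the output follows the state sequence described for Fig. 2.2.
th22 = ThresholdGate(threshold=2, weights=(1, 1))
trace = [th22.step(v) for v in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]

# TH34w3: input A has weight 3, giving the set condition A + BCD.
th34w3 = ThresholdGate(threshold=3, weights=(3, 1, 1, 1))
```

For the TH22 input sequence above, the output stays at 0 for (1,0), rises at (1,1), holds at (0,1) and only returns to 0 at (0,0).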

2.1.1.1 Static NCL gates

To form hysteresis behaviour, the static transistor-level implementation of NCL gates needs the four networks shown in Fig. 2.3. The “GoToNull” and “HoldNull” networks are made of PMOS transistors and the “GoToData” and “HoldData” networks of NMOS transistors. Each network is based on a Boolean expression of the inputs of the gate. The relationships among these networks are shown in equations (2.1) to (2.4). The wires marked $Z_n^*$ and $Z_d^*$ in Fig. 2.3 are the gate internal feedback (GIF) lines of static NCL gates.

Fig. 2.3. Static transistor level-implementation of NCL gates

“GoToData” in equations (2.1) and (2.4) is asserted when the threshold function is met or the associated Boolean equation fulfilled, “GoToNull” in equations (2.2) and (2.3) when all the inputs are Null (0), “HoldData” in equations (2.1) and (2.3) when at least one rail remains Data and “HoldNull” in equations (2.2) and (2.4) when the threshold function is not met.

$$Z_d = \mathrm{GoToData} + (Z_d^{*} \cdot \mathrm{HoldData}) \qquad (2.1)$$
$$Z_n = \mathrm{GoToNull} + (Z_n^{*} \cdot \mathrm{HoldNull}) \qquad (2.2)$$
$$\mathrm{HoldData} = \overline{\mathrm{GoToNull}} \qquad (2.3)$$
$$\mathrm{HoldNull} = \overline{\mathrm{GoToData}} \qquad (2.4)$$

2.1.1.2 Semi-static NCL gates

The semi-static transistor-level implementation of NCL gates requires the two networks shown in Fig. 2.4. The “GoToNull” one is made of PMOS transistors and conducts when all the inputs of the gate are Null. On the other hand, the “GoToData” one is made of NMOS transistors and conducts when the condition of the gate, which is a Boolean expression, is satisfied. In the semi-static implementation, the hysteresis behaviour of NCL is implemented through the weak inverter that feeds the output of the gate back to the internal node (M).

Fig. 2.4. Semi-static transistor-level implementation of NCL gates

2.1.2 NCL gate library

As NCL achieves its delay insensitivity using threshold gate techniques, a standard gate library is defined. Table 2.1 presents all the 27 possible threshold gates with four inputs or less as well as the number of transistors in their static and semi-static implementations with their symbols.

2.1.3 Timing and control in NCL

As shown in Fig. 2.5, the computational components in an NCL pipeline are surrounded by two asynchronous registration stages, each with a request signal (Ki) and an acknowledgement signal (Ko). Completion detection circuitry (CDC) is responsible for detecting Null- or Data-completeness in each of these registration stages.

Table 2.1: NCL gate library

NCL gate    Boolean function                  #Tr (static)   #Tr (semi-static)
TH12        A + B                             6              6
TH22        AB                                12             8
TH13        A + B + C                         8              8
TH23        AB + AC + BC                      18             12
TH33        ABC                               16             10
TH23w2      A + BC                            14             10
TH33w2      AB + AC                           14             10
TH14        A + B + C + D                     10             10
TH24        AB + AC + AD + BC + BD + CD       26             16
TH34        ABC + ABD + ACD + BCD             24             16
TH44        ABCD                              20             12
TH24w2      A + BC + BD + CD                  20             14
TH34w2      AB + AC + AD + BCD                22             15
TH44w2      ABC + ABD + ACD                   23             15
TH34w3      A + BCD                           18             12
TH44w3      AB + AC + AD                      16             12
TH24w22     A + B + CD                        16             12
TH34w22     AB + AC + AD + BC + BD            22             14
TH44w22     AB + ACD + BCD                    22             14
TH54w22     ABC + ABD                         18             12
TH34w32     A + BC + BD                       17             12
TH54w32     AB + ACD                          20             12
TH44w322    AB + AC + AD + BC                 20             14
TH54w322    AB + AC + BCD                     21             14
THxor0      AB + CD                           20             12
THand0      AB + BC + AD                      19             13
TH24comp    AC + BC + AD + BD                 18             12

(Gate symbols not reproduced.)

The values of Ki and Ko translate as follows:

• Ki = 0: the next pipeline stage is ready for Null;

Fig. 2.5. NCL registration stage (bold) and completion detection (grey)

• Ki = 1: the next pipeline stage is ready for Data;

• Ko = 0: the current stage is finished with the last set of Data and its input is ready to receive a new set of Null from the previous stage; and

• Ko = 1: the current stage is finished with the last set of Null and its input is ready to receive a new set of Data from the previous stage.

2.1.3.1 NCL is hazard-free and glitch-less

All NCL systems must satisfy two criteria to be delay-insensitive: input-completeness and observability.

• Input-completeness: not all outputs of an NCL circuit may transition from Null to Data (Data to Null) until all its inputs transition from Null to Data (Data to Null) [61]. In other words, at least one output must remain in the current state of the circuit (Null or Data) until all the inputs transition to a new state [61].

• Observability: any threshold gate that transitions must cause at least one output to transition [61]. A wire that transitions in a wavefront but does not affect the transition of any circuit output is called an orphan [16]. Observability requires that no orphan passes the boundary of an NCL gate.

An NCL circuit designed with sufficient care paid to its input-completeness and observability requirements is glitch-less and hazard-free. When an NCL pipeline stage receives Null on its inputs (after Data-completeness is detected), its gates transition from Data to Null. In this situation, as there are transitions from only Data to Null, each gate will either not transition (as it is already at Null) or transition exactly once to a correct output. Likewise, when an NCL pipeline stage receives a new set of Data on its inputs (after Null-completeness is detected), there are only transitions from Null to Data. Each gate will either not transition (remain at Null) or transition exactly once to a correct output with no glitches and no hazardous or incorrect transitions. Neither the order nor the timing of inputs is important. During a Data wavefront, as soon as a Data input arrives, it propagates through the circuit and reaches the NCL gates which, if their thresholds are met, transition to Data. Likewise, in a Null wavefront, when Null arrives as an input, it propagates through the circuit and washes the Data off any gate with all-Null on its inputs. This is a crucial concept in NCL behaviour that we will use to form the timing method of our test techniques in Chapter 3. For more detailed information about NCL, please refer to [16], [61].

2.2 Test and testability

Today, integrated circuits (ICs) can be found almost everywhere, including in safety-critical applications such as an implant in a person’s arm, a circuit stimulating a patient’s heart or a chip in a space shuttle. As such applications are vital and cannot tolerate defects or errors, they need to be regularly tested to ensure they have no underlying problems. The manufacture of an IC involves many mechanical and chemical stages, during all of which it is prone to physical imperfections and damage. Some defects simply cause failure, and the defective chip is removed from the production cycle. However, many other defects may only degrade the functionality of the circuit or remain undetected during simple tests and later have lethal consequences. So, it is important that manufactured ICs be thoroughly tested for physical defects before delivery to customers.

In post-production test techniques, stimuli (test vectors) are applied to the primary inputs (PIs), and responses collected from the primary outputs (POs) of the circuit under test (CUT) are compared with the expected values. If inconsistencies are detected, it is considered that there is a fault in the circuit and the chip must be marked as erroneous. Undoubtedly, the greater the number of stimuli, the more thorough the testing process for detecting faults. However, the ever-increasing complexity of ICs makes it infeasible to run an exhaustive test consisting of all the possible 2^n test vectors, where n is the number of PIs. As a practical alternative, a pre-computed test set is required to detect the majority of potential defects (for more details, please see subsection 2.2.2 and Chapter 3). Also, to reduce the vast number of possible defects in a circuit, logical fault models are used to present physical defects at a more abstract level (for more details, please see subsection 2.2.1.3). Therefore, test and testability techniques are vital and include, but are not limited to:

• Fault collapsing which uses equivalence and dominance relationships among faults in a list to minimise the number of faults in that list and, as a result, reduce the time and complexity of the test;

• Test generation generates optimised sets of stimuli and concentrates on detecting the maximum number of faults with the lowest possible number of test vectors;

• Fault simulation simulates the efficiency of a given set of stimuli (the test set) for detecting a given set of faults (the fault list) for a given CUT; and

• Design for test (DFT) designs hardware specifically for the purpose of making the original CUT more testable; minimising the area and delay overheads caused by this hardware is always critical.

In the following subsections, a fundamental introduction to the basics of testing and more details of the test methods mentioned above are provided.

2.2.1 Basics of test

Testing is the process of detecting physical imperfections that may occur during or after the process of manufacturing an IC.

2.2.1.1 Fault, error, failure

Before providing details of test techniques, it is important to clarify the differences between “fault”, “error” and “failure”. The chart in Fig. 2.6 shows the different conditions of an electronic circuit in terms of physical correctness, which are explained as follows.

Fig. 2.6. Multi-level model of system reliability [6]

• Ideal: there is no imperfection in the circuit, and the design accords exactly with its specifications.

• Defective: the circuit has one or more defects. Some examples of physical defects in ICs are shown in Fig. 2.7. In Fig. 2.7(a) and (b), a physical open defect is illustrated in top-down and cross-section SEM images, respectively. The defect shown in Fig. 2.7(c) is a watermark defect that causes incomplete circuit printing, while Fig. 2.7(d) depicts a material contamination defect. The defects shown in Fig. 2.7(e) and Fig. 2.7(f) are an incomplete connection between circuit traces, and shorts and breaks caused by a scratch, respectively. Finally, Fig. 2.7(g) represents an inter-layer short between two aluminium tracks. Also, in the NOR gate in Fig. 2.8(b), the defective wire ‘a’ is shorted to ground.

• Faulty: the circuit has an exposed defect which results in incorrect values or decisions being formed in the circuit; for example, in Fig. 2.8, the short defect between two nodes causes a stuck-at0 fault on wire ‘a’.

• Erroneous: if the effect of a fault spreads to other parts of the circuit and contaminates other components in it, the system is erroneous; for example, in Fig. 2.8, when a = 1 and b = 0, the output from the gate is supposed to have a value of 0 but, because of the stuck-at fault on wire ‘a’, has an erroneous value of 1.

• Malfunctioning: an error built into the system may or may not result in a malfunction; for example, if the system is constructed with redundancies and fault-/error-tolerant techniques, the error will be masked at some point and not affect the system’s functionality.

• Degraded: even if a fault/error does not cause a malfunction, it can result in system degradation; for instance, some delay faults may leave the functionality of a circuit intact but slow it down so that it does not achieve its maximum potential performance. Another example of a degraded system is one with faults that do not affect the correctness of the output but create a path between VDD and ground and, as a result, drain the battery faster than usual or increase the power consumption. We see examples of these types of faults in our study and address them in Chapter 5, where IDDQ testing is used to detect them.

• Failed: either a malfunction or degradation of a system can lead to failures; for instance, if there is a faulty gate in the controller of a microprocessor, the system may be stuck at a particular state and fail to function properly. In the example of system degradation caused by a conducting path between VDD and ground, the power consumption could increase to an unacceptable level and damage the battery or the system itself.
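The NOR-gate example used above for the “faulty” and “erroneous” conditions can be reproduced with a few lines of Python. The function names are hypothetical and the gate is modelled purely at the logic level.

```python
from itertools import product


def nor(a: int, b: int) -> int:
    """Fault-free two-input NOR gate."""
    return int(not (a or b))


def nor_a_stuck_at_0(a: int, b: int) -> int:
    """Same gate with input 'a' shorted to ground (stuck-at0)."""
    return nor(0, b)


# Input vectors for which the fault turns into an observable error.
detecting_vectors = [
    (a, b)
    for a, b in product((0, 1), repeat=2)
    if nor(a, b) != nor_a_stuck_at_0(a, b)
]
```

Only the vector a = 1, b = 0 exposes the fault: the fault-free gate outputs 0 whereas the faulty gate outputs 1, as stated in the “erroneous” example.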

2.2.1.2 Cost of test - rule of 10

As testing is usually the most laborious and costly activity involved in the process of implementing an electronic system [13], engineers are often pressured to compromise it. However, inadequate attention to testing in the right stages of production may degrade the performance of the final product. Postponing the detection of a fault in a product to later steps in the manufacturing process can drastically increase the cost of testing.

Fig. 2.7. Examples of possible defects in electronic circuits: (a) top-down and (b) cross-section SEM images of open defect [7], (c) watermark in water immersion lithography (features under watermark are not printed because of defect) [8], (d) residues of processed film or dissolved contaminant from tool construction material causing short between multiple lines [8], (e) incomplete connection between copper trace and PTH barrel [9], (f) shorts and breaks in metal lines caused by scratch in photoresist [10], and (g) inter-layer short between two Al interconnects [11]

Fig. 2.8. (a) NOR gate symbol, (b) transistor-level NOR with ‘a’ stuck-at0, (c) truth table of the NOR gate without fault (Out_g) and with fault (Out_f)

Fig. 2.9. Test cost pyramid [12]

The pyramid in Fig. 2.9 illustrates the different levels in the circuit design process. An operational electronic system contains sub-systems, each of which has printed circuit boards (PCBs) with multiple devices, each of which may have multiple cores. The so-called “rule-of-10” in testing suggests that the cost of a test increases by a factor of 10 when moving from one level of this pyramid to a lower one [12]. It is crucial that, when cores are put together to form a device, they are already thoroughly tested and device-level testing tests only the interconnections and interfaces between them. The same rule applies to testing devices before putting them on PCBs, testing PCBs before making sub-systems out of them and, finally, testing sub-systems before building an entire operational system. If a faulty core progresses far enough into the production cycle to reach an operational system, the cost of detecting it then is 10,000 times what it would have been at the core level. Therefore, instead of being perceived as an expensive process, testing should be regarded as a value-adding activity.
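The arithmetic behind the 10,000× figure can be made explicit with a small sketch. The level names follow the text above; the function `relative_cost` and the normalisation of the core-level cost to 1 are assumptions made for illustration.

```python
# Integration levels in the order a core moves through production.
LEVELS = ["core", "device", "PCB", "sub-system", "system"]


def relative_cost(detected_at: str, factor: int = 10) -> int:
    """Cost of detecting a faulty core at a given level, relative to
    detecting it at the core level (normalised to 1)."""
    return factor ** LEVELS.index(detected_at)


costs = {level: relative_cost(level) for level in LEVELS}
```

Four integration steps separate a core from an operational system, so the relative cost at system level is 10^4 = 10,000.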

2.2.1.3 Fault model - single stuck-at fault

Because of the vast number of possible defects in an electronic circuit, it is often more practical to reduce the test complexity using logical fault models which adopt mathematical abstractions to model the behaviours of defects. As a result, multiple defects can be modelled by a single fault model. Furthermore, unlike physical defects, fault models are technology independent, and the test generation method that detects them does not need to adapt to changes in the underlying technology.

As the most widely used fault model capable of approximating most transistor-level and transient faults is the single stuck-at one [12], it is the one considered in this work. Also, studies show that other fault types, such as bridging and multiple stuck-at ones, are detectable by test vectors generated for single stuck-at ones [12]. The stuck-at fault model fits in the category of static fault models, which means that the site-of-fault is assumed to be “stuck” at a permanent value and its logic level does not change with variations in the PIs of the circuit. There are two different stuck-at values considered for each wire in the CUT: when a wire is stuck at a logic value of 0 (or ground) or 1 (or VDD), the fault is called stuck-at0 or stuck-at1, respectively.

Even after modelling physical defects using logical fault models, an ATPG still needs to examine an enormous number of faults. To further decrease the complexity of test algorithms, test engineers usually use fault dominance and equivalence relationships to reduce the number of faults that have to be considered by the ATPG [12].

2.2.1.4 Test technique evaluation

The main objective of testing an electronic system is to do so as thoroughly as possible while requiring the least possible cost and time [12]. To evaluate the effectiveness of a certain test method for a certain design under test (DUT), the test engineer needs to measure the following.

• Fault coverage: this is the most important factor for evaluating a test method which indicates the percentage of faults in the fault list that can be detected in the DUT using the test procedure adopted. Techniques such as DFT and ATPG are usually used to improve the fault coverage of a given DUT.

• Test time: although a very high level of fault coverage is of great significance in the testing of a DUT, it is not acceptable to take an unreasonable amount of time to achieve the desired one. In the course of designing electronic circuits, testing already takes an average of 30-40% of the whole design process [13]. Therefore, it is crucial that test methods be designed cleverly with the aim of minimising the time required to obtain the anticipated fault coverage.

• Test storage: for test methods that use automatic test equipment (ATE), it is necessary to consider the ATE’s available memory. If the amount of test data generated to test a DUT requires more storage than what is available, it is essential to improve the test generation algorithm and/or apply test compression/compaction techniques to reduce the size of the test set. While this thesis does not cover the concepts of test compression and compaction, interested readers can refer to [12] for more information about these techniques.

• Area overhead: as previously mentioned, a DFT might be used to improve the fault coverage of a given DUT. In such a case, it is important to keep the area of the added test hardware to a minimum: more test hardware means a larger area for the final product and added delays on the paths of the circuit, and the test hardware itself may remain untested, which decreases the actual fault coverage of the DUT.

In summary, maximising the fault coverage in an acceptable amount of test time within the limits of the available test storage and minimising the area overhead of the test hardware are the main objectives of any high-quality test strategy.
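To make the fault coverage metric concrete, the following sketch performs a very small single stuck-at fault simulation. The example circuit y = (a AND b) OR c, its wire names and the helper functions are assumptions chosen for illustration; they do not come from the benchmarks used in this thesis.

```python
# Wires of the hypothetical example circuit y = (a AND b) OR c,
# where n1 is the output of the AND gate.
WIRES = ("a", "b", "c", "n1", "y")

# Single stuck-at fault list: stuck-at0 and stuck-at1 on every wire.
FAULTS = [(wire, value) for wire in WIRES for value in (0, 1)]


def simulate(vector, fault=None):
    """Evaluate the circuit; 'fault' is (wire, stuck_value) or None."""

    def observe(wire, value):
        # A faulty wire always carries its stuck value.
        return fault[1] if fault and fault[0] == wire else value

    a, b, c = (observe(name, bit) for name, bit in zip("abc", vector))
    n1 = observe("n1", a & b)
    return observe("y", n1 | c)


def fault_coverage(test_set):
    """Percentage of listed faults detected by at least one test vector."""
    detected = {
        fault
        for fault in FAULTS
        for vector in test_set
        if simulate(vector) != simulate(vector, fault)
    }
    return 100.0 * len(detected) / len(FAULTS)


single = fault_coverage([(1, 1, 0)])
full = fault_coverage([(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)])
```

In this sketch a single vector covers only part of the ten-fault list, while the four-vector test set detects every listed fault; test time and test storage grow with the number of vectors, which is the trade-off summarised above.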

2.2.2 Automatic test pattern generation (ATPG)

ATPG belongs to the class of electronic design automation (EDA) methods. It generates a test sequence that, when applied to the PIs of an electronic circuit, enables the ATE to differentiate between correct and faulty circuit behaviour caused by defects.

2.2.2.1 Combinational ATPG

The following are the most common ATPG techniques for combinational circuits.

• Exhaustive: the brute-force technique for test generation applies all the possible test values that the inputs of a circuit can take, i.e., 2n possible stimuli for a circuit with n PIs. Although this test generation method works as easy as running an n-bit counter and its fault coverage is high, it is evident that, for large n, it is impractically time-consuming.

• Pseudo-exhaustive: one possible way of improving exhaustive test generation is to partition a DUT into fan-in cones by backtracking from each PO to the PIs influencing it and use a separate exhaustive test for each cone. In this pseudo-exhaustive test generation, fan-in cones are tested in parallel which reduces the test time but the test vectors generated can result in low fault coverage for the whole DUT.

• Random: in exhaustive test generation, consecutive test vectors are too similar and, most likely, access a similar area of a circuit, and as a result they may detect the same faults. Therefore, a large number of exhaustive test vectors are needed to detect faults in different locations of the circuit. A simple improvement over exhaustive test generation is using random test vectors. By increasing the entropy of test vectors and selecting them from a wider spectrum, there is a higher probability of accessing different parts of the circuit and a greater chance of fault detection with fewer test vectors. In this technique, although generating test vectors is straightforward, applying them to the DUT is still time-consuming.

• Deterministic: although generating test vectors using deterministic ATPGs is usually laborious, these techniques generally achieve high fault coverage and the application of these vectors to the DUT is fast. Some of the most commonly used of these techniques are as follows.

– Sensitised path method [77]: is a heuristic fault-oriented test generation method consisting of two steps:

1. creating a sensitised path from the site-of-fault to a PO; and

2. justifying the input by tracing assignments made to each gate on the sensitised path back to the PIs.

In both steps, many decisions need to be made while tracing the tree of gates, a process which is blind and arbitrarily chooses one of the possible paths with no assurance of it being the best. As a result, this ATPG method does not guarantee the detection of all detectable faults, particularly in the presence of re-convergent fan-outs.

– D-algorithm [78]: this is a modified version of the sensitised path method that is guaranteed to find a test vector for a detectable fault. The main difference between these two approaches is that the D-algorithm sensitises all possible paths from the site-of-fault to the POs of the DUT.

– PODEM (Path-Oriented DEcision Making) [15]: is a more efficient version of the D-algorithm. While the D-algorithm makes a binary decision tree (BDT) around all the nodes in a sensitised path, PODEM expands the BDT only around the PIs and, accordingly, decreases the complexity of the algorithm. Also, it checks if the decision tree has reached a deadlock and, if so, immediately backtracks to the previous node of the BDT and accelerates the test generation process.

In this work, we consider the elements of deterministic test generation for NCL circuits based on PODEM (more details available in Chapter 3, Section 3.5) as it is a basis for many other deterministic test generation algorithms. More advanced deterministic test generation techniques are discussed in the future work section of Chapter 6.
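The trade-off between exhaustive and random test generation described in the list above can be illustrated with a very small fault simulation. The following is a minimal sketch only: the three-input circuit, the net names and the helper functions `simulate` and `fault_coverage` are hypothetical and chosen for illustration, and the fault model is limited to single stuck-at faults.

```python
from itertools import product
import random

# Hypothetical 3-input circuit used only for illustration: y = (a AND b) OR (NOT c)
NETS = ["a", "b", "c", "n1", "n2", "y"]
FAULTS = [(net, value) for net in NETS for value in (0, 1)]  # single stuck-at faults


def simulate(vector, fault=None):
    """Evaluate the circuit for one input vector, optionally with one stuck-at fault."""
    def drive(net, value):
        # A faulty net ignores its driver and keeps the stuck value.
        return fault[1] if fault is not None and fault[0] == net else value

    a = drive("a", vector[0])
    b = drive("b", vector[1])
    c = drive("c", vector[2])
    n1 = drive("n1", a & b)
    n2 = drive("n2", 1 - c)
    return drive("y", n1 | n2)


def fault_coverage(vectors):
    """Fraction of faults for which at least one vector changes the output."""
    detected = {f for f in FAULTS for v in vectors if simulate(v) != simulate(v, f)}
    return len(detected) / len(FAULTS)


exhaustive = list(product((0, 1), repeat=3))             # all 2^n vectors, n = 3
random_subset = random.Random(0).sample(exhaustive, 4)   # 4 distinct random vectors

print(len(exhaustive), fault_coverage(exhaustive))
print(len(random_subset), fault_coverage(random_subset))
```

For this toy circuit the exhaustive set detects every modelled fault, while the coverage of a random subset depends on which vectors happen to be drawn; a deterministic ATPG would instead target each fault directly.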

2.2.2.2 Sequential ATPG

Unlike a combinational circuit, a sequential one has state-holding elements. Therefore, a sequential ATPG must find a sequence of test vectors that places the sequential circuit into a particular state that activates a fault and propagates the effect of that activated fault to the POs. Two common methods for sequential test generation are:

• time-frame expansion; and

• simulation-based [13].

However, the most common approach in the industry for testing sequential circuits is integrating DFT hardware to access the internal state of the CUT through added input/output pins and then applying combinational test techniques to the CUT. More details of this approach are provided in the next subsection.

2.2.3 Design For Test (DFT)

Adding extra hardware to the original DUT to make it more testable is called establishing a DFT, the most commonly used elements of which are as follows.

2.2.3.1 Test point insertion

Even in a fully combinational circuit, there might be sections that are difficult to test as it is possible that no test vector can reach them or the effect of the faults in them is masked before reaching the POs. An internal node of a DUT that has a value which is difficult to set using the PIs is called hard-to-control. Similarly, one with a value that is hard to propagate to a PO is called hard-to-observe. Such nodes decrease the testability of a DUT and increase the cost of testing. The test hardware that can improve the controllability of a hard-to-control node is called a controllability point (CP) while that which can improve the observability of a hard-to-observe node is referred to as an observability point (OP). As shown in Fig. 2.10, the simplest possible implementation for a CP is a multiplexer that passes the controllability value at the time of test and the value generated by the DUT at any other time.

Fig. 2.10. Simplest implementation of a controllability point

Placing test data into the added CPs requires an extra PI for each CP. Since adding pins to the original circuit is very expensive, it is common to use a storage element for each test point as shown in Fig. 2.11, cascade all the CPs together like a chain and add only one serial input for all of them. Then, the test data will be shifted to the chain of CPs through the single added pin. After all controllability values are shifted, they can be applied to the DUT by setting the function mode of the circuit to test.

Fig. 2.11. Implementation of a test point using a flip-flop allows cascading the CPs

Fig. 2.12 shows the simplest implementation method for an OP, which is taking the hard-to-observe wire out of an output pin. However, as for a CP, chained flip-flops tend to be used to collect the data and then shift them to the one pin added to the POs (Fig. 2.11).

Fig. 2.12. Simplest implementation of observability point

2.2.3.2 Scan insertion

The most common reason for using DFT elements in industrial circuits is to decrease the sequential complexity of the CUT [12]. In a sequential circuit, the POs not only depend on the PIs but also on the internal state of the circuit. Fig. 2.13(a) shows the functional loop of a sequential circuit and Fig. 2.13(b) the test hardware, called a scan chain, which can break this loop and provide access (both controllability and observability) to the internal state of this circuit. Fig. 2.14 shows a sample scan cell and the next subsection provides more details of scan chains. In the scan insertion technique, either all or some of the registers of the circuit are connected like a chain, and they can be accessed serially from one PI through one PO. A scan chain sets the required state of a sequential circuit and then, by applying one item of data to the inputs, one item of response data becomes available for collection at the output of the combinational circuit. The structure that scan cells form in a scan chain is called “scan architecture”, with some common ones described below.

Fig. 2.13. (a) sequential circuit with functional loop and (b) scan chain breaking functional loop

Fig. 2.14. Scan cell

• Full scan: all the flip-flops of the DUT are connected to each other to form one scan chain. Although it has a very high area overhead and shift-in/shift-out time, full scan requires only one extra input and one extra output pin, as shown in Fig. 2.15(a). This scan architecture converts the sequential circuit to a fully combinational one at the time of test.

• Partial scan: only the most important or least testable flip-flops of the DUT are connected to each other to form a scan chain. This architecture has a smaller area overhead and shift-in/shift-out time than a full scan one, as shown in Fig. 2.15(b). However, the scan-inserted DUT is not fully combinational at the time of test and sequential test generation is needed for testing it.

• Multiple scan: the flip-flops of the DUT are arranged in different groups to form multiple scan chains. This architecture requires more than one extra input and one extra output pin and has a high area overhead but, from a favourable perspective, decreases the shift-in/shift-out time, as shown in Fig. 2.15(c). Multiple scan architecture can apply to both full and partial scan chains.

Fig. 2.15. Most common scan architectures: (a) full; (b) partial; and (c) multiple

When testing a clocked scan-inserted DUT, it is necessary to clock the circuit in test mode for sufficient cycles to serially shift the desired state into the scan chain. Then, a test vector is applied to the PIs and the circuit clocked in the function mode. The resultant state of the circuit will be shifted out during the next test phase simultaneously with the shifting in of the next new state. Although this solution for testing sequential circuits increases the hardware overhead and sometimes the testing time, it decreases the test memory required and significantly increases the resultant fault coverage.
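The shift/capture/shift procedure described above can be sketched for a single full-scan chain as follows. This is an illustrative model under stated assumptions, not the scan cell of Fig. 2.14: the class `ScanChain`, the two-bit combinational function `comb_logic` and the helper `apply_scan_test` are hypothetical names introduced here.

```python
class ScanChain:
    """Behavioural model of one scan chain (cell 0 is next to scan-in)."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift(self, scan_in):
        """Test mode, one clock: shift by one cell and return the scan-out bit."""
        scan_out = self.cells[-1]
        self.cells = [scan_in] + self.cells[:-1]
        return scan_out

    def capture(self, next_state):
        """Function mode, one clock: load the combinational next-state values."""
        self.cells = list(next_state)


def comb_logic(pi, state):
    """Hypothetical combinational block: returns (next_state, primary output)."""
    s0, s1 = state
    return (pi ^ s1, s0 & pi), s0 | s1


def apply_scan_test(chain, desired_state, pi):
    """Shift in a state (shifting out the previous one), apply a PI value, capture."""
    # The bit destined for the last cell must enter the chain first.
    shifted_out = [chain.shift(bit) for bit in reversed(desired_state)]
    next_state, po = comb_logic(pi, tuple(chain.cells))
    chain.capture(next_state)
    return po, shifted_out


chain = ScanChain(2)
first_po, _ = apply_scan_test(chain, (1, 0), pi=1)
# The state captured by the first test leaves the chain while the second state enters.
second_po, previous_state_bits = apply_scan_test(chain, (0, 1), pi=0)
print(first_po, second_po, previous_state_bits)
```

Note that the captured state of one test is observed only while the next state is being shifted in, which is why the shift phase dominates the test time for long chains.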

2.2.3.3 Built-in Self-test (BIST)

A BIST is a type of DFT hardware that enables an IC to perform a physical test on itself. Its test procedure does not rely on an ATE to generate test vectors and analyse the outputs of the DUT. Instead, its circuitry generates the test vectors and applies them to the DUT, and then collects and analyses the outputs when they are ready.

The architecture of a BIST specifies the numbers of test pattern generators (TPGs) and output response analysers (ORAs), their types, how they are placed relative to the DUT and other necessary details of the test application and response collection. Furthermore, integrating BIST with scan chains and test points also contributes to defining its architecture.

There are several templates for BIST architectures, each of which is suitable for a certain type of DUT. Although a BIST is usually fine-tuned by the repeated use of fault simulation, several studies have been conducted on using intelligent methods to optimise the configuration of the BIST architecture [79], [80].

Any BIST architecture could be divided into three fundamental parts: 1) its internal structure which defines the scan architecture and probable test points inserted inside the original DUT; 2) its external structure which shows the details of its TPGs and ORAs; and 3) its timing and control which are handled by a BIST controller.

As previously outlined, the internal structure of a BIST architecture is the DFT inserted inside the DUT to improve its testability. The most important DFT element is scan chains, the design of which for NCL is described in Chapter 5, Section 5.2. The external structure of a BIST architecture shows the numbers, locations and configurations of its TPG(s) and ORA(s).

Test pattern generator (TPG): In a BIST architecture, the on-chip memory that can be allocated for test purposes is very limited or non-existent. Therefore, instead of having a set of test data in a memory, the BIST generates its own. The TPG in a BIST can be implemented using various methods, with one of the most common being linear feedback shift registers (LFSRs) [81]. An LFSR generates pseudo-random test patterns, the range of which is defined based on the LFSR’s parameters. As shown in Fig. 2.16, a synchronous LFSR consists of a series of flip-flops wired as a shift register with feedback loops through XOR gates. The XOR gates are modulo-2 adders and the flip-flops are considered delay elements. The test data generated by an LFSR is based on its initial value, known as a seed, and the numbers and locations of the XOR gates defined by its polynomial which can be represented by the vector P[n-1..0], where P[n-1] is always equal to 1.

Fig. 2.16. Configurable LFSR structure
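To make the role of the seed and the feedback taps concrete, the sketch below models a 4-bit LFSR in software. It is only an illustration under stated assumptions: it uses an external-XOR (Fibonacci) arrangement, which may differ from the structure drawn in Fig. 2.16, and `tap_mask` is a hypothetical encoding of the tapped flip-flops rather than the vector P[n-1..0] itself. The taps chosen correspond to the primitive polynomial x^4 + x^3 + 1.

```python
def lfsr_patterns(seed, tap_mask, width, count):
    """Yield `count` successive states of a right-shifting Fibonacci LFSR."""
    state = seed
    for _ in range(count):
        yield state
        # Modulo-2 sum (XOR) of the tapped flip-flops gives the feedback bit.
        feedback = bin(state & tap_mask).count("1") & 1
        # Shift right by one position and insert the feedback bit at the top.
        state = (state >> 1) | (feedback << (width - 1))


# Seed 0b0001; taps on bits 0 and 1 (assumed encoding of x^4 + x^3 + 1).
patterns = list(lfsr_patterns(seed=0b0001, tap_mask=0b0011, width=4, count=15))
print([format(p, "04b") for p in patterns])
print(len(set(patterns)))  # number of distinct patterns before the sequence repeats
```

With these parameters the register visits all 15 non-zero states before returning to the seed; an all-zero seed would lock the register at zero, which is why the seed must be non-zero in this arrangement.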

Output response analyser (ORA): In order to analyse the output responses, correct responses are required as references. However, the DUT’s limited chip area does not allow the use of on-chip memories for storing all of the expected output values. This problem can be resolved by storing a signature or a compressed version of all test responses. In this way, instead of checking individual test vector responses, a BIST architecture checks only the signature obtained from a test response against the programmed signature in its hardware. The task of generating the signature from output responses is achieved by a multiple-input signature register (MISR).

The structure of a MISR is the same as that of an LFSR except for the addition of XOR gates at the inputs of the register flip-flops for bringing in parallel data. Fig. 2.17 shows the details of the implementation of a MISR.

Fig. 2.17. Configurable MISR
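The signature compaction performed by a MISR can be sketched in the same style. The model below is a hedged illustration, not the circuit of Fig. 2.17: the function `misr_signature`, the `tap_mask` encoding and the response values are all hypothetical, and the register uses the same assumed external-XOR feedback as a simple software LFSR.

```python
def misr_signature(responses, tap_mask, width, seed=0):
    """Compact a sequence of parallel response words into one signature."""
    mask = (1 << width) - 1
    state = seed
    for response in responses:
        # LFSR step: XOR of the tapped bits is fed back into the top flip-flop.
        feedback = bin(state & tap_mask).count("1") & 1
        shifted = (state >> 1) | (feedback << (width - 1))
        # MISR step: XOR the parallel response word into the shifted state.
        state = shifted ^ (response & mask)
    return state


good_responses = [0b1010, 0b0111, 0b0001]     # made-up fault-free responses
faulty_responses = [0b1010, 0b0110, 0b0001]   # one bit flipped in the second word

reference = misr_signature(good_responses, tap_mask=0b0011, width=4)
observed = misr_signature(faulty_responses, tap_mask=0b0011, width=4)
print(format(reference, "04b"), format(observed, "04b"), reference != observed)
```

Only the reference signature needs to be stored on chip. The price of this compaction is aliasing: a multi-bit error pattern can, with low probability, produce the same signature as the fault-free response.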

Existence or absence of a scan chain in the structure of the BIST, the scan architecture, and numbers and locations of TPGs and ORAs are the factors that specify the BIST architecture. The following provides a brief introduction to four standard RTL BIST architectures: BEST; RTS; STUMPS; and BILBO. BEST, the simplest BIST architecture which is shown in Fig. 2.18, has one TPG for applying data to the PIs and one ORA for collecting the responses of the circuit.

Fig. 2.18. BEST architecture

When the sequential depth of a circuit is large or there are too many feedback loops in a DUT, the time cost of BEST can be unreasonably high. In such a situation, using RTS or STUMPS architectures may be preferable. In RTS, a full or partial scan architecture is used with additional TPGs and ORAs serially pushing the desired state into and out of the internal registers of the design, respectively, as shown in Fig. 2.19. Although it uses more hardware than BEST and requires scan insertion, it takes less testing time and provides greater accessibility to hard-to-reach design components and, therefore, higher fault coverage.

Fig. 2.19. RTS architecture

For a DUT with many registers, the shift phase for a full scan can be very time-consuming, so a STUMPS architecture may be a better candidate. In it, the registers of a DUT are chained to construct two or more scan chains, each of which is filled separately with TPG(s) and analysed in parallel with the others by ORA(s). This architecture requires more hardware than BEST and RTS but may significantly reduce the testing time.

Another choice for BIST insertion in RTL designs is BILBO which is useful when the registers are very distributed or partial testing of a circuit is considered. In it, the internal registers of the circuit are replaced with new ones which can be configured to act as 1) regular registers, 2) TPGs, 3) ORAs or 4) shift registers. The BIST controller specifies the functionality and timing of each BILBO register in each stage of testing.

2.2.4 IDDQ test

Instead of an incorrect output, it is possible for faults on an internal feedback node of a gate to result in a delay in the arrival of the correct output (i.e., a delay fault). While these faults might not affect the overall operation of a circuit, they could increase its leakage current which could lower its yield in the burn-in test phase [82] or affect the battery life of a portable device [13]. Since a circuit’s functionality is intact, in a delay-insensitive system like an NCL, such faults are unlikely to be detected by a voltage test and perhaps not even by delay testing. However, they may be detectable by IDDQ tests [82], [83], [84].

IDDQ or quiescent current testing works on the basis that, in a defect-free CMOS circuit, there is no static path between the VDD and ground. Therefore, there is only a small leakage current during the quiescent state of the circuit. In an IDDQ test, a leakage current at least one order of magnitude larger than the fault-free current [13] is used as a sign of fault detection. IDDQ testing is known to be highly effective for industrial synchronous circuits, and ATE and industrial test tools usually offer this facility [13].

Fig. 2.20. IDDQ test for generalised CMOS IC [13]

While it is clear that IDDQ tests are becoming less effective due to the higher leakage currents in nanometric technologies, they are still considered a valuable test methodology. Therefore, there are ongoing attempts to propose improved versions of this technique to maintain its usefulness [85], such as, but not limited to, using built-in current sensors (BICS) [86], [87], power gating [88], differential IDDQ [89], multi-temperature IDDQ [90], process variation-aware techniques [91], [92], [75], and a statistically-defined IDDQ threshold [93].

Fig. 2.20(a) presents an example of a fault that requires IDDQ. The asterisk on the right-hand side of the CMOS inverter indicates a defect in the PMOS transistor that causes its impedance to fall from infinity to a finite value. In such a case, the current flows in the steady state through the dotted path shown by the arrow which increases the IDDQ. Fig. 2.20(b) illustrates the input and output voltages of the inverter, with the drain current of the inverter denoted as the IDDQ.

In a fault-free circuit, the IDDQ drops to an insignificant value whereas, in a faulty one, it remains elevated even long after switching is completed. Such faults are detectable by measuring the IDDQ after a certain amount of waiting time, i.e., at the time shown by the arrow in Fig. 2.20(b) [13].

As the basis of IDDQ testing is measuring the quiescent current, it is necessary that all current spikes in a circuit caused by switching activities have died out before the measurement is performed. Based on the literature, a 1-10 ms wait is an adequate amount of time before the IDDQ current is measured through the VSS or VDD of the circuit [88].

2.3 Summary

This chapter provided a short background to the main concepts covered in this thesis: NCL (Section 2.1) and the physical testing of electronic circuits (Section 2.2).

Firstly, the structure and behaviour of NCL circuits were explained. Knowing the structures of NCL gates helps the reader understand the challenges involved in testing the faults on their internal feedback. Also, the behaviour and timing of NCL circuits indicate the difficulties involved in controlling the timing of test methods for them.

Then, this chapter familiarised the reader with the core topics for testing digital circuits, the importance of testing and the costs involved, and introduced the main algorithms used for test generation, DFT and IDDQ test. This introduction will facilitate a better understanding of the NCL test methods proposed in Chapters 3, 4 and 5, respectively.

Chapter 3

Automatic Test Pattern Generation for NCL

ATPG algorithms generate test vectors (TVs) that can detect physical defects inside a given CUT. When applied to the inputs of the CUT, such TVs enable the ATE to differentiate between correct and faulty circuit behaviours. This chapter discusses our proposed ATPG methods for NCL. Background information on ATPG is provided in Chapter 2, subsection 2.2.2.

Because of the sequential complexity of NCL caused by the statefulness of its gates, studies of ATPG for it have based their methods on the theory that it needs more than one Data value to detect each fault. As a result, its normal {Null, Data, Null, Data, ...} behaviour cannot be maintained during testing. Consequently, current test methods require an external clock signal and synchronous test hardware to control the timing of ATPG. This additional clocked hardware can cause metastability issues in an otherwise highly reliable NCL circuit. Also, because a clocked DFT requires more accurate timing assumptions than an NCL DUT, it can undermine the NCL’s high tolerance for PVT variations.

In this chapter, we first use Boolean difference calculus [94] to show that the normal {Null, Data, Null, Data, ...} flow of NCL can be used at the time of testing (Section 3.1). Then, Section 3.2 uses the concepts discussed in Section 3.1 to propose an automatic timing method for testing NCL circuits with no need for a clock signal or timing analysis of the DUT. Using this method, we achieve a self-timed clockless ATPG strategy for testing NCL circuits which is also used for the DFT methods proposed in Chapter 5. Section 3.3 examines the effectiveness of the proposed timing method and defines the types of faults it cannot detect which leads to Chapter 4 in which the applicability of IDDQ testing for detecting such faults is examined. Section 3.5 studies the adaptability of concepts in deterministic test generation methods for NCL circuits and proposes a method called N-PODEM.

Finally, Section 3.6 analyses the experimental results obtained from our proposed ATPG method and Section 3.7 summarises this chapter.


3.1 The number of required test vectors per fault

Every element of an NCL circuit is sequential, including the gates. An NCL gate is considered a sequential circuit due to its state-holding behaviour which means that its output is not determined based on only its current input values. Due to its lack of a clock signal, the timing of an NCL circuit is less deterministic than that of a synchronous one. The sequential behaviour of its gates further adds to the already low level of temporal determinism in an NCL circuit. Therefore, many studies have jumped to the conclusion that the normal {Null, Data, Null, Data, ...} flow of NCL cannot be applied during testing and used one Data value to put the DUT into a deterministic state and a second to detect a fault. However, because of the periodic Null/Data wavefront in an NCL circuit, an NCL gate already carries some deterministic information as, between each two of its undefined stable Data states, there is one known state: Null. This deterministic state could be used to either initialise the gate into a known state (when applied as a {Null, Data} pair) or check if it can be moved to an expected state (when applied as a {Data, Null} pair). In this section, we prove that our proposed test generation technique needs only one pair, {Null, Data} or {Data, Null}, per detectable fault in an NCL circuit. It works if, and only if, each TV pair is capable of both activating the fault at its site and propagating it to the POs.

3.1.1 Fault activation in NCL using one Null or one Data

Activating a fault means driving the opposite value of the fault onto the site-of-fault. For each computational part of an NCL circuit, its internal values are determined by just one input pattern. As mentioned in Chapter 2, between each two Data values applied to the PIs of a circuit, a Null value is applied which “washes” the effects of the first Data. Therefore, as each Data value inside the circuit is generated by one and only one Data on the circuit’s PIs, activating the fault can be performed by only one Data or one Null value.

A stuck-at-1 fault on a GIF is activated immediately after a Null phase: the gate output transitions to Null in the Null phase, a value of 0 is driven onto its GIF and, if the GIF is stuck-at-1, the fault is activated at this point. On the other hand, a stuck-at-0 fault on a GIF can be activated immediately after a Data phase if, and only if, the gate’s output is asserted Data. Then, a value of 1 is driven onto the GIF of the gate and, if it is stuck-at-0, its fault is activated at this point.

3.1.2 Fault propagation in NCL using one Null or one Data

Fault propagation means propagating the effect of an activated fault to one or more of the DUT’s outputs. As we consider only single stuck-at faults, for each gate in an NCL circuit, only one of the following cases is correct immediately after the application of Null:

• Z∗ = 0, where the GIF (Z∗) is faultless; or

• Z∗ = stuck-at-0/stuck-at-1, where the GIF (Z∗) is faulty.

For the former, we prove that, for each NCL gate in the circuit, if Z∗ is faultless and equals zero (after a Null phase), there is at least one input combination (Data) that can propagate a fault from one of the inputs to the output of the gate. For the latter, we prove that, for each NCL gate in the circuit, if Z∗ is faulty, there is at least one input combination (Null or Data based on the type of fault) that can propagate the fault from Z∗ to Z and from there to a PO.

In Boolean logic, to propagate a fault through one of a gate’s inputs to its output, all its other inputs must be set to non-controlling values; for example, for AND and OR gates, to propagate a fault, all the other inputs must be 1 and 0, respectively. As a similar policy needs to be defined for NCL gates, in this study, we use Boolean difference calculus to define the non-controlling values for inputs of NCL gates [12], [94].

In a Boolean function f(x, y, z), differentiation based on one of the function variables (x) shows how the function changes based on the changes in x, i.e., from 0 to 1 and from 1 to 0. As a result, the Boolean difference is defined as:

$$\frac{\partial f}{\partial x} = f(x = 0) \oplus f(x = 1) \tag{3.1}$$
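As a quick check of equation (3.1), the following worked example (a sketch using two-input AND and OR functions, with the symbols x and y chosen here for illustration) recovers the familiar non-controlling values quoted earlier for these gates:

```latex
\[
\begin{aligned}
f = x \cdot y &: \quad \frac{\partial f}{\partial x} = (0 \cdot y) \oplus (1 \cdot y) = 0 \oplus y = y
  &&\Rightarrow\ \frac{\partial f}{\partial x} = 1 \iff y = 1, \\
f = x + y &: \quad \frac{\partial f}{\partial x} = (0 + y) \oplus (1 + y) = y \oplus 1 = \bar{y}
  &&\Rightarrow\ \frac{\partial f}{\partial x} = 1 \iff y = 0.
\end{aligned}
\]
```

In both cases the Boolean difference equals 1 exactly when the other input holds the gate's non-controlling value, which is the condition for a change on x to reach the output.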

Similarly, in Boolean logic, detecting a fault on wire x in function f means finding the combination of inputs that gives different values for the outputs of the function for x = 1 and x = 0. This happens when $f(x = 0) \oplus f(x = 1) = 1$, that is, when $\frac{\partial f}{\partial x} = 1$. Therefore, detecting a fault on wire x of a Boolean function (f) is equivalent to calculating $\frac{\partial f}{\partial x} = 1$ and setting the values of the other variables based on the result.

As an application of the Boolean difference for fault propagation in NCL gates, we consider propagating faults on the GIFs (Z∗) of threshold gates which means that we need to calculate the Boolean difference of the gate equations based on Z∗. Replacing f and x in equation (3.1) by Z and Z∗ gives the Boolean difference of the NCL gates based on Z∗ as:

$$\frac{\partial Z}{\partial Z^{*}} = Z(Z^{*} = 0) \oplus Z(Z^{*} = 1) \tag{3.2}$$

Equations (2.1) to (2.4) in Chapter 2, which show the relationships among the “GoToData”, “GoToNull”, “HoldNull” and “HoldData” transistor networks of static NCL gates, are repeated here for easy reference. For the sake of simplicity, when calculating the Boolean difference equations, we replace these transistor network names with “toD”, “toN”, “HoldN” and “HoldD”, respectively.

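$$Z_d = \mathit{GoToData} + (Z_d^{*} \cdot \mathit{HoldData}) \tag{3.3}$$

$$Z_n = \mathit{GoToNull} + (Z_n^{*} \cdot \mathit{HoldNull}) \tag{3.4}$$

$$\mathit{HoldData} = \overline{\mathit{GoToNull}} \tag{3.5}$$

$$\mathit{HoldNull} = \overline{\mathit{GoToData}} \tag{3.6}$$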

Replacing Z in equation (3.2) with the expression in equation (3.3) and calculating the Boolean difference of Z based on Z∗ results in $\frac{\partial Z}{\partial Z^{*}} = \mathit{toD} \oplus (\mathit{toD} + \mathit{HoldD}) = \overline{\mathit{toD}} \cdot \mathit{HoldD}$ which gives:

$$\frac{\partial Z}{\partial Z^{*}} = \mathit{HoldN} \cdot \mathit{HoldD} \tag{3.7}$$
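For readers who wish to follow the algebra leading to equation (3.7), the intermediate steps can be written out as below. This is a sketch of the derivation using the abbreviated network names; the only identity used beyond equations (3.2), (3.3) and (3.6) is x ⊕ (x + y) = x̄y, which can be checked by trying x = 0 and x = 1.

```latex
\[
\begin{aligned}
\frac{\partial Z}{\partial Z^{*}}
  &= Z(Z^{*} = 0) \oplus Z(Z^{*} = 1)
     && \text{equation (3.2)} \\
  &= \mathit{toD} \oplus (\mathit{toD} + \mathit{HoldD})
     && \text{substituting equation (3.3)} \\
  &= \overline{\mathit{toD}} \cdot \mathit{HoldD}
     && \text{since } x \oplus (x + y) = \bar{x}\,y \\
  &= \mathit{HoldN} \cdot \mathit{HoldD}
     && \text{since } \mathit{HoldN} = \overline{\mathit{toD}} \text{ by equation (3.6)}
\end{aligned}
\]
```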

Based on this equation, propagating faults on the GIF (Z∗) of any threshold gate requires finding the input combinations that cause HoldN.HoldD = 1. Using fault propagation for Z∗ of the TH23 gate as an example, the gate equation is:

Z = (AB + AC + BC) + (Z∗.(A + B + C)). (3.8)

Test generation using the Boolean difference for the GIF (Z∗) of this gate requires HoldN.HoldD = 1 which results in:

$$\overline{(AB + AC + BC)}\,(A + B + C) = 1 \tag{3.9}$$

$$\Rightarrow \bar{A}B\bar{C} + \bar{A}\bar{B}C + A\bar{B}\bar{C} = 1 \tag{3.10}$$

which means that any combination of inputs with two 0s and one 1 could propagate the fault on the GIF of the TH23 gate to its output.

The same principle applies to the faults on the inputs of gates; for instance, using test generation to propagate faults through input A of the TH23 gate results in $\bar{B}C\bar{Z}^{*} + \bar{B}\bar{C}Z^{*} + B\bar{C}\bar{Z}^{*} = 1$. Therefore, there are three different combinations of inputs and Z∗ values that can propagate faults through input A to the output of a TH23 gate. It is straightforward to demonstrate that the Boolean difference ($\frac{\partial Z}{\partial x_i}$), which is calculated for the gate inputs when Z∗ = 0, has solutions for every NCL gate, as shown in Table 3.1, where $x_i \in \{A, B, C, D, Z^{*}\}$. It is clear that, for a fault on any input of a threshold gate (including Z∗), there is at least one combination on the other inputs of that gate that will allow the fault to propagate to the gate’s output.

In order to fit the Boolean differences for all inputs to each gate in one row in the table, we minimised the Boolean logic in a non-SOP (sum of products) form. Before using the entries of this table for test generation, it is important to convert them to an SOP form. For example, the result of $\frac{\partial Z}{\partial Z^{*}}$ for the TH23w2 gate is recorded as $\bar{A}(B \oplus C)$ in Table 3.1. By converting this equation to an SOP form, we have $\frac{\partial Z}{\partial Z^{*}} = \bar{A}B\bar{C} + \bar{A}\bar{B}C$ which means that ABC = 010 and ABC = 001 are the two TVs that can propagate a fault of a TH23w2 gate from Z∗ to Z.
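The TH23 results quoted above can be reproduced by brute-force evaluation of the Boolean difference over the gate equation (3.8). The sketch below is illustrative only; the function names `th23` and `propagating_assignments` and the lower-case variable names are assumptions introduced here, not part of the thesis tooling.

```python
from itertools import product


def th23(a, b, c, z_star):
    """Static TH23 gate, equation (3.8): toD + (Z* . HoldD)."""
    to_data = (a & b) | (a & c) | (b & c)
    hold_data = a | b | c
    return to_data | (z_star & hold_data)


def propagating_assignments(gate, names, target):
    """Assignments of the other variables for which toggling `target` toggles Z."""
    others = [n for n in names if n != target]
    found = []
    for values in product((0, 1), repeat=len(others)):
        env = dict(zip(others, values))
        # Boolean difference: gate(target = 0) XOR gate(target = 1) must equal 1.
        if gate(**env, **{target: 0}) != gate(**env, **{target: 1}):
            found.append(values)
    return others, found


names = ["a", "b", "c", "z_star"]
print(propagating_assignments(th23, names, "z_star"))  # variables listed as (a, b, c)
print(propagating_assignments(th23, names, "a"))       # variables listed as (b, c, z_star)
```

For the feedback input, the assignments found are exactly those with two 0s and one 1 on A, B and C; for input A, three combinations of B, C and Z∗ are found, in agreement with the expressions in the text.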

An example of using one pair of {Null, Data} to detect a GIF stuck-at-1 fault in a small NCL circuit is shown in Fig. 3.1. In this figure, part of an NCL 4-bit up/down counter from the UNCLE CAD tool [14] calculates the next value of count[3] based on the current value of count and the up input.

Fig. 3.1. Part of NCL 4-bit up/down counter calculating the third bit of count [14]

Assume that we want to detect a stuck-at-1 fault on the GIF of the highlighted TH23w2 gate. After a Null wavefront, this fault is activated and, as discussed in Section 3.1, to propagate it to the output of the gate, we need to solve the equation HoldN.HoldD = 1 for it. Since its “toD” expression is A + BC, as shown in equations (3.11) to (3.13), either $\bar{A}B\bar{C}$ or $\bar{A}\bar{B}C$ can propagate this activated fault from the GIF to l7.

$$\mathit{HoldN} = \overline{\mathit{toD}} = \bar{A}\bar{B} + \bar{A}\bar{C} \tag{3.11}$$

$$\mathit{HoldN} \cdot \mathit{HoldD} = (\bar{A}\bar{B} + \bar{A}\bar{C}) \cdot (A + B + C) \tag{3.12}$$

$$= \bar{A}B\bar{C} + \bar{A}\bar{B}C \tag{3.13}$$

We choose the first option and try to find the PIs that provide {l5 = 0, t_cnt[3] = 0, f_cnt[3] = 1}. Using these values, both the “toN” and “toD” networks of the circuit are off, and either the HoldN or HoldD one maintains the current state of the gate (the faulty value on the GIF) by sending it to the output. In order to justify the PIs that satisfy {l5 = 0, t_cnt[3] = 0, f_cnt[3] = 1} and also propagate the fault from l7 to the POs, any conventional test generation technique can be used. An example of such a TV is TV = {up, count} = 00101 which means that {t_TV, f_TV} = {00101, 11010} detects the fault.

Since both fault activation and fault propagation can be performed using a normal {Null, Data} or {Data, Null} pair, the normal timing of the NCL, which is implicit in its handshaking signals, can be used to manage the timing of a test. Given that the normal {Null, Data, Null, Data, ...} behaviour of NCL can be maintained at the time of testing, we now determine a timing method for our proposed ATPG for NCL.
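The gate-level part of this example, namely that a single {Null, Data} pair is enough to expose a stuck-at-1 fault on the GIF of a TH23w2 gate, can be checked with a small behavioural model. This is a sketch under stated assumptions: it models one isolated gate rather than the counter of Fig. 3.1, the function names `th23w2` and `apply_pair` are hypothetical, and the gate output is assumed to start at 0 (the Null phase resets it in any case).

```python
from itertools import product


def th23w2(a, b, c, z_star):
    """Static TH23w2 gate: toD = A + BC, HoldD = A + B + C."""
    to_data = a | (b & c)
    hold_data = a | b | c
    return to_data | (z_star & hold_data)


def apply_pair(data, gif_stuck_at=None):
    """Apply a {Null, Data} pair and return the gate output after each phase."""
    outputs = []
    z = 0  # assumed initial output; the Null phase drives it to 0 regardless
    for a, b, c in [(0, 0, 0), data]:
        # A healthy GIF feeds back the previous output; a faulty one is stuck.
        z_star = z if gif_stuck_at is None else gif_stuck_at
        z = th23w2(a, b, c, z_star)
        outputs.append(z)
    return outputs


# Data values for which the faulty gate's response differs from the fault-free one.
detecting = [d for d in product((0, 1), repeat=3)
             if apply_pair(d) != apply_pair(d, gif_stuck_at=1)]
print(detecting)
```

The Data values found are the two with A = 0 and exactly one of B and C equal to 1, matching equation (3.13): the output is Null after the Null phase in both the fault-free and faulty gate, and only the faulty gate asserts its output in the following Data phase.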
Table 3.1: Values for fault propagation on NCL gate inputs (columns: Gate, ∂Z/∂A, ∂Z/∂B, ∂Z/∂C, ∂Z/∂D, ∂Z/∂Z∗; one row for each of the gates TH12, TH22, TH13, TH23, TH33, TH23w2, TH33w2, TH14, TH24, TH34, TH44, TH24w2, TH34w2, TH44w2, TH34w3, TH44w3, TH24w22, TH34w22, TH44w22, TH54w22, TH34w32, TH54w32, TH44w322, TH54w322, THxor0, THand0 and TH24comp)

3.2 ATPG timing

In this section, we establish our proposed technique for ATPG time management by considering the alternatives in a step-by-step approach.

3.2.1 ATPG timing using a clock

Several previous studies have proposed the insertion of clocked test hardware to test a clockless design [62], [63]. This introduces a clock tree distribution, synchronous/asynchronous interfaces and the possibility of metastability issues and timing violations into highly reliable NCL systems [64]. Clocked test hardware may also decrease the tolerance for process, voltage and temperature variations in an NCL system. This issue could be even more problematic when considering the online testing of an NCL circuit at its normal run-time.

3.2.2 From the DUT

In order to avoid imposing external timing and clocking onto an NCL circuit for testing purposes, the means by which NCL controls its own timing can be used, i.e., handshaking signals. These signals can notify the tester when the POs of the circuit are stable and ready to be checked (Ki=0) or when the NCL circuit is ready to receive a new TV on its PIs (Ko=1). The problem with using the handshaking signals of a DUT to control the timing of the test is the possibility that these signals are themselves faulty. As they are carried through physical wires that, like any other physical wire in the circuit, can have physical defects, they can have faulty values. If a handshaking signal is faulty, its value is incorrect and it should not be used to control a test application. This is similar to using the clock of a DUT for testing synchronous systems where its clock tree can become faulty.

3.2.3 From a golden model

An alternative to using the timing of a DUT for testing is using that of a golden model of the circuit, i.e., a defect-free version of the circuit (usually a simulation model) with back-annotated gate delays, which an industrial ATE can incorporate in its test program. However, the problem with this is that each instance of a circuit may have a slightly different delay from any other one, even with the same technology. So, we cannot assume that the timing of the DUT and the golden model match completely.

3.2.4 From both the DUT and a golden model

As mentioned in Chapter 2, subsection 2.1.3, in a Data/Null wavefront, all signals in an NCL circuit transition monotonically from Data/Null to Null/Data regardless of the ordering and delays of the PIs. Therefore, as an NCL circuit is capable of removing these delays and ordering, we use this feature to eliminate the slight time differences between a golden model and the DUT. The combination of handshaking signals required to handle the timing of ATPG for NCL is shown in Fig. 3.2. It should be noted that this “mixed timing” is implemented as part of the test program for ATPG and is shown in NCL gates for illustration purposes only; i.e., no actual hardware is added to the circuit for testing purposes.

Fig. 3.2. Test control for the test equipment program

Algorithm 1 shows the pseudo-code of the test program while the diagram in Fig. 3.3 illustrates the flow of the test control procedure. This diagram is not to scale and is intended to convey only a high-level concept of the operation. In it, PO_g is the output of a golden model, which is usually a delay-annotated simulation version of the circuit, and PO_f the output of the physical DUT. As shown in Algorithm 1, the test program monitors PO_g and PO_f and, when it detects Data completeness, issues Data-complete_g and Data-complete_f, respectively. When both PO_g and PO_f are Data-complete, the test program collects and compares Data from the outputs and applies Null to the inputs.

Shortly after applying Null to the PIs, PO_g and PO_f start receiving Null values and are no longer complete. As a result, Data-complete_g and Data-complete_f are de-asserted, i.e., returned to zero, and the CollectData/ApplyNull signal is also de-asserted. At this point, the test program waits for Null-complete on both PO_g and PO_f. When both the golden model and DUT are Null-complete, the ATPG collects the stable Null from the PO of the DUT and applies a new set of Data to the PIs.
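The rendezvous between the golden model and the DUT described above can be sketched in a few lines. This is only an illustrative model, assuming that each PO is a (true rail, false rail) pair and that Null is (0, 0); the names `data_complete`, `null_complete` and `mixed_timing_step` and the returned action strings are hypothetical and do not come from the test program itself.

```python
NULL, DATA0, DATA1 = (0, 0), (0, 1), (1, 0)  # (true rail, false rail) per signal


def data_complete(word):
    """Every signal has one rail asserted (bitwise OR of the rails is 1)."""
    return all(t | f for t, f in word)


def null_complete(word):
    """Every signal has both rails at 0 (bitwise NOR of the rails is 1)."""
    return all(not (t | f) for t, f in word)


def mixed_timing_step(po_g, po_f, waiting_for):
    """Decide the next tester action from the golden (po_g) and DUT (po_f) outputs.

    waiting_for is "data" or "null": the wavefront the tester expects next.
    """
    if waiting_for not in ("data", "null"):
        raise ValueError("waiting_for must be 'data' or 'null'")
    complete = data_complete if waiting_for == "data" else null_complete
    if not (complete(po_g) and complete(po_f)):
        return "wait"                      # one side is still settling
    if waiting_for == "data":
        # Both sides are Data-complete: compare values, then apply Null.
        return "fault" if po_g != po_f else "apply_null"
    return "apply_data"                    # both sides are Null-complete


golden = [DATA1, DATA0]
print(mixed_timing_step(golden, [DATA1, NULL], "data"))    # wait
print(mixed_timing_step(golden, [DATA1, DATA0], "data"))   # apply_null
print(mixed_timing_step(golden, [DATA0, DATA0], "data"))   # fault
print(mixed_timing_step([NULL, NULL], [NULL, NULL], "null"))  # apply_data
```

Because the comparison happens only once both sides are complete, a DUT that is merely slower or faster than the golden model is never reported as faulty, which is the intent of mixed timing.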

3.3 Effectiveness of mixed timing in detecting faults

In our proposed mixed-timing method, a fault can affect the outputs in the following different ways.

1. The fault affects the correctness of the handshaking protocol or completeness of the DUT, a situation in which the DUT halts and the fault is detected.

2. Although Data completeness occurs for both the golden model and DUT, because of the fault, the output values PO_g[i] and PO_f[i], respectively, are different. The structure shown in Fig. 3.2 removes all the timing discrepancies between the two models and correctly detects those in their output values, thereby detecting the fault.

3. The fault causes slight delays on the POs from the DUT. As this kind of fault can be detected using the timing of the golden model but is masked by the structure in Fig. 3.2, it cannot be detected using the mixed-timing method (see subsection 3.4.1 for more details).

Fig. 3.3. Flow of test control procedure

Algorithm 1 Pseudo-code for mixed timing included in the test program of ATE

Data-complete_g = bitwise or of the two rails of PO_g
Data-complete_f = bitwise or of the two rails of PO_f
Null-complete_g = bitwise nor of the two rails of PO_g
Null-complete_f = bitwise nor of the two rails of PO_f
if Data-complete_g and Data-complete_f then
    CollectData/ApplyNull = 1
end if
if Data-complete_g nor Data-complete_f then
    CollectData/ApplyNull = 0
end if
if Null-complete_g and Null-complete_f then
    CollectNull/ApplyData = 1
end if
if Null-complete_g nor Null-complete_f then
    CollectNull/ApplyData = 0
end if
always @ posedge(CollectData/ApplyNull) begin
    if PO_g != PO_f then
        Fault is detected. Record PI and mark fault as detected
    end if
    PI = Null
end
always @ posedge(CollectNull/ApplyData) begin
    PI = new Data test vector
end

4. The fault does not cause a catastrophic defect in the circuit. However, it establishes a path between VDD and ground which decreases the battery’s efficiency or increases the power consumption to an unacceptable level. In this situation, not only our proposed ATPG but also all other voltage tests fail to detect the fault (see subsection 3.4.2 for details).

Points 3 and 4 in the above list are analysed in more detail in the next section.

3.4 Cases where mixed timing does not work for NCL

As mentioned in Section 3.3, there are situations in which the proposed ATPG cannot detect a fault. One is when the proposed mixed timing masks the effect of the fault and another when the fault does not affect the voltages of the outputs of the DUT and, therefore, is not detectable by any ATPG, as discussed in the following subsections.

3.4.1 Example of faults causing delay on primary outputs

Considering the NCL circuit shown in Fig. 3.4 as an example, we assume that the gates are semi-static and that the GIF of the highlighted TH22 gate is stuck-at0, a fault which is activated when the previous value of Z or Z∗ is 1. This means that, in the previous wavefront, the pull-down network drove a value of 1 onto Z, indicating that that wavefront was a Data one and we are moving to a Null one. Table 3.2 illustrates the transitions on the wires of this circuit over time and during a Data-to-Null wavefront. The initial Data on the PIs of the circuit at t0 activates the GIF stuck-at0 fault, while the Null applied on the PIs propagates the effect of this fault to the POs of the circuit at t3. At this point, the values of the POs of the fault-free and faulty circuits are “D1” or “Data1” and “N” or “Null”, respectively. However, this discrepancy only lasts until t4 when the PO of the fault-free circuit also transitions to Null, and the values of the POs of both circuits become equal.

Fig. 3.4. Example of early detection of Null

Therefore, we can conclude that a GIF stuck-at0 fault in this situation results in a slightly earlier detection of Null-complete when using mixed timing for testing and this fault is not detectable by our voltage test. It could be argued that, by keeping some signals at Data, a voltage test would detect such faults. Since the inputs of the circuit usually do not change simultaneously in a Data to Null wavefront, before Null-complete is expected on the POs, there will be a mixture of Null and Data on both the PIs and POs of the circuit, and a voltage test should be capable of detecting discrepancies. However, as Null and Data flow freely through an NCL circuit, a method that requires control to maintain some signals at Data needs to control the delays and timing in the circuit. The extensive timing analysis required for these tests undermines the delay-insensitive nature of NCL and lowers the possibility of using the normal timing of the circuit for testing which defeats the purpose of the proposed method.

In Chapter 4, we demonstrate that IDDQ testing is a more suitable means of detecting this category of faults.

Table 3.2: Early detection of Null for GIF stuck-at0 fault

3.4.2 Example of faults causing no voltage defect

For an example of faults causing no digital logic discrepancy on the POs, consider a Z∗_n-at0 fault on the GIF of a static NCL gate. This fault is activated when the inputs to the gate transition from Data to Null. In this case, as the previous value of Z is 1, the value of Z∗_d also becomes 1. As a result, both pull-up and pull-down networks compete to set a value on the internal node (M). In this situation, the voltage of node M depends on the strengths of these networks, which are functions of the transistor parameters and coupling capacitors.

Fig. 3.5. Behaviour of static NCL gates with Z∗_n-at0

In a situation in which these strengths result in Vtn < VM < VDD − |Vtp|, as shown in Fig. 3.5, both the NMOS and PMOS transistors of the inverter create a path between VDD and ground. Although the unknown voltage on node M may or may not be identified by a voltage test as a faulty value, this path elevates the IDDQ of the CUT and is detectable by an IDDQ test. For more detailed examples of faults undetectable by a voltage test, please refer to Chapter 4, Section 4.1.

The next section discusses the adaptability of deterministic test generation algorithms for NCL and their applicability on a simple NCL circuit. Then, the experimental section demonstrates the results of applying our proposed ATPG on a few NCL circuits.
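The condition Vtn < VM < VDD − |Vtp| discussed in subsection 3.4.2 can be written as a one-line check. The sketch below is illustrative only; the function name and the numeric voltages are hypothetical values chosen for the example and are not taken from any technology used in this thesis.

```python
def inverter_short_circuit(v_m, v_dd, v_tn, v_tp):
    """True when both the NMOS and the PMOS transistor of the inverter conduct.

    v_m is the voltage on the internal node M, v_tn the NMOS threshold and
    v_tp the (negative) PMOS threshold.
    """
    return v_tn < v_m < v_dd - abs(v_tp)


# Hypothetical values: VDD = 1.2 V, Vtn = 0.4 V, Vtp = -0.4 V.
print(inverter_short_circuit(0.6, 1.2, 0.4, -0.4))  # True: node M is in the contested range
print(inverter_short_circuit(0.0, 1.2, 0.4, -0.4))  # False: only the PMOS conducts
print(inverter_short_circuit(1.2, 1.2, 0.4, -0.4))  # False: only the NMOS conducts
```

Only the first case creates the VDD-to-ground path that raises IDDQ; in the other two cases node M sits at a valid logic level.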

3.5 Deterministic test generation - N-PODEM

Deterministic test generation algorithms target faults that are difficult to detect by random test generation and generate TVs specifically for them. Although this process is more time-consuming than random test generation, it typically results in a higher fault coverage and requires less test time. In this section, firstly, a brief description of a common deterministic test generation algorithm, Path Oriented DEcision Making (PODEM) [15], and a simple example are provided. Then, it is shown how the principles of PODEM can be adapted to NCL circuits, with this version called N-PODEM. Finally, two simple examples of applying N-PODEM to a small NCL sub-circuit using the timing methods proposed in Section 3.2 are presented.

3.5.1 Basics of PODEM for Boolean logic

Finding a TV using PODEM is like traversing a binary decision tree. Fig. 3.6 shows the flowchart of the general PODEM algorithm for Boolean logic and the decisions made during the process of test generation using it. Before explaining this flowchart in detail, the following important concepts of deterministic test generation are described.

• “D/D̄”: when a stuck-at0 fault is activated, there is a value 1/value 0 combination on the same wire in the good/faulty circuit, a discrepancy called a D value. Similarly, an activated stuck-at1 fault causes a value 0/value 1 discrepancy or D̄.

• “D-frontier”: in PODEM, the main objective is to propagate a D/D̄ value from the site-of-fault to a PO. For this process to be possible, it is necessary to always have at least one gate with D/D̄ on at least one of its inputs and X or “don’t care” on its output. Having such gates means that: 1) the effect of the fault is not masked by the decisions made in the algorithm; and 2) there is at least one input that is not yet set and can be used to propagate the effect of the fault to a PO. A set of such gates is referred to as a D-frontier.

• “Implication”: when a new value is set for a PI, it can propagate and affect other parts of the circuit. In deterministic test generation, it is important to simulate this propagation and check the consequences of the input application, a process called implication.

• “X-path check”: after each implication, it is necessary to ensure that the effects of the decision do not block the paths for propagating D/D̄ to POs. This requires checking if there is at least one path from the D-frontier to the POs, all the wires of which have undetermined values. To do this, in the original PODEM algorithm, all the PIs are initialised to X and their values propagated through the DUT until they reach the POs. A path from the D-frontier to the POs with all the wires on it having values of X is called an X-path. It shows the non-assigned wires in the circuit which are the potential path(s) for propagating the effect of the fault from a sensitised line to the PO(s). If an X-path check fails, it is necessary to apply a “backtracking” step in the algorithm.

• “Backtracking”: many decisions are taken in a deterministic test generation algorithm. After each one, it is important to check that the D-frontier is not empty and there is an X-path from it to a PO. In any stage of the algorithm, if the D-frontier is empty or the X-path check fails, it is necessary to return the state of the algorithm to the point at which the last choice was made. This means going backwards on the binary decision tree and clearing the effect of the previous PI application from all parts of the DUT. In PODEM, this return is called backtracking and, after it, the binary decision tree is ready to make a new decision.
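The D notation and the D-frontier test defined above can be made concrete with a small sketch. It assumes one common encoding, in which each value is a (good circuit, faulty circuit) pair and any partially unknown result collapses to X; the names `and5` and `in_d_frontier` and the ASCII spelling `D'` for D̄ are choices made for this illustration only.

```python
X = None  # unknown component of a value

# Each composite value is a (good circuit, faulty circuit) pair.
ZERO, ONE, D, DBAR, VX = (0, 0), (1, 1), (1, 0), (0, 1), (X, X)
NAMES = {ZERO: "0", ONE: "1", D: "D", DBAR: "D'", VX: "X"}  # D' stands for D-bar


def and3(a, b):
    """Three-valued AND on single components (0, 1 or X)."""
    if a == 0 or b == 0:
        return 0
    if a is X or b is X:
        return X
    return 1


def and5(a, b):
    """Five-valued AND: evaluate good and faulty circuits separately."""
    good, faulty = and3(a[0], b[0]), and3(a[1], b[1])
    return VX if good is X or faulty is X else (good, faulty)


def in_d_frontier(inputs, output):
    """A gate is in the D-frontier if an input carries D/D' and its output is X."""
    return output == VX and any(v in (D, DBAR) for v in inputs)


order = [ZERO, ONE, D, DBAR, VX]
table = [[NAMES[and5(a, b)] for a in order] for b in order]
for b, row in zip(order, table):
    print(NAMES[b].ljust(3), " ".join(cell.ljust(2) for cell in row))

print(in_d_frontier([D, VX], and5(D, VX)))      # True: fault effect waits on an unset input
print(in_d_frontier([D, ZERO], and5(D, ZERO)))  # False: the 0 input masks the fault
```

The printed rows are the five-valued AND table, so the same helper can be used for implication through AND gates.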

Fig. 3.6. PODEM flowchart [15]

Now that it is easier to understand the steps involved in the PODEM test generation algorithm, the flowchart in Fig. 3.6 is explained.

1. The algorithm starts by selecting one of the PIs either randomly or based on some testability measurement and applying either 0 or 1 to it. Appendix C provides details of testability measurements and how they apply to NCL circuits. For the sake of simplicity and to keep the binary decision diagram short, in this chapter, we always begin from the first input and move to the last, with values of 0 and 1 when detecting stuck-at1 and stuck-at0 faults, respectively.

2. The next step is implication or applying the effect of this newly assigned PI to the parts of the circuit affected by it.

3. If applying these effects results in propagating a D/D̄ to one or more of the POs, PODEM successfully finds a TV that detects the current fault. This TV can be obtained by traversing the binary decision tree from the leaf up to the root in an absolutely ascending manner, i.e., no downward movement is accepted (for an example, see Fig. 3.8). At this point, the algorithm exits successfully.

4. If there is not a D/D̄ on the POs, it is necessary to determine if either of the two conditions that require backtracking has occurred, that is, check if the D-frontier is empty or the X-path check fails. In either case, we need to backtrack on the binary decision tree to the latest decision node. If there is an unassigned value for the current PI, we apply this value and move to the implication step. If not, we completely reverse this PI.

5. If there is any unassigned PI, we move to the first step in the flowchart and select a new PI. If there is no other PI left, which means that all possible combinations on the PIs of the circuit have been tried and a TV for detecting the fault not found, PODEM exits with failure for the current fault. We mark this fault as undetectable by PODEM and move to the next one on the list of faults. Improved versions of the simple original PODEM or other deterministic generation algorithms capable of detecting these undetected faults are mentioned in the future work section of Chapter 6. Interested readers can refer to [12] for examples of more advanced deterministic test generation algorithms.
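The flowchart steps above can be sketched as a recursive decision-tree search. This is a simplified illustration and not the original PODEM algorithm: instead of an explicit D-frontier and X-path check, it backtracks when the fault can no longer be activated or when every PO is fully determined without a discrepancy. The netlist format, the gate set and the three-input example circuit are all hypothetical.

```python
X = None  # unassigned / unknown


def gate(op, vals):
    """Three-valued evaluation of a single gate."""
    if op == "AND":
        if 0 in vals:
            return 0
        return X if X in vals else 1
    if op == "OR":
        if 1 in vals:
            return 1
        return X if X in vals else 0
    if op == "NOT":
        return X if vals[0] is X else 1 - vals[0]
    raise ValueError(op)


def simulate(netlist, pis, fault=None):
    """Implication step; fault is (line, stuck_value) or None."""
    v = dict(pis)
    if fault and fault[0] in v:
        v[fault[0]] = fault[1]
    for out, op, ins in netlist:           # netlist is in topological order
        v[out] = gate(op, [v[i] for i in ins])
        if fault and out == fault[0]:
            v[out] = fault[1]
    return v


def podem(netlist, pi_names, pos, fault, assigned=None):
    """Return a PI assignment detecting the fault, or None if none exists."""
    assigned = assigned or {}
    pis = {p: assigned.get(p, X) for p in pi_names}
    good, bad = simulate(netlist, pis), simulate(netlist, pis, fault)
    known = lambda o: good[o] is not X and bad[o] is not X
    if any(known(o) and good[o] != bad[o] for o in pos):
        return pis                         # a discrepancy reached a PO
    line, stuck = fault
    if good[line] == stuck:
        return None                        # fault not activated: backtrack
    if all(known(o) for o in pos):
        return None                        # POs settled and equal: backtrack
    free = [p for p in pi_names if p not in assigned]
    if not free:
        return None
    for value in (1 - stuck, stuck):       # 0 first for stuck-at1, 1 first for stuck-at0
        result = podem(netlist, pi_names, pos, fault, {**assigned, free[0]: value})
        if result is not None:
            return result
    return None


# Hypothetical circuit: n1 = a AND b, n2 = NOT c, z = n1 OR n2.
NETLIST = [("n1", "AND", ["a", "b"]), ("n2", "NOT", ["c"]), ("z", "OR", ["n1", "n2"])]
PIS, POS = ["a", "b", "c"], ["z"]
print(podem(NETLIST, PIS, POS, ("n1", 1)))  # {'a': 0, 'b': 0, 'c': 1}
```

In this run, the branch c = 0 fails because z settles to 1 in both circuits, so the search backtracks and succeeds with c = 1.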

3.5.1.1 An example of PODEM

Table 3.3 shows the steps in PODEM applied to the sub-circuit in Fig. 3.7 and the implication of values through the DUT, with the circled letters indicating the values defined in each step. Also, the diagram in Fig. 3.8 illustrates the binary decision tree that selects the values of the PIs, in which two backtracking steps can be seen, one related to an empty D-frontier and another to the failure of an X-path check.

Fig. 3.7. Example of PODEM

The following describes the steps shown in Table 3.3 for generating a TV for the sub-circuit in Fig. 3.7.

Fig. 3.8. Example of PODEM binary decision tree

• Step 0: all inputs are initially set to X.

• Step 1: input ‘A’ is selected and set to 0, an assignment that causes only line l1 to become 0.

• Step 2: input ‘B’ is selected and set to 0, whereby line l2 becomes 0 and l6, l7 and l9 become 1. As, at this point, the site-of-fault becomes 1 instead of the desired D̄ value, there is no D/D̄ in the circuit and, consequently, the D-frontier is empty. Therefore, it is necessary to backtrack to the previous node of the binary decision tree (Fig. 3.8).

• Step 3: the assignment made in Step 2 is reversed and ‘B’ is set to 1. The implication does not go further than l2 = 1 and l6 = 0 because the value of input ‘C’ is X.

• Step 4: input ‘C’ is set to 0 which makes l3 = 0 and, consequently, results in l7 = 0, l8 = 0, l9 = D̄, and l13 = 0. At this point, the fault is activated, the AND gate is the only gate in the D-frontier and the X-path check is successful.

• Step 5: a value of 0 is assigned to input ‘D’ which causes l4 = l10 = l11 = 0, l12 = l15 = 1 and l14 = 0. At this point, as the D-frontier is empty and the X-path check fails, it is necessary to backtrack to the previous node in the binary decision tree.

• Step 6: the value of input ‘D’ is reversed to 1 which results in l4 = l10 = l11 = 1, l12 becoming 0 and, consequently, l14 becoming D̄. At this point, the XOR gate is in the D-frontier and the X-path check is successful.

• Step 7: input ‘E’ is set to 0 which makes l15 = 0 and the value of l16 becomes D. At this point, the OR gate closest to the PO is in the D-frontier. Because l13 is already 0 from Step 4, without further action, the D value propagates to l17 and to the PO.

By traversing the binary decision tree in Fig. 3.8, it can be seen that ABCDE = {01010} is a test for the l9 stuck-at1 fault.

Table 3.3: Example of PODEM steps for l9 stuck-at1 fault (columns: Step, A, l1, B, l2, C, l3, D, l4, E, l5 to l17, Result, Next action)

3.5.2 Adaptation of PODEM for NCL (N-PODEM)

When implementing PODEM for NCL, all the rules, except one, of the original PODEM for Boolean logic explained in subsection 3.5.1 are followed. The NCL circuit is initialised with all 0 instead of all X and, for backtracking, the N- rather than X-paths are checked. The rest of this subsection explains the reasons behind this decision and provides an example of applying N-PODEM on a small NCL circuit.

For implication in PODEM, it is important to know the output of each gate in the presence of {0, 1, D, D̄, X} on each of its inputs, which can be achieved using the truth table of the gate. For example, consider the truth table of a Boolean AND gate represented in the Karnaugh map in Table 3.4. The values on the top and left-hand side show the values of inputs A and B, respectively. A Karnaugh map is chosen rather than a truth table in order to make better use of the space on this page.

B \ A    0    1    D    D̄    X
0        0    0    0    0    0
1        0    1    D    D̄    X
D        0    D    D    0    X
D̄        0    D̄    0    D̄    X
X        0    X    X    X    X

Table 3.4: Karnaugh map of Boolean AND gate

In the same way, when applying PODEM on NCL circuits, implication can be performed using the truth tables of the NCL gates but, because of their GIFs, there is an extra input for each gate, i.e., Z∗. Table 3.5 illustrates the truth table of the TH22 gate for different values of Z∗. The Karnaugh map for Z∗ = D̄ is exactly the same as that for Z∗ = D when all the D values are replaced with D̄. As can be seen, the truth table for a TH22 NCL gate when the circuit is initialised with all X is much more complicated than that of an AND gate.

Table 3.5: Karnaugh map of TH22 gate including X value: a) Z∗ = 0, b) Z∗ = 1, c) Z∗ = X, d) Z∗ = D

It is necessary to review the reasons behind using an X-path check in Boolean logic to determine if it can be applied to NCL circuits or needs to be adapted. In an X-path check of a Boolean circuit, we use X as a symbol to indicate that there are still paths from a D/D̄ in this circuit to a PO not masked by a value of 0/1. It is important that the gates on an X-path have unassigned inputs that can be used to propagate a D/D̄.

However, because of the GIFs, it is possible that an NCL gate with non-X values on all its inputs has an X value on its output. For example, as shown in Table 3.5(c), a combination for the TH22 gate of Z∗ = X, A = D and B = D̄ results in a value of X, whereby an X-path is no longer a sign of non-assigned paths in the circuit. However, using an X-path check for a circuit containing this gate could mislead us into thinking that there is at least one unassigned input of this gate that can propagate a D/D̄ to a PO. It can be seen that using an X value for an N-PODEM test of NCL not only comes at a high computational cost but also can cause confusion and erroneous results in the algorithm.

Fortunately, only a slight change in the original PODEM algorithm is required to adapt it for NCL, that is, using an N instead of X value. Like X, N is not a real physical voltage value but a concept defined for the N-PODEM algorithm. In this definition, the output of an NCL gate transitions from N to 0, 1, D or D̄ if, and only if, all its inputs have non-N values. When N is on the inputs and outputs of gates, it is interpreted as X and, when on the GIFs, as zero.

B \ A    0    1    D    D̄    N
0        0    0    0    0    N
1        0    1    D    D̄    N
D        0    D    D    0    N
D̄        0    D̄    0    D̄    N
N        N    N    N    N    N

Table 3.6: Karnaugh map of TH22 gate in N-PODEM
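The N-PODEM implication rule for the TH22 gate can be sketched as follows. The sketch assumes the usual TH22 behaviour with hysteresis, Z = AB + Z∗(A + B), and applies the rule stated above that N on an input yields N on the output while N on the GIF is treated as zero. The function names and the ASCII spelling `D'` for D̄ are choices made for this illustration.

```python
# Composite values as (good circuit, faulty circuit) pairs; "D'" stands for D-bar.
PAIR = {"0": (0, 0), "1": (1, 1), "D": (1, 0), "D'": (0, 1)}
NAME = {pair: name for name, pair in PAIR.items()}


def th22(a, b, z_prev):
    """Boolean TH22 with hysteresis: set on 11, reset on 00, otherwise hold."""
    return (a & b) | (z_prev & (a | b))


def th22_npodem(a, b):
    """N-PODEM implication for TH22 after a Null phase (GIF interpreted as 0)."""
    if a == "N" or b == "N":
        return "N"  # the output leaves N only when no input is N
    (a_good, a_faulty), (b_good, b_faulty) = PAIR[a], PAIR[b]
    return NAME[(th22(a_good, b_good, 0), th22(a_faulty, b_faulty, 0))]


symbols = ["0", "1", "D", "D'", "N"]
table = [[th22_npodem(a, b) for a in symbols] for b in symbols]
for b, row in zip(symbols, table):
    print(b.ljust(3), " ".join(cell.ljust(2) for cell in row))

print(th22(1, 0, 1), th22(1, 0, 0), th22(0, 0, 1))  # 1 0 0: hold, hold, reset
```

With the GIF fixed at zero, the gate behaves like the five-valued AND for non-N inputs, which is why this map is so much smaller than the four maps needed when X is used.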

Using an N instead of X value in N-PODEM for NCL is in line with the characteristics of NCL circuits and does not affect the structure of the NCL DUT. Firstly, as mentioned in Section 3.2, the interleaving {Null, Data} behaviour of NCL is maintained. Therefore, at the time of applying TVs to the DUT, with no need for any further action, N is the initial value before any Data TV. Unlike Boolean logic, it is a safe and correct practice for NCL circuits to be initialised with all zero values. Although in a synchronous design, initialising a circuit with all zeroes determines the Data of the circuit, in NCL, it indicates Null and has no Data meaning. Secondly, as mentioned in Chapter 2, in NCL, in each Data or Null wavefront, any non-faulty NCL gate transitions only once and to the correct value. Then, as the relevant wire retains its value until the opposite wavefront arrives, some combinations in the truth table do not occur; for example, it is impossible to have Z∗ = 1 while one of the inputs is 0.

Therefore, replacing an X value with an N one in N-PODEM makes the algorithm more accurate for NCL circuits, decreases the computational cost of the test generation algorithm and maintains the structure of the NCL DUT. Table 3.6 shows the Karnaugh map of the TH22 gate in N-PODEM. The next subsection presents an example of the application of N-PODEM on a simple NCL circuit to verify our proposed method.

Fig. 3.9. Example of N-PODEM

3.5.2.1 An example of N-PODEM

Tables 3.7 and 3.8 show the steps in N-PODEM for detecting l9 stuck-at1 and l9 stuck-at0 faults in the circuit in Fig. 3.9. This NCL circuit has a suitable level of complexity for testing because it consists of different kinds of NCL gates and, more importantly, has a reconvergent fanout. After a Null phase, ABCD = 0001 can detect a stuck-at1 fault and ABCD = 110X a stuck-at0 one on l9.

Table 3.7: N-PODEM example steps - l9 stuck-at1 (columns: Step, A, B, C, D, Z, l1 to l14, Result, Next Action)

Table 3.8: N-PODEM example steps - l9 stuck-at0 (columns: Step, A, B, C, D, Z, l1 to l14, Result, Next Action)

3.6 Experimental results

The proposed ATPG is implemented in an HDL/PLI environment [17] (see Appendix A) and simulated using Mentor Graphics’s ModelSim™. A Verilog testbench is configured and used to generate the TVs, convert them to dual-rail values, use the structure in Fig. 3.2 to control the timing of the ATPG and check the POs for discrepancies. It uses PLI [17] to inject faults into the DUT, remove faults and prepare a list of faults (fault collapsing) [17].

Firstly, a TV is generated using the $random() function in the Verilog testbench and applied to the PIs of the circuit. Then, a fault is injected into the circuit using the PLI function $faultInjection() implemented in [17]. Next, the test program waits for the CollectData/ApplyNull signal to be asserted and, if the PO values are different from the expected ones, the injected fault is marked as detected. Then, the current injected fault is removed from the circuit using the PLI function $faultRemoval() [17], the circuit is reset and another undetected fault is injected into the circuit in the presence of the current random TV. This process is repeated for each undetected fault and, after it is finished, the current TV is stored in the final test set only if it detected a sufficient number of single stuck-at faults. This significance is evaluated by a function of the current numbers of detected faults and stored TVs, with each successive TV expected to detect fewer faults than previous ones. The fault coverage increases as 1 − exp(−n) with n TVs, which means that the graph of fault coverage versus the number of TVs approaches 100% asymptotically, flattening as it gets close to 100% but never reaching it.

The results obtained from conducting the proposed ATPG on static and semi-static implementations of the selected NCL circuits are shown in Tables 3.9 and 3.10, respectively. This small number of circuits, chosen from the UNCLE asynchronous tool set [14], represents a range of circuit complexities from 4 to 2,633 gates, roughly equivalent to up to 10,000 CMOS Boolean gates. The specifications of each DUT are shown in columns 1 to 7 as: 1) name of the DUT; 2) number of gates; 3) number of PIs; 4) number of POs; 5) number of faults that can occur on the gate inputs (Inp. Faults); 6) number of faults on the GIFs (GIF faults); and 7) total number of faults. Column 8 shows the fault coverage of faults on the gate inputs (Inp. FC) and columns 9 and 10 that of the GIF faults using the golden timing (GIF-FC-GT) and mixed-timing (GIF-FC-MT) methods, respectively.
Columns 11 and 12 display the total fault coverage for each circuit using the golden timing (GT) and mixed-timing (MT) methods, respectively; for a fair comparison, the same number of TVs is applied for both. The number of the {Null, Data}/{Data, Null} TV pairs required to achieve the reported fault coverage is shown in the last column in each table. The largest test case, which has 2,633 NCL gates and 24,454 faults (including those on GIFs), has a 98.85% fault coverage using 383 TVs. Using the mixed-timing method, the discrepancies caused by Z∗ stuck-at0 faults are masked, and the fault coverage reduces to 89.03%. As the early detection of Null is considered fault detection in the golden timing method, the average fault coverage for this method becomes 98.41% for the reported circuits. When mixed timing is used, the structure shown in Fig. 3.2 eliminates delays, and the early detection of Null is masked, which reduces the average fault coverage shown in Table 3.9 to 88.98%. The reason for the different fault coverages of these methods is explained in more detail in Section 3.2. The decrease in fault coverage depends on the ratio of the GIF faults to the faults on the inputs of the gates, which becomes smaller for circuits with higher percentages of 4-input and 3-input NCL gates.

Table 3.9: Experimental results obtained from proposed random ATPG for static NCL
(Columns: Name, Gates, PIs, POs, Inp. Faults, GIF Faults, All Faults, Inp. FC (%), GIF-FC-GT (%), GIF-FC-MT (%), All-FC-GT (%), All-FC-MT (%), TV. Rows: FA, Mux2, PP-G, Count4, Count8, Mult8, Add32, Div32, Mod16, Alu16.)

Table 3.10: Experimental results obtained from proposed random ATPG for semi-static NCL
(Columns: Name, Gates, PIs, POs, Inp. Faults, GIF Faults, All Faults, Inp. FC (%), GIF-FC-GT (%), GIF-FC-MT (%), All-FC-GT (%), All-FC-MT (%), TV. Rows: FA, Mux-2, PP-G, Count-4, Count-8, Mult-8, Add-32, Div-32, Mod-16, Alu-16.)

The average fault coverages for the semi-static implementation of the selected circuits are 97.98% and 86.36% using the golden timing and mixed-timing methods, respectively. The total fault coverages shown in Table 3.10 for the semi-static implementations are always slightly smaller than those for the static ones (Table 3.9). This is because the total number of faults for the former is smaller than that for the latter, and the effect of each undetected fault is greater in the final fault coverage.

The reported number of TVs is consistent with the complexity of the circuits based on their numbers of inputs, outputs, gates and sequential feedback loops. However, these numbers could be optimised using common techniques for improving test generation algorithms. As the simple sequential circuits in the experimental results (the divider, counters and Mod16) only have one functional loop, it is easy to customise the ATPG specifically for them. In Chapter 5, we combine our proposed asynchronous DFT methods with our proposed ATPG and IDDQ ones to address more complex sequential circuits.
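The random test generation loop described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the names `random_atpg`, `detects` and `min_new` are hypothetical, `detects` stands in for one complete fault-injection simulation run (inject, apply the TV, compare the POs, remove the fault), and the acceptance rule is simplified to a fixed threshold rather than the function of detected faults and stored TVs used in this work.

```python
import random


def random_atpg(faults, detects, n_inputs, max_vectors=200, min_new=1, seed=0):
    """Keep a random TV only if it detects enough still-undetected faults.

    detects(tv, fault) is a stand-in for one fault-injection simulation run.
    min_new is a hypothetical fixed acceptance threshold.
    """
    rng = random.Random(seed)
    undetected = set(faults)
    test_set = []
    for _ in range(max_vectors):
        if not undetected:
            break
        tv = tuple(rng.randint(0, 1) for _ in range(n_inputs))
        # Try the current TV against every fault that is still undetected.
        newly_detected = {f for f in undetected if detects(tv, f)}
        if len(newly_detected) >= min_new:
            test_set.append(tv)
            undetected -= newly_detected
    coverage = 1.0 - len(undetected) / len(faults)
    return test_set, coverage


# Toy fault list: (input index, stuck value) for a 4-input circuit.
faults = [(i, v) for i in range(4) for v in (0, 1)]


def toy_detects(tv, fault):
    # Toy rule: a stuck-at fault is detected when the TV drives the
    # opposite value on that input (activation only, no propagation).
    index, stuck_value = fault
    return tv[index] != stuck_value


test_set, coverage = random_atpg(faults, toy_detects, n_inputs=4)
```

Because a TV is stored only when it detects at least `min_new` new faults, the stored test set can never be larger than the fault list, which mirrors the expectation that each successive TV detects fewer faults than the previous ones.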

3.7 Summary

Working towards the implementation of robust, reliable, energy-saving NCL circuits, this chapter presented aspects of the methods required for their automatic test generation, with the proposed ATPG differing from previous work in that it is clockless, self-timed and capable of detecting GIF faults. To configure these methods, firstly, Boolean difference calculus was used to establish that a single pair of {Null, Data} or {Data, Null} for each detectable stuck-at fault was sufficient for testing NCL circuits. Secondly, we located a certain combination of handshaking signals from the DUT and a delay-annotated golden model to effectively control the timing of our proposed ATPG. Then, the efficiency of this mixed-timing method was examined to show that, for some faults, IDDQ testing is more efficient than ATPG, with IDDQ test of NCL circuits studied in Chapter 4. Finally, the principles of the existing PODEM deterministic ATPG were examined in the context of NCL circuits. For this purpose, the concepts of the original PODEM algorithm were adapted specifically for NCL, and a slightly modified algorithm called N-PODEM was proposed. In the experimental results section, a small set of test cases with up to 2,633 gates confirmed the validity of the proposed ATPG with 98.41% and 88.98% average fault coverages using the golden timing and mixed-timing methods for static NCL circuits, respectively. We demonstrated that, in both theory and practice, our ATPG could detect single stuck-at faults with a very high level of coverage without using a clock or changing the internal data structure of NCL gates. Suggestions for improving the proposed ATPG techniques are provided in Chapter 6.

Chapter 4

IDDQ Test for Null Convention Logic

As shown in Chapter 3, faults on the internal feedback of NCL gates may not be detectable by a voltage test. This chapter describes how an IDDQ test can detect single stuck-at faults on the GIF lines of static and semi-static NCL gates. It works based on the fact that, in a defect-free CMOS circuit, there is no static path between the VDD and ground, and only a subtle leakage current exists when the circuit is in its quiescent state. Therefore, if a large leakage current, at least one order of magnitude larger than the fault-free current, exists in the CUT, the IDDQ test recognises it as a sign of fault detection. Although IDDQ testing has been known to be effective for industrial clocked circuits, it has not been extensively practised for asynchronous ones. While some work has been conducted on the IDDQ testing of 0.8 µm quasi-delay-insensitive circuits [76], to the best of our knowledge at the time of writing this dissertation, none has been undertaken on that of NCL circuits.

In this chapter, Sections 4.1 and 4.2 explain, in detail, the effects of stuck-at faults on the GIF lines of static and semi-static NCL gates, respectively, and show why IDDQ testing is required to detect some of them. Section 4.3 reports the results of applying IDDQ tests on all NCL gates and a few NCL circuits, with Hspice used to implement static and semi-static transistor-level NCL gates in (1.8 µm, 1.8 V) and (45 nm, 1.1 V) technologies, for which the VDD currents in the fault-free and faulty circuits are measured and compared. The experimental results reported in Section 4.3 show that a faulty current is orders of magnitude higher than a fault-free leakage one. This considerable difference indicates that IDDQ testing might be an efficient and low-cost method for detecting stuck-at faults on the GIF of NCL gates. Finally, Section 4.4 concludes this chapter. More detailed background information on IDDQ testing is provided in Chapter 2, subsection 2.2.4.

4.1 Behaviours of static NCL gates with faulty GIF lines

The transistor-level implementation of static NCL gates previously shown in Fig. 2.3 is repeated in Fig. 4.1 for easy reference. In this implementation, Zn∗ and Zd∗ become faulty independently of each other, forming the following four possible faults on the GIF of static NCL gates:

• stuck-at0 fault on the Null feedback line (Zn∗-at0);

• stuck-at1 fault on the Null feedback line (Zn∗-at1);

• stuck-at0 fault on the Data feedback line (Zd∗-at0); and

• stuck-at1 fault on the Data feedback line (Zd∗-at1).

Fig. 4.1. Static transistor-level implementation of NCL gates

As explained in Chapter 3, to detect a fault, a TV must be capable of activating it and then propagating it to the POs. To detect faults on the GIF of NCL gates, it must be able to set the inputs of a gate so that they propagate the activated fault from the GIF to the output of that gate. To implement the four possible GIF faults in our Hspice models of NCL gates, we need a fault model. Fig. 4.2 shows two potential models for implementing Zn∗-at1 faults. A stuck-at fault is usually modelled as a short-circuit between the site-of-fault and one of the voltage sources; i.e., the VDD for stuck-at1 and ground for stuck-at0 ones [12]. A short-circuit can be simulated by a small resistor (≤ 1 Ω) in parallel with the shorted component [95], which looks like the model in Fig. 4.2(a)¹. In an NCL gate, because of its GIF line, a fault modelled as in Fig. 4.2(a) is always automatically transferred as a stuck-at fault from the GIF to the output of the gate and, therefore, is not difficult to detect. On the other hand, in the model shown in Fig. 4.2(b), the connection between Zn∗ and Z is open, with Zn∗ stuck at the VDD and Zd∗ still connected to Z. This means that, in this model, we need to find a way of propagating the fault from the GIF to Z. As the model shown in Fig. 4.2(b) is closer to the real effect of the fault than the one in Fig. 4.2(a), we choose it as the stuck-at fault model in this work.

¹ However, in [76], the authors used 100 Ω and 10 kΩ to model short-circuits.

Fig. 4.2. Zn∗-at1 fault models

As mentioned in Chapter 2, any pipeline stage inside an NCL circuit constantly goes through monotonic changes between Null and Data, as represented in Fig. 4.3 which shows the four different phases of Null-complete, Null-to-Data, Data-complete and Data-to-Null.

Fig. 4.3. Monotonic transitions of NCL pipelines between Null and Data, in four phases

Tables 4.1 to 4.5 show the behaviours of generic static NCL gates in the absence or presence of a stuck-at fault on either of their GIF lines. The four phases of the pipeline stage are shown in the rows, and the state of each of the four transistor networks of the NCL gates and the values of their outputs (Z) in the columns. In these tables, “toN”, “HN”, “toD” and “HD” are short for “GoToNull”, “HoldNull”, “GoToData” and “HoldData”, respectively. Table 4.1 shows these values for a fault-free NCL gate, where all the networks behave as expected and drive their expected values onto the output of the gate. In Tables 4.2 to 4.5, although there is a stuck-at fault on one of the GIF lines of the NCL gate, the four transistor networks always behave correctly during the Null-complete and Data-complete phases because, during these phases, the GoToData and GoToNull networks rewrite the output. Therefore, as the values of Zn∗ and Zd∗ are irrelevant to the value of the output, the output does not reflect the presence of a fault inside the gate. Based on this analysis, when considering Tables 4.2 to 4.5, we study only the behaviour of the gate during the Null-to-Data and Data-to-Null phases.

Table 4.1: Behaviour of generic fault-free static NCL gate

               toN  HN   toD  HD   Z
Null-complete  on   on   off  off  N
Null-to-Data   off  on   off  off  N
Data-complete  off  off  on   on   D
Data-to-Null   off  off  off  on   D

Table 4.2 presents the behaviours of the transistor networks in a static NCL gate when there is a stuck-at0 fault on the Zn∗ feedback line. In this situation, because the previous value of the output during a Null-to-Data phase is 0, the stuck-at0 value on Zn∗ does not have any effect on the correct functioning of the gate. It is only in the Data-to-Null phase that, because Zn∗ has a stuck-at0 value, Zn∗ = 0 and Zd∗ = 1, and both paths to HoldNull and HoldData conduct. This causes a fight condition over setting the voltage of node M and, as a result, the value of Z.

Table 4.2: Behaviour of generic faulty static NCL gate - Zn∗-at0 fault

               toN  HN   toD  HD   Z
Null-complete  on   on   off  off  N
Null-to-Data   off  on   off  off  N
Data-complete  off  off  on   on   D
Data-to-Null   off  on   off  on   fight  Early Ptrans on

Table 4.3 presents the behaviours of the transistor networks in a static NCL gate when there is a stuck-at1 fault on the Zn∗ feedback line. In this situation, because the previous value of the output during a Data-to-Null phase is 1, the stuck-at1 value on Zn∗ does not have any effect on the correct functioning of the gate. It is only in the Null-to-Data phase that, because Zn∗ has a stuck-at1 value, Zn∗ = 1 and Zd∗ = 0, and neither of the paths to HoldNull and HoldData conducts. This causes a float condition in which no transistor network sets the voltage of node M and, as a result, the value of Z remains undetermined.

Table 4.3: Behaviour of generic faulty static NCL gate - Zn∗-at1 fault

               toN  HN   toD  HD   Z
Null-complete  on   on   off  off  N
Null-to-Data   off  off  off  off  float  Early Ptrans off
Data-complete  off  off  on   on   D
Data-to-Null   off  off  off  on   D

Table 4.4 presents the behaviours of the transistor networks in a static NCL gate when there is a stuck-at0 fault on the Zd∗ feedback line. In this situation, because the previous value of the output during a Null-to-Data phase is 0, the stuck-at0 value on Zd∗ does not have any effect on the correct functioning of the gate. It is only in the Data-to-Null phase that, because Zd∗ has a stuck-at0 value, Zn∗ = 1 and Zd∗ = 0, and neither the path to HoldNull nor that to HoldData conducts. This causes a float condition in which no transistor network sets the voltage of node M and, as a result, the value of Z remains undetermined.

Table 4.4: Behaviour of generic faulty static NCL gate - Zd∗-at0 fault

               toN  HN   toD  HD   Z
Null-complete  on   on   off  off  N
Null-to-Data   off  on   off  off  N
Data-complete  off  off  on   on   D
Data-to-Null   off  off  off  off  float  Early Ntrans off

Table 4.5 presents the behaviours of the transistor networks in a static NCL gate when there is a stuck-at1 fault on the Zd∗ feedback line. In this situation, because the previous value of the output during a Data-to-Null phase is 1, the stuck-at1 value on Zd∗ does not have any effect on the correct functioning of the gate. It is only in the Null-to-Data phase that, because Zd∗ has a stuck-at1 value, Zn∗ = 0 and Zd∗ = 1, and both paths to HoldNull and HoldData conduct. This causes a fight condition over setting the voltage of node M and, as a result, the value of Z.

Table 4.5: Behaviour of generic faulty static NCL gate - Zd∗-at1 fault

               toN  HN   toD  HD   Z
Null-complete  on   on   off  off  N
Null-to-Data   off  on   off  on   fight  Early Ntrans on
Data-complete  off  off  on   on   D
Data-to-Null   off  off  off  on   D
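The case analysis behind Tables 4.1 to 4.5 can be reproduced with a small behavioural model. The sketch below is a simplified illustration under stated assumptions, not the Hspice model used in this work: the function name `static_gate_row` and the fault encoding are hypothetical, the HoldNull path is assumed to conduct when Zn∗ = 0, the HoldData path when Zd∗ = 1, and the complete phases are taken directly from the fault-free rows because the GoToNull and GoToData networks rewrite the output there.

```python
# Rows for the two phases in which GoToNull / GoToData rewrite the output.
COMPLETE_PHASES = {
    "Null-complete": ("on", "on", "off", "off", "N"),
    "Data-complete": ("off", "off", "on", "on", "D"),
}

# Value held on the output (and on both feedback lines) before each transition.
PREVIOUS_Z = {"Null-to-Data": 0, "Data-to-Null": 1}


def static_gate_row(phase, fault=None):
    """Return (toN, HN, toD, HD, Z) for one pipeline phase.

    fault is None or a pair such as ("Zn", 0) for a Zn*-at0 fault.
    """
    if phase in COMPLETE_PHASES:
        return COMPLETE_PHASES[phase]

    zn = zd = PREVIOUS_Z[phase]
    if fault is not None:
        line, stuck_value = fault
        if line == "Zn":
            zn = stuck_value
        else:
            zd = stuck_value

    hold_null = zn == 0   # assumed condition for the HoldNull path
    hold_data = zd == 1   # assumed condition for the HoldData path

    if hold_null and hold_data:
        z = "fight"       # both hold paths drive node M
    elif not hold_null and not hold_data:
        z = "float"       # no network drives node M
    else:
        z = "N" if hold_null else "D"

    def on_off(conducting):
        return "on" if conducting else "off"

    return ("off", on_off(hold_null), "off", on_off(hold_data), z)
```

For example, `static_gate_row("Data-to-Null", ("Zn", 0))` yields the fight row of Table 4.2, while the same fault in the Null-to-Data phase yields the fault-free row.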

Any fight or float situation leaves point M at a voltage that is a function of transistor sizing and the circuit’s parasitic and coupling capacitors. Then, the output (Z) may have either a weak or strong logic voltage because of the gate’s analogue voltage gain. A weak output voltage occurs when Vtn < VM < VDD − |Vtp|. If VM < Vtn or VM > VDD − |Vtp|, the output (Z) receives a strong voltage and may become a stuck-at0 or stuck-at1 fault. For a weak voltage, and even sometimes a strong one, a voltage test fails to detect the fault and another testing technique is required.

We describe the behaviours of gates in the presence of each GIF fault in subsections 4.1.1 to 4.1.4, and include possible methods for testing each fault. In each of Figs. 4.4 to 4.8, the dotted line next to the inverter of the static NCL gate indicates the path between the VDD and ground which causes an elevation in the faulty IDDQ compared to the fault-free current.

Because of the similar ways in which Zn∗-at0 and Zd∗-at1 faults affect the behaviours of NCL gates, the descriptions of them in subsections 4.1.1 and 4.1.2, respectively, are symmetrical with minor but significant differences, as are those in subsections 4.1.3 and 4.1.4.
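The three voltage regions of node M used throughout subsections 4.1.1 to 4.1.4 can be written as a short classification function. This is a minimal sketch: the function name `inverter_response` and the numeric values (VDD = 1.8 V, Vtn = 0.5 V, Vtp = −0.5 V) are illustrative assumptions rather than parameters taken from this work, and the boundary cases VM = Vtn and VM = VDD − |Vtp|, which the text leaves open, are arbitrarily assigned to the weak region.

```python
def inverter_response(v_m, v_dd, v_tn, v_tp):
    """Classify the output inverter's response to the voltage on node M."""
    upper = v_dd - abs(v_tp)
    if v_m < v_tn:
        # Only the PMOS transistor conducts: Z is driven to a strong 1.
        return "strong-1"
    if v_m > upper:
        # Only the NMOS transistor conducts: Z is driven to a strong 0.
        return "strong-0"
    # Both transistors conduct: a VDD-to-ground path elevates the IDDQ.
    return "weak"


# Illustrative (assumed) operating point, not measured data.
V_DD, V_TN, V_TP = 1.8, 0.5, -0.5

low = inverter_response(0.2, V_DD, V_TN, V_TP)
middle = inverter_response(0.9, V_DD, V_TN, V_TP)
high = inverter_response(1.6, V_DD, V_TN, V_TP)
```

Only the "weak" outcome guarantees an elevated quiescent current; the two strong outcomes may or may not be visible to a voltage test, depending on the fault and the phase in which it is activated.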

4.1.1 Stuck-at0 fault on Null feedback line (Zn∗-at0)

A stuck-at0 fault on the Null feedback line (Zn∗-at0) of an NCL gate is activated when the inputs to that gate transition from Data to Null (Table 4.2). In this case, as the previous value of Z is 1, the value of Zd∗ is also 1. As a result, both pull-up and pull-down networks conduct and fight over setting a value for the internal node (M) (Fig. 4.4). Then, the voltage of M depends on the strengths of these networks, which are functions of the transistor parameters and coupling capacitors.

Fig. 4.4. IDDQ test for static NCL gates - Zn∗-at0 fault

If VM < Vtn, only the PMOS transistor in the inverter conducts and transitions Z to a value of 1. As this prevents Null-completeness, which results in a system halt, the fault is detectable by a voltage test. If VM > VDD − |Vtp|, only the NMOS transistor of the inverter conducts and transitions Z to a value of 0. Since Zn∗-at0 is activated when the inputs of the gate transition from Data to Null, as discussed in the example in Chapter 3, subsection 3.4.1, this situation results in the early detection of Null which is not detectable using a voltage test. Finally, if Vtn < VM < VDD − |Vtp|, as shown in Fig. 4.4, both the NMOS and PMOS transistors in the inverter conduct and create a path between the VDD and ground which elevates the IDDQ and is detectable by an IDDQ test.

The IDDQ of a 32-bit NCL adder with an injected Zn∗-at0 fault can be compared with that of a golden model of the same adder in the graph in Fig. 4.6.

4.1.2 Stuck-at1 fault on Data feedback line (Zd∗-at1)

A stuck-at1 fault on the Data feedback line (Zd∗-at1) of an NCL gate is activated when the inputs to the gate transition from Null to Data (see Table 4.5) and, as the previous value of Z or Z∗ is 1, Zd∗ has the same value. As a result, both the pull-up and pull-down networks conduct and fight over setting a value for the internal node (M) (Fig. 4.5). In this situation, the voltage of M depends on the strengths of these networks, which are functions of the transistor parameters and coupling capacitors.

Fig. 4.5. IDDQ test for static NCL gates - Zd∗-at1 fault

If VM < Vtn, only the PMOS transistor conducts and transitions Z to a value of 1 when 0 is expected, so a voltage test can detect this fault. If VM > VDD − |Vtp|, only the NMOS transistor conducts and transitions Z to a value of 0. A Zd∗-at1 fault is activated when the inputs to the gate transition from Null to Data. Therefore, unlike in the case of the Zn∗-at0 fault, a value of 0 propagates to the POs when a value of 1 is expected, and a voltage test is capable of detecting this fault. Finally, if Vtn < VM < VDD − |Vtp|, as shown in Fig. 4.5, both the NMOS and PMOS transistors in the inverter conduct and create a path between the VDD and ground which elevates the IDDQ and is detectable by an IDDQ test (Fig. 4.6).

4.1.3 Stuck-at0 fault on Data feedback line (Zd∗-at0)

A stuck-at0 fault on the Data feedback line (Zd∗-at0) of an NCL gate is activated when the inputs to that gate transition from Data to Null (Table 4.4) and, as the previous value of Z is 1, Zd∗ has the same value. As a result, neither the pull-up nor pull-down network conducts and, as shown in Fig. 4.7, node M floats.

Fig. 4.6. IDDQ graphs of golden, and Zn∗-at0 and Zd∗-at1 faults

Fig. 4.7. IDDQ test for static NCL gates - Zd∗-at0 fault

In this situation, the previous values of Z, Zd∗ and M are 1, 1 and 0, respectively. If M retains its value of 0, this causes Z to keep its value of 1 which, if propagated to the POs, can prevent Null-completeness and halt the system, thereby enabling fault detection. However, because of parasitic leakage, the voltage of node M is likely to increase after a while and, if it reaches Vtn < VM < VDD − |Vtp|, both the NMOS and PMOS transistors in the inverter conduct and create a path between the VDD and ground. As a result, the IDDQ gradually increases, as shown in Fig. 4.9 and, after a certain point, the fault becomes detectable by an IDDQ test.

4.1.4 Stuck-at1 fault on Null feedback line (Zn∗-at1)

A stuck-at1 fault on the Null feedback line (Zn∗-at1) of an NCL gate is activated when the inputs to that gate transition from Null to Data (Table 4.3) and, as the previous value of Z is 0, Zd∗ has the same value. As a result, neither the pull-up nor pull-down network conducts and, as shown in Fig. 4.8, node M floats, with the previous values of Z, Zd∗ and M being 0, 0 and 1, respectively.

Fig. 4.8. IDDQ test for static NCL gates - Zn∗-at1 fault

As node M retains its high voltage through the surrounding coupling capacitors, which can result in a value of 0 when 1 is expected on the POs, the fault is detectable by a voltage test. However, after a while, M starts losing its charge and its voltage eventually drops to Vtn < VM < VDD − |Vtp|. Then, as both the NMOS and PMOS transistors in the inverter conduct and create a path between the VDD and ground, and the IDDQ gradually increases, as shown in Fig. 4.9, after a certain point, the fault becomes detectable by an IDDQ test.

From the discussion of the Zn∗-at0 and Zd∗-at1 faults, it can be seen that a balanced transistor sizing, where the pull-up and pull-down networks are equally strong, not only equalises the gates’ rise and fall delays but also makes all the GIF faults on the static NCL gates testable by an IDDQ test.

Fig. 4.9. IDDQ graphs of golden, and Zd∗-at0 and Zn∗-at1 faults

4.2 Behaviours of semi-static NCL gates with faulty GIF lines

The transistor-level implementation of semi-static NCL gates previously shown in Fig. 2.4 is repeated in Fig. 4.10 for easy reference.

Fig. 4.10. Semi-static transistor-level implementation of NCL gates

In this implementation, a GIF line can have two different stuck-at faults that form two possible faults on the GIF lines of semi-static NCL gates:

• a stuck-at0 fault on the feedback line, i.e., GIF-at0; and

• a stuck-at1 fault on the feedback line, i.e., GIF-at1.

For the same reason as choosing the fault model in Fig. 4.2(b) for the static implementation of NCL gates (Section 4.1), the model in Fig. 4.11(b) is used to represent faults on the GIF lines of semi-static NCL gates.

Fig. 4.11. Models of GIF-at0 and GIF-at1 faults

The following subsections explain the behaviours of semi-static NCL gates under the influence of each of GIF-at0 and GIF-at1 faults in more detail.

4.2.1 Stuck-at0 on GIF line (GIF-at0)

A stuck-at0 fault on the GIF of a semi-static NCL gate (GIF-at0) causes a weak value of 1 on the internal node (M). If neither the GoToNull nor the GoToData network conducts, the gate is in a hold condition and the GIF-at0 fault is activated only if the gate transitions from Data to Null. In this case, fault-free and faulty gates drive values of 1 and 0, respectively, onto the output, which causes an early detection of Null, as explained in Chapter 3, subsection 3.4.1, and the fault is not detectable by a voltage test. If the GoToNull network conducts, Z is correctly set to Null, which does not activate the GIF-at0 fault, and the fault will not be detected by a voltage test. It is only when the GoToData network conducts that both it and the weak inverter conduct and fight over setting the value of node M, as shown in Fig. 4.12, in which the red dotted line next to the strong inverter indicates the path between the VDD and ground that causes increases in the IDDQ. If Vtn < VM < VDD − |Vtp|, both the NMOS and PMOS transistors in the strong inverter conduct and create a path between the VDD and ground which elevates the IDDQ, and the fault becomes detectable by an IDDQ test (see Fig. 4.14).

Fig. 4.12. IDDQ test for semi-static NCL gates- GIF-at0 fault

4.2.2 Stuck-at1 on GIF line (GIF-at1)

A stuck-at1 fault on the GIF of a semi-static NCL gate (GIF-at1) causes a weak value of 0 on the internal node (M). Then, if neither the GoToNull nor the GoToData network conducts, the gate is in a hold condition and the GIF-at1 fault is activated only if the gate transitions from Null to Data. In this case, as fault-free and faulty models of the gate drive values of 0 and 1, respectively, onto the output, this fault is detectable by a voltage test. If the GoToNull network conducts, Z is correctly set to Null and, as the GIF-at1 fault is not activated, it is not detectable by a voltage test. Also, when the GoToData network conducts, as both it and the weak inverter conduct, they fight over setting the value of node M, as shown in Fig. 4.13, in which the red dotted line indicates the path between the VDD and ground that causes increases in the IDDQ. If Vtn < VM < VDD − |Vtp|, both the NMOS and PMOS transistors in the strong inverter conduct and create a path between the VDD and ground which elevates the IDDQ, and the fault becomes detectable by an IDDQ test (see Fig. 4.14).

Fig. 4.13. IDDQ test for semi-static NCL gates- GIF-at1 fault

It can be seen in Figs. 4.6, 4.9 and 4.14 that different stuck-at faults cause the IDDQ to elevate to different absolute values, each of which is a complex function of the type of gate and stuck-at0/stuck-at1 fault. As explained and shown in the experimental results in Tables 4.6 and 4.7, a stuck-at0 fault on the GIF of a semi-static NCL gate (GIF-at0) is only detectable by an IDDQ test whereas, given the right TV, a GIF-at1 fault is detectable by both voltage and IDDQ tests. The above discussion shows that all single stuck-at faults on the GIF lines of static and semi-static NCL gates are detectable using an IDDQ test. Although this does not imply that the fault coverage of an IDDQ test for all NCL circuits is necessarily 100%, it means that there is no NCL GIF fault that is intrinsically undetectable by an IDDQ test. However, as IDDQ fault coverage varies based on the size and complexity of the NCL circuit, increasing it requires careful IDDQ test generation.

Fig. 4.14. IDDQ graphs of golden, and GIF-at0 and GIF-at1 fault models

4.3 Experimental results

To evaluate our theory in practice, we apply an IDDQ test on all 27 state-holding NCL gates (Tables 4.6 and 4.7) and a few NCL circuits with up to 2,633 NCL gates (Table 4.8).

To demonstrate that an IDDQ test of NCL circuits works for a range of feature sizes, we implement NCL gates using Hspice in (1.8 µm, 1.8 V) and (45 nm, 1.1 V) technologies for both static and semi-static models. Since the results obtained for these two technologies are very similar, we focus on analysing Table 4.7 which contains information on the newer one (45 nm).

Table 4.7 shows the results obtained from the IDDQ test for single NCL gates, where the parameters chosen for the NMOS transistors are VTO=0.623, W=90 nm and L=45 nm, and for the PMOS ones, VTO=-0.587, W=225 nm and L=45 nm. Therefore, the P-to-N width ratio is 5/2. This ratio is chosen to equalise the gates’ rise and fall delays because balancing these delays, as explained in Section 4.1, has a positive effect on detecting more GIF faults. Interested readers can refer to Appendix B for more detail on the effect of transistor sizing on NCL gates. In Table 4.7, ‘V’ means that a fault causes different voltage values on the faulty and fault-free circuits, with the digital 1/0 discrepancy resulting in detection of the fault. On the other hand, ‘I’ shows that there is a considerable difference between the IDDQ of the fault-free and faulty circuits and, therefore, the fault is detected by an IDDQ test. Finally, ‘B’ indicates that both voltage and IDDQ testing are able to detect the fault. The measured IDDQ for all the fault-free NCL gates is in the range of pA while that for the faulty ones varies between nA and µA. For all types of NCL gates, the measured IDDQ of the faulty ones is at least three orders of magnitude larger than that of the fault-free ones, a significant difference that could be a reliable signal of the presence of a fault in a gate.
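The pass/fail decision implied by this comparison can be expressed as a simple threshold on the ratio of the measured to the fault-free quiescent current. The sketch below is illustrative only: the function name `iddq_flags_fault` and the sample currents are hypothetical values chosen to fall in the pA and nA ranges mentioned above, not measurements from Tables 4.6 and 4.7.

```python
def iddq_flags_fault(i_measured, i_fault_free, min_orders=1):
    """Flag a fault when the measured quiescent current exceeds the
    fault-free leakage by at least `min_orders` orders of magnitude."""
    return i_measured >= i_fault_free * 10 ** min_orders


# Hypothetical sample currents in amperes (not thesis data).
golden_leakage = 5e-12     # fault-free gate, pA range
faulty_current = 2e-8      # faulty gate, nA range
normal_variation = 7e-12   # fault-free gate with slightly higher leakage

detected = iddq_flags_fault(faulty_current, golden_leakage)
detected_3_orders = iddq_flags_fault(faulty_current, golden_leakage, min_orders=3)
false_alarm = iddq_flags_fault(normal_variation, golden_leakage)
```

A margin of three orders of magnitude leaves room to raise `min_orders` well above 1, which makes the decision less sensitive to ordinary leakage variation between fault-free gates.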

In this work, the patterns for IDDQ testing are generated using random test generation, with the pattern for detecting a given fault identified using a fault dictionary built during the process of test generation for the proposed IDDQ testing. Most TVs stored for this purpose can detect multiple single stuck-at faults as an IDDQ test needs only a few TVs to detect all the single stuck-at faults on the GIF of an NCL gate. We use at least a 1 ms wait (see the “wait” rows in Table 4.7) after changing the inputs and before measuring the IDDQ to ensure that all the current spikes in the circuit have dissipated. Because the numbers of transistors and their placements in the gate change the sizes and behaviours of the coupling and parasitic capacitors of the gate, different waiting times are required for different types of NCL gates. The shorter waiting time for a semi-static gate is because both GIF-at0 and GIF-at1 faults cause a fight situation on its internal node. This fight situation results in an immediate jump in the IDDQ whereas, in a static gate, Zn∗-at1 and Zd∗-at0 faults cause a float situation on its internal node which requires time to slowly change the previous voltage of that node (M) to Vtn < VM < VDD − |Vtp|.

Table 4.8 shows the results obtained from the IDDQ testing of a small number of sample NCL circuits chosen from the UNCLE asynchronous tool set [14] that represent a range of circuit complexities from 4 to 2,633 gates, approximately equivalent to up to 10,000 CMOS Boolean gates, and performed for both static and semi-static implementations of the selected circuits. The number of single stuck-at faults detectable in each circuit is shown along with the number of faults on the GIFs, numbers of PIs and POs, the fault coverages achieved by applying a conventional ATPG on non-GIF faults (ATPG-FC) and IDDQ test on GIF faults (IDDQ-FC), the total fault coverage and total test time in milliseconds. The “Static” and “Semi” columns show the relative values for the static and semi-static implementations of the NCL circuits, respectively. Combining all these circuits into a complete system results in an average fault coverage of 98.81% and 98.53% for the static and semi-static implementations, respectively. The reason for the total fault coverages for semi-static implementations always being slightly smaller than those for the static ones is that the total number of faults in the former is smaller than that in the latter. Consequently, the effect of each undetected fault, including in the ATPG phase, is higher in the final fault coverage. The testing times reported for the semi-static circuits are also smaller than those for the static ones, which is due to not only the lower number of faults but also the lower waiting times required to detect each fault in semi-static gates, as shown in Table 4.7. The testing times for both implementations could be easily improved because all the GIF stuck-at1 faults and almost all the Zd∗-at1 ones are detectable by the proposed ATPG, which is faster than the IDDQ testing conducted. However, for the sake of relatively easy analysis, we perform ATPG for only faults on the inputs of the NCL gates and IDDQ for all the GIF faults.
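The way the ATPG and IDDQ results combine into a total fault coverage, and why the same number of undetected faults weighs more heavily in a smaller fault list, can be illustrated with a short calculation. The function name `total_fault_coverage` and the fault counts below are hypothetical and are not taken from Table 4.8.

```python
def total_fault_coverage(input_faults, input_detected, gif_faults, gif_detected):
    """Total coverage (%) when ATPG targets the gate-input faults and
    the IDDQ test targets the GIF faults."""
    detected = input_detected + gif_detected
    return 100.0 * detected / (input_faults + gif_faults)


# Hypothetical circuit with five undetected input faults in both versions.
# The larger fault list stands in for a static implementation and the
# smaller one for a semi-static implementation.
larger_list = total_fault_coverage(800, 795, 200, 200)
smaller_list = total_fault_coverage(400, 395, 100, 100)
```

With these assumed numbers, five undetected faults cost 0.5 percentage points out of 1,000 faults but 1.0 percentage point out of 500, which is the effect described above for the semi-static implementations.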

4.4 Summary

This chapter examined using an IDDQ test to detect faults on the GIF lines of NCL gates. In developing this method, firstly, the behaviours of static and semi-static NCL gates in the presence of each of the possible faults on their GIFs were analysed (Sections 4.1 and 4.2, respectively). Then, to support this theoretical analysis, Hspice was used to simulate IDDQ measurements for fault-free and faulty versions of all the static and semi- static state-holding NCL gates (Section 4.3) which were implemented in 1.8 µm and 45 nm predictive processes with power supplies of 1.8V and 1.1V, respectively. The experi- mental results indicated that a faulty IDDQ current was orders of magnitude higher than a fault-free leakage current which meant that fault detection for NCL using this method would be reliable. A small set of test circuits with up to 2,633 gates confirmed the valid- ity of the proposed method with average fault coverages of 99.72% and 99.44% for the GIF of static and semi-static implementations, respectively. It was shown that the semi-

§4.4 Summary 89 TH24

I I I I 1 1 B V m TH44w3

Table 4.6: Voltage and IDDQ tests of static and semi-static NCL gates - measured IDDQ (VDD = 1.8 V, tech = 1.8 µm, Lmin = 1.8 µm)

Table 4.7: Voltage and IDDQ tests of static and semi-static NCL gates - measured IDDQ (VDD = 1.1 V, tech = 45 nm, Lmin = 45 nm)

Notes for Tables 4.6 and 4.7:
* I: fault detected by IDDQ test
* V: fault detected by voltage test (discrepancy in digital 0/1)
* B: fault detected by both I and V
* the “wait” rows show the waiting time (in milliseconds) after each change of input
* IDDQ in a fault-free circuit is in the range of pA (at the time of test) and in a faulty circuit in the range of nA or µA
* the faulty leakage current is at least 3 orders of magnitude larger than the fault-free leakage current
* gates covered include: TH22, TH33, TH44, TH34w22, TH23, TH44w2, TH54w322, TH34w2, TH34w3, TH44w22, TH54w22, TH54w32, THand0, TH23w2, TH33w2, TH24w2, TH44w322, TH24Comp, THxor0, TH34, TH24w22, TH34w32, TH44w3, TH24

Table 4.8: Experimental results obtained from traditional ATPG + IDDQ tests for NCL circuits
(Columns: Circuit Specs (Name, #Gates, #PIs, #POs), Static/Semi, #GIF-Faults, #All-Faults, ATPG-FC(%), #IDDQ-FC(%), #TOTAL-FC(%), #Test-Time(ms). Circuits: FA, Mux-2, PP-G, Count-4, Count-8, Mult-8, Add-32, Div-32, Mod-16, Alu-16.)

static NCL circuits took less time to test but resulted in slightly lower fault coverage; otherwise, the results were consistent for both types of implementation. Together, these techniques represent the first steps towards developing a complete fault coverage system for NCL. The IDDQ test method proposed in this chapter and the self-timed ATPG proposed in Chapter 3 resulted in average fault coverages of 98.84% and 98.52% for static and semi-static implementations of several NCL circuits, respectively.

Chapter 5 explains the proposed DFT techniques studied with the aim of improving the fault coverage and testing time of each of the proposed test methods. As previously mentioned, based on our ATPG and IDDQ test techniques, no DFT is required to test the faults on the GIF lines of NCL gates. Therefore, the DFT techniques proposed in Chapter 5 focus on decreasing the sequential complexity caused by functional loops in an NCL circuit, exactly as DFTs are used for sequential synchronous circuits.

Chapter 5

Design for Test for Null Convention Logic

A DFT is a test technique that integrates hardware into the original DUT to make it more testable. In NCL circuits, as in synchronous ones, there are hard-to-test parts, particularly because of their functional loops (FLs). Due to its special structure, an NCL circuit is hazard-free, glitch-less and has no metastability issues, and it is important to maintain this structure during testing so that these characteristics are retained. As existing test methods for NCL are synchronous, they need to interface the synchronous and asynchronous parts of a circuit with double latching to reduce the risk of metastability issues. Previous work shows that this double latching can lead to a 37% area overhead [63] and introduce a large and otherwise unnecessary clock-tree network into an asynchronous CUT. Also, it can only reduce the probability of metastability issues, not eliminate them. Therefore, an NCL CUT becomes less reliable because of this added synchronous test hardware.

This chapter explains our proposed asynchronous DFT techniques for NCL circuits, with background material on DFT provided in Chapter 2, subsection 2.2.3. In Section 5.1, a clockless NCL-based test-point structure implemented using the mixed-timing method proposed in Chapter 3, Section 3.2 is discussed. In Section 5.2, a similar approach for developing a clockless scan cell for NCL circuits is presented, and an asynchronous interleaved scan architecture (AISA) specifically designed for NCL circuits is established. Using this scan chain architecture, the normal Null/Data flow in an NCL DUT can be maintained during test. Therefore, as it does not require a clock signal in the testing procedure, it is a potentially great candidate for developing the internal structure of a BIST for NCL circuits, as discussed in Section 5.3. Finally, the experimental results are analysed in Section 5.4 and a summary of this chapter is provided in Section 5.5.

5.1 Test point insertion for NCL circuits

Clocked versions of controllability points (CPs) and observability points (OPs) are already parts of the current DFT hardware used for an NCL-based design [62], [63], [71] but, as clocked test hardware is not an appropriate choice for NCL circuits, an asynchronous alternative would be more suitable. As discussed in Chapter 2, any data applied to the PIs of an NCL circuit flows freely inside the circuit and is not locked in any element. As shown in Fig. 2.11, the efficient implementation of CPs and OPs requires flip-flop-like elements. Therefore, when designing these DFT elements for NCL, we need to develop a structure that is clockless and capable of holding the test data for later on-demand use. Fig. 5.1 shows a read/write memory element in NCL [16] which receives the data, stores it and only dispatches it when the read control signal is active.

Fig. 5.1. NCL read/write memory [16]

In this study, we use this element as the basis of our proposed NCL DFT elements. As shown in Fig. 5.1, it consists of the separate read and write cycles shaded on the upper right- and left-hand sides, respectively. These two cycles enable control of the storage and consumption of the data that passes through the NCL circuit, where required. The read and write control signals are applied to the memory through the control section shown at the bottom of Fig. 5.1.

5.1.1 Controllability point for NCL circuits

In this study, we design an NCL-based CP by modifying the read/write memory shown in Fig. 5.1, with its timing controlled by the handshaking signals of the combinational circuit of the same pipeline stage. In the NCL CP shown in Fig. 5.2, the grey logic elements are the glue logic added to convert the read/write memory to a CP and the grey boxes marked as THXor0 are NCL threshold gates that behave like multiplexers. In the functional mode, these gates take the data generated by combinational block ‘A’ and pass it to block ‘B’. In the test mode, they bypass the circuit’s original value generated by block ‘A’ and push the test data into combinational block ‘B’.

Fig. 5.2. NCL controllability point

The test controller of a chain of cascaded CPs has a two-phase procedure. Phase one is shift-in, which pushes the test data into the chain of cascaded CPs. It uses two shift enablers, each of which is connected to the read port of one CP cell and the write port of its two adjacent ones. This interleaving arrangement enables the test data to travel through the cascaded cells from left to right without any data being lost. Phase two operates the circuit in the test mode, whereby the test data stored in the CP chain is applied to the circuit to control the values of the hard-to-control parts of the DUT.

The proposed DFT elements are implemented in Verilog and evaluated through a simulation using Mentor Graphics ModelSim™ (see Section 5.4). A sample waveform of the test controller is presented in Fig. 5.3, where three CPs are added to a 4 × 4 multiplier. When the test-enable signal (ten) is zero, the shift-in phase begins by setting the two shift-enable signals and, when it is set to one, the test data is applied to the circuit (phase two). For comparison, one CP-inserted and one original circuit are instantiated in the HDL testbench. The value of Data1 (or “10”) is inserted into the CP points, with the effect shown on the fifth bit of the outputs (the fm and gm signals for the DFT-inserted and original circuits, respectively).

Fig. 5.3. Waveform of CP insertion (three cascaded CPs)

5.1.2 Observability point for NCL circuits

NCL-based OPs are also designed by modifying the read/write memory shown in Fig. 5.1, with the timing of each controlled by the handshaking signals of the combinational circuit of the next pipeline stage. The flow of data in an OP chain is shown in Fig. 5.4. The grey logic elements are the glue logic added to convert the read/write memory to an OP, and the THXor0 gates behave as multiplexers that either collect the observability data of combinational cloud ‘A’ (in the test mode) or pass the collected data through the cascaded OP cells (in the shift mode).

Fig. 5.4. NCL observability point

The test controller of our proposed OPs is similar to that of the NCL CPs. However, for OPs, phase one is the test phase and phase two the shift-out. Also, one of the shift enablers is connected to the read ports of all the OPs and the other to their write ports. This difference ensures that the collected data shifts out from the last OP cell to the first without any of it being lost.

The flow of test data through three cascaded OPs is shown in the timing diagram in Fig. 5.5. The vertical axis indicates the flow in the time domain and the horizontal axis that in the spatial domain (the data shifts through the cascaded OPs). The data collected in OP1 is highlighted by a grey border to illustrate its flow.

Fig. 5.5. Timing diagram of OP insertion (three cascaded OPs)
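The shift-out order of an OP chain can be pictured with a minimal Python sketch. It is only a behavioural abstraction under stated assumptions: each OP cell is reduced to a write slot and a read slot, the two shift enablers are modelled as the hypothetical methods `read_all` and `write_all`, and Null wavefronts and handshaking are left out, so it does not represent the gate-level design.

```python
from typing import List, Optional


class OPChain:
    """Behavioural abstraction of cascaded OP cells (assumed model, not the NCL netlist)."""

    def __init__(self, n_cells: int) -> None:
        self.write_slot: List[Optional[str]] = [None] * n_cells  # stored data
        self.read_slot: List[Optional[str]] = [None] * n_cells   # dispatched data

    def capture(self, observed: List[str]) -> None:
        """Test phase: every cell stores the value it observes in the DUT."""
        self.write_slot = list(observed)

    def read_all(self) -> None:
        """First shift enabler: all read ports dispatch their stored data."""
        self.read_slot = list(self.write_slot)
        self.write_slot = [None] * len(self.write_slot)

    def write_all(self) -> Optional[str]:
        """Second shift enabler: every cell stores its left neighbour's data.

        The value leaving the last cell is returned as the serial output.
        """
        out = self.read_slot[-1]
        self.write_slot = [None] + self.read_slot[:-1]
        self.read_slot = [None] * len(self.read_slot)
        return out


def shift_out(chain: OPChain) -> List[str]:
    """Alternate the two enablers until every captured value has left the chain."""
    collected = []
    for _ in range(len(chain.write_slot)):
        chain.read_all()
        value = chain.write_all()
        if value is not None:
            collected.append(value)
    return collected


chain = OPChain(3)
chain.capture(["op1", "op2", "op3"])
result = shift_out(chain)
print(result)
```

Because all read ports fire before any write port, no cell overwrites data that has not yet been dispatched, which is the property the separate read and write enablers are meant to provide in this simplified view.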

5.2 Scan insertion for NCL circuits

The general schematic of FLs in NCL-based designs is depicted in Fig. 5.6 in which each of the three dotted red rectangles represents a registration stage. In the process of forming an FL in an NCL circuit, at least three registration stages are needed to maintain the interleaving Null/Data behaviour of the circuit.

Fig. 5.6. Functional loop in NCL design

As in a synchronous design, the ideal way of dealing with the complexity that an FL adds to testing is to break it by replacing its registers with scan chain ones. For this purpose, we firstly design a proper scan cell for an NCL circuit (subsection 5.2.1), then propose a scan chain architecture suitable for it (subsection 5.2.2) and, finally, explain its control and timing (subsection 5.2.3).

5.2.1 NCL scan cell for NCL circuits

By adding some extra logic to the control lines of the read/write memory shown in Fig. 5.1, it can be transformed into a scan cell element. This scan cell can replace the three registers of the FL in Fig. 5.6 and break this loop at the time of testing. One scan cell is sufficient to replace all three registration stages in an FL because it already has these stages that maintain the interleaved Null/Data behaviour of an NCL circuit. In Fig. 5.7, the bold gates show the original read/write memory and the grey ones the extra logic added to that memory [16] to turn it into a scan cell at the time of testing. The shaded rectangle on the left-hand side indicates the “write” part of the scan cell that stores the test data while that on the right-hand side highlights the “read” part that, when notified by the control signals, consumes the stored test data and prepares it for reading.

Fig. 5.7. NCL scan cells

The extra control inputs added to the read/write memory include both test and shift enables from the test controller as well as handshaking signals from the DUT block in which the scan chain is to be inserted. When the shift mode is enabled, the serial data is shifted in and out of the scan chain while the test enable controls the insertion of the scan data into the DUT and vice versa.

5.2.2 Asynchronous interleaved scan architecture for NCL circuits (AISA)

After designing a scan cell that fits the special characteristics of NCL circuits, it is necessary to define a suitable scan chain architecture for these circuits. Fig. 5.8 shows the flow of data in an NCL pipeline in which, as mentioned in Chapter 2, each NCL combinational block is surrounded by two NCL registration stages. Because there is no clock in an NCL pipeline, to avoid data overlapping and glitches, one set of Null is applied to the PIs between every two consecutive sets of Data.

Fig. 5.8. Monotonic changes of Null and Data in NCL pipeline stages

If a traditional scan architecture is used to test NCL circuits, the adjacent registration stages must be connected to each other. When Data is pushed into this scan chain, all the registration stages become Data, which interferes with the required interleaved flow of Null/Data in an NCL pipeline. As a result, a traditional scan architecture can halt the system (Fig. 5.9). Similarly, when Null is inserted into an NCL circuit using a traditional scan architecture, all the registration stages are filled with Null and the CUT halts (Fig. 5.10).

Fig. 5.9. Using traditional scan architecture for NCL - all registers have Data

Fig. 5.10. Using traditional scan architecture for NCL - all registers have Null

Inspired by [64], we developed an asynchronous interleaved scan architecture that maintains the natural flow of Null/Data in the NCL DUT. As shown in Fig. 5.11, the input registration stage of each combinational block is connected to the output registration stage of the next block and its output registration stage is connected to the input registration stage of the previous block. This interleaved structure allows the correct flow of Null/Data through the registration stages of an NCL pipeline. This architecture needs two TPGs, one of which fills one scan chain with Null and the other the second scan chain with Data. As a result, the DUT maintains the required Null/Data flow, even in the presence of this scan architecture.
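The difference between the two scan arrangements can be illustrated with a minimal Python sketch. It assumes a simplified view in which each registration stage holds a single wavefront value and the two interleaved chains cover alternate registration stages; the function names are hypothetical and are not part of the implemented hardware.

```python
from typing import List

NULL, DATA = "N", "D"


def is_interleaved(stages: List[str]) -> bool:
    """True if no two adjacent registration stages hold the same wavefront."""
    return all(a != b for a, b in zip(stages, stages[1:]))


def traditional_scan_fill(n_stages: int, value: str) -> List[str]:
    """One chain through every registration stage: all stages get the same value."""
    return [value] * n_stages


def interleaved_scan_fill(n_stages: int) -> List[str]:
    """Two chains over alternate stages (assumed): one filled with Data, one with Null."""
    return [DATA if i % 2 == 0 else NULL for i in range(n_stages)]


traditional = traditional_scan_fill(4, DATA)
interleaved = interleaved_scan_fill(4)
print(traditional, is_interleaved(traditional))
print(interleaved, is_interleaved(interleaved))
```

In this abstraction the single chain leaves every stage holding Data, so the alternation check fails, whereas the two-chain fill keeps Data and Null in adjacent stages.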

The main difference between our scan architecture and the one in [64] is in the connectivity between the scan cells. As the scan cell in [64] only has either Data or Null at any given time, the connections in its scan architecture have to go back and forth between two consecutive registration stages to create the interleaving Null/Data flow in the scan chain. In our work, by contrast, as mentioned in subsection 5.2.1, each proposed scan cell is designed to maintain the interleaved Null/Data behaviour of an NCL circuit inside itself. Therefore, the natural {Null, Data, Null, Data,...} flow of NCL is preserved in the scan chain by simply connecting all the scan cells of a registration stage together and only linking the last scan cell in the input registration stage of one combinational block to the first scan cell in the output registration stage of the next combinational block.

Fig. 5.11. Interleaved scan structure for NCL

In the AISA, the TPGs and ORAs can be configured as either of the two structures shown in Fig. 5.12 which are modified versions of the well-known RTS [13] and STUMPS [13] BIST architectures (more details can be found in Chapter 2, subsection 2.2.3.3).

Fig. 5.12. AISA for NCL BIST: (a) RTS; and (b) STUMPS

5.2.3 Timing and control of scan chain for NCL circuits

This subsection discusses how the proposed interleaved scan architecture works during a test session. As explained in Chapter 3, Section 3.2, there are four possible ways of controlling the timing of an NCL DUT using:

1. a clock signal;

2. the timing of the DUT;

3. the timing of a golden model; and

4. a mixed-timing model (a combination of the DUT and golden model)

For more details of the benefits and drawbacks of each of these options, please refer to Chapter 3, Section 3.2, in which it is determined that the most reliable option is the mixed-timing model. A scan-inserted DUT can either be tested by an ATE or be part of a BIST architecture. In the latter case, as the timing of a golden model may not be available, the test’s timing must be controlled using only the timing of the DUT. Although this option involves the risk of faulty handshaking signals, this is no different from the risk of faulty clock signals controlling the timing of a BIST for a synchronous circuit.

The flow of test data in the scan cells is illustrated in Table 5.1, where each row represents one phase in the data flow, columns 2 and 3 the values stored in the input and output registration stages, respectively, and columns 4 and 5 the signals from the test controller for controlling the timing of the AISA, and the TPGs and ORAs, respectively.

Table 5.1: Control and data flow of NCL AISA

Phase          Input Reg     Output Reg    Internals            Externals
               r     w       r     w
TV Shift-in    N     D       N     N       shift, func, test    TPG1, TPG2
TV Apply       D     N       N     N       test, func, shift    TPG1, TPG2
Ckt Function   D     N       N     D       func, test, shift    TPG1, TPG2
TV Shift-out   N     N       D     N       shift, func, test    ORA1, ORA2

(r: read part of the scan cell; w: write part; N: Null; D: Data. In the Internals column, the signal listed first is the one enabled in that phase.)

• Phase 1: the first row in Table 5.1 is illustrated in Fig. 5.13 which shows the scan cells replacing the pipeline registers of an NCL combinational block. In it, the data generated by TPG1 is shifted to the scan cells of the input registration stage and forms {Null,Data} sets for the {read,write} parts of each scan cell. At the scanning time, TPG2 is disabled and its Null value is pushed into the output scan registers, which results in {Null,Null} sets. Also, in this phase, the circuit’s functional and test modes are disabled and shifting is enabled to allow the test vectors to be pushed into the DUT.

• Phase 2: the second row in Table 5.1 shows the “TV Apply” phase in which both TPGs are disabled, no more test vectors are shifted to the AISA and the test mode is enabled to apply the scanned-in test data to the DUT.

• Phase 3: in the third row in Table 5.1, the DUT’s normal functional mode is enabled so that it can use the test data and produce output responses.

• Phase 4: the fourth row in Table 5.1 shows the “TV shift-out” phase when the data collected by the output scan registers is shifted out to be analysed by ORA2.
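The four phases above can be written down as a small control table. The following Python sketch is an illustration only: the register values are taken as read from Table 5.1 and the phase descriptions, while the `Phase` type, the `control_signals` helper and the one-hot treatment of the shift/test/func enables are assumptions made for this example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Phase(NamedTuple):
    name: str
    input_reg: Tuple[str, str]   # (read part, write part); "N" = Null, "D" = Data
    output_reg: Tuple[str, str]  # (read part, write part)
    active_mode: str             # which of shift/test/func is enabled (assumed one-hot)


# Values as read from Table 5.1; one entry per phase of a test cycle.
AISA_PHASES: List[Phase] = [
    Phase("TV Shift-in",  ("N", "D"), ("N", "N"), "shift"),
    Phase("TV Apply",     ("D", "N"), ("N", "N"), "test"),
    Phase("Ckt Function", ("D", "N"), ("N", "D"), "func"),
    Phase("TV Shift-out", ("N", "N"), ("D", "N"), "shift"),
]


def control_signals(phase: Phase) -> Dict[str, bool]:
    """Expand the active mode into the three internal enable signals."""
    return {mode: (mode == phase.active_mode) for mode in ("shift", "test", "func")}


for phase in AISA_PHASES:
    print(phase.name, phase.input_reg, phase.output_reg, control_signals(phase))
```

Reading the list top to bottom shows the test data moving from the write part to the read part of the input registers, then appearing in the write part of the output registers, and finally in their read part for shift-out.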

In the next test cycle, TPG1 must be disabled and TPG2 enabled to allow the intended interleaved structure. Over the course of the operation of the AISA, the TV shift-out phase of one registration stage overlaps with the TV shift-in phase of the next. As previously mentioned, scan chains can be part of a circuit tested by an ATE or part of the internal structure of a BIST architecture. The next section explains the concepts involved in designing a BIST for an NCL circuit.

Fig. 5.13. Data application and output collection in NCL AISA

5.3 NCL Built-in Self-test (BIST)

The equation for each LFSR cell illustrated in Chapter 2, Fig. 2.16 is given in equation (5.1), while equations (5.2) and (5.3) give the dual-rail encoded versions required for the NCL representation of each cell. To expand the Dt and Df equations, the required minterms are added to the sum-of-products to form the equations for an input-complete and observable circuit.

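$$D[i] = Q[i+1] \oplus (Q[0] \cdot P[i]) \tag{5.1}$$

$$\begin{aligned}
D_t[i] = {} & Q_t[0]\,Q_f[i+1]\,P_t[i] + Q_f[0]\,Q_t[i+1]\,P_t[i] \\
& + Q_t[0]\,Q_t[i+1]\,P_f[i] + Q_f[0]\,Q_t[i+1]\,P_f[i]
\end{aligned} \tag{5.2}$$

$$\begin{aligned}
D_f[i] = {} & Q_t[0]\,Q_t[i+1]\,P_t[i] + Q_f[0]\,Q_f[i+1]\,P_t[i] \\
& + Q_t[0]\,Q_f[i+1]\,P_f[i] + Q_f[0]\,Q_f[i+1]\,P_f[i]
\end{aligned} \tag{5.3}$$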
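As a cross-check, the dual-rail equations (5.2) and (5.3) can be compared exhaustively with the Boolean equation (5.1). The Python sketch below is a minimal illustration, assuming each dual-rail signal is modelled as a (true rail, false rail) pair of bits; the function names are hypothetical.

```python
from itertools import product


def lfsr_cell(q0: int, q_next: int, p: int) -> int:
    """Equation (5.1): D[i] = Q[i+1] xor (Q[0] and P[i])."""
    return q_next ^ (q0 & p)


def rails(bit: int):
    """Dual-rail encoding of a Boolean value as (true rail, false rail)."""
    return (1, 0) if bit else (0, 1)


def lfsr_cell_dual_rail(q0, q_next, p):
    """Equations (5.2) and (5.3) evaluated on (true, false) rail pairs."""
    q0t, q0f = q0
    qnt, qnf = q_next
    pt, pf = p
    dt = (q0t & qnf & pt) | (q0f & qnt & pt) | (q0t & qnt & pf) | (q0f & qnt & pf)
    df = (q0t & qnt & pt) | (q0f & qnf & pt) | (q0t & qnf & pf) | (q0f & qnf & pf)
    return dt, df


# Compare the dual-rail result with the Boolean reference for every Data input.
mismatches = []
for q0, qn, p in product((0, 1), repeat=3):
    expected = rails(lfsr_cell(q0, qn, p))
    actual = lfsr_cell_dual_rail(rails(q0), rails(qn), rails(p))
    if actual != expected:
        mismatches.append((q0, qn, p))

# With every input at Null (both rails 0), the output is also Null.
null = (0, 0)
null_out = lfsr_cell_dual_rail(null, null, null)
print(mismatches, null_out)
```

Every minterm contains one rail of each of the three inputs, so an output rail can only be asserted once all inputs carry Data, which is the input-completeness property referred to in the text.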

Fig. 5.14 shows the gate-level implementation of this input-complete and observable circuit using NCL threshold gates. Although Fig. 5.14 illustrates only the combinational part required for each LFSR cell, registration stages like those shown in Fig. 5.15 must also be added to allow the correct flow of Null/Data wavefronts in the LFSR. The seed of an LFSR is applied to the designed hardware using TH22n and TH22d gates in the registration stages, which are similar to the TH22 gate but with initial values of 0 and 1, respectively.

Fig. 5.14. Combinational part of asynchronous LFSR

Fig. 5.15. Asynchronous LFSR

The dual-rail encoded equations for the combinational part of a configurable MISR are shown in equations (5.5) and (5.6) (data from the DUT is represented by V[n-1..0]).

The NCL implementation is similar to that of an LFSR except that TH44 gates are used to form the minterms and each sum-of-products uses two TH14 gates and one TH12 gate. The resultant combinational circuit is input-complete and observable.

$$D[i] = (Q[0] \cdot P[i]) \oplus V[i] \oplus Q[i+1] \tag{5.4}$$

$$\begin{aligned}
D_t[i] = {} & Q_t[0]\,P_t[i]\,V_t[i]\,Q_t[i+1] + Q_f[0]\,P_f[i]\,V_f[i]\,Q_t[i+1] \\
& + Q_f[0]\,P_t[i]\,V_f[i]\,Q_t[i+1] + Q_t[0]\,P_f[i]\,V_f[i]\,Q_t[i+1] \\
& + Q_t[0]\,P_t[i]\,V_f[i]\,Q_f[i+1] + Q_f[0]\,P_f[i]\,V_t[i]\,Q_f[i+1] \\
& + Q_f[0]\,P_t[i]\,V_t[i]\,Q_f[i+1] + Q_t[0]\,P_f[i]\,V_t[i]\,Q_f[i+1]
\end{aligned} \tag{5.5}$$

$$\begin{aligned}
D_f[i] = {} & Q_f[0]\,P_f[i]\,V_f[i]\,Q_f[i+1] + Q_f[0]\,P_f[i]\,V_t[i]\,Q_t[i+1] \\
& + Q_f[0]\,P_t[i]\,V_f[i]\,Q_f[i+1] + Q_f[0]\,P_t[i]\,V_t[i]\,Q_t[i+1] \\
& + Q_t[0]\,P_f[i]\,V_f[i]\,Q_f[i+1] + Q_t[0]\,P_f[i]\,V_t[i]\,Q_t[i+1] \\
& + Q_t[0]\,P_t[i]\,V_f[i]\,Q_t[i+1] + Q_t[0]\,P_t[i]\,V_t[i]\,Q_f[i+1]
\end{aligned} \tag{5.6}$$
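The minterms of the dual-rail MISR equations can also be enumerated mechanically from equation (5.4), which gives a simple way to cross-check equations (5.5) and (5.6) term by term. The Python sketch below is an illustration under stated assumptions: V[i] denotes the DUT data bit, the helper names are hypothetical, and splitting the eight minterms of each rail into two groups of four is only one possible mapping onto the two TH14 gates mentioned in the text.

```python
from itertools import product


def misr_cell(q0: int, p: int, v: int, q_next: int) -> int:
    """Equation (5.4): D[i] = (Q[0] and P[i]) xor V[i] xor Q[i+1]."""
    return (q0 & p) ^ v ^ q_next


# Name templates in the same order as the factors in (5.5) and (5.6).
NAMES = ("Q{r}[0]", "P{r}[i]", "V{r}[i]", "Q{r}[i+1]")


def minterm(bits) -> str:
    """Write one input combination as a product of true ('t') or false ('f') rails."""
    return "".join(n.format(r="t" if b else "f") for n, b in zip(NAMES, bits))


dt_terms, df_terms = [], []
for bits in product((0, 1), repeat=4):
    (dt_terms if misr_cell(*bits) else df_terms).append(minterm(bits))

# Each minterm maps to one TH44 gate; each rail then ORs its eight minterms,
# here grouped as two sets of four (TH14) combined by one TH12 (assumed grouping).
dt_groups = [dt_terms[:4], dt_terms[4:]]
df_groups = [df_terms[:4], df_terms[4:]]

print("Dt:", " + ".join(dt_terms))
print("Df:", " + ".join(df_terms))
```

Each rail receives exactly eight of the sixteen minterms and no minterm appears in both lists, so the two output rails can never be asserted at the same time.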

Both the LFSR and MISR cells designed in this chapter are run-time configurable and their polynomials can be changed by the BIST controller during a test session. As can be seen in the experimental results, these cells impose very high area overheads when a configurable BIST architecture is required. For an NCL DUT that does not require a configurable BIST, a simpler combinational circuit for LFSR and MISR cells can be achieved by replacing Pt[i] and Pf[i] in equations (5.2) to (5.6) with 1 and 0, respectively.

5.3.1 Control structure (BIST controller)

As explained in Chapter 3, Section 3.2, there are four possible ways of controlling the timing of an NCL DUT, that is, using: 1) a clock signal; 2) the timing of the DUT; 3) the timing of a golden model; and 4) a mixed-timing model which is a combination of the DUT and golden model and is the most reliable option. However, because there is no ATE in a BIST, the delay-annotated golden model of the DUT needs to be stored in an on-chip memory. Based on the DUT and the BIST architecture designed for it, this memory may or, more likely, may not be available for storing the timing information of the golden model. If short of memory, it will be necessary to rely on the handshaking signals of the DUT (option 2) for BIST timing. Although, as mentioned in Chapter 3, Section 3.2, this option involves the risk of faulty handshaking signals, this risk is no different from that of faulty clock signals controlling the timing of a BIST for a synchronous circuit.

Also, in Chapters 3 and 4, it is mentioned that there are faults inside NCL gates that cannot be detected using a voltage test. To detect them, a complete BIST solution will need a built-in current sensor (BICS), as explained in the section on future work in Chapter 6.

5.4 Experimental results

The DFT elements proposed in this chapter are implemented in Verilog using a back-annotated NCL library. Their functionality and delays are evaluated by simulation using Mentor Graphics ModelSim™. The delays in the NCL library are extracted from a low-power predictive technology model (LP PTM) [96] in 45 nm, 1.1 V technology.

The specifications of the designed NCL-based DFT elements are listed in Table 5.2, in which the first three rows contain information about the delays and areas of the normal NCL storage elements used as references for both comparison with our proposed DFT elements and calculations of overheads. The remaining rows show details of the delays and areas of the proposed CP, OP and scan, LFSR and MISR cells. “Config.” in rows 7 and 8 means that the polynomial of the LFSR/MISR is configurable and “Fix” in rows 9 and 10 that it is fixed. Columns 2, 3 and 4 show the number of NCL gates and the numbers of transistors in the static and semi-static implementations of each element, respectively.

Of the CP, OP and scan cell, the last has the highest transistor count. However, it should be noted that, as a scan cell usually replaces the three registration stages in FLs, it imposes 154 and 116 transistor overheads for static and semi-static implementations, respectively. On the other hand, the CP and OP are usually either added to the combinational logic and cause a transistor overhead equal to all their transistors or replace a single registration stage and create 202/142 and 206/142 transistor overheads, respectively. As previously mentioned, a scan cell is the most commonly used DFT element [12]. Therefore, it is advantageous that it has the smallest transistor overhead among our proposed DFT elements. The area of LFSR and MISR cells with a fixed polynomial is slightly larger than that of the other DFT elements and, for configurable LFSR and MISR cells, in particular, is quite large. These configurable elements are suitable for run-time reconfigurable DUTs and for build-time configurations in BIST optimisation algorithms in which the configurability feature does not add to the area overhead.

The last four columns in Table 5.2 show the delays of the proposed DFT elements and reference memory elements. Tsh is short for Tshift and Tdt for Tdata and, in some DFT elements, Tdt has separate values for write and read actions which indicate the amount of time (in picoseconds) it takes to write data from the DUT into the DFT elements or read it from the DFT element into the DUT. For the read/write memory, CP and OP, the delay in shifting the data (Tsh) is always larger than Tdt because shifting uses both the read and write parts of the hardware whereas Tdt always relates to either reading or writing. However, for a scan cell, Tsh is smaller than Tdt because of the shortcut on top of the scan cell that skips two stages of the read/write memory when shifting. While this saves time, the data is still passed through three registration stages and, as a result, maintains the Null/Data behaviour of NCL.

Table 5.2: Delay and area of NCL storage elements and the proposed asynchronous DFT elements
(Columns: Storage/DFT Element, Gate Count, Tr. Count (Static, Semi), Tdt (ps) (Static, Semi), Tsh (ps) (Static, Semi). Rows: Single Reg, FL Reg, R/W Mem, CP, OP, Scan Cell (for FL), LFSR Cell (Config), MISR Cell (Config), LFSR Cell (Fix), MISR Cell (Fix).)

Table 5.3 shows the results of scan insertion in a number of NCL circuits with up to 3,425 NCL gates, which is approximately equal to 15,000 Boolean gates, taken from the open-source asynchronous CAD tool UNCLE [14]. The proposed ATPG used to evaluate the scan insertion is implemented in an HDL/PLI environment [17] (see Appendix A) and simulated using Mentor Graphics ModelSim™. ATPG vectors are generated using a pseudo-random test generation technique and the mixed-timing method explained in Chapter 3, Section 3.2. The number of faults in Table 5.3 indicates only the non-GIF ones. As explained in Chapters 3 and 4, in this study, a combination of clockless, DFT-less and self-timed ATPG and IDDQ tests is used to detect GIF faults. The NCL gates for this evaluation are static and each can have four GIF faults (see Chapter 4, Section 4.1). The last column in this table is a fitness function (inspired by the genetic algorithm) designed to compare test scenarios, the value of which is calculated by equation (5.7). It is a very simple fitness function that can be replaced by more complicated equations based on the trade-offs of each particular DUT.

$$\text{impact factor} = \frac{\text{fault coverage improvement} \times \text{test vector count improvement}}{\text{transistor overhead}} \tag{5.7}$$
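Equation (5.7) can be used to rank candidate scan-insertion strategies for a given DUT. The following Python sketch is a minimal illustration only: the function name, the two candidate strategies and all of their numbers are hypothetical, and the way each improvement is normalised (here, as a ratio relative to the DUT without DFT, with the overhead as a fraction of the original transistor count) is an assumption rather than the definition used to produce the reported results.

```python
def impact_factor(fc_improvement: float,
                  tv_improvement: float,
                  transistor_overhead: float) -> float:
    """Equation (5.7): product of the two improvements divided by the overhead."""
    if transistor_overhead <= 0:
        raise ValueError("transistor overhead must be positive")
    return fc_improvement * tv_improvement / transistor_overhead


# Hypothetical candidates: (fault coverage improvement,
#                           test vector count improvement,
#                           transistor overhead as a fraction)
candidates = {
    "scan all FLs": (1.05, 1.2, 0.70),
    "scan two FLs": (1.07, 1.5, 0.18),
}

# Rank the candidates from the highest to the lowest impact factor.
ranked = sorted(candidates,
                key=lambda name: impact_factor(*candidates[name]),
                reverse=True)

for name in ranked:
    print(name, round(impact_factor(*candidates[name]), 2))
```

With these illustrative numbers, the strategy with fewer scan cells ranks first because its much smaller overhead outweighs the other terms, which mirrors the kind of trade-off the fitness function is intended to expose.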

For a simple analysis, all the registration stages in the FLs of the NCL circuits are replaced with NCL scan cells. The results shown in Table 5.3 could be improved by more intelligent choices of scan insertion; for example, the Count-8 circuit would benefit more from a scan cell insertion on its most significant rather than least significant bit. Therefore, instead of replacing all eight FLs of this circuit with scan cells, we can choose to only replace the seventh and eighth. As a result, the area overhead is reduced from 71.5% to 17.8% and a 100% fault coverage is achieved by 72 test vectors, with the impact factor of the scan chain increasing from 1.1 to 2.04.

In a few circuits in Table 5.3, most of the registration stages are non-FL registers; for example, the FIFO circuit has eight FIFO stages and, as a result, needs several test vectors to reach good fault coverage. These stages are implemented as NCL registration ones but only those that use the FIFO’s state machine are in FLs. Again, in this case, our general rule of replacing just the FLs with scan chains is not a suitable solution. Therefore, it is reasonable to consider replacing the normal registration stages of a FIFO circuit with NCL scan cells. When this happens, the transistor overhead is 161%, with 100% fault coverage and only 34 test vectors, a testing strategy which provides an impact factor of 3.5. However, when only the registration stages of the middle FIFO stage are replaced with scan cells, the transistor overhead is 22.49% with 138 test vectors, giving a 99.14% fault coverage and an impact factor of 6.9. Obviously, there are many other combinations of the area overhead, fault coverage and test vector count that will result in different impact factors. It is the responsibility of the relevant test engineer to recognise the trade-offs for any given DUT and choose the most suitable test strategy.

Table 5.3: Fault coverage and transistor count overhead (compared to the fault coverage and transistor count of the original DUT) for the proposed scan architecture
(Columns: Circuit, Gates, Tr.s, Faults, PIs, POs, All Regs, FL Regs, Scan Cells, Tr. Ov(%), FC-no DFT, FC-DFT, TV-no DFT, TV-DFT, Imp. Factor. Circuits: Upmod-10, Count-8, FIFO, Div-32, Mod-16, GCD-16, DSP-32.)

Combining all these circuits into a complete system results in an average transistor overhead of 11.95% which can be contrasted with that in [74] of 23.38%. Also, in our study, the average fault coverage before DFT insertion is 96.01% which improves to 99.28% after scan insertion. The average impact factor for replacing all FL registers of these circuits with scan cells is 6.71. As previously mentioned regarding the Count-8 and FIFO circuits, these results can be easily improved by making more intelligent decisions about the numbers and locations of scan insertions. The most important outcome is that it is shown to be possible to have an effective clockless design for test hardware for NCL circuits. As a result, an NCL DUT can maintain its great characteristics and remain free from glitches, hazards and metastability issues.

5.5 Summary

Having asynchronous design for test elements that match the special features of NCL is critical and even more crucial when integrating an online self-test into the NCL DUT. Any clocked test hardware closely incorporated into an NCL circuit may seriously de- crease the reliability of the NCL DUT and make it highly likely that its glitch-less, hazard-free and metastability-free behaviour is compromised. This chapter proposed asynchronous DFT elements that not only eliminate the risk of glitches and metastability issues but also show improvements in delays and area overheads compared with previous work. Firstly, elements such as an asynchronous CP, OP and scan cell were designed and implemented using an NCL structure to improve the testability of the NCL DUT. Then, asynchronous interleaved scan architecture was specially designed to enable this DUT to retain its interleaving Null/Data behaviour during testing. Also, it was shown how the normal timing of the NCL DUT could be used to control the DFT’s timing just as the internal clock of a DUT is used for a DFT of a synchronous DUT. The elements required to implement BIST architecture for an NCL DUT were also designed, with the proposed AISA used for its internal and a TPG and ORAs for its external structure. Furthermore, it was explained how the timing and control methods described in Chapter 3, Section 3.2 could apply to an NCL BIST. In the experimental results, the delay times of the DFT elements were reported using an NCL gate library with back-annotated delays extracted from physical-level simula- tions. The area overheads, which were calculated based on transistor count overheads, were reported to be, on average, 11.95% for several CUTs, much less than that of 28.33% reported in previous work [74]. Furthermore, unlike that in [74], our proposed method did not impose a clock tree or extensive synchronous/asynchronous interfacing using double latches. 
As a result, our proposed DFT elements are considered suitable for integration into a DUT to form the internal structure of an online BIST for NCL circuits without compromising any of NCL’s benefits.

Chapter 6

Conclusions and Recommendations for Future Work

The objective of this thesis was to provide a complete test and testability strategy for the physical testing of the highly robust asynchronous design technique NCL. This is important because, although the potential of asynchronous designs to implement low-power and highly reliable circuits is understood, the industry has not paid sufficient attention to consolidating the related design and test techniques for them. This has undermined the popularity of asynchronous design and created a vicious cycle whereby their designs and test techniques will not improve unless designers invest in them. This thesis is a step towards breaking this cycle and providing solid test and testability techniques for NCL to encourage more electronics engineers to consider this advantageous design technique for optimising their circuits. Providing test techniques that match the structure and behaviour of NCL circuits includes finding means of:

• managing the timing/control of a test technique without a clock signal and in spite of the lack of deterministic timing in the NCL DUT; and

• handling the high statefulness of NCL circuits at both their gate and system levels.

Previous test techniques have been developed for NCL and other delay-insensitive asynchronous systems. However, there are no studies in the literature that provide clockless test techniques that achieve high fault coverage while considering GIF faults and that do not interfere with the internal structures of NCL gates. One of the most significant original contributions of this work is the development of self-timed clockless test techniques for NCL systems. Previous studies used clocked hardware to control the timing of test methods for NCL circuits [62], [63], which could introduce metastability issues into otherwise highly robust NCL devices. By exploring possible options for controlling this timing, a clockless mixed-timing technique was identified in Chapter 3. This method was developed by eliminating the delay between the handshaking signals of a delay-annotated golden model of an NCL circuit and those of the actual DUT. This was the result of noting the delay-elimination characteristics of NCL circuits based on the fact that the output of an input-complete and observable NCL circuit is independent of the order and timing of its inputs. This solves one of the main challenges of testing NCL circuits and enables future studies to avoid using clocked test techniques for them.

Boolean difference calculus was utilised for the first time in this work to determine the required values for the inputs of NCL gates during fault propagation. Also, a new method for generating deterministic test patterns for NCL circuits, called N-PODEM, was developed. Firstly, the behaviour of NCL gates in the presence of the logic values considered in deterministic ATPGs, {0, 1, D, D̄, X}, was studied and the need to modify the traditional PODEM and adapt it for NCL was identified. This procedure included replacing the X values with N ones for NCL circuits, providing truth tables for NCL gates with {0, 1, D, D̄, N} values and, finally, proposing the N-PODEM. This sets the groundwork for future studies and offers suggestions for modifying traditional deterministic ATPG algorithms.

A careful analysis of the behaviour of NCL gates in the presence of a GIF fault was performed to identify suitable means of detecting such a fault. As a result, an IDDQ test was considered a complementary tool to a voltage test for achieving a higher than 99% fault coverage for GIF faults in NCL circuits. Hspice models of NCL gates were then implemented before performing IDDQ tests on various NCL circuits in Chapter 4.
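The use of the Boolean difference for fault propagation, mentioned above, can be illustrated with a small sketch. It is not the thesis's implementation: the helper name `boolean_difference` is hypothetical, and the gate is modelled only by the set condition of a 2-of-3 threshold function with hysteresis ignored, which is a simplifying assumption. A fault on input i is observable at the output exactly when F(x_i = 0) XOR F(x_i = 1) equals 1.

```python
from itertools import product

def boolean_difference(f, n_inputs, i):
    """Return the assignments of the other inputs for which f is sensitive to input i."""
    sensitising = []
    for others in product((0, 1), repeat=n_inputs - 1):
        lo = others[:i] + (0,) + others[i:]   # input i forced to 0
        hi = others[:i] + (1,) + others[i:]   # input i forced to 1
        if f(*lo) ^ f(*hi):                   # dF/dx_i = F(x_i=0) XOR F(x_i=1)
            sensitising.append(others)
    return sensitising

def th23_set(a, b, c):
    # Set condition of a 2-of-3 threshold function; hysteresis is ignored (assumption).
    return int(a + b + c >= 2)

# Values of (b, c) that propagate a fault on input a to the output.
print(boolean_difference(th23_set, 3, 0))
```

For this function, a fault on the first input propagates only when exactly one of the other two inputs is asserted, which is the kind of side-input requirement a deterministic ATPG has to justify.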

Before this research was conducted, only one study [76] had applied an IDDQ test to 0.8 µm non-NCL QDI circuits and, to the best of our knowledge, ours is the only work in the literature that has examined using an IDDQ test for NCL circuits. Moreover, our study investigated an IDDQ test for 45 nm NCL gates, a much newer technology than that in [76], which, given the importance of feature size in IDDQ testing, adds to the significance of our work. Furthermore, unlike [76], we performed IDDQ tests on NCL gates without interfering with their internal structures. Our experimental results for various NCL circuits indicated that the IDDQ in a faulty circuit was orders of magnitude higher than that in a fault-free one, which means that an IDDQ test would be a reliable method for detecting faults in NCL circuits.

Another novel aspect of this work was that, in Chapter 5, a clockless test point and scan insertion method was implemented in NCL circuits using an NCL read/write memory to lock the free-flowing NCL data. The proposed NCL-based scan cell was then used to implement an asynchronous interleaved scan architecture that maintained the behaviour of the NCL DUT at the time of testing. This architecture was then incorporated in multiple NCL circuits and assessed by calculating its effects on the fault coverage, testing time and area overhead. This was determined by defining a genetic algorithm-inspired fitness function in order to calculate the efficiency associated with different scan-insertion scenarios for NCL DUTs. For a case in which all the DUTs had full scans, the results were aggregated and averaged, which led to averages of 99.28% and 11.95% for the fault coverage and area overhead, respectively, with the latter much better than the 28.33% reported in previous work [74]. The proposed asynchronous test techniques and DFT elements not only eliminated the risk of glitches and metastability issues but also showed improvements in the delays, area overheads and fault coverages over those in previous work:

• The fault coverages obtained included the faults on GIF lines and were achieved without changing the internal structures of the NCL gates. As a result, our methods had at least 13% higher fault coverages than those of the methods presented in [62], [63], [64], [65].

• By keeping the NCL gate structure intact, we avoided the delay overhead of up to 50% reported in previous work [65].

• The average area overhead of the proposed DFT elements was calculated to be 11.95%, much smaller than that of 28.33% in previous work [63].

While there is a great deal of room for improving the outcomes of this study, the techniques presented provide a solid basis for developing the clockless test and testability of NCL systems, which can help NCL, and asynchronous designs in general, gain the interest of the electronics industry.

6.1 Future work

The following subsections discuss the limitations of the test techniques proposed in Chapters 3, 4 and 5, respectively, and provide suggestions for improving them.

6.1.1 Techniques for improving ATPG

This study has laid the groundwork for testing NCL circuits by developing a time management system and handling faults on the GIF lines of NCL gates. However, there is room to improve the implemented techniques. For a random ATPG, the common techniques of advanced test generation algorithms can be used to achieve a lower test vector count and testing time with the same fault coverage. Also, efforts must be undertaken to make sequential test generation techniques available for NCL circuits to minimise the use of DFT. Finally, deterministic test generation algorithms other than PODEM should be considered for the testing of NCL circuits. As PODEM is a basis for most deterministic ATPG algorithms, in this work, we implemented N-PODEM as a proof-of-concept to show that, through a slight modification, deterministic ATPGs could be applied to NCL circuits. Although PODEM is very limited even for synchronous circuits, many algorithms have increased its effectiveness by adding various heuristic speed-ups, such as FAN (Fan-Out Oriented) [97], TOPS (TOPological Search algorithm) [98], SOCRATES (Structure-Oriented Cost-Reducing Automatic TESt pattern generation system) [99] and TRAN (TRANsitive closure algorithm) [100]. Therefore, a suggestion for future work on the test and testability of NCL circuits is to apply the principles proposed in this work to the aforementioned more advanced ATPG techniques.

6.1.2 Techniques for improving IDDQ testing

At smaller process feature sizes, the leakage current of a fault-free circuit becomes higher and less predictable. Therefore, differentiating the IDDQ of a defective circuit from a naturally high leakage current becomes more difficult. Furthermore, as a single fault has a smaller effect on the IDDQ in a larger circuit, it is much harder to detect. The graph in Fig. 6.1 shows the upper bounds on the gate counts of NCL circuits testable by the traditional IDDQ test. In this work, the value of the IDDQ at the occurrence of a single stuck-at fault on the GIF line of a 45 nm NCL circuit was simulated to be 40 µA. As the IDDQ of a fault-free circuit must be at least one order of magnitude smaller than that of a faulty one for the fault to be detected by an IDDQ test [13], the fault-free IDDQ of the circuit had to be smaller than 4 µA.

Fig. 6.1. Upper bounds of sizes of NCL circuits testable by traditional IDDQ test

The simulated IDDQ values of the NCL circuits in Chapter 4, Table 4.8 show a linear relationship with the gate count of the fault-free circuit, as plotted in Fig. 6.1. The trend line of this graph crosses the maximum value of the fault-free IDDQ at approximately 23,000 gates, with a similar approach resulting in 22,000 gates for a semi-static implementation. Using the same method, we calculated the upper limits for the 32 nm, 22 nm and 16 nm LP PTMs [96] and found that, by the 16 nm node, the upper limit on the testable size had dropped to approximately 6,000 gates. Therefore, technology-aware methods for optimising the IDDQ test method will certainly be necessary for nodes at and below 16 nm. One example of such a method is power-gating, in which a large circuit is partitioned into smaller blocks with separate power supplies, each of which can be switched off using a low-leakage switch [88]. Considering the calculated upper limits shown in Fig. 6.1, an NCL circuit can be partitioned into blocks of 22,000 gates or fewer for a power-gating IDDQ test. Furthermore, it is possible to partition the DUT into smaller blocks using built-in current sensors (BICSs) [87], which provide highly sensitive current measurements, decrease the waiting time required for the IDDQ current to settle before being measured and improve the accuracy of a test as they are less susceptible to noise.
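The bound and the partitioning described above follow from simple arithmetic, which the sketch below reproduces. It assumes a fault-free leakage that grows linearly with gate count and passes through the origin. The per-gate figure of 174 pA is a hypothetical value chosen only so that the limit falls near the reported 23,000 gates; it is not taken from Table 4.8. The function names are illustrative, and currents are kept as integer picoamps to avoid rounding effects.

```python
def max_fault_free_iddq_pa(faulty_iddq_pa, min_ratio=10):
    """Largest fault-free IDDQ (pA) that keeps a min_ratio gap to the faulty IDDQ."""
    return faulty_iddq_pa // min_ratio

def max_testable_gates(limit_pa, leakage_per_gate_pa):
    """Largest gate count whose total leakage stays within limit_pa (linear model through the origin)."""
    return limit_pa // leakage_per_gate_pa

def blocks_needed(total_gates, block_size):
    """Number of separately powered blocks needed to cover total_gates (ceiling division)."""
    return -(-total_gates // block_size)

FAULTY_IDDQ_PA = 40_000_000   # 40 uA, the simulated IDDQ for a single GIF stuck-at fault
LEAKAGE_PER_GATE_PA = 174     # hypothetical slope, roughly 4 uA / 23,000 gates

limit_pa = max_fault_free_iddq_pa(FAULTY_IDDQ_PA)
print(limit_pa)                                          # fault-free limit in pA
print(max_testable_gates(limit_pa, LEAKAGE_PER_GATE_PA)) # upper bound on gate count
print(blocks_needed(1_000_000, 22_000))                  # blocks for a one-million-gate DUT
```

Under these assumptions the fault-free limit is 4 µA, and a one-million-gate design would need 46 power-gated blocks of at most 22,000 gates each.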

For effective IDDQ testing of complex multi-million gate circuits, a combination of partitioning, power gating, on-chip current sensors and process-aware IDDQ improvement techniques might be required in future work.

6.1.3 Techniques for improving DFT

In this work, we focused on the fault coverage, testing time and area overhead of the proposed DFT elements as quantitative enhancements to those presented in the literature. Also, eliminating the clock from the test hardware was a qualitative improvement that could result in lower power consumption. However, the time limit for this project did not allow a numerical analysis and simulation of power consumption for comparison with that of clocked test hardware. Therefore, a power analysis of the DFT techniques presented remains a future task.

Overall, while this study established the fundamentals of a clockless self-timed ATPG for NCL circuits, it is time to invest in applying the same principles to different versions of these techniques and help consolidate this advantageous design method for industrial use.

Bibliography

1. M. M. Waldrop, “The chips are down for Moore’s law,” Nature News, vol. 530, no. 7589, pp. 144–147, 2016. (cited on pages xvii, 2, and 4)

2. M. Fritze, P. Cheetham, J. Lato, and P. Syers, (2016, Feb. 18). The death of Moore’s Law. [Online]. Available: http://www.potomacinstitute.org/steps/featured-articles/63-the-death-of-moore-s-law. (cited on pages xvii and 3)

3. A. Bink and R. York, “ARM996HS: the first licensable, clockless 32-bit processor core,” IEEE Micro, vol. 27, no. 2, pp. 58–68, 2007. (cited on pages xvii, 5, 6, 8, and 11)

4. H. Solutions, (2004, May). HT-80C51 microcontroller leaflet. [Online]. Available: http://www.keil.com/dd/docs/datashts/handshake/ht80c51.pdf. (cited on pages xvii, 8, and 9)

5. R. D. Jorgenson, L. Sorensen, D. Leet, M. S. Hagedorn, D. R. Lamb, T. H. Friddell, and W. P. Snapp, “Ultralow-power operation in subthreshold regimes applying clockless logic,” Proc. IEEE, vol. 98, no. 2, pp. 299–314, 2010. (cited on pages xvii, xxi, 12, 13, and 14)

6. B. Parhami, “Defect, fault, error,..., or failure?” IEEE Trans. Reliab., vol. 46, no. 4, pp. 450–451, 1997. (cited on pages xvii and 32)

7. B. Benware, C. Schuermyer, M. Sharma, and T. Herrmann, “Determining a failure root cause distribution from a population of layout-aware scan diagnosis results,” IEEE Des. Test Comput., vol. 29, no. 1, pp. 8–18, 2012. (cited on pages xvii and 34)

8. U. Okoroanyanwu, J. Kye, H. J. Levinson, N. Yamamoto, and K. Cummings, “Prospects & challenges of defectivity in water immersion lithography,” in Presentation at the 2nd International Symposium on Immersion Lithography, 2005. (cited on pages xvii and 34)

9. D. Snider, “A study in PCB failure analysis: the intermittent connection,” SMT Magazine - Test and Inspection, vol. 28, no. 9, pp. 76–80, 2013. (cited on pages xvii and 34)

10. W. Maly, “Realistic fault modeling for VLSI testing,” in Proc. ACM/IEEE Design Automation Conf., 1987, pp. 173–180. (cited on pages xvii and 34)

11. C. L. Henderson and J. M. Soden, “Signature analysis for IC diagnosis and failure analysis,” in Proc. IEEE Int. Test Conf., 1997, pp. 310–318. (cited on pages xvii and 34)

12. Z. Navabi, Digital system test and testable design: using HDL models and architectures. Springer, 2011. (cited on pages xvii, 15, 34, 35, 36, 40, 51, 65, 76, and 106)

13. M. Bushnell and V. Agrawal, Essentials of electronic testing for digital, memory and mixed-signal VLSI circuits. Kluwer Academic Publishers, 2000. (cited on pages xviii, 20, 33, 36, 38, 45, 46, 100, and 116)

14. R. Reese, S. Smith, and M. Thornton, “Uncle - An RTL approach to asynchronous design,” in Proc. IEEE Int. Symp. on Asynchronous Circuits and Systems, 2012, pp. 65–72. (cited on pages xviii, 11, 53, 72, 88, and 108)

15. P. Goel and B. C. Rosales, “PODEM-X: an automatic test generation system for VLSI logic structures,” in Proc. IEEE Design Automation Conf., 1981, pp. 260–268. (cited on pages xviii, 17, 38, 62, and 64)

16. K. M. Fant, Logically determined design: clockless system design with Null convention logic. Wiley-Interscience, 2005. (cited on pages xviii, 12, 15, 25, 30, 31, 94, and 98)

17. N. Nemati, M. Namaki-Shoushtari, and Z. Navabi, “A mixed HDL/PLI test package,” in Proc. IEEE East-West Design Test Symp., 2010, pp. 518–523. (cited on pages xix, 16, 70, 72, 108, and 129)

18. International Technology Roadmap for Semiconductors, (2007). ITRS reports - Design. [Online]. Available: http://www.itrs2.net/itrs-reports.html. (cited on pages xxi, 5, and 7)

19. International Technology Roadmap for Semiconductors, (2011). ITRS reports - Tables. [Online]. Available: http://www.itrs2.net/2011-itrs.html. (cited on pages xxi, 5, and 7)

20. A. B. Kahng, “Scaling: more than Moore’s Law,” IEEE Des. Test Comput., vol. 27, no. 3, pp. 86–87, 2010. (cited on pages 2 and 4)

21. Y. Lu and V. D. Agrawal, “Total power minimization in glitch-free CMOS circuits considering process variation,” in Proc. IEEE Int. Conf. on VLSI Design, 2008, pp. 527–532. (cited on pages 3 and 5)

22. D. Piedra, B. Lu, M. Sun, Y. Zhang, E. Matioli, F. Gao, J. W. Chung, O. Saadat, L. Xia, M. Azize et al., “Advanced power electronic devices based on Gallium Nitride (GaN),” in Proc. IEEE Int. Electron. Devices Meeting, 2015, pp. 16.6.1–16.6.4. (cited on page 4)

23. S. Panth, K. Samadi, Y. Du, and S. K. Lim, “High-density integration of functional modules using monolithic 3D-IC technology,” in Proc. IEEE Asia and South Pacific Design Automation Conf., 2013, pp. 681–686. (cited on page 4)

24. R. Skibba, (2015, Dec. 9). Stanford-led skyscraper-style chip design boosts electronic performance by factor of a thousand [Online]. Available: http://news.stanford.edu/2015/12/09/n3xt-computing-structure-120915/. (cited on page 4)

25. C. P. Williams, Explorations in quantum computing. Springer Science & Business Media, 2010. (cited on page 4)

26. S. Furber, “Large-scale neuromorphic computing systems,” J. Neural Eng., vol. 13, no. 5, p. 051001, 14 pages, 2016. (cited on page 4)

27. S. Moore, R. Anderson, R. Mullins, G. Taylor, and J. J. Fournier, “Balanced self-checking asynchronous logic for smart card applications,” Microprocessors and Microsystems, vol. 27, no. 9, pp. 421–430, 2003. (cited on pages 5 and 6)

28. S. C. Smith, W. K. Al-Assadi, and J. Di, “Integrating asynchronous digital design into the computer engineering curriculum,” IEEE Trans. Edu., vol. 53, no. 3, pp. 349–357, 2010. (cited on page 5)

29. International Technology Roadmap for Semiconductors, (2011). ITRS reports - Design. [Online]. Available: http://www.itrs2.net/2011-itrs.html. (cited on pages 5, 10, and 11)

30. International Technology Roadmap for Semiconductors, (2011). ITRS reports - Test. [Online]. Available: http://www.itrs2.net/2011-itrs.html. (cited on pages 6 and 12)

31. S. Senay, L. F. Chaparro, M. Sun, R. Sclabassi, and A. Akan, “Asynchronous signal processing for brain-computer interfaces,” in Proc. IEEE Int. Conf. on Elect. and Electron. Eng., 2009, pp. II–30. (cited on page 6)

32. A. Can, E. Sejdic, O. Alkishriwo, and L. F. Chaparro, “Compressive asynchronous decomposition of heart sounds,” in Proc. IEEE Workshop on Statistical Signal Process., 2012, pp. 736–739. (cited on page 6)

33. M. Yin and M. Ghovanloo, “A flexible clockless 32-ch simultaneous wireless neural recording system with adjustable resolution,” in Proc. IEEE Solid-State Circuits Conf., 2009, pp. 432–433. (cited on page 6)

34. Y. Chae, S. Jo, and J. Jeong, “Brain-actuated humanoid robot navigation control using asynchronous brain-computer interface,” in Proc. IEEE/EMBS Int. Conf. on Neural Eng., 2011, pp. 519–524. (cited on page 6)

35. J. K. Tugnait and T. Li, “Blind detection of asynchronous CDMA signals in multipath channels using code-constrained inverse filter criterion,” IEEE Trans. Signal Process., vol. 49, no. 7, pp. 1300–1309, 2001. (cited on page 6)

36. J. Ma and J. K. Tugnait, “Blind detection of multirate asynchronous CDMA signals in multipath channels,” IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2258–2272, 2002. (cited on page 6)

37. C. Botteron, A. Host-Madsen, and M. Fattouche, “Cramer-Rao bounds for the estimation of multipath parameters and mobiles’ positions in asynchronous DS-CDMA systems,” IEEE Trans. Signal Process., vol. 52, no. 4, pp. 862–875, 2004. (cited on page 6)

38. D. S. Pham, A. M. Zoubir, R. F. Brcic, and Y. H. Leung, “A nonlinear M-estimation approach to robust asynchronous multiuser detection in non-Gaussian noise,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 1624–1633, 2007. (cited on page 6)

39. F. Rothacher and N. Felber, “VLSI implementation of a fully digital asynchronous audio sample-rate converter,” in Audio Engineering Society Convention 96, 1994. (cited on page 6)

40. P. Midya, B. Roeckner, and T. Schooler, “Asynchronous sample rate converter for amplifiers,” in Audio Engineering Society Convention 121, 2006. (cited on page 6)

41. V. N. Manyam, D. Chhetri, and J. J. Wikner, “Clockless asynchronous delta modulator based ADC for smart dust applications,” in Proc. IEEE/IFIP Int. VLSI and System-on-Chip (VLSI-SoC), 2011, pp. 331–336. (cited on page 6)

42. T. Beyrouthy and L. Fesquet, “An asynchronous FPGA block with its tech-mapping algorithm dedicated to security applications,” Int. J. Reconfigurable Computing, vol. 2013, 2013, Hindawi Publishing Corporation, Article ID 517947, 12 pages. (cited on page 6)

43. T.-Y. Meng, R. W. Brodersen, and D. G. Messerschmitt, “Asynchronous design for programmable digital signal processors,” IEEE Trans. Signal Process., vol. 39, no. 4, pp. 939–952, 1991. (cited on page 6)

44. G. M. Jacobs and R. W. Brodersen, “A fully asynchronous digital signal processor using self-timed circuits,” IEEE J. Solid-State Circuits, vol. 25, no. 6, pp. 1526–1537, 1990. (cited on page 6)

45. Y.-T. Chang, C.-C. Fang, H.-Y. Tsai, W.-M. Cheng, C.-J. Chen, and F.-C. Cheng, “A self-timed two-stage flexible ALU implementation,” in Proc. IEEE Global Conf. on Consumer Electron., 2012, pp. 378–381. (cited on page 6)

46. C. LaFrieda, B. Hill, and R. Manohar, “An asynchronous FPGA with two-phase enable-scaled routing,” in Proc. IEEE Symp. on Asynchronous Circuits and Systems, 2010, pp. 141–150. (cited on page 6)

47. D. Shang, F. Xia, and A. Yakovlev, “Asynchronous FPGA architecture with distributed control,” in Proc. IEEE Int. Symp. on Circuits and Systems, 2010, pp. 1436–1439. (cited on page 6)

48. C. Wong, A. Martin, and P. Thomas, “An architecture for asynchronous FPGAs,” in Proc. IEEE Int. Conf. on Field-Programmable Technology, 2003, pp. 170–177. (cited on page 6)

49. S. Hauck, S. Burns, G. Borriello, and C. Ebeling, “An FPGA for implementing asynchronous circuits,” IEEE Des. Test Comput., vol. 11, no. 3, pp. 60–69, 1994. (cited on page 6)

50. S. Ramaswamy, L. Rockett, D. Patel, S. Danziger, R. Manohar, C. W. Kelly, J. L. Holt, V. Ekanayake, and D. Elftmann, “A radiation hardened reconfigurable FPGA,” in Proc. IEEE Aerospace Conf., 2009, pp. 1–10. (cited on page 6)

51. P. Clark, (2012, Apr. 24). Achronix reveals 22-nm FPGAs, courtesy of Intel. [Online]. Available: http://www.eetimes.com/document.asp?doc_id=1270866. (cited on page 6)

52. N. Rajapaksha, A. Edirisuriya, A. Madanayake, R. J. Cintra, D. Onen, I. Amer, and V. S. Dimitrov, “Asynchronous realization of algebraic integer-based 2D DCT using Achronix Speedster SPD60 FPGA,” J. Control Science and Eng., vol. 2013, p. 1, 2013, Hindawi Publishing Corp., Article ID 834793, 9 pages. (cited on page 6)

53. K.-L. Chang, J. S. Chang, B.-H. Gwee, and K.-S. Chong, “Synchronous-logic and asynchronous-logic 8051 microcontroller cores for realizing the Internet of Things: a comparative study on dynamic voltage scaling and variation effects,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 3, no. 1, pp. 23–34, 2013. (cited on page 8)

54. K.-S. Chong, K.-L. Chang, B.-H. Gwee, and J. S. Chang, “Synchronous-logic and globally-asynchronous-locally-synchronous (GALS) acoustic digital signal processors,” IEEE J. Solid-State Circuits, vol. 47, no. 3, pp. 769–780, 2012. (cited on page 9)

55. A. Can, E. Sejdic, and L. Chaparro, “Asynchronous sampling and reconstruction of sparse signals,” in Proc. IEEE European Signal Process. Conf., 2012, pp. 854–858. (cited on page 10)

56. T. T. Nguyen, K.-N. Le-Huu, T. H. Bui, and A.-V. Dinh-Duc, “A new approach and tool in verifying asynchronous circuits,” in Proc. IEEE Int. Conf. on Advanced Technologies for Commun., 2012, pp. 152–157. (cited on page 10)

57. J. Sparso, “Current trends in high-level synthesis of asynchronous circuits,” in Proc. IEEE Int. Conf. on Electronics, Circuits, and Systems, 2009, pp. 347–350. (cited on page 11)

58. D. Edwards and A. Bardsley, “Balsa: An asynchronous hardware synthesis language,” Comput. J., vol. 45, no. 1, pp. 12–18, 2002. (cited on page 11)

59. A. Efthymiou, “Initialization-based test pattern generation for asynchronous circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 4, pp. 591–601, April 2010. (cited on pages 12, 17, and 21)

60. M. Ligthart, K. Fant, R. Smith, A. Taubin, and A. Kondratyev, “Asynchronous design using commercial HDL synthesis tools,” in Proc. IEEE Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, 2000, pp. 114–125. (cited on page 12)

61. J. Di and S. C. Smith, “Ultra-low power multi-threshold asynchronous circuit design,” US Patent 7 977 972, Jul., 12, 2011. (cited on pages 12, 30, and 31)

62. A. Kondratyev, L. Sorensen, and A. Streich, “Testing of asynchronous designs by “inappropriate” means. Synchronous approach,” in Proc. IEEE Int. Symp. on Asynchronous Circuits and Systems, 2002, pp. 171–180. (cited on pages 14, 15, 18, 21, 55, 93, 113, and 115)

63. W. Al-Assadi and S. Kakarla, “Design for test of asynchronous Null convention logic (NCL) circuits,” J. Electron. Test., vol. 25, no. 1, pp. 117–126, 2009. (cited on pages 14, 15, 19, 20, 21, 55, 93, 113, and 115)

64. C.-H. Cheng and J.-M. Li, “An asynchronous design for testability and implementation in thin-film transistor technology,” J. Electron. Test., vol. 27, no. 2, pp. 193–201, 2011. (cited on pages 14, 18, 21, 55, 99, 100, and 115)

65. C. LaFrieda and R. Manohar, “Fault detection and isolation techniques for quasi delay-insensitive circuits,” in Proc. IEEE Int. Conf. on Dependable Systems and Networks, 2004, pp. 41–50. (cited on pages 14, 18, 21, and 115)

66. R. Dobai and E. Gramatová, “A novel automatic test pattern generator for asynchronous sequential digital circuits,” Microelectronics J., vol. 42, no. 3, pp. 501–508, 2011. (cited on page 17)

67. F. Shi and Y. Makris, “SPIN-TEST: automatic test pattern generation for speed-independent circuits,” in Proc. IEEE/ACM Int. Conf. on Computer-Aided Design, 2004, pp. 903–908. (cited on page 17)

68. O. Roig, J. Cortadella, M. A. Peña, and E. Pastor, “Automatic generation of synchronous test patterns for asynchronous circuits,” in Proc. ACM annu. Design Automation Conf., 1997, pp. 620–625. (cited on page 17)

69. A. Efthymiou, J. Bainbridge, and D. Edwards, “Test pattern generation and partial-scan methodology for an asynchronous SoC interconnect,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 12, pp. 1384–1393, 2005. (cited on pages 17 and 21)

70. Y.-S. Kang, K.-H. Huh, and S. Kang, “New scan design of asynchronous sequential circuits,” in Proc. IEEE Asia Pacific Conf. on ASICs, 1999, pp. 355–358. (cited on page 18)

71. V. Satagopan, B. Bhaskaran, W. Al-Assadi, and S. C. Smith, “Automation in design for test for asynchronous Null conventional logic (NCL) circuits,” in NASA Symp. on VLSI Design, 2005. (cited on pages 19 and 93)

72. V. Satagopan, B. Bhaskaran, W. K. Al-Assadi, S. C. Smith, and S. Kakarla, “DFT techniques and automation for asynchronous Null conventional logic circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 10, pp. 1155–1159, 2007. (cited on pages 19 and 20)

73. S. Kakarla and W. K. Al-Assadi, “Testing of asynchronous Null conventional logic (NCL) circuits,” in Proc. IEEE Region 5 Conf., 2008, pp. 1–6. (cited on page 19)

74. W. Al-Assadi and S. Kakarla, “Design for test of asynchronous Null convention logic (NCL) circuits,” in Proc. IEEE Int. Test Conf. (ITC), 2008, pp. 1–9. (cited on pages 19, 110, and 115)

75. C.-L. Chang and C. H.-P. Wen, “Demystifying IDDQ data with process variation for automatic chip classification,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 6, pp. 1175–1179, 2015. (cited on pages 20 and 46)

76. M. Roncken and E. Bruls, “Test quality of asynchronous circuits: a defect-oriented evaluation,” in Proc. IEEE Int. Test Conf., 1996, pp. 205–214. (cited on pages 20, 75, 76, and 114)

77. D. B. Armstrong, “On finding a nearly minimal set of fault detection tests for combinational logic nets,” IEEE Trans. Electron. Comput., vol. EC-15, no. 1, pp. 66–73, 1966. (cited on page 38)

78. J. P. Roth, “Diagnosis of automata failures: a calculus and a method,” IBM J. Research and Develop., vol. 10, no. 4, pp. 278–291, 1966. (cited on page 38)

79. N. Nemati, A. Simjour, A.-A. Ghofrani, and Z. Navabi, “Optimizing parametric BIST using bio-inspired computing algorithms,” in Proc. IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Systems, 2009, pp. 268–276. (cited on page 43)

80. M. Kamal, M. S. Jelodar, and S. Hessabi, “GABIST: A new methodology to find near optimal LFSR for BIST structure,” in Proc. IEEE Int. Conf. on Electronics, Circuits and Systems, 2007. (cited on page 43)

81. M. Murase, “Linear feedback shift register,” US Patent 5 090 035, Feb., 18, 1992. (cited on page 43)

82. S. R. Makar and E. J. McCluskey, “Some faults need an IDDQ test,” in Proc. IEEE Int. Workshop on IDDQ Testing, 1996, pp. 102–103. (cited on page 45)

83. S. Ma, P. Franco, and E. McCluskey, “An experimental chip to evaluate test techniques experiment results,” in Proc. IEEE Int. Test Conf., 1995, pp. 663–672. (cited on page 45)

84. C. Hawkins, J. Soden, A. Righter, and F. Ferguson, “Defect classes-an overdue paradigm for CMOS IC testing,” in Proc. IEEE Int. Test Conf., 1994, pp. 413–425. (cited on page 45)

85. S. S. Sabade and D. Walker, “IDDQ test: will it survive the DSM challenge?” IEEE Des. Test Comput., vol. 19, no. 5, pp. 8–16, 2002. (cited on page 46)

86. N. Jarrige and I. Kandah, “Quiescent current (IDDQ) indication and testing apparatus and methods,” US Patent 8 476 917, Jul., 2, 2013. (cited on page 46)

87. X. Xu, Y. Guo, S. Zhang, W. Zhang, X. Zhang, and Y. Zhang, “On-chip current test circuit,” US Patent 20 150 323 590, Nov., 12, 2015. (cited on pages 46 and 117)

88. R. Rajsuman, “IDDQ testing for CMOS VLSI,” Proc. IEEE, vol. 88, no. 4, pp. 544–568, 2000. (cited on pages 46 and 117)

89. R. Soundararajan, A. Srivastava, and S. S. Yellampalli, “∆IDDQ testing of a CMOS digital-to-analog converter considering process variation effects,” Circuits and Systems, vol. 2, no. 03, pp. 133–138, 2011. (cited on page 46)

90. D. Narayen, N. Singh, G. Ponnuvel, H. Kumar, L. Nasser, and C. Nishizaki, “System and method for compensating measured IDDQ values,” US Patent 9 007 079, Apr., 14, 2015. (cited on page 46)

91. Y. Tsiatouhas, Y. Moisiadis, T. Haniotakis, D. Nikolos, and A. Arapoyanni, “A new technique for IDDQ testing in nanometer technologies,” Integration, the VLSI J., vol. 31, no. 2, pp. 183–194, 2002. (cited on page 46)

92. C. L. Chang, C. H. P. Wen, and J. Bhadra, “Process-variation-aware IDDQ diagnosis for nano-scale CMOS designs - the first step,” in Proc. IEEE Design, Automation Test in Europe Conf. Exhibition, 2013, pp. 454–457. (cited on page 46)

93. M. Shintani and T. Sato, “An adaptive current-threshold determination for IDDQ testing based on Bayesian process parameter estimation,” in Proc. IEEE Asia and South Pacific Design Automation Conf., 2013, pp. 614–619. (cited on page 46)

94. T. Larrabee, “Efficient generation of test patterns using Boolean difference,” in Proc. Int. Test Conf., 1989, pp. 795–801. (cited on pages 49 and 51)

95. P. Kabisatpathy, A. Barua, and S. Sinha, Fault diagnosis of analog integrated circuits. Springer Science & Business Media, 2005. (cited on page 76)

96. Arizona State University, (2012). Predictive technology model (PTM). [Online]. Available: http://ptm.asu.edu/. (cited on pages 106, 116, and 131)

97. H. Fujiwara and T. Shimono, “On the acceleration of test generation algorithms,” IEEE Trans. Comput., vol. C-32, no. 12, pp. 1137–1144, Dec 1983. (cited on page 115)

98. T. Kirkland and M. R. Mercer, “A topological search algorithm for ATPG,” in Proc. ACM/IEEE Design Automation Conf., 1987, pp. 502–508. (cited on page 115)

99. M. H. Schulz, E. Trischler, and T. M. Sarfert, “SOCRATES: a highly efficient automatic test pattern generation system,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 7, no. 1, pp. 126–137, 1988. (cited on page 116)

100. S. T. Chakradhar, V. D. Agrawal, and S. G. Rothweiler, “A transitive closure algorithm for test generation,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 12, no. 7, pp. 1015–1028, 1993. (cited on page 116)

Appendix A

HDL/PLI Environment

In this work a mixed HDL/PLI environment [17] is used to implement and evaluate the proposed test techniques. PLI (Programming Language Interface) is a library of C functions that gives access to the internal data structures of circuits described in Verilog HDL. The main reason for using this environment for testing NCL circuits is that it gives the test program access to the HDL simulation engine. As mentioned before, because NCL lacks deterministic timing and clock signals, the timing of the test algorithms is managed using the handshaking signals coming from the NCL DUT. It is therefore necessary to have a strong HDL simulation engine capable of simulating delays and managing the timing of the circuit. In the mixed HDL/PLI environment, the PLI functions are in charge of accessing the DUT’s data structure, tracing the circuit’s graph, reading and writing values and delays in the circuit, and finding connections between different objects in the circuit. Complex data structures such as hash tables, stacks and FIFOs are also managed on the PLI side. The HDL side of the environment, on the other hand, is in charge of input application, output collection, HDL simulation, delay management, and giving access to the waveforms of the circuit.

Fig. A.1. Building blocks of HDL/PLI test environment [17]

Many conventional test tools written for testing synchronous logic use only C/C++ or other programming languages. In such cases the test tool needs to implement a simple HDL simulator to propagate the applied values throughout the circuit under test. In NCL testing, however, circuit simulation plays an even more important role, as it has to generate the signals that control the timing of the test. So, if a test engineer decides to implement test algorithms for NCL in pure C/C++, they will also have to write a comprehensive HDL simulator that understands the NCL gates and circuit architectures and generates the handshaking signals correctly. The HDL/PLI environment, on the other hand, not only gives the test engineer a strong HDL simulator and debugging tools such as waveforms, but also eliminates the need to rebuild the large data structure of the DUT for C/C++ access functions.

Another advantage of using the HDL/PLI test environment is that, because test techniques are implemented in HDL, threshold gates naturally show their hysteresis behavior and there is no need to describe this behavior to the test program. PLI can even give access to GIFs, so fault injection on these wires and reading their values is possible. Fig. A.1 shows the main components of the HDL/PLI test environment. For these reasons, using the HDL/PLI test environment for testing NCL circuits is more beneficial than using a pure C/C++ environment.

A.1 PLI functions

PLI utilities that are used in this work are briefly introduced in the following subsections.

A.1.1 Fault injection and removal

The most important utilities for implementing test algorithms are fault injection (FI) and fault removal (FR). As noted, PLI provides mechanisms for reading and writing net and reg values. We can therefore force and release values in the data structures corresponding to nets, which allows faults to be injected on, and removed from, circuit lines. PLI also gives us control over delays and lets us check for changes in net and reg values.
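The force/release idea behind FI and FR can be illustrated with a small, simulator-independent sketch. The Python model below is only an illustration under stated assumptions: the `Net` class and its method names are hypothetical and are not PLI routines; they merely mimic how a forced value overrides the driven value of a net until it is released.

```python
class Net:
    """Toy model of a circuit line that supports force/release (hypothetical)."""

    def __init__(self, name, value=0):
        self.name = name
        self._driven = value   # value set by the fault-free circuit
        self._forced = None    # None means no fault is injected

    def drive(self, value):
        """Normal circuit activity: update the driven value."""
        self._driven = value

    def force(self, value):
        """Fault injection: pin the line to a stuck-at value."""
        self._forced = value

    def release(self):
        """Fault removal: the line follows its driver again."""
        self._forced = None

    @property
    def value(self):
        # A forced value, if present, hides the driven value.
        return self._driven if self._forced is None else self._forced


line = Net("n1")
line.force(0)            # inject stuck-at-0 on n1
line.drive(1)            # the fault-free circuit tries to raise the line
faulty = line.value      # the forced value wins while the fault is present
line.release()           # remove the fault
fault_free = line.value  # the driven value is visible again
```

In this sketch the driven value is kept while the fault is present, so releasing the line immediately restores fault-free behaviour without re-simulating the driver.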

A.1.2 Fault collapsing

Another process needed in many test applications is fault collapsing (FC). Reference [2] discusses a line FC method based on the types of the gates whose inputs the circuit lines drive. Since PLI allows all design hierarchies to be traced down to gate primitives, and allows us to identify the primitive types that a line drives, the FC method of [2] is easily implemented with PLI routines. Primitive types, which are a decisive factor in the stuck-at fault values of gates, can be looked up with PLI routines.

Appendix B

Effect of Sizing on NCL Gate Delays

Here, we analyse the impact of transistor sizing on the rise time (trise), fall time (tfall) and propagation delay (tdelay) of a TH22 NCL gate. The HSpice model of this TH22 gate is built using the 45 nm LP PTM from [96], where the length of each transistor (Lmin) is 45 nm and the width of the NMOS transistors in the GoToData network (Wn) is 180 nm. We keep these values fixed during simulation while changing the width of the PMOS transistors in the GoToNull network (Wp), that in the HoldNull network (Wph), and the ratio of Wp to Wph. Fig. B.1 shows the effect of changing the size of the PMOS transistors in the GoToNull network (Wp) of the TH22 gate. The horizontal axis of this graph shows the value of Wp divided by 22.5 nm. Since Wn is 180 nm, Wn/22.5 maps to point 8 on this axis.

Fig. B.1. Effect of changes in Wp on TH22 gate delay

As trise mainly depends on Wn, which is constant in this simulation, its value does not vary much in the graph of Fig. B.1. On the other hand, as expected, when Wp is smaller than Wn, tfall is much larger than trise. The larger the value of Wp, the closer these two delay values become. Finally, when Wp/22.5 reaches 16 and Wp/Wn = 2, the two delay values become equal.
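As a quick check of the axis units, the normalised values quoted above convert back to physical widths as follows. This is a worked example that uses only the numbers given in this appendix.

```latex
\begin{align*}
W_n / 22.5\,\text{nm} &= 180\,\text{nm} / 22.5\,\text{nm} = 8 \\
W_p &= 16 \times 22.5\,\text{nm} = 360\,\text{nm} \\
W_p / W_n &= 360\,\text{nm} / 180\,\text{nm} = 2
\end{align*}
```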

The propagation delay (tdelay) of the gate increases slightly with Wp, as a larger Wp means a larger capacitance on the output of the gate.

Changing Wph, however, does not have such an effect on tfall of TH22, and as depicted in Fig. B.2 the values of tfall and trise remain close and change only slightly as Wph increases. This is because tfall and trise are related to the GoToNull and GoToData networks, respectively, which are not much affected by the sizing of the HoldData network. The effect on the propagation delay of the gate is larger, though, as the large PMOS transistors in the HoldData network form a larger capacitance on the output of the gate.

Fig. B.2. Effect of changes in Wph on TH22 gate delay

Comparing the graphs of Fig. B.1 and Fig. B.2 shows that Wph has a bigger impact than Wp on the propagation delay of the TH22 gate. This is because in the HoldData network two parallel PMOS transistors form (A+B), whereas in the GoToData network two series transistors form (AB). So, for the same Wp and Wph, the width of the HoldData network is larger than that of the GoToData network, and so is its effect on the propagation delay of the gate.

Fig. B.3 illustrates the effect of changes in the ratio Wp/Wph on the gate delays while Wp/22.5 is kept at 12 and Wph is changed relative to it. As shown in the graph, the larger Wph is compared to Wp, the larger the gate delays. On the other hand, as Wph becomes smaller than Wp, the gate becomes slightly faster. However, the effect on tfall and trise is quite small because these delays mainly depend on the size of the transistors in the GoToNull and GoToData networks, whereas the value of tdelay depends strongly on the ratio Wp/Wph. This is again because the transistors in the GoToData network have a series topology while those in the HoldData network are in parallel, which amplifies the effect of the size difference between the transistors of these two networks.

Fig. B.3. Effect of changes in Wp/Wph on TH22 gate delay

As mentioned in Chapter 4, fault detection using IDDQ test is more successful for gates with balanced rise and fall delays. In this appendix, we found that a P-to-N width ratio of 2 results in balanced delays for TH22. However, since the topology of the four transistor networks is very different for each of the 27 NCL gates, the balanced P-to-N ratio is different for each of them. In this work, to keep the results general, and because we did not intend to base them on careful sizing of transistors, we chose a ratio of 5/2, which showed acceptably balanced delays for all NCL gates.

Appendix C

Probability-based Controllability for NCL Circuits

Controllability measurement is an important factor in test algorithms; it shows the ease of accessing each internal line in the circuit from the PIs. This measurement is especially important in deterministic test generation algorithms as a heuristic to minimise the search space. Choosing between different options in the search tree is very common, and one of the metrics that can prioritise one node over others is its higher controllability.

One common method of calculating controllability is based on probability, where P1(Z) and P0(Z) denote the probability of a gate’s output having value 1 and 0, respectively, based on the probability of values on the gate’s inputs. From the basics of NCL, equations (C.1) to (C.4) are given.

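$$Z = \mathrm{ToD} + Z^{*} \cdot \mathrm{HoldD} \tag{C.1}$$

$$\overline{Z} = \mathrm{ToN} + \overline{Z^{*}} \cdot \mathrm{HoldN} \tag{C.2}$$

$$\mathrm{HoldD} = \overline{\mathrm{ToN}} \tag{C.3}$$

$$\mathrm{HoldN} = \overline{\mathrm{ToD}} \tag{C.4}$$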
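To make the roles of ToD, HoldD and the previous output Z∗ concrete, the following Python sketch evaluates equation (C.1) for a TH22 gate. It is a minimal illustration and assumes the usual TH22 functions ToD = A.B, ToN = (not A).(not B) and HoldD = A + B; the function names are hypothetical and introduced only for this example.

```python
from itertools import product


def th22_next(a, b, z_prev):
    """Next output of a TH22 gate following equation (C.1)."""
    to_d = a & b        # ToD: both inputs are 1
    hold_d = a | b      # HoldD: at least one input is still 1
    return to_d | (z_prev & hold_d)


def th22_to_n(a, b):
    """ToN of TH22: both inputs are 0."""
    return (1 - a) & (1 - b)


# Equation (C.3): HoldD is the complement of ToN for every input pair.
c3_holds = all(
    (a | b) == 1 - th22_to_n(a, b) for a, b in product((0, 1), repeat=2)
)

# Hysteresis: walk the gate through one DATA/NULL cycle and record Z.
z = 0
trace = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    z = th22_next(a, b, z)
    trace.append(z)
```

The recorded trace shows the output rising only when both inputs are 1 and falling only when both are 0; for the mixed input patterns the gate holds its previous value.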

From equation (C.3), the following results.

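$$
\begin{aligned}
P_1(\mathrm{HoldD}) &= 1 - P_0(\mathrm{HoldD}) && \text{(C.5)}\\
&= 1 - P_1(\overline{\mathrm{HoldD}}) && \text{(C.6)}\\
&= 1 - P_1(\mathrm{ToN}) && \text{(C.7)}
\end{aligned}
$$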

And similarly from equation (C.4):


P1(HoldN) = 1 − P1(ToD) (C.8)

From basics of probability we know that

P1(A + B) = P1(A) + P1(B) − P1(A.B) (C.9)

In NCL gates the ToD and HoldD functions are not independent, but ToD and Z∗ are. So, the probability of having 1 on the output of an NCL gate, based on its inputs, is calculated as follows.

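$$
\begin{aligned}
P_1(Z) &= P_1\big(\mathrm{ToD} + (Z^{*} \cdot \mathrm{HoldD})\big) && \text{(C.10)}\\
&= P_1(\mathrm{ToD}) + P_1(\mathrm{HoldD})\,P_1(Z^{*}) - P_1(\mathrm{ToD})\,P_1(\mathrm{HoldD})\,P_1(Z^{*}) && \text{(C.11)}
\end{aligned}
$$

$$\Rightarrow\; P_1(Z) = \frac{P_1(\mathrm{ToD})}{1 - P_1(\mathrm{HoldD}) + P_1(\mathrm{ToD})\,P_1(\mathrm{HoldD})} \tag{C.12}$$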
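The algebra between (C.11) and (C.12) can be spelled out as follows. The shorthand p = P1(Z) = P1(Z∗), T = P1(ToD) and H = P1(HoldD) is introduced here only for brevity and is not notation used elsewhere in this appendix; the step relies on the assumption P1(Z) = P1(Z∗).

```latex
\begin{align*}
p &= T + H\,p - T\,H\,p \\
p\,(1 - H + T\,H) &= T \\
p &= \frac{T}{1 - H + T\,H}
\end{align*}
```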

Similarly, the following is derived for P0(Z) from equation (C.2).

$$P_0(Z) = \frac{P_1(\mathrm{ToN})}{1 - P_1(\mathrm{HoldN}) + P_1(\mathrm{ToN})\,P_1(\mathrm{HoldN})} \tag{C.14}$$

Notice that in order to obtain equation (C.12) we have assumed that P1(Z) = P1(Z∗). To see that this assumption is correct, recall that Z∗ is the previous value of Z. Therefore, assuming P1(Z) ≠ P1(Z∗) means that one of the following must be true.

1. P1(Z) > P1(Z∗) ⇒ If the circuit runs for infinite time, all of the gate outputs will eventually become 1.

2. P1(Z) < P1(Z∗) ⇒ If the circuit runs for infinite time, all of the gate outputs will eventually become 0.

Neither outcome occurs in a working circuit, so the assumption P1(Z) = P1(Z∗) is correct. Alternatively, interested readers are encouraged to evaluate equation (C.16) and observe that the conditional probability of Z given Z∗ is equal to P1(Z) calculated in equation (C.12), i.e., Z is independent of Z∗.

$$P_1(Z^{*})\,P_1(Z)\big|_{Z^{*}=1} + P_0(Z^{*})\,P_1(Z)\big|_{Z^{*}=0} \tag{C.16}$$

In order to double check the correctness of equations (C.12) and (C.14), we check the correctness of equation (C.17).

P1(Z) + P0(Z) = 1 (C.17)

From equations (C.14), (C.7) and (C.8):

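$$
\begin{aligned}
P_0(Z) &= \frac{1 - P_1(\mathrm{HoldD})}{1 - (1 - P_1(\mathrm{ToD})) + (1 - P_1(\mathrm{HoldD}))(1 - P_1(\mathrm{ToD}))} && \text{(C.18)}\\
&= \frac{1 - P_1(\mathrm{HoldD})}{1 - P_1(\mathrm{HoldD}) + P_1(\mathrm{ToD})\,P_1(\mathrm{HoldD})} && \text{(C.19)}
\end{aligned}
$$

$$
\begin{aligned}
P_1(Z) + P_0(Z) &= \frac{P_1(\mathrm{ToD})}{1 - P_1(\mathrm{HoldD}) + P_1(\mathrm{ToD})\,P_1(\mathrm{HoldD})} && \text{(C.20)}\\
&\quad + \frac{1 - P_1(\mathrm{HoldD})}{1 - P_1(\mathrm{HoldD}) + P_1(\mathrm{ToD})\,P_1(\mathrm{HoldD})} && \text{(C.21)}\\
&= \frac{P_1(\mathrm{ToD}) + 1 - P_1(\mathrm{HoldD})}{1 - P_1(\mathrm{HoldD}) + P_1(\mathrm{ToD})\,P_1(\mathrm{HoldD})} && \text{(C.22)}
\end{aligned}
$$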

From equations (C.12) and (C.17), the following results:

1 − P1(HoldD) + P1(ToD) = 1 − P1(HoldD) + P1(ToD)P1(HoldD) (C.23)

P1(ToD) = P1(ToD)P1(HoldD) (C.24)

From equations (C.24) and (C.5)-(C.8):

1 − P1(HoldN) = (1 − P1(ToN))(1 − P1(HoldN)) (C.25)

1 − P1(HoldN) = 1 − P1(ToN) − P1(HoldN) (C.26)

+ P1(ToN)P1(HoldN) (C.27)

P1(ToN) = P1(ToN)P1(HoldN) (C.28)

In all 27 standard NCL gates, HoldN is a sum of products of inverted inputs, and ToN equals the product of all inverted inputs. As an example, in TH23, $\mathrm{HoldN} = \bar{A}\bar{B} + \bar{A}\bar{C} + \bar{B}\bar{C}$ and $\mathrm{ToN} = \bar{A}\bar{B}\bar{C}$. So, in all NCL gates:

ToN.HoldN = ToN (C.29)

⇒ P1(ToN) = P1(HoldN)P1(ToN) (C.30)

Therefore equation (C.17) is proven to be true which in turn proves (C.12) and (C.14).
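The Boolean identity (C.29) can be checked exhaustively for the TH23 example given above. The Python sketch below is a minimal illustration; the function names are hypothetical, and it checks only the Boolean identity ToN.HoldN = ToN on the truth table, not the probability relation that is derived from it.

```python
from itertools import product


def th23_to_n(a, b, c):
    """ToN of TH23: product of all inverted inputs."""
    return (1 - a) & (1 - b) & (1 - c)


def th23_hold_n(a, b, c):
    """HoldN of TH23: sum of products of pairs of inverted inputs."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & nb) | (na & nc) | (nb & nc)


# Check ToN.HoldN = ToN on all eight input combinations.
identity_holds = all(
    (th23_to_n(*bits) & th23_hold_n(*bits)) == th23_to_n(*bits)
    for bits in product((0, 1), repeat=3)
)
```

The same enumeration could be repeated for any other gate once its ToN and HoldN functions are written out.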