An Open Source Platform and EDA Tool Framework to Enable Scan Test Power Analysis Testing

Author Ivano Indino

Supervisor Dr Ciaran MacNamee

Submitted for the degree of Master of Engineering

University of Limerick

June 2017


Abstract

An Open Source Platform and EDA Tool Framework to Enable Scan Test Power Analysis Testing

Ivano Indino

Scan testing has been the preferred method for testing large digital integrated circuits for many decades, and many electronic design automation (EDA) tool vendors provide support for inserting scan test structures into a design as part of their tool chains. Although EDA tools have been available for many years, they are still hard to use, and setting up a design flow that includes scan insertion is an especially difficult process. Increasingly high integration and smaller device geometries, along with the requirement for low power operation, mean that scan testing has become a limiting factor in achieving time-to-market demands without compromising the quality of the delivered product or increasing test costs. As a result, using EDA tools for power analysis of device behaviour during scan testing is an important research topic for the semiconductor industry.

This thesis describes the design synthesis of OpenPiton, an open-source research processor, with emphasis on the scan insertion, automated test pattern generation (ATPG) and gate level simulation (GLS) steps. Having reviewed scan testing theory and practice, the thesis describes the execution of each of these steps on an OpenPiton design block.

Thus, by demonstrating how to apply EDA based synthesis and design for test (DFT) tools to the OpenPiton project, the thesis addresses one of the most difficult problems faced by any new user who wishes to use existing EDA tools for synthesis and scan insertion, namely, the enormous complexity of the tool chains and the huge and confusing volume of related documentation. The OpenPiton project was selected because it allows a user to implement design synthesis by simply adding library files, a good starting point for research based on scan test. Applying the design flow to a relatively small OpenPiton design block eliminates many overheads and makes the flow easier to understand, but it is shown that the techniques can easily be migrated to larger OpenPiton design blocks, including synthesis of multicore designs that can mimic today’s large commercial SOCs (systems on chip) with respect to scan power issues. Additionally, in keeping with the emphasis on mitigating scan power issues, the thesis shows how a design flow for the OpenPiton design block can be created using several EDA tools, with techniques to support power analysis or estimation being highlighted at various points in the flow. As a result of this work, readers should be able to set up an entire flow and reach the stage of data generation for scan power analysis in a much shorter time. This will allow engineers to focus on new approaches for scan test power mitigation and means that re-iteration of the flow for data collection will become a much more manageable task.


Declaration

I hereby declare that this thesis is entirely my own work and does not contain material previously published by other authors, except where due reference or acknowledgment has been made. Furthermore, I declare that it has not been submitted to any other university or higher education institution, or for any other academic award in this university.

Signed: ______

Ivano Indino

June 2017


Acknowledgements

I would like to express my sincere gratitude to my advisor and supervisor Dr Ciaran MacNamee for his continuous support of my Masters research, and for his patience, motivation, and immense knowledge. His guidance helped me throughout both the research and the writing of this thesis. I could not have imagined having a better advisor and mentor for my Masters study.

My sincere thanks also go to Paul T. Donovan (my former Intel manager), who provided me with the opportunity to join his team, allowing me to speed up my learning and gain the knowledge necessary to complete this study. I would also like to thank his entire team for all the support and training provided; without their precious support it would not have been possible to conduct this research. Last but not least, I would like to thank my family: my children for their patience and understanding throughout the writing of this thesis, and my partner, who believed in me.


I. Table of Contents

1 Scan testing: introduction ...... 1

Introduction ...... 1

The importance of testing ...... 1

Scope of this research ...... 3

Motivation for this work ...... 4

Difficulty of dealing with IP and proprietary data ...... 5

1.5.1 Why the need for an open-source and a mature project ...... 6

Outcomes, publication and conclusions ...... 7

Organization of the thesis ...... 8

2 IC Test, DFT techniques, scan testing and current research ...... 11

Introduction ...... 11

From functional test to scan testing ...... 11

Problems introduced by new technology nodes: from stuck at and at speed to a cell aware fault model ...... 13

Power consumption during scan testing ...... 16

Design for testability techniques ...... 18

2.5.1 Ad-hoc techniques ...... 19

2.5.2 Scan testing ...... 20

2.5.3 BIST (built in self-test) ...... 22

Scan test architectures and application ...... 23

Scan architectures ...... 23

2.6.1 Scan test techniques ...... 24

2.6.2 Clock signals for scan testing ...... 26

Timing and power problems during scan testing ...... 29

Current research with respect to timing and power issues ...... 32

2.8.1 Test Power ...... 33


2.8.2 Switching reduction schemes based on pattern generation ...... 35

2.8.3 Schemes based on structural design elements ...... 36

Potential tools and methods to assess power violation issues during scan testing ... 38

Summary and conclusions ...... 40

3 Open-source project selection, synthesis and scan insertion ...... 43

Introduction ...... 43

Selecting an open-source project ...... 44

Generic synthesis and scan insertion flow using DC ...... 47

Standard cell libraries ...... 52

Case study: synthesis and scan insertion Dynamic_node of OpenPiton ...... 53

3.5.1 Design compiler reference methodology ...... 57

3.5.2 Alteration of the synthesis flow to add scan insertion to the design ...... 60

3.5.3 Synthesis implementation for Dynamic_node block of OpenPiton ...... 61

Conclusion ...... 64

4 ATPG environment setup and execution ...... 66

Introduction ...... 66

Scan configuration options ...... 67

Full chip ATPG vs hierarchical ATPG ...... 70

Standard and low power scan patterns ...... 72

The importance of fault models ...... 75

ATPG basic setup and execution steps ...... 78

4.6.1 Standard cell libraries and fault count...... 81

4.6.2 Setting up an ATPG environment ...... 83

4.6.3 Dofiles and main Mentor commands ...... 87

4.6.4 Test-procedure files ...... 88

Test case: ATPG environment setup for Dynamic_node of OpenPiton ...... 91

Fault reports and fault grading ...... 102


How to use ATPG tools to support GLS and silicon fails debug ...... 105

4.9.1 One-Hot patterns ...... 105

4.9.2 ATPG diagnosis tool ...... 108

Silicon failure analysis debug ...... 111

Conclusion ...... 113

5 Simulation from GLS to power analysis ...... 115

Introduction ...... 115

HDL simulation software ...... 115

Need for GLS execution in the design life cycle ...... 117

Timing analysis in physical design ...... 120

Basics of timing analysis ...... 122

5.5.1 Static timing analysis ...... 122

5.5.2 What is setup and hold time?...... 124

5.5.3 Recommendation to reduce and eliminate setup-hold violations ...... 125

5.5.4 Dynamic timing analysis ...... 127

Steps required to run GLS using Synopsys VCS...... 130

5.6.1 GLS Execution and results of test-case Dynamic_node module ...... 135

GLS failures ...... 138

Using GLS to generate output files for power analysis ...... 141

How to use vcd files as an input to power analysis ...... 142

Summary and conclusions ...... 146

6 Conclusions, outcomes and future work ...... 148

Introduction ...... 148

Findings and learning ...... 148

Possibility for future study ...... 150

Conclusions ...... 155

References ...... 158


Appendix A: Ad-hoc techniques: observation and control point insertion ...... 1

Appendix B: Synopsys reference methodology generation ...... 3

Appendix C: Synopsys RMgen dc.tcl script added content with DFT synthesis set to TRUE .. 5

Appendix D: accessing Mentor tools ...... 12


II Table of Figures

Figure 1 Project flow implementation ...... 3
Figure 2-1 Timeline showing test technologies for test cost reduction ...... 12
Figure 2-2 CPU scaling showing transistor density, power consumption, and efficiency ...... 17
Figure 2-3 D-type Flip-Flop vs Mux-D Scan Cell ...... 20
Figure 2-4 BIST architecture block diagram ...... 22
Figure 2-5 Scan cell designs including muxed-D scan cell (a), clocked-scan cell (b), and an LSSD scan cell (c) ...... 23
Figure 2-6 Waveform for LOS scan test technique ...... 24
Figure 2-7 Waveform for LOC scan test technique ...... 27
Figure 2-8 LOC using late-late-early waveform (late scan-in, late launch, early capture) ...... 28
Figure 2-9 LOC using early-late-early waveform (early scan-in, late launch, early capture) ...... 28
Figure 2-10 LOC using two different periods ...... 29
Figure 2-11 Open via defect on metal power line ...... 34
Figure 2-12 International Technology Roadmap for Semiconductors (ITRS) evolution of IC production and future targets ...... 38
Figure 3-1 Basic synthesis flow ...... 48
Figure 3-2 OpenPiton high level directory structure ...... 54
Figure 3-3 OpenPiton directory structure and dynamic-node module content ...... 55
Figure 3-4 Synopsys directory structure in OpenPiton ...... 56
Figure 3-5 tools directory content in OpenPiton ...... 56
Figure 3-6 Changes to dc_setup_filename.tcl when enabling DFT during RMgen process ...... 59
Figure 3-7 Synopsys RM to OpenPiton patching: synrm_path successful output example ...... 62
Figure 4-1 Cell aware generation flow ...... 76
Figure 4-2 Improving PPM (Source: Mentor Graphics) ...... 78
Figure 4-3 ATPG directory structure ...... 84
Figure 4-4 Test coverage report for a 4-input OR gate from Libcomp, Mentor Graphics ...... 82
Figure 4-5 a) Scan flop view from Mentor Graphics Tessent Visualizer software b) Internal structure of a scan flop (Source: Mentor Graphics Visualizer) ...... 82
Figure 4-6 Pattern increase CA versus SA for 10 different designs ...... 83
Figure 4-7 Stats reports for pattern set generated with power controller setting: shift on -switching_threshold_percentage 25 -rejection_threshold_percentage 30 ...... 98


Figure 4-8 Scan control signals set on Dynamic_node module (scan-clock (clk) in green, scan-enable (SE) in orange, scan-in data (SI) in red) (Source: Mentor Graphics Visualizer) ...... 99
Figure 4-9 Stats reports for pattern set generated with power controller setting: shift on -switching_threshold_percentage 25 -rejection_threshold_percentage OFF ...... 99
Figure 4-10 Cell tracing report of unbalanced chains design ...... 100
Figure 4-11 Cell tracing report of balanced chains design ...... 100
Figure 4-12 Typical test-coverage vs pattern count curves ...... 102
Figure 4-13 Complete fault classification used by Mentor Graphics ...... 105
Figure 5-1 Ideal vs real clock waves ...... 121
Figure 5-2 Visible "staircase" shape of a chain test pattern shifted through consecutive scan cell elements ...... 138
Figure 5-3 Using a scan chain test to observe failing scan chains (Source: Mentor Graphics) ...... 139
Figure 5-4 Chain fault models determined by chain patterns (shift-in: 0011) (Source: Mentor Graphics) ...... 140
Figure 5-5 Clock vs data signal wave toggle comparison ...... 142
Figure 5-6 Uncompressed chains with chain test shifting ...... 143
Figure 6-1 The relationship between defect coverage rates and resultant DPPM levels (Source: Synopsys) ...... 153
Figure 6-2 Original chain set configuration and proposed approach to halve the number of scan flops and reduce scan power issues ...... 152
Figure 6-3 Logic transistor scaling with the evolution of technology nodes ...... 154


III Table of Tables

Table 1 OpenPiton synthesis and back-end flow supported modules ...... 54
Table 2 OpenPiton synthesis and back-end flow available run commands ...... 61
Table 3 Compression analysis from Mentor Graphics on OpenPiton Dynamic_node (configured with 1 uncompressed chain of 2245 cells) ...... 69
Table 4 Test pattern generation log of a stuck-at run on Dynamic_node configured with 1 uncompressed chain ...... 71
Table 5 Power metrics report of test case Dynamic_node for SA faults on 1 uncompressed chain configuration ...... 75
Table 6 Data and results comparison of 10 designs of various sizes ...... 77
Table 7 Log of a post configuration script execution ...... 79
Table 8 TestKompress log showing processing power set and utilized for pattern generation ...... 79
Table 9 Power metric report per pattern (SA uncompressed chain) ...... 97
Table 10 Statistics reports comparison between LOS and LOC transition test coverage ...... 101
Table 11 The three most used commercial HDL simulators ...... 116
Table 12 Open-source simulators ...... 117
Table 13 VCD file dimension examples for a design with 12 scan-in inputs ...... 142
Table 14 Potential variables for a future study of scan power ...... 151
Table 15 License setup error from Mentor TestKompress ...... 12


Abbreviation list

AMD: Advanced Micro Devices, 15
AS: At Speed, 84
ASIC: Application Specific Integrated Circuit, 6, 43
ATE: Automatic Test Equipment, 23
ATPG: Automated Test Pattern Generation, 3
BB: Black Box, 84
BIST: Built In Self-Test, 22
CA: Cell Aware, 70
CAD: Computer Aided Design, 5
CCS: Composite Current Source, 53
CMOS: Complementary Metal Oxide Semiconductor, 38
CP: Control Point, 1
CPU: Central Processing Unit, 17
CUT: Circuit Under Test, 22
DC: Design Compiler, 47
DFT: Design for Testability, 4
DPM: Defects per Million, 2
DRC: Design Rule Check, 56
DTA: Dynamic Timing Analysis, 127
DVE: Discovery Visualization Environment, 132
ECSM: Effective Current Source Model, 53
EDA: Electronic Design Automation, 3
EDIF: Electronic Design Interchange Format, 53
EDT: Embedded Deterministic Test, 68
FC: Full Chip, 72
FPGA: Field Programmable Gate Array, 43
GDS: Graphic Database System, 53
GITC: Global Instantaneous Toggle Constraints, 40
GLS: Gate Level Simulation, 3
GPL: General Public License, 46
GTC: Global Toggle Constraint, 40
GTECH: Generic Technology Independent Design, 47
GUI: Graphic User Interface, 85
HDL: Hardware Description Language, 46
HFPD: High Frequency Power Drop, 39
HVT: High Voltage Threshold, 126
IC: Integrated Circuit, 1
IOT: Internet of Things, 1
IP: Intellectual Property, 4
ITRS: International Technology Roadmap for Semiconductors, 39
JTAG: Joint Test Action Group, 93
LEC: Logical Equivalence Checking, 118
LEF: Library Exchange Format, 53
LGPL: Lesser General Public License, 44
LOC: Launch off Capture, 24
LOS: Launch on Shift, 24
LSSD: Level Sensitive Scan Design, 23
LVS: Layout Versus Schematic, 56
LVT: Low Voltage Threshold, 126
MGLS: Mentor Graphics License, 78
MOSFET: Metal-Oxide-Semiconductor Field-Effect Transistor, 38
NCSU: North Carolina State University, 52
NLDM/NLPM: Non-Linear Delay Model / Non-Linear Power Model, 53
NTB: Native Test-Bench, 136
ODDD: On Die Droop Detection, 39
ORA: Output Response Analyser, 22
PDK: Package Development Kit, 52
PDN: Power Distribution Network, 17
PLL: Phase Locked Loop, 27
PNG: Portable Network Graphics, 53
PVT: Process, Voltage and Temperature, 50
RITC: Regional Instantaneous Toggle Constraint, 40
RM: Reference Methodology, 56
RTL: Register Transfer Level, 45
RVT: Regular Voltage Threshold, 126
SA: Stuck At, 84
SCAP: Switching Cycle Average Power, 40
SDC: Synopsys Design Constraint, 94
SDF: Standard Delay Format, 63
SE: Scan Enable, 21, 25
SI: Scan Input, 20
SO: Scan Output, 20
SOC: System on Chip, 4, 46
SRAM: Static Random Access Memory, 62
STA: Static Timing Analysis, 122
STIL: Standard Test Interface Language, 81
SVT: Standard Voltage Threshold, 126
TA: Timing Analysis, 122
TAP: Test Access Port, 87
TC: Toggle Count, 40
TDR: Test Data Register, 86
TM: Test Mode, 21
TPG: Test Pattern Generator, 22
TPI: Test Point Insertion, 20
UCLI: Unified Command Line Interface, 132
UDFM: User Defined Fault Models, 15
VCD: Value Change Dump, 91
VCS: Verilog Compiler Simulator, 115
VLSI: Very Large Scale Integration, 18
VPD: Value Change Dump Plus, 91
WGL: Waveform Generation Language, 81
WSA: Weighted Switching Activity, 36, 40



1 Scan testing: introduction

Introduction

Integrated circuits are found in every modern electrical device that we see around us: in cars, television sets, music players, cellular phones, smart homes and in an exponentially growing number of “things” (the internet of things, or IOT).

As with any other type of manufactured product, quality checking is part of its production phase life cycle, and test techniques have a history as old as the Integrated Circuit (or IC) itself. In 1958 Jack Kilby of Texas Instruments had the idea to make all the components and the chip out of the same block (monolith) of semiconductor material. In September of the same year, he had his first IC built and tested to verify that it was working as expected [1].

Testing, therefore, has been an essential element in the manufacture of integrated circuits since they were first developed. In fact, as transistor counts have increased and power considerations have become more important, testing has become a significant cost factor during IC manufacture. As a result, research into optimising the test process is very important for future IC production. The work described in this thesis is aimed at establishing a reproducible methodology that allows test research to be carried out in an academic environment, independently of proprietary research carried out by IC manufacturers. Because of the prevalence of scan-based testing in today’s IC production environment, the research described in this thesis is focussed on enabling academic research into scan techniques, particularly those related to power issues during scan testing.

The importance of testing

Semiconductor manufacturers would prefer to avoid component testing, but the chip manufacturing process is not 100 percent reliable, and therefore some number of dies will be defective in every manufacturing lot. The best approach to identifying which dies contain defects is to apply component testing to every die and screen out the defective ones.


But how much testing is enough, and at what cost? The answer to this question is dictated by how much a company is willing to spend on design for test features, on test equipment, and on the costs of field returns. Field return cost is not just a matter of replacing or refunding the customer for the economic damage caused by the failing IC; it is also a matter of company image.

As silicon manufacturing companies move towards smaller technology nodes, IC design and manufacturing are becoming more complex and expensive. The likelihood of defects is higher, and so is the risk of shipping a defective unit. Therefore, chipmakers must ensure the quality of the parts before they are shipped. The problem is always the same: apply a test methodology stringent enough to prevent unwanted field returns, but without paying a premium for test.

Semiconductor manufacturers strive to ship defect free devices to their customers. Defect rates are measured in defects per million (or DPM). Silicon design houses and foundries are constantly pushing design, layout and fabrication technologies, and low manufacturing yields are often the price paid for building new state of the art designs and manufacturing processes. Testing not only makes it possible to select the good units from the vast number produced; the data collected during testing is also used by the design and fabrication departments to improve upon their respective technologies and therefore improve yields.

It is obvious that no amount of testing can make a defective circuit any better; however, testing does guarantee a level of quality of the parts delivered to the end customers which could not otherwise be reached. Through rigorous testing techniques, the silicon industry has been able to greatly improve the quality and reliability of the ICs delivered, and it will continue to do so as test tools and techniques keep improving and evolving.

Some of today’s markets, such as the automotive industry, are demanding DPM below the 100 mark and pushing down towards 10 defective parts per million. The rest of the industry, such as PCs and other consumer devices, would traditionally have a significantly higher DPM number (around 500 DPM), but due to the high volume of units produced, these customers are now also requesting that silicon suppliers push DPM down close to automotive-quality levels [2]. Reaching these DPM values requires large investments in test equipment, design for test and test coverage, and cost is always going to be a pivotal factor.
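The trade-off between fault coverage and shipped defect level can be illustrated with the classic Williams-Brown defect level model. This model and the numbers below are not taken from the thesis; they are a hypothetical sketch of how coverage targets relate to the DPM figures discussed above:

```python
# Williams-Brown defect level model: DL = 1 - Y**(1 - T),
# where Y is process yield and T is fault coverage (both fractions).
# Illustrative only; real DPM depends on the fault models and defect
# distribution of a specific process.

def defect_level_dpm(yield_fraction: float, fault_coverage: float) -> float:
    """Return the expected defect level in defects per million (DPM)."""
    dl = 1.0 - yield_fraction ** (1.0 - fault_coverage)
    return dl * 1e6

# Example: 90% yield with 99% fault coverage still ships ~1000 DPM...
print(round(defect_level_dpm(0.90, 0.99)))   # -> 1053
# ...while 99.9% coverage approaches the automotive 100 DPM range.
print(round(defect_level_dpm(0.90, 0.999)))  # -> 105
```

The model makes the cost argument concrete: each extra "nine" of coverage cuts the shipped defect level roughly tenfold, which is why high-DPM-sensitive markets justify large DFT and test-equipment investments.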


Screening defects at an early stage is not always possible, but silicon design houses are making major efforts to avoid unnecessary defects caused by design and modelling. With technology nodes already reaching single digit nanometre sizes, traditional fault models are starting to lose their effectiveness, and new ones are required in order to generate test vectors capable of keeping DPM numbers as low as they have ever been.

Scope of this research

Design flows are the explicit combination of electronic design automation (or EDA) tools used to accomplish the design of an integrated circuit. The scope of this research was to provide an entire design flow (Figure 1) to process an electronic design through the steps necessary to generate scan test power data. Taking an open-source design (OpenPiton), the flow applies synthesis and scan insertion steps using commercial EDA tools and current scan test techniques; it then uses the newly created scan-inserted design model (netlist) to generate scan patterns with Automated Test Pattern Generation (ATPG) tools. It continues with the Gate Level Simulation (GLS) setup and debug steps, which utilize the outputs of the ATPG phase, and concludes with a description of how to use GLS to produce an output file that can be utilized for scan test power analysis.

Figure 1 Project flow implementation
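As an organizational sketch (an illustration, not project code), the flow of Figure 1 can be summarised by the artifact each step consumes and produces; the stage names and artifact labels below simply restate the description above:

```python
# Sketch of the project flow: each step with its main input and output.
# (step, input artifact, output artifact)
FLOW = [
    ("synthesis and scan insertion", "RTL source (OpenPiton)",        "scan-inserted netlist"),
    ("ATPG",                         "scan-inserted netlist",         "scan patterns and testbenches"),
    ("gate level simulation (GLS)",  "scan patterns and testbenches", "VCD switching-activity file"),
    ("power analysis",               "VCD switching-activity file",   "scan test power data"),
]

# Sanity check: each step consumes exactly what the previous step produced.
for (_, _, produced), (step, consumed, _) in zip(FLOW, FLOW[1:]):
    assert produced == consumed, f"{step} input does not match previous output"

print(" -> ".join(step for step, _, _ in FLOW))
```

The point of the sketch is the strict pipelining: a failure or misconfiguration at any stage (for example, bad ATPG settings) propagates into every artifact downstream, which is why the thesis walks through each step in order.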


Motivation for this work

Because of Intellectual Property (IP) restrictions, it is very hard to find readily available data for research purposes. This forces any research work on scan testing to go through a lengthy flow in order to generate the data needed, and because of the flow's complexity, much of the effort is spent setting it up rather than producing the data, analysing it and devising solutions to scan power related problems. This work addresses the issue of setting up an entire flow to generate scan power data, and therefore also addresses the primary issue of limited scan power data availability.

The primary focus of this work was to supply all the necessary information to execute a design for testability (or DFT) flow on a design in order to be able to collect data for scan power analysis. The flow includes a brief description of synthesis and scan insertion, then, after detailing ATPG setup and pattern generation which is at the core of this work, it moves on to give guidelines for gate level simulation (GLS) setup and debug before concluding with the steps needed to run a basic power analysis on the scan pattern generated.

ATPG is at the core of this work because power issues are directly related to the settings used for pattern generation. Scan test architecture is also important, but no matter how cleverly and carefully it is implemented, if patterns are not produced with the correct settings they will without a doubt cause power related failures during test.
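To make the link between pattern content and scan power concrete, the following minimal sketch (hypothetical chain and pattern data, not output from any tool) counts how many scan cells toggle in each shift cycle. ATPG power controls such as a switching threshold percentage aim to keep exactly this kind of per-cycle figure below a target:

```python
# Per-shift-cycle switching activity of a scan chain: the fraction of
# scan cells that change value as a pattern is shifted in.
# Chain state and pattern bits below are hypothetical examples.

def shift_toggle_percentages(chain_state, pattern_bits):
    """Shift pattern_bits into chain_state (first bit enters first) and
    return, for each shift cycle, the percentage of cells that toggle."""
    results = []
    state = list(chain_state)
    for bit in pattern_bits:
        new_state = [bit] + state[:-1]   # one shift towards scan-out
        toggles = sum(a != b for a, b in zip(state, new_state))
        results.append(100.0 * toggles / len(state))
        state = new_state
    return results

# A worst-case alternating pattern toggles more cells every cycle:
print(shift_toggle_percentages([0, 0, 0, 0], [1, 0, 1, 0]))  # -> [25.0, 50.0, 75.0, 100.0]
# A constant pattern toggles only one cell per cycle as the 1s fill the chain:
print(shift_toggle_percentages([0, 0, 0, 0], [1, 1, 1, 1]))  # -> [25.0, 25.0, 25.0, 25.0]
```

Even this toy example shows why pattern content, not just the scan architecture, determines shift power: the same chain driven by two different patterns sees a fourfold difference in peak switching.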

The goal was to eliminate the burden of learning to set up the flow, a process which normally involves becoming familiar with a large number of manuals and online resources and considerable development time, allowing academic researchers to reach the phase of data collection in a much more straightforward way. The essential steps of scan insertion, ATPG pattern generation and gate level simulation are first described in a general way and then applied to one of the open-source blocks of the OpenPiton project. OpenPiton was selected as the source design because it provides the opportunity to synthesise multicore designs that mimic today’s large commercial SOCs (systems on chip), which are the most prone to scan power issues due to the number of scan flops present.


As a result of this work, readers should be able to set up an entire flow and reach a stage of data generation and collection for scan power analysis in a matter of weeks, which gives the opportunity to spend a lot more time on processing data rather than producing it.

Difficulty of dealing with IP and proprietary data

The market requirement for better, faster, cheaper and more reliable products is putting every business aspect of the electronics world under pressure. Every business enterprise dealing with electronics goes through accelerated change to hold fearsome competition at bay. Aggravating this are the shortening life spans of new products and the demand for more complex ones, which necessitate large investments from all the players in the electronics industry.

Due to the complexity of planning, designing, executing and manufacturing an IC, the high number of man hours required to execute the job, and the expensive software and hardware needed to bring a design to the manufacturing phase, great attention is paid to keeping the work created secure and undisclosed.

Intellectual Property (IP) is defined as any "original creative work manifested in a tangible form that can be legally protected" [3]. The term ‘IP rights’ refers to controlling the way IP is used, accessed or distributed [4]. In today's electronics world, an organization's IP is arguably its biggest asset. Electronic systems manufacturers who most successfully retain their own in-house designs for reuse possess a clear advantage over their competition. Existing designs (as well as their design flows, procedures, techniques and collected data) are so valuable that they are treated as IP.

With the arrival of the digital age, it has become much harder to remain in control of IP; it is no longer just about preserving a trade secret but, more importantly, about preserving a company's monetary gain and future existence.

The emergence of modern automated computer aided design (or CAD) and EDA software systems has not only improved designer productivity across the IC industry, adding speed and efficiency, but has also provided much easier access to large amounts of data and IP.


The reuse of previously successful IP and components in IC design is very common. It allows engineers to speed up the creation of brand new ICs with known reliable elements and alleviates the pressure of delivering new systems in an environment of tight time-to-market windows. To meet their development schedules, designers often have no choice other than to reuse their own or third-party IP.

It is not surprising, therefore, that for any individual seeking data and information for a study or research project, finding, identifying and being granted access to such IP is a challenging and in most cases impossible task. Limitations on access to design IP, to standard cell libraries from foundries with associated data collected from years of manufacturing and testing, and to information on how the EDA tools operate mean that it is extremely difficult for academic researchers to analyse and study any of the numerous aspects of IC design, manufacturing and testing. The only two choices available are to join a big silicon manufacturing company, gaining access to unlimited data but losing the ability to make any research public, or to utilize open-source designs, data and information to accomplish the same study. The easiest way to proceed is to identify a suitable open-source design that can be used for the research purpose. Obtaining data for sharing is as hard as obtaining access to IP, hence the only option is to generate the data from the available designs, and this will without any doubt require many processing steps and much time.

1.5.1 Why the need for an open-source and a mature project

The use of an open-source design was a natural choice for this research, for the reasons discussed in 1.5. After a lengthy search through the hundreds of available online sources, a design was identified and selected: OpenPiton [5], an open-source research processor based on the Princeton Piton processor, which was designed and taped out in March 2015 by the Princeton Parallel Group. It provided the starting point for this study and includes synthesis and back-end flows for application-specific integrated circuits (ASIC) and field programmable gate arrays (FPGA), but did not include a standard cell library. In fact, identifying a suitable standard cell library for the synthesis step of this research was also a difficult and time-consuming task.

The standard cell library selected was the NanGate 15nm Open Cell Library which is a generic open-source, standard-cell library provided for the purposes of researching, testing, and


exploring EDA flows, but is purposely non-manufacturable [6]. The integration of the NanGate standard cell library into the OpenPiton flow provides a platform for enabling research into scan power issues.

Outcomes, publication and conclusions

The research for this thesis began with an investigation into the issues relating to scan-based testing of integrated circuits, particularly those arising from power constraints and high transistor counts. This was summarised in a conference paper presented at the Irish Signal and Systems Conference:

I. Indino and C. MacNamee, "DFT: Scan testing issues and current research," 25th IET Irish Signals & Systems Conference 2014 and 2014 China-Ireland International Conference on Information and Communications Technologies (ISSC 2014/CIICT 2014), Limerick, 2014, pp. 227-232. doi: 10.1049/cp.2014.0690

Furthermore, this work achieved the goal of presenting a full DFT flow that can be used to process a design from scan insertion to power analysis. With some adjustments, the flow can be made to suit any open-source design; moreover, the open-source design selected and used for this work is very stable and scalable, and should therefore be well suited to a very large number of research purposes with regard to scan test.

Starting from a bare design, synthesis and scan insertion were set up and performed, producing a netlist file usable for pattern generation with an ATPG tool. The open-source project chosen comes with a well proven flow; however, that flow was missing the scan insertion steps, and this work provided all the steps necessary to implement scan insertion. The completion of this work will allow other researchers to generate scan test data that can be used in further studies related to scan test and scan test power issues. The ease of applying a few changes to the scan configuration during scan insertion will allow exploration of many different configurations. The flow can also be ported to different technology nodes with very few changes, thereby extending the study to new cell aware fault models.


The final goal of this work, to provide the means to generate scan test power data, was successfully completed.

Organization of the thesis

This thesis is divided into four main sections:

A. Chapter 2 gives an introduction and overview of scan testing, starting with a brief history and moving to the test techniques most used in the past and today. It provides some detail on the timing and power issues encountered during scan test and on the schemes used during scan pattern generation and during the design phase to mitigate such issues. It concludes with some information on aspects of scan that could be investigated to gather data for the analysis of scan power.

B. Chapter 3 explains the need for a design that is freely available to researchers and describes the open-source design selected for this research, as well as the benefits of using the selected source (the OpenPiton project). The second part of the chapter gives an overview of the synthesis flow included in the OpenPiton project and a detailed description of how to modify the original flow in order to implement scan insertion and produce the netlist needed for scan test pattern generation. The chapter includes the steps implemented on the test case used for this research.

C. Chapter 4 gives an insight into pattern generation, starting with an introduction to the fault models used for ATPG; it discusses the basic components needed in an ATPG setup for pattern generation, as well as how to generate scan patterns and the GLS test benches required by simulation. The ATPG settings are as important as the synthesis and scan insertion steps, as they dictate the power requirements during scan; the steps after the ATPG phase only validate the working design and do not affect power or functional goals. The chapter concludes with an overview of the use of ATPG tools for debug purposes using one-hot patterns and ATPG scan diagnosis. A short section of the chapter is dedicated to the implementation of the ATPG environment and pattern generation on the OpenPiton test case.

D. Chapter 5 is dedicated to simulation, from GLS to power analysis simulation. The chapter gives a broad description of timing analysis and of the debug techniques used for investigating a design's timing issues. A section of the chapter is dedicated to the implementation of GLS on the test case and to the steps required to generate the necessary inputs for power analysis tools.

Chapter 1 provided the introduction to the whole work, while Chapter 6 gives an overall view of the work completed here, as well as some insight into potential future work that could be carried out using the flow presented in this project.


2 IC Test, DFT techniques, scan testing and current research

Introduction

This chapter describes the role of testing in the manufacture of integrated circuits (ICs). In particular, it explains the problems associated with testing integrated circuits, why structural testing is used, and why scan testing is today the most important means of testing ICs. Scan testing faces a number of challenges associated with smaller design rules, increased transistor counts and more stringent power constraints. The need for research into overcoming power limitations on scan testing is a motivating factor for this thesis. The chapter is organized as follows: it starts with a brief description of functional and scan testing, as well as the other test techniques that replaced functional testing over the years; it introduces power issues during scan testing, as well as scan architectures and techniques; the second part of the chapter describes current research with respect to timing and power issues and the schemes used to mitigate them.

From functional test to scan testing

When the IC was first invented, and for many years afterwards, the main method of validation was functional test, which relies on exercising the design in its normal operational mode. This was possible and easy to apply while design size and complexity were low. As designs grew, however, it became impossible to generate enough functional test vectors to cover the entire design and reach every part of it. Increasing numbers of sequential elements such as flip-flops and latches within ICs further complicated functional test: it could take many tens of thousands of clock cycles to propagate data from the IC's inputs through the sequential logic. As a result, it became almost impossible to create a functional test that could be executed in a reasonable time and provide a high level of detection for all possible defects. The impracticality of manually creating a thorough functional test showed the need for a change in test methods.

At first, ad-hoc techniques were applied to resolve this problem; these introduced access points to control and observe the design's behaviour in parts of the circuit that were difficult to reach from the design (functional) input/output channels. Test then moved on to more global techniques when structural test was introduced.

From the 1960s through the 1980s [7] (Figure 2-1), gate delay and the cost of the silicon area associated with gates conditioned any decision taken in terms of test solutions. This is the period in which scan testing began and became the primary solution for testing digital ICs. Scan test is a systematic way of performing semiconductor test based on the structure of the device rather than its functionality.

Scan chains were introduced by making use of multiplexers added in the data path of flip-flops. Gate delay was considered excessive at this stage, so adding a large number of instances purely for test purposes was not always acceptable, which impeded the widespread adoption of scan testing. When semiconductor technology moved toward smaller geometries in the 1990s, the reduction in gate delay and the shift of delay from the gate to the net had a significant impact on test. With transistor cost going down and gate delay no longer significant, test solutions that were unacceptable in the previous decade became viable, thereby reducing the number of test patterns required to validate silicon.

Figure 2-1 Timeline showing test technologies for test cost reduction [7]

During this time, while test was evolving to increase test quality and reduce defects, the cost of test per transistor stayed relatively constant while the cost of manufacturing transistors kept decreasing allowing the introduction of more complex test infrastructures. As the projected cost of testing a transistor was approximately equal to its manufacturing cost, industry tried to improve the scenario by moving onto low-cost test equipment and this facilitated the creation of structural testers.


The introduction of scan allowed designers to automatically generate test patterns. At the beginning, the single stuck-at fault model was good enough to structurally verify the correctness of the logic. The focus then moved to increasing pattern efficiency, and therefore reducing pattern volume, in order to keep test application time under control. This is still the main focus of today's test cost reduction and the main reason why, in the late 1990s, scan compression made its appearance in the world of IC testing.

Test pattern generation is performed one fault at a time. The process of collecting test patterns to accumulate a set of patterns that can be simulated in parallel (across all channels) is called static compaction. After a pattern is generated for a fault (primary fault), the pattern generation tools will try to detect many more faults (secondary faults) with the same pattern before storing it. This process is called dynamic compaction.
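The compaction idea described above can be made concrete with a small sketch. The following Python example builds an exhaustive "ATPG" for a toy circuit (the netlist y = (a AND b) OR c, the net names and the fault list are all invented for illustration, not taken from any real design) and applies greedy dynamic compaction: once a vector has been found for a primary fault, it is fault-simulated against the entire remaining fault list, and every secondary fault it detects is credited to the same pattern.

```python
from itertools import product

def eval_circuit(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c, optionally with one net stuck at a value.
    `fault` is (net, value) where net is one of 'a','b','c','n1','y'."""
    def fv(net, val):
        return fault[1] if fault and fault[0] == net else val
    a, b, c = fv('a', a), fv('b', b), fv('c', c)
    n1 = fv('n1', a & b)        # internal net: a AND b
    return fv('y', n1 | c)      # primary output

nets = ['a', 'b', 'c', 'n1', 'y']
faults = [(n, v) for n in nets for v in (0, 1)]   # full stuck-at fault list
patterns = []
remaining = list(faults)
while remaining:
    primary = remaining[0]
    # "ATPG" step (exhaustive here): find any vector detecting the primary fault
    for vec in product((0, 1), repeat=3):
        if eval_circuit(*vec) != eval_circuit(*vec, fault=primary):
            # dynamic compaction: fault-simulate this vector against the whole
            # remaining fault list and credit all secondary detections to it
            detected = [f for f in remaining
                        if eval_circuit(*vec) != eval_circuit(*vec, fault=f)]
            remaining = [f for f in remaining if f not in detected]
            patterns.append((vec, detected))
            break
    else:
        remaining.remove(primary)  # undetectable fault: drop it

print(f"{len(faults)} faults covered by {len(patterns)} compacted patterns")
```

On this toy circuit the ten stuck-at faults collapse onto a handful of patterns, which is exactly the data-volume reduction that compaction delivers at scale.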

The introduction of scan compression technology proved to be the factor that addressed the problem of the rising cost of test, as it allowed test data volume reduction, which contributes to keeping test time under control. As a result, the cost of test was contained for many years, but it is now reaching its limits as transistor dimensions continue to shrink and new factors, described below, undermine the validity of traditional fault models.

From 1998 to the present day, improvements in scan compression have had a major impact on IC testing. For many years the main focus has been, and still is, test cost containment. In the beginning, cost was attributed to the extra engineering hours and the silicon area overhead of adding scan-related features to the design; later the cost shifted to test time, because of the increasing complexity and cost of test equipment. Test time remains an issue today and is joined by the need to keep defect numbers low to improve product yield.

Problems introduced by new technology nodes: from stuck-at and at-speed to a cell-aware fault model

During the testing process, attempts are made to screen out circuits containing defects. A defect is a physical flaw introduced into the circuit during the manufacturing process. It may cause a catastrophic failure, or it may just pose a reliability risk, waiting to make the unit fail at some future point in time. Defects can be caused by random imperfections, process variations, chemical impurities, fabrication equipment, mask imperfections and other variables, and it is not an easy task to modify manufacturing flows and processes to remove all flaws.

Defects are the actual physical or production issues that cause an IC not to function properly. Faults are models that try to represent defects using simple properties that correlate with defects and are easy for EDA tools to use. Engineers try to pack as much information as possible into fault models, but at the same time the models are kept as simple as possible, because models that are too complex would cause simulation execution time to grow exponentially.

The most basic models used in IC design are the standard cell libraries. A standard cell library is a collection of low-level logic functions such as AND and OR gates, inverters, flip-flops, latches, and buffers. A library database consists of a number of views, often including layout, schematic, symbol, abstract, and other logical or simulation views. From these, various types of information are captured in a number of formats, each containing the reduced cell data needed for a specific step in the design process (e.g. scan insertion tools, pattern generation tools, etc.). The cells are physically realized as fixed-height, variable-width blocks. The fixed height is the key aspect of the libraries, as it enables the cells to be placed in rows, making it easier to generate a full design layout using automated digital layout tools [8].

DFT techniques are used to improve the controllability and observability of internal nodes, so that embedded functions can be tested. Two basic properties determine the testability of a node: controllability, which is a measure of the difficulty of setting internal circuit nodes to 0 or 1 by assigning values to primary inputs (PIs), and observability, which is a measure of the difficulty of propagating a node’s value to a primary output.
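These two measures can be estimated numerically. The sketch below uses simple COP-style probabilistic testability measures (signal-1 probability under random inputs as controllability) on a hypothetical two-gate netlist; the netlist and the input probabilities are purely illustrative.

```python
def c_and(a, b): return a * b              # P(out = 1) for an AND gate
def c_or(a, b):  return a + b - a * b      # P(out = 1) for an OR gate
def c_not(a):    return 1.0 - a            # P(out = 1) for an inverter

# Hypothetical netlist: y = (a AND b) OR (NOT c), with primary inputs
# driven by uniformly random, independent stimuli.
pi = {'a': 0.5, 'b': 0.5, 'c': 0.5}
c_n1 = c_and(pi['a'], pi['b'])             # internal net n1: hard to set to 1
c_y  = c_or(c_n1, c_not(pi['c']))          # primary output y

# COP-style observability of n1 at y: a change on n1 is visible only when
# the other OR input (NOT c) is 0, i.e. when c = 1.
o_n1 = pi['c']
print(c_n1, c_y, o_n1)
```

A net whose 1-controllability drops toward 0 (as happens a few AND levels deep) is exactly the kind of hard-to-test node that ad-hoc test points and, later, scan cells were introduced to reach.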

Fault detection is the obvious primary goal in DFT. The usual methods involve adding scan-test structures to the design, then applying fault models that represent issues such as stuck-at and transition defects to pattern-generation tools. For decades after the introduction of scan testing, the fault model used was primarily the stuck-at fault model, which considers the possible stuck-at-0 and stuck-at-1 scenarios at every gate terminal. Somewhere between the 130 nm and 90 nm process technology era, new timing-related defects demanded different models for at-speed (functional speed) tests, and this created the need for transition patterns, which target and detect timing-related defects [9]. Stuck-at and transition are considered the traditional models employed in pattern generation and fault detection.

Demands for much lower DPM numbers, and the escape of defects not targeted by patterns generated with current fault models, in today's much smaller technology nodes are calling for drastic changes in fault modelling. All of the traditional models define their fault sites at the gate boundary of the model; nevertheless, they can also detect the majority of defects such as bridges, opens, and even many defects within the gates. With more recent fabrication technologies (45 nm to 10 nm), however, the proportion of defects occurring within cells has increased significantly, to roughly 50% of all defects [10], and this has called for adjustments to model definitions to ensure that the high volume of "cell-internal" defects is targeted. To achieve this, test engineers have to use the physical design of gate cells to drive pattern generation. This involves carrying out a library characterization to determine where defects can occur within the physical layout of the cell and how they would affect its operation.

The result of the characterization is a set of user-defined fault models (UDFM) that describe all the cell inputs and responses necessary to detect the characterized defects. Each technology node requires its own characterization to produce UDFM for the specific geometry. Because the newly created fault models reside within the building blocks of a design, any design using that technology library only requires the corresponding cell-aware fault model files for cell-aware pattern generation. The concept is the same for any model definition: a value is stated at the cell inputs and the expected response is defined for any number of cycles. Once an automatic test pattern generation (ATPG) tool loads the UDFM file, it can target the custom-defined faults.
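The "cell inputs plus expected response" idea behind a UDFM entry can be sketched as plain data. The structure below is a hypothetical illustration only, not the actual UDFM file syntax of any EDA tool; the cell name, defect names and pin conditions are all invented.

```python
# Hypothetical cell-aware fault description for a 2-input AND cell: for each
# characterized defect, the pin values that excite it and the good vs faulty
# output response that makes it observable.
udfm = {
    "AND2_X1": [
        {"defect": "bridge_A_B",                  # short between inputs A and B
         "excite": {"A": 0, "B": 1},              # pin values that excite it
         "good": {"Z": 0}, "faulty": {"Z": 1}},   # expected vs defective output
        {"defect": "open_Z",                      # open on the output net
         "excite": {"A": 1, "B": 1},
         "good": {"Z": 1}, "faulty": {"Z": 0}},
    ],
}

def pattern_detects(pins, defect):
    """True if the pin values applied by a pattern excite this defect."""
    return all(pins.get(p) == v for p, v in defect["excite"].items())

pat = {"A": 0, "B": 1}                            # values one pattern applies
hits = [d["defect"] for d in udfm["AND2_X1"] if pattern_detects(pat, d)]
print(hits)
```

An ATPG tool that has loaded such descriptions can then justify the required pin values at each cell instance and propagate the faulty response to a scan cell, exactly as it does for gate-boundary faults.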

Production silicon test results using cell-aware UDFM have shown a noticeable improvement in DPM compared to what stuck-at and transition patterns can achieve. Data showing such improvement was published by AMD (Advanced Micro Devices), which reported test results based on applying cell-aware patterns to 600,000 ICs manufactured in a 45 nm process [11]. The results showed that cell-aware patterns detected defects in 32 devices that passed the traditional stuck-at and transition patterns, equivalent to a 55 DPM improvement, a very significant number for any type of production. Better results were observed when analysing DPM values for a 32 nm process IC using slow-speed and at-speed cell-aware patterns, which showed that the patterns generated with slow-speed cell-aware models were more than three times as effective as the traditional at-speed transition patterns.


Field returns are not a positive statistic for a chip manufacturer, but at the same time they offer the opportunity to improve both the screening methods (test) and the manufacturing processes; using field returns for process and test evolution is paramount. Field returns are devices that passed all production tests and were shipped as fully functional and defect free, but that failed at some later stage, either during customer test or in the hands of an end user (hence during functional operation). If a population of field returns is available, foundries and design houses will process them and try to collect as much data as possible, as these devices have the potential to reveal test program gaps, and hence the possibility of improving tests and delivering product to customers with lower DPM with just a few "soft" changes to the flow.

Field returns can be used to establish the value of additional tests and offer the opportunity to verify new processes, test methods or fault models. For example, consider a population of 300 field returns from a production run of 100,000 parts, to which new scan test vectors generated with cell-aware ATPG models are applied. If these vectors detect, say, 50 faulty devices, this implies that moving from traditional stuck-at and transition fault models to cell-aware models could improve DPM by 500 (50 extra detections in 100,000 parts = 50/100,000 × 10⁶ = 500 DPM). Hard data such as this is often needed to convince management to authorize the large investment required to apply new cell-aware ATPG models as part of high volume production test [9].

It is important to note that while new fault models are improving silicon defect detection, they are also increasing pattern count and hence test time. With regard to scan power the scenario is unchanged: fault models do not appear to affect scan power, which depends instead on the pattern type and the scan configuration implemented [9].

Power consumption during scan testing

An unexpected consequence of technology scaling (and the resulting growth in transistor density in IC designs (Figure 2-2)) has been increased power consumption in a chip during functional mode.


Figure 2-2 CPU scaling showing transistor density, power consumption, and efficiency [12]

Technology scaling has been accompanied by a linear reduction in supply voltage, but the exponential increase in transistor density has allowed power density to continue its rapid ascent, to levels that have created two new by-products:

• Heat dissipation issues: the increase in power density increases the need for better heat dissipation, to ensure that any given device operates within its defined temperature limits. When a device is running, it consumes electrical energy that is transformed into heat, most of which is generated by the switching activity of the transistors. With technology scaling, the decrease in surface area reduces heat dissipation, with the consequent possibility of hotspot damage, reduced reliability and yield loss. To alleviate these problems, techniques such as clock gating, design partitioning and voltage shut-off are used during functional operation.
• Power supply problems: the increased transistor density brought an inevitable increase in current density, which demands greater metal area dedicated to the power distribution network (PDN). As transistor counts have increased, interconnect density has also increased, requiring more metal for signal routing. Compromises have to be reached during design to satisfy both demands; because of this, power delivery and power grid constraints have in some cases limited the final product.


These issues are magnified during the application of test techniques such as scan testing, which is used extensively because it is capable of reducing test time and keeping test cost within reasonable limits. Scan design is currently the most popular structured DFT method. Many techniques are used to alleviate the problems mentioned above; some are specifically designed for use during scan testing, since this type of testing is based on the structure of the device and not on its functionality. Some of the techniques used to prevent heat and power issues during functional operation cannot be applied during testing, because they are incompatible with, or in conflict with, the two main requirements of scan testing: controllability and observability.

DFT techniques were in fact created to improve the controllability and observability of a design's internal nodes, a problem that became a showstopper when IC integration moved into the VLSI era. Controllability reflects the difficulty of setting a signal line to a required logic value from primary inputs, and observability reflects the difficulty of propagating the logic value of a signal line to primary outputs.

This chapter outlines the phases involved in scan-based testing and examines some of the techniques used to reduce test losses in each phase. It is seen that power and timing issues affect both main phases of scan testing (during shift and during launch and capture) and this is a complicating factor for researchers in the area as technology scaling keeps its high pace.

Design for testability techniques

When small-scale digital systems were common, exhaustive test techniques were used to verify the functionality and reliability of integrated circuits: the system was tested over its full range of operating functions and conditions. When larger-scale systems were introduced, exhaustive testing was no longer possible, because test time was stretching to impractical levels and test cost was consequently becoming a dominant factor in the final cost of a product.

The concept of DFT was therefore introduced. Industries and researchers started to look at testing during the design stage of a product and created alternative ways of testing that would not rely on exhaustive testing and at the same time would ensure that the system could be properly tested and a working device delivered to the end customer.


DFT introduced extra costs to a design, the major contributing factors being:

• addition of non-functional external pins and die area used specifically for the architecture components needed for testing
• performance cost (i.e. reduced clock speed due to the extra path length or extra devices a signal must pass through)
• extra design time to insert test structures in the original design and to verify that the design's functionality has not been modified
• test time

Despite these cost factors, overall test cost was kept low, and design for testability techniques have become an essential part of any design, because without them the cost of testing, and above all the testability, of large designs was becoming unmanageable.

One of the main points of interest was to find a solution for controlling and monitoring portions of a circuit that are not directly accessible from the external pins of the device. Before introducing any testing method, testability analysis is often used to measure the testability of a device; the testability of combinational or sequential logic decreases as the number of logic levels increases, due to the increased complexity of the devices. This requires calculating the controllability and observability of every signal line in a device. The most common DFT methods are the following:

• Ad-hoc methods
• Scan testing (full and partial scan testing)
• Scan-based logic BIST (Built-In Self-Test)

2.5.1 Ad-hoc techniques

Some of the first techniques were based on methods developed for a particular design, the so-called "ad hoc methods" (DFT techniques introduced in the 1970s [13]), which targeted only those portions of the circuit that were difficult to test. These techniques required making local modifications to a circuit to improve testability. While ad hoc techniques brought some testability improvement, their effects were not global, and the techniques could not be adopted for other designs without further modification.


Ad-hoc techniques required the use of a set of design practices and guidelines to improve testability, such as:

• insert test points
• avoid asynchronous set/reset for storage elements and asynchronous logic
• avoid combinational feedback loops
• partition a large circuit into small blocks

Test point insertion (TPI) is the most widely used ad-hoc technique for improving the controllability and observability of internal nodes. The concepts of observation and control point insertion were further developed and combined into the scan insertion points used by scan testing techniques, and the other guidelines were incorporated into the set of design guidelines applied when implementing scan test. The use of ad hoc methods did improve the testability of circuits, but reaching fault coverage levels above 90% for large designs requires the adoption of better performing test techniques such as scan insertion.

Figure 2-3 D-type Flip-Flop vs Mux-D Scan Cell [14]

2.5.2 Scan testing

In order to perform scan testing, a design needs to undergo a radical modification. This is currently achieved automatically (with the use of highly sophisticated software tools) and is performed by replacing each of the storage elements in a design with a scan cell. If all the storage elements are selected for substitution, the design is a full scan design; otherwise it is said to be a partial scan design. Each scan cell has one additional scan input (SI) port, a scan enable port (SE, for test control) and one shared scan output (SO) port. Figure 2-3 compares a simple D-type flip-flop with a Mux-D scan cell, which is composed of a D-type flip-flop and a multiplexer; the scan enable is the control signal for the multiplexer. The scan cells are connected to form a shift register by connecting the SO port of one scan cell to the SI port of the next, creating a scan chain. The scan chain is connected to a primary input pin of the device through the SI port of the first scan cell, and to a primary output pin through the SO port of the last scan cell, so the chain is provided with external input and output pins.

After the original design has been subjected to scan insertion, the new design can operate in three modes:

• Normal mode: a control signal usually called test mode (TM) is used to turn off all the test-related functions of the design (including the SE signal feeding the multiplexer of every scan cell). The scan-inserted design operates in its original functional configuration.
• Shift mode: in this mode of operation (as well as in capture mode), the TM signal is set (logic 1) to turn on all test-related functions required by scan design rules. During the shift operation, the SE (scan enable) signal is set, so the scan chain can be used to shift in a test pattern (applied through one of the device's primary inputs). The test response is shifted out concurrently with the shift-in of the next pattern (apart from the first pattern shifted in and the last test response shifted out).
• Capture mode: once the test pattern has been shifted into the scan chain, it needs to be applied to the combinational logic. For this to happen, the circuit is configured in capture mode (equivalent to the design's functional mode) by temporarily setting the scan enable to 0. The response of the combinational logic to the test pattern is then captured in the scan cells. The scan chain is next configured in shift mode again, to shift out the captured response for comparison with the expected vector values.
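The shift-capture-shift sequence above can be illustrated with a small behavioural model. This Python sketch is not derived from any real netlist: the chain length, the shift-out ordering and the AND-of-neighbours "combinational logic" are invented purely to show how the modes interact.

```python
# Toy behavioural model of one scan chain. The "combinational logic" sampled
# in capture mode is arbitrary: each scan cell captures the AND of its own
# state and its neighbour's, just to give the capture cycle something to do.
class ScanChain:
    def __init__(self, length):
        self.cells = [0] * length            # state of each scan flip-flop

    def shift(self, scan_in_bits):
        """Shift mode (SE = 1): new bits enter at SI, old state exits at SO."""
        out = []
        for bit in scan_in_bits:
            out.append(self.cells[-1])       # SO: last cell drives scan out
            self.cells = [bit] + self.cells[:-1]
        return out

    def capture(self):
        """Capture mode (SE = 0): one functional clock loads the response."""
        s = self.cells
        self.cells = [s[i] & s[(i + 1) % len(s)] for i in range(len(s))]

chain = ScanChain(4)
chain.shift([1, 1, 0, 1])        # scan in a test pattern (SE = 1)
chain.capture()                  # one capture pulse (SE = 0)
response = chain.shift([0] * 4)  # scan out the response while loading next
print(response)
```

Note how the captured response is unloaded while the next pattern is loaded, which is why scan application time is roughly the chain length multiplied by the pattern count, in shift cycles.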

Scan testing differs from functional testing in that the purpose of the scan patterns is to detect defects, not to verify the device specification. The test patterns are automatically generated by ATPG tools and applied to the circuitry thanks to the ease of access to almost all of the sequential cells, which are now part of the scan chains. As the technology for designing and manufacturing integrated systems keeps evolving rapidly, with technology nodes scaling down (the physical dimensions of the devices on the wafer) and design complexity scaling up (multimillion-gate designs), industry and researchers face ever more new problems.


These issues range from excessive test data volume, which may exceed the tester's memory, to insufficient fault coverage levels for deep submicron designs, to problems related to the structural nature of scan testing itself, which may cause localised power peaks that can invalidate a test or, worse, cause a fault-free device to be erroneously rejected. This last problem is of great concern for the industry, because a failure caused by a timing or power issue during scan testing may never actually occur in the functional mode of operation.

2.5.3 BIST (built in self-test)

Logic built-in self-test (BIST) is a DFT technique in which a dedicated portion of the on-chip circuitry exists solely to test the digital logic itself.

BIST is primarily used to reduce the test problems encountered with scan testing alone or, in the past, to reduce the test time required by exhaustive test by applying BIST to hard-to-reach portions of the circuit under test (CUT). This is achieved by embedding circuits that generate test patterns and analyse the output responses of the CUT itself. A block diagram is shown in Figure 2-4: the BIST architecture includes a test pattern generator (TPG), which applies internally generated test patterns to the CUT, and an output response analyser (ORA), which analyses the output responses of the logic under test. The pass/fail signal is then shifted out.

Figure 2-4 BIST architecture block diagram [15]

BIST architectures are classified into two categories:  Test per-scan: this architecture takes advantage of inserted scan chains of the scan design and applies a test pattern to the CUT after a shift operation has been completed; hence, the hardware overhead is low.


• Test per-clock: a test pattern is applied to the CUT and its test response captured on every system clock cycle; this scheme can therefore execute tests much faster than the test per-scan technique, at the expense of more hardware overhead.
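The TPG and ORA mentioned above are classically built from linear feedback shift registers. The following minimal sketch uses a 4-bit LFSR as the TPG and a MISR-style register as the ORA; the feedback taps, the register widths and the stand-in "circuit under test" are illustrative, not taken from any particular BIST implementation.

```python
# 4-bit Fibonacci LFSR: feedback taps chosen to give a maximal-length
# (period-15) sequence over the non-zero states.
def lfsr(state, taps=(3, 2)):
    fb = 0
    for t in taps:
        fb ^= (state >> t) & 1
    return ((state << 1) | fb) & 0xF

# MISR-style ORA: shift the signature register and fold each response word in.
def misr(state, response):
    return lfsr(state) ^ response

def cut(pattern):                     # stand-in for the circuit under test
    return pattern ^ (pattern >> 1)   # arbitrary combinational function

sig, pat = 0x0, 0x1                   # signature and LFSR seeds
for _ in range(15):                   # run through the full LFSR sequence
    sig = misr(sig, cut(pat))         # compact each response into the signature
    pat = lfsr(pat)
print(hex(sig))                       # compared on-chip against a golden value
```

The on-chip comparison of the final signature against a precomputed golden value is what reduces the whole response stream to a single pass/fail bit.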

Automatic test equipment (or ATE) accuracy, spiralling clock rates, and other similar phenomena will eventually drive the industry towards BIST or similar techniques.

Scan test architectures and application

Scan architectures

There are three fundamental architectures for scan testing:

1. Muxed-D scan design: every storage element is converted into a muxed-D scan cell which (as seen before) uses a scan enable signal SE to select either the data input or the scan input from the previous scan cell's output in the chain (Figure 2-5a).
2. Clocked-scan design: the storage elements are converted into clocked-scan cells, a kind of D-type flip-flop with two clocks: a data clock, which selects the data input when set, and a scan clock, which selects the scan input when set (Figure 2-5b).
3. Level-sensitive scan design (LSSD): the storage elements are converted into LSSD shift register latches, which use two non-overlapping clocks C and B to select the data input and two non-overlapping scan clocks A and B to select the scan input (Figure 2-5c).

Figure 2-5 Scan cell designs including muxed-D scan cell (a), clocked-scan cell (b), and an LSSD scan cell (c) [15]


2.6.1 Scan test techniques

When stuck-at faults were the main fault model targeted for scan test, it was only necessary to enable capture mode for one clock cycle thereby capturing the response from a single test vector. However, testing for correct timing behaviour requires more sophisticated test techniques, and today the two most popular scan test techniques are launch-on-shift test (LOS, also known as skewed load or launch-off-shift) and launch-on-capture test (LOC also known as broadside test or launch-off-capture).

These test techniques are essential for transition test, which is used to ensure the correct timing behaviour of the logic gates within the manufactured device. Transition faults model the delay of a signal entering the input of a gate and exiting its output. A transition fault test requires a pair of vectors to detect a fault: to cause a transition on a node, the first vector initializes the state of the node and the second vector launches a transition of logic value (0 to 1 or 1 to 0). The response from the targeted fault site is captured by a scan flip-flop, and the scan chain provides the means to shift the response out to a primary output for comparison. The launch-on-shift (LOS) technique launches a transition with the last clock pulse of the scan shift operation, followed by a system clock pulse (at functional speed) that captures the transition. Figure 2-6 shows the SE (scan enable) control signal and the clock waveform. While a test vector is being scanned in, the clock runs at a much lower speed than the functional speed of operation; only the timing between launch and capture is equivalent to functional (system clock) speed, in order to verify the correct timing of the transitions at the nodes.

Figure 2-6 Waveform for LOS scan test technique [16]


Once the nodes are initialized by the last shift clock, in order to launch the transitions and capture the response, the scan enable signal must go from 1 to 0 and the capture clock is applied while SE is at logic 0. Because the scan enable signal changes between the launch clock and the capture clock, which is applied at system speed, the scan enable signal must also be designed to operate at system speed. The time period between the launch clock pulse and the capture clock pulse determines the test application frequency (which should be the system clock speed). The scan-enable signal must fully transition from logic 1 to 0 between the launch and capture clock edges, placing strict timing requirements on the design and routing of the scan-enable signal.

The scan enable signal (SE) must be able to turn off very quickly after the last shift clock to let the logic values in the nodes settle before the capture clock occurs. As system operating frequencies continue to rise, the scan enable signal usually needs to be routed like a clock tree to accomplish this. In some designs, the frequency required for the test may also require the scan chains to shift at system frequencies. Most scan chain shifting, however, is done at lower frequencies to avoid possible corruption of the test vectors being shifted along the chain: a change in value of any bit of the test vector would invalidate the test.

The main advantage of this launch-off-shift approach is that it only requires the ATPG tool to create combinational patterns, which are quicker and easier to generate. Different methods have been developed to relax the timing requirements on the scan-enable signal for LOS testing, but often these methods are not adequate and LOS cannot be applied; in that case LOC has to be used, at the expense of larger test pattern sets. Although LOS is capable of launching more transitions than LOC (one of the reasons why LOS can deliver higher coverage), power dissipation of LOS in the launch cycle is higher than that of LOC [17], and this is a major reason why LOC is more common in scan testing.

During launch-on-capture (LOC) transition test, both the launch and the capture of a transition occur while SE is at logic 0. The launch and capture are still performed at system clock speed, as for LOS. However, unlike the LOS test, the last shift of LOC is not applied at the fast system clock speed. The important point to note here is that the launch and capture are performed with the scan-enable signal set to functional mode. As all the scan data shifting can be done at slow speeds in test mode, the main advantage of the LOC technique is that it does not require the scan chains to shift at-speed or the SE signal to perform as a high-speed clock, which is much simpler to implement from a design point of view.

Because of these advantages, most of the industry has adopted this approach for transition test patterns. The biggest disadvantage of the LOC technique is that the ATPG needs to generate sequential patterns, which can increase the test pattern generation time and will certainly result in a higher pattern count. LOC testing therefore suffers from large test set sizes and, because both the launch and capture clocks are applied while scan enable is low, lower fault coverage than LOS. Most commercial ATPG tools support both LOS and LOC transition delay test. Other hybrid techniques, in terms of both timing sets and hardware, are also used in industry.

In either technique, the first clock pulse launches a logic transition along a path, and the second captures the response at a specified time determined by the system clock speed. Each pin is tested for slow-to-rise and slow-to-fall behaviour: a 0 to 1 transition is applied to test for a "slow-to-rise" defect and a 1 to 0 transition to test for a "slow-to-fall" defect. If the response does not match the expected value, then the logic involved was not able to transition within the cycle time; as a consequence, the path fails the test and is considered to contain a defect. The transition fault model is more widely used than the alternative path delay model because it tests for at-speed failures at all pins in the design's logic and doesn't require any special user input.
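The two-vector launch/capture principle can be illustrated with a small Python sketch (a hypothetical timing model invented for this example, not tool output): a node passes a slow-to-rise test only if its 0-to-1 transition completes within the at-speed capture window.

```python
def transition_test(initial, launch, rise_delay_ns, capture_window_ns):
    """Return the value captured at a node after the at-speed capture edge.

    A node is 'slow-to-rise' if a launched 0->1 transition does not
    complete within the capture window, so the stale 0 is captured.
    """
    if initial == 0 and launch == 1:            # 0 -> 1 (slow-to-rise test)
        return 1 if rise_delay_ns <= capture_window_ns else 0
    return launch                               # no rising transition launched

# Good node: rises in 0.8 ns, capture window is 1.0 ns -> expected 1 captured.
assert transition_test(0, 1, rise_delay_ns=0.8, capture_window_ns=1.0) == 1
# Defective node: needs 1.5 ns -> the stale 0 is captured and the test fails.
assert transition_test(0, 1, rise_delay_ns=1.5, capture_window_ns=1.0) == 0
```

The capture window here plays the role of the launch-to-capture interval set by the system clock speed: widening it (slowing the clock) would hide the defect, which is why the interval must match functional speed.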

2.6.2 Clock signals for scan testing

The system clock is required to deliver timing for at-speed tests and functional operation. There are two main sources from which such a signal can be obtained: the external ATE (automated test equipment) in the case of scan test, and the on-chip clocks (PLLs) when the device is operating in functional mode. In the past it was common practice to use the clocks generated by the ATE, but this is often no longer possible, because the high system clock frequencies of current designs would require much more sophisticated and expensive testers; tester costs increase as clocking speed and accuracy rise. Most IC designs include a phase-locked loop (PLL) and other on-chip clock generating circuitry. Using these signals provides some advantages over using the ATE clock signals, such as improved accuracy (test timing is more accurate when the test clocks exactly match the functional clocks) and reduced ATE cost, as the high-speed on-chip clocks reduce the ATE requirements. The shift frequency during scan testing is much slower than the functional operating frequency for most designs. The scan-shift speed may also be limited by the maximum frequency supported by the ATE being used.

Figure 2-7 Waveform for LOC scan test technique [16]

As a result, two different waveforms (or time-sets), one to enable scan-shift and the other to perform the at-speed capture, are applied to the same clock pin while the device is being tested. The three main ways of implementing this are:
• LOC using a late-late-early waveform (Figure 2-8): two time-set waveforms are used, one for scan-shift and launch, the second for capture. Scan-in and launch are performed using a time-set labelled "Late", with the rising edge of the clock occurring towards the end of the cycle. For the capture operation the time-set is different, as the rising edge of the clock occurs in the early half of the cycle (time-set labelled "Early"). The time interval between launch and capture is adjusted so that it corresponds to the functional operating frequency of the device, in order to enable at-speed testing.
• LOC using an early-late-early waveform (Figure 2-9): an "Early" time-set is used for shifting in the data. This allows a wider shift pulse, which is helpful when slower clock trees are present in a design. After the scan-in phase, another pulse with a rising edge late in the cycle is applied for the launch operation, followed by an "Early" pulse (rising edge early in the cycle) for the capture. As in the late-late-early approach, the timing between the launch and capture pulses is adjusted to mirror the functional operating frequency.

Figure 2-8 LOC using late-late-early waveform (late scan in, late launch, early capture) [18]

Figure 2-9 LOC using early-late-early (early scan-in, late launch, early capture) [18]

• LOC using two different periods (Figure 2-10): two waveforms with different time periods are used, a slow one for shifting in and out and a fast one (at-speed) for launch and capture. This approach offers great flexibility as the scan operation is completely decoupled from the launch and capture. The speed of shifting can be adjusted to the required speed and clock width (useful for slow clock trees). The shift-in stage is followed by a cycle with no clock pulse, whose width is the same as that of a scan-in cycle. During this period, the scan-enable is reset (logic 0) in order to let the CUT enter functional mode. The no-pulse cycle gives the scan-enable signal enough time to propagate to all the scan flip-flops. It is then followed by two fast clock pulses (launch and capture) matching the operational frequency of the CUT.

Figure 2-10 LOC using two different periods [18]

During scan test the majority of the test time is spent shifting the test vectors in and the captured responses from the combinational logic out; shift operations are usually performed at the maximum shift speed of the scan chains, and the use of two different periods allows more hardware flexibility, so a wider range of test equipment can be used.
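The dominance of shift time can be illustrated with a back-of-the-envelope estimate in Python (the formula and all the numbers below are assumptions chosen for illustration, not measurements from this work): each pattern requires one full chain load, with shift-out of one response overlapping shift-in of the next pattern, plus two at-speed launch/capture cycles.

```python
def scan_test_time_s(patterns, chain_length, shift_hz, atspeed_hz):
    """Rough scan test time: slow shift cycles plus fast capture cycles.

    Assumes shift-out of each response overlaps shift-in of the next
    pattern, so only one extra chain-length unload is needed at the end.
    """
    shift_cycles = (patterns + 1) * chain_length   # +1 for the final shift-out
    capture_cycles = patterns * 2                  # launch + capture per pattern
    return shift_cycles / shift_hz + capture_cycles / atspeed_hz

# Hypothetical design: 10k patterns, 2k-flop chains, 50 MHz shift, 1 GHz capture.
t = scan_test_time_s(patterns=10_000, chain_length=2_000,
                     shift_hz=50e6, atspeed_hz=1e9)
print(f"{t * 1e3:.1f} ms")   # shift time dwarfs the 20 us of at-speed cycles
```

Even with the at-speed clock twenty times faster than the shift clock, essentially all of the test time is spent shifting, which is why shift power dominates average test power.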

The CUT will often contain false paths and multi-cycle paths that do not operate at the functional frequency. These paths cannot be tested simultaneously with the rest of the circuitry and need to be masked, with a consequent reduction in fault coverage. Multiple clock domains are common practice, especially in large designs; they can operate at different frequencies as they may use separate clock sources in functional mode. In test mode, the use of a single clock signal for all domains can facilitate testing.

2.7 Timing and power problems during scan testing

Timing and power issues during testing will always be present in integrated circuit design as technology pushes its boundaries ever further. Current IC designs contain complex embedded and DSP (digital signal processing) cores. Testing such complex ICs is extremely challenging, especially when they include custom IP (intellectual property) cores whose internal structures are not fully visible. Timing issues are getting harder to solve because, with the advent of smaller node technologies, operating frequencies are constantly increasing and, as a consequence, the tolerance within which a signal must stay inside a well-defined time-set is shrinking to critical levels. The timing models within today's synthesis tools allow debugging and resolving timing issues within a device during design or scan test simulation, but execution run time is also increasing, and every time a timing issue is identified, re-synthesis for bug removal adds execution time to the whole flow.

Timing faults are now encountered more often during wafer testing: the power models in synthesis tools are still immature and incapable of detecting these faults, because the models used do not exactly replicate a real device's behaviour. These timing issues do not show up during simulation of a design and are most probably related to power issues, such as IR-drop or localized current surges caused by excessive switching. This latter behaviour is one of the main issues scan testing faces, because devices are not designed to meet scan test requirements. Design engineers are focused on making sure that a design does what its functionality requires it to do. Designs are made to reflect customer needs, with very little time and effort dedicated to the testing functions; they are not designed to deal with the excessive switching activity present in the very short testing period that a device goes through relative to its entire life span.

Power related problems are becoming increasingly common due to the gate density of today's designs and power grids that are sized for the functional mode of operation of a device. The main issue is that power related problems are not easily detectable or recognizable as such; they most often manifest themselves as ordinary timing problems, which, at the moment, are not preventable by synthesis tools during device design and simulation. In any design, during normal operation, typically only a relatively small percentage of the flip-flops change value in each clock cycle. When the device is subjected to testing and the SE (scan enable) signal puts the device in test mode, the scan test cubes cause a much larger percentage of the flip-flops to toggle in each clock cycle, resulting in excessive switching activity in the circuit. Test power is several times higher than functional power [15].

ATPG tools include multiple algorithms that try to produce efficient patterns with high fault coverage, but they also aim at producing patterns that do not cause excessive switching activity during the shift-in operation. When a large number of flip-flops switch simultaneously on a single clock pulse, a large current spike may result. A chip's power distribution network is designed to handle the typical peak power that could occur during functional operation and is not suited to the large power peaks that occur during scan testing. If the peak power during test exceeds the design power parameters, the resulting Vdd-drop/ground-bounce may cause problems (such as a PLL (phase-locked loop) malfunction or memory elements losing their state), as well as hot spots that may damage the device and increase yield loss. Monitoring local switching activity is also important because of the possibility of excessive power dissipation in certain areas of the circuit even when the total power dissipation is acceptable. A "peak power violation" occurs when the peak power in a clock cycle during scan testing exceeds a specified limit. This limit is the amount of peak power that the device can handle during functional operation without failure.

The average power dissipation during scan testing is much simpler to control. In fact it could be controlled by reducing the scan clock frequency. On the other hand the peak power during scan testing is independent of clock frequency and more difficult to control. Controlling peak power requires ensuring that the peak power dissipation in any single clock cycle does not exceed the capabilities of the device. This is very difficult to do considering the size and high frequency of modern devices.
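This distinction can be sketched numerically (the per-cycle toggle counts and energy unit below are invented for illustration): halving the shift clock halves the average power, while the worst-case per-cycle switching that determines peak power is unchanged.

```python
# Hypothetical toggle counts for five consecutive shift cycles.
toggles_per_cycle = [120, 480, 95, 510, 130]
E_PER_TOGGLE = 1.0   # arbitrary energy unit per toggle

def avg_power(toggles, f_hz):
    """Average power = energy per cycle (averaged) times clock frequency."""
    return sum(toggles) * E_PER_TOGGLE * f_hz / len(toggles)

def peak_switching(toggles):
    """Worst single-cycle switching: independent of the clock frequency."""
    return max(toggles)

print(avg_power(toggles_per_cycle, 50e6))   # halves if the clock is halved
print(avg_power(toggles_per_cycle, 25e6))
print(peak_switching(toggles_per_cycle))    # 510 toggles at either frequency
```

The sketch makes the asymmetry explicit: frequency appears only in the average-power expression, so the scan clock can be slowed to meet an average-power budget, but the peak cycle must be tamed by changing what switches, not how fast.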

Within a scan test, the test power necessary to scan in and scan out test vectors is called shift power, and the power required to capture the test responses with an at-speed clock rate is called capture power. From this distinction, the excessive power issue during scan testing can be split into two sub-problems: excessive power during the shift cycles and excessive power during the Launch and Capture phase.

Excessive shift power can lead to high temperatures that can damage the CUT and may reduce the circuit’s reliability (combinational logic is disabled at this stage of testing). Excessive capture power induced by test patterns may lead to large current demand and induced IR-drop that may cause a normal circuit to slow down and possibly fail the test. In some cases, a large percentage of the scan elements need to be at a specified value in order for a specific fault to be detectable. If that fault cannot be detected during scan testing without causing a peak power violation, the only option to eliminate such peak power violations is by removing the test, with a consequent drop in fault coverage. This is necessary in order to preserve the structural integrity of the device which is more important than a single fault detection.


There are two main points of concern when applying scan test techniques:
• Tests excite illegal conditions in the design which would never arise during normal functional operation, resulting in false failures if a fault occurs.
• The electrical differences during the application of scan based tests create an unnatural situation which causes the test results to become invalid in some cases.

Both causes can unnecessarily increase yield loss. Other factors widen the differences in behaviour of a device between functional and test modes, such as DFT elements within a design that are disabled during functional operation but enabled during testing, during which time they also contribute to power dissipation.

2.8 Current research with respect to timing and power issues

The timing issues within a design are well taken care of by the synthesis and simulation tools; the scaling down of technology nodes may only make the analysis more tedious and may require extra iterations of the process. Gate delays get smaller with decreasing feature sizes (since they are proportional to the transistor gate sizes). Interconnect delay is proportional to the interconnect capacitance and the interconnect length, which increases as the design size increases. Models of these characteristics are used by the tools and allow them to simulate transition timing issues and path delay issues. As frequency increases, tolerances get smaller and the level of difficulty therefore increases. Timing issues are regularly encountered during the design process and can be pinpointed and fixed. Power related problems complicate test development further. They may manifest themselves physically as IR drop/ground bounce, localized current surges, localized power peaks, temperature increases and hotspots. The main problem is that the test relies only on detecting whether the response obtained from the CUT matches the expected one for a particular test pattern; it is based purely on checking the correct sequence of bits shifted out from the chains. Assuming that all timing issues were solved during simulation, it becomes natural to conclude that any further timing issues found during scan testing at wafer level are caused by power related problems. Simulation tools are not yet able to incorporate reliable power models, as different power behaviour occurs at different technology nodes and simulation at the moment is incapable of keeping up with the pace of the industry.


The vital parameters generally measured during testing are fault coverage, test count, test time and test length, all strictly focused on the task of properly testing a product. They do not monitor any of the physical quantities above, and it is therefore very hard to establish the cause of a faulty response captured during test. Monitoring these physical quantities is not only difficult but could also require extra die area, increasing costs; this may not be acceptable for a monitoring function alone. The huge amount of circuitry in modern devices also means that monitoring such quantities would be a very complicated task.

2.8.1 Test Power

When referring to power during test, it is common to differentiate between the power required to load or unload test data during the shift operation (shift power) and the power consumed by the switching that occurs when capturing the test responses (capture power).

Issues arising from excessive shift power can lead to high temperatures, with possible damage to the CUT or reduced device reliability. Issues arising from excessive capture power caused by a test vector might include a large current demand, with a consequent supply voltage drop (IR-drop). As a consequence of the IR-drop/ground bounce, a defect-free circuit will slow down and may fail a transition fault test or path delay test, with resulting yield loss.

Another area of concern is excessive switching during the launch cycle (launch power). This can cause elevated peak supply currents, leading to IR drop that increases signal propagation delays.

The end effect of these timing delays caused by the nature of scan testing cannot be differentiated from timing-related defects caused by manufacturing. As an example of a manufacturing defect in the PDN (power distribution network), consider an open defect on a power line or power via, as shown in Figure 2-11. Such a defect weakens the power network, with consequent IR drop that changes the delays of the neighbouring nodes and possibly causes path delay faults.


Figure 2-11 Open via defect on Metal power line [15]

If one of the power vias is broken, the network is still connected, but weakened. The result is that the PDN is not able to supply enough power to the cells below it; when the current demand for that region becomes too high, the IR-drop and related timing issues mean that functional failures inevitably occur. In the particular case shown in Figure 2-11, an open via defect is present. Due to the defect, region 2 cannot draw current through the via above it; the two neighbouring vias will instead supply the power required by the cells in region 2. As a result, there will be an increase in the current flowing through the vias in the neighbouring regions. The increase in IR drop during switching, caused by the need to draw extra current through more distant power vias, may result in functional or timing failures, especially when a large amount of switching activity is already occurring in one of the neighbouring regions (region 1 or 3) [15].

Test simulation is done primarily to verify the functionality of a device, but also to verify that the testing procedure is valid and capable of confirming that a device is in fact a good device. This includes identifying the scan vectors that could cause peak power violations during scan testing. As the scan testing process is dominated by shift operations, average power mostly depends on shift power, but it is the peak power during shift that is hard to control. The peak power in each clock cycle during scan test can be estimated through cycle-by-cycle simulation of the switching activity that occurs in the circuit. If the peak power in a clock cycle exceeds a specified limit (dependent on the amount of peak power the device can safely handle without failing during normal functional operation), a "peak power violation" occurs in that clock cycle.
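In essence, such a cycle-by-cycle screen reduces to comparing a per-cycle switching estimate against a limit, as in the following Python sketch (the WSA trace and the limit are invented values for illustration):

```python
def peak_power_violations(per_cycle_wsa, limit):
    """Return (cycle index, WSA) pairs that exceed the peak power limit."""
    return [(i, w) for i, w in enumerate(per_cycle_wsa) if w > limit]

# Hypothetical weighted switching activity per clock cycle for one pattern.
wsa_trace = [3100, 2900, 5200, 3000, 6100]

print(peak_power_violations(wsa_trace, limit=5000))
```

In a real flow the trace would come from cycle-by-cycle gate level simulation of the pattern set, and any pattern producing a violating cycle would be flagged for re-generation or removal.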


Problems during scan capture occur because of an excessive number of flip-flop transitions during the scan capture stage. The flip-flops that transition during scan capture are those whose input value differs from their output value. The number of such input/output differences needs to be reduced, by reassigning input values, in order to eliminate power problems during the capture cycles. A peak power violation can occur during scan capture if a scan vector places the circuit in a state that it would never enter during normal functional operation, and that state causes peak power in excess of what could occur functionally. A spurious fault detection may follow, causing the device to be discarded unnecessarily and thereby increasing yield loss. Many methods and strategies are used to try to reduce switching during scan testing. Most are based on test pattern generation and manipulation, but methods based on using the structural elements of the device to mitigate the excessive switching that may cause peak power problems are also used, including clock tree adjustments, chain organization, and timing management for testing different blocks within the same device.

2.8.2 Switching reduction schemes based on pattern generation

In this section some approaches used to reduce switching based on pattern generation are described. These are ATPG-based solutions, which rely on analysis and adjustment of the test pattern contents:
• Fault coverage migration to high-ratio X-bit vectors: it is generally known that the patterns that detect more faults usually have higher switching activity and fewer don't-care bits in the vector (a low X-bit ratio), and that increasing the X-bit ratio drastically decreases the number of detectable faults. It is therefore safer and more efficient to fill the X-bits (don't-care bits) in patterns with lower fault coverage and a high X-bit ratio (power-safe patterns) in order to increase their fault coverage, accepting the accompanying increase in switching activity. This allows the patterns with associated power risks to be left untouched, or, better still, their fault coverage to be reduced by transferring fault detection to the power-safe patterns (increasing their X-bit ratio). In other words, it is desirable to find a balance of X-bit ratios within a test set which keeps fault coverage high while staying within the power constraints dictated by the design. This method was proposed by Yi-Hua Li and Wei-Cheng Lien [19].
• Test vector reordering: another approach is to reorder the test vectors to achieve the minimum peak power for the test set. The set of vectors is not modified, only reordered, so fault coverage and test time are unaffected; reducing switching activity by increasing the correlation between consecutive vectors is thus one approach used to reduce peak power. Such a procedure for modifying a given set of scan vectors works for any conventional full-scan design. This approach is used not only for scan testing but has also been applied to BIST, as proposed by Yi Wang [20].
• X-fill approaches that make test vectors mimic functional operation: randomly filling unspecified bits with 0s and 1s may cause test vectors to contain non-functional states, which can cause higher switching activity for test stimuli and test responses. Some methods try to generate test vectors that keep switching activity low during the capture cycles. A method of this type is proposed by E.K. Moghaddam et al. [21]: it uses background states, obtained by applying a number of functional clock cycles starting from scan-in states, to fill the unspecified values in test cubes and keep switching activity low. The WSA (weighted switching activity) of test vectors generated with this method is expected to be low. As with other X-fill approaches, pattern count inflation is an inevitable side effect.
• Other approaches consider pattern generation under clock tree constraints. These methods require that only some clock trees or certain chains are enabled at the same time during testing, or they take into consideration the physical location of the flip-flops in the layout as opposed to their logical location. The ATE can also be used to skew the launch operations of one chain while other chains are in their capture cycles. Having launch and capture occur in different cycles for different scan chains can also alleviate peak power issues, and this can be taken into consideration during the ATPG process [22].

The above methods are based on schemes that try to increase correlation between test vectors, or bit correlation within the same test vector, by using the available don't-care bits or by introducing extra test vectors that increase correlation without adding fault coverage. It should be noted that most of these schemes may mitigate power peaks during testing but also introduce test time and/or data volume penalties.
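As a concrete instance of the X-fill idea, the following Python sketch implements a simple "adjacent fill" policy (one common low-power fill; the cited works use more elaborate schemes, so this is an illustration only), in which each don't-care bit copies the last specified value so that consecutive shifted bits toggle as little as possible:

```python
def adjacent_fill(cube):
    """Fill each 'X' in a test cube with the last specified bit value."""
    filled, last = [], '0'            # assume '0' if the cube starts with X
    for bit in cube:
        last = bit if bit in '01' else last
        filled.append(last)
    return ''.join(filled)

def shift_transitions(vector):
    """Count adjacent-bit differences: toggles seen while shifting the vector."""
    return sum(a != b for a, b in zip(vector, vector[1:]))

cube = '1XXX0XX1X'
print(adjacent_fill(cube))                        # '111100011'
print(shift_transitions(adjacent_fill(cube)))     # only 2 transitions
```

A random fill of the same cube could produce up to eight adjacent-bit transitions; adjacent fill limits the toggles to the two forced by the specified care bits, at the cost of the correlation-related pattern count inflation noted above.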

2.8.3 Schemes based on structural design elements


The following are current methods used to reduce switching by employing structural elements of the CUT (DFT-based solutions, which rely on modifications of the scan structure):
• Scan chain partitioning schemes are often used to reduce peak power violations during both shift and capture cycles. One technique worth mentioning is applicable to both BIST and non-BIST schemes and can easily be used in conjunction with X-filling methodologies to reduce power consumption in the capture phase. This scheme was proposed by Efi Arvaniti and Yiorgos Tsiatouhas [23]. The concept is to partition the scan chain and apply the scan-in/out operations to each partition separately while the remaining partitions are inactive and just hold their data (performing scan operations on a single partition would corrupt the data in the other partitions unless their flip-flops are set in a hold mode to retain it). This method drastically reduces the number of signal transitions and hence the power consumption. Test time and fault coverage are not affected, but clock, data or power supply gating techniques are required.
• Scan capture problems may be less likely than scan-in and scan-out problems because the number of transitions during scan capture tends to be similar to what occurs during functional operation, provided the test is designed with the functional structure in mind. This may not be the case if the design is flattened to reduce test time, with testing done across clock domains and IP blocks. In that case capture peak power may be very difficult to control unless the design is partitioned into functional blocks and the test is applied to one partition at a time; increased test time is an obvious side effect.
Other, more generic, techniques are:
• Clock gating techniques for power reduction during functional mode are quite common. There are two main ways of using clock gating to reduce switching:
  o Preventing the memory elements from causing switching in the combinational logic: this is achieved by deactivating the scan clock during the application of patterns that detect only faults already detected by other patterns (ineffective patterns).
  o Preventing clock transitions in the leaves of the clock tree that do not need to be active.
• An effective technique to save power during scan based test is scan cell clustering. The goal is to group the scan cells into a number of segments or parallel scan chains. The clustering tries to increase the likelihood that the scan cells with care bits are in the same segment, allowing unused scan chains to be switched off.
As seen, scan testing problems are mitigated using structural elements of the design, mostly based on chain partitioning, block sequence management, or clock staggering and management. Scan cell stitching is also used to physically separate the cells, at the expense of more routing area. Some methods may introduce other complications during ATE setup or may require more sophisticated and expensive equipment, and so may never be used. In most cases, and for large designs, a combination of the above techniques is used.

Figure 2-12 International Technology Roadmap for Semiconductors (ITRS) evolution of production of IC and future targets [24]

2.9 Potential tools and methods to assess power violation issues during scan testing

Figure 2-12 shows the challenge researchers face in testing the latest generations of integrated circuit designs. The evolution of MOSFET gate length in production-stage integrated circuits is shown as filled red circles, and the International Technology Roadmap for Semiconductors (ITRS) targets continue this trend, shown as open red circles. With gate lengths decreasing, the number of transistors per processor chip follows the opposite trend, shown by the curve of blue stars.

Power dissipation in CMOS circuits is proportional to the number of nodes switching within the device. As the obvious cause of power related issues is excessive switching, efforts are aimed at reducing switching during every phase of testing (loading, capture and unloading of the response). A localized power violation will introduce a drop in the supply voltage across a gate, with a consequent delay for the logic gate itself. A sufficiently large delay can cause one bit of a test response to be invalid, causing the part to fail the test.

HFPD (high frequency power drop) occurs when multiple cells drawing current from the same power grid segment suddenly increase their current demand. If the current cannot be provided quickly enough from other parts of the chip, power starvation results in a voltage drop. This can be caused by excessive switching during shift or capture. Once the actual test phase of a product is reached, the only available solution to power issues during testing is to disable tests (or test patterns). It is difficult to find the root cause of what exactly is happening when test simulation showed no issue; patterns therefore have to be disabled or removed from the test set, with a consequent loss of test coverage. As a result, they have to be substituted with functional tests or some other method in order to reach the required test coverage.

The main problem is the difficulty of reproducing a power related fault during simulation. Gate level simulation (GLS) can only guarantee that a design will work functionally with respect to timing, not power. Power is calculated as an average over the scan vectors used and does not pinpoint where within a design layout the power issue occurs. As power supply drops seem to be causing most of the issues, it would be possible to monitor and quantify the effects of power supply drops using on-chip instruments that precisely observe the transient circuit state. The on-die droop detection system (ODDD) comprises multiple detectors spread across a design to provide spatial observability. The detectors can be triggered to measure droop over single or multiple clock cycles, or to store the minimum/maximum droop and overshoot during a sample window [25]. ODDD can relate power drops and test failures more directly and precisely, but it offers no other function or solution, so the industry is reluctant to integrate such features into a design for obvious cost reasons.


A more economical and feasible way to evaluate IR-drops is by making use of metrics that can be monitored by ATE or ATPG equipment. These are:
 Toggle count (TC): the total amount of switching in the whole design for a test cube/pattern
 Weighted switching activity (WSA), which takes into consideration the weight of a node, reflecting its output capacitance
 Switching cycle average power (SCAP), defined as the average power consumed during the time frame of the entire switching activity
 Global toggle constraint (GTC), which limits the toggle count in a whole circuit throughout the test cycle
 Global instantaneous toggle constraint (GITC), which limits the toggle count in a whole circuit at any time instant
 Regional instantaneous toggle constraint (RITC), which limits the toggle count in each region at any time instant

Most of these metrics correlate well with IR-drop and can be used to eliminate patterns suspected of causing high IR-drop; but as a single failure may compromise a test result, the focus is on how to identify drastic local voltage variations at the cell level. Even with all the tools available, it is still very difficult to distinguish between a test failed due to a manufacturing defect and a test failed due to a time delay caused by IR-drop or other power related problems, because test simulation is not reliable enough to represent real life conditions. Extra effort is therefore required to reproduce these latter failures during simulation and to create new models that can be integrated into test simulation; this may have to be done by trying to force a failure on a perfectly functional design using the simulation tools. As no solution has been found, industry at the moment avoids the issue by relying more and more on extensive use of BIST, which reduces the total test switching activity required by exclusive use of scan testing, or by targeting BIST at areas of the design that would otherwise be in a power-marginal state during scan testing and therefore in a risky and unpredictable situation.

Summary and conclusions

Years of study and research have produced different working solutions to improve the testability of a device using scan testing techniques, eliminating or at least mitigating timing and power issues. However, the issues continue to evolve as technology scaling shrinks, and solutions are becoming harder to find.

Modern ultra-low power devices used for embedded applications in advanced, multitasking, battery operated portable electronics make low power consumption during functional operation a must. This is an obvious requirement for these products because they need to keep functional power down and extend battery life. It therefore becomes more complicated to apply scan test under harsher power constraints. Whenever a new method, algorithm or architecture is developed to solve or mitigate an issue, it is recommended to use standard benchmark circuits for testing, because they enable better sharing and comparison of research results.

In summary, hardware-based methods effectively reduce test power; however, they may increase circuit area as well as degrade circuit performance, and they may be incompatible with existing design flows. Software-based methods attempt to generate power-safe test patterns that reduce excess power during testing by modifying the traditional ATPG procedure or by modifying predetermined test sets. Unfortunately, none of the methods studied so far has produced a long lasting solution. Industry may soon be forced to adopt drastic new design methods or testing techniques, as it did when DFT was first introduced.

The outcome of the research work in this thesis will supply the means to generate large amounts of scan power data that can contribute to understanding how scan power causes real or false failures during test. Silicon data is not accessible at the moment, therefore the best possible approach is to investigate simulation data.



3 Open-source project selection, synthesis and scan insertion

Introduction

The impediments to gaining access to commercial designs that use proprietary information, and to test data collected during scan simulation and silicon testing, add extra layers of difficulty to scan test research. To overcome these difficulties it was necessary to invest time in identifying a suitable open-source design project on which to execute pattern generation for power analysis.

Currently, proprietary cores or IP must be purchased from established vendors, often at very high prices. These costs can be prohibitive for students and research purposes but, even more importantly, the terms and conditions would not allow any type of data sharing from a study. Proprietary IPs are also hard to integrate due to the multiplicity of incompatible design and test tools. The majority of the open-source designs available come with an execution flow that targets FPGAs (field programmable gate arrays) rather than ASICs (application specific integrated circuits). The FPGA design flow eliminates the complex and time-consuming floorplanning, place and route, timing analysis, and manufacturing mask stages of a project, since the synthesized design logic is “placed” onto an already tested FPGA device. The reason for targeting FPGAs is obvious: it is much easier to buy an off-the-shelf FPGA module on which to install a design (to test its functionality) than to invest a large amount of funding in a manufacturing flow to produce a product on wafer, followed by its packaging phase. FPGAs are a much better choice if the purpose of a study is to verify the functionality of a design, the main advantage being a much simpler and more predictable design cycle that allows for more straightforward bug removal, which can be done in days rather than weeks. Because this study targets the investigation of scan test, however, it was necessary either to find a project with an ASIC execution flow already set up or to develop a new flow. The latter case requires more development time and definitely brings more challenges to this work.

This chapter gives a description of the open-source project chosen for this research; it provides a high level introduction to the Synopsys synthesis flow and the standard cell libraries utilized for the design synthesis implementation. It also describes the steps taken to alter the OpenPiton flow in order to perform the scan insertion required for the scan pattern generation step (see 4.6). Note that the terms flip flop and flop are used interchangeably in the text, following modern industry practice. Also, the back-end flow refers to all post-design steps necessary to bring a design to fabrication.

Selecting an open-source project

There are a good number of online sources for open-source projects, but not many were suitable for this work. One website worth mentioning is OpenCores [26], the world's largest community focused on open-source development of hardware IPs. Designing IP cores is not as simple as writing a program: many more steps are needed to verify the designs and to ensure they can be synthesized to different FPGA architectures and to various standard cell libraries for ASIC applications. The OpenCores organization has the objective of designing and publishing core designs which are made freely available, freely usable and re-usable (under a Lesser General Public License or LGPL). It is also a very good source of design methodologies and open-source tools, with the aim of allowing large international teams to develop hardware in an open way, offering a solution to most of the problems otherwise encountered when using commercial tools and sharing proprietary IP. The major benefits are:
 The source code is available, so any developer can examine any aspect of the design
 There is no charge for using the cores, a major advantage to researchers and students.
In addition, OpenCores offers recommendations on open-source EDA tools, which might be essential to students due to funding limitations. The use of such tools is strongly suggested as it makes it easier to collaborate on open-source projects. A lot of time must be invested in setting up tools before proceeding to execution and simulation, therefore sharing tool settings is also very important. A test environment built for a commercial simulator that can only be accessed by a limited number of people would make sharing information and flows much more complicated. Some of the open-source tools suggested are the Icarus Verilog simulator, Verilator (Verilog HDL simulator) and the GHDL VHDL simulator.

Projects and designs offered by OpenCores are categorised based on their functionality and are listed at [27].

One of the System-on-Chip (SOC) projects, the asynchronous spatial division multiplexing router [28] for network-on-chip, presented a good synthesis flow. This is an on-chip communication fabric block utilized for multiprocessor SOCs, featuring a 5-port router for mesh networks as well as a reconfigurable number of virtual circuits, buffer sizes, and data widths. Most importantly, it came with a full flow for its synthesis.

The router is written in synthesizable SystemVerilog. The software requirements for its execution are:
 The open-source NanGate 45nm cell library
 Synopsys Design Compiler (synthesis)
 Cadence IUS -- NC Simulator (for SystemC/Verilog co-simulation)
It offered a directory structure and a tested execution flow for synthesis, and the only requirement to adapt it to this project was the introduction of scan insertion into the synthesis flow. The router synthesis was only a matter of setting up the Design Compiler environment and the cell library files, defining a configuration file for the router interface selection, and modifying a compile.tcl script for the design parameters. Even if a standard cell library different from the 45nm NanGate were to be used, only a few more changes to the script would be necessary.

Executing this script allows a researcher to produce the scan-inserted netlist needed as input to a pattern generation tool. However, as the number of scan cells contained in the block was very low and none of the projects on OpenCores provided a scalable design, more time was invested in identifying a more suitable project, ideally with an ASIC synthesis flow and potential scalability. The latter point is very important, as scan power issues are greatly magnified in large designs, making them more suitable for future scan power analysis studies.

Most of the other open-source websites offer similar services. Projects are continuously maintained, with updates related to functionality, design bug fixes, and simulation/emulation tool setup flow fixes. Large amounts of data relating to RTL design files (VHDL / Verilog / C / assembler / etc.), test-bench specifications and related documentation, as well as design functionality documentation, are usually available. The number of open-source projects being started and documented online is constantly increasing. A promising project was identified in OpenPiton [29], the world's first open-source, general-purpose, multithreaded, many-core processor and framework. It is based on the Princeton Piton processor, which was designed and taped out in March 2015 by the Princeton Parallel Group. OpenPiton is open-source across its hardware design as well as its firmware and software. The


best asset of this project is that the hardware can be easily synthesized to FPGA and ASIC, as well as being designed to be highly scalable and configurable, including core count. Some of its features are:
 Open-source core under the GPL (General Public License, a widely used free software license that guarantees end users the freedom to run, study, share and modify the software)
 All blocks are written in Verilog HDL
 Includes synthesis and back-end flows for ASIC and FPGA
 Capable of running the full stack multi-user Debian Linux for functional validation
 Scalable up to 1/2 billion cores

The last feature allows the building of a configuration large enough to “replicate” the scale of current large commercial designs; hence any study, research or investigation on scan testing or scan related topics will be directly comparable to its proprietary counterparts.

Academic projects have always faced difficulties in replicating commercial SOC scale in order to develop and share knowledge; the need for an open architecture framework for simulation, synthesis, scalability, and configurability, alongside an established flow for verification tools, was the goal of the OpenPiton project. The project is stable and mature; multiple implementations of it have been created, including a taped-out 25-core implementation synthesized in IBM's 32nm process. OpenPiton leverages the well-known OpenSPARC T1 from Oracle for its core block [30]. In addition, it provides synthesis and back-end scripts for ASIC and FPGA to enable other researchers to bring their designs to implementation. It provides a complete verification infrastructure and is supported by mature software tools.

The design is implemented in industry standard Verilog HDL and does not require the use of any new languages. An explicit design goal of OpenPiton is that it should be easy for other researchers to use. To support this, OpenPiton provides a high degree of integration and configurability. Unlike many other designs, in which the pieces are provided but it is up to the user to compose them together, OpenPiton is designed with all of the components integrated into the same easy-to-use build infrastructure, providing push-button scalability [31]. As the synthesis step is already set up, OpenPiton turned out to be the perfect design choice for this research. A simple alteration to the synthesis flow allowed scan insertion in order to produce a netlist usable for ATPG (the next step in this research project).

Generic synthesis and scan insertion flow using Synopsys DC

The process of converting an HDL (hardware description language) description and combining multiple IP blocks into a hardware design targeting a specific technology node is called "synthesis". As the OpenPiton project already comes with a synthesis flow to be executed using Synopsys DC (Design Compiler [32]), there was no requirement to investigate and select the most appropriate tool for the job. Synthesis is a complex task consisting of many phases and requires various inputs in order to produce a functionally correct netlist. Synthesis with DC includes the following main tasks (Figure 3-1):
 reading in the design
 setting constraints
 optimizing the design
 analysing the results
 saving the design database

The first task in synthesis is to read the design into Design Compiler memory. Reading in an HDL design description consists of two tasks: analysing and elaborating the design. The analysis command (analyze) reads the HDL source and checks it for syntax errors; it creates HDL library objects in an HDL-independent intermediate format and saves these files into a specified directory. The elaboration phase translates the design into a generic, technology-independent design (GTECH) from the intermediate files produced during the analysis phase. GTECH generic technologies are just symbolic placeholders for the basic building block models (standard cells, i.e. OR and AND gates) and contain no timing, power, or other realistic information. This intermediate translation is temporary and is used to facilitate DC's replacement of HDL arithmetic operators in the code with DesignWare components. DesignWare is a collection of reusable intellectual property blocks that are tightly integrated into the Synopsys DC synthesis environment. Use of these blocks allows transparent, high-level optimization of performance during synthesis. The wide availability of IP components enables design reuse and improves logic performance through optimization. The DesignWare library provides real implementations of complicated functions, such as multipliers, dividers and other logical/arithmetic operators, and is technology independent.
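The read phase described above can be sketched in dc_shell Tcl as follows. This is a minimal illustration only: the design name, file list and work directory are hypothetical and are not taken from the OpenPiton scripts.

```tcl
# Define where the intermediate (analysed) files are stored
define_design_lib WORK -path ./work

# Analyse the RTL sources: read the Verilog and check syntax
analyze -format verilog -library WORK {dynamic_node.v router_ctrl.v}

# Elaborate: build the technology-independent (GTECH) representation
elaborate dynamic_node -library WORK

# Make the elaborated design current and resolve references (link)
current_design dynamic_node
link
```

The elaborate step implicitly runs link as well, as noted below; invoking link explicitly after changing libraries or the current design is a common defensive habit.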


Figure 3-1 Basic Synthesis Flow [32]

The ‘elaborate’ command also automatically executes the link command, which performs a name-based resolution of design references against the technology node library provided for the current design. The purpose of this command is to locate all of the library components referenced in the current design and connect (link) them to it.

As the whole process is highly automated, the way to instruct the tool to execute a particular task in a certain way is with the use of design constraints. Constraints are the instructions that the designer gives to Synopsys DC to define what the synthesis tool can or cannot do with the design or how the tool should behave in particular scenarios.


Design constraints can be divided into two categories: implicit and explicit constraints.

1. Design rule constraints are usually implicit constraints defined within the technology library specified in the setup. The technology library specifies all design rules for that library, and these rules cannot be overridden or discarded. Design rule constraints include the definition of the maximum transition time (the longest time allowed for a driving pin of a net to change its logic value), the maximum fanout for a driving pin, the maximum (and minimum) capacitive load that an output pin can drive, and cell degradation. The last of these is the maximum capacitance that can be driven by a cell as a function of the transition time at the inputs of the cell.
2. Optimization constraints are explicit constraints (set by the designer). They describe the design goals set by the user and act as instructions to DC on how to execute the synthesis. The optimization constraints comprise:
 Timing and maximum area constraints, including the system clock definition and clock delays, which are fundamental, as the clock signal is the synchronization signal that controls the operation of the entire design (as a system). The clock signal indirectly defines the timing requirements for all paths in the design, because most of the other timing constraints are related to the clock signal.
 Input and output delay constraints, used to define and model the path delay from external inputs to the first registers in the design. Output delays constrain the path from the last register to the outputs of the design.
 Minimum and maximum path delays, used to define path delays individually by setting specific timing constraints on those paths.
 Input transition and output load capacitance constraints, used to define the limits of the input slew rate and the output capacitance on input and output pins.
 False path definitions, which identify paths that cannot propagate a signal for any possible combination of input signals.
 Multi-cycle paths, defined as exceptions to the default single cycle timing requirement of regular paths, used when a signal requires more than one cycle to propagate from the path start-point to the path end-point.
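As an illustration of the optimization constraints above, the following dc_shell Tcl fragment shows the typical commands involved. The port names, clock period and delay values are invented for the example and are not taken from the OpenPiton constraint scripts.

```tcl
# System clock: a 500 MHz clock on port clk, with some uncertainty margin
create_clock -name clk -period 2.0 [get_ports clk]
set_clock_uncertainty 0.1 [get_clocks clk]

# Input/output delays relative to the clock (all ports except the clock)
set_input_delay  0.5 -clock clk \
    [remove_from_collection [all_inputs] [get_ports clk]]
set_output_delay 0.5 -clock clk [all_outputs]

# Environment limits: input slew rate and output load capacitance
set_input_transition 0.2 [all_inputs]
set_load 0.05 [all_outputs]

# Timing exceptions: a false path from reset, a 2-cycle path example
set_false_path -from [get_ports rst_n]
set_multicycle_path 2 -from [get_pins cfg_reg*/CK]

# Explicit area goal: 0 asks DC to minimize area as far as possible
set_max_area 0
```

These commands follow the industry-standard SDC (Synopsys Design Constraints) syntax, so the same constraints can generally be reused by downstream static timing analysis and place-and-route tools.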


Synopsys DC will try to meet all the requirements defined by both the design rule and optimization constraints. Due to the implicit nature of design rule constraints and the impossibility of overriding them, DC always prioritises design rule constraints over optimization constraints and will, if required, violate the latter in order to avoid violating design rule constraints.

In addition to the design constraints, it is necessary to define the environment in which the design is supposed to operate. The operating conditions specify the variations in process, voltage, and temperature (PVT) ranges over which a product is expected to perform. These specifications are taken into account when the tool links to the technology library, and cell and wire delays are scaled according to these conditions. The environment setting also includes the definition of the technology-specific library to be used for the synthesis.

The above constraints and environment variables are used by Synopsys DC to perform its optimization steps. The optimization step translates the HDL description into a gate-level netlist using the cells available in the technology library; this is performed in a number of phases, each of which uses different optimization techniques according to the specified design constraints.
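In a dc_shell script, the optimization phase typically reduces to a single compile command once the constraints are in place. This is a hedged sketch: the -scan option is shown because this flow targets scan insertion, but the exact options used by the OpenPiton scripts may differ.

```tcl
# Sanity-check the design and constraints before optimizing
check_design
check_timing

# Map and optimize in one step; -scan replaces ordinary flip-flops
# with scan-equivalent cells so a later insert_dft step can stitch
# the scan chains without disturbing the optimized logic
compile_ultra -scan
```

Using -scan during compile means the area and timing cost of the scan cells is accounted for during optimization, rather than being added afterwards.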

Optimizations performed during synthesis are done on three levels: architectural, logic-level, and gate-level.

With regard to the architectural level, optimization is performed on the high-level constructs of the HDL description; it restructures arithmetic expressions according to the constraints to improve the implementation of the design and minimize area or timing. It also performs resource sharing, which tries to reduce the amount of hardware by sharing hardware resources between multiple operators in the HDL description. This eliminates the need for separate hardware components in the final circuitry performing the same operation. Optimization also takes place when selecting a DesignWare implementation of a particular resource, as Synopsys DC considers all available implementations and makes its selection according to the defined design constraints (at this stage the design is still represented by GTECH library parts, a technology-independent netlist).


The logic-level optimizations are performed on the technology-independent netlist (GTECH is still used) and consist of two processes:
 structuring optimization, which tries to identify and share common low level Boolean algebra equations in the netlist
 flattening optimization, which tries to convert logic into a sum-of-products representation; this produces fast logic by minimizing the levels of logic between the inputs and outputs, at the expense of an area increase.

The third type of optimization is gate-level optimization, which takes the technology-independent netlist and maps it to the library cells to produce a technology-specific gate-level netlist, using precisely the cells in the libraries specified by the target_library variable. Once the mapping phase is complete, the tool performs a delay optimization to fix the timing violations introduced by the mapping. Other gate-level optimizations are design rule fixes, which remove design rule violations by inserting buffers or resizing existing cells. Area optimization is performed as a last task, and during this task the breaking of design rules or timing constraints is not allowed.

All the optimizations performed by Synopsys DC are directly dependent on the constraint sets, therefore setting proper constraints is a fundamental step in the synthesis flow. Once the synthesis has completed, the results need to be analysed to verify that the design meets the goals set by the user and specified by the design constraints. There are basically two types of analysis methods and tools:
 Report commands for design object properties, which generate textual reports for various design objects such as timing, cells, clocks, ports, buses, pins, nets, hierarchy, resources, constraints in the design, and more.
 Graphical analysis, examining the design schematic to explore the design structure, visualize critical and timing paths in the design, generate histograms for various metrics, and more.
The last task in the synthesis flow is to save the synthesised design. The main output will be a Verilog gate-level netlist file.
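A typical dc_shell epilogue covering the report and save tasks might look as follows. The report and output file names are illustrative only; the OpenPiton flow writes to its own results and reports directories.

```tcl
# Textual reports: timing, area, constraint violations, quality of results
report_timing     > reports/dynamic_node.timing.rpt
report_area       > reports/dynamic_node.area.rpt
report_constraint -all_violators > reports/dynamic_node.violations.rpt
report_qor        > reports/dynamic_node.qor.rpt

# Normalize names for Verilog output, then save the mapped design
change_names -rules verilog -hierarchy
write -format verilog -hierarchy -output results/dynamic_node.mapped.v
write -format ddc     -hierarchy -output results/dynamic_node.mapped.ddc
```

The .ddc database preserves constraints and mapping information for later DC sessions, while the Verilog netlist is the handoff to gate level simulation and ATPG.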


Standard cell libraries

In order to synthesize a design, the synthesis flow requires a user to set up four library variables during the environment setup task:
 Target library: defines the technology library that Synopsys DC uses to build the circuit. During the technology mapping phase, DC selects components from the library specified by the target library variable to build the gate-level netlist.
 Synthetic library: specifies the synthetic or DesignWare libraries. These are technology-independent, microarchitecture-level design libraries providing implementations for various IP blocks.
 Link library: used to resolve design references, as Synopsys DC must connect all the library components and designs it references. This step is called linking the design or resolving references; in most cases the link library is the same as the target library.
 Symbol library: defines the schematic symbols for components in the technology library, which are needed for drawing design schematics.
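These four variables are plain Tcl variables in the dc_shell setup file. A minimal sketch follows; the NanGate .db/.sdb file names are indicative of the 45nm Open Cell Library delivery but depend on the downloaded release, while dw_foundation.sldb is the standard DesignWare foundation library name.

```tcl
# Technology library used for mapping (typical corner assumed here)
set target_library "NangateOpenCellLibrary_typical.db"

# Link library: "*" means "also search the designs in memory",
# followed by the mapped cell library itself
set link_library   "* $target_library"

# DesignWare synthetic library for arithmetic operator implementations
set synthetic_library "dw_foundation.sldb"

# Symbol library for schematic generation
set symbol_library "NangateOpenCellLibrary.sdb"
```

Setting these before analyze/elaborate ensures that linking and technology mapping resolve against the intended libraries.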

As no library is delivered with the OpenPiton project, it was necessary to identify an open-source, complete and suitable set of technology libraries to use for the design synthesis implementation. Because most scan power related issues arise as cell geometries shrink, it was more appropriate to select a technology library at or below the 45nm node, although this was not essential, as the goal of this work is to set up and provide guidance for the entire scan power analysis flow.

Although not necessary, the opportunity to use a small technology node was offered by NanGate [33], a library optimization company whose primary focus is providing EDA tools targeting the standard cell library development process. The company is at the leading edge of standard cell design, developing 20nm and 14nm processes and starting to support 10nm. In the research community, NanGate has become the best option for Open Cell Libraries (OCLs). Due to the increasing complexity of technology nodes, NanGate has developed and made available standard cell libraries that use NCSU FreePDK (the North Carolina State University free process design kit), an open-source, non-manufacturable process that takes into account the challenges of advanced geometry nodes while using only information available in the research community [33].


NanGate currently offers two Open Cell Libraries:
 the NanGate 45nm Open Cell Library, which was donated to the research community; it is no longer developed and has been finalized
 the NanGate 15nm Open Cell Library, which is currently being maintained and updated; it is based on the NCSU FreePDK15 process kit.

As this project's goal is only to establish a flow, there was no difference in choosing one technology node library over the other. A smaller technology node would magnify scan power issues, making the 15nm cell library more suitable, but as it is not yet fully developed the 45nm library was chosen. This library was specifically designed for the purposes of researching, testing, and exploring EDA flows. The 45nm Open Cell Library contains the following views:
 Liberty (.lib) formatted libraries with CCS (composite current source) timing, ECSM (effective current source model) timing and NLDM/NLPM (non-linear delay model / non-linear power model) data (slow, typical, fast, low temperature and worst low temperature corners). The CCS and ECSM formats provide more accurate delay information than NLDM, so it is advisable to use CCS and ECSM whenever possible.
 Geometric library in Library Exchange Format (LEF)
 Simulation libraries in Verilog and Spice (pre and post parasitic extraction netlists)
 Cell layouts in GDSII
 Schematics in EDIF and PNG formats

The 45nm NanGate library set provides over 62 different functions, ranging from buffers to scan flip-flops with set and reset, including specialized low power cells, and comprises over 170 different standard cells. The goal of the NanGate 45nm cell library development was to provide the industry and research community with a standard cell library that represents the challenges of state-of-the-art process nodes, allowing it to reflect the difficulties encountered by EDA tools and design implementation flows. The library was not created to be an optimized library for real-life applications. It is important to note that benchmarking this library against any other library makes no sense, as it is built on a non-manufacturable process.

Case study: synthesis and scan insertion on the Dynamic_node of OpenPiton

After the general overview of the Synopsys DC synthesis flow, this subsection describes the steps needed to implement synthesis and scan insertion on one of the OpenPiton sub-modules, specifically the Dynamic_node (see the available modules in Table 1). The process can be replicated on bigger modules or on a multi-core OpenPiton design with minor adjustments. The implementation of a larger multicore design was avoided, as it would cause unnecessary overheads and much larger machine execution time, as well as a lot of debugging not relevant to this work.

Table 1 OpenPiton synthesis and back-end flow supported modules

Module Name   | Description                                      | Purpose
ffu           | OpenSPARC T1 core floating-point front-end unit  | Small module with one SRAM macro
sparc         | OpenSPARC T1 core                                | Large module with many SRAM macros
dynamic_node  | OpenPiton on-chip network router                 | Small module with no IP macros
tile          | OpenPiton tile                                   | Large, hierarchical module with many SRAM macros

OpenPiton was designed to be a useful tool for both researchers and engineers in the field when exploring and designing future many-core processors. Unfortunately it was not designed to explore scan testing and therefore does not include scan insertion capabilities. This work addresses that gap and provides the basis for the steps necessary to scan insert the design and output a netlist that can be processed for scan pattern generation. Without this additional work, OpenPiton, like any other open-source design, would be of no value for scan power analysis.
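For orientation, scan insertion with DFT Compiler inside a DC synthesis script generally follows the command sequence sketched below. This is a hedged illustration of the standard flow, not the exact OpenPiton patch; the port names, chain count and output file names are invented.

```tcl
# Declare existing test-related signals (port names are hypothetical)
set_dft_signal -view existing_dft -type ScanClock -port clk -timing {45 55}
set_dft_signal -view existing_dft -type Reset -port rst_n -active_state 0
set_dft_signal -view spec -type ScanEnable -port scan_en

# Uncompressed scan: simply request a fixed number of chains
set_scan_configuration -chain_count 4

# Build the test protocol, run testability rule checks, stitch the chains
create_test_protocol
dft_drc
insert_dft

# Write out the scan-inserted netlist and the protocol file for ATPG
write -format verilog -hierarchy -output results/dynamic_node.scan.v
write_test_protocol -output results/dynamic_node.spf
```

The protocol (.spf) file captures how the scan chains are operated and is consumed by the ATPG tool alongside the scan-inserted netlist.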

Figure 3-2 OpenPiton high level directory structure [32]

Access to the OpenPiton project doesn't require any particular request or permission; it is just a matter of downloading the compressed project source files from the OpenPiton webpage [27]. The downloaded source contains multiple subfolders (some related to FPGA), which include a 'docs' folder containing guides and manuals and a 'piton' folder containing the project source code as well as the tool flows for synthesis and validation. The directory structure is made up of the following folders: 'piton', 'design', 'chip', 'tools', 'calibre', 'synopsys'. The only folder relevant to synthesis is the 'piton/' directory. All of the scripts for the synthesis and back-end flow are located in 'piton/', which contains two relevant sub-directories: 'design/' (Figure 3-3) and 'tools/'. The first of the two directories contains all of the synthesizable Verilog RTL design files; as each subfolder contains a multitude of folders, only the ones used in this work will be mentioned.

Figure 3-3 OpenPiton directory structure and dynamic-node module content

Within the design/ directory, the only directory pertinent to the synthesis and back-end scripts is 'chip/', as it contains the Verilog design files for an OpenPiton chip and its sub-blocks. Within 'chip/', the directory structure follows the Verilog module design hierarchy. The synthesis and back-end scripts specific to a module, such as constraints scripts, are located in a directory named 'synopsys/' (Figure 3-4). An example is the 'synopsys/' directory structure for the dynamic_node sub-block ('piton/design/chip/tile/dynamic_node/rtl/'), which contains a results and a reports folder used by the OpenPiton synthesis flow to output the generated netlist and the various synthesis report files.

Figure 3-4 Synopsys directory structure in OpenPiton

The tools/ directory (Figure 3-5) within the piton/ folder contains all of the scripts and tools used in OpenPiton, including the synthesis and back-end scripts that are common to all modules and that drive the synthesis flow. The module synthesis and back-end scripts are split into two subdirectories: calibre/ for DRC and LVS scripts that use Mentor Graphics Calibre (not used in this work) and synopsys/ for all other tasks that make use of Synopsys tools.

The two relevant directories for the synthesis work are ‘tools/synopsys/’ for the initial Synopsys Reference Methodology patching (see 3.5.1) and ‘tools/bin/’ from where the module synthesis will be launched.

Figure 3-5 tools directory content in OpenPiton


3.5.1 Design Compiler reference methodology

The OpenPiton synthesis and back-end flow is based on the Synopsys Reference Methodology (RM). Because of IP issues, the OpenPiton synthesis scripts have been released as a patch to the Synopsys RM. Users need a Synopsys Solvnet account to access the Synopsys RM generation tool in order to make use of the OpenPiton synthesis flow (see Appendix BA for the Synopsys RM generation link and steps).

The Design Compiler Reference Methodology (DC-RM) provides a set of reference scripts that serve as a guideline for developing the synthesis scripts. These reference scripts are not designed to run in their current form and need to be adapted to the current design environment [34]. This patching is fully automated in the OpenPiton project. The DC-RM includes options for running DFT Compiler, Synopsys DFTMAX scan compression, and Power Compiler optimization. As additional licenses are required when running DFTMAX for scan compression, this flow was intentionally set up to include scan insertion with uncompressed chains. The RMgen tool only requires a few simple steps to set up and generate the methodology scripts based on the DC version used for the synthesis implementation. The OpenPiton synthesis flow supports patching from Synopsys DC-RM version I-2013.12-SP2 with the following settings:

 RTL Source Format: VERILOG4
 QoR Strategy: DEFAULT
 Physical Guidance: TRUE
 Hierarchical Flow: TRUE
 MCMM Flow: FALSE
 Multi-Voltage UPF: FALSE
 Clock Gating: TRUE
 Leakage Power: TRUE
 DFT Synthesis: FALSE
 Lynx Compatible: FALSE
 Static Timing Analysis


The project expects the value for DFT Synthesis to be set to FALSE. As this work aims to build a basic uncompressed scan test infrastructure, this variable needs to be changed to TRUE. The Synopsys RMgen will output multiple files:

DC-RM_I-2013.12-SP2
|-- DC-RMsettings.txt
|-- README.DC-RM.txt
|-- Release_Notes.DC-RM.txt
|-- rm_dc_scripts
|   |-- dc.dft_autofix_config.tcl (only generated with DFT enabled)
|   |-- dc.tcl (changed; see Appendix C)
|   |-- dc_top.tcl (changed; see Appendix C)
|   |-- fm.tcl (no difference with DFT synthesis set to TRUE)
|   `-- fm_top.tcl (no difference with DFT synthesis set to TRUE)
`-- rm_setup
    |-- common_setup.tcl (no difference with DFT synthesis set to TRUE)
    |-- dc_setup.tcl (no difference with DFT synthesis set to TRUE)
    `-- dc_setup_filenames.tcl (changed; see Figure 3-6)

When the DFT synthesis variable is set to TRUE, the scripts output by RMgen will include a new file named dc.dft_autofix_config.tcl. This script contains auto-fix capabilities that allow DFT Compiler to insert logic to fix scan rule violations associated with clocks, asynchronous set signals, and asynchronous reset signals (asynchronous signals are usually avoided in digital applications). The content of the script is disabled by default, but when enabled it facilitates the specification of the externally controllable design signals needed for scan (clocks, set/reset and the scan-enable needed to switch the design between functional mode and test mode). The remaining files generated by RMgen are the same in name and number as when the DFT synthesis option is set to FALSE but differ in content. The dc_setup_filenames.tcl script content is adapted to scan and includes a number of input, output and DFT entry settings, as seen in Figure 3-6.

Other files that are affected by the DFT synthesis variable change are the dc.tcl and dc_top.tcl files. The first is the block-level RTL exploration and synthesis script in a hierarchical flow, while the latter is the top-level integration script for RTL exploration and synthesis. In this example the synthesis implementation will be done on the Dynamic_node, which is the top level, and only dc_top.tcl will be taken into consideration.


When DFT synthesis is set to TRUE, a large number of extra lines are integrated into both scripts. These include various sections of scan DFT commands, as well as suggestions with sample commands which need to be modified according to the current design (see Appendix C for the full list of commands introduced when enabling DFT synthesis during RMgen).

Figure 3-6 Changes to dc_setup_filenames.tcl when enabling DFT during the RMgen process

This is a brief list of the sections introduced by enabling DFT for RMgen:
 Writing out the updated DC blocks after compile_ultra
 DFT Compiler Optimization Section
 Verilog Libraries for Test
 DFT Signal Type Definitions
   o set_dft_signal -view spec -type ScanDataOut -port SO
   o set_dft_signal -view spec -type ScanDataIn -port SI
   o set_dft_signal -view spec -type ScanEnable -port SCAN_ENABLE
   o set_dft_signal -view existing_dft -type ScanClock -port [list CLK] -timing {45 55}
   o set_dft_signal -view existing_dft -type Reset -port RESET -active 0
 DFT Configuration
 DFT AutoFix Configuration
 DFTMAX Compression Configuration
 DFT Insertion

3.5.2 Alteration of the synthesis flow to add scan insertion to the design

The dynamic_node_top_wrap.dft_signal_defs.tcl script must be modified to allow the insertion of uncompressed scan chains and hence the generation of a scan-inserted netlist. It can be found at the following path within the OpenPiton directory structure (Figure 3-5): /piton/design/chip/tile/dynamic_node/synopsys/script

As this work only aims at setting up a basic flow, the scan insertion will use a basic configuration with no compression IP, 5 balanced scan chains, a clock, a reset signal and a scan-enable. The Dynamic_node module contains 1980 flip-flops that will all be converted to scan flops, so it is possible to define 5 scan chains of 396 scan flops each. The following is the minimum set of settings that can produce a working scan-inserted netlist:

1. Clocks: define the scan clock(s) and its timing period
   set_dft_signal -view existing_dft -type ScanClock -timing [list 45 55] -port clk
2. Resets: define the memory elements' reset signal(s)
   set_dft_signal -view existing_dft -type Reset -active_state 0 -port reset_in
3. Shift enable: define a scan control signal that selects the scan mode (shift or capture)
   set_dft_signal -view spec -type ScanEnable -usage scan -port validIn_E
4. Masking violations: can be used to waive DRC rules not relevant to the current design implementation
   ##set_dft_drc_rules -allow {i.e. TEST-504 TEST-505}
5. Remove flops from the scan chains that should not be on a chain, such as the ones in the compression IP when present, i.e.
   ##set_scan_element false i_dfx/*
   ##set_scan_element false i_dfx_scan/*
6. Hook-up definitions and chain stitching: chains can be defined with a simple command line which indicates the chain name (i.e. chain0), the scan_data_in (i.e. yummyIn_N), the scan_data_out (yummyOut_N) and the length, which can be defined exactly or left to the tool to select the most appropriate configuration (-exact_length 396). Some examples are given below:
   set_scan_path chain0 -scan_data_in yummyIn_N -scan_data_out yummyOut_N -exact_length 396
   set_scan_path chain1 -scan_data_in yummyIn_E -scan_data_out yummyOut_E -exact_length 396
   set_scan_path chain2 -scan_data_in yummyIn_S -scan_data_out yummyOut_S -exact_length 396
   set_scan_path chain3 -scan_data_in yummyIn_W -scan_data_out yummyOut_W -exact_length 396
   set_scan_path chain4 -scan_data_in yummyIn_P -scan_data_out yummyOut_P -exact_length 396
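The flop-count arithmetic behind this configuration (1980 scan flops split into 5 balanced chains of 396) can be sketched in a few lines. The helper below is purely illustrative and is not part of any EDA tool; the flop and chain counts come from the Dynamic_node example above.

```python
# Illustrative sketch: split a design's scan flops into balanced chains.
def balanced_chains(num_flops: int, num_chains: int) -> list[int]:
    """Distribute num_flops across num_chains as evenly as possible."""
    base, extra = divmod(num_flops, num_chains)
    return [base + 1 if i < extra else base for i in range(num_chains)]

lengths = balanced_chains(1980, 5)
print(lengths)  # -> [396, 396, 396, 396, 396]
assert sum(lengths) == 1980
```

When the flop count does not divide evenly, the same helper shows why the `-exact_length` value must be chosen with care: the tool can only balance chains to within one flop of each other.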

In reality, the number of uncompressed chains will be dictated by the number of available design primary input and output channels that can be dedicated to scan. It is best practice to use as many primary inputs/outputs as possible, as this increases the number of chains that can be synthesized, shortening their length and as a consequence reducing test time. There is no need for such considerations in this case; on the contrary, it is best to keep the chains as long as possible, since power issues are magnified for very long uncompressed chains, where the scan cell toggle percentage is higher.

3.5.3 Synthesis implementation for Dynamic_node block of OpenPiton

Almost everything required for executing the synthesis flow for any OpenPiton module, including the infrastructure, is provided by the OpenPiton project itself. It is all integrated into the directory structure and environment, making it relatively easy to execute synthesis and allowing users to keep the focus on tool settings (in this case scan insertion) and research goals. An easy step-by-step guide is provided with the download of the project source code [35].

Table 2 OpenPiton Synthesis and Back-end Flow Run available commands

Command     Flow Step                                Tool                               Checking Script
rsyn        Synthesis                                Synopsys Design Compiler           csyn
rsta        Static Timing Analysis                   Synopsys PrimeTime                 csta
rrvs        RTL vs Schematic Equivalence Checking    Synopsys Formality                 crvs
rpar        Place and Route                          Synopsys IC Compiler               cpar
reco        Run ECO                                  Synopsys IC Compiler               cpar
merge_gds   Merge GDSII Designs                      Synopsys Workbench Edit/View Plus  cmerge_gds
rdrc        Design Rule Checking                     Mentor Graphics Calibre            cdrc
rlvs        Layout vs. Schematic Checking            Mentor Graphics Calibre            clvs
rftf        Full tool flow                           All of the above                   N/A


The synthesis implementation flow only requires adjusting the environment setup by modifying a provided script (piton/piton_settings.bash) located within the parent /piton directory (see Table 2). Most of the variables are already set and don't need any adjustment, but some, such as $PITON_ROOT, will need to be set according to the user's needs. Once the piton_settings.bash script is adjusted and executed, the next stage is the patching of the Synopsys RM into the OpenPiton flow. This requires 3 simple steps:
 Generate and download the Synopsys RM (see 3.5.1)
 Execute the /piton/tools/bin/synrm_patch script with the -dc_rm_path= switch
 Check that synrm_patch executed and completed successfully
At this point the OpenPiton synthesis flow should be ready for execution (synrm output example in Figure 3-7).

Figure 3-7 Synopsys RM to OpenPiton patching: synrm_patch successful output example
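As a concrete illustration, the three steps above might look like the following shell session. All paths here are placeholders only ($PITON_ROOT and the RM download location depend on the user's environment), so treat this as a configuration sketch rather than a verbatim recipe.

```shell
# Hypothetical paths -- adjust to your own checkout and RM download location
export PITON_ROOT="$HOME/openpiton"
source "$PITON_ROOT/piton/piton_settings.bash"

# Step 2: patch the downloaded Synopsys RM into the OpenPiton flow
"$PITON_ROOT/piton/tools/bin/synrm_patch" -dc_rm_path="$HOME/DC-RM_I-2013.12-SP2"

# Step 3: inspect the script output to confirm the patch completed
# successfully (compare against Figure 3-7)
```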

The OpenPiton synthesis scripts make it easy to port the flow to a specific process technology, as doing so only requires changes to the following files:
 ${PITON_ROOT}/piton/tools/synopsys/script/common/env_setup.tcl
 ${PITON_ROOT}/piton/tools/synopsys/script/common/process_setup.tcl
The env_setup.tcl script should be modified so that its environment variables point to the directories for standard cell libraries and process design kits (PDK). The process_setup.tcl script is the main location where changes are needed to port the Synopsys portion of the flow to a new process. Setting up these two files will allow the synthesis flow to run for modules without SRAMs, as in the case of the Dynamic_node. Modules containing SRAMs will additionally require memory macro models.


To process larger OpenPiton modules, additional modifications are needed which are not required just to get the flow running. These changes have to be made to the following script: /piton/tools/synopsys/script/common/design_setup.tcl. The only other changes applied were not to the existing flow but to the synthesis scripts, in order to implement scan insertion (see 3.5.2). Once the above steps have been carried out and the flow modified to use the desired process technology (in this case NanGate 45nm), the synthesis of a module can be executed in one very simple step using the following command: rsyn (for this work, rsyn dynamic_node was performed). Successful execution of the command will output a number of files in the following results directory: $PITON_ROOT/piton/design/chip/tile/dynamic_node/synopsys. The files generated are the following:

dynamic_node_top_wrap.elab.ddc
dynamic_node_top_wrap.initial.fp
dynamic_node_top_wrap.compile_ultra.ddc
dynamic_node_top_wrap.mapped.v
dynamic_node_top_wrap.mapped.ddc
dynamic_node_top_wrap.mapped.svf
dynamic_node_top_wrap.mapped.fp
dynamic_node_top_wrap.mapped.spef
dynamic_node_top_wrap.mapped.SDF
dynamic_node_top_wrap.mapped.sdc

The main files needed as input to ATPG tools are the scan-inserted netlist and, for transition pattern generation, the '.sdc' (Synopsys Design Constraints) file. This second file contains cell delay information as well as false and multi-cycle path definitions. When the '.sdc' file is loaded into an ATPG tool, it allows the elimination of patterns targeting false paths. Most of the remaining output files are useful for synthesis optimization using Synopsys tools, including the SDF (standard delay format) file, which is also utilized for the cell delay definitions needed for gate level simulation (see 5.3).

Table 2 shows the complete suite of commands available when adopting OpenPiton as the base for a research project. The purpose in this case was only to generate a netlist and there was no other requirement. The project supplies commands to run each individual step of the flow, as well as a single command to execute the whole flow (sequential execution). It also provides checking scripts that can be used to verify that any of the steps that have been started have executed correctly.

Conclusion

The extra time and effort put into the selection of a suitable design to use in this project was justified by the discovery of OpenPiton. As OpenPiton was designed for researchers, it was very easy to work with, and one of its main strengths is its scalability, which will make it perfect for future work. The OpenPiton project was of great help through its complete synthesis flow and directory structure. In this case only a small Verilog module with a 45nm library technology node was used to set up the flow, but with a few simple steps the flow can be altered to synthesize a multicore design with a standard cell library much closer to the current cutting-edge technology node. This would allow a researcher to mimic billion-gate designs with the latest standard cell technologies, which are more prone to scan power issues.

The only downside in selecting the OpenPiton project was the lack of DFT content, but the issue was easily addressed thanks to the highly automated capabilities of modern EDA tools (Synopsys DC).



4 ATPG environment setup and execution

Introduction

IC design is a very big business, and supporting it there are a large number of electronic design automation (EDA) companies. The three major players in the field of ATPG at the time of writing are Mentor Graphics with Tessent FastScan and TestKompress, Synopsys with TetraMax ATPG, and Cadence Design Systems with Encounter Test ATPG. The full list of EDA companies includes well over 30 entities, and they offer different software packages with a large variety of options for technology nodes, specific features that can be used during the implementation of Design For Test (DFT), and price range.

In this project, the ATPG tool chosen for pattern generation is Mentor Graphics Tessent. The choice was made for two reasons: Tessent is currently the industry standard for silicon design companies due to the compression algorithm used by its EDT IP, and it is one of the 2 options available to third-level institutes under the Europractice license (the 2nd being TetraMax). Tessent offers a very large set of scan tools which support small technology nodes as well as low power, cell-aware fault models and more. In general, the name given to ATPG tools is misleading because they have a very extensive use in design and silicon sign-off. They are not only used to generate patterns but are also invaluable for silicon fail debug. For this project flow, the goal is to use the tool to set up the ATPG environment, generate patterns and ultimately generate the test-benches required for the execution of GLS simulation. ATPG tools are capable of outputting a very large number of reports which are used by engineers for statistical study and design debug (for test coverage, DFT structures, reset and power usage during test).

This chapter is meant to be a simple guide for setting up an ATPG environment from scratch, and it also describes the minimum set of commands necessary to execute pattern generation. In addition, it includes brief descriptions of other uses of an ATPG tool during a design life cycle. The aim is to demonstrate the important points of ATPG without getting lost in too many details. It also points out the best known methods for implementing this phase of the design flow. Pattern expansion (from compressed patterns) and the ATPG diagnosis tool are also briefly described.


Scan configuration options

Scan configuration is decided at a very early stage in the life of a product. Scan insertion (as well as other types of DFT structures, i.e. Built-In Self Test or BIST) may be the last step taken during synthesis, but it is essential to the business. Deciding what type of scan test to use on a device depends on two main aspects of the design: its size and its field of use. If the design is used in a very sensitive field (i.e. medical, military, automotive, space, others) it may require a built-in diagnostic test that executes automatically at each device boot-up. In this case, a pure scan test cannot be applied and a more hybrid structure will be put in place. Scan test will always test the largest percentage of the silicon, while BIST or other types of self-diagnostic tests will take over the power-on section and other more sensitive areas. If the design is aimed at more common commercial use, scan test will be the protagonist.

At this point it only remains to decide how to set up scan in order to reduce cost, as test features increase the silicon area of the product with no value added for the customer. As scan flops are larger than conventional flops, a decision must be taken on the scan insertion ratio. 100% scan insertion is the ideal scenario, but in some cases this value is reduced in order to meet the area requirements needed to fit the silicon into the package configuration requested by the end customer. Pin count can also constrain the scan configuration and package choice. Other points to be decided at an early stage are the test time budget as well as memory availability on the tester platforms (test memory is very expensive). Usually for large designs, pattern generation and simulation run time will also need to be taken into consideration. If the execution run time for pattern generation and simulation runs into weeks, the number of iterations that can be executed to find and remove bugs is going to be limited.
A bug found at a very late point in the design flow may require a design change and, as a consequence, a new 're-spin' of pattern generation and simulation. A lengthy execution may cause missed deadlines and force engineers to make risk calls and proceed to the next stage of the flow with only partial validation of the pattern set (GLS), missing newly introduced bugs. Another aspect to be taken into consideration is power limits; it is well known that scan power during test may damage devices.

The number of primary input/output pins available is one of the biggest constraints on the number of scan chains that can be set up and used for the test. Package choice (hence pin count) is always driven by customers and business needs and is rarely challenged because of test. The limitation on the number of input pins dedicated to scan is the main reason why the large majority of the industry is now using compressed patterns, as they make scan cheaper, faster and possible to apply. Embedded deterministic test (EDT) module Intellectual Property (IP) blocks are the engines of today's scan, and scan pins/channels are now only used to interface to the EDT IP.

The EDT module is made of 3 main components: a decompressor, a set of internal chains (of possibly equal length) and a compactor module. The EDT block can be configured to operate at different compression ratios, within limits dictated by the number of input/output primary channels and the number of scan flops. The advantage of an EDT module and compressed patterns is obvious. As an example, consider a small design with no EDT IP, 30K scan flops, and 3 primary inputs and 3 primary outputs available to scan: such a scenario requires 3 scan chains of 10K flops in length, hence each scan test vector will be made of 10K bits and require 10K cycles to be shifted into the chain. For a clock running at 2 MHz (equivalent to 500 ns per cycle), this requires around 5 ms of test time just to shift in 1 pattern (without overheads). If we consider the same scenario with an EDT IP in place, it would be possible to set up scan by dividing the 30K flops into 150 chains of 200 flops each. Test time is now 100 µs, 50 times smaller per pattern. Multiply that saving by probably thousands of patterns in a scan test set, then by many times again for each device tested (millions in some cases), and it becomes clear how much cheaper scan test can be as a result of investing time and money in EDT technology.

In addition to scan input and output channels, scan test requires the use of other primary input pins for scan control signals such as the scan clock, scan enable, reset signals and other signals for the control of the EDT module. These extra signals (which may or may not be controlled externally) allow engineers to make use of features such as low power pattern generation as well as the backup feature known as bypass mode. This latter feature is really important as it allows the option of switching from compressed to uncompressed patterns. In the event of a major flaw in the design, synthesis or manufacturing of the decompressor or compactor modules of the EDT IP, the compressed patterns become obsolete and the bypass option is the only way to still test the device. Such a feature makes it possible to configure and access the internal chains directly, bypassing the decompressor and compactor modules. In this scenario, the internal chains are connected together in a daisy chain configuration (last scan cell of chain 1 connected to the first cell of chain 2 and so on). This backup option allows application of scan test, although power requirements and test time might no longer suit the test budgets previously put in place; the extra scan cost is not ideal but it could still be cheaper than a full iteration of the design flow or the loss of a customer due to a missed deadline.
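The shift-time arithmetic from the 30K-flop example, including the bypass-mode penalty, can be reproduced in a few lines. The function and constant names are ours, chosen purely for illustration:

```python
# Shift time per pattern for the 30K-flop example in the text, with and
# without an EDT IP, and in bypass (daisy-chained) mode.
SHIFT_CLK_HZ = 2_000_000  # 2 MHz scan shift clock = 500 ns per cycle

def shift_seconds(chain_length: int, clk_hz: int = SHIFT_CLK_HZ) -> float:
    """Seconds to shift one pattern through a chain of the given length."""
    return chain_length / clk_hz

no_edt   = shift_seconds(30_000 // 3)    # 3 chains of 10K flops  -> 5 ms
with_edt = shift_seconds(30_000 // 150)  # 150 chains of 200 flops -> 100 us
bypass   = shift_seconds(30_000)         # one daisy chain of 30K -> 15 ms

print(no_edt, with_edt, bypass)  # -> 0.005 0.0001 0.015
print(no_edt / with_edt)         # -> 50.0 (the 50x saving quoted above)
```

Note how bypass mode is three times slower per pattern than even the uncompressed 3-chain configuration, which is why it is a backup feature rather than a normal operating mode.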

Internal chain length is often dictated by the compression ratio (chain/channel ratio). The higher the number of internal chains, the shorter their length and the less test time required per pattern. TestKompress provides users with an analyze_compression command that reports the maximum chain-to-channel ratio before the configuration causes a loss in test coverage. It can be used on any scan-inserted netlist and the result can be fed back to the design group for alteration. When the command is executed, the ATPG tool runs full pattern generation on the netlist using the current scan setup and elaborates the report. Table 3, for example, shows an output report from the analyze_compression command which was executed on the Dynamic_node test case with a scan configuration of 1 uncompressed chain. This case shows that the optimal compression ratio with an EDT IP would be 12 chains. Increasing the compression ratio above 12 (12 internal chains per channel) will start to introduce coverage loss.

Table 3 Compression analysis from Mentor Graphics on the OpenPiton Dynamic_node (configured with 1 uncompressed chain of 2245 cells)

// For stuck-at faults
// Chain:Channel ratio    Predicted fault coverage drop
// 12                     negligible fault coverage drop
// 13                     0.01% - 0.10% drop
// 15                     0.15% - 0.55% drop
// 16                     0.60% - 1.00% drop
// CPU time to analyze compression is 4 seconds
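To make the ratios in Table 3 concrete, the internal chain length implied by each chain:channel ratio can be estimated as below. This is a rough sketch with our own helper name; the real partitioning is done by the tool.

```python
import math

# Estimate the longest internal chain for a given chain:channel ratio.
# 2245 scan cells and 1 channel come from the Table 3 configuration.
def internal_chain_length(scan_cells: int, channels: int, ratio: int) -> int:
    internal_chains = channels * ratio
    return math.ceil(scan_cells / internal_chains)

print(internal_chain_length(2245, 1, 12))  # -> 188 (the reported sweet spot)
print(internal_chain_length(2245, 1, 16))  # -> 141 (shorter, but coverage drops)
```

Shortening the chains from 2245 cells to 188 cuts shift time per pattern by roughly the same 12x factor, which is exactly the trade-off the analyze_compression report is quantifying.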

Other tools (such as DFT Advisor) can assist DFT engineers in selecting the best possible configuration and chain-to-channel ratio at an earlier stage, before running ATPG on a scan-inserted netlist. All decisions regarding DFT implementation are taken with one thing in mind: reduce cost as much as possible, because test is just an overhead and keeping its cost down is essential.


Full chip ATPG vs hierarchical ATPG

Full chip (FC) or flat ATPG is simple, as ATPG is commonly executed on a single netlist containing an EDT module (package and netlist pins are equivalent). Flat ATPG requires a complete design and the ATPG is run on the entire design at once. In reality some of the component modules can be missing, substituted by dummy models (grey boxes) that are only in place to verify connectivity. The use of grey boxes allows a flat-model ATPG to run at an early stage of the design, although its results will have to be re-evaluated each time a new model replaces a grey box. Such an approach makes it possible to set up the ATPG environment and test the main scan structures, including scan control signals, scan clock controllers and the EDT IP, at a very early stage in the design flow. A partial environment will also allow verification of the integrity of the existing design and debug of any Design Rule Check (DRC) violations already present. Minor adjustments will be needed in the ATPG setup every time a new piece of the puzzle is put in place in the netlist, until its final version.

Full chip ATPG is often used for small to medium size designs. These can contain one or multiple EDT modules sharing the scan cells present in the design. Each EDT module can have its own input and output scan pins assigned to it or, if the pin count is limited, the modules can also share them. There are 2 options here: 1) use multiplexers to deliver data to one EDT IP at a time; 2) deliver the same data into both EDT blocks, in which case coverage may be impacted. When a full chip becomes too big, hierarchical DFT is preferred to full chip because it is capable of making much better use of compute resources as well as reducing runtimes. Pattern generation runtime varies a lot and becomes an issue for large designs, with multiple factors contributing to its growth. The following section examines some of them.

The easiest fault type for a tool to generate patterns for is the stuck-at fault. Generally, at-speed patterns form a much larger set and require 2X or more the generation time of stuck-at fault patterns. The tool finds it much more difficult to identify a pair of vectors capable of activating the targeted fault and capturing the response for a transition fault. Test coverage also suffers from this difficulty, and the coverage drop can vary from 2-3% to well over 10%. Low power patterns will also cause an increase in runtime; this happens because a low power configuration has more aggressive settings to keep power usage at bay (see Section 4.4). For small technology nodes (< 30nm), conventional stuck-at and at-speed fault types are no longer 'good enough'. The use of cell-aware (CA) fault types is necessary, and again this will cause an increase in fault count because the number of faults per library cell is greater. The use of CA models can increase the total number of faults from 1.2X to over 1.5X the number of traditional fault types (see Section 4.5).

Table 4 Test pattern generation log of a stuck-at run on the Dynamic_node configured with 1 uncompressed chain

Stuck-at Faults on dynamic_node
// Simulation performed for #gates = 39870 #faults = 58109
// system mode = analysis    pattern source = internal patterns
// #patterns    test      #faults   #faults    #eff      #test     process
// simulated  coverage    in list   detected  patterns  patterns   CPU time   RE/AU/AAB
//    ---        ---         ---       ---       ---       ---     1.03 sec   0/534/0
//     64      94.42%       3677     54432        64        64     1.71 sec
//    ---        ---         ---       ---       ---       ---     2.34 sec   0/534/87
//    128      96.75%       1922      1754        64       128     2.43 sec
//    ---        ---         ---       ---       ---       ---     2.98 sec   4/534/87
//    192      97.95%       1005       906        62       190     3.06 sec
//    ---        ---         ---       ---       ---       ---     3.76 sec   5/535/95
//    256      98.33%        400       284        60       250     3.81 sec
//    ---        ---         ---       ---       ---       ---     3.98 sec   5/535/95
//    320      98.68%        137       263        62       312     4.03 sec
//    ---        ---         ---       ---       ---       ---     3.76 sec   5/535/95
//    347      98.73%         95        42        25       337     3.81 sec

RE = redundant faults; AU = ATPG untestable; AAB = ATPG abort

// 420 faults were identified as detected by implication.

With the increase in design size there is an inevitable increase in fault count, and runtime will also increase. Hard-to-reach faults will cause the number of patterns to grow. To make this concept easier to understand, let's take a look at Table 4, which contains the log of an ATPG stuck-at session on the Dynamic_node module. It can be observed that the tool can almost effortlessly detect the majority of fault sites (94.42%, equivalent to over 54K faults) with just 64 patterns. It also shows that the tool had to simulate a much larger number of patterns to identify the remaining detectable faults (4K). The log reports that a further 283 patterns are required for the detection of the remaining 4.5% of fault coverage. This situation is magnified as the design becomes larger and larger (i.e. for a 10 million fault design the first 1K patterns may reach over 90% test coverage, and a further 10K patterns will probably be required to detect the remaining 5-6% of testable faults).
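The diminishing returns visible in Table 4 are easy to quantify: dividing each coverage increment by the number of extra patterns shows the gain per pattern collapsing after the first 64 patterns. The snippet below simply re-processes the (pattern count, cumulative coverage) pairs read from the log.

```python
# (cumulative patterns, cumulative test coverage %) taken from Table 4
log = [(64, 94.42), (128, 96.75), (192, 97.95),
       (256, 98.33), (320, 98.68), (347, 98.73)]

prev_pats, prev_cov = 0, 0.0
for pats, cov in log:
    gain = (cov - prev_cov) / (pats - prev_pats)
    print(f"patterns {prev_pats + 1:>3}-{pats:>3}: {gain:.4f}% coverage each")
    prev_pats, prev_cov = pats, cov
# The first 64 patterns earn roughly 1.475% coverage each;
# the final 27 patterns earn roughly 0.002% each.
```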


At the other end, the hierarchical DFT methodology breaks the full chip design down into more manageable blocks, and each block (usually named a partition or core) is commonly contained by a wrapper. At top level, the FC can use different ways of reaching the various partitions. It could use shared scan pins (as in the case of the flat ATPG method) or a more advanced bus routing system comparable to a modern computer network. A power-up sequence will be the key to configuring and accessing each individual partition as well as setting it up into a scan test mode. Using such structures, each partition runs as an independent entity, and patterns can be generated at the block level and merged at the top level with the remaining patterns. This method provides a way to perform large parts of the DFT process in parallel instead of executing it on a full chip model. This allows a much earlier start for the setup and run of the ATPG environments of each partition (as well as the FC setup, by using grey boxes as placeholders for the missing partitions).

This is not a new approach and it often results in 10 to 50 times faster runtime as well as requiring smaller computing resources. The advantage is not limited to ATPG but also applies to pattern simulation and verification using GLS. Reducing runtime by a factor of 10 might not sound like a great achievement, but going from a 10-day to a potential 1-day run can be of great help to the engineers. Having the option to execute an extra iteration of the whole flow could reduce the number of bugs, with a consequent increase in yield. Furthermore, if a partition is used multiple times in the design, then pattern generation only needs to be executed once and the same set can be reused for all of that partition's instances. The main downside of hierarchical DFT is the need for interconnect structures between blocks, which adds another level of complexity to pattern delivery and debug. Even so, the benefits of hierarchical test are substantial and outweigh the costs. For large designs it is a necessity [36], as pattern generation execution time becomes a factor that conditions a project's timeline.

Standard and low power scan patterns

Very high power demands during scan test can cause unnecessary yield loss as well as worsening the already troublesome issues associated with IR-drop and crosstalk. Excessive power consumption can also induce more stress-related failures. Furthermore, power grid limitations are exacerbated by the IR voltage drop during scan.


Wire bonds have limited power handling capability, hence adjusting tester settings to supply more power to the device under test is not a real option. Wire bonds are designed to withstand the power levels required during functional operation of the device and are incapable of tolerating scan test power. The scenario worsens with the continuous decrease in device size, which allows smaller packages that in turn require smaller wire bonds. A quick solution is the use of multiple bonds per pin, but at extra cost. As shown in Chapter 2, it is well known that scan patterns have a much greater switching rate than the functional mode of operation. Despite many efforts to mitigate the problem, a clear-cut solution has not been found and the issue is likely to remain a significant problem for the foreseeable future.

Scan power is a very generic term. The simplest breakdown is to consider shift power and capture power, although other ways of analysing scan power are also possible, for example static and dynamic power. Shift power is often rated as more important than capture power because shift mode takes up the majority of the test time (e.g. for a chain length of 100, scan test requires 100 shift cycles and 1 capture cycle, so that for over 99% of test time the device is in shift mode). At the same time it is important not to undervalue capture power, as it occurs during a phase of scan that matches the functional state of the device.

Furthermore, functional mode can handle power management and distribution differently to scan mode. Power domains might be turned on and off based on the functions and usage of the device; this is very common in battery operated designs. Scan, on the other hand, might require scan chains that cross different functional blocks and different power domains.

Keeping scan power under control requires investing time and money. Some of the available options to mitigate scan power issues are the use of clock staggering and the insertion of clock gaters (which allow the control of switching activity in sensitive areas of the design), but the ultimate “weapon” available today for keeping scan power at bay is to enable the low power shift controller in the EDT IP during pattern generation.

Past methods based on filling the don't care bits of patterns with pseudo-random and non-random algorithms are no longer valid; test coverage is affected by these, as the number of fortuitously detected faults might drop drastically. Coverage loss is not an option. Shrinking technology nodes are increasing the manufacturing fault density; this causes an increase in DPMs, in direct contrast with the customer demand for much lower DPMs, especially for sensitive applications where reliable hardware is a necessity and taken for granted.

To solve power issues, design companies rely increasingly on low power settings. EDA tools provide the option of using a low power shift controller; this involves constraining some scan chains, which are held at a constant 0 on a pattern by pattern basis. The choice of holding off a specific scan chain depends on the faults being targeted by each simulated pattern. As a consequence, the chains held off will not contribute to switching activity; only the chains that are free to toggle will contain efficient patterns (patterns that will detect faults). It should be noted that these are internal, decompressed patterns. The downside of this method is the lowering of the encoding capacity of the system, and the fact that the pattern set will be larger than for a conventional run because fewer faults are targeted by each external pattern. The increase in pattern set size is directly related to the severity of the power setting chosen for the execution of pattern generation.

Making use of low-power shift involves two steps:

• Generating and inserting power controller logic during the EDT module creation. This is based on the minimum switching threshold percentage value specified during IP generation (the Tessent shell command, set_edt_power_controller, generates power controller logic and is described in detail in the Tessent shell reference manual [37]). The capability of the power controller hardware is determined by the percentage switching threshold specified during its creation.

• Before generating test patterns, enabling the power controller and specifying the low-power switching threshold to be used during scan shifting (with the set_power_control and ‘set_edt_power_controller Shift’ commands).
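The two steps might be sketched in dofile form as follows. This is an outline only: the command names are those cited above, but the argument spellings and the threshold value are indicative and should be checked against the Tessent shell reference manual [37].

```tcl
// Step 1 - during EDT IP creation: generate power controller hardware
// (the minimum switching threshold percentage is set at this point)
set_edt_power_controller shift enabled

// Step 2 - before pattern generation: enable low-power shift at a
// chosen threshold (switch spelling indicative; 25% is illustrative)
set_power_control shift -switching_threshold_percentage 25
```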

The use of more aggressive settings will require a larger number of chains to be held off at once. The minimum switching threshold for pattern generation represents the minimum switching threshold the power controller hardware can handle during a low-power execution. The switching thresholds specified for pattern generation cannot exceed the switching threshold specified during the hardware creation.


If the power threshold set in ATPG for pattern generation causes a drop in test coverage, the ATPG tool will deliberately violate the threshold to avoid test coverage losses. Only in extreme cases (unacceptable scan power values that may compromise the physical integrity of the device) is it possible to specify a hard limit on the switching activity, which will allow the tool to ignore the test coverage impact. The power controller should be disabled when it is not needed as it adds additional shift cycles to each test pattern thereby unnecessarily increasing test time.

A useful command that can give an idea of the switching activity occurring for any generated pattern set is the report_power_metrics command. Table 5 shows the output from the execution of this command on the Dynamic_node module. It displays a summary of the patterns’ average and peak power values (minimum, average and maximum) for the weighted switching activity (WSA) as well as the state element transition percentage [32].

Table 5 Power Metrics report of test case Dynamic_node for SA faults on a 1 uncompressed chain configuration

ANALYSIS> report_power_metrics
Power Metrics                    Min.     Average   Max.
WSA                              16.39%   28.92%    34.16%
State Element Transitions        22.82%   47.11%    51.74%
Peak Cycle
  WSA                            26.68%   29.36%    37.37%
  State Element Transitions      43.37%   47.90%    51.74%
  Load Shift Transitions         46.46%   50.00%    53.43%
  Response Shift Transitions     20.05%   25.85%    51.81%

The importance of fault models

The most widely used and best-known fault types for scan test are stuck-at and transition faults. Miniaturization and the drive to smaller technology nodes have led to an increase in DPMs and worsened manufacturing yield. It appears that conventional fault models are no longer capable of screening enough faults to reach the targeted DPMs. This has required the introduction of new fault models (UDFM), also known as cell aware (CA) [38]. This modified flow is shown in Figure 4-1.

Stuck-at fault models are the default upon invocation of the ATPG tools. They assume that one of the signal lines in a cell is stuck at a fixed value, regardless of the inputs supplied to the cell. Because of the two possible logic values (0, 1), a cell with n fault sites will have 2n faults. Delay information is not considered in this model, and for this reason it is considered a logical fault model; it is also defined as a structural model because it is based on the structural gate-level circuitry and is independent of operating conditions (e.g. temperature, power supply voltage).

Figure 4-1 Cell Aware generation flow [38]

The transition fault model is quite different. There are two fault sub-types: slow-to-rise and slow-to-fall. For a slow-to-rise fault, the 0 to 1 transition (or 1 to 0 for slow-to-fall) does not reach the output of a scan cell within the time limits specified. Detecting a transition fault requires the use of a pair of vector patterns (V1, V2). The first pattern initializes the fault site by setting it to the initial value. The second vector V2 launches the transition with a first clock pulse and propagates the sequential element response to another scan flop. The initial values to which the fault sites need to be set are delivered (as for stuck-at faults) by shifting a vector into a scan chain; at this point the device is in scan test mode. For the second vector, there are 2 options: derive it from the combinational logic response to the first vector stimulus, or use the chains to shift in the second vector. The two methods of applying a pair of test vectors for transition fault testing are called Launch Off Shift (LOS) and Launch Off Capture (LOC), and were described in Chapter 2. In today’s cutting-edge designs, and for commercial (e.g. medical, automotive) and military fields of application where the need to lower the DPM count is paramount, DFT implements LOS because of the improved fault coverage. The extra timing constraints and validation required for a fast toggling scan enable signal are as necessary as scan test itself.

Table 6 Data and results comparison of 10 designs of various sizes [36]

A passing scan test on silicon doesn’t necessarily mean the test is capable of screening bad dies and allowing delivery of only good dies to a customer. This inability to detect all the faults on current small node technologies is allowing defective units to reach the end customer, and the number of defective units returned is unfortunately increasing. Scan test no longer appears to be capable of structurally and systematically testing the silicon as it once was. The current fault models (stuck-at (SA) and at-speed (AS)) are no longer sufficient for delivering high quality products. Although test coverage figures have gone up in recent years and new techniques have been deployed to recover test coverage losses (e.g. test point insertion, TPI), it appears these models are not detecting sufficient faults on silicon. Having to scrap more parts (sometimes worth hundreds of dollars each) is costing the industry a lot of profit.

The solution to these problems has been to invest in developing new fault models; these had to be created specifically for each new process node, and that is when Cell Aware (CA) fault models were introduced. The Cell Aware technique no longer bases its fault model on the general behaviour of a cell but on its circuit; it combines layout extraction with analogue fault simulation and synthesis (inductance and capacitance) to create fault models for the library cells. After completing the characterization process, these CA fault models are used to generate high-quality test patterns.


Figure 4-2 Improving PPM (Source: Mentor Graphics) [11] [39]

Conventional SA and AS pattern-generation methods focus on detecting defects at gate boundary pins, although a significant population of defects may occur inside a cell or gate. Most manufacturing defects internal to cells can be detected with traditional fault models, but smaller node technologies have introduced other defects that require new vector stimuli to excite and observe their effects. The number of defects occurring within cells is quite significant: it can reach up to 50% of the total defects.

One of the first publications on results achieved using the cell-aware testing method reported major reductions in defect density [11] [39]. Figure 4-2 shows a Mentor Graphics representation that reports an improved DPM number of 885; this is a very impressive number considering that some products have a DPM goal below 1000 as a customer requirement. One more observation is relevant here: new CA models have to be defined to detect both stuck-at and transition faults, hence 2 different CA models per cell will be needed to fulfil the task. This is a whole new field of study for the industry, and investments will continue to be made to generate higher quality fault models that can detect more faults and reduce DPM numbers [40] [11] [41].

ATPG basic setup and execution steps

As mentioned earlier, we will be using Mentor TestKompress to execute the ATPG part of the flow (see 1.7). It is common practice to have simple scripts to execute small series of commands. As an example, the following Cshell script will load the Mentor Graphics license (MGLS file) and then call the shell. This second command can be used in combination with multiple switches:


#!/bin/csh
setenv MGLS_LICENSE_FILE 100@mentor_licence_server.com
mentor_software/tessent/tool_version/2013/tessent -shell

Optional but really useful and time saving are the –dofile and –log switches (e.g. -dofile fault_type.cfg -log log_name_`date +%d``date +%m``date +%y`_`date +%H``date +%M`.log). The dofile switch can call other scripts, commonly used to set up environment variables needed for the ATPG execution. Use of the log switch is really important for debug purposes: saving logs of all EDA execution runs allows review of the status of a run and comparison of the results with previous sessions. Once the license is loaded, the tool will be enabled and will enter “setup mode”.

Table 7 Log of a post configuration script execution

Copyright 2011-2017 Mentor Graphics Corporation
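Combining the shell call with these two switches, a complete invocation might look as follows. This is a sketch only: the license server address and tool path are the placeholders used earlier, and the dofile name is illustrative.

```csh
#!/bin/csh
setenv MGLS_LICENSE_FILE 100@mentor_licence_server.com
# Call the shell with a configuration dofile and a date-stamped log
mentor_software/tessent/tool_version/2013/tessent -shell \
    -dofile fault_type.cfg \
    -log log_name_`date +%d``date +%m``date +%y`_`date +%H``date +%M`.log
```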

// Mentor Graphics software executing under XXXX

// Host: XXXXX (XXXXX MB RAM, XXXX MB Swap)

// Note: License will be released if tool idle for 15 minutes

// command: set EDT_MODE "EDT"

// command: set FAULT_TYPE "STUCK"

// command: set LOW_POWER_ATPG "OFF"

// command: set SHIFT_FREQ "2M"

// command: set DATE [clock format [clock seconds] -format {%b_%d_%Y_%H_%M}]

// command: dofile ../dofiles/run_sa_edt.dofile

Table 8 TestKompress log showing processing power set and utilized for pattern generation

// Master with 8 threads running.
// sub-command: report processors
// hosts                 threads   arch     CPU(s)        %idle   free RAM   process size
// machine003 (master)   8         x86-64   8 x 3.3 GHz   86%     XXX MB     XXX MB
// master with 8 threads running.

The –dofile switch gives the opportunity to set up a number of environment variables that can be used to control the execution flow of scripts. Dofiles are just lists of commands in Tcl (tool command language) that are used to speed up execution and eliminate human error. When the –dofile switch is used in conjunction with the Tessent shell call, the shell will enter setup mode and execute the commands listed in the file. Table 7 shows the commands executed; in this case the dofile is setting up 5 environment variables: EDT mode (EDT or Bypass); pattern type (stuck-at, at-speed, iddq, or UDFM in the case of cell aware); low power setting (on/off); a shift frequency variable useful for setting up clock frequencies; and a DATE variable that may be used for file naming (reports, pattern sets, etc.). The last row is a call to a second dofile. Nested dofiles can provide great flexibility.

For large designs, pattern generation is a very long process. To shorten the execution time, it is possible to break the job down and distribute it to multiple machines (4 processor cores require 1 license). An example of such a distribution procedure is shown in Table 8. At this point Tessent needs to know what tool is required (e.g. FastScan, scan diagnosis, TestKompress, etc.) and this can be specified by using the set_context patterns command. The tool is now ready to receive design input files, such as the HDL netlist, and to start the ATPG environment setup.

Setting up variables for flow control is very useful. It is standard practice to have multiple files that set these variables to different values. The number of combinations of settings can be very large and an equivalent number of files could be created. A simple set could include the following:

• Run_stuck_at_edt.cfg
• Run_stuck_at_byp.cfg
• Run_atspeed_at_edt.cfg
• Run_atspeed_at_byp.cfg

Executing any of the above could get the ATPG set up and running for stuck-at or at-speed pattern generation, with compressed or uncompressed patterns. These should be enough to cover the combinations of the main variables (fault type and EDT on/off).
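As a sketch of what one of these configuration files might contain, the variable names below follow Table 7; the BYPASS and TRANSITION values and the nested dofile name are assumptions for illustration, not taken from the actual project files.

```tcl
// Run_atspeed_at_byp.cfg - hypothetical contents
set EDT_MODE       "BYPASS"      ;// assumed value; Table 7 shows "EDT"
set FAULT_TYPE     "TRANSITION"  ;// assumed value; Table 7 shows "STUCK"
set LOW_POWER_ATPG "OFF"
set SHIFT_FREQ     "2M"
set DATE [clock format [clock seconds] -format {%b_%d_%Y_%H_%M}]
dofile ../dofiles/run_tf_byp.dofile  ;// illustrative dofile name
```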

A basic directory structure for the ATPG will consist of a number of folders, some used for input files (dofiles, libs), others for tool generated files (logs, reports, results), and a run directory. The run directory will contain the configuration files used to set up the variables and the run_dofiles used to load the licences and get the shell up and running. The “results” folder will be the destination of the pattern set files, which can be written out in different formats; the most common are .WGL, .ASCII and .STIL2005. The choice of format will be dictated by the process that is going to be used to convert the patterns into their final format to suit the tester platform. The “results” folder might contain subfolders to separate the different pattern types. It is good practice to save fault lists and flattened models (matching the pattern sets in the same folders). Keeping everything under an intuitive and easy to understand directory structure can help a lot during the debug phase.

4.6.1 Standard cell libraries and fault count

Library cells are designed, manufactured, and tested by foundries. When a new cell library is released, it is made available to design teams for synthesis and to ATPG for pattern generation. In some cases the foundry will only provide Verilog models for ATPG. If so, an extra step is required before getting into pattern generation. The EDA tool vendors provide sub-tools to convert Verilog models into ATPG usable models. Libcomp is a Mentor Graphics tool that does this conversion. The process is highly automated: it takes the Verilog files as input and produces a large list of reports in addition to the converted library file, ready to be loaded into MG Tessent. The flow can be described in a few steps:

1. Execute the Libcomp shell call (same process as for Tessent) followed by the names of the library Verilog files to be converted (the Libcomp manual has more information on the optional switches available). This will open the Libcomp shell and automatically read in all the standard cell Verilog models.
2. Execute the add model command with a specific cell name or with the ‘-all’ switch to convert all the models at once.
3. Using ‘set system mode translation’, move the tool to conversion mode.
4. Execute the ‘run’ command. At this point the tool will translate each model added and run a pattern generation session to make sure faults can be detected. If a model reports 0% coverage, corrections to the Verilog models may be required.
5. The last step is to save the newly created ATPG library using the ‘write library ’ command.
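The five steps might look as follows in a Libcomp session. This is a sketch only: the file names are illustrative, optional switches are omitted, and the exact syntax should be checked against the Libcomp manual.

```tcl
// Step 1: invoke the Libcomp shell from the OS shell with the
//         library Verilog files, e.g.:  libcomp stdcells.v
// Steps 2-5, inside the Libcomp shell:
add model -all                  ;// step 2: queue every cell for conversion
set system mode translation     ;// step 3: switch to conversion mode
run                             ;// step 4: translate and verify each model
write library stdcells.atpglib  ;// step 5: save the ATPG library (name illustrative)
```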


Figure 4-3 a) Scan flop view from Mentor Graphics Tessent Visualizer software b) Internal structure of a scan flop. Source: Mentor Graphics Visualizer

An example of the test coverage report on a converted standard cell model is shown in Figure 4-4. The cell is a 4 input OR gate. The fault count is 10 (4 inputs, 1 output, each counted twice for stuck-at 0 and stuck-at 1). Losing coverage at this stage of the flow can be very costly, hence any model with less than 100% test coverage should be reviewed and the cause of the loss investigated.

Figure 4-4 Test coverage report of a 4 input OR gate from Libcomp, Mentor Graphics

Default ATPG settings do not target faults within standard cell models, but just at their boundaries. The set_internal_fault command specifies whether the tool should also target faults on internal nets. The fault count increases dramatically if this setting is enabled, and consequently the pattern set will also grow. To get an idea of the increase in fault numbers, consider the scan flop model in Figure 4-3. The flop has 4 inputs and 1 output (equivalent to 10 boundary faults), and Figure 4-3 b shows the internal components of the cell. The total number of faults in the cell is 28: nearly a 200% increase in fault count for a scan flop model. The use of CA models will also increase the fault count compared with the standard SA and AS models, as seen in Figure 4-7. For CA there is no need to consider enabling internal faults, as the CA models already take into consideration the physical model and internal composition of each cell.


Figure 4-5 Pattern increase CA versus SA for 10 different designs [42]

4.6.2 Setting up an ATPG environment

Building the ATPG environment is the main job of an ATPG engineer. Once the environment is created, the remaining tasks are about refining the configuration settings to match what is expected to happen on silicon. A basic set of dofiles to implement an ATPG configuration from the set-up phase to pattern generation is shown in Figure 4-6. The main dofile (all_config.do) will contain the backbone of the flow, and from it other dofiles will be called based upon the configuration selected (i.e. SA, AS, etc.). The following steps describe the minimum requirements to run a successful ATPG session that can provide pattern sets for silicon and test-benches for GLS simulation.

Once the Tessent shell is up and running it will be required to:

• Load the Verilog design file: this can be one or multiple files. In the latter case, it might be necessary to define the top design using the set_current_design command.

• Load in the standard cell library files (one or more). If models with the same name are contained in different files, the tool will use the latest model loaded.
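These first loading steps might be sketched as follows. The set_context patterns and set_current_design commands are those cited in this chapter; read_verilog and read_cell_library are the usual Tessent commands for the two load steps, but their spelling should be verified against the tool version in use, and all file and design names are illustrative.

```tcl
set_context patterns -scan                   ;// select the ATPG/TestKompress context
read_verilog ../design/dynamic_node.v        ;// load the design netlist
read_cell_library ../libs/stdcells.atpglib   ;// load the ATPG cell models
set_current_design dynamic_node              ;// define the top of the design
```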


Figure 4-6 ATPG directory structure

It is not uncommon to have missing instance definitions in a design. Such definitions could be Verilog from files not yet available to the engineer, or a missing ATPG standard cell library model. When the tool analyses and compiles the Verilog files and ATPG models and encounters an undefined instance, it will stop execution, report an error and return the prompt to the user for corrective action. It is not possible to flatten a model with undefined instances. Flattening is the process of compiling all the HDL design file(s) with the standard cell library models to obtain a single flat model of the entire design (in binary format). Specifying any missing instance definitions as black boxes (BB) will allow the script to proceed to the next step, but only if they do not interfere with the scan architecture. It is the user’s responsibility to verify whether the model was mistakenly not loaded or is not going to be available for the process. The tool will list all the undefined models before ‘erroring’ out. A –auto switch is available to let the tool automatically set any undefined model as a BB, but this is not advised: a new netlist delivery could introduce new missing models, and the –auto switch would mask any erroneously missing definitions. A manual approach provides the opportunity to verify the cause of each error. It is important to note that the tool will produce unknown (X) values out of the BB instances by default, but this may be changed if appropriate (i.e. if the real model would produce a logic high at its output(s), the engineer will set the BB output to constant logic 1).

• Over 120 reports can be run. They can be used to get information and data to verify or assist the initialization of the environment (e.g. report_primary_inputs). When starting to set up the ATPG environment, it is convenient to utilize the Visualizer (the Tessent GUI interface), which can be started with the open_visualizer command. The Visualizer has multiple views; some are not enabled until the tool analyses the design and creates a flattened model of it. Its hierarchical view is available as soon as a Verilog design model and cell libraries are loaded into the tool, and can be very useful to identify scan inputs and outputs and scan control signal ports. It also offers the option of copying pin/path names (which can be of great length in large hierarchical designs) to be pasted into scripts for signal definitions, eliminating human error.

• At this point it is necessary to start declaring clocks, resets and scan control signals. Declaring a pin/port as a clock requires the use of the add_clock command. There can be various types of clocks: functional or scan, internal (to replace the effect of internal phase locked loops, PLLs) or externally sourced (from primary input ports), synchronous and asynchronous. Resets are also considered to be clocks and are defined in the same way. The add_clock command provides a multitude of switches. The idle state (or off state) is a compulsory argument, with possible values being logic 0 or 1 (add_clock 0 ). A clock is defined as asynchronous when the -period option with a numerical value and a time scale is used (i.e. –period 500ns). This option is often used for functional clock definition, and such clocks can replace internal or external PLL sources.
It is common to use the functional clock generated by a tester platform during scan test. A mux cell with its select pin controlled by the scan mode signal will allow bypassing of PLL driven signals during scan testing. This is preferred as it guarantees full control and extra monitoring of a very important type of signal. In fact, small differences in PLL manufacturing on silicon could generate a signal which is insufficiently stable and cause false scan test fails. During the silicon debug phase, the functional clock frequencies are varied to verify the actual working frequency capabilities of each unit; this would not be possible if the PLL signal were used. Also, PLLs are usually trimmed after the units are tested, and this allows each unit to be tested at its maximum working frequency. The number of clocks in a large SOC can be very high, and often many of them share the same properties and frequency. In such cases it is very convenient to define them as a group using a group label (list the clock path names after the add_clock command and use the –pin_name switch for it); this allows the group name to be used to define their behaviour much faster.

• In order to simulate and force a specific waveform internal to the design, it is necessary to define the site (on a net or cell model pin) as a primary input (PI). The add_primary_input command is used with the -internal switch to disconnect the original drivers of the net; the added pseudo port becomes the only driver of the signal. Multiple pins or nets that need to behave in the same way can also be grouped, in a similar manner as described for the clock definition.

• The add_input_constraints command is used to constrain primary input pins to specified values. Resets will be constrained to their off state to avoid the resetting of sequential elements (flops and latches) during a scan test procedure. Other examples are the setting of test data registers (TDRs) to values required by the scan configuration (i.e. a TDR bit controlling a low power controller of the EDT will be forced to logic 1). In large designs, defining TDR register settings can become a big task, as their number can reach into the thousands. On silicon, TDRs will be set to their correct values by a power-on sequence. In ATPG it is possible to apply a power-on sequence, but it is common practice to use internal forces instead because this speeds up execution time.

• The set_edt_pins command is essential for the definition of scan control signals. It is used both during the EDT module creation and to set up the ATPG environment for pattern generation. During the EDT logic creation, a dofile will be automatically generated containing the correct set_edt_options commands for the pattern generation phase. These definitions will need some adjustments if sequential elements are placed between the EDT blocks and the external input/output ports.
EDT channel input/output pins are shared with the functional mode using multiplexers controlled by the scan_enable signal. There is a large number of arguments for the set_edt_pins command; the essential ones are:

  o Scan input channels: used to specify input channels.
  o Scan output channels: used to specify output channels. Both of these options require a channel index during definition.
  o edt_pin: this literal is used for the definition of the edt_clock pin, edt_reset pin, scan_enable pin, edt_bypass pin, edt_update pin, and the low_power_shift_enable pin for the low-power decompressor.


• The add_scan_chains command is used to define the names of the pre-existing scan chains of the design. Each scan chain will reference a scan chain group (multiple EDT IPs will refer to different scan groups). The scan groups are defined prior to the definition of the chains and will also point to a (scan) test procedure file, which provides information on how the chains will operate during scan test.
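A minimal sketch tying these declarations together is shown below. All pin, net and instance names are invented for illustration, and the exact switch spellings should be checked against the Tessent reference manual; the add_scan_chains form follows the syntax given later in this chapter.

```tcl
add_clock 0 scan_clock                            ;// scan clock, off state 0
add_clock 1 por_rst_n                             ;// active-low reset, held off
add_clock 0 func_clk -period 500ns                ;// asynchronous functional clock
add_primary_input core/tdr_lp_en_reg/Q -internal  ;// pseudo PI on a TDR bit
add_input_constraints core/tdr_lp_en_reg/Q -C1    ;// force low-power enable to 1
set_edt_pins input_channel 1 edt_channel_in1      ;// EDT channel pins (index 1)
set_edt_pins output_channel 1 edt_channel_out1
add_scan_chains -internal chain1 grp1 edt_i/si1 edt_i/so1
```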

4.6.3 Dofiles and main Mentor commands

Dofiles are used to contain large sequences of code usually implementing the same task (i.e. all the scan chain definitions). Their use makes the main dofile more readable and allows the user to trace specific code faster. A short list of these files is given in Figure 4-6. The scan_tdr_setting.dofile will contain the primary input cuts on the TDR outputs and the forces required by each of the configurations (EDT, bypass, low power, etc.). The variables set at the start of an ATPG session (in the config files) will select the appropriate constraint for each TDR based on the configuration. The following code is an example of such use:

if { $PAT_TYPE == "STUCK" } {
    add_input_constraints itdr_1_data_out_reg_2_/Q -CT0 //stuck-at
} else {
    add_input_constraints itdr_1_data_out_reg_2_/Q -C1 //transition
}

Because these forces are internal to the design, it is required to first define the pseudo primary input and then apply the force. The compulsory ‘-internal’ switch will make sure the input created is driven not by the value in the register but by the defined force. Here is an example of a primary input definition and the applied constraint to a logic 0:

add_primary_input scan_tdr_internal_wrapper_example_reg_1_4/Q –internal
add_input_constraints scan_tdr_internal_wrapper_example_reg_1_4/Q -C0

As previously mentioned, the forces on TDRs are used to replicate the outcome of a reset sequence and configure the design into scan mode.

A dofile for uncompressed chains (BYP_chains.dofile) and one for compressed chains (EDT_chains.dofile) are also recommended. It is not unusual for the number of compressed chains to exceed 1000. Bypass chain numbers, on the other hand, are much lower, but it is still convenient to have them defined in a separate file. The compressed chains file will also contain the EDT module Test Access Port (TAP) settings. These settings are provided by the EDT module generation process and cannot be modified; any inconsistency between the settings and the logic will lead to invalid patterns. They include the decompressor connection tap settings to each single internal chain, the decoder settings and the power controller settings, and each internal chain will have one of the above assigned to it.

Chains are defined using the following command:

add_scan_chains –internal chain_name chain_group input_end_point output_end_point

The chain group is defined before the chain definition and it will point to the timing specs that are needed to run scan. These specs are defined in the test-procedure file. Bypass chains are defined in a similar way, the main difference being the use of primary ports (instead of internal pins, as for compressed chains), hence there is no need for the –internal switch in their definition.

The fault sites contained within a BIST engine or the EDT block itself, as well as any other test infrastructure not visible to the end customer, do not need to be tested. To avoid spending resources (processing time and patterns) on these faults, they are declared ‘nofaults’ and all the definitions are collected in a nofault.dofile. It is possible to no-fault a full block or multiple IPs with a simple command (i.e. add_nofaults ). The command can also use a wildcard (*). If all the test logic is named consistently, the wildcard becomes really powerful, as all fault sites within the test structure can be nofaulted with a very simple command (i.e. add_nofaults *bist*).

Other dofiles can be used for writing out reports as well as test-benches, fault lists and pattern sets. The latter are usually broken down into manageable chunks for tester platforms whose memory might not be big enough to load them all at once.

4.6.4 Test-procedure files

A test-procedure is a special file that defines how the scan control signals, as well as the EDT channels and chains (compressed or uncompressed), behave during the scan test. There are two main components in a test-procedure file: time-plates and procedures. Time-plates define how signals behave within an entire cycle. These are the basic rules to follow when building one:

- Reset signals are forced to their off state; this is required in order to avoid resetting any sequential element (scan or non-scan) during any scan procedure.
- Cycle waveforms are defined for every existing clock (leading edge position, trailing edge position and duty cycle).
- For other scan signals a case-by-case approach is required.

This is a small example of a time-plate:

    timeplate my_time_plate =
        force_pi 0;
        measure_po 125;
        force scan_enable 1;
        pulse scan_clock 125 250;
        force rst 1;
        period 500;
    end;

The scale of the numeric values is defined at the top of the test-procedure file (e.g. set time scale 1 ns). This time-plate defines a period of 500 ns; the primary inputs (pi) are forced at time 0 in the cycle (all the forces defined in the dofiles on all controlled signals have to be defined on primary or pseudo primary inputs); the reset is forced to logic high (its off/idle state) and the clock has a balanced duty cycle (50% logic 0 and 50% logic 1) with a leading edge at 125 ns and a trailing edge at 125+250 (375 ns). The scan_enable signal is also forced to logic 1; this last statement suggests this time-plate could be used to build a shift procedure. There is no limit to the number of time-plates that can be defined. A single time-plate could be enough for the definition of all the scan procedures, but the complexity of today's designs often requires multiple definitions.
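The edge arithmetic encoded by the pulse statement above can be made explicit; this is just an illustrative sketch, not tool code:

```python
# Sketch of the edge arithmetic in the example time-plate:
# pulse scan_clock 125 250 within a 500 ns period.
def clock_edges(leading, width, period):
    """Return (leading edge, trailing edge, duty cycle) for a pulse statement."""
    trailing = leading + width
    assert trailing <= period, "pulse must fit inside the cycle"
    return leading, trailing, width / period

lead, trail, duty = clock_edges(125, 250, 500)
print(lead, trail, duty)  # 125 375 0.5 -> trailing edge at 375 ns, balanced duty cycle
```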

Scan test can be broken down into a few phases: scan setup, load-unload, shift and capture. The tool makes use of a procedure definition for each phase of the test; these provide the sequences of events that need to occur during the full test. In reality, the only procedures that need to be explicitly defined are the load-unload and shift procedures. The setup procedure might be a requirement if it is necessary to flush out Xs from specific registers, such as those within scan clock controllers (allowing known values at their outputs). A reset procedure will most likely have a number of reset cycles (toggling of reset signals) followed by the scan_clock pulses needed to propagate the signals through the logic.

The capture procedure is in most cases equivalent to just one scan clock cycle. EDA tools are capable of auto generating capture procedures based on the available time-plate definitions.

89

This is an example of a shift test-procedure; it indicates which time-plate definition is used and the events that occur when it is applied:

    procedure shift =
        timeplate global;
        cycle =
            force_sci;
            measure_sco;
            force scan_enable 1;
            force rst 1;
            pulse scan_clock;
        end;
    end;

A shift procedure requires scan_enable to be held at its active state, the resets held at their off state and the scan clock to be pulsed. Note that these are still events occurring within one cycle. Procedures can call other procedures; a simple case of such an implementation is shown in the following example:

    procedure Load_Unload =
        timeplate global;
        cycle =
            force rst 1;
            pulse fscan_clock;
        end;
        apply shift 1;
        if { $Condition=="XX"} { apply shift_last 1; }
    end;

This load procedure is structurally similar to the shift procedure seen earlier. The main difference is the call of a different procedure within it, in this case a call to the shift procedure, which gets applied once. In reality, the EDA tool calculates the number of shifts needed by the pattern vector to reach the end of the chain based upon an analysis of the chain lengths (the longest chain length is used); it applies as many shifts as needed by replicating the 'apply shift 1' event. This Load_Unload procedure is also calling a third procedure (shift_last). For low power designs, or where there are instability issues during scan test, it might be required to apply one or more slower shifts at the end of the shifting phase. This is implemented in order to give the power rails extra time to recover from the high level of toggling that occurred during shift before moving on to the capture cycle. This implies that multiple shift procedure definitions using different time-plates (to define different clock waveforms) are necessary.
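The number of shift applications the tool substitutes for the single 'apply shift 1' event can be sketched as follows; this is illustrative only, and the chain lengths are invented:

```python
# Sketch: the load-unload replicates the shift event once per cell of the
# longest chain; optionally the last few shifts use a slower time-plate
# (the shift_last procedure) to let the power rails recover.
def shift_schedule(chain_lengths, slow_last_shifts=0):
    """Return (normal shift cycles, slow shift cycles) per load-unload."""
    total = max(chain_lengths)   # pattern depth is set by the longest chain
    return total - slow_last_shifts, slow_last_shifts

print(shift_schedule([396, 420, 458], slow_last_shifts=2))  # (456, 2)
```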


When a number of primary or pseudo primary inputs/outputs behave in a similar way in a test-procedure file, they are grouped under an alias name and the alias is then used in the time-plate and procedure definitions (e.g. alias rst = reset_1, reset_2, reset_3;).

Test case: ATPG environment setup for Dynamic_node of OpenPiton

A basic ATPG environment setup was implemented for the dynamic_node_top_wrap model from OpenPiton (an open-source project) in order to generate patterns (test coverage) but, most importantly, to write out test-benches. The aim of this project is to be able to run power analysis: the test-benches will be used to run GLS, from which it will be possible to dump VCD or VPD files (value change dump or value change dump plus) that will serve as input for the power simulation. The ATPG environment is therefore the starting point for scan power simulation. A directory structure such as the one in Figure 4-6 was created and populated. The main folders contained in the ATPG directory are Dofiles, Run, Libs, Logs, Reports and Results. The Logs, Reports and Results folders receive the output files generated by the ATPG sessions. Libs contains the necessary ATPG standard cell library files. Dofiles contains the commands needed to set up the environment and to run pattern generation: all_config.do and my_testproc (the remaining files shown in the folder are not populated but are intentionally left there to give a better understanding of what is usually used for larger projects).

The following (in italics) is the content of the all_config.do file. Most of the commands have already been described in 4.6.3; extra details and observations are given after the script:

    1. set_context patterns -scan -license fastscan //# netlists and libraries #
    2. read_verilog ../libs/Verilog/dynamic_node_top_wrap.mapped.v
    3. read_cell_library ../libs/ATPG_libs*
    4. set_current_design dynamic_node_top_wrap
    5. add_black_boxes -auto
    6. //# clocks #
    7. add_clock 0 /clk
    8. //# resets #
    9. add_clock 1 /reset_in


    10. add_input_constraint reset_in -C1
    11. //# #
    12. //# cuts and constraints #
    13. //add_primary_input -internal
    14. add_input_constraint /test_se -C1
    15. //# edt/scan signal pins #
    16. set_edt_pins Scan_enable /yummyIn_E
    17. set_edt_pins Clock /clk
    18. //# dofiles #
    19. //dofile ../dofiles/${EDT_MODE}_chains.dofile
    20. add_scan_groups byp_chain ../dofiles/my_testproc
    21. add_scan_chains -load_only chain1 byp_chain /yummyIn_W /yummyOut_W
    22. //# tool setting #
    23. set_power_metrics -shift on -capture on
    24. set_tied_signals x
    25. set_split_capture on //should handle C3 vios
    26. set_pattern_type -sequential 2 -multiple_load off
    27. set_abort_limit 30
    28. set_xclock_handling x -pessimistic_simulation on
    29. set_contention_check OFF //(E4 and E10 DRCs)
    30. set_possible_credit 0 //50% is default
    31. set_gate_report drc state
    32. set_stability_check On -max_shift_cycles 2
    33. set_parallel_load_subchains on
    34. set transition holdpi on
    35. //# drc error handling #
    36. set_drc_handling D1 note //default error, does not allow to reach pattern generation
    37. set_simulation_options -C6_mask_races ON //can be used to mask the signal creating the C6 violation and avoid having patterns containing such signals
    38. set_drc_handling W35 error //(timing violation)
    39. //***********************************************
    40. set_system_mode analysis //MOVING INTO ANALYSIS
    41. //***********************************************

    42. add_faults -all
    43. set_fault_type stuck //transition //UDFM
    44. if { ($PAT_TYPE == "TRANS") } { delete_faults -asynchronous_controls}
    45. if { ($PAT_TYPE == "TRANS") } {read_sdc dynamic_node.sdc}
    46. //set_fault_sampling 1
    47. create_patterns
    48. write_patterns ../results/STUCK/dynamic_node_stuck_at_uncompressed_$date.ascii
    49. set_chain_test -sequence 0101
    50. write_patterns chains_testbench0101.v -verilog -serial -pattern_sets chain -Noz -pad0 -mode_external
    51. write_patterns scan_testbench_5_pat.v -verilog -serial -pattern_sets chain -Noz -pad0 -mode_external -beg 0 -end 4
    52. write_flat_model ../results/STUCK/flat_model_dynamic_node_stuck_at_byp_$date.zg

The following comments describe the effect of the commands at various lines:

Line 1: the command is compulsory and is used to specify which Tessent sub-tool is needed.
Line 4: the hierarchical view can be used to identify scan inputs and outputs and scan control signals.
Line 7: use the '-free -per 500000ps' switches with the add_clock command to define free-running asynchronous functional clocks.
Line 9: resets are defined as clocks and forced to their off state during scan mode.
Line 11: place-holder for the settings required by JTAG (Joint Test Action Group) control signals (not present in this case).
Line 13: used to create pseudo primary input pins and to apply the desired signal (in some cases to replace the events of a power-on sequence).
Line 16: defining all the EDT and scan pins is a requirement.
Line 19: call to other dofiles (nested dofiles help readability and flow).
Line 19: the use of variables in a command can help with the execution flow; in this case the value of EDT_MODE would have selected different files containing different chain definitions (compressed or uncompressed chains).
Line 21: as only one uncompressed scan chain was synthesized, it was defined in the main dofile.
Lines 23-37: a large number of tool settings are available in Tessent. Some are used to automatically handle timing violations, others to handle DRCs (design rule checks). They can be used to make the tool behave in different ways for different scenarios.

Line 36: if the handling of a DRC (design rule check) violation is set to error, when the tool identifies a violation of that DRC it will stop the execution and return the prompt to the user. The majority of the DRCs can be temporarily waived or completely ignored, especially during the set-up phase of the ATPG environment.
Line 38: in other cases it is preferable to make sure the DRCs are handled as errors, because an earlier debug can provide more benefits.
Line 40: being able to move into analysis mode is crucial. The tool will analyse the design with the current settings and, if no issues are identified, it will enter analysis mode; from this point on it is possible to go ahead and generate patterns.
Line 42: before generating patterns the tool needs to know which faults to target. The norm is to add all faults; the tool will add all faults automatically when the create_patterns command is executed. It is also possible to have pattern generation target just sub-modules, clock domains, power domains or others.
Line 43: the default fault type is stuck-at; for any other fault type this command is compulsory.
Line 44: when running pattern generation for transition faults, the fault sites on asynchronous set/reset paths are not targeted.
Line 45: used to load a collection of false-path and multi-cycle-path information from a Synopsys Design Constraints (SDC) file. These are determined as part of a static timing analysis before running ATPG.
Line 46: during the setup and debug phase of the ATPG environment it is convenient to target a low percentage of faults to verify whether there are any issues with the chosen settings. The use of this command drastically reduces the pattern generation phase, as only 1% of the faults across the design are now targeted. Other values could be used but are not usually advisable or necessary.
Line 48: the use of meaningful pattern naming can be helpful. Multiple format types are available to write out patterns.
Line 49: the binary sequence of a chain test can be explicitly specified. This command has to be executed before writing out chain-test test-benches. The most common sequences are 0101 and 0011. For low power designs, the need to keep the toggle rate low may require sequences with fewer level changes.
Line 50: test-benches are written out in Verilog and can be of parallel or serial load. Parallel test-benches load the chains in parallel; this is only a simulator advantage, allowing a reduction of GLS execution time. This command outputs multiple files containing the test-bench configuration and setup files as well as the chain test vector file. The following is a set of output files for a chain test test-bench:

    chains_testbench0101.v
    chains_testbench0101.v.0.vec
    chains_testbench0101.v.cfg
    chains_testbench0101.v.po.name

Line 51: capture pattern test-benches are written out in a similar way. It is also possible to specify which patterns to include in the test-bench (in this case only patterns 0 to 4).
Line 52: saving a flat model of an ATPG session for a specific configuration is good practice; reloading a flat model instead of re-running all the dofiles could save hours of run time on large designs.

A very basic example of the test-procedure applied to the test case is reported here:

    set time scale 1.000000 ns ;
    set strobe_window time 15000 ;
    //**************************************************************************
    //* TESTPROC *
    //**************************************************************************
    set time scale 1ps;
    alias fscan_clock = /clk;
    alias rst = /reset_in;
    alias sci = /yummyIn_W;
    alias sco = /yummyOut_W;
    alias scan_enable = /yummyIn_E; //test_se;
    //**************************************************************************
    //timeplate for 2Mhz
    if { $SHIFT_FREQ == "2M" } {
    timeplate global =
        force_pi 0;
        measure_po 125000;
        force scan_enable 1;
        pulse scan_clock 125000 250000;
        force rst 1;
        period 500000;
    end;
    }
    //**************************************************************************
    procedure shift =
        timeplate global;
        cycle =
            force_sci;
            measure_sco;
            force scan_enable 1;
            force rst 1;
            pulse scan_clock;
        end;
    end;
    //**************************************************************************
    procedure Load_Unload =
        timeplate global;
        cycle =
            force rst 1;
            pulse scan_clock;
        end;
        apply shift 1;
        // if { $Condition=="XX"} { apply shift_last 1;}
    end;
    //**************************************************************************
    procedure test_setup =
        timeplate global;
        //1st cycle
        cycle =
            force sci 0;
            force rst 1;
            pulse scan_clock; //for edt logic
        end;
        //n cycle
        cycle =
            force sci 0;
            force rst 1;
            pulse scan_clock;
        end;
    end;
    //**************************************************************************

Saving output logs for each ATPG session can be helpful for debugging purposes. Comparing logs gives a very quick way to identify the commands or settings that may have caused a test coverage drop or worse issues such as broken scan chains. To reduce disk space usage, most EDA tools allow users to save outputs as well as re-load compressed files (logs, patterns, fault lists, etc.). For low power designs, or to verify the current ATPG power settings, it is good practice to make use of the per-pattern power metrics report. Table 9 is an example output of the report_power_metrics command for the first 10 patterns of a stuck-at run. The (default) 50% shift switching threshold was the power setting used in this specific case. As there is no EDT IP synthesized in the test case (hence no low power shift controller available within the dynamic_node design), the EDA tool directly uses its algorithms to keep the various power metrics below the defined threshold. All patterns with metrics above the specified thresholds will be rejected, but only if this does not cause a loss in test coverage.
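The shift columns in Table 9 are switching percentages. Tessent's exact computation is not reproduced here, but a common proxy in the scan-power literature is the weighted transition metric, in which a transition entering the chain early toggles more cells as it shifts through; a hedged sketch:

```python
# Hedged sketch (weighted transition metric from the scan-power literature,
# NOT Tessent's internal formula): earlier transitions in the shift-in
# vector travel further down the chain and so toggle more scan cells.
def weighted_transitions(vector):
    """WTM of a shift-in vector; vector[0] enters the chain first."""
    n = len(vector)
    return sum((n - 1 - j) * (vector[j] ^ vector[j + 1]) for j in range(n - 1))

print(weighted_transitions([0, 1, 0, 1]))  # 6: every adjacent pair toggles
print(weighted_transitions([0, 0, 1, 1]))  # 2: a single transition
```

The same arithmetic illustrates why a 0011 chain-test sequence is gentler than 0101, as noted in the comment for Line 49.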


The set_power_control command can be used to control the switching thresholds for shift or capture. The rejection threshold is set by the rejection_threshold_percentage switch, and its value has to be higher than the switching threshold, e.g.:

    set_power_control shift on -switching_threshold_percentage 25 -rejection_threshold_percentage 30

Table 9 Power metric report per pattern (SA uncompressed chain)

    ANALYSIS> report_power_metrics -pat 0 1 2 3 4 5 6 7 8 9
    //                  Shift                    Capture
    // Pattern      Load     Response    WSA                 State Element Tran.
    //    0        49.49%    49.14%      32.04% (32.04%)     48.76% (48.76%)
    //    1        49.05%    49.39%      31.64% (31.64%)     48.30% (48.30%)
    //    2        49.30%    49.68%      30.40% (30.40%)     45.90% (45.90%)
    //    3        50.98%    24.54%      30.31% (30.31%)     49.26% (49.26%)
    //    4        48.64%    23.98%      29.32% (29.32%)     49.60% (49.60%)
    //    5        48.70%    23.50%      30.43% (30.43%)     49.71% (49.71%)
    //    6        49.92%    25.97%      29.26% (29.26%)     47.61% (47.61%)
    //    7        49.77%    25.49%      29.45% (29.45%)     48.34% (48.34%)
    //    8        49.35%    23.98%      30.44% (30.44%)     49.41% (49.41%)
    //    9        48.66%    49.07%      30.76% (30.76%)     45.36% (45.36%)

It is recommended to set the rejection threshold about 5% above the switching threshold. If the rejection margin is too strict, the tool may simulate hundreds of patterns but only be able to extract a few that adhere to the constraints. In some extreme cases it may reject all the patterns and output a warning such as the following:

    Warning: Discarding 64 patterns that violate the shift power constraint
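The rejection behaviour can be sketched as a filter over each simulated batch (64 patterns by default); the switching numbers below are invented for illustration:

```python
# Illustrative sketch of pattern rejection: any pattern in a simulated batch
# whose shift switching exceeds the rejection threshold is discarded.
def filter_batch(shift_switching, rejection_threshold):
    """Split one batch into (kept, discarded) pattern indices."""
    kept = [i for i, s in enumerate(shift_switching) if s <= rejection_threshold]
    discarded = [i for i, s in enumerate(shift_switching) if s > rejection_threshold]
    return kept, discarded

batch = [28.0, 31.5, 29.9, 30.2]  # invented shift switching per pattern (%)
print(filter_batch(batch, 30.0))  # keeps patterns 0 and 2, discards 1 and 3
```

When every pattern in a batch exceeds the threshold, the kept list is empty, which is exactly the all-64-discarded case in the warning above.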

The TestKompress default setting is to simulate groups of 64 patterns at a time, so in this case all 64 patterns were rejected due to power setting violations. As a result, power constraints might be the cause of much lower test coverage. The power settings may need to be assessed and adjusted in order to recover and reach an acceptable final test coverage value. A comparison of the coverage statistics reports between patterns generated with and without the rejection threshold switch can be observed in Figure 4-7 and Figure 4-9. The first thing to notice is a short power metrics report at the bottom of each figure. In the case of the rejection being OFF (Figure 4-9), the report shows an average switching hovering around the set shift threshold, with maximum values going almost 10 percentage points over the setting. This is significant, as it shows how difficult it is for the tool to generate effective patterns that also comply with the power settings.

Figure 4-7 Stats reports for pattern set generated with power controller setting: shift on -switching_threshold_percentage 25 -rejection_threshold_percentage 30

When rejection is set to 30 (Figure 4-7) the maximum switching is kept below the specified 30% mark. The downside is that coverage is slightly lower (94.78% vs 94.98%) and the pattern count needed to reach that coverage is increased by over 10% (570 vs 501). For this test case, both sessions with power settings failed to reach the maximum achievable test coverage that could have been obtained with no power settings in place (95.53%). This loss could be quite significant in terms of DPMs and may not be affordable for the product.


Figure 4-8 Scan control signals set on Dynamic_node module (scan clock (clk) in green, scan enable (SE) in orange, scan-in data (SI) in red); source: Mentor Graphics Visualizer

When the test case is synthesized with one uncompressed chain, transition patterns cannot be generated. These patterns require setting up the logic with a first vector, then using two clock pulses to launch a transition (0 to 1 or 1 to 0) and capture the response in flops that are usually part of a different chain.

Figure 4-9 Stats reports for pattern set generated with power controller setting: shift on -switching_threshold_percentage 25 -rejection_threshold_percentage OFF

In the case of a design synthesized with multiple chains, generating transition coverage is not a problem. Figure 4-10 shows a cell tracing report of a design containing 5 unbalanced scan chains, while Figure 4-11 shows the case of 5 balanced chains.


Making sure chains are perfectly balanced does not guarantee that the ATPG tool will generate a higher test coverage figure; balancing chains is only important for test time. If one or more chains are longer than the remaining chains, the ATPG tool is forced to pad the patterns for the shorter chains with Xs (don't-care values). Having patterns of equal length is a requirement: all the scan flops are pulsed by the same scan clock, and all the chains (independently of their length) have to be synchronized with regard to the various scan procedures (they all have to be in shift mode or all in capture mode). For chain6 of Figure 4-10, the patterns generated will contain 309 extra cycles of no value for the test, and there is a potential test time loss between the balanced and unbalanced configurations of 62 cycles per pattern (458-396). Note that chain5 is deliberately missing from the list of chains; a chain name can be defined as anything, as long as it is different from any other chain definition.
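The padding cost described above reduces to a few lines of arithmetic; the chain lengths here are invented except for the 458 and 396 figures quoted in the text:

```python
# Sketch of the padding arithmetic: shift cycles per pattern equal the
# longest chain length, so shorter chains carry padded don't-care cycles.
def padding_waste(chain_lengths):
    """Return (shift cycles per pattern, total padded cycles per pattern)."""
    longest = max(chain_lengths)
    return longest, sum(longest - l for l in chain_lengths)

unbalanced = [458, 449, 430, 405, 149]  # invented lengths; a chain6-like outlier at 149
balanced_longest = 396                  # longest chain after rebalancing (from the text)
cycles, waste = padding_waste(unbalanced)
print(cycles - balanced_longest)        # 62 cycles potentially saved per pattern
```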

Figure 4-10 Cell tracing report of unbalanced chains design

Figure 4-11 Cell tracing report of balanced chains design

Test coverage achieved for the Dynamic-node design using the unbalanced chains was almost 1% higher than for the balanced configuration. This is not always the case; in fact, there is no relationship between chain balancing and coverage. Synthesis can control and link scan flops in a very specific way, and finding the right configuration is not always straightforward. Now that there are multiple chains available, it is possible to generate transition fault patterns. The transition patterns can be generated using the two methods described earlier: LOS (launch off shift) and LOC (launch off capture).

In the case of launch off capture, the ATPG tool has to provide a pattern that can set up a fault site to a specific value indirectly, after going through some combinational logic (with the first clock pulse). In some cases this is hard or impossible to achieve; when the tool cannot set up the fault site, it will not be able to test for it and there will therefore be a loss of coverage. In the case of launch off shift, the tool can directly control and set the fault site and, as a consequence, LOS ATPG is capable of reaching higher coverage. The downside of LOS is greater power requirements. Table 10 shows a comparison of LOS and LOC test coverage reports from patterns generated and executed on the Dynamic-node test case; LOS shows a much higher test coverage (over 61%) compared to LOC (below 50%); this is a very significant gain (over 10%) in test coverage with about 20% extra patterns. Unfortunately, there are two impediments to applying LOS in every design:

1. The design has to be capable of switching the scan-enable signal between launch and capture in a much smaller time, requiring a more stable signal tree, as discussed in Chapter 2.
2. If the design is targeting the low power market, accommodating LOS patterns might not be possible.

Table 10 Statistics report comparison between LOS and LOC transition test coverage (LOC and LOS statistics reports for transition ATPG runs on the Dynamic_node design)

In this test case, the WSA (weighted switching activity) peak value increased from 28.46% for LOC to 37.24% for LOS, while the state element transitions went from a 46.82% peak for LOC (an already high value) to 52.31% for LOS. In power-critical designs these values will cause a lot of issues during scan test.


In some cases, if scan enable is capable of toggling as fast as the functional frequency (point 1 above), it may be possible to use a hybrid approach to achieve the highest possible coverage: first run pattern generation using LOC (with its lower power requirements), then, targeting only the undetected faults from the first run, move to the LOS method and re-run pattern generation. As the number of faults to be targeted with LOS is now significantly reduced, the tool should be able to keep switching activity much lower and deliver the extra coverage needed.

Fault reports and fault grading

The graph in Figure 4-12 shows how easily an ATPG tool can generate patterns to detect a large percentage of faults with minimal effort. Looking at the stuck-at type (curve in red), the tool can reach around 90% coverage with 20% of the total pattern count and it will require the remaining 80% of the patterns to be able to detect the remaining (testable) faults. From an analysis of the statistics report, it is possible to identify a number of fault classes. The main classes shown in the report of Table 10, Figure 4-7 and Figure 4-8 are: DI (detected by implication), DS (detected by simulation), UO (unobserved) and AU (ATPG untestable). A full list of classes and subclasses used by Mentor Graphics is shown in Figure 4-13.

[Chart: test coverage (0%-100%) versus pattern count (0-18000); one curve for transition faults and one for stuck-at faults.]

Figure 4-12 Typical test-coverage Vs pattern count curves


These statistics reports (based on stuck-at pattern generation on the Dynamic-node test case) also show the breakdown of the test coverage per class. The faults labelled DI are faults located on the scan chains: as soon as the ATPG tool is able to trace the chains, all faults located on any instance of the scan chain paths are marked as detected. The tool would not be able to trace the chains if any issue were present at any of these fault sites; if the tool comes across such an issue, it will remain in setup mode until corrective action is taken (pattern generation will not be enabled). The DS faults are faults detected by the generated test patterns. The faults still to be detected are the UO (unobserved) faults, named as such because their effects cannot be propagated to an observable point with the current pattern set. The remaining faults are the AU faults; these are divided into multiple subclasses and will not be detected unless changes to the design or settings are put in place. The most important subclasses of AU faults are:

- PC (pin constraints): forces on primary (or pseudo-primary) inputs prevent the tool from applying the logic level needed to detect these faults.
- TC (tied cell): very similar to PC but applied to sequential elements; no number of pulses sent to the sequential element will change its driving value, hence the ATPG tool is blocked from propagating any other value capable of detecting these faults.
- SEQ (sequential depth): faults associated with non-scan cells that require multiple clock cycles to propagate the signal to an observation point. Only scan design changes can solve the non-detection of such faults. In this test case there are many SEQ faults, as the netlist does not include a clock controller capable of delivering multiple pulses for signal propagation during capture.
- BB (black boxes, not present in this test case but very important): faults that are untestable due to a black box. This class includes fault sites whose signals need to be propagated through a black box in order to reach an observation point, as well as faults whose control or observation requires values from the output(s) of a black box. In most cases black boxes are missing ATPG models, such as memories or analogue modules, which are available for RTL but not for ATPG. As too many resources would be required for a conversion process, some of them will have to remain as BBs for ATPG, provided they do not cause major test coverage losses.


After this brief description of fault reports and class types, we now consider the meaning and use of fault grading.

Fault grading is a way of reducing test time by eliminating most of the stuck-at patterns in the set. Silicon is tested using scan chain test patterns, stuck-at patterns and transition patterns, in that order. As test time is very costly, the chain test pattern is usually dropped and only used as a debugging aid for failing stuck-at and transition patterns. It is now also common to eliminate a large portion of the stuck-at patterns, and to do so it is necessary to use fault grading in ATPG. The idea is to use the same transition patterns to detect both transition and stuck-at faults, and then generate stuck-at patterns only for those fault sites not detected by the transition pattern set. The first step is to generate patterns for the transition fault type, and the transition pattern set is then saved. The tool is then forced to change fault type to stuck-at while remaining in the transition fault configuration. When the fault type change is made, the tool automatically deletes all the transition faults and the generated pattern set (hence the need to save the pattern set before moving to a different fault type). At this stage, it is necessary to add in all the stuck-at faults, reload the transition patterns and re-simulate them against the stuck-at faults. It is then possible to save the list of stuck-at faults detected by the transition pattern set.

The final phase is to start an ATPG session that sets the design into the scan stuck-at configuration mode and re-loads the saved list of stuck-at faults detected by the re-simulated transition patterns. A statistics report will show a level of coverage very close to the transition coverage obtained during transition pattern generation. The final step is to run stuck-at pattern generation, which now targets a much lower number of faults.
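The bookkeeping in the grading flow reduces to simple set arithmetic; the fault identifiers below are invented for illustration, and real flows manipulate Tessent fault lists rather than Python sets:

```python
# Sketch of fault grading with plain sets: stuck-at faults already detected
# by re-simulating the transition patterns are removed from the SA target list.
def remaining_sa_targets(all_sa_faults, detected_by_transition):
    """Stuck-at faults still needing dedicated SA pattern generation."""
    return set(all_sa_faults) - set(detected_by_transition)

all_sa = {"u1/A/s0", "u1/A/s1", "u2/Z/s0", "u3/B/s1"}  # invented fault list
graded = {"u1/A/s0", "u2/Z/s0", "u3/B/s1"}             # detected by the transition set
print(remaining_sa_targets(all_sa, graded))             # only u1/A/s1 remains
```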

The advantage of fault grading is obvious: there is no reason to re-run pattern generation on the same faults already detected by the transition patterns. The stuck-at configuration only has to target a much lower number of faults, with a significant reduction in the size of the stuck-at pattern set and a consequent reduction in total test time.


Figure 4-13 Complete faults classification used by Mentor Graphics

How to use ATPG tools to support GLS and silicon fails debug

As previously seen, chain test, stuck-at and transition patterns are generated and used for silicon testing. Whenever possible, all three sets of patterns are validated using GLS, to verify that neither the design nor the scan settings will cause issues during silicon test. Often, especially for large designs, the pattern sets can reach very high counts and it is practically impossible to process all of them, as the resulting GLS run time can go on for days. A sub-set (sample) of patterns is used for the stuck-at and at-speed content, while the chain pattern sets are fully simulated; these are generally smaller and very important for checking the sanity of the chains and the scan structure.

4.9.1 One-Hot patterns

When dealing with compressed patterns, the number of external channels used for scan inputs and outputs is very low. Once the pattern reaches the decompressor, it gets expanded and distributed to the internal chains; the number of internal chains per channel depends on the compression ratio.

When running GLS for a set of compressed patterns, the VCS tool will report the failing pattern number and the sequences of bits expected and simulated, for example:

    Mismatch on chain: EDT_group_1_channel_4 cell: 15
    2500ps: Simulated zz0111111xxx0101010100x0000100x01x10100 pattern 6 cycle 3133
    2500ps: Expected  xxxxxxxxxxxx0111010100x00xxxxxxxxxxxxxx pattern 6 cycle 3133
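A mismatch report like the one above can be parsed mechanically before deciding which patterns to expand; this is a hedged sketch, as the exact message format may vary between simulator and test-bench versions:

```python
# Hedged sketch: extract channel, cell, pattern and cycle from a GLS mismatch
# report. The message format is taken from the example above and may differ
# between tool versions.
import re

HEADER = "Mismatch on chain: EDT_group_1_channel_4 cell: 15"
DETAIL = "2500ps: Simulated zz01111 pattern 6 cycle 3133"  # bit string abbreviated

def parse_mismatch(header, detail):
    chan, cell = re.search(r"chain:\s*(\S+)\s+cell:\s*(\d+)", header).groups()
    pat, cyc = re.search(r"pattern\s+(\d+)\s+cycle\s+(\d+)", detail).groups()
    return {"channel": chan, "cell": int(cell), "pattern": int(pat), "cycle": int(cyc)}

print(parse_mismatch(HEADER, DETAIL))
```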

From the fail report, it is therefore possible to identify the failing pattern number, the failing cycle and the channel output. Unfortunately, the channel only narrows down the search for the failing flop; it does not pinpoint it, because each channel feeds a certain number of internal chains, and any of the scan flops positioned at the failing cycle in any of those chains can potentially be the failing flop (cell 15 in the example). At this point, it is possible to proceed through the VCS tool and tediously search for the failing instance, or to use the ATPG tool to expand the failing pattern, regenerate the test-bench for the new expanded set, rerun simulation and pinpoint the failing flops from that. The procedure is identical if the fail occurs on silicon. The expansion of patterns in Mentor Test-Kompress is named One-Hot and the procedure to obtain the expanded patterns is the following:
 Open a Tessent shell.
 Load the flat model that reflects the configuration used to generate the failing patterns.
 Load the set of patterns using set_pattern_source.
 Use set_pattern_filtering -list <list of failing patterns> to identify the patterns to be expanded.
 Execute the expansion with the expand_compressed_patterns command. A -map switch will create a file that maps the newly expanded pattern set to the internal chains targeted by each pattern. At this point the tool will verify the filtered patterns and output a small report (e.g. for a compression ratio of 50):
// Original pattern number 0 is expanded to 50 1hot patterns
// Total 50 1hot patterns are created.
 The tool then simulates and stores the patterns in memory.
 The last task is to save the patterns and test-benches with the write_pattern command.
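
As a rough illustration of this fan-out, the sketch below (Python, with a purely hypothetical chain-naming scheme) enumerates the candidate scan cells implied by a mismatch at one cell position of one external channel; in practice the real names come from the ATPG tool's channel-to-chain mapping file.

```python
def candidate_cells(channel, cell_index, chains_per_channel):
    """List the internal scan cells that could explain a mismatch
    seen at a given cell position on one external channel.

    With a compression ratio of R, each external channel observes R
    internal chains, so a single failing cycle maps to R candidate
    flops, one per chain, all at the same cell position.
    The chain naming scheme used here is illustrative only.
    """
    return [
        f"{channel}_chain{chain}_cell{cell_index}"
        for chain in range(chains_per_channel)
    ]

# Example: compression ratio 50, mismatch at cell 15 of channel 4
cands = candidate_cells("EDT_group_1_channel_4", 15, 50)
```

This is exactly why One-Hot expansion is needed: the 50 candidates above collapse to a single flop only after the pattern is expanded and re-simulated.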


Once the One-Hot test-bench has been obtained, GLS can be executed again, and the new fail log, together with the mapping file output by the One-Hot process, allows the user to pinpoint the failing scan cell.

If the failure was caused by the design, the process has to go all the way back to RTL or synthesis, then forward again through ATPG, GLS and so on. If the issue is an ATPG setting, it is just a matter of fixing it and regenerating the pattern sets. An ATPG modelling issue (e.g. standard cell library, PAD models, netlist) that cannot be fixed is usually masked, as it does not reflect reality, and the GLS fail is classified as a false fail. Masking is the process of changing the expected value in the pattern from a logic 0 or 1 to an 'observe unknown' value, indicated by the observe X symbol (OX); the GLS tool or the tester platform will skip any comparison when encountering an OX within the expected value pattern.

To mask a fail it is possible to implement either a cycle mask or a cell mask. A cycle mask masks a specific cycle in a specific pattern only. A cell mask masks, in every pattern, the cycle corresponding to a particular scan cell position within a chain. The choice between the two masking methods is suggested by analysis of the GLS (or silicon) fail log. If the same cycle fails in every pattern, it is clear that somewhere in a chain a scan flop is failing repeatedly; this case requires a cell mask. If there is a single fail in a pattern set, it is probably due to a specific fault path targeted by that pattern and should not concern the functional mode of operation. In some extreme cases, masking a full scan channel by applying a mask to the external pad may be required; it is also possible to mask an internal scan chain if needed. For example:
Cell mask: add_cell_constraint reg_14_/Q TX (ties the output of the register to unknown)
Pad mask: add_input_constraint /PAD[15] -CX (ties the scan pad to unknown)
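
The decision rule just described can be captured in a small helper script. This is an illustrative sketch only; the fail-tuple format and the threshold for "failing in every pattern" are assumptions, not Tessent behaviour.

```python
from collections import Counter

def suggest_mask(fails, min_patterns=3):
    """Suggest a masking strategy from a list of GLS/silicon fails.

    Each fail is a (pattern, channel, cycle) tuple.  If the same
    (channel, cycle) position fails across several patterns, a scan
    flop is failing repeatedly and a cell mask is suggested; isolated
    fails get a per-pattern cycle mask.  The threshold is illustrative.
    """
    counts = Counter((ch, cy) for _, ch, cy in fails)
    cell_masks = [pos for pos, n in counts.items() if n >= min_patterns]
    cell_set = set(cell_masks)
    cycle_masks = [f for f in fails if (f[1], f[2]) not in cell_set]
    return cell_masks, cycle_masks
```

A triage script like this is only a pre-filter: the final masking decision still rests with the engineer reading the fail log.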

If the chosen option is a cycle mask, the creation of a mask file is required. Once created, it is simply a matter of loading the failing pattern into Test-Kompress together with the mask file, and the tool will substitute the failing cycle values (0 or 1) with OX at the exact cycles specified by the mask. The mask file is very simple, with no header or footer, composed of a list of rows, each consisting of the pattern number followed by the scan channel input and the cycle number, e.g.:

0 edt_channel6 4
0 edt_channel6 5
0 edt_channel6 14

The above lines will mask cycles 4, 5 and 14 of channel 6 for pattern 0. A similar approach can be used for uncompressed patterns. To obtain the new pattern set with masked cycles, it is only required to run a read_patterns command with the mask file switch, followed by a write_patterns command:

read_patterns -mask_file -preserve_masked_observe_points
simulate_patterns (optional)
write_patterns

If the pattern set being masked was already masked by a previous iteration of the process, it is necessary to use -preserve_masked_observe_points to preserve the previously masked cycles. The simulate_patterns command is not a requirement, but it is useful for quantifying the coverage loss caused by the masking; this is usually done when the count of masked cycles is high.
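
Because the mask file format is so simple, generating it from a list of fails is trivial to script. The sketch below (Python; the fail-tuple format is an assumption) renders rows in the format just described:

```python
def write_mask_lines(fails):
    """Render cycle-mask entries in the simple Test-Kompress format
    described above: one row per masked cycle, consisting of the
    pattern number, the scan channel and the cycle number.
    No header or footer is required."""
    return "\n".join(f"{pat} {chan} {cyc}" for pat, chan, cyc in fails)

# Reproduce the example from the text: mask cycles 4, 5 and 14
# of channel 6 for pattern 0
mask = write_mask_lines([(0, "edt_channel6", 4),
                         (0, "edt_channel6", 5),
                         (0, "edt_channel6", 14)])
```

The resulting text can be written to a file and passed to read_patterns via the mask file switch.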

4.9.2 ATPG diagnosis tool

As with EDA tools, there are a large number of companies providing automated test equipment (ATE). The choice of a tester platform is based upon the design functionality and specifications (e.g. wireless functionality will require RF testing capability, high frequency will require higher-end and more up-to-date machines, high pin count will require much larger machines). Large designs will most likely have a very large number of input/output ports and will therefore require a large number of tester channels. All the ports of a package have to be connected to be tested, regardless of whether they are used for the scan configuration or not. Scan is the main test used for validating silicon, but a large number of other tests are also used. The tester platform (and its basic specifications) is chosen at a very early stage in the design life cycle. Scan test accounts for most of the digital content needed to test a design and, in most cases, also accounts for the majority of the total tester memory needed for the completion of a full chip test.

Tester software is custom built and varies widely between platforms. As a consequence, fail logs vary in format and in the amount of data reported.


Scan diagnosis is a Tessent sub-tool that can provide invaluable information for silicon debug. Both a scan diagnostics input file and a masking input file, as discussed in section 4.9.1, are generated from the data provided by the tester fail log. For data collection, the log files usually come in a compressed format to minimize storage space requirements, while for debug purposes they come in a human-readable format to allow a visual check, as well as the use of scripts that implement string manipulation for parsing and sorting the data contained in the log. A typical log contains information such as the name of the pattern file that caused the fails, the number of fails, the failing cycle, the failing port/pin, the expected value and the received value. Here is a log example:

//Pattern file name: test_chip_scan_edt_stuckat_date_rev_1
//Fail count: 7
767 TEST_CHIP_4 H L
771 TEST_CHIP_4 L H
772 TEST_CHIP_4 H L
1867 TEST_CHIP_5 L H
1879 TEST_CHIP_5 L H
2573 TEST_CHIP_5 L H
2700 TEST_CHIP_4 H L
failures_buffer_limit_reached none
failures_end
total_cycles 3417

As mentioned earlier, scripts can automate the translation of a tester fail log into a mask file (shown in section 4.9.1), or parse it into the Mentor ATPG fail format that can be used for executing Mentor ATPG scan diagnosis, as shown in the following example:

format pattern scan_test
failures_begin
8 edt_channel2 35 H L
36 edt_channel2 35 H L
44 edt_channel2 42 L H
.
.
52 edt_channel2 35 H L
53 edt_channel2 42 H L
failures_end
last_pattern_applied 59
failure_file_end

This file is composed of a header and a footer; the body is a list of fails indicating the pattern number, the edt/byp channel, the failing cell/cycle, and the expected and observed values for the specific pattern/cycle/channel.


Scan diagnosis is usually executed for silicon fails, not GLS fails. This is because the Mentor diagnosis tool does not just pinpoint the flop that captured the fail; it may also be able to give the most likely instance or instances that caused it (e.g. a simple logic gate, a mux, etc.). To run ATPG scan diagnosis with Mentor tools it is necessary to execute the following essential steps:
 Once in the Tessent shell, enter the scan diagnosis mode using set_context patterns -scan_diagnosis.
 Reload the flat model from the configuration used to generate the failing patterns (read_flat_model).
 Set the system mode to analysis (set_system_mode analysis).
 Load the pattern set that caused the fails using set_pattern_source external -store all.
 A series of diagnosis settings can be added with set_diagnosis_options. This command has a long list of switches available. The -mode auto switch enables Tessent to determine what type of diagnosis to run based on the contents of the failure file (chain diagnosis or scan diagnosis will be performed). Another important switch is -verify_patterns ON/OFF, which enables verification of the external patterns.
 The read_failures command checks the specified failure file for correct syntax and semantics; the -patterns switch displays, or writes to an output file, the failure file in a pattern-based format instead of the cycle-based format of the loaded failure input file.
 The diagnose_failures command performs a diagnosis of the failing scan and chain test patterns in the specified failure file. The output can be displayed or piped to a file. If verification is enabled (with the set_diagnosis_options command), the first diagnose_failures command entered will simulate the external patterns and compare the captured values with the expected values in the external patterns. When a mismatch occurs, verification stops and an error is issued.
 If the diagnosis report does not contain enough information to locate the faulty site, the tool provides the option to create_diagnosis_patterns. These patterns are then processed and executed on silicon, and the fails from this new set should narrow down the faulty site when scan diagnosis is re-run on the new fail log. Sometimes the diagnosis pattern count can be high, therefore it is not unusual to set a limit on the number of diagnosis patterns the tool will generate.


Silicon failure analysis debug

The ATPG diagnosis tool can be very useful for narrowing down the search for design flaws or physical fabrication defects. The data collected using diagnosis, while not vital for design adjustments (other tools and techniques may filter out these issues), is becoming essential for the execution of failure analysis (FA) on silicon components. Using data from scan diagnosis tools together with the physical layout plan of the design, engineers can identify the specific portion of silicon at which to point the FA tools.

Integrated circuit failure analysis is a post-manufacturing process that provides information necessary for technology adjustments and advancement, but most importantly for corrective action to improve quality and reliability in the current manufacturing process cycle and products. Performing an FA process includes the following steps:
 Verify the presence of a failure. This is essential for yield improvement; false fails in production testing can classify good ICs as bad, resulting in the incorrect removal of properly operating units.
 Characterize the symptoms of the failure. The failure symptoms cannot be fully described by a scan test result or any other test that identified the failure in the first place. Once completed, the characterization allows an analyst to use databases to fully understand the failure and confirm that the observed symptoms did in fact cause it. The symptoms will suggest one or more failure mechanisms, but usually only one will be the real cause. Analysing the wrong failure mechanism may lead to fabrication process changes that cause more failures than the pre-adjusted process.
 Establish the root cause of the failure. This is the most time-consuming task and involves localizing the site of the failure as well as establishing the failure mechanism and the stresses on the physical structure.
 Document and put forward corrective action. For best implementation, corrective action requires input from multiple experts: design, process and FA engineers.
 Document the results of the FA. Documentation allows occurring issues in the current fabrication process to be tracked and is invaluable for future FA work.
Many techniques and tools, destructive or non-destructive, are used to execute FA. They can be divided into five main categories:


1. Optical microscopy: a method for direct, non-destructive examination of the topographical features of IC components. These techniques can locate physical defects that affect reliability and cause marginality issues. They are very common, low cost and easy to apply.
2. Infrared microscopy: makes use of the transparency of semiconductor materials to infrared light to capture images of structures underneath the IC surface. This technique is very similar to microscopy in the visible region; infrared microscopy has a maximum resolution which is usually one to two times the illuminating wavelength.
3. Physical techniques: these are destructive. Although many FA techniques are non-destructive, in most cases it is necessary to perform destructive techniques to root-cause the failure. Before proceeding to one of these techniques, it is important to verify that no non-destructive technique can achieve the goal; applying them requires caution because their effects cannot be undone. Probing, cutting, mechanical, laser, or focused ion beam isolation can be used in conjunction with mechanical or electron beam probing to isolate and identify the cause of the failure.
4. Thermal imaging techniques: non-destructive techniques that may be required for defect localization when there are no other detectable symptoms, such as light emission from the failing site.
5. Electron beam techniques: the scanning electron microscope holds a central position in the collection of tools used for FA, providing high-resolution imaging, imaging through opaque layers, and images of the electrical activity within the silicon.

FA is used because, without it, it would be impossible to guarantee the reliability and performance of silicon products when the causes of failure and yield loss are not understood. Although FA is a very costly side of the business (setting up a lab can cost tens of millions of dollars), investment in failure analysis equipment ultimately results in reduced cost and improved quality. Most importantly, it is the primary tool for shortening the development cycle of new technology nodes, being the principal source of failure identification and manufacturing defect data and allowing faster changes in the manufacturing processes. [43]


Conclusion

This chapter has explained how to use Mentor Graphics' Tessent ATPG tool by applying it to the OpenPiton design. The chapter has presented the steps to be followed, from initial setup right through to application of the tool in failure analysis. The steps were shown in 'how-to-use' form, explaining the reasons for each setting in some detail. Presenting the application of the tool in this way is the first step in achieving the aim of the thesis: creation of an open methodology that allows scan insertion in an open-source design to enable scan test power analysis. The following chapter discusses the next step in the process, gate level simulation leading to power analysis.



5 Simulation from GLS to power analysis

Introduction

Gate-level simulation is an essential part of the product design life cycle in today's silicon industry. It is used both to boost confidence in the design implementation and to allow verification of functionality and test components. Scan and other test structures can compromise the functional operation of a design; simulation provides the means to guarantee that the introduced test features will not break the operational features requested by the end customer.

In this chapter, a generic, high-level description of HDL simulation software is provided, followed by a description of the need for GLS. The chapter also covers the setup and execution of a GLS flow for the scan patterns generated as described in section 4.6. Identification and debug of timing issues is a main goal of GLS execution, and this chapter dedicates a large section to its description. The execution of GLS for a case study is also described, before moving on to how an HDL simulation tool can be used to generate output files for power analysis tools.

HDL simulation software

Hardware description language simulation software has improved dramatically since its first appearance in the mid ’80s, and today many vendors supply simulation tools. They can be installed on simple desktop machines for personal use or on server farms (enterprise-level simulators) to execute jobs on very large designs. The three major suppliers of simulation software are Cadence with Incisive Enterprise Simulator, Mentor with ModelSim and Synopsys with VCS. Prices are not reported publicly, but they range between $25,000 and $100,000 USD per license per calendar year.

Table 11 gives a short description of these three main simulators.


Table 11 The three most used commercial HDL simulators [44]

ModelSim and Questa, by Mentor Graphics: The original Modeltech (VHDL) simulator was the first mixed-language simulator capable of simulating VHDL and Verilog design modules together. In 2005 Mentor introduced Questa to provide high-performance Verilog and SystemVerilog simulation and to expand verification capabilities to more advanced methodologies such as Assertion-Based Verification and Functional Coverage. Today Questa is the leading high-performance SystemVerilog and mixed-language simulator; ModelSim is still the leading simulator for FPGA design.

VCS, by Synopsys: Originally developed by John Sanguinetti, Peter Eichenberger and Michael McNamara under the start-up company Chronologic Simulation, VCS (Verilog Compiled code Simulator) was purchased by Synopsys, where development continued.

Incisive Enterprise Simulator, by Cadence Design Systems: Cadence initially acquired Gateway Design, thereby acquiring Verilog-XL. In response to competition from faster simulators, Cadence developed its own compiled-language simulator, NC-Verilog. The modern version of the NCsim family, called Incisive Enterprise Simulator, includes Verilog, VHDL and SystemVerilog support. It also provides a fast SystemC simulation kernel.

A number of open-source simulators are also available, but they do not offer the features or execution speed of their commercial counterparts. A list of open-source simulators that could be useful for researchers and students is shown in Table 12.


Table 12 Open-source simulators

Icarus Verilog (Stephen Williams): Also known as Iverilog. Good support for Verilog 2005, including generate statements and constant functions.

LIFTING (A. Bosio, G. Di Natale): LIFTING (LIRMM Fault Simulator) is an open-source simulator able to perform both logic and fault simulation for single/multiple stuck-at faults and single event upset (SEU) on digital circuits described in Verilog.

TkGate (Jeffery P. Hansen): Event-driven digital circuit editor and simulator with a tcl/tk GUI, based on Verilog. Includes the Verilog simulator Verga.

Verilator (Veripool): A very high-speed open-source simulator that compiles synthesizable Verilog to C++/SystemC. Supports functions, tasks and module instantiation.

Verilog Behavioral Simulator (VBS) (Lay H. Tho and Jimen Ching): Still lacks a lot of features, but has enough for a VLSI student to use and learn Verilog. Supports only behavioral constructs of Verilog and minimal simulation constructs such as 'initial' statements.

VeriWell (Elliot Mednick): This simulator used to be commercial, but has recently become GPL open-source. Compliance with IEEE 1364 is not well documented; it is not fully compliant with IEEE 1364-1995.

For this work, the HDL simulator chosen is Synopsys VCS. The choice is justified by the fact that VCS is one of the leading, industry-standard tools for gate-level simulation, as well as being available under the Europractice scheme in a large number of third-level education institutes [45].

Need for GLS execution in the design life cycle

GLS is one of the jobs in a design life cycle that requires a great deal of computing power and execution time. Therefore, EDA suppliers have developed ways to move some aspects of verification that were part of this process to an earlier design stage. For example, Logical Equivalence Checking (LEC) is a logic synthesis tool that checks that the synthesized netlist is logically/functionally equivalent to the RTL source code. Another example is Static Timing Analysis (STA), which is now executed as part of the synthesis step, as opposed to the more traditional approach of using GLS for timing verification.

GLS simulates the compiled netlist, which contains timing data. Because compiled code is used, the source can be in a variety of languages. The downside of GLS is simulation run time, which is very long in comparison with an RTL simulation. This latter process can run directly on RTL code without the need for compilation, but lacks timing information and only supports HDL languages. GLS run time depends directly on the number of scan patterns being simulated. The number of patterns is destined to become larger and larger due to fault model changes (from the more traditional SA and AS models to UDFM such as CA), the use of low-power settings that literally expand the pattern set to spread out the total toggle count found in a standard set (constraining a number of EDT scan chains to reduce the toggle rate), and the increase in size and complexity of designs (and the consequent increase in fault count).

It is advisable to verify all the scan pattern sets in their entirety using GLS. This would almost guarantee the validity of the design, the test architecture and the scan signals generated during ATPG for the silicon test. However, the pattern sets of today's SOC designs may require weeks of simulation time under GLS. Such execution times would put a design on hold for several weeks before it could be sent to the fabrication phase, and if we also consider that new technology nodes are extending fabrication time, a silicon manufacturing company would be waiting months before releasing a finished product. To overcome these issues, it is now normal practice to run GLS on samples of the full scan pattern sets only, before sending the design to fabrication. The only exception is the chain test patterns, as they are limited in number and they guarantee the integrity and traceability of the scan chains. The remaining digital content destined for the production test is also processed through GLS while the design is already going through fabrication. Overlapping the two phases (validation and fabrication) helps reduce the length of a product design/fabrication cycle, while still providing all the data needed to highlight potential issues that an engineer may come across during the debugging of test programs, as well as providing the information needed to select the best corrective action to allow scan test implementation and execution on current and future products.


Thus, GLS plays a major role in validating scan patterns. Execution run time is closely related to the amount of data simulated. Consider a test case of an ATPG run that produced 10K patterns (an exhaustive run) with a corresponding test coverage of 90% (a percentage commonly reached for SA and AS): simulating just the first 100 patterns (approximately equivalent to over 50% of the design's SA test coverage) will take one hundredth of the time needed to simulate the entire 10K set. The decision on running the full set of patterns will obviously be based on machine availability, licence availability (licences are very costly and companies may be reluctant to increase their number just to process extra patterns) and time availability. Not simulating the full digital content is a calculated risk, as even "GLSing" a minimum percentage of it provides confidence in the structural sanity of the test architecture, the clock trees and the power domain structures. It should also be taken into account that the full flow (synthesis, scan insertion, ATPG and simulation) will have been executed multiple times to eliminate functional and DFT bugs by this stage of a design flow, so the chance of finding more bugs while the silicon is fabricated (although not impossible) is much slimmer.
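
The linear relationship between pattern count and run time suggested here can be captured in a one-line estimate. The numbers in the example are illustrative only, and the linear-scaling assumption ignores testbench setup overhead:

```python
def sampled_runtime_hours(full_runtime_hours, total_patterns, sampled):
    """Estimate GLS run time for a sampled pattern set.

    GLS run time scales roughly linearly with the number of patterns
    simulated, so sampling 100 of 10K patterns takes about one
    hundredth of the full run.  Linear scaling is a simplifying
    assumption; per-run setup overhead is ignored.
    """
    return full_runtime_hours * sampled / total_patterns

# e.g. a hypothetical 500-hour exhaustive run, sampling 100 of 10K patterns
est = sampled_runtime_hours(500, 10_000, 100)
```

An estimate like this is what drives the calculated-risk decision described above: trading a small loss of pre-tapeout confidence for weeks of schedule.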

There are a number of advantages and reasons for using GLS, some of which are discussed here:
 To verify DFT structures not included in RTL and added during synthesis. Scan chains are often inserted after the gate-level netlist has been created. GLS is often used to determine the integrity of the scan chains, which is paramount, as a device that cannot be tested cannot be sold.
 To verify low-power structures, absent in RTL and added during synthesis. The insertion of these structures is now very common because the majority of the products designed today target portable, battery-powered devices, which consequently must be able to operate at very low voltage in order to extend battery life.
 To verify that no bugs are present in multi-cycle paths. Timing verification on multi-cycle/asynchronous paths can only be validated with GLS; it makes up for the limitations of static timing analysis executed post-synthesis, which cannot identify asynchronous paths, and is currently the only check point for them. Critical design paths are usually preserved during synthesis to limit complications. In the case of asynchronous paths, timing checks are turned off in order to avoid timing violations; this reduces redundant debugging, as they would otherwise produce X propagation in simulation. Also, technology libraries at 45nm and below have far more timing checks, much more complex than older process nodes, and GLS is essential [46].
 Power estimation: in most cases GLS provides the input to the power analysis tools used to estimate the design's power requirements.
 To verify power-up and reset operation, which are critical points for a device. Being able to verify their functionality is a major milestone for any project. Validating the settings of a device to guarantee that it has entered test mode or functional mode (or any other mode listed in the design specification) requires a number of different GLS tests. This allows validation of the interdependencies of the signals producing the initial state of the device and guarantees that the reset sequence generated in RTL actually produces the desired outcomes.
 To identify glitches on edge-sensitive signals due to combinational logic on clock and reset paths, as well as to verify the functionality of complex feedback loops.
 To verify that the design can handle the frequency it is planned to work at, with actual delays in place (SDF annotation), preventing X propagation and timing violations from slipping through to the final product. Simulation in ATPG may optimistically assign known logic values (0 or 1) to an input pin or the output of an instance, while a GLS simulator would identify it as unknown (X).

Timing analysis in physical design

The ease of access to journals, articles and manuals from electronic databases (e.g. IEEE Xplore [47]) offers an enormous amount of data, information and debug methods in regard to timing violations, but many people working in the field still have difficulty understanding all the related concepts. What exactly is timing analysis, and why is it the most important aspect of designing a circuit? During design, there are three types of constraints: timing, power and area. Trade-offs between speed/timing, area and power are very common; nevertheless, a chip must meet its timing constraints in order to operate at the desired clock rate, hence timing constraints are not as flexible as power and area, and are the most important of the three. Simulation is an essential method for applying timing analysis to a design, making sure the circuit works properly at different frequencies over the entire specified operating environment (frequency is subject to variation due to device and ambient temperature). Clock speed also affects component selection at an earlier stage of the design life cycle: integrating a component that is too slow compared to a processor module may reduce the overall frequency.

Timing analysis is the methodical analysis of a circuit to determine whether the timing constraints (set-up, hold and pulse-width) are being met. There are two types of timing analysis:
 Static Timing Analysis, which checks the static delay constraints of the circuit without the use of any input or output signals.
 Dynamic Timing Analysis, which applies input signals and checks for the expected output signals to verify design functionality.

Figure 5-1 Ideal vs Real clock waves [48]

The main items affected by timing analysis are the clocks and the sequential components (mainly flip-flops). The required characteristics for a clock are based on the technology node used for fabrication of the device. These are:
 Minimum pulse width (at both low and high states).
 Glitch-free operation (no unwanted pulses present in the clock wave).
 Minimum jitter. The importance of jitter increases as the operating frequency increases. The use of some components, such as PLLs, may introduce jitter, which is a timing uncertainty between a desired signal or time frame and the actual signal (see Figure 5-1).
 Duty cycle (ideally 50/50).
The required characteristics for a flip-flop are:
 The setup and hold times of data (or the scan input in the case of scan flops) are met for the worst-case clock signal (earliest/latest clock edge).
 Passing data between flops on different clock domains, and asynchronous sets and resets, have to be dealt with on a case-by-case basis.
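
As an illustration, the clock characteristics listed above can be checked numerically from a list of measured edge times. The sketch below is a simplified model (it assumes clean, strictly alternating edges), not a real waveform-analysis flow:

```python
def clock_stats(rising_edges_ns, falling_edges_ns):
    """Derive the clock characteristics listed above from measured
    edge times: mean period, duty cycle, and peak-to-peak jitter
    (spread of the cycle-to-cycle periods).

    Assumes edges strictly alternate, starting with a rising edge;
    the input lists are illustrative stand-ins for extracted
    waveform data.
    """
    # period of each cycle: time between consecutive rising edges
    periods = [b - a for a, b in zip(rising_edges_ns, rising_edges_ns[1:])]
    mean_period = sum(periods) / len(periods)
    # high time of each cycle: falling edge minus the preceding rising edge
    high_times = [f - r for r, f in zip(rising_edges_ns, falling_edges_ns)]
    duty = sum(high_times) / len(high_times) / mean_period
    jitter_pp = max(periods) - min(periods)
    return mean_period, duty, jitter_pp

# Ideal 10 ns clock: 50/50 duty cycle, zero jitter
period, duty, jitter = clock_stats([0, 10, 20, 30], [5, 15, 25, 35])
```

On a jittery clock the peak-to-peak figure grows, which is exactly the uncertainty Figure 5-1 contrasts against the ideal waveform.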


Basics of timing analysis

As previously mentioned, there are two types of timing analysis: Static Timing Analysis and Dynamic Timing Analysis. Static timing analysis is a way of validating the timing performance of a design based on the analysis of timing violations over all possible paths. The process considers the worst-case scenario in terms of delay through each instance (gate or sequential element) and overlooks its logical operation. [49]

Static timing analysis run time is much faster than circuit simulation because it does not simulate multiple test vectors; it only checks the worst-case timing for all possible logic conditions across the entire design. In simple terms, it is the method for checking whether the correct data value will be present at a specific instance input at the clock edge under all conditions (by checking the worst case). The word static refers to the fact that the check is carried out in an input-independent manner; it analyses all the possible logic paths (false and not false) within the design. It must be noted that STA is suitable only for fully synchronous designs, but since a large percentage of products are designed to operate synchronously, it has become a fundamental process in system validation with regard to timing checks. Although STA does not require a long run time, it has been stripped from the GLS step of a product design flow and placed at an earlier stage (see 5.3).

5.5.1 Static timing analysis

There are three main steps to check for static timing violations:
• The design is broken down into sets of timing paths
• The signal propagation delay along each path is calculated
• The results are checked against the timing constraints of the paths in the design and at the input/output interfaces

The STA tool analyses all paths from every start-point to every end-point and compares each against the constraint that should exist for that path. All paths are time constrained; most are constrained by the definition of the clock period and by the timing characteristics of the primary inputs and outputs of the circuit. A few key terms in timing analysis (TA) are: timing path, arrival time, required time, slack and critical path. Timing paths can be divided by type of signal: data paths, clock paths, clock gating paths and asynchronous paths. The start and end point definitions vary per type of timing path, and it is very important to understand this clearly in order to interpret timing analysis reports and to debug timing violations. The two main types of paths are data paths and clock paths: for a data path, the start-point is a place in the design where data is launched by a clock edge; the data is propagated through combinational logic in the path and then captured at the end-point by another clock edge. For a clock path, the start-point is the clock input port and the end-point is the clock pin of the flip-flop/latch/memory element (sequential cell).

There are other definitions and types of path which are used in timing analysis reports; these are subsets of the aforementioned paths with some specific characteristics. Some of these names are:
• Longest path: also known as the worst path, late path, max path or maximum delay path
• Shortest path: also known as the best path, early path, min path or minimum delay path
• Critical path: the single path that creates the longest delay. These are timing-sensitive functional paths, and because of the timing values on them, no additional gates can be added, in order to avoid increasing the total delay of the critical path itself.
• Timing critical paths: those paths that do not meet timing. The flow requires that after synthesis the tool identifies and reports the paths which have a negative slack. The first step is to make sure those paths are not false or multi-cycle paths, because in that case they can be ignored.
• False paths: these are physically present in the design but are logically/functionally incorrect because no data is ever going to be transferred from start-point to end-point. The goal in timing analysis is to perform timing checks on all “true” timing paths, therefore false paths are excluded from timing checks. Since false paths are not exercised during normal circuit operation, they typically do not meet timing specifications, but the procedure to fix their timing violations is unnecessary as it adds no value to the product.
• Multi-cycle paths: paths designed to use more than one clock cycle for the data to propagate from the start-point to the end-point. To define a path as multi-cycle, it is necessary to specify (to the STA tool) how many (N) clock cycles are needed for the data to propagate from the start to the end point.
• Single-cycle path: a timing path able to propagate the data from the start-point to the end-point within a single clock pulse.
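To make the slack and critical-path terminology concrete, the following sketch computes arrival time, slack and the critical path for a handful of data paths, in the way an STA tool does for millions of them. The path names and delay values are invented for illustration and are not taken from the OpenPiton design.

```python
# Toy STA slack calculation (illustrative values, not from a real design).
# arrival  = clock-to-Q + combinational delay (data launched at t = 0)
# required = clock period - setup time
# slack    = required - arrival  (negative slack = timing violation)

CLOCK_PERIOD = 10.0  # ns, hypothetical

paths = {
    # name: (clk_to_q, combinational_delay, setup_time) in ns
    "u_alu/add":  (0.3, 7.2, 0.2),
    "u_alu/mult": (0.3, 9.7, 0.2),  # longest (max) path
    "u_ctrl/fsm": (0.3, 4.1, 0.2),
}

def slack(clk_to_q, comb, setup, period=CLOCK_PERIOD):
    arrival = clk_to_q + comb
    required = period - setup
    return required - arrival

slacks = {name: slack(*d) for name, d in paths.items()}
critical = min(slacks, key=slacks.get)  # worst (most negative) slack
print(critical, round(slacks[critical], 2))
```

Here the multiplier path misses the 10 ns period and shows up as the critical path with negative slack, while the other paths pass with margin.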


In some cases, paths can be logically true (a signal could propagate through their logic) but are not functionally true. Such cases arise when the design's scan test settings cause a particular path to become a true path, even though in reality that path will never be stimulated or enabled in functional mode. These types of paths may fail timing checks, and if that is the case, it is the engineer's job to verify whether timing adjustments are required purely to get scan test operations to work. The second option is to ignore the timing violation if it causes no issues for scan operation. Also, when moving on to dynamic timing analysis, the violation can be bypassed by applying a mask to the failing cycle in that specific vector (which preserves test coverage and eliminates the false fail).

To shorten timing analysis run time, it is possible to declare perfectly good paths as false. The functional operation of a design can dictate, suggest and identify a large number of paths for which timing analysis is not required, as in the case of paths between two multiplexed blocks that are never enabled at the same time during the normal mode of operation. This approach can eliminate a very large amount of timing analysis execution, as well as the need to identify and remove individual false paths and multi-cycle paths from an .SDF file.

5.5.2 What is setup and hold time?

During transitions of the clock signal, there is a meta-stable window within which data must not change. If data changes in this window, the correct data will not be captured reliably. For this reason, the input signal must meet specific setup and hold time requirements, defined as follows:
• Setup time: input data must arrive this amount of time before the clock rising edge so that valid logic data can be safely stored in the flip-flop.
• Hold time: input data must be held for this period after the clock edge in order for the valid data to be safely stored in the flip-flop.
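The two requirements can be sketched as a pair of slack checks at a capture flip-flop. All numbers below are hypothetical library and path values, used only to show the direction of each inequality:

```python
# Sketch of the two checks a timing tool performs at a capture flip-flop.
# Hypothetical values in ns; positive slack means the check passes.

def setup_slack(period, clk_to_q, data_delay, t_setup):
    # Data must settle t_setup before the next capture edge.
    return (period - t_setup) - (clk_to_q + data_delay)

def hold_slack(clk_to_q, data_delay, t_hold):
    # Data must stay stable t_hold after the same capture edge,
    # so the shortest data path must be slower than t_hold.
    return (clk_to_q + data_delay) - t_hold

s = setup_slack(period=10.0, clk_to_q=0.3, data_delay=9.0, t_setup=0.2)
h = hold_slack(clk_to_q=0.3, data_delay=0.05, t_hold=0.4)
print(s, h)  # positive setup slack; negative hold slack (violation)
```

Note that the hold check does not involve the clock period at all, which is why, as discussed below, a hold violation cannot be fixed by slowing the clock.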

Timing parameters such as max time (critical path setup time) and min time (hold time) are fine-tuned throughout the design until the very last day available before design delivery to the foundry. While violating setup time forces a reduction of the highest frequency at which a chip can potentially run, which is undesirable, violating hold time is catastrophic. As post-process adjustments to the operating frequency (PLL trimming) can overcome setup issues, the majority of debug time during product design is spent dealing with the hold-time requirements.

The reason for a hold-time violation is a race condition between a data and a clock signal. When this race condition is aggravated by the presence of a larger than expected delay (skew) on the clock, or a smaller than expected delay on the data signal path, a hold-time violation will occur. Clock skew will always be present in any design and in any part of it; hence the goal is not to eliminate clock skew but to keep it under control using corrective methodologies. Clock skew can be caused by a number of factors:
• Process variation can cause larger than tolerable clock skew despite the quality of the clock tree design.
• Marginal power grids, and hence IR drops, can cause race conditions which manifest as hold-time violations on silicon. IR drop is not examined during timing analysis, but GLS provides the waveforms to achieve such validation at the last stage of the flow, during power analysis.
• Inductive effects can cause clock skew, and since inductance is not usually modelled in most designs, unexpected clock skew can reach the silicon stage [50]. The use of UDFM and cell-aware models prevents some of these effects because they contain capacitive and inductive component information.

5.5.3 Recommendations to reduce and eliminate setup-hold violations

There are many recommendations that can be applied through the various design phases in order to reduce setup and hold timing violations. These help designers reduce the number of iterations and put fixes in place in simple and fast ways. A setup violation essentially means that the data path is “too slow” compared to the clock speed at the capture flip-flop. Keeping this in mind, consider the most commonly used methods to avoid setup-hold violations:
• Reduce the amount of buffering in the data path, which reduces the cell delay but increases the net delay. Cell delay is the delay from input to output of a logic gate in a path. Net delay is the delay from the output of a cell to the input of the next cell in a timing path; therefore, if the cell delay is reduced by more than the net delay increases, the effective stage delay decreases (the combined gate and net delay is often called stage delay).
• Replace buffers with two inverters placed farther apart: using two inverters in place of one buffer reduces the overall stage delay, as the inverters decrease the transition time by roughly 2X compared to buffers. Since the cell delay of one buffer gate ≈ the cell delay of two inverter gates, the stage delay in the two-inverter case ends up smaller than the stage delay with a single buffer in the same path.
• Replace HVT cells with SVT/RVT or LVT ones. HVT stands for high voltage threshold; these cells can be used in paths where timing is not critical, as they provide power savings. LVT stands for low voltage threshold; these should be used in timing-critical paths because they decrease the transition time and so the propagation delay decreases. LVT cells are faster, but their downside is power consumption due to leakage, so they should be used only when timing is critical. SVT (standard voltage threshold) or RVT (regular voltage threshold) cells are a compromise between both worlds, providing medium timing delay and medium power requirements; hence, if the timing violation is marginal, HVT cells can be replaced by SVT/RVT ones, and only if necessary will LVT cells be used. All cell types have the same size and pin positioning, so no other layout adjustments are required. Note that introducing a large number of LVT cells would increase the overall SOC leakage current, which is not always a viable option for low power applications.
• Increase the driver strength by upsizing the cell. In general, larger cells are faster and can be used in setup time fixes. As for LVT cells, the downsides are higher power consumption and increased layout area.
• Buffer insertion: buffers are inserted to decrease the overall wire delay by decreasing transition time, hence the overall delay also decreases. The downside is an increase in area and in power consumption.
• Cell positioning adjustments in layout. The automated layout tool should optimize the design for timing, but in some cases manual adjustments will be required.
• Clock skew (intentionally delaying the clock) is a very common technique to eliminate setup-hold timing violations. This is achieved by delaying the clock to the end-point, which relaxes the timing of the path. Note that it cannot be applied to clocks feeding into critical paths.
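The clock-skew technique in the last bullet can be illustrated numerically. In this sketch (all values invented), delaying the capture clock buys setup slack on a slow path, but the same skew eats into the hold margin at the same endpoint, which is why it must be applied carefully:

```python
# Effect of intentionally delaying the capture clock (useful skew).
# Hypothetical values in ns; positive slack means the check passes.

def slacks_with_skew(period, clk_to_q, max_path, min_path,
                     t_setup, t_hold, capture_skew):
    # The capture edge moves later by capture_skew.
    setup = (period + capture_skew - t_setup) - (clk_to_q + max_path)
    hold = (clk_to_q + min_path) - (t_hold + capture_skew)
    return setup, hold

before = slacks_with_skew(10.0, 0.3, 9.7, 0.6, 0.2, 0.1, capture_skew=0.0)
after = slacks_with_skew(10.0, 0.3, 9.7, 0.6, 0.2, 0.1, capture_skew=0.3)
print(before)  # setup fails, hold passes
print(after)   # both pass, but the hold margin has shrunk
```

The trade-off is visible directly: every picosecond of useful skew added to the setup side is subtracted from the hold side of the same capture flop.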

A hold violation is the opposite of a setup violation; the hold violation occurs when data is too fast compared to the clock at the capture flip-flop, and to fix it, delay should be added to the data path.


It is essential to solve hold violations prior to fabrication, unlike setup violations, where the clock speed can be reduced to a level at which the SOC starts working correctly. The common fixes are:
• Adding delay to data paths through the use of extra buffer/inverter pairs or delay cells. The hold violation path may share its start-point or end-point with setup violation paths, hence extra care is required to avoid introducing new violations when fixing one.
• Decreasing cell sizes in data paths; the best option is to downsize the cell closest to the capture flip-flop, as this is less likely to affect other paths and cause new violations.

As previously mentioned, setup violations can be eliminated as a post-fabrication process during testing. Speed binning is a technique used to classify packaged units into different bins based on their passing frequency regions. In some cases, it is convenient to reduce the clock speed to bring units back into a scan test passing region. This confirms that the unit works properly, but at a lower speed. It is often the case that a foundry will sell faster units at a premium price and downgrade the price and specifications of the others. This process reduces waste and provides more options to customers.
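Speed binning reduces to a simple classification by maximum passing frequency. The sketch below uses invented bin edges and unit results, purely to show the mechanism:

```python
# Sketch of speed binning: classify tested units by the highest clock
# frequency at which they still pass scan test. Bin edges are invented.

BIN_EDGES_MHZ = [(3000, "premium"), (2600, "standard"), (2200, "economy")]

def bin_unit(max_passing_mhz):
    for edge, label in BIN_EDGES_MHZ:
        if max_passing_mhz >= edge:
            return label
    return "scrap"  # fails even the slowest sellable bin

units = {"U1": 3150, "U2": 2710, "U3": 2300, "U4": 1900}
bins = {uid: bin_unit(freq) for uid, freq in units.items()}
print(bins)
```

Units that would have been discarded as setup-failing at the nominal frequency are instead recovered into slower, cheaper bins.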

Note that the majority of the methods applied in STA debug and fixing can also be applied to DTA (GLS).

5.5.4 Dynamic timing analysis

Dynamic timing analysis is more commonly known as gate level simulation with timing information. It is specifically used to validate test vectors, which are design/application specific. The quality of the dynamic timing analysis (DTA) rises incrementally with the number of input test vectors. The downside of DTA is execution time, because increasing the number of test vectors increases run time; the upside is that it can be applied to synchronous as well as asynchronous designs. The most common GLS is the so-called min-max analysis method, which means the following:
• Typically, the minimum version of the delays is used to verify that the circuit works under best-case timing (no hold issues); hence for the min case the tool selects the easiest conditions: the highest timing margin from the minimum-typical-maximum (min-typ-max) trio of values contained in the vectors. In physical terms this is equivalent to testing silicon at cold temperature (0° Celsius at least, potentially down to minus 40° for military grade applications) and at the higher end of the voltage supply rail (nominal voltage plus a margin, which is +10% in most cases, going up to 15% for applications targeting cheap markets and down to 5% for more demanding designs).
• Typically, the maximum version of the delays is used to verify that the circuit works under worst-case timing (no setup issues); hence for the max case the tool selects the hardest conditions: the smallest timing margin from the min-typ-max trio present in the vectors. In physical terms this is equivalent to silicon testing at hot temperature (at least 70° Celsius, with values going up to 125° for military grade applications) and at the lower end of the voltage supply rail (nominal voltage minus a margin, typically -7%, relaxed to -5% for cheaper designs and extended to -10% for military/aerospace/medical markets).
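The min-typ-max selection itself is mechanical: each delay in an SDF file is a triplet, and a corner run simply picks one element of every triplet (this is what the simulator's +mindelays/+typdelays/+maxdelays selection does). The triplet values below are invented:

```python
# SDF stores each delay as a (min, typ, max) triplet. A min-corner run
# picks the first value of every triplet, a max-corner run the last.
# Triplet values are invented for illustration.

sdf_triplets = {
    "u1/A->Y":   (0.11, 0.15, 0.21),
    "n42 (net)": (0.02, 0.03, 0.05),
}

def select_corner(triplets, corner):
    index = {"min": 0, "typ": 1, "max": 2}[corner]
    return {path: t[index] for path, t in triplets.items()}

totals = {c: round(sum(select_corner(sdf_triplets, c).values()), 2)
          for c in ("min", "typ", "max")}
print(totals)  # total path delay seen at each corner
```

A min-max run uses both the first and the last element for every instance, which is what turns single arrival edges into arrival windows.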

Under min and max timing analysis, both delay conditions of the design instances are used to generate timing ranges/windows (earliest and latest arrival data) instead of single edges (typical). Since outputs are in turn fed into inputs, cascading and managing (merging) these ranges is very complex, and as both the min and max values of the delays are used, the simulation can become very slow. The most important issue with DTA is that it cannot test the whole design, resulting in incomplete coverage. The coverage is dictated by the vectors selected for simulation, and it checks only the logic and paths exercised by the patterns, with a high probability of leaving out critical paths, which then remain untested for timing problems. Specific paths cannot be targeted unless they were targeted during pattern generation, when the pattern is created. DTA models have to carry more information than logic simulators, making their performance slower, and each design component must contain functional and timing information in its model before simulation can be executed. For this reason, new IP blocks may not be simulated if they lack functional models. In some cases, to slim down the process, only typical timing values are used for simulation. It may then be possible to simulate a large number of vectors, possibly the full digital content, but at the expense of missing out precious data. This provides a certain level of validation, especially on the functional side, but it cannot identify timing violations due to setup (for which max conditions should be used) or hold violations (detected by a min simulation). The obvious downside of min-max is literally doubling the simulation run time, which will be an issue for large designs. In such cases a hybrid setup may be the best option: executing the first 10% of the patterns under min-max conditions will identify setup and hold violations on a large part of the design (10% of the pattern set can achieve up to 90% of the fault coverage of a design), with the remaining 90% of the vectors executed under typical timing values. This hybrid option therefore requires running the equivalent of 110% of the digital content, but it allows validation of the full design functionality and also provides the best option for identifying the largest percentage of the timing violations of a design.
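The 110% figure for the hybrid setup follows from simple arithmetic: a min-max run simulates every vector at both corners, so it costs twice a single-corner run. A small sketch:

```python
# The 10% min-max / 90% typical hybrid from the text, as arithmetic.
# A min-max run simulates every vector twice (once per corner), so it
# costs 2x relative to a single typical-corner run.

def relative_runtime(minmax_fraction):
    return minmax_fraction * 2 + (1 - minmax_fraction) * 1

print(relative_runtime(0.0))  # typical only: 1.0x
print(relative_runtime(1.0))  # full min-max: 2.0x
print(relative_runtime(0.1))  # hybrid: 1.1x ("110% of the digital content")
```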

In conclusion, GLS provides the following advantages:
• It extends the coverage of circuit simulation (from edges to windows)
• It evaluates worst-case timing using both min and max delay values for instances, with the same vector stimulus
• It does not report false errors, as the paths targeted are “selected” by the patterns generated in ATPG

The disadvantages are the following:
• It is not complete; at best it will reach coverage above 90% of the design.
• It is slower than logic simulation and may require additional test stimulus.
• It requires functional behavioural models.

GLS validation allows violations to be reported and identified in terms of simulation times and states. As the source vectors functionally exercise the logic within the scan flops, false errors on unused or irrelevant paths are not captured. This is also avoided through the use of an SDC (Synopsys design constraints) file as input to the ATPG tool prior to pattern generation. The use of SDC files is fundamental to avoid generating at-speed content for paths of the design that are never operated in functional mode. Generating such vectors will almost certainly produce GLS false timing failures, which should not require debug; but when the same vectors are then used for high-volume silicon testing, they will produce fails that require processing in order to find the root cause, adding to the workload of silicon test debug and contributing to the already expensive phase of product testing. SDC files are used during synthesis as well as pattern generation, and they consist of all the timing definitions and constraints essential to meet the design goals in terms of area, timing and power, and to obtain the best possible implementation of a circuit. There is a common format for constraining the design, supported by almost all tools (from different vendors). The file is saved with an .sdc extension, its syntax is based on the Tcl language, and its contents are the following:
• False paths
• Input/output delays
• Clock definitions
• Generated clocks
• Min/max delays
• Multi-cycle paths

5.6 Steps required to run GLS using Synopsys VCS

Synopsys VCS MX® (Verilog Compiler Simulator) is a compiled-code simulator. It is capable of analysing, compiling and simulating Verilog, VHDL, mixed-HDL and SystemVerilog designs. There are three main steps in debugging a design:
• Compiling the Verilog/VHDL source code.
• Running the simulation.
• Viewing and debugging the generated waveforms.

It is possible to run the tool interactively to execute the above steps. VCS first compiles the Verilog source code into object files, which are simple C source files; it then invokes a C compiler to create an executable file. This is done without generating assembly language files, and the executable is then used to simulate the design.

The first step consists of analysing all Verilog source files. This is achieved by the ‘vlogan’ command. If the design is made of VHDL source, the analysis is performed with the ‘vhdlan’ command. In the case of a mixed design (Verilog, VHDL), the tool makes use of a setup file named synopsys_sim.setup to configure its environment. This file maps the VHDL design library names to specific host directories, sets search paths, and assigns values to simulation control variables. An example of the setup file, sourced from the VCS user guide, follows:

-- Example synopsys_sim.setup
-- see ${VCS_HOME}/bin/synopsys_sim.setup
-- Logical Library Mappings
WORK > TB_LIB
TB_LIB : /prj/libs/tb_lib
DUT_LIB : /prj/libs/dut_lib
IP_BLOCK : ${VENDOR_LIB_PATH}
-- Simulator Variable Settings
ASSERT_STOP = ERROR
ASSERT_IGNORE = WARNING
TIME_RESOLUTION = 10 ps

Once the design analysis has been executed, the next step is to compile the design. This is done with the ‘VCS’ command. To be noted: if the design is composed purely of Verilog modules, the vlogan command does not need to be executed, but this is very rare; usually the complexity of the design will require the additional HDL analysis step.

The following is an example of the vlogan command:

vlogan [-help] [+define macro] [-f file] [+librescan] [+incdir+dir] [-l logfile] [-q] [-v file] [-y libdir] [+libext lext] [-work logical_lib] [-resolve] [+nospecify] [+notimingchecks] verilog_design_files

The most common switches used for the execution of the vlogan command are:

+v2k - Enable Verilog 2001 constructs
-sverilog - Enable SystemVerilog constructs
-timescale=1ns/1ps - Specify default timescale
+define - Define a macro
-f file - Specify files as well as switches
-l logfile - Log file generation
-q - Quiet (no internal messages and banner)
-v - Verilog library file
-y - Directory of Verilog library files
+libext+ - Library file extensions
-work - Analyse into a different logical library
+nospecify - No timing or timing checks
+notimingchecks - No timing checks


The second step is elaboration using VCS (design compilation). During this phase, VCS makes use of the intermediate files generated during analysis, builds the instance hierarchy and generates an executable simv file (binary). This executable is used for simulation. There are two modes for design elaboration, and execution time depends on the elaboration mode and the options needed during simulation:
• Optimized mode: the tool achieves the best compile time and run time in this mode, as it does not include extensive debug capability for the simulation.
• Debug mode: also known as interactive mode, this is often used in the initial phase of the design development cycle, or when extra debug capabilities are needed to solve design issues. Compile and run time suffer because of the extra capability implemented in this mode.

The command syntax to use VCS takes three types of argument and is shown below:

VCS [elab_options] [libname.]design_unit

Libname is the library name where the top module was analysed and can be either an entity or a configuration. If libname is not specified, VCS looks for the specified design_unit in the default library indicated in the synopsys_sim.setup file. The design_unit can be a module (Verilog top module name), an entity (VHDL top entity name), an entity_archname (name of the top entity and the architecture to be simulated) or a cfgname (name of the top-level configuration). The most commonly used VCS options are the following:
-doc: displays the VCS documentation in the system’s web browser.
-ID: returns useful information such as the VCS version and build date, the workstation used to execute the run, the OS name and the host ID, which is tracked for license counting.
-licqueue: waits for a VCS license if none is immediately available. With license prices reaching the 100k dollar mark, it is not advisable to have licenses lying idle, and jobs are queued to make the most use of them.
-full64: enables elaboration and simulation in 64-bit mode (the default option in today’s executions).
-file: points to a file containing multiple elaboration options.

The following are options for the Discovery Visualization Environment (DVE) and the Unified Command Line Interface (UCLI):


-gui: starts DVE at elaboration run time.
-R: runs the executable file immediately after VCS, starting simulation as soon as the elaboration step completes.
-gfile: used to override default values of design generics and parameters by loading specific ones contained in the specified file.
-notice: enables diagnostic messaging (verbose).
-q: suppresses messages from the C compiler used (quiet mode).
-V: compiles verbosely; these are messages from the compiler driver program, which prints the commands executed as it runs the C compiler, assembler and linker.
-l: points to a log file where VCS outputs the elaboration messages.

For an extensive list of VCS options please refer to the VCS, DVE, UCLI manuals or make use of the –h switch to bring up the Synopsys documentation interface. [51]

The third and last step is simulation, which means executing the simv binary generated by the elaboration phase. Simulation can be run in two modes, based on how the elaboration was set up: interactive mode and batch mode:
• Interactive (or debug) mode: the default mode when debugging design issues using the command line interface (UCLI) or the GUI (Discovery Visual Environment, or DVE), which is essential during the debug phase. To simulate the design in interactive mode, the design must be elaborated using the -debug (or -debug_all) option at compile time.
• Batch mode: a fast way of running the simulation phase when the design has reached a certain maturity and the possibility of issues during simulation is very slim. It still offers debug abilities, though greatly reduced, and is used primarily when the main goal is finishing execution as fast as possible.

The command syntax is very simple: simv

As with the elaboration command, there is also a long list of simv options targeting different aspects of the process: SystemVerilog assertions, controlling termination of the simulation, enabling or disabling specific design blocks, specifying time ranges for simulation start/stop, recording outputs (logs, fails), controlling execution messages, VPD and VCD outputs, timing-specific control, and licensing. The following are the most used switches for simulation:
–l: although the –l option slows down execution, it is always advisable to keep records of every simulation executed.
-debug: never to be used in batch mode; enables most functions for debugging purposes.
-assert: the argument field has a large number of options, the most used being finish_maxfail=N, which terminates the simulation if the number of failures for any SystemVerilog assertion reaches the value N.
+no_notifier: suppresses the toggling of notifier registers, which are optional arguments of system timing checks but of great help during the debug phase.
+no_tchk_msg: disables the display of timing violations without disabling the toggling of notifier registers in timing checks.
+notimingcheck: disables timing check system tasks in the design. It improves run time, depending on the number of timing checks that are disabled.
+VCS+stop+time: stops the simulation at the time value specified.
-error: controls error and warning messages (multiple options available).
+SDFverbose: displays all back-annotation delay errors/warnings from SDF files.
+vpdfile: specifies the name of the VPD output file, with a .vpd extension.
+maxdelays: specifies that the maximum delay value of the min-typ-max trio found in module path delays is to be used for timing checks during simulation.
+mindelays: as for +maxdelays, but selecting the minimum timing value.
+typdelays: as for +maxdelays, but selecting the typical timing value.

It is common practice to resort to unit-delay GLS execution for test bench clean-up and setup. This is done because unit-delay simulations are relatively fast, and all test bench and SOC related issues can be resolved more easily. Running unit-delay GLS is recommended because one can catch most of the test bench/design issues before moving on to SDF annotation. After the SDF arrives, the focus should be on finding the real design timing issues, so one must make sure that time is not wasted debugging environment setup issues. Once the unit-delay GLS has been pipe-cleaned, the engineers can proceed to re-iterate the operation using min and max SDF timing values. GLS uses min-max to define a window within which states must be valid. At the other end, during silicon testing, engineers will try to set up clocks and timing to place signals at the centre of the min-max timing window validated in GLS. This is the main reason to run min and max GL simulations: to confirm that the actual silicon can operate within the window and to allow a certain signal timing marginality for the scan test on silicon.

5.6.1 GLS Execution and results of test-case Dynamic_node module

Once the GLS setup has been completed and is clean of bugs, it is best practice to combine the three step commands into a simple shell script that runs the whole flow. This script will contain a variable pointing to the Synopsys licence files, followed by the vlogan, VCS and simv commands.
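As an illustration of such a wrapper, the following Python sketch assembles the three command lines. This is a hypothetical helper, not part of the VCS toolset: the file names follow the example directory below, the switch set is heavily reduced, and the commands are only printed here (a real script would hand them to the shell).

```python
# Sketch of a wrapper assembling the three GLS commands (vlogan, vcs,
# simv) for this flow. Dry run: commands are printed, not executed.
# File names and switches are a reduced, illustrative subset.

import shlex

def build_gls_commands(testbench, netlist, libs, simv_name, log_dir="logs"):
    vlogan = ["vlogan", testbench, netlist, "-sverilog", "-full64",
              *[arg for lib in libs for arg in ("-v", lib)]]
    # Design unit name follows the <testbench>_v_ctl convention seen below.
    design_unit = testbench.removesuffix(".v") + "_v_ctl"
    vcs = ["vcs", design_unit, "-o", simv_name,
           "-debug_all", "-l", f"{log_dir}/elab.log"]
    simv = [f"./{simv_name}", "-ucli", "-i", "dump.tcl",
            "-l", f"{log_dir}/simv1.log"]
    return [vlogan, vcs, simv]

cmds = build_gls_commands(
    testbench="chains_testbench0101.v",
    netlist="dynamic_node_top_wrap.mapped.v",
    libs=["v_libs/std_cells_1.v"],
    simv_name="gls_scan.simv")

for cmd in cmds:
    print(shlex.join(cmd))
```

Grouping the three invocations in one place makes re-iterating the flow for data collection (the goal stated in the abstract) a single-command operation.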

An example of the directory structure used to run a GLS is shown here:

Dynamic_node_gls_v1.
|-- chains_testbench0101.v
|-- chains_testbench0101.v.0.vec
|-- chains_testbench0101.v.cfg
|-- chains_testbench0101.v.po.name
|-- dump.tcl
|-- dynamic_node_top_wrap.mapped.v
|-- logs
|-- SDF
|   `-- dynamic_node_top_wrap.mapped.SDF
|-- SDF.maxselector_cap_.opt
|-- SDF.min_selector_cap_.opt
|-- SDF.typ_selector_cap_.opt
|-- synopsys_sim.setup
`-- v_libs
    |-- standard_cell_lib_verilog_file_1
    |-- standard_cell_lib_verilog_file_2
    |-- standard_cell_lib_verilog_file_3
    |-- standard_cell_lib_verilog_file_4

This directory collects all the inputs required to execute the simulation. In this case it contains the test bench files generated by an ATPG tool. The above directory shows a chain test made of a single pattern and a single .vec (vector) file. When the number of patterns is much larger, the ATPG tool (Mentor Graphics FastScan in this case) will break the pattern set down into multiple vector files.


VCS allows the use of a very large number of switches. In most cases it is preferable, for ease of use, to group the switches in a text file (which can be read using the –f switch). The dynamic_node_top_wrap.mapped.v file represents, in this case, the synthesized and scan-inserted netlist. The following is the vlogan command used for the Verilog analysis of the Dynamic-node block:

vlogan chains_testbench.v dynamic_node_top_wrap.mapped.v -sverilog -full64 +v2k -V -v v_libs/ v_libs/ v_libs/ v_libs/

The tool will not throw any error for switches that are not strictly required, as in this case the use of –sverilog. The log from this execution reported that one of the standard cell library files contained the declaration of a cell already present in a previous file; it flagged this non-issue as a warning and ignored the second declaration. The tool then proceeded to analyse the design and parse the library files provided. The chain test bench was used as the main argument of the vlogan command, together with switches to enable SystemVerilog analysis and 64-bit elaboration. The next step was to run the VCS command in order to generate the simv executable:

VCS dynamic_node_top_wrap_chains_testbench0101_v_ctl -o SOC_no_pg_gls_scan_m.simv +notimingchecks -marchive 3961 +lint=TFIPC-L +lint=PCWM -sverilog +error+1000 -debug_all -add_seq_delay 1ns -notice -V -l logs/elab.log

The main arguments to the VCS command are –o, to indicate the name of the simv output file (a default simv file is created if the –o option is not used), the design unit (dynamic_node_top_wrap_chains_testbench0101_v_ctl) and various switches. Some of their functions are:
• +error+1000: increases the maximum number of NTB (native test bench) errors at compile time to 1000
• -sverilog: enables SystemVerilog
• -debug_all: enables the use of UCLI (user command line interface) and DVE (graphic interface). It also enables line stepping through the execution.
• -add_seq_delay: adds a 1 ns delay to each expected response time
• -marchive: tells the linker to create temporary object files containing the specified number of module definitions.


The VCS command can be run with the –RI switch to create an executable in the current directory and start the simulator automatically. A –f option is often used to group a large number of command line options, as well as Verilog files needed during the compilation phase. Note also that the test-bench is listed as the first argument of the VCS command. VCS compiles the source code on a module by module basis. Incremental compilation of the design occurs on each iteration, and only the modules which have changed since the last run are reprocessed. There are over 250 switches available for use with the VCS command; refer to the Synopsys UCLI manual for more details. A log example from the VCS command follows.

Info: Loading Package synopsys/VCSmx/H-2013.06-SP1-12/packages/IEEE/lib/IEEE.STD_LOGIC_1164
Top Level Modules:
dynamic_node_top_wrap_chains_testbench_v_ctl
TimeScale is 1 ps / 100 fs
Starting VCS inline pass...
91 modules and 4 UDPs read.
However, due to incremental compilation, no re-compilation is necessary.

As specified by the –o option, the tool produced the SOC_no_pg_gls_scan_m.simv executable, which is used for the third and final step in the GLS process. The final command is the .simv executable run with –ucli to allow debug capabilities, a log file, and –i to indicate a short tcl file that could contain design specific forces or, in this case, just a dump command to store simulation signals and results that can be used to visualize signal waveforms and facilitate debug.

SOC_no_pg_gls_scan_m.simv -ucli -i dump.tcl -l logs/simv1.log

The .simv run produces a simulation log as well as a test-bench.v.fail log containing the expected and simulated values for each chain and for each pattern vector, i.e.

// This File is simulation generated (chains_testbench.v) format cycle
failures_begin
//cycle_number PO_name expected_value simulated_value pattern_id chain_name cell_number
2256 yummyOut_W L X // Pattern 0 chain: chain1 cell: 1
2258 yummyOut_W H X // Pattern 0 chain: chain1 cell: 3


When a simulation mismatch occurs, it is best practice to compare the ATPG environment and settings with the GLS setup before moving on to more tedious debug practices involving timing analysis. Most cases just require matching the setup of the two tools and the scan test signals governing the test. When a simulation is executed with no mismatch, a passing message is reported at the end of the .simv execution, i.e.

Simulated 1 patterns
No error between simulated and expected patterns
$finish called from file "testbench.v", line 74105.
$finish at simulation time 362750.00 ns

To view the waveforms generated (dumped) by the simulator run (execution of the .simv), it is necessary to execute a simple command.

dve –vpd inter.vpd &

This fourth and last step is only useful for debug purposes; there would be no need to view the waveforms if the result of a simulation is a "PASS".

Figure 5-2 Visible "staircase" shape of a chain test pattern shifted through consecutive scan cell elements

Figure 5-2 displays a number of scan cell outputs changing their state from an X (unknown value, shown in red) at the start of a loading phase to a 0 state, followed by the series of 0 to 1 and 1 to 0 transitions present in a chain test pattern. A healthy simulation will show a staircase shaped representation of the signals if the displayed cells are consecutively stitched.

GLS failures

A GLS failure can be very hard to debug. VCS provides engineers with a set of tools for debugging and validating a design from source-level debugging to simulation result viewing (Discovery Visual Environment or DVE).


Figure 5-3 Using a scan chain test to observe failing scan chains (Source: Mentor Graphics) [52]

The main challenge in GLS is X propagation debug. X corruption may be caused by a number of reasons such as timing violations, uninitialized memory and non-resettable flops. In general, non-initialized flops in a design cannot be guaranteed not to cause any problems, so there is a need to find out which flops are not initialized and proceed to initialise them to some random value (either zero or one) so as to mimic real silicon. This is done in order to have a closer and clearer picture of how the design will behave at the desired frequency with actual delays in place.

Although GLS has its own set of challenges, such as set-up issues and long run times, it is still very much a part of the sign-off process. The first set of patterns to be simulated is always the chain patterns. Scan chains are used for the first phase of debugging of complex chip designs because detecting a scan chain fail is trivial: if the scan chains are healthy, the scan output waveform is identical to the scan input waveform shifted in (see Figure 5-3). Since chain elements are based on the sequential elements of a design, they can easily occupy over 30% of the silicon area and can account for more than 50% of the silicon failures [53]; therefore, the ability to debug broken scan chains is crucial for scan test simulation, silicon test failure analysis and yield analysis. The chain test results also help us understand the type of defect that is causing a chain failure (Figure 5-4). In fact, different defect types produce different chain test results (i.e. a single delay defect will result in the correct sequence appearing at the output pins but with a one-cycle offset).


Figure 5-4 Chain fault models determined by chain patterns (shift-in: 0011) (Source: Mentor Graphics) [52]

While detecting a scan chain defect is trivial, identifying the defect location is much more complex. Knowing the exact location of the defect is crucial for GLS debug and bring-up silicon failure analysis (FA). The chain test results will not give any indication of where in the chain the defect is located. Depending on the type of defect, it is also virtually impossible to determine whether a single failure or multiple failures occurred on each failing chain. Since it is not possible to identify the defect location from the chain test results, different pattern sets have to be used for this purpose. These patterns include chain patterns (expanded patterns, or one-hot for compressed EDT configurations (see paragraph 4.9.1)) and the scan patterns that target faults in the functional circuitry of the design within the failing chain domain. The use of stuck-at patterns allows the engineer to gather more data once the failure occurs. The failing cycles can also be fed back to an ATPG tool to execute scan diagnosis patterns (see section 4.9.2). This latter pattern type can pinpoint the cause of the failure not only to the sequential element that captured the response but down to an individual logic instance. Once the location has been identified, it is time to establish the cause of the failure by analysing the scan signal waveforms for any timing violations (see section 5.5).


Using GLS to generate output files for power analysis

GLS is also used to generate input files for power simulation. The GLS run can produce a .vcd file with a simple command sequence. A VCD (value change dump) file stores the switching activity data generated by simulation for use in power calculation. Internally, non-vcd format switching activity is converted to vcd. An example of these commands is shown here:

initial begin
  #36643ns;
  $dumpfile("atp2.vcd");
  $dumpvars(0, dyn_node_testbench_v_ct, dynamic_node_wrap);
  $dumpon;
  #2000ns;
  $dumpoff;
end

This code sequence is usually placed in the dump.tcl file used when running the executable .simv generated by VCS (i.e. SOC_no_pg_gls_scan_m.simv -ucli -i dump.tcl -l logs/simv1.log). The first numerical value indicates the starting instant of the wave dump in the vcd data file, while the second numerical value in the sequence is the window size of the simulated waveforms. It is not possible to generate multiple vcd files within the same simulation run.
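As a quick illustration, the two delay values in the example above fix the boundaries of the dump window; the start instant and the window end can be checked with trivial arithmetic (values taken from the example):

```python
start_ns = 36643   # delay before the dump is enabled ($dumpvars/$dumpon)
window_ns = 2000   # delay between $dumpon and $dumpoff

# The vcd file therefore captures switching activity between
# 36643 ns and 38643 ns of simulation time.
end_ns = start_ns + window_ns
```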

An example of the content of a vcd file follows:

$date Nov 14, 2016 14:38:21 $end
$version TOOL: ncsim 05.70-s001 $end
$timescale 1 fs $end
$scope module dynamic_noder_tb $end
$var wire 1 ! clk $end
$var wire 1 " rstn $end
$var wire 1 # count_en $end
$var wire 32 $ reg_out [31:0] $end
$var wire 1 % tb_clk $end
$var wire 1 & tb_rstn $end
$var integer 32 ' tb_counter $end
$var wire 1 ( tb_counter_vec [31] $end
….



How to use vcd files as an input to power analysis

Power simulation tools cannot directly use vcd format files. A vcd to saif (switching activity interchange format) file conversion is required. Synopsys provides a vcd2saif utility which converts the VCD file format to saif. Unfortunately, there are file size limitations, and the window size limit can differ from design to design. Some designs have a large number of input, output and scan control signals, hence a lot of data will be stored in a small window size, while for a simpler scan infrastructure it may be possible to have a very large window across multiple patterns or entire sets (see Table 13).

Table 13 VCD file dimension examples for a design with 12 scan-in inputs

No.  Time Length  Comments                 File Dimension  Conversion succeeded
01   2000ns       Section of 1 pattern     1.5 GB          YES
02   3400ns       Section of 1 pattern     2.1 GB          YES
03   14000ns      Window size 3 patterns   6.3 GB          YES
04   70000ns      Window size 6 patterns   29 GB           YES
05   140000ns     Window size 13 patterns  58 GB           YES
06   280000ns     Window size 23 patterns  115 GB          YES
07   560000ns     Window size 50 patterns  229 GB          NO

Converting a vcd file to a saif file requires the execution of a very simple command:

vcd2saif –input dynamic_node_chains.vcd -o dynamic_node_chains.saif

The saif file, or activity file, is a record of the transitions each net underwent during simulation. Since transitions affect dynamic power, the activity file is a critical input to power estimation.

Figure 5-5 Clock vs data signal wave toggle comparison

A toggle is a 01 or 10 transition; the toggle rate is the ratio of the number of transitions divided by the number of cycles. For the case of the signals such as in Figure 5-5 with multiple signals as in a real scenario, the toggle rate would be calculated as the sum of all signal transitions divided by the number of signals and the results divided by the number of cycles.

Figure 5-6 Uncompressed chains with chain test shifting

The case shown in Figure 5-6 would occur when using chain test patterns (sequence 0011) for uncompressed chains. Such a pattern is very useful for debug and allows verification of scan chain integrity. This test pattern requires a lot of power, as all scan flops transition from one state to another at least once per cycle. It is considered to be the worst-case scenario, as SA and AS patterns will most likely contain fewer transitions due to the power constraints that are normally used when setting up pattern generation. The power requirement could be relaxed by setting the chain test sequence to a more conservative 00001111 (one transition per 4 cycles), although the transitions would still occur at the same time, with the possibility of causing power issues. The following is an example of a saif file's content:

(SAIFILE
(SAIFVERSION "2.0")
(DIRECTION "backward")
(DESIGN )
(DATE "Tue Nov 15 08:37:09 2016")
(VENDOR "Synopsys, Inc")
(PROGRAM_NAME "vcd2saif")
(VERSION "A-2014.12-SP2")
(DIVIDER / )
(TIMESCALE 1 fs)
(DURATION 50000000000)
(INSTANCE binary_counter_tb
(NET
(clk
(T0 0) (T1 0) (TX 50000000000) (TC 0) (IG 0)


)
(count_en
(T0 25625000000) (T1 24375000000) (TX 0) (TC 39) (IG 0)
)
(count_out\[0\]
(T0 37810000000) (T1 12185000000) (TX 5000000) (TC 2437) (IG 0)
…….
…….

A saif file contains toggle counts and time information, such as how much time a signal spends in the 1 state (T1), the 0 state (T0) and the x state (TX). In comparison, a vcd file contains the value changes of each signal, including the times at which the signals change their values. A saif file does not contain this information, as it only holds cumulative information about value changes; hence a vcd file is a superset of a saif file. Any application which needs time stamps of individual value changes must use vcd.
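This superset relationship can be sketched with a short Python fragment that reduces a vcd-style list of timestamped value changes for a single net into saif-style cumulative figures; the event list here is invented purely for illustration:

```python
def vcd_to_saif_counts(events, duration):
    """Reduce (time, value) changes for one net into cumulative
    SAIF-style figures: time at 0 (T0), time at 1 (T1), toggle count (TC).
    The individual timestamps are lost in the process."""
    t0 = t1 = 0
    # Pair each change with the time of the next change (or end of window).
    bounded = events + [(duration, None)]
    for (time, value), (next_time, _) in zip(events, bounded[1:]):
        if value == 0:
            t0 += next_time - time
        else:
            t1 += next_time - time
    tc = sum(1 for (_, a), (_, b) in zip(events, events[1:]) if a != b)
    return {"T0": t0, "T1": t1, "TC": tc}

# Invented value changes of one net over a 100 ns window:
events = [(0, 0), (20, 1), (50, 0), (70, 1)]
summary = vcd_to_saif_counts(events, duration=100)  # {"T0": 40, "T1": 60, "TC": 3}
```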

The following are the steps required to run and obtain a power report from a saif input file:

 Invoke Design Compiler in tcl mode with dc_shell -t, then on the dc_shell prompt:
 set search_path
 set target_library
 read_ddc dynamic_node.ddc (a .ddc file is a binary file which contains both the Verilog gate level description and the design constraints)
 read_saif -input dynamic_node_chains.saif -instance_name
 report_power > power.rpt
 exit

The output power.rpt contains the power consumption figure. An example of the report for a saif generated from a chain test simulation follows:

Information: Updating design information... (UID-85)
Information: Propagating switching activity (low effort zero delay simulation). (PWR-6)


Warning: The derived toggle rate value (0.240096) for the clock net 'clk' conflicts with the annotated value (0.199980). Using the annotated value. (PWR-12)

****************************************
Report : power
-analysis_effort low
Design : dynamic_node
Version: A-2007.12-SP2
Date : Nov 15 08:23:21 2016
****************************************

Library(s) Used:

tcb013ghpwc (File: /libs/lib_file.db)

Operating Conditions: WCCOM Wire Load Model Mode: top

Global Operating Voltage = 1.08
Power-specific unit information:
Voltage Units = 1V
Capacitance Units = 1.000000pf
Time Units = 1ns
Dynamic Power Units = 1mW (derived from V,C,T units)
Leakage Power Units = 1nW

Cell Internal Power = 47.4650 uW (59%)
Net Switching Power = 32.5019 uW (41%)
---------------------------------------
Total Dynamic Power = 79.9669 uW (100%)

Cell Leakage Power = 1.5108 uW
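As a sanity check, the dynamic power breakdown in the report above can be reproduced with simple arithmetic (the power figures are copied directly from the report):

```python
internal_uW = 47.4650    # Cell Internal Power from the report
switching_uW = 32.5019   # Net Switching Power from the report

total_uW = internal_uW + switching_uW                 # Total Dynamic Power, 79.9669 uW
internal_pct = round(100 * internal_uW / total_uW)    # 59 (%)
switching_pct = round(100 * switching_uW / total_uW)  # 41 (%)
```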

Here are some points to keep in mind when running power simulations:

 When there are warnings about nets not being properly annotated, if these nets are internal to the standard cell libraries the warnings can be safely ignored. If that is not the case, saif annotation has been unsuccessful.
 Errors such as "No switching activity has been annotated" may require you to review the command used to read in the saif file; the instance name in the read_saif command may be incorrect.
 It is important to read the design 'ddc' and not the design 'netlist' for power analysis. This is because the 'ddc' is expected to contain the clock information, input slopes and output loads, which all influence the power consumption.
 The power simulation should be run for a sufficient amount of time using a test-bench which exercises the design to its limits.


Summary and conclusions

This chapter has explained the basic concepts involved in static and dynamic timing analysis and in GLS. This includes a discussion of the issues involved, followed by a detailed step-by-step description of how to use the Synopsys timing analysis tools, with emphasis on GLS. Finally, the relationship between GLS and power analysis was explained and some examples were presented.

Simulation is a core task for delivering reliable SOC designs; it would be impossible to produce a working product without it. GLS has been around for a number of years and provides the means for identifying the majority of bugs in a design. Due to timing constraints and business needs, there will always be bugs slipping through all the way to silicon manufacturing.

Design complexity and size are increasing execution times and making it more difficult for an engineer to debug gate level simulations. Fortunately, as designs become ever more complex, computing power and EDA tools are also evolving, providing more ways of dealing with the multitude of issues encountered in a product design flow.

GLS is still not capable of simulating power, but it can provide invaluable input to other simulation tools. The high number of issues and variables, and the complexity of today's designs, may not allow power simulation to be integrated into gate level simulation for a number of years to come.



6 Conclusions, outcomes and future work

Introduction

In this chapter, the main findings and learning with regard to the research are summarised and general conclusions based on the findings of the studies presented in this thesis are described. Furthermore, the strengths and limitations of this thesis are considered and suggestions for further research are presented, including a number of potential topics on which to conduct further study. The chapter concludes with some observations on this work and the more general topic of scan power analysis.

Findings and learning

The literature survey and background study on scan power analysis revealed an intricate and complicated environment to study, research, and work in. The documentation relating to silicon manufacturing and to the tools used for designing, analyzing, debugging, and fabricating IC devices is plentiful; unfortunately, however, it does not disclose straightforward answers to the numerous questions that arise when investigating issues in the silicon industry. Most of the documentation is academic in nature and is often based on older documentation and conjecture. In an industry that evolves as fast as silicon manufacturing, this can mean that the documentation obtained is out of date. On the other hand, any documentation that is up to date is off-limits because it contains proprietary information regarding the latest techniques and technologies.

The literature on scan testing prior to this decade focused on scan test time as the main issue, while power supply limits were not much of a concern and were dealt with using techniques such as clock gating, scan chain segmentation, design partitioning, voltage shut-off, etc. There is very little indication of any type of scan power analysis. The advent of compressed patterns, larger designs containing a much greater number of scan flops, and the push to ever smaller technology nodes magnified the scan power issues, and with them documentation related to IR-drop started appearing in academic documents and EDA tool manuals.

Industry was and still is reluctant to share valuable information regarding this issue. A study done by Intel Corporation in 2009 produced a paper that describes the IR-drop issue in what


was at that time the latest technology node. The paper was titled “Understanding power supply droop during at-speed scan testing” [54] and contained a detailed description of the issue. No information on possible solutions or data was shared.

To overcome the lack of access to up-to-date documentation regarding design processing and scan power data, the solution proposed here is to supply an entire flow that shows the reader how to process a design through DFT scan insertion, how to process the scan inserted netlist using an ATPG tool for pattern generation, and how to set up and run a gate level simulation which provides the means to create the waveform inputs to power analysis tools; this last step can be set up to generate the valuable data necessary for scan power studies. This research has tried to get as close as possible to current industry flows and used the most widely adopted tools for design and debug, which should help mimic the steps used by the silicon manufacturers. Simulating scan test using the same tools will allow academic researchers to encounter similar issues to those seen by commercial design houses, and this is important because scan power analysis is still very much based on simulation data rather than on data from silicon testing.

There are multiple commercially available simulation tools on the market, and for an individual it would be a monumental task to learn them all for a study. Using many different EDA tools is not recommended, as these are highly automated, very complex, require expensive training (often not affordable by students and researchers) and take years of experience to learn. For this reason, most of the flow presented here used Synopsys tools (synthesis, GLS and power analysis), while for pattern generation the choice was Mentor Graphics tools.

Power simulation tools are still immature; although capable of precisely predicting the overall scan power requirement for a design, they still have limitations, because their output reports are based on a very small gate level simulation data window, which does not provide the full picture for an entire scan pattern set. They are also incapable of pinpointing the power issues in the design layout or at cell level. In order to reach a definitive conclusion about what power issue occurred, and where within a design, it would be necessary to cross reference silicon tester data with power analysis reports based on patterns that failed scan test on silicon. Computer simulation processing time and data file size (see Table 13, Section 5.9) are in this case the limitations that prevent generating a scan power report based on the entire scan pattern

set. The research work described in this thesis has provided all the information needed to set up the generation of scan power data and has also included suggestions on using an open-source project that can be easily adapted to a scan power data study. The open-source design selected (OpenPiton) simplified the flow of this work, as it comes with a fully tested and working synthesis flow. Although in this case the flow was set up to execute scan insertion of uncompressed chains, altering the flow to include IPs for the generation of compressed patterns, to reflect today's industry standards, should be straightforward.

With the presentation of the flow in this thesis and its alterations to include scan flip flops and scan pattern generation, the research has made a significant and useful contribution to scan test research and its related power issues. Given the expanded capabilities opened up by the modified flow and the possibilities that arise from the use of OpenPiton, the next section presents a number of important research investigations made possible by the modified flow and the use of OpenPiton.

Possibility for future study

The main aim of research on scan test power analysis is to improve quality and eliminate high yield fallout due to scan test fails (hence eliminating waste and improving revenue). Scan power issues can manifest themselves in various forms, such as IR-drop/power drop, localized temperature increase, clock signal instability and others, which result in false fails. Variation in process technology can cause the issues to manifest in similar ways, and they are accentuated when moving towards smaller geometries. Frequency is also a major factor that needs to be taken into consideration during a scan power analysis research study. Often the literature found in relation to scan power problems such as IR-drop or other scan related issues references older journals and data based on older technology nodes. The symptoms and behavior of power usage during scan change with technology scaling, and mitigations that may work for older technology nodes may no longer be suitable for the current technology node.

The OpenPiton design chosen for this research comes with a synthesis flow that allows a user to easily port it to different technology nodes, which enables the user to synthesise it using different standard cell libraries; comparison of scan power across technology nodes could thus be investigated.


The opportunity of generating scan power data following the setup described in this research opens very many possibilities for the exploration of scan power issues. Table 14 shows variations/ranges that could be applied to a number of variables for a future study of scan power, and it could be extended to include temperature and voltage conditions. The number of variables and their combinations is large, and it might not be possible to execute a study on all of them. Also, applying too many changes to a setup simultaneously may generate data that is too hard to interpret or compare. An easier approach would be to generate data by changing the value of only one variable at a time, then proceed to generate data for the best and worst scenarios of their combinations. Breaking down the analysis in this way might reduce the number of iterations required. Comparing data for the best and worst scenario identified for each variable, followed by a second phase to integrate all four fields, could be the best approach.

Scan test power analysis is a huge field of study. A lot of the work will have to be performed on simulated data, but the possibility of having access to silicon test data in the future may also be considered (although it is very unlikely).

Table 14 Potential variables for a future study of scan power

Technology node  Frequency       Scan configuration  Scan clock signal
45nm             Low MHz range   compressed          single
32nm                 |           uncompressed        multiple
22nm                 |           chain length        skewed
18nm                 |                               clock duty cycle
14nm                 |
10nm                 |
7nm              GHz range

The various configurations to be explored and the large number of variants and variables, together with the flow provided here, offer the possibility of generating a vast amount of data. Specific factors that could be investigated in order to identify the root cause of excessive test power are:
 scan power data correlation between consecutive scan test vectors
 scan power data correlation to the toggle rate of scan test vectors, and comparison to the toggle rate during the functional mode of operation
 correlation of scan power data to the scan frequency used, and comparison to the functional frequency of operation
 correlation between scan flop counts and gate density (technology node)

If silicon data, or silicon and test equipment, is available, the research could be expanded to take into consideration the following aspects:
 power marginality correlation with frequency variation during scan testing (possibly applying the variation to scan shift only, as it accounts for the largest percentage of the scan test time)
 power marginality and issues when a device is tested at different temperatures
 power marginality correlation with Vmin (nominal voltage supply minus 5%)
 comparison of simulated scan power data with tester power data per scan vector, to verify that the tool reports and models used are up to the task of simulating silicon design and silicon behaviour

Figure 6-1 Original chain set configuration and proposed approach to halve the number of scan flops and reduce scan power issues

Two potential scenarios that could also be investigated are:
1. Replace a standard set of chains with two sets made of scan flops in a checkerboard configuration: the first set is composed of odd chains made of the odd-positioned flops from the original chain set and even chains made of the even-positioned flops; the second set uses the remaining flops, with odd chains made of the even-positioned flops and even chains made of the odd-positioned flops from the original chain set (see Figure 6-1). Such a configuration should offer the advantage of having half of the total scan flops switching during scan test (compared with the original configuration), therefore halving the toggle/switching rate. The two sets should be able to achieve a test coverage


as high as in the original configuration (although a higher number of clock pulses may be required for signal propagation to observation points, because they are farther apart).
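The checkerboard re-stitching proposed in scenario 1 can be sketched in a few lines of Python; the chain contents below are invented labels, not output from any tool:

```python
def checkerboard_split(chains):
    """Split one scan chain set into two interleaved sets.
    In set A, odd chains keep the odd-positioned flops of the original
    chain and even chains keep the even-positioned flops; set B takes
    the remaining flops, so each set holds half of the scan flops."""
    set_a, set_b = [], []
    for index, chain in enumerate(chains):
        odd_flops = chain[0::2]    # 1st, 3rd, 5th ... positions
        even_flops = chain[1::2]   # 2nd, 4th, 6th ... positions
        if index % 2 == 0:         # chain 1, 3, ... (odd chain numbers)
            set_a.append(odd_flops)
            set_b.append(even_flops)
        else:                      # chain 2, 4, ... (even chain numbers)
            set_a.append(even_flops)
            set_b.append(odd_flops)
    return set_a, set_b

# Two invented chains of six flops each:
chains = [["f1", "f2", "f3", "f4", "f5", "f6"],
          ["g1", "g2", "g3", "g4", "g5", "g6"]]
set_a, set_b = checkerboard_split(chains)
# Each set holds 6 of the 12 flops, so only half of the flops
# toggle when one set is shifted.
```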

Figure 6-2. The relationship between defect coverage rates and resultant DPPM levels (Source: Synopsys) [55]

Test time is not compromised, because the chain count is doubled but the chain length is halved. This should result in lower power requirements, but simulation and possibly silicon tester data will be required for validation. In today's market a reduction of scan test coverage cannot be accepted; therefore, data from ATPG tools may already give an indication of the validity of this approach, and adjustments to the cell configuration may be required in order to recover coverage, since an increase in DPM cannot be accepted (Figure 6-2).

2. Creation of new fault models that can hold power information. In today's silicon industry, standard fault models (AS and SA) are being replaced by UDFM, more commonly known as cell-aware fault models, to increase fault coverage and reduce DPM numbers. Limitations of existing fault models are in most cases the cause of silicon testing false fails. Power related false fails are not as easy to reproduce in models as timing fails are. Defects and DPMs have been increasing [56], and this is attributed to the increase in transistor density (Figure 6-3). As the ATPG fault models were unable to detect all the faults internal to cells, the migration from traditional fault models to cell-aware UDFM has allowed test engineers to screen out an increased number of defects and bring DPMs back down, but the number of DPMs is still higher than customers demand, and scan power may be the cause. The use of compressed


patterns for high volume test is a must in today's test programs, but when a compressed pattern fails, it is necessary to expand the compressed pattern to identify the location of the fail (see 4.9.1). This process creates a number of patterns matching the compression ratio of the scan design used. As the toggle activity of the newly created patterns is much lower than that of their compressed counterpart, it is often the case that a re-test of the same silicon die with the expanded version of the pattern will output a pass result, revealing an inconsistent outcome. The cause of the fail in the compressed patterns points to power issues, but the connection is unclear because the generated compressed patterns are often constrained to avoid power threshold violations.

Figure 6-3 Logic transistor scaling with the evolution of technology node [56]

Results such as the one described above can be interpreted in different ways, and different views could be expressed on it:
 The second test, done with uncompressed patterns, reveals that the unit tested is not defective and that the previous fail caused by the compressed patterns can be ignored and treated as a false fail. This may indicate poor cell modelling or limitations of ATPG tools in calculating power usage and obeying power constraints during pattern generation.


 In contradiction to this, the passing test done with uncompressed patterns confirms the validity of the ATPG fault models used for the pattern generation.
3. Synthesis and fabrication of a design with ODDD (see 2.9) that can produce silicon data to be compared with simulation data, providing the means for corrective action on cell models and the tools' power calculation methods.

Other avenues could be adopted for investigation but the main starting point is always going to be data generated by simulation tools.

Conclusions

The initial step in a research project is to understand the problem, and in order to do that it is necessary to collect or generate data for analysis. This work has provided a guide through the silicon industry's demanding methods and tools and tries to assist the reader in setting up a flow to generate such data. The work also offers information on the de facto standard tools used by industry, along with general knowledge of the techniques and steps required to run simulations, analysis and debugging. The issue of scan power will probably be part of the scan test debug flow for years to come. Mitigating solutions will provide the best chance of keeping scan power at bay and manufacturing yield high until more powerful EDA tools, capable of producing power analysis at gate level, become available.

Simulation data is very important, but the ultimate goal would be to run studies using data from silicon testing. This is very unlikely to happen unless the issue becomes so critical that industry is forced to seek help from the academic world and make data available for research and study. Many scan power studies have probably been done, but it is clear that they are not getting into the 'open', given the very high value placed on this type of information within the silicon industry.

The final goal of this work is to facilitate the investigation of low power scan for VLSI applications, which can allow us to identify and propose new approaches to deal with scan power issues while scan is applied to a design. A clear cut solution may not exist, but finding


and introducing novel scan test power mitigation techniques could be of great help to the silicon industry.


References

[1] "The History of the Integrated Circuit," [Online]. Available: http://www.nobelprize.org/educational/physics/integrated_circuit/history/. [Accessed 15 Dec. 2016].
[2] M. Lapedus, "How Much Testing Is Enough?," 15 May 2014. [Online]. Available: http://semiengineering.com/how-much-testing-is-enough/. [Accessed 23 Jan. 2017].
[3] WIPO Intellectual Property Handbook: Policy, Law and Use.
[4] "What is Intellectual Property?," [Online]. Available: http://www.wipo.int/about-ip/en/. [Accessed 15 Mar. 2017].
[5] "OpenPiton open source research processor," 2015. [Online]. Available: http://parallel.princeton.edu/openpiton/#infosec. [Accessed 19 Apr. 2016].
[6] "NanGate FreePDK15 Open Cell Library," NanGate, [Online]. Available: http://www.nangate.com/?page_id=2328. [Accessed 14 Aug. 2016].
[7] R. Kapur, S. Mitra and T. W. Williams, "Historical Perspective on Scan Compression," IEEE Design & Test of Computers, vol. 25, no. 2, pp. 114-120, March-April 2008. doi: 10.1109/MDT.2008.40
[8] A. Gupta, "Estimate power at RTL to identify problems early," 5 Aug. 2015. [Online]. Available: http://www.edn.com/Home/PrintView?contentItemId=4440079. [Accessed 4 Dec. 2016].
[9] R. Press (Mentor Graphics), "Cell-aware ATPG test methods improve test quality," Test & Measurement World, June 2012.
[10] S. Manich, "Faster defect localization in nanometer technology based on defective cell diagnosis," 2007 IEEE International Test Conference, 2007.
[11] F. Hapke et al., "Cell-aware analysis for small-delay effects and production test results from different fault models," 2011 IEEE International Test Conference, Anaheim, CA, 2011, pp. 1-8. doi: 10.1109/TEST.2011.6139151
[12] H. Sutter, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software," 1 Feb. 2009. [Online]. Available: http://www.gotw.ca/publications/concurrency-ddj.htm. [Accessed 25 Jan. 2014].


[13] M. Abramovici, M. A. Breuer and A. D. Friedman, Digital Systems Testing and Testable Design, New Jersey: Wiley-IEEE Press, 1990.
[14] L.-T. Wang, C.-W. Wu and X. Wen, VLSI Test Principles and Architectures: Design for Testability, Academic Press, 2006, p. 52.
[15] P. Girard, N. Nicolici and X. Wen, Power-Aware Testing and Test Strategies for Low Power Devices, New York: Springer, 2010.
[16] X. Lin, R. Press, J. Rajski, P. Reuter, T. Rinderknecht, B. Swanson and N. Tamarapalli, "High-frequency, at-speed scan testing," IEEE Design & Test of Computers, pp. 17-25, 15 Sept. 2003.
[17] F. Wu et al., "Analysis of power consumption and transition fault coverage for LOS and LOC testing schemes," 13th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems, Vienna, 2010, pp. 376-381. doi: 10.1109/DDECS.2010.5491748
[18] V. B. Jayaram, "Experimental Study of Scan Based Transition Fault Testing Techniques," Virginia Polytechnic Institute and State University, 2003.
[19] Y. H. Li, W. C. Lien, I. C. Lin and K. J. Lee, "Capture-Power-Safe Test Pattern Determination for At-Speed Scan-Based Testing," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, no. 1, pp. 127-138, Jan. 2014. doi: 10.1109/TCAD.2013.2282281
[20] Y. Wang, "Research and Design of Low Power Consumption Testing Generator for Integrated Circuits," Third International Conference on Information and Computing, Wuxi, Jiangsu, China, 2010.
[21] E. K. Moghaddam, J. Rajski, M. Kassab and S. M. Reddy, "At-speed scan test with low switching activity," 2010 28th VLSI Test Symposium (VTS), Santa Cruz, CA, 2010, pp. 177-182. doi: 10.1109/VTS.2010.5469580
[22] Z. Zhang, S. M. Reddy, I. Pomeranz, J. Rajski and B. M. Al-Hashimi, "Enhancing delay fault coverage through low-power segmented scan," IET Computers & Digital Techniques, vol. 1, no. 3, pp. 220-229, May 2007.
[23] E. Arvaniti and Y. Tsiatouhas, "Low power scan by partitioning and scan hold," 2012 IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), Tallinn, 2012, pp. 262-265. doi: 10.1109/DDECS.2012.6219070


[24] F. Schwierz, "Graphene transistors," nature.com, 30 May 2010. [Online]. Available: http://www.nature.com/nnano/journal/v5/n7/full/nnano.2010.89.html. [Accessed 31 Jan. 2014].
[25] S. Pontarelli, M. Ottavi, A. Salsano and K. Zarrineh, "Feedback Based Droop Mitigation," Design, Automation & Test in Europe Conference & Exhibition, Grenoble, 2011.
[26] "OpenCores," [Online]. Available: http://opencores.org/. [Accessed 2 Dec. 2014].
[27] "OpenCores projects," [Online]. Available: http://opencores.org/projects.
[28] W. Song, "Async-SDM-NoC," 9 May 2011. [Online]. Available: https://opencores.org/project,async_sdm_noc. [Accessed 3 Nov. 2015].
[29] "OpenPiton Open Source Research Processor," Princeton University, 2015. [Online]. Available: http://parallel.princeton.edu/openpiton/index.html. [Accessed 4 Jan. 2016].
[30] "OpenSPARC T1," Oracle Technology Network, [Online]. Available: http://www.oracle.com/technetwork/systems/opensparc/opensparc-t1-page-1444609.html. [Accessed Feb. 2015].
[31] J. Balkind et al., "OpenPiton: An Open Source Manycore Research Framework," ASPLOS, Atlanta, GA, USA, 2016.
[32] K. M. Butler, "Minimizing Power Consumption in Scan Testing: Pattern Generation and DFT Techniques," 2004. [Online]. Available: http://204.12.117.117:8080/itc2004proc/Papers/PDFs/0012_4.pdf.
[33] "NanGate Open Cell Library," NanGate, [Online]. Available: http://www.nangate.com/?page_id=22. [Accessed 3 Oct. 2016].
[34] "Design Compiler Reference Methodology," Synopsys, [Online]. Available: https://solvnet.synopsys.com/retrieve/021023.html?otSearchResultSrc=advSearch&otSearchResultNumber=2&otPageNum=1. [Accessed 3 Oct. 2016].
[35] W. P. R. Group, OpenPiton Synthesis and Back-end Manual, Princeton University, 20 Oct. 2016.
[36] "Hierarchical DFT: how to do more, more quickly, with fewer resources," [Online]. Available: http://chipdesignmag.com/sld/blog/2016/02/26/hierarchical-dft-how-to-do-more-more-quickly-with-fewer-resources/.


[37] Tessent Shell Reference Manual, Mentor Graphics, 14 March 2015. [Online]. Available: https://documentation.mentor.com/en/docs/201702035/tshell_ref/html/id10a11ce7-6648-42f3-aec3-4e2ed7245e8c.
[38] "What's the difference between traditional and defect-simulated fault models?," [Online]. Available: http://electronicdesign.com/test-amp-measurement/what-s-difference-between-traditional-and-defect-simulated-fault-models.
[39] "Cell-aware testing," [Online]. Available: http://www.techdesignforums.com/blog/2012/07/03/cell-aware-testing/.
[40] F. Hapke et al., "Defect-oriented cell-internal testing," 2010 IEEE International Test Conference (ITC10), Austin, TX, 2010, pp. 1-10. doi: 10.1109/TEST.2010.5699229
[41] F. Hapke and J. Rivers, "Cell-aware library characterization for advanced technology nodes and production test results from a 32-nm processor," Design, Automation and Test in Europe Conference, Dresden, Germany, March 2012.
[42] F. Hapke and J. Schloeffel, "Introduction to the defect-oriented cell-aware test methodology for significant reduction of DPPM rates," 2012 17th IEEE European Test Symposium (ETS), Annecy, 2012, pp. 1-6. doi: 10.1109/ETS.2012.6233046
[43] J. M. Soden and R. E. Anderson, "IC failure analysis: techniques and tools for quality reliability improvement," Proceedings of the IEEE, vol. 81, no. 5, pp. 703-715, May 1993. doi: 10.1109/5.220902
[44] "List of HDL simulators," [Online]. Available: https://en.wikipedia.org/wiki/List_of_HDL_simulators.
[45] "EUROPRACTICE Membership List," [Online]. Available: http://www.europractice.stfc.ac.uk/membership/membership_list.cfm.
[46] R. Goering, "Functional Verification Survey -- Why Gate-Level Simulation is Increasing," 2013. [Online]. Available: https://community.cadence.com/cadence_blogs_8/b/ii/archive/2013/01/16/functional-verification-survey-why-gate-level-simulation-is-increasing.
[47] "IEEE Xplore digital library," [Online]. Available: http://ieeexplore.ieee.org/Xplore/home.jsp.
[48] "What is jitter," [Online]. Available: http://www.silabs.com/products/clocksoscillators/pages/what-is-jitter.aspx.


[49] "VLSI concepts," [Online]. Available: http://www.vlsi-expert.com/p/static-timing-analysis.html.
[50] B. Kleveland, "High frequency characterization of on-chip digital interconnect," IEEE Journal of Solid-State Circuits, pp. 716-725, June 2002.
[51] "VCS MX/VCS MXi User Guide," Synopsys, 2014. [Online]. Available: http://www.cerc.utexas.edu/~grajesh/ee382m/lab3/vcsmx_ug.pdf.
[52] Y. Huang and G. Eide, "When good DFT goes bad: debugging broken scan chains," Mentor Graphics, 18 Aug. 2012. [Online]. Available: http://www.techdesignforums.com/practice/technique/software-diagnosis-of-broken-scan-chains/.
[53] Y. Huang, R. Guo, W.-T. Cheng and J. C.-M. Li, "Survey of Scan Chain Diagnosis," 2008.
[54] P. Pant and J. Zelman, "Understanding Power Supply Droop during At-Speed Scan Testing," VLSI Test Symposium, 2009.
[55] "An ISO 26262 approach to meeting the cost, quality, reliability, and integration needs of automotive ICs," Synopsys, Tech Design Forum, 16 Jan. 2016. [Online]. Available: http://www.techdesignforums.com/practice/technique/an-iso-26262-approach-to-meeting-the-quality-reliability-cost-and-integration-requirements-of-automotive-ics/. [Accessed 13 Feb. 2017].
[56] J. Hruska, "Intel claims three-year advantage on 10nm, wants to redefine process nodes," ExtremeTech, 31 March 2017. [Online]. Available: https://www.extremetech.com/computing/246902-intel-claims-three-year-advantage-10nm-process-wants-change-define-process-nodes. [Accessed 2 April 2017].
[57] L.-T. Wang, C.-W. Wu and X. Wen, VLSI Test Principles and Architectures: Design for Testability, Academic Press, 2006.
[58] Design Compiler User Guide, Mountain View, CA: Synopsys, Inc., 2012.
[59] Z. Chen, K. Chakrabarty and D. Xiang, "MVP: Capture-power reduction with minimum-violations partitioning for delay testing," 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, 2010, pp. 149-154. doi: 10.1109/ICCAD.2010.5654124


[60] I. Polian, A. Czutro, S. Kundu and B. Becker, "Power Droop Testing," IEEE Design & Test of Computers, vol. 24, no. 3, pp. 276-284, May-June 2007. doi: 10.1109/MDT.2007.77
[61] S. Bahl, R. Mattiuzzo, S. Khullar, A. Garg, S. Graniello, K. S. Abdel-Hafez and S. Talluto, "State of the art low capture power methodology," International Test Conference (ITC), Anaheim, CA, USA, 2011.
[62] X. Lin, "Power Supply Droop and Its Impacts on Structural At-Speed Testing," Asian Test Symposium (ATS), Niigata, Japan, 2012.
[63] S. Sde-Paz and E. Salomon, "Frequency and Power Correlation between At-Speed Scan and Functional Tests," International Test Conference (ITC), Santa Clara, CA, USA, 2008.


Appendix A: Ad-hoc techniques: observation and control point insertion

Figure A1-(a) shows an example of observation point insertion. The diagram shows three low-observability nodes, and OP2 shows in detail the components of an observation point, which are a multiplexer (MUX) and a D-type flip-flop. The low-observability node B is connected to the 0 port of the MUX within the OP2 observation point. The observation points are connected in series through the second port (1) of each MUX to form an observation shift register. The port selection of the MUX is controlled by SE (shift enable). When SE is set to 0, at the positive edge (in this case) of the clock CK, the values of the low-observability nodes are captured into the D-type flip-flops. When SE is set to 1, the flip-flops within the observation points (OP1, OP2, and OP3) operate as a shift register, and their contents can be shifted out through the pin OP_output and compared against expected test results. Observation point insertion therefore improves the observability of specific nodes in the device under test.

Figure A1-(b) shows an example of control point insertion. The diagram shows three low-controllability nodes, and CP2 shows in detail the components of a control point (CP), which are (as for the observation point) a MUX and a D-type flip-flop. The "Source to Destination" connection at a low-controllability node is cut so that the signal from the source is routed through a MUX. During functional operation, the MUX control signal TM (test mode) is set to 0 so that the source signal drives the "destination end" through the 0 port of the MUX. When test is enabled, TM is set to 1, and the value contained in the D flip-flop drives the "destination end" through the enabled port 1 of the MUX. As with the observation point insertion technique, the D-type flip-flops (in CP1, CP2, and CP3) are connected to form a shift register. The values required to test the circuit are generated (where possible using ATPG) and shifted into the flip-flops from the external pin CP_input in order to control the destination ends of the low-controllability nodes. As a result, the controllability of the circuit nodes is dramatically improved.
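The capture-then-shift behaviour of the observation shift register described above can be sketched as a short behavioural model (illustration only; the signal names SE and OP_output follow Figure A1, and the three-stage chain length matches the three observation points in the figure):

```python
class ObservationChain:
    """Behavioural model of an observation-point shift register (Figure A1-a)."""

    def __init__(self, length=3):
        self.ffs = [0] * length  # one D-type flip-flop per observation point

    def clock(self, se, nodes=None, op_input=0):
        """One positive clock edge. SE=0 captures the low-observability
        nodes; SE=1 shifts the chain towards OP_output."""
        if se == 0:
            # capture mode: each MUX selects its low-observability node (port 0)
            self.ffs = list(nodes)
        else:
            # shift mode: each MUX selects the previous flip-flop (port 1)
            out = self.ffs[-1]
            self.ffs = [op_input] + self.ffs[:-1]
            return out

# capture the node values A=1, B=0, C=1, then shift them out
chain = ObservationChain()
chain.clock(se=0, nodes=[1, 0, 1])
shifted = [chain.clock(se=1) for _ in range(3)]
print(shifted)  # -> [1, 0, 1]; the last observation point comes out first
```

The same capture/shift structure applies to the control points of Figure A1-(b), except that values are shifted in from CP_input rather than shifted out for comparison.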


(a)

(b) Figure A1 Observation point insertion (a) and control point insertion (b) [57]


Appendix B: Synopsys reference methodology generation

The reference methodology retrieval system is straightforward. Once logged into Synopsys SolvNet, it is just a matter of opening the link https://solvnet.synopsys.com/rmgen/, then selecting the tool and the tool revision to be used and proceeding to the next step (Figure B-a).

Figure B Reference methodology generation (a)

(b)


The second step consists of selecting the desired settings and submitting the RM request. The website then moves to a third page, where the scripts can be downloaded once the terms and conditions have been agreed.


Appendix C: Synopsys RMgen dc.tcl script content added with DFT synthesis set to TRUE

The following content is added to the dc.tcl script when the DFT synthesis variable is set to TRUE when generating the Synopsys RM:

# compile_ultra -scan -gate_clock -spg -check_only
}

compile_ultra -scan -gate_clock -spg

##########################################################################
# Save Design after First Compile
##########################################################################

write -format ddc -hierarchy -output ${RESULTS_DIR}/${DCRM_COMPILE_ULTRA_DDC_OUTPUT_FILE}

# Writing out the updated DC blocks after compile_ultra
foreach design "${DC_BLOCK_ABSTRACTION_DESIGNS_TIO}" {
    write -format ddc -hierarchy -output ${RESULTS_DIR}/[dcrm_compile_ultra_tio_filename $design] $design
}

##########################################################################
# DFT Compiler Optimization Section
##########################################################################

##########################################################################
# Verilog Libraries for Test Design Rule Checking
##########################################################################

# For complex cells that do not have functional models in .lib format,
# you can supply a list of TetraMAX-compatible Verilog libraries
# for test design rule checking.
# Set the following variable in the dc_setup.tcl file:

# set_app_var test_simulation_library

##########################################################################
# DFT Signal Type Definitions
#
# These are design-specific settings that should be modified.
# The following are only examples and should not be used.
##########################################################################

# It is recommended that top-level test ports be defined as a part of the
# RTL design and included in the netlist for floorplanning.

# If you create test ports here and they are not in your floorplan, you should
# use create_terminal for these additional test ports for topographical mode
# synthesis.

if {[shell_is_in_topographical_mode]} {
    # create_terminal -layer "layer_name" -bounding_box {x1 y1 x2 y2} -port ScanPortName
    # ... (repeat for each new test port)
}

# If you are using the internal pins flow, it is recommended to run the
# change_names command before set_dft_signal to avoid problems after DFT
# insertion. In this case, set_dft_signal pins should be based on pin names
# after change_names.

# change_names -rules verilog -hierarchy

# set_dft_signal -view spec -type ScanDataOut -port SO
# set_dft_signal -view spec -type ScanDataIn -port SI
# set_dft_signal -view spec -type ScanEnable -port SCAN_ENABLE
# set_dft_signal -view existing_dft -type ScanClock -port [list CLK] -timing {45 55}
# set_dft_signal -view existing_dft -type Reset -port RESET -active 0

puts "RM-Info: Sourcing script file [which ${DCRM_DFT_SIGNAL_SETUP_INPUT_FILE}]\n" source -echo -verbose ${DCRM_DFT_SIGNAL_SETUP_INPUT_FILE}

##########################################################################
# DFT for Clock Gating
#
# This section includes variables and commands used only when clock gating
# has been performed in the design.
##########################################################################

# Use the following command to initialize clock-gating cells for test that are
# made transparent with a signal held constant for testing, e.g. of type
# 'Constant'. The value set depends on the hierarchy depth of the clock-gating
# cells. This setting is not needed where clock-gating cells are controlled
# with scan enable.

# set_dft_drc_configuration -clock_gating_init_cycles 1

# To specify a dedicated ScanEnable/TestMode signal to be used for clock gating,
# use the "-usage clock_gating" option of the "set_dft_signal" command.

# set_dft_signal -view spec -type -port -usage clock_gating

# You can specify the clock-gating connectivity of the ScanEnable/TestMode
# signals after they are predefined with set_dft_signal -usage clock_gating.

# set_dft_connect

##########################################################################
# DFT Configuration
##########################################################################

# Preserve the design name when writing to the database during DFT insertion.
set_dft_insertion_configuration -preserve_design_name true

# Do not perform synthesis optimization during DFT insertion.
set_dft_insertion_configuration -synthesis_optimization none

# Multibit cell handling
# Specify -preserve_multibit_segment to false to treat the cells inside a
# multibit component as discrete sequential cells. This improves balancing
# of scan chains. Starting with the I-2013.12 release, the default setting
# is false.
# set_scan_configuration -preserve_multibit_segment false

## DFT Clock Mixing Specification
# For top-level integration, clock mixing is recommended, if possible:
set_scan_configuration -clock_mixing mix_clocks

# If clock mixing is not possible, use the following setting:
# set_scan_configuration -clock_mixing no_mix

##########################################################################
# DFT AutoFix Configuration
##########################################################################

# Please refer to the DFT Compiler Scan User Guide, Chapter 7,
# "Advanced DFT Architecture Methodologies", "Using AutoFix" section.

# Please refer to the dc.dft_autofix_config.tcl file included with the
# Design Compiler Reference Methodology scripts for an example of a
# design-specific AutoFix configuration.

# Create a design-specific AutoFix configuration file and uncomment the
# following line to source this file.

# source -echo -verbose ${DCRM_DFT_AUTOFIX_CONFIG_INPUT_FILE}

##########################################################################
# DFTMAX Compression Configuration
##########################################################################

# To enable DFTMAX compression insertion, regenerate a new set of scripts with
# the default configuration, or uncomment the following command to enable it.

# set_dft_configuration -scan_compression enable

# DFTMAX Compression Options:
#
# -min_power true
#     Specifies that compressor inputs are to be gated for functional power
#     saving. It also reduces glitching during functional and capture
#     operations. The default for -min_power is false; it is recommended
#     that you set it to true.
#
# -xtolerance
#     Value is set to the tool default. Specify "high" to generate a DFTMAX
#     compression architecture that has 100% X-tolerance.
#
# -minimum_compression
#     The tool default is a target compression ratio of 10.
#
# -location
#     Specifies the instance name in which the compressor and decompressor
#     will be instantiated. The default location is the top level of the
#     current design.
#
# For details on these and other DFTMAX compression options, please refer to
# the DFTMAX User Guide, Chapter 2, "Using DFTMAX Compression",
# and Chapter 4, "Managing X Values in Scan Compression".

# set_scan_compression_configuration -xtolerance high -min_power true

# Use the following to define the test-mode signal to be used for DFTMAX
# compression. Ensure that test-mode signals to be used for clock gating have
# been configured with set_dft_signal -usage clock_gating.

# set_dft_signal -view spec -type TestMode -port scan_compression_enable

##########################################################################
# DFT Pipelined Scan Data Configuration
##########################################################################

# Pipelined scan data registers are commonly used with DFTMAX designs to
# reduce the delay between the top-level scan-in and scan-out ports and the
# first and last elements of the scan chain.

# For details on pipelined scan data, please refer to the DFTMAX Compression
# User Guide, Chapter 5, "Pipelined Scan Data".

# Enabling automatic insertion of pipelined scan data registers:

# set_dft_configuration -pipeline_scan_data enable

# Controlling automatic insertion of pipelined scan data registers:
# Use set_pipeline_scan_data_configuration to control how pipelined scan data
# registers should be inserted.

# Options:
#   -head_pipeline_clock
#   -tail_pipeline_clock
#   -head_pipeline_stages
#   -tail_pipeline_stages

# Note: No scan clock signal can be shared with the pipelined scan data
# register clock.

# set_pipeline_scan_data_configuration -head_pipeline_clock \
#     -tail_pipeline_clock \
#     -head_pipeline_stages \
#     -tail_pipeline_stages

##########################################################################
# DFT Additional Setup
##########################################################################

# Add any additional design-specific DFT constraints here

##########################################################################
# DFT Test Protocol Creation
##########################################################################

create_test_protocol

##########################################################################
# DFT Insertion
##########################################################################

# Use the -verbose version of dft_drc to assist in debugging if necessary
dft_drc
dft_drc -verbose > ${REPORTS_DIR}/${DCRM_DFT_DRC_CONFIGURED_VERBOSE_REPORT}
report_scan_configuration > ${REPORTS_DIR}/${DCRM_DFT_SCAN_CONFIGURATION_REPORT}
report_dft_insertion_configuration > ${REPORTS_DIR}/${DCRM_DFT_PREVIEW_CONFIGURATION_REPORT}

# Use the -show all version of preview_dft for a more detailed report
preview_dft > ${REPORTS_DIR}/${DCRM_DFT_PREVIEW_DFT_SUMMARY_REPORT}
preview_dft -show all -test_points all > ${REPORTS_DIR}/${DCRM_DFT_PREVIEW_DFT_ALL_REPORT}

insert_dft

##########################################################################
# Re-create Default Path Groups
#
# In case of ports being created during insert_dft, they need to be added
# to those path groups. Separating these paths can help improve optimization.
##########################################################################

set ports_clock_root [filter_collection [get_attribute [get_clocks] sources] object_class==port]
group_path -name REGOUT -to [all_outputs]
group_path -name REGIN -from [remove_from_collection [all_inputs] ${ports_clock_root}]
group_path -name FEEDTHROUGH -from [remove_from_collection [all_inputs] ${ports_clock_root}] -to [all_outputs]

##########################################################################
# DFT Incremental Compile
# Only required if scan chain insertion has been performed.
##########################################################################

compile_ultra -incremental -scan

##########################################################################
# High-effort area optimization
#
# The optimize_netlist -area command was introduced in the I-2013.12 release
# to improve the area of gate-level netlists. The command performs monotonic
# gate-to-gate optimization on mapped designs, thus improving area without
# degrading timing or leakage.
##########################################################################

optimize_netlist -area

##########################################################################
# Write Out Final Design and Reports
#
# .ddc:     Recommended binary format used for subsequent Design Compiler sessions
# Milkyway: Recommended binary format for IC Compiler
# .v:       Verilog netlist for ASCII flow (Formality, PrimeTime, VCS)
# .spef:    Topographical mode parasitics for PrimeTime
# .SDF:     SDF backannotated topographical mode timing for PrimeTime
# .sdc:     SDC constraints for ASCII flow
##########################################################################

change_names -rules verilog -hierarchy

##########################################################################
# DFT Write out Test Protocols and Reports
##########################################################################

# write_scan_def adds SCANDEF information to the design database in memory, so
# this command must be performed prior to writing out the design database
# containing binary SCANDEF.

# Write out top-level SCANDEF for physical synthesis
write_scan_def -output ${RESULTS_DIR}/${DCRM_DFT_FINAL_SCANDEF_OUTPUT_FILE}

# Note: check_scan_def is not supported with subdesign abstraction

# Write out expanded SCANDEF for floorplanning purposes
# Need to derive a Tcl list of hierarchical cells that are not IC Compiler
# ILMs or IC Compiler block abstractions for SCANDEF expansion
if { (${DDC_HIER_DESIGNS} != "") || (${DC_ILM_HIER_DESIGNS} != "") || (${DC_BLOCK_ABSTRACTION_DESIGNS} != "") || (${DC_BLOCK_ABSTRACTION_DESIGNS_TIO} != "") } {
    set hier_cells ""
    set HIER_DESIGNS "${DDC_HIER_DESIGNS} ${DC_ILM_HIER_DESIGNS} ${DC_BLOCK_ABSTRACTION_DESIGNS} ${DC_BLOCK_ABSTRACTION_DESIGNS_TIO}"
    foreach_in_collection hier_cell [sub_instances_of -hierarchy -of_references ${HIER_DESIGNS} ${DESIGN_NAME}] {
        lappend hier_cells [get_object_name $hier_cell]
    }
    write_scan_def -expand_elements ${hier_cells} -output ${RESULTS_DIR}/${DCRM_DFT_FINAL_EXPANDED_SCANDEF_OUTPUT_FILE}
}

report_dft_signal > ${REPORTS_DIR}/${DCRM_DFT_FINAL_DFT_SIGNALS_REPORT}

# DFT outputs for standard scan mode

write_test_protocol -test_mode Internal_scan -output ${RESULTS_DIR}/${DCRM_DFT_FINAL_PROTOCOL_OUTPUT_FILE}
current_test_mode Internal_scan
report_scan_path > ${REPORTS_DIR}/${DCRM_DFT_FINAL_SCAN_PATH_REPORT}
dft_drc
dft_drc -verbose > ${REPORTS_DIR}/${DCRM_DFT_DRC_FINAL_REPORT}

# DFT outputs for compressed scan mode

# write_test_protocol -test_mode ScanCompression_mode -output ${RESULTS_DIR}/${DCRM_DFT_FINAL_SCAN_COMPR_PROTOCOL_OUTPUT_FILE}
# current_test_mode ScanCompression_mode
# report_scan_path > ${REPORTS_DIR}/${DCRM_DFT_FINAL_SCAN_COMPR_SCAN_PATH_REPORT}

###########################################################################


Appendix D: Accessing Mentor tools

To access any Mentor Graphics tool, it is necessary to have a valid, up to date license file. Licenses are very expensive, and in enterprise environments they are usually stored on secure servers separate from the machines executing the jobs. Before the tool is enabled, the license file must be loaded; if it is not found, or is invalid, the shell exits immediately. Table 15 shows an example of the output received when the license is not found. This error also gives the opportunity to see how many tools are available under the Tessent shell.

Table 15 License setup error from Mentor TestKompress

machine003> run_testkompress_edt
/p/eda/mentor/tessent/2013/tessent -shell -dofile stuckat_edt.cfg -log ../logs/atp2_stuckat_edt_270117_0808.log -replace
// Tessent Shell 2013 Thu Mar 10 04:08:24 GMT 2016
// Copyright 2011-2017 Mentor Graphics Corporation
//
// Mentor Graphics software executing under Linux.64 bit version
// Host: host_name (XXXXX MB RAM)
//
// Error: The following licenses are not available:
//   Scan (dftadvisor)
//   FastScan (mtfastscanf)
//   fastscan

// IJTAG (mtijtagf)

// SOCScan (mtSOCscanf)

// ScanPro (mtscanprof)

// MemoryBIST-LV (mtmemorybistf)

// Diagnosis (yieldascandiag)

// TestKompress (testkomp)

// LogicBIST (mtlogicbistf)
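Before launching the Tessent shell, the license location is normally supplied through an environment variable read by the vendor's licensing software (commonly MGLS_LICENSE_FILE or LM_LICENSE_FILE for FLEXlm-based tools). A minimal sketch follows; the server host, port and file path are placeholders, not values from this work:

```shell
# Point the Mentor licensing software at a license server (hypothetical host/port)
export MGLS_LICENSE_FILE="1717@license-server.example.com"

# Alternatively, fall back to a local license file (hypothetical path)
# export MGLS_LICENSE_FILE="$HOME/licenses/mentor.lic"

# Confirm the setting before invoking the tool
echo "License location: $MGLS_LICENSE_FILE"
```

With the variable set, invoking the tool as shown in Table 15 should find the licenses instead of exiting with the error above.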
