University of Alberta

PARALLEL FAULT SIMULATION ON THE C•RAM ARCHITECTURE

Albert L. Kwong

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Science.

Department of Electrical and Computer Engineering

Edmonton, Alberta
Fall 1998

National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

Fault simulation is used to determine if the faulty behaviour of a given circuit with a fault added is detectable when a list of test patterns is supplied. Simulating all potential faults in a circuit with a sufficient number of patterns to observe all faults is computationally intensive. Fortunately, a high degree of parallelism exists in the fault simulation application. For example, conventional parallel fault simulation can take advantage of the datapath of the processor and use the logical operations provided by the CPU to perform parallel gate evaluations. This algorithm can be further accelerated using a massively parallel computer.

C•RAM is a SIMD architecture that integrates processing elements with memory. A system with 256 Mbytes of C•RAM memory can contain up to 128k processing elements (PEs), providing a very wide datapath. We wished to determine whether or not a parallel fault simulator implemented on C•RAM could make effective use of this datapath to accelerate fault simulation.

In this research, three different experimental C•RAM fault simulators were designed. The simulators implement pattern-parallel fault simulation (simf-pps), fault-parallel fault simulation (simf-fps), and a hybrid of the pattern-parallel and fault-parallel algorithms (simf-hps). Two versions of simf-hps were implemented: static-hybrid parallel fault simulation and dynamic-hybrid parallel fault simulation.

By using the ISCAS85 benchmark circuits to evaluate the simulators, it was found that all of our simulators (except simf-fps) run faster than our benchmark conventional simulator, simf. Simf-fps is generally inefficient, except for circuits with mostly easy-to-detect faults. Simf-pps is not as efficient as the hybrid fault simulators in most of the cases, except when the circuit has many hard-to-detect faults. When simulating with 4096 PEs, the hybrid fault simulators were found to improve upon our benchmark simulator by 10 to 30 times. The hybrid fault simulator is scalable in that we observe increasing speed-up for up to 16k PEs.

Acknowledgements

I would like to thank my supervisors, Dr. Bruce F. Cockburn and Dr. Duncan G. Elliott, for their help in the course of this research. They have provided valuable ideas and background knowledge. I would also like to thank my supervisory committee, Dr. Martin Margala and Dr. Jonathan Schaeffer, who have given me many useful comments. My thanks also go to TRLabs and Micronet for their financial support. Most importantly, I thank God who gave me this opportunity to study and the ability to finish it. Praise the Lord. Amen.

Contents

1 Introduction
    1.1 Problem Definition
    1.2 Testing and Fault Simulation
    1.3 SIMD Parallel Architectures
    1.4 Thesis Outline

2 C•RAM and SIMD machines
    2.1 History
    2.2 The C•RAM Architecture
        2.2.1 The Processing Element (PE)
        2.2.2 The Communication Network
        2.2.3 Memory Structure
    2.3 The C•RAM Programming Model
    2.4 Advantages and Disadvantages of C•RAM

3 Fault Models
    3.1 Introduction
    3.2 Stuck-At Faults
    3.3 Gate-Delay Faults
    3.4 Stuck-Open Faults
    3.5 Fault Implication and Equivalence Among Fault Models

4 Fault Simulation
    4.1 Introduction
    4.2 Sequential Fault Simulation

8 Hybrid Fault Simulation on C•RAM
    8.1 Overview
    8.2 Design Details
        8.2.1 Partitioning C•RAM Memory
        8.2.2 Modified Fault-Family Fault Simulation
        8.2.3 Hybrid-Parallel Fault Simulation
        8.2.4 Fault Triggering and Fault Detection
        8.2.5 Dynamic-Hybrid Fault Simulation
    8.3 Evaluation
        8.3.1 PE Utilization
        8.3.2 Simulating Using a Larger C•RAM

9 Conclusion
    9.1 Summary of Results
    9.2 Further Research
        9.2.1 Fault-Simulating Large Circuits
        9.2.2 Sequential Circuits
        9.2.3 Other Fault Models

Bibliography

Appendices

A Source Code
    A.1 Basic Modules
        A.1.1 [...].h
        A.1.2 prototype.h
        A.1.3 simf.C
        A.1.4 readiscas.C
        A.1.5 genfault.C
        A.1.6 faultfamily.C
        A.1.7 [...].h
        A.1.8 heap.C
        A.1.9 crammapper.h
        A.1.10 crammapper.C
    A.2 simf Benchmark Fault Simulator
    A.3 Pattern-Parallel Fault Simulator
    A.4 Fault-Parallel Fault Simulator
    A.5 Hybrid Fault Simulator

List of Tables

3.1 Triggering Conditions for Stuck-At Faults Affecting NOT Gates (Inverters)
3.2 Triggering Conditions of Stuck-At Faults for 2-input NAND Gate
3.3 Triggering Conditions for Gate-Delay Faults Affecting a 2-input NAND Gate
3.4 Triggering Conditions for Stuck-Open Faults Affecting 2-input NAND Gate
3.5 Triggering Conditions for Faults Affecting the NOT Gate
3.6 Triggering Conditions for Faults Affecting the BUFF Gate
3.7 Triggering Conditions for Faults Affecting the NAND Gate
3.8 Triggering Conditions for Faults Affecting the AND Gate
3.9 Triggering Conditions for Faults Affecting the NOR Gate
3.10 Triggering Conditions for Faults Affecting the OR Gate

5.1 Root-to-Fault Ratio for the ISCAS85 Circuits
5.2 Evaluation of Simulation Speed (in seconds)
5.3 Fault Collapsing Results for the ISCAS85 Circuits

6.1 Comparing simf and simf-pps in Terms of Speed-Up
6.2 Comparison of PE Utilization for simf and simf-pps

7.1 Speed Difference in Fault-Free Simulation
7.2 Memory Requirement With or Without Dynamic Memory Allocation
7.3 Interpretation of Flags in the Event-List
7.4 Result Comparison of PPS with FPS
7.5 Analysis of the Fault-Parallel Fault Simulation Results
7.6 PE Utilization: Average Number of FFs Simulated Per Pass
7.7 Effect of Depth-First-Search from Output Optimization
7.8 Fault Triggering Detection Time in FPS

8.1 Speeding-up HPS by Using Depth-First-Search Grouping
8.2 Speeding-up HPS by Using C•RAM Mapper
8.3 Final Results
8.4 PE Utilizations of the Hybrid Fault Simulators
8.5 Fault Simulation Results with 4k PEs
8.6 Speed-up of Fault Simulations Going from 1k PEs to 4k PEs

9.1 Optimization Implemented in Each Fault Simulator
9.2 Fault Simulation Results With 4k PEs Running at 143 MHz

List of Figures

2.1 SISD - Single Instruction Stream, Single Data Stream
2.2 SIMD - Single Instruction Stream, Multiple Data Stream
2.3 MIMD - Multiple Instruction Stream, Multiple Data Stream
2.4 The C•RAM Architecture
2.5 C•RAM PE Structure
2.6 C•RAM Programming Models
2.7 Memory Bandwidth Throughout the System

3.1 Node Types
3.2 Inverter (NOT gate) with Possible Stuck-At Faults
3.3 Fault Collapsing for the Stuck-At Fault Model
3.4 Gate-Delay Faults Associated with 2-input NAND Gates
3.5 Fault Collapsing for the Gate-Delay Fault Model
3.6 A 2-input NAND Gate with Stuck-Open Fault
3.7 Common Structure of CMOS AND Gates and OR Gates
3.8 Fault Implication and Fault Collapsing Relationships

4.1 The Scan Chain/BIST Architectures
4.2 Hardware Configuration of RTPGs
4.3 Fault Coverage of Different RTPGs
4.4 Flow Chart of the Basic Fault Simulation
4.5 2-Dimensional Fault Simulation Space
4.6 2-D Fault Simulation Space of Fault-Parallel Fault Simulation
4.7 2-D Fault Simulation Space of Pattern-Parallel Fault Simulation
4.8 2-D Fault Simulation Space of Hybrid Parallel Fault Simulation
4.9 Classes of Faults with Respect to Fault Detection
4.10 Curves for Undetected Faults, Triggered Faults and Detected Faults

5.1 Top Level Architectural Design of simf
5.2 Computer Representation of Circuit Nodes in PPSFP
5.3 Top Level View of the Circuit
5.4 A Fault-List Example
5.5 Gate Evaluation in simf
5.6 Fault Propagation in Circuits
5.7 Using a Heap for Sorting Simulation Events
5.8 Heap - Random Input, Sorted Output
5.9 Detecting Circuit-Node Transitions
5.10 Fault Insertion Using Transition Masks

6.1 Data Structure Used for Storing Circuit Node States
6.2 Using the First PE to Backup Data
6.3 Copying Test Patterns into C•RAM

7.1 Dataflow Diagram of simf-fps
7.2 Illustration of Circuit Node Life-Cycles
7.3 Gate Evaluation with Multiple Fault-Insertion
7.4 Problem with Fan-out Nodes
7.5 Fault Insertion With the Token-passing Method

8.1 Static-Hybrid Fault Simulation versus Dynamic-Hybrid Fault Simulation
8.2 Flow Chart of a Fault-Simulation Round
8.3 Logical Partition of C•RAM
8.4 Partitioning With PEid
8.5 Data Supporting Fault Grouping
8.6 Simulating With Fault-Groups in a C•RAM Partition
8.7 Fault Triggering/Detection Mechanism
8.8 Choosing the Best Fault-to-Pattern Ratio
8.9 Effect of Using Different Number of Partitions (Normalized to f=1)
8.10 Determining a Useful Number of PEs
8.11 PE Utilization Curve

Glossary

equivalent faults: Two faults are equivalent if they produce identical behaviour from the primary inputs to the primary outputs.

erroneous node: A node is said to be erroneous if its Boolean state in fault simulation is complementary to that in a good simulation.

fanout node: A node that contains at least three pieces of wire: the stem wire that comes out of a gate output, and the two or more fanout wires that lead to other gate inputs.

fanout wire: A wire in a fanout node that is connected to a gate input.

fault or logical fault: Abstract representation of the effects on signals of one or more physical defects in the circuit under test.

fault collapsing: The process of reducing the number of faults by finding faults that are equivalent and then only retaining one fault from each fault equivalence class.

fault coverage: The ratio between the number of detected faults and the total number of considered faults in a circuit under test.

fault detection: The situation when the effects of a fault propagate to at least one primary output and thus produce an externally observable error.

fault dropping: The process of removing detected faults from the fault list.

fault effect: The consequence of a fault which causes a wire to contain complementary values in a modeled faulty circuit and the fault-free circuit.

fault implication: Fault A implies fault B if the detection of fault A by a test implies that fault B will also be detected by the same test.

fault list: At the start of a fault simulation, this list contains all considered faults. The list is updated to contain only undetected faults before each round of fault simulation.

fault model: Used in a general sense to mean the set of all the different faults used to represent all possible faulty behaviours in a circuit under test.

fault propagation: The process of determining the spread of fault effects downstream from a fault going towards the primary outputs of the circuit under test.

fault simulation: The process of computing the future behaviour of a modeled faulty circuit under test when presented with a given test.

fault triggering: A process which determines whether or not the conditions exist for creating the primary effect of a fault in the circuit under test.

fault type: Faults are sometimes grouped into fault types according to the kind of faulty behaviour involved. In this thesis, we consider the following fault types: stuck-at-0 (SA0), stuck-at-1 (SA1), slow-to-rise (SR), slow-to-fall (SF), N-transistor stuck-open (NO), P-transistor stuck-open (PO).

good simulation: The process of determining the node values in a fault-free circuit under test when presented with a given test.

ISCAS85 circuits: A set of benchmark combinational circuits in a specific format that is widely used in the research community. They were released to the research community at the 1985 International Symposium on Circuits and Systems.

node: A model of the low resistance connection of one or more physical conductors in a circuit that should carry, in a good circuit, the same electrical signal.

node (non-fanout): A node that contains only one piece of wire. The wire connects one gate output with one gate input.

PE utilization: The ratio between the number of PEs doing useful work and the total number of PEs in the C•RAM. A PE does useful work if the corresponding calculation would have to be done in a conventional sequential fault simulator.

physical defect: A manufacturing flaw in integrated circuits that can change the behaviour of a circuit.

primary effect: The first fault effect at a gate output that is caused in a circuit under test by the presence of a physical defect.

primary input: A circuit node that is driven by a signal source external to the circuit under test.

primary output: A circuit node that does not drive other gates in the circuit under test and whose state is observable by an external tester or observer.

simulation pass: The process in which a subset of the currently undetected faults are fault simulated together with a subset of the test patterns.

simulation round: A sequence of simulation passes that fault simulates all the undetected faults for a given set of test patterns.

stem wire: A wire in a fanout node that is connected to a gate output.

test: A sequence of test patterns that is intended to be used to detect most if not all considered faults in a circuit under test.

test pattern: A Boolean input vector that is used as part of a test. The length of a test pattern equals the number of primary inputs.

test pattern generation: The process of producing a sequence of test patterns to be used as a test. Test pattern generation can be implemented either using custom hardware or as a program running on a programmable computer or tester.

simf: The benchmark conventional pattern-parallel fault simulator used in the thesis.

simf-fps: A fault-parallel fault simulator implemented on C•RAM.

simf-pps: A pattern-parallel fault simulator implemented on C•RAM.

wire: A model of a physical conductor in a circuit. Wires are assumed to be indivisible or primitive elements of a circuit.

Chapter 1
Introduction

1.1 Problem Definition

When manufacturing Very Large Scale Integration (VLSI) chips, random defects occur despite the best precautions. To ensure that shipped chips are defect-free, it is necessary to apply a sequence of test patterns¹ to the inputs of the circuit under test (CUT) and then compare the output signals from the CUT with those expected of a good chip. A defect is detected if the sequence of outputs generated by the CUT is different from that of a good chip. Different sequences of test patterns can detect different faults. It is desirable to select a minimum-length sequence of test patterns to detect most, if not all, of the expected faults, since the test time is proportional to the number of test patterns required for testing, and testing is a major cost of production.

¹ A sequence of n-bit binary numbers, where n equals the number of primary inputs.

Fault simulations are run to evaluate the fault-detecting ability of a proposed sequence of test patterns. This involves taking the sequence of test patterns, possibly generated by a pseudo-Random Test Pattern Generator (RTPG), and inputting the patterns into a simulated CUT. In a fault simulation, the circuit is simulated with a fault inserted, a fault that is selected from one of many possible associated fault models. By "inserting a fault" it is meant that the circuit is assumed to be affected by the fault. In a fault-free simulation, on the other hand, the CUT is assumed to have no faults at all, i.e. the CUT is assumed to be good. The output signal sequence from a fault simulation is compared with that of the fault-free one to determine if the presence of the inserted fault can be observed as a signal at at least one CUT output that differs from the corresponding signal produced by a good CUT.

The main goal of the research described in this thesis is to evaluate alternative parallel implementations of well-known parallel fault simulation algorithms on the Computational RAM (C•RAM) architecture [7].
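The comparison between a fault simulation and a fault-free simulation described above can be illustrated with a minimal sketch. It is not the simulator developed in this thesis: the three-input circuit, the node names (`a`, `b`, `c`, `n1`, `out`) and the helper functions are hypothetical and were chosen only to keep the example small. A single stuck-at fault is inserted by forcing one node to a fixed value, and the fault counts as detected when the output differs from that of the good circuit.

```python
from itertools import product

# Hypothetical circuit used only for illustration: out = NAND(AND(a, b), c).
NODES = ("a", "b", "c", "n1", "out")

# Every single stuck-at fault: (node name, stuck value).
FAULTS = [(name, value) for name in NODES for value in (0, 1)]


def simulate(pattern, fault=None):
    """Evaluate the circuit for one test pattern (a, b, c).

    With fault=None this is a fault-free (good) simulation; otherwise the
    named node is forced to the stuck value, i.e. the fault is inserted.
    """
    def node(name, value):
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = (node(name, bit) for name, bit in zip("abc", pattern))
    n1 = node("n1", a & b)
    return node("out", 1 - (n1 & c))


def detects(pattern, fault):
    """A fault is detected if the faulty output differs from the good one."""
    return simulate(pattern) != simulate(pattern, fault)


def fault_coverage(patterns, faults):
    """Ratio of detected faults to considered faults for a given test."""
    detected = [f for f in faults if any(detects(p, f) for p in patterns)]
    return len(detected) / len(faults)


if __name__ == "__main__":
    all_patterns = list(product((0, 1), repeat=3))
    print(fault_coverage([(1, 1, 1)], FAULTS))
    print(fault_coverage(all_patterns, FAULTS))
```

In this sketch every fault is simulated serially against every pattern; the parallel techniques considered in the thesis aim to avoid exactly this cost.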

1.2 Testing and Fault Simulation

Physical defects can cause a chip to behave in numerous different erroneous ways. For example, a piece of wire that is accidentally shorted to ground will appear to be stuck at the logic value 0, assuming positive logic encoding. Physical defects are typically too complex and too numerous to consider directly; therefore, simplified Boolean fault models are introduced [14]. The effects of the physical defects are generalized and grouped into various simplified fault models. For example, the stuck-at fault represents all the defects that could cause a piece of wire to appear to be stuck at a logical 1 or 0. In the past, digital VLSI manufacturers only tested for stuck-at faults, the simplest fault model. Stuck-at faults affecting combinational circuits are detectable using single test patterns. However, as the quality requirements for VLSI chips have continued to increase, it has also become necessary to test for more complex sequential faults², such as gate delay faults and stuck-open faults. A gate delay fault assumes that the physical defect will cause a signal to be driven from low-to-high or high-to-low overly slowly. A stuck-open fault assumes that a transistor channel or wire connected to it is disconnected permanently. These kinds of faults require two consecutive test patterns to be applied in order for the fault to be detected. Therefore, the pattern space for detecting these faults is roughly the square of that of the stuck-at faults, and more effort is needed to search for a sequence of test patterns that yields 100% fault coverage.

² Faults that cause a combinational circuit to exhibit memory.

As mentioned earlier, it is necessary to generate a sequence of test patterns that detects most of the expected possible faults in a given circuit. There are two basic ways of doing so. One way is to compute them deterministically from the structure of the circuit. A memory unit is required for storing each precomputed test sequence before applying it to a CUT. As the size of the circuit increases, the size of the memory required also increases. Another way of generating the test sequence is to use pseudo-random test pattern generators (RTPGs). Some well-known RTPGs are the Linear Feedback Shift Register (LFSR) and the Linear Hybrid Cellular Automata (LHCA) [4]. Different RTPGs use different algorithms to generate a pseudo-random signal sequence. Fault simulation is needed to evaluate alternative RTPG algorithms.

It is necessary to distinguish between single-fault simulation and multiple-fault simulation. The single-fault assumption implies that any circuit under test will contain a maximum of one fault; on the other hand, the multiple-fault assumption implies that a circuit can contain any number of faults. In reality, the multiple-fault assumption is likely to be able to represent real defects more accurately than the single-fault assumption; however, simulating for multiple faults of unlimited multiplicity requires the simulation of a very large number of circuits, approximately proportional to the factorial of the number of wires in the CUT. In [17] it is found that a sequence of test patterns that detects all single stuck-at faults has a 99.9% chance of detecting all the multiple stuck-at faults. Similar research on other fault models has generally confirmed the effectiveness of the single-fault assumption. Therefore, it is reasonable to omit multiple-fault simulations. In this research, only single-fault fault simulations are considered.

Fault simulation can take a very long time to execute. Harel and Krishnamurthy have shown that there exists a class of circuits that requires O(n²) simulation time, where n is the number of gates [13]. Several efficient fault simulation acceleration techniques have been proposed, such as the parallel and concurrent techniques. Concurrent fault simulation is currently (as of 1998) the most popular technique because it is efficient and it supports not only the simulation of logic units but also of more complex functional blocks [30]. Parallel fault simulation, although it does not support complex blocks as well as the concurrent technique, is as efficient as concurrent simulation. Since our goal was to use C•RAM to accelerate the simulation process, parallel fault simulation, due to its parallel computational structure, should fit nicely onto C•RAM, which is a SIMD architecture. Thus, parallel fault simulation was selected for implementation on the C•RAM architecture. The rest of this chapter provides additional background on testing, fault simulation, and relevant parallel computer architectures.
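As a concrete illustration of the pseudo-random test pattern generators mentioned in this section, the following sketch shows a small Linear Feedback Shift Register. It is an assumption-laden example rather than the generator used in this work: the function name `lfsr_patterns`, the 4-bit width, the seed and the tap positions are all chosen for illustration. The taps were picked so that the register steps through all 15 non-zero states before repeating.

```python
def lfsr_patterns(seed, taps, width, count):
    """Yield `count` test patterns from a simple shift-left LFSR.

    seed  : non-zero initial register state (an integer)
    taps  : bit positions XORed together to form the feedback bit
    width : register length in bits, i.e. the number of primary inputs
    """
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        # One test pattern is the current register contents, bit 0 first.
        yield tuple((state >> i) & 1 for i in range(width))
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask


if __name__ == "__main__":
    for pattern in lfsr_patterns(seed=0b0001, taps=(3, 2), width=4, count=16):
        print(pattern)
```

A hardware LFSR of this kind needs only a shift register and a few XOR gates, which is why no pattern memory is required; whether the resulting sequence detects the expected faults still has to be checked by fault simulation.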

1.3 SIMD Parallel Architectures

The SIMD (Single-Instruction stream, Multiple-Data stream) computing model was defined by Flynn in 1966 [10]. In this computing model, a control unit issues a single stream of instructions to an array of identical processors, called the processing elements (PEs). Each processing element executes this instruction stream in parallel, on its own data in a local memory; hence we have multiple data streams.

Historically, SIMD machines have often been characterized by their high computational power and high price tag. These machines include the Distributed Array Processor (DAP) designed at Active Memory Technology Inc. [26], Thinking Machines Corporation's Connection Machine (CM-2) [29], and MasPar Computer Corporation's MP-2. They have been used mainly by government and research institutions [24] for scientific and military research.

While all of the above machines are substantial computer systems, there are also SIMD computers that are based on integrated SIMD processors and on-chip main memory. This category of SIMD computers is becoming more popular. The main driving force for this change in technology was the advance in VLSI processing technology. These new designs can integrate on one chip the processing elements together with the memory, thus taking advantage of the high data transmission rate (bandwidth) inside the chip. Recent custom chip designs include the processing chip of BLITZEN designed at the Microelectronics Center of North Carolina [15], Texas Instruments' Serial Video Processor (SVP) [5], NEC Corporation's Integrated Memory Array Processor (IMAP) [33] and Parallel Image-Processing RAM (PIP-RAM) [1], the Processing-In-Memory (PIM) chip designed at the Supercomputing Research Center [12], and Sony Corporation's Linear Array Architecture DSP [19]. These processor-memory integrated circuits are sometimes called Processors-In-Memory (PIM) or Intelligent RAM (IRAM) [XI.

One design is the Computational RAM (C•RAM) developed at the University of Toronto [8]. It is characterized by a large number of processing elements (PEs) and memory elements that are integrated into one chip. Basically, the C•RAM architecture is a modified Random Access Memory (RAM). RAM chips, including both Dynamic RAM (DRAM) and Static RAM (SRAM), are organized into rows and columns of memory cells. Each memory cell can store one bit of data. In the C•RAM architecture, several columns of memory cells are grouped together with a single PE integrated at their sense amplifiers³ (where the number of columns in a group depends on the implementation). The columns of memory thus become the PE's local memory. C•RAM was designed so that the area of the processing elements is minimized, thus achieving a high memory-per-PE ratio. One unique feature of C•RAM is that, besides being a SIMD machine, the memory elements of a C•RAM chip can also be used as in a normal memory chip, thus possibly acting as the main memory of a computer system.
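The SIMD computing model described in this section can be sketched in a few lines. This is only a conceptual model under stated assumptions: the class name `SimdArray`, the `broadcast` method and the word-sized local memories are hypothetical, and the real C•RAM PEs operate on single bits rather than on integers. The point illustrated is that one instruction from the control unit is applied by every PE to operands in its own local memory.

```python
import operator


class SimdArray:
    """Conceptual SIMD machine: one control unit, many PEs with local memory."""

    def __init__(self, local_memories):
        # One independent local memory (a list of words) per processing element.
        self.mem = [list(m) for m in local_memories]

    def broadcast(self, op, dst, src_a, src_b):
        """Issue a single instruction to all PEs.

        Every PE applies the same operation `op` to addresses `src_a` and
        `src_b` of its own local memory and stores the result at `dst`.
        The loop stands in for what the hardware does in one parallel step.
        """
        for local in self.mem:
            local[dst] = op(local[src_a], local[src_b])


if __name__ == "__main__":
    array = SimdArray([[1, 2, 0], [3, 4, 0], [5, 6, 0]])
    array.broadcast(operator.add, dst=2, src_a=0, src_b=1)
    print(array.mem)
```

The single instruction stream is the `broadcast` call; the multiple data streams are the separate local memories.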

1.4 Thesis Outline

There are two basic ways to implement parallel fault simulation. One way is to simulate a number of faulty circuits in parallel with one common test pattern. This is called fault-parallel fault simulation. The other way is to simulate a number of test patterns in parallel for one common faulty circuit. This is called pattern-parallel fault simulation. There is a very popular fault simulation technique based on the pattern-parallel approach called the Parallel Pattern Single Fault Propagation algorithm (PPSFP) [33]. A program (simf) based on this scheme was implemented for this thesis, and its performance was compared with another efficient public domain fault simulator, named ain1.1, from the University of Victoria. The new PPSFP program was then used as a reference to evaluate our C•RAM fault simulators. Three versions of fault simulators that run on C•RAM were developed. One uses the pattern-parallel technique (simf-pps), one uses the fault-parallel technique (simf-fps), and the last one is a hybrid of the two (simf-hps). The ISCAS85 benchmark circuits were used as the CUTs [9].

The goal of this thesis is to evaluate three alternative ways to parallelize fault simulation running on the C•RAM architecture. This thesis is divided into the following chapters: Chapters 2 to 4 are background sections that introduce the SIMD architecture, fault modeling and fault simulation, respectively. The design of the programs is discussed in Chapters 5 to 8. The ISCAS file format, common data structures for each program, and the PPSFP simulator running on a sequential machine are discussed in Chapter 5. Chapter 6 presents the design of the pattern-parallel fault simulator running on the C•RAM architecture. Chapter 7 describes the fault-parallel version, and Chapter 8 the hybrid version. Finally, conclusions are made in Chapter 9.

³ A sense amplifier amplifies the weak recovered cell signal to a full-strength logical 0 or logical 1.

Chapter 2
C•RAM and SIMD machines

2.1 History

By referring to how the instruction streams and data streams are handled, computer architectures can be divided into four categories, namely, Single Instruction Single Data (SISD), Single Instruction Multiple Data (SIMD), Multiple Instruction Single Data (MISD), and Multiple Instruction Multiple Data (MIMD) [10]. SISD machines (shown in Figure 2.1) are conventional machines with a single processor executing a single instruction stream. When a number of SISD machines are put together to execute a single instruction stream on their own data, we have a SIMD architecture. A MIMD machine is one with multiple processing units executing their own instruction streams on their own data, and occasionally communicating among themselves. Finally, a MISD machine is one that executes multiple instruction streams on a single data stream. Whether this architecture exists is debatable in the computer architecture community.

Figure 2.1: SISD - Single Instruction Stream, Single Data Stream

Figure 2.2: SIMD - Single Instruction Stream, Multiple Data Stream

In the early days of computing, the majority of computers were SISD machines. To achieve higher computing power than a SISD machine could deliver, a logical way is to group many SISD machines together. In the 1950s, since each transistor was so precious, it was necessary to achieve the highest possible parallelism with the lowest hardware overhead. Under these constraints, the SIMD architecture was invented. A SIMD machine (shown in Figure 2.2) contains an array of identical processors, called the processing elements (PEs). These PEs execute the same instruction stream in parallel on their local data streams. SIMD machines have usually been extremely expensive because of the high cost of hardware, design, and programming (which requires a good compiler). As a result, the main market for these machines has, in the past, been largely limited to government research agencies.

Although a massively parallel SIMD architecture offers potentially superior performance over a conventional SISD architecture, it can generally only do so for a certain type of application. For an application to take advantage of the parallelism of a SIMD machine, the problem must be decomposed into an array of similar sub-problems. Each of these sub-problems must be mapped onto a PE of the SIMD machine so that multiple sub-problems can be executed in parallel. Not all applications can be conveniently decomposed in this way; thus, in the early days, these machines were usually used for scientific applications that can be expressed in a matrix or vector format.

Figure 2.3: MIMD - Multiple Instruction Stream, Multiple Data Stream

In a MIMD architecture (shown in Figure 2.3), each processor has its own instruction stream (instruction fetch and instruction decode) and data stream. Processors have to exchange messages (or use shared memory) among each other for such purposes as sharing intermediate results and other data. This kind of communication requires extra hardware, which was costly when transistor density was precious. It is also desirable to use the MIMD architecture because modern operating system design favors a multi-process environment, where more than one process can run in parallel. The parallel processes can be mapped onto different processors of the MIMD machine. Since it is often easier to map parallel problems into parallel processes, this type of architecture has become more popular [%].

SIMD architecture designers took advantage of advances in VLSI technology. In earlier days, SIMD machines were built using processors and commodity memory modules. The main drawback of this design practice is that memory access is relatively slow because both the processor and the memory chips have to use an external data bus to communicate. To solve this problem, several manufacturers have designed single-chip SIMD machines. One example of such a design is Texas Instruments' Serial Video Processor (SVP) [5]. The SVP chip was designed as a video signal processor for processing digital TV signals. It contains a serial input port and a serial output port, 1024 PEs, and 256 bits of memory per PE. Since the processing elements and the memory both reside in the same chip, the memory access time is greatly reduced. Other chips include NEC Corporation's Integrated Memory Array Processor (IMAP) [33] and Parallel Image-Processing RAM (PIP-RAM) [1]. These chips were designed to target digital image processing. Another chip that integrates processor and memory is the Processor-In-Memory (PIM) chip designed at the Supercomputer Research Center [12].

Figure 2.4: The C•RAM Architecture

2.2 The C•RAM Architecture

Having described trends in the computing industry with respect to SIMD architectures, it is time to discuss the design that will be used in this research project, namely the Computational RAM (C•RAM) architecture [7]. As shown in Figure 2.4, the C•RAM architecture is essentially a DRAM chip with Processing Elements attached to the sense amplifiers at the bottom of the memory columns to provide parallel processing. Traditionally, SIMD machines differ in the design of the PEs, the communication network, and the memory structure. Hence the C•RAM architecture will be described in terms of these three parameters.

Figure 2.5: C•RAM PE Structure
[Diagram labels: write-enable register, shift left, shift right, global instruction bus, broadcast bus.]

2.2.1 The Processing Element (PE)

There are a variety of SIMD PE designs. One basic difference is the word size of each PE. NEC Corporation's Integrated Memory Array Processor (IMAP), for example, uses an 8-bit processing element [33]. The chip contains 64 such processing elements with 2 Mb of memory. Each processing element in this design consists of an 8-bit arithmetic logic unit (ALU), an 8-bit shifter, 12 registers and a 4-bit exchange unit, which adds up to about 7000 transistors. This is an example of a complex SIMD PE design.

The C•RAM PE architecture, on the other hand, is an extremely simplified SIMD design. As shown in Figure 2.5, a C•RAM PE consists of an 8-to-1 multiplexer (MUX) and three single-bit registers: X, Y, and write-enable. The write-enable register is used as a mask so that a subset of PEs can write back to their local memory while the local memory of the other PEs is not affected. In normal operation, every PE's write-enable register is set to logical 1. The 8-to-1 MUX, using the X register, the Y register, and the selected memory¹ as input selects², serves as an ALU. With this configuration, a 3-variable truth table can be treated as an opcode³, and thus an instruction set with 256 opcodes is implemented. This PE design is very simple, thus requiring little chip area. For a given die size, we can implement more PEs or more memory compared to complex PE designs.
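The idea of treating a truth table as an opcode can be sketched in a few lines of C++. This is only an illustration and not part of the C•RAM library: the function name pe_alu, the constant names, and the ordering of the select lines (X as the most significant select, the memory bit as the least significant) are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical model of one PE's ALU: an 8-to-1 multiplexer whose eight data
// inputs are the bits of the opcode and whose three select lines are the
// X register, the Y register and the selected memory bit.
// Assumption: X is the most significant select line, memory the least.
inline bool pe_alu(std::uint8_t opcode, bool x, bool y, bool mem) {
    const unsigned select = (x ? 4u : 0u) | (y ? 2u : 0u) | (mem ? 1u : 0u);
    return ((opcode >> select) & 1u) != 0;
}

// Example opcodes under the assumed ordering. Bit i of the opcode is the
// ALU output for select value i, so each opcode is simply a truth table.
constexpr std::uint8_t OP_X_AND_Y   = 0xC0;  // 1 for selects 6 and 7 (x=1, y=1)
constexpr std::uint8_t OP_X_XOR_MEM = 0x5A;  // 1 for selects 1, 3, 4 and 6
constexpr std::uint8_t OP_MEM       = 0xAA;  // passes the memory bit through
```

Because every 8-bit value is a valid truth table over three inputs, all 256 opcodes are meaningful, which matches the instruction count given above.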

2.2.2 The Communication Network

It is often necessary to move data among PEs through a communication network. Traditionally the communication network has been a big concern of SIMD manufacturers. For example, the MasPar MP-1 architecture provides routing nodes to route data between PEs in data packets (Multistage Crossbar Interconnect) in addition to the basic nearest neighbor communication mechanism (X-Net Mesh Interconnect) [2]. Another classical design, Thinking Machines' CM-2, implements a Hypercube Interconnection Network with a maximum distance of 12 hops between any two of the 4096 PEs [29]. These complicated communication mechanisms are aimed at reducing the communication time between PEs for arbitrary PE-generated destinations.

In other SIMD designs, the design of the communication network has been greatly simplified. In the SVP chip, each PE is capable of communicating directly with its nearest neighbors and the nearest neighbors' nearest neighbors [5]. In other words, each PE is capable of communicating directly with its four nearest neighbors in a one-dimensional array. This kind of design is especially useful when the chip is used for video signal processing, where digital filters may be applied to the incoming video signal for image enhancement, such as gamma correction [5].

¹ For each C•RAM operation, a row of memory is selected.
² The input selects of a multiplexer determine which input signal appears at the multiplexer output.
³ An opcode, short for operation code, is a machine instruction.

Other designs, such as the Terasys, implement more complicated communication mechanisms [l?]. The designer of this chip provides five ways for inter-processor communication, including linear interconnection, global-OR, partitioned-OR, parallel prefix networks (PPN), and data movement through the host processor. The design of the communication network in C•RAM is very similar to that of PIM, except that C•RAM does not have either the PPN or the partitioned-OR implemented. The designer of C•RAM, Duncan Elliott, believes that a segmented wired-OR bus can emulate the PIM PPN without loss of parallelism.

The global-OR mechanism ties the output of each PE's ALU to a bus in a wired-OR fashion. In other words, the outputs of the ALUs can be logically ORed together with a single instruction. This network is very powerful for algorithms that need to compare values between all the PEs, such as the problem of finding the minimum among all PEs, in which case an algorithm of order O(log2(n)) could be used [?O]. An alternative implementation of this network is the global-AND network⁴. In our research, the global-AND implementation was used.

Another communication mechanism, as mentioned earlier, is linear interconnect. Linear interconnect connects each PE with its two nearest neighbors in a one-dimensional array. This design is the same as the SVP's except that the SVP connects the four nearest neighbors instead of two. The linear interconnection network is useful for linear array algorithms that repeatedly require data from their left and right neighbors, such as digital filtering. The leftmost and the rightmost PEs have only one neighbor, and these PEs can be configured to communicate with PEs in another C•RAM chip. A bank of C•RAM chips can thus be connected in a two-dimensional fashion with very little extra hardware.

The last type of communication mechanism involves moving data to and from C•RAM memory using the host processor. Moving data via this method is relatively slow as it doesn't take advantage of the high data bandwidth internal to the C•RAM chip. However, the data transfer rate is identical to reading one or more words from ordinary memory and then writing them back into another memory location.
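As an illustration of why a global bus is useful for comparisons across all PEs, the following C++ sketch finds the minimum of one value per PE with a single wired-AND bus operation per bit of the operand. It is a software model written for this example, not the algorithm of the cited reference: the function names, the use of std::vector to stand in for the PE array, and the convention that inactive PEs drive a 1 onto the bus are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of the wired-AND bus: it reads 1 only if every PE drives a 1.
inline bool wired_and(const std::vector<bool>& pe_outputs) {
    for (bool bit : pe_outputs) {
        if (!bit) return false;
    }
    return true;
}

// Bit-serial minimum search, scanning from the most significant bit down.
// values[pe] is the operand held by each PE; width (at most 32) is its size
// in bits. One bus operation is issued per bit, independent of the PE count.
inline std::uint32_t global_minimum(const std::vector<std::uint32_t>& values,
                                    unsigned width) {
    std::vector<bool> active(values.size(), true);   // PEs still candidates
    std::vector<bool> driven(values.size(), true);   // what each PE drives
    std::uint32_t minimum = 0;
    for (unsigned step = 0; step < width; ++step) {
        const unsigned bit = width - 1 - step;
        for (std::size_t pe = 0; pe < values.size(); ++pe) {
            const bool value_bit = ((values[pe] >> bit) & 1u) != 0;
            // Assumption: inactive PEs drive 1 so they cannot pull the bus low.
            driven[pe] = !active[pe] || value_bit;
        }
        if (wired_and(driven)) {
            minimum |= (1u << bit);  // every candidate has a 1 in this position
        } else {
            // Some candidate has a 0 here, so candidates with a 1 drop out.
            for (std::size_t pe = 0; pe < values.size(); ++pe) {
                if (active[pe] && ((values[pe] >> bit) & 1u) != 0) active[pe] = false;
            }
        }
    }
    return minimum;
}
```

In this sketch the deactivation step plays the role that a mask register such as write-enable could play on real hardware; with an empty input every bus read is 1, so the function returns all ones in the low width bits.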

⁴ The bus equals 1 if and only if all PEs output a 1.

2.2.3 Memory Structure

Memory structure is another important part of SIMD architectures. Two issues related to memory design are discussed here. The first is shared memory. In most SIMD architectures, each PE has its own data area called local memory. Shared memory is a data area that more than one PE can access directly. Some architectures, such as the MasPar MP-1, implement shared memory as one way to communicate data between PEs [2]. Implementing shared memory involves resolving conflicts when more than one PE tries to access the same memory area, thus increasing the complexity of the PE.

Another issue concerning memory is whether or not PEs should access their local memory with indirect addressing. By indirect addressing, we mean that during a local memory access cycle, each PE can access its local memory with a different offset. For example, during a memory access cycle, PE no. 1 may be accessing data located at the 5th bit of its local memory, while PE no. 2 accesses the 20th, and PE no. 3 accesses yet another memory location. Indirect addressing is very useful in terms of programming flexibility; however, it would also imply a more complicated memory structure where each bank of local memory has its own addressing hardware.

C•RAM aims at providing the highest number of memory and processing elements within a chip, while providing enough features for implementing parallel algorithms with reasonable efficiency. With this aim in mind, the memory structure was kept simple, and neither shared memory nor indirect addressing was implemented. A regular DRAM (Dynamic Random Access Memory) array is used as the local memory block for the PEs. The local memory of each PE can span several columns of DRAM memory. In one particular design, for example, a PE is inserted for every four columns of DRAM cells, adjacent to the column sense amplifiers [Y]. The local memory for each PE is thus composed of four columns from the DRAM cell array.

2.3 The C•RAM Programming Model

There are basically two programming models for the C•RAM machine (see Figure 2.6). The first, the bit-parallel model, is to combine multiple PEs to form a processor with a bigger word size. For example, eight PEs can be grouped together as an 8-bit processing unit. In this bit-parallel model, an 8-bit word is stored horizontally, occupying one memory cell in the local memory of each PE (of the grouped processing unit). The vector quantized image compression algorithm was programmed this way, with a modified C•RAM architecture [?O]. Another programming model, the bit-serial model, is to view each PE individually as a single-bit processor. In this model, an 8-bit word is stored vertically in one PE's local memory, occupying 8 memory cells per PE. Figure 2.6 demonstrates the difference between these two models. In this research, we are going to use the bit-serial programming model.

Figure 2.6: C•RAM Programming Models
[Diagram: representation of a byte in the bit-parallel model (8-bit processors) and in the bit-serial model.]

The C•RAM programming environment is a C++ library that contains a C•RAM emulator [7]. This emulator not only emulates the functionality of the hardware but also provides timing information. Three C•RAM classes are defined in this library for accessing the C•RAM. They are the cboolean (C•RAM Boolean), the cint (C•RAM integer), and the cuint (C•RAM unsigned integer). The cboolean class defines a single-bit variable that allocates one bit in every PE's local memory. A high-level flow control statement, named cif, is defined to take advantage of the write-enable register for conditional execution using a cboolean variable. The cint and cuint classes are basically the same except that one is signed and the other is unsigned. These classes are used to define multiple-bit variables which occupy multiple bits in each PE's local memory. A library of high-level arithmetic operations for cint and cuint, such as addition and subtraction, is provided to ease C•RAM programming. Another high-level construct is array subscripting, which allows an individual PE to be addressed. For example, A[20] = 5 (where A is a cint or cuint) loads the A variable of the PE with index 20 (the index starts at 0) with a value of 5.

These high-level library functions were built using the C•RAM assembly language. C•RAM assembly language also allows programmers to emulate the C•RAM directly. Use of these low-level operations can result in optimizations that permit faster execution. A variable declaration and a sample C•RAM assembly statement are shown as follows:

    cint b(size);
    b.operate(offset, opcode, output_mode [, flag]);

where the variable b is declared as a cint with size being the number of bits for the integer. For example, if size equals 8, then 8 bits of memory per PE are allocated to this variable. The statement b.operate is the assembly operation that tells the C•RAM to perform a specific instruction. The operation requires three parameters plus an optional one, as shown above. The offset parameter specifies a bit in b, which corresponds to a row of memory cells. Thus if b's size is 16 bits, then offset can be anywhere from 0 to 15. This bit is fetched from each PE's local memory and is fed into the ALU as one of the three input selects. The other two input selects are the contents of the X and Y registers. The second parameter, opcode, is an 8-bit word corresponding to one of the 256 C•RAM instructions. This 8-bit word is used as the data input vector of the 8-to-1 MUX ALU. The last parameter, output_mode, instructs the C•RAM where the ALU output should be stored. Possible output modes include the X register, the Y register, memory (specified by offset), the write-enable register, the left neighbor, the right neighbor, and the wired-AND network. In cases where the wired-AND network is chosen, the optional parameter flag, which is a variable of Boolean type, must be specified so that the wired-AND result can be stored. When supplying a flag, its value should typically be set to true; otherwise the returned result will always be false (because the value of the flag is also ANDed with the bus's state).

Figure 2.7: Memory Bandwidth Throughout the System
[Chart: bandwidth by architecture location. Internal to memory chips: at sense amps (2.9 TB/s), at column decode (49 GB/s). At the memory chip pins (6.2 GB/s). System bus (190 MB/s). Cache-CPU (800 MB/s).]
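The semantics of the operate statement described above can be sketched as a small software model. The code below is not the C•RAM library: the PE struct, the OutputMode enumeration, the free function operate and the ordering of the select lines are hypothetical names and choices made for this illustration, and the left-neighbor and right-neighbor output modes are left out to keep it short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output modes; the neighbor modes are omitted in this sketch.
enum class OutputMode { X, Y, Memory, WriteEnable, WiredAnd };

// Hypothetical model of one PE: three single-bit registers plus local memory,
// stored as one bit per memory row.
struct PE {
    bool x = false;
    bool y = false;
    bool write_enable = true;
    std::vector<bool> memory;
};

// Broadcasts one instruction to every PE. The return value is meaningful only
// for OutputMode::WiredAnd, where it is the bus state ANDed with the flag.
inline bool operate(std::vector<PE>& pes, std::size_t row, std::uint8_t opcode,
                    OutputMode mode, bool flag = true) {
    bool bus = flag;  // a false flag forces the wired-AND result to false
    for (PE& pe : pes) {
        // Assumption: X is the most significant select line, memory the least.
        const unsigned select =
            (pe.x ? 4u : 0u) | (pe.y ? 2u : 0u) | (pe.memory[row] ? 1u : 0u);
        const bool result = ((opcode >> select) & 1u) != 0;
        switch (mode) {
            case OutputMode::X:           pe.x = result; break;
            case OutputMode::Y:           pe.y = result; break;
            case OutputMode::WriteEnable: pe.write_enable = result; break;
            case OutputMode::WiredAnd:    bus = bus && result; break;
            case OutputMode::Memory:
                // Write-enable masks the write-back to local memory.
                if (pe.write_enable) pe.memory[row] = result;
                break;
        }
    }
    return bus;
}
```

With the assumed select ordering, opcode 0xAA passes the selected memory bit through unchanged, so operate(pes, row, 0xAA, OutputMode::WiredAnd) reports whether that row holds a 1 in every PE.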

2.4 Advantages and Disadvantages of C•RAM

Many SIMD machines in computing history have provided tremendous processing power. However, they are usually very costly. The primary advantage of the C•RAM architecture is the availability of cheap massive parallelism. The PE of C•RAM is so simple that it can be implemented using a single layer of metal at the circuit level [7]. This property is also possible in the DRAM. Since a large portion of the C•RAM circuit is DRAM, C•RAM could be manufactured in a DRAM fabrication facility using a DRAM process. These facilities usually produce high volumes of chips at relatively low cost, and C•RAM, being mostly memory, benefits from the low cost of memory.

Despite its low cost, C•RAM also provides high computing power. Since both the processing elements and memory reside in the same chip, the time for driving data signals along wires (the bit-lines) is minimized. When accessing memory, usually a row of memory is selected from the 2D memory array, then a small group of columns is selected, and the data is then sent to the output. The bandwidth available from the rest of the columns not selected is wasted. The C•RAM architecture avoids this waste by having its PEs located at the tips of the column sense amplifiers. The data items in one selected row are sent to their respective PEs, and parallel operations can make use of these data. The data bandwidth internal to the C•RAM is compared to the data bandwidth at various locations in a microprocessor system in Figure 2.7 [7].

Another advantage of using C•RAM arises from its programming kit. The kit is built using the C++ programming language, which is very popular in today's computing industry; thus the learning curve of programming C•RAM is not as steep as that of most other SIMD machines. Since the C•RAM architecture is pretty much self-contained, it can be used both in embedded applications and as the main memory of workstations or personal computers. Combining these factors, C•RAM provides a convenient platform for scientific research as well as practical solutions to commercial problems.

C•RAM is not perfect. One main disadvantage is that the communication network supported is limited. Besides broadcasting data using the global data bus, each PE can only communicate with its nearest neighbors. If each PE has to pass a byte (8 bits) of data to its 4th neighbor to the right, then a total of 32 instructions (8 bits, each shifted 4 times) are needed to accomplish this task. This cost affects our choices of algorithms when designing C•RAM applications. The single-bit data output is another bottleneck.

The fact that C•RAM supports only global memory addressing is another disadvantage. With this limitation, we cannot implement look-up tables efficiently. This again limits our choices of algorithms.

Chapter 3

Fault Models

3.1 Introduction

During the integrated circuit manufacturing process, physical defects are introduced as a result of contaminants and manufacturing flaws. These defects include short circuits, open circuits, incorrect component size, defective transistors, etc. Each of these defects affects the circuit functions in various ways. For example, if a 2-input OR gate contains a defective transistor, the net result could be that the gate output responds only very slowly to changes in input signals, or that it always gives a logical 0 as output. The number of ways that defects could cause a circuit to behave incorrectly is enormous, and it is thus very difficult to design tests to target all the possible defects in a circuit. An alternative way is to group the possible defects into classes of similar behaviours called fault models. For example, it is observed that many kinds of defects cause a circuit node to be logically stuck at 0, so a logical stuck-at 0 fault on this circuit node is used to represent these defects. The stuck-at fault model models the effects of all the defects that cause any circuit node to appear as if it is stuck at a logical 0 or logical 1. If a test sequence is capable of detecting all the stuck-at faults, then it is also capable of detecting all the physical defects that cause a stuck-at fault.

The fault models of interest in this research are the stuck-at fault model, the gate-delay fault model, and the CMOS transistor stuck-open fault model. Each of these fault models defines a number of faults equal to approximately the number of wires in a circuit. In reality, multiple faults can co-exist in a single circuit; however, if a circuit has n wires, the number of possible combinations of multiple faults is approximately 3^n, an extremely large number. It is nearly impossible to simulate all combinations of multiple faults in a real circuit. In [li] it was shown that a test that detects all the single stuck-at faults could detect more than 99.9% of multiple stuck-at faults. Although similar research for the other 2 models has not been conducted, we believe that the single fault assumption for the other 2 fault models should be equally effective. Therefore in this research, we assume that any faulty circuit contains exactly one fault of one type.

Figure 3.1: Node Types
[Diagram: (a) non-fanout node, 1 wire in 1 node; (b) fanout node, 4 wires in 1 node; the legend distinguishes wires from nodes.]

The rest of this chapter introduces the three fault models used in this research, including the behavior of the faults being modeled, the conditions for triggering these faults, and ways to reduce the number of faults that need to be considered explicitly.
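The gap between single-fault and multiple-fault simulation can be made concrete with a little arithmetic. The sketch below assumes the stuck-at model, in which each wire is independently fault-free, stuck at 0 or stuck at 1; under that assumption a circuit with n wires has 2n single faults and 3^n - 1 faulty combinations in total. The function names are hypothetical and chosen only for this example.

```cpp
#include <cstdint>

// Single stuck-at faults: every wire can be stuck at 0 or stuck at 1.
inline std::uint64_t single_stuck_at_faults(unsigned wires) {
    return 2ull * wires;
}

// Multiple stuck-at faults, assuming each wire is independently fault-free,
// stuck at 0 or stuck at 1. The all-fault-free combination is excluded.
// The result fits in 64 bits only for wires <= 40.
inline std::uint64_t multiple_fault_combinations(unsigned wires) {
    std::uint64_t power_of_three = 1;
    for (unsigned i = 0; i < wires; ++i) power_of_three *= 3;
    return power_of_three - 1;
}
```

For a circuit with only 10 wires this already gives 20 single faults against 59048 faulty combinations, which is why the single fault assumption is adopted.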

3.2 Stuck-At Faults

The most widely used fault model is the stuck-at model. The stuck-at fault model assumes that each piece of wire in the circuit has the possibility of having a stuck-at 1 fault or a stuck-at 0 fault. A wire is not necessarily the same as a circuit node. For example, a fanout node contains at least three pieces of wire: the stem wire that comes out of a gate output, and the two or more fanout wires that lead to other gate inputs. Since each circuit wire can be stuck at either logic 1 or 0, the number of possible stuck-at faults in a circuit is equal to two times the number of wires in the circuit. The difference between wires and nodes is illustrated in Figure 3.1.

In order to detect a stuck-at 1 (0) fault for a piece of wire, the test sequence has to first force the wire to a logical 0 (1). This is called triggering the fault. The erroneous output must then be propagated to at least one observing point. If this fault affects any of the observing points, then this fault is detected. So, triggering a fault is a precondition of detecting it.

Figure 3.2: Inverter (NOT gate) with Possible Stuck-At Faults

NOT (input X, output Z)
Fault     Input X   Good Z   Faulty Z
SA0(X)    1         0        1
SA1(X)    0         1        0
SA0(Z)    0         1        0
SA1(Z)    1         0        1

Table 3.1: Triggering Conditions for Stuck-At Faults Affecting NOT Gates (Inverters)

Fault collapsing is the process of reducing the number of faults by finding faults that are equivalent and then only considering one fault in each fault equivalence class. Two faults are said to be equivalent if they can't be distinguished during testing. For example, the NOT gate in Figure 3.2 has two stuck-at faults at its input X, named SA0(X) and SA1(X), and two stuck-at faults at its output Z, named SA0(Z) and SA1(Z). From Table 3.1, we see that in order to trigger SA0(Z), the test sequence has to force input X to logical 0, and output Z is expected to produce a logical 1. This condition is the same as that for triggering SA1(X). Both of these faults would force Z to an erroneous logical 0. When the fault is detected, we simply can't tell whether the fault is a SA1(X) or a SA0(Z); therefore, we say that they are equivalent. Since the two faults are equivalent, only one of them needs to be considered for test pattern generation and fault simulation. In other words, we collapse the two faults into one. In fact, for the stuck-at fault model, we can say that any two faults that have the same triggering conditions are equivalent, and thus can be collapsed.

Figure 3.3: Fault Collapsing for the Stuck-At Fault Model
[Diagram: a NAND gate and a NOT gate before fault collapsing; both gates collapse inputs: 4 faults collapsed; NAND collapses inputs, NOT collapses output: 3 faults collapsed.]

The triggering conditions for the faults of the other basic logic gates (BUFF, AND, OR, NAND, NOR) are listed in Tables 3.6 to 3.10 from [31]. Note that for inverters and buffers, the choice of collapsing the fault at the input or at the output is not significant, whereas for the other 4 basic gates, it's always more advantageous to collapse the faults at the gate input because more faults can be collapsed. It is even more advantageous if we collapse the stuck-at faults for inverters and buffers in the same way as the other basic gates because of the rippling effect of collapsing through a gate network. These effects are shown in Figure 3.3.
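Equivalence at a single gate can be checked mechanically by comparing the faulty truth tables. The following C++ sketch does this for the six stuck-at faults of a 2-input NAND gate whose output is assumed to be directly observable. The type and function names are hypothetical, and the sketch illustrates the idea only; it is not the collapsing procedure used by the fault simulators in this research.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class Site { X1, X2, Z };

struct Fault {
    std::string name;
    Site site;
    bool stuck_value;
};

// Output of a 2-input NAND gate, optionally with one stuck-at fault injected.
inline bool nand_output(bool x1, bool x2, const Fault* fault) {
    if (fault && fault->site == Site::X1) x1 = fault->stuck_value;
    if (fault && fault->site == Site::X2) x2 = fault->stuck_value;
    bool z = !(x1 && x2);
    if (fault && fault->site == Site::Z) z = fault->stuck_value;
    return z;
}

// 4-bit truth-table signature: bit i is the output for inputs i = (x1 << 1) | x2.
inline std::uint8_t signature(const Fault* fault) {
    std::uint8_t bits = 0;
    for (unsigned i = 0; i < 4; ++i) {
        const bool x1 = (i & 2u) != 0;
        const bool x2 = (i & 1u) != 0;
        if (nand_output(x1, x2, fault)) bits |= static_cast<std::uint8_t>(1u << i);
    }
    return bits;
}

// Groups the faults by faulty function. Faults in the same group cannot be
// told apart at the gate output and are therefore equivalent.
inline std::map<std::uint8_t, std::vector<std::string>> equivalence_classes() {
    const std::vector<Fault> faults = {
        {"SA0(X1)", Site::X1, false}, {"SA1(X1)", Site::X1, true},
        {"SA0(X2)", Site::X2, false}, {"SA1(X2)", Site::X2, true},
        {"SA0(Z)",  Site::Z,  false}, {"SA1(Z)",  Site::Z,  true},
    };
    std::map<std::uint8_t, std::vector<std::string>> classes;
    for (const Fault& f : faults) classes[signature(&f)].push_back(f.name);
    return classes;
}
```

In this model SA0(X1), SA0(X2) and SA1(Z) all turn the gate into a constant 1, so they fall into one class and the six faults reduce to four.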

3.3 Gate-Delay Faults

Some physical defects cause a gate or wire to respond more slowly than expected. There are two types of faults in this so-called gate delay fault model, namely slow-to-rise (SR) and slow-to-fall (SF) [32]. If the response is too slow when the signal changes from logical 0 to 1 (rising), then it is a SR fault. On the other hand, if the signal is too slow when it changes from 1 to 0 (falling), it is a SF fault. For a signal change to be slow enough to cause a fault, the correct signal value must fail to appear before the arrival of the next clock signal. We assume that the circuit under consideration is synchronous, although we will only consider the effects of physical defects in combinational logic.

Figure 3.4: Gate-Delay Faults Associated with 2-input NAND Gates
[Diagram labels include the clock signal CLK.]

There is another popular fault model, called the path delay fault model, which also deals with signals that change more slowly than expected. However, this model assumes there are a number of gates which are slower than normal gates, yet their effect is only significant when a signal propagates through all these gates; in other words, the effect is only significant when numerous slow responses are accumulated. The path delay fault model is useful for modeling the effects of silicon process variations that affect many gates only slightly. Since there are very many possible different signal propagation paths in a circuit, it is difficult to simulate all of these paths for path delay faults. As a result, this fault model was not implemented in this research.

Two consecutive test patterns are required to trigger a gate-delay fault. In order to trigger a slow-to-rise fault, the first test pattern has to set the wire in question to logical 0. Then the second test pattern must attempt to set it back to a logical 1. If the wire is slow-to-rise, then it will stay at a logical 0 when the next clock cycle starts. If this fault effect propagates to any of the observing points, then the fault is detected. Similarly, slow-to-fall faults can be triggered by setting the wire to 1 with the first pattern, and then to 0 with the second pattern. Each wire in the circuit can be either slow-to-rise or slow-to-fall, therefore the number of gate-delay faults in the circuit equals twice the number of wires before fault collapsing.

The fault collapsing rules for the gate-delay fault model are different from those of the stuck-at fault model. Let's take a 2-input NAND gate as an example (see Figure 3.4). Table 3.2 lists the triggering conditions for all the gate delay faults associated with a 2-input NAND gate. In this table, Z denotes the output signal for the first input vector while Z+ denotes the same output signal for the second input vector. The lower case 'x' denotes a don't care value.

NAND (inputs X1 X2, output Z)
Fault     First X1X2   Second X1X2   Good ZZ+   Faulty ZZ+
SR(X1)    0x           11            10         11
SF(X1)    1x           01            x1         x0
SR(X2)    x0           11            10         11
SF(X2)    x1           10            x1         x0
SR(Z)     11           (!1)          01         00
SF(Z)     (!1)         11            10         11
(!1) = not all ones

Table 3.2: Triggering Conditions for Gate-Delay Faults Affecting a 2-input NAND Gate

The table entries imply that there are no two faults that have exactly the same triggering condition; however, any test pattern that detects the SF(Z) fault (slow-to-fall on output signal Z) also detects either SR(X1) or SR(X2). In other words, when a SF(Z) fault is detected, we can't be sure whether it is really a SF(Z) or a SR at one of the inputs. We cannot collapse the SR faults at the inputs because the detection of SF(Z) does not guarantee the detection of all the SR faults at the inputs, whereas detecting any of the SR faults at the inputs would guarantee the detection of SF(Z). As a result, the SF(Z) fault can be collapsed, and the NAND gate is left with only 5 gate-delay faults.

Similar fault collapsing rules can be used to reduce the number of gate delay faults from the other standard logic gates. Triggering conditions for the other basic logic gates can be found in Tables 3.5 to 3.10. Note that again, for inverters and buffers, as with stuck-at faults, we have the option of collapsing the delay faults at either an input or the output. In order to have the minimum number of faults after collapsing, we should choose the same collapsing direction for all the gate-delay faults. Figure 3.5 shows the advantage of collapsing the faults in one direction. We should also note that the preferred collapsing direction for the gate-delay fault model (collapsing the outputs) is different from that of the stuck-at fault model (collapsing the inputs). This is not significant if we do not take advantage of the relationships between different fault models, which will be discussed in Section 3.5.

Figure 3.5: Fault Collapsing for the Gate-Delay Fault Model
[Diagram: a NAND gate and a NOT gate before collapsing; both gates collapse output: 3 faults collapsed; NAND collapses output, NOT collapses inputs: 2 faults collapsed.]
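The relationship between SF(Z) and the input SR faults can be verified exhaustively for a single NAND gate. The sketch below uses a simplified delay model that is an assumption of this example: a slow signal keeps its first-pattern value when the second pattern asks for the slow transition, and the gate output is taken as the observing point. All names are hypothetical.

```cpp
// Hypothetical identifiers for the six gate-delay faults of a 2-input NAND.
enum class DelayFault { SR_X1, SF_X1, SR_X2, SF_X2, SR_Z, SF_Z };

inline bool nand2(bool a, bool b) { return !(a && b); }

// Value seen after the second pattern on a possibly slow signal: it keeps its
// old value only when it was asked to make the slow transition.
inline bool delayed(bool before, bool after, bool slow_to_rise) {
    const bool rising = !before && after;
    const bool falling = before && !after;
    return (slow_to_rise ? rising : falling) ? before : after;
}

// True if the pattern pair (a1 a2) followed by (b1 b2) exposes the fault at Z.
inline bool detects(DelayFault fault, bool a1, bool a2, bool b1, bool b2) {
    const bool good = nand2(b1, b2);
    bool faulty = good;
    switch (fault) {
        case DelayFault::SR_X1: faulty = nand2(delayed(a1, b1, true), b2); break;
        case DelayFault::SF_X1: faulty = nand2(delayed(a1, b1, false), b2); break;
        case DelayFault::SR_X2: faulty = nand2(b1, delayed(a2, b2, true)); break;
        case DelayFault::SF_X2: faulty = nand2(b1, delayed(a2, b2, false)); break;
        case DelayFault::SR_Z:  faulty = delayed(nand2(a1, a2), good, true); break;
        case DelayFault::SF_Z:  faulty = delayed(nand2(a1, a2), good, false); break;
    }
    return faulty != good;
}

// Checks all 16 pattern pairs: whenever SF(Z) is detected, SR(X1) or SR(X2)
// must be detected as well.
inline bool sf_z_implies_an_input_sr() {
    for (int p = 0; p < 16; ++p) {
        const bool a1 = (p & 8) != 0, a2 = (p & 4) != 0;
        const bool b1 = (p & 2) != 0, b2 = (p & 1) != 0;
        if (detects(DelayFault::SF_Z, a1, a2, b1, b2) &&
            !detects(DelayFault::SR_X1, a1, a2, b1, b2) &&
            !detects(DelayFault::SR_X2, a1, a2, b1, b2)) {
            return false;
        }
    }
    return true;
}
```

For the pattern pair 01 followed by 11, this model reports SF(Z) and SR(X1) as detected but not SR(X2), which is the reason the input SR faults are kept and SF(Z) is the one that is collapsed.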

3.4 Stuck-Open Faults

The third fault model used in this research is the C710S tranststor stuck-open fault model. This fault mode1 is specific to ClIOS technology. whereas the two earlier fault models are technolog. independent. The CUOS transistor stuck-open fault model. often simplj- called the stuck-open model. targets problems that cause the source to drain channel within C'iIOS transistors to be perrnanentlj- opened. In CMOS technology. transistors corne in two tupes. namely the S-transistor and P-t ransistor. Mien an S-transistor channel is open circuited. the fault is called an S-transistor open fault (SO):on the other hand. an open circuited P-transistor is called a P- transistor open fault (PO). In ClIOS technology. P-transistors are responsible for driving a gare output signal N-cype P-type Transistor Transistor

Figure 3.6: -4 ?-input SAXD Gate rvith Stuck-Open Fault from Loir to high. rvhile S-transistors are used for driving signals from high to low. See Figure 3.6 for the schemat ic of a ClIOS ?-input S.-\SDgate. If a P-transistor channel in a logic gate is open circuited. then for some input vector sequences. the out put signal will st a? low erroneous1~-. Since stuck-open faults are associated \ri t h transistors. before fault collapsing. the number of stuck-open faults in a circuit equals the number of transistors in the circuit. For SOT. TOR and SASD gates. the number of transistors for a logic gate is equal to twice the number of gate inputs. .ASD gates. OR gates and buffers are special cases since these components can have different possible implementations in a C'lIOS en\-ironment. In this research. these gates are assumed to be constructed from the primitive SOT. SOR and S.\SD gates. Table 3.3 lists the rriggering conditions for al1 the stuck-open faults of a ?-input SASD gate. As shown in Figure 3.6. the S-transistors of the XASD gate are connected in series while the P-transistors are connected in parallel. Since the 5- transistors are in series. if an>-of the Y-transistor fails. the output signal cannot be driven lotv under an? circumstances. From the testing point of view. al1 ive know is that there is a failing S-transistor in the gate but we cannot know which one is actually failing. Therefore. the SO faults of a SASD gate are equimlent and can be collapsed into a single ?;O fault. The same holds true for an- series connection of S.ASD (input S1 S2. output 2) / Fault 1 First Second Good Faulty / ! S1S" sis2 ZZf ZZ' j SO(S1) (!1) 11 10 11 PO(S1) 11 01 O1 O0 YO(X2) (!1) 11 10 Il PO(X2) 11 10 O1 O0 (! I ) = not al1 ones

Table 3.3: Trriggering Conditions for Gate-Dela? Faults Affecting a 2-input SAND Gate

Figure 3.7: Comrnon Structure of ClIOS ASD Gates and OR Gates transistors in an)- logic gates. The fault tables for SOT and SOR gates are shown in Tables 3.5 and :3.9. respect ive/?: -4s mentioned above. ASD gates. OR gates and buffers are assumed to be con- structed using SOT. SOR and S;\SD gates (see Figure 3.7). The fault table for a %input ;\SD gate is as shown in Table 3.1. The 1- columns of the table indicate the intermediate signal between the SAXD gare and the SOT gate. Thus SC)(\') and PO()') are called the interna1 stuck-open faults. As seen in the table. the trig- gering conditions for SO(S1). SO(S2) and PO()-) are identical (except for the 1- colurnns). The three faults are therefore equivalent. and can be collapsed into a sin- gle fault. Also note that the triggering condition for PO(\-)is similar to SR(Z). while SO(1') is similar to SF(Z). Ident ical triggering conditions imply equi\alent faults. However. since the)- are faults in different fault models. collapsing them would result in an incorrect fault coverage calculation' for at least some of the fault types. Fault

'The cornnion way of calculating fault coverage is to divide the number of faults detected by the

2'1 ASD (input SI S2. interniediate Y. output Zj Fault Firsr Second Good Good 1 Faiilty i S1S2 SIS2 - zz+ 1 ZZ' i O (!il 1 11 i 10 1 01 j 00 1

1 1 (! 11 = not al1 ones

Table 3.4: Triggering Condit ions for St uck-Open Faults Affect ing ?-input ASD Gate equivalence among fault niodels tdl be discussecl in the nest section. Similar fault collapsing rules are used for OR gates and huffers. The faiilt tables for the OR gate is listed in Table 13.10. The fault table for the biifler is listecl in Table 3.6-

3.5 Fault Implication and Equivalence Among Fault Models

In the pre\-ious t tiree sections. t lie faiilt niodels iised in t liis researcli arid t lieir laiilt collapsing rules ivere int roduced. i\'lien usine t liese t hree nioclels toget lier in a fault simulation. the fault simulation algorithni can he made much more efficient b!- con- sidering fn ult irrrplic(rt ions and fault rquicdeiice arnong the faiilt niodels. Recog~iizing and esploiting these relationsliips rediices the anioiirit of effort reqiiirecl in fatilt sini- iilatiori. Fault irriplicntion is the one-ivaj- relatioiisliip bettveen faults wliere if a faiilt .\I is detected. then fault .\. niust he detected. ivhere the opposite is not neceisarily true.

In this case, we say that fault M implies fault N. For example, in order to trigger a slow-to-rise gate-delay fault on the input of an inverter, a first test pattern has to force the input to logic 0, and a second pattern must force the input to logic 1. On the other hand, to trigger a stuck-at 0 fault on the input of an inverter, a test pattern has to force the input to logic 1. Since both faults require a test pattern to force the input of the inverter to 1, if the slow-to-rise fault is detected, it must be true that the stuck-at 0 fault is also detected. In other words, the slow-to-rise fault implies the stuck-at 0 fault. The reverse implication is not true because detection of the delay fault requires a stronger condition than the stuck-at fault (2 patterns versus 1). Figure 3.8 shows all the one-way relationships that exist between the fault models used in this research. In the figure, a fault at an arrow tail implies the fault at the same arrow's head.

When two faults of the same fault model type are equivalent, they are collapsed into one fault of that common type. When two different faults from different models are equivalent, however, you cannot collapse any of them because this would result in an incorrect fault coverage calculation. One way to exploit equivalence relationships without collapsing the faults is to view the relationship as two separate fault implications. For example, the slow-to-fall fault on the output of a NAND gate is equivalent to the N-transistor open fault on its input (we can see that by comparing the triggering conditions of the faults). Instead of collapsing these faults into a single fault, they are kept as two separate faults while having them separately imply each other. When a test sequence detects either one of them, the other fault can be considered detected using the implication relationship, without requiring further simulation.
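The implication relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mark_detected`, the dictionary `implies` and the fault labels are hypothetical and are not taken from the thesis's fault list implementation. A cross-model equivalence is stored as two one-way implications, as the text suggests.

```python
from collections import deque

def mark_detected(fault, implies, detected):
    """Mark `fault` and every fault it implies (transitively) as detected."""
    queue = deque([fault])
    while queue:
        f = queue.popleft()
        if f in detected:
            continue            # already handled, avoids looping on equivalences
        detected.add(f)
        queue.extend(implies.get(f, ()))
    return detected

# Hypothetical implication table (labels chosen for illustration only).
implies = {
    "SR(X)": ["SA0(X)"],    # slow-to-rise implies stuck-at 0, one-way
    "SF(Z)": ["NO(X1)"],    # cross-model equivalence kept as
    "NO(X1)": ["SF(Z)"],    # two separate implications
}

detected = mark_detected("SR(X)", implies, set())
```

Detecting `SR(X)` also marks `SA0(X)`, whereas detecting `SA0(X)` alone marks nothing else, which mirrors the one-way nature of the relationship.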
The fault list used in our fault simulators supports both fault implication and fault equivalence, as discussed in Section 5.2.2. The effect of these relationships is also presented there.

NOT Gate (input X, output Z)

Fault     First   Second   Good   Faulty   Collapsing
          X       X        ZZ+    ZZ+      Faults
SA0(X)    x       1        x0     x1       SA1(Z)
SA1(X)    x       0        x1     x0       SA0(Z)
SA0(Z)    x       0        x1     x0
SA1(Z)    x       1        x0     x1
SR(X)     0       1        10     11
SF(X)     1       0        01     00
SR(Z)     1       0        01     00       SF(X)
SF(Z)     0       1        10     11       SR(X)
NO(X)     0       1        10     11
PO(X)     1       0        01     00

Table 3.5: Triggering Conditions for Faults Affecting the NOT Gate

Buffer (input X, output Z)

Fault     First   Second   Good   Faulty   Collapsing
          X       X        ZZ+    ZZ+      Faults
SA0(X)    x       1        x1     x0       SA0(Z)
SA1(X)    x       0        x0     x1       SA1(Z)
SA0(Z)    x       1        x1     x0
SA1(Z)    x       0        x0     x1

Table 3.6: Triggering Conditions for Faults Affecting the BUFF Gate

NAND Gate (input X1 X2, output Z)

Fault     First   Second   Good   Faulty   Collapsing
          X1X2    X1X2     ZZ+    ZZ+      Faults
SA0(X1)   xx      11       x0     x1       SA1(Z)
SA1(X1)   xx      01       x1     x0
SA0(X2)   xx      11       x0     x1       SA1(Z)
SA1(X2)   xx      10       x1     x0
SA0(Z)    xx      (!1)     x1     x0
SA1(Z)    xx      11       x0     x1
SR(X1)    0x      11       10     11
SF(X1)    1x      01       x1     x0
SR(X2)    x0      11       10     11

(!1) = not all ones

Table 3.7: Triggering Conditions for Faults Affecting the NAND Gate

AND Gate (input X1 X2, output Z)

Fault     First   Second   Good   Faulty   Collapsing
          X1X2    X1X2     ZZ+    ZZ+      Faults
SA0(X1)   xx      11       x1     x0       SA0(Z)
SA1(X1)   xx      01       x0     x1
SA0(X2)   xx      11       x1     x0       SA0(Z)
SA1(X2)   xx      10       x0     x1
SA0(Z)    xx      11       x1     x0
SA1(Z)    xx      (!1)     x0     x1
SR(X1)    0x      11       01     00

(!1) = not all ones

Table 3.8: Triggering Conditions for Faults Affecting the AND Gate

NOR Gate (input X1 X2, output Z)

Fault     First   Second   Good   Faulty   Collapsing
          X1X2    X1X2     ZZ+    ZZ+      Faults
PO(X1)    (!0)    00       01     00
NO(X1)    00      10       10     11
PO(X2)    (!0)    00       01     00       PO(X1)
NO(X2)    00      01       10     11

(!0) = not all zeros

Table 3.9: Triggering Conditions for Faults Affecting the NOR Gate

OR Gate (input X1 X2, output Z)

Fault     First   Second   Good   Faulty   Collapsing
          X1X2    X1X2     ZZ+    ZZ+      Faults
SA0(X1)   xx      10       x1     x0
SA1(X1)   xx      00       x0     x1       SA1(Z)
SA0(X2)   xx      01       x1     x0
SA1(X2)   xx      00       x0     x1       SA1(Z)
SA0(Z)    xx      (!0)     x1     x0
SA1(Z)    xx      00       x0     x1
SR(X1)    0x      10       x1     x0
SF(X1)    1x      00       10     11
SR(X2)    x0      01       x1     x0
SF(X2)    x1      00       10     11
SR(Z)     00      (!0)     01     00
SF(Z)     (!0)    00       10     11       SF(X1), SF(X2)
PO(X1)    (!0)    00       10     11
NO(X1)    00      10       01     00
PO(X2)    (!0)    00       10     11       PO(X1)
NO(X2)    00      01       01     00
PO(Y)     00      (!0)     01     00
NO(Y)     (!0)    00       10     11       PO(X1)

(!0) = not all zeros

Table 3.10: Triggering Conditions for Faults Affecting the OR Gate

Figure 3.8: Fault Implication and Fault Collapsing Relationships

Chapter 4 Fault Simulation

4.1 Introduction

When integrated circuits are manufactured, physical defects are likely to be introduced because of the extremely small dimensions that are involved. Boolean fault models are used to represent these physical defects, as the previous chapter has described. After manufacturing, it is necessary to distinguish the faulty circuits from the good ones through testing. To detect faults in a circuit under test (CUT), a sequence of test patterns has to be applied to the inputs of the CUT. The outputs of the CUT are then recorded and compared to the sequence of expected outputs from a good circuit. If the output sequences differ, then a faulty circuit has been detected.

There are generally two methods of circuit testing. The traditional one is to test a circuit with external automatic test equipment (ATE). Modern ATE tends to be very expensive and its lifetime tends to be relatively short before it becomes obsolete. An alternative way is to use Built-In Self-Test (BIST). BIST is a kind of circuit architecture where the testing hardware is included as part of the circuit (see Figure 4.1). Most circuits today are sequential circuits, which have memory elements such as flip-flops in their design. Many popular BIST architectures, such as the LOCST architecture ['Ll], modify the memory elements into special latches that can operate using either the system clock or a testing clock, depending on the selected mode of operation. The latches are further connected together as one or more scan chains so that data can be shifted serially into and out of the latches during testing. This feature not only allows the state of the latches to be observed, it also allows the sequential circuit to be partitioned into several accessible combinational blocks.

Figure 4.1: The Scan Chain/BIST Architectures

A combinational circuit is one that does not have any internal memory elements. Testing such a circuit block is generally easier than testing a sequential circuit because the outputs of a combinational block depend only on the present inputs. Thus, if a test pattern is supplied as input to a combinational block, the output of the block can be compared directly with that of a fault-free block without considering any other conditions (such as the internal memory states in a sequential circuit). Simulation of a combinational circuit is often enough to evaluate many important differences between alternative implementations of fault simulation algorithms. All the fault simulators implemented in this research simulate only combinational circuits.

Another important characteristic of BIST is the use of pseudo-random Test Pattern Generators (RTPGs). In order to test a circuit, a sequence of test patterns has to be supplied as inputs to the circuit under test. In a BIST design, the hardware to generate the test sequence is included as part of the circuit. The input test sequence could be generated deterministically by analyzing the structure of the circuit itself.
When using a BIST design, these test patterns have to be stored in ROMs (read-only memories) as part of the circuit. Even if there are relatively few deterministic test patterns, the hardware requirement for pattern storage is still relatively higher than that typically required for an RTPG. RTPGs are special counters that count in pseudo-random order. The size of an RTPG grows only very slowly (usually logarithmically) with the length of the self-test, thus it is often economical to include the RTPG hardware in on-chip BIST circuits.

There are many different RTPGs, and most of the RTPGs can be further customized with different parameters. Some well known RTPGs are the Linear Feedback Shift Register (LFSR) and the Linear Hybrid Cellular Automata (LHCA) [A]. Figure 4.2 illustrates the structure of these RTPGs. An LFSR implements polynomial division. The output sequence of an LFSR can be customized by choosing different polynomial divisors. An LHCA is constructed by connecting together two types of simple, one-bit cellular automata in a linear array. One type of automaton (often called rule 90) calculates its next value as the XOR of the values of its two neighbors, while the other type (often called rule 150) takes the XOR of its neighbors and its own value. Putting together these automata as a linear array with different configurations and different initial values can result in different sequences. The random patterns generated by an LFSR and LHCA can be further modified using a Maximum Transition Counter (MTC) [6]. An MTC is a sequential circuit with a maximum-length state sequence whose adjacent states differ in all bits except one. The hybrid method mixes the test sequences generated by the RTPG and the MTC to form a new RTPG, which generally produces test sequences that appear to detect sequential faults more easily, at the cost of more hardware. Figure 4.3 illustrates the different fault coverages obtained by feeding different pseudo-random test patterns into the same circuit.

Figure 4.2: Hardware Configuration of RTPGs (panels: Linear Feedback Shift Register, Type I; Linear Feedback Shift Register, Type II; Linear Hybrid Cellular Automata; legend: flip-flop, XOR gate, datapath, optional datapath)

Figure 4.3: Fault Coverage of Different RTPGs (horizontal axis: Number of Test Patterns)
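As a rough illustration of the two RTPG families named above, the following Python sketch steps a small LFSR and a small linear hybrid cellular automaton. It is a minimal sketch under assumptions, not the hardware configuration used in the thesis: the tap positions (3, 2), the 4-bit width, the shift direction and the null (zero) boundary cells of the automaton are all chosen here for illustration only.

```python
def lfsr_step(state, taps, width):
    """One step of a shift register with XOR feedback from the tapped bits."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)

def lhca_step(cells, rules):
    """One step of a linear hybrid cellular automaton.

    rules[i] == 90  : next value = left XOR right
    rules[i] == 150 : next value = left XOR own XOR right
    Cells beyond either end are assumed to be 0 (an assumption of this sketch).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        value = left ^ right
        if rules[i] == 150:
            value ^= cells[i]
        nxt.append(value)
    return nxt

def lfsr_period(seed, taps, width):
    """Count steps until the register returns to its seed state."""
    state, steps = lfsr_step(seed, taps, width), 1
    while state != seed:
        state, steps = lfsr_step(state, taps, width), steps + 1
    return steps
```

With the taps chosen here, `lfsr_period(1, (3, 2), 4)` walks through all 15 non-zero 4-bit states before repeating, which illustrates why such a small amount of hardware can supply a long test sequence.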

Circuit manufacturers are interested in finding RTPGs that are the most effective at detecting the expected faults in their particular circuit with the least possible hardware overhead. With so many options, an efficient tool is needed to rapidly evaluate the effectiveness, for a given circuit, of the many possible alternative RTPGs.

As explained in Chapter 1, fault simulation is the process of simulating the operation of a circuit in the presence of faults to evaluate the effectiveness of a proposed test. To evaluate a test sequence, the patterns have to be applied as inputs to the simulated CUT. The number of test patterns required for detecting the presence of all the faults of interest, i.e. the test length, is recorded. This number can be used directly as a measure of the efficiency of the test sequence for the CUT. The shorter the test, the less time is required using expensive ATE for the CUT to be fully tested. When using RTPGs for generating test sequences, sometimes it is acceptable to detect fewer than 100% of the faults of interest. The remaining undetected faults can be handled by applying extra test patterns to the CUT. This method is useful when there are known to be hard-to-detect faults in the circuit, and an RTPG can detect all the rest of the faults within a relatively short period of time. A significant amount of testing time can be saved if these hard-to-detect faults are treated as special cases, with individually stored patterns. Therefore, fault simulation is used not only to find the minimum length for a test sequence, but also the less-than-complete fault coverage (i.e. the percentage of faults being detected) for a limited number of test patterns. It also indirectly finds difficult-to-detect or likely impossible-to-detect faults. Note that in large circuits, fault simulation alone cannot prove that a fault is truly impossible to detect.

4.2 Sequential Fault Simulation

The most basic fault simulation algorithm is the sequential fault simulation algorithm. In this algorithm, a list of faults of interest is first constructed from the structure of the CUT. The fault list is usually then reduced by exploiting fault equivalences and fault implications. A test pattern from the test sequence is then applied to a simulated fault-free circuit. The logical values at the outputs of this fault-free circuit are recorded. Then, the first fault from the fault list is inserted into the model of the CUT, and the circuit output sequence in the presence of this fault is re-computed. If the output of this fault simulation is different from that of the fault-free simulation, then the fault is considered detected and is removed from the fault list; otherwise, the fault list remains unchanged. The next fault from the fault list is then inserted into the fault-free circuit and the effect of the same test pattern in the presence of the new fault is re-computed. This process continues until all the faults in the fault list have been tested. At this time, simulation of the first test pattern is finished. The next test pattern can now be loaded and the circuit simulation process repeats. This whole simulation process continues until either the fault list is empty or we run out of test patterns. If sequential fault models are used for representing physical defects, then the logical states of all circuit nodes in the fault-free circuit for a test pattern have to be recorded for the next test pattern because they will need to be referenced.

As seen in Figure 4.4, this fault simulation algorithm has two levels of nested looping: an outer loop that loads individual test patterns sequentially (looping through time), and an inner loop that loads faults from the fault list one after another (looping through space). Each loop is essentially advancing through an array of data defined by the test sequence and the fault list.
The two arrays can be viewed as defining a two-dimensional fault simulation space as shown in Figure 4.5. The pattern dimension can be viewed as a time dimension, and the fault dimension can be viewed as a space dimension. It is possible to modify this algorithm to speed up the processing by running multiple simulations in parallel. The degree of parallelism in the pattern space, i.e. the number of test patterns simulated in parallel, will be denoted by p, whereas the degree of parallelism in the fault space, i.e. the number of faults simulated in parallel, will be denoted by f. An algorithm with p = n means that it simulates n patterns in parallel. A sequential algorithm thus has both p and f set equal to 1.

Figure 4.4: Flow Chart of the Basic Fault Simulation Algorithm (flow chart boxes: generate fault list; reduce fault list; load next test pattern; good circuit simulation; load fault, detect fault triggering, and do fault simulation; end of fault list?; with an inner loop and an outer loop)

Figure 4.5: 2-Dimensional Fault Simulation Space

Recalling our objectives from Chapter 1, we are going to describe the implementations of three different fault simulation algorithms on C•RAM. The algorithm that runs multiple faulty circuits in parallel (p = 1, f > 1) is called the fault-parallel simulation algorithm [?S]. Pattern-parallel simulation, specifically the Parallel Pattern Single Fault Propagation (PPSFP) algorithm [31], simulates multiple test patterns in parallel (p > 1, f = 1). Finally, hybrid algorithms can be defined for parallelizing simultaneously in both the fault and test pattern dimensions (p >= 1, f >= 1). One adaptive hybrid algorithm is called the Dynamic Two-Dimensional Parallel Simulation [18].

Other algorithms using massively parallel SIMD machines to speed up fault simulation have been designed and evaluated by other researchers. In [23], two algorithms were proposed, namely Gate Parallel Level Serial Simulation (GPLSS) and Parallel Pattern-Parallel Fault Simulation (PPPFS). These algorithms were implemented on Thinking Machines Corporation's Connection Machine. The results presented in [23] were given in terms of the actual simulation run time on real hardware, which makes it very difficult to compare their performance directly with the results obtained in this research using emulated C•RAM hardware.
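The sequential algorithm of this section (p = 1, f = 1) can be sketched as two nested loops with fault dropping. The sketch below is illustrative only: the toy circuit Z = NOT(NAND(X1, X2)), the `(node, stuck_value)` fault encoding and the function names are assumptions made here, and only stuck-at faults are modelled, so the extra state needed for sequential fault models is omitted.

```python
def simulate(pattern, fault=None):
    """Evaluate the toy circuit; `fault` is (node_name, stuck_value) or None."""
    def node(name, value):
        # Override the node value if the inserted fault sits on this node.
        return fault[1] if fault and fault[0] == name else value
    x1 = node("X1", pattern[0])
    x2 = node("X2", pattern[1])
    y = node("Y", 1 - (x1 & x2))   # NAND
    z = node("Z", 1 - y)           # NOT
    return (z,)

def sequential_fault_sim(patterns, fault_list):
    remaining = list(fault_list)
    detected_at = {}
    for t, pattern in enumerate(patterns):      # outer loop: through time
        if not remaining:
            break                               # fault list is empty
        good = simulate(pattern)                # fault-free simulation
        still_undetected = []
        for fault in remaining:                 # inner loop: through space
            if simulate(pattern, fault) != good:
                detected_at[fault] = t          # detected: drop the fault
            else:
                still_undetected.append(fault)
        remaining = still_undetected
    return detected_at, remaining

patterns = [(1, 1), (0, 1), (1, 0)]
faults = [("X1", 0), ("X1", 1), ("X2", 1), ("Z", 0), ("Z", 1)]
detected_at, remaining = sequential_fault_sim(patterns, faults)
```

Each call to `simulate` with a fault corresponds to one point in the two-dimensional fault simulation space, and dropping detected faults shrinks the inner loop as the test proceeds.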

4.3 Fault-Parallel Fault Simulation

The first algorithm, Fault-Parallel Fault Simulation, evaluates multiple faulty circuits in parallel (p = 1, f > 1), i.e. each PE simulates a different faulty circuit. A different fault from the fault list is inserted into each faulty circuit. A single test pattern is input to the CUT, and corresponding gates in each circuit are evaluated in parallel. The output of each of the faulty circuits is compared with the corresponding output of the fault-free circuit. If an output of a faulty circuit is different from that of the fault-free one, then the fault is marked as detected. After all gate evaluations have been performed for the particular test pattern, the marked faults are removed from the fault list. The parallelism in our 2-D space is demonstrated in Figure 4.6. Notice how sets of patterns have been looped in the figure to identify identical patterns that are fault simulated together in parallel in the same session.

One potential problem of this algorithm is the size of the fault list. If, at the beginning of the simulation, the number of faults in the fault list is greater than or equal to the number of PEs available, then more than one pass is required for the complete evaluation of a single test pattern. (In Figure 4.6 only one pass is assumed to be sufficient.) Problems also arise when the fault list is too small. For example, if n PEs are available, and there is only one fault in the fault list, then the simulator has to run with only one PE active. This inefficiency can be solved by running more than one test pattern in parallel, which suggests the other fault simulation algorithm: Hybrid Parallel Fault Simulation. Other problems with Fault-Parallel Fault Simulation will be discussed in Chapter 7.

Figure 4.6: 2-D Fault Simulation Space of Fault-Parallel Fault Simulation
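The fault-parallel idea can be imitated on a conventional machine by treating each bit position of an integer as one PE. The sketch below is an illustration under assumptions, not the C•RAM implementation: bit lane 0 is assumed to hold the fault-free circuit, lane i+1 holds the circuit with `faults[i]` inserted, the toy circuit is Z = NOT(NAND(X1, X2)), and only stuck-at faults are handled.

```python
def fault_parallel_sim(pattern, faults):
    """Apply one pattern to all faulty circuits at once; return detected faults."""
    lanes = len(faults) + 1
    ones = (1 << lanes) - 1

    def inject(name, word):
        # Force the stuck value in the lane that carries each fault on this node.
        for i, (node, stuck) in enumerate(faults, start=1):
            if node == name:
                word = word | (1 << i) if stuck else word & ~(1 << i)
        return word

    # The same pattern is broadcast to every lane (p = 1).
    x1 = inject("X1", ones if pattern[0] else 0)
    x2 = inject("X2", ones if pattern[1] else 0)
    y = inject("Y", ~(x1 & x2) & ones)      # NAND evaluated in every lane
    z = inject("Z", ~y & ones)              # NOT evaluated in every lane

    good = ones if z & 1 else 0             # broadcast the fault-free output
    diff = (z ^ good) >> 1                  # lanes that disagree, lane 0 dropped
    return [f for i, f in enumerate(faults) if (diff >> i) & 1]

faults = [("X1", 0), ("X1", 1), ("Z", 0), ("Y", 1)]
detected = fault_parallel_sim((1, 1), faults)
```

When only one fault remains in the list, all but two lanes of the word carry no useful work, which is the under-utilization problem described above.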

4.4 Pattern-Parallel Fault Simulation

The Pattern-Parallel Fault Simulation algorithm is basically the Parallel Pattern Single Fault Propagation (PPSFP) [31] algorithm extended to fit onto the SIMD architecture. In this algorithm, a number of test patterns are simulated in parallel (p > 1, f = 1), whereas in reality, these patterns will be applied to a real CUT one after another. Since only combinational circuits are considered in this research, the output of a circuit depends only on its current input, thus simulating test patterns in parallel does not cause any problems, except when simulating sequential faults. Sequential faults, as described in Chapter 3, introduce sequential behaviours into combinational circuits. The handling of sequential faults in this algorithm is the same as the approach used in the PPSFP algorithm, which will be described in Chapter 5.

The pattern-parallel algorithm first simulates the fault-free circuit as it responds to the group of test patterns. Then the first fault in the fault list is inserted and its effect is simulated for all patterns in the session in parallel. The remaining faults in the fault list are simulated one after another in the same manner. Figure 4.7 shows the parallelism achieved in the 2-D problem space.

Figure 4.7: 2-D Fault Simulation Space of Pattern-Parallel Fault Simulation

The conventional fault simulator implemented as a benchmark for this thesis used the PPSFP algorithm. In SIMD programming, it is often true that local enhancements that work for a sequential algorithm¹ do not always improve the parallel version. Since the PPSFP algorithm on a conventional machine is very similar to the Pattern-Parallel Fault Simulation algorithm on C•RAM, the two implementations will be compared closely in Chapter 6.

4.5 Hybrid Parallel Fault Simulation

When a fault simulator parallelizes fault simulation in both the pattern space and the fault space, it is called a hybrid parallel fault simulator. Basically there are two ways to implement such an algorithm. One way is to simulate the same number of faults and test patterns in parallel throughout the program (p = n, f = m, where n > 1 and m > 1 are fixed values). Since the degree of parallelism is static, it is called static hybrid parallel fault simulation. Another method, which determines a different p and f at each round of the simulation, is called dynamic hybrid parallel fault simulation (p = n, f = m, where n and m vary).

In [18], the Dynamic Two-Dimensional Parallel Simulation (DTDPS) technique is presented. It was observed that the purely fault-parallel fault simulation technique works most efficiently when there are many relatively easy-to-detect faults in the fault list, while pure pattern-parallel fault simulation techniques are most efficient with a small number of relatively hard-to-detect faults with many test patterns to be simulated. The authors attempted to take advantage of the two algorithms by dynamically using fault-parallel fault simulation at the beginning of the simulation, switching increasingly to pattern-parallel fault simulation towards the end of the whole simulation process. The technique was implemented on the FACOM.

Figure 4.8 shows the 2-D fault simulation space of hybrid parallel fault simulation. At the beginning of the simulation, pure Fault-Parallel Fault Simulation is used as described in Section 4.3. After the simulation of a test pattern is finished, the percentage of faults that were detected by that test pattern (fault coverage) is evaluated. If the fault coverage exceeds a certain percentage FCH, the next pass will simulate two test patterns in parallel. For example, if n PEs are available for simulation, then in the first pass, n PEs are used for fault-parallel fault simulation of the first test pattern. After the first pass, if, say, more than FCH = 25% of the faults are detected, then in the next pass, two patterns are simulated in parallel. In this case, n/2 PEs are allocated for each test pattern. As the fault coverage again exceeds FCH, the number of test patterns simulated in parallel also increases. After a certain point, when there are only a few faults left in the fault list, the simulation can switch to purely pattern-parallel fault simulation. The implementation details are discussed in Chapter 8.

Figure 4.8: 2-D Fault Simulation Space of Hybrid Parallel Fault Simulation

¹An algorithm that runs without SIMD hardware.
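The switching rule described in this section can be sketched as a small scheduling function. This is a hedged illustration rather than the implementation of Chapter 8: the function name, the list of per-pass coverages supplied as input, and in particular the choice to double the number of parallel patterns at each switch are assumptions made here (the text only states that the first switch goes from one pattern to two and that the number then increases).

```python
def hybrid_schedule(pass_coverages, num_pes, fc_h=0.25):
    """Return (patterns_in_parallel, pes_per_pattern) for each pass.

    pass_coverages[k] is the fraction of remaining faults detected in pass k.
    fc_h is the switching threshold (FCH in the text; 25% in its example).
    """
    p = 1                                   # start purely fault-parallel
    schedule = []
    for coverage in pass_coverages:
        schedule.append((p, num_pes // p))  # PEs are split evenly among patterns
        if coverage > fc_h and p * 2 <= num_pes:
            p *= 2                          # assumed doubling rule
    return schedule

schedule = hybrid_schedule([0.30, 0.10, 0.40, 0.50], num_pes=256)
```

For the coverages above the schedule moves from one pattern on 256 PEs, to two patterns on 128 PEs each, and later to four patterns on 64 PEs each.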

4.6 Evaluation of Fault Simulators

4.6.1 Faults, Undetected Faults, and Triggered Faults

To evaluate the fault simulation algorithms described above, we have to investigate the process of detecting a fault in more detail. Each undetected fault can be either triggered or not-triggered by each test pattern. If the fault is triggered, it will be either detected or undetected. These possibilities are shown in Figure 4.9. The total number of undetected faults decreases monotonically as a function of time in simulation (or test patterns) because of the fault dropping effect. The number of faults being triggered by each pattern tends to decrease as well because most of the easy-to-detect faults tend to be triggered and detected in the early rounds of the simulation. After a while, the curve levels off with a slight ripple remaining due to the fact that some test patterns trigger faults better than others. This is the same for the fault detection curve, which plots the number of faults being detected versus time. Typical curves are shown in Figure 4.10.

Figure 4.9: Classes of Faults with Respect to Fault Detection (an undetected fault passes through fault triggering detection (low effort) and is either a not-triggered fault or a triggered fault; a triggered fault passes through fault simulation (high effort) and is either a detected fault or an undetected fault)

Figure 4.10: Curves for Undetected Faults, Triggered Faults and Detected Faults (curves: Total Number of Faults; Remaining Undetected Faults; Faults Triggered by Each Test Pattern; Faults Detected by Each Test Pattern, with ripples; horizontal axis: Number of Test Patterns)

4.6.2 Measuring Efficiency - PE Utilization

Pattern-parallel fault simulation is most efficient when there are a lot of faults that are triggered frequently while being very hard to detect. It is wasteful, however, when many test patterns are simulated in parallel and very few of them actually trigger the fault. In addition, even if there are a lot of test patterns triggering the fault, it is wasteful if the very first triggering pattern is capable of detecting the fault, making the rest of the pattern simulations unnecessary. In summary, the simulation of a fault triggering situation is necessary if the pattern that triggers the fault is before or the same as the test pattern that first detects the fault. For example, if test patterns 15, 25, 43, 67, 80, 190 and 253 are all the test patterns that trigger a specific fault F, and both patterns 80 and 253 detect F, then the number of necessary simulations equals 5, which is the number of test patterns that trigger the fault before and including pattern number 80.

Based on this definition, PE utilization is calculated as the number of necessary simulations divided by the number of PEs performing simulations. In the above example, the PE utilization is 5/256, assuming 256 PEs are used for simulation. Here, PE means the actual number of PEs in a C•RAM, or the number of bits evaluated in parallel in the case of a conventional algorithm. Using the PE utilization figure, we can explain why the PPSFP algorithm is not linearly scalable with the size of the SIMD machine. This is done in Chapter 6.
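The worked example above can be reproduced with a short Python function. The function name and argument layout are hypothetical, chosen only to mirror the definition in the text.

```python
def pe_utilization(triggering, detecting, num_pes):
    """Return (necessary_simulations, utilization) for one fault.

    triggering: pattern numbers that trigger the fault
    detecting:  pattern numbers that detect the fault (non-empty)
    num_pes:    number of PEs (or bits) simulating in parallel
    """
    first_detect = min(detecting)
    # A simulation is necessary only up to and including the first detection.
    necessary = sum(1 for p in triggering if p <= first_detect)
    return necessary, necessary / num_pes

necessary, utilization = pe_utilization(
    triggering=[15, 25, 43, 67, 80, 190, 253],
    detecting=[80, 253],
    num_pes=256,
)
```

Here `necessary` is 5 and `utilization` is 5/256, matching the figures in the text.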

4.6.3 Another Measurement - Speed-up

Another measurement that we found to be useful in this research is speed-up. Speed-up is a measure of how an algorithm performs after some enhancement relative to how it performed previously [16]. By definition (Amdahl's Law), it is expressed in terms of the following quantities: t_s, the sequential run time; t_p, the total parallel run time; and N, the number of processors. We will use the PPSFP fault simulator simf, described in the next chapter, to give us the "before improvement" execution times.

Often an algorithm can be divided into separate modules, and enhancements may only be applicable to some of the modules. In our case, the fault simulation programs can be divided into two categories: code that can be accelerated by C•RAM and code that cannot be accelerated. Since C•RAM only accelerates part of the whole process, increasing the size of C•RAM does not imply a linear improvement of the performance. Details for measuring the speed-ups in individual simulators can be found in their corresponding evaluation sections.

A C•RAM emulator is used to emulate the C•RAM operations. The parameters used are listed here:

non-standard parameters follow
bits of memory
PEs simulated
Shift Block Size
time per operation ns
time per memory row operation ns
time per read ns
time per write ns
operations for free in memory cycle
time per broadcast ns
mask to identify data on same cache line

Comments: Proposed DRAM chip
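As a hedged illustration of the speed-up measure of Section 4.6.3, the block below gives the standard textbook forms rather than the exact equation of this thesis. The symbols t_s, t_p and N follow the text; the symbol \alpha, the fraction of the sequential run time that C•RAM can accelerate, is an assumption introduced here, and the second line further assumes that the accelerated part scales perfectly with N.

```latex
\begin{align}
  \text{speed-up} &= \frac{t_s}{t_p}, \\
  t_p = (1-\alpha)\,t_s + \frac{\alpha\,t_s}{N}
  \quad &\Longrightarrow \quad
  \text{speed-up} = \frac{1}{(1-\alpha) + \alpha/N}.
\end{align}
```

Under these assumptions, with \alpha = 0.9 the speed-up can never exceed 1/(1 - 0.9) = 10 however large N becomes, which is consistent with the remark that a larger C•RAM does not give a linear improvement.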

In this research, we are running our fault simulators on a Sun Ultra 1 workstation, which has a processor that runs at 143 MHz. On the other hand, the C•RAM emulator emulates a DRAM implementation, which runs at 120 ns per memory row operation (8.3 MHz). This implies a slowdown of roughly 17 times. This setting is used for evaluation in Chapters 6, 7, and 8. In the conclusions, a faster memory technology will be assumed to give a better idea of how much improvement can be obtained.

Chapter 5 simf - a Fast Conventional Pattern-Parallel Fault Simulator

5.1 Overview

The first fault simulator written for this research project implements the Parallel Pattern Single Fault Propagation (PPSFP) algorithm, which runs on a conventional single processor machine¹. This program is named simf. As shown in Figure 5.1, simf consists of the main control module and three other modules with specific functions. The ReadISCAS module reads from netlist files in the ISCAS85 format and builds an internal data structure. From this data structure, module GenFault generates a list of potential stuck-at faults, gate-delay faults and stuck-open faults. The list is collapsed to eliminate equivalent faults using the rules described in Chapter 3. The collapsed fault list and the circuit data structure are then passed to the Simulation module.

¹Traditional single processor machines without C•RAM.

Figure 5.1: Top Level Architectural Design of simf (blocks: Main Control; ReadISCAS, read circuit file into data structure; generate collapsed fault list; Simulation, fault simulation)

The Simulation module reads test patterns from the standard input, performs the fault-free and the fault-inserted simulations, and drops detected faults from the fault list. When either there are no more faults left in the fault list or all the test patterns have been simulated, the simulation is finished. The main control module of simf then displays the simulation results.

simf was designed as a reference simulator to represent conventional fault simulators in this thesis. It will be used in the following chapters to evaluate the fault simulators based on the C•RAM architecture. It is thus designed to be as efficient as possible to give a conventional technique a fair assessment. It is also important for the reference simulator to be as accurate as possible. This was made easier because its performance can be compared directly to the public domain fault simulator sim3. In fact, simf has proven to be so efficient and accurate that it has already been deployed in a separate research project at the University of Alberta to gather simulation data. Evaluation of simf is discussed in Section 5.3.

In the following section, design details of three essential components of the fault simulator are discussed. First, the design of the Circuit Data Structure (CDS) is presented in Section 5.2.1. Then the GenFault module is presented (Section 5.2.2). This module generates faults from the CDS and collapses them into a minimum size fault list. The structure of the fault list, which contributes to the acceleration of fault simulation, is also presented in this section. Finally, the Simulation module and the techniques used to increase efficiency are presented in Section 5.2.3. The C•RAM versions of the fault simulators discussed in the following chapters use more or less the same module structure. Changes to these modules, which were made to address the needs of the individual fault simulators, are discussed in their corresponding chapters.

5.2 Design Details

The Parallel Pattern Single Fault Propagation (PPSFP) algorithm was first proposed as a stuck-at fault simulation algorithm in [31]. In the following year, the algorithm was modified to simulate sequential fault models as well, namely gate-delay faults and stuck-open faults [32]. In this algorithm, each circuit node is assigned an integer. Each bit of the integer represents a different test pattern, with the bit value representing the state of the circuit node when the corresponding test pattern is applied. In other words, each bit of the integer represents a different copy of the circuit. Figure 5.2 shows how a circuit and a set of test patterns are mapped into a set of variables.

Figure 5.2: Computer Representation of Circuit Nodes in PPSFP (a NAND gate with the patterns 00, 01, 10, 11 applied to the circuit in parallel; each node is represented by an integer, whose remaining bits are unused)
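The bit-per-pattern representation can be illustrated with a single NAND gate and the four patterns of Figure 5.2. This is a minimal sketch, not code from simf: the helper names `pack_inputs` and `nand_word`, and the use of bit k for pattern k, are assumptions made for illustration.

```python
def pack_inputs(patterns):
    """Return one integer per circuit input; bit k holds its value under pattern k."""
    words = [0] * len(patterns[0])
    for k, pattern in enumerate(patterns):
        for i, bit in enumerate(pattern):
            words[i] |= bit << k
    return words

def nand_word(a, b, mask):
    """Evaluate a NAND gate for all packed patterns with one logical operation."""
    return ~(a & b) & mask

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
mask = (1 << len(patterns)) - 1        # only the low 4 bits are in use
a, b = pack_inputs(patterns)
c_good = nand_word(a, b, mask)         # fault-free output for all 4 patterns

# Single fault propagation: input a stuck-at 0, same patterns in parallel.
c_faulty = nand_word(0, b, mask)
detecting = c_good ^ c_faulty          # bit k set if pattern k detects the fault
```

Here `detecting` has only bit 3 set, so of the four patterns only 11 detects the stuck-at 0 fault on input a.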

5.2.1 Circuit Data Structure

In simf, a combinational circuit has two types of basic elements, namely logic gates (gates) and interconnections (wires). The circuit data structure (circuit) thus contains an array of gates (gate_list) and an array of wires (wire_list). In addition, the input_list and output_list integer arrays are used as indices to point to the primary input and output gates in the gate_list. When several interconnections are connected together electrically, the group of wires becomes a circuit node that in a good circuit should carry one common voltage signal.

A gate is a logical gate of one of the following types: primary input, BUFF, NOT, NAND, AND, NOR, OR, XOR, XNOR. Each gate structure contains a field named gate_type which denotes the type of the logic gate. In the standard ISCAS85 circuit format, each gate is assigned a gate number (gate_num), which is essentially the same as the output wire number of the gate. Note that this number is different from the index number of the gate. A gate also contains an array of fan-in nodes and an array of fan-out nodes (see Figure 5.3). The array of fan-in nodes is essentially a list of all the inputs to the logic gate. The array of fan-out nodes is the list of wires that connect the output of a logic gate to other gates. With these two arrays, the connection from the logic gates to the wires is established. If a gate is a primary input, then the input number is stored in a list called input_num.

Figure 5.3: Top Level View of the Circuit Data Structure. The figure shows the CDS with its gate array and its wire array. The labels that can be recovered are: input index, output index, gate number, fan-in array, array of gate indices and array of wire indices on the gate side; node number, from gate (index), to gate (index) and exist flag on the wire side. The arrows in the figure denote datatypes, not pointers.

A wire records four pieces of information. The node_num variable contains an integer denoting the circuit node this piece of wire is connected to. The from_gate variable and the to_gate variable contain the indices of the logic gates connected by the wire. With these two variables, the connection from wires to logic gates is established. The index into the wire array is the same as the wire number given in the ISCAS85 file. Since the ISCAS85 file format does not require the wire number assignments to be continuous, an exist variable is used to mark whether or not the wire really exists in the circuit. This information is necessary for fault generation.

The CDS was designed so that from any gate in the CDS, the wires feeding its input and the gates connecting to its output can be found easily and efficiently. Ease of use and efficiency are especially important for designing algorithms later on in the research that minimize any changes in the CDS that might be required to extend the simulation program.

5.2.2 Fault Generation
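Since fault generation walks the circuit data structure of Section 5.2.1, it may help to see that structure written out as C declarations first. The sketch below is one possible reading of the prose and of Figure 5.3, not the declarations used in simf. The field names gate_type, gate_num, node_num, from_gate, to_gate and exist follow the text; the enum, the count fields, the choice of storing wire indices in both fan-in and fan-out arrays, and the helper `driver_of` are hypothetical.

```c
#include <stddef.h>

typedef enum { G_INPUT, G_BUFF, G_NOT, G_NAND, G_AND,
               G_NOR, G_OR, G_XOR, G_XNOR } GateType;

typedef struct {
    int node_num;    /* circuit node this piece of wire belongs to        */
    int from_gate;   /* index into gate_list of the driving gate          */
    int to_gate;     /* index into gate_list of the driven gate           */
    int exist;       /* nonzero if this wire number occurs in the netlist */
} Wire;

typedef struct {
    GateType gate_type;
    int  gate_num;   /* ISCAS85 gate number (its output wire number) */
    int  n_fanin;
    int *fanin;      /* wire indices feeding the gate (assumption)   */
    int  n_fanout;
    int *fanout;     /* wire indices leaving the gate (assumption)   */
} Gate;

typedef struct {
    int   n_gates;
    Gate *gate_list;
    int   n_wires;
    Wire *wire_list;    /* indexed directly by ISCAS85 wire number */
    int   n_inputs;
    int  *input_list;   /* gate indices of the primary inputs      */
    int   n_outputs;
    int  *output_list;  /* gate indices of the primary outputs     */
} Circuit;

/* Return the index of the gate driving the k-th input of gate g,
 * or -1 if g, k or the wire is out of range or does not exist. */
static int driver_of(const Circuit *c, int g, int k)
{
    if (g < 0 || g >= c->n_gates) return -1;
    const Gate *gate = &c->gate_list[g];
    if (k < 0 || k >= gate->n_fanin) return -1;
    int w = gate->fanin[k];
    if (w < 0 || w >= c->n_wires || !c->wire_list[w].exist) return -1;
    return c->wire_list[w].from_gate;
}
```

The helper shows the gate-to-wire-to-gate traversal that the two sets of indices are meant to make cheap: one array lookup from the gate to its input wire and one more from the wire to its driving gate.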

After a netlist is translated into the CDS, the information is used to generate a fault list. A fault list is a list of possible faults associated with a circuit. Three fault models, namely stuck-at, gate-delay and stuck-open, are used to generate these faults. An obvious way to store the faults is to use three separate fault lists. This approach was used in the prototype version of simf. However, as discussed in Section 3.5, there are relationships between faults in different models, and it could be advantageous to make use of these relationships. As a result, one single fault list, called the combined fault list, is used to store all three types of faults.

In simf, an array is used to represent the combined fault list. Each array element, named Fault, has six fields: three fields are used to store the information about the fault, while the other three are used to provide the linked list capability. The type field stores the type of the fault, which is one of the six possible fault types (SA0, SA1, SF, SR, NO, PO). The wirenum variable is used to store the wire that the fault is associated with. A flag variable is used to tell the status of the fault. The different possible statuses are fd_UNDETECTED, fd_DETECTED, fd_IMPLIED, fd_EQUIVAL, and fd_NONEXIST. The definitions of these statuses will be given later when recursive fault detection is described.

The other three fields, used to implement a linked list, are next, implied1 and implied2. The next variable points to the next element in the linked list. A Fault whose next variable points to NULL signifies the end of the linked list. The implied1 and implied2 pointers are used to point to faults that could be implied by the Fault. When a fault is detected, all the faults implied by it could also be detected. Marking a fault as detected by implied detection has a much lower cost in CPU time than actually running the simulation to determine whether the fault is detected. Thus, by considering the implication relationships between faults, higher efficiency in fault simulation can be achieved.

Figure 5.4 illustrates the fault list of a small circuit. This example circuit is constructed with 2 logic gates and their connecting wires. The wire numbers are not consecutive, as is the case for most of the standard ISCAS85 circuits. The fault list is shown on the right side of the figure. The faults in the list are connected in a linked list fashion (shown with solid arrows). The fault implication relationships of the faults are shown with dashed arrows. Using the fault collapsing rules described in Chapter 3, three faults are collapsed, namely SF(6), SA0(1) and SA0(5). These faults are marked as fd_EQUIVAL in the status flag.

Figure 5.4: A Fault-List Example

In the GenFault module, the first step is to allocate enough memory for the combined fault list. Since each piece of wire in the circuit can have a maximum of 6 faults associated with it, the number of faults before collapsing is less than or equal to six times the number of wires in the circuit. In other words, an array of Fault elements whose size is six times the number of wires is sufficient for storing the combined fault list. However, since the CDS uses the wire numbers in the netlist, and the wire numbering in the netlist is not continuous (i.e. wire number 2 may not exist in the circuit), direct indexing into such an array of faults will not be possible. Direct indexing is desirable because it provides an efficient way to access a fault given the wire number and the fault type. There are two ways to get around this problem: one is to translate the original non-continuous wire numbering into a continuous space, the other is to allocate more memory than is actually needed. The second approach is taken in simf, and the number of Fault elements allocated is 6 * (highest wire number + 1)¹.

¹ In C, since array indices start at 0, allocating an array with N elements will exclude element number N from the array. That is, the highest array index would be N - 1. Therefore, a +1 term is needed.

After allocating memory for the combined fault list, the list elements are initialized. For gate-delay faults and stuck-at faults, all the faults are marked as existing, while the remaining faults (including the Fault elements that correspond to non-existing wires and the stuck-open faults) are marked as non-existing (flag = fd_NONEXIST). After initialization, the algorithm goes through the list of gates and collapses the faults using the fault collapsing rules presented in Chapter 3. For the faults being collapsed, the flag is set to fd_EQUIVAL. Since stuck-open faults only affect the wires that are inputs to some gates, their flag is set to fd_UNDETECTED once they are verified to be gate inputs.

In addition, assuming standard CMOS technology, stuck-open faults also affect the internal wires in AND gates and OR gates. These faults are called internal stuck-open faults. The internal wires are not part of the wire list, and thus separate Fault elements for these internal faults are not available. Four observations of the properties of stuck-open faults help allocate a Fault element for this internal fault without inserting extra internal lines into the CDS: (1) only internal wires of AND gates and OR gates need to be considered (buffers, which are configured as two inverters in series, also have internal wires; however, the stuck-open faults for these wires are collapsed); (2) AND gates and OR gates always have more than one input; (3) there is always at least one stuck-open fault at the input that is collapsed, because in AND gates and OR gates there are at least two transistors connected in series (e.g. the n-transistor open fault (NO) for an AND gate at its second input); and (4) the type of internal fault is the same as this non-existing fault (e.g. the internal fault in an AND gate is also NO). As a result, the collapsed Fault element for the AND gates and OR gates is re-used for storing the internal fault. In order to distinguish these faults from the normal stuck-open faults, they are called f_iNO (internal NO) and f_iPO (internal PO). While collapsing the faults, fault implication and fault equivalence among fault models are recognized using the implied1 and implied2 pointers.
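The allocation and direct-indexing scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the simf source: the six field names follow the text, but the enum constants and their numeric values, the helper names `fault_index` and `alloc_fault_list`, and the layout (six consecutive elements per wire number) are hypothetical.

```c
#include <stdlib.h>

/* Fault types; F_NO and F_PO are the n- and p-transistor open faults. */
enum { F_SA0, F_SA1, F_SF, F_SR, F_NO, F_PO, N_FAULT_TYPES };

/* Status values; the numeric encoding is an assumption. */
enum { fd_UNDETECTED, fd_DETECTED, fd_IMPLIED, fd_EQUIVAL, fd_NONEXIST };

typedef struct Fault {
    int type;                 /* one of the six fault types          */
    int wirenum;              /* wire the fault is associated with   */
    int flag;                 /* status of the fault                 */
    struct Fault *next;       /* next fault in the linked list       */
    struct Fault *implied1;   /* faults implied by this fault        */
    struct Fault *implied2;
} Fault;

/* Direct index of the fault of a given type on a given wire. */
static size_t fault_index(int wirenum, int type)
{
    return (size_t)wirenum * N_FAULT_TYPES + (size_t)type;
}

/* Allocate 6 * (highest wire number + 1) elements and mark every one
 * as non-existing; existing faults would be enabled afterwards. */
static Fault *alloc_fault_list(int highest_wire_number)
{
    size_t n = (size_t)N_FAULT_TYPES * ((size_t)highest_wire_number + 1);
    Fault *list = calloc(n, sizeof *list);
    if (list == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        list[i].type    = (int)(i % N_FAULT_TYPES);
        list[i].wirenum = (int)(i / N_FAULT_TYPES);
        list[i].flag    = fd_NONEXIST;
    }
    return list;
}
```

With a highest wire number of 6, the array has 42 elements and the last one, at index 41, is the PO fault of wire 6. The gaps in the wire numbering cost some unused elements but make every lookup a single multiplication and addition.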

5.2.3 Simulation Algorithm

The basics of simulation in general were described in the last chapter and the basics of the PPSFP algorithm were described at the beginning of this chapter. In this section, more design details on the simulation method used in simf will be discussed.

Double Buffered Simulation

The ISCAS85 netlist format has the very important property that the gates in the netlist are recorded in sorted order [3]. A gate list is sorted if, for all the gates in the list, all the inputs to a gate g are generated by gates with a gate index less than that of g. In other words, when the gates in the list are evaluated in the sorted order, it is guaranteed that the states of the inputs to gate g are all up-to-date when gate g is evaluated. Since simf stores the gates in the same order as they were defined in the netlist file, this property is inherited. This sorted order is very important for doing simulations because a fault-free simulation can simply be carried out by evaluating all the gates in the gate list in the defined order. The resulting simplification is illustrated in the following code fragment:

    for (int j = 0; j < number_of_gate; j++)
        Evaluate_gate(j);    // evaluate the gate

    // make an extra copy of the state variables
    memcpy(good_value, wire_value, number_of_state_variables);

It is also true that when a fault is inserted at gate g, all the gates that are affected by this fault must have a gate index greater than that of g, because the fault can only affect the outputs of g. Therefore, when a fault is inserted, we can propagate the fault by evaluating only the gates after the point of fault insertion. A lot of simulation effort can be saved using this fault simulation method because all the unnecessary gate evaluations preceding the site of the fault are eliminated. In order to implement this, a double buffered data structure for storing the circuit states was used.

Figure 5.5: Gate Evaluation in simf. Each state variable shown in the figure holds a 256-bit pattern.

Two integer arrays are used for storing circuit states, namely wire_value and good_value. Each array has n elements, where n equals the number of gates in the circuit + 1, and each element is an integer array for representing the parallel circuits (state variable). When gates are evaluated, the circuit states of the gate inputs are read from the wire_value array, and the evaluated outputs are also stored in this array (see Figure 5.5). The good_value array is always used for storing the fault-free states of the circuit nodes. Keeping the fault-free circuit states around enables us to perform fault simulation downstream from the site of the fault instead of resimulating the whole circuit from the primary inputs. After each round of fault insertion and simulation, the fault-free circuit states can be copied back to wire_value for the next round.

Here is how a full simulation takes place. After reading the circuit netlist and generating the collapsed fault list, simf is ready to perform fault simulation. The simulation module first reads a set of test patterns from the standard input (stdin). The patterns are read directly into the wire_value array as the circuit states of the primary inputs. In simf, primary inputs are defined as a special kind of gate which has zero inputs, whereas primary outputs are normal gates with zero fan-out. The program will read in only enough test patterns for one round of simulation; the remaining ones will be read only after the current round is done and there are in fact still undetected faults.

After the next set of patterns is read, simf will first do a fault-free simulation. Since the wire_value array has already been loaded with test patterns, a fault-free simulation simply evaluates all the gates in the gate list, in the order of the gate index, skipping over any gates defined as primary inputs. After that, the wire_value array is filled with the fault-free states of all the circuit nodes. The content is then saved to the good_value array. Once the good_value array is ready, simf starts looking for faults that are triggered by the current set of test patterns. This process is called fault triggering detection. The faults that are being triggered will then be inserted into the wire_value array for fault simulation.

The wire_value array is clean, meaning that it stores the fault-free states, before a fault is inserted. After fault insertion, the circuit node where the fault is inserted is the only dirty gate in the wire_value array. A fault simulation can then be carried out by evaluating all the gates after the dirty gate, i.e. by propagating the fault. If any of the primary output nodes in the wire_value array is dirty after this process, then the fault has been propagated to one of the primary outputs, and the fault can be dropped from the fault list. Before going on to insert the next fault, all the dirty nodes in wire_value are restored to fault-free states so that the wire_value list is clean again. When all the faults in the fault list have been examined and all the fault simulations are done, this round of simulation is finished and another set of test patterns can be loaded if necessary.

Figure 5.6: Fault Propagation in Circuits. The three panels show a fault-free simulation, a fault that is detected, and a fault that is not detected. The legend distinguishes gates not evaluated, necessary gate evaluations, wasteful gate evaluations, and the location of the fault.
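The double-buffered scheme can be illustrated with a small self-contained sketch. It is not the simf implementation: the five-gate NAND circuit, the fan-in table, the single 32-bit word per node and all function names are assumptions chosen for the example. For brevity it handles only stuck-at faults on gate outputs, and it restores the whole wire_value array with one copy rather than restoring only the dirty nodes.

```c
#include <stdint.h>
#include <string.h>

#define N_GATES 5   /* gates 0 and 1 are primary inputs, 2..4 are NANDs */

/* Fan-in gate indices of a hypothetical sorted circuit; -1 = primary input.
 * Gate 4 has no fan-out and is therefore the primary output. */
static const int fanin[N_GATES][2] = {
    { -1, -1 }, { -1, -1 }, { 0, 1 }, { 0, 2 }, { 2, 3 }
};

static uint32_t wire_value[N_GATES];   /* working states, may be dirty */
static uint32_t good_value[N_GATES];   /* fault-free states            */

static void evaluate_gate(int j)
{
    if (fanin[j][0] < 0)
        return;                        /* inputs keep their loaded patterns */
    wire_value[j] = ~(wire_value[fanin[j][0]] & wire_value[fanin[j][1]]);
}

/* Load the patterns, evaluate every gate in sorted order and save
 * an extra copy of the fault-free states. */
static void fault_free_simulation(uint32_t in0, uint32_t in1)
{
    wire_value[0] = in0;
    wire_value[1] = in1;
    for (int j = 0; j < N_GATES; j++)
        evaluate_gate(j);
    memcpy(good_value, wire_value, sizeof good_value);
}

/* Insert a stuck-at fault on the output of gate g, evaluate only the
 * gates after g, and return the patterns on which the primary output
 * differs from the fault-free state. wire_value is clean on return. */
static uint32_t simulate_stuck_at(int g, int stuck_at_one)
{
    wire_value[g] = stuck_at_one ? 0xFFFFFFFFu : 0u;
    for (int j = g + 1; j < N_GATES; j++)
        evaluate_gate(j);
    uint32_t detected = wire_value[N_GATES - 1] ^ good_value[N_GATES - 1];
    memcpy(wire_value, good_value, sizeof wire_value);
    return detected;
}
```

With the input words 0xC and 0xA (patterns 00, 01, 10, 11 in the four low bits), a stuck-at-0 fault on gate 2 is detected by the first two patterns, whereas a stuck-at-1 fault on the same gate is not detected by any of the four.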

The Event Heap

Simulation effort can be reduced by simulating downstream from the point where the fault is inserted. This is because all the gate evaluations before the location of the fault just recompute known fault-free node states. On the other hand, if the effect of the fault stops propagating somewhere before the primary outputs, then all the gate evaluations from the point where the propagation stops up to the primary outputs are also wasteful. This waste could be very significant if the fault is near a primary input and its effect stops propagating shortly afterwards. Figure 5.6 shows these wasteful gate evaluations in a simplified diagram.

In order to eliminate these unnecessary gate evaluations, an event_list is used. The event_list is simply an array of Boolean variables where each Boolean represents a gate. If the Boolean variable is TRUE, then the gate needs to be evaluated; otherwise, the gate can be ignored. When performing fault simulation, the event_list is initialized so that all the gates except the starting gate are set to FALSE. After evaluating each gate, the state of the gate output is compared with the fault-free state. If they are the same, then the fault effect stops propagating and the fan-outs of this gate do not have to be added to the event_list. On the other hand, if the states are different, then we need to add the fan-outs of this gate to the event_list, i.e., by setting their corresponding Boolean variables in the event_list to TRUE. The following code demonstrates this algorithm:

    starting_gate = index of the gate with fault inserted;

    // set all the elements of event_list to FALSE,
    // then schedule the starting gate
    memset(event_list, 0, number_of_gate);
    event_list[starting_gate] = TRUE;

    for (int j = starting_gate; j < number_of_gate; j++) {
        if (!event_list[j]) continue;

        Evaluate_Gate(j);

        if (gate output is different from good state) {
            if (gate output is a primary output) {
                the fault is detected;
                stop the simulation;
            } else {
                for each index of the fan-out gates
                    event_list[index of fan-out gate] = TRUE;
            }
        }
    }

    Restore the fault-free states for the next fault simulation;

Figure 5.7: Using a Heap for Sorting Simulation Events. The figure shows an event_list in which most entries are checked without being triggered (wasted effort in checking the event_list). Order of evaluation: 2, 202, 203, 1003. Order for adding to the event list: 2, 202, 1003, 203, which is out of order when using a FIFO queue.

Using the event_list, the problems illustrated in Figure 5.6 can be avoided. However, there is still room for improving the algorithm. From the above code, we see that a for loop is used for going through the gate list. If there are a lot of gates in the circuit and the fault only affects a small number of gates, then a lot of time will be wasted checking the event_list for the gates that are not triggered. A data structure that stores only the indices of the gates that need to be evaluated is needed. A simple FIFO (First-In-First-Out) queue cannot solve the problem because it does not guarantee that the gates will be evaluated in sorted order. Figure 5.7 illustrates these problems. A fault is assumed to affect gate 2. After gate 2 is evaluated, gates 202 and 1003 are added to the event_list. Next, gate 202 is evaluated. If the fault propagates to gate 203, then gate 203 is also added to the event_list. When a simple FIFO queue is used, the gate evaluation order would be 2, 202, 1003, 203. This evaluation order is incorrect, as we see that gate 1003 depends on the output of gate 203. To solve this problem, a data structure that sorts the event_list efficiently is needed. As a result, a heap data structure was used.

Figure 5.8: Heap - Random Input, Sorted Output. The upper panels show a starting heap, adding 16 to the heap, and restoring the heap property. The lower panels show popping 12 from the heap, filling the root with 47, and restoring the heap property.

A heap is a special kind of binary tree structure, with the added property that the parent index is always smaller than the indices of its children (in the case of a heap that is sorted in ascending order). Another property of a heap is that it is a complete binary tree, meaning that all of its leaves (nodes without children) are separated by no more than one level, and all the nodes at the bottom level are left-justified. Figure 5.8 shows the operations of a heap.

When an element is added to the heap, it is always added as a new node at the right-hand end of the incomplete bottom level. The element may have to be added alone to a new bottom-most level. The element's index is then compared with that of its parent. If the parent is greater than the child, then the two nodes swap the indices they contain. The node containing the inserted index is then compared with its parent again. This process continues until either the child index is greater than that of the parent or there is no more parent (i.e. the new element reaches the root).

When an element is taken out from the heap, it is always popped from the root. The last node of the tree (i.e. the rightmost element at the bottom level) is removed and is placed in the root node position to start the process of restoring the heap. The element in the root is then compared with its children. If the parent index is smaller than those of its two children, then nothing needs to be done. Otherwise, the smallest index of the children is swapped with the parent. If a swap happens, then the element that moved down is compared with its two new children again. This process continues until no more swapping can occur. An integer array is used to implement the heap. For more information, please see [11]. Using a heap for sorting the event_list on-the-fly, the fault simulation algorithm can be made more efficient.

    add_to_heap(starting_gate index);

    while ((j = pop_heap()) >= 0) {    // get the first element in the heap

        Evaluate_Gate(j);

        if (gate output is different from good state) {
            if (gate output is a primary output) {
                the fault is detected;
                stop the simulation;
            }

            for each index of the fan-out gates
                add_to_heap(index of fan-out gate);
        }
    }

    Restore the fault-free states for the next fault simulation;
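The two heap operations used above, add_to_heap and pop_heap, can be written over an integer array as sketched below. This is an illustrative implementation rather than the one in simf. The capacity MAX_GATES and the in_heap marker array are assumptions; the marker is added here so that a gate reached through two different fan-in paths is queued only once, a detail the text does not discuss.

```c
#include <assert.h>

#define MAX_GATES 2048                     /* hypothetical capacity */

static int heap[MAX_GATES];                /* binary min-heap of gate indices */
static int heap_size = 0;
static unsigned char in_heap[MAX_GATES];   /* 1 while a gate is queued */

/* Add a gate index at the end of the bottom level, then swap it with
 * its parent for as long as the parent holds a larger index. */
static void add_to_heap(int gate)
{
    assert(gate >= 0 && gate < MAX_GATES);
    if (in_heap[gate])
        return;
    in_heap[gate] = 1;
    int i = heap_size++;
    heap[i] = gate;
    while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
        int p = (i - 1) / 2;
        int t = heap[p];
        heap[p] = heap[i];
        heap[i] = t;
        i = p;
    }
}

/* Pop the smallest gate index, or return -1 if the heap is empty.
 * The last node is moved to the root and swapped downwards with its
 * smaller child until the heap property holds again. */
static int pop_heap(void)
{
    if (heap_size == 0)
        return -1;
    int top = heap[0];
    in_heap[top] = 0;
    heap[0] = heap[--heap_size];
    int i = 0;
    for (;;) {
        int l = 2 * i + 1, r = l + 1, s = i;
        if (l < heap_size && heap[l] < heap[s]) s = l;
        if (r < heap_size && heap[r] < heap[s]) s = r;
        if (s == i)
            break;
        int t = heap[s];
        heap[s] = heap[i];
        heap[i] = t;
        i = s;
    }
    return top;
}
```

Replaying the scenario of Figure 5.7 (add 2, pop, add 202 and 1003, pop, add 203) makes the next two pops return 203 and then 1003, so gate 203 is evaluated before gate 1003 even though it was queued later.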

Fault Triggering and Fault Insertion

When a set of test patterns is applied to a CUT, the states of the circuit nodes are changed. From these states, we can determine a set of faults that are triggered by these patterns. For example, if a test pattern resets a circuit node to logical 0, then we know that the stuck-at-1 fault associated with this circuit node is triggered (triggering conditions were discussed in Chapter 3). When a fault is triggered, we want to simulate the CUT with the fault effect. The act of adding the fault effect to the circuit is called fault insertion.

Figure 5.9: Detecting Circuit-Node Transitions. The figure shows the following sequence of bit-wise operations:

    State variable (a)
    Shifted state variable (b = a >> 1)
    Transition mask (c = b ^ a)
    Low-to-high transition mask (d = c & a)
    High-to-low transition mask (e = c & b)

    * For the first pattern, there are two cases: if there is a preceding pattern, use that pattern; otherwise, repeat the pattern itself.

To keep things simple, let's consider fault triggering detection for the stuck-at fault model only. In PPSFP, the state of each circuit node is stored in one or more integers, where each bit of the integer represents the circuit state corresponding to a different test pattern (state variable). Given a stuck-at-1 fault for a circuit node, say, SA1(24), we could simply check every bit of the integers corresponding to circuit node 24. If any of these bits is 0, we know that the fault is triggered. In order to maximize the throughput, we can take advantage of the 32-bit datapath of the processor³ and use the logical operations provided. For the same example, instead of checking each bit of the integers one by one, we can check the array of integers with a bit-wise AND operation. If the result is not an all-one pattern (0xffffffff), then there must be a 0 in one of these patterns. For stuck-at-0 faults, an OR operation and an all-zero pattern (0x0) could be used.

³ Assuming a typical present-day 32-bit processor.

Fault triggering detection with stuck-at faults is easy, since triggering of stuck-at faults requires only one pattern. However, both gate-delay faults and stuck-open faults require two consecutive test patterns to trigger the faults, which makes the situation more complicated. Figure 5.9 illustrates the steps needed for detecting transitions. In PPSFP, since each bit of an integer represents a pattern, the consecutive test patterns are located side-by-side in an integer. In order to detect a low-to-high or high-to-low transition on a circuit node (which is required for triggering either gate-delay or stuck-open faults), we need to first shift the state variable to the right by 1 bit and then perform an exclusive-OR operation between the shifted and unshifted state variable. The result of this operation is an integer where the 1's represent transitions (transition mask). If a low-to-high transition is needed, then the transition mask is ANDed with the unshifted state variable so that all bits representing the high-to-low transitions are masked out (low-to-high transition mask). If the low-to-high transition mask (or the high-to-low transition mask) does not contain a one, then the fault in question is not triggered. On the other hand, if there is at least one 1 in the mask, then the fault has to be inserted for fault simulation.

When performing fault insertion, it is necessary to keep the patterns that do not trigger the fault at their fault-free state while setting those that trigger the fault to the faulty state. For example, a state variable may contain both 1's and 0's, and a stuck-at-1 fault needs to be inserted. In this case we want to set all the bits containing 0 to 1 while keeping all those bits that were 1 at 1. In other words, all the bits of the state variable are set to 1. In the case of stuck-at-0 faults we can set all the bits to 0. When dealing with gate-delay faults and stuck-open faults, however, it is not that simple, because not all the bits containing 1 could trigger a slow-to-rise fault or a PO fault. For these faults, a transition is needed, as mentioned above. Figure 5.10 illustrates our method of fault insertion using transition masks. To perform fault insertion, the low-to-high transition mask or the high-to-low transition mask is required. Since the 1's in these masks correspond to a transition, the mask can be used to selectively set some bits of the state variable to 1 (with an OR operation) or 0 (with an AND operation).

Figure 5.10: Fault Insertion Using Transition Masks. The recoverable panel labels are: state variable (a) with high-to-low transition mask (e); state variable (a) with low-to-high transition mask (d); and insert slow-to-rise (a AND ~d).
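The triggering checks and the mask operations of Figures 5.9 and 5.10 translate almost directly into C. The sketch below is a hedged illustration, not the simf code. The function and type names are hypothetical, one 32-bit word is assumed per state variable, and the earlier of two consecutive patterns is assumed to sit in the next-higher bit, which is what the right shift in Figure 5.9 implies. The slow-to-fall insertion (a OR e) is inferred from the statement that an OR operation sets bits to 1; only the slow-to-rise case (a AND ~d) is legible in Figure 5.10.

```c
#include <stdint.h>

/* Stuck-at-1 is triggered if any pattern drives the node to 0:
 * AND all the words and compare with the all-one pattern. */
static int sa1_triggered(const uint32_t *state, int n_words)
{
    uint32_t acc = 0xFFFFFFFFu;
    for (int i = 0; i < n_words; i++)
        acc &= state[i];
    return acc != 0xFFFFFFFFu;
}

/* Stuck-at-0 is triggered if any pattern drives the node to 1:
 * OR all the words and compare with the all-zero pattern. */
static int sa0_triggered(const uint32_t *state, int n_words)
{
    uint32_t acc = 0u;
    for (int i = 0; i < n_words; i++)
        acc |= state[i];
    return acc != 0u;
}

typedef struct {
    uint32_t low_to_high;   /* mask d of Figure 5.9 */
    uint32_t high_to_low;   /* mask e of Figure 5.9 */
} Transitions;

/* prev_bit is the node state under the pattern that precedes the one
 * in bit 31; pass the value of bit 31 itself if there is none. */
static Transitions transition_masks(uint32_t a, uint32_t prev_bit)
{
    uint32_t b = (a >> 1) | ((prev_bit & 1u) << 31);   /* shifted state  */
    uint32_t c = b ^ a;                                /* any transition */
    Transitions t = { c & a, c & b };
    return t;
}

/* Slow-to-rise: patterns with a rising transition stay at 0. */
static uint32_t insert_slow_to_rise(uint32_t a, uint32_t d)
{
    return a & ~d;
}

/* Slow-to-fall (inferred): patterns with a falling transition stay at 1. */
static uint32_t insert_slow_to_fall(uint32_t a, uint32_t e)
{
    return a | e;
}
```

For a state variable a = 0x6 with no preceding pattern, the masks come out as d = 0x4 and e = 0x1; inserting slow-to-rise then gives 0x2 and inserting slow-to-fall gives 0x7. Unused pattern bits are not treated specially here; a real simulator would have to pad or mask them so that they cannot trigger a fault.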

Recursive Fault Detection

After a fault is triggered, inserted, simulated and found to be detected, it needs to be removed (i.e. dropped) from the fault list. The structure of the fault list was described in Section 5.2.2. Usually, this procedure is repeated for each fault in the fault list. However, by exploiting the fault-implication relationships, we can save a lot of the effort in simulation by recursively removing faults from the fault list. We can do so when a fault with implied faults is detected. The following code demonstrates the recursive function.

    void Mark_detected(Fault *fault)
    {
        if (fault == NULL)
            return;

        // recursively mark the faults implied by this fault
        Mark_detected(fault->implied1);
        Mark_detected(fault->implied2);

        // fault is detected by implication
        fault->flag |= fd_IMPLIED;

        return;
    }

    if (fault is detected) {
        Mark_detected(fault);
        fault->flag = fd_DETECTED;    // fault is detected
    }

Here, the flag variable in the Fault structure plays a very important role. As mentioned above, the flag variable is used to record the status of the fault. Before fault collapsing, all existing faults are marked with fd_UNDETECTED whereas the rest of the faults are fd_NONEXIST. After fault collapsing, the collapsed faults are marked as fd_EQUIVAL, meaning that they are equivalent to some other faults. The fault list contains only faults that are marked as fd_UNDETECTED. When a Fault is detected, the recursive algorithm first searches for all the faults implied by the Fault and marks them as fd_IMPLIED, meaning that they are detected by implication. Then the Fault itself is marked as fd_DETECTED. When all the faults are checked for the current set of test patterns, a procedure will be called to remove from the fault list all the Faults marked with either fd_DETECTED or fd_IMPLIED. As a result, when the next set of test patterns is loaded for simulation, the fault list again contains only the undetected faults.

Table 5.1 shows the root-to-fault ratio for the ISCAS85 benchmark circuits [3]. A root in the fault list is defined as a fault that is not implied by any other fault. The root-to-fault ratio is thus the number of roots in a circuit divided by the number of faults. In the table, all the ISCAS85 circuits have a root-to-fault ratio near 47%. If a set of test patterns detects all the roots the first time they are triggered, then a maximum simulation reduction of about 53% (100% - 47%) could be achieved. In reality, it is nearly impossible to reach this maximum; however, my experiments have shown that the simulations are still accelerated by an average of 28.5% using this recursive fault detection algorithm.

    Circuit    Number of   Number of   Root/Fault   Maximum     Actual
               Roots       Faults      Ratio        Reduction   Reduction
    c1355nr    2405        5238        45.91%       54.09%
    c1908nr    2988        6401        46.68%       53.32%
    c2670nr    3523        7446        47.31%       52.69%
    c3540nr    5723        11833       48.36%       51.64%
    c5315nr    9072        19261       47.10%       52.90%
    c6288nr    11220       24962       44.95%       55.05%
    c7552nr                25579       47.61%       52.39%

Table 5.1: Root-to-Fault Ratio for the ISCAS85 Circuits

Table 5.2: Evaluation of Simulation Speed (in seconds). Columns: Circuit, sim3, simf, Speed-up, # Patterns.
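The recursive marking and the subsequent dropping of faults described for the recursive fault detection algorithm can be sketched as a small runnable example. It is a simplified illustration, not the simf routines: the Fault structure is reduced to the fields needed here, the numeric values of the status constants are assumptions, and the function names `mark_implied`, `mark_detected` and `drop_detected` are hypothetical. As a further assumption, the recursion stops at any fault that is no longer fd_UNDETECTED, which also guards against visiting a fault twice.

```c
#include <stddef.h>

enum { fd_UNDETECTED, fd_DETECTED, fd_IMPLIED, fd_EQUIVAL, fd_NONEXIST };

typedef struct Fault {
    int flag;
    struct Fault *next;       /* linked list of remaining faults */
    struct Fault *implied1;   /* faults implied by this fault    */
    struct Fault *implied2;
} Fault;

/* Mark a fault, and everything it implies, as detected by implication. */
static void mark_implied(Fault *f)
{
    if (f == NULL || f->flag != fd_UNDETECTED)
        return;
    f->flag = fd_IMPLIED;
    mark_implied(f->implied1);
    mark_implied(f->implied2);
}

/* Called for a fault whose effect reached a primary output. */
static void mark_detected(Fault *f)
{
    mark_implied(f->implied1);
    mark_implied(f->implied2);
    f->flag = fd_DETECTED;
}

/* Unlink every detected or implied fault; returns the new list head. */
static Fault *drop_detected(Fault *head)
{
    Fault **link = &head;
    while (*link != NULL) {
        if ((*link)->flag == fd_DETECTED || (*link)->flag == fd_IMPLIED)
            *link = (*link)->next;
        else
            link = &(*link)->next;
    }
    return head;
}
```

In a list of four faults where the first implies the second and the second implies the third, detecting the first fault removes three faults with a single simulation and leaves only the fourth in the list. This is the saving that the root-to-fault ratio of Table 5.1 bounds from above.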

5.3 Evaluation of simf

simf rvas designed to be both efficient and accurate. Another fault sirnulator named sirn.3. which is a public domain fault simulator. \vas used as a benchmark to evaluate these claims. The source code of sim3 is not available. so it is nearly impossible to know how it is implemented [34. -4s a result . simf was designed from scratch. following the basic design concepts shared by al1 of PPSFP simulators. .MI the tech- niques ment ioned in the previous section are irnplemented in simJ The seven larger ISC.ASS.3 circuits were used as benchmark circuits for the evaluations. The T--SISshell command tinlr iras used to measure the user time of the simulators to evaluate the efFicienc~+of simJ .As shown in Table 5.2. fault simulation times for most of the circuits were sped up with respect to sim9 by anywhere from 1.13 to 6.93. The speed-up is calculated as (sim3time/sim f time).These circuits are al1 simulated to 99% stuck-at fault coverage (escept for c2670nr and c7552nr. rvhere 95.89% and 9S.ï.j% were obtained. respectively ). This shows t hat simf is part icularly good at detecting the faults early on in the simulations. One of the reasons for not achieving a high speed-up when there are a lot of test patterns is that the time to read in a test pattern from stdin (standard input) is relatively constant for both simulators. The more patterns that are run. the lower is the contribution of this common constant. In general. simf appears to be considerably faster than sirn3. In terms of accuracy. the total number of faults after collapsing is an important rneasure of the accuracy in generating faults. The total number of faults generated by simf and sim9 for each circuit are compared in Table 5-13. Sote that for stuck-at Circuit

Table 5.3: Fault Collapsing Results for the ISC-4SS.5 Circuits faults. both simf and sim.3 produce exactly the same number of collapsed faults and fault coverage. For gate-delay faults. the number of collapsed faults is different for some circuits. This is because simJ collapses gate-delaje faults in a slightly different way t han simf. In sirn3. gate-delay faults for inverters and buffers are collapsed at the input. whereas t hey are collapsed at the output for the ot her logic gates. This results in the situation described in Chapter 3. where some faults t hat are equivalent cannot be collapsed. Although ive don't have the source code of sim.3 to verify this claim. ive did rnodify simf in an experiment to collapse faults in this fashion. This modified version of sim f produces esactly the same amount of gate-delay faults as sim3. which provides empirical evidence of how sim3 does i ts fault collapsing. Keshould also note that the number of stuck-open faults is also different. This is because sim9 did not take into account the stuck-open faults at the input of buffers. -4nother important measure of accuracy are the simulation results. A small test circuit was created and fault simulated manually in the early phases of the develop- ment of simf. After getting simf to sirnulate this circuit correctly. it was used for simulating the ISC.\SS.j circuits and the results were compared with those of sim-3. .As mentioned in the previous paragra~h.the total number of delay faults for the two sirnulators was different. As a result. the fault coverages for de1ay faults cannot be compared direct ly. To test the simulation of gate-delay faults. simf !vas again modi- fied to generate the same number of faults as sim3. Through al1 these testing phases. simfwas found to be accurate in fault simulation. \Ve can also compare the faiilt simulator is in terms of supported features. This. however. rvas not a high priority in this research. 
There are significant differences between sim3 and simf in terms of features. For example, sim3 has three options for feeding the circuit with test patterns: it can either read from the standard input (stdin), read from a file, or generate LFSR patterns by itself. In simf, only the first option is supported. The other two options were not implemented because they were unnecessary for the purposes of this research. Another difference arises when simulating a circuit with exclusive-or and exclusive-nor gates, which do not have collapsing rules as simple as those of the other gates: sim3 will convert these gates into the other logic gates, whereas simf will simply ignore them. There were two primary reasons for this decision. First of all, in CMOS technology, exclusive-or gates are not simply a combination of the basic logic gates. In fact, they usually use a special circuit structure called transmission gates. It was decided that converting the exclusive-or gates to basic gates is simply unrealistic and generates unnecessary faults. In addition, none of the benchmark circuits selected contain either exclusive-or or exclusive-nor gates. Therefore, they are ignored for the purpose of this research. Although these features are lacking from simf, this does not diminish its value as a benchmark fault simulator in this research.

Chapter 6 Pattern-Parallel Fault Simulation on C•RAM

6.1 Overview

After developing and validating simf to obtain a representative conventional fault simulator, the first fault simulator based on C•RAM was developed. This first simulator, named simf_pps, implements a purely pattern-parallel fault simulator. The architecture of this simulator is basically the same as the conventional one, except that the circuit states are stored in the local memory of the C•RAM processing elements (PEs) rather than in 32-bit integers, and gate evaluations are performed by the PEs rather than by the logical operations of the CPU. The simulator uses every PE available in C•RAM to simulate the patterns in parallel, so the maximum number of patterns in parallel equals the number of PEs on the system.

simf_pps reuses all the modules discussed in the previous chapter except the simulation module. A modified simulation module that uses the C•RAM PEs instead of the integer arrays is developed by replacing all of the statements that use integer arrays with the corresponding C•RAM operate statements. By doing so, the basic architecture of simf is maintained. The following section will discuss the differences between the simulators in more detail.

Figure 6.1: Data Structure Used for Storing Circuit Node States (simf: integer array; simf_pps: cint with one variable per PE; n = number of circuit nodes)

6.2 Design Details

6.2.1 Data Structure

In simf, the logical states of circuit nodes are stored in two arrays of integers. The array named good_value stores the fault-free value of the circuit nodes when no fault is inserted; the wire_value array is used for performing gate evaluations, fault insertions and fault simulations. When implementing the pattern-parallel fault simulator on C•RAM, these integer arrays are replaced by cints, where each cint is a C•RAM array of multiple-bit variables with the number of bits equal to number_of_circuit_nodes + 1¹ and the number of variables equal to the number of PEs. Figure 6.1 shows the difference between these two data structures.

¹ The extra node is needed for fault insertion.

After the fault-free simulation, each fault is checked for fault triggering conditions using the same procedure as simf. In simf, 256 test patterns are simulated in parallel. However, we often need to simulate more than 256 test patterns, in which case more than one round of simulation is required. Since this simulator implements not only the stuck-at fault model but also the gate-delay and the transistor stuck-open models, the fault-free value of the last pattern in the previous round has to be stored for the fault triggering detection to work correctly. This is because two patterns are required to trigger transition faults. All patterns except the very first one have at least one pattern preceding them, so, if there are n patterns to be simulated, then there are n - 1 pairs of test patterns that should be considered when checking the fault triggering conditions for the transition faults. However, when these patterns are divided into subsets of test patterns for simulating in different rounds, one pair of test patterns is broken up at each round boundary. When simulating groups of 256 test patterns, the pattern pairs (255², 256), (511, 512), etc., are broken up. As a result, the fault-free circuit states of the last pattern in the previous round need to be stored aside for correct fault simulations.

In simf, tracking the node values for the last pattern in the last simulation round is done by allocating a byte array for backing up the last byte of each state variable in the good_value array before loading in the next set of test patterns. During fault triggering condition detection, when the state variable is shifted to the right by one bit, the last bit of the corresponding byte in the backup array is shifted back into the state variable at the left end as the first bit.

For simf_pps, however, it is not desirable to back up the data in a byte array because this would require m bits to be read from C•RAM during the backup process, and the bits would need to be written back to the C•RAM when checking fault triggering conditions. The extra memory accesses would slow down the whole simulation. An alternative method, which is implemented in simf_pps, is to use the local memory of the first PE to back up the fault-free state of the last pattern in the previous round (see Figure 6.2). Before reading a set of test patterns, the content of the last PE of the good_value cint is copied to the first PE. Then test patterns can be written into C•RAM starting from PE 1³. In practice, only the test pattern in the last PE needs to be duplicated because, after reading all the test patterns, a fault-free simulation will take place and all the circuit states will be determined, including those of the first PE.

Figure 6.2: Using the First PE to Backup Data (first round: PE 0 duplicates the first pattern; later rounds: PE 0 duplicates the last pattern of the previous round)

² Pattern numbers start with zero.
³ PE numbers start from zero.

During the first round, there is no previous round for PE 0 to duplicate. In this case, the first pattern will be written into both PE 0 and PE 1. This is reasonable because the first pattern has no pattern before it, and thus there shouldn't be any signal transition between PE 0 and PE 1. The most efficient way to avoid having transitions there is to make PE 0 and PE 1 store the same circuit states. This is efficient because this way the first PE in the first round does not need to be treated differently. With the first PE duplicating either the very first pattern or the last pattern of the previous round, the effective number of patterns simulated in parallel is reduced by 1; i.e. if a 1024 PE C•RAM is used, only 1023 test patterns are simulated in parallel in each round.
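The round structure described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the simf_pps source: `Round`, `plan_rounds` and `count_pairs` are hypothetical names, and each PE is modelled only by the index of the test pattern it holds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one simulation round: pe[i] is the index of the
// test pattern held by PE i (pattern and PE numbers both start at zero).
struct Round {
    std::vector<std::size_t> pe;
};

// Split n_patterns into rounds for a machine with n_pes PEs.
// PE 0 always holds a duplicate: the first pattern in the first round, and
// the last pattern of the previous round afterwards. New patterns start at PE 1.
std::vector<Round> plan_rounds(std::size_t n_patterns, std::size_t n_pes) {
    assert(n_pes >= 2);                        // one backup PE plus one working PE
    std::vector<Round> rounds;
    std::size_t next = 0;                      // next unread pattern
    while (next < n_patterns) {
        Round r;
        r.pe.push_back(next == 0 ? 0 : next - 1);   // backup copy in PE 0
        while (r.pe.size() < n_pes && next < n_patterns)
            r.pe.push_back(next++);
        rounds.push_back(r);
    }
    return rounds;
}

// Count adjacent PE pairs that hold two consecutive patterns, i.e. the
// pattern pairs available for transition-fault triggering checks.
std::size_t count_pairs(const std::vector<Round>& rounds) {
    std::size_t pairs = 0;
    for (const Round& r : rounds)
        for (std::size_t i = 1; i < r.pe.size(); ++i)
            if (r.pe[i] == r.pe[i - 1] + 1) ++pairs;
    return pairs;
}
```

With 3000 patterns on a 1024 PE machine, this plan uses three rounds of at most 1023 new patterns each, and all 2999 consecutive pattern pairs remain visible to the triggering check because PE 0 bridges every round boundary.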

6.2.2 C•RAM Assembler

All the code in simf that uses the integer array data structure is replaced with the corresponding C•RAM assembly code. These code changes are found in three main parts of the simulation module.

Reading Test Patterns

The first section of code requiring changes is where test patterns are read from standard input into the data structure. It is assumed that each line of the standard input contains a test pattern. These test patterns are read one bit at a time (a '1' or '0') and are stored in an internal data structure. In simf, this data structure is an array of char that is big enough to hold one round of test patterns. For example, if we are simulating 256 patterns in parallel and there are 100 bits per pattern, then the size of the char array is 256 x 100 bytes. When the primary inputs are evaluated, these test patterns are copied into the good_value array for simulation.

In simf_pps, the data structure for storing the test patterns is a cint whose number of bits equals the number of bits per pattern (i.e. the number of primary inputs). Since the patterns are read one at a time, the local memories of the PEs are filled up one PE at a time.

Figure 6.3: Copying Test Patterns into C•RAM (upper panel: after reading the first pattern, PE 0 is arbitrary and PEs 1 to 1023 hold pattern 1; lower panel: after reading into PE 40, PEs 1 to 39 hold patterns 1 to 39 and PEs 40 to 1023 hold pattern 40)

The following pseudo code is used to load the test patterns:

    copy pattern from PE 1023 to PE 0;

    // setup Y register as mask as 01111...1
    // * it takes advantage of the fact that the CRAM emulator
    //   shifts in 0 for the first PE.
    pattern->operate (0, op_one, writey);
    pattern->operate (0, op_y, shiftright);

    while (read from standard input)
    {
        if (it is a new line character)
        {
            // a pattern is finished reading
            // shift the Y register to the right by one PE.
            pattern->operate (0, op_y, shiftright);
        }
        else
        {
            // put the current character into the X register
            if (it's 1)
                pattern->operate (0, op_one, writex);
            else // it's 0
                pattern->operate (0, op_zero, writex);

            // store the read character into the appropriate row of
            // local memory, for the PEs where Y register is set to
            // 1 only
            // opcode 0xe2 = (X & Y) | (M & ~Y)
            pattern->operate (current bit, 0xe2, groupwrite);
        }
    }

    // copy first pattern to first PE if this is the first time
    // pattern is read
    if (first time)
        copy pattern from PE 1 to PE 0;

The Y register here is used as a mask for selecting the PEs into which the test pattern is written. A value of 1 at one position selects the corresponding PE. Note that all the PEs with a PEid greater than that of the desired PE are also selected. As a result, the test pattern will be written to all these PEs. After each test pattern is read in completely, the Y mask is shifted to the right by one bit, so that the previously stored test patterns will not be overwritten. The procedure is illustrated in Figure 6.3.

An alternative way of generating patterns is to use PEid as a random seed to generate test patterns within C•RAM. This method should save the time needed to read in test patterns one bit at a time. However, the quality of such a test pattern generator is unknown. Whether this method can reproduce the test patterns generated by other test pattern generators (such as an LFSR) is also an unanswered question. Generating test patterns using PEid as a random seed is an entirely new research topic. Since this method is not compatible with simf, we did not implement it in this research.
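The masked loading procedure can be modelled in ordinary C++ as follows. This is a toy sketch and not the C•RAM library: `PatternStore` and `load_patterns` are hypothetical names, each PE is a row of a two-dimensional array, and the X and Y registers are plain variables. It assumes a first-round load, so PE 0 ends up as a copy of PE 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the pattern cint: mem[pe][bit] is one bit of PE-local memory.
using PatternStore = std::vector<std::vector<char>>;

// Load patterns (one string of '0'/'1' characters per pattern) using the
// masked write scheme: every write goes to all PEs whose Y bit is 1, and
// the Y mask loses one more leading 1 after each completed pattern.
PatternStore load_patterns(const std::vector<std::string>& patterns,
                           std::size_t n_pes, std::size_t n_inputs) {
    PatternStore mem(n_pes, std::vector<char>(n_inputs, 0));
    std::vector<char> y(n_pes, 1);             // models op_one, writey
    auto shift_right = [&y]() {                // a 0 enters at PE 0
        for (std::size_t i = y.size(); i-- > 1;) y[i] = y[i - 1];
        y[0] = 0;
    };
    shift_right();                             // Y = 0111...1
    for (const std::string& p : patterns) {
        assert(p.size() >= n_inputs);
        for (std::size_t bit = 0; bit < n_inputs; ++bit) {
            char x = (p[bit] == '1');          // X register, same in all PEs
            for (std::size_t pe = 0; pe < n_pes; ++pe)
                mem[pe][bit] = y[pe] ? x : mem[pe][bit];  // (X & Y) | (M & ~Y)
        }
        shift_right();                         // protect the PE just filled
    }
    if (n_pes > 1) mem[0] = mem[1];            // first round: PE 0 mirrors PE 1
    return mem;
}
```

Loading three patterns into five PEs leaves PE 1, PE 2 and PE 3 holding the three patterns in order, while PE 4 still holds a copy of the last pattern read, which matches the observation that every PE contains a valid (possibly repeated) pattern.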

Gate Evaluation

Another portion of the program where C•RAM operations replace traditional ones is the set of gate evaluation routines. In both simf and simf_pps, each basic logic gate has a gate evaluation function associated with it. When the gates in the gate list need to be evaluated, their corresponding gate evaluation functions are called. Each such function basically loads the states of each of the gate's inputs, evaluates the result based on the logical operation the gate represents, and then writes it back to the gate's output node. In the conventional simf version, this step has to be repeated for each integer in the integer array that stores the state value of the circuit nodes. A simple comparison of the evaluation of NAND gates is presented below:

    // NAND gate evaluation in conventional simulation
    void g_nand (int j)
    {
        // n = number of integer per integer array
        for (int k=0; k<n; k++)
        {
            // set output state to be the same as the first input's state
            WIRE_VALUE(g_cut->gate_list[j].out_node,k) =
                WIRE_VALUE(g_cut->gate_list[j].in_node[0],k);

            // apply AND operation
            // to the output state and the rest of the inputs
            for (int l=1; l<g_cut->gate_list[j].n_fan_in; l++)
                WIRE_VALUE(g_cut->gate_list[j].out_node,k) &=
                    WIRE_VALUE(g_cut->gate_list[j].in_node[l],k);

            // negate output so that it becomes a NAND function
            WIRE_VALUE(g_cut->gate_list[j].out_node,k) =
                ~WIRE_VALUE(g_cut->gate_list[j].out_node,k);
        }
    }

    // NAND gate evaluation using C*RAM
    void g_nand (int j)
    {
        // set X register to be the same as the first input's state
        wire_value->operate (g_cut->gate_list[j].in_node[0], op_m, writex);

        // apply AND operation
        // to the X register and the rest of the inputs
        for (int l=1; l<g_cut->gate_list[j].n_fan_in; l++)
            wire_value->operate (g_cut->gate_list[j].in_node[l], op_xandm, writex);

        // store X bar at the output of the gate
        wire_value->operate (g_cut->gate_list[j].out_node, op_xbar, groupwrite);
    }

Fault Triggering Detection and Insertion

C•RAM operate statements are also used in fault triggering detection and insertion. As mentioned in the previous chapter, fault triggering detection is done by checking the state variable at the location of the fault in question. In simf, a bit mask has to be applied when checking any one of the six types of faults (SA0, SA1, SR, SF, NO, PO) because we only want to detect faults where the test pattern is valid. For example, if only 10 test patterns are in the simulation, we want to ignore the 246 bits of the state variables that are not simulating valid test patterns. In simf_pps, this is still true for the delay faults and stuck-open faults because we need to detect signal transitions in order to detect the triggering conditions of these faults, which requires shifting the state variables to the right by one bit. Since the left-most bit is always shifted in as a 0 (one of the C•RAM configuration options chosen for this project), the first PE could erroneously produce an incorrect signal. Therefore we want to mask out any signal from the first PE. For stuck-at faults, however, no masking is needed. This is because all the PEs are initialized using valid test patterns, although there are repeated ones. To compare the conventional simulator with the C•RAM version, the code fragments for detecting stuck-at-1 fault triggering in both programs are listed.

    // stuck-at-1 fault triggering detection
    // in conventional simulation
    case f_SA1:
    {
        int flag=0;

        // fault triggering
        // if any of the bits in the good_value state variable is 0
        // then the fault is triggered.
        // or_mask is a 256 bit mask
        // where 0 represents a valid test pattern
        // and 1 represents invalid ones.

        for (k=0; k ...

        if (flag)
        {
            the fault is triggered. prepare for simulation
        }

    // stuck-at-1 fault triggering detection using C*RAM
    case f_SA1:

        // fault triggering
        // if any of the bits of the state variable is 0, flag = 0

        boolean flag = true;
        good_value->operate (fault_node, op_m, busaccess, flag);

        // setup wire value

        the fault is triggered. prepare for simulation

Table 6.1: Comparing simf and simf_pps in Terms of Speed-Up
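The masked stuck-at-1 triggering test of the conventional simulator can be illustrated with the sketch below. It is an assumption-laden reconstruction of the idea, not the simf source: `sa1_triggered` and `make_or_mask` are hypothetical names, the state variable is modelled as a vector of 32-bit words, and pattern i is assumed to live in bit (i mod 32) of word (i / 32).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build an or_mask for n_valid patterns packed into n_words 32-bit words.
// Mask bits are 0 for slots holding a valid pattern and 1 for unused slots.
std::vector<std::uint32_t> make_or_mask(std::size_t n_valid, std::size_t n_words) {
    std::vector<std::uint32_t> mask(n_words, 0xFFFFFFFFu);
    for (std::size_t w = 0; w < n_words; ++w) {
        std::size_t lo = w * 32;               // index of first slot in word w
        if (n_valid >= lo + 32) mask[w] = 0u;  // word fully valid
        else if (n_valid > lo)  mask[w] = ~((1u << (n_valid - lo)) - 1u);
    }
    return mask;
}

// Stuck-at-1 is triggered when the fault-free value is 0 for at least one
// valid pattern. OR-ing with the mask forces unused slots to 1 so that they
// can never look like a triggering pattern.
bool sa1_triggered(const std::vector<std::uint32_t>& good_value,
                   const std::vector<std::uint32_t>& or_mask) {
    assert(good_value.size() == or_mask.size());
    for (std::size_t k = 0; k < good_value.size(); ++k)
        if ((good_value[k] | or_mask[k]) != 0xFFFFFFFFu) return true;
    return false;
}
```

With 10 valid patterns in a 256-slot state variable, the mask hides the 246 unused slots, so a node that is 1 under all 10 patterns does not trigger the fault even though the unused slots hold zeros.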

6.3 Evaluation

After verifying that the C•RAM version of the pattern-parallel fault simulator produced correct results, the program was ready for performance evaluation. This program was compared to simf in two respects: speed and PE utilization. The time reported by the existing C•RAM emulator is the amount of time it takes to run the C•RAM operations plus the time spent in performing sequential operations such as organizing the fault list. The results for the ISCAS'85 benchmarks are presented in Table 6.1, where the C•RAM had 1024 PEs.

Another measurement is the notion of PE utilization, which was introduced in Chapter 4. PE utilization is measured by implementing an instrumented version of simf that has more features than the normal one (and is hence slower). One of these features is used to calculate the PE utilization. For example, by setting this version to simulate 256 test patterns in parallel, the PE utilization figure for 256 PEs can be measured. A higher utilization implies less processing power is wasted. Table 6.2 shows the PE utilization for running 256 patterns and 1024 patterns in parallel, which represent simf and simf_pps, respectively.

              Total        256 PE                   1024 PE                  Util.
    Circuit   Triggering   Needed/Used    Util.     Needed/Used    Util.     Ratio
    c1355nr   ?            39k/797k       4.91%     32k/2108k      1.54%     3.2
    c1908nr   319k         323k/?         13.36%    ?              6.03%     2.2
    c2670nr   ?            44m/194m       22.51%    44m/197m       22.17%    1.0
    c3540nr   390k         318k/3695k     8.62%     257k/7035k     3.65%     2.4
    c5315nr   ?            ?/3163k        9.18%     219k/8544k     2.56%     3.6
    c6288nr   60k          23k/2511k      0.93%     23k/10058k     0.23%     4.0
    c7552nr   33m          33m/151m       21.73%    33m/170m       19.23%    1.1

Table 6.2: Comparison of PE Utilization for simf and simf_pps

When measuring PE utilization, we must record the number of useful simulations versus the total number of simulations done. In theory, the number of useful simulations should be independent of the number of patterns simulated in parallel. This is because the area under the curve of faults being triggered is the same no matter how many test patterns are run in parallel (see Section 1.6 for more detail). However, this is not the whole truth in our case because fault implication is implemented. With fault implication, simulations can be eliminated by realizing that some faults (such as stuck-at-1 faults) can be dropped from the fault list when another fault is being detected (such as slow-to-fall faults). In Table 6.2, we can see that the needed columns under 256 PE and 1024 PE are indeed mostly different.

Note that the PE utilization ratio is generally very low (as low as 0.23%). When 256 test patterns are used, the worst case in PE utilization is that the first or second PE detected the fault. In that case, only 1 out of the 256 PEs is useful, and the PE utilization ratio is about 0.4%. When 1024 PEs are used, the worst case is about 0.1%. From Table 6.2, we can see that the result for the c6288nr circuit is very close to this worst case situation. The PE utilization ratios of our benchmark circuits are expected to be low because the faults in these circuits are relatively easy to detect (i.e. only a few thousand test patterns are needed to reach 99% fault coverage in five of our benchmark circuits). A PE utilization ratio that is greater than 10% is thus a very good ratio.

We should also note that the utilization ratio does not necessarily equal the speed ratio between the two simulations, for three reasons. First, there is a difference between the instruction speed of the CPU and the vector operation speed of C•RAM. Second, when 1024 PEs are used for simulation, only 1023 patterns are actually simulated in parallel. Lastly, a point which is not obvious is that a fault simulation session with 1024 patterns in parallel could potentially run faster than one that has only 256 test patterns. This is an effect of event-driven simulation. Let's use an example to illustrate this point: when 256 test patterns are simulated in parallel, fault A is detected at primary output 20 with pattern 123; when 1024 patterns are used, fault A could be detected at primary output 1 with pattern 923. Assuming primary output 1 is near the beginning of the gate list, whereas primary output 20 is near the end, a lot of gate evaluations can be avoided entirely by stopping the fault simulation early.
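The utilization figures discussed in this section follow from simple ratios, which the sketch below makes explicit. The function names are hypothetical and chosen only for illustration; the definitions assume that utilization is the number of needed (useful) simulations divided by the number of simulations actually performed, as stated in the text.

```cpp
#include <cassert>

// PE utilization = useful (needed) simulations / simulations actually run.
double pe_utilization(double needed, double used) {
    assert(used > 0.0);
    return needed / used;
}

// Worst case: exactly one PE of the machine does useful work in a pass.
double worst_case_utilization(int n_pes) {
    assert(n_pes > 0);
    return 1.0 / n_pes;
}

// Ratio between the utilization of two configurations, e.g. 256 vs 1024 PEs.
double utilization_ratio(double util_small, double util_large) {
    assert(util_large > 0.0);
    return util_small / util_large;
}
```

For instance, `worst_case_utilization(256)` is about 0.4% and `worst_case_utilization(1024)` is about 0.1%, and `pe_utilization(219e3, 8544e3)` gives roughly the 2.56% listed for c5315nr with 1024 PEs.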

6.4 Discussion

In our implementation, two arrays of cbooleans are used for storing good circuit states and faulty circuit states. Using two arrays can save a lot of time in simulation because when faults are inserted, we can simulate from the site of the fault and do not have to simulate all the way from the primary inputs. Also, we can stop the simulation as soon as the fault effect stops propagating. However, using two arrays also requires more memory for storing the circuit node values. For example, when simulating a circuit with 10k circuit nodes and 256 patterns in parallel, the amount of memory required will be 2 * 10k * 256 / 8 = 640 kbytes. This is not an issue in simf because the workstation used is configured with hundreds of megabytes of memory. For simf_pps, however, since no virtual memory is implemented for the C•RAM, the amount of C•RAM memory available for programming is limited to the total amount of local memory the C•RAM chip has. Using the same example, a C•RAM chip that has at least 20k bits of local memory per PE is required. However, the biggest implementation of C•RAM at the time of this writing has only 16k bits of local memory per PE. Therefore, the biggest circuit simf_pps could support would have fewer than 8k circuit nodes. Currently, integrated circuits with more than 8k nodes are not uncommon. This limitation renders simf_pps impractical for realistically large industrial circuits, at least in this simplest version.

One way to make this simulator useful for larger circuits is to group multiple PEs together to increase the local memory size. For example, when two PEs are grouped together, the effective memory size for the PE group is twice that of a single PE. If each PE has 16k bits of local memory, then a group of two PEs will have 32k bits of local memory, which is enough for the circuit described in the previous paragraph. The cost of this method is that half of the PEs are effectively disabled, thus reducing the parallelism.
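The memory arithmetic above can be checked with a short sketch. The helper names are hypothetical, "k" is taken as 1000 for the node count and 1024 for the 16k-bit local memory (an assumption; the thesis does not say which convention it uses), and one bit per node per array is assumed for each pattern or PE.

```cpp
#include <cassert>
#include <cstdint>

// Host memory in bytes for two state arrays of n_nodes entries, each entry
// holding one bit per pattern simulated in parallel.
std::uint64_t host_bytes(std::uint64_t n_nodes, std::uint64_t n_patterns) {
    return 2 * n_nodes * n_patterns / 8;
}

// Bits of local memory each PE needs when both arrays live in C*RAM.
std::uint64_t bits_per_pe(std::uint64_t n_nodes) {
    return 2 * n_nodes;
}

// Smallest PE group size whose pooled local memory can hold the circuit.
std::uint64_t pes_per_group(std::uint64_t n_nodes, std::uint64_t local_bits) {
    assert(local_bits > 0);
    return (bits_per_pe(n_nodes) + local_bits - 1) / local_bits;
}
```

Under these assumptions, `host_bytes(10000, 256)` is 640000 bytes, `bits_per_pe(10000)` is 20000 bits, and `pes_per_group(10000, 16384)` is 2, which corresponds to the two-PE grouping described above.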
Grouping PEs also costs an extra cycle whenever data from a PE's local memory has to be transferred to the next PE in the same group. The larger the PE group, the higher the penalty. Nevertheless, this could be a viable method for simulating larger circuits.

Chapter 7 Fault-Parallel Fault Simulation on C•RAM

7.1 Overview

The second algorithm implemented on C•RAM is Fault-Parallel Fault Simulation, which we call simf_fps. In this simulator, C•RAM is still responsible for performing gate evaluations; however, the PEs are no longer simulating the same fault simultaneously. Rather, all the PEs are simulating the same test pattern at the same time for different faults. Since fault-parallel fault simulation is quite different from pattern-parallel fault simulation, a new algorithm needed to be designed. The key differences between this simulator and the previous ones are (1) the use of fault-families in the fault list organization; (2) the use of the conventional CPU for fault-free simulation; and (3) multiple fault insertion. A fault-family is a group of faults that can cause a circuit node to appear with the same erroneous value.

Figure 7.1 shows the data flow diagram of simf_fps. In this diagram, both the control flow and the major data structures are shown. After reading a test pattern from the primary input, a simulation round begins. A fault-free simulation of the test pattern is carried out by the host CPU and the data is stored in a data structure, which is labeled Good Sim Result in the figure. Note that in simf_pps, the fault-free simulation was done in C•RAM. Since the result does not reside in the C•RAM, a module called cram_mapper is used to load the node values from the host's memory into C•RAM whenever the data is needed. After the fault-free simulation terminates, the heap is prepared for fault simulation. At this stage, each fault-family is checked against the fault-free circuit's node states to determine whether the fault-family needs to be simulated. Then, fault simulation is carried out by the C•RAM. The result of the simulation is then checked to see if any fault-family is detected. If so, the triggered faults in the family are removed from the fault list. If the fault-family contains no more undetected faults, then the family itself is removed from the fault-family tree list.

Figure 7.1: Dataflow Diagram of simf_fps (fault-free simulation producing the Good Sim Result; a pass of fault simulation consisting of Prepare Heap, Fault Simulation and Drop Fault; the simulation heap; and the fault-family tree list)

Table 7.1: Speed Difference in Fault-Free Simulation (columns: Circuit, CPU (ms), C•RAM (ms), Ratio)

When simulating large circuits, it is likely that there will not be enough PEs for fault simulation in one C•RAM load. If this is the case, then fault-families that weren't simulated in the previous round will be added to the simulation heap for simulation in the next round. The current round of simulation is completed when all the triggered fault-families are simulated. Note the three steps surrounded by a dashed box. This box represents a simulation pass. When there are more fault-families to be simulated than there are available PEs, it is necessary to run multiple passes of fault simulation in one round. In the next section, the design of the fault-parallel algorithm will be discussed in more detail.
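The relationship between triggered fault-families, PEs and simulation passes can be sketched as below. This is an illustrative simplification, not the simf_fps scheduler: `schedule_passes` is a hypothetical name, fault-families are reduced to integer ids, and it is assumed that each pass assigns exactly one PE to each fault-family it simulates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assign triggered fault-families (hypothetical integer ids) to simulation
// passes. A pass holds at most n_pes families; the remaining families wait
// for a later pass.
std::vector<std::vector<int>> schedule_passes(const std::vector<int>& triggered,
                                              std::size_t n_pes) {
    assert(n_pes > 0);
    std::vector<std::vector<int>> passes;
    for (std::size_t i = 0; i < triggered.size(); i += n_pes) {
        std::size_t end = std::min(i + n_pes, triggered.size());
        passes.emplace_back(triggered.begin() + static_cast<std::ptrdiff_t>(i),
                            triggered.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return passes;
}
```

Under this assumption, 2500 triggered fault-families on 1024 PEs need three passes, the last of which occupies only 452 PEs.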

7.2 Design Details

7.2.1 Simulation with C•RAM

In simf_fps, the fault-free simulation uses the host CPU to perform gate evaluations and stores the result in main memory; in simf_pps, however, the C•RAM PEs and their local memory are used. From Table 7.1, we can see that the host CPU runs about 3 times faster than C•RAM when there is only a single test pattern to be fault simulated. Since the fault-free simulation data is accessed very often by the host CPU, it is advantageous to store it in main memory.

Figure 7.2: Illustration of Circuit Node Life-Cycles (gate evaluation time line with start and end of each node's cycle; heap order: 1, 3, 5, 7, 8; node 4 is never used in this heap)

Fault simulation is carried out in the C•RAM. In our simulator, a dynamic memory allocation scheme was implemented; i.e. a circuit node occupies C•RAM memory only for the time when it is needed for fault simulation. It is observed that during a fault simulation, each circuit node passes through a life cycle. A circuit node value can come into existence in two ways. First, when a gate is first evaluated, the gate output value needs to be stored somewhere. This is when an output node of a gate becomes active (starts occupying memory). Second, when a node's value is needed for fault simulation and it hasn't been loaded into C•RAM yet, then C•RAM space has to be allocated for it and the node becomes active. The circuit node occupies C•RAM memory until it is not needed anymore. Nodes that are not primary outputs are active as long as they are inputs to at least one gate that has not been evaluated yet. A reference number is associated with each circuit node. The reference number is the maximum gate number among its fan-out nodes. When the current gate evaluation number exceeds this reference number, the allocated C•RAM memory can be freed and the circuit node becomes deactivated.

Figure 7.2 illustrates the concept of a circuit node life cycle. The circuit is simulated with the heap containing nodes 1, 3, 5, 7, and 8. When each of these gates is evaluated, the inputs of the gates will need to be loaded. To satisfy this requirement, it can be seen in the timing diagram that nodes 2 and 6 are activated just before gates 3 and 8 are evaluated. These two nodes are immediately freed, or deactivated, as soon as the gate evaluations are done, because they are not needed anymore. We should note that node 4 is never activated in this setup although it is an input to node 6. This is because we have already determined the value of node 6 in fault-free simulation. In addition, node 5 is not freed once activated because it is one of the primary outputs.

The dynamic memory allocation scheme was implemented so that fault-free node values are loaded into C•RAM only when they are needed. It has the side advantage of reducing the amount of C•RAM memory that is needed for fault simulation, thus allowing the simulation of larger circuits. Table 7.2 shows the memory requirement for three cases: (a) not using dynamic memory allocation at all; (b) using dynamic memory allocation with unsorted gate numbers; (c) using dynamic memory allocation with gate numbers sorted by minimizing the number of nodes that are active at any time (minimum active nodes). Memory reduction of up to 96% can be achieved. Sorting gate numbers in minimum-active-nodes order is an NP-complete problem¹. The data shown in the table was obtained using heuristics², which do not guarantee that an optimal solution will be found. Finding the best heuristic, however, is outside of the scope of this research.

    Columns: Circuit, a, b, c, b/a, c/a, Reduction
    c1355nr   387    104
    c1908nr   1194   312
    c2670nr   1610   310
    c3540nr   2476   509
    c5315nr   2431   286
    c6288nr   911    182
    c7552nr   3604   623
    a) Without dynamic memory allocation (number of circuit nodes).
    b) Using dynamic memory allocation without sorting.
    c) Using dynamic memory allocation with minimum-active-nodes sorting.

Table 7.2: Memory Requirement With or Without Dynamic Memory Allocation
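The life-cycle rule can be made concrete with a small model that computes reference numbers and the peak number of active nodes. It is a sketch under simplifying assumptions, not the simulator's allocator: `Gate`, `reference_numbers` and `peak_active_nodes` are hypothetical names, gates are numbered in evaluation order, and every gate is evaluated (the real simulator evaluates only the gates on the heap).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Gate {                 // hypothetical netlist record
    std::vector<int> in;      // input node ids
    int out;                  // output node id
};

// Reference number of each node: the largest gate number that reads it
// (-1 if no gate reads it).
std::vector<int> reference_numbers(const std::vector<Gate>& gates, int n_nodes) {
    std::vector<int> ref(n_nodes, -1);
    for (int g = 0; g < static_cast<int>(gates.size()); ++g)
        for (int n : gates[g].in) ref[n] = std::max(ref[n], g);
    return ref;
}

// Peak number of simultaneously active nodes. A node is activated on first
// use and freed once the gate carrying its reference number has been
// evaluated, unless it is a primary output.
int peak_active_nodes(const std::vector<Gate>& gates, int n_nodes,
                      const std::vector<bool>& is_primary_output) {
    assert(static_cast<int>(is_primary_output.size()) == n_nodes);
    std::vector<int> ref = reference_numbers(gates, n_nodes);
    std::vector<bool> active(n_nodes, false);
    int count = 0, peak = 0;
    auto activate = [&](int n) { if (!active[n]) { active[n] = true; ++count; } };
    for (int g = 0; g < static_cast<int>(gates.size()); ++g) {
        for (int n : gates[g].in) activate(n);
        activate(gates[g].out);
        peak = std::max(peak, count);
        for (int n : gates[g].in)
            if (active[n] && ref[n] == g && !is_primary_output[n]) {
                active[n] = false;
                --count;
            }
    }
    return peak;
}
```

For a chain of three inverters (four nodes, the last being a primary output), at most two nodes are active at any time, which illustrates how the gate ordering, rather than the circuit size, determines the memory needed.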
A module named cram_mapper was created for maintaining links between the fault-free node values and the C•RAM. It provides six routines. An Init_CRAM_Mapper function must be called once in the program before using this module. This routine initializes the necessary data structures. The cram_mapper module defines a CRAMNode structure which allows the creation of a cboolean linked-list. A nodemapper array is defined for associating circuit nodes with CRAMNodes. Four functions are provided for accessing the CRAMNodes, namely Obtain_Node, Obtain_Node_w_Loading, Obtain_Node_f_mapper and Use_Node. The following code demonstrates inverter evaluation using two of these functions.

    inline void g_not (Circuit *cut, int j)
    {
        cboolean *out_node = Obtain_Node (cut->gate_list[j].out_node);

        cboolean *in_node = Use_Node (cut->gate_list[j].in_node[0]);
        in_node->operate (0, op_m, writex);

        out_node->operate (0, op_xbar, groupwrite);
    }

Here, Obtain_Node is called to allocate a node from C•RAM. When Obtain_Node is called, the cram_mapper module first checks nodemapper. If a node is already associated with the requested gate number, then this node is returned. Otherwise, C•RAM memory for a new CRAMNode is allocated³, and nodemapper[gate number] is set to point to this new CRAMNode. The pointer to the new cboolean variable is also returned. Once the output node is allocated, Use_Node is called to "use" the input node value. Again, the nodemapper array is checked. If there isn't already a CRAMNode for this node, then a new CRAMNode is created and the cboolean is initialized with the fault-free circuit node value. This is done by using either the op_one or the op_zero C•RAM operation, which sets a cboolean to either all 1's or all 0's, respectively. Before returning the cboolean for gate evaluation, the reference number of the node is checked. If the current gate evaluation number equals the reference number, then the node is put on the avail linked-list so that it is available for future use.

¹ Arbitrary instances of an NP-complete problem are not known to be solvable deterministically in polynomial time.
² Nodes were reordered by analyzing the fan-in and fan-out values.
³ The present program will fail if there is not enough C•RAM memory.

There are two other Obtain_Node functions. The Obtain_Node_w_Loading function not only allocates memory for a node, but also initializes it with the fault-free node value. This function is useful during fault simulation, where the fault-free circuit node is needed as input to some other gates. The Obtain_Node_f_mapper function simply returns a NULL value when there isn't a CRAMNode associated with the node, rather than allocating memory for it. It is useful when the primary output values are checked for detected nodes, where primary output values that are not affected by faults can be ignored. Finally, there's a Free_All_Node function which moves all the CRAMNodes to the avail linked-list so that the next simulation round can begin.

With the cram_mapper module, fault-parallel fault simulation is carried out in C•RAM. The fault simulation process is described in detail in the following subsections.
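The node bookkeeping described above can be sketched in ordinary C++ as follows. This is only an illustration under stated assumptions: the `Node` struct stands in for a CRAMNode (its `bool value` replaces the real cboolean), a fixed pool replaces C•RAM memory, and the names `NodeMapper`, `obtain`, `obtain_with_loading`, `lookup` and `free_all` are hypothetical, not the thesis's actual interface. Reference counting of node uses is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CRAMNode; `value` replaces the real cboolean.
struct Node {
    int gate = -1;
    bool value = false;
};

class NodeMapper {
public:
    NodeMapper(std::size_t gates, std::size_t capacity)
        : map_(gates, nullptr), pool_(capacity) {
        for (Node& n : pool_) avail_.push_back(&n);
    }

    // Return the node mapped to `gate`, taking one from the avail list if
    // none is mapped yet. Returns nullptr when no node is left.
    Node* obtain(int gate) {
        assert(gate >= 0 && static_cast<std::size_t>(gate) < map_.size());
        if (map_[gate] != nullptr) return map_[gate];
        if (avail_.empty()) return nullptr;
        Node* n = avail_.back();
        avail_.pop_back();
        n->gate = gate;
        n->value = false;
        map_[gate] = n;
        return n;
    }

    // As obtain(), but a newly mapped node starts with the fault-free value.
    Node* obtain_with_loading(int gate, bool fault_free) {
        const bool fresh = (lookup(gate) == nullptr);
        Node* n = obtain(gate);
        if (n != nullptr && fresh) n->value = fault_free;
        return n;
    }

    // Lookup only: nullptr when the gate has no node.
    Node* lookup(int gate) const {
        assert(gate >= 0 && static_cast<std::size_t>(gate) < map_.size());
        return map_[gate];
    }

    // Put every mapped node back on the avail list for the next round.
    void free_all() {
        for (Node*& slot : map_) {
            if (slot != nullptr) {
                avail_.push_back(slot);
                slot = nullptr;
            }
        }
    }

private:
    std::vector<Node*> map_;    // nodemapper: gate number -> node
    std::vector<Node> pool_;    // fixed pool standing in for CRAM memory
    std::vector<Node*> avail_;  // nodes available for reuse
};
```

Unlike the program described in footnote 3, this sketch returns a null pointer when the pool is exhausted instead of failing.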

7.2.2 Fault Simulation

Fault-Family

So far, faults have been associated with individual wires. Each wire can be affected by up to six faults (SA0, SA1, SR, SF, NO, PO) and each fault is inserted and detected as a separate entity. When multiple wires are connected together, they form a circuit node, which is either the output of a logic gate or a primary input. Thus we have a node-wire-fault hierarchy. Treating faults as individual entities was adequate in the previous simulators, where only one fault is inserted in each pass of fault simulation. When there's only one fault, the primary effect⁴ of the fault can be calculated and inserted as erroneous node values before the simulation takes place (simulating downstream from the fault). If there are multiple faults to be simulated, i.e. in fault-parallel fault simulation, however, this simple fault insertion method is not adequate. Let's take the circuit in Figure 7.3 as an example. Four faults are inserted into their corresponding wires for fault simulation. Ideally, three faults will be detected (SA0(2), SA1(3), SA0(4)). However, the evaluation of the OR gate erases the fault effect inserted for wire number 3, and the evaluation of the inverter erases that of wire number 4. As a result, only one fault is detected (SA0(2)). In order to avoid this, fault effects must be inserted after the gate has been evaluated. This requirement calls for a way to address faults with respect to the gate number.

⁴ The primary effect of a fault is the behaviour of the first node being affected by the fault.

[Diagram columns: Good Values, Insert Errors, After Gate Evaluation, Desired Values. * Errors masked by gate evaluation.]

Figure 7.3: Gate Evaluation with Multiple Fault-Insertion

Our simulator is node-based, meaning that the state of a group of wires connected together as a node is represented by a single Boolean variable. This method is far more efficient than a wire-based simulator because there is less memory transfer. However, using a node-based simulator increases the difficulty in treating faults as separate entities. Figure 7.4 illustrates this problem. Node 31 is shown as being composed of the four wires 128, 129, 130, and 131. When there is only one fault (SA0(129)) to simulate, a temporary node can be allocated to store the fault, keeping the original node value unchanged so that gate 68 and gate 79 can be evaluated correctly. With multiple fault insertion, this method will be inefficient because there will be too many temporary nodes to handle (one per fault). This again calls for a way to address faults and their effects with node numbers. The idea of a fault-family is introduced for this purpose.

When the fault-free value of node 31 = 1 (node 31 fans out to gate 68 and gate 79):

              Effect of SA0(128)   Effect of SA0(129)
  Node 31     0                    1
  wire 128    0                    1
  wire 129    0                    0
  wire 130    0                    1
  wire 131    0                    1

Figure 7.4: Problem with Fan-out Nodes

Each circuit node can be set to one of two logical values: 1 or 0. When the fault-free value of a node is 1, then the faulty value is 0 and vice-versa. A fault has the effect of forcing a wire to a faulty value, which in turn will also affect some node value. When the wire containing a fault is the output wire of a gate, then forcing the value of the wire to a value is the same as forcing the value of the corresponding node to that value. On the other hand, if the affected wire is a fan-out wire, then it must be an input to some other logic gate, and the downstream node it affects will be the output node of that logic gate. Based on this observation, we can conclude that any fault has the primary effect that a node will be forced to an erroneous value, and so this primary effect can be described by the node number and this value. The primary effect is not unique to each fault. For example, the primary effect of a stuck-at-1 fault affecting the output node of a NAND gate can be caused by a stuck-at-1 fault at the output wire or a stuck-at-0 fault at a gate input. We call a set of faults that have the same primary effect a fault-family. For a circuit with x nodes, there are 2x fault-families.

With fault-families, faults can be more easily inserted into the fault simulation on-the-fly. When we want to simulate a fault in parallel along with other faults, we can evaluate the gates until after the node that contains the primary effect of the fault has been evaluated. At this point, the entire fault-family, which contains the effect of the fault, can be inserted instead of individual faults. Since the fault-family is node-based, this step avoids the fan-out wire problem we mentioned previously.

  NEED_EVALUATION   NEED_INSERTION   Gate Status
  -                 -                The gate is not currently affected by any fault
  set               -                One or more fault-families propagate through this gate
  -                 set              The gate originates the primary effect of a fault-family
  set               set              The gate is affected by more than one fault-family

Table 7.3: Interpretation of Flags in the Event-List

A side effect of using fault-families rather than faults is that multiple faults can be simulated in one PE. Since each fault-family represents a number of faults, when more than one of these faults needs to be simulated, the fault-family need be inserted only once, thus freeing up PEs and thereby allowing more faults to be simulated in one pass.
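As an illustration of the fault-family idea, the sketch below groups faults by their primary effect, that is, by the pair (node number, forced value). It assumes the primary effect of each fault has already been computed; the `Fault` struct, `family_index` and `build_families` are hypothetical names introduced here, not taken from the simulator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Fault {
    int id;
    std::size_t node;   // node carrying the primary effect (assumed precomputed)
    bool forced_value;  // erroneous value forced onto that node
};

// Two families per node: index 2*node for forced 0, 2*node + 1 for forced 1.
inline std::size_t family_index(std::size_t node, bool forced_value) {
    return 2 * node + (forced_value ? 1u : 0u);
}

// Group fault ids by primary effect; a circuit with x nodes yields 2x families.
std::vector<std::vector<int>> build_families(const std::vector<Fault>& faults,
                                             std::size_t nodes) {
    std::vector<std::vector<int>> families(2 * nodes);
    for (const Fault& f : faults) {
        assert(f.node < nodes);
        families[family_index(f.node, f.forced_value)].push_back(f.id);
    }
    return families;
}
```

With this layout, a stuck-at-1 fault on a NAND gate's output wire and a stuck-at-0 fault on one of its inputs would carry the same (node, 1) pair and therefore land in the same family.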

Fault Insertion Using the Event Heap

Multiple fault insertion is done using the event heap introduced in Chapter 5. Recall that the event heap is a random-in-sorted-out data structure used to ensure correct gate evaluation order. When each gate is evaluated, its output value is checked against the expected good value. If the output node is erroneous, then all gates connecting to that node are added to the heap for further fault propagation. Accompanying this heap structure is an array that is used to make sure that each gate is added to the heap only once in each pass. This array, named event_list, must be modified to support fault-parallel fault insertion. Recall from Chapter 5 that the event_list was used for storing Boolean values, where a 1 indicates that a gate is already in the heap and a 0 otherwise. You can view this as a byte with bit number 0 indicating the Boolean NEED_EVALUATION. In simf_fps, this data structure is extended with bit 1 indicating a new Boolean NEED_INSERTION. During gate evaluations, when a gate with either bit set is popped (fetched) from the event heap, the following fault insertion action can be carried out.

initialize heap;
for all triggered fault-families do {
    set NEED_INSERTION bit;
    insert fault-family into heap;
}

while (j = pop heap) {    // get the first element in the heap

    if (NEED_EVALUATION bit is set) {
        Evaluate_Gate(j);
    }

    propagation = false;

    if (NEED_INSERTION bit is set) {
        Insert_Fault_Family(j);
        propagation = true;
    }

    if (NEED_EVALUATION bit is set and NEED_INSERTION bit is not set) {
        if (output node is erroneous) propagation = true;
        else propagation = false;
    }

    if (propagation) {
        add fan-out gates to the heap with NEED_EVALUATION bit set;
    }
}

The above code fragment demonstrates the use of the bit flags. Before the next round of simulation starts, each fault-family is checked to see if it contains any member fault triggered by the test pattern. The node numbers of the fault-families that need to be simulated are added to the heap, and the corresponding byte in the event_list has the NEED_INSERTION bit set to 1. This is the Prepare_Heap procedure shown in Figure 7.1.

[Diagram labels: node value with the only PE with fault inserted; token; pointer (pointer-token correspondence); fault-family linked-list.]

Figure 7.5: Fault Insertion With the Token-passing Method

Fault simulation may now begin. The first element in the heap (the one with the least gate number) is popped. If this gate needs to be evaluated, then a gate evaluation is carried out. Then the NEED_INSERTION bit is checked. If this flag is set, the fault will be inserted. Fault insertion can be done with a simple Exclusive-OR operation with one single PE active. How this is done will be explained in more detail in the next paragraph. If a fault insertion took place, then it is certain that the fan-out gates of the current gate need to be evaluated so that the effect of the inserted fault can be propagated. Otherwise, the node value is checked against the fault-free value. If the node is erroneous, meaning that it is affected by a fault, then fault propagation must also be performed. Finally, if fault propagation is needed, the fan-out gates will be added to the heap with the NEED_EVALUATION bit set. These fan-out gates may already be in the heap with the NEED_INSERTION bit set, in which case both bits will be set for the gate. When such a gate is popped from the heap, both gate evaluation and fault insertion will be done. Table 7.3 describes these flags and their implications.
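The interplay between the heap and the flag bytes can be sketched in standard C++ as shown here. This is a minimal illustration, assuming a `std::priority_queue` min-heap in place of the thesis's event heap; the class name `EventHeap` and its methods are hypothetical. A gate is pushed only when its flag byte is zero, so it appears in the heap at most once even if both flags end up set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

constexpr std::uint8_t NEED_EVALUATION = 0x1;  // bit 0 of the event-list byte
constexpr std::uint8_t NEED_INSERTION = 0x2;   // bit 1 of the event-list byte

class EventHeap {
public:
    explicit EventHeap(std::size_t gates) : flags_(gates, 0) {}

    // Set `flag` for `gate`; push the gate only if it is not pending already.
    void add(int gate, std::uint8_t flag) {
        assert(gate >= 0 && static_cast<std::size_t>(gate) < flags_.size());
        if (flags_[gate] == 0) heap_.push(gate);
        flags_[gate] = static_cast<std::uint8_t>(flags_[gate] | flag);
    }

    bool empty() const { return heap_.empty(); }

    // Pop the lowest-numbered gate; its flags are returned and then cleared.
    int pop(std::uint8_t& flags) {
        assert(!heap_.empty());
        const int gate = heap_.top();
        heap_.pop();
        flags = flags_[gate];
        flags_[gate] = 0;
        return gate;
    }

private:
    std::vector<std::uint8_t> flags_;  // event list: one flag byte per gate
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap_;
};
```

A simulation loop built on this would call `add(gate, NEED_INSERTION)` for every triggered fault-family before the pass, and `add(fanout, NEED_EVALUATION)` whenever propagation is required.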

When C•RAM is used to implement the fault-parallel fault simulation algorithm, each PE represents a fault-family. When a fault is inserted, the corresponding fault-family is inserted into one PE only. Given a fault-family, we need to know which PE is representing it before fault insertion can take place. One way to do this is to associate each fault-family with a PEid so that a PE can be selected when desired. Selecting one arbitrary PE out of all the PEs, however, is a very time-consuming process⁵. In order to reduce this overhead, a token-passing method is used. Before fault simulation, the fault-families needing to be inserted are stored in a list in order of ascending gate number, which will be the same as the order in which they will be inserted during fault simulation. A pointer is initialized to point to the beginning of this list. In addition, a cboolean variable is initialized with all bits set to 0 except for the first bit (PE0), which will be used as the first instance of the token. During fault simulation, when fault insertion is encountered in the fault simulation loop, the PE with the token is used to represent the selected fault-family. After this fault insertion, the pointer is advanced to point to the next fault-family in the list and the token is passed to the next PE; the latter is accomplished simply by shifting the cboolean variable to the right by one bit. This procedure is repeated until all the gates in the event heap are evaluated (and faults inserted). Figure 7.5 illustrates this token-passing method.
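The token-passing scheme can be mimicked on a conventional machine as in the sketch below. It assumes each PE is one element of a `std::vector<bool>`, that "shifting right" moves the token from PE i to PE i+1, and that insertion flips the node value in the token PE (the Exclusive-OR mentioned earlier). `TokenInserter` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct TokenInserter {
    std::vector<bool> token;    // one bit per PE, a single 1 marks the token
    std::vector<int> families;  // fault-family gate numbers, ascending
    std::size_t next = 0;       // pointer into `families`

    TokenInserter(std::size_t pes, std::vector<int> fams)
        : token(pes, false), families(std::move(fams)) {
        assert(pes > 0);
        token[0] = true;  // PE0 holds the first token
    }

    // Insert the next fault-family: XOR the node value with the token so that
    // only the token PE is flipped, then advance the pointer and the token.
    int insert(std::vector<bool>& node_value) {
        assert(next < families.size());
        assert(node_value.size() == token.size());
        for (std::size_t i = 0; i < token.size(); ++i) {
            if (token[i]) node_value[i].flip();
        }
        // Shift the token right by one PE; a 0 enters at PE0.
        for (std::size_t i = token.size(); i-- > 1;) token[i] = token[i - 1];
        token[0] = false;
        return families[next++];
    }
};
```

Because the list order matches the insertion order, no per-family PE lookup is needed: the k-th inserted family always ends up in PE k-1 of the current pass.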

Fault Dropping

Fault detection and fault dropping are done after each pass of fault simulation. Recall that when more fault-families need to be simulated than the number of PEs, more than one pass is needed; i.e. for each test pattern being simulated, the fault dropping procedure may be called more than once. The procedure is called multiple times so that the C•RAM can be cleared for the next pass.

After the current pass of fault simulation, C•RAM will contain a set of cbooleans representing the state of the primary outputs of the faulty circuits that were simulated in parallel. At the beginning of fault dropping, the values of these cbooleans are compared to the fault-free values, and the result of this comparison is collected into one single cboolean, where a 1 indicates a PE with a detected fault and a 0 otherwise. Finally, roll-calling is used to determine which fault-family is actually detected. Roll-calling is the process where the corresponding PEid is returned when a PE contains a "dirty" bit in a specified cboolean [8]. When roll-calling is first called with a pointer to the cboolean variable in question, an O(log2 n) binary search is carried out, and the PEid of the first PE that contains a 1 in the cboolean is returned. In subsequent calls, the next PE that contains a 1 is returned. When no more PEs contain a 1, -1 is returned as the PE number. In this way all PEs can be polled efficiently.

⁵ O(log2 n) C•RAM operations are needed for each selection, where n is the total number of PEs.

  Circuit    simf time     PPS time   simf/PPS speed-up   FPS time   simf/FPS speed-up
  c1355nr    [illegible]   0.190s     3.31                1.756s     0.36

Table 7.4: Result Comparison of PPS with FPS

In our implementation, an extra feature is added to further speed up the roll-calling process. Before going into a binary search, the function first considers the PE immediately to the right of the PE returned in the previous call. If this PE also contains a 1 in the cboolean, then the PEid of this PE is returned without going into the binary search. In the worst case, where the cboolean contains a bit pattern such as 01010101010101..., this heuristic is slower. However, in most cases, especially when the 1's are clustered, this method should give good performance. We found that 1's were often clustered in this way for our benchmark fault simulations when the fault coverage is low.

For each of the PE numbers returned by roll-calling, the fault dropping routine will search for the corresponding fault-family. Each member fault of the fault-family is again checked against the triggering condition. If the member is triggered, the fault and the faults implied by it are marked as detected. The detected faults are then dropped from the fault list. If the fault-family does not contain any more undetected faults, it will be dropped from the fault-family list.

7.3 Evaluation

The fault-parallel fault simulation algorithm is more complicated than the pattern-parallel algorithm. A large portion of the program uses only the CPU (such as fault-free simulation and heap initialization and loading). Many subroutines use both the C•RAM and the CPU (such as fault simulation and fault dropping). In order to evaluate the speed-up of the simulator, two measurements were taken: the amount of CPU time needed to run fault-free simulation, heap preparation and the other sequential processes common to the other simulators, and the C•RAM run time. It is assumed that the C•RAM run time is adequate to represent the run time of those subroutines that are dominated by C•RAM operations.

7.3.1 Test Results

Table 7.4 compares the results of pattern-parallel fault simulation (PPS) with those of fault-parallel fault simulation (FPS). It can be seen that in most cases simf_fps is slower than the non-C•RAM version (with speed-up ratios of less than 1). On the other hand, simf_pps is at least 4 times faster than simf. One interesting exception, however, is the c6288nr circuit. It is the only case in which simf_fps runs faster than the conventional version. It should be noted that only 256 test patterns were simulated for the c6288nr circuit. This is because the circuit is so easy to test that roughly that many patterns are needed to achieve more than 99% fault coverage. For all the other circuits, at least 1024 test patterns were simulated. For the two "hard" benchmark circuits, namely c2670nr and c7552nr, 256k test patterns were used. The results shown in Table 7.4 agree with our theory, stated in Chapter 4, that fault-parallel fault simulation is useful when there are a lot of undetected faults.

Another reason for the disappointing performance of simf_fps is inefficient fault-free simulation. In Table 7.5, the simulation times of simf_fps are broken down into two pieces: the CPU portion and the C•RAM portion. It is shown that more than 90% of the simulation time is spent in the CPU. Part of the CPU time is used for fault-free simulation (from 40% to 80% in Table 7.8). This is because only one pattern is evaluated during fault simulation. A possible way to solve this problem is to better utilize the CPU's bit-parallel datapath to simulate 32 test patterns at once. Another possible way is to use the C•RAM to simulate many test patterns in parallel. Either of these two methods suggests the use of a hybrid scheme combining the pattern-parallel and fault-parallel approaches.

  Circuit    Patterns   C•RAM time      CPU time
  c1355nr    1024       0.035s (2.0%)   1.721s (98.0%)
  c1908nr    3328       [illegible]     12.48s (99.0%)

Table 7.5: Analysis of the Fault-Parallel Fault Simulation Results

Table 7.6: PE Utilization: Average Number of FFs Simulated Per Pass

In the previous chapter, we evaluated simf_pps using the PE utilization measure. In simf_fps, the corresponding PE utilization is measured by calculating the average number of fault-families simulated per simulation pass. The results are shown in Table 7.6. We see that PE utilization for simf_fps is very close to that of simf_pps. For the circuits simulated with more patterns (c1908nr, c2670nr, c7552nr), PE utilization in fault-parallel fault simulation is clearly less than that of pattern-parallel fault simulation. For the easy circuits with not so many patterns simulated (c1355nr, c5315nr, c6288nr), the PE utilization ratio in fault-parallel fault simulation is up to 37 times greater than that of the pattern-parallel algorithm. Once again our results give us strong support to justify the development of a dynamic hybrid fault simulation method to take advantage of the good PE utilization at the beginning of the simulation, when there are many easy-to-detect faults remaining.

  Circuit    Without DFS   With DFS   Speed-up   Speed-up (C•RAM)
  c1355nr    1.756s        2.023s     0.81       1.00

Table 7.7: Effect of Depth-First-Search from Output Optimization

7.3.2 Optimization with Fault Grouping

In [24], the PROOFS fault simulator was described. PROOFS uses the 32-bit datapath in the CPU to perform fault-parallel fault simulation. In addition to the basic design of the simulator, the authors also describe a way to optimize simulation speed, called the fault grouping technique. It was found that fewer gate evaluations are required when faults are grouped using a depth-first-search from the primary outputs. Two versions of the simf_fps simulator were designed: one with the fault grouping technique and one without. The results are shown in Table 7.7. We can see from the table that using the fault grouping technique does not greatly reduce simulation time (only a slight improvement was obtained for c6288nr and c7552nr). Moreover, it takes up more CPU time.

The primary advantage of fault grouping is that faults (or fault-families) with similar output stems will be put closer together so that the benefits of simulating them together can be maximized (by propagating the faults to fewer gates). In PROOFS, when there are thousands of faults with only 32-bit parallelism, the simulator will have to run many passes to finish fault simulating a single test pattern. Having groups of 32 faults with similar output stems is easily achievable, and it is therefore a good technique to use in PROOFS. In simf_fps, however, many more faults are simulated in parallel with fewer passes per pattern, and the benchmark circuits are not very big. The achievable speed improvement is therefore insignificant. Although the advantage of fault grouping is not obvious here, it may be useful for the hybrid fault simulator, which will be discussed in the next chapter.

  Circuit    Patterns   CPU       Fault-free operations   Fault triggering operations
  c1355nr    1024       0.782     0.300 (38.4%)           0.481 (61.6%)
  c1908nr    3328       4.412     2.211 (50.3%)           2.172 (49.2%)
  c2670nr    256k       356.684   202.437 (56.8%)         154.247 (43.2%)
  c3540nr    1536       4.229     1.831 (43.3%)           2.398 (56.7%)

Table 7.8: Fault Triggering Detection Time in FPS

7.4 Discussion

A very obvious bottleneck of the fault-parallel fault simulator is the fault-free simulation. As mentioned above, we could improve the simulator's speed by simulating the fault-free circuits in a pattern-parallel fashion. Another bottleneck, which is also caused by single-pattern fault-free simulation, is fault-triggering detection, which was not discussed in the previous sections. In the pattern-parallel fault simulators, not only is simulation (both fault-free and fault-inserted) done in parallel, so is fault triggering detection. For a simulation with 256k test patterns using simf_pps, each fault is checked at most 256 times for fault triggering detection. However, in simf_fps, this process has to be done for each test pattern, which means up to 256k such checks have to take place. From Table 7.8, we can see that the fault triggering detection time can take up as much as 60% of the CPU run time. One possible solution to remove the detection bottleneck is not to perform fault triggering detection at all, which would involve redesigning part of the fault simulation engine.

Another observation can be made regarding fault implication. Fault implication was mentioned in Chapter 3 and was implemented in both simf and simf_pps. Speed improvement was achieved in these implementations. In simf_fps, fault implication is realized among the members within a fault-family. However, this strategy does not reduce simulation effort as much as it does in the pattern-parallel fault simulators. One reason is that the list of fault-families is generated only once even if a test pattern requires multiple passes. After each pass, when faults are dropped from the fault list, some fault-families in the next pass may no longer need to be simulated due to detection via fault implication. However, there is currently no mechanism to take advantage of this feature. It is not clear at this point whether an implementation of better fault implication can significantly speed up the fault-parallel fault simulation. It is thus another interesting issue for future research.

Chapter 8

Hybrid Fault Simulation on C•RAM

8.1 Overview

In Chapter 6, we described a simulator that implements pattern-parallel fault simulation using PPSFP as the basic algorithm. It was found that this simulator is generally efficient, although it doesn't always utilize C•RAM PEs very well. In Chapter 7, a fault-parallel fault simulator was described. The simulator is generally too slow because of the inefficient fault-free simulation process and fault triggering detection. Nevertheless, the simulator demonstrated that under certain conditions it can run faster than the pattern-parallel fault simulator due to relatively high achievable PE utilization. A hybrid of these two simulators will be described in this chapter. We called it simf_hps for Hybrid-Parallel Fault Simulator.

The hybrid-parallel fault simulation scheme can be further categorized into two different schemes: static-hybrid fault simulation and dynamic-hybrid fault simulation. A static-hybrid fault simulation scheme is one where the fault-to-pattern ratio¹ in each round of a simulation remains unchanged throughout the fault simulation of the whole circuit. On the other hand, a dynamic scheme is one where the fault-to-pattern ratio in each simulation pass may be adjusted from one simulation round to another.

¹ Number of faults simulated in parallel versus number of patterns simulated in parallel.

The difference between the two schemes is illustrated in Figure 8.1. Each chart here contains a typical fault dropping curve. The little tiles representing fault-simulation rounds must be used to cover up the area under the curves. The area under the curve represents the lower bound on the number of pattern-fault pairs that must be simulated. All of the tiles have the same area, while the different shapes mean different fault-to-pattern ratios. The curve in Figure 8.1(b) is covered by tiles of the same shape representing a static-hybrid fault simulation, whereas the one in Figure 8.1(c) represents a dynamic one using tiles with different shapes. The static method requires 9 tiles to complete the task, while the dynamic one needs only 6 tiles: this implies that the dynamic scheme is sped up 1.5 times with respect to the static scheme, since 3 fewer fault-simulation rounds are needed (9/6 = 1.5).

[Panels: (a) Fault Dropping Curve of c1908nr; (b) Static Hybrid (fixed tiles); (c) Dynamic Hybrid (adaptive tiles). Horizontal axis in each panel: Number of Applied Test Patterns.]

Figure 8.1: Static-Hybrid Fault Simulation versus Dynamic-Hybrid Fault Simulation

The simpler static-hybrid method was implemented first in this research. The dynamic-hybrid fault simulator is a modification of the static-hybrid fault simulator with an additional routine that adjusts the f (number of faults simulated in parallel) and p (number of patterns simulated in parallel) values.

8.2 Design Details

Figure 8.2 illustrates the flow of a fault-simulation round that applies to both static and dynamic hybrid-parallel fault simulation. After the next set of patterns has been read and fault-free simulated in C•RAM, a new fault-simulation round begins. In each round, the program tries to fill up a simulation queue for simulation. The simulation queue stores pointers to the fault-groups that will be simulated in the next pass. When the queue is filled up, fault simulation begins. The process of fault simulation includes both fault evaluation and fault insertion, as described in the previous chapter. At the end of a round, the node values at the primary outputs are compared to the corresponding fault-free values. Each fault-group in the simulation queue is then checked for detected faults. If all the faults in the fault-group have been detected, then that fault-group is deleted from the linked-list of fault-groups. A fault-simulation round ends after simulating all the triggered fault-groups. In a dynamic-hybrid fault simulation, an additional step for adjusting the fault-to-pattern ratio can be added, as shown in the flow chart with dashed lines.

8.2.1 Partitioning C•RAM Memory

The first important concept for implementing hybrid-parallel fault simulators on C•RAM is C•RAM partitioning. We can logically divide the C•RAM into P partitions, each with the same number of PEs, so that different partitions can handle slightly different jobs. For simplicity, the number of PEs in C•RAM is assumed to be a power of 2. In the diagram shown in Figure 8.3, the C•RAM is logically divided into P = 4 partitions. In order to provide access to individual partitions, a partition_mask C•RAM variable is introduced. The partition_mask variable is basically a cint with the number of bits equal to P, the number of partitions. Each row of the variable is a mask of the corresponding partition. In the diagram shown, the black stripe represents a row of 1s. With this partition_mask set up, we can easily select any combination of the partitions for separate operations. Another C•RAM variable used in this program, which is related to the C•RAM partitions, is the PE_mask. It allows the selection of one PE out of each partition.

[Flow chart boxes: Check Triggering Condition for the Next FaultGroup; Add FaultGroup to SimQueue; SimQueue Full?; Fault Simulation; Check and Delete Faults in FaultGroup; Delete FaultGroup; SimQueue Empty?; (dashed) Adjust the fault-to-pattern ratio for the next session.]

Figure 8.2: Flow Chart of a Fault-Simulation Round

[Diagram: C•RAM with P = 4 logical partitions (Partition 0 to Partition 3), the partition mask rows, fault groups A, B and C assigned to partitions, and the PEs.]

Figure 8.3: Logical Partition of C•RAM

In fault simulation, there are at least two ways to use the partitions. One way is to have each partition perform fault-parallel fault simulation with one particular test pattern. We call this the fault-oriented method because it is similar to purely fault-parallel fault simulation. The other way is to have each partition simulate the same set of test patterns and one particular fault. This we call the pattern-oriented method. We chose to implement the second method for three reasons. First, since the pattern-oriented method resembles pattern-parallel fault simulation, the design is much more elegant and more easily understood. Second, fault triggering for transition faults involves comparing the current and previous test patterns. In simf_pps, a method was invented so that only one shift in C•RAM is needed to perform the comparison, whereas in simf_fps, a lot more C•RAM memory copying is required. Finally, we have already seen that simf_pps can outperform simf_fps in most cases.

In a static-hybrid fault simulation, C•RAM partitioning is done only once at the beginning of the program. On the other hand, a dynamic-hybrid fault simulator can adjust the fault-to-pattern ratio by changing the number of partitions. This can be easily done by calling a Set_Partition routine. The Set_Partition routine sets up both the partition_mask and the PE_mask variables in C•RAM. Both variables can be implemented using the PEid feature of the C•RAM. PEid is a cint variable that holds a different value (identity) for each PE. This value is simply the array index of the PEs (0 .. n).

[Diagram: PEid for 16 PEs with 4 logical partitions (partition IDs 00, 01, 10, 11) and PEid for 16 PEs with 8 logical partitions (partition IDs 000, 001, 010, 011, 100, 101, 110, 111).]

Figure 8.4: Partitioning With PEid

When creating the partition_mask, note that the mask can be created using the most significant bits of the PEid variable, which is unique to each PE. Figure 8.4 illustrates a C•RAM with 16 PEs divided into 4 logical partitions and into 8 logical partitions. It can be seen that, in each case, the M most significant bits (M = log2 P) of the PEid variable are the same for each PE in the same partition, and can therefore be used as the corresponding partition's partition_ID. Each partition's mask can thus be constructed in M steps. The PE_mask can be created in similar fashion. However, a much faster method is to recognize that there is a row of bits in PEid, namely the one corresponding to the least significant bit of the partition_ID, which can be used to set up the PE_mask in three steps as follows:

PEid.operate(PEid.bits - nMSD, op_mbar, writex);   // load a bit in PEid
PE_mask.operate(0, op_x, shiftright);              // shift bits to right
PE_mask.operate(0, op_x ^ op_y, groupwrite);       // exclusive or

The first step loads the interesting row of bits, inverted, into the x-register. Then, the bits are shifted to the right and written to the y-register. Here we assume that a '0' is shifted in from the left at the leftmost PE. Finally, the two registers are exclusive-ORed to get the mask. This optimization does not affect the C•RAM run-time very much because it's not a highly repeated step. Nevertheless, we describe it here for future reference.
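The three steps can be checked on a conventional machine with the following sketch. It models each register as a `std::vector<bool>` with one element per PE and assumes, as in Figure 8.4, that the partition ID is formed by the most significant bits of PEid, so that PE i belongs to partition i / (PEs per partition). The function name `make_pe_mask` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a mask with a single 1 at the first PE of every partition.
// Assumes `pes` and `partitions` are powers of two with partitions <= pes.
std::vector<bool> make_pe_mask(std::size_t pes, std::size_t partitions) {
    assert(partitions > 0 && pes >= partitions && pes % partitions == 0);
    const std::size_t per_partition = pes / partitions;
    std::vector<bool> x(pes), y(pes), mask(pes);

    // Step 1: load the inverted PEid row that holds the partition ID's LSB.
    for (std::size_t i = 0; i < pes; ++i) {
        x[i] = ((i / per_partition) & 1u) == 0;
    }
    // Step 2: shift right by one PE; a 0 is shifted in at the leftmost PE.
    for (std::size_t i = 0; i < pes; ++i) {
        y[i] = (i > 0) && x[i - 1];
    }
    // Step 3: exclusive-OR the two registers.
    for (std::size_t i = 0; i < pes; ++i) {
        mask[i] = static_cast<bool>(x[i]) != static_cast<bool>(y[i]);
    }
    return mask;
}
```

For 16 PEs and 4 partitions, the inverted row is 1111 0000 1111 0000, the shifted row is 0111 1000 0111 1000, and their exclusive-OR is 1000 1000 1000 1000, which selects PEs 0, 4, 8 and 12.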

[Diagram: linked-lists of faults, each terminated by a null pointer, and an array of fault groups (modified fault-families) organized as a bidirectional linked-list for fast access and sorting. Legend: 1: data structure used in pattern-parallel fault simulation (as a singly linked-list); 1+2: data structure used in fault-parallel fault simulation; 1+2+3: data structure used in hybrid-parallel fault simulation.]

Figure 8.5: Data Structures Supporting Fault Grouping

8.2.2 Modified Fault-Family Fault Simulation

The pattern-parallel fault simulator used a simple fault list data structure. In fault-parallel fault simulation, a more complicated data structure called a fault-family was implemented, as described in the last chapter. Yet another data structure is required for hybrid-parallel fault simulation. The new structure, which we call a fault group, is a modification of the fault-family data structure. Note that this is different from the fault-group mentioned in the PROOFS paper [24]. In simf-fps, only one test pattern was simulated in a pass. In that case, only one of the two fault-families associated with a node will be inserted for simulation at any time because one test pattern cannot set a node value to both 1 and 0. This is not the case in hybrid-parallel fault simulation. Since more than one pattern is simulated in each pass, a node value can be set to both 0 and 1 in the same simulation pass by different test patterns, which in turn can possibly trigger faults in both fault-families of the same node. Wouldn't it be nice to simulate both fault-families in the same partition, saving one partition for other faults? The fault-group data structure is primarily designed for this purpose.

    Circuit    HPS    HPS + DFS    Speed-up

Table 8.1: Speeding-up HPS by Using Depth-First-Search Grouping

A complete view of the data structure hierarchy is shown in Figure 8.5. At the outermost level (level 3), an array of fault-groups is allocated with the same number of elements as the number of gates. Each fault-group is associated with a gate number. The faults represented by a fault-group are all the faults that have primary effects on the output of that gate. These faults are organized into two fault-families in each fault-group. One family contains all the faults that will force the gate's output to 0, and the other contains all the faults that force that output to 1. With this data structure, all the faults with a primary effect on a gate can be found very efficiently given a gate number.

The array of fault-groups is further organized by a bidirectional linked-list. This structure is useful for deleting detected fault-groups efficiently after fault simulation. It also provides a way to sort the fault-groups in an order different from the gate-evaluation order, allowing the use of optimization methods such as the depth-first-search from output method described in [24]. The simulator was implemented with a switch to enable or disable the depth-first-search gate ordering method.
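As a rough illustration of the fault-group hierarchy just described, the following Python sketch keeps one fault-group per gate in an array (for direct lookup by gate number) and threads the same objects on a bidirectional linked-list (for deletion and reordering). All class and function names are hypothetical, and the faults are represented by plain strings; the thesis implementation is in C++ and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class FaultGroup:
    gate: int
    family0: list = field(default_factory=list)   # faults forcing the gate output to 0
    family1: list = field(default_factory=list)   # faults forcing the gate output to 1
    prev: Optional["FaultGroup"] = field(default=None, repr=False)
    next: Optional["FaultGroup"] = field(default=None, repr=False)


def build_fault_groups(num_gates: int) -> List[FaultGroup]:
    """Allocate one fault-group per gate and link them in gate order."""
    groups = [FaultGroup(g) for g in range(num_gates)]
    for a, b in zip(groups, groups[1:]):
        a.next, b.prev = b, a
    return groups


def unlink(head: Optional[FaultGroup], group: FaultGroup) -> Optional[FaultGroup]:
    """Remove a fully detected group from the list in O(1); return the new head."""
    if group.prev is not None:
        group.prev.next = group.next
    if group.next is not None:
        group.next.prev = group.prev
    new_head = group.next if head is group else head
    group.prev = group.next = None
    return new_head


def walk(head: Optional[FaultGroup]) -> Iterator[FaultGroup]:
    """Visit the remaining groups in list order."""
    while head is not None:
        yield head
        head = head.next
```

The array gives constant-time access to all faults with a primary effect on a given gate, while the list order is free to differ from the gate order, which is what a depth-first-search ordering would exploit.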

[Figure 8.6 labels: Node value; Fault-Family 0; Fault-Family 1]

Figure 8.6: Simulating With Fault-Groups in a C•RAM Partition

The method of applying depth-first-search from the primary outputs to sort gates was introduced in the PROOFS paper [24]. When we applied it in the fault-parallel fault simulator, there was no significant advantage. We concluded that the method should only be applied when the number of faults simulated in parallel is relatively small. Table 8.1 shows that this method can improve the performance of the static-hybrid fault simulator slightly, most of the time. The results were obtained with four C•RAM partitions. The depth-first-search method was therefore used in all of our hybrid fault simulators.

Figure 8.6 shows a C•RAM partition with a fault-group inserted for fault simulation. In this diagram, we can see that a C•RAM partition is capable of simulating the two fault-families associated with the fault-group at the same time. The primary effect of the faults associated with the fault-group can be simulated by simply inverting all the bits that give the node value represented by the fault-group. The partition_mask is used here to select only the partition simulating the fault-group affected by the inserting operation (simply Exclusive-OR the partition_mask and the original node value).

    Circuit    HPS       HPS with C•RAM Mapper    Speed-up
    c1355nr    0.111s    0.104s                   1.07

Table 8.2: Speeding-up HPS by Using C•RAM Mapper
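The fault insertion described for Figure 8.6 amounts to one exclusive-OR per node. A minimal sketch follows, modelling a row of PE bits as a Python integer with PE 0 in the least significant bit; this bit ordering and both function names are assumptions made for illustration only.

```python
def make_partition_mask(partition: int, pes_per_partition: int) -> int:
    """Bits set for every PE belonging to the given partition."""
    return ((1 << pes_per_partition) - 1) << (partition * pes_per_partition)


def insert_primary_effect(node_value: int, partition_mask: int) -> int:
    """Invert the node value in the selected partition only."""
    return node_value ^ partition_mask


if __name__ == "__main__":
    good = 0b1010_1100_0011_0101              # 16 PEs, 4 partitions of 4 PEs
    mask = make_partition_mask(1, 4)          # select partition 1 (PEs 4 to 7)
    print(format(insert_primary_effect(good, mask), "016b"))
```

Inverting every bit of the partition covers both fault-families at once: PEs whose pattern sets the node to 1 see it forced to 0, and PEs whose pattern sets it to 0 see it forced to 1.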

8.2.3 Hybrid-Parallel Fault Simulation

The fault simulation process is very similar to the fault-parallel fault simulation described in Chapter 7. Before fault simulation takes place, each fault-group is checked to see if it is triggered. If it is triggered, it is added to the simulation queue, which is basically an array of pointers into the array of fault groups. The number of fault-groups in the queue is kept in a counter. As fault-groups are added to the simulation queue, the gate number of the group is also pushed into the sorted event heap and the NEED_INSERTION flag is set at the corresponding location in the event list array. When fault simulation is finally carried out, gate evaluation and fault insertion are based on the flags in the event list, following the same procedure described on page 93.

There are two implementation options for the fault simulation data structure. One option is to use a structure similar to that used for pattern-parallel fault simulation, with a large cint variable whose size in bits equals the number of gates in the circuit. This method is easy to implement, but a lot of copying must be done to keep the simulation cint (sim_value) synchronized with the cint containing the fault-free values (good_value). A second option is to use the C•RAM mapper, which allocates and frees cbooleans whenever necessary. This option can save copying time; however, it is more complicated to implement. Both options were implemented for this simulator, and the difference is shown in Table 8.2. We can clearly see that using the C•RAM mapper is more efficient. Since both simf and simf-pps implement the first option, the hybrid fault simulator has the first option implemented so that a fair comparison between the three simulators could be carried out. The results are discussed later in Section 8.3.

[Figure 8.7 labels: Fault Simulation; Break to simulation; member list]

Figure 8.7: Fault Triggering/Detection Mechanism

8.2.4 Fault Triggering and Fault Detection

When the fault triggering and fault detection mechanisms were designed, the primary goal was to reduce the amount of unnecessary checking as much as possible. A design was used that allows the fault-group to branch to the fault simulation routine as soon as it is determined that some faults in the group are triggered, and that has the ability to resume checking from the fault after the fault that got triggered during the simulation. This is shown in Figure 8.7. Here, the member list of each of the fault-families is traversed to check for triggering conditions. Within each list, if a triggered fault is encountered, the traversing is stopped and the fault's position is saved. The fault-group can then be added to the simulation queue. After fault simulation, scanning through the member list picks up again at the triggered fault. In this second phase, all triggered faults (including the first one) will be deleted from the list of faults for the current fault group. With this type of mechanism, each fault in the fault list is visited at most once. Our design minimizes wasted triggering detection effort and wasted fault simulation effort.

To implement our design, three routines were required, namely Triggered, DetectFault, and CreateMask. Triggered and DetectFault each perform one pass along the current fault list. CreateMask is a Boolean function that takes a fault as input parameter. If the fault is triggered, a true value is returned; otherwise a false value is returned. In addition, a cboolean mask is created. If a pattern is capable of triggering the fault, the corresponding PE bit in the mask is set to 1. This function is used in both Triggered and DetectFault for checking the triggering condition of faults. When a fault is triggered, both the position of the fault in the fault list and the corresponding cboolean mask are saved. The pseudo code for this implementation is as follows:

    while (current fault-group is not NULL)
    {
        initialize empty simulation queue;

        while (simulation queue not full)
        {
            if (Triggered(current fault-group))
                append current fault-group to simulation queue;
            advance to next fault-group;
            if (current fault-group is NULL) break;
        }

        if (simulation queue is empty) break;

        perform fault simulation;

        for each fault-group in simulation queue
            DetectFault(fault-group);
    }

After fault simulation, DetectFault is called to continue the checking. The saved cboolean mask is first checked against the simulation result with the logical SOR operation, where result is also a cboolean. Bits in result are set to 1 if and only if the fault-free and faulty output values differ. The fault list traversal then continues with additional steps that check for fault detection and that drop faults from the fault list. MarkDetected (page 67), which takes advantage of fault implication relationships, is used to accelerate the fault dropping performance.

Since we use the fault group data structure, which collects multiple faults and represents them with their primary effects, it is necessary to make sure that the primary effect of a fault is triggered before fault simulating it. Primary effect checking is necessary when the site of the fault is one of the input wires of a gate with multiple inputs, and that wire is a fan-out wire. All of these checks are done in the CreateMask routine.
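The control flow of the pseudo code, together with the save-and-resume behaviour of Triggered and DetectFault, can be sketched in Python as follows. This is a simplified model rather than the thesis code: each fault-group holds a single fault list instead of two fault-families, masks are plain integers with one bit per PE, the `simulate` callback is a hypothetical stand-in for fault simulation, and combining the mask with the result by a bitwise AND is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    name: str
    trigger_mask: int          # bit i set: the pattern on PE i triggers this fault


@dataclass
class Group:
    faults: list
    pos: int = 0               # saved position of the first triggered fault
    saved_mask: int = 0        # saved mask of that fault


def create_mask(fault: Fault) -> int:
    """Stand-in for CreateMask: a non-zero mask means the fault is triggered."""
    return fault.trigger_mask


def triggered(group: Group) -> bool:
    """Scan until the first triggered fault; remember where it was found."""
    for i, fault in enumerate(group.faults):
        mask = create_mask(fault)
        if mask:
            group.pos, group.saved_mask = i, mask
            return True
    return False


def detect_fault(group: Group, result: int, detected: list) -> None:
    """Resume at the saved fault; drop every fault whose mask overlaps result."""
    kept = group.faults[:group.pos]            # already known to be untriggered
    for i in range(group.pos, len(group.faults)):
        fault = group.faults[i]
        mask = group.saved_mask if i == group.pos else create_mask(fault)
        if mask & result:                      # assumption: AND of mask and result
            detected.append(fault.name)
        else:
            kept.append(fault)
    group.faults = kept


def run_round(groups: list, queue_size: int, simulate) -> list:
    """One round over all fault-groups, following the pseudo code's structure."""
    detected = []
    it = iter(groups)
    current = next(it, None)
    while current is not None:
        queue = []
        while len(queue) < queue_size:
            if triggered(current):
                queue.append(current)
            current = next(it, None)
            if current is None:
                break
        if not queue:
            break
        results = simulate(queue)              # one result mask per queued group
        for group, result in zip(queue, results):
            detect_fault(group, result, detected)
    return detected
```

Between them, `triggered` and `detect_fault` touch each fault at most once per round: the first stops at the first triggered fault and the second resumes from that saved position.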

8.2.5 Dynamic-Hybrid Fault Simulation

In order to design a dynamic-hybrid fault simulator, one must define a function for determining the fault-to-pattern ratio. There is an example of such a function in [Hl. From this example, we can see that it is important to find a cost function that can be used to accurately estimate the run time. The cost function that we developed for our hybrid fault simulation algorithm can be stated as follows:

$$\text{Cost per Round} = GST + CMT \times n_{undetected} + \frac{FST \times n_{triggered}}{f} \tag{8.1}$$

    GST = Good Simulation Time
    CMT = CreateMask Time
    FST = Faulty Simulation Time
    $n_{undetected}$ = number of undetected faults
    $n_{triggered}$ = number of triggered faults
    f = number of faults simulated in parallel

where cost is in terms of the number of C•RAM clock cycles. The Good Simulation Time is circuit dependent. It is proportional to the total number of gate inputs ($n_{input}$) plus the total number of gates ($n_{gate}$). It is quite hard to obtain an exact CreateMask Time. However, by analyzing the source code of the CreateMask routine, it can be estimated as

where 3 and 8 (op = C•RAM operations) are the values of the average cost for fault triggering ($avg(t_{trigger})$) and the set up cost of a gate evaluation ($t_{setup}$), respectively, on our UltraSPARC 5 workstation. The ratio $n_{input}/n_{gate}$ represents the average number of gate inputs per gate. Note that the above equation does not take into account the fact that some faults do not need a gate evaluation for checking the primary effects. This can be estimated by multiplying the average number of C•RAM clock cycles for primary effect checking by the fraction of faults ($r_{peff}$) that need primary effect checking (or, the fraction of wires that satisfy the primary effect checking conditions). Both GST and CMT can be determined before fault simulation takes place. The Faulty Simulation Time per pass (FST), however, depends on the nature of the faults being simulated. This is because fault simulation only evaluates the gates affected by the faults. We used the following approximation:

$$FST = \frac{\text{number of gate evaluations in fault simulation in previous run}}{\text{number of passes in the previous run}} \tag{8.3}$$

With GST, CMT, and FST defined as above, Equation 8.1 can be used to estimate the number of C•RAM clock cycles needed for a fault simulation round. Our goal is to find a fault-to-pattern ratio that minimizes the cost of the next two rounds.

When determining the next fault-to-pattern ratio, only three scenarios are considered, namely, doubling $p_{prev}$ (the previous pattern parallelization factor), keeping $p_{prev}$ unchanged, or halving $p_{prev}$. It is necessary to determine the total cost to fault simulate the next $2p_{prev}$ test patterns in all three scenarios. The costs for running one, two and four rounds are thus calculated.

[Figure 8.8 axis label: Test Patterns Simulated]

Figure 8.8: Choosing the Best Fault-to-pattern Ratio

Note that there are two more variables in Equation 8.1. These variables, $n_{undetected}$ and $n_{triggered}$, are not known in most of the cases. In order to estimate these values for evaluating the round cost estimate, it is assumed that the fault dropping rate, d, and the fault-group triggering ratio, t, are going to be similar for the next $2p_{prev}$ test patterns. Thus

(8.4)

where $D_1$ is the current number of undetected faults after the last round, $D_0$ is that number before the last round (the last $p_{prev}$ test patterns), and $T_1$ is the number of fault-groups triggered in the last round. Using Equation 8.4, the values of $D_2$ and the others shown in Figure 8.8 can be estimated. These values are then used to calculate the total round cost for the three alternative future scenarios. If $p_{prev}$ is already at a predetermined maximum (minimum) value, then doubling (halving) $p_{prev}$ is not considered. The scenario that implies the minimum cost is then chosen as the fault-to-pattern ratio for the next round.
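The selection procedure can be illustrated with a short Python sketch built around Equation 8.1. Everything beyond Equation 8.1 is an assumption made for this illustration: the dropping rate is taken as $d = D_1/D_0$ per $p_{prev}$ patterns and applied geometrically, the number of triggered fault-groups is taken to stay proportional to the number of undetected faults (ratio $T_1/D_1$), and f is taken as the number of PEs divided by p. The function and parameter names are hypothetical.

```python
def round_cost(gst, cmt, fst, n_undetected, n_triggered, f):
    """Equation 8.1: estimated C•RAM clock cycles for one round."""
    return gst + cmt * n_undetected + fst * n_triggered / f


def choose_next_p(p_prev, num_pes, gst, cmt, fst, d0, d1, t1, p_min, p_max):
    """Return (p, total cost) of the cheapest way to simulate the next 2*p_prev patterns."""
    d = d1 / d0                  # assumed: fraction left undetected per p_prev patterns
    t = t1 / d1                  # assumed: triggered groups per undetected fault
    best = None
    for p in (2 * p_prev, p_prev, p_prev // 2):
        if not p_min <= p <= p_max:
            continue             # doubling (halving) is skipped at the maximum (minimum)
        rounds = (2 * p_prev) // p   # one, two or four rounds
        f = num_pes // p             # assumed: one fault-group per partition
        undetected, total = float(d1), 0.0
        for _ in range(rounds):
            total += round_cost(gst, cmt, fst, undetected, t * undetected, f)
            undetected *= d ** (p / p_prev)
        if best is None or total < best[1]:
            best = (p, total)
    return best


if __name__ == "__main__":
    print(choose_next_p(64, 1024, gst=5000, cmt=11, fst=300,
                        d0=400, d1=100, t1=50, p_min=16, p_max=256))
```

Under these assumptions a large GST favours doubling p (fewer rounds, so GST is paid less often), while a large FST with a fast dropping rate favours halving p (more faults simulated in parallel).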

    Fault Simulation Time in Seconds and Speed-Up
    Circuit    simf-pps           simf-fps             static hps         dynamic hps
    c1355nr    0.190s (3.37)      1.756s (0.36)        0.108s (5.93)      0.108s (5.94)
    c1908nr    09s5-77            12.61% (0.11)        0.221s (6.78)      0.196s (7.62)
    c2670nr    6.675s (10.19)     1363.713s (0.04)     11.708s (5.81)     7.256s (9.37)
    c3540nr    0.3.33s (4.71)     12.956s (0.13)       0.390s (4.33)      0.439s (3.85)
    c5315nr    0.443s (4.63)      11.41s (O)           0.466s (4.40)      0.459s (4.47)
    c6288nr    1.730s (5.46)      3.141s (3.01)        0.525s (18.00)     0.517s (18.26)
    c7552nr    9.170s (8.38)      3971.430s (0.02)     20.636s (3.72)     8.850s (8.68)

Table 8.3: Final Results

For the doubling $p_{prev}$ scenario, Equation 8.5 is used. In this scenario, p is doubled and only one round is needed to simulate the next $2p_{prev}$ test patterns. The cost can thus be estimated with $D_1$ and $T_1$. In a scenario where we keep p unchanged, two rounds will be needed to simulate the next $2p_{prev}$ test patterns. In Figure 8.8, we can see that the first round will simulate from $D_1$ to $D_2$, and the next round carries on from $D_2$ up to $D_3$. The cost of this scenario is thus the sum of the cost of these two rounds. $D_1$, $T_1$, and $p = p_{prev}$ are substituted into Equation 8.1 to calculate the cost of the first round. If the fault dropping rate and the fault triggering ratio are unchanged

8.3 Evaluation

As mentioned in the previous section, most of the run-time is spent in good simulation, creating fault simulation masks (CreateMask), and fault simulation. These are the major activities in our fault simulation algorithm. In dynamic-hybrid fault simulation, additional time is needed to perform the ratio calculations and to create new partition masks. Table 8.3 lists the performance of all the simulators evaluated in this thesis. The best result for each circuit is highlighted in bold type. The non-dfs version of simf-fps (the version that does not use the depth-first-search-from-primary-output heuristic) is used for obtaining the fault-parallel fault simulation results. On the other hand, both the static and the dynamic simulators have depth-first-search enabled and the C•RAM mapper disabled. Four partitions with 256 PEs each were used for the static-hybrid fault simulator, while 16 partitions (64 PEs per partition) were used initially in the dynamic one. These partition sizes were selected after some experimentation.

From the discussion in Chapter 1, the hybrid fault simulators would be expected to run faster than all the other ones. However, this is not always the case according to the results in Table 8.3. This can be explained by the following reasons.

The theory discussed in Chapter 1 did not take into account some of the overhead associated with decreasing p. This overhead includes the good simulation run time and the fault triggering checking time. From Equations 8.5, 8.6, and 8.7, we should remember that as p decreases, the GST component also increases (halving p implies doubling the coefficient of GST). This means increased fault simulation overhead.

There are also problems on the technical side. More time is spent in fault checking not only because of the smaller p, but also because more time is needed to check the primary effects of faults in the fault-groups. As a result, the overhead per round is generally higher for the hybrid-parallel fault simulators than for the pattern-parallel one. It may be worthwhile to develop a hybrid fault simulator that can speed up or even eliminate the checking of primary effects. This will require a whole new design.

Another technical problem is that when faults are simulated in parallel, the early stopping technique used in the pattern-parallel fault simulator cannot be applied to the hybrid ones. In the pattern-parallel fault simulator, when the fault effects propagate to at least one primary output during fault simulation, the fault can be considered detected without further gate evaluations. This can avoid a lot of unnecessary gate evaluations. However, this tactic cannot be applied in a hybrid fault simulator efficiently because the checking will have to make sure that a fault is detected in each of the partitions, which can be very time consuming considering that some partitions in the simulation may not detect the fault at all, thus wasting all the time spent in checking. This problem should only significantly affect the performance when detecting the faults early on in the simulation, where faults tend to be easier to detect. When the fault coverage is already very high, most of the remaining faults cannot be detected very easily anyway, thus the early stopping effect is not as significant.

Figure 8.9 illustrates the effect of running the static-hybrid fault simulator as the number of partitions is varied. The curve is normalized with respect to the run-time with only one partition. Again, the simulations are run up to 99% fault coverage. It can be seen that there are two extremes among the seven benchmark circuits. Note that the easier circuits tend to run faster with more partitions (e.g. c6288nr). On the other hand, the harder circuits such as c2670nr and c7552nr run much faster with fewer partitions. The remaining four benchmark circuits run fastest with 4 partitions (each with 256 test patterns).

Figure 8.9: Effect of Using Different Number of Partitions (Normalized to f=1)

Ideally, the dynamic-hybrid fault simulator should recognize the changes in performance and adjust the fault-to-pattern ratio accordingly. However, the current implementation of our dynamic-hybrid fault simulator is inadequate for doing so. Let's take c6288nr for example. Initially, we run the simulation with 16 partitions, assuming 1024 PEs. Since the c6288nr circuit runs best with more partitions (smaller p), an ideal dynamic-hybrid fault simulator should increase the number of partitions after each round (halving p). In our experiment, however, this is not happening. In fact, our simulator has never reached the point where the p halving option won over the other two options (with 1024 PEs). This fact can be interpreted in two ways: (1) we have never reached such an occasion, or (2) the function is inadequate. In the three cost equations, the only term that has an exact value is Good Simulation Time. The other two terms are only approximations. In this design, it was assumed that the fault dropping rate and the fault triggering ratio are going to be unchanged throughout the next $2p_{prev}$ test patterns. This assumption may not be accurate enough for the relatively small benchmark circuits that we considered. Also, the terms CMT and FST are approximations from the structure of the circuit and statistical data from the previous round. We could possibly improve the dynamic-hybrid fault simulator by developing a better function for selecting the next fault-to-pattern ratios. Currently, the initial number of partitions is chosen by trial and error. It would be desirable to develop a better method to find the initial ratio as a result of an initial heuristic evaluation of the circuit.

PE utilization was used to evaluate simf-pps and simf-fps. We would also like to find the PE utilization for simf-hps so that we can compare the fault simulators. Since the hybrid fault simulators use the fault group data structure, special techniques must be used to find the PE utilization ratio.

Recall that the PE utilization equals the number of PEs doing useful fault simulations divided by the number of PEs used for fault simulations. The denominator is simply the number of passes times the number of PEs. In order to measure the numerator, we must first define the idea of a useful PE in the presence of fault groups. When a fault group is fault simulated in a partition, only the set of test patterns that could trigger at least one of the faults in the fault group can be considered useful. For each of the undetected faults in the fault group, if it is not detected by the current set of test patterns, then all the test patterns that trigger this fault are useful. On the other hand, if the fault is detected, then only the fault triggering test patterns in the partition up to and including the pattern that detects the fault are useful. The set of useful PEs (test patterns) is thus the union of these individual useful sets. If none of the faults were detected, then the useful set of PEs equals the set of PEs that trigger the faults. The total number of useful PEs is thus the accumulation of the number of PEs in each useful set.

[Figure 8.10 labels: case (1): no fault detected; case (2): Fault 1 detected by 7 and 9; case (3): Faults 2 and 3 detected by 6. Legend: test patterns that trigger the fault; useful set of PEs.]

Figure 8.10: Determining a Useful Set of PEs

Table 8.4: PE Utilizations of the Hybrid Fault Simulators

Figure 8.10 illustrates how a useful set of PEs is determined. Let's consider a fault group with the three faults 1, 2 and 3 with at least 13 PEs. Each fault is triggered by a set of test patterns (running on different PEs). In case (1), none of these faults is detected by any of these test patterns. The useful set of PEs (in bold lines) in this case is thus simply the union of all the sets of PEs. In case (2), fault 1 is detected by two test patterns, 7 and 9. Since PE 7 is sufficient to detect the fault if we simulate the patterns with less parallelism, PE 9 is not a useful PE even though it can also detect the fault. As a result, only the PEs triggering fault 1 before and including PE 7, plus the ones triggering the other faults, are useful. In case (3), fault 2 is detected at PE 6. The useful PEs are thus 2 and 6 plus the ones that triggered fault 1. Although PE 3 also triggers fault 3, it is not useful because of the fault implication effect. When we determine that fault 2 is detected, we immediately know that fault 3 is also detected. In this case we do not need to find the triggering conditions of fault 3 at all, thus the extra PEs that trigger fault 3 cannot be considered useful.
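The rules for a useful set of PEs can be written down compactly. The Python sketch below is an illustration only: the function names are hypothetical, and the trigger sets in the example are invented so that they agree with the PE numbers mentioned in the text (the complete sets drawn in Figure 8.10 are not reproduced here).

```python
def useful_set(trigger_pes, detecting_pes):
    """Useful PEs for one fault: every triggering PE if the fault is undetected,
    otherwise only those up to and including the first detecting PE."""
    if not detecting_pes:
        return set(trigger_pes)
    first = min(detecting_pes)
    return {pe for pe in trigger_pes if pe <= first}


def group_useful_pes(faults, implied=()):
    """Union of the per-fault useful sets; faults dropped by implication add nothing."""
    useful = set()
    for name, (trigger_pes, detecting_pes) in faults.items():
        if name not in implied:
            useful |= useful_set(trigger_pes, detecting_pes)
    return useful


def pe_utilization(total_useful_pes, passes, num_pes):
    """Useful PEs divided by the number of PEs used (passes times PEs)."""
    return total_useful_pes / (passes * num_pes)


# Hypothetical trigger sets for faults 1, 2 and 3.
TRIGGERS = {1: {4, 7, 9}, 2: {2, 6, 10}, 3: {3, 13}}

case1 = group_useful_pes({k: (v, set()) for k, v in TRIGGERS.items()})
case2 = group_useful_pes({1: (TRIGGERS[1], {7, 9}),
                          2: (TRIGGERS[2], set()),
                          3: (TRIGGERS[3], set())})
case3 = group_useful_pes({1: (TRIGGERS[1], set()),
                          2: (TRIGGERS[2], {6}),
                          3: (TRIGGERS[3], set())}, implied={3})
```

With these sets, case (2) excludes PE 9 because PE 7 already detects fault 1, and case (3) excludes PEs 3 and 13 because fault 3 is dropped by implication once fault 2 is detected.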

The PE utilizations for the static-hybrid fault simulator and the dynamic-hybrid fault simulator are recorded in Table 8.4. The PE utilizations for simf-pps and simf-fps are also listed in the same table. It can be seen that the PE utilizations for the hybrid fault simulators are generally higher than those of the non-hybrid ones. When compared to simf-pps, both hybrid fault simulators have higher PE utilizations than the pattern-parallel fault simulator for all circuits. This shows that our implementation of the hybrid fault simulators succeeded in reducing the number of wasted PEs in pattern-parallel fault simulation. When compared to simf-fps, we found that the hybrid fault simulators have a lower PE utilization only when the faults are very easy to detect (i.e. for circuits c5315nr and c6288nr). This is because only relatively few test patterns were fed into these circuits, and these are the best cases for the fault-parallel fault simulator.

Figure 8.11 shows the accumulated PE utilization ratios at different points during fault simulation of circuit c2670nr. The curve for fault-parallel fault simulation drops like a 1/x curve. This is because the best case for fault-parallel fault simulation is at the beginning of the fault simulation, where there are many undetected faults such that each C•RAM PE is assigned a fault. As faults are detected and dropped from the fault list, the number of C•RAM PEs exceeds the number of undetected faults. This results in wasted PEs, which reduces the final PE utilization ratio.

[Figure 8.11 labels: Initial PE utilization of fps = 64.8%; horizontal axis: Number of Patterns Simulated, 0 to 300000]

Figure 8.11: PE Utilization Curve

The curve for the pattern-parallel fault simulator, on the other hand, starts with very low PE utilization. This is because there are many easy-to-detect faults at the beginning of the fault simulation. These faults can be detected by the very first test patterns that triggered them, thus utilizing only 1/1024 of the PEs. Eventually, the fault list contains only the faults that are relatively hard to detect. The test patterns may be capable of triggering the faults, but not detecting them. If these faults are triggered every n << 1024 test patterns, then the PE utilization ratio will be driven up. The PE utilization will not reach 100% as there is an upper bound (1/n). If n = 5 then the upper bound is about 20%.

    Fault Simulation Time in Seconds and Speed-Up
    Circuit    simf-pps           simf-fps             static hps         dynamic hps
    c1355nr    0.189s (3.58)      2.567s (0.25)        0.115s (5.59)      0.124s (5.18)
    c1908nr    0.256s (5.86)      14.937s (0.10)       0.210s (6.82)      0.193s (7.76)
    c2670nr    4.574s (14.87)     1446.264s (0.05)     10.313s (6.60)     4.773s (14.25)
    c3540nr    0.334s (5.06)      3.3s O               0.332s (5.09)      0.361s (4.67)
    c5315nr    0.435s (4.72)      14.678s (0.14)       0.436s (4.70)      0.536s (3.52)
    c6288nr    3s(4               3.220s (2.93)        0.432s (21.88)     0.593s (15.94)
    c7552nr    5.365s (14.32)     4027.996s (0.02)     19.888s (3.86)     5.288s (14.54)

Table 8.5: Fault Simulation Results with 4K PEs

    Circuit    simf-pps    simf-fps    static hps    dynamic hps
    c1355nr    1.0         0.9         1.8           1.T
    c1908nr    1.0         1.3         1.5           1.5

Table 8.6: Speed-up of Fault Simulations Going from 1K PEs to 4K PEs

The curves for the two hybrid fault simulators are very similar. Both curves start with low PE utilizations. However, the ratio increases more rapidly than for the pattern-parallel fault simulator. This is because of the C•RAM partitioning, which reduces the number of PEs spent detecting "easy" faults. If four partitions are used, then the utilization is 1/256, which is 4 times greater than that of simf-pps. In our graph, as more patterns are simulated, the PE utilization drops. This is a non-trivial effect because it depends on the structure of the circuit and the test patterns. If the test patterns are capable of triggering the "hard" faults very frequently, then the PE utilization ratio will rise. For example, if the PE utilization ratio is 40% when there are only "hard" faults left, then the PE utilization ratio will drop if the faults are triggered every 3 test patterns or more (it approaches 1/3 = 33%). However, if the triggering frequency is every 2 test patterns, then the PE utilization will rise (it will approach 1/2 = 50%).
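The drift of the accumulated PE utilization toward 1/n can be checked with a few lines of Python. This is a deliberately simple model, assumed here for illustration: in every additional pass, one PE in every n is counted as useful (the fault is triggered but never detected), and the function name is hypothetical.

```python
def cumulative_utilization(useful_so_far, used_so_far, extra_passes, num_pes, n):
    """Accumulated PE utilization after extra_passes more passes in which the
    remaining faults are triggered once every n test patterns."""
    useful = useful_so_far + extra_passes * num_pes / n
    used = used_so_far + extra_passes * num_pes
    return useful / used


if __name__ == "__main__":
    used = 10 * 1024                 # ten passes on 1024 PEs so far
    useful = 0.40 * used             # 40% utilization so far
    for n in (2, 3):
        print(n, cumulative_utilization(useful, used, 1000, 1024, n))
```

Starting from 40%, the ratio falls toward 33% for n = 3 and climbs toward 50% for n = 2, which is the behaviour described in the example above.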

8.3.2 Simulating Using a Larger C•RAM

Finally, we evaluate the C•RAM fault simulators by emulating a larger C•RAM. This evaluation can test the capacity of our fault simulators for a more realistic C•RAM platform. In our evaluation, a system with 4K PEs and 16K bits of local memory per PE is emulated. This equals a system with 8M bytes of RAM. For CAD workstations, this is actually a very small amount of memory. We chose this configuration only because it allows us to obtain results in a reasonable amount of emulation time.

Table 8.5 shows the performance of the four C•RAM fault simulators using 4K PEs. By increasing the number of PEs, we obtain a more dramatic performance improvement over the simulator without C•RAM. In the table of final results we presented at the beginning of this section, simf-pps runs faster than the hybrid fault simulators in three benchmark circuits. We can see from Table 8.5 that this number is reduced to one. The speed-up ratio from going from 1K PEs to 4K PEs is shown in Table 8.6. The speed-up ratio for the hybrid fault simulators is generally higher than that of the pattern-parallel fault simulator. A higher ratio implies a more scalable solution. This suggests that the hybrid fault simulators are more scalable than the non-hybrid ones. In addition, the dynamic hybrid fault simulator is more scalable than the static hybrid fault simulator when simulating circuits that are "harder" (need more test patterns to reach 99% fault coverage). This is due to the adapting ability of the dynamic hybrid fault simulator, which allows it to run in a purely pattern-parallel fashion when the fault dropping rate is low.

Figure 8.12 shows the speed-up versus the number of C•RAM PEs for four of our benchmark circuits³. The scalability of the four simulators can be observed clearly from these graphs. We can see that the scalability of the dynamic hybrid fault simulator is better than the other ones in three of the four circuits (c3540nr, c6288nr, and c7552nr). It is interesting to note that the dynamic hybrid fault simulator does not speed up as much as the static hybrid fault simulator and simf-pps for circuit c2670nr. This situation can happen when the faults are not only hard to detect but also hard to trigger. Let's consider three faults with a fault triggering rate of about 1/16K, for example. On a C•RAM system with 16K PEs running static hybrid fault simulation using 4 partitions, only one simulation pass is needed. On the other hand, if dynamic hybrid fault simulation with only one partition is used, then three simulation passes are needed. An ideal dynamic hybrid fault simulator should have analyzed the circuit structure and determined that four partitions should be used.

³There are three data points in each curve: 1K, 4K and 16K PEs. The first two data points were taken with 64 PEs per C•RAM partition at the beginning of dynamic-hybrid fault simulation. In contrast, the data points with 16K PEs had 256 PEs per C•RAM partition. Since the configurations were different, we may not compare the 16K PE results with the other results directly. Therefore, the simulation results with 4K PEs were used in the previous paragraphs to represent a larger C•RAM system.

Chapter 9

9.1 Summary of Results

A benchmark conventional fault simulator, simf, was developed initially to gain experience with implementing the basic algorithms and to establish a point of comparison for other fault simulators. It was shown to be efficient and typically faster than the public domain fault simulator sini-3. Three different experimental C•RAM fault simulators were designed, namely simf-pps, simf-fps and simf-hps. In addition, two different versions of simf-hps were designed. One implements a static-hybrid fault simulation scheme while the other one implements a dynamic-hybrid fault simulation scheme. All the source code is included in the Appendix. The various optimizations used in each fault simulator are summarized in Table 9.1.

Simf-pps is a pattern-parallel fault simulator which is based on the famous PPSFP algorithm (Parallel Pattern Single Fault Propagation). It was found to be highly efficient and relatively easy to implement on C•RAM. However, this solution may not be scalable as the size of C•RAM increases because of the low PE utilization at the beginning of the simulation, where most of the faults are relatively easy to detect.

Simf-fps implements a fault-parallel fault simulation algorithm. This simulator runs our benchmark problems even more slowly than simf most of the time. This is primarily due to two bottlenecks: single pattern fault-free simulation and fault triggering condition checking. However, with benchmark circuit c6288nr, simf-fps is even faster than simf-pps. This supports our claim that fault-parallel fault simulation tends to be more efficient when there are many easy-to-detect faults. Techniques developed when designing this simulator were re-used in simf-hps.
Optimization                        simf   PPS    FPS     HPS
Event Heap                          yes    yes    yes     yes
Fault Implication                   yes    yes    no      yes
DFS Fault Grouping                  no     no     no      yes
Fault-Family                        no     no     yes     yes
Node Allocation*                    no     no     yes     no
Fault Triggering Test               yes    yes    yes     yes
Early Stopping                      yes    yes    no      no
Simulating from Site of Fault(s)    yes    yes    yes**   yes**

*  use of the crammapper module
** Multiple faults were inserted

Table 9.1: Optimizations Implemented in Each Fault Simulator
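The pattern-parallel principle behind PPSFP can be illustrated on a conventional processor, where one machine word carries the same signal for many test patterns. The following is a minimal sketch and is not taken from the thesis source code: the three-input circuit and the names `eval_good`, `eval_faulty_a_sa0`, `detecting_patterns` and `first_detecting_pattern` are hypothetical, and a 32-bit word stands in for the PE array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical circuit: out = (a AND b) OR c.
 * Bit i of every word holds the value of that signal under test pattern i,
 * so one logical instruction evaluates a gate for 32 patterns at once. */
static uint32_t eval_good(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | c;
}

/* Same circuit with input a stuck-at-0: a is forced to 0 for every pattern. */
static uint32_t eval_faulty_a_sa0(uint32_t a, uint32_t b, uint32_t c)
{
    (void)a;                 /* the fault overrides the applied value */
    return (0u & b) | c;
}

/* Bit i of the result is set when pattern i detects the fault,
 * i.e. the good and faulty outputs differ for that pattern. */
static uint32_t detecting_patterns(uint32_t a, uint32_t b, uint32_t c)
{
    return eval_good(a, b, c) ^ eval_faulty_a_sa0(a, b, c);
}

/* Value of one pattern's bit inside a packed word. */
static int pattern_value(uint32_t word, unsigned i)
{
    assert(i < 32u);
    return (int)((word >> i) & 1u);
}

/* Index of the first detecting pattern, or -1 if the fault is undetected. */
static int first_detecting_pattern(uint32_t diff)
{
    for (unsigned i = 0; i < 32u; i++)
        if (pattern_value(diff, i))
            return (int)i;
    return -1;
}
```

With a = 0xC, b = 0xA and c = 0x1, only pattern 3 (a = 1, b = 1, c = 0) distinguishes the good circuit from the faulty one. Once that pattern is found, the fault can be dropped, which is why PE utilization falls when most faults are easy to detect.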

Simf-hps is a hybrid fault simulator that parallelizes along both the fault and the pattern dimensions. A static version was first developed, where the fault-to-pattern ratio is fixed. This simulator was found to be efficient; however, it was not as fast as simf-pps for four of the seven benchmark circuits when 1024 PEs were used. Extra functions were then added to this simulator to make it dynamic, where the fault-to-pattern ratio is adjustable. The dynamic version can sometimes provide better results than the static-hybrid fault simulator. However, compared to simf-pps, the two hybrid fault simulators are not consistently faster. Also, for the benchmark circuits at least, the dynamic hybrid method was not always faster than the static hybrid method.

In this thesis, a software emulation of C•RAM with 1024 PEs was used to run the algorithms. Tests with 4096 PEs were also run to investigate the scalability of the C•RAM fault simulators. It was found that the hybrid fault simulators are more scalable than the non-hybrid ones. Since 4096 PEs would be found in 8 Mbytes of C•RAM, this could be considered a small system. It would be desirable to emulate an even larger C•RAM because that would be closer to real memory capacities found on today's workstations. However, this was not done because C•RAM emulation is very time consuming. Nevertheless, with respect to a host without C•RAM, a host and a C•RAM with 1024 PEs was estimated to be capable of accelerating fault simulation by 10 times using appropriate algorithms, and a host and a C•RAM with 4096 PEs can accelerate fault simulation by about 10 to 30 times.

            Fault Simulation Time in seconds and (Speed-Up)
circuit    simf-pps          simf-fps            static hps        dynamic hps
c†nr       0.069s (9.24)     †                   0.094s (†)        0.098s (6.44)
c1908nr    0.105s (14.20)    14.816s (0.10)      0.168s (8.90)     0.132s (9.87)
c2670nr    0.657s (103.55)   1427.177s (0.05)    5.014s (13.57)    0.717s (94.83)
c3540nr    0.194s (8.71)     13.531s (0.12)      0.252s (6.70)     0.276s (6.11)
c5315nr    0.294s (6.98)     14.413s (0.14)      0.357s (5.75)     0.431s (4.75)
c6288nr    0.392s (24.10)    3.143s (3.01)       0.355s (26.63)    0.525s (18.00)
c7552nr    1.463s (52.52)    4019.670s (0.02)    13.580s (5.66)    1.632s (47.09)

† illegible in the source copy

Table 9.2: Fault Simulation Results with 4K PEs Running at 143 MHz

The notion of PE utilization was introduced to evaluate the effectiveness of the algorithms in using C•RAM. It is found that PE utilization for the hybrid fault simulators can be up to two times that of the pattern-parallel fault simulator, depending on the circuit and the test patterns.

At the end of Chapter 1, we mentioned that the cycle time of the workstation is 17 times faster than the assumed C•RAM system. Under this assumption, our fault simulators achieve speed-ups of up to 22 times with C•RAM (Figure 8.5). We assumed that our C•RAM system is based on DRAM technology. Faster memory architecture technologies exist that can be used to implement C•RAM. Assuming C•RAM is implemented on a technology (e.g. static RAM) that runs as fast as the CPU (167 MHz in our case), the speed-up of our algorithms, using 4K PEs, is up to 103 times (see Table 9.2). Note that in Table 9.2, the pattern-parallel fault simulator runs faster than the other ones in six out of the seven tests. This is because this algorithm uses the least CPU time. As the C•RAM speed improves, the CPU time becomes the main bottleneck. This indicates that the future research priority should be changed to find algorithms that use the least CPU time when the C•RAM is relatively fast.
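The observation that CPU time becomes the bottleneck as C•RAM gets faster can be illustrated with a simple additive timing model. The sketch below is an assumption made for illustration and is not the timing model used in the thesis: it supposes that the accelerated run time is the host CPU part plus the C•RAM part divided by a clock ratio. The function names and the numbers in the usage note are hypothetical and are not measured results.

```c
#include <assert.h>

/* Assumed model: total time = CPU part + C*RAM part scaled by clock ratio.
 * clock_ratio = 1 means the C*RAM runs at its original (slow) clock;
 * clock_ratio = 17 means it runs 17 times faster. */
static double accelerated_time(double t_cpu, double t_cram, double clock_ratio)
{
    assert(clock_ratio > 0.0);
    return t_cpu + t_cram / clock_ratio;
}

/* Speed-up of the accelerated system over a host-only run. */
static double speed_up(double t_host_only, double t_accelerated)
{
    assert(t_accelerated > 0.0);
    return t_host_only / t_accelerated;
}
```

For example, with a hypothetical CPU part of 1 s and C•RAM part of 17 s, raising the clock ratio from 1 to 17 reduces the total from 18 s to 2 s. Half of the remaining time is then spent on the CPU, so further C•RAM speed improvements yield diminishing returns in this model.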

9.2 Further Research

In order to apply C•RAM to fault-simulate industrial circuits, there are at least three remaining concerns. The primary concern is that most industrial circuits are much larger than our benchmark circuits. Second, most circuits in industry are sequential circuits that include clocked memory elements as well as combinational logic. Finally, there are other popular fault models besides the ones used in this research. We will discuss each of these concerns in the following subsections.

9.2.1 Fault-Simulating Large Circuits

The fault simulators developed in this research are capable of simulating circuits as large as C•RAM can support. Up to now, C•RAM with 16K bits of local memory has been simulated. This means that the simulators can simulate circuits with up to roughly 16K gates. This is rather small compared to most industrial circuits. One solution is to use a larger C•RAM. To be flexible, however, we may want to develop a fault simulator that does not depend on the upper limit of the amount of local memory per PE. This can be done by grouping multiple PEs together and pooling their local memory. For example, when two PEs are grouped together, we can have 32K bits of memory with one PE, at the cost of reducing the parallelism by a factor of two (the second PE is used as a memory "server"). This strategy would suffer the cost of occasionally having to transfer data among local memories by left and right shifting. Shifting data among PEs, however, is a relatively fast operation compared to a memory access. The overhead is hard to predict without simulation.
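The memory-pooling idea can be made concrete with a small address-mapping sketch. It is only an illustration under stated assumptions: logical bit addresses are assumed to be laid out contiguously across the PEs of a group, with PE 0 of each group being the active PE and the others acting as memory servers. The function names are hypothetical and do not appear in the thesis code.

```c
#include <assert.h>

/* Bits of pooled local memory available to the one active PE of a group. */
static unsigned pooled_bits(unsigned bits_per_pe, unsigned group_size)
{
    assert(group_size > 0u);
    return bits_per_pe * group_size;
}

/* Number of PEs that still compute in parallel after grouping. */
static unsigned active_pes(unsigned total_pes, unsigned group_size)
{
    assert(group_size > 0u);
    return total_pes / group_size;
}

/* PE within the group that physically holds a logical bit (0 = active PE).
 * Under the contiguous layout assumed here, this is also the number of
 * single-PE shifts needed to bring the bit to the active PE. */
static unsigned holder_pe(unsigned bits_per_pe, unsigned logical_bit)
{
    assert(bits_per_pe > 0u);
    return logical_bit / bits_per_pe;
}

/* Bit offset inside the holder PE's local memory. */
static unsigned local_offset(unsigned bits_per_pe, unsigned logical_bit)
{
    assert(bits_per_pe > 0u);
    return logical_bit % bits_per_pe;
}
```

With 16K bits per PE and groups of two, `pooled_bits(16384, 2)` gives 32768 bits per active PE, while `active_pes(1024, 2)` shows the parallelism halved to 512. A logical bit such as 20000 lives in the server PE and costs one shift to reach.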

9.2.2 Sequential Circuits

We may want to expand the fault simulators to simulate sequential instead of combinational circuits, since most industrial circuits are sequential. We could upgrade the simulators developed in this research to simulate sequential circuits with the following modifications. First, the fault simulator right now only simulates six types of gates (not even XOR gates). In order to simulate sequential circuits, we must be able to simulate D flip-flops. This involves defining fault-collapsing rules and fault-triggering rules for D flip-flops. In order to define these rules for the D flip-flops (also for XOR gates), one must decide on whether the circuit element should have a behavioural view or a structural view¹. In a behavioural view, we only need to take care of the faults at the input and output of the gate. If we view the circuit element as a block of basic

¹The two terms are taken from the VHDL hardware description language.

gates (structural view), faults at the internal wires should also be considered. The AND gates, OR gates and BUFFERs in this thesis used a structural view. Using a behavioural model for the flip-flops would allow the simple use of parallel cint variables to hold each bit of memory in the circuit. Second, we must take care of the gate evaluation order. Currently, the heap sorts according to the gate number because this number reflects the gate evaluation order. When simulating sequential circuits, we will have to come up with a scheme to sort the gates so that the gate evaluation order is correct. We may sort the gates and use the gate number in a synchronous circuit. In asynchronous circuits, timing is very important. For example, if a signal propagating to the input of a D flip-flop arrives after the edge of the clock signal, the flip-flop will not latch in the correct value. In order to simulate asynchronous circuits, we must use actual propagation delays in gates, maybe even separate wires. The event heap should then use the signal timing information for sorting. There is also more trouble in performing fault-free simulations. Clearly a synchronous sequential fault simulator would be much easier to implement than an asynchronous sequential one.
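An event heap that sorts on signal timing rather than on gate number alone might look like the following minimal sketch. It is not the heap.C module of the appendix: the names `Event`, `EventHeap`, `heap_push` and `heap_pop`, the fixed capacity, and the use of an integer arrival time with the gate number as a tie-breaker are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 64  /* hypothetical fixed capacity */

typedef struct { unsigned time; int gate_num; } Event;
typedef struct { Event ev[HEAP_CAP]; size_t count; } EventHeap;

/* Order by arrival time first; fall back to gate number on ties. */
static int earlier(Event a, Event b)
{
    return a.time < b.time || (a.time == b.time && a.gate_num < b.gate_num);
}

static void swap_events(Event *a, Event *b)
{
    Event t = *a; *a = *b; *b = t;
}

/* Insert an event and sift it up to restore the min-heap property. */
static void heap_push(EventHeap *h, unsigned time, int gate_num)
{
    assert(h->count < HEAP_CAP);
    size_t i = h->count++;
    h->ev[i].time = time;
    h->ev[i].gate_num = gate_num;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (!earlier(h->ev[i], h->ev[parent]))
            break;
        swap_events(&h->ev[i], &h->ev[parent]);
        i = parent;
    }
}

/* Remove and return the earliest event, sifting the replacement down. */
static Event heap_pop(EventHeap *h)
{
    assert(h->count > 0);
    Event top = h->ev[0];
    h->ev[0] = h->ev[--h->count];
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, min = i;
        if (left < h->count && earlier(h->ev[left], h->ev[min]))
            min = left;
        if (right < h->count && earlier(h->ev[right], h->ev[min]))
            min = right;
        if (min == i)
            break;
        swap_events(&h->ev[i], &h->ev[min]);
        i = min;
    }
    return top;
}
```

In a synchronous simulation all events could carry the same time, in which case this ordering degenerates to the gate-number ordering used by the current simulators.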

9.2.3 Other Fault Models

In this research, only stuck-at, gate delay, and the transistor stuck-open models are considered. In industry, there is considerable on-going interest in the harder problem of path-delay fault simulation. The path-delay fault model assumes that signal delays are caused by accumulated small delays on the signal propagation path. This contrasts with the gate delay fault, where the spurious delay is modelled as being lumped at the output of one gate. Fault simulation of this fault model will require more sophisticated designs such as path generation and 6-value logic simulation. Reference [9] describes a method to fault-simulate path-delay faults by parallel processing of patterns. It would be a worthwhile and likely very challenging research project to expand the C•RAM fault simulator to simulate path-delay faults.

Bibliography

[1] Y. Aimoto, T. Kimura, Y. Yabe, H. Hejuchi, Y. Nakazawa, M. Motomura, T. Koga, Y. Fujita, M. Hamada, T. Tanigawa, H. Nobusawa, and K. Koyama. A 7.68GIPS 3.84GB/s 1W Parallel Image-Processing RAM Integrating a 16Mb DRAM and 128 Processors. In 1996 IEEE International Solid-State Circuits Conference, pages 32-33, 1996.
[2] T. Blank. The MasPar MP-1 Architecture. In Proceedings of the IEEE Compcon Spring 1990, pages 20-21. IEEE, February 1990.
[3] F. Brglez and H. Fujiwara. A Neutral Netlist of 10 Combinational Benchmark Circuits and a Target Translator in FORTRAN. In 1985 International Symposium on Circuits and Systems, pages 663-698, June 1985.
[4] K. Cattell and J. C. Muzio. Synthesis of One-Dimensional Linear Hybrid Cellular Automata. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, (3):325-335, March 1996.
[5] J. Childers and P. Reinecke. SVP: Serial Video Processor. In Proceedings of IEEE 1990 Custom Integrated Circuits Conference, pages 17.3.1-17.3.4, 1990.
[6] B. F. Cockburn and A. L.-C. Kwong. Transition Maximization Techniques for Enhancing the Two-Pattern Fault Coverage of Pseudorandom Test Pattern Generators. In 1998 IEEE VLSI Test Symposium, pages 430-437, April 1998.
[7] D. G. Elliott. Computational RAM: A Memory-SIMD Hybrid. PhD thesis, University of Toronto, Dept. of Electrical Engineering, 1998.
[8] D. G. Elliott, W. M. Snelgrove, and M. Stumm. Computational RAM: A Memory-SIMD Hybrid and its Application to DSP. In IEEE 1992 Custom Integrated Circuits Conference, pages 30.6.1-30.6.4, May 1992.
[9] F. Fink, K. Fuchs, and M. H. Schulz. Robust and Nonrobust Path Delay Fault Simulation by Parallel Processing of Patterns. IEEE Transactions on Computers, 41(12):1527-1536, December 1992.
[10] M. J. Flynn. Very High-Speed Computing Systems. In Proceedings of IEEE 54:12, pages 1901-1909, December 1966.
[11] M. J. Folk and B. Zoellick. File Structures, chapter Cosequential Processing and the Sorting of the Larger Files, pages 280-284. Addison-Wesley, Reading, Mass., 1992.
[12] M. Gokhale, B. Holmes, and K. Iobst. Processing in Memory: The Terasys Massively Parallel PIM Array. Computer, pages 23-31, April 1995.
[13] D. Harel and B. Krishnamurthy. Is There Hope for Linear Time Fault Simulation? In Proceedings of FTCS 17, pages 23-33, 1987.
[14] J. P. Hayes. Rational Fault Analysis, chapter Modeling Faults in Digital Logic Circuits, pages 23-95. R. Saeks and S. R. Liberty, eds., Marcel Dekker, New York, 1977.
[15] R. A. Heaton and D. W. Blevins. BLITZEN: A VLSI Array Processing Chip. In Proceedings of IEEE 1989 Custom Integrated Circuits Conference, pages 12.1.1-12.1.5, 1989.
[16] J. L. Hennessy and D. A. Patterson. Computer Organization and Design, chapter Fallacies and Pitfalls, pages 75-101. Morgan Kaufmann Publishers, Inc., San Francisco, California, 1992.
[17] J. L. A. Hughes and E. J. McCluskey. An Analysis of the Multiple Fault Detection Capabilities of Single Stuck-at Fault Test Sets. In Proceedings of International Test Conference, pages 52-59, October 1984.
[18] N. Ishiura and S. Yajima. Dynamic Two-Dimensional Parallel Simulation Technique for High-Speed Fault Simulation on a Vector Processor. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 9(8):868-875, August 1990.
[19] M. Kurokawa, A. Hashiguchi, K. Nakamura, H. Okuda, K. Aoyama, T. Yamazaki, M. Ohki, M. Soneda, K. Seno, I. Kumata, M. Aikawa, H. Hanaki, and S. Iwase. 5.4GOPS Linear Array Architecture DSP for Video-Format Conversion. 1996 IEEE International Solid-State Circuits Conference, pages 254-255, 1996.
[20] T. M. Le, S. Panchanathan, and M. Snelgrove. Computational-RAM Implementation of Mean-Average Scaleable Vector Quantization for Real-Time Progressive Image Transmission. CCECE'96, pages 442-445, 1996.
[21] J. LeBlanc. LOCST: A Built-In Self-Test Technique. IEEE Design and Test of Computers, 1(4):46-66, November 1984.
[22] M. Maresca and T. J. Fountain. Scanning the Issue: Massively Parallel Computers. Proceedings of the IEEE, 79(4):395-402, April 1991.
[23] V. Narayanan and V. Pitchumani. Fault Simulation on Massively Parallel SIMD Machines: Algorithms, Implementations and Results. Journal of Electronic Testing: Theory and Applications, 3(1):79-92, February 1992.
[24] T. M. Niermann, W.-T. Cheng, and J. H. Patel. PROOFS: A Fast, Memory-Efficient Sequential Circuit Fault Simulator. IEEE Transactions on Computer-Aided Design, 11(2):198-207, February 1992.
[25] B. Parhami. SIMD Machines: Do They Have a Significant Future? Computer Architecture News, 23(4):19-22, September 1995.
[26] D. Parkinson, D. J. Hunt, and K. S. MacQueen. The AMT DAP 500. In Digest of Papers: COMPCON Spring 88, pages 196-199. IEEE Comput. Soc. Press, March 1988.
[27] D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and K. Yelick. A Case for Intelligent RAM. IEEE Micro, pages 34-44, March and April 1997.
[28] S. Seshu. On an Improved Diagnosis Program. IEEE Trans. on Electronic Computers, EC-14(1):76-79, February 1965.
[29] Thinking Machines Corporation. Connection Machine Model CM- Technical Summary. Thinking Machines Corporation, Cambridge, Mass., 6 edition, 1990.
[30] E. G. Ulrich and T. G. Baker. Concurrent Simulation of Nearly Identical Digital Networks. Computer, 7(4):39-44, April 1974.
[31] J. A. Waicukauski, E. B. Eichelberger, D. O. Forlenza, E. Lindbloom, and T. McCarthy. Fault Simulation for Structured VLSI. VLSI Systems Design, 6(12):20-32, December 1985.
[32] J. A. Waicukauski, E. Lindbloom, B. K. Rosen, and V. S. Iyengar. Transition Fault Simulation. IEEE Design and Test of Computers, 4(2):32-38, 1986.
[33] N. Yamashita, T. Kimura, Y. Fujita, Y. Aimoto, T. Manabe, S. Okazaki, K. Nakamura, and M. Yamashina. A 3.84 GIPS Integrated Memory Array Processor with 64 Processing Elements and a 2-Mb SRAM. IEEE Journal of Solid-State Circuits, 29(11):1336-1343, November 1994.
[34] S. Zhang. BIST Generators for Faults with Sequential Behavior. Master's thesis, University of Victoria, Dept. of ..., May 1993.

Appendix A

Source Code

Basic Modules

lac ouc,nod*. inc @u-node. ) CAC..

typedef struct Fault_struct Fault;
struct Fault_struct {
    Fault *next;       // point to next fault
    Fault *implied1;   // first implied fault
    Fault *implied2;   // second implied fault

xnc gaco-nu. chu fully. 1.

rnc 6.m-nu; // FaulC Furlj nambor tnis SC~UCLropr*soncr FaulcT typa. // type of otiocc cnir taulc futly producos. // rhrch 1s *%chor ?,SA0 or t,SAI Faulc @Pombor. // lrnk Iuc of wibrs an Fault Furly

FaulcFurly mnoxc. // noxt fault furly; /a 11 - noc axisc */ FaulcFurly *prov. // prw faulc furly. FaulcFutlj @noxcrnsu. // Ch* lrsc of faulc fully in suulacion l. cypodof scrucc FeulcCroup,rcruct FaulcCroup. rtnclude scmcc FaulcGroup~scrucc f rnc gaca-nu. FaulcFuily wFFCN.

Fwlt @TragFauIt[23. // -> ch. frrsc miber tn FF crrggorod crnc r:r%gFlag[2]. // -,a 8uk of Cho Crlgg*rod taulc inc gaco,num. Caret gara-cyp*. /* chu n,frn-xn. n-fui-ouc . inc @fan-an. // and of structure h tnc .2an-ouC. /@ rf cher* um no tui,ouc. n,f.n,ouc 1 uid tbn-auc concbrns ch. gaco-nw rf cher* AS fui-out. n-fan-out - tf.n,ouc Md fui-ouc concaras ch. llsc of fan-eut. excludlng gacr-nu r/ brs copyrrgac orcrr* msc r.nsnV rncacc - D.rriacave marks uycmcaan ddaCl~na3noclces rotd CoppxnacO . i Yhrs soic8ur cous 8rch no murrnty prrntf Y\ \aCopyrxghc cc) 1997. t998 by Albwt L -C Kmong. Lmioncon. AR. C.nwIa \ ...... \al11 rxghcs r*s*n*d \ . "). pfototyp* h chrs f;li concatns functron proCotypes ?or propar /

rnt urn irnc ugc. chu .-O) { Copyrl~hcO.

case 'S. su-irule i- 1. break. // scuck-AC case 'p' IuiPV - rc~r(ugv[ugc-t~).brarlr. *xc*rn bool ~~~&~a~lt~ro~~

// ana O? prococype n

8wPutrcron pertPuus processors / IruPY. 1 ...... 8 Puallbl Frulc Suuletors on ch* C*UP Arcnrcactur*

 * Copyright (c) 1997, 1998 by Albert L.-C. Kwong, Edmonton, Canada
 * All rights reserved
 *
 * This software may be used for non-profit university research if
 * given the author's expressed permission. An executed license
 * agreement with the author is required for all other uses of
 * this software. Redistribution of this software is not permitted
 * without the author's expressed permission.
 * This copyright notice must remain intact.
 * Derivative works may contain additional notices.
 *
 * This software comes with no warranty.
 *
 * simf.C
 * Purpose: this is the main body of a fault simulator, simf.
 *          The simulator simulates Stuck-At, Stuck-Open, and
 *          Gate-Delay faults
 * Command Line Options: See the Usage() routine
 */

int num_sa, num_d, num_so;
Count_Fault_List (fault_list, &num_sa, &num_d, &num_so);

rord Re&-ISUS (clur @filonru. Carcurt ocut) i Idotrno III-LISTJITt 1024 FILE icucuac2p ;

/a cboso tomp vua.blos bold tao saze ot ch. 1r1c at cm monnt tomp~n~~mire- romp-atm-yto - trip-nurraput - comp,naoutput 111-LIST-SIU,

prrnrt i"\n'.l . prrncf ("\n") . prrntt ("Total cti. - 11 - U - Zd 0 U Usin'-. cvu::tiu/lW0000000.0. StOPUATQ,TfüWlO00.0. srn~~~a-elilm.SXIPYAT~~,OTCUIEAD. ciu..e~./1000000000.0 SmPYltCB-TIPe/LWO-O - STOPYATQ-cirrrn SmWlTCa,OVEMEIO). prrntt ("\n'-) . roturn 0. 1

This skcuue iry bo USO~for non-protac uniirrsacy resouch C r: gavon tao aucbor.s axprosso~porirssron 4n rrocucod lrconse *priaet 'srdorr. "C.nnOc @?On ZSin". frlonama' a~oamontaith ch4 aucnor 1s roqurrod tor al: aehor usos ot oxic (1). chrs sottuu~ Iodfs:rtbutron ot CEis sotcuuo is noc ? po~lctramrcaaut tba rucaor'r ozpross*d pamrssaon This COpJrlght notrci iusc rrurn rntact /- r0.d c:rcu1c */ Dwavatrvo morks -y contarn adart ionrl noc ;cos i

This sottmuo cous rich no uur~cy buftor = (chu *) mlloc (srzoot (cnul aut,r~zoi.

road-rscrs c rnt uiro-nui. chu gato-namC331. ~t*-c~*t53. Purposo ~0.dtroi an rscu noclrst rtlo rnd produco ch* carcuit data dOSCtlb0d in structura b trlo anc nuifan-ouc . rnc nu8-frn-rn. Moto lscas ta10 fomc - 6rcn Irne doscfrbos a pioc* ot air0 or r last ot roiorincos /a pars* lrno */ vira 1 18.c rnpc 6 O SSCM~ (bWff0r. "%a xi ZX". rrronuibor am rr0,nUD. urro typo gato outpuc/tan out gaco-nue . sac* typa butt/noc/.nd/nura/orfnoffsor/.nor/troi gare-type . r ot tuiouc inulfm-out. S ot fuiin &nui_fui,rn ). - if tanout and tmrn. cbrn tanrn tarse tollou by tanout - tan out rs rbDresoncod bv a rarm rien tfP as miro tmo C tpranct (stdorr. "Lrror urro nuabor O 1s usod\n"i oxic (1). 1

i/m Add miro eo ch* mare lise */

lt (marr,nurl >= Camp-nui-miro) ( inc Roplacr-Gace

Matano kW-SIZE 256 chu .Oufior. rnc sut-sr:. - WF,SIZE. I. IO~U-ISC~~

Pro It 1s assumo chat cuc rs ripcy . lrnos in Cho crNFrc trio d0.s not oxcrod 256 chus Inpuc talenru iscas file nu* . optrons - rod optaons ( co 00 spocatrod 1 OUCPUC *CUC- crrcuit data structura fprrnrt (stdom. "CUI 't r*al1oca\n") ciic-~~ue-1rstCirro-nd.rxrst - 1. eut->irro,I tstCrrrr~no.3 troyic*-cut-mmgat*. cut->ntmJlr. - alr.-num * I. eut->rue,lrstCrrro,nd. aodo-nui -CU=->nmgat*. j

prxntt ("Puurrr nrubar of rira Id\n". cut n-ruil. prract ("Puuoi nummr ot grco Id\a". cut nui-gitol. prtnet ("Tora1 noibor ot raput ld\n". cut nui-raput). prrnct (-Total ntnsmr of oucput W\n". cuc a-output)

rnc cn. inc count.

auit*rZ - (chu *) ulloc (srxrot (chu) # but2-sttil

/a prtnt outpuc List a/ E for (me 1-0; iccut nui-output. 1-1 E printt ("pr- output Id urro td(fd)\n". 1. cur oucpu~~LiscCal. cut ulrr-Itst Ccut output-1 tstCiJ3 irai-&&toi

rnt Clchotial (Curuic ecuc) i int mucacha-O. tac curcacno=o. chu cacherof Ccut-~nru-pto3

/O cru lino a/ i xnt r. J. chu t1y o. for (14 .i

 * Copyright (c) 1997, 1998 by Albert L.-C. Kwong, Edmonton, Canada
 * All rights reserved
 *
 * This software may be used for non-profit university research if
 * given the author's expressed permission. An executed license
 * agreement with the author is required for all other uses of
 * this software. Redistribution of this software is not permitted
 * without the author's expressed permission.
 * This copyright notice must remain intact.
 * Derivative works may contain additional notices.

 * Purpose: generate faults from the structure of the circuit under
 *          test; perform fault collapsing and fault implication
 *          setup as well
 */

char mtault-tabtoCB] = ..s~~... .. sr.. .- SF.. . .. 10.. .. pc- , .-:BO- . .-lpa-). cbu *gato,r~bhClO] = {-f~py-. -WF" .-iar-.--~ro- .-BABD- :-or. -iUr .-:ai.. .-xro~~:m~u.*) .

/. 0. Pro tault,l~st 1s allocitoe ta 6 cuc-mu,urrr doLay fau!cs .ad stuck-ar faults uo inltlrlly rrrkod &r bxrrc unoreas scuck-opon faulcr uo xnrcully urkod as non-miart O tor rncrrrodrito taulrs tor ABD/OE gatbs. tbo sacond rnput pin :a riprosont ch* taule. uhich rould bo iukra as *:Chor ;PO or a . Buitorr are nor roplacra as BOT-BOT scrucruro ./ vaad Con-Fiult (Carcuit -

I. rriovo al1 ch. non oxistrn(f faulcs */

NU. - m.

- IULL. W. break.

break. braak.

FarilrPurly ~Gon,lauIt,?urly (Carcurc *cor. Faulc *t.otc,lascl <

// doosn't mark tor xor/xnor rat ccabloC91 - €0.L .O. t .O. I .O. L .O).

// oucpucs for (rnc 1-5. ]>-O . J--) i Fault etaulc - iaulc-lrst - 6*rgaca gaca-nui 1. if ('tault-~tl.g)

casa t-rPO CAS. ?-as0 casa t-10 cas* t-PO // doas not bolong CO cbrs gara bl~kk. cas. *_SA0 t-SE.

//prrntf ("td\n". tault - tault-lrscl . ~ddfafurly (iaulr. r. 1).

break; 1 1 f

// rercructuro cba llneu taulr Lrst hoadad by taule-hoad anc tartruct,Faulc_L~st (Faulc i*taulc-haad. Fault *trult-lrscJ f Faulc *CU. -prav. F~ULCtmip-saaa . tnt couac-O.

// tnpucr tor (rnc )=O. jccgaco n-tu-an. j*-j

// add the IC ad PO troi chts vari tor (rnc k-L. k>-O, k--b //pruief ('-Z$\n". taule - frulr, AddToFurly (fuit . 1. nceabl*ctgat* gat*-typI) . 1

//prrnct ("td\n". taule - taule, uuoc (couac~~~ay.O. surof (me 1 81 AddToFurly (faule. t . ctab1rCegat. .gat*,typ*3 . 3 Get-Count (o-sa. a-d. a-50). 1

votd Count,Fmzlt,Llst

cas* i-rPü for (1-0. t<6*nru_mrr*. a*+) casa r-&ma rt t-tauit-lrstCr3 ?lqJ counc,.rr&yCiault~l~scC~ ~fp.1~. // ilrmady taken cul of break. cas* t-SA0 CU* f,Slt. CU* 1-10. C //prratf <"Zd\n". taule - taule-1 trc) . AddTofurly (Zaulc. t. *ct&bi~Cegac~.gac~-cyp~J). 1 srltch (hw.1) i cas* t,Shl car*for !(1-0. j26. J*-i ?or (1-0. r break.

crrrtor 2 (]Mi 3<6. j+*b for (1-0. ~

C

prtnrt ("Zstfd) - ". taulc~ea~I~Ccueyp*J . CU uir*-nud tf (CU flag -- id,iJXDETECïU)i for Cint 1.0. j<1. 1-1 prlnci ("Und*caccrd "1. Lf (CU flag " id-eOUIVhL) Faulc -CU. pruiet ('-Cotlaprod "1 . tor (CU fiC(icc~)*jl ihmeer. CU '0 tüU. car CU->a*xt) it (CUZly =- io-DETECtEo) i prxnct ("D.Cactod "1. tt (CU flag m- td,IRPLlED) prrncf ("Dat*cc*d by Implicacton ") . prtnet ("(k)".cu thg) . prtnti ("\n" 1 . 1 > roturn Ci. or*&. i cas* 3. // drautrq* i iorb Pruic,Faule~Lr~c(Paule -tault,lrst. Faulc *trulc-hoad) tnt nkoac - 0. i Faulc *t&ult-bsad - NLL. int 1. Fwle *CU. rnt n-O. p-O; Iistruct,Fault,Lxrt (ttault,h*ad. taule-lirt). Faulc *CU. for (CU iaule-rima. cu a- WLL. CU CU-m*xc) c

7. prlnef ("lodm itd\n'- .*-nioor . Drarloom (CU. O). k ? for (CU - tault,boul. CU *- MU. car cu->n*zt~ f c~>?l46t- !td,DtBVG. // chu eh* dabuggrng ilag 1 &met ("~otaï aoibr oi ~oocs Xd\n'.. niaoc). %nt ara. nso. ad; Got-Count Unsa. hd. Lnsa) . prrntf ("Tocai nusbar of iaulc td\n". asa*nd*nso). prtntf ("Pumur ttficr*ncy - f3 .lttZ\n". nloot*iOO O/(nsa+nd*nro)). k braak. for (tac r=O. rccur-~nolourpur. r-) C tac troqat. cuc-~urro,lirc Ccoc-~rucpat,Irst Cd3 frai-gato.

for Ctnt 110: i

 * Parallel Fault Simulators on the C*RAM Architecture

Copyrtghc (cl 1997. L998 by Albort L -C Iraq. Ed8onCon. Crnaâa il1 rrgars rosorroa Thrs so?cuuo uy 00 usod ior non-profrc univorsrcy rosouch ri grvin Cho auchor'? oxprossod poaaissron An oxocucod lrcoaso agriounr wsch ch. auchor rs roqurrod ior a11 ochor uros ot :hi1 SO?CYUO Iodiscributran of chrs sofcwuo rs noc rora Aaafofutl~(Piult -taule. inc gara-nui. mc cypo) pOr8lCtOd Yltho~CCh0 a~ChOt*lb~pr0SSOd pOZ.tSSlOn i This copyrrgac nocrco iust reurn ncrct int rador 9 (gara-nu8 1) cype. Oorlvacivo rocks uy concran oddicroaal noclcos Thrs sotruuo cous uxch no uurancy

. tiulc,turIy C . Purposo a soc nt roucrnos for drdrng urch taulc furlros Coipilo Opcrons princ? ("\n\n"J. FPSDFS anab10 dopin txrsr souch taulc grouping */ - 1 ./

//prmrt ("Truirtoriag trom Xd Co Id\n". tram-gare-nu. Ce-pro-nui)

chu -cbocklrsc. rnc idfs.

//////#'///II // hcursrio Doprh Frrsc Soucn //////////// rnc L-DFS(Circurr *cor. inc gico-nui. rnr mdox) <

FaulcPurly ~GocFaulcFuilyO { Faultfuity boad. FaulcFurly *CU - thoad. rnc counr O. 8a2dai FPSDFâ // construct the OB sort06 IaulCGroup Lrrc FaultQroup mcu - &On-boa: CaP>dfl-pr*l - ItlLL: for (me 1-0: r~cuf-~~~~~~~o.a-*) int counc - 9. f FauLcFu~ly *CU.

tor (Fault CUCU - tiCrndod.Poiber. cucu *- m. cucu cucu->aexcl < - ii ( !cucu->ilqJ DrarIodo (cucu.0) localcount-. > prlncf (.'Faulcs an chxs turly = Zd\n\n" . locrlcount) . counc loc~count. > -- 1

tor (rnc 1-0. rgnw,gate. 1--r f

printf ("total number of faults in Family = %d\n", count);
}

//////////////////////////////
//
// FaultGroup
//
// when fault group is used, the link list feature in the
// fault family implementation can be ignored, because the
// link-list feature of FaultGroup will take over
//
// each FaultGroup contains two fault families. each fault
// family contains a list of faults as members. When all
// the members of a fault family are detected, the pointer
// in the FaultGroup to this family will point to NULL
// When both of these pointers are NULL in a FaultGroup,
// the FaultGroup should be dropped from the DFS sorted list
//

return g_n_fg;
}
// end of fault_family.C

// roturn the adOrosr ot DFS-bau! FaultGroup ~InrtFaultüroup(Carcur~ecuc. Fault *tault,Iasc) <

mm**..m.m.*e..*.**mm****m~**m***m

 * Parallel Fault Simulators on the C*RAM Architecture

Copjrtghc (cl 1997. 1998 by Albort L i: Iran&. W8oncon. Canada xf (bo.p,colrric - 0) rotura -L.

 * heap.h: reference to the heap.C module
 */

w

 ******************************************************************
 * Parallel Fault Simulators on the C*RAM Architecture

 * Copyright (c) 1997, 1998 by Albert L.-C. Kwong, Edmonton, Canada
 * All rights reserved
 *
 * This software may be used for non-profit university research if
 * given the author's expressed permission. An executed license
 * agreement with the author is required for all other uses of
 * this software. Redistribution of this software is not permitted
 * without the author's expressed permission.
 * This copyright notice must remain intact.
 * Derivative works may contain additional notices.

cane inur . > k 1 ?

inc CU. int top-boap0 for (CU - O. CU hoap-caunt. CU-+) i princi Ce-%& " . heapCcu3 ) . rrturn brbp,count*hoap [O] -1. printt ("\n") . 3 1

trip - hoapipuinc]. hrapCpuonc3 - hoapCcu] . // suply rhotron the hoap by roturning th. last va130 ...... ///If . // &n,par 1s ch. 8lub.r 02 patterns in ruitlacaoa. mhrch is 4 paior of 2 . Pualhl fault Suulacors on ch. C.UP Archxtocturr ///If CopyrtghC (cl :9W. 199a by Alb4rr L -C Inon&. Cdiincon. Cumdr Ill rlpts ?osorriid. Thts softrui my bo usaa tor non-protir untiorsrc~risouch ri grvan ch. auchor's @XpKOSS.d pormasron In oxocueod Ltconro yo.unc rrth ch* aocbor as raqurrod for al1 ocber uses OS cars sotcru. lodrstrxbucron ot cars sotruur :s aoc porirctad mathoue Cho auchor's axprossod pnlssron This coprrxghr nocrcr ausc rouro larace . O.riiacrio ions ul conrrrn adbxrioaal noricas 0 Tbrs SoitmUo cous irch rio ruririt? ivar1 W. : ...... *...... *...... c*ip-nodb IIILL.

/a cruyppor C ./ axcorn iord Inrc-CUR-Pappar (Cucurc *ac. tnc *nodo,ialu.l. oxtarn iold fnrc-CM&-Pappor (CxrcurL ~CUC. CIIIC .nod.-vdue) oxconi io rd Fra.-411-Yod* O. ortarn c8oohrn ~Obcaxn,Iodo (&nt nodol. oscorn cloo1o.n ~0bcaxa,Yoda~r~Lo.d~(inc nado). axtorn cboo1o.n .Obrain-Yod@-i-uppor (rne nodo) . *xcrrn cbooloiri *USO-fOdO (rnt noda). axcorn vo rd Caco,Era1(Crrcurt *cor. xnc node).

ît (tirma t L) Qacr.opotaro (0. op-ru. pouprrito~. *Zr* aara.oparato (0. op,zero. FOUPUTIC.). Iondli 3 ///II///// // rt (tomp,nodo) // follairng as a soc ot pcos dosipod Co usa Chr cru-mappor modulo fi~p,nodo->prov curronc. // rn areor CO utsort faults darang SuuIacron. a cio scop gaco romp-nodo - curronc. // iriluacron procosi ri noodod // on* for updatrng no-1 pros. and on* for msorr frulf rocwn cumnt. / /

brai1 = currint.

ml rno rsrd g-uid (Crrcurc ecuc .rat J) E eboo han *o~t,nodo - Obcrrn,Iodo (clrt-~gito,lrs~Cj] . out-nodo) :

cbooloan l an-nodo Usa-ledo (cuc-~gaco,lrsc Cjl in-nodo CO] 1 . an-nodo-~oporaco (0.0p1.ur rcox) .

rl tcurron: -- NUI rocurn IRILL. .Ir* rocurn tcurronc->data. k

:f (currinc NLL) E -- curronc alLoc,Iodi (nodo) . l ~b00lOui *in-nad0 - U~O-lod* (CUC->~&CO,~~SCCJ]. ln-nodOCO] 1 . rn,nod~-Bop4rrco (O. op-m. rrrtoxl . eurrir.t-gito-nu - nodo. rocurn tcurronr->data. >

rnlxno iord 6-or (Ctrcurc rcuc.rnc j) Inlia. void exor (Crrcurc *cuc. inc 1) i

raid G*ti,E~ai (Circure *cat. rnr 11 f SUlCCII f CAS. g-rnpc (cuc .]). braak. /* th. iollouxng vu1abl.s u* usad for transition iaulcs *I cas. g,buti

Simulator

/ Inac,Su a

@ Inaclalrza ch. suulacion Pua1111 fiult Suulacors on the CaUP lrchlc*ccur. 1 / vo rd hic-Su (Cxrcuic acuc. ;ne n,patc.rn)

Purposa Thxs irlo concains ch. roucanrs ior prrioriang iault pactorn - (chu a) ulloc (IdI l g-n-input sxzoof (chu) 1. s uulacxon us ang ch. PPSFP algorxchi. *v.nc-llsc - (chu *) ulloc (cuc-Wtru-gatm). / faulc-Su

Th ina ratorfbco rt chis modulo. CdLlng thrs rouCLao mould cause Che progru to stut robdrng ra COSC p.CCoras md ur-p Cho tiulcs ?or s~u1iti0zt * Tbis routine uplenncs ch. requoacaal PPSFP aîgorltbi.

/ Ir&-Pbrtern

1.d IumPV pactrrns ira8 tnpuc .na score cboi into tao prtfoni data stru~curo uspiptton us- cbe inpuc pattern longea rs corrocc ad does noc concira bar obrco spue =/ inc Irbd,Pictorn (Cucuit *CU+) c lac 1. int ch. tnc counc-char-0; rnr counc,pattornr

/* scoro cbo provt~usg00d pattern raluo a/ ?or Ci-0. ;

1: t:l? up ch pactorn scorigo usa counr-patcern ccunc-char - O.

:t (stacur =- 4 II couac-pattern &n,anput '- COunC,ChUI E printt (Trror roading inpur pactrrns Zd Za\n". counc-pactrrn g-n,iapuc. count-cau) . printf (" duo co "1. sutcch (stacusJ f case 1 prxnc? ("no pactera loit'\n"l. Brod. car* 2 prxncf ("g~tenaugh paccerns*\n"). broak. cno 3 prxnc? (-&oc enougl cbuiccors (ibnorul)'\n"l. brerL. case 4 prlnct ("noc rnough bits ln lrno Xd'\n".counc-plctorn) . broak. 1 rxre (0). 1

prrnct ("total nuiber of gato eialulacion - td (Id PV)\n'-. (&n-irulc,6aro-eval * gn,good,gico,o~aI) MdV. luiPPI . princ? ("coca1 number O? gares in circuit td\n". cur-mudacoi. prlnct ("niuber or gara eviulicion per pus t?\n". 6-n~iiulc,gace,oval f O / g,n,rocal-passj. prtntt ("roduccion an porconca6e using hoap - UXX\n". ( (coc-*n-gatr - for (:=counc,pactorn. rcIuiPV. r*-) (&n-iault-gace-oral 1 O / &n-cosal-pus) 1 * tOO O i / cuc->aru,grce) 1 . //PIIITEUICX. /. Cbock-Paule . Ch-ch ch* crxggorrag condrcion ot a iault ./ Tord Cnock-Fault (Carcuir meut. FauIt -?auLt-boadl i

char run=0; int start_gate; int k;

int temp; int flag; int transition[NumPV/INT_SIZE];

///// // setup wire value for shifting ///// int shifted;

///// // setup wire value /////

/.mmma.~~aa*..aa..*.. 4 10 ~~uLCrs crrggarrd ri chara 1s a tr.nsrt1an rr /ai.eaimimiima...emii cae pcr aucput and ch* iaulty Input rs 1 crrggorod rt chor* 1s a 1->O trusxtlon amma. mmm..iam.amr.../ an incoma4arti 10 ai &ID pti rs tba sua as SF rc sucpuc cas* t-IO am.mma.aa..m..mmm.a./ i case f-SF cas. t-JO i tnt camp. rnt ilig. rnc truis~croatIuiPIT/IIT~SIZ~.

///// // sacup go04 mira for shricrng ///// ///Il II socup miro raluo /////

ri (f1.g) i if ( vcut-~g~co,L~ccirom-~ACOJ n-tan-out 1 f Sridof AU#UIT /tg-cr~ggorod**. tondaf //priacf ("dot~cCod*'\n"l. tbult->ilag 1- Id-DETECïED. > .LI* C Na-1. a? (CO-6at.J i for (PO. k~luiPP/1lT,SItE . k--) i YIKPALE(cur->nu.-gaf*.k) = COOD~YAL~(?bulc~nodo.X~CL~cr~sic1ouû3) . i > 61s. i tor (k-0. k~YuiPV/IlT-SIZE . k**)

///// // porpuo tan-out wrros. stuf,gaco ///// ///// // sotup gato-output tor rhrfcing znt r. /////

///// // wire has fan-out

///// for Cr-1 . tzcut->gac~,lrscprom-gaco] n-fan,ouC. 1-0)

///// // wire has no fan-out /////

inr uct:. ir (uert Bad-Su (cuc. scur-gacoii f

tor (;ne k-O. k~YuiPt/IIT,S1ZE . k--3 uI~l~C&cuc-~gaco,LtrtCjJ out-nodo. k) -- 8rib.t 4- uI~i~(&c~c-~ato,l~SCt]l rn-nodr Cl] .kl . // // trnd th. ttrst Pt chac dococts tbo faulc // PL numbor stucs mttb x ln chas cas. / / Mt Co*: ?or (k-û. W~mFT/I~-SIZ&.k-) < af (tq- (MI~TUUtCeuc-~g~co~1rscCjl ouc,nodo.k)- COa0,lUCcut-Bgata-1rsCCjI .ouc,nodo.k) )) broJ; 1 ?or (1-1. l~-ItT~IzE;1-) ri <> (IIt,SIZE-l)) R 1) broak.
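The scan for the first pattern that detects the fault, which the comments in the fragment above describe, can be sketched as follows: the good and faulty output words are XORed word by word, and the first non-zero word is then scanned bit by bit. This is only an illustration under stated assumptions. The function name, the 32-bit word width and the bit order (pattern 1 of each word in the most significant bit) are assumptions, not details confirmed by the listing.

```c
#include <stddef.h>
#include <stdint.h>

#define INT_SIZE 32  /* assumed word width in bits */

/* good[] and faulty[] hold one output node's values, INT_SIZE patterns
 * per word. Assumed bit order: pattern 1 of a word is its most
 * significant bit. Returns the 1-based number of the first pattern
 * whose faulty value differs from the good value, or 0 if none does. */
static int first_detecting_pattern(const uint32_t *good,
                                   const uint32_t *faulty,
                                   size_t n_words)
{
    for (size_t k = 0; k < n_words; k++) {
        uint32_t diff = good[k] ^ faulty[k];   /* 1 bits mark detections */
        if (diff == 0)
            continue;                          /* no detection in this word */
        for (int l = 1; l <= INT_SIZE; l++) {
            if ((diff >> (INT_SIZE - l)) & 1u)
                return (int)(k * INT_SIZE) + l;
        }
    }
    return 0;
}
```

Skipping whole words whose XOR is zero keeps the bit-level scan limited to one word per fault.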

/O tor (mc &-O; &~luiP1/IrZ,SIZt; .Le) i pranct ("tOlx".POPD,IALUE(cuc-Dgaca-lasttjl. out-nodo .ak) ' YI~l~~(cuc-~gaco,~tstCj]nuC,nodo. J) 1 ; stuc,gaco - ch. gato to rtut frm > princt ("\n"1 . procondLc aoas princt (" J - Zd. k - Xd. 1 - U -D dococcod - td\n" ?or dalay taulcs. suico (11 ch. ?in-ouc 02 a uuo ml11 brrr cl* J . k. 1. aOCOcCbd). SU. taulc valuor. choso mues should a11 bo sot to ch. tulty / valu0 b.?oro cn1s s~~lacaonroucaao is cd1.d mlso dococtod r J. rocun cbb gaco tbat dococc~sth6 fault.0 rt taulc as noc dococced gogo ond,ûM3u. Sondd 1 1 1 rnt t.j.k.1. /. for lrop councors */ rnc dococtid 0. inc drrcy,nodo~cuc-)nulgacoi inc oionc-done - 0. // clou th. miro-vr1uo for noxc cm. ror (]=O. jcov.nc-dono. je-1 C

/*
 * Good_Sim
 * preconditions: for delay faults, since all the fan-out of a wire will have the
 * same fault values, those wires should all be set to the faulty
 * value before this simulation routine is called
 * if faulty sim - return 1 if fault is detected,
 *                 0 if fault is not detected
 */
int Good_Sim (Circuit *cut)

int i,j,k,l; /* for loop counters */

/* start going thru the circuit */ for (i=0; j ) else
///// // print output /////

A.3 Pattern-Parallel Fault Simulator

/*
 * Parallel Fault Simulators on the C•RAM Architecture
 * Copyright (c) 1997, 1998 by Albert L.-C. Kwong, Edmonton, Canada
 * All rights reserved. This software may be used for non-profit university research
 * if given the author's expressed permission. An executed license agreement with the
 * author is required for all other uses of this software. Redistribution of this
 * software is not permitted without the author's expressed permission. This copyright
 * notice must remain intact. Derivative works may contain additional notices.
 * This software comes with no warranty.
 */

/*
 * simulate-pps.C
 * Purpose: This file contains the routines for performing pattern-parallel fault simulation
 * Compile Options: PEUTIL - measure PE utilization ratio
 */

Sdot:no STALMIL rcop,cuo - count-PE cm*. Wotrno UDüTIL utrl-tua *- count-PE cuo - stop-tu.. randri

// socup y as ~sk paccorn-*oporato(O. op-on*. urrcoyi . cane mpaccrrn. paccorn->aparatotO. op-,. rhtftraght) ctnc -gona-valu* crnc @urro-ralu*

ri (ch 'b') E -- //prlnti ("count-pattorn 'Ld\n'..counc-parcorn1

/* function prototypes */
void Check_Fault (Circuit *cut, Fault *fault_head);
void Mark_detected (Fault *fault);
int Good_Sim (Circuit *cut);
int Bad_Sim (Circuit *cut, int start_gate);
void cintcpy (cint &a, cint &b, int offset);

// arrwrng a bits - b bacs. cpy trm b CO a thlano Tord canccpy (crnc ta. crnc ab. rnt offsoc) C ri (ch -- '1 '1 partorn->oporatb (0. op-on*. rrrrrx) . 01s. patrirn->op*racr (O. op-taro . rrrrox) . parcorn->opbric* Ccount-chu--26-n-rnpuc. Ozr2. groupmrrt.1 >

/*
 * Init_Sim
 * Initialize the simulation

'/ orwk.speraco (O. op,?. pouprrrtoi : II y mbu

ro tan. 1

1. l Càock,tault .* Cbocks en. trrggorxng condrtxon ot a tault */ roi4 Cbock-Faulc (Circuxt *tut. Fault *fault,boull /m~aaaaaaa~aa~~r~mair C trlggorod ri thora 1s a O->l cruislclan Faulc atault. M ancemrdtarr W O? 01 gacb AS ch. sama as 511 at output ..*****a**m.a**.mm.*/ cas. t,Sl cas. r-1PO { i/f// // cnbck it taule 1s crllgorbd If/:/

or-ust oporara (0. op-B. urrroxJ.

chu run-0. bool0.n il.& cru.. mt stut-pco. good-value->opbraco (taulr,node. op-m. shtirrlghc). rat k. good,valuo->oporato (tault,nodo. 0x20. wrltox) . // x y0.r m 6ood,valur->oporatb (tault,oodo. op-rbu. bu sac cos^. tly).

// x trmriclon usk

rxtdbt PMIL STlmmIL. couac-Pf oporare (O. op-r. poupurxco). UOUTIL. rbndrt

///// // setup wire value /////

rrtabt PMIL STAUUTI L . goo4,talur->opbraco (taule-nodo. op-.. urieoxi. count,Pf oporaco (O. op-r. poupurltbi. UOVTIL. rondtt

/mri~amra~*mmrmmram.i trtggrrod lt chiri as a 1->O cruisxclon ui rntorirdrato IO 02 MD gara as ch* SU* as IF aC output *a**.*.*******.***.n/ cuo f-SF CU* r,110 c ///// // chock a2 taulc as trx68srod ///// or-usk oparate (0. op-. urtcex).

///// // setup wire value ///// ///// // setup wire value /////

1 1 break.

boalean ilag - true. good,value-Doperate (gacr,oucpuc. op-m. ShrZtrrghC). gaod-valur- oper race (gace,oucpuc. 0x60. mrrtex). // rL(y-i) good-value-Doperace (taule-node. op,xurdr. rrttax) . good-value-waprac4 (taule-nodr. op-xbu. busaccesr. ?la().

Sitd*t PMIL STAlllUtIL. couns-Pt operace (0. op-x. &roupurat4). UDUTIL. Eendrf

int csmp cut->urr*-l~scCcuc-~grce~IiscCfrorgrce1 fui-oucC~I co-&.ce. stuc-gare - KfS (sCuC,g&ce. foql . rrw~c-lxsc[ceip] = 1. ad-heap (ceapl . 1

///// // wire has no fan-out /////

/.a..~.**.m....*.~..i r PO trulc is crxggrred Li Caere 1s a cransrcian ac the gare oucput urd the iaulcy rnpuc rs O *m. *.*..m.....*....*/ case t-PO cua &tYPf &rapc(j). broak. Cu* &mg,buIi< J) : brarir. case &BOT enoc (1). broak. CU* gUO &.nd (1). brmak. casa GIAID &nUld(~). break: cas* g-OU CU* &lot cas* g-XOL gsor (JI. break. -. Cu* gJ10.. CX~O~(J) . are&. p.t~*rn->op.ra~r ig~cat-~grt~-~rsc~jItnput,nrti.op,i.rritix~. 1 uaro,valu*->oprraco (6-cuc->gac*,l rsc CjJ out-noab .op-x ,~oupmrrr.). ? k /* inlrn* vol& r-buft (rut J) Bad-Su f O mrrm,irlua->oporarr (g-cuc-~gara-~rscCj] au-nod~tOI.ep,i.rracax~. * scuc,gatr - ch* gaga CO serre troi iu~,raiur->oporacr (~-cuc-~gac~,lrsccl1 ouc-nodr .op-. . grsuprrrt* ) . 1 Pr*CQUdttlOllJ for delq iaulrs. srnc* dl cba tan-out of a 8-0 mil1 baie cb* rnlrno iord 6-uoc (rnt JI * sur frult values. cher* uirrs should al1 p. sac Co th* taulry i talu. ùator* 081s smula~f~nrouttno 1s calad urro,vrlu*->rperico (g~cuc-Dgat~,lrrc~]~an-nodo CO]. op~.mrrcor1. mum,valuo->op*ricr tg-cnc->gar*,lrscCj] o~c~node.ap~xbu,goupurrc~)trocrun ch* gara that detacCas th* iaulc.0 ri fruit 1s not d*cmccad 1 ./ rnlma tead brnd (rnt J) rnt 8.d-Sm (Crrcuxc -eut. rnc stut-grcml i u~rr-~alu*-~oprrac~(g-cor-~gace-1 rsc Cl1 rn-noaa LOI. op-r.urrcrx1. rnc 1.3. /* ior loop councmr# */ inc d*toccod O. 1nc 1. ter (1-1. l~g-cut->gacb-lrst~j]n-*=-ln. ?*-) siid*t PMI; utr*-ialur->oporatr STAUMIL. ((-~ut-~gac~,l:stCI] rn,nod*Clj .op,x.adi. uricax) . cDae10u1 dacocc,Pf. aacacc-PE oparata (0. op-xoro. groupirxto) . *ara-iaIur->op*rrte (6-cut-*gare-LrstC~l out,nad*.op-r.groupurac~). UDVIIL. i Iond if rnltnr ioid &-nue (nt J) rnc earty,nodo [eut-~nui,gacrJ. i roc rirnc,don* - 0: irro-ialua->op*racr ~g~cuc-~gic~~l~scC~lan-noar (03. ap,i.uratox). 8hih ((j - pop,b*ap< 1) >- 01 rnc 1. C ior (1-1. lcg-cut->gara-lrsc CJ] u-fin-ru. le-) ulr*,valu*->op*rrco Cg-cuc->grcm-l tsrCj1 in-node Cl1 .~p-x~d~.rrrc~xJ

///// // check for changes in output value /////

rnc 1. for (lit. 1~~cuc-~pc~,l~rc(31 . n-fan-ru . le-1 urra-value->oparate // rdd eimc for turchrr propagacron (g-cut->gato-l astlj]. ru-node Cl1 .op-xori.mra~ox) . /////

rnc nmc* cuc->urro-lrsc [ CUC->(AC*,~~SCCJ~ ~M-OUC Cg 1 COdAC* :

ant 1. for (1.1. LgataJ rstcj] in,nod*ElI .op,rrori. urlcex) . cboo Koiii tomp2. cboolem uusk. doc.ct,PE oporato (O. op-8. rrrrry) . cospl oporaro (0. op-zrro . grouprrrco) . uiuk oporare (0. op-on. . groupmrrCoi . for (!nt k=(PEld. brts-11. k>*O. k-1 f boalo~lltlag - %rua. PEra.oporaco (k. op-a. mrrtox) . PL14 oparaci (k. op-ybu ! op-x. busacciss. t1.g).

// ln louer brlf nask oporato (0. op-xbu t opr. groupmrrco). cripi oparacm (O. op-y t op-xbu. wrrtoy). 1 011. f rurk oprraco [O. op-rbU t op-m. urrcox). romp2 oporaco (O. cp,m I op-x. groupur:col. 1 i tmp2 oporiro (C. op-.. shaffrr6ht) . // anclud. dococc:on counc-Pt oporare (O. op,y L op-i. MrlCOrJ. // crlggorrngs or-usk oprraco (0. opr t op-1. UrXCOxI. count-PE oporaco (0. op-x. grouprrrc~i. STAlTCUiCX . > 8tind.i PMIL OIS. Pl1lT,PMCESS. f 8olso or_ysk oporaco (O. op-=. wr:coxJ. prtncf ("Xld Il5 9f Zf\n". counc,PE. oporaco (0. up-m t op-x. ~oupurico). g,tocrl,p.scora - g-paccorn,l.fC. 1 Ccounc-P€.cu. - utrl~tu~)/10ûOQ00000. usaiul-PI 100 O / (usod-PE p.riPuus procossorsi 1. Iondri .

/*
 * Good_Sim
 * preconditions: for delay faults, since all the fan-out of a wire will have the
 * same fault values, those wires should all be set to the faulty
 * value before this simulation routine is called
 * if faulty sim - return 1 if fault is detected,
 *                 0 if fault is not detected
 */
int Good_Sim (Circuit *cut)
{
int i,j,k,l; /* for loop counters */

/* start going thru the circuit */

A.4 Fault-Parallel Fault Simulator

 * All rights reserved. This software may be used for non-profit university research
 * if given the author's expressed permission. An executed license agreement with the
 * author is required for all other uses of this software. Redistribution of this
 * software is not permitted without the author's expressed permission. This copyright
 * notice must remain intact. Derivative works may contain additional notices.

 * Purpose: This file contains the routines for performing fault-parallel fault simulation
 * Compile Options: PEUTIL - measure PE utilization ratio
 *                  FPSDFS - uses the depth first search algorithm to group faults

Sinclod* "ScnJCtur. a+- ~tacxudo-procotyp .a- Srncludr "cru-mmppr b" Srncludo "hoap a' rdottao op,y.adrbu op,^ t op-mu tdo2rno op,xburadm op-xbu t op-i idotrno op-xbuom op-rsu L op,=

I/ // eh. foltorrng gaco rvaluctoa rouirnos uouird

/*
 * Init_Sim
 * Initialize the simulation
 */
void Init_Sim (Circuit *cut, int n_pattern)

faulr-irro oporrto (O. op-on. . sbt2tright) . rnlino inc rsd-nor (Circuic mcut. rnc gara) taule-trrr oporara (O. op-ybu. irteoy). // y ! 10000111 i taule-tror oporatr (0. op-y. pouprrrtoi. mc valur - nodo-va1uoC~uc-,gatr-1 1rcbaf03 la-nodrCO]]. tor (rnc J-1. J~CUC->a~o~lrscCgrto1 . n-tao-rn . j- t Sridof PMIL VA~UO I nodo-valu.~cuc-~gAro~~1st katel in-nodoCj33 . us02u1,PE - usod-PE - 0. rrcurn -value : I@ndt2. 1

i rnc valu. - nodo-value Ccoc->gato-lisf fgaf 01 ra,neddOl]. for <%nt J-1 j ito-lrrr~acr] n-fan-in. J-1 val=. *- aiao-vrtuotcir-~garo-lrsrrcrat*~.in-n~dOCj]]. rmcurn *duo. f Cu. çm CU* &IOI . CU* &MD CU. &lm CU. 6-01 cur @OR CU* gJ0t cur &Xi01 1

/*
 * Mark_detected
 * marks a tree of faults as detected by implication recursively

/: Ic rs 4SSW.d cnac ch. taulr 2uali.s consains ic iast on* 10viL ot JI gac. A iulcrpl. love1 taulc tuily rrquaros more co8plrcat.d /I progrunlog. unrcn uplrrs more tua as a..d for il checkmg 0Os.mabi1icy inlrn. boahan Chckobs

anc iaulc,valur.

cas. g-BUPF cas. gJOT cas. QAID cari 6-IAID cas* &OL cas. (1-101 cas. g-xar cas. g-xlol >

CU* t-aPO FaultFurly *brut. *pop,cu. Fa~lcFurlygoad - CU. LEC en. mlmo Ir~ov~FaultfroiPurly(Cucurc mcuc. FaultFurly *car) c E4Elf *.CU - NU. Eault ~ICU9 cu->O*lb.f; cu->O*.b.r - NLL.

inlano bool*ra tolaC.11 (mc -Pf_nru. cbooloui *collocc) c stactc crnc *trip nrr cint(1. ALIGOED). boolorn ?hg. mc Phnurbor = *PLniu.

CO 1 loct-BuparaCa (0. op-ybu I op_ibu. busaccbsr . ?hg) . ri ('ilagi c

// ch*ck rf iny more PE contaans a L tlag - cru*. coll4Ct->Op4t&tO (O. op,=. irltry,. collocr->oporrtr (0. op-ybu. buraccoss. ?hg). ft (?lagb

// brgrn bin- Sruch PLnuiBor - 0. ?or (xnc i - (PCad.btcs-1). & >- O. 1-1 i tlrg - cru*. // cb*ck 10m.r brli ot Pfr PEad oporac* (1. op-ybu I op,& Busacc*rr. rlq).
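The binary search that this fragment begins can be pictured with a plain C model. The sketch assumes the only global feedback available is a single "does any PE respond?" query, modelled here by `any_active`, and it narrows the candidate set one ID bit at a time, always preferring the lower half. The array-based model, the names `any_active` and `first_flagged_pe`, and the PE count are hypothetical; the real listing issues `operate` calls on C•RAM registers instead.

```c
#include <stdbool.h>

#define N_PE    16  /* hypothetical number of PEs (a power of two) */
#define ID_BITS 4   /* log2(N_PE) */

/* Models a wired-OR bus query: is any PE in the given set active? */
static bool any_active(const bool active[N_PE])
{
    for (int pe = 0; pe < N_PE; pe++)
        if (active[pe])
            return true;
    return false;
}

/* Returns the lowest-numbered PE whose flag is set, or -1 if none is.
 * Uses ID_BITS + 1 global queries instead of polling every PE. */
static int first_flagged_pe(const bool flag[N_PE])
{
    bool cand[N_PE], lower[N_PE];
    for (int pe = 0; pe < N_PE; pe++)
        cand[pe] = flag[pe];
    if (!any_active(cand))
        return -1;

    int pe_number = 0;
    for (int k = ID_BITS - 1; k >= 0; k--) {
        /* candidates whose ID bit k is 0 form the lower half */
        for (int pe = 0; pe < N_PE; pe++)
            lower[pe] = cand[pe] && !((pe >> k) & 1);
        if (any_active(lower)) {
            for (int pe = 0; pe < N_PE; pe++)
                cand[pe] = lower[pe];
        } else {
            /* nothing in the lower half: keep the upper half, set bit k */
            for (int pe = 0; pe < N_PE; pe++)
                cand[pe] = cand[pe] && ((pe >> k) & 1);
            pe_number |= 1 << k;
        }
    }
    return pe_number;
}
```

The number of global queries grows with the number of ID bits rather than with the number of PEs, which is why a search of this shape suits a massively parallel array.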

//////////////////////////////////////////////////////////// // rhrs rouctnr rorcs ch. trrst p.riPuma .procassors sr2tt-boc4 // rlauncs rn rat* rialuacroa ordrr *bll. Cmuff) // colloct al1 cbr pr- oocput aoais raco one cboa1e.n C Iti4.2 PMIL ru*d,PE+-. londti ter (xnc PO. aecuc->aoroucpuc. 2-) i me ouc-nado - cut->~iri-ïrst[cuc-,ruspucf 1x1s Et33 tr084.C..

// socop y-rogrscor cboolo~~cokea. frulc-trrr.oporaco (0. op-i. *rrtoy); rokoa oparace (O. op-y. ~oupmrxtol.

// note that in a pattern parallel case,
// the PEs that simulated the same faults
// should also collect their result into one PE.

// after collecting, extract the list of PEs that are detected
// search for the fault family that that PE represents.
// check individual members of this family for triggering condition
FaultFamily *cur = sim_ff_head->nextinsim;
int counter = 0;
int count_ff_triggered = 0;

rokin.oporace (O. op-m. rrlroy). currant->oporata (0. op-y - op,.. gouprrrcoi. currrnt-,operarr (O. op-y. rbtftrlghtl . cokon optaro (0. ap,y. poupmrrto) . CU CU->noatmsu. > -

// no fault insertion, just simulation.
// so determine fault propagation

STAKTCWCK . ri (noam-vrluaCj3 t 1) for (, countar PE-nuibor. countrr--) CU - car->nrxcrnsu. curront->opbraco (O. op-B. busrccosr. flyi. olso corront->oporiCo (0. opsbu. busaccosa. tlyl.

II update horp rf (propyatoi C for (xnc k-O, kccuc->yro-lircCji n-tu-eue. k-1 i DOi~DtûPPIiG for (. councor < iuff. countor*) CU = car->noxcanszi. s u-tt,hoad->noxr rnr am - CU. break.

ri (iutt >- porfPuur procrssorrJ i Drap-Fault Ccuc. Li-brad. p.riPuus procorsors. &su,rf-ber4i. uttt -- portPuus procorsors. 1

// sim_ff_head is used to store the list of fault family
// need to be simulated. Keep in mind that it is a dummy.
FaultFamily sim_ff_head;

 * permitted without the author's expressed permission. This copyright
 * notice must remain intact. Derivative works may contain additional notices.

Compxlr Opctoas CE-PüRë - us* ch* crb&uppor iodu1. fPSDrS - onrblo dbpth iirrt srnuch iault qoupmg DPIIAPICa - oaablo Dynurc lybrld iault suulatlrn PMIL - mouuro PL utrlurtron ratro

rnr Faulc-SadCtrcuit *tut. Fbulr afault-lrss. roc n-pattern) trncludo "structura b" C rrncludo "protocypo a- Inrr-Su (cuc. n-patrora) . trncludo "boap h-

trindrt PMIL prrncf ("XBsIBs~BsLLls\n". "Patrarn". "SA". "CD" . "50") . PUFT-PlOeesS. rolso prrnti ("Z8sIl6sZBs\o". 'Pattorn".Y ruw."urll ") . prrntf ("Ud t15 Pt 17 Zf\n".O.O.O) . Iondlf

// C•RAM partitioning variables
int g_n_part;
int PE_per_part;

tridot FPSDFS c rnr .put-rrp. // ups putittons CO gagas // prrntt ("pattern td\n". --councor) . trndrf Suulrto kuc. Lff-hoad) . clnt *put-mark. // pur~.sk an uray of muls. // rhrro oach .uk solocrs one putrtron STALfCLûCI . // PQusk srtrcr 1 PL an racb particlon Irfndrf PMIL cbooloro PEsask-rnrc. // rnrtirl posrcron of PL-iuk PILIIT-PUICES5. cbooloui or-usk. ar1s. cboolaan rrsult. CALCCWQ . prrncf ("%Ba XL5 92 V\n9*. g,roral-partom - &patc6rn,Ioit. cint ipattirn. cru t~o/1000QOOOOO O * STOPMATCE-TIPW1000 0. clnt -goad-valu.. ssaIu1,PE 100 O / (usod,PE l portPuus. procoss~rs) 1. cxnc *urro-valur. tondit . cnu ahse-par. // sror* ch. Iast parcorn SfOPCLnCI. rriart DY~A~I~ rnc Dn,grtr,rnpuc. an: Dn-tr r&gorrd. tiraet P~UTIL int Dn-gato-rial . prrntt ("PE utr1rxacron - 16 2f (%d/td)\n". rnt On-PuPart. usetu1,PE l 100 O / (us*a-PI a partPubu procossars) . rnr On-UlnPut . usrful-PE. usrd-PE prrtPuus procosiors). Irndrt Iondri tifdof PMIL

prrncf ("nuabor of pro oral 9 Id (Ld PE)\n". xnt usriul-PE. n,gaco,aral l portPuus procorsors. prfPuus proc6ssorsl rnc usrd,PE. PLI lf CLnCI . caaoloui counc-P6. k

/* cbr iollourng vurables ur osod for trmsrrron taults */ lator cnu tirsr-tu.. /a tndrcarrs frrst sor ot patroras r/

Iw Luncrron prototypes -/ Copyrr&hr Cc) 1997. ;99B by llbort L.-C. Kuong. Ed8onton. Canada void SrcPartrcroa(1nt n-part) . Al1 rights rosoruod lord Puk-drtoctod (Faulc miault). Thas sortauo uy br us04 ior non-profit unrvrrsrry rosruch rnt Goad-Su iCrrcuat *cutJ. il grrrn cno author's orprossr6 parrissron An rxocurrd Ircrnso rord Bad-Su (Carcuit icutl . agrroionr urtn tao author 1s rrqurrrd tor 411 otbrr usas of lord cmtcpy (cint La. clnr Lb. rnt offsrt). taas soitmuo IodrstrlbUclon of chrs softuur xs not put_irsk->opmraco (0. op-on.. groupirxca). P-sk-rare oparata (O. op-on.. sbt?cri#hc). PLmaslt-inic op. race (0. op-ybu. groupmric*) . > i1.i f // sacup pur-ml; fur (tnc 1-0. teg-n-puc . x-l i

/ trad,Pactrrn O 0 lead PLprr-put pattmrns frai input ad seor* cha. inca ch p.tt*rn 4.c. stTIIccurm /I sacup PLusk,tnir O // ustripcion usma ch* anput pact*rn langea 1s cotrrcc rad // PErd bics - nDSD - 0011001X lusuring 9 Pt.4 puricruni O dams nec cineam uiy uhir* space // ./ rnt had,P4tcarn (Cucurt .eut) f tnr ch. int counc,cau-0; inc counc,prctmrn~O.

lt (~JSC-P~CC~'1 ') // rssuatng a brts - b bicr. cpy frai b CO a pattrrn->op*rat* (1. op-ana. goupurrt*). tnlin* vord cinccpy (clnt ta. clnc Cb. inc oftsaci 0 1s. f pactrrn-~oprracr (r. op-:ara. poupurice). inr i. > for (1-oftrac. ic a bits. r**i // rac usk CO s*cand PE f

// turc rue -> sacup a tu11 1 mask PE-muk,tnrt.op*rata (O. op-on*. rricryJ 3 /e . Inte-Su IntcraXaza ch* ruulacron * if (ch -- .\nn) ./ i vara Inir-Su (Carcuit acur. int n-pactarn) //prrntf ('.coune-paecarn L4\na-.count,pattarni . f /* strucc rlutr rlp. g*crluit (ILIBIT-DATA. trlpj. prrnct ("safc lmit - U\n". (intj rlp.rlu-cur). prinrf ('-hue luit = U\n". (rnt) rlp.rlu-.ul . gecrlmac (ILIXIT,STAtl. trlpl . prrntt ("soie luxe Zd\n". (mcl rlp rl-curl . printf ("hud luit - Zd\n". (rat) rlp rlu-aax) ; ./ ?trst-clm - 1. patcern->oporara CO. op-y. shrttrr6htj /..*...a*...... **r. crrggrraa ii char* rs a 0-a1 cr.nsrcaan .nd thora ts a trrnsicron at outpuc ...... M rncorwdraco W at O1 &ato as ch. rame u 51 ac outpuc cu4 1-31 cas. i-lm. €

or-usk oporato (0. op-y 1 op-x. irrtoi or_usk .oporaco (0. op-y. rhxfcrrghc) ;

// do noc ranc itrst FE // y .bu /...... *...... a 10 taule rs ertggitod it cher* rr l CransrCron ...... ch* gare ouepuc and cno iaultr mput 1s 1 CU0 t-nO C

sols. /f rnsort iault. croate PiYioc+ usk. thon chan op urro-ialuo->oprraro (iaulc-nodo. op-ibu. groupsrrtol. Cam-Uap (cut . ta-gaeo 1 . &ood,valuo-raporaro (gare-oucpur. opr. urrcoxi: siro~iduo->oporaro Trxg?lyCxl. iuk-~op.raco (O. op>. rrrcox) . CbipZ.oporato I op-x. poupartco~

dOCOCEOd - CNO. cancmuo. 1 Croaceiluk

baol local-trrggorid ?aho. cinc ausk - nru cinc(l.ALIGlED).

// croate Ch0 iuk local-crrggbrrd - CreacaUask

tf (*tg-~TrrgFaulcCrJ conc anuo.

tor [Faulc acu tg->TrlgFaultCil. cu '9 m. CU car->noxc) i - - i?(CU-,ilan *- td,U1DRECnW) f 6.C.CC.d - cm.. contanuo. )

In D.c.c~,F~uIc booloui uadoeoccod - crue. a romp oprracr(0. op-. wracox). Stucing ?rom rack tault paincod usk->oporate(O. (op-xkop-il. busaccess. unditocced). 0 look tor turcher iaults char uo rocursstel~or dxoccly a

rrsulc aporrci (0. op-m. iracox) . puc-mask->opracr (puctcaon. op3 t op-x. rricox) . camp operaro(0 .op-; .groupurrtoj. Iridot PMI4 boolorn und. - crue. STAmnIL. crmp oporrce CO.op,xsu .busaccass .undo) . ri (undococced) { wk->oporare (O. op,=. rrrtox). putmk->oporaco (pucltaon. op-. C op-x . urrcor) . or-usk oporara (O. op-m C op-S. urrcex). counc,PE operat. (O. op,= I op^. grouprrrcoi. 1 ols. i // seuch for ch* trrsc PE chrc dococcs ch* tault // pre op-x camp // parc: toipl aall haro Cho bats betoro thas PE sac to 1 cboolrui t08p2. cemp2.iprara (0. op-zero. qoupirrlce) . auk opraco (O. op-one . pauprrrci) .

baolarn tly - crue. PErd. oprate (3. op-m. arrcaxJ . Plrd. operate (j . Op-f8U 1 Op-X. busbccoss . ?hg) .

> romp2 operata (O. op,.. shxftrlghCJ . // 111clude daraceron iuk-Doperata (O. op,^ C opa. ur1t.t) . // Crrggorrngr put-iuk-Daperate

IridaL PmIL STAIMIL . courir-Pt oprare (0. op-tero. grouprrrcoi. UDUTIL. IanQrt

/. .m CoXloccl~sulc * coipuo th. pi- oucpuc ro cha taule-troa prsury ouepuc th* rasulc tooolau, urll haro Pb rrch daceccoa taule sot ce 1 ./ rord Colloctt~sult(Crrcutt *eut) f result operaco (O. op,xaro. rrrcax) . for tint PO. ~

/*
 * FaultSession
 *
 * After good simulation, the list of fault_group is traversed. At each pass,
 * a number of fault groups are inserted into C•RAM for fault simulation.
 * After which, each of the partitions is checked to see if any fault in the
 * corresponding fault_group is detected. If so, the faults are dropped from
 * the fault_group list.
 */
void FaultSession(Circuit *cut, FaultGroup *FG_head)
{
int queue_count = 0; // number of elements in queue
FaultGroup *SimQueue[g_n_part];

int i;
for (i=1; i<g_cut->gate_list[j].n_fan_in; i++)
    wire_value->operate (g_cut->gate_list[j].in_node[i], op_xorm, writex);

anlrn* ioid bnor (rut JI c r1r~~vUur-~rparac~(&~uc-~gac~~l~stC~Jrn-no4eC03 .op,i.irrcmx). Si.td.2 CP,PVIIl rup-Doparate CD. op-i-op-y. gaup.rrr*J. 8elsa mir*-idam->opr&t.(j. op-i ' op-y. grouprrlc.). 8.Ddti

///// // check for chrngas in output idu* ///// az (.iaat,ltstCjJ 11 r --

mt 1; 2or (1-1. lc&coc->gat*-l~sc tjf . a,f.n,rn ; 1-1 uu._ialu*->oprat* cg-cuc->gata-l~scC~f. in-nod*[lf .op-x*orr .rrxcmx) ; ///// // add aient for furchar propqactoa ///// anlue tord Gaca,nap (Circule acut. in= J) for (inc 1-0; tccuc->gat*-lrscCj3 n-fan-rut. 1-) i C rnc naxce - cut->mrr*-l rstC CUC-~~~C~-~~SCCJ]Fan-outCd 1 co-gace. cas* g-IMPT 6-rnpe( JJ . sr.&. Cu* g-rnF &-bUrr(Ji. break. cas* &MOT &noc (11. break. casa g-IID g,Md (11 . bt*ak. case g,IAID &nad ( J 1 . br*& . case g-0ii g-or (JI. brea. case g,#Ol g-no? ( J) . break. cas* &IO1 &xor (ji . 0r.U. cas* g-ZIOL 6-xnor( JI . Drrak. >

/. // clman th. alro,ialuo for next Cam 8.4-SU 8 12dmf CO-Wlt 0 Fr~a,lll,Bod~~). 8 / 8.1.. ioid nad-Su (Cxrcult *eut) for (J-O; ]

inc d&rty,nod*Ccuc->nui,gat*l. lnc Ll*nC-dOD. ' o. inc cur-put - O.

//////////////////////////////////////////////////
//
// Functions for implementing Dynamic-Hybrid simulation
//
//////////////////////////////////////////////////
