Science Based Human Reliability Analysis: Using Digital Nuclear Power Plant Simulators for Human Reliability Research

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Rachel Elizabeth Shirley

Graduate Program in Nuclear Engineering

The Ohio State University

2017

Dissertation Committee:

Dr. Carol Smidts, Advisor

Dr. Tunc Aldemir

Dr. Ronald Boring

Dr. Catherine Calder

Copyrighted by

Rachel Elizabeth Shirley

2017

Abstract

Nuclear power plant (NPP) simulators are proliferating in academic research institutions and national laboratories in response to the availability of affordable, digital simulator platforms. Accompanying the new research facilities is a renewed interest in using data collected in NPP simulators for Human Reliability Analysis (HRA) research.

An experiment conducted in The Ohio State University (OSU) NPP Simulator Facility develops data collection methods and analytical tools to improve the use of simulator data in HRA. In the pilot experiment, student operators respond to design basis accidents in the OSU NPP Simulator Facility. Thirty-three undergraduate and graduate engineering students participated in the research. Following each accident scenario, student operators completed a survey about perceived simulator bias and watched a video of the scenario. During the video, they periodically recorded their perceived strength of significant Performance Shaping Factors (PSFs) such as Stress.

This dissertation reviews three aspects of simulator-based research using the data collected in the OSU NPP Simulator Facility:

First, a qualitative comparison of student operator performance to computer simulations of expected operator performance generated by the Information Decision Action Crew (IDAC) HRA method. Areas of comparison include procedure steps, timing of operator actions, and PSFs.

Second, development of a quantitative model of the bias introduced by the simulator environment. Two types of bias are defined: Environmental Bias and Motivational Bias. This research examines Motivational Bias—that is, the effect of the simulator environment on an operator’s motivations, goals, and priorities. A bias causal map is introduced to model motivational bias interactions in the OSU experiment. Data collected in the OSU NPP Simulator Facility are analyzed using Structural Equation Modeling (SEM). Data include crew characteristics, operator surveys, and time to recognize and diagnose the accident in the scenario. These models estimate how the effects of the scenario conditions are mediated by simulator bias, and demonstrate how to quantify the strength of the simulator bias.

Third, development of a quantitative model of subjective PSFs based on objective data (plant parameters, alarms, etc.) and PSF values reported by student operators. The objective PSF model is based on the PSF network in the IDAC HRA method. The final model is a mixed effects Bayesian hierarchical linear regression model. The subjective PSF model includes three factors: the Environmental PSF, the simulator Bias, and the Context. The Environmental PSF effect is mediated by an operator sensitivity coefficient that captures the variation in operator reactions to plant conditions.

The data collected in the pilot experiments are not expected to reflect professional NPP operator performance, because the students are still novice operators. However, the models used in this research and the methods developed to analyze them demonstrate how to consider simulator bias in experiment design and how to use simulator data to enhance the technical basis of a complex HRA method.

The contributions of the research include a framework for discussing simulator bias, a quantitative method for estimating simulator bias, a method for obtaining operator-reported PSF values, and a quantitative method for incorporating the variability in operator perception into PSF models. The research demonstrates applications of Structural Equation Modeling and hierarchical Bayesian linear regression models in HRA. Finally, the research demonstrates the benefits of using student operators as a test platform for HRA research.


Acknowledgements

This research was funded by the Department of Energy’s Nuclear Engineering University Program (NEUP) Graduate Fellowship, the OSU Distinguished University Fellowship, and the US Nuclear Regulatory Commission Graduate Student Fellowship. The initial funds for the OSU NPP Simulator Facility were provided by Battelle Energy Alliance.

This research is only possible because of the students who participated in our experiments. I want to thank you for your enthusiasm, your dedication, and your willingness to try something new. Working with you was a wonderful part of my experience at OSU.

To my advisor, Dr. Carol Smidts. You taught me how to keep pushing past dead ends and to recognize small but valuable insights. Thank you for always asking for more, for sharing the vision and intuition that have guided my work, and for your encouragement and patience.

Dr. Yuandan Li, thank you for your support in this research. Dr. Ron Boring, thank you for your levity and encouragement, and for teaching me about HRA. Dr. Yunfei Zhao, I am so grateful you joined this project near the very end; I’m not sure we would have made it this far without you.

I also want to thank my labmates, Mike Pietrykowski, Qingti Guo, Atul Gupta, Chetan Mutha, Jatin Gupta, Matt Gerber and Meng Li. And finally, my husband and my parents: thank you for going on this journey with me, all the way to the end.


Vita

June 2001 ...... Shaker Heights High School

May 2005 ...... B.A. Physics and English, Case Western Reserve University

2005-2006 ...... English as a Foreign Language Instructor, Hubei Polytechnic Institute

May 2008 ...... M.A. English, Case Western Reserve University

2008-2011 ...... Knowledge Management Specialist, NASA Safety Center/ARES Corporation

2011-Present ...... Graduate Research Fellow, Department of Nuclear Engineering, The Ohio State University


Publications

Shirley, Rachel Benish, Carol Smidts, Meng Li and Atul Gupta. "Validating THERP: Assessing the scope of a full-scale validation of the Technique for Human Error Rate Prediction." Annals of Nuclear Energy, v77-6 (2015)

Fields of Study

Major Field: Nuclear Engineering


Table of Contents

Abstract
Acknowledgements
Vita
Table of Contents
Table of Tables
Table of Figures
Acronym List
1 Introduction
  1.1 Overview of this Dissertation
    1.1.1 Chapter 1, Introduction
    1.1.2 Chapter 2, The OSU NPP Simulator Facility
    1.1.3 Chapter 3, Qualitative Comparison of the OSU Student Operators and an IDAC Simulation
    1.1.4 Chapter 4, A Method for Quantifying Bias in NPP Simulator Experiments
    1.1.5 Chapter 5, Development of a Bayesian Subjective PSF Model
    1.1.6 Chapter 6, Summary
  1.2 Background
    1.2.1 The Great Challenge in Human Reliability Analysis: Reliable Data
    1.2.2 Integrating Data into HRA Models: A Bayesian Approach
    1.2.3 A Test Platform: The Information-Decision-Action Crew (IDAC) HRA Model
2 OSU NPP Simulator Experiment Overview
  2.1 OSU NPP Simulator Facility
    2.1.1 OSU Simulator Facility setup
    2.1.2 Human Behavior Data Collection
  2.2 OSU NPP Systems and Operations Course
    2.2.1 Using Students as Research Subjects
  2.3 Simulator Sessions
  2.4 Data Collection
    2.4.1 Simulator Bias and Static PSF Data
    2.4.2 Dynamic PSF Evaluation
  2.5 A note about terminology: Simulators and Simulations
  2.6 Scenario Narratives
  2.7 Summary
3 IDAC Simulations and Student Operators
  3.1 The IDAC Simulations
    3.1.1 Variability in the IDAC Model
    3.1.2 Student Operator Model
    3.1.3 IDAC Performance Shaping Factors
  3.2 OSU Data
  3.3 Results
    3.3.1 Pressurizer Level
    3.3.2 Procedure Status
    3.3.3 Performance Shaping Factors
  3.4 Discussion
    3.4.1 Suggestions for IDAC-Student Simulations
    3.4.2 Lessons Learned for OSU Student Operators
  3.5 Conclusions
4 Simulator Bias
  4.1 Inherent Bias in Simulator Studies: Environmental and Motivational Biases
  4.2 Methods and Materials for Assessing Simulator Bias
    4.2.1 Manipulating Simulator Bias – A Causal Model of Bias Effects
    4.2.2 Data Collection
    4.2.3 Structural Equation Models
  4.3 Results
    4.3.1 Bias Structural Equation Models
    4.3.2 Mediated Effects
    4.3.3 Model Fit & Sample Size Recommendations
    4.3.4 Recommended Sample Size for Future Research
  4.4 Discussion
    4.4.1 Quantitative Analysis of the Variations in Simulator Bias
    4.4.2 Quantitative Estimate of Overall Bias Effects
    4.4.3 Simulator Bias and the SPAR-H HRA Model
  4.5 Conclusion
5 A Data-driven Bayesian Model of Subjective PSF Values
  5.1 Data Used in this Research: Environmental and Reported PSFs
    5.1.1 Environmental PSFs
    5.1.2 Reported PSFs
  5.2 Introduction to Continuous Bayesian Networks
  5.3 Building the Subjective PSF Model
    5.3.1 Evaluating Model Results
    5.3.2 Data Transformation
  5.4 Model Description: The ABC Model
  5.5 Results
    5.5.1 Model Coefficients
    5.5.2 Model K Values
    5.5.3 Model Prediction
  5.6 Analysis
    5.6.1 Interpreting A, the Environmental PSF Effect on the Reported PSF
    5.6.2 Interpreting B, the Bias Effect on the Reported PSF
    5.6.3 Interpreting C, the Context Effect on the Reported PSFs
    5.6.4 Stress
    5.6.5 Model variance: capturing the Uncertainty in the PSFs
    5.6.6 Sensitivity Analysis
    5.6.7 Cross Validation
References
Appendix A: PSF Definitions
  Full PSF BN – PSF Definitions
  Simplified PSF Network Definitions
Appendix B: Questionnaires
Appendix C: Scenario Narratives
Appendix D: OSU Environmental CTL Coding
Appendix E: ABC Model Fit 95% CI Plots

Table of Tables

Table 1.1: Comparison between the International HRA Empirical Study and the OSU NPP Simulator Experiments.
Table 1.2: Bayesian update to SPAR-H HEPs for four error contexts (Groth, Smith, & Swiler, 2014).
Table 1.3: Sample HERA event description with associated PSF tags.
Table 1.4: Comparison of the Major PSFs in the Full PSF Network (middle column) and the Simplified PSF Network (right column).
Table 2.1: NPP Systems and Operations Course syllabus.
Table 2.2: OSU NPP Simulator Experiments
Table 2.3: Data collected in the OSU NPP Simulator Experiment.
Table 2.4: The Questionnaires Checklist
Table 2.5: Scenario PSF Evaluation points. Additional evaluations at researcher's discretion.
Table 2.6: Guidelines given to the student operators for the PSF evaluations
Table 2.7: Key terms used in this dissertation.
Table 3.1: A portion of the IDAC Knowledge Base highlighting an item removed from the Student Operator Knowledge Base.
Table 3.2: Average step time for IDAC and OSU crews.
Table 4.1: Summary of concerns when conducting simulator studies.
Table 4.2: Motivational and Environmental Simulator Biases
Table 4.3: Hypotheses in the OSU NPP Simulator Experiment
Table 4.4: Variables in the OSU NPP Simulator Experiments
Table 4.5: SEM terminology.
Table 4.6: Notation frequently used in this chapter.
Table 4.7: The Latent Bias Model estimated coefficients. Standardized coefficients represent the change in the response variable per change in the predictor variable in units of standard deviation. P-values are the probability the effect is negligible; bootstrap p-values are estimated through bootstrap resampling.
Table 4.8: The Bias Path Analysis Model estimated coefficients. Standardized coefficients represent the change in the response variable per change in the predictor variable in units of standard deviation. P-values are the probability the effect is negligible; bootstrap p-values are estimated through bootstrap resampling.
Table 4.9: Mediation in the Latent Bias Model
Table 4.10: Mediation in the Bias Path Analysis Model.
Table 4.11: Bootstrap resampling in the Latent Bias Model
Table 4.12: Bootstrap resampling in the Bias Path Analysis Model
Table 4.13: Cohen's effect size thresholds (Fritz & MacKinnon, 2007).
Table 4.14: Recommended minimum sample size required for bootstrap power analysis (π = 0.8). Effects are listed as distal-proximal effect pairs, i.e., SM indicates a Small distal effect (a in Figure 4.5) and a Medium proximal effect (b in Figure 4.5). S, M and L correspond to Cohen's criteria for Small, Medium, and Large effects; H represents Halfway between S and M. From Table 3 in (Fritz & MacKinnon, 2007).
Table 4.15: Expected values of Bias Measures in a control room (rather than a simulator).
Table 4.16: SPAR-H PSFs, levels, and multipliers (Gertman, Blackman, Marble, Byers, & Smith, 2005); biases expected to influence the SPAR-H PSFs.
Table 5.1: The components of the proposed PSF BN.
Table 5.2: Plant parameter thresholds used to calculate Environmental TCL.
Table 5.3: Monitored parameters associated with the primary procedures used in the experiment. Parameters that are used to calculate OSU System Dynamics are shaded gray.
Table 5.4: The cognitive tasks and associated cognitive load (Table 6-5 in (Li, 2013)).
Table 5.5: Data used to populate the PSF model. Data for observations 1 and 2 are shown.
Table 5.6: Sample data for the simplified illustration BN.
Table 5.7: ABC Model - A Variations
Table 5.8: Operator sensitivity groups.
Table 5.9: ABC Model - B variations
Table 5.10: ABC Model - C variations
Table 5.11: Prior distributions for unknown model parameters.
Table 5.12: ABC Model - Fit results
Table 5.13: Posterior estimates of the model parameters used to estimate μ_pij.
Table 5.14: Posterior operator sensitivity coefficients.
Table 5.15: Posterior coefficients for stress (Left) and posterior variance parameters (Right).
Table 5.16: K.fit data for all the estimated values ('All PSFs') and each of the five PSFs. (Model = C.8)
Table 5.17: K.predict data for each Test observation and averaged over both observations
Table 5.18: ABC model coefficient posterior mean probability in the logit-transform model (left) and the model built using the raw data (right).

Table of Figures

Figure 1.1: The evolution of the IDAC HRA model. IDAC editions that are used in this research are shaded gray.
Figure 1.2: The Full PSF Network. Major PSFs are circled in red; figure modified from (Sundaramurthi & Smidts, 2013).
Figure 1.3: The 'visible' portion of the Full PSF Network, with Present/Absent probabilities from the HERA data. Again, Major PSFs are highlighted in red. Figure modified from (Sundaramurthi & Smidts, 2013).
Figure 1.4: The Simplified PSF Network and associated surrogates, operator characteristics and manifestations (from (Li, 2013)).
Figure 1.5: The Major PSFs in the Full PSF Network (left) and the Simplified PSF Network (right). Items in bold are present in both versions of the PSF network. Rectangles represent PSFs; ovals are factors that are classified as Major PSFs in the other version of the network.
Figure 2.1: Student operators in the OSU NPP Simulator Facility
Figure 2.2: One of the Large Screen Displays showing the Chemical and Volume Control System, the Pressurizer, and the three Steam Generators, along with numerous other indicators.
Figure 2.3: The data collection system captures two views of the OSU NPP Simulator Facility, where a researcher interacts with the GPWR program
Figure 2.4: Example of the data collected during a simulator session in the Noldus live coding system.
Figure 2.5: Example of the Post-Scenario Questionnaire, showing two questions about simulator bias.
Figure 2.6: The 2013-2014 PSF Evaluation Log
Figure 2.7: The updated 2015 PSF Evaluation Log
Figure 2.8: Expected OSU scenario events.
Figure 2.9: Crew #4 Scenario Narratives
Figure 2.10: Crew #1 Scenario Narratives
Figure 3.1: Pressurizer level in IDAC simulations and OSU simulator sessions. A1-A7 are the OSU crews; H1 and V1 are the IDAC Hamlet and Vagabond simulations respectively.
Figure 3.2: IDAC and OSU Crew progress through procedures.
Figure 3.3: IDAC PSFs TCL and PIL. Left: PSFs over time; right: PSFs by procedure step; red: Hamlet, black: Vagabond.
Figure 3.4: IDAC PSFs CTL and Stress. Left: PSFs over time; right: PSFs by procedure step; red: Hamlet, black: Vagabond.
Figure 3.5: OSU and IDAC TCL and PIL. Left: PSF over time; right: by procedure step.
Figure 3.6: OSU and IDAC CTL and Stress. Left: PSF over time; right: by procedure step.
Figure 4.1: Causal model of simulator bias in the OSU NPP Simulator Experiments
Figure 4.2: The iterative SEM analysis process.
Figure 4.3: The preliminary OSU Latent Bias Model.
Figure 4.4: The Preliminary Bias Path Analysis Model
Figure 4.5: Mediation in SEM
Figure 4.6: SEM Model Development Process.
Figure 4.7: The Latent Bias Model estimates the latent variable, Bias. The Bias Measure Likely.Dist and the crew characteristic BaseStress are not included in the model because they are found to be insignificant in the process described in Section 4.2.3.2.
Figure 4.8: The Bias Path Analysis Model. Bias Measures Prepared and Likely.For are dropped from the model.
Figure 4.9: Power analysis and sample size for a model with 30 degrees of freedom. (a) shows the OSU NPP Simulator Experiment sample size, (b) is the recommended sample size for α = 0.2, π = 0.8, and (c) is the recommended sample size for α = 0.1, π = 0.9.
Figure 4.10: Sample Bias SEM for future work.
Figure 4.11: Recommended sample size as a function of degrees of freedom. LEFT: The solid line shows n(d) for α = 0.1, π = 0.9; the dotted line corresponds to α = 0.2, π = 0.8. RIGHT: recommended sample size for the anticipated degrees of freedom in the proposed Bias SEM.
Figure 5.1: The PSF model building process.
Figure 5.2: Environmental TCL (black), normalized time available (red), and normalized plant parameters (gray).
Figure 5.3: Number of active alarms (black) and Environmental PIL (red). Left: all active alarms; right: normalized active alarms, with the maximum number of alarms capped at n_o = 3.
Figure 5.4: Normalized number of changing parameters N.i (black) and system dynamics (red) for two simulator Exam sessions: Exam A (EA) for Crew #3 and Exam B (EB) for Crew #6.
Figure 5.5: Basic Model with all parameters. Observed data are in boxes; estimated parameters are in circles.
Figure 5.6: Hierarchical Model
Figure 5.7: An informative prior (red) and the weakly informative prior created by broadening the informative prior (black).
Figure 5.8: The ABC model building and selection process.
Figure 5.9: Data transformation. Original data (gray), cushion added to change the range from [0,1] to (0,1) (black), and logit transform (red).
Figure 5.10: Operator sensitivity coefficient a_pi quintiles.
Figure 5.11: Model fit for Operator 2. Reported PSF (thick red line), Environmental PSF (thin blue line), and posterior 95% CI (gray shadow). Operator sensitivity group is listed in parentheses in each plot title.
Figure 5.12: Model fit for Operator 12. Reported PSF (thick red line), Environmental PSF (thin blue line), and posterior 95% CI (gray shadow). Operator sensitivity group is listed in parentheses in each plot title.
Figure 5.13: Posterior parameter boxplots. Left: coefficients for μ_pij, b_p, c_p; right: a_pi.
Figure 5.14: Posterior parameter boxplots. Left: b_Sp; middle: μ_ap; right: model variance.
Figure 5.15: Posterior K.fit. Left: histogram of K for all observations; 2nd: mean K.fit for each operator; 3rd: mean K.fit for each session; 4th: mean K.fit for each PSF. Results are sorted into two categories: Treatment 0 operators who encountered only single-fault accidents (black) and Treatment 1 operators who encountered a mix of single-fault accidents and multiple-fault accidents (red). The overall average is shown in gray. In the 3rd figure, results from complex (multiple fault) scenarios are marked with an x.
Figure 5.16: Prediction plots for the two reserved Test observations. As in previous plots, Reported PSFs are red, Environmental PSFs are blue, and the posterior predicted 95% CI for the reported PSF is gray.
Figure 5.17: Estimated Bias in each observation, separated by Session (PA, PB, EA, EB) and identified by the reporting operator's Team Number and Role (S ~ SRO, R ~ RO).
Figure 5.18: Bias organized by operator for four representative operators.
Figure 5.19: Predicted values for the two Test observations. Dotted lines show the mean predicted values in cases of extreme Bias or lack of Bias.
Figure 5.20: Sensitivity test results.
Figure 5.21: Comparison of the C.8 model evaluated using logit transformed data and raw data.
Figure 5.22: K-fold comparison for model coefficients.
Figure 5.23: K-fold comparison for operator sensitivity factors.
Figure 5.24: K-fold comparison for variance parameters.

Acronym List

Acronym    Definition
ACT        Attention to Current Task
AOP        Abnormal Operating Procedure
APP        Annunciator Panel Procedure
BBN        Bayesian Belief Network
BN         Bayesian Network
BWR        Boiling Water Reactor
CAER       Center for Advanced Engineering and Research
CNMT       Reactor Containment
CREAM      Cognitive Reliability and Error Analysis Method
CSNI       Committee on the Safety of Nuclear Installations
CST        Condensate Storage Tank
CTL        Cognitive Task Load
DDET       Discrete Dynamic Event Tree
EA         Exam Session A
EB         Exam Session B
EC         Error Context
EOP        Emergency Operating Procedure
EPRI       Electric Power Research Institute
FW         Feed Water
GCA        Global Condition Assessment
GP         Garden Path decision making style
GPWR       Generic Pressurized Water Reactor
H          Hamlet decision making style
HEP        Human Error Probability
HERA       Human Event Repository and Analysis
HFE        Human Failure Event
HMI        Human Machine Interface
HRA        Human Reliability Analysis
HCR        Human Cognitive Reliability
HSSL       Human Systems Simulation Laboratory
IDAC       Information Decision Action Crew
INL        Idaho National Laboratory
IRB        Institutional Review Board
LOCA       Loss of Coolant Accident
LOFW       Loss of FeedWater
LSD        Large Screen Display
LVL        Level
MCMC       Markov Chain Monte Carlo
NPP        Nuclear Power Plant
NR         Narrow Range
OAT        Operator - Action Taker (IDAC name for RO)
ODM        Operator - Decision Maker (IDAC name for SRO)
ORE        Operator Reliability Experiment
OSU        The Ohio State University
PA         Practice Session A
PB         Practice Session B
PIL        Passive Information Load
PRA        Probabilistic Risk Assessment
PRZ        Pressurizer
PSF        Performance Shaping Factor
PWR        Pressurized Water Reactor
RCS        Reactor Coolant System
RO         Reactor Operator
RWST       Refueling Water Storage Tank
S          Stress
SACADA     Scenario Authoring, Characterization, and Debrief Application
SEM        Structural Equation Modeling
SG         Steam Generator
SGTR       Steam Generator Tube Rupture
SLB        Steam Line Break
SPAR-H     Standardized Plant Analysis Risk - HRA
SRO        Senior Reactor Operator
TC         Task Complexity
TCL        Time Constraint Load
TDAFWP     Turbine Driven Auxiliary Feed Water Pump
TRL        Task Related Load, another name for CTL
V          Vagabond decision making style
VCT        Volume Control Tank

1 Introduction

The focus of this research is using digital Nuclear Power Plant (NPP) simulators to improve Human Reliability Analysis (HRA) models. Like flight simulators for pilots, NPP simulators give NPP operators the opportunity to practice operating an NPP without real-world consequences.

Human Reliability Analysis is the practice of predicting human actions in complex systems. When an accident occurs, what is the probability that the human operator will be able to restore normal operations or mitigate the damage? Alternatively, what is the probability that an operator error will turn normal conditions into an accident situation?

Accidents in nuclear power plants are rare—a testament to many years of rigorous engineering—but this means HRA cannot rely on data from accidents. Consequently, the technical basis for HRA is weak, with many HRA models built on expert judgment and common sense that may or may not be accurate tools for predicting human failure rates.

One practical solution is to use simulator data.

This dissertation presents methods for using NPP simulator data in HRA research. The data in this analysis are from novice operators—graduate and undergraduate engineering students at The Ohio State University (OSU) who participated in the NPP Systems and Operations Course. As such, the data and models do not represent expert operator behavior; however, the methods introduced below are designed to be useful in a variety of contexts and simulator environments.

Three applications of NPP simulators are investigated using data collected in the OSU NPP Simulator Experiments:

• Qualitative comparison of student operator performance and operator performance simulations generated by the Information-Decision-Action Crew (IDAC) HRA model (Chapter 3)

• Estimation of bias effects introduced by the simulator environment using Structural Equation Modeling (SEM) (Chapter 4)

• Development of a quantitative, dynamic, data-informed Performance Shaping Factor (PSF) model based on the IDAC PSF network using Bayesian analysis (Chapter 5)

The danger in relying on simulator data is that the data may not be representative of real-world conditions. HRA simulator studies tend to acknowledge this difficulty, but to our knowledge no one has attempted to formalize a language for discussing these effects or to quantify simulator bias. The model presented in Chapter 4 is a first step in that direction. This model is preliminary, based on a small sample of student operators, but the approach illustrates a method for addressing bias in simulator data rather than simply trying to minimize and then disregard these effects.

In Chapter 5, the preliminary bias model is integrated into a PSF network as an example of how bias might be treated in HRA models. The PSF network developed in this research expands on existing PSF models in several ways. First, this PSF model is dynamic, evolving throughout an accident scenario. Second, the PSF model generates probability distributions for PSFs that account for variations between operators without requiring extensive detail about the operators. Third, the PSF model relies almost entirely on objective data to predict the subjective PSF values experienced by the operators. The PSF model uses operator-reported data to bridge the gap between what we might expect an operator to experience given the plant conditions and what the operator actually experiences. As with the Bias model, the PSF model is a preliminary build using data collected from student operators. These model parameters are not expected to apply to experienced operators, but the methods used to analyze the model are provided as a useful tool for robust, data-driven HRA.

1.1 OVERVIEW OF THIS DISSERTATION

Chapters 1 and 2 provide context and background information. Chapters 3, 4 and 5 present three applications of simulator data for HRA. Chapter 6 summarizes the findings and outlines the path for future research.

1.1.1 Chapter 1, Introduction

Chapter 1 provides an overview of the research and reviews relevant literature on HRA, the use of simulators in HRA, and the IDAC HRA model.

1.1.2 Chapter 2, The OSU NPP Simulator Facility

Chapter 2 describes the OSU NPP Simulator Facility and the data collected and analyzed in this research.


1.1.3 Chapter 3, Qualitative Comparison of the OSU Student Operators and an IDAC Simulation

Chapter 3 compares student operator behavior with behavior predicted by the Information-Decision-Action Crew (IDAC) HRA model in a design basis Steam Generator Tube Rupture (SGTR).

1.1.4 Chapter 4, A Method for Quantifying Bias in NPP Simulator Experiments

Chapter 4 introduces an approach for estimating the bias introduced by the simulator facility. A set of biases is proposed to characterize factors introduced by the simulator environment. There are two categories of simulator bias: Environmental Biases (physical differences between the simulator and the control room), and Motivational Biases (cognitive differences between training in a simulator and operating an NPP). This study examines Motivational Biases.

A preliminary causal model of Motivational Biases is introduced and tested in a demonstration experiment using 30 student operators. Data from 41 simulator sessions are analyzed. Data include crew characteristics, operator surveys, and time to recognize and diagnose the accident in the scenario.

Quantitative models of the Motivational Biases using Structural Equation Modeling (SEM) are proposed. These models are used to estimate how the effects of the scenario conditions are mediated by simulator bias, and to demonstrate how to quantify the strength of these effects.


1.1.5 Chapter 5, Development of a Bayesian Subjective PSF Model

Chapter 5 uses student-operator-reported PSF data to develop a data-informed model of subjective Performance Shaping Factors (PSFs) such as Stress and perceived Time Pressure. Objective or 'Environmental' PSFs are calculated based on the IDAC HRA model. These values are updated with subjective, operator-reported PSF data to produce a posterior 'Subjective PSF Model'.

The PSF model incorporates three primary effects: the Environmental PSF, the simulator Bias, and the immediate Context. The model accounts for variation between operators with an operator sensitivity factor that estimates each operator’s sensitivity to the Environmental PSF.

The PSF model is a mixed effects, hierarchical linear Bayesian model that generates probability distributions for five major PSFs: Time Constraint Load, Passive Information Load, Task Complexity, Cognitive Task Load, and Stress.
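To make the model form concrete, the sketch below shows a linear predictor of the kind described here, with invented coefficients; the fitted coefficients, priors, and the logit transform are developed in Chapter 5, and the names (a_i, b, c) are placeholders for the operator sensitivity, Bias, and Context effects.

    import random

    def reported_psf(env_psf, bias, context, a_i=0.8, b=0.3, c=0.2, sigma=0.05):
        # Reported PSF ~ Normal(a_i * Environmental PSF + b * Bias + c * Context, sigma).
        # a_i is an operator-specific sensitivity coefficient; all values invented.
        mu = a_i * env_psf + b * bias + c * context
        return random.gauss(mu, sigma)

    # One hypothetical observation: moderate environmental load, small bias and context effects.
    print(reported_psf(env_psf=0.6, bias=0.1, context=0.2))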

1.1.6 Chapter 6, Summary

Chapter 6 summarizes the findings and outlines the path for future research.

1.2 BACKGROUND

Before diving into the research, some background information is in order. This section reviews three topics: first, the need for data in HRA and why the HRA community is interested in simulator data; second, examples of how other researchers have integrated data (both field data and simulator data) into HRA models; and finally, an introduction to IDAC, the HRA method used in this research, and to the PSF Bayesian Networks that have already been developed from the IDAC model.

1.2.1 The Great Challenge in Human Reliability Analysis: Reliable Data

Despite fifty years of research and development, Human Reliability Analysis (HRA) continues to be the weak link in Probabilistic Risk Assessment (PRA) for commercial nuclear power plants in the United States and globally. The Nuclear Energy Agency’s Committee on the Safety of Nuclear Installations (CSNI) summarizes the problem: “The uncertainties associated with human actions, particularly those occurring after an initiating event, are significantly higher in comparison to what is typical for hardware failures” (Hirschberg, 2004). The committee identifies six weaknesses in the current HRA approach that lead to the high uncertainties in HRA:

1. Current methods employ limited representations of the cognitive aspects of human performance (diagnosis and decision errors).

2. Poor inter-rater reliability among analysts using the same method or using different methods.

3. “Excessive reliance” on expert judgment due to lack of empirical data for accident scenarios.

4. Insufficient analysis of errors of commission.

5. Insufficient analysis of dependencies between actions.

6. Insufficient accounting for organizational and management aspects of operations and performance.


This research develops methods to address item 3, the lack of empirical data for accident scenarios, by collecting data in digital NPP simulator facilities to validate and improve HRA models.

A 2008 report (CSNI, 2008) summarizes the types of data needed for robust HRA. Areas of insufficient data reflect the weaknesses in HRA listed above:

• Operator decision-making.

• Ergonomics of human-computer interfaces in control rooms.

• Operator performance in long scenarios (from six hours to upwards of 24 hours).

• Operator contributions to latent failures through maintenance operations, etc.

• The impact of safety culture and organizational factors on operator performance.

Of these, the first two lend themselves to simulator studies. The ergonomics of Human Machine Interfaces (HMI) can be addressed by human factors evaluations, leaving data related to operator decision-making as the emphasis for simulator studies.

The same report concludes that simulator studies require significant time and effort; that utilities are interested in conducting HRA/PRA studies in their on-site, plant-specific simulators but need guidance in order to invest their time and resources well; and that the validity of the simulator data collected needs to be addressed.

This dissertation tackles two challenges associated with using simulator data to improve the research basis for HRA. The first challenge is the validity of simulator study data for HRA research. The second is the experiment design, data collection tools, and data analysis necessary for updating the HRA model using simulator data.


With respect to the validity of a simulator study, this research identifies the biases that are likely to be introduced by a simulator environment, assesses the biases present in the OSU NPP Simulator studies, and proposes mitigation strategies for both the experiment design and the analysis of experimental data collected in simulators.

With respect to incorporating simulator data into a specific HRA model, this research outlines an approach and tests this approach through a prototype experiment. As this experiment is conducted using student operators, the quantitative results of the study are not directly applicable to commercial plant operation; however, the experience gained by conducting the study is applicable to future work conducted with experienced operators. This portion of the research is a continuation of the Bayesian PSF network developed in (Sundaramurthi & Smidts, 2013) and (Groth K. M., 2009).

1.2.1.1 Collecting Simulator Data for HRA

The nuclear industry has long recognized NPP simulators as a potential solution to the lack of data in HRA. The Human Cognitive Reliability (HCR) model developed by the Electric Power Research Institute (EPRI) in the 1980s is built on data collected in the Operator Reliability Experiments conducted in plant simulators (Spurgin, 1990). Several databases established to consolidate available data—including the SACADA data collection tool currently being developed at the Idaho National Laboratory (INL) (Chang, et al., 2014)—are structured to support simulator data.

The renewed interest in simulator studies corresponds to the recent increase in access to simulator facilities. Historically, simulators were limited to the full-scale, plant-specific training simulators on site at commercial plants. These simulators are analog mock-ups of the real control room; each switch, dial, indicator and display used in the control room is also installed in steel control panels in the simulator. Such simulators are large, expensive, difficult to modify, and proprietary to the plant. Access to the simulator facility for research purposes is limited, as the plant is constantly using the simulator for training, but building a research simulator would be even more daunting. In his 1994 HRA Data Handbook (Gertman & Blackman, 1994), David Gertman’s discussion of “simulator issues” focuses on the cost and effort required to build a sufficiently high fidelity simulator.

These problems disappeared with the advent of digital simulators. The simplest installation of a digital simulator requires nothing more than a computer and a display, and consequently, digital research simulators are proliferating in the nuclear industry.

Within the last five years, at least four research simulators have been established in the United States: the OSU NPP Simulator Facility, the NPP Main Control Room at the Center for Advanced Engineering and Research (CAER) in Virginia, the Human Systems Simulation Laboratory (HSSL) at the Idaho National Laboratory, and the North Carolina State University Simulator Laboratory. Although the HSSL and CAER are not focused on HRA data collection, all four facilities are conducting research related to nuclear power plant operations.


1.2.1.2 Significant HRA Simulator Studies

Of the numerous HRA simulator studies that have been conducted for HRA research, two stand out. The first is the set of Operator Reliability Experiments (ORE) conducted by EPRI to provide data for the Human Cognitive Reliability (HCR) HRA analysis model. The second is the more recent International HRA Empirical Study conducted at the Halden Reactor Project in Norway. These two studies illustrate the evolution in HRA over the past fifty years. The ORE emphasize quantitative data for particular error-likely situations, while the International HRA Empirical Study emphasizes the value of qualitative data, understanding context, and operator cognition during an accident scenario.

1.2.1.3 The Operator Reliability Experiments (ORE)

The HCR HRA model is unique among HRA models in that a series of simulator experiments were conducted to build the time reliability curves used in the model (Weston, Whitehead, & Graves, 1987). EPRI conducted the ORE in the 1980s to quantify the time operators needed to perform a variety of “Human Interactions” such as “Isolate a faulted steam generator following a steam line break” (Boring, Shirley, Joe, Mandelli, & Smith, 2014). Eight plants participated in the study, with multiple crews completing scenarios at each plant. Results include tables listing human interactions with the median response time and time range for each response, recorded in seconds. The HCR premise is that operators will fail if they do not have sufficient time to complete a task. Thus, HCR uses the ORE Time Reliability Correlations (TRC) to calculate a probability of non-response.
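As an illustration of how a TRC converts timing data into a non-response probability, the sketch below assumes a lognormal response-time distribution; the actual HCR curves and their parameters are fit to the ORE measurements, and all numbers here are invented.

    from math import erf, log, sqrt

    def p_non_response(t_available, t_median, sigma=0.4):
        # P(crew has NOT completed the task by t_available), assuming the
        # response time is lognormal with the given median and log-sd (assumed).
        z = log(t_available / t_median) / sigma
        p_response = 0.5 * (1.0 + erf(z / sqrt(2.0)))
        return 1.0 - p_response

    # e.g. median time to complete the human interaction is 300 s and
    # 480 s are available before the action is counted as failed:
    print(p_non_response(480.0, 300.0))  # small residual non-response probability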


The results from this study are incorporated in the EPRI HRA Calculator (EPRI, 2013).

1.2.1.4 The International HRA Empirical Study

The International HRA Empirical Study is the most robust HRA validation experiment conducted to date. This study compares simulator data to analyst predictions for 13 different HRA models (Bye, et al., 2012). Although technically a pilot study because the study only used eight operating crews—a statistically small sample—both operators and analysts were experienced professionals in their respective domains. The study employs a between-subjects design, with different analysts for each HRA model. In addition, two teams analyzed the scenarios using Standardized Plant Analysis Risk HRA (SPAR-H), providing some insight into the consistency of usage of this model.

Table 1.1 is a high-level comparison of the aims and objectives of the International HRA Empirical Study with the preliminary OSU simulator experiment discussed in this dissertation. The aim of the International HRA Empirical Study was to benchmark an array of HRA models against each other and against some limited empirical data collected in a simulator facility. Despite having ‘empirical’ in the study title, the bulk of the findings are qualitative and a reflection of expert judgment of crew performance. The aim of the OSU NPP Simulator Experiment is to develop methods for using simulator data in HRA.


Table 1.1: Comparison between the International HRA Empirical Study and the OSU NPP Simulator Experiments.

Scope
  International HRA Empirical Study: Multiple HRA models: benchmark multiple HRA models with empirical data.
  OSU NPP Simulator Experiment: Single HRA model: update a portion of one HRA model (IDAC) using empirical data.
Emphasis
  International Study: Focus on Human Failure Events (HFEs) and overall model success.
  OSU Experiment: Focus on treatment of PSFs.
Failure Events
  International Study: Identify significant PSFs for HFEs.
  OSU Experiment: Predict operator-reported strength of PSFs; do not tie PSFs to failure events.
Operators
  International Study: Experienced crews.
  OSU Experiment: Novice operators.
Scenarios
  International Study: Simple and complex versions of SGTR & LOFW.
  OSU Experiment: Simple and complex versions of SGTR & LOCA.
Objective
  International Study: Assess strengths and weaknesses of existing HRA models.
  OSU Experiment: Develop methods to improve HRA research in simulator facilities.
Simulator Effects
  International Study: Use simulator data as a benchmark to bound real-world events.
  OSU Experiment: Develop formal treatment of bias in simulator data.

Most of the key lessons learned from the International Empirical HRA Study relate to PSFs (Forester, et al.) and are relevant to the research in this dissertation. These include:

- HRA models that do not specifically address crew cognitive activities (such as diagnosis of the underlying event) fail to identify relevant PSFs and consequently underestimate the Human Error Probability (HEP). The model proposed in Chapter 5 is consistent with this finding. The Environmental PSF model derived from IDAC was designed to be as objective as possible, but the posterior PSF model proposed in Chapter 5 includes Operator Confusion as an explanatory variable because all of the PSFs are influenced by the operator’s mental state.

- HRA analysts have difficulty assessing the degree of influence of different PSFs, resulting in a wide range of HEPs for a single event. The method introduced in Chapter 5 is designed to reduce the judgment required in assessing the PSFs by providing a data-driven model for assessing the strength of the PSFs in the model based on scenario context.

- Determining which PSFs are relevant to a scenario is a challenge in HRA. Some HRA models require analysts to specify a host of potentially influential factors, while other models attempt to characterize the environment with a few key PSFs. Despite the wide variety of HRA models included in the study, researchers could not identify a preferred model for selecting PSFs. The approach introduced in Chapter 5 illustrates one way to reduce the uncertainty, using Bayesian analysis to identify PSFs that can be eliminated from an HRA model and to identify areas of uncertainty where other factors may be missing from the model.

The International HRA Empirical Study does not directly address the impact of the simulator environment on the data obtained in the simulator. This effect—referred to as simulator bias in this dissertation—is inherent in any simulator study. We identify two categories of simulator bias: Environmental Bias introduced by the physical differences between the simulator and the control room, and Motivational Bias induced by the effect of being in an artificial environment (see Chapter 4). Although the study authors do not use these terms to describe the effects of the simulator environment, the study design seeks to reduce the Environmental Bias by providing additional training to the operators before collecting data, and to reduce Motivational Bias through a varied simulator scenario schedule (Dang, et al., 2014).


1.2.2 Integrating Data into HRA Models: A Bayesian Approach

Data collected from these and other studies are useful for benchmarking HRA models, but they can also be integrated into the models themselves to bolster their scientific basis.

The primary approach to integrating simulator data into HRA is through Bayesian Networks (Mkrtchyan, Podofillini, & Dang, 2015). This includes turning traditional HRA models into Bayesian Networks (BNs), which can incorporate many types of data. Groth and Swiler propose a Bayesian version of SPAR-H, for example, which transforms the SPAR-H PSFs into a Bayesian Network (Groth & Swiler, 2012). One advantage of this approach is that the analyst does not have to specify each of the PSFs. For example, the analyst can state that the stressors are not “extreme” without having to decide if the stressors are “nominal” or “high.” Another advantage of the Bayesian SPAR-H is the ability to model dependencies between PSFs, rather than assessing them individually as is required in the original SPAR-H model.
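The "partial specification" advantage can be illustrated with a toy renormalization step, in which the analyst rules out a level without committing to a single one. The distribution below is hypothetical, not the published SPAR-H or Groth & Swiler numbers.

    # Hypothetical marginal distribution over a Stress PSF with three levels.
    prior = {"nominal": 0.70, "high": 0.25, "extreme": 0.05}

    def rule_out(dist, excluded):
        # Condition on "the level is not in `excluded`" and renormalize;
        # in a BN this soft evidence then propagates to the HEP node.
        kept = {level: p for level, p in dist.items() if level not in excluded}
        total = sum(kept.values())
        return {level: p / total for level, p in kept.items()}

    print(rule_out(prior, {"extreme"}))  # {'nominal': ~0.74, 'high': ~0.26}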

Similar approaches are used in (Kim, Seong, & Hollnagel, 2006) and (Yang, Bonsall, Wall, Wang, & Usman, 2013) to update the Cognitive Reliability and Error Analysis Method (CREAM) (Hollnagel, 1998). Again, Bayesian networks replace parts of the CREAM structure to allow analysts to make predictions using imperfect information rather than having to explicitly define every aspect of the situation of interest.

In general, treatment of PSFs has been a fruitful area for integrating data into HRA. Groth and Mosleh (2012) illustrate how to use a combination of expert judgment and observed data to construct a Bayesian PSF Network. Starting with data from the Human Event Repository and Analysis (HERA) database (Hallbert, Whaley, Boring, McCabe, & Chang, 2006), they use Factor Analysis to identify Error Contexts (ECs)—combinations of PSFs that are likely to result in a failure event. These ECs are the basis for the network structure, which is then populated with probability tables based on the HERA data and, when necessary, expert judgment. The resulting network of Human Failure Event (HFE) conditional probabilities can be used to quantify the probability of an HFE in a given scenario S if the likelihood of each relevant EC in the scenario is known:

$$p(\mathrm{HFE} \mid S) = \sum_i p(\mathrm{HFE} \mid EC_i) \, p(EC_i \mid S) \qquad (1.1)$$
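A worked instance of Eq. (1.1) with two hypothetical error contexts; all numbers are invented for illustration:

    # p(HFE | EC_i) and p(EC_i | S); hypothetical values.
    p_hfe_given_ec = {"EC1": 0.02, "EC2": 0.15}
    p_ec_given_s = {"EC1": 0.80, "EC2": 0.20}

    p_hfe_given_s = sum(p_hfe_given_ec[ec] * p_ec_given_s[ec] for ec in p_hfe_given_ec)
    print(p_hfe_given_s)  # 0.02*0.80 + 0.15*0.20 = 0.046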

Instead of relying on operational data, Musharraf et al. (2014) collected data in a simulator environment, manipulating PSFs and observing scenario outcomes. The simulated scenario is evacuating an offshore platform; the three PSFs are training, visibility (i.e. day or night) and scenario complexity (complex scenarios add hazards that block the fastest escape route). The researchers recorded four outcome metrics: successful evacuation, time to evacuate, backtracking time, and exposure to hazards. Based on this controlled experiment, they designed a simple BN that relates the presence or absence of the three manipulated PSFs to the outcome metrics, and populated it with the observed data.
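A toy conditional probability table in the spirit of that BN might look like the sketch below (only a few of the eight PSF combinations are shown, and every probability is invented):

    # P(successful evacuation | trained, daylight, complex scenario); invented values.
    cpt_evacuate = {
        (True, True, False): 0.99,
        (True, False, True): 0.90,
        (False, True, True): 0.75,
        (False, False, True): 0.60,
    }
    # Query: trained crew, night, complex scenario.
    print(cpt_evacuate[(True, False, True)])  # 0.90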


Finally, Sundaramurthi and Smidts (2013) use the IDAC HRA model as the basis for the structure of a complex PSF Bayesian Network, which is populated using data from a combination of simulator scenarios and real events. This is discussed in greater detail in Section 1.2.3.2, after a brief introduction to the IDAC HRA model.

The approaches discussed above use discrete Bayesian Networks with assigned node probabilities based on observed data.

An alternative Bayesian approach to integrating data into HRA is proposed in (Groth, Smith, & Swiler, 2014). The authors use data from the International Empirical HRA Study to update Human Error Probabilities in the SPAR-H HRA model. In SPAR-H, analysts specify a set of eight PSFs. Each PSF has three to five levels (poor, nominal, high, etc.), and each level has an associated HEP multiplier. After specifying each PSF level, the analyst combines the multipliers to compute a discrete, context-based HEP.
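The quantification step reads as a simple product. A minimal sketch follows, using the SPAR-H nominal HEP for action tasks and placeholder multipliers rather than the full published tables; SPAR-H also applies an adjustment factor when three or more PSFs are negative, which is omitted here for brevity.

    NOMINAL_HEP = 0.001  # SPAR-H nominal HEP for action tasks

    def spar_h_hep(multipliers, nominal=NOMINAL_HEP):
        # Multiply the nominal HEP by the multiplier for each PSF's assessed level.
        hep = nominal
        for m in multipliers:
            hep *= m
        return min(hep, 1.0)  # a probability cannot exceed 1

    # e.g. high stress (x2), moderately complex task (x2), other six PSFs nominal (x1):
    print(spar_h_hep([2, 2, 1, 1, 1, 1, 1, 1]))  # 0.004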

The authors identify four error contexts—sets of PSF conditions—that are repeated in the International HRA Empirical Study. For each EC, they change the discrete SPAR-H HEP into a beta distribution with the expected value equal to the SPAR-H HEP. They update this distribution using data from the HRA study, a set of x failures in n attempts for each error context. The sample is small, no more than four attempts for each of the four error contexts. The posterior is a beta distribution of the probability of failure given the error context. The results are listed in Table 1.2.


Table 1.2: Bayesian update to SPAR-H HEPs for four error contexts (Groth, Smith, & Swiler, 2014).

Context  SPAR-H HEP  Prior Beta Distribution   Observed Data (x/n)  Posterior Beta Distribution  Posterior Mean
A        0.001       p0 ~ Beta(0.5, 499.5)     0/4                  p1 ~ Beta(0.5, 503.5)        0.000992
B        0.167       p0 ~ Beta(0.5, 2.4975)    1/4                  p1 ~ Beta(1.5, 5.4975)       0.214
C        1.00        p0 ~ Beta(50000, 0.5)     4/4                  p1 ~ Beta(50004, 0.5)        ~1.00
D        0.0001      p0 ~ Beta(0.5, 4999.5)    0/3                  p1 ~ Beta(0.5, 5002.5)       0.0001
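The update itself is a conjugate beta-binomial step; the sketch below reproduces the Context B row of Table 1.2.

    def beta_update(alpha, beta, failures, attempts):
        # Posterior Beta parameters after observing `failures` in `attempts`,
        # where the Beta random variable is the failure probability (HEP).
        return alpha + failures, beta + (attempts - failures)

    a0, b0 = 0.5, 2.4975            # prior mean a0/(a0+b0) = 0.167, the SPAR-H HEP
    a1, b1 = beta_update(a0, b0, failures=1, attempts=4)
    print(a1, b1, a1 / (a1 + b1))   # Beta(1.5, 5.4975), posterior mean ~0.214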

In general, the data affirm the SPAR-H HEPs. For three of the four cases, the posterior mean is consistent with the prior mean. In the fourth error context (Context B), the posterior mean increases slightly over the prior mean, but the shape of the distribution narrows, resulting in increased confidence that the HEP is close to the mean value of the distribution.

This approach is similar to the method used in Chapter 5, although in Chapter 5 the emphasis is on estimating PSFs—that is, specifying the error context—rather than the HEP itself.

Groth et al. reject concerns that simulator data are not representative of actual accidents, arguing that the HEP is the probability of error given a specific context, and that the error context is replicated in the simulator environment. This is certainly valid for PSFs such as time available or procedure quality, but the discussion of simulator bias in Chapter 4 shows that motivational factors may play a role as well. Bias could be incorporated into the Bayesian HEP update by using a different value from the distribution as the HEP—the 75th percentile instead of the mean, perhaps—to account for the expected HEP variation between the simulator environment and a real control room. Experiments such as the one described in Chapter 4 could be used to calibrate a shift in simulator-based HEPs for real-world events.
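Continuing the Context B example, such a percentile can be read directly from the posterior beta distribution; choosing it instead of the mean builds in a margin for the simulator-to-control-room shift.

    from scipy.stats import beta

    posterior = beta(1.5, 5.4975)   # Context B posterior from Table 1.2
    print(posterior.mean())         # ~0.214, the usual point estimate
    print(posterior.ppf(0.75))      # 75th percentile: a deliberately conservative HEP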

1.2.3 A Test Platform: The Information-Decision-Action Crew (IDAC) HRA Model

As with many of the studies mentioned above, this dissertation focuses on updating the treatment of PSFs in an HRA model. The HRA model used as a test case is the Information Decision Action Crew (IDAC) HRA model. In Chapter 5, an objective PSF model derived from the IDAC model is updated using subjective, student-reported PSF data. IDAC is also used to generate the virtual operator data in Chapter 3.

1.2.3.1 About IDAC

IDAC is one of the few HRA models designed specifically for dynamic PRA. IDAC was calibrated using data from the Halden study (Coyne, 2009) and is being used in the first dynamic Level 2 PRA simulations (LaChance, et al., 2012).

IDAC is essentially a virtual operator model that simulates operator cognition in three primary modules. The Information Module simulates the information an operator receives from the plant, the environment and from other operators. The Decision Module models the operator’s decision making process based on the information available and the operator’s knowledge. The Action Module models any actions the operator chooses to take in the Decision Module (including actions such as seeking additional information).

The model simulates the cognition of two operators, a Decision Maker (DM) and an Action Taker (AT), mimicking a typical control room crew: a senior reactor operator who reads procedures and directs crew activities, and reactor operators who interact directly with the plant main control board.

The operator’s Mental State influences all three modules. The Mental State incorporates individual operator characteristics, operator knowledge, and PSFs such as perceived time constraint and cognitive task load.
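A toy, runnable sketch of that Information-Decision-Action loop may help fix the structure. This is not the ADS-IDAC code; the cue, threshold, and plant response below are invented.

    def information(plant_state, attended):
        # The operator perceives only the indicators attention allows.
        return {k: v for k, v in plant_state.items() if attended.get(k, True)}

    def decision(perceived):
        # A procedure-like rule: respond to a low pressurizer level cue,
        # otherwise keep monitoring (or seek more information).
        if perceived.get("pzr_level", 1.0) < 0.2:
            return "start_charging_pump"
        return "monitor"

    def action(choice, plant_state):
        if choice == "start_charging_pump":
            plant_state["pzr_level"] += 0.05  # crude plant response
        return plant_state

    state = {"pzr_level": 0.15, "rcs_pressure": 0.9}
    for _ in range(3):
        state = action(decision(information(state, {})), state)
    print(state)  # pressurizer level recovers as the loop repeats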

IDAC is coupled with the Accident Dynamic Simulator (ADS) (Hsueh & Mosleh, 1996), a computer code developed for full scale dynamic PRA of NPPs. ADS models plant thermal hydraulics, safety systems and operator interactions. The ADS PRA uses discrete dynamic event trees (D-DET) to capture different scenario outcomes. IDAC—also referred to as ADS-IDAC—adds operator interactions to the dynamic PRA code.

IDAC is based on theory and research, incorporating insights from cognitive psychology, neuroscience, human factors, behavioral sciences and expert judgment with insights from first and second generation HRA models (Chang & Mosleh, 2007). IDAC simplifies the task of modeling human cognition by assuming that operator actions are dictated by training, procedures, and standardized responses to specific cues. This assumption is based on the discipline and training required of professional NPP operators. The resulting model is built on a set of written and mental procedures that map out operator actions throughout the scenario (Li, 2013).

IDAC was originally known as IDA, the Information-Decision-Action HRA model (Smidts, Shen, & Mosleh, 1997). The name became IDAC (IDA-Crew) when a crew module was added to model different operator roles. The model has evolved over the past twenty years, as illustrated in Figure 1.1.

Figure 1.1: The evolution of the IDAC HRA model. IDAC editions that are used in this research are shaded gray.

Two versions of IDAC are relevant to this research: the IDAC-I modification (Sundaramurthi & Smidts, 2013), and the most recent IDAC update (Li, 2013). These are highlighted in Figure 1.1.

The first relevant IDAC model is IDAC-Improved (IDAC-I), a variation proposed in (Sundaramurthi & Smidts, 2013) that outlines two modifications to the IDACrew model (Chang & Mosleh, 1999). First, IDAC-I suggests adding a mental model representing the operator’s understanding of the plant conditions. The mental model is a simplification of the full plant, meant to capture the heuristics and generalizations that operators use to manage such a complex system. Second, IDAC-I derives a causal PSF Bayesian Network (BN) from IDACrew and demonstrates how to populate the BN with accident and event data. This is referred to as the “Full PSF Network” in this research. The Full PSF Network is the starting point for the research in Chapter 5 of this dissertation.

The second version of IDAC is the most recent update to the IDAC model, referred to in this research as IDAC 3.0 (Li, 2013). IDAC 3.0 builds on IDAC 2.0 (Coyne, 2009), which added new branch points to the IDAC code and implemented a decision making engine in the model. IDAC 2.0 also calibrated aspects of the IDAC model with data from the International HRA Empirical Study. IDAC 3.0 adds a Reasoning Module to IDAC 2.0. As part of the Reasoning Module, IDAC 3.0 updates and simplifies the PSF network and integrates the PSFs into the Reasoning Module.1 In this dissertation, the treatment of PSFs in IDAC 3.0 is referred to as the “Simplified PSF Network.” The IDAC student-operator simulations in Chapter 3 are generated using IDAC 3.0, and the Objective PSF equations in Chapter 5 are derived from the IDAC 3.0 Simplified PSF Network.

1 In IDAC nomenclature, PSFs are referred to as Performance Influencing Factors (PIFs). We call them PSFs because this term is more common in the HRA community.

1.2.3.2 The Full PSF Network

The Full PSF Network proposed in IDAC-I is a network of 48 factors that influence operator decision making. The Full PSF Network includes PSFs, plant parameter data, and details from the evolution of the event (for example, the number of diagnoses an operator has made before making the current diagnosis of the plant state). The Full PSF Network is illustrated in Figure 1.2. Six PSFs in the Full PSF Network are classified as “Major PSFs,” i.e., PSFs that have a significant impact on operator performance: Global Condition Assessment (GCA), Time Constraint Load (TCL), Task Related Load (TRL), Passive Information Load (PIL), Attention to Current Task (ACT), and Stress (S). These PSFs are defined in Appendix A. The Major PSFs are circled in red in Figure 1.2. GCA influences TCL and TRL; TCL, TRL and PIL influence Stress; and Stress influences “Response by operator,” i.e., operator success or failure.


Figure 1.2: The Full PSF Network. Major PSFs are circled in red; figure modified from (Sundaramurthi & Smidts, 2013).


As a first step towards validating the causal relationships in the Full PSF Network, Sundaramurthi populated as much of the network as possible with empirical data from the HERA database. The data were collated and categorized using Groth's PSF hierarchy (Groth, 2009), which is built on a set of 144 accidents and events recorded in the HERA database that involved human error. Only 16 of the 48 nodes in Figure 1.2 can be populated with the HERA data; there is no information about the other nodes in the HERA database. Several elements of the Full PSF Network only make sense for dynamic evaluations (for example, the number of failed diagnoses); these and other elements of the network are not recorded in the HERA database.

Sundaramurthi created a probabilistic Bayesian Network from the sixteen "visible" nodes of the Full PSF Network. To populate the BN, each event in the HERA database was reviewed and each of the sixteen nodes was tagged as either "present" or "absent." Factors that are "present" increase the likelihood of human error; i.e., TCL is "present" if TCL is believed to have had an adverse effect on the operator. Table 1.3 is an example of a HERA event and the associated PSF tags. The populated BN is shown in Figure 1.3; the sixteen PSFs are defined in Appendix A. PIL is the only Major PSF not included in the data-populated BN; PIL is excluded because the effect of passive information was not recorded in the available event reports in the HERA database.


Table 1.3: Sample HERA event description with associated PSF tags.

Event Description: With falling RCS pressure, operators perform three successive power bumps in an attempt to raise reactor pressure while incorrectly believing there was an RCS cooldown.

PSF                              Present? (Y/N)
Time Constraint Load             Y
Stress                           Y
Task Related Load                N
Training                         Y
Group Cohesiveness               Y
HMI                              Y
Control Room Distractions        N
Global Condition Assessment      N
Procedure Quality/Importance     Y
Attention                        N
Favorable Operation Schedule     Y
Operator Status                  N
Safety Culture                   Y
Failed Diagnosis                 N
Failed Strategies/Actions        N
Operator Ascendancy              Y

The final node in the PSF BN is "Human Error," marked as "98% Present" because the HERA database records adverse events. In over half of the human error events in the HERA database, Stress is "present."
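To make this population step concrete, the following sketch (Python; the tagged events are hypothetical stand-ins for HERA entries, and the tallying is a simplified illustration of the quantification in (Sundaramurthi & Smidts, 2013)) computes the fraction of human error events in which each factor is "present":

```python
from collections import Counter

# Hypothetical tagged events in the style of Table 1.3: each event maps
# a visible node to True ("present") or False ("absent").
# Only a few of the sixteen nodes are shown for brevity.
events = [
    {"Time Constraint Load": True, "Stress": True, "Task Related Load": False},
    {"Time Constraint Load": False, "Stress": True, "Task Related Load": False},
    {"Time Constraint Load": True, "Stress": False, "Task Related Load": True},
]

present = Counter()
for event in events:
    present.update(psf for psf, tagged in event.items() if tagged)

# Fraction of human-error events in which each factor is "present":
# the marginal probabilities used to populate the BN nodes.
for psf, count in present.items():
    print(f"P({psf} present | human error) = {count / len(events):.2f}")
```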


Figure 1.3: The 'visible' portion of the Full PSF Network, with Present/Absent probabilities from the HERA data. Again, Major PSFs are highlighted in red. Figure modified from (Sundaramurthi & Smidts, 2013).


1.2.3.3 The Simplified PSF Network

The Simplified PSF Network is part of the IDAC 3.0 Reasoning Module, which reworks and simplifies the Full PSF Network. In the Full PSF Network, plant conditions and other contextual factors lead to adverse PSFs, which lead to human error. The Simplified PSF Network instead quantifies PSFs from a set of PSF surrogates and operator characteristics. Rather than tying PSFs directly to errors, the Reasoning Module links the quantitative PSFs to decision making factors, which in turn lead to operator success or operator error. There are four types of components in the Reasoning Module:

• PSF Surrogates: Factors that are used to calculate or estimate PSF values.
• Quantitative PSFs: PSFs that are quantified as a function of the PSF surrogates and, in some instances, of operator characteristics.
• Operator Characteristics: Operator-specific characteristics that impact how the operator handles new information, makes decisions, and identifies accidents.
• Decision Making Factors: Factors that impact an operator's reasoning and decision-making.

The relationships between these factors are illustrated in Figure 1.4. Each factor is defined in Appendix A. The Simplified PSF Network is the set of quantitative PSFs and the PSF surrogates and operator characteristics used to quantify the PSFs.


Figure 1.4: The Simplified PSF Network and associated surrogates, operator characteristics and manifestations (from (Li, 2013)).
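As a minimal illustration of how these component types fit together, the sketch below (Python) computes a toy quantitative PSF from a surrogate and an operator characteristic. The function, its linear form, and the names are illustrative assumptions, not the ADS-IDAC implementation:

```python
from dataclasses import dataclass

@dataclass
class OperatorCharacteristics:
    """Operator-specific traits (illustrative subset)."""
    expertise: float  # 0.0 (novice) to 1.0 (expert)

def quantitative_ctl(task_difficulty: float,
                     operator: OperatorCharacteristics) -> float:
    """Toy Cognitive Task Load: a PSF surrogate (procedure task difficulty)
    moderated by an operator characteristic (expertise).

    The functional form here is a placeholder; IDAC 3.0 defines its own
    algorithms, which are discussed in Chapter 5.
    """
    ctl = task_difficulty * (1.0 - 0.5 * operator.expertise)
    return min(max(ctl, 0.0), 1.0)  # quantitative PSFs are bounded on [0, 1]

# Example: a difficult step (0.8) for a mid-expertise operator (0.5)
print(quantitative_ctl(0.8, OperatorCharacteristics(expertise=0.5)))  # 0.6
```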

Four of the six Major PSFs from the Full PSF BN are retained in the Simplified PSF Network: TCL, PIL, TRL (renamed Cognitive Task Load, CTL2) and Stress. The other two Major PSFs are incorporated into other elements of the Reasoning Module. GCA is the operator's assessment of the plant condition; in the Full PSF Network, GCA is influenced by plant parameters and trends, which are PSF surrogates in the IDAC 3.0 Reasoning Module. ACT is the operator's attention to the current task; IDAC 3.0 replaces this metric with a probability that the operator's attention will shift, and the decision making factor "Attention Span" quantifies the operator's readiness to shift attention to another task or topic.

2 TRL is renamed CTL to reflect the nature of an operator's work; an operator's tasks are cognitive, not physical, hence Cognitive Task Load instead of Task Related Load (Li, 2013).


Two new PSFs are introduced in the Simplified PSF Network: Task Complexity (TC) and Fatigue (F). The reasoning model calculates Fatigue as a function of CTL, Stress, and time into the accident. Fatigue is not addressed in this dissertation because it is not expected to change significantly over the short timeframe of the OSU simulator sessions; information about operator fatigue was not collected in the half-hour to hour-long simulator sessions conducted for this research.

Figure 1.5: The Major PSFs in the Full PSF Network (left) and the Simplified PSF Network (right). Items in bold are present in both versions of the PSF network. Rectangles represent PSFs; ovals are factors that are classified as Major PSFs in the other version of the network.

Quantitative TC is determined by the system dynamics and the operator's confusion. In the Full PSF Network, cognitive factors related to operator confusion are distributed throughout the network. These factors contribute to three Major PSFs: they are direct inputs to TCL and CTL, and they are indirect inputs to Stress (through TCL and CTL). In the simplified network, all the cognitive effects are consolidated in TC. Stress is affected by these factors through TC, but the other Major PSFs are not.

Table 1.4 maps the factors that influence the Major PSFs in the Full PSF Network to the factors that are used to calculate the PSFs in the Simplified PSF Network.

Table 1.4: Comparison of the Major PSFs in the Full PSF Network and the Simplified PSF Network.

PIL
  Full PSF Network: Alarms (# of active alarms, importance of active alarms, perceived # of alarms, perceived importance of alarms, alarm rate); operator ascendancy
  Simplified PSF Network: # active alarms

TCL
  Full PSF Network: All alarm inputs; operator ascendancy; Global Condition Assessment (GCA, determined by plant parameters); number of failed strategies*; number of failed diagnoses*; confidence in diagnosis & action package*
  Simplified PSF Network: Plant parameters (GCA)

CTL (referred to as Task Related Load in the Full PSF Network; rechristened Cognitive Task Load because all the operator tasks are cognitive, not physical)
  Full PSF Network: Alarm inputs; operator ascendancy; GCA; number of failed strategies*; number of failed diagnoses*; confidence in diagnosis & action package*; working conditions (control room distraction, HMI, operating schedule)
  Simplified PSF Network: Procedure task difficulty; operator expertise

TC
  Full PSF Network: Not identified as a single value; dispersed
  Simplified PSF Network: System dynamics (plant parameters); operator confusion

Stress
  Full PSF Network: PIL, TCL, CTL
  Simplified PSF Network: PIL, TCL, CTL, TC

*These factors are replaced by TC in the Simplified PSF Network.


2 OSU NPP Simulator Experiment Overview

The data used in this research were collected in the OSU NPP Simulator Facility during three iterations of a NPP Systems and Operations Course taught in the facility from 2013 to 2015. This chapter describes the OSU NPP Simulator Facility, including the simulator setup and the data collection tools (Section 2.1), the training the student operators received (Section 2.2), the accident scenarios used in the experiments (Section 2.3), the data collected from the student operators (Section 2.4), and a list of key terms in this dissertation (Section 2.5). Finally, Section 2.6 reviews a typical crew's progress through an accident scenario.

2.1 OSU NPP SIMULATOR FACILITY

The OSU Nuclear Power Plant Simulator Facility is a full-scope, partial-scale digital simulator of a commercial power plant control room (Figure 2.1). Currently, OSU uses GSE Systems's generic Pressurized Water Reactor (GPWR) simulator (GSES, 2015). The simulated plant is a three-loop, 1000 MW Westinghouse Pressurized Water Reactor (PWR).


The GPWR runs on a server in the simulator facility, where it can be accessed simultaneously by four student computer workstations. Digital displays mimic the analog instrumentation and controls found in conventional control rooms.

The OSU NPP Simulator Facility is equipped with two video cameras and a microphone to record activity in the simulator facility. The recording equipment is coupled with Noldus Observer XT data collection and coding software, a key element of the data collection scheme for this project (Noldus, 2017).

2.1.1 OSU Simulator Facility setup

Analog control panels are visually recreated on interactive computer displays. NPP control panels are displayed across four workstations, each equipped with two touchscreen displays.

Figure 2.1: Student operators in the OSU NPP Simulator Facility

Approximately fifty percent of the main control board panels can be displayed on the workstations at any given time. Although workstations are designated for specific areas in the control room (e.g., one workstation is primarily used for the pressurizer panel), operators can access any panel on the main control board from any workstation. In addition to the student workstations, there are eight displays located next to the workstations that can be customized to display any panels relevant to the task operators are performing. Typically, these are used to display radiation monitors, the rod position indicators, and recorder panels tracking variables such as Reactor Coolant System (RCS) temperature, RCS pressure, and containment pressure.

Although the simulator software is a full-scope digital model of a commercial nuclear power plant, there are several disparities between the original plant and the digital simulator. The simulator used in this study projects images of the analog panels onto a series of computer monitors. All the controls are accessible to the operator, but they are arranged and accessed differently and are manipulated through touch screens or a mouse and keyboard. Importantly, the commercial plant's digital displays are not included in the simulator, so operators do not have any of the supplemental information that professionals rely on for quick reference.

In 2015, the facility was updated to the latest GPWR software. This upgrade included the addition of the Large Screen Overview Display (LSD, see Figure 2.2) developed for the GPWR by the Institute for Energy Technology (Hakon, n.d.).


Figure 2.2: One of the Large Screen Displays showing the Chemical and Volume Control System, the Pressurizer, and the three Steam Generators, along with numerous other indicators.

The LSDs provide at-a-glance information about most of the critical parameters in the NPP. Both the Senior Reactor Operator (SRO) and the Reactor Operator (RO) can view the LSD. The LSDs also provide trends and summary information not available on the Main Control Board.

2.1.2 Human Behavior Data Collection

The GPWR software tracks plant parameters such as pressures, levels, temperatures, etc., as well as component states (valve status, pump status, etc.). Alarms, operator actions on the Main Control Board, and instructor actions (such as inserting a malfunction) are also recorded.

These logs are supplemented by audio and video recordings and observer logs made using the Noldus Observer XT software. Figure 2.3 is a screenshot of the two-angle video capture system. In the current configuration, a researcher sits at the observer station inside the simulator facility, out of the way of the operators, to log events as they occur. The result is a time-stamped log of the current procedure, procedure step, and other items of interest.

Figure 2.3: The data collection system captures two views of the OSU NPP Simulator Facility, where a researcher interacts with the GPWR program

• Live coding: During the scenario, the Noldus software allows the researcher to code events as they occur. The researcher makes notes during the scenario on the operator actions as well as operator discussions and points of confusion. These notes are timestamped so they can be synced with the simulator logs after the scenario. A simple coding scheme was developed to assist live coding of simulator sessions. The types of events recorded are: Start New Procedure, Start New Procedure Step, Monitor Parameters, and Other. Additional event types that would be useful to add are Start Wrong Procedure/Wrong Procedure Step. After coding the type of event with a quick keyboard shortcut, the researcher can choose a sub-event (e.g., select the name of the procedure from a list) and add comments describing the event. In this research, many of the comments are bits of conversation between the operators that characterize the moment. This record becomes the basis of a scenario narrative that can be referenced in the analysis. Figure 2.4 is a screenshot of the timestamped log generated during the scenario. Most of the comments are direct quotes from the student operators.

• Scenario review: Immediately following the scenario, operators review a video of the event and evaluate relevant PSFs. They also complete a survey about how they prioritized alarms, the conditions in the simulator facility during the scenario, their mental state, and their perceptions of bias due to the simulator environment (Shirley, Smidts, Wang, & Gupta, 2014). For the PSF evaluation, the researcher pauses the video at various points throughout the scenario and asks the operators to evaluate the PSF levels they experienced at that point in the scenario. The Observer XT software is used in the scenario review to timestamp the PSF evaluation points and note the scenario context for each PSF evaluation.
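For illustration, a live-coded observer log entry might be represented as follows (Python; the field names and example entries are hypothetical, not the Noldus Observer XT export format):

```python
from dataclasses import dataclass
from enum import Enum

class EventType(Enum):
    """Event types in the live coding scheme."""
    START_NEW_PROCEDURE = "Start New Procedure"
    START_NEW_PROCEDURE_STEP = "Start New Procedure Step"
    MONITOR_PARAMETERS = "Monitor Parameters"
    OTHER = "Other"

@dataclass
class CodedEvent:
    """One timestamped entry in the observer log."""
    time_s: float        # seconds since scenario start
    event: EventType
    sub_event: str = ""  # e.g., the procedure name chosen from a list
    comment: str = ""    # often a direct quote from the operators

# Hypothetical entries from an SGTR scenario:
log = [
    CodedEvent(45.0, EventType.START_NEW_PROCEDURE, "APP 010",
               "Radiation alarm -- check the monitors."),
    CodedEvent(310.0, EventType.START_NEW_PROCEDURE_STEP, "AOP 16, Step 4",
               "The leak is too big for the VCT."),
]
```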

Figure 2.4: Example of the data collected during a simulator session in the Noldus live coding system.


2.2 OSU NPP SYSTEMS AND OPERATIONS COURSE

The OSU NPP Systems and Operations Course was developed in tandem with this research. As part of the course, each student is assigned two NPP systems to study and present to the class. After learning about a system in class, students go into the simulator to identify the key controls and learn the practical aspects of that system in the NPP. When feasible, student operators complete a short exercise in the simulator related to that week's topic. Table 2.1 lists the weekly class topics.

Table 2.1: NPP Systems and Operations Course syllabus.

Topic
  Introduction to Pressurized Water Reactors (PWRs) & Introduction to Design Basis Accidents
  Reactor Coolant System Instrumentation
  Excore Nuclear Instrumentation
  Pressurizer Pressure Control System
  Pressurizer Level Control System
  Chemical & Volume Control System
  Main and Auxiliary Steam System
  Condensate and Feedwater System
  Steam Generator Level Control/Steam Dump Control System
  Main Turbine and Moisture Separator Reheater
  Rod Control
  Auxiliary Feedwater System
  Reactor Protection System
  Design Basis Accidents: Steam Generator Tube Rupture
  Design Basis Accidents: Loss of Coolant Accident
  Design Basis Accidents: Steam Line Break

Students also study three design basis accidents: the steam generator tube rupture (SGTR), the loss of coolant accident (LOCA), and the steam line break (SLB). After presenting these accidents in lecture, students review the emergency operating procedures (EOPs) and practice responding to these classic accident scenarios. They memorize normal levels for key plant parameters, learn to assess the size and significance of an RCS leak, and follow procedure guidance to trip the reactor, isolate a malfunctioning steam generator, and ensure safety injection is maintaining a safe level in the reactor.

2.2.1 Using Students as Research Subjects

The Ohio State University Institutional Review Board (IRB) approved the research protocol and the research subject consent process. Students in the NPP Systems and Operations course were not required to participate in the experiment, and students were guaranteed that participation in the research would not impact their course grade. In research studies with human subjects, personal identifying information is typically removed from the data so that results cannot be tied to the research subjects. In the case of the video logs, this confidentiality was not possible. Instead, as part of the consent form, student operators designated acceptable uses of their videos and associated images: research publications, educational materials, promotional material, or none of these.

2.3 SIMULATOR SESSIONS

Three preliminary studies were conducted for this research (summarized in Table 2.2). In each study, students participated in the Nuclear Power Plant Systems and Operations course. Data from the second and third studies are used in this analysis; data from the first study are not included because there were significant changes to the operator training program and the experimental design following the initial data collection in 2013.


In the simulator experiments, students worked in two-person crews. Each crew had a Senior Reactor Operator (SRO), responsible for reading procedures and directing control room activities, and a Reactor Operator (RO), responsible for reading control board indications and performing actions as directed by the SRO. These roles correspond to the Decision Maker (DM) and Action Taker (AT) in the IDAC model. Simulator sessions were recorded on video. After each simulator session, operators completed a post-scenario survey to assess the bias during the session.

Table 2.2: OSU NPP Simulator Experiments

Year: 2013³
Crews: 6 crews (3 students each; each student participated in 2 crews)
Summary: Counterbalanced, within-subjects design. Each crew completes a simple and a complex version of each scenario, switching roles (SRO/RO) between the two versions. Scenarios:
  Steam Generator Tube Rupture (SGTR); SGTR in two loops
  Loss of Coolant Accident (LOCA); LOCA + reactor coolant pump trip
  Steam Line Break (SLB); SLB + stuck-open main steam isolation valve (MSIV)

Year: 2014⁴
Crews: 14 crews (14 students, each in 2 crews with different partners)
Summary: Between-subjects design. Each student completes a simple and a complex version of the SGTR. Crews are randomly assigned; students do not have the same partner for the two scenarios. Scenarios:
  SGTR (simple); larger SGTR + MSIV stuck open (complex)

Year: 2015
Crews: 8 crews (16 students, each in 1 crew with the same partner throughout)
Summary: Mixed-subjects design. 2 phases: practice (2 scenarios, ungraded) and exam (2 scenarios, graded). 2 treatments: Simple (4 crews, only simple scenarios) and Complex (4 crews, simple and complex scenarios). Scenarios:
  SGTR (simple); SGTR + SLB (complex)
  LOCA (simple); LOCA preceded by stuck-open pressurizer PORV (complex)
  Treatment 0, Practice: SGTR, LOCA; Exam: LOCA, SGTR
  Treatment 1, Practice: SGTR, Complex LOCA; Exam: LOCA, Complex SGTR

3 (Benish, Smidts, & Boring, 2013)
4 (Shirley, Smidts, Boring, Mosleh, & Li, 2015); (Wang, Benish, & Smidts, 2014)


2.4 DATA COLLECTION

Data collected in each iteration of the experiment include:

• Operator background survey responses

• Course grades and class work from the NPP Operations course for each student

• Post-scenario surveys following each simulator scenario

• Video/audio recordings of each scenario

• Dynamic PSF assessments of each scenario

• Plant parameters, alarm logs, and operator action logs recorded by the simulator software

Prior to beginning the experiment, operators completed the Operator Background Questionnaire (see Appendix B), providing information about operator personality and baseline stress. Static data were expected to be consistent throughout the simulator session and were collected through a post-scenario questionnaire. Dynamic PSF data evolved throughout the accident. To evaluate dynamic PSFs, students remained in the simulator facility to watch a video of the scenario and evaluate their experience. Researchers paused the video at ten pre-determined procedure steps and asked operators to evaluate their PSFs on a chart provided.

Table 2.3 summarizes the data collected in the OSU NPP Simulator Experiment.


Table 2.3: Data collected in the OSU NPP Simulator Experiment

Data Source: Simulator Logs
Data Collected: Plant parameters; alarms; operator actions on the main control board (closing valves, silencing alarms, etc.)
Frequency: Continuous

Data Source: Observer Logs
Data Collected: Operator actions; current procedure step
Frequency: Discrete timestamped evaluations throughout the scenario: new procedure, new procedure step, unexpected event

Data Source: Operator Background Questionnaire
Data Collected: Operator baseline stress; operator personality
Frequency: Once for each operator

Data Source: Post-Scenario Questionnaire
Data Collected: Operator-reported simulator bias; operator-reported static PSFs
Frequency: Once per scenario

Data Source: PSF Logs
Data Collected: Operator-reported PSF data
Frequency: Discrete timestamped evaluations throughout the scenario (10-12 evaluations); see Section 2.4.2

2.4.1 Simulator Bias and Static PSF Data

After evaluating the dynamic PSFs, operators completed the post-scenario questionnaire, a mix of yes/no, scaled, and open-ended response questions written using the Survey Writing Checklist (Table 2.4) developed for this research based on survey design best practices (Bradburn, Sudman, & Wansink, 2004). The static data questionnaire collected information on two topics: simulator bias and static PSFs. Figure 2.5 shows a small portion of the post-scenario questionnaire.


Figure 2.5: Example of the Post-Scenario Questionnaire, showing two questions about simulator bias.

The full post-scenario questionnaire is in Appendix B.

Table 2.4: The Questionnaires Checklist

Questions about Questions:
□ What do I want to know? Why am I asking this question? Does this question ask what I want to know?
□ Is this question threatening in any way?
□ How could this question be (mis)interpreted?
□ Is this question specific?
  o 5 Ws: who, what, when, where, why
  o How could this question be more specific?
□ What words in this question are...?
  o Unnecessary
  o Ambiguous
  o In need of further clarification
□ Does this question use and/if/or/not/but/etc. appropriately?
□ Does this question require further explanation?
□ Does this question ask only one question, along only one dimension?
  o Does this question differentiate between opinion and strength of opinion?
  o Double-barreled/"one and a half barreled" question?
□ How does this question discourage over-reporting/under-reporting, if appropriate?
□ Behavior questions:
  o Does this question ask about frequency or likelihood? (frequency for frequent behavior, likelihood for less frequent activities)
□ Is this question bounded by another question?

Questions about Answers:
□ Checkboxes: are options...?
  o Exhaustive
  o Mutually exclusive
  o Appropriately broad/narrow
  o Evenly distributed
□ Does ranking/scale go from low to high/bad to good/left to right/bottom to top?
□ Numerical ranges: is zero a separate category?
□ Scales:
  o Are scales strongly anchored at each end?
  o Do anchors represent the end of the scale?
  o 7 points in each scale?

Questions about Questionnaires:
□ Human factors considerations:
  o Font
  o Layout
  o Separate lines for separate questions
  o Questions and answers logically displayed
□ Is the question flow logical?
□ Are threatening questions situated thoughtfully?
□ How are previous questions likely to influence responses to later questions?
□ Are demographic questions included at the end?
□ Do general questions precede specific questions?


2.4.2 Dynamic PSF Evaluation

Two versions of the dynamic PSF evaluation chart were used in the OSU NPP experiments (Figure 2.6 and Figure 2.7). The second version, introduced in 2015, added new factors that were not included in the 2014 version. In both versions, students rated the PSFs on a scale from 0 (minimum) to 10 (maximum). Although survey writing best practices suggest scaled questions should have five to seven levels, a 0-10 scale was selected to map easily to the IDAC PSF model, which rates PSFs from 0.0 to 1.0 in increments of 0.1.

Operators were provided with a copy of the definitions of the PSFs and practiced the evaluation process several times during training before beginning the experiment. Researchers had the option to pause the video at other points in the scenario to collect additional PSF data, resulting in at least ten PSF evaluations per scenario.
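A minimal sketch of the scale mapping described above (Python; assuming simple division, as the dissertation does not specify the conversion routine):

```python
def to_idac_scale(rating: int) -> float:
    """Map a student rating on the 0-10 chart to the IDAC 0.0-1.0 PSF scale."""
    if not 0 <= rating <= 10:
        raise ValueError("PSF ratings must be between 0 and 10")
    return rating / 10.0

assert to_idac_scale(7) == 0.7
```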

The PSF evaluations occur at the points listed in Table 2.5.

The PSF evaluation points in Table 2.5 are a mixture of familiar, routine procedure steps (e.g., the E-0 Foldout and E-0 steps 12 and 21) and important cognitive moments in the scenario. Additional PSF evaluations are done at the researcher's discretion, typically at points in the scenario when operators were confused or deviating from the expected path. Section 2.6 outlines the expected path through the procedures.


Table 2.5: Scenario PSF evaluation points. Additional evaluations at the researcher's discretion.

Procedure                                    Activity
No procedure                                 Pre-alarms
Annunciator Panel Procedures (APPs)          Response to first alarms
Abnormal Operating Procedures (AOPs)         Determine size of RCS leak
Enter Emergency Operating Procedure E-0      Reactor trip
E-0 Foldout                                  Monitor parameters
E-0, Step 12                                 Isolate Main Steam
E-0, Step 21                                 Stabilize RCS temperature
E-0, Steps 26-30                             Diagnosis
[scenario dependent]                         Initial accident-specific steps
[scenario dependent]                         End of scenario

Figure 2.6: The 2013-2014 PSF Evaluation Log

Figure 2.7: The updated 2015 PSF Evaluation Log


Table 2.6 lists the guidelines operators received to anchor their PSF ratings.

Table 2.6: Guidelines given to the student operators for the PSF evaluations

MINIMUM  0 – 1 – 2 – 3 – 4 – 5 – 6 – 7 – 8 – 9 – 10  MAXIMUM

Stress: How stressed are you feeling right now?
• 0: No Stress
• 5: Moderate amount of Stress
• 10: Completely overwhelmed

Global Condition Assessment: What is your assessment of the plant condition right now?
• 0: Normal operations, no problems
• 5: Accident conditions
• 10: Core melt

Time-Constraint Load: Is there sufficient time available to complete all the required tasks?
• 0: Plenty of time
• 5: If I work efficiently, time is not a concern
• 10: No time available; completely impossible to complete tasks in time

Passive Information Load: How much information are you receiving (from extraneous alarms) that is not related to the work you are doing?
• 0: None
• 5: Some alarms are sounding, but I can process them easily
• 10: The alarms are completely overwhelming; it is impossible to focus on the task at hand

Attention to Current Task: What fraction of your cognitive energy is being directed toward your current task?
• 0: None; I'm thinking about something else entirely
• 5: Average
• 10: All my attention is focused on this task

Cognitive Task Load: How difficult is the current task?
• 0: I do not have to think about it at all
• 5: Moderately difficult
• 10: Difficult, requires a lot of thinking and concentration

Complexity: How complex is the current plant situation?
• 0: The situation is simple
• 5: The situation is moderately complex
• 10: The situation is extremely complex

Confusion: How much do the current plant indicators confirm or contradict your current diagnosis of the plant state?
• 0: Nothing contradicts my understanding of the current plant condition
• 5: Some plant indicators contradict my understanding of the current plant state
• 10: Everything contradicts what I thought was happening in the plant

Confidence in Current Diagnosis: What is your confidence in your current diagnosis?
• 0: I have no idea what is happening in the plant
• 5: I am fairly confident I know what is happening in the plant, but I could be wrong
• 10: I know exactly what is happening in the plant

2.5 A NOTE ABOUT TERMINOLOGY: SIMULATORS AND SIMULATIONS

This research compares simulator data (data collected from human operators in a digital simulator) to simulation data (data generated by simulating human actions using the IDAC HRA model computer code). In this document, simulator always refers to the digital simulator facility, while simulation or simulated refers to data generated by the IDAC program. Table 2.7 defines other terms that may benefit from further clarification.


Table 2.7: Key terms used in this dissertation.

Simulator: OSU's digital NPP simulator facility; data collected or activities conducted in the facility.

Simulator session: One data collection session in the simulator facility. During a simulator session, a two-person operating crew responds to an accident scenario, evaluates their PSF levels, and completes the post-scenario questionnaire.

Observation: The set of data collected from one operator during a simulator session. An observation consists of ten or more sets of PSF evaluations, all from the same accident scenario.

PSF evaluation: The set of an operator's reported PSFs at a particular time in a scenario. There are ten or more PSF evaluations in an observation.

Scenario: The accident scenario (SGTR or LOCA) in a simulator session.

Simulation: IDAC-generated computer simulation of operator PSFs, cognition, actions, etc.

2.6 SCENARIO NARRATIVES

This section describes the expected events in an accident scenario and summarizes the OSU scenarios.

The OSU scenarios begin with a distractor task such as an unfamiliar maintenance procedure. After the accident begins, an alarm sounds. Operators use the Annunciator Panel Procedures (APPs) to respond to the alarm. The APPs require the operators to determine whether the RCS is leaking; if so, operators enter Abnormal Operating Procedure (AOP) 16, Excessive Primary Plant Leakage. Step 4 in AOP 16 directs operators to trip the reactor if the leak is too large to be managed by the Volume Control Tank. When the reactor trips, the operators enter Emergency Operating Procedure (EOP) E-0, Reactor Trip and Safety Injection. Following the procedure, the operators confirm that safety injection is functioning if needed and that electric power is available to the pumps. Then, still following the procedure, the operators work through a series of questions to diagnose the event. When the underlying event has been determined, the operators transition to the appropriate follow-up EOP: E-1 for a LOCA, E-2 for a Steam Line Break (SLB), or E-3 for a SGTR.

OSU scenarios terminate shortly after the transition to the second EOP. SGTR scenarios end after the ruptured SG is isolated, and LOCA scenarios end when the operators begin to prepare for long term cooling.

Figure 2.8 illustrates the expected progression through the scenario. In Figure 2.8, y is the procedure step index, which corresponds to the y-axis values ("Procedure Step #") in the Scenario Narrative Diagrams (Figure 2.9).


Figure 2.8: Expected OSU scenario events.

The Scenario Narrative Diagrams (Figure 2.9) plot the crew’s progress through the procedures over time. For AOP-16, E-0, E-1, E-2 and E-3, y corresponds to the step number in the procedure.

Negative Procedure Steps in Figure 2.9 and Figure 2.10 have three purposes (a sketch of this encoding follows the list):

- The step index is negative before the reactor trip. Each phase of the alarm response is assigned a specific negative step index corresponding to the order in which the alarms are activated. Typically, the first alarm is a radiation alarm, followed by pressurizer alarms. In the case of the leaky PORV prior to the full LOCA (the complex LOCA scenario in 2015), the Pressurizer Relief Tank alarm is the first alarm.
  -15 = Pressurizer Relief Tank alarms (APP 009)
  -10 = Radiation Alarms (APP 010)
  -5 = Pressurizer Alarms (APP 009)
  -12 = Automatic reactor trip
- The step index is negative after the reactor trip if the operators are in the wrong procedure (if they are on Step 3 in the wrong procedure, y = -3).
- The step index is y = -5 whenever operators are working without a procedure.

The negative Procedure Step indices repeat and must be inferred from context.
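The sketch referenced above (Python; the phase names and function are illustrative, not code used in the study) shows how a logged scenario event could be mapped to the step index y:

```python
# Fixed indices for the pre-trip alarm phases, as defined above.
PRE_TRIP_INDEX = {
    "pressurizer_relief_tank_alarms": -15,  # APP 009
    "radiation_alarms": -10,                # APP 010
    "pressurizer_alarms": -5,               # APP 009
    "automatic_reactor_trip": -12,
}

def step_index(phase: str, step: int = 0, in_correct_procedure: bool = True) -> int:
    """Return the Scenario Narrative Diagram index y for one logged event.

    Because the negative indices repeat (e.g., -5 is both "pressurizer
    alarms" and "no procedure"), in practice the meaning must be inferred
    from context.
    """
    if phase in PRE_TRIP_INDEX:
        return PRE_TRIP_INDEX[phase]
    if phase == "no_procedure":
        return -5
    # After the trip: positive step number if in the right procedure,
    # negative step number if in the wrong one.
    return step if in_correct_procedure else -step

print(step_index("radiation_alarms"))                                   # -10
print(step_index("E-0", step=12))                                       #  12
print(step_index("wrong_procedure", step=3, in_correct_procedure=False))#  -3
```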

Figure 2.9 diagrams the 2015 Crew Number 4 scenarios: Practice A (PA), Practice B (PB), Exam A (EA) and Exam B (EB). For these operators, the first Practice session (PA) and the second Exam session (EB) are SGTRs; the second Practice (PB) and first Exam (EA) sessions are LOCAs. This crew followed the procedures as expected, but worked much more slowly in its first session (the PA SGTR) than in the later sessions.


Figure 2.9: Crew #4 Scenario Narratives

Figure 2.10 diagrams the 2015 Crew Number 1 scenarios. These operators had more trouble following the procedures but successfully completed all four scenarios. Crew Number 1 encountered a mix of familiar and unfamiliar scenarios: PA was an SGTR, but EB was an SGTR complicated by a subsequent SLB; PB was a LOCA preceded by a stuck-open pressurizer PORV, while EA was a LOCA.


Figure 2.10: Crew #1 Scenario Narratives.

2.7 SUMMARY

Over three years, 33 student operators were trained in the OSU NPP Simulator Facility and participated in the OSU NPP Simulator Experiments. In total, these operators completed 52 accident scenarios: 32 SGTRs, 18 LOCAs, and 2 SLBs. Eighteen accident scenarios had a secondary fault. Data collected include plant parameter data, procedure logs, scenario narrative event logs, dynamic self-reported PSF records, and bias questionnaires. Video and audio records of the simulator sessions are available as a resource for future research and analysis. The rich data set collected in the OSU NPP Simulator Facility is the basis of the analysis presented in Chapters 3, 4 and 5.


3 IDAC Simulations and Student Operators

As discussed in Chapter 1, this dissertation reviews three analytical approaches for using simulator data in HRA. The first of the three approaches is a qualitative comparison of the student operators’ experience in the OSU NPP Simulator Facility to the dynamic operator simulations generated by the IDAC HRA Model.

In this chapter, data collected in the six 2014 SGTR simulator sessions are compared to IDAC simulations of the same accident. The IDAC simulation code was modified to better represent expected student operator performance. The IDAC simulations include three operator decision-making strategies.

This chapter briefly describes the modified IDAC model, then compares three aspects of performance: the plant parameters, procedure pacing, and reported PSFs.

3.1 THE IDAC SIMULATIONS

The scenario selected for the pilot study is a Steam Generator Tube Rupture (SGTR). This scenario was selected because of the large body of work that has already been developed around SGTR incidents, including work done in IDAC. The IDAC simulations were conducted by Dr. Yuandan Li at the University of Maryland using a modified version of IDAC 3.0 (Li, 2013). The scenario used in the simulation was a 450 gallon-per-minute Steam Generator Tube Rupture in Steam Generator A.


3.1.1 Variability in the IDAC Model

Two types of variation are built into the IDAC simulations: scenario branch points and operator decision-making style.

Branch points: IDAC models variability in crew performance using discrete dynamic event trees (DDETs), with branches that model crew variations such as slow or fast procedure execution, skipping procedure steps, relying on memorized information, etc. (Li, 2013). The simulations used in this experiment include one significant branch point: when they recognize a SGTR, operators either continue through the emergency procedures (Branch 1) or jump directly into the emergency procedure for SGTR response (Branch 2). A minimal sketch of this branching structure follows below.

The student operators at OSU were trained not to skip steps in the procedures; therefore, they did not jump to the SGTR response procedure. Thus, all of the comparisons between the OSU results and the IDAC simulations are based on Branch 1 of the IDAC output.
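The sketch below (Python; the class and the branch probabilities are illustrative assumptions, not ADS-IDAC internals) shows how a DDET branch point can be represented: the simulation forks at the branching condition, and each path carries the product of the conditional branch probabilities along it.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    """One node in a discrete dynamic event tree (DDET)."""
    label: str
    probability: float          # conditional probability of this branch
    children: list = field(default_factory=list)

    def paths(self, prob: float = 1.0, prefix: str = ""):
        """Enumerate scenario paths with their cumulative probabilities."""
        name = f"{prefix}/{self.label}"
        p = prob * self.probability
        if not self.children:
            yield name, p
        for child in self.children:
            yield from child.paths(p, name)

# The single significant branch point in these simulations: after the crew
# recognizes the SGTR, continue through E-0 (Branch 1) or jump directly to
# the SGTR response procedure (Branch 2). Probabilities are placeholders.
root = Branch("recognize SGTR", 1.0, [
    Branch("Branch 1: continue through E-0", 0.9),
    Branch("Branch 2: jump to SGTR procedure", 0.1),
])
for path, p in root.paths():
    print(f"{path}: {p:.2f}")
```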

Decision Making Style: The simulations mimic three types of crew behavior, labeled Garden Path (GP), Vagabond (V), and Hamlet (H). Operators who follow the Garden Path are likely to stick to a diagnosis even when faced with contrary indicators. Vagabond operators jump from one diagnosis to another without fully investigating their current theory. Hamlets are essentially the opposite of Vagabonds; they thoroughly investigate and constantly seek new information to validate their theories (Li, 2013).

3.1.2 Student Operator Model

IDAC was developed to model expert operator behavior. To model student operators (novices) rather than experts, several items in the knowledge base were changed or removed to match the student operators' limited knowledge. The knowledge base is coded as a set of corresponding "upstream" and "downstream" phenomena, such as "T-average Increasing" and "Pressurizer Level Increasing." These items are related by a variety of links; for example, some are causally linked, and some are both consequences of a third event. The students' instructors reviewed the existing IDAC knowledge base and identified items to be removed.

Table 3.1 is a portion of the IDAC Knowledge Base. The highlighted item was removed from the Student Operator Knowledge Base because students in the 2014 experiment were not expected to recognize immediately that the steam generator level would increase following a reactor trip.

Table 3.1: A portion of the IDAC Knowledge Base highlighting an item removed from the Student Operator Knowledge Base.

upstream phenomenon          downstream phenomenon
TDAFWP_X_V_close_smaller     AFW_X_flow_decrease
FW_X_flow_>_MS_X_flow        SG_X_level_increase
reactor_trip_turn_on         SG_X_level_increase   [highlighted: removed from the Student Operator Knowledge Base]
SG_X_ruptured                SG_X_level_increase
SG_A_ruptured                SG_A_level_>_SG_B_level
FW_A_flow_>_FW_B_flow        SG_A_level_>_SG_B_level
SG_A_ruptured                SG_A_level_>_SG_C_level
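For illustration, the knowledge base links in Table 3.1 can be represented as (upstream, downstream) pairs, with the student version derived by removing the flagged links (Python; a hypothetical reconstruction, not the ADS-IDAC input format):

```python
# Knowledge base as (upstream phenomenon, downstream phenomenon) links,
# using the entries shown in Table 3.1.
EXPERT_KB = [
    ("TDAFWP_X_V_close_smaller", "AFW_X_flow_decrease"),
    ("FW_X_flow_>_MS_X_flow", "SG_X_level_increase"),
    ("reactor_trip_turn_on", "SG_X_level_increase"),
    ("SG_X_ruptured", "SG_X_level_increase"),
    ("SG_A_ruptured", "SG_A_level_>_SG_B_level"),
    ("FW_A_flow_>_FW_B_flow", "SG_A_level_>_SG_B_level"),
    ("SG_A_ruptured", "SG_A_level_>_SG_C_level"),
]

# Links the instructors judged beyond the students' knowledge.
REMOVED_FOR_STUDENTS = {("reactor_trip_turn_on", "SG_X_level_increase")}

STUDENT_KB = [link for link in EXPERT_KB if link not in REMOVED_FOR_STUDENTS]

assert len(STUDENT_KB) == len(EXPERT_KB) - 1
```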

3.1.3 IDAC Performance Shaping Factors

The PSF values generated in the IDAC simulations include Stress, Time Constraint Load (TCL), Cognitive Task Load (CTL), Passive Information Load (PIL), and Task Complexity (TC). Stress is modeled as a linear function of the four other PSFs:

Stress = (1/4) × (TCL + CTL + PIL + TC)

In ADS-IDAC, the four contributing PSFs are given equal weight in the Stress equation.

One objective of the research in Chapter 5 is to obtain data-driven coefficients for these input parameters.
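A minimal sketch of this calculation in its generalized weighted form (Python; the function and interface are illustrative; equal weights reproduce the ADS-IDAC equation above):

```python
import math

def stress(tcl: float, ctl: float, pil: float, tc: float,
           weights: tuple = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Stress as a weighted sum of the four contributing PSFs (each on [0, 1]).

    With equal weights this is the ADS-IDAC equation; Chapter 5 seeks
    data-driven values for the weights.
    """
    w_tcl, w_ctl, w_pil, w_tc = weights
    return w_tcl * tcl + w_ctl * ctl + w_pil * pil + w_tc * tc

# Equal weighting: Stress = (TCL + CTL + PIL + TC) / 4
assert math.isclose(stress(0.4, 0.6, 0.2, 0.8), 0.5)
```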

The four contributing PSFs are calculated as functions of the plant parameters, procedure step, alarms, operator expertise, and the operator’s current diagnosis. The algorithms used to calculate these values are discussed in greater detail in Chapter 5.

3.2 OSU DATA

The OSU data used in this analysis include simulator data and student-reported PSF data. The simulator data include pressurizer pressure, pressurizer level, steam generator levels, and Reactor Coolant System temperature.

The OSU PSF data are reported by the operators as discussed in Chapter 2. The PSFs are evaluated periodically by the operators while watching a video of the simulator session. PSF evaluations are conducted at the pre-selected points in the scenario listed in Table 2.5, and at additional points in the scenario at the researcher's discretion. The PSFs include TCL, PIL, CTL and Stress, among other factors.

3.3 RESULTS

Two aspects of the operator performance are compared: the operators' progression through the scenario, and the PSFs reported by the operators or generated in the IDAC simulation. The operator actions and pacing through the scenario indicate how well the student operators match the IDAC model in objective performance, and the PSFs indicate how well the model captures subjective, internal factors.

3.3.1 Pressurizer Level

Pressurizer level is used to compare the OSU scenarios to the IDAC simulation results.5

Figure 3.1: Pressurizer level in IDAC simulations and OSU simulator sessions. A1-A7 are the OSU crews; H and V are the IDAC Hamlet and Vagabond simulations, respectively.

For simplicity, we display two of the three IDAC crew simulation results: Hamlet and Vagabond. The third crew, Garden Path, tracks closely with the Hamlet results because both Hamlet and Garden Path require more evidence to switch action plans than the Vagabond operators do. We display only Branch 1 results, as this is the branch that follows the procedure path taken by the student operators.

5 OSU crew A6 is not reported; A6 data were not recorded due to a simulator error.

In Figure 3.1, the IDAC simulation results (black and brown lines) show that the ADS-IDAC plant model is similar to the GPWR simulator, although the normal pressurizer level is slightly higher in the GPWR (60% vs. 55%) and the level decreases at a slightly faster rate. This indicates a discrepancy between the ADS-IDAC thermal hydraulics code and the GPWR thermal hydraulics code: the two simulator engines are designed around slightly different plant models. As expected, the two plant models show similar behavior: pressurizer level remains constant until the SG tube ruptures 30 seconds into the scenario, at which point pressurizer level begins a steady decrease. When the reactor trips (either manually or automatically), pressurizer level drops to zero.

The OSU scenarios are plotted in rainbow colors. The red line corresponds to the only crew that tripped the reactor manually; the thick orange line is the average of the crews who waited for the reactor to trip automatically, and the individual automatic-trip crews are shown in a variety of colors.

The reactor trips sooner in the IDAC simulations than in any of the student runs. While both IDAC crews quickly recognized the leak and decided to trip the reactor, only one OSU crew manually tripped the reactor. This decision reflects the students' training and their inexperience: student operators were trained not to trip the reactor unless absolutely necessary, and they had not developed a method for estimating the significance of a leak in the RCS. The crews that did not trip the reactor all anticipated that the reactor would eventually trip, but decided to allow the plant emergency safety features to trip the reactor rather than initiating the process.

Three critical events are marked with larger points in the plot: the reactor trip point; the transition from Path 1 (general accident response) to Path 2 (SGTR response); and finally, the scenario end point, when the ruptured steam generator is isolated. These markers provide a picture of the timing of key events in the scenario.

3.3.1.1 Discrepancy in Plant Models

Figure 3.1 also shows a discrepancy between the IDAC plant model and the GPWR plant model. The pressurizer level drops to zero for all the scenarios, but pressurizer level begins increasing again in the OSU scenarios. The pressurizer level remains at zero in the IDAC scenario, even though the IDAC simulations trip the reactor sooner, which should correspond to earlier recovery in the pressurizer. Investigation into this discrepancy revealed that the IDAC core model simulates a lower safety injection (SI) flow than the GPWR, which results in a longer recovery time. The IDAC simulations end after the operators complete the procedure, before the pressurizer level recovers with the lower SI flow. We believe the discrepancy between core models has little to no effect on the results of the simulation or on the comparison between the OSU students and their computer-generated IDAC counterparts; in this scenario, operators are focused on isolating the ruptured steam generator, and the scenario ends before the operators turn their attention to the pressurizer. However, the discrepancy in plant models could be significant in other applications. If possible, future exercises that compare operators to computer-generated operators should use the same core models to simulate the plant physics.

3.3.2 Procedure Status

The Procedure Status chart (Figure 3.2) shows crews' progress through the procedures as a function of time. Steps zero through twenty correspond to the step number in Path 1, while steps 20 through 40 correspond to Path 2; we add twenty to the Path 2 step number to illustrate the crews' progress through the two procedures. The colors match those in Figure 3.1, with the OSU crew that manually tripped the reactor highlighted in red and the thick orange line representing the average of OSU's automatic-trip crews.

Figure 3.2: IDAC and OSU Crew progress through procedures.


Figure 3.2 shows that after Crew A2 manually tripped the reactor, its operators progressed through the procedures at a similar pace to the other OSU crews. Although a few OSU crews deviated slightly from the expected procedure path, on average each crew worked through 24 steps: 16 in Path 1 and 8 in Path 2. The average step times are shown in Table 3.2.

Table 3.2: Average step time for IDAC and OSU crews.

Average Step Time        Hamlet   Vagabond   OSU Manual Trip (A2)   OSU AutoTrip Average
Time of Reactor Trip     328 s    267 s      481 s                  750 s
Path 1, Total time       238 s    247 s      727 s                  788±180 s
Path 1, Time per step    ~15 s    ~15 s      ~45 s                  ~50±11 s
  (16 steps)
Path 2, Total time       204 s    211 s      453 s                  460±125 s
Path 2, Time per step    ~26 s    ~26 s      ~57 s                  ~58±17 s
  (8 steps)
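Summaries like those in Table 3.2 can be computed from timestamped procedure logs; the sketch below (Python with pandas; the column names and toy log values are hypothetical) derives total time and approximate time per step for each path:

```python
import pandas as pd

# Toy procedure log: one row per completed step, with entry timestamps
# in seconds since scenario start (hypothetical values).
log = pd.DataFrame({
    "crew": ["A2"] * 4,
    "path": [1, 1, 2, 2],
    "step": [1, 2, 1, 2],
    "time_s": [490.0, 535.0, 1220.0, 1277.0],
})

for (crew, path), grp in log.groupby(["crew", "path"]):
    total = grp["time_s"].max() - grp["time_s"].min()
    per_step = total / (len(grp) - 1)  # intervals between logged steps
    print(f"Crew {crew}, Path {path}: total {total:.0f} s, ~{per_step:.0f} s/step")
```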

To compare the crews' pacing, we examine the time taken in each procedure (Path 1 and Path 2) and the average step time within each procedure. From Table 3.2, we see that in Path 1 the Hamlet and Vagabond crews followed a similar pace through the procedure steps. The IDAC crews both slowed down in Path 2, requiring almost twice as long per procedure step.


The OSU crews worked through the procedures at roughly the same pace. Even the crew that tripped the reactor manually, A2, proceeded at approximately the same rate as the crews that waited for the reactor to trip automatically. Like the IDAC crews, the students were slightly slower in Path 2 than in Path 1. These results suggest that for the students, working through the procedures was a rule-based activity with limited knowledge-based reasoning. Students had to think through each procedure step to ensure they were executing the step correctly.

In Path 1, OSU crews required three times as long as the IDAC simulations to complete the procedure steps; in Path 2, the OSU crews were only about twice as slow as the IDAC crews.

3.3.3 Performance Shaping Factors

The final comparison between the IDAC simulations and the OSU student crews is the PSFs. The IDAC PSFs are generated by algorithms; the OSU PSFs are reported by the student operators.

Figure 3.3 and Figure 3.4 plot the IDAC PSFs as a function of time (left) and as a function of procedure step after the reactor trips (right). In Figure 3.3 and Figure 3.4, Hamlet crews are red and Vagabond crews are black. The dotted lines in the left plots are the normalized pressurizer level, included to provide context for the PSFs.


Figure 3.3: IDAC PSFs TCL and PIL. Left: PSFs over time, right: PSFs by procedure step; red: Hamlet, black: Vagabond.


Figure 3.4: IDAC PSFs CTL and Stress. Left: PSFs over time, right: PSFs by procedure step; red: Hamlet, black: Vagabond.

Figure 3.3 and Figure 3.4 show that the Hamlet and Vagabond PSFs are generally consistent. Hamlet PSFs tend to be slightly lower than Vagabond PSFs, but the PSFs plotted by procedure step show that this difference is rarely greater than 0.2, or twenty percent of the full range. The PSFs plotted by procedure step also show that, in general, the IDAC PSFs are flat or decreasing after the reactor trips, with small variations. TCL decreases after the reactor trip, as does Stress to a lesser degree; CTL has a slightly increasing trend overall.

The right columns in Figure 3.3 and Figure 3.4 show that evaluating the IDAC PSFs only at the procedure steps provides limited information about the PSFs. These periodic snapshots do not capture the complete picture of the scenario; for example, the small spikes in PIL in the left column are not visible in the right column plot.

Based on the scenario pace, we expect the IDAC crews to “experience” lower PSFs than the OSU operators. The IDAC operators identify the problem sooner, make a proactive decision to trip the reactor, and begin recovery actions well before any OSU crews begin responding to the accident. We also expect the operator who decided to trip the reactor manually to report lower PSFs than the other OSU crews.

Figure 3.5 and Figure 3.6 add the OSU PSFs to the IDAC PSF plots. In these plots, Hamlet is shown in gray and Vagabond in black. As in previous figures, the OSU crews are shown in a variety of colors, with the red line indicating Crew A2, the crew that tripped the reactor manually. Again, the dotted lines in the left plots show the normalized pressurizer level: the dotted orange line is the average pressurizer level for the automatic-trip scenarios, and the solid orange line is the average reported PSF for the automatic-trip scenarios.


Figure 3.5: OSU and IDAC TCL and PIL. Left: PSF over time; right: by procedure step.


Figure 3.6: OSU and IDAC CTL and Stress. Left: PSF over time; right: by procedure step.

The variation between the OSU crews is in stark contrast to the similarity of the IDAC crews.

OSU crews report PSFs across the entire range: some report PSFs that increase over time, while others report flat or decreasing PSFs. As expected, the A2 operator who tripped the reactor manually reports lower PSFs than the other OSU crews. Contrary to expectations, the IDAC PSFs are not always lower than the OSU PSFs. IDAC TCL and CTL are greater than OSU TCL and CTL overall. The IDAC PIL spike at the reactor trip is greater than any OSU-reported PIL throughout the scenario, but after the reactor trips IDAC PIL is less than OSU PIL. IDAC Stress is in the middle of the OSU range, rather than being consistently lower than the OSU Stress.

The IDAC TCL decreases after the reactor trips. OSU crews A4 and A5 report decreasing TCL at the end of the scenario, but not in the middle of the scenario. Other crews report flat or increasing TCL.

IDAC PIL spikes at the reactor trip and then is essentially flat; all of the OSU crews who allowed the reactor to trip automatically report elevated PIL at or around the reactor trip, but PIL does not fall to zero after this spike (in the figure, use the sharp drop in pressurizer level to identify the time of the reactor trip).

IDAC CTL is generally increasing, although the increase slows after the reactor trips. Crew A2 reports flat CTL until the reactor trips, then a steep increase in CTL to the end of the scenario. A1 also reports a generally rising CTL; CTL for the other OSU crews is flat with some noise.

Finally, IDAC Stress peaks at the reactor trip and declines to the end of the scenario. OSU crews A4 and A7 reflect the IDAC trend of a peak around the trip followed by a general decline, but the other crews report flat or increasing Stress.

3.4 DISCUSSION

This exercise is a first attempt at comparing student operator performance to IDAC simulations. There are two areas for improvement in future studies: changes to the IDAC student simulation model, and changes to the OSU training for student operators. The results from this exercise also inform the subjective PSF model in Chapter 5.

3.4.1 Suggestions for IDAC-Student Simulations

Based on these results, a few changes can be suggested for modeling student operator activities with the IDAC model. First, the existing IDAC branch point models operators either continuing through Path 1 or jumping directly to Path 2. Because none of the seven student crews jumped ahead to Path 2, this branch point should be eliminated for student operator modeling, or the probability associated with this branch should be set very low. Instead, a second branch point could be added to reflect the student operators' decision to trip the reactor manually after they recognize the leak in the RCS. The branching probabilities could be set based on the one crew that chose to trip the reactor out of the seven crews that completed the scenario. Second, the timing for each step should be increased to reflect the student operators' inexperience and overall slower pace in Path 1. Finally, the PSFs could be adjusted to better reflect operator-reported PSFs. As illustrated by TCL, the trends postulated by the IDAC model do not necessarily capture the student operators' experience.

3.4.2 Lessons Learned for OSU Student Operators

The most significant gap in the student operators' expertise is their inability to assess the size of a reactor leak. Instead of assessing the leak, all but one crew decided to wait for the reactor to trip automatically rather than take responsibility for tripping the reactor themselves. As a result of this study, assessing the size and significance of a reactor leak was incorporated into the hands-on training in the next NPP Systems and Operations Course at OSU. In 2015, operators tripped the reactor in more than half the scenarios.


A second outcome of this study has direct bearing on the research in Chapter 5 of this dissertation: the variation between operator-reported PSFs. As the plots in Figure 3.5 and Figure 3.6 show, any model designed to predict student operator reported PSF values must account for variability between operators. This variability is greater than the variability captured by the different decision-making styles modeled in IDAC.

Finally, in future efforts to compare operator performance data collected in a simulator facility with operator performance data generated by a computer simulation such as IDAC, researchers should confirm that the core models used to simulate the physics of the plant are comparable. Although we believe the core model discrepancies in this study did not have a significant impact on operator procedure use or PSFs, variation between core models could certainly impact other aspects of operator performance.

3.5 CONCLUSIONS

While student operators are still novices in their new field of NPP operations, they are able to successfully recognize an accident condition, diagnose an SGTR, and respond to the accident by shutting down the reactor and isolating the ruptured steam generator from the rest of the secondary side. In the SGTR scenarios, student operators successfully navigated authentic plant procedures.

The PSF collection scheme developed for this experiment is a mechanism for obtaining dynamic measures of PSFs without artificially pausing the scenario for evaluations. Although crews may be familiar with pausing a scenario from training simulators (for example, (Hallbert, Morgan, Hugo, Oxstrand, & Persensky, 2014)), evaluating PSFs using a video after the scenario allows crews to respond to the scenario naturally, without extra time to consider the plant state or devise a plan of action. The video review allows crews to re-live the scenario and explain their thinking process to the researcher without intervening in the actual scenario.


4 Simulator Bias

The second of the three analytical approaches discussed in this dissertation is an examination of the bias in the data due to the simulator environment itself.

The objective in this chapter is to develop a method for correcting the effects of the artificial simulator environment in studies conducted in Nuclear Power Plant (NPP) simulator facilities. Correcting for simulator bias involves identifying, measuring, manipulating and mitigating biases introduced by the simulator, as well as posing methods for correcting the remaining bias in data collected in simulator environments.

This research identifies two categories of simulator bias: Environmental Bias and Motivational Bias (Section 4.1). This demonstration experiment focuses on Motivational Bias. Section 4.2 introduces a causal model of simulator bias (4.2.1), describes the data collected for this experiment (4.2.2), and introduces the statistical methods used for our analysis (4.2.3). Section 4.3 introduces quantitative models of simulator bias (4.3.1) with an emphasis on the indirect effects of simulator bias (4.3.2), then reviews model fit measures (4.3.3) and methods to estimate a minimum sample size (4.3.4). Section 4.4 illustrates how this approach can be used to calculate the effects of bias manipulations between simulator sessions (4.4.1) and to estimate the effect of simulator bias on simulator data (4.4.2). As an example of potential applications, Section 4.4 also reviews how simulator bias would impact SPAR-H data collected in a simulator (4.4.3). Finally, Section 4.5 discusses the lessons learned from the demonstration experiment and offers suggestions for considering bias in simulator studies.

4.1 INHERENT BIAS IN SIMULATOR STUDIES: ENVIRONMENTAL AND MOTIVATIONAL BIASES

Although simulator study practitioners acknowledge the bias introduced by simulator studies (for example, CSNI, 2009), recent literature has little to say about the necessary limitations of simulator studies. For this, we turn to the previous generation of HRA research. Dougherty's 1990 editorial in Reliability Engineering and System Safety (Dougherty, Jr, 1990) summarizes the limitations of simulator data and the challenges of using simulator data for developing (or validating) HRA models. He contends that, "Simulation in nuclear plants is just not 'real enough'" (287). Dougherty's discussion of the "simulation game" illustrates several of the inherent biases in simulator studies:

Operators know that they will never fail a requalification test in the simulator when they 'err toward safety.' As a result, operators deliberately do one thing—such as initiate feed and bleed in a pressurized water reactor (PWR) or standby liquid cooling system (SLCS) in a BWR—when they know that they would hesitate or delay in this response in an actual incident. Why?—because they anticipate that the exercise is going to test that action and know that, even though early action is an error in real life, it is not in the simulator.

As Dougherty's comments illustrate, the artificial operator response obtained in a simulated environment is the result of the motivational biases inherent in a simulator study.

We have identified two broad categories of biases that are necessarily present in a simulator study (Shirley R. B., Smidts, Wang, & Gupta, 2014). Motivational biases refer to the factors that motivate an operator's actions in a simulator that differ from the motivating factors in the actual plant. Environmental biases refer to any physical or environmental differences between operating the actual plant and operating the simulator. The "simulation game" highlights motivational biases, but—as Dougherty describes—environmental biases also present significant challenges to simulator studies. Table 4.1 lists Dougherty's concerns related to simulator studies (Dougherty, Jr, 1990, pages 287-288), categorized as primarily Motivational or Environmental concerns.

Table 4.1: Summary of concerns when conducting simulator studies

Motivational Concerns
1. The simulation game.
2. Decision-making pressures under accident conditions cannot be simulated; "operators do not have to make billion dollar tradeoffs which they might in an actual severe accident."

Environmental Concerns
1. Any physical effects of operating the plant are not simulated ("operators do not get simulator sickness as do pilots").
2. Simulators can only simulate standard, "textbook" scenarios.
3. Rules of operation (both formal and informal) may be appropriate for typical situations, but "a reasonable operating rule under many textbook scenarios can, unbeknownst to the rule holder, become a danger if applied under more severe conditions." Situations of interest may fall outside the fold of scenarios that can be tested in a simulator.
4. Operators will not be alone during an event (the technical support center will be activated, and on-site personnel will be available), but operating crews usually do not have this support in the simulator.
5. Following an event, important recovery actions will take place outside the control room and cannot be simulated.

Table 4.2 summarizes the biases we have identified from the literature as likely to be present in NPP simulator studies (Shirley R. B., Smidts, Wang, & Gupta, 2014). Most of the environmental challenges listed in Table 4.2 are due to context effects (Environmental Concerns 2-5 in Table 4.1), that is, any physical difference between the simulator environment and the actual plant. This can range from differences in controls or layout to differences in simulated scenarios. In a digital simulator, the context effects include effects due to the digital controls—a delayed response from touch screen controls or the difference between swiping your finger across the screen and physically turning a knob or pushing a button. The other environmental biases in Table 4.2 include technology bias and simulator sickness (Environmental Concern 1 in Table 4.1). Technology bias refers to the extra layer of knowledge required to operate a digital plant: an operator must be comfortable with the plant controls as well as the technology used to display them. Simulator sickness is more commonly recognized in driving and flight simulators, which can induce motion sickness, but any physical effects of the simulator environment fall into this category. In the OSU simulator facility, for example, operators might have a headache after looking at many computer screens for an extended period of time. This headache is an artifact of the simulator environment, not of operating the plant. As Dougherty points out, this can also go the other way. The simulator sickness pilots experience in a flight simulator might mimic the motion sickness they would experience flying the plane, but plant operators may not have the same "benefit" in a digital simulator: operators will not be tired from standing at the boards if they are instead sitting in front of a digital display.

In US NPP training simulators, environmental bias is minimized: the US Nuclear Regulatory Commission (NRC) requires high-fidelity training simulators and documentation of any physical discrepancies between the control room and the simulator (NRC, 2011; ANSI/ANS, 2009). Digital research simulators are not subject to these standards, and they carry technology bias because they are digital rather than analog; even so, many of these biases can be managed. As operators gain experience in the simulator, the technology bias will decrease. Environmental studies and ergonomic assessments can reduce simulator sickness, or manipulate it to mimic the effects of operating the actual plant. To a certain extent, the physical differences between the actual control room and the simulator can be analyzed and addressed. These mitigations are summarized in Table 4.2.

The obvious challenge in eliminating environmental bias is the disparity between scenarios that can be simulated and the scenarios operators may encounter in an actual control room. As core models improve, the fidelity of known scenarios will improve, but it is impossible to anticipate the unanticipated—and therefore the most dangerous—situations an operator may face. For this reason, simulators will never be the sole basis for a robust, validated HRA model.

Similarly, an operator's motivations under accident conditions cannot be replicated in a simulator environment. However, this does not preclude simulator studies from gaining insight into operator decision-making in high stress situations. Five inter-related motivational biases are listed in Table 4.2: hypervigilance, cavalier behavior, policy response bias, prominent hypothesis bias, and incentive effects. Most of these biases are illustrated in Dougherty's simulation game. Hypervigilance refers to an operator's tendency to be extraordinarily vigilant and fastidious due to being observed or expecting an adverse condition to occur. Cavalier behavior is the opposite of hypervigilance; the operator, knowing the exercise is only a simulation, relaxes attention or makes reckless choices that would not be appropriate under actual operating conditions. Although these biases represent opposite responses to the simulator environment, both can be expected in the simulation game: watchful for adverse conditions (hypervigilance), the operator promptly executes what may be considered an extreme response to an abnormal situation (cavalier behavior). This is also a reflection of the policy response bias, which is activated when an operator makes a decision based on policy (e.g., follow the safest course of action) rather than the practical or typical actions that might be followed in the plant. Policy response bias is known as demand characteristics in the psychological research literature (Rubin, 2016).

Finally, prominent hypothesis bias is illustrated in Dougherty's example by the operator's expectation that the scenario is designed specifically to test a certain action. An operator who enters the simulation expecting a particular event is more likely to identify that event and respond appropriately than an operator who is expecting a normal day at the office.

Table 4.2: Motivational and Environmental Simulator Biases

Environmental Biases
• Context effects. Definition: any discrepancy between the control room and the simulator environment. Mitigation: high-fidelity simulator.
• Technology bias. Definition: impacts from the technology used to simulate the simulator environment. Mitigation: extensive training and experience using the simulator technology.
• Simulator sickness. Definition: physical effects introduced by the simulator environment (headache, motion sickness, fatigue, etc.). Mitigation: training, ergonomic assessment, limited time in the simulator facility if necessary.

Motivational Biases
• Hypervigilance. Definition: tendency to be vigilant or fastidious due to being observed or expecting an adverse event. Mitigation: distractor task.
• Cavalier behavior. Definition: tendency to be lax or incautious because the simulation is not real. Mitigation: incentive.
• Incentive effects. Definition: artificial boost in vigilance due to an incentive. Mitigation: decouple the incentive from performance.
• Policy response bias. Definition: attempt to meet the researcher's (stated or perceived) expectations. Mitigation: anonymity; time to become familiar with the simulator setting.
• Prominent hypothesis bias. Definition: expectation of a certain accident or condition leads subjects to ignore certain indicators and pay extra attention to others. Mitigation: multiple scenarios, including "null scenarios" in which nothing unusual happens.


One approach to mitigating cavalier behavior is to offer simulator study subjects an incentive (Gupta, 2013). Test subjects who are being paid are more likely to be attentive than volunteers. The problem is that an incentive can introduce incentive effects: are the study results due to the test condition, or to the incentive offered to the participants? In a nuclear power plant simulator study, incentives—whether inherent in participation in the project or explicitly offered to participants—must be identified and selected with care.

Another mitigation strategy is to introduce a distractor task at the beginning of the scenario. Operators who are absorbed in a task are less likely to be hypervigilant, and may even forget (to a degree) that they are operating in the simulator rather than in the actual control room. Realistic distractor tasks will help increase the fidelity of the simulation exercise, especially if operators have conducted "normal" scenarios in which no accident occurs.

Many HRA studies focus on "PRA relevant" scenarios, as this is the area of interest (see, for example, the NRC's guidance on collecting HRA data from simulators, Hallbert, Morgan, Hugo, Oxstrand, & Persensky, 2014). However, restricting data collection to accident scenarios activates the prominent hypothesis bias, hypervigilance, and the policy response bias. Until the impact of these effects is known, studies should encompass a broad range of scenarios.

In the simulation game, Dougherty appears to assume simulator data are being collected for other purposes and appropriated for HRA; the dangers of this approach are obvious in light of the many biases that affect such data. To limit the effect of the motivational biases, appropriated data should be reviewed carefully for contamination from these biases. Preferably, studies should be conducted explicitly for HRA research, with explicit controls introduced for these biases. Participants should be ensured anonymity by separating the study from any regulatory oversight, and scenarios should include both normal and accident scenarios (even if accident scenarios are the only scenarios of interest).

Finally, just as the unknown scenarios cannot be anticipated in a simulator study, the effects of operating under severe accident conditions cannot be simulated. However, external pressures can be added to approximate the effects. While artificial manipulation cannot mimic the impact of a true disaster, some insights into the effects of an accident may be gained through simulator studies.

4.2 METHODS AND MATERIALS FOR ASSESSING SIMULATOR BIAS

Using the list of motivational biases in Table 4.2, we manipulate simulator conditions to test our assumptions of how the Motivational Biases impact operators in a simulator. We do this by observing student operators responding to accident conditions in the OSU NPP Simulator Facility.

We introduce four experimental conditions to manipulate simulator bias and construct a causal model of their expected effects (Section 4.2.1).

We then observe student operators in the OSU NPP Simulator Facility respond to design basis accident scenarios under the four experimental conditions (Section 4.2.2). Simulator biases are evaluated by surveying the operators, and operator performance is characterized by two measured response times.

From these data, two quantitative models of simulator bias are developed using Structural Equation Modeling (SEM): the Latent Bias Model and the Bias Path Analysis Model. The Bias Path Analysis Model charts relationships between all the observed variables, and the Latent Bias Model estimates a new, unobserved variable, Bias.

4.2.1 Manipulating Simulator Bias – A Causal Model of Bias Effects

Four experimental conditions are introduced to manipulate simulator bias: Phase (represented by the variable Graded), Treatment (Trt1), Familiarity (Fami), and Year (Yr2015). Data are collected in two phases: ungraded practice sessions (Graded = 0) and graded sessions that are part of the students' final examination (Graded = 1). In each phase, some students are in Treatment 0 (Trt1 = 0) and some are in Treatment 1 (Trt1 = 1). When Trt1 = 0, students encounter only familiar scenarios; for Trt1 = 1, students encounter a mixture of familiar and unfamiliar scenarios. Fami is the familiarity of the scenario, rated on a scale of 1 (unfamiliar) to 5 (familiar). For scenarios in Treatment 0, familiarity is limited to Fami = 4 or 5; in Treatment 1, Fami = 1, 2, 3, 4 or 5. Finally, data are collected from two sets of students, first in 2014 (Yr2015 = 0) and again in 2015 (Yr2015 = 1). Yr2015 is primarily a measure of training; students in the 2015 course had more time to practice in the simulator and spent more time recognizing the severity of an accident, e.g., estimating the size and rate of a leak from the primary side reactor coolant system (RCS).
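In R, the four manipulations reduce to simple numeric codings. The sketch below is illustrative only; the data frame d and the raw column names (phase, treatment, fami_raw, year) are hypothetical, while the codings themselves follow the definitions above and the 0-1 Fami rescaling described in Table 4.4.

    # hypothetical raw columns; the codings follow the text
    d$Graded <- as.integer(d$phase == "exam")   # 1 = graded exam session
    d$Trt1   <- as.integer(d$treatment == 1)    # 1 = mixed-familiarity treatment
    d$Fami   <- (d$fami_raw - 1) / 4            # rescale 1-5 ratings to 0-1
    d$Yr2015 <- as.integer(d$year == 2015)      # 1 = 2015 cohort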

Figure 4.1 is a causal model that illustrates how the four manipulations are expected to impact the Motivational Biases in the OSU NPP Simulator Facility Experiment. Rectangles in Figure 4.1 represent Experiment Conditions (Graded, Trt1, Fami and Yr2015), ovals are the biases that are manipulated by the experimental conditions, and hexagons are the survey responses used to evaluate the biases.


Figure 4.1: Causal model of simulator bias in the OSU NPP Simulator Experiments

The Motivational Biases that are manipulated in the OSU NPP Simulator Experiment are Incentive Effects, Cavalier Behavior, and Prominent Hypothesis. Simply being in the simulator environment triggers Cavalier Behavior. The Graded manipulation counteracts Cavalier Behavior by triggering the Incentive Effect when the students are taking their final exam.6

The Experiment Conditions Familiarity (Fami) and Treatment (Trt1) manipulate Prominent Hypothesis. When Fami is low, operator Prominent Hypothesis is expected to be low, and vice versa. Similarly, students in Trt1 = 0 crews are expected to have a stronger Prominent Hypothesis than those in Trt1 = 1, who encounter new and unknown scenarios.

6 Of course, incentives to reduce Cavalier Behavior can result in Hypervigilance. In this preliminary analysis, we do not attempt to differentiate between responsible operator behavior and Hypervigilance.


Table 4.3 summarizes the conditions introduced in Figure 4.1, their expected effects, and the associated measures.

Table 4.3: Hypotheses in the OSU NPP Simulator Experiment

• Condition: Simulator environment. Expected effect: cavalier behavior. Measurement: Unreal.
• Condition: Phase (Practice or Exam). Expected effect: Incentive Effect counteracts cavalier behavior. Measurement: Worried.
• Condition: Familiarity. Expected effect: prominent hypothesis bias decreases as familiarity decreases. Measurement: Likely, Prepared.
• Condition: Treatment (0 = all familiar accidents; 1 = mix of familiar and unfamiliar). Expected effect: prominent hypothesis bias lower in Treatment 1. Measurement: Likely, Prepared.

In the OSU NPP Simulator Experiment, Cavalier Behavior is measured by Unreal, the operators' perceived artificiality of the scenario. In future work, measures of Cavalier Behavior can be developed by modifying measures of safety culture and employee attitudes (for example, Prussia, Brown, & Willis, 2003). Reported values are operator responses to the question, "During the scenario, how real did the simulator feel?", reverse-coded so that higher Unreal means the scenario felt less real.

The Incentive Effect is evaluated by Worried, the operators’ response to the question, “Do you worry that your grades will be affected negatively if you don’t perform well during the accident response?” We expect to see a strong Incentive Effect during exam sessions, and a reduced effect during practice sessions.

There are two measures of the Prominent Hypothesis Bias: Prepared and Likely. Prepared is the response to the question, "How prepared were you for this scenario?" Operators who correctly anticipate the event scenario will experience a stronger overall bias than those who are surprised by events in the simulator. Likely is the operator response to the survey question, "With what likelihood (%) do you expect the following accidents: SGTR, LOCA, SLB, a complication of one of these accidents, another accident, or no accident?" Data from Likely are broken into two measures: Likely.For and Likely.Against. Likely.For estimates the operator's expectation bias for the event that occurs, and Likely.Against is the expectation of events that do not occur during the scenario.

Note that two Motivational Biases are not included in Figure 4.1: Hypervigilance and Policy Response Bias. Hypervigilance is included implicitly as the opposite of Cavalier Behavior; for clarity, we use only one variable (Cavalier Behavior) in this analysis. Policy Response Bias is not expected to have a strong impact on operator behavior in this experiment. Typically, policy response will be most evident in procedure following, based on the expectation that operators might skip certain steps when they are not being observed. In this experiment, we expect the students will uniformly follow the policies introduced during the course; we do not expect students to skip procedure steps, for example, or attempt to respond to an accident scenario without using procedures. This is because their performance is being graded, and because their understanding of the plant is relatively limited. One way to examine how policy response impacts student operator behavior might be to run additional scenarios after final exams are over and grades are submitted. Without the possibility of adverse impacts on their grade point average (GPA), students might demonstrate more creativity when responding to accidents. Another approach would be to add an incentive to finish the scenario quickly, to see whether students stick to the procedures or skip steps for efficiency.

4.2.1.1 Response Variables

In addition to the variations in simulator bias captured in the Bias Causal Model, we are also interested in the effects of simulator bias on the overall scenario. In this demonstration experiment, we use operator response times to examine the effects of simulator bias. We report two response variables: Trip and Dif. Trip is the time to trip the reactor. Dif is the time between the reactor trip and the moment the operators exit Emergency Operating Procedure (EOP) E0 to enter an accident-specific procedure. These two times roughly correspond to the time to recognize that a serious accident has occurred (Trip) and the time to secure the plant and diagnose the accident (Dif).

In addition to the effects of simulator bias, Trip and Dif are affected by two factors: crew characteristics and scenario characteristics.

Crew characteristics are expected to be the greatest source of variation in Trip and Dif, with simulator bias triggering small fluctuations on top of the variations between crews.

To capture variations between crews, we report four crew characteristics:

• Quiz: operator knowledge, represented by student quiz grades in the NPP Systems and Operations course
• Evaluation: operator ability in the simulator, as measured by the researcher's evaluation of each student's ability to follow procedures, interpret signals from the control panels, and execute procedure steps
• Team: team cooperation, the crew's self-assessment of their team dynamic
• BaseStress: an operator's tendency towards stress and anxiety, based on operator questionnaires (a symbolic sketch of its construction follows this list)
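Table 4.4 defines BaseStress only in words; written out symbolically, one plausible reading of that definition is the following, where o indexes operators, s indexes scenarios, and S_o (our notation, not the dissertation's) is the set of simulator sessions operator o participated in:

    d_{o,s} = \mathrm{stress}_{o,s} - \operatorname{median}_{o'}\left(\mathrm{stress}_{o',s}\right)

    z_{o,s} = \frac{d_{o,s} - \bar{d}_{\cdot,s}}{\operatorname{sd}_{o'}\left(d_{o',s}\right)}

    \mathrm{BaseStress}_{o} = \frac{1}{|S_o|} \sum_{s \in S_o} z_{o,s}

That is, each reported stress value is compared to the scenario median, standardized within the scenario, and then averaged over the operator's sessions.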

These crew characteristics are significant predictors in the Bias models proposed in Section 4.3—see Figure 4.7 and Figure 4.8.

The second source of variation is the scenario itself—some scenarios require less time than others to complete. Scenario variations are accounted for by normalizing the time for each scenario: instead of reporting the absolute times, we divide the initial values of Trip and Dif by the mean response times for each scenario. This allows us to compare response times from two different accident scenarios. After normalization, Trip ranges from 0.45 to 1.62, and Dif from 0.47 to 2.33.
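In R, this scenario normalization is a single grouped division. A minimal sketch, assuming a data frame times with hypothetical columns trip_sec, dif_sec, and scenario:

    # divide each raw time by the mean time for its scenario
    times$Trip <- times$trip_sec / ave(times$trip_sec, times$scenario, FUN = mean)
    times$Dif  <- times$dif_sec  / ave(times$dif_sec,  times$scenario, FUN = mean)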

4.2.2 Data Collection

The data collection process is discussed in Chapter 2. Data from 41 simulator scenarios are used in the Bias analysis: 32 scenarios from the 2015 Experiment and 9 from the 2014 Experiment.7 Table 4.4 lists the data collected from these scenarios. For each scenario, both the Senior Reactor Operator (SRO) and the Reactor Operator (RO) contribute data. These data are averaged to estimate the overall scenario value. Data in Table 4.4 include eight scenario conditions (X1-X8), five Bias Measures (M1-M5) and two response times (R1-R2). The first four conditions are the manipulations in the experimental design; the remaining four are characteristics of the crews. Bias Measures are ordinal data; most are reported on a scale of 0-10. To consolidate the data into meaningful categories, the collected data are sorted into approximately five levels using the Hist() function in the statistical software package R. Response times are measured in seconds and normalized to minimize the differences in time required by the different scenarios and to highlight the effect of operator actions on timing in the scenarios.

7 In the 2014 Experiment, students completed the post-scenario questionnaire online after they left the simulator facility. Students submitted questionnaires for only nine of the observed scenarios. To improve the low response rate, we asked 2015 students to complete the questionnaire before leaving the simulator facility.
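The binning of the ordinal survey responses can be reproduced with base R's hist() and cut(); the dissertation cites Hist(), which we assume produces equivalent default breaks. A sketch, with x standing in for a vector of 0-10 survey responses:

    h <- hist(x, plot = FALSE)   # default breaks give roughly five bins here
    x_binned <- cut(x, breaks = h$breaks, include.lowest = TRUE, labels = FALSE)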

Table 4.4: Variables in the OSU NPP Simulator Experiments

Experimental Manipulations (controlled for in the experimental design)

[X1] Treatment: Crews are assigned to Treatment 1 or Treatment 0. Treatment 0 crews respond only to familiar scenarios (Fami = 0.75 or 1.0); Treatment 1 crews encounter a mix of familiar and unfamiliar scenarios (Fami = 0.0, 0.25, 0.5, 0.75 or 1.0). In the SEM, Treatment = 1 for Treatment 1 crews and 0 for Treatment 0 crews.

[X2] Graded: Data are collected during two phases of the experiment: Practice sessions and final Exam sessions. In the SEM, Graded = 1 for Exam sessions and 0 for Practice sessions.

[X3] Fami: Familiarity of scenarios is rated on a scale from 1 (least familiar) to 5 (most familiar). Familiar scenarios are scenarios that operators studied in the NPP Operations course; unfamiliar scenarios are complications of the more familiar scenarios. In the SEM, Fami is normalized on a 0 to 1 scale.

[X4] Yr2015: Data are collected in 2014 (Yr2015 = 0) and 2015 (Yr2015 = 1). Yr2015 is a measure of training; in 2015, students spent more time in the simulator and had more training related to evaluating the significance of an accident.

Crew Characteristics (not controlled for in the experimental design; collected for both SRO and RO)

[X5] Team: The operator's self-reported team dynamics. Survey question: "How would you rate the operating atmosphere of your 2-person team?" (scaled response; 0 = "We communicate poorly and do not trust each other;" 10 = "We work well together. Communication is excellent."). In the SEM, Team is normalized from 0 to 1.

[X6] BaseStress: BaseStress measures the operator's tendency to report higher (or lower) stress than their colleagues. BaseStress is the operator's scenario-normalized stress, averaged over all of the operator's simulator sessions (2 scenarios in 2014, 4 in 2015). The scenario-normalized stress is the difference between the operator's reported stress and the median reported stress in the scenario, standardized by subtracting the mean of that difference and dividing by its standard deviation over all the operators who participated in the scenario.

[X7] Evaluation: The researcher's 3-point rating of each operator's general ability to locate controls and follow the procedures in the simulator: 0 ~ below average, 0.5 ~ average, 1 ~ above average. Just over 50% of the students are rated 0.5.

[X8] Quiz: Grade from in-class quizzes. In the SEM, Quiz is a percent.

Bias Measures (operator-reported values, averaged between the Senior Reactor Operator and the Reactor Operator)

[M1] Unreal: A measure of overall bias; Unreal is the operator's perceived artificiality of the scenario. Survey question: "During the scenario, how real did the simulator feel?" (scaled response; 0 = I felt like a college student in a simulator, 10 = I felt like an operator in a nuclear power plant). We use Unreal, i.e., 10 - Real, rather than Real, because Bias is expected to increase when the scenario feels more artificial. The data are sorted into 5 ordinal levels: Real ← 1-2-3-4-5 → Not Real.

[M2] Prepared: A measure of prominent hypothesis bias. Survey question: "How prepared were you to participate in this scenario?" (scaled response; 0 = "Completely unprepared. I would need significant additional training to feel prepared in this scenario," 10 = "I felt completely prepared to respond to this scenario."). The data are sorted into 6 ordinal levels: Not Prepared ← 1-2-3-4-5-6 → Well Prepared.

[M3] Likely.For and [M4] Likely.Against: Measures of the Prominent Hypothesis Bias. Survey question: "With what likelihood do you expect to encounter the following accident conditions?" (rate each option: SGTR/SLB/LOCA/Other). A nominal, unbiased expectation is [25% SGTR, 25% SLB, 25% LOCA, 25% Other]. The total bias is the absolute difference between the operator's expectations and the unbiased baseline. Likely.For is the shift towards perfect bias (that is, perfect expectation of the event that occurs in the scenario); Likely.Against is the shift away from perfect bias. For example, if the scenario is an SGTR and the operator expects [50% SGTR, 50% SLB, 0% LOCA, 0% Other], the operator is biased as follows:

Event | Unbiased Expectation | Perfect Bias (Event = SGTR) | Operator Expectation | Bias | Direction
SGTR | 0.25 | 1.0 | 0.50 | 0.25 | For
SLB | 0.25 | 0.0 | 0.50 | 0.25 | Against
LOCA | 0.25 | 0.0 | 0.00 | 0.25 | For
Other | 0.25 | 0.0 | 0.00 | 0.25 | For
Total | | | | 0.75 For, 0.25 Against |

The maximum total bias is 1.5, which occurs when an operator reports 100% expectation of one accident (e.g., [100% SGTR, 0% SLB, 0% LOCA, 0% Other]). We rescale total bias from [0, 1.5] to [0, 1] by dividing all bias estimates by 1.5. In the example above, this results in Likely.For = 0.50 and Likely.Against = 0.17. Likely.For and Likely.Against are continuous variables restricted to a [0, 1] scale.

[M5] Worried: A measure of the incentive effect. Survey question: "Do you worry that your grades will be affected negatively if you don't perform well during the accident response?" (Yes or No). In the SEM, Worried = 1 if Yes and 0 if No. The data are averaged, resulting in three ordinal levels: Not Worried ← 1-2-3 → Worried.

Response Times (response times are divided by the mean response time for the same scenario to minimize effects of differences between scenarios)

[R1] Trip: Trip is the scenario-normalized time to recognize the accident. Trip can be thought of as the time required to recognize and respond to the accident, relative to other crews responding to the same accident. Each scenario's accident requires shutting down the reactor. Eventually, the plant's engineered features will trigger an automatic reactor trip, but some crews are able to recognize the problem and trip the reactor before it reaches this level. Time to trip is normalized for each scenario by dividing by the mean trip time for the scenario. Note that instead of dividing seconds to trip by the mean for the scenario, normalized times could also be obtained by dividing by the expected time required, as estimated by a dynamic HRA model such as the Information Decision Action Crew (IDAC) virtual operator program (Chang & Mosleh, 2007).

[R2] Dif: Dif is the scenario-normalized time to diagnose the accident. Dif can be thought of as the time required to diagnose the accident, relative to other crews responding to the same accident. After the reactor trips, operators use Emergency Operating Procedure (EOP) E0 to secure the plant, ensure safety functions are operating correctly, and diagnose the accident. This variable measures the time between when the reactor trips and when operators exit E0 and enter another procedure that is specific to the accident that occurred. As with Trip, Dif is normalized by dividing by the mean diagnosis time for the scenario.
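The Likely.For/Likely.Against computation described in Table 4.4 can be written as a short function. The sketch below is our own illustration, not code from the dissertation; it reproduces the worked SGTR example in the table (Likely.For = 0.50, Likely.Against ≈ 0.17).

    likely_bias <- function(expect, event) {
      # expect: named expectations summing to 1, e.g.
      #         c(SGTR = 0.5, SLB = 0.5, LOCA = 0, Other = 0)
      # event:  the accident that actually occurs in the scenario
      unbiased <- rep(0.25, length(expect))
      perfect  <- as.numeric(names(expect) == event)       # perfect-bias expectation
      shift    <- expect - unbiased                        # deviation from baseline
      toward   <- sign(shift) == sign(perfect - unbiased)  # shift toward perfect bias?
      c(Likely.For     = sum(abs(shift)[toward])  / 1.5,   # rescale [0, 1.5] to [0, 1]
        Likely.Against = sum(abs(shift)[!toward]) / 1.5)
    }

    likely_bias(c(SGTR = 0.5, SLB = 0.5, LOCA = 0, Other = 0), "SGTR")
    # Likely.For = 0.50, Likely.Against = 0.17 (rounded)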


4.2.3 Structural Equation Models

We use SEM to develop analytical models of simulator bias. We model two aspects of simulator bias: the factors that affect bias, and bias's effect on operator performance. Operator performance can be assessed by a wide variety of measures; for simplicity, we limit this preliminary analysis to operator response times.

The first model is the Latent Bias Model. The Latent Bias Model estimates a new variable, Bias, which is measured by the Bias Measures from the operator surveys. In the Latent Bias Model, Bias varies as a function of conditions, and Bias affects operator response times.

The second model is the Bias Path Analysis Model. Instead of quantifying Bias, the Bias Path Analysis Model treats the Bias Measures as functions of scenario conditions and as predictors of operator response times.

Table 4.5: SEM terminology.

• Independent variable: A variable that is not a function of other variables in the model. Also called exogenous variables. Scenario Conditions and Crew Characteristics in Figure 4.3 are independent.
• Dependent variable: A variable that is modeled as a function of other variables in the model. Also called endogenous variables. Dependent variables may contribute to other variables in the model; Bias Measures and Response Times in the Bias Path Analysis Model in Figure 4.4 are dependent variables.
• Latent variable: An estimated (unmeasured) variable, associated with at least two indicators. In Figure 4.3, Bias is the latent variable estimated in the Latent Bias Model.
• Indicator: A measured variable used to estimate a latent variable. Indicators are dependent variables. Bias Measures in the Latent Bias Model are indicators of the latent variable Bias.
• Factor loading: The coefficient estimating the relationship between the latent variable and an indicator variable. For each latent variable, one indicator is anchored to the latent variable—i.e., the factor loading for that indicator is fixed to 1.
• Path Analysis Model: A model with no latent variables.
• Measurement Model: A model of a latent variable, its associated indicators, and the factor loadings.


Section 4.2.3.1 provides a brief overview of SEM for readers who are new to SEM while introducing two preliminary Bias SEMs. Section 4.2.3.2 describes the process used to generate the final simulator bias models discussed in Section 4.3.

4.2.3.1 Introduction to SEM

Figure 4.2 illustrates the process used to estimate the unknown parameters in the structural model from two primary inputs: a set of observed variables and a proposed structural model of the relationships between the variables (Skrondal & Rabe-Hesketh, 2005). A secondary input is the observed covariance matrix, that is, a matrix of the covariances between the observed variables. An algorithm suggests values for the unknown model parameters in the structural model (Suggested Parameters). An estimated covariance matrix is calculated from the suggested model parameters. A likelihood estimator is used to compare the estimated covariance matrix to the observed covariance matrix. Based on these results, the algorithm suggests a new set of model parameters, attempting to increase the likelihood of the estimated covariance matrix. This process is repeated, with the algorithm selecting progressively better estimates until the model converges (Raykov & Marcoulides, 2000).


Figure 4.2: The iterative SEM analysis process.

Let us use Figure 4.2 to develop the Latent Bias Model. The structural model specifies the hypothesized relationships between variables. One of the strengths of SEM is the ability to assess latent variables—that is, variables that are not directly measured. Structural models have two parts: a path analysis model of the relationships between variables, and a measurement model that relates the latent (unobserved) variable to its associated (measured) indicators. Structural models are often represented graphically; Figure 4.3 organizes the variables from Table 4.4 into a preliminary structural model of Bias.

The preliminary Latent Bias Model treats Bias (B) as a function of the Conditions (X1-X8), and as a predictor of the response time variables (R1 and R2). The measurement model in Figure 4.3 shows Bias measured by the five Bias Measures, M1-M5: Unreal, Prepared, Likely.For, Likely.Against and Worried.

Following SEM convention, Figure 4.3 uses ovals to indicate unobserved variables and rectangles to indicate measured variables. In addition to the unobserved variable Bias, the model also estimates variance or measurement errors for the Bias Measures (Em1-Em5) as well as the response variables (Er1, Er2).


Arrows show the presumed direction of effects; variables that are at the end of an arrow are dependent variables, while variables that are only the source of arrows are independent variables.

To translate the graphical representation into linear equations, we define three vectors of observed variables:8

• Response vector R = [Trip, Dif]
• Conditions vector X = [Trt1, Graded, Fami, Yr2015, Team, BaseStress, Evaluation, Quiz]
• Measurement vector M = [Unreal, Prepared, Likely.For, Likely.Against, Worried]

Using B to represent the latent variable Bias, we define four new variables to describe the relationships between R, X and M:

• Γ [Gamma], a 2 × 8 matrix of regression coefficients of the conditions X on the response times R
• Α [Alpha], an eight-element vector of regression coefficients of X on B
• Λ [Lambda], a five-element vector of factor loadings between B and M
• Β [Beta], a two-element vector of regression coefficients of B on R

We also define two variance vectors:

• Em, the five-element measurement error vector in the measurement model, M
• Er, the two-element measurement error vector tied to the response times, R

8 This nomenclature is loosely based on (Rigdon, 1996). Vectors and matrices are written in bold to distinguish them from individual coefficients or variables.


Figure 4.3: The preliminary OSU Latent Bias Model.

Table 4.6 summarizes the notation for these and other variables used in this chapter. Using these defined variables, we can write the SEM in Figure 4.3 as a set of equations. Equations 4.1a and 4.1b are the structural model, and Equation 4.1c is the measurement model:

    B = Α·X  (4.1a)

    R = ΓX + ΒB + Er  (4.1b)

    M = ΛB + Em  (4.1c)

Here Α and Β denote the Alpha and Beta coefficient vectors defined above, and B denotes the latent variable Bias.

The Estimated Parameters (Raykov & Marcoulides, 2000) include:

• Regression coefficients in the path analysis model (Α, Γ).
• Factor loadings between the latent variable Bias and the Bias Measures in the measurement model (Λ). The researcher must define at least one factor loading per latent variable to provide a base for the scale; the other factor loadings are estimated relative to the fixed factor loading. In this model, the factor loading for Unreal is defined as Λ1 = 1.0, and the remaining loadings are allowed to vary. We selected Unreal as the anchor for Bias because Unreal is an explicit measure of perceived operator bias in the simulator.
• Variance or measurement errors of observed variables (Em, Er).
• Covariances between independent variables, i.e., between X, Em and Er (covariances between dependent variables are calculated from Β and Λ values).

To select the best model, the estimated covariance matrix is compared to the observed covariance matrix from the observed variables. A likelihood estimator is used to select between possible models as the algorithm proposes new sets of estimated parameters. We use the robust Weighted Least Squares estimator with a mean- and variance-adjusted test statistic (WLSMV) in the R package lavaan (Rosseel, 2012) to estimate model parameters. The WLSMV estimator is lavaan's default for models that include categorical dependent variables such as the Bias Measures in our model.
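To make the estimation step concrete, the final Latent Bias Model of Section 4.3 (Figure 4.7) could be specified in lavaan syntax roughly as follows. This is a sketch under the assumption of a data frame sim_data holding the Table 4.4 variables; it is not the exact script used in this analysis.

    library(lavaan)

    latent_model <- '
      # measurement model: Bias anchored to Unreal
      # (lavaan fixes the first loading to 1 by default)
      Bias =~ Unreal + Prepared + Worried
      # structural model: conditions -> Bias
      Bias ~ Graded + Yr2015 + Team
      # structural model: Bias and conditions -> response times
      Trip ~ Bias + Yr2015 + Team + Quiz
      Dif  ~ Trt1 + BaseStress + Evaluation
    '
    fit <- sem(latent_model, data = sim_data, estimator = "WLSMV",
               ordered = c("Unreal", "Prepared", "Worried"))
    summary(fit, standardized = TRUE, fit.measures = TRUE)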


4.2.3.1.1 Bias Path Analysis Model

The preliminary Bias Path Analysis Model, similar in structure to the preliminary Latent Bias Model, is shown in Figure 4.4. We include the Bias Path Analysis Model to attempt to capture the different bias effects in the Bias Causal Model: Likely.For, Likely.Against and Prepared are measures of the Prominent Hypothesis Bias, Worried measures the Incentive Effect, and Unreal is a measure of overall bias. Ideally, each of these biases would be included as a separate latent variable in the model, but there are too few Bias Measures to estimate separate bias effects (each latent variable requires at least two indicators, preferably three or more). In lieu of a full latent variable model estimating multiple latent biases, we use the Bias Path Analysis Model in Figure 4.4 to examine the multiple biases that are active in the OSU NPP Simulator Experiment.

Figure 4.4: The Preliminary Bias Path Analysis Model


As with the Latent Bias Model, the Bias Path Analysis Model can be written as a set of linear equations. The variables X, M and R remain the same; the vectors in the Latent Bias Model become matrices in the Bias Path Analysis Model. The Bias Path Analysis Model coefficients are named with lower case letters to differentiate them from the Latent Bias Model coefficients:

• γ [gamma], a 2 × 8 matrix of regression coefficients between X and R
• α [alpha], a 5 × 8 matrix of regression coefficients between X and M
• β [beta], a 2 × 5 matrix of regression coefficients between M and R
• em, a five-element vector of estimated variances for M
• er, a two-element vector of estimated variances for R

The structural model equations for the Bias Path Analysis Model are therefore

    M = αX + em  (4.2a)

    R = γX + βM + er  (4.2b)

Note that the Bias Path Analysis Model does not include a measurement model, because no latent variables are present.
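For comparison, a path analysis contains only regressions. A sketch of the preliminary Bias Path Analysis Model in lavaan syntax, again assuming the hypothetical sim_data frame (lavaan accepts several dependent variables on the left of ~):

    path_model <- '
      # Bias Measures as functions of the eight conditions
      Unreal + Prepared + Likely.For + Likely.Against + Worried ~
        Trt1 + Graded + Fami + Yr2015 + Team + BaseStress + Evaluation + Quiz
      # response times as functions of the conditions and the Bias Measures
      Trip + Dif ~
        Trt1 + Graded + Fami + Yr2015 + Team + BaseStress + Evaluation + Quiz +
        Unreal + Prepared + Likely.For + Likely.Against + Worried
    '
    fit_path <- sem(path_model, data = sim_data, estimator = "WLSMV",
                    ordered = c("Unreal", "Prepared", "Worried"))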

Figure 4.5: Mediation in SEM


4.2.3.1.2 SEM Mediation

In both the Latent Bias Model and the Bias Path Analysis Model, Scenario Conditions and Crew Characteristics have direct and indirect effects on Trip and Dif. The indirect effect is the effect of these conditions through Bias (Figure 4.3) or the Bias Measures (Figure 4.4). This is known as mediation (Kenny, 2016): simulator bias mediates the effect of the experimental conditions. Figure 4.5 diagrams the direct and indirect (i.e., mediated) effects.

In Figure 4.5, c′ is the direct effect of the predictor on the response variable, and ab (that is, a multiplied by b) is the indirect effect. The total effect, c, is the sum of the direct and indirect effects,

    c = c′ + ab  (4.3)

For example, consider the effect of Team on Trip in Figure 4.3. Team is the predictor variable, Bias is the mediator, and Trip is the response variable. Using the nomenclature from Equation 4.1, the variables in Figure 4.5 are

• a = Α5ˢ, the standardized regression coefficient for Team (X5) on Bias (B)9
• b = Β1ˢ, the standardized regression coefficient for Bias (B) on Trip (R1)
• c′ = Γ15ˢ, the standardized regression coefficient for Team (X5) on Trip (R1)

To calculate the total effect of Team on Trip, c, we define ΔX5 as a change in Team and ΔR1 as the resulting change in Trip caused by the change in Team. We then calculate ΔR1 = (ab + c′)ΔX5, i.e.,

    ΔR1 = (Α5ˢ Β1ˢ + Γ15ˢ) ΔX5  (4.4)

9 Standardized coefficients are calculated in units of standard deviation. For example, a one standard deviation change in Team (X5) will change B by Α5ˢ standard deviations of B. Standardized coefficients are useful for comparing the impact of variables that have different scales. In this analysis, standardized coefficients are denoted by a superscript s (αˢ, βˢ, γˢ, etc.).
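Plugging in the standardized estimates reported later in Table 4.7 for the Team-Bias-Trip chain gives a feel for the magnitudes involved (a worked example, rounded):

    # standardized estimates from Table 4.7
    a  <- -0.535          # Team on Bias
    b  <-  0.214          # Bias on Trip
    cp <- -0.377          # direct effect of Team on Trip
    ab <- a * b           # indirect effect: about -0.114
    c_total <- ab + cp    # total effect:    about -0.491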

One aspect of the analysis is discerning how much (if at all) a particular effect is mediated by simulator bias. In this study, the amount of mediation corresponds to how much simulator bias affects the outcome. Variables can be fully mediated, partially mediated, unmediated, or inconsistently mediated (a small classification sketch follows this list):

• Effect: the predictor variable has an effect on the response variable.
  o Full Mediation: the total effect is accounted for indirectly by simulator bias (ab = c, c′ = 0).
  o Partial Mediation: the factor has both a direct and an indirect effect (ab + c′ = c, ab ≠ 0, c′ ≠ 0).
  o Inconsistent Mediation: the indirect effect counteracts the direct effect (ab and c′ have opposite signs; ab × c′ < 0), so the indirect effect reduces the overall effect the variable would otherwise have on the response variable (MacKinnon, Fairchild, & Fritz, 2007).
  o No Mediation: the factor has no indirect effect and is fully accounted for without considering simulator bias (ab = 0, c′ = c).
• No Effect: the effect is insignificant and not included in the final model (ab ≈ 0, c′ ≈ 0; note that a or b may be significant, but together they are not important in the model).
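These categories can be mechanized. The helper below is our own illustration of the definitions above, with a tolerance standing in for "statistically indistinguishable from zero":

    mediation_type <- function(ab, cp, tol = 0.01) {
      # ab: indirect effect; cp: direct effect (c'); tol: "effectively zero"
      if (abs(ab) < tol && abs(cp) < tol) return("no effect")
      if (abs(cp) < tol)                  return("full mediation")
      if (abs(ab) < tol)                  return("no mediation")
      if (ab * cp < 0)                    return("inconsistent mediation")
      "partial mediation"
    }

    mediation_type(ab = -0.114, cp = -0.377)   # "partial mediation"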


Table 4.6: Notation frequently used in this chapter.

SEM notation
• R1, R2: Response Times Trip and Dif.
• M1-M5: Bias Measures.
• X1-X8: Scenario Conditions.
• Αi, αij: Alpha, alpha: SEM regression coefficients of condition Xi on the latent variable Bias (B) (upper case, Latent Bias Model) or of Xj on Bias Measure Mi (lower case, Bias Path Analysis Model).
• αˢ, βˢ, ...: standardized coefficients (denoted by a superscript s).
• Λi: Lambda: SEM factor loading between the latent variable Bias (B) and Bias Measure Mi.
• Βi, βij: Beta, beta: SEM regression coefficients of Bias (B) on response time Ri (Latent Bias Model) or of Bias Measure Mj on Ri (Bias Path Analysis Model).
• Γij, γij: Gamma, gamma: SEM regression coefficient of scenario condition Xj on response time Ri.
• E, e: SEM estimated variances of observed variables.
• a: in a mediated model, the effect of the predictor variable on the mediating variable (αˢ).
• b: in a mediated model, the effect of the mediating variable on the response variable (βˢ).
• ab: a × b, the indirect effect of the predictor on the response variable.
• c: the total effect of the predictor on the response variable, c = ab + c′.
• c′: c-prime, the direct effect of the predictor on the response variable (γˢ).

Other variables used in this chapter
• p: the p-value. For the model parameter estimates, low p-values (p < 0.2) indicate that we expect the parameter to be significant in the model. (Note: SEM convention uses p to denote the number of parameters in the model.)
• ε (epsilon): the Root Mean Square Error of Approximation (RMSEA) for the SEM.
• d: the degrees of freedom in the model.
• n: the sample size.
• λ (lambda): the noncentrality parameter in the noncentral χ2 distribution.
• H0, Ha: the null and alternative hypotheses. In this analysis, the null hypothesis is that the model is a good fit to the data, i.e., ε < 0.05; the alternative hypothesis is that the model does not fit the data, i.e., ε > 0.08.
• α (alpha): the maximum threshold for the probability that the model is not a good fit for the data.
• π (pi): the power of the test, i.e., the probability of not making a Type II error.
• r: in SEM, the ratio of the number of indicators to the number of estimated parameters in the model.

4.2.3.2 Final Model Selection

All possible direct and indirect effects are included in the preliminary models (Figure 4.3 and Figure 4.4), because we can provide reasonable explanations for how each of the independent variables might impact each of the dependent variables.


We expect that the effects of the Scenario Conditions (X1-X4) on Response Times (R1, R2) will be largely mediated by Bias; that is, many of the direct effects of the Scenario Conditions on Response Times will be negligible.10 However, we retain the direct effects of these variables in the preliminary models because we do not have a strong theoretical reason to exclude direct effects of Scenario Conditions on Response Times.

Figure 4.6: SEM Model Development Process

The final models are developed by removing insignificant links from the preliminary models; this process is illustrated in Figure 4.6. Model coefficients for the preliminary model are generated using the WLSMV estimator in lavaan. Then, variables with low standardized coefficients are removed from the model, as these variables have a negligible effect on the response variable. In this analysis, we retain all variables with a standardized coefficient of at least 0.1. Negligible links between variables are removed iteratively: the links with the smallest standardized coefficients are set equal to zero in the model, then the model is re-evaluated and again the variables with the smallest standardized coefficients are removed. This process is repeated until all of the estimated standardized coefficients in the model are 0.1 or greater.

10 Recall that Trip and Dif are scenario-normalized response times, so variations between scenarios are minimized in the analysis.

Finally, coefficients with high p-values (greater than 0.2) are removed from the models. In SEM, coefficient p-values can be interpreted similarly to p-values for coefficients in linear regression analysis: low p-values suggest that the effect is significant; high p-values indicate that the effect is likely to be insignificant. This continues until all of the coefficient p-values are less than 0.2, unless the model fails to converge after a coefficient is removed, or unless the model Root Mean Square Error of Approximation (RMSEA, a measure of model fit—see Section 4.3.3) increases when the coefficient is removed. In either case, the coefficient is returned to the model (Step 3), and the model revision process is complete. The results of this analysis are shown in Section 4.3.
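In lavaan, the quantities this selection loop inspects are directly available; a sketch, with fit and sim_data as in the earlier sketches:

    std  <- standardizedSolution(fit)     # standardized estimates and p-values
    weak <- subset(std, op %in% c("~", "=~") & abs(est.std) < 0.1)
    weak                                  # candidate links to drop next
    fitMeasures(fit, "rmsea")             # check RMSEA before and after a drop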

4.3 RESULTS

The final versions of the Latent Bias Model and the Bias Path Analysis Model are introduced in Section 4.3.1, with diagrams of the interactions between model elements and a brief discussion of the bias interactions and their impact on Trip and Dif for each model. Section 4.3.2 summarizes the mediated effects in the models. Section 4.3.3 introduces two approaches to model validation that can be used to assess models developed from experienced operator data, and Section 4.3.4 estimates the minimum sample size needed to assess these models with a high degree of confidence.


4.3.1 Bias Structural Equation Models

Two Bias SEMs are developed: the Latent Bias Model and the Bias Path Analysis Model.

4.3.1.1 Latent Bias Model

The Latent Bias Model estimates the latent variable Bias. Figure 4.7 is the Latent Bias Model diagram. The estimated coefficients are reported in Table 4.7, along with the standardized coefficients and the coefficient p-values. The p-value is the probability that the effect is negligible (i.e., equal to zero).

In the Latent Bias Model, the Bias Measures Likely.For and Likely.Against are removed from the model because they are not significant measures of the latent variable Bias.


Figure 4.7: The Latent Bias Model estimates the latent variable, Bias. The Bias Measures Likely.For and Likely.Against are not included in the model because they are found to be insignificant in the process described in Section 4.2.3.2.

Table 4.7 lists the coefficients for the significant (low p-value) connections between the variables in Figure 4.7. Bias is measured by Unreal, Prepared and Worried; Bias is affected by Graded, Team and Yr2015. Bias affects Trip but not Dif; Trip is affected by Bias, Yr2015, Team and Quiz; Dif by Trt1, BaseStress, and Evaluation.


Table 4.7: The Latent Bias Model estimated coefficients. Standardized coefficients represent the change in the response variable per change in the predictor variable in units of standard deviation. P-values are the probability the effect is negligible; bootstrap p-values are estimated through bootstrap resampling.

Bias =~ ΛM (measurement model)
• Unreal: Λ1 = 1.000 (standardized 1.014), fixed loading
• Prepared: Λ2 = -0.378 (standardized -0.494), p = 0.086
• Worried: Λ5 = 0.173 (standardized 0.235), p = 0.164

Bias ~ ΑX
• Graded: Α2 = -1.296 (standardized -0.464), p = 0.016
• Yr2015: Α4 = -0.853 (standardized -0.259), p = 0.227
• Team: Α5 = -5.216 (standardized -0.535), p = 0.023

Trip ~ Β1B + Γ1X
• Bias: Β1 = 0.05 (standardized 0.214), p = 0.141
• Yr2015: Γ14 = -0.467 (standardized -0.612), p = 0.003
• Team: Γ15 = -0.852 (standardized -0.377), p = 0.037
• Quiz: Γ18 = -2.417 (standardized -0.596), p = 0.022

Dif ~ Γ2X
• Trt1: Γ21 = 0.258 (standardized 0.363), p = 0.069
• BaseStress: Γ26 = 0.108 (standardized 0.227), p = 0.244
• Evaluation: Γ27 = -0.668 (standardized -0.555), p = 0.07

In the Latent Bias Model, Bias is measured by three of the five Bias Measures: Unreal, Prepared, and Worried.

4.3.1.1.1 Effects of Manipulations on Simulator Bias in the Latent Bias Model

Bias is a function of Graded, Yr2015 and Team, indicating that simulator bias decreases in the exam sessions and in 2015, and that Bias increases if team dynamics are poor. This matches expectations: Graded is expected to decrease Bias, because the exam session is meant to mimic the pressure and urgency of operating a nuclear power plant. Yr2015 is essentially a measure of training; in 2015, students had more time in the simulator and more experience responding to accident scenarios. This indicates that improved training reduced the artificiality of the simulator experience. Finally, poor team dynamics, indicated by low Team, make crews more aware of the artificial environment.

Bias is not a function of the other manipulations (Trt1, Fami), which might suggest that these manipulations do not impact simulator bias. However, remember that, due to sample size constraints, only one latent variable is included in the Latent Bias Model, and that variable (Bias) is anchored to Unreal. The variables Trt1 and Fami may be influencing bias effects that are not related to Unreal and are therefore not captured in the model. In other words, Bias captures only a part of the bias effects. This interpretation is supported by the Bias Path Analysis Model.

4.3.1.1.2 Effects of Bias on Operator Response in the Latent Bias Model

Similarly, Trip is dependent on Bias, but Dif is not. Again, this does not mean that Dif is not susceptible to simulator biases. Instead, we believe Bias does not capture all of the effects of simulator bias. Dif's independence from Bias in the Latent Bias Model suggests that treating Bias as a single variable is too simplistic. For example, the bias measured by Likely.Against may have a significant impact on Dif, but because this effect is not the same as the effect measured by Unreal, the Likely.Against effect is dropped from the Latent Bias Model and Likely.Against's impact on Dif is not captured. The Bias Path Analysis Model (Section 4.3.1.2) supports this interpretation of the results. In the Bias Path Analysis Model, Dif is a function of Likely.Against and Worried, while Trip is a function of Unreal and Likely.Against.


4.3.1.2 Bias Path Analysis Model

The Bias Path Analysis Model treats the Bias Measures as proxies for their associated biases. Figure 4.8 is the Bias Path Analysis Model diagram. The estimated coefficients for the Bias Path Analysis Model are reported in Table 4.8, along with the standardized coefficients, the coefficient p-values, and bootstrap p-values.

In the Bias Path Analysis Model, two Bias Measures are removed from the model because they do not have a significant effect on Trip or Dif: Prepared and Likely.For.

Figure 4.8: The Bias Path Analysis Model. Bias Measures Prepared and Likely.For are dropped from the model.


Table 4.8: The Bias Path Analysis Model estimated coefficients. Standardized coefficients represent the change in the response variable per change in the predictor variable in units of standard deviation. P-values are the probability the effect is negligible; bootstrap p-values are estimated through bootstrap resampling.

Unreal ~ α1X
• Graded: α12 = -1.12 (standardized -0.434), p = 0.046
• Yr2015: α14 = -0.933 (standardized -0.307), p = 0.238
• Team: α15 = -4.805 (standardized -0.533), p = 0.03

Likely.Against ~ α4X
• Fami: α43 = -0.119 (standardized -0.274), p = 0.233
• Yr2015: α44 = -0.172 (standardized -0.498), p = 0.042
• Team: α45 = -0.513 (standardized -0.502), p = 0.07

Worried ~ α5X
• Team: α55 = -2.631 (standardized -0.324), p = 0.225
• BaseStress: α56 = 0.532 (standardized 0.343), p = 0.176

Trip ~ β1M + γ1X
• Unreal: β11 = 0.071 (standardized 0.279), p = 0.056
• Likely.Against: β14 = -0.493 (standardized -0.219), p = 0.116
• Yr2015: γ14 = -0.546 (standardized -0.705), p = 0.001
• Team: γ15 = -1.035 (standardized -0.451), p = 0.015
• Quiz: γ18 = -2.415 (standardized -0.587), p = 0.024

Dif ~ β2M + γ2X
• Likely.Against: β24 = 0.421 (standardized 0.172), p = 0.103
• Worried: β25 = -0.067 (standardized -0.217), p = 0.195
• Trt1: γ21 = 0.223 (standardized 0.313), p = 0.158
• BaseStress: γ26 = 0.147 (standardized 0.309), p = 0.107
• Evaluation: γ27 = -0.682 (standardized -0.564), p = 0.064

In the Bias Path Analysis Model, Unreal has the same predictors that Bias has in the Latent Bias Model: Graded, Yr2015 and Team. This highlights the influence of Unreal on the estimate of Bias in the Latent Bias Model.

4.3.1.2.1 Effects of the Manipulations on Simulator Bias in the Bias Path Analysis Model

Trt1: Trt1 is not a predictor for any of the Bias Measures. We expected Trt1 to manipulate the prominent hypothesis bias (measured by Prepared, Likely.For and Likely.Against). As Trt1 is not a significant predictor, it is likely that the effects of Fami overpower the impact of Trt1 in the model—remember that only Trt1 = 1 crews encountered scenarios with low Fami.

Graded: Students in Graded = 1 scenarios report a decrease in Unreal, which measures Cavalier Behavior. Graded is included in the experiment to trigger the Incentive Effect, which is expected to reduce Cavalier Behavior. The effect of Graded on Unreal supports this interpretation of the interacting biases Incentive Effect and Cavalier Behavior.

Contrary to expectations, Graded is not a predictor for Worried, which is meant to measure the Incentive Effect. Instead, Worried is a function of Team and BaseStress. This indicates that Worried is not a good measure of the Incentive Effect, which can instead be seen indirectly through Graded's effect on Unreal. The strength of the incentive in graded sessions (Graded = 1) is not strong enough to be seen over the impact of crew characteristics on operators' anxiety about their grades.

Fami: Fami is meant to manipulate the Prominent Hypothesis Bias, which is measured by

Prepared, Likely.For and Likely.Against. We cannot see the impact of Fami on Prepared or Likely.For because these measures are not included in the model (they are not significant predictors for Trip or Dif). As expected, Likely.Against decreases as Fami increases: the more familiar the scenario, the less likely students are to expect other accidents or events.

Yr2015: Yr2015 is a measure of training, an incidental bias manipulation introduced by the evolution of the experiment over the two years of data collection. We had no theoretical expectations of how Yr2015 would impact simulator bias. In the Bias Path Analysis Model,

Yr2015 is a predictor for Unreal and Likely.Against. The tie between Yr2015 and Unreal—


like the tie between Yr2015 and Bias in the Latent Bias Model—highlights the increase in perceived reality of the simulator environment with additional training and knowledge of the accidents. The decrease in Likely.Against in 2015 reflects the fact that some students in

2014 had strong, incorrect expectations about the coming scenario, while students in 2015 were more likely to have a strong correct hypothesis or to report no expectations about the scenario.

4.3.1.2.2 Effects of Bias on Operator Response in the Bias Path Analysis Model

In the Bias Path Analysis Model, Trip is a function of the Bias Measures Unreal and

Likely.Against, while Dif is a function of Likely.Against and Worried. Although Worried is meant to evaluate Incentive Effects, from the discussion above we see that Worried is a function of crew characteristics rather than the phase of the experiment.

Unreal: From the Bias causal model, Unreal represents Cavalier Behavior. The Bias Path

Analysis Model therefore suggests that Cavalier Behavior (through Unreal) increases the time for operators to trip the reactor (Trip) but has no effect on the time to diagnose the accident after the reactor trips (Dif).

Likely.Against: Likely.Against measures an operator’s Prominent Hypothesis, specifically, the strength of false expectations about the scenario. As might be expected,

Dif increases with Likely.Against—the stronger an operator’s false expectations, the longer it takes to diagnose the accident and leave E0 for the correct accident-specific procedure.

Somewhat counterintuitively, Trip decreases as Likely.Against increases. This can be explained by the fact that operators trip the reactor after any severe accident, often before


they have a strong understanding of the cause of the accident. In other words, expectations of any accident would decrease the time to trip the reactor. Dif, on the other hand, measures the time to identify the correct response to the specific accident, and will naturally increase if the operators have a mistaken initial hypothesis.

Worried: In the Bias Path Analysis Model, Dif decreases as Worried increases. One possible explanation for this effect is that students who are worried about their performance rush through the procedures in an effort to look competent and capable, while students who are not worried about their grades feel comfortable taking more time to respond to the accident. Also, Worried decreases as Team improves—in some instances, crews with better team dynamics take more time to discuss decisions before moving through the procedures, while in crews with poor team dynamics all decisions are made by the SRO without consulting the RO.

4.3.2 Mediated Effects

Table 4.9 summarizes the indirect effects in the Latent Bias Model. The Indirect Effect

(i.e., ab in Figure 4.5) is the effect of the scenario condition on the response variable that is mediated by Bias. The Indirect Effect is the standardized coefficient of the effect of the condition on Bias multiplied by the standardized coefficient for Bias’s effect on the response variable.

Partial Mediation in Table 4.9 is the indirect effect divided by the total effect. This variable shows the relative strength of the mediation.
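Both quantities can be computed directly from the standardized coefficients. A minimal sketch in R (hypothetical helper functions, not part of the analysis code):

```r
# Indirect effect (ab) from standardized coefficients: a_s is the effect
# of the condition on Bias, b_s is the effect of Bias on the response.
indirect_effect <- function(a_s, b_s) a_s * b_s

# Partial mediation (ab / c), where the total effect c is the indirect
# effect plus the direct effect gamma_s of the condition on the response.
partial_mediation <- function(a_s, b_s, gamma_s) {
  ab <- a_s * b_s
  ab / (ab + gamma_s)
}
```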

There are no indirect effects acting on Dif because Bias is not a predictor of Dif in the

Latent Bias Model.


For Trip, Graded is fully mediated by Bias. Without the simulator environment, the model indicates that Graded would have no effect on Trip. Team is partially mediated by Bias; simulator bias accounts for approximately twenty-five percent of Team’s effect on Trip.

Outside the simulator environment, we would expect Team to be less influential on Trip.

Table 4.9: Mediation in the Latent Bias Model

Trip ~    Mediation    Standardized Coefficient (ab)    p-value    Partial Mediation (ab/c)             p-value
Graded    Complete     Α2ˢΒ1ˢ = -0.099                  0.133      –                                     –
Yr2015    Partial      Α4ˢΒ1ˢ = -0.056                  0.342      Α4ˢΒ1ˢ/(Α4ˢΒ1ˢ + Γ14ˢ) = 0.083        0.346
Team      Partial      Α5ˢΒ1ˢ = -0.115                  0.205      Α5ˢΒ1ˢ/(Α5ˢΒ1ˢ + Γ15ˢ) = 0.233        0.238

In the Bias Path Analysis Model, many of the mediated effects (Table 4.10) are inconsistent—that is, the simulator bias counteracts the direct effect of the condition on the response variable.

For example, Table 4.8 shows that BaseStress's direct effect is to increase Dif (γ26ˢ > 0). However, the mediated effect of BaseStress through Worried is to decrease Dif (from Table 4.10, α56ˢβ25ˢ < 0). Therefore, this model suggests that an operator's tendency towards stress and anxiety will have a stronger effect in the control room than in the simulator environment.


Table 4.10: Mediation in the Bias Path Analysis Model.

Trip ~      Mediation                      Standardized Coefficient (ab)    p-value    Partial Mediation (ab/c) (p-value)
Graded      Unreal, Complete               α12ˢβ11ˢ = -0.121                0.081      –
Fami        Likely.Against, Complete       α43ˢβ14ˢ = 0.06                  0.348      –
Yr2015      Unreal, Partial                α14ˢβ11ˢ = -0.086                0.24       (α14ˢβ11ˢ + α44ˢβ14ˢ)/(α14ˢβ11ˢ + α44ˢβ14ˢ + γ14ˢ) = -0.035 (0.831)
            Likely.Against, Inconsistent   α44ˢβ14ˢ = 0.109                 0.233
Team        Unreal, Partial                α15ˢβ11ˢ = -0.149                0.121      (α15ˢβ11ˢ + α45ˢβ14ˢ)/(α15ˢβ11ˢ + α45ˢβ14ˢ + γ15ˢ) = 0.079 (0.764)
            Likely.Against, Inconsistent   α45ˢβ14ˢ = 0.11                  0.274

Dif ~       Mediation                      Standardized Coefficient (ab)    p-value    Partial Mediation (ab/c) (p-value)
Fami        Likely.Against, Complete       α43ˢβ24ˢ = -0.047                0.378      –
Yr2015      Likely.Against, Complete       α44ˢβ24ˢ = -0.086                0.246      –
Team        Likely.Against, Complete       α45ˢβ24ˢ = -0.087                0.255      –
            Worried, Complete              α55ˢβ25ˢ = 0.07                  0.39
BaseStress  Worried, Inconsistent          α56ˢβ25ˢ = -0.075                0.393      α56ˢβ25ˢ/(α56ˢβ25ˢ + γ26ˢ) = -0.318 (0.545)

4.3.3 Model Fit & Sample Size Recommendations

To assess the validity of models such as the Latent Bias Model and the Bias Path Analysis Model, two aspects of the model should be examined: the overall fit of the model to the data, and the strength of the interactions between model elements (i.e., the estimated coefficients).

In Section 4.3.3.1, we examine the significance of each estimated effect in the model using bootstrap resampling. Overall, the bootstrap supports the initial model values.

In Section 4.3.3.2, we assess the overall model fit using the Root Mean Square Error of

Approximation (RMSEA). We follow the consensus that an RMSEA of less than 0.05


represents a reasonable model fit (Hooper, Coughlan, & Mullen, 2008). Based on this threshold, in Section 4.3.3.3 we estimate the sample size needed to eliminate the possibility of an indeterminate RMSEA with a power of 80%.

Finally, in Section 4.3.4 we discuss how the recommended sample size changes if we wish to assess the strength of specific mediated effects in the analysis.

4.3.3.1 Bootstrap Validation of the Significance of Interactions Included in the Model

Bootstrap analysis is a common resampling method used to validate models. We use the bootstrap sampling approach outlined in (Kutner, Nachtsheim, & Neter, 2008) to determine the p-value for the model coefficients. First, 1,000 bootstrap samples of 41 data points each are drawn from the original sample. We fit the model to each sampled dataset and store the estimated coefficients for each model that converges. The model converges successfully for approximately half of the sampled data sets, resulting in a set of approximately 500 estimates of each coefficient.

The p-value for the coefficient is the probability that the effect is negligible. If the coefficient is positive, the bootstrap coefficient p-value is simply the proportion of the bootstrap estimated coefficients that are less than zero. Similarly, if the coefficient is negative, the bootstrap coefficient p-value is the proportion of the bootstrap estimated coefficients that are greater than zero.
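In lavaan, the SEM package used for these analyses, this procedure can be sketched as follows; the model specification string and the 41-observation data frame (here model and df41) are assumed stand-ins for the actual analysis objects:

```r
library(lavaan)

# Fit the SEM, then refit it to 1,000 bootstrap resamples of the data.
fit  <- sem(model, data = df41)
boot <- bootstrapLavaan(fit, R = 1000, FUN = "coef")
boot <- boot[complete.cases(boot), , drop = FALSE]   # keep converged fits

# One-sided bootstrap p-value: the proportion of resampled estimates that
# fall on the opposite side of zero from the original estimate.
est <- coef(fit)
boot_p <- vapply(seq_along(est), function(j) {
  if (est[j] > 0) mean(boot[, j] < 0) else mean(boot[, j] > 0)
}, numeric(1))
```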

Table 4.11 and Table 4.12 report the initial coefficient estimate (Coef) and p-value from

Table 4.7 and Table 4.8, along with the bootstrap coefficient estimate (Boot. Coef), the bootstrap p-value, and the 90% Confidence Interval (CI) limits for the coefficient estimate


from the bootstrap resampling. The CI limits correspond to the fifth and ninety-fifth quantiles of the bootstrap samples; the estimated bootstrap coefficient is simply the mean value of all the bootstrap samples that fall within the 90% CI.

Table 4.11: Bootstrap resampling in the Latent Bias Model

Latent Bias Model          Coef.     Boot. Coef.    P-value    Boot. P-value    90% CI lower    90% CI upper
Bias =~ Unreal      Λ1     1.0       0.1            -          -                -               -
Bias =~ Prepared    Λ2     -0.378    -0.519         0.086      0.034            -1.643          -0.02
Bias =~ Worried     Λ5     0.173     0.032          0.164      0.392            -0.637          0.465
Bias ~ Graded       Α2     -1.296    -1.336         0.016      0.006            -2.563          -0.448
Bias ~ Yr2015       Α4     -0.853    -0.906         0.227      0.078            -2.157          0.213
Bias ~ Team         Α5     -5.216    -5.098         0.023      0.028            -9.526          -0.564
Trip ~ Bias         Β1     0.05      0.057          0.141      0.072            -0.007          0.157
Trip ~ Yr2015       Γ14    -0.467    -0.464         0.003      0.014            -0.702          -0.196
Trip ~ Team         Γ15    -0.852    -0.782         0.037      0.062            -1.532          0.033
Trip ~ Quiz         Γ18    -2.417    -2.329         0.022      0.002            -3.675          -1.063
Dif ~ Trt1          Γ21    0.258     0.23           0.069      0.111            -0.093          0.495
Dif ~ BaseStress    Γ26    0.108     0.101          0.244      0.109            -0.035          0.238
Dif ~ Evaluation    Γ27    -0.668    -0.615         0.07       0.044            -1.131          -0.043


Table 4.12: Bootstrap resampling in the Bias Path Analysis Model

Bias Path Analysis Model          Coef.     Boot. Coef.    P-value    Boot. P-value    90% CI lower    90% CI upper
Unreal ~ Graded            α12    -1.12     -1.261         0.046      0.005            -2.584          -0.373
Unreal ~ Yr2015            α14    -0.933    -1.007         0.238      0.077            -2.33           0.126
Unreal ~ Team              α15    -4.805    -5.851         0.03       0.01             -10.581         -2.347
Likely.Against ~ Fami      α43    -0.119    -0.127         0.233      0.067            -0.283          0.016
Likely.Against ~ Yr2015    α44    -0.172    -0.176         0.042      0.019            -0.319          -0.031
Likely.Against ~ Team      α45    -0.513    -0.477         0.07       0.009            -0.819          -0.123
Worried ~ Team             α55    -2.631    -3.665         0.225      0.099            -11.139         1.379
Worried ~ BaseStress       α56    0.532     0.998          0.176      0.081            -0.134          4.178
Trip ~ Unreal              β11    0.071     0.064          0.056      0.059            -0.001          0.13
Trip ~ Likely.Against      β14    -0.493    -0.507         0.116      0.07             -1.209          0.058
Trip ~ Yr2015              γ14    -0.546    -0.554         0.001      0.001            -0.866          -0.275
Trip ~ Team                γ15    -1.035    -0.99          0.015      0.023            -1.718          -0.28
Trip ~ Quiz                γ18    -2.415    -2.373         0.024      0.004            -3.672          -1.075
Dif ~ Likely.Against       β24    0.421     0.399          0.103      0.219            -0.785          1.21
Dif ~ Worried              β25    -0.067    -0.051         0.195      0.197            -0.148          0.052
Dif ~ Trt1                 γ21    0.223     0.174          0.158      0.205            -0.182          0.489
Dif ~ BaseStress           γ26    0.147     0.141          0.107      0.072            -0.025          0.321
Dif ~ Evaluation           γ27    -0.682    -0.61          0.064      0.059            -1.128          0.035

All of the initial coefficient estimates fall within the bootstrap 90% CI. With one exception, the bootstrap validation returns p-values that are consistent with (or better than) the initial model p-values. The exception is in the Latent Bias Model: the p-value for the factor loading between Bias and Worried (Λ5) increases from 0.164 to 0.392 in the bootstrap. The increased p-value indicates that perhaps Worried should not be an indicator of Bias.

Removing Worried as a factor for Bias is consistent with the Bias Path Analysis Model, which retains Worried as a predictor for Dif but not as a predictor for Trip.


With the assurance that the bootstrap resampling supports the validity of the Bias Path

Analysis Model and, for the most part, the Latent Bias Model, we turn to the RMSEA fit measure.

4.3.3.2 Root Mean Square Error of Approximation

The RMSEA value—represented by ε in the equations below—is calculated from F₀, the final value of the estimator used in the model fitting algorithm (the algorithm iterates through models until the fitting function, F, reaches a minimum value, F₀). F is expected to follow a noncentral Chi-squared distribution, specified by two parameters: the degrees of freedom d¹¹ and the noncentrality parameter λ. The model Chi-squared value, T, is the expected value of the distribution, and is calculated as a function of F₀ and the sample size, n: T = E[χ²(d, λ)] = (n − 1)F₀. The RMSEA value is defined as:

ε = √((T − d)/(nd))    (4.5)

The relationship between F₀, T and ε is explained in greater detail in (MacCallum, Browne, & Sugawara, 1996). General consensus is that RMSEA less than 0.05 represents a good fit, and RMSEA greater than 0.08 is a poor fit (Hooper, Coughlan, & Mullen, 2008).
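As a one-line sketch of Equation 4.5 (truncating at zero when T < d, a common convention that is an assumption here):

```r
# RMSEA from the model chi-squared statistic T, degrees of freedom d, and
# sample size n (Equation 4.5); negative values are truncated to zero.
rmsea <- function(T_stat, d, n) sqrt(max((T_stat - d) / (n * d), 0))
```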

Because RMSEA follows a known distribution, researchers can estimate confidence intervals to test the significance and power of hypothesis tests of model fit. Following the

11 d ~ the number of observed covariances (i.e., p(p + 1)/2, where p is the number of observed variables) minus the number of parameters estimated in the model (Raykov & Marcoulides, 2000).


approach outlined in (MacCallum, Browne, & Sugawara, 1996), we use the test of "not-close fit" to determine whether we should reject a proposed model.

The null hypothesis in the test of not-close fit is that the RMSEA value is greater than 0.05, the upper limit for a close-fitting model. The alternative hypothesis is that the model is a close fit to the data:

H₀: ε > 0.05
Hₐ: ε ≤ 0.05    (4.6)

To test this hypothesis, we establish a confidence interval (CI) around the estimated RMSEA value. There are three possible outcomes:

- Not-Close Fit: Do not reject the null hypothesis if the lower limit of the CI is greater than 0.05.
- Close Fit: Reject the null hypothesis that the model is not a close fit if the upper limit of the CI is less than 0.05.
- Indeterminate: The test is indeterminate if the CI includes 0.05.

For both models, the 90% RMSEA CI estimated by lavaan is [0, 0.0001], meaning we can reject the hypothesis of not-close fit and conclude that our models are a close fit to the data.

Our objective in the next section is to determine the sample size needed for a high confidence in this result.

4.3.3.3 The Power of the Test

We must now examine the power of that result—that is, the probability that the models are actually a close fit to the data.

To do this we specify a new hypothesis, again following the approach suggested in (MacCallum, Browne, & Sugawara, 1996). The null hypothesis is that the RMSEA value is 0.05, and the alternative hypothesis is that the RMSEA value is a very close fit, ε = 0.01:

H₀: ε = 0.05
Hₐ: ε = 0.01    (4.7)

We calculate the probability that a model with ε = 0.01 will be recognized as a close fit to the data. We do this by exploiting the relationship between the RMSEA value and the Chi-squared distribution. By definition (Krishnamoorthy, 2015), the expected value of a non-central Chi-squared distribution is E[χ²(d, λ)] = d + λ. This allows us to calculate the Chi-squared non-centrality parameter, λ, as a function of the RMSEA:

λ = ε²nd    (4.8)

With d = 30 (approximately the degrees of freedom of both the Latent Bias Model and the Bias Path Analysis Model), we use ε₀ = 0.05 and εₐ = 0.01 to calculate λ₀ = ε₀²nd and λₐ = εₐ²nd.

The associated Chi-squared distributions are plotted in Figure 4.9. The solid line is the distribution associated with the null hypothesis, and the dotted line is the alternative hypothesis. Figure 4.9 compares three sample sizes: (a) n = 41, (b) n = 204, and (c) n =

338.


Figure 4.9: Power analysis and sample size for a model with 30 degrees of freedom. (a) shows the OSU NPP Simulator Experiment sample size, (b) is the recommended sample size for 훼 = 0.2, 휋 = 0.8, and (c) is the recommended sample size for 훼 = 0.1, 휋 = 0.9.

The thin gray line in Figure 4.9 marks the critical point, χ²_c: we do not reject the null hypothesis if the Chi-squared value is greater than χ²_c. The critical point is calculated from λ₀ (Equation 4.8) according to Equation 4.9:

Pr(χ²(d, λ₀) < χ²_c) = α/2    (4.9)

The power of the test, π, is the probability that we do not accept H₀ when Hₐ is true. In Figure 4.9, this corresponds to the area under the dotted red line to the left of the critical point:

π = Pr(χ²(d, λₐ) < χ²_c)    (4.10)

Figure 4.9 (a) illustrates the difficulty in differentiating between a close fit and a not-close fit with a small sample size: with n = 41, π₄₁ = 0.16. Figure 4.9 (b) and (c) show how the power improves as sample size increases; with n = 204 and a critical point defined by α = 0.2, the discriminating power is 80%. Increasing n to 338 yields a power of 90% with a more stringent α = 0.1. We selected the sample sizes n = 204 and n = 338 by setting α and substituting different values of n into λ₀ and λₐ (Equation 4.8) until Equations 4.9 and 4.10 were true for the desired power level.
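A minimal sketch of this calculation, implementing Equations 4.8 through 4.10 as written; with n = 41 and α = 0.2 it reproduces π ≈ 0.16 from Figure 4.9(a):

```r
# Power of the test of not-close fit: lambda values from Equation 4.8,
# the critical point from Equation 4.9, and the power from Equation 4.10.
power_not_close <- function(n, d = 30, alpha = 0.2,
                            eps0 = 0.05, epsA = 0.01) {
  lambda0 <- eps0^2 * n * d
  lambdaA <- epsA^2 * n * d
  chi_c <- qchisq(alpha / 2, df = d, ncp = lambda0)   # critical point
  pchisq(chi_c, df = d, ncp = lambdaA)                # power
}

power_not_close(41)   # approximately 0.16
```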

In summary, a sample of n = 300 is recommended to obtain conclusive results to the test of not-close fit with a power of 80% for both the Latent Bias Model and the Bias Path

Analysis Model. Note that the recommended n will vary with the degrees of freedom in the model. The sample size for the Latent Bias Model and Bias Path Analysis Model is high because each model has only 31 degrees of freedom.

Table 4.13: Cohen's effect size thresholds (Fritz & MacKinnon, 2007).

Mediated Effect    Standardized Coefficient
Small (S)          0.18
Halfway (H)        0.26
Medium (M)         0.39
Large (L)          0.59

4.3.3.4 Sample size requirements for testing mediated effects

In general, n = 300 will yield high confidence in the estimated model parameters. The exception is mediated effects, which are more difficult to estimate. To estimate the sample size needed to have high confidence in the mediated effects, we use the guidelines proposed in (Fritz & MacKinnon, 2007).

The sample size depends on the strength of the mediated effect of interest—in this case, the magnitude of the standardized coefficient. Cohen (Cohen, 1988) proposed widely-accepted thresholds for categorizing effects as small (S), medium (M) or large (L) in regression analysis; Fritz and MacKinnon (Fritz & MacKinnon, 2007) add a halfway (H) threshold that evaluates effects halfway between small and medium. The recommended standardized coefficient thresholds corresponding to small, halfway, medium and large effects are listed in Table 4.13. Using Monte Carlo simulation, Fritz and MacKinnon estimate the minimum sample size needed to detect each effect size in mediated models.

Typically, the sample size required to detect mediated effects is much greater than the sample size needed to assess direct effects. Table 4.14 lists estimates of the minimum sample size needed to detect mediated effects using the percentile bootstrap validation approach with a power of 0.8.¹²

Table 4.14: Recommended minimum sample size required for bootstrap power analysis (π = 0.8). Effects are listed as distal-proximal pairs, i.e., SM indicates a Small distal effect (a in Figure 4.5) and a Medium proximal effect (b in Figure 4.5). S, M and L correspond to Cohen's criteria for Small, Medium, and Large effects; H represents Halfway between S and M. From Table 3 in (Fritz & MacKinnon, 2007).

Mediated effects    SS     SH     SM     SL     HS     HH     HM     HL
n                   558    412    406    398    414    162    126    122

Mediated effects    MS     MH     MM     ML     LS     LH     LM     LL
n                   404    124    78     59     401    123    59     36

Based on Table 4.14, a sample of n = 41 should capture significant Large-Large mediated effects, that is, effects where the condition has a large effect on bias and bias has a large effect on the response variable. With lower power, our small sample should detect significant Large-Medium and Medium-Large mediated effects. Smaller effects (SS, SL, etc.) may be found to be significant, but the power associated with these estimates will be low.

12 Fritz and MacKinnon review six different tests for the significance of the effect. We selected the percentile bootstrap, one of the most powerful and straightforward tests, but the other approaches might be preferable in other contexts.


4.3.4 Recommended Sample Size for Future Research

Figure 4.10 diagrams the structure of a model that might be used in future experiments.

The proposed model is based on the Bias Causal Model (Figure 4.1), and includes three latent biases, each measured by three indicators.

We retain the expectation from the Bias Causal Model that Incentive acts on Cavalier

Behavior, but we separate the effects of Cavalier Behavior and Prominent Hypothesis on the response variables Trip and Dif. This corresponds to the results from this analysis that show Trip is a function of Unreal and Likely.Dist, while Dif depends on Likely.Dist and

Worried. These results indicate that reducing “Bias” to a single effect is too simplistic, and a more nuanced model is required.

The heavy lines in Figure 4.10 correspond to the model we would expect based on the Bias

Path Analysis Model, with Unreal corresponding to Cavalier Behavior and Likely.Dist to

Prominent Hypothesis. The other possible links are shown by the lighter lines in Figure

4.10.


Figure 4.10: Sample Bias SEM for future work.

As illustrated in Section 4.3.3.4, the recommended sample size varies with the degrees of freedom in the model. If we retain all possible links in the model, d = 82. If we include only the links suggested by the Bias Path Analysis Model (i.e. the darker lines in Figure

4.10), d = 106. In general, removing links between variables increases d; removing variables from the model or adding links between variables decreases d. As Figure 4.11 shows, the recommended sample size stabilizes as degrees of freedom in the model increase. In Figure 4.11, the black line corresponds to 훼 = 0.1, 휋 = 0.9, and the dotted line to the less stringent criteria 훼 = 0.2, 휋 = 0.8. For d = 100, the minimum recommended n ranges from 89 to 139.


Figure 4.11: Recommended sample size as a function of degrees of freedom. LEFT: The solid line shows n(d) for α = 0.1, π = 0.9; the dotted line corresponds to α = 0.2, π = 0.8. RIGHT: recommended sample size for the anticipated degrees of freedom in the proposed Bias SEM:

d      α = 0.1, π = 0.9    α = 0.2, π = 0.8
30     372                 223
50     272                 166
82     203                 125
106    175                 109
150    143                 90
200    122                 77

Recall from Figure 4.9 that the recommended sample size for testing the Bias Path Analysis

Model is n = 300, more than twice the recommended n for the model proposed in Figure

4.10. This is because d = 100 in the proposed model, and d = 28 in the Latent Bias Model and d = 30 in the Bias Path Analysis Model. The proposed model has three measurement variables for each latent variable. Adding measurement variables to the model increases the information collected in each scenario, which reduces the number of scenarios that must be collected. In essence, the proposed model allows us to extract more information per sample rather than increasing the number of samples required.

4.3.4.1 Alternative approach for estimating sample size

Instead of relying on statistical tests to estimate sample size, many researchers use Monte

Carlo simulations to predict how their model will behave for various sample sizes. Based on a survey of Monte Carlo simulations of SEM research, Westland (Westland, 2010)


recommends choosing n based on r, the ratio of indicators to latent variables, according to Equation 4.11:

n ≥ 50r² − 450r + 1100    (4.11)

In Figure 4.10, each latent variable has three associated indicators. For r=3, Westland’s recommended sample size is n=200. At n=200, the power calculated using the test of not- close fit (MacCallum, Browne, & Sugawara, 1996) with 훼 = 0.1 is 휋 = 0.75 for models with d = 50 and 휋 = 0.94 for models with d = 100.
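Equation 4.11 as a quick sketch:

```r
# Westland's minimum sample size guideline (Equation 4.11), where r is
# the ratio of indicators to latent variables.
westland_n <- function(r) ceiling(50 * r^2 - 450 * r + 1100)
westland_n(3)   # 200, matching the r = 3 case above
```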

4.3.4.2 Recommended Sample Size for Mediated Effects

Recall from Table 4.14 that detecting mediated effects may require larger samples.

Researchers should use the findings from (Fritz & MacKinnon, 2007) in conjunction with the RMSEA test of not-close fit or Westland’s Monte Carlo-based guideline to design experiments to test mediated effects.

4.4 DISCUSSION

The vision behind this research is to develop a method to capture the impact of simulator bias on data collected in simulator facilities. This approach allows us to study human reliability in a simulator environment, then calibrate the data to better represent operator actions in a control room.

The examples below illustrate how to estimate the effects of scenario conditions (Section

4.4.1) and how we could quantify the gap between the simulator and control room data

(Section 4.4.2). We do not expect the numerical results to represent experienced operators—remember that this model is based on a small sample of student operators—but we provide these examples to illustrate the potential applications of the method.


The OSU NPP Simulator Experiment also provides insight into data collection for HRA research. With this in mind, Section 4.4.3 examines how simulator bias can be incorporated into the Standardized Plant Analysis Risk—Human Reliability Analysis (SPAR-H) method, a well-known HRA method (Gertman, Blackman, Marble, Byers, & Smith, 2005).

4.4.1 Quantitative Analysis of the Variations in Simulator Bias

In this example, we examine how experiment phase (Graded) impacts response times in the Bias Path Analysis Model.

Let Condition 0 represent the nominal scenario, i.e., Graded₀ = 0, and Condition 1 represent the added incentive, i.e., Graded₁ = 1.

From Table 4.8, α12 = −1.12 is the Bias Path Analysis Model coefficient for the effect of Graded on Unreal, and β11 = 0.071 is the effect of Unreal on Trip. Therefore,

Trip₁ = Trip₀ + (α12 β11) × (Graded₁ − Graded₀)    (4.12)

Inserting the values from Table 4.8 yields

Trip₁ − Trip₀ = −0.08

Recall that Trip and Dif are scenario-normalized response times, that is, the measured time divided by the mean time for that scenario. This means that, holding all other factors constant, we expect an eight percent decrease in Trip in the Exam phase of the experiment.

4.4.2 Quantitative Estimate of Overall Bias Effects

We use similar analysis to estimate the impact of the simulator environment on operator response time in the Bias Path Analysis Model. Here, Condition 0 is in the simulator, and


the hypothesized Condition 1 is in the actual control room. We estimate the change in the

Bias Measures when we move from the simulator to the control room, then adjust Trip and

Dif accordingly.

From Table 4.8, β11 and β14 are Trip's coefficients for Unreal and Likely.Against in the Bias Path Analysis Model; β24 and β25 are Dif's coefficients for Likely.Against and Worried:

Trip₁ = Trip₀ + β11(Unreal₁ − Unreal₀) + β14(Likely.Against₁ − Likely.Against₀)
Dif₁ = Dif₀ + β24(Likely.Against₁ − Likely.Against₀) + β25(Worried₁ − Worried₀)    (4.13)

Remembering that the Bias Measures are ordinal data, we estimate the control room values for the Bias Measures as listed in Table 4.15.

Table 4.15: Expected values of Bias Measures in a control room (rather than a simulator): the expected change in Bias Measures moving from the simulator (Condition₀) to the control room, and the estimated values in the control room (Condition₁).

Unreal: The variable Unreal₁ represents the operator feeling of unreality in the control room. Unreal has five levels; the mean reported Unreal in this experiment is Level 3. We assume the control room value Unreal₁ = 1, the minimum level. Moving from the mean Unreal in the simulator (3) to the minimum level (1) results in a decrease of 2 levels:

Unreal₁ − Unreal₀ = 1 − 3 = −2

This measure can be improved by surveying operators who have responded to design basis accidents while on shift. Preferably, the reality of the simulator vs. control room environments will be assessed through multiple survey questions.


Prepared: Although Prepared is not in the final model, for the sake of completeness we review how to estimate Prepared in the control room. Operators in the control room are not expecting an accident to occur, but their extensive training should make them reasonably well prepared when an accident happens. We therefore estimate that Prepared₁ = Prepared₀, assuming that operators who are equipped to respond to a given scenario in the simulator are equally well prepared to respond to the same accident in the control room. Therefore,

Prepared₁ − Prepared₀ = 0

Alternatively, Prepared₁ could be quantified as a function of the time spent training on the accident and the recency of the training.

Likely: Likely is a conditional probability; when an accident occurs, Likely is the operator's assessment of the probability that a specific set of events is the root of the accident. Likely.For is the expectation of the event that actually occurs; Likely.Against is the expectation of accidents that do not actually occur. In the Bias Path Analysis Model, Trip and Dif are functions of Likely.Against (Likely.For is dropped from the model).

Theoretically, Prominent Hypothesis Bias in the control room is zero, so we set Likely.Against₁ = 0. The average Likely.Against in the simulator is Likely.Against₀ = 0.14. This yields:

Likely.Against₁ − Likely.Against₀ = 0 − 0.14 = −0.14

Alternatively, it is possible that operators bring a hypothesis bias into the control room, based on their training and experience. We can estimate Likely.For₁ and Likely.Against₁ by surveying operators about their expectations in conjunction with reviewing training and plant history.

Worried: Worried is a 3-level ordinal variable. The student operators report whether or not they are worried that their performance will impact their course grade. Although this is not an exact parallel, we set Worried₁ = 3, the maximum value on Worried's 3-level scale. The mean level for Worried in the experiment is 2, resulting in a one-level increase in Worried from simulator to control room:

Worried₁ − Worried₀ = 3 − 2 = 1

Inserting the appropriate values from Table 4.8 into Equation 4.13 results in

Trip₁ = Trip₀ + 0.071 × (−2) − 0.493 × (−0.14) = Trip₀ − 0.073
Dif₁ = Dif₀ + 0.421 × (−0.14) − 0.067 × (1) = Dif₀ − 0.126    (4.14)

If, as in the previous section, the average time to trip the reactor is 530 s, and if the time to exit E0 is an additional 895 s,

SecTrip₁ = SecTrip₀ × (1 − 0.073) = 0.927 × 530 s = 491 s
SecDif₁ = SecDif₀ × (1 − 0.126) = 0.874 × 895 s = 782 s    (4.15)

This corresponds to tripping the reactor about half a minute faster in the control room, and exiting E0 approximately two minutes faster in the control room. A total reduction in scenario time of 2.5 minutes is equivalent to approximately ten percent of the time required in the simulator.
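The adjustment can be reproduced with a few lines of arithmetic; the coefficients are from Table 4.8 and the Bias Measure shifts from Table 4.15:

```r
# Control room adjustment of the normalized response times
# (Equations 4.13-4.15).
b11 <- 0.071; b14 <- -0.493   # Trip coefficients (Unreal, Likely.Against)
b24 <- 0.421; b25 <- -0.067   # Dif coefficients (Likely.Against, Worried)
d_unreal <- -2; d_likely <- -0.14; d_worried <- 1   # shifts, Table 4.15

d_trip <- b11 * d_unreal + b14 * d_likely    # -0.073
d_dif  <- b24 * d_likely + b25 * d_worried   # -0.126

530 * (1 + d_trip)   # ~491 s to trip the reactor in the control room
895 * (1 + d_dif)    # ~782 s to exit E0 in the control room
```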

A few caveats: First, the significance of a ten percent reduction in time depends on the scenario and the time available. In some instances, this time difference is trivial; in other scenarios, these results might mean the difference between failure in the simulator but success in the control room.

Second, as always, this is a demonstration experiment, meant to illustrate the possibilities of our methods. We do not expect these differences in times between the simulator and the control room to be valid for experienced operators. We understand that, in general, operators are expected to require more time in the control room because of prominent hypothesis effects, etc.

4.4.3 Simulator Bias and the SPAR-H HRA Model

One objective in modeling simulator bias is to understand how bias impacts the simulator data used in HRA research. As an example, we examine how simulator bias is likely to impact the widely-used HRA method, SPAR-H.


SPAR-H is a high-level HRA method that uses a two-step process to estimate the probability of human error events. The first step is to specify a nominal Human Error

Probability (HEP). The second step is to adjust the HEP based on the expected PSFs and dependence in the scenario (Gertman, Blackman, Marble, Byers, & Smith, 2005). To do this, the analyst estimates the levels for each of the eight SPAR-H PSFs and applies the appropriate multiplier—these are listed in the left columns of Table 4.16. For example, in a scenario with incomplete procedures but highly experienced operators and otherwise nominal PSFs, the HEP will be multiplied by 10 (i.e., 20 x 0.5).
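A sketch of that adjustment step; the nominal HEP value here is an assumption for illustration, not a SPAR-H table value:

```r
# SPAR-H second step: multiply the nominal HEP by the product of the PSF
# multipliers. Example from the text: incomplete procedures (x20) with
# highly experienced operators (x0.5) and otherwise nominal PSFs.
nominal_hep <- 1e-3                                  # assumed for illustration
multipliers <- c(procedures = 20, experience = 0.5)
nominal_hep * prod(multipliers)                      # HEP increases tenfold
```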

The SPAR-H PSF multipliers are a prime candidate for simulator studies. Researchers can manipulate PSFs and observe the resulting errors, developing data-driven confidence intervals for the proposed multipliers. Simulator bias should be considered in the experimental design for such a study. In some cases, bias can be used to manipulate PSFs—incentives, for example, may be introduced to impact time available or to add artificial stressors. In other cases, simulator biases may have unintended effects that skew the data collected in the simulator. For example, Policy Response Bias may change the way operators interact with procedures, which could impact the effective quality of the procedures. This and other examples of how Motivational Biases are likely to impact the SPAR-H PSFs are listed in Table 4.16.


Table 4.16: SPAR-H PSFs, levels, and multipliers (Gertman, Blackman, Marble, Byers, & Smith, 2005); biases expected to influence the SPAR-H PSFs.

Available Time
  Levels (multiplier): Inadequate time (P(failure) = 1); Time available = time required (10); Nominal time (1); Time available ≥ 5 × time required (0.1); Time available > 50 × time required (0.01).
  Biases: Incentive effects, if incentives are tied to time.

Stress/Stressors
  Levels (multiplier): Extreme (5); High (2); Nominal (1).
  Biases: Incentive effects, if incentives induce stress in operators.

Complexity
  Levels (multiplier): Highly complex (5); Moderately complex (2); Nominal (1).
  Biases: Hypervigilance/Cavalier Behavior: the impact of complex scenarios may be altered by operator hypervigilance or cavalier behavior. Hypervigilant operators will investigate each aspect of the scenario as thoroughly as possible, while cavalier operators will pass over important details. Prominent Hypothesis Bias: if operators expect the event, its complexity is effectively reduced. Masking effects that might be significant in the control room are eliminated if the operators correctly anticipate the scenario.

Experience/Training
  Levels (multiplier): Low (3); Nominal (1); High (0.5).
  Biases: Prominent Hypothesis Bias: expectations of a particular accident may prime operators to respond appropriately or encourage them to disregard training and experience.

Procedures
  Levels (multiplier): Not available (50); Incomplete (20); Available but poor (5); Nominal (1).
  Biases: Policy Response Bias: operators who believe they are expected to follow procedures are more likely to attempt to continue to follow the procedure even when the procedure is inadequate.

Ergonomics/HMI
  Levels (multiplier): Missing/misleading (50); Poor (10); Nominal (1); Good (0.5).
  Biases: Influenced by Environmental (not Motivational) Biases.

Fitness for Duty
  Levels (multiplier): Unfit (P(Failure) = 1); Degraded fitness (5); Nominal (1).
  Biases: Cavalier Behavior: operators who know they will spend their shift in the simulator may not be as prepared for work as operators who know they will be in the CR.

Work Processes
  Levels (multiplier): Poor (2); Nominal (1); Good (0.8).
  Biases: Policy Response Bias: as with procedures, operators may choose to execute certain work processes that they believe are expected but that they might skip in an actual accident situation.

To completely understand the data collected in a SPAR-H PSF experiment, researchers should develop measures to assess the relevant PSFs as well as the simulator biases that


are expected to be active. Measures of Policy Response Bias should be included, along with measures of the Incentive Effect (if incentives are used), measures of Cavalier

Behavior (in addition to the Unreal survey question used in this analysis) and measures of

Prominent Hypothesis Bias (including but not limited to Prepared, Likely.For and

Likely.Against).

4.5 CONCLUSION

The approach proposed in this chapter paves the way for a better understanding and improved models of simulator bias. Recommendations for future work include:

Develop multiple measures for each expected Bias. In this study, we assessed biases with one or two survey questions. Multiple measures of each bias should be developed, preferably a mix of independent, subjective and objective measures, perhaps assessed by the operator and by an independent observer. Measures of safety culture might be a good starting place for devising measures of Cavalier Behavior and Hypervigilance (for example, (NS Tutorial: Developing Safety, n.d.)). When possible, more informative measures should be developed; yes or no questions such as “Are you worried your performance will affect your grade in the class?” should be replaced with, “On a scale from one to five, how much do you worry that your performance will affect your grade?”

This is particularly necessary for Prominent Hypothesis Bias, which should be treated as a two-dimensional factor influenced by both the strength and the accuracy of the operators’ expectations.


Do not assume Simulator Bias can be treated as a single variable. In the Path Analysis

Model, Unreal and Likely.Against affect Trip, while Likely.Against and Worried affect Dif. This is because the effect of the Prominent Hypothesis does not act in concert with Cavalier

Behavior. Multiple measures for each bias allow for models with multiple, interacting latent variables.

Control crew characteristics in the experimental design. We did not control for team dynamics (Team), operator tendencies (BaseStress) or operator expertise (Quiz,

Evaluation) in the OSU NPP Simulator Experiments, but these factors have a significant impact on bias and response times. Rather than randomly assigning crews, future research should control for crew characteristics in the experimental design.

Include specific response variables in the experimental design. For simplicity, we limited our analysis to two response variables (Trip and Dif), but other response variables of interest might include number of diagnoses considered before identifying the accident, reactor operator confidence throughout the scenario, etc. Measures of operator stress will be important in experiments related to HRA models such as SPAR-H.

Develop dynamic measures of simulator bias. The difference between bias effects on

Trip and Dif suggests that bias evolves over the scenario, and that bias may interact with other (also evolving) PSFs. Ideally, future studies will propose a method to evaluate biases periodically throughout the scenario rather than as static factors at the end.

Collect data from null scenarios and accident scenarios. Student operators in the OSU

NPP Simulator Experiment expected an accident to occur during all of their simulator


sessions. While experienced operators are prepared to respond to any and all adverse events, they do not expect an accident to occur. We can eliminate this gap between simulator and control room conditions by including simulator sessions that focus on routine work.

Use experienced operators. Our student operator models demonstrate the feasibility and potential of this method, but reliable models of simulator bias require data from a reliable source.

This research proposes a set of simulator biases to consider when designing human reliability simulator experiments. Consideration of simulator bias can mean simply acknowledging that these biases are likely to be present and looking for qualitative ways to account for their effects, or it can be a more rigorous treatment of bias correction and elimination. We expect the short list of biases presented here will be modified and expanded as this body of research grows. More detail can be added, perhaps specifying various aspects of Cavalier Behavior or examining the relationship between “tunnel vision” and Hypervigilance.

We propose using causal models to theorize how biases interact with each other and how they impact operators in a simulator environment. The OSU NPP Simulator Experiment shows that the causal model proposed in Section 4.2.3 is too simplistic, and that simulator bias should not be treated as a single effect but rather as a set of interacting factors that can influence operator behavior in various ways. The difference in how the Bias Measures influence Trip and Dif further suggests that effects of simulator bias are not constant throughout a scenario but should be expected to vary as the scenario evolves.


Most importantly, we demonstrate that the effects of the simulator environment—abstract and intangible though they may be—are not beyond the reach of experimental science.

With proper instruments, SEM can help assess the invisible factors that influence our research.

Although our demonstration study is small and uses student operators, our results suggest that observing approximately 200 scenarios will yield models of sufficient power to make conclusive judgments about some of the simulator effects observed in this experiment.

SACADA is expected to collect information from 560 scenarios per year (Chang, et al.,

2014), more than twice the sample size needed for robust models of simulator bias.

The biggest challenge to this effort is not scope or scale, but developing data collection instruments to measure bias effects. Multiple, independent measures of each bias are necessary to build robust models that capture the different simulator biases and their effects. The surveys we developed for this study are a starting point for this effort.


5 A Data-driven Bayesian Model of Subjective PSF Values

As discussed in previous chapters, the objective of this research is to illustrate how to use digital simulator data to improve HRA. In this chapter, we introduce a data-driven PSF model that links objective Environmental PSF values to subjective PSF values reported by student operators.

The process is illustrated in Figure 5.1. The Environmental PSF model estimates the PSF values we expect given the plant conditions. This model is derived from the IDAC PSF algorithms and modified for the OSU GPWR data. The Reported PSFs are collected from student operators in the OSU NPP Simulator Facility using the data collection process discussed in Chapter 2. The two sets of PSFs—Environmental and Reported—are linked via a hierarchical Bayesian linear regression model with random effects. The result is the

Posterior Subjective PSF Model, a PSF model that primarily relies on objective information to predict reported PSF values. A fit measure based on Cohen’s Kappa for inter-rater reliability is proposed to evaluate the model.

Table 5.1 outlines the elements of each component of the Posterior PSF model, including obtaining Reported PSF data, estimating Environmental PSF values associated with each reported PSF, and using this data to develop the Posterior PSF model.


Figure 5.1: The PSF model building process.

Table 5.1: The components of the proposed PSF BN.

Component: Reported PSFs
  Development Process:
    1. Design experiment and data collection instruments
    2. Train student operators
    3. Conduct simulator sessions with student operators
    4. Collect PSF data for each session
  Results: Y, the operator-reported PSF values for the OSU simulator sessions. The jth evaluation of PSF p in simulator observation i is y_pij.

Component: Environmental PSF Model
  Development Process:
    1. Extract PSF algorithms from the IDAC model
    2. Adapt IDAC PSF algorithms for the simulator environment
    3. Calculate Environmental PSFs for OSU simulator sessions
  Results: X, a set of Environmental PSF values for OSU simulator sessions. The jth estimate of PSF p in simulator observation i is x_pij.

Component: Posterior PSF Model
  Development Process:
    1. Specify model structure (Bayesian network)
    2. Define θ, the set of unknown model parameters
    3. Specify prior distributions for each element of θ
    4. Optimize θ by maximizing p(θ|X, Y) ∝ p(Y|X, θ)p(θ)¹³
  Results: Probability distribution for a new set of reported PSF values Ŷ given θ and new environmental values X̂: p(Ŷ|θ, X̂).

13 This is the unnormalized posterior density; see Equation 5.17.


Section 5.1 describes the data used in the model, including the Environmental PSFs calculated from the GPWR data and the Reported PSFs obtained by surveying the student operators. Section 5.2 explains the Bayesian approach used to develop the Posterior PSF model, Section 5.3 defines the proposed model, and Section 5.4 summarizes the results.

Section 5.5 analyzes the model results and Section 5.6 discusses the implications for future research.

5.1 DATA USED IN THIS RESEARCH: ENVIRONMENTAL AND REPORTED PSFS

Two PSF data sets are used in this research: Environmental PSFs and Reported PSFs.

The Environmental PSFs are based on the PSF algorithms in the IDAC model. The

Reported PSFs are reported by the student operators.

5.1.1 Environmental PSFs

The IDAC model is used as the basis for calculating the prior beliefs about the five major

PSFs: TCL, PIL, TC, CTL and S. For each PSF, IDAC calculates a numeric value based on relevant parameters; for example, IDAC TCL is calculated using the time available to the operator, and PIL is based on the number of active alarms.



The IDAC PSF values represent estimates of how the operators are expected to feel based on plant conditions. Section 5.3 maps the IDAC PSFs to reported PSFs, accounting for all the human variability that distorts the logical PSF (the Environmental PSF) to become the experienced (perceived) PSF.

Before these values can be compared, Environmental measures must be developed for use in the simulator setting with student operators. IDAC was developed as a computer program, not a tool to predict PSFs in a simulator scenario, and the IDAC PSF algorithms must be modified to generate the Environmental PSF model.

This section reviews the IDAC definition of each PSF and explains the modifications and assumptions used to create the Environmental PSF algorithm based on the original IDAC algorithm. In this research, the IDAC parameters are used in the Environmental PSF algorithms unless they need to be modified for the OSU GPWR scenario data.

The parameters in the Environmental PSF algorithms have not been verified experimentally. A first step towards deriving experimental values for these parameters would be to do sensitivity testing using the OSU model. We could insert a range of

Environmental PSF parameter values, calculate corresponding Environmental PSFs, and compare the model results to the models obtained using the original IDAC parameters.

We could identify the range of parameters that do not change the model outcomes, and possibly identify an alternative parameter range that results in a better fit to the OSU

Reported PSF data.


5.1.1.1 Time Constraint Load

TCL is "the pressure induced by the perception of the available time to complete a task"

(Li, 2013). To quantify this pressure, the IDAC algorithm measures the time available until plant parameters reach a critical threshold.

5.1.1.1.1 Background: Calculating TCL in IDAC

IDAC's Time Constraint Load algorithm captures the time available, t.a, until the plant parameters reach a pre-defined threshold. Two sets of parameter thresholds are used: nominal operating conditions and emergency (post-trip) conditions. The t.a variable is the minimum time available until one of the plant parameters reaches a pre-determined threshold.

T.a is used to estimate the nominal TCL at point i, TCLᵢ⁰. The researcher defines t_lower, the minimum amount of time that can be useful to the operators, and t_upper, the maximum time an operator might need. TCLᵢ⁰ is defined as

TCLᵢ⁰ = 1 − (t.aᵢ − t_lower)/(t_upper − t_lower)    (5.1)

TCLᵢ⁰ is restricted to [0,1]; if t.a is less than t_lower, TCLᵢ⁰ = 1; if t.a is greater than t_upper, TCLᵢ⁰ = 0.

TCLᵢ is the sum of the TCL from the previous timestep and the nominal TCL, moderated by a TCL buildup factor, τ_buildup, representing the rate at which the TCL experienced by the operator increases:

TCLᵢ = TCLᵢ₋₁ + (TCLᵢ⁰ − TCLᵢ₋₁)(Δtᵢ/τ_buildup)    (5.2)

In Equation 5.2, Δtᵢ is the amount of time between TCLᵢ₋₁ and TCLᵢ. When TCLᵢ⁰ reaches its maximum, TCL approaches one exponentially, with decay constant τ_decay, from t_threshold, the time at which TCLᵢ⁰ reached one:

TCLᵢ = 1 − exp(−(tᵢ − t_threshold)/τ_decay)    (5.3)

To calculate TCL for a simulator scenario, the researcher must determine:

- P, the set of parameters to track in the t.a calculation.
  o IDAC SGTR simulations track Pressurizer pressure, Pressurizer level, and Steam Generator levels.
  o In addition to determining which parameters to track, the researcher must define:
    ▪ P.thresholds, the boundaries for each parameter P, in both nominal and emergency conditions.
    ▪ The time interval for determining Ṗ, the parameter rate of change, which is needed to determine the time available until that parameter reaches its designated threshold.
- The thresholds in TCLᵢ⁰, t_lower and t_upper.
  o The IDAC SGTR simulations specify different thresholds for the SRO and the RO:¹⁴
    ▪ SRO: t_lower = 1 minute and t_upper = 20 minutes.
    ▪ RO: t_lower = 5 minutes and t_upper = 25 minutes.
- The time constants τ_buildup and τ_decay.
  o The IDAC SGTR simulations specify different constants for the SRO and the RO:
    ▪ SRO: τ_decay = 300 s, τ_buildup = 50 s.
    ▪ RO: τ_decay = 100 s, τ_buildup = 50 s.
- Δtᵢ, the timestep for calculating TCLᵢ (0.5 s in IDAC).

14 See the IDAC literature for justification of these and other constants in the Environmental PSF equations (Li, 2013).

5.1.1.1.2 Environmental TCL

The TCL parameters are based on the EOPs the operators use in the simulator. The OSU

Environmental PSF TCL algorithms use the IDAC SRO thresholds and time constants.

The SRO constants were selected because the SRO t_lower and t_upper thresholds are closer to the timescale of the OSU scenarios, which last 30 to 60 minutes. Setting t_lower to the IDAC RO value of 5 minutes would mean we expect operators to experience maximum TCL throughout most of the scenario. With approximately 30 procedure steps per scenario, operators are functioning on a time scale of 1-2 minutes per procedure step, not 5 or more. We decided to calculate both SRO and RO TCL using the same parameters, because in the OSU scenarios the SRO and RO work as a team and are expected to experience similar TCL effects.

Parameters: In addition to the parameters tracked in the IDAC SGTR simulations

(Reactor Coolant System (RCS) pressure and level; steam generator levels), OSU TCL tracks steam generator pressure and containment pressure. This is because OSU scenarios include a Loss of Coolant Accident and a Steam Line Break. The thresholds for each parameter are listed in Table 5.2, along with the parameter name in the GPWR software.


The parameter bounds are based on the GPWR Critical Safety Function Trees, the

Abnormal Operating Procedures and the Emergency Operating Procedures, and experience in the OSU scenarios.

Table 5.2: Plant parameter thresholds used to calculate Environmental TCL.

Parameter                Accident Lower Bound    Nominal Lower Bound    Normal Operations    Nominal Upper Bound    Accident Upper Bound
RCS pressure             250 psig                1400 psig              2250 psig            2500 psig              3000 psig
Pressurizer level        17%                     25%                    60%                  70%                    92%
SG narrow range level    0%                      20%                    57%                  65%                    80%
SG pressure              700 psig                990 psig               1011 psig            1170 psig              1230 psig
Containment pressure     0 psig                  0 psig                 0 psig               3 psig                 10 psig

Parameter rate of change: Ṗ is the instantaneous rate of change in parameter P, that is, the change in parameter P in the previous second.

Thresholds and time constants: The SRO thresholds and decay constants are used for all operator TCL calculations in the OSU simulations because the shorter t_lower and t_upper are more appropriate for the OSU scenarios, which typically last from 30 to 60 minutes:

t_lower = 1 minute
t_upper = 20 minutes
τ_buildup = 50 s
τ_decay = 300 s

Δtᵢ = 1 s: TCL is calculated continuously using the GPWR timestep, 1 s.
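A minimal sketch of the nominal TCL and buildup calculations (Equations 5.1 and 5.2) with the SRO constants; the per-second vector of time available, t_a, is an assumed input, and the post-saturation branch of Equation 5.3 is omitted for brevity:

```r
# Environmental TCL trace from a per-second vector of time available (s).
tcl_trace <- function(t_a, t_lower = 60, t_upper = 1200,
                      tau_buildup = 50, dt = 1) {
  # Nominal TCL (Equation 5.1), restricted to [0, 1].
  tcl0 <- pmin(pmax(1 - (t_a - t_lower) / (t_upper - t_lower), 0), 1)
  tcl <- numeric(length(t_a))
  tcl[1] <- tcl0[1]
  for (i in seq_along(t_a)[-1]) {
    # First-order approach toward the nominal TCL (Equation 5.2).
    tcl[i] <- tcl[i - 1] + (tcl0[i] - tcl[i - 1]) * (dt / tau_buildup)
  }
  tcl
}
```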


Figure 5.2: Environmental TCL (black), normalized time available (red), and normalized plant parameters (gray).

5.1.1.2 Passive Information Load

PIL is the pressure associated with "some salient stimuli that catch one's attention automatically" (Li, 2013). In NPPs, passive information comes primarily from too many alarms ringing simultaneously, distracting rather than assisting the operators (PIL is included as a major PSF because of the operators' experience in the Three Mile Island accident).

5.1.1.2.1 Background: Calculating PIL in IDAC

In the IDAC code, PIL is a function of the number of active alarms, i.e., the alarms the operator has yet to address. As with the other PSFs, the IDAC PIL algorithm includes a decaying component, tracking alarm activity over the previous 18 s. The IDAC code uses Δtᵢ = 0.5 s, resulting in a sum over 36 timesteps to calculate the PIL from the previous 18 s:


PIL = Σᵢ e^(αΔTᵢ)(nᵢ/n₀) / Σᵢ e^(αΔTᵢ)(n₀/n₀)    (5.4)

where both sums run over the timesteps i from tᵢ = −18 s to tᵢ = 0 s.

In Equation 5.4, n₀ is a baseline number of alarms, estimating the number of alarms the operator can respond to appropriately, and nᵢ is the number of active alarms the operator has not yet addressed (the number of new alarms plus the previously active alarms that have not yet been addressed). The timestep in Equation 5.4, ΔTᵢ, is the time between timestep i and the current time (ΔTᵢ ranges from −18 s to 0). The constant α is a weighting factor, chosen to give more weight to recent alarms. Equation 5.4 is normalized by dividing by the maximum load, i.e., n₀/n₀. PIL is restricted to [0,1]; when nᵢ is greater than n₀, PIL is defined as 1.0.

To calculate PIL for a simulator scenario, the researcher must determine:

- The PIL window, that is, the time frame for considering active alarms, set to 18 seconds in IDAC.
- α, the alarm weighting factor. In IDAC, α is chosen such that the weight of alarms from five seconds previous (ΔT = −5 s) is ten percent of their original value: α = 0.461; e^(0.461 × (−5 s)) = 0.1.
- The definition of an active alarm. Operators may choose to ignore some alarms, or they can address them quickly, but collecting this information in the simulator is not always straightforward.
- n₀, the reference number of alarms. The IDAC SGTR simulation uses n₀ = 3.


5.1.1.2.2 Environmental PIL

The Environmental PIL algorithm uses the parameters suggested in IDAC, with one exception: nᵢ, the number of active alarms, is defined differently. In our analysis, we assume alarms remain active for one minute; we do not track when operators have addressed a specific alarm. This is based on observing operators in the simulator: if they do not address an alarm within a minute, they have likely forgotten it and moved on to another task; if they do address it, they require time to do so, and even if they silence the annunciators they require additional time to understand and confirm the alarm they silenced.

As in the IDAC simulations, n₀ = 3. This reflects the operators' inability to handle many alarms simultaneously; if there are more than three alarms, the operators will focus on a few important alarms and neglect the other alarms.

Figure 5.3 shows the Environmental PIL value for Team 3's second exam session (Exam B, or EB). The left image in Figure 5.3 shows the number of active alarms (black) and the associated PIL (red). PIL ranges from 0 to approximately 35, but the definition of PIL restricts the PIL range to [0,1]. The right image in Figure 5.3 shows PIL with a restricted range; this is calculated by capping the maximum nᵢ at n₀. PIL capped at 1 has three levels: 0, 0.5 and 1. Capping the PIL assumes that the operator's capacity to process new alarms is saturated when n = n₀; although hundreds of alarms may be active, additional alarms do not increase the alarm load because the alarm load is already at maximum.
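A minimal sketch of the capped Environmental PIL at second t, assuming n_active is a per-second count of active alarms (held active for 60 s, per the convention above):

```r
# Capped Environmental PIL (Equation 5.4) with the IDAC constants
# alpha = 0.461, n0 = 3, and an 18-second window.
pil_at <- function(n_active, t, n0 = 3, alpha = 0.461, window = 18) {
  idx <- max(1, t - window):t    # timesteps within the previous 18 s
  dT  <- idx - t                 # Delta T_i, ranging from -18 to 0
  w   <- exp(alpha * dT)         # recent alarms weighted more heavily
  sum(w * pmin(n_active[idx], n0) / n0) / sum(w)
}
```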


Figure 5.3: Number of active alarms (black) and Environmental PIL (red). Left: all active alarms; right: normalized active alarms, with the maximum number of alarms capped at n₀ = 3.

5.1.1.3 Task Complexity

TC refers to the complexity of the operator's primary task, diagnosing the accident. This is subjective rather than objective, "a measure of interaction among task characteristics, personal capability and feeling" (Li, 2013).

5.1.1.3.1 Background Information: Calculating Task Complexity in IDAC

In IDAC, TC is a function of System Dynamics (Sᵢ, updated every ith timestep), operator Expertise (Exp) and operator Confusion (C). Equation 5.5 calculates TC at timestep i:

TCᵢ = 0.8 (Sᵢ/(0.5 + Exp)) + 0.2 Cᵢ    (5.5)

Confusion: Confusion is determined by the conflict between the operator's confidence in the current diagnosis and the number of indicators that contradict that diagnosis of the plant situation. In the OSU NPP Simulator Experiment, we did not have a good mechanism to track the number of indicators contradicting the current diagnosis. Tracking these indicators would involve identifying the current diagnosis, identifying the features of the scenario that are unexpected given the diagnosis, and identifying the operators' awareness of these factors. Our PSF data collection included 'Current Diagnosis,' but operators usually left this blank in the early parts of the scenario. We did not collect data about contrary indicators and would have to use the video records to attempt to identify these factors. Instead, operator-reported Confusion is used to calculate TC. Confusion is the only operator-reported variable used in the Environmental PSF calculations, and future researchers should consider developing an objective measure for Confusion so that this variable does not need to be collected from the operators.

Experience: In this analysis, Experience is represented by the student’s quiz grades, adjusted so the mean quiz grade is 0.5 to match the ‘average’ expertise in the IDAC model, which ranges from 0 to 1.

System Dynamics: System Dynamics (SD) is calculated from the number of plant parameters that change in a given timestep, along with a decay factor:

$$SD_i = SD_{i-1}\exp(-\alpha_1 \Delta t) + \alpha_2 N_i \tag{5.6}$$

To calculate System Dynamics for a simulator scenario, the researcher must determine:

- Δt ~ The time between timestep i and timestep i−1 (in the IDAC code, Δt = 0.5 s).
  o Δt represents the frequency with which SD_i is updated.
- N_i ~ The number of dynamic changes observed in timestep i. This includes:
  o The set of tracked parameters
  o The definition of a change in each of the tracked parameters:
    ▪ deltaT.change: Time interval in which the change occurs
    ▪ deltaP.change: Required change in the parameter during deltaT.change
- SD_i coefficients, α1 and α2.
  o In the IDAC code, α1 is determined by setting SD_i to decay to 10% of its original value in three minutes, i.e., 0.1 = exp(−α1 · 180 s), which yields α1 = 0.013.
  o α2 defines the range of SD_i, calculated after selecting α1 by setting the maximum desired SD_i, SD_m; the maximum N_i, N_m; and the time step between SD_i evaluations, Δt; and solving for α2:

$$SD_m = SD_m \exp(-\alpha_1 \Delta t) + \alpha_2 N_m \tag{5.7}$$

In IDAC, with α1 = 0.013 and N_m = SD_m = 1, α2 = 0.0129.
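Rearranging Equation 5.7 makes the calibration explicit. Note that the quoted value α2 = 0.0129 is reproduced when Δt = 1 s is used in this calibration step:

```latex
% Solving Equation 5.7 for \alpha_2:
SD_m = SD_m\,e^{-\alpha_1 \Delta t} + \alpha_2 N_m
\quad\Longrightarrow\quad
\alpha_2 = \frac{SD_m\left(1 - e^{-\alpha_1 \Delta t}\right)}{N_m}
% With SD_m = N_m = 1, \alpha_1 = 0.013, and \Delta t = 1\,\mathrm{s}:
\alpha_2 = 1 - e^{-0.013} \approx 0.0129
```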

5.1.1.3.2 Environmental TC

The TC parameters are determined after observing operators in the simulator. Where the original IDAC parameters appear reasonable, they are retained in the Environmental TC equation.

Δt_i = 1 s: In this analysis, SD_i is updated every second. This provides a continuous measure of the system dynamics, rather than a disjointed account of the scenario events. On the other hand, C and TC are only updated when the operators provide a new PSF evaluation, roughly every two to three minutes. An alternative approach would be to update SD_i only when C and TC are updated, with a variable Δt between SD_i evaluations; however, this approach essentially deletes meaningful information from the model. The operators are experiencing SD_i continuously, not discretely, and removing this record of plant events does not reflect their experience.

N_i: In this research, the tracked parameters are based on the parameters the operators are expected to monitor during the accident. These are listed in Table 5.3 for each EOP used by the operators in the OSU NPP Experiment. Because the monitored parameters are so similar in all of the EOPs, N_i is separated into two phases: Pre-Trip and Post-Trip. The Pre-Trip parameters that contribute to N_i are Volume Control Tank (VCT) level, Reactor Coolant System (RCS) pressure, and Pressurizer level. Unfortunately, the variable names for radiation were not found in the GPWR software/documentation, so radiation data are not available in the analysis. Even if these data were available, they would be useful only early in the scenario because high radiation levels are masked in the GPWR after the reactor trips. The Post-Trip parameters that contribute to N_i are RCS pressure, SG pressure, SG level, and Containment (CNMT) pressure. Condensate Storage Tank (CST) level, Safety Injection (SI) flow, and Refueling Water Storage Tank (RWST) level were not collected in the simulator sessions. In practice, the absence of these variables makes little difference; they do not change significantly in the OSU scenarios, and the operators ignore them because they are not essential to the scenarios.


Table 5.3: Monitored parameters associated with the primary procedures used in the experiment. Parameters that are used to calculate OSU System Dynamics are shaded gray in the original.

Parameter                 | Pre-Trip | E-0 | E-1 | E-2 | E-3
VCT level (%)             |    X     |     |     |     |
RCS pressure (1000-2500)  |    X     |  X  |  X  |  X  |  X
Pressurizer level (%)     |    X     |     |     |     |
Radiation*                |          |  X  |  X  |  X  |  X
SG pressure (600-1200)    |          |  X  |     |  X  |  X
SG level (%)              |          |  X  |  X  |  X  |  X
CST level**               |          |  X  |  X  |  X  |  X
SI flow**                 |          |  X  |  X  |     |
RWST level**              |          |  X  |  X  |  X  |
CNMT Press (14-20)        |          |  X  |  X  |  X  |  X

* Radiation variable names were not found in the GPWR software, so radiation data are not available. ** Not collected in the simulator sessions.

With these parameters contributing to N_i, the maximum N_i at any point in the scenario is four, and the maximum N_i before the reactor trips is three. N_i is normalized by dividing by four.15

deltaT.change: Based on experience with the student operators in the simulator, deltaT.change is set to two minutes. Student operators watch the parameters change over time and do not react quickly, even if parameters are changing dramatically.

deltaP.change: The amount a parameter must change in deltaT.change is set to 5% of the parameter's range. Again, this is an estimate based on observations of student operators' sensitivity to dynamic changes in the simulator.

- Level indicators (percent): change of 5% in deltaT.change
  o SG level: deviation change of 5% in deltaT.change
- Pressure indicators:
  o Containment pressure: 5% ~ 0.3 psig
  o PRZ pressure: 5% ~ 75 psig
  o SG pressure: 5% ~ 30 psig

15 We divide by the maximum number of tracked parameters for any condition (four), not the maximum number of tracked parameters in the current condition (three of four), because System Dynamics is an absolute measure, not a relative measure. System Dynamics must increase when a fourth parameter starts changing, even if this change occurs because the operators switch procedures and are now aware of more changing parameters.

SD_i coefficients: The three-minute decay constant used in IDAC appears to be reasonable for student operators, so α1 and α2 from the IDAC code are used to calculate SD_i.
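The sketch below shows one way to combine these choices: count the tracked parameters that moved more than 5% of their range over the last two minutes, normalize by four, and apply the decay recursion of Equation 5.6 with Δt = 1 s. The function names and the example histories are hypothetical; the ranges follow Table 5.3 and the coefficients follow the IDAC values above.

```python
import math

# Ranges of the tracked post-trip parameters (per Table 5.3)
RANGES = {"RCS pressure": (1000, 2500), "SG pressure": (600, 1200),
          "SG level": (0, 100), "CNMT pressure": (14, 20)}

def count_changes(histories, t, dt_change=120, frac=0.05):
    """N_t: number of tracked parameters that moved more than 5% of
    their range during the last deltaT.change (two minutes)."""
    n = 0
    for name, series in histories.items():
        lo, hi = RANGES[name]
        window = series[max(0, t - dt_change):t + 1]
        if max(window) - min(window) > frac * (hi - lo):
            n += 1
    return n

def system_dynamics(histories, a1=0.013, a2=0.0129, n_max=4):
    """SD_t = SD_{t-1} * exp(-a1 * 1 s) + a2 * (N_t / n_max) (Eq. 5.6),
    with N_t normalized by the maximum number of tracked parameters."""
    T = len(next(iter(histories.values())))
    sd, out = 0.0, []
    for t in range(T):
        sd = sd * math.exp(-a1) + a2 * (count_changes(histories, t) / n_max)
        out.append(sd)
    return out

# A slow RCS depressurization only: N_t = 1 once RCS moves >75 psig/2 min
hist = {"RCS pressure": [2200 - 0.8 * t for t in range(600)],
        "SG pressure": [900] * 600, "SG level": [50] * 600,
        "CNMT pressure": [15] * 600}
print(round(system_dynamics(hist)[-1], 4))        # settles near 0.25
```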

Figure 5.4: Normalized number of changing parameters Ni (black) and system dynamics (red) for two simulator Exam sessions: Exam A (EA) for Crew #3 and Exam B (EB) for Crew #6.

5.1.1.4 Cognitive Task Load

CTL is the amount of an operator's cognitive resources required to accomplish the current task. In IDAC and in the OSU NPP experiment, the current task is determined by the tasks in the current procedure step. In the future, a more sophisticated mechanism for tracking an operator's cognitive activities could be developed to improve the CTL measure.

5.1.1.4.1 Background: Calculating CTL

IDAC's CTL algorithm depends on the operator's current task, which is determined by the procedures, specifically the task specified by the operating crew's current step in the AOPs or EOPs.

In IDAC, CTL comprises a decay term from the previous timestep and a new term, CTL_0^i, based on the load from the tasks in the current timestep:

$$CTL_i = CTL_{i-1}\exp(-\alpha\,\Delta t_i) + CTL_0^{\,i} \tag{5.8}$$

The IDAC load is based on the activities required of the operator, summarized in Table 5.4.

To calculate CTL for a simulator scenario, the researcher must determine:

- α, the CTL decay constant. In IDAC, α is determined by setting the CTL to decay to ten percent of its original value in three minutes:
  o α = 0.013, since e^(−0.013 · 180 s) ≈ 0.10
- The types of activities that contribute to CTL, and the cognitive load associated with each task.
- Δt_i: the timestep for evaluating CTL.


Table 5.4: The cognitive tasks and associated cognitive load (Table 6-5 in (Li, 2013)).

Type | Activity | Full Load | Load Addition Δ_i
1 | Attend to one control panel indicator | Read 10 indicators in 10 seconds. Full rate: 1/sec | (1 − e^(−0.013·1))/1 = 0.0129
2 | Interpret one indicator reading | Interpret 10 indicators in 2 seconds. Full rate: 5/sec | (1 − e^(−0.013·1))/5 = 0.0026
3 | Generate a new situational statement | Generate 3 statements in 1 second. Full rate: 3/sec | (1 − e^(−0.013·1))/3 = 0.0043
4 | Match a statement in memory with an investigation item | Match 5 items in 1 second. Full rate: 5/sec | (1 − e^(−0.013·1))/5 = 0.0026
5 | Retrieve a knowledge link | Retrieve 5 knowledge links in 10 seconds. Full rate: 0.5/sec | (1 − e^(−0.013·1))/0.5 = 0.0258 + Adjust for Expertise
6 | Determine an explanation | 5 activities in 1 second. Full rate: 5/sec | (1 − e^(−0.013·1))/5 = 0.0026

5.1.1.4.2 Environmental CTL

With no better basis available for assessing the cognitive load associated with each step in the procedure, the Environmental CTL is calculated using the parameters specified for the IDAC model (Table 5.4).

Cognitive activities: Appendix D lists the cognitive activities associated with each procedure step in the OSU NPP Simulator Experiments.

Δt_i: Unlike PIL and TCL, CTL cannot be evaluated continuously (i.e., updated every second) because the record of operator activities is discrete. Instead, CTL is updated every time the researcher noted a new procedure step while live-coding the simulator session, with Δt_i corresponding to the time between the start of the current procedure step and the start of the previous procedure step.
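As an illustration, the sketch below applies Equation 5.8 with the Table 5.4 load additions to a live-coded sequence of procedure steps. The step timings and activity lists are hypothetical; in this research the activities per step come from Appendix D.

```python
import math

# Load additions from Table 5.4, per activity instance
LOAD = {"attend": 0.0129, "interpret": 0.0026, "generate": 0.0043,
        "match": 0.0026, "retrieve": 0.0258, "explain": 0.0026}

def ctl_series(steps, alpha=0.013):
    """CTL_i = CTL_{i-1} * exp(-alpha * dt_i) + CTL0_i (Eq. 5.8).

    steps: list of (seconds_since_previous_step, activities) pairs,
    where activities lists the Table 5.4 activity types for the step."""
    ctl, out = 0.0, []
    for dt, activities in steps:
        ctl0 = sum(LOAD[a] for a in activities)   # load of the new step
        ctl = ctl * math.exp(-alpha * dt) + ctl0
        out.append(ctl)
    return out

# e.g. three procedure steps live-coded 30-45 s apart (illustrative only)
steps = [(0, ["attend", "interpret"]),
         (45, ["attend", "attend", "interpret", "match"]),
         (30, ["retrieve", "generate", "explain"])]
print([round(c, 4) for c in ctl_series(steps)])
```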

5.1.1.4.3 Alternative Environmental CTL

After reviewing the scenarios, the researchers identified six procedure step characteristics that are expected to contribute to the cognitive load for the operator:

• Information not available on the large overview displays: Both the SRO and the RO can quickly read information on the large overview displays without searching for the information on the main control board, which is more difficult to navigate and read.
• Requires operator judgment: Some procedure steps are straightforward (e.g., is the valve open?), but other steps require operators to make a judgment call (e.g., is the plant trending towards an adverse condition?). These steps are more taxing on the operator.
• Procedure step requires operators to follow 'Response not obtained' directions: The procedure formatting is sometimes tricky, particularly for novice operators. The procedure has two columns: the primary column and the 'response not obtained' column, with instructions for when the left-column conditions are not met. Most of the time, the expected (normal) response sends operators to the next procedure step, but in some steps the expected plant conditions violate the left column and send operators to the 'response not obtained' instructions. This can confuse the operators and cause procedure mis-steps.
• Requires outside knowledge: In many cases, the operator must bring knowledge and experience to the procedure step in order to respond correctly in a timely manner.
• Unfamiliar step: If the step is new to the operator, this increases the cognitive load. Frequently, operators encounter unfamiliar steps if they make a mistake navigating the procedure.


The analysis in Section 5.5 shows that many student operators do not appear to be sensitive to the Environmental CTL; that is, the Reported CTL does not depend on the Environmental CTL. The factors identified above may be a first step towards developing a student operator-centric Environmental CTL measure that captures the factors that influence student-reported CTL.

5.1.1.5 Stress

Stress in IDAC is the mean of the four parent PSFs:

$$S = \frac{1}{4}\left(TCL + PIL + TC + CTL\right) \tag{5.9}$$

This is easily calculated for the stand-alone PSFs. The model results (Section 5.5) suggest alternative weights for the four factors, rather than assigning them each an equal part in the reported Stress.

5.1.2 Reported PSFs

Chapter 2 describes the Reported PSF data collection process. In addition to the five Major PSFs (TCL, PIL, TC, CTL and Stress), the operators report Confusion, which is used as an input to the Environmental TC calculation.

The PSFs are reported for each Observation; that is, each set of PSFs collected from an operator during a simulator session. PSFs are collected at key scenario times and at the researcher's discretion, as described in Chapter 2. Invariant PSFs are removed from the dataset, as are the PSFs reported by the operator who participated in only one simulator session. The result is a set of 46 observations, each consisting of ten to twelve PSF evaluations. The 46 observations are collected from fourteen unique operators over four sessions: two practice sessions (PA and PB) and two exam sessions (EA and EB). PSFs are reported on a scale of 0-10 and rescaled to 0-1 to match the Environmental PSFs.

Table 5.5 reports the data from the first two observations in the dataset. The subscript i indicates the index of the observation, and the subscript j is the j-th PSF evaluation in observation i. The Reported PSFs y and the Environmental PSFs x are identified by the PSF subscript p, numbered 1-5:

1. TCL

2. PIL

3. TC

4. CTL

5. Stress

Each observation is characterized by seven observation factors: Op, Tm, S, C, R, E, and TA. Op_i is the ID number of the operator who reported the data in the observation, and Tm_i is Op_i's Team ID number. S, C, R, and E are binary descriptors: S_i = 1 if observation i is an SGTR and 0 if a LOCA; C_i = 0 if the accident is a single-fault accident and 1 if the accident has multiple faults (C ~ Complex); R_i = 0 if Op_i is in the RO role and 1 if the SRO; and E_i = 1 in Exam observations and 0 in practice observations. TA_i is the Team Atmosphere reported by Op_i. TA is evaluated once by each operator; the same TA is used for all four of the operator's observations. The Reported data include the five reported PSFs and operator-reported Confusion, Cnf_ij.


Table 5.5: Data used to populate the PSF model. Data for observations 1 and 2 are shown.

             Observation Factors              | Reported PSFs y_pij      | Environmental PSFs x_pij
i j Op_i Tm_i S_i C_i R_i E_i TA_i Cnf_ij | y1ij y2ij y3ij y4ij y5ij | x1ij x2ij x3ij x4ij x5ij

1 1 1 6 1 1 1 1 0.8 0.0 0.0 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.003 0.001

1 2 1 6 1 1 1 1 0.8 0 0.1 0.1 0.1 0.2 0.1 0.084 0.211 0.0 0.024 0.073

1 3 1 6 1 1 1 1 0.8 0 0.1 0.1 0.1 0.2 0.1 0.59 0.996 0.048 0.164 0.354

1 4 1 6 1 1 1 1 0.8 0.1 0.2 0.1 0.2 0.3 0.2 0.8 0.533 0.454 0.431 0.407

1 5 1 6 1 1 1 1 0.8 0.2 0.1 0.2 0.4 0.3 0.1 0.651 1.0 0.505 0.275 0.405

1 6 1 6 1 1 1 1 0.8 0.5 0.2 0.1 0.6 0.3 0.1 0.982 0.915 0.854 0.237 0.492

1 7 1 6 1 1 1 1 0.8 0.5 0.2 0.1 0.7 0.3 0.2 0.745 0.982 0.914 0.349 0.403

1 8 1 6 1 1 1 1 0.8 0.4 0.2 0.1 0.6 0.2 0.2 0.582 0.497 0.694 0.242 0.244

1 9 1 6 1 1 1 1 0.8 0.5 0.1 0.1 0.6 0.4 0.2 0.617 0.516 0.689 0.202 0.283

1 10 1 6 1 1 1 1 0.8 0.6 0.2 0.1 0.7 0.4 0.3 0.769 0.652 0.826 0.128 0.305

1 11 1 6 1 1 1 1 0.8 0.6 0.1 0.1 0.7 0.4 0.2 0.737 0.007 0.823 0.198 0.278

2 1 13 3 1 1 1 1 0.6 0.1 0.0 0.0 0.0 0.0 0.3 0.0 0.0 0.08 0.011 0.005

2 2 13 3 1 1 1 1 0.6 0.3 0.3 0.7 0.2 0.4 0.4 0.298 0.614 0.241 0.034 0.19

2 3 13 3 1 1 1 1 0.6 0.4 0.5 0.8 0.6 0.6 0.5 0.73 0.334 0.69 0.388 0.301

2 4 13 3 1 1 1 1 0.6 0.5 0.6 0.9 0.6 0.6 0.6 0.847 0.361 0.88 0.379 0.432

2 5 13 3 1 1 1 1 0.6 0.4 0.5 0.7 0.5 0.5 0.5 0.698 1.0 0.709 0.223 0.418

2 6 13 3 1 1 1 1 0.6 0.5 0.5 0.6 0.5 0.6 0.5 0.86 1.0 0.72 0.156 0.522

2 7 13 3 1 1 1 1 0.6 0.6 0.4 0.4 0.5 0.6 0.5 0.572 0.669 0.813 0.175 0.23

2 8 13 3 1 1 1 1 0.6 0.9 0.6 0.4 0.7 0.9 0.7 0.236 0.998 1.0 0.213 0.27

2 9 13 3 1 1 1 1 0.6 0.7 0.5 0.3 0.7 0.8 0.6 0.012 0.0 0.807 0.189 0.106

2 10 13 3 1 1 1 1 0.6 0.7 0.4 0.3 0.7 0.9 0.7 0.001 0.275 0.607 0.149 0.086

2 11 13 3 1 1 1 1 0.6 0.7 0.3 0.2 0.7 1.0 0.7 0.118 0.72 0.561 0.077 0.147

5.2 INTRODUCTION TO CONTINUOUS BAYESIAN NETWORKS

Before explaining the model, we review the fundamentals of Bayesian analysis. Bayesian analysis is based on Bayes theorem (Gelman, et al., Bayesian Inference, 2014), which begins with a prior probability, p(B), then adds new information (A) to that prior probability to calculate a posterior probability, p(B|A). The posterior reflects a new understanding of the system (B) given the new information added (A). Bayes theorem predicts p(B|A) from p(A), p(B), and p(A|B):

$$p(B|A) = \frac{p(A|B)\,p(B)}{p(A)} \tag{5.10}$$

Bayes theorem is based on the axiom of conditional probability, which states that the conditional probability of A given B is equal to the joint probability of A and B divided by the probability of B. This relationship can be re-written as in Equation 5.11, which leads directly to Equation 5.10:

$$p(AB) = p(A|B)\,p(B) = p(B|A)\,p(A) \tag{5.11}$$

The objective of this research is to determine the relationship between the reported PSFs (Y) and the conditions that led to the reported PSFs (X). More specifically, we specify a proposed model structure that links X and Y, then use Bayes theorem to estimate the parameters in the model.

Translating this objective to the Bayesian framework, in Equation 5.10 A represents the observed data X and Y, and B represents the parameters in the model linking Y and X. Prior to this research, little is known about B, so p(B) is a weakly informative distribution, with all values approximately equally likely. The denominator, p(A), is unknown but constant (it does not vary with B), which means that p(B|A) is proportional to the numerator, p(A|B)p(B).

Everything that follows below expands on this basic idea.


5.2.1.1 Aside: Bayesians and Frequentists

Bayesian Networks (BNs) are sometimes called Bayesian Belief Networks, reflecting the Bayesians' interpretation of this approach. In their framework, prior information represents a prior belief about the system or data, and the posterior updates our beliefs about the system based on new information. With this interpretation, analysts freely incorporate common sense or general knowledge into their models. Frequentists, on the other hand, restrict their models to information that can be observed (using the observed frequency of a given condition as the prior probability).

This research takes a pragmatic approach, relying as little as possible on prior beliefs and expert judgment. Unknown model parameters are assigned weakly informative prior distributions, reflecting our lack of knowledge about these variables.

5.2.1.2 Continuous Bayesian Networks

All of the Bayesian Networks discussed in Chapter 1 are discrete networks: PSFs are in discrete states (present/absent), and conditional probability tables are populated for each combination of node states within the BN.

In contrast, this research uses continuous BNs. The PSFs are reported on a scale of zero to ten; these data are normalized and treated as continuous variables ranging from zero to one. This approximation simplifies the analysis in several important ways:

First, the data do not have to be binned into low/medium/high categories to accommodate the requirement that every node state in a discrete BN must be quantified.


Second, building continuous BNs generates probability distributions for all of the unknown effects, rather than conditional probability tables for a finite set of artificially distinct states.

5.2.1.3 Example of a Continuous Linear Hierarchical Bayesian Model

The model proposed in this research is a hierarchical normal linear model with random effects. Before introducing the full model, this section explains the components of the proposed model with a simplified example.

In this example, assume three operators reported one PSF, y, modeled by one Environmental PSF, x. Table 5.6 is a sample data table that mimics the full data set used to build the models in Section 5.4. Note that the operator number does not necessarily correspond to the Observation number; each operator completes multiple Observations.

Table 5.6: Sample data for the simplified illustration BN.

Observation i | Evaluation j | y_ij | x_ij | Operator
      1       |      1       | 0.0  | 0.0  |  Op.1
      1       |      2       | 0.1  | 0.1  |  Op.1
      1       |      3       | 0.5  | 0.4  |  Op.1
      2       |      1       | 0.1  | 0.0  |  Op.3
      2       |      2       | 0.1  | 0.1  |  Op.3
      2       |      3       | 0.2  | 0.3  |  Op.3
      3       |      1       | 0.0  | 0.0  |  Op.2
      3       |      2       | 0.4  | 0.2  |  Op.2

A simple starting model is a linear regression that estimates coefficients for each predictor:

$$y_{ij} = b\,x_{ij} + a_1\,Op.1_i + a_2\,Op.2_i + a_3\,Op.3_i + \epsilon_{ij} \tag{5.12}$$


where Op.1_i = 1 if Op.1 reported y_ij and Op.1_i = 0 otherwise, etc. The coefficient b is the effect of the Environmental PSF x_ij; the variables a_1, a_2, and a_3 are operator-specific intercepts, and ε_ij is the random error factor. This can be re-written as

$$\mu_{ij} = a_{op.i} + b\,x_{ij} \tag{5.13}$$

where μ_ij is the expected value of y_ij, and a_op.i is the coefficient for the operator in Observation i. In linear regression, the errors ε_ij are assumed to be independent and normally distributed around μ_ij. Instead of using traditional linear regression, a Bayesian approach is used to specify probability density functions for all the unknown parameters in the model. Replacing the measurement error ε with an unknown variance σ², we can rewrite the above equations as the conditional probability density of y_ij:

$$p(y_{ij}|a_{op.i}, b, x_{ij}, \sigma^2) \sim N(\mu_{ij}, \sigma^2) \tag{5.14}$$

In this example, and in the full model proposed in Section 5.3, we assume that the conditional probability of y_ij is normally distributed with some unknown variance, σ². We could use a different distribution if we believed another distribution to be more appropriate; however, the Gaussian distribution is the distribution of maximum entropy for the class of distributions defined by a mean and variance, appropriate when attempting to avoid special cases in a model (Cover & Thomas, 1991).

Figure 5.5 illustrates the model components. Squares represent observed data and circles represent parameters that are estimated in the model.


Figure 5.5: Basic Model with all parameters. Observed data are in boxes; estimated parameters are in circles.

Next, we specify weakly informative prior distributions for the unknown model parameters: normal distributions for a_op and b; uniform for σ:

$$p(a_{op}) \sim N(0.0, 2.5^2), \qquad p(b) \sim N(1.0, 2.5^2), \qquad p(\sigma) \sim U(0, 100)$$

The parameters in the prior distributions are based on the idea that if the Environmental PSF perfectly explained the Reported PSF, the expected values of a_op and b would be 0 and 1 respectively. The prior distributions are given large variance parameters because our belief in these initial values is low. The choice of prior distribution and the justification for these parameters are discussed in greater detail in Section 5.2.1.6.

The normal linear model makes several assumptions, all of which can be tested by the model results:

• Linearity of the expected value, E(y_ij | x_ij, θ), with respect to the explanatory variables x_ij
• Normality of the error terms, y_ij − μ_ij
• Independent observations with equal variance (assumes σ_ij² = σ² for all observations)
  o Often unequal variance indicates the need for additional explanatory variables

The Bayesian approach described above will yield the same results as a linear regression.

The two methods diverge when Random Effects are added to the model.

5.2.1.4 Random Effects

The model illustrated in Figure 5.5 is a fixed effects model; each parameter in the model is specified directly. Adding random effects allows the model to account for known sources of variation. For example, it is reasonable to assume that the operator coefficients a_op are related; factors that impact one operator's reporting are likely to impact another operator as well. Instead of estimating each a_op separately using non-informative prior distributions, a random effects model captures both the common factors that influence all the operators and the individual variations between operators.

This is done by assuming the operator coefficients are conditionally independent and normally distributed around some unknown mean operator effect, μ_a, with unknown variance σ_a²:

$$p(a_{op}|\mu_a, \sigma_a^2) \sim N(\mu_a, \sigma_a^2) \tag{5.15}$$

In this model, μ_a is the expected operator effect for a random operator, and a_op is the coefficient for a specific operator. The assumption that the a_op are conditionally independent implies that all of the shared effects are captured in μ_a, and all of an operator's unique tendencies are captured in the discrepancy between a_op and μ_a; σ_a is a measure of the variation between operators.

As before, we assign weakly informative priors for 휇푎 and 휎푎:

푝(휇 ) ~ 푁(1.0,2.5) { 푝 푝(휎) ~ 푈(0,100)

The set of unknown parameters now includes b, a_op for each operator, μ_a, σ, and σ_a. For simplicity, let θ represent the set of all unknown model parameters. Figure 5.6 shows the full hierarchical model with random effects.

Figure 5.6: Hierarchical Model

With all of these relationships specified, and with the observed data x and y, it is now possible to estimate best fit values for 휃.

From Bayes theorem:


$$p(\theta_{ij}|y_{ij}, x_{ij}) = \frac{p(y_{ij}|x_{ij}, \theta_{ij})\,p(\theta_{ij})}{p(y_{ij}|x_{ij})} \tag{5.16}$$

The denominator, p(y_ij|x_ij), is unknown but in a sense fixed: p(y_ij|x_ij) does not change for different values of θ, meaning

$$p(\theta_{ij}|y_{ij}, x_{ij}) \propto p(y_{ij}|x_{ij}, \theta_{ij})\,p(\theta_{ij}) \tag{5.17}$$

This is referred to as the unnormalized posterior density. The posterior distribution for θ_ij is proportional to the unnormalized posterior density, allowing us to sample from the posterior distribution without requiring a numerical value for p(y_ij|x_ij). Equation 5.17 is expanded to

$$p(y_{ij}|x_{ij}, \theta_{ij})\,p(\theta_{ij}) = p(y_{ij}|x_{ij}, \mu_{ij}, \sigma^2)\,p(\sigma^2)\,p(a_{op}|\mu_a, \sigma_a^2)\,p(\mu_a)\,p(\sigma_a^2)\,p(b) \tag{5.18}$$

Every distribution on the right-hand side of Equation 5.18 is known: y_ij and a_op follow the conditionally normal distributions in Equations 5.14 and 5.15; the other parameters are assigned the weakly informative priors discussed above.

Finally, if y is the set of all y_ij, the probability of observing y given the set of model parameters θ is obtained by multiplying the probabilities of each observation y_ij:

$$p(y|x, \theta) = \prod_i \prod_j p(y_{ij}|x_{ij}, \theta_{ij}) \tag{5.19}$$

and

$$p(\theta|y, x) \propto \prod_i \prod_j p(y_{ij}|x_{ij}, \theta_{ij})\,p(\theta_{ij}) \tag{5.20}$$


The parameters in θ can be estimated by maximizing p(θ|y, x). In practice, most hierarchical models are too complex to solve analytically. Instead, the Gibbs Sampling algorithm is used to estimate values for θ.

5.2.1.5 Gibbs Sampler

The Gibbs Sampler is a Markov Chain Monte Carlo (MCMC) algorithm that iteratively samples possible values for θ until the model ceases to change with each successive sampling. Specifically, the Gibbs Sampler assigns values to all but one parameter, estimates the best value for that parameter given all the other parameter values, then uses the updated parameter value to select the best value for the next parameter. This process continues until the model has converged.

For example, if there were three operators in the experiment (there are actually 14 operators in the OSU dataset), 휃 = {푏, 휎, 휇푎, 휎푎, 푎1, 푎2, 푎3}. The Gibbs sampling process is then:

1. Set initial values, θ^0 = {b^0, σ^0, μ_a^0, σ_a^0, a_1^0, a_2^0, a_3^0}
2. Choose θ^1:
   a. Sample b^1 from p(b | y, x, σ^0, μ_a^0, σ_a^0, a_1^0, a_2^0, a_3^0) (Note: in some cases the best value for b^1 = b^0)
   b. Sample σ^1 from p(σ | b^1, y, x, μ_a^0, σ_a^0, a_1^0, a_2^0, a_3^0)
   c. Sample μ_a^1 from p(μ_a | b^1, σ^1, y, x, σ_a^0, a_1^0, a_2^0, a_3^0)
   d. Sample σ_a^1 from p(σ_a | b^1, σ^1, μ_a^1, y, x, a_1^0, a_2^0, a_3^0)
   e. Sample a_1^1 from p(a_1 | b^1, σ^1, μ_a^1, σ_a^1, y, x, a_2^0, a_3^0)
   f. Sample a_2^1 from p(a_2 | b^1, σ^1, μ_a^1, σ_a^1, a_1^1, y, x, a_3^0)
   g. Sample a_3^1 from p(a_3 | b^1, σ^1, μ_a^1, σ_a^1, a_1^1, a_2^1, y, x)
3. Sample θ^2:
   a. Sample b^2 from p(b | y, x, σ^1, μ_a^1, σ_a^1, a_1^1, a_2^1, a_3^1)
   b. … (and so on)
4. Continue until p(θ^n | x, y) ≅ p(θ^(n+1) | x, y)


Once the model has stabilized, numerical posterior distributions for each parameter can be obtained by continuing to sample new values of θ^i: sample p(b | y, x, σ, μ_a, σ_a, a_1, a_2, a_3), sample p(σ | y, x, b, μ_a, σ_a, a_1, a_2, a_3), etc.
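The sketch below implements this scheme numerically for the simplified example model, using the Table 5.6 data and the weakly informative priors above. It is an illustration only, not the rjags implementation used in this research: the conjugate normal full conditionals are standard results, and sampling σ² from its inverse-gamma conditional under a flat prior on σ ignores the U(0,100) truncation, which is essentially never binding for these data.

```python
import numpy as np

rng = np.random.default_rng(42)

# The Table 5.6 data: y, x, and the operator index for each evaluation
y = np.array([0.0, 0.1, 0.5, 0.1, 0.1, 0.2, 0.0, 0.4])
x = np.array([0.0, 0.1, 0.4, 0.0, 0.1, 0.3, 0.0, 0.2])
op = np.array([0, 0, 0, 2, 2, 2, 1, 1])   # Op.1, Op.3, Op.2 as indices 0..2
K, n = 3, len(y)

def draw_normal(prior_mean, prior_var, lik_sum, lik_prec):
    """Sample from a conjugate normal full conditional."""
    prec = 1.0 / prior_var + lik_prec
    mean = (prior_mean / prior_var + lik_sum) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))

def draw_var(ss, m):
    """sigma^2 | rest ~ Inv-Gamma((m-1)/2, ss/2) under a flat prior on sigma."""
    return 1.0 / rng.gamma((m - 1) / 2.0, 2.0 / ss)

b, mu_a, sig2, sig2_a = 1.0, 0.0, 1.0, 1.0      # theta^0
a = np.zeros(K)
keep = []
for it in range(6000):
    resid = y - a[op]
    b = draw_normal(1.0, 2.5**2, x @ resid / sig2, x @ x / sig2)
    for k in range(K):                           # one a_k at a time
        m = op == k
        a[k] = draw_normal(mu_a, sig2_a,
                           np.sum(y[m] - b * x[m]) / sig2, m.sum() / sig2)
    mu_a = draw_normal(1.0, 2.5**2, a.sum() / sig2_a, K / sig2_a)
    sig2 = draw_var(np.sum((y - a[op] - b * x) ** 2), n)
    sig2_a = draw_var(np.sum((a - mu_a) ** 2), K)
    if it >= 1000:                               # discard burn-in
        keep.append(b)

print("posterior mean of b:", round(float(np.mean(keep)), 2))
```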

In hierarchical models, Gibbs Sampling is often done in batches. Given the observed data and holding the other parameters constant, a ‘batch’ of coefficients is estimated simultaneously. For example, all of the a coefficients can be estimated at once if all the other parameters in the model are known (Gelman, et al., 2014).

The Gibbs Sampler is implemented using the R package 'rjags' (Plummer, Stukalov, & Denwood, 2016).

5.2.1.6 Selecting Prior Distributions for Parameters in Hierarchical Models

Following the approach outlined in (Gelman, et al., 2014), we select weakly informative prior distributions for the model coefficients. Gelman writes,

"Rather than trying to model complete ignorance, we prefer in most problems to use weakly informative prior distributions that include a small amount of real-world information, enough to ensure that the posterior distribution makes sense... In the general problem of estimating a normal mean, a N(0, A²) prior is weakly informative, with A set to some large value that depends on the context of the problem." (p. 55-56)

The priors are selected by first proposing an informative prior, then broadening the informative prior to reduce the certainty associated with our prior knowledge. The informative prior is chosen based on the Environmental PSF model and our expectations of the model coefficients.

For example, if the Environmental PSFs perfectly captured the factors that influence reported PSFs, we would expect μ_a = 1; therefore, a natural prior distribution for μ_a is a normal distribution centered around 1.0 with a small variance, perhaps a standard deviation of 0.25. Choosing the informative prior distribution p(μ_a) ~ N(1.0, 0.25²) would put the prior 95% CI for μ_a between 0.5 and 1.5. The informative prior is illustrated by the thick red line in Figure 5.7. Our confidence in our prior knowledge is low; we do not expect μ_a = 1. To decrease the information in the prior distribution, we increase the standard deviation by an order of magnitude, resulting in the weakly informative prior with mean 1.0 and variance 6.25:

$$p(\mu_a) \sim N(1.0, 2.5^2) \tag{5.21}$$

The weakly informative prior is the thin black line in Figure 5.7. With the weakly informative prior, the expected value of μ_a remains 1.0, but the probability of randomly sampling a value from the prior distribution that falls between 0.5 and 1.5 is only 0.33.

Figure 5.7: An informative prior (red) and the weakly informative prior created by broadening the informative prior (black).

A similar thought process is used to select weakly informative priors for all the coefficients in the ABC model described below:

1. Identify a reasonable value as the mean.
2. Select an informative parameter standard deviation based on the expected range of the parameter.
3. Multiply the informative standard deviation by ten to reduce the information in the prior distribution.

The chosen priors are listed in Table 5.11.

5.2.1.6.1 Prior Distributions for Variance Parameters

A variety of methods have been proposed for choosing noninformative or weakly informative prior distributions for variance parameters in hierarchical models.

Frequently, the inverse-gamma distribution with equal shape and scale parameters (Inv-Gamma(ε, ε)) is used as the prior for Gaussian hierarchical models. This distribution is selected because the inverse-gamma is a conjugate prior to the normal distribution, and because using equal shape and scale parameters yields a fairly flat distribution. However, Gelman does not recommend this approach, because if the variance parameter is small (close to zero), estimates will be sensitive to the parameters used to characterize the inverse-gamma distribution. Instead, Gelman recommends using a uniform distribution with a wide range, such as U(0,100), with the caveat that this approach tends to overestimate variance parameters if the sample size is small, i.e., less than five (Gelman, 2006). In this research, variance parameters are assigned a uniform prior.

5.3 BUILDING THE SUBJECTIVE PSF MODEL

The approach to building the Subjective PSF model is to start with a basic, naïve model, then to iteratively add new information (Figure 5.8).

Figure 5.8: The ABC model building and selection process.

The basic model for PSF p states that the expected value of the operator-reported PSF can be explained by the Environmental PSF:

$$E[y_{pij}] = \beta_0 + \beta_1 x_{pij} \tag{5.22}$$

In Equation 5.22, y_pij is the j-th reported value in observation i of PSF p, x_pij is the Environmental PSF p, and β_0 and β_1 are unknown coefficients.

The new information added to the model is based on the results in Chapters 3 and 4. Chapter 3 shows that the variation between operator-reported PSFs is significant. This suggests that operator sensitivity to the Environmental PSF varies from operator to operator, and that contextual effects not captured by the Environmental PSF impact operator PSF reporting. Chapter 4 indicates that simulator bias will impact PSF reporting.

The proposed model structure is referred to as the ABC model, where A represents the operator's response to the Environmental PSF, B represents simulator Bias, and C represents additional Context effects.16

16 A, not E, is used to represent Environmental Bias because E and e are so commonly associated with error, especially in linear regression and in HRA.


The model is evaluated each time new information is added to determine whether the new information is useful. Criteria for model evaluation are described in the next section. After the final model is selected, a cross validation exercise tests the model results, and sensitivity tests check the model sensitivity to prior distributions.

5.3.1 Evaluating Model Results

A new metric, K, is proposed to evaluate model results. K is based on Cohen's Kappa, an interrater reliability measure that accounts for the probability of random agreement between raters (McHugh, 2012). In this case, the first rater is the OSU student and the second rater is the Posterior PSF model: the two agree if the model's 95 percent credibility interval (CI) includes the reported PSF value.

Cohen's Kappa is determined by p.o, the observed rate of agreement, and p.e, the rate of agreement obtained by chance:

$$K = \frac{p.o - p.e}{1 - p.e} \tag{5.23}$$

p.o is the proportion of reported PSF data that fall inside the model CI. If J is the total number of PSF evaluations in the dataset,

$$p.o = \frac{1}{J}\left(\#\ \text{of}\ y_{pij}\ \text{inside the 95\% CI for}\ \mu_{pij}\right) \tag{5.24}$$

p.e is the proportion of the PSF range covered by the CI. Since the PSF range is [0,1], this is simply the average width of the CI:

$$p.e = \frac{1}{J}\sum_{j=1}^{J}\left(\mu_{pij}[97.5\%] - \mu_{pij}[2.5\%]\right) \tag{5.25}$$


If p.o = 0.9 and the model CI is narrow (say, p.e = 0.2), then K = 0.7/0.8 = 0.875, indicating that the model is almost ninety percent better than chance at estimating the reported PSF. However, if the model CI is wide (p.e = 0.8), the results are less impressive, as seen in the decrease in K: K = 0.1/0.2 = 0.5. If the model performs worse than chance, K is negative.
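A minimal sketch of the K computation (Equations 5.23-5.25), assuming a vector of reported PSF values and the matching posterior CI bounds; the toy data below reproduce the p.o = 0.9, p.e = 0.2 case worked above.

```python
import numpy as np

def kappa(y, ci_low, ci_high):
    """K = (p.o - p.e) / (1 - p.e), Equations 5.23-5.25."""
    y, lo, hi = (np.asarray(v) for v in (y, ci_low, ci_high))
    p_o = float(np.mean((y >= lo) & (y <= hi)))   # agreement rate (Eq. 5.24)
    p_e = float(np.mean(hi - lo))                 # mean CI width (Eq. 5.25)
    return (p_o - p_e) / (1 - p_e), p_o, p_e

y = np.linspace(0.1, 0.9, 10)
lo, hi = y - 0.1, y + 0.1          # CI width 0.2 everywhere -> p.e = 0.2
y_obs = y.copy()
y_obs[0] += 0.15                   # one value outside its CI -> p.o = 0.9
k, p_o, p_e = kappa(y_obs, lo, hi)
print(round(k, 3), p_o, p_e)       # 0.875 0.9 0.2
```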

A special case is when p.o = 1.0. If p.o = 1, K = 1 regardless of p.e. A wide p.e increases the likelihood of every reported value falling inside the CI, and therefore the likelihood of K = 1 results. For this reason, we report both K and p.e; when K = 1, p.e is often wide (greater than 0.6). One solution to this concern is to narrow the CI from 95% to perhaps 75% or 50%, decreasing the likelihood of all the observations falling within the CI.

K is selected as the fit measure for this work because it is intuitively meaningful, and it can be calculated for any subsection of the data, such as an observation, a PSF, a set of scenarios, or a specific operator. Alternative Bayesian fit measures such as the Deviance Information Criterion (DIC) (Spiegelhalter, Best, Carlin, & van der Linde, 2002) are useful for comparing models but not for communicating how well a single model performs. DIC is always relative; there is no standard criterion for a "good" DIC value.17

K can be adjusted to suit the researcher's aims. We use the 95% CI to define p.o and p.e, but this range could be narrowed if desired. The result is a metric that informs the user of how well their model outperforms chance. The metric can be tailored to specific model acceptance criteria. In this research, our aim is to produce K.fit > 0.7 for a 95% CI.

17 The DIC is the model deviance penalized by a model complexity factor. Lower DIC generally corresponds to a ‘better’ model, but in some instances a more complex model is preferred.


Section 5.4.2 reports K.fit and K.predict, corresponding to how well the model matches the data used to fit the model and how well the model predicts new data.

5.3.1.1 Validating the Model: Build and Test Data, Cross Validation

Before beginning the model analysis, the data are split into a Build data set and a Test data set. Two Test observations were randomly selected from the set of seven operators who reported PSFs in all four sessions. This ensured that there would be enough data from the two test operators in the Build data to estimate operator sensitivity factors for the data in the Test data set.

After the final model was selected, a cross validation exercise was conducted to test the stability of the final model. In the cross validation exercise, five new datasets are constructed. These cross validation datasets are subsets of the full data set. In each cross validation data set, one fifth of the observations are removed. The final model structure is then evaluated five times, once with each cross validation dataset. The resulting parameter estimates are used to examine:

• Consistency of the estimated model parameters between fits

• K.predict in the removed data

This cross validation exercise demonstrates that the model is consistent over five variations of the dataset and does not vary significantly if different data are used to build the model.


5.3.2 Data Transformation

The PSF data are restricted to [0,1], but the model assumes y_pij is normally distributed. The logit function is used to change the range of the data from [0,1] to (−∞, ∞):

$$logit(x_0) = \ln\left(\frac{x_0}{1 - x_0}\right) \tag{5.26}$$

The logit is not defined for zero or one. Therefore, the OSU data are modified by adding 0.1 and dividing by 1.2 before taking the logit transform:

$$x_0 = \frac{x + 0.1}{1.2} \tag{5.27}$$

This cushion is based on the assumption that operators are not experiencing extreme PSF values in the simulator environment. We tested adding 0.01 or 0.001 instead of 0.1; this skewed the data and resulted in odd posterior distributions, so we settled on a 0.1 cushion. Figure 5.9 illustrates the data transformation: the gray line is the original data, the black line is the renormalized data, and the red line is the transformed data used in the Bayesian models.

Bayesian models.


Figure 5.9: Data transformation. Original data (gray), cushion added to change the range from [0,1] to (0,1) (black), and logit transform (red).

This transformation allows the normal distribution to be used in the model. An alternative approach would be to use a beta distribution or other [0,1] distribution.
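For reference, a small sketch of the transform pair, assuming the 0.1/1.2 cushion of Equations 5.26-5.27; the inverse (used here to map model-scale values back to the PSF scale) is the straightforward algebraic inversion and is not spelled out in the text.

```python
import math

def to_model_scale(x):
    """Cushion (Eq. 5.27) then logit (Eq. 5.26): [0, 1] -> (-inf, inf)."""
    x0 = (x + 0.1) / 1.2
    return math.log(x0 / (1 - x0))

def to_psf_scale(z):
    """Inverse transform: model scale back to the reported [0, 1] scale."""
    x0 = 1 / (1 + math.exp(-z))
    return x0 * 1.2 - 0.1

for v in (0.0, 0.5, 1.0):
    z = to_model_scale(v)
    print(v, round(z, 3), round(to_psf_scale(z), 3))  # round-trips exactly
```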

5.4 MODEL DESCRIPTION: THE ABC MODEL

The proposed PSF BN builds on the HERA PSF Bayesian Network introduced by Sundaramurthi and Smidts (2013) that is discussed in Chapter 1. Our approach improves on the HERA PSF BN in the following ways:

• Operator-specific effects. The OSU PSF BN accounts for variability between operators without having to specify details about the operator.
• Dynamic PSFs. In the HERA data, each accident is recorded as one data point; in the OSU data, at least ten separate observations are recorded as the accident develops, resulting in a model that tracks the PSF throughout the scenario.
• PSF strength. In the HERA data, PSFs are binary, either 'present' or 'absent' in an event. In the OSU data, PSFs are rated on a 0-to-10 scale, providing more insight into the strength of the PSF.
• Simplified PSF network. The revised PSF model is based on the simplified PSF network from later versions of IDAC (Li, 2013).
• Success and failure data. Ninety-eight percent of the events in the HERA database involved human error, precluding the ability to contrast PSFs in successful events with those in failures. The OSU dataset is a mix of successes and failures.
• Operator-reported data. The HERA PSF BN is based on incident reports, written after the fact. Although the report authors interview the operators, the operators' direct feedback is not included in the dataset. The OSU data set comprises operators' Reported PSF values.
• Probability distributions. The OSU PSF BN replaces conditional probability tables with continuous probability distributions.

The proposed model has three components: the effect of the Environmental PSF (A), the simulator bias (B), and the situational context (C). Details and variations in modeling A, B and C are discussed below.

5.4.1.1 A, Effect of the Environmental PSF

The first component of the ABC model is A, the effect of the Environmental PSF. The Environmental PSF represents the expected PSF level for a completely rational, perfectly calibrated operator who understands the implications of all the plant indicators. But this is a simplification. As discussed in Section 5.1.2, the perceived PSF is not equivalent to the Environmental PSF value. One source of discrepancy between the Environmental PSF and the perceived PSF is the operator's sensitivity to the plant conditions. Some operators may be well aware of their environment and attuned to their surroundings, while others (especially student operators) are focused on working through the procedures.

Table 5.7 lists the various models of the Environmental Effect that were considered in this research. The first model, A.1, is a basic model that treats the expected value of the reported PSF as a linear function of an intercept (a_p0) and a coefficient (a_p1). Models A.2 – A.4 account for the variation between operators by estimating operator-specific sensitivity coefficients for each parent PSF, a_pi:

$$A_{pij} = a_{pi}\,x_{pij} \tag{5.28}$$

Table 5.7: ABC Model - A Variations (μ_pij = A_pij)

Name | Description | A_pij | Parameter Prior Distribution
A.1 | Base Model | A_pij = a_p0 + a_p1 x_pij | a_p1 ~ N(1.0, 2.5); a_p0 ~ N(0.0, 2.5)
A.2 | Separate operator sensitivity coefficients for each operator | A_pij = a_p0 + a_pi x_pij | a_pi ~ N(1.0, 2.5), estimated for each operator
A.3 | Grouped operator sensitivity coefficients (5 groups) | A_pij = a_p0 + a_pi x_pij | a_pi ~ N(1.0, 2.5), estimated for each operator group
A.4 | Add mean operator sensitivity (five groups vary around mean) | A_pij = a_p0 + a_pi x_pij | a_pi ~ N(μ_ap, σ_a²); μ_ap ~ N(1.0, 2.5); σ_a² ~ U(0, 100)


In Model A.2, a unique coefficient is estimated for each operator. In Model A.3, the operators are grouped into five sensitivity groups based on the results of Model A.2 (see 'Sorting Operators by Sensitivity' below). In Models A.2 and A.3, the a_pi are drawn from the weakly informative distribution N(1.0, 2.5). Model A.4 introduces the hierarchical level from the example model (Figure 5.6): operator sensitivity factors for each group are assumed to be distributed normally around some mean value, μ_ap, which is the expected sensitivity of a random operator to PSF p:

$$a_{pi} \sim N(\mu_{ap}, \sigma_a^2) \tag{5.29}$$

The parameters μ_ap and σ_a are assigned weakly informative prior distributions as discussed in Section 5.2.1.6. A.4 is the Environmental PSF effect used in the final model.

5.4.1.1.1 Sorting Operators by Sensitivity

Initial models estimated separate operator sensitivity coefficients for each operator. We found that this was too complex; most of the operator sensitivity coefficients were not significant in the final model. Instead, we decided to group the operators into five categories for each PSF.

This was done using the results from A.2 (see Table 5.12 in Section 5.5), which estimate E[y_pij] = a_pi x_pij + g_p. In A.2, a_pi is the operator sensitivity coefficient for the operator who reported the PSFs in observation i, and g_p is a PSF-specific intercept.

Operator sensitivity coefficients are sorted into five sensitivity groups: the three with the lowest a_pi are assigned to group 1, the next three lowest a_pi to group 2, and so on. Figure 5.10 plots the average a_pi for each group by PSF.

Figure 5.10: Average operator sensitivity coefficient a_pi in each group.

The operator sensitivity groups are listed in Table 5.8. We find that operator sensitivity is not consistent between PSFs; Operator 2, for example, is in a different sensitivity group for each PSF. The row labeled 'n.groups' in Table 5.8 lists the number of distinct sensitivity groups each operator appears in. Eight operators are in three different groups, four are in two groups, and two are in four groups. No operator is in the same group for all four PSFs.


Table 5.8: Operator sensitivity groups.

Operator Number: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
TCL:             1 4 5 1 5 2 4 2 1 2  5  3  4  3
PIL:             4 3 4 2 4 1 5 2 2 3  1  5  5  1
TC:              4 5 2 1 4 1 5 5 1 2  3  4  2  3
CTL:             4 1 1 2 4 5 2 1 5 3  3  5  2  4
n.groups:        2 4 4 2 2 3 3 3 3 2  3  3  3  3

This indicates that an operator cannot be categorized as 'sensitive to plant conditions' or 'insensitive to plant conditions'; instead, sensitivity depends on the PSF.

In the final model (C.8), operator sensitivity coefficients are estimated for each operator group rather than individual operators.
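The grouping step itself is mechanical; a sketch, assuming a vector of fitted A.2 coefficients for one PSF (the coefficient values below are hypothetical):

```python
import numpy as np

def sensitivity_groups(a_pi, group_size=3):
    """Assign operators to groups 1..5 by sorting their A.2 sensitivity
    coefficients: the three lowest a_pi go to group 1, the next three to
    group 2, and so on (the last group absorbs any remainder)."""
    order = np.argsort(a_pi)                 # operator indices, low to high
    groups = np.empty(len(a_pi), dtype=int)
    for rank, idx in enumerate(order):
        groups[idx] = min(rank // group_size, 4) + 1
    return groups

# Hypothetical TCL coefficients for the fourteen operators
a_tcl = np.array([0.1, 0.9, 1.4, 0.2, 1.3, 0.5, 1.0, 0.4,
                  0.0, 0.6, 1.2, 0.7, 1.1, 0.8])
print(sensitivity_groups(a_tcl))
```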

5.4.1.2 B, Simulator Bias

The second component in the ABC model is B, Simulator Bias, which is included based on the discussion and analysis in Chapter 4. To estimate simulator bias, the model uses the results from the Latent Bias Model (Figure 4.7, Table 4.7). In this model, the latent variable Bias is determined by three factors: Exam, Team Atmosphere, and Year. Because the PSF BN is built using 2015 data, in the ABC model Bias depends on Exam (E) and Team Atmosphere (TA):

$$\mu_{Bpi} = b_{p1} E_i + b_{p2} TA_i \tag{5.30}$$

Table 5.9 lists the variations in models of B. Model B.1 treats Bias as a fixed effect, and Model B.2 adds variation. As with operator sensitivity to plant conditions, variation in Bias is expected. This is modeled by assuming the Bias affecting PSF p is normally distributed around μ_Bpi:

$$B_{pi} \sim N(\mu_{Bpi}, \sigma_B^2) \tag{5.31}$$

There are two sources of variability in Bias. The first is the variability in the effects of Exam and Team Atmosphere between observations, and the second is any other source of Bias that is not captured by Exam or Team Atmosphere.

B.3 and B.4 repeat B.2 with one predictor instead of two. B.3 estimates Bias from Exam, and B.4 estimates Bias from Team Atmosphere. We find that both predictors are useful, although E is a stronger predictor than TA (see the coefficients in Table 5.13). B.2 is the Bias used in the final model.

As with the prior distributions for the Environmental PSF effects, the parameters in the Bias prior distributions in Table 5.9 are chosen based on what they would be if the Environmental PSFs perfectly explained the reported PSFs: E[b_p] = 0. An alternative prior parameterization would be to use the equivalent coefficients for b_p that are estimated in the Latent Bias Model.

Table 5.9: ABC Model - B variations (μ_pij = A_pij + B_pi)

Name | Description | B_pi | Parameter Prior Distribution
B.1 | Fixed Effects | B_pi = b_p1 E_i + b_p2 TA_i | b_p ~ N(0.0, 2.5)
B.2 | Random Effects | μ_Bpi = b_p1 E_i + b_p2 TA_i; B_pi ~ N(μ_Bpi, σ_B²) | b_p ~ N(0.0, 2.5); σ_B² ~ U(0, 100)
B.3 | Random Effects - Exam | μ_Bpi = b_p1 E_i; B_pi ~ N(μ_Bpi, σ_B²) | b_p ~ N(0.0, 2.5); σ_B² ~ U(0, 100)
B.4 | Random Effects - TA | μ_Bpi = b_p1 TA_i; B_pi ~ N(μ_Bpi, σ_B²) | b_p ~ N(0.0, 2.5); σ_B² ~ U(0, 100)


5.4.1.3 C, Context

The third and final component in the ABC model is C, Context. We consider two aspects of context: static context and dynamic context. Static context factors are those controlled in the experiment: the design basis accident, the complexity of the accident, and the operator's role. Dynamic factors are factors that appear to be significant to the operators based on observations of the scenario sessions: a time series effect and operator Confusion.

Table 5.10 lists the Context variations considered in the ABC model. C.1 includes only static context effects. There are three static context variables, each equal to either zero or one: S (= 1 if SGTR, = 0 if LOCA), C (= 1 if Complex, = 0 if simple), and R (= 1 if SRO, = 0 if RO). There are a total of eight possible contexts based on these three variables. Each context is assigned a Group index, g[i], which ranges from 1 to 8:

$$g[i] = 4S_i + 2C_i + R_i + 1 \tag{5.32}$$

Even group indices are SRO and odd indices are RO. Group indices 1-4 are LOCAs, and 5-8 are SGTRs; indices 1-2 and 5-6 are simple (single fault) scenarios, and indices 3-4 and 7-8 are complex (multiple fault) scenarios. For example, an SRO in a complex SGTR falls in group g[i] = 4 + 2 + 1 + 1 = 8. As Table 5.12 in Section 5.5 shows, adding static context did not improve the model metrics, so static context variables are not included in the remaining C variations.


Table 5.10: ABC Model - C variations (μ_pij = A_pij + B_pi + C_pij)

Name | Description | C_pij | Parameter Prior Distribution
C.1 | Static, grouped, fixed effects | C_pg[i], with g[i] = 4S_i + 2C_i + R_i + 1 | C_pg[i] ~ N(0.0, 2.5)
C.2 | Dynamic: Confusion | C_pij = c_p Cnf_ij | c_p ~ N(0.0, 2.5)
C.3 | Dynamic: Time Series | C_pij = c_p μ_pi[j−1] | c_p ~ N(0.0, 2.5)
C.4 | Dynamic: Hours | C_pij = c_p H_pij | c_p ~ N(0.0, 2.5)
C.5 | Confusion + Time Series | C_pij = c_p1 μ_pi[j−1] + c_p2 Cnf_ij | c_p ~ N(0.0, 2.5)
C.6 | Confusion, Random Effects | μ_cpij = c_p1 Cnf_ij; C_pij ~ N(μ_cpij, σ_c²) | c_p ~ N(0.0, 2.5); σ_c² ~ U(0, 100)
C.7 | Time Series, Random Effects | μ_cpij = c_p1 μ_pi[j−1]; C_pij ~ N(μ_cpij, σ_c²) | c_p ~ N(0.0, 2.5); σ_c² ~ U(0, 100)
C.8 | Time Series and Confusion, Random Effects | μ_cpij = c_p1 μ_pi[j−1] + c_p2 Cnf_ij; C_pij ~ N(μ_cpij, σ_c²) | c_p ~ N(0.0, 2.5); σ_c² ~ U(0, 100)

As with Bias, the prior distributions are chosen to reflect an ideal case model in which the Environmental PSF explains the Reported PSF.

In C.2, operator Confusion (Cnf_ij) is added as a predictor for all four parent PSFs, and in C.6 Cnf_ij is the base for the Context random effects. In the Environmental PSFs, the effect of operator Confusion is included as a predictor for TC but not for the other PSFs (see Section 5.1.1). We observed that some operators reported high TCL and CTL at the end of the scenario. This often corresponds to sessions in which the operators were using the wrong procedure or were uncertain of their diagnosis of the accident. Therefore, Confusion was added to the model as a predictor for TCL, PIL, TC and CTL. Developing a subjective metric of operator confusion will be an important aspect of continuing research.

C.3 adds a time series effect to the Context, and C.7 adds random effects around a mean context effect μ_cpij that is based on the previous μ_pij value. There are two ways to interpret the time series effect on the data. The first is as a decay factor; the effects of PSFs do not evaporate instantaneously when conditions change. The Environmental PSF equations all include a decay factor to capture the lingering effects of the previous state, in terms of memory decay related to incoming alarms and rates of change. We decided the decaying PSF component is not captured adequately through the Environmental PSF, for several reasons. First, some operators are insensitive to the Environmental PSF but still experience the decay effect due to other aspects of the model, such as confusion. Second, on the other end of the spectrum, operators who are particularly sensitive to environmental factors may have decay factors that differ from the average. Third, the time series effect may be in part an artifact of the way PSF data are collected. Operators watch a video of the scenario, evaluating PSFs periodically throughout the scenario. However, operators may use the previous PSF evaluation as an anchor for their next evaluation point; operators may be assessing an increase or decrease in PSF levels rather than assessing the PSF in isolation. For these reasons, a time series effect is included in the evaluation Context.

C.4 uses Hours from the beginning of the scenario (Hij) to predict Context. Hij does not improve k.fit and is therefore not used in the remaining C variations.


C.5 includes both time series and confusion as explanatory variables for Context, and C.8 adds random effects to the Context. C.8 is the context that is used in the final model.

5.4.1.4 Estimating Reported PSFs using the ABC Model

The expected value of each Parent PSF (TCL, PIL, TC, or CTL) is simply A + B + C:

$$\mu_{pij} = A_{pij} + B_{pi} + C_{pij} \tag{5.33}$$

where p = 1, 2, 3, 4 corresponds to TCL, PIL, TC and CTL respectively.

The fifth PSF, Stress, is modeled differently. The IDAC PSF Network estimates Environmental Stress as the mean value of the four Parent PSFs (TCL, PIL, TC and CTL). This model structure is adopted to estimate subjective stress:

$$\mu_{5ij} = \sum_{p=1}^{4} \beta_{Sp}\,\mu_{pij} \tag{5.34}$$

In the ABC model, the weight of each Parent PSF (β_Sp) is estimated from the data, in contrast with the IDAC algorithm, which assigns equal weight to each Parent PSF.

Finally, the reported PSFs are modeled by a conditionally normal distribution around their respective expected values:

$$p(y_{pij}|x_{pij}, \theta_{pij}) \sim N(\mu_{pij}, \sigma^2) \tag{5.35}$$
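Pulling the pieces together, the sketch below computes the expected Reported PSFs for one observation under this structure, given point values of the parameters. It is a forward illustration only: in the analysis the parameters are posterior samples, the Bias and Context terms carry their random-effects noise, and all quantities live on the transformed scale of Section 5.3.2. All numeric values below are hypothetical.

```python
import numpy as np

def mu_abc(x, a_op, B, c1, c2, conf, beta_S, mu_prev0=0.0):
    """Expected Reported PSFs for one observation: mu_pij = A + B + C.

    x:      (J, 4) Environmental PSFs for TCL, PIL, TC, CTL
    a_op:   length-4 sensitivity coefficients for this operator's groups
    B:      length-4 realized Bias terms for the observation (Eq. 5.31)
    c1, c2: length-4 Context coefficients (time series, Confusion)
    conf:   length-J reported Confusion values
    beta_S: length-4 Stress weights (Eq. 5.34)
    Returns a (J, 5) array: the four parents (Eq. 5.33) plus Stress."""
    J = x.shape[0]
    mu = np.zeros((J, 5))
    prev = np.full(4, mu_prev0)        # mu_pi[j-1] for the time series term
    for j in range(J):
        parents = a_op * x[j] + B + c1 * prev + c2 * conf[j]
        mu[j, :4] = parents
        mu[j, 4] = beta_S @ parents    # Stress as a weighted sum
        prev = parents
    return mu

x = np.array([[0.1, 0.2, 0.0, 0.1],
              [0.6, 1.0, 0.5, 0.3]])
print(mu_abc(x, a_op=np.ones(4), B=np.full(4, -0.2), c1=np.full(4, 0.3),
             c2=np.full(4, 0.2), conf=np.array([0.0, 0.5]),
             beta_S=np.full(4, 0.25)).round(3))
```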

5.4.1.5 Prior Distributions

Weakly informative prior distributions for the model parameters are selected following the approach in Section 5.2.1.6. Variance parameters are drawn from a uniform distribution; other parameters are sampled from normal distributions centered around a reasonable mean value with wide variance. Table 5.11 specifies the prior distributions for the model parameters.

Table 5.11: Prior distributions for unknown model parameters.

Parameter | Weakly Informative Prior | Justification
Mean operator sensitivity | p(μ_ap) ~ N(1.0, 6.25) | If the Environmental PSF were a perfect proxy for the Perceived PSF, μ_ap = 1.
Bias coefficients | p(b_Bp) ~ N(0.0, 6.25) | With no prior information about the impact of the Bias on the Reported PSFs, we conservatively expect b_Bp = 0. This choice is akin to the null hypothesis: our prior expectation is that Bias has no effect.
Context coefficients | p(b_Cp) ~ N(0.0, 6.25) | As with Bias, we conservatively set E[b_Cp] = 0.
Variance parameters: σ, σ_a, σ_B, σ_C | p(σ) ~ U(0, 100) | Noninformative prior distribution for variance recommended by (Gelman, 2006).

The prior distributions can be tested by re-evaluating the model with different parameters in the prior distributions, or with different prior distributions altogether. The final model is expected to be insensitive to the choice of prior parameters. The Results section of this chapter shows that a set of different priors yields comparable results.

5.5 RESULTS

The fit statistics for the different model configurations are listed in Table 5.12. The 'best' model is the model that maximizes k.fit and k.predict while minimizing p.e.fit and p.e.predict. Model DIC and R² (calculated using the posterior mean as the estimated value) are included for reference. Cross-reference the LABEL in Table 5.12 with Table 5.7, Table 5.9 and Table 5.10 to identify the details of each model configuration.

Table 5.12: ABC Model - Fit results

LABEL | DIC | R² | k.fit | k.predict | p.e.fit | p.e.predict

A: μ_pij = A
A.1 | 4424 | 0.22 | 0.05 | 0.71 | 0.06 | 0.74
A.2 | 4339 | 0.31 | 0.15 | 0.76 | 0.12 | 0.72
A.3 | 4266 | 0.30 | 0.11 | 0.70 | 0.08 | 0.71
A.4 | 4269 | 0.30 | 0.11 | 0.69 | 0.08 | 0.71

B: μ_pij = A.4 + B
B.1 | 4355 | 0.28 | 0.11 | 0.78 | 0.09 | 0.72
B.2 | 3684 | 0.56 | 0.42 | 0.72 | 0.17 | 0.61
B.3 | 3686 | 0.56 | 0.42 | 0.68 | 0.18 | 0.60
B.4 | 3677 | 0.56 | 0.42 | 0.52 | 0.18 | 0.63

C: μ_pij = A.4 + B.2 + C
C.1 | 3687 | 0.56 | 0.41 | 0.63 | 0.17 | 0.58
C.2 | 3461 | 0.62 | 0.44 | 0.56 | 0.16 | 0.59
C.3 | 6771 | 0.88 | 0.88 | 0.31 | 0.27 | 0.41
C.4 | 3604 | 0.58 | 0.43 | 0.72 | 0.17 | 0.60
C.5 | 3452 | 0.62 | 0.43 | 0.59 | 0.16 | 0.58
C.6 | 10702 | 0.83 | 0.82 | 0.35 | 0.27 | 0.45
C.7 | 3671 | 0.62 | 0.43 | 0.50 | 0.17 | 0.60
C.8 | 7007 | 0.87 | 0.87 | 0.62 | 0.26 | 0.43

AC: μ_pij = A.4 + C.8 | 5822 | 0.85 | 0.87 | 0.48 | 0.29 | 0.48
BC: μ_pij = B.2 + C.8 | 6901 | 0.56 | 0.86 | 0.63 | 0.29 | 0.58

The model building process begins with A, the Environmental PSF. The preferred "A" model is A.4. A.4 has comparable fit statistics to A.3, but we decided to use A.4 because it estimates μ_ap as well as a_pi. As it turns out, μ_ap = 0, meaning A.3 is equivalent to A.4, but the benefit of the A.4 model is that we have an estimate of the expected operator sensitivity to the Environmental PSF.


Using A.4 as the base, we replace 푎푝0 from A.4 and add B, the Bias effect. Of the four iterations of the Bias effect model, B.2 is selected for the final model because it has the best k.predict. Then, using A.4 and B.2 as the base, we add C, the additional Context effects. C.8 is preferred because it has high k.fit and k.predict with low p.e.predict. We do not consider other combinations of the model (e.g., A.1 + B.1 + C.1), because we assume that the best local model will be the best component part of a larger model. This assumption is reasonable but cannot guarantee the final model is the best model; there may be other, untried model configurations that are more representative of the data.

To test the necessity of including all three elements of the model (A, B, and C), we also evaluated AC, which removes Bias from the model, and BC, which removes the Environmental PSF. AC has comparable k.fit to C.8, the final model discussed below, but k.predict for AC is less than k.predict for C.8. BC has comparable k.fit and k.predict, but p.e.predict for BC is greater than p.e.predict for C.8. In short, prediction suffers in these two models because they include less information than C.8.

The final model used in the analysis below is therefore A.4 + B.2 + C.8, i.e.,

- $A_{pij} = a_{pi} x_{pij}$, where $a_{pi}$ is estimated for five levels of operator sensitivity: $p(a_{pi} \mid \mu_{ap}, \sigma_a^2) \sim N(\mu_{ap}, \sigma_a^2)$
- $B_{pi}$: $p(B_{pi} \mid \mu_{Bpi}, \sigma_B^2) \sim N(\mu_{Bpi}, \sigma_B^2)$, where $\mu_{Bpi} = b_{p1} E_i + b_{p2} TA_i$
- $C_{pij}$: $p(C_{pij} \mid \mu_{cpij}, \sigma_C^2) \sim N(\mu_{cpij}, \sigma_C^2)$, where $\mu_{cpij} = c_{p1} \mu_{pi[j-1]} + c_{p2} Cnf_{ij}$


The model results are the set of sampled values obtained in the Gibbs Sampler. These include estimates of $\mu_{pij}$, the expected value of each reported PSF.

The posterior samples are used to obtain parameter means, standard deviations, CIs, and the probability that the coefficient is not significant ("Probability Not Significant" in the tables below). This probability is comparable to the p-value in classical statistics: it is the proportion of the sampled values that are less than zero if the sample mean is positive, or greater than zero if the mean is negative. A high probability suggests that the explanatory variable is not useful for predicting the reported data. The results also report the 95 percent CI, which is simply the central 95 percent interval of the posterior samples (the 0.025 to 0.975 quantiles).
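The sketch below illustrates these summaries for a single coefficient, assuming `samples` holds the Gibbs draws for that coefficient; the function name is illustrative, not the analysis code itself.

```r
# Minimal sketch: posterior summaries for one coefficient from its Gibbs
# draws. "Probability Not Significant" is the fraction of draws on the
# opposite side of zero from the posterior mean.
summarize_coefficient <- function(samples) {
  m <- mean(samples)
  p_not_sig <- if (m > 0) mean(samples < 0) else mean(samples > 0)
  c(mean = m,
    sd = sd(samples),
    lo95 = unname(quantile(samples, 0.025)),  # 95% CI bounds
    hi95 = unname(quantile(samples, 0.975)),
    p_not_significant = p_not_sig)
}

# Example with synthetic draws:
summarize_coefficient(rnorm(4000, mean = 0.30, sd = 0.04))
```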

As an example of the model results, Figure 5.11 and Figure 5.12 show the model fit for two representative operators. In the plots, the red line is the reported data, the blue line is the prior Environmental PSF, and the gray area is the model's 95 percent CI. Each row includes the PSFs reported in one observation session in the simulator: Practice A (PA), Practice B (PB), Exam A (EA) and Exam B (EB). Each plot title includes K.fit for that PSF in that session, the session name, the team number, and the operator role (ODM/SRO or OAT/RO).

Plots for all of the operators are included in Appendix E.


Figure 5.11: Model fit for Operator 2. Reported PSF (thick red line), Environmental PSF (thin blue line), and posterior 95% CI (gray shadow). Operator sensitivity group is listed in parentheses in each plot title.

Figure 5.12: Model fit for Operator 12. Reported PSF (thick red line), Environmental PSF (thin blue line), and posterior 95% CI (gray shadow). Operator sensitivity group is listed in parentheses in each plot title.


5.5.1 Model Coefficients

The final model parameters are illustrated with boxplots in Figure 5.13 and summarized in Table 5.13.

Figure 5.13: Posterior parameter boxplots. Left: coefficients for $\mu_{pij}$, $b_p$ and $c_p$; right: $a_{pi}$.


Table 5.13 lists the posterior mean, standard deviation, and probability of insignificance for the model coefficients.

Table 5.13: Posterior estimates of the model parameters used to estimate $\mu_{pij}$. Values are listed for each PSF: TCL (p=1), PIL (p=2), TC (p=3), CTL (p=4).

| Coefficient | Posterior Mean (TCL, PIL, TC, CTL) | Posterior SD (TCL, PIL, TC, CTL) | Probability Not Significant (TCL, PIL, TC, CTL) |
|---|---|---|---|
| $\mu_{ap}$ | 0.00, 0.04, 0.00, -0.03 | 0.04, 0.04, 0.05, 0.06 | 0.51, 0.15, 0.47, 0.30 |
| $b_{p1}$ | -0.24, -0.26, -0.26, -0.17 | 0.09, 0.09, 0.09, 0.09 | 0.00, 0.00, 0.00, 0.02 |
| $b_{p2}$ | -0.06, -0.22, 0.02, 0.03 | 0.04, 0.05, 0.04, 0.05 | 0.09, 0.00, 0.30, 0.29 |
| $c_{p1}$ | 0.30, 0.29, 0.18, 0.24 | 0.04, 0.04, 0.04, 0.05 | 0.00, 0.00, 0.00, 0.00 |
| $c_{p2}$ | 0.15, 0.16, 0.40, 0.32 | 0.03, 0.03, 0.04, 0.03 | 0.00, 0.00, 0.00, 0.00 |

The left boxplots in Figure 5.13 show the model coefficients, partitioned into A, B, and C contributors. The right boxplots show the grouped operator sensitivity coefficients $a_{pi}$ for each PSF. The operator sensitivity coefficients are listed in Table 5.14.

Table 5.14: Posterior operator sensitivity coefficients. Values are listed for each PSF: TCL (p=1), PIL (p=2), TC (p=3), CTL (p=4).

| Coefficient | Posterior Mean (TCL, PIL, TC, CTL) | Posterior SD (TCL, PIL, TC, CTL) | Probability Not Significant (TCL, PIL, TC, CTL) |
|---|---|---|---|
| $a_{p1}$ | 0.04, -0.05, -0.10, -0.12 | 0.05, 0.04, 0.06, 0.07 | 0.18, 0.08, 0.03, 0.04 |
| $a_{p2}$ | 0.05, 0.03, 0.00, -0.05 | 0.05, 0.04, 0.05, 0.07 | 0.16, 0.25, 0.50, 0.24 |
| $a_{p3}$ | 0.01, 0.05, 0.02, -0.04 | 0.05, 0.04, 0.06, 0.08 | 0.42, 0.09, 0.38, 0.34 |
| $a_{p4}$ | 0.04, 0.09, 0.07, -0.01 | 0.04, 0.03, 0.06, 0.07 | 0.15, 0.00, 0.12, 0.44 |
| $a_{p5}$ | 0.05, 0.09, 0.03, 0.05 | 0.04, 0.04, 0.05, 0.08 | 0.07, 0.01, 0.30, 0.26 |

Finally, the coefficients for Stress are listed in the left portion of Table 5.15, and the variance parameters are listed in the right portion of Table 5.15. These parameters are illustrated in boxplots in Figure 5.14, along with a boxplot of $\mu_{ap}$.

Table 5.15: Posterior coefficients for Stress (left) and posterior variance parameters (right).

| Stress Coefficient | Posterior Mean | Posterior SD | Probability Not Significant |
|---|---|---|---|
| $\beta_{S1}$ | 0.47 | 0.06 | 0.00 |
| $\beta_{S2}$ | -0.01 | 0.04 | 0.38 |
| $\beta_{S3}$ | 0.43 | 0.07 | 0.00 |
| $\beta_{S4}$ | 0.10 | 0.08 | 0.11 |

| Variance Parameter | Posterior Mean | Posterior SD | Probability Not Significant |
|---|---|---|---|
| $\sigma$ | 0.40 | 0.01 | 0.00 |
| $\sigma_a$ | 0.08 | 0.02 | 0.00 |
| $\sigma_B$ | 0.27 | 0.03 | 0.00 |
| $\sigma_C$ | 0.39 | 0.02 | 0.00 |

Figure 5.14: Posterior parameter boxplots. Left: $b_{Sp}$; middle: $\mu_{ap}$; right: model variance.

5.5.2 Model K Values

Figure 5.15 plots K for different subsets of the model. The first, leftmost figure in Figure 5.15 is a histogram of the K values for each PSF in each observation; with 43 observations used to build the model, this results in 215 separate K values. The other plots in Figure 5.15 separate K by operator, by session, and by PSF. Red dots represent operators in Treatment 1 (operators who encountered a mix of simple and multiple fault accidents), and black dots are Treatment 0 operators (operators who encountered only single-fault accidents). Gray dots average the two.

In the second plot in Figure 5.15, the operator K values can be roughly separated into two groups: nine operators with average K.fit above 0.8, and five operators with K.fit between 0.75 and 0.8. The model does not represent the factors that influence the second group as well as it does the first group. This may indicate that there are multiple modes of operator reporting tendencies, and that the operator sensitivity factors do not capture all the variation between operators. Future research could identify characteristic reporting modes to determine if multiple models should be developed to better capture different types of operator reporting behavior.

The third plot in Figure 5.15 separates K.fit by session. This plot shows K.fit for the first session, PA, is typically lower than K.fit for the other sessions. This is probably because operators are still learning how to report PSFs in the first session, and are still developing the PSF reporting habits that are solidified in their second session.

Finally, the fourth plot in Figure 5.15 separates K.fit by PSF. This plot shows that K.fit for Stress is typically much lower than K.fit for the other PSFs, indicating that the linear model for Stress adopted from the IDAC model is an inadequate measure of operator reported Stress. Although the data used to estimate Stress are the expected values of the reported PSFs, not the Environmental PSFs, the factors that influence the Parent PSFs may influence Stress directly, rather than through the Parent PSFs. Future models should estimate Bias and Context effects specifically for Stress, rather than relying on indirect effects through the parent PSFs.

Figure 5.15: Posterior K.fit. Left: histogram of K for all observations; 2nd: mean K.fit for each operator; 3rd: mean K.fit for each session; 4th: mean K.fit for each PSF. Results are sorted into two categories: Treatment 0 operators who encountered only single-fault accidents (black) and Treatment 1 operators who encountered a mix of single fault accidents and multiple fault accidents (red). The overall average is shown in gray. In the 3rd figure, results from complex (multiple fault) scenarios are marked with an x.

Table 5.16 summarizes the K.fit data.

Table 5.16: K.fit data for all the estimated values ('All PSFs') and each of the five PSFs. (Model = C.8)

| K.fit | TCL | PIL | TC | CTL | Stress | All PSFs |
|---|---|---|---|---|---|---|
| p.o | 0.96 | 0.89 | 0.97 | 0.98 | 0.68 | 0.90 |
| p.e | 0.28 | 0.27 | 0.28 | 0.30 | 0.18 | 0.26 |
| K | 0.94 | 0.85 | 0.96 | 0.98 | 0.61 | 0.87 |

K = 0.87 for all PSFs means that, on average, the model fits a set of PSF observations 87 percent better than random chance.
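The p.o, p.e, and K columns of Table 5.16 are consistent with the chance-corrected, Cohen's-kappa-style form K = (p.o - p.e)/(1 - p.e); the sketch below assumes that form, which is an inference from the table rather than a formula stated in this section.

```r
# Sketch of the chance-corrected fit statistic, assuming the kappa-style
# form implied by the p.o, p.e, and K columns of Table 5.16.
k_statistic <- function(p_o, p_e) (p_o - p_e) / (1 - p_e)

k_statistic(p_o = 0.90, p_e = 0.26)  # ~0.86; Table 5.16 reports 0.87 for
                                     # 'All PSFs', the small difference
                                     # being rounding in p.o and p.e
```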


5.5.3 Model Prediction

Figure 5.16 shows the prediction plots for the two observation sets reserved in the Test data. As in Figure 5.11 and Figure 5.12, the red line is the reported PSF, the blue line is the prior Environmental PSF, and the gray area is the model 95 percent CI for prediction.

The predicted values, $\hat{y}_{pij}$, are sampled using the mean posterior values of each model parameter:

$$p(\hat{y}_{pij} \mid x_{pij}, a_{pi}, \mu_{bpi}, \mu_{cpij}, \sigma^2) \sim N(a_{pi} x_{pij} + \mu_{bpi} + \mu_{cpij}, \sigma^2) \quad (5.36)$$
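A sketch of this sampling step is shown below; the argument values are illustrative (not study estimates, apart from $\sigma$ = 0.40 from Table 5.15), and the function is a hypothetical stand-in for drawing $\hat{y}$ around the posterior-mean linear predictor of Eq. (5.36).

```r
# Sketch of Eq. (5.36): draw predicted reported-PSF values around the
# linear predictor built from posterior-mean parameters.
predict_psf <- function(x, a_pi, mu_b, mu_c, sigma, n_draws = 4000) {
  mu_hat <- a_pi * x + mu_b + mu_c       # A + B + C at posterior means
  rnorm(n_draws, mean = mu_hat, sd = sigma)
}

draws <- predict_psf(x = 0.4, a_pi = 0.05, mu_b = -0.2, mu_c = 0.3,
                     sigma = 0.40)       # sigma from Table 5.15
quantile(draws, c(0.025, 0.975))         # 95% prediction interval
```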

An alternative approach is to sample each estimated parameter from its posterior distribution, then use those parameters to generate distributions for $\hat{a}_{pi}$, $\hat{\mu}_{bpi}$, etc., to sample $\hat{A}_{pij}$, $\hat{B}_{pi}$ and $\hat{C}_{pij}$ from the posterior predictive distributions, and finally to sample $\hat{y}_{pij}$ from $\hat{\mu}_{pij} = \hat{A}_{pij} + \hat{B}_{pi} + \hat{C}_{pij}$. The result of this approach is a CI that spans from zero to one. This means we need more information about the Bias and the Context effects to reduce the prediction uncertainty.

Figure 5.16: Prediction plots for the two reserved Test observations. As in previous plots, Reported PSFs are red, Environmental PSFs are blue, and the posterior predicted 95% CI for the reported PSF is gray.


The CI for prediction is wider than the CI for fit, approximately 0.45 instead of 0.26. K.predict is likewise lower than K.fit, both because p.o for prediction is generally less than p.o for fit and because p.e for prediction is greater than p.e for fit.

The two reserved Test scenarios allow for comparison between operators. The first operator tends to report higher PSF values than the second operator, and this is reflected in the prediction plots.

K.predict for TCL in EA6OAT is approximately zero, indicating that the model performs no better than chance for this observation’s TCL. The corresponding plot in Figure 5.16 shows that although the model predicts the general shape of the Reported TCL, the predicted values are greater than the reported TCL.

Table 5.17: K.predict data for each Test observation and averaged over both observations (Model = C.8).

Prediction - EB3ODM:

| | TCL | PIL | TC | CTL | Stress | All PSFs |
|---|---|---|---|---|---|---|
| p.o | 1.00 | 0.55 | 0.91 | 0.55 | 0.82 | 0.76 |
| p.e | 0.44 | 0.44 | 0.45 | 0.45 | 0.47 | 0.45 |
| K | 1.00 | 0.20 | 0.84 | 0.17 | 0.66 | 0.57 |

Prediction - EA6OAT:

| | TCL | PIL | TC | CTL | Stress | All PSFs |
|---|---|---|---|---|---|---|
| p.o | 0.44 | 0.67 | 0.89 | 1.00 | 1.00 | 0.80 |
| p.e | 0.41 | 0.38 | 0.38 | 0.45 | 0.41 | 0.41 |
| K | 0.05 | 0.46 | 0.82 | 1.00 | 1.00 | 0.67 |

Prediction - All PSFs:

| | TCL | PIL | TC | CTL | Stress | All PSFs |
|---|---|---|---|---|---|---|
| p.o | 0.75 | 0.60 | 0.90 | 0.75 | 0.90 | 0.78 |
| p.e | 0.43 | 0.41 | 0.42 | 0.45 | 0.44 | 0.43 |
| K | 0.56 | 0.32 | 0.83 | 0.55 | 0.82 | 0.62 |


The low K.fit for Stress would suggest that K.predict for Stress would also be low; however, the prediction CI for Stress contains all the reported Stress values. One explanation is that the prediction CI is wider than the fit CI and therefore able to accommodate more variation in Stress. This suggests that the variance around expected Stress is wider than the variance around the other PSFs, and that a separate $\sigma_S$ should be estimated in the model.

5.6 ANALYSIS

The interpretation and implications of each aspect of the ABC model are addressed below. This includes the significance of each element of the model, the relative strength of the different inputs within each model component (A, B and C), and the expected variation of each model component.

5.6.1 Interpreting A, the Environmental PSF Effect on the Reported PSF

A is the effect of the Environmental PSF, determined by the Environmental PSF and by the operator sensitivity coefficient.

The left plots in Figure 5.13 show that $\mu_a$ (mu.a) is small compared to the other explanatory variables, indicating that Context and Bias are more significant in predicting operator reported PSFs than are the plant conditions. The Probability Not Significant values in Table 5.14 show that the Environmental PIL is the most significant Environmental PSF; the probabilities that $a_{p,i \neq 2} = 0$ are all less than 0.1.


After PIL, operators are most sensitive to TCL; for all but one TCL $a_{pi}$, the probability that $a_{pi} = 0$ is less than 0.2. Operators in TC groups 1 and 4 are sensitive to Environmental TC, and operators in CTL group 1 are sensitive to CTL.

The variation in operator sensitivity to the Environmental PSFs can be explained by a variety of factors:

• Some operators' Reported PSFs may not depend on plant conditions. Instead, these operators may be reporting PSFs that depend on factors that are not tied to plant conditions.

• Operator insensitivity to plant conditions may also reflect student operator perception rather than experienced operator perception; student operators may need more training and experience before they can perceive Environmental PSFs as experienced operators do.

• The algorithms used to calculate the Environmental PSFs may not adequately capture the relevant plant conditions that lead to the PSFs. If we modify the Environmental PSF algorithms, we may see stronger ties between the Environmental PSF and the Reported PSF. Operators appear to have the lowest sensitivity to Environmental CTL; if we developed a new Environmental CTL measure for student operators based on the factors identified in Section 5.1.1.4.3, operator sensitivity to Environmental CTL may increase.

• Sorting operators by degree of sensitivity may be too simplistic. Instead, we may need to identify PSF reporting modes or characteristic PSF reporting styles.

There are two variables of interest in A: the mean operator sensitivity to the Environmental PSFs, $\mu_a$, and the variation in sensitivity between operators. The middle plot in Figure 5.14, a boxplot of the posterior $\mu_a$ samples, shows that the operators are more sensitive to PIL than to the other PSFs, and that the sensitivity to CTL is more varied than sensitivity to the other PSFs.

Variation between operators is represented by $\sigma_a^2$. The right column plots in Figure 5.13 show some operator groups are insensitive to the Environmental PSFs (the boxplots for these operators are centered at zero), while others are sensitive to the Environmental PSFs. This sensitivity is not necessarily consistent; operators are not in the same group for every PSF and may be sensitive to one PSF but not others. The variety in $\mu_{ap}$ and $a_{pi}$ helps begin to capture the unique effects of personality and experience on perceived PSFs without having to explicitly model operator characteristics.

5.6.2 Interpreting B, the Bias Effect on the Reported PSF

There are two aspects to the Bias component: the magnitude of the Bias and the direction of the Bias effect.

Bias is based on the Latent Bias Model in Chapter 4. From the SEM analysis, we expect Bias to be inversely related to Exam and Team Atmosphere: Bias decreases during the Exam phase ('Graded' in Chapter 4) and as Team Atmosphere improves. Maximum Bias is expected in the Practice phase in crews with poor Team Atmosphere.

The $b_{p1}$ and $b_{p2}$ data in Table 5.13 show that, as expected, Bias decreases in the Exam phase for all four PSFs. Improved Team Atmosphere decreases Bias in PIL and has a small effect on TCL; Team Atmosphere has no impact on the TC or CTL bias (the estimated coefficients are equal to zero).


5.6.2.1 Estimated Bias

The estimated Bias is referred to as a "random" effect because it is free to vary around the expected Bias. Figure 5.17 shows the estimated Bias for each Session (PA, PB, EA and EB), plotting Bias for each PSF in the Session separately.

Figure 5.17: Estimated Bias in each observation, separated by Session (PA, PB, EA, EB) and identified by the reporting operator's Team Number and Role (S ~ SRO, R ~ RO).

Figure 5.18 shows the Bias sorted by operator for four randomly selected operators.


Figure 5.18: Bias organized by operator for four representative operators.

Team Atmosphere is evaluated once by each operator and does not change from Session to Session, because operators work with the same teammate throughout all four scenarios. Therefore, the expected Bias does not change within each phase: for a given operator, the expected Bias in PA matches the expected Bias in PB, and EA matches EB. For several operators in Figure 5.18, posterior Bias in Exam sessions is more consistent than posterior Bias in Practice sessions. This may indicate that Bias is higher in the first practice session than in the other sessions.

5.6.3 Interpreting C, the Context Effect on the Reported PSFs

The two dynamic Context effects in the final model are the time series effect and operator Confusion. The left column boxplots in Figure 5.13 show that the time series effect is more significant than Confusion for the two PSFs that are tied directly to plant conditions: TCL and PIL. In contrast, Confusion is more significant than the time series effect for the cognitive PSFs: TC and CTL. This suggests that TCL and PIL may have lingering or delayed effects, while TC and CTL are instantaneous and closely tied to cognition in the current situation, which relates to confusion.

5.6.4 Stress

The first boxplot in Figure 5.14 shows the best predictors for Stress are TCL and TC, followed by CTL; the Stress coefficient for PIL is almost zero, indicating that, at least for the OSU student operators, PIL does not have a significant impact on Stress.

Stress is notoriously difficult to measure, and a variety of metrics have been proposed for evaluating stress, both within HRA and in experimental psychology. The model above shows how a theoretical measure for Stress can be updated based on experimental data.

PIL attempts to measure the effects of alarms on operator performance. PIL is included as a predictor for Stress because operators and experts believe that alarms will increase Stress; however, this model shows that operator reported Stress is fairly insensitive to alarms. The next task is to determine how well operator reported Stress corresponds to changes in operator behavior, first in the simulator environment and then in the real world.

5.6.5 Model Variance: Capturing the Uncertainty in the PSFs

As discussed in Section 5.5.3, the model predictions are based on the expected values of the model parameters, but the model also captures our uncertainty about these parameters. The variance parameters ($\sigma$, $\sigma_a$, $\sigma_B$, $\sigma_C$) partition the uncertainty in the model into four areas: variation in operator reporting ($\sigma$), variation in operator sensitivity to the Environmental PSFs ($\sigma_a$), variation in scenario Bias ($\sigma_B$), and variation due to context effects ($\sigma_C$). Reducing the uncertainty in any of these four areas will reduce the overall prediction uncertainty.

For example, the predictions in Figure 5.16 are based on the expected Bias and Context. Suppose we knew the test data were collected in a situation with minimum Bias, for example, Bias corresponding to the fifth percentile of the Bias distribution. This is estimated using the model parameters: the fifth percentile Bias is the fifth percentile of a normal distribution with mean $\mu_{Bp}$ and variance $\sigma_B^2$. The dotted lines in Figure 5.19 show how the expected PSF values change in cases of extreme Bias, at the 2.5th or the 97.5th percentile of the Bias range. The effect is significant; extreme Bias shifts the expected value of the Reported PSF almost to the edges of the 95% CI for the 'typical' Bias condition.
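The percentile calculation itself is a one-liner; the sketch below uses the posterior mean of $\sigma_B$ from Table 5.15 and a hypothetical expected Bias value.

```r
# Sketch of the extreme-Bias calculation: 2.5th and 97.5th percentile
# Bias drawn from N(mu.B, sigma.B^2). mu_B is a hypothetical expected
# Bias; sigma_B is the posterior mean of sigma.B from Table 5.15.
mu_B    <- -0.20
sigma_B <-  0.27

qnorm(c(0.025, 0.975), mean = mu_B, sd = sigma_B)
# Substituting these extremes for mu.B in the linear predictor shifts the
# expected Reported PSF as shown by the dotted lines in Figure 5.19.
```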


Figure 5.19: Predicted values for the two Test observations. Dotted lines show the mean predicted values in cases of extreme Bias or lack of Bias.

Similar analyses of Context or operator sensitivity would illustrate the uncertainty in those areas of the model.

5.6.6 Sensitivity Analysis

To verify that the model prior distributions are not too informative, we re-evaluate the final model with even less informative prior distributions for the model parameters. Whereas the original model is evaluated with $p(\theta) \sim N(0.0, 6.25)$ for any model parameter $\theta$, the 'even less informative' model is evaluated with the prior distributions $p(\theta) \sim N(0.0, 625)$. We also replace the uniform priors for the variance parameters with two Inv-Gamma($\epsilon$, $\epsilon$) distributions, one with $\epsilon = 0.001$ and one with $\epsilon = 0.0001$.
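In JAGS syntax, an Inv-Gamma($\epsilon$, $\epsilon$) prior on a variance is usually written as a Gamma($\epsilon$, $\epsilon$) prior on the precision; the fragment below is a hypothetical sketch of that substitution, not the study's code.

```r
# Sketch of the alternative variance prior: Gamma(eps, eps) on the
# precision tau implies sigma^2 ~ Inv-Gamma(eps, eps). Re-run with
# eps = 0.001 and eps = 0.0001 as described above.
alt_prior <- "
model {
  tau   ~ dgamma(0.001, 0.001)  # precision; swap in 0.0001 for the second run
  sigma <- pow(tau, -0.5)       # sigma recovered from the precision
}
"
```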

Figure 5.20 shows the results. The top plot shows the original parameter posterior means (black) and the parameter posterior means estimated with the less informative priors, $\epsilon = 0.001$ (red) and $\epsilon = 0.0001$ (blue). The bottom plot shows the difference between the two sets of posterior means. The difference is never greater than 0.06. In both plots, the blue line and the red line are almost indistinguishable.

For simplicity, the parameter estimates are all plotted on the same figure, separated into sections by vertical lines. The first section in the plots is the set of $a_{pi}$, the second section is the set of $b_p$, the third is $c_p$, the fourth is the set of stress coefficients $\beta_S$, and the final four dots correspond to the four variance parameters.

Figure 5.20: Sensitivity test results.

The consistency in Figure 5.20 confirms that the difference between the two models is negligible, and that the weakly informative priors used to estimate the model parameters do not unnecessarily restrict the parameter estimates.


The concern about using a uniform prior for variance parameters is that it may overestimate the variance. We see that the variance (the four rightmost dots in Figure 5.20) does not change when we switch to an inverse-gamma distribution. The concern about using an inverse-gamma distribution is that the posterior variance parameters may be sensitive to the choice of $\epsilon$; the near-perfect correspondence of the red and blue lines confirms that this is not a concern in our model.

5.6.6.1 Logit Transform vs. Raw Data

The logit transform is used to change the range of the PSF data from [0, 1] to $(-\infty, \infty)$. The results below show the impact of the logit transformation on the model results. The overall K.fit does not change if we use the raw data instead of transforming the data; however, the logit transform improves K.predict from 0.5 to 0.6.
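A sketch of the transform is below; the clipping constant is an assumption added so reported values of exactly 0 or 1 stay finite, and is not a detail described in the text.

```r
# Sketch of the logit transform applied to PSF reports on [0,1]; the
# epsilon clipping is an assumption to keep endpoint values finite.
eps <- 0.01
to_logit   <- function(p) qlogis(pmin(pmax(p, eps), 1 - eps))
from_logit <- function(z) plogis(z)   # inverse, for plotting on [0,1]

to_logit(c(0.05, 0.50, 0.95))
```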

Figure 5.21: Comparison of the C.8 model evaluated using logit transformed data and raw data.

| | k.fit [logit] | k.fit [raw] | k.predict [logit] | k.predict [raw] |
|---|---|---|---|---|
| p.o | 0.90 | 0.90 | 0.78 | 0.72 |
| p.e | 0.26 | 0.26 | 0.43 | 0.43 |
| k | 0.87 | 0.87 | 0.62 | 0.51 |

There are some discrepancies in the model coefficients estimated using the transformed versus the raw data. Building the model using the raw data switches the significance of the Bias predictors: Team Atmosphere becomes significant for TC and CTL, and Exam is no longer a significant predictor for PIL Bias. Operator sensitivity coefficients also change; PIL group 2 and TC group 5 are sensitive to the Environmental PSF if we use the raw data, and TC group 4 is no longer sensitive to Environmental TC. These discrepancies may indicate that we are estimating too many parameters in the model; reducing the number of operator sensitivity groups from five to three might produce more consistent results.

Table 5.18: Posterior Probability Not Significant for the ABC model coefficients in the logit-transform model (left) and the model built using the raw data (right).

| Coefficient | Logit: TCL, PIL, TC, CTL | Raw: TCL, PIL, TC, CTL |
|---|---|---|
| mu.a | 0.51, 0.15, 0.47, 0.30 | 0.30, 0.09, 0.40, 0.48 |
| b1 | 0.00, 0.00, 0.00, 0.02 | 0.08, 0.23, 0.02, 0.07 |
| b2 | 0.09, 0.00, 0.30, 0.29 | 0.00, 0.00, 0.00, 0.00 |
| c1 | 0.00, 0.00, 0.00, 0.00 | 0.00, 0.00, 0.00, 0.00 |
| c2 | 0.00, 0.00, 0.00, 0.00 | 0.00, 0.00, 0.00, 0.00 |
| a1 | 0.18, 0.08, 0.03, 0.04 | 0.05, 0.07, 0.18, 0.09 |
| a2 | 0.16, 0.25, 0.50, 0.24 | 0.05, 0.01, 0.40, 0.46 |
| a3 | 0.42, 0.09, 0.38, 0.34 | 0.47, 0.11, 0.42, 0.28 |
| a4 | 0.15, 0.00, 0.12, 0.44 | 0.01, 0.05, 0.51, 0.37 |
| a5 | 0.07, 0.01, 0.30, 0.26 | 0.12, 0.00, 0.03, 0.21 |

5.6.7 Cross Validation

K-fold cross validation is used to evaluate the stability of the model and test how well the model represents the data. K-fold validation is similar to the bootstrap approach used to evaluate the Bias model; however, instead of evaluating the model separately for each data point (which is quite time consuming), we evaluate the model five times.

In K-fold validation, the data set is divided into K subsets. The model is then evaluated K times, each time with the Kth subset removed (Hastie, Tibshirani, & Friedman, 2009).
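The fold assignment itself is simple; the sketch below shows one way to set up five folds over the 43 observation sets, with the model refit inside the loop left as a placeholder.

```r
# Minimal sketch of 5-fold cross validation over the 43 observation sets;
# the model refit inside the loop is a placeholder.
set.seed(1)
n_obs <- 43
folds <- sample(rep(1:5, length.out = n_obs))  # random fold labels

for (k in 1:5) {
  build_idx <- which(folds != k)   # data used to re-estimate the model
  test_idx  <- which(folds == k)   # held-out fold for comparison
  # ... refit the JAGS model on build_idx; compare posterior means and
  # 95% CIs against the original Build-dataset estimates ...
}
```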

Figure 5.22, Figure 5.23 and Figure 5.24 plot the parameter estimates for each fold. The horizontal lines represent the mean (black) and 95% CI (red) for the original model estimated using the Build dataset. The dots represent the parameter mean and 95% CI for each fold in the model. In the figures below, parameters are labeled as [p, index], i.e., beta[1,2] corresponds to $b_{12}$.


Figure 5.22: K-fold comparison for model coefficients.


Figure 5.23: K-fold comparison for operator sensitivity factors.


Figure 5.24: K-fold comparison for variance parameters.

Overall, the K-fold exercise supports the validity of the model. Most of the parameters are consistent between folds. One exception is the set of stress coefficients, $\beta_{Sp}$. The instability in the stress coefficients confirms the need to improve the model for Stress. The Context coefficients for Confusion in PIL and CTL are just outside the final model CI, indicating that we should look for ways to add information about scenario context to the model.

5.7 CONCLUSIONS

The ABC model is an example of how simulator data can be used to enhance HRA. Specifically, the ABC model illustrates a method for modeling subjective, operator-dependent perceived PSF values using primarily objective data. Identifying significant PSFs and evaluating PSF levels is a challenge for HRA.

The operator sensitivity factor in the model illustrates how to capture expected perceived PSFs for a range of operators, without necessarily having to specify details about the operators. The operator sensitivity factor also provides a method to capture the relative influence of the plant conditions versus other factors that impact the operator's perception.

The ABC model shows that some student operators are relatively insensitive to plant conditions; this may or may not be representative of experienced operators. On the one hand, professionals should be more attuned to the implications of plant conditions, which might result in Reported PSFs that have a stronger tie to the Environmental PSFs. On the other hand, experienced operators practice accident scenarios all the time; responding to deteriorating plant conditions is part of their training and part of their job, and the Environmental PSFs incorporated in the ABC model may not impact their Reported PSFs. Future studies following the approach outlined in this chapter can shed light on this question.

The Bias component of the model is included to assess the impact of artificial effects introduced by the simulator environment. The Latent Bias Model in Chapter 4 identified two factors that contribute to Bias: Exam and Team Atmosphere. As expected, transitioning from the Practice phase of the experiment to the Exam phase consistently reduced simulator Bias. Improved Team Atmosphere also decreased Bias for PIL, and to a lesser degree, for TCL. Future work to identify factors that impact simulator bias and differentiate them from other context effects will improve our understanding of how Bias impacts Reported PSFs.

Finally, Context in the ABC model is the sum of two dynamic effects: operator confusion and a time series effect. Operator confusion is a subjective PSF that requires further research to be modeled objectively. This factor is important to the model because all of the PSFs, not just TC, are influenced by the operator's mental state. This is a key difference between the Environmental PSFs and the Reported PSFs: Environmental PSFs can be estimated strictly from plant conditions, but perceived PSFs will always be impacted by the operator's sense of the situation. Factors that might be useful for modeling operator confusion, at least for student operators, include: the number of current diagnoses the operator is considering, operator confidence in the diagnosis (although this is also subjective and will require development of an objective measure), and operator hesitation or uncertainty in following the procedures (measured by time spent making procedure-based decisions or the number of times the operator returns to the same questions in the procedure).

The Random Effects incorporated into the model allow extreme cases to be incorporated into a PRA. Analysts can take results from a simulator study and model scenarios with minimum bias, with operators who range from minimum to maximum sensitivity to Environmental PSFs, or with context effects that are maximized or minimized.

The ABC model treats Stress as a linear function of TCL, PIL, TC and CTL. K.fit for Stress is less than K.fit for the other PSFs (0.6 vs. 0.9), indicating that the linear model may be improved. One approach would be to add a separate Bias effect for Stress; an alternative would be to model Stress as a linear function of the Environmental PSFs plus Confusion and Bias.

Another area of open research is how well the Reported PSFs capture the PSFs perceived by the operators. In some cases, low operator sensitivity to the Environmental PSFs may reflect a discrepancy between Reported PSFs and Perceived PSFs. It may be possible to improve the consistency of Reported PSFs to ensure they are a good measure of Perceived PSFs. This could be accomplished by additional training and practice in PSF evaluation to ensure PSF reports are internally consistent. The frequency of PSF evaluations in this experiment may also be too high; evaluating PSFs less frequently may provide more meaningful results. Finally, it is possible that some student operators report artificial PSFs: they may report what they believe they should be experiencing (following a Policy Response Bias), rather than PSFs that capture their true mental state.

The next step in this research is to link Reported PSFs to PSF effects. Successful PSF evaluation has two components: determining the PSF level and assessing the PSF effect. The ABC model tackles the first part of the challenge by providing an objective method for assessing PSF level. This approach reduces the discrepancy between PSF assessments that was observed in the International Empirical HRA Study, for example (Forester, et al.), and reduces the need for expert judgment on the part of the analyst. The next step is to build a model to understand how the PSF level relates to operator actions and PRA outcomes. We could build on the ABC model by incorporating success and failure data from the student operator scenarios. Another hierarchical level could be added to the ABC model to predict operator success or failure in a sequence of tasks throughout a scenario. The scenarios could be broken into a series of tasks, following the approach developed for SACADA (Chang, et al., 2014), with success and failure criteria defined for each task.


6 Conclusions

This research was inspired by the concept of Science-Based HRA, the idea that HRA can obtain a rigorous scientific basis through controlled experiments and quantitative data. NPP simulators are the ideal platform for this research: affordable, easy to control, and, since the advent of digital simulators, accessible. The research presented in this dissertation outlines an approach to Science-Based HRA in simulator environments. First, student operators are proposed as a useful test platform for designing experiments and data collection instruments. Second, simulator bias is identified as an unavoidable aspect of simulator research, and preliminary methods are proposed for quantifying bias effects. Finally, a subjective PSF model is proposed that links Environmental PSFs to subjective Reported PSFs. This model is designed for dynamic PRA, with PSFs evolving throughout a scenario. Although the models themselves are not expected to apply to experienced operators, the methods used to develop these models are provided as an example of how similar models might be developed with simulator data collected from professional NPP operators.

Amid the many results of this research there are several findings and concepts in particular that are useful to the HRA community:

A framework for discussing simulator bias. The effects of an artificial environment are widely recognized, but prior to this research we did not have a vocabulary for discussing these effects or differentiating between aspects of the artificial environment. This research identifies two broad types of simulator bias, Environmental and Motivational Bias, and names several biases that are likely to impact simulator data. The list is compiled from literature from other fields and is not expected to be comprehensive. Instead, the biases are offered as a starting point, a framework that will expand and grow as simulator-based research continues.

A method for estimating Simulator Bias. The Latent Bias Model and the Bias Path Analysis Model introduced in Chapter 4 are not expected to reflect experienced operator performance, because they are built from student operator data. Instead, the true result of the bias analysis is the method introduced to identify the factors that impact the bias effects and to estimate the strength of these effects. We envision future experiments with more direct bias manipulations: introducing a strong false prominent hypothesis and testing the results, planting an "insider" crew member to increase cavalier behavior in the crew, or collecting data during NRC exams and in more relaxed circumstances. A systematic look at the effects of these manipulations would provide a stronger technical basis for the use of simulator data in HRA research. In these models, the outcome variables included in the Bias models (time to trip the reactor and time to diagnose the accident) should be supplemented by other success measures, perhaps similar to those proposed for an objective Confusion measure for the PSF model.

The Operator Sensitivity Factor. In most HRA methods, analysts specify the PSF level as High/Medium/Low (for example) based on external factors, but this does not account for variability in operator response. The operator sensitivity factor introduced in the ABC model is an example of how to incorporate an Environmental PSF and a range of possible operator responses to that PSF within a model. Using Bayesian analysis, this can be done without specifying operator characteristics. Instead, the expected operator sensitivity and the variance in operator sensitivity to Environmental PSFs can be estimated by observing a random assortment of operators.

The operator sensitivity factor in the ABC model is a preliminary example of how to incorporate these effects. There are many aspects of operator behavior, and one factor is probably not sufficient to capture them all. In IDAC, operator variability is incorporated through three distinct decision-making styles. Future research may identify similar characteristic styles in PSFs: varying modes of operator response to Environmental PSFs that merit separate treatment in the model.

Two new numerical methods. Structural Equation Modeling and continuous Bayesian networks are not inventions of this research, but they are new to the HRA research community. SEM is an extension of factor analysis, which has been used in HRA to generate PSF network structures (Groth & Mosleh, 2012). Similarly, Bayesian HRA is now widely used, but these models are discrete Bayesian networks that do not take advantage of all the benefits of Bayesian analysis. The results of SEM and Bayesian networks are usually quite similar; the difference is in interpretation. SEM estimates point values and standard errors; Bayesian analysis estimates distributions. Both methods are appropriate and useful for HRA. Bayesian analysis is already familiar to HRA researchers, and the probability distributions generated by the models are representative of the uncertainty associated with the effects we are modeling. We hope this introduction to SEM and to Bayesian hierarchical linear models will inspire other researchers to adopt these tools in HRA applications, because they are so useful for capturing the variability and uncertainty inherent in human reliability.

Continuous PSF distributions. One of the benefits of Bayesian continuous hierarchical models is that these models generate probability distributions rather than probabilities for discrete states. This is more representative of PSFs, of operator characteristics, etc. than partitioning PSFs into high/low or present/absent categories.

Student operators as a test platform. Using student operators in the simulator sessions allowed us to collect a large dataset without the expense and logistical complications of working with experienced operators. We tested concepts, developed multiple iterations of the questionnaires and the PSF evaluation worksheet, and generated example models of Bias and subjective PSFs. We did all of this without taking up valuable simulator time in a plant simulator facility or having to organize many groups of operators to participate in the experiment. This approach allows pilot experiments to be conducted to refine a research approach before investing in large-scale simulator experiments with professional NPP operators.


References

ANSI/ANS. (2009). Nuclear Power Plant Simulators for Use in Operator Training and Examination. American Nuclear Society.

Benish, R., Smidts, C., & Boring, R. (2013). A pilot experiment for science-based Human Reliability Analysis (HRA) validation. Probabilistic Safety Assessment 2013. Columbia, SC.

Boring, R. L., Shirley, R. B., Joe, J. C., Mandelli, D., & Smith, C. L. (2014). Simulation and Non-Simulation Based Human Reliability Analysis Approaches. Idaho Falls, ID: Idaho National Laboratory.

Bradburn, N. M., Sudman, S., & Wansink, B. (2004). Asking Questions: The Definitive Guide to Questionnaire Design for Market Research, Political Polls, and Social and Health Questionnaires. San Francisco, CA: Jossey-Bass.

Bye, A., Dang, V. N., Forester, J., Hildebrandt, M., Marble, J., Liao, H., & Lois, E. (2012). Overview and first results of the US Empirical HRA Study. Proceedings of the 11th International Probabilistic Safety Assessment and Management Conference (PSAM). Helsinki, Finland.

Chang, Y. H., & Mosleh, A. (1999). Cognitive Modeling and Dynamic Probabilistic Simulation of Operating Crew Response to Complex System Accidents (ADS-IDA Crew). College Park, MD: Center for Technology Risk Studies.

Chang, Y. H., & Mosleh, A. (2007). Cognitive modeling and dynamic probabilistic simulation of operating crew response to complex system accidents, Parts 1-5. Reliability Engineering and System Safety, 92(8), 997-1101.

Chang, Y. J., Bley, D., Criscione, L., Kirwan, B., Mosleh, A., Madary, T., . . . Zoulis, A. (2014). The SACADA database for human reliability and human performance. Reliability Engineering & System Safety, 125(Special Issue), 117-133.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum.

Cover, T. M., & Thomas, J. A. (1991). Differential entropy bound on discrete entropy. In Elements of Information Theory (pp. 234-237). USA: John Wiley & Sons, Inc.

Coyne, K. (2009). A Predictive Model of Nuclear Power Plant Crew Decision-Making and Performance in a Dynamic Simulation Environment. College Park, MD: University of Maryland PhD Dissertation.

Coyne, K. (2009). Model validation. In A Predictive Model of Nuclear Power Plant Crew Decision-Making and Performance in a Dynamic Simulation Environment (pp. 283-370). College Park, MD: University of Maryland PhD Dissertation.

CSNI. (2008). HRA Data and Recommended Actions to Support the Collection and Exchange of HRA Data. Committee on the Safety of Nuclear Installations. OECD Nuclear Energy Agency.

CSNI. (2009). Simulator Studies for HRA Purposes. Budapest, Hungary: OECD NEA CSNI.

Dang, V. N., Bye, A., Lois, E., Massaiu, S., Broberg, H., Braarud, P., . . . Nelson, P. (2014). The International HRA Empirical Study Final Report: Lessons Learned from Comparing HRA Methods Predictions to HAMMLAB Simulator Data. United States Nuclear Regulatory Commission Office of Regulatory Research.

Dougherty, Jr., E. M. (1990). Human Reliability Analysis - where shouldst thou turn? Reliability Engineering and System Safety, 29, 283-299.

EPRI. (2013, 12 20). HRA Calculator. (Electric Power Research Institute (EPRI)) Retrieved 4 13, 2017, from https://www.epri.com/#/pages/product/3002002232/

Forester, J. A., Dang, V. N., Bye, A., Boring, R. L., Liao, H., & Lois, E. (n.d.). Conclusions on Human Reliability Analysis (HRA) Methods from the International HRA Empirical Study. US Nuclear Regulatory Commission. Retrieved 4 13, 2017, from https://www.nrc.gov/docs/ML1205/ML120550130.pdf

Fritz, M. S., & MacKinnon, D. P. (2007). Required sample size to detect the mediated effect. Psychological Science, 18(3), 233-239.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515-533.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian inference. In Bayesian Data Analysis (3rd ed., pp. 6-8). Boca Raton, FL: CRC Press.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton, FL: CRC Press.

Gertman, D. I., & Blackman, H. S. (1994). Human Reliability and Safety Analysis Data Handbook. New York, NY: John Wiley & Sons, Inc.

Gertman, D., Blackman, H., Marble, J., Byers, J., & Smith, C. (2005). The SPAR-H Human Reliability Analysis Method. Idaho Falls, ID: Idaho National Laboratory. Retrieved 5 25, 2016, from http://www.nrc.gov/reading-rm/doc-collections/nuregs/contract/cr6883/cr6883.pdf

Groth, K. M. (2009). A Data-Informed Model of Performance Shaping Factors for Use in Human Reliability Analysis. College Park, MD: University of Maryland PhD Thesis.

Groth, K. M., & Mosleh, A. (2012). A data-informed PSF hierarchy for model-based Human Reliability Analysis. Reliability Engineering and System Safety, 108, 154-174.

Groth, K. M., & Mosleh, A. (2012). Deriving causal Bayesian networks from human reliability analysis data: A methodology and example model. Proc IMechE Part O: Journal of Risk and Reliability, 226(4), 361-379.

Groth, K. M., & Swiler, L. P. (2012). Bridging the gap between HRA research and HRA practice: A Bayesian network version of SPAR-H. Reliability Engineering and System Safety, 115, 33-42.

Groth, K. M., Smith, C. L., & Swiler, L. P. (2014). A Bayesian method for using simulator data to enhance human error probabilities assigned by existing HRA methods. Reliability Engineering and System Safety, 128, 32-40.

GSES. (2015). Generic Pressurized Water Reactor. GSE Systems. Retrieved 4 23, 2017, from http://www.gses.com/training-applications/#NUCLEAR-GPWR

Gupta, A. (2013). Development of Boiling Water Reactor Nuclear Power Plant Simulator for Human Reliability Analysis Education and Research. Columbus, OH: Ohio State University Master's Thesis.

Hakon, J. (n.d.). Overview displays for generic PWR simulator. (Institute for Energy Technology (IFE)) Retrieved 3 16, 2017, from https://www.ife.no/en/ife/departments/software-engineering/products/overview-displays-for-generic-pwr-simulator

Hallbert, B., Morgan, T., Hugo, J., Oxstrand, J., & Persensky, J. J. (2014). A Formalized Approach for the Collection of HRA Data from Nuclear Power Plant Simulators. Idaho Falls, ID: US Nuclear Regulatory Commission Office of Regulatory Research.

Hallbert, B., Whaley, A., Boring, R., McCabe, P., & Chang, Y. (2006). Human Event Repository and Analysis (HERA): The HERA Coding Manual and Quality Assurance. Washington, DC: US NRC.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). Model assessment and selection. In The Elements of Statistical Learning: Data Mining, Inference, and Prediction (pp. 219-260). New York, NY: Springer.

Hirschberg, S. (2004). Human Reliability Analysis in Probabilistic Safety Assessment for Nuclear Power Plants. OECD NEA CSNI.

Hollnagel, E. (1998). Cognitive Reliability and Error Analysis Method (CREAM). Elsevier Science.

Hooper, D., Coughlan, J., & Mullen, M. (2008). Structural equation modelling: Guidelines for determining model fit. Electronic Journal of Business Research Methods, 6(1), 53-60.

Hsueh, K.-S., & Mosleh, A. (1996). The development and application of the accident dynamic simulator for dynamic probabilistic risk assessment of nuclear power plants. Reliability Engineering and System Safety, 297-314.

Kenny, D. A. (2016, 5 22). Mediation. Retrieved 6 17, 2016, from http://www.davidakenny.net/cm/mediate.htm

Kim, M. C., Seong, P. H., & Hollnagel, E. (2006). A probabilistic approach for determining the control mode in CREAM. Reliability Engineering and System Safety, 91, 191-199.

Krishnamoorthy, K. (2015). Noncentral chi-square distribution. In Handbook of Statistical Distributions with Applications (p. 259). Chapman and Hall/CRC.

Kutner, M., Nachtsheim, C. J., & Neter, J. (2008). Remedial measures for evaluating precision in nonstandard situations: Bootstrapping (pp. 458-464). Singapore: McGraw-Hill Education.

LaChance, J., Cardoni, J., Mosleh, A., Aird, D., Helton, D., & Coyne, K. (2012). Discrete Dynamic Probabilistic Risk Assessment Model Development and Application. Albuquerque, NM: Sandia National Laboratories.

Li, Y. (2013). Modeling and Simulation of Operator Knowledge-Based Behavior. College Park, MD: University of Maryland PhD Dissertation.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130-149.

MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation analysis. Annual Review of Psychology, 58, 593-614.

McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276-282.

Mkrtchyan, L., Podofillini, L., & Dang, V. N. (2015). Bayesian belief networks for human reliability analysis: A review of applications and gaps. Reliability Engineering and System Safety, 139, 1-16.

Musharraf, M., Bradbury-Squires, D., Khan, F., & Veitch, B. (2014). A virtual experimental technique for data collection for a Bayesian network approach to human reliability analysis. Reliability Engineering and System Safety, 132, 1-8.

Noldus. (2017). Observer XT. Retrieved 4 23, 2017, from http://www.noldus.com/human-behavior-research/products/the-observer-xt

NRC. (2011). Regulatory Guide 1.149: Nuclear Power Plant Simulation Facilities for Use in Operator Training, License Examinations and Applicant Experience Requirements. Washington, DC: US Nuclear Regulatory Commission.

NS Tutorial: Developing Safety. (n.d.). (IAEA Regulatory Control of Nuclear Power Plants) Retrieved April 9, 2016, from https://www.iaea.org/ns/tutorials/regcontrol/chapters/develop.pdf

Plummer, M., Stukalov, A., & Denwood, M. (2016, 2 20). rjags: Bayesian Graphical Models using MCMC. Retrieved 4 18, 2017, from https://cran.r-project.org/package=rjags

Prussia, G. E., Brown, K. A., & Willis, P. G. (2003). Mental models of safety: Do managers and employees see eye to eye? Journal of Safety Research, 34, 143-156.

Raykov, T., & Marcoulides, G. A. (2000). A First Course in Structural Equation Modeling. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Rigdon, E. (1996, 4 11). The Form of Structural Equation Models. Retrieved 6 17, 2016, from http://www2.gsu.edu/~mkteer/sem2.html

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1-36.

Rubin, M. (2016, 12 28). The Perceived Awareness of the Research Hypothesis Scale: Assessing the influence of demand characteristics. Retrieved 5 5, 2017, from https://doi.org/10.6084/m9.figshare.4315778.v2

Shirley, R. B., Smidts, C., Boring, R., Mosleh, A., & Li, Y. (2015). Science-Based HRA: Experimental comparison of operator performance to IDAC simulations. 9th International Conference on Nuclear Plant Instrumentation, Control & Human-Machine Interface Technologies. Charlotte, NC.

Shirley, R. B., Smidts, C., Wang, N., & Gupta, A. (2014). Elimination of bias in NPP control room simulator experiments. 2014 American Nuclear Society Winter Meeting. Anaheim, CA.

Skrondal, A., & Rabe-Hesketh, S. (2005). Structural equation modeling: Categorical models. Retrieved 6 10, 2016, from Encyclopedia of Statistics in Behavioral Science: http://www.gllamm.org/SEMcat.pdf

Smidts, C., Shen, S. H., & Mosleh, A. (1997). The IDAC cognitive model for the analysis of nuclear power plant operator response under accident conditions: Part I: Problem solving and decision making model. Reliability Engineering and System Safety, 55, 51-71.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, 64(4), 583-639.

Spurgin, A. J. (1990). Operator Reliability Experiments Using Power Plant Simulators. Palo Alto, CA: Electric Power Research Institute.

Spurgin, A., & Petkov, G. (2005). Advanced simulator data mining for operators' performance assessment. Studies in Computational Intelligence, 5, 487-514.

Sundaramurthi, R., & Smidts, C. (2013). Human reliability modeling for the Next Generation System Code. Annals of Nuclear Energy, 52, 137-156.

Wang, N., Benish, R., & Smidts, C. (2014). A preliminary analysis on Performance Shaping Factors in a Science-Based HRA validation method. ANS Student Conference. State College, PA.

Westland, J. C. (2010). Lower bounds on sample size in structural equation modeling. Electronic Commerce Research and Applications, 9, 476-487.

Weston, L., Whitehead, D., & Graves, N. (1987). Recovery Actions in PRA for the Risk Methods Integration and Evaluation Program (RMIEP), Vols. 1 & 2. Washington, DC: U.S. Nuclear Regulatory Commission.

Yang, Z., Bonsall, S., Wall, A., Wang, J., & Usman, M. (2013). A modified CREAM to human reliability quantification in marine engineering. Ocean Engineering, 58, 293-303.


Appendix A: PSF Definitions

FULL PSF BN – PSF DEFINITIONS PSF Percent Definition PSF data (Major PSFs listed “Present” collection in in bold) in Events OSU Experiments

I. IDAC 1.0 PSFs (definitions adapted from [22]) Time Constraint 19% Present in scenarios where the operator is short on time and must Dynamic – Load perform under the resulting pressure and restrictions. Heavily video review influences the stress levels of the crew members. Stress 45% Pressure induced on the operator by the major PSFs Dynamic – video review Task Related load 24% Present in cases where the operator is overwhelmed with a lot of Dynamic – (also called work. Time constraint may or may not be present in the scenario. video review Cognitive Task Some conditions that result in increased task-related load are an Load) increase in difficulty of the task to be performed, realization of commission of errors, and an increase in effort requirement from the operator. Global Condition 44% A measure of the system criticality as perceived by the operator. Dynamic – Assessment Operator’s perceived seriousness of the plant situation will be video review assessed based on his assessment of the resources available that could be used to control the plant. The global condition assessment will vary amongst operators as it depends on their skill levels. Attention 3% The awareness and concentration levels of the operator. Dynamic – video review Team training 75% The level of training that the crew possesses. It is assessed by the working Static; operator efficiency of the crew. A quantitative measure is the number of hours the background crew has worked together. survey Group 42% This PSF indicates the level of understanding, communication, and Static; operator Cohesiveness teamwork in the crew. It reflects how well the crew members get along background with each other. survey MMI 10% The ergonomics and design of the control room. Involves the design Static; operator criteria such as the location of alarms (such that they are easily visible), background the workstation location of crew members, ease of operation of controls, survey design of visual display monitors, and so on. Control Room 3% Any environmental distractions that the operator may face, including Static; post- Distractions excessive noise, uncomfortable temperatures, radiation etc. scenario survey Procedure 43% The level of clarity of instructions and procedures from manual and other Static; post- Quality/importance written sources that the operator follows. Ambiguity in the directions scenario leads to poor procedure quality. survey Favorable operation 17% The comfort level that the operator experiences arising out of his Static; post- schedule scheduled work hours. For instance, presence of operators working scenario overtime will be recognized as an unfavorable operating schedule. survey

Continued

233

Full PSF BN – PSF Definitions Continued

Operator Group 3% A measure of the morale of the operator depending upon his position in Static; operator Status the crew and organization. The operator group status can influence the background person’s ability to take initiative or get his opinion accepted in the crew. survey Safety Culture 72% An organizational characteristic. Safety culture is a reflection of the Not evaluated attitude that the organization has towards implementation of safety – consistent for measures in the nuclear power plant. It involves how strictly these all scenarios measures are adopted and how the management deals with any violation. At times, the safety culture can be an inconvenience. For example, the operator may delay executing the right action as he has to go through a safety routine such as a checklist and this may lead to loss of precious time in emergency situations. Failed Diagnosis 17% The number of unsuccessful diagnoses that the operator has made. It is a Dynamic; measure of resource loss, since a failed diagnosis indicate the loss of researcher/ precious time in emergency situations. observer assessment Failed 12% The number of unsuccessful strategies that the operator has used. It is also Dynamic; strategies/actions a measure of resource loss (time). Both the number of failed diagnosis and researcher/ failed strategies highly influence the operator’s cognitive state. observer assessment Operator 64% The operator’s experience/ability at the job as evaluated by his crew Static; course Ascendancy members. Thus an operator with several years of experience is trusted by grades his crew members to be capable of doing his job efficiently. Such operators are said to have high ascendancy.

II. IDAC 3.0 PSFs (definitions adapted from [17]) Cognitive Task [Not The cognitive load associated with a particular procedure step. Cognitive Dynamic; Load evaluated] task load is evaluated for six types of tasks: Attention to an indicator, researcher interpretation of an indicator, generation of a situational statement, assessment matching a statement in memory with an investigation item, retrieving a (2014), video knowledge link, and determining an explanation review (2015) Diagnosis [Not The complexity associated with making a diagnosis of an accident or Dynamic; Complexity evaluated] event. Measured as a function of operator expertise, system dynamics, and calculated number of confusing indications from the plant. (2014), video review (2015)

SIMPLIFIED PSF NETWORK DEFINITIONS Element Definition (adapted from [17])

Surrogates - Values of critical parameters, etc. used to estimate the strength of performance shaping factors. Alarms Alarms activated by plant conditions Plant Dynamics The number of plant parameters currently changing and their rate of change. Operator Confusion An operator's confusion about the current diagnosis, based on what the operator believes to be the plant state and any indicators that are contrary to that plant state Operator Activities The complexity/difficulty of the operator's current task. Activities include: [1] Attend to one control panel indicator, [2] Interpret one indicator reading, [3] Generate a new situational statement, [4] Match a statement in memory with an investigation item, [5] Retrieve a knowledge link, [6] Determine an explanation

Operator Characteristics

Expertise: The operator's experience and expertise in the plant.
Decision-Making Style: How the operator typically makes decisions. Does the operator tend to quickly jump to conclusions, spend a long time confirming a diagnosis, or jump from one diagnosis to another without fully investigating each possible explanation?
Prioritization: How the operator prioritizes new information, information already in working memory, and seeking out specific information to contribute to the decision-making process.
Investigation Termination Criteria: A plant phenomenon might have two or more possible causes. Once the operator finds a valid cause, what criteria determine whether he/she continues to pursue other possible explanations?
Integrating New Evidence: An operator's ability to incorporate new information into the "mental picture" of the plant situation.
Accident Awareness Threshold: The operator's vigilance level for potential accidents.
Diagnosis Confidence Threshold: The point at which an operator declares an accident condition. High thresholds require stronger evidence to support the diagnosis; thus, it might take a longer time to declare an accident. This is a measure of the operator's prudence in declaring an accident.
Activeness Gathering Evidence: After making a potential diagnosis, the operator looks for additional signals to confirm the diagnosis.

Performance Shaping Factors

Time Constraint Load: The pressure induced by the perception of the available time to complete a task; also referred to as time pressure and time stress.
Passive Information Load: Stimuli that catch the operator's attention automatically (e.g., the alarms in the control room). Because passive information is intrusive and grabs one's attention, it interrupts the ongoing cognitive process. Too much passive information can be overwhelming. In addition to causing mental stress, it shifts one's attention and impedes the ability to refocus.
Cognitive Task Load: The cognitive demand associated with the operator's current task.
Complexity of Diagnosis: The difficulty or complexity of diagnosing the current plant situation. Complexity increases as confusion increases or as plant dynamics increase; complexity decreases with operator expertise.
Stress: The stress on the operator.
Fatigue: This includes mental fatigue, physical sleepiness, and fatigue due to a lack of motivation/activity.
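The Complexity of Diagnosis entry asserts three monotone relationships. As a toy illustration only (the functional form and the unit-interval normalization are assumptions, not the IDAC formula), one form consistent with those relationships is:

    def diagnosis_complexity(confusion, plant_dynamics, expertise):
        """Toy form: rises with confusion and plant dynamics (both in
        [0, 1]) and falls as expertise (in [0, 1]) grows. Illustrative
        only; not the IDAC calculation."""
        return (confusion + plant_dynamics) / (1.0 + expertise)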

Manifestations in the IDAC Simulation

Cognitive Resource Use: The mental effort an operator uses to solve/address a task or problem. Cognitive resource use is expected to increase as the task complexity increases and as the accident severity increases.
Information Processing Speed: When the motivation to work fast is high, more cognitive resource is used, and hence the processing speed increases. But the processing speed has a cap, because cognitive resource is limited; this cap is adjusted by the fatigue level.
Alarm Stack Length: The number of alarms held in an operator's working memory.
Routine Monitoring Interval Multiplier: Operators routinely monitor a set of key parameters. This is the time interval between checking parameter values.
Attention Span: An operator's ability to focus/pay attention to one train of thought.
Memory Span: An operator's ability to hold multiple items in working memory.
Decay Time of Uninvestigated Items: The likelihood/rate of forgetting items (such as notifications) that have not been integrated into the operator's current decision-making/thought processes.
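The Information Processing Speed entry describes a speed that rises with the cognitive resource devoted to the task but saturates at a fatigue-adjusted cap. A minimal sketch of that mechanism, with an assumed linear form and constants chosen only for illustration:

    def processing_speed(resource_use, fatigue, base=1.0, gain=0.5):
        """resource_use and fatigue in [0, 1]; returns a speed multiplier.
        The cap shrinks as fatigue grows, bounding the benefit of added
        effort. Form and constants are illustrative assumptions."""
        cap = base * (2.0 - fatigue)
        return min(base + gain * resource_use, cap)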


Appendix B: Questionnaires

[Questionnaires appear on pages 236-249 of the original document.]

Appendix C: Scenario Narratives

[Scenario narratives appear on pages 250-254 of the original document.]

Appendix D: OSU Environmental CTL Coding


Task-type key (from the original column headings): 1 = attend to one control panel (CP) indicator; 2 = interpret one indicator reading; 3 = generate a situational statement; 4 = match a statement in memory with an investigation item; 5 = retrieve a knowledge link; 6 = determine an explanation. Each row lists the procedure, step, task, and the coded task counts in parentheses; steps with no parenthetical carried no coded counts.

E-0  0.1  RCP Trip Criteria: SI flow > 200 gpm | RCS Press < 1400 psig  (2 2 1)
E-0  0.2  Alternate Miniflow: RCS Press < 1800 | > 2200  (1 1 1)
E-0  0.3  RHR restart: RCS Press < 230  (1)
E-0  0.4  SG AFW Iso: SG lvl uncontrolled rise, radiation; lvl > 25%  (3 3 1)
E-0  0.5  AFW switchover supply: CST < 10%  (1 1 1)
E-0  1  verify reactor trip  (1 1 1)
E-0  2  turbine trip  (1 1 1)
E-0  3  AC buses ENERGIZED  (1 1 1)
E-0  4  SI actuated  (1 1 1)
E-0  5  skip  (1)
E-0  6  CSIPs running  (1 1 1)
E-0  7  RHR pumps running  (1 1 1)
E-0  8  SI > 200 gpm  (1 1 1)
E-0  9  RCS press < 230  (1 1 1)
E-0  10  RHR HX > 1000: SKIP  (1)
E-0  11  SKIP  (1)
E-0  12  MS ISOL  (1 1 1)
E-0  13  MSIVs/bypass shut  (1 1 1)
E-0  14  SG press 100 psig < other SG press  (3 1 1)
E-0  15  MDAFW/TDAFW isol SHUT  (2 2 1)
E-0  16  cnmt press < 10  (1 1 1)
E-0  17  AFW > 210 kpph  (3 3 1)
E-0  18  sequencer load block 9 actuated  (1 1 1 1)
E-0  19  energize AC buses 1A1, 1B1  (1 1 1 1)
E-0  20  SKIP  (1)
E-0  21  maintain temp btw 555, 559 F  (5 5 2 1)
E-0  22  PRZ PORVs shut  (1 1 1)
E-0  23  PRZ spray shut  (1 1 1)
E-0  24  PRZ PORV block valves open  (1 1 1 1)
E-0  25  SG press dropping uncontrolled  (3 1 1 1)
E-0  26  go to E-2, faulted SG iso  (1)
E-0  27  SG abnormal rad or uncontrolled lvl rise  (4 1 1 1)
E-0  28  check FW to ruptured SG isolated  (3 1 1)
E-0  29  go to E-3, SGTR
E-0  30  cnmt press normal  (1 1 1 1 1)
E-0  31  cnmt wide range sump normal  (1 1 1 1)
E-0  32  cnmt rad normal  (1 1 1 1 1)
E-1  1  monitor CSFSTs  (1)
E-1  2  maintain RCP seal btw 8, 13 gpm  (3 1 1 1)
E-1  3  check intact SG levels  (3 1 1)
E-1  0.1  RCP Trip Criteria: SI flow > 200 gpm | RCS Press < 1400 psig  (2 2 1)
E-1  0.2  AFW switchover supply: CST < 10%  (1 1 1)
E-1  0.3  RHR restart: RCS Press < 230  (1)
E-1  0.4  Alternate Miniflow: RCS Press < 1800 | > 2200  (1 1 1)
E-1  0.5  Secondary Integrity: SG pressure drops  (3 1 1 1)
E-1  0.6  E-3 Transition: SG level rise  (3 1 1 1)
E-1  0.7  Cold Leg Recirc  (1 1 1 1)
E-1  4  PZR PORV & block valves  (4 2 1 1)
E-1  5  SI termination criteria: RCS subcooling, secondary heat sink (SG lvl, total FW), RCS press, PRZ lvl  (4 3 1)
E-1  6  cnmt spray status  (1 1 1)
E-1  7  source range detectors  (2 2 1 1)
E-1  8  RHR pump status  (4 1 1 1)
E-1  9  RCS, SG pressures stable or trending good  (4 2 1)
E-1  10  CCW flow to RHR HX  (4 4 4)
E-3  0.1  Alternate Miniflow: RCS Press < 1800 | > 2200  (1 1 1)
E-3  0.2  RHR restart: RCS Press < 230  (1)
E-3  0.3  SI reinitiation: RCS subcooling, PZR lvl  (2 2 1)
E-3  0.4  Cold Leg Recirc  (1 1 1 1)
E-3  0.5  Secondary Integrity: SG pressure drops  (3 1 1 1)
E-3  0.6  Multiple tube rupture: SG lvl rise  (2 2 1 1)
E-3  0.7  AFW switchover supply: CST < 10%  (1 1 1)
E-3  1  monitor CSFSTs  (1)
E-3  2  RCP running  (1 1 1)
E-3  3  RCP Trip Criteria: SI flow > 200 gpm | RCS Press < 1400 psig  (2 2 1)
E-3  4  ruptured SG identified  (3 3 1 1)
E-3  5  adjust SG PORV to 88%, AUTO  (1 1 1 1)
E-3  6  ruptured SG PORV shut  (1 1 1)
E-3  7  MDAFW flow available to intact SGs  (2 2 1)
E-3  8  shut ruptured SG steam supply valve  (2 1 1 1)
E-3  9  verify blowdown isolation valves shut  (1 1 1)
E-3  10  shut ruptured SG MS drain iso before MSIV  (1 1 1 1)
E-3  11  shut ruptured SG MSIV and bypass  (2 2 1)
E-3  12  ruptured SG faulted  (1 1)
E-3  13  ruptured SG needed for cooldown  (1 1)
E-3  14  ruptured SG level > 25%  (1 1 1)
E-3  15  FW to ruptured SG isolated  (2 1 1 1)
E-2  1  monitor CSFSTs  (1)
E-2  2  verify MSIVs shut  (1 1 1)
E-2  3  verify MSIV bypass shut  (1 1 1)
E-2  4  any SG pressure stable/rising (not faulted)  (3 3 1 1)
E-2  5  any SG pressure dropping/depressurized  (3 1 1)
E-2  6  isolated faulted SG  (8 8 1)
E-2  7  CST level > 10%  (1 1 1)
E-2  8  any SG abnormal rad/uncontrolled level rise  (3 3 1 1)
E-0.1  1  check RCP seal isolation status  (4 2 2)
APP 10, 5-4  1  confirm RAB/TB stack RAD MONITOR TROUBLE  (2 2 1)
OST 1009  1  Attachment 1  (1 1 1)
AOP 20  1  check RCS leak exists  (1 1)
APP 006, 1-1  1  confirm charging pumps discharge high/low flow  (1 1 1)
APP 009, 2-2  1  confirm alarm PRZ low level deviation  (3 3 1 1)
APP 009, 2-2  2  verify automatic actions
APP 009, 2-2  3  corrective actions (many)  (3 3 3 3 1)
APP 009, 3-3  1  confirm PRZ low press/heaters on  (4 3 1)
APP 009, 3-3  2  verify backup heaters energized  (1 1 1)
APP 009, 3-3  3  corrective actions (many)  (3 3 1 1 1 1)
APP 009, 5-1  1  confirm PZR press high/low  (1 1 1)
APP 009, 5-1  2  verify automatic actions
APP 009, 5-1  3  corrective actions (many)  (4 4 1 1 1 1)
APP 009, 8-1  1  confirm PRT high/low lvl/press/temp  (3 3 3)
APP 009, 8-1  2  automatic actions
APP 009, 8-1  3  corrective actions (many)  (3 3 3 3 1)
APP 009, 8-2  1  confirm PRT discharge high temp  (4 4 4)
APP 009, 8-2  2  verify automatic actions
APP 009, 8-2  3  corrective actions (many)  (3 3 3 1)
OP 100, 8.2.2  1  maintain PRT press < RCS press  (2 2 1 1 1)
OP 100, 8.2.2  2  establish comm btw MCR, radwaste, PRT manual vent
OP 100, 8.2.2  3  shut PRT regulator isol valve (manual, remote)  (1)
AOP 005  1  check RAD MON not in high alarm  (1 1 1 1)
AOP 005  2  notify Health Physics  (1)
AOP 005  3  check stack monitor rad not alarmed  (1 1 1 1)
AOP 005  4  check process mon not in alarm  (1 1 1)
AOP 005  5  refer to tech specs  (1)
AOP 005  6  refer to app attachment…  (1 1)
AOP 16  1  RHR in operation  (1 1 1 1)
AOP 16  2  skip
AOP 16  3  refer to PEP-110  (1)
AOP 16  4  RCS leakage within VCT makeup  (3 3 3 2 1)
AOP 16  5  maintain VCT > 5%  (1 1 1)
AOP 16  6  go to 10
AOP 16  7  align CSIP suction to RWST  (4 4 4)
AOP 16  8  maintain VCT level on scale at indicator  (2 2 2)
AOP 16  9  when VCT > 10%  (6 6 1)
AOP 16  10  check valid CNMT vent monitors clear  (4 4 1 1)
AOP 16  11  check RCS leak rad monitor CLEAR  (1 1 1 1)
AOP 16  12  check all area rad monitors clear  (1 1 1 1)
AOP 16  13  check stack rad mon clear  (1 1 1 1)
AOP 16  14  determine evacuation  (1)
AOP 16  15  notify chem to stop sampling RCS
AOP 16  16  qualitative RCS flow balance  (5 5 1 1 1)
No Procedure  0  (1 1 1 1)
APP 10, 4-5  1  confirm RAD MON alarm  (1 1 1 1)
APP 10, 4-5  3  corrective actions (many)  (1 1 1 1 1)
E-2  9  go to E-3, SGTR  (1 1)
E-2  10  check SI terminated  (2 2 1 1 1)
E-2  12  go to E-1, LOCA  (1 1)
E-3  16  go to 18  (1)
E-3  17  stop feed flow; shut MDAFW, TDAFW to ruptured SG  (2 2 1)
E-3  18  check steam supply TDAFW from rup SG shut  (2 2 1 1)
E-3  19  ruptured SG press > 260  (1 1 1)
E-3  20  PZR press < 2000  (1 1 1)
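As a rough illustration of how a coding table like this can be used downstream, the sketch below sums one step's coded counts into a single load score. The equal task-type weights and the example mapping of counts to task types are assumptions for illustration; IDAC assigns its own demand values to the six task types.

    # Task types keyed as in the table legend above.
    TASK_TYPES = ("attend_indicator", "interpret_reading",
                  "generate_statement", "match_investigation",
                  "retrieve_knowledge", "determine_explanation")

    def step_ctl(counts, weights=None):
        """Weighted sum of coded task counts for one procedure step.
        Equal weights are a placeholder assumption."""
        weights = weights or {t: 1.0 for t in TASK_TYPES}
        return sum(weights[t] * counts.get(t, 0) for t in TASK_TYPES)

    # Example: E-0 step 1, "verify reactor trip", assuming its three unit
    # counts fall on task types 1-3.
    print(step_ctl({"attend_indicator": 1, "interpret_reading": 1,
                    "generate_statement": 1}))  # -> 3.0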

Appendix E: ABC Model Fit 95% CI Plots

Legend:

• Red: Reported PSF
• Blue: Environmental PSF
• Gray: Posterior PSF Model 95% Credibility Interval

Plot Labels:

• Session (PA, PB, EA, EB ~ Practice A, Practice B, Exam A, Exam B)
• Crew Number (1-8)
• Role (ODM: Operator-Decision Maker/Senior Reactor Operator; OAT: Operator-Action Taker/Reactor Operator)
• PSF
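For orientation, the sketch below draws one panel in the style the legend describes: a red reported-PSF trace, a blue environmental-PSF trace, and a gray posterior 95% credibility band. The synthetic data, panel label, and matplotlib styling are illustrative assumptions, not the code used to produce the plots that follow.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    t = np.arange(20)                              # video-review checkpoints
    reported = np.clip(5 + np.cumsum(rng.normal(0, 0.4, t.size)), 0, 10)
    environmental = np.clip(reported + rng.normal(0, 0.8, t.size), 0, 10)
    post_mean = 0.5 * (reported + environmental)   # stand-in posterior mean
    half_width = 1.0                               # stand-in 95% CI half-width

    fig, ax = plt.subplots()
    ax.fill_between(t, post_mean - half_width, post_mean + half_width,
                    color="gray", alpha=0.4, label="Posterior 95% CI")
    ax.plot(t, reported, color="red", label="Reported PSF")
    ax.plot(t, environmental, color="blue", label="Environmental PSF")
    ax.set_title("EA Crew 3 ODM: Stress")          # session / crew / role / PSF
    ax.set_xlabel("Checkpoint")
    ax.set_ylabel("PSF strength")
    ax.legend()
    plt.show()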

[Model fit plots appear on pages 263-270 of the original document.]