Doctoral Thesis

On Some Aspects of Bayesian Modeling in Reliability

THESIS submitted for the award of the degree of

DOCTOR OF PHILOSOPHY in STATISTICS

By: Md. Tanwir Akhtar
Under the Supervision of: Prof. Athar Ali Khan

Department of Statistics and Operations Research
Aligarh Muslim University
Aligarh-202002, India
2015

List of Papers

Papers included in the thesis

Published Papers

I. Akhtar, M. T. and Khan, A. A. (2014) Bayesian analysis of generalized log-Burr family with R. SpringerPlus, 3:185.

II. Akhtar, M. T. and Khan, A. A. (2014) Log-logistic distribution as a reliability model: A Bayesian analysis. American Journal of Mathematics and Statistics, 4(3): 162-170.

Communicated Paper

I. Akhtar, M. T. and Khan, A. A. (2015) Hierarchical Bayesian approach with R and JAGS: Some Reliability Applications. Journal of Data Science.

Paper not included in the thesis

Published Papers

I. Khan, Y., Akhtar, M. T., and Khan, A. A. (2015) Bayesian Modeling of Forestry Data with R Using Optimization and Simulation Tools. Journal of Applied Analysis and Computation, 5(1): 38-51.

Dedicated to My Family Especially My Father

Dr. (Late) Md. Akhtar Hassan

Preface

There has been a dramatic growth in the development and application of Bayesian statistics. One reason behind this growth is the availability and development of computational algorithms to compute the range of integrals that are necessary in a Bayesian posterior analysis. Due to the speed and smoothness of modern computational tools and techniques, it is now possible to use the Bayesian paradigm to fit any statistical model whose likelihood and priors are specified, even very complex models that cannot be fit by alternative frequentist methods.

Since the mid-1980s, the development of widely accessible, powerful computational tools and the implementation of Markov chain Monte Carlo (MCMC) methods have led to an explosion of interest in Bayesian statistics and modeling. This was followed by extensive research into new Bayesian methodologies, generating practical applications of complicated models over a wide range of sciences. During the late 1990s, BUGS emerged into the foreground. BUGS is a free software package that can fit complicated models in a relatively easy manner, using standard MCMC methods. Since their releases in 1998 and 2003, respectively, WinBUGS and JAGS have earned great popularity among researchers of diverse scientific fields.

The present thesis, entitled On Some Aspects of Bayesian Modeling in Reliability, is a brief collection of modern methods and techniques for the modeling of reliability data from a Bayesian perspective. Bayesian modeling means the process of fitting a probability model to a set of data and summarising the results by a probability distribution on the parameters of the model and on unobserved quantities, such as predictions of new observations. The acceptance and applications of Bayesian methods in virtually all branches of science and engineering

have significantly increased over the last few decades. This increase is largely due to advances in simulation based computational tools for implementing Bayesian methods. Such tools are extensively used in this thesis.

This thesis focuses on assessing the reliability of components or of systems of components, with particular attention to models containing covariates (also called explanatory or regressor variables). Such models include failure time linear models, generalised linear models, and hierarchical models. Throughout the thesis, the Laplace approximation, the sampling importance resampling algorithm, and MCMC methods are used for implementing Bayesian analyses. MCMC makes the Bayesian approach to reliability computationally feasible and straightforward. This is an important advantage in complex settings, e.g., hierarchical modeling, where the classical approach fails or becomes too difficult for practical implementation.

The objective of the present thesis is to apply Bayesian methods to the modeling of reliability data (discrete as well as continuous), with emphasis on model building and model implementation using the highly acclaimed, freely available LaplacesDemon and JAGS software packages, as run from within R. The bulk of this thesis forms a progression from the trivially simple to the moderately complex and covers linear, generalized linear, and hierarchical models.

For fitting models to reliability data, one needs a statistical computing environment. This environment should be such that one can write scripts to define a Bayesian model, that is, to specify the likelihood and prior distributions, use or write functions to summarize a posterior distribution, use functions to simulate from the posterior distribution, and finally construct numerical as well as graphical summaries to illustrate the posterior inference. An environment that meets these requirements is the R system, which provides a wide range of functions for data manipulation, calculation, and graphics. Moreover, it includes a well-developed, simple programming language that users can extend by adding new functions.

LaplacesDemon and JAGS are called from within R. JAGS is called from within R via its interface, the R2jags package.

Complete R, LaplacesDemon, and JAGS code concerning model building, prior specification, and model fitting is provided for all analyses, together with the interpretation of the LaplacesDemon and JAGS output. All analyses using JAGS are directly compared with analyses of the same data using LaplacesDemon or standard R functions such as bayesglm or glmer.

This thesis comprises four chapters, with a comprehensive bibliography provided at the end.

Chapter 1 is an introductory chapter. It covers basic concepts common to all Bayesian analyses, including Bayes' theorem, the building blocks of Bayesian statistics, hierarchical Bayes, the posterior predictive distribution, and model goodness of fit. Basic definitions of reliability, including the reliability function, hazard function, and mean time to failure, together with the concept of censoring in the Bayesian paradigm, are also included. This chapter also covers tools and techniques for Bayesian computation. Analytic approximation and simulation methods are covered, but most of the emphasis is on the Laplace approximation and on MCMC methods, including the Metropolis-Hastings algorithm and the Gibbs sampler, which are currently the most powerful and popular techniques. An introduction to the software packages R, LaplacesDemon, and JAGS used for Bayesian computation is also provided in this chapter.

Important parametric reliability models, together with analyses of discrete as well as continuous reliability data at the component level, are presented and developed in Chapter 2. In this chapter, R and JAGS code for binomial, Poisson, and lifetime models is developed for the Bayesian computation of reliability data. Both censored and uncensored data are considered for the Bayesian analysis. Analytic and parallel simulation techniques are implemented to approximate the marginal posterior densities of the model parameters.

Chapter 3 extends the Bayesian computation methods to the standard parametric regression models used in reliability analysis. In particular, logistic, Poisson, and lifetime regression models are analysed from the Bayesian perspective. Real reliability data, which depend on concomitant variables, are used. R and JAGS code is developed for the purpose of Bayesian computation. To approximate the posterior densities, both analytic and simulation methods are used.

Chapter 4 introduces Bayesian computation methods for hierarchical models, which reveal the full power and conceptual simplicity of the Bayesian approach for practical reliability problems. Bayesian analysis of hierarchical models also facilitates the joint analysis of data collected from similar components. For the Bayesian computation of hierarchical models, R and JAGS code is written, and its output is interpreted and provided.

The research work presented in Chapters 2 and 3 of this thesis is based on my two published papers, and the contents of Chapter 4 are based on a communicated paper; these are listed below.

1. Akhtar, M. T. and Khan, A. A. (2014). Bayesian Analysis of Generalized Log-Burr Family with R. SpringerPlus, 3:185.

2. Akhtar, M. T. and Khan, A. A. (2014). Log-logistic Distribution as a Reliability Model: A Bayesian analysis. American Journal of Mathematics and Statistics, 4(3): 162-170.

3. Akhtar, M. T. and Khan, A. A. (2015). Hierarchical Bayesian Approach with R and JAGS – Some Reliability Applications. Journal of Data Science.

Acknowledgements

All praise and thanks are to ALLAH, 'the one universal Being', who inspires entire humanity toward knowledge, truth and eternal commendation, and who blessed me with the strength and passion required to overcome all the obstacles in the way of this toilsome journey. In utter gratitude I bow my head before HIM.

Words are not sufficient for expressing my intensity of sentiments. So, most humbly I express my deep sense of gratitude and indebtedness to my reverend supervisor Professor Athar Ali Khan, who has always been a source of inspiration, and whose constructive criticism, affectionate attitude, strong motivation and constant encouragement have added greatly to the readability and relevance of each chapter. I am grateful to him for unsparingly helping me by sharing his in-depth knowledge of the subject, which has improved this thesis and made it presentable. I will remain ever so grateful to him for his kindness as a teacher in the real sense of the term.

My special thanks to Professor Qazi Mazhar Ali, Chairman, Department of Statistics and Operations Research, Aligarh Muslim University, Aligarh, for his cooperation and encouragement, for furnishing all necessary facilities in the department, and for extending his kind cooperation at every stage of the pursuit of this thesis.

I am cheerfully willing to express my appreciation and regard to all teachers of the department for their moral support and whole hearted cooperation. They have been a source of inspiration during my academic endeavor.

Thanks are also due to all non-teaching staff members of the department for their kind help and cooperation.

Words are inadequate in offering my thanks to affectionate colleagues, especially Ms. Neha Gupta, Ms. Romana Shehla, and Mr. Mohammad Azam Khan.

I would also like to thank my senior and junior colleagues for sharing their enthusiasm, moral support and wholehearted cooperation in my work, and I am equally grateful to all my friends who encouraged me and provided invaluable support and prayer.

I am unable to express my deepest thanks to my family who stood behind me in all the sweet and sour moments of life and were always there to pat my shoulder affectionately. Their love, patience and sacrifice are vested in every page of this study. Whenever I felt tired and sense of despair gripped me, I always found them standing by my side, providing me words of immense consolation.

I would like to thank the developers of the free software used in this work, especially R, LaplacesDemon, and JAGS (for statistical computing and graphics), and LaTeX (used in the writing of this thesis).

I would also like to thank the University Grants Commission for providing me the Maulana Azad National Fellowship.

Last but certainly not least, my heart fills with earnest gratitude as I pay tribute to the Bani-e-Darsgah, Sir Syed Ahmad Khan; may ALLAH bestow eternal peace upon him with Maghfirat and Rahmat.

Date:
Place: Aligarh

Md. Tanwir Akhtar
E-mail: [email protected]

Contents

Preface

Acknowledgements

Acronyms

1 Introduction
    1.1 Introduction to Bayesian Statistics
    1.2 Basic Definition of Statistical Models
    1.3 Basis of Bayesian Statistics
        1.3.1 The Bayes' Theorem
    1.4 Model Based Bayesian Statistics
    1.5 The Building Blocks of Bayesian Statistics
        1.5.1 The Prior Distributions
            1.5.1.1 Informative Priors
            1.5.1.2 Conjugate Priors
            1.5.1.3 Weakly Informative Priors
            1.5.1.4 Least Informative Priors
            1.5.1.5 Uninformative Priors
        1.5.2 The Likelihood
            1.5.2.1 Likelihood Based Inference
            1.5.2.2 Likelihood Function of a Parameterized Model
    1.6 Hierarchical Bayes
        1.6.1 Hierarchical Bayesian Models
    1.7 Prediction in Bayesian Paradigm
    1.8 Model Goodness of Fit

    1.9 Introduction to Reliability
        1.9.1 Basic Definitions of Reliability
            1.9.1.1 Reliability Function
            1.9.1.2 Hazard Function
            1.9.1.3 Relationship among Measures
            1.9.1.4 Mean Time to Failure
            1.9.1.5 Mean Residual Life
        1.9.2 Censoring
            1.9.2.1 Likelihood and Censoring
    1.10 Tools and Techniques
        1.10.1 Asymptotic Approximation Methods
            1.10.1.1 The Normal Approximation
            1.10.1.2 The Laplace Approximation
        1.10.2 Simulation Methods
        1.10.3 Direct Simulation Methods
            1.10.3.1 Monte Carlo Approximation Method
            1.10.3.2 Rejection Sampling
            1.10.3.3 Importance Resampling
            1.10.3.4 Sampling Importance Resampling
        1.10.4 Markov Chain Monte Carlo Methods
            1.10.4.1 Metropolis-Hastings Algorithm
            1.10.4.2 The Independent Metropolis
            1.10.4.3 The Random-walk Metropolis
            1.10.4.4 Gibbs Sampling
            1.10.4.5 Metropolis-within-Gibbs
    1.11 Software Packages Used
        1.11.1 R Language

        1.11.2 LaplacesDemon
            1.11.2.1 The Function LaplaceApproximation
            1.11.2.2 The Function LaplacesDemon
        1.11.3 JAGS
            1.11.3.1 The package R2jags
            1.11.3.2 The Function jags

2 Bayesian Reliability Analysis of Parametric Models
    2.1 Introduction
    2.2 Important Parametric Distributions
        2.2.1 The Binomial Distribution
        2.2.2 The Poisson Distribution
        2.2.3 The Log-normal Distribution
        2.2.4 The Weibull Distribution
        2.2.5 The Log-logistic Distribution
        2.2.6 The Log-location-scale Distribution
        2.2.7 The Generalized Log-Burr Distribution
    2.3 Bayesian Analysis of Discrete Models
        2.3.1 The Binomial Model for Success/Failure Data
        2.3.2 Implementation
            2.3.2.1 The Launch Vehicle Data
            2.3.2.2 Analysis Using R
            2.3.2.3 Analysis Using JAGS
        2.3.3 The Poisson Model for Count Data
        2.3.4 Implementation
            2.3.4.1 The Supercomputer Failure Count Data
            2.3.4.2 Analysis Using R
            2.3.4.3 Analysis Using JAGS

    2.4 Bayesian Analysis of Continuous Models
        2.4.1 The Log-logistic Failure Time Model
        2.4.2 Implementation
            2.4.2.1 The Roller Bearing Failure Time Data
            2.4.2.2 Analysis Using LaplacesDemon
            2.4.2.3 Analysis Using JAGS
        2.4.3 The Generalized Log-Burr Failure Time Model
        2.4.4 Implementation
            2.4.4.1 The Locomotive Controls Data
            2.4.4.2 Analysis Using LaplacesDemon
            2.4.4.3 Analysis Using JAGS
    2.5 Discussion and Conclusion

3 Bayesian Reliability Analysis of Regression Models
    3.1 Introduction
    3.2 Generalised Linear Regression Models
    3.3 The Link Function
    3.4 Regression Analysis of Discrete Models
        3.4.1 Logistic Regression Model for Binomial Data
        3.4.2 Implementation
            3.4.2.1 High-Pressure Coolant Injection System Demand Data
            3.4.2.2 Analysis Using LaplacesDemon
            3.4.2.3 Analysis Using JAGS
        3.4.3 Poisson Regression Model for Count Data
        3.4.4 Implementation
            3.4.4.1 System's Components Reliability Data
            3.4.4.2 Analysis Using LaplacesDemon
            3.4.4.3 Analysis Using JAGS

    3.5 Regression Analysis of Continuous Models
        3.5.1 Log-logistic Regression Model for Lifetime Data
        3.5.2 Implementation
            3.5.2.1 Lifetimes of Steel Specimens Data
            3.5.2.2 Analysis Using LaplacesDemon
            3.5.2.3 Analysis Using JAGS
        3.5.3 Generalized Log-Burr Model for Lifetime Data
        3.5.4 Implementation
            3.5.4.1 Electrical Insulating Fluid Failure Time Data
            3.5.4.2 Analysis Using LaplacesDemon
            3.5.4.3 Analysis Using JAGS
    3.6 Discussion and Conclusion

4 Bayesian Reliability Analysis of Hierarchical Models
    4.1 Introduction
    4.2 Hierarchical Modeling of Binomial Data
        4.2.1 Implementation
            4.2.1.1 Emergency Diesel Generators Demand Data
            4.2.1.2 Analysis Using R
            4.2.1.3 Analysis Using JAGS
    4.3 Hierarchical Modeling of Poisson Data
        4.3.1 Implementation
            4.3.1.1 Nuclear Power Plant Scram Rate Data
            4.3.1.2 Analysis Using R
            4.3.1.3 Analysis Using JAGS
            4.3.1.4 Prediction for a New Response
    4.4 Hierarchical Modeling of Lifetime Data
        4.4.1 Implementation
            4.4.1.1 Bearing Fatigue Failure Times Data

            4.4.1.2 Analysis Using LaplacesDemon
            4.4.1.3 Analysis Using JAGS
    4.5 Discussion and Conclusion

Bibliography

Acronyms

AIC : Akaike Information Criterion
AMWG : Adaptive Metropolis within Gibbs
ANOVA : Analysis of Variance
BFR : Bathtub Failure Rate
BGR : Brooks-Gelman-Rubin
BIC : Bayesian Information Criterion
BMA : Bayesian Model Averaging
BPIC : Bayesian Predictive Information Criterion
BUGS : Bayesian Inference Using Gibbs Sampling
BWR : Boiling Water Reactor
cdf : Cumulative Distribution Function
CFR : Constant Failure Rate
CI : Credible Interval
clog-log : Complementary Log-Log
coda : Convergence Diagnostic and Output Analysis
CRAN : Comprehensive R Archive Network
DFR : Decreasing Failure Rate
DIC : Deviance Information Criterion
EDG : Emergency Diesel Generator
ESS : Effective Sample Size
FPL : Functional Programming Language
GIV : Generate Initial Values
GLM : Generalized Linear Model
glmer : Generalized Linear Mixed Effects in R

GLMM : Generalized Linear Mixed Model
HPCI : High-Pressure Coolant Injection
IFR : Increasing Failure Rate
iid : Independently and Identically Distributed
IM : Independent Metropolis
ISO : International Organization for Standardization
JAGS : Just Another Gibbs Sampler
LBFGS : Limited-Memory Broyden-Fletcher-Goldfarb-Shanno
LIP : Least Informative Prior
lmer : Linear Mixed Effects in R
LMM : Linear Mixed Model
LL : Log-likelihood
LP : Log-posterior
MCMC : Markov chain Monte Carlo
MCSE : Monte Carlo Standard Error
MH : Metropolis-Hastings
MTTF : Mean Time to Failure
MWG : Metropolis within Gibbs
nlmer : Non-linear Mixed Effects in R
NPPSRD : Nuclear Power Plant Scram Rate Data
pdf : Probability Density Function
PMC : Population Monte Carlo
RWM : Random-walk Metropolis
SCR : System's Component Reliability
SD : Standard Deviation
SIR : Sampling Importance Resampling
SMP : Shared Memory Processor
SPG : Spectral Projected Gradient
WIP : Weakly Informative Prior

1 Introduction


1.1 Introduction to Bayesian Statistics

In statistics, there are two broad categories of interpretations of probability: classical (also called conventional or frequentist) and Bayesian. These views often differ from each other on the fundamental nature of probability. In classical statistics, probability is loosely defined as the limit of an event's relative frequency in a large number of trials, and only in the context of experiments that are random and well defined. On the other hand, Bayesian statistics is able to assign probabilities to any statement, even when a random process is not involved. In Bayesian statistics, probability is a way to express an individual's degree of belief in a statement, given the evidence. Bayesian statistics is named after Reverend Thomas Bayes (1701-1761), an English minister and mathematician, for developing Bayes' theorem, which was published posthumously (Bayes, 1763). Bayesian statistics is, in fact, very old and was the dominating school of statistics for a long time. In contrast, the foundations of classical statistics were not really laid until the first half of the twentieth century.

Until the late 1980s, Bayesian statistics was considered only as an interesting alternative to the classical approach. But by the beginning of the 21st century, Bayesian statistics had become fashionable in science. The main difference between classical and Bayesian statistics is that the latter considers the parameters as random variables that are characterized by a prior distribution. This prior distribution is combined with the traditional likelihood to obtain the posterior distribution of the parameter of interest, on which the statistical inference is based (e.g., Ntzoufras, 2009).

More precisely, Bayesian statistics is an approach to statistics which formally seeks to use prior information along with the data, and Bayes' theorem provides the formal basis for combining prior information with data. This approach is entirely

different from the sampling theoretic approach, in which inferences are made about a parameter on the basis of the sampling distribution of the estimator of the parameter of interest. Inference in Bayesian statistics is a simple probability calculation, and one of the things Bayesians are most proud of is the parsimony and internal logic of their framework for inference. Thus, the entire Bayesian theory of inference can be derived using just three axioms of probability (Lindley, 1983, 2006). Bayes' rule can be deduced from them, and the entire framework of Bayesian statistics, such as estimation, prediction, and hypothesis testing, is based on just these three premises. In contrast, classical statistics lacks such an internally coherent body of theory (Kery, 2010).

Although the main tool of Bayesian statistics is probability theory, for many years Bayesians were considered a heretic minority for several reasons. The main objection of frequentists was to the subjective viewpoint of the Bayesian approach, introduced into the analysis through the prior distribution. However, as history has shown, the main reason why Bayesian theory was unable to establish a foothold as a well accepted quantitative approach for data analysis was the intractability involved in the calculation of the posterior distribution. Asymptotic methods had provided solutions to specific problems, but no generalization was possible. Ironically, therefore, for a long time Bayesians thought that they had better solutions in principle than the classical school but unfortunately could not practically apply them to any except very simple problems, for want of a method to solve their problems. A dramatic change to this situation came with the advent of simulation based approaches like Markov chain Monte Carlo and related techniques that draw samples from the posterior distribution; see, for instance, the article entitled "Bayesian Statistics without Tears" by Smith and Gelfand (1992). In the early 1990s, two groups of statisticians rediscovered Markov chain Monte Carlo (MCMC) methods, which, in combination with the rapid evolution of computers, made the new computational tools popular within a few years. Bayesian statistics

suddenly became fashionable, opening new highways for statistical research. Using MCMC, a Bayesian can now set up and estimate complicated models that describe and solve problems that could not be solved with frequentist methods.

In addition to this, more recent advances in genetics have also given new impetus to Bayesian theory. Generally, the large amounts of data, in terms of both sample size and number of variables, have rendered the more traditional methods inapplicable. Hence, Bayesian methods, with the help of simulation techniques like MCMC, are appropriate for the exploration of large model and parameter spaces and for tracing the most important associations.

1.2 Basic Definition of Statistical Models

One of the most important issues in statistical modeling is the construction of probabilistic models that represent, or sufficiently approximate, the true generating mechanism of a phenomenon under study. The construction of such models is usually based on probabilistic and logical arguments concerning the nature and function of a given phenomenon.

Consider a random variable y, called response, which follows a probabilistic rule with density or probability function p(y|θ), where θ is the parameter vector.

Consider an independent, identically distributed (iid) sample y = [y1, . . . , yn]^T of size n of this variable, where A^T denotes the transpose of a vector or matrix A. The joint distribution
$$p(y|\theta) = \prod_{i=1}^{n} p(y_i|\theta) \qquad (1.1)$$
is called the likelihood of the model and contains the available information provided by the observed sample.
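
To make Equation (1.1) concrete, the short R sketch below (an illustration added here, with simulated data and an assumed normal sampling model rather than one of the thesis data sets) evaluates the joint likelihood and its logarithm for an iid sample at a given parameter value.

# Minimal sketch: likelihood of an iid sample under an assumed N(theta, 1) model.
set.seed(1)
y <- rnorm(20, mean = 2, sd = 1)                                  # simulated data

lik    <- function(theta) prod(dnorm(y, mean = theta, sd = 1))    # Equation (1.1)
loglik <- function(theta) sum(dnorm(y, mean = theta, sd = 1, log = TRUE))

lik(2)      # joint likelihood evaluated at theta = 2
loglik(2)   # its logarithm, numerically safer when n is large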

Usually, models are constructed in order to assess or interpret causal relationships

between the response variable y and various characteristics expressed as variables

X_j, j ∈ ν, called covariates or explanatory variables; j indicates a covariate or model term and ν is the set of all terms under consideration. In such cases, the explanatory variables are linked with the response variable via a deterministic function, and part of the original parameter vector is substituted by an alternative set of parameters (denoted by β) that usually encapsulate the effect of each covariate on the response variable. For example, in a normal regression model with y ∼ N(Xβ, σ²I), the parameter vector is given by θ^T = [β^T, σ²].

1.3 Basis of Bayesian Statistics

The problem stated in Bayes’ famous paper (Bayes, 1763) involves two key ingredients. One is the use of probability as a means of expressing uncertainty about an unknown quantity of interest. The other is the conditional nature of the problem: what Bayes was interested in evaluating was the conditional probability of failures in a single trial, given some data on the previous number of failures. Put another way, he wanted to learn about the failure probability on the basis of observed data. In modern language, this translates to requiring p(θ|y, n), where θ is the unknown failure probability and we have observed data on y failures out of n binomial trials. Bayes proposed a theorem (easily provable from the axioms of probability) relating conditional and marginal probabilities of random variables which he used to calculate the required conditional probability for his problem (Lunn et al., 2013).

Unaware of Bayes, Pierre-Simon Laplace (1749-1827) independently developed Bayes' theorem and first published his version in 1774, eleven years after Bayes, in one of his first major contributions (Laplace, 1774, p. 366-367). In 1812, Laplace introduced a host of new ideas and mathematical techniques in his book,

Theorie Analytique des Probabilites (Laplace, 1812). Thereafter, in 1814, Laplace published his Essai Philosophique sur les Probabilites, which introduced a mathematical system of inductive reasoning based on probability (Laplace, 1814). In this work, the Bayesian interpretation of probability was developed independently by Laplace, much more thoroughly than by Bayes, so some Bayesians refer to Bayesian inference as Laplacian inference. Before Laplace, probability theory was solely concerned with developing a mathematical analysis of games of chance; Laplace applied probabilistic ideas to many scientific and practical problems.

1.3.1 The Bayes' Theorem

Bayes' theorem shows the relationship between two conditional probabilities that are the reverse of each other. The theorem is named after Reverend Thomas Bayes (1701-1761), and is also referred to as Bayes' law or Bayes' rule (Bayes and Price, 1763), although the theorem in its general form was originally given by Pierre-Simon de Laplace (Hoffmann-Jorgensen, 1994, p. 102).

Bayes' theorem is usually stated in terms of probabilities of observable events, and it is valid in all common interpretations of probability. Let us consider two possible outcomes A and B. Moreover, suppose that A = A_1 ∪ · · · ∪ A_n, where A_i ∩ A_j = ∅ for every i ≠ j. Then Bayes' theorem provides an expression for the conditional probability of A_i given B, which is equal to

$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{P(B)} = \frac{P(B|A_i)P(A_i)}{\sum_{i=1}^{n} P(B|A_i)P(A_i)}. \qquad (1.2)$$

In a simpler and more general form, for any outcomes A and B,

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}. \qquad (1.3)$$

The above rule can be used for inverse probability. Assume that B is the finally observed outcome and that by A we denote possible causes that provoke B. Then

P(B|A) can be interpreted as the probability that B will appear when the cause A is present, while P(A|B) is the probability that A is responsible for the occurrence of B, which we have already observed.
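
As a small numerical illustration of Equation (1.3) in this cause-and-effect reading (the probabilities below are invented for illustration, not taken from any data set in this thesis), suppose a cause A occurs with probability 0.01, the outcome B appears with probability 0.95 when A is present, and with probability 0.10 otherwise.

# Minimal sketch of Bayes' rule for events, Equation (1.3), with assumed numbers.
p_A            <- 0.01                                   # P(A), prior probability of the cause
p_B_given_A    <- 0.95                                   # P(B | A)
p_B_given_notA <- 0.10                                   # P(B | not A)

p_B <- p_B_given_A * p_A + p_B_given_notA * (1 - p_A)    # total probability P(B)
p_A_given_B <- p_B_given_A * p_A / p_B                   # P(A | B), by Bayes' rule
p_A_given_B                                              # approximately 0.088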

Bayesian inference is based on this rationale; indeed, it is better to say that Bayes' theorem is the basis of Bayesian inference. The preceding equation, which at first glance is simple, offers a probabilistic mechanism of learning from data

(Bernardo and Smith, 1994, p. 2). Hence, after observing data (y1, y2, . . . , yn), we

calculate the posterior distribution p(θ|y1, . . . , yn), which combines prior and data information. This posterior distribution is the key element in Bayesian inference.

1.4 Model Based Bayesian Statistics

Bayesian statistics differs from classical statistics in that all unknown parameters are considered as random variables. For this reason, a prior distribution must be defined initially. This prior distribution expresses the information available to the researcher before any "data" are involved in the statistical analysis. Interest lies in the calculation of the posterior distribution p(θ|y) of the parameters θ given the observed data y. According to Bayes' theorem, replacing B with the observed data y, A with the parameter set θ, and the probability P with the density p, Equation (1.3) can be written as
$$p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)}, \qquad (1.4)$$
where p(y) will be discussed below, p(θ) is the set of prior distributions of the parameter set θ before y is observed, p(y|θ) is the likelihood of y under a model, and p(θ|y) is the joint posterior distribution, sometimes called the full posterior distribution, of the parameter set θ, which expresses uncertainty about θ after taking both the prior and the data into account. Since there are usually multiple parameters, θ represents a set of j parameters and may be considered hereafter in this thesis

as θ = θ1, . . . , θj. The denominator,

$$p(y) = \int p(y|\theta)\,p(\theta)\,d\theta, \qquad (1.5)$$

defines the marginal likelihood of y, or the prior predictive distribution of y, and may be set to an unknown constant c. The prior predictive distribution indicates what y should look like, given the model, before y has been observed. Only the set of prior probabilities and the model’s likelihood are used for the marginal likelihood of y. The presence of the marginal likelihood of y normalizes the joint posterior distribution, p(θ|y), ensuring it is a proper distribution and integrates to one.

By replacing p(y) with the constant of proportionality c, the model-based formulation of Bayes' theorem becomes

$$p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{c}$$

or

$$p(\theta|y) \propto p(y|\theta)\,p(\theta). \qquad (1.6)$$

This form can be stated as the unnormalized joint posterior being proportional to the likelihood times the prior. However, the goal in model-based Bayesian inference is usually not to summarize the unnormalized joint posterior distribution, but to summarize the marginal distributions of the parameters. The full parameter set θ can typically be partitioned into θ = {Φ, Λ}, where Φ is the sub-vector of interest and Λ is the complementary sub-vector of θ, often referred to as a vector of nuisance parameters. In a Bayesian framework, the presence of nuisance parameters does not pose any formal, theoretical problems. A nuisance parameter is a parameter that exists in the joint posterior distribution of a model, though it is not a parameter of interest. The marginal posterior distribution of Φ, the

parameter of interest, can simply be written as

$$p(\Phi|y) = \int p(\Phi, \Lambda|y)\,d\Lambda. \qquad (1.7)$$

In model-based Bayesian inference, Bayes' theorem is used to estimate the unnormalized joint posterior distribution, and finally we can assess and make inferences from the marginal posterior distributions.
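
The logic of Equations (1.6) and (1.7) can be illustrated with a simple grid computation in R. The sketch below (simulated data, an assumed normal model, and assumed weakly informative priors; purely illustrative) evaluates the unnormalized joint posterior of a mean mu and standard deviation sigma, normalizes it over the grid, and sums over sigma to obtain the marginal posterior of mu.

# Minimal sketch: grid evaluation of an unnormalized joint posterior and
# the marginal posterior of one parameter, for assumed data y ~ N(mu, sigma^2).
set.seed(1)
y <- rnorm(15, mean = 5, sd = 2)

mu    <- seq(2, 8,   length.out = 200)
sigma <- seq(0.5, 5, length.out = 200)
grid  <- expand.grid(mu = mu, sigma = sigma)

log_post <- mapply(function(m, s)
    sum(dnorm(y, m, s, log = TRUE)) +          # log-likelihood
    dnorm(m, 0, 100, log = TRUE) +             # weakly informative prior on mu
    dunif(s, 0, 50, log = TRUE),               # flat proper prior on sigma
  grid$mu, grid$sigma)

post <- exp(log_post - max(log_post))          # unnormalized joint posterior, Eq. (1.6)
post <- post / sum(post)                       # normalized on the grid
post_mat <- matrix(post, nrow = length(mu))    # rows index mu, columns index sigma
marg_mu  <- rowSums(post_mat)                  # marginal posterior of mu, Eq. (1.7)
plot(mu, marg_mu, type = "l", xlab = "mu", ylab = "p(mu | y)")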

1.5 The Building Blocks of Bayesian Statistics

The building blocks of Bayesian statistics are:

1. The Prior: p(θ) is the set of prior distributions for parameter set θ, and uses probability as a means of quantifying uncertainty about θ before taking the data into account.

2. The Likelihood: p(y|θ) is the likelihood or likelihood function of θ, in which all variables are related in a full probability model.

3. The Posterior: p(θ|y) is the joint posterior distribution that expresses uncer- tainty about parameter set θ after taking both the prior and the data into account. If parameter set θ is partitioned into a single parameter of interest φ and the remaining parameters are considered nuisance parameters, then p(φ|y) is the marginal posterior distribution.

1.5.1 The Prior Distributions

In Bayesian theory, a prior probability distribution, often called simply the prior, of an uncertain parameter θ is a probability distribution that expresses uncertainty about θ before the data are taken into account. The parameters of a prior distribution are called hyperparameters. When applying Bayes' theorem, the

prior is multiplied by the likelihood function and then normalized to estimate the posterior probability distribution, which is the conditional distribution of θ given the data. Moreover, the prior distribution affects the posterior distribution; therefore, specification of the prior distribution is important in Bayesian inference. Usually, specification of the prior mean and variance is emphasized. The prior mean provides a prior point estimate for the parameter of interest, while the variance expresses uncertainty concerning this estimate. When the researcher a priori strongly believes that this estimate is accurate, the variance should be set low, while ignorance or great uncertainty concerning the prior mean can be expressed by a large variance. If prior information is available, it should be appropriately summarized by the prior distribution. This procedure is called elicitation of prior knowledge (Statisticat LLC, 2013). Usually, no prior information is available. In this case we need to specify a prior that will not influence the posterior distribution and let the data speak for themselves. Such distributions are frequently called noninformative or vague prior distributions. A usual vague improper prior distribution is p(θ) ∝ 1, which is the uniform prior over the parameter space. The term improper here refers to distributions that do not integrate to one. Such prior distributions can be used without any problem provided that the resulting posterior is proper. A wide range of noninformative vague priors may be used; for details, see Kass and Wasserman (1995) and Yang and Berger (1996). In this thesis, the term weakly informative prior is used for proper prior distributions for location parameters with large variance. However, for scale or shape parameters, a half-Cauchy prior with scale = 25 is commonly used. Such priors contribute negligible information to the posterior distribution.

Traditionally, prior distributions belong to one of two categories: informative and noninformative priors. Here, five categories of priors are presented according to their information and the goal in the use of the prior. The five categories are informative, conjugate, weakly informative, least informative and uninformative. 12 1. Introduction

1.5.1.1 Informative Priors

When prior information is available about θ, it should be included in the prior distribution of θ. For example, if the present model form is similar to a previous model form, and the present model is intended to be an updated version based on more current data, then the posterior distribution of θ from the previous model may be used as the prior distribution of θ for the present model.

In this way, each version of a model does not start from scratch, based only on the present data; rather, the cumulative effects of all data, past and present, can be taken into account. To ensure the current data do not overwhelm the prior, Ibrahim and Chen (2000) introduced the power prior. The power prior is a class of informative prior distribution that takes previous data and results into account. If the present data are very similar to the previous data, then the precision of the posterior distribution increases when including more and more information from previous models. If the present data differ considerably, then the posterior distribution of θ may be in the tails of the prior distribution for θ, so the prior distribution contributes less density in its tails. Hierarchical Bayes is also a popular way to combine data sets.

1.5.1.2 Conjugate Priors

Usually the target posterior distribution is not analytically tractable. In the past, intractability was avoided via the use of conjugate prior distributions. These prior distributions have the nice property of resulting in posteriors of the same distributional family. An extensive illustration of conjugate priors is provided by Bernardo and Smith (1994).

A prior distribution that is a member of the distributional family D with parameters α is conjugate to the likelihood p(y|θ) if the resulting posterior distribution

p(θ|y) is also a member of the same distributional family. Therefore,

if θ ∼ D(α) then θ|y ∼ D(α̃), where α and α̃ are the prior and posterior parameters of D. Moreover, when the posterior distribution is in the same family as the prior probability distribution, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood. For example, the Gaussian family is conjugate to itself (or self-conjugate) with respect to a Gaussian likelihood: if the likelihood function is Gaussian, then choosing a Gaussian prior for the mean will ensure that the posterior distribution is also Gaussian. All probability distributions in the exponential family have conjugate priors; see Robert (2007) for a catalog. Although the gamma distribution is the conjugate prior distribution for the precision of a normal distribution (Spiegelhalter et al., 2003),

τ ∼ Gamma(0.0001, 0.0001),

better properties for scale parameters are obtained with the non-conjugate, proper half-Cauchy distribution (the half-t distribution is another option), with a general recommendation of scale = 25; see Figure 1.1 for a weakly informative scale parameter (Gelman, 2006),

σ ∼ HC(25)

τ = σ⁻².

Conjugacy is mathematically convenient in that the posterior distribution follows a known parametric form (Gelman et al., 2004, p. 40). It is obviously easier to summarize a normal distribution than a complex, multi-modal distribution with no known form. If information is available that contradicts a conjugate parametric family, then it may be necessary to use a more realistic, inconvenient prior distribution.

[Figure 1.1: Half-Cauchy densities with scale = 1, 5, and 25. It is evident from the plot that for scale = 25 the half-Cauchy distribution becomes almost uniform.]

The basic justification for the use of conjugate prior distributions is similar to that for using standard models, such as the binomial and normal, for the likelihood: it is easy to understand the results, which can often be put in analytic form, they are often a good approximation, and they simplify computations. Also, they are useful as building blocks for more complicated models, including many dimensions, where conjugacy is typically impossible. For these reasons, conjugate models can be good starting points (Gelman et al., 2004, p. 41).
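
A minimal numerical sketch of conjugacy (an assumed beta-binomial example constructed for illustration, not one of the reliability data sets analysed later): a Beta(a, b) prior on a binomial success probability combines with y successes in n trials to give a Beta(a + y, b + n − y) posterior in closed form.

# Minimal sketch: the beta prior is conjugate to the binomial likelihood.
a <- 2; b <- 2                 # assumed prior hyperparameters, Beta(2, 2)
n <- 20; y <- 17               # assumed data: 17 successes in 20 trials

a_post <- a + y                # posterior is Beta(a + y, b + n - y)
b_post <- b + n - y

curve(dbeta(x, a_post, b_post), from = 0, to = 1, lwd = 2,
      xlab = "theta", ylab = "density")                        # posterior density
curve(dbeta(x, a, b), from = 0, to = 1, lty = 2, add = TRUE)   # prior density
a_post / (a_post + b_post)                                     # posterior mean, about 0.79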

Nonconjugate prior distributions can make interpretation of posterior inferences less transparent and computation more difficult, though this alternative does not pose any conceptual problems. In practice, for complicated models, conjugate prior distributions may not even be possible (Gelman et al., 2004, p. 41-42).

1.5.1.3 Weakly Informative Priors

A Weakly Informative Prior (WIP) distribution uses prior information for regularization and stabilization, providing enough prior information to prevent results that

contradict our knowledge or problems such as an algorithmic failure to explore the state-space. Another goal is for WIPs to use less prior information than is actually available. A WIP should provide some of the benefit of prior information while avoiding some of the risk from using information that doesn’t exist. WIPs are the most common priors in practice, and are favored by subjective Bayesians.

Selecting a WIP can be tricky. WIP distributions should change with the sample size, because a model should have enough prior information to learn from the data, but the prior information must also be weak enough so that the data can dominate. For example, a popular WIP for a centered and scaled predictor (Gelman, 2008) may be θ ∼ N(0, 10000), where θ is normally distributed with a mean of 0 and a variance of 10,000, which is equivalent to a standard deviation of 100, or a precision of 1.0 × 10⁻⁴. In this case, the density for θ is nearly flat. Nonetheless, the fact that it is not perfectly flat yields good properties for numerical approximation algorithms. In both Bayesian and frequentist inference, it is possible for numerical approximation algorithms to become stuck in regions of flat density, which become more common as sample size decreases or model complexity increases. Numerical approximation algorithms in frequentist inference function as though a flat prior were used, so they become stuck more frequently than their counterparts in Bayesian inference. Prior distributions that are not completely flat provide enough information for the numerical approximation algorithm to continue to explore the target density, the posterior distribution.
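
A quick check of this near-flatness (a trivial illustration added here, not part of the original text) is to evaluate the N(0, 100²) density at a few values of θ:

# The N(0, sd = 100) density is nearly constant over the range where coefficients
# of centered and scaled predictors typically fall, yet still decays slowly.
theta <- c(-5, -1, 0, 1, 5, 50)
round(dnorm(theta, mean = 0, sd = 100), 6)
# the density at theta = 50 is only slightly below the density at theta = 0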

Vague Priors: A vague prior, also called a diffuse prior, is difficult to define after considering WIPs. The first formal move from vague to weakly informative priors was made by Lambert et al. (2005). After conjugate priors were introduced (Raiffa and Schlaifer,

1961), most applied Bayesian modeling has used vague priors, parameterized to approximate the concept of uninformative priors (better considered as least informative priors, see Section 1.5.1.4).

Typically, a vague prior is a conjugate prior with a large scale parameter. However, vague priors can pose problems when the sample size is small. Most problems with vague priors and small sample size are associated with scale, rather than location, parameters. The problem can be particularly acute in random- effects models, and the term random-effects is used rather loosely here to imply exchangeable, hierarchical, and multilevel structures. A vague prior is defined here as usually being a conjugate prior that is intended to approximate an uninformative prior (or actually, a least informative prior), and without the goals of regularization and stabilization.

1.5.1.4 Least Informative Priors

The term Least Informative Priors, or LIPs, is used here to describe a class of prior in which the goal is to minimize the amount of subjective information content, and to use a prior that is determined solely by the model and observed data. The rationale for using LIPs is often said to be to let the data speak for themselves.

Flat Priors: The flat prior was historically the first attempt at an uninformative prior. The unbounded, uniform distribution, often called a flat prior, is

θ ∼ U(−∞, ∞),
where θ is uniformly distributed from negative infinity to positive infinity. Although this seems to allow the posterior distribution to be affected solely by the data with no impact from prior information, this should generally be avoided because this probability distribution is improper, meaning it will not integrate to one. This may cause the posterior to be improper, which invalidates the model.

Reverend Thomas Bayes (1701-1761) was the first to use inverse probability (Bayes and Price, 1763), and used a flat prior for his billiard example so that all possible values of θ are equally likely a priori (Gelman et al., 2004, p. 34-36). Pierre-Simon Laplace (1749-1827) also used the flat prior to estimate the proportion of female births in a population, and for all estimation problems presented or justified as a reasonable expression of ignorance. Laplace’s use of this prior distribution was later referred to as the principle of indifference or principle of insufficient reason, and is now called the flat prior (Gelman et al., 2004, p. 39). Laplace was aware that it was not truly uninformative, and used it as a LIP.

Hierarchical Prior: A hierarchical prior is a prior in which the parameters of the prior distribution are estimated from data via hyperpriors, rather than with subjective information (Gelman, 2008). Parameters of hyperprior distributions are called hyperparameters. Subjective Bayesians prefer the hierarchical prior as the LIP, and the hyperparameters are usually specified as WIPs. Hierarchical priors are presented later in more detail in the section entitled “Hierarchical Bayes”.

Jeffreys Prior: Jeffreys prior, also called Jeffreys rule, was introduced in an attempt to establish a least informative prior that is invariant to transforma- tions (Jeffreys, 1961). Jeffreys prior works well for a single parameter, but multi-parameter situations may have inappropriate aspects accumulate across dimensions to detrimental effect.

1.5.1.5 Uninformative Priors

Traditionally, most of the above descriptions of prior distributions were categorized as uninformative priors. However, uninformative priors do not truly exist (Irony and Singpurwalla, 1997), and all priors are informative in some way. Traditionally, there have been many names associated with uninformative priors, including diffuse, minimal, noninformative, objective, reference, uniform, vague, and perhaps weakly informative (Statisticat LLC, 2013). 18 1. Introduction

1.5.2 The Likelihood

In order to complete the definition of a Bayesian model, both the prior distributions and the likelihood must be approximated or fully specified. The likelihood, likelihood function or p(y|θ), contains the available information provided by the sample. An introductory description of likelihood is provided in Section 1.2. From the definition of likelihood given by Equation (1.1), the data y affect the posterior distribution p(θ|y) only through the likelihood function p(y|θ). In this way, Bayesian inference obeys the likelihood principle, which states that for a given sample of data, any two probability models p(y|θ) that have the same likelihood function yield the same inference for θ.

1.5.2.1 Likelihood Based Inference

If we assume a set of independent and identically distributed observations y =

{y1, . . . , yn} from a sampling model p(y_i|θ), i = 1, . . . , n, with scalar θ, then the likelihood (as introduced in Section 1.2) is any function of θ that is proportional to $p(y|\theta) = \prod_{i=1}^{n} p(y_i|\theta)$. The maximum likelihood estimate is the value θ̂ which maximises p(y|θ), or equivalently maximises the log-likelihood. Let

$$I(\theta) = -E_{y|\theta}\!\left[\frac{d^{2}\log p(y|\theta)}{d\theta^{2}}\right] = E_{y|\theta}\!\left[\left(\frac{d\log p(y|\theta)}{d\theta}\right)^{2}\right]$$

be the Fisher information contained in a single observation y. Then, under broad regularity conditions, the maximum likelihood estimator has an asymptotic normal distribution
$$\hat{\theta} \sim N\!\left(\theta, [n\hat{I}(\hat{\theta})]^{-1}\right), \qquad (1.8)$$
where Î is a sample-based estimate of I. Thus the maximum likelihood estimator will converge to the true value of the parameter, assuming the sampling model has been appropriately chosen (Lunn et al., 2013).
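
The asymptotic approximation in Equation (1.8) is easy to reproduce numerically. The sketch below (simulated exponential data and base-R optimization, added only for illustration) obtains the maximum likelihood estimate with optim() and uses the Hessian of the negative log-likelihood, the observed information, in place of nI(θ̂) to form an approximate standard error.

# Minimal sketch: numerical MLE and its asymptotic normal approximation,
# for assumed iid Exponential(rate = theta) data.
set.seed(1)
y <- rexp(50, rate = 0.5)

negloglik <- function(theta) -sum(dexp(y, rate = theta, log = TRUE))

fit <- optim(par = 1, fn = negloglik, method = "Brent",
             lower = 1e-6, upper = 100, hessian = TRUE)

theta_hat <- fit$par                         # maximum likelihood estimate
se_hat    <- sqrt(1 / fit$hessian[1, 1])     # asymptotic standard error
c(estimate = theta_hat, std.error = se_hat)  # theta_hat is approximately normal, Eq. (1.8)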

1.5.2.2 Likelihood Function of a Parameterized Model

In non-technical parlance, likelihood is usually a synonym for probability, but in statistical usage, there is a clear distinction: probability allows us to predict unknown outcomes based on known parameters, whereas likelihood allows us to estimate unknown parameters based on known outcomes.

In a sense, likelihood can be thought of as a reversed version of conditional probability. Reasoning forward from a given parameter θ, the conditional probability of y is the density p(y|θ). With θ as a parameter, the relationship is expressed through the likelihood function L(θ|y) = p(y|θ), where y is the observed outcome of an experiment, and the likelihood L of θ given y is equal to the density p(y|θ). When viewed as a function of y with θ fixed, it is not a likelihood function L(θ|y), but merely a probability density function p(y|θ). When viewed as a function of θ with y fixed, it is a likelihood function and may be denoted as L(θ|y) or p(y|θ). For example, in a Bayesian linear regression with an intercept and two independent variables, the model may be specified as

$$y_i \sim N(\mu_i, \sigma^2)$$

$$\mu_i = \beta_1 + \beta_2 X_{i,1} + \beta_3 X_{i,2}$$

The dependent variable y, indexed by i = 1, . . . , n, is stochastic and normally distributed according to the expectation vector µ and variance σ². The expectation vector µ is an additive, linear function of a vector of regression parameters, β, and the model matrix X.

By considering a conditional distribution, the record-level likelihood in Bayesian notation is

$$p(y_i|\theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{1}{2\sigma^2}(y_i - \mu_i)^{2}\right], \qquad y_i \in (-\infty, \infty)$$

In both theory and practice, and in both frequentist and Bayesian inference, the log-likelihood is used instead of the likelihood, on both the record- and model-level. The model-level product of record-level likelihoods can exceed the range of a number that can be stored by a computer, which is usually affected by sample size. By estimating a record-level log-likelihood, rather than likelihood, the model-level log-likelihood is the sum of the record-level log-likelihoods, rather than a product of the record-level likelihoods,

$$\log[p(y|\theta)] = \sum_{i=1}^{n} \log[p(y_i|\theta)]$$

rather than
$$p(y|\theta) = \prod_{i=1}^{n} p(y_i|\theta).$$
As a function of θ, the unnormalized joint posterior distribution is the product of the likelihood function and the prior distributions. To continue with the example of Bayesian linear regression, the unnormalized joint posterior distribution is

$$p(\beta, \sigma^2|y) = p(y|\beta, \sigma^2)\, p(\beta_1)\, p(\beta_2)\, p(\beta_3)\, p(\sigma^2).$$

More usually, the logarithm of the unnormalized joint posterior distribution is used, which is the sum of the log-likelihood and the logarithms of the prior densities. The logarithm of the unnormalized joint posterior distribution for this example is

$$\log[p(\beta, \sigma^2|y)] = \log[p(y|\beta, \sigma^2)] + \log[p(\beta_1)] + \log[p(\beta_2)] + \log[p(\beta_3)] + \log[p(\sigma^2)].$$

The logarithm of the unnormalized joint posterior distribution is maximized with numerical approximation.
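
To make this construction explicit, the sketch below codes the log unnormalized joint posterior of the regression example as the sum of the log-likelihood and the log-priors and maximizes it numerically with optim(). The simulated data, the particular weakly informative priors, and the use of base R instead of the LaplacesDemon functions employed later in the thesis are all assumptions made only for illustration.

# Minimal sketch: log unnormalized joint posterior for a linear regression with
# an intercept and two covariates, maximized by numerical approximation.
set.seed(1)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))                    # model matrix
y <- as.vector(X %*% c(2, 1, -0.5) + rnorm(n, sd = 1.5))

log_post <- function(par) {
  beta  <- par[1:3]
  sigma <- exp(par[4])                               # sigma kept positive via the log scale
  mu    <- as.vector(X %*% beta)
  sum(dnorm(y, mu, sigma, log = TRUE)) +             # log-likelihood
    sum(dnorm(beta, 0, 100, log = TRUE)) +           # weakly informative priors on beta
    log(2) + dcauchy(sigma, 0, 25, log = TRUE) +     # half-Cauchy(25) prior on sigma
    par[4]                                           # Jacobian of the log transformation
}

fit <- optim(c(0, 0, 0, 0), log_post, control = list(fnscale = -1), hessian = TRUE)
fit$par[1:3]       # approximate posterior mode of beta
exp(fit$par[4])    # approximate posterior mode of sigma (original scale)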

1.6 Hierarchical Bayes

Prior distributions may be estimated within the model via hyperprior distributions, which are usually vague and nearly flat. Parameters of hyperprior distributions are called hyperparameters. Using hyperprior distributions to estimate prior distributions is known as hierarchical Bayes. In theory, this process could continue further, using hyper-hyperprior distributions to estimate the hyperprior distributions. Estimating priors through hyperpriors, and from the data, is a method to elicit optimal prior distributions (Statisticat LLC, 2013). One of many natural uses for hierarchical Bayes is multilevel or hierarchical modeling. A more detailed discussion is given in the following subsection.

1.6.1 Hierarchical Bayesian Models

Inherently, Bayesian models have a hierarchical structure. The prior probability distribution p(θ|α) of the model parameters θ, with prior parameters α, can be considered as one level of hierarchy. The likelihood combined with this level of hierarchy leads to the final stage of a Bayesian model, resulting in the posterior distribution p(θ|y) ∝ p(y|θ) p(θ|α) via Bayes' theorem. Figure 1.2 shows a graphical representation of the hierarchical structure of a standard Bayesian model.

To capture the complex structure of some data, the prior is frequently structured using a series of conditional distributions called hierarchical stages of the prior distribution. Hence, a hierarchical Bayesian model is defined when a prior distribution is also assigned on the prior parameters α associated with the

[Figure 1.2: Graphical representation of a standard Bayesian model, with nodes α (prior parameters), θ, and y (data), connected through the prior distribution p(θ|α) and the likelihood p(y|θ). Squared nodes refer to constant parameters whereas oval nodes refer to stochastic components of the model.]

likelihood parameter θ. The posterior distribution takes the form,

p(θ, α, β|y) ∝ p(y|θ) p(θ|α) p(α|β). (1.9)

In this model formulation, the prior distribution is characterized by two levels of hierarchy, that is, p(θ|α) is the first level and p(α|β) is the second level. Prior distributions on the parameters of the first level of a hierarchical prior are called hyperpriors, and the corresponding parameters are called hyperparameters. In this case, p(α|β) is the hyperprior and β are the hyperparameters of the prior parameters α. This structure can be extended by adding more levels of hierarchy if needed (Ntzoufras, 2009). Reading the above equation from right to left, it begins with the hyperprior p(α|β), which is used to estimate the prior p(θ|α), which in turn is used, as per usual, to estimate the likelihood p(y|θ); finally, the posterior is p(θ, α, β|y). A graphical representation of a two-stage hierarchical Bayesian model can be seen in Figure 1.3.

Generally, any Bayesian model with parameters θ and Φ and prior distribution p(θ, Φ) can be written in a hierarchical structure if the joint prior distribution is decomposed into a series of conditional distributions such as p(θ, Φ) = p(θ|Φ) p(Φ). In hierarchical models, hyperparameters Φ are rarely involved in the model likelihood. Hierarchical models can be considered as a large set of stochastic formulations that include many popular models such as the random effects, the variance components,

[Figure 1.3: Graphical representation of a two-stage hierarchical Bayesian model, with nodes β (hyperparameters), α, θ, and y, connected through the hyperprior p(α|β) (second level), the prior distribution p(θ|α) (first level), and the likelihood p(y|θ). Squared nodes refer to constant parameters whereas oval nodes refer to stochastic components of the model.]

the multilevel, and the generalized linear mixed models (GLMM). Details concerning hierarchical Bayesian models can be found in Gelman et al. (2014, chap. 5, 15), Ntzoufras (2009, chap. 9), Robert (2007, chap. 10), Woodworth (2004, chap. 11), Lawson et al. (2003, chap. 2), and Dey et al. (2000, chap. 2, 7). An excellent treatment of the subject can be found in Gelman and Hill (2007); that book is devoted to hierarchical models.
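
The two-stage structure of Equation (1.9) maps directly onto software. The sketch below writes an assumed hierarchical binomial model, with made-up data and exponential hyperpriors chosen only for illustration, as a JAGS model called from R through the R2jags package; it is a template in the style used later in the thesis, not one of the thesis analyses.

# Minimal sketch: a two-stage hierarchical binomial model fitted with JAGS from R.
library(R2jags)

model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dbin(theta[i], n[i])     # likelihood p(y | theta)
    theta[i] ~ dbeta(a, b)          # first-level prior p(theta | alpha)
  }
  a ~ dexp(0.1)                     # hyperpriors p(alpha | beta)
  b ~ dexp(0.1)
}"
writeLines(model_string, "hier_binom.jags")

dat <- list(y = c(3, 1, 4, 0, 2), n = rep(20, 5), N = 5)   # assumed demand data
fit <- jags(data = dat, parameters.to.save = c("theta", "a", "b"),
            model.file = "hier_binom.jags", n.chains = 3, n.iter = 5000)
print(fit)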

1.7 Prediction in Bayesian Paradigm

In Bayesian theory, predictions of unknown or future observables are based on predictive distributions introduced by Jeffreys (1961), that is, the distribution of the data averaged over all possible parameter values. For this reason, when data y have not yet been observed, predictions are based on the marginal likelihood
$$p(y) = \int p(y|\theta)\,p(\theta)\,d\theta, \qquad (1.10)$$
which is the likelihood averaged over all parameter values supported by our prior beliefs. The distribution p(y) is often called the prior predictive distribution: prior because it is not conditional on a previous observation of the process, and predictive because it is the distribution for a quantity that is observable.

After the data have been observed, one is more interested in the prediction of new, unobserved data ỹ. Following the same process, the distribution of ỹ, called the posterior predictive distribution (posterior because it is conditional on the observed y and predictive because it is a prediction for an observable ỹ), can be computed as

p(ỹ|y) = ∫ p(ỹ|θ) p(θ|y) dθ, (1.11)

which is the likelihood of the predicted data averaged over the posterior distribution p(θ|y) (Ntzoufras, 2009). For example, if y has missing values, then the missing values can be treated as ỹ and estimated within the model from the posterior predictive distribution. In the case of a linear regression model, the integral for prediction is

p(ỹ|y) = ∫ p(ỹ|β, σ²) p(β, σ²|y) dβ dσ².

The posterior predictive distribution is easy to simulate from: the form of the integrand shows that the following steps are required to simulate ỹ from p(ỹ|y):

1. simulate β, σ² from p(β, σ²|y);

2. simulate ỹ from p(ỹ|β, σ²).

The second step shows that ỹ is a simulation from p(ỹ|y).
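As a hedged illustration of these two steps (not part of the thesis analyses), the R sketch below simulates from the posterior predictive distribution of a linear regression model; beta_draws, sigma_draws, and X_new are assumed names for posterior draws of β and σ and the design matrix of the new cases.

## Sketch of the two simulation steps for the posterior predictive distribution
## of a linear regression model. 'beta_draws' (m x p matrix) and 'sigma_draws'
## (length-m vector) are assumed to hold posterior draws of beta and sigma,
## and 'X_new' is the design matrix of the new cases.
posterior_predict <- function(beta_draws, sigma_draws, X_new) {
  m <- nrow(beta_draws)
  y_tilde <- matrix(NA_real_, nrow = m, ncol = nrow(X_new))
  for (j in 1:m) {
    mu <- as.vector(X_new %*% beta_draws[j, ])          # step 1 uses the j-th posterior draw
    y_tilde[j, ] <- rnorm(nrow(X_new), mean = mu, sd = sigma_draws[j])  # step 2: simulate y-tilde
  }
  y_tilde   # each row is one draw from p(y-tilde | y)
}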

According to Press (1989, p. 57-58), inference must be based on predictive distributions, since they involve observables, whereas the posterior distribution also involves parameters that are never observed. Hence, by using the predictive distribution, we can quantify our knowledge about the future as well as measure the probability of observing each yᵢ again in the future, assuming that the adopted model is true. For this reason, we may use the predictive distribution not only to predict future observations but also to construct goodness-of-fit diagnostics and perform checks of each model's structural assumptions.

1.8 Model Goodness of Fit

In the Bayesian paradigm, the most common method of assessing the goodness of fit of an estimated statistical model is a generalization of the frequentist Akaike Information Criterion (AIC). The Bayesian method, like AIC, is not a test of the model in the sense of hypothesis testing, though Bayesian theory has Bayes factors for such purposes. Instead, like AIC, the Bayesian paradigm provides a model fit statistic that is used as a tool to refine the current model or to select the better-fitting model among competing specifications.

To begin with, model fit can be summarized with the deviance, which is defined as −2 times the log-likelihood (Gelman et al., 2004, p. 180), that is,

D(y, θ) = −2log[p(y|θ)]. (1.12)

Just as with the likelihood, p(y|θ), or log-likelihood, the deviance exists at both the record level and the model level. With the development of the BUGS software (Gilks et al., 1994), deviance came to be defined differently in Bayesian theory than in frequentist statistics. In frequentist statistics, deviance is −2 times the log-likelihood ratio of a reduced model compared to a full model, whereas in the Bayesian paradigm, deviance is simply −2 times the log-likelihood. In Bayesian theory, the model with the lowest expected deviance is considered to fit best (Gelman et al., 2004, p. 181).

It is possible to have a negative deviance. Deviance is derived from the likelihood, which is derived from probability density functions (pdfs). Evaluated at a certain point in parameter space, a pdf can have a density larger than 1 due to a small standard deviation or lack of variation. Likelihoods greater than 1 lead to negative deviance, and this is perfectly acceptable.

On its own, the deviance is an insufficient model fit statistic, because it does not take model complexity into account. The effect of model fitting, pD, is used as the effective number of parameters of a Bayesian model. It is computed as the difference between the posterior mean of the model-level deviance, D̄, and the deviance evaluated at the posterior mean of the parameters, that is, pD = D̄ − D(θ̄).

A related way to measure model complexity is as half the posterior variance of the model-level deviance, known as pV (Gelman et al., 2004, p. 182)

model complexity: pV = var(D)/2. (1.13)

The effect of model fitting, pD or pV , can be thought of as the number of un- constrained parameters in the model, where a parameter counts as: 1 if it is estimated with no constraints or prior information; 0 if it is fully constrained or if all the information about the parameter comes from the prior distribution; or an intermediate value if both the data and the prior are informative (Gelman et al., 2004, p. 182). Therefore, by including prior information, Bayesian inference is more efficient in terms of the effective number of parameters than frequentist inference. Hierarchical, mixed effects, or multilevel models are even more efficient regarding the effective number of parameters.

Model complexity, pD or pV, should be positive. Although pV must be positive since it is related to a variance, it is possible for pD to be negative, which indicates one or more problems: the log-likelihood is non-concave, there is a conflict between the prior and the data, or the posterior mean is a poor estimator (such as with a bimodal posterior).

The sum of the mean model-level deviance and the model complexity (pD or pV) is the Deviance Information Criterion (DIC), a model fit statistic that is also an estimate of the expected loss, with deviance as the loss function (Spiegelhalter et al., 1998, 2002). DIC is

DIC = D̄ + pV. (1.14)

DIC may be compared across different models and even different methods, as long as the dependent variable does not change between models, making DIC the most flexible model fit statistic. DIC is a hierarchical-modeling generalization of the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Like AIC and BIC, it is an asymptotic approximation as the sample size becomes large. DIC is valid only when the joint posterior distribution is approximately multivariate normal. Models with smaller DIC should be preferred; since DIC increases with model complexity (pD or pV), simpler models are favoured.

It is difficult to say what would constitute an important difference in DIC. Very roughly, differences of more than 10 might rule out the model with the higher DIC, differences between 5 and 10 are substantial, but if the difference in DIC is, say, less than 5, and the models make very different inferences, then it could be misleading just to report the model with the lowest DIC.

The Bayesian Predictive Information Criterion (BPIC) was introduced as a criterion of model fit when the goal is to pick the model with the best out-of-sample predictive power (Ando, 2007). BPIC is a variation of DIC in which the effective number of parameters is 2pD (or 2pV). BPIC may be compared between y_new and y_holdout, and has many other extensions, such as with Bayesian Model Averaging (BMA) (e.g., Statisticat LLC, 2013).
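The following R sketch, added purely for illustration, computes D̄, pV, and DIC as defined in Equations (1.13) and (1.14) from a vector of deviance values evaluated at posterior draws; the vector dev is assumed to come from an earlier MCMC run.

## Sketch: computing Dbar, pV, and DIC (Equations 1.13-1.14) from a vector 'dev'
## of deviance values D(y, theta_i) = -2 * log p(y | theta_i), one per retained
## posterior draw. 'dev' is an assumed input from an earlier MCMC run.
dic_summary <- function(dev) {
  Dbar <- mean(dev)        # posterior mean deviance
  pV   <- var(dev) / 2     # model complexity, Equation (1.13)
  DIC  <- Dbar + pV        # Equation (1.14)
  c(Dbar = Dbar, pV = pV, DIC = DIC)
}

## Toy illustration with simulated deviance values
set.seed(1)
dic_summary(rnorm(1000, mean = 150, sd = 3))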

1.9 Introduction to Reliability

Nowadays, rapid advances in technology, the development of highly sophisticated products, intense global competition, and increasing customer expectations have put new pressures on manufacturers to produce high-quality products. Customers expect purchased products to be reliable and safe. Systems, electronic gadgets, vehicles, machines, devices, and so on should, with high probability, be able to

perform their intended function under usual operating conditions, for some specified period of time (Meeker and Escobar, 1998). Moreover, there are many ways to define reliability. Colloquially, reliability is the property that a thing works when we want to use it. By necessity, more formal definitions of reliability must account for whether or not an item performs its intended function at or above a specified standard, how long it is able to perform at that standard, and the conditions under which it is operated. For example, the reliability of an electrical switch may be defined as the probability that it successfully functions under a specified load and at a particular temperature. In contrast, reliability may be expressed as an explicit function of time. Defining the reliability of a pump in a nuclear power plant depends both on its environment and on its ability to provide a specified capacity over time. As these examples illustrate, an operational definition of reliability must be specific enough to permit a clear distinction between items that are reliable and those that are not, but also must be sufficiently general to account for the complexities that arise in making this determination. In an effort to achieve both aims, the International Organization for Standardization (ISO, 1986) defines reliability as “the ability of an item to perform a required function, under given environmental and operating conditions and for a stated period of time” (Hamada et al., 2008). From this definition of reliability, it is observed that reliability analyses often involve the analysis of binary outcomes, that is, success/failure data. However, in practice, it is often just as important to analyse the time periods over which items or systems function. Such analyses are called lifetime or failure time analyses. Lifetime analyses involve the analysis of positive, continuous-valued quantities (e.g., the length of time an item functions), and so require different statistical models than analyses based on success/failure data. Both types of models are used in this thesis and are discussed in Chapter 2.

1.9.1 Basic Definitions of Reliability

Let T denote a nonnegative (i.e., T ≥ 0) random variable representing the lifetime of individuals or units in some population. Here we consider the case of a single continuous lifetime variable T. Let F(·) denote the cumulative distribution function (cdf) of T with corresponding probability density function (pdf) f(·). Note that f(t) = 0 for t < 0. Then

F(t) = P(T ≤ t) = ∫_0^t f(x) dx, (1.15)

which gives the probability that an individual will fail before time t. Alternatively, F(t) can be interpreted as the proportion of individuals in the population that will fail before time t.

1.9.1.1 Reliability Function

The probability that an individual performs or survives up to time t is given by the reliability function,

R(t) = P(T > t) = 1 − F(t) = ∫_t^∞ f(x) dx. (1.16)

This function is also referred to as the survival function. Note that R(t) is a

monotone decreasing function with R(0) = 1 and R(∞) = lim_{t→∞} R(t) = 0. Consequently, the pdf can be expressed as

f(t) = lim_{∆t→0+} P(t < T ≤ t + ∆t)/∆t = dF(t)/dt = −dR(t)/dt. (1.17)

The pdf can be used to represent relative frequency of failure times as a function

of time. The pth-quantile of the distribution of T is the value t_p such that

F(t_p) = P(T ≤ t_p) = p. (1.18)

That is, t_p = F^{-1}(p). The pth-quantile is also referred to as the 100×pth percentile of the distribution.

1.9.1.2 Hazard Function

The hazard function, also called hazard rate, specifies the instantaneous rate of failure at T = t given that the individual survived up to time t and is defined as

h(t) = lim_{∆t→0+} P(t < T ≤ t + ∆t | T > t)/∆t = f(t)/R(t). (1.19)

The hazard function expresses the propensity to fail in the next small interval of time (t, t + ∆t], given survival up to time t. That is, for small ∆t,

h(t) × ∆t ≈ P (t < T ≤ t + ∆t|T > t). (1.20)

The hazard function is also referred to as the risk or mortality rate. We can view it as a measure of intensity at time t or a measure of the potential of failure at time t. The hazard is a rate rather than a probability; it can assume any value in [0, ∞). Figure 1.4 shows four of the most common types of hazard functions. These include:

1. Increasing Failure Rate (IFR): The instantaneous failure rate (hazard) increases as a function of time. We expect to see an increasing number of failures for a given period of time.

2. Decreasing Failure Rate (DFR): The instantaneous failure rate decreases as a function of time. We expect to see a decreasing number of failures for a given period of time.

3. Bathtub Failure Rate (BFR): The instantaneous failure rate begins high because of early failures (infant mortality or burn-in failures), levels off for a period of time (useful life), and then increases (wearout or aging failures).


Figure 1.4: Four different kinds of hazard functions, including bathtub hazard function.

4. Constant Failure Rate (CFR): The instantaneous failure rate is constant for the observed lifetime. We expect to see a relatively constant number of failures for a given period of time.

1.9.1.3 Relationship among Measures

The functions f(t), F(t), R(t), and h(t) give mathematically equivalent specifications of the distribution of T. It is easy to derive expressions for R(t) and f(t) in terms of h(t): since f(t) = −dR(t)/dt, it follows that

h(t) = −[dR(t)/dt]/R(t) = −d log(R(t))/dt. (1.21)

Integrating h(s) over (0, t) gives the cumulative hazard function H(t),

H(t) = ∫_0^t h(s) ds = −log(R(t)). (1.22)

Here, unless otherwise specified, log denotes the natural logarithm, the inverse of the exponential function exp(x) = e^x. Thus,

R(t) = exp(−H(t)) = exp(−∫_0^t h(s) ds). (1.23)

Hence, the pdf of T can be expressed as

f(t) = h(t) exp(−∫_0^t h(s) ds) = h(t) × R(t). (1.24)

Note that H(∞) = ∫_0^∞ h(t) dt = ∞.
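As an illustrative check of these relationships (added here, not from the thesis), the R sketch below evaluates f(t), R(t), h(t), and H(t) for an assumed Weibull lifetime distribution and verifies Equations (1.23) and (1.24) numerically.

## Numerical check of the relationships in Equations (1.19)-(1.24) for a
## Weibull lifetime with illustrative shape 2 and scale 1.
shape <- 2; scale <- 1
t <- 0.7

f <- dweibull(t, shape, scale)                        # pdf f(t)
R <- pweibull(t, shape, scale, lower.tail = FALSE)    # reliability R(t) = 1 - F(t)
h <- f / R                                            # hazard, Equation (1.19)
H <- -log(R)                                          # cumulative hazard, Equation (1.22)

c(hazard = h,
  pdf_from_hR = h * R,                                # should equal f(t), Equation (1.24)
  R_from_H = exp(-H))                                 # should equal R(t), Equation (1.23)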

1.9.1.4 Mean Time to Failure

For a nonnegative random variable T, the mean time to failure (MTTF) is defined as

MTTF = E(T) = ∫_0^∞ t f(t) dt, (1.25)

which can also be written as

MTTF = E(T) = ∫_0^∞ R(t) dt, (1.26)

where E(T) is the expected value or mean value of T. The MTTF is also called the expected life.

1.9.1.5 Mean Residual Life

The mean residual life is the expected remaining time to failure of a device that has survived to time t, and is defined as

M(t) = (1/R(t)) ∫_0^∞ s f(t + s) ds. (1.27)

Notice that M(0) = E(T).

1.9.2 Censoring

Censoring is one of the unique features of reliability data, especially failure time data: an observation on a lifetime of interest is not complete, that is, the lifetime is only known to fall into a certain range instead of being known exactly. Meeker and Escobar (1998) note that censoring restricts the ability to observe failure times exactly. According to Hamada et al. (2008), lifetime data are censored when the exact failure time for a specific item is unknown. That is, when the experiment ends and data collection stops while at least one unit has not yet failed, that unit's failure time is censored and the exact failure time is unknown. Different types of censoring arise in practice, including right, left, interval, time, and failure censoring, but the one that receives most of the attention is right censoring. For example, all the roller bearings marked with an asterisk in Table 2.8 were still working at the end of the experiment, which means that their failure times exceed the times when they were last inspected, so their failure times are censored. Such data are called right-censored data. Right censoring also arises due to Type-I censoring, also called time censoring, in which the experiment is terminated at a pre-specified time. All items that are still working at the last inspection have right-censored failure times. Left censoring arises when an item fails before the first inspection (Hamada et al., 2008). For example, consider a set of 100 mobile chargers put on test for 50 days to examine whether each charger is still working or not. The experiment starts at midnight and the chargers are examined at 10:00 a.m. every day. The lifetime of any charger that fails before 10:00 a.m. on the first day is left censored. This type of censoring scheme is very rarely encountered. Both left and right censoring are special cases of interval censoring, which arises when an item's failure time is known to have occurred in

an interval, for instance, [t_{i−1}, t_i]. If an item is left-censored at time t, then its failure time is in the interval (0, t]. Similarly, an item is right-censored at t if its failure time is in the interval [t, ∞). The failure time of a mobile charger is an example of

interval censoring, because each charger can only be examined to within a 24-hour interval. Another category of censoring is Type-II censoring, or failure censoring. In failure censoring, a lifetime experiment is terminated after a specified number of failures have occurred. The items that are still working have right-censored failure times. For example, suppose a batch of 100 deep freezers is put on test and it is specified that the test will terminate after 50 failures. The data for the 50 deep freezers that do not fail are failure censored. Although Type-II censoring experiments are less common in practice, the statistical properties of estimates from failure-censored data are simpler than the corresponding properties from time-censored data (Meeker and Escobar, 1998).

Usually, it is assumed that censoring and lifetimes or survival times are independent of each other. This type of censoring is known as independent or non-informative censoring. For example, if a deep freezer is removed from the experiment because it is sold, then the censoring of its failure time is independent of its survival time. Finally, an item may need to be removed from the test because of a failure that is not of interest; for example, the experimenter may damage it accidentally, so that it can no longer be tested (Hamada et al., 2008). This is an example of random censoring, which, in this example, produces a right-censored failure time. Censoring is discussed in more detail in Nelson (1990), Meeker and Escobar (1998), Kalbfleisch and Prentice (2002), Lawless (2003), Tableman and Kim (2004), Sun (2006), Hamada et al. (2008), Wu and Hamada (2009), and Kleinbaum and Klein (2012).

1.9.2.1 Likelihood and Censoring

Since Bayesian modeling of reliability data uses the observed response in its analyses, censoring mechanisms are easily addressed. Suppose we observe that an item has failed at or before the time of the first inspection, which means that its lifetime is left censored and lies in the interval [0, t]. Thus, the probability of observing a failure in this interval is

P(T ≤ t) = ∫_0^t f(x) dx = F(t).

The likelihood function is equal to, or approximately proportional to, the probability of the data (Meeker and Escobar, 1998). That is why this probability F(t) is the left-censored failure time's contribution to the likelihood function for estimating the parameters of f(·) (or the model); the cause of the censoring does not matter. Assuming, for Bayesian modeling of censored data, that all the observations are independent, the likelihood function is the product of all their contributions.

Table 1.1 gives the contributions to the likelihood function for the different types of censored data, and Figure 1.5 illustrates the intervals of uncertainty for examples of left-censored, interval-censored, and right-censored data. An advantage of the Bayesian approach is that only the censoring pattern, e.g., a right-censored failure time, is relevant, not which censoring scheme, such as Type-I, Type-II, or random censoring, produced it (Hamada et al., 2008). The general form of the likelihood function is given in the following notation.

Our interest in the likelihood function here is to fit a Bayesian model to censored data. To analyse such data, let y represent the observed response for all the data, including both uncensored and censored observations, let y_un represent the response for the uncensored data, and let y_cen represent the response for the censored data.

Type of censoring                      Characteristic         Contribution
Uncensored                             T = t                  f(t)
Left censored at t_i                   T ≤ t_i                F(t_i)
Interval censored in t_{i−1} and t_i   t_{i−1} ≤ T ≤ t_i      F(t_i) − F(t_{i−1})
Right censored at t_i                  T > t_i                1 − F(t_i)

Table 1.1: Contributions to the likelihood function for lifetime data.


Figure 1.5: The likelihood contributions for different kinds of censoring.

Then the likelihood for all the data y = (y_un, y_cen), assuming that the failure times are conditionally independent given θ, the vector of model parameters, and that the censoring mechanism is independent of the failure times, is

p(y|θ) = p(y_un|θ) × p(y_cen|θ). (1.28)

The likelihood function based on censored data looks like the usual likelihood function, but we have to add the information given by the censored data. All the censored data used in this thesis are right-censored. For this reason, the likelihood, as for the roller bearing failure time data discussed in Chapter 2, will be of the form

p(y|θ) = ∏_{i=1}^{n} p(y_un,i|θ) × ∏_{j=1}^{m} p(y_cen,j|θ), (1.29)

where n and m are the numbers of uncensored and censored observations, respectively,

and y_un,i and y_cen,j are the responses for the ith uncensored and jth censored observations,

respectively. It may be noted that for right-censored data, p(y_cen,j|θ) is nothing but 1 − F(y_cen,j) = R(y_cen,j). Contributions to the likelihood function for survival time data under other categories of censoring, such as Type-I, Type-II, or random censoring, can be found in detail in Martz and Waller (1982), Nelson (1990), Zacks (1992), Meeker and Escobar (1998), Kalbfleisch and Prentice (2002), Lawless (2003), Klein and Moeschberger (2003), Tableman and Kim (2004), Sun (2006), Marshall and Olkin (2007), Hamada et al. (2008), Ntzoufras (2009), Wu and Hamada (2009), and Kleinbaum and Klein (2012).

An appealing feature of the Bayesian approach for handling censored data is that once the likelihood function is specified, we need only choose appropriate prior distributions for the model parameters θ and approximate their joint posterior density using MCMC. Unlike classical methods, which have different methods for each type of censored data, the Bayesian method provides a common framework to analyse all types of censored data (Hamada et al., 2008).
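As a hedged sketch of Equation (1.29), the following R code evaluates the log-likelihood for right-censored data under an assumed Weibull lifetime model; the names time and status are illustrative, with status equal to 1 for an observed failure and 0 for a right-censored unit.

## Sketch of the right-censored likelihood of Equation (1.29) on the log scale,
## for an assumed Weibull lifetime model. 'time' holds the observed times and
## 'status' is 1 for an observed failure, 0 for a right-censored unit.
loglik_weibull_cens <- function(shape, scale, time, status) {
  contrib <- ifelse(status == 1,
    dweibull(time, shape, scale, log = TRUE),                        # f(y) for failures
    pweibull(time, shape, scale, lower.tail = FALSE, log.p = TRUE))  # R(y) for censored units
  sum(contrib)
}

## Toy data: the last two units were still running when the test stopped
time   <- c(1.2, 0.8, 2.5, 3.0, 3.0)
status <- c(1,   1,   1,   0,   0)
loglik_weibull_cens(shape = 1.5, scale = 2, time = time, status = status)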

1.10 Tools and Techniques

The technical problem of evaluating quantities required for Bayesian inference typically reduces to the calculation of a ratio of two integrals (Bernardo and Smith, 2000, p. 339). In all cases, the technical key to the implementation of the formal solution given by Bayes' theorem is the ability to perform a number of integrations (Bernardo and Smith, 2000, p. 340). Except in certain rather stylized problems, the required integrations will not be feasible analytically, and thus efficient approximation strategies are required (Statisticat LLC, 2013). There are many different types of numerical approximation algorithms in Bayesian theory. An incomplete list of broad categories of Bayesian numerical approximation includes asymptotic approximation methods, such as the normal and Laplace approximations, direct simulation, and Markov chain Monte Carlo simulation methods.

1.10.1 Asymptotic Approximation Methods

Many simple Bayesian analyses based on non-informative prior distributions give similar results to standard non-Bayesian approaches, for example, the posterior t-interval for the normal mean with unknown variance. The extent to which a non- informative prior distribution can be justified as an objective assumption depends on the amount of information available in the data; in the simple cases as the sample size n increases, the influence of the prior distribution on posterior inference decreases. These ideas are sometimes referred to as asymptotic approximation theory because they refer to properties that hold in the limit as n becomes large.

This section begins with a discussion of the normal and Laplace approximations for the posterior distribution. Applications are discussed in the remaining chapters.

1.10.1.1 The Normal Approximation

If the posterior distribution p(θ|y) is unimodal and roughly symmetric, it is often convenient to approximate it by a normal distribution centered at the mode, that is, the logarithm of the posterior density function is approximated by a quadratic function.

A Taylor series expansion of log p(θ|y) centered at the posterior mode θ̂ gives

" 2 # ˆ 1 ˆ T d ˆ log p(θ|y) = log p(θ|y) + (θ − θ) 2 log p(θ|y) (θ − θ) + ..., (1.30) 2 dθ θ=θˆ where the linear term in the expansion is zero because the log-posterior density has zero derivatives at its mode, the remainder terms of higher order fade in importance relative to the quadratic term when θ is close to θˆ and n is large. Considering Equation (1.30) as a function of θ, the first term is constant, whereas the second term is proportion to the logarithm of a normal density, yielding the 1.10. Tools and Techniques 39

approximation

p(θ|y) ≈ N(θ̂, [I(θ̂)]^{-1}), (1.31)

where I(θ) is the observed information, which is the negative of the Hessian,

I(θ) = −d² log p(θ|y)/dθ².

If the mode θ̂ is in the interior of the parameter space, then I(θ̂) is positive; if θ is a vector parameter, then I(θ) is a matrix (e.g., Gelman et al., 2004).

Note that the normal approximation does not impose any condition on the parameter. It can be applied to any transformation φ = t(θ). This leads to a normal posterior with mean given by the posterior mode of φ and variance given by minus the inverse of the Hessian of the log posterior of φ evaluated at the mode. If the prior influence is discarded, the normal approximation for the posterior of φ becomes φ ∼ N(φ̂, I^{-1}(φ̂)).

The normal approximation ignores skewness and secondary modes and it will only work well if the posterior is similar in shape to a normal distribution. When this does not apply, it is still possible to transform the parameter to a more adequate space. There are no optimal rules but good practical suggestions, for example, there are re-parameterizations that allow the parameter to vary over the real line and those leading to constant (Jeffreys) non-informative priors.
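The R sketch below, added as an illustration, obtains the normal approximation of Equation (1.31) numerically: optim() maximizes the log posterior (by minimizing its negative) and the inverse Hessian at the mode gives the approximate posterior variance. The Poisson-Gamma example and the re-parameterization φ = log(θ) are assumptions made purely for this sketch.

## Sketch of the normal approximation (Equation 1.31): minimize the negative log
## posterior with optim(), and invert the Hessian at the mode to obtain the
## approximate posterior variance. Illustrative Poisson-Gamma example with the
## re-parameterization phi = log(theta), so phi varies over the whole real line.
y <- c(3, 5, 2, 4, 6)
neg_log_post <- function(phi) {
  theta <- exp(phi)
  -(sum(dpois(y, lambda = theta, log = TRUE)) +
      dgamma(theta, shape = 2, rate = 1, log = TRUE) + phi)   # phi term is the log-Jacobian
}

fit <- optim(par = 0, fn = neg_log_post, method = "BFGS", hessian = TRUE)
phi_hat <- fit$par                        # posterior mode of phi
phi_sd  <- sqrt(1 / fit$hessian[1, 1])    # [I(phi_hat)]^(-1) in one dimension
c(mode = phi_hat, sd = phi_sd)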

1.10.1.2 The Laplace Approximation

The integrals involved in Bayesian analysis are often quite complex. Familiar numerical methods such as Simpson's rule and Gaussian quadrature can be difficult to apply in a multivariate setting. Often an analytic approximation such as Laplace's method, based on the normal distribution, can be used to provide an adequate approximation to such integrals. In order to evaluate the integral E(h(θ)|y) = ∫ h(θ)p(θ|y)dθ using Laplace's method, first express the integrand

in the form exp[log(h(θ)p(θ|y))] and then expand log(h(θ)p(θ|y)) as a function of θ in a quadratic Taylor series approximation around its mode. The resulting approximation for h(θ)p(θ|y) is proportional to a (multivariate) normal density in θ, and its integral is just

E(h(θ)|y) ≈ h(θ₀) p(θ₀|y) (2π)^{d/2} |−u″(θ₀)|^{-1/2},

where d is the dimension of θ, u(θ) = log(h(θ)p(θ|y)), and θ₀ is the point at which u(θ) is maximized.

Suppose −h(θ) is a smooth, bounded, unimodal function with a maximum at θ̂, and θ is a scalar. By Laplace's method (Tierney and Kadane, 1986), the integral

I = ∫ f(θ) exp[−n h(θ)] dθ

can be approximated by

Î = f(θ̂) √(2π/n) σ exp[−n h(θ̂)],

where

σ = [∂²h/∂θ² |_{θ̂}]^{-1/2}.

As presented in Mosteller and Wallace (1964), Laplace's method expands about θ̂ to obtain:

I ≈ f(θ̂) ∫ exp[−n(h(θ̂) + (θ − θ̂)h′(θ̂) + ((θ − θ̂)²/2) h″(θ̂))] dθ.

Recalling that h′(θ̂) = 0, we have

I ≈ f(θ̂) ∫ exp[−n(h(θ̂) + ((θ − θ̂)²/2) h″(θ̂))] dθ
  = f(θ̂) exp[−n h(θ̂)] ∫ exp[−n(θ − θ̂)²/(2σ²)] dθ
  = f(θ̂) √(2π/n) σ exp[−n h(θ̂)].

Intuitively, if exp[−n h(θ)] is very peaked about θ̂, then the integral can be well approximated by the behavior of the integrand near θ̂. More formally, it can be shown that

I = Î [1 + O(1/n)].

To calculate moments of posterior distributions, one needs to evaluate expressions such as

E[g(θ)] = ∫ g(θ) exp[−n h(θ)] dθ / ∫ exp[−n h(θ)] dθ, (1.32)

where exp[−n h(θ)] = L(θ|y)p(θ). Tierney and Kadane (1986) provide two approximations to E[g(θ)]. The term n refers to the sample size (e.g., Tanner, 1996).

Result 1. E[g(θ)] = g(θ̂)[1 + O(1/n)].

To see this, apply Laplace's method to the numerator of Equation (1.32) with f = g to obtain the approximation

g(θ̂) √(2π/n) σ exp[−n h(θ̂)].

Next, apply Laplace's method to the denominator of Equation (1.32) with f = 1 to obtain the approximation

√(2π/n) σ exp[−n h(θ̂)].

Tierney and Kadane (1986) show that the resulting ratio g(θ̂) has error O(1/n). Mosteller and Wallace (1964) present related results.

Result 2.

E[g(θ)] = (σ*/σ) × (exp[−n h*(θ*)] / exp[−n h(θ̂)]) × [1 + O(1/n²)].

To see this, first apply Laplace's method to the numerator of Equation (1.32) with f = 1, g positive, and −n h*(θ) = −n h(θ) + log(g(θ)), where θ* is the mode of −h*(θ) and

σ* = [∂²h*/∂θ² |_{θ*}]^{-1/2}.

Next, apply Laplace's method to the denominator with f = 1. Tierney and Kadane (1986) show that the resulting ratio has error O(1/n²). Again, Mosteller and Wallace (1964) present related results.

For multivariate θ,

Σ* = [∂²h*/∂θ² |_{θ*}]^{-1},   Σ = [∂²h/∂θ² |_{θ̂}]^{-1},

and

E[g(θ)] = (det Σ*/det Σ)^{1/2} × (exp[−n h*(θ*)] / exp[−n h(θ̂)]) × [1 + O(1/n²)].
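The following R sketch, again purely illustrative, applies Result 2 to approximate the posterior mean E[g(θ)] with g(θ) = θ in a toy Poisson-Gamma model: each of the two integrals is approximated by Laplace's method using optim(), and their ratio gives the estimate. All names and values are assumptions made for the sketch.

## Illustrative sketch of Result 2 (Tierney-Kadane) for E[g(theta)] with
## g(theta) = theta in a toy Poisson-Gamma model: -n*h(theta) is the
## unnormalized log posterior, and -n*h*(theta) = -n*h(theta) + log g(theta).
y <- c(3, 5, 2, 4, 6)
log_q  <- function(theta) sum(dpois(y, theta, log = TRUE)) +
                          dgamma(theta, shape = 2, rate = 1, log = TRUE)
log_qs <- function(theta) log_q(theta) + log(theta)

laplace_bits <- function(logf) {      # maximum value and sigma for one integrand
  fit <- optim(1, function(x) -logf(x), method = "Brent",
               lower = 1e-6, upper = 100, hessian = TRUE)
  list(max = -fit$value, sigma = sqrt(1 / fit$hessian[1, 1]))
}
num <- laplace_bits(log_qs)
den <- laplace_bits(log_q)
(num$sigma / den$sigma) * exp(num$max - den$max)   # approximate posterior mean of theta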

1.10.2 Simulation Methods

Simulation provides a straightforward way of approximating probabilities or inte- grals. The following two sections provide an introduction concerning the use of simulation methods in Bayesian inference. The focus is on direct simulation and Markov chain Monte Carlo simulation methods that are widely used in Bayesian inference.

1.10.3 Direct Simulation Methods

From the form of the posterior distribution (i.e., Equation 1.4), it is noted that the posterior distribution can easily become intractable and lead to an unrecognized distribution. This created a burden in the early stages of the development of Bayesian modeling. Nowadays, with the availability of computing tools, one can make

statistical inferences from the posterior distribution through simulation. To do so, a random sample θ₁, . . . , θₙ of size n is drawn from the posterior distribution p(θ|y). From these samples, for any function g of the parameter θ, the posterior mean E[g(θ)|y] can be estimated by Σᵢ g(θᵢ)/n, and the posterior distribution of g(θ) can be approximated by a histogram of the values g(θ₁), . . . , g(θₙ). Thus, with a suitable choice of g(θ), any statistical inference can be made from this sample.

Moreover, simulation forms a central part of much applied Bayesian analysis, because of the relative ease with which samples can often be generated from a probability distribution, even when the density function cannot be explicitly integrated. Another advantage of simulation is that extremely large or small simulated values often flag a problem with model specification or parametrization that might not be noticed if estimates and probability statements were obtained in analytic form (Gelman et al., 2014).

The challenge of this simulation approach is to find a viable way to draw samples from the posterior density p(θ|y). There are many methods to do so, and here we briefly describe some of the more important ones used in this thesis for Bayesian modeling of reliability data. These are

1. Monte Carlo Approximation

2. Rejection Sampling

3. Importance Sampling

4. Sampling Importance Resampling

1.10.3.1 Monte Carlo Approximation Method

Using appropriate functions of random numbers, one generates a large sample of instances of the random variable and empirically estimates the property of interest from the sample. This method is popularly known as Monte Carlo approximation. The

name derives from the famous casino resort of Monte Carlo in Monaco. Monte Carlo basically means simulating random samples, and its implementation does not require a deep knowledge of calculus or numerical analysis. In the past, owing to the non-availability of fast algorithms for generating random samples, simulation was quite tedious. However, with the development of modern computers and computing algorithms, Monte Carlo methods have experienced explosive growth. Monte Carlo methods are now standard in Bayesian statistics and across the sciences for solving many types of problems.

Mathematically, suppose θ has a posterior density p(θ|y) and we are interested in learning about a particular function of the parameter, g(θ). The posterior mean of g(θ) is given by

E[g(θ)|y] = ∫ g(θ) p(θ|y) dθ.

Suppose we are able to simulate an independent sample θ^1, . . . , θ^m from the posterior density. The law of large numbers says that if θ^1, . . . , θ^m are iid samples from p(θ|y), then the Monte Carlo estimate of the posterior mean satisfies

E[g(θ)|y] ≈ (1/m) Σ_{j=1}^{m} g(θ^j), as m → ∞.

This implies that, as m → ∞ (Hoff, 2009),

• θ̄ = Σ_{j=1}^{m} θ^j / m → E[θ|y];

• Σ_{j=1}^{m} (θ^j − θ̄)² / (m − 1) → V[θ|y];

• #{θ^j ≤ c} / m → P(θ ≤ c|y);

• the empirical distribution of {θ^1, . . . , θ^m} → p(θ|y);

• the median of {θ^1, . . . , θ^m} → θ_{1/2};

• the α-percentile of {θ^1, . . . , θ^m} → θ_α.

The Monte Carlo approach is an effective method for summarizing a posterior when simulated samples are available from the exact posterior distribution.
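A small illustrative R example (assuming, for simplicity, that the exact posterior is a Gamma distribution from which we can draw directly) shows how these Monte Carlo summaries are computed in practice; the distribution and the function g are assumptions made for the sketch.

## Sketch of Monte Carlo approximation: when exact posterior draws are available
## (here, an illustrative Gamma(22, 6) posterior for a rate theta), posterior
## summaries of any function g(theta) are simple sample averages.
set.seed(1)
theta <- rgamma(10000, shape = 22, rate = 6)   # draws from the assumed exact posterior
g <- 1 / theta                                 # any function of interest

c(post_mean_g = mean(g),                       # E[g(theta) | y]
  post_sd_g   = sd(g),
  prob        = mean(theta <= 3),              # P(theta <= 3 | y)
  median      = median(theta),
  q95         = quantile(theta, 0.95))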

1.10.3.2 Rejection Sampling

Many general techniques are available for simulating draws from the target density, the posterior distribution p(θ|y). One of them is rejection sampling. Rejection sampling is a general-purpose algorithm for simulating random draws from a given probability distribution. In this setting, suppose we wish to produce an independent sample from a posterior distribution p(θ|y) where the normalizing constant may not be known. The first step in rejection sampling is to find another probability density q such that:

• It is easy to simulate draws from q.

• The density q resembles the posterior density of interest p in terms of location, shape, and spread.

• For all θ and a constant c, p(θ|y) ≤ c q(θ).

Now, suppose we are able to find density q with these properties. Then one obtains draws from p using the following accept/reject algorithm:

1. Independently simulate θ from q and a uniform random variable U on the unit interval.

2. If U ≤ p(θ|y)/c q(θ), then accept θ as a draw from the density p; otherwise reject θ.

3. Continue Steps 1 and 2 of the algorithm until one has collected a sufficient number of accepted θ.

Rejection sampling is one of the most useful methods for simulating draws from a variety of distributions, and standard methods for simulating from standard

probability distributions such as the normal, gamma, and beta are typically based on rejection sampling. The main task in designing a rejection sampling algorithm is finding a suitable proposal density q and a constant value c. At Step 2 of the algorithm, the probability of accepting a candidate draw is given by p(θ|y)/[c q(θ)]. One can monitor the algorithm by computing the proportion of draws from q that are accepted; an efficient rejection sampling algorithm has a high acceptance rate (Albert, 2009).
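The R sketch below illustrates the accept/reject algorithm for a target known only up to a normalizing constant; the unnormalized Beta(3, 5) target and the Uniform(0, 1) proposal are assumptions made purely for illustration.

## Sketch of rejection sampling for a target known only up to a constant.
## Illustrative target: an unnormalized Beta(3, 5) density; proposal q is
## Uniform(0, 1), with c chosen so that p(theta) <= c * q(theta) everywhere.
set.seed(1)
p_unnorm <- function(theta) theta^2 * (1 - theta)^4                 # unnormalized target
c_bound  <- optimize(p_unnorm, c(0, 1), maximum = TRUE)$objective   # bound for p/q with q = 1

n_draws    <- 10000
theta_prop <- runif(n_draws)                                        # step 1: draws from q
u          <- runif(n_draws)
accepted   <- theta_prop[u <= p_unnorm(theta_prop) / c_bound]       # step 2: accept/reject

length(accepted) / n_draws   # acceptance rate; accepted values are draws from Beta(3, 5)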

1.10.3.3 Importance Sampling

Let us return to the basic problem of computing an integral in Bayesian statistics. In many situations, the normalizing constant of the posterior density p(θ|y) will be unknown, so the posterior mean of the function g(θ) will be given by the ratio of integrals

E[g(θ)|y] = ∫ g(θ) p(θ) p(y|θ) dθ / ∫ p(θ) p(y|θ) dθ, (1.33)

where p(θ) is the prior and p(y|θ) is the likelihood function. If we were able to simulate a sample θ^j directly from the posterior density p, then we could approximate this expectation by the Monte Carlo method. In the case where we are not able to generate a sample directly from p, suppose instead that we can construct a probability density q from which we can simulate and that approximates the posterior density p.

Rewrite the posterior mean as

E[g(θ)|y] = ∫ g(θ) [p(θ)p(y|θ)/q(θ)] q(θ) dθ / ∫ [p(θ)p(y|θ)/q(θ)] q(θ) dθ
          = ∫ g(θ) w(θ) q(θ) dθ / ∫ w(θ) q(θ) dθ, (1.34)

where the factor w(θ) = p(θ)p(y|θ)/q(θ) is the weight function, called the importance weight or importance ratio (Albert, 2009). If θ^1, . . . , θ^m is a simulated sample from

the approximate density q, then the importance sampling estimate of the posterior mean is

ḡ_IS = Σ_{j=1}^{m} g(θ^j) w(θ^j) / Σ_{j=1}^{m} w(θ^j). (1.35)

This is called an importance sampling estimate because we are sampling values of θ that are important in computing the integrals in the numerator and denominator. The simulated standard error of an importance sampling estimate is estimated by

se_{ḡIS} = √( Σ_{j=1}^{m} [(g(θ^j) − ḡ_IS) w(θ^j)]² ) / Σ_{j=1}^{m} w(θ^j). (1.36)

As in rejection sampling, the main issue in designing a good importance sampling estimate is finding a suitable sampling density q. This density should be of a familiar functional form so that simulated draws are available. The density should mimic the posterior density p and have relatively flat tails so that the weight function w(θ) is bounded from above. One can monitor the choice of q by inspecting the values of the simulated weights w(θ^j). If there are no unusually large weights, then it is likely that the weight function is bounded and the importance sampler is providing a suitable estimate. Note that it is generally advisable to use the same set of random draws for both the numerator and denominator in order to reduce the sampling error in the estimate.
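A minimal R illustration of Equations (1.35) and (1.36), assuming an unnormalized Beta(3, 5) target and a Uniform(0, 1) proposal, is given below.

## Sketch of the importance sampling estimate (Equation 1.35) of a posterior mean.
## Illustrative target: unnormalized Beta(3, 5) posterior; proposal q is Uniform(0, 1).
set.seed(1)
m     <- 10000
theta <- runif(m)                                     # draws from the proposal q
w     <- theta^2 * (1 - theta)^4 / dunif(theta)       # importance weights p(theta)p(y|theta)/q(theta)

g_hat <- sum(theta * w) / sum(w)                      # estimate of E[theta | y], Equation (1.35)
se    <- sqrt(sum(((theta - g_hat) * w)^2)) / sum(w)  # simulated standard error, Equation (1.36)
c(estimate = g_hat, sim_se = se)                      # true posterior mean is 3/8 for Beta(3, 5)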

1.10.3.4 Sampling Importance Resampling

In rejection sampling, we simulate draws from a proposal density q and accept a subset of these values to be distributed according to the posterior density of interest, p(θ|y). There is an alternative method of obtaining a simulated sample from the posterior density p motivated by the importance sampling algorithm.

As before, we simulate m draws from the proposal density q, denoted by θ^1, . . . , θ^m, and compute the weights w(θ^j) = p(θ^j|y)/q(θ^j). Convert the weights to

probabilities by using the formula

q^j = w(θ^j) / Σ_{j=1}^{m} w(θ^j). (1.37)

Let us take a new sample θ*^1, . . . , θ*^m from the discrete distribution over θ^1, . . . , θ^m with respective probabilities q^1, . . . , q^m. Then the θ*^j will be approximately distributed according to the posterior distribution p. This method, called sampling importance resampling (SIR), is a weighted bootstrap procedure in which we sample with replacement from the sample {θ^j} with unequal sampling probabilities (Albert, 2009). This sampling algorithm is straightforward to implement in R, as the sketch below shows.
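The following R sketch, continuing the illustrative Beta(3, 5) example, resamples the proposal draws with probabilities given by Equation (1.37).

## Sketch of sampling importance resampling (SIR): resample the proposal draws
## with probabilities proportional to their importance weights (Equation 1.37).
## Continues the illustrative Beta(3, 5) target with a Uniform(0, 1) proposal.
set.seed(1)
m     <- 10000
theta <- runif(m)                                   # draws from the proposal q
w     <- theta^2 * (1 - theta)^4                    # importance weights (q = 1)
probs <- w / sum(w)                                 # Equation (1.37)

theta_star <- sample(theta, size = m, replace = TRUE, prob = probs)  # resampled draws
c(mean = mean(theta_star), sd = sd(theta_star))     # approximate posterior summaries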

1.10.4 Markov Chain Monte Carlo Methods

The simulation methods described in the previous section, also called methods of direct simulation, cannot be applied in all cases. They refer mostly to unidimensional distributions. Moreover, some of them are focused on the effective computation of specific integrals and cannot be used to obtain samples from any posterior distribution of interest (Givens and Hoeting, 2005, p. 183). Simulation techniques based on Markov chains overcome such problems because of their generality and flexibility. These characteristics, along with the massive development of computing facilities, have made Markov chain Monte Carlo (MCMC) methods popular since the early 1990s. MCMC methods were introduced into physics in 1953 in a simplified version by Metropolis and his associates (Metropolis et al., 1953). Intermediate landmark publications include the generalization of the Metropolis algorithm by Hastings (1970) and the development of the Gibbs sampler by Geman and Geman (1984). Nevertheless, it took about 35 years until MCMC methods were rediscovered by Bayesian scientists (Tanner and Wong, 1987; Gelfand et al., 1990; Gelfand and Smith, 1990) and became one of the main computational tools in modern statistical inference.

Markov chain Monte Carlo methods have enabled quantitative researchers to use highly complicated models and to estimate the corresponding posterior distributions with accuracy. In this way, MCMC methods have greatly contributed to the development and propagation of Bayesian theory. MCMC methods are based on the construction of a Markov chain that eventually “converges” to the target distribution (called the stationary or equilibrium distribution), which in our case is the posterior distribution p(θ|y). This is the main way to distinguish MCMC algorithms from direct simulation methods, which provide samples directly from the target (posterior) distribution. Moreover, MCMC produces a dependent sample, since it is generated from a Markov chain, in contrast to the output of direct methods, which is an independent sample. MCMC methods incorporate the notion of an iterative procedure; for this reason they are frequently called iterative methods, since in every step they produce values that depend on the previous one.

A Markov chain is a stochastic process or a sequence of dependent random variables {θ1, θ2, . . . , θT } such that

p(θt+1|θt, θt−1, . . . , θ1) = p(θt+1|θt),

that is, the distribution of θ at sequence t + 1 given all the preceding θ values for times t, t − 1,..., 1 depends only on the value θt of the previous sequence t. Moreover, p(θt+1|θt) is independent of time t. Finally, when the Markov chain is irreducible, aperiodic, and positive recurrent, as t → ∞ the distribution of θt converges to its equilibrium distribution, which is independent of the initial values of the chain θ0. More details can be seen in Gilks et al.(1996).

In order to generate a sample from p(θ|y), one must construct a Markov chain with two desired properties:

• p(θt+1|θt) should be “easy to generate from”, and

• the equilibrium distribution of the selected Markov chain must be the posterior distribution of interest p(θ|y).

Assuming that we can construct a Markov chain with these requirements, the procedure is as follows:

1. Select an initial value θ0.

2. Generate T values until the equilibrium distribution is reached.

3. Monitor the convergence of the algorithm using convergence diagnostics. If the convergence diagnostics fail, then generate more observations.

4. Discard the first S observations and consider {θ^{S+1}, θ^{S+2}, . . . , θ^T} as the sample for the posterior analysis.

5. Plot the posterior distribution. Usually, focus is on the univariate marginal distributions.

6. Finally, obtain summaries of the posterior distribution, that is, the mean, median, standard deviation, quantiles, correlations, etc.

The two most popular MCMC methods are the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) and Gibbs sampling (Geman and Geman, 1984). The Independent Metropolis and the Random-walk Metropolis are two special cases of the Metropolis-Hastings algorithm. Moreover, many variants and extensions of these algorithms have been developed. Although they are based on the principles of the original algorithms, most of these algorithms are more advanced and complicated than the original ones and usually focus on specific problems. Some important more recent developments reported in the MCMC literature are the Slice Sampler (Higdon, 1998; Damien et al., 1999; Neal, 2003), the Reversible Jump MCMC algorithm (Green, 1995), and Perfect Sampling (Propp and Wilson, 1996; Møller, 1999). In the following subsections, the focus is on the most popular methods. Additional information concerning specific MCMC methods can be found in Gilks et al. (1996), Givens and Hoeting (2005), Gamerman and Lopes (2006), Ntzoufras (2009), and Robert and Casella (2010).

1.10.4.1 Metropolis-Hastings Algorithm

Metropolis et al. (1953) originally formulated the Metropolis algorithm, introducing Markov-chain-based simulation methods into science. Later, Hastings (1970) generalized the original method, which became popular as the Metropolis-Hastings (MH) algorithm. The original Metropolis algorithm involved the use of a symmetric proposal distribution; the Metropolis-Hastings algorithm extends it to situations where the proposal distribution is not symmetric. This algorithm is considered to be the general formulation of all MCMC methods. Green (1995) further generalized the Metropolis-Hastings algorithm by introducing the Reversible Jump Metropolis-Hastings algorithm for sampling from parameter spaces of different dimensions.

Let us consider a target density p(θ|y) from which we wish to generate a sample of size T. The Metropolis-Hastings algorithm can be described by the following iterative steps, where θ^t is the vector of simulated values at iteration t of the algorithm:

1. Set initial values θ0.

2. For t = 1, 2,...,T repeat the following steps

• Simulate a new candidate parameter value θ* from a proposal density q(θ*|θ^{t−1}).

• Compute α = min(1, [p(θ*|y) q(θ^{t−1}|θ*)] / [p(θ^{t−1}|y) q(θ*|θ^{t−1})]).

• Update θ^t = θ* with acceptance probability α; otherwise set θ^t = θ^{t−1}.

The sequence of simulated draws θ^1, θ^2, . . . , θ^T produced by the Metropolis-Hastings algorithm will converge to the target density regardless of which proposal distribution q is selected. Nevertheless, in practice, the choice of the proposal is important, because a poor choice will considerably delay convergence towards the target, the posterior distribution.

The two special cases of Metropolis-Hastings algorithm are the Independent Metropolis and the Random-walk Metropolis, which depend upon the choice of the proposal density. These frequently used algorithm adaptations are described below.

1.10.4.2 The Independent Metropolis

Different Metropolis-Hastings algorithms are constructed depending on the choice of proposal density. The Metropolis-Hastings algorithm described in the previous section allows a proposal distribution q that depends only on the previous state θ^{t−1} of the chain. If we now require the proposal distribution q to be independent of this previous state of the chain, that is,

q(θ∗|θt−1) = q(θ∗),

then we get a special case of the Metropolis-Hastings algorithm called the Independent Metropolis-Hastings, or simply the Independent Metropolis (IM), algorithm.

In this situation, the acceptance probability is given by

α = min(1, [p(θ*|y) q(θ^{t−1})] / [p(θ^{t−1}|y) q(θ*)]), (1.38)

which can also be expressed as

α = min(1, w(θ*|y) / w(θ^{t−1}|y)), (1.39)

where w(θ|y) = p(θ|y)/q(θ) is the ratio of the target (posterior) and the proposal distributions, and is equivalent to the importance weight.

The IM algorithm is efficient when the proposal q is a good approximation of the posterior distribution p(θ|y). A good independent proposal can be based on the Laplace approximation (Tierney and Kadane, 1986; Tierney et al., 1989; Erkanli, 1994). According to Ntzoufras (2009), the acceptance rate in the case of the independent Metropolis must be high to obtain an efficient algorithm; a high acceptance rate indicates that the proposal is a good approximation of the target distribution.

1.10.4.3 The Random-walk Metropolis

A more natural approach to the practical construction of a Metropolis proposal, different from the independent Metropolis, which ignores information from the previous state of the chain, is to take the previously simulated value into account when generating the next value of the chain. The algorithm based on this approach is called the Random-walk Metropolis (RWM) algorithm, which differs slightly from the original Metropolis algorithm (Metropolis et al., 1953). In the original Metropolis algorithm, only symmetric proposals of the type q(θ*|θ^{t−1}) = q(θ^{t−1}|θ*) were considered. In the Random-walk Metropolis, as a special case, the proposal density is defined to be of the form

q(θ∗|θt−1) = q(|θ∗ − θt−1|), (1.40) which is symmetric about the origin. In this case of random-walk Metropolis, the acceptance probability is of the form

α = min(1, p(θ*|y) / p(θ^{t−1}|y)). (1.41)

Notice that, in both the original and the random-walk Metropolis algorithms, the acceptance probability depends only on the target distribution. Moreover, in contrast to the independent Metropolis, where the acceptance rate must be high to obtain an efficient algorithm, in the random-walk Metropolis the optimal acceptance rate, according to Roberts et al. (1997) and Neal and Roberts (2008), is around 25%, varying from 0.23 for large dimensions to 0.45 for the univariate case.
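A minimal random-walk Metropolis sampler in R is sketched below; the Poisson-Gamma log posterior (reused from the earlier illustrative sketches), the log-scale parameterization, and the proposal standard deviation are all assumptions made for illustration.

## Sketch of a random-walk Metropolis sampler for a univariate log posterior.
## 'log_post' returns the log of the unnormalized posterior; the Poisson-Gamma
## toy example is sampled on the log scale so the proposal is unconstrained.
set.seed(1)
y <- c(3, 5, 2, 4, 6)
log_post <- function(phi) {
  theta <- exp(phi)
  sum(dpois(y, theta, log = TRUE)) + dgamma(theta, 2, 1, log = TRUE) + phi
}

rwm <- function(log_post, init, n_iter, sd_prop = 0.3) {
  draws <- numeric(n_iter)
  current <- init
  for (t in 1:n_iter) {
    prop  <- rnorm(1, mean = current, sd = sd_prop)           # symmetric proposal
    alpha <- min(1, exp(log_post(prop) - log_post(current)))  # Equation (1.41)
    if (runif(1) < alpha) current <- prop
    draws[t] <- current
  }
  draws
}

phi_draws <- rwm(log_post, init = 0, n_iter = 5000)
mean(exp(phi_draws[-(1:1000)]))   # posterior mean of theta after burn-in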

1.10.4.4 Gibbs Sampling

Gibbs sampling was introduced in the landmark paper of Geman and Geman (1984), which first applied a Gibbs sampler to a Gibbs random field. The work of Geman and Geman (1984), built on that of Metropolis et al. (1953), Hastings (1970), and Peskun (1973), influenced Gelfand and Smith (1990) to write a paper that sparked a new interest in Bayesian methods, statistical computing, algorithms, and stochastic processes through the use of computing algorithms such as the Gibbs sampling and Metropolis-Hastings algorithms.

Gibbs sampling is in fact a special case of the single-component Metropolis-Hastings algorithm that uses as proposal density q(θ*|θ^t) the full conditional posterior distribution

p(θ_j | θ_{−j}^{t−1}, y),

where θ_{−j} = (θ_1, . . . , θ_{j−1}, θ_{j+1}, . . . , θ_d)^T. Such proposal distributions result in an acceptance probability α = 1, and therefore the proposed move is accepted in all iterations. Although Gibbs sampling is a special case of the Metropolis-Hastings algorithm, it is usually cited as a separate simulation technique because of its popularity and convenience. An advantage of Gibbs sampling is that, in each step, random values must be generated only from unidimensional distributions, for which a wide variety of computational tools exists (Gilks et al., 1996). Frequently, these conditional distributions have a known form, and thus random numbers can easily be simulated using standard functions in statistical and computing software. Gibbs sampling always moves to new values and, most importantly, does not require the specification of proposal distributions. On the other hand, it can be ineffective when the parameter space is complicated or the parameters are highly correlated. The algorithm can be summarized by the following steps:

1. Set initial values θ0.

2. For t = 1, 2,...,T repeat the following steps

• For j = 1, 2, . . . , d, update θ_j from θ_j ∼ p(θ_j | θ_{−j}^{t−1}, y).

• Set θ^t equal to the newly generated values and save it as the simulated set of values at iteration t of the algorithm.

Hence, given a particular state of the chain θ^t, the new parameter values are simulated as

θ_1^t from p(θ_1 | θ_2^{t−1}, θ_3^{t−1}, . . . , θ_p^{t−1}, y),
θ_2^t from p(θ_2 | θ_1^t, θ_3^{t−1}, . . . , θ_p^{t−1}, y),
θ_3^t from p(θ_3 | θ_1^t, θ_2^t, θ_4^{t−1}, . . . , θ_p^{t−1}, y),
. . .
θ_j^t from p(θ_j | θ_1^t, . . . , θ_{j−1}^t, θ_{j+1}^{t−1}, . . . , θ_p^{t−1}, y),
. . .
θ_p^t from p(θ_p | θ_1^t, θ_2^t, . . . , θ_{p−1}^t, y).

Generating values from p(θ_j | θ_{−j}^{t−1}, y) = p(θ_j | θ_1^t, θ_2^t, . . . , θ_{j−1}^t, θ_{j+1}^{t−1}, . . . , θ_p^{t−1}, y) is relatively easy, since it is a univariate distribution and can be written as p(θ_j | θ_{−j}, y) ∝ p(θ|y), where all the variables except θ_j are held constant at their given values. A more detailed description of the Gibbs sampler is given by Casella and George (1992) and Smith and Roberts (1993), while early applications of Gibbs sampling are provided by Gelfand and Smith (1990) and Gelfand et al. (1990).
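As a hedged illustration of the algorithm, the R sketch below implements a Gibbs sampler for normal data with unknown mean and precision, where semi-conjugate priors make both full conditional distributions standard; the data, priors, and number of iterations are all illustrative.

## Sketch of a Gibbs sampler for normal data with unknown mean mu and precision
## tau, using semi-conjugate priors mu ~ N(0, 100) and tau ~ Gamma(0.01, 0.01),
## so both full conditional distributions have standard forms. Illustrative only.
set.seed(1)
y <- rnorm(30, mean = 5, sd = 2)
n <- length(y); m0 <- 0; v0 <- 100; a <- 0.01; b <- 0.01

n_iter <- 5000
mu <- numeric(n_iter); tau <- numeric(n_iter)
mu[1] <- mean(y); tau[1] <- 1 / var(y)

for (t in 2:n_iter) {
  # update mu from p(mu | tau, y): normal full conditional
  prec   <- n * tau[t - 1] + 1 / v0
  m_post <- (tau[t - 1] * sum(y) + m0 / v0) / prec
  mu[t]  <- rnorm(1, mean = m_post, sd = sqrt(1 / prec))
  # update tau from p(tau | mu, y): gamma full conditional
  tau[t] <- rgamma(1, shape = a + n / 2, rate = b + sum((y - mu[t])^2) / 2)
}

c(post_mean_mu = mean(mu[-(1:1000)]), post_mean_sd = mean(1 / sqrt(tau[-(1:1000)])))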

1.10.4.5 Metropolis-within-Gibbs

Nowadays, the wide range of computational tools available for generating random values from univariate distributions allows us to implement the Gibbs sampler in a variety of cases, even when the resulting conditional posterior distribution is cumbersome. Nevertheless, on some occasions, it is convenient to use simple Metropolis-Hastings steps to generate from these univariate conditional posterior

distributions. This approach is called the Metropolis within Gibbs (MWG) algorithm and it is simply a componentwise Metropolis-Hastings algorithm in which some components of the parameter vector are directly generated from the corresponding full conditional posterior distributions (Ntzoufras, 2009). This combination of Metropolis and Gibbs steps is frequently used in practice. In this way, the user can easily incorporate blocking, take advantage of specific easy-to-generate full conditionals, and generally build an efficient and flexible MCMC algorithm.

1.11 Software Packages Used

1.11.1 R Language

R is a system for statistical computation and graphics created by Ross Ihaka and Robert Gentleman (Ihaka and Gentleman, 1996). It provides, among other things, a programming language, high-level graphics, interfaces to other languages, and debugging facilities. R is both a piece of software and a language, considered a dialect of the S language, which was developed at AT&T Bell Laboratories in the 1980s by Rick Becker, John Chambers, and Allan Wilks (Becker et al., 1988), and has been in widespread use in the statistical community since. A comprehensive book on the R language is Chambers (2008). An important point about R is that it is an interpreted language, not a compiled one, meaning that all commands typed at the keyboard are executed directly, without the need to build a complete program as in most computer languages (C, Fortran, Pascal, ...). The language syntax has a superficial similarity with C, but the semantics are of the FPL (functional programming language) variety, with stronger affinities to Lisp and APL. In particular, it allows “computing on the language”, which in turn makes it possible to write functions that take expressions as input, something that is often useful for statistical modeling and graphics.

R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly and has been extended by a large collection of packages. Most of the programs written in R are essentially ephemeral, written for a single piece of data analysis. Moreover, R allows the user, for instance, to program loops to successively analyse several data sets. It is also possible to combine, in a single program, different statistical functions to perform more complex analyses.

An introduction to the R environment would be incomplete without mentioning statistics, yet many people use R as a modern statistics system. We prefer to think of it as an environment within which many statistical techniques have been implemented. A few of these are built into the base R system, and many more are supplied as packages. There are about 25 packages supplied with base R (called standard and recommended packages), and many more are available through the CRAN family of internet sites (via http://CRAN.R-project.org) and elsewhere.

R is freely distributed under the terms of the GNU General Public Licence (for more information: http://www.gnu.org/). Its development and distribution are carried out by several statisticians known as R Core Team (R Core Team, 2014).

R is available in several forms: the sources (written mainly in C, with some routines in Fortran), essentially for Unix and Linux machines, or pre-compiled binaries for Windows, Linux, and Macintosh. The files needed to install R, either from the sources or from the pre-compiled binaries, are distributed from the internet sites of the Comprehensive R Archive Network (CRAN), where the instructions for the installation are also available.

1.11.2 LaplacesDemon

LaplacesDemon, usually referred to as Laplace's Demon, is a contributed R package which provides a complete and self-contained Bayesian environment within R, and is freely available at http://www.bayesian-inference.com/software.

This package provides a facility to build any kind of statistical model with a user-specified model function. The model may be updated with iterative quadrature, Laplace Approximation, numerous MCMC algorithms, and PMC (Population Monte Carlo). After updating, a variety of facilities are available, including numerous MCMC diagnostics, Laplace Approximation with multiple optimization algorithms, various probability distributions, posterior predictive checks, a variety of plots, parameter and variable importance, Bayesian forms of test statistics (such as Durbin-Watson, Jarque-Bera, etc.), validation, and numerous additional utility functions, such as functions for multimodality, matrices, or timing of model specification. Laplace's Demon seeks to be generalizable and user friendly to Bayesians, especially Laplacians.

LaplacesDemon (Statisticat LLC, 2013) was designed without consideration for hard determinism, but instead with a lofty goal toward facilitating high-dimensional Bayesian inference, posing as its own intellect that is capable of impressive analysis. A LaplacesDemon analysis mainly goes through five steps:

• Data Creation
• Model Specification
• Initial Values
• Updating LaplacesDemon
• Summarizing and plotting outputs

More details about the package and its vignettes can be found at http://www.bayesian-inference.com.

For the installation of LaplacesDemon, simply download the source code from http://www.bayesian-inference.com/softwaredownload, then open the R console and install the package from source using the code:

install.packages(pkgs="path/LaplacesDemon_ver.tar.gz", repos=NULL,
                 type="source")

where path is the path to the zipped source code, and _ver is replaced with the latest version found in the name of the downloaded file.

A goal in developing the LaplacesDemon package was to minimize reliance on other packages or software. Therefore, the usual dep=TRUE argument does not need to be used, because the LaplacesDemon package does not depend on anything other than base R and its parallel package. LaplacesDemonCpp is an extension package that uses C++ and imports the parallel, Rcpp, and RcppArmadillo packages. This section introduces only LaplacesDemon, but the use of LaplacesDemonCpp is identical. Once installed, simply use the library(LaplacesDemon) or require(LaplacesDemon) function in R to activate the LaplacesDemon package and load its functions into memory.

Two main functions of the LaplacesDemon package are used in this thesis: LaplaceApproximation and LaplacesDemon. The LaplaceApproximation function deterministically optimizes the log-posterior with one of several optimization algorithms, whereas LaplacesDemon samples from the posterior using MCMC. A complete description of these two functions is given below.

1.11.2.1 The Function LaplaceApproximation

The LaplaceApproximation function deterministically maximizes the logarithm of the unnormalized joint posterior density with one of several optimization algorithms. The goal of LaplaceApproximation is to estimate the posterior mode and variance of each parameter. This function is useful for optimizing initial values and estimating a covariance matrix to be input into the IterativeQuadrature, LaplacesDemon, PMC, or VariationalBayes function, or sometimes for model estimation in its own right. An illustrative call is sketched after the argument descriptions below.

LaplaceApproximation(Model, parm, Data, Interval=1.0E-6,
                     Iterations=100, Method="SPG", Samples=1000,
                     CovEst="Hessian", sir=TRUE,
                     Stop.Tolerance=1.0E-5, CPUs=1, Type="PSOCK")

Arguments

Model This required argument receives the model from a user-defined function. The user-defined function is where the model is specified. LaplaceApproximation passes two arguments to the model function, parms and Data. For more information, see the LaplacesDemon function and “LaplacesDemon Tutorial” vignette.

parm This argument requires a vector of initial values equal in length to the number of parameters. LaplaceApproximation will attempt to optimize these initial values for the parameters, where the optimized values are the posterior modes, for later use with the IterativeQuadrature, LaplacesDemon, PMC, or VariationalBayes function. The GIV function may be used to randomly generate initial values. Parameters must be continuous.

Data This required argument accepts a list of data. The list of data must include mon.names, which contains monitored variable names, and parm.names, which contains parameter names. LaplaceApproximation must be able to determine the sample size of the data, and will look for a scalar sample size variable n or N. If not found, it will look for variable y or Y, and attempt to take its number of rows as the sample size. LaplaceApproximation needs to determine sample size due to the asymptotic nature of this method. Sample size should be at least sqrt(J) with J exchangeable parameters.

Interval This argument receives an interval for estimating approximate gradients. The logarithm of the unnormalized joint posterior density of the Bayesian model is evaluated at the current parameter value, and again at the current parameter value plus this interval.

Iterations This argument accepts an integer that determines the number of iterations that LaplaceApproximation will attempt to maximize the logarithm of the unnormalized joint posterior density. Iterations defaults to 100. LaplaceApproximation will stop before this number of iterations if the tolerance is less than or equal to the Stop.Tolerance criterion. The required amount of computer memory increases with Iterations. If computer memory is exceeded, then all will be lost.

Method This optional argument accepts a quoted string that specifies the method used for Laplace approximation. The default method is Method="SPG". Options include "AGA" for adaptive gradient ascent, "BFGS" for the Broyden-Fletcher-Goldfarb-Shanno algorithm, "BHHH" for the algorithm of Berndt et al., "CG" for conjugate gradient, "DFP" for the Davidon-Fletcher-Powell algorithm, "HAR" for adaptive hit-and-run, "HJ" for Hooke-Jeeves, "LBFGS" for limited-memory BFGS, "LM" for Levenberg-Marquardt, "NM" for Nelder-Mead, "NR" for Newton-Raphson, "PSO" for Particle Swarm Optimization, "Rprop" for resilient backpropagation, "SGD" for Stochastic Gradient Descent, "SOMA" for the Self-Organizing Migration Algorithm, "SPG" for Spectral Projected Gradient, "SR1" for Symmetric Rank-One, and "TR" for Trust Region.

Samples This argument indicates the number of posterior samples to be taken with sampling importance resampling via the SIR function, which occurs only when sir=TRUE. Note that the number of samples should increase with the number and intercorrelations of the parameters.

CovEst This argument accepts a quoted string that indicates how the covariance matrix is estimated after the model finishes. This covariance matrix is used to obtain the standard deviation of each parameter, and may also be used for posterior sampling via sampling importance resampling (SIR) (see the sir argument below), if converged. By default, the covariance matrix is approximated as the negative inverse of the "Hessian" matrix of second derivatives, estimated with Richardson extrapolation. Alternatives include CovEst="Identity", CovEst="OPG", or CovEst="Sandwich". When CovEst="Identity", the covariance matrix is not estimated, and is merely assigned an identity matrix. When LaplaceApproximation is performed internally by LaplacesDemon, an identity matrix is returned and scaled. When CovEst="OPG", the covariance matrix is approximated with the inverse of the sum of the outer products of the gradient, which requires X, and either y or Y in the list of data. For OPG, a partial derivative is taken for each row in X, and each element in y or row in Y. Therefore, this requires N + NJ model evaluations for a data set with N records and J variables. The OPG method is an asymptotic approximation of the Hessian, and usually requires fewer calculations with a small data set, or more with large data sets. Both methods require a matrix inversion, which becomes costly as dimension grows. The Richardson-based Hessian method is more accurate, but requires more calculation in large dimensions. An alternative approach to obtaining covariance is to use the BayesianBootstrap on the data, or to sample the posterior with iterative quadrature (IterativeQuadrature), MCMC (LaplacesDemon), or VariationalBayes.

sir This logical argument indicates whether or not sampling importance resampling (SIR) is conducted via the SIR function to draw independent posterior samples. This argument defaults to TRUE. Even when TRUE, posterior samples are drawn only when LaplaceApproximation has converged. Posterior samples are required for many other functions, including plot.laplace and predict.laplace. The only time that it is advantageous to set sir=FALSE is when LaplaceApproximation is used to help the initial values for IterativeQuadrature, LaplacesDemon, PMC, or VariationalBayes, and it is unnecessary for time to be spent on sampling. Less time can be spent on sampling by increasing CPUs, which parallelizes the sampling.

Stop.Tolerance This argument accepts any positive number and defaults to 1.0E-5. Tolerance is calculated each iteration, and the criteria vary by algorithm. The algorithm is considered to have converged to the user-specified Stop.Tolerance when the tolerance is less than or equal to the value of Stop.Tolerance, and the algorithm terminates at the end of the current iteration. Often, multiple criteria are used, in which case the maximum of all criteria becomes the tolerance. For example, when partial derivatives are taken, the Euclidean norm of the partial derivatives is commonly a criterion, and another common criterion is the Euclidean norm of the differences between the current and previous parameter values. Several algorithms have other, specific tolerances.

CPUs This argument accepts an integer that specifies the number of central processing units (CPUs) of the multicore computer or computer cluster. This argument defaults to CPUs=1, in which case parallel processing does not occur. Parallelization occurs only for sampling with SIR when sir=TRUE.

Type This argument specifies the type of parallel processing to perform, accepting either Type="PSOCK" or Type="MPI".
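For illustration, the following short sketch continues the normal-mean example given after the five-step list above; the objects Model, MyData, and Initial.Values are the illustrative ones defined there, and the settings shown are arbitrary.

## Illustrative call, assuming Model, MyData, and Initial.Values from the
## earlier sketch are in the workspace
Fit.LA <- LaplaceApproximation(Model, parm = Initial.Values, Data = MyData,
                               Iterations = 1000, Method = "SPG",
                               sir = TRUE, Samples = 1000)
print(Fit.LA)
Fit.LA$Summary1   # posterior modes and standard deviations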

1.11.2.2 The Function LaplacesDemon

The LaplacesDemon function is the main function of Laplace’s Demon. Given data, a model specification, and initial values, LaplacesDemon maximizes the logarithm of the unnormalized joint posterior density with MCMC and provides samples of the marginal posterior distributions, deviance, and other monitored variables. A short sketch of continuing an update from a previous run is given after the argument descriptions below.

LaplacesDemon(Model, Data, Initial.Values, Covar = NULL,
              Iterations = 10000, Status = 100, Thinning = 10,
              Algorithm = "MWG", Specs = NULL, LogFile = "", ...)

Arguments

Model This required argument receives the model from a user-defined function that must be named Model. The user-defined function is where the model is specified. LaplacesDemon passes two arguments to the model function, parms and Data, and receives five arguments from the model function: LP (the logarithm of the unnormalized joint posterior), Dev (the deviance), Monitor (the monitored variables), yhat (the variables for posterior predictive checks), and parm, the vector of parameters, which may be constrained in the model function.

Data This required argument accepts a list of data. The list of data must contain mon.names, which contains monitored variable names, and must contain parm.names, which contains parameter names. The as.parm.names function may be helpful for preparing the data, and the is.data function may be helpful for checking data.

Initial.Values For LaplacesDemon, this argument requires a vector of initial values equal in length to the number of parameters. For LaplacesDemon.hpc, this argument also accepts a vector, in which case the same initial values will be applied to all parallel chains, or the argument accepts a matrix in which each row is a parallel chain and the number of columns is equal in length to the number of parameters. When a matrix is supplied for LaplacesDemon.hpc, each parallel chain begins with its own initial values that are preferably dispersed. For both LaplacesDemon and LaplacesDemon.hpc, each initial value will be the starting point for an adaptive chain or a non-adaptive Markov chain of a parameter. Parameters are assumed to be continuous, unless specified to be discrete (see dparm below), which is not accepted by all algorithms (see dcrmrf for an alternative). If all initial values are set to zero, then Laplace’s Demon will attempt to optimize the initial values with the LaplaceApproximation function. After Laplace’s Demon finishes updating, it may be desired to continue updating from where it left off. To continue, this argument should receive the last iteration of the previous update. For example, if the output object is called Fit, then Initial.Values=as.initial.values(Fit). Initial values may be generated randomly with the GIV function.

Covar This argument defaults to NULL, but may otherwise accept a K × K proposal covariance matrix (where K is the number of dimensions or parameters), a variance vector, or a list of covariance matrices (for blockwise sampling in some algorithms). When the model is updated for the first time and prior variance or covariance is unknown, then Covar=NULL should be used. Some algorithms require covariance, some only require variance, and some require neither. Laplace’s Demon automatically converts the user input to the required form. Once Laplace’s Demon has finished updating, it may be desired to continue updating where it left off, in which case the proposal covariance matrix from the last run can be input into the next run. The covariance matrix may also be input from the LaplaceApproximation function, if used.

Iterations This required argument accepts integers larger than 10, and determines the number of iterations that Laplace’s Demon will update the parameters while searching for target distributions. The required amount of computer memory will increase with Iterations. If computer memory is exceeded, then all will be lost. The Combine function can be used later to combine multiple updates.

Status This argument accepts integers between 1 and the number of iterations, and indicates how often the user would like the status of the number of iterations and the proposal type (for example, multivariate or componentwise, or mixture, or subset) printed to the screen. For example, if a model is updated for 1,000 iterations and Status=200, then a status message will be printed at the following iterations: 200, 400, 600, and 800 in LaplacesDemon. The LaplacesDemon.hpc function does not print the status during parallel processing.

Thinning This argument accepts integers between 1 and the number of iterations, and indicates that every nth iteration will be retained, while the other iterations are discarded. If Thinning=5, then every 5th iteration will be retained. Thinning is performed to reduce autocorrelation and the number of marginal posterior samples.

Algorithm This argument accepts the abbreviated name of the MCMC algorithm, which must appear in quotes. (A list of MCMC algorithms with abbreviated names may be found in the "LaplacesDemon Tutorial" vignette.)

Specs This argument defaults to NULL, and accepts a list of specifications for the MCMC algorithm declared in the Algorithm argument. The specifications associated with each algorithm may be found in the "LaplacesDemon Tutorial" vignette.

LogFile This argument is used to specify a log file name in quotes in the working directory as a destination, rather than the console, for the output messages of cat and stop commands. It is helpful to assign a log file name when using multiple cores, such as with LaplacesDemon.hpc. Doing so allows the user to check the progress in the log. A number of log files are created, one for each chain, and one for the overall process.

CPUs This argument is required for parallel independent or interactive chains in LaplacesDemon or LaplacesDemon.hpc, and indicates the number of central processing units (CPUs) of the computer or cluster. For example, when a user has a quad-core computer, CPUs=4.

... Additional arguments are unused.
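As noted in the Initial.Values and Covar descriptions above, an update can be continued from where a previous run left off. The following is a hypothetical continuation of the illustrative normal-mean run sketched earlier in this section; the object names (Model, MyData, Fit) are those used in that sketch.

## Restart from the last iteration and re-use the previous proposal covariance
Fit2 <- LaplacesDemon(Model, Data = MyData,
                      Initial.Values = as.initial.values(Fit),
                      Covar = Fit$Covar, Iterations = 10000, Thinning = 10)
Fit.all <- Combine(list(Fit, Fit2), Data = MyData)   # combine the two updates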

1.11.3 JAGS

In the current era, the increase in the availability of computing power has led to a substantial increase in the availability and use of computationally intensive statistical methods. Amongst the most widely adopted of these are Markov chain Monte Carlo (Gilks et al., 1996) methods, whose usage has become very popular within the last decade. Although writing customized MCMC sampling algorithms is relatively straightforward, particularly using the MH algorithm (Hastings, 1970), it has become more common practice to employ general MCMC fitting software such as the Bayesian Inference Using Gibbs Sampling (BUGS) variants WinBUGS and OpenBUGS (Lunn et al., 2000). One alternative, known as Just Another Gibbs Sampler (JAGS), was made available more recently by Plummer (2003), and has become increasingly popular for analysing complex statistical models using MCMC methods. Both JAGS and WinBUGS/OpenBUGS use the BUGS syntax to allow arbitrary models to be easily defined by the user, and provide sufficient flexibility to be used for the vast majority of statistical problems. JAGS uses Gibbs sampling (Geman and Geman, 1984; Gelfand and Smith, 1990; Casella and George, 1992) and the Metropolis algorithm (Metropolis et al., 1953) to generate a Markov chain by sampling from full conditional distributions. The JAGS software is freely available at http://www-ice.iarc.fr/∼martyn/software/jags/.

While using JAGS, one must specify the model to run, and load data and initial values for a specified number of Markov chains. It is then possible to run the Markov chain(s) and to save the results for the parameters of interest. Summary statistics of these draws, convergence diagnostics, kernel density estimates, and so on are available as well. Nevertheless, some users of this software might be interested in saving the output and reading it into R for further analysis. JAGS comes with the ability to run the software in batch mode using scripts. More information can be found in the excellent JAGS manual, freely available at http://sourceforge.net/projects/mcmc-jags/.

1.11.3.1 The package R2jags

R2jags: A Package for Running JAGS from R makes use of this feature and provides convenient functions to call JAGS directly after data manipulation in R (Su and Yajima, 2014). Furthermore, it is possible to work with the results after importing them back into R again, for example, to create posterior predictive simulations or, more generally, graphical displays of data and posterior simulations. R2jags is available from the Comprehensive R Archive Network, i.e., http://CRAN.R-project.org, or one of its mirrors. The package can be installed by typing install.packages("R2jags") at the R prompt (make sure that an internet connection is available) and then loaded with library("R2jags") or require("R2jags") before use. The coda package by Plummer et al. (2004) is also very useful for the analysis of JAGS output, as JAGS is developed to work closely together with R and the coda package.
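As a brief hedged illustration of this workflow (the data are simulated, and all object and file names here are invented for the example), a model can be fitted with jags and its draws passed to coda for diagnostics:

library(R2jags)   # loads rjags and coda as dependencies
set.seed(1)
y <- rbinom(20, size = 1, prob = 0.3)         # simulated binary data
cat("model{
  for (i in 1:n) { y[i] ~ dbern(theta) }
  theta ~ dbeta(1, 1)
}", file = "demo.model.txt")
fit <- jags(data = list(n = length(y), y = y), inits = NULL,
            parameters.to.save = "theta",
            model.file = "demo.model.txt", n.chains = 3, n.iter = 2000)
fit.mcmc <- as.mcmc(fit)   # convert the rjags object to a coda mcmc.list
summary(fit.mcmc)
gelman.diag(fit.mcmc)      # Brooks-Gelman-Rubin convergence diagnostic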

1.11.3.2 The Function jags

The implementation of the R2jags package is straightforward. The main function of R2jags is jags, which takes data and initial values as input. The jags function automatically writes a JAGS script, calls the model, and saves the simulations for easy access in R. A detailed description of the jags function and its arguments follows:

jags(data, inits, parameters.to.save, model.file="model.bug",
     n.chains=3, n.iter=2000, n.burnin=floor(n.iter/2),
     n.thin=max(1, floor((n.iter - n.burnin)/1000)), DIC=TRUE,
     working.directory=NULL, set.seed=123, refresh=n.iter/50,
     progress.bar="text", digits=5,
     RNGname=c("Wichmann-Hill", "Marsaglia-Multicarry", "Super-Duper",
               "Mersenne-Twister"),
     jags.module=c("glm","dic"))

Arguments

data A vector or list of the names of the data objects used by the model.

inits A list with n.chains elements; each element of the list is itself a list of starting values for the BUGS model, or a function creating (possibly random) initial values. If inits is NULL, JAGS will generate initial values for parameters.

parameters.to.save A character vector with the names of the parameters which should be monitored and saved.

model.file File containing the model written in BUGS code.

n.chains Number of Markov chains (default is 3).

n.iter Number of total iterations per chain (including burn in; default is 2000).

n.burnin Length of burn-in, i.e., the number of iterations to discard at the beginning. Default is n.iter/2, that is, discarding the first half of the simulations. If n.burnin is 0, jags will run 100 iterations for adaptation.

n.thin Thinning rate, which must be a positive integer. Set n.thin > 1 to save memory and computation time if n.iter is large. Default is max(1, floor((n.iter - n.burnin)/1000)), which will only thin if there are at least 2000 simulations.

DIC logical; if TRUE (default), compute deviance, pD, and DIC. The rule pD = var(deviance)/2 is used.

working.directory Sets the working directory during execution of this function. This should be the directory where the model file is.

set.seed Random seed for JAGS, so that repeated calls to jags produce identical results.

refresh Refresh frequency for progress bar, default is n.iter/50.

progress.bar Type of progress bar. Possible values are “text”, “gui”, and “none”. Type “text” is displayed on the R console. Type “gui” is a graphical progress bar in a new window. The progress bar is suppressed if progress.bar is “none”.

digits As in write.model in the R2WinBUGS package: number of significant digits used for BUGS input. Only used if specifying a BUGS model as an R function.

RNGname The name of the random number generator used in JAGS. There are four RNGs supplied by the base module in JAGS: Wichmann-Hill, Marsaglia-Multicarry, Super-Duper, and Mersenne-Twister.

jags.module The vector of jags modules to be loaded. Default are glm and dic. Input NULL if you do not want to load any jags module.

2

Bayesian Reliability Analysis of Parametric Models

Contents

2.1 Introduction
2.2 Important Parametric Distributions
  2.2.1 The Binomial Distribution
  2.2.2 The Poisson Distribution
  2.2.3 The Log-normal Distribution
  2.2.4 The Weibull Distribution
  2.2.5 The Log-logistic Distribution
  2.2.6 The Log-location-scale Distribution
  2.2.7 The Generalized Log-Burr Distribution
2.3 Bayesian Analysis of Discrete Models
  2.3.1 The Binomial Model for Success/Failure Data
  2.3.2 Implementation
  2.3.3 The Poisson Model for Count Data
  2.3.4 Implementation
2.4 Bayesian Analysis of Continuous Models
  2.4.1 The Log-logistic Failure Time Model
  2.4.2 Implementation
  2.4.3 The Generalized Log-Burr Failure Time Model
  2.4.4 Implementation
2.5 Discussion and Conclusion

2.1 Introduction

A parametric model is one in which the failure time (the outcome) is assumed to follow a known distribution, for example, the binomial, Poisson, normal, log-normal, log-logistic, exponential, or Weibull distribution. Typically, what is actually meant is that the outcome follows some family of distributions of similar form with unknown parameters. It is only when the value of the parameter(s) is known that the exact distribution is fully specified. For parametric models, the data are typically used to estimate the values of the parameters that fully specify that distribution. Reliability estimates obtained from parametric models typically yield plots more consistent with a theoretical reliability curve. If we feel comfortable with the underlying distributional assumption, then parameters can be estimated which completely specify the reliability and hazard functions. This simplicity and completeness are the main appeals of using a parametric approach.

Component reliability is the foundation of reliability assessment and refers to the reliability of a single component; this item may be an integrated system or a minor component within a large system. Component reliability data can be discrete, such as success/failure or failure count data, or continuous, such as failure time data. Both types of data, discrete as well as continuous, are considered in the subsequent analysis for the purpose of illustration. Oftentimes, failure time data are censored, which means that we do not know the failure time exactly for a specified item. An appealing feature of the Bayesian approach is that only the censoring pattern, e.g., a right-censored failure time, is relevant, not which censoring scheme, such as Type-I, Type-II, or random censoring, produced it. Unlike the classical approach, which has different methods for different censored data types, the Bayesian approach provides a common framework for dealing with all censored data types. Thus, the Bayesian paradigm gives a unified framework to handle various types of censored as well as exact failure time data (e.g., Hamada et al., 2008).

In assessing component reliability, the situation can be more complicated and warrant more complex models, such as the hierarchical models discussed in Chapter 4. Finally, in any reliability data analysis, one must consider model selection, which means choosing an appropriate distribution that is suitable for the reliability data.

2.2 Important Parametric Distributions

Reliability analysis deals with the analysis of time-to-failure or lifetime data, and the distribution of lifetimes is represented by a parametric distribution. Consequently, parametric distributions are used both in the stochastic analysis of the reliability of systems or components, where they are mostly assumed to be fully known and the corresponding properties of the system or component are analysed, and in statistical analysis, where process data are used to estimate the parameters of the distribution, often followed by a specific inference of interest (Coolen, 2008). In this section, a class of important parametric probability distributions for both discrete and continuous random quantities is presented. These distributions are used for fitting both discrete and continuous reliability data. These parametric distributions are also used for regression and hierarchical regression analysis in subsequent chapters. A comprehensive list of parametric distributions with R and JAGS syntax is provided in Table 2.1.

2.2.1 The Binomial Distribution

The binomial distribution is a single-parameter distribution which provides a natural model arising from a sequence of n exchangeable (or iid Bernoulli) trials or draws from a large population, where each trial gives rise to one of two possible outcomes, conveniently labeled success and failure. For the success outcome, the value of the random variable is assigned 1, otherwise 0. Because of the exchangeability, the data can be summarized by the total number of successes in n trials, which is denoted here by y. Converting from a formulation in terms of exchangeable trials to one using independent and identically distributed (iid) random variables is achieved quite naturally by letting the parameter θ represent the proportion of successes in the population or, equivalently, the probability of success in each trial. The binomial probability distribution states that

p(y|θ) = Binomial(n, θ) = \binom{n}{y} θ^{y} (1 − θ)^{n−y}, \quad 0 ≤ θ ≤ 1 \qquad (2.1)

where on the left side we suppress the dependence on n because it is regarded as part of the experimental design that is considered fixed; all the probabilities discussed for this problem are assumed to be conditional on n (Gelman et al., 2004). Note that the binomial distribution is not an appropriate model if the tests are dependent, and that it only applies if all the items have the same success probability. Also, when n = 1, the binomial is called the Bernoulli distribution.

2.2.2 The Poisson Distribution

The Poisson distribution arises naturally in the study of data taking the form of counts; for instance, a major area of application is reliability, where the number of times that a component fails in a specified period of time is studied.

If count data y follow a Poisson distribution with rate θ, then the probability function of y is

p(y|θ) = Poisson(θt) = \frac{(θt)^{y} e^{−θt}}{y!}, \quad y = 0, 1, 2, \ldots, \; θ > 0 \qquad (2.2)

where y = 0, 1, 2, ... is the observed number of failures, θ is the mean number of failures per unit time, and t is the length of the specified time period. Note that equal mean and variance (here θt) is the most limiting characteristic of the Poisson

distribution. This model is appropriate when the probabilities of events occurring in disjoint time intervals are independent, and when the probability of an event occurring in a short time interval is small.

2.2.3 The Log-normal Distribution

If y = log(T) follows a normal distribution with mean µ and variance σ² having probability density function f(y) = \frac{1}{σ\sqrt{2π}} \exp\left[−\frac{1}{2}\left(\frac{y − µ}{σ}\right)^{2}\right], then the failure time T = exp(y) follows a log-normal distribution having density function of the form

f(t|µ, σ) = Log-normal(µ, σ) = \frac{1}{tσ\sqrt{2π}} \exp\left[−\frac{1}{2}\left(\frac{\log(t) − µ}{σ}\right)^{2}\right]. \qquad (2.3)

Corresponding reliability function and hazard function are, respectively,

R(t) = 1 − Φ\left(\frac{\log(t) − µ}{σ}\right) \qquad (2.4)

and

h(t) = f(t)/R(t), \qquad (2.5)

where Φ(·) is the standard normal cumulative distribution function. The mean and variance of the log-normal distribution are \exp(µ + σ²/2) and [\exp(σ²) − 1]\exp(2µ + σ²), respectively (Lawless, 2003). An important feature of the log-normal distribution is its unique hazard function; it increases initially, then decreases and approaches zero at very long times. Although a distribution with decreasing hazard at long times may be untenable, the log-normal has been useful in many applications.
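As a quick numerical sketch of Equations (2.4) and (2.5), the log-normal reliability and hazard functions can be evaluated directly in R; the parameter values below are arbitrary and serve only to display the characteristic hazard shape.

## Illustrative log-normal reliability and hazard (arbitrary mu and sigma)
mu <- 2; sigma <- 0.5
R <- function(t) 1 - pnorm((log(t) - mu) / sigma)   # reliability, Eq. (2.4)
h <- function(t) dlnorm(t, mu, sigma) / R(t)        # hazard, Eq. (2.5)
R(c(5, 10, 20))
h(c(5, 10, 20))
curve(h(x), from = 0.1, to = 60, xlab = "t",
      ylab = "h(t)")   # rises to a maximum and then slowly decays toward zero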

2.2.4 The Weibull Distribution

The Weibull distribution is perhaps the most widely used failure time distribution model. Application to the failure time or durability of manufactured items is common, and it is used as a model for diverse types of items, such as ball bearings, automobile components, and electrical insulation. It is also used in biological and medical applications, for example, in studies on the time to the occurrence of tumors in human populations or in laboratory animals.

The Weibull distribution with rate parameter λ and shape parameter β has a probability density function of the form

f(t|λ, β) = Weibull(β, λ) = λβ t^{β−1} e^{−λ t^{β}}; \quad t > 0, \; λ > 0, \; β > 0. \qquad (2.6)

Corresponding reliability and hazard functions are, respectively,

R(t) = \exp(−λ t^{β}) \qquad (2.7)

and

h(t) = λβ t^{β−1}. \qquad (2.8)

The mean and variance are λ^{−1/β} Γ(1 + 1/β) and λ^{−2/β}[Γ(1 + 2/β) − Γ(1 + 1/β)^{2}], respectively.

The Weibull hazard function is monotone decreasing for β < 1, which applies to components in the “infant mortality” phase of the failure time distribution, and increasing for β > 1, which applies to components in the wear-out phase; for β = 1, the Weibull reduces to the exponential, which has a constant hazard. Because its hazard rate can be increasing, decreasing, or constant, the Weibull distribution is an attractive model choice. The model is fairly flexible and has been found to provide a good fit to many types of lifetime data. This, and the fact that the model has simple expressions for the probability density, reliability, and hazard functions, partly accounts for its popularity. The Weibull distribution also arises as an asymptotic extreme value distribution, and in some instances this can be used to motivate it as a model.

2.2.5 The Log-logistic Distribution

If y = log(T) follows a logistic distribution with location parameter µ and scale parameter σ having probability density function f(y) = σ^{−1} \exp\left(\frac{y − µ}{σ}\right)\left[1 + \exp\left(\frac{y − µ}{σ}\right)\right]^{−2}, then the lifetime T follows a log-logistic distribution with scale parameter α (> 0) and shape parameter β (> 0) having probability density function of the form

f(t|β, α) = Log-logistic(β, α) = (β/α)(t/α)^{β−1}\left[1 + (t/α)^{β}\right]^{−2}, \quad t > 0 \qquad (2.9)

where α = exp(µ) and β = 1/σ. The corresponding reliability function and hazard function are, respectively,

R(t) = \left[1 + (t/α)^{β}\right]^{−1} \qquad (2.10)

and

h(t) = (β/α)(t/α)^{β−1}\left[1 + (t/α)^{β}\right]^{−1}. \qquad (2.11)

The mean of the log-logistic distribution is given by

E(T) = α\, Γ(1 + β^{−1})\, Γ(1 − β^{−1}),

which exists only if β > 1.

Because of its simple algebraic expressions for the reliability and hazard functions, handling censored data is easier than with the log-normal, while providing a good approximation to it (Tableman and Kim, 2004). The hazard function is identical to the Weibull hazard apart from the denominator factor [1 + (t/α)^{β}]. For β ≤ 1, the hazard function is monotone decreasing. For β > 1, the hazard function resembles the log-normal hazard: it increases from 0 to a maximum at t = α(β − 1)^{1/β}, and then approaches 0 monotonically as t → ∞ (e.g., Lawless, 2003). Because of this non-monotone behaviour of the hazard function, the log-logistic captures situations not covered by the Weibull distribution. This feature is observed consistently in the remaining chapters.
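The contrast between the monotone Weibull hazard (Equation 2.8) and the non-monotone log-logistic hazard (Equation 2.11) can be visualized with a few lines of R; the parameter values are arbitrary and chosen only for display.

## Illustrative comparison of the Weibull and log-logistic hazard shapes
t <- seq(0.01, 5, length.out = 200)
h.weibull <- function(t, lambda, beta) lambda * beta * t^(beta - 1)
h.loglogistic <- function(t, alpha, beta)
  (beta / alpha) * (t / alpha)^(beta - 1) / (1 + (t / alpha)^beta)
plot(t, h.weibull(t, lambda = 1, beta = 2), type = "l",
     xlab = "t", ylab = "h(t)")                          # monotone increasing
lines(t, h.loglogistic(t, alpha = 1, beta = 2), lty = 2) # rises to a peak, then falls
legend("topleft", c("Weibull", "log-logistic"), lty = 1:2)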

2.2.6 The Log-location-scale Distribution

The probability density function of a parametric location-scale model for a random variable y on (−∞, ∞) with location parameter µ (−∞ < µ < ∞) and scale parameter σ (> 0) is given by

f(y|µ, σ) = \frac{1}{σ} f_0\left(\frac{y − µ}{σ}\right), \quad −∞ < y < ∞. \qquad (2.12)

The corresponding distribution and reliability function for y are

F_0(y|µ, σ) = \int_{−∞}^{y} f_0(t)\,dt,
R_0(y|µ, σ) = 1 − \int_{−∞}^{y} f_0(t)\,dt = 1 − F_0(y; µ, σ).

The standardized random variable z = (y − µ)/σ clearly has density and reliability

functions f_0(z) and R_0(z), respectively, and Equation (2.12) with µ = 0 and σ = 1 is called the standard form of the distribution.

Lifetime distributions such as the exponential and Weibull all have the property that y = log t has a location-scale distribution: the Weibull, log-normal, and log-logistic distributions for t correspond to the extreme value, normal, and logistic distributions, respectively, for y. The reliability functions for z = (y − µ)/σ on (−∞, ∞) are, respectively,

R_0(z) = \exp(−e^{z}) \quad (extreme value)

R_0(z) = 1 − Φ(z) \quad (normal)

R_0(z) = (1 + e^{z})^{−1} \quad (logistic)

Similarly, any location-scale model (Equation 2.12) gives a lifetime distribution by the transformation t = exp(y). In this case, the reliability function can be

expressed as

R(t|µ, σ) = R_0\left(\frac{\log t − µ}{σ}\right) = R_0^{0}\left[\left(\frac{t}{α}\right)^{β}\right],

where α = exp(µ), β = 1/σ, and R_0^{0}(x) = R_0(\log(x)) is a reliability function defined on (0, ∞) (e.g., Lawless, 2003).

The log-Burr distribution can be obtained by generalizing a parametric location-scale family of distributions, given by Equation (2.12), so that the pdf, cdf, or reliability function includes one or more additional parameters. Such generalized distributions are very useful because they include common two-parameter lifetime distributions as special cases.

2.2.7 The Generalized Log-Burr Distribution

The generalized log-Burr family is the family for which the standardized variable z = (y − µ)/σ has probability density function of the form

f_0(z|k) = e^{z}\left(1 + e^{z}/k\right)^{−k−1}, \quad −∞ < z < ∞

and the corresponding reliability function

R_0(z|k) = \left(1 + e^{z}/k\right)^{−k}, \quad −∞ < z < ∞,

where k (> 0) is a shape parameter. The special case k = 1 gives the logistic distribution, and k → ∞ gives the extreme value distribution (e.g., Lawless, 2003). Since the generalized log-Burr family includes the log-logistic and Weibull distributions, it allows discrimination between them. It is also a flexible model for fitting lifetime data.

Table 2.1: Important discrete and continuous parametric distributions with R and JAGS syntax (Bernoulli, binomial, Poisson, beta, exponential, gamma, half-Cauchy, logistic, log-normal, normal, Student's t, uniform, and Weibull). In JAGS, the normal and related distributions are parameterized in terms of precision parameters, and distributions marked with an asterisk are provided by the LaplacesDemon package.
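A quick numerical check of the limiting cases noted above can be done in R; the z values below are arbitrary.

## R0(z|k) = (1 + exp(z)/k)^(-k): logistic at k = 1, extreme value as k grows
R0 <- function(z, k) (1 + exp(z) / k)^(-k)
z <- c(-2, 0, 2)
rbind(logBurr.k1   = R0(z, k = 1),   logistic      = 1 / (1 + exp(z)),
      logBurr.kBig = R0(z, k = 1e6), extreme.value = exp(-exp(z)))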

2.3 Bayesian Analysis of Discrete Models

In many practical situations, reliability data are generated in the form of success/failure data and failure count data (i.e., the number of failures that occur over a specified period of time). Success/failure data in reliability assessments capture the component’s success or failure to perform its intended function. For example, the experimenter may try an emergency diesel generator (EDG) to see if it will start on demand, and record whether the EDG starts or not. Another example is a missile system, for which the experimenter records whether it executes its mission successfully or not when launched (Hamada et al., 2008). The second type of data, failure counts, which record the number of times that a component fails in a specified period of time, may arise because of limitations of the data capture system or the way that data are reported; e.g., a system may only keep track of the monthly number of failures and report them. This section is concerned with the Bayesian modeling of these types of discrete reliability data by choosing an appropriate probability distribution, using the R and JAGS software packages.

2.3.1 The Binomial Model for Success/Failure Data

In certain cases, the experimenter wants to assess success/failure reliability data, which capture the component’s success or failure to perform its intended function. For example, in a missile system, experimenters record whether the missile executes its mission successfully or not when launched. For modeling such data, the binomial distribution is used. The binomial distribution is appropriate for a fixed number of tested components n, where the tests are assumed to be conditionally independent given the success probability θ. Note that the binomial becomes the Bernoulli distribution for n = 1.

Let us consider a set of binomial data y_i that express the number of successes over n_i trials (for i = 1, ..., J). Hence,

y_i ∼ Binomial(n_i, θ),

resulting in a likelihood given by

J " ! # Y ni p(y|θ) = θyi (1 − θ)ni−yi i=1 yi J ! Y ni = θΣyi (1 − θ)n−Σyi (2.13) i=1 yi

where n = \sum_{i=1}^{J} n_i is the total number of Bernoulli trials in the sample.

In order to perform Bayesian inference in binomial model, one must specify a prior distribution for θ. A convenient choice of the prior distribution for θ is the beta conjugate prior distribution with parameters α and β, denoted by

θ ∼ Beta(α, β), with probability density function

p(θ) = \frac{Γ(α + β)}{Γ(α)\,Γ(β)} θ^{α−1}(1 − θ)^{β−1}, \quad α, β > 0 \qquad (2.14)

where α is interpreted as the prior number of successful component tests and β as the prior number of failed component tests, that is, α + β acts like a prior sample size. Then, the resulting posterior distribution for the success probability θ is given by

p(θ|y) ∝ p(θ) p(y|θ)

∝ \frac{Γ(α + β)}{Γ(α)\,Γ(β)} θ^{α−1}(1 − θ)^{β−1} × \left[\prod_{i=1}^{J}\binom{n_i}{y_i}\right] θ^{\sum y_i}(1 − θ)^{n − \sum y_i}
∝ θ^{α + \sum y_i − 1}(1 − θ)^{β + n − \sum y_i − 1} \qquad (2.15)

which is nothing but a beta distribution,

θ|y ∼ Beta(α + \textstyle\sum y_i, \; n + β − \sum y_i) \qquad (2.16)

with shape parameters (α + Σy_i) and (n + β − Σy_i). The posterior mean and variance are, respectively,

E(θ|y) = \frac{α + \sum y_i}{n + α + β}, \qquad (2.17)

V(θ|y) = \frac{(α + \sum y_i)(n + β − \sum y_i)}{(n + α + β)^{2}\,(n + α + β + 1)}. \qquad (2.18)

The posterior mean can also be expressed as a weighted average of the prior mean and the sample proportion,

E(θ|y) = \left(\frac{n}{n + α + β}\right)\left(\frac{\sum y_i}{n}\right) + \left(\frac{α + β}{n + α + β}\right)\left(\frac{α}{α + β}\right) = w\left(\frac{\sum y_i}{n}\right) + (1 − w)\left(\frac{α}{α + β}\right) \qquad (2.19)

where w = n/(n + α + β), \sum y_i/n is the sample proportion, and α/(α + β) is the mean of a beta prior with parameters α and β.

A beta distribution with equal and low parameter values can be considered as a weakly informative prior (e.g., α = β = 10^{−3}). Other choices that are usually adopted are Beta(1/2, 1/2) or Beta(1, 1), the latter being equivalent to the uniform distribution U(0, 1). The Beta(1, 1) prior can be considered weakly informative since, a priori, it gives the same probability to any interval of the same length. Nevertheless, this prior will be influential when the sample size is small (Ntzoufras, 2009). This is not necessarily a disadvantage since, for a small sample size, the posterior will then also reflect the low available information concerning the parameter of interest θ.
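As a worked illustration of Equations (2.16)-(2.18), consider a Beta(1, 1) prior with n = 11 trials and 3 successes (numbers that anticipate the launch-vehicle data analysed in the next section); the conjugate posterior can be computed directly in R.

## Conjugate beta posterior for 3 successes in 11 trials under a Beta(1,1) prior
alpha <- 1; beta <- 1
n <- 11; sum.y <- 3
a.post <- alpha + sum.y            # alpha + sum(y)
b.post <- beta + n - sum.y         # beta + n - sum(y)
a.post / (a.post + b.post)         # posterior mean, Eq. (2.17): about 0.308
a.post * b.post /
  ((a.post + b.post)^2 * (a.post + b.post + 1))  # posterior variance, Eq. (2.18)
qbeta(c(0.025, 0.975), a.post, b.post)           # 95% credible interval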

2.3.2 Implementation

Bayesian analysis of the binomial model for success/failure data can easily be performed, and its implementation is quite simple in R and JAGS. To make the implementation concrete, a simple binary data set is taken from Johnson et al. (2005); Hamada et al. (2008) also discussed the same data. The data are reported in the next section. Using these data, different aspects of Bayesian modeling are discussed. Two different functions, bayesglm of R and jags of JAGS (via R2jags), are used to fit the model and to estimate the parameter of interest, the success probability. Numerical as well as graphical summaries of the corresponding results are also reported.

2.3.2.1 The Launch Vehicle Data

Table 2.2 presents a set of binary responses. These are the launch outcomes of new aerospace vehicles developed by new companies during the period 1980-2002. A total of 11 launches occurred, of which 3 were successes and 8 were failures. Here, reliability is the probability of a successful launch.

Vehicle        Outcome
Pegasus        Success
Percheron      Failure
AMROC          Failure
Conestoga      Failure
Ariane 1       Success
India SLV-3    Failure
India ASLV     Failure
India PSLV     Failure
Shavit         Success
Taepodong      Failure
Brazil VLS     Failure

Table 2.2: Outcomes for 11 launches of new vehicles performed by new companies with limited launch-vehicle design experience, 1980–2002.

2.3.2.2 Analysis Using R

A Bayesian analysis of this model can be conducted using the bayesglm function from the arm package (Gelman and Su, 2014) of R. This function is a Bayesian alteration of the classical generalized linear model that uses an approximate EM algorithm to update the coefficients at each step, using an augmented regression to represent the prior information. The function uses Student-t prior distributions for the coefficients. Since in our case the model involves an intercept only, the prior distribution for the intercept term is a t-distribution with 1 degree of freedom (i.e., a Cauchy distribution) with prior mean 0 and prior scale 10. The main arguments of the bayesglm function can be seen by typing library(arm) followed by args(bayesglm) at the R console. Make sure that the arm package is installed before using these commands.

bayesglm(formula, family = gaussian, data, ...)

The first argument, formula, requires a symbolic description of the model to be fitted. For example, in the case of the launch vehicle binary data, which contain only an intercept term and the response variable y, the formula of the model will be of the form y~1. The argument family requires a description of the error distribution and link function to be used in the model. The default error distribution is gaussian. The data argument takes an optional data frame, list, or environment containing the variables in the model. For more details see Gelman and Su (2014).

Data Creation

In the inferential setting of these launch vehicle data, a total of 11 aerospace vehicles were launched by new companies, of which only 3 launches were successful. For each successful launch, the value of the response variable is set to 1, and to 0 otherwise. Thus, these data can be created in R as follows:

n <- 11
y <- c(1,0,0,0,1,0,0,0,1,0,0)

where n represents the total number of launches, and y represents the binary response for each trial.

Model Specification

Modeling a binomial random variable in essence means modeling a series of binary trials. In that situation, we count the total number of successes (or failures) among the specified number of trials, and from this we want to estimate the general tendency of trials to become successes, that is, Pr(success). Data coming from binary trial processes are ubiquitous in reliability and include survival or the success of an experiment (Kery, 2010). In the launch vehicle data, the binomial distribution describes the 3 successful launches among 11 trials, with Pr(success) = θ. Thus, a simple model for the responses y is

y_i ∼ Binomial(1, θ), \quad i = 1, 2, \ldots, 11.

For Bayesian analysis of this model, the function bayesglm uses a default link function, the logit link, that is,

logit(θ) = \log\left(\frac{θ}{1 − θ}\right) = Intercept.

After fitting the model, we use the inverse-logit function to transform the results back to the original metric. The inverse-logit function is defined as

logit^{−1}(x) = \frac{e^{x}}{1 + e^{x}},

which transforms continuous values to the range (0, 1), as required for probabilities, and maps the linear predictor to the probability scale.

Model Fitting

Finally, to fit the model defined above, the function bayesglm is called to perform the analysis, and its results are put into an object called Fit. The display function of arm is used to print the summary of results. This generic function, with its detail=TRUE argument, gives a clean printout of the fitted object, focusing on the most pertinent pieces of information, including the z-value and p-value. The output of the fitted model is shown in the next section.

Fit <- bayesglm(y~1, family=binomial)
display(Fit, detail=TRUE)

Now, to get posterior simulations of the model parameter theta, the generic function sim of arm is used. To get a clear picture of the parameter estimates, 1000 (n.sim=1000) independent draws are made. Moreover, to convert the results back to the original metric, the invlogit function is used; a summary of the results is reported in the following Summarizing Output section.

set.seed(123)
Fit.sim <- sim(Fit, n.sim = 1000)
coef.Fit.sim <- coef(Fit.sim)
invlogit(coef.Fit.sim)
apply(invlogit(coef.Fit.sim), 2, mean)   ## Estimate
apply(invlogit(coef.Fit.sim), 2, sd)     ## Standard Deviation
quantile(invlogit(coef.Fit.sim), prob=c(0.025,0.25,0.5,0.75,0.975))

Summarizing Output

Posterior summaries, after fitting the model with bayesglm, are provided in Table 2.3. From this table, it is observed that the estimate of theta is 0.311 ± 0.125, and that the 95% credible interval is (0.110, 0.599).

Parameter   Mean    SD      2.5%    25%     Median   75%     97.5%
theta       0.311   0.125   0.110   0.220   0.295    0.387   0.599

Table 2.3: Posterior summary of theta obtained with bayesglm. The posterior mean and median are close, which indicates an approximately symmetric posterior density.

2.3.2.3 Analysis Using JAGS

Let us now consider the Bayesian analysis of the same binary data with JAGS using its R interface, the R2jags package, which includes posterior simulation and convergence diagnostics for the model. For modeling the launch vehicle data in JAGS, one must specify the model to run, load the data, which are created separately, and supply the initial values of the model parameters for a specified number of Markov chains. The R2jags package makes use of this feature and provides convenient functions to call JAGS directly from within R. Furthermore, it is possible to work with the results after importing them into R again, for example, to create posterior predictive simulations or, more generally, graphical displays of the data and posterior simulations.

Data Creation

The first thing to provide to R2jags is the data. One must create the data inputs that R2jags needs, which can be a list containing the name of each vector. The data set given in Table 2.2 can be created in R format. In the case of the vehicle launch data, the outcome is the binary response variable; for each success, the value of the response variable is assigned 1, whereas for each failure it is assigned 0. Thus, for the vehicle launch outcomes, the data are created as follows:

n <- 11
y <- c(1,0,0,0,1,0,0,0,1,0,0)
jags.data <- list("n","y")

where n is the total number of binary outcomes, y is the response for each trial (1 for success and 0 for failure), and both are combined in a list assigned to the object jags.data.

Model Specification

For modeling the launch vehicle data, the response consists of binary outcomes and hence is assumed to follow a binomial distribution, that is,

y_i ∼ Binomial(θ, 1), \quad i = 1, 2, \ldots, n

where θ is the success probability, the parameter of interest. Alternatively, the Bernoulli distribution (command dbern(θ)) can be used instead of the binomial with n = 1 without any problem. To perform a Bayesian analysis, it is necessary to specify a prior distribution for θ. The parameter θ is assumed to follow a beta distribution, θ ∼ Beta(1, 1), which is equivalent to U(0, 1) and can be considered a weakly informative prior distribution since it gives the same probability to any interval of the same length.

The specification of the model defined above in the JAGS language must be put in a separate file which is then read by JAGS. When working in R, this is most conveniently done using the cat function of R, which behaves pretty much like paste except that the result is not a character object but is written directly to the file we specify. Here is the JAGS code specifying the model, using the cat function to put it in the file model1.jags.txt:

cat("model{ for(i in 1:n){ y[i]~dbin(theta, 1) } 2.3. Bayesian Analysis of Discrete Models 93

theta~dbeta(1,1) }", file="model1.jags.txt")

Here, y denotes the observed response variable, which is of length n and follows a binomial distribution with parameter theta drawn from Beta(1, 1). The Beta(1, 1) distribution is a commonly used low-information conjugate prior distribution for the binomial likelihood.

Initial Values

The values used to initialize the chains are simply called the initial values. To start the MCMC simulation, it is usually necessary to specify a starting value for the chains; in most cases, however, JAGS is able to generate the initial values itself. In order to monitor convergence, we will normally run several chains for each parameter. The starting values for the chains are supplied as a named list, whose names are the parameters used in the model. Each element of the list is itself a list of starting values for the JAGS model, or a function creating (possibly random) initial values. In this case, there is only one parameter in the model, called theta:

inits <- function(){list(theta=runif(1))}

Model Fitting

Once these structures have been set up, JAGS is called to compile the model and run the MCMC simulation to get the posterior inference for theta. Before running, it must be decided how many chains are to be run (n.chains = 3) and for how many iterations (n.iter = 1000). If the length of the burn-in is not specified, n.burnin = floor(n.iter/2) is used, that is, 500 in this case. Additionally, it is necessary to specify which parameters we are interested in and set a monitor on each of them. In this case, theta is the only parameter which should be monitored. Thus, to start the simulation, the function jags of R2jags is used, and its results are assigned to the object Fit. The results in the object Fit can conveniently be printed by print(Fit), which prints a detailed summary of the results; these are summarized in the Summarizing Output section.

Fit <- jags(data=jags.data, inits=inits, parameters.to.save=c("theta"),
            n.iter=1000, n.chains=3, model.file="model1.jags.txt")
print(Fit)

Summarizing Output

The output of the R function jags is a list which includes several components, most notably the summary of the inference and convergence, and a list containing the simulation draws of all the saved parameters. In this case, the jags call is assigned to the R object Fit, so typing print(Fit) at the R console will display the summary of the fitted model shown below. The print method displays information on the mean, standard deviation, 95% credible interval (CI) estimates, the effective sample size, and the potential scale reduction factor Rˆ of the Brooks-Gelman-Rubin (BGR) statistic (Gelman and Rubin, 1992; Brooks and Gelman, 1998). The BGR statistic is an analysis of variance (ANOVA)-type diagnostic that compares within- and among-chain variance (Kery, 2010). Values around 1 indicate convergence, with 1.1 considered an acceptable limit by Gelman and Hill (2007).

Parameter    Mean     SD      2.5%     Median   97.5%    Rhat    n.eff
theta        0.311    0.124   0.100    0.300    0.584    1.002   1300
deviance    13.792    1.279  12.892   13.303   17.432    1.007   1100

pD = 0.8 and DIC = 14.6

Table 2.4: Summary of JAGS simulations after fitting the binomial model to the launch vehicle data.

Bugs model at "model1.jags.txt", fit using jags, 3 chains, each with 1000 iterations (first 500 discarded)

80% interval for each chain R−hat medians and 80% intervals 12 13 14 15 16 1 1.5 2+ deviance ● 12 13 14 15 16 1 1.5 2+

16

15

deviance 14

● 13

12

0.6

0.4

● theta ●

0.2

0

Figure 2.1: Graphical summary plot of JAGS for the binomial model, fit to the launch vehicle data. R-hat is near one for parameter theta, indicating good convergence.

Table 2.4 presents the numerical summary output from the jags function after fitting the binomial model for the success probability of the launch vehicle data. The first five columns of numbers give inference for the model parameter. In this case, the model parameter theta has a posterior mean of 0.311 and a standard deviation of 0.124. The median estimate of theta is 0.300, with a 50% uncertainty interval of [0.219, 0.391] and a 95% interval of [0.100, 0.584]. At the bottom, pD shows the estimated effective number of parameters in the model, and DIC, the deviance information criterion, gives an estimate of predictive error. Finally, consider the rightmost columns of the output, where Rhat gives information about the convergence of the algorithm. At convergence, the number in this column should be equal to 1. If Rhat is less than 1.1 for all parameters, then we judge the algorithm to have approximately converged, in the sense that the parallel chains have mixed well. The final column, n.eff, is the effective sample size of the simulations.

Figure 2.2: Posterior density plots of the model parameter theta, fitted with bayesglm and jags and drawn in different styles. The closeness of the two approaches is evident in the graphic.

Additionally, to see a complete picture of the results, a plot can be generated by typing plot(Fit); the resulting plot is given in Figure 2.1. In this plot, the left column shows a quick summary of inference and convergence: Rˆ is close to 1.0 for the parameter theta, indicating good mixing of the three chains and thus good convergence. The right column shows inference for the parameter theta. Figure 2.2 shows the density plot of the model parameter theta fitted with bayesglm and jags, respectively.

2.3.3 The Poisson Model for Count Data

The Poisson distribution arises naturally in the study of data taking the form of counts, for example in reliability, where the number of times that a component fails in a specified period of time is studied. This model is appropriate when the probabilities of events occurring in disjoint time intervals are independent and when the probability of an event occurring in a short time interval is small.

Let us consider a set of discrete count data y in which we wish to estimate their mean θ. Considering a Poisson distribution with mean θ for the data, we write

$$y_i \sim \mathrm{Poisson}(\theta t_i), \qquad i = 0, 1, 2, \ldots, n \qquad (2.20)$$

where y_i (i = 0, 1, 2, ..., n) is the observed number of failures, θ is the mean number of failures per unit time, and t_i (i = 0, 1, 2, ..., n) is the length of the specified time period. Note that equal mean and variance (here θt) is the most limiting characteristic of the Poisson distribution.

For the Poisson model, the parameter of interest is the Poisson rate θ, that is, the mean number of failures per unit time, which we wish to estimate. Therefore, in a model of failure count data, the likelihood as a function of θ given the observed failure counts y is given by

$$p(y|\theta) = \prod_{i=0}^{n} \frac{e^{-\theta t_i} (\theta t_i)^{y_i}}{y_i!} = \frac{e^{-\theta \sum_{i=0}^{n} t_i} \prod_{i=0}^{n} (\theta t_i)^{y_i}}{\prod_{i=0}^{n} y_i!}$$

or

$$p(y|\theta) \propto e^{-\theta \sum_{i=0}^{n} t_i}\, \theta^{\sum_{i=0}^{n} y_i}, \qquad (2.21)$$

ignoring factors that do not depend on θ. To complete the model, one must specify a prior distribution.

The gamma conjugate prior: when the posterior distribution is in the same family as the prior probability distribution, the prior and posterior are called conjugate distributions, and the prior is called the conjugate prior for the likelihood. For example, the gamma distribution is a frequently used prior distribution for the mean parameter of the Poisson model. That is, the gamma prior and the Poisson likelihood function have the same form, so that the resulting posterior distribution for the mean is also a gamma distribution. An advantage of a conjugate prior is that it makes posterior calculation easy.

To obtain the posterior density for the Poisson model, let us consider a gamma prior for the mean θ with parameters α and β having density function,

θ ∼ Gamma(α, β)

$$p(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta^{\alpha-1} e^{-\beta\theta} \propto \theta^{\alpha-1} e^{-\beta\theta}$$

Then the resulting posterior distribution of θ is given by

$$p(\theta|y) \propto p(\theta)\, p(y|\theta) \propto \theta^{\alpha-1} e^{-\beta\theta} \times \theta^{\sum_{i=0}^{n} y_i}\, e^{-\theta \sum_{i=0}^{n} t_i}$$

$$\propto \theta^{\alpha + \sum_{i=0}^{n} y_i - 1}\, e^{-\theta\left(\beta + \sum_{i=0}^{n} t_i\right)}.$$

This implies that

$$\theta|y \sim \mathrm{Gamma}\!\left(\alpha + \sum_{i=0}^{n} y_i,\; \beta + \sum_{i=0}^{n} t_i\right) \qquad (2.22)$$

where y = (y_1, y_2, y_3, ..., y_n), β can be interpreted as a prior sample size in contrast with the data sample size $\sum_{i=0}^{n} t_i$, and α is the prior number of failures in contrast with the observed number of failures $\sum_{i=0}^{n} y_i$. The posterior mean and variance are, respectively,

$$E(\theta|y) = \frac{\alpha + \sum_{i=0}^{n} y_i}{\beta + \sum_{i=0}^{n} t_i} = \frac{\alpha + n\bar{y}}{\beta + \sum_{i=0}^{n} t_i} \qquad (2.23)$$

and

$$V(\theta|y) = \frac{\alpha + \sum_{i=0}^{n} y_i}{\left(\beta + \sum_{i=0}^{n} t_i\right)^2} = \frac{\alpha + n\bar{y}}{\left(\beta + \sum_{i=0}^{n} t_i\right)^2}, \qquad (2.24)$$

where $\bar{y}$ is the sample mean (maximum likelihood estimator).

Figure 2.3: Gamma densities for (α, β) = (1.0, 2.0), (9.0, 0.5), (7.5, 1.0), (5.0, 1.0), and (0.001, 0.001). It is evident from the plot that for shape α = 0.001 and rate β = 0.001 the gamma distribution becomes almost uniform.

The usually selected low-information prior is a gamma distribution with low and equal prior parameters, such as α = β = 10^{-3}. This prior is convenient since its mean is equal to one while the variance is 1/α, which becomes large, expressing prior ignorance, for low values of α. This fact is evident from Figure 2.3.
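The conjugate updating of Equations (2.22)-(2.24) is easy to carry out directly in R. The short sketch below uses hypothetical failure counts and exposure times together with the vague Gamma(0.001, 0.001) prior just described.

## conjugate Poisson-gamma updating (Eqs. 2.22-2.24); y and t are hypothetical
y <- c(3, 1, 4, 2)                      # observed failure counts
t <- c(1, 1, 1, 1)                      # exposure times
alpha <- 0.001; beta <- 0.001           # vague gamma prior parameters
alpha.post <- alpha + sum(y)            # posterior shape, Eq. (2.22)
beta.post  <- beta + sum(t)             # posterior rate
alpha.post / beta.post                  # posterior mean, Eq. (2.23)
alpha.post / beta.post^2                # posterior variance, Eq. (2.24)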

2.3.4 Implementation

Bayesian modeling of the Poisson distribution for count data is simple and easy to implement in R and JAGS. However, to give a clear idea of the implementation, a simple example is considered. For Bayesian analysis of the Poisson model, a supercomputer failure count data set is taken from Hamada et al. (2008). The data are discussed and reported in the next section. Using these data, different aspects of the Bayesian analysis of Poisson data are discussed. The R and JAGS packages are used for fitting the model under different prior distributions. The Bayesian analysis of the posterior density for a Poisson model is reported here; posterior predictions are discussed in Chapter 4. Numerical as well as graphical summaries of the corresponding results are also reported.

2.3.4.1 The Supercomputer Failure Count Data

The supercomputer failure count data concern the monthly number of failures of the shared memory processor (SMP) components of the Los Alamos National Laboratory Blue Mountain supercomputer. The supercomputer consists of 47 identical SMPs, and Table 2.5 reports the number of failures for the first month of operation. The supercomputer engineers expect that there should be no more than 10 failures for each component in the first month of operation.

1 5 1 4 2 3 1 3 6 4 4 4 2 3 2 2
4 5 5 2 5 3 2 2 3 1 1 2 5 1 4 1
1 1 2 1 3 2 5 3 5 2 5 1 1 5 2

Table 2.5: Monthly number of failures for 47 supercomputer components

2.3.4.2 Analysis Using R

A Bayesian analysis of this model can be conducted using the bayesglm function from the arm package of R. Details of this function are already discussed in Section 2.3.2.2. The only difference is the choice of the Poisson family instead of the binomial.

Data Creation

For Bayesian fitting of the Poisson model to the supercomputer data with the function bayesglm, the data are created as follows:

n <- 47
y <- c(1,5,1,4,2,3,1,3,6,4,4,4,2,3,2,2,4,5,5,2,5,3,2,2,3,1,1,2,
       5,1,4,1,1,1,2,1,3,2,5,3,5,2,5,1,1,5,2)

where n represents the total number of SMPs, and y represents the monthly number of failures for the first month of operation.

Model Specification

The distribution assumed for count data is the Poisson, which applies when counted things are distributed independently and randomly. The Poisson model has a single parameter, the expected count θ, which is often called the intensity (Kery, 2010) and, in the case of the supercomputer data, represents the monthly failure rate or mean number of failures. In contrast to the normal, the Poisson variance is not a free parameter but is equal to the mean, θ.
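As a quick informal check of this assumption for the data of Table 2.5, the sample mean and variance of the counts created above can be compared; they turn out to be of the same order.

## informal check of the Poisson equal mean-variance assumption
mean(y)   # about 2.8
var(y)    # about 2.4, of the same order as the mean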

Thus, for supercomputer failure count data, where the response (y1, y2, . . . , yn) is the monthly number of failures recorded for the SMPs, the simple model for these responses is

$$y_i \sim \mathrm{Poisson}(\theta), \qquad i = 1, 2, \ldots, 47.$$

Here, the count response y_i for the ith SMP is distributed as a Poisson random variable with mean number of failures θ. Moreover, for Bayesian analysis of the Poisson model the function bayesglm uses the default log link function, that is,

log(θ) = Linear predictor = Intercept

In this case, the intercept is the only term in the linear predictor. Therefore, after fitting the model, the exponential function exp is used to transform the results back to the original metric.

Model Fitting

For fitting the above specified Poisson model, the function bayesglm is called to perform the analysis and its results are assigned to an object called Fit.

Fit <- bayesglm(y ~ 1, family=poisson(link="log"))
display(Fit, detail=TRUE)

To get the posterior simulations of the model parameter theta, 1000 independent draws are made using the function sim. Finally, to convert the results to the original metric the function exp is used, and its output is reported in the next section.

set.seed(123)
## Simulation
Fit.sim <- sim(Fit, n.sims=1000)
coef.Fit.sim <- coef(Fit.sim)
exp(coef.Fit.sim)
apply(exp(coef.Fit.sim), 2, mean)   ## Estimate
apply(exp(coef.Fit.sim), 2, sd)     ## Standard Deviation
quantile(exp(coef.Fit.sim), prob=c(0.025,0.25,0.5,0.75,0.975))

Summarizing Output

Posterior summaries after fitting the model with bayesglm are provided in Table 2.6. From this table, it is observed that the estimate of theta is 2.82 ± 0.24, and that the 95% credible interval is (2.37, 3.35).

Parameter Mean SD 2.5% 25% Median 75% 97.5%

theta 2.82 0.24 2.37 2.65 2.80 2.97 3.35

Table 2.6: From this output it is evident that posterior mean and median are very close, which is an indication of symmetric posterior density.

2.3.4.3 Analysis Using JAGS

Let us now consider the Bayesian analysis of the count data with JAGS using its R interface, R2jags. For modeling the monthly number of failures of the supercomputer components in JAGS, one must specify the model to run, load the data, and provide initial values of the model parameters for a specified number of Markov chains. The R2jags package makes use of this feature and provides convenient functions to call JAGS directly from within R. Furthermore, it is possible to work with the results after importing them back into R, for example, to create posterior predictive simulations or, more generally, graphical displays of data and posterior simulations.

Data Creation

The first part of modeling in JAGS is data definition; the data must be given as a list containing the name of each vector. For the supercomputer component failure data, the monthly numbers of failures given in Table 2.5 are the response variable y_i (i = 1, 2, ..., 47). These data can be created in R as:

n <- 47
y <- c(1,5,1,4,2,3,1,3,6,4,4,4,2,3,2,2,4,5,5,2,5,3,2,2,3,1,1,2,
       5,1,4,1,1,1,2,1,3,2,5,3,5,2,5,1,1,5,2)
jags.data <- list("n", "y")

Here n is the total number of supercomputer components (SMPs), y holds the response values, that is, the monthly numbers of failures, and both are combined in a list assigned to the object jags.data.

Model Specification

For modeling these data, the monthly number of failures yi is assumed to follow a Poisson distribution

$$y_i \sim \mathrm{Poisson}(\theta), \qquad i = 1, 2, \ldots, 47$$

where θ is the individual monthly failure rate for the SMPs. In this case, the monthly failure rate (mean number of failures per unit time) θ is the unknown model parameter which we want to estimate. It is expected that there should be no more than 10 failures for each component. To represent this, the prior for the parameter θ is taken to be a conjugate gamma distribution with a mean of 5, that is,

$$\theta \sim \mathrm{Gamma}(5, 1)$$

The probability that θ exceeds 10 under the Gamma(5, 1) prior is about 0.03, which is very small; this tail probability can be checked directly in R, as shown below.
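## prior tail probability P(theta > 10) under the Gamma(5, 1) prior
pgamma(10, shape=5, rate=1, lower.tail=FALSE)   # approximately 0.03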

Thus, a candidate JAGS model in the BUGS language for the above defined model is as follows:

cat("model{ for(i in 1:n){ y[i]~dpois(theta) } theta~dgamma(5, 1) }", file="model2.jags.txt")

This model allows each observed response y to follow a Poisson distribution with mean theta, which is drawn from a gamma distribution with shape 5 and rate 1, that is, mean value 5.

Initial Values

Next, initial values must be specified for the model parameter theta in order to start the chains. The initial values for the chains are supplied as a list by writing a function as

inits <- function(){list(theta = rlnorm(1))}

Model Fitting

Using the objects jags.data, inits, and model2.jags.txt, we can now compile the model and run the MCMC simulation to estimate theta by calling JAGS. For the MCMC simulation, 3 chains (n.chains=3), each with 5000 iterations (n.iter=5000), are run, and the parameter theta is monitored. The results are assigned to the object Fit and can be printed by print(Fit) as:

Fit <- jags(jags.data, inits, parameter=c("theta"), n.iter=5000, n.chain=3, model.file="model2.jags.txt",) print(Fit) 2.3. Bayesian Analysis of Discrete Models 105

Summarizing Output

The inference for the posterior densities after fitting the Poisson model for the count data using jags is reported in Table 2.7. The posterior estimate for theta is 2.86 ± 0.249 and the 95% credible interval is (2.39, 3.36), which is statistically significant. Rhat is very close to 1.0, an indication of good mixing of the three chains and thus approximate convergence. The plot of the posterior density of the parameter theta is depicted in Figure 2.4.

Parameter   Mean      SD      2.5%      Median    97.5%     Rhat    n.eff
theta       2.86      0.249   2.39      2.85      3.36      1.001   3800
deviance    171.61    1.507   170.57    171.04    175.77    1.001   3800

pD = 1.1 and DIC = 172.7

Table 2.7: Posterior summary of JAGS simulations after being fitted to the Poisson model for the supercomputer failure count data.
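Because the Gamma(5, 1) prior is conjugate for the Poisson likelihood, the exact posterior is available in closed form and can serve as a check on the simulation: with sum(y) = 132 and n = 47 it is Gamma(137, 48), whose mean and standard deviation agree closely with the JAGS summaries above.

## closed-form conjugate posterior as a check on the MCMC output
a.post <- 5 + sum(y)      # 137
b.post <- 1 + n           # 48
a.post / b.post           # posterior mean, about 2.85
sqrt(a.post) / b.post     # posterior sd, about 0.24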


Figure 2.4: Posterior density plot of the model parameter theta. It is evident from these plots that the density obtained with bayesglm closely resembles that obtained with jags; the agreement between the two methods is striking.

2.4 Bayesian Analysis of Continuous Models

The most widely used data for assessing component reliability are failure time data, which record the continuous time to failure of the components. Other examples of failure time data are the time-to-death used in survival analysis and the time-to-interrupt that arises in software reliability (Hamada et al., 2008). The reliability literature also refers to failure time data as lifetime data, and both terms are used interchangeably throughout the thesis.

This section describes Bayesian modeling of failure time data by choosing an appropriate continuous parametric distribution. To complete the model, some useful prior distributions for the model parameters are used. Consequently, for the purpose of Bayesian analysis of lifetime reliability models, two important techniques, analytic approximation and simulation, are implemented using the LaplacesDemon and JAGS software packages. The analytic method is implemented using the LaplaceApproximation function of LaplacesDemon, and simulation is carried out through Metropolis-Hastings algorithms using the LaplacesDemon and jags functions of the LaplacesDemon and R2jags packages, respectively. Real data sets are used to illustrate the implementations in R. The censoring mechanism is also taken into account.

Contents of this section are based on the published papers Akhtar and Khan (2014a,b).

2.4.1 The Log-logistic Failure Time Model

The sampling model for which Bayesian modeling is desired is the log-logistic distribution. The log-logistic distribution is a very important reliability model as it fits well in many practical situations of reliability data analysis. Another important feature of the log-logistic distribution lies in its closed-form expressions for the survival and hazard functions (see Section 2.2.5), which make it advantageous over the log-normal distribution. It is therefore more convenient than the log-normal in handling censored data. Moreover, its relation with the logistic distribution helps greatly in lifetime data analysis. That is, if y = log(T) follows the logistic distribution, which is a member of the location-scale family of distributions, with location parameter µ and scale parameter σ and probability density function

$$p(y|\mu,\sigma) = \frac{1}{\sigma} \exp\!\left(\frac{y-\mu}{\sigma}\right) \left[1 + \exp\!\left(\frac{y-\mu}{\sigma}\right)\right]^{-2},$$

then the lifetime T follows log-logistic distribution with scale parameter α (> 0) and shape parameter β (> 0) having probability density function of the form

$$p(t|\alpha,\beta) = (\beta/\alpha)(t/\alpha)^{\beta-1} \left[1 + (t/\alpha)^{\beta}\right]^{-2}, \qquad t > 0,$$

where α = exp(µ) and β = 1/σ. It may be noted that, in JAGS and BUGS, this distribution is defined in terms of a precision parameter, which is simply the inverse of the scale parameter σ. In many practical situations, the non-Bayesian analysis of the log-logistic distribution is not an easy task, whereas it can be implemented in the Bayesian paradigm, provided simulation tools are used.
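The closed-form survival and hazard functions referred to above are straightforward to code; the sketch below is purely illustrative (the parameter values are loosely based on posterior estimates obtained later in this section and are not part of the thesis code).

## closed-form reliability and hazard of the log-logistic distribution
## (alpha = scale, beta = shape); illustrative values only
llogis.rel <- function(t, alpha, beta) 1 / (1 + (t/alpha)^beta)
llogis.haz <- function(t, alpha, beta)
  (beta/alpha) * (t/alpha)^(beta - 1) / (1 + (t/alpha)^beta)
llogis.rel(1000, alpha=exp(7.9), beta=1/0.45)   # reliability at t = 1000 hours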

Thus, to complete the model, it is important to choose appropriate prior distributions for both µ and σ according to the available information. Making this choice is a challenge because the two parameters have support on the real line and the positive real line, respectively. For this situation, separate and independent priors should be chosen. One choice of prior for the parameter µ is

µ ∼ N(0, 1000),

which is a normal distribution with mean 0 and a large variance of 1000 or, equivalently, a small precision of 1 × 10^{-3}. The density for µ with this large variance is nearly flat. Prior distributions that are not completely flat provide enough information for the numerical approximation algorithms to continue to explore the target density. Similarly, for the scale parameter σ the chosen prior distribution is the half-Cauchy with scale 25,

σ ∼ HC(25).

The half-Cauchy with scale parameter α = 25 is a recommended, default, weakly informative prior distribution for a scale parameter (Statisticat LLC, 2013). At this scale, α = 25, the density of the half-Cauchy is nearly flat but not completely (see Figure 1.1), which yields good properties for numerical approximation algorithms. The inverse-gamma is often used as a weakly informative prior distribution for a scale parameter; however, this distribution creates problems for scale parameters near zero. Gelman and Hill (2007) recommend the uniform or, if more information is necessary, the half-Cauchy as a better choice. Thus, in the subsequent analysis, the half-Cauchy distribution with scale 25 is used as a weakly informative prior distribution.
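For x ≥ 0 the half-Cauchy(25) density equals twice the Cauchy(0, 25) density, so its near-flatness over a wide range of σ can be verified with a one-line plot:

## half-Cauchy(25) density: 2 * dcauchy(x, 0, 25) for x >= 0
curve(2 * dcauchy(x, location=0, scale=25), from=0, to=100,
      xlab=expression(sigma), ylab="density")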

2.4.2 Implementation

For Bayesian modeling of the log-logistic distribution using different priors, let us introduce a roller bearing failure time data set taken from Hamada et al. (2008), page 110. The data are originally discussed in Muller (2003).

2.4.2.1 The Roller Bearing Failure Time Data

The data presented in Table 2.8 comprise 11 observed failure times and 55 times at which the test was suspended (censored) before item failure occurred (marked ∗) for the 4.5 roller bearing in a set of J-52 engines from EA-6B Prowler aircraft (Muller, 2003). Degradation of the 4.5 roller bearing in the J-52 engine has caused in-flight engine failures. The random variable is the time to failure T, and reliability is the probability that a roller bearing does not fail before time t.

Failure Times (in operating hours) 1085∗ 1500∗ 1390∗ 152∗ 971∗ 966∗ 997∗ 887∗ 977∗ 1022∗ 2087∗ 646 820∗ 897 810∗ 80∗ 1167 711 1203 85∗ 1070∗ 719∗ 1795∗ 1890 1145∗ 1380∗ 61∗ 1165∗ 437∗ 1152∗ 159∗ 3428∗ 555∗ 727∗ 2294∗ 663∗ 1427∗ 951 767∗ 546∗ 736∗ 917∗ 2871∗ 1231∗ 100∗ 1628 759∗ 246∗ 861∗ 462∗ 1079∗ 1199∗ 424∗ 763∗ 1297∗ 2238∗ 1388 1153∗ 2892∗ 2153∗ 853∗ 911∗ 2181 1042∗ 799∗ 750

Table 2.8: Roller bearing lifetime data for the Prowler attack aircraft. An asterisk indicates a right-censored failure time

2.4.2.2 Analysis using LaplacesDemon

Fitting with LaplaceApproximation

The Laplace approximation is a family of asymptotic techniques used to approximate integrals. It seems to accurately approximate unimodal posterior moments and marginal posterior densities in many cases. Here, for fitting the linear regression model with intercept term only, we use the function LaplaceApproximation, which is an implementation of Laplace's approximations (Tierney and Kadane, 1986) of the integrals involved in the Bayesian analysis of the parameters in the modeling process. This function deterministically maximizes the logarithm of the unnormalized joint posterior density using one of several optimization techniques. The aim of the Laplace approximation is to estimate the posterior mode and variance of each parameter. For obtaining posterior modes of the log-posteriors, a number of optimization algorithms are implemented. These include the Spectral Projected Gradient (SPG) algorithm (Birgin et al., 2000, 2001), which is the default. SPG has been adapted from the spg function in the package BB (Varadhan and Gilbert, 2009). However, we find that the Limited-Memory BFGS (L-BFGS) is a better alternative in the Bayesian scenario. The limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm, which was proposed independently by Broyden (1970), Fletcher (1970), Goldfarb (1970), and Shanno (1970), is a quasi-Newton optimization algorithm that compactly approximates the Hessian matrix. Rather than storing the dense Hessian matrix, L-BFGS stores only a few vectors that represent the approximation. It may be noted that Newton-Raphson is the last choice, as it is very sensitive to the starting values, it creates problems when the starting values are far from the targets, and calculating and inverting the Hessian matrix can be computationally expensive; it is nevertheless implemented in LaplaceApproximation for the sake of completeness. The main arguments of LaplaceApproximation can be seen by using the function args as:

LaplaceApproximation(Model, parm, Data, Interval=1e-06, Iterations=100,
    Method="SPG", Samples=1000, CovEst="Hessian", sir=TRUE,
    Stop.Tolerance=1e-05, CPUs=1, Type="PSOCK")

The first argument Model defines the model to be implemented, which contains the specification of the likelihood and priors. LaplaceApproximation passes two arguments to the model function, parm and Data, and receives five objects back from the model function: LP (the logarithm of the unnormalized joint posterior density), Dev (the deviance), Monitor (the monitored variables), yhat (the variables for the posterior predictive checks), and parm, the vector of parameters, which may be constrained in the model function. The argument parm requires a vector of initial values equal in length to the number of parameters, and LaplaceApproximation will attempt to optimize these initial values for the parameters, where the optimized values are the posterior modes. The Data argument requires a list of data which must include the variable names and parameter names. The argument sir=TRUE stands for implementation of the sampling importance resampling algorithm, a bootstrap procedure to draw independent samples with replacement from the posterior sample with unequal sampling probabilities (Gordon et al., 1993; Albert, 2009). Contrary to sir of the LearnBayes package, here the proposal density is multivariate normal, not t.

Data Creation

The function LaplaceApproximation requires data that are specified in a list. For the roller bearing failure time data, the logarithm of failureTime is the response variable. Since the intercept is the only term in the model, a vector of 1's is inserted into the model matrix X. Thus, J = 1 indicates the single column of 1's in the model matrix.

failureTime <- c(1085,1500,1390,152,971,966,997,887,977,1022,2087,
    646,820,897,810,80,1167,711,1203,85,1070,719,1795,1890,1145,
    1380,61,1165,437,1152,159,3428,555,727,2294,663,1427,951,767,
    546,736,917,2871,1231,100,1628,759,246,861,462,1079,1199,424,
    763,1297,2238,1388,1153,2892,2153,853,911,2181,1042,799,750)
N <- 66
J <- 1
X <- matrix(1, nrow=N, ncol=J)
y <- log(failureTime)
censor <- c(rep(0,11),1,0,1,0,0,1,1,1,rep(0,4),1,rep(0,13),1,
    rep(0,7),1,rep(0,10),1,rep(0,5),1,0,0,1)
mon.names <- c("LP", "sigma")
parm.names <- as.parm.names(list(beta=rep(0,J), log.sigma=0))
MyData <- list(N=N, J=J, X=X, mon.names=mon.names,
    parm.names=parm.names, censor=censor, y=y)

In this case, there are two parameters, beta and log.sigma, which must be specified in the vector parm.names. The log-posterior LP and sigma are included as monitored variables in the vector mon.names. The number of observations is specified by N, that is, 66. Censoring is also taken into account, where 0 stands for censored and 1 for uncensored values. Finally, all these things are combined in list form as MyData at the end of the command.

Initial Values

The function LaplaceApproximation requires a vector of initial values for the parameters. Each initial value is a starting point for the estimation of a parameter. Here, the first parameter, the beta has been set equal to zero, and the remaining parameter, log.sigma, has been set equal to log(1), which is zero. The order of the elements of the initial values must match the order of the parameters. Thus, define a vector of initial values

Initial.Values <- c(rep(0,J), log(1))

For initial values the function GIV (which stands for “Generate Initial Values”) may also be used to randomly generate initial values.

Model Specification

The function LaplaceApproximation can fit any Bayesian model for which like- lihood and prior are specified. To use this method one must specify a model. Thus, for fitting of the roller bearing failure time data, consider that logarithm of failureTime follows a logistic distribution, which is often written as

y ∼ Logistic(µ, σ),

and expectation vector µ is equal to the inner product of design matrix X and parameter β, µ = Xβ.

Prior probabilities are specified respectively for regression coefficient β and scale parameter σ ,

βj ∼ N(0, 1000), j = 1,...,J

σ ∼ HC(25).

The large variance or small precision indicates a lot of uncertainty of each β, and is hence a weakly informative prior distribution. Similarly, half-Cauchy is a weakly informative prior for σ (Akhtar and Khan, 2014b).

Model <- function(parm, Data)
{
  ## Parameters
  beta <- parm[grep("beta", Data$parm.names)]
  sigma <- exp(parm[grep("log.sigma", Data$parm.names)])
  ## Log(Prior Densities)
  beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
  sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
  ## Log-Likelihood
  mu <- tcrossprod(Data$X, t(beta))
  LL <- sum(censor*dlogis(Data$y, location=mu, scale=sigma, log=TRUE)
      +(1-censor)*plogis(Data$y, location=mu, scale=sigma, log.p=TRUE,
      lower.tail=FALSE))
  LP <- LL+beta.prior+sigma.prior
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=c(LP,sigma), yhat=mu,
      parm=parm)
  return(Modelout)
}

The Model function contains two arguments, parm and Data, where parm is the set of parameters and Data is the list of data. There are two parameters, beta and sigma, having priors beta.prior and sigma.prior, respectively. The object LL stands for the log-likelihood and LP for the log-posterior. The function Model returns the object Modelout, which contains five objects in list form: the log-posterior LP, the deviance Dev, the monitored variables Monitor, the predicted values yhat, and the parameter estimates parm.

Model Fitting

To fit the above specified model, the function LaplaceApproximation is used and its results are assigned to the object Fit. A summary of results is printed by the function print; the full output is too detailed to show here, but its relevant parts are summarized in the section Summarizing Output.

Fit <- LaplaceApproximation(Model=Model, parm=Initial.Values,
    Data=MyData, Iterations=1000, Samples=20000, sir=TRUE)
print(Fit)
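Instead of the full print(Fit) output, the two summary matrices can also be extracted directly; Summary1 and Summary2 are assumed here to be the components of the LaplaceApproximation fit holding the asymptotic and SIR-based summaries, respectively.

## extract the summary matrices of the fitted object directly (assumed names)
Fit$Summary1    # asymptotic approximation (basis of Table 2.9)
Fit$Summary2    # SIR simulation summaries (basis of Table 2.10)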

Summarizing Output

The function LaplaceApproximation approximates the posterior density of the fitted model, and posterior summaries can be seen in the following tables. Table 2.9 presents the analytic results using the Laplace approximation method, while Table 2.10 presents the simulated results using the SIR algorithm.

Parameter    Mode    SD     LB      UB
beta         7.79    0.18   7.43    8.14
log.sigma   -0.97    0.22  -1.42   -0.53

Table 2.9: Summary of the asymptotic approximation using the function LaplaceApproximation. It may be noted that these summaries are based on asymptotic approximation, and hence Mode stands for posterior mode, SD for pos- terior standard deviation, and LB, UB are 2.5% and 97.5% quantiles, respectively.

From these posterior summaries, it is obvious that the posterior mode of the intercept parameter β0 is 7.79 ± 0.18, whereas the posterior mode of log(σ) is −0.97 ± 0.22. Both parameters are also statistically significant. In a practical data analysis, the intercept-only model is discussed merely as a starting point; more meaningful models are the simple and multiple regression models, which are discussed in Chapter 3. The simulation tool is discussed in the next section.

Parameter    Mean     SD      MCSE    ESS      LB       Median    UB
beta         7.85     0.19    0.00    10000    7.52     7.82      8.27
log.sigma   -0.90     0.23    0.00    10000   -1.32    -0.92     -0.41
Deviance     51.98    1.96    0.02    10000    50.01    51.38     57.11
LP          -34.07    0.98    0.01    10000   -36.63   -33.77    -33.08
sigma        0.42     0.10    0.00    10000    0.27     0.40      0.66

Table 2.10: Summary of the simulated results via SIR algorithm using the same function, where Mean stands for posterior mean, SD for posterior standard deviation, MCSE for Monte Carlo standard error, ESS for effective sample size, and LB, Median, UB are 2.5%, 50%, 97.5% quantiles, respectively.

Fitting with LaplacesDemon

In this section, a simulation method is applied to analyze the same data with the function LaplacesDemon, which is the main function of the LaplacesDemon package (Statisticat LLC, 2013). Given data, a model specification, and initial values, LaplacesDemon maximizes the logarithm of the unnormalized joint posterior density with one of the MCMC algorithms, also called samplers, and provides samples of the marginal posterior distributions, the deviance, and other monitored variables. LaplacesDemon offers a large number of MCMC algorithms for numerical approximation. Popular families include Gibbs sampling, Metropolis-Hastings (MH), Independent-Metropolis (IM), Random-Walk-Metropolis (RWM), Slice sampling, Metropolis-within-Gibbs (MWG), Adaptive-Metropolis-within-Gibbs (AMWG), and many others. Details of the MCMC algorithms are best explored online at http://www.bayesian-inference.com/mcmc, as well as in the "LaplacesDemon Tutorial" vignette. The main arguments of LaplacesDemon can be seen by using the function args as:

LaplacesDemon(Model, Data, Initial.Values, Covar=NULL, Iterations=1e+05,
    Status=1000, Thinning=100, Algorithm="RWM", Specs=NULL,
    LogFile="", ...)

The arguments Model and Data specify the model to be implemented and the list of data, respectively, which were specified in the previous section. Initial.Values requires a vector of initial values equal in length to the parameter vector. The argument Covar=NULL indicates that a variance vector or covariance matrix has not been specified, so the algorithm will begin with its own estimates. The next two arguments, Iterations=10000 and Status=100, indicate that the LaplacesDemon function will update 10000 times before completion and that the status is reported after every 100 iterations. The Thinning argument accepts integers between 1 and the number of iterations and indicates that only every Thinning-th iteration is retained, while the others are discarded; thinning is performed to reduce autocorrelation and the number of marginal posterior samples. Further, Algorithm requires the abbreviated name of the MCMC algorithm in quotes; here RWM is short for Random-Walk-Metropolis. Finally, Specs=NULL is the default argument and accepts a list of specifications for the MCMC algorithm declared in the Algorithm argument.

Initial Values

LaplacesDemon requires a vector of initial values for the parameters. Each initial value will be the starting point for an adaptive chain or a non-adaptive Markov chain of a parameter. If all initial values are set to zero, then Laplace's Demon will attempt to optimize the initial values with the LaplaceApproximation function using a resilient backpropagation algorithm. So, it is better to use the previously fitted object Fit with the function as.initial.values to get a vector of initial values from LaplaceApproximation for the LaplacesDemon fit. Thus, to obtain a vector of initial values the function as.initial.values is used as

Initial.Values <- as.initial.values(Fit)

Model Fitting

Since LaplacesDemon is stochastic, that is, it involves pseudo-random numbers, it is better to set a seed with the set.seed function for pseudo-random number generation before fitting with LaplacesDemon, so that results can be reproduced. Now, fit the prespecified model with the function LaplacesDemon and assign its results to the object FitDemon. A summary of results is printed with the function print, and its relevant parts are summarized in the next section.

set.seed(999)
FitDemon <- LaplacesDemon(Model, Data=MyData, Initial.Values,
    Covar=Fit$Covar, Iterations=10000, Status=100, Thinning=1,
    Algorithm="IM",
    Specs=list(mu=Fit$Summary1[1:length(Initial.Values),1]))
print(FitDemon)
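Further diagnostics for the LaplacesDemon fit can be requested with the Consort function of the package, which prints Laplace's Demon's report on acceptance rate, stationarity, and a suggested next call; a minimal sketch, assuming the standard LaplacesDemon interface:

## print Laplace's Demon's diagnostic report for the fit
Consort(FitDemon)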

Summarizing Output

LaplacesDemon simulates from the posterior density with Independent-Metropolis and approximates the results, which can be seen in the following table. Table 2.11 presents the simulated results, in matrix form over the stationary samples, summarizing the marginal posterior densities of the parameters: the mean, standard deviation, Monte Carlo standard error (MCSE), effective sample size (ESS), and 2.5%, 50%, and 97.5% quantiles. The complete picture of the results can also be seen in Figure 2.5.

Parameter    Mean     SD      MCSE    ESS     LB       Median    UB
beta         7.79     0.11    0.00    4288    7.59     7.79      8.01
log.sigma   -0.96     0.13    0.00    4567   -1.21    -0.97     -0.70
Deviance     50.66    0.73    0.02    3057    49.98    50.42     52.65
LP          -33.41    0.37    0.01    3056   -34.40   -33.29    -33.06
sigma        0.38     0.05    0.00    4574    0.30     0.38      0.50

Table 2.11: Posterior summary of simulated results due to stationary samples.

2.4.2.3 Analysis using JAGS

In this section, the simulation technique is implemented using JAGS, which simulates data via the Metropolis-within-Gibbs algorithm and approximates the posterior density. Thus, for fitting the model with JAGS, the following settings are made.

Data Creation

For fitting the roller bearing data in JAGS, the data is created as follows. In data frame, time denotes the failure time and status denotes censoring status, that is, 0 for censored and 1 for uncensored data.

failTime <- data.frame(time=c(1085,1500,1390,152,971,966,997,887, 977,1022,2087,646,820,897,810,80,1167,711,1203,85,1070,719, 1795,1890,1145,1380,61,1165,437,1152,159,3428,555,727,2294, 663,1427,951,767,546,736,917,2871,1231,100,1628,759,246,861, 462,1079,1199,424,763,1297,2238,1388,1153,2892,2153,853,911, 2181,1042,799,750),status=c(rep(0,11),1,0,1,0,0,1,1,1,rep(0,4), 1,rep(0,13),1,rep(0,7),1,rep(0,10),1,rep(0,5),1,0,0,1))

For modeling censored data using JAGS, censored values of the response should be recorded as NA, and the indicator is.censored should be set to 1 for those observations. The data are right censored, so a vector of censoring limits should also be constructed; in this case, the vector of censoring limits is y.cens.

censored <- !failTime$status
is.censored <- as.numeric(censored)
y <- log(failTime$time)
y[censored] <- NA
N <- length(y)
y.cens <- log(failTime$time)

data<-list("y", "N", "is.censored", "y.cens",)

Here y represents the logarithm of the response values and N the total number of observations. At the end, the data are specified in a list that is compatible with the model.

Initial Values

Next, initial values are needed to start the chains. The key here is setting not only the initial values for mu and sigma, but also initial values for the observations that have been censored. Thus, for three chains the initial values are specified as

y.init <- rep(NA, length(y))
y.init[censored] <- y.cens[censored] + 1
inits1 <- list(mu = 0, sigma = 0.5, y = y.init)
inits2 <- list(mu = 1, sigma = 0.2, y = y.init)
inits3 <- list(mu = 1.5, sigma = 0.3, y = y.init)
inits <- list(inits1, inits2, inits3)

Model Specification

For modeling the roller bearing failure time data, the log-logistic lifetime model is used and is defined as

$$y_i \sim \mathrm{Logistic}(\mu, \tau)$$

where y_i is the logarithm of the ith failure time and follows the logistic distribution with location parameter µ and precision parameter τ, which is simply the inverse of σ. Moreover, the weakly informative priors considered for the parameters µ and σ are, respectively,

$$\mu \sim N(0, 0.001), \qquad \tau = 1/\sigma, \qquad \sigma \sim U(0, 100)$$

where the normal prior for µ is written in terms of precision, as in JAGS, so that the precision 0.001 corresponds to a variance of 1000.

In this case the data are censored, and JAGS very helpfully provides a way to model censored data through the dinterval distribution, which represents interval-censored data. It takes two arguments: y, the original continuous response variable, and c[], a vector of cut points of length M (here, a vector of censoring limits). Thus, a full JAGS program for fitting the roller bearing censored data is given as:

cat("model{ for(i in 1:N){ is.censored[i]~dinterval(y[i], y.cens[i]) y[i]~dlogis(mu, tau) } mu~dnorm(0, 0.001) tau <- 1/pow(sigma, 1) sigma~dunif(0, 100) }", file="modelCen.txt")

Here, is.censored is an indicator variable that takes the value 1 if y[i] is censored and 0 otherwise, and it follows the dinterval distribution with arguments y and y.cens. In this model, the prior distributions for the parameters are chosen to be weakly informative.

Model Fitting

For fitting the above defined model, the function jags of R2jags is used and its results are assigned to object FitJags. The results are printed by typing print(FitJags) and reported in next section.

FitJags = jags(data=data, inits, param=c("mu","sigma"), n.chains=3, n.iter=20000, model.file="modelCen.txt",) print(FitJags,dig=5) 2.4. Bayesian Analysis of Continuous Models 121

Summarizing Output

JAGS simulates from the posterior density using the Metropolis-within-Gibbs algorithm and approximates the results, which are reported in Table 2.12. The Rhat values are very close to 1.0, indicating good convergence. Plots of the posterior densities can be seen in Figure 2.5.

Parameter Mean SD 2.5% Median 97.5% Rhat n.eff

mu         7.921    0.246    7.545     7.885     8.500     1.001   3100
sigma      0.458    0.121    0.283     0.436     0.745     1.001   2400
deviance   33.797   6.130    23.110    33.418    47.194    1.001   3100

Table 2.12: Posterior summary of JAGS simulations after being fitted to the log-logistic model for roller bearing failure times data.


Figure 2.5: Plots of the posterior densities of the parameters β0 and σ for the log-logistic model using the functions LaplaceApproximation, LaplacesDemon, and jags. It is evident from these plots that the posterior densities obtained from the three different methods are very close to each other.

2.4.3 The Generalized Log-Burr Failure Time Model

The log-Burr distribution is a generalization of two important reliability models, the logistic distribution and the extreme value distribution. The log-Burr distribution can be obtained by generalizing the parametric location-scale family of distributions given by Equation (2.12), to let the pdf, cdf, or reliability function include one or more additional parameters. This distribution is very useful as it includes two-parameter lifetime distributions as special cases.

The generalized log-Burr family, for which the standardized variable is z = (y − µ)/σ, has probability density function of the form

$$f_0(z|k) = e^{z}\left(1 + e^{z}/k\right)^{-(k+1)}, \qquad -\infty < z < \infty$$

and the corresponding reliability function

$$R_0(z|k) = \left(1 + e^{z}/k\right)^{-k}, \qquad -\infty < z < \infty$$

where k (> 0) is a shape parameter. The special case k = 1 gives the logistic distribution and k → ∞ gives the extreme value distribution.
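The two limiting cases can be visualized by plotting the standardized density f0(z|k) for a small and a large value of k against the extreme value density; a short illustrative sketch (not part of the thesis code):

## standardized log-Burr density and its limiting cases
f0 <- function(z, k) exp(z) * (1 + exp(z)/k)^(-(k + 1))
curve(f0(x, k=1), from=-6, to=4, ylab="density", xlab="z")   # logistic case
curve(f0(x, k=30), add=TRUE, lty=2)                          # close to the limit
curve(exp(x - exp(x)), add=TRUE, lty=3)                      # extreme value limit
legend("topleft", c("k = 1 (logistic)", "k = 30", "extreme value"), lty=1:3)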

To complete the Bayesian model, separate and independent prior distributions are chosen for both of the parameters, that is, µ and σ. One choice of prior distribution for the parameter µ is

µ ∼ N(0, 1000).

The normal prior with large variance indicates a lot of uncertainty for the parameter µ, and is hence a weakly informative prior distribution. Similarly, for the σ parameter, one choice of prior distribution is half-Cauchy with scale=25,

σ ∼ HC(25)

which is a recommended, default, weakly informative prior distribution for a scale parameter. In this section, the Bayesian approach is used to model lifetime data for the log-Burr model using analytic and simulation tools. The Laplace approximation is implemented for approximating the posterior densities of the parameters. Moreover, parallel simulation tools are also implemented using the LaplacesDemon and JAGS software packages. Censoring is also taken into account. Different approaches are used to implement the censoring mechanism for the two software packages, and the code is written in R and JAGS, separately.

2.4.4 Implementation

For the implementation of Bayesian modeling of the log-Burr distribution, a lifetime data set is taken from Lawless (2003). The same data were discussed by Schmee and Nelson (1977).

2.4.4.1 The Locomotive Controls Data

This data set contains the number of thousand miles at which different locomotive controls failed, in a life test involving 96 controls. The test was terminated after 135,000 miles, by which time 37 failures had occurred. The failure times for the 37 failed units are 22.5, 37.5, 46.0, 48.5, 51.5, 53.0, 54.5, 57.5, 66.5, 68.0, 69.5, 76.5, 77.0, 78.5, 80.0, 81.5, 82.0, 83.0, 84.0, 91.5, 93.5, 102.5, 107.0, 108.5, 112.5, 113.5, 116.0, 117.0, 118.5, 119.0, 120.0, 122.5, 123.0, 127.5, 131.0, 132.5, 134.0. In addition, there are 59 censoring times, all equal to 135.0.

2.4.4.2 Analysis using LaplacesDemon

Fitting with LaplaceApproximation

The implementation of the analytic approximation technique is made using the function LaplaceApproximation of the LaplacesDemon package. This function is an implementation of the Laplace approximation method and seems to accurately approximate the unimodal posterior moments and marginal posterior densities. Details concerning this function are already discussed in Section 2.4.2.2. Thus, for fitting the model with LaplaceApproximation, the following steps are followed.

Data Creation

The function LaplaceApproximation requires data that are specified in a list. For the locomotive controls data, the logarithm of failTime will be the response variable. Since the intercept is the only term in the model, a vector of 1's is inserted into the model matrix X. Thus, J = 1 indicates the single column of 1's in the matrix.

failTime <- c(22.5,37.5,46.0,48.5,51.5,53.0,54.5,57.5,66.5,68.0,
    69.5,76.5,77.0,78.5,80.0,81.5,82.0,83.0,84.0,91.5,93.5,102.5,
    107.0,108.5,112.5,113.5,116.0,117.0,118.5,119.0,120.0,122.5,
    123.0,127.5,131.0,132.5,134.0, rep(135.0,59))
N <- 96
J <- 1
k <- 1   # k=1 for log-logistic and k=30 for Weibull model
X <- matrix(1, nrow=N, ncol=J)
y <- log(failTime)
censor <- c(rep(1,37), rep(0,59))
mon.names <- c("LP", "sigma")
parm.names <- as.parm.names(list(beta=rep(0, J), log.sigma=0))
MyData <- list(N=N, J=J, X=X, k=k, mon.names=mon.names,
    parm.names=parm.names, y=y, censor=censor)

In this case, there are two parameters, beta and log.sigma, which must be specified in the vector parm.names. The log-posterior LP and sigma are included as monitored variables in the vector mon.names. The number of observations is specified by N. Censoring is also taken into account, where 0 stands for censored and 1 for uncensored values. Finally, all these things are combined in list form as the MyData object at the end of the command.

Initial Values

The function LaplaceApproximation requires a vector of initial values for the parameters. Each initial value is a starting point for the estimation of a parameter. Here, the first parameter, the beta has been set equal to zero, and the remaining parameter, log.sigma, has been set equal to log(1), which is zero. The order of the elements of the initial values must match the order of the parameters. Thus, define a vector of initial values

Initial.Values <- c(rep(0, J), log(1))

Model Specification

The function LaplaceApproximation can fit any Bayesian model for which the likelihood and prior are specified. However, it is equally useful for maximum likelihood estimation. To use this method one must specify a model. Thus, for fitting the locomotive controls data, consider that the logarithm of failTime follows the log-Burr distribution, which is often written as

y ∼ Log-Burr(µ, σ; k),

and the expectation vector µ is equal to the inner product of the model matrix X and the parameter vector β, µ = Xβ.

Prior probabilities are specified for regression coefficient β and scale parameter σ, respectively,

βj ∼ N(0, 1000), j = 1,...,J

σ ∼ HC(25).

The large variance or small precision indicates a lot of uncertainty about each β, and is hence a weakly informative prior distribution. Similarly, the half-Cauchy is a weakly informative prior for σ (Akhtar and Khan, 2014a).

Model <- function(parm, Data)
{
  ## Parameters
  beta <- parm[grep("beta", Data$parm.names)]
  sigma <- exp(parm[grep("log.sigma", Data$parm.names)])
  ## Log(Prior Densities)
  beta.prior <- sum(dnorm(beta, 0, 1000, log=TRUE))
  sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
  ## Log-Likelihood
  mu <- tcrossprod(Data$X, t(beta))
  z <- (y-mu)/sigma
  llf <- -log(sigma)+z-(k+1)*log(1+exp(z)/k)
  lls <- (-k)*log(1+exp(z)/k)
  LL <- sum(censor*llf+(1-censor)*lls)
  LP <- LL+beta.prior+sigma.prior
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=c(LP,sigma), yhat=mu,
      parm=parm)
  return(Modelout)
}

The Model function contains two arguments, parm and Data, where parm is the set of parameters and Data is the list of data. There are two parameters, beta and sigma, having priors beta.prior and sigma.prior, respectively. The object LL stands for the log-likelihood and LP for the log-posterior. The function Model returns the object Modelout, which contains five objects in list form: the log-posterior LP, the deviance Dev, the monitored variables Monitor, the predicted values yhat, and the parameter estimates parm.

Model Fitting

To fit the above specified model, the function LaplaceApproximation is used and its results are assigned to the object Fit. A summary of results is printed by the function print; the full output is too detailed to show here, but its relevant parts are summarized in the Summarizing Output section.

set.seed(666)
Fit <- LaplaceApproximation(Model=Model, parm=Initial.Values,
    Data=MyData, Iterations=1000, Samples=1000, sir=TRUE)
print(Fit)

Summarizing Output

The function LaplaceApproximation approximates the posterior density of the fitted model, and posterior summaries can be seen in the following two tables. Table 2.13 presents the analytic results using the Laplace approximation method, while Table 2.14 presents the simulated results using the SIR algorithm.

Log-logistic model (k=1)
Parameter    Mode    SD     LB      UB
beta         5.08    0.09   4.90    5.26
log.sigma   -0.96    0.15  -1.25   -0.66

Weibull model (k=30)
Parameter    Mode    SD     LB      UB
beta         5.21    0.09   5.03    5.39
log.sigma   -0.85    0.15  -1.16   -0.54

Table 2.13: Summary of the analytic approximation using the function LaplaceApproximation. It may be noted that these summaries are based on an asymptotic approximation; hence Mode stands for the posterior mode, SD for the posterior standard deviation, and LB, UB are the 2.5% and 97.5% quantiles, respectively.

Log-logistic model (k=1)
Parameter    Mean     SD      MCSE    ESS     LB       Median    UB
beta         5.09     0.09    0.00    1000    4.93     5.09      5.27
log.sigma   -0.93     0.14    0.00    1000   -1.22    -0.93     -0.65
Deviance     149.04   1.81    0.06    1000    147.24   148.45    153.94
LP          -86.02    0.90    0.03    1000   -88.47   -85.72    -85.12
sigma        0.40     0.06    0.00    1000    0.29     0.39      0.52

Weibull model (k=30)
Parameter    Mean     SD      MCSE    ESS     LB       Median    UB
beta         5.22     0.09    0.00    1000    5.06     5.21      5.40
log.sigma   -0.82     0.15    0.00    1000   -1.10    -0.82     -0.51
Deviance     149.44   1.94    0.06    1000    147.52   148.87    154.61
LP          -86.22    0.97    0.03    1000   -88.80   -85.93    -85.26
sigma        0.45     0.07    0.00    1000    0.33     0.44      0.60

Table 2.14: Summary matrices of the simulation via the sampling importance resampling algorithm using the function LaplaceApproximation, where Mean stands for the posterior mean, SD for the posterior standard deviation, MCSE for the Monte Carlo standard error, ESS for the effective sample size, and LB, Median, UB are the 2.5%, 50%, and 97.5% quantiles, respectively.

From these posterior summaries, it is obvious that the posterior mode of the intercept parameter β0 for the logistic distribution is 5.08 ± 0.09, whereas the posterior mode of log(σ) is −0.96 ± 0.15; for the Weibull distribution the posterior mode of the intercept parameter β0 is 5.21 ± 0.09, whereas the posterior mode of log(σ) is −0.85 ± 0.15. The parameters of both distributions are also statistically significant. Simulation tools are discussed in the next section.

Fitting with LaplacesDemon

Here, the MCMC simulation method is used for the Bayesian analysis of the same locomotive data, and the log-Burr model is fitted with the function LaplacesDemon. This function approximates the logarithm of the unnormalized joint posterior density using the Independent-Metropolis (IM) algorithm, and provides samples of the marginal posterior densities, the deviance, and other monitored variables.

Model Fitting

The R code for fitting the model with LaplacesDemon is as follows. Its results are assigned to the object FitDemon, and reported in Summarizing Output section.

Initial.Values <- as.initial.values(Fit)
FitDemon <- LaplacesDemon(Model, Data=MyData, Initial.Values,
    Covar=Fit$Covar, Iterations=10000, Status=100, Thinning=1,
    Algorithm="IM",
    Specs=list(mu=Fit$Summary1[1:length(Initial.Values),1]))
FitDemon

Summarizing Output

The posterior summaries after fitting the model with LaplacesDemon and generating 10000 iterations are provided in Table 2.15, which presents the simulated results based on the stationary samples.

Log-logistic model (k=1)
Parameter    Mean     SD      MCSE    ESS       LB       Median    UB
beta         5.10     0.10    0.01    481.56    4.92     5.10      5.30
log.sigma   -0.92     0.16    0.01    427.68   -1.25    -0.91     -0.59
Deviance     149.55   2.31    0.18    360.81    147.27   149.05    155.11
LP          -86.27    1.15    0.09    360.82   -89.05   -86.02    -85.13
sigma        0.40     0.07    0.00    442.40    0.29     0.40      0.55

Weibull model (k=30)
Parameter    Mean     SD      MCSE    ESS       LB       Median    UB
beta         5.24     0.10    0.01    373.50    5.07     5.22      5.46
log.sigma   -0.80     0.16    0.01    360.03   -1.11    -0.79     -0.50
Deviance     149.62   2.14    0.15    334.67    147.55   148.94    155.09
LP          -86.31    1.07    0.07    334.66   -89.04   -85.97    -85.27
sigma        0.46     0.07    0.00    373.10    0.33     0.45      0.61

Table 2.15: Posterior summaries of the simulation based on the stationary samples using the function LaplacesDemon. Although the Weibull case is reported for k = 30 only, it has been observed that the results remain essentially the same for larger values of k.

2.4.4.3 Analysis using JAGS

In this section, for the Bayesian analysis of the log-Burr model, JAGS is used to simulate samples from the posterior densities and approximate the results using the Metropolis-within-Gibbs algorithm. For fitting the model with jags, the data creation, initial values, and model specification written in the JAGS language are as follows.

Data Creation

For modeling the locomotive controls censored data using JAGS, the data are created in R as follows. The data frame failTime contains two variables, time and status, where time represents the vector of failure times and status represents the vector of censoring status, 0 for censored and 1 for uncensored values.

failTime <- data.frame(time=c(22.5,37.5,46.0,48.5,51.5,53.0,54.5, 57.5,66.5,68.0,69.5,76.5,77.0,78.5,80.0,81.5,82.0,83.0,84.0, 91.5,93.5,102.5,107.0,108.5,112.5,113.5,116.0,117.0,118.5, 119.0,120.0,122.5,123.0,127.5,131.0,132.5,134.0, rep(135.0,59)), status=c(rep(1,37), rep(0,59))) time <- failTime$time n <- length(time) y <- log(time) censor <- failTime$status zeros <- rep(0,n) C <- 1000 k <- 1 # k=1 for logistic model and k=30 for Weibull model data <- list(n=n, y=y, censor=censor, zeros=zeros, C=C, k=k)

where n is the total number of observations, y represents the logarithm of the failure times, censor represents the censoring status, zeros is a vector of zero values equal in length to the total number of observations, and C is a sufficiently large positive value. The last two objects are used in the Poisson zeros trick for modeling the log-Burr distribution in JAGS.

Initial values

To start the chains, the starting values for the model parameters µ and σ are specified by defining a function as

inits <- function(){
  list(mu=rlnorm(1), sigma=runif(1))
}

Model Specification

Despite the breadth of distributions on offer, there may be occasions when we would like to use a probability distribution that is not built into JAGS. This is the situation with the log-Burr distribution, which is not included in the list of standard JAGS distributions. To resolve the problem, one possibility is to use the so-called "Poisson zeros trick" or the analogous "Bernoulli ones trick" (e.g., Kruschke, 2015; Lunn et al., 2013; Ntzoufras, 2009).

For the Bayesian analysis of the log-Burr model using JAGS, the Poisson zeros trick is adopted, which exploits the fact that dpois(0, φ) = exp(−φ). Therefore, the JAGS statement zeros[i] ~ dpois(φ_i) with φ_i = −log(L_i) contributes the value L_i, the likelihood of observation i. So, we can define L_i = pdf(y_i, parameters) and put −log(L_i) (that is, minus the log-likelihood l_i) into dpois to obtain the desired likelihood contribution in JAGS. But notice that the Poisson mean φ_i must be positive; therefore, a constant, say C, is added to −log(L_i) to be sure it is positive no matter what values of y_i and the parameters are put into it. Moreover, in JAGS, the vector of 0's (zeros) must be defined outside the model definition; one way to do that is with the data. Thus, the JAGS code of the log-Burr model for the locomotive censored data is as follows:

cat("model{ for(i in 1:n){ zeros[i]~dpois(phi[i]) phi[i]<- -l[i]+C l[i]<- censor[i]*(-log(sigma)+((y[i]-mu)/sigma) -(k+1)*log(1+exp((y[i]-mu)/sigma)/k)) +(1-censor[i])*((-k*log(1+exp((y[i]-mu)/sigma)/k))) } ## Priors mu~dunif(-100,100) sigma~dunif(0,100) }", file="modelLoco.txt")

For the Bayesian analysis of the log-Burr model using JAGS, independent and weakly informative priors for the model parameters are considered and defined at the end of the model. Note that the deviance for data from distributions specified using the zeros trick is calculated with respect to zeros[i], not y[i]. Thus, to transform this to the scale of y[i], the deviance (D) is calculated by using the formula (Lunn et al., 2013)

$$D_{\mathrm{zero}} = D + 2nC \qquad (2.25)$$

where $D_{\mathrm{zero}}$ represents the deviance with respect to zeros[i], D is the natural deviance corresponding to y[i], C is the constant applied to ensure that the Poisson rate is positive, and n is the number of observations, which is 96 in this example. The deviance obtained in this way is reported in the summary table.

Model Fitting

After writing the full model code, the data, and the initial values, the model is fitted with jags. To simulate the data using the MCMC algorithm, 10000 iterations are used for each of the three chains. The results for the posterior densities are assigned to the object FitJags and reported in the next section.

set.seed(123)
FitJags <- jags(data=data, inits=inits, parameters.to.save=c("mu","sigma"),
    n.chains=3, n.iter=10000, model.file="modelLoco.txt")
FitJags
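Once the fit above has run, the deviance reported by JAGS (which is on the zeros[i] scale) can be mapped back to the scale of y[i] using Equation (2.25); a minimal sketch, assuming the usual BUGSoutput component of an R2jags fit, with n = 96 and C = 1000 as defined in the data step.

## recover the natural deviance D from the zeros-trick deviance (Eq. 2.25)
Dzero <- FitJags$BUGSoutput$summary["deviance", "mean"]
D <- Dzero - 2 * n * C
D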

Summarizing Output

Posterior summaries of the log-Burr model parameters using the MCMC algorithm via JAGS are reported in Table 2.16. The first five columns show the posterior mean, standard deviation, median, and 95% credible interval for both model parameters as well as the deviance. The last two columns give the BGR statistic (Rhat) and the effective sample size (n.eff). The values of Rhat indicate that the chains have mixed well, implying good convergence. A graphical summary of the model parameters can be seen in Figure 2.6.

Log-logistic model (k=1)
Parameter   Mean      SD      2.5%      Median    97.5%     Rhat    n.eff
mu          5.106     0.097   4.936     5.098     5.324     1.001   3000
sigma       0.409     0.063   0.305     0.402     0.548     1.001   3000
deviance    149.305   2.071   147.263   148.665   154.781   1.000   1

Weibull model (k=30)
Parameter   Mean      SD      2.5%      Median    97.5%     Rhat    n.eff
mu          5.240     0.101   5.071     5.230     5.467     1.001   3000
sigma       0.459     0.074   0.338     0.451     0.623     1.001   3000
deviance    149.576   2.106   147.531   148.953   155.114   1.000   1

Table 2.16: JAGS posterior summaries using the MCMC algorithm. Although the Weibull case is reported for k = 30 only, it has been observed that the results remain essentially the same for larger values of k.


Figure 2.6: Plots of the posterior densities of the parameters β0 and σ for the log-Burr model using the functions LaplaceApproximation, LaplacesDemon, and jags. It is evident from these plots that LaplaceApproximation performs excellently, as its densities closely resemble those of LaplacesDemon and jags; the agreement among the three is striking.

2.5 Discussion and Conclusion

Component reliability is the foundation of reliability assessment and refers to the reliability of a single component. In this chapter, the Bayesian approach is applied to model different types of single-component reliability data using appropriate parametric distributions. Two important techniques, that is, Laplace approxima-

tion and simulation methods are implemented using the R, LaplacesDemon, and JAGS software packages. For the modeling of reliability data, complete R and JAGS code is written and provided with a detailed description. It has been seen that the results obtained from these analytic and simulation methods using different software packages are very close to each other. The accuracy of the Laplace approximation and of the simulation methods is clearly visible in the plots of the posterior densities. Furthermore, in practical data analysis, the intercept-only model is merely a starting point. More meaningful models are single or multiple regression models and hierarchical models, which are discussed in the next two chapters, respectively.

3

Bayesian Reliability Analysis of Regression Models

Contents

3.1 Introduction...... 139

3.2 Generalised Linear Regression Models...... 141

3.3 The Link Function...... 142

3.4 Regression Analysis of Discrete Models...... 144

3.4.1 Logistic Regression Model for Binomial Data...... 144

3.4.2 Implementation...... 146

3.4.3 Poisson Regression Model for Count Data...... 159

3.4.4 Implementation...... 160

3.5 Regression Analysis of Continuous Models...... 173

3.5.1 Log-logistic regression Model for Lifetime Data...... 175

3.5.2 Implementation...... 176

3.5.3 Generalized Log-Burr Model for Lifetime Data...... 187

3.5.4 Implementation...... 188

3.6 Discussion and Conclusion...... 197

3.1 Introduction

So far, different parametric models have been considered for component reliability data, possibly censored, in the context of a single experiment on an item or a minor component within a larger system. In most reliability applications, however, the distribution of reliability data depends on other concomitant variables, and including these variables in the model helps in handling heterogeneity in a population. It is very common for reliability data to involve concomitant variables related to lifetime; for example, the lifetime of electrical insulation may depend on the voltage the insulation is subjected to while in use. In this context, the voltage is a concomitant variable, also called a covariate. A covariate is also referred to as an independent variable, explanatory variable, predictor, or regressor. In most studies, there are covariates such as voltage, temperature, or pressure whose relationship with lifetime is of interest. This leads to the consideration of regression models. In general, a regression model is a model which involves covariates. This chapter considers Bayesian regression modeling of reliability data in situations in which the distribution of the reliability data depends on covariates. A regression model with covariates sometimes explains or predicts why some components fail quickly while others survive a long time. If important concomitant or explanatory variables are ignored in an analysis, the resulting estimates of quantities of interest, such as distribution quantiles or survival probabilities, could be seriously biased (Meeker and Escobar, 1998). In reliability studies, possible covariates include

• Continuous covariate: stress, temperature, voltage, or pressure.

• Discrete covariate: number of hardening treatments or number of simultaneous users of a system.

• Categorical covariate: manufacturer, design, or location.

Thus, regression models with lifetime as the response variable and the concomitant variables as covariates allow such additional factors to be incorporated in a Bayesian statistical analysis.

For the purpose of Bayesian analysis of parametric regression models, any of the parametric models discussed in this or the previous chapter can be made into a regression model by expressing a model parameter (possibly transformed) as a function of the covariates. For example, in the well-known multiple regression model, the response variable y has a normal distribution

y ∼ N(µ(x), σ²),   (3.1)

with mean µ related to k covariates x1, x2, . . . , xk through

µ = β0 + β1 x1 + β2 x2 + · · · + βk xk = x^T β,   (3.2)

where the normal variance σ² and the regression coefficients β = (β1, β2, . . . , βk)^T are the regression model parameters to be estimated.

In Bayesian regression analysis, it is generally advisable to centre each covariate (Lunn et al., 2013), that is, to subtract the empirical mean from each value, so that, for example, µi = β0 + ∑_{j=1}^{k} βj (xij − x̄j). This has the effect of reducing the posterior correlation between each coefficient (β1, β2, . . . , βk) and the intercept term, because the intercept is essentially related to the centre of the data.
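A one-line sketch of such centering in R, using a purely hypothetical covariate x, is:

# Hypothetical covariate; centering reduces the posterior correlation
# between the slope coefficients and the intercept.
x <- c(120, 340, 560, 610, 905)
x.centered <- x - mean(x)
x.centered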

In ordinary regression modeling, the errors are usually assumed to be normally distributed; in that case, the least squares estimators for linear regression are equivalent to the maximum likelihood estimators. In the context of reliability, the assumption of normality is much less compelling, because lifetimes and strengths are inherently positive quantities. Moreover, the frequency distribution of reliability data is often skewed; hence, the usual regression modeling techniques developed for the

normal distribution do not work here. In addition, censoring is a common feature of reliability data, but it is rare in ordinary regression. However, we are not restricted to normality, because the models that commonly arise in reliability applications also include the binomial, Poisson, log-normal, log-logistic, and Weibull distributions. These distributions often depend on covariates, and software packages such as LaplacesDemon and JAGS make it easy to implement any appropriate distribution. If we suspect outlying observations, for example, we can provide some robustness against their effects by choosing a heavy-tailed t or logistic distribution. In this way an outlier can be accommodated within the tails without necessarily forcing the location of the posterior to move significantly.

3.2 Generalised Linear Regression Models

In most situations, reliability models are not restricted to normality. For example, success/failure data and failure count data follow binomial and Poisson distributions, respectively, and failure time data can often be well approximated by a gamma distribution. These situations lead to generalised linear regression models, or simply Generalised Linear Models (GLMs), introduced by Nelder and Wedderburn (1972). GLMs constitute a wide variety of models encompassing stochastic representations used for the analysis of both discrete and continuous reliability response variables. Generalised linear regression models can be regarded as the natural extension of normal linear regression models and are based on the exponential family of distributions, which includes the most common distributions such as the normal, Poisson, and binomial. GLMs became very popular because of their generality and wide range of applications, and can be considered one of the most prominent and important components of modern statistical theory (Ntzoufras, 2009). They have provided not only a family of models that is widely used in practice but also a unified, general way of thinking about the formulation of statistical models.

The two main ideas of the generalised linear model are, first, that a transformation of the expectation of the response E(y), rather than the expected (mean) response itself, is expressed as a linear combination of covariate effects; and second, that for the random part of the model a distribution other than the normal can be chosen, for example, the binomial or Poisson. Formally, a generalised linear model is described in the following three stages (Gelman et al., 2004):

1. The linear predictor, η = Xβ, which is a linear combination of covariate effects that are thought to make up g(E(y)). This is the systematic or deterministic part of the system description.

2. The link function g( · ) that relates the linear predictor to the mean of the outcome variable: µ = g⁻¹(η) = g⁻¹(Xβ).

3. The random component specifying the distribution of the response variable y with mean E(y|X) = µ. In general, the distribution of y given x can also depend on a dispersion parameter φ. This is the stochastic part of the system description.

Thus, the mean of the distribution of y, given X, is determined by Xβ through E(y|X) = µ = g⁻¹(Xβ). We use the same notation as in linear regression whenever possible, so that X is the n × k matrix of explanatory variables and η = Xβ is the vector of n linear predictor values.

3.3 The Link Function

The link function is a monotonic and differentiable function used to relate a parameter of the response distribution to the deterministic component, namely the linear predictor and the associated covariates (Ntzoufras, 2009). Usually there is no restriction on which parameter is modeled in this way, but often we focus on the mean

of the distribution, because measures of central location are usually of main interest. GLM-based extensions in which dispersion or shape parameters are linked with covariates also exist in the statistical literature (e.g., Rigby and Stasinopoulos, 2005). A desirable property of the link function is that it maps the range of values in which the parameter of interest lies onto the set of real numbers ℝ in which the linear predictor takes values. For example, in the binomial case, we wish to identify link functions that map the success probability p from [0, 1] onto ℝ.

The sampling distributions that commonly arise in reliability applications are the log-normal, Poisson, binomial, log-logistic, and Weibull, and we use the link function to transform the parameters of such a distribution onto a scale where a linear regression model can be used appropriately. The simplest link function, called the identity link, sets the linear predictor equal to the mean µ. This is indeed the usual link function for normal models. However, this function is not appropriate for other distributions such as the Poisson, since their mean is positive while η ∈ ℝ. In cases where the parameter of interest is positive, a log link function is usually adopted, which implies a multiplicative relation between the parameter of interest and the covariates. For example, for failure count data following a Poisson[λ(x)] distribution, the log link function for the mean count is defined as

log(λ) = log[E(y|X)] = Xβ. (3.3)

For a binomial model with success probability p, the logit link function is used, defined as

logit(p) = log[p/(1 − p)] = Xβ.   (3.4)

Alternatives to the logit link are the probit, Φ⁻¹(p), and the complementary log-log, log[− log(1 − p)]. For instance, the normal distribution combined with an identity link yields the general linear model, the Poisson with a log link yields a log-linear model, and the binomial with a logit link yields a logistic regression model. The link functions for common distributions are summarized in Table 3.1.

Distribution         Link Name             Link Function g(µ)
Normal               Identity              µ
Poisson              Logarithmic           log(µ)
Binomial             Logit                 log[µ/(1 − µ)]
Negative Binomial    Complementary log     log[µ/(k + µ)]
Gamma                Reciprocal            1/µ
Inverse Gaussian     Squared Reciprocal    1/µ²

Table 3.1: Link functions of the most common members of the exponential family of distributions.
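As a quick numerical illustration of how the binomial links in Table 3.1 (and the probit and complementary log-log mentioned above) map a probability onto the real line and back, the following sketch uses only base R functions; the probability value is arbitrary.

p <- 0.2                              # an arbitrary success probability
eta.logit   <- log(p/(1 - p))         # logit link
eta.probit  <- qnorm(p)               # probit link
eta.cloglog <- log(-log(1 - p))       # complementary log-log link
# The inverse transformations recover p from the linear predictor.
c(plogis(eta.logit), pnorm(eta.probit), 1 - exp(-exp(eta.cloglog)))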

3.4 Regression Analysis of Discrete Models

In this section, two important discrete regression models, namely the logistic and Poisson regression models, are considered as reliability models.

3.4.1 Logistic Regression Model for Binomial Data

Binomial data are frequently encountered in modern science, especially in reliability applications, where the observed response is usually either binary, indicating whether or not a component fails during an experiment, or binomial, such as the number of failures over a specified time. In this case, the most popular model is the logistic regression model, in which a logit link function is usually adopted. Bayesian analysis of the logistic regression model for reliability data is discussed here; the complementary log-log model is discussed in Chapter 4.

Consider a response variable yi, for i = 1, 2, . . . , n, that is binomially distributed (ni = 1 for binary data where yi = 1 or 0) with success probability θi, which is denoted by yi ∼ Binomial(ni, θi). Then the logistic regression model relates θi to the covariates through the link function as

logit(θi) = log[θi/(1 − θi)] = xi^T β (= ηi).   (3.5)

A desirable feature of the logit transformation of the distribution parameter θ is that it is defined on (−∞, ∞), so that there is no restriction on β, which provides flexibility in specifying prior distributions for β. The inverse transformation of Equation (3.5) gives the expression

θi = exp(xi^T β) / [1 + exp(xi^T β)] = e^{ηi} / (1 + e^{ηi}),   (3.6)

called the inverse logit function, which has the form of the logistic cumulative distribution function and is therefore symmetric about zero. The likelihood contribution for

binomial response yi of the logistic regression model is given by

p(y|X, β) = ∏_{i=1}^{n} θi^{yi} (1 − θi)^{ni − yi}
          = ∏_{i=1}^{n} [ exp(β0 + ∑_{j=1}^{k} βj xij) / (1 + exp(β0 + ∑_{j=1}^{k} βj xij)) ]^{yi} [ 1 / (1 + exp(β0 + ∑_{j=1}^{k} βj xij)) ]^{ni − yi}
          = exp[ n ȳ β0 + ∑_{i=1}^{n} (∑_{j=1}^{k} βj xij) yi − ∑_{i=1}^{n} ni log(1 + exp(β0 + ∑_{j=1}^{k} βj xij)) ].

For j = 0, 1 (i.e., β0 and β1 only) and ni = 1 (binary data), the likelihood becomes

" n n # X X p(y|β0, β1) = exp nyβ¯ 0 + β1 xiyi − log(1 + exp(β0 + β1xi)) , (3.7) i=1 i=1 where xi is the vector of covariate values associated with (yi, ni).

Regarding the choice of prior distributions for the regression coefficients β, if little information is available about each of the coefficients, one good choice of prior is

βj ∼ N(0, 10^k),   (3.8)

which is a suitably weak prior for a sufficiently large value of k, commonly 4 to 6. If more information is available, then a normal distribution with mean possibly different from zero and a smaller variance is a better choice.
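To see how weak such a prior is, one can simply draw from it; a small sketch with k = 6, an assumed (but typical) choice:

# Draws from the prior of Eq. (3.8) with k = 6; the variance 10^6 corresponds to a
# standard deviation of 1000, so virtually any plausible coefficient value receives
# appreciable prior mass.
set.seed(1)
rnorm(5, mean = 0, sd = sqrt(10^6))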

When applying Bayes' theorem, the prior is multiplied by the likelihood function and thus the posterior density is

p(β0, β1|y, X) ∝ p(y|β0, β1, X) p(β0) p(β1),

where X is the model matrix containing a column of 1's and a column for the covariate x.

Consequently, marginal posterior densities for β0 and β1 can be expressed as

p(β0|y, X) = ∫ p(β0, β1|y, X) dβ1

and

p(β1|y, X) = ∫ p(β0, β1|y, X) dβ0,

which are not available in closed form. So, it is very difficult to compute or plot these marginal densities, and at this stage one is forced to use analytic approximation and simulation tools to implement the Bayesian analysis.

3.4.2 Implementation

To implement the above logistic regression model, choosing a normal distribution with large variance as a weakly informative prior for the regression coefficients, let us consider a data set taken from Grant et al. (1999). The same data are also discussed in Hamada et al. (2008). All the concepts and computations will be discussed around these data, which we model here using the logistic regression model. For the computation of the marginal posterior density of each β, which is in a complex integral form, the Laplace approximation technique is used to approximate the integral. Parallel simulation tools are also implemented to draw samples from the marginal posterior densities, namely the sampling importance resampling method (Gordon et al., 1993) and MCMC algorithms. These techniques are implemented using LaplacesDemon and

JAGS. The MCMC algorithms used to approximate the integrals are the IM (Independent Metropolis) algorithm and the MWG (Metropolis-within-Gibbs) algorithm, implemented with the LaplacesDemon function of LaplacesDemon and the jags function of R2jags, respectively. An important point about the IM algorithm is that the model is first updated with the Laplace approximation, and the resulting posterior mode and covariance matrix are then supplied to the IM algorithm. The Laplace approximation is a well-known technique (Tierney and Kadane, 1986) which accurately approximates such integrals.

3.4.2.1 High-Pressure Coolant Injection System Demand Data

The reliability of U.S. commercial nuclear power plants is an extremely important consideration in managing public health risk. The high-pressure coolant injection (HPCI) system is a frontline safety system in a boiling water reactor (BWR) that injects water into a pressurized reactor core when a small break loss-of-coolant accident occurs. Grant et al. (1999) list 63 unplanned demands to start for the HPCI system at 23 U.S. commercial BWRs during 1987–1993. Table 3.2 presents these demands, in which all failures are counted together, including failure to start, failure to run, failure of the injection valve to reopen after operating successfully earlier in the mission, and unavailability because of maintenance (Hamada et al., 2008). In the data table, asterisks (∗) identify the 12 demands on which the HPCI system failed.

3.4.2.2 Analysis Using LaplacesDemon

Fitting with LaplaceApproximation

The first thing is to provide the data in the form that LaplacesDemon needs. For this, the data set given in Table 3.2 can be created in R format as follows.

Data Creation

In the HPCI system demand data set, the binary response variable is failure, denoted by y = 1 (0), where 1 stands for failure and 0 for success, and the explanatory variable

01/05/87*   08/03/87*   03/05/89    08/16/90*   08/25/91
01/07/87    08/16/87    03/25/89    08/19/90    09/11/91
01/26/87    08/29/87    08/26/89    09/02/90    12/17/91
02/18/87    01/10/88    09/03/89    09/27/90    02/02/92
02/24/87    04/30/88    11/05/89*   10/12/90    06/25/92
03/11/87*   05/27/88    11/25/89    10/17/90    08/27/92
04/03/87    08/05/88    12/20/89    11/26/90    09/30/92
04/16/87    08/25/88    01/12/90*   01/18/91*   10/15/92
04/22/87    08/26/88    01/28/90    01/25/91    11/18/92
07/23/87    09/04/88*   03/19/90*   02/27/91    04/20/93
07/26/87    11/01/88    03/19/90    04/23/91    07/30/93
07/30/87    11/16/88*   06/20/90    07/18/91*
08/03/87*   12/17/88    07/27/90    07/31/91

Table 3.2: Dates of unplanned HPCI system demands and failures during 1987–1993.

or covariate is time, which denotes the number of elapsed days from a chosen reference date, 01/01/87, for the 63 demands. To calculate the number of elapsed days for each demand from the reference date, the R function difftime is used. This function takes two date-time objects as its arguments, calculates the time difference between them, and returns an object of class difftime with an attribute indicating the units. The function as.numeric is then used to convert the obtained time difference (that is, the number of elapsed days) into numeric form, since only the number of days is needed in the analysis; a small illustration is given below. Since an intercept term will be included, a vector of 1's is inserted into the model matrix X. Thus, J = 2 indicates that there are two columns in the design matrix: the first for the intercept term and the second for the regressor.
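For instance, applied to two of the demand dates and the reference date:

# Elapsed days from the reference date 1987-01-01 for two of the demand dates.
d <- difftime(c("1987-01-05", "1987-08-03"), "1987-01-01", units = "days")
as.numeric(d)    # numeric vector of elapsed days, ready for use as a covariate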

N <- 63; J <- 2
y <- c(1,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,1,0,
       0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,
       0,0,0,0,0,0,0,0,0,0,0)
time <- difftime(c("1987-01-05","1987-01-07","1987-01-26",
  "1987-02-18","1987-02-24","1987-03-11","1987-04-03",
  "1987-04-16","1987-04-22","1987-07-23","1987-07-26",
  "1987-07-30","1987-08-03","1987-08-03","1987-08-16",
  "1987-08-29","1988-01-10","1988-04-30","1988-05-27",
  "1988-08-05","1988-08-25","1988-08-26","1988-09-04",
  "1988-11-01","1988-11-16","1988-12-17","1989-03-05",
  "1989-03-25","1989-08-26","1989-09-03","1989-11-05",
  "1989-11-25","1989-12-20","1990-01-12","1990-01-28",
  "1990-03-19","1990-03-19","1990-06-20","1990-07-27",
  "1990-08-16","1990-08-19","1990-09-02","1990-09-27",
  "1990-10-12","1990-10-17","1990-11-26","1991-01-18",
  "1991-01-25","1991-02-27","1991-04-23","1991-07-18",
  "1991-07-31","1991-08-25","1991-09-11","1991-12-17",
  "1992-02-02","1992-06-25","1992-08-27","1992-09-30",
  "1992-10-15","1992-11-18","1993-04-20","1993-07-30"),
  "1987-01-01")
time <- as.numeric(time)
time <- time - mean(time)
X <- cbind(1, as.matrix(time))
mon.names <- "LP"
parm.names <- as.parm.names(list(beta=rep(0,J)))
PGF <- function(Data) return(rnormv(Data$J, 0, 1000))
MyData <- list(N=N, J=J, PGF=PGF, X=X, mon.names=mon.names,
               parm.names=parm.names, y=y)

In this case, there are two parameters, beta[1] and beta[2], which are specified in the vector parm.names. The log-posterior LP is included as a monitored variable in the vector mon.names. The total number of observations is specified by N, i.e., 63. Finally, all these objects are combined into MyData, which holds the data in a list.

Initial Values

An initial value is the starting point of the iterations for the optimization of a parameter. LaplaceApproximation requires a vector of initial values, one for each parameter, to start the iterations. So, both β parameters have been set equal to zero in the object Initial.Values as

Initial.Values <- rep(0,J)

Model Specification

For modeling these HPCI data, where the response variable is binary since each demand results in either an HPCI failure or a success, we use the binomial distribution with n = 1. The Bernoulli distribution can also be used instead of the binomial without any problem. The explanatory variable or regressor is the number of elapsed days, denoted by time. Thus, the specified model is a logistic regression, and can be described as

yi ∼ Binomial(θi, 1), i = 1, 2,..., 63

where yi = 1 (0) denotes an HPCI failure (success). The logit link function (other link functions like probit, complementary log-log, are also possible) is used to relate the model parameter θ and regressor time, that is,

logit(θi) = log[θi/(1 − θi)] = β0 + β1 timei,

where time is centered, that is, time = time − mean(time). The linear predictor is

made up of an intercept β0 and a regressor time with regression coefficient β1.

The independent and weakly informative normal prior with mean 0 and standard deviation 1000 is assumed for each β parameter

βj ∼ N(0, 1000), j = 1, . . . , J.

The large variance, or small precision, indicates a lot of uncertainty about each β, and hence gives a weakly informative prior. Finally, the specification of the above-defined logistic regression model for fitting with LaplaceApproximation is as follows:

Model <- function(parm, Data){
  ### Parameters
  beta <- parm[1:Data$J]
  ### Log(Prior Densities)
  beta.prior <- sum(dnorm(beta, 0, 1000, log=TRUE))
  ### Log-Likelihood
  mu <- tcrossprod(Data$X, t(beta))
  theta <- invlogit(mu)
  LL <- sum(dbinom(Data$y, 1, theta, log=TRUE))
  ### Log-Posterior
  LP <- LL + beta.prior
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
                   yhat=rbinom(length(theta), 1, theta), parm=parm)
  return(Modelout)
}

The function Model takes two arguments, parm and Data, where parm is the set of parameters and Data is the list of data. The log-prior of the regression parameters is stored in beta.prior. The object LL stands for the log-likelihood and LP for the log-posterior. The function Model returns the object Modelout, a list of five objects: the log-posterior LP, the deviance Dev, the monitored quantities Monitor, the predicted values yhat, and the parameter estimates parm.

Model Fitting

For fitting the logistic regression model with weakly informative priors for the regression parameters, the function LaplaceApproximation of LaplacesDemon

is called, and its results are assigned to the object Fit. Summaries of the results are printed using the function print, and the relevant parts are reported in the next section.

Fit <- LaplaceApproximation(Model=Model, parm=Initial.Values,
         Data=MyData, Method="NM", Iterations=10000, sir=TRUE)
print(Fit)

Summarizing Output

The summary information from LaplaceApproximation, which approximates the posterior density of the fitted model, is given in the following two tables. Table 3.3 presents the summary matrix of the analytic results using the Laplace approximation method. Rows are parameters and columns include Mode, SD (standard deviation), LB (lower bound), and UB (upper bound); the bounds constitute a 95% credible interval. Table 3.4 presents the simulated results from the sampling importance resampling algorithm, conducted via the SIR function to draw independent posterior samples, which is possible only when LaplaceApproximation has converged.

Parameter   Mode      SD       LB        UB
beta[1]     -1.4902   0.3332   -2.1565   -0.8238
beta[2]     -0.0006   0.0005   -0.0015   0.0004

Table 3.3: Summary of the analytic approximation using the function LaplaceApproximation.

Parameter   Mean       SD       MCSE     ESS    LB         Median     UB
beta[1]     -1.5404    0.3397   0.0107   1000   -2.1981    -1.5248    -0.9335
beta[2]     -0.0006    0.0005   0.0000   1000   -0.0016    -0.0006    0.0004
Deviance    62.1583    2.2389   0.0708   1000   60.0814    61.4733    69.2003
LP          -46.7325   1.1195   0.0354   1000   -50.2535   -46.3900   -45.6941

Table 3.4: Summary of the simulated results from the sampling importance resampling method using the same function.

From the summary output of the analytic Laplace approximation method via the LaplaceApproximation function reported in Table 3.3, it may be noted that the posterior mode of the parameter β0 is −1.4902 ± 0.3332 with 95% credible interval (−2.1565, −0.8238), which is statistically significant, whereas the posterior mode of β1 is −0.0006 ± 0.0005 with 95% credible interval (−0.0015, 0.0004), which is not statistically significant. The simulated results from the sampling importance resampling algorithm using the same function, reported in Table 3.4, show that the posterior mean of β0 is −1.5404 ± 0.3397 with 95% credible interval (−2.1981, −0.9335), whereas the posterior mean of β1 is −0.0006 ± 0.0005 with 95% credible interval (−0.0016, 0.0004). The deviance 62.16 is a measure of goodness of fit.
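To interpret the intercept on the probability scale, the inverse logit of Equation (3.6) can be applied to the posterior mean; a minimal sketch using the posterior mean of β0 from Table 3.4 (since time is centred, this is roughly the failure probability for a demand near the middle of the observation period):

beta0 <- -1.5404                   # posterior mean of the intercept from Table 3.4
theta.mid <- 1/(1 + exp(-beta0))   # inverse logit, Eq. (3.6)
theta.mid                          # approximate HPCI failure probability per demand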

Fitting with LaplacesDemon

In this section, an MCMC technique is used for the Bayesian analysis of the logistic model, and the model is fitted with the LaplacesDemon function of LaplacesDemon, which simulates from the logarithm of the unnormalized joint posterior density using one of the MCMC algorithms and provides samples of the marginal posterior distributions, the deviance, and other monitored variables.

Initial Values

Before fitting the model with LaplacesDemon, it is necessary to specify initial values for each parameter as a starting point for an adaptive chain or a non-adaptive Markov chain. If the initial values for all parameters are set to zero, then LaplacesDemon will attempt to optimize the initial values with LaplaceApproximation using a resilient backpropagation algorithm. Hence, it is better to use the previously fitted object Fit with the function as.initial.values to obtain a vector of initial values from LaplaceApproximation for fitting with LaplacesDemon. Thus, to get a vector of initial values the R command is

Initial.Values <- as.initial.values(Fit)

Model Fitting

LaplacesDemon relies on stochastic, or pseudo-random, number generation, so it is better to set a seed with the set.seed function before fitting the model, so that results can be reproduced. Finally, we call the LaplacesDemon function to update the model defined by the pre-specified Model function, given the data set MyData, with the following settings. The fit uses the IM (Independent Metropolis) algorithm for updating. Simulation results are assigned to the object FitDemon, and the relevant parts are summarized in the next section.

set.seed(666)
FitDemon <- LaplacesDemon(Model=Model, Data=MyData, Initial.Values,
              Covar=Fit$Covar, Iterations=50000, Status=1000,
              Algorithm="IM",
              Specs=list(mu=Fit$Summary1[1:length(Initial.Values),1]))
print(FitDemon)

Summarizing Output

Table 3.5 shows the simulated results in matrix form, summarizing the marginal posterior densities of the parameters from the stationary samples; the columns are the Mean, SD (standard deviation), MCSE (Monte Carlo standard error), ESS (effective sample size), and the 2.5%, 50%, and 97.5% quantiles.

Parameter   Mean       SD       MCSE     ESS    LB         Median     UB
beta[1]     -1.4956    0.1979   0.0028   4666   -1.8852    -1.4932    -1.1164
beta[2]     -0.0006    0.0003   0.0000   5000   -0.0011    -0.0006    0.0000
Deviance    60.7504    0.6960   0.0103   4361   60.0622    60.5393    62.6143
LP          -46.0286   0.3480   0.0052   4361   -46.9605   -45.9230   -45.6845

Table 3.5: Summary of the MCMC results from the Independent Metropolis algorithm using the LaplacesDemon function.

In the above output, Laplace's Demon is appeased, because all five conditions that LaplacesDemon checks are satisfied. The criteria it measures against are the following. The final algorithm must be non-adaptive, so that the Markov property holds. The acceptance rate of most algorithms is considered satisfactory if it is within the interval [15%, 50%]. The MCSE is considered satisfactory for each target distribution if it is less than 6.27% of the standard deviation of the target distribution; this allows the true mean to be within 5% of the area under a Gaussian distribution around the estimated mean. The ESS is considered satisfactory for each target distribution if it is at least 100, which is usually enough to describe 95% probability intervals. Finally, each variable must be estimated to be stationary.

From Table 3.5, it is seen that all criteria have been met: the MCSEs are sufficiently small, the ESSs are sufficiently large, all parameters were estimated to be stationary, the algorithm is the non-adaptive Independent Metropolis so the Markov property holds, and the acceptance rate of 0.49 (i.e., 49%) lies within the interval [15%, 50%].
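The MCSE rule of thumb quoted above is easy to verify by hand; a small sketch using the beta[1] row of Table 3.5:

mcse    <- 0.0028           # Monte Carlo standard error of beta[1] (Table 3.5)
sd.post <- 0.1979           # posterior standard deviation of beta[1] (Table 3.5)
mcse < 0.0627 * sd.post     # TRUE indicates the 6.27% criterion is satisfied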

3.4.2.3 Analysis Using JAGS

In this section, the software JAGS is called from within R via R2jags to conduct the same Bayesian analysis of the logistic regression model for the HPCI data. After fitting the model with jags, a comparison will be made with the results obtained from LaplaceApproximation and LaplacesDemon, illustrated in the previous sections.

Data Creation

The first step is to provide the data to JAGS in a list statement. In this case, we provide the vector of HPCI responses on demands over time. Thus, the specification of this data set, containing the name of each vector as jags needs, is as follows.

n <- 63; y <- y
time <- as.vector(time)
time <- time - mean(time)
data.jags <- list(n=n, y=y, time=time)

Here, n is the total number of observations, y is the observed response for each demand, and time is the number of elapsed days for the 63 demands, centered to improve convergence.

Initial Values

Initial values are used to start the MCMC sampler and may be provided for all stochastic nodes except the response data variable. Hence, the initial values of the parameters used to start the chains are specified by writing a function as follows.

inits <- function(){list(beta.1=0, beta.2=0)}

Model Specification

For modeling the HPCI data, the binomial distribution is adopted to model the binary response variable. The model is defined as

yi ∼ Binomial(θi, 1), i = 1, 2, . . . , 63,

with logit link function

logit(θi) = log[θi/(1 − θi)] = β0 + β1 timei,

where θi is the success probability. Weakly informative normal priors with mean 0 and precision 0.001 are defined for each β parameter as

βj ∼ N(0, τ = 0.001), j = 1, 2,

where τ denotes the precision, following the JAGS parameterization of the normal distribution.

Here is the JAGS code specifying the above logistic regression model, using the cat function to write it to a file named modelHPCI.txt.

cat("model{ for(i in 1:n){ y[i] ~ dbin(theta[i],1) logit(theta[i])<- beta.1+beta.2*time[i] } # Priors beta.1~dnorm(0,0.001) beta.2~dnorm(0,0.001) }", file="modelHPCI.txt")

Model Fitting

Finally, the Bayesian model with weakly informative priors is fitted using the jags function, and its results are assigned to the object Fit.Jags. A summary of the results is reported in the next section.

set.seed(123)
Fit.Jags <- jags(data.jags, inits, parameters=c("beta.1","beta.2"),
                 n.iter=20000, model.file="modelHPCI.txt")
print(Fit.Jags)

Summarizing Output

The following table shows the summary of the posterior densities after fitting the logistic regression model to the HPCI data. From this JAGS output, it is noticed that the posterior means of both beta parameters are very close to the values obtained from the SIR and IM algorithms; β0 is statistically significant whereas β1 is not. The deviance of 62.09 is almost equal to the deviance of 62.16 from SIR, and differs slightly from the deviance of 60.75 from IM. The values of Rhat are less than 1.1 for each parameter, which indicates that the chains have mixed well, implying good convergence. The values of n.eff are also satisfactory.

Parameter   Mean      SD       2.5%      50%       97.5%     Rhat   n.eff
beta.1      -1.5470   0.3490   -2.2894   -1.5330   -0.9223   1.00   3000
beta.2      -0.0006   0.0005   -0.0016   -0.0006   0.0004    1.00   3000
deviance    62.0873   2.0489   60.0950   61.4526   67.6542   1.00   2700

Table 3.6: Posterior summary of JAGS simulations after being fitted to the logistic regression model for HPCI data.

[Figure 3.1 appears here: "Logistic Regression Model", two density panels, one for β1 and one for β2, comparing LaplaceApproximation, LaplacesDemon, and jags.]

Figure 3.1: Plots of the posterior densities of the parameters β1 and β2 from the Bayesian analysis of the logistic regression model using the functions LaplaceApproximation, LaplacesDemon, and jags. It is evident from these plots that the posterior densities obtained from the three different methods are very close to each other.

3.4.3 Poisson Regression Model for Count Data

In this section, the focus is on the Poisson regression model for a response variable that consists of counts. Such variables usually express the number of failures (or successes) within a given period of time, and the data obtained from such a process are called failure count data or simply count data. Failure count data are usually modeled using a Poisson model. The Poisson generalized linear model, often called the Poisson regression model, assumes that the response variable y is Poisson with mean θ and therefore variance θ. Because of the typically chosen log link function, the Poisson regression model is also known as the Poisson log-linear model. Thus, the Poisson regression model is summarized by the following expressions,

yi ∼ Poisson(θi),

with log link function

log(θi) = β0 + ∑_{j=1}^{J} βj xij = (Xβ)i = ηi,

where ηi = (Xβ)i is the linear predictor for the ith case. The likelihood contribution

for data response y = (y1, . . . , yn) is thus

p(y|θ) = ∏_{i=1}^{n} (1/yi!) e^{−exp(ηi)} [exp(ηi)]^{yi}.

When considering the Bayesian posterior distribution, we condition on y, and so

the factors of 1/yi! can be absorbed into an arbitrary constant (Gelman et al., 2004). For example, let us consider the case where only one covariate is involved. Then the mean of the Poisson response y can be expressed using the log link function as

log(θi) = β0 + β1 xi.

The likelihood contribution for this model will be of the form

p(y|β0, β1) = ∏_{i=1}^{n} (1/yi!) e^{−exp(β0 + β1 xi)} [exp(β0 + β1 xi)]^{yi}
            = exp[ ∑_{i=1}^{n} ( yi(β0 + β1 xi) − exp(β0 + β1 xi) − log yi! ) ]
            = exp[ ∑_{i=1}^{n} ( yi(β0 + β1 xi) − exp(β0 + β1 xi) ) ] / ∏_{i=1}^{n} yi!
            ∝ exp[ ∑_{i=1}^{n} ( yi(β0 + β1 xi) − exp(β0 + β1 xi) ) ].
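The kernel in the last line above is straightforward to code; the following is a minimal R sketch of the Poisson regression log-likelihood (up to the additive constant −∑ log yi!), with hypothetical argument names:

loglik.pois <- function(beta, y, x){
  eta <- beta[1] + beta[2]*x    # linear predictor, log(theta)
  sum(y*eta - exp(eta))         # log-likelihood kernel of the display above
}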

Using the normal prior distribution of the type

βj ∼ N(0, 10^k)   (3.9)

(usually k = 4 to 6) for parameter β involved in the linear predictor, the marginal

posterior densities for β0 and β1 are not in closed form, that is,

p(β0|y, X) = ∫ p(β0, β1|y, X) dβ1

and

p(β1|y, X) = ∫ p(β0, β1|y, X) dβ0,

and their corresponding summaries cannot be evaluated analytically. Thus, to obtain the posterior densities of this Poisson regression model, analytic approximation or simulation tools are needed. For this, an implementation is presented in the next section using both analytic approximation and simulation methods.

3.4.4 Implementation

For the implementation of the previously discussed Poisson regression model, a data set, called the System's Component Reliability (SCR) data, is taken from SAS Institute Inc. (2008). In this data set, the numbers of maintenance repairs on a complex system

are modeled as realizations of a Poisson response variable. A classical regression analysis of these data is discussed in the SAS/STAT 9.2 User's Guide (SAS Institute Inc., 2008). In this section, we model these data in the Bayesian paradigm, and both analytic and simulation methods are used to draw inferences from the posterior distribution. These methods are implemented using the LaplacesDemon and JAGS software packages. For the analytic approximation, LaplacesDemon approximates the posterior densities via the Laplace approximation method using the LaplaceApproximation function, whereas simulations are made with the Adaptive Metropolis-within-Gibbs (AMWG) and Metropolis-within-Gibbs (MWG) algorithms using LaplacesDemon and JAGS, respectively.

3.4.4.1 Systems’s Components Reliability Data

In this data set, the system under investigation has a large number of components, which occasionally break down and are replaced or repaired. During a four-year period, the system was observed to be in a state of steady operation, meaning that the rate of operation remained approximately constant. A monthly maintenance record is available for that period, which tracks the number of components removed for maintenance each month. The data are listed in Table 3.7, where the removals are the response variable.

3.4.4.2 Analysis Using LaplacesDemon

Before fitting the model with LaplacesDemon for the Bayesian analysis of the Poisson regression model for the SCR data, a preliminary analysis is performed using the function bayesglm of the arm package (Gelman and Su, 2014) for R. The function bayesglm is a good way to get quick posterior estimates using a Student-t prior distribution for the coefficients.

As the starting point for fitting the Poisson regression model to the SCR data, we begin with the creation of the data as required for bayesglm. This can be done in R as

Year  Month  Removals   Year  Month  Removals   Year  Month  Removals
1987  1      2          1987  2      4          1987  3      3
1987  4      3          1987  5      3          1987  6      8
1987  7      2          1987  8      6          1987  9      3
1987  10     9          1987  11     4          1987  12     10
1988  1      4          1988  2      6          1988  3      4
1988  4      4          1988  5      3          1988  6      5
1988  7      3          1988  8      4          1988  9      5
1988  10     3          1988  11     6          1988  12     3
1989  1      2          1989  2      6          1989  3      1
1989  4      5          1989  5      5          1989  6      4
1989  7      2          1989  8      2          1989  9      2
1989  10     5          1989  11     1          1989  12     10
1990  1      3          1990  2      8          1990  3      12
1990  4      7          1990  5      3          1990  6      2
1990  7      4          1990  8      3          1990  9      0
1990  10     6          1990  11     6          1990  12     6

Table 3.7: Systems’s Components Reliability Data (SAS Institute Inc.(2008)).

removalsData <- read.table("removalsData.txt",header=TRUE)

For fitting the model, the function bayesglm is called and the model is defined in the following way; its results are assigned to the object Fitbayesglm.

Fitbayesglm <- bayesglm(removals ~ factor(year) + factor(month),
                        data=removalsData, family=poisson(link="log"))

A quick summary of the results of this fitted model is given in Table 3.8. These results are useful as a starting point before obtaining better posterior inference. However, to draw more precise inferences, we fit the model using LaplacesDemon and JAGS in this and the next section, respectively. Another reason for modeling the SCR data with bayesglm is to obtain the model matrix (also known as the design matrix or X matrix) before fitting the model with LaplacesDemon and JAGS. This can be done by using the very useful R function model.matrix.

                    coef.est   coef.se   z value   Pr(>|z|)
(Intercept)         1.1829     0.2781    4.2534    0.0000
factor(year)1988    -0.1285    0.1922    -0.6687   0.5037
factor(year)1989    -0.2331    0.1977    -1.1790   0.2384
factor(year)1990    0.0527     0.1836    0.2873    0.7739
factor(month)2      0.6716     0.3236    2.0754    0.0379
factor(month)3      0.4900     0.3354    1.4613    0.1439
factor(month)4      0.4391     0.3390    1.2955    0.1952
factor(month)5      0.1377     0.3634    0.3789    0.7048
factor(month)6      0.4391     0.3390    1.2955    0.1952
factor(month)7      -0.0977    0.3866    -0.2527   0.8005
factor(month)8      0.2055     0.3574    0.5749    0.5653
factor(month)9      -0.1899    0.3968    -0.4786   0.6322
factor(month)10     0.6291     0.3262    1.9288    0.0538
factor(month)11     0.3290     0.3473    0.9473    0.3435
factor(month)12     0.8607     0.3130    2.7499    0.0060

Table 3.8: Summary of result of Poisson regression model fitted with bayesglm function for system’s component breakdown data.

This function creates a model matrix, or design matrix, for a fitted regression model with the specified formula and data. In this case, let us look at the model matrix of the Poisson regression model fitted with bayesglm for the SCR data.

X <- model.matrix(Fitbayesglm)
  (Intercept) (year)1988 (year)1989 (month)2 (month)3 (month)4
1           1          0          0        0        0
2           1          0          0        0        0
3           1          0          0        0        0
4           1          0          0        0        0
5           1          1          0        0        0
6           1          1          0        0        0

To conserve space, only a few rows and columns of the model matrix are reported here. We make use of this X object for the analyses using LaplacesDemon and JAGS.
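As a small, self-contained illustration of how model.matrix expands factors into dummy columns, consider the following sketch; the toy data frame is purely hypothetical.

# Toy data: two factor covariates; model.matrix builds the intercept plus
# one dummy column for each non-baseline level of each factor.
d <- data.frame(year  = factor(c(1987, 1987, 1988, 1988)),
                month = factor(c(1, 2, 1, 2)))
model.matrix(~ year + month, data = d)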

Fitting with LaplaceApproximation

First, we begin with the creation of the data as the function LaplaceApproximation needs, followed by the setting of initial values to start the iterations for each parameter, the model specification, and the model fitting. Finally, the results are reported in the summarizing output section.

Data Creation

In the SCR data, the response variable, denoted by y, is removals, the maintenance record in the form of counts. The response depends on two covariates, year and month. The year identifies the year of the four-year period over which the system was observed, and month indexes the monthly maintenance record, which tracks the number of components removed for maintenance each month. The model matrix is represented by X. Thus, J = 15 indicates that there are fifteen columns in the model matrix: the first column is for the intercept term and the remaining columns are for the regressors.

N <- 48; J <- 15
X <- model.matrix(Fitbayesglm)
y <- removalsData$removals
mon.names <- "LP"
parm.names <- as.parm.names(list(beta=rep(0,J)))
PGF <- function(Data) return(rnormv(Data$J, 0, 1000))
MyData <- list(N=N, J=J, PGF=PGF, X=X, mon.names=mon.names,
               parm.names=parm.names, y=y)

In this case, there are fifteen betas, which are specified in the vector parm.names. The log-posterior LP is included as a monitored variable in the mon.names vector. The total number of observations is given by N. At the end, all these are combined in MyData, which holds the data in a list as LaplaceApproximation needs.

Initial Values

The next thing is to provide the initial values for each parameter to start the iterations. So, all fifteen βs have been set equal to zero in a vector called Initial.Values.

Initial.Values <- c(rep(0,J))

Here, the function rep replicates the value zero J times, that is, the number of βs, which is fifteen in this case.

Model Specification

Since the maintenance record in the system's components reliability data is in the form of counts, the number of removals yi (i = 1, 2, . . . , 48) is assumed to follow a Poisson distribution with mean, or component removal rate, θi,

yi ∼ Poisson(θi)

where the θi satisfy the log-linear regression model,

log(θi) = β + β′ yeari + β″ monthi.

Because the data were recorded at regular intervals from a system operating at a

constant rate (SAS Institute Inc., 2008), each θi is assumed to be a function of year and month only.

Further, to complete the Bayesian modeling, the independent and weakly informative normal prior distribution is assumed for each β parameter as

βj ∼ N(0, 10³), j = 1, 2, . . . , J.

Finally, this Poisson regression model is written in the R language as follows.

Model <- function(parm, Data){
  ### Parameters
  beta <- parm
  ### Log-Prior
  beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
  ### Log-Likelihood
  theta <- exp(tcrossprod(Data$X, t(beta)))
  LL <- sum(dpois(Data$y, theta, log=TRUE))
  ### Log-Posterior
  LP <- LL + beta.prior
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
                   yhat=rpois(length(theta), theta), parm=parm)
  return(Modelout)
}

As before, the function Model takes two arguments, parm and Data, where parm is the set of parameters and Data is the list of data. The log-prior of the regression parameters is stored in beta.prior. The object LL stands for the log-likelihood and LP for the log-posterior. The function Model returns the object Modelout, a list of five objects: the log-posterior LP, the deviance Dev, the monitored quantities Monitor, the predicted values yhat, and the parameter estimates parm.

Model Fitting

Now, it is time to call the function LaplaceApproximation to approximate the posterior densities of the Poisson regression model for the SCR data using the Laplace approximation method; its results are assigned to the object Fit.

Fit <- LaplaceApproximation(Model=Model, parm=Initial.Values, Data=MyData,
         Method="BFGS", Iterations=1000, Samples=1000, sir=TRUE)
Fit

Summarizing Output

Table 3.9 presents the analytic results for the posterior densities using the Laplace approximation method. Rows are parameters and columns include the mode, standard deviation, lower bound, and upper bound; the bounds constitute a 95% probability interval. Table 3.10 presents the simulated results from the sampling importance resampling algorithm, conducted via the SIR function to draw independent posterior samples, which is possible only when LaplaceApproximation has converged.

Parameter   Mode      SD       LB        UB
beta[1]     1.0847    0.2400   0.6048    1.5647
beta[2]     -0.1310   0.1937   -0.5184   0.2564
beta[3]     -0.2364   0.1994   -0.6351   0.1623
beta[4]     0.0513    0.1851   -0.3188   0.4215
beta[5]     0.7798    0.2946   0.1905    1.3690
beta[6]     0.5974    0.3084   -0.0194   1.2143
beta[7]     0.5461    0.3126   -0.0791   1.1714
beta[8]     0.2408    0.3412   -0.4415   0.9231
beta[9]     0.5461    0.3126   -0.0791   1.1713
beta[10]    -0.0004   0.0163   -0.0329   0.0321
beta[11]    0.3098    0.3343   -0.3588   0.9783
beta[12]    -0.0957   0.3810   -0.8578   0.6664
beta[13]    0.7372    0.2977   0.1418    1.3326
beta[14]    0.4349    0.3224   -0.2098   1.0797
beta[15]    0.9690    0.2822   0.4046    1.5334

Table 3.9: Numerical summary of the posterior densities for each betas due to analytic approximation method, that is, Laplace Approximation.

From the summary of the posterior densities using the Laplace approximation method, it is observed that the posterior estimate of each beta is very close to the posterior estimates obtained from the bayesglm function as well as from the SIR algorithm. The effects of four betas, that is, β1, β5, β13, and β15, are statistically significant, whereas the remaining ones are not. A clear graphical summary of these results can be seen in Figure 3.3.
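Because the model uses a log link, each coefficient acts multiplicatively on the removal rate; the following small sketch converts the four significant posterior modes of Table 3.9 to that scale (the labels simply index the rows of the table):

beta.mode <- c(beta1 = 1.0847, beta5 = 0.7798, beta13 = 0.7372, beta15 = 0.9690)
exp(beta.mode)   # baseline removal rate (intercept) and rate ratios for the other effects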

Parameter   Mode        SD      MCSE     ESS     LB         Median     UB
beta[1]     1.0048      0.271   0.0027   10000   0.595      1.031      1.507
beta[2]     -0.1355     0.194   0.0019   10000   -0.503     -0.132     0.219
beta[3]     -0.2585     0.210   0.0021   10000   -0.691     -0.211     0.125
beta[4]     0.0221      0.172   0.0017   10000   -0.282     0.045      0.378
beta[5]     0.8287      0.305   0.0031   10000   0.246      0.839      1.340
beta[6]     0.6422      0.303   0.0030   10000   0.017      0.632      1.178
beta[7]     0.6098      0.324   0.0032   10000   -0.032     0.654      1.369
beta[8]     0.2940      0.395   0.0039   10000   -0.420     0.274      0.992
beta[9]     0.5753      0.370   0.0037   10000   -0.210     0.588      1.230
beta[10]    -0.0005     0.002   0.0000   10000   -0.003     -0.001     0.002
beta[11]    0.3475      0.355   0.0035   10000   -0.308     0.319      1.218
beta[12]    -0.0681     0.377   0.0038   10000   -0.849     -0.004     0.540
beta[13]    0.7507      0.307   0.0031   10000   0.177      0.787      1.430
beta[14]    0.4941      0.332   0.0033   10000   -0.176     0.483      1.009
beta[15]    1.0189      0.308   0.0031   10000   0.450      0.990      1.483
Deviance    210.2688    4.748   0.0475   10000   201.783    210.343    220.174
LP          -170.7298   2.374   0.0237   10000   -175.682   -170.766   -166.486

Table 3.10: Numerical summary of the simulated results from the sampling importance resampling algorithm.

Fitting with LaplacesDemon

Here, the function LaplacesDemon is called to fit the model for the SCR data by simulating from the logarithm of the unnormalized joint posterior density. For this example, the Adaptive Metropolis-within-Gibbs (AMWG) algorithm is used to approximate the posterior densities. This algorithm is an adaptive version of Metropolis-within-Gibbs (MWG), and is much simpler than other adaptive methods that adapt based on the sample covariance in large dimensions.

Model Fitting

For fitting the model with LaplacesDemon, the R code is written as

Initial.Values <- as.initial.values(Fit)
FitDemon <- LaplacesDemon(Model, Data=MyData, Initial.Values,
              Covar=Fit$Covar, Iterations=190000, Status=13157,
              Thinning=190, Algorithm="AMWG",
              Specs=list(B=NULL, n=10000, Periodicity=368))

Here, the initial values used to start the chains are taken from the LaplaceApproximation fit via the function as.initial.values. The function set.seed can be used to set a seed for pseudo-random number generation before fitting the model, so that results can be reproduced. Finally, the model is fitted with the LaplacesDemon function, and its results are assigned to the object FitDemon.

Summarizing Output

LaplacesDemon simulates samples from the posterior density and approximates the results using an MCMC algorithm, in this case Adaptive Metropolis-within-Gibbs. Table 3.11 summarizes the simulated results for the marginal posterior densities of the parameters from the stationary samples, which include the mean, standard deviation, median, and 95% credible interval.

Parameter   Mean        SD      MCSE     ESS    LB         Median     UB
beta[1]     1.0609      0.329   0.0133   755    0.361      1.081      1.658
beta[2]     -0.1377     0.196   0.0061   1000   -0.522     -0.132     0.257
beta[3]     -0.2382     0.196   0.0061   1000   -0.610     -0.246     0.142
beta[4]     0.0468      0.185   0.0057   1000   -0.307     0.047      0.406
beta[5]     0.7741      0.378   0.0160   756    0.060      0.782      1.492
beta[6]     0.5901      0.375   0.0140   864    -0.126     0.592      1.320
beta[7]     0.5432      0.387   0.0151   863    -0.226     0.547      1.281
beta[8]     0.2137      0.394   0.0150   833    -0.551     0.226      0.948
beta[9]     0.5494      0.383   0.0148   774    -0.186     0.541      1.297
beta[10]    -0.0364     0.426   0.0166   776    -0.853     -0.032     0.764
beta[11]    0.2909      0.416   0.0158   817    -0.479     0.277      1.131
beta[12]    -0.1309     0.436   0.0158   842    -1.029     -0.128     0.674
beta[13]    0.7400      0.372   0.0151   820    0.073      0.732      1.525
beta[14]    0.4356      0.389   0.0146   807    -0.334     0.423      1.197
beta[15]    0.9797      0.363   0.0145   823    0.303      0.966      1.729
Deviance    211.0293    5.507   0.1870   830    202.183    210.318    223.613
LP          -171.1102   2.754   0.0935   830    -177.402   -170.754   -166.686

Table 3.11: Summary of the MCMC simulations from the Adaptive Metropolis-within-Gibbs algorithm using the LaplacesDemon function.

In this summary output, LaplacesDemon is appeased, because all five conditions (as discussed in the previous analysis) that LaplacesDemon checks are satisfied. Moreover, these MCMC results are also very close to the results obtained from LaplaceApproximation, SIR, and the method used in the bayesglm function. The deviance of 211.0293, which is very close to the deviance of 210.2688 obtained from LaplaceApproximation, suggests that the model has converged well. A graphical comparison can be seen in Figure 3.3.

3.4.4.3 Analysis Using JAGS

In this section, a Bayesian analysis of the same Poisson regression model is conducted for the SCR data using JAGS, called from within R via R2jags. After fitting the model, a comparison is made with the results obtained from LaplaceApproximation and LaplacesDemon in the previous sections.

Data Creation

For fitting the model with JAGS, the data must be specified in a list containing the name of each vector. This can be done in R as follows,

N <- 48; J <- 15
X <- model.matrix(Fitbayesglm)
y <- removalsData$removals
data.jags <- list(N=N, J=J, y=y, X=X)

where N represents the total number of observations, y represents the response variable, which is removals in this case, X is the model matrix, and finally these are combined in the list data.jags.

Initial Values

To start the chains for the MCMC sampler, the initial values for the parameters are

inits <- function(){list("beta" = rep(0,J))}

Model Specification

The model for the data has the following structure:

yi ∼ Poisson(θi)

log(θi) = β + β′ yeari + β″ monthi.

Here, the index j of βj takes the values 1, . . . , 15, in concordance with the JAGS code that follows.

The usual independent normal distributions with mean 0 and small precision

0.0001 are considered as weakly informative priors for each βj. Thus, the model that we outlined above is written in JAGS as follows:

cat("model{ for(i in 1:N){ y[i]~dpois(theta[i]) theta[i]<-exp(inprod(X[i,],beta[])) } ## Prior for(j in 1:J){ beta[j]~dnorm(0,0.0001)} }", file="modelRemov.txt")

Model Fitting

After defining and writing the model, the data, and the initial values, we now fit and run the model with jags as

FitJags <- jags(data=data.jags, inits=inits, parameters=c("beta"),
                n.chains=3, n.iter=25000, model.file="modelRemov.txt")
FitJags

Summarizing Output

Posterior summaries of the Poisson regression model, after running the MCMC algorithm for 25000 iterations for each of the three chains, are provided in Table 3.12. From these results, it is observed that the posterior mean of each parameter is quite comparable with the means obtained from LaplaceApproximation and LaplacesDemon. The Brooks-Gelman-Rubin statistic Rhat indicates that the chains have mixed well, implying good convergence of the posterior densities. Every effective sample size n.eff is satisfactory. A comprehensive graphical presentation of this analysis can be seen in Figure 3.2. Moreover, a graphical comparison of these methods is made in Figure 3.3.

Parameter   Mean      SD      2.5%      50%       97.5%     Rhat    n.eff
beta[1]     1.027     0.328   0.332     1.048     1.614     1.001   3100
beta[2]     -0.129    0.198   -0.514    -0.132    0.259     1.001   3100
beta[3]     -0.233    0.195   -0.623    -0.230    0.147     1.001   3100
beta[4]     0.051     0.186   -0.305    0.047     0.417     1.001   3100
beta[5]     0.808     0.373   0.089     0.808     1.562     1.001   3100
beta[6]     0.620     0.386   -0.106    0.608     1.406     1.001   3100
beta[7]     0.570     0.379   -0.139    0.561     1.330     1.001   3100
beta[8]     0.246     0.404   -0.512    0.243     1.076     1.001   2500
beta[9]     0.569     0.381   -0.147    0.555     1.348     1.001   3100
beta[10]    0.004     0.434   -0.826    0.004     0.863     1.001   3100
beta[11]    0.323     0.406   -0.454    0.321     1.124     1.001   3100
beta[12]    -0.094    0.457   -1.017    -0.090    0.788     1.001   3100
beta[13]    0.762     0.371   0.052     0.748     1.532     1.001   3100
beta[14]    0.450     0.390   -0.300    0.452     1.224     1.002   1100
beta[15]    1.004     0.361   0.349     0.986     1.748     1.001   3100
deviance    211.033   8.392   202.147   210.172   223.327   1.001   3100

Table 3.12: Summary of the MCMC simulations due to Metropolis-within-Gibbs algorithm using JAGS. 3.5. Regression Analysis of Continuous Models 173

[Figure 3.2 appears here: the standard R2jags display for the model in modelRemov.txt (3 chains, each with 25000 iterations, first 12500 discarded), showing the 80% interval for each chain, the R-hat values, and the medians with 80% intervals for the beta parameters and the deviance.]

Figure 3.2: Graphical summary of the JAGS simulations after fitting the Poisson regression model for the system's components reliability data. The left side of the display shows the overlap of the three parallel chains, the middle shows that Rhat is near one for all parameters, indicating good convergence, and the right side shows the posterior inference for each parameter and the deviance.

3.5 Regression Analysis of Continuous Models

So far, Bayesian analyses of discrete reliability models have been discussed; in this section, continuous reliability distributions are considered for the Bayesian modeling of lifetime data. Various parametric models are used in the analysis of failure time data and in the modeling of aging or failure processes. Among these, we discuss here a few important distributions because of their demonstrated usefulness in a wide range of situations: the logistic, log-logistic, Weibull, and generalized log-Burr distributions.

[Figure 3.3 display: posterior density panels for β1–β15, comparing LaplaceApproximation, LaplacesDemon, and jags.]

Figure 3.3: Plots of posterior densities of the β parameters of the Poisson regression model using the functions LaplaceApproximation, LaplacesDemon, and jags. It is evident from these plots that the posterior densities obtained from the three different methods are very close to each other.

Regression analysis of failure times involves specifications for the distribution of a lifetime T , given covariates X. The parametric models allow parameters to depend on X. For example, for Weibull regression model with scale α and shape δ

log α = Xβ ⇒ α = exp(Xβ),

while the shape δ does not depend on the regressors. When both parameters are positive, the log-link function is commonly used for modeling.
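As a concrete illustration of the log link just described, a minimal JAGS sketch of a Weibull regression with one regressor is given below. It is illustrative only: the names (t, x, n) and the file name modelWeibullReg.txt are hypothetical, and JAGS parameterizes the Weibull by shape and rate, so the scale α is converted to a rate inside the model.

cat("model{
  for(i in 1:n){
    t[i] ~ dweib(delta, lambda[i])        # JAGS Weibull: shape delta, rate lambda
    lambda[i] <- pow(1/alpha[i], delta)   # rate corresponding to scale alpha
    log(alpha[i]) <- beta1 + beta2*x[i]   # log link keeps alpha positive
  }
  beta1 ~ dnorm(0, 0.001)
  beta2 ~ dnorm(0, 0.001)
  delta ~ dgamma(1, 0.001)                # shape does not depend on the regressors
}", file="modelWeibullReg.txt")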

To complete the specification of the Bayesian model, one needs to specify the prior distribution for the model parameters. In Bayesian regression analysis of

failure time data, when there are no restrictions on the βs, a good choice, if little is known, is an independent normal prior distribution with mean zero and sufficiently large variance for each βj. If more is known about a particular regression coefficient, a normal prior distribution with mean possibly different from zero and a smaller variance can be used.

Contents of this section are mainly based on the published papers Akhtar and Khan (2014a,b).

3.5.1 Log-logistic Regression Model for Lifetime Data

The log-logistic distribution is an important parametric model for reliability as well as survival analysis. The fact that its cumulative distribution function can be written in closed form, like that of the Weibull distribution, is particularly useful for the analysis of lifetime data even when the data are censored, and this property makes it advantageous over the log-normal distribution. Thus, the log-logistic distribution can be used as a basic failure time model by introducing covariates that affect α but not β, by modeling log(α) as a linear function of the covariates. The log-logistic distribution, including its density, survival, and hazard functions, is discussed in Chapter 2.
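Because the closed-form cdf is the property emphasized above, a small R sketch of the resulting survival and hazard functions may be helpful; it assumes the scale-and-shape parameterization (α, β) used in Chapter 2 and should be read as illustrative only.

# Log-logistic survival and hazard with scale alpha and shape beta
Sllogis <- function(t, alpha, beta) 1 / (1 + (t/alpha)^beta)
hllogis <- function(t, alpha, beta)
  (beta/alpha) * (t/alpha)^(beta - 1) / (1 + (t/alpha)^beta)

# Example: survival probability at t = 100 for alpha = 150, beta = 2
Sllogis(100, alpha=150, beta=2)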

For the Bayesian implementation of the log-logistic regression model, one needs to specify the likelihood, the prior, and finally the posterior density as

$$p(y|\theta) = p(y|\mu, \sigma) = \prod_{i=1}^{n} \sigma^{-1}\,\frac{\exp\!\left(\frac{y_i-\mu}{\sigma}\right)}{\left[1+\exp\!\left(\frac{y_i-\mu}{\sigma}\right)\right]^{2}},$$

where µ = Xβ; as a result, the likelihood becomes

$$p(y|\beta, \sigma) = \prod_{i=1}^{n} \sigma^{-1}\,\frac{\exp\!\left(\frac{y_i-X_i\beta}{\sigma}\right)}{\left[1+\exp\!\left(\frac{y_i-X_i\beta}{\sigma}\right)\right]^{2}}.$$

The prior distributions for parameters β and σ are considered as

βj ∼ N(0, 10^k), j = 1, ..., J

(usually k = 3 to 6), and

σ ∼ HC(α = 25).

Thus, the joint posterior distribution is

$$p(\beta, \sigma|y, X) \propto p(y|\beta, \sigma) \times p(\beta) \times p(\sigma)$$
$$\propto \prod_{i=1}^{n} \sigma^{-1}\,\frac{\exp\!\left(\frac{y_i-X_i\beta}{\sigma}\right)}{\left[1+\exp\!\left(\frac{y_i-X_i\beta}{\sigma}\right)\right]^{2}} \times \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi \times 10^{k}}}\exp\!\left(-\frac{1}{2}\,\frac{\beta_j^{2}}{10^{k}}\right) \times \frac{2\alpha}{\pi(\sigma^{2}+\alpha^{2})}.$$

Consequently, the marginal posterior densities of the parameters βj and σ are, respectively,
$$p(\beta_j|y, X) = \int p(\beta, \sigma|y, X)\, d\beta_1 \ldots d\beta_{j-1}\, d\beta_{j+1} \ldots d\beta_J\, d\sigma,$$
and
$$p(\sigma|y, X) = \int p(\beta, \sigma|y, X)\, d\beta_1 \ldots d\beta_J.$$

These marginals, which are not available in closed form, are the basis of Bayesian inference; one therefore has to resort to analytic approximation and simulation tools. Implementations are made in the following sections.

3.5.2 Implementation

To implement and illustrate applications of the log-logistic distribution, let us consider the lifetimes of steel specimens data. The data, from Crowder (2000), give the lifetimes of steel specimens tested at 14 different stress levels. Using these data, a Bayesian analysis of the log-logistic lifetime model is discussed. Both analytic and simulation methods are implemented for the purpose of illustration.

3.5.2.1 Lifetimes of Steel Specimens Data

Kimber (1990) and Crowder (2000) discuss data, provided in Table 3.13, on the times to failure of steel specimens subjected to cyclic stress loading of various amplitudes. The same data are discussed in Lawless (2003); they are for 20 specimens at each of the 14 stress amplitudes 32.0, 32.5, 33.0, ..., 38.0, 38.5. Failure times t are in thousands of stress cycles. None of the 280 times are censored. Figure 3.4 shows box plots of the log failure times at each stress amplitude, suggesting that log failure times tend to be smaller at higher amplitudes. Further, exploratory data analysis suggests that a log-logistic distribution may provide a satisfactory description.

[Figure 3.4 display: box plots of log failure time (vertical axis, roughly 4 to 9) against stress amplitude 32–38.5.]

Figure 3.4: Box plots of steel specimen log failure times at 14 stress levels.

Stress  Lifetime (t)
38.5    60 51 83 140 109 106 119 76 68 67 111 57 69 75 122 128 95 87 82 132
38.0    100 90 59 80 128 117 177 98 158 107 125 118 99 186 66 132 97 87 69 109
37.5    199 105 147 113 98 118 182 131 156 78 84 103 89 124 71 65 220 109 93 171
37.0    141 143 98 122 110 132 194 155 104 83 125 165 146 100 318 136 200 201 251 111
36.5    118 273 192 238 105 398 108 182 130 170 181 119 152 199 89 211 324 164 133 121
36.0    173 218 162 288 394 585 295 262 127 151 181 209 141 186 309 192 117 203 198 255
35.5    156 173 125 852 559 442 168 286 261 227 285 253 166 133 309 247 112 202 365 702
35.0    230 169 178 271 129 568 115 280 305 326 1101 285 734 177 493 218 342 431 143 381
34.5    155 397 1063 738 140 364 218 461 174 326 504 374 321 169 426 248 350 348 265 293
34.0    168 397 385 1585 224 987 358 763 610 532 449 498 714 159 326 291 425 146 246 253
33.5    154 305 957 1854 363 457 415 559 767 210 678 332 180 1274 528 254 835 611 482 593
33.0    184 241 273 1842 371 830 683 1306 562 166 981 1867 493 418 2978 1463 2220 312 251 760
32.5    4257 879 799 1388 271 308 2073 227 347 669 1154 393 250 196 548 475 1705 2211 975 2925
32.0    1144 231 523 474 4510 3107 815 6297 1580 605 1786 206 1943 935 283 1336 727 370 1056 413

Table 3.13: The lifetimes of steel specimens tested at 14 different stress levels.

3.5.2.2 Analysis Using LaplacesDemon

Fitting with LaplaceApproximation

The first step is to provide the data in the form that the LaplacesDemon package needs. For this, the data set given in Table 3.13 can be created in R format as follows.

Data Creation

For fitting of the lifetimes of steel specimens data with LaplaceApproximation, the

logarithm of lifetime will be the response variable and stress level will be the regressor variable. Since an intercept term will be included, a vector of 1's is inserted into the model matrix X. Thus, J = 2 indicates that there are two columns in the X matrix, the first for the intercept term and the second for the regressor.
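The data frame steelSpecimen referenced below is assumed to have been read into R beforehand; a minimal sketch, assuming a whitespace-delimited file steelSpecimen.txt with columns stress and lifetime (the same file is read again in the JAGS section later):

steelSpecimen <- read.table("steelSpecimen.txt", header=TRUE)
str(steelSpecimen)   # 280 observations on stress and lifetime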

N <- 280
J <- 2
X <- cbind(1, as.matrix(log(steelSpecimen$stress)))
y <- log(steelSpecimen$lifetime)
mon.names <- c("LP", "sigma")
parm.names <- as.parm.names(list(beta=rep(0,J), log.sigma=0))
MyData <- list(N=N, J=J, X=X, mon.names=mon.names,
               parm.names=parm.names, y=y)

In this case of the steel specimens data, all three parameters, including log.sigma, are specified in the vector parm.names. The log-posterior LP and sigma are included as monitored variables in the vector mon.names. The total number of observations is specified by N, which is 280. Censoring is not included, as the data are uncensored; however, censored data can be handled in a similar way, as discussed in Section 2.4.1 of Chapter 2. Finally, all these components are combined into the object MyData, which returns the data in a list.

Initial Values

An initial value is the starting point for the estimation of a parameter. The first two parameters, the beta parameters, have been set equal to zero, and log.sigma has been set equal to log(1), which is zero.

Initial.Values <- c(rep(0,J), log(1))

Model Specification

To fit the model with LaplaceApproximation, one must specify a model. For the lifetimes of steel specimens data, consider that the logarithm of lifetime follows a logistic distribution. In this linear regression model with an intercept and one input variable, the model may be specified as

y ∼ Logistic(µ, σ),

and expectation vector µ is an additive, linear function of vector of regression parameters, β, and the model matrix X,

µ = Xβ.

Prior probabilities are specified respectively for regression coefficients, β, and scale parameter, σ,

βj ∼ N(0, 1000), j = 1,...,J

σ ∼ HC(25).

All prior densities defined above are weakly informative. Thus, to specify the model defined above, create a function called Model as:

Model <- function(parm, Data)
{
  # Parameters
  beta <- parm[grep("beta", Data$parm.names)]
  sigma <- exp(parm[grep("log.sigma", Data$parm.names)])
  # Log (Prior Densities)
  beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
  sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
  # Log-Likelihood
  mu <- tcrossprod(Data$X, t(beta))
  LL <- sum(dlogis(Data$y, location=mu, scale=sigma, log=TRUE))
  LP <- LL + beta.prior + sigma.prior
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=c(LP, sigma),
                   yhat=rlogis(length(mu), mu, sigma), parm=parm)
  return(Modelout)
}

Model Fitting

The model specified above is fitted with LaplaceApproximation, and the fitted object is assigned to Fit. Its results are summarized in the next section.

Fit <- LaplaceApproximation(Model=Model, parm=Initial.Values, Data=MyData,
                            Iterations=1000, Samples=1000, Method="BFGS",
                            sir=TRUE)
print(Fit)

Summarizing Output

The relevant summary of results of the fitted regression model is reported in the following tables. Table 3.14 represents the analytic results using the Laplace approximation method, and Table 3.15 represents the simulated results using the sampling importance resampling method.

Parameter   Mode     SD    LB       UB
beta[1]      49.08   2.15   44.77    53.38
beta[2]     -12.22   0.60  -13.42   -11.01
log.sigma    -1.14   0.05   -1.24    -1.04

Table 3.14: Posterior summary of the analytic approximation using the function LaplaceApproximation, which is based on asymptotic approximation theory.

Parameter   Mean      SD    MCSE  ESS     LB        Median    UB
beta[1]      49.21    2.16  0.01  100000   44.94     49.22     53.45
beta[2]     -12.25    0.61  0.00  100000  -13.44    -12.26    -11.06
log.sigma    -1.13    0.05  0.00  100000   -1.23     -1.13     -1.03
Deviance    487.83    2.46  0.01  100000  485.02    487.19    494.25
LP         -257.62    1.23  0.00  100000 -260.80   -257.30   -256.22
sigma         0.32    0.02  0.00  100000    0.29      0.32      0.36

Table 3.15: Posterior summary of the simulation due to sampling importance resampling method using the same function.
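Tables 3.14 and 3.15 are taken from the components of the fitted object; a minimal sketch, assuming the object Fit returned above (in the LaplacesDemon package, Summary1 holds the asymptotic summary and Summary2 the SIR-based summary when sir=TRUE):

Fit$Summary1   # analytic (Laplace) summary, as in Table 3.14
Fit$Summary2   # sampling importance resampling summary, as in Table 3.15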

Fitting with LaplacesDemon

In this section, the function LaplacesDemon is used to analyze the same data. This function maximizes the logarithm of unnormalized joint posterior density with MCMC algorithms, and provides samples of the marginal posterior distributions, deviance and other monitored variables.

Model Fitting

For fitting the same model with the function LaplacesDemon, with the output assigned to the object FitDemon, the R code is as follows. Its summary of results is printed with the function print.

set.seed(666)
Initial.Values <- as.initial.values(Fit)
FitDemon <- LaplacesDemon(Model, Data=MyData, Initial.Values,
                          Covar=Fit$Covar, Iterations=2000, Status=100,
                          Thinning=1, Algorithm="IM",
                          Specs=list(mu=Fit$Summary1[1:length(Initial.Values),1]))

Summarizing Output

The function LaplacesDemon for this regression model simulates from the posterior density using the IM algorithm, and the summary of results is reported in Table 3.16, which represents the summary of the posterior densities based on the stationary samples.

Parameter   Mean      SD    MCSE   ESS    LB        Median    UB
beta[1]      49.11    1.33  0.031  3106    46.51     49.10     51.77
beta[2]     -12.23    0.37  0.009  3106   -12.97    -12.22    -11.50
log.sigma    -1.14    0.03  0.001  3012    -1.20     -1.14     -1.08
Deviance    485.92    0.91  0.029  1844   484.87    485.67    488.26
LP         -256.66    0.45  0.014  1799  -257.80   -256.53   -256.15
sigma         0.32    0.01  0.000  3009     0.30      0.32      0.34

Table 3.16: Posterior summary by the simulation method due to stationary samples using the function LaplacesDemon.

The graphical summary of the results can be seen in Figure 3.5.

3.5.2.3 Analysis Using JAGS

Let us analyse the same steel specimens data using JAGS. For this, we first need to create the data as a list in R.

Data Creation

The data set steelSpecimen.txt contains the times to failure of 280 steel specimens subjected to cyclic stress loading of various amplitudes. The logarithm of lifetime is the response variable and stress (level) is the regressor.

steelSpecimen <- read.table("steelSpecimen.txt", header=TRUE) n <- nrow(steelSpecimen) y <- log(steelSpecimen$lifetime) stress <- log(steelSpecimen$stress) data.jags <- list(n=n, y=y, stress=stress)

Here, n is the total number of observations, and the object data.jags combines each element of the data in a list.

Initial Values

In order to use JAGS, initial values must be provided for each parameter to start the chains, along with the list of parameters to monitor.

inits <- function(){ list("beta1"=rnorm(1,0,1), "beta2"=rnorm(1,0,1), "sigma"=runif(1))} params <- c("beta1", "beta2", "sigma")

To create initial values in R, the function returns a list that contains one element for each parameter. Each parameter is assigned a random draw as a starting value, from a normal distribution for the betas and from a uniform distribution for sigma, using the rnorm and runif functions, respectively; the first argument of these functions is the number of draws. If the jags command specifies more than one chain, each chain will start at a different random value for each parameter. The object params contains the vector of parameters to be monitored.

Model Specification

The model for the lifetimes of steel specimens data has the following structure:

yi ∼ Logistic(µi, τ(= 1/σ))

µi = β1 + β2 stressi

In this case, independent normal prior distribution is used with mean zero and large variance 1000 for each β, and the uniform prior distribution for σ with parameters equal to 0 and 100.

βj ∼ N(0, 1000), j = 1, 2 (written in JAGS as dnorm(0, 0.001), since JAGS uses the precision parameterization)

σ ∼ U(0, 100).

Specification of this model in JAGS is as follows: cat("model{ for(i in 1:n){ y[i] ~ dlogis(mu[i], tau) mu[i]<-beta1+beta2*stress[i] } beta1~dnorm(0,0.001) beta2~dnorm(0,0.001) tau<-pow(sigma,-1) sigma~dunif(0,100) }", file="modelSteel.txt")

Here, y[i] denotes the logarithm of the observed time to failure for the ith observation. These are assumed to be logistically distributed with location parameter mu[i], depending on the stress[i] amplitude, and rate parameter tau (the reciprocal of the scale sigma). The regression parameters beta1 and beta2 have normal priors with mean 0 and variance 1000 (precision 0.001). The prior for sigma is dunif(0,100).

Model Fitting

With these specifications of data, initial values, and model, we can now call jags from R to compile, initialize, and then run the model for 700000 iterations for each of 3 chains. The posterior estimates of the parameters are assigned to the object FitJags, and can be printed using the function print or by simply typing the object name FitJags at the R console. The output is reported in the next section.

set.seed(123)
FitJags <- jags(data=data.jags, inits=inits, parameters=params,
                n.chains=3, n.iter=700000, model.file="modelSteel.txt")
FitJags

Summarizing Output

Table 3.17 represents the numerical summary of the JAGS simulations after fitting the log-logistic regression model to the lifetimes of steel specimens data. The first five columns of numbers give inferences for the posterior densities of the model parameters, including the mean, standard deviation, and 95% credible interval. The second-to-last column of the output gives information about convergence of the algorithm: if Rhat is less than 1.1 for all parameters, then we judge the algorithm to have approximately converged, in the sense that the parallel chains have mixed well. The final column, n.eff, shows the effective sample size of the simulations.

Parameter   Mean      SD     2.5%      50%       97.5%    Rhat   n.eff
beta1        49.265   2.318   44.503    49.248    54.080  1.002  1000
beta2       -12.269   0.649  -13.624   -12.263   -10.939  1.003  1000
sigma         0.323   0.016    0.292     0.322     0.357  1.001  3000
deviance    487.887   2.644  484.977   487.136   494.806  1.007   810

Table 3.17: Numerical summary of the posterior densities via simulation method using JAGS.
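The convergence rule described above can also be checked programmatically on the fitted object; a minimal sketch, assuming the R2jags object FitJags (BUGSoutput$summary is the standard summary matrix of an R2jags fit):

round(FitJags$BUGSoutput$summary[, c("Rhat", "n.eff")], 3)
all(FitJags$BUGSoutput$summary[, "Rhat"] < 1.1)   # TRUE suggests approximate convergence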

[Figure 3.5 display: posterior density panels for β1, β2, log(σ), and σ, comparing LaplaceApproximation, LaplacesDemon, and jags.]

Figure 3.5: Plots of posterior densities for the parameters β1, β2, log(σ), and σ of the log-logistic model using the functions LaplaceApproximation, LaplacesDemon, and jags, respectively.

3.5.3 Generalized Log-Burr Model for Lifetime Data

The log-Burr distribution is a generalization of the logistic and extreme value distributions, which are two very important reliability models for lifetime data. The log-Burr distribution can be obtained from the parametric location-scale family of distributions given by Equation (2.12) in Chapter 2 by letting the pdf, cdf, or reliability function include one or more additional parameters. This distribution is very useful because it includes common two-parameter lifetime distributions as special cases. The location-scale family of distributions and the log-Burr distribution, including pdf, cdf, and reliability function, are discussed in Sections 2.2.6 and 2.2.7 of Chapter 2, respectively. In this section, the Bayesian approach is used to model an electrical insulating fluid failure time data set, given below, with the log-Burr distribution using both analytic and simulation tools.
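As a quick illustration of this family on the log-time scale, a small R sketch of the log-Burr density and survival functions is given below; it follows the form of the likelihood stated just after this paragraph (with z = (y − µ)/σ), so it should be read as a sketch rather than a definitive implementation. Setting k = 1 gives the logistic special case, and large k approaches the extreme value case.

# Log-Burr density and survival on the log-time scale y = log(t)
dlogburr <- function(y, mu, sigma, k) {
  z <- (y - mu)/sigma
  (1/sigma) * exp(z) * (1 + exp(z)/k)^(-(k + 1))
}
slogburr <- function(y, mu, sigma, k) {
  z <- (y - mu)/sigma
  (1 + exp(z)/k)^(-k)
}

# Example: compare the logistic (k = 1) and near-extreme-value (k = 30) cases
dlogburr(0, mu=0, sigma=1, k=1); dlogburr(0, mu=0, sigma=1, k=30)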

The implementation of log-Burr lifetime model in Bayesian paradigm can be made by specifying the likelihood, prior, and then posterior distribution. For this, the likelihood of log-Burr distribution is given by

$$p(y|\theta) = p(y|\mu, \sigma; k) = \prod_{i=1}^{n} \sigma^{-1}\exp\!\left(\frac{y_i-\mu}{\sigma}\right)\left[1+\exp\!\left(\frac{y_i-\mu}{\sigma}\right)\Big/ k\right]^{-k-1},$$

where µ = Xβ; as a result, the likelihood becomes

$$p(y|\beta, \sigma; k) = \prod_{i=1}^{n} \sigma^{-1}\exp\!\left(\frac{y_i-X_i\beta}{\sigma}\right)\left[1+\exp\!\left(\frac{y_i-X_i\beta}{\sigma}\right)\Big/ k\right]^{-k-1}.$$

The independent prior probability distributions for the parameters β and σ are defined as

βj ∼ N(0, 10^m), j = 1, ..., J and m = 3 to 6, and

σ ∼ HC(α = 25).

Thus, the joint posterior distribution, by applying Bayes' theorem, is given by

$$p(\beta, \sigma|y, X; k) \propto p(y|\beta, \sigma; k) \times p(\beta) \times p(\sigma)$$
$$\propto \prod_{i=1}^{n} \sigma^{-1}\exp\!\left(\frac{y_i-X_i\beta}{\sigma}\right)\left[1+\exp\!\left(\frac{y_i-X_i\beta}{\sigma}\right)\Big/ k\right]^{-k-1} \times \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi \times 10^{m}}}\exp\!\left(-\frac{1}{2}\,\frac{\beta_j^{2}}{10^{m}}\right) \times \frac{2\alpha}{\pi(\sigma^{2}+\alpha^{2})}.$$

Consequently, the marginal posterior densities of the parameters βj and σ are, respectively,
$$p(\beta_j|y, X) = \int p(\beta, \sigma|y, X)\, d\beta_1 \ldots d\beta_{j-1}\, d\beta_{j+1} \ldots d\beta_J\, d\sigma,$$
and
$$p(\sigma|y, X) = \int p(\beta, \sigma|y, X)\, d\beta_1 \ldots d\beta_J,$$
which are not in closed form. Therefore, numerical approximation algorithms are needed to approximate the integrals of these marginal posterior densities. Thus, both analytic and simulation tools are implemented in the following sections to approximate these integrals.

3.5.4 Implementation

For modeling the electrical insulating fluid failure time data, as before, the Laplace approximation is implemented for approximating the posterior densities of the parameters. Moreover, parallel simulation tools are implemented using the LaplacesDemon and JAGS software packages. Since the data are not censored, the censoring mechanism is not discussed here; however, the methods used for censored data in Section 2.4.3 can easily be extended to the regression setting as well.

3.5.4.1 Electrical Insulating Fluid Failure Time Data

Let us introduce a failure time data set on electrical insulating fluid for fitting the regression model, which is taken from Lawless (2003); the same data set is also discussed in Nelson (1972).

Voltage Level (kV)  ni  Breakdown Times
26                   3  5.79, 1579.52, 2323.7
28                   5  68.85, 426.07, 110.29, 108.29, 1067.6
30                  11  17.05, 22.66, 21.02, 175.88, 139.07, 144.12, 20.46, 43.40, 194.90, 47.30, 7.74
32                  15  0.40, 82.85, 9.88, 89.29, 215.10, 2.75, 0.79, 15.93, 3.91, 0.27, 0.69, 100.58, 27.80, 13.95, 53.24
34                  19  0.96, 4.15, 0.19, 0.78, 8.01, 31.75, 7.35, 6.50, 8.27, 33.91, 32.52, 3.16, 4.85, 2.78, 4.67, 1.31, 12.06, 36.71, 72.89
36                  15  1.97, 0.59, 2.58, 1.69, 2.71, 25.50, 0.35, 0.99, 3.99, 3.67, 2.07, 0.96, 5.35, 2.90, 13.77
38                   8  0.47, 0.73, 1.40, 0.74, 0.39, 1.13, 0.09, 238

Table 3.18: Times to breakdown (in minutes) at each of seven voltage levels.

Nelson (1972) described the results of a life test experiment in which specimens of a type of electrical insulating fluid were subjected to a constant voltage stress. The length of time until each specimen failed, or broke down, was observed. Table 3.18 gives the results for seven groups of specimens, tested at voltages ranging from 26 to 38 kilovolts (kV).

3.5.4.2 Analysis Using LaplacesDemon

Fitting with LaplaceApproximation

To fit the model with LaplaceApproximation, we first create the data in the form that LaplaceApproximation needs.

Data Creation

For fitting the failure times of electrical insulating fluid data with LaplaceApproximation, the logarithm of breakdownTime is the response variable and voltageLevel is the regressor variable. Since an intercept term will be included, a vector of 1's is inserted into the model matrix X. Thus, J = 2 indicates that there are two columns of input variables, the first for the intercept term and the second for the regressor.

N <- 76
J <- 2
k <- 1   # k=1 for logistic and k=30 for Weibull model
X <- cbind(1, as.matrix(log(insulatingFluid$voltageLevel)))
y <- log(insulatingFluid$breakdownTime)
mon.names <- c("LP", "sigma")
parm.names <- as.parm.names(list(beta=rep(0, J), log.sigma=0))
MyData <- list(N=N, J=J, X=X, k=k, mon.names=mon.names,
               parm.names=parm.names, y=y)

In this case of the electrical insulating fluid data, all three parameters, including log.sigma, are specified in the vector parm.names. The log-posterior LP and sigma are included as monitored variables in the vector mon.names. The total number of observations is specified by N, which is 76. All these components are combined into the object MyData, which returns the data in a list.

Initial Values

An initial value is the starting point for the estimation of a parameter. The first two parameters, the beta parameters, have been set equal to zero, and log.sigma has been set equal to log(1), which is zero.

Initial.Values <- c(rep(0,J), log(1))

Model Specification

To fit the regression model with LaplaceApproximation, one must specify a model. Thus, for the failure times of electrical insulating fluid data, consider that the logarithm of breakdownTime follows a log-Burr distribution. In this Bayesian linear regression with an intercept and one regressor variable, the model is specified as

y ∼ Log-Burr(µ, σ; k),

and expectation vector µ is an additive, linear function of vector of regression parameters, β, and the model matrix X.

µ = Xβ.

Prior probabilities are specified respectively for regression coefficients, β, and scale parameter, σ,

βj ∼ N(0, 1000), j = 1,...,J

σ ∼ HC(25).

All prior densities defined above are weakly informative. Thus, to specify the model defined above, one must create an object called Model as:

Model <- function(parm, Data)
{
  beta <- parm[grep("beta", Data$parm.names)]
  sigma <- exp(parm[grep("sigma", Data$parm.names)])
  beta.prior <- sum(dnorm(beta, 0, 1000, log=TRUE))
  sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
  mu <- tcrossprod(Data$X, t(beta))
  z <- (y-mu)/sigma
  llf <- -log(sigma) + z - (k+1)*log(1+exp(z)/k)
  LL <- sum(llf)
  LP <- LL + beta.prior + sigma.prior
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=c(LP, sigma),
                   yhat=mu, parm=parm)
  return(Modelout)
}

Model Fitting

The above model is fitted with the function LaplaceApproximation, and its output is assigned to the object Fit. A summary of the results is given in the next section.

Fit <- LaplaceApproximation(Model=Model, parm=Initial.Values, Data=MyData,
                            Iterations=1000, Samples=1000, Method="BFGS",
                            sir=TRUE)
print(Fit)

Summarizing Output

The relevant summary of results of the fitted regression model using the function LaplaceApproximation is given in the following two tables. Table 3.19 represents the analytic results using the Laplace approximation method, and Table 3.20 represents the simulated results using the sampling importance resampling method.

Log-logistic model (k=1)
Parameter   Mode     SD    LB       UB
beta[1]      62.90   6.11   50.69    75.12
beta[2]     -17.35   1.74  -20.84   -13.87
log.sigma    -0.16   0.10   -0.35     0.04

Weibull model (k=30)
Parameter   Mode     SD    LB       UB
beta[1]      64.87   5.62   53.62    76.11
beta[2]     -17.74   1.61  -20.96   -14.53
log.sigma     0.23   0.09    0.06     0.41

Table 3.19: Posterior summary of the analytic approximation using the function LaplaceApproximation, which is based on asymptotic approximation theory.

Fitting with LaplacesDemon

In this section, the function LaplacesDemon is used to analyze the same data, that is, the electrical insulating fluid failure times data. This function maximizes the logarithm of the unnormalized joint posterior density with MCMC algorithms,

Log-logistic model (k=1)
Parameter   Mean      SD    MCSE  ESS    LB        Median    UB
beta[1]      62.73    6.34  0.06  10000   50.08     62.84     75.23
beta[2]     -17.30    1.81  0.02  10000  -20.88    -17.33    -13.66
log.sigma    -0.14    0.10  0.00  10000   -0.32     -0.14      0.06
Deviance    283.20    2.46  0.02  10000  280.38    282.56    289.75
LP         -160.93    1.23  0.01  10000 -164.20   -160.61   -159.52
sigma         0.88    0.09  0.00  10000    0.72      0.87      1.06

Weibull model (k=30)
Parameter   Mean      SD    MCSE  ESS    LB        Median    UB
beta[1]      65.18    5.86  0.06  10000   53.79     65.20     76.77
beta[2]     -17.83    1.67  0.02  10000  -21.16    -17.83    -14.58
log.sigma     0.25    0.09  0.00  10000    0.08      0.25      0.43
Deviance    278.35    2.40  0.02  10000  275.57    277.72    284.52
LP         -158.50    1.20  0.01  10000 -161.59   -158.19   -157.11
sigma         1.29    0.11  0.00  10000    1.09      1.29      1.54

Table 3.20: Posterior summary matrices of the simulation due to sampling importance resampling algorithm using the same function.

and provides samples of the marginal posterior distributions, deviance and other monitored variables.

Model Fitting

The same model is fitted with the function LaplacesDemon, and its output is assigned to the object FitDemon. Its summary of results is printed using the function print, and reported in the next section.

set.seed(666)
Initial.Values <- as.initial.values(Fit)
FitDemon <- LaplacesDemon(Model, Data=MyData, Initial.Values,
                          Covar=Fit$Covar, Iterations=2000, Status=100,
                          Thinning=1, Algorithm="IM",
                          Specs=list(mu=Fit$Summary1[1:length(Initial.Values),1]))
print(FitDemon)

Summarizing Output

The function LaplacesDemon for this regression model simulates from the posterior density using the IM algorithm, and the summary of results is reported in Table 3.21, which represents the posterior summary based on the stationary samples.

Log-logistic model (k=1)
Parameter   Mean      SD    MCSE  ESS      LB        Median    UB
beta[1]      62.98    6.34  0.29   420.00   50.54     62.82     75.45
beta[2]     -17.38    1.81  0.08   420.00  -20.88    -17.35    -13.85
log.sigma    -0.13    0.09  0.00   420.00   -0.31     -0.13      0.05
Deviance    283.20    2.59  0.14   364.17  280.33    282.65    289.18
LP         -160.93    1.30  0.06   364.16 -163.92   -160.65   -159.49
sigma         0.88    0.08  0.00   420.00    0.73      0.87      1.05

Weibull model (k=30)
Parameter   Mean      SD    MCSE  ESS      LB        Median    UB
beta[1]      65.24    5.56  0.13  1586.43   54.29     65.26     76.01
beta[2]     -17.85    1.59  0.04  1586.97  -20.93    -17.86    -14.70
log.sigma     0.25    0.09  0.00  1430.58    0.08      0.25      0.44
Deviance    278.37    2.37  0.06  1800.00  275.60    277.78    284.39
LP         -158.52    1.18  0.03  1800.00 -161.52   -158.22   -157.13
sigma         1.29    0.12  0.00  1427.11    1.08      1.29      1.55

Table 3.21: Posterior summaries of the simulation based on stationary samples using the IM algorithm implemented in the function LaplacesDemon. Both β parameters are statistically significant for k = 1 and k = 30. Moreover, for larger values of k (e.g., 50 and 90), the results remain essentially the same as for k = 30. It may also be noted that the deviance for k = 30 is smaller than for k = 1, which means that the Weibull model fits better than the log-logistic.

3.5.4.3 Analysis Using JAGS

To do the analysis of electrical insulating fluid failure time data in JAGS, let us create the data in R.

Data Creation

The data set insulatingFluid.txt contains the failure times, or breakdown times, of the insulating fluid under a constant voltage stress. The logarithm of breakdownTime

is the response variable and voltageLevel is the only regressor variable. The total number of observations is 76.

insulatingFluid <- read.table("insulatingFluid.txt", header=TRUE) n <- nrow(insulatingFluid) y <- log(insulatingFluid$breakdownTime) voltageLevel <- log(insulatingFluid$voltageLevel) zeros <- rep(0, n) C <- 1000 k <- 1 data.jags<-list(n=n, y=y, voltageLevel=voltageLevel, zeros=zeros, C=C, k=k)

Initial Values

The initial values for the parameters β and σ, and the list of parameters to be monitored are specified as

inits <- function(){
  list(beta1=rnorm(1), beta2=rnorm(1), sigma=runif(1))
}

Model Specification

The model for breakdown times of insulating fluid data is specified as

yi ∼ Log-Burr(µi, σ; k),

and the expectation µi is a linear function of the vector of regression parameters, β, given by

µi = β1 + β2 voltageLeveli,   i = 1, 2, . . . , n.

The prior probability distributions for parameters β1, β2, and σ are specified as

βj ∼ N(0, 1000), j = 1, 2 (written in JAGS as dnorm(0, 0.001), using the precision parameterization)

σ ∼ U(0, 100).

The specification of this model in JAGS uses the Poisson zeros trick, since the log-Burr distribution is not available as a built-in JAGS distribution. In this trick, a vector of observed zeros is modeled as Poisson with mean phi[i] = -L[i] + C, where L[i] is the log-likelihood contribution of the ith observation and C is a large constant ensuring phi[i] > 0; observing a zero then contributes exp(L[i] - C) to the likelihood, which is proportional to the desired likelihood term. The model is written as

cat("model{ for (i in 1:n) { zeros[i]~dpois(phi[i]) phi[i] <- -L[i] + C L[i] <- -log(sigma)+((y[i]-mu[i])/sigma) -(k+1)*log(1+exp((y[i]-mu[i])/sigma)/k) mu[i] <- beta1+beta2*voltageLevel[i] } # Priors beta1~dnorm(0,0.001) beta2~dnorm(0,0.001) sigma~dunif(0,100) }", file="modelFluid.txt")

Model Fitting

The above defined model is fitted with the function jags as

set.seed(999)
FitJags <- jags(data=data.jags, inits=inits, n.iter=150000, n.chain=3,
                parameters=c("beta1","beta2","sigma"),
                model.file="modelFluid.txt")
FitJags

Summarizing Output

Table 3.22 shows the summary of the posterior densities of the log-Burr model after being fitted with JAGS for different values of k. It is observed that these posterior inferences are very close to the inferences obtained from the SIR and IM algorithms using LaplaceApproximation and LaplacesDemon, respectively. Both beta parameters are also statistically significant in each case. The values of Rhat indicate that the chains have mixed well, and hence the algorithm has approximately converged. The plots of the posterior densities of each parameter and a graphical comparison of the three methods using LaplaceApproximation, LaplacesDemon, and jags can be seen in Figure 3.6.

Log-logistic model (k=1)
Parameter   Mean      SD     2.5%      50%       97.5%    Rhat   n.eff
beta1        59.096   6.223   47.664    58.806    71.787  1.002  2500
beta2       -16.265   1.775  -19.858   -16.182   -13.001  1.002  2000
sigma         0.888   0.089    0.733     0.881     1.075  1.001  2600
deviance    283.566   2.599  280.468   282.984   290.464  1.000     1

Weibull model (k=30)
Parameter   Mean      SD     2.5%      50%       97.5%    Rhat   n.eff
beta1        63.305   5.588   53.301    63.076    75.332  1.006  2800
beta2       -17.297   1.597  -20.723   -17.242   -14.430  1.006  1900
sigma         1.303   0.121    1.095     1.295     1.568  1.001  3000
deviance    278.544   2.623  275.554   277.889   285.341  1.000     1

Table 3.22: Summary of the posterior densities via Metropolis-within-Gibbs simulation algorithm using JAGS.

3.6 Discussion and Conclusion

In this chapter, Bayesian methods are applied to model reliability data with covariates. The logistic, Poisson, and failure time regression models are used to model the reliability data in the Bayesian paradigm. Appropriate link functions are used

[Figure 3.6 display: posterior density panels for β1, β2, and σ under the log-logistic (k=1) and Weibull (k=30) models, comparing LaplaceApproximation, LaplacesDemon, and jags.]

Figure 3.6: Plots of posterior densities of the parameters β1, β2, and σ for the log-logistic and Weibull models using the functions LaplaceApproximation, LaplacesDemon, and jags. It is evident from these plots that the posterior densities obtained from the three different methods are very close to each other.

to relate the model parameters and covariates for the linear regression and generalized linear regression models. The same analytic approximation and simulation methods are implemented using the same software packages. From the summary tables, it is observed that the results obtained from the Laplace approximation and the simulation methods are very close to each other. Plots of the posterior densities show the quality of these approximations. Moreover, it is evident from the summaries of results that the Bayesian approach based on weakly informative priors is simpler to implement than the frequentist approach. The wealth of information provided in these numeric and graphic summaries is not available in the classical framework. Furthermore, more complex models, that is, hierarchical models, are discussed in the next chapter.

4 Bayesian Reliability Analysis of Hierarchical Models

Contents

4.1 Introduction
4.2 Hierarchical Modeling of Binomial Data
    4.2.1 Implementation
4.3 Hierarchical Modeling of Poisson Data
    4.3.1 Implementation
4.4 Hierarchical Modeling of Lifetime Data
    4.4.1 Implementation
4.5 Discussion and Conclusion


4.1 Introduction

Many statistical applications require the specification of models that are more complex or involve multiple parameters that can be regarded as related or connected in some way by the structure of the problem, implying that a joint probability model for these parameters should reflect the dependence among them. The Bayesian paradigm provides a logical framework for dealing with such models, known as hierarchical models (Gelman et al., 2004). In Bayesian statistics, hierarchical modeling is a fundamental concept. The basic idea of hierarchical modeling is that parameters are endowed with probability distributions which may themselves introduce new parameters called hyperparameters, and this construction recurses. The most common construction in hierarchical modeling is the conditionally independent hierarchy, in which a set of parameters are joined by making their distributions depend on a shared underlying parameter (Teh and Jordan, 2010). Moreover, hierarchies help to unify statistics, providing a Bayesian interpretation of frequentist concepts such as shrinkage and random effects, and also provide ways to specify non-standard distributional forms obtained as integrals over underlying parameters. These advantages are well appreciated in the world of parametric modeling, help in understanding multiparameter problems, and also play an important role in developing computational strategies.

Generally, hierarchical models are more flexible than typical nonhierarchical or fixed effects models, since a more complicated structure is accommodated in the model. For this reason, they describe the data better, especially when the sample size is large. On the other hand, a complicated hierarchical formulation may lead to a model that overfits the data; that is, it may describe the current data set better but might not allow for enough uncertainty to adequately predict future observations. Robert (2007, sec. 10.2.2) provides a series of justifications and advantages for using hierarchical models, including the fact that the prior is

decomposed into two main parts: one referring to structural information or assumptions concerning the model, and one referring to the actual subjective information about the model parameters. Another advantage, according to the same author, is that the hierarchical structure leads to a more robust analysis, reducing subjectivism, since posterior results are averaged across different prior choices for the parameters of interest. Finally, the hierarchical structure simplifies both the interpretation and the computation of the model, since the corresponding posterior distribution is simplified, resulting in conditional distributions of simpler form (Ntzoufras, 2009).

In this chapter, an attempt has been made to fit hierarchical models in the Bayesian paradigm, focusing on different reliability models such as the binomial, Poisson, Weibull, log-normal, and log-logistic, in which multilevel parameters are treated hierarchically, and comparisons are made among these models. In many practical situations, multilevel modeling of reliability data in classical theory is not an easy task, whereas it can be much simpler in a Bayesian paradigm. Consequently, for the purpose of hierarchical Bayesian analysis of these reliability models, real data sets are used to illustrate the implementations in R, LaplacesDemon and JAGS. Thus, the hierarchical Bayesian analysis of reliability models has been made with the following objectives: first, to define a Bayesian model, that is, to specify the likelihood and prior distributions; second, to write the R, LaplacesDemon and JAGS code for approximating the posterior densities using analytic and simulation tools; and finally, to illustrate numeric as well as graphic summaries of the posterior densities.

4.2 Hierarchical Modeling of Binomial Data

A popular hierarchical model formulation is that of the so-called generalized linear mixed models, which is based on the GLM formulation and acquires a hierarchical structure by including random coefficients in the usual linear predictor.

A hierarchical logistic regression model, which is a special case of the generalized linear mixed model, is considered here for a repeated binomial response. Hence, the

hierarchical logistic regression model for a series of repeated binomial responses yit for t = 1, 2,...,T over the same individuals i = 1, 2, . . . , n can be formulated as

$$y_{it} \sim \mathrm{Binomial}(N, p_{it}),$$
$$\log\left(\frac{p_{it}}{1-p_{it}}\right) = \beta_{0t} + \sum_{j=1}^{k} \beta_{jt} X_{ijt} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),$$

where β0t are fixed constants depending on the time sequence t, Xijt are time-varying covariates for individual i and time sequence j, βjt are their corresponding

coefficients, and εi are individual random effects capturing within-individual variability. The structure of the model formulated above can be slightly modified depending on the data and the problem at hand; for example, the covariates or their effects may not depend on the time index t.

4.2.1 Implementation

For the implementation of the hierarchical Bayesian analysis of the logistic regression model, two important techniques, analytic approximation and simulation, are implemented using the lme4 package of R and JAGS, respectively. lme4 (Bates et al., 2014) is the canonical package for implementing multilevel/mixed effects models in R; it provides functions for fitting and analyzing linear, generalized linear, and non-linear mixed models using approximate methods (Gelman and Hill, 2007). Parallel simulation methods are implemented using JAGS, which generates MCMC samples from the posterior distribution of the parameters of a Bayesian model. Thus, for the hierarchical Bayesian modeling of logistic regression, a real data set with a binomial response is taken from Poloski and Sullivan (1980). The same data are also discussed in Hamada et al. (2008).

4.2.1.1 Emergency Diesel Generators Demand Data

Emergency Diesel Generators (EDGs) provide backup power during external power outages at commercial nuclear power plants. To ensure safety and to control the risk of severe core damage during station blackouts, EDGs must be sufficiently reliable. Poloski and Sullivan (1980) present EDG failure-to-start-on-demand data at U.S. commercial nuclear power plants. The weekly test data are derived from Licensee Event Reports, mandated by the U.S. Nuclear Regulatory Commission, from January 1, 1976, to December 31, 1978. Table 4.1 presents the combined annual number of demands and failures for 1976–1978 by plant and (coded) nuclear steam supply system (NSSS) vendor for 58 nuclear power plants. The table also shows the date that each plant first attained criticality. The column Days is created using the function difftime of R.

S.N.  Plant                   NSSS  Days  Year  Failures  Demands
1     Arkansas Nuclear One 1  A      878  1976      1       104
2     Arkansas Nuclear One 1  A     1243  1977      1       104
3     Arkansas Nuclear One 1  A     1608  1978      0       104
4     Crystal River 3         A      351  1977      4       100
5     Crystal River 3         A      716  1978      2       104
6     Davis-Besse             A      112  1977      1        32
7     Davis-Besse             A      477  1978      2       104
8     Rancho Seco             A      837  1976      2       104
9     Rancho Seco             A     1202  1977      1       104
10    Rancho Seco             A     1567  1978      0       104
11    Three Mile Island 1     A      940  1976      1       104
12    Three Mile Island 1     A     1305  1977      0       104
13    Three Mile Island 1     A     1670  1978      2       104
14    Three Mile Island 2     A      278  1978      2        80
15    Arkansas Nuclear One 2  B       26  1978      0         8
...
161   Zion 2                  D     1103  1976      0       156
162   Zion 2                  D     1468  1977      0       156
163   Zion 2                  D     1833  1978      0       156

Table 4.1: EDG failure to start and demand data during 1976–1978.

4.2.1.2 Analysis Using R

Our starting point for fitting the hierarchical logistic regression model for the EDG data is glmer, a generic function of the lme4 package for R, which fits generalized linear models with varying coefficients using point estimation of the variance parameters. Two other important functions of lme4 are lmer, which stands for “linear mixed effects in R” and is used to fit linear mixed models (the function actually works for generalized linear mixed models as well), and nlmer, which stands for “non-linear mixed effects in R” and is used to fit non-linear mixed models. These functions are described by Bates (2005a,b), continuing the earlier work of Pinheiro and Bates (2000).

Here, our problem is to fit a hierarchical logistic regression model, that is, a generalized linear mixed model. For this we call the function glmer, which incorporates both fixed-effects and random-effects parameters in a linear predictor, via maximum likelihood. The linear predictor is related to the conditional mean of the response through the inverse link function. Moreover, the expression for the likelihood of a mixed-effects model is an integral over the random effects space. For a linear mixed model (LMM), this integral can be evaluated exactly, but for a generalized linear mixed-effects model (GLMM), the integral must be approximated. The most reliable approximation for GLMMs with a single grouping factor for the random effects is adaptive Gauss-Hermite quadrature (Bates et al., 2014). Thus, before fitting the model, let us take a look at the function and its important arguments.

glmer(formula, data=NULL, family=gaussian, control=glmerControl(),
      start=NULL, verbose=0L, nAGQ=1L, subset, weights, na.action,
      offset, contrasts=NULL, mustart, etastart, devFunOnly=FALSE, ...)

The first argument formula defines a two-sided linear formula object describing both the fixed-effects and random-effects part of the model, with the response on 206 4. Bayesian Reliability Analysis of Hierarchical Models

the left of a ∼ (tilde) operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars (“|”) separating expressions for design matrices from grouping factors. The argument data requires a data frame containing the variables named in formula. The family argument requires a description of the error distribution (default gaussian) and the link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. The gaussian family accepts the links identity (default), log, and inverse; the binomial family the links logit (default), probit, cauchit (corresponding to the logistic, normal, and Cauchy cdfs, respectively), and cloglog (complementary log-log); and the poisson family the links log (default), identity, and sqrt. The nAGQ argument controls the number of nodes in the quadrature formula: this integer scalar defines the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. A model with a single, scalar random-effects term could reasonably use up to 25 quadrature points per scalar integral. The default approximation method is the Laplace approximation, corresponding to nAGQ=1. More detail about the function and its arguments can be found in Bates et al. (2014). The function glmer is a good way to get quick approximate estimates using the default Laplace approximation method before fitting the model with JAGS.
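To make the formula, family, and nAGQ arguments concrete, here is a small self-contained toy example with simulated data; the names (g, x, y, toy) are hypothetical and unrelated to the EDG data analysed below.

library(lme4)
set.seed(1)
g  <- factor(rep(1:10, each=20))        # grouping factor: ten groups of twenty
x  <- rnorm(200)                        # one fixed-effect covariate
re <- rnorm(10)                         # simulated group-level random intercepts
p  <- plogis(-1 + 0.5*x + re[as.integer(g)])
y  <- rbinom(200, size=1, prob=p)       # binary response
toy <- glmer(y ~ x + (1 | g), family=binomial(link="logit"), nAGQ=1)
summary(toy)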

Data Creation

Let us begin with the creation of the data in the form that lme4 needs, assembling it in R as

EDGfailure <- read.table("EDGfailure.txt", header=TRUE) NSSS <- factor(EDGfailure$NSSS) Plant <- as.integer(factor(EDGfailure$Plant)) Days <- as.vector(EDGfailure$Days) 4.2. Hierarchical Modeling of Binomial Data 207

Model Specification

In the hierarchical modeling of the EDG demand data using the logistic regression model, the

number of failures yi for the 163 demand failure data given in Table 4.1 is assumed

to follow binomial distribution with failure rate θi, (i = 1, 2,..., 163),

yi ∼ Binomial(ni, θi)

where ni is the number of demands for ith observation. For fitting with glmer, the

default logit link function is used to relate the EDG demand failure rate θi with time measured in days.

logit(θi) = µ + βDaysi + γNSSSi + Plantindi

where, on the logit(θi) scale, µ is the overall mean effect and Daysi represents the linear fixed effect of time. Since the plant effects are considered random, it is assumed that they are conditionally independent and follow a normal distribution with mean 0 and variance σ², which will be estimated through a hyperprior. Notationally,

Plantj ∼ N(0, σ²), j = 1, 2, ..., 58.

Finally, to complete the Bayesian model, the following independent and weakly informative prior distributions are considered.

µ ∼ N(0, 100000),

β ∼ N(0, 100000),

γ ∼ N(0, 100000), and

σ ∼ U(0, 100).

Model Fitting

An approximate version of the above defined hierarchical model is fitted with the function glmer, which approximates the posterior densities using Laplace approximation method. Its results are assigned to the object Fit. The function display (Gelman and Su, 2014) with argument detail=TRUE is used to print the results, which are reported in the next section.

Fit <- glmer(cbind(Failures, Demands-Failures) ~ Days + NSSS + (1|Plant),
             data=EDGfailure, family=binomial(link="logit"), nAGQ=1)
display(Fit, detail=TRUE)

Summarizing Output

The function display prints a clear summary of the model fitted with glmer, focusing on the most pertinent pieces of information: the coefficients and their standard errors, z-values and p-values, the sample size, the number of predictors, and the residual standard deviation. The output is reported as follows.

            coef.est  coef.se  z value  Pr(>|z|)
(Intercept)  -4.2809   0.5107  -8.3821    0.0000
Days         -0.0001   0.0001  -0.4767    0.6336
NSSSB        -0.1172   0.6735  -0.1740    0.8618
NSSSC        -0.3018   0.6089  -0.4957    0.6201
NSSSD        -1.3786   0.5934  -2.3231    0.0202

Error terms:
 Groups    Name         Std.Dev.
 Plant     (Intercept)  1.0561
 Residual               1.0000
number of obs: 163, groups: Plant, 58
AIC = 463.2, DIC = 451.2, deviance = 451.2

The top of this output shows the inference about the intercept, the coefficients for Days and NSSS, and their standard errors. It is observed that only the effect of NSSS vendor D relative to vendor A is statistically significant, as the p-value associated with NSSSD is 0.0202. The bottom part gives the estimated standard deviations 1.0561 and 1.000 for the Plant effect and the residual, respectively. Finally, it can also be seen that the model was fit to 163 demands within 58 plants.
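Further pieces of the fitted object can be extracted with the standard lme4 accessor functions; a minimal sketch, assuming the object Fit from above:

fixef(Fit)               # fixed-effect estimates for the intercept, Days, and NSSS
VarCorr(Fit)             # estimated standard deviation of the Plant random effect
head(ranef(Fit)$Plant)   # predicted (conditional) plant-level random effects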

4.2.1.3 Analysis Using JAGS

To make more precise inference, we fit the model using JAGS, because the approximate inference provided by glmer, which does not fully account for uncertainty in the estimated variance parameters, is not entirely reliable. It is still useful as a starting point, however, and Gelman and Hill (2007) recommend performing such a quick fit, if possible, before carrying out the more elaborate inference. In some other settings it may be difficult to get JAGS to run successfully, and we would simply use the inference from glmer.

Data Creation

Let us begin with the creation of the data in a list for fitting the hierarchical logistic regression model using JAGS.

n <- nrow(EDGfailure)
J <- 4; K <- 58
Demands <- EDGfailure$Demands
Days <- EDGfailure$Days
NSSS <- as.integer(factor(EDGfailure$NSSS))
Plant <- as.integer(factor(EDGfailure$Plant))
y <- EDGfailure$Failures
data.jags <- list(n=n, J=J, K=K, Demands=Demands, Days=Days,
                  NSSS=NSSS, Plant=Plant, y=y)

Initial Values

The jags function takes an inits argument, which is a function that may be written for creating starting points. Therefore, it is convenient to specify initial values as a function using random numbers, so that running it for several chains will automatically give different starting points. Typically, we start parameters from random N(0, 1) distributions, unless they are restricted to be positive, in which case we typically use random numbers that are uniformly distributed between 0 and 1, that is, U(0, 1). In this case, the initial values for the parameters µ and β are specified as

inits <- function(){list(mu=rnorm(1), beta=rnorm(1))}

Also, the parameters which should be monitored are

param <- c("mu","beta","alpha","gamma","sigma")

Model Specification

For modeling the EDG demand data in JAGS, the following code is written for the hierarchical logistic model defined in the previous section.

cat("model{ for(i in 1:n){ y[i]~ dbin(theta[i],Demands[i]) cloglog(theta[i])<-mu+beta*Days[i]+gamma[NSSS[i]]+alpha[Plant[i]] # logit(theta[i])<-mu+beta*Days[i]+gamma[NSSS[i]]+alpha[Plant[i]] } ## Priors mu~dnorm(0,0.000001) beta~dnorm(0,0.000001) 4.2. Hierarchical Modeling of Binomial Data 211

gamma[1] <- 0 for(j in 2:J){ gamma[j]~dnorm(0,0.000001)} for(k in 1:K){ alpha[k]~dnorm(0,tau)} ## Hyperprior tau <- 1/pow(sigma,2) sigma~dunif(0,100)}", file="modelEDG.txt")

The regression part of this model is at the beginning. The data distribution is the binomial distribution dbin, where Demands[i] represents the number of demands and theta[i] the failure rate for the ith observation.

Further, to relate the failure rate θi to the covariate Days, our choice of link function is the complementary log-log (cloglog) instead of the logit or probit link, because in this case it is observed that convergence with the complementary log-log link is faster and provides more precise results than the logit or probit links. Furthermore, the random effects and the prior distributions need to be defined. For

the random effects, the normal distribution αk ∼ N(0, τ = 1/σ²) is used, whose hyperparameter σ is estimated. Finally, weakly informative prior distributions are assigned to the model parameters µ, β, and γ, and a uniform prior U(0, 100) is used for the hyperparameter σ.
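The three candidate link functions mentioned above can be compared directly on the probability scale with base R; this is only an illustrative sketch of how the links differ for the small failure rates seen in these data.

theta <- c(0.01, 0.02, 0.05, 0.10, 0.20)    # plausible demand failure rates
cbind(theta,
      logit   = qlogis(theta),              # log(theta/(1-theta))
      probit  = qnorm(theta),               # inverse normal cdf
      cloglog = log(-log(1 - theta)))       # complementary log-log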

Model Fitting

After setting up the data, initial values, and model in the R environment as JAGS needs, we call JAGS from R to run the model for 50000 iterations for each of 3 chains.

FitJags <- jags(data=data.jags, inits=inits, parameters=param,
                model.file="modelEDG.txt", n.chain=3, n.iter=50000)
print(FitJags)

Summarizing Output

Posterior summaries of the model parameters, using a sample of 50000 iterations for each of three chains, are reported in Table 4.2. The hierarchical model is fitted with JAGS using the complementary log-log (cloglog) link function. The convergence diagnostics, on the basis of the Rhat values, suggest good convergence and well-mixed chains.

Parameter   Mean      SD       2.5%      50%       97.5%    Rhat   n.eff
alpha[1]    -0.6638   0.7260   -2.1152   -0.6348    0.6878  1.00   1200
alpha[2]    -0.2231   1.1203   -2.4372   -0.2184    1.8159  1.00   1100
alpha[3]     2.5458   0.4620    1.6573    2.5393    3.4856  1.00   1600
alpha[4]     2.9172   0.5852    1.7806    2.9018    4.1051  1.00   3000
alpha[5]    -0.8931   0.6403   -2.2513   -0.8593    0.2873  1.00   3000
alpha[6]     0.1095   0.4987   -0.8835    0.1151    1.0839  1.00   3000
alpha[7]     0.6060   0.6132   -0.5877    0.6003    1.8470  1.00   3000
alpha[8]     0.3018   0.6557   -0.9972    0.2842    1.5642  1.00   1600
alpha[9]    -0.3928   0.6648   -1.7871   -0.3692    0.8358  1.00   3000
alpha[10]    0.6123   0.6303   -0.6669    0.6234    1.8417  1.01    410
alpha[11]    0.2366   0.7034   -1.2466    0.2593    1.5300  1.01    430
alpha[12]   -0.1383   0.8154   -1.8400   -0.1295    1.3528  1.00   3000
alpha[13]    1.2140   0.7961   -0.3573    1.2234    2.7010  1.00   3000
alpha[14]    1.7080   0.7869    0.2011    1.6847    3.3476  1.00   3000
alpha[15]    1.7930   0.4074    0.9871    1.7823    2.6022  1.00   3000
alpha[16]    0.2967   0.5540   -0.8301    0.3037    1.3640  1.00   1400
alpha[17]   -0.0635   0.5968   -1.3088   -0.0446    1.0589  1.00   1600
alpha[18]    0.5165   0.4874   -0.4609    0.5055    1.4851  1.00   3000
...
alpha[58]   -1.0371   0.9179   -2.9761   -0.9544    0.5986  1.00   1300
beta        -0.0001   0.0001   -0.0004   -0.0001    0.0002  1.00   2900
gamma[1]     0.0000   0.0000    0.0000    0.0000    0.0000  1.00      1
gamma[2]    -0.1834   0.7482   -1.7177   -0.1775    1.2536  1.00   1600
gamma[3]    -0.3481   0.6614   -1.6812   -0.3471    0.9114  1.00   1000
gamma[4]    -1.4733   0.6457   -2.7678   -1.4558   -0.2383  1.00   1100
mu          -4.2274   0.5473   -5.2633   -4.2325   -3.1454  1.01    480
sigma        1.2152   0.1986    0.8779    1.1990    1.6625  1.00   2600
deviance   378.2241  10.3602  359.6193  377.4908  400.3791  1.00   3000

Table 4.2: Summary of posterior densities of the model parameters for the EDG demand data due to the Metropolis-within-Gibbs algorithm using JAGS.

Bugs model at "modelEDG.txt", fit using jags, 3 chains, each with 50000 iterations (first 25000 discarded)

80% interval for each chain R−hat medians and 80% intervals −200 0 200 400 1 1.5 2+ 5 ● alpha[1] ● * ● ● ● ● ● ● [2] ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● [3] alpha 0 ● ● ● ● ● ● ● ● ● ● * ● ● ● ● [4] ● [5] ● [6] ● −5 1 2 3 4 5 6 7 8 910 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 [7] ● [8] ● [9] ● 2e−04 [10] ● [11] ● 0 [12] ● beta ● [13] ● −2e−04 [14] ● [15] ● −4e−04 [16] ● [17] ● ● [18] 400 [19] ● [20] ● 390 ● [21] deviance 380 ● [22] ● 370 [23] ● [24] ● 360 [25] ● [26] ● [27] ● 2 [28] ● ● [29] ● 0 ● ● [30] ● gamma ● [31] ● −2 [32] ● ● [33] −4 1 2 3 4 [34] ● [35] ● [36] ● [37] ● −3 [38] ● [39] ● mu −4 ● [40] ● ●

● beta −5 deviance ● gamma[1] ● [2] ● 1.6 [3] ● [4] ● 1.4 sigma 1.2 ● mu ● 1 −200 0 200 400 1 1.5 2+ 0.8 * array truncated for lack of space

Figure 4.1: Graphical summary of JAGS simulations after being fitted to the hierarchical logistic regression model for the EDG demand data. Rhat is near one for all parameters, indicating good convergence, and the right side shows the posterior inference for each parameter and the deviance.

Table 4.2 suggests that the posterior density of beta is centered at about 0, which means that time has almost no effect on the demand failure rates. It is noted that only vendor D relative to vendor A is concentrated below zero, that is, all its quantiles are negative, indicating that its effect is statistically significant. Moreover, the posterior estimate of sigma is 1.215 ± 0.199 and is statistically significant too, which clearly indicates that there is a plant effect; in other words, different plants have different EDG demand failure rates. A graphical summary of these results can be seen in Figure 4.1. The density plots of the model parameters are depicted in Figure 4.2.


Figure 4.2: Plots of the posterior densities for the parameters β, γ2, γ3, γ4, µ, and σ, respectively.
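Statements such as "all quantiles of the vendor D effect are negative" can also be checked as posterior tail probabilities computed directly from the MCMC draws. The following is only an illustrative sketch: the object name FitEDG for the R2jags fit of modelEDG.txt is assumed here, since the text does not fix a name for that fitted object.

## Illustrative sketch; FitEDG is an assumed name for the R2jags fit of modelEDG.txt
sims <- FitEDG$BUGSoutput$sims.list
mean(sims$beta < 0)        # Pr(beta < 0 | data): probability of a negative time effect
mean(sims$gamma[, 4] < 0)  # Pr(gamma[4] < 0 | data): vendor D relative to vendor A
quantile(sims$sigma, c(0.025, 0.5, 0.975))  # plant-to-plant standard deviation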

4.3 Hierarchical Modeling of Poisson Data

As discussed in the previous Section 4.2, a usual way of building hierarchical models is to express the model as a typical GLM and then add random effects to the linear predictor. In this way, a log-linear Poisson hierarchical model is formulated by adding a random effect to the linear predictor; the random effect is assumed to be normally distributed with mean 0 and variance σ², and a hyperprior is assigned to the parameter σ. Hence, the hierarchical Poisson regression model for count responses yit, observed for time index t = 1, 2, ..., T over the same individuals i = 1, 2, ..., n, can be formulated as

yit ∼ Poisson(θit),

where the log-linear model relates the mean number of counts θit to the linear predictor

log(θit) = β0t + ∑_{j=1}^{k} βjt Xijt + εi,

εi ∼ N(0, σ²),

where the βs are regression coefficients, and the εi are the individual random effects of observations yit for time index t = 1, 2, ..., T.

4.3.1 Implementation

For implementing the hierarchical modeling of Poisson regression, a reliability count data set, the nuclear power plant scram rate data (NPPSRD), is taken from Martz et al. (1999). The data are discussed and reported in the next section.

4.3.1.1 Nuclear Power Plant Scram Rate Data

The reactor protection system is an important frontline safety system in a nuclear power plant. When a transient event occurs, such as a loss of off-site power, the reactor protection system, also called the scram system, rapidly changes the reactor from a critical to a noncritical status. The rate at which unplanned scrams occur is an important consideration in assessing overall plant reliability. Martz et al. (1999) presents unplanned scram rate data for 66 U.S. commercial nuclear power plants during 1984–1993, which are displayed in Table 4.3. The data consist of the

annual number of unplanned scrams yij in Tij total critical operating hours for the ith plant (i = 1, ..., 66) and jth coded year (j = 1, ..., 10). The 66 nuclear plants are believed to be similar, but not identical, and we incorporate their similarity by a

S.N.  Plant                Year  Failure  Time
1     Arkansas1            1984     3     6250.0
2     Arkansas2            1984    12     7643.3
3     BeaverValley1        1984     4     6451.6
4     BigRockPoint         1984     2     6896.6
5     Brunswick2           1984     3     2654.9
6     Callaway             1984    12     1503.8
7     CalvertCliffs1       1984     5     7575.8
8     Cook1                1984     3     8108.1
9     Cook2                1984     7     5303.0
10    CooperStation        1984     3     6000.0
11    CrystalRiver3        1984     2     8333.3
12    Davis-Besse          1984     4     5555.6
13    DiabloCanyon1        1984     5     1084.6
14    Dresden2             1984     3     6521.7
......
658   WashingtonNuclear2   1993     4     6961.5
659   Zion1                1993     1     6987.6
660   Zion2                1993     0     5427.4

Table 4.3: U.S. commercial nuclear power plant scram rate data (number of scrams y in T total critical operating hours) from 1984–1993. These data contain 660 rows and 4 columns. The whole data set is saved in the text file scramRateData.txt. However, to conserve space, only a part of the data is reported in this table.

hierarchical model. Using these data, estimates of trends in the scram rate at each plant over this 10-year period and comparisons to the overall population trend are of interest. The same data are also discussed in Hamada et al. (2008).

4.3.1.2 Analysis Using R

As a starting point, we use the glmer function of R to fit the hierarchical Poisson regression model for the scram rate data. This function uses the log link as the default link function for the Poisson family and approximates the results using the Laplace approximation technique, corresponding to the argument nAGQ=1. Before fitting the model with JAGS, glmer is a convenient way to get a quick approximate inference.

Data Creation

For fitting the model with glmer, the scram rate data are created and assembled in R as follows.

scramRateData <- read.table("scramRateData.txt", header=TRUE)
scramRateData$Plant <- as.integer(factor(scramRateData$Plant))
scramRateData$Year <- scramRateData$Year-1983
scramRateData$Time <- as.vector(scramRateData$Time)

Model Specification

The scram rate data are counts and assumed to follow a Poisson distribution with

mean number of scrams or scram rate θij per 1000 critical operating hours, i.e.,

yij ∼ Poisson(θij Timeij/1000),   i = 1, ..., 66,  j = 1, ..., 10,

where Timeij is the total critical operating hours for the ith plant in the jth year. The log-linear model using the log-link function for θij is defined as

log(θij) = β0 + β1 Yearj + Plant_i,

where β0 is the overall mean effect, Year is the fixed-effect term, and Plant is the random-effect term, which is independent and assumed to follow N(0, σ²).

Model Fitting

The above defined model is fitted with the function glmer of lme4. This function approximates the results using the Laplace approximation method for nAGQ=1. The argument offset is used to add a predictor with its regression coefficient fixed to the value 1, so it is not estimated during optimization. The function display of the arm package is used to print the results, which are reported in the next section.

library(lme4)   ## provides glmer
library(arm)    ## provides display
Fitglmer <- glmer(Failure ~ Year + (1|Plant), family=poisson(link="log"),
                  offset=log(Time/1000), data=scramRateData)
display(Fitglmer, detail=TRUE)

Summarizing Output

The following output displays the approximate inference about the intercept, the coefficient for Year, and their standard errors. It is noted that the effect of Year is significant with p-value < 2e-16. The estimated standard deviations for Plant and Residual are 0.411 and 1.000. At the end, it can also be noted that the model was fit to 660 observations of unplanned scrams within 66 plants.

            coef.est  coef.se  z value   Pr(>|z|)
(Intercept)  0.0445    0.0668    0.6668   0.5049
Year        -0.1938    0.0087  -22.1621   0.0000

Error terms:
 Groups    Name         Std.Dev.
 Plant     (Intercept)  0.4105
 Residual               1.0000
number of obs: 660, groups: Plant, 66
AIC = 2740.3, DIC = 2734.3, deviance = 2734.3
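For completeness, the pieces of this output can also be extracted directly from the fitted object; the following is a small sketch using standard lme4 accessor functions applied to the Fitglmer object created above.

fixef(Fitglmer)              # fixed-effect estimates: intercept and Year
VarCorr(Fitglmer)            # estimated standard deviation of the Plant random effect
head(ranef(Fitglmer)$Plant)  # predicted plant-level random effects (first few plants)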

As discussed and illustrated in this and the previous sections, the function glmer of the lme4 package for R can fit many hierarchical linear and generalized linear regression models. The approximate results obtained from glmer are useful as a starting point, but they can run into problems when the sample size or the number of groups is small or the hierarchical model is complicated; there just might not be enough information to estimate variance parameters precisely. At that point, it is useful to switch to Bayesian inference using the JAGS program to obtain more precise inference and a better account of uncertainty in model fitting.

4.3.1.3 Analysis Using JAGS

In this section, the JAGS program is used to get more precise inference about the posterior densities of the hierarchical Poisson regression model fitted to the scram rate data. For this, we begin with the creation of data and model specification, and then finally fit the specified model using JAGS called from R via R2jags in the following steps.

Data Creation

Let us start with the creation of data in R for the Bayesian hierarchical modeling of scram rate data using JAGS.

scramRateData <- read.table("scramRateData.txt", header=TRUE)
n <- 660
J <- 66
y <- scramRateData$Failure
Year <- scramRateData$Year-1983
Time <- as.numeric(scramRateData$Time)
Plant <- as.integer(factor(scramRateData$Plant))
data.jags <- list(n=n, J=J, y=y, Year=Year, Time=Time, Plant=Plant)

Initial Values

The starting values for the parameters β0, β1, and σ are specified in the following list statement:

inits <- function(){
  list(beta0=rnorm(1), beta1=rnorm(1), sigma=runif(1))
}

Model Specification

For modeling the scram rate data, the number of (unplanned) scrams is assumed to follow a Poisson distribution with scram rate, or mean number of scrams, θij per 1000 critical operating hours

yij ∼ Poisson(θij Timeij/1000),   i = 1, ..., 66,  j = 1, ..., 10,

where Timeij is the total critical operating hours for the ith plant and jth year. The log-linear model relating the scram rate to the linear predictor of covariates, including random effects, is defined as

log(θij) = β0 + β1Yearj + εi,

where the βs are the fixed-effect regression coefficients, Yearj represents the coded year, that is, Year − 1983, and εi represents the plant effect, which is random and assumed to follow a normal distribution with mean 0 and variance σ² as

εi ∼ N(0, σ²).

The following independent and weakly informative prior distributions are used for the parameters of the Bayesian model.

β0 ∼ N(0, 10⁶)

β1 ∼ N(0, 10⁶)

σ ∼ U(0, 100)

Thus, the specification of this model in JAGS language can be written as

cat("model{ for(i in 1:n){ 4.3. Hierarchical Modeling of Poisson Data 221

y[i] ~ dpois(theta[i]*Time[i]/1000) log(theta[i])<-beta0+beta1*Year[i]+epsilon[Plant[i]]} for(j in 1:J){ epsilon[j]~dnorm(0, tau)} ## Prior beta0~dnorm(0, 0.000001) beta1~dnorm(0, 0.000001) ## Hyperprior tau<- 1/pow(sigma, 2) sigma~dunif(0, 100) }", file="modelScramRate.txt")

At the beginning, the code shows the Poisson regression part of the model, running from 1 to the total number of observations n. The data distribution is the Poisson dpois, where theta[i] represents the scram rate for the ith observation and Time[i] is the total critical operating hours for the ith observation. Further, to relate the scram rate theta[i] to the covariate Year[i], the log link function is chosen. Furthermore, we need to define the random effects and the prior distributions. For the random effects, the normal distribution εj ∼ N(0, σ²), coded in JAGS with precision τ = 1/σ², is used, and its hyperparameter σ is to be estimated. Finally, weakly informative prior distributions are assigned to the model parameters β0 and β1, and a weakly informative uniform prior U(0, 100) is used for the hyperparameter σ.

Model Fitting

Using these data and the specified model, the function jags is called to run the MCMC simulation and obtain posterior inference about the model parameters, and its results are assigned to the object Fit. The vector parm lists the parameters to be monitored. After running the three chains, each for 15000 iterations, the summary output can conveniently be printed by just typing Fit at the R console.

set.seed(123)
parm <- c("beta0", "beta1", "epsilon", "sigma")   ## parameters to monitor
Fit <- jags(data=data.jags, inits=inits, param=parm, n.chain=3,
            n.iter=15000, model.file="modelScramRate.txt")
Fit

Summarizing Output

A summary of the posterior densities after being fitted with jags for 15000 iterations and 3 chains is reported in the following table. The convergence diagnostic Rhat indicates good convergence and well-mixed chains.

Parameter      Mean       SD       2.5%       50%       97.5%    Rhat  n.eff
beta0         0.0451    0.0691   -0.0892     0.0447    0.1782    1.00  1200
beta1        -0.1940    0.0088   -0.2113    -0.1939   -0.1767    1.00  3200
epsilon[1]   -0.1297    0.1897   -0.5185    -0.1205    0.2225    1.00  3200
epsilon[2]    0.1885    0.1661   -0.1509     0.1906    0.5084    1.00  1400
epsilon[3]    0.0394    0.1736   -0.3113     0.0409    0.3688    1.00  3200
epsilon[4]   -0.5393    0.2185   -1.0028    -0.5322   -0.1370    1.00  3200
epsilon[5]   -0.0193    0.1932   -0.4096    -0.0127    0.3511    1.00  1300
epsilon[6]    0.6533    0.1424    0.3627     0.6547    0.9319    1.00  3200
epsilon[7]    0.0843    0.1766   -0.2727     0.0896    0.4163    1.00  3200
epsilon[8]   -0.3427    0.2055   -0.7611    -0.3366    0.0538    1.00  3200
epsilon[9]    0.1329    0.1859   -0.2462     0.1385    0.4900    1.00  1800
epsilon[10]  -0.1794    0.1964   -0.5660    -0.1767    0.1962    1.00  3100
epsilon[11]  -0.1030    0.1949   -0.4891    -0.0962    0.2634    1.00  3200
epsilon[12]   0.1091    0.1961   -0.2869     0.1113    0.4786    1.00  3200
epsilon[13]   0.4009    0.1620    0.0866     0.3992    0.7165    1.00  3200
epsilon[14]   0.0793    0.1757   -0.2731     0.0895    0.4085    1.00  3200
epsilon[15]   0.3372    0.1723   -0.0139     0.3399    0.6584    1.00  3200
epsilon[16]  -0.0384    0.1825   -0.4146    -0.0320    0.3051    1.00  3200
......
epsilon[64]   0.8734    0.1364    0.6094     0.8733    1.1410    1.00  1100
epsilon[65]  -0.1380    0.2003   -0.5446    -0.1344    0.2401    1.00  1300
epsilon[66]  -0.2303    0.1982   -0.6288    -0.2225    0.1433    1.00  3200
sigma         0.4255    0.0483    0.3396     0.4228    0.5268    1.00  3200
deviance   2621.7537   11.6209 2600.5125  2621.4879 2646.0389    1.00  1900

Table 4.4: Summary of posterior densities of model parameters for scram rate data due to MCMC sampler using the JAGS.
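Beyond the Rhat column reported by R2jags, convergence can also be examined by converting the fit to a coda object. The following is a brief sketch; it assumes the R2jags object Fit created above and uses standard functions from the coda package (cited in the bibliography).

library(coda)
Fit.mcmc <- as.mcmc(Fit)   # mcmc.list with one element per chain
gelman.diag(Fit.mcmc[, c("beta0", "beta1", "sigma")])   # potential scale reduction factors
effectiveSize(Fit.mcmc[, c("beta0", "beta1", "sigma")]) # effective sample sizes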

Bugs model at "modelScramRate.txt", fit using jags, 3 chains, each with 15000 iterations (first 7500 discarded)

80% interval for each chain R−hat medians and 80% intervals −1000 0 1000 2000 3000 1 1.5 2+

beta0 ● 0.2

● beta1 0.1 ● deviance beta0 ● ● 0 * epsilon[1] [2] ● [3] ● −0.1 [4] ● [5] ● [6] ● [7] ● −0.18 [8] ● [9] ● ● −0.19 [10] ● [11] ● beta1 [12] ● −0.2 [13] ● [14] ● −0.21 [15] ● [16] ● [17] ● [18] ● 2640 [19] ● [20] ● ● [21] ● 2620 ● [22] ● deviance [23] ● [24] ● [25] ● 2600 [26] ● [27] ● [28] ● [29] ● 1 ● ● ● ● [30] ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● [31] ● ● ● ● 0 ● ● ● ● ● ● ● ● ● ● ● [32] ● ● ● ● epsilon ● ● ● [33] ● * ● [34] ● −1 [35] ● ● −2 [36] 1 2 3 4 5 6 7 8 910 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 [37] ● [38] ● [39] ● [40] ● 0.5 [41] ● ● [42] 0.45 [43] ● sigma ● [44] ● [45] ● 0.4 −1000 0 1000 2000 3000 1 1.5 2+ 0.35 * array truncated for lack of space

Figure 4.3: Graphical summary of JAGS simulations after being fitted to the hierarchical Poisson regression model for scram rate data. The left panel of the figure shows the convergence diagnostics for the chains, whereas the right panel gives a summary of the Bayesian inference about the model parameters and the deviance. Variability in plant effects is evident from the estimates of the random effects epsilon.

From Table 4.4, it is observed that the posterior estimate of beta0 is 0.0451 ± 0.0691. The posterior estimate of beta1 is −0.1940 ± 0.0088, and all its posterior quantiles are negative, which indicates that the scram rate decreases over time. Moreover, the posterior estimate of sigma is 0.4255 ± 0.0483 and statistically significant too, which indicates that there is a significant plant effect on the scram rate. A graphical summary of these results is reported in Figure 4.3. The plots of the posterior densities of the model parameters are depicted in Figure 4.4.


Figure 4.4: Plots of the posterior densities for the parameters β0, β1, and σ, respectively. The symmetric shapes of the posteriors show that the normal approximation of the marginal densities is good.

4.3.1.4 Prediction for a New Response

In practice, we are often interested in simulating draws from the posterior distribution of the model parameters, and perhaps from the posterior predictive distribution of unknown observables (say ỹ, y^rep, or y^new),

p(ỹ | y) = ∫ p(ỹ | θ) p(θ | y) dθ.

For example, this may be used to fill in missing values or censored data, or to predict a replicate data set in order to check the adequacy of the model. Finally, our interest may be in making predictions about the future.

In the classical paradigm, it would not be straightforward to make full predictions after fitting a model. Although point prediction of a new or future quantity ỹ may be easy, obtaining an appropriate full predictive distribution for ỹ is challenging, as one needs to account for three components: uncertainty about the expected future value E(ỹ), the inevitable sampling variability of ỹ around its expectation, and the uncertainty about the size of that error, as well as the correlations between these components (Lunn et al., 2013). Fortunately, with tools like the MCMC sampler it is now straightforward to obtain such predictive distributions, so the topic can be dealt with briefly.
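Concretely, given posterior draws θ^(s), s = 1, ..., S, from the MCMC output, the predictive distribution is approximated by simulating one future observation per draw,

ỹ^(s) ∼ p(ỹ | θ^(s)),   s = 1, ..., S,

so that the collection {ỹ^(1), ..., ỹ^(S)} is a sample from p(ỹ | y); this automatically propagates all three sources of uncertainty listed above.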

In this section, our goal is to make a prediction for a new or future number of failures for the scram rate data. For this, we can simulate predictions in two ways: directly in JAGS, by adding an additional unit to the model, or in R, by drawing simulations based on the model fit to the existing data. For these data, the first approach is used to predict the future number of failures; a sketch of the second approach is given after Table 4.5.

Thus, to predict the future number of failures, the data set merely needs to be extended by one point indicated as NA, and a line y.tilde <- y.new[n.new] is added at the end of the JAGS model. The complete code written in the JAGS program is as follows. After this setting, JAGS automatically makes predictions for modeled data with NA values using the MCMC algorithm. The numeric and graphic summaries of this output can be seen in Table 4.5 and Figure 4.5, respectively. Figure 4.6 presents the histogram of the new response ỹ.

########################### Data Creation ##########################
n.new <- n+1
J.new <- J ## Number of plants are same
y.new <- c(scramRateData$Failure, NA)
Year.new <- c(scramRateData$Year-1983, 1)
Time.new <- c(as.numeric(scramRateData$Time), mean(scramRateData$Time))
Plant.new <- c(as.integer(factor(scramRateData$Plant)), 1)
data.new <- list(n.new=n.new, J.new=J.new, y.new=y.new, Year.new=Year.new,
                 Time.new=Time.new, Plant.new=Plant.new)
######################## Model Specification #######################
cat("model{
  for(i in 1:n.new){
    y.new[i] ~ dpois(lambda[i]*Time.new[i]/1000)
    log(lambda[i]) <- beta0+beta1*Year.new[i]+epsilon[Plant.new[i]]
  }
  for(j in 1:J.new){
    epsilon[j] ~ dnorm(0, tau)
  }
  ## Prior
  beta0 ~ dnorm(0, 0.000001)
  beta1 ~ dnorm(0, 0.000001)
  ## Hyperprior
  tau <- 1/pow(sigma, 2)
  sigma ~ dunif(0, 100)
  y.tilde <- y.new[n.new]
}", file="modelScramRatePred.txt")
########################## Initial Values ##########################
inits <- function(){
  list(beta0=rnorm(1), beta1=rnorm(1), sigma=runif(1))
}
######################## Parameter to save #########################
parm <- c("beta0", "beta1", "epsilon", "sigma", "y.tilde")
########################### Model Fitting ##########################
Fit.new <- jags(data=data.new, inits=inits, param=parm, n.chain=3,
                n.iter=15000, model.file="modelScramRatePred.txt")
Fit.new
####################################################################

Similarly, we can make predictions for any number of new failures by adding additional NA's to the end of the data vector. It is necessary to specify the predictors for these new failures; if we set Plant or Year to NA for any of the predictors, the JAGS model will not run.

Parameter      Mean      SD      2.5%      50%      97.5%   Rhat  n.eff
beta0         0.0468   0.068    -0.084     0.047    0.183   1.00  3200
beta1        -0.1940   0.009    -0.211    -0.194   -0.177   1.00  3200
epsilon[1]   -0.1325   0.187    -0.517    -0.127    0.211   1.00  1800
epsilon[2]    0.1900   0.164    -0.149     0.193    0.503   1.00  3200
epsilon[3]    0.0431   0.170    -0.302     0.050    0.370   1.00  3200
epsilon[4]   -0.5387   0.218    -0.985    -0.532   -0.135   1.00  3200
epsilon[5]   -0.0126   0.197    -0.412    -0.007    0.358   1.00  1700
epsilon[6]    0.6538   0.145     0.368     0.659    0.933   1.00  3200
epsilon[7]    0.0876   0.178    -0.280     0.090    0.429   1.00  2600
epsilon[8]   -0.3419   0.202    -0.769    -0.329    0.033   1.00  3200
epsilon[9]    0.1377   0.184    -0.236     0.140    0.495   1.00  2000
epsilon[10]  -0.1862   0.201    -0.593    -0.181    0.201   1.00  3200
epsilon[11]  -0.1109   0.194    -0.515    -0.106    0.244   1.00  3200
epsilon[12]   0.1021   0.192    -0.279     0.105    0.462   1.00  3200
epsilon[13]   0.3982   0.163     0.070     0.399    0.709   1.00  2000
epsilon[14]   0.0684   0.180    -0.294     0.073    0.399   1.00  3200
epsilon[15]   0.3362   0.172    -0.013     0.338    0.669   1.00  3200
epsilon[16]  -0.0482   0.188    -0.428    -0.044    0.304   1.00  3200
epsilon[17]  -0.2654   0.193    -0.659    -0.269    0.100   1.00  3200
epsilon[18]   0.1069   0.163    -0.225     0.109    0.419   1.00   960
epsilon[19]  -0.7066   0.234    -1.179    -0.695   -0.271   1.00  3200
epsilon[20]  -0.1603   0.182    -0.537    -0.153    0.174   1.00  1600
epsilon[21]   0.7248   0.145     0.428     0.728    0.994   1.00  3200
epsilon[22]  -0.0759   0.188    -0.453    -0.073    0.280   1.00  3200
epsilon[23]   0.4297   0.157     0.116     0.436    0.733   1.00  1400
epsilon[24]   0.3157   0.163    -0.020     0.317    0.626   1.00  3200
epsilon[25]   0.3099   0.166    -0.030     0.314    0.626   1.00  3200
......
epsilon[64]   0.8730   0.138     0.601     0.875    1.138   1.00  3200
epsilon[65]  -0.1342   0.195    -0.521    -0.132    0.231   1.00  3200
epsilon[66]  -0.2248   0.199    -0.620    -0.222    0.159   1.00  1500
sigma         0.4240   0.048     0.338     0.421    0.530   1.00  3200
y.tilde       5.0970   2.487     1.000     5.000   10.000   1.00  3200
deviance   2621.6050  11.524  2600.409  2621.021 2645.630   1.00  2600

Table 4.5: Posterior summary of hierarchical model for scram rate data, including the posterior predictive density of new response ỹ for Plant 1 and year 1984.
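The second approach mentioned earlier, drawing predictive simulations in R from the already fitted model, can be sketched as follows. This is only an illustrative sketch: it assumes the R2jags object Fit from Section 4.3.1.3 and its BUGSoutput$sims.list component, and it reproduces the same hypothetical new case (Plant 1, coded year 1, average critical operating hours) that was predicted in JAGS.

## Sketch of the R-based predictive simulation for a new case (Plant 1, Year 1)
sims <- Fit$BUGSoutput$sims.list            # posterior draws of beta0, beta1, epsilon
Time.pred <- mean(scramRateData$Time)       # exposure for the hypothetical new case
log.theta <- sims$beta0 + sims$beta1*1 + sims$epsilon[, 1]
y.tilde.R <- rpois(length(log.theta), exp(log.theta)*Time.pred/1000)
quantile(y.tilde.R, c(0.025, 0.5, 0.975))   # compare with y.tilde in Table 4.5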

Bugs model at "modelScramRatePred.txt", fit using jags, 3 chains, each with 15000 iterations (first 7500 discarded)

80% interval for each chain R−hat medians and 80% intervals −1000 0 1000 2000 3000 1 1.5 2+ 0.2 beta0 ● 0.1 beta1 ● beta0 ● deviance ● 0 * epsilon[1] ● −0.1 [2] ● [3] ● [4] ● [5] ● −0.18 [6] ● ● −0.19 [7] ● ● beta1 [8] −0.2 [9] ● [10] ● −0.21 [11] ● [12] ● [13] ● [14] ● 2640 [15] ● ● [16] ● [17] ● deviance 2620 [18] ● ● [19] 2600 [20] ● [21] ● [22] ● [23] ● 1 ● ● ● ● ● ● [24] ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● ● ● [25] ● ● ● ● ● ● ● ● [26] ● * epsilon ● [27] ● −1 [28] ● −2 [29] ● 1 2 3 4 5 6 7 8 910 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 [30] ● [31] ● [32] ● 0.5 [33] ● [34] ● 0.45 [35] ● sigma ● [36] ● 0.4 [37] ● [38] ● 0.35 [39] ● [40] ● [41] ● 10 [42] ● [43] ● y.tilde 5 ● sigma ●

−1000 0 1000 2000 3000 1 1.5 2+ 0 * array truncated for lack of space

Figure 4.5: Graphical posterior summary of JAGS simulation including the posterior predictive density of new response ỹ for Plant 1 and year 1984.

4.4 Hierarchical Modeling of Lifetime Data

Hierarchical modeling is an increasingly important tool in the analysis of complex data that arise frequently in modern reliability applications. The preceding sections discussed the hierarchical modeling of discrete reliability data in the Bayesian paradigm and showed its use in handling the complexity often found in reliability applications. In this section, hierarchical modeling of lifetime data with covariates is considered.


Figure 4.6: Histogram for the new response ỹ obtained using the simulation technique via JAGS.

4.4.1 Implementation

It is always advantageous to fit the lifetime data model dictated or suggested by the science or engineering of the problem. When such theoretical models are not available, there are several models to choose from that have proven useful in practice. Three very common distributions for survival or lifetime data that might be considered are the gamma, log-normal, and Weibull distributions, which are defined for random variables that take positive values. The first two distributions suffer from the considerable disadvantage that their survival (or reliability) and hazard functions cannot be expressed in closed form. One limitation of the Weibull distribution is that its hazard function is a monotonic function of time. So, at this stage, alternatives to these distributions need to be considered. One alternative is the log-logistic distribution, which is defined for positive-valued random variables. The properties of the log-logistic distribution make it a particularly

attractive alternative to these lifetime distributions. Its closed-form expressions for the reliability and hazard functions make it advantageous over the gamma and log-normal distributions. Moreover, because of the non-monotonic (first increasing and then decreasing) nature of its hazard function, it fits better than the log-normal and Weibull distributions. Thus, in this section, the log-logistic distribution is considered to model lifetime data using prior distributions in hierarchy.
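For reference, in one common parameterization with scale parameter a > 0 and shape parameter b > 0 (not to be confused with the regression parameters α and β used below), the closed-form reliability and hazard functions referred to above are

S(t) = 1 / (1 + (t/a)^b),    h(t) = (b/a)(t/a)^(b−1) / (1 + (t/a)^b),    t > 0,

and h(t) is non-monotonic, first increasing and then decreasing, whenever b > 1.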

For the implementation of hierarchical modeling of lifetime data using the log-logistic distribution, let us introduce a bearing fatigue lifetime data set taken from Ku et al. (1972). Moreover, for the purpose of hierarchical Bayesian analysis of these data, analytic approximation and simulation methods are implemented using the LaplacesDemon software package. For cross-validation of the simulated results obtained from LaplacesDemon using the Independence Metropolis (IM) algorithm, the JAGS package is used.

4.4.1.1 Bearing Fatigue Failure Times Data

Ku et al. (1972) report on fatigue testing of bearings used with a particular lubricant and tested with 10 different testers, that is, bench-type rigs, and found that the testers impacted the measured failure times. Table 4.6 presents the bearing failure time data (in hours) that they collected when they used the aviation gas turbine lubricant O-64-2. The random variable is the failure time T, and reliability is the probability that a bearing does not fail before time t.

4.4.1.2 Analysis Using LaplacesDemon

Fitting with LaplaceApproximation

For fitting the hierarchical regression model, the function LaplaceApproximation is used, which deterministically maximizes the logarithm of the unnormalized joint posterior density using one of several optimization techniques.

Tester  Failure Times
1    130.3 135.2 152.4 161.7  74.0 155.0 141.2 167.8 137.2 110.1
2    243.6 242.1 239.0 202.1 190.5 159.8 275.5 192.4 183.8 203.7
3     71.3 137.8 101.2  75.3 164.5 113.9  54.7 224.0 171.7 226.5
4    183.4 276.9 210.3 262.8 115.3 242.2 293.5 221.3 108.9 191.5
5    132.9  74.0 169.2 126.4  79.9 139.7 139.0 104.3 100.2 108.2
6    117.9 168.4 153.7 174.7  65.8 158.4 115.7 133.4 171.4 203.0
7    208.5 135.2 217.7 158.5 215.7 136.6 223.3 188.2 190.3 159.8
8    167.5 164.6 215.6 118.3 151.1 166.5 162.6 215.6 171.6 207.6
9     94.2 113.0 180.2  90.4 118.0 101.8  97.8 104.6 154.9 181.3
10   138.0 134.4 200.8 202.7 181.6 126.9  80.0 152.6 173.1 169.5

Table 4.6: Bearing fatigue failure times (in hours) for lubricant O-64-2.

Data Creation

In the bearing fatigue failure time data, the logarithm of failureTime is the response variable and tester is the grouping (regressor) variable, coded as the integer index x. The intercept enters through the separate parameter alpha, so no column of 1's is needed in a design matrix here. The constants N = 100 and J = 10 give the number of observations and the number of testers, respectively; the tester effects beta have J − 1 free elements, the last element being recovered through a sum-to-zero constraint inside the Model function below.

fatigue <- read.table("fatigue.txt", header=TRUE)
N <- 100
J <- 10
x <- as.integer(factor(fatigue$tester))
y <- log(fatigue$failureTime)
mon.names <- c("LP")
parm.names <- as.parm.names(list(alpha=0, beta=rep(0,(J-1)),
                                 log.sigma=rep(0,2)))
PGF <- function(Data) return(c(rnorm(1,0,100), rnorm(Data$J-1,0,1),
                               log(rhalfcauchy(2,25))))
MyData <- list(J=J, N=N, PGF=PGF, mon.names=mon.names,
               parm.names=parm.names, x=x, y=y)

In this case, there are three sets of parameters, alpha, beta, and log.sigma, which are specified in the vector parm.names. The log-posterior LP is included as a monitored variable in the vector mon.names. The number of observations is specified by N. Finally, all these objects are combined into the list MyData.

Initial Values

To start the algorithm, an initial value for each parameter needs to be specified. For this, alpha and all the beta parameters have been set equal to zero, and the remaining parameters, the log.sigmas, have been set equal to log(1). The order of the elements of the initial values must match the order of the parameters. Thus, define a vector of initial values

Initial.Values <- c(0, rep(0, 9), log(1), log(1))

Model Specification

For modeling the bearing fatigue failure time data, the logarithm of failureTime is assumed to follow a logistic distribution, which is often written as

yi ∼ Logistic(µi, σ1),

and expectation vector µi is equal to

µi = α + β[Xi], i = 1,...,N.

Prior probability and hyperprior probability distributions are specified respectively for first and second stage parameters as follows,

α ∼ N(0, 1000),                (4.1)

βj ∼ N(0, σ₂²),   j = 1, ..., (J − 1),

σ1:2 ∼ HC(25).

The large variance, or small precision, indicates a lot of uncertainty about α and hence gives a weakly informative prior distribution. Similarly, the half-Cauchy is a weakly informative prior for each σ. Thus, to specify the above defined Bayesian hierarchical model in Laplace's Demon, create a function called Model as

Model <- function(parm, Data)
{
  ### Parameters
  alpha <- parm[1]
  beta <- rep(NA, Data$J)
  beta[1:(Data$J-1)] <- parm[2:Data$J]
  beta[Data$J] <- -sum(beta[1:(Data$J-1)])  # Sum-to-zero constraint
  sigma <- exp(parm[grep("log.sigma", Data$parm.names)])
  ### Log(Prior Densities)
  alpha.prior <- dnorm(alpha, 0, 100, log=TRUE)
  beta.prior <- sum(dnorm(beta, 0, sigma[2], log=TRUE))
  sigma.prior <- sum(dhalfcauchy(sigma, 25, log=TRUE))
  ### Log-Likelihood
  mu <- alpha + beta[Data$x]
  LL <- sum(dlogis(Data$y, mu, sigma[1], log=TRUE))
  ### Log-Posterior
  LP <- LL + alpha.prior + beta.prior + sigma.prior
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=c(LP),
                   yhat=rlogis(length(mu), mu, sigma[1]), parm=parm)
  return(Modelout)
}

In the beginning, the Model function contains two arguments, that is, parm and Data, where parm is for the set of parameters, and Data is the list of data. There are three parameters alpha, beta and sigma having priors alpha.prior, beta.prior and sigma.prior, respectively. The object LL stands for loglikelihood and LP stands for logposterior. The function Model returns the object Modelout, which contains five objects in listed form that includes logposterior LP, deviance Dev, monitoring parameters Monitor, predicted values yhat and estimates of parameters parm.

Model Fitting

To fit the above specified model, the function LaplaceApproximation is used and its results are assigned to the object Fit. A summary of the results is printed by the function print; the full output is too detailed to show here, but its relevant parts are summarized in the next section.

Fit <- LaplaceApproximation(Model=Model, Initial.Values, Data=MyData,
                            Iterations=10000)
print(Fit)

Summarizing Output

The function LaplaceApproximation approximates the posterior density of the fitted model, and posterior summaries can be seen in the following tables. Table 4.7 presents the analytic approximation results using the Laplace approximation method, while Table 4.8 presents the simulated results using the sampling importance resampling method.

From these posterior summaries, it is obvious that the posterior mode of the intercept parameter α is 5.038 ± 0.027, whereas the posterior modes of log(σ1) and

Parameter      Mode     SD      LB      UB
alpha          5.038   0.027   4.984   5.091
beta[1]       -0.095   0.067  -0.230   0.040
beta[2]        0.261   0.072   0.117   0.405
beta[3]       -0.151   0.090  -0.332   0.029
beta[4]        0.250   0.083   0.084   0.416
beta[5]       -0.235   0.076  -0.387  -0.083
beta[6]       -0.039   0.071  -0.180   0.103
beta[7]        0.137   0.071  -0.004   0.278
beta[8]        0.093   0.068  -0.042   0.228
beta[9]       -0.228   0.079  -0.387  -0.070
log.sigma[1]  -1.884   0.087  -2.058  -1.711
log.sigma[2]  -1.755   0.290  -2.336  -1.174

Table 4.7: Summary of the asymptotic approximation using the function LaplaceApproximation. It may be noted that these summaries are based on asymptotic approximation, and hence Mode stands for posterior mode, SD for posterior standard deviation, and LB, UB are 2.5% and 97.5% quantiles, respectively.

log(σ2) are −1.884 ± 0.087 and −1.755 ± 0.290, respectively. These parameters are also statistically significant. The simulation tool is discussed in the next section.

Fitting with LaplacesDemon

In this section, a simulation method is applied to analyze the same data with the function LaplacesDemon, which simulates from the logarithm of the unnormalized joint posterior density using an MCMC algorithm.

Model Fitting

Now, the prespecified model is fitted with the function LaplacesDemon, and its results are assigned to the object FitDemon. A summary of the results is printed with the function print, and its relevant parts are summarized in the next section.

set.seed(666)
Initial.Values <- as.initial.values(Fit)
FitDemon <- LaplacesDemon(Model, Data=MyData, Initial.Values,
                          Covar=Fit$Covar, Iterations=1000, Status=100,
                          Thinning=1, Algorithm="IM",
                          Specs=list(mu=Fit$Summary1[1:length(Initial.Values),1]))
print(FitDemon)

Parameter      Mean     SD      MCSE   ESS      LB     Median      UB
alpha          5.031   0.028   0.001  1000    4.969    5.030    5.089
beta[1]       -0.096   0.072   0.002  1000   -0.239   -0.099    0.042
beta[2]        0.270   0.078   0.002  1000    0.116    0.272    0.405
beta[3]       -0.150   0.090   0.003  1000   -0.346   -0.155    0.031
beta[4]        0.243   0.089   0.003  1000    0.067    0.246    0.402
beta[5]       -0.242   0.076   0.002  1000   -0.387   -0.242   -0.079
beta[6]       -0.049   0.079   0.002  1000   -0.232   -0.044    0.090
beta[7]        0.136   0.080   0.003  1000   -0.011    0.140    0.283
beta[8]        0.098   0.072   0.002  1000   -0.035    0.097    0.239
beta[9]       -0.216   0.084   0.003  1000   -0.381   -0.210   -0.030
log.sigma[1]  -1.831   0.084   0.003  1000   -1.998   -1.825   -1.663
log.sigma[2]  -1.627   0.294   0.009  1000   -2.237   -1.626   -1.134
Deviance      34.397   5.072   0.160  1000   26.184   34.168   43.741
LP           -27.892   2.433   0.077  1000  -32.125  -27.853  -23.678

Table 4.8: Summary of the simulated results due to the sampling importance resampling method using the same function, where Mean stands for posterior mean, SD for posterior standard deviation, MCSE for Monte Carlo standard error, ESS for effective sample size, and LB, Median, UB are 2.5%, 50%, 97.5% quantiles, respectively.

Summarizing Output

The function LaplacesDemon simulated from the posterior density via the IM algorithm, and the resulting summaries can be seen in Table 4.9. This table presents the simulated results in a matrix form that summarizes the marginal posterior densities of the parameters based on the stationary samples; it contains the mean, standard deviation, MCSE (Monte Carlo standard error), ESS (effective sample size), and the 2.5%, 50%, and 97.5% quantiles, respectively.

Parameter      Mean     SD      MCSE   ESS      LB     Median      UB
alpha          5.038   0.006   0.000  1000    5.026    5.038    5.050
beta[1]       -0.096   0.012   0.000  1000   -0.120   -0.095   -0.072
beta[2]        0.273   0.019   0.001  1000    0.236    0.272    0.308
beta[3]       -0.156   0.024   0.001  1000   -0.204   -0.157   -0.111
beta[4]        0.253   0.021   0.001   845    0.214    0.253    0.294
beta[5]       -0.238   0.010   0.000  1000   -0.257   -0.238   -0.219
beta[6]       -0.041   0.024   0.001   766   -0.087   -0.042    0.005
beta[7]        0.133   0.015   0.000  1000    0.104    0.133    0.160
beta[8]        0.099   0.012   0.000  1000    0.076    0.099    0.122
beta[9]       -0.223   0.019   0.001  1000   -0.262   -0.223   -0.187
log.sigma[1]  -1.884   0.020   0.001  1000   -1.924   -1.883   -1.846
log.sigma[2]  -1.749   0.065   0.002   856   -1.871   -1.749   -1.620
Deviance      24.384   0.937   0.032  1000   22.913   24.246   26.511
LP           -21.898   0.213   0.008   791  -22.451  -21.861  -21.615

Table 4.9: Posterior summary of simulated results due to stationary samples.
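Note that Tables 4.7–4.9 summarize log(σ1) and log(σ2), while interpretation is often easier on the σ scale. A small sketch of the back-transformation from the stationary draws is given below; the slot name Posterior2 for the stationary samples and the column names taken from parm.names are assumed from the LaplacesDemon package conventions.

## Sketch (slot and column names assumed): transform log.sigma draws back to sigma
sigma.draws <- exp(FitDemon$Posterior2[, c("log.sigma[1]", "log.sigma[2]")])
colnames(sigma.draws) <- c("sigma1", "sigma2")
apply(sigma.draws, 2, quantile, probs=c(0.025, 0.5, 0.975))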

4.4.1.3 Analysis Using JAGS

To draw the posterior inference for the log-logistic hierarchical lifetime model, the function jags is used to simulate from the posterior and approximate the results using an MCMC algorithm. Thus, for fitting the model using JAGS, the R and JAGS code for data creation and model specification, respectively, are written in the following steps.

Data Creation

The bearing fatigue failure time data are created in R as

fatigue <- read.table("fatigue.txt", header=TRUE)
time <- fatigue$failureTime
n <- length(time)
y <- log(time)
tester.name <- as.integer(factor(fatigue$tester))
uniq.name <- unique(tester.name)
J <- length(uniq.name)
tester <- rep(NA, n)
for(i in 1:J){tester[tester.name==uniq.name[i]] <- i}
data.jags <- list("n", "J", "y", "tester")

Initial Values

The initial values for the model parameters to start the chains are specified as

inits <- function(){
  list(alpha=rnorm(1), beta=rnorm(J), sigma=runif(1), sigma.a=runif(1))
}

Model Specification

The hierarchical model for the bearing fatigue data is formulated as

yi ∼ Logistic(µi, τ)

µi = α + β[testeri], i = 1, 2,...,N.

The weakly informative prior and hyperprior distributions for the model parameters are considered as

α ∼ N(0, 1.0E-5)

τ = 1/σ

σ ∼ U(0, 100)

βj ∼ N(0, τa),   j = 1, 2, ..., J

τa = 1/σa²

σa ∼ U(0, 100)

where, following the JAGS convention, the second argument of each normal distribution is a precision.

Thus, the following JAGS code is written to define the model.

cat("model {
  for (i in 1:n){
    y[i] ~ dlogis(mu[i], tau)
    mu[i] <- alpha+beta[tester[i]]
  }
  alpha ~ dnorm(0, 1.0E-5)
  tau <- pow(sigma, -1)
  sigma ~ dunif(0, 100)
  for (j in 1:J){
    beta[j] ~ dnorm(0, tau.a)
  }
  tau.a <- pow(sigma.a, -2)
  sigma.a ~ dunif(0, 100)
}", file="modelFatigue.txt")

Model Fitting

To fit the hierarchical model described above, the function jags is used and its results are assigned to object FitJags.

set.seed(123)
FitJags <- jags(data.jags, inits=inits, n.chain=3, n.iter=15000,
                parameters.to.save=c("alpha","beta","sigma","sigma.a"),
                model.file="modelFatigue.txt")
FitJags

Summarizing Output

Posterior summaries of the hierarchical model used for the fatigue failure time data are presented in Table 4.10. This table shows the simulated results for the marginal posterior densities of the model parameters, obtained by making draws using MCMC algorithms. A clear picture of the marginal posterior density of each parameter can be seen in Figure 4.7.

Parameter    Mean     SD     2.5%     50%     97.5%   Rhat   n.eff
alpha        5.037   0.085   4.864    5.037   5.205   1.002  1500
beta[1]     -0.100   0.108  -0.318   -0.099   0.109   1.002  1300
beta[2]      0.274   0.110   0.068    0.272   0.504   1.002  2000
beta[3]     -0.170   0.123  -0.413   -0.169   0.065   1.001  3200
beta[4]      0.263   0.118   0.040    0.258   0.505   1.001  2900
beta[5]     -0.250   0.115  -0.470   -0.249  -0.025   1.001  3200
beta[6]     -0.045   0.111  -0.267   -0.042   0.175   1.001  2300
beta[7]      0.143   0.109  -0.067    0.141   0.362   1.002  1500
beta[8]      0.098   0.109  -0.109    0.096   0.319   1.001  3200
beta[9]     -0.237   0.114  -0.477   -0.236  -0.018   1.005   490
beta[10]     0.007   0.111  -0.215    0.003   0.226   1.001  3000
sigma        0.161   0.014   0.136    0.160   0.191   1.001  3200
sigma.a      0.242   0.079   0.128    0.227   0.430   1.001  3200
deviance    33.548   5.229  25.625   32.843  45.734   1.001  3200

Table 4.10: Summary of the MCMC simulations due to Metropolis-within-Gibbs algorithm using JAGS.
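Figure 4.7 (shown below, after the discussion) overlays, for each parameter, the marginal posterior densities obtained from the three fitting approaches. A minimal sketch of how such an overlay could be produced for α is given here; the slot names for the stored draws (Fit$Posterior for the sampling importance resampling draws of LaplaceApproximation and FitDemon$Posterior2 for the stationary LaplacesDemon draws) are assumptions based on the LaplacesDemon package conventions.

## Sketch of the overlaid density comparison in Figure 4.7 for alpha
## (slot names for the stored posterior draws are assumed, see the text above)
alpha.LA   <- Fit$Posterior[, "alpha"]             # SIR draws from LaplaceApproximation
alpha.LD   <- FitDemon$Posterior2[, "alpha"]       # stationary draws from LaplacesDemon
alpha.jags <- FitJags$BUGSoutput$sims.list$alpha   # MCMC draws from JAGS
plot(density(alpha.jags), main=expression(alpha), xlab="", lwd=2, lty=3)
lines(density(alpha.LA), lty=1, lwd=2)
lines(density(alpha.LD), lty=2, lwd=2)
legend("topright", legend=c("LA", "LD", "jags"), lty=1:3, lwd=2)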

4.5 Discussion and Conclusion

This chapter has dealt with the hierarchical modeling of reliability data in the Bayesian paradigm. This is an exciting area, and much more can be done than what is treated here, both in practical as well as theoretical terms. For approximation of the posterior probability densities, again both analytic and simulation methods are implemented using the same three software packages. From the obtained results, it is evident that hierarchical modeling with the Bayesian approach based on weakly informative priors is simpler to implement than the classical one. Furthermore, the wealth of information provided in these numeric and graphic summaries is not possible in the classical framework (Lawless, 2003). The Bayesian approach, on the other hand, makes implementation simple using tools like R and JAGS. Finally, the software developed here can also be applied to Bayesian survival analysis.

[Figure 4.7 about here: panels of overlaid posterior density estimates (LA, LD, jags) for α, β1–β9, σ1, and σ2.]

Figure 4.7: Plots of the posterior densities of the parameters α, the β's, and the σ's for the posterior distribution of the log-logistic model using the functions LaplaceApproximation, LaplacesDemon, and jags, respectively. It is evident from these plots that LaplacesDemon performs excellently, as its densities closely resemble those from JAGS; the close agreement between the two is striking.

Bibliography

Akhtar, M. T. and Khan, A. A. (2014a). Bayesian Analysis of Generalized Log-Burr Family with R. SpringerPlus, 3: 185.

Akhtar, M. T. and Khan, A. A. (2014b). Log-logistic Distribution as a Reliability Model: A Bayesian Analysis. American Journal of Mathematics and Statistics, 4(3): 162–170.

Albert, J. (2009). Bayesian Computation with R. Springer, New York, 2nd edition.

Ando, T. (2007). Bayesian Predictive Information Criterion for the Evaluation of Hierarchical Bayesian and Empirical Bayes Models. Biometrika, 94(2): 443–458.

Bates, D. (2005a). Fitting linear models in R using the lme4 package. R News, 5(1): 27–30. cran.r-project.org/doc/Rnews/Rnews 2005-1.pdf.

Bates, D. (2005b). The mlmRev package. cran.r-project.org/doc/Rnews/Rnews 2005-1.pdfpackages/mlmRev.pdf.

Bates, D., Maechler, M., Bolker, B., and Walker, S. (2014). lme4: Linear mixed- effects models using Eigen and S4. R package version 1.1-7. https://CRAN.R-project.org/package=lme4.

Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53: 370–418. Reprinted, with biographical note by G. A. Barnard, in Biometrika, 45: 293–315 (1958).

Bayes, T. and Price, R. (1763). An Essay Towards Solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S. Philosophical Transactions of the Royal Society of London, 53: 370–418.

Becker, R. A., Chambers, J. M., and Wilks, A. R. (1988). The New S Language: A Programming Language for Data Analysis and Graphics. Pacific Grove, CA, USA: Wadsworth & Brooks/Cole.

Bernardo, J. and Smith, A. (1994). Bayesian Theory. Wiley, Chichester, UK.

Bernardo, J. and Smith, A. (2000). Bayesian Theory. John Wiley & Sons, West Sussex, England.

Berndt, E., Hall, B., Hall, R., and Hausman, J. (1974). Estimation and Inference in Nonlinear Structural Models. Annals of Economic and Social Measurement, 3: 653–665.

Birgin, E. G., Martinez, J. M., and Raydan, M. (2000). Nonmonotone spectral projected gradient methods on convex sets. SIAM J Optimization, 10: 1196–1211.

Birgin, E. G., Martinez, J. M., and Raydan, M. (2001). SPG: software for convex- constrained optimization. ACM Transactions on Mathematical Software.

Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4): 434–455.

Broyden, C. (1970). The Convergence of a Class of Double Rank Minimization Algorithms: 2. The New Algorithm. Journal of the Institute of Mathematics and its Applications, 6: 76–90.

Casella, G. and George, E. (1992). Explaining the Gibbs sampler. The American Statistician, 46: 167–174.

Chambers, J. M. (2008). Software for Data Analysis: Programming with R. Springer, New York.

Coolen, F. P. A. (2008). Parametric Probability Distributions in Reliability. Encyclopedia of Quantitative Risk Analysis and Assessment. III.

Crowder, M. J. (2000). Tests for A Family of Survival Models Based on Extremes. In Recent Advances In Reliability Theory, N. Limnios and M. Nikulin, Eds., pp. 307-321. Birkhauser, Boston.

Damien, P., Wakefield, J., and Walker, S. (1999). Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variable. Journal of the Royal Statistical Society B, 61: 331–344.

Dey, D. K., Ghosh, S. K., and Mallik, B. K. (2000). Generalized Linear Models: A Bayesian Perspective. Marcel Dekker, New York.

Erkanli, A. (1994). Laplace approximations for posterior expectations when the mode occurs at the boundary of the parameter space. Journal of the American Statistical Association, 89: 205–258.

Fletcher, R. (1970). A New Approach to Variable Metric Algorithms. Computer Journal, 13(3): 317–322.

Gamerman, D. and Lopes, H. (2006). Markov Chain Monte Carlo. Texts in Statistical Science, Chapman & Hall, New York, 2nd edition.

Gelfand, A., Hills, S., Racine-Poon, A., and Smith, A. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85: 972–985.

Gelfand, A. and Smith, A. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85: 398–409.

Gelman, A. (2006). Prior Distributions for Variance Parameters in Hierarchical Models. Bayesian Analysis, 1(3): 515–533.

Gelman, A. (2008). Scaling Regression Inputs by Dividing by Two Standard Deviations. Statistics in Medicine, 27: 2865–2873.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis. Chapman & Hall/CRC, New York, 3rd edition.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004). Bayesian Data Analysis. Chapman & Hall/CRC, New York, 2nd edition.

Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multi- level/Hierarchical Models. Cambridge University Press, New York.

Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4): 457–511.

Gelman, A. and Su, Y. (2014). arm: Data Analysis Using Regression and Multilevel/Hierarchical Models. R package version 1.7-07, URL http://CRAN.R-project.org/package=arm.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6: 721– 741.

Gilks, W., Richardson, S., and Spiegelhalter, D. (1996). Markov Chain Monte Carlo in Practice. Interdisciplinary Statistics, Chapman & Hall, Suffolk, UK.

Gilks, W., Thomas, A., and Spiegelhalter, D. (1994). A Language and Program for Complex Bayesian Modelling. The Statistician, 43(1): 169–177.

Givens, G. and Hoeting, J. (2005). Computational Statistics. Wiley Series in Probability and Statistics, John Wiley & Sons, New York.

Goldfarb, D. (1970). A Family of Variable Metric Methods Derived by Variational Means. Mathematics of Computation, 24(109): 23–26.

Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993). Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation. IEEE Proceedings F on Radar and Signal Processing, 140(2): 107–113.

Grant, G. M., Roesener, W. S., Hall, D. G., Atwood, C. L., Gentillon, C. D., and Wolf, T. R. (1999). Reliability Study: High-Pressure Coolant Injection (HPCI) System, 1987-1993. Technical Report NUREG-CR-5500, Vol. 4, INEL-94-0158, Idaho National Engineering and Environmental Laboratory, Idaho Falls, ID, 1999.

Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82: 711–732.

Hamada, M. S., Wilson, A. G., Reese, C. S., and Martz, H. F. (2008). Bayesian Reliability. Springer, New York.

Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57: 97–109.

Higdon, D. (1998). Auxiliary variable methods for Markov chain Monte Carlo with applications. Journal of the American Statistical Association, 93: 585–595.

Hoff, P. D. (2009). A First Course in Bayesian Statistical Methods. Springer, New York.

Hoffmann-Jorgensen, J. (1994). Probability with a View Towards Statistics. Vol. 1. Probability Series, Chapman & Hall/CRC, New York.

Ibrahim, J. and Chen, M. (2000). Power prior distributions for regression models. Statistical Science, 15: 46–60.

Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5: 299–314.

Irony, T. and Singpurwalla, N. (1997). Noninformative Priors Do Not Exist: a Discussion with Jose M. Bernardo. Journal of Statistical Inference and Planning, 65: 159–189.

ISO (1986). ISO 8402 International Standard: Quality Vocabulary. ISO: Interna- tional Organization for Standardization, Geneva, Switzerland.

Jeffreys, H. (1961). Theory of Probability. Oxford University Press, Oxford, England, 3rd edition.

Johnson, V. E., Moosman, A., and Cotter, P. (2005). A hierarchical model for estimating the early reliability of complex systems. IEEE Transactions on Reliability, (54): 224–231.

Kalbfleisch, J. D. and Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data. John Wiley & Sons, New York, 2nd edition.

Kass, R. and Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90: 928–934.

Kery, M. (2010). Introduction to WinBUGS for Ecologists: A Bayesian Approach to Regression, ANOVA, Mixed Models and Related Analyses. Academic Press, USA, 1st edition.

Kimber, A. C. (1990). Exploratory Data Analysis for Possibly Censored Data from Skewed Distributions. Appl. Stat., 39: 21–30.

Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York.

Kleinbaum, D. G. and Klein, M. (2012). Survival Analysis: A Self-Learning Text. Springer, New York, 3rd edition.

Kruschke, J. K. (2015). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, London, UK, 2nd edition.

Ku, P. M., Anderson, E. L., and Carper, H. J. (1972). Some Considerations in Rolling Fatigue Evaluation. ASLE Transactions, 15: 113–129.

Lambert, P., Sutton, A., Burton, P., Abrams, K., and Jones, D. (2005). How Vague is Vague? A Simulation Study of the Impact of the Use of Vague Prior Distributions in MCMC using WinBUGS. Statistics in Medicine, 24: 2401–2428.

Laplace, P. (1774). Memoire sur la Probabilite des Causes par les Evenements. l’Academie Royale des Sciences, 6: 621–656. English translation by S.M. Stigler in 1986 as “Memoir on the Probability of the Causes of Events” in Statistical Science, 1(3): 359–378.

Laplace, P. (1812). Theorie Analytique des Probabilites. Courcier, Paris. Reprinted as “Oeuvres Completes de Laplace”, 7: 1878-1912. Paris: Gauthier-Villars.

Laplace, P. (1814). Essai Philosophique sur les Probabilites. English translation in Truscott, F. W. and Emory, F. L. (2007) from (1902) as “A Philosophical Essay on Probabilities”. ISBN 1602063281, translated from the French 6th ed. (1840).

Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data. John Wiley & Sons, New York, 2nd edition.

Lawson, A. B., Browne, W. J., and Vidal Rodeiro, C. L. (2003). Disease Mapping with WinBUGS and MLwiN. John Wiley & Sons, Hoboken, NJ.

Lindley, D. V. (1983). Theory and practice of bayesian statistics. Statistician, 32: 1–11.

Lindley, D. V. (2006). Understanding Uncertainty. John Wiley & Sons, New York.

Lunn, D., Jackson, C., Best, N., Thomas, A., and Spiegelhalter, D. (2013). The BUGS Book: A Practical Introduction to Bayesian Analysis. Chapman & Hall/CRC, Boca Raton, FL.

Lunn, D. J., Thomas, A., Best, N., and Spiegelhalter, D. (2000). WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10: 325–337.

Marshall, A. W. and Olkin, I. (2007). Life Distributions: Structure of Nonpara- metric, Semiparametric, and Parametric Families. Springer, New York.

Martz, H. F., Parker, R. L., and Rasmuson, D. M. (1999). Estimation of Trends in the Scram Rate at Nuclear Power Plants. Technometrics, 41: 352–364.

Martz, H. F. and Waller, R. A. (1982). Bayesian Reliability Analysis. John Wiley & Sons, New York.

Meeker, W. Q. and Escobar, L. A. (1998). Statistical Methods for Reliability Data. John Wiley & Sons, New York.

Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., and Teller, E. (1953). Equations of state calculations by fast computing machine. Journal of Chemical Physics, 21: 1087–1092.

Møller, J. (1999). Perfect simulation of conditionally specified models. Journal of the Royal Statistical Society B, 61: 251–264.

Mosteller, F. and Wallace, D. L. (1964). Inference and Disputed Authorship: The Federalist. Reading: Addison-Wesley.

Muller, C. (2003). Reliability analysis of the 4.5 roller bearing. Master's thesis, Naval Postgraduate School.

Neal, P. and Roberts, G. (2008). Optimal scaling for random walk Metropolis on spherically constrained target densities. Methodology and Computing in Applied Probability, 10: 277–297.

Neal, R. (2003). Slice sampling. The Annals of Statistics, 31: 705–767.

Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized Linear Models. Journal of the Royal Statistical Society A, 135: 370–384.

Nelson, W. (1990). Accelerated Testing: Statistical Models, Test Plans, and Data Analyses. John Wiley & Sons, New York.

Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. John Wiley & Sons, New York.

Peskun, P. (1973). Optimum Monte Carlo sampling using Markov chains. Biometrika, 60: 607–612.

Pinheiro, J. C. and Bates, D. M. (2000). Mixed-Effects Models in S and S-Plus. Springer-Verlag, New York.

Plummer, M. (2003). JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 20–22, Vienna, Austria. ISSN 1609-395X.

Plummer, M., Best, N. G., Cowles, K., and Vines, K. (2004). coda: Output Analysis and Diagnostics for MCMC. R package version 0.9-1, URL http://www- fis.iarc.fr/coda/.

Poloski, J. P. and Sullivan, W. H. (1980). Data Summaries of Licensee Event Reports of Diesel Generators at U.S. Commercial Nuclear Power Plants from January 1, 1976 to December 31, 1978. Technical Report NUREG-CR-1362, EGG-EA-5092, Idaho National Engineering and Environmental Laboratory, Idaho Falls, ID.

Press, S. J. (1989). Bayesian Statistics: Principles, Models, and Applications. John Wiley & Sons, New York.

Propp, J. and Wilson, D. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9: 223–252.

R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory. Division of Research, Graduate School of Business Administration, Harvard University.

Rigby, R. and Stasinopoulos, D. (2005). Generalized Additive Models for Location, Scale, and Shape (with disccusion). Journal of the Royal Statistical Society C, 54: 507–554.

Robert, C. P. (2007). The Bayesian Choice. Springer-Verlag, New York, 2nd edition.

Robert, C. P. and Casella, G. (2010). Introducing Monte Carlo Methods with R. Springer, New York.

Roberts, G., Gelman, A., and Gilks, W. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Statistics, 7: 110– 120.

SAS Institute Inc. (2008). SAS/STAT 9.2 User's Guide. Cary, NC: SAS Institute Inc.

Schmee, J. and Nelson, W. (1977). Estimates and approximate confidence limits for (log) normal life distributions from singly censored samples by maximum likelihood. General Electric C. R. & D. TIS Report 76CRD250. Schenectady, New York.

Shanno, D. (1970). Conditioning of quasi-Newton Methods for Function Minimiza- tion. Mathematics of Computation, 24: 647–650.

Smith, A. and Roberts, G. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society B, 55: 3–23.

Smith, A. F. M. and Gelfand, A. E. (1992). Bayesian statistics without tears. American Statistician, 46: 84–188.

Spiegelhalter, D., Best, N., and Carlin, B. (1998). Bayesian Deviance, the Effective Number of Parameters, and the Comparison of Arbitrarily Complex Models. Research Report, 98-009.

Spiegelhalter, D., Best, N., Carlin, B., and van der Linde, A. (2002). Bayesian Measures of Model Complexity and Fit (with Discussion). Journal of the Royal Statistical Society B, 64: 583–639.

Spiegelhalter, D., Thomas, A., Best, N., and Lunn, D. (2003). WinBUGS User Manual, Version 1.4. MRC Biostatistics Unit, Institute of Public Health and Department of Epidemiology and Public Health, Imperial College School of Medicine, UK.

Statisticat LLC (2013). LaplacesDemon: Complete Environment for Bayesian Inference. R package version 13.11.17, http://www.bayesian-inference.com/software.

Su, Y. and Yajima, M. (2014). R2jags: A Package for Running jags from R. R package version 0.04-01, URL http://CRAN.R-project.org/package=R2jags.

Sun, J. (2006). The Statistical Analysis of Interval-censored Failure Time Data. Springer, New York.

Tableman, M. and Kim, J. S. (2004). Survival Analysis Using S: Analysis of Time-to-Event Data. Chapman & Hall/CRC, New York.

Tanner, M. and Wong, W. (1987). The calculation of the posterior distributions by data augmentation. Journal of the American Statistical Association, 82: 528–549.

Tanner, M. A. (1996). Tools for Statistical Inference. Springer-Verlag, New York.

Teh, Y. W. and Jordan, M. I. (2010). Hierarchical Bayesian Nonparametric Models with Applications. In N. Hjort, C. Holmes, P. Müller, and S. Walker, editors, To Appear in Bayesian Nonparametrics: Principles and Practice. Cambridge University Press.

Tierney, L. and Kadane, J. (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81: 82– 86.

Tierney, L., Kass, R., and Kadane, J. (1989). Fully exponential Laplace approxi- mations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84: 710–716.

Varadhan, R. and Gilbert, P. D. (2009). BB: An R Package for Solv- ing a Large System of Nonlinear Equations and for Optimizing a High- Dimensional Nonlinear Objective Function. Journal of Statistical Software, 32:4, http://www.jstatsoft.org/v32/i04/.

Woodworth, G. G. (2004). Biostatistics: A Bayesian Introduction. John Wiley & Sons, Hoboken, NJ.

Wu, C.-F. J. and Hamada, M. S. (2009). Experiments: Planning, Analysis, and Optimization. John Wiley & Sons, New York, 2nd edition.

Yang, R. and Berger, J. (1996). A Catalog of Noninformative Priors. Technical Report, Institute of Statistics and Decision Sciences, Duke University, Durham, NC.

Zacks, S. (1992). Introduction to Reliability Analysis: Probability Models and Statistical Methods. Springer, New York.