UNIVERSITY OF CINCINNATI

Date:______

I, ______, hereby submit this work as part of the requirements for the degree of: in:

It is entitled:

This work and its defense approved by:

Chair: ______

Gibbs and Expectation Maximization Methods for Estimation of Censored Values from Correlated Multivariate Distributions

A dissertation submitted to the

Division of Research and Advanced Studies of the University of Cincinnati

in partial fulfillment of the requirements for the degree of

DOCTORATE OF PHILOSOPHY (Ph.D.)

in the Department of Mathematical Sciences of the McMicken College of Arts and Sciences

May 2008

by

Tina D. Hunter

B.S. Industrial and Systems Engineering The Ohio State University, Columbus, Ohio, 1984

M.S. Aerospace Engineering University of Cincinnati, Cincinnati, Ohio, 1989

M.S. University of Cincinnati, Cincinnati, Ohio, 2003

Committee Chair: Dr. Siva Sivaganesan

Abstract

Statisticians are often called upon to analyze censored data. Environmental and toxicological data is often left-censored due to reporting practices for measurements that are below a statistically defined detection limit. Although there is an abundance of literature on univariate methods for analyzing this type of data, a great need still exists for multivariate methods that take into account possible correlation amongst variables. Two methods are developed here for that purpose.

One is a Markov chain Monte Carlo (MCMC) method that uses a Gibbs sampler to estimate censored data values as well as distributional and regression parameters.

The second is an expectation maximization (EM) algorithm that solves for the distributional parameters that maximize the complete data likelihood in the presence of censored data. Both methods are applied to bivariate normal data and compared to each other and to two commonly used simple substitution methods with respect to bias and mean squared error of the resulting parameter estimates.

The EM method is the most consistent for estimating all distributional and regression parameters across all levels of correlation and proportions of censoring. Both methods provide substantially better estimates of the correlation coefficient than the univariate methods.

To Bryce and Callie, with all my love.

May you know the satisfaction and rewards of hard work and perseverance.

Acknowledgements

I would like to thank Dr. Marc Mills and Dr. Bryan Boulanger, my mentors at the U.S. Environmental Protection Agency, for all of their time and support during my traineeship, without which this dissertation would not exist. They have been a pleasure to work with and have provided invaluable encouragement and feedback. I would also like to thank my entire committee, and especially my advisor, Dr. Siva Sivaganesan, for taking the time to review my ideas and writing, for helpful and constructive feedback, and for allowing me the freedom to follow my own ideas and interests.

I am grateful for all of the friends who kept me sane along the entire road to this Ph.D., especially those traveling it with me. I have many fond memories of parties, travels, and long talks, all of which eased the stresses along the way. I am also grateful to the many caring professors in the Department of Mathematical Sciences. In particular, Don French has been a great friend as well as a talented teacher and generous tutor. Jim Deddens has always been especially encouraging and willing to help. Terri, Anita, Nancy, and Patti have all been very kind and helpful with a wide variety of administrative technicalities.

Finally, I would like to express a very special thank you to my family. My parents have been wonderful cheerleaders and have helped me out in countless ways on the way to this Ph.D. Bryce and Callie have not only put up with me, but have even turned into amazing young adults along the way.

This research was funded by the U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory and National Risk Management Research Laboratory, through a Graduate Research Training Grant.

Contents

List of Tables

List of Figures

Chapter 1. Introduction
1.1. General Problem Statement
1.2. The Specific Analysis that Inspired this Research
1.3. Simplified Problem

Chapter 2. Literature Review and Analysis
2.1. Overview of Methods for Censored Data
2.2. Dropping Censored Values
2.3. Replacing Censored Values with Constants
2.4. Maximum Likelihood Methods
2.5. Regression Methods
2.6. Other Methods
2.7. EPA Guidelines
2.8. Extensions to Multivariate Distributions
2.9. Summary

Chapter 3. Gibbs Sampling Theory and Method Development
3.1.
3.2. Markov Chain Monte Carlo
3.3. Gibbs Sampler Algorithm

Chapter 4. Expectation Maximization Theory and Method Development
4.1. General Theory of Expectation Maximization
4.2. Expectation Maximization for Multivariate Normal Data with Left Censoring

Chapter 5. Implementation and Analysis of Methods
5.1. Assumptions and Data Transformation
5.2. Evaluation of Methods
5.3. R Program

Chapter 6. Results
6.1. Mean, Standard Deviation, and Correlation Coefficient in Normal Scale
6.2. Regression Parameters
6.3. Comparisons in Original Lognormal Scale
6.4. Results for Application of Methods to Lower Variance Distribution
6.5. Results for Larger Sample Sizes
6.6. Conclusions

Chapter 7. Discussion and Recommendations
7.1. MCMC Method
7.2. EM Method
7.3. Future Extensions of MCMC and EM Methods

References

Appendix. R Code

List of Tables

1.1 Hormones Analyzed in Effluent Survey
1.2 Sample Correlation Coefficients Above .5
1.3 Proportions of 46 Values Reported as < MDL
2.1 Comparison of Two Simple Substitution Methods for a Lognormal Distribution such that Ln(x) is Distributed N(0,1)
2.2 Comparison of Two Simple Substitution Methods for a Lognormal Distribution such that Ln(x) is Distributed N(0,1/2)
2.3 Comparison of Two Simple Substitution Methods for a Lognormal Distribution such that Ln(x) is Distributed N(0,2)
6.1 Confidence Intervals for Bias Plots of Mean Estimates
6.2 Effects of Bias on Lognormal Parameters
7.1 Bias Introduced by Estimating Individual Variance Components Incorrectly
7.2 Bias Reduction with Multiple Imputations

List of Figures

2.1 Comparison of Tails for 10% Nondetects
2.2 Comparison of Tails for 30% Nondetects
6.1 Biases in Mean Estimates when ρ = 0.1
6.2 MSEs of Mean Estimates when ρ = 0.1
6.3 Biases in Mean Estimates when ρ = 0.5
6.4 MSEs of Mean Estimates when ρ = 0.5
6.5 Biases in Mean Estimates when ρ = 0.9
6.6 MSEs of Mean Estimates when ρ = 0.9
6.7 Biases in Mean Estimates when ρ = −0.1
6.8 MSEs of Mean Estimates when ρ = −0.1
6.9 Biases in Mean Estimates when ρ = −0.5
6.10 MSEs of Mean Estimates when ρ = −0.5
6.11 Biases in Mean Estimates when ρ = −0.9
6.12 MSEs of Mean Estimates when ρ = −0.9
6.13 Biases in Standard Deviation Estimates when ρ = 0.1
6.14 MSEs of Standard Deviation Estimates when ρ = 0.1
6.15 Biases in Standard Deviation Estimates when ρ = 0.5
6.16 MSEs of Standard Deviation Estimates when ρ = 0.5
6.17 Biases in Standard Deviation Estimates when ρ = 0.9
6.18 MSEs of Standard Deviation Estimates when ρ = 0.9
6.19 Biases in Standard Deviation Estimates when ρ = −0.1
6.20 MSEs of Standard Deviation Estimates when ρ = −0.1
6.21 Biases in Standard Deviation Estimates when ρ = −0.5
6.22 MSEs of Standard Deviation Estimates when ρ = −0.5
6.23 Biases in Standard Deviation Estimates when ρ = −0.9
6.24 MSEs of Standard Deviation Estimates when ρ = −0.9
6.25 Biases in Estimates of ρ when ρ = 0.1
6.26 MSEs of Estimates of ρ when ρ = 0.1
6.27 Biases in Estimates of ρ when ρ = 0.5
6.28 MSEs of Estimates of ρ when ρ = 0.5
6.29 Biases in Estimates of ρ when ρ = 0.9
6.30 MSEs of Estimates of ρ when ρ = 0.9
6.31 Biases in Estimates of ρ when ρ = −0.1
6.32 MSEs of Estimates of ρ when ρ = −0.1
6.33 Biases in Estimates of ρ when ρ = −0.5
6.34 MSEs of Estimates of ρ when ρ = −0.5
6.35 Biases in Estimates of ρ when ρ = −0.9
6.36 MSEs of Estimates of ρ when ρ = −0.9
6.37 Biases in Estimates of Intercept β₀ when ρ = 0.5
6.38 MSEs of Estimates of Intercept β₀ when ρ = 0.5
6.39 Biases in Estimates of Slope β₁ when ρ = 0.5
6.40 MSEs of Estimates of Slope β₁ when ρ = 0.5
6.41 Biases in Estimates of Slope β₂ when ρ = 0.5
6.42 MSEs of Estimates of Slope β₂ when ρ = 0.5
6.43 Biases in Estimates of Lognormal Mean when ρ = 0.5
6.44 MSEs of Estimates of Lognormal Mean when ρ = 0.5
6.45 Biases in Estimates of Lognormal Mean when ρ = −0.5
6.46 MSEs of Estimates of Lognormal Mean when ρ = −0.5
6.47 Biases in Estimates of Lognormal Variance when ρ = 0.5
6.48 MSEs of Estimates of Lognormal Variance when ρ = 0.5
6.49 Biases in Estimates of Lognormal Variance when ρ = −0.5
6.50 MSEs of Estimates of Lognormal Variance when ρ = −0.5
6.51 Biases in Estimates of Mean Mu when ρ = 0.5 and Variances are σ² = 0.25
6.52 MSEs of Estimates of Mean Mu when ρ = 0.5 and Variances are σ² = 0.25
6.53 Biases in Standard Deviation Estimates when ρ = 0.5 and Variances are σ² = 0.25
6.54 MSEs of Standard Deviation Estimates when ρ = 0.5 and Variances are σ² = 0.25
6.55 Biases in Mean Estimates for n = 100 when ρ = 0.5
6.56 MSEs of Mean Estimates for n = 100 when ρ = 0.5
6.57 Biases in Estimates of ρ for n = 100 when ρ = 0.5
6.58 MSEs of Estimates of ρ for n = 100 when ρ = 0.5
7.1 Autocorrelation vs. Lag for Single Variable Censoring
7.2 Autocorrelation vs. Lag for Multivariable Censoring

CHAPTER 1

Introduction

1.1. General Problem Statement

Environmental and chemical data often include measurements that fall below some detection limit for the instrument or analytical procedure being used to obtain the data. It is common practice for these measurements to be reported simply as “below the detection limit,” such that the statistician analyzing the data is left to determine how best to treat the missing values[1]. Should they be dropped at the expense of reducing sample sizes that are often already small? Should they be replaced with a constant value that lies somewhere in the known region, which is typically between zero and the detection limit? Or should some attempt be made to estimate more precisely where the measurement is likely to be within this region?

The answer to these questions may depend on what is known about the underlying distribution, the quantity and nature of the available data, which distributional parameters are of greatest interest, and whether there is a preference for choosing the most conservative approach or the most accurate approach. It may also depend on practical considerations such as the cost/benefit ratio for expending extra time and effort to obtain a more accurate answer, or the skill level of the statistician analyzing the data.

We will assume for the purposes of this paper that the statistician, who is working with censored data due to reporting practices associated with detection limits, has a bias for accuracy in their estimation of parameters associated with the underlying distribution of a particular compound. This is often the case since many compounds can have significant health or environmental effects even in very small quantities. Hence, accurate measurement and/or estimation of these small quantities can be critical for accurate prediction of the effects they may cause and/or risks they may pose. As methodology improves, the proportion of measurements of a particular compound that fall below these limits will decrease. However, the improved methodology will typically allow for measurement of previously unmeasurable compounds with smaller concentrations. Thus, the problem of dealing with these limits will remain.

Literature describing investigations of the effects of censored data on statistical estimation and predictive modeling has primarily centered on developing or comparing methods to contend with censored response variables. Many of these methods can also be applied to univariate predictor variables, but the options for dealing with censored multivariate predictor variables are very limited. There is a dearth of research or methodology focused on cases of multiple predictor variables, some or all of which may be censored. The situation is expressed best in the statement from a paper by Austin and Hoch: “Despite a plethora of methods, both in the . . . and statistics literature. . . , for the estimation of regression models in which the dependent variable is subject to either a ceiling effect or to censoring, there is a paucity of literature on the effect of an independent variable being subject to censoring or a ceiling effect.”[4].

When the situation of multiple predictor variables exists, there exists the possibility, which in many scenarios is a strong likelihood, of correlation among predictor variables. This correlation is a complicating reality which is typically ignored. Yet this correlation could be an important characteristic of the data and could possibly even be exploited to better estimate a missing value of one variable when the values of any other predictor variables at the same data point are known. Also, the level of correlation could itself be a matter of great interest and thus a parameter to be estimated from the data. Yet with the scarcity of research on estimation methods for multivariate predictor variables comes an even greater scarcity of research comparing the effects of these methods on the estimation of correlation coefficients.

This dissertation will present two new methods for estimating censored multivariate data that may or may not be correlated. In the case that the variables are correlated, each method will use the sample correlation to provide an estimate that is conditional on the values of the other variables at the same data point. The methods will be used on simulated data from distributions with varying levels of correlation and left tail censoring, then compared against the most common practices of replacing the censored data with constant values of either 1/2 or 1/√2 times the detection limit. It will be assumed that the data originates from lognormal distributions, which is typical of environmental and toxicology data. The methods themselves are not limited by this assumption as long as the data can be transformed to a multivariate normal distribution. Thus, with the use of Box-Cox transformations, they could be used on a wide range of underlying distributions[7]. The significance of assuming an underlying lognormal data distribution is in the use of the comparison methods, which make sense only in the context of data that is limited on the lower end by zero.

The first estimation method presented here is a Markov chain Monte Carlo (MCMC) method that uses a Gibbs sampler to successively and iteratively estimate the distributional parameters and the censored data values for a bivariate predictor variable with censoring in the left tail. At each step of the iteration, a mean vector and correlation matrix are drawn for the underlying data distribution based on the values of the sample mean and sample correlation matrix. Then a value for each censored observation is drawn from the distribution of the left tail, truncated at the detection limit, given these distributional parameters and conditional on the remaining data values. Because the mean and correlation are simulated at every step, the variance of the predictor will include the variance due to uncertainty about the parameters of the underlying data distribution. This is a sharp contrast to most methods, which tend to underestimate variance due to an underlying assumption that the data is perfectly representative of the distribution from which it was sampled with respect to location and variance.
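To make the structure of one such iteration concrete, the following minimal R sketch illustrates the general idea for bivariate normal data with a common detection limit DL on the transformed (log) scale. It is a hedged illustration, not the exact sampler developed in Chapter 3: the function name gibbs_step, the diffuse Jeffreys-type prior behind the inverse Wishart draw, and the coding of censored entries in a logical matrix cens are all choices made here for the sketch.

# One iteration of a Gibbs sampler for left-censored bivariate normal data.
# A minimal sketch under simplifying assumptions; not the exact sampler of
# Chapter 3.
gibbs_step <- function(X, cens, DL) {
  # X    : n x 2 matrix of the current completed (log-scale) data
  # cens : n x 2 logical matrix, TRUE where a value was censored
  n    <- nrow(X)
  xbar <- colMeans(X)
  S    <- crossprod(sweep(X, 2, xbar))          # scatter matrix about the mean

  # Draw Sigma | X from an inverse Wishart, then mu | Sigma, X
  Sigma <- solve(rWishart(1, df = n - 1, Sigma = solve(S))[, , 1])
  mu    <- MASS::mvrnorm(1, xbar, Sigma / n)

  # Redraw each censored entry from its conditional normal distribution,
  # truncated above at DL, by the inverse-CDF method
  for (i in which(apply(cens, 1, any))) {
    for (j in 1:2) {
      if (cens[i, j]) {
        o  <- 3 - j                              # index of the other variable
        cm <- mu[j] + Sigma[j, o] / Sigma[o, o] * (X[i, o] - mu[o])
        cs <- sqrt(Sigma[j, j] - Sigma[j, o]^2 / Sigma[o, o])
        u  <- runif(1, 0, pnorm(DL, cm, cs))     # uniform over the left tail
        X[i, j] <- qnorm(u, cm, cs)
      }
    }
  }
  list(mu = mu, Sigma = Sigma, X = X)
}

Repeating such a step many times and summarizing the retained draws of mu, Sigma, and the imputed values gives the parameter estimates; because mu and Sigma are redrawn at every step, the imputed censored values automatically carry the parameter uncertainty described above.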

The second estimation method presented is an expectation-maximization (EM) algorithm. The method estimates distributional parameters and censored data values in an iterative manner very similar to the MCMC method, but with exact calculations rather than random draws at each step. Within each iteration, an expectation step calculates the expected values of each censored data value successively, given that the value comes from the distribution of the left tail of a normal distribution that is conditional on the current values of both distributional parameters and all other data. Then a maximization step calculates the updated MLE estimators, which are simply the sample mean and sample variance, using the latest values of the full data set. While this method is similar in many respects to the MCMC method, it requires fewer iterations, by several orders of magnitude.
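The corresponding iteration can be sketched in the same setting as the Gibbs sketch above. This is again a simplified illustration with hypothetical names, not the exact algorithm of Chapter 4: it follows the description in the text by imputing each censored entry with the mean of its conditional left tail and then recomputing the ML estimates from the completed data, starting, for example, from the DL/2-substituted data and its sample moments.

# One EM-style iteration for left-censored bivariate normal data (sketch)
em_step <- function(X, cens, DL, mu, Sigma) {
  # E-step: replace each censored entry with its conditional tail expectation
  for (i in which(apply(cens, 1, any))) {
    for (j in 1:2) {
      if (cens[i, j]) {
        o  <- 3 - j
        cm <- mu[j] + Sigma[j, o] / Sigma[o, o] * (X[i, o] - mu[o])
        cs <- sqrt(Sigma[j, j] - Sigma[j, o]^2 / Sigma[o, o])
        a  <- (DL - cm) / cs
        X[i, j] <- cm - cs * dnorm(a) / pnorm(a)  # mean of the tail below DL
      }
    }
  }
  # M-step: ML estimates from the completed data
  mu    <- colMeans(X)
  Sigma <- cov(X) * (nrow(X) - 1) / nrow(X)       # ML (divisor n) covariance
  list(mu = mu, Sigma = Sigma, X = X)
}

Iterating em_step until mu and Sigma stop changing gives the final estimates; because every step is deterministic, far fewer iterations are needed than in the MCMC method.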

Estimates of the distributional parameters found by both the MCMC method and the EM method are compared with estimates found by the common substitution methods and with similar estimates calculated from the uncensored data. Methods are compared in terms of bias and mean-squared error (MSE) for each parameter that they are used to estimate. In addition to estimating means and variances for each variable, the correlation coefficient of the two predictor variables is also estimated. Then a response variable is simulated from the precensored multivariate predictor data using a fixed linear relationship and the addition of an error term. The matrices of predictor variables, with censored values replaced by each method, are used to predict this linear relationship by fitting multiple regressions and comparing the regression parameters to the known slope and intercept parameters. The resulting estimates of regression parameters are all compared, along with the estimates of the distributional parameters, for each level of censoring and correlation.

Both the MCMC and EM method, with their associated calculations and comparisons, are implemented by a program that has been written in the free, open-source language R. It is therefore readily accessible and easily modified by a statistician to fit their particular needs. Some examples of easy modifications would be changes to the level of correlation or censoring, the location of the censoring (right and/or left tails), a change in underlying distribution, addition or deletion of comparison methods, addition of statistics, or the addition of multiple detection limits for a single variable and/or between variables. With a little more work, it could be modified to handle multivariate normal distributions of any dimension with any correlation structure. The main program used for the cases presented in this paper can be found in the Appendix.

In summary, this dissertation presents an MCMC method, specifically a Gibbs sampler, and an EM method for estimating the distributional parameters of left-censored data from a bivariate normal distribution. It makes comparisons of these two new methods to the most common substitution methods used on data that originates from lognormal distributions. It also estimates regression parameters for the case when this bivariate data is used in the predictive modeling of a response variable. Independence of the predictor variables is not assumed, and in fact any correlation among predictor variables is used to make better predictions for the censored data values and subsequently the distributional and regression parameters. Uncertainty about the underlying data distribution due to sampling is taken into account and reflected in the mean squared errors of the parameters estimated by the MCMC method. Both methods are easily generalized to most common situations involving statistical analysis of censored data.

1.2. The Specific Analysis that Inspired this Research

The inspiration for this research came from an analysis of the data obtained by the U.S. Environmental Protection Agency in a survey of effluents from approximately 50 wastewater treatment plants (WWTPs) in the United States in 2002 and 2003. The primary purpose of this survey was to determine whether certain types of wastewater treatment plants performed better than others with respect to removal of endocrine disrupting chemicals (EDCs). The eight EDCs chosen for measurement in this particular study were all steroid hormones, of both natural and synthetic varieties. The effectiveness of EDC removal in the WWTPs was to be determined by comparing the levels of vitellogenin produced in fish exposed to the various wastewater samples for 24 hours with the levels produced in control fish. Vitellogenin is an egg precursor protein that is normally expressed only in female fish. An increase or decrease of vitellogenin production in fish exposed to effluent waters, as compared to fish maintained in various control waters, could indicate a potential environmental problem for fish in waters near the discharge of the particular wastewater treatment plant[27].

Additional goals of this survey were to test new and/or improved laboratory methodologies and to compare their results. The fish assay was a new method for measuring the overall estrogenic effect that a cocktail of both estrogenic and anti-estrogenic hormones could have on exposed wildlife. Improved methods for gas chromatography/mass spectrometry (GC/MS) analysis were also used in order to measure the very small concentrations that are common of steroid hormones in a matrix of dirty wastewater. These individual hormone levels were measured in hopes of increasing the understanding of which hormones or combinations of hormones had the greatest effect on vitellogenin production in fish. The eight steroid hormones were measured by this GC/MS procedure in each of 46 subsamples of the wastewater samples. These hormones, along with their abbreviations, are listed in Table 1.1.

Five male and five female fish were also exposed to each wastewater sample for 24 hours under controlled conditions. Another five male and five female fish were exposed to each of several negative and positive control waters for comparison. The positive control had a known concentration of EE2 added to laboratory water while the negative controls had either no chemical addition or the addition only of the solvent used to dissolve the EE2 in the positive control. At the end of the 24-hour exposure time, fish were sacrificed and expression of vitellogenin in the livers was measured by polymerase chain reaction (PCR).

Hormone                  Abbreviation
Estrone                  E1
17-β-Estradiol           E2
17-α-Ethynylestradiol    EE2
Estriol                  E3
Testosterone             Test
Androstenedione          Andro
Dihydrotestosterone      Dihyd
Progesterone             Prog

Table 1.1. Hormones Analyzed in Effluent Survey

There was no statistical plan for the effluent survey at the onset, but the data that had been collected from both the fish assay and the chemical analysis through 2004 was re-analyzed from a statistical perspective in 2006-2007. Since there was interest in modeling vitellogenin levels of the fish in this survey as a function of hormone levels, the matrix of hormone levels was used as an eight-dimensional independent (predictor) variable for various regressions. The vector of vitellogenin levels was the dependent (response) variable. The distributions of the individual hormones and of the vitellogenin levels were all determined to be roughly lognormal. There were also relatively high levels of correlation among some of the hormones, with seven out of 28 sample correlation coefficients above .5 after taking a natural log transformation of the data to produce approximately normal distributions. These levels, based on a sample size of 46 effluents, are shown in Table 1.2.

Hormone 1   Hormone 2   Sample ρ
E1          E2          .75
E1          EE2         .60
E2          EE2         .58
E2          Test        .59
EE2         Test        .65
EE2         Andro       .68
E3          Andro       .53

Table 1.2. Sample Correlation Coefficients Above .5

All of the higher correlations between these variables were positive, but it is conceivable that these could be negative under different circumstances. For instance, if hormone data is collected over a period of time, such as at different processes within a wastewater treatment plant, the levels of estrone (E1) and 17-β-estradiol (E2) could conceivably be negatively correlated due to chemical conversion that can occur between the two.

Because hormone concentrations of interest tend to be very low, on the order of several nanograms per liter, their measurement by GC/MS is pushing the current limits of this technology. This is especially true of hormones that originate in a complex environmental matrix such as wastewater effluent, and must go through a series of solid phase extractions (SPEs) to separate them from the mix of impurities. Consequently, many measurements of the eight hormones in this analysis fell below the method detection limits (MDLs) for the GC/MS method used, which included the SPE preparation steps. For some hormones, the number of measurements above the MDL was so low that the data was not considered usable, but regressions with the remaining hormones showed encouraging levels of the coefficient of multiple determination, R². The proportion of values below the MDL ranged from .20 for estrone to .98 for progesterone. Proportions of values below the MDL are given in Table 1.3 for each of the eight measured hormones.

Hormone   Proportion < MDL
E1        .20
E2        .37
EE2       .48
E3        .78
Test      .74
Andro     .59
Dihyd     .89
Prog      .98

Table 1.3. Proportions of 46 Values Reported as < MDL

This particular study was large and exploratory in nature and the chemical recovery levels for the GC/MS analysis varied too widely to render the results reportable, but it is a good example of the type of testing that is important in the environmental arena, and the data has many characteristics common to data in environmental and health studies. First of all, environmental data can often be approximated by a lognormal distribution. Second, it is typical for the statistician analyzing environmental data to be faced with a decision of how best to handle the measurements that are censored by the fact that they fall below some reportable detection limit. And finally, it is not uncommon for the goal of the research to be an understanding of the combined effect of a mixture of chemicals (pharmaceuticals, contaminants, pollutants, etc.) on a particular biological marker or health endpoint. The purpose of the method developed in this dissertation is to address problems with these common characteristics and to determine whether an appreciable improvement in predictive accuracy can be made over the accuracy of the most commonly used methods for handling the censored data values.

1.3. Simplified Problem

For the purposes of developing and testing the methods presented in this dissertation, certain assumptions must be made about the distribution of the data being considered. Data is then simulated from the postulated distributions, censoring applied to the left tail of the generated data, and various methods applied to these censored observations. The resulting data sets, estimated distributional parameters, and estimated regression parameters are checked for accuracy against the known characteristics of the distributions used to simulate the data and against parameter estimates made on precensored data. The different methods of handling censored observations are also compared to each other with respect to accuracy and variability.

It is assumed that the data comprising the predictor variables comes from lognormal distributions. It is further assumed that a natural log transformation of this data produces a bivariate normal distribution in which the variables are correlated. The Pearson product-moment correlation coefficient is varied from 0 to 0.9 and from −0.9 to −0.1. The data is also censored at a method detection limit (MDL) that is common to both predictor variables. This MDL is chosen such that the percent of censored data points ranges from 10% to 50% in the left tail.

Distributions with each combination of correlation coefficient and censoring level are tested such that there are a total of 95 distributional scenarios to consider. For each of these 95 cases, 100 samples of size n = 25 each are generated. Estimates of distributional parameters and mean squared errors are calculated for each sample and their averages over the 100 samples are compared for the different methods of handling censored data.

In addition to the bivariate predictor variables, it is assumed that there is a univariate response variable that is a linear combination of the predictor variables, with noise added in the form of an error term having a standard normal distribution. Thus, the response variable, Y, is a function of the predictor variable, X = (X₁, X₂), according to the relation

(1.1)   Y = β₀ + β₁X₁ + β₂X₂ + ε

with the error ε ~ N(0, 1). The vector of regression coefficients is assumed to be βᵀ = (β₀, β₁, β₂) = (1, 0.75, 0.25). The errors in equation 1.1 are simulated from a standard normal distribution, then the corresponding values of Y are calculated for each combination of simulated X₁, X₂, and ε according to the equation. After calculating the Y values, the simulated X values are artificially censored and the various methods applied to the censored data points. The resulting data matrices are used to estimate the regression coefficients β₀, β₁, and β₂ and to calculate the mean squared errors of these estimates. The estimates and corresponding mean squared errors are then compared to the known coefficients and to each other.
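A minimal R sketch of one simulated data set under this design, with illustrative variable names and a standard bivariate normal predictor scale assumed for the sketch, might look as follows.

# One simulated data set: correlated bivariate normal predictors, response
# from equation 1.1, then artificial left censoring at a common MDL.
set.seed(1)
n    <- 25
rho  <- 0.5                                     # one of the correlation levels
beta <- c(1, 0.75, 0.25)
Sig  <- matrix(c(1, rho, rho, 1), 2, 2)
X    <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = Sig)
Y    <- as.vector(beta[1] + X %*% beta[2:3] + rnorm(n))   # equation 1.1

p_cens  <- 0.30                                 # target censoring proportion
MDL     <- qnorm(p_cens)                        # common MDL on the normal scale
cens    <- X < MDL                              # TRUE where a value is censored
X[cens] <- NA                                   # reported only as below the MDL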

CHAPTER 2

Literature Review and Analysis

2.1. Overview of Methods for Censored Data

There is an abundance of literature discussing and analyzing the common situation arising when statisticians are called upon to analyze censored data. In the arena of environmental and toxicology data, the censoring is typically in the left tail and is a result of the limitations of measurement technology to accurately measure very low concentrations. The overwhelming majority of the literature on censoring concerns univariate data, and thus direct extensions of the methodology to multivariate data require an assumption of independence among the variables so that the methods can be applied independently to each variable. Comparisons among methods in the literature rarely extend beyond comparison of distributional means and standard deviations, and in fact are often limited to comparisons of means only. Thus most methodology for handling censored data does not consider or attempt to estimate correlations among multiple variables. Nor does it routinely consider estimation of regression parameters for linear regressions on censored data.

Some of the methods developed for analyzing censored data are methods for choosing a "best" estimate of each censored value. These methods result in a complete set of data which can then be analyzed in any way desired by the statistician. Other methods provide estimators of distributional parameters for the data given the known and censored data values, without actually estimating the censored values. While some of these latter methods may perform well for their intended purpose, they cannot be used directly to make inference about the data beyond the parameter estimates themselves. Often these methods do not provide any measure of variability for their resulting estimates.

An overview of the methods available for analyzing censored data is presented here, with an emphasis on ease of use, associated problems and limitations, and range of applicability. Guidelines, recommendations, and rules for choosing methods will also be discussed. Comparisons among methods will be presented as found in the literature. In some cases, these comparisons will be supplemented with theoretical comparisons. Unless otherwise stated, these are methods for univariate data which may be directly extended to multivariate data only if the variables are assumed to be uncorrelated. A summary will be given of the available methods with the needs for more widely applicable methods highlighted.

2.2. Dropping Censored Values

A very straightforward, but mostly eschewed, method for handling censored data is to simply drop all censored values from the data set. There are several obvious problems with this method. First of all, left-censored data that is dropped will result in overestimation of the distribution mean and underestimation of the distribution variance. A method that avoids this bias, called the trimmed mean method, can be used only for estimating the mean of a distribution. This method drops the same number of data points from both tails, thus providing an unbiased estimate of the mean in the case of symmetric distributions. However, symmetric distributions are not common with environmental data, so this method would require a transformation of the data first.

A major problem inherent to any method that involves dropping of data values is the obvious loss in sample size. Rarely are sample sizes so large in practice that this loss can be justified, particularly if the proportion of nondetects is high. The problem is only compounded when multivariate data is considered, because censoring of any one variable can result in an entire data point being discarded. This compounding of data loss can only be avoided if the variables are all assumed to be independent, which is often a poor assumption.

The Winsorized mean and standard deviation are estimates that avoid sample size reduction by dropping an equal number or percent of data values in both tails, then replacing them with the next closest data value. The sample mean and standard deviation are computed from the modified sample and the standard deviation is adjusted upwards to account for the loss of extreme data values. The resulting estimates of mean and standard deviation are referred to as the P% or kth Winsorized mean and standard deviation, having P% censoring or k censored values in each tail. The kth Winsorized estimates are calculated as

(2.1)   μ̂_w = x̄_w = [(k+1)x_(k+1) + x_(k+2) + ⋯ + x_(n−k−1) + (k+1)x_(n−k)] / n

(2.2)   σ̂_w = s(n−1) / (n−2k−1)

where x̄ and s are the sample mean and standard deviation of the modified sample. This method again assumes a symmetric underlying distribution, so skewed distributions require a transformation before it can be applied [3].
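As a concrete illustration of equations 2.1 and 2.2, a short R function for the kth Winsorized mean and standard deviation might look as follows (a sketch; the function name is illustrative).

# kth Winsorized mean and standard deviation per equations 2.1 and 2.2
winsorized <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  x[1:k]           <- x[k + 1]                  # replace the k smallest values
  x[(n - k + 1):n] <- x[n - k]                  # replace the k largest values
  c(mean = mean(x),                             # equation 2.1
    sd   = sd(x) * (n - 1) / (n - 2 * k - 1))   # equation 2.2
}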

2.3. Replacing Censored Values with Constants

Simple substitution methods are probably the easiest and most common approach to dealing with censored data. These methods simply replace each censored value with a constant value. For data that is censored in the left tail and that originates from some type of measurement that cannot have a value below zero, such as measurements of quantity or concentration, underlying distributions are often assumed to be lognormal. For this type of data, any replacement value in the interval from zero to the detection limit (DL) may be chosen. The most commonly used values are 0, DL/2, DL/√2, and DL. Using the extreme values of 0 and DL will obviously bias estimates of the mean by underestimating and overestimating the true mean, respectively. Sometimes DL is used because it is thought to be a conservative estimator of the mean when higher concentrations are associated with adverse effects, as in the cases of pollutants or toxins. However, this practice can be misleading due to the fact that estimates of variance will simultaneously be underestimated. Also, using 0 is not an option if a log transformation will be utilized for analyzing the data. The performances of the middle estimates, DL/2 and DL/√2, are highly dependent on the skewness in the underlying data and on the level of censoring. These substitutions are very simple to use and therefore used widely, despite the fact that they rarely perform as well or as consistently as more complex methods.
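For reference, the simple substitution methods amount to nothing more than the following R sketch (assuming, as an illustrative convention only, that censored values are coded as NA):

# Simple substitution of DL/2 or DL/sqrt(2) for censored values
substitute_dl <- function(x, DL, method = c("half", "sqrt2")) {
  method <- match.arg(method)
  x[is.na(x)] <- if (method == "half") DL / 2 else DL / sqrt(2)
  x
}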

Use of the value DL/2 is attributable to Nehls and Akland of the Environmental Protection Agency (EPA) and is still one of the EPA's recommendations for data with less than 15% nondetects [29][3]. This method inherently assumes a uniform distribution for data that is censored due to a detection limit. Although the method is typically used on data from approximately lognormal distributions, Nehls and Akland claim it is rare that this distribution describes the data well in the left tail, thus rendering the assumption of uniformity, and the corresponding estimate DL/2, reasonable choices for this region. However, they do recommend against computing any statistics from data that has utilized this substitution method if there were originally more than 25% nondetects.

Hornung and Reed of the National Institute for Occupational Safety and Health (NIOSH) later developed the method of substituting DL/√2 for left-censored values from lognormal distributions [22]. This method is based on the theory that a triangular distribution better approximates the shape of a lognormal distribution in the left tail region than a uniform distribution. The triangle is formed by connecting the points (0,0), (DL,0), and (DL, density(DL)). The value DL/√2 then has the property that half of the area in the approximating triangle falls below it and half falls above it. In their paper describing the method, Hornung and Reed compare the method to a maximum likelihood method and the DL/2 simple substitution method. Evaluations are based on comparing the biases in estimating means and standard deviations of simulated data. The distributions evaluated were univariate with fixed geometric mean, exp(μ), of 1.0, geometric standard deviations, exp(σ), of 1.5, 2.0, 2.5, and 3.0, and proportions censored of 0, 0.15, 0.30, 0.45, and 0.60. They concluded that the maximum likelihood method performed best overall and should be used when accuracy of estimates for both means and standard deviations is important. However, they concluded that one of the simpler methods would be sufficiently accurate for most situations and that the choice of which simple substitution method to use should be dependent on the skewness of the distribution. Distributions with lower skewness and/or lower proportions of censored values performed better with their method of substituting DL/√2, while distributions with higher standard deviations and/or higher proportions of censored values performed better with the DL/2 substitution.

A visual representation of the theories behind the simple substitution methods using DL/2 and DL/√2 to replace values censored at DL can be seen in Figure 2.1 for 10% nondetects and in Figure 2.2 for 30% nondetects. Both depict the same three univariate lognormal distributions with parameters μ = 0 and either σ = 1/2 (in green), σ = 1 (in red), or σ = 2 (in blue). These are the means and standard deviations of the corresponding normal distributions; thus the geometric means are all exp(0) = 1 and the geometric standard deviations are exp(1/2) = 1.65, exp(1) = 2.72, and exp(2) = 7.39. Each figure shows how the censored data for the distributions would be approximated by the uniform distribution for the DL/2 substitution method or by a triangular area for the DL/√2 substitution method. The first figure uses a censoring point at 10% of each cumulative distribution while the second uses a censoring point at 30% of each cumulative distribution. It is easy to see from these figures why the DL/√2 method would be the better of the two simple substitution methods for low levels of nondetects and lower values of σ, and why the DL/2 method would be best when the levels of nondetects and the values of σ are higher, especially when the censoring point is to the right of the mode. It can also be readily seen that distributions with higher skew (higher standard deviations) will reach the point of favoring the DL/2 method at a lower level of censoring than distributions with lower skew (lower standard deviations).

Figure 2.1. Comparison of Tails for 10% Nondetects

Figure 2.2. Comparison of Tails for 30% Nondetects

The DL/2 and DL/√2 simple substitution methods can also be compared theoretically. This comparison is shown for the same three distributions that were shown graphically. The actual mean of each truncated tail distribution is calculated along with the numerical estimates for each of the two simple substitution methods. Then the simple substitution method with the lowest bias is determined for each distribution at each of five levels of censoring. Results are summarized in Tables 2.1, 2.2, and 2.3. These theoretical results are in accord with the simulated results of Hornung and Reed and also with the simulated results presented in the results of this paper. Hornung and Reed did not present a theoretical comparison of the methods, but they did present simulation results for lognormal distributions with equivalent normal means of zero and standard deviations of approximately .41, .69, .92, and 1.10 that agree well with the trends apparent in these distributions having standard deviations of .50, 1.00, and 2.00. The results in this paper are primarily for a bivariate distribution with variances of 1.00, but also a nonzero correlation coefficient. Thus they are most similar to the results for the standard normal distribution, but since conditional variances are lowered by correlation, they fall in the range between the distributions having σ = 1 and σ = .50, depending on the level of correlation. Details of the method used for calculating the mean of the tail distribution are given in a later chapter.

Proportion   Detection    Ln(DL)   Normal      Ln(DL/2)   Ln(DL/√2)   Min. Bias
Censored     Limit (DL)            Tail Mean                          Estimate
.10          0.28         -1.28    -1.75       -1.97      -1.63       DL/√2
.20          0.43         -0.84    -1.40       -1.53      -1.19       DL/2
.30          0.59         -0.52    -1.16       -1.22      -0.87       DL/2
.40          0.78         -0.25    -0.97       -0.95      -0.60       DL/2
.50          1.00          0.00    -0.80       -0.69      -0.35       DL/2

Table 2.1. Comparison of Two Simple Substitution Methods for a Lognormal Distribution such that Ln(x) is Distributed N(0,1)

Proportion   Detection    Ln(DL)   Normal      Ln(DL/2)   Ln(DL/√2)   Min. Bias
Censored     Limit (DL)            Tail Mean                          Estimate
.10          0.53         -0.64    -0.88       -1.33      -0.99       DL/√2
.20          0.66         -0.42    -0.70       -1.11      -0.77       DL/√2
.30          0.77         -0.26    -0.58       -0.96      -0.61       DL/√2
.40          0.88         -0.13    -0.48       -0.82      -0.47       DL/√2
.50          1.00          0.00    -0.40       -0.69      -0.35       DL/√2

Table 2.2. Comparison of Two Simple Substitution Methods for a Lognormal Distribution such that Ln(x) is Distributed N(0,1/2)

Proportion   Detection    Ln(DL)   Normal      Ln(DL/2)   Ln(DL/√2)   Min. Bias
Censored     Limit (DL)            Tail Mean                          Estimate
.10          0.08         -2.56    -3.51       -3.26      -2.91       DL/2
.20          0.19         -1.68    -2.80       -2.38      -2.03       DL/2
.30          0.35         -1.05    -2.32       -1.74      -1.40       DL/2
.40          0.60         -0.51    -1.93       -1.20      -0.85       DL/2
.50          1.00          0.00    -1.60       -0.69      -0.35       DL/2

Table 2.3. Comparison of Two Simple Substitution Methods for a Lognormal Distribution such that Ln(x) is Distributed N(0,2)
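The tail-mean calculations behind Tables 2.1 through 2.3 can be reproduced in a few lines of R, using the standard expression E[ln x | ln x < ln DL] = μ − σφ(z)/Φ(z) for the mean of the left tail of a truncated normal (a sketch; the function name is illustrative):

# One row of the theoretical comparison for ln(x) ~ N(0, sigma^2)
# censored at proportion p
tail_compare <- function(p, sigma = 1) {
  z         <- qnorm(p)                         # standardized censoring point
  ln_DL     <- sigma * z                        # Ln(DL)
  tail_mean <- -sigma * dnorm(z) / p            # E[ln x | ln x < ln DL]
  c(ln_DL = ln_DL, tail_mean = tail_mean,
    half  = ln_DL - log(2),                     # Ln(DL/2)
    sqrt2 = ln_DL - log(2) / 2)                 # Ln(DL/sqrt(2))
}
tail_compare(0.10)   # matches the first row of Table 2.1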

2.4. Maximum Likelihood Methods

For data that is assumed to be normally distributed, maximum likelihood methods seek to choose estimates μ̂_ML and σ̂_ML of the parameters μ and σ which maximize the probability that the data came from a N(μ, σ²) distribution. When k of the n data points are censored below a detection limit (DL), the likelihood of the censored normal sample is

(2.3)   L ∝ [Φ((DL − μ)/σ)]^k ∏_{i=k+1}^{n} (1/σ) exp{−½((x_i − μ)/σ)²}

where Φ is the standard normal distribution function. Taking the natural logarithm of 2.3, differentiating with respect to both μ and σ², and equating to zero results in the estimating equations

(2.4)   z = (DL − μ)/σ,
        μ = x̄ − σ (k/(n−k)) [φ(z)/Φ(z)],
        σ² = [s² + (x̄ − μ)²] / [1 + z (k/(n−k)) (φ(z)/Φ(z))]

where φ is the standard normal density function and x̄ and s² are the sample mean and sample variance, respectively, of the n − k data values that are above the detection limit. This system of equations cannot be solved in closed form, so various methods have been devised to facilitate the solution. When the data is assumed to be lognormally distributed, these same methods are used on the log-transformed data, sometimes with factors to correct for transformation bias that occurs when functions of the maximum likelihood estimators (MLEs) are used to estimate means and variances in the original scale.
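Although the system 2.4 has no closed-form solution, it is easy to solve numerically. One simple approach, sketched in R below, is to maximize the logarithm of the censored likelihood 2.3 directly with a general-purpose optimizer (the function name and the log-σ parameterization are choices made here for illustration only):

# Numerical ML for a left-censored normal sample
censored_mle <- function(x_obs, k, DL) {
  nll <- function(th) {
    mu <- th[1]; sig <- exp(th[2])               # log-scale sd keeps sigma > 0
    -(k * pnorm(DL, mu, sig, log.p = TRUE) +     # censored part: k * ln Phi(.)
        sum(dnorm(x_obs, mu, sig, log = TRUE)))  # observed part of 2.3
  }
  th <- optim(c(mean(x_obs), log(sd(x_obs))), nll)$par
  c(mu = th[1], sigma = exp(th[2]))
}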

Some of the types of methods used to calculate maximum likelihood estimators include methods utilizing tables, iterative numerical methods, and restricted likelihood methods. Tabular methods rearrange the estimating equations to some convenient form with one or several unknown quantities that are then filled in with values that are found in tables as a function of some statistics calculated from the data at hand. Iterative methods begin with reasonable guesses for μ and σ², then calculate the likelihood with those guesses, make new guesses for μ and σ² that will increase the likelihood, and stop when the increase in likelihood is smaller than some predefined amount. Restricted likelihood methods make some simplifying assumption that results in the ability to directly solve the estimating equations.

One of the more commonly used tabular methods is Cohen's MLE method, which fits the best lognormal distribution to data based on probabilities that the data came from a truncated distribution, or from a censored distribution of either Type I (fixed censoring point, as with detection limits) or Type II (fixed number of censored observations). This method requires looking up a parameter λ̂ in a table, or on a graph, as a function of the proportion of nondetects, k/n, and of s²/(x̄ − DL)², where x̄ and s² are the sample mean and sample variance of the n − k uncensored observations. The parameter λ̂ is then used to calculate the MLE estimates from the rearranged estimating equations:

(2.5)   σ̂²_MLE = s² + λ̂(x̄ − DL)²,   μ̂_MLE = x̄ − λ̂(x̄ − DL)
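Once λ̂ has been read from the table, equations 2.5 are immediate; a minimal R sketch follows, with λ̂ supplied by the user since the table lookup itself is not reproduced here (function name illustrative):

# Cohen's estimates from equations 2.5, given a tabulated lambda-hat
cohen_est <- function(x_obs, lambda_hat, DL) {
  xbar <- mean(x_obs)
  s2   <- var(x_obs)
  c(mu     = xbar - lambda_hat * (xbar - DL),
    sigma2 = s2 + lambda_hat * (xbar - DL)^2)
}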

Cohen claims that this method will lead to estimates identical to those obtained by many other MLE methods with the possible exception of calculation error. He also claims that they are consistent and asymptotically efficient, as are MLEs in general, and that they are recommended for sample sizes that are at least greater than 10[23][24].

Persson and Rootzén developed a restricted maximum likelihood (RML) method to maximize a Type I censored maximum likelihood. The premise behind this method is that the number of observations above (or below) the censoring point, or detection limit, is binomially distributed. Thus the proportion of censored values, k/n, is a reasonable estimate of the normal distribution function at the DL. Using this estimate gives

(2.6)   Φ⁻¹(k/n) = (DL − μ)/σ = z

so each instance of z in the estimating equations can be replaced by the restriction, or estimate, Φ⁻¹(k/n). With this substitution, the estimating equations become explicitly solvable for μ̂_RML and σ̂_RML. However, while these estimates are consistent and asymptotically normally distributed, they are not unbiased for censored samples as they are for uncensored samples. Persson and Rootzén include an adjustment to correct for the bias, which is especially needed for higher proportions of censoring since the bias increases with the censoring. The adjusted RML estimators are asymptotically normal with asymptotic efficiencies of >99.6% and >98% for μ̂ and σ̂ respectively over all parameter values[31].

The most commonly used iterative method for maximizing the complicated likelihood functions associated with incomplete data sets is the expectation-maximization algorithm of Dempster, Laird, and Rubin. As stated in their introduction, it is remarkable in its theoretical simplicity and its wide range of application. The way that it works is simply to iterate through steps that calculate an expectation or expectations for any missing data, then maximize the likelihood given the current complete data set, as if all data had been fully observed. The authors prove that the resulting algorithm will increase the likelihood with each successive iteration unless it reaches a maximum, that a maximum-likelihood estimate is a fixed point of an EM algorithm, and the conditions under which the EM algorithm converges to the ML estimate, and they even compute rates of convergence[9]. In addition to finding the ML estimates μ̂_ML and σ̂_ML of μ and σ, an expectation-maximization algorithm can also be used to simultaneously estimate other parameters such as correlation coefficients, ρᵢ, and the best Box-Cox transformation coefficient, λ. Shumway, Azari, and Johnson use this method to solve for μ̂_ML and σ̂_ML by using a likelihood equation that includes the Jacobian for a Box-Cox transformation with unknown coefficient λ. At each step of the algorithm, they maximize the log likelihood for fixed λ to get estimators for μ and σ, then scan these estimators over λ to get the final set of estimators for μ, σ, and λ[34]. Freeman and Modarres extend this method to the multivariate case in which all variables may require a Box-Cox transformation to achieve normality, but only the response variable is subject to left censoring[10]. Both of these methods estimate means in the original scale by using the invariance property of MLEs to calculate E(x) as a function of the parameters μ and σ. For example, when the original scale is lognormal, λ = 0 and E(x) = exp(μ + ½σ²), so the MLE estimate of E(x) is:

(2.7)   Ê(x)_ML = exp(μ̂_ML + ½σ̂²_ML)

Although μ̂_ML and σ̂_ML are unbiased, Ê(x)_ML is not. One advantage of the EM algorithm over other MLE methods, which these authors do not exploit, is the fact that it results in a full data set with estimates for each censored observation. This data set can be transformed to the original scale before calculating statistics, such as the sample mean and variance, that avoid the problem of transformation bias, which results from the fact that expected values in the original scale are not linear combinations of the parameters estimated in the normal scale.

Maximum likelihood methods generally compare favorably with other methods used to calculate distributional parameters of censored data. Their main drawbacks are bias problems due to both censoring and to calculation of expectations in the original scale when the original distribution is not normal. However, methods for correcting these biases do exist. Another drawback to many of the methods is that they do not address variances of the estimators. And finally, most apply only to single-variable distributions. Only iterative solutions, such as the expectation maximization algorithm, can be readily modified for use with multivariate data.

28 2.5. Regression Methods.

Regression on order statistics (ROS), or equivalently probability plotting, can be used to estimate μ and σ for data from a normal distribution or for any data that is distributed normally after a transformation. The basic premise is that any variable x that comes from a normal distribution can be expressed as

(2.8)   x = μ + σz

where z is the z-score of the normal distribution. In ROS methods, the observations are first ordered such that

(2.9)   x₁ < x₂ < … < x_k < … < x_n

then zᵢ is estimated by Φ⁻¹(pᵢ) for each uncensored observation x_{k+1} to x_n, where pᵢ is an estimate of the probability that any x from a sample of size n will be less than the value of the ith order statistic. This estimate, called the plotting position or adjusted rank, can vary by method. A commonly used estimate is Blom's formula:

(2.10)   pᵢ = (i − 3/8) / (n + 1/4)

The resulting linear relationship

(2.11)   xᵢ = μ + σΦ⁻¹(pᵢ) + εᵢ,   εᵢ ~ iid N(0, σ²_ε),   i = k+1, …, n

is solved by least squares regression to obtain the estimates μ̂_ROS and σ̂_ROS [30]. A more robust version of ROS begins as above, but goes a step further to improve the estimates of μ and σ. In this method, values are predicted for each of the censored observations from the estimates μ̂_ROS and σ̂_ROS by extrapolating the regression to ranks i = 1 to k. The resulting complete data set is then used in the same manner to obtain revised estimates for μ and σ. The complete data set can also be transformed back to the original scale for calculation of sample means and variances in order to avoid transformation bias [20].
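A compact R sketch of the basic ROS fit using Blom's plotting positions 2.10 might look as follows (function name illustrative):

# Basic ROS fit; x_obs are the n - k detected values, with k values
# censored below all of them
ros <- function(x_obs, k) {
  n   <- length(x_obs) + k
  i   <- (k + 1):n                              # ranks of the detected values
  p   <- (i - 3/8) / (n + 1/4)                  # Blom's formula 2.10
  fit <- lm(sort(x_obs) ~ qnorm(p))             # relationship 2.11
  c(mu = unname(coef(fit)[1]), sigma = unname(coef(fit)[2]))
}
# The robust version extrapolates the fitted line to ranks 1..k to impute
# the censored values, then recomputes the sample mean and sd.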

2.6. Other Methods

Many other methods have been proposed and compared, but most either performed poorly in comparison to the more commonly reported methods or are very similar theoretically. Nonparametric methods exist, but do not seem to be very popular. In fact, one of the most recent and complete comparisons of methods, by Hewett and Ganser, did not find the nonparametric methods that they compared to be as robust as expected[21]. Some other examples of poor performers were estimates of μ and σ that used only uncensored data and assumed a truncated normal distribution with truncation at the DL, and a consideration of all linear combinations

(2.12)   G = ∑ᵢ Gᵢxᵢ

(2.13)   H = ∑ᵢ Hᵢxᵢ

of the ordered uncensored data values such that the expected values were μ or σ, respectively[14]. An example of a method very similar theoretically to one described previously is substitution of all censored values with values that are uniformly distributed from 0 to DL [20]. This method should perform essentially the same as the simple substitution with DL/2. The main difference is that it is less amenable than the simple substitution to being extended to a multivariate scenario, since ordering of one variable relative to another could be important.

2.7. EPA Guidelines

The U.S. Environmental Protection Agency gives some general guidelines for how to handle data with nondetects in the document EPA QA/G-9S, Data Quality Assessment: Statistical Methods for Practitioners [3]. This is a 2006 update of the document EPA QA/G-9 in which the recommendations have been slightly modified, but some of the explanations of methodology have been removed. Thus the two documents should be read in tandem, with a warning that the original version has mistakes in the subscripts under the section describing calculation of the Winsorized mean and standard deviation. Since this description was deleted in the updated document, but the method is still listed as a recommendation, one should refer to the correct description above in the section on dropping censored values.

The general guidelines provided by the EPA give choices for data analysis based on the percent of nondetects present in a data sample. All of the methods are fairly straightforward for use by scientists who are not statisticians. The assumptions underlying each method are stated, along with recommendations to consult a statistician when they are not met. The recommended methods for 0 to 15% censoring are simple substitutions of 0, DL/2, or DL, or the use of Cohen's method, which is described above under maximum likelihood methods. For 15 to 50% censoring, Cohen's method is again recommended, along with the trimmed mean or the Winsorized mean and standard deviation, both of which are described above in the section on dropping censored values. Finally, for cases of 50 to 90% censoring, replacement methods are not recommended at all. Instead, the recommendation is to do only tests of proportions to test hypotheses regarding percentiles of the distribution at levels above the proportion of censoring.

While the EPA recommendations are easy to use, they all have drawbacks as detailed in the sections above. Moreover, there are no recommendations for multivariate censored data. Due to the importance of the underlying assumptions and the need for data transformation in most cases, a statistician should always be involved in defining the plan for analyzing censored data samples.

32 2.8. Extensions to Multivariate Distributions

Few studies have been done on censored data from multivariate distributions and none were found that allowed for possible censoring on more than one of the variables. Many of the methods reviewed could be extended to multivariate situations so long as the variables are assumed to be independent and the methods are then applied to each variable separately. However, it is not uncommon to have measurements of multiple chemicals, all of which could possibly be related.

One example is with any type of water quality monitoring, where levels of various chemicals and/or pollutants can be related either directly, as in the case of chemical conversions or reactions, or indirectly, as a function of common latent variables such as temperature or processing parameters in a treatment facility. Another example would be when a health response is measured in a person or animal that depends on exposure to a cocktail of chemicals or pollutants, some or all of which could be censored. In this case, the response itself is less likely to be censored than the individual exposure levels. Methods are needed to handle these situations such that possible correlation amongst variables is considered and can even be predicted from censored data. In the case of modeling a response as a function of multiple predictors, it is important to account for the correlation amongst predictors accurately, as estimates of a response may be affected by this correlation.

33 Some of the methods above obviously lend themselves better to extension be- yond univariate scenarios than others. Simple substitutions could be used easily.

As with the univariate cases, they may provide satisfactory results for some cases, particularly if the proportion of nondetects is low, but will be biased and probably not robust. Dropping censored data would be a very poor option since it would require dropping an entire data point in any case where one or more variables are censored, thus increasing the amount of dropped data, with the expected proportion dropped as high as 1 − ∏_i q_i, where q_i = 1 − p_i and p_i is the proportion of the data censored for variable Y_i. Both simple substitution and data-dropping would also be limited to estimating distribution means and variances, with the assumption of zero correlation amongst variables and no way to estimate the actual correlation. Finally, the theory behind regression on order statistics and many of the MLE methods would be extremely difficult to extend to a multivariate data sample.

Three related studies were found that specifically addressed estimating parameters from multivariate censored distributions. The first study, by Lyles, Williams, and Chuachoowong, was the only one that allowed censoring of more than one variable [28]. In it, a maximum likelihood method was solved by numerical approximation, without data augmentation, to estimate the correlation between two viral load assays. Comparison of multiple methodologies like this is an important application, but since the method used here does not result in a complete data set, it cannot be extended to different types of applications requiring estimation of different parameters. Extension to comparisons of more than two assays could be achieved by applying the method in a pairwise fashion to obtain the correlation between any two of them.

Austin and Hoch considered a bivariate predictor variable X with a linear relationship to a response variable Y of the form Y_i = β₀ + β₁X_{1i} + β₂X_{2i} + ε_i, where ε_i ∼ N(0, σ²), and varying proportions of the X₁ values were censored in the right tail at a ceiling level c. They considered five methods for handling the censored values, then looked at their effect on bias of the estimated parameters β₁ and β₂ (β₀ was held constant). Distributional parameters for X were not estimated. The first method, "ignoring the ceiling effect", probably refers to replacement of the censored values with the ceiling value c. The second method was to perform the regression only on the observations that were not subject to censoring. This method resulted in low bias in estimation of β, even though it is known to produce highly biased estimates of the distributional parameters. The remaining methods involved replacing censored values with their expected order statistics, the expected value of a truncated distribution, and an estimation based on the full likelihood. The likelihood method was not detailed, but since it had to result in data augmentation in order to obtain a full set of data for regression, it is likely that the EM algorithm was used. Estimates of β from the full-likelihood method were unbiased and also robust to misspecification of the underlying data distribution [4].

Freeman and Modarres used an EM algorithm to estimate the distributional parameters of water quality monitoring data. The data distribution was bivariate and skewed to the right. The first variable, Biochemical Oxygen Demand (BOD), included some measurements that were censored below a detection limit. The second variable, Total Suspended Solids (TSS), was not censored. The method also computed likelihoods for nine combinations of Box-Cox transformation parameters in order to choose the optimal set of transformations for the data. The authors looked at the effect of an incorrect transformation on the estimation of parameters, but did not compare the method to any other methods or use it on simulated data with a known underlying distribution. Estimates of mean and variance were calculated for the variables in their original scale, along with confidence intervals [10].

2.9. Summary

The methods that tend to perform best for estimating distributional parameters when the shape of the underlying distribution is known are the various maximum likelihood methods. However, adjustments to ML estimators may be necessary to account for various sources of bias. If the underlying distributional shape is unknown but can be found by optimizing over Box-Cox transformation parameters, MLEs can still be obtained with the expectation-maximization algorithm. Regression on order statistics has also been found to perform well, and using ROS after a lognormal transformation of the data is considered fairly robust even to misspecification of the data distribution [20]. Simpler methods that involve dropping censored data or substituting constant values tend to provide reasonable estimates of distributional parameters only when the proportion of nondetects is low and the underlying data distribution lines up well with the assumptions of the particular method.

The types of methods that are most flexible for estimating a variety of parameters or statistics are those that provide estimates for each censored data value. These "fill-in" methods result in a complete data set that can then be analyzed in any way that the statistician desires. Included in this category are all simple substitution methods, the expectation-maximization method, and the robust version of regression on order statistics. Another advantage of these methods, beyond their inherent flexibility, is the fact that their resulting complete data sets can be transformed back to the original data scale, treated as an uncensored sample, and used to calculate statistics in the original scale, such as sample mean and variance, that avoid transformation bias.

Extension of most methods to multivariate data distributions requires an assumption of independence among the variables that may not be realistic. A few methods could theoretically be extended to a correlated multivariate case, but only with great difficulty. Of the methods reviewed, only the expectation-maximization algorithm lends itself well to correlated multivariate extension, as well as to most other complicating data situations. Another possible category of methods for estimating parameters and statistics of correlated multivariate data is Markov Chain Monte Carlo (MCMC). This type of approach to estimating data is well-documented, but no literature was found where it had been compared to any of the other methods that are commonly used for censored environmental and toxicological data, so its relative performance in these situations is unknown. Both EM and MCMC will be explored in detail in this dissertation. EM and MCMC methods will be developed for use on multivariate censored data distributions with possible correlation. These methods will be applicable to any underlying distribution that is transformable to a normal distribution.

CHAPTER 3

Gibbs Sampling Theory and Method Development

The first method developed here for multivariate censored distributions is a Gibbs sampler. This is a specific type of Markov Chain Monte Carlo simulation used for estimating the posterior distribution in a problem of Bayesian inference when this distribution involves an integral that cannot be evaluated analytically. The Gibbs sampler iteratively makes random draws from the full conditional distributions of each unknown parameter or missing data point given all other parameters and data values. The resulting Markov chains then converge to approximate random draws from the joint target distribution of unknowns. In the case of a multivariate left-censored data set that has been transformed to a multivariate normal distribution, the unknowns are the mean vector μ, the variance-covariance matrix Σ, and the censored data values x_{ij} where x_{ij} < DL. Estimates of the unknowns are averages over their simulated states in the chain.

3.1. Bayesian Inference

3.1.1. General Theory

In Bayesian statistics, all inference about unknown parameters comes from the posterior distribution, which is the joint distribution of the unknown parameters θ given the data, p(θ | x). This distribution is found from a prior distribution p(θ), which formally expresses any beliefs about the distribution of the unknown parameters before collecting data, and the likelihood p(x | θ) of the data x. In order to calculate the posterior distribution, we can first express the joint distribution of the data and the parameters in two different ways, both using Bayes' Rule:

(3.1) p(θ, x) = p(θ) p(x | θ)

and

(3.2) p(θ, x) = p(x) p(θ | x)

Combining these two expressions gives

(3.3) p(θ | x) = p(θ) p(x | θ) / p(x)

where p(x) is the marginal distribution of x:

(3.4) p(x) = ∫ p(θ, x) dθ = ∫ p(θ) p(x | θ) dθ

Since p(x) is not dependent on θ, it can be viewed simply as a normalizing constant such that

(3.5) p(θ | x) ∝ p(θ) p(x | θ)

and the unnormalized posterior density can be found from just the prior distribution of the parameters p(θ) and the likelihood function of the data p(x | θ).

Prior distributions p(θ) can be chosen to best represent prior knowledge about θ, or they can be chosen for mathematical convenience. If they are reasonable and include all plausible values of θ, the information contained in the data will typically have the greater influence on the posterior distribution. In general, the mean of the posterior distribution is somewhere between the mean of the prior distribution and the sample mean. The larger the sample size, the closer it will be to the sample mean. Additionally, the posterior variance of θ will, on average, be smaller than the prior variance.

A mathematically convenient type of prior distribution is a conjugate prior distribution. A conjugate family of distributions for a particular class of sampling distributions is one that results in a posterior distribution from the same family as the prior distribution. Conjugate families are convenient in that the form of the posterior distribution is known. Therefore, if the sampling distribution has a known conjugate form, as will all probability distributions of the exponential class, one might wish to choose this form with a mean and variance that best express any prior knowledge about the centering and spread of the possible parameter values.

Another convenient type of prior distribution, especially when there is little or no prior knowledge about θ, is a noninformative prior. A noninformative prior can be a very diffuse prior of some conventional form, but with very large variance, or it can be completely flat such that p(θ) is proportional to a constant on the interval (−∞, ∞). In the case of the flat prior, it is not an actual probability density because it does not integrate to one, and thus is termed improper. However, there are many cases where such an improper prior can be used to define an unnormalized posterior density, as in 3.5 above, and will still result in a proper posterior density. When used with care, these priors can be good choices for situations where no prior data exists.

3.1.2. Application to Multivariate Normal Data

For a d-dimensional multivariate normal random variable x, the probability density function is

(3.6) f(x | μ, Σ) = (2π)^{-d/2} |Σ|^{-1/2} exp{−(1/2)(x − μ)′ Σ^{-1} (x − μ)}

where the parameters θ are now the mean vector μ and the covariance matrix Σ.

The likelihood for an uncensored sample of size n from this distribution is then:

(3.7) L(μ, Σ | x) = (2π)^{-nd/2} |Σ|^{-n/2} exp{−(1/2) ∑_{i=1}^n (x_i − μ)′ Σ^{-1} (x_i − μ)}

The multivariate normal distribution with unknown mean and variance has a conjugate joint prior density

(3.8) p(μ, Σ) ∝ |Σ|^{-((ν₀+d)/2+1)} exp{−(1/2) tr(Λ₀ Σ^{-1}) − (κ₀/2)(μ − μ₀)′ Σ^{-1} (μ − μ₀)}

which can be summarized as

(3.9) p(μ, Σ) = p(μ | Σ) p(Σ)

where

(3.10) Σ ∼ Inv-Wishart_{ν₀}(Λ₀^{-1})

(3.11) μ | Σ ∼ N(μ₀, Σ/κ₀)

and the hyperparameters ν₀, Λ₀, μ₀, and κ₀ are the degrees of freedom and the scale matrix for the inverse-Wishart distribution of Σ, the prior mean, and the number of measurements on the Σ scale, respectively. Using this prior density with hyperparameters chosen from previous data would result in a posterior density of the same form. The new parameters in the posterior density are a function of both the prior parameters and the data as follows:

(3.12) μ_n = (κ₀/(κ₀ + n)) μ₀ + (n/(κ₀ + n)) x̄

(3.13) κ_n = κ₀ + n

(3.14) ν_n = ν₀ + n

(3.15) Λ_n = Λ₀ + S + (κ₀n/(κ₀ + n)) (x̄ − μ₀)(x̄ − μ₀)′

In the case where there is no prior data, the noninformative multivariate Jeffreys prior density,

(3.16) p(μ, Σ) ∝ |Σ|^{-(d+1)/2},

can be used instead of the conjugate prior. This prior is the limiting density of the conjugate prior above as κ₀ → 0, ν₀ → −1, and |Λ₀| → 0. It results in the posterior density form of 3.9 with

(3.17) Σ ∼ Inv-Wishart_{n−1}(S)

(3.18) μ | Σ ∼ N(x̄, Σ/n)

where this posterior is dependent only on the data. The d × d matrix S is the sum of squares about the sample mean:

(3.19) S = ∑_{j=1}^n (x_j − x̄)(x_j − x̄)′

(3.20) (x_j − x̄)′ = [(x_{1j} − x̄₁), (x_{2j} − x̄₂), ..., (x_{dj} − x̄_d)]

This noninformative prior is used to develop a multivariate Gibbs sampler method that can be used with any multivariate normal data [11].

When the data set is censored, with some values x_{ij} < DL, i = 1, ..., d, j = 1, 2, ..., n, the likelihood function becomes much more complicated, making it impossible to directly compute the posterior distribution. For a bivariate normal distribution, the censored likelihood has the general form

(3.21) L(μ, Σ | x) = ∏_{x_{1j}, x_{2j} ≥ DL} f₁(x) ∏_{x_{1j} ≥ DL > x_{2j}} f₂(x) ∏_{x_{1j} < DL ≤ x_{2j}} f₃(x) ∏_{x_{1j}, x_{2j} < DL} f₄(x)

where

(3.22) f₁(x) = f(x₁, x₂ | μ, Σ)

(3.23) f₂(x) = f(x₂ < DL | μ, Σ, X₁ = x₁) f(x₁)

(3.24) f₃(x) = f(x₁ < DL | μ, Σ, X₂ = x₂) f(x₂)

(3.25) f₄(x) = Pr(x₁ < DL, x₂ < DL)

Combining this likelihood with the multivariate Jeffreys prior density does not lead to a nice form for the joint posterior distribution p(μ, Σ | x). However, the Gibbs sampler provides a way to sample from this posterior distribution when all full conditional probabilities are known, as will be shown below.

3.2. Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC) methods are algorithms that produce a random sample from a probability distribution by setting up a Markov chain with this distribution as its stationary or target distribution. Once the chain reaches a steady state, future states are used as the random sample. The length of the transitional period required to reach steady state, commonly called the burn-in period, will depend on the starting values of the unknowns.

In the Bayesian paradigm, inferences about the parameters θ of a distribution, or functions of these parameters, are calculated as expectations under the posterior distribution p(θ | x). For example, if we are interested in the expected value of the function f(θ), we calculate:

(3.26) E[f(θ)] = ∫ f(θ) p(θ | x) dθ

Often, this integral is analytically intractable, but MCMC provides a way to approximate expectations with simulations. This requires construction of an ergodic chain in θ with stationary distribution p(θ | x). Beginning with an initial state θ^(0), N states are simulated, each dependent only on the state immediately prior. Under these conditions, the average value of the desired function over the N states approaches the desired expectation as N → ∞, so that we may use

(3.27) (1/N) ∑_{i=1}^N f(θ^(i)) ≈ ∫ f(θ) p(θ | x) dθ

as an approximation of E[f(θ)].

There exists a central limit theorem for Markov chains that says

(3.28) √N (θ̂_N − θ) →_D N(0, σ²) as N → ∞,

where θ is the expectation of interest, as in 3.26, and θ̂_N is the approximation in 3.27. The variance σ² of the limiting distribution is

(3.29) σ² = var[f(θ^(i))] + 2 ∑_{k=1}^∞ cov{f(θ^(i)), f(θ^(i+k))}

which can also be approximated more easily from the simulated states of the posterior distribution. If the sample size n is large, then

(3.30) √n (θ̂_n − θ) ∼ N(0, σ²)

and σ² can be approximated by n(θ̂_n − θ)². If the number of simulated states N is also large relative to the sample size, such that 1 ≪ n ≪ N, then σ² is approximately n(θ̂_n − θ̂_N)² and we can use the estimate

(3.31) σ² ≈ (1/N) ∑_{i=1}^N n(θ̂_n^(i) − θ̂_N)²

to approximate the variance of the expectation. In this case, θ̂_n^(i) is the estimate for a sample of size n and θ̂_N is the average of the estimates over N steps [12].

3.3. Gibbs Sampler Algorithm

3.3.1. General Methodology

The Gibbs sampler is an MCMC method that iteratively draws values of each unknown parameter conditional on all others. Each iteration cycles through the unknowns, drawing one value for each and conditioning on the most recent value of all remaining unknowns. For example, if the target distribution is the joint posterior distribution p(θ₁, θ₂, θ₃ | x), then the full conditionals are:

(3.32) p(θ₁ | θ₂, θ₃, x)

(3.33) p(θ₂ | θ₁, θ₃, x)

(3.34) p(θ₃ | θ₁, θ₂, x)

The parameter values generated in N iterations of the Gibbs sampler constitute a sample of size N from the joint posterior distribution. The algorithm proceeds as follows:

- Begin with some initial values θ₁^(0), θ₂^(0), θ₃^(0).
- For i = 1, 2, ..., N:
  - Draw θ₁^(i) from p(θ₁ | θ₂^(i−1), θ₃^(i−1), x)
  - Draw θ₂^(i) from p(θ₂ | θ₁^(i), θ₃^(i−1), x)
  - Draw θ₃^(i) from p(θ₃ | θ₁^(i), θ₂^(i), x)

In practice, the full conditional distributions for the unknowns are often known even when the joint posterior distribution has an unconventional or messy form.

Another advantage of the method is that any missing or censored data point can be treated as an unknown in the same way as unknown parameters. For example, an additional unknown, a missing data value xij, added to the previous example, will have the full conditional distribution:

(3.35) p(xi 1; 2; 3; xik) k = j j 6

Further, if it is known that xij < DL, the full conditional becomes:

(3.36) p(xi 1; 2; 3; xik; xi < DL) k = j j 6

3.3.2. Gibbs Sampling from the Posterior Distribution of a Multivariate Normal Censored Data Set

This method will be specifically developed for a d-dimensional multivariate normal data set with left censoring at the detection limit, DL. It will then be tested on simulated bivariate data, but the development is not dependent on a distribution with only two variables. Minor changes are needed for the cases of right censoring or interval censoring, but these can easily be accommodated with the same basic theory.

The first step in developing a Gibbs sampling algorithm is to identify the unknowns. We are assuming that both the mean and variance of the data set are unknown and also that some of the observations are censored such that it is only known that they are below the detection limit. Thus in the bivariate case, we have the following unknowns:

(3.37) μ = (μ₁, μ₂)′

(3.38) Σ = [ σ₁² σ₁₂ ; σ₁₂ σ₂² ]

(3.39) x_{ij}, i = 1, 2; j = 1, 2, ..., n such that x_{ij} < DL

The next step is to determine the full conditionals for each unknown. For μ and Σ this is very straightforward. If we assume a multivariate Jeffreys prior, as in 3.16, then the full conditionals will simply be the posterior distributions of 3.17 and 3.18, since they are conditional on the data x. For the censored data values, we need to derive the full conditional distributions

(3.40) f(x_{ij} | μ, Σ, x_{kl}, x_{ij} < DL), where i, k = 1, 2, ..., d; j, l = 1, 2, ..., n such that kl ≠ ij

separately for each censored value. We will proceed by conditioning on successively more of the parameters and data.

The first step in deriving the full conditional distributions of 3.40 is to consider that the distribution conditioned on just the mean and variance is simply the marginal distribution of x_i, which is distributed N(μ_i, σ_i²). Adding the conditioning on the remaining data values, x_{kl}, is equivalent to conditioning only on the other variable values at the same data point such that x_{kl} = x_{kj}. For a multivariate normal distribution, this again results in a univariate normal conditional distribution. Without loss of generality, assume that μ and Σ are ordered such that the variable of interest is x₁. Now partition the parameters such that

(3.41) μ = [ μ₁ ; μ₂ ], with blocks of dimension 1 × 1 and (d−1) × 1

(3.42) Σ = [ Σ₁₁ Σ₁₂ ; Σ₂₁ Σ₂₂ ], with blocks of dimension 1 × 1, 1 × (d−1), (d−1) × 1, and (d−1) × (d−1)

where Σ₁₁ = σ₁². Then the conditional distribution f(x_{1j} | μ, Σ, x_{2j}) is N(μ_{1|2}, σ²_{1|2}) with

(3.43) μ_{1|2} = μ₁ + Σ₁₂ Σ₂₂^{-1} (x₂ − μ₂)

(3.44) σ²_{1|2} = Σ₁₁ − Σ₁₂ Σ₂₂^{-1} Σ₂₁

which reduces to

(3.45) μ_{1|2} = μ₁ + ρ(σ₁/σ₂)(x₂ − μ₂)

(3.46) σ²_{1|2} = σ₁²(1 − ρ²)

when the original data distribution is bivariate.

The final level of conditioning is the censoring of the variable. When the variable is conditioned on the fact that it comes from the left tail of the distribution, below the detection limit, the resulting distribution is a truncated normal distribution, with the mean and variance as in 3.43 and 3.44 (3.45 and 3.46 in the bivariate case). This is a right-truncated normal distribution with the right truncation at the detection limit. Draws from this truncated distribution can be simulated by an inverse cumulative distribution function (CDF) method, as sketched below.
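A minimal R sketch of one such draw, assuming the conditional mean and standard deviation have already been computed (the function and argument names are illustrative, not taken from the appendix program):

# One draw from N(mu, sd^2) truncated on the right at DL, by inverse CDF.
rtrunc_right <- function(mu, sd, DL) {
  p <- pnorm((DL - mu) / sd)   # probability mass below the detection limit
  v <- runif(1, 0, p)          # uniform draw over that mass
  mu + sd * qnorm(v)           # map back through the normal quantile function
}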

3.3.3. Specific Method for Estimation of Distributional and Regression Parameters from Multivariate Normal Data with Left Censoring

The Gibbs sampling method developed here can be used on any multivariate normal data with left censoring of any or all variables at the same detection limit. Future modifications could easily allow for multiple detection limits, either between or within variables. It is assumed here that the original distribution was lognormal, which is only relevant in calculations for some of the comparisons to other methods, the simplest of which are limited to use on lognormal distributions or at least distributions with zero probability for x < 0. Comparisons of the method against other methods are performed on samples drawn from bivariate normal distributions, but the general method is not limited to the bivariate case.

The prior distribution used is the noninformative multivariate Jeffreys prior, as described previously.

The Gibbs sampler method and simulation procedure used for testing the method are implemented with an R program, included in the Appendix. The general procedure is as follows:

(1) Draw 100 samples of size n = 25 each from bivariate normal distributions with μ = (0, 0), Σ = [1 ρ; ρ 1], and every combination of ρ = 0, ±0.1, ±0.2, ..., ±0.9 and proportion of censoring = 0.1, 0.2, ..., 0.5. The resulting x matrices will each be of dimension 25 × 2.

(2) Draw a random error vector of size 25 × 1 for each sample from a N(0, 1) distribution. Use the vector to calculate y = xb + error.

(3) Calculate all desired statistics from the uncensored samples for reference.

(4) For each sample, determine a detection limit, DL, or censoring point, that corresponds to the given combination of proportion censored and correlation coefficient, ρ. Use the equicoordinate quantile function such that the proportion of censoring includes data points for which either or both of the variables are censored.

(5) Set up a status matrix of the same dimensions as x for each sample. For every x_{ij} in the sample, if x_{ij} < DL, then status[i, j] = 1; otherwise status[i, j] = 0.

(6) For each x_{ij} < DL in the sample, replace x_{ij} with a starting value x_{ij}^(0) equal to half the equivalent censoring level in the lognormal scale. Specifically, set x_{ij}^(0) = ln((1/2) exp(DL)) for all x_{ij} such that status[i, j] = 1. Calculate all desired statistics from the resulting data for comparison.

(7) For each x_{ij} < DL in the sample, replace x_{ij} with a value equal to 1/√2 times the equivalent censoring level in the lognormal scale. Specifically, set x_{ij} = ln((1/√2) exp(DL)) for all x_{ij} such that status[i, j] = 1. Calculate all desired statistics from the resulting data for comparison.

(8) Begin MCMC iterations with the x_{ij}^(0) values generated in item 6 above as the starting values for censored data points. Set the number of iterations to the desired value N plus the desired number of burn-in iterations. Final values chosen for this method with bivariate data and the designated starting values are N = 1000 with 200 burn-in iterations, for a total of 1200 MCMC iterations. Within each MCMC iteration, numbered k = 1 to 1200 (a one-iteration sketch in R follows this list):

(a) Calculate x̄ = (x̄₁, x̄₂) and S = ∑_{j=1}^n (x_j − x̄)(x_j − x̄)′ from all uncensored x_{ij} and the current x_{ij}^(k−1) values of censored data.

(b) Use x̄ and S to randomly draw a Σ^(k) matrix from Inv-Wishart_{n−1}(S), then a μ^(k) vector from N(x̄, Σ/n).

(c) Update each censored data value to x_{1j}^(k) as follows:

(i) Calculate μ_{1|2} and σ²_{1|2} from 3.45 and 3.46 using current values of μ^(k), Σ^(k), and x_{2j}^(k−1).

(ii) Find the probability of x_{1j} < DL given that x_{1j} is distributed N(μ_{1|2}, σ²_{1|2}). This is equal to Φ((DL − μ_{1|2})/σ_{1|2}), where Φ is the standard normal cumulative distribution function.

(iii) Draw a random probability v from the uniform distribution U(0, Φ((DL − μ_{1|2})/σ_{1|2})). Then Φ^{-1}(v) is the z-score of a random draw from the left tail of N(μ_{1|2}, σ²_{1|2}), below DL.

(iv) Calculate the updated value x_{1j}^(k) = μ_{1|2} + σ_{1|2} Φ^{-1}(v) as a random draw from the censored tail of the current conditional distribution.

(d) Update each censored data value to x_{2j}^(k) as follows:

(i) Calculate μ_{2|1} and σ²_{2|1} as in 3.45 and 3.46 using current values of μ^(k), Σ^(k), and x_{1j}^(k).

(ii) Find the probability of x_{2j} < DL given that x_{2j} is distributed N(μ_{2|1}, σ²_{2|1}). This is equal to Φ((DL − μ_{2|1})/σ_{2|1}), where Φ is the standard normal cumulative distribution function.

(iii) Draw a random probability v from the uniform distribution U(0, Φ((DL − μ_{2|1})/σ_{2|1})). Then Φ^{-1}(v) is the z-score of a random draw from the left tail of N(μ_{2|1}, σ²_{2|1}), below DL.

(iv) Calculate the updated value x_{2j}^(k) = μ_{2|1} + σ_{2|1} Φ^{-1}(v) as a random draw from the censored tail of the current conditional distribution.

(e) Calculate all statistics of interest using the uncensored x_{ij} and the new x_{ij}^(k) values of censored data. Save these values to the kth level of their appropriately formatted array.

(f) If k < 1200, go back to step (a) and continue. If k = 1200, stop and calculate the averages of the last 1000 values for each statistic of interest. These averages are the estimates of the statistics for the given data sample. Use these averages to calculate the sample variance of each statistic from the last 1000 iterations. These are the estimates of the variances of the estimators.

CHAPTER 4

Expectation Maximization Theory and Method Development

An expectation maximization (EM) algorithm is an iterative method of finding the maximum likelihood estimates for the parameters of a probability model in the presence of incomplete data. Both the probability model and the incomplete data can take a variety of forms. The probability model is commonly a probability distribution or mixture of distributions. The incomplete data may include missing data of various types, censored data, or unknown latent variables such as the distribution from which a data value was generated in a mixture model. Dempster, Laird, and Rubin developed the generalized methodology of expectation maximization, as well as the associated convergence theory. Although specific instances of EM algorithms had previously been proposed by other authors under various names, their paper is considered the seminal treatise on the subject [9].

While EM refers to a general class of algorithms, the method developed here will be specifically for the case of incomplete data from a multivariate normal distribution that is censored in the left tail, as when data values are measured below a detection limit. It can also be used on any data set that has been transformed to this form, as in the case of lognormally distributed data. Since we are assuming a lognormal underlying distribution, we would take the natural logs of each measurement before applying the method.

4.1. General Theory of Expectation Maximization

EM algorithms have some nice properties that make them an attractive choice as a computational tool. Often, a likelihood function takes on a form that has no closed-form solution. For many of these functions, an EM algorithm can be developed to solve for the parameters that maximize the likelihood. Convergence of an EM algorithm is guaranteed in most cases, and in fact an EM iteration will never decrease the likelihood function. Under very general conditions, convergence is to the maximum likelihood estimator. Exceptions include multimodal distributions or distributions with a saddle point. Not only are the convergence properties very good for most models, but the rate of convergence is often very fast, producing good estimates in a relatively small number of iterations. Another benefit of EM is that in addition to producing ML estimates of the parameters of interest, EM output often includes a complete data set or multiple imputations of complete data sets, for potential future analysis [9].

The basic process of an EM algorithm is to alternate between an expectation step and a maximization step. In the expectation step, the expectation of the likelihood, or equivalently of the joint sufficient statistics of the likelihood, is calculated given the observed data and the current estimates of the unknown distributional parameters. In the maximization step, new estimates of the parameters are calculated from the expectations of the joint sufficient statistics such that the likelihood function is maximized. After initial estimates are chosen for each unknown parameter, the algorithm iterates between these two steps until the parameter estimates converge. For probability distributions from the exponential family, the maximization step is trivial since the complete-data log likelihood is linear in the expected complete-data sufficient statistics [9].

4.2. Expectation Maximization for Multivariate Normal Data with Left Censoring

4.2.1. Expectation Step

The expectation step for multivariate normal data with left censoring involves calculating expected values for the joint sufficient statistics

(4.1) E[∑_{j=1}^n x_{ij}] = ∑_{x_{ij} ≥ DL} x_{ij} + ∑_{x_{ij} < DL} E[x_{ij} | x_{ij} < DL], i = 1, ..., d

(4.2) E[∑_{j=1}^n x_{ij} x_{lj}] = ∑_{x_{ij}, x_{lj} ≥ DL} x_{ij} x_{lj} + ∑_{x_{ij} < DL ≤ x_{lj}} x_{lj} E[x_{ij} | x_{ij} < DL, x_{lj}]

(4.3) + ∑_{x_{lj} < DL ≤ x_{ij}} x_{ij} E[x_{lj} | x_{lj} < DL, x_{ij}] + ∑_{x_{ij}, x_{lj} < DL} E[x_{ij} x_{lj} | x_{ij}, x_{lj} < DL]

where i, l = 1, ..., d. For uncensored data, the summations are straightforward, but conditional expectations must be calculated for the censored values. In order to simplify the calculations to expectations from a univariate distribution, and to simultaneously improve the estimates of correlated x values, each expectation E[x_{ij} | x_{ij} < DL] and E[x²_{ij} | x_{ij} < DL] will first be conditioned on the known values of current estimates of all x_{lj}, l ≠ i [34].

4.2.1.1. Expected Values for a Univariate Truncated Normal Distribution. The censored data we consider here originates in the left tail of the lognormal distribution, somewhere below the detection limit. Thus, the natural log of this data lies in the left tail of the normal distribution, below the natural log of the detection limit. When we consider a truncated normal distribution, we tend to think of the distribution of the uncensored data. However, it is actually the distribution of the censored data in the left tail that we need to consider for estimation of the censored values.

In order to compute each E[x_{ij} | x_{ij} < DL] in the expectation step, we must first compute the univariate normal distribution of x_{ij} conditioned on the current values of all other x_{lj} values, where l = 1, ..., d, l ≠ i, and using the most recent estimates of all distributional parameters. So if we partition the current parameters μ and Σ of a multivariate normal distribution such that μ₁ and Σ₁₁ = σ₁² are the mean and variance of x_{ij}, we can define:

(4.4) μ = [ μ₁ ; μ₂ ], with blocks of dimension 1 × 1 and (d−1) × 1

and

(4.5) Σ = [ Σ₁₁ Σ₁₂ ; Σ₂₁ Σ₂₂ ], with blocks of dimension 1 × 1, 1 × (d−1), (d−1) × 1, and (d−1) × (d−1)

Then the conditional density of x_{ij} is the univariate normal density

(4.6) x_{ij} | x_{lj} ∀ l ≠ i ∼ N(μ_c, σ_c²)

with

(4.7) μ_c = μ₁ + Σ₁₂ Σ₂₂^{-1} (x_{lj} − μ₂)

and

(4.8) σ_c² = Σ₁₁ − Σ₁₂ Σ₂₂^{-1} Σ₂₁,

where μ_c and σ_c² denote the conditional mean and variance. Next, we must condition further on the censoring x_{ij} < DL, where DL is the natural log of the detection limit in the original scale. The resulting distribution is a truncated normal distribution, with truncation at DL on the upper end.

To simplify calculations, we will begin by defining the standard normal distribution. The density function of this distribution is

(4.9) φ(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < ∞,

and the cumulative distribution function is

(4.10) Φ(x) = ∫_{−∞}^x φ(t) dt, −∞ < x < ∞.

Using these definitions for φ(x) and Φ(x), the density function of a normal distribution with mean μ and standard deviation σ can be expressed as

(4.11) f(x) = (1/σ) φ((x − μ)/σ), −∞ < x < ∞,

and the density function of a truncated normal distribution with mean μ and standard deviation σ can be expressed as

(4.12) f(x) = (1/σ) φ((x − μ)/σ) / [Φ((b − μ)/σ) − Φ((a − μ)/σ)], a < x < b.

Since we are interested in the distribution of the left tail, below the detection limit, we have the limits a = −∞ and b = DL, resulting in a left tail distribution of

(4.13) f(x) = (1/σ) φ((x − μ)/σ) / Φ((DL − μ)/σ), −∞ < x < DL.

If we calculate the conditional parameters μ_c and σ_c for a variable from a multivariate normal distribution with μ = μ̂^(k−1) and Σ = Σ̂^(k−1), where k is the number of the current EM iteration, and then use these μ_c and σ_c values for μ and σ in 4.13, then we will have the desired distribution for x_{ij} conditioned on x_{lj} ∀ l ≠ i, x_{ij} < DL, μ^(k−1), and Σ^(k−1).

Now that we have the distribution of interest, that of the censored left tail, we wish to find the expected value of a data point from this distribution. It is this expected value which will be our new best guess for the value of the censored data point. In order to calculate this expected value, we must first find the moment generating function, M(t), from 4.13. This is computed as follows:

M(t) = E[e^{tx} | x < DL] = ∫_{−∞}^{DL} e^{tx} φ((x − μ)/σ) / [σ Φ((DL − μ)/σ)] dx

= [Φ((DL − μ)/σ)]^{-1} (1/(√(2π) σ)) ∫_{−∞}^{DL} e^{tx − (x − μ)²/(2σ²)} dx

= [Φ((DL − μ)/σ)]^{-1} (1/(√(2π) σ)) ∫_{−∞}^{DL} e^{−(x² − 2(μ + σ²t)x + μ²)/(2σ²)} dx

= [Φ((DL − μ)/σ)]^{-1} (1/(√(2π) σ)) ∫_{−∞}^{DL} e^{−([x − (μ + σ²t)]² − (μ + σ²t)² + μ²)/(2σ²)} dx

= e^{[(μ + σ²t)² − μ²]/(2σ²)} [Φ((DL − μ)/σ)]^{-1} ∫_{−∞}^{DL} (1/σ) φ((x − (μ + σ²t))/σ) dx

(4.14) = e^{μt + σ²t²/2} Φ((DL − μ − σ²t)/σ) / Φ((DL − μ)/σ)

Using the moment generating function in 4.14, we then calculate the expectation of a censored x value that comes from this left tail distribution. The resulting expectation is

E[x | x < DL] = M′(t)|_{t=0}

= e^{μt + σ²t²/2} [ (μ + σ²t) Φ((DL − μ − σ²t)/σ) − σ φ((DL − μ − σ²t)/σ) ] / Φ((DL − μ)/σ) |_{t=0}

(4.15) = μ − σ φ((DL − μ)/σ) / Φ((DL − μ)/σ)

Similarly, the moment generating function can be used to calculate

(4.16) E[x² | x < DL] = M″(t)|_{t=0}

(4.17) = μ² + σ² − σ² ((DL − μ)/σ) φ((DL − μ)/σ)/Φ((DL − μ)/σ) − 2μσ φ((DL − μ)/σ)/Φ((DL − μ)/σ)

and

(4.18) Var(x | x < DL) = E[x² | x < DL] − (E[x | x < DL])²

(4.19) = σ² { 1 − ((DL − μ)/σ) φ((DL − μ)/σ)/Φ((DL − μ)/σ) − [φ((DL − μ)/σ)/Φ((DL − μ)/σ)]² },

each of which is used in the expectation step of the algorithm [25].
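These three quantities can be wrapped in a small helper. The R sketch below is illustrative (the names are mine, not the appendix program's) and simply codes 4.15, 4.17, and 4.19:

# Moments of a normal distribution truncated on the right at DL.
trunc_moments <- function(mu, sd, DL) {
  a   <- (DL - mu) / sd
  lam <- dnorm(a) / pnorm(a)                               # phi(a) / Phi(a)
  m1  <- mu - sd * lam                                     # eq. 4.15
  m2  <- mu^2 + sd^2 - sd^2 * a * lam - 2 * mu * sd * lam  # eq. 4.17
  v   <- sd^2 * (1 - a * lam - lam^2)                      # eq. 4.19
  list(mean = m1, second = m2, var = v)
}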

4.2.1.2. Expectations of Variability Statistics. The sufficient statistics for the variance-covariance matrix of a multivariate normal distribution are ∑_{j=1}^n x_{ij} x_{lj}, i, l = 1, ..., d. Thus in the expectation step we need to calculate

(4.20) E[∑_{j=1}^n x_{ij} x_{lj}] = ∑_{x_{ij}, x_{lj} ≥ DL} x_{ij} x_{lj} + ∑_{x_{ij} < DL ≤ x_{lj}} x_{lj} E[x_{ij}] + ∑_{x_{lj} < DL ≤ x_{ij}} x_{ij} E[x_{lj}] + ∑_{x_{ij}, x_{lj} < DL} E[x_{ij} x_{lj}]

In the case where i = l, 4.16 can be used to calculate E[x_i² | x_i < DL]. But when i ≠ l, the univariate distribution cannot be used to find the estimated value. Instead, we observe that

(4.21) E[(x_i − μ_i)(x_l − μ_l)] = E[x_i x_l] − μ_i μ_l = σ_{il}

from which we calculate the expectation

(4.22) E_k[x_i x_l] = σ_{il}^(k−1) + μ_i μ_l

at the kth iteration. For an individual point j, E_k[x_{ij} x_{lj}] is calculated from 4.22 by using the current values of E_k[x_{ij}] and E_k[x_{lj}] for μ_i and μ_l, the estimate of ρ^(k−1) from the maximization step in the previous iteration, and the square roots of the current values of Var_k(x_{ij} | x_{ij} < DL) and Var_k(x_{lj} | x_{lj} < DL) as calculated in 4.18.

4.2.2. Maximization Step

Once the sufficient statistics have been calculated in the expectation step, the maximization step is fairly trivial. All that is required to update the estimates of μ̂_i and σ̂_i², i = 1, ..., d, is to divide the expectations of ∑_j x_{ij} and ∑_j x²_{ij}, respectively, by the sample size n to get current values for E_k[x_i] and E_k[x_i²]. Then the updated estimates are:

(4.23) μ̂_i^(k) = E_k[x_i]

and

(4.24) σ̂_i^{2(k)} = E_k[x_i²] − (E_k[x_i])²

Each correlation coefficient ρ_{il} is then estimated from these new μ̂_i and σ̂_i² values and the E[∑_{j=1}^n x_{ij} x_{lj}] that was calculated in the expectation step. We calculate this as:

(4.25) ρ_{il}^(k) = ( E_k[∑_{j=1}^n x_{ij} x_{lj}]/n − μ_i^(k) μ_l^(k) ) / ( σ_i^(k) σ_l^(k) )
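Because the updates are closed-form, the full maximization step is only a few lines. A hedched sketch in R, assuming bivariate data, where sx, sx2, and sxl hold the current expected sufficient statistics ∑E[x_i], ∑E[x_i²], and ∑E[x_1 x_2] (all names illustrative):

# Maximization step: equations 4.23, 4.24, and 4.25.
m_step <- function(sx, sx2, sxl, n) {
  mu  <- sx / n                                            # eq. 4.23
  s2  <- sx2 / n - mu^2                                    # eq. 4.24
  rho <- (sxl / n - mu[1] * mu[2]) / sqrt(s2[1] * s2[2])   # eq. 4.25
  list(mu = mu, sigma2 = s2, rho = rho)
}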

4.2.3. Expectation Maximization Algorithm with Multiple Imputation Estimation of Variance

The expectation maximization method developed here can be used on any multivariate normal data with left censoring of any or all variables at the same detection limit, but the R program to implement the method is currently limited to the bivariate case. As with the MCMC method, future modifications could easily allow for multiple detection limits, either between or within variables. It is once again assumed that the original distribution was lognormal, which is only relevant in calculations for some of the comparisons to other methods, the simplest of which are limited to use on lognormal distributions or at least distributions with zero probability for x < 0. Comparisons of the method against other methods are performed on samples drawn from bivariate normal distributions.

The general procedure is as follows:

(1) Draw 100 samples of size n = 25 each from bivariate normal distributions with μ = (0, 0), Σ = [1 ρ; ρ 1], and every combination of ρ = 0, ±0.1, ±0.2, ..., ±0.9 and proportion of censoring = 0.1, 0.2, ..., 0.5. The resulting x matrices will each be of dimension 25 × 2.

(2) Draw a random error vector of size 25 × 1 for each sample from a N(0, 1) distribution. Use the vector to calculate y = xb + error.

(3) Calculate all desired statistics from the uncensored samples for reference.

(4) For each sample, determine a detection limit, DL, or censoring point, that corresponds to the given combination of proportion censored and correlation coefficient, ρ. Use the equicoordinate quantile function such that the proportion of censoring includes data points for which either or both of the variables are censored.

(5) Set up a status matrix of the same dimensions as x for each sample. For every x_{ij} in the sample, if x_{ij} < DL, then status[i, j] = 1; otherwise status[i, j] = 0.

(6) For each x_{ij} < DL in the sample, replace x_{ij} with a starting value x_{ij}^(0) equal to half the equivalent censoring level in the lognormal scale. Specifically, set x_{ij}^(0) = ln((1/2) exp(DL)) for all x_{ij} such that status[i, j] = 1. Calculate all desired statistics from the resulting data for comparison.

(7) For each x_{ij} < DL in the sample, replace x_{ij} with a value equal to 1/√2 times the equivalent censoring level in the lognormal scale. Specifically, set x_{ij} = ln((1/√2) exp(DL)) for all x_{ij} such that status[i, j] = 1. Calculate all desired statistics from the resulting data for comparison.

(8) Begin EM iterations with the x_{ij}^(0) values generated in item 6 above as the starting values for censored data points. Set the number of iterations to the desired value N. The final value chosen for this method with bivariate data and the designated starting values was N = 10 iterations. Results were compared to those calculated in N = 100 iterations, with no discernible difference. Use the sample mean and sample variance calculated from the full starting data set, including the starting x_{ij}^(0) values, as the starting values for μ and σ². Use actual products of all starting values x_{1j}^(0) x_{2j}^(0), whether censored or not, as starting values for each x_{1j} x_{2j}. Within each EM iteration, numbered k = 1 to 10 (a one-iteration sketch in R follows this list):

(a) Update the expectation of each censored data value x_{1j}^(k) and its corresponding square x_{1j}^{2(k)} and variance as follows:

(i) Calculate μ_{1|2} and σ²_{1|2} from 3.45 and 3.46 using current values of μ^(k−1), Σ^(k−1), and x_{2j}^(k−1).

(ii) Calculate the expectations of x_{1j} and x²_{1j} and the corresponding variance of x_{1j} given that x_{1j} < DL and x_{1j} is distributed N(μ_{1|2}, σ²_{1|2}).

(b) Update the expectation of each censored data value x_{2j}^(k) and its corresponding square x_{2j}^{2(k)} and variance as follows:

(i) Calculate μ_{2|1} and σ²_{2|1} as in 3.45 and 3.46 using current values of μ^(k−1), Σ^(k−1), and x_{1j}^(k).

(ii) Calculate the expectations of x_{2j} and x²_{2j} and the corresponding variance of x_{2j} given that x_{2j} < DL and x_{2j} is distributed N(μ_{2|1}, σ²_{2|1}).

(c) Update the expectations of each x_{1j} x_{2j} where one or both variables is censored.

(i) Where just one variable is censored, multiply the current values x_{1j}^(k) and x_{2j}^(k) together.

(ii) Where both variables are censored, use 4.22 to calculate the expected value.

(d) Update estimates of μ₁, μ₂ with 4.23.

(e) Update estimates of σ₁² and σ₂² with 4.24.

(f) Update estimate of ρ with 4.25.

(g) If k < 10, go back to step (a) and continue. If k = 10, stop. Calculate regression parameters from the final estimates of the data. The most recent updates, μ_i^(10), σ_i^{2(10)}, and ρ^(10), are the final estimates of these parameters for the given data sample.
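A condensed sketch of one such iteration for the bivariate case is given below, reusing the trunc_moments() helper sketched in Section 4.2.1.1. As before, this is illustrative only; the appendix program is the authoritative implementation. Here ex and ex2 hold the current E[x_{ij}] and E[x²_{ij}] matrices, exy the current E[x_{1j} x_{2j}] vector, and status flags the censored entries:

em_step <- function(ex, ex2, exy, status, DL, mu, s2, rho) {
  for (i in 1:2) {                                   # steps (a) and (b)
    o   <- 3 - i
    m_c <- mu[i] + rho * sqrt(s2[i] / s2[o]) * (ex[, o] - mu[o])  # eq. 3.45
    s_c <- sqrt(s2[i] * (1 - rho^2))                              # eq. 3.46
    for (j in which(status[, i] == 1)) {
      tm <- trunc_moments(m_c[j], s_c, DL)
      ex[j, i]  <- tm$mean
      ex2[j, i] <- tm$second
    }
  }
  one  <- xor(status[, 1] == 1, status[, 2] == 1)
  both <- status[, 1] == 1 & status[, 2] == 1
  exy[one]  <- ex[one, 1] * ex[one, 2]               # step (c)(i)
  exy[both] <- ex[both, 1] * ex[both, 2] +           # step (c)(ii), eq. 4.22
    rho * sqrt((ex2[both, 1] - ex[both, 1]^2) * (ex2[both, 2] - ex[both, 2]^2))
  mu  <- colMeans(ex)                                # step (d), eq. 4.23
  s2  <- colMeans(ex2) - mu^2                        # step (e), eq. 4.24
  rho <- (mean(exy) - mu[1] * mu[2]) / sqrt(s2[1] * s2[2])  # step (f), eq. 4.25
  list(ex = ex, ex2 = ex2, exy = exy, mu = mu, s2 = s2, rho = rho)
}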

CHAPTER 5

Implementation and Analysis of Methods

5.1. Assumptions and Data Transformation

The data used for testing both the MCMC and EM methods developed here is simulated from known bivariate normal distributions with μ = (0, 0), Σ = [1 ρ; ρ 1], and varying levels of censoring and correlation. Censoring is varied from 0 to 50%, defined such that the percent censored is the percent of the n data points for which either x₁, x₂, or both x₁ and x₂ are censored. The censoring point is equivariate, meaning that it is the same value for both x₁ and x₂. Censoring of data is achieved by calculating the censoring point for each distribution, simulating data from the distribution, then keeping track of the values that fall below the censoring limit with a status matrix. Each method is then successively applied to only those data values having censored status and the corresponding data values are updated with estimates provided by the method. The correlation coefficient ρ is also varied from 0 to ±0.9 at each level of censoring. Data sets are all of size n = 25 and results are averaged over 100 data sets for each distribution.

Although the data used here is simulated from bivariate normal distributions, and both the MCMC and EM methods can be used on data from any distribution that can be transformed to a multivariate normal distribution, some of the comparisons here assume that the underlying data distribution is lognormal. This is the most common type of underlying distribution used to model environmental and toxicology data; thus both of the simple substitution methods that are evaluated here against the new methods make sense only with a lognormal underlying distribution, or at least an underlying distribution with density f(x) = 0 for all x < 0. The lognormal assumption is important in the way that the simple substitution methods are applied to the simulated data and in some of the parameter estimates that are compared in the results. In order to apply the DL/2 or DL/√2 methods to normal data, the censoring point is first found in the normal scale, transformed to the lognormal scale by taking its exponent, multiplied by the appropriate factor of 1/2 or 1/√2, and then transformed back to the normal scale by taking the natural log.
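In R this round trip is one line per method; a small sketch, with DL standing for the censoring point already computed in the normal (log) scale:

DL_half  <- log(exp(DL) / 2)        # DL/2 method, applied in the lognormal scale
DL_sqrt2 <- log(exp(DL) / sqrt(2))  # DL/sqrt(2) method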

For comparisons of regression parameter estimates, a vector y is generated for each data set from the relationship

(5.1) y = β₀ + β₁x₁ + β₂x₂ + ε

using β = (1, 0.75, 0.25)ᵀ and ε ∼ N(0, 1). The uncensored data is used to calculate the y vector, then censoring is applied to the x matrix, and censored x_{ij} values are replaced successively by each method. Regressions are performed on each resulting complete set of data.
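A sketch of how one such data set and its response can be generated in R (using MASS::mvrnorm as one possible bivariate normal sampler; the values shown assume ρ = 0.5):

library(MASS)
rho  <- 0.5
x    <- mvrnorm(25, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2, 2))
beta <- c(1, 0.75, 0.25)
y    <- beta[1] + beta[2] * x[, 1] + beta[3] * x[, 2] + rnorm(25)  # eq. 5.1
fit  <- lm(y ~ x[, 1] + x[, 2])  # regression on a completed data matrix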

5.2. Evaluation of Methods

Four methods are compared here for calculating all distributional and regression parameters from left-censored bivariate data originating from lognormal distributions. These methods include the new MCMC and EM methods, and the simple substitution methods of replacing each censored x_{ij} value with either DL/2 or DL/√2. The four methods are also compared with results obtained by computing all of the same parameters on the data prior to censoring.

The statistics computed for each parameter estimate and used to compare and evaluate the various methods are bias and mean-squared error (MSE). Biases are calculated for each distribution, as defined by the correlation coefficient and level of censoring, as

(5.2) Bias(μ̂) = (1/100) ∑_{s=1}^{100} (μ̂_s − (0, 0))

(5.3) Bias(Σ̂) = (1/100) ∑_{s=1}^{100} (Σ̂_s − [1 ρ; ρ 1])

and

(5.4) Bias(β̂) = (1/100) ∑_{s=1}^{100} (β̂_s − (1.00, 0.75, 0.25))

where s is the number of the simulated sample for the particular distribution and the distribution itself is distinguished by the combination of percent censoring and correlation coefficient ρ. Similarly, MSEs are calculated for each distribution from:

(5.5) MSE(μ̂_i) = (1/100) ∑_{s=1}^{100} (μ̂_{is} − 0)², i = 1, 2

(5.6) MSE(σ̂_i²) = (1/100) ∑_{s=1}^{100} (σ̂²_{is} − 1)², i = 1, 2

(5.7) MSE(ρ̂) = (1/100) ∑_{s=1}^{100} (ρ̂_s − ρ)²

and

(5.8) MSE(β̂_i) = (1/100) ∑_{s=1}^{100} (β̂_{is} − β_i)², i = 0, 1, 2

For each level of correlation, the bias and MSE averages are plotted as a function of percent censoring.
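Both criteria reduce to simple averages over the 100 simulated samples; a minimal sketch in R, with est a length-100 vector of estimates and truth the known parameter value (illustrative names):

bias <- function(est, truth) mean(est - truth)       # eqs. 5.2-5.4
mse  <- function(est, truth) mean((est - truth)^2)   # eqs. 5.5-5.8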

In addition to bias and MSE values for each of these distributional parameters and regression parameters, the bias and MSE of the mean and variance in the original scale are also calculated. Since we have assumed an underlying data distribution that is lognormal, we calculate estimates of the mean and variance of this distribution from our parameter estimates using the known relationships for a lognormal distribution:

(5.9) μ_LN = exp(μ + σ²/2)

and

(5.10) σ²_LN = (exp(σ²) − 1) exp(2μ + σ²)

The true values for the lognormal mean and variance are similarly calculated from the true values μ_i = 0 and σ_i² = 1, i = 1, ..., d, resulting in μ_LN = 1.65 and σ²_LN = 4.67. These values are then used as in equations 5.2, 5.3, 5.5, and 5.6 to determine the biases and MSEs of original-scale estimates.
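These conversions are coded directly from 5.9 and 5.10; a small R sketch (with μ = 0 and σ² = 1 they reproduce the true values 1.65 and 4.67 quoted above):

lognormal_mean <- function(mu, s2) exp(mu + s2 / 2)                  # eq. 5.9
lognormal_var  <- function(mu, s2) (exp(s2) - 1) * exp(2 * mu + s2)  # eq. 5.10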

5.3. R Program

Both the MCMC and EM methods developed here were implemented with a program written in the free, open-source language R. All methods were applied to each randomly generated data set and results were averaged over the 100 data sets per combination of correlation coefficient and proportion of censoring. The code for this program is included in the appendix. The flow of the program was designed such that the methods were each applied to the exact same set of 100 data sets rather than to a new set of 100 data sets for each method. The MCMC and EM methods were also intertwined such that out of 1200 total MCMC iterations, the first 100 were actually the EM method iterates. Since the first 200 are discarded in the MCMC method as burn-in, these EM results are not averaged into any MCMC calculations and only serve as starting values for the Markov chains. Similarly, starting values for the EM method were chosen as the DL/2 method estimates for each x_{ij} < DL, resulting in corresponding sample means and variances as starting estimates of μ and σ².

CHAPTER 6

Results

The results of this research are two new working methods for analyzing multivariate left-censored data. Both methods are implemented by code written in the R language and attached in the Appendix. Comparisons are made between these methods and two simple substitution methods with respect to their ability to estimate bivariate normal distributional parameters, regression parameters, and the distributional parameters of an assumed underlying lognormal distribution. While the MCMC method performs well in many cases, the EM method clearly outperforms all other methods with extremely high consistency over all conditions tested.

The two new methods for handling censored data were tested on randomly generated bivariate normal data with standard normal marginal densities and varying levels of correlation and censoring. Parameter estimates were calculated for each sample before artificially censoring the sample data. The two simple substitution methods of replacing censored values by either DL/2 or DL/√2 were also applied to the same data. Estimates of the distributional parameters were calculated by the MCMC and EM methods and from the same data with each of the substitution methods applied. The bias and MSE for each parameter estimate were calculated for 100 samples of size n = 25. These were used as the criteria for comparing the four methods and estimates from the uncensored data. While all methods performed fairly well in many cases, the EM method was the most consistently unbiased while simultaneously having the most consistently low mean-squared errors.

6.1. Mean, Variance, and Correlation Coefficient in Normal Scale

The distributional parameters estimated by each method are means μ₁ and μ₂, standard deviations σ₁ and σ₂, and correlation coefficient ρ. Since both variables are assumed to have standard normal marginal distributions, the biases and MSEs of μ̂₁ and μ̂₂ are averaged for plotting purposes, as are the biases and MSEs of σ̂₁ and σ̂₂. While all estimates are calculated for bivariate normal distributions with ρ = 0, ±0.1, ±0.2, ..., ±0.9, plots will be shown only for ρ = ±0.1, ±0.5, ±0.9 as a representative sample of positive and negative low, medium, and high correlation. Note that the plots were fit to the data and as such do not have consistent scaling.

Since the marginals of each x_i are distributed N(0, 1), the uncensored x̄_i for each sample of size 25 are distributed N(0, 1/25) and the mean of the μ̂_i over 100 samples is distributed N(0, 1/2500). Thus the average of μ̂₁ and μ̂₂ is distributed N(0, (1 + ρ)/5000). The 95% confidence intervals for ρ = ±0.1, ±0.5, ±0.9 are shown in Table 6.1. This band is not plotted in the interest of cleaner graphs, but should be considered in evaluating the differences between methods.

ρ       σ² for mean of μ̂_i     95% Confidence Interval
0.1     0.00022                 ±0.0291
0.5     0.00030                 ±0.0339
0.9     0.00038                 ±0.0382
−0.1    0.00018                 ±0.0263
−0.5    0.00010                 ±0.0196
−0.9    0.00002                 ±0.0088

Table 6.1. Confidence Intervals for Bias Plots of Mean Estimates

The estimates of the mean for the simple substitution methods are simply the sample means calculated from the complete X matrices that result from replacing all censored values by either DL/2 or DL/√2. Similarly, the estimates of the mean for the EM method are sample means of the complete X matrix that exists after 10 iterations of updates to the expected values of the censored values. MCMC estimates are calculated as sample means as well, but the calculations are performed at each iteration and the final estimates are the averages of these sample means over the last 1000 iterations per sample. Finally, uncensored sample means are calculated for comparison to the four methods. From the plots, we see that the EM method most consistently follows the uncensored data with respect to both bias and MSE. The DL/√2 substitution method has consistently higher bias than the other methods, which was expected since this was the wrong choice of substitution method for the standard normal distribution. The MCMC method does pretty well in most cases up to about 40% censoring, but then diverges with both higher biases and MSEs, particularly in the cases with higher levels of positive correlation.

Figure 6.1. Biases in Mean Estimates when ρ = 0.1

Figure 6.2. MSEs of Mean Estimates when ρ = 0.1

Figure 6.3. Biases in Mean Estimates when ρ = 0.5

Figure 6.4. MSEs of Mean Estimates when ρ = 0.5

Figure 6.5. Biases in Mean Estimates when ρ = 0.9

Figure 6.6. MSEs of Mean Estimates when ρ = 0.9

Figure 6.7. Biases in Mean Estimates when ρ = −0.1

Figure 6.8. MSEs of Mean Estimates when ρ = −0.1

Figure 6.9. Biases in Mean Estimates when ρ = −0.5

Figure 6.10. MSEs of Mean Estimates when ρ = −0.5

Figure 6.11. Biases in Mean Estimates when ρ = −0.9

Figure 6.12. MSEs of Mean Estimates when ρ = −0.9

Standard deviation estimates are calculated for each method as the square roots of sample variances. As with the sample means, the sample variances for the simple substitution methods are calculated from the complete X matrices, in which all of the censored data values have been replaced by DL/2 or DL/√2. Similarly, the same calculations are performed on the uncensored X matrix and on the X matrices drawn in each iteration of the MCMC method, with averaging over the last 1000 iterations. All of these sample variances are calculated with the cov() function in R, which uses n − 1 in the denominator. Only the EM method is slightly different in that it iteratively recalculates maximum likelihood estimators, and thus uses n in the denominator rather than n − 1. Further, the estimates are not calculated from the X matrix alone, but also from the final vectors of an X² matrix as

(6.1) σ̂_i² = ∑_{j=1}^n E₁₀[x²_{ij}]/n − (∑_{j=1}^n E₁₀[x_{ij}]/n)²

where the E₁₀[x_{ij}] values are the values in the final X matrix and the E₁₀[x²_{ij}] values are the values in the final X² matrix. If desired, the final variance estimate calculated by the EM method could be multiplied by n/(n − 1), but this was not done in the results shown here for simulated data and would not have improved the results obtained from the EM method.

From the plots of bias and MSE, it is clear that the EM method is once again the winner when it comes to estimating standard deviations. It consistently follows the uncensored data very closely. The substitution of DL/√2 for censored values consistently underestimates the standard deviation, with the negative bias increasing as a function of the level of censoring. The MCMC method once again has higher MSEs than the other methods at higher levels of censoring.

Figure 6.13. Biases in Standard Deviation Estimates when ρ = 0.1

Figure 6.14. MSEs of Standard Deviation Estimates when ρ = 0.1

Figure 6.15. Biases in Standard Deviation Estimates when ρ = 0.5

Figure 6.16. MSEs of Standard Deviation Estimates when ρ = 0.5

Figure 6.17. Biases in Standard Deviation Estimates when ρ = 0.9

Figure 6.18. MSEs of Standard Deviation Estimates when ρ = 0.9

Figure 6.19. Biases in Standard Deviation Estimates when ρ = −0.1

Figure 6.20. MSEs of Standard Deviation Estimates when ρ = −0.1

Figure 6.21. Biases in Standard Deviation Estimates when ρ = −0.5

Figure 6.22. MSEs of Standard Deviation Estimates when ρ = −0.5

Figure 6.23. Biases in Standard Deviation Estimates when ρ = −0.9

Figure 6.24. MSEs of Standard Deviation Estimates when ρ = −0.9

Correlation coefficient estimates are calculated for each method except EM exactly as with the standard deviation estimates, except that the R function cor() is used in place of cov(). The correlation coefficient estimate from the EM method is

(6.2) ρ̂_{il} = ( ∑_{j=1}^n E₁₀[x_{ij} x_{lj}]/n − μ̂_i μ̂_l ) / ( σ̂_i σ̂_l )

where the E₁₀[x_{ij} x_{lj}] values come from the final matrix of cross products. As can be seen in the plots, all methods were very similar in predicting the correlation coefficient when the correlation was very low, but at higher correlations the methods diverged: both the MCMC and EM methods continued to perform well, while the performance of both simple substitution methods dropped off substantially. The simple substitution methods underestimated the level of correlation increasingly with increasing correlation and with increasing censoring within the more highly correlated distributions. This is to be expected since both the MCMC and EM methods take the estimated correlation into account when predicting censored values, while the simple substitution methods do not. It is clear then that if predicting correlation among variables is of interest, simple substitution methods should not be used.

Figure 6.25. Biases in Estimates of ρ when ρ = 0.1

Figure 6.26. MSEs of Estimates of ρ when ρ = 0.1

Figure 6.27. Biases in Estimates of ρ when ρ = 0.5

Figure 6.28. MSEs of Estimates of ρ when ρ = 0.5

Figure 6.29. Biases in Estimates of ρ when ρ = 0.9

Figure 6.30. MSEs of Estimates of ρ when ρ = 0.9

Figure 6.31. Biases in Estimates of ρ when ρ = −0.1

Figure 6.32. MSEs of Estimates of ρ when ρ = −0.1

Figure 6.33. Biases in Estimates of ρ when ρ = −0.5

Figure 6.34. MSEs of Estimates of ρ when ρ = −0.5

Figure 6.35. Biases in Estimates of ρ when ρ = −0.9

Figure 6.36. MSEs of Estimates of ρ when ρ = −0.9

6.2. Regression Parameters

The y vector that was generated from the X matrix before censoring, according to the relation y = 0 + 1x1 + 2x2 + " was then regressed against the censored X matrix after each method was applied. In all cases, the …nal X matrix was used except with the MCMC method, in which case the regression was performed at each step and the vectors averaged over the last 1000 steps. The vector used to generate the y vector was 1:0 0:75 0:25 .   In addition to the methods that are compared with respect to distributional parameters, the method of dropping censored data values is also used for regression comparisons. It did not perform badly overall, but was more erratic with respect to following the bias of the uncensored data. This method is probably more consistent in the case of a single independent variable and should not be used for the multivariate case. The simple substitution method of replacing censored data with DL=p2 is less consistent than using DL=2, as predicted in Chapter 2 for standard normal distributions. The DL=p2 method underestimates the intercept and overestimate the slopes relative to the other methods. The remaining methods all performed well in most cases with respect to bias, but the MCMC method had much higher MSE than the other methods, particularly at higher levels of censoring. This is not surprising due to the fact that it is designed to capture the cumulative random variation contributed by every unknown in the problem.

The EM method is the best performer overall with respect to bias, MSE, and consistency over all levels of correlation and censoring. Plots of the bias and MSE of each βᵢ are shown only for distributions with ρ = 0.5.

Figure 6.37. Biases in Estimates of Intercept β₀ when ρ = 0.5

Figure 6.38. MSEs of Estimates of Intercept β₀ when ρ = 0.5

Figure 6.39. Biases in Estimates of Slope β₁ when ρ = 0.5

Figure 6.40. MSEs of Estimates of Slope β₁ when ρ = 0.5

Figure 6.41. Biases in Estimates of Slope β₂ when ρ = 0.5

Figure 6.42. MSEs of Estimates of Slope β₂ when ρ = 0.5

6.3. Comparisons in Original Lognormal Scale

Estimates of the mean and variance of the variables x1 and x2 are also compared in the original scale, since the lognormal mean and variance are known functions of the mean μ and variance σ² in the normal scale. For each method, the estimates of the lognormal mean and variance are calculated from

(6.3)   \hat{\mu}_{LN} = \exp\!\left(\hat{\mu} + \frac{\hat{\sigma}^2}{2}\right)

and

(6.4)   \hat{\sigma}^2_{LN} = \left(\exp(\hat{\sigma}^2) - 1\right)\exp\!\left(2\hat{\mu} + \hat{\sigma}^2\right)

where \hat{\mu} and \hat{\sigma}^2 are the mean and variance estimates calculated by a particular method in the normal scale. Only plots for ρ = 0.5 are shown, since these are simply functions of parameters that have already been compared. The point here is to show how the bias and MSE increase when functions of the parameters are converted back to the lognormal scale with the exponential formulas above. This is especially true of the lognormal bias, which is the product of two exponential functions. It can be seen in the huge jumps in the scale of the lognormal bias plots and the huge additional jumps in the scale of the lognormal MSE plots. The lognormal variance estimate from the MCMC method, with its inherently higher variability, is particularly vulnerable to blowing up at high levels of censoring.
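A minimal sketch of this conversion, mirroring the calculations in the appendix R code (mu_hat and sig2_hat are assumed to hold a given method's normal-scale estimates):

# Equations (6.3) and (6.4): convert normal-scale estimates to the lognormal scale
mu_LN  <- exp(mu_hat + sig2_hat/2)
var_LN <- (exp(sig2_hat) - 1) * exp(2*mu_hat + sig2_hat)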

To see exactly how this happens, we examine the changes in lognormal parameters

for increasingly biased estimates of μ and σ, shown in Table 6.2.

μ      σ      σ²     μ_LN     σ²_LN
0      1      1      1.6      4.7
0      2      4      7.4      2926.4
1      1      1      4.5      34.5
1      2      4      20.1     2.2E+04
2      1      1      12.2     255.0
2      2      4      54.6     1.6E+05

Table 6.2. Effects of Bias on Lognormal Parameters

It can be seen here that bias in the mean estimate alone does not have such a large effect, but bias in the estimate of the standard deviation is magnified enormously. This is seen in the plots below, where the bias of the lognormal variance estimate from the MCMC method jumps substantially higher at 50% censoring than the bias in σ did, and the MSE of this estimate blows up beyond the scale of the graphs.

6.4. Results for Application of Methods to Lower Variance Distribution

Although a full analysis was only done for bivariate normal distributions with σ₁ = σ₂ = 1, a few comparisons are presented here for the case when σ₁ = σ₂ = 0.5 and ρ = 0.5 to verify that the performance of the simple substitution methods depends on the underlying distribution. As predicted by the Chapter 2 calculations, the substitution of DL/√2 is better than the substitution of DL/2 for this lower variance distribution. So while the performance of the new methods is similar when the variance of the variables changes, the performance of the simple substitution

Figure 6.43. Biases in Estimates of Lognormal Mean when ρ = 0.5

Figure 6.44. MSEs of Estimates of Lognormal Mean when ρ = 0.5

Figure 6.45. Biases in Estimates of Lognormal Mean when ρ = 0.5

Figure 6.46. MSEs of Estimates of Lognormal Mean when ρ = 0.5

Figure 6.47. Biases in Estimates of Lognormal Variance when ρ = 0.5

Figure 6.48. MSEs of Estimates of Lognormal Variance when ρ = 0.5

Figure 6.49. Biases in Estimates of Lognormal Variance when ρ = 0.5

Figure 6.50. MSEs of Estimates of Lognormal Variance when ρ = 0.5

methods changes dramatically. Since DL/√2 is larger than DL/2, when DL/√2 is a good estimate for x < DL, DL/2 will naturally underestimate these x values, and when DL/2 is a good estimate, DL/√2 will overestimate them. It is also possible that for some distributions neither simple substitution will be a good estimate, possibly when the best estimate lies between the two. The bias and MSE plots for estimates of the mean and standard deviation are shown here for the distribution

X \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0.25 & 0.125 \\ 0.125 & 0.25 \end{pmatrix} \right).

6.5. Results for Larger Sample Sizes

The full comparison of methods was done for samples of size n = 25. This is much closer to reality in many cases, where large samples are very expensive to obtain. However, most comparisons in the literature have been done on larger samples, so the effect of increasing the sample size to n = 100 will be examined here.

Each variable has a standard normal marginal distribution, as in the larger scale comparisons, but comparisons between sample sizes are only made for the case when ρ = 0.5.

In general, MSEs of larger samples are lower for all parameter estimates from all methods, and bias is lower for estimates of some parameters, so the real points of interest in comparing results for n = 100 with results for n = 25 are the changes in relative performance among methods. The most notable of these changes is the improved performance, both in bias and MSE, of the MCMC method in estimating μ and σ. For the case of n = 25, this method did not perform well when

Figure 6.51. Biases in Estimates of Mean μ when ρ = 0.5 and Variances are σ² = 0.25

Figure 6.52. MSEs of Estimates of Mean μ when ρ = 0.5 and Variances are σ² = 0.25

Figure 6.53. Biases in Standard Deviation Estimates when ρ = 0.5 and Variances are σ² = 0.25

Figure 6.54. MSEs of Standard Deviation Estimates when ρ = 0.5 and Variances are σ² = 0.25

estimating these parameters at higher proportions of censoring, whereas for n = 100 it continued to perform very well right up to the highest level of censoring. In contrast, there was no appreciable change in relative performance among methods for estimating the correlation coefficient or the regression parameters. Only the plots for bias and MSE of estimates of μ and σ are shown here, since these are where the differences are seen.

6.6. Conclusions

In summary, the EM method is the most consistent for estimating all parameters across all distributions and levels of censoring. The MCMC method performs very well in many cases, but tends to have higher MSE than the other methods, except in estimating the correlation coefficient. It also does not perform as well above 40% censoring in many cases. Both of the new methods are a substantial improvement over univariate methods when estimating correlation between the variables, since they do not assume independence as the univariate methods do. Univariate methods do not perform badly for the distributions tested, provided that the correct univariate method is chosen for the correct distribution.

However, they should never be used without first estimating the variance in each variable to determine which method is the correct one to apply. This could even lead to a need to use a different method on each variable in the case of bivariate data. The new methods, and particularly the EM method, are more appealing in that they are more consistent across different levels of variance and correlation

Figure 6.55. Biases in Mean Estimates for n = 100 when ρ = 0.5

Figure 6.56. MSEs of Mean Estimates for n = 100 when ρ = 0.5

Figure 6.57. Biases in Estimates of σ for n = 100 when ρ = 0.5

Figure 6.58. MSEs of Estimates of σ for n = 100 when ρ = 0.5

in the underlying distribution. The same method can be safely applied to multiple variables with different variances, and all distributional parameters, including the correlation coefficient, can be estimated by the same method.

CHAPTER 7

Discussion and Recommendations

7.1. MCMC Method

Since the estimates calculated by the MCMC method are averages over 1000 complete data sets per sample, it is easy to obtain a confidence interval for any of the estimated parameters from just a single data set. It is also possible to save an array of all 1000 data sets for any further analysis that may be desired at a future date. If an estimate of a new statistic is desired, it is simply calculated from each of the 1000 data sets, then averaged over them to provide a point estimate. A single

"best" estimate of the complete data set itself can also be obtained by averaging over the 1000 sets. This will provide the best point estimate for each censored xij value all together in one matrix, but should not be used to calculate further statistics because it will not take into account the variability in each estimated xij.

Thus the variance of the distribution would be underestimated, with negative bias of the variance increasing along with the proportion of censoring. This will be discussed further below. It will not present a problem with the MCMC method if all future calculations are performed on each of the 1000 complete data sets, then averaged.

If desired, the MCMC method could be further modified and likely improved by investigating the effect of various prior densities. The Jeffreys prior was chosen here because it is a commonly used noninformative prior that is the limit of the conjugate inverse-Wishart prior density and is known to result in a proper posterior density. However, using a more sophisticated prior, such as a reference prior or probability matching prior, may result in better frequentist properties [8][6].

7.1.1. MCMC Error

For some of the parameter estimates, the MSE of the MCMC method diverged from the MSEs of the other methods at higher levels of censoring. This increase of MSE for estimates of μ and σ was no longer seen when sample sizes were increased to 100. However, increasing sample size is often not an option, so it would be desirable to reduce this increase in MSE without the need for large samples. It would also be desirable to reduce the increase in MSEs for regression parameter estimates, since the relative magnitudes of these MSEs were not reduced substantially by the larger sample size.

In order to determine possible remedies, we must first go back to the theory of Chapter 3, where we saw that the variance of the limiting distribution of the MCMC estimate \hat{\mu}_N of E[f(\theta)] is

(7.1)   \sigma^2 = \operatorname{var}[f(\theta^{(i)})] + 2\sum_{k=1}^{\infty} \operatorname{cov}\{f(\theta^{(i)}),\, f(\theta^{(i+k)})\}

where N is the number of MCMC iterations. The first component of this variance is the Monte Carlo error, and σ can be estimated by the Monte Carlo Standard Error (MCSE), which is the consistent estimator [12]:

(7.2)   \hat{\sigma}_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[f(\theta^{(i)}) - \hat{\mu}_N\right]^2}

This is the error that was reduced by increasing the sample sizes from 25 to 100.
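A minimal sketch of this estimate for a stored scalar chain, assuming the rho_mcmc_steps vector and the 200-step burn-in from the appendix R code:

# Estimate sigma_hat_N of equation (7.2) from the chain of rho draws
chain <- rho_mcmc_steps[201:nsteps]
sigma_hat_N <- sqrt(mean((chain - mean(chain))^2))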

The second component is the covariance between successive estimates at the various iterations. Ideally we would like to see this covariance die out very quickly with spacing between iterations.

Some investigation into this particular MCMC algorithm shows that the covariance dies out quickly among successive estimates of censored data values x_ij < DL where only one of the two variables is censored at point j, but more slowly for data points where both are censored. This can be seen in the plots of the autocorrelation function (ACF) versus iteration lag of estimated x_ij values that follow, for two points from the same sample. The samples of size n = 25 came from a distribution with 50% censoring and ρ = 0.9. The first plot is for a censored x_1j value for which x_2j was not censored. The second is for a censored x_2j value for which x_1j was also censored. These plots suggest that the higher MSEs for MCMC estimates at 50% censoring could be at least partly due to the autocorrelation at doubly censored points, of which more exist as the censoring level goes up. A change in the algorithm that lowers this autocorrelation could potentially lower the MSEs to a satisfactory

level. One idea for a change is to remove the conditioning of a censored variable on the current value of the second variable when both are censored at the same point.

Although the idea is simple, the restructuring of code required was too extensive to explore in the scope of this research.
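The ACF plots (Figures 7.1 and 7.2) can be reproduced with R's acf() function; a minimal sketch, assuming the x_3d_array of per-iteration imputed values from the appendix code and a chosen censored point (i, j):

# Autocorrelation of the chain of imputed values for one censored data point,
# discarding the first 200 iterations as burn-in
acf(x_3d_array[i, j, 201:nsteps],
    main = "ACF of imputed x[i,j] vs. iteration lag")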

7.2. EM Method

The output of the EM method, in addition to estimates of the distributional parameters, includes a final complete data matrix, X, with all uncensored data and final point estimates of each censored x_ij. It also includes an n × d(d+1)/2 matrix of best estimates for each x_ij x_lj value. For uncensored data, these are just the actual products of the data values, and for censored data they are expected values.

While the X matrix alone should be used with caution, most statistics of interest could be estimated well by using a combination of the X matrix and the matrix of x_ij x_lj values. Alternatively, multiple complete data sets could be generated from the final parameter estimates and used together for further analysis, in the same way that the 1000 complete data sets from the MCMC method can be used. These can be generated by several methods, each of which will require substantially fewer than the 1000 sets required by the MCMC method to provide a good basis for any future analysis.

One method of generating multiple complete data sets is to use the final estimates of the distributional parameters to randomly draw multiple sets of censored x_ij values. This would be accomplished in exactly the same way that each censored

Figure 7.1. Autocorrelation vs. Lag for Single Variable Censoring

Figure 7.2. Autocorrelation vs. Lag for Multivariable Censoring

x_ij value is drawn in the MCMC method, the difference being that all draws would be made from the same distribution. Each set of randomly drawn censored data would be combined with the uncensored data to form one complete data set. Some further experimentation with simulated data would be required to determine how many data sets are necessary to provide a basis that accurately duplicates the original parameter estimates.
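A minimal sketch of one such draw, reusing the truncated-tail draw from the appendix MCMC code, but with cond_mu and cond_var computed once from the fixed final EM estimates (an assumption; the MCMC code recomputes them at every iteration):

# Draw one censored value from the normal tail below the (transformed) detection limit
upper <- pnorm((cens - cond_mu)/sqrt(cond_var))
v <- runif(1, 0, upper)
x_draw <- cond_mu + sqrt(cond_var) * qnorm(v)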

Another method of generating multiple complete data sets is to divide the censored tail regions into intervals of equal probability and calculate expected x_ij values from each interval. This method will not necessarily reproduce the correct level of correlation in the generated data, but it will provide a small number of complete data sets from which analysis of each individual variable will give results representative of the underlying marginal distributions. The advantage of this method of generating data sets is that we can readily check the results theoretically and choose a number of data sets that provides the desired level of accuracy. To illustrate, we will first consider the bias introduced into estimates of the distributional variance when the estimates are calculated from a single complete data set in which each censored data value has been replaced by its estimated value. The variance of x within the censored region is

(7.3)   \operatorname{Var}(x \mid x < DL) = E[x^2 \mid x < DL] - \left(E[x \mid x < DL]\right)^2

Percent Censoring   E[x_j^2 | x_j < DL]   (E[x_j | x_j < DL])^2   Bias     Percent Bias
10                  3.25                  3.07                    -0.17    -5.2%
20                  2.18                  1.96                    -0.22    -10.1%
30                  1.61                  1.34                    -0.26    -16.1%
40                  1.24                  0.93                    -0.31    -25.0%
50                  1.00                  0.64                    -0.36    -36.0%

Table 7.1. Bias Introduced by Estimating Individual Variance Components Incorrectly

and the distributional variance is estimated as

(7.4)   \widehat{\operatorname{Var}}(x) = \frac{1}{n}\sum_{j=1}^{n} E[x_j^2] - \left(\frac{1}{n}\sum_{j=1}^{n} E[x_j]\right)^2

where the expectations are simply the data values themselves for uncensored data and are expectations given x_j < DL for the censored values. If we use the final

X matrix from a run of the EM algorithm and treat the imputed data values as if they were actual data, this has the effect of replacing E[x_j^2] with (E[x_j])^2 for each censored value x_j. It is obvious from equation (7.4) that this will result in underestimating the variance, since the censored values will add nothing to the variance estimate. Assuming a standard normal marginal distribution at various levels of censoring, we can see from Table 7.1 that the negative biases for the individual components of the variance calculation are substantial and increase with increased censoring. Not only do the biases of the individual expectations increase with increased censoring, but the number of estimated data points increases as well, resulting in a magnification of the bias in the variance estimate. Note that the censoring

Percent     E[x_j^2 | x_j < DL]     (1/m) Σ_{i=1}^m (E[x_j | a_i < x_j < b_i])^2     Percent Bias
Censoring   x ~ N(0,1)              m = 5        m = 10                              m = 5    m = 10
10          3.25                    3.22         3.24                                -0.8%    -0.4%
20          2.18                    2.15         2.17                                -1.5%    -0.7%
30          1.61                    1.57         1.59                                -2.2%    -1.0%
40          1.24                    1.21         1.23                                -2.6%    -0.9%
50          1.00                    0.96         0.98                                -4.1%    -1.7%

Table 7.2. Bias Reduction with Multiple Imputations

level shown in Table 7.2 refers to the level of censoring in a univariate marginal distribution.

In contrast to the bias obtained by using a single X matrix with all censored values replaced by their expected value within the censored tail, we will now calculate the bias obtained from five or ten complete data sets. These data sets are generated under the assumption that if we divide the tail region into equiprobability intervals, then a censored data value is equally likely to come from any of these intervals. So calculating the expected values of a particular x, given that x is within each interval, and then averaging over the intervals is a valid method for choosing multiple imputations. In fact, the average of the multiple imputations E[x | a_i < x < b_i] is exactly the desired expectation E[x | x < DL], while the average of the squares of the multiple imputations, (E[x | a_i < x < b_i])^2, is significantly greater than (E[x | x < DL])^2 and much closer to the desired E[x^2 | x < DL], so that it results in a much smaller bias in the variance estimate of (7.4). The

desired number of imputed data sets could be chosen, based on calculations similar to those summarized in Table 7.2, prior to running the EM algorithm, using preliminary estimates of the distributional variance and percent censoring.
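A minimal sketch of the interval calculation behind Table 7.2 for a standard normal marginal distribution (the censoring proportion p = 0.5 and m = 5 intervals are illustrative choices):

# Divide the censored tail of N(0,1) into m equiprobability intervals
p <- 0.5; m <- 5
DL <- qnorm(p)
q <- qnorm(seq(0, p, length.out = m + 1))   # interval endpoints a_i, b_i
a <- q[1:m]; b <- q[2:(m+1)]
Ex <- (dnorm(a) - dnorm(b)) / (pnorm(b) - pnorm(a))   # E[x | a_i < x < b_i]
mean(Ex)     # reproduces E[x | x < DL] exactly
mean(Ex^2)   # about 0.96 for p = 0.5, m = 5, matching Table 7.2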

7.3. Future Extensions of MCMC and EM Methods

Both the MCMC and EM methods developed here could be extended to higher-dimensional multivariate distributions by changing parts of the R code accordingly.

The methods themselves, as developed in previous chapters, are not limited to a bivariate scenario. In addition to extending the current computer code, the code could be run as it currently is in a pairwise fashion, but some of the advantages that these methods have over simple substitution methods would be lost if the expectations of censored data values were not conditioned on the values of all uncensored variables at the same data point. The y variable could also be censored to see how this affects predictions of the regression parameters. In this case, censored y values could be replaced by univariate MCMC or EM methods, which would simply skip the step of conditioning on uncensored variables at the same data point. Then the regression coefficients could be estimated. Alternatively, the y variable could be treated as a third variable in a multivariate distribution and the algorithms could be extended as previously mentioned to estimate censored response and predictor variables simultaneously.

A few simple modifications of the current computer code could allow for either right-censored or interval-censored data instead of left-censored data, in which case it could be used for a wide range of possible applications. It could also very easily be made to allow for multiple detection limits, either within or between variables.

Theoretically, each piece of data could have its own censoring type and its own detection limit(s), read in from additional input matrices. Further comparisons could also be made amongst the methods for data from distributions with more or less variability than the standard normal, as well as with greater than 50% censoring.
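A minimal sketch of the right-censoring modification, adapting the truncated-tail draw in the appendix MCMC code so that the draw comes from above the limit rather than below it (an assumption; the current code handles only left censoring):

# Draw a right-censored value from the upper tail above the limit cens
lower <- pnorm((cens - cond_mu)/sqrt(cond_var))
v <- runif(1, lower, 1)
x_draw <- cond_mu + sqrt(cond_var) * qnorm(v)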

A slightly larger extension, but one already developed for the univariate case, would be to modify the EM algorithm to maximize the likelihood function over a set of Box-Cox parameters as well as over the distributional parameters and censored data values. Details for the univariate case can be found in [34] and

[10]. The basic procedure is just to add an outer loop that iterates through a pre-chosen set of possible data transformations and, in the end, chooses the one that maximizes the likelihood. All other parameters continue to be maximized within that loop as they are currently. If the shape of the underlying distribution is not known, this would be a very useful modification and could greatly reduce the error caused by an incorrect assumption about the type of underlying distribution.

With this extension, the robustness of each method to various misspecifications of the underlying data distribution could be explored.

References

[1] Guidelines for data acquisition and data quality evaluation in environmental chemistry. Analytical Chemistry, 52:2242–2249, 1980.

[2] US Environmental Protection Agency. 40 CFR Part 136, Appendix B.

[3] US Environmental Protection Agency. Data quality assessment: Statistical methods for practitioners. Technical Report EPA QA/G9-S, 2006.

[4] Peter C. Austin and Jeffrey S. Hoch. Estimating linear regression models in the presence of a censored independent variable. Statistics in Medicine, 23(3):411–429, 2004.

[5] A. Baccarelli, R. Pfeiffer, D. Consonni, A. C. Pesatori, M. Bonzini, D. G. Patterson Jr., P. A. Bertazzi, and M. T. Landi. Handling of dioxin measurement data in the presence of non-detectable values: Overview of available methods and their application in the Seveso chloracne study. Chemosphere, 60(7):898–906, 2005.

[6] J. O. Berger and J. M. Bernardo. On the development of the reference prior method. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 4: Proceedings of the Fourth Valencia International Meeting. Oxford University Press, 1992.

[7] G. E. P. Box and D. R. Cox. An analysis of transformations. Journal of the Royal Statistical Society, Series B (Methodological), 26(2):211–252, 1964.

[8] Gauri Sankar Datta and Rahul Mukerjee. Probability matching priors: higher order asymptotics. Springer-Verlag, New York, 2004.

[9] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1–38, 1977.

[10] Jade Freeman and Reza Modarres. Analysis of censored environmental data with Box-Cox transformations. In American Statistical Association, Proceedings of the Section on Statistics and the Environment, 2003.

[11] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, Boca Raton, Fla., 2004.

[12] Charles Geyer. Introduction to Markov chain Monte Carlo. Institute of Mathematics and its Applications, Sept. 16, 2003.

[13] Charles J. Geyer. Practical Markov chain Monte Carlo. Statistical Science, 7(4):473–483, Nov. 1992.

[14] A. Gleit. Estimation for small normal data sets with detection limits. Environmental Science and Technology, 19(12):1201–1206, 1985.

[15] W. Griffiths. A Gibbs' sampler for the parameters of a truncated multivariate normal distribution. In Contemporary Issues in Economics and Econometrics: Theory and Application. Edward Elgar, Cheltenham, U.K., 2004.

[16] D. R. Helsel. Less than obvious: Statistical treatment of data below the detection limit. Environmental Science and Technology, 24(12):1766–1774, 1990.

[17] D. R. Helsel. More than obvious: Better methods for interpreting nondetect data. Environmental Science and Technology, 39(20), 2005.

[18] D. R. Helsel. Fabricating data: How substituting values for nondetects can ruin results, and what can be done about it. Chemosphere, 65(11):2434–2439, 2006.

[19] D. R. Helsel and T. A. Cohn. Estimation of descriptive statistics for multiply censored water quality data. Water Resources Research, 24(12):1997–2004, 1988.

[20] D. R. Helsel and R. J. Gilliom. Estimation of distributional parameters for censored trace level water quality data. 2. Verification and applications. Water Resources Research, 22(2):147–155, 1986.

[21] P. Hewett and G. H. Ganser. A comparison of several methods for analyzing censored data. The Annals of occupational hygiene, 51(7):611–632, 2007.

[22] R. W. Hornung and L. D. Reed. Estimation of average concentration in the presence of nondetectable values. Applied Occupational and Environmental Hygiene, 5(1):46–51, 1990.

[23] A. Clifford Cohen Jr. Simplified estimators for the normal distribution when samples are singly censored or truncated. Technometrics, 1(3):217–237, Aug. 1959.

[24] A. Clifford Cohen Jr. Tables for maximum likelihood estimates: Singly truncated and singly censored samples. Technometrics, 3(4):535–541, Nov. 1961.

[25] Samuel S. Kortum. Economics 8206 lecture notes 4, 1992.

[26] C. N. Kroll and J. R. Stedinger. Estimation of moments and quantiles using censored data. Water Resources Research, 32(4):1005–1012, 1996.

[27] James M. Lazorchak and Mark E. Smith. National screening survey of EDCs in municipal wastewater treatment effluents, 2004.

[28] Robert H. Lyles, Jovonne K. Williams, and Rutt Chuachoowong. Correlating two viral load assays with known detection limits. Biometrics, 57(4):1238– 1244, Dec. 2001.

[29] Gerald J. Nehls and Gerald G. Akland. Procedures for handling aerometric data. Journal of the Air Pollution Control Association, 23(3):180–184, 1973.

[30] Michael C. Newman, Philip M. Dixon, Brian B. Looney, and John E. Pinder. Estimating mean and variance for environmental samples with below detection limit observations. Water Resources Bulletin, 25(4):905–916, 1989.

[31] Tore Persson and Holger Rootzen. Simple and highly efficient estimators for a Type I censored normal sample. Biometrika, 64(1):123–128, Apr. 1977.

[32] Raymond C. Rhodes. Too much ado about next to nothing; what to do with measurements below the detection limits. pages 16–20, 1993.

[33] D. B. Richardson and A. Ciampi. Effects of exposure measurement error when an exposure variable is constrained by a lower limit. American Journal of Epidemiology, 157(4):355–363, 2003.

[34] R. H. Shumway, A. S. Azari, and P. Johnson. Estimating mean concentrations under transformation for environmental data with detection limits. Technometrics, 31(3):347–356, Aug. 1989.

APPENDIX

R Code

# BEGIN PROGRAM TO GENERATE SIMULATED DATA
library(MASS)
library(mvtnorm)
library(MCMCpack)

# Define dimension of MVN=d d<-2

# Initialize a vector for the true distribution mean mu<-c(0,0)

# Assign number of draws to generate n<-25

# Number of MCMC iterations nsteps<-1200

# Initialize the case number to keep track of the combination of rho/p_cens
case_number<-0

# Initialize an n by d matrix of zeros for draws from the multivariate normal distribution x<-matrix(rep(0,d*n),n,d)

# Initialize matrices for linear regression x_matrix <- matrix(0,n,d+1) x_matrix[,1] <- rep(1,n) y <- matrix(0,n,1) error <- matrix(0,n,1) b <- c(1,.75,.25) # Set relationship between y and x for regression bmatrix<-matrix(b,100,3,byrow=T)

# Initialize matrices for conditional distributions mu2<-matrix(0,d-1,1) # Define mu2 as d-1x1 matrix x2<-matrix(0,d-1,1) # Define x2 as d-1x1 matrix sig12<-matrix(0,1,d-1) # Define sig12 as 1xd-1 matrix sig21<-matrix(0,d-1,1) # Define sig21 as d-1x1 matrix sig22<-matrix(0,d-1,d-1) # Define sig22 as d-1xd-1 matrix

# Initialize matrices for results of 100 rho/p_cens combinations result<-matrix(0,100,9) xbias_sampling<- matrix(0,100,d) xbias_half_mdl <- matrix(0,100,d) xbias_sqrt_mdl <- matrix(0,100,d) xbias_mcmc <- matrix(0,100,d) xbias_EM_1 <- matrix(0,100,d) xbias_EM_10 <- matrix(0,100,d) xbias_EM_100 <- matrix(0,100,d) stdbias_sampling<- matrix(0,100,d) stdbias_half_mdl <- matrix(0,100,d) stdbias_sqrt_mdl <- matrix(0,100,d) stdbias_mcmc <- matrix(0,100,d) stdbias_EM_1 <- matrix(0,100,d) stdbias_EM_10 <- matrix(0,100,d) stdbias_EM_100 <- matrix(0,100,d) meanlognormbias_sampling<- matrix(0,100,d) meanlognormbias_half_mdl <- matrix(0,100,d) meanlognormbias_sqrt_mdl <- matrix(0,100,d) meanlognormbias_mcmc <- matrix(0,100,d) meanlognormbias_EM_1 <- matrix(0,100,d) meanlognormbias_EM_10 <- matrix(0,100,d) meanlognormbias_EM_100 <- matrix(0,100,d) varlognormbias_sampling<- matrix(0,100,d) varlognormbias_half_mdl <- matrix(0,100,d) varlognormbias_sqrt_mdl <- matrix(0,100,d) varlognormbias_mcmc <- matrix(0,100,d) varlognormbias_EM_1 <- matrix(0,100,d) varlognormbias_EM_10 <- matrix(0,100,d) varlognormbias_EM_100 <- matrix(0,100,d) rhobias_sampling<- matrix(0,100,1) rhobias_half_mdl <- matrix(0,100,1) rhobias_sqrt_mdl <- matrix(0,100,1) rhobias_mcmc <- matrix(0,100,1) rhobias_EM_1 <- matrix(0,100,1) rhobias_EM_10<- matrix(0,100,1) rhobias_EM_100 <- matrix(0,100,1) bbias_uncens <- matrix(0,100,3) bbias_half_mdl <- matrix(0,100,3) bbias_sqrt_mdl <- matrix(0,100,3) bbias_na <- matrix(0,100,3) bbias_mcmc <- matrix(0,100,3) bbias_EM_1 <- matrix(0,100,3) bbias_EM_10 <- matrix(0,100,3) bbias_EM_100 <- matrix(0,100,3) xbias_result <- matrix(0,100,9) rhobias_result <- matrix(0,100,9) bbias_result <- matrix(0,100,26) stdbias_result<-matrix(0,100,16) meanlognormbias_result<-matrix(0,100,16) varlognormbias_result<-matrix(0,100,16) mse_x_result<-matrix(0,100,16) mse_rho_result<-matrix(0,100,9) mse_b_result<-matrix(0,100,26) mse_std_result<-matrix(0,100,16) mse_meanlognorm_result<-matrix(0,100,16) mse_varlognorm_result<-matrix(0,100,16)

# Initialize matrices to record summaries for each sample uncens_xbar <- matrix(0,100,d) uncens_rho <- matrix(0,100,1) uncens_std1<-matrix(0,100,1) uncens_std2<-matrix(0,100,1) mean1_lognorm_uncens<- matrix(0,100,1) mean2_lognorm_uncens<- matrix(0,100,1) var1_lognorm_uncens<- matrix(0,100,1) var2_lognorm_uncens<- matrix(0,100,1) cens_half_mdl_xbar <- matrix(0,100,d) cens_half_mdl_rho <- matrix(0,100,1) cens_half_std1<- matrix(0,100,1) cens_half_std2<- matrix(0,100,1) mean1_lognorm_half<- matrix(0,100,1) mean2_lognorm_half<- matrix(0,100,1) var1_lognorm_half<- matrix(0,100,1) var2_lognorm_half<- matrix(0,100,1) cens_sqrt_mdl_xbar <- matrix(0,100,d) cens_sqrt_mdl_rho <- matrix(0,100,1) cens_sqrt_std1<- matrix(0,100,1) cens_sqrt_std2<- matrix(0,100,1) mean1_lognorm_sqrt<- matrix(0,100,1) mean2_lognorm_sqrt<- matrix(0,100,1) var1_lognorm_sqrt<- matrix(0,100,1) var2_lognorm_sqrt<- matrix(0,100,1) cens_mcmc_xbar <- matrix(0,100,d) cens_mcmc_rho <- matrix(0,100,1) cens_mcmc_std1<- matrix(0,100,1) cens_mcmc_std2<- matrix(0,100,1) mean1_lognorm_mcmc<- matrix(0,100,1) mean2_lognorm_mcmc<- matrix(0,100,1) var1_lognorm_mcmc<- matrix(0,100,1) var2_lognorm_mcmc<- matrix(0,100,1) cens_EM_xbar1 <- matrix(0,100,d) cens_EM_rho1 <- matrix(0,100,1) cens_EM_xbar10 <- matrix(0,100,d) cens_EM_rho10 <- matrix(0,100,1) cens_EM_xbar100 <- matrix(0,100,d) cens_EM_rho100 <- matrix(0,100,1) cens_EM1_std1<- matrix(0,100,1) cens_EM1_std2<- matrix(0,100,1) cens_EM10_std1<- matrix(0,100,1) cens_EM10_std2<- matrix(0,100,1) cens_EM100_std1<- matrix(0,100,1) cens_EM100_std2<- matrix(0,100,1) mean1_lognorm_EM1<- matrix(0,100,1) mean2_lognorm_EM1<- matrix(0,100,1) mean1_lognorm_EM10<- matrix(0,100,1) mean2_lognorm_EM10<- matrix(0,100,1) mean1_lognorm_EM100<- matrix(0,100,1) mean2_lognorm_EM100<- matrix(0,100,1) var1_lognorm_EM1<- matrix(0,100,1) var2_lognorm_EM1<- matrix(0,100,1) var1_lognorm_EM10<- matrix(0,100,1) var2_lognorm_EM10<- matrix(0,100,1) var1_lognorm_EM100<- matrix(0,100,1) var2_lognorm_EM100<- matrix(0,100,1) bhat_uncens <- matrix(0,100,d+1) bhat_half_mdl <- matrix(0,100,d+1) bhat_sqrt_mdl <- matrix(0,100,d+1) bhat_na <- matrix(0,100,d+1) bhat_mcmc <- matrix(0,100,d+1) bhat_EM_1 <- matrix(0,100,d+1) bhat_EM_10 <- matrix(0,100,d+1) bhat_EM_100 <- matrix(0,100,d+1) mse_x_uncens<-matrix(0,100,d) mse_x_half_mdl<-matrix(0,100,d) mse_x_sqrt_mdl<-matrix(0,100,d) mse_x_mcmc<-matrix(0,100,d) mse_x1_EM<-matrix(0,100,d) mse_x10_EM<-matrix(0,100,d) mse_x100_EM<-matrix(0,100,d) mse_std_uncens<-matrix(0,100,d) mse_meanlognorm_uncens<-matrix(0,100,d) mse_varlognorm_uncens<-matrix(0,100,d) mse_std_half<-matrix(0,100,d) mse_meanlognorm_half<-matrix(0,100,d) mse_varlognorm_half<-matrix(0,100,d) mse_std_sqrt<-matrix(0,100,d) mse_meanlognorm_sqrt<-matrix(0,100,d) mse_varlognorm_sqrt<-matrix(0,100,d) mse_std_mcmc<-matrix(0,100,d) mse_meanlognorm_mcmc<-matrix(0,100,d) mse_varlognorm_mcmc<-matrix(0,100,d) mse_std_EM1<-matrix(0,100,d) mse_std_EM10<-matrix(0,100,d) mse_std_EM100<-matrix(0,100,d) mse_meanlognorm_EM1<-matrix(0,100,d) mse_varlognorm_EM1<-matrix(0,100,d) mse_meanlognorm_EM10<-matrix(0,100,d) mse_varlognorm_EM10<-matrix(0,100,d) mse_meanlognorm_EM100<-matrix(0,100,d) mse_varlognorm_EM100<-matrix(0,100,d) mse_rho_uncens<-matrix(0,100,1) mse_rho_half_mdl<-matrix(0,100,1) mse_rho_sqrt_mdl<-matrix(0,100,1) mse_rho_mcmc<-matrix(0,100,1) mse_rho1_EM<-matrix(0,100,1) mse_rho10_EM<-matrix(0,100,1) mse_rho100_EM<-matrix(0,100,1) mse_b_uncens<-matrix(0,100,d+1) mse_b_half_mdl<-matrix(0,100,d+1) mse_b_sqrt_mdl<-matrix(0,100,d+1) 
mse_b_na<-matrix(0,100,d+1) mse_b_mcmc<-matrix(0,100,d+1) mse_b1_EM<-matrix(0,100,d+1) mse_b10_EM<-matrix(0,100,d+1) mse_b100_EM<-matrix(0,100,d+1)

# Arrays of values for each MCMC iteration x_3d_array <- array(0,c(n,d,nsteps))

# Arrays to store all x values for later calculations x_uncens_3d<-array(0,c(n,d,100,100)) x_mcmc_3d<-array(0,c(n,d,100,100)) x_EM1_3d<-array(0,c(n,d,100,100)) x_EM10_3d<-array(0,c(n,d,100,100)) x_EM100_3d<-array(0,c(n,d,100,100)) x_half_3d<-array(0,c(n,d,100,100)) x_sqrt_3d<-array(0,c(n,d,100,100)) set.seed(66) # For consistency of results

# BEGIN LOOPS FOR 100 CASES (combinations of rho and percent censoring)
for (neg_corr in 1:2) { # Begin Loop 1 (positive/negative corr)
if (neg_corr==1) {neg_mult<- -1}
if (neg_corr==2) {neg_mult<- 1}
for (rho_base in 0:9){ # Begin Loop 2 (correlation coeff 0 to .9 in steps of .1)
rho <- rho_base * .1 * neg_mult

# Vary p_cens from .1 to .5 for (p_base in 1:5){ # Begin Loop 3 (proportion censored .1 to .5) p_cens <- p_base * .1 case_number<-case_number+1

# Generate 100 samples for each case for(sample_no in 1:100) { # Begin sample_no loop

# Initialize a var-cov matrix sigma<-matrix(data = rho, nrow = d, ncol = d) for (i in 1:d){ sigma[i,i]<-1}

# Set the method detection limit (MDL) based on p_cens # qmvnorm computes the equicoordinate quantile function of the MVN # Need to use “upper” tail with (1 - p_cens) due to the fact that it is an AND situation: # this is the MDL at which censoring is above on BOTH X1 AND X2, equivalent to # censoring being below MDL on X1 OR X2

cdf_at_p_cens<-qmvnorm((1-p_cens),sigma=sigma,tail="upper.tail")

# "cens" is the equicoordinate MDL
cens <- cdf_at_p_cens$quantile # value of x below which observations are censored (MDL)
half_mdl_trans <- log(.5*exp(cdf_at_p_cens$quantile))          # ln-transformed (1/2)*MDL
sqrt_mdl_trans <- log((1/sqrt(2))*exp(cdf_at_p_cens$quantile)) # ln-transformed (1/sqrt(2))*MDL

# Create an n by d status matrix to keep track of which x values are censored
# (1 = censored, 0 = uncensored)
status<-matrix(0,n,d)

# Draws n data points from the d-dimensional MVN distribution with mean mu and
# covariance sigma, then stores the draws in the n x d matrix x. Values that are
# below the MDL will be censored below.

# Generate n random MVN(mu, sigma) variables for each sample x <- mvrnorm(n, mu=mu, Sigma=sigma) # since mu and sigma are 2-dim, so will be x x_uncens_3d[,,sample_no,case_number]<-x xx <- x x_na <- x x_matrix[,2:(d+1)] <- x # this is the design matrix (first column is 1s) uncens_xbar[sample_no,]<-mean(data.frame(x)) uncens_corr<-cor(x) uncens_rho[sample_no]<- uncens_corr[1,2] uncens_var1<-cov(x)[1,1] uncens_var2<-cov(x)[2,2] uncens_std1[sample_no]<-sqrt(uncens_var1) uncens_std2[sample_no]<-sqrt(uncens_var2) mean1_lognorm_uncens[sample_no]<-exp(uncens_xbar[sample_no,1]+.5*uncens_var1) mean2_lognorm_uncens[sample_no]<-exp(uncens_xbar[sample_no,2]+.5*uncens_var2) var1_lognorm_uncens[sample_no]<-exp(2*uncens_xbar[sample_no,1]+ uncens_var1)*(exp(uncens_var1)-1) var2_lognorm_uncens[sample_no]<-exp(2*uncens_xbar[sample_no,2]+ uncens_var2)*(exp(uncens_var2)-1)

# Generate n random standard normal error terms error <- rnorm(n)

# Calculate y=b*x_matrix + std normal error (b values set previously) y <- x_matrix %*% b + error

# Regress uncensored data regr_uncens <-lm(y~x)

# Censor x and xx by replacing values below cens with the transformed (1/2)*MDL
# or (1/sqrt(2))*MDL values; censor x_na by replacing values below cens with NA
for (i in 1:n){ for (j in 1:d){
  if (x[i,j] < cens) {
    status[i,j] <- 1
    x[i,j]    <- half_mdl_trans
    xx[i,j]   <- sqrt_mdl_trans
    x_na[i,j] <- NA
  }
} }

# Regress censored data with censored values replaced by ½ * cens, 1/sqrt(2) * cens, or NA regr_half_mdl <- lm(y~x) regr_sqrt_mdl <- lm(y~xx) regr_na <- lm(y~x_na)

# Calculate statistics of interest for ½ and 1/sqrt(2) methods cens_half_mdl_xbar[sample_no,]<-mean(data.frame(x)) cens_half_mdl_corr<-cor(x) cens_half_mdl_rho[sample_no] <- cens_half_mdl_corr[1,2] cens_half_var1<-cov(x)[1,1] cens_half_var2<-cov(x)[2,2] cens_half_std1[sample_no] <-sqrt(cens_half_var1) cens_half_std2[sample_no] <-sqrt(cens_half_var2) mean1_lognorm_half[sample_no] <-exp(cens_half_mdl_xbar[sample_no,1]+.5*cens_half_var1) mean2_lognorm_half[sample_no] <-exp(cens_half_mdl_xbar[sample_no,2]+.5*cens_half_var2) var1_lognorm_half[sample_no] <- exp(2*cens_half_mdl_xbar[sample_no,1]+cens_half_var1)*(exp(cens_half_var1)-1) var2_lognorm_half[sample_no] <- exp(2*cens_half_mdl_xbar[sample_no,2]+cens_half_var2)*(exp(cens_half_var2)-1) cens_sqrt_mdl_xbar[sample_no,]<-mean(data.frame(xx)) cens_sqrt_mdl_corr<-cor(xx) cens_sqrt_mdl_rho[sample_no] <- cens_sqrt_mdl_corr[1,2] cens_sqrt_var1<-cov(xx)[1,1] cens_sqrt_var2<-cov(xx)[2,2] cens_sqrt_std1[sample_no] <-sqrt(cens_sqrt_var1) cens_sqrt_std2[sample_no] <-sqrt(cens_sqrt_var2) mean1_lognorm_sqrt[sample_no] <-exp(cens_sqrt_mdl_xbar[sample_no,1]+.5*cens_sqrt_var1) mean2_lognorm_sqrt[sample_no] <-exp(cens_sqrt_mdl_xbar[sample_no,2]+.5*cens_sqrt_var2) var1_lognorm_sqrt[sample_no] <- exp(2*cens_sqrt_mdl_xbar[sample_no,1]+cens_sqrt_var1)*(exp(cens_sqrt_var1)-1) var2_lognorm_sqrt[sample_no] <- exp(2*cens_sqrt_mdl_xbar[sample_no,2]+cens_sqrt_var2)*(exp(cens_sqrt_var2)-1)

# MAIN MCMC (and EM) LOOP

# Initial Values for Mean and Covariance Matrices for EM sig_draw<-cov(x) mu_draw<-mean(data.frame(x)) E_x_sq<-matrix(0,n,(d*(d+1))/2) mu_est<-mu_draw sig_est<-c(sig_draw[1,1],sig_draw[2,2]) rho_est<-sig_draw[1,2] for(i in 1:n){ if(status[i,1]==0) { E_x_sq[i,1]<-x[i,1]^2 } if(status[i,2]==0) { E_x_sq[i,2]<-x[i,2]^2 } }

# Matrices for estimates at each MCMC iteration regr_coeff_mcmc<- matrix(0,nsteps,d+1) xbar_mcmc_steps<- matrix(0,nsteps,d) rho_mcmc_steps<-matrix(0,nsteps,1) var1_mcmc_steps<- matrix(0,nsteps,1) var2_mcmc_steps<- matrix(0,nsteps,1) for(step in 1:nsteps){ # Begin nsteps loop xbar<-mean(data.frame(x)) # Applies mean by column S<-matrix(0,d,d) # Initialize S matrix to zero for(k in 1:n) S<- S+((x[k,]-xbar) %*% t(x[k,]-xbar))

# This is start of MCMC method after 100 iterations of EM if(step>100){ # EM method will skip this for step 1 to 100 # Draw var-cov matrix from Inverse Wishart (S) with n-1 d.f. (in MCMCpack) # Works for any d if all off-diagonal rho values are the same sig_draw<-riwish(n-1,S) # Draw mean from MVN(xbar, sigma/n) mu_draw<-mvrnorm(n = 1, mu=xbar, Sigma= sig_draw/n) }

# Draw new x values from the truncated conditional distributions of
# MVN(mu_draw, sig_draw) for all x values that were initially censored
for (j in 1:d) { # Begin loop over d variables

# Calculate partitions of the mu_draw vector: mu1 and mu2
mu1 <- mu_draw[j]
for (k in 1:d) { if (k < j) mu2[k]   <- mu_draw[k]
                 if (k > j) mu2[k-1] <- mu_draw[k] }
# Calculate partitions of the sig_draw matrix: sig11, sig12, sig21, sig22
sig11 <- sig_draw[j,j]
for (k in 1:d) { if (k < j) sig12[1,k]   <- sig_draw[j,k]
                 if (k > j) sig12[1,k-1] <- sig_draw[j,k] }
for (k in 1:d) { if (k < j) sig21[k,1]   <- sig_draw[k,j]
                 if (k > j) sig21[k-1,1] <- sig_draw[k,j] }

for (m in 1:d) { for (p in 1:d) {
  if ((m < j) & (p < j)) sig22[m,p]     <- sig_draw[m,p]
  if ((m < j) & (p > j)) sig22[m,p-1]   <- sig_draw[m,p]
  if ((m > j) & (p < j)) sig22[m-1,p]   <- sig_draw[m,p]
  if ((m > j) & (p > j)) sig22[m-1,p-1] <- sig_draw[m,p]
} }

for (i in 1:n) { # Begin loop over n points to draw censored values from conditional distributions
if (status[i,j]==1) {
  for (k in 1:d) { if (k < j) x2[k]   <- x[i,k]
                   if (k > j) x2[k-1] <- x[i,k] }
  cond_mu  <- mu1 + sig12 %*% solve(sig22) %*% (x2 - mu2)  # conditional mean
  cond_var <- sig11 - sig12 %*% solve(sig22) %*% sig21     # conditional variance

# This is the EM method, done for the first 100 iterations
if(step<=100){
  # Expectation Step (calculate the x and x^2 values for the sufficient statistics)
  norm_ratio <- dnorm((cens-cond_mu)/sqrt(cond_var)) / pnorm((cens-cond_mu)/sqrt(cond_var))
  cond_sig <- sqrt(cond_var)
  # E(x | x < cens)
  x[i,j] <- cond_mu - cond_sig * norm_ratio
  # E(x^2 | x < cens), i.e. E(x1^2) and E(x2^2)
  E_x_sq[i,j] <- cond_var + cond_mu^2 -
    (cond_var*((cens-cond_mu)/cond_sig) + 2*cond_mu*cond_sig) * norm_ratio
}

# This is the MCMC method, done for iterations > 100
if(step>100){
  # This DRAWS a value in the censored tail for the MCMC method
  upper <- pnorm((cens-cond_mu)/sqrt(cond_var))
  v <- runif(1,0,upper)
  x[i,j] <- cond_mu + sqrt(cond_var)*qnorm(v) # randomly draws from the tail below ln(MDL)
}

} # End if statement for status (whether the data point is censored or not)

} # End loop over n points to draw censored values from conditional distributions } # End loop over d variables to draw censored values from conditional distributions

# Calculate MCMC parameter estimates for the step after drawing parameters and x values regr_mcmc <- lm(y~x) regr_coeff_mcmc[step,]<- regr_mcmc$coefficients xbar_mcmc_steps[step,]<-mean(data.frame(x)) cens_mcmc_corr<-cor(x) rho_mcmc_steps[step]<-cens_mcmc_corr[1,2] var1_mcmc_steps[step]<-cov(x)[1,1] var2_mcmc_steps[step]<-cov(x)[2,2] x_3d_array[,,step] <- x

# This is the EM method, done for the first 100 iterations
if(step<=100){
  var_trunc<-c(0,0)
  for(i in 1:n) {
    # E(x1*x2) when at most one of x1, x2 is censored below the DL
    if(sum(status[i,]==1)<2){ E_x_sq[i,3]<-x[i,1]*x[i,2] }
  }
  # Maximization Step
  mu_est<-mean(data.frame(x))
  x_squared_est<-mean(data.frame(E_x_sq[,1:2]))
  sig_est<-x_squared_est-c(mu_est[1]^2,mu_est[2]^2)
  rho_est<-((sum(E_x_sq[,3])/n)-mu_est[1]*mu_est[2])/(sqrt(sig_est[1])*sqrt(sig_est[2]))

if (step==1) { x_EM_1<-x # Final x for 1 iteration for this sample (of 100 samples) of this case x_EM1_3d[,,sample_no,case_number]<-x cens_EM_xbar1[sample_no,]<-mu_est cens_EM_rho1[sample_no] <- rho_est cens_EM1_var1<-sig_est[1] cens_EM1_var2<-sig_est[2] cens_EM1_std1[sample_no] <-sqrt(cens_EM1_var1) cens_EM1_std2[sample_no] <-sqrt(cens_EM1_var2) mean1_lognorm_EM1[sample_no] <-exp(cens_EM_xbar1[sample_no,1]+.5*cens_EM1_var1) mean2_lognorm_EM1[sample_no] <-exp(cens_EM_xbar1[sample_no,2]+.5*cens_EM1_var2) var1_lognorm_EM1[sample_no] <- exp(2*cens_EM_xbar1[sample_no,1]+cens_EM1_var1)*(exp(cens_EM1_var1)-1) var2_lognorm_EM1[sample_no] <- exp(2*cens_EM_xbar1[sample_no,2]+cens_EM1_var2)*(exp(cens_EM1_var2)-1) }

if (step==10){ x_EM_10<-x # Final x for 10 iterations for this sample (of 100 samples) of this case x_EM10_3d[,,sample_no,case_number]<-x cens_EM_xbar10[sample_no,]<-mu_est cens_EM_rho10[sample_no] <-rho_est cens_EM10_var1<-sig_est[1] cens_EM10_var2<-sig_est[2] cens_EM10_std1[sample_no] <-sqrt(cens_EM10_var1) cens_EM10_std2[sample_no] <-sqrt(cens_EM10_var2) mean1_lognorm_EM10[sample_no] <-exp(cens_EM_xbar10[sample_no,1]+.5*cens_EM10_var1) mean2_lognorm_EM10[sample_no] <-exp(cens_EM_xbar10[sample_no,2]+.5*cens_EM10_var2) var1_lognorm_EM10[sample_no] <- exp(2*cens_EM_xbar10[sample_no,1]+cens_EM10_var1)*(exp(cens_EM10_var1)-1) var2_lognorm_EM10[sample_no] <- exp(2*cens_EM_xbar10[sample_no,2]+cens_EM10_var2)*(exp(cens_EM10_var2)-1) }

if (step==100) { x_EM_100<-x # Final x for 100 iterations for this sample (of 100 samples) of this case x_EM100_3d[,,sample_no,case_number]<-x cens_EM_xbar100[sample_no,]<-mu_est cens_EM_rho100[sample_no] <-rho_est cens_EM100_var1<-sig_est[1] cens_EM100_var2<-sig_est[2] cens_EM100_std1[sample_no] <-sqrt(cens_EM100_var1) cens_EM100_std2[sample_no] <-sqrt(cens_EM100_var2) mean1_lognorm_EM100[sample_no] <- exp(cens_EM_xbar100[sample_no,1]+.5*cens_EM100_var1) mean2_lognorm_EM100[sample_no] <- exp(cens_EM_xbar100[sample_no,2]+.5*cens_EM100_var2) var1_lognorm_EM100[sample_no] <- exp(2*cens_EM_xbar100[sample_no,1]+cens_EM100_var1)*(exp(cens_EM100_var1)-1) var2_lognorm_EM100[sample_no] <- exp(2*cens_EM_xbar100[sample_no,2]+cens_EM100_var2)*(exp(cens_EM100_var2)-1) } }

} # End step loop

# Take mean of x matrices from MCMC steps, excluding first 200 steps for (ii in 1:n) { for (jj in 1:d) { x[ii,jj] <- mean(data.frame(x_3d_array[ii,jj, 201:nsteps])) }} x_mcmc_3d[,,sample_no,case_number]<-x

# Regress censored data with censored values replaced by Cond Exp Value, 1 Iteration regr_EM_1 <- lm(y~x_EM_1)

# Regress censored data with censored values replaced by Cond Exp Value, 10 Iterations regr_EM_10 <- lm(y~x_EM_10)

# Regress censored data with censored values replaced by Cond Exp Value, 100 Iterations regr_EM_100 <- lm(y~x_EM_100)

# Save regression coefficients for the current sample in matrix bhat_uncens[sample_no,] <- regr_uncens$coefficients bhat_half_mdl[sample_no,] <- regr_half_mdl$coefficients bhat_sqrt_mdl[sample_no,] <- regr_sqrt_mdl$coefficients bhat_na[sample_no,] <- regr_na$coefficients bhat_mcmc[sample_no,] <- mean(data.frame(regr_coeff_mcmc)) bhat_EM_1[sample_no,] <- regr_EM_1$coefficients bhat_EM_10[sample_no,] <- regr_EM_10$coefficients bhat_EM_100[sample_no,] <- regr_EM_100$coefficients

# Calculate MCMC parameter estimates for current sample cens_mcmc_xbar[sample_no,]<-mean(data.frame(xbar_mcmc_steps)) cens_mcmc_rho[sample_no] <- mean(rho_mcmc_steps) cens_mcmc_var1<-mean(var1_mcmc_steps) cens_mcmc_var2<- mean(var2_mcmc_steps) cens_mcmc_std1[sample_no] <-sqrt(cens_mcmc_var1) cens_mcmc_std2[sample_no] <-sqrt(cens_mcmc_var2) mean1_lognorm_mcmc[sample_no] <-exp(cens_mcmc_xbar[sample_no,1]+.5*cens_mcmc_var1) mean2_lognorm_mcmc[sample_no] <-exp(cens_mcmc_xbar[sample_no,2]+.5*cens_mcmc_var2) var1_lognorm_mcmc[sample_no] <- exp(2*cens_mcmc_xbar[sample_no,1]+cens_mcmc_var1)*(exp(cens_mcmc_var1)-1) var2_lognorm_mcmc[sample_no] <- exp(2*cens_mcmc_xbar[sample_no,2]+cens_mcmc_var2)*(exp(cens_mcmc_var2)-1)

} # End sample_no loop

# Average each estimate over the 100 samples and subtract the true parameter value
xbias_sampling[case_number,] <- mean(data.frame(uncens_xbar))
xbias_half_mdl[case_number,] <- mean(data.frame(cens_half_mdl_xbar))
xbias_sqrt_mdl[case_number,] <- mean(data.frame(cens_sqrt_mdl_xbar))
xbias_mcmc[case_number,] <- mean(data.frame(cens_mcmc_xbar))
xbias_EM_1[case_number,] <- mean(data.frame(cens_EM_xbar1))
xbias_EM_10[case_number,] <- mean(data.frame(cens_EM_xbar10))
xbias_EM_100[case_number,] <- mean(data.frame(cens_EM_xbar100))
stdbias_sampling[case_number,] <- mean(data.frame(cbind(uncens_std1, uncens_std2))) - c(1,1)
stdbias_half_mdl[case_number,] <- mean(data.frame(cbind(cens_half_std1, cens_half_std2))) - c(1,1)
stdbias_sqrt_mdl[case_number,] <- mean(data.frame(cbind(cens_sqrt_std1, cens_sqrt_std2))) - c(1,1)
stdbias_mcmc[case_number,] <- mean(data.frame(cbind(cens_mcmc_std1, cens_mcmc_std2))) - c(1,1)
stdbias_EM_1[case_number,] <- mean(data.frame(cbind(cens_EM1_std1, cens_EM1_std2))) - c(1,1)
stdbias_EM_10[case_number,] <- mean(data.frame(cbind(cens_EM10_std1, cens_EM10_std2))) - c(1,1)
stdbias_EM_100[case_number,] <- mean(data.frame(cbind(cens_EM100_std1, cens_EM100_std2))) - c(1,1)
meanlognormbias_sampling[case_number,] <- mean(data.frame(cbind(mean1_lognorm_uncens, mean2_lognorm_uncens))) - c(exp(.5),exp(.5))
meanlognormbias_half_mdl[case_number,] <- mean(data.frame(cbind(mean1_lognorm_half, mean2_lognorm_half))) - c(exp(.5),exp(.5))
meanlognormbias_sqrt_mdl[case_number,] <- mean(data.frame(cbind(mean1_lognorm_sqrt, mean2_lognorm_sqrt))) - c(exp(.5),exp(.5))
meanlognormbias_mcmc[case_number,] <- mean(data.frame(cbind(mean1_lognorm_mcmc, mean2_lognorm_mcmc))) - c(exp(.5),exp(.5))
meanlognormbias_EM_1[case_number,] <- mean(data.frame(cbind(mean1_lognorm_EM1, mean2_lognorm_EM1))) - c(exp(.5),exp(.5))
meanlognormbias_EM_10[case_number,] <- mean(data.frame(cbind(mean1_lognorm_EM10, mean2_lognorm_EM10))) - c(exp(.5),exp(.5))
meanlognormbias_EM_100[case_number,] <- mean(data.frame(cbind(mean1_lognorm_EM100, mean2_lognorm_EM100))) - c(exp(.5),exp(.5))
varlognormbias_sampling[case_number,] <- mean(data.frame(cbind(var1_lognorm_uncens, var2_lognorm_uncens))) - c(exp(2)-exp(1),exp(2)-exp(1))
varlognormbias_half_mdl[case_number,] <- mean(data.frame(cbind(var1_lognorm_half, var2_lognorm_half))) - c(exp(2)-exp(1),exp(2)-exp(1))
varlognormbias_sqrt_mdl[case_number,] <- mean(data.frame(cbind(var1_lognorm_sqrt, var2_lognorm_sqrt))) - c(exp(2)-exp(1),exp(2)-exp(1))
varlognormbias_mcmc[case_number,] <- mean(data.frame(cbind(var1_lognorm_mcmc, var2_lognorm_mcmc))) - c(exp(2)-exp(1),exp(2)-exp(1))
varlognormbias_EM_1[case_number,] <- mean(data.frame(cbind(var1_lognorm_EM1, var2_lognorm_EM1))) - c(exp(2)-exp(1),exp(2)-exp(1))
varlognormbias_EM_10[case_number,] <- mean(data.frame(cbind(var1_lognorm_EM10, var2_lognorm_EM10))) - c(exp(2)-exp(1),exp(2)-exp(1))
varlognormbias_EM_100[case_number,] <- mean(data.frame(cbind(var1_lognorm_EM100, var2_lognorm_EM100))) - c(exp(2)-exp(1),exp(2)-exp(1))
rhobias_sampling[case_number] <- mean(data.frame(uncens_rho)) - rho
rhobias_half_mdl[case_number] <- mean(data.frame(cens_half_mdl_rho)) - rho
rhobias_sqrt_mdl[case_number] <- mean(data.frame(cens_sqrt_mdl_rho)) - rho
rhobias_mcmc[case_number] <- mean(data.frame(cens_mcmc_rho)) - rho
rhobias_EM_1[case_number,] <- mean(data.frame(cens_EM_rho1)) - rho
rhobias_EM_10[case_number,] <- mean(data.frame(cens_EM_rho10)) - rho
rhobias_EM_100[case_number,] <- mean(data.frame(cens_EM_rho100)) - rho
bbias_uncens[case_number,] <- mean(data.frame(bhat_uncens)) - b
bbias_half_mdl[case_number,] <- mean(data.frame(bhat_half_mdl)) - b
bbias_sqrt_mdl[case_number,] <- mean(data.frame(bhat_sqrt_mdl)) - b
bbias_na[case_number,] <- mean(data.frame(bhat_na)) - b
bbias_mcmc[case_number,] <- mean(data.frame(bhat_mcmc)) - b
bbias_EM_1[case_number,] <- mean(data.frame(bhat_EM_1)) - b
bbias_EM_10[case_number,] <- mean(data.frame(bhat_EM_10)) - b
bbias_EM_100[case_number,] <- mean(data.frame(bhat_EM_100)) - b

# Estimates of MSEs of parameter estimates are averages of (estimator - actual parameter)^2

mse_x_uncens[case_number,]<-diag(t(uncens_xbar)%*%uncens_xbar)/100

mse_rho_uncens[case_number,]<- t(uncens_rho-matrix(rho,100,1,byrow=TRUE))%*%(uncens_rho- matrix(rho,100,1,byrow=TRUE))/100

mse_b_uncens[case_number,]<-diag(t(bhat_uncens -matrix(b,100,3,byrow=TRUE))%*%(bhat_uncens -matrix(b,100,3,byrow=TRUE)))/100

mse_std_uncens[case_number,]<-diag(t(cbind(uncens_std1, uncens_std2)-matrix(1,100,2))%*% (cbind(uncens_std1, uncens_std2)- matrix(1,100,2)))/100

mse_meanlognorm_uncens[case_number,]<-diag(t(cbind(mean1_lognorm_uncens, mean2_lognorm_uncens)-matrix(exp(.5),100,2))%*% (cbind(mean1_lognorm_uncens, mean2_lognorm_uncens)- matrix(exp(.5),100,2)))/100

mse_varlognorm_uncens[case_number,]<-diag(t(cbind(var1_lognorm_uncens, var2_lognorm_uncens)- matrix(exp(2)-exp(1),100,2))%*% (cbind(var1_lognorm_uncens, var2_lognorm_uncens)- matrix(exp(2)- exp(1),100,2)))/100

mse_x_half_mdl[case_number,]<-diag(t(cens_half_mdl_xbar)%*%cens_half_mdl_xbar)/100

mse_rho_half_mdl[case_number,]<- t(cens_half_mdl_rho - matrix(rho,100,1,byrow=TRUE))%*%(cens_half_mdl_rho-matrix(rho,100,1,byrow=TRUE))/100

mse_b_half_mdl[case_number,]<-diag(t(bhat_half_mdl - matrix(b,100,3,byrow=TRUE))%*%(bhat_half_mdl -matrix(b,100,3,byrow=TRUE)))/100

mse_std_half[case_number,]<-diag(t(cbind(cens_half_std1, cens_half_std2)-matrix(1,100,2))%*% (cbind(cens_half_std1, cens_half_std2)- matrix(1,100,2)))/100

mse_meanlognorm_half[case_number,]<-diag(t(cbind(mean1_lognorm_half, mean2_lognorm_half)- matrix(exp(.5),100,2))%*% (cbind(mean1_lognorm_half, mean2_lognorm_half)- matrix(exp(.5),100,2)))/100

mse_varlognorm_half[case_number,]<-diag(t(cbind(var1_lognorm_half, var2_lognorm_half)- matrix(exp(2)-exp(1),100,2))%*% (cbind(var1_lognorm_half, var2_lognorm_half)- matrix(exp(2)- exp(1),100,2)))/100

mse_x_sqrt_mdl[case_number,]<-diag(t(cens_sqrt_mdl_xbar)%*%cens_sqrt_mdl_xbar)/100

mse_rho_sqrt_mdl[case_number,]<- t(cens_sqrt_mdl_rho -matrix(rho,100,1,byrow=TRUE))%*%( cens_sqrt_mdl_rho-matrix(rho,100,1,byrow=TRUE))/100

mse_b_sqrt_mdl[case_number,]<-diag(t(bhat_sqrt_mdl - matrix(b,100,3,byrow=TRUE))%*%(bhat_sqrt_mdl -matrix(b,100,3,byrow=TRUE)))/100

mse_std_sqrt[case_number,]<-diag(t(cbind(cens_sqrt_std1, cens_sqrt_std2)-matrix(1,100,2))%*% (cbind(cens_sqrt_std1, cens_sqrt_std2)- matrix(1,100,2)))/100

mse_meanlognorm_sqrt[case_number,]<-diag(t(cbind(mean1_lognorm_sqrt, mean2_lognorm_sqrt)- matrix(exp(.5),100,2))%*% (cbind(mean1_lognorm_sqrt, mean2_lognorm_sqrt)- matrix(exp(.5),100,2)))/100

mse_varlognorm_sqrt[case_number,]<-diag(t(cbind(var1_lognorm_sqrt, var2_lognorm_sqrt)- matrix(exp(2)-exp(1),100,2))%*% (cbind(var1_lognorm_sqrt, var2_lognorm_sqrt)- matrix(exp(2)- exp(1),100,2)))/100

mse_x_mcmc[case_number,]<-diag(t(cens_mcmc_xbar)%*%cens_mcmc_xbar)/100

mse_rho_mcmc[case_number,]<- t(cens_mcmc_rho -matrix(rho,100,1,byrow=TRUE))%*%( cens_mcmc_rho-matrix(rho,100,1,byrow=TRUE))/100

mse_b_mcmc[case_number,]<-diag(t(bhat_mcmc -matrix(b,100,3,byrow=TRUE))%*%(bhat_mcmc - matrix(b,100,3,byrow=TRUE)))/100

mse_std_mcmc[case_number,]<-diag(t(cbind(cens_mcmc_std1, cens_mcmc_std2)- matrix(1,100,2))%*% (cbind(cens_mcmc_std1, cens_mcmc_std2)- matrix(1,100,2)))/100

mse_meanlognorm_mcmc[case_number,]<-diag(t(cbind(mean1_lognorm_mcmc, mean2_lognorm_mcmc)-matrix(exp(.5),100,2))%*% (cbind(mean1_lognorm_mcmc, mean2_lognorm_mcmc)- matrix(exp(.5),100,2)))/100

mse_varlognorm_mcmc[case_number,]<-diag(t(cbind(var1_lognorm_mcmc, var2_lognorm_mcmc)- matrix(exp(2)-exp(1),100,2))%*% (cbind(var1_lognorm_mcmc, var2_lognorm_mcmc)- matrix(exp(2)- exp(1),100,2)))/100

mse_x1_EM[case_number,]<-diag(t(cens_EM_xbar1)%*%cens_EM_xbar1)/100

mse_rho1_EM[case_number,]<- t(cens_EM_rho1 -matrix(rho,100,1,byrow=TRUE))%*%( cens_EM_rho1-matrix(rho,100,1,byrow=TRUE))/100

mse_b1_EM[case_number,]<-diag(t(bhat_EM_1 -matrix(b,100,3,byrow=TRUE))%*%(bhat_EM_1 - matrix(b,100,3,byrow=TRUE)))/100

mse_x10_EM[case_number,]<-diag(t(cens_EM_xbar10)%*%cens_EM_xbar10)/100

mse_rho10_EM[case_number,]<- t(cens_EM_rho10 -matrix(rho,100,1,byrow=TRUE))%*%( cens_EM_rho10-matrix(rho,100,1,byrow=TRUE))/100

mse_b10_EM[case_number,]<-diag(t(bhat_EM_10 -matrix(b,100,3,byrow=TRUE))%*%(bhat_EM_10 - matrix(b,100,3,byrow=TRUE)))/100

mse_x100_EM[case_number,]<-diag(t(cens_EM_xbar100)%*%cens_EM_xbar100)/100

mse_rho100_EM[case_number,]<- t(cens_EM_rho100 -matrix(rho,100,1,byrow=TRUE))%*%( cens_EM_rho100-matrix(rho,100,1,byrow=TRUE))/100

mse_b100_EM[case_number,]<-diag(t(bhat_EM_100 - matrix(b,100,3,byrow=TRUE))%*%(bhat_EM_100 -matrix(b,100,3,byrow=TRUE)))/100

mse_std_EM1[case_number,]<-diag(t(cbind(cens_EM1_std1, cens_EM1_std2)-matrix(1,100,2))%*% (cbind(cens_EM1_std1, cens_EM1_std2)- matrix(1,100,2)))/100

mse_std_EM10[case_number,]<-diag(t(cbind(cens_EM10_std1, cens_EM10_std2)- matrix(1,100,2))%*% (cbind(cens_EM10_std1, cens_EM10_std2)- matrix(1,100,2)))/100

mse_std_EM100[case_number,]<-diag(t(cbind(cens_EM100_std1, cens_EM100_std2)- matrix(1,100,2))%*% (cbind(cens_EM100_std1, cens_EM100_std2)- matrix(1,100,2)))/100

mse_meanlognorm_EM1[case_number,]<-diag(t(cbind(mean1_lognorm_EM1, mean2_lognorm_EM1)- matrix(exp(.5),100,2))%*% (cbind(mean1_lognorm_EM1, mean2_lognorm_EM1)- matrix(exp(.5),100,2)))/100

mse_meanlognorm_EM10[case_number,]<-diag(t(cbind(mean1_lognorm_EM10, mean2_lognorm_EM10)-matrix(exp(.5),100,2))%*% (cbind(mean1_lognorm_EM10, mean2_lognorm_EM10)- matrix(exp(.5),100,2)))/100

mse_meanlognorm_EM100[case_number,]<-diag(t(cbind(mean1_lognorm_EM100, mean2_lognorm_EM100)-matrix(exp(.5),100,2))%*% (cbind(mean1_lognorm_EM100, mean2_lognorm_EM100)- matrix(exp(.5),100,2)))/100

mse_varlognorm_EM1[case_number,]<-diag(t(cbind(var1_lognorm_EM1, var2_lognorm_EM1)- matrix(exp(2)-exp(1),100,2))%*% (cbind(var1_lognorm_EM1, var2_lognorm_EM1)- matrix(exp(2)- exp(1),100,2)))/100

mse_varlognorm_EM10[case_number,]<-diag(t(cbind(var1_lognorm_EM10, var2_lognorm_EM10)- matrix(exp(2)-exp(1),100,2))%*% (cbind(var1_lognorm_EM10, var2_lognorm_EM10)- matrix(exp(2)- exp(1),100,2)))/100

mse_varlognorm_EM100[case_number,]<-diag(t(cbind(var1_lognorm_EM100, var2_lognorm_EM100)- matrix(exp(2)-exp(1),100,2))%*% (cbind(var1_lognorm_EM100, var2_lognorm_EM100)- matrix(exp(2)- exp(1),100,2)))/100

mse_b_na[case_number,]<-diag(t(bhat_na -matrix(b,100,3,byrow=TRUE))%*%(bhat_na - matrix(b,100,3,byrow=TRUE)))/100

# Combine Bias, MSE, and MCSE results

xbias_result[case_number,] <- c(p_cens, rho, mean(uncens_xbar), mean(cens_half_mdl_xbar), mean(cens_sqrt_mdl_xbar), mean(cens_mcmc_xbar), mean(cens_EM_xbar1), mean(cens_EM_xbar10), mean(cens_EM_xbar100))

rhobias_result[case_number,] <- c(p_cens, rho, rhobias_sampling[case_number,], rhobias_half_mdl[case_number,], rhobias_sqrt_mdl[case_number,], rhobias_mcmc[case_number,], rhobias_EM_1[case_number,], rhobias_EM_10[case_number,], rhobias_EM_100[case_number,])

bbias_result[case_number,] <- c(p_cens, rho, bbias_uncens[case_number,], bbias_half_mdl[case_number,], bbias_sqrt_mdl[case_number,], bbias_na[case_number,], bbias_mcmc[case_number,], bbias_EM_1[case_number,], bbias_EM_10[case_number,], bbias_EM_100[case_number,])

stdbias_result[case_number,] <- c(p_cens, rho, stdbias_sampling[case_number,], stdbias_half_mdl[case_number,], stdbias_sqrt_mdl[case_number,], stdbias_mcmc[case_number,], stdbias_EM_1[case_number,], stdbias_EM_10[case_number,], stdbias_EM_100[case_number,])

meanlognormbias_result[case_number,] <- c(p_cens, rho, meanlognormbias_sampling [case_number,], meanlognormbias_half_mdl[case_number,], meanlognormbias_sqrt_mdl[case_number,], meanlognormbias_mcmc[case_number,], meanlognormbias_EM_1[case_number,], meanlognormbias_EM_10[case_number,], meanlognormbias_EM_100[case_number,])

varlognormbias_result[case_number,] <- c(p_cens, rho, varlognormbias_sampling [case_number,], varlognormbias_half_mdl[case_number,], varlognormbias_sqrt_mdl[case_number,], varlognormbias_mcmc[case_number,], varlognormbias_EM_1[case_number,], varlognormbias_EM_10[case_number,], varlognormbias_EM_100[case_number,])

mse_x_result[case_number,] <- c(p_cens, rho, mse_x_uncens[case_number,], mse_x_half_mdl[case_number,], mse_x_sqrt_mdl[case_number,], mse_x_mcmc[case_number,], mse_x1_EM[case_number,], mse_x10_EM[case_number,], mse_x100_EM[case_number,])

mse_rho_result[case_number,] <- c(p_cens, rho, mse_rho_uncens[case_number,], mse_rho_half_mdl[case_number,], mse_rho_sqrt_mdl[case_number,], mse_rho_mcmc[case_number,], mse_rho1_EM[case_number,], mse_rho10_EM[case_number,], mse_rho100_EM[case_number,])

mse_b_result[case_number,] <- c(p_cens, rho, mse_b_uncens[case_number,], mse_b_half_mdl[case_number,], mse_b_sqrt_mdl[case_number,], mse_b_mcmc[case_number,], mse_b_na[case_number,], mse_b1_EM[case_number,], mse_b10_EM[case_number,], mse_b100_EM[case_number,])

mse_std_result[case_number,] <- c(p_cens, rho, mse_std_uncens[case_number,], mse_std_half[case_number,], mse_std_sqrt[case_number,], mse_std_mcmc[case_number,], mse_std_EM1[case_number,], mse_std_EM10[case_number,], mse_std_EM100[case_number,])

mse_meanlognorm_result[case_number,] <- c(p_cens, rho, mse_meanlognorm_uncens[case_number,], mse_meanlognorm_half[case_number,], mse_meanlognorm_sqrt[case_number,], mse_meanlognorm_mcmc[case_number,], mse_meanlognorm_EM1[case_number,], mse_meanlognorm_EM10[case_number,], mse_meanlognorm_EM100[case_number,])

mse_varlognorm_result[case_number,] <- c(p_cens, rho, mse_varlognorm_uncens[case_number,], mse_varlognorm_half[case_number,], mse_varlognorm_sqrt[case_number,], mse_varlognorm_mcmc[case_number,], mse_varlognorm_EM1[case_number,], mse_varlognorm_EM10[case_number,], mse_varlognorm_EM100[case_number,])

} # End Loop 3 (proportion censored)
} # End Loop 2 (correlation coeff)
} # End Loop 1 (positive/negative corr)

# Plot codes for results matrices go here (not included in this copy)