
Kernel Smoothing and Tolerance Intervals for Hierarchical Data

A Dissertation Presented to the Graduate School of Clemson University

In Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy
Mathematical Sciences

by
Christopher Wilson
December 2016

Accepted by:
Dr. Patrick Gerard, Committee Chair
Dr. William Bridges
Dr. Colin Gallagher
Dr. Julia Sharp

Abstract

Multistage sampling is a common sampling technique in many studies. A challenge presented by such schemes is that an additional random term should be introduced to the model. Observations are identically distributed but not independent, thus many traditional kernel smoothing techniques, which assume that the data are independent and identically distributed, may not produce reasonable estimates for the marginal density. Breunig (2001) proposed a method to account for the intra-class correlation, leading to a complex bandwidth involving high order derivatives for a bivariate kernel density estimate. We consider an alternative approach where the data are grouped into multiple random samples, by taking one observation from each class, then constructing a kernel density estimate for each sample. A weighted average of these kernel density estimates yields a simple expression for the optimal bandwidth that accounts for the intra-class correlation. For unbalanced data, methods are implemented to ensure that each class is included in every random sample. Both simulations and analytical results are provided.

One-sided tolerance intervals are confidence intervals for percentiles. Many authors have provided methods to estimate one-sided tolerance limits for both random samples and hierarchical data. Many of these methods have assumed that the population is normally distributed. Since multistage sampling is a popular sampling scheme, we would like to employ methods that avoid such assumptions on the population. We explore non-parametric methods that utilize bootstrapping and/or kernel smoothing to produce data-driven estimates. One way to account for hierarchical data is to decompose observations in a way that is consistent with the decomposition of the sum of squares in the analysis of a one-way random effects model. We provide a simulation study with two percentiles of interest.

Dedication

This work is dedicated to my parents, Michael Wilson and Susan Reynolds, who have always loved me unconditionally and whose good examples have taught me to work hard for the things that I aspire to achieve. Thank you to my girlfriend, Shivani Shah, for all her love and support.

Acknowledgments

I offer my most heartfelt praise to my advisor, Dr. Patrick Gerard, who has been an amazing advisor. Without his knowledgeable advice, insightful criticisms, and patient encouragement, completing this dissertation would not have been possible.

I am grateful that Dr. William Bridges, Dr. Colin Gallagher, and Dr. Julia Sharp provided me with constant enthusiasm and an interest in improving my research.

I would like to thank Dr. Pete Kiessler and Dr. Robert Lund for not only being great friends, but for encouraging me in every step of my journey at Clemson.

Table of Contents

Title Page
Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
1 Hierarchical Linear Models
2 Kernel Smoothing for i.i.d. Data
3 Kernel Smoothing for Hierarchical Data
4 Percentile Estimation
5 Application
6 Conclusion
7 Future Work
Appendices
A Marginal Distribution with Non-Normal Errors or Cluster Effects
B Approximate Covariance of Density Estimates in (3.3)
C Proofs
Bibliography

List of Tables

1.1 ANOVA table for a one way random effects model for general hierarchical data sets.
1.2 ANOVA table for a one way random effects model for hierarchical data sets with balanced data. b = n_i for all i = 1, . . . , a.
1.3 Comparison of coverage probability and half-width (HW) of different confidence intervals. Coverage probability refers to the percentage of simulations that µ = 0 is included in confidence intervals (1.6) and (1.7).

3.1 Solutions to (3.10) for various values of σ²_τ, for datasets with a = 40 and b = 10, where τ_i ∼ N(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ). Additionally, we consider 500 resamples.
3.2 Solutions to (3.18) for various values of σ²_τ, for datasets with a = 40 and b = 10, and τ_i ∼ N(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ).
3.3 List of all possible combinations for distributions of τ_i and ε_ij, where x denotes the combinations that were considered.
3.4 Simulation results for balanced data with 40 clusters and 1000 observations for each cluster, and τ_i follow a normal distribution and ε_ij are normally distributed.
3.5 Simulation results for balanced data with 40 clusters and 10 observations for each cluster, and τ_i follow a normal distribution and ε_ij have a log normal distribution.
3.6 Simulation results for balanced data with 40 clusters and 10 observations for each cluster, and τ_i follow a normal distribution and ε_ij have a double exponential distribution.
3.7 Simulation results for balanced data with 40 clusters and 10 observations for each cluster, and τ_i follow a log normal distribution and ε_ij are normally distributed.
3.8 Simulation results for balanced data with 40 clusters and 10 observations for each cluster, and τ_i follow a double exponential distribution and ε_ij are normally distributed.
3.9 Simulation results for unbalanced data with 62 clusters and τ_i follow a normal distribution and ε_ij are normally distributed.
3.10 Simulation results for unbalanced data with 62 clusters and τ_i follow a normal distribution and ε_ij are log normally distributed.
3.11 Simulation results for unbalanced data with 62 clusters and τ_i follow a normal distribution and ε_ij have a double exponential distribution.
3.12 Simulation results for unbalanced data with 62 clusters and τ_i follow a log normal distribution and ε_ij follow a normal distribution.
3.13 Simulation results for unbalanced data with 62 clusters and τ_i follow a double exponential distribution and ε_ij follow a normal distribution.
3.14 Simulation results for unbalanced data with 27 clusters and both τ_i and ε_ij follow normal distributions.
3.15 Simulation results for unbalanced data with 27 clusters and τ_i follow a normal distribution and ε_ij are log normally distributed.
3.16 Simulation results for unbalanced data with 27 clusters and τ_i follow a normal distribution and ε_ij are double exponentially distributed.
3.17 Simulation results for unbalanced data with 27 clusters and τ_i follow a log normal distribution and ε_ij follow a normal distribution.
3.18 Simulation results for unbalanced data with 27 clusters and τ_i follow a double exponential distribution and ε_ij follow a normal distribution.

4.1 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij also following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.2 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a log normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.3 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a double exponential distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.4 Results from 1000 simulations where the data is generated with τ_i following log normal distribution and ε_ij following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.5 Results from 1000 simulations where the data is generated with τ_i following double exponential distribution and ε_ij following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.6 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij also following a normal distribution. Each dataset had 62 clusters and 672 observations.
4.7 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij also following a log normal distribution. Each dataset had 62 clusters and 672 observations.
4.8 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij also following a double exponential distribution. Each dataset had 62 clusters and 672 observations.
4.9 Results from 1000 simulations where the data is generated with τ_i following log normal distribution and ε_ij also following a normal distribution. Each dataset had 62 clusters and 672 observations.
4.10 Results from 1000 simulations where the data is generated with τ_i following double exponential distribution and ε_ij also following a log normal distribution. Each dataset had 62 clusters and 672 observations.
4.11 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij also following a normal distribution. Each dataset had 27 clusters and 285 observations.
4.12 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a log normal distribution. Each dataset had 27 clusters and 285 observations.
4.13 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a double exponential distribution. Each dataset had 27 clusters and 285 observations.
4.14 Results from 1000 simulations where the data is generated with τ_i following log normal distribution and ε_ij following a normal distribution. Each dataset had 27 clusters and 285 observations.
4.15 Results from 1000 simulations where the data is generated with τ_i following double exponential distribution and ε_ij following a normal distribution. Each dataset had 27 clusters and 285 observations.
4.16 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij also following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.17 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a log normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.18 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a double exponential distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.19 Results from 1000 simulations where the data is generated with τ_i following log normal distribution and ε_ij following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.20 Results from 1000 simulations where the data is generated with τ_i following double exponential distribution and ε_ij following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.
4.21 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij also following a normal distribution. Each dataset had 62 clusters and 672 observations.
4.22 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a log normal distribution. Each dataset had 62 clusters and 672 observations.
4.23 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a double exponential distribution. Each dataset had 62 clusters and 672 observations.
4.24 Results from 1000 simulations where the data is generated with τ_i following log normal distribution and ε_ij following a normal distribution. Each dataset had 62 clusters and 672 observations.
4.25 Results from 1000 simulations where the data is generated with τ_i following double exponential distribution and ε_ij following a normal distribution. Each dataset had 62 clusters and 672 observations.
4.26 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij also following a normal distribution. Each dataset had 27 clusters and 285 observations.
4.27 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a log normal distribution. Each dataset had 27 clusters and 285 observations.
4.28 Results from 1000 simulations where the data is generated with τ_i following normal distribution and ε_ij following a double exponential distribution. Each dataset had 27 clusters and 285 observations.
4.29 Results from 1000 simulations where the data is generated with τ_i following log normal distribution and ε_ij following a normal distribution. Each dataset had 27 clusters and 285 observations.
4.30 Results from 1000 simulations where the data is generated with τ_i following double exponential distribution and ε_ij following a normal distribution. Each dataset had 27 clusters and 285 observations.

5.1 Results from Levene's test for several measures of lumber strength, as well as summary statistics. Here ICC-hat is the estimated intra-class correlation for each quantity of interest.
5.2 Estimates for the 10th and 25th percentiles of measures of lumber strength.

A.1 The 10th and 25th percentiles of the sum of τ_i ∼ N(0, σ²_τ) and ε_ij − 1 ∼ LN(0, 1 − σ²_τ) with varying amounts of correlation.
A.2 The 10th and 25th percentiles of the sum of τ_i − 1 ∼ LN(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ) with varying amounts of correlation.
A.3 The 10th and 25th percentiles of the sum of τ_i ∼ N(0, σ²_τ) and ε_ij ∼ DE(0, 1 − σ²_τ) with varying amounts of correlation.
A.4 The 10th and 25th percentiles of the marginal distribution of observations from model (1.1) when τ_i ∼ DE(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ).

List of Figures

2.1 Comparison between a fourth order kernel and a second order kernel (standard normal pdf).

3.1 Plots of (3.10), as a function of h, for various values of σ²_τ, for datasets with a = 40 and b = 10, and τ_i ∼ N(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ). Additionally, we consider 500 resamples.

5.1 Visual summary of MOE, MOR, and MOE/MOR.
5.2 Kernel density estimate of the marginal distribution of MOE of board. The blue kernel density estimate uses resampling to compute the bandwidth, while the red uses the Sheather and Jones plug-in method.
5.3 Kernel density estimate of the marginal distribution of MOR of board. The blue kernel density estimate uses resampling to compute the bandwidth, while the red uses the Sheather and Jones plug-in method.
5.4 Kernel density estimate of the marginal distribution of the ratio MOR/MOE of board. The blue kernel density estimate uses resampling to compute the bandwidth, while the red uses the Sheather and Jones plug-in method.

A.1 Plots of the marginal pdf of an observation from model (1.1) if τ_i ∼ N(0, σ²_τ) and ε_ij ∼ LN(0, 1 − σ²_τ).
A.2 Plots of the marginal pdf of an observation from model (1.1) if τ_i − 1 ∼ LN(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ).
A.3 Plots of the marginal pdf of an observation from model (1.1) if τ_i ∼ N(0, σ²_τ) and ε_ij ∼ DE(0, 1 − σ²_τ).
A.4 Plots of the marginal pdf of an observation from model (1.1) if τ_i ∼ DE(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ).

Chapter 1

Hierarchical Linear Models

Hierarchical linear models are statistical models that have more than one source of random variation. This type of linear model is considered when data is collected by means of multistage sampling. The first stage involves breaking the population into groups, called clusters, and randomly selecting several clusters. The second stage is collecting a random sample from each of the selected clusters. Hierarchical linear models are useful because inference can be made about all of the clusters, unlike fixed effects models where the only source of random variation is due to the sampling within the groups. If a fixed effects model is employed, then inference should only be made concerning the clusters that are represented in the sample. In other words, the population being considered by random effects models is larger than that of fixed effects models.

Hierarchical data occurs often in the manufacturing of goods. For instance, there are many lumber mills in the United States. It is too time consuming and expensive to study lumber from each mill. For practical purposes, we collect data from a much smaller number of mills. There are many elements of lumber manufacturing that can impact the quality of the final product. For example, mills could get their raw materials from different locations around the world, and mills may have varying production protocols. Also, environmental factors, like climate, may cause differences in the final product from each mill. The quality of the lumber from the same mill could be impacted by any of these factors in a very similar way. In other words, while boards from the same mill are related, they are not related to boards from other mills. We aim to take this data structure into consideration.

We will be studying one-way random effects models, which have two sources of random variation. One source of random variation is due to randomness in the selection of the clusters, and the other is due to randomness from choosing the units from each cluster. An example of a linear model for hierarchical data is

y_ij = µ + τ_i + ε_ij,   i = 1, . . . , a,   j = 1, . . . , n_i,   (1.1)

where y_ij is the value of the response variable of the jth observation from the ith cluster, and µ is the overall mean. We assume that {τ_1, τ_2, . . . , τ_a} is a collection of independent and identically distributed (i.i.d.) random variables. In many cases, each τ_i is considered to be a normal random

variable with mean 0 and variance σ²_τ; however, these random variables are not necessarily required to be normally distributed. Similarly, {ε_11, ε_12, . . . , ε_ana} are independent and identically distributed random variables. Commonly, {ε_11, ε_12, . . . , ε_ana} are assumed to follow a normal distribution with mean 0 and variance σ²_ε, but again the normality assumption is not required. Regardless of the distribution assumptions, every τ_i and ε_ij are independent of one another. Due to the additional source of random variation, the covariance structure of a hierarchical model is

   2 2 0 0  0 0 στ + σ i = i , j = j 1 i = i , j = j     Cov(yij, yi0j0 ) = σ2 i = i0, j 6= j0 and Corr(yij, yi0j0 ) = σ2/(σ2 + σ2) i = i0, j 6= j0  τ  τ τ     0  0 0 i 6= i , 0 i 6= i . (1.2)

The correlation between observations within the same cluster must be taken into account because, as σ²_τ increases, the information gained from a single observation can decrease significantly. If the data are independent and Corr(y_ij, y_ij′) = 0 for all j ≠ j′, then each of the N = Σ_i n_i observations contains the same amount of information. Alternatively, if the data are perfectly correlated, Corr(y_ij, y_ij′) = 1, then each observation from the same cluster is identical. Thus only one observation from each cluster is required. In that scenario there are only a out of N observations that contain information.
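To make the covariance structure in (1.2) concrete, the following R sketch generates data from model (1.1) with normal cluster effects and errors (the values a = 5000, b = 2, and σ²_τ = σ²_ε = 0.5 are assumptions for illustration only) and checks that the empirical correlation between two observations from the same cluster is close to σ²_τ/(σ²_τ + σ²_ε).

```r
# Sketch: simulate model (1.1) and check the within-cluster correlation in (1.2)
set.seed(1)
a <- 5000; b <- 2                        # many clusters, two observations each
sigma2_tau <- 0.5; sigma2_eps <- 0.5     # assumed variance components
tau <- rnorm(a, 0, sqrt(sigma2_tau))
y   <- matrix(rep(tau, each = b) + rnorm(a * b, 0, sqrt(sigma2_eps)),
              nrow = a, byrow = TRUE)    # row i holds the two observations from cluster i
cor(y[, 1], y[, 2])                      # empirical within-cluster correlation
sigma2_tau / (sigma2_tau + sigma2_eps)   # theoretical value from (1.2)
```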

A useful tool for analyzing a dataset is an analysis of variance (ANOVA) table. The ANOVA table is helpful for testing

H_0: σ²_τ = 0   versus   H_1: σ²_τ > 0.   (1.3)

Source  | df            | SS                              | MS                       | E[MS]
Between | a − 1         | SSB = Σ_i n_i (ȳ_i· − ȳ··)²     | MSB = SSB/(a − 1)        | σ²_ε + σ²_τ (N − Σ_i n_i²/N)/(a − 1)
Within  | Σ_i (n_i − 1) | SSW = Σ_i Σ_j (y_ij − ȳ_i·)²    | MSW = SSW/Σ_i (n_i − 1)  | σ²_ε
Total   | N − 1         | SST = Σ_i Σ_j (y_ij − ȳ··)²     |                          |

Table 1.1: ANOVA table for a one way random effects model for general hierarchical data sets.

Source  | df        | SS                           | MS                    | E[MS]
Between | a − 1     | SSB = b Σ_i (ȳ_i· − ȳ··)²    | MSB = SSB/(a − 1)     | σ²_ε + b σ²_τ
Within  | a(b − 1)  | SSW = Σ_i Σ_j (y_ij − ȳ_i·)² | MSW = SSW/(a(b − 1))  | σ²_ε
Total   | ab − 1    | SST = Σ_i Σ_j (y_ij − ȳ··)²  |                       |

Table 1.2: ANOVA table for a one way random effects model for hierarchical data sets with balanced data, b = n_i for all i = 1, . . . , a.

The ANOVA table for the hierarchical linear model (1.1) can be found in Table 1.1. The ANOVA table for model (1.1) reduces to the ANOVA table in Table 1.2 when we consider balanced data (b = n_i for all i = 1, . . . , a). The following is focused on balanced data, where we have the same number of observations from each cluster.

If the underlying population is normal, the test statistic for testing the hypothesis found in (1.3) is F_0 = MSB/MSW, which under the null hypothesis follows an F-distribution with a − 1 numerator degrees of freedom and a(b − 1) denominator degrees of freedom. The null hypothesis is rejected if the between group variation is much larger than the within group variation. For a reference for what large values of F_0 are: under the null hypothesis, E[F_0] = a(b − 1)/(a(b − 1) − 2), and E[F_0] approaches 1 as either a or b increases.

To find appropriate estimators for the variance components, σ²_τ and σ²_ε, we can solve the following system of equations,

MSB = σ̂²_ε + b σ̂²_τ
MSW = σ̂²_ε.

The above system yields σ̂²_τ = (MSB − MSW)/b, hence an estimator for the variance of an observation is

σ̂²_τ + σ̂²_ε = MSB/b + (1 − 1/b) MSW.   (1.5)

Notice that both the between and within errors are involved to estimate the variance of an observation, as opposed to analyzing a random sample where we would use the sample variance to estimate the population variance.
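As a small illustration of these moment estimators, the sketch below computes MSB and MSW from Table 1.2, the estimator in (1.5), and the F statistic for (1.3). The balanced design (a = 40, b = 10) and variance components are assumed values for the example only.

```r
# Sketch: ANOVA moment estimators (1.5) and the F test for (1.3), balanced data
set.seed(2)
a <- 40; b <- 10
sigma2_tau <- 0.25; sigma2_eps <- 0.75
y <- rep(rnorm(a, 0, sqrt(sigma2_tau)), each = b) + rnorm(a * b, 0, sqrt(sigma2_eps))
cluster <- rep(seq_len(a), each = b)

ybar_i <- tapply(y, cluster, mean)
MSB <- b * sum((ybar_i - mean(y))^2) / (a - 1)
MSW <- sum((y - ybar_i[cluster])^2) / (a * (b - 1))

sigma2_tau_hat <- (MSB - MSW) / b                # between-cluster variance component
var_obs_hat    <- MSB / b + (1 - 1 / b) * MSW    # equation (1.5)
F0 <- MSB / MSW                                  # test statistic for (1.3)
p_value <- pf(F0, a - 1, a * (b - 1), lower.tail = FALSE)
c(sigma2_tau_hat = sigma2_tau_hat, var_obs = var_obs_hat, F0 = F0, p = p_value)
```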

To illustrate possible implications of conducting an inappropriate analysis that ignores the covariance structure of random effects models, we conduct a simulation study to see how often confidence intervals for the overall mean, µ, actually contain µ, and we also record the mean margin of error (MOE). In this simulation study, we will set µ = 0 and σ²_τ = 1 − σ²_ε, and then generate datasets via (1.1). We will construct 95% confidence intervals for µ assuming that the data are a random sample, that is, that there is no correlation between any observations. A 95% confidence interval for µ based on a simple random sample is given by:

ȳ·· ± t_{0.975, ab−1} · s/√(ab),   (1.6)

where s is the sample standard deviation and t_{0.975, ab−1} is the 97.5th percentile of a t-distribution with ab − 1 degrees of freedom. We also construct the following 95% confidence interval, which should be used for hierarchical models like (1.1):

ȳ·· ± t_{0.975, a−1} · √(MSB/(ab)).   (1.7)

There are two important differences to notice in the above confidence intervals. The first is that we only use the mean square between (MSB) for the margin of error in (1.7), while the sample standard deviation, s, is used in (1.6). As σ²_τ increases we expect the cluster means to become more spread out, leading to a larger MSB, while the sample standard deviation should not change dramatically and should be approximately σ²_τ + σ²_ε. The second difference is the degrees of freedom used for the critical values: ab − 1 degrees of freedom are used for the critical value in (1.6), while only a − 1 degrees of freedom are used in (1.7). We know that as we increase degrees of freedom, critical values of a t-distribution decrease for a fixed confidence level. The disparity in the critical values, t_{0.975, a−1} and t_{0.975, ab−1}, becomes more obvious with fewer clusters. For instance, for a = 3 and b = 2, the resulting critical values are t_{0.975, 2} = 4.303 and t_{0.975, 5} = 2.571. From Table 1.3, we see that the average MOE behaves differently for the confidence intervals.

If σ²_τ = 0 and both a and b are small, confidence intervals computed using (1.7) are much wider, on average, than confidence intervals constructed using (1.6); surprisingly, confidence interval (1.6) has a higher coverage probability. However, if either a or b is reasonably large, then the discrepancy between the MOEs is less noticeable for σ²_τ = 0. When σ²_τ = 0.95, the difference in the MOE is due to the critical value, since s² and MSB should be relatively close to 1. We observe that confidence intervals constructed via (1.6) become narrower as σ²_τ increases, resulting in low coverage probability. On the other hand, confidence intervals constructed using (1.7) become wider as σ²_τ increases and maintain coverage probabilities that are always approximately 0.95. This illustrates the value in using tools that take the sampling scheme into account as opposed to blindly using formulas assuming data are i.i.d. We will focus on developing methods tailored to hierarchical linear models and we will be making comparisons to methods that assume that data are a random sample.
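The coverage comparison behind Table 1.3 can be reproduced in outline with a short function like the one below. The single design point shown and the number of simulated datasets (nsim = 2000) are assumptions for illustration rather than the exact settings used for the table.

```r
# Sketch: Monte Carlo coverage of intervals (1.6) and (1.7) when mu = 0
coverage <- function(a, b, sigma2_tau, nsim = 2000, level = 0.95) {
  alpha <- 1 - level
  hits  <- matrix(FALSE, nsim, 2)
  for (s in seq_len(nsim)) {
    y <- rep(rnorm(a, 0, sqrt(sigma2_tau)), each = b) +
         rnorm(a * b, 0, sqrt(1 - sigma2_tau))
    cl  <- rep(seq_len(a), each = b)
    MSB <- b * sum((tapply(y, cl, mean) - mean(y))^2) / (a - 1)
    hw1 <- qt(1 - alpha / 2, a * b - 1) * sd(y) / sqrt(a * b)   # half-width of (1.6)
    hw2 <- qt(1 - alpha / 2, a - 1) * sqrt(MSB / (a * b))       # half-width of (1.7)
    hits[s, ] <- c(abs(mean(y)) <= hw1, abs(mean(y)) <= hw2)
  }
  colMeans(hits)   # estimated coverage of (1.6) and (1.7)
}
coverage(a = 40, b = 10, sigma2_tau = 0.5)   # compare with the a = 40, b = 10 rows of Table 1.3
```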

While estimating the overall population mean is a goal of many studies, other population quantities or curves may also be of interest. For instance, in the study of manufactured goods it may be important to estimate the pth percentile,

Y_p = inf{y : F(y) ≥ p}.

There are many approaches to estimating percentiles, which will be discussed in Chapter 4. Chapters 2 and 3 will focus on applying kernel density estimation to estimating the population density, f(y) = dF(y)/dy, for both i.i.d. data (Chapter 2) and hierarchical data (Chapter 3).

a  | b   | σ²_τ | Coverage Probability (1.6, df = ab − 1) | Average HW (1.6) | Coverage Probability (1.7, df = a − 1) | Average HW (1.7)
3  | 2   | 0    | 0.949 | 0.9834 | 0.934 | 1.5177
3  | 2   | 0.1  | 0.943 | 0.9771 | 0.952 | 1.6320
3  | 2   | 0.25 | 0.922 | 0.9848 | 0.934 | 1.7528
3  | 2   | 0.5  | 0.890 | 0.9324 | 0.946 | 1.8836
3  | 2   | 0.75 | 0.828 | 0.8843 | 0.962 | 2.0352
3  | 2   | 0.95 | 0.779 | 0.8533 | 0.950 | 2.1866
3  | 5   | 0    | 0.942 | 0.5425 | 0.946 | 0.9686
3  | 5   | 0.1  | 0.920 | 0.5349 | 0.949 | 1.1280
3  | 5   | 0.25 | 0.856 | 0.5189 | 0.958 | 1.3584
3  | 5   | 0.5  | 0.724 | 0.4908 | 0.950 | 1.6664
3  | 5   | 0.75 | 0.572 | 0.4544 | 0.939 | 1.9041
3  | 5   | 0.95 | 0.531 | 0.4316 | 0.952 | 2.1779
15 | 10  | 0    | 0.953 | 0.1609 | 0.947 | 0.1719
15 | 10  | 0.1  | 0.842 | 0.1603 | 0.956 | 0.2358
15 | 10  | 0.25 | 0.724 | 0.1601 | 0.961 | 0.3095
15 | 10  | 0.5  | 0.592 | 0.1585 | 0.953 | 0.4047
15 | 10  | 0.75 | 0.494 | 0.1563 | 0.946 | 0.4795
15 | 10  | 0.95 | 0.469 | 0.1544 | 0.956 | 0.5324
4  | 100 | 0    | 0.937 | 0.0982 | 0.949 | 0.1454
4  | 100 | 0.1  | 0.464 | 0.0969 | 0.948 | 0.4777
4  | 100 | 0.25 | 0.306 | 0.0953 | 0.948 | 0.7602
4  | 100 | 0.5  | 0.205 | 0.0910 | 0.955 | 1.0531
4  | 100 | 0.75 | 0.162 | 0.0851 | 0.947 | 1.2571
4  | 100 | 0.95 | 0.120 | 0.0795 | 0.950 | 1.4152
7  | 2   | 0    | 0.962 | 0.5714 | 0.954 | 0.6407
7  | 2   | 0.1  | 0.945 | 0.5635 | 0.945 | 0.6545
7  | 2   | 0.25 | 0.923 | 0.5633 | 0.948 | 0.7093
7  | 2   | 0.5  | 0.893 | 0.5550 | 0.964 | 0.7724
7  | 2   | 0.75 | 0.841 | 0.5511 | 0.944 | 0.8430
7  | 2   | 0.95 | 0.822 | 0.5352 | 0.949 | 0.8775
35 | 25  | 0    | 0.954 | 0.0664 | 0.951 | 0.0684
35 | 25  | 0.1  | 0.694 | 0.0663 | 0.953 | 0.1263
35 | 25  | 0.25 | 0.506 | 0.0660 | 0.941 | 0.1800
35 | 25  | 0.5  | 0.404 | 0.0657 | 0.955 | 0.2456
35 | 25  | 0.75 | 0.365 | 0.0651 | 0.953 | 0.2958
35 | 25  | 0.95 | 0.311 | 0.0652 | 0.949 | 0.3339
40 | 10  | 0    | 0.951 | 0.0982 | 0.954 | 0.1013
40 | 10  | 0.1  | 0.860 | 0.0981 | 0.951 | 0.1382
40 | 10  | 0.25 | 0.732 | 0.0978 | 0.941 | 0.1808
40 | 10  | 0.5  | 0.585 | 0.0973 | 0.947 | 0.2350
40 | 10  | 0.75 | 0.528 | 0.0968 | 0.934 | 0.2789
40 | 10  | 0.95 | 0.471 | 0.0962 | 0.960 | 0.3089

Table 1.3: Comparison of coverage probability and half-width (HW) of different confidence intervals. Coverage probability refers to the percentage of simulations that µ = 0 is included in confidence intervals (1.6) and (1.7).

Chapter 2

Kernel Smoothing for i.i.d. Data

We seek an estimate for the marginal pdf for an observation from a continuous population using kernel density estimation. This chapter will be a brief summary of many important topics in kernel density estimation. Further details can be found in Wand and Jones (1994). Let Y_1, Y_2, . . . , Y_n represent a random sample; then a kernel density estimate of a pdf at a given value of y can be found by

f̂(y) = (1/(nh)) Σ_{i=1}^n K((y − Y_i)/h),   (2.1)

where K(·) is the kernel function, and h is called the bandwidth. To aid in obtaining reasonable kernel estimates, the following assumptions are typically employed.

1. The density f is such that its second derivative f″ is continuous, square integrable and ultimately monotone. A function is ultimately monotone if the function is monotonic over both (−∞, −M] and [M, ∞) for some M > 0.

2. Let h = h_n be a sequence of positive values such that lim_{n→∞} h_n = 0 and lim_{n→∞} n h_n = ∞.

3. The kernel K is a bounded probability density function having finite fourth moment and is symmetric about the origin.

Assumptions 1 and 3 can easily be weakened or even replaced with other conditions for f and K.

Figure 2.1: Comparison between a fourth order kernel and a second order kernel (standard normal pdf).

It has been shown that, if the above assumptions are met, then the choice of the kernel function does not severely impact the efficiency of the kernel density estimate. However, as we will see later, selection of h plays an important role in the quality of a kernel estimate.
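As a concrete illustration of (2.1) with a Gaussian (second order) kernel, the sketch below evaluates the estimate on a grid and compares it with R's density(). The bandwidth is taken from stats::bw.SJ purely for convenience here; plug-in selection is discussed later in this chapter.

```r
# Sketch: kernel density estimate (2.1) by hand versus R's density()
set.seed(3)
y    <- rnorm(200)
h    <- bw.SJ(y)                          # Sheather-Jones bandwidth (stats::bw.SJ)
grid <- seq(-4, 4, length.out = 401)
fhat <- sapply(grid, function(x) mean(dnorm((x - y) / h)) / h)   # equation (2.1)
fhat_density <- density(y, bw = h, kernel = "gaussian",
                        from = -4, to = 4, n = 401)$y
max(abs(fhat - fhat_density))             # small difference only (density() bins the data)
```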

If the above assumptions are used, then µ_2(K) > 0, where µ_s(g) = ∫ y^s g(y) dy, and K is called a second order kernel. Weakening the third assumption, in particular not requiring K to be a probability density, can lead to higher order kernels. K is an rth order kernel if µ_s = 0 for s = 1, 2, . . . , r − 1 and µ_r ≠ 0. We denote the rth order kernel by K_[r]. Jones and Foster (1993) discovered a recursive formula to compute higher order kernels,

K_[r+2](y) = (3/2) K_[r](y) + (1/2) y K′_[r](y).

For example, if K_[2](y) = φ(y), where φ(·) is the standard Gaussian density function, then a fourth order kernel is K_[4](y) = (1/2)(3 − y²) φ(y). Higher order kernels can lead to density and distribution estimates that are difficult, or even impossible, to interpret due to possibly producing density estimates with negative values or a potentially decreasing distribution function estimate.
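A quick numerical check (a sketch, using the standard normal kernel) confirms that the recursion reproduces the closed form for K_[4] given above.

```r
# Sketch: verify K_[4](y) = (3/2)phi(y) + (1/2) y phi'(y) = (1/2)(3 - y^2) phi(y)
y <- seq(-3, 3, by = 0.1)
phi       <- dnorm(y)
phi_prime <- -y * dnorm(y)                  # derivative of the standard normal pdf
K4_recursion <- 1.5 * phi + 0.5 * y * phi_prime
K4_closed    <- 0.5 * (3 - y^2) * phi
max(abs(K4_recursion - K4_closed))          # zero up to rounding error
```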

While the use of a suboptimal second order kernel function may not greatly impact the efficiency of kernel density estimates, the choice of the bandwidth is crucial. Large bandwidths can lead to important features of the data being missed, while small bandwidths can lead to noisy estimates. The trade off between estimates being too smooth or too noisy is known as the variance-bias trade off. We seek a bandwidth that leads to a density estimate that is not too noisy, yet reflects important features. Common bandwidth selection techniques attempt to minimize the asymptotic mean integrated squared error (AMISE), which is minimized by

h_AMISE = { R(K) / (µ_2(K)² R(f″) n) }^{1/5},   (2.2)

where R(g) = ∫ g(y)² dy and µ_2(K) = ∫ y² K(y) dy. Optimizing the bandwidth requires some knowledge of the population, namely R(f″). There are many approaches to bandwidth selection. An example of bandwidth selection is the normal scale rule, which assumes that the population follows a normal distribution. Other examples of bandwidth selection are cross validation, bootstrapping methods, and direct plug-in methods. There are problems associated with all of these methods of bandwidth selection, but we will concentrate on plug-in methods.

A direct plug-in method requires an estimate for R(f^(s)). The relationship R(f^(s)) = ∫ f^(s)(y)² dy = (−1)^s ∫ f^(2s)(y) f(y) dy will be helpful, and can easily be shown using integration by parts. R(f^(s)) can be thought of as an expectation, E[f^(r)(y)] = ψ_r = (−1)^{r/2} ∫ f^(r)(y) f(y) dy. We estimate ψ_r with

ψ̂_r = n⁻¹ Σ_{i=1}^n f̂^(r)(y_i; g) = n⁻² Σ_{i=1}^n Σ_{j=1}^n g^{−(r+1)} L^(r)((y_i − y_j)/g),   (2.3)

where L^(r) is the rth derivative of the kernel L and g is the bandwidth.

To obtain an estimate for ψ_4 = R(f″) in the denominator of (2.2), we set r = 4 in (2.3). A practical concern is the selection of the bandwidth, g. To select an appropriate bandwidth to estimate f̂″(y), we can find the bandwidth that minimizes the AMISE of f″(y), which requires an estimate of the fourth derivative of the population. The following process for finding a bandwidth which attempts to minimize the AMISE has been suggested by Sheather and Jones (1993):

1. Estimate ψ_8 using the normal scale rule estimate ψ̂_8^NS = 105/(32√π σ̂⁹).

2. Estimate ψ_6 using the kernel estimator ψ̂_6(g_1), where

   g_1 = {−2K^(6)(0)/(µ_2(K)² ψ̂_8^NS n)}^{1/9}.

3. Estimate ψ_4 using the kernel estimator ψ̂_4(g_2), where

   g_2 = {−2K^(4)(0)/(µ_2(K)² ψ̂_6(g_1) n)}^{1/7}.

4. The selected bandwidth is

   ĥ_DPI,2 = {R(K)/(µ_2(K)² ψ̂_4(g_2) n)}^{1/5}.

Plug-in methods are attractive because they are completely data driven and do not tend to produce bandwidths that are as volatile as some other bandwidth selection procedures.
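The sketch below mirrors steps 1-4 for the Gaussian kernel K = L = φ, for which µ_2(K) = 1 and R(K) = 1/(2√π). The explicit g^{−(r+1)} scaling in ψ̂_r follows the usual kernel derivative estimator and is an assumption about details not written out above; in practice, stats::bw.SJ(x, method = "dpi") provides a tested implementation of this type of two-stage direct plug-in bandwidth.

```r
# Sketch of the two-stage direct plug-in bandwidth (steps 1-4) with a Gaussian kernel
phi_deriv <- function(u, r) {
  # phi^(r)(u) = He_r(u) * phi(u) for even r (probabilist's Hermite polynomials)
  He <- switch(as.character(r),
               "4" = u^4 - 6 * u^2 + 3,
               "6" = u^6 - 15 * u^4 + 45 * u^2 - 15)
  He * dnorm(u)
}
psi_hat <- function(x, r, g) {                 # kernel estimator (2.3), L = phi
  d <- outer(x, x, "-") / g
  mean(phi_deriv(d, r)) / g^(r + 1)
}
h_dpi2 <- function(x) {
  n <- length(x); s <- sd(x)
  psi8_ns <- 105 / (32 * sqrt(pi) * s^9)                          # step 1
  g1 <- (-2 * phi_deriv(0, 6) / (psi8_ns * n))^(1 / 9)            # step 2
  g2 <- (-2 * phi_deriv(0, 4) / (psi_hat(x, 6, g1) * n))^(1 / 7)  # step 3
  (1 / (2 * sqrt(pi)) / (psi_hat(x, 4, g2) * n))^(1 / 5)          # step 4
}
set.seed(4)
x <- rnorm(300)
c(sketch = h_dpi2(x), bw.SJ = bw.SJ(x, method = "dpi"))           # similar values
```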

In the following chapter, we will discuss kernel density estimation for more complicated sampling techniques, in particular a hierarchical sample.

Chapter 3

Kernel Smoothing for Hierarchical Data

In this chapter, we will attempt to estimate the marginal density of an observation obtained from a multistage sampling scheme via kernel smoothing. There has been minimal work done regarding smoothing clustered data. Breunig (2001) proposed a method that requires the use of fourth order kernels. Higher order kernels were explored to avoid complicated polynomial expressions that resulted from including more terms in the Taylor expansion to approximate the AMISE.

Higher order kernels are not required to be density functions, thus the resulting density estimate is not guaranteed to be a density. This can make kernel density estimates constructed from higher order kernels difficult to interpret. To account for general hierarchical data, Breunig proposed the following kernel density estimator at a given y,

f̂(y) = (1/(Nh)) Σ_{i=1}^a Σ_{j=1}^{n_i} K((y − Y_ij)/h),   where N = Σ_{i=1}^a n_i.   (3.1)

If a higher order kernel is used, the bandwidth that minimizes the AMISE is very difficult to estimate. A data driven bandwidth would involve bivariate kernel density estimation, which requires selecting an appropriate bandwidth matrix. Breunig did not address this issue; rather, he simply assumed that the population follows a normal distribution and found an optimal bandwidth with a normal scale rule. We seek an alternative approach that does not require such severe assumptions on the population distribution.

There are several assumptions that are typically made when working with clustered data.

Observations from the same cluster are identically distributed but not independent, while observations from different clusters are i.i.d. If there were only one observation from each cluster, then we would have a random sample of size a. An estimate of the population pdf could be constructed by appealing to the classic kernel density estimate approaches outlined in Chapter 2. However, multistage samples contain multiple observations from each cluster. Any combination of a observations that are all from different clusters can be viewed as an i.i.d. sample and can be used to construct a density estimate. We present two different kernel density estimators that combine kernel density estimates and are based upon i.i.d. samples. The first estimator, which can be used for any hierarchical dataset, is expressed as

f̂_R(y) = Σ_{j=1}^Q w_j f̂*_j(y),   (3.2)

where

f̂*_j(y) = (1/(ah)) Σ_{i=1}^a K((y − Y*_ij)/h),   (3.3)

where Y*_ij is randomly selected from Y_i = {Y_i1, Y_i2, . . . , Y_in_i} with replacement and each element of Y_i is equally likely to be selected. We also assume that each w_j is non-negative and Σ_j w_j = 1. Estimators of this form have been studied in Hoffman (2001) to implicitly account for the correlation between observations in the same cluster. The second estimator selects each observation without replacement, which is only appropriate for balanced datasets, where n_i = b for all i, in order to ensure that each cluster is represented in each i.i.d. sample. We express the second estimator as

f̂_C(y) = Σ_{j=1}^b w_j f̂_j(y),   (3.4)

where

f̂_j(y) = (1/(ah)) Σ_{i=1}^a K((y − Y_ij)/h),   (3.5)

where Y_ij is the jth observed value from the ith cluster, for j = 1, . . . , n_i, and w_j ≥ 0 and Σ_j w_j = 1. Hence, we can construct b density estimates so that each observation is used only once, and all observations will be used. Next, we will focus on approximating the MISE of (3.2).
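A minimal R sketch of the two estimators, with equal weights w_j = 1/Q (shown later in this chapter to be optimal) and a Gaussian kernel, might look as follows. The bandwidth h is taken as given here, and the function and argument names are illustrative only.

```r
# Sketch: resampling estimator (3.2)-(3.3) and, for balanced data, the
# one-observation-per-cluster estimator (3.4)-(3.5), at a single point y0
f_hat_R <- function(y0, y, cluster, h, Q = 500) {
  ids <- split(seq_along(y), cluster)
  est <- replicate(Q, {
    pick <- vapply(ids, function(i) i[sample.int(length(i), 1)], integer(1))
    mean(dnorm((y0 - y[pick]) / h)) / h          # one resampled estimate (3.3)
  })
  mean(est)                                      # equal weights w_j = 1/Q
}
f_hat_C <- function(y0, ymat, h) {
  # ymat: a x b matrix, row i = the b observations from cluster i
  mean(apply(ymat, 2, function(col) mean(dnorm((y0 - col) / h)) / h))
}
```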

12 In the following lemmas, we derive the bias and the variance of individual kernel density estimates based on resampled observations, as well as the covariance between two resampled density estimates. The bias and variance will allow us to compute the AMISE, which in turn will help in obtaining an appropriate bandwidth for kernel density estimator (3.2). Proofs are provided in

Appendix C.

Lemma 3.1 If we consider f̂*_j(y), in (3.3), as an estimator for f(y), then the bias of f̂*_j(y) is

E[f̂*_j(y) − f(y)] = h²µ_2(K)f″(y)/2 + o(h²)

and f̂*_j(y) has variance

Var(f̂*_j(y)) = R(K)f(y)/(ah) + o((ah)⁻¹).

Additionally, if h → 0 and ah → ∞, then f̂*_j(y) is a consistent estimator of f(y).

Lemma 3.2 The covariance between two resampled density estimators, in (3.3), is

Cov(f̂*_j(y), f̂*_j′(y)) = (ah)⁻¹ Σ_{i=1}^a { R(K)f(y)/(an_i) + (n_i − 1)σ²_τ K′((y − µ)/h)²/(an_i h³) } + O(a⁻¹)

and approaches 0 as ah³ → ∞.

We can use Lemma 3.1 and Lemma 3.2 to write an expression for the variance of (3.2), which is

 Q  ˆ X ˆ∗ Var(fR) = Var  wjfj (y) j=1 Q Q X 2 ˆ∗ X X  ˆ∗ ˆ∗  = wj Var(fj (y)) + wjwj0 Cov fj (y), fj0 (y) j=1 j=1 j06=j Q Q a ( 2  2) X 2 R(K)f(y) X X wjwj0 X R(K)f(y) (ni − 1)στ 0 y − µ −1 ≈ wj + + 3 K + o((ah) ) ah ah ani anih h j=1 j=1 j06=j i=1 Q Q a ( 2  2) X R(K)f(y) X wj(1 − wj) X R(K)f(y) (ni − 1)σ y − µ = w2 + + τ K0 + o((ah)−1). j ah ah an an h3 h j=1 j=1 i=1 i i

Hence, the asymptotic MSE is

AMSE(f̂_R(y)) = lim_{a→∞} E[(f̂_R(y) − f(y))²]
= lim_{a→∞} E[f̂_R(y) − f(y)]² + lim_{a→∞} Var(f̂_R(y))
= h⁴µ_2(K)²f″(y)²/4 + Σ_{j=1}^Q w_j² R(K)f(y)/(ah)
  + Σ_{j=1}^Q (w_j(1 − w_j)/(ah)) Σ_{i=1}^a { R(K)f(y)/(an_i) + (n_i − 1)σ²_τ K′((y − µ)/h)²/(an_i h³) }.

The AMISE is obtained by integrating with respect to y, which yields

AMISE(f̂_R) = ∫ AMSE(f̂_R(y)) dy
= h⁴µ_2(K)²R(f″)/4 + Σ_{j=1}^Q w_j² R(K)/(ah)
  + Σ_{j=1}^Q (w_j(1 − w_j)/(ah)) Σ_{i=1}^a { R(K)/(an_i) + (n_i − 1)σ²_τ R(K′)/(an_i h²) }.   (3.6)

Note that ∫ K′((y − µ)/h)² dy = h ∫ K′(u)² du = h R(K′), by letting u = (y − µ)/h and du = h⁻¹ dy.

To find the optimal weights for (3.2), we seek a set of weights that minimizes the AMISE subject to Σ_j w_j = 1. We will use a Lagrangian multiplier, which produces the following function, where λ is a constant that is not restricted in sign:

g(y) = λ(1 − Σ_{j=1}^Q w_j) + h⁴µ_2(K)²R(f″)/4 + Σ_{j=1}^Q w_j² R(K)/(ah)
       + Σ_{j=1}^Q (w_j(1 − w_j)/(ah)) Σ_{i=1}^a { R(K)/(an_i) + (n_i − 1)σ²_τ R(K′)/(an_i h²) }.

To find the optimal weights, we take the partial derivative of g(y) with respect to each w_j and set it equal to 0. Hence, for all j,

∂g(y)/∂w_j = −λ + 2w_j R(K)/(ah) + Σ_{i=1}^a (1 − 2w_j) { R(K)/(a²n_i h) + (n_i − 1)σ²_τ R(K′)/(a²n_i h²) } = 0.

This leads to the following system of equations,

λ = 2w_1 R(K)/(ah) + Σ_{i=1}^a (1 − 2w_1) { R(K)/(a²n_i h) + (n_i − 1)σ²_τ R(K′)/(a²n_i h²) }
λ = 2w_2 R(K)/(ah) + Σ_{i=1}^a (1 − 2w_2) { R(K)/(a²n_i h) + (n_i − 1)σ²_τ R(K′)/(a²n_i h²) }
⋮
λ = 2w_Q R(K)/(ah) + Σ_{i=1}^a (1 − 2w_Q) { R(K)/(a²n_i h) + (n_i − 1)σ²_τ R(K′)/(a²n_i h²) }.

This system can only be satisfied if all the weights are equal. For this to be the case, the optimal weights are w_j = Q⁻¹ for all j. If all weights are equal to Q⁻¹, then the AMISE becomes

AMISE(f̂_R(y)) = h⁴µ_2(K)²R(f″)/4 + R(K)/(ahQ)
                + ((Q − 1)/(ahQ)) Σ_{i=1}^a { R(K)/(an_i) + (n_i − 1)σ²_τ R(K′)/(an_i h²) }.   (3.7)

To compute the optimal bandwidth, differentiate the AMISE with respect to h and set the derivative equal to 0, yielding

∂AMISE(f̂_R(y))/∂h = h³µ_2(K)²R(f″) − R(K)/(ah²Q)
                    − ((Q − 1)/(aQ)) Σ_{i=1}^a { R(K)/(an_i h²) + 3(n_i − 1)σ²_τ R(K′)/(an_i h⁴) } = 0.   (3.8)

If we consider a balanced data set, where n_i = b for all i, then (3.8) becomes

∂AMISE(f̂_R(y))/∂h = h³µ_2(K)²R(f″) − R(K)/(ah²Q) − ((Q − 1)/Q) { R(K)/(abh²) + 3(b − 1)σ²_τ R(K′)/(abh⁴) } = 0.   (3.9)

Finding the roots of equation (3.9) is equivalent to finding the roots of the following seventh order polynomial in h:

µ_2(K)²R(f″)h⁷ − (b + (Q − 1)) R(K)h²/(abQ) − 3(b − 1)(Q − 1)σ²_τ R(K′)/(abQ) = 0.   (3.10)

The solution to (3.10) when σ²_τ = 0 is

h = { (b + (Q − 1))R(K) / (µ_2(K)²R(f″)abQ) }^{1/5}.

Additionally, in the case that σ²_τ = 0, an infeasible solution to (3.10) is h = 0. Note that h is different from the bandwidth that minimizes the AMISE for i.i.d. data (2.2), but the bandwidth remains O((ab)^{−1/5}). For σ²_τ > 0, we are guaranteed at least one zero for h > 0. When h = 0, the left hand side of (3.10) is negative, and eventually the leading term in the polynomial will dominate as h increases, forcing the value of the left hand side of (3.10) to be positive. Using the Intermediate Value Theorem, there must be at least one positive solution to the seventh order polynomial. To determine if the estimate of the AMISE is a convex function of h, the second derivative of the AMISE with respect to h is

∂²AMISE(f̂_R(y))/∂h² = 3h²µ_2(K)²R(f″) + 2(b + (Q − 1))R(K)/(abh³) + 12(b − 1)σ²_τ R(K′)/(abh⁵).

Notice that every term in the above equation is positive, hence the AMISE is a convex function for h > 0. A solution to (3.10) is the global minimum for h > 0. As the intra-class correlation increases, the positive root of (3.10) increases. For an example of the behavior of the roots of (3.10), we assume both τ_i and ε_ij follow a normal distribution; values of the roots are given in Table 3.1 below for selected values of σ²_τ, with a = 40, b = 10, and Q = 500. Additionally, in Figure 3.1 we see plots of (3.10) for various values of σ²_τ, under the assumption that τ_i ∼ N(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ).

σ²_τ | Root
0    | 0.3202
0.1  | 0.4724
0.25 | 0.5329
0.5  | 0.5857
0.75 | 0.6196
0.95 | 0.6403

Table 3.1: Solutions to (3.10) for various values of σ²_τ, for datasets with a = 40 and b = 10, where τ_i ∼ N(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ). Additionally, we consider 500 resamples.
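The roots in Table 3.1 can be recovered numerically with base R's uniroot. The sketch below assumes a Gaussian kernel, for which R(K) = 1/(2√π), R(K′) = 1/(4√π), and µ_2(K) = 1, and takes R(f″) to be the standard normal value 3/(8√π), matching the normal/normal setting of the table.

```r
# Sketch: numerically solve the seventh order polynomial (3.10)
root_310 <- function(sigma2_tau, a = 40, b = 10, Q = 500,
                     Rf2 = 3 / (8 * sqrt(pi))) {     # assumed R(f'') for a standard normal marginal
  RK  <- 1 / (2 * sqrt(pi))                          # R(K) for the Gaussian kernel
  RK1 <- 1 / (4 * sqrt(pi))                          # R(K') for the Gaussian kernel
  lhs <- function(h) {
    Rf2 * h^7 - (b + (Q - 1)) * RK * h^2 / (a * b * Q) -
      3 * (b - 1) * (Q - 1) * sigma2_tau * RK1 / (a * b * Q)
  }
  uniroot(lhs, c(1e-4, 2))$root
}
sapply(c(0, 0.1, 0.25, 0.5, 0.75, 0.95), root_310)   # compare with Table 3.1
```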

[Figure 3.1 shows six panels plotting the seventh order polynomial in (3.10) as a function of h, one panel for each of σ²_τ = 0, 0.1, 0.25, 0.5, 0.75, and 0.95.]

Figure 3.1: Plots of (3.10), as a function of h, for various values of σ²_τ, for datasets with a = 40 and b = 10, and τ_i ∼ N(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ). Additionally, we consider 500 resamples.

An additional property of interest may be the asymptotic distribution of f̂_R(y). The next theorem will show that f̂_R(y) can be asymptotically normal.

Theorem 3.1 Assume that h → 0, (ah)^{1/2}h² → 0, and ah³ → ∞, and let the resampled kernel density estimate be defined as

f̂_R(y) = (1/Q) Σ_{j=1}^Q f̂*_j(y),

where

f̂*_j(y) = (1/(ah)) Σ_{i=1}^a K((y − Y*_ij)/h)

for very large Q. If (ah)⁻¹ → ∞ and h → 0 as a → ∞, then

(ah)^{1/2} (f̂_R(y) − f(y)) → N(0, σ²),   (3.11)

where

σ² = R(K)f(y)/Q + ((Q − 1)/Q) Σ_{i=1}^a { R(K)f(y)/(an_i) + (n_i − 1)σ²_τ K′((y − µ)/h)²/(an_i h³) }   (3.12)

as a → ∞.

The assumption (ah)^{1/2}h² → 0 ensures that the squared bias approaches 0 faster than the variance approaches 0. We also need to assume that ah³ → ∞ to ensure that the asymptotic variance is finite. Theorem 3.1 holds when h ∝ a^{−β}, where β ∈ (1/5, 1/3). If the usual assumptions, namely (ah)⁻¹ → ∞, are made, then the squared bias and variance converge to 0 at the same rate, causing the mean of (ah)^{1/2}(f̂_R(y) − f(y)) to depend on a.

Now we focus on bandwidth selection for estimator (3.2). Since f̂*_j(y), for j = 1, 2, . . . , Q, is based on a random sample, we can use Sheather and Jones' (1991) method to find an optimal bandwidth for each density estimate; these bandwidths will be denoted by h*_j. Each of the bandwidths is independent of the amount of intra-class correlation and contains information from each cluster, making each one a potential candidate for the bandwidth. We could then average these Q bandwidths,

h_R = (1/Q) Σ_{j=1}^Q h*_j,   (3.13)

where h*_j is the bandwidth using the jth resample, and use the result as the bandwidth in (3.2). Another bandwidth selection method involves solving each bandwidth computed via Sheather and Jones' (1991) method for R(f″) in (2.2),

R(f″) = R(K)/(h⁵ µ_2(K)² a),   (3.14)

providing Q estimates for R(f″), which can be averaged, then substituted into (2.2), producing

h_{R(f″)} = { R(K)/(µ_2(K)² R_R(f″) aQ) }^{1/5},   (3.15)

where

R_R(f″) = (1/Q) Σ_{j=1}^Q R(K)/((h*_j)⁵ µ_2(K)² a).   (3.16)

We average each estimate of R(f″) because each contains independent observations and the same amount of information about their respective cluster.
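A sketch of the resampling-based selectors (3.13) and (3.15)-(3.16) for a Gaussian kernel is given below. bw.SJ from the stats package supplies the per-resample Sheather-Jones bandwidths; the function name and defaults are illustrative only.

```r
# Sketch: averaged bandwidth (3.13) and averaged R(f'') bandwidth (3.15)-(3.16)
resample_bandwidths <- function(y, cluster, Q = 500) {
  ids <- split(seq_along(y), cluster)
  a   <- length(ids)
  hs  <- replicate(Q, {
    pick <- vapply(ids, function(i) i[sample.int(length(i), 1)], integer(1))
    bw.SJ(y[pick], method = "dpi")                 # SJ bandwidth for one resample
  })
  RK  <- 1 / (2 * sqrt(pi))                        # Gaussian kernel: R(K), mu_2(K) = 1
  Rf2 <- mean(RK / (hs^5 * a))                     # averaged estimate of R(f''), (3.16)
  c(h_R   = mean(hs),                              # bandwidth (3.13)
    h_Rf2 = (RK / (Rf2 * a * Q))^(1 / 5))          # bandwidth (3.15)
}
```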

If a balanced hierarchical dataset is being analyzed, it may not be necessary to use observations more than once. This would avoid the use of resampling, which significantly reduces the computational burden. Another advantage of estimator (3.4) is that the variance will be slightly less complicated compared to the variance of f̂_R(y). The AMISE for f̂_C(y) is

AMISE(f̂_C(y)) = h⁴µ_2(K)²R(f″)/4 + R(K)/(abh) + (b − 1)σ²_τ R(K′)/(abh²).   (3.17)

The bandwidth that minimizes (3.17) is the solution to the following seventh order polynomial in h:

µ_2(K)²R(f″)h⁷ − R(K)h²/(ab) − 3(b − 1)σ²_τ R(K′)/(ab) = 0.   (3.18)

Like polynomial (3.10), polynomial (3.18) only has one positive root. Listed below in Table 3.2 are the roots of (3.18) for several values of intra-cluster correlation.

σ²_τ | Root
0    | 0.2542
0.1  | 0.3987
0.25 | 0.4510
0.5  | 0.4964
0.75 | 0.5253
0.95 | 0.5430

Table 3.2: Solutions to (3.18) for various values of σ²_τ, for datasets with a = 40 and b = 10, and τ_i ∼ N(0, σ²_τ) and ε_ij ∼ N(0, 1 − σ²_τ).

The solutions to (3.18) tend to be smaller than the solutions to (3.10). We will also explore bandwidth selection techniques analogous to those proposed for f̂_R(y). We will average b bandwidths, which are found using Sheather and Jones' plug-in method. We also estimate R(f″) based on an average of b estimates of R(f″),

h_{R(f″)} = { R(K)/(µ_2(K)² R_C(f″) ab) }^{1/5},   (3.19)

where

R_C(f″) = (1/b) Σ_{j=1}^b R(K)/((h*_j)⁵ µ_2(K)² a).   (3.20)

Next, we conduct simulations to compare kernel density estimates (3.4) and (3.2) by comparing MISE, using many of the bandwidth selection methods outlined above.

Simulations

A variety of simulations were conducted to study the behavior of several methods of bandwidth selection. Both τ_i and ε_ij will be randomly generated from normal, double exponential (heavy-tailed and symmetric), and log normal (skewed) distributions. A brief description of both the log normal and double exponential distributions, as well as the parameterizations that were used, is included in Appendix A. There are nine possible combinations of distributions for τ_i and ε_ij, and the cases we considered are displayed in Table 3.3. Additionally, six values of intra-class correlation were studied, ranging from 0 (i.i.d. data) to 0.95 (extremely correlated data), for each combination of distributions that was considered.

Distribution of ε_ij \ Distribution of τ_i | Normal | Log Normal | Double Exponential
Normal             | x | x | x
Log Normal         | x |   |
Double Exponential | x |   |

Table 3.3: List of all possible combinations for distributions of τ_i and ε_ij, where x denotes the combinations that were considered.

In addition to studying the effect of different distributions, we will consider both balanced and unbalanced data sets. For balanced data, each simulation will be conducted with a sample size of 400, consisting of 40 clusters with 10 observations each. For unbalanced data, two separate simulation studies were conducted. Each study had the same sample size and number of clusters for each simulated dataset. The first configuration has 62 clusters with a sample size of 672, and the second configuration has 27 clusters with a sample size of 285.
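The data-generating step can be sketched as below. The exact log normal and double exponential parameterizations are given in Appendix A and are not reproduced here; as a stand-in assumption, each component is simply centered and scaled to the requested variance.

```r
# Sketch: generate hierarchical data from model (1.1) with a chosen cluster-effect
# and error distribution, each standardized to mean 0 and the requested variance
gen_component <- function(n, dist = c("normal", "lognormal", "doubleexp"), v) {
  dist <- match.arg(dist)
  z <- switch(dist,
              normal    = rnorm(n),
              lognormal = (rlnorm(n) - exp(1 / 2)) / sqrt(exp(2) - exp(1)),
              doubleexp = ifelse(runif(n) < 0.5, 1, -1) * rexp(n) / sqrt(2))
  z * sqrt(v)                                   # mean 0, variance v
}
gen_hier <- function(n_i, sigma2_tau, tau_dist = "normal", eps_dist = "normal") {
  a   <- length(n_i)                            # n_i: vector of cluster sizes
  tau <- gen_component(a, tau_dist, sigma2_tau)
  eps <- gen_component(sum(n_i), eps_dist, 1 - sigma2_tau)
  data.frame(cluster = rep(seq_len(a), n_i), y = rep(tau, n_i) + eps)
}
dat <- gen_hier(rep(10, 40), sigma2_tau = 0.5, eps_dist = "lognormal")
```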

We will be comparing seven methods of bandwidth selection. We observe which method produces the smallest MISE and study the behavior of the bandwidth as the intra-class correlation changes. We implement Sheather and Jones' direct plug-in method (SJ), and averaging multiple bandwidths computed with only one observation from each cluster, both without resampling (ĥ_C), appropriate for balanced data, and with resampling (ĥ_R). We can also obtain an estimate for R(f″) by averaging estimates of R(f″) based on one observation from each cluster. We introduced two schemes to estimate R(f″): (3.20) uses each observation once (R̂(f″)_C), which relies on balanced data, and (3.16) uses observations more than once (R̂(f″)_R). Lastly, we use numerical methods to find a solution to (3.10) and (3.18); this solution corresponds to the bandwidth that minimizes our estimate for MISE.

Numerical methods are required to find solutions to equations (3.10) and (3.18). The rootSolve package in R was used to find these solutions. If we examine the following Taylor expansion centered about µ,

K((y − Y_ij)/h) = Σ_{l=0}^∞ (1/h^l) K^(l)((y − µ)/h) (Y_ij − µ)^l,

where K^(m) indicates the mth derivative of the kernel function with respect to Y_ij, we typically consider terms that are o(h⁴) or o((nh)⁻¹) ignorable; but each term with a higher order derivative could be included in the Taylor approximation of the covariance of two dependent kernel density estimates. Additionally, the joint behavior of the estimates is not characterized by this approximation. These methods perform best for extreme levels of correlation (σ²_τ large), but can yield unexpected results for some of our simulations.

When both τ_i and ε_ij are normally distributed (results in Tables 3.4, 3.9, and 3.14), the bandwidths selected via ĥ_C, ĥ_R, R̂(f″)_C, and R̂(f″)_R tend to remain similar as σ²_τ increases. Regardless of the value of σ²_τ, each observation marginally follows the standard normal distribution. There is not a large difference in MISE for ĥ_C and ĥ_R, suggesting that analysis of a balanced dataset may not require resampling, which will significantly cut down the computational burden. For low values of σ²_τ, there is not an advantage to selecting bandwidths using ĥ_C; however, when σ²_τ exceeds 0.5, ĥ_C tends to produce a better bandwidth than Sheather and Jones' direct plug-in method. For σ²_τ = 0.95, finding the roots of polynomials (3.10) and (3.18) tends to produce bandwidths leading to the smallest MISE. We also notice that estimating R(f″) via resampling provides a result that is worse than estimating R(f″) only using each observation once.

If the distribution of ε_ij deviates from normal and τ_i follows a normal distribution, results are displayed in Tables 3.5, 3.10, and 3.15 for log normal errors, and results are shown in Tables 3.6, 3.11, and 3.16 for double exponential errors. Simulations show that ĥ_C and ĥ_R tend to provide a better kernel density estimate, compared to those that employ Sheather and Jones bandwidth selection, with intra-class correlation greater than 0.5. Finding the solutions of the seventh order polynomials only works well, compared to all other methods, for extreme correlation when the errors follow a log normal distribution. However, when the errors follow a double exponential distribution, solving the seventh order polynomial appears to be beneficial with lower correlation. As the marginal distributions become closer to a normal distribution, bandwidths also tend to increase. The bandwidths increase in σ²_τ because R(f″) decreases as the marginal distribution becomes closer to a standard normal density. Population densities that are not symmetric (log normal), and whose derivatives have sharp peaks and heavy tails (double exponential), will tend to have large values of R(f″).

In the cases where ε_ij were normally distributed, results are displayed in Tables 3.7, 3.12, and 3.17 for the log normal cluster effect, and results are displayed in Tables 3.8, 3.13, and 3.18 for the double exponential cluster effect. Some of these simulation results indicate that it may not be beneficial to use an average of bandwidths, specifically when the cluster effects follow a log normal distribution. Wand and Jones (1995) discuss that the log normal distribution is difficult to estimate. In this circumstance, ĥ_j (3.5) and ĥ_j* (3.3) use too few observations to obtain a reasonable bandwidth, while Sheather and Jones' method uses each observation. As σ²_τ increases, R̂(f″)_C provides a kernel estimate that does not perform substantially differently from the Sheather and Jones estimate. The bandwidths resulting from solving the seventh order polynomials increase as the correlation increases, which is surprising since R(f″) increases as σ²_τ increases. When the error distribution is double exponential, ĥ_C and ĥ_R tend to perform better than Sheather and Jones' bandwidths for σ²_τ > 0.25. For extreme correlation, it appears that the solutions to a seventh order polynomial provide the best bandwidth. In both of the above cases, bandwidths tend to decrease as the intra-class correlation increases. This is consistent with R(f″) getting larger as the marginal distribution becomes more skewed or heavy tailed.

Here we have studied bandwidth selection methods with the goal of minimizing the MISE of the density estimate across all values of y. We find methods that can estimate the marginal density more accurately than Sheather and Jones' direct plug-in method, namely ĥ_C and ĥ_R, for highly correlated data under several distributional assumptions. Next, we will attempt to estimate percentiles of the marginal distribution of an observation from a hierarchical dataset.

23 Balanced Data

Table 3.4: Simulation results for balanced data with 40 clusters and 1000 observations for each cluster, and τ_i follow a normal distribution and ε_ij are normally distributed.

σ²_τ | SJ     | ĥ_C    | R̂(f″)_C | ĥ_R    | R̂(f″)_R | rootSolve (3.18) | rootSolve (3.10)
0    | 0.0024 | 0.0029 | 0.0027  | 0.0029 | 0.0066  | 0.0023 | 0.0040
0.1  | 0.0025 | 0.0030 | 0.0029  | 0.0030 | 0.0067  | 0.0026 | 0.0027
0.25 | 0.0030 | 0.0034 | 0.0035  | 0.0034 | 0.0075  | 0.0034 | 0.0035
0.5  | 0.0045 | 0.0047 | 0.0050  | 0.0047 | 0.0090  | 0.0051 | 0.0052
0.75 | 0.0074 | 0.0069 | 0.0082  | 0.0069 | 0.0126  | 0.0068 | 0.0067
0.95 | 0.0185 | 0.0113 | 0.0170  | 0.0113 | 0.0261  | 0.0096 | 0.0096
(a) Average MISE for 1000 simulations.

σ²_τ | SJ     | ĥ_C    | R̂(f″)_C | ĥ_R    | R̂(f″)_R | rootSolve (3.18) | rootSolve (3.10)
0    | 0.3090 | 0.4494 | 0.2354  | 0.4491 | 0.1011  | 0.2777 | 0.1994
0.1  | 0.3086 | 0.4489 | 0.2347  | 0.4486 | 0.1011  | 0.3866 | 0.3856
0.25 | 0.3073 | 0.4476 | 0.2332  | 0.4475 | 0.1010  | 0.4405 | 0.4438
0.5  | 0.3064 | 0.4469 | 0.2344  | 0.4473 | 0.1019  | 0.4856 | 0.4908
0.75 | 0.2943 | 0.4483 | 0.2395  | 0.4485 | 0.1041  | 0.5121 | 0.5181
0.95 | 0.2180 | 0.4427 | 0.2493  | 0.4430 | 0.1109  | 0.5294 | 0.5355
(b) Average bandwidth for 1000 simulations.

Table 3.5: Simulation results for balanced data with 40 clusters and 10 observations for each cluster, and τ_i follow a normal distribution and ε_ij have a log normal distribution.

σ²_τ | SJ     | ĥ_C     | R̂(f″)_C | ĥ_R    | R̂(f″)_R | rootSolve (3.18) | rootSolve (3.10)
0    | 0.0101 | 0.02356 | 0.01045 | 0.0236 | 0.0141  | 0.0833 | 0.0404
0.1  | 0.0066 | 0.0087  | 0.0069  | 0.0087 | 0.0129  | 0.0174 | 0.0140
0.25 | 0.0070 | 0.0074  | 0.0074  | 0.0070 | 0.0129  | 0.0102 | 0.0093
0.5  | 0.0071 | 0.0067  | 0.0077  | 0.0068 | 0.0133  | 0.0076 | 0.0074
0.75 | 0.0094 | 0.0079  | 0.010   | 0.0077 | 0.0152  | 0.0075 | 0.0075
0.95 | 0.0181 | 0.0108  | 0.0162  | 0.0116 | 0.0264  | 0.0097 | 0.0097
(a) Average MISE for 1000 simulations.

σ²_τ | SJ     | ĥ_C    | R̂(f″)_C | ĥ_R    | R̂(f″)_R | rootSolve (3.18) | rootSolve (3.10)
0    | 0.1031 | 0.1961 | 0.1085  | 0.1959 | 0.1063  | 0.3372 | 0.17107
0.1  | 0.1717 | 0.2786 | 0.1511  | 0.2784 | 0.1443  | 0.4001 | 0.3594
0.25 | 0.2166 | 0.3417 | 0.1820  | 0.3421 | 0.1741  | 0.4498 | 0.4263
0.5  | 0.2579 | 0.4055 | 0.2154  | 0.4074 | 0.2064  | 0.4972 | 0.4820
0.75 | 0.2715 | 0.4399 | 0.2370  | 0.4400 | 0.2262  | 0.5272 | 0.5168
0.95 | 0.2175 | 0.4468 | 0.2529  | 0.4464 | 0.2460  | 0.5421 | 0.5338
(b) Average bandwidth for 1000 simulations.

Table 3.6: Simulation results for balanced data with 40 clusters and 10 observations for each cluster, and τ_i follow a normal distribution and ε_ij have a double exponential distribution.

σ²_τ | SJ      | ĥ_C    | R̂(f″)_C | ĥ_R    | R̂(f″)_R | rootSolve (3.18) | rootSolve (3.10)
0    | 0.0051  | 0.0087 | 0.0051  | 0.0086 | 0.0099  | 0.0114 | 0.0068
0.1  | 0.0042  | 0.0055 | 0.0045  | 0.0053 | 0.0093  | 0.0020 | 0.0048
0.25 | 0.0046  | 0.0051 | 0.0050  | 0.0050 | 0.0095  | 0.0016 | 0.0032
0.5  | 0.0062  | 0.0059 | 0.0067  | 0.0058 | 0.0113  | 0.0026 | 0.0030
0.75 | 0.0092  | 0.0077 | 0.0078  | 0.0077 | 0.0152  | 0.0044 | 0.0045
0.95 | 0.02081 | 0.0117 | 0.0178  | 0.0116 | 0.0290  | 0.0082 | 0.0082
(a) Average MISE for 1000 simulations.

σ²_τ | SJ     | ĥ_C    | R̂(f″)_C | ĥ_R    | R̂(f″)_R | rootSolve (3.18) | rootSolve (3.10)
0    | 0.1808 | 0.3175 | 0.1573  | 0.3165 | 0.0679  | 0.3747 | 0.1904
0.1  | 0.2238 | 0.3571 | 0.1834  | 0.3559 | 0.0793  | 0.3752 | 0.2253
0.25 | 0.2527 | 0.3910 | 0.2049  | 0.3925 | 0.0885  | 0.3984 | 0.3402
0.5  | 0.2755 | 0.4246 | 0.2252  | 0.4239 | 0.0972  | 0.4525 | 0.4284
0.75 | 0.2723 | 0.4439 | 0.2377  | 0.4421 | 0.1037  | 0.5021 | 0.4889
0.95 | 0.2059 | 0.4464 | 0.2535  | 0.4467 | 0.1137  | 0.5351 | 0.5264
(b) Average bandwidth for 1000 simulations.

Table 3.7: Simulation results for balanced data with 40 clusters and 10 observations for each cluster, and the τ_i follow a log normal distribution and the ε_ij are normally distributed.

σ_τ^2   SJ       ĥ_C      R̂(f'')_C   ĥ_R      R̂(f'')_R   rootSolve (3.18)   rootSolve (3.10)
0       0.0023   0.0029   0.0028     0.0027   0.0064     0.0022             0.0038
0.1     0.0025   0.0030   0.0029     0.0030   0.0068     0.0027             0.0024
0.25    0.0030   0.0035   0.0034     0.0036   0.0075     0.0038             0.0035
0.5     0.0043   0.0049   0.0047     0.0051   0.0094     0.0065             0.0062
0.75    0.0075   0.0083   0.0080     0.0082   0.0126     0.0142             0.0137
0.95    0.0191   0.0206   0.0194     0.0202   0.0277     0.0493             0.0483
(a) Average MISE for 1000 simulations.

σ_τ^2   SJ       ĥ_C      R̂(f'')_C   ĥ_R      R̂(f'')_R   rootSolve (3.18)   rootSolve (3.10)
0       0.3084   0.4483   0.2334     0.4490   0.1012     0.3980             0.2026
0.1     0.3083   0.4475   0.2338     0.4472   0.109      0.4340             0.3840
0.25    0.3014   0.4422   0.2315     0.4400   0.1000     0.4675             0.4394
0.5     0.2722   0.4049   0.2137     0.4058   0.093      0.4925             0.4752
0.75    0.2206   0.3402   0.1829     0.3414   0.080      0.5020             0.4908
0.95    0.1423   0.2472   0.1413     0.2456   0.063      0.4895             0.4824
(b) Average bandwidth for 1000 simulations.

Table 3.8: Simulation results for balanced data with 40 clusters and 10 observations for each cluster, and the τ_i follow a double exponential distribution and the ε_ij are normally distributed.

σ_τ^2   SJ       ĥ_C      R̂(f'')_C   ĥ_R      R̂(f'')_R   rootSolve (3.18)   rootSolve (3.10)
0       0.0022   0.0027   0.0026     0.0027   0.0065     0.0435             0.0435
0.1     0.0025   0.0030   0.0029     0.0031   0.0070     0.0446             0.0445
0.25    0.0030   0.0031   0.0035     0.0035   0.0074     0.0555             0.0554
0.5     0.0043   0.0035   0.0052     0.0050   0.0090     0.0555             0.0554
0.75    0.0103   0.0068   0.0118     0.0077   0.0126     0.0741             0.0740
0.95    0.0327   0.0200   0.03256    0.0161   0.0272     0.1085             0.1085
(a) Average MISE for 1000 simulations.

σ_τ^2   SJ       ĥ_C      R̂(f'')_C   ĥ_R      R̂(f'')_R   rootSolve (3.18)   rootSolve (3.10)
0       0.3089   0.4486   0.2337     0.4472   0.1010     0.6388             0.6351
0.1     0.3078   0.4486   0.2359     0.4477   0.1010     0.6365             0.6327
0.25    0.3017   0.4420   0.2300     0.4421   0.1000     0.6310             0.6276
0.5     0.2872   0.4233   0.2222     0.4250   0.0967     0.6235             0.6204
0.75    0.2554   0.3922   0.2071     0.3924   0.0906     0.6127             0.6101
0.95    0.1850   0.3434   0.1903     0.3398   0.0837     0.6000             0.5932
(b) Average bandwidth for 1000 simulations.

Unbalanced Data: 62 Clusters and 672 total observations

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0015    0.2801   0.0024     0.4224   0.0052          0.1031
0.1     0.0017    0.2798   0.0025     0.4230   0.0054          0.1031
0.25    0.0022    0.2791   0.0028     0.4227   0.0057          0.1031
0.5     0.0036    0.2772   0.0035     0.4222   0.0068          0.1035
0.75    0.0064    0.2630   0.0052     0.4223   0.0092          0.1051
0.95    0.0164    0.1987   0.0082     0.4251   0.0172          0.1117

Table 3.9: Simulation results for unbalanced data with 62 clusters, where the τ_i follow a normal distribution and the ε_ij are normally distributed.

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0071    0.090    0.0190     0.1750   0.0119          0.0460
0.1     0.0053    0.1530   0.0067     0.2551   0.0102          0.0652
0.25    0.0052    0.1924   0.0053     0.3162   0.0095          0.0793
0.5     0.0061    0.2331   0.0051     0.3821   0.0094          0.0953
0.75    0.0078    0.2445   0.0054     0.4143   0.0105          0.1041
0.95    0.016     0.1929   0.0079     0.4217   0.0176          0.1111

Table 3.10: Simulation results for unbalanced data with 62 clusters, where the τ_i follow a normal distribution and the ε_ij are log normally distributed.

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0036    0.1572   0.0066     0.2873   0.0035          0.1456
0.1     0.0030    0.2003   0.0040     0.3297   0.0031          0.1718
0.25    0.0035    0.2281   0.0040     0.3657   0.0037          0.1920
0.5     0.0048    0.2500   0.0046     0.3986   0.0050          0.2115
0.75    0.0074    0.2500   0.0060     0.4164   0.0075          0.2244
0.95    0.0190    0.1804   0.0098     0.4212   0.0149          0.2406

Table 3.11: Simulation results for unbalanced data with 62 clusters, where the τ_i follow a normal distribution and the ε_ij have a double exponential distribution.

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0015    0.2802   0.0022     0.4226   0.0051          0.1029
0.1     0.0018    0.2787   0.0024     0.4213   0.0054          0.1026
0.25    0.0022    0.2745   0.0028     0.4157   0.0056          0.1018
0.5     0.0034    0.2473   0.0039     0.3813   0.0070          0.0941
0.75    0.0058    0.1991   0.0059     0.3170   0.010           0.0800
0.95    0.0164    0.1266   0.01440    0.2239   0.0192          0.0560

Table 3.12: Simulation results for unbalanced data with 62 clusters, where the τ_i follow a log normal distribution and the ε_ij follow a normal distribution.

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0015    0.2800   0.0022     0.4224   0.0051          0.1023
0.1     0.0014    0.2727   0.0024     0.4206   0.0054          0.1024
0.25    0.0017    0.2606   0.0029     0.4169   0.0058          0.1018
0.5     0.0039    0.2348   0.00388    0.3990   0.0070          0.0979
0.75    0.0098    0.1970   0.0058     0.3668   0.0093          0.0908
0.95    0.0297    0.1370   0.0114     0.3109   0.0188          0.0807

Table 3.13: Simulation results for unbalanced data with 62 clusters, where the τ_i follow a double exponential distribution and the ε_ij follow a normal distribution.

Unbalanced Data: 27 Clusters and 285 total observations

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0029    0.3278   0.0041     0.4683   0.01394         0.0972
0.1     0.0035    0.3242   0.0042     0.4664   0.01433         0.0964
0.25    0.0048    0.3263   0.0050     0.4687   0.0148          0.0975
0.5     0.0084    0.3210   0.0066     0.4670   0.0172          0.0980
0.75    0.0144    0.3005   0.0095     0.4682   0.0220          0.1012
0.95    0.0411    0.1966   0.0154     0.4688   0.0391          0.1125

Table 3.14: Simulation results for unbalanced data with 27 clusters, where both the τ_i and the ε_ij follow normal distributions.

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0130    0.1132   0.0302     0.2159   0.0280          0.0493
0.1     0.0107    0.1802   0.0116     0.2955   0.0245          0.0638
0.25    0.0125    0.2225   0.0098     0.3602   0.0238          0.0770
0.5     0.0141    0.2592   0.0094     0.4267   0.0230          0.0919
0.75    0.0192    0.2650   0.0111     0.4642   0.0253          0.1025
0.95    0.0436    0.1926   0.0168     0.4737   0.0409          0.1152

Table 3.15: Simulation results for unbalanced data with 27 clusters, where the τ_i follow a normal distribution and the ε_ij are log normally distributed.

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0071    0.2003   0.0112     0.3409   0.0075          0.0670
0.1     0.0059    0.2442   0.0079     0.3832   0.0184          0.0781
0.25    0.0069    0.2740   0.0072     0.4121   0.0188          0.085
0.5     0.0090    0.2940   0.0080     0.4457   0.0106          0.0942
0.75    0.0150    0.2800   0.0102     0.4614   0.0252          0.1013
0.95    0.0350    0.1980   0.0160     0.4672   0.0440          0.1140

Table 3.16: Simulation results for unbalanced data with 27 clusters, where the τ_i follow a normal distribution and the ε_ij follow a double exponential distribution.

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0029    0.3281   0.0040     0.4687   0.0136          0.0969
0.1     0.0036    0.3265   0.0038     0.4672   0.0142          0.0964
0.25    0.0048    0.3192   0.0050     0.4592   0.0154          0.0951
0.5     0.0080    0.2890   0.0081     0.4262   0.0180          0.0900
0.75    0.0135    0.2342   0.0134     0.3583   0.0239          0.0780
0.95    0.0392    0.1452   0.0345     0.2661   0.0460          0.0649

Table 3.17: Simulation results for unbalanced data with 27 clusters, where the τ_i follow a log normal distribution and the ε_ij follow a normal distribution.

σ_τ^2   MISE SJ   BW SJ    MISE ĥ_R   BW ĥ_R   MISE R̂(f'')_R   BW R̂(f'')_R
0       0.0029    0.3281   0.0039     0.4690   0.01343         0.09715
0.1     0.0035    0.3275   0.0044     0.4682   0.01385         0.0970
0.25    0.0049    0.3219   0.0051     0.4630   0.01510         0.0960
0.5     0.0087    0.3054   0.0076     0.4456   0.01826         0.0935
0.75    0.015     0.2670   0.0111     0.4120   0.0235          0.088
0.95    0.0406    0.1776   0.0214     0.3630   0.0426          0.0853

Table 3.18: Simulation results for unbalanced data with 27 clusters, where the τ_i follow a double exponential distribution and the ε_ij follow a normal distribution.

Chapter 4

Percentile Estimation

A tolerance interval contains at least a specified proportion of the population, 1 − p, with a specified degree of confidence, 1 − α. Lower tolerance intervals are of the form (L(X), ∞), where

L(X) is called the lower tolerance limit. A (1 − p, 1 − α) lower tolerance interval can be expressed mathematically by,

P(F(L(X)) ≤ p) = P(L(X) ≤ F^{-1}(p)) = 1 − α.   (4.1)

Lower tolerance intervals contain the p ∗ 100th population percentile with probability 1 − α. We will focus on lower tolerance limits throughout the rest of this dissertation. Confidence intervals for lower percentiles can be useful for quality control or for establishing grades for manufactured products.

Many researchers have developed methods for constructing tolerance limits that consider hierarchical data with both treatment effects, {τ_i}, and errors, {ε_ij}, following normal distributions. Krishnamoorthy and Mathew (2009) outlined four methods for finding tolerance intervals for hierarchical data. Many estimators for lower tolerance limits are of the form

μ̂ − k √(σ̂_τ^2 + σ̂^2),   (4.2)

where k is a critical value of a t-distribution. Several authors have studied selection of the non-centrality parameter and appropriate degrees of freedom for k. Mee and Owen's (1983) approach uses Satterthwaite's (1946) approximation to obtain an appropriate value for the degrees of freedom of (σ̂_τ^2 + σ̂^2)/(σ_τ^2 + σ^2). Vangel's (1992) estimate adjusts for each observation containing less information when intra-class correlation is present. Mee and Owen's and Vangel's methods do not allow for unbalanced data. Krishnamoorthy and Mathew (2004), and Lin, Liao and Iyer (2008), provide methods for constructing generalized confidence intervals. Both of these methods allow for unbalanced data. All four of these methods assume a normal population. First, we introduce the concept of bootstrapping and then discuss how bootstrapping can aid in finding tolerance limits. Later in this chapter, we will discuss methods of estimating tolerance limits that do not hinge on observations coming from a normal population.

Bootstrapping is a versatile tool for statistical inference, which can be distribution free and is completely data driven. Peter Hall (1993) explains the bootstrap principle as estimating the number of freckles on the outermost doll of a Matryoshka doll using only the inner dolls. He also provides a formal description of the bootstrap principle. The sampling distribution, F_1, contains the values of a statistic for all possible samples from the population, F_0. Typically, there is only one sample available, so we have very limited information regarding the sampling distribution. Resampling methods can be utilized to gain insight on the shape and spread of the sampling distribution. We will denote the original sample as X = {X_1, X_2, ..., X_n} and a resample as X* = {X*_1, X*_2, ..., X*_n}. A resample, X*, is an unordered set of values selected from X with replacement. Elements of X* are independent and identically distributed, and P(X*_i = X_j) = n^{-1} for all j = 1, ..., n. The collection of statistics from all resamples is called the bootstrap distribution, which we will denote by F_2. We use bootstrapping methods under the assumption that the relationship between (F_1, F_2) mimics the relationship between (F_0, F_1). Typically, bootstrap methods are used to get an estimate of the variance of a statistic. However, we will use bootstrap methods as a tool for percentile estimation.
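To make the resampling step concrete, the following is a minimal R sketch (not part of the original analysis) that builds a bootstrap distribution for a sample percentile; the sample x, the number of resamples B, and the percentile p are placeholder choices.

# Minimal bootstrap sketch: approximate the sampling distribution of a
# sample percentile by resampling the observed data with replacement.
set.seed(1)
x <- rnorm(50)          # placeholder sample (X_1, ..., X_n)
B <- 2000               # number of bootstrap resamples
p <- 0.10               # percentile of interest

boot_stat <- replicate(B, {
  xstar <- sample(x, size = length(x), replace = TRUE)   # resample X*
  quantile(xstar, probs = p, names = FALSE)               # statistic of interest
})

# The empirical distribution of boot_stat approximates F_2, which we use
# as a stand-in for the sampling distribution F_1.
hist(boot_stat, main = "Bootstrap distribution of the 10th percentile")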

A general approach to estimating a tolerance limit involves constructing a bootstrap distribution of an order statistic. An order statistic, denoted X_(k), is the kth smallest value in an ordered dataset. An estimate of a tolerance limit is the α ∗ 100th percentile of the bootstrap distribution of X_(k), where k = ⌈p ∗ n⌉. In order to account for hierarchical data, Davidson and Hinkley (1999) describe methods that attempt to mimic the sampling scheme and maintain the within and between cluster variation of the original data. First, define

x̂_i = ȳ_i·, and ẑ_ij = y_ij − ȳ_i·.

Algorithm 1

1. Choose x*_1, ..., x*_a by randomly sampling with replacement from x̂_1, ..., x̂_a.

2. Choose z*_1, ..., z*_{ab} by randomly sampling with replacement from ẑ_1, ..., ẑ_{ab}.

3. Set y*_ij = x*_i + z*_ij for all i = 1, ..., a, j = 1, ..., n_i.

Unfortunately, when this method is employed x*_i has inflated variance, E[a^{-1} SSB] = σ_τ^2 + b^{-1}σ^2, relative to σ_τ^2. Instead of sampling the x̂_i directly, we sample from s x̄ + (1 − s) x*_i, where

(1 − s)^2 = a/(a − 1) − SSW/[M(M − 1) SSB],   (4.3)

SSB = Σ_i (ȳ_i· − ȳ··)^2, SSW = Σ_i Σ_j (y_ij − ȳ_i·)^2, and M = (N − Σ_i n_i^2/N)/(a − 1). The definitions of SSB and SSW do not match Table 1.2, but are consistent with the definitions in Davidson and Hinkley (1999). The right hand side of (4.3) can be negative, which occurs when MSB < MSW. Since the right hand side of (4.3) can be negative, typically when there is little evidence to support that the intra-class correlation is significant, we set s = 1. To ensure that the variance of the {z*_ij} is approximately (1 − n_i^{-1}), we simply sample from ẑ_ij/r, where r = (1 − n_i^{-1})^{-1/2}. Davidson and Hinkley point out that matching the first two moments is important in cases where the parameter of interest is a function of the first two moments. Since percentiles depend heavily on the first two moments of a distribution, we will use these variance-stabilizing adjustments. Davidson and Hinkley (1999) also mention that resampling should work well as long as there are at least ten clusters and the number of observations from each cluster is moderately large.
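The following is a minimal R sketch of Algorithm 1 together with the variance adjustments described above, assuming balanced data stored in an a x b matrix; the function name resample_hier and all object names are illustrative rather than taken from the dissertation, and the s and r adjustments simply follow (4.3) and the rescaling described in the text.

# Minimal sketch of Algorithm 1 with the variance adjustments described above.
# Assumes balanced data stored in an a x b matrix y (rows = clusters).
resample_hier <- function(y) {
  a <- nrow(y); b <- ncol(y); N <- a * b
  xhat <- rowMeans(y)                       # cluster effects  x_i-hat = ybar_i.
  zhat <- as.vector(y - xhat)               # residuals        z_ij-hat = y_ij - ybar_i.

  # shrinkage factor s from equation (4.3)
  SSB <- sum((xhat - mean(y))^2)
  SSW <- sum((y - xhat)^2)
  ni  <- rep(b, a)
  M   <- (N - sum(ni^2) / N) / (a - 1)
  s2  <- a / (a - 1) - SSW / (M * (M - 1) * SSB)   # this is (1 - s)^2
  s   <- if (s2 > 0) 1 - sqrt(s2) else 1           # set s = 1 when (4.3) is negative

  # rescale the residuals by r, as described in the text
  r <- (1 - 1 / b)^(-1/2)

  xstar <- sample(xhat, a, replace = TRUE)
  xstar <- s * mean(xhat) + (1 - s) * xstar         # adjusted cluster effects
  zstar <- sample(zhat / r, N, replace = TRUE)      # adjusted residuals

  # step 3: y*_ij = x*_i + z*_ij
  matrix(rep(xstar, times = b), nrow = a) + matrix(zstar, nrow = a)
}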

Many authors, such as Hesterberg (2014), have pointed out that using a bootstrap distribution of order statistics may not be appropriate for estimation of percentiles. The bootstrap distribution of order statistics is discrete and dependent on the sample. We will explore methods that combine kernel smoothing with bootstrapping to determine if smoothing can help improve percentile estimation. A naive approach is to construct kernel estimates of each bootstrap resample.

However, there is a simpler and less computationally expensive method outlined in Davidson and

Hinkley which constructs bootstrap resamples, then perturbs each selected observation by the random amount hδ_ij, where δ_ij ∼ K(·) and K is a second-order kernel. The variance of an observation generated by this scheme is N^{-1} Σ_i Σ_j (y_ij − ȳ··)^2 + µ_2(K) h^2, where µ_2(K) = ∫ x^2 K(x) dx. Modifications can be made to ensure that the sample variance is N^{-1} Σ_i Σ_j (y_ij − ȳ··)^2, which entails shifting and scaling the observations and using

f̂(y) = (1/(Nhv)) Σ_{i=1}^{a} Σ_{j=1}^{n_i} K( (y − (1 − v)ȳ·· − v y_ij) / (hv) ),   (4.4)

where v = {1 + N h^2 µ_2(K) / Σ_i Σ_j (y_ij − ȳ··)^2}^{-1/2}, to estimate the population pdf. The method resulting from this adjustment will be referred to as the shrunk smoothed bootstrap. Ultimately, each observation obtained from the shrunk smoothed bootstrap is of the form

ỹ*_ij = (1 − v)ȳ* + v y*_ij + h v δ_ij,   (4.5)

where ȳ* is the mean of the bootstrap resample. It is important that the smoothed bootstrap resample has approximately the same variance as the original sample, since percentiles depend heavily on the first two moments. The shrunk smoothed bootstrap does not acknowledge that we collected the data through a multistage sampling scheme.
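A minimal R sketch of the shrunk smoothed bootstrap in (4.5) is given below, assuming a Gaussian kernel (so µ_2(K) = 1); the use of bw.SJ for the bandwidth and the function name smooth_boot are illustrative choices, not prescriptions from the text.

# Shrunk smoothed bootstrap, following the form of (4.5) with a Gaussian
# kernel.  y is a vector holding all N observations.
smooth_boot <- function(y, h) {
  N     <- length(y)
  ybar  <- mean(y)
  s2    <- sum((y - ybar)^2) / N              # N^{-1} sum (y_ij - ybar..)^2
  v     <- 1 / sqrt(1 + h^2 / s2)             # shrinkage factor v
  ystar <- sample(y, N, replace = TRUE)       # ordinary bootstrap resample
  delta <- rnorm(N)                           # kernel perturbations
  (1 - v) * mean(ystar) + v * ystar + h * v * delta
}

# Example usage with a Sheather and Jones type bandwidth:
# y <- rnorm(400); h <- bw.SJ(y, method = "dpi"); ytilde <- smooth_boot(y, h)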

We have previously discussed percentile estimation based on order statistics, which can accommodate hierarchical data, and the smoothed bootstrap procedure, which approximates a continuous population density for i.i.d. data. We present a new approach which combines Algorithm 1 and the smoothed bootstrap to account for both the sampling scheme and the discrete nature of the bootstrap distribution. We decompose the observations into a cluster effect, ȳ_i·, and a residual, y_ij − ȳ_i·. We use the smoothed bootstrap on both the cluster effects and the residuals. We also use both the adjustment for bootstrapping hierarchical data and the adjustment for smoothing to ensure that the resampled values are consistent with the mean squares between and within of the original model. We define

x̂_i = ȳ_i·,  ẑ_ij = y_ij − ȳ_i·.

The algorithm is described in detail in Algorithm 2 below.

Algorithm 2

1. Choose x*_1, ..., x*_a by randomly sampling with replacement from x̂_1, ..., x̂_a.

2. Generate θ_1, ..., θ_a from K_τ and compute

   η*_i = (1 − v_τ(1 − s)) x̄* + v_τ(1 − s) x*_i + h_τ v_τ(1 − s) θ_i,

   where v_τ = (1 + h_τ^2 µ_2(K_τ)/σ̂_τ^2)^{-1/2}, and s is defined in (4.3).

3. Choose z*_1, ..., z*_{ab} by randomly sampling with replacement from ẑ_1, ..., ẑ_{ab}.

4. Generate δ_{11}, ..., δ_{a n_a} from K and compute

   ν*_ij = (1 − vr) z̄* + v z*_ij + h v r δ_ij,

   where v = (1 + h^2 µ_2(K)/σ̂^2)^{-1/2}, and r = (1 − n_i^{-1})^{-1/2}.

5. Set y*_ij = η*_i + ν*_ij for all i = 1, ..., a, j = 1, ..., n_i.
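Below is a minimal R sketch of Algorithm 2 for balanced data. It assumes Gaussian kernels for K_τ and K, uses standard one-way ANOVA method-of-moments estimates for σ̂_τ^2 and σ̂^2 (the text does not dictate a particular choice), and treats the bandwidths h_τ and h as inputs; all names are illustrative.

# Minimal sketch of Algorithm 2 for balanced data in an a x b matrix y.
# Gaussian kernels are used for K_tau and K (so mu_2 = 1).
smooth_boot_hier <- function(y, h_tau, h) {
  a <- nrow(y); b <- ncol(y); N <- a * b
  xhat <- rowMeans(y)
  zhat <- as.vector(y - xhat)

  # method-of-moments variance estimates from the one-way random effects ANOVA
  MSB <- b * sum((xhat - mean(y))^2) / (a - 1)
  MSW <- sum((y - xhat)^2) / (N - a)
  sig2_tau <- max((MSB - MSW) / b, 1e-8)   # floored at a small positive value
  sig2     <- MSW

  # shrinkage factor s from (4.3)
  SSB <- sum((xhat - mean(y))^2); SSW <- sum((y - xhat)^2)
  M   <- (N - sum(rep(b, a)^2) / N) / (a - 1)
  s2  <- a / (a - 1) - SSW / (M * (M - 1) * SSB)
  s   <- if (s2 > 0) 1 - sqrt(s2) else 1

  v_tau <- 1 / sqrt(1 + h_tau^2 / sig2_tau)
  v     <- 1 / sqrt(1 + h^2 / sig2)
  r     <- (1 - 1 / b)^(-1/2)

  # step 2: smoothed, shrunk cluster effects
  xstar <- sample(xhat, a, replace = TRUE)
  theta <- rnorm(a)
  eta   <- (1 - v_tau * (1 - s)) * mean(xstar) +
           v_tau * (1 - s) * xstar + h_tau * v_tau * (1 - s) * theta

  # step 4: smoothed residuals
  zstar <- sample(zhat, N, replace = TRUE)
  delta <- rnorm(N)
  nu    <- (1 - v * r) * mean(zstar) + v * zstar + h * v * r * delta

  # step 5: y*_ij = eta*_i + nu*_ij
  matrix(rep(eta, times = b), nrow = a) + matrix(nu, nrow = a)
}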

An alternative approach to estimating percentiles, one that involves smoothing but not resampling, is to estimate the distribution function. Braun (2008) suggested estimating a percentile by using the kernel distribution function directly.

A kernel distribution estimate at a point y is

F̂(y) = (1/n) Σ_{i=1}^{n} K_F( (y − y_i)/h_F ),   (4.6)

where K_F is a kernel distribution function. The bandwidth that minimizes the AMISE is

h_F = { f(q_p) / (√π n [f'(q_p)]^2) }^{1/3},   (4.7)

where q_p is the p ∗ 100th percentile. In order to find an appropriate bandwidth, the density and the first derivative of the density must be estimated. Sheather and Jones' (1991) plug-in method can be employed to select the bandwidth of the kernel density estimate. To choose the bandwidth for the first derivative of the density, a normal scale rule is used,

h_n = √2 σ^5 e^{(q_p − µ)^2/(2σ^2)} / [ (q_p − µ)^2 n ].   (4.8)

Braun also noted that the distribution of percentile estimates is skewed to the right with mean q_p and asymptotic variance p(1 − p)/[n f(q_p)^2]. A log transformation can be used to improve the normal approximation, yielding a tolerance limit of the form

( exp[ log(q̂_p) + z_α se ], ∞ ),   (4.9)

where z_α is the (1 − α) ∗ 100th percentile of a standard normal distribution. Braun's method does not account for the hierarchical nature of the model. Next, we discuss a method that also uses the kernel distribution function to obtain estimates for percentiles.

We will also construct a kernel density estimate via the resampling estimator (3.2). We integrate the kernel density estimate to obtain a kernel distribution function, which we denote by F̂. The estimated percentile is the value of y satisfying

∫_{−∞}^{y} f̂(t) dt = p,   (4.10)

where p is the percentile of interest. Since f̂(y) is a continuous density function, the value of y satisfying (4.10) is unique. This method does not require any adjustments for inflated variance, since we are not changing the observations to obtain an estimate of the distribution.
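As an illustration of this inversion step, the following R sketch estimates a percentile by solving F̂(y) = p with uniroot; it uses a Gaussian kernel and a Sheather and Jones plug-in bandwidth purely for concreteness rather than the estimator (3.2) and bandwidth used in the simulations.

# Estimate the p*100th percentile by solving Fhat(y) - p = 0, where Fhat is a
# Gaussian-kernel estimate of the marginal c.d.f.
kernel_cdf_percentile <- function(y, p, h = bw.SJ(y, method = "dpi")) {
  Fhat <- function(q) mean(pnorm((q - y) / h))    # kernel c.d.f. estimate
  # bracket the root well outside the range of the data
  lo <- min(y) - 10 * h
  hi <- max(y) + 10 * h
  uniroot(function(q) Fhat(q) - p, lower = lo, upper = hi)$root
}

# Example usage:
# y <- rnorm(400); q10 <- kernel_cdf_percentile(y, p = 0.10)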

Simulations

A desirable property of tolerance intervals is that their coverage probability, the proportion of times the tolerance interval contains the true p ∗ 100th percentile, is close to the nominal rate 1 − α. Ideally, tolerance limits should also be in close proximity to the p ∗ 100th percentile of the underlying population distribution. Each method should maintain a coverage probability of approximately (1 − α) ∗ 100% and provide informative intervals, i.e., tolerance limits that are not too small. We will see that tolerance intervals with small coverage probability may have tolerance limits that are closer to the percentile of interest. On the other hand, methods with larger coverage probabilities can have tolerance limits that severely underestimate the percentile of interest. The following simulation studies estimate tolerance limits for the same datasets from Chapter 3. Common percentiles of interest in manufacturing are the 10th and 25th percentiles. We will construct (0.90, 0.95) and (0.75, 0.95) lower tolerance limits.
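For clarity, the quantities reported in the tables below can be computed as in the following R sketch, where CP is the proportion of simulated limits falling at or below the true percentile and Mean is the average limit; the example usage is purely illustrative.

# Summarize a vector of lower tolerance limits from repeated simulations:
# CP   = proportion of limits at or below the true percentile (coverage),
# Mean = average estimated limit.
sim_summary <- function(limits, true_percentile) {
  c(CP = mean(limits <= true_percentile), Mean = mean(limits))
}

# Example usage with illustrative order-statistic limits from 1000 datasets:
# limits <- replicate(1000, sort(rnorm(400))[qbinom(0.05, 400, 0.10)])
# sim_summary(limits, true_percentile = qnorm(0.10))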

A simple technique for estimating limits for a lower tolerance interval is to use order statistics. Gitlow and Awad (2013) noted that tolerance limits can be estimated by the order statistic X_(k), where k is chosen to ensure the proper coverage probability. We set k to be the α ∗ 100th percentile of a binomial distribution with n trials and probability of success p. We denote this method by NP. Recall that binomial distributions assume independent trials, hence this method does not account for the correlation structure. Another method, NPboot, estimates the bootstrap distribution of the p ∗ 100th percentile by selecting the ⌈n ∗ p⌉th smallest order statistic from each resample. This method also does not account for the correlation structure. In an attempt to account for the correlation structure, we implement Algorithm 1 and estimate the bootstrap distribution of the p ∗ 100th percentile; we denote this method by NPbootHD. For both methods, NPboot and NPbootHD, the α ∗ 100th percentile of the bootstrap distribution is recorded as an estimate of the (1 − p, 1 − α) lower tolerance limit.
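A minimal R sketch of the NP and NPboot limits just described is given below; the i.i.d. resampling deliberately ignores the cluster structure, exactly as noted above, and all function and argument names are placeholders.

# NP: lower tolerance limit from a single order statistic.  The index k is the
# alpha*100th percentile of a Binomial(n, p) distribution.
np_limit <- function(y, p = 0.10, alpha = 0.05) {
  n <- length(y)
  k <- qbinom(alpha, size = n, prob = p)
  k <- max(k, 1)                       # guard against k = 0 when n*p is small
  sort(y)[k]
}

# NPboot: alpha*100th percentile of the bootstrap distribution of the
# ceiling(n*p)-th smallest order statistic (i.i.d. resampling).
npboot_limit <- function(y, p = 0.10, alpha = 0.05, B = 2000) {
  n <- length(y)
  k <- ceiling(n * p)
  stat <- replicate(B, sort(sample(y, n, replace = TRUE))[k])
  quantile(stat, probs = alpha, names = FALSE)
}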

We will also implement several techniques that involve kernel smoothing. We will use the smoothed bootstrap on the raw data and implement Sheather and Jones' direct plug-in method for bandwidth selection. This method will be referred to as SB SJ. Recall that this method treats the observations as i.i.d. In Chapter 3, we saw that the kernel density estimator (3.2) can reduce the MISE for hierarchical data as compared to (2.1). We use numerical integration to obtain an estimate of the population cumulative distribution function (c.d.f.). However, we are not interested in estimating the entire c.d.f., but rather in finding inf{y : F̂(y) ≥ p}. To reduce the amount of integration, numerical methods for finding a solution to

F̂(y) − p = 0   (4.11)

are employed. This method attempts to account for hierarchical data, and we will denote it by SB W. We will also implement Algorithm 2; we will denote this method by SB HD.

For comparison, we use several algorithms that cater to hierarchical data but assume observations are marginally normal. We will use the abbreviations MO, V, KM, and LLI to denote the Mee and Owen, Vangel, Krishnamoorthy and Mathew, and Lin, Liao and Iyer methods, respectively. We expect these methods to perform quite differently when the marginal distribution of an observation departs from normality. We expect the methods that incorporate smoothing or bootstrapping to perform consistently regardless of the marginal distribution, since the intervals they create do not involve critical values from non-central t-distributions.

The results from the simulations are reported at the end of the chapter. For normal data, the results are presented in Tables 4.1, 4.6, 4.11, 4.16, 4.21, and 4.26. We see that the traditional methods have the appropriate coverage probability. NP has the desired coverage probability when σ_τ^2 = 0, but the coverage probability shrinks as the intra-class correlation increases. NPboot exhibits similar behavior, but the effect is not as dramatic as for NP. Davidson and Hinkley's algorithm (Algorithm 1) provides tolerance limits that are too small for low intra-class correlation, but they become too large when the correlation increases. SB SJ and SB W do not seem to provide tolerance limits that have reasonable coverage probabilities, aside from when there is no intra-class correlation present. SB HD provides tolerance limits that may be too small, and they become more similar to the traditional methods as the intra-class correlation increases. Notice that the tolerance limits for all methods drift further below the true percentile as the intra-class correlation increases, which should not be the case because at each level of intra-class correlation the marginal distributions are all standard normal. Overall, the methods behaved similarly regardless of how extreme the percentile of interest is.

We now consider the case where the cluster effect, τ, follows a normal distribution and the errors, ε, follow either a log normal distribution or a double exponential distribution. These results are exhibited in Tables 4.2, 4.3, 4.7, 4.8, 4.12, 4.13, 4.17, 4.18, 4.22, 4.23, 4.27, and 4.28. For small values of σ_τ^2, the traditional methods produce tolerance limits with high coverage probability, but the coverage probability approaches the nominal rate as the correlation increases. The marginal distribution becomes more similar to a standard normal distribution as σ_τ^2 increases to one. Figures A.1 and A.3 show the marginal distribution for each value of σ_τ^2. We see that NPboot and NP produce reasonable tolerance limits for low correlation. NPHD seems to provide tolerance limits that are small, despite the coverage probability being slightly lower than desired. For (0.90, 0.95) tolerance intervals, SB HD produces tolerance limits that do not depart greatly from NPHD, and both methods provide tolerance limits that are reasonably close to the true percentile when the marginal distribution is skewed. The tolerance limits from NPHD and SB HD are similar to the traditional methods for high values of intra-class correlation.

We now examine the scenario in which the cluster effect, τ, follows either a log normal or a double exponential distribution and the errors, ε, follow a normal distribution. These results are given in Tables 4.4, 4.5, 4.9, 4.10, 4.14, 4.15, 4.19, 4.20, 4.24, 4.25, 4.29, and 4.30. The marginal distribution deviates from a standard normal distribution as the intra-class correlation increases, as can be seen in Figures A.2 and A.4. There can be an advantage to using Algorithm 2, specifically when the marginal distribution is skewed. We see that NPHD and SB HD produce tolerance limits that have high coverage probabilities and can provide precise estimates for extreme percentiles. This is due to each observation coming from an increasingly skewed distribution and containing less information as σ_τ^2 increases. While NPHD and SB HD have coverage probability above or close to 0.95 for σ_τ^2 < 0.5, NPHD produces tolerance limits that are too large for high intra-class correlation, and SB HD has coverage probabilities that are larger than the desired rate. The traditional methods tend to produce tolerance limits with good coverage probability when the underlying distribution is symmetric.

36 Overall, we see that both smoothing and bootstrapping can improve percentile estimation.

When the marginal distribution is not symmetric, we get the best overall results when smoothing and bootstrapping are combined in Algorithm 2. The traditional methods seem to be robust to departures from normality, as long as the marginal distribution remains symmetric. The behavior of SB HD seems to be impacted by how extreme the percentile of interest is. We see that with a more extreme percentile, p = 0.1, the coverage probability is quite large. Despite the high coverage probability, SB HD can produce the most accurate tolerance limits. As we might expect, estimated tolerance limits move further from the true percentile as both the number of clusters and the total number of observations decrease. Despite accounting for hierarchical data, integration of the density estimator (3.2) using the bandwidth selection method shown in (3.12) does not seem to produce desirable tolerance limits.

Results

(90, 95) Tolerance Limits

Table 4.1: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij also following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPBoot NP NPHD 2 στ CP Mean CP Mean CP Mean 0 0.970 -1.4510 0.949 -1.4324 1.000 -1.5215 0.1 0.941 -1.4440 0.922 -1.4270 0.992 -1.5256 0.25 0.904 -1.4381 0.880 -1.4197 0.986 -1.5379 0.5 0.837 -1.4432 0.812 -1.4258 0.940 -1.5440 0.75 0.787 -1.4440 0.764 -1.4262 0.920 -1.6099 0.95 0.699 -1.4216 0.673 -1.4021 0.912 -1.6652 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 0 0.975 -1.4377 0.962 -1.4237 0.999 -1.5257 0.1 0.945 -1.4310 0.934 -1.4176 0.992 -1.5256 0.25 0.906 -1.4265 0.894 -1.4129 0.988 -1.5405 0.5 0.823 -1.4277 0.808 -1.4107 0.957 -1.5750 0.75 0.776 -1.4288 0.753 -1.4111 0.945 -1.6305 0.95 0.686 -1.4050 0.678 -1.3833 0.941 -1.6758 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 0 0.961 -1.4081 0.960 -1.4046 0.959 -1.4047 0.1 0.952 -1.4366 0.953 -1.4472 0.951 -1.4341 0.25 0.959 -1.4805 0.970 -1.5036 0.954 -1.4793 0.5 0.939 -1.5507 0.964 -1.5762 0.942 -1.5514 0.75 0.950 -1.6265 0.957 -1.6426 0.953 -1.6278 0.95 0.955 -1.6556 0.958 -1.6593 0.957 -1.6560 (c) Traditional methods


Table 4.2: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a log normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPBoot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.979 -0.7881 0.969 -0.7850 1.000 -0.9993 -0.7516 0.1 0.839 -0.9554 0.811 -0.9481 0.980 -1.0601 -0.8848 0.25 0.805 -1.1157 0.772 -1.1045 0.956 -1.2309 -1.0184 0.5 0.778 -1.2874 0.750 -1.2723 0.930 -1.4359 -1.1659 0.75 0.699 -1.3988 0.676 -1.3824 0.893 -1.5869 -1.2806 0.95 0.724 -1.4345 0.701 -1.4160 0.910 -1.6888 -1.2816 (a) Order Statistics

SB SJ SB W SB HD CP Mean CP Mean CP Mean 10th percentile 0 1.000 -0.7992 1.000 -0.8193 1.000 -1.036 -0.7516 0.1 0.899 -0.9716 0.917 -0.9791 1.000 -1.143 -0.8848 0.25 0.829 -1.1248 0.832 -1.1252 1.000 -1.2923 -1.0184 0.5 0.784 -1.2863 0.788 -1.2830 1.000 -1.4741 -1.1659 0.75 0.703 -1.3867 0.689 -1.3734 1.000 -1.6084 -1.2806 0.95 0.713 -1.4200 0.691 -1.4034 1.000 -1.6979 -1.2816 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 10th percentile 0 1.000 -1.3856 1.000 -1.3820 1.000 -1.3825 -0.7516 0.1 1.000 -1.4364 1.000 -1.4471 1.000 -1.4340 -0.8848 0.25 0.999 -1.4797 0.999 -1.5028 0.999 -1.4786 -1.0184 0.5 0.992 -1.5490 0.994 -1.5742 0.992 -1.5500 -1.1659 0.75 0.954 -1.6215 0.962 -1.6376 0.958 -1.6230 -1.2806 0.95 0.956 -1.6785 0.956 -1.6821 0.955 -1.6790 -1.2816 (c) Traditional Methods


Table 4.3: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a double exponential distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPBoot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.953 -1.3492 0.928 -1.3254 0.985 -1.3810 -1.1380 0.1 0.967 -1.3554 0.951 -1.3330 0.979 -1.3636 -1.1429 0.25 0.923 -1.3842 0.905 -1.3632 0.953 -1.4261 -1.1857 0.5 0.825 -1.3960 0.801 -1.3777 0.941 -1.5001 -1.2359 0.75 0.749 -1.4278 0.718 -1.4098 0.910 -1.5984 -1.2810 0.95 0.699 -1.4160 0.670 -1.3980 0.912 -1.6666 -1.2816 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.945 -1.3293 1.000 -1.2268 1.000 -1.4040 -1.1380 0.1 0.967 -1.3400 1.000 -1.2652 1.000 -1.4313 -1.1429 0.25 0.914 -1.3730 0.989 -1.3230 1.000 -1.4668 -1.1857 0.5 0.819 -1.3861 0.884 -1.3585 1.000 -1.5329 -1.2359 0.75 0.737 -1.4145 0.707 -1.3945 1.000 -1.5979 -1.2810 0.95 0.683 -1.3993 0.670 -1.3832 1.000 -1.6609 -1.2816 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 10th percentile 0 1.000 -1.4014 1.000 -1.3982 1.000 -1.3980 -1.1380 0.1 0.995 -1.4384 0.996 -1.4489 0.995 -1.4359 -1.1429 0.25 0.991 -1.4952 0.992 -1.5186 0.990 -1.4940 -1.1857 0.5 0.967 -1.5474 0.975 -1.5729 0.967 -1.5479 -1.2359 0.75 0.953 -1.6244 0.962 -1.6405 0.952 -1.6257 -1.2810 0.95 0.949 -1.6569 0.951 -1.6605 0.950 -1.6580 -1.2816 (c) Traditional Methods


Table 4.4: Results from 1000 simulations where the data is generated with τ_i following a log normal distribution and ε_ij following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPBoot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.974 -1.4475 0.951 -1.4287 0.978 -1.4752 -1.2815 0.1 0.952 -1.4372 0.929 -1.4195 0.971 -1.4418 -1.2774 0.25 0.913 -1.3978 0.884 -1.3810 0.971 -1.4269 -1.2530 0.5 0.918 -1.2970 0.890 -1.2830 0.976 -1.3429 -1.1659 0.75 0.879 -1.1143 0.858 -1.1033 0.976 -1.1665 -1.0184 0.95 0.815 -0.8776 0.786 -0.8714 0.967 -0.9251 -0.8254 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.969 -1.4392 0.969 -1.4245 1.000 -1.5037 -1.2815 0.1 0.946 -1.4243 0.940 -1.4128 1.000 -1.5106 -1.2774 0.25 0.935 -1.3949 0.934 -1.3859 1.000 -1.4885 -1.2530 0.5 0.913 -1.2909 0.916 -1.2892 1.000 -1.3910 -1.1659 0.75 0.927 -1.1271 0.935 -1.1299 1.000 -1.2145 -1.0184 0.95 0.878 -0.8890 0.916 -0.9037 1.000 -0.9687 -0.8254 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.960 -1.4052 0.958 -1.4020 0.956 -1.4019 -1.2815 0.1 0.979 -1.4335 0.981 -1.4440 0.978 -1.4312 -1.2774 0.25 0.985 -1.4702 0.992 -1.4928 0.985 -1.4689 -1.2530 0.5 0.999 -1.5303 0.999 -1.5548 0.999 -1.5306 -1.1659 0.75 0.997 -1.5857 0.998 -1.6020 0.997 -1.5866 -1.0184 0.95 1.000 -1.6247 1.000 -1.6287 1.000 -1.6250 -0.8254 (c) Traditional Methods


Table 4.5: Results from 1000 simulations where the data is generated with τ_i following a double exponential distribution and ε_ij following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPBoot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.980 -1.4505 0.966 -1.4337 0.984 -1.4792 -1.2815 0.1 0.942 -1.4413 0.919 -1.4229 0.958 -1.4486 -1.2694 0.25 0.900 -1.4421 0.888 -1.4246 0.940 -1.4923 -1.2794 0.5 0.805 -1.4016 0.775 -1.3827 0.917 -1.5386 -1.2359 0.75 0.751 -1.3768 0.719 -1.3565 0.910 -1.6186 -1.1857 0.95 0.737 -1.3810 0.716 -1.3515 0.921 -1.7660 -1.1455 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.974 -1.4355 0.968 -1.4207 1.000 -1.5238 -1.2815 0.1 0.938 -1.4325 0.930 -1.4183 1.000 -1.5283 -1.2794 0.25 0.888 -1.4201 0.866 -1.4016 1.000 -1.5437 -1.2694 0.5 0.814 -1.3999 0.773 -1.3684 1.000 -1.5913 -1.2359 0.75 0.764 -1.3799 0.718 -1.3215 1.000 -1.6530 -1.1857 0.95 0.704 -1.3448 0.615 -1.2503 1.000 -1.7320 -1.1455 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.974 -1.4088 0.972 -1.4053 0.972 -1.4055 -1.2815 0.1 0.952 -1.4342 0.955 -1.4445 0.950 -1.4317 -1.2794 0.25 0.956 -1.4884 0.966 -1.5115 0.955 -1.4872 -1.2694 0.5 0.952 -1.5458 0.964 -1.5710 0.951 -1.5461 -1.2359 0.75 0.973 -1.6150 0.979 -1.6311 0.971 -1.6158 -1.1857 0.95 0.968 -1.6795 0.969 -1.6831 0.968 -1.6800 -1.1455 (c) Traditional Methods


Table 4.6: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij also following a normal distribution. Each dataset had 62 clusters and 672 observations.

NPBoot NP NPHD 2 στ CP Mean CP Mean CP Mean 0 0.954 -1.3965 0.954 -1.3957 0.989 -1.4583 0.1 0.907 -1.3980 0.907 -1.3925 0.973 -1.4323 0.25 0.850 -1.3933 0.846 -1.3918 0.972 -1.4718 0.5 0.775 -1.3910 0.776 -1.3901 0.963 -1.5246 0.75 0.699 -1.3920 0.698 -1.3911 0.948 -1.5851 0.95 0.683 -1.4023 0.680 -1.4013 0.946 -1.6730 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 0 0.968 -1.3982 0.952 -1.4031 1.000 -1.5042 0.1 0.930 -1.3971 0.935 -1.4011 0.997 -1.5126 0.25 0.862 -1.3961 0.898 -1.4030 0.996 -1.5319 0.5 0.791 -1.3943 0.822 -1.3994 0.986 -1.5633 0.75 0.716 -1.3936 0.768 -1.3971 0.963 -1.6041 0.95 0.687 -1.4012 0.710 -1.3981 0.962 -1.6760 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 0 0.940 -1.3823 0.975 -1.4468 0.1 0.950 -1.4071 0.957 -1.4048 0.25 0.956 -1.4436 0.952 -1.4425 0.5 0.955 -1.4983 0.955 -1.4981 0.75 0.952 -1.5508 0.952 -1.5508 0.95 0.951 -1.5969 0.953 -1.5967 (c) Traditional Methods


Table 4.7: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a log normal distribution. Each dataset had 62 clusters and 672 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.984 -0.7789 0.982 -0.7787 1.000 -0.9875 -0.7516 0.1 0.791 -0.9348 0.791 -0.9346 0.990 -1.0459 -0.8848 0.25 0.744 -1.0799 0.743 -1.0792 0.980 -1.2203 -1.0184 0.5 0.707 -1.2506 0.707 -1.2497 0.961 -1.4247 -1.1659 0.75 0.659 -1.3610 0.659 -1.3602 0.941 -1.5749 -1.2806 0.95 0.692 -1.3972 0.688 -1.3964 0.959 -1.6696 -1.2816 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.999 -0.7887 1.000 -0.8055 1.000 -1.0135 -0.7516 0.1 0.869 -0.9530 0.931 -0.9650 1.000 -1.1308 -0.8848 0.25 0.799 -1.0963 0.852 -1.1089 1.000 -1.2782 -1.0184 0.5 0.745 -1.2589 0.797 -1.2709 1.000 -1.4588 -1.1659 0.75 0.680 -1.3644 0.714 -1.3684 1.000 -1.5925 -1.2806 0.95 0.702 -1.3988 0.755 -1.3984 1.000 -1.6750 -1.2816 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 10th percentile 0 1.000 -1.3656 1.000 -1.4624 -0.7516 0.1 1.000 -1.4014 1.000 -1.3993 -0.8848 0.25 1.000 -1.4368 1.000 -1.4352 -1.0184 0.5 0.993 -1.4928 0.993 -1.4923 -1.1659 0.75 0.954 -1.5501 0.953 -1.5496 -1.2806 0.95 0.951 -1.5962 0.951 -1.5962 -1.2816 (c) Traditional Methods


Table 4.8: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a double exponential distribution. Each dataset had 62 clusters and 672 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.959 -1.2942 0.958 -1.2928 0.992 -1.3659 -1.1380 0.1 0.931 -1.2953 0.929 -1.2940 0.989 -1.3443 -1.1429 0.25 0.865 -1.3124 0.859 -1.3117 0.966 -1.4001 -1.1857 0.5 0.784 -1.3505 0.780 -1.3496 0.948 -1.4895 -1.2359 0.75 0.650 -1.3650 0.648 -1.3640 0.935 -1.5696 -1.2810 0.95 0.642 -1.3877 0.643 -1.3872 0.953 -1.6614 -1.2816 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percenile 0 0.969 -1.2952 0.811 -1.2176 1.000 -1.4040 -1.1380 0.1 0.946 -1.2999 0.871 -1.2512 1.000 -1.4312 -1.1429 0.25 0.884 -1.3196 0.839 -1.2946 1.000 -1.4668 -1.1857 0.5 0.797 -1.3555 0.813 -1.3464 1.000 -1.5329 -1.2359 0.75 0.673 -1.3687 0.729 -1.3774 1.000 -1.5979 -1.2810 0.95 0.657 -1.3888 0.701 -1.3854 1.000 -1.6609 -1.2816 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 10th percentile 0 1.000 -1.3859 1.000 -1.4582 -1.1380 0.1 1.000 -1.4075 1.000 -1.4047 -1.1429 0.25 0.993 -1.4413 0.993 -1.4400 -1.2359 0.5 0.970 -1.4920 0.973 -1.4919 -1.2810 0.75 0.940 -1.5439 0.942 -1.5435 -1.2810 0.95 0.958 -1.5867 0.959 -1.5865 -1.2816 (c) Traditional Methods


Table 4.9: Results from 1000 simulations where the data is generated with τ_i following a log normal distribution and ε_ij following a normal distribution. Each dataset had 62 clusters and 672 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.952 -1.3967 0.948 -1.3954 0.987 -1.4558 -1.2815 0.1 0.920 -1.3923 0.919 -1.3917 0.982 -1.4307 -1.2774 0.25 0.915 -1.3583 0.912 -1.3572 0.986 -1.4233 -1.2530 0.5 0.889 -1.2569 0.882 -1.2561 0.994 -1.3359 -1.1659 0.75 0.845 -1.0898 0.846 -1.0892 0.990 -1.1647 -1.0184 0.95 0.754 -0.8605 0.752 -0.8601 0.984 -0.9164 -0.8254 (a) Order Statistics

2 στ SB SJ SB W SB HD 10th percentile 0 0.976 -1.3980 0.963 -1.4039 1.000 -1.5037 -1.2815 0.1 0.938 -1.3947 0.958 -1.4006 1.000 -1.5107 -1.2774 0.25 0.927 -1.3637 0.941 -1.3717 1.000 -1.4885 -1.2530 0.5 0.929 -1.2663 0.941 -1.2776 1.000 -1.3906 -1.1659 0.75 0.897 -1.1048 0.939 -1.1173 1.000 -1.2145 -1.0184 0.95 0.859 -0.8767 0.934 -0.8930 1.000 -0.9687 -0.8254 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 10th percentile 0 0.957 -1.3848 0.983 -1.4372 -1.2815 0.1 0.969 -1.4089 0.972 -1.4058 -1.2774 0.25 0.988 -1.4409 0.987 -1.4400 -1.2530 0.5 1.000 -1.4874 1.000 -1.4872 -1.1659 0.75 1.000 -1.5230 1.000 -1.5224 -1.0184 0.95 1.000 -1.5282 1.000 -1.5283 -0.8254 (c) Traditional Methods


Table 4.10: Results from 1000 simulations where the data is generated with τ_i following a double exponential distribution and ε_ij following a normal distribution. Each dataset had 62 clusters and 672 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.952 -1.3967 0.946 -1.3958 0.987 -1.4561 -1.2815 0.1 0.905 -1.3881 0.901 -1.3869 0.976 -1.4268 -1.2794 0.25 0.840 -1.3862 0.836 -1.3851 0.962 -1.4722 -1.2694 0.5 0.752 -1.3476 0.753 -1.3465 0.952 -1.5132 -1.2359 0.75 0.709 -1.3336 0.702 -1.3324 0.954 -1.6031 -1.1857 0.95 0.659 -1.2961 0.655 -1.2943 0.949 -1.6803 -1.1455 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.968 -1.3981 0.963 -1.4039 1.000 -1.5037 -1.2815 0.1 0.933 -1.3917 0.935 -1.3949 1.000 -1.5068 -1.2794 0.25 0.857 -1.3893 0.872 -1.3889 1.000 -1.5314 -1.2694 0.5 0.775 -1.3515 0.790 -1.3469 1.000 -1.5507 -1.2359 0.75 0.730 -1.3387 0.731 -1.3037 1.000 -1.6164 -1.1857 0.95 0.664 -1.2961 0.611 -1.2280 1.000 -1.6767 -1.1455 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 10th percentile 0 0.957 -1.3848 0.983 -1.4372 -1.2815 0.1 0.955 -1.3999 0.953 -1.3975 -1.2794 0.25 0.952 -1.4413 0.949 -1.4402 -1.2694 0.5 0.965 -1.4860 0.966 -1.4858 -1.2359 0.75 0.975 -1.5560 0.976 -1.5560 -1.1857 0.95 0.972 -1.5779 0.971 -1.5776 -1.1455 (c) Traditional Methods


Table 4.11: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij also following a normal distribution. Each dataset had 27 clusters and 285 observations.

NPBoot NP NPHD 2 στ CP Mean CP Mean CP Mean 0 0.971 -1.4844 0.961 -1.4672 0.992 -1.5526 0.1 0.936 -1.4774 0.918 -1.4592 0.975 -1.5198 0.25 0.881 -1.4757 0.863 -1.4577 0.968 -1.5757 0.5 0.793 -1.4688 0.773 -1.44500 0.960 -1.6562 0.75 0.704 -1.4548 0.677 -1.4362 0.937 -1.7622 0.95 0.669 -1.4657 0.655 -1.4500 0.934 -1.8994 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 0 0.978 -1.4739 0.952 -1.4031 0.999 -1.6006 0.1 0.946 -1.4653 0.935 -1.4011 0.996 -1.6098 0.25 0.878 -1.4647 0.898 -1.4030 0.990 - 1.6451 0.5 0.783 -1.4567 0.822 -1.3994 0.976 -1.7000 0.75 0.699 -1.4416 0.768 -1.3971 0.954 -1.7846 0.95 0.662 -1.4513 0.710 -1.3982 0.954 -1.9058 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 0 0.957 -1.4453 0.993 -1.6030 0.1 0.955 -1.4801 0.968 -1.4922 0.25 0.960 -1.5423 0.958 -1.5409 0.5 0.955 -1.6311 0.955 -1.6308 0.75 0.946 -1.7110 0.947 -1.7110 0.95 0.963 -1.7873 0.963 -1.7877 (c) Traditional Methods


Table 4.12: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a log normal distribution. Each dataset had 27 clusters and 285 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.980 -0.7929 0.977 -0.7900 1.000 -1.0450 -0.7516 0.1 0.778 -0.9640 0.755 -0.9566 0.973 -1.10689 -0.8848 0.25 0.743 -1.1367 0.725 -1.1265 0.964 -1.3226 -1.0184 0.5 0.738 -1.3125 0.715 -1.2978 0.964 -1.5894 -1.1659 0.75 0.658 -1.4169 0.642 -1.4011 0.933 -1.7722 -1.2806 0.95 0.629 -1.4470 0.615 -1.4299 0.922 -1.9183 -1.2816 (a) Order Statistics

SB SJ SB W SB W 2 στ CP Mean CP Mean CP Mean 0 0.998 -0.8093 1.000 -0.8302 1.000 -1.0794 -0.7516 0.1 0.850 -0.9847 0.892 -0.98735 1.000 -1.2119 -0.8848 0.25 0.787 -1.1489 0.826 -1.1389 1.000 -1.3930 -1.0184 0.5 0.753 -1.3103 0.764 -1.2992 1.000 -1.6259 -1.1659 0.75 0.661 -1.4098 0.660 -1.3842 1.000 -1.7877 -1.2806 0.95 0.622 -1.4340 0.666 -1.4032 1.000 -1.9103 -1.2816 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 10th percentile 0 1.000 -1.4257 1.000 -1.6357 -1.2816 0.1 1.000 -1.4768 1.000 -1.4994 -1.2806 0.25 0.999 -1.5418 0.999 -1.5417 -1.1659 0.5 0.991 -1.6343 0.991 -1.6341 -1.0184 0.75 0.956 -1.7178 0.955 -1.7180 -0.8848 0.95 0.938 -1.7765 0.938 -1.7765 -0.7516 (c) Traditional Methods


Table 4.13: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a double exponential distribution. Each dataset had 27 clusters and 285 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.972 -1.4053 0.961 -1.3793 0.989 -1.4704 -1.1380 0.1 0.958 -1.4154 0.941 -1.3923 0.990 -1.4640 -1.1429 0.25 0.894 -1.4128 0.862 -1.3892 0.968 -1.5104 -1.1857 0.5 0.778 -1.4186 0.755 -1.4001 0.939 -1.6182 -1.2359 0.75 0.676 -1.4380 0.657 -1.4210 0.934 -1.7485 -1.2810 0.95 0.621 -1.4384 0.604 -1.4226 0.927 -1.9191 -1.2816 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.970 -1.3875 0.778 -1.2456 1.000 -1.5259 -1.1380 0.1 0.970 -1.4031 0.857 -1.2962 1.000 -1.5488 -1.1429 0.25 0.896 -1.4045 0.813 -1.3248 1.000 -1.6006 -1.1857 0.5 0.787 -1.4120 0.771 -1.3711 1.000 -1.6763 -1.2359 0.75 0.673 -1.4279 0.673 -1.3928 1.000 -1.7752 -1.2810 0.95 0.613 -1.4261 0.672 -1.4067 1.000 -1.9142 -1.2816 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 10th percentile 0 0.999 -1.4480 1.000 -1.6093 -1.1380 0.1 0.996 -1.4986 0.996 -1.5158 -1.1429 0.25 0.984 -1.5423 0.982 -1.5408 -1.1857 0.5 0.966 -1.6296 0.967 -1.6292 -1.2359 0.75 0.956 -1.7089 0.958 -1.7098 -1.2810 0.95 0.957 -1.7754 0.957 -1.7752 -1.2816 (c) Traditional Methods


Table 4.14: Results from 1000 simulations where the data is generated with τ_i following a log normal distribution and ε_ij following a normal distribution. Each dataset had 27 clusters and 285 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.966 -1.4863 0.954 -1.4688 0.979 -1.5529 -1.2815 0.1 0.954 -1.4747 0.936 -1.4565 0.980 -1.5143 -1.2774 0.25 0.926 -1.4335 0.904 -1.4151 0.981 -1.5118 -1.2530 0.5 0.857 -1.3076 0.824 -1.2927 0.980 -1.4185 -1.1659 0.75 0.858 -1.1287 0.829 -1.1180 0.985 -1.2331 -1.0184 0.95 0.784 -0.8862 0.756 -0.8801 0.974 -0.9703 -0.8254 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.969 -1.4392 0.918 -1.4324 1.000 -1.6010 -1.2815 0.1 0.946 -1.4243 0.907 -1.4231 1.000 -1.6039 -1.2774 0.25 0.935 -1.3949 0.887 -1.3939 1.000 -1.5862 -1.2530 0.5 0.913 -1.2908 0.874 -1.2951 1.000 -1.4844 -1.1659 0.75 0.927 -1.1271 0.894 -1.1362 1.000 -1.2976 -1.0184 0.95 0.878 -0.8890 0.898 -0.9188 1.000 -1.0560 -0.8254 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 10th percentile 0 0.943 -1.4460 0.993 -1.6078 -1.2815 0.1 0.956 -1.4796 0.980 -1.4944 -1.2774 0.25 0.982 -1.5324 0.985 -1.5308 -1.2530 0.5 0.995 -1.6086 0.994 -1.6077 -1.1659 0.75 1.000 -1.6381 1.000 -1.6380 -1.0184 0.95 0.999 -1.6677 0.999 -1.6678 -0.8254 (c) Traditional Methods


Table 4.15: Results from 1000 simulations where the data is generated with τ_i following a double exponential distribution and ε_ij following a normal distribution. Each dataset had 27 clusters and 285 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.966 -1.4863 0.951 -1.4686 0.980 -1.5537 -1.2815, 0.1 0.928 -1.4726 0.905 -1.4560 0.973 -1.5259 -1.2794 0.25 0.872 -1.4653 0.848 -1.4464 0.967 -1.5925 -1.2694 0.5 0.783 -1.4606 0.760 -1.4407 0.941 -1.7356 -1.2359 0.75 0.668 -1.4159 0.667 -1.3942 0.929 -1.9480 -1.1857 0.95 0.644 -1.4225 0.629 -1.4025 0.932 -2.1499 -1.1455 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 10th percentile 0 0.973 -1.4752 0.919 -1.4324 1.000 -1.6008 -1.2815 0.1 0.928 -1.4655 0.878 -1.4279 1.000 -1.6168 -1.2794 0.25 0.872 -1.4550 0.835 -1.4159 1.000 -1.6591 -1.2694 0.5 0.782 -1.4475 0.752 -1.3903 1.000 -1.7648 -1.2359 0.75 0.694 -1.4043 0.669 -1.3193 1.000 -1.9390 -1.1857 0.95 0.630 -1.4038 0.602 -1.2625 1.000 -2.1133 -1.1455 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 10th percentile 0 0.941 -1.4460 0.993 -1.6073 -1.2815 0.1 0.950 -1.4859 0.968 -1.5023 -1.2794 0.25 0.962 -1.5463 0.961 -1.5446 -1.2694 0.5 0.946 -1.6355 0.948 -1.6351 -1.2359 0.75 0.956 -1.7101 0.956 -1.7101 -1.1857 0.95 0.959 -1.7693 0.960 -1.7691 -1.1455 (c) Methods from KM book

(75, 95) Tolerance Limits

Table 4.16: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij also following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 0 0.958 -0.7954 0.952 -0.7928 0.980 -0.8262 0.1 0.915 -0.7909 0.907 -0.7880 0.949 -0.8150 0.25 0.869 -0.7921 0.861 -0.7892 0.961 -0.8453 0.5 0.801 -0.7879 0.798 -0.7852 0.932 -0.8939 0.75 0.753 -0.7927 0.752 -0.7904 0.939 -0.9509 0.95 0.698 -0.7765 0.691 -0.7737 0.935 -0.9875 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 0 0.968 -0.7956 0.900 -0.7538 0.999 -0.8592 0.1 0.936 -0.7921 0.838 -0.7512 0.987 -0.8683 0.25 0.887 -0.7906 0.780 -0.7482 0.985 -0.8888 0.5 0.818 -0.7880 0.710 -0.7446 0.958 -0.9225 0.75 0.763 -0.7924 0.663 -0.7462 0.950 -0.9661 0.95 0.703 -0.7740 0.601 -0.7245 0.956 -0.9934 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 0 0.968 -0.7793 0.959 -0.7733 0.958 -0.7727 0.1 0.951 -0.8100 0.951 -0.8070 0.951 -0.8039 0.25 0.965 -0.8486 0.967 -0.8508 0.959 -0.8436 0.5 0.944 -0.9021 0.946 -0.9073 0.939 -0.8994 0.75 0.949 -0.9543 0.949 -0.9582 0.948 -0.9534 0.95 0.949 -0.9715 0.951 -0.9725 0.953 -0.9714 (c) Traditional Methods


Table 4.17: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a log normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.960 -0.6344 0.956 -0.6335 0.944 -0.6719 -0.5967 0.1 0.831 -0.6607 0.823 -0.6593 0.941 -0.7025 -0.6019 0.25 0.785 -0.7070 0.782 -0.7053 0.952 -0.7892 -0.6319 0.5 0.764 -0.7634 0.758 -0.7610 0.947 -0.8865 -0.6660 0.75 0.754 -0.7846 0.752 -0.7823 0.937 -0.9519 -0.6750 0.95 0.729 -0.7945 0.725 -0.7917 0.934 -1.0070 -0.6745 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.958 -0.6317 1.000 -0.6031 1.000 -0.7021 -0.5967 0.1 0.844 -0.6641 0.960 -0.6375 0.993 -0.7619 -0.6019 0.25 0.798 -0.7111 0.822 -0.6707 0.987 -0.8336 -0.6319 0.5 0.776 -0.7641 0.715 -0.7031 0.966 -0.9145 -0.6660 0.75 0.748 -0.7842 0.649 -0.7315 0.959 -0.96791 -0.6750 0.95 0.727 -0.7924 0.628 -0.7279 0.946 -1.01383 -0.6745 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 1.000 -0.7681 0.999 -0.7622 0.999 -0.76181 -0.5967 0.1 0.998 -0.8092 0.998 -0.8062 0.997 -0.8031 -0.6019 0.25 0.990 -0.8485 0.990 -0.8509 0.987 -0.8436 -0.6319 0.5 0.965 -0.9012 0.967 -0.9064 0.963 -0.8986 -0.6660 0.75 0.950 -0.9500 0.952 -0.9538 0.952 -0.9488 -0.6750 0.95 0.950 -0.9895 0.952 -0.9905 0.951 -0.9893 -0.6745 (c) Traditional Methods


Table 4.18: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a double exponential distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.953 -0.6004 0.946 -0.5981 0.999 -0.6866 -0.4901 0.1 0.903 -0.6403 0.900 -0.6380 0.977 -0.6980 -0.5353 0.25 0.863 -0.7006 0.862 -0.6980 0.966 -0.7817 -0.5841 0.5 0.793 -0.7405 0.785 -0.7380 0.949 -0.8611 -0.6348 0.75 0.748 -0.7770 0.740 -0.7746 0.949 -0.9410 -0.6637 0.95 0.727 -0.7816 0.723 -0.7788 0.944 -0.9928 -0.6739 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.975 -0.6108 0.918 -0.5686 1.000 -0.7188 -0.4901 0.1 0.935 -0.6549 0.858 -0.6161 0.999 -0.7593 -0.5353 0.25 0.892 -0.7106 0.820 -0.6706 0.987 -0.8283 -0.5841 0.5 0.805 -0.7447 0.715 -0.7030 0.970 -0.8897 -0.6348 0.75 0.752 -0.7774 0.649 -0.7315 0.962 -0.9573 -0.6637 0.95 0.725 -0.7789 0.628 -0.7279 0.952 -0.9985 -0.6739 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 1.000 -0.7745 1.000 -0.7689 1.000 -0.7679 -0.4901 0.1 1.000 -0. 8104 1.000 -0.8074 1.000 -0.8043 -0.5353 0.25 0.993 -0.8604 0.994 -0.8628 0.991 -0.8553 -0.5841 0.5 0.972 -0.8989 0.973 -0.9041 0.972 -0.8961 -0.6348 0.75 0.962 -0.9513 0.963 -0.9552 0.960 -0.9503 -0.6637 0.95 0.951 -0.9729 0.952 -0.9739 0.951 -0.9732 -0.6739 (c) Traditional Methods


Table 4.19: Results from 1000 simulations where the data is generated with τ_i following a log normal distribution and ε_ij following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.953 -0.7926 0.951 -0.7903 0.988 -0.8254 -0.6744 0.1 0.915 -0.7939 0.911 -0.7909 0.952 -0.8136 -0.6761 0.25 0.884 -0.7843 0.881 -0.7818 0.973 -0.8292 -0.6780 0.5 0.851 -0.7589 0.847 -0.7568 0.958 -0.8153 -0.6646 0.75 0.829 -0.6351 0.823 -0.7051 0.969 -0.7717 -0.6319 0.95 0.739 -0.7071 0.736 -0.6340 0.955 -0.6963 -0.5931 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.972 -0.7943 0.903 -0.7533 1.000 -0.8580 -0.6744 0.1 0.922 -0.7937 0.840 -0.7529 0.995 -0.8665 -0.6761 0.25 0.899 -0.7859 0.792 -0.7478 0.992 -0.8725 -0.6780 0.5 0.856 -0.7599 0.771 -0.7252 0.984 -0.8488 -0.6646 0.75 0.851 -0.7106 0.721 -0.7316 0.984 -0.7959 -0.6319 0.95 0.759 -0.6363 0.619 -0.6108 0.981 -0.7135 -0.5931 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.970 -0.7803 0.961 -0.7745 0.960 -0.7738 -0.6744 0.1 0.961 -0.8103 0.955 -0.8073 0.949 -0.8043 -0.6761 0.25 0.974 -0.8473 0.976 -0.8494 0.973 -0.8422 -0.6780 0.5 0.988 -0.8847 0.988 -0.8895 0.987 -0.8814 -0.6646 0.75 0.994 -0.9295 0.994 -0.9333 0.994 -0.9282 -0.6319 0.95 0.998 -0.9392 0.998 -0.9403 0.998 -0.9386 -0.5931 (c) Traditional Methods


Table 4.20: Results from 1000 simulations where the data is generated with τ_i following a double exponential distribution and ε_ij following a normal distribution. Each dataset had 40 clusters and 10 observations from each cluster.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.952 -0.7922 0.948 -0.7896 0.989 -0.8249 -0.6744 0.1 0.923 -0.7915 0.918 -0.7888 0.960 -0.8166 -0.6725 0.25 0.870 -0.7781 0.861 -0.7754 0.955 -0.8371 -0.6637 0.5 0.808 -0.7471 0.803 -0.7445 0.954 -0.8562 -0.6348 0.75 0.791 -0.7040 0.791 -0.7017 0.949 -0.8638 -0.5841 0.95 0.717 -0.6392 0.715 -0.6366 0.954 -0.8568 -0.5136 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.978 -0.7936 0.897 -0.7524 0.998 -0.8576 -0.6744 0.1 0.936 -0.7926 0.836 -0.7513 0.997 -0.8697 -0.6725 0.25 0.885 -0.7794 0.776 -0.7378 0.987 -0.8811 -0.6637 0.5 0.819 -0.7518 0.729 -0.7093 0.976 -0.8913 -0.6348 0.75 0.805 -0.7126 0.721 -0.6685 0.961 -0.8915 -0.5841 0.95 0.740 -0.6461 0.664 -0.5989 0.973 -0.8780 -0.5136 (b) Method that involve kernel density estimation

MO V KM 2 στ CP Mean CP Mean CP Mean 25th Percentile 0 0.969 -0.7784 0.962 -0.7728 0.962 -0.7718 -0.6744 0.1 0.963 -0.8115 0.957 -0.8085 0.957 -0.8055 -0.6725 0.25 0.964 -0.8481 0.964 -0.8504 0.958 -0.8429 -0.6637 0.5 0.979 -0.9036 0.980 -0.9087 0.979 -0.9010 -0.6348 0.75 0.982 -0.9551 0.983 -0.9590 0.981 -0.9538 -0.5841 0.95 0.994 -0.9798 0.994 -0.9808 0.994 -0.9800 -0.5136 (c) Traditional Methods


Table 4.21: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij also following a normal distribution. Each dataset had 62 clusters and 672 observations.

NPboot NP NPHD 2 στ CP Mean CP Mean CP Mean 0 0.956 -0.7630 0.949 -0.76260 0.983 -0.8060 0.1 0.904 -0.7637 0.902 -0.7632 0.969 -0.7985 0.25 0.831 -0.7626 0.832 -0.7623 0.971 -0.8350 0.5 0.762 -0.7602 0.762 -0.7600 0.969 -0.8813 0.75 0.711 -0.7653 0.710 -0.7650 0.957 -0.9298 0.95 0.685 -0.7634 0.684 -0.7630 0.965 -0.9798 (a) Order Statistics

SB SJ SB W SB HD 2 στ CP Mean CP Mean CP Mean 0 0.972 -0.7648 0.884 -0.7419 0.999 -0.8381 0.1 0.920 -0.7652 0.827 -0.7419 0.993 -0.8519 0.25 0.835 -0.7638 0.798 -0.7417 0.994 -0.8751 0.5 0.759 -0.7622 0.739 -0.7398 0.980 -0.9064 0.75 0.717 -0.7674 0.677 -0.7402 0.974 -0.9423 0.95 0.688 -0.7641 0.653 -0.7361 0.968 -0.9852 (b) Method that involve kernel density estimation

KM LLI 2 στ CP Mean CP Mean 0 0.937 -0.7584 0.985 -0.8253 0.1 0.952 -0.7827 0.953 -0.7800 0.25 0.955 -0.8141 0.952 -0.8126 0.5 0.955 -0.8562 0.955 -0.8560 0.75 0.952 -0.8946 0.951 -0.8945 0.95 0.945 -0.9241 0.947 -0.9239 (c) Traditional Methods


Table 4.22: Results from 1000 simulations where the data is generated with τ_i following a normal distribution and ε_ij following a log normal distribution. Each dataset had 62 clusters and 672 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)      25th Percentile
0       0.953, -0.6253       0.951, -0.6251       0.941, -0.6740       -0.5967
0.1     0.820, -0.6465       0.818, -0.6463       0.959, -0.6988       -0.6019
0.25    0.740, -0.6872       0.737, -0.6869       0.973, -0.7811       -0.6319
0.5     0.712, -0.7339       0.714, -0.7335       0.968, -0.8702       -0.6660
0.75    0.701, -0.7637       0.698, -0.7632       0.983, -0.9379       -0.6750
0.95    0.722, -0.7722       0.720, -0.7715       0.966, -0.9883       -0.6745
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)     25th Percentile
0       0.959, -0.6232       0.992, -0.5999       1.000, -0.6949       -0.5967
0.1     0.841, -0.6502       0.741, -0.6320       0.997, -0.7535       -0.6019
0.25    0.754, -0.6904       0.717, -0.6730       0.993, -0.8191       -0.6319
0.5     0.727, -0.7364       0.696, -0.7164       0.983, -0.8942       -0.6660
0.75    0.710, -0.7645       0.676, -0.7392       0.989, -0.9507       -0.6750
0.95    0.727, -0.7727       0.681, -0.7414       0.973, -0.9931       -0.6745
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)       25th Percentile
0       1.000, -0.7497       1.000, -0.8505       -0.5967
0.1     0.998, -0.7795       0.994, -0.7770       -0.6019
0.25    0.985, -0.8100       0.987, -0.8082       -0.6319
0.5     0.958, -0.8508       0.955, -0.8503       -0.6660
0.75    0.958, -0.8943       0.958, -0.8938       -0.6750
0.95    0.958, -0.9267       0.957, -0.9267       -0.6745
(c) Traditional Methods


Table 4.23: Results from 1000 simulations where the data is generated with τi following a normal distribution and εij following a double exponential distribution. Each dataset had 62 clusters and 672 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)      25th Percentile
0       0.942, -0.5709       0.942, -0.5707       0.999, -0.6653       -0.4901
0.1     0.877, -0.6139       0.876, -0.6137       0.992, -0.6849       -0.5353
0.25    0.813, -0.6647       0.812, -0.6644       0.981, -0.7599       -0.5841
0.5     0.764, -0.7230       0.765, -0.7224       0.983, -0.8533       -0.6348
0.75    0.708, -0.7481       0.710, -0.7474       0.969, -0.9188       -0.6637
0.95    0.668, -0.7585       0.666, -0.7581       0.970, -0.9746       -0.6739
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)     25th Percentile
0       0.972, -0.5816       0.899, -0.5536       1.000, -0.6981       -0.4901
0.1     0.930, -0.6276       0.870, -0.6063       0.999, -0.7460       -0.5353
0.25    0.849, -0.6755       0.806, -0.6539       0.997, -0.8022       -0.5841
0.5     0.791, -0.7296       0.753, -0.7030       0.988, -0.8779       -0.6348
0.75    0.721, -0.7490       0.675, -0.7255       0.975, -0.9318       -0.6637
0.95    0.669, -0.7585       0.646, -0.7294       0.980, -0.9789       -0.6739
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)       25th Percentile
0       1.000, -0.7547       1.000, -0.8303       -0.4901
0.1     1.000, -0.7843       1.000, -0.7805       -0.5353
0.25    0.996, -0.8145       0.996, -0.8131       -0.5841
0.5     0.984, -0.8582       0.984, -0.8585       -0.6348
0.75    0.956, -0.8892       0.956, -0.8892       -0.6637
0.95    0.960, -0.9158       0.951, -0.9156       -0.6739
(c) Traditional Methods


Table 4.24: Results from 1000 simulations where the data is generated with τi following a log normal distribution and εij following a normal distribution. Each dataset had 62 clusters and 672 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)      25th Percentile
0       0.944, -0.7644       0.944, -0.7639       0.977, -0.8058       -0.6744
0.1     0.918, -0.7641       0.919, -0.7641       0.983, -0.7989       -0.6761
0.25    0.874, -0.7603       0.870, -0.7596       0.977, -0.8158       -0.6780
0.5     0.828, -0.7391       0.827, -0.7388       0.984, -0.8090       -0.6646
0.75    0.792, -0.6909       0.789, -0.6906       0.982, -0.7635       -0.6319
0.95    0.741, -0.6272       0.746, -0.6270       0.973, -0.6893       -0.5931
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)     25th Percentile
0       0.966, -0.7658       0.885, -0.7433       0.999, -0.8391       -0.6744
0.1     0.932, -0.7665       0.871, -0.7442       1.000, -0.8527       -0.6761
0.25    0.895, -0.7618       0.823, -0.7401       0.994, -0.8567       -0.6780
0.5     0.848, -0.7409       0.776, -0.7220       0.994, -0.8390       -0.6646
0.75    0.810, -0.6944       0.783, -0.6775       0.993, -0.7858       -0.6319
0.95    0.750, -0.6284       0.643, -0.6099       0.983, -0.7043       -0.5931
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)       25th Percentile
0       0.949, -0.7607       0.983, -0.8150       -0.6744
0.1     0.964, -0.7847       0.965, -0.7811       -0.6761
0.25    0.978, -0.8104       0.983, -0.8094       -0.6780
0.5     0.993, -0.8490       0.994, -0.8487       -0.6646
0.75    0.993, -0.8788       0.994, -0.8783       -0.6319
0.95    0.998, -0.8850       0.998, -0.8846       -0.5931
(c) Traditional Methods


Table 4.25: Results from 1000 simulations where the data is generated with τi following a double exponential distribution and εij following a normal distribution. Each dataset had 62 clusters and 672 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)      25th Percentile
0       0.944, -0.7644       0.945, -0.7643       0.999, -0.8387       -0.6744
0.1     0.890, -0.7581       0.885, -0.7580       0.998, -0.8478       -0.6725
0.25    0.833, -0.7517       0.832, -0.7512       0.991, -0.8655       -0.6637
0.5     0.766, -0.7144       0.763, -0.7141       0.984, -0.8670       -0.6348
0.75    0.728, -0.6742       0.729, -0.6741       0.988, -0.8655       -0.5841
0.95    0.656, -0.5938       0.659, -0.5935       0.975, -0.8336       -0.5136
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)     25th Percentile
0       0.967, -0.7658       0.886, -0.7433       1.000, -0.8205       -0.6744
0.1     0.917, -0.7600       0.843, -0.7373       1.000, -0.8335       -0.6725
0.25    0.852, -0.7547       0.788, -0.7309       0.993, -0.8541       -0.6637
0.5     0.784, -0.7199       0.725, -0.6985       0.987, -0.8590       -0.6348
0.75    0.758, -0.6838       0.740, -0.6600       0.987, -0.8596       -0.5841
0.95    0.680, -0.6014       0.660, -0.5735       0.976, -0.8299       -0.5136
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)       25th Percentile
0       0.949, -0.7607       0.984, -0.8150       -0.6744
0.1     0.953, -0.7774       0.954, -0.7745       -0.6725
0.25    0.955, -0.8118       0.957, -0.8105       -0.6637
0.5     0.978, -0.8464       0.971, -0.8463       -0.6348
0.75    0.991, -0.8982       0.992, -0.8983       -0.5841
0.95    0.996, -0.9107       0.996, -0.9104       -0.5136
(c) Traditional Methods


Table 4.26: Results from 1000 simulations where the data is generated with τi following a normal distribution and εij also following a normal distribution. Each dataset had 27 clusters and 285 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)
0       0.958, -0.7954       0.952, -0.7928       0.980, -0.8262
0.1     0.915, -0.7909       0.907, -0.7880       0.949, -0.8150
0.25    0.869, -0.7921       0.861, -0.7892       0.961, -0.8453
0.5     0.801, -0.7879       0.798, -0.7852       0.932, -0.8939
0.75    0.753, -0.7927       0.752, -0.7904       0.939, -0.9509
0.95    0.698, -0.7765       0.691, -0.7737       0.935, -0.9875
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)
0       0.970, -0.8178       0.834, -0.7587       0.998, -0.9119
0.1     0.900, -0.8144       0.775, -0.7553       0.992, -0.9343
0.25    0.851, -0.8134       0.736, -0.7551       0.988, -0.9731
0.5     0.767, -0.8167       0.702, -0.7554       0.978, -1.0340
0.75    0.708, -0.8025       0.633, -0.7426       0.975, -1.0937
0.95    0.666, -0.8159       0.625, -0.7459       0.977, -1.1836
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)
0       0.947, -0.8092       0.996, -0.9752
0.1     0.948, -0.8442       0.964, -0.8565
0.25    0.954, -0.8939       0.952, -0.8922
0.5     0.959, -0.9627       0.958, -0.9623
0.75    0.953, -1.0161       0.951, -1.0161
0.95    0.957, -1.0692       0.958, -1.0693
(c) Traditional Methods


Table 4.27: Results from 1000 simulations where the data is generated with τi following a normal distribution and εij following a log normal distribution. Each dataset had 27 clusters and 285 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)      25th Percentile
0       0.964, -0.6423       0.958, -0.6402       0.975, -0.7231       -0.5569
0.1     0.794, -0.6723       0.785, -0.6694       0.964, -0.7522       -0.6019
0.25    0.756, -0.7310       0.751, -0.7264       0.971, -0.8735       -0.6319
0.5     0.727, -0.7786       0.712, -0.7723       0.979, -1.0033       -0.6660
0.75    0.708, -0.8142       0.696, -0.8077       0.969, -1.0973       -0.6750
0.95    0.668, -0.8175       0.660, -0.8110       0.961, -1.1713       -0.6745
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)     25th Percentile
0       0.962, -0.6383       0.952, -0.6042       1.000, -0.7021       -0.5569
0.1     0.804, -0.6745       0.705, -0.6403       0.993, -0.7620       -0.6019
0.25    0.776, -0.7325       0.716, -0.6889       0.987, -0.8336       -0.6319
0.5     0.736, -0.7770       0.642, -0.7309       0.966, -0.9146       -0.6660
0.75    0.700, -0.8083       0.640, -0.7459       0.959, -0.9679       -0.6750
0.95    0.660, -0.8115       0.601, -0.7377       0.946, -1.0138       -0.6745
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)       25th Percentile
0       1.000, -0.7996       1.000, -1.0196       -0.5569
0.1     0.996, -0.8421       0.998, -0.8652       -0.6019
0.25    0.979, -0.8955       0.978, -0.8950       -0.6319
0.5     0.972, -0.9635       0.973, -0.9631       -0.6660
0.75    0.951, -1.0200       0.948, -1.0199       -0.6750
0.95    0.933, -1.0582       0.934, -1.0581       -0.6745
(c) Traditional Methods


Table 4.28: Results from 1000 simulations where the data is generated with τi following a normal distribution and εij following a double exponential distribution. Each dataset had 27 clusters and 285 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)      25th Percentile
0       0.959, -0.6340       0.948, -0.6265       0.999, -0.7445       -0.4901
0.1     0.871, -0.6632       0.860, -0.6564       0.974, -0.7530       -0.5353
0.25    0.841, -0.7228       0.830, -0.7162       0.981, -0.8591       -0.6348
0.5     0.748, -0.7686       0.739, -0.7619       0.973, -0.9846       -0.6660
0.75    0.700, -0.7986       0.694, -0.7923       0.958, -1.0847       -0.6637
0.95    0.641, -0.8070       0.628, -0.8003       0.964, -1.1695       -0.6745
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)     25th Percentile
0       0.986, -0.6407       0.869, -0.5833       1.000, -0.7837       -0.4901
0.1     0.904, -0.6761       0.809, -0.6231       1.000, -0.8322       -0.5353
0.25    0.860, -0.7304       0.775, -0.6749       0.990, -0.9123       -0.5841
0.5     0.763, -0.7686       0.716, -0.7234       0.979, -1.0129       -0.6348
0.75    0.703, -0.7957       0.659, -0.7392       0.967, -1.0971       -0.6637
0.95    0.638, -0.8008       0.607, -0.7305       0.967, -1.1709       -0.6739
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)       25th Percentile
0       1.000, -0.8099       1.000, -0.9854       -0.4901
0.1     0.999, -0.8449       0.999, -0.8620       -0.5353
0.25    0.993, -0.9034       0.995, -0.9017       -0.5841
0.5     0.967, -0.9708       0.967, -0.9699       -0.6348
0.75    0.949, -1.0200       0.948, -1.0202       -0.6637
0.95    0.945, -1.0550       0.948, -1.0553       -0.6739
(c) Traditional Methods


Table 4.29: Results from 1000 simulations where the data is generated with τi following a log normal distribution and εij following a normal distribution. Each dataset had 27 clusters and 285 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)      25th Percentile
0       0.951, -0.8224       0.948, -0.8158       0.985, -0.8781       -0.6744
0.1     0.913, -0.8165       0.895, -0.8094       0.967, -0.8645       -0.6761
0.25    0.870, -0.8081       0.862, -0.8014       0.977, -0.8923       -0.6780
0.5     0.774, -0.7672       0.762, -0.7612       0.975, -0.8828       -0.6646
0.75    0.781, -0.7150       0.770, -0.7108       0.982, -0.8295       -0.6319
0.95    0.720, -0.6467       0.715, -0.6440       0.977, -0.7485       -0.5931
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)     25th Percentile
0       0.965, -0.8196       0.832, -0.7597       0.997, -0.9135       -0.6744
0.1     0.927, -0.8131       0.802, -0.7528       0.990, -0.9281       -0.6761
0.25    0.875, -0.8048       0.747, -0.7482       0.988, -0.9421       -0.6780
0.5     0.782, -0.7666       0.709, -0.7253       0.988, -0.9210       -0.6646
0.75    0.787, -0.7162       0.713, -0.6819       0.990, -0.8610       -0.6319
0.95    0.722, -0.6468       0.634, -0.6155       0.988, -0.7766       -0.5931
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)       25th Percentile
0       0.937, -0.8104       0.994, -0.9798       -0.6744
0.1     0.954, -0.8414       0.967, -0.8561       -0.6761
0.25    0.968, -0.8858       0.972, -0.8838       -0.6780
0.5     0.981, -0.9425       0.980, -0.9414       -0.6646
0.75    0.990, -0.9758       0.991, -0.9755       -0.6319
0.95    0.996, -0.9974       0.996, -0.9972       -0.5931
(c) Traditional Methods


Table 4.30: Results from 1000 simulations where the data is generated with τi following a double exponential distribution and εij following a normal distribution. Each dataset had 27 clusters and 285 observations.

σ²τ     NPboot (CP, Mean)    NP (CP, Mean)        NPHD (CP, Mean)      25th Percentile
0       0.951, -0.8224       0.949, -0.8157       0.986, -0.8788       -0.6744
0.1     0.899, -0.8154       0.888, -0.8085       0.966, -0.8735       -0.6725
0.25    0.841, -0.8065       0.827, -0.8000       0.973, -0.9233       -0.6637
0.5     0.786, -0.7861       0.774, -0.7794       0.961, -0.9849       -0.6348
0.75    0.710, -0.7240       0.698, -0.7174       0.966, -1.0081       -0.5841
0.95    0.663, -0.6675       0.654, -0.6610       0.974, -1.0583       -0.5136
(a) Order Statistics

σ²τ     SB SJ (CP, Mean)     SB W (CP, Mean)      SB HD (CP, Mean)     25th Percentile
0       0.969, -0.8196       0.832, -0.7598       0.996, -0.9134       -0.6744
0.1     0.905, -0.8139       0.798, -0.7561       0.993, -0.9384       -0.6725
0.25    0.835, -0.8039       0.752, -0.7470       0.990, -0.9717       -0.6637
0.5     0.785, -0.7862       0.708, -0.7270       0.980, -1.0179       -0.6348
0.75    0.719, -0.7283       0.660, -0.6680       0.981, -1.0305       -0.5841
0.95    0.668, -0.6682       0.630, -0.5964       0.982, -1.073        -0.5136
(b) Methods that involve kernel density estimation

σ²τ     KM (CP, Mean)        LLI (CP, Mean)       25th Percentile
0       0.941, -0.8104       0.994, -0.9792       -0.6744
0.1     0.951, -0.8474       0.967, -0.8640       -0.6725
0.25    0.964, -0.8957       0.961, -0.8937       -0.6637
0.5     0.961, -0.9659       0.960, -0.9655       -0.6348
0.75    0.973, -1.0180       0.975, -1.0178       -0.5841
0.95    0.986, -1.0550       0.987, -1.0550       -0.5136
(c) Traditional Methods

Chapter 5

Application

There is a huge demand for lumber, and in an attempt to satisfy this demand there are numerous lumber mills across the United States. One way to study the marginal distribution of a strength measure of a board is to collect boards from a random sample of mills in the United States and analyze the data using model (1.1). Recall that the variance of an observation in model (1.1) is σ²τ + σ²ε. The first term of the variance is due to differences in manufacturing practices from mill to mill. The second term of the variance can be explained by the natural variation between the trees themselves.

There are many ways to quantify the quality of lumber; three examples are moisture content, modulus of elasticity, and modulus of rupture. The modulus of elasticity (MOE) is a measure of the maximum amount of force that a board can withstand and still retain its shape. The modulus of rupture (MOR) is the maximum amount of force a board can withstand without breaking. There is federal regulation to ensure that wood used for construction is of a certain quality; these standards are defined in terms of lower percentiles (for example, the 10th or 25th) of ratios of measures of strength.

Our data set has information from 41 different mills (clusters), and there is a slight imbalance in the data. For most mills we have 10 observations, but for two mills we only have 7 and 8 observations, and for two others we have 9 observations. Lumber strength does not follow a normal distribution, but rather has been successfully modeled with a Weibull distribution. Figure 5.1 shows histograms of the raw data, the estimated cluster means, and the residuals, yij − ȳi·, for MOE, MOR, and MOR/MOE.

[Figure 5.1 appears here: (a) histogram of observed MOE values; (b) histogram of estimated cluster effects of MOE; (c) histogram of residuals of MOE; (d) histogram of observed MOR values; (e) histogram of estimated cluster effects of MOR; (f) histogram of residuals of MOR; (g) histogram of observed MOR/MOE values; (h) histogram of estimated cluster effects of MOR/MOE; (i) histogram of residuals of MOR/MOE.]

Figure 5.1: Visual summary of MOE, MOR, and MOR/MOE.

To check the appropriateness of our model, we perform Levene's test of

\[
H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_{41}^2
\qquad \text{versus} \qquad
H_1: \text{at least one variance differs.}
\]

The results are recorded in Table 5.1 for MOE, MOR, and MOR/MOE. MOE and MOR appear not to have constant variance among clusters, since their respective p-values for Levene's test are much smaller than 0.05, while MOR/MOE does appear to have constant variance across clusters, since its p-value for Levene's test is larger than 0.05. Hence (1.1) may be an appropriate model for MOR/MOE.

Quantity of Interest    Mean        Standard Deviation    ICC (est.)    Fobs        P-value
MOE                     1.362021    0.3811267             0.1333063     1.911005    0.001135124
MOR                     5884.724    2497.742              0.08196682    3.039252    < 0.0001
MOR/MOE                 4303.697    1294.481              0.04744078    1.218551    0.1781634

Table 5.1: Results from Levene's test for several measures of lumber strength, as well as summary statistics. Here ICC (est.) is the estimated intra-class correlation for each quantity of interest.
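As an illustration of how this check can be carried out, the sketch below runs Levene's test across mills with scipy. The file name and the column names (mill, mor_moe) are hypothetical placeholders rather than the actual layout of our data set.

```python
# Sketch: Levene's test for equal within-mill variances of MOR/MOE (hypothetical column names).
import pandas as pd
from scipy.stats import levene

boards = pd.read_csv("lumber.csv")              # assumed long format: one row per board
ratio_by_mill = [grp["mor_moe"].to_numpy()      # MOR/MOE values for each mill (cluster)
                 for _, grp in boards.groupby("mill")]

# center="mean" gives the classical Levene test; center="median" is the Brown-Forsythe variant.
stat, pval = levene(*ratio_by_mill, center="mean")
print(f"Levene's test: F_obs = {stat:.3f}, p-value = {pval:.4f}")
```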

Figure 5.2: Kernel density estimate of the marginal distribution of MOE of a board. The blue kernel density estimate uses resampling to compute the bandwidth, while the red uses the Sheather and Jones plug-in method.

A first step toward obtaining estimates of percentiles is to get a good estimate of the marginal density of each quality measure. We will use two kernel smoothing approaches to estimate the marginal distribution of MOE, MOR, or MOR/MOE, with the bandwidth selection methods outlined in Chapter 3. In particular, we compare bandwidths found using resampling (3.3) and the standard plug-in method developed by Sheather and Jones.

Figure 5.3: Kernel density estimate of the marginal distribution of MOR of a board. The blue kernel density estimate uses resampling to compute the bandwidth, while the red uses the Sheather and Jones plug-in method.

Figure 5.4: Kernel density estimate of the marginal distribution of the ratio MOR/MOE of a board. The blue kernel density estimate uses resampling to compute the bandwidth, while the red uses the Sheather and Jones plug-in method.

From Figures 5.3 and 5.4, we can see that using the Sheather and Jones plug-in method leads to a density estimate that is noisy and attempts to capture too many features of the population. On the other hand, the kernel density estimate that uses resampling techniques to compute the bandwidth is much smoother yet still seems to estimate the population well. We see that in Figure 5.2 there is not a dramatic difference between the density estimates. A possible explanation of this difference is that the variance of MOE is significantly smaller than that of MOR or the ratio MOR/MOE.
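The resampling idea behind the smoother (blue) estimates can be sketched as follows. This is only a simplified illustration assuming Gaussian kernels and a rule-of-thumb bandwidth on each resample, not the optimal bandwidth derived in Chapter 3, and `moe_by_mill` is a hypothetical list of per-mill observation vectors.

```python
# Sketch: average Q kernel density estimates, each built from one observation per mill.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def resampled_kde(clusters, grid, Q=200):
    """clusters: list of 1-d arrays (one per mill); grid: points at which to evaluate."""
    estimates = np.zeros((Q, grid.size))
    for q in range(Q):
        # draw one randomly chosen observation from every cluster (an i.i.d.-like sample)
        sample = np.array([rng.choice(c) for c in clusters])
        estimates[q] = gaussian_kde(sample)(grid)   # rule-of-thumb bandwidth per resample
    return estimates.mean(axis=0)                   # average of the Q density estimates

# usage (hypothetical data):
# grid = np.linspace(0.5, 2.5, 400)
# fhat = resampled_kde(moe_by_mill, grid)
```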

Lastly, we use methods from Chapter 4, namely SB HD, KM, and LLI, to estimate the 10th and 25th percentiles of MOE, MOR, and MOR/MOE. The results are summarized below in Table 5.2. Summary statistics are provided in Table 5.1. We see that all three methods produce similar estimates for the tolerance limits relative to their respective standard deviations.

             Estimated 10th Percentile                Estimated 25th Percentile
             KM        LLI       SB HD     NP         KM        LLI       SB HD     NP
MOE          0.811     0.812     0.817     0.910      1.052     1.053     1.030     1.117
MOR          2301.73   2312.93   2275.10   2961.08    3883.66   3885.90   3526.96   3982.19
MOR/MOE      2461.12   2470.62   2415.06   2600.61    3280.73   3286.00   3114.78   3284.73

Table 5.2: Estimates of the 10th and 25th percentiles of measures of lumber strength.
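For readers who want the flavor of the bootstrap-based limits, here is a bare-bones cluster (mill) bootstrap of a one-sided lower limit for the 25th percentile: it resamples whole mills and takes the 5th percentile of the bootstrap percentiles. It is only a simplified stand-in for the NPboot and SB HD procedures developed in Chapter 4, and `moe_by_mill` is again a hypothetical list of per-mill arrays.

```python
# Sketch: cluster-bootstrap (0.75, 0.95) lower tolerance limit.
import numpy as np

def cluster_boot_lower_limit(clusters, p=0.25, alpha=0.05, B=2000, seed=0):
    """clusters: list of 1-d arrays, one per mill."""
    rng = np.random.default_rng(seed)
    a = len(clusters)
    boot_percentiles = np.empty(B)
    for b in range(B):
        picks = rng.integers(0, a, size=a)               # resample mills with replacement
        resample = np.concatenate([clusters[i] for i in picks])
        boot_percentiles[b] = np.quantile(resample, p)   # 25th percentile of the resample
    return np.quantile(boot_percentiles, alpha)          # percentile-method lower bound

# usage (hypothetical data): limit = cluster_boot_lower_limit(moe_by_mill)
```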

Chapter 6

Conclusion

We have shown that kernel density estimation can be extended to situations where complex sampling schemes are employed. We proposed two novel kernel density estimators that combine density estimates based upon collections of i.i.d. observations. Approximations of the AMISE were derived, and a variety of bandwidth selection methods were explored for each proposed kernel density estimator. Simulations indicate that, for both balanced and unbalanced data, accounting for the hierarchical structure with our kernel density estimators can provide more accurate density estimates for high values of intra-class correlation than traditional kernel density estimators that use the Sheather and Jones method for bandwidth selection.

We have also illustrated that bootstrapping techniques that mimic the decomposition of the sum of squares for one-way random effects models, as suggested by Davison and Hinkley (1999), can be useful for estimating tolerance limits for hierarchical data. We also discovered that applying the smooth bootstrap to the estimated cluster means and residuals from the decomposition that Davison and Hinkley suggested can lead to sensible tolerance limits. We found that drawing bootstrap resamples from kernel density estimates, whether they attempt to account for the sampling scheme or treat the data as i.i.d., produced tolerance limits with low coverage probabilities. Methods that employed smoothing and bootstrapping performed consistently for each data configuration considered, while methods that assume normality produced reasonable tolerance limits when the marginal density was symmetric. If the underlying population density is skewed, then the Davison and Hinkley method and its smoothed version produce the best tolerance limits of the methods explored.

Chapter 7

Future Work

In the application chapter, we performed Levene's test to determine whether there is significant evidence of heteroskedasticity between the groups. Model (1.1) is not appropriate if there is heteroskedasticity. A model that can accommodate differences in the variability from group to group is

\[
y_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n_i, \tag{7.1}
\]

where {τ1, τ2, . . . , τa} is a collection of i.i.d. random variables. We assume that the observations {εi1, εi2, . . . , εini} from each group have mean zero and variance σ²i for each cluster i = 1, . . . , a. Since each collection {εi1, εi2, . . . , εini} is allowed to have a different variance, selecting one observation from each cluster only provides an independent but not identically distributed sample. It may be possible to think of each observation as coming from a mixture model. If a priori information is available, we can specify a distribution for σi and use Gibbs sampling techniques to estimate a posterior distribution for each observation. Once this posterior distribution is estimated, we could generate samples from it and estimate tolerance limits. A downside to a Bayesian approach is that we move from making as few assumptions as possible to using a method that requires parameterizing the distributions of τi and εij.
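For concreteness, data from the heteroskedastic model (7.1) could be generated as in the sketch below; the gamma prior on the cluster standard deviations σi is purely hypothetical and is meant only to illustrate the model, not a fitted Bayesian procedure.

```python
# Sketch: simulate from model (7.1) with cluster-specific error variances.
import numpy as np

rng = np.random.default_rng(1)

def simulate_hetero(a=40, n_i=10, mu=0.0, sigma_tau=0.5):
    tau = rng.normal(0.0, sigma_tau, size=a)             # i.i.d. cluster effects
    sigma_i = rng.gamma(shape=2.0, scale=0.5, size=a)    # hypothetical prior on sigma_i
    # each cluster gets its own error variance sigma_i^2
    return [mu + tau[i] + rng.normal(0.0, sigma_i[i], size=n_i) for i in range(a)]

clusters = simulate_hetero()
```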

One-sided (1 − p, 1 − α) tolerance intervals are (1 − α)100% confidence intervals for the pth percentile of the population of interest. When considering two-sided tolerance limits, estimation is not as simple as estimating a percentile. Rebafka (2007) proposed a method for finding two-sided tolerance limits that utilizes the double bootstrap and caters to hierarchical linear models. A potential downside to this method is that both the upper and lower tolerance limits will be sample values. An avenue for future research could be to attempt to extend the smoothed bootstrap methodology proposed in Algorithm 2 of Chapter 4 to two-sided tolerance intervals.

Appendices

Appendix A

Marginal Distribution with Non-Normal Errors or Cluster Effects

We wish to perform simulations to explore the robustness of the methods developed for both kernel density estimation and finding tolerance limits. To that end, we generate datasets from (1.1) with different distributions for τi and εij, where we assume that Yij has mean 0 and unit variance. We will consider a distribution that is skewed right and a distribution that is symmetric.

Log Normal Distribution

Suppose that U is a normal random variable with mean θ ∈ ℝ and variance σ² > 0. If we consider the transformation V = exp(U), then V is said to follow a log normal distribution with parameters θ and σ². V has pdf

\[
f(v) = \frac{1}{v\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log(v) - \theta)^2}{2\sigma^2}\right), \qquad v > 0,
\]

and V has mean exp(θ + σ²/2) and variance (exp(σ²) − 1) exp(2θ + σ²).

If we consider the hierarchical model (1.1) with µ = 0, we are left with

\[
Y_{ij} = \tau_i + \epsilon_{ij}.
\]

Suppose that τi ∼ N(0, σ²τ) and εij ∼ LN(ν, η²), each i.i.d.; we seek a parameterization of ν and η such that εij has a mean of 0 and a variance of 1 − σ²τ. Note that since the possible values of a log normal random variable are positive, the mean of a log normal random variable must also be positive. We therefore consider the random variable εij − 1, which can have mean 0. We can solve the following system of equations:

\[
E[\epsilon_{ij} - 1] = 0, \qquad \mathrm{Var}(\epsilon_{ij} - 1) = \mathrm{Var}(\epsilon_{ij}) = 1 - \sigma_\tau^2, \tag{A.1}
\]

which can be rewritten as

\[
\exp(\nu + \eta^2/2) = 1, \qquad (\exp(\eta^2) - 1)\exp(2\nu + \eta^2) = 1 - \sigma_\tau^2. \tag{A.2}
\]

This yields ν = −η²/2 and η² = log(2 − σ²τ). Finally, if εij ∼ LN(−log(2 − σ²τ)/2, log(2 − σ²τ)), then Yij − 1 has mean 0 and variance 1.

Through the use of convolutions we can compute both the pdf and the cdf of the sum of a normal random variable and a log normal random variable. Using the above parameterization, Table A.1 contains quantities we will use to assess the performance of tolerance limits. Additionally, plots of the marginal density of Yij − 1 are provided in Figure A.1. Notice that with extremely high values of σ²τ, the marginal pdf does not depart much from a standard normal pdf.
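A quick Monte Carlo check of this parameterization (a sketch outside the simulation study itself) confirms that τi + (εij − 1) has approximately mean 0 and variance 1:

```python
# Sketch: verify the log normal moment matching for one value of sigma_tau^2.
import numpy as np

rng = np.random.default_rng(2)
sigma_tau2 = 0.25
eta2 = np.log(2.0 - sigma_tau2)     # eta^2 = log(2 - sigma_tau^2)
nu = -eta2 / 2.0                    # nu = -eta^2 / 2

n = 1_000_000
tau = rng.normal(0.0, np.sqrt(sigma_tau2), size=n)
eps = rng.lognormal(mean=nu, sigma=np.sqrt(eta2), size=n)
y = tau + (eps - 1.0)               # Y_ij - 1

print(y.mean(), y.var())            # should be close to 0 and 1, respectively
```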

ρ \ Quantile    0.1        0.25
0               -0.7516    -0.5967
0.1             -0.8848    -0.6019
0.25            -1.0184    -0.6319
0.5             -1.1659    -0.6660
0.75            -1.2806    -0.6750
0.95            -1.2816    -0.6745

Table A.1: The 10th and 25th percentiles of the sum of τi ∼ N(0, σ²τ) and εij − 1, where εij follows the log normal distribution parameterized above with variance 1 − σ²τ, for varying amounts of correlation.

i.i.d 2 i.i.d 2 Let’s consider model (1.1) τi ∼ LN(ν, η ) and ij ∼ N(0, σ ), we seek a parameterization 2 of ν and η such that τi has a mean of 0 and a variance of στ . We can solve the following system of

78 2 Density of τi + εij − 1 with various values of στ 2 2 στ = 0 στ = 0.1

fτ+ε−1(x) 0.6 fτ+ε−1(x)

0.8 φ(x) φ(x) ) ) 0.4 x x ( ( f f 0.4 0.2 0.0 0.0 −4 −2 0 2 4 −4 −2 0 2 4 x x

2 2 στ = 0.25 στ = 0.5

fτ+ε−1(x) 0.4 fτ+ε−1(x)

0.4 φ(x) φ(x) 0.3 ) ) x x ( ( f f 0.2 0.2 0.1 0.0 0.0 −4 −2 0 2 4 −4 −2 0 2 4 x x

2 2 στ = 0.75 στ = 0.95 0.4 0.4

fτ+ε−1(x) fτ+ε−1(x) 0.3 0.3 φ(x) φ(x) ) ) x x ( ( 0.2 f f 0.2 0.1 0.1 0.0 0.0 −4 −2 0 2 4 −4 −2 0 2 4 x x

2 Figure A.1: Above are plots of the marginal pdf of an observation from model (1.1) if τi ∼ N(0, στ ) 2 and ij ∼ LN(0, 1 − στ ). equations:

\[
E[\tau_i - 1] = 0, \qquad \mathrm{Var}(\tau_i - 1) = \mathrm{Var}(\tau_i) = \sigma_\tau^2, \tag{A.3}
\]

which can be rewritten as

\[
\exp(\nu + \eta^2/2) = 1, \qquad (\exp(\eta^2) - 1)\exp(2\nu + \eta^2) = \sigma_\tau^2. \tag{A.4}
\]

This yields ν = −η²/2 and η² = log(1 + σ²τ). Finally, if τi ∼ LN(−log(1 + σ²τ)/2, log(1 + σ²τ)), then Yij − 1 has mean 0 and variance 1. Table A.2 contains quantities we will use to assess the performance of tolerance limits. Additionally, plots of the marginal density of Yij − 1 are provided in Figure A.2.

ρ \ Quantile    0.1        0.25
0               -1.2815    -0.6744
0.1             -1.2794    -0.6761
0.25            -1.2530    -0.6780
0.5             -1.1659    -0.6646
0.75            -1.0184    -0.6319
0.95            -0.8254    -0.5931

Table A.2: The 10th and 25th percentiles of the sum of τi − 1 and εij ∼ N(0, 1 − σ²τ), where τi follows the log normal distribution parameterized above with variance σ²τ, for varying amounts of correlation.

Figure A.2: Plots of the marginal pdf of an observation from model (1.1) when τi − 1 follows the log normal distribution parameterized above and εij ∼ N(0, 1 − σ²τ), for various values of σ²τ, compared with the standard normal pdf φ(x).

Double Exponential (Laplace) Distribution

Let X be distributed double exponential with location parameter θ and scale parameter b. Then X has pdf

\[
f(x) = \frac{1}{2b}\exp\left(-\frac{|x - \theta|}{b}\right),
\]

and X has mean θ and variance 2b².

Again we consider model (1.1), setting µ = 0:

\[
y_{ij} = \tau_i + \epsilon_{ij}.
\]

If we assume that τi ∼ N(0, σ²τ) and εij ∼ DE(0, b), each i.i.d., we solve 2b² = 1 − σ²τ, which yields b = √((1 − σ²τ)/2). If εij ∼ DE(0, √((1 − σ²τ)/2)), then Yij has mean 0 and variance 1. Table A.3 lists the 10th and 25th percentiles of the distribution of a single Yij. We also include comparisons between the marginal distribution of Yij and a standard normal distribution, in Figure A.3, for several values of σ²τ.

Figure A.3: Plots of the marginal pdf of an observation from model (1.1) when τi ∼ N(0, σ²τ) and εij follows a double exponential distribution with mean 0 and variance 1 − σ²τ, for various values of σ²τ, compared with the standard normal pdf φ(x).

Let’s consider the model (1.1) setting µ = 0,

yij = τi + ij,

i.i,d q 2 i.i.d 2 2 2 2 στ where τi ∼ DE(0, στ ) and ij ∼ N 0, 1 − στ . We simply solve 2b = στ , which yields b = 2 . If

81  q 2  στ th th τi ∼ DE 0, 2 , the Yij has mean of 0 and variance 1. Table A.4 lists the 10 and 25 percentile of distribution of a single Yij. We also include comparisons between the marginal distribution if Yij

2 and a standard normal distribution, in Figure A.4, for several values of στ .
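The percentiles reported in Tables A.3 and A.4 can also be approximated without the convolution, as in the Monte Carlo sketch below for the case of double exponential cluster effects and normal errors; the values are approximations rather than the exact convolution-based quantities.

```python
# Sketch: Monte Carlo approximation of marginal percentiles when tau_i is double exponential.
import numpy as np

rng = np.random.default_rng(3)

def marginal_percentiles(sigma_tau2, probs=(0.10, 0.25), n=2_000_000):
    b = np.sqrt(sigma_tau2 / 2.0)                    # Var(tau_i) = 2 b^2 = sigma_tau^2
    tau = rng.laplace(loc=0.0, scale=b, size=n)      # double exponential cluster effects
    eps = rng.normal(0.0, np.sqrt(1.0 - sigma_tau2), size=n)
    return np.quantile(tau + eps, probs)

print(marginal_percentiles(0.5))   # compare with the sigma_tau^2 = 0.5 row of Table A.4
```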

Figure A.4: Plots of the marginal pdf of an observation from model (1.1) when τi follows a double exponential distribution with mean 0 and variance σ²τ and εij ∼ N(0, 1 − σ²τ), for various values of σ²τ, compared with the standard normal pdf φ(x).

ρ \ Quantile    0.1        0.25
0               -1.1380    -0.4091
0.1             -1.1429    -0.5353
0.25            -1.1857    -0.5184
0.5             -1.2359    -0.6348
0.75            -1.2810    -0.6637
0.95            -1.2816    -0.6739

Table A.3: The 10th and 25th percentiles of the sum of τi ∼ N(0, σ²τ) and εij, where εij follows a double exponential distribution with mean 0 and variance 1 − σ²τ, for varying amounts of correlation.

ρ \ Quantile    0.1        0.25
0               -1.2815    -0.6744
0.1             -1.2794    -0.6725
0.25            -1.2694    -0.6637
0.5             -1.2359    -0.6348
0.75            -1.1857    -0.5841
0.95            -1.1455    -0.5136

Table A.4: The 10th and 25th percentiles of the marginal distribution of observations from model (1.1) when τi follows a double exponential distribution with mean 0 and variance σ²τ and εij ∼ N(0, 1 − σ²τ).

Appendix B

Approximate Covariance of Density Estimates in (3.3)

The following statement of the multivariate Taylor's theorem is from Large Sample Techniques for Statistics by Jiming Jiang (2010).

Theorem (Multivariate Taylor's Theorem): Let f : D → ℝ, where D ⊂ ℝ^s. Suppose that there is a neighborhood of a, N_δ(a) ⊂ D, such that f and its (l + 1)st partial derivatives are continuous in N_δ(a). Then for x ∈ N_δ(a), we have

\[
f(x) = f(a) + \sum_{k=1}^{l} \frac{1}{k!}\left[(x - a)^T \nabla\right]^k f(a) + \frac{1}{(l+1)!}\left[(x - a)^T \nabla\right]^{l+1} f(z), \tag{B.1}
\]

where z = tx + (1 − t)a for some t ∈ [0, 1]. An expression for a first order approximation is

\[
f(x) \approx f(a) + \frac{\partial f(a)}{\partial x^T}(x - a), \tag{B.2}
\]

where, with a slight abuse of notation, the derivative of f is taken with respect to each of the variables and then evaluated at a.

Taylor Expansion of a Kernel Function

Let K(·) be a kernel function as defined in Chapter 2. A first order Taylor approximation is given by

\[
\frac{1}{h}K\left(\frac{y - Y_{11}}{h}\right) \approx \frac{1}{h}K\left(\frac{y - \mu}{h}\right) + \frac{(Y_{11} - \mu)}{h^2}K'\left(\frac{y - \mu}{h}\right). \tag{B.3}
\]

Our aim is to derive an expression for the covariance of two kernel functions that are centered about two observations from the same cluster. In other words, we seek an expression for

\[
\mathrm{Cov}\left(\frac{1}{h}K\left(\frac{y - Y_{11}}{h}\right), \frac{1}{h}K\left(\frac{y - Y_{12}}{h}\right)\right) = E\left[\frac{1}{h^2}K\left(\frac{y - Y_{11}}{h}\right)K\left(\frac{y - Y_{12}}{h}\right)\right] - E\left[\frac{1}{h}K\left(\frac{y - Y_{11}}{h}\right)\right]E\left[\frac{1}{h}K\left(\frac{y - Y_{12}}{h}\right)\right]. \tag{B.4}
\]

Note that Y11 and Y12 are identically distributed. An approximation of the first moment of (B.3) is

\[
E\left[\frac{1}{h}K\left(\frac{y - Y_{11}}{h}\right)\right] \approx E\left[\frac{1}{h}K\left(\frac{y - \mu}{h}\right) + \frac{(Y_{11} - \mu)}{h^2}K'\left(\frac{y - \mu}{h}\right)\right] = \frac{1}{h}K\left(\frac{y - \mu}{h}\right),
\]

since E[Y11 − µ] = 0.

An approximation of the first term in (B.4) is

\[
\begin{aligned}
E\left[\frac{1}{h^2}K\left(\frac{y - Y_{11}}{h}\right)K\left(\frac{y - Y_{12}}{h}\right)\right]
&= E\left[\frac{1}{h^2}\left(K\left(\frac{y - \mu}{h}\right) + \frac{(Y_{11} - \mu)}{h}K'\left(\frac{y - \mu}{h}\right)\right)\left(K\left(\frac{y - \mu}{h}\right) + \frac{(Y_{12} - \mu)}{h}K'\left(\frac{y - \mu}{h}\right)\right)\right] \\
&= \frac{1}{h^2}K\left(\frac{y - \mu}{h}\right)^2 + E\left[\frac{(Y_{11} - \mu)(Y_{12} - \mu)}{h^4}\right]K'\left(\frac{y - \mu}{h}\right)^2 \\
&= \frac{1}{h^2}K\left(\frac{y - \mu}{h}\right)^2 + \frac{\sigma_\tau^2}{h^4}K'\left(\frac{y - \mu}{h}\right)^2.
\end{aligned}
\]

Now we plug these two results into (B.4) to obtain

\[
\mathrm{Cov}\left(\frac{1}{h}K\left(\frac{y - Y_{11}}{h}\right), \frac{1}{h}K\left(\frac{y - Y_{12}}{h}\right)\right) \approx \frac{\sigma_\tau^2}{h^4}K'\left(\frac{y - \mu}{h}\right)^2. \tag{B.5}
\]

Appendix C

Proofs

Proof of Lemma 3.1

We first need to account for the resampling, so we take the mean of $\hat{f}^*_j(y)$ with respect to the empirical cdf,

\[
E^*[\hat{f}^*_j(y)] = E^*\left[\frac{1}{ah}\sum_{i=1}^{a} K\left(\frac{y - Y_{ij^*}}{h}\right)\right] = \frac{1}{ah}\sum_{i=1}^{a}\frac{1}{n_i}\sum_{k=1}^{n_i} K\left(\frac{y - Y_{ik}}{h}\right). \tag{C.1}
\]

Taking the mean of (C.1), treating Yik as random,

\[
E\left[E^*[\hat{f}^*_j(y)]\right] = E\left[\frac{1}{ah}\sum_{i=1}^{a}\frac{1}{n_i}\sum_{k=1}^{n_i} K\left(\frac{y - Y_{ik}}{h}\right)\right] = \frac{1}{ah}\sum_{i=1}^{a} E\left[K\left(\frac{y - Y_{ik}}{h}\right)\right] = f(y) + \frac{h^2\mu_2(K)f''(y)}{2} + o(h^2). \tag{C.2}
\]

Thus the bias is not affected if we use a resampled kernel density estimator.

For the second moment of the resampled kernel density estimate, we take the second moment with respect to the empirical cdf, yielding

\[
\begin{aligned}
E^*[\hat{f}^*_j(y)^2] &= E^*\left[\left(\frac{1}{ah}\sum_{i=1}^{a} K\left(\frac{y - Y_{ij^*}}{h}\right)\right)^2\right] \\
&= \frac{1}{(ah)^2}\sum_{i=1}^{a} E^*\left[K\left(\frac{y - Y_{ij^*}}{h}\right)^2\right] + \frac{1}{(ah)^2}\sum_{i=1}^{a}\sum_{i' \neq i} E^*\left[K\left(\frac{y - Y_{ij^*}}{h}\right)K\left(\frac{y - Y_{i'(j')^*}}{h}\right)\right] \\
&= \frac{1}{(ah)^2}\sum_{i=1}^{a}\frac{1}{n_i}\sum_{k=1}^{n_i} K\left(\frac{y - Y_{ik}}{h}\right)^2 + \frac{1}{(ah)^2}\sum_{i=1}^{a}\sum_{i' \neq i}\frac{1}{n_i n_{i'}}\sum_{k=1}^{n_i}\sum_{k'=1}^{n_{i'}} K\left(\frac{y - Y_{ik}}{h}\right)K\left(\frac{y - Y_{i'k'}}{h}\right). \tag{C.3}
\end{aligned}
\]

Now we take the mean of (C.3) with respect to Yik,

\[
\begin{aligned}
E[\hat{f}^*_j(y)^2] &= E\left[E^*[\hat{f}^*_j(y)^2]\right] \\
&= E\left[\frac{1}{(ah)^2}\sum_{i=1}^{a}\frac{1}{n_i}\sum_{k=1}^{n_i} K\left(\frac{y - Y_{ik}}{h}\right)^2\right] + E\left[\frac{1}{(ah)^2}\sum_{i=1}^{a}\sum_{i' \neq i}\frac{1}{n_i n_{i'}}\sum_{k=1}^{n_i}\sum_{k'=1}^{n_{i'}} K\left(\frac{y - Y_{ik}}{h}\right)K\left(\frac{y - Y_{i'k'}}{h}\right)\right] \\
&= \frac{1}{(ah)^2}\sum_{i=1}^{a} E\left[K\left(\frac{y - Y_{ik}}{h}\right)^2\right] + \frac{1}{(ah)^2}\sum_{i=1}^{a}\sum_{i' \neq i} E\left[K\left(\frac{y - Y_{ik}}{h}\right)\right]E\left[K\left(\frac{y - Y_{i'k'}}{h}\right)\right] \\
&= \frac{R(K)f(y)}{ah} + o((ah)^{-1}) + \frac{a - 1}{a}\left(f(y) + \frac{h^2\mu_2(K)f''(y)}{2} + o(h^2)\right)^2.
\end{aligned}
\]

To obtain the variance (3.3), we subtract the squared mean from the second moment of $\hat{f}^*_j(y)$; hence $\hat{f}^*_j(y)$ is a consistent estimator of f(y). This results from

\[
\begin{aligned}
\mathrm{Var}(\hat{f}^*_j(y)) &= E\left[E^*[\hat{f}^*_j(y)^2]\right] - \left(E\left[E^*[\hat{f}^*_j(y)]\right]\right)^2 \\
&= \frac{R(K)f(y)}{ah} + o((ah)^{-1}) + \frac{a - 1}{a}\left(f(y) + \frac{h^2\mu_2(K)f''(y)}{2} + o(h^2)\right)^2 - \left(f(y) + \frac{h^2\mu_2(K)f''(y)}{2} + o(h^2)\right)^2 \\
&= \frac{R(K)f(y)}{ah} - \frac{1}{a}\left(f(y) + \frac{h^2\mu_2(K)f''(y)}{2} + o(h^2)\right)^2 + o((ah)^{-1}) \\
&= \frac{R(K)f(y)}{ah} + o((ah)^{-1}). \tag{C.4}
\end{aligned}
\]

Notice that the variance of a kernel density estimator is asymptotically the same regardless of whether resampling is used. Also note that both the bias and the variance go to 0 as a → ∞. □

Proof of Lemma 3.2

We start out by finding the empirical covariance of two resampled density estimates, which yields

\[
\begin{aligned}
\mathrm{Cov}^*(\hat{f}^*_j(y), \hat{f}^*_{j'}(y)) &= E^*[\hat{f}^*_j(y)\hat{f}^*_{j'}(y)] - E^*[\hat{f}^*_j(y)]^2 \\
&= \frac{1}{(ah)^2}\sum_{i=1}^{a}\sum_{i'=1}^{a} E^*\left[K\left(\frac{y - Y_{ij^*}}{h}\right)K\left(\frac{y - Y_{i'(j')^*}}{h}\right)\right] - E^*[\hat{f}^*_j(y)]^2 \\
&= \frac{1}{(ah)^2}\sum_{i=1}^{a}\left\{\frac{1}{n_i^2}\sum_{k=1}^{n_i} K\left(\frac{y - Y_{ik}}{h}\right)^2 + \frac{1}{n_i^2}\sum_{k=1}^{n_i}\sum_{k' \neq k} K\left(\frac{y - Y_{ik}}{h}\right)K\left(\frac{y - Y_{ik'}}{h}\right)\right\} \\
&\quad + \frac{1}{(ah)^2}\sum_{i=1}^{a}\sum_{i' \neq i}\frac{1}{n_i n_{i'}}\sum_{k=1}^{n_i}\sum_{k'=1}^{n_{i'}} K\left(\frac{y - Y_{ik}}{h}\right)K\left(\frac{y - Y_{i'k'}}{h}\right) - E^*[\hat{f}^*_j(y)]^2,
\end{aligned}
\]

where the third equality follows by conditioning on which observation within each cluster the resampling indices $j^*$ and $(j')^*$ select.

To understand the joint behavior of two resampled kernel density estimates when the sample is considered random, we take the expected value of the empirical covariance,

" a ( ni  2 ni    ) ∗ ∗ 1 X 1 X y − Yik 1 X X y − Yik y − Yik0 Cov(f (y), f 0 (y)]) = E K + K K j j (ah)2 n2 h n2 h h i=1 i k=1 i k=1 k0=k  a ni ni0     1 X X 1 X X y − Yik y − Yi0k0 ˆ∗ 2 + 2 K K − E[fj (y)]  (ah) nini0 h h i=1 i06=i k=1 k0=1 a ( "  2#     ) 1 X 1 y − Yik ni − 1 y − Yik y − Yik0 = E K + E K K (ah)2 n h n h h i=1 i i a       1 X X 1 y − Yik 1 y − Yi0k0 + E K E K − E[E∗[fˆ∗(y)]2] a2 h h h h j i=1 i06=i a (   2 0 y−µ  2) 1 X 1 R(K)f(y) ni − 1 στ K ≈ + o(a−1) + h (ah)2 n h n h2 i=1 i i a  2 00 2  2 00 2 1 X X h µ2(K)f (y) h µ2(K)f (y) + f(y) + + o(h2) − f(y) + + o(h2) a2 2 2 i=1 i06=i a ( 2  2) X R(K)f(y) (ni − 1)σ y − µ = (ah)−1 + τ K0 + O(a−1), (C.5) an an h3 h i=1 i i

which is the desired result. The approximation of E[K((y − Yik)/h) K((y − Yik')/h)] is discussed in Appendix B. □

Proof of Theorem 3.1

Parzen (1962) showed that, assuming (ah)^{1/2} h² → 0 (or h ∝ a^β where β < −2/5) and h → 0 as a → ∞,

\[
(ah)^{1/2}(\hat{f}^*_j(y) - f(y)) \to N\left(0, R(K)f(y)\right).
\]

Now we consider

\[
(ah)^{1/2}(\hat{f}_R(y) - f(y)) = (ah)^{1/2}\left(\frac{1}{Q}\sum_{j=1}^{Q}\hat{f}^*_j(y) - f(y)\right) = \frac{1}{Q}\sum_{j=1}^{Q}(ah)^{1/2}(\hat{f}^*_j(y) - f(y)).
\]

Since we are summing Q (< ∞) quantities, we can take the limit as a → ∞ and then sum Q dependent, asymptotically normal random variables. Assuming that the squared bias goes to 0 faster than the variance, $(ah)^{1/2}(\hat{f}_R(y) - f(y))$ has mean 0. We now seek the variance of the random variable in (3.11),

\[
\begin{aligned}
\mathrm{Var}\left((ah)^{1/2}(\hat{f}_R(y) - f(y))\right) &= \frac{ah}{Q^2}\left[\sum_{j=1}^{Q}\mathrm{Var}(\hat{f}^*_j(y)) + \sum_{j=1}^{Q}\sum_{j' \neq j}\mathrm{Cov}(\hat{f}^*_j(y), \hat{f}^*_{j'}(y))\right] \\
&= \frac{ah}{Q^2}\left[Q\,\frac{R(K)f(y)}{ah} + (Q - 1)Q\sum_{i=1}^{a}\left(\frac{R(K)f(y)}{a n_i} + \frac{(n_i - 1)\sigma_\tau^2 R(K')}{a n_i h^3}\right)\right],
\end{aligned}
\]

which simplifies to the desired result. □

Bibliography

[1] W. Braun and L. Zhou. One-sided tolerance limits via smoothing. Quality Technology & Quantitative Management, 5(4), 2008.

[2] Robert V. Breunig. Density estimation of clustered data. Econometric Reviews, 20(3), 2001.

[3] A. C. Davison and D. V. Hinkley. Bootstrap Methods and Their Application. Cambridge University Press, 1999.

[4] J. Faraway and M. Jhun. Bootstrap choice of bandwidth for density estimation. Journal of the American Statistical Association, 86(412), 1990.

[5] Howard Gitlow and Hernan Awad. Intro stats students need both confidence and tolerance (intervals). The American Statistician, 67(4), 2013.

[6] Peter Hall. The Bootstrap and Edgeworth Expansion. Springer-Verlag, 1992.

[7] T. Hesterberg. What teachers should know about the bootstrap: resampling in the undergraduate statistics curriculum. 2014.

[8] Elaine B. Hoffman, Pranab K. Sen, and Clarice R. Weinberg. Within-cluster resampling. Biometrika, 88(4), 2001.

[9] Jiming Jiang. Large Sample Techniques for Statistics. Springer, 2010.

[10] K. Krishnamoorthy and T. Mathew. One-sided tolerance limits in balanced and unbalanced one-way random models based on generalized confidence intervals. Technometrics, 46(1), 2004.

[11] K. Krishnamoorthy and T. Mathew. Statistical Tolerance Regions. Wiley, 2009.

[12] T. Lin, C. Liao, and H. K. Iyer. Tolerance intervals for unbalanced one-way random effects models with covariates and heterogeneous variances. Journal of Agricultural, Biological, and Environmental Statistics, 13(2), 2008.

[13] R. W. Mee and D. B. Owen. Improved factors for one-sided tolerance limits for balanced one-way ANOVA random models. Journal of the American Statistical Association, 76(384), 1983.

[14] E. Parzen. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33(3), 1962.

[15] T. Rebafka, S. Clémençon, and M. Feinberg. Bootstrap-based tolerance intervals for applications to method validation. Chemometrics and Intelligent Laboratory Systems, 89, 2007.

[16] S. J. Sheather and M. C. Jones. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53(3), 1991.

[17] M. Vangel. New methods for one-sided tolerance limits for a one-way balanced random-effects ANOVA model. Technometrics, 34(2), 1992.

[18] M. P. Wand and M. C. Jones. Kernel Smoothing. London: Chapman & Hall, 1995.
