Tolerance Intervals in a Heteroscedastic Linear Regression Context with Applications to Aerospace Equipment Surveillance

Hindawi Publishing Corporation International Journal of Quality, Statistics, and Reliability Volume 2009, Article ID 126283, 8 pages doi:10.1155/2009/126283 Research Article Tolerance Intervals in a Heteroscedastic Linear Regression Context with Applications to Aerospace Equipment Surveillance Janet Myhre,1 Daniel R. Jeske,2 Michael Rennie,3 and Yingtao Bi2 1 Reed Institute for Applied Statistics, Claremont McKenna College, Claremont, CA 91711, USA 2 Department of Statistics, University of California, Riverside, CA 92521, USA 3 Mathematical Research and Analysis Corporation, Claremont, CA 91711, USA Correspondence should be addressed to Daniel R. Jeske, [email protected] Received 1 June 2009; Revised 12 October 2009; Accepted 18 December 2009 Recommended by Satish Bukkapatnam A heteroscedastic linear regression model is developed from plausible assumptions that describe the time evolution of performance metrics for equipment. The inherited motivation for the related weighted least squares analysis of the model is an essential and attractive selling point to engineers with interest in equipment surveillance methodologies. A simple test for the significance of the heteroscedasticity suggested by a data set is derived and a simulation study is used to evaluate the power of the test and compare it with several other applicable tests that were designed under different contexts. Tolerance intervals within the context of the model are derived, thus generalizing well-known tolerance intervals for ordinary least squares regression. Use of the model and its associated analyses is illustrated with an aerospace application where hundreds of electronic components are continuously monitored by an automated system that flags components that are suspected of unusual degradation patterns. Copyright © 2009 Janet Myhre et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 1. Introduction similarly used for monitoring environmental applications [1]. 1.1. Background. The model and analyses developed in this Compared to having engineers individually examine the paper address a problem encountered when analyzing data data from all the combinations of metrics and part types, from service life tests of aerospace hardware packages. Data the automated monitoring and flagging process is quite cost- forasmanyas700performancemetricsperparttype effective. However, when the variance of the metric increases are automatically stored during surveillance testing and with time, the tolerance intervals that result from the ordi- subsequently input into a software program where, up to nary least squares analyses fail in the sense that the intervals this point, an ordinary least squares line based on normal- become too narrow as time increases. Figure 1 illustrates theory has been routinely fit to the data using time-in- this point, where the 111 observations are the performance service as the explanatory variable. The software program metric for one type of part. It is evident from the figure that also outputs tolerance intervals based on the ordinary least the random errors associated with the regression model are squares analysis. Engineers monitoring this process are heteroscedastic. Heteroscedasticity is, of course, not unique alerted only to those cases where observations in the scatter to our application. It arises routinely in other applications plot fall outside the tolerance intervals or the tolerance such as economics, behavioral sciences, social sciences, interval crosses a given limit within some specified future environmental science, and computer vision. The works in time interval (e.g., 60 months). In cases where the alert [2–6] provide examples within these disciplines, respectively. suggests an increasing accelerated degradation, a proactive Figure 1 also shows two sets of pointwise 95%-content corrective action (e.g., part replacement) may be initiated. In tolerance intervals constructed at the 90% confidence level. cases where the alert suggests less than expected degradation, The dashed lines represent the tolerance intervals derived a cursory investigation to determine if the part is being from an ordinary least squares analysis, while the solid utilized properly is initiated. Tolerance intervals are often lines represent the tolerance intervals derived from an 2 International Journal of Quality, Statistics, and Reliability 175 2 σs , is responsible for the heteroscedasticity is somewhat 150 unique to our model since the large literature pertaining 125 to heteroscedasticity models usually assumes variances of 2 · the form h(σe + λti), where h( ) is an arbitrary function 100 (e.g., a polynomial or the exponential function); see [2, 7] 75 and references therein. While these types of models have Metric 50 seen many successful applications, the lucid interpretation 2 2 25 of the heteroscedastic variance function σe + σs t was crucial when eliciting buy-in from our aerospace engineering project 0 stakeholders. −25 0 3 6 9 12 15 18 Time in service (years) 1.3. Tolerance Interval for Heteroscedastic Regression. Defin- = 2 2 2 ing ρ σs /σe , the log-likelihood function for (β0, β1, σe , ρ) Figure 1: (Data Set 1) Pointwise 95%-content tolerance intervals based on the observations is with 90% confidence. Dashed lines correspond to an ordinary n least squares analysis and solid lines correspond to the estimated 2 =−n 2 − 1 weighted least squares analysis. l β0, β1, σe , ρ log σe log 1+ρti 2 2 i=1 (1) n − − 2 − 1 yi β0 β1ti 2 . estimated weighted least squares analysis that is described 2σe i=1 1+ρti in Section 1.2. The bold line in the figure is the estimated weighted least squares regression line. It is evident in For a fixed ρ the conditional MLEs, say β0(ρ)andβ1(ρ), of β0 Figure 1 that the ordinary least squares tolerance intervals are and β1 satisfy the linear system of equations: inadequate in the sense that they are too wide for small time values and too narrow for large time values. On the other n n n 1 ti = yi hand, the more pronounced curvature associated with the β0 ρ + β1 ρ , 1+ρti 1+ρti 1+ρti tolerance intervals derived from the estimated weighted least i=1 i=1 i=1 (2) squares analysis more adequately captures the range of the n n 2 n ti ti yiti observations at both ends of the time spectrum. β0 ρ + β1 ρ = , i=1 1+ρti i=1 1+ρti i=1 1+ρti 1.2. Model for Heteroscedasticity. Examination of aerospace 2 2 = n − − data for many part types and many performance metrics led and the conditional MLE of σe is σe (ρ) [ i=1(yi β0(ρ) 2 us to propose the following model for heteroscedasticity. Let β1(ρ)ti) /(1+ρti)]/n. The conditional MLEs, β0(ρ)andβ1(ρ), X(t) denote the metric level at time t for a particular part, are the weighted least squares estimator of the slope and and for motivational purposes, assume that the underlying intercept and have an estimated variance-covariance matrix = = 2 −1 −1 × process is a discrete-time valued process (t 0, 1, ...). equal to Σ(ρ) σe (ρ)[X Ω (ρ)X] ,whereX is the n 2 Assume that the initial value of the process is X(0) = β0 + matrix whose first column is all ones and whose second { }n = e,whereβ0 is an unknown constant and e is a normally column is the set of values ti i=1 and Ω(ρ) Diag(1 + distributed random variable with zero mean and unknown ρt1, ...,1+ρtn). 2 variance σe > 0. If the degradation process has both a The profile log-likelihood function (e.g., see reference =− 2 − n − constant deterministic trend, say β1, and also a random [8]) of ρ is lp(ρ) n log σe (ρ)/2 i=1 log(1+ρti)/2 n/2, stochastic perturbation, it would follow that X(t) = X(t − which can be maximized to find the MLE of ρ,sayρ.The 1)+β1 +st,wherest is a normally distributed random variable MLEs of , and 2 are then are = (), = (), 2 ≥ β0 β1 σe β0 β0 ρ β1 β1 ρ with zero mean and unknown variance σs 0. It follows 2 2 t and σ = σ (ρ). Unbiasedness of β0 and β1 follows from from successive substitutions that X(t) = β +β t+e+ = s , e e 0 1 i 1 i general results in [9]. or equivalently, X(t) = β0 +β1t+e+δt,whereδt is a normally distributed random variable with zero mean and unknown A pointwise 100γ%-content tolerance interval with con- 2 fidence level 100(1 − α)% for a regression line in the context variance σs t.ThemodelforX(t) is (drifting) Brownian motion, except for the fact that the variance function is of homoscedastic normal errors was derived in [10]. The 2 following theorem, which is proved in the Appendix A, displaced from the origin by σe . In our application, the process X(t) is measured one time on each unit; so the data extends that result to our context. are a collection of independent observations Y = X (t ), i i i Theorem 1. If ρ is known, a pointwise 100γ%-content toler- where X (t) is the process associated with the ith unit. Thus, i ance interval with confidence level 100(1 − α)% for a normal we have the model Y = β + β t + e + δ (i = 1, ..., n)for i 0 1 i i i distribution that has mean β + β t and variance σ2(1 + ρt) is the observations. 0 1 e Equivalently, the model implies that observations are 2 σ ρ 1+ρt n independent and normally distributed with means β0 + β1ti + ± e ,(3) 2 2 β0 ρ β1 ρ t r ρ 2 and variances σe + σs ti. The fact that a variance component, χn−2, 1−α International Journal of Quality, Statistics, and Reliability 3 2 − 2 where χn−2, 1−α is the upper 1 α percentile of a chi-square of s, and having a MVNn(0, σe In) distribution.

Tolerance Intervals in a Heteroscedastic Linear Regression Context with Applications to Aerospace Equipment Surveillance

An Overview of Environmental Statistics

Data Analysis Considerations in Producing 'Comparable' Information

The Complete List of Courses, Including Course Descriptions, Prerequisites

Advanced Statistics for Environmental Professionals

Environmental Statistics

Douglas William Nychka

Statistical Concepts in Environmental Science

Environmental Data and Statistics - S

Leveraging Probability Concepts for Genotype by Environment Recommendation

MATH2740: Environmental Statistics

Addressing Challenges of Hierarchical Structural Equation Modeling in Animal Agriculture

Statistics.Pdf