Estimation of Moments and Quantiles Using Censored Data

WATER RESOURCES RESEARCH PAGE, 4 . , VOLNO S, 1005-1012.32 , APRIL 1996 Estimatio momentf no quantiled an s s using censored data Charles N. Kroll and Jery R. Stedinger School of Civil and Environmental Engineering, Cornell University, Ithaca, New York Abstract. Censored data setoftee sar n encountere waten di r quality investigationd an s streamflow analyses. A Monte Carlo analysis examined the performance of three technique r estimatinfo s momente gth quantiled san distributioa f so n using censored data sets. These techniques includ lognormaea l maximum likelihood estimator (MLE) loga , - probability plot regression log-partiaestimatorw ne a d lan , probability-weighted moment estimator. Data sets were generated fro mnumbea f distributiono r s commonly useo dt describe water quality and water quantity variables. A "robust" fill-in method, which circumvents transformation bias in the real space moments, was implemented with all three estimation technique obtaio t s completna e sampl computatior efo sample th f no e mea standard nan d deviation. Regardles underlyine th f so g distributionE ML e th , generally performe r bettewels o d a s a l r tha othee nth r estimators, thoug momene hth t and quantile estimators using all three techniques had comparable log-space root mean square errors (rmse) for censoring at or below the 20th percentile for samples sizes of n = 10, the 40th percentile for n = 25, and the 60th percentile for n = 50. Compariso log-space th f no e rms real-spacd ean e rmse indicated tha log-spaca t e rmse was a better overall metric of estimator precision. Introduction showed that more sophisticated statistical techniques performed better than these simple "replacement" methodsn I . Whe datna t containse a s some observations withi- re na particular, the log-probability plot regression method provided stricted rang valuef eo t otherwissbu t measuredeno calles i t i , d the best estimators of the mean and standard deviation, while a censored dat t [Cohen,se a 1991]. Censored data sete ar s lognormae th l maximum likelihood method provide bese dth t commonly fielde founth f waten o sdi r quality, where labora- estimators of the median and interquartile range. Estimation tory measurement f contaminanso t concentration e oftear s n of quantiles other than the median was not considered by reported as "less than the detection limit." Censored data sets Gillio Helseld man . Helsel Cohnd an [1988] extended Gilliom alse ar o foun waten di r quantity analyses when river discharges and HelseFs wor datko t a sets with several censoring thresholds. less than a measurement threshold level are reported as zero. This study extends the work of Gilliom and Helsel to the In some regions, historical river discharge records report over estimation of several quantiles and considers new estimators. halannuae fth l minimum flow zers sa o [Hammett, 1984]. These e log-probabilitTh y plot regression method e (LPPRth d an ) discharges may have been zero, or they may have been between lognormal maximum likelihood method (MLE) are evaluated zero and the measurement threshold and thus reported as along with a new method based on partial probability-weighted zero efficientlo t concerf O .w ho s ni y estimate moments, quan- moments (PPWM). As with the MLE and LPPR estimators, tiles othed an , r descriptive statistic underlyine th f so g contin- our PPWM estimator assumes that the data are described by a uous distribution using such censored data sets. situatioThe n wher dateall a below fixea d valu censoreeare d lognormal distribution. It employs with the logarithms of the is referre typs a eI censoringo dt . With typ eI censoring e th , flow xlata the censored-sample probability-weighted moment number of values censored is a random variable. With type II (PWM) estimators derived by Wang [1990] to obtain the pa- censoring, a fixed number of data points are always censored rameter lognormaa f o s l distribution. Wang employe- es s dhi ancensorine dth g threshol randoa s di m variable [David, 1981]. timator rean si lgeneralize a spac t fi o et d extreme value (GEV) Censored water qualit wated yan r quantity data should resem- distribution performance .Th probabilitf eo y weighted moment ble type I censoring because the censoring threshold is fixed by estimators with complete samples has been examined for a measuremene th t technolog physicae th d yan l setting. numbe f distributionso r mann i d y an ,estimatorcases M PW , s numbeA f studiero s have f simplsuggesteo e us e "ree dth - e higheoth f r moment d quantilean s a distributio f o s n have placement" techniques for estimating the mean and standard performed favorably with product-momen maximud an t m like- deviatio typf no ecensoreI d data sets [Cohen Ryan,d an 1989; lihood estimators [Landwehr et aL, 1979; Hashing et aL, 1985; Newman et al, 1990]. These techniques replace all the cen- Hashing Wallis,d an 1987]estimatorM PW . lineae sar r combi- sored observations with some value betwee- de e nth zerd oan nation observatione th f so thud lese an ssar s sensitive th o et tection limit. Gilliom Helseld an [1986 Helseld Gilliomd ]an an largest observation sampla n si e than product-moment estima- [1986] examine performance dth varieta f eo techniquef yo o st tors that squar cubd observationse ean e th merie .Th probf o t - estimate the mean, standard deviation, median interquard an , - ability-weighted moment estimators with censored samples has tile range using type I censored water quality data. They analyzede b ye o t t . This study focuse estimation so meane th f no , standar- dde Copyright Americae 199th y 6b n Geophysical Union. viation, and interquartile range of a distribution, as well as Paper number 95WR03294. quantiles with nonexceedance probabilities of 10% and 90%. A 0043-1397/96/95WR-03294$05.00 "robust" fill-in metho implementes di d with each estimation 1005 1006 KROLL AND STEDINGER: ESTIMATION OF MOMENTS AND QUANTILES techniqu obtaio et ncompleta e sampl computatior efo e th f no date th a r abovFo threshole eth - logarithe or dth e th f mo sample mea varianced nan . Gilliom Helseld an [1986] used this dered values, Y/5 are regressed against the corresponding "nor- "robust" fill-in method only with a log-probability plot regres- mal scores" corresponding to the model sion estimator thin I . s study this metho alss di o used wite hth lognormal maximum likelihoo d partiaan d l probability- I = (LY £/ = c (3) weighted moment estimators differeno Tw . t metric usee ar s d where < £ e invers1th (p s i ) e cumulative normal distribution to compare estimators. Data are generated from distributions i function e resultinevaluateth e d a ar d (LY an g t pYan a di9 commonly observed in the water quality and water quantity estimators of the mean and standard deviation of the log- fields, including three distribution t consideresno Gillioy db m transformed data obtained using ordinary least squares regres- and Helsel; their extreme case for the gamma distribution sion. These LPPR estimator similae sar thoso t r e derivey db (coefficient of variation = 2.0) was omitted. Gupta [1952] and have been implemented in a number of studies of estimation with censored data sets [Gilliom and Estimation Techniques Helsel, 1986; Helsel and Gilliom, 1986; Helsel and Cohn, 1988; Helsel, 1990]. All three estimation techniques make the assumption that underlyine th g distributio date th lognormal s ai f no . Helseld an Partial Probability Weighted Moments Hirsch [1992, p. 360] observe that the lognormal distribution flexibla s ha e shape thed an ,y provid reasonablea e description For a variable Y, probability-weighted moments are defined as of many positive random variables with positively skewed dis- p, = E{Y[F(Y)]r} (4) tributions lognormae Th . l distributio bees nha n a show e b o nt good descriptor of low river flows [Vogel and Kroll, 1989] and wher ecumulative F(Y)th s i e distribution function (CDFr )fo water quality data [Gilliom Helsel,and 1986]. continuoua r YFo . s random variable, PWMwrittee b n sca n Lognormal Maximum Likelihood Estimator |3r= Y(F)FdF (5) Conside orderen ra d censored data set Xl < X2 ''' ^ Xc < Xc +l '— ^ Xn, where the first c observations are censored and reported only as below some fixed measurement threshold. wher evaluateY e= f F inverse o F(Y) Y(F)d th F s an i de CD Let Yt = In (Xt) and let T be the log of the measurement e probabilitath t a censore yr F.Fo d sample, Wang [1990] threshold. Assuming that X is lognormally distributed and in- define PPWda Ms a dependent likelihooe th , d functio date th s ai r nfo Y(F)FrdF (6) T - L - /Y c\(n - c}\ T'n CTy 0-y l=C + l where PT — F(T), the probability of censoring, and T is the are where <£ and <f> the distribution and density function of a censoring threshold. standard normal variate e log-transe meath th , f s o i nIJL - Y Assumin lognormalle date ar g th a X y distributed= Y d an , forme de standar data th d os yi an , d deviatio e logth -f o n log (X), s normallthei e nY th f o y g distributedlo e th s i T . transformed data. By taking the logarithm of (1) and setting censoring threshold normae th r l Fo .distributio inverse nth e the partial derivatives with respect to JLLY and oy to zero, one CDF for a random variable Y is can solve for the maximum likelihood estimators (MLE) (LY and 6y [Cohen, 1991].

Estimation of Moments and Quantiles Using Censored Data

The Statistical Analysis of Distributions, Percentile Rank Classes and Top-Cited

Lesson 7: Measuring Variability for Skewed Distributions (Interquartile Range)

A Simple Censored Median Regression Estimator

A Logistic Regression Equation for Estimating the Probability of a Stream in Vermont Having Intermittent Flow

HST 190: Introduction to Biostatistics

Integration of Grid Maps in Merged Environments

Linear Regression with Type I Interval- and Left-Censored Response Data

How to Use MGMA Compensation Data: an MGMA Research & Analysis Report | JUNE 2016

Statistical Methods for Censored and Missing Data in Survival and Longitudinal Analysis

Normal Probability Distribution

2018 Statistics Final Exam Academic Success Programme Columbia University Instructor: Mark England

Common Misunderstandings of Survival Time Analysis