
Quantitative Exposure Data: Interpretation, Decision Making, and Statistical Tools

Purpose of Interpretation and Decision Making

• To decide two things:
  – Is the SEG's exposure profile (exposure and its variability) adequately characterized?
  – Is the exposure profile acceptable?

• A baseline exposure assessment (or comprehensive exposure assessment) requires characterization of the SEG's exposure profile.
  – An exposure profile is a summary "picture" of the exposure experienced by an SEG.

• A compliance-based program will focus efforts on exposures near OELs.

Exposure Acceptability Judgments

• A variety of tools and factors are related to the judgment of exposure acceptability:
  – process experience
  – material characteristics
  – toxicity knowledge
  – work force characteristics
  – of task
  – frequency of peak excursions
  – monitoring results
  – statistical tools
  – confidence in exposure limit
  – modeling techniques
  – biological monitoring
  – availability and adequacy of engineering controls

Statistical Considerations

• Statistical tools are powerful only if their theoretical bases and limitations are understood by the person using them.

• Statistical issues must be considered early in the assessment process. They should be included in the development of the exposure assessment strategy and when determining a monitoring strategy.
  – Difficulties:
    • random sampling
    • sufficient data

• In spite of their limitations, statistical tools are useful because they help form a picture of the exposure profile. If their limitations are understood, they will greatly enhance knowledge of the exposure profile.

Sample Size Estimation

Approximate Sample Size Requirements to be 95% Confident that the True Exposure Is Less Than the Long-Term Occupational Exposure Limit (Power = 90%)

Ratio:            Sample size (n)
true mean/OEL     GSD = 1.5   GSD = 2.0   GSD = 2.5   GSD = 3.0   GSD = 3.5
0.75                  25          82         164         266         384
0.50                   7          21          41          67          96
0.25                   3          10          19          30          43
0.10                   2           6          13          21          30

(GSD = 1.5 represents low variability, GSD = 2.5 moderate variability, and GSD = 3.5 high variability.)
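A minimal lookup sketch of the table above; the values are simply transcribed from the table, and the helper name is illustrative:

```python
# Approximate sample sizes transcribed from the table above,
# keyed by true-mean/OEL ratio, then by GSD.
SAMPLE_SIZE_TABLE = {
    0.75: {1.5: 25, 2.0: 82, 2.5: 164, 3.0: 266, 3.5: 384},
    0.50: {1.5: 7,  2.0: 21, 2.5: 41,  3.0: 67,  3.5: 96},
    0.25: {1.5: 3,  2.0: 10, 2.5: 19,  3.0: 30,  3.5: 43},
    0.10: {1.5: 2,  2.0: 6,  2.5: 13,  3.0: 21,  3.5: 30},
}

def required_samples(ratio, gsd):
    """Look up the approximate number of measurements for a tabulated
    true-mean/OEL ratio and GSD combination."""
    return SAMPLE_SIZE_TABLE[ratio][gsd]

# Example: required_samples(0.50, 2.0) returns 21.
```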

Exposure Distribution and Parametric or Nonparametric Statistical Tools

• A population distribution is a description of the relative frequencies of the elements of that population.

– The most powerful statistical tools require knowledge or assumptions about the population’s distribution.

– When the underlying distribution of exposure is not known, nonparametric statistics should be used.
  – These statistical tools tend to focus on robust measures, such as the median or other percentiles of the distribution, because they are less sensitive to outliers and spurious data.
  – They have lower statistical power, so more measurements are needed.

Common Distributions in Industrial Hygiene

• The random sampling and analytical errors associated with an air monitoring result are usually presumed to be normally distributed.

• The random fluctuations in exposure from shift to shift or within shifts tend to be lognormally distributed.

• Exposure fluctuations account for the vast majority of an exposure profile’s variability (usually more than 85%).

• If we have resources to commit to exposure monitoring, usually the most efficient approach would call for putting resources into more measurements rather than into more precise sampling methods.

Distribution Verification

• A logprobability plot is the simplest and most straightforward way to check data for lognormality.

• The Shapiro and Wilk Test (W-test) is the most rigorous test for lognormality.

• If the data form a straight line on the logprobability plot, it signifies that the data follow a lognormal distribution. The line can then be used to estimate the distribution's geometric mean and geometric standard deviation.

[Figure: logprobability plot of cumulative probability versus concentration (mg/m3), with points placed by the plotting-position method.]

Making a Probability Plot

• Procedures:
  – Rank order the data, lowest to highest, assigning rank 1 to the lowest value and rank n to the highest.
  – Calculate the plotting position for each value: plotting position = rank/(n+1).
  – Plot the concentrations against the plotting positions.
  – Fit a best line through the plotted data (by eye or by least squares).
  – Determine whether the data provide a reasonable fit to the straight line.
  – Estimate the distribution's GM, GSD, and percentiles of interest from the best-fit line.
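A minimal sketch of the procedure above, using a least-squares fit of ln(concentration) against normal scores in place of an eye-fitted line (NumPy and SciPy are assumed to be available):

```python
import numpy as np
from scipy import stats

def logprobability_fit(concentrations):
    """Rank the data, compute rank/(n+1) plotting positions, and fit a
    straight line of ln(concentration) against the standard normal score."""
    x = np.sort(np.asarray(concentrations, dtype=float))
    n = x.size
    plotting_positions = np.arange(1, n + 1) / (n + 1)   # rank/(n+1)
    z = stats.norm.ppf(plotting_positions)               # probability axis as z-scores
    slope, intercept, r, _, _ = stats.linregress(z, np.log(x))
    gm = np.exp(intercept)     # geometric mean corresponds to z = 0
    gsd = np.exp(slope)        # geometric standard deviation is exp(slope)
    return gm, gsd, r ** 2     # r^2 indicates how straight the plot is

# Example: gm, gsd, r2 = logprobability_fit([80, 95, 120, 150, 180, 210, 310])
```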

W-Test for Distribution Goodness-of-Fit

• The W-test is one of the most powerful tests for determining goodness-of-fit for normal or lognormal data when n is fairly small (n ≤ 50).

• The W-test is performed as follows:
  – Order the data, smallest to largest.
  – Calculate k: k = n/2 if n is even; k = (n-1)/2 if n is odd.
  – Calculate the W statistic:

$$W = \frac{\left[\sum_{i=1}^{k} a_i\,(x_{n-i+1} - x_i)\right]^{2}}{S^{2}\,(n-1)}$$

  where the a_i are tabulated coefficients and S is the sample standard deviation.
  – The data are consistent with a normal (or lognormal, if applied to the log-transformed data) population if W is greater than a tabulated critical value.

Sampling Randomly from Stationary Populations (1)

• Random sampling
  – Each element in the population must have an equal likelihood of being observed.
  – Practical considerations of travel constraints, weather, process operation parameters, budgetary limits, and the need to characterize multiple exposure profiles make statistically randomized sampling extremely difficult in the real world.
  – To avoid known bias:
    • If possible, avoid clustering your monitoring into consecutive periods.
    • Monitor during different seasons to avoid biases introduced by factors that change with weather conditions.
    • Understand process cycles and avoid biases they might introduce.
    • Include both typical and unusual events.
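A minimal sketch of drawing a randomized monitoring schedule that avoids clustering by spreading random picks across the four quarters of a year (the counts and date handling are illustrative assumptions):

```python
import random
from datetime import date, timedelta

def random_monitoring_days(year, per_quarter=3, seed=None):
    """Randomly select monitoring days within each quarter so the schedule
    is randomized yet still covers all seasons."""
    rng = random.Random(seed)
    quarter_starts = [date(year, m, 1) for m in (1, 4, 7, 10)]
    quarter_ends = [date(year, m, 28) for m in (3, 6, 9, 12)]
    schedule = []
    for start, end in zip(quarter_starts, quarter_ends):
        span = (end - start).days
        offsets = rng.sample(range(span + 1), per_quarter)
        schedule.extend(start + timedelta(days=o) for o in sorted(offsets))
    return schedule

# Example: random_monitoring_days(2024, per_quarter=3, seed=1)
```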

Sampling Randomly from Stationary Populations (2)

– Autocorrelation occurs when the contaminant concentration in one time period is related to the concentration in a previous period.

– Clustering all samples in one period when autocorrelation occurs will result in an underestimate of variability in the exposure profile and an imprecise estimate of the mean exposure.

– Autocorrelation can also result in underestimating or overestimating the true degree of exposure, depending on whether a high- or low-concentration cycle happened to have been sampled.

Sampling Randomly from Stationary Populations (3)

• Stationary population – Definition of Stationary • A random process is said to be stationary if its distribution is independent of the time of observation.

– Stationary population
  • An underlying population that does not change during the exposure assessment period. That is, the mean and variance of this population are stable over time.

– If the population changes significantly over the random sampling period, only calculation of descriptive sample statistics and decision making on the basis of professional judgment are recommended.

– One simple procedure that can help subjectively check for population stability is to plot the monitoring data chronologically by time of monitoring. If any trends in the data are apparent, that is a sign the underlying process is not stationary.
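A minimal sketch of that chronological check (matplotlib is assumed to be available; the data passed in are illustrative):

```python
import matplotlib.pyplot as plt

def chronological_plot(dates, concentrations):
    """Plot monitoring results in time order; an apparent trend suggests the
    underlying exposure population is not stationary."""
    fig, ax = plt.subplots()
    ax.plot(dates, concentrations, marker="o")
    ax.set_xlabel("Date of monitoring")
    ax.set_ylabel("Concentration")
    ax.set_title("Chronological plot for stationarity check")
    fig.autofmt_xdate()
    plt.show()
```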

Similar Exposure Interval

• A similar exposure interval is defined as a period in which the distribution of exposures for an SEG would be expected to be stationary.

• The measurements needed to characterize the exposure profile would be taken randomly within a similar exposure interval.

Relationship of Averaging Times

• It is inappropriate to mix short-term data with full-shift data. Short-term data tend to be distributed differently than full-shift data.

• Mixing of data from different averaging times makes estimates of variance inaccurate and precludes use of most common statistical tools.

• Techniques are being developed to predict long-term exposure profiles based on a time-weighted combination of exposure profiles for the several short-term tasks. These techniques hold great promise for providing more detailed characterizations of exposures and for optimizing sampling using stratified random sampling of critical tasks.
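As a simple illustration of the time-weighted combination idea, a shift-long average can be assembled from task-level concentrations and durations (the task values below are hypothetical):

```python
def time_weighted_average(tasks):
    """Combine (concentration, hours) pairs for each task into a single
    time-weighted average concentration for the period."""
    total_hours = sum(hours for _, hours in tasks)
    return sum(conc * hours for conc, hours in tasks) / total_hours

# Hypothetical shift: two exposed tasks plus unexposed time.
shift_twa = time_weighted_average([(2.5, 1.5), (0.8, 4.0), (0.0, 2.5)])
```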

Nondetectable Data

• Monitoring results below the analytical limit of detection should not be discarded.

• Several techniques are available for including below detection limit data in statistical analysis.

• A factor of 0.7 times the detection limit may be most appropriate for data with relatively low variability (GSD < 3).

• A factor of 0.5 times the detection limit may be best when the variability is high (GSD > 3). If more than 50% of data are below the detection limit then special techniques may be required.
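A minimal sketch of the substitution rule just described, with nondetects passed in as None (the greater-than-50% nondetect case mentioned above would be handled separately):

```python
def substitute_nondetects(results, lod, gsd_estimate):
    """Replace nondetect results (None) with 0.7*LOD when variability is
    relatively low (GSD < 3) or 0.5*LOD when variability is high (GSD > 3)."""
    factor = 0.7 if gsd_estimate < 3 else 0.5
    return [factor * lod if r is None else r for r in results]

# Example: substitute_nondetects([0.12, None, 0.30, None], lod=0.05, gsd_estimate=2.2)
```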

• A probability plot is another way to include data below the detection limit in the statistical analysis. These plots allow extrapolation of the data above the detection limit to account for the data below the detection limit, yielding a reasonable estimate of the average and variability.

Statistical Techniques

• There is no ideal statistical technique for evaluating industrial hygiene monitoring data.
• All measurements to be analyzed statistically should be valid in that:
  – They were collected and analyzed using a reasonably accurate and reasonably unbiased sampling and analytical method.
  – They adequately represent personal exposure.
• Descriptive statistics: arithmetic mean, standard deviation, median, maximum, minimum, and fraction of samples over the OEL.
• Inferential statistics:
  – quantitative estimates of exposure profile parameters, such as the mean and upper tail
  – If a decision must be made with only a few measurements (for example, 10), confidence is highest for the estimate of the mean, lower for the estimate of the variance, and lowest for estimates of lower or upper percentiles.

Focus on the Arithmetic Mean

• For chronic-acting substances, the long-term average exposure (exposure averaged over weeks or months) is a relevant index of dose and is, therefore, a useful parameter on which to focus when evaluating the health risk posed by such an exposure.
  – For such agents, the body dampens swings in exposure.
• Statistically defined OEL:
  – Definition: an acceptable exposure profile defined by the OEL's sponsoring organization.
  – It should be clearly stated whether:
    • the OEL is interpreted as a long-term average (i.e., the arithmetic mean of the distribution of daily average exposures);
    • a permissible exceedance of day-to-day exposures is allowed (e.g., 5%); or
    • the OEL is a never-to-be-exceeded maximum daily average (e.g., 100% of the daily average exposures are less than the OEL).

Arithmetic Mean of a Lognormal Distribution

• The arithmetic mean, not the geometric mean, of a lognormal exposure distribution is the best descriptor of average exposure.

• The difference between arithmetic mean and geometric mean of a lognormal distribution increases when variance in the distribution increases.
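For reference (standard lognormal theory, stated here as an aside), the relationship between the two means can be written in terms of the geometric mean (GM) and geometric standard deviation (GSD):

$$\mathrm{AM} = \mathrm{GM}\cdot\exp\!\left[\tfrac{1}{2}\left(\ln \mathrm{GSD}\right)^{2}\right]$$

For example, at GSD = 2.0 the arithmetic mean is about 1.27 times the geometric mean, while at GSD = 3.5 it is about 2.2 times the geometric mean.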

Estimating the Arithmetic Mean of a Lognormal Distribution

• The recommended method for all sample sizes and GSDs is the minimum variance unbiased estimate (MVUE). – Unbiased and minimum variance

• The maximum likelihood estimate (MLE) is easy to calculate and is less variable than the simple mean for large data sets (N > 50) and high GSDs.

Confidence Limits Around the Arithmetic Mean of a Lognormal Distribution

• Confidence limits allow one to gauge the uncertainty in the parameter estimate. The wider the confidence limits, the less certain the point estimate.

• Land's "exact" procedure is suggested for calculating confidence limits for arithmetic mean estimates.

Focus on the Upper Tail

• For agents causing acute effects, the average exposure is not as important as understanding how high the exposure may get, because a few high exposures might pose a more important risk to health than average exposures at lower levels.

• An examination of the exposure profile's upper tail will allow an estimate of the relative frequency with which the OEL may be exceeded.

Estimating Upper Percentiles
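A minimal sketch of one standard approach, assuming the exposures are adequately described by a lognormal distribution: an upper percentile can be estimated from the mean and standard deviation of the log-transformed data.

```python
import math
from scipy import stats

def lognormal_percentile(data, percentile=0.95):
    """Point estimate of an upper percentile (default: 95th) assuming the
    exposure data follow a lognormal distribution."""
    y = [math.log(x) for x in data]
    n = len(y)
    ybar = sum(y) / n
    sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    z = stats.norm.ppf(percentile)   # about 1.645 for the 95th percentile
    return math.exp(ybar + z * sy)
```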


Tolerance Limits

• Tolerance limits statistically demonstrate, with some stated confidence, that no more than a given percentage of exposures are greater than a standard.
  – An industrial hygienist can have 95% confidence that no more than 5% of the exposures exceed the standard.
  – In effect, this is an upper one-sided 95% confidence limit on the estimate of the 95th percentile.

• Advantages:
  – Tolerance limits are helpful for defining the upper end of an exposure profile.
  – The tolerance limits approach may be appropriate for compliance testing.

• Disadvantages:
  – Tolerance limits are very sensitive to sample size and the distribution's standard deviation.
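A minimal sketch of a one-sided upper tolerance limit for lognormal data, using the noncentral-t form of the tolerance factor (SciPy assumed; in practice the K factor is often read from published tables instead):

```python
import math
from scipy import stats

def upper_tolerance_limit(data, coverage=0.95, confidence=0.95):
    """Upper tolerance limit: with the stated confidence, at least `coverage`
    of the lognormal exposure distribution lies below the returned value."""
    y = [math.log(x) for x in data]
    n = len(y)
    ybar = sum(y) / n
    sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    zp = stats.norm.ppf(coverage)
    k = stats.nct.ppf(confidence, df=n - 1, nc=zp * math.sqrt(n)) / math.sqrt(n)
    return math.exp(ybar + k * sy)

# Compare the returned value to the OEL: if it is below the OEL, one can be
# 95% confident that no more than 5% of exposures exceed the OEL.
```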

How to Choose: The Mean or the Upper Tail

• In determining compliance with most regulatory and authoritative OELs that exist today, a focus on the upper tail would be most appropriate.

• In 1978, OSHA stated in the preamble to its lead PEL:
  – OSHA recognizes that there will be day-to-day variability in airborne lead exposure experienced by a single employee. The permissible exposure limit is a maximum allowable value which is not to be exceeded; hence exposure must be controlled to an average value well below the permissible exposure limit in order to remain in compliance.

Using ANOVA to Refine Critical SEGs

• Analysis of variance (ANOVA) is a statistical technique that can be used to compare the variability of individual workers' exposures with the exposure variability of the overall SEG.
  – ANOVA is used to examine the exposure variability for each monitored individual (within-worker variability) and compare it with the worker-to-worker variability in the SEG (between-worker variability).

• This approach can be used to check the homogeneity of the critical SEGs for which risk of individual misclassification is most severe and to reassign individuals as necessary.
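A minimal sketch of that comparison using a one-way ANOVA on log-transformed results grouped by worker (the worker IDs and data layout are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def seg_anova(exposures_by_worker):
    """One-way ANOVA on ln(exposure): a small p-value indicates that
    between-worker variability is large relative to within-worker
    variability, suggesting the SEG may need to be refined."""
    groups = [np.log(np.asarray(v, dtype=float)) for v in exposures_by_worker.values()]
    f_stat, p_value = stats.f_oneway(*groups)
    return f_stat, p_value

# Example with hypothetical repeated measurements for three workers:
# seg_anova({"A": [0.20, 0.30, 0.25], "B": [0.80, 1.10, 0.90], "C": [0.30, 0.40, 0.35]})
```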

Examining the Arithmetic Mean: Mean Estimates and Confidence Intervals

Arithmetic Mean

• Understanding the mean of the exposure profile may be important when judging exposure:
  – Several short-term measurements are used to characterize a daily average.
  – Several day-long TWA measurements are being used to estimate the long-term average of a day-to-day exposure profile.

Arithmetic Mean

• The best predictor of dose is the exposure distribution's arithmetic mean, not the geometric mean. The general technique is to:
  1. Estimate the exposure distribution's arithmetic mean.
  2. Characterize the uncertainty in the arithmetic mean's point estimate by calculating confidence limits for the true mean.
  3. Examine the arithmetic mean's point estimate and true-mean confidence limit(s) in light of an LTA-OEL or other information to make a judgment on the exposure profile.

Confidence Intervals

• Upper confidence limit (UCL):
  – To be conservatively protective of worker health, the UCL for the arithmetic mean estimate is emphasized.

• UCL1,95% (the arithmetic mean's one-sided 95% UCL) < LTA-OEL:
  – the industrial hygienist would be at least 95% sure that the exposure profile's true mean was below the LTA-OEL
• Place all of the statistical power into characterizing the single boundary most important to the judgment.

95% Upper Confidence Limit for the Arithmetic Mean

[Figure: diagram of the arithmetic mean point estimate and the 95% upper confidence limit for the arithmetic mean; one can be 95% certain that the exposure profile's true mean exposure is less than the upper confidence limit.]

Probability Plotting and Goodness-of-Fit

• Parametric methods:
  – rely on assumptions about the shape of the underlying population distribution
• Most exposure distributions are right-skewed and can be reasonably approximated by the lognormal distribution.
  – If probability plotting and goodness-of-fit techniques verify a lognormal distribution, the tools for lognormal distributions should be used.

Probability Plotting and Goodness-of-Fit (Cont.)

• If the data do not seem to fit a lognormal distribution, but they do seem to fit a normal distribution, the tools for normally distributed data should be used.
• If the data do not seem to fit either the normal or lognormal distribution, consider:
  – whether the SEG has been properly defined
  – whether there has been some systematic change to the underlying exposure distribution
  – relying on descriptive and nonparametric statistics

Characterizing the Arithmetic Mean of a Lognormal Distribution

• Easy to calculate but less accurate:
  – Sample mean and t-distribution confidence limits: more variable for large sample sizes
  – Maximum likelihood estimate and confidence limits: underestimate variability; confidence limits too narrow
• Accurate but more difficult to calculate:
  – Minimum variance unbiased estimate (MVUE): point estimate only
  – Land's "exact" confidence limit: confidence limits only

Which To Use: Point Estimate of the True Mean of the Lognormal Distribution

• If a computer or programmable calculator is available, the MVUE should be used as the preferred point estimate of the true mean of the lognormal distribution. If not:
  – Sample mean: when the GSD is small (<2) or there are few samples (<15-20)
  – MLE: when the sample size is large (>15-20)

Which To Use: Confidence Limits for the True Mean of the Lognormal Distribution

• Land's method:
  – exact confidence limits for the true mean
  – use if a computer is available
• MLE method:
  – use if a computer is not available
  – tends to underestimate the true upper confidence limit
• Easy-to-calculate sample mean and t-distribution confidence interval:
  – when many monitoring results are available (>30)

Specific Techniques: Sample Mean and t-Distribution Confidence Limit

• Sample mean as a point estimate of the exposure distribution's arithmetic mean:
  – no computer or programmable calculator available
  – few samples (<15-20) and a small GSD (<2)
• Simple t-distribution confidence interval procedure:
  – Developed for use with normal distributions
  – Also works well for many non-normal distributions (including the lognormal distribution) when sample sizes are large (n>30, GSD<1.5)
• Sample mean and t-distribution method:
  – when the exposure distribution is better characterized by a normal distribution than a lognormal distribution

Calculation of the Sample Mean and Confidence Limit

• Step 1: Calculate the sample mean (x̄) and sample standard deviation (s).
• Step 2: Calculate the confidence limits:

$$\mathrm{CL} = \bar{x} \pm t\left(\frac{s}{\sqrt{n}}\right)$$

$$\mathrm{UCL}_{1,95\%} = \bar{x} + t_{0.95}\left(\frac{s}{\sqrt{n}}\right) \qquad \mathrm{LCL}_{1,95\%} = \bar{x} - t_{0.95}\left(\frac{s}{\sqrt{n}}\right)$$

• Step 3: Compare the UCL to the LTA-OEL.
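A minimal sketch of Steps 1 through 3 (SciPy is assumed to be available for the t quantile):

```python
import math
from scipy import stats

def sample_mean_ucl(data, confidence=0.95):
    """Sample mean and one-sided upper confidence limit using the t-distribution."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    t = stats.t.ppf(confidence, df=n - 1)
    ucl = mean + t * s / math.sqrt(n)
    return mean, ucl

# Compare the returned UCL to the LTA-OEL.
```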

Maximum Likelihood Estimate and Confidence Limits for the Arithmetic Mean of a Lognormal Distribution

• MLE: a better point estimate than the sample mean when there are more than 15-20 samples or a high GSD
• Easy to calculate
• Underestimates variability in many cases; the computed UCL should be interpreted cautiously because it will often be lower than the exact UCL

Maximum Likelihood Estimate and Confidence Limits

• Step 1: Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x).
• Step 2: Calculate the MLE:

$$\mathrm{MLE} = \exp\!\left[\bar{y} + \frac{1}{2}\left(\frac{n-1}{n}\right)s_y^{2}\right]$$

• Step 3: Calculate the UCL and/or LCL for the MLE:

$$\mathrm{CL} = \exp\!\left[\ln(\mathrm{MLE}) \pm t\,\frac{s_y\sqrt{n-1}}{n}\right]$$

• Step 4: Compare the UCL to the LTA-OEL.
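A minimal sketch of the MLE steps as reconstructed above:

```python
import math
from scipy import stats

def mle_lognormal_mean_ucl(data, confidence=0.95):
    """MLE of the arithmetic mean of a lognormal distribution and its
    approximate one-sided upper confidence limit (per the formulas above)."""
    y = [math.log(x) for x in data]
    n = len(y)
    ybar = sum(y) / n
    sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    mle = math.exp(ybar + 0.5 * ((n - 1) / n) * sy ** 2)
    t = stats.t.ppf(confidence, df=n - 1)
    ucl = math.exp(math.log(mle) + t * sy * math.sqrt(n - 1) / n)
    return mle, ucl
```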

Minimum Variance Unbiased Estimate of the Arithmetic Mean of a Lognormal Distribution

• MVUE: the preferred point estimate; use routinely unless no computer is available
• Calculated iteratively
• Calculation using five terms will give results correct to three significant figures for sample sizes from 5 to 500 and GSDs from 2 to 5

Minimum Variance Unbiased Estimate Procedures

• Step 1: Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x).
• Step 2: Calculate the MVUE:

$$\mathrm{MVUE} = \exp(\bar{y})\left[1 + \frac{n-1}{n}\,\lambda + \frac{(n-1)^{3}}{n^{2}(n+1)}\,\frac{\lambda^{2}}{2!} + \frac{(n-1)^{5}}{n^{3}(n+1)(n+3)}\,\frac{\lambda^{3}}{3!} + \cdots\right]$$

where λ = s_y²/2.
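A minimal sketch of the series calculation, truncated at five terms as suggested above:

```python
import math

def mvue_lognormal_mean(data, n_terms=5):
    """Minimum variance unbiased estimate of the arithmetic mean of a
    lognormal distribution, using the series above truncated at n_terms."""
    y = [math.log(x) for x in data]
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)   # variance of the logged data
    lam = s2 / 2.0
    series = 1.0
    for k in range(1, n_terms + 1):
        denom = n ** k * math.factorial(k)
        for j in range(1, k):                        # builds (n+1)(n+3)...(n+2k-3)
            denom *= n + 2 * j - 1
        series += (n - 1) ** (2 * k - 1) / denom * lam ** k
    return math.exp(ybar) * series
```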

Land's "Exact" Estimate of the Arithmetic Mean Confidence Limits for a Lognormal Distribution

• Land's exact method: the most accurate and least-biased estimate; it should be used whenever possible.
• Hewett & Ganser graphic technique:
  – used for interpolating one of the parameters needed for the calculation
  – equations are available to approximate the curves in the graphs

Land's "Exact" Estimate Procedure

• Step 1: Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x).
• Step 2: Obtain the C-factor for Land's formula: C(s_y, n, 0.05) for the 95% LCL and C(s_y, n, 0.95) for the 95% UCL.
• Step 3: Calculate the 95% UCL (or 95% LCL):

$$\mathrm{CL} = \exp\!\left[\ln(\hat{u}) + C\,\frac{s_y}{\sqrt{n-1}}\right]$$

where $\hat{u} = \exp\!\left(\bar{y} + \tfrac{1}{2}s_y^{2}\right)$.

• Step 4: Compare the 95% UCL to the LTA-OEL.
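A minimal sketch of Steps 3 and 4, with the C-factor supplied by the user from Land's tables or the Hewett & Ganser approximation:

```python
import math

def lands_confidence_limit(data, c_factor):
    """Land's confidence limit for the arithmetic mean of a lognormal
    distribution; pass C(s_y, n, 0.95) for a 95% UCL or C(s_y, n, 0.05)
    for a 95% LCL."""
    y = [math.log(x) for x in data]
    n = len(y)
    ybar = sum(y) / n
    sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    u_hat = math.exp(ybar + 0.5 * sy ** 2)   # point estimate of the mean
    return math.exp(math.log(u_hat) + c_factor * sy / math.sqrt(n - 1))

# Compare the resulting 95% UCL to the LTA-OEL.
```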