Robust – MAD Method

In the course of repeated chemical analysis which is similar to normal (roughly symmetrical and unimodal) , we often encounter a few apparent which of course can be statistically identified and deleted. The statistical methods for outliers are Dixon’s, Grubb’s and many other test statistics.

The use of robust statistics however enables us to circumvent this sometimes contentious issue of the deletion and provides perhaps the best method of identifying them. In fact, robust methods reduce the influence of outlying results and heavy tails in distributions, thus providing statistics that describe the distribution of the central or “good” part of the collected.

Let’s see how the simple MAD method which calculates robust and standard without requiring any decisions about rejecting outliers works.

MAD stands for “ of Absolute Deviations”, and is a more robust than sample and . Recall that the median is the middle value in an ordered sequence of data. Thus, the median is unaffected by extreme values in a set of data.

Suppose we have replicated results as follows:

145 157 183 151 143 147 153 163 130 148

Putting these figures in ascending order, we have:

130 143 145 147 148 151 153 157 163 183

Therefore the median of these results is the mean of 148 and 151 , namely

∧ 149.5. We call this median µ = 149 5. as the robust estimate of the mean which is unaffected by making the extreme values even more extremes.

Now let’s subtract the median of the data from each individual result and ignore the sign of the deviation, giving the absolute deviations:

19.5 6.5 4.5 2.5 1.5 1.5 3.5 7.5 13.5 33.5

If we sort these absolute deviations into ascending order, we have:

1

1.5 1.5 2.5 3.5 4.5 6.5 7.5 13.5 19.5 33.5

Hence, the median of these results (the median of absolute deviation MAD ) is the mean of 4.5 and 6.5 , namely 5.5 . Again, we see that this median is not affected by the magnitude of the extreme differences.

In general, we can express MAD as below:

MAD = median i ( | Xi – median j (Xj) | ) [1]

In order to use MAD as the robust estimator of standard deviation, we take:

∧ σ = K.MAD [2] where K is a constant factor, depending on the types of . Use the K factor of 1.4826 which is derived from the properties of the . Why?

Since MAD is the middle value of a sequence of results, the probability of

∧ finding a value X on either side of mean µ is 50%. We may express this as below: ∧ P(| X − µ |≤ MAD ) = 5.0 By dividing the above equation by standard deviation σ, we have:

 ∧   X − µ MAD  P ∧ ≤ ∧  = 5.0    σ σ 

∧ X − µ Since Z = ∧ , we now have: σ

 MAD  P Z ≤  = 5.0  ∧   σ 

2

If ɸ is the cumulative normal distribution function, we must have that:

 MAD   MAD  φ  −φ−  = 5.0  ∧   ∧   σ   σ 

 MAD   MAD  φ−  = 1−φ  but  ∧   ∧   σ   σ 

 MAD   MAD   MAD   MAD  φ  −φ−  = φ  −1+ φ  = 5.0 therefore:  ∧   ∧   ∧   ∧   σ   σ   σ   σ 

 MAD  2φ  = 1+ 5.0 = 5.1 and hence,  ∧   σ 

   MAD  φ = 75.0 or,  ∧  [3]  σ 

MAD −1 or, ∧ = φ 75.0( ) = .0 6745 σ

Note: In Excel, the cumulative distribution of ɸ-1(0.75) = 0.6745 is given by function “=NORM.INV(0.75,0,1)”.

Therefore, Equation [2] gives the constant K = {1/ ɸ-1(0.75)} = 1.4826, the reciprocal of ɸ-1(0.75).

Returning to the example, the robust estimate of the standard deviation, is ∧ hence σ = 5.5 × .1 4826 = 2.8 (to 2 significant figures). The robust estimates

∧ ∧ are thus µ = 149 ;5. σ = 2.8

In conclusion, the MAD method is quick and simple and has a negligible deleterious effect on the statistics if the dataset does include outliers.

3