
Asymptotic Values and Expansions for the Correlation Between Different Measures of Spread

Anirban DasGupta

Purdue University, West Lafayette, IN

L.R. Haff

UCSD, La Jolla, CA

May 31, 2003

ABSTRACT

For iid samples from a normal, a uniform, or an exponential distribution, we give exact formulas for the correlation between the sample variance and the sample range for all fixed n. These exact formulas are then used to obtain asymptotic expansions for the correlations. It is seen that the correlation converges to zero at the rate log n/√n in the normal case, at the rate 1/√n in the uniform case, and at the rate (log n)²/√n in the exponential case. In two of the three cases, we obtain higher order expansions for the correlation. We then obtain the joint asymptotic distribution of the interquartile range and the standard deviation for any distribution with a finite fourth moment. This is used to obtain the nonzero limits of the correlation between them for some important distributions, as well as some potentially useful practical diagnostics based on the interquartile range and the standard deviation. It is seen that the correlation is higher for thin tailed and smaller for thick tailed distributions. We also show graphics for the Cauchy case. The graphics exhibit interesting phenomena. Other numerics illustrate the theoretical results.

1 Introduction

The sample standard deviation, range, and the interquartile range are variously used as measures of spread in the distribution of a random variable. Range based measures are still common in process control studies, while measures based on the interquartile range are sometimes used as robust measures of spread.

It would seem natural to expect that the three measures of spread share some reasonable positive dependence property for most types of populations, and perhaps for all sample sizes n. The purpose of the present article is to investigate the interrelations between them in greater depth than has been done before, mathematically as well as numerically. For example, we investigate the correlation between the sample range (W) or the interquartile range and the sample standard deviation s, both for fixed samples and asymptotically. We also investigate the joint asymptotic behavior of the standard deviation and the interquartile range in the sense of their joint distribution, and use these asymptotics to evaluate the exact limiting values of their correlation for a number of important distributions.

As a matter of mathematical tractability, it is much easier to analyze the correlation between s² and W, both for fixed samples and asymptotically. In the next three sections, we present exact formulae for the correlation between s² and W, for every fixed sample size n, when the underlying population is a normal, an Exponential, or a uniform. They are common distributions, and manifest a symmetric population with no tail (uniform), a symmetric population with a thin tail (normal), and a skewed population widely used in practice. Another reason for specifically working with these three cases is that they seem to be the only three standard distributions for which an exact formula for the correlation can be given for fixed sample sizes. Using the fixed sample formulas, we then derive asymptotic expansions for the correlation. The first term in the expansion gives us the rate of convergence of the correlation to zero in each case. For instance, we prove that the correlation converges to zero at the rate 1/√n in the uniform case, at the rate log n/√n in the normal case, and at the rate (log n)²/√n in the exponential case. These derivations involve a great deal of calculations, much of which, however, has been condensed for the sake of brevity.

Next, by use of the Bahadur representation of sample quantiles, we work out the asymptotic bivariate distribution of the interquartile range and the standard deviation for any distribution with four moments. We apply it to obtain the limits of the correlation between them for five important distributions. The general result can be used to obtain the limiting correlation for any distribution with four moments. We also use this general result to form some quick diagnostics based on the ratio of the interquartile range and the standard deviation. These may be of some practical use.

The article ends with graphics based on simulations for the scatterplots of s against W and against the IQR for the Cauchy distribution. The graphics show some interesting outlier and clustering phenomena.

We hope that the asymptotic calculations and the graphics presented here give some insight to a practicing statistician as well as to an applied probabilist. We also hope the asymptotic expansions are of some independent technical interest. There is considerable literature on a related classical problem, namely the distribution of W/s; see, for example, Plackett (1947), David, Hartley and Pearson (1954), and Thomson (1955). We have not addressed that problem here.

2 Uniform Distribution

Using a well known property of the order statistics of an iid uniform sample, we derive below an explicit formula for ρn = Corr(s², W) for every fixed sample size n. For brevity of presentation, certain intense algebraic details have been omitted.

Theorem 1. Let X1, X2, ..., Xn be iid U[a, b]. Then for each n ≥ 2,

ρn = 2√(5n(n+2)) / ((n+3)√(2n+3)).   (1)

Proof Without loss of generality, we may assume that a = 0 and b = 1. We derive expressions for Cov(s², W), Var(s²), and Var(W) in the following steps. We will evaluate the correlation between Σ_{i=1}^n (Xi − X̄)² and W, which is the same as ρn.

Cov(s2,W):

Step 1. It is well known that for U[0,1], W has the density n(n−1)w^{n−2}(1−w), 0 ≤ w ≤ 1, from which the moments of W follow. We need to evaluate E(W Σ_{i=1}^n (Xi − X̄)²).

Step 2. Towards this end, we use the fact that conditional on X(n),X(1), the remaining order statistics X(2),...,X(n−1) are distributed as the order statistics of a sample of size n-2 from U[X(1),X(n)].

Thus, denoting the order statistics of an iid sample of size n−2 from U[0,1] as U(1), ..., U(n−2), and writing Ū for their mean, we have:

E(W Σ_{i=1}^n (Xi − X̄)²)
= E E(W Σ_{i=1}^n (Xi − X̄)² | X(1), X(n))
= E E(W [(X(1) − X̄)² + (X(n) − X̄)² + Σ_{i=2}^{n−1} (X(i) − X̄)²] | X(1), X(n))
= E(W³) E((1 + (n−2)Ū)/n)² + E(W³) E((n−1)/n − (n−2)Ū/n)² + E(W³) E(Σ_{i=1}^{n−2} [U(i) − (1 + (n−2)Ū)/n]²)

= (E(W³)/n²) E[(1 + (n−2)Ū)² + (n−1 − (n−2)Ū)² + n² Σ_{i=1}^{n−2} (U(i) − (1 + (n−2)Ū)/n)²].   (2)

Step 3. From (2), using the facts that E(W³) = n(n−1)/((n+2)(n+3)), E(U) = 1/2, and Var(Ū) = 1/(12(n−2)), after a few lines of algebra it follows that:

Cov(Σ_{i=1}^n (Xi − X̄)², W) = E(W Σ_{i=1}^n (Xi − X̄)²) − E(Σ_{i=1}^n (Xi − X̄)²) E(W) = (n−1)/(3(n+1)(n+3)).   (3)

Step 4. Now, finally, we have to evaluate Var(Σ_{i=1}^n (Xi − X̄)²).

First note the identity

(Σ_{i=1}^n (Xi − X̄)²)² = (Σ_{i=1}^n Xi²)² − 2nX̄² Σ_{i=1}^n Xi² + n²X̄⁴.   (4)

Therefore,

E[(Σ_{i=1}^n (Xi − X̄)²)²] = Var(Σ_{i=1}^n Xi²) + (E Σ_{i=1}^n Xi²)² − 2n² E(X̄²X1²) + n² E(X̄⁴).   (5)

Step 5. Of these, obviously, Var(Σ_{i=1}^n Xi²) = 4n/45 and (E Σ_{i=1}^n Xi²)² = n²/9. Next,

E(X̄²X1²) = (1/n²) E(X1⁴ + 2X1³ Σ_{i≠1} Xi + X1² (Σ_{i≠1} Xi)²)

= (1/n²) (1/5 + (n−1)/4 + (1/3)((n−1)/12 + (n−1)²/4))

= (15n² + 20n + 1)/(180n²).   (6)

Step 6. Finally, E(X̄⁴) = (1/n⁴) E(Σ_i Xi)⁴, and in the expansion of (Σ_i Xi)⁴ the terms are of five kinds:

XiXjXkXl, Xi²XjXk, Xi²Xj², Xi³Xj (all indices distinct), and Xi⁴.   (7)

There are n(n−1)(n−2)(n−3) terms of the first kind, 6n(n−1)(n−2) terms of the second kind, 3n(n−1) terms of the third kind, 4n(n−1) terms of the fourth kind, and n terms of the fifth kind (the counts add to n⁴). Therefore, from (7), on some tedious algebra, it follows that:

E(X̄⁴) = (1/n⁴)(n(n−1)(n−2)(n−3)/16 + 6n(n−1)(n−2)/12 + 3n(n−1)/9 + 4n(n−1)/8 + n/5)

= (15n³ + 30n² + 5n − 2)/(240n³).   (8)

Step 7. Combining the previous steps,

Var(Σ_{i=1}^n (Xi − X̄)²) = (2n² + n − 3)/(360n) = (2n+3)(n−1)/(360n),

Var(W) = 2(n−1)/((n+1)²(n+2)),

and Cov(Σ_{i=1}^n (Xi − X̄)², W) = (n−1)/(3(n+1)(n+3)), from which the formula for ρn follows.
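As a quick independent check on Step 1 (not part of the original derivation), the low-order moments of W can be recovered by direct numerical integration of the stated density of W; the closed forms below are the ones used in Steps 3 and 7. A minimal sketch in Python:

```python
# Midpoint-rule check of the moments of the uniform sample range W,
# whose density is n(n-1) w^(n-2) (1-w) on [0, 1].
n = 10
m = 200_000  # number of midpoint-rule cells

def moment(k):
    tot = 0.0
    for j in range(m):
        w = (j + 0.5) / m
        tot += w**k * n * (n - 1) * w**(n - 2) * (1 - w) / m
    return tot

EW, EW2, EW3 = moment(1), moment(2), moment(3)
assert abs(EW - (n - 1) / (n + 1)) < 1e-7                             # E(W)
assert abs(EW2 - EW**2 - 2 * (n - 1) / ((n + 1)**2 * (n + 2))) < 1e-7  # Var(W)
assert abs(EW3 - n * (n - 1) / ((n + 2) * (n + 3))) < 1e-7             # E(W^3)
```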

Example 1 Using the exact formula of Theorem 1, the following correlations are obtained for selected values of n:

Table 1

n 2 3 4 5 10 15 20 30 50 100

ρn .956 .962 .944 .917 .786 .691 .622 .529 .424 .308

We see that in the uniform case, the correlation between s² and W is quite high till about n = 30, and the correlation seems to be largest at n = 3.
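The entries of Table 1 can be reproduced directly from the exact formula of Theorem 1; a minimal sketch in Python (not part of the paper):

```python
import math

def rho_uniform(n):
    # Exact correlation between s^2 and W for U[a, b] samples (Theorem 1)
    return 2 * math.sqrt(5 * n * (n + 2)) / ((n + 3) * math.sqrt(2 * n + 3))

# spot-check against Table 1
for n, expected in [(2, .956), (3, .962), (4, .944), (5, .917), (15, .691), (100, .308)]:
    assert round(rho_uniform(n), 3) == expected

# the maximum over n indeed occurs at n = 3
assert rho_uniform(3) > rho_uniform(2) and rho_uniform(3) > rho_uniform(4)
```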

Theorem 2. ρn admits the asymptotic expansion

ρn = √(10/n) − 11√5/(2√2 n^{3/2}) + 251√5/(16√2 n^{5/2}) + O(n^{−7/2}).   (9)

This asymptotic expansion follows on very careful combination of terms from the exact formula in (1). We omit the derivation. Note that we are able to give a third order expansion in the uniform case. This is because formula (1) is amenable to higher order expansions. The accuracy of the third order expansion is excellent; for example, for n = 15, ρn = .691, and the third order asymptotic expansion gives the approximation .695.
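The accuracy claim can be checked numerically; a short Python sketch (not from the paper) comparing the exact value with the three-term expansion at n = 15:

```python
import math

def rho_exact(n):
    # exact formula (1)
    return 2 * math.sqrt(5 * n * (n + 2)) / ((n + 3) * math.sqrt(2 * n + 3))

def rho_3rd_order(n):
    # three-term expansion (9)
    return (math.sqrt(10 / n)
            - 11 * math.sqrt(5) / (2 * math.sqrt(2) * n**1.5)
            + 251 * math.sqrt(5) / (16 * math.sqrt(2) * n**2.5))

assert round(rho_3rd_order(15), 3) == 0.695   # the approximation quoted in the text
assert round(rho_exact(15), 3) == 0.691       # the exact value quoted in the text
assert abs(rho_exact(15) - rho_3rd_order(15)) < 0.006
```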

3 Normal Distribution

The formula for ρn in the normal case follows from a familiar application of Basu’s theorem(Basu(1955)).

Theorem 3. Let X1, X2, ..., Xn be iid N(µ, σ²).

Let an = Γ((n+2)/2) / Γ((n−1)/2);

bn = Γ((n+3)/2) / Γ((n−1)/2);

γn = E(W); δn = E(W²);

and λn = E(s) = 2^{(2n−3)/2} (Γ(n/2))² / (√(π(n−1)) (n−2)!).   (10)

Then,

ρn = γn (2√2 an / ((n−1)^{3/2} λn) − 1) / √((4bn/(n−1)² − 1)(δn − γn²)).   (11)

Proof Without loss of generality, we may take µ =0,σ =1.

Step 1. By Basu's theorem (Basu (1955)), W/s and s are independent, and hence Cov(W, s²) = E(Ws²) − E(W)E(s²) = E((W/s)s³) − E(W)E(s²) = (E(W)/E(s)) E(s³) − E(W)E(s²) = γn (2√2 an / ((n−1)^{3/2} λn) − 1), by a direct calculation of E(s³) and E(s) using the chi-square(n−1) density. (Note that E(W) = E((W/s)s) = E(W/s)E(s), so E(W/s) = E(W)/E(s).)

Step 2. Next, Var(s²) = E(s⁴) − 1 = 4bn/(n−1)² − 1, again by a direct calculation of E(s⁴) from the chi-square(n−1) density. Also, from the definitions of δn and γn, Var(W) = δn − γn².

Step 3. Using the expressions for Cov(W, s2),Var(W ), and Var(s2), the formula for ρn follows on algebra.
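The gamma-function expression for λn in (10) can be sanity-checked against the direct chi-square(n−1) computation of E(s); a small Python sketch (not part of the paper; the equality of the two forms is a consequence of the Legendre duplication formula):

```python
import math

def lam_formula(n):
    # lambda_n = E(s) in the form given in (10)
    return (2**((2 * n - 3) / 2) * math.gamma(n / 2)**2
            / (math.sqrt(math.pi * (n - 1)) * math.factorial(n - 2)))

def lam_chisquare(n):
    # E(s) computed directly from the chi-square(n-1) density:
    # sqrt(2/(n-1)) Gamma(n/2) / Gamma((n-1)/2); lgamma avoids overflow.
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

for n in (2, 5, 10, 30):
    assert abs(lam_formula(n) - lam_chisquare(n)) < 1e-9

# lambda_n = 1 - 1/(4n) + o(1/n), the expansion used later in the proof of Theorem 4
n = 1000
assert abs(lam_chisquare(n) - (1 - 1 / (4 * n))) < 5e-6
```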

Example 2 Although a general closed form formula for γn and δn in terms of elementary functions is impossible, they can be computed for any given n (closed form formulas for small n are well known; see David (1970), or Arnold, Balakrishnan and Nagaraja (1992)). The following table gives the numerical value of the correlation ρn for some selected values of n.

Table 2

n 2 3 4 5 10 15 20 30 50 100

ρn .935 .952 .955 .951 .891 .844 .809 .772 .687 .602

The Table reveals that the correlations are substantially higher than in the uniform case (see Table 1). Even at n = 100, the correlation is .6. Also, again we see that the maximum correlation is for a small n, namely n = 4.

The next result gives the asymptotic order of ρn. Due to the great technical complexity in finding higher order approximations to the variance of W for normal samples, this result is weaker than the asymptotic expansion in Theorem 2 for the uniform case.

Theorem 4. ρn ∼ (5√3 / (2√2 π)) (log n / √n).   (12)

Remark It is rather interesting that asymptotically, the correlation between s and W is of the same order, with the exception that the constant is 2√6/π, which is about 1.559, while the constant in Theorem 4 above is 5√3/(2√2 π), which is about .975. Thus, s and W are slightly more correlated than s² and W. This is in fact manifest for fixed n as well. For example, at n = 10, 20, 30, 100, the correlation between s and W can be computed to be .901, .812, .773 and .606, compared to .891, .809, .772 and .602 for s² and W (Table 2).

Proof. The proof of Theorem 4 requires use of asymptotic theory for W for the normal case, as well as careful algebra with Stirling's approximation for the various terms in ρn.

Step 1. By use of Stirling's formula, an ∼ ((n+2)/2)^{3/2} (1 − 3/(n+2))^{3/2} (1 + 9/(4(n+2))), after several lines of algebra.

Step 2. Again, by Stirling's formula, λn = 1 − 1/(4n) + o(1/n), after algebra.

Step 3. From these, on a little more algebra, one gets Cov(s², W) = 5√(2 log n)/(4n) + o(√(log n)/n).

Step 4. By standard uniform integrability arguments, Var(s²) = 2/n + o(1/n).

Step 5. For iid N(0,1) samples, √(2 log n)(W − µn) ⇒ H, for µn = 2Φ^{−1}(1 − 1/n), with H denoting a distribution with density 2 exp(−x) K0(2 exp(−x/2)), K0 being the appropriate Bessel function (see, e.g., Serfling (1980)). The variance of the distribution H, by a direct integration, works out to π²/3. From uniform integrability arguments, that hold for the normal case, it follows that Var(W) ∼ π²/(6 log n).

Step 6. The first order approximation to ρn now follows by combining Steps 3, 4, and 5.

Remark. Comparing the leading term for ρn in the uniform case (Theorem 2) to Theorem 4, we see that the correlation dies out at a slower rate in the normal case. This is interesting. This asymptotic observation is in fact clearly observed by comparison of Table 1 and Table 2 as well.

4 Exponential Distribution

An exact formula for the correlation ρn for the exponential case will be derived by using a representation for the order statistics of an exponential sample in terms of iid exponential variables.

Theorem 5. Let X1, X2, ..., Xn be iid Exp(λ) variables. For k ≥ 2, let

ak = Σ_{i=1}^{k−1} 1/i;  bk = Σ_{i=1}^{k−1} 1/i²;  with a1 = 0.   (13)

Then,

ρn = [4(nbn − an) + 2((n+1)an − nbn + n Σ_{i=1}^{n−1} ai/i − Σ_{i=1}^{n−1} ai − n + 1)] / √((8n³ − 14n² + 6n) bn).   (14)

Proof We may assume without loss of generality that λ = 1. The derivation uses the following well known representation for the order statistics of an Exp(1) sample:

Let Z1, Z2, ..., Zn be iid Exp(1) variables. Then the order statistics X(1), X(2), ..., X(n) of an iid Exp(1) sample admit the representation X(i) = Σ_{k=1}^{i} Zk/(n−k+1).
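As a quick Monte Carlo sanity check (not part of the paper), the representation implies E(W) = Σ_{j=2}^{n} 1/(n−j+1) = Σ_{i=1}^{n−1} 1/i, which can be compared with direct sampling of the exponential range; a sketch:

```python
import random

# Monte Carlo check that E(W) for an Exp(1) sample of size n equals
# the harmonic sum 1 + 1/2 + ... + 1/(n-1), as the representation predicts.
random.seed(12345)
n, reps = 5, 200_000
tot = 0.0
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]
    tot += max(x) - min(x)
mean_W = tot / reps
expected = sum(1 / i for i in range(1, n))  # = 2.0833... for n = 5
assert abs(mean_W - expected) < 0.02        # well within Monte Carlo error
```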

Step 1. First note the obvious fact that ρn also equals the correlation between Σ_{i<j} (Xi − Xj)² and W, since Σ_{i<j} (Xi − Xj)² = n Σ_{i=1}^n (Xi − X̄)².

Step 2. Using the above representation for X(i), for i < j,

Σ_{i<j} (X(j) − X(i))² = Σ_{i<j} (Σ_{k=i+1}^{j} Zk/(n−k+1))²

= Σ_{i<j} [Σ_{k=i+1}^{j} Zk²/(n−k+1)² + 2 Σ_{i<k<l≤j} ZkZl/((n−k+1)(n−l+1))]

= Σ_{k=2}^{n} (k−1)Zk²/(n−k+1) + 2 Σ_{l=3}^{n} Σ_{k=2}^{l−1} (k−1)ZkZl/(n−k+1),   (15)

by rearranging the order of summation in the iterated sums.

Step 3. Likewise, obviously, W = Σ_{j=2}^{n} Zj/(n−j+1).

Step 4. By an easy calculation, Cov(Σ_{k=2}^{n} (k−1)Zk²/(n−k+1), Σ_{j=2}^{n} Zj/(n−j+1)) = 4 Σ_{i=1}^{n−1} (n−i)/i² = 4(nbn − an).

Step 5. On the other hand,

Cov(Σ_{l=3}^{n} Σ_{k=2}^{l−1} (k−1)ZkZl/(n−k+1), Σ_{j=2}^{n} Zj/(n−j+1))

= Σ_{l=3}^{n} Σ_{k=2}^{l−1} [(k−1)/(n−k+1)²] Cov(ZkZl, Zk) + Σ_{l=3}^{n} Σ_{k=2}^{l−1} [(k−1)/((n−k+1)(n−l+1))] Cov(ZkZl, Zl)

= Σ_{k=2}^{n−1} (k−1)(n−k)/(n−k+1)² + Σ_{k=2}^{n−1} [(k−1)/(n−k+1)] Σ_{j=1}^{n−k} 1/j.   (16)

From (16), by the change of variable i = n−k+1, this covariance equals

Σ_{i=2}^{n−1} (n−i)(i−1)/i² + Σ_{i=2}^{n−1} [(n−i)/i] Σ_{j=1}^{i−1} 1/j

= (n+1)an − nbn + n Σ_{i=1}^{n−1} ai/i − Σ_{i=1}^{n−1} ai − n + 1.   (17)

Step 6. Having the covariance term done, we now have to find the variance of Σ_{i<j} (Xi − Xj)²; towards this, the identity (4) is used.

Towards this end, one obtains the following expressions on detailed cal- culation :

E((Σ_{i=1}^n Xi²)²) = 4n² + 20n;

E(X̄²X1²) = 2(n² + 5n + 6)/n²;

and E(X̄⁴) = (n³ + 6n² + 11n + 6)/n³.   (18)

Combining all these expressions into the binomial expansion, as in Theorem 1, one gets:

Var(Σ_{i<j} (Xi − Xj)²) = 8n³ − 14n² + 6n.   (19)

Step 7. The formula for ρn now follows by substituting the covariance and the variance expressions from Steps 1-6.
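For n = 2, formula (14) can be checked by hand, since it then reduces to the correlation between W and W² for W ~ Exp(1): Cov(W², W) = E(W³) − E(W²)E(W) = 6 − 2 = 4 and Var(W²) = 24 − 4 = 20. A small Python sketch of this check (not from the paper):

```python
import math

def a(m):  # a_m = sum_{i=1}^{m-1} 1/i, so that a_1 = 0
    return sum(1 / i for i in range(1, m))

def b(m):  # b_m = sum_{i=1}^{m-1} 1/i^2
    return sum(1 / i**2 for i in range(1, m))

def rho_exp(n):
    # exact formula (14) for the exponential case
    an, bn = a(n), b(n)
    num = 4 * (n * bn - an) + 2 * ((n + 1) * an - n * bn
                                   + n * sum(a(i) / i for i in range(1, n))
                                   - sum(a(i) for i in range(1, n)) - n + 1)
    den = math.sqrt((8 * n**3 - 14 * n**2 + 6 * n) * bn)
    return num / den

# n = 2: correlation between W and W^2 for W ~ Exp(1) is 4/sqrt(20) = .894
assert abs(rho_exp(2) - 4 / math.sqrt(20)) < 1e-12
```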

Example 3 Exact values of ρn are given in the Table below for some selected values of n.

Table 3

n 2 3 4 5 10 15 20 30 50 100

ρn .894 .861 .844 .831 .786 .753 .727 .688 .634 .558

Remark From Table 3, we notice an interesting departure from the uniform and the normal cases. For small n, the correlations are smaller in the exponential case, and the maximum correlation appears to be attained at n = 2 itself, while for larger n, the correlation is in between the correlations for the uniform and the normal cases.

The final result gives an asymptotic expansion for ρn with an error of the order of 1/√n. Although it is a two-term expansion, due to the error being of the order of 1/√n, the accuracy is poor unless n is very large. For the sake of completeness, however, it is nice to have the expansion.

Theorem 6. ρn = (√3/(2π)) ((log n)²/√n + 2C (log n)/√n) + O(1/√n),   (20)

where C is the Euler constant.

Proof: The proof uses the asymptotic expansions, derived from the Euler summation formula, given below:

Σ_{i=n}^{∞} 1/i² = 1/n + 1/(2n²) + 1/(6n³) + O(n^{−4}),

Σ_{i=1}^{n} 1/i = C + log n + 1/(2n) − 1/(12n²) + O(n^{−4}).   (21)
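The two expansions in (21) are easy to confirm numerically; a short Python sketch (not from the paper), using Σ 1/i² = π²/6 for the tail sum:

```python
import math

C = 0.5772156649015329  # Euler's constant
n = 50

# tail of the sum of 1/i^2
tail = math.pi**2 / 6 - sum(1 / i**2 for i in range(1, n))
assert abs(tail - (1 / n + 1 / (2 * n**2) + 1 / (6 * n**3))) < 1e-7

# harmonic number H_n
H_n = sum(1 / i for i in range(1, n + 1))
assert abs(H_n - (C + math.log(n) + 1 / (2 * n) - 1 / (12 * n**2))) < 1e-7
```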

Step 1. By using (21), nbn − an = (π²/6)n + O(log n), on some algebra.

Step 2. Also by using (21), (n+1)an − nbn − n + 1 = n log n + O(n), on algebra.

Step 3. Similarly, Σ_{i=1}^{n−1} ai = n log n + O(n).

Step 4. The final term Σ_{i=1}^{n−1} ai/i is the hardest one. We analyze this term by using the summation by parts formula and (21): since a_{i+1}² − a_i² = (1/i)(2ai + 1/i),

2 Σ_{i=1}^{n−1} ai/i = Σ_{i=1}^{n−1} (a_{i+1}² − a_i²) − Σ_{i=1}^{n−1} 1/i² = an² − bn = (log n + C + O(1/n))² + O(1) = (log n)² + 2C log n + O(1).

These four steps take care of the covariance term in ρn.

Step 5. As regards the variance terms in ρn, √(8n³ − 14n² + 6n) = 2√2 n^{3/2}(1 + O(1/n)), and bn = (π²/6)(1 + O(1/n)).

Step 6. Substituting these approximations into the exact formula (14) for ρn, the asymptotic expansion in (20) follows after some algebra.
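As a numerical illustration (not part of the paper), the exact formula (14) can be compared at a large n with the two-term expansion (√3/(2π))((log n)² + 2C log n)/√n, C being the Euler constant; the sums in (14) are accumulated with running prefix sums so the evaluation is O(n). A sketch:

```python
import math

n = 10**6
C = 0.5772156649015329

H = 0.0           # at the start of iteration i, H = H_{i-1} = a_i
S1 = S2 = bn = 0.0
for i in range(1, n):
    S1 += H / i   # accumulates sum a_i / i
    S2 += H       # accumulates sum a_i
    bn += 1.0 / i**2
    H += 1.0 / i
an = H            # a_n = 1 + 1/2 + ... + 1/(n-1)

num = 4 * (n * bn - an) + 2 * ((n + 1) * an - n * bn + n * S1 - S2 - n + 1)
den = math.sqrt((8 * n**3 - 14 * n**2 + 6 * n) * bn)
rho_exact = num / den

L = math.log(n)
rho_expansion = math.sqrt(3) / (2 * math.pi) * (L**2 + 2 * C * L) / math.sqrt(n)

assert abs(rho_exact - rho_expansion) < 2e-3   # consistent with an O(1/sqrt(n)) error
```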

5 Interquartile Range and Standard Deviation

The interquartile range is another well accepted measure of spread. As such, the correlation between it and the sample standard deviation is also of intrinsic interest. Unlike the correlation between s and W, the correlation between s and the interquartile range does not converge to zero as n → ∞. In this section, we first derive the joint asymptotic distribution of s and the interquartile range for any population with four moments, and use it to obtain the limiting correlation between s and the interquartile range for a number of important distributions. We see some interesting effects of the thickness of the tail of the population on the correlation between s and the interquartile range.

5.1 Joint Asymptotics of Interquartile Range and Standard Deviation

Theorem 7. Let X1, X2, ..., Xn be iid observations from a CDF F on the real line with a finite fourth moment and a positive density function f(x). Fix 0 < p1 < p2 < 1, let ξp1, ξp2 denote the corresponding quantiles of F, and let Q = X([np2]) − X([np1]). Then

√n ((Q, s) − (ξp2 − ξp1, σ)) ⇒ N(0, AΣA′),

where A = ((aij)), Σ = ((σij)), with

a11 = a12 = 0, a13 = 1, a21 = −µ/σ, a22 = 1/(2σ), a23 = 0;   (22)

σ11 = σ², σ12 = E(X³) − µE(X²),

σ13 = ∫_{−∞}^{ξp1} x f(x) dx / f(ξp1) − ∫_{−∞}^{ξp2} x f(x) dx / f(ξp2) − µ (p1/f(ξp1) − p2/f(ξp2)),

σ22 = E(X⁴) − (E(X²))²,

σ23 = ∫_{−∞}^{ξp1} x² f(x) dx / f(ξp1) − ∫_{−∞}^{ξp2} x² f(x) dx / f(ξp2) − E(X²) (p1/f(ξp1) − p2/f(ξp2)),

σ33 = p2(1−p2)/f²(ξp2) + p1(1−p1)/f²(ξp1) − 2p1(1−p2)/(f(ξp1) f(ξp2)).   (23)

Proof The main tool needed in deriving the joint asymptotic distribution of (Q, s) is the Bahadur representation:

X([np2]) − X([np1]) = Z̄ + op(1/√n), where

Zi = ξp2 + (p2 − I{Xi ≤ ξp2})/f(ξp2) − ξp1 − (p1 − I{Xi ≤ ξp1})/f(ξp1);

see, e.g., Serfling (1980).

Step 1. By the multivariate central limit theorem, on the obvious centering and normalization by √n, (X̄, (1/n) Σ Xi², Z̄) ⇒ N(0, Σ).

Step 2. Consider the transformation h(u, v, z) = (z, √(v − u²)). The gradient matrix of h has first row (0, 0, 1) and second row (−u/√(v − u²), 1/(2√(v − u²)), 0).

Step 3. The stated joint asymptotic distribution for (Q, s) now follows from an application of the delta theorem and the Bahadur representation stated above. We omit the intermediate calculation.

An important consequence of Theorem 7 is the following result :

Corollary 1 Under the hypotheses of Theorem 7, √n (Q/s − (ξp2 − ξp1)/σ) ⇒ N(0, c′AΣA′c), where c′ = (1/σ, (ξp1 − ξp2)/σ²).

The corollary follows on another application of the delta theorem to the result in Theorem 7.

Corollary 2 Let ρn* = Corr(IQR, s). Then

a) limn→∞ ρn* = .6062 if F = Normal;

b) limn→∞ ρn* = .8373 if F = Uniform;

c) limn→∞ ρn* = .3982 if F = Exponential;

d) limn→∞ ρn* = .4174 if F = Double Exponential;

e) limn→∞ ρn* = .3191 if F = t(5).

Corollary 2 follows on doing the requisite calculation from the matrix AΣA′ in Theorem 7. Thus, note that the correlation between IQR and s does not die out as n → ∞, unlike the correlation between W and s. It is also interesting to see the much higher correlation between IQR and s for the Uniform case, and that successively heavier tails lead to a progressively smaller correlation. The Table below provides the values of the correlation for some selected finite values of n.
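For the standard normal the calculation behind Corollary 2(a) has closed-form ingredients (∫ xφ = −φ and ∫ x²φ = Φ − xφ, with µ = 0, σ = 1), so the limit can be reproduced in a few lines; a Python sketch (not part of the paper), which also recovers the normal-case constants used later in Corollary 3:

```python
import math
from statistics import NormalDist

# Limiting correlation of (IQR, s) for F = N(0,1), p1 = 1/4, p2 = 3/4.
nd = NormalDist()
p1, p2 = 0.25, 0.75
x1, x2 = nd.inv_cdf(p1), nd.inv_cdf(p2)
f1, f2 = nd.pdf(x1), nd.pdf(x2)

s22 = 3.0 - 1.0  # E(X^4) - (E(X^2))^2 for N(0,1)
s23 = ((nd.cdf(x1) - x1 * f1) / f1 - (nd.cdf(x2) - x2 * f2) / f2
       - 1.0 * (p1 / f1 - p2 / f2))
s33 = (p2 * (1 - p2) / f2**2 + p1 * (1 - p1) / f1**2
       - 2 * p1 * (1 - p2) / (f1 * f2))

# Here A = ((0,0,1), (-mu/sigma, 1/(2 sigma), 0)) = ((0,0,1), (0, 1/2, 0)), so
var_Q, var_s, cov_Qs = s33, s22 / 4, s23 / 2
rho_star = cov_Qs / math.sqrt(var_Q * var_s)
assert round(rho_star, 4) == 0.6062          # Corollary 2(a)

# and the Corollary 3(a) constants: sqrt(n)(IQR/s - 1.349) => N(0, 1.566)
c2 = x1 - x2
v = var_Q + 2 * c2 * cov_Qs + c2**2 * var_s
assert round(x2 - x1, 3) == 1.349
assert round(v, 3) == 1.566
```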

Example 4

Table 4

(entries are ρn*)

n              10    15    20    30    40    50
Normal        .583  .598  .594  .597  .600  .599
Uniform       .699  .774  .787  .792  .803  .813
Exponential   .412  .433  .436  .398  .421  .406
Double Exp.   .470  .476  .441  .447  .423  .424
t(5)          .464  .489  .453  .442  .440  .423

The correlation is remarkably stable, except in the uniform case, and even there it is stable for n ≥ 15 or so, that is, unless n is quite small. We find this stability of the correlation across a very large range of values of n interesting and also surprising.

5.2 Thumb Rule Based on IQR and s

Simple thumb rules for quick diagnostics can be formed by using the result of Corollary 1. We present them only for normal and Exponential data here; but they can be formed for any distribution with four moments by using Corollary 1.

Corollary 3 a) √n (IQR/s − 1.349) ⇒ N(0, 1.566) if F = normal;

b) √n (IQR/s − 1.099) ⇒ N(0, 3.060) if F = Exponential.

Corollary 3 is a direct consequence of Corollary 1, by taking p1 = 1/4, p2 = 3/4, and on computing the mean and the variance of the limiting normal distribution for IQR/s by using the expressions in Corollary 1.

Using 1.5 standard deviations around the mean value as the plausible range for IQR/s (the choice of 1.5 is evidently somewhat subjective), we have the following thumb rules for normal and Exponential data:

Thumb Rule

For iid normal data, IQR/s should be in the intervals [.85, 1.85], [.975, 1.725], [1.05, 1.65], [1.08, 1.6] for n = 15, 25, 40, 50 respectively.

For iid Exponential data, IQR/s should be in the intervals [.4, 1.75], [.6, 1.6], [.68, 1.5], [.75, 1.45] for n = 15, 25, 40, 50 respectively.

The overlaps between the two sets of intervals decrease as n increases. The thumb rule for the normal case above may be of some practical use.
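The normal-data intervals above follow mechanically from Corollary 3(a); a Python sketch reproducing them (not from the paper; the text's intervals are quoted with coarse rounding, so the comparison is to within .02):

```python
import math

# Thumb-rule interval for IQR/s with normal data: 1.349 +/- 1.5*sqrt(1.566/n),
# using the limiting mean and variance from Corollary 3(a).
mean_ratio, limit_var = 1.349, 1.566
intervals = {}
for n in (15, 25, 40, 50):
    half = 1.5 * math.sqrt(limit_var / n)
    intervals[n] = (mean_ratio - half, mean_ratio + half)

for n, (lo, hi) in [(15, (.85, 1.85)), (25, (.975, 1.725)),
                    (40, (1.05, 1.65)), (50, (1.08, 1.6))]:
    assert abs(intervals[n][0] - lo) < 0.02 and abs(intervals[n][1] - hi) < 0.02
```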

6 The Cauchy Case

The Cauchy case is always an interesting one to address because of its lack of moments. Thus, the correlation between s and either W or the interquartile range is not defined. Still, one would certainly expect some positive dependence. This is explored in this final section by use of graphics based on simulations for three different sample sizes: n = 10, 25, and 100. The s vs. IQR scatterplots are fundamentally different; they show a massive concentration of the points close to the vertical axis, and a small fraction of stray points. We believe this is an extreme form, for the Cauchy case, of the connection between the association of s and IQR and the thickness of the tail of the population previously seen in Section 5. The graphics show two interesting phenomena for the W vs. s scatterplots: there is always an outlier in the scatterplots, and there is also an interesting clustering. The clustering gets somewhat blurred as n increases. However, there is an obvious positive dependence seen in each scatterplot. It would be interesting to quantify this mathematically by using some measure of dependence other than correlation.

References

Arnold, B., Balakrishnan, N. and Nagaraja, H. (1992). A First Course in Order Statistics, John Wiley, New York.

Basu, D. (1955). On Statistics Independent of a Complete Sufficient Statistic, Sankhya, 15, 377-380.

David, H.A. (1970). Order Statistics, John Wiley, New York.

David, H.A., Hartley, H.O. and Pearson, E.S. (1954). The Distribution of the Ratio, in a Single Sample, of Range to Standard Deviation, Biometrika, 482-493.

Plackett, R.L. (1947). Limits of the Ratio of Mean Range to Standard Deviation, Biometrika, 34, 120-122.

Serfling, R. (1980). Approximation Theorems of Mathematical Statistics, John Wiley, New York.

Thomson, G.W. (1955). Bounds for the Ratio of Range to Standard Deviation, Biometrika, 42, 268-269.

van der Vaart, A.W. (1998). Asymptotic Statistics, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge.

[Figure 1: scatterplot of the range W against s, 500 simulations from the Cauchy, n = 10.]

[Figure 2: scatterplot of the IQR against s, 500 simulations from the Cauchy, n = 10.]

[Figure 3: scatterplot of the range W against s, 500 simulations from the Cauchy, n = 25.]

[Figure 4: scatterplot of the IQR against s, 500 simulations from the Cauchy, n = 25.]

[Figure 5: scatterplot of the range W against s, 500 simulations from the Cauchy, n = 100.]

[Figure 6: scatterplot of the IQR against s, 500 simulations from the Cauchy, n = 100.]