
JOURNAL OF RESEARCH of the National Bureau of Standards-B. Mathematics and Mathematical Physics Vol. 65B, No. 2, April-June 1961

Some Properties of the Empirical Distribution Function of a Random Process

M. M. Siddiqui

(January 24, 1961)

Let $\{X(t)\}$ be a continuous-time, continuous-in-the-mean, real, strictly stationary random process. If $x$ is a given number, denote by $p_T(x)$ the proportion of the time $X(t)$ exceeds $x$, $0 \le t \le T$. The covariance of $p_T(x_1)$ and $p_T(x_2)$ is obtained. An approximate solution for the variance of a sample quantile is also given. The general results are specialized for Gaussian and $\chi^2$ processes.

1. Introduction

The proportion-of-time distribution plays a very important role in the statistical analysis of time series data. The practice is to plot $p(x)$, the proportion of the time the record, $X(t)$, exceeds $x$, against $x$ [e.g., 1, 2].¹ In general, if $X(t)$ is a wide-sense stationary process and a record over $0 \le t \le T$ is available, an estimate, $p_T(x)$, of the probability that $X(t) \ge x$ is obtained from the sample. In this paper we will derive the covariance of $p_T(x_1)$ and $p_T(x_2)$. An approximate solution for the variance of a sample quantile will also be given. The general results will be specialized for Gaussian and $\chi^2$ processes. The Rayleigh process is a special case of the $\chi^2$ process; however, due to its wide applications, the result for this particular case will also be stated.

2. Some Properties of Time Averages

Let $\{X_1(t), X_2(t)\}$, $t \in R^{(1)}$, be a wide-sense stationary, real, continuous-parameter, continuous-in-the-mean, vector process. $t$ will be called time and $R^{(1)}$ will be taken to be the real line.

For $i = 1, 2$, and for all $t$ and $s$, let

$$EX_i(t) = m_i, \qquad EX_i(t)X_i(t+s) - m_i^2 = R_i(s), \tag{2.1}$$

$$EX_1(t)X_2(t+s) - m_1 m_2 = R_{12}(s),$$
where $EX$ denotes the mathematical expectation of $X$. It will be assumed that all quantities in (2.1) are finite. When statements are made which apply to either of the processes, or when the properties of a single process are considered, the process will be referred to as $\{X(t)\}$, and the subscripts will be omitted, e.g., $EX(t) = m$, $EX(t)X(t+s) - m^2 = R(s)$. Let $T$ be a positive number, $N$ a positive integer, $d = T/N$, and denote $X_i(kd)$ by $X_i(k)$,

$R_i(kd)$ by $R_i(k)$, etc., since there will be little danger of confusion. $\{X_1(k), X_2(k)\}$, $k = 0, \pm 1, \pm 2, \ldots$, is, then, a wide-sense stationary, real, discrete-parameter, vector process. We may consider the discrete-time process and extend the results to the continuous-time process by taking the limit as $N \to \infty$, $T$ fixed; or, we may consider the continuous-time process and obtain results for the discrete-time process by replacing integrals by appropriate summations. In general, we will pursue the latter procedure.

¹ Figures in brackets indicate the literature references at the end of this paper.

Let $t_1, t_2$ be arbitrary numbers, and $T_1, T_2$ arbitrary positive numbers. Define

$$X_{iT_i}(t_i) = T_i^{-1}\int_{t_i}^{t_i+T_i} X_i(t)\,dt, \qquad i = 1, 2. \tag{2.2}$$

It will be assumed that the subscripts on $X$ are so chosen that $T_2 \ge T_1$.

For the discrete-time case, we take $k_1, k_2$ arbitrary integers; $N_1, N_2$ arbitrary positive integers, choosing the subscripts on $X$ so that $N_2 \ge N_1$; and define

$$X_{iN_i}(k_i) = N_i^{-1}\sum_{k=k_i}^{k_i+N_i-1} X_i(k), \qquad i = 1, 2. \tag{2.2'}$$
Now
$$EX_{iT_i}(t_i) = m_i, \qquad i = 1, 2, \tag{2.3}$$
and

$$\operatorname{cov}\{X_{1T_1}(t_1), X_{2T_2}(t_2)\} = (T_1T_2)^{-1}\int_{t_1}^{t_1+T_1}\!\!\int_{t_2}^{t_2+T_2} R_{12}(t - t')\,dt\,dt', \tag{2.4}$$
where $t'$ is the variable of integration attached to the first process and $t$ to the second.

The interchange of the operations of expectation and integration in (2.3) and (2.4) is justified as $EX_i^2(t) < \infty$, $i = 1, 2$. We now make the transformation $u = t - t'$, $v = t'$. The Jacobian of the transformation is unity. The limits of $u$ and $v$ are given by
$$\max(t_1,\ t_2 - u) \le v \le \min(t_1 + T_1,\ t_2 + T_2 - u), \qquad t_2 - t_1 - T_1 \le u \le t_2 - t_1 + T_2.$$

Hence (2.4) becomes
$$\operatorname{cov}\{X_{1T_1}(t_1), X_{2T_2}(t_2)\} = (T_1T_2)^{-1}\int_{s-T_1}^{s+T_2}\left[\min(t_1+T_1,\ t_2+T_2-u) - \max(t_1,\ t_2-u)\right]R_{12}(u)\,du, \qquad s = t_2 - t_1.$$

Here
$$\max(t_1,\ t_2 - u) = \begin{cases} t_2 - u, & \text{when } u \le s,\\ t_1, & \text{otherwise}, \end{cases} \tag{2.5}$$
and similarly $\min(t_1+T_1,\ t_2+T_2-u) = t_1+T_1$ when $u \le s + T_2 - T_1$, and $= t_2+T_2-u$ otherwise.

Replacing $t_1$ by $t$, we obtain, for arbitrary $t$ and $s$,

$$\operatorname{cov}\{X_{1T_1}(t), X_{2T_2}(t+s)\} = (T_1T_2)^{-1}\left[\int_{s-T_1}^{s}(T_1-s+u)R_{12}(u)\,du + T_1\int_{s}^{s+T_2-T_1}R_{12}(u)\,du + \int_{s+T_2-T_1}^{s+T_2}(s+T_2-u)R_{12}(u)\,du\right]. \tag{2.6}$$

Remark 1. Setting $X_1(t) = X_2(t) = X(t)$ in (2.6), we obtain
$$\operatorname{cov}\{X_{T_1}(t), X_{T_2}(t+s)\} = (T_1T_2)^{-1}\left[\int_{-T_1+s}^{s}(T_1-s+u)R(u)\,du + T_1\int_{s}^{s+T_2-T_1}R(u)\,du + \int_{s+T_2-T_1}^{s+T_2}(s+T_2-u)R(u)\,du\right]. \tag{2.7}$$

Remark 2. Setting $T_1 = T_2 = T$ in (2.7), we obtain
$$\operatorname{cov}\{X_T(t), X_T(t+s)\} = T^{-2}\left[\int_{s-T}^{s}(T-s+u)R(u)\,du + \int_{s}^{s+T}(s+T-u)R(u)\,du\right]. \tag{2.8}$$

Remark 3. Setting $s = 0$ in (2.8), we have

$$\operatorname{var} X_T(t) = 2T^{-1}\int_0^T\left(1 - \frac{u}{T}\right)R(u)\,du. \tag{2.9}$$
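As a numerical illustration (an addition, not part of the original paper), the following Python sketch checks (2.9) against simulation for the assumed covariance $R(u) = e^{-\lambda|u|}$; the process, parameter values, and variable names are illustrative assumptions.

```python
import numpy as np

# Check (2.9): var X_T = (2/T) * integral_0^T (1 - u/T) R(u) du,
# for an assumed covariance R(u) = exp(-lam*|u|) (an AR(1)/Ornstein-Uhlenbeck
# type process with unit variance). All parameter values are illustrative.
rng = np.random.default_rng(0)
lam, T, N = 1.0, 50.0, 5000       # decay rate, record length, grid points
d = T / N                         # sampling interval
rho = np.exp(-lam * d)            # one-step correlation of the sampled process

# Right-hand side of (2.9) by a midpoint Riemann sum
u = (np.arange(N) + 0.5) * d
var_theory = (2.0 / T) * np.sum((1.0 - u / T) * np.exp(-lam * u)) * d

# Monte Carlo: simulate stationary records and form the time average X_T
reps = 4000
x = rng.standard_normal(reps)     # stationary start, unit variance
avg = np.zeros(reps)
for _ in range(N):
    x = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(reps)
    avg += x / N

print("theory (2.9):", var_theory)    # ~ 2/(lam*T) for large lam*T
print("simulation  :", avg.var())
```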

From (2.3) to (2.9) we deduce that, for fixed $T_1$ and $T_2$, $\{X_{1T_1}(t), X_{2T_2}(t)\}$ is a wide-sense stationary, continuous-time and continuous-in-the-mean process. For the discrete-time case, we will state only the results corresponding to (2.6) and (2.9). When translating a sum of integrals, such as
$$\int_a^b f(u)\,du + \int_b^c f(u)\,du,$$

into a Riemann sum, we should be careful to note that, whereas $b$ appears twice in the limits of the integrals, the corresponding quantity will appear only once in the sum of the two summations. Thus, for arbitrary integers $k$ and $s$,

$$\operatorname{cov}\{X_{1N_1}(k), X_{2N_2}(k+s)\} = (N_1N_2)^{-1}\left[\sum_{u=s-N_1+1}^{s-1}(N_1-s+u)R_{12}(u) + N_1\sum_{u=s}^{s+N_2-N_1}R_{12}(u) + \sum_{u=s+N_2-N_1+1}^{s+N_2-1}(s+N_2-u)R_{12}(u)\right]. \tag{2.6'}$$
Similarly,
$$\operatorname{var} X_N(k) = N^{-1}R(0) + 2N^{-1}\sum_{u=1}^{N-1}\left(1 - \frac{u}{N}\right)R(u). \tag{2.9'}$$

The results (2.9) and (2.9') are well known [3, p. 80].

3. Empirical Distribution Function

It will be assumed further that $\{X(t)\}$ is a strictly stationary process so that the probability distribution of $X(t_1+h), \ldots, X(t_n+h)$ is the same as that of $X(t_1), \ldots, X(t_n)$ for every

selection of $n$, $t_1, \ldots, t_n$, and $h$. Define
$$p(t; x) = \begin{cases} 1, & \text{if } X(t) \ge x,\\ 0, & \text{otherwise}. \end{cases} \tag{3.1}$$
Then
$$p_T(x) = \frac{1}{T}\int_0^T p(t; x)\,dt \tag{3.2}$$
is the proportion of the time $x$ is exceeded by $X(t)$ in the time interval $(0, T)$, i.e., $Tp_T(x)$ is the Lebesgue measure of the set $S_x = \{t : X(t) \ge x,\ 0 \le t \le T\}$. We note the following:

(1) $p(t; -\infty) = 1$ for all $t$; hence $p_T(-\infty) = 1$; (2) $p(t; +\infty) = 0$ for all $t$; hence $p_T(+\infty) = 0$; (3) if $x_2 > x_1$, then $X(t) \ge x_2$ implies $X(t) \ge x_1$.

$S_{x_2}$ is, therefore, a subset of $S_{x_1}$, and hence

$p_T(x_2) \le p_T(x_1)$. Properties (1) to (3) imply that $q_T(x) = 1 - p_T(x)$ is a distribution function. $q_T(x)$ will be called the empirical univariate distribution function of the process $\{X(t)\}$. We will, however, consider $p_T(x)$ rather than $q_T(x)$. Suppressing $x$ from $p(t; x)$ for the time being, we have

$$\Pr[p(t_1+h) = 1, \ldots, p(t_k+h) = 1,\ p(t_{k+1}+h) = 0, \ldots, p(t_n+h) = 0]$$

$$= \Pr[X(t_1+h) \ge x, \ldots, X(t_k+h) \ge x,\ X(t_{k+1}+h) < x, \ldots, X(t_n+h) < x]$$

$$= \Pr[X(t_1) \ge x, \ldots, X(t_k) \ge x,\ X(t_{k+1}) < x, \ldots, X(t_n) < x], \tag{3.3}$$
so that the process $\{p(t; x)\}$ is also strictly stationary. Writing

$$P(x) = \Pr(X(t) \ge x), \tag{3.4}$$
we have
$$Ep^r(t; x) = P(x), \quad r = 1, 2, \ldots, \qquad \operatorname{var} p(t; x) = P(x)[1 - P(x)], \tag{3.5}$$
and
$$A(s; x) = \operatorname{cov}\{p(t; x),\ p(t+s; x)\}$$

$$= \Pr\{X(0) \ge x,\ X(s) \ge x\} - P^2(x).$$

Let $x_2 \ge x_1$; then

$$A(s; x_1, x_2) = \operatorname{cov}\{p(t; x_1),\ p(t+s; x_2)\} = Ep(t; x_1)\,p(t+s; x_2) - P(x_1)P(x_2)$$

$$= \Pr\{X(t) \ge x_1,\ X(t+s) \ge x_2\} - P(x_1)P(x_2)$$

$$= \Pr\{X(0) \ge x_1,\ X(s) \ge x_2\} - P(x_1)P(x_2). \tag{3.6}$$
We note the following: $Ep_T(x) = P(x)$, $A(s; x, x) = A(s; x)$, and

$A(0; x_1, x_2) = P(x_2)[1 - P(x_1)]$. Using (2.6) with $s = 0$, $T_1 = T_2 = T$, we have

$$G(x_1, x_2) = \operatorname{cov}\{p_T(x_1), p_T(x_2)\} = \frac{2}{T}\int_0^T\left(1 - \frac{s}{T}\right)A(s; x_1, x_2)\,ds, \qquad G(x, x) = \operatorname{var} p_T(x) = \frac{2}{T}\int_0^T\left(1 - \frac{s}{T}\right)A(s; x)\,ds. \tag{3.7}$$

For the discrete-time case,
$$p_N(x) = N^{-1}\sum_{k=1}^{N} p(k; x),$$
$$\operatorname{cov}\{p_N(x_1), p_N(x_2)\} = N^{-1}P(x_2)[1 - P(x_1)] + 2N^{-1}\sum_{s=1}^{N-1}(1 - s/N)\,A(s; x_1, x_2), \tag{3.7'}$$

$$\operatorname{var} p_N(x) = N^{-1}P(x)[1 - P(x)] + 2N^{-1}\sum_{s=1}^{N-1}(1 - s/N)\,A(s; x).$$
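In the discrete-time form the variance is a finite sum and can be evaluated directly. The sketch below is an added illustration; the function name, and the choice of $A$ taken from the Gaussian case of section 5.1, are assumptions.

```python
import numpy as np

# Evaluate (3.7'): var p_N(x) = N^{-1} P(x)[1 - P(x)]
#                  + 2 N^{-1} sum_{s=1}^{N-1} (1 - s/N) A(s; x).
def var_p_N(N, P_x, A):
    """P_x = P(x); A is a callable giving A(s; x) for integer lags s >= 1."""
    s = np.arange(1, N)
    return (P_x * (1.0 - P_x) + 2.0 * np.sum((1.0 - s / N) * A(s))) / N

# Example: a Gaussian process observed at its median (x = m, P(m) = 1/2),
# where A(s; m) = arcsin(rho(s)) / (2*pi) -- see section 5.1.  The sampling
# step d and decay rate lam below are assumed values.
lam, d = 1.0, 0.1
A_median = lambda s: np.arcsin(np.exp(-lam * s * d)) / (2.0 * np.pi)
print(var_p_N(1000, 0.5, A_median))
```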

To estimate $A(s; x_1, x_2)$ and $A(s; x)$, define

$$p(t, t+s; x_1, x_2) = \begin{cases} 1, & \text{if } X(t) \ge x_1,\ X(t+s) \ge x_2,\\ 0, & \text{otherwise}; \end{cases}$$
and use

$$(T-s)^{-1}\int_0^{T-s} p(t, t+s; x_1, x_2)\,dt$$
to estimate $\Pr\{X(t) \ge x_1,\ X(t+s) \ge x_2\}$.
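From a sampled record these quantities are simple counting estimates. The following sketch (the names and the AR(1) demo process are assumptions introduced for illustration) estimates $p_N(x)$ and $A(s; x_1, x_2)$ and compares the latter with the Gaussian value from section 5.1.

```python
import numpy as np

# Estimators of section 3 from a sampled record x[0], ..., x[N-1].
def p_N(x, level):
    """Proportion of time the record is at or above `level` (eq 3.2, discretized)."""
    return np.mean(x >= level)

def A_hat(x, lag, x1, x2):
    """Estimate A(s; x1, x2): empirical joint exceedance at lag `lag`
    minus the product of the marginal estimates (eq 3.6)."""
    joint = np.mean((x[:-lag] >= x1) & (x[lag:] >= x2))
    return joint - p_N(x, x1) * p_N(x, x2)

# Demo on a simulated stationary AR(1) record (an assumed example process).
rng = np.random.default_rng(1)
rho, N = 0.95, 200_000
eps = rng.standard_normal(N)
x = np.empty(N)
x[0] = eps[0]
for k in range(1, N):
    x[k] = rho * x[k - 1] + np.sqrt(1.0 - rho**2) * eps[k]

print("p_N(0)   :", p_N(x, 0.0))                 # ~ 0.5
print("A_hat    :", A_hat(x, 10, 0.0, 0.0))      # ~ arcsin(rho^10)/(2 pi)
print("Gaussian :", np.arcsin(rho**10) / (2.0 * np.pi))
```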

4. Variance of a Quantile

Given $x$, $p_T(x)$ is a random variable. In the preceding section we obtained the variance of

$p_T(x)$, and the covariance of $p_T(x_1)$ and $p_T(x_2)$. In this section we will obtain an approximation to the variance of a quantile of the empirical distribution function. We suppose that a number $q$, $0 < q < 1$, is specified. If $U$ denotes the number such that

$$p_T(U) = q, \tag{4.1}$$

then $U$ is a random variable. Let $\xi$ be the number such that
$$P(\xi) = q, \tag{4.2}$$
i.e., $\xi$ is the $q$th quantile of the univariate distribution function of the process $\{X(t)\}$. In what follows the primes will denote differentiation with respect to the indicated argument. Furthermore, it will be assumed that $P(\xi)$ and $p_T(\xi)$ have derivatives. Now
$$q = p_T(U) \doteq p_T(\xi) + (U - \xi)\,p_T'(\xi),$$

where $\doteq$ means "approximately equal to"; hence,

$$p_T(\xi) - P(\xi) \doteq -(U - \xi)\,p_T'(\xi); \tag{4.3}$$

taking expectations,
$$0 = -E\{(U - \xi)[\,p_T'(\xi) - P'(\xi)]\} - P'(\xi)\,E(U - \xi).$$
Now

$$Ep_T'(\xi) = E\lim_{h\to 0}\frac{p_T(\xi+h) - p_T(\xi)}{h} = \lim_{h\to 0}\frac{P(\xi+h) - P(\xi)}{h} = P'(\xi);$$
therefore

$$\operatorname{cov}(U,\ p_T'(\xi)) = -P'(\xi)\,E(U - \xi).$$
As a first approximation we suppose that

$$E(U - \xi) = 0, \tag{4.4}$$
so that
$$\operatorname{cov}(U,\ p_T'(\xi)) = 0. \tag{4.5}$$

As a further approximation suppose that $U$ and $p_T'(\xi)$ are independent. Writing $\delta y = y - Ey$, where $y$ is a random variable, we have

$$\delta p_T(\xi) \doteq -p_T'(\xi)\,\delta U,$$
so that

$$\operatorname{var} p_T(\xi) \doteq \operatorname{var} U \cdot E p_T'^2(\xi). \tag{4.6}$$
Now
$$E p_T'^2(\xi) = P'^2(\xi) + \lim_{h,k\to 0}(hk)^{-1}\left[G(\xi+h, \xi+k) - G(\xi+h, \xi) - G(\xi, \xi+k) + G(\xi, \xi)\right],$$

so that
$$\operatorname{var} U \doteq \operatorname{var} p_T(\xi)\big/E p_T'^2(\xi), \tag{4.7}$$
and $\operatorname{var} p_T(\xi)$ is given by (3.7). The result (4.7) assumes the existence of the various derivatives appearing in the equation. In applying this result it is necessary to look into the various approximating assumptions which are used in arriving at (4.7). The approximation (4.5) seems fairly reasonable; but the assumption of independence of $U$ and $p_T'(\xi)$ needs careful examination in each individual case. From (3.7),

$$\frac{\partial^2 G(x_1, x_2)}{\partial x_1\,\partial x_2} = \frac{2}{T}\int_0^T\left(1 - \frac{s}{T}\right)\frac{\partial^2 A(s; x_1, x_2)}{\partial x_1\,\partial x_2}\,ds, \tag{4.8}$$
in case the interchange of the integration and the partial differentiation is justified. If

$$\frac{\partial^2 G(x_1, x_2)}{\partial x_1\,\partial x_2}\bigg|_{x_1 = x_2 = \xi} = O(T^{-1}),$$
then
$$E p_T'^2(\xi) = P'^2(\xi)\,[1 + O(T^{-1})],$$

and therefore
$$\operatorname{var} U = \frac{\operatorname{var} p_T(\xi)}{P'^2(\xi)}\,[1 + O(T^{-1})]. \tag{4.9}$$
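A minimal numerical illustration of (4.9), under the assumptions of the Gaussian median case worked out in section 5 (the parameter values are assumptions chosen for the example):

```python
import numpy as np

# var U ~ var p_T(xi) / P'(xi)^2, per (4.9), for a Gaussian process at its
# median (xi = m, q = 1/2) with rho(s) = exp(-lam*s).
lam, T, sigma, n = 1.0, 200.0, 1.0, 20000

# var p_T(m) from (3.7) with A(s; m) = arcsin(rho(s))/(2*pi) (section 5.1),
# evaluated by a midpoint Riemann sum.
s = (np.arange(n) + 0.5) * (T / n)
A = np.arcsin(np.exp(-lam * s)) / (2.0 * np.pi)
var_pT = (2.0 / T) * np.sum((1.0 - s / T) * A) * (T / n)

# P(x) = Pr(X >= x), so P'(m) = -1/(sigma*sqrt(2*pi)) at the median.
dP = -1.0 / (sigma * np.sqrt(2.0 * np.pi))
print("var U ~", var_pT / dP**2)   # compare 2*pi*sigma^2 * 0.35/(lam*T) ~ 0.011
```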

5. Applications

5.1. Gaussian Process

If $\{X(t)\}$ is Gaussian, write

$$Y(t) = [X(t) - m]/\sigma.$$
It is known [4, pp. 355-6] that

$$\Pr\{Y(t) \ge y_1,\ Y(t+s) \ge y_2\} = \frac{[1 - \rho^2(s)]^{-1/2}}{2\pi}\int_{y_1}^{\infty}\!\!\int_{y_2}^{\infty}\exp\left\{-\frac{z_1^2 - 2\rho(s)z_1z_2 + z_2^2}{2[1 - \rho^2(s)]}\right\}dz_1\,dz_2$$

$$= P(y_1)P(y_2) + \sum_{j=1}^{\infty}\rho^j(s)\,\tau_j(y_1)\,\tau_j(y_2). \tag{5.1}$$
Here

$$P(y) = (2\pi)^{-1/2}\int_y^{\infty} e^{-z^2/2}\,dz;$$

$H_r(x)$ is the Hermite polynomial of degree $r$ given by

$$H_r(x) = e^{x^2/2}\left(-\frac{d}{dx}\right)^r e^{-x^2/2} = x^r - \frac{r(r-1)}{2}x^{r-2} + \frac{r(r-1)(r-2)(r-3)}{2^2\,2!}x^{r-4} - \cdots;$$
and

$$\tau_r(y) = (2\pi)^{-1/2}(r!)^{-1/2}\,H_{r-1}(y)\,e^{-y^2/2}$$
is known as the tetrachoric function of order $r$, and its tables are available [5]. Thus

$$A(s; x_1, x_2) = \sum_{j=1}^{\infty}\rho^j(s)\,\tau_j(y_1)\,\tau_j(y_2), \qquad y_i = (x_i - m)/\sigma. \tag{5.2}$$

Inserting these values in (3.7), we have, since $\rho(-s) = \rho(s)$,

$$\operatorname{cov}\{p_T(x_1), p_T(x_2)\} = \frac{2}{T}\sum_{j=1}^{\infty}\tau_j(y_1)\,\tau_j(y_2)\int_0^T\left(1 - \frac{s}{T}\right)\rho^j(s)\,ds, \qquad \operatorname{var} p_T(x) = \frac{2}{T}\sum_{j=1}^{\infty}\tau_j^2(y)\int_0^T\left(1 - \frac{s}{T}\right)\rho^j(s)\,ds. \tag{5.3}$$
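The series (5.1) is easy to evaluate numerically. The sketch below is an added illustration; it uses the reconstruction $\tau_j(y) = H_{j-1}(y)e^{-y^2/2}/\sqrt{2\pi\,j!}$ adopted above (with $H$ realized as the probabilists' Hermite polynomial), and checks the expansion against the exact orthant probability at $y_1 = y_2 = 0$.

```python
import numpy as np
from math import factorial, erf
from numpy.polynomial.hermite_e import hermeval

def tau(j, y):
    """tau_j(y) = He_{j-1}(y) * phi(y) / sqrt(j!), phi the standard normal density."""
    c = np.zeros(j); c[j - 1] = 1.0            # coefficients selecting He_{j-1}
    phi = np.exp(-y * y / 2.0) / np.sqrt(2.0 * np.pi)
    return hermeval(y, c) * phi / np.sqrt(factorial(j))

def upper_orthant(y1, y2, rho, terms=60):
    """Pr{Y(t) >= y1, Y(t+s) >= y2} by the tetrachoric series (5.1)."""
    P = lambda y: 0.5 * (1.0 - erf(y / np.sqrt(2.0)))   # P(y) = Pr(Y >= y)
    total = P(y1) * P(y2)
    for j in range(1, terms + 1):
        total += rho**j * tau(j, y1) * tau(j, y2)
    return total

# Self-check at zero, where the exact value is 1/4 + arcsin(rho)/(2*pi).
rho = 0.6
print(upper_orthant(0.0, 0.0, rho))
print(0.25 + np.arcsin(rho) / (2.0 * np.pi))
```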

5.2. Variance of a Quantile

Note that
$$P'(x) = -\frac{1}{\sigma\sqrt{2\pi}}\,e^{-y^2/2}, \qquad y = (x - m)/\sigma.$$

Differentiating the first equation in (5.3) with respect to $x_1$ and $x_2$, we get the formal expression

$$\frac{\partial^2 G(x_1, x_2)}{\partial x_1\,\partial x_2} = \frac{2}{\sigma^2 T}\sum_{j=1}^{\infty}\tau_j'(y_1)\,\tau_j'(y_2)\int_0^T\left(1 - \frac{s}{T}\right)\rho^j(s)\,ds. \tag{5.4}$$

If the integral converges, we can evaluate it, and using (4.7) obtain the variance of the sample quantile $U$.

A special case. If $x = m$,

$$\operatorname{var} p_T(m) = 2T^{-1}\int_0^T\left(1 - \frac{s}{T}\right)\sum_{j=1}^{\infty}\rho^j(s)\,\tau_j^2(0)\,ds.$$

Now

$$\tau_{2j}(0) = 0, \quad j = 1, 2, \ldots; \qquad \tau_{2j+1}(0) = \frac{(-1)^j(2j)!}{2^j\,j!\,\sqrt{2\pi(2j+1)!}}, \quad j = 0, 1, \ldots.$$

Hence

$$\operatorname{var} p_T(m) = \frac{1}{\pi T}\sum_{j=0}^{\infty}\frac{(2j)!}{2^{2j}(j!)^2(2j+1)}\int_0^T\left(1 - \frac{s}{T}\right)\rho^{2j+1}(s)\,ds. \tag{5.5}$$

If $U$ is the sample median, then $\xi = m$, and
$$\operatorname{var} U \doteq \pi\sigma^2 T\,\operatorname{var} p_T(m)\bigg/\int_0^T\left(1 - \frac{s}{T}\right)\{1 - \rho^2(s)\}^{-1/2}\,ds, \tag{5.6}$$

in case the integral converges. We note that, for large values of $j$,

$$\frac{(2j)!}{2^{2j}(j!)^2}\sim\frac{1}{(\pi j)^{1/2}}.$$

Hence the series (5.5) will always converge. A sufficient condition for the convergence of the integral in (5.6) is that $\lim_{s\to 0} s^b\{1 - \rho^2(s)\}^{-1/2}$ exists for some $b$, $0 < b < 1$; if the limit is nonzero for some $b \ge 1$, then the integral diverges.

Example 1. If $\rho(s) = \rho(-s) = e^{-\lambda s}$, $\lambda > 0$, $s \ge 0$, then

$$\int_0^T\left(1 - \frac{s}{T}\right)e^{-j\lambda s}\,ds = \frac{1}{j\lambda} - \frac{1}{j^2\lambda^2 T} + O(T^{-1}e^{-j\lambda T}), \qquad j = 1, 2, \ldots.$$
Therefore
$$\operatorname{var} p_T(m) = \frac{1}{\pi\lambda T}\sum_{j=0}^{\infty}\frac{(2j)!}{2^{2j}(j!)^2(2j+1)^2} - \frac{1}{\pi\lambda^2 T^2}\sum_{j=0}^{\infty}\frac{(2j)!}{2^{2j}(j!)^2(2j+1)^3} + O(T^{-1}e^{-\lambda T})$$

$$\doteq 0.35/(\lambda T) - 0.33/(\lambda T)^2 + O(T^{-1}e^{-\lambda T}).$$
Also
$$\int_0^T\left(1 - \frac{s}{T}\right)\{1 - e^{-2\lambda s}\}^{-1/2}\,ds = \frac{T}{2} + O(1).$$

Hence, if $U$ is the sample median,
$$\operatorname{var} U \doteq 2\pi\sigma^2\,\operatorname{var} p_T(m) \doteq 2.2\,\sigma^2/(\lambda T).$$
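The numerical constants in Example 1 can be verified directly; a quick check (an added illustration):

```python
from math import factorial, pi

# The two series appearing in Example 1: their sums divided by pi give the
# constants quoted as 0.35 and 0.33.
c1 = sum(factorial(2 * j) / (2**(2 * j) * factorial(j)**2 * (2 * j + 1)**2)
         for j in range(150)) / pi
c2 = sum(factorial(2 * j) / (2**(2 * j) * factorial(j)**2 * (2 * j + 1)**3)
         for j in range(150)) / pi
print(round(c1, 3), round(c2, 3))   # 0.346 and 0.326, i.e. ~0.35 and ~0.33
```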

Example 2. If $\rho(s) = e^{-s^2/2a^2}$, a similar calculation can be carried out for $\operatorname{var} p_T(m)$.

The integral in (5.6), however, does not converge, and we fail in evaluating the variance of the sample median by this method. In fact, an examination of the assumptions leading to equation (4.7) shows that, in this case, $Ep_T'^2(m)$ does not exist, so that $p_T(m)$ is not differentiable.

Example 3. Let $\rho(s) = \cos(\pi s/k)$. Instead of using (5.5) to evaluate $\operatorname{var} p_T(m)$, we proceed somewhat differently. Note that [6, p. 290]

$$\Pr\{Y(t) \ge 0,\ Y(t+s) \ge 0\} = \frac{1}{2} - \frac{1}{2\pi}\cos^{-1}\rho(s).$$

Hence

$$\operatorname{var} p_T(m) = \frac{1}{2T}\int_0^T\left(1 - \frac{s}{T}\right)ds - \frac{1}{\pi T}\int_0^T\left(1 - \frac{s}{T}\right)\cos^{-1}\rho(s)\,ds.$$

Now

$$\cos^{-1}\rho(s) = \begin{cases} \dfrac{\pi}{k}(s - 2jk), & \text{if } 2jk \le s < (2j+1)k,\ j = 0, 1, \ldots,\\[1ex] -\dfrac{\pi}{k}(s - 2jk), & \text{if } (2j-1)k \le s < 2jk,\ j = 1, 2, \ldots. \end{cases}$$

Write $T = (l + \Lambda)k$, where $l$ is a nonnegative integer and $0 \le \Lambda < 1$.

Thus, after some simplification, we obtain an explicit expression for $\operatorname{var} p_T(m)$ in terms of $l$ and $\Lambda$.

Obviously $\operatorname{var} p_T(m) = 0$ whenever $\Lambda = 0$ and $l$ is even, i.e., whenever $T$ is a multiple of $2k$, the period of the process, and also $\operatorname{var} p_T(m) \to 0$ as $T \to \infty$.
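A numerical check of this conclusion (an added illustration; the quadrature is a plain midpoint rule):

```python
import numpy as np

# Example 3: rho(s) = cos(pi*s/k).  Evaluate var p_T(m) from the arccos
# formula above; it should vanish whenever T is a multiple of 2k.
def var_pT_median(T, k, n=200_000):
    s = (np.arange(n) + 0.5) * (T / n)
    A = 0.25 - np.arccos(np.cos(np.pi * s / k)) / (2.0 * np.pi)  # A(s; m)
    return (2.0 / T) * np.sum((1.0 - s / T) * A) * (T / n)

k = 1.0
for T in (2.0, 3.0, 4.0, 50.0):
    print(T, var_pT_median(T, k))   # ~0 at T = 2 and T = 4; small at T = 50
```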

5.3. $\chi^2$ Process

Let $\{X_i(t)\}$, $i = 1, 2, \ldots, n$, be independent stationary Gaussian processes with the same variance, $\sigma^2$, and the same correlation function

$$\rho(s) = [EX_i(t)X_i(t+s) - m_i^2]/\sigma^2.$$
Define
$$Y(t) = \sum_{i=1}^{n}\,[X_i(t) - m_i]^2/\sigma^2.$$

$\{Y(t)\}$ will be called a $\chi^2$ process, as the univariate distribution of $Y(t)$ is a $\chi^2$ distribution with $n$ degrees of freedom, which has the density function
$$g(z; n/2) = \frac{z^{n/2-1}\,e^{-z/2}}{2^{n/2}\,\Gamma(n/2)}, \qquad z > 0,$$
and $g(z; n/2) = 0$ otherwise.

It is easy to verify that $\{Y(t)\}$ is strictly stationary. The characteristic function, $\phi(u, v)$, of $[Y(t), Y(t+s)]$ is easily evaluated to be

$$\phi(u, v) = E\,e^{iuY(t) + ivY(t+s)}$$

$$= \left[1 - 2iu - 2iv - 4uv\left(1 - \rho^2(s)\right)\right]^{-n/2}$$

4 "( ) ] -nl ? = (1-2iu) -n/2(l-2iv) -nI 2 [ 1+ ~- s uv. - (1- 2w)(1- 2w)

$$= \sum_{j=0}^{\infty}\frac{(-1)^j\,\Gamma(n/2+j)}{\Gamma(n/2)\,\Gamma(j+1)}\,\frac{\{2\rho(s)\}^{2j}\,u^j v^j}{(1-2iu)^{n/2+j}\,(1-2iv)^{n/2+j}}. \tag{5.7}$$
As
$$\left|\frac{4uv}{(1-2iu)(1-2iv)}\right| = \frac{4|uv|}{[(1+4u^2)(1+4v^2)]^{1/2}} < 1,$$
the binomial expansion is uniformly convergent. For a full discussion of the distribution of $[Y(t), Y(t+s)]$ the reader is referred to [7]. A technique of obtaining the density function of $[Y(t), Y(t+s)]$ will be presented here, which is also applicable to other distributions which can be expanded in a series of orthogonal polynomials. Note the following:
(1)
$$g(z; n/2) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-iuz}}{(1-2iu)^{n/2}}\,du;$$

(2) It is easy to verify from the definition of Laguerre polynomials [8, ch. 5] that, for $a > 0$, $j \le a - 1$,
$$\frac{d^j}{dz^j}g(z; a) = g(z; a-j)\,\frac{\Gamma(j+1)\,\Gamma(a-j)}{2^j\,\Gamma(a)}\,L_j^{(a-j-1)}(z/2),$$
where

$$L_j^{(\alpha)}(x) = \sum_{m=0}^{j}(-1)^m\binom{j+\alpha}{j-m}\frac{x^m}{m!}$$
is the generalized Laguerre polynomial of degree $j$. Hence, if $n \ge 2$ ($n = 1$ may be reduced to the Gaussian case discussed in 5.1), by the inversion of (5.7) the probability density function, $g(z_1, z_2; n/2)$, of $[Y(t), Y(t+s)]$ is given by

$$g(z_1, z_2; n/2) = \sum_{j=0}^{\infty}\frac{\{2\rho(s)\}^{2j}\,\Gamma(n/2+j)}{\Gamma(n/2)\,\Gamma(j+1)}\,\frac{d^j}{dz_1^j}g(z_1; n/2+j)\,\frac{d^j}{dz_2^j}g(z_2; n/2+j)$$

$$= g(z_1; n/2)\,g(z_2; n/2)\sum_{j=0}^{\infty}\frac{\Gamma(j+1)\,\Gamma(n/2)}{\Gamma(n/2+j)}\,\rho^{2j}(s)\,L_j^{(n/2-1)}(z_1/2)\,L_j^{(n/2-1)}(z_2/2). \tag{5.8}$$

Now, using the first equation in (5.8), we obtain

$$\Pr\{Y(t) \ge y_1,\ Y(t+s) \ge y_2\} = P(y_1)P(y_2) + \sum_{j=1}^{\infty}\frac{\{2\rho(s)\}^{2j}\,\Gamma(n/2+j)}{\Gamma(n/2)\,\Gamma(j+1)}\,\frac{d^{j-1}}{dy_1^{j-1}}g(y_1; n/2+j)\,\frac{d^{j-1}}{dy_2^{j-1}}g(y_2; n/2+j)$$

$$= P(y_1)P(y_2) + g(y_1; n/2+1)\,g(y_2; n/2+1)\sum_{j=1}^{\infty}\frac{2n\,\Gamma(j)\,\Gamma(n/2+1)}{j\,\Gamma(n/2+j)}\,\rho^{2j}(s)\,L_{j-1}^{(n/2)}(y_1/2)\,L_{j-1}^{(n/2)}(y_2/2), \tag{5.9}$$
where
$$P(y) = \int_y^{\infty} g(z; n/2)\,dz. \tag{5.10}$$

From (3.6) and (5.9), the covariance of $p(t; y_1)$ and $p(t+s; y_2)$ is

$$A(s; y_1, y_2) = g(y_1; n/2+1)\,g(y_2; n/2+1)\sum_{j=1}^{\infty}\frac{2n\,\Gamma(j)\,\Gamma(n/2+1)\,\rho^{2j}(s)}{j\,\Gamma(n/2+j)}\,L_{j-1}^{(n/2)}(y_1/2)\,L_{j-1}^{(n/2)}(y_2/2).$$

Inserting this value in (3.7), we obtain an expression for the covariance of $p_T(y_1)$ and $p_T(y_2)$. The variance is given by setting $y_1 = y_2 = y$.

A special case. $n = 2$ gives a $\chi^2$ process with two degrees of freedom, i.e., the Rayleigh process. We find

$$\operatorname{cov}\{p_T(y_1), p_T(y_2)\} = \frac{8}{T}\,g(y_1; 2)\,g(y_2; 2)\sum_{j=1}^{\infty}\frac{1}{j^2}\,L_{j-1}^{(1)}(y_1/2)\,L_{j-1}^{(1)}(y_2/2)\int_0^T\left(1 - \frac{s}{T}\right)\rho^{2j}(s)\,ds. \tag{5.11}$$
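For the Rayleigh case, (5.11) can be evaluated term by term; a sketch, assuming SciPy's generalized Laguerre polynomials and an illustrative correlation $\rho(s) = e^{-\lambda s}$:

```python
import numpy as np
from scipy.special import eval_genlaguerre

def g2(y):
    """g(y; 2) in the text's notation: the gamma density y*exp(-y/2)/4."""
    return 0.25 * y * np.exp(-y / 2.0)

def cov_pT(y1, y2, lam, T, terms=60, n=20_000):
    """Evaluate (5.11) with rho(s) = exp(-lam*s), integrals by midpoint rule."""
    s = (np.arange(n) + 0.5) * (T / n)
    total = 0.0
    for j in range(1, terms + 1):
        I = np.sum((1.0 - s / T) * np.exp(-2.0 * j * lam * s)) * (T / n)
        total += (eval_genlaguerre(j - 1, 1, y1 / 2.0)
                  * eval_genlaguerre(j - 1, 1, y2 / 2.0) / j**2) * I
    return (8.0 / T) * g2(y1) * g2(y2) * total

print(cov_pT(2.0, 2.0, lam=1.0, T=100.0))   # var p_T(y) at y = 2
```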

The author thanks Dr. E. L. Crow for his helpful discussions during the preparation of this paper.

6. References

[1] A. D. Watt and R. W. Plush, Measured distributions of the instantaneous envelope amplitude and instantaneous frequency of carriers plus thermal and atmospheric noise, Statistical Methods in Radio Wave Propagation, pp. 233-247 (Pergamon Press, London, England, 1960).
[2] Raymond E. McGavin and Leo J. Maloney, Study at 1046 megacycles per second of the reflection coefficient of irregular terrain at grazing angles, J. Research NBS 63D, 235-248 (1959).
[3] W. B. Davenport and W. L. Root, An Introduction to the Theory of Random Signals and Noise (McGraw-Hill Book Co., Inc., New York, N.Y., 1958).
[4] M. G. Kendall, The Advanced Theory of Statistics, I (Charles Griffin and Co., London, England, 1948).
[5] K. Pearson, Tables for Statisticians and Biometricians, Parts I and II (Cambridge Univ. Press, Cambridge, England, 1930 and 1931).
[6] Harald Cramér, Mathematical Methods of Statistics (Princeton Univ. Press, Princeton, N.J., 1946).
[7] M. Nakagami, The m-distribution: a general formula of intensity distribution of rapid fading, Statistical Methods in Radio Wave Propagation, pp. 3-36 (Pergamon Press, London, England, 1960).
[8] Gabor Szegő, Orthogonal Polynomials (Am. Math. Soc., New York, N.Y., 1939).

(Paper 65B2-50)

