Perturbations of Operators with Application to Testing Equality of Covariance Operators.

by

Krishna Kaphle, M.S.

A Dissertation

In

Mathematics and Statistics

Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Approved

Frits H. Ruymgaart

Linda J. Allen

Petros Hadjicostas

Peggy Gordon Miller, Interim Dean of the Graduate School

August, 2011

© 2011, Krishna Kaphle

ACKNOWLEDGEMENTS

This dissertation would not have been possible without the continuous support of my advisor, Horn Professor Dr. Frits H. Ruymgaart. I am heartily thankful to Dr. Ruymgaart for his encouragement and supervision, from the identification of the research problem to the present stage of my dissertation research. His continuous supervision enabled me not only to bring my dissertation to this stage but also to develop an understanding of the subject. I am lucky to have a great person like him as my advisor. I am grateful to Horn Professor Dr. Linda J. Allen for her help throughout my stay at Texas Tech. I would also like to thank Dr. Petros Hadjicostas for his encouragement and several valuable remarks during my research. I am also heartily thankful to Dr. David Gilliam for his support and help in writing Matlab code. I am pleased to thank those who made this dissertation possible. Among them, very important people are Dr. Kent Pearce, chair, Dr. Ram Iyer, graduate advisor, and all the staff of the Department of Mathematics and Statistics, who gave me the moral, financial, and technological support required during my study. I would also like to make special mention of Dr. Magdalena Toda, undergraduate director, for the encouragement and support I received from her. I would like to thank all the professors of the Department of Mathematics and Statistics and my fellow graduate students for helping me whenever needed to finish this dissertation.


TABLE OF CONTENTS

Acknowledgements
Abstract
List of Tables
List of Figures
1. Introduction
2. Elements of Hilbert Space Theory
2.1 Hilbert spaces
2.2 Projection and Riesz representation
2.2.1 Projections
2.2.2 Spectral Properties of Operators
3. Random Variables in a Hilbert Space
3.1 Random variables
3.2 Probability Measures on H
3.2.1 Some remarks on B
3.2.2 Probability measures on H
3.2.3 Gaussian distributions
3.2.4 Karhunen–Loève expansion
4. Random Samples and Limit Theorems
4.1 Random Samples
4.2 Some Central Limit Theorems
5. Functions of Covariance Operators and Delta Method
5.1 Functions of bounded linear operators
5.1.1 Fréchet derivative
5.1.2 Delta Method
6. Perturbation of Eigenvalues and Eigenvectors
6.1 Perturbation theory for operators
6.2 Perturbation theory for matrices
7. Testing Equality of Covariance Operators
7.1 Finite dimensional case
7.2 Infinite dimensional case


7.2.1 Test statistic under null hypothesis
7.2.2 Estimation of variance
8. Generalized Test
8.1 The two-sample case
8.2 Test statistics
8.3 Estimation of the variance
8.3.1 The Gaussian case
9. Some Simulations
9.1 Test using a single eigenvalue
9.2 Test using the first m largest eigenvalues
9.2.1 The null hypothesis
9.2.2 The fixed alternative
9.2.3 Identification of regularization parameter
9.2.4 Local alternatives
10. Conclusion
Bibliography
Appendix: Matlab Code


ABSTRACT

The generalization of multivariate statistical procedures to infinite dimensions naturally requires extra theoretical work. In this dissertation, we focus on testing the equality of covariance operators. We derive a procedure from the Union–Intersection principle in conjunction with a Likelihood Ratio test. This procedure leads to a statistic which is the largest eigenvalue of a product of operators. We generalize this procedure by using a test statistic that is based on the first m ∈ N largest eigenvalues. Perturbation theory of operators and the functional calculus of covariance operators are extensively used to derive the required asymptotics. It is shown that the power of the test improves with the inclusion of more eigenvalues. We perform simulations to corroborate the testing procedure, using samples from two Gaussian distributions.


LIST OF TABLES

9.1 Test statistic and the regularization parameter
9.2 Power vs. inclusion of eigenvalues
9.3 Fraction of rejections under the null hypothesis
9.4 Type II error


LIST OF FIGURES

8.1 Contours used for integration
8.2 Contours enclosing m eigenvalues
9.1 Histogram of 1000 test statistic values under the null
9.2 ê_1 vs e_1
9.3 v̂ vs v
9.4 Histograms of test statistic values under the null when two and three eigenvalues are taken
9.5 Histograms of test statistics under the local alternative when two and three eigenvalues are taken
9.6 Histograms of test statistics under the null when epsilon is 0.5 and two and three eigenvalues are taken
9.7 Histograms of test statistics under the null when epsilon is 1.0 and two and three eigenvalues are taken
9.8 Histograms of test statistics under the null when epsilon is 1.5 and two and three eigenvalues are taken
9.9 QQ plots of the test statistics under the null when epsilon is 0.5 and two and three eigenvalues are taken
9.10 QQ plots of the test statistics under the null when epsilon is 1.0 and two and three eigenvalues are taken
9.11 QQ plots of the test statistics under the null when epsilon is 1.5 and two and three eigenvalues are taken
9.12 Fraction of rejections vs. Gamma for fixed sample size


CHAPTER 1 INTRODUCTION

Univariate statistical theory is concerned with statistical inference based on samples from a distribution on the real line. Multivariate statistics refers to inference based on samples from a distribution on Euclidean, i.e. finite dimensional, spaces. Functional data analysis deals with samples in infinite dimensional spaces, typically function spaces. Indeed, the generic sample element is usually a function and hence an infinite dimensional object. It may also be an object of very high finite dimension. Throughout this research we will always assume the data to be infinite dimensional. More specifically, the generic sample element will be supposed to be an element of an infinite dimensional Hilbert space H. We will keep H abstract. In the theory, this has the advantage that the properties obtained for sample means, for instance, entail at once properties of the sample covariance operator, because the latter is also a sample mean in a Hilbert space (albeit not the same Hilbert space in which the sample elements assume their values). A good example of a Hilbert space that might be used for functional data is L²(0, 1), the space of all square integrable functions on [0, 1]. Many classical statistical problems can be formulated for functional data [36]. This dissertation focuses on testing the equality of the covariance structures of two populations, based on random samples. Before going any further in the development of the theory, we will discuss some examples of functional data analysis (see [29] for a detailed discussion). 1. As a generic sample element we may consider the angles of the hip and knee over a child's gait cycle. The cycle begins when the heel touches the ground and ends when it touches the ground again. A natural question that arises is whether there is a relation between the two angle curves (see [26]).

2. Near-infrared spectroscopy is applied to different varieties of wheat. Let X_i(t) denote the density of the reflected radiation recorded at the spectrometer when the wavelength equals t, and let η_i represent the level of a given protein for the i-th type of wheat. Theory suggests a relation between η_i and X_i of the form

η_i = C + ∫_a^b X_i(t) f(t) dt + error, for i = 1, 2, ..., n,

where C is a constant. This represents a functional regression model (see [19]).
In the above examples we have seen several statistical problems for functional data that are well known in multivariate statistics: for example, the two-sample problem, multiple and canonical correlation, and regression. Principal component analysis is another point of interest that extends to functional data. In this research we will need a version of the central limit theorem in Hilbert spaces that is somewhat more general than the central limit theorem in its simplest form. Although very general central limit theorems exist (see [24]), we prefer to present an independent proof tailored to the situation at hand. This yields the asymptotic distribution of both the sample mean and the sample covariance operator for certain triangular arrays of H-valued random variables. An important tool for the asymptotic distribution of the test statistic that we propose in Chapter 7 to deal with equality of covariance structures is a delta method for analytic functions of random operators.
There are many analogies between infinite dimensional Hilbert spaces and Euclidean spaces. However, there are some differences. For example, the closed unit ball in H is not compact in the norm topology, and there is no Lebesgue measure on H, so the option of defining a density with respect to Lebesgue measure is no longer available for H. Gaussian probability distributions can be defined on H, but two Gaussian distributions can be either orthogonal or equivalent. Apart from some Hilbert space theory, we will discuss random elements, probability distributions, central limit theorems, and some operator and perturbation theory for H.


CHAPTER 2 ELEMENTS OF HILBERT SPACE THEORY

In this chapter, we will focus on some theory of Hilbert spaces.

2.1 Hilbert spaces
A real inner product space is a vector space equipped with a real valued inner product ⟨·, ·⟩ which is symmetric, linear in both components, and satisfies ⟨x, x⟩ ≥ 0 for every element x in the space. Associated with the inner product is the norm ‖x‖ = √⟨x, x⟩. A sequence {x_n} in the inner product space is said to be Cauchy if for every ε > 0 there is an integer N such that ‖x_n − x_m‖ < ε whenever n, m > N. The inner product space is said to be complete if every Cauchy sequence converges to a limit in the space.

Definition 2.1.1. A Hilbert space H is a complete inner product space.

For any two elements x, y in H, the distance ρ(x, y) = ‖x − y‖ defines a metric on it. A subset of a Hilbert space is called a subspace if it is a Hilbert space in its own right. Some of the important properties of a Hilbert space H are:
(a) The Cauchy–Schwarz Inequality: For any x, y ∈ H,

|⟨x, y⟩| ≤ ‖x‖ ‖y‖. (2.1)

(b) The Triangle Inequality: For any x, y ∈ H,

‖x + y‖ ≤ ‖x‖ + ‖y‖. (2.2)

(c) The Continuity of the Inner Product: The inner product is jointly continuous with respect to the induced norm. That is,

x_n → x and y_n → y ⇒ ⟨x_n, y_n⟩ → ⟨x, y⟩.

(d) The Parallelogram Law: For any x, y ∈ H,

‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²). (2.3)


(e) The Polarization Identity: For any x, y ∈ H,

⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²). (2.4)

Complete normed vector spaces are called Banach spaces. Hilbert spaces are special Banach spaces in which the norm is induced by an inner product. The following proposition relates Banach spaces and Hilbert spaces.

Proposition 2.1.1. Every Banach space in which the parallelogram law holds is a Hilbert space, and the inner product is uniquely determined by the polarization identity.

Some examples: (i) The space l² of sequences {x_n} such that Σ_j x_j² < ∞; the inner product is defined by ⟨x, y⟩ = Σ_{i=1}^∞ x_i y_i. (ii) The space L²(0, 1) of all square integrable functions f on [0, 1]; the inner product is defined by ⟨f, g⟩ = ∫_0^1 f(t) g(t) dt.
The following example shows that not every linear inner product space is a Hilbert space. Consider the space C[0, 1] of all continuous functions f on [0, 1]. The inner product can be defined by ⟨f, g⟩ = ∫_0^1 f(t) g(t) dt, but the space is not complete: f_n(t) = max{0, min(1, n(t − 1/2))} is a Cauchy sequence in this norm whose limit g, given by g(t) = 0 for t ≤ 1/2 and g(t) = 1 for t > 1/2, is not continuous on [0, 1].

Definition 2.1.2 (Orthogonality). Two vectors x and y in H are said to be orthogonal if ⟨x, y⟩ = 0.

We write x ⊥ y if x and y are orthogonal vectors. A set S ⊂ H is said to be an orthogonal set if all vectors in it are non-zero and mutually orthogonal. An orthogonal set consisting only of unit vectors is called an orthonormal set. We will end this section with one of the most important identities for inner product spaces.

Pythagorean Identity: If x_1, x_2, ..., x_n is an orthogonal system, then

‖x_1 + x_2 + ... + x_n‖² = ‖x_1‖² + ‖x_2‖² + ... + ‖x_n‖². (2.5)


2.2 Projection and Riesz representation
In this section we will discuss two important classes of mappings on H: one assuming values in R and the other assuming values in H itself. A mapping l : H → R is called a functional on H, and L : H → H is called an operator on H. Any mapping φ on a linear space is linear if φ(ax + by) = aφ(x) + bφ(y) for all a, b ∈ R and x, y ∈ H, and it is said to be continuous if φ(x_n) → φ(x) whenever x_n → x in H.

Definition 2.2.1. A bounded linear functional is a mapping l : H → R which is linear and satisfies

sup_{x≠0} |l(x)| / ‖x‖ = ‖l‖ < ∞.

This definition entails |l(x)| ≤ ‖l‖ ‖x‖, ∀x ∈ H. Without confusion we have used the same notation for the norm ‖l‖ of l and ‖x‖ of x, although l is not an element of H. See, however, Theorem 2.2.1.

Proposition 2.2.1. The linear functional l on H is bounded if and only if it is continuous.

Proof. Let l be bounded, and x_n → x. We have

|l(x_n) − l(x)| = |l(x_n − x)| ≤ ‖l‖ ‖x_n − x‖ → 0, as n → ∞.

Hence l is continuous. Conversely, let l be continuous, and suppose ‖l‖ = ∞. Then there exists a sequence {x_n} with ‖x_n‖ = 1 and |l(x_n)| > n for each n. Let y_n = x_n/n for each n. Then y_n → 0, so by continuity l(y_n) → 0, while |l(y_n)| = |l(x_n)|/n > 1 for each n, which is a contradiction.

It is worth noting that boundedness implies not only continuity but uniform continuity, since for any x ∈ H we have |l(x + h) − l(x)| = |l(h)| ≤ ‖l‖ ‖h‖. The following theorem is of extreme importance, and its proof can be found in any standard functional analysis book (see [31], [10], [9] and [25]).

Theorem 2.2.1 (Riesz representation theorem). To each bounded linear functional l : H → R there corresponds a unique vector a_l ∈ H such that

l(x) = ⟨x, a_l⟩, x ∈ H; moreover, ‖l‖ = ‖a_l‖. (2.6)


The vector a_l is called the representer of the functional l.
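As an illustration (this example is ours, not from the text): on H = L²(0, 1) the functional l(f) = ∫_0^1 t f(t) dt is linear, and by the Cauchy–Schwarz inequality |l(f)| ≤ ‖a_l‖ ‖f‖, so it is bounded. Its representer is the function a_l(t) = t, since l(f) = ⟨f, a_l⟩, and indeed ‖l‖ = ‖a_l‖ = (∫_0^1 t² dt)^{1/2} = 1/√3.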

Definition 2.2.2. A linear operator T : H → H is called bounded if

sup_{x≠0} ‖Tx‖ / ‖x‖ = sup_{‖x‖=1} ‖Tx‖ = ‖T‖_L < ∞.

As in the case of functionals, the boundedness of T entails its uniform continuity. The class of all bounded linear operators on H will be denoted by L(H) or simply by L.

Proposition 2.2.2. L is a Banach space under the norm ‖·‖_L.

Proof. Let T_1, T_2 ∈ L and a, b ∈ R. For any x ∈ H, we have ‖(aT_1 + bT_2)x‖ = ‖aT_1(x) + bT_2(x)‖ ≤ (|a| ‖T_1‖ + |b| ‖T_2‖) ‖x‖. Thus, aT_1 + bT_2 ∈ L. Now, let {T_n} be a Cauchy sequence in L. Since ‖T_n(x) − T_m(x)‖ ≤ ‖T_n − T_m‖ ‖x‖, the sequence T_n(x) is Cauchy in H and thus convergent. Let T(x) be its limit. Since {T_n} is Cauchy, there is M < ∞ such that ‖T_n‖ < M for all n. The continuity of the norm implies that ‖T(x)‖ ≤ M ‖x‖ for all x ∈ H. Thus, T ∈ L. Let ε > 0, and let k be such that ‖T_n − T_m‖ < ε for m, n ≥ k. But

‖T_m(x) − T_n(x)‖ ≤ ‖T_m − T_n‖ ‖x‖ ≤ ε ‖x‖ for all n, m > k

implies

‖T(x) − T_n(x)‖ = lim_{m→∞} ‖T_m(x) − T_n(x)‖ ≤ ε ‖x‖

for all n > k and x ∈ H. That is, ‖T − T_n‖ ≤ ε for all n > k. This implies lim T_n = T ∈ L.

Definition 2.2.3. The operator T* : H → H is called the adjoint of T if

⟨Tx, y⟩ = ⟨x, T*y⟩, for all x, y ∈ H. (2.7)

Proposition 2.2.3. Every bounded linear operator T ∈ L has a bounded linear adjoint.


Proof. Let y ∈ H be fixed. Then l(x) = ⟨Tx, y⟩, x ∈ H, is a linear functional on H. This functional is bounded because T is bounded, and we have

|⟨Tx, y⟩| ≤ ‖Tx‖ ‖y‖ ≤ ‖T‖ ‖x‖ ‖y‖.

Thus, by the Riesz representation theorem there exists a representer T*y ∈ H of this functional such that l(x) = ⟨Tx, y⟩ = ⟨x, T*y⟩, x ∈ H. Next, ⟨x, T*(ay + bz)⟩ = ⟨Tx, ay + bz⟩ = a⟨Tx, y⟩ + b⟨Tx, z⟩ = ⟨x, aT*y + bT*z⟩. Thus, T*(ay + bz) = aT*y + bT*z for all a, b ∈ R and all y, z ∈ H. Therefore, T* is linear. Finally, ‖T*y‖² = ⟨T*y, T*y⟩ = ⟨TT*y, y⟩ ≤ ‖T‖ ‖T*y‖ ‖y‖ implies ‖T*y‖ ≤ ‖T‖ ‖y‖. Similarly, ‖Tx‖ ≤ ‖T*‖ ‖x‖. Thus, T* is bounded and we have

‖T‖ = ‖T*‖. (2.8)

Definition 2.2.4. A linear operator T ∈ L is called Hermitian if T* = T.

For any T ∈ L, the operator T*T : H → H is Hermitian.
Example 1: Let us consider H = l², the space of square summable sequences. The operator T : l² → l², defined by

T(x) = T(x_1, x_2, ...) = (x_2, x_3, ...), where x = (x_1, x_2, ...), Σ_{k=1}^∞ x_k² < ∞,

is known as the shift operator. Since

‖T(x_1, x_2, ...)‖² = Σ_{k=2}^∞ x_k² ≤ Σ_{k=1}^∞ x_k² = ‖(x_1, x_2, ...)‖²,

T is bounded. Also

⟨Tx, y⟩ = Σ_{k=1}^∞ x_{k+1} y_k = x_1 · 0 + Σ_{k=2}^∞ x_k y_{k−1} = ⟨x, T*y⟩,

where T*y = (0, y_1, y_2, ...).
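A minimal numerical sketch (ours, not from the text; the truncation to n coordinates and all variable names are our assumptions) of the adjoint relation for the shift operator on a finite section of l²:

    % Check that the truncated shift S and its transpose satisfy <Sx,y> = <x,S'y>.
    n = 6;
    S = diag(ones(n-1,1), 1);    % (Sx)_k = x_{k+1}, a finite section of T
    x = randn(n,1); y = randn(n,1);
    lhs = (S*x)'*y;              % <Tx, y>
    rhs = x'*(S'*y);             % <x, T*y>; S' prepends a zero and shifts down
    disp(abs(lhs - rhs))         % zero up to rounding error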


Example 2: Let H = L²(0, 1). The operator known as the primitivation operator is defined by

(Tf)(x) = ∫_0^x f(t) dt, 0 ≤ x ≤ 1, f ∈ L²(0, 1).

We have

‖Tf‖² = ∫_0^1 (∫_0^x f(t) dt)² dx = ∫_0^1 ⟨1_{[0,x]}, f⟩² dx ≤ ∫_0^1 ‖1_{[0,x]}‖² ‖f‖² dx = ‖f‖² ∫_0^1 x dx = (1/2)‖f‖².

This shows that ‖T‖ ≤ 1/√2. (The bound is not sharp: the exact operator norm is ‖T‖ = 2/π, attained by f(t) = cos(πt/2), t ∈ (0, 1).) To find the adjoint, note that

⟨Tf, g⟩ = ∫_0^1 (∫_0^1 1_{[0,x]}(t) f(t) dt) g(x) dx = ∫_0^1 f(t) (∫_0^1 1_{[0,x]}(t) g(x) dx) dt = ⟨f, T*g⟩,

where (T*g)(t) = ∫_t^1 g(x) dx.
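A minimal discretized sketch (the Riemann-sum grid and the test functions are our assumptions, not from the text) of the adjoint relation ⟨Tf, g⟩ = ⟨f, T*g⟩:

    % Discretized check of <Tf, g> = <f, T*g> for the primitivation operator.
    n = 1000; t = ((1:n)' - 0.5)/n; h = 1/n;
    f = sin(2*pi*t); g = exp(t);        % arbitrary test functions
    Tf  = cumsum(f)*h;                  % (Tf)(x)  ~ integral of f over [0, x]
    Tsg = flipud(cumsum(flipud(g)))*h;  % (T*g)(t) ~ integral of g over [t, 1]
    lhs = sum(Tf.*g)*h; rhs = sum(f.*Tsg)*h;
    disp(abs(lhs - rhs))                % small, up to discretization error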

2.2.1 Projections
The class of projection operators is one of the most important classes of operators on a Hilbert space. They resemble the class of projection matrices in the case of Euclidean spaces. In this subsection we briefly discuss projection operators.

Definition 2.2.5. For any non-empty subset G of H, the orthogonal complement G^⊥ is defined to be the set

G^⊥ = {x ∈ H : x ⊥ y, ∀y ∈ G}.


Proposition 2.2.4. G^⊥ is a closed linear subspace of H.

Proof. Let x_1, x_2 ∈ G^⊥ and a, b ∈ R. For any y ∈ G we have ⟨ax_1 + bx_2, y⟩ = a⟨x_1, y⟩ + b⟨x_2, y⟩ = 0. That is, G^⊥ is a linear subspace of H. Next, let {x_n} be any sequence in G^⊥ converging to x. For any y ∈ G, we have ⟨x, y⟩ = lim_{n→∞} ⟨x_n, y⟩ = 0. That is, x ∈ G^⊥. Hence, G^⊥ is closed.

The following theorem is of extreme importance.

Theorem 2.2.2. Let L ⊂ H be a closed linear subspace of H. Then, for each x ∈ H, there exists a unique Px ∈ L such that

‖x − Px‖ = min_{y∈L} ‖x − y‖.

Moreover, we have the unique decomposition

x = Px + (x − Px), Px ∈ L, (x − Px) ∈ L^⊥.

Proof. Since L is a closed linear subspace, it is convex and complete. By the definition of the minimum there exists a sequence {y_n} ⊂ L such that d_n → min_{y∈L} ‖x − y‖ = d, where d_n = ‖x − y_n‖. Note that, by the parallelogram law,

‖y_n − y_m‖² = ‖(y_n − x) − (y_m − x)‖²
= 2(‖y_n − x‖² + ‖y_m − x‖²) − ‖(y_n − x) + (y_m − x)‖²
= 2(d_n² + d_m²) − 4‖(y_n + y_m)/2 − x‖² ≤ 2(d_n² + d_m²) − 4d² → 0,

because (y_n + y_m)/2 ∈ L. Thus, {y_n} is Cauchy and hence converges in L to y = Px. Also, Px ∈ L ⇒ ‖x − Px‖ ≥ d. Moreover,

‖x − Px‖ ≤ ‖x − y_n‖ + ‖y_n − Px‖ = d_n + ‖y_n − Px‖ → d.

Hence, d = ‖x − Px‖.
If possible, let z ∈ L be another such vector, i.e. ‖x − z‖ = d. By the parallelogram law, we have

‖Px − z‖² = ‖(Px − x) + (x − z)‖²
= 2(‖Px − x‖² + ‖x − z‖²) − ‖Px − x + x − z‖²
= 2d² + 2d² − 4‖(Px + z)/2 − x‖² ≤ 4d² − 4d² = 0.

Therefore Px − z = 0, i.e. Px = z. Hence we get the uniqueness. The rest can be proved exploiting uniqueness.

The vector Px is called the orthogonal projection of x onto L, and we have the following result associated with orthogonal projection.

Proposition 2.2.5. The map x ↦ Px is linear and bounded, with ‖P‖ = 1.

Proof. Let L ⊂ H be a closed linear subspace and x, y ∈ H, a, b ∈ R. We have

x = Px + (x − Px) and y = Py + (y − Py), with Px, Py ∈ L and (x − Px), (y − Py) ∈ L^⊥.

Also, ax + by = P(ax + by) + (ax + by − P(ax + by)) with P(ax + by) ∈ L and ax + by − P(ax + by) ∈ L^⊥. Moreover, ax + by = a(Px + (x − Px)) + b(Py + (y − Py)) = aPx + bPy + (a(x − Px) + b(y − Py)) with (aPx + bPy) ∈ L and (a(x − Px) + b(y − Py)) ∈ L^⊥. The uniqueness of the decomposition implies P(ax + by) = aPx + bPy. Hence, the map is linear. By the Pythagorean identity, ‖x‖² = ‖Px‖² + ‖x − Px‖². Therefore ‖Px‖ ≤ ‖x‖. Thus P is bounded and ‖P‖ ≤ 1. But, for x ∈ L, Px = x, which implies ‖P‖ = 1.

Theorem 2.2.3. A bounded linear operator P : H → H is a projection if and only if it is idempotent and Hermitian, that is, P² = P and P* = P.

Proof. To prove the "if" part, let G = {g : Pg = 0} and L = G^⊥. Since P² = P, (x − Px) ∈ G for all x ∈ H, and for any g ∈ G, ⟨Px, g⟩ = ⟨x, Pg⟩ = 0. Thus Px ∈ G^⊥ = L, with x = Px + (x − Px) and (x − Px) ∈ L^⊥, and hence P is the projection onto L.
Conversely, let P be a projection onto a closed subspace L. For any x ∈ H, we have x = Px + (x − Px) with Px ∈ L and x − Px ∈ L^⊥. So Px = P²x for all x ∈ H, i.e. P² = P. Moreover,

⟨Px, y⟩ = ⟨Px, Py + y − Py⟩ = ⟨Px, Py⟩,
⟨x, Py⟩ = ⟨Px + x − Px, Py⟩ = ⟨Px, Py⟩.

That is, ⟨Px, y⟩ = ⟨x, Py⟩ = ⟨P*x, y⟩, which implies P* = P.

Theorem 2.2.4. Let L = span(e_1, e_2, ..., e_n), where {e_1, e_2, ..., e_n} is a finite orthonormal system. If P is the projection onto L, we have

Px = Σ_{k=1}^n ⟨x, e_k⟩ e_k, x ∈ H.

Proof. We have x = Px + (x − Px), Px ∈ L, and (x − Px) ∈ L^⊥. Thus, Px = Σ_{k=1}^n c_k e_k for some constants c_k. In addition, ⟨x − Px, e_j⟩ = 0 for each j = 1, 2, ..., n. This implies c_k = ⟨x, e_k⟩ for all k. Hence the result follows.

An orthonormal sequence e_1, e_2, ... in H is called complete if ⟨x, e_k⟩ = 0 for all integers k implies x = 0. A complete orthonormal sequence is called an orthonormal basis, and we have the following two important results:
(1) Any separable Hilbert space of infinite dimension has a countably infinite orthonormal basis e_1, e_2, ....
(2) Any vector x ∈ H can be written as x = Σ_{k=1}^∞ ⟨x, e_k⟩ e_k in the sense that

‖x − Σ_{k=1}^n ⟨x, e_k⟩ e_k‖ → 0, as n → ∞,

which yields Parseval's identity

‖x‖² = Σ_{k=1}^∞ ⟨x, e_k⟩².
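A minimal numerical sketch (the sine basis of L²(0, 1), the grid, and the test function are our choices, not the text's) of Parseval's identity under truncation:

    % Parseval's identity ||x||^2 = sum_k <x,e_k>^2 in the orthonormal
    % basis e_k(t) = sqrt(2)*sin(k*pi*t) of L2(0,1), truncated at m terms.
    n = 2000; m = 400; t = ((1:n)' - 0.5)/n; h = 1/n;
    x = t.^2;                        % an arbitrary element of L2(0,1)
    E = sqrt(2)*sin(t*(1:m)*pi);     % basis functions on the grid (n-by-m)
    c = E'*x*h;                      % Fourier coefficients <x, e_k>
    disp([sum(x.^2)*h, sum(c.^2)])   % the two sides nearly agree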

2.2.2 Spectral Properties of Operators
We will now discuss some important classes of bounded operators on H.


Definition 2.2.6. A subset S of H is compact if every sequence in S contains a convergent subsequence with limit in S.

Closed and bounded sets are compact in Euclidean spaces. But, the same is not true in infinite dimensional Hilbert spaces. The following theorem shows one of the differences between Euclidean spaces and infinite dimensional Hilbert spaces.

Theorem 2.2.5. If H is infinite dimensional, the closed unit ball B = {x : ‖x‖ ≤ 1} is not compact in the norm metric on H.

Proof. Let e_1, e_2, ... be an orthonormal basis of H. We have

‖e_j − e_k‖ = √(2 − 2⟨e_j, e_k⟩) = √2 for j ≠ k.

Hence no subsequence of e_1, e_2, ... can be a Cauchy sequence. That is, the sequence does not have any convergent subsequence. Since e_n ∈ B for all n, B cannot be compact.

Definition 2.2.7. A set S ⊂ H is called precompact if every sequence of points in S contains a Cauchy subsequence. A linear operator T : H → H is called compact if the image TB of the unit ball B is precompact. Equivalently, T is compact if for every bounded sequence {xn} in H, the sequence {T xn} contains a convergent subsequence.

Proposition 2.2.6. Compact operators are bounded.

Proof. Let T be compact. If T is unbounded, there exists a sequence {x_n} in H such that ‖x_n‖ = 1 for all n and ‖Tx_n‖ → ∞. But then {Tx_n} cannot have a convergent subsequence.

Definition 2.2.8. A linear operator T is called positive if T is Hermitian and

⟨Tx, x⟩ > 0, ∀x ∈ H \ {0}.

It is called nonnegative if we allow equality.

A positive operator is one-to-one. For, if there exist x ≠ y with Tx = Ty, then ⟨T(x − y), (x − y)⟩ = 0 and (x − y) ≠ 0, which is a contradiction.


Definition 2.2.9. A number λ ∈ C is called an eigenvalue of an operator T if there exists x ≠ 0 such that Tx = λx.

It is easy to see that the eigenvalues of a positive operator T are always real and positive. The vector x in the above definition is called an eigenvector for the eigenvalue λ. The set of all eigenvectors for a given eigenvalue, together with 0, is a closed linear subspace. If the operator is compact, we have the following spectral theorem, whose proof can be found in any standard functional analysis book (see [10], [25], [30], and [31]).

Theorem 2.2.6. If H is infinite dimensional and T : H → H is a positive compact operator, then T has a countably infinite number of positive eigenvalues τ_k that can be arranged in a decreasing sequence converging to zero, that is,

τ_1 > τ_2 > ... ↓ 0,

and the corresponding eigenspaces are all finite dimensional and mutually orthogonal. If E_k is the orthogonal projection onto the eigenspace for τ_k, then we have

T = Σ_{k=1}^∞ τ_k E_k, with E_k E_l = 0 for k ≠ l, and Σ_{k=1}^∞ E_k = I. (2.9)

The following theorem relates the norm of an operator to its largest eigenvalue. The easy proof is omitted.

Theorem 2.2.7. For any positive compact operator T,

‖T‖ = τ_1 = the largest eigenvalue.

Example: Let us take H = L²(0, 1). The kernel

K(s, t) = s ∧ t − st, (s, t) ∈ [0, 1] × [0, 1],

defines an integral operator K : L²(0, 1) → L²(0, 1) by

(Kf)(s) = ∫_0^1 K(s, t) f(t) dt
= ∫_0^1 (s ∧ t) f(t) dt − ∫_0^1 (st) f(t) dt
= ∫_0^s t f(t) dt + s ∫_s^1 f(t) dt − s ∫_0^1 t f(t) dt
= (1 − s) ∫_0^s t f(t) dt + s ∫_s^1 (1 − t) f(t) dt,

where s ∈ [0, 1] and f ∈ L²(0, 1). This operator is known to be positive and compact. By differentiating the equation (Kf)(s) = λ f(s) twice and using the above expression, we get

−f(s) = λ f″(s), 0 ≤ s ≤ 1.

We also have the boundary conditions

f(0) = f(1) = 0.

This yields the eigenfunctions

φ_k(s) = √2 sin(kπs), 0 ≤ s ≤ 1,

with eigenvalues

λ_k = 1/(kπ)², k ∈ N.

In this case all the eigenspaces are one dimensional, and φ_1, φ_2, ... form an orthonormal basis. The integral kernel considered here is the covariance kernel of the Brownian bridge, the limiting process of the uniform empirical process.
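A minimal numerical sketch (the grid discretization is our assumption, not from the text): the eigenvalues of the discretized kernel approach λ_k = 1/(kπ)².

    % Eigenvalues of the discretized Brownian bridge kernel K(s,t) = min(s,t) - s*t.
    n = 200; h = 1/n; s = ((1:n)' - 0.5)*h;
    K = min(s, s') - s*s';                 % kernel matrix on the grid
    lam = sort(eig(K*h), 'descend');       % h is the quadrature weight
    disp([lam(1:3), 1./(((1:3)'*pi).^2)])  % compare with 1/(k*pi)^2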

Definition 2.2.10. A linear operator T ∈ L is called Hilbert–Schmidt if for some orthonormal basis e_1, e_2, ... of H we have

Σ_{k=1}^∞ ‖T e_k‖² < ∞. (2.10)


The following proposition shows that the finite number in (2.10) is independent of the choice of basis.

Proposition 2.2.7. If T is Hilbert–Schmidt, then for any two orthonormal bases e_1, e_2, ... and a_1, a_2, ... we have

Σ_{k=1}^∞ ‖T e_k‖² = Σ_{m=1}^∞ ‖T a_m‖².

Proof. We have

Σ_{k=1}^∞ ‖T e_k‖² = Σ_{k=1}^∞ ⟨e_k, T*T(Σ_{m=1}^∞ ⟨e_k, a_m⟩ a_m)⟩
= Σ_{k=1}^∞ Σ_{m=1}^∞ ⟨e_k, a_m⟩ ⟨e_k, T*T a_m⟩
= Σ_{m=1}^∞ Σ_{k=1}^∞ ⟨a_m, e_k⟩ ⟨a_m, T*T e_k⟩
= Σ_{m=1}^∞ ⟨a_m, T*T(Σ_{k=1}^∞ ⟨a_m, e_k⟩ e_k)⟩
= Σ_{m=1}^∞ ‖T a_m‖².

Proposition 2.2.8. If S, T ∈ L_HS, and e_1, e_2, ... is an orthonormal basis of H, then

⟨S, T⟩_HS = Σ_{k=1}^∞ ⟨S e_k, T e_k⟩ (2.11)

defines an inner product on L_HS.


Proof. Let S_1, S_2, T ∈ L_HS and a, b ∈ R. We have

⟨aS_1 + bS_2, T⟩_HS = Σ_{k=1}^∞ ⟨aS_1 e_k + bS_2 e_k, T e_k⟩
= a Σ_{k=1}^∞ ⟨S_1 e_k, T e_k⟩ + b Σ_{k=1}^∞ ⟨S_2 e_k, T e_k⟩
= a⟨S_1, T⟩_HS + b⟨S_2, T⟩_HS.

Also, ⟨T, T⟩_HS = Σ_{k=1}^∞ ⟨T e_k, T e_k⟩ = Σ_{k=1}^∞ ‖T e_k‖² ≥ 0 for all T ∈ L_HS.
The inner product in (2.11) does not depend on the choice of basis, as can be seen in the same way as for the norm. Equipped with this inner product, the space of all Hilbert–Schmidt operators L_HS becomes a separable infinite dimensional Hilbert space in its own right, and the Hilbert–Schmidt norm will be written as

‖T‖²_HS = Σ_{k=1}^∞ ‖T e_k‖². (2.12)

Definition 2.2.11. For a, b ∈ H, the tensor product a ⊗ b is defined by the relation

(a ⊗ b)x = ⟨x, b⟩ a, ∀x ∈ H. (2.13)

This is an important example of an operator in L_HS. The following proposition gives its norm.

Proposition 2.2.9. We have ‖a ⊗ b‖_HS = ‖a‖ ‖b‖.

Proof. If b = 0, the result is obvious. Let b ≠ 0, and choose an orthonormal basis e_1 = b/‖b‖, e_2, .... We have

‖a ⊗ b‖²_HS = ‖(a ⊗ b)(b/‖b‖)‖² + Σ_{k=2}^∞ ‖(a ⊗ b)e_k‖²
= (⟨b, b⟩/‖b‖)² ‖a‖² + Σ_{k=2}^∞ ⟨e_k, b⟩² ‖a‖²
= ‖a‖² ‖b‖²,

since ⟨e_k, b⟩ = 0 for k ≥ 2.


This completes the proof of the proposition.

If u ∈ H has ‖u‖ = 1, then u ⊗ u is the orthogonal projection onto span(u), and ‖u ⊗ u‖_HS = 1.

Theorem 2.2.8. If T ∈ L_HS, then T is compact.

Proof. Let e_1, e_2, ... be an orthonormal basis of H, and note that

C_N = ‖T‖²_HS − Σ_{k=1}^N ‖T e_k‖² → 0, as N → ∞.

Let us define an operator T_N by T_N x = Σ_{k=1}^N ⟨x, e_k⟩ T e_k, x ∈ H; then T_N is compact because it has finite dimensional range. Choose x ∈ H and write x = x_N + x_N^⊥, where x_N = Σ_{k=1}^N ⟨x, e_k⟩ e_k and x_N^⊥ ⊥ span(e_1, e_2, ..., e_N). If x_N^⊥ = 0, we have ‖(T − T_N)x‖² = 0 ≤ C_N ‖x‖². If x_N^⊥ ≠ 0, set ẽ_{N+1} = x_N^⊥/‖x_N^⊥‖, and extend e_1, ..., e_N, ẽ_{N+1} to an orthonormal basis ẽ_1 = e_1, ..., ẽ_N = e_N, ẽ_{N+1}, ẽ_{N+2}, ... of H. Because the HS-norm is independent of the basis, it follows that

‖(T − T_N)x‖² = ‖T x_N^⊥‖² = ‖T ẽ_{N+1}‖² ‖x_N^⊥‖² ≤ C_N ‖x‖².

It follows from these arguments that ‖T − T_N‖ → 0 as N → ∞. But this implies that T is also compact, because the limit in operator norm of compact operators is compact.

Definition 2.2.12. The trace of a positive bounded operator T is defined as

tr(T) = ‖T‖_tr = Σ_{k=1}^∞ ⟨e_k, T e_k⟩, (2.14)

where e_1, e_2, ... is an orthonormal basis of H.

The number in (2.14) is independent of the choice of basis. The operator is said to be of trace class if ‖T‖_tr < ∞. The family of all trace class operators will be denoted by L_tr, and we have the following relation:

L_tr ⊂ L_HS ⊂ {all compact operators} ⊂ L. (2.15)

The following theorem justifies the name trace.

Theorem 2.2.9. Let T be a positive compact operator on an infinite dimensional Hilbert space H with all eigenvalues τ_1 > τ_2 > ... ↓ 0 simple. Then

tr(T) = Σ_{k=1}^∞ τ_k. (2.16)

Proof. Since all the eigenvalues are simple, there exists an orthonormal basis e_1, e_2, ... of eigenvectors, with corresponding eigenprojections E_k = e_k ⊗ e_k, so that T can be written as

T = Σ_{k=1}^∞ τ_k e_k ⊗ e_k.

Thus we have

tr(T) = Σ_{k=1}^∞ ⟨e_k, T e_k⟩ = Σ_{k=1}^∞ τ_k ‖e_k‖² = Σ_{k=1}^∞ τ_k.

Thus in this special case the trace of an operator is nothing more than the sum of its eigenvalues.
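As a check on Theorem 2.2.9 (this computation is ours, not from the text, and it uses the standard fact that for a continuous nonnegative definite kernel the trace equals the integral of the kernel over the diagonal), consider the Brownian bridge kernel of the example in Section 2.2.2. There

tr(K) = ∫_0^1 K(s, s) ds = ∫_0^1 (s − s²) ds = 1/6,

which agrees with Σ_{k=1}^∞ λ_k = Σ_{k=1}^∞ 1/(kπ)² = (1/π²)(π²/6) = 1/6.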


CHAPTER 3 RANDOM VARIABLES IN A HILBERT SPACE

3.1 Random variables
In this section we define random variables in a Hilbert space and discuss their basic properties. The Hilbert space H we consider from here onward is infinite dimensional and separable. Let (Ω, F, ℙ) be a probability space. A mapping X : Ω → H such that X is (F, B)-measurable is called a random variable in H, where B is the smallest sigma algebra that contains the family of all open sets in H. A random variable induces a probability distribution P_X = P on (H, B), defined by the relation

P(B) = P_X(B) = ℙ(X⁻¹(B)) = ℙ{X ∈ B}, B ∈ B. (3.1)

Now we define moments of the random variable in H.

Definition 3.1.1. Let X be a random variable in H with E‖X‖ < ∞. The mean EX of X is the vector µ ∈ H uniquely determined by the relation

E⟨a, X⟩ = ⟨a, µ⟩, ∀a ∈ H. (3.2)

With the above defined mean, (X − µ) ⊗ (X − µ) is a random element in L_HS, with norm

‖(X − µ) ⊗ (X − µ)‖_HS = ‖X − µ‖². (3.3)

If we assume that

E‖X‖² < ∞, (3.4)

the random element (X − µ) ⊗ (X − µ) has a mean in L_HS in its own right, by essentially the same definition as for the mean of X; i.e., Σ = E(X − µ) ⊗ (X − µ) is uniquely determined by the requirement

E⟨T, (X − µ) ⊗ (X − µ)⟩_HS = ⟨T, Σ⟩_HS, ∀T ∈ L_HS. (3.5)

The uniqueness in these definitions is guaranteed by the Riesz representation theorem, because the functionals a ↦ E⟨a, X⟩ and T ↦ E⟨T, (X − µ) ⊗ (X − µ)⟩_HS are bounded.

The relation in (3.5) can be further simplified. We know that L_HS is spanned by the operators of the form a ⊗ b, a, b ∈ H. So, choosing T = a ⊗ b and assuming b ≠ 0, with e_1 = b/‖b‖, e_2, ... an orthonormal basis of H, we see that the left hand side of (3.5) is

E⟨a ⊗ b, (X − µ) ⊗ (X − µ)⟩_HS = E Σ_{k=1}^∞ ⟨(a ⊗ b)e_k, ((X − µ) ⊗ (X − µ))e_k⟩
= E⟨⟨b, e_1⟩ a, ⟨X − µ, e_1⟩ (X − µ)⟩
= ⟨b, e_1⟩ E⟨X − µ, e_1⟩ ⟨a, X − µ⟩
= E⟨a, X − µ⟩ ⟨X − µ, b⟩.

Also, the right hand side of (3.5) reduces to

⟨a ⊗ b, Σ⟩_HS = Σ_{k=1}^∞ ⟨(a ⊗ b)e_k, Σ e_k⟩ = ⟨b, e_1⟩ ⟨a, Σ e_1⟩ = ⟨a, Σb⟩.

This establishes a very important relation: the covariance operator Σ of X is uniquely determined by

E⟨a, X − µ⟩ ⟨X − µ, b⟩ = ⟨a, Σb⟩, ∀a, b ∈ H. (3.6)

Apart from some elementary properties similar to those in the multivariate case [28], [11], covariance operators have some specific properties. The following theorem states some of the important ones.

Theorem 3.1.1. The covariance operator Σ is Hermitian and of trace class, and hence compact. It is nonnegative, and positive if it is one to one.

Proof. Using (3.6) it is easy to see that Σ is Hermitian. To prove that it is of trace class, let

e_1, e_2, ... be any orthonormal basis in H. Then

X − µ = Σ_{k=1}^∞ ⟨X − µ, e_k⟩ e_k.


Hence

Σ_{k=1}^∞ ⟨e_k, Σe_k⟩ = Σ_{k=1}^∞ E⟨X − µ, e_k⟩² = E Σ_{k=1}^∞ ⟨X − µ, e_k⟩² = E‖X − µ‖² < ∞.

Therefore Σ is of finite trace. Moreover, for any a ∈ H,

⟨a, Σa⟩ = E⟨a, X − µ⟩² ≥ 0.

If Σ is one to one, the number on the left in the above inequality is positive, and hence Σ is positive.

Now we define an important class of operators on a Hilbert space.

Definition 3.1.2. An operator is called a covariance operator if it has the following properties: (i) it is Hermitian; (ii) it is positive; (iii) it has finite trace. Hence, it is Hilbert–Schmidt and therefore compact.

Because of the compactness and positivity of Σ, it has a spectral decomposition as in Theorem 2.2.6. Let σ_1² > σ_2² > ... ↓ 0 be the eigenvalues of Σ. If we assume that all the eigenspaces are one dimensional, there exists an orthonormal basis e_1, e_2, ... such that the eigenprojections can be written as

E_k = e_k ⊗ e_k, k ∈ N, (3.7)

and we have the spectral representation

Σ = Σ_{k=1}^∞ σ_k² e_k ⊗ e_k. (3.8)

Note that the variance of ⟨X − µ, e_k⟩ is ⟨e_k, Σe_k⟩ = σ_k², so that the σ_k² are in fact variances, as suggested by the notation.


3.2 Probability Measures on H

3.2.1 Some remarks on B
Let A be any σ-field of subsets of H; that is, ∅ ∈ A; A ∈ A ⇒ A^c ∈ A; and A_1, A_2, ... ∈ A ⇒ ∪_{k=1}^∞ A_k ∈ A. The sigma field generated by any collection C is the sigma field σ(C) = ∩_{A⊃C} A, the intersection being over all σ-fields A containing C. That is, σ(C) is the smallest sigma field containing C. The sigma field B is generated by the collection of all open sets O. Each O ∈ O can be written as O = ∪_{k=1}^∞ B_k for some open balls B_1, B_2, ....
A collection V of subsets of H is called a field if it is closed under complementation and finite unions. For each k ∈ N let B_k denote the σ-field of Borel sets in R^k. A subset C of H is called a Borel cylinder set with base B if it is of the form C = {x ∈ H : (⟨a_1, x⟩, ..., ⟨a_k, x⟩) ∈ B} for some k ≥ 1, a_1, a_2, ..., a_k ∈ H, and B ∈ B_k. Thus, C is an inverse image of a Borel set under a continuous function and hence measurable. The class V of all cylinder sets is a field and a subset of B. It is not difficult to see that V also generates B. Since H is separable there exists a countable set of elements x_n ∈ H, x_n ≠ 0, which is everywhere dense in H. Let

u_n = x_n/‖x_n‖, so that ⟨x_n, u_n⟩ = ‖x_n‖, n ∈ N.

Using the Cauchy–Schwarz inequality it is easy to see that

{x ∈ H : ‖x‖ ≤ α} ⊂ ∩_{n=1}^∞ {x ∈ H : ⟨x, u_n⟩ ≤ α}. (3.9)

Since the x_n are everywhere dense in H, for x ∈ H with ‖x‖ > α there exists n_0 ∈ N such that ‖x − x_{n_0}‖ < (1/2)(‖x‖ − α), so that

‖x_{n_0}‖ ≥ ‖x‖ − ‖x_{n_0} − x‖ > ‖x‖ − (1/2)(‖x‖ − α) = (1/2)(‖x‖ + α).

Moreover,

|⟨x, u_{n_0}⟩ − ‖x_{n_0}‖| = |⟨x − x_{n_0}, u_{n_0}⟩| ≤ ‖u_{n_0}‖ ‖x − x_{n_0}‖ < (1/2)(‖x‖ − α).


This entails ⟨x, u_{n_0}⟩ > α. Hence,

{x ∈ H : ‖x‖ > α} ⊂ ∪_{n=1}^∞ {x ∈ H : ⟨x, u_n⟩ > α}.

Taking complements we get

{x ∈ H : ‖x‖ ≤ α} = ∩_{n=1}^∞ {x ∈ H : ⟨x, u_n⟩ ≤ α}. (3.10)

The set ∩_{n=1}^∞ {x ∈ H : ⟨x, u_n⟩ ≤ α} is the intersection of a countable collection of cylinder sets and therefore an element of σ(V), for α > 0 and trivially for α = 0. But then

{x ∈ H : ‖x‖ < α} = ∪_{n=1}^∞ {x ∈ H : ‖x‖ ≤ α − 1/n}, (3.11)

is also an element of σ(V) for each α > 0. This argument extends easily to all open balls, so we have σ(V) ⊃ B.
Because of the structure of σ(V), we can apply a fundamental result from measure theory due to Carathéodory to claim: if P_0 : V → [0, 1] satisfies P_0(H) = 1 and is σ-additive on V, then P_0 extends uniquely to a probability measure P on B.

3.2.2 Probability measures on H
Let x_1, x_2, ... ∈ H and p_1 ≥ 0, p_2 ≥ 0, ... be such that Σ_{k=1}^∞ p_k = 1. The corresponding discrete probability measure on H is defined as

P(B) = Σ_{k: x_k ∈ B} p_k, B ∈ B.

Since each singleton {x_k} is in B, we have P({x_k}) = p_k. Important examples of discrete distributions are the empirical distributions.

Let us assume that X has a discrete distribution P_X = P as above, and that

E‖X‖ = Σ_{k=1}^∞ ‖x_k‖ p_k < ∞.


Under this assumption the mean of X is given by

EX = µ = Σ_{k=1}^∞ x_k p_k,

because

E⟨a, X⟩ = Σ_{k=1}^∞ ⟨a, x_k⟩ p_k = ⟨a, Σ_{k=1}^∞ x_k p_k⟩ = ⟨a, µ⟩, ∀a ∈ H.

Moreover, assuming that

E‖X‖² = Σ_{k=1}^∞ ‖x_k‖² p_k < ∞,

the covariance operator of X equals

E(X − µ) ⊗ (X − µ) = Σ = Σ_{k=1}^∞ {(x_k − µ) ⊗ (x_k − µ)} p_k.

To see this, choose arbitrary a, b ∈ H, and note that

E⟨a, X − µ⟩ ⟨X − µ, b⟩ = Σ_{k=1}^∞ ⟨a, x_k − µ⟩ ⟨x_k − µ, b⟩ p_k.

This last expression should be ⟨a, Σb⟩. To double check, note that with Σ as above we indeed have

⟨a, Σb⟩ = Σ_{k=1}^∞ ⟨a, ((x_k − µ) ⊗ (x_k − µ))b⟩ p_k
= Σ_{k=1}^∞ ⟨a, ⟨x_k − µ, b⟩ (x_k − µ)⟩ p_k
= Σ_{k=1}^∞ ⟨a, x_k − µ⟩ ⟨x_k − µ, b⟩ p_k.

One of the major differences between Euclidean spaces and infinite dimensional Hilbert spaces is that there does not exist a Lebesgue measure on a Hilbert space. Consequently, densities with respect to Lebesgue measure cannot be defined. Thus, we have to specify the probability distribution in a different manner. Let us recall that a positive Hermitian operator with finite trace is called a covariance operator. We will show that any such operator is indeed the covariance operator of some random variable in H.

Definition 3.2.1. The characteristic functional of a random variable X with induced probability measure P_X = P is P̂ : H → C, given by

P̂(t) = E e^{i⟨t,X⟩} = ∫_H e^{i⟨t,x⟩} dP(x). (3.12)

The characteristic functional has the following properties:
1. P̂(0) = 1.
2. P̂ : H → C is continuous.
3. P̂ is positive semi-definite. That is, for every N ∈ N, for every set of numbers z_1, z_2, ..., z_N ∈ C, and for every set of vectors t_1, t_2, ..., t_N ∈ H we have

Σ_{j=1}^N Σ_{k=1}^N z_j z̄_k P̂(t_j − t_k) ≥ 0.

4. If Y is another random variable whose characteristic functional is the same as that of X, then P_X = P_Y.

5. For any ε > 0 there exists a covariance operator Σ such that

1 − Re(P̂(t)) < ε, whenever ⟨t, Σt⟩ < 1. (3.13)

Moreover, the Minlos–Sazonov theorem (1963) (see [27] and [33]) states that the above mentioned properties are sufficient for a complex valued function P̂ on H to be the characteristic functional of a probability measure P on (H, B). Henceforth we will refer to any such function with these five properties as a characteristic functional, and notice that in principle probability measures on H can be given by specifying characteristic functionals. An important class of probability measures that can thus be defined is the class of Gaussian distributions.


3.2.3 Gaussian distributions

Let µ ∈ H be a vector and Σ ∈ L_HS be a covariance operator. Then the functional

φ(t) = e^{i⟨t,µ⟩ − (1/2)⟨t,Σt⟩}, t ∈ H, (3.14)

is a characteristic functional. The corresponding probability distribution is called the Gaussian distribution with parameters µ and Σ, and we write it as G(µ, Σ).

Proposition 3.2.1. The random variable X whose characteristic functional is given by (3.14) has mean µ and covariance operator Σ.

Proof. Let a, b ∈ H and s_1, s_2 ∈ R. Then

E e^{i(s_1⟨X,a⟩ + s_2⟨X,b⟩)} = E e^{i⟨X, s_1 a + s_2 b⟩}
= e^{i⟨µ, s_1 a + s_2 b⟩ − (1/2)⟨s_1 a + s_2 b, Σ(s_1 a + s_2 b)⟩}
= exp(i s_1 ⟨µ, a⟩ + i s_2 ⟨µ, b⟩ − (1/2)(s_1, s_2) M (s_1, s_2)*),

where M is the 2 × 2 matrix with entries M_11 = ⟨a, Σa⟩, M_12 = M_21 = ⟨a, Σb⟩, M_22 = ⟨b, Σb⟩. This shows that (⟨X, a⟩, ⟨X, b⟩)* has a bivariate normal distribution, the mean of ⟨X, a⟩ is ⟨µ, a⟩, and E⟨X − µ, a⟩⟨X − µ, b⟩ = ⟨a, Σb⟩. Hence the result follows.
In general, probability measures on a Hilbert space are different from those on a Euclidean space. This can be demonstrated in terms of Gaussian measures.

Definition 3.2.2. Two probability measures P, Q on (H, B) are called equivalent (P ∼ Q) if

P(B) = 0 ⇔ Q(B) = 0, B ∈ B,

and they are called orthogonal (P ⊥ Q) if

∃ S ∈ B : P(S) = 1, Q(S) = 0.

Thus if P ∼ Q we have P << Q and Q << P, that is, the measures are absolutely continuous with respect to each other, and we have the Radon–Nikodym derivatives

dP/dQ = f_{P,Q}, with P(B) = ∫_B f_{P,Q} dQ, and dQ/dP = f_{Q,P}, with Q(B) = ∫_B f_{Q,P} dP, B ∈ B.


On the real line any two normal distributions with nonzero variances are equivalent, but two Gaussian distributions on H can be orthogonal. We would like to state the following theorem (see [12], [13] for the proof).

Theorem 3.2.1 (Feldman–Hájek, 1958). Let P be G(µ, Σ) and Q be G(ν, Σ), two Gaussian distributions on H with means µ and ν and common covariance operator Σ = Σ_{k=1}^∞ σ_k² e_k ⊗ e_k. Let µ_k = ⟨µ, e_k⟩, ν_k = ⟨ν, e_k⟩. We have

P ∼ Q ⇔ Σ_{k=1}^∞ (µ_k − ν_k)²/σ_k² < ∞. (3.15)

If the sum is infinite they are orthogonal.
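As an illustration of the dichotomy (our example, not the text's): let Σ be the Brownian bridge covariance operator of Section 2.2.2, so σ_k = 1/(kπ), and let P = G(0, Σ) and Q = G(ν, Σ). If ν_k = 1/k², then Σ_k (ν_k/σ_k)² = Σ_k π²/k² < ∞ and P ∼ Q; if ν_k = 1/k, then Σ_k (ν_k/σ_k)² = Σ_k π² = ∞ and P ⊥ Q, even though ν ∈ H in both cases.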

If two Gaussian measures are equivalent we can define the density of one with respect to the other.

Theorem 3.2.2. If P ∼ Q as defined above, the density of P with respect to Q equals

(dP/dQ)(x) = exp(Σ_{k=1}^∞ ((µ_k − ν_k)/σ_k²)(x_k − (µ_k + ν_k)/2)),

where x = Σ_{k=1}^∞ ⟨x, e_k⟩ e_k = Σ_{k=1}^∞ x_k e_k.

Proof. We will only give a sketch. Let P_n and Q_n be the restrictions of P and Q, respectively, to the subspace generated by e_1, e_2, ..., e_n, and let λ_n be the Lebesgue measure on R^n. The density of P_n with respect to Q_n equals

(dP_n/dQ_n)(x) = (dP_n/dλ_n)(x) / (dQ_n/dλ_n)(x)
= exp(−(1/2) Σ_{k=1}^n (1/σ_k²){(x_k − µ_k)² − (x_k − ν_k)²})
= exp(Σ_{k=1}^n ((µ_k − ν_k)/σ_k²)(x_k − (µ_k + ν_k)/2)).

It can be shown that dP/dQ is equal to the limit of the above expression. Hence the result follows.


3.2.4 Karhunen–Loève expansion
Gaussian random variables in a Hilbert space are easy to deal with for several reasons. One of the reasons is that calculations regarding a Gaussian random variable are relatively simple, because it can be expressed in terms of a countable collection of standard normal random variables. This expansion is known as the Karhunen–Loève expansion (see [18]).
Let X ∈ H be a mean 0 random variable with covariance operator Σ. Let σ_1² > σ_2² > ... ↓ 0 be the eigenvalues and e_1, e_2, ... the orthonormal basis of corresponding eigenvectors of Σ. Then X = Σ_{k=1}^∞ ⟨X, e_k⟩ e_k, and thus

E⟨e_j, X⟩ ⟨X, e_k⟩ = ⟨e_j, Σe_k⟩ = σ_k² δ_{j,k}.

If X =d G(0, Σ), then the ⟨X, e_j⟩ are normal with mean 0 and variance σ_j², and they are uncorrelated. Consequently, they are independent. This yields the following Karhunen–Loève expansion:

X = Σ_{k=1}^∞ σ_k Z_k e_k, (3.16)

where Z_1, Z_2, ... are iid N(0, 1) random variables.
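A minimal simulation sketch (the Brownian bridge eigenpairs, the grid, and all variable names are our choices, not the text's): a mean zero Gaussian element of L²(0, 1) can be generated by truncating (3.16).

    % Simulate X = sum_k sigma_k Z_k e_k with sigma_k = 1/(k*pi) and
    % e_k(t) = sqrt(2)*sin(k*pi*t), truncated at m terms.
    m = 50; n = 200; t = ((1:n)' - 0.5)/n;
    sig = 1./((1:m)'*pi);            % sigma_k
    E = sqrt(2)*sin(t*(1:m)*pi);     % e_k on the grid (n-by-m)
    Z = randn(m,1);                  % iid N(0,1)
    X = E*(sig.*Z);                  % one realization, a bridge-like path
    plot(t, X)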

We will end this chapter with one important result which will be needed in the sequel.

Theorem 3.2.3. Let µ ∈ H and let Σ be a covariance operator. Then X =d G(µ, Σ) if and only if ⟨X, a⟩ =d N(⟨a, µ⟩, ⟨a, Σa⟩), ∀a ∈ H.

Proof. Let X =d G(µ, Σ), and a ∈ H. For any s ∈ R we have

E e^{is⟨X,a⟩} = E e^{i⟨X,sa⟩} = e^{i⟨sa,µ⟩ − (1/2)⟨sa,Σsa⟩} = e^{is⟨a,µ⟩ − (1/2)s²⟨a,Σa⟩}.

This implies that ⟨X, a⟩ =d N(⟨a, µ⟩, ⟨a, Σa⟩) for all a ∈ H. Conversely, let t ∈ H. Since ⟨X, t⟩ =d N(⟨t, µ⟩, ⟨t, Σt⟩), we have

E e^{i⟨X,t⟩} = e^{i⟨t,µ⟩ − (1/2)⟨t,Σt⟩}.

This implies X =d G(µ, Σ).


CHAPTER 4 RANDOM SAMPLES AND LIMIT THEOREMS

4.1 Random Samples

In this chapter we will develop some tools for statistical analysis. Let X_1, X_2, ... be an iid sample from P_X = P. The two important parameters of the probability measure P are the mean µ and the covariance operator Σ. We will now define their sample analogues. The sample mean is defined to be

µ̂ = X̄_n = (1/n) Σ_{i=1}^n X_i, (4.1)

and the sample covariance operator is defined as

Σ̂ = Σ̂_n = (1/n) Σ_{i=1}^n (X_i − X̄_n) ⊗ (X_i − X̄_n). (4.2)

The assumption of a finite fourth moment of the norm of X guarantees the existence and uniqueness of µ and Σ. We have the following theorem concerning their unbiasedness.

Theorem 4.1.1. The sample mean is an unbiased estimator of the population mean and a rescaled sample covariance operator is an unbiased estimator of the population covariance operator.

Proof. We have,

E⟨a, X̄_n⟩ = (1/n) Σ_{i=1}^n E⟨a, X_i⟩ = (1/n) n ⟨a, µ⟩ = ⟨a, µ⟩, ∀a ∈ H.

This shows that E X̄_n = µ. In order to prove the analogous statement for the covariance operator, we note that for any two random elements X and Y with finite second moments and means µ and ν respectively,

E(X − µ) ⊗ (Y − ν) = T ⇔ E⟨a, X − µ⟩ ⟨Y − ν, b⟩ = ⟨a, Tb⟩, ∀a, b ∈ H. (4.3)


Now, let

Σ̃ = (1/n) Σ_{i=1}^n (X_i − µ) ⊗ (X_i − µ). (4.4)

Since the right hand side is an average of iid elements in the Hilbert space L_HS, each with mean E(X − µ) ⊗ (X − µ) = Σ, this must be an unbiased estimator of Σ. That is, EΣ̃ = Σ. Also, note that

Σ̂ = (1/n) Σ_{i=1}^n (X_i − µ) ⊗ (X_i − µ) − (X̄_n − µ) ⊗ (X̄_n − µ) (4.5)
= Σ̃ − (X̄_n − µ) ⊗ (X̄_n − µ). (4.6)

Because X_i and X_j are independent for i ≠ j, we have E⟨a, X_i − µ⟩ ⟨X_j − µ, b⟩ = 0. Using (4.3), E(X_i − µ) ⊗ (X_j − µ) = 0, and hence

E(X̄_n − µ) ⊗ (X̄_n − µ) = (1/n²) Σ_{i=1}^n Σ_{j=1}^n E(X_i − µ) ⊗ (X_j − µ) (4.7)
= (1/n) E(X − µ) ⊗ (X − µ) = (1/n) Σ. (4.8)

Combining these, we see that

EΣ̂ = Σ − (1/n)Σ = ((n − 1)/n) Σ.

Thus (n/(n − 1)) Σ̂ is an unbiased estimator of Σ.
The exact distribution of the sample mean for a random sample from a Gaussian distribution is easy to find. If the X_i are iid G(µ, Σ), using X as a generic sample element we have

E e^{i⟨t, X̄⟩} = E e^{i⟨t, (1/n)Σ_{j=1}^n X_j⟩} = (E e^{i⟨t/n, X⟩})^n = e^{i⟨t,µ⟩ − (1/2n)⟨t, Σt⟩}. (4.9)

This shows that X̄ has a G(µ, Σ/n) distribution.
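A minimal computational sketch (the grid representation, the toy data, and all variable names are our assumptions, not the text's) of the estimators (4.1) and (4.2) for discretized curves:

    % Sample mean and sample covariance operator for n curves on p grid points.
    n = 100; p = 50; t = ((1:p) - 0.5)/p;
    X = randn(n,1)*sin(2*pi*t) + randn(n,1)*cos(2*pi*t);  % toy functional sample
    mu_hat = mean(X, 1);                 % sample mean curve, as in (4.1)
    Xc = X - mu_hat;                     % centered curves
    Sigma_hat = (Xc'*Xc)/n;              % covariance kernel matrix, as in (4.2)
    Sigma_unb = (n/(n-1))*Sigma_hat;     % rescaled unbiased version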


4.2 Some Central Limit Theorems
There are various versions of the univariate limit theorems [37]. To derive the asymptotic distributions of X̄ and Σ̂ we will need the notion of convergence in distribution in a Hilbert space H.

Definition 4.2.1. Let T, T_1, T_2, ... be a sequence of random variables in H, and let P_T = P, P_{T_1} = P_1, P_{T_2} = P_2, ... be the induced probability measures. Then T_n →d T in H, or P_n → P, if

E f(T_n) = ∫ f dP_n → ∫ f dP = E f(T), ∀f ∈ C_0(H), (4.10)

where C_0(H) is the class of all bounded and continuous real valued functions on H.
As in the multivariate case, the above definition is equivalent to the following statement:

ℙ(T_n ∈ B) = P_n(B) → P(B) = ℙ(T ∈ B), ∀B ∈ B with P(∂B) = 0. (4.11)

In the infinite dimensional situation, pointwise convergence of the characteristic functionals is no longer sufficient for establishing convergence in distribution. We will need a further condition for convergence in distribution of a sequence of random variables in a Hilbert space. The following theorem gives this condition (see [24]).

Theorem 4.2.1. Let T_1, T_2, ... be a sequence of random variables in H such that, for each a ∈ H,

⟨T_n, a⟩ →d U_a, as n → ∞, in R, (4.12)

where U_a is some real valued random variable. Suppose, moreover, that for some orthonormal basis e_1, e_2, ...,

sup_{n∈N} Σ_{k=ν}^∞ E⟨T_n, e_k⟩² → 0, as ν → ∞. (4.13)

Then there exists an H-valued random variable T such that T_n →d T as n → ∞, in H.


Condition (4.12) is known as the convergence of the finite dimensional distributions, and condition (4.13) is known as the tightness of the sequence. Hence, we say that a sequence of random variables in H converges in distribution if (1) the finite dimensional distributions converge, and (2) the sequence is tight.

Now we will consider the simplest version of the central limit theorem in Hilbert spaces (see also [8]).

Theorem 4.2.2 (Central Limit Theorem for the sample mean). Let X_1, X_2, ..., X_n be an iid sample with finite second moment of the norm and let X̄_n be the sample mean. Then there exists a Gaussian random element G =d G(0, Σ) such that

√n(X̄_n − µ) →d G, as n → ∞, in H. (4.14)

Proof. Let e_1, e_2, ... be an orthonormal basis of eigenvectors of Σ. Then

sup_{n∈N} Σ_{k=ν}^∞ E⟨√n(X̄_n − µ), e_k⟩² = sup_{n∈N} Σ_{k=ν}^∞ n E⟨X̄_n − µ, e_k⟩²
= sup_{n∈N} Σ_{k=ν}^∞ ⟨e_k, Σe_k⟩
= Σ_{k=ν}^∞ σ_k² → 0, as ν → ∞,

because Σ is of finite trace. This shows that the sequence √n(X̄_n − µ) is tight. Now, let a ∈ H and note that the ⟨X_i − µ, a⟩ are iid real valued random variables with mean 0 and variance ⟨a, Σa⟩. The central limit theorem in one dimension yields the existence of a real valued random variable U_a =d N(0, ⟨a, Σa⟩) such that

⟨√n(X̄_n − µ), a⟩ →d U_a, as n → ∞, in R. (4.15)

Hence Theorem 4.2.1 applies and yields the existence of a random variable G ∈ H such that

√n(X̄_n − µ) →d G, as n → ∞, in H.


According to the continuous mapping theorem it follows that

⟨√n(X̄_n − µ), a⟩ →d ⟨G, a⟩, as n → ∞, in R. (4.16)

Thus, equations (4.15) and (4.16) imply that

⟨G, a⟩ =d N(0, ⟨a, Σa⟩), ∀a ∈ H, (4.17)

and G is Gaussian by Theorem 3.2.3.
To derive the asymptotic distribution of Σ̂_n, we will need some discussion of the structure of L_HS. As mentioned in Chapter 2, it is a Hilbert space with its own inner product and norm. We define the tensor product S ⊗_HS T for S, T ∈ L_HS as follows:

(S ⊗_HS T)U = ⟨U, T⟩_HS S, U ∈ L_HS. (4.18)

Since (X − µ) ⊗ (X − µ) − Σ is a zero mean random element in L_HS, its covariance operator V : L_HS → L_HS equals

V = E{(X − µ) ⊗ (X − µ) − Σ} ⊗_HS {(X − µ) ⊗ (X − µ) − Σ}. (4.19)

This operator is uniquely determined by the property

E⟨S, (X − µ) ⊗ (X − µ) − Σ⟩_HS ⟨(X − µ) ⊗ (X − µ) − Σ, T⟩_HS = ⟨S, VT⟩_HS, ∀S ∈ L_HS, ∀T ∈ L_HS. (4.20)

Now we are in a position to prove a central limit theorem for covariance operators.

Theorem 4.2.3. Let X_1, X_2, ..., X_n be a sequence of iid random variables with finite fourth moment of the norm. If Σ̂_n is the sample covariance operator as defined before, then there exists a Gaussian random element G =d G_HS(0, V) such that

√n(Σ̂_n − Σ) →d G, as n → ∞, in L_HS. (4.21)


Proof. Σ̂_n can be decomposed as

Σ̂_n = Σ̃_n − (X̄_n − µ) ⊗ (X̄_n − µ),

where Σ̃_n = (1/n) Σ_{i=1}^n (X_i − µ) ⊗ (X_i − µ). Note that Σ̃_n is an average of iid random variables in L_HS that have common mean Σ and covariance operator V. Therefore applying Theorem 4.2.2 yields √n(Σ̃_n − Σ) →d G as n → ∞, in L_HS. Also, we have (see Proposition 2.2.9)

‖√n(X̄_n − µ) ⊗ (X̄_n − µ)‖_HS = (1/√n) ‖√n(X̄_n − µ)‖².

Since ‖·‖² is a continuous function on H, the continuous mapping theorem entails

‖√n(X̄_n − µ)‖² →d ‖G‖², as n → ∞, in R.

Therefore (1/√n)‖√n(X̄_n − µ)‖² = O_P(1/√n) and does not contribute to the asymptotic distribution, which is determined by √n(Σ̃_n − Σ). Thus we have the desired result.

Using this simple version of the central limit theorem we will now prove a more general central limit theorem for a triangular array of random variables in H. For this generalization we will need some additional properties of covariance operators. We know that for any bounded, positive and Hermitian operator A there exists a bounded operator B such that B² = A, and we write B = √A. Since for any T ∈ L the operator T*T is positive and Hermitian, we have the following definition.

Definition 4.2.2. For T ∈ L we define |T| by

|T| = √(T*T). (4.22)

If C ⊂ L_HS is the class of covariance operators and T ∈ C, we have

‖T‖_tr = tr(|T|) = tr(T). (4.23)


If S, T ∈ C then |S − T| ∈ C, and we define

‖S − T‖_tr = tr(|S − T|), S, T ∈ C. (4.24)

For any T ∈ L the following implications are immediate:

‖T‖_tr < ∞ ⇒ ‖T‖_HS < ∞ ⇒ ‖T‖_L < ∞. (4.25)

The above notions can easily be extended to the class of covariance operators defined on the space of Hilbert–Schmidt operators. We will write C_HS for the class of covariance operators on L_HS, with trace tr_HS(V) and norm ‖V‖_{HS,tr}, where the trace and norm are as defined above.
Now we will prove a generalized version of the central limit theorem. For each sampling stage n ∈ N, let X_{n,1}, ..., X_{n,n} be independent and identically distributed H-valued random elements with the same distribution as X_n. A sufficient condition for all that follows is

sup_{n∈N} E‖X_n‖^{4+δ} < ∞, for some δ > 0. (4.26)

For each n ∈ N we write

E X_{n,i} = µ_n, E(X_{n,i} − µ_n) ⊗ (X_{n,i} − µ_n) = Σ_n, (4.27)

and introduce

X̄_n = (1/n) Σ_{i=1}^n X_{n,i}, Σ̂_n = (1/n) Σ_{i=1}^n (X_{n,i} − µ_n) ⊗ (X_{n,i} − µ_n). (4.28)

Also define

V_n = E((X_{n,i} − µ_n) ⊗ (X_{n,i} − µ_n) − Σ_n) ⊗_HS ((X_{n,i} − µ_n) ⊗ (X_{n,i} − µ_n) − Σ_n). (4.29)

Theorem 4.2.4. Let us assume that there exist covariance operators Σ on H and V on L_HS with

‖Σ_n − Σ‖_tr → 0 and ‖V_n − V‖_{HS,tr} → 0. (4.30)

Then

√n(X̄_n − µ_n) →d G, as n → ∞, in H, (4.31)

where G is a Gaussian (0, Σ) random element, and

√n(Σ̂_n − Σ_n) →d G, as n → ∞, in L_HS, (4.32)

where G is a Gaussian (0, V) random element.

where G is a Gaussian (0, V) random element. √ ¯ Proof. Let Tn = n(Xn −µn), we need to prove that for each e ∈ H, hTn, ei converges to a random variable Ue ∈ R, and the sequence {Tn} is tight. Note that Yn,i = 1 √ hX − µ , ei are iid real valued random variables with mean 0 and variance s = n n,i n n,i n 1 2 2 2 X 2 2 4+δ σ (e) where, σn(e) = he, Σnei. Thus, sn = sn,i = σn(e). Since, sup kXn,ik < n n∈N i=1 ∞, 1 E kY k2+δ ≤ √ K, where, K = (2 kek)2+δ sup kX k2+δ . n,i 2+δ n,i ( n) n∈N Now, n X 2+δ E kYn,ik K i=1 ≤ √ → 0, as n → ∞. 2+δ δ 2+δ sn ( n) (σn(e)) By the Lyapeunov Central Limit Theorem for real valued random variables applied to n X {Yn,i} we get, Yn,i = hTn, ei converges in distribution to a normal random variable i=1 Ue ∈ R with mean 0 and variance he, Σnei. But,

|he, Σnei − he, Σei| = |he, (Σn − Σ) ei| ≤ kΣn − Σktr → 0.

That is, Ue is N(0, he, Σei).

For tightness, let {e_m} be an orthonormal basis of H and let ε > 0 be given. Note that E⟨√n(X̄_n − µ_n), e_m⟩² = ⟨e_m, Σ_n e_m⟩, and that |⟨a, (Σ_n − Σ)a⟩| ≤ ⟨a, |Σ_n − Σ| a⟩ for each a ∈ H. It follows that

sup_{n∈N} Σ_{m=N}^∞ ⟨e_m, Σ_n e_m⟩ ≤ Σ_{m=N}^∞ ⟨e_m, Σe_m⟩ + sup_{n∈N} Σ_{m=N}^∞ ⟨e_m, |Σ_n − Σ| e_m⟩. (4.33)

Since ‖Σ_n − Σ‖_tr → 0, there exists n(ε) ∈ N such that

sup_{n≥n(ε)} Σ_{m=N}^∞ ⟨e_m, |Σ_n − Σ| e_m⟩ ≤ sup_{n≥n(ε)} ‖Σ_n − Σ‖_tr ≤ ε/2. (4.34)

Since each of the operators in the finite collection Σ, Σ_1, ..., Σ_{n(ε)} has finite trace, there exists an index N(ε) ∈ N such that

Σ_{m=N(ε)}^∞ ⟨e_m, Σe_m⟩ < ε/4, (4.35)

and

sup_{1≤n≤n(ε)} Σ_{m=N(ε)}^∞ ⟨e_m, |Σ_n − Σ| e_m⟩ ≤ Σ_{m=N(ε)}^∞ ⟨e_m, Σe_m⟩ + sup_{1≤n≤n(ε)} Σ_{m=N(ε)}^∞ ⟨e_m, Σ_n e_m⟩ ≤ ε/2. (4.36)

Combination of (4.34), (4.35), and (4.36) yields that the expression on the left in (4.33) is smaller than ε, provided that N ≥ N(ε). Consequently, there exists a random element G ∈ H such that T_n →d G as n → ∞ in H. Using the continuous mapping theorem we get

⟨T_n, e⟩ →d ⟨G, e⟩ ⇒ ⟨G, e⟩ =d N(0, ⟨e, Σe⟩).

This means that G =d G(0, Σ).
To prove (4.32), note that Σ̂_n = (1/n) Σ_{i=1}^n (X_{n,i} − µ_n) ⊗ (X_{n,i} − µ_n) is an average of iid random elements in L_HS; the statement follows from the same arguments.

Corollary 4.2.1. Suppose that in addition to the assumptions in Theorem 4.2.4 we have

√n (Σ_n − Σ) → Δ, as n → ∞, in L_HS, (4.37)

then we have

√n (Σ̂_n − Σ) →_d Δ + 𝒢, as n → ∞, in L_HS. (4.38)

Special case: local alternatives. The above result will be applied in a special instance of local alternatives, where the generic random element satisfies

X_n =_d X + R_n,  X ⊥ R_n,  E‖R_n‖² → 0, as n → ∞, (4.39)

with E‖X‖⁴ < ∞, EX = μ, E(X − μ) ⊗ (X − μ) = Σ, E‖R_n‖⁴ < ∞, and ER_n ⊗ R_n = T_n. In this situation we have

Σ_n = E(X_n − μ) ⊗ (X_n − μ) = Σ + T_n. (4.40)

Furthermore, V_n is the covariance operator of (X_n − μ) ⊗ (X_n − μ), and for V we choose the covariance operator of (X − μ) ⊗ (X − μ).

Since E‖R_n‖² = tr(T_n) = tr(Σ_n − Σ), the first condition in (4.30) is fulfilled. To verify the second, let e_1, e_2, ... be the orthonormal basis of eigenvectors of Σ, and note that

⟨e_j ⊗ e_k, V_n e_j ⊗ e_k⟩_HS (4.41)
  = E⟨e_j ⊗ e_k, {(X_n − μ) ⊗ (X_n − μ) − Σ_n} ⊗_HS {(X_n − μ) ⊗ (X_n − μ) − Σ_n} e_j ⊗ e_k⟩_HS
  = Var(⟨e_j, X_n − μ⟩⟨X_n − μ, e_k⟩)
  = Var(⟨e_j, X − μ⟩⟨X − μ, e_k⟩) + Var(⟨e_j, R_n⟩⟨X − μ, e_k⟩)
    + Var(⟨e_j, X − μ⟩⟨R_n, e_k⟩) + Var(⟨e_j, R_n⟩⟨R_n, e_k⟩)
  ≥ Var(⟨e_j, X − μ⟩⟨X − μ, e_k⟩) = ⟨e_j ⊗ e_k, V e_j ⊗ e_k⟩_HS,

because the cross covariances are all zero, as follows from the first two assumptions in (4.39).


This inequality means that V_n − V ≥ 0, so that

tr|V_n − V| = tr(V_n − V) (4.42)
  = ∑_j ∑_k {Var(⟨e_j, R_n⟩⟨X − μ, e_k⟩) + Var(⟨e_j, X − μ⟩⟨R_n, e_k⟩) + Var(⟨e_j, R_n⟩⟨R_n, e_k⟩)} → 0, as n → ∞.

To verify (4.42), note that the first two terms on the right are each equal to ∑_j E⟨e_j, R_n⟩² ∑_k E⟨X − μ, e_k⟩² = tr(T_n) tr(Σ) → 0. The last term is bounded by (∑_j E⟨e_j, R_n⟩²)² = (tr(T_n))² → 0, and we are done.

We will end this chapter with some discussion of the Karhunen–Loève expansion of the limit 𝒢 in Theorems 4.2.3 and 4.2.4. We know that 𝒢 is a zero mean Gaussian random variable in L_HS, but the eigenvalues and eigenvectors of V are unknown. Since the sequence {e_j ⊗ e_k}, j ∈ ℕ, k ∈ ℕ, is a basis for L_HS, we have

𝒢 = ∑_{j=1}^∞ ∑_{k=1}^∞ ⟨𝒢, e_j ⊗ e_k⟩_HS e_j ⊗ e_k. (4.43)

Note that the ⟨𝒢, e_j ⊗ e_k⟩_HS are zero mean normal random variables in ℝ. Since the covariance structure of 𝒢 is the same as that of (X − μ) ⊗ (X − μ) − Σ, we have

E⟨e_j ⊗ e_k, 𝒢⟩_HS ⟨e_α ⊗ e_β, 𝒢⟩_HS (4.44)
  = E⟨e_j ⊗ e_k, (X − μ) ⊗ (X − μ) − Σ⟩_HS ⟨(X − μ) ⊗ (X − μ) − Σ, e_α ⊗ e_β⟩_HS
  = E⟨e_j, ⟨X − μ, e_k⟩(X − μ) − σ_k² e_k⟩ ⟨⟨X − μ, e_β⟩(X − μ) − σ_β² e_β, e_α⟩
  = E(⟨e_j, X − μ⟩⟨X − μ, e_k⟩ − σ_k² δ_{j,k})(⟨X − μ, e_α⟩⟨X − μ, e_β⟩ − σ_β² δ_{α,β})
  = E⟨X − μ, e_j⟩⟨X − μ, e_k⟩⟨X − μ, e_α⟩⟨X − μ, e_β⟩ − δ_{j,k} δ_{α,β} σ_k² σ_β²
  = v_{(j,k),(α,β)}.

Although the random variables ⟨X − μ, e_1⟩, ⟨X − μ, e_2⟩, ... are uncorrelated, they are not in general independent, and the above numbers cannot be further specified. If we assume X =_d G(μ, Σ), the uncorrelatedness does entail independence, and


(4.44) simplifies to

v_{(j,k),(α,β)} = 0,                      (j, k) ≠ (α, β);
v_{(j,k),(α,β)} = v_{j,k}² = 2σ_j⁴,        (j, k) = (α, β), j = k;   (4.45)
v_{(j,k),(α,β)} = v_{j,k}² = σ_j² σ_k²,    (j, k) = (α, β), j ≠ k.

In other words, in this case the operator V turns out to be diagonal in the basis e_j ⊗ e_k. Thus, if X is Gaussian (μ, Σ), the random variable 𝒢 has the Karhunen–Loève expansion

𝒢 = ∑_{j=1}^∞ ∑_{k=1}^∞ v_{j,k} Z_{j,k} e_j ⊗ e_k, (4.46)

where the v_{j,k} are given by (4.45), and the Z_{j,k}, j ∈ ℕ, k ∈ ℕ, are independent standard normal random variables.
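As an illustration, (4.45)–(4.46) can be used directly to simulate a truncated version of the Gaussian limit when X is Gaussian. The following Matlab sketch assumes the first J eigenvalues sig2(1..J) of Σ and the corresponding eigenvectors (columns of E, evaluated on a grid) are available; all names here are hypothetical.

    % Truncated Karhunen-Loeve expansion (4.46) of the limit G.
    J = 10;  G = zeros(size(E,1));
    for j = 1:J
      for k = 1:J
        if j == k, v = sqrt(2)*sig2(j);          % v_{j,j} = sqrt(2 sigma_j^4)
        else       v = sqrt(sig2(j)*sig2(k));    % v_{j,k} = sigma_j sigma_k
        end
        G = G + v * randn * E(:,j) * E(:,k)';    % v_{j,k} Z_{j,k} e_j (x) e_k
      end
    end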


CHAPTER 5 FUNCTIONS OF COVARIANCE OPERATORS AND DELTA METHOD

In multivariate statistics the inverse and the square root of the inverse of the sample and the population covariance operator quite often play a role. The same will be true in the case of functional statistics. Since Σ is assumed to be one to one, its inverse Σ⁻¹ exists. One way to obtain this inverse is by using the spectral representation of Σ. If Σ has one dimensional eigenspaces e_k ⊗ e_k corresponding to each of its eigenvalues σ_k², k ∈ ℕ, it is easy to see that

Σ⁻¹ = ∑_{k=1}^∞ (1/σ_k²) e_k ⊗ e_k (5.1)

has all the properties of an inverse of Σ. It should be noted, however, that (a) Σ⁻¹ is not defined on all of H, and (b) Σ⁻¹ is not bounded where it is defined. To see (a), choose x = ∑_{k=1}^∞ σ_k e_k. Since Σ is of finite trace, x ∈ H. But Σ⁻¹x = ∑_{k=1}^∞ (1/σ_k) e_k, and because ∑_{k=1}^∞ (1/σ_k)² = ∞ we see that Σ⁻¹x ∉ H. Also, since σ_k² ↓ 0 as k ↑ ∞, Σ⁻¹ is unbounded.

The situation with Σ̂ is even worse, because Σ̂ can never be one to one. The definition of Σ̂ entails that its range is finite dimensional and contained in the linear span of {X_1 − X̄, X_2 − X̄, ..., X_n − X̄}. Since ∑_{i=1}^n (X_i − X̄) = 0, the dimension of this linear span is at most n − 1. These observations suggest some kind of generalization of the inverse of the covariance operator.

Definition 5.0.3. Let Σ be a covariance operator. A regularized inverse of Tikhonov type of Σ is defined by

Σ_ε⁻¹ = (εI + Σ)⁻¹, for some ε > 0, (5.2)

where ε is called the regularization parameter.


If Σ⁻¹ has the spectral representation as in (5.1), we have

Σ_ε⁻¹ = ∑_{k=1}^∞ (1/(ε + σ_k²)) e_k ⊗ e_k. (5.3)

The unboundedness of the inverse is taken care of by the regularization, and we have the following.

Proposition 5.0.1. We have ‖Σ_ε⁻¹‖_L = 1/ε.

Proof. Since Σ_ε⁻¹ e_k = (1/(ε + σ_k²)) e_k for each k, and σ_k² ↓ 0, we have

1/ε ≥ ‖Σ_ε⁻¹‖_L ≥ ‖Σ_ε⁻¹ e_k‖ = 1/(ε + σ_k²) → 1/ε.

Hence ‖Σ_ε⁻¹‖_L = 1/ε.

The following theorem justifies the name regularized inverse.

Theorem 5.0.5. For any x ∈ H we have

‖(εI + Σ)⁻¹ Σ x − x‖ → 0, as ε ↓ 0. (5.4)

Proof.

‖(εI + Σ)⁻¹ Σ x − x‖² = ∑_{k=1}^∞ ⟨x, e_k⟩² ((σ_k² − (ε + σ_k²))/(ε + σ_k²))² = ∑_{k=1}^∞ ⟨x, e_k⟩² ε²/(ε + σ_k²)².

Introduce the functions f_ε : ℕ → [0, ∞), given by

f_ε(k) = ⟨x, e_k⟩² ε²/(ε + σ_k²)², ε > 0. (5.5)

Then 0 ≤ f_ε(k) ≤ ⟨x, e_k⟩² = g(k), k ∈ ℕ. If ν_ℕ is the counting measure on ℕ, we have ∫_ℕ g dν_ℕ = ∑_{k=1}^∞ ⟨x, e_k⟩² < ∞, so that the f_ε are dominated by an integrable function. Also f_ε(k) → 0 as ε ↓ 0, for each k ∈ ℕ. Therefore the dominated convergence theorem applies and we get

lim_{ε↓0} ‖(εI + Σ)⁻¹ Σ x − x‖² = lim_{ε↓0} ∫_ℕ f_ε dν_ℕ = ∫_ℕ 0 dν_ℕ = 0,

hence the result.

Apart from the regularized inverse, other functions of Σ play a role in statistical analysis. In the next subsection we will focus on such functions of bounded linear operators in general.

5.1 Functions of bounded linear operators

We begin this section with some definitions. Let T ∈ L.

Definition 5.1.1. The resolvent set ρ(T) of T is defined by

ρ(T) = {z ∈ ℂ : zI − T is one to one and (zI − T)⁻¹ ∈ L}. (5.6)

The complement of ρ(T) is called the spectrum σ(T). That is,

σ(T) = {ρ(T)}ᶜ. (5.7)

Definition 5.1.2. The resolvent of T is the bounded operator valued function R(z), z ∈ ρ(T), defined by

R(z) = (zI − T)⁻¹. (5.8)

Since L is a complete, normed, linear space, we can see that

(I − S)⁻¹ = ∑_{k=0}^∞ Sᵏ, for S ∈ L with ‖S‖_L < 1. (5.9)

This generalizes to

(S − U)⁻¹ = (I − S⁻¹U)⁻¹ S⁻¹ = (∑_{k=0}^∞ (S⁻¹U)ᵏ) S⁻¹, (5.10)


provided that S, U, S⁻¹ ∈ L and ‖S⁻¹U‖_L < 1. With the help of these results we can see that ρ(T) is an open subset of ℂ. Let z₀ ∈ ρ(T) and let z be such that |z − z₀| ‖R(z₀)‖_L < 1. We have zI − T = z₀I − T − (z₀ − z)I, and (5.10) applies with S = z₀I − T and U = (z₀ − z)I and yields that

R(z) = (zI − T)⁻¹ = ∑_{k=0}^∞ (z₀ − z)ᵏ R^{k+1}(z₀) (5.11)

exists and is continuous. Thus ρ(T) is open, and hence σ(T) is closed. Also, because of the power series expansion, R(z) is analytic. Moreover, σ(T) is bounded and

σ(T) ⊂ {z ∈ ℂ : |z| ≤ ‖T‖_L}. (5.12)

Henceforth let Ω ⊃ σ(T) be an open region in ℂ with smooth boundary Γ = ∂Ω. Also let D ⊃ Ω̄ be an open neighborhood of Ω̄, and let 0 < δ_Γ ≤ dist(Γ, σ(T)). We will now give a very general definition of an analytic function of a bounded operator.

Definition 5.1.3. If φ : D → ℂ is an analytic function, we define

φ(T) = (1/2πi) ∮_Γ φ(z) R(z) dz. (5.13)

The above integral is understood as a limit of Riemann sums, and the definition is well posed since L is a complete linear space. The following special cases are instructive and justify our expectations of the above definition.

(a) For any T ∈ L, if φ(z) = z we have

φ(T) = (1/2πi) ∮_Γ z R(z) dz = T. (5.14)

To see this, let Γ be a circle {z ∈ ℂ : |z| = r} with r > ‖T‖_L. Since ‖T/z‖_L < 1 on Γ, we have zR(z) = ∑_{k=0}^∞ z⁻ᵏ Tᵏ. Hence

(1/2πi) ∮_Γ z R(z) dz = (1/2πi) ∮_Γ ∑_{k=0}^∞ z⁻ᵏ Tᵏ dz = T,

because only the k = 1 term has a nonzero contour integral.


(b) If T is positive and compact with spectral expansion T = ∑_{k=1}^∞ τ_k E_k, then it can be directly verified that

R(z) = ∑_{k=1}^∞ (1/(z − τ_k)) E_k (5.15)

for each z ∈ ρ(T). Therefore, for any φ we have

φ(T) = (1/2πi) ∮_Γ φ(z) R(z) dz = (1/2πi) ∮_Γ φ(z) ∑_{k=1}^∞ (1/(z − τ_k)) E_k dz = ∑_{k=1}^∞ φ(τ_k) E_k. (5.16)

In the compact case the last expression in (5.16) will often be used as the definition of φ(T). The collection of all bounded linear operators on H and the collection of all analytic functions on D are both algebras [40]. The mapping φ ↦ φ(T) establishes an algebra homomorphism and we have

φ(T)ψ(T) = (φψ)(T) (5.17)

for any φ, ψ analytic on D. Using this homomorphism, we see that

Tⁿ = (1/2πi) ∮_Γ zⁿ R(z) dz, n ∈ ℕ. (5.18)

In particular,

I = (1/2πi) ∮_Γ R(z) dz. (5.19)

Furthermore, for any z₀ ∈ Ω̄ᶜ we have (see [1])

R(z₀) = (1/2πi) ∮_Γ (1/(z₀ − z)) R(z) dz. (5.20)
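For compact operators the representation (5.16) is also the practical way to evaluate φ(T) numerically: diagonalize and apply φ to the eigenvalues. A minimal Matlab sketch for a discretized positive T (an m × m symmetric matrix; names hypothetical):

    % Functional calculus phi(T) through (5.16).
    phi = @(z) sqrt(z);                    % e.g. the operator square root
    [E,D] = eig((T+T')/2);                 % symmetrize against round-off
    phiT = E * diag(phi(diag(D))) * E';    % phi(T) = sum_k phi(tau_k) E_k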

5.1.1 Fréchet derivative

Let T ∈ L. For an arbitrary Π ∈ L the operator T̃ = T + Π may be considered a perturbation of T. Let M_φ = max_{z∈Γ} |φ(z)| < ∞, and L_Γ = length of Γ < ∞. Let us define φ̇_T : L → L by

φ̇_T Π = (1/2πi) ∮_Γ φ(z) R(z) Π R(z) dz, (5.21)

and

ρ_T Π = (1/2πi) ∮_Γ φ(z) R(z) (ΠR(z))² (I − ΠR(z))⁻¹ dz. (5.22)

The above operators are well defined for sufficiently small Π ∈ L and will play an important role in our further analysis. Using the fact that there exists a constant 0 < K < ∞ such that ‖R(z)‖_L ≤ K/δ_Γ for all z ∈ Ωᶜ (see [8]), we will prove some properties of these operators.

Proposition 5.1.1. φ̇_T is linear and bounded.

Proof. The operator φ̇_T is linear by the linearity of integration. To see that it is bounded, note that

‖φ̇_T Π‖_L ≤ (1/2π) M_φ ∮_Γ ‖R(z)‖²_L ‖Π‖_L dz ≤ (1/2π) M_φ L_Γ (K/δ_Γ)² ‖Π‖_L. (5.23)

Although the operator ρ_T is not linear, we can prove that it is bounded.

Proposition 5.1.2. ρ_T Π is bounded.

Proof. We have

‖ρ_T Π‖_L ≤ (1/(2(1 − c)π)) M_φ L_Γ (K/δ_Γ)³ ‖Π‖²_L, (5.24)

provided that ‖Π‖_L ≤ c δ_Γ/K, for some 0 < c < 1.

We will end this section with the following important theorem (see [7]) and some remarks.

Theorem 5.1.1. Let T ∈ L and let φ be analytic on the domain D as defined before. Then φ maps the neighborhood {T̃ = T + Π : Π ∈ L, ‖Π‖_L ≤ cδ_Γ/K} into L, when defined in the usual way of functional calculus. This mapping is Fréchet differentiable at T, tangentially to L, with bounded derivative φ̇_T as defined before, and we have

φ(T + Π) = φ(T) + φ̇_T Π + ρ_T Π. (5.25)

Remark 5.1. If we assume that T and Π commute, the Fréchet derivative reduces to the numerical derivative in the functional sense. That is,

φ̇_T Π = ((1/2πi) ∮_Γ φ(z) R²(z) dz) Π = ((1/2πi) ∮_Γ φ′(z) R(z) dz) Π = φ′(T) Π. (5.26)

To see this result, note that for z, w ∈ ρ(T) we have the resolvent identity

R(z) − R(w) = −(z − w) R(z) R(w). (5.27)

Thus,

R′(z) = lim_{w→z} (R(z) − R(w))/(z − w) = −R²(z), (5.28)

and the result follows by using integration by parts.

Remark 5.2. In the situation of commuting operators we have the following Taylor series expansion:

φ(T + Π) = ∑_{n=0}^∞ (φ⁽ⁿ⁾(T)/n!) Πⁿ. (5.29)

Remark 5.3. If T is compact and positive Hermitian with eigenvalues τ₁ > τ₂ > ... ↓ 0 and corresponding eigenprojections E₁, E₂, ..., we get

φ̇_T Π = ∑_{j=1}^∞ φ′(τ_j) E_j Π E_j + ∑∑_{j≠k} ((φ(τ_k) − φ(τ_j))/(τ_k − τ_j)) E_j Π E_k, Π ∈ L. (5.30)

Furthermore, if the operators commute, the double sum reduces to zero and we get

φ̇_T Π = ∑_{j=1}^∞ φ′(τ_j) E_j Π. (5.31)

Examples: (a) If φ(z) = z, we immediately see that φ̇_T Π = Π.

(b) Let φ(z) = (α + z)⁻ᵖ, p > 0, α > δ_Γ > 0, z ≠ −α. Note that the choice of α


ensures that the pole at z = −α remains outside the contour Γ. In this situation we get

φ̇_T Π = −p ∑_{j=1}^∞ (1/(α + τ_j)^{p+1}) E_j Π E_j + ∑∑_{j≠k} (((α + τ_j)ᵖ − (α + τ_k)ᵖ)/((τ_k − τ_j)(α + τ_j)ᵖ(α + τ_k)ᵖ)) E_j Π E_k. (5.32)
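For a discretized compact Hermitian operator with simple eigenvalues, the representation (5.30) can be evaluated directly. A minimal Matlab sketch applying it to a symmetric perturbation Pi, with φ the regularizing function used later in Chapter 7 (T, Pi, eps_reg are hypothetical names):

    % Frechet derivative (5.30) of phi at T, applied to Pi.
    phi  = @(z) 1./sqrt(eps_reg + z);
    dphi = @(z) -0.5 ./ (eps_reg + z).^1.5;
    [E,D] = eig((T+T')/2);  tau = diag(D);  m = numel(tau);
    dphiT = zeros(m);
    for j = 1:m
      for k = 1:m
        c = E(:,j)' * Pi * E(:,k);                 % <e_j, Pi e_k>
        if j == k
          dphiT = dphiT + dphi(tau(j)) * c * E(:,j)*E(:,j)';
        else
          dphiT = dphiT + (phi(tau(k))-phi(tau(j)))/(tau(k)-tau(j)) ...
                          * c * E(:,j)*E(:,k)';
        end
      end
    end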

5.1.2 Delta Method

We know from multivariate statistics that if a sequence of matrices converges in distribution, then a smooth function of the matrices also converges in distribution, and the limiting distribution depends on some kind of derivative of the function evaluated at the limit [34]. This is a special case of the well known delta method (see also [7] and [16]). We will use the Fréchet derivative discussed above to derive the delta method for functions of operators. Since eigenvalues, eigenvectors, and eigenprojections are functions of operators, this method will be used extensively in this research.

Let (Ω, F, P) be a probability space and B the σ-field of Borel sets in L. For each n ∈ ℕ let T̂_n ∈ L_HS be a random operator, and let there exist a T ∈ L_HS such that

a_n(T̂_n − T) →_d 𝒢, as n → ∞, in L_HS, (5.33)

where the a_n are numbers such that a_n > 0 and a_n → ∞. Usually we will have a_n = √n because of the central limit theorem.

Since the embedding S ↦ S from L_HS into L is continuous, the above result entails

a_n(T̂_n − T) →_d 𝒢, as n → ∞, in L. (5.34)

With Π̂_n = T̂_n − T, the continuous mapping theorem gives

‖√n Π̂_n‖_HS = √n ‖T̂_n − T‖_HS →_d ‖𝒢‖_HS, as n → ∞. (5.35)

This entails

‖Π̂_n‖_HS = O_p(1/√n). (5.36)


From this result we can see that

P(Ω_n) = P(ω ∈ Ω : ‖Π̂_{n,ω}‖_HS ≤ n^{−1/3}) → 1, as n → ∞. (5.37)

Because of (4.25) the Hilbert–Schmidt norm can be replaced by the operator norm without changing the above results (see also [17]). With the help of the sets Ω_n we immediately get the following delta method for functions of operators.

Theorem 5.1.2. Let φ : D → ℂ be analytic on the domain D as defined before. Then we have

√n(φ(T̂_n) − φ(T)) →_d φ̇_T 𝒢, as n → ∞, in L, (5.38)

where φ̇_T 𝒢 is given by (5.30).

Proof. Let Ω_n be the set defined in (5.37), and let n be sufficiently large to ensure that

n^{−1/3} ≤ c δ_Γ/K,

where K is as in Section 5.1. For such n we have, with Ω_n as defined before,

√n{φ(T̂_n) − φ(T)} = √n{φ(T̂_n) − φ(T)} 1_{Ω_n} + √n{φ(T̂_n) − φ(T)} 1_{Ω_n^c}
  = √n{φ̇_T Π̂_n + ρ_T Π̂_n} 1_{Ω_n} + o_p(1).

Since ρ_T is bounded as in (5.24), using (5.37) we have

‖ρ_T Π̂_n 1_{Ω_n}‖_L = O_p(n^{−2/3}),

so that √n{φ(T̂_n) − φ(T)} has the same limiting distribution as √n φ̇_T Π̂_n. Because φ̇_T : L → L is continuous, we find that

√n φ̇_T Π̂_n = φ̇_T(√n Π̂_n) →_d φ̇_T 𝒢,

as n → ∞, and we are done.

Remark 5.4. If T = Σ is the covariance operator of a Gaussian random variable X with only simple eigenvalues, and Σ̂_n is the sample covariance operator, then using the


Karhunen–Loève expansion of 𝒢 as in (4.46), we see that the first term of φ̇_Σ 𝒢 is

∑_j φ′(σ_j²)(e_j ⊗ e_j) (∑_α ∑_β v_{α,β} Z_{α,β} e_α ⊗ e_β) (e_j ⊗ e_j) = ∑_j φ′(σ_j²) v_{j,j} Z_{j,j} (e_j ⊗ e_j),

and similarly it can be seen that the second term is

∑∑_{j≠k} ((φ(σ_k²) − φ(σ_j²))/(σ_k² − σ_j²)) v_{j,k} Z_{j,k} (e_j ⊗ e_k).


CHAPTER 6 PERTURBATION OF EIGENVALUES AND EIGENVECTORS

6.1 Perturbation theory for operators

We will discuss results on the convergence of eigenvalues, eigenvectors, and eigenprojections of a small perturbation T̃ = T + Π, where both T and Π are bounded and Hermitian. Although our aim is to apply these results to covariance operators, we will discuss the theory for more general operators. We will assume that T ∈ L_H has an isolated simple eigenvalue λ₁, whose one-dimensional eigenprojection is given by E₁ = e₁ ⊗ e₁ for some unit vector e₁. Let Ω = Ω₀ ∪ Ω₁ be as defined in Chapter 5, where

Ω₀ ⊃ σ(T)\{λ₁}, Ω₁ ⊃ {λ₁}, and dist(Ω₀, Ω₁) > 0. (6.1)

Let us define an analytic ψ₁ : D → ℂ such that

ψ₁(z) = 1, z ∈ Ω₁; ψ₁(z) = 0 otherwise. (6.2)

We have

T = λ₁E₁ + T₀, (6.3)

where T₀ is Hermitian with σ(T₀) ⊂ Ω₀. There exists a resolution of the identity E(λ), λ ∈ σ(T₀), with (see [1])

T₀ = ∫_{σ(T₀)} λ dE(λ), (6.4)

with resolvent

R₀(z) = ∫_{σ(T₀)} (1/(z − λ)) dE(λ),

and

E₁E(λ) = E(λ)E₁ = 0. (6.5)

We have

R(z) = (zI − T)⁻¹ = (λ₁/(z(z − λ₁))) E₁ + ∫_{σ(T₀)} (1/(z − λ)) dE(λ). (6.6)


Thus we get

ψ₁(T) = (1/2πi) ∮_{∂Ω₁} R(z) dz = (1/2πi) ∮_{∂Ω₁} (1/(z − λ₁) − 1/z) E₁ dz = E₁. (6.7)

Let

Q₁ = ∫_{σ(T₀)} (1/(λ₁ − λ)) dE(λ). (6.8)

Now we are in a position to find the Fréchet derivative of ψ₁ at T.

Theorem 6.1.1. The Fréchet derivative of ψ₁ at T is given by

ψ̇_{1,T} Π = E₁ΠQ₁ + Q₁ΠE₁. (6.9)

Proof. We have

ψ̇_{1,T} Π = (1/2πi) ∮_{Γ₁} R(z) Π R(z) dz, (6.10)

where Γ₁ = ∂Ω₁. Thus we get

ψ̇_{1,T} Π = (1/2πi) ∮_{Γ₁} (λ₁²/(z²(z − λ₁)²)) E₁ΠE₁ dz
  + (1/2πi) ∮_{Γ₁} (λ₁/(z(z − λ₁))) E₁Π (∫_{σ(T₀)} (1/(z − λ)) dE(λ)) dz
  + (1/2πi) ∮_{Γ₁} (∫_{σ(T₀)} (1/(z − λ)) dE(λ)) Π (λ₁/(z(z − λ₁))) E₁ dz
  + (1/2πi) ∮_{Γ₁} (∫_{σ(T₀)} (1/(z − λ)) dE(λ)) Π (∫_{σ(T₀)} (1/(z − μ)) dE(μ)) dz.

By the Cauchy integral formula the first term is the zero operator [6]. However, note that


(1/2πi) ∮_{Γ₁} (λ₁/(z(z − λ₁))) E₁Π (∫_{σ(T₀)} (1/(z − λ)) dE(λ)) dz
  = E₁Π (1/2πi) ∫_{σ(T₀)} (∮_{Γ₁} (λ₁/(z(z − λ₁)(z − λ))) dz) dE(λ)
  = E₁Π (1/2πi) ∫_{σ(T₀)} (∮_{Γ₁} {1/(λz) + (1/(λ₁ − λ))(1/(z − λ₁) − 1/(z − λ))} dz) dE(λ).

Also note that λ₁ lies inside Γ₁, but λ and 0 lie outside. Thus we have

(1/2πi) ∮_{Γ₁} (1/(z − λ₁)) dz = 1, (1/2πi) ∮_{Γ₁} (1/(z − λ)) dz = 0, and (1/2πi) ∮_{Γ₁} (1/z) dz = 0.

Hence the second term equals

E₁Π ∫_{σ(T₀)} (1/(λ₁ − λ)) dE(λ) = E₁ΠQ₁. (6.11)

Similarly the third term equals Q₁ΠE₁. Since both λ and μ lie outside of the contour, the last term is zero, because 1/((z − λ)(z − μ)) is analytic inside the contour. This proves (6.9).

Remark 6.1. The result of the above theorem holds true for any eigenprojection E_p, replacing λ₁ by any isolated and simple eigenvalue λ_p.

The following result regarding the eigenprojection of a perturbation of T is of importance. For sufficiently small Π ∈ L_H we have the following.

Theorem 6.1.2. T̃ = T + Π has an isolated simple eigenvalue λ̃₁ with eigenprojection Ẽ₁, and

Ẽ₁ = E₁ + E₁ΠQ₁ + Q₁ΠE₁ + O(‖Π‖²_L). (6.12)

Proof. We have ψ₁(T̃) = ψ₁(T) + ψ̇_{1,T}Π + O(‖Π‖²_L) = E₁ + E₁ΠQ₁ + Q₁ΠE₁ + O(‖Π‖²_L). Note that (ψ₁(T̃))² = ψ₁²(T̃) = ψ₁(T̃) (see (5.17)), so ψ₁(T̃) is idempotent. Clearly it is Hermitian, so that it is a projection. Let this projection be Ẽ₁. For sufficiently small Π we have

‖Ẽ₁ − E₁‖_L < 1. (6.13)

Hence Ẽ₁ must also have dimension 1 (see [31]), the eigenvalue associated with it must be simple, and Ẽ₁ = ẽ₁ ⊗ ẽ₁ for some unit vector ẽ₁. Also, with χ(z) = z for all z ∈ ℂ, we see that (χψ₁)(T̃)ẽ₁ = T̃ẽ₁, and also (ψ₁χ)(T̃)ẽ₁ = Ẽ₁T̃ẽ₁ = (ẽ₁ ⊗ ẽ₁)T̃ẽ₁ = ⟨T̃ẽ₁, ẽ₁⟩ẽ₁ = λ̃₁ẽ₁. Hence T̃ẽ₁ = λ̃₁ẽ₁, and λ̃₁ is an eigenvalue of T̃.

The following corollaries give similar results for the eigenvectors and eigenvalues of a small perturbation of an operator, assuming that T ∈ L_H has a simple and isolated eigenvalue λ₁ with corresponding eigenvector e₁.

Corollary 6.1.1. If ẽ₁ and λ̃₁ are the eigenvector and eigenvalue corresponding to e₁ and λ₁, then

ẽ₁ = e₁ + Q₁Πe₁ + O(‖Π‖²), (6.14)

λ̃₁ = λ₁ + ⟨Πe₁, e₁⟩ + O(‖Π‖²). (6.15)

Proof. We have

(Ẽ₁ − E₁)e₁ = (E₁ΠQ₁ + Q₁ΠE₁ + O(‖Π‖²_L)) e₁ = Q₁Πe₁ + O(‖Π‖²_L), (6.16)

and

ẽ₁ − e₁ = Ẽ₁ẽ₁ − E₁e₁ (6.17)
  = (Ẽ₁ − E₁)e₁ + (Ẽ₁ − E₁)(ẽ₁ − e₁) + E₁(ẽ₁ − e₁). (6.18)

Note that

‖(Ẽ₁ − E₁)(ẽ₁ − e₁)‖ ≤ ‖Ẽ₁ − E₁‖_L ‖ẽ₁ − e₁‖ = O(‖Π‖_L) √(2(1 − ⟨ẽ₁, e₁⟩)).


But,

1 − ⟨ẽ₁, e₁⟩² = ⟨⟨ẽ₁, e₁⟩ẽ₁ − e₁, ⟨ẽ₁, e₁⟩ẽ₁ − e₁⟩
  = ‖⟨ẽ₁, e₁⟩ẽ₁ − e₁‖²
  = ‖(ẽ₁ ⊗ ẽ₁)e₁ − E₁e₁‖²
  = ‖(Ẽ₁ − E₁)e₁‖² ≤ ‖Ẽ₁ − E₁‖²_L = O(‖Π‖²_L).

Thus we get ‖(Ẽ₁ − E₁)(ẽ₁ − e₁)‖ = O(‖Π‖²_L), and for sufficiently small Π, 2 ≥ 1 + ⟨ẽ₁, e₁⟩ ≥ 1/2. This implies that

‖E₁(ẽ₁ − e₁)‖ = ‖⟨ẽ₁, e₁⟩e₁ − e₁‖ = |1 − ⟨ẽ₁, e₁⟩| = (1 − ⟨ẽ₁, e₁⟩²)/|1 + ⟨ẽ₁, e₁⟩| = O(‖Π‖²_L).

Thus (6.16) and (6.18) show that ẽ₁ − e₁ = Q₁Πe₁ + O(‖Π‖²_L). For the eigenvalue we observe that

λ̃₁ = ⟨T̃ẽ₁, ẽ₁⟩ = ⟨(T + Π)(e₁ + Q₁Πe₁), e₁ + Q₁Πe₁⟩ + O(‖Π‖²_L)
  = ⟨Te₁, e₁⟩ + ⟨Te₁, Q₁Πe₁⟩ + ⟨TQ₁Πe₁, e₁⟩ + ⟨Πe₁, e₁⟩ + O(‖Π‖²_L).

Note that since T and Q₁ are Hermitian and commute, ⟨Te₁, Q₁Πe₁⟩ = ⟨Q₁Te₁, Πe₁⟩ = λ₁⟨Q₁e₁, Πe₁⟩ = 0, and similarly ⟨TQ₁Πe₁, e₁⟩ = 0. Thus the above equation gives the required result, since ⟨Te₁, e₁⟩ = λ₁.
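The expansions (6.14)–(6.15) are easy to verify numerically. A minimal Matlab sketch under the assumption of a random symmetric matrix T with a simple largest eigenvalue and a small symmetric perturbation Pi (all names hypothetical):

    % Numerical check of the first order perturbation formulas.
    m = 8;  A = randn(m);  T  = (A + A')/2;
    B = randn(m);          Pi = 1e-4 * (B + B')/2;
    [E,lam] = eig(T);  lam = diag(lam);
    [lam1,i1] = max(lam);  e1 = E(:,i1);
    Q1 = zeros(m);                          % reduced resolvent at lam1
    for j = setdiff(1:m,i1)
      Q1 = Q1 + E(:,j)*E(:,j)' / (lam1 - lam(j));
    end
    lam1_pert = lam1 + e1'*Pi*e1;           % first order eigenvalue (6.15)
    e1_pert   = e1 + Q1*Pi*e1;              % first order eigenvector (6.14)
    lamTilde  = max(eig(T + Pi));           % exact value, for comparison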

Remark 6.2. If T is a compact operator with spectral representation as in (3.8), the above results hold true with Q₁ = ∑_{j=2}^∞ (1/(λ₁ − λ_j)) E_j.

In statistics we may have an estimator T̂ of T, and consider Π̂ = T̂ − T a random perturbation of T. This leads to the following, which is essentially contained in [8] (see also Corollaries 3.3 and 3.4 in [16]).

Theorem 6.1.3. Let T̂_n ∈ L, n ∈ ℕ, be random operators such that √n(T̂_n − T) →_d 𝒯, as n → ∞, for some random operator 𝒯 ∈ L. Then, for sufficiently large n, T̂_n has an isolated eigenvalue λ̂₁ and we have

√n(λ̂₁ − λ₁) →_d ⟨𝒯e₁, e₁⟩, as n → ∞, (6.19)

and, if Ê₁ = ê₁ ⊗ ê₁ is the corresponding eigenprojection,

√n(Ê₁ − E₁) →_d E₁𝒯Q₁ + Q₁𝒯E₁, as n → ∞, (6.20)

and

√n(ê₁ − e₁) →_d Q₁𝒯e₁, as n → ∞. (6.21)

Proof. Let Π̂_n = T̂_n − T; then √n(T̂_n − T) →_d 𝒯 implies ‖Π̂_n‖ = O_p(1/√n). Using (6.15) and Slutsky's theorem we see that

√n(λ̂₁ − λ₁) = √n⟨Π̂_n e₁, e₁⟩ + o_p(1) →_d ⟨𝒯e₁, e₁⟩ as n → ∞.

Using the same idea with (6.12) we get the other results.

6.2 Perturbation theory for matrices

Although there are several different approaches (see [2], [28], [4]) to the limiting distributions of random perturbations of matrices in finite dimensions, we can use the same technique as the one used above for infinite dimensional operators. Let M be a symmetric matrix and M̃ = M + Π, where Π is a symmetric matrix. If μ₁ > μ₂ > ... > μ_m are the m eigenvalues of M with eigenprojections P₁, P₂, ..., P_m, we have

M = ∑_{j=1}^m μ_j P_j, where P_j P_k = 0 for j ≠ k, and ∑_{j=1}^m P_j = I. (6.22)

If μ̃₁ > μ̃₂ > ... > μ̃_m are the eigenvalues of M̃ with corresponding eigenprojections P̃₁, P̃₂, ..., P̃_m, we have the following result for the eigenprojections.

Proposition 6.2.1. For sufficiently small Π,

P̃_j = P_j + P_jΠQ_j + Q_jΠP_j + o(‖Π‖²), (6.23)

where Q_j = ∑_{k≠j} (1/(μ_j − μ_k)) P_k. Moreover, P̃_j is an orthogonal projection.

For eigenvectors we have the following result.

Proposition 6.2.2. If μ_j is simple, and hence P_j = p_j p_j^* for some p_j with ‖p_j‖ = 1, then M̃ has a single, simple eigenvalue μ̃_j with eigenprojection P̃_j = p̃_j p̃_j^* for some unit vector p̃_j, and the eigenvector p̃_j satisfies

p̃_j = p_j + Q_jΠp_j + O(‖Π‖²). (6.24)

Similarly, for eigenvalues we have the following.

Proposition 6.2.3. If μ_j is simple, then

μ̃_j = μ_j + p_j^*Πp_j + O(‖Π‖²). (6.25)

The proofs of the above propositions are similar to those for the infinite dimensional case; here we would like to discuss the statistical application of these results. Let Σ be a covariance matrix with Σ = ∑_{j=1}^m μ_j P_j, and let μ₁ be its largest simple eigenvalue with corresponding eigenvector p₁. Let Σ̂ be an estimator of Σ such that

√n(Σ̂ − Σ) = √n Π →_d G, as n → ∞, (6.26)

where G is an m dimensional random matrix; then ‖Π‖ = O_p(1/√n), which is sufficiently small for large n.

Theorem 6.2.1. Under the assumptions mentioned above, we have

√n(μ̂₁ − μ₁) →_d p₁^*Gp₁, as n → ∞, in ℝ, (6.27)

√n(p̂₁ − p₁) →_d Q₁Gp₁, as n → ∞, in ℝᵐ. (6.28)

Proof. We have √n(μ̂₁ − μ₁) = √n p₁^*(Σ̂ − Σ)p₁ + o_p(1) = p₁^* √n(Σ̂ − Σ) p₁ + o_p(1). Using Slutsky's theorem, we have (6.27). The proof of (6.28) follows similarly.


The limiting distributions in Theorem 6.2.1 are not simple. Let p₁, p₂, ..., p_m be an orthonormal basis of eigenvectors of Σ. Since the p_j p_k^* form an orthonormal basis of ℝ^{m×m}, any m × m matrix M has the expansion

M = ∑_{j=1}^m ∑_{k=1}^m ⟨M, p_j p_k^*⟩ p_j p_k^* = ∑_{j=1}^m ∑_{k=1}^m (p_k^* M p_j) p_j p_k^*,

where ⟨·,·⟩ denotes the Frobenius matrix inner product, and the last equality uses the symmetry of the matrices considered here. The zero mean m × m Gaussian matrix G can thus be written as

G = ∑_{j=1}^m ∑_{k=1}^m (p_k^* G p_j) p_j p_k^*. (6.29)

The covariance supermatrix V of G is nothing but the collection of all covariances

V_{(j,k),(a,b)} = E(p_k^* G p_j)(p_a^* G p_b).

The covariances are inherited from the underlying random element X. Let μ be the mean of X. We have

V_{(j,k),(a,b)} = E(p_k^*(X − μ)(X − μ)^*p_j − p_k^*Σp_j)(p_b^*(X − μ)(X − μ)^*p_a − p_b^*Σp_a)
  = E{p_k^*(X − μ) · p_j^*(X − μ) · p_b^*(X − μ) · p_a^*(X − μ)} − δ_{jk}δ_{ab}μ_jμ_a.

If we introduce the random variables

u_j = p_j^*(X − μ),

then the u_j have mean zero, Var(u_j) = μ_j, and Cov(u_j, u_k) = 0 for j ≠ k. Since this does not in general entail independence, the expectation E u_k u_j u_b u_a will not in


general simplify. But if we assume X =_d N_m(μ, Σ), we get

u_j = √μ_j Z_j, j = 1, 2, ..., m,

where the Z_j are independent standard normal. In this case the p_j p_k^* are eigenmatrices of V, and V is diagonal in this matrix basis. That is,

V_{(j,k),(a,b)} = 0 if (j, k) ≠ (a, b). (6.30)

The diagonal elements are given by

V_{(j,j),(j,j)} = E u_j⁴ − μ_j² = μ_j² EZ⁴ − μ_j² = 2μ_j², and (6.31)
V_{(j,k),(j,k)} = E u_j² E u_k² = μ_jμ_k, j ≠ k.

That is,

V_{(j,k),(j,k)} = (δ_{jk} + 1)μ_jμ_k. (6.32)

Thus for normal data the asymptotic distribution of the largest eigenvalue of the covariance matrix can be described by

√n(μ̂₁ − μ₁) →_d N(0, 2μ₁²). (6.33)
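The limit law (6.33) is easy to check by Monte Carlo. A minimal Matlab sketch for Gaussian data with a chosen (here diagonal, hence its elementwise square root is also its matrix square root) covariance matrix; all parameter values are illustrative:

    % Monte Carlo check of (6.33).
    m = 5;  n = 500;  N = 2000;
    Sigma = diag([4 3 2 1 .5]);  mu1 = 4;
    stat = zeros(N,1);
    for rep = 1:N
      X = randn(n,m) * sqrt(Sigma);        % rows ~ N_m(0, Sigma)
      S = cov(X,1);                        % sample covariance (divisor n)
      stat(rep) = sqrt(n) * (max(eig(S)) - mu1);
    end
    % var(stat) should be close to 2*mu1^2 = 32.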

For the eigenvectors we observe that

Q₁Gp₁ = ∑_j ∑_k (p_k^*Gp_j) δ_{k1} Q₁p_j
  = ∑_j (p₁^*Gp_j) Q₁p_j
  = ∑_{j=2}^m (p₁^*Gp_j) (∑_{k=2}^m (1/(μ₁ − μ_k)) p_k p_k^*) p_j
  = ∑_{j=2}^m (1/(μ₁ − μ_j)) (p₁^*Gp_j) p_j,

where the p₁^*Gp_j are zero mean normal with variance μ₁μ_j and covariance zero. Thus we have

√n(p̂₁ − p₁) →_d ∑_{j=2}^m (√(μ₁μ_j)/(μ₁ − μ_j)) Z_j p_j, (6.34)

where the Z_j are iid N(0, 1).


CHAPTER 7 TESTING EQUALITY OF COVARIANCE OPERATORS

7.1 Finite dimensional case

Testing equality of covariance matrices has always been an interesting topic in multivariate statistics. In this chapter we will discuss some results for the finite dimensional case and their extension to infinite dimensions. The null hypothesis considered in this chapter is

H₀ : Σ₁ = Σ₂ = Σ,

and this null hypothesis will be tested against local alternatives of the type

H_γ : Σ₁ = Σ + γn^{−1/2}T, Σ₂ = Σ, γ > 0.

For finite dimensional data, let X_{i1}, X_{i2}, ..., X_{iN_i} be independent samples from a normal N_m(μ_i, Σ_i) population (i = 1, 2). The test statistic for testing the null hypothesis H₀ against the alternative that H₀ is not true can be derived using the likelihood ratio principle [28]. Let

X̄_i = (1/N_i) ∑_{j=1}^{N_i} X_{ij},

A_i = ∑_{j=1}^{N_i} (X_{ij} − X̄_i)(X_{ij} − X̄_i)^*,

A = A₁ + A₂,

N = N₁ + N₂.

Then we have the following theorem [28].

Theorem 7.1.1. For suitably chosen c_α, the test that rejects the null hypothesis for Λ ≤ c_α is of size α, where

Λ = (∏_{i=1}^2 (det A_i)^{N_i/2}/(det A)^{N/2}) · (N^{mN/2}/∏_{i=1}^2 N_i^{mN_i/2}). (7.1)

Under the null hypothesis, −2 log Λ has asymptotically a χ²_f distribution with f = ½m(m + 1)(r − 1), where r is the number of populations (here r = 2). For infinite dimensional data the determinant of the covariance operator is not easy to compute. It might be possible to generalize the above test statistic by using det(Σ) = e^{−ζ_Σ′(0)}, where ζ_Σ(s) = tr Σ^{−s}. However, we will follow another approach: we will project the data onto a direction, compute the corresponding one dimensional likelihood ratio statistic for the projected data, and then take the maximum of these statistics over all directions. That is, we use Roy's union-intersection principle ([32]) in conjunction with univariate likelihood ratio statistics. Since the sample covariance operator is never one to one, some regularization is essential to obtain a suitable inverse of this operator, as mentioned before. We now discuss the details of the testing procedure.

7.2 Infinite dimensional case

7.2.1 Test statistic under null hypothesis

Let X_{j1}, X_{j2}, ..., X_{jn(j)} be iid elements in H with E‖X_{ji}‖⁴ < ∞, and with mean μ_j and covariance operator Σ_j, for j = 1, 2. Let

X̄_j = (1/n(j)) ∑_{i=1}^{n(j)} X_{ji},  Σ̂_j = (1/n(j)) ∑_{i=1}^{n(j)} (X_{ji} − X̄_j) ⊗ (X_{ji} − X̄_j), (7.2)

and lim_{n→∞} n(j)/n = λ_j ∈ (0, 1). We want to test the null hypothesis

H₀ : Σ₁ = Σ₂ = Σ > 0. (7.3)

In Ji and Ruymgaart (2008) the test statistic is derived by putting some restriction on the null hypothesis. To test H_{0,C} : Σ₁ = Σ₂ = Σ > 0, ‖Σ‖ ≤ C ∈ (0, ∞), assuming that the largest eigenvalue of Σ is simple, the following result is derived.

Theorem 7.2.1. An asymptotic size-α test for testing the null hypothesis H_{0,C} : Σ₁ = Σ₂ = Σ > 0, ‖Σ‖ ≤ C ∈ (0, ∞), is obtained by rejecting the null when

T_{ε,C} = √n (‖R̂_ε‖ − C/(C + ε))/v̂ > Φ⁻¹(1 − α), 0 < α < 1, (7.4)

where Φ is the standard normal cdf, R̂_ε = (εI + Σ̂₂)^{−1/2} Σ̂₁ (εI + Σ̂₂)^{−1/2}, and v̂² is any consistent estimator (an estimator that converges in probability to the true value) of E⟨Re₁, e₁⟩², with R being a sum of products of functions of the covariance operators of the samples, and e₁ the eigenvector corresponding to the largest eigenvalue of Σ.

We will remove the unnecessary restriction on the norm of Σ and derive a relatively simple test statistic.

Proposition 7.2.1. Given ε > 0, a test statistic for testing H₀ is given by

T_ε = largest eigenvalue of R̂_ε = l.e.v. R̂_ε = ‖R̂_ε‖_L, (7.5)

where

R̂_ε = (εI + Σ̂₂)^{−1/2} (εI + Σ̂₁) (εI + Σ̂₂)^{−1/2}. (7.6)

Proof. Let us assume the data are Gaussian, so that the projected data ⟨f, X_{ji}⟩ are iid one-dimensional normal with mean μ_j(f) = ⟨f, μ_j⟩ and variance s_j² = ⟨f, Σ_j f⟩, j = 1, 2. Thus the likelihood ratio test statistic for testing H₀(f) : s₁² = s₂² is given by [5]

T(f) = ŝ₁²/ŝ₂² = ⟨f, Σ̂₁f⟩/⟨f, Σ̂₂f⟩. (7.7)

Since H₀ = ∩_{f≠0} H₀(f), using Roy's union intersection principle, a test statistic should be of the form sup_{f≠0} T(f). Because Σ̂₂ has finite dimensional range, the supremum is infinite and hence we have a degeneracy. Therefore we regularize the operators in the numerator and denominator by adding a fraction of the identity operator. That is, the test statistic will be

T_ε = max_{f≠0} ⟨f, (εI + Σ̂₁)f⟩/⟨f, (εI + Σ̂₂)f⟩. (7.8)


But max_{f≠0} ⟨f, (εI + Σ̂₁)f⟩/⟨f, (εI + Σ̂₂)f⟩ = l.e.v. (εI + Σ̂₂)^{−1/2}(εI + Σ̂₁)(εI + Σ̂₂)^{−1/2} = l.e.v. R̂_ε = ‖R̂_ε‖_L.

The operator R̂_ε = (εI + Σ̂₂)^{−1/2}(εI + Σ̂₁)(εI + Σ̂₂)^{−1/2} is no longer Hilbert–Schmidt, since the identity operator is not Hilbert–Schmidt. But it is important to note that its population analogue is

R_ε = (εI + Σ₂)^{−1/2}(εI + Σ₁)(εI + Σ₂)^{−1/2}, (7.9)

and under H₀ this is simply I, which is an entirely known operator.
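In computations R̂_ε is easy to form once the covariance estimates are discretized. A minimal Matlab sketch for two samples X1, X2 whose rows are observations in ℝᵐ (hypothetical names; genuinely functional data would first be evaluated on a grid):

    % The regularized ratio operator (7.6) and the statistic (7.5).
    eps_reg = 0.5;                          % regularization parameter
    S1 = cov(X1,1);  S2 = cov(X2,1);        % sample covariance operators
    A  = sqrtm(eps_reg*eye(size(S2)) + S2); % (eps I + Sigma2_hat)^{1/2}
    Rhat  = A \ (eps_reg*eye(size(S1)) + S1) / A;   % the operator (7.6)
    T_eps = max(real(eig(Rhat)));           % l.e.v. Rhat, the statistic (7.5)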

Using Theorem 4.2.3 and the independence of the samples, we see that under H₀

√n(Σ̂₁ − Σ) →_d (1/√λ₁) D₁, as n → ∞, (7.10)

and

√n(Σ̂₂ − Σ) →_d (1/√λ₂) D₂, as n → ∞, (7.11)

where D₁ ⊥ D₂. The above convergences are in L_HS, but the continuous mapping theorem entails convergence in L as well.

Let us take the analytic function φ_ε(z) = 1/√(ε + z), z ∈ ℂ, z ≠ −ε. Its derivative is φ_ε′(z) = −1/(2(ε + z)^{3/2}). Thus by using (5.30) we get

φ̇_{ε,Σ} D₁ = −(1/2) ∑_{j=1}^∞ (1/(ε + σ_j²)^{3/2}) E_j D₁ E_j (7.12)
  + ∑∑_{j≠k} (((ε + σ_j²)^{1/2} − (ε + σ_k²)^{1/2})/((σ_k² − σ_j²)(ε + σ_j²)^{1/2}(ε + σ_k²)^{1/2})) E_j D₁ E_k,

φ̇_{ε,Σ} D₂ = −(1/2) ∑_{j=1}^∞ (1/(ε + σ_j²)^{3/2}) E_j D₂ E_j (7.13)
  + ∑∑_{j≠k} (((ε + σ_j²)^{1/2} − (ε + σ_k²)^{1/2})/((σ_k² − σ_j²)(ε + σ_j²)^{1/2}(ε + σ_k²)^{1/2})) E_j D₂ E_k.


Thus the delta method of Theorem 5.1.2 yields

√n(φ_ε(Σ̂₁) − φ_ε(Σ₁)) →_d (1/√λ₁) φ̇_{ε,Σ}D₁, as n → ∞, in L, (7.14)

√n(φ_ε(Σ̂₂) − φ_ε(Σ₂)) →_d (1/√λ₂) φ̇_{ε,Σ}D₂, as n → ∞, in L. (7.15)

Let us now define the following L-valued random elements,

R₁ = (1/√λ₁) φ_ε(Σ)D₁φ_ε(Σ), and R₂ = (1/√λ₂) (εI + Σ)^{1/2} φ̇_{ε,Σ}D₂, (7.16)

and set R = R₁ + R₂ + R₂^*. Using the fact that the samples are independent, that is R₁ ⊥ R₂, we get the following theorem for the asymptotic distribution of R̂_ε.

Theorem 7.2.2. Under H₀ we have

√n(R̂_ε − I) →_d R, as n → ∞, in L. (7.17)

Proof. Using the identity

ÂB̂Ĉ − ABC = (Â − A)B̂Ĉ + A(B̂ − B)Ĉ + AB(Ĉ − C), (7.18)

we get

R̂_ε − I = (εI + Σ̂₂)^{−1/2}(εI + Σ̂₁)(εI + Σ̂₂)^{−1/2} − (εI + Σ)^{−1/2}(εI + Σ)(εI + Σ)^{−1/2}
  = (φ_ε(Σ̂₂) − φ_ε(Σ))(εI + Σ̂₁)φ_ε(Σ̂₂) + φ_ε(Σ)(Σ̂₁ − Σ)φ_ε(Σ̂₂) + φ_ε(Σ)(εI + Σ)(φ_ε(Σ̂₂) − φ_ε(Σ)).

Under H₀, using (7.10), (7.15), and (7.16) we get

√n(R̂_ε − I) →_d R₂^* + R₁ + R₂ = R, as n → ∞, in L.

Under the null hypothesis we have Σ₁ = Σ₂ = Σ; thus we estimate Σ by Σ̂_c = λ̂₁Σ̂₁ + λ̂₂Σ̂₂, where λ̂_j = n(j)/n. Henceforth we use ê₁ and Ê₁, respectively, for the eigenvector and the eigenprojection associated with the largest eigenvalue of Σ̂_c. The following results follow immediately:

√n(Σ̂_c − Σ) →_d √λ₁D₁ + √λ₂D₂ = D_c, as n → ∞, both in L and in L_HS, (7.19)

√n(Ê₁ − E₁) →_d E₁D_cQ₁ + Q₁D_cE₁. (7.20)

Using (7.17) we get the following theorem.

Theorem 7.2.3. Under H₀, and assuming the largest eigenvalue of Σ is simple, we have

√n(Ê₁R̂_εÊ₁ − E₁) →_d E₁D_cQ₁ + Q₁D_cE₁ + E₁RE₁. (7.21)

Proof. Using (7.18), under H₀ we get

√n(Ê₁R̂_εÊ₁ − E₁) = √n{(Ê₁ − E₁)R̂_εÊ₁ + E₁(R̂_ε − I)Ê₁ + E₁R̂_ε(Ê₁ − E₁)}. (7.22)

Using (7.17) and (7.20), and noting that Ê₁ →_p E₁ and R̂_ε →_p I, we get

√n(Ê₁R̂_εÊ₁ − E₁) →_d (E₁D_cQ₁ + Q₁D_cE₁)IE₁ + E₁RE₁ + E₁I(E₁D_cQ₁ + Q₁D_cE₁)
  = E₁D_cQ₁ + Q₁D_cE₁ + E₁RE₁.

We have used the facts that E₁E₁ = E₁ and E₁Q₁ = Q₁E₁ = 0.

Remark 7.1. It should be noted that Dc and R can be entirely expressed in terms of D1 and D2. Since they are mutually independent the distribution of the random operator on the right of (7.21) is well defined.

Since E₁ has an isolated, simple eigenvalue 1, using Theorem 6.1.3 we see that

√n(‖Ê₁R̂_εÊ₁‖_L − 1) →_d ⟨(E₁D_cQ₁ + Q₁D_cE₁ + E₁RE₁)e₁, e₁⟩ = ⟨Re₁, e₁⟩.

Here the fact that ‖Ê₁R̂_εÊ₁‖_L equals the largest eigenvalue of Ê₁R̂_εÊ₁ is used. Now we have one of the main results of this section, whose proof follows immediately from Theorem 6.1.3.

Theorem 7.2.4. Under H₀ and under the assumption that the largest eigenvalue of Σ is simple, we have

√n(‖Ê₁R̂_εÊ₁‖_L − 1) →_d N(0, v²), as n → ∞, (7.23)

where

v² = E⟨Re₁, e₁⟩². (7.24)

Remark 7.2. Note that v² contains unknown parameters, so we need a consistent estimator v̂² of v², which will be found at the end of this section.

Remark 7.3. The statistic ‖Ê₁R̂_εÊ₁‖_L has a limiting distribution similar to that of ‖R̂_ε‖_L, which was used by Ji and Ruymgaart (2008). A big advantage of the new test statistic is that it can be standardized by the constant 1.

Remark 7.4. It should be noted that the modification of the statistic R̂_ε could be done using any known bounded operator A, and we would get the following asymptotics under H₀:

√n(A^*R̂_εA − A^*A) →_d A^*RA, as n → ∞, in L. (7.25)

But this limiting distribution involves more than just one eigenvector of Σ.

7.2.2 Estimation of variance

In order to find a consistent estimator of v², we first note that, since R₁ is independent of R₂,

E⟨Re₁, e₁⟩² = E(⟨R₁e₁, e₁⟩ + ⟨R₂e₁, e₁⟩ + ⟨R₂^*e₁, e₁⟩)² (7.26)
  = E⟨R₁e₁, e₁⟩² + 4E⟨R₂e₁, e₁⟩². (7.27)

Using (7.16) and (7.15), we see that

E⟨R₁e₁, e₁⟩² = (1/(λ₁(ε + σ₁²)²)) E⟨D₁e₁, e₁⟩², (7.28)

4E⟨R₂e₁, e₁⟩² = (4/λ₂) E⟨φ̇_{ε,Σ}D₂e₁, (εI + Σ)^{1/2}e₁⟩² = (1/(λ₂(ε + σ₁²)²)) E⟨D₂e₁, e₁⟩². (7.29)

The following lemma is important for obtaining a consistent estimator of v², i.e., an estimator converging in probability to the true parameter v².

Lemma 7.2.1. If D is the limit in distribution of the properly standardized Σ̂ (see Theorem 4.2.3), we have

E⟨g, Df⟩⟨Df, h⟩ = E⟨X − μ, f⟩²⟨X − μ, g⟩⟨X − μ, h⟩ − ⟨Σf, g⟩⟨Σf, h⟩ (7.30)

for any f, g, h ∈ H.

Proof. Since Df is the limit in distribution of the properly standardized sum of n independent copies of the random element ⟨X − μ, f⟩(X − μ) − Σf, we have

E(Df) ⊗ (Df) = E(⟨X − μ, f⟩(X − μ) − Σf) ⊗ (⟨X − μ, f⟩(X − μ) − Σf). (7.31)

Thus, by the definition of the mean, we have

E⟨(Df) ⊗ (Df), g ⊗ h⟩ = E⟨(⟨X − μ, f⟩(X − μ) − Σf) ⊗ (⟨X − μ, f⟩(X − μ) − Σf), g ⊗ h⟩. (7.32)

Simplifying, we get

E⟨g, Df⟩⟨Df, h⟩ = E⟨X − μ, f⟩²⟨X − μ, g⟩⟨X − μ, h⟩ − ⟨Σf, g⟩⟨Σf, h⟩. (7.33)

Choosing f = g = h = e₁, we get the following corollary.

Corollary 7.2.1. We have

E⟨De₁, e₁⟩² = E⟨X − μ, e₁⟩⁴ − σ₁⁴. (7.34)

Remark 7.5. If X =_d G(μ, Σ), then ⟨X − μ, e₁⟩ is normal, and hence

E⟨De₁, e₁⟩² = 2σ₁⁴. (7.35)

Henceforth, let σ̂₁² and ê₁ be the largest eigenvalue and the corresponding eigenvector of Σ̂_c. A consistent estimator of E⟨De₁, e₁⟩² is thus given by

(1/n) ∑_{i=1}^n {⟨X_i − X̄, ê₁⟩⁴ − σ̂₁⁴}. (7.36)

Consistent estimators Δ̂_j² of E⟨D_je₁, e₁⟩², j = 1, 2, are given by

Δ̂_j² = (1/n(j)) ∑_{i=1}^{n(j)} {⟨X_{ji} − X̄_j, ê₁⟩⁴ − σ̂₁⁴}, (7.37)

and hence a consistent estimator v̂² of v² is given by

v̂² = (1/(ε + σ̂₁²)²) (Δ̂₁²/λ₁ + Δ̂₂²/λ₂). (7.38)

With the above estimator of the variance and using (7.23), we have the most important result of this section.

Theorem 7.2.5. An asymptotic size-α (0 < α < 1) test for testing H₀ : Σ₁ = Σ₂ is obtained by rejecting H₀ when T_ε > Φ⁻¹(1 − α), where

T_ε = (√n/v̂)(‖Ê₁R̂_εÊ₁‖_L − 1), (7.39)

and Φ is the standard normal cdf.


CHAPTER 8 GENERALIZED TEST

In this chapter we will generalize the test in two directions: first, a more general test statistic involving the m largest eigenvalues will be considered and, secondly, the asymptotics will also include local alternatives. Throughout this chapter we will assume the following about the population covariance operator:

Σ is strictly positive, or equivalently one to one, (8.1)

all the eigenvalues of Σ are simple, i.e., have multiplicity one. (8.2)

Condition (8.1) can always be met by restricting Σ to the orthogonal complement of its null space. Regarding (8.2), it should be noted that this assumption is not uncommon for covariance matrices in theoretical multivariate statistics (Anderson, Muirhead, Brenner, Hurowitz). The question of how to estimate unknown eigenvalues and their multiplicities and derive their asymptotics is a problem in its own right ([8], [39], [14]) that will not be discussed here (see [14] for further detail). It should be noted that the above conditions are satisfied for a large class of operators, such as those connected with the Sturm–Liouville problem. One important example is the covariance operator of Brownian motion.

As in the previous chapter, let σ₁² > σ₂² > ... ↓ 0 be the simple eigenvalues of Σ and e₁, e₂, ... the corresponding orthonormal eigenvectors. Hence Σ has the spectral representation as in (3.8), and the cumulative projections are given by

F_m = ∑_{k=1}^m E_k, ‖F_m‖²_HS = m. (8.3)

In the situation of Corollary 4.2.1 we may consider Σ̂_n as a small random perturbation of Σ of order O_p(n^{−1/2}). It is well known that in this situation the convergence of eigenvalues and eigenprojections of Σ̂_n can be obtained. Let σ̂₁² > σ̂₂² > ... > 0 be the m largest distinct eigenvalues of Σ̂_n, and Ê₁, Ê₂, ..., Ê_m the corresponding


eigenprojections. The empirical counterpart of F_m will be

F̂_m = ∑_{k=1}^m Ê_k. (8.4)

We define

Q_k = ∑_{j≠k} (1/(σ_k² − σ_j²)) E_j, k = 1, ..., m. (8.5)

In order to derive the convergence results for cumulative eigenprojections, we will generalize Theorem 6.1.1.

Theorem 8.0.6. Let T ∈ L_H have simple isolated eigenvalues λ₁, λ₂, ... and corresponding eigenprojections E₁, E₂, .... Let Ω = Ω_{0p} ∪ Ω_p, where Ω_{0p} ⊃ σ(T)\{λ_p}, Ω_p ⊃ {λ_p}, and dist(Ω_{0p}, Ω_p) > 0. Define an analytic φ_p such that φ_p(z) = 1 for z ∈ Ω_p, and 0 otherwise. If φ = ∑_{p=1}^m φ_p, then for sufficiently small Π ∈ L_H we have

φ̇_T Π = ∑_{p=1}^m {E_pΠQ_p + Q_pΠE_p}, (8.6)

where Q_p = ∫_{σ(T_{0p})} (1/(λ_p − λ)) dE_p(λ), and E_p(λ) is a resolution of the identity of T_{0p} = T − λ_pE_p.

Proof. By linearity of the derivative, φ̇_T Π = ∑_{p=1}^m φ̇_{p,T} Π. Application of Remark 6.1 yields the required result. Figure 8.1 shows the contours used for the necessary integrations.

Figure 8.1. Contours used for integration

It is important to note that in the above situation, if we take T = Σ, then Q_k reduces to (8.5) and φ(Σ) = F_m.

Remark 8.1. By the independence of the integral of the choice of contour, the above result holds true if we integrate around a sufficiently smooth contour Γ enclosing the first m eigenvalues (see Figure 8.2).


Figure 8.2. Contours enclosing m eigenvalues

In the case of covariance operators we will show the same for m = 2 by a direct calculation. Let φ(z) be 1 inside Γ and 0 outside, so that

φ̇_T Π = (1/2πi) ∮_Γ (E₁/(z − λ₁) + E₂/(z − λ₂) + R)Π(E₁/(z − λ₁) + E₂/(z − λ₂) + R) dz
  = (1/2πi) ∮_Γ ((1/(z − λ₁)²) E₁ΠE₁ + (1/((z − λ₁)(z − λ₂))) E₁ΠE₂ + (1/(z − λ₁)) E₁ΠR + ...) dz,

where R = R(z) = ∑_{j=3}^∞ E_j/(z − λ_j). Note that, using the Cauchy integral formula, the first two integrals are zero, and

(1/2πi) ∮_Γ ((1/(z − λ₁)) E₁ΠR + (1/(z − λ₂)) RΠE₂) dz
  = E₁Π ∑_{j=3}^∞ (1/(λ₁ − λ_j)) E_j + ∑_{j=3}^∞ (1/(λ₂ − λ_j)) E_jΠE₂
  = E₁ΠQ₁ + Q₂ΠE₂ − (1/(λ₁ − λ₂)) E₁ΠE₂ − (1/(λ₂ − λ₁)) E₁ΠE₂ = E₁ΠQ₁ + Q₂ΠE₂.

If √n(Σ̂ − Σ) →_d T + D, as n → ∞, in L_HS (as in Corollary 4.2.1), application of the delta method yields the following theorem.

Theorem 8.0.7. Under the assumptions (8.1) and (8.2), we have

√n(F̂_m − F_m) →_d ∑_{k=1}^m {Q_k(T + D)E_k + E_k(T + D)Q_k}, as n → ∞, in L_HS. (8.7)

The following theorem gives the convergence of the dimension of F̂_m.

Theorem 8.0.8. Under the assumptions (8.1) and (8.2) we have

P(‖F̂_m‖²_HS = m) → 1, as n → ∞. (8.8)


Proof. We know that the functional U ↦ ‖U‖²_HS is differentiable with derivative 2⟨·, V⟩_HS at V ∈ L_HS. Applying the delta method to (8.7) we get

√n(‖F̂_m‖²_HS − ‖F_m‖²_HS) →_d 2 ∑_{k=1}^m ⟨Q_k(T + D)E_k + E_k(T + D)Q_k, F_m⟩_HS. (8.9)

Note that ⟨E_k(T + D)Q_k, F_m⟩_HS = ∑_{α=1}^m ⟨E_k(T + D)Q_ke_α, F_me_α⟩. If α = k, then Q_ke_α = 0; and if α ≠ k, then F_me_α = e_α and ⟨E_k(T + D)Q_ke_α, e_α⟩ = ⟨(T + D)Q_ke_α, E_ke_α⟩ = 0. Thus ⟨E_k(T + D)Q_k, F_m⟩_HS = 0, and similarly for the other term, so we have √n(‖F̂_m‖²_HS − m) →_d 0. Since ‖F̂_m‖²_HS is integer valued, we have the required result.

8.1 The two-sample case

In this section we prepare for the problem of actual interest: testing equality of the two covariance operators in a two-sample setting. Let the H-valued random elements X_{n,j,i}, n ∈ ℕ, j = 1, 2, i = 1, 2, ..., n(j), be mutually independent. Suppose that X_{n,j,1}, ..., X_{n,j,n(j)} are iid with the same distribution as the generic element X_{n,j}, and with

sup_{n∈ℕ} E‖X_{n,j}‖^{4+δ} < ∞, for some δ > 0, (8.10)

EX_{n,j} = μ_{n,j},  E(X_{n,j} − μ_{n,j}) ⊗ (X_{n,j} − μ_{n,j}) = Σ_{n,j}. (8.11)

We define

X̄_j = (1/n(j)) ∑_{i=1}^{n(j)} X_{n,j,i},  Σ̂_j = (1/n(j)) ∑_{i=1}^{n(j)} (X_{n,j,i} − X̄_j) ⊗ (X_{n,j,i} − X̄_j), (8.12)

and we assume that

n(j)/n → λ_j ∈ (0, 1), as n → ∞, for j = 1, 2. (8.13)


In this situation we want to test the null hypothesis

H₀ : Σ₁ = Σ₂ = Σ, Σ satisfies (8.1) and (8.2). (8.14)

The local alternatives to be considered will be of the form

H_γ : Σ_{n,1} = Σ + γn^{−1/2}T, Σ_{n,2} = Σ, γ > 0, (8.15)

where

T = ∑_{k=1}^∞ τ_k² E_k, ∑_{k=1}^∞ τ_k² < ∞. (8.16)

Application of Theorem 4.2.4 and Corollary 4.2.1 yields that

√(n(1))(Σ̂₁ − Σ) →_d γ√λ₁ T + D₁, in L_HS, (8.17)

√(n(2))(Σ̂₂ − Σ) →_d D₂, in L_HS, (8.18)

where D_j =_d Gaussian (0, V_j) and D₁ ⊥ D₂. In order to estimate Σ we will use the pooled sample covariance operator

Σ̂_p = (n(1)/n) Σ̂₁ + (n(2)/n) Σ̂₂. (8.19)

Note that √n(Σ̂_p − Σ) = √(n(1)/n) √(n(1))(Σ̂₁ − Σ) + √(n(2)/n) √(n(2))(Σ̂₂ − Σ). The eigenvalues and corresponding eigenprojections of Σ̂_p will be denoted by

σ̂²_{p,1}, σ̂²_{p,2}, ... ↓ 0, and Ê_{p,1}, Ê_{p,2}, ..., (8.20)

respectively. Furthermore, we will write

F̂_{p,m} = ∑_{k=1}^m Ê_{p,k}. (8.21)

Under the null hypothesis and the alternative hypotheses, i.e., for γ ≥ 0, the following theorem is an immediate extension of Theorem 4.2.4, Theorem 8.0.7, and Theorem 8.0.8.


Theorem 8.1.1. Under H_γ (γ ≥ 0) we have

√n(Σ̂_p − Σ) →_d λ₁γT + √λ₁D₁ + √λ₂D₂ = D_{p,γ}, as n → ∞, in L_HS, (8.22)

√n(F̂_{p,m} − F_m) →_d ∑_{k=1}^m {Q_kD_{p,γ}E_k + E_kD_{p,γ}Q_k} = F_{p,γ}, as n → ∞, in L_HS, (8.23)

P(‖F̂_{p,m}‖²_HS = m) → 1, as n → ∞. (8.24)

It is well known from [8] that the above theorem entails the asymptotic normality of the eigenvalues σ̂²_{p,k} of Σ̂_p. In particular, the consistency

σ̂²_{p,k} →_p σ_k², n → ∞, (8.25)

holds true.

8.2 Test statistics

The application of Roy's union-intersection principle in conjunction with the likelihood ratio procedure discussed in Chapter 7 leads to a test statistic derived from

R̂_{n,ε} = R̂_ε = (εI + Σ̂₂)^{−1/2}(εI + Σ̂₁)(εI + Σ̂₂)^{−1/2}, (8.26)

where ε > 0 is an arbitrary fixed number. The population analogue of this operator is

R_{n,ε} = (εI + Σ_{n,2})^{−1/2}(εI + Σ_{n,1})(εI + Σ_{n,2})^{−1/2}. (8.27)

Under H_γ, γ ≥ 0, this operator equals

R_{n,ε} = I + (γ/√n) ∑_{k=1}^∞ (τ_k²/(ε + σ_k²)) E_k, (8.28)

and reduces to I under H₀.

In this chapter we will, more generally, consider a test statistic that consists of the


m largest eigenvalues of the operator R̂_{n,ε}. Note that

‖F_mR_{n,ε}F_m‖²_HS = ∑_{k=1}^m (1 + (γ/√n)(τ_k²/(ε + σ_k²)))², (8.29)

so that the above expression involves the first m largest eigenvalues. Under H₀ the quantity on the right of the above equation is simply m. Thus the statistic

T_n = √n(‖F̂_{p,m}R̂_{n,ε}F̂_{p,m}‖²_HS − m) (8.30)

seems to be a reasonable candidate for a test procedure. In order to derive the asymptotic distribution of the test statistic, we will first derive the asymptotics of the random operator R̂_{n,ε} = R̂_ε.

Theorem 8.2.1. Under H_γ (γ ≥ 0) we have

√n(R̂_ε − I) →_d γ(εI + Σ)⁻¹T (8.31)
  + (1/√λ₂)(φ̇_{ε,Σ}D₂)(εI + Σ)φ_ε(Σ) + (1/√λ₁)φ_ε(Σ)D₁φ_ε(Σ) (8.32)
  + (1/√λ₂)φ_ε(Σ)(εI + Σ)(φ̇_{ε,Σ}D₂) = R_{ε,γ}. (8.33)

Proof. Note that R̂_ε is a product of three operators and I = φ_ε(Σ)(εI + Σ)φ_ε(Σ). Using (7.18) we see that

√n(R̂_ε − I) = √n{φ_ε(Σ̂₂) − φ_ε(Σ)}(εI + Σ̂₁)φ_ε(Σ̂₂) + φ_ε(Σ)√n{Σ̂₁ − Σ}φ_ε(Σ̂₂) + φ_ε(Σ)(εI + Σ)√n{φ_ε(Σ̂₂) − φ_ε(Σ)}.

Since √n(Σ̂₁ − Σ) →_d (1/√λ₁)D₁ + γT and √n(φ_ε(Σ̂₂) − φ_ε(Σ)) →_d (1/√λ₂)φ̇_{ε,Σ}D₂, and since Σ and T commute and φ_ε(Σ)φ_ε(Σ) = (εI + Σ)⁻¹, we get the required result.

Next we will find the asymptotics for F̂_{p,m}R̂_{n,ε}F̂_{p,m}. Note that under H₀ this operator is F_mIF_m = F_m. Thus we use (7.18) once again to get the following theorem.

Theorem 8.2.2. Under H_γ, γ ≥ 0, we have

√n(F̂_{p,m}R̂_{n,ε}F̂_{p,m} − F_mIF_m) →_d F_{p,γ}F_m + F_mR_{ε,γ}F_m + F_mF_{p,γ}, in L_HS, as n → ∞. (8.34)

Proof. Using (7.18) we have

√n(F̂_{p,m}R̂_{n,ε}F̂_{p,m} − F_mIF_m) = √n(F̂_{p,m} − F_m)R̂_{n,ε}F̂_{p,m} + F_m√n(R̂_{n,ε} − I)F̂_{p,m} + F_mI√n(F̂_{p,m} − F_m).

We know that √n(R̂_{n,ε} − I) →_d R_{ε,γ}, √n(F̂_{p,m} − F_m) →_d F_{p,γ}, F̂_{p,m} →_p F_m, and R̂_{n,ε} →_p I. Hence we get the result.

Finally the delta method yields the asymptotics for the test statistic. We have the following main theorem of this section.

Theorem 8.2.3. Under H_γ, γ ≥ 0, we have

T_n = √n(‖F̂_{p,m}R̂_{n,ε}F̂_{p,m}‖²_HS − m) →_d ∑_{j=1}^m (2/(ε + σ_j²)){γτ_j² + ⟨((1/√λ₁)D₁ + (1/√λ₂)D₂)e_j, e_j⟩}, (8.35)

as n → ∞, in ℝ.

Proof. We have ‖F_mIF_m‖²_HS = ‖F_m‖²_HS = m. The functional ‖·‖²_HS : L_HS → ℝ is differentiable with derivative 2⟨·, F_m⟩_HS at F_m. Thus using the delta method we get

√n(‖F̂_{p,m}R̂_{n,ε}F̂_{p,m}‖²_HS − m) →_d 2⟨F_{p,γ}F_m + F_mR_{ε,γ}F_m + F_mF_{p,γ}, F_m⟩_HS
  = 2 ∑_{j=1}^m ⟨(F_{p,γ}F_m + F_mR_{ε,γ}F_m + F_mF_{p,γ})e_j, e_j⟩
  = 4 ∑_{j=1}^m ⟨F_{p,γ}e_j, e_j⟩ + 2 ∑_{j=1}^m ⟨R_{ε,γ}e_j, e_j⟩.


But ⟨F_{p,γ}e_j, e_j⟩ = ∑_{α=1}^m ⟨(Q_αD_{p,γ}E_α + E_αD_{p,γ}Q_α)e_j, e_j⟩ = ⟨D_{p,γ}e_j, Q_je_j⟩ + ⟨D_{p,γ}Q_je_j, e_j⟩ = 0, since E_αe_j = 0 for α ≠ j, E_je_j = e_j, and Q_je_j = 0. Next, we observe that

⟨R_{ε,γ}e_j, e_j⟩ = γ⟨(εI + Σ)⁻¹Te_j, e_j⟩
  + ⟨{(1/√λ₂)(φ̇_{ε,Σ}D₂)(εI + Σ)φ_ε(Σ) + (1/√λ₁)φ_ε(Σ)D₁φ_ε(Σ) + (1/√λ₂)φ_ε(Σ)(εI + Σ)(φ̇_{ε,Σ}D₂)}e_j, e_j⟩
  = γτ_j²/(ε + σ_j²) + (2/√λ₂)√(ε + σ_j²) ⟨(φ̇_{ε,Σ}D₂)e_j, e_j⟩ + (1/√λ₁)(1/(ε + σ_j²)) ⟨D₁e_j, e_j⟩.

Since ⟨E_αD₂E_βe_j, e_j⟩ = ⟨D₂E_βe_j, E_αe_j⟩ = 0 for α ≠ β, only the diagonal part of the Fréchet derivative contributes, i.e., it reduces to the numerical derivative. Also, since φ_ε′(σ_j²) = −1/(2(ε + σ_j²)^{3/2}) and −D₂ =_d D₂, we have

⟨(φ̇_{ε,Σ}D₂)e_j, e_j⟩ = (1/(2(ε + σ_j²)^{3/2})) ⟨D₂e_j, e_j⟩.

Thus we get the required result by substitution.

The random variable on the right in (8.35) has a normal distribution with mean

ν_γ = 2γ ∑_{j=1}^m τ_j²/(ε + σ_j²), (8.36)

and variance

v² = E{∑_{j=1}^m (2/(ε + σ_j²))((1/√λ₁)⟨D₁e_j, e_j⟩ + (1/√λ₂)⟨D₂e_j, e_j⟩)}². (8.37)

Note that v² contains unknown parameters. In order to arrive at the test procedure, we need to find a consistent estimator of this variance. We will do this in the next section.


8.3 Estimation of the variance

To estimate the variance, let us introduce the random variables

T_{lj} = ⟨D_le_j, e_j⟩, l = 1, 2. (8.38)

The covariance matrix of the vector T_l = (T_{l1}, ..., T_{lm})^* has elements given by

v_l(j, k) = E⟨((X_l − μ_l) ⊗ (X_l − μ_l) − Σ)e_j, e_j⟩ ⟨((X_l − μ_l) ⊗ (X_l − μ_l) − Σ)e_k, e_k⟩ (8.39)
  = E⟨X_l − μ_l, e_j⟩²⟨X_l − μ_l, e_k⟩² − σ_j²σ_k².

Since D₁ and D₂ are mutually independent, so are T₁ and T₂. Thus

v² = ∑_{j=1}^m ∑_{k=1}^m (4/((ε + σ_j²)(ε + σ_k²))) ((1/λ₁)v₁(j, k) + (1/λ₂)v₂(j, k)). (8.40)

In order to estimate v² consistently, we will use the first m eigenvalues σ̂²_{p,1}, ..., σ̂²_{p,m} of Σ̂_p to estimate σ₁², ..., σ_m². The corresponding eigenvectors ê_{p,1}, ..., ê_{p,m} can be used to estimate the v_l(j, k), and we get

v̂_{p,l}(j, k) = (1/n(l)) ∑_{i=1}^{n(l)} ⟨X_{l,i} − X̄_l, ê_{p,j}⟩²⟨X_{l,i} − X̄_l, ê_{p,k}⟩² − σ̂²_{p,j}σ̂²_{p,k}. (8.41)

Substituting these estimates in (8.40) yields a consistent estimate v̂² of v², for each γ ≥ 0. Thus the actual test statistic will now be

S_n = √n (‖F̂_{p,m}R̂_{n,ε}F̂_{p,m}‖²_HS − m)/v̂, (8.42)

and we have the following theorem, which settles both the asymptotic level and the asymptotic power.

Theorem 8.3.1. Under H₀ we have

S_n →_d N(0, 1). (8.43)

The test that rejects H₀ for S_n > Φ⁻¹(1 − α), 0 < α < 1, satisfies, under H_γ,

P(S_n > Φ⁻¹(1 − α)) → 1 − Φ(Φ⁻¹(1 − α) − (2γ/v) ∑_{j=1}^m τ_j²/(ε + σ_j²)) (8.44)

as n → ∞.

Remark 8.2. One could obviously base a test statistic on Σ̂₁ − Σ̂₂. Note that under H₀

√n(Σ̂₁ − Σ̂₂) →_d (1/√λ₁)D₁ + (1/√λ₂)D₂, as n → ∞, (8.45)

where the random variable on the right has a zero mean Gaussian distribution on L_HS with covariance operator (1/λ₁)V₁ + (1/λ₂)V₂. This statistic, however, apparently has a limiting distribution of a complicated chi-square type. The statistic we have used has a univariate normal distribution.

Remark 8.3. Yet another way to derive a test statistic is to rewrite the operator

Σ_{n,1} − Σ_{n,2} = Σ_{n,1} + εI − (Σ_{n,2} + εI) (8.46)
  = (εI + Σ_{n,2})((εI + Σ_{n,2})⁻¹(Σ_{n,1} + εI) − I).

Under H₀ and the present local alternatives this last expression equals

(Σ_{n,2} + εI)(R_{n,ε} − I), (8.47)

and consequently essentially reduces to R_{n,ε} − I, with empirical analogue R̂_ε − I as in (7.17).

8.3.1 The Gaussian case

We will perform simulations assuming the observations are Gaussian. In this subsection we discuss the computation of the variance when the populations are Gaussian; the estimation of the unknown parameters on the right in (8.40) then becomes much simpler. Let us assume

X_l =_d G(μ_l, Σ), l = 1, 2. (8.48)

It is easy to observe that

E⟨e_j, X_l − μ_l⟩⟨X_l − μ_l, e_k⟩ = δ_{jk}σ_j², (8.49)

so that the ⟨e_j, X_l − μ_l⟩ are mutually independent for all j and l. This entails that the elements of the operator V_l equal

V_l((j, k), (α, β)) = E⟨X_l − μ_l, e_j⟩⟨X_l − μ_l, e_k⟩⟨X_l − μ_l, e_α⟩⟨X_l − μ_l, e_β⟩ − δ_{jk}δ_{αβ}σ_j²σ_α². (8.50)

Thus we have

V_l((j, k), (α, β)) = 0,        (j, k) ≠ (α, β);
V_l((j, k), (α, β)) = 2σ_j⁴,     j = k = α = β;   (8.51)
V_l((j, k), (α, β)) = σ_j²σ_k²,  j ≠ k, j = α, k = β.

In turn, this means that

V₁ = V₂, diagonal in the basis of all e_j ⊗ e_k. (8.52)

Note that the v_l(j, k) can then simply be estimated by

v̂_l(j, k) = 2σ̂⁴_{p,j}, j = k;  v̂_l(j, k) = 0, j ≠ k. (8.53)

Substituting these in (8.40) we get

v̂² = (8/(λ₁λ₂)) ∑_{j=1}^m σ̂⁴_{p,j}/(ε + σ̂²_{p,j})². (8.54)
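In Matlab the Gaussian-case estimate (8.54) is essentially a one-liner, given the m largest pooled eigenvalues. In the following sketch sig2p is a vector holding σ̂²_{p,1}, ..., σ̂²_{p,m}, and lam1, lam2, eps_reg, and normFRF2 (the value of ‖F̂_{p,m}R̂_{n,ε}F̂_{p,m}‖²_HS) are hypothetical names carried over from the earlier sketches:

    % Gaussian-case variance estimate (8.54) and the statistic (8.42).
    vhat2 = (8/(lam1*lam2)) * sum(sig2p.^2 ./ (eps_reg + sig2p).^2);
    S_n   = sqrt(n) * (normFRF2 - m) / sqrt(vhat2);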


CHAPTER 9 SOME SIMULATIONS

In this chapter we will discuss some procedures to simulate functional data and evaluate the tests discussed in Chapters 7 and 8. In Chapter 7 the test statistic is T_ε = (√n/v̂)(‖Ê₁R̂_εÊ₁‖_L − 1), which involves only the largest eigenvalue of the operator. In Chapter 8 we considered a test statistic with more eigenvalues included: the generalized test statistic T = (√n/v̂)(‖F̂_mR̂_εF̂_m‖²_HS − m) involves the first m largest eigenvalues of the operator. In this chapter we will not only analyze the performance of these tests, but also show that under fixed and local alternatives the inclusion of more eigenvalues improves the asymptotic power of the test. We will also analyze the power of the tests and try to get some insight into the regularization parameter ε. Since the simulations are performed for data from Gaussian distributions, we first recall some known results for Gaussian distributions (see [21], [22], [38], [15], and [3] for detailed discussion). As discussed in Chapter 3, two probability measures m₁, m₂ defined on a measurable space are equivalent (m₁ ≈ m₂) if m₁(A) = 0 ⇔ m₂(A) = 0 for all Borel sets A, and they are orthogonal (m₁ ⊥ m₂) if there exists a measurable set A such that m₁(A) = 0 but m₂(A) = 1. If m₁ ≈ m₂, the Radon–Nikodym derivative ρ(x) = (dm₂/dm₁)(x) exists. We know that a Gaussian measure on a Hilbert space can be identified with its mean μ and covariance operator Σ (see [15], [20]). It is well known that two Gaussian measures on an infinite dimensional Hilbert space are either equivalent or orthogonal. The condition for orthogonality and equivalence of two Gaussian measures with the same covariance operator was discussed in Chapter 3. We mention the following criterion for equivalence of two Gaussian measures with mean 0 and arbitrary covariance operators (see [35], [23] for the proof).

Theorem 9.0.2. Let m₁ and m₂ be two Gaussian measures with mean 0 and covariance operators Σ₁ and Σ₂ on H. If there exists a Hilbert–Schmidt operator T on H such that Σ₂ = Σ₁ + Σ₁^{1/2} T Σ₁^{1/2}, then m₁ is equivalent to m₂. Otherwise they are orthogonal.

Let W₁ and W₂ be two independent standard Wiener processes in L²([0,1]), that is, two independent zero mean Gaussian processes with covariance kernel Σ(s, t) = s ∧ t, (s, t) ∈ [0,1] × [0,1]. The eigenvalues and the corresponding eigenfunctions of this operator are known to be

σ_k² = 1/((k − ½)π)², e_k(t) = √2 sin((k − ½)πt). (9.1)

If we take W̃ = cW₂ with c ≠ 1, then (3.15) implies that the probability measures induced by W₁ and W̃ are orthogonal. On the other hand, if we let W̃ = W₁ + (cT)^{1/2}W₂ with c > 0 and T = ∑_{k=1}^p τ_k² e_k ⊗ e_k, where τ₁², τ₂², ..., τ_p² ∈ (0, ∞) and the e_k are eigenfunctions of Σ, we get two Gaussian processes inducing equivalent measures on L²([0,1]).

9.1 Test using single eigenvalue

In this section we will study a simulation conducted for a very simple model. Let X₁₁, ..., X_{1,n(1)} be independent copies of the Wiener process W₁, and let X₂₁, ..., X_{2,n(2)} be independent copies of W̃ = cW₁. Thus Σ₁(s, t) = s ∧ t and Σ₂(s, t) = ½(s ∧ t) for c = 1/√2. Hence for c = 1 we are under the null hypothesis, and for c = 1/√2 we are under the alternative hypothesis. In this situation we have

X̄_j(s) = (1/n(j)) ∑_{i=1}^{n(j)} X_{ji}(s), j = 1, 2, (9.2)

and

Σ̂_j(s, t) = (1/n(j)) ∑_{i=1}^{n(j)} (X_{ji}(s) − X̄_j(s))(X_{ji}(t) − X̄_j(t)), j = 1, 2. (9.3)

The functional data are generated from the defining increments of the Wiener process,

W(t + dt) − W(t) =_d √dt · Z, W(0) = 0, Z standard normal. (9.4)

The data are generated on m = 200 equidistant steps, and the sample sizes are n(1) = n(2) = 50. The step size is taken to be dt = 1/200. Performing 1000 simulations under H₀, we find that 3.9 percent of the time the test statistic is larger than Φ⁻¹(0.95), where Φ is the standard normal cdf. Thus the claim that the test has size α = 0.05 is justified. Figure 9.1 shows the histogram of those 1000 values of the test statistic.

Figure 9.1. Histogram of 1000 test statistic values under the null
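The data generation just described amounts to cumulating independent Gaussian increments. The following is a minimal Matlab sketch consistent with the description above (the dissertation's full simulation code is in the Appendix; variable names here are hypothetical):

    % r discretized Wiener paths per sample on m = 200 grid points.
    m = 200;  dt = 1/m;  r = 50;  c = 1/sqrt(2);
    X1 = cumsum(sqrt(dt)*randn(r,m), 2);      % r copies of W_1
    X2 = c * cumsum(sqrt(dt)*randn(r,m), 2);  % r copies of c*W_1;
                                              % c = 1 gives the null hypothesis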

The simulation under the alternative hypothesis with Σ₂ = ½Σ₁ is also performed. The approximate largest eigenvalue of the pooled covariance operator is found to be σ̂₁² = 0.3164, which is close to σ₁² = 3/π² = 0.3040, with an error of |σ₁² − σ̂₁²| = 0.0124. Moreover, the estimate ê₁(t) compares well with e₁(t) = √2 sin(πt/2) (see Figure 9.2). We

Figure 9.2. ê1 vs e1
Figure 9.3. v̂ vs v


We notice that the true variance compares well with the estimated variance (see Figure 9.3). Table 9.1 shows various values of the regularization parameter ε and the test statistic computed with that particular ε. The null hypothesis is always rejected at the significance level 0.05.

Table 9.1. Test statistic and the regularization parameter

T: computed value of the test statistic; ε: the regularization parameter.

ε   0.1    0.2429   0.3857   0.5286   0.6714   0.8143   0.9571   1.1
T   3.06   2.67     2.50     2.40     2.34     2.30     2.69     2.54
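The parameter ε enters as a ridge-type regularizer in the operator R̂_ε. In the discretized setting it can be formed as in this sketch, consistent with the appendix code; SigmaHat1 and SigmaHat2 are assumed names for the two sample covariance matrices of (9.3), and m is the grid size:

% Regularized operator R-epsilon on the grid.
epsil = 0.5;                     % one of the values in Table 9.1
A = epsil*eye(m) + SigmaHat2;    % assumes SigmaHat2 from (9.3)
Repsi = A^(-1/2)*(epsil*eye(m) + SigmaHat1)*A^(-1/2);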

9.2 Test using the first m largest eigenvalues

In this section we will present the results of simulations in which the samples are drawn from two Gaussian processes that induce equivalent measures. As before, let W and W1 be two standard Wiener processes in L²([0,1]), and let T = Σ_{k=1}^{4} e_k ⊗ e_k with

e_k the eigenfunctions of the covariance operator of the Wiener process. If we take W̃ = W + (cT)^{1/2} W1, then Theorem 9.0.2 implies that the measures induced by W1 and W̃ are equivalent. Note that

T^{1/2} W1 = (Σ_{k=1}^{4} e_k ⊗ e_k)(Σ_{j=1}^{∞} σ_j Z_j e_j) = Σ_{k=1}^{4} Z_k σ_k e_k, (9.5)

where the Z_k are iid standard normal random variables. Thus our data consist of two samples: one from the distribution of W and the other from that of W̃. Throughout the simulations the sample sizes of the two samples will be taken equal: n(1) = n(2) = n/2 = r. Simulations will be performed for the sample sizes r = 25, 50, 100, and in each sampling situation the number of iterations used to simulate the distribution of the test statistic will be N = 1000.
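In view of (9.5), a sample path from W̃ can be generated as a Wiener path plus √c times a four-term Karhunen-Loève sum, as in the following sketch (consistent with the appendix code, with our own variable names):

% One path of W-tilde = W + (cT)^(1/2) W1 via (9.5).
m = 200; dt = 1/m; c = 1;
t = (0:m-1)*dt;
W = cumsum([0, sqrt(dt)*randn(1,m-1)]);   % standard Wiener path, W(0) = 0
KL = zeros(1,m);
for k = 1:4
    sig_k = 1/((k - 1/2)*pi);             % sigma_k from (9.1)
    e_k = sqrt(2)*sin((k - 1/2)*pi*t);    % eigenfunction e_k
    KL = KL + sig_k*randn(1)*e_k;         % term Z_k sigma_k e_k of (9.5)
end
Wtilde = W + sqrt(c)*KL;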

9.2.1 The null hypothesis

The null hypothesis is obtained by taking c = 0. The null hypothesis is rejected about 4 percent of the time at the level 0.05 when the sample size is 50. Figure 9.4 shows the histograms of the test statistic values when two and three eigenvalues, respectively, are considered.

Figure 9.4. Histograms of test statistic values under the null when two and three eigenvalues are taken.

9.2.2 The fixed alternative

A fixed alternative hypothesis is obtained by taking c = 1. Figure 9.5 shows the histograms of the test statistic values when two and three eigenvalues, respectively, are considered, with the sample size fixed at 50. We computed the fractions of rejections β̂ when m = 1, 2, 3 eigenvalues are taken. This simulation corroborates the conjecture that the inclusion of more eigenvalues increases the power against this type of fixed alternative (see Table 9.2).

Table 9.2. Power vs inclusion of eigenvalues

m    β̂
1    0.77
2    0.90
3    0.95


Figure 9.5. Histograms of test statistic values under the fixed alternative when two and three eigenvalues are taken.

9.2.3 Identification of the regularization parameter

In order to get some insight into the regularization parameter ε, we perform some simulations under the null hypothesis with sample sizes r = 25, 50, 100 for different values of ε. Table 9.3 below shows the results.

Table 9.3. Fraction of rejections under the null hypothesis

       r = 25             r = 50             r = 100
ε      m=1   m=2   m=3    m=1   m=2   m=3    m=1   m=2   m=3
0.1    0.13  0.14  0.14   0.12  0.11  0.11   0.09  0.09  0.08
0.5    0.08  0.07  0.07   0.07  0.07  0.07   0.07  0.06  0.06
1.0    0.08  0.07  0.07   0.07  0.07  0.07   0.07  0.06  0.06
1.5    0.07  0.07  0.07   0.07  0.07  0.07   0.06  0.06  0.06
2.0    0.07  0.07  0.07   0.05  0.05  0.05   0.04  0.04  0.04

Figures 9.6, 9.7, and 9.8 show the histograms, and Figures 9.9, 9.10, and 9.11 the QQ plots, of the test statistic values under these different situations. The histograms and QQ plots show that the distribution of the test statistic values is close to the standard normal distribution. It should be noted, however, that larger values of ε cause the test statistic values to be smaller, so that they are more clustered


Figure 9.6. Histograms of test statistic values under the null when ε = 0.5 and two and three eigenvalues are taken.

around zero. This increases the precision of the test under the null but reduces the power of the test under alternatives. Among the various choices of ε, our simulations show that ε close to 1.0 gives the size α = 0.05 of the test, and we get satisfactory power as explained before. The choice of ε depends on various factors, including the distribution of the population. We believe that additional statistical procedures, including bootstrapping, will help identify a good regularization parameter ε.

9.2.4 Local alternatives

The local alternative situation is achieved by taking c = γ/√n. With γ = 1, 5, 10, 15, 20, 25, 30, 35, 40, we performed simulations with sample size r = 50. The plot of the fraction of rejections versus γ when two eigenvalues are considered is shown in Figure 9.12; it corroborates that the power is always larger than the size α = 0.05 and that for sufficiently large γ the test has power one. In order to understand the asymptotic power of the test we take γ = 0, 1, 5, 10 and ε = 1.0, observe the fraction of rejections for the sample sizes r = 25, 50, 100, and compare it with the true asymptotic power

1 − Φ( Φ⁻¹(1 − α) − (2γ/v̂) Σ_{j=1}^{m} τ_j²/(ε + σ_j²) ).

This shows that the fraction of rejections converges to the true power (see Table 9.4).
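For the Gaussian samples used here the theoretical power is easy to evaluate numerically; the sketch below does so for m = 2, mirroring the appendix code, with τ_j² = σ_j² because the coefficients of T equal one, and with v̂ replaced by its population analogue:

% Theoretical asymptotic power for m = 2 (Gaussian case).
alpha = 0.05; epsil = 1.0; gam = 5;
sig = [4/pi^2, 4/(9*pi^2)];                   % sigma_j^2, j = 1, 2
v = sqrt(32*sum((sig./(epsil + sig)).^2));    % population analogue of v-hat
beta = 1 - normcdf(norminv(1 - alpha) - (2*gam/v)*sum(sig./(epsil + sig)))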


Figure 9.7. Histograms of test statistic values under the null when ε = 1.0 and two and three eigenvalues are taken.

Figure 9.8. Histograms of test statistic values under the null when ε = 1.5 and two and three eigenvalues are taken.


Figure 9.9. QQ plots of the test statistic values under the null when ε = 0.5 and two and three eigenvalues are taken.

Figure 9.10. QQ plots of the test statistic values under the null when ε = 1.0 and two and three eigenvalues are taken.


Figure 9.11. QQ plots of the test statistic values under the null when ε = 1.5 and two and three eigenvalues are taken.

Figure 9.12. Fraction of rejections vs γ for fixed sample size


Table 9.4. Asymptotic power and fraction of rejections

α = 0.05: theoretical level; β: theoretical asymptotic power; β̂: fraction of rejections; r: sample size; m: number of eigenvalues taken.

        β                  β̂, r = 25          β̂, r = 50          β̂, r = 100
γ       m=2     m=3        m=2     m=3        m=2     m=3        m=2     m=3
0       0.05    0.05       0.089   0.088      0.077   0.076      0.060   0.059
1       0.107   0.110      0.162   0.163      0.141   0.143      0.141   0.145
5       0.642   0.677      0.562   0.579      0.552   0.575      0.584   0.604
10      0.991   0.994      0.876   0.893      0.882   0.900      0.936   0.947


CHAPTER 10
CONCLUSION

A testing procedure for covariance operators of infinite dimensional data is discussed. A test is derived using the likelihood ratio principle in conjunction with the union-intersection principle. The test is generalized using functional calculus and perturbation theory of operators. All necessary convergence results are proved, together with a generalization of a central limit theorem for Hilbert space valued random variables. Since eigenprojections are identified as functions of the covariance operator, a delta method is found to be very useful. We discussed the Fréchet derivatives of the various functions involved in our calculations.

Other possibilities for obtaining the test statistic are discussed. One of them leads to an asymptotic distribution of a complicated chi-square type, whereas our test statistic has a limiting normal distribution. The other essentially reduces to the same statistic as we derived.

The simulation study verifies that the test performs well even for small sample sizes and that the asymptotic power of the test is better if more eigenvalues are included in the test statistic. The simulations include the case where the induced probability measures are orthogonal as well as the case where they are equivalent. Also in the latter case our test performs satisfactorily.

The estimation and testing when the eigenvalues are not simple is a problem in its own right, and this problem needs further attention. The identification of the best regularization parameter for inverting operators is similar to the bandwidth problem in kernel density estimation and hence also requires further attention.

For data from a Gaussian distribution the covariance operator of sample covariance operators is found to have a simplified form. Testing whether or not the data come from a Gaussian distribution is therefore an interesting question. There are various tests available for normality in multivariate statistics; the same question needs to be analyzed for Hilbert space valued random variables.


BIBLIOGRAPHY

[1] N. I. Akhiezer and I. M. Glazman. Theory of linear operators in Hilbert space. Vol. II. Translated from the Russian by Merlynd Nestell. Frederick Ungar Publishing Co., New York, 1963.

[2] T. W. Anderson. An introduction to multivariate statistical analysis. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, second edition, 1984.

[3] Charles R. Baker. On equivalence of probability measures. Ann. Probability, 1:690–698, 1973.

[4] Martin Bilodeau and David Brenner. Theory of multivariate statistics. Springer Texts in Statistics. Springer-Verlag, New York, 1999.

[5] George Casella and Roger L. Berger. Statistical inference. The Wadsworth & Brooks/Cole Statistics/Probability Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA, 1990.

[6] John B. Conway. Functions of one complex variable, volume 11 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1978.

[7] J. Cupidon, R. Eubank, D. Gilliam, and F. Ruymgaart. Some properties of canonical correlations and variates in infinite dimensions. J. Multivariate Anal., 99(6):1083–1104, 2008.

[8] J. Dauxois, A. Pousse, and Y. Romain. Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multivariate Anal., 12(1):136–154, 1982.

[9] E. Brian Davies. Linear operators and their spectra, volume 106 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2007.

[10] Nelson Dunford and Jacob T. Schwartz. Linear operators. Part I. Wiley Classics Library. John Wiley & Sons Inc., New York, 1988. General theory, With the assistance of William G. Bade and Robert G. Bartle, Reprint of the 1958 original, A Wiley-Interscience Publication.

[11] Kai Tai Fang and Yao Ting Zhang. Generalized multivariate analysis. Springer-Verlag, Berlin, 1990.


[12] Jacob Feldman. Equivalence and perpendicularity of Gaussian processes. Pacific J. Math., 8:699–708, 1958.

[13] Jaroslav Hájek. On a property of normal distribution of any stochastic process. Czechoslovak Math. J., 8 (83):610–618, 1958.

[14] G. Gains, K. Kaphle, and F. Ruymgaart. Estimation of multiple eigenvalues.

[15] I. I. Gihman and A. V. Skorohod. The theory of stochastic processes. I. Springer-Verlag, New York, 1974. Translated from the Russian by S. Kotz, Die Grundlehren der mathematischen Wissenschaften, Band 210.

[16] D. S. Gilliam, T. Hohage, X. Ji, and F. Ruymgaart. The Fréchet derivative of an analytic function of a bounded operator with some applications. Int. J. Math. Math. Sci., pages Art. ID 239025, 17, 2009.

[17] D. S. Gilliam, X. Ji, K. Kaphle, and F. Ruymgaart. Fréchet derivative of functions of operators with application to testing the equality of two covariance operators. 2010.

[18] Ulf Grenander. Probabilities on algebraic structures. Second edition. Almqvist & Wiksell, Stockholm, 1968.

[19] Peter Hall and Joel L. Horowitz. Nonparametric methods for inference in the presence of instrumental variables. Ann. Statist., 33(6):2904–2929, 2005.

[20] Svante Janson. Gaussian Hilbert spaces, volume 129 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 1997.

[21] Thomas Kailath. On measures equivalent to Wiener measure. Ann. Math. Statist., 38:261–263, 1967.

[22] D. Kannan and Pl. Kannappan. On a characterization of Gaussian measures in a Hilbert space. Ann. Inst. H. Poincaré Sect. B (N.S.), 11(4):397–404 (1976), 1975.

[23] Hui Hsiung Kuo. Gaussian measures in Banach spaces. Lecture Notes in Mathematics, Vol. 463. Springer-Verlag, Berlin, 1975.

[24] R. G. Laha and V. K. Rohatgi. Probability theory. John Wiley & Sons, New York-Chichester-Brisbane, 1979. Wiley Series in Probability and Mathematical Statistics.

[25] Peter D. Lax. Functional analysis. Pure and Applied Mathematics (New York). Wiley-Interscience [John Wiley & Sons], New York, 2002.


[26] S. E. Leurgans, R. A. Moyeed, and B. W. Silverman. Canonical correlation analysis when the data are curves. J. Roy. Statist. Soc. Ser. B, 55(3):725–740, 1993.

[27] R. A. Minlos. Generalized random processes and their extension to a measure. In Selected Transl. Math. Statist. and Prob., Vol. 3, pages 291–313. Amer. Math. Soc., Providence, R.I., 1963.

[28] Robb J. Muirhead. Aspects of multivariate statistical theory. John Wiley & Sons Inc., New York, 1982. Wiley Series in Probability and Mathematical Statistics.

[29] J. O. Ramsay and B. W. Silverman. Functional data analysis. Springer Series in Statistics. Springer, New York, second edition, 2005.

[30] J. R. Retherford. Hilbert space: compact operators and the trace theorem, volume 27 of London Mathematical Society Student Texts. Cambridge University Press, Cambridge, 1993.

[31] Frigyes Riesz and Béla Sz.-Nagy. Functional analysis. Dover Books on Advanced Mathematics. Dover Publications Inc., New York, 1990. Translated from the second French edition by Leo F. Boron, Reprint of the 1955 original.

[32] S. N. Roy. On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Statistics, 24:220–238, 1953.

[33] V. V. Sazonov. On a higher-dimensional central limit theorem. Litovsk. Mat. Sb., 3(1):219–224, 1963.

[34] V. Serdobolskii. Multivariate statistical analysis, volume 41 of Theory and Decision Library. Series B: Mathematical and Statistical Methods. Kluwer Academic Publishers, Dordrecht, 2000. A high-dimensional approach.

[35] A. V. Skorohod. Integration in Hilbert space. Springer-Verlag, New York, 1974. Translated from the Russian by Kenneth Wickwire, Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 79.

[36] Christopher G. Small and D. L. McLeish. Hilbert space methods in probability and statistical inference. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1994. A Wiley-Interscience Publication.

[37] V. Statulevičius, editor. Limit theorems of probability theory. Springer-Verlag, Berlin, 2000. Translation of Probability theory. 6 (Russian), Itogi Nauki i Tekhniki, Sovrem. Probl. Mat. Fund. Naprav., 81, Akad. Nauk SSSR, Vsesoyuz. Inst. Nauchn. i Tekhn. Inform. (VINITI), Moscow, 1991 [MR1157205 (92k:60001)], Translation edited by Yu. V. Prokhorov and V. Statulevičius.

[38] Dale E. Varberg. On equivalence of Gaussian measures. Pacific J. Math., 11:751–762, 1961.

[39] Geoffrey S. Watson. Statistics on spheres. University of Arkansas Lecture Notes in the Mathematical Sciences, 6. John Wiley & Sons Inc., New York, 1983. A Wiley-Interscience Publication.

[40] Joachim Weidmann. Linear operators in Hilbert spaces, volume 68 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1980. Translated from the German by Joseph Szücs.


APPENDIX: MATLAB CODE

clear
clc
close('all');

% Simulation of the tests of Chapters 7 and 8 for two samples of
% discretized Wiener-type processes (see Chapter 9 for the setup).
n = 100;                % size of each sample
m = 200;                % number of grid points on [0,1]
gamma = [0, 1, 5, 10];  % local alternative parameters
true_sig = [4/(pi^2), 4/(9*pi^2), 4/(25*pi^2)];  % sigma_k^2, k = 1, 2, 3

power1 = zeros(1,4);
power2 = zeros(1,4);
power3 = zeros(1,4);
truepower2 = zeros(1,4);
truepower3 = zeros(1,4);

TS1 = zeros(1,1000);
TS2 = zeros(1,1000);
TS3 = zeros(1,1000);

dt = 1/m;
sdt = sqrt(dt);
t = 0:dt:1-dt;

for kk = 1:1    % loop over the gamma values (here only gamma(1))
    c = gamma(kk)/(2*n)^(1/2);   % local alternative scaling
    oneeig = 0;
    twoeig = 0;
    threeeig = 0;
    for jj = 1:1000
        w1 = zeros(n,m);
        w2 = zeros(n,m);
        X1 = zeros(n,m);
        X2 = zeros(n,m);
        e_k = zeros(4,m);
        % generate the Wiener processes W1 and W2 and the perturbation
        for j = 1:n
            % four-term Karhunen-Loeve sum T^(1/2) W1, cf. (9.5)
            for k = 1:4
                e_k(k,:) = sqrt(2)*sin((k - 1/2)*pi*t);
                sig_k = 1/(pi*(k - 1/2));
                z_k = randn(1);
                X1(j,:) = X1(j,:) + sig_k .* z_k .* e_k(k,:);
            end
            % discretized Wiener paths via cumulated increments, cf. (9.4)
            Z_k = [0 randn(1, m-1)];
            X2(j,:) = cumsum(Z_k)*sqrt(dt);
            z = [0 randn(1, m-1)];
            dw1 = sdt*z;
            w2(j,:) = cumsum(dw1);
            w1(j,:) = sqrt(c)*X1(j,:) + X2(j,:);   % sample path from W-tilde
        end
        % sample mean functions and covariance operators, cf. (9.2)-(9.3)
        w1avg = sum(w1)/n;
        w1cov = dt*(w1 - ones(n,1)*w1avg)'*(w1 - ones(n,1)*w1avg)/n;
        w2avg = sum(w2)/n;
        w2cov = dt*(w2 - ones(n,1)*w2avg)'*(w2 - ones(n,1)*w2avg)/n;

        % 1. pooled sample covariance operator
        scov_c = .5*(w1cov + w2cov);

        % 2. leading eigenvalues and corresponding eigenvectors
        [v,D] = eig(scov_c);
        e = diag(D);
        [e1,ind1] = sort(e,'descend');
        v = v(:,ind1);
        e = e1;
        p1 = v(:,1)/sdt;
        p2 = v(:,2)/sdt;
        p3 = v(:,3)/sdt;
        sigma1hat_square = e(1);

        % 3. variance estimates for m = 1, 2, 3 (Gaussian case)
        epsil = 1.5;
        v_square = 32*e(1)^2/(epsil + sigma1hat_square)^2;
        v_square2 = 32*(e(1)^2/(epsil + e(1))^2 + e(2)^2/(epsil + e(2))^2);
        v_square3 = 32*(e(1)^2/(epsil + e(1))^2 + e(2)^2/(epsil + e(2))^2 + e(3)^2/(epsil + e(3))^2);

        % 4. regularized operator R-epsilon and its projected versions
        w2fac = epsil*eye(m) + w2cov;
        w1fac = epsil*eye(m) + w1cov;
        Repsi = w2fac^(-1/2)*w1fac*w2fac^(-1/2);
        g  = (p1*p1')*Repsi*(p1*p1')*dt^2;
        g2 = g + (p2*p2')*Repsi*(p2*p2')*dt^2;
        g3 = g2 + (p3*p3')*Repsi*(p3*p3')*dt^2;

        % 5. test statistics for m = 1, 2, 3 eigenvalues
        d = real(trace(g));
        twoNorm = norm(g2,'fro')^2;
        threeNorm = norm(g3,'fro')^2;
        sample_test  = 1/sqrt(v_square)*sqrt(2*n)*(d^2 - 1);
        sample_test2 = 1/sqrt(v_square2)*sqrt(2*n)*(twoNorm - 2);
        sample_test3 = 1/sqrt(v_square3)*sqrt(2*n)*(threeNorm - 3);

        % 6. count rejections at level 0.05
        if sample_test > norminv(.950)
            oneeig = oneeig + 1;
        end
        if sample_test2 > norminv(.950)
            twoeig = twoeig + 1;
        end
        if sample_test3 > norminv(.950)
            threeeig = threeeig + 1;
        end
        TS1(jj) = sample_test;
        TS2(jj) = sample_test2;
        TS3(jj) = sample_test3;
    end
    power1(kk) = oneeig/1000;
    power2(kk) = twoeig/1000;
    power3(kk) = threeeig/1000;

    % theoretical asymptotic power for m = 2 and m = 3
    mean_ts2 = 2*gamma(kk)*(true_sig(1)/(epsil + true_sig(1)) + true_sig(2)/(epsil + true_sig(2)));
    true_var2 = 32*(true_sig(1)^2/(epsil + true_sig(1))^2 + true_sig(2)^2/(epsil + true_sig(2))^2);
    truemean2 = mean_ts2/sqrt(true_var2);
    truepower2(kk) = 1 - normcdf(norminv(.950) - truemean2);

    mean_ts3 = 2*gamma(kk)*(true_sig(1)/(epsil + true_sig(1)) + true_sig(2)/(epsil + true_sig(2)) + true_sig(3)/(epsil + true_sig(3)));
    true_var3 = 32*(true_sig(1)^2/(epsil + true_sig(1))^2 + true_sig(2)^2/(epsil + true_sig(2))^2 + true_sig(3)^2/(epsil + true_sig(3))^2);
    truemean3 = mean_ts3/sqrt(true_var3);
    truepower3(kk) = 1 - normcdf(norminv(.950) - truemean3);
end

% optional diagnostics used for the figures of Chapter 9:
% hist(TS2); qqplot(TS2); hist(TS3); qqplot(TS3)
